Contact Name
Paska Marto Hasugian
Contact Email
paskamartohasugian@students.usu.ac.id
Phone
-
Journal Mail Official
editorjournal@seaninstitute.or.id
Editorial Address
Komplek New Pratama ASri Blok C, No.2, Deliserdang, Sumatera Utara, Indonesia
Location
Unknown,
Unknown
INDONESIA
Journal of Data Science
Published by SEAN INSTITUTE
ISSN : -     EISSN : 30252792     DOI : https://doi.org/10.58471
The Journal of Data Science focuses on the field of data science. It covers a wide range of topics related to data analysis, machine learning, statistics, data mining, and related areas. The journal aims to publish high-quality research papers, reviews, and technical notes that contribute to the advancement of data science. The Journal of Data Science welcomes submissions from researchers, academics, and practitioners working in the field of data science. It provides a platform for sharing novel research findings, methodologies, algorithms, and applications in various domain
Articles 28 Documents
Career Pattern Analysis of SMKN 1 Stabat Graduates Using K-Means Clustering Algorithm on Tracer Study Dataset Ibrahim Ibrahim; Muhammad Iqbal
Journal Of Data Science Vol. 3 No. 01 (2025): Journal Of Data Science, March 2025
Publisher : Sean Institute

DOI: 10.58471/jds.v3i01.6543

Abstract

A tracer study is a method commonly used to determine the condition of graduates of an educational institution, including the career patterns they pursue. This study analyzes the career patterns of SMKN 1 Stabat graduates using the K-Means clustering algorithm. The dataset comes from a tracer study of 287 SMKN 1 Stabat alumni who graduated in the last five years. Grouping the data with K-Means is expected to reveal specific patterns that can help schools improve the quality of learning and student work readiness [4]. The results of the analysis show several dominant career pattern groups, such as the industrial sector, entrepreneurship, and further education.
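The grouping step described in this abstract can be sketched roughly as follows. This is a minimal illustration of plain K-Means (Lloyd's algorithm), not the authors' code. The two-feature `records` list is hypothetical toy data, and using the first `k` rows as initial centroids is a simplifying assumption made for reproducibility.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's K-Means on a list of equal-length numeric tuples."""
    # Assumption: naive initialization from the first k rows (kept deterministic).
    centroids = list(points[:k])
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids

# Hypothetical toy records with two numeric features each.
records = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(records, k=2)
print(labels)
```

In practice the features would be scaled first and the initialization randomized or repeated, since K-Means is sensitive to both.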
Comparison and Evaluation of Euclidean Distance and Arccosine Distance in Adaptive K-Means Clustering Algorithm for Penguin Species Clustering Herlina Br Nainggolan; Pandi Barita Nauli Simangungsong
Journal Of Data Science Vol. 3 No. 02 (2025): Journal Of Data Science, September 2025
Publisher : Sean Institute

DOI: 10.58471/jds.v3i2.6890

Abstract

Clustering is an important method in unsupervised learning for grouping data based on similarity of characteristics. This study aims to cluster penguin species based on weight, height, and wing length attributes using the K-Means algorithm with two distance approaches: Euclidean and Arccosine. The dataset consists of 342 data points after cleaning. Evaluation results show that the Arccosine distance yields a clustering accuracy of 89.6%, higher than the Euclidean distance at 63.09%. This indicates that, on this dataset, Arccosine distance is better suited than Euclidean distance to grouping penguin species.
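To illustrate how the two distance measures can disagree, here is a small sketch. It assumes that "Arccosine distance" means the angle between two vectors (the arccosine of their cosine similarity), since the abstract does not give the exact definition used. The three vectors are made-up values, not penguin measurements.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.dist(a, b)

def arccosine(a, b):
    """Angle in radians between a and b (assumed definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    cos = max(-1.0, min(1.0, dot / norm))  # clamp floating-point rounding error
    return math.acos(cos)

small = (1.0, 2.0, 3.0)
large = (2.0, 4.0, 6.0)  # same proportions as `small`, twice the magnitude
other = (3.0, 2.0, 1.0)  # same magnitude as `small`, different proportions

print(euclidean(small, large), euclidean(small, other))
print(arccosine(small, large), arccosine(small, other))
```

Euclidean distance treats `other` as the nearer neighbour of `small`, while the angular measure treats `large` as nearly identical to it. A difference of this kind, where overall size matters less than proportions, is one plausible reason the two metrics could produce different clusters.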
Comparison and Evaluation of Euclidean Distance and Dice Distance in the K-Means Adaptive Algorithm for Clustering Composite Indexes of Food Security and Vulnerability Maps Emma Romasta Naulina Nainggolan; Paska Marto Hasugian
Journal Of Data Science Vol. 3 No. 02 (2025): Journal Of Data Science, September 2025
Publisher : Sean Institute

DOI: 10.58471/jds.v3i2.6941

Abstract

This study aims to compare and evaluate the effectiveness of two distance measurement methods, namely Euclidean Distance and Dice Distance, in the K-Means Adaptive algorithm for clustering Food Security and Vulnerability Composite Index data. The dataset used includes index data from 2022 to 2024, comprising 305 entries, which were then cleaned to 298 entries. The evaluation was conducted manually using a sample dataset and automatically using the entire dataset via Google Colab with Python. The algorithm's performance was assessed using the Silhouette Score metric to measure the quality of the resulting clusters. The evaluation results showed that the Euclidean method produced an average Silhouette Score of 0.3082, indicating a suboptimal cluster structure. This study concludes that the choice of distance method significantly influences clustering results, and selection should be tailored to the characteristics of the data.
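For readers unfamiliar with the metric, the sketch below computes a mean Silhouette Score from scratch with Euclidean distance. It is an illustrative implementation on made-up points, not the study's Colab code. Scores range from -1 to 1, with higher values meaning tighter and better separated clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of p's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other_lab, members in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Made-up points forming two obvious groups.
pts = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = silhouette_score(pts, [0, 0, 0, 1, 1, 1])  # labels match the groups
bad = silhouette_score(pts, [0, 1, 0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

On this toy data the matching labelling scores close to 1 and the mixed labelling scores near 0, which gives some context for reading the 0.3082 reported in the abstract as a weak structure.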
Comparison and Evaluation of Euclidean Distance and Divergence in Adaptive K-Means Algorithm for Clustering Human Development Index of Indonesia Province Maria Claudia Purba; Zakarias Situmorang
Journal Of Data Science Vol. 3 No. 02 (2025): Journal Of Data Science, September 2025
Publisher : Sean Institute

DOI: 10.58471/jds.v3i2.6942

Abstract

This research explores the application of the Adaptive K-Means clustering algorithm on Human Development Index (HDI) data across 34 provinces in Indonesia, comparing the performance of Euclidean and Divergence distance metrics. The HDI indicators used include life expectancy, years of schooling, and per capita expenditure. Data processing was conducted both manually on sample data and automatically using Python for the complete dataset. Results demonstrate that the choice of distance metric significantly impacts clustering effectiveness. Divergence outperformed Euclidean based on silhouette score evaluations, offering more representative cluster separation. Scatter plot visualizations tracked the iterative clustering process. The study contributes to optimizing clustering techniques for socio-economic indicators such as HDI.
Feature Engineering for Predictive Maintenance: Identifying Key Predictors of Machine Defects Using Machine Learning Chinedu Sebastian Ani; Godwin Harold Chukwuemeka; Uchendu Onwusoronye Onwurah
Journal Of Data Science Vol. 3 No. 02 (2025): Journal Of Data Science, September 2025
Publisher : Sean Institute

DOI: 10.58471/jds.v3i2.7267

Abstract

In modern industrial environments, the ability to predict equipment failure before it occurs is essential for minimizing downtime and maximizing operational efficiency. This research explores the use of feature engineering to identify key indicators of mechanical faults in a cement mill fan system. Vibration data were collected over 34 weeks from critical components of the fan and processed using several statistical techniques to extract relevant features. Various feature selection methods, including Principal Component Analysis (PCA), Minimum Redundancy Maximum Relevance (mRMR), ReliefF, Chi-square, ANOVA, and Kruskal-Wallis, were used to determine the most informative features. These features were then used to train and evaluate machine learning models, with neural networks demonstrating superior performance. Among all models, the neural network optimized with Chi-square-selected features achieved the highest classification accuracy, fastest prediction speed, and lowest misclassification cost. These results highlight the effectiveness of combining robust feature selection with deep learning methods for reliable fault detection and predictive maintenance in industrial systems.
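As an illustration of one of the listed selection methods, the following sketch scores a feature by its one-way ANOVA F statistic across fault classes. It is a simplified stand-in, not the authors' pipeline. The feature values and the labels (0 for healthy, 1 for faulty) are hypothetical.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = mean(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the values around their own group mean (must be non-zero).
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

classes = [0, 0, 0, 1, 1, 1]                  # hypothetical: 0 = healthy, 1 = faulty
feature_a = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]    # hypothetical feature that separates the classes
feature_b = [1.0, 5.0, 9.0, 2.0, 5.0, 8.0]    # hypothetical feature with equal class means

scores = {"feature_a": anova_f(feature_a, classes), "feature_b": anova_f(feature_b, classes)}
print(scores)
```

A feature whose class means differ strongly relative to its within-class spread gets a high F value and would be ranked first. Each of the other listed methods applies its own scoring criterion in a similar ranking role.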
Ground Acceleration Clustering Using Self-Organizing Map Method Siska Simamora; Amran Manalu; Paska Marto Hasugian
Journal Of Data Science Vol. 3 No. 02 (2025): Journal Of Data Science, September 2025
Publisher : Sean Institute

DOI: 10.58471/jds.v3i2.7281

Abstract

Peak Ground Acceleration (PGA) is an important parameter in seismic studies because it is directly related to the level of shaking felt on the earth's surface. Analysis of ground acceleration data is needed to identify patterns, group regions based on their seismic characteristics, and support earthquake disaster mitigation efforts. This study uses the Self-Organizing Map (SOM) method, which is an unsupervised learning approach based on artificial neural networks that can map high-dimensional data into a two-dimensional map representation without losing its topological structure. The ground acceleration dataset used in this study consists of key seismic parameters such as depth, magnitude, source distance, and PGA values. The SOM learning process is carried out iteratively to produce a cluster map that groups earthquake data into several groups with different ground acceleration characteristics. The results show that the SOM method is able to identify ground acceleration distribution patterns more clearly than conventional methods, by producing clusters that represent variations in PGA from low to high. These findings can provide important contributions to earthquake risk mapping, regional spatial planning, and the formulation of more accurate disaster mitigation strategies.
Harnessing the Power of Pressurized Separation: Revolutionizing Crude Oil Processing and Storage for Optimal Performance Nnadikwe Johnson; Samuel Hanotu Kwelle; Nwosi Hezekiah Andrew
Journal Of Data Science Vol. 3 No. 02 (2025): Journal Of Data Science, September 2025
Publisher : Sean Institute

DOI: 10.58471/jds.v3i2.7356

Abstract

The main goal of this research was to simulate a high-pressure (HP) separator to assess how changes in operational factors affect the properties of the products generated. The objective was to improve the efficiency of crude oil processing and storage by analyzing these impacts. The study compared simulation outcomes from two software platforms, CHEMCAD and UniSim, to evaluate their effectiveness in modeling and optimizing the separation process. The research outcomes indicated a high level of agreement between the simulated results and actual industrial data, validating their accuracy and reliability. Furthermore, a comprehensive sensitivity analysis was carried out to fine-tune the process parameters, focusing on adjusting key gas stream properties such as temperature, pressure, and flow rate to optimize the separation process effectively. This analysis provided valuable insights into the system dynamics and highlighted areas for potential process enhancement. Notably, increasing the separator inlet pressure from 30 to 80 bar reduced the outlet gas flow rate from 1202 to 871.15 kmol/h and increased the methane mole fraction from 0.69 to 0.74. The rise in pressure also increased the preheater heating duty from 8.71 to 11.48 GJ/h. Conversely, raising the temperature of the separator feed stream from 43 to 83 °C increased the outlet gas stream flow rate from 871.15 to 1142.98 kmol/h. The variation in temperature also decreased the methane concentration in the gas output and consequently lowered the heating duty required by the heat exchanger. Finally, increasing the inlet feed flow rate did not substantially affect the methane concentration in the final product.
AI-Based Scheduling and Performance of Tertiary Hospitals in Onitsha, Anambra State Chinyere Ifeyinwa Ifeanyichukwu
Journal Of Data Science Vol. 3 No. 02 (2025): Journal Of Data Science, September 2025
Publisher : Sean Institute

DOI: 10.58471/jds.v3i2.7389

Abstract

The study employed a descriptive survey research design to examine the impact of AI-based scheduling on hospital performance in Onitsha, focusing on Federal Medical Center Onitsha and Guinness Eye Clinic Onitsha. A stratified random sampling technique was used to select 60 respondents from doctors, nurses, administrative staff, patients, and security personnel. Data were collected using a four-cluster structured questionnaire rated on a four-point Likert scale. The instrument's validity was ensured through expert review, while reliability yielded Cronbach’s alpha values ranging from 0.79 to 0.86. Data collection lasted two weeks, and analysis involved descriptive statistics and ANOVA at a 0.05 significance level. The study found a low average rate of adoption of AI-based scheduling systems in tertiary hospitals in Onitsha (domain mean: 2.05-2.82 on a scale of 5). Staff training had the highest mean (2.82) and full replacement of manual systems the lowest (2.05). For the effect on service delivery, mean values ranged from 2.17 to 2.57, indicating little perceived effect. The top challenges reported by participants included data privacy (3.58), resistance to change (3.20), and technical expertise (3.10). Nevertheless, solutions such as government funding (3.68) and staff training (3.50) had high levels of support, pointing to practical ways to increase AI adoption. In conclusion, although current implementation and perceived efficiency of AI-based scheduling in tertiary hospitals in Onitsha are low, awareness of its benefits is widespread. Addressing the identified barriers through specific policy interventions and institutional support may notably improve hospital performance and efficiency.
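The reliability figure quoted in this abstract is Cronbach's alpha. Below is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The `responses` matrix is hypothetical four-point Likert data, not the study's questionnaire results.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per respondent, one column per questionnaire item."""
    k = len(responses[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance(col) for col in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical answers from five respondents to three items (scale 1 to 4).
responses = [
    [4, 4, 3],
    [3, 3, 3],
    [2, 2, 1],
    [1, 2, 1],
    [4, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 4))
```

Values around 0.7 and above are commonly read as acceptable internal consistency, which is consistent with how the abstract presents its 0.79 to 0.86 range.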
