Contact Name
Amri Muhaimin
Contact Email
amri.muhaimin.stat@upnjatim.ac.id
Phone
+6285132556260
Journal Mail Official
jasid@upnjatim.ac.id
Editorial Address
Gedung Giri Santika, Program Studi Sains Data, Fakultas Ilmu Komputer, Universitas Pembangunan Nasional "Veteran" Jawa Timur, Surabaya
Location
Kota Surabaya,
Jawa Timur
INDONESIA
Jurnal Aplikasi Sains Data
ISSN : -     EISSN : 3108947X     DOI : 10.33005
Jurnal Aplikasi Sains Data (JASID) is a peer-reviewed scientific journal published by the Data Science Study Program at Universitas Pembangunan Nasional "Veteran" East Java (UPN "Veteran" Jatim). Serving as a dedicated platform, JASID facilitates the dissemination of knowledge and practical experiences among researchers, practitioners, and academics in the field of data science applications. The journal covers a broad spectrum of topics within data science, including but not limited to machine learning, data mining, data analysis, data visualization, and natural language processing. It also emphasizes real-world applications of data science across diverse sectors such as business, finance, healthcare, and education. Through a rigorous peer review process, JASID upholds high standards of quality and originality in its published works. The journal aims to be a valuable resource for data science professionals and enthusiasts in Indonesia, fostering interdisciplinary collaboration and enhancing public understanding of the transformative potential of data science applications. Jurnal Aplikasi Sains Data (JASID) is a peer-reviewed scientific journal dedicated to advancing knowledge and innovation in the diverse field of data science applications. The journal welcomes original research articles, comprehensive reviews, and practical case studies that contribute to the understanding and implementation of data science across various domains. 
JASID’s scope encompasses, but is not limited to, the following areas: Machine Learning; Data Mining; Data Analysis; Data Visualization; Natural Language Processing; Cloud Computing; Big Data; Internet of Things (IoT); Artificial Intelligence (AI); Robotics; Ethical Considerations in Data Science; and Application of Data Science in sectors such as Business, Finance, Healthcare, Education, and Government.
Objectives
JASID strives to:
Foster and disseminate cutting-edge research and development in data science applications, both within Indonesia and globally.
Serve as a reputable platform for researchers, practitioners, and academics to exchange knowledge, experiences, and innovative ideas in data science.
Promote interdisciplinary collaboration among academia, industry, and government institutions to accelerate the advancement and practical adoption of data science.
Raise awareness of the transformative potential and societal benefits of data science applications.
Articles 10 Documents
Comparison of ARIMA and SARIMA Methods for Non-Oil and Gas Export Forecasting in East Java Dinda Galuh Guminta
Jurnal Aplikasi Sains Data Vol. 1 No. 1 (2025): Journal of Data Science Applications.
Publisher : Program Studi Sains Data UPN "Veteran" Jawa Timur

DOI: 10.33005/jasid.v1i1.2

Abstract

Forecasting plays a pivotal role in economic planning, particularly in aligning supply with demand and informing production decisions. This study aims to compare the performance of the Autoregressive Integrated Moving Average (ARIMA) and Seasonal ARIMA (SARIMA) models in forecasting the non-oil and gas export values of East Java, a region known for its dynamic trade activity. Using monthly time series data spanning from January 2007 to January 2024, sourced from the Central Statistics Agency (BPS) of East Java Province, this research conducts an in-depth analysis of forecasting accuracy and model suitability. Before model implementation, the dataset underwent several preprocessing steps to ensure its quality, including the handling of missing values and outlier adjustments. Both ARIMA and SARIMA models were developed, calibrated, and evaluated using standard forecasting performance metrics, namely Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). The ARIMA model exhibited consistently lower error rates across all three metrics, indicating its robustness in capturing the underlying patterns within the export data. In contrast, while the SARIMA model incorporated seasonal components, its performance did not surpass that of ARIMA in this specific case. The comparative findings suggest that, despite the seasonal nature of trade, the ARIMA model is more suitable for short-term forecasting of East Java’s non-oil and gas exports. This research contributes to the broader literature on economic forecasting by emphasizing the importance of selecting appropriate models based on data characteristics. Furthermore, the results provide valuable insights for policymakers and stakeholders engaged in export planning and regional trade development. In this study, the ARIMA model outperformed the SARIMA model, with a MAPE of 0.116 versus 0.983.
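For readers unfamiliar with the three error metrics named in this abstract, the following is a minimal sketch of how they are commonly computed. The sample values are made up for illustration and are not the study's data, and whether the authors report MAPE as a fraction or as a percentage is not stated in the abstract, so the fraction form used here is an assumption.

```python
import math

def rmse(actual, forecast):
    """Root Mean Square Error: square root of the mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean Absolute Error: mean of the absolute errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean Absolute Percentage Error, returned as a fraction (assumption).

    Multiply by 100 for a percentage. Undefined if any actual value is zero.
    """
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical values, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

print(rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```

Lower values indicate a better fit for all three metrics, which is the basis on which the abstract ranks ARIMA ahead of SARIMA.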
Implementation of Content-Based Filtering in Tourist Destination Recommendation System in Central Java Adigama Tri Nugraha
Jurnal Aplikasi Sains Data Vol. 1 No. 1 (2025): Journal of Data Science Applications.
Publisher : Program Studi Sains Data UPN "Veteran" Jawa Timur

DOI: 10.33005/jasid.v1i1.3

Abstract

Tourism is a potential sector that plays an important role in the regional economy with significant contributions to regional income and foreign exchange earnings. Central Java, as one of the provinces with great potential in the tourism sector, has a variety of tourist attractions that include natural, artificial, special interest destinations, and more. One effort to optimize the tourism sector in Central Java is to improve tourism information services by creating a recommendation system for tourist attractions in Central Java. This research aims to create a personalized recommendation system for tourist attractions in Central Java based on user preferences using content-based filtering methods and neural network machine learning. This method is used to analyze the features of tourist attractions and user preferences, and to generate relevant recommendations. The model is trained using Adam optimization with a learning rate of 0.01 and 300 epochs. The evaluation results show that this method can provide tourist attraction recommendations in Central Java that tend to match user preferences with relatively low error rates, as indicated by a Mean Squared Error (MSE) value of 0.1766. Thus, this research can contribute to optimizing the tourism sector in Central Java and guide individuals in finding tourist attractions that suit their individual preferences.
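To illustrate the general idea of content-based filtering mentioned in this abstract, here is a minimal sketch that ranks destinations by cosine similarity between a user preference vector and each destination's feature vector. This is a simpler stand-in for the neural network model the authors describe, not their method; the feature layout (nature, culture, artificial) and all destination names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user_profile, items, top_n=2):
    """Return the names of the top_n items most similar to the user profile."""
    ranked = sorted(items, key=lambda name: cosine(user_profile, items[name]), reverse=True)
    return ranked[:top_n]

# Hypothetical feature vectors: [nature, culture, artificial].
items = {
    "nature_spot": [1.0, 0.0, 0.0],
    "culture_spot": [0.0, 1.0, 0.0],
    "artificial_spot": [0.0, 0.0, 1.0],
    "mixed_spot": [0.8, 0.4, 0.0],
}
user_profile = [1.0, 0.2, 0.0]  # hypothetical user who mostly prefers nature

print(recommend(user_profile, items))
```

A learned model, as in the study, replaces the fixed similarity function with one fitted to user ratings, but the input (item features plus user preferences) and the output (a ranked list) have the same shape.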
Comparative Analysis of Stochastic Gradient Descent Optimization and Adaptive Moment Estimation in Emotion Classification from Audio Using Convolutional Neural Network Aldelia Jocelyn Tutuhatunewa
Jurnal Aplikasi Sains Data Vol. 1 No. 1 (2025): Journal of Data Science Applications.
Publisher : Program Studi Sains Data UPN "Veteran" Jawa Timur

DOI: 10.33005/jasid.v1i1.5

Abstract

Emotion is a fundamental aspect of human life that profoundly shapes behavior, social interactions, and decision-making processes. The ability to effectively communicate and foster mutual understanding between individuals relies heavily on accurately recognizing and expressing emotions. Among various channels of emotional expression, sound stands out as a powerful and direct medium that reflects and conveys human emotional states. This makes audio-based emotion recognition a critical and rapidly evolving field of study. With the rapid advancements in information technology and artificial intelligence, research focused on recognizing emotions through sound signals has gained significant momentum. Machine learning algorithms, particularly deep learning models like neural networks, have demonstrated remarkable capabilities in identifying and classifying emotions expressed through multiple modalities such as text, images, videos, and especially audio signals. Within the family of neural networks, Convolutional Neural Networks (CNNs) have been especially effective for audio emotion classification, due to their strength in extracting hierarchical and spatial features directly from raw input data. This study specifically investigates the comparative effectiveness of two popular optimization algorithms—Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam)—in training CNN models for emotion classification from audio recordings. Utilizing the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, experimental results indicate that CNNs trained with the SGD optimizer achieve an overall accuracy of 53%, surpassing the 48% accuracy achieved by Adam. These results underscore the potential advantages of SGD in fine-tuning deep learning models for audio-based emotion recognition. 
Consequently, researchers and practitioners are encouraged to consider SGD optimization to improve the performance and robustness of emotion classification systems based on audio data.
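The difference between the two optimizers compared in this abstract can be seen in their update rules. The sketch below applies the standard SGD and Adam updates to a one-dimensional toy objective; the objective, learning rate, and step count are illustrative assumptions and have no connection to the CNN or the RAVDESS experiments.

```python
import math

def sgd_step(w, grad, lr=0.1):
    """Plain SGD: move against the gradient, scaled by the learning rate."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

def grad_f(w):
    """Gradient of the toy objective f(w) = (w - 3)^2 (hypothetical)."""
    return 2 * (w - 3)

w_sgd = 0.0
w_adam, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):
    w_sgd = sgd_step(w_sgd, grad_f(w_sgd))
    w_adam, m, v = adam_step(w_adam, grad_f(w_adam), m, v, t)

print(w_sgd, w_adam)
```

SGD scales each step by the raw gradient, while Adam normalizes the step by a running estimate of gradient magnitude, so its first step has a size close to the learning rate regardless of how large the gradient is. Which behaves better depends on the model and data, as the accuracy gap reported above suggests.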
Application of Fuzzy Inference System for Quality Assessment of Formula Milk for Pregnant Women in Stunting Program Wa Fijriyani R Ganisi
Jurnal Aplikasi Sains Data Vol. 1 No. 1 (2025): Journal of Data Science Applications.
Publisher : Program Studi Sains Data UPN "Veteran" Jawa Timur

DOI: 10.33005/jasid.v1i1.8

Abstract

Stunting remains a significant global public health challenge, affecting more than 149 million children under five years of age worldwide as reported by the United Nations in 2020. Indonesia alone accounts for approximately 6.3 million stunted children, highlighting the urgent need for effective intervention strategies. Stunting is primarily caused by chronic malnutrition during the first 1,000 days of life, which includes inadequate nutritional intake during pregnancy, poor infant feeding practices, and environmental factors such as inadequate sanitation. The 2022 Indonesian Nutrition Status Survey (SSGI) indicated a stunting prevalence of 21.6%, showing improvement from 24.4% in 2021, yet still significantly above the national target of 14% set for 2024. Given the critical role of maternal nutrition in reducing stunting risk, providing pregnant women with appropriate nutritional guidance is essential. This study aims to develop a decision support model using a Fuzzy Inference System (FIS) to assist pregnant women in selecting the most suitable formula milk based on nutritional value and affordability. The Mamdani FIS method was applied to analyze data from eight commercially available formula milk products. The evaluation measured the membership degrees corresponding to recommendation levels, factoring in both price and nutrition. The results identified Anmum Materna as the most favorable option, with a membership degree of 0.937, classified under the "Highly Recommended" category. This formula is priced at IDR 70,000 and contains a total nutritional value of 1024 grams, offering a balance of quality and affordability. This model demonstrates potential as a practical tool to support informed nutritional choices during pregnancy, contributing to stunting prevention efforts.
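As a rough illustration of how a Mamdani fuzzy inference system combines price and nutrition into a recommendation score, the sketch below uses triangular membership functions, min for AND, max for OR and for aggregation, and a discretized centroid for defuzzification. The membership ranges, the two rules, and the normalization of inputs to the 0 to 1 range are all assumptions made for this example; the abstract does not specify the authors' actual rule base.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def recommend(price, nutrition):
    """Mamdani-style score in [0, 1] from inputs normalized to [0, 1] (assumption)."""
    # Fuzzification with two hypothetical sets per input.
    p_low, p_high = tri(price, -1.0, 0.0, 1.0), tri(price, 0.0, 1.0, 2.0)
    n_low, n_high = tri(nutrition, -1.0, 0.0, 1.0), tri(nutrition, 0.0, 1.0, 2.0)

    # Hypothetical rule base:
    #   IF price is low AND nutrition is high THEN recommendation is high
    #   IF price is high OR nutrition is low THEN recommendation is low
    r_high = min(p_low, n_high)
    r_low = max(p_high, n_low)

    # Clip each output set by its rule strength, aggregate with max,
    # then defuzzify with a discretized centroid.
    num = den = 0.0
    for i in range(101):
        y = i / 100
        mu = max(min(r_low, tri(y, -1.0, 0.0, 1.0)), min(r_high, tri(y, 0.0, 1.0, 2.0)))
        num += y * mu
        den += mu
    return num / den if den else 0.5

print(recommend(0.0, 1.0))  # cheapest and most nutritious (normalized)
print(recommend(1.0, 0.0))  # most expensive and least nutritious
```

In this sketch a cheap, nutritious product scores higher than an expensive, less nutritious one. The study's reported membership degree of 0.937 comes from its own rule base and data, which this example does not attempt to reproduce.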
Application of K-Prototypes Clustermix Algorithm for Clustering Risk Factors of Diabetes Disease Martina Hildha Arda
Jurnal Aplikasi Sains Data Vol. 1 No. 1 (2025): Journal of Data Science Applications.
Publisher : Program Studi Sains Data UPN "Veteran" Jawa Timur

DOI: 10.33005/jasid.v1i1.9

Abstract

Diabetes mellitus (DM) is recognized as one of the most rapidly increasing chronic diseases worldwide, posing a significant public health challenge. According to the International Diabetes Federation (IDF), approximately 537 million people were living with diabetes mellitus globally, with projections estimating a rise to 643 million by 2030 and 783 million by 2045. Additionally, the World Health Organization (WHO) reported a 3% increase in mortality rates attributed to diabetes mellitus between 2000 and 2019, underscoring the urgent need for effective risk detection and management strategies. Early identification of risk factors is crucial to mitigating the impact of DM, and clustering analysis offers a promising method for stratifying patients based on risk profiles. This study employs the k-prototypes algorithm, which is particularly suited to clustering datasets with mixed numeric and categorical variables, to analyze DM risk factors. Utilizing data from the 2022 Behavioral Risk Factor Surveillance System (BRFSS) annual survey, the study examines a sample of 2,480 diabetes mellitus patients across the United States. The clustering analysis identified two optimal clusters (k=2) based on a high silhouette score of 0.821, indicating strong cluster cohesion and separation. Cluster 2, consisting of 77 patients, exhibited a higher risk profile for diabetes compared to Cluster 1, which included 2,403 patients. The clusters were characterized by significant differences in average values of key DM risk factors including weight, fruit and vegetable consumption, mental and physical health status, age, alcohol consumption, hypertension, smoking status, physical activity, mobility difficulties, sex, education level, income, and ethnicity. These findings highlight the utility of k-prototypes clustering in identifying high-risk DM subgroups to inform targeted prevention and intervention efforts.
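The reason k-prototypes suits mixed data is its dissimilarity measure, which adds a squared Euclidean term for numeric attributes to a weighted mismatch count for categorical ones. The sketch below shows that measure and the resulting cluster assignment; the attribute values, prototypes, and the weight `gamma` are hypothetical and are not taken from the BRFSS data.

```python
def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """k-prototypes dissimilarity: squared Euclidean part plus gamma times mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric + gamma * categorical

def assign(x_num, x_cat, prototypes, gamma=1.0):
    """Return the index of the prototype with the smallest dissimilarity."""
    costs = [mixed_dissimilarity(x_num, x_cat, pn, pc, gamma) for pn, pc in prototypes]
    return costs.index(min(costs))

# Hypothetical prototypes: (scaled numeric attributes, categorical attributes).
prototypes = [
    ([0.0, 0.0], ["no", "female"]),
    ([1.0, 1.0], ["yes", "male"]),
]
patient_num = [0.2, 0.1]      # e.g. scaled weight and age (illustrative)
patient_cat = ["no", "male"]  # e.g. hypertension and sex (illustrative)

print(assign(patient_num, patient_cat, prototypes))
```

The weight `gamma` controls how much a categorical mismatch counts relative to numeric distance, so numeric attributes are usually scaled before clustering to keep the two terms comparable.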
Application of Convolutional Neural Network (CNN) for Web-Based Translation of Indonesian Text into Sign Language Prameswari, Diajeng; Larasati; Muhammad Naswan Izzudin Akmal; Prismahardi Aji Riyantoko; Dwi Arman Prasetya
Jurnal Aplikasi Sains Data Vol. 1 No. 2 (2025): Journal of Data Science Applications.
Publisher : Program Studi Sains Data UPN "Veteran" Jawa Timur

DOI: 10.33005/jasid.v1i2.12

Abstract

Communication for the deaf and hard of hearing is often hindered by the limited number of sign language interpreters. This research aims to develop a web-based text-to-text sign language translation system using Convolutional Neural Networks (CNN) to bridge this communication gap. The system is built with the ASL Alphabet dataset containing 87,000 images from 29 classes (A-Z, SPACE, DELETE, NOTHING). The CNN model was designed with three convolutional layers and trained for 15 epochs using 80% of the data, while 20% of the data was used for testing. The user interface was developed using Streamlit for ease of use. Training results showed a training accuracy of 98.96% and a validation accuracy of 98.61% at the 15th epoch. Model evaluation yielded an overall accuracy of 98%, with high precision, recall, and F1-score values for most classes. This research demonstrates the significant potential of CNN in developing automatic sign language translators, which is expected to improve information accessibility and inclusivity for the deaf community.
Statistical Analysis of Infant Malnutrition Cases in North Sumatra Before and After COVID-19 Using the Wilcoxon Test Sitanggang, Desi Daomara; Putri, Serlinda Mareta; Agustin, Sesillia; Prasetya, Dwi Arman; Fahrudin, Tresna Maulana
Jurnal Aplikasi Sains Data Vol. 1 No. 2 (2025): Journal of Data Science Applications.
Publisher : Program Studi Sains Data UPN "Veteran" Jawa Timur

DOI: 10.33005/jasid.v1i2.16

Abstract

Child malnutrition remains a very important public health issue in Indonesia. Malnutrition is a condition of deficiency in energy and essential nutrients that can lead to impaired physical growth, mental development, and an increased risk of mortality in children. The prevalence of malnutrition among toddlers in Indonesia is still quite high and shows disparities between regions, especially in provinces with high poverty rates. One province of concern is North Sumatra, which, according to data from the Ministry of Health, has had a significant incidence of malnutrition in the last five years. This condition was exacerbated by the emergence of the COVID-19 pandemic at the end of 2019, which has had a major impact on various sectors of life, including family health and economy. The pandemic caused significant disruptions to primary healthcare systems, including a decrease in posyandu activities, immunizations, and monitoring of children's nutritional status. The decline in household income during the pandemic made it difficult for families to meet their balanced nutritional food needs. A UNICEF study showed an increased risk of acute malnutrition in children during the pandemic, especially in previously vulnerable areas. To measure the impact of the COVID-19 pandemic on the incidence of child malnutrition, a statistical approach that can compare data before and after the pandemic is needed. This study aims to analyze the difference in the incidence of child malnutrition before and after the COVID-19 pandemic in North Sumatra Province using the Wilcoxon test method. Using the Wilcoxon Signed-rank Test statistical method, a comparative analysis was performed between the medians of the data from 2018 and 2023. The results of the study showed that there was a difference between the medians of the two data sets.
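The Wilcoxon signed-rank statistic used in this study can be computed by hand for paired observations. The sketch below ranks the absolute paired differences (dropping zero differences and averaging tied ranks), then sums the ranks of positive and negative differences. The sample numbers are invented for illustration and are not the North Sumatra data; a p-value would additionally require the reference distribution, which is omitted here.

```python
def wilcoxon_statistic(before, after):
    """Return (W_plus, W_minus, W) for paired samples, where W = min(W_plus, W_minus)."""
    # Paired differences, discarding pairs with no change.
    diffs = [a - b for b, a in zip(before, after) if a != b]

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)

# Hypothetical case counts for five districts in two years.
before = [10, 12, 9, 15, 11]
after = [12, 11, 13, 15, 14]

print(wilcoxon_statistic(before, after))
```

A small value of W relative to the number of non-zero pairs indicates that the differences lean consistently in one direction, which is the evidence the test uses to decide whether the two medians differ.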
Application of K-Means Clustering for Regency/City Clustering in East Java Based on 2024 Human Development Index Indicators Emilia, Kholidatus; Rahayu, Ayu Sri; Yuliani, Devina Putri; Prasetya, Dwi Arman; Riyantoko, Prismahardi Aji
Jurnal Aplikasi Sains Data Vol. 1 No. 2 (2025): Journal of Data Science Applications.
Publisher : Program Studi Sains Data UPN "Veteran" Jawa Timur

DOI: 10.33005/jasid.v1i2.21

Abstract

This study applies the K-Means clustering algorithm to group 38 regencies and cities in East Java Province based on five Human Development Index (HDI) indicators for the year 2024. These indicators include Life Expectancy (UHH), Expected Years of Schooling (HLS), Mean Years of Schooling (RLS), and Real Expenditure Per Capita (PPK). The aim of this research is to uncover hidden patterns and disparities in regional development, which can be used as a basis for more targeted and data-driven policy interventions. The optimal number of clusters was determined using three evaluation metrics: the Elbow Method, Silhouette Score, and Davies-Bouldin Index. These evaluations collectively identified three distinct clusters. Cluster 0 represents regions with high levels of development across all indicators. Cluster 1 consists of regions with moderate development levels and potential for improvement, while Cluster 2 contains regions with significantly lower values, particularly in education and income metrics. In addition to clustering, a correlation analysis was conducted to examine the relationship between HDI and its supporting indicators. The results show that Mean Years of Schooling (RLS) and Real Expenditure Per Capita (PPK) have the strongest positive correlation with HDI across all clusters. This highlights the key role of education and economic well-being in improving human development. The findings emphasize the importance of clustering analysis in shaping equitable and region-specific development strategies.
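The K-Means procedure referred to in this abstract alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its members. The following is a minimal sketch of that loop. The two-dimensional points are invented, and initializing the centroids from the first k points is a simplifying assumption made here for determinism; the abstract does not describe the authors' initialization or feature scaling.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive deterministic init (assumption)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are final
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical two-indicator data for six regions.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.6), (9.0, 11.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Metrics such as the Silhouette Score and Davies-Bouldin Index are then computed on the resulting labels for several values of k in order to choose the number of clusters.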
Feature Importance-Guided Ensemble Classification for Predicting Recurrence in Differentiated Thyroid Cancer Muhammad Ghinan Navsih; Wahyu Putra Pratama; Hikmata Tartila; Dwi Arman Prasetya; Tresna Maulana Fahrudin
Jurnal Aplikasi Sains Data Vol. 1 No. 2 (2025): Journal of Data Science Applications.
Publisher : Program Studi Sains Data UPN "Veteran" Jawa Timur

DOI: 10.33005/jasid.v1i2.22

Abstract

Accurate prediction of cancer recurrence is critical for improving patient monitoring and personalized treatment planning. In this study, we propose a machine learning framework to predict recurrence in patients with differentiated thyroid cancer using statistically selected clinical features. Feature relevance was assessed using ANOVA for ordinal/numerical variables and the Chi-square test for one-hot encoded categorical variables, allowing us to identify the most informative predictors. We then trained three distinct classifiers—Random Forest, Logistic Regression, and XGBoost—and combined them using a hard voting ensemble strategy. The proposed ensemble achieved an accuracy of 98.7% on the test set, with particularly strong precision and recall scores for the recurrent class, indicating its potential clinical utility. Interestingly, all three base classifiers produced identical predictions on the test data, suggesting the dataset’s strong internal structure and the effectiveness of our feature selection process. This work highlights the value of integrating statistical feature selection with ensemble modeling for robust and interpretable prediction in clinical oncology applications.
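Hard voting, the ensemble strategy named in this abstract, takes the majority class among the base classifiers for each sample. The sketch below shows that rule on invented prediction lists; the variable names and labels are hypothetical and the lists are not the study's model outputs.

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Majority vote per sample across a list of per-model prediction lists."""
    n_samples = len(predictions_per_model[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        voted.append(votes.most_common(1)[0][0])
    return voted

# Hypothetical predictions (1 = recurrence, 0 = no recurrence).
rf_preds = [1, 0, 1, 0]
lr_preds = [1, 0, 0, 0]
xgb_preds = [1, 1, 1, 0]

print(hard_vote([rf_preds, lr_preds, xgb_preds]))
```

With three voters and two classes a tie cannot occur. When all base classifiers agree on every sample, as the abstract reports for the test data, the ensemble output is identical to each base model's output, so the vote adds no accuracy beyond what any single model already achieves on that set.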
Application of XGBoost for Risk Level Classification of Fires in Surabaya City in 2024 and Interactive Spatial Visualization Based on Streamlit Sarah, Sarah Aprilia Hasibuan; Divia, Divia Prisillia Prisca; Dila, Annita Fadhilah Aprilia; Arman, Dwi Arman Prasetya; Prisma, Prismahardi Aji Riyantoko
Jurnal Aplikasi Sains Data Vol. 1 No. 2 (2025): Journal of Data Science Applications.
Publisher : Program Studi Sains Data UPN "Veteran" Jawa Timur

DOI: 10.33005/jasid.v1i2.24

Abstract

Fire in urban areas such as Surabaya City is a non-natural disaster that can have a significant impact on public safety, economic stability, and the environment. This study aims to develop a fire risk level classification model using the Extreme Gradient Boosting (XGBoost) algorithm based on selected predictor variables, namely response time, fire subtype, and number of victims affected. The dataset consists of 859 fire events throughout 2024, enriched with spatial and demographic attributes. The research methodology involved data preprocessing (including label encoding and normalization), class imbalance handling with the Synthetic Minority Over-sampling Technique (SMOTE), model training with XGBoost, and evaluation using metrics such as accuracy, precision, recall, and F1-score. The classification model achieved excellent performance, with an overall accuracy of 100% and perfect precision, recall, and F1-score of 1.00 across all risk categories (low, medium, and high). Confusion matrix and ROC curve analysis confirmed the high predictive ability of this model. In addition, the results were visualized using a Streamlit-based interactive dashboard to enhance the usability of the model for decision-making. These findings highlight the potential of XGBoost as a powerful tool for fire risk classification and emphasize its relevance in supporting early warning systems and evidence-based disaster mitigation policies in urban environments.
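The core step of SMOTE, which this study uses to handle class imbalance, is to create a synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows that single step only; the data points, the neighbour count `k`, and the fixed random seed are illustrative assumptions, and a full pipeline would normally rely on an established implementation rather than this function.

```python
import random

def smote_sample(minority, k=2, rng=None):
    """Generate one synthetic point between a minority sample and a near neighbour."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility (assumption)
    base = rng.choice(minority)
    # The k nearest other minority points by squared Euclidean distance.
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment from base to neighbour
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]

# Hypothetical minority-class feature vectors (needs at least two points).
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
synthetic = smote_sample(minority)
print(synthetic)
```

Because synthetic points lie on segments between existing minority samples, oversampling is usually applied to the training split only. Applying it before the train/test split can place near-copies of training points in the test set and inflate the reported metrics.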
