Contact Name
Husni Teja Sukmana
Contact Email
husni@bright-journal.org
Phone
+62895422720524
Journal Mail Official
jads@bright-journal.org
Editorial Address
Gedung FST UIN Jakarta, Jl. Lkr. Kampus UIN, Cemp. Putih, Kec. Ciputat Tim., Kota Tangerang Selatan, Banten 15412
Location
Kota adm. jakarta pusat,
Dki jakarta
INDONESIA
Journal of Applied Data Sciences
Published by Bright Publisher
ISSN : -     EISSN : 27236471     DOI : doi.org/10.47738/jads
One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes applied to collect, treat and analyze data will help to render scientific research results reproducible and thus more accountable. The datasets themselves should also be accessible to other researchers, so that research publications, dataset descriptions, and the actual datasets can be linked. The journal Data provides a forum to publish methodical papers on processes applied to data collection, treatment and analysis, as well as for data descriptors publishing descriptions of a linked dataset.
Articles 55 Documents
Search results for issue "Vol 6, No 4: December 2025": 55 Documents
A Data-Driven Training Kit to Enhance the Note Recording Skills of Music Learners
Chantanasut, Thaworada
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.1063

Abstract

Music note recording is a fundamental skill in music education, yet many undergraduate learners struggle due to the lack of structured, self-guided practice resources. This study aimed to develop and evaluate a data-driven training kit designed to enhance the note recording skills of music learners. A total of 20 first-year students from a Bachelor of Education program in Western Music and Vocal Education were selected through purposive sampling. Research instruments included the training kit, expert evaluation forms, pre- and post-tests, student satisfaction surveys, and behavior observation checklists. Quantitative data were analyzed using mean, standard deviation, and paired sample t-tests. Results showed a statistically significant improvement in students' performance after the intervention (t = 18.789, p < .001), with a large effect size (Cohen’s d = 2.53). Experts rated the kit highly (M = 4.86, SD = 0.29), and students reported very high satisfaction (M = 4.94, SD = 0.14). These findings support the kit’s effectiveness as an engaging and pedagogically sound tool for developing music note recording skills in higher education settings.
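The paired sample t-test named in the abstract can be sketched as follows. This is a minimal illustration on hypothetical pre- and post-test scores, not the study's data. The effect size computed here is d_z (mean difference divided by the standard deviation of the differences), which is only one common variant of Cohen's d; the abstract does not say which variant was used, so the sketch need not reproduce the reported 2.53.

```python
import math
from statistics import mean, stdev


def paired_t_and_dz(pre, post):
    """Return (t, d_z) for paired samples.

    t   = mean(diff) / (sd(diff) / sqrt(n))
    d_z = mean(diff) / sd(diff)
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    m = mean(diffs)
    s = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t = m / (s / math.sqrt(n))
    return t, m / s


# Hypothetical scores for six learners (assumption, for illustration only).
pre = [10, 12, 9, 11, 13, 8]
post = [15, 16, 15, 15, 19, 12]

t_stat, d_z = paired_t_and_dz(pre, post)
print(f"t = {t_stat:.3f}, d_z = {d_z:.3f}, df = {len(pre) - 1}")
```

For paired data the two quantities are tied together by t = d_z × sqrt(n), so a large t on a small sample always implies a large d_z.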
A Hybrid LSTM–Stacking–SMOTE Model for Weather-Aware Palm Oil Price Prediction Addressing Data Imbalance and Forecast Accuracy
Kusmanto, Kusmanto; Subagio, S; Manja, Erni
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.922

Abstract

Accurate forecasting of palm oil prices is crucial for agribusiness decision-making due to high market volatility influenced by dynamic weather conditions. This study proposes a novel hybrid deep learning model combining Long Short-Term Memory (LSTM), Stacking Ensemble, and Synthetic Minority Over-sampling Technique (SMOTE) to improve predictive accuracy and handle class imbalance in price trend classification. The model was trained using a multivariate time-series dataset sourced from Kaggle, consisting of daily records of temperature, humidity, rainfall, and palm oil prices. A binary classification scheme was applied by labeling instances as either price increase (class 1) or price stable/decrease (class 0), based on a 0% price change threshold. Four experimental configurations were evaluated: standard LSTM, LSTM + SMOTE, LSTM + Stacking, and the proposed LSTM + SMOTE + Stacking. The proposed model outperformed all baselines, achieving the highest accuracy of 83.12%, an F1-score of 0.8466, MAE of 0.1688, RMSE of 0.4109, and a perfect recall of 1.0000, indicating excellent sensitivity to minority class trends. In contrast, the standard LSTM achieved only 77.32% accuracy and an F1-score of 0.7224, showing limited ability in handling imbalanced data. Visualization of loss curves and confusion matrices confirmed the model’s learning stability and classification effectiveness. This study contributes a novel integration of ensemble learning and oversampling in time-series commodity forecasting and demonstrates the effectiveness of this approach in capturing weather-driven price patterns, offering a robust framework for predictive analytics in agriculture.
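The labeling rule and the reported error metrics can be illustrated with a short sketch. The price series and the function names are hypothetical. The sketch also assumes that MAE and RMSE were computed on hard 0/1 predictions, which the abstract does not state; under that assumption MAE equals the error rate and RMSE is its square root, and the reported figures are consistent with this reading.

```python
import math


def label_trend(prices, threshold=0.0):
    """Label each day 1 if the next-day percentage change exceeds the threshold, else 0."""
    return [1 if (b - a) / a > threshold else 0 for a, b in zip(prices, prices[1:])]


def hard_label_errors(y_true, y_pred):
    """MAE and RMSE for 0/1 labels; each error is 0 or 1, so squaring changes nothing."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return mae, rmse


# Hypothetical daily prices (assumption, for illustration only).
prices = [100.0, 101.5, 101.5, 99.0, 102.0]
labels = label_trend(prices)

# Consistency check against the figures reported in the abstract.
reported_accuracy = 0.8312
implied_mae = 1 - reported_accuracy
implied_rmse = math.sqrt(implied_mae)
print(labels, round(implied_mae, 4), round(implied_rmse, 4))
```

Note that a flat day (no price change) falls into class 0 under a 0% threshold, matching the "price stable/decrease" class described in the abstract.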
Optimizing The XGBoost Model with Grid Search Hyperparameter Tuning for Maximum Temperature Forecasting
Sugiarto, Sugiarto; Mas Diyasa, I Gede Susrama; Alhamda, Denisa Septalian; Aryananda, Rangga Laksana; Fatmah Sari, Allan Ruhui; Sukri, Hanifudin; Dewi, Deshinta Arrowa
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.885

Abstract

This study presents a novel comparative approach to maximum temperature forecasting in Surabaya, Indonesia, by integrating Extreme Gradient Boosting (XGBoost) with Grid Search Hyperparameter Tuning and benchmarking it against Autoregressive Integrated Moving Average (ARIMA) and Neural Prophet models. The main idea is to evaluate the capability of XGBoost in capturing nonlinear patterns in environmental time series data, which traditional models often fail to address. Using 15,388 historical daily maximum temperature records from the BMKG Juanda weather station spanning 1981–2022, the objective is to identify the most accurate predictive model for short- and medium-term forecasts. The modeling process involved four stages: data acquisition, preprocessing, training, and evaluation, with performance assessed using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The findings show that, after hyperparameter tuning, XGBoost achieved the best performance with MAE = 0.32 and RMSE = 0.65, outperforming ARIMA (MAE = 0.85, RMSE = 1.20) and Neural Prophet (MAE = 0.70, RMSE = 0.98). Prediction results for 2025 indicate peak maximum temperatures in January, October, and November, aligning with recent climate patterns. The contribution of this research lies in demonstrating the superiority of a tuned XGBoost model for complex environmental datasets, offering a practical tool for urban climate planning, agricultural scheduling, and heatwave risk mitigation. The novelty of this work is the systematic integration of Grid Search-based optimization with XGBoost for meteorological forecasting in a tropical urban context, producing higher accuracy than both classical statistical and modern hybrid time series methods. These results highlight the model’s adaptability and potential for broader climate-related applications, with future research recommended to incorporate additional meteorological variables such as humidity and wind speed for even greater predictive capability.
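The grid search procedure itself is independent of the model being tuned. The sketch below shows the exhaustive search loop with MAE and RMSE scoring on a chronological holdout. To stay self-contained it tunes a deliberately simple stand-in forecaster (a blend of the last value and a moving average) rather than XGBoost, and it uses a synthetic series; the parameter names `alpha` and `window` are hypothetical and are not the study's hyperparameters.

```python
import itertools
import math


def forecast(history, alpha, window):
    """Stand-in one-step forecaster: blend of last value and recent moving average."""
    recent = history[-window:]
    return alpha * history[-1] + (1 - alpha) * sum(recent) / len(recent)


def evaluate(series, split, alpha, window):
    """Walk forward over the holdout, forecasting each point from its own past only."""
    errors = [
        series[t] - forecast(series[:t], alpha, window)
        for t in range(split, len(series))
    ]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Synthetic daily maximum temperatures (assumption, not BMKG data).
series = [
    32 + 2 * math.sin(2 * math.pi * d / 365) + 0.5 * math.sin(d) for d in range(400)
]
split = 300  # chronological split: no shuffling for time series

grid = {"alpha": [0.2, 0.5, 0.8], "window": [3, 7, 14]}
results = []
for alpha, window in itertools.product(grid["alpha"], grid["window"]):
    mae, rmse = evaluate(series, split, alpha, window)
    results.append({"alpha": alpha, "window": window, "mae": mae, "rmse": rmse})

best = min(results, key=lambda r: r["rmse"])
print(best)
```

The cost of grid search grows with the product of the candidate counts per parameter (here 3 × 3 = 9 fits), which is why grids for boosted-tree models are usually kept coarse.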
Spatio-Temporal Variation in Bus Service Satisfaction and Policy Implications: A Case-Based Study in an Emerging Urban Area
Nga, Nguyen Thi Ngoc; Minh, Hai Nguyen
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.879

Abstract

This study investigates the spatio-temporal variation in bus service satisfaction in a rapidly urbanizing city in Southeast Asia. The research addresses a critical gap in urban transit studies, where passenger satisfaction is often treated as a static construct, overlooking how satisfaction may fluctuate across different geographic areas and time periods. By applying a spatio-temporal analytical framework, this study aims to provide a more dynamic and localized understanding of transit service perceptions. The research builds upon existing approaches by integrating a structured survey based on the Customer Satisfaction Survey (CSS) method with Best–Worst Scaling (BWS) to prioritize service attributes. A stratified sampling technique was employed across multiple wards in the study area, with 612 valid responses collected during both peak and off-peak hours. The survey captured data on passenger experiences and preferences, disaggregated by time of day, location, and demographic characteristics. Multinomial Logit (MNL) modeling was used to estimate the relative importance of key service dimensions, such as punctuality, comfort, frequency, and accessibility. The analysis revealed significant spatial and temporal heterogeneity in satisfaction levels. For instance, passengers in peripheral wards rated reliability and onboard conditions more negatively compared to those in central areas. Similarly, satisfaction levels were lower during evening hours, particularly concerning bus overcrowding and wait times. The findings suggest that transit policy must adopt a more flexible and localized strategy, rather than uniform service standards, to address distinct user expectations. Targeted improvements in underperforming routes and time slots could enhance overall user experience and promote public transport usage. 
This study contributes new insights to the evaluation of urban bus services in emerging cities and underscores the value of incorporating spatio-temporal dynamics into transit planning and customer satisfaction research.
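As a rough illustration of the two techniques named in the abstract, the sketch below computes simple Best–Worst Scaling count scores and the Multinomial Logit choice probabilities for a set of utilities. The choice tasks are hypothetical, and feeding the count scores into the logit formula is an illustrative shortcut; an actual MNL analysis would estimate utilities by maximum likelihood.

```python
import math
from collections import Counter


def best_worst_scores(tasks):
    """Count-based BWS score per attribute: (times best - times worst) / times shown.

    tasks: iterable of (shown_attributes, best, worst).
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for attrs, b, w in tasks:
        shown.update(attrs)
        best[b] += 1
        worst[w] += 1
    return {a: (best[a] - worst[a]) / shown[a] for a in shown}


def mnl_probabilities(utilities):
    """MNL choice probabilities: P(i) = exp(u_i) / sum_j exp(u_j)."""
    m = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {k: math.exp(v - m) for k, v in utilities.items()}
    total = sum(exp_u.values())
    return {k: v / total for k, v in exp_u.items()}


# Hypothetical responses (assumption, for illustration only).
tasks = [
    (("punctuality", "comfort", "frequency"), "punctuality", "comfort"),
    (("punctuality", "frequency", "accessibility"), "punctuality", "accessibility"),
    (("comfort", "frequency", "accessibility"), "frequency", "comfort"),
]

scores = best_worst_scores(tasks)
probs = mnl_probabilities(scores)
print(scores)
print(probs)
```

Computing the scores separately per ward and per time slot is one way such an analysis can expose the spatial and temporal differences the abstract describes.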
An Explainable Credit Card Fraud Detection Model using Machine Learning and Deep Learning Approaches
Alkhozae, Mona; Almasre, Miada; Almakky, Abeer; Alhebshi, Reemah M.; Alamri, Amani; Hakami, Widad; Alshahrani, Lamia
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.962

Abstract

This study proposes an adaptive, interpretable real-time fraud detection and prevention system designed for high-risk financial environments, capable of processing over 1.6 million imbalanced credit card transactions with low latency. The objective is to build a unified framework that integrates predictive accuracy, explainability, and adaptability. The methodology follows four phases: exploratory data analysis to reveal structural and behavioral fraud patterns, feature engineering with domain-informed attributes and ADASYN oversampling to mitigate the 1:174 imbalance, training of multiple models (XGBoost, LightGBM, Random Forest, Gradient Boosting, and MLP), and an ensemble architecture evaluated with SHAP-based explainability. The system introduces three key contributions: stability-aware SHAP caching that reduces explanation latency to 41.2 ms, reinforcement learning–based threshold tuning that dynamically adapts to evolving fraud patterns, and out-of-distribution detection to enhance resilience against data drift. Results demonstrate strong performance, with XGBoost achieving 99.86% accuracy, 96.36% precision, 80.59% recall, F1-score of 0.878, and ROC-AUC of 0.9988, outperforming other models. At the system level, the full pipeline attained 93.2% accuracy, a 90.2% F1-score, and 96.1% AUC, successfully blocking 91% of fraudulent transactions while maintaining a false positive rate of 7.8%. Novelty lies in combining explainability and adaptivity in a production-ready architecture, where reinforcement learning enables continuous threshold self-regulation and SHAP stability analysis validates interpretability across models. These findings show that high fraud detection accuracy and transparency are not mutually exclusive, offering a scalable blueprint for financial institutions and other critical domains requiring real-time, explainable, and adaptive decision-making.
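Two of the figures in the abstract can be checked with simple arithmetic. The sketch below recomputes the F1-score from the reported XGBoost precision and recall, and shows why accuracy alone says little under heavy class imbalance. It reads the 1:174 ratio as one fraudulent transaction per 174 legitimate ones, which is an assumption about the abstract's notation.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported XGBoost precision and recall from the abstract.
xgb_f1 = f1_score(0.9636, 0.8059)

# A trivial model that labels every transaction "legitimate" (assumed 1:174 ratio).
fraud, legit = 1, 174
majority_accuracy = legit / (fraud + legit)

print(round(xgb_f1, 3))             # 0.878
print(round(majority_accuracy, 4))  # 0.9943
```

Because a do-nothing classifier already reaches about 99.4% accuracy on such data, recall, precision, and F1 on the fraud class are the more informative numbers.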
Comparative Analysis of Novel Deep Reinforcement Learning Methods for Food Distribution Optimization
Hutahaean, Jeperson; Siagian, Yessica; Saputra, Endra
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.956

Abstract

Uneven food distribution across various regions in Indonesia often results in supply-demand imbalances, leading to price surges, stock shortages, and overall market instability. This challenge is compounded by the limitations of conventional distribution systems, which are ill-equipped to respond to rapidly changing market dynamics. In response, this study introduces a novel, AI-driven approach by implementing Deep Reinforcement Learning (DRL) to optimize food distribution policies using real-world data. Specifically, we perform a comparative evaluation of four emerging DRL models—Double Deep Q-Network (Double DQN), Dueling DQN, Proximal Policy Optimization (PPO), and Advantage Actor-Critic (A2C)—to determine their effectiveness in learning adaptive distribution strategies from national food logistics data provided by Indonesia’s Central Bureau of Statistics (BPS). Each model was trained within a custom simulation environment based on the Markov Decision Process (MDP) framework and evaluated using five core performance metrics: cumulative reward, average reward, success rate, sample efficiency, and best reward. The results reveal that A2C consistently outperformed the other models, delivering the highest average reward and most stable training performance, while PPO demonstrated strong efficiency and success rate. These findings underscore the potential of policy-gradient methods—particularly A2C—as robust and intelligent solutions for dynamic food logistics management. This research offers one of the first comparative benchmarks of DRL methods in the food distribution domain and highlights their applicability for future integration into national AI-powered logistics systems.
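For readers unfamiliar with the value-based methods compared here, the sketch below contrasts the bootstrap target of a standard DQN with that of Double DQN for a single transition. The Q-values, reward, and discount factor are hypothetical numbers chosen for illustration; the study's environment and networks are not reproduced.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN: the target network both selects and evaluates the next action."""
    return reward if done else reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for three actions in the next state (assumption).
next_q_online = [1.0, 2.5, 2.0]
next_q_target = [1.2, 1.8, 3.0]

plain = dqn_target(1.0, next_q_target, gamma=0.9)
double = double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.9)
print(plain, double)
```

Decoupling action selection from evaluation is what lets Double DQN reduce the overestimation that arises from taking a maximum over noisy value estimates; here the plain target is larger because it picks the target network's own highest value.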
An Artificial Neural Network-Based Geo-Spatial Model for Real-Time Flood Risk Prediction Using Multi-Source High-Resolution Data
Aziz, RZ Abdul; Nurpambudi, Ramadhan; Herwanto, Riko; Hasibuan, Muhammad Said
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.913

Abstract

Flood prediction presents a pressing challenge in disaster management, especially in regions vulnerable to extreme weather events. In response, this study offers a novel approach to flood risk prediction by developing a deep learning-based Geo-Spatial Artificial Neural Network (ANN). The model integrates high-resolution satellite imagery, meteorological data, and topographic indicators, such as rainfall, elevation, and land use, to capture complex spatial and environmental relationships that influence flood risk. Data preprocessing used Principal Component Analysis (PCA) and normalization to ensure consistency across datasets. The ANN was built with multiple hidden layers and trained using the backpropagation algorithm on historical flood data. The model achieved a notable 92% prediction accuracy, significantly outperforming traditional flood prediction methods, which typically yield 75–85% accuracy. A Mean Squared Error of 1.41 and an R-squared of 0.94 confirmed the model’s superior ability to predict high-risk flood zones. The model also effectively captured non-linear patterns that conventional statistical or deterministic methods often fail to detect. The results showed that the model generalizes well and adapts effectively, making it suitable for real-time and data-driven flood forecasting. By integrating artificial intelligence with geo-spatial analytics, this study offers a scalable, accurate, and efficient tool for early warning systems and risk management. Future research should focus on incorporating additional data sources and refining model training techniques to further enhance scalability and performance.
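The two regression metrics quoted in the abstract, and a typical normalization step, can be sketched in a few lines. The values below are hypothetical, and min-max scaling is shown only as one common form of normalization; the abstract does not specify which scheme was applied.

```python
def min_max(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted risk scores (assumption).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

scaled_rainfall = min_max([2.0, 4.0, 6.0])
model_mse = mse(y_true, y_pred)
model_r2 = r_squared(y_true, y_pred)
print(scaled_rainfall, model_mse, model_r2)
```

MSE depends on the units of the target variable, whereas R-squared is unitless, so the two figures are best read together rather than compared across studies.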
A Data-Driven Mixed Integer Nonlinear Programming Model for Cost-Optimal Scheduling of Perishable Production and Workforce
Putri, Mimmy Sari Syah; Mawengkang, Herman; Suwilo, Saib; Tulus, Tulus
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.1019

Abstract

This study presents a data-driven, Mixed Integer Nonlinear Programming (MINLP) framework for optimizing the multi-period production scheduling of perishable products with integrated workforce planning. Its primary novelty is the holistic integration of a continuous exponential decay function for product deterioration with dynamic workforce planning, creating a unified model that optimizes production, inventory, and labor simultaneously. This approach addresses key challenges in perishable inventory systems by treating labor as a controllable resource rather than a fixed constraint. Mathematically, the model includes nonlinear inventory balance equations with decay terms and resource-dependent capacity constraints. The objective is to minimize total operational cost, comprising production, holding, and spoilage costs. Computational experiments, based on a realistic case study, demonstrate that the proposed model reduces total system cost by 6.2% and spoilage costs by 43.2% compared to a standard heuristic benchmark. The resulting production and labor schedules align closely with demand fluctuations, supporting both economic and operational efficiency. This unified framework advances the mathematical modeling of sustainable production planning and offers a practical tool for real-world industries such as food processing and pharmaceuticals.
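The role of the decay term can be seen in a small simulation. The sketch below assumes an inventory balance of the form I_t = I_{t-1} · exp(-θ) + x_t - d_t, which is one plausible reading of "exponential decay" and not the paper's stated equation; all costs, the decay rate θ, and both production plans are hypothetical. It evaluates fixed plans only and does not solve the MINLP.

```python
import math


def evaluate_plan(production, demand, decay_rate, c_prod, c_hold, c_spoil):
    """Return (total_cost, total_spoiled) for a fixed multi-period production plan."""
    inventory = 0.0
    total_cost = 0.0
    total_spoiled = 0.0
    for x, d in zip(production, demand):
        surviving = inventory * math.exp(-decay_rate)  # stock left after deterioration
        spoiled = inventory - surviving
        inventory = surviving + x - d
        if inventory < 0:
            raise ValueError("plan does not cover demand")
        total_cost += c_prod * x + c_hold * inventory + c_spoil * spoiled
        total_spoiled += spoiled
    return total_cost, total_spoiled


demand = [10, 10, 10]
costs = dict(decay_rate=0.1, c_prod=1.0, c_hold=0.2, c_spoil=2.0)

# Plan A produces everything up front; plan B matches production to demand.
cost_a, spoiled_a = evaluate_plan([35, 0, 0], demand, **costs)
cost_b, spoiled_b = evaluate_plan([10, 10, 10], demand, **costs)
print(round(cost_a, 2), round(spoiled_a, 2))
print(round(cost_b, 2), round(spoiled_b, 2))
```

Plan A must overproduce to compensate for spoilage and also pays holding costs, which illustrates why schedules that track demand, as reported in the abstract, tend to lower both total and spoilage cost.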
Enhancing Sustainable Biogas Generation Through a Real-Time Digital Twin of a Modular Bioreactor
Amirkhanov, Bauyrzhan; Kunelbayev, Murat; Issa, Sabina; Amirkhanova, Gulshat; Nurgazy, Tomiris; Zhumasheva, Ainur; Alipbeki, Ongarbek
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.779

Abstract

This article presents the design and research of a modular horizontal tubular bioreactor for efficient biogas production based on anaerobic digestion technology. The study combines a digital twin implemented in the MATLAB/Simulink environment with a physical bioreactor equipped with a sensor and control system. The developed mathematical model describes the biochemical processes of acidogenesis and methanogenesis, the thermal regime and the sensitivity of the system to key parameters. Numerical modeling and visualization methods were used for the analysis. The experiments were carried out for 30 days at a mesophilic temperature of 37 °C, repeated three times to increase reliability. The raw material used was a mixture of cattle manure and food waste in a 3:1 ratio, with a total volume of 60 liters. Readings from temperature, pH, and methane sensors were taken every 10 minutes. Experimental data confirmed the high efficiency of the design: removal of up to 70.5% of volatile substances and methane yield of up to 80.5%. Predictive analysis has shown that the digital twin is able to predict the behavior of the system and apply corrective actions in real time. The novelty of the work lies in the integration of a digital twin with a physical bioreactor in real time through industrial communication protocols.
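The experimental protocol implies a data volume and a feedstock split that are easy to work out. The sketch below assumes uninterrupted sampling over the full 30 days and reads the 3:1 ratio as a ratio by volume; neither assumption is confirmed by the abstract.

```python
DAYS = 30
SAMPLE_INTERVAL_MIN = 10
RUNS = 3

# Readings logged by one sensor during one run, and across all three runs.
readings_per_run = DAYS * 24 * 60 // SAMPLE_INTERVAL_MIN
readings_all_runs = readings_per_run * RUNS


def split_mixture(total_litres, parts):
    """Split a total volume according to integer ratio parts, e.g. (3, 1)."""
    whole = sum(parts)
    return [total_litres * p / whole for p in parts]


manure_l, food_waste_l = split_mixture(60, (3, 1))
print(readings_per_run, readings_all_runs)  # 4320 12960
print(manure_l, food_waste_l)               # 45.0 15.0
```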
AMIKOM-RECSYS: Enhancing Movie Recommender System using Large Language Model (ChatGpt), Deep Learning and Probabilistic Matrix Factorization
Hanafi, Hanafi; Widowati, Anik Sri; Wahyuni, Sri Ngudi
Journal of Applied Data Sciences Vol 6, No 4: December 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i4.897

Abstract

E-commerce has become one of the most widely used digital applications globally, enabling personalized product discovery and purchasing. To support these services, recommender systems are essential, offering item suggestions based on user preferences. Most recommender systems rely on machine learning algorithms to estimate user-item relevance scores, often utilizing product ratings. However, a persistent challenge in this domain is the issue of data sparsity, where only a small fraction of users provides explicit ratings, leading to reduced accuracy in recommendation results. In this study, we introduce a novel hybrid recommendation algorithm, named AMIKOM-RECSYS, designed to address the sparsity problem and enhance rating prediction. Our model integrates three main components: a Large Language Model (LLM) using ChatGPT, a Transformer-based encoder (BERT), and Probabilistic Matrix Factorization (PMF). The LLM generates descriptive information about movies based on specific prompts, which is then passed to BERT to encode the content into meaningful 2D vector representations. These enriched embeddings are subsequently utilized by the PMF algorithm to predict missing user-item ratings. We evaluate the proposed model on two benchmark datasets, ML-1M and ML-10M, using Root Mean Squared Error (RMSE) as the evaluation metric. The AMIKOM-RECSYS model achieved RMSE values of 0.8681 on ML-1M and 0.7791 on ML-10M under a 50:50 data split, outperforming several baseline models including CNN-PMF, LSTM-PMF, and Attention-PMF. These results highlight the effectiveness of integrating LLM and Transformer-based contextual understanding into matrix factorization frameworks. In future work, we plan to extend this framework by incorporating other matrix factorization techniques such as Singular Value Decomposition (SVD) and integrating additional sources of user information, including social media activity, to further improve recommendation performance.
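The matrix factorization stage can be sketched on its own. The example below fits plain PMF by stochastic gradient descent on a regularized squared error (the MAP estimate under Gaussian priors) and reports RMSE. It uses random initialization and a tiny hypothetical rating set; the LLM and BERT stages of the proposed model, which would supply content-based item vectors, are not reproduced, and all hyperparameters are illustrative assumptions.

```python
import math
import random


def predict(U, V, u, i):
    """Predicted rating is the dot product of user and item latent vectors."""
    return sum(a * b for a, b in zip(U[u], V[i]))


def train_pmf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit latent factors by SGD on squared error with L2 regularization."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(U, V, u, i)
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def rmse(ratings, U, V):
    """Root mean squared error over the given (user, item, rating) triples."""
    return math.sqrt(
        sum((r - predict(U, V, u, i)) ** 2 for u, i, r in ratings) / len(ratings)
    )


# Hypothetical (user, item, rating) triples; most user-item pairs are unobserved.
ratings = [
    (0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 2), (2, 1, 5),
    (2, 3, 1), (3, 2, 3), (3, 3, 2), (0, 3, 1), (1, 1, 4),
]

U0, V0 = train_pmf(ratings, n_users=4, n_items=4, epochs=0)
U, V = train_pmf(ratings, n_users=4, n_items=4)
untrained_rmse = rmse(ratings, U0, V0)
trained_rmse = rmse(ratings, U, V)
print(round(untrained_rmse, 3), round(trained_rmse, 3))
```

The RMSE printed here is measured on the training triples only; a comparison like the one in the abstract would hold out part of the ratings (for example the 50:50 split mentioned) and report RMSE on the held-out half.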