Contact Name
Husni Teja Sukmana
Contact Email
husni@bright-journal.org
Phone
+62895422720524
Journal Mail Official
jads@bright-journal.org
Editorial Address
Gedung FST UIN Jakarta, Jl. Lkr. Kampus UIN, Cemp. Putih, Kec. Ciputat Tim., Kota Tangerang Selatan, Banten 15412
Location
Kota Adm. Jakarta Pusat,
DKI Jakarta
INDONESIA
Journal of Applied Data Sciences
Published by Bright Publisher
ISSN: -     EISSN: 2723-6471     DOI: doi.org/10.47738/jads
One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes applied to collect, treat and analyze data will help to render scientific research results reproducible and thus more accountable. The datasets themselves should also be accessible to other researchers, so that research publications, dataset descriptions, and the actual datasets can be linked. The journal Data provides a forum to publish methodical papers on processes applied to data collection, treatment and analysis, as well as for data descriptors publishing descriptions of a linked dataset.
Articles 55 Documents
Issue: Vol 6, No 3: September 2025 (55 Documents)
Impact of Sample Size on the Robustness of Machine Learning Algorithms for Detecting Loan Defaults Using Imbalanced Data Kobone, Boitumelo Tryphina; Montshiwa, Tlhalitshi Volition
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.713

Abstract

This study assessed the impact of sample size on the robustness of five machine learning classifiers: Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), Decision Trees (DT), and K-Nearest Neighbour (K-NN). Although data-balancing techniques can help address data imbalance, they have limitations, which are discussed in this paper. The study continues the trend of applying these five ML classifiers to credit default detection, and contributes by examining whether increasing the sample size improves their performance when they are trained on an imbalanced loan default dataset that has not been the focus of previous studies, given that most ML algorithms are known to perform well when trained with large datasets. The study used a secondary, imbalanced loan default dataset from Kaggle.com, in which 85% of participants made loan payments and 15% defaulted. Stratified random sampling was used to select different sample sizes, starting with 2% of the total observations, followed by 5%, then 10% up to 90% of the dataset, with the dependent variable as the stratum. The study found no consistent change in the classification metrics as sample size changed, but RF and DT achieved 100% performance regardless of sample size and are therefore recommended as the most robust to data imbalance in loan default detection. The average classification metrics for NB and K-NN ranged from 72% to 92%, and SVM produced the lowest averages, between 69% and 75%. NB, K-NN and SVM yielded poor sensitivity rates of 0% to 53%, indicating poor prediction of loan payments, but they had specificity scores in the range of 84% to 86%, indicating good loan default classification. Future studies should consider other sampling methods, as well as deep and hybrid learning methods, in comparison with RF and DT.
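The stratified sampling step described in this abstract can be illustrated with a minimal Python sketch. The function name `stratified_sample`, the record layout, and the synthetic 85/15 data are assumptions made for illustration; they are not taken from the paper, and the Kaggle dataset is not reproduced here.

```python
import random
from collections import defaultdict

def stratified_sample(records, label_key, fraction, seed=0):
    """Draw the same fraction from each class so the class ratio is preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)
    sample = []
    for group in by_class.values():
        # Keep at least one record per class even at very small fractions.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Synthetic stand-in data (hypothetical): 15% defaulted, 85% repaid.
records = [{"id": i, "default": int(i < 150)} for i in range(1000)]

for fraction in (0.02, 0.05, 0.10, 0.50, 0.90):
    subset = stratified_sample(records, "default", fraction)
    share = sum(r["default"] for r in subset) / len(subset)
    print(f"{fraction:.0%} sample: n={len(subset)}, default share={share:.2f}")
```

Because each class is sampled separately, the default share stays close to 15% at every sample size, which is what allows classifier metrics to be compared across sizes.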
Multi-Label Classification of Indonesian Voice Phishing Conversations: A Comparative Study of XLM-RoBERTa and ELECTRA Hidayat, Ahmad; Madenda, Sarifuddin; Hustinawaty, Hustinawaty
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.858

Abstract

Mobile phones have become a primary means of communication, yet their advancement has also been exploited by cybercriminals, particularly through voice phishing schemes. Voice phishing is a form of social engineering fraud carried out via telephone conversations to illegally obtain personal or financial information. The complexity of voice phishing continues to increase, as a single conversation may involve multiple fraudulent schemes simultaneously, necessitating the application of multi-label classification to comprehensively identify all motives of fraud. Previous studies have predominantly utilized single-label approaches and foreign-language data, making them less relevant to the Indonesian language context and unable to produce speaker segmentation outputs for conversational analysis. This study contributes by developing a multi-label voice phishing classification system specifically for Indonesian telephone conversations to address this gap. Audio data were collected from open sources and simulated recordings, resulting in a total of 300 samples labeled into six categories: five phishing modes and one non-phishing category. The proposed system consists of a preprocessing pipeline that includes noise reduction, speaker segmentation, automatic transcription, and text cleaning to preserve the context of two-way conversations. Two machine learning models based on transformer architectures, XLM-RoBERTa and ELECTRA, are employed to identify various fraud schemes that may occur simultaneously within a single conversation. The dataset was split into training, validation, and testing sets with two division ratios for performance evaluation. Several combinations of hyperparameters were tested to obtain the most optimal model configuration. Evaluation was conducted using a supervised learning approach and various performance metrics. 
The experimental results show that XLM-RoBERTa achieved the highest average accuracy of 97.04 ± 1.15% and the highest average F1-score of 92.66 ± 2.59%. These results highlight the novelty of applying multi-label classification in the Indonesian language context for voice phishing detection, contributing to more effective fraud identification in real-world telephony systems.
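To make the multi-label setting concrete, the sketch below shows one common way a multi-label decision can be taken from model outputs: each label gets an independent sigmoid probability, and every label above a threshold is reported. The label names, the logits, and the 0.5 threshold are placeholders assumed for illustration; the abstract does not name the six categories or describe the decision rule the authors used.

```python
import math

# Placeholder names: the abstract reports five phishing modes plus one
# non-phishing category but does not name them.
LABELS = ["mode_1", "mode_2", "mode_3", "mode_4", "mode_5", "non_phishing"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Return every label whose independent sigmoid probability meets the threshold."""
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]

# Made-up logits for a single conversation (not model output from the paper).
logits = [2.1, -1.3, 0.4, -3.0, -0.2, -4.5]
print(predict_labels(logits))  # ['mode_1', 'mode_3']
```

Unlike a softmax over classes, this rule lets a single conversation carry several fraud labels at once, or none.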
Incorporate Transformer-Based Models for Anomaly Detection Dewi, Deshinta Arrova; Singh, Harprith Kaur Rajinder; Periasamy, Jeyarani; Kurniawan, Tri Basuki; Henderi, Henderi; Hasibuan, M. Said; Nathan, Yogeswaran
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.762

Abstract

This paper explores the effectiveness of Transformer-based models, specifically the Time-Series Transformer (TST) and Temporal Fusion Transformer (TFT), for anomaly detection in streaming data. We review related work on anomaly detection models, highlighting traditional methods' limitations in speed, accuracy, and scalability. While LSTM Autoencoders are known for their ability to capture temporal patterns, they suffer from high memory consumption and slower inference times. Though efficient in terms of memory usage, the Matrix Profile provides lower performance in detecting anomalies. To address these challenges, we propose using Transformer-based models, which leverage the self-attention mechanism to capture long-range dependencies in data, process sequences in parallel, and achieve superior performance in both accuracy and efficiency. Our experiments show that TFT outperforms the other models with an F1-score of 0.92 and a Precision-Recall AUC of 0.71, demonstrating significant improvements in anomaly detection. The TST model also shows competitive performance with an F1-score of 0.88 and Precision-Recall AUC of 0.68, offering a more efficient alternative to LSTMs. The results underscore that Transformer models, particularly TST and TFT, provide a robust solution for anomaly detection in real-time applications, offering improved performance, faster inference times, and lower memory usage than traditional models. In conclusion, Transformer-based models stand out as the most effective and scalable solution for large-scale, real-time anomaly detection in streaming time-series data, paving the way for their broader application across various industries. Future work will further focus on optimizing these models and exploring hybrid approaches to enhance detection capabilities and real-time performance.
Designing a Culturally Adaptive Information Framework for Anxiety Disorders: A Mixed-Methods Thematic Analysis in Malaysia Zailani, Achmad Udin; Wan Ahmad, Wan Nooraishya; Muh Tuah, Nooralisa; Tze Ping, Nicholas Pang Tze Ping Pang
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.771

Abstract

This study addresses critical gaps in Malaysia's mental health landscape by developing a culturally adaptive framework for anxiety disorder resources, where only 28% of adults recognize symptoms due to cultural stigma and poor resource design. Our key contribution is a user-centered framework integrating visual-interactive tools with cultural adaptation strategies to improve accessibility and literacy. The objective was to investigate how information design can overcome barriers, using a mixed-methods approach with 12 anxiety disorder patients (screened via DASS-21). Findings revealed: (1) format preferences (infographics: 40%, videos: 35%, simulations: 25%), (2) accessibility barriers (technical language: 45%, lack of credible sources: 65%, insufficient examples: 30%), and (3) demand for demographic personalization (age-targeted content: 78%, mood-tracking tools: 62%). Quantitative results showed strong alignment between preferred formats and comprehension gains (infographics improved understanding by 40% vs. text). The novelty lies in merging cognitive load theory with Malay cultural values (familial collectivism, Islamic coping mechanisms) into actionable design principles. Our framework demonstrates that culturally tailored visual-interactive content increases engagement by 35-40% compared to generic materials, while simplified Malay Language reduces stigma-related avoidance by 28%. These ideas translate into three evidence-based strategies: (a) minimalist visual formats to reduce cognitive load, (b) family-involved examples to respect collectivism, and (c) hybrid delivery (online/offline) for rural accessibility. The study provides policymakers with metrics-backed guidance, showing SMS-based hybrid tools achieve 58% adherence in low-bandwidth areas versus 22% for chatbots. Future work should validate scalability in larger cohorts and test AR/VR adaptations (requested by 70% of youth participants). 
This research advances both mental health communication theory and practical interventions for Southeast Asia's multicultural contexts.
Detecting Gender-Based Violence Discourse Using Deep Learning: A CNN-LSTM Hybrid Model Approach Kurniawan, Tri Basuki; Dewi, Deshinta Arrova; Henderi, Henderi; Hasibuan, M. Said; Zakaria, Mohd Zaki; Ismail, Abdul Azim Bin
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.761

Abstract

Gender-Based Violence (GBV) is a critical social issue impacting millions worldwide. Social media discussions offer valuable insights into public awareness, sentiment, and advocacy, yet manually analyzing such vast textual data is highly challenging. Traditional text classification methods often struggle with contextual understanding and multi-class categorization, making it difficult to accurately identify discussions on Sexual Violence, Physical Violence, and other topics. To address this, the present study proposes a hybrid deep learning approach combining Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. CNN is utilized for extracting key linguistic features, while LSTM enhances the classification process by maintaining sequential dependencies. This hybrid CNN+LSTM model is evaluated against standalone CNN and LSTM models to assess its performance in classifying GBV-related tweets. The dataset was sourced from Kaggle, containing real-world Twitter discussions on GBV. Experimental results demonstrate that the hybrid model surpasses both CNN and LSTM models, achieving an accuracy of 89.6%, precision of 88.4%, recall of 89.1%, and F1-score of 88.7%. Confusion matrix and ROC curve analyses further confirm the hybrid model’s superior performance, correctly identifying Sexual Violence (82%), Physical Violence (15%), and Other (3%) cases with reduced misclassification rates. These results suggest that combining CNN’s feature extraction with LSTM’s contextual learning provides a more balanced and effective classification model for GBV-related text. This work supports the development of AI-based tools for social media monitoring, policy-making, and advocacy, helping stakeholders better understand and respond to GBV discussions. Future research could explore transformer-based models like BERT and real-time classification applications to further improve performance.
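As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The short sketch below applies that standard formula to the precision and recall quoted in the abstract; the function name is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract for the hybrid model.
print(round(f1_score(0.884, 0.891), 3))  # 0.887
```

The result agrees with the reported F1-score of 88.7%.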
Navigating Heart Stroke Terrain: A Cutting-Edge Feed-Forward Neural Network Expedition Praveen, S Phani; Mantena, Jeevana Sujitha; Sirisha, Uddagiri; Dewi, Deshinta Arrova; Kurniawan, Tri Basuki; Onn, Choo Wou; Yorman, Yorman
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.763

Abstract

Heart stroke remains one of the leading causes of death worldwide, necessitating early and accurate prediction systems to enable timely medical intervention. While a variety of machine learning approaches have been employed to address this issue, including Logistic Regression, Decision Trees, Random Forests, Support Vector Machines, and K-Nearest Neighbors, these models often suffer from limitations such as overfitting, insufficient generalization, poor performance on imbalanced datasets, and inability to capture complex nonlinear patterns in clinical data. Additionally, many existing works do not comprehensively integrate both clinical and demographic features or lack rigorous evaluation metrics beyond accuracy alone. This study proposes a novel Feed-Forward Neural Network (FFNN) model for heart stroke prediction, designed to overcome the shortcomings of conventional models. Unlike shallow classifiers, the FFNN architecture employed here leverages multiple hidden layers and nonlinear activation functions to learn intricate relationships within the dataset. The dataset used comprises various attributes such as age, hypertension, heart disease, BMI, and smoking status, which were preprocessed through normalization, one-hot encoding, and imputation techniques to ensure data quality and model performance. Experiments were conducted using a stratified train-test split, and the model was trained using the Adam optimizer with carefully tuned hyperparameters. Comparative evaluations against baseline models (Logistic Regression, Random Forest, and SVM) were carried out using precision, recall, F1-score, and ROC-AUC as performance metrics. The proposed FFNN achieved the highest accuracy of 96.47%, along with substantial improvements in recall and F1-score, highlighting its superior capability in identifying potential stroke cases even in imbalanced datasets. 
This work bridges a significant gap in heart stroke prediction by demonstrating the effectiveness of deep learning models—specifically FFNNs—in extracting complex patterns from diverse patient data. It also sets the stage for further exploration of deep learning-based clinical decision support systems.
A Dual-Fusion Hybrid Model with Attention for Stunting Prediction among Children under Five Years Hadikurniawati, Wiwien; Hartomo, Kristoko Dwi; Sembiring, Irwan; Arthur, Christian
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.831

Abstract

Malnutrition remains a persistent global health challenge, especially among children under five. Traditional assessment methods often rely on static anthropometric measures, which are limited in capturing complex growth patterns. This study aims to develop a robust classification model for predicting the nutritional status of children under five years old, addressing the critical public health challenge of stunting. The model contributes to the growing need for accurate, data-driven early detection systems in child health monitoring by introducing a hybrid framework that combines deep learning and classical machine learning techniques. The proposed approach integrates automatically extracted features from a One-Dimensional Convolutional Neural Network (1D-CNN) with classical anthropometric indicators. These combined features are processed through an additive attention mechanism, highlighting the most informative attributes. The attention-weighted representation is then classified using an ensemble stacking method that aggregates predictions from multiple base classifiers, including decision trees, nearest neighbor algorithms, support vector machines, etc. Synthetic Minority Over-sampling Technique (SMOTE) is applied to the training dataset to mitigate data imbalance, particularly the underrepresentation of severe and moderate malnutrition cases. The research utilizes a dataset comprising 2,789 records of children under five years old collected from community health posts in Indonesia. Data preprocessing included cleaning, normalization, and gender encoding. The model’s performance was evaluated using 5-fold cross-validation and measured by accuracy, precision, recall, and area under the curve metrics. The results show that the proposed model achieved an average accuracy of 99.70% and an area under the curve of 99.99%. 
An ablation study further demonstrated the significant contribution of each component (feature extraction, fusion mechanism, and ensemble classifier) to the final performance. This approach offers a robust and scalable solution for early nutritional status prediction in healthcare settings.
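The oversampling step mentioned in this abstract can be sketched as follows. This is a simplified, SMOTE-style interpolation written in plain Python under stated assumptions: the function name `smote_like`, the neighbour count `k`, and the toy two-feature points are illustrative, and the code is not the authors' pipeline or a library implementation.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create synthetic minority points by interpolating towards a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Toy minority-class points with two made-up features each.
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 1.8)]
for point in smote_like(minority, n_new=3):
    print(point)
```

Each synthetic point lies on a line segment between two real minority samples. As the abstract states, oversampling is applied to the training data only, so that evaluation folds contain no synthetic records.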
Integrating Moving Average Indicators with Long Short-Term Memory Model in Bitcoin Price Forecasting Quang, Phung Duy; Duy, Nguyen Hoang; Khoai, Pham Quang; Duong, Bui Duc
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.766

Abstract

Bitcoin price forecasting remains a challenging task due to the market's high volatility and complex nonlinear dynamics. This study proposes a novel forecasting framework by integrating Long Short-Term Memory (LSTM) networks with Moving Average (MA) indicators—specifically Simple Moving Average (SMA), Exponential Moving Average (EMA), and Weighted Moving Average (WMA)—as auxiliary input features to enhance model accuracy. The objective is to examine the frequency-specific effectiveness of these hybrid models across daily and high-frequency datasets. Using historical Bitcoin data from Bitstamp between January 2021 and December 2024, we conducted experiments at four epoch levels (50, 100, 150, 200) to determine optimal model configurations. Empirical results reveal that, on daily data, LSTM combined with a 10-period WMA achieves the lowest Mean Absolute Percentage Error (MAPE) of 2.1661% at 150 epochs, while for high-frequency data, the combination with a 10-period SMA yields superior performance with a MAPE of 0.4895%. Furthermore, increasing epochs beyond the optimal point led to performance degradation, indicating overfitting. Compared to the standalone LSTM model, our integrated approach demonstrates significantly improved adaptability to short-term fluctuations and heightened forecasting precision. This research contributes a comprehensive comparative analysis of MA-enhanced deep learning models for cryptocurrency price prediction, and offers practical insights for algorithmic traders, financial analysts, and decision-support systems in volatile digital asset markets.
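The three moving-average indicators and the MAPE metric named in this abstract have standard textbook definitions, sketched below. The price series is made up, and the EMA conventions (smoothing factor 2/(n+1), seeded with the first price) are a common choice assumed here; the abstract does not state which conventions the authors used or how the indicators were fed to the LSTM.

```python
def sma(prices, n):
    """Simple moving average of the last n prices."""
    return sum(prices[-n:]) / n

def wma(prices, n):
    """Weighted moving average: linearly increasing weights, newest price weighted n."""
    window = prices[-n:]
    weights = range(1, n + 1)
    return sum(w * p for w, p in zip(weights, window)) / sum(weights)

def ema(prices, n):
    """Exponential moving average with alpha = 2 / (n + 1), seeded with the first price."""
    alpha = 2 / (n + 1)
    value = prices[0]
    for p in prices[1:]:
        value = alpha * p + (1 - alpha) * value
    return value

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

prices = [100.0, 102.0, 101.0, 105.0, 107.0]  # made-up closing prices
print(sma(prices, 3), wma(prices, 3), ema(prices, 3))
print(mape([100.0, 200.0], [110.0, 190.0]))  # 7.5
```

SMA weights all prices in the window equally, while WMA and EMA put more weight on recent prices, so they react faster to short-term moves.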
Enhancing Aspect-Based Sentiment Analysis in Tourism Reviews Through Hybrid Data Augmentation Iswari, Ni Made Satvika; Afriliana, Nunik
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.842

Abstract

The increasing reliance on online reviews in tourism has made User-Generated Content (UGC) an invaluable resource for understanding visitor perceptions. However, extracting meaningful insights from these reviews remains challenging due to their unstructured nature, aspect imbalance, and the prevalence of code-mixing between languages such as Indonesian and English—particularly in multicultural destinations like Bali. Aspect-Based Sentiment Analysis (ABSA) offers a promising solution by associating sentiment polarity with specific aspects of tourist experiences. Yet, its performance is often constrained by limited and imbalanced datasets, especially for underrepresented aspects such as sanitation and amenities. This study proposes a hybrid data augmentation framework that integrates three complementary strategies: generative augmentation using ChatGPT, semantic filtering via Sentence-BERT (SBERT), and domain refinement through Masked Language Modeling (MLM). The framework is designed to improve ABSA performance on multilingual tourism reviews by generating synthetic aspect-relevant data while preserving semantic integrity and contextual nuance. Using 398 reviews of Kuta Beach in Bali, we evaluate the effectiveness of the proposed approach across five tourism aspects: scenery, dusk, surf, amenities, and sanitation. Results show that the hybrid strategy reduces hallucination rates from 12% (using ChatGPT alone) to 3.8%, increases F1-scores for underrepresented aspects by up to 5.1%, and improves cross-lingual alignment (Cohen’s κ = 0.78). These improvements demonstrate the synergy between generative and semantic augmentation in addressing real-world ABSA challenges. The proposed method not only advances the state of multilingual ABSA but also offers practical implications for tourism analytics, allowing destination managers to better understand and respond to aspect-specific visitor feedback. 
The framework is extensible to other low-resource domains, where linguistic diversity and data scarcity present similar limitations.
An IoT-Enabled Smart System Utilizing Linear Regression for Sheep Growth and Health Monitoring Efendi, Syahril; Sihombing, Poltak; Mawengkang, Herman; Turnip, Arjon; Weber, Gerhard Wilhelm
Journal of Applied Data Sciences Vol 6, No 3: September 2025
Publisher : Bright Publisher

DOI: 10.47738/jads.v6i3.901

Abstract

The global livestock industry faces significant pressures from climate change, land constraints, and rising consumer demand, necessitating greater efficiency and sustainability in production. To address these challenges, there is a critical need for accessible, data-driven tools; however, accessible and individualized tools for monitoring the growth and health of livestock like sheep remain underdeveloped, limiting farmers' ability to transition from reactive to proactive management. This study developed and validated an Internet of Things (IoT) smart system for monitoring sheep using an Arduino and ESP32 platform equipped with a DHT22 sensor for temperature and humidity and a load cell for weight. Weekly weight data from 15 sheep were collected over a six-month period. Simple linear regression was then applied to model the individual growth trajectory of each animal. The IoT system was successfully implemented and deployed in a farm setting. The primary finding was that individualized linear regression models provided a highly accurate method for tracking sheep growth, with R² values consistently exceeding 99% for most animals. The system effectively delivered real-time reports on growth trajectories and health-relevant environmental conditions (e.g., temperature and humidity) to a smartphone interface, confirming its practical utility. The primary implication of this research is a validated framework for practical and interpretable precision livestock farming. The system empowers farmers to shift from reactive to proactive management by using individualized growth curves as baselines for early problem detection. This dual-function system enhances productivity through precise growth tracking while supporting animal welfare via environmental monitoring, offering a valuable tool for modern, sustainable sheep farming.
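The per-animal growth model described in this abstract is an ordinary least-squares line fitted to weekly weights. The sketch below shows that fit and the R² calculation for one animal; the function name `fit_line` and the weight values are made up for illustration and are not data from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2

# Made-up weekly weights (kg) for one hypothetical animal.
weeks = [0, 1, 2, 3, 4, 5]
weights_kg = [20.0, 20.9, 22.1, 23.0, 23.9, 25.1]

slope, intercept, r2 = fit_line(weeks, weights_kg)
print(f"gain per week: {slope:.2f} kg, start weight: {intercept:.2f} kg, R^2: {r2:.4f}")
```

A fitted line of this kind can serve as an animal's baseline: a new weekly measurement that falls well below the predicted weight is a candidate for early inspection.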