Contact Name
Husni Teja Sukmana
Contact Email
husni@bright-journal.org
Phone
+62895422720524
Journal Mail Official
jads@bright-journal.org
Editorial Address
Gedung FST UIN Jakarta, Jl. Lkr. Kampus UIN, Cemp. Putih, Kec. Ciputat Tim., Kota Tangerang Selatan, Banten 15412
Location
Kota Adm. Jakarta Pusat,
DKI Jakarta
INDONESIA
Journal of Applied Data Sciences
Published by Bright Publisher
ISSN: -     EISSN: 2723-6471     DOI: doi.org/10.47738/jads
One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes applied to collect, treat and analyze data will help to render scientific research results reproducible and thus more accountable. The datasets themselves should also be accessible to other researchers, so that research publications, dataset descriptions, and the actual datasets can be linked. The journal provides a forum to publish methodical papers on processes applied to data collection, treatment and analysis, as well as for data descriptors publishing descriptions of a linked dataset.
Articles 543 Documents
Feature Engineering for Tropical Rainfall Forecasting Using Random Forest and Support Vector Regression Slamet, Cepy; Imron, Rizka M; Wahana, Agung; Maylawati, Dian Sa'adillah; Zulfikar, Wildan Budiawan; Ramdhani, Muhammad Ali
Journal of Applied Data Sciences Vol 7, No 1: January 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i1.1111

Abstract

The complex dynamics of weather variability in Indonesia, influenced by multiple climatic drivers, make rainfall forecasting in tropical regions a significant scientific challenge. This study proposes an automated feature engineering pipeline to enhance the performance of Random Forest Regression (RFR) and Support Vector Regression (SVR) models for tropical rainfall prediction. Monthly rainfall data spanning 388 months (1993–2025) from a BMKG station were used as the basis for model development. The pipeline systematically generates temporal, seasonal, statistical, and anomaly-based features to provide domain-informed representations for non-sequential learning algorithms. Model performance was evaluated under four temporal data partitioning scenarios using R², RMSE, and probabilistic confidence intervals derived from bootstrap residual simulations. Results indicate that RFR achieved the highest predictive accuracy (R² = 0.93; RMSE = 31.01 mm) and demonstrated superior temporal–seasonal stability (Rolling CV: R² = 0.81 ± 0.07; RMSE = 55.44 ± 16.18), with comparable performance between wet and dry seasons. Conversely, SVR showed greater sensitivity to seasonal variability, with R² dropping to 0.55 during the wet season, indicating higher uncertainty under extreme rainfall conditions. Robustness and drift analyses further revealed that RFR adapts better to temporal and seasonal shifts, while SVR remains relevant as an adaptive model for extreme risk analysis. Overall, this study contributes to the development of automated feature engineering, reproducible climatological forecasting pipelines, and probabilistic modeling frameworks for rainfall prediction under uncertainty in tropical regions.
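The feature families named in the abstract (temporal, seasonal, statistical, anomaly-based) can be sketched for a monthly rainfall series as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `build_features`, the lag set, the rolling window, and the feature names are all hypothetical.

```python
import math
from statistics import mean


def build_features(rain, start_month=1, lags=(1, 2, 12), window=3):
    """Turn a monthly rainfall series (mm) into one feature row per month.

    Hypothetical helper: the lag set, window size and feature names are
    illustrative choices, not taken from the paper.
    """
    max_lag = max(max(lags), window)

    # Monthly climatology (mean per calendar month) for the anomaly feature.
    # In a real forecasting setup, compute it on the training period only.
    by_month = {}
    for i, value in enumerate(rain):
        month = (start_month - 1 + i) % 12 + 1
        by_month.setdefault(month, []).append(value)
    climatology = {m: mean(v) for m, v in by_month.items()}

    rows = []
    for t in range(max_lag, len(rain)):
        month = (start_month - 1 + t) % 12 + 1
        prev_month = (start_month - 1 + t - 1) % 12 + 1
        row = {f"lag_{k}": rain[t - k] for k in lags}                # temporal
        row["month_sin"] = math.sin(2 * math.pi * month / 12)        # seasonal
        row["month_cos"] = math.cos(2 * math.pi * month / 12)
        row["roll_mean"] = mean(rain[t - window:t])                  # statistical
        row["anomaly_prev"] = rain[t - 1] - climatology[prev_month]  # anomaly
        row["target"] = rain[t]
        rows.append(row)
    return rows
```

Every feature for month `t` uses only values up to month `t - 1`, so the rows can be fed to a non-sequential regressor such as a random forest without leaking the target.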
Analyzing Student Sentiments and Insights on Generative AI for Independent Learning in Universities Iswari, Ni Made Satvika; Wijaya, I Nyoman Yudi Anggara; Yuniari, Ni Putu Widya
Journal of Applied Data Sciences Vol 7, No 1: January 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i1.1083

Abstract

Transformations in higher education brought about by Generative AI have significantly changed how university students access, comprehend, and develop learning materials. This study explores Indonesian university students’ perceptions and experiences regarding the use of Generative AI for independent learning, employing qualitative surveys together with sentiment analysis powered by machine learning. Data were collected from open-ended questionnaires and analyzed using four algorithms (Naive Bayes, Logistic Regression, Random Forest, and Linear SVM) to classify student sentiments towards generative AI technologies. These four classical machine learning models were employed as baseline algorithms commonly used in sentiment analysis to benchmark performance on small, imbalanced educational datasets before applying more complex transformer-based methods. In addition to quantitative analysis, this study also implements thematic analysis of open-ended responses to identify prominent issues, challenges, and student recommendations concerning the use of generative AI in learning. Evaluation results identified Linear SVM as the most consistent model, with the highest weighted F1-score (0.63), although all models showed limitations in detecting negative sentiment due to class imbalance (only three negative samples out of forty responses), which affected model generalization. Key findings indicate that students perceive Generative AI as a supportive tool that accelerates understanding, creativity, and reference searching; however, they remain wary of risks related to dependency, reduced originality, and academic integrity dilemmas. This article recommends the implementation of ethical policy, AI digital literacy training, and enhancement of campus infrastructure to ensure that AI technologies enrich the learning process without compromising student independence and integrity.
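To see why a weighted F1-score can hide poor detection of a rare class, the sketch below computes it from scratch. The label split used in the usage note (37 non-negative versus 3 negative responses, collapsed to two classes) is a hypothetical simplification; the abstract only states that three of forty responses were negative.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += f1 * n
    return total / len(y_true)
```

With `y_true = ["pos"] * 37 + ["neg"] * 3` and a model that always predicts `"pos"`, the negative class has an F1 of zero, yet the weighted score stays high because that class carries only 3/40 of the weight. Reporting per-class recall alongside the weighted score makes this visible.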
Face Detection Based on Anti-Spoofing with FaceNet Method for Filtering Contract Cheating in Online Exam Ujianto, Erik Iman Heri; Diyasa, I Gede Susrama Mas; Junaidi, Achmad; Fatullah, Ryan Reynickha; Permanasari, Wahyu Melinda; Sari, Allan Ruhui Fatmah
Journal of Applied Data Sciences Vol 7, No 1: January 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i1.1167

Abstract

This study develops a reliable face-based verification system for online examinations by integrating a face recognition model with a blink detection mechanism to minimize the risk of identity fraud, also known as "contract cheating," and static image manipulation. "Contract cheating" refers to the practice where students hire others to complete their exams or assignments, compromising academic integrity. The growing reliance on online exams has raised concerns about the credibility of facial verification, as conventional methods are often vulnerable to spoofing attempts. To address this issue, the proposed system combines FaceNet, a deep learning model for identity recognition, with Dlib’s eye blink detection to provide a stronger layer of protection. The system was evaluated using 5-fold and 10-fold cross-validation, and additional testing assessed the impact of different video frame rates on performance. The results show that the system performs effectively in identifying legitimate users and detecting spoofing. FaceNet achieved an accuracy of 96.67 percent, outperforming DeepFace, which showed poorer results in precision, recall, and F1 score for some participants. Both models were evaluated on the same dataset, consisting of 150 images. The preprocessing pipeline, including face detection using MTCNN, cropping, and resizing, was applied consistently to both models to ensure a fair comparison of their performance. The system also demonstrated adaptability, achieving correct classifications at both 15 and 30 frames per second. Anti-spoofing tests based on the eye blink detection system detected all real faces, while static images were classified as spoofing. These results confirm that combining face recognition with liveness detection enhances the security of online examination platforms. The findings demonstrate the system's potential to reduce contract cheating and impersonation fraud, making online examinations more credible.
Future work may focus on implementing adaptive thresholding for blink detection and integrating multimodal verification techniques to improve robustness across diverse real-world environments.
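Blink-based liveness checks are commonly built on the eye aspect ratio (EAR) computed from six eye landmarks. The abstract does not say which formulation the authors used, so the sketch below is an assumption: the landmark ordering, the fixed threshold of 0.2, and the `min_frames` value are illustrative, and the landmark coordinates are expected to come from an external detector.

```python
from math import dist


def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks ordered p1..p6 around the eye contour.

    p1 and p4 are the eye corners; (p2, p6) and (p3, p5) are the
    upper/lower lid pairs. The ratio drops towards 0 when the eye closes.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = dist(p2, p6) + dist(p3, p5)
    horizontal = dist(p1, p4)
    return vertical / (2.0 * horizontal)


def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count runs of at least min_frames consecutive frames with EAR below threshold."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress at the end of the clip
        blinks += 1
    return blinks
```

A static photograph yields a nearly constant EAR and therefore zero blinks. Because `min_frames` is expressed in frames, its value would need to change with the video frame rate (for example between 15 and 30 frames per second), which is one reason a fixed threshold pair may not transfer across setups.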
Multidimensional Data-Driven Modeling of Sustainable E-Commerce Development with Direct and Interaction Effects Loan, Mai Thanh; Tam, Phan Thanh
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1213

Abstract

Sustainable e-commerce development has become a critical issue for emerging economies as digital markets expand rapidly, alongside growing concerns about regulatory effectiveness, trust, and resource efficiency. This study investigates the key factors influencing sustainable e-commerce development in Vietnam by explicitly integrating Institutional Theory, the Technology–Organization–Environment (TOE) framework, and the Resource-Based View (RBV) into a unified structural equation modeling (SEM) framework. Institutional Theory is operationalized through regulatory quality, capturing the role of formal rules and enforcement mechanisms in shaping market stability and legitimacy. The TOE framework is reflected in digital infrastructure, government support, and competitive pressure, which together represent technological readiness and environmental conditions. RBV is operationalized through resource availability and management capacity, emphasizing firms’ internal capabilities to sustain long-term e-commerce performance. In addition, trust is incorporated as both a direct determinant and a moderating mechanism that strengthens the effectiveness of institutional and organizational factors. A mixed-method research design was employed. The qualitative phase involved in-depth discussions with 35 policymakers, business managers, and e-commerce platform managers to refine the theoretical integration and measurement scales. Based on these insights, a structured questionnaire was administered to frequent online shoppers in Ho Chi Minh City and Dong Nai Province, yielding 653 valid responses. SEM results indicate that regulatory quality, digital infrastructure, government support, competitive pressure, resource availability, and trust all have significant positive effects on sustainable e-commerce development. 
Resource availability and regulatory quality exert the strongest impacts, while trust and management capacity significantly moderate the effects of regulatory quality and resource availability, respectively. By explicitly mapping institutional, technological, organizational, and relational constructs into a coherent SEM framework, this study provides a theoretically grounded and empirically validated model of sustainable e-commerce development in an emerging economy context. The findings offer valuable implications for policymakers and practitioners seeking to foster a resilient, trustworthy, and sustainable e-commerce ecosystem in Vietnam and similar developing economies.
RankPro-M Method to Alleviate the Sparsity Problem in Collaborative Filtering Lestari, Sri; Yulmaini, Yulmaini; Irianto, Suhendro Yusuf; Sabita, Hari
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1173

Abstract

The rapid shift from conventional commerce to online platforms has been driven by evolving consumer behavior that demands fast, accurate, and personalized services. Consequently, e-commerce has become a primary channel for product marketing and service delivery without temporal or spatial constraints. However, the continuous expansion of e-commerce platforms has led to a substantial increase in both the volume and diversity of available products, thereby complicating the task of delivering personalized recommendations aligned with user preferences. Recommender systems offer an effective solution to this challenge, with Collaborative Filtering (CF) being among the most widely adopted techniques. Despite its popularity, CF suffers from a critical limitation known as the data sparsity problem, which adversely affects recommendation accuracy and system reliability. This study proposes RankPro-M, a ranking-oriented imputation approach designed to mitigate the impact of sparsity in recommender systems. RankPro-M operates by identifying items with high rating frequency and imputing missing ratings using mode values as representations of dominant user preferences. The imputed rating matrix is subsequently processed through ranking aggregation mechanisms (Borda, Copeland, and WP-Rank) to generate item recommendations. Experimental results demonstrate that the application of RankPro-M consistently improves recommendation quality, as indicated by increased Normalized Discounted Cumulative Gain (NDCG) values across multiple evaluation scenarios. These findings confirm that RankPro-M effectively addresses data sparsity and enhances the performance of ranking-based recommender systems.
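The two stages described above, mode imputation for frequently rated items followed by rank aggregation, can be sketched as below. This is an illustration of the general idea rather than the RankPro-M algorithm itself: the `min_count` frequency threshold, the dictionary data layout, and the tie-tolerant Borda variant are assumptions, and Copeland and WP-Rank are not shown.

```python
from collections import Counter


def impute_mode(ratings, items, min_count=2):
    """Fill missing ratings with the item's mode, only for items rated
    at least min_count times (hypothetical frequency threshold).

    ratings: {user: {item: rating}}
    """
    filled = {user: dict(r) for user, r in ratings.items()}
    for item in items:
        observed = [r[item] for r in ratings.values() if item in r]
        if len(observed) < min_count:
            continue  # too sparse: leave this item's gaps untouched
        mode_value = Counter(observed).most_common(1)[0][0]
        for r in filled.values():
            r.setdefault(item, mode_value)
    return filled


def borda_rank(ratings, items):
    """Borda-style aggregation: for each user, an item earns one point per
    item that the same user rated strictly lower."""
    scores = {item: 0 for item in items}
    for r in ratings.values():
        for item, value in r.items():
            scores[item] += sum(1 for other in r.values() if other < value)
    return sorted(items, key=lambda i: (-scores[i], i))
```

Calling `borda_rank(impute_mode(ratings, items), items)` gives every user a fuller preference list before aggregation, which is the mechanism the abstract credits for the NDCG gains.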
A Modified Watershed Algorithm for Rice Plant Growth Stage Analysis Putra, Teri Ade; Yuhandri, Yuhandri; Ramadhanu, Agung
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1117

Abstract

Information technology plays a crucial role in enhancing various sectors, including agriculture. In particular, technological advancements in crop monitoring are essential for sustainable food production, where accurate growth analysis is vital. Image-based approaches have emerged as a promising tool for assessing crop growth, particularly in rice plants. This study aims to enhance rice plant image segmentation using an improved Watershed algorithm that integrates the Laplacian operator and Distance Transform. A Support Vector Machine (SVM) classifier is used for segmenting and classifying rice plant growth stages. The dataset consists of 1080 images of rice plants, with 74 images used for training, 31 for testing, and 975 images for validation. The image processing pipeline involves preprocessing steps such as grayscale conversion, normalization, color segmentation, Otsu thresholding, filtering, and edge detection. Following preprocessing, the Watershed algorithm is applied in two scenarios: the conventional method and the enhanced method with the Laplacian operator and Distance Transform. Performance evaluation is based on accuracy, precision, recall, and F1-score metrics. The results show that the enhanced Watershed algorithm significantly outperforms the conventional method, achieving an accuracy of 99.58%, precision of 80.55%, recall of 79.92%, and an F1-score of 81.50%. While there is a slight imbalance in precision and recall, the model demonstrates reliable performance in identifying rice plant growth. This study confirms that integrating the Laplacian operator and Distance Transform into the Watershed algorithm significantly improves segmentation accuracy, supporting the development of automated monitoring systems in smart farming. Furthermore, this approach opens avenues for application in other crops and diverse environmental conditions.
An Adaptive Random Forest for Data Stream Sentiment Classification under Concept Drift Arkana, Brian Farrel; Sudianto, Sudianto; Isnaeni, Nenen
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1153

Abstract

Data labeling plays a crucial role in determining the performance of machine learning models, especially in data stream environments where concept drift frequently occurs. The primary objective of this study is to analyze the effectiveness of adaptive learning models in managing dynamic data distribution changes and to evaluate the influence of different labeling strategies on sentiment classification performance using user reviews from the OVO mobile application. The research contributes to understanding how labeling approaches interact with adaptive modeling under real-time data stream conditions. Two labeling methods were employed: score-based labeling derived from user ratings and content-based labeling generated automatically using the IndoRoBERTa language model. These labeled data streams were evaluated using two classifiers: a conventional Random Forest model and an Adaptive Random Forest model designed to handle evolving data distributions. The evaluation was conducted through streaming experiments that continuously fed new review data to simulate real-world drift scenarios. The results reveal that in the score-based labeling scenario, the conventional Random Forest model’s accuracy gradually declined, reaching a final accuracy of 31%, while the Adaptive Random Forest achieved 80%, a gap of 49 percentage points. In the content-based labeling scenario, both models improved over time, with final accuracies of 57% for Random Forest and 76% for the adaptive model, a difference of 19 percentage points. These findings indicate that Adaptive Random Forest is more robust in adapting to distributional and temporal changes in data streams regardless of the labeling strategy used. This study implies that combining adaptive learning with semantically rich labeling approaches can substantially enhance model reliability in real-time sentiment analysis tasks.
Future research may further explore hybrid adaptive mechanisms to improve the resilience of data stream classification models across various domains.
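Streaming experiments of this kind are often scored with a test-then-train (prequential) loop, in which each example is predicted before the model learns from it. The sketch below shows that loop with a deliberately trivial learner. It is not a Random Forest or an Adaptive Random Forest, and the abstract does not confirm that the authors used this exact protocol; the `MajorityClass` learner, the window size, and the synthetic stream are all hypothetical.

```python
from collections import Counter, deque


class MajorityClass:
    """Toy learner that predicts the most frequent label it remembers.

    window=None remembers the whole history; an integer window keeps only
    the most recent labels, which lets the learner forget old concepts.
    """

    def __init__(self, window=None):
        self.labels = deque(maxlen=window)

    def predict(self):
        if not self.labels:
            return None
        return Counter(self.labels).most_common(1)[0][0]

    def learn(self, label):
        self.labels.append(label)


def prequential_accuracy(model, stream):
    """Test-then-train: predict each label before the model may learn it."""
    correct = 0
    for label in stream:
        if model.predict() == label:
            correct += 1
        model.learn(label)
    return correct / len(stream)
```

On a stream that flips from 50 `"pos"` labels to 50 `"neg"` labels, the full-history learner never recovers after the flip, while the windowed learner does after a few examples. That is the qualitative effect the abstract reports for the adaptive model under drift.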
Nutritionally Balanced Menu Optimization for a Healthy Lifestyle using Integer Linear Programming Suwarno, Suwarno; Arvando, Anderson; Davina, Davina; Gantoro, Brain; Sama, Hendi; Deli, Deli
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1141

Abstract

Unhealthy dietary patterns and limited access to personalized nutrition guidance contribute significantly to chronic diseases such as diabetes. These issues highlight the need for a reliable, data-driven approach capable of generating individualized dietary recommendations aligned with nutritional standards. This study aims to develop an Integer Linear Programming (ILP) approach integrated with nutritional datasets to generate personalized and nutritionally balanced meal plans. The goal is to determine whether ILP can effectively balance calorie and macronutrient distribution according to user-specific health profiles while ensuring compliance with dietary guidelines and disease-related restrictions. This study applied an ILP-based optimization framework to calculate total daily energy expenditure and macronutrient ratios, incorporating disease-specific constraints and balanced food distributions across meals. Using 244 standardized food items from clinical dietary data, the model’s performance was validated through comparisons with three AI models (ChatGPT, Gemini, DeepSeek) and a certified medical expert across three evaluation rounds. All AI models indicated that the generated meal plans adhered to macronutrient balance and health-specific requirements. Expert validation produced a mean score of 4.85 out of 5 on a Likert scale, reflecting strong agreement regarding the system’s nutritional adequacy, practicality, and safety. These outcomes confirm the ILP framework’s capability to produce balanced, individualized, and clinically sound meal plans. The results demonstrate that ILP-based optimization can effectively generate scientifically sound and practical dietary recommendations, meeting both nutritional standards and user-specific needs. The findings highlight ILP’s potential as a computational decision-support tool that complements professional nutrition guidance.
Future work should enhance the objective function by adding parameters that model individual preferences, allergy limitations, and cultural dietary norms, and should incorporate extensive clinical datasets to support adaptive recommendation mechanisms that consider chrononutrition, nutritional adequacy, and preparation methods, along with expert-driven adjustments to portion sizes and meal timing for more tailored dietary guidance.
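The structure of such a model (integer serving counts, a linear objective, linear nutritional constraints) can be shown on a toy instance. The sketch below solves it by exhaustive enumeration rather than with an ILP solver, which is only feasible because the instance is tiny. The food list, its nutritional values, the objective (maximize protein), and the limits are made-up illustrative numbers and do not come from the paper's clinical dataset or its formulation.

```python
from itertools import product

# Hypothetical foods: (name, kcal, protein_g, carbs_g) per serving.
# Illustrative numbers only, not taken from the paper's dataset.
FOODS = [
    ("rice", 200, 4, 45),
    ("chicken", 150, 25, 0),
    ("vegetables", 50, 2, 10),
]


def plan_menu(kcal_max, carbs_max, max_servings=3):
    """Maximize protein subject to a calorie cap and a carbohydrate cap.

    Decision variables are integer servings per food (0..max_servings).
    Objective and constraints are linear, so this is a small integer
    linear program, solved here by brute force for clarity.
    """
    best, best_protein = None, -1
    for servings in product(range(max_servings + 1), repeat=len(FOODS)):
        kcal = sum(s * f[1] for s, f in zip(servings, FOODS))
        carbs = sum(s * f[3] for s, f in zip(servings, FOODS))
        if kcal > kcal_max or carbs > carbs_max:
            continue  # violates a constraint
        protein = sum(s * f[2] for s, f in zip(servings, FOODS))
        if protein > best_protein:
            best, best_protein = servings, protein
    return dict(zip((f[0] for f in FOODS), best)), best_protein
```

A disease-specific restriction, such as a tighter carbohydrate cap for a diabetic profile, enters the model simply as a smaller `carbs_max`. For a realistic catalogue of a few hundred foods, enumeration is infeasible and a dedicated ILP solver would be needed.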
SME Business Intelligence Support Using Retrieval-Augmented Generation and RFM Segmentation Rosalina, Rosalina; Ismail, Noor Lees; Sahuri, Genta; Wibawa, Joseph Tedja Nugraha
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1163

Abstract

This study presents the design and evaluation of a cloud-based business intelligence support system for small and medium enterprises that integrates retrieval-grounded text generation with recency–frequency–monetary customer segmentation to enhance digital customer communication and promotional decision making. The primary objective is to assist individual small businesses in responding accurately to customer inquiries while simultaneously leveraging historical transaction data to identify actionable customer groups, all within their existing messaging workflows through a mobile keyboard interface. The proposed framework combines two complementary components. The first component automatically generates customer replies by retrieving semantically relevant information from a structured business knowledge base and using it to produce grounded, context-aware responses. The second component analyzes invoice records to segment customers into loyal, moderate, and at-risk groups, enabling sellers to tailor promotional strategies based on observed purchasing behavior. The system is implemented as a cloud service accessed by individual enterprises without requiring local infrastructure or model training. System evaluation was conducted using real small business data collected over several weeks. Experimental procedures included retrieval faithfulness analysis, response correctness evaluation with confidence intervals, customer cluster validation using silhouette analysis, end-to-end latency measurement, and structured user acceptance testing. Performance results demonstrate that the retrieval mechanism consistently provides accurate knowledge grounding, while the segmentation module effectively distinguishes high-value and churn-risk customers. The average response time remained within a range perceived as acceptable for real-time mobile conversations, and user testing confirms that the keyboard-based interface does not disrupt normal communication practices. 
The findings indicate that embedding retrieval-grounded generation and lightweight customer analytics directly into daily messaging tools can significantly improve the operational efficiency of small enterprises. This integrated approach reduces the burden of manual response handling while enabling data-driven promotional decision making. The framework offers a practical pathway for adopting artificial intelligence in small business environments and provides a foundation for future enhancements such as temporal behavior modeling and multilingual support.
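The recency, frequency, and monetary values behind such a segmentation can be derived from invoice records as sketched below. The abstract mentions silhouette analysis, which suggests the authors clustered customers; the rule-based `segment` function here is a simpler stand-in, and its thresholds, the tuple layout of an invoice, and the customer names in the usage note are hypothetical.

```python
from datetime import date


def rfm_table(invoices, today):
    """invoices: iterable of (customer, invoice_date, amount).

    Returns {customer: (recency_days, frequency, monetary)}.
    """
    table = {}
    for customer, day, amount in invoices:
        last, freq, total = table.get(customer, (None, 0, 0.0))
        if last is None or day > last:
            last = day  # keep the most recent purchase date
        table[customer] = (last, freq + 1, total + amount)
    return {c: ((today - last).days, f, m) for c, (last, f, m) in table.items()}


def segment(recency_days, frequency, recent_days=30, loyal_orders=3):
    """Rule-based labels with hypothetical thresholds (not the paper's clustering)."""
    if recency_days > 3 * recent_days:
        return "at-risk"
    if recency_days <= recent_days and frequency >= loyal_orders:
        return "loyal"
    return "moderate"
```

For example, a customer with three invoices in the last month would be labeled `"loyal"`, while one whose only invoice is five months old would be labeled `"at-risk"`. The monetary value is computed but not used by these rules; a clustering approach would typically use all three dimensions.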
A Hybrid TF-IDF and Knowledge Graph-Enhanced Retrieval-Augmented Generation Framework with Large Language Models for Domain-Aware Question Answering Utami, Lilyani Asri; Rachmi, Hilda; Hidayatulloh, Syarif
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1136

Abstract

This study aims to develop a domain-aware legal Question-Answering (QA) system tailored for Indonesia’s Micro, Small, and Medium Enterprises (MSMEs) by proposing a hybrid Retrieval-Augmented Generation (RAG) framework that integrates Term Frequency–Inverse Document Frequency (TF-IDF), Knowledge Graph (KG), and Large Language Model (LLM) components. In this framework, TF-IDF contributes by performing lexical-level retrieval to identify the most relevant documents based on keyword weighting; the KG enriches this retrieval by providing semantic relationships among legal entities, enabling deeper contextual understanding; and the LLM generates coherent responses conditioned on both lexical and semantically grounded evidence. Together, these components work synergistically to strengthen factual grounding during retrieval and improve contextual reasoning during generation. Methodologically, the system processes a curated dataset of 1,400 legal question–answer pairs collected from national legal repositories, including legislation, government regulations, and MSME digitalization guidelines. The process includes text preprocessing, keyword extraction using TF-IDF, semantic enrichment through a KG that maps legal entities and their relationships, and answer generation via an LLM powered by the RAG pipeline. The system was evaluated using Precision, Recall, F1-Score, Bilingual Evaluation Understudy (BLEU), and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, validated by five legal experts. Results show an accuracy improvement from 76.5% to 83.5% after integrating KG, with Precision of 0.853, Recall of 0.877, and F1-Score of 0.865. The generative evaluation yielded a BLEU score of 0.9276 and ROUGE-L of 0.9301, indicating strong linguistic and semantic alignment between system outputs and expert-authored references. 
The study concludes that this approach offers a practical foundation for building AI-based legal assistance tools and highlights future opportunities for expansion to other legal domains and multilingual RAG applications.
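The lexical retrieval step of such a pipeline can be sketched with a small TF-IDF and cosine-similarity implementation. This covers only keyword-weighted retrieval; the knowledge-graph enrichment and LLM generation stages are not shown. The smoothed IDF formula, the function names, and the sample documents in the usage note are assumptions, and documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of non-empty token lists.

    Returns one {term: weight} dict per document plus the idf table.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # One common IDF variant; the paper does not state which one it uses.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query_tokens, docs):
    """Return the index of the document most similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    tf = Counter(t for t in query_tokens if t in idf)  # ignore unseen terms
    query = {t: c * idf[t] for t, c in tf.items()}
    scores = [cosine(query, v) for v in vectors]
    return max(range(len(docs)), key=scores.__getitem__)
```

Given token lists such as `["tax", "rate", "small", "business"]` and `["halal", "certification", "food"]`, a query of `["tax", "rate"]` selects the first. In a RAG setting, the retrieved text would then be passed to the generator as grounding evidence.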