Contact Name
Husni Teja Sukmana
Contact Email
husni@bright-journal.org
Phone
+62895422720524
Journal Mail Official
jads@bright-journal.org
Editorial Address
Gedung FST UIN Jakarta, Jl. Lkr. Kampus UIN, Cemp. Putih, Kec. Ciputat Tim., Kota Tangerang Selatan, Banten 15412
Location
Kota Adm. Jakarta Pusat,
DKI Jakarta
INDONESIA
Journal of Applied Data Sciences
Published by Bright Publisher
ISSN: -     EISSN: 2723-6471     DOI: doi.org/10.47738/jads
One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes applied to collect, treat and analyze data will help to render scientific research results reproducible and thus more accountable. The datasets themselves should also be accessible to other researchers, so that research publications, dataset descriptions, and the actual datasets can be linked. The journal Data provides a forum to publish methodical papers on processes applied to data collection, treatment and analysis, as well as for data descriptors publishing descriptions of a linked dataset.
Articles 588 Documents
Forecasting Bank Efficiency Using Data Envelopment Analysis with Directional Distance Functions and Machine Learning: Time-Series Validation and Shapley Value Interpretation Chau Dinh Linh
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1244

Abstract

This study develops a structured framework to forecast the operational efficiency of commercial banks in Vietnam. The analysis is based on a balanced panel of 27 banks over the period 2016–2024. Bank efficiency is first measured using a directional distance function within a data envelopment analysis framework (DEA – DDF). This approach incorporates both desirable outputs and undesirable outputs related to credit risk. The estimated efficiency scores are then used as prediction targets in several machine learning models. Model performance is evaluated under both conventional test settings and time-series cross-validation, and predictions are interpreted using Shapley value–based analysis (SHAP). Under a conventional test set, the gradient boosting model (XGBoost) shows the best performance, with a root mean squared error of 0.060 and a coefficient of determination (R²) of 0.353. However, when time-series cross-validation is applied to reflect realistic forecasting conditions, predictive accuracy declines sharply. The average coefficient of determination falls to approximately 0.005. This suggests that static validation can overstate performance and that forecasting efficiency in a changing financial environment remains difficult. The interpretation results provide additional insights. Net interest margin has a positive effect on predicted efficiency, although the effect weakens at very high levels. The cost-to-income ratio shows a threshold around 0.55, beyond which efficiency declines more strongly. Bank size has a largely neutral impact. The interaction between capital adequacy and profitability shows a conditionally negative pattern. Prediction errors are larger in the most recent year and among banks with very high efficiency scores. In summary, the results highlight both the potential and the limitations of machine learning in forecasting efficiency and emphasize the importance of time-aware validation.
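The contrast between a conventional test set and time-series cross-validation can be illustrated with a minimal sketch. The abstract does not say which splitting scheme the authors used; the expanding-window scheme below, the function name `expanding_window_splits`, and the `min_train` parameter are all assumptions made for illustration. Only the 2016-2024 study period comes from the abstract.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    years: List[int], min_train: int = 3
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training_years, test_year) pairs in chronological order.

    Each fold trains only on years strictly before the test year, so no
    future information leaks into the model being evaluated.
    """
    ordered = sorted(years)
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


# Study period named in the abstract: 2016-2024 (min_train=3 is an assumption).
folds = list(expanding_window_splits(list(range(2016, 2025)), min_train=3))
for train_years, test_year in folds:
    print(f"train {train_years[0]}-{train_years[-1]} -> test {test_year}")
```

A random train/test split, by contrast, can place later years in the training set and earlier years in the test set, which is one way static validation may overstate forecasting performance.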
Utilization of K-means Clustering for Classifying Diabetes Risk Populations According to Health Behaviors and 3Es-2Ss Health Literacy Supaporn Yodmunee; Wongpanya S. Nuankaew; Thapanapong Sararat; Pratya Nuankew
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1042

Abstract

This study focused on classifying populations at risk for diabetes using K-means clustering integrated with the 3Es–2Ss health literacy framework: eating, exercise, emotion, smoking cessation, and alcohol cessation. Biological, behavioral, and health literacy data were analyzed. The dataset was collected from 126 participants identified as at-risk individuals in Ngao District, Lampang Province, Thailand. This relatively small, community-based sample provides valuable insights into local health behaviors but limits the generalizability and statistical power of the findings to broader populations. The K-means clustering analysis, guided by the Elbow method, identified k = 4 as the optimal number of clusters, yielding four distinct groups with different socio-demographic and health characteristics. These clusters revealed variations in health profiles, economic status, and behavioral literacy within the Thai population. Missing data and inconsistencies were systematically addressed through data cleaning and normalization to maintain analytical reliability. The results suggest that K-means clustering can serve as an effective decision-support tool for public health planning, particularly for Non-Communicable Disease (NCD) prevention and diabetes management at the local level.
Comparison of Multilingual Model Sensitivity for Political Fact Verification with Integrated Multi-Evidence Nova Agustina; Kusrini Kusrini; Ema Utami; Tonny Hidayat
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1198

Abstract

Political news is frequently targeted by the dissemination of fake news on social media, which can influence public opinion and undermine trust in democratic processes. The main challenge in addressing this issue lies in the limited sensitivity of cross-lingual fact verification models in capturing semantic relationships between claims and evidence in long-text, multi-evidence settings. Existing approaches often struggle to assess the relevance and quality of evidence, resulting in suboptimal verification performance. This study compares three multilingual Large Language Models (LLMs), namely mBERT, XLM-R, and LaBSE, for political fact verification using an integrated multi-evidence approach. Experiments are conducted on the PolitiFact dataset, with performance evaluated using sensitivity, accuracy, precision, and F1-score metrics. The results indicate that mBERT achieves the highest overall sensitivity at 89.44%, followed by LaBSE at 81.81% and XLM-R at 78.81%. However, mBERT exhibits lower precision, whereas LaBSE provides a better balance between precision (87.02%) and accuracy (86.46%), resulting in an F1-score of 84.33%. XLM-R demonstrates lower sensitivity but maintains competitive precision (85.47%) and accuracy (84.60%), with an F1-score of 82.00%. Sensitivity analysis based on the number of pieces of evidence reveals distinct model behaviors: mBERT performs optimally with six pieces of evidence, XLM-R is more effective under limited evidence conditions, and LaBSE shows a stable and increasing sensitivity trend as the amount of evidence increases, indicating robustness in multi-evidence scenarios. Further statistical analysis shows that XLM-R has the lowest performance variance, while LaBSE statistically outperforms mBERT in several evaluation aspects. Overall, LaBSE is recommended as the most balanced model for multi-evidence-based political fact verification.
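The reported F1-scores can be checked against the reported precision and sensitivity, since F1 is the harmonic mean of the two. The sketch below is a reader-side consistency check, not the authors' evaluation code; the dictionary layout and function name are illustrative, and only the percentages come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, sensitivity and F1 (in percent) as reported in the abstract.
reported = {
    "LaBSE": {"precision": 87.02, "sensitivity": 81.81, "f1": 84.33},
    "XLM-R": {"precision": 85.47, "sensitivity": 78.81, "f1": 82.00},
}

recomputed = {
    name: f1_score(m["precision"], m["sensitivity"]) for name, m in reported.items()
}
for name, value in recomputed.items():
    gap = abs(value - reported[name]["f1"])
    print(f"{name}: recomputed F1 differs from the reported value by {gap:.3f} points")
```

For both models the recomputed value agrees with the reported F1-score to within rounding of the published percentages.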
Psychometric Validation of an AI-Based Evaluation System for Identifying Discrepancies in Learning Processes P. Wayan Arta Suyasa; I Gusti Ngurah Pujawan; Dewa Gede Hendra Divayana; I Dewa Ayu Made Budhyani; I Made Sugiarta; I Made Candiasa
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1168

Abstract

This research advances the field of educational evaluation by designing and psychometrically validating an artificial intelligence (AI)-based diagnostic tool to detect discrepancies in university learning processes. The main novelty is the integration of the Provus Discrepancy Model with a forward-chaining inference engine. The research aims to transform evaluation from an administrative activity into an ongoing process of improvement. The tool was developed and validated through a sequential mixed-methods approach with 400 participants from 3 state universities and 8 evaluation experts. The results provide evidence that the validated system exhibits a broad range of psychometric characteristics: strong content validity (SD-CVI/Ave = 0.94); high internal consistency and reliability (Cronbach's α = 0.94); solid construct validity as demonstrated through Confirmatory Factor Analysis (CFA) (CFI = 0.94; RMSEA = 0.054); and diagnostic learning analytics in which the AI engine evaluated learning discrepancies with 92.4% diagnostic accuracy (47.4% more accurate than manual evaluation methods). The system's usefulness is demonstrated through high usability (SUS = 88.2); high practical utility (85% total score on the Pragmatic Utility Assessment); real-world utility (45 discrepancy patterns detected); cost efficiency (73% cost and 67% analysis time compared to traditional methods); and a range of analytics (predictive and learning discrepancy analytics). The significant contribution of this study is the development of the world's first integrated AI evaluation system that meets high methodological and psychometric standards, along with a set of real-time diagnostic analytics.
Ultimately, this study developed the first truly integrated evaluation system that combines historically established evaluation constructs and mechanisms with advanced AI capabilities, providing educators and institutions with evaluation tools to deliver data-driven pedagogical strategies and interventions in higher education.
Predicting Whistleblowing Intention Using Supervised Machine Learning: Integrating TPB and IEDM in State-Owned Enterprises Muhammad Rizal Satria; Hamfri Djajadikerta; Amelia Setiawan
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1292

Abstract

Whistleblowing plays a critical role in detecting organizational misconduct; however, understanding the determinants of whistleblowing intention remains a challenge. Prior studies predominantly rely on regression or structural equation modeling, which focus on explanatory relationships rather than predictive evaluation. This study addresses this limitation by integrating the Theory of Planned Behavior and the Integrated Ethical Decision-Making Model within a supervised machine learning framework. Data were collected from 382 permanent employees of Indonesian state-owned enterprises (BUMN) using a structured questionnaire. Three classification algorithms—Logistic Regression, Support Vector Machine (SVM), and Random Forest—were implemented to evaluate predictive performance. The results indicate that Random Forest achieved the highest predictive accuracy and discrimination capability. Feature importance analysis reveals that perceived behavioral control is the strongest predictor of whistleblowing intention, followed by ethical awareness and attitude, while subjective norms show comparatively weaker influence. These findings refine TPB by demonstrating the dominant role of perceived behavioral control in high-risk ethical decisions and reinforce the importance of ethical awareness as a cognitive trigger within the IEDM framework. The study contributes by bridging behavioral theory and predictive analytics while offering governance insights for strengthening whistleblowing systems in state-owned enterprises.
Automated Pixel-Level Concrete Defect Detection using U-Net Architecture: A Comparative Study with Clustering-Based Segmentation Halifia Hendri; Larissa Navia Rani; Sofika Enggari; Agung Ramadhanu; Febri Hadi
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1298

Abstract

Concrete surface defect detection is a critical aspect of maintaining the integrity and safety of infrastructure in civil engineering. Traditional manual inspection methods are time-consuming, prone to human subjectivity, and often limited by physical accessibility, necessitating the development of robust automated solutions. This paper presents an automated pixel-level concrete surface defect detection system utilizing the U-Net deep learning architecture. The primary contribution and novelty of our approach lie in optimizing the network's encoder-decoder structure with skip connections to effectively capture both broad contextual features and precise spatial localization. This overcomes the critical limitations of existing traditional methods, which frequently struggle with complex concrete background textures, inherent noise, and uneven illumination. To validate our approach, the proposed U-Net model is systematically compared against a widely used baseline method, K-Means clustering combined with Gray-Level Co-occurrence Matrix (GLCM) texture analysis. The evaluation was conducted using a comprehensive dataset consisting of 1000 high-resolution concrete images. Experimental results reveal that the deep learning architecture vastly outperforms the traditional baseline. Specifically, the U-Net model achieved an outstanding F1-Score of 92.47%, a precision of 93.18%, and a mean Intersection over Union (mIoU) of 86.55%. In stark contrast, the K-Means and GLCM approach only yielded an F1-Score of 69.83% and an mIoU of 54.21%. These quantitative findings demonstrate that the proposed U-Net-based system not only successfully minimizes false segmentations but also provides a highly reliable, efficient, and scalable computational framework. Ultimately, this research delivers a practical solution that can be seamlessly integrated into continuous automated structural health monitoring systems, paving the way for safer and more proactive civil infrastructure management.
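The pixel-level metrics named in this abstract (precision, F1-score, Intersection over Union) can be illustrated on a toy example. The sketch below is not the authors' evaluation code: the 4x4 masks are invented for illustration, and it computes IoU for the defect class only, whereas the reported mIoU is presumably averaged over classes or images.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # 1 = defect pixel, 0 = background


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives per pixel."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    return tp, fp, fn


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Precision, recall, F1 (Dice) and IoU for the defect class."""
    tp, fp, fn = confusion_counts(pred, truth)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Hypothetical ground-truth and predicted masks (illustrative only).
truth = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]]

metrics = segmentation_metrics(pred, truth)
print(metrics)
```

In this toy case three of four predicted defect pixels are correct and one true defect pixel is missed, so precision, recall and F1 are all 0.75 while IoU is 0.6. IoU penalizes the same errors more heavily than F1, which is consistent with the reported mIoU sitting below the reported F1-score for both methods.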
A Hybrid Fuzzy-LLM Framework for Difficulty Estimation of Math Word Problems: A Data-Driven Human-in-the-Loop Study Shilpa Kadam; Jabez Christopher; PTV Praveen Kumar; Dipak Kumar Satpathi
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1187

Abstract

Assessing the difficulty levels of Math Word Problems (MWPs) is essential for adaptive learning, yet most existing MWP datasets lack standardized difficulty annotations. This paper proposes a decision framework that integrates a 2-tuple Fuzzy Linguistic Decision Model (FLDM) with Large Language Models (LLMs) for automated difficulty estimation. A corpus of over 2,000 MWPs was compiled, of which 200 were annotated by seven instructors and an additional 454 were validated by ten experts. Consensus stability improved markedly (Fleiss’ κ = 0.14 → Cohen’s κ = 0.32), reflecting stronger alignment between expert judgments and the proposed fuzzy 2-tuple aggregation. Sixteen LLM configurations were evaluated, including GPT-3.5, GPT-4o-Mini, Gemini Flash, and LLaMA-3.2 under Zero-Shot, Five-Shot, and RAG settings. GPT-3.5 Zero-Shot achieved the best performance (Precision=0.65, Recall=0.63, F1=0.63), outperforming GPT-4o-Mini and Gemini variants. The validated dataset and linguistic ground truth were integrated into a web-based annotation system (themathbits.com), demonstrating scalability for real-world deployment. The results show that combining human linguistic judgments with fuzzy modeling and LLM inference improves reliability of MWP difficulty estimation, providing a foundation for future adaptive learning platforms. 
PRAKE: A Modified RAKE Model for Keyword Extraction in Accreditation Assessment Descriptions Helena Nurramdhani Irmanda; Sri Hartati; Sri Mulyana
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1057

Abstract

Study program accreditation requires aligning assessment criteria with the Self-Evaluation Sheet (LED), which is usually written as a lengthy and complex narrative. Finding relevant information requires a method that can automatically extract keywords from assessment descriptions as representations of the criteria. Keyword extraction can be applied through the Rapid Automatic Keyword Extraction (RAKE) method, a simple technique that works without labeled data. However, standard RAKE uses stopwords as delimiters to segment candidate phrases, making it less effective for complex sentences such as those found in accreditation assessment descriptions. Because a single sentence may contain several ideas, the extraction process should handle phrases carefully through splitting, merging, or extension according to their structure and meaning. To address this limitation, this study introduces PRAKE (Phrase-Refined RAKE), a modified RAKE algorithm that enhances candidate phrase extraction. Modifications are carried out at the Candidate Phrase Extraction stage through three techniques: Phrase Completion, which completes short phrases with the prefix of the previous phrase; Phrase Restructuring, which rearranges phrases through merging or separation based on structure and meaning; and Semantic Phrase Composition, which forms new phrases from different elements that are semantically interrelated. Additionally, a domain term weighting based on term frequency is integrated into the scoring calculation to strengthen the relevance of terms to the accreditation context. The model achieved a precision of 0.90, recall of 0.83, and F1-score of 0.85, representing the average performance across all 101 assessment descriptions evaluated in this study. The results demonstrate that PRAKE adapts better to accreditation terminology and improves keyword relevance and extraction efficiency.
These findings indicate that PRAKE provides a foundation for automated evaluation and can be extended for cross-domain document analysis.
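For orientation, the baseline that PRAKE modifies can be sketched as follows. This is a minimal version of standard RAKE (stopword-delimited candidate phrases scored by word degree over word frequency), not the PRAKE algorithm: the three phrase-refinement techniques and the domain term weighting are not reproduced. The stopword list and the sample sentence are assumptions chosen for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# A tiny illustrative stopword list; real applications use a much longer one.
STOPWORDS: Set[str] = {
    "a", "an", "and", "are", "as", "for", "is", "of", "the", "to", "with",
}


def candidate_phrases(text: str, stopwords: Set[str]) -> List[List[str]]:
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases: List[List[str]] = []
    for fragment in re.split(r"[.,;:!?()]", text.lower()):
        current: List[str] = []
        for word in re.findall(r"[a-z0-9-]+", fragment):
            if word in stopwords:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_scores(text: str, stopwords: Set[str] = STOPWORDS) -> Dict[str, float]:
    """Score each candidate phrase as the sum of its words' degree/frequency."""
    phrases = candidate_phrases(text, stopwords)
    freq: Dict[str, int] = defaultdict(int)
    degree: Dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    return {" ".join(p): sum(word_score[w] for w in p) for p in phrases}


sample = "Keyword extraction is a method for the automatic extraction of candidate phrases."
scores = rake_scores(sample)
for phrase, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.1f}  {phrase}")
```

Because every stopword ends a phrase, a sentence that packs several ideas between stopwords yields fragments such as the single word "method" here, which is the limitation the abstract attributes to standard RAKE.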
Improvement of Interpolation Performance with Statistical Method in Total Suspended Solid Identification Hadi Syahputra; Yuhandri Yuhandri; Sumijan sumijan
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1190

Abstract

Total Suspended Solids (TSS) is one of the key parameters used to determine water quality, which can be observed through the density level of suspended particles. The determination of TSS aims to ensure that river pollution levels can be controlled to maintain good environmental quality. However, the identification of TSS is still performed manually, which requires a relatively long processing time. This condition highlights the need for an effective and efficient identification process. Based on these considerations, this study aims to develop an extraction technique to identify TSS in river water using the Interpolation Mean Square (IMS) algorithm. The development of the extraction technique within the IMS algorithm is crucial for improving the performance of linear interpolation methods. Mean Square is proposed as a parameter in the interpolation process to optimize the extraction algorithm. The segmentation process based on the IMS algorithm involves exploring and grouping image intensity values. The resulting segmented image clusters are subsequently selected based on the values produced by the Mean Square computation and then processed as the final segmentation output. The experimental results show that the IMS algorithm improves performance by 7% to 10% over the previous linear interpolation method. The IMS algorithm achieves 90.19% accuracy, 99.99% sensitivity, and 83.33% specificity. These results indicate that the improved interpolation method presented in the IMS algorithm produces optimal results in determining TSS. The IMS-based extraction technique thus improves the performance of the interpolation method and produces optimal identification results. The superiority of the IMS algorithm provides novelty in the development of interpolation techniques for automated segmentation.
Furthermore, the findings of this study can effectively support the West Sumatra Environmental Agency in addressing river water pollution issues.
From Luxury to Mass Market: How Brand Love and Luxury Perception Drive Purchase Intentions Through TPB Risky Rahmawati Pinardi; Agung Wahyu Handaru; Agus Wibowo
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1246

Abstract

This study examines psychological mechanisms shaping consumer purchase intentions for mass-market products from former luxury brands. Grounded in the Theory of Planned Behavior and attachment theory, we investigate how luxury brand perception and brand love influence attitudes and subjective norms as antecedents of behavioral intention. Data were collected from 868 Jakarta consumers through an online survey and analyzed using partial least squares structural equation modeling to assess measurement and structural relationships. Results indicate that luxury brand perception exerts significant direct effects on purchase intention and indirect effects mediated specifically through attitude and subjective norm. Brand love influences purchase intention exclusively through these same attitudinal and normative pathways, with no significant direct effect observed. Both attitude and subjective norm significantly mediate the relationships between luxury perception, brand love, and purchase intention. Contrary to expectations, self-referencing does not moderate the attitude-intention or subjective norm-intention relationships, suggesting limited influence of self-related cognitive processing in this context. Theoretically, this research advances the Theory of Planned Behavior by positioning brand love as an antecedent rather than an outcome of attitudes and subjective norms, thereby integrating emotional attachment as a foundational driver within rational decision frameworks. Managerially, findings suggest that luxury brands entering mass markets should prioritize preserving symbolic heritage and cultivating emotional bonds while leveraging social validation mechanisms to translate brand love into actual purchase behavior. Limitations include the cross-sectional design, which restricts causal inference, and the single-culture Jakarta sample, which limits generalizability.
Future research should employ longitudinal and cross-cultural designs to examine dynamic emotional attachment processes and test model robustness across diverse consumer contexts.