Contact Name
Husni Teja Sukmana
Contact Email
husni@bright-journal.org
Phone
+62895422720524
Journal Mail Official
jads@bright-journal.org
Editorial Address
Gedung FST UIN Jakarta, Jl. Lkr. Kampus UIN, Cemp. Putih, Kec. Ciputat Tim., Kota Tangerang Selatan, Banten 15412
Location
Kota Adm. Jakarta Pusat,
DKI Jakarta
INDONESIA
Journal of Applied Data Sciences
Published by Bright Publisher
ISSN : -     EISSN : 2723-6471     DOI : doi.org/10.47738/jads
One of the current hot topics in science is data: how can datasets be used in scientific and scholarly research in a more reliable, citable and accountable way? Data is of paramount importance to scientific progress, yet most research data remains private. Enhancing the transparency of the processes applied to collect, treat and analyze data will help to render scientific research results reproducible and thus more accountable. The datasets themselves should also be accessible to other researchers, so that research publications, dataset descriptions, and the actual datasets can be linked. The journal Data provides a forum to publish methodical papers on processes applied to data collection, treatment and analysis, as well as for data descriptors publishing descriptions of a linked dataset.
Articles 553 Documents
Forecasting Bank Efficiency Using Data Envelopment Analysis with Directional Distance Functions and Machine Learning: Time-Series Validation and Shapley Value Interpretation Chau Dinh Linh
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1244

Abstract

This study develops a structured framework to forecast the operational efficiency of commercial banks in Vietnam. The analysis is based on a balanced panel of 27 banks over the period 2016–2024. Bank efficiency is first measured using a directional distance function within a data envelopment analysis framework (DEA – DDF). This approach incorporates both desirable outputs and undesirable outputs related to credit risk. The estimated efficiency scores are then used as prediction targets in several machine learning models. Model performance is evaluated under both conventional test settings and time-series cross-validation, and predictions are interpreted using Shapley value–based analysis (SHAP). Under a conventional test set, the gradient boosting model (XGBoost) shows the best performance, with a root mean squared error of 0.060 and a coefficient of determination (R²) of 0.353. However, when time-series cross-validation is applied to reflect realistic forecasting conditions, predictive accuracy declines sharply. The average coefficient of determination falls to approximately 0.005. This suggests that static validation can overstate performance and that forecasting efficiency in a changing financial environment remains difficult. The interpretation results provide additional insights. Net interest margin has a positive effect on predicted efficiency, although the effect weakens at very high levels. The cost-to-income ratio shows a threshold around 0.55, beyond which efficiency declines more strongly. Bank size has a largely neutral impact. The interaction between capital adequacy and profitability shows a conditionally negative pattern. Prediction errors are larger in the most recent year and among banks with very high efficiency scores. In summary, the results highlight both the potential and the limitations of machine learning in forecasting efficiency and emphasize the importance of time-aware validation.
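The contrast between a conventional test set and time-series cross-validation can be illustrated with a minimal sketch. The abstract does not state the exact splitting scheme, so the expanding-window design, the function name `expanding_window_splits`, and the `min_train_years` parameter below are assumptions for illustration only; only the 2016–2024 year range comes from the abstract.

```python
def expanding_window_splits(years, min_train_years=3):
    """Yield (train_years, test_year) pairs in chronological order.

    Each fold trains on every year strictly before the test year, so no
    future information reaches the model. Hypothetical helper, not the
    study's actual procedure.
    """
    ordered = sorted(set(years))
    for i in range(min_train_years, len(ordered)):
        yield ordered[:i], ordered[i]


# Panel period taken from the abstract: 2016 through 2024.
panel_years = range(2016, 2025)

for train_years, test_year in expanding_window_splits(panel_years):
    print(f"train {train_years[0]}-{train_years[-1]} -> test {test_year}")
```

A random train/test split, by contrast, can place observations from later years in the training set, which is one plausible reason static validation looks more favourable than time-aware validation.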
Utilization of K-means Clustering for Classifying Diabetes Risk Populations According to Health Behaviors and 3Es-2Ss Health Literacy Supaporn Yodmunee; Wongpanya S. Nuankaew; Thapanapong Sararat; Pratya Nuankew
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1042

Abstract

This study focused on classifying populations at risk for diabetes using K-means clustering integrated with the 3Es–2Ss health literacy framework: eating, exercise, emotion, smoking cessation, and alcohol cessation. Biological, behavioral, and health literacy data were analyzed. The dataset was collected from 126 participants identified as at-risk individuals in Ngao District, Lampang Province, Thailand. This relatively small, community-based sample provides valuable insights into local health behaviors but limits the generalizability and statistical power of the findings to broader populations. The K-means clustering analysis, guided by the Elbow method, identified k = 4 as the optimal number of clusters, yielding four distinct groups with different socio-demographic and health characteristics. These clusters revealed variations in health profiles, economic status, and behavioral literacy within the Thai population. Missing data and inconsistencies were systematically addressed through data cleaning and normalization to maintain analytical reliability despite the small sample size. The results suggest that K-means clustering can serve as an effective decision-support tool for public health planning, particularly for Non-Communicable Disease (NCD) prevention and diabetes management at the local level.
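The Elbow method mentioned in the abstract can be sketched as follows: run K-means for a range of k values, record the within-cluster sum of squares (inertia), and look for the k after which further increases yield little improvement. The example uses synthetic two-dimensional points with four well-separated groups, not the study's data; the helper names (`kmeans_inertia`, `make_blobs`) and all parameters are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def kmeans_inertia(points, k, iters=100, seed=0):
    """Run plain Lloyd's K-means once and return the final inertia."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (keep it if empty).
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def make_blobs(seed=42, per_blob=30):
    """Synthetic data: four Gaussian groups (an assumption, not study data)."""
    rng = random.Random(seed)
    blob_centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in blob_centers for _ in range(per_blob)]


points = make_blobs()
inertia_by_k = {k: kmeans_inertia(points, k) for k in range(1, 9)}
for k, inertia in inertia_by_k.items():
    print(k, round(inertia, 1))
```

Because a single run with random initialisation can settle in a local optimum, practical use typically repeats each k with several seeds and keeps the lowest inertia before reading off the elbow.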
Comparison of Multilingual Model Sensitivity for Political Fact Verification with Integrated Multi-Evidence Nova Agustina; Kusrini Kusrini; Ema Utami; Tonny Hidayat
Journal of Applied Data Sciences Vol 7, No 2: May 2026
Publisher : Bright Publisher

DOI: 10.47738/jads.v7i2.1198

Abstract

Political news is frequently targeted by the dissemination of fake news on social media, which can influence public opinion and undermine trust in democratic processes. The main challenge in addressing this issue lies in the limited sensitivity of cross-lingual fact verification models in capturing semantic relationships between claims and evidence in long-text, multi-evidence settings. Existing approaches often struggle to assess the relevance and quality of evidence, resulting in suboptimal verification performance. This study compares three multilingual Large Language Models (LLMs), namely mBERT, XLM-R, and LaBSE, for political fact verification using an integrated multi-evidence approach. Experiments are conducted on the PolitiFact dataset, with performance evaluated using sensitivity, accuracy, precision, and F1-score metrics. The results indicate that mBERT achieves the highest overall sensitivity at 89.44%, followed by LaBSE at 81.81% and XLM-R at 78.81%. However, mBERT exhibits lower precision, whereas LaBSE provides a better balance between precision (87.02%) and accuracy (86.46%), resulting in an F1-score of 84.33%. XLM-R demonstrates lower sensitivity but maintains competitive precision (85.47%) and accuracy (84.60%), with an F1-score of 82.00%. Sensitivity analysis based on the number of evidence pieces reveals distinct model behaviors, where mBERT performs optimally with six pieces of evidence, XLM-R is more effective under limited evidence conditions, and LaBSE shows a stable and increasing sensitivity trend as the amount of evidence increases, indicating robustness in multi-evidence scenarios. Further statistical analysis shows that XLM-R has the lowest performance variance, while LaBSE statistically outperforms mBERT in several evaluation aspects. Overall, LaBSE is recommended as the most balanced model for multi-evidence-based political fact verification.
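The reported F1-scores can be cross-checked against the reported precision and sensitivity, since F1 is the harmonic mean of the two. The sketch below assumes the standard definition F1 = 2PR / (P + R) and uses only the percentages quoted in the abstract; the function name `f1_score` is illustrative. mBERT is omitted because the abstract does not give its precision.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# (precision %, sensitivity %, reported F1 %) as quoted in the abstract.
reported = {
    "LaBSE": (87.02, 81.81, 84.33),
    "XLM-R": (85.47, 78.81, 82.00),
}

for model, (precision, sensitivity, f1_reported) in reported.items():
    f1_computed = f1_score(precision, sensitivity)
    print(f"{model}: computed {f1_computed:.3f}, reported {f1_reported:.2f}")
```

Both recomputed values agree with the reported F1-scores to within rounding of the published percentages.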