Articles

Found 3 Documents

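PENERAPAN K-MODES DALAM KLASTERISASI KABUPATEN/KOTA DI JAWA BARAT BERDASARKAN INDIKATOR INFRASTRUKTUR (Application of K-Modes to Clustering Districts/Cities in West Java Based on Infrastructure Indicators)
Rahman, Abd.; Anadra, Rahmi; Fitrianto, Anwar; Erfiani, Erfiani; Dwi Jumansyah, L.M. Risman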
Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika, Vol. 5 No. 3 (2024)
Publisher : LPPM Universitas Bina Bangsa

DOI: 10.46306/lb.v5i3.787

Abstract

Clustering is a statistical method for grouping data by shared characteristics, and it is particularly useful when the data are complex and diverse. This study clusters the districts/cities of West Java Province on infrastructure indicators, namely access to clean water, sanitation, electricity, and energy, using the K-Modes clustering method. The data are categorical and come from SUSENAS West Java 2023. The analysis produced four distinct clusters, each with markedly different infrastructure characteristics: the first contains 8 regions, the second 7, the third 1, and the fourth 11. These differences point to infrastructure disparities that should be addressed when planning more equitable development to improve the quality of life of people in West Java.
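As a rough illustration of the K-Modes idea named in the abstract above, the following is a minimal sketch and not the authors' implementation. It assumes Hamming dissimilarity between categorical records and seeds the modes with the first k distinct records; the attribute values and the toy records are hypothetical and do not come from SUSENAS.

```python
from collections import Counter


def hamming(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, max_iter=20):
    """Toy K-Modes: assign records to the nearest mode, then recompute modes."""
    # Assumption: seed with the first k distinct records. Real implementations
    # typically use a more careful initialisation.
    modes = []
    for r in records:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break

    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under Hamming dissimilarity.
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels

        # Update step: each mode takes the most frequent value per attribute.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes


# Hypothetical records: (water source, sanitation, electricity)
records = [
    ("piped", "good", "pln"),
    ("piped", "good", "pln"),
    ("well", "poor", "pln"),
    ("well", "poor", "none"),
    ("piped", "fair", "pln"),
    ("well", "poor", "pln"),
]
labels, modes = k_modes(records, k=2)
print(labels)
print(modes)
```

Each mode plays the role that a centroid plays in K-Means, but it is built from the most frequent category per attribute, which is why the method suits categorical survey data.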
Loan Approval Classification Using Ensemble Learning on Imbalanced Data
Anadra, Rahmi; Sadik, Kusman; Soleh, Agus M; Astari, Reka Agustia
Enthusiastic : International Journal of Applied Statistics and Data Science Volume 4 Issue 2, October 2024
Publisher : Universitas Islam Indonesia

DOI: 10.20885/enthusiastic.vol4.iss2.art1

Abstract

Loan processing is an important aspect of the financial industry, where the right decision must be made to approve or reject each loan. Default by loan applicants, however, has become a significant concern for financial institutions. This research therefore applies ensemble learning with the random forest and Extreme Gradient Boosting (XGBoost) algorithms, and handles the imbalanced data with the Synthetic Minority Over-sampling Technique (SMOTE). The aim was to improve accuracy and precision in credit risk assessment and to reduce human workload. Both algorithms used a dataset of 4,296 observations with 13 variables relevant to loan approval decisions. The research process involved data exploration, data preprocessing, data splitting, model training, model evaluation with accuracy, sensitivity, specificity, and F1-score, model selection with 10-fold cross-validation, and identification of important variables. The results showed that XGBoost with imbalanced data handling had the highest accuracy, 98.52%, with a good balance between sensitivity (98.83%), specificity (98.01%), and F1-score (98.81%). The most important variables in determining loan approval are credit score, loan term, loan amount, and annual income.
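The four evaluation metrics named in the abstract above all derive from a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not taken from the study, and the function name is an assumption made for illustration.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting sensitivity and specificity alongside accuracy matters on imbalanced data, because a model that favours the majority class can score high accuracy while one of the two recalls collapses.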
Sentiment Analysis of Tokopedia Customer Reviews Using BiLSTM and IndoBERT with Comparative Analysis of Preprocessing and Labeling Methods
Anadra, Rahmi; Wijayanto, Hari; Sadik, Kusman
International Journal of Advances in Data and Information Systems, Vol. 6 No. 3 (2025): December 2025
Publisher : Indonesian Scientific Journal

DOI: 10.59395/ijadis.v6i3.1458

Abstract

This study addresses key challenges in Indonesian sentiment analysis related to preprocessing, labeling strategies, and class imbalance. It compares the performance of BiLSTM and IndoBERT using user reviews collected from Tokopedia. The dataset was manually and automatically labeled, then processed under three preprocessing schemes. Both models were trained with tuned hyperparameters and imbalance-handling techniques and evaluated through twenty rounds of stratified five-fold cross-validation. Performance was assessed using balanced accuracy and F1-score. IndoBERT achieved the highest results, with balanced accuracy up to 0.85 and F1-scores up to 0.83, while BiLSTM reached balanced accuracy up to 0.78 and F1-scores up to 0.76. Applying class weight and focal loss improved model performance by approximately 2% to 11% over the baseline. BiLSTM demonstrated greater training efficiency, requiring only 1 to 2.5 minutes per epoch, compared with IndoBERT’s 2.6 to 3.6 minutes. Although manual labeling remained superior in capturing contextual nuance and emotional cues, GPT-based labeling showed strong agreement with the human annotations. A four-way ANOVA revealed that all main factors and several interactions significantly influenced classification outcomes. Overall, BiLSTM provides faster training efficiency, whereas IndoBERT delivers higher predictive accuracy.
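Two of the ideas named in the abstract above, balanced accuracy and focal loss, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: balanced accuracy is taken as the mean of per-class recall, focal loss uses the common single-example form with a focusing parameter `gamma` and a class weight `alpha`, and the labels and probabilities are hypothetical.

```python
import math


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one example, given the probability of its true class.

    With gamma = 0 and alpha = 1 this reduces to ordinary cross-entropy.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# Hypothetical imbalanced labels: 8 negative reviews, 2 positive ones.
y_true = ["neg"] * 8 + ["pos"] * 2
y_pred = ["neg"] * 8 + ["neg", "pos"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, bal_acc)

# Confident correct predictions are down-weighted more than uncertain ones.
easy = focal_loss(0.9)
hard = focal_loss(0.6)
print(easy, hard)
```

In the toy labels, plain accuracy looks strong because the majority class dominates, while balanced accuracy exposes the missed minority example. Focal loss addresses the same imbalance during training by shrinking the contribution of examples the model already classifies confidently.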