Hiola, Yani Prihantini
Unknown Affiliation

Published: 3 Documents
Articles

Performance Evaluation of Multinomial Logistic Regression, Random Forest, and XGBoost Methods in Data Classification
Mega Maulina; Hiola, Yani Prihantini; Indahwati; Aam Alamudi
Journal of Mathematics, Computations and Statistics Vol. 8 No. 2 (2025): Volume 08 Number 02 (October 2025)
Publisher : Jurusan Matematika FMIPA UNM

DOI: 10.35580/jmathcos.v8i2.8459

Abstract

The growing volume and complexity of data in the digital era increase the need for effective classification methods to support decision-making. Decision-making in classification tasks often requires methods that are well suited to the data and able to produce accurate, reliable predictions. As scientific knowledge advances, a wide range of classification methods has been developed. This study analyzes the performance of three commonly used classification methods, namely Multinomial Logistic Regression, Random Forest, and XGBoost, in handling diverse data characteristics. Ten public datasets were used, differing in the number of classes, features, and instances, and in whether the classes were balanced or imbalanced. Evaluation was based on accuracy, F1-score, precision, and recall. The results show that Random Forest consistently delivers the best performance, particularly on imbalanced data. XGBoost demonstrates superiority on more complex datasets, while Multinomial Logistic Regression proves more effective on relatively small datasets. This research provides insights into selecting appropriate classification methods based on data characteristics and highlights the effectiveness of ensemble-based approaches in handling diverse data. Based on the findings, it is recommended that the choice of classification algorithm be tailored to the characteristics of the dataset. Random Forest is preferable for imbalanced data, while XGBoost is suited to complex datasets requiring robust hyperparameter tuning. Multinomial Logistic Regression remains a viable option for simpler datasets with fewer observations and features. Future research could explore hybrid models that combine these approaches to further optimize classification performance across various domains.

School Accreditation Prediction Based on Literacy and Numeracy: Ordinal Logistic Regression vs KNN
Syukri, Nabila; Hiola, Yani Prihantini; Putri, Mega Ramatika; Susetyo, Budi
Bulletin of Computer Science Research Vol. 6 No. 1 (2025): December 2025
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/bulletincsr.v6i1.861

Abstract

School accreditation in Indonesia has traditionally relied on administrative inputs and institutional documentation, which often fail to capture the actual quality of student learning. In contrast, the National Assessment provides direct evidence of student literacy and numeracy outcomes, offering a more objective, outcome-based measure of educational quality. Using these results as predictors of accreditation rankings is therefore crucial, as they reflect the competencies most relevant to effective learning delivery. This study develops and evaluates classification models for school accreditation rankings using literacy and numeracy results as predictor variables. The dataset consists of secondary data from the 2023 and 2024 National School Assessments, covering 789 schools across four provinces: DKI Jakarta, Yogyakarta, Bali, and Banten. Two methods were applied, Ordinal Logistic Regression and K-Nearest Neighbors (K-NN), under two scenarios: with and without class imbalance handling. To address imbalance, two techniques were employed: Synthetic Minority Oversampling Technique (SMOTE) and Class Weight. The results indicate that K-NN consistently outperformed Ordinal Logistic Regression in both scenarios. On data without imbalance handling, K-NN achieved Accuracy, Precision, Recall, and F1-Score of 0.803, 0.705, 0.587, and 0.619, respectively. With imbalance treatment using SMOTE, the values were 0.753, 0.619, 0.686, and 0.644. While class balancing did not significantly improve overall accuracy, it enhanced the model’s ability to recognize minority classes. These findings highlight the strong relationship between literacy and numeracy outcomes and school accreditation status, demonstrating that outcome-based measures can complement traditional accreditation instruments and that conventional statistical approaches remain relevant for modeling school accreditation.

Evaluation of Tree-Based Models for Predicting Social Assistance Recipient Status Based on National Socio-Economic Survey (SUSENAS) 2024
Hiola, Yani Prihantini; Zulhijrah; Putra, I Gusti Ngurah Sentana; Limba, Syella Zignora; Sartono, Bagus; Firdawanti, Aulia Rizki; Susetyo, Budi; Dito, Gerry Alfa
Journal of Mathematics, Computations and Statistics Vol. 9 No. 1 (2026): Volume 09 Issue 01 (March 2026)
Publisher : Jurusan Matematika FMIPA UNM

DOI: 10.35580/xyyv0f37

Abstract

Poverty is a major socioeconomic challenge in Indonesia that affects the effectiveness of social protection programs. In response, the government has created social assistance programs to improve the welfare of the people. However, the distribution of social assistance is often considered inaccurate, so that households deemed eligible for social assistance are not identified as recipients. One way to improve the accuracy of distribution is to apply machine learning classification. Several tree-based models, namely LightGBM, Random Forest, and XGBoost, were selected because of their superior capabilities compared to classical models such as logistic regression, especially in handling complex data and fulfilling model assumptions. This study compares the performance of these three models in predicting social assistance recipient status using data from the 2024 West Java Provincial National Socioeconomic Survey (SUSENAS). Model evaluation was conducted under several data pre-processing scenarios involving outlier handling, class balancing, and feature engineering. The results show that LightGBM consistently outperforms the other models on six of the eight evaluation metrics used: Accuracy, Balanced Accuracy, F1-Score, ROC-AUC, PR-AUC, and Brier Score. SHAP analysis identifies Social Assistance History and Asset Score as the most influential features for model prediction. Friedman and Nemenyi nonparametric tests confirmed significant performance differences between LightGBM and the other models on the F1-Score, PR-AUC, and Brier Score metrics. These findings indicate that tree-based models, particularly LightGBM, can support the development of a more targeted and data-driven social assistance system.
Keywords: Social Assistance; Tree-Based; SHAP; SUSENAS; Hybrid Bayesian Optimization