Using genetic algorithm feature selection to optimize XGBoost performance in Australian credit
Pertiwi, Dwika Ananda Agustina; Ahmad, Kamilah; Salahudin, Shahrul Nizam; Annegrat, Ahmed Mohamed; Muslim, Much Aziz
Journal of Soft Computing Exploration Vol. 5 No. 1 (2024): March 2024
Publisher : SHM Publisher

DOI: 10.52465/joscex.v5i1.302

Abstract

To reduce credit risk, lending institutions need credit risk management practices that allow them to survive in the long term. Data mining is one technique used for credit risk management: classification methods can uncover information patterns in large datasets with a measurable level of accuracy. This research aims to increase the accuracy of classification algorithms in predicting credit risk by applying a genetic algorithm as the feature selection method, so that only the most important features are used to predict credit risk. The study applies the XGBoost classifier to the Australian credit dataset and evaluates it by accuracy and AUC. The results show an accuracy increase of 2.24%, reaching 89.93% after optimization with the genetic algorithm. Genetic algorithm feature selection can therefore improve the accuracy of the XGBoost algorithm on the Australian credit dataset.
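
The feature selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), all parameter values, and the names `ga_feature_selection`, `toy_fitness`, and `INFORMATIVE` are assumptions made for illustration. The toy fitness function stands in for what would, in a setup like the one described, be the cross-validated accuracy of an XGBoost classifier trained on the selected columns.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes fitness(mask).

    Returns (best_mask, best_score) over all evaluated individuals.
    Assumes n_features >= 2 and pop_size >= 3.
    """
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags, one flag per feature.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]

    def tournament(scored, k=3):
        # Draw k individuals at random and keep the fittest of them.
        return max(rng.sample(scored, k), key=lambda pair: pair[0])[1]

    best_score, best_mask = float("-inf"), None
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        gen_score, gen_best = max(scored, key=lambda pair: pair[0])
        if gen_score > best_score:
            best_score, best_mask = gen_score, gen_best[:]

        # Elitism: the best individual of this generation survives unchanged.
        next_pop = [gen_best[:]]
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                # One-point crossover: head of p1 joined to tail of p2.
                cut = rng.randint(1, n_features - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each flag flips with small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        population = next_pop
    return best_mask, best_score

# Hypothetical stand-in fitness: pretend these feature indices are the
# informative ones. A real run would instead return a classifier's
# cross-validated accuracy on the columns where mask[i] == 1.
INFORMATIVE = {0, 3, 7, 8, 12}

def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    noise = sum(1 for i, bit in enumerate(mask) if bit and i not in INFORMATIVE)
    return hits - 0.5 * noise

mask, score = ga_feature_selection(14, toy_fitness, seed=1)
print("selected feature indices:", [i for i, bit in enumerate(mask) if bit])
print("fitness:", score)
```

The mask length of 14 matches the number of attributes in the Australian credit dataset. Because the fitness function is passed in as a callable, swapping the toy function for a model-based score does not require changes to the search loop.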

A Performance Comparison of Data Balancing Model to Improve Credit Risk Prediction in P2P Lending
Pertiwi, Dwika Ananda Agustina; Ahmad, Kamilah; Unjung, Jumanto; Muslim, Much Aziz
Scientific Journal of Informatics Vol. 11 No. 4: November 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i4.14018

Abstract

Purpose: Imbalanced datasets often degrade the performance of classification models, including models for credit risk prediction in P2P lending. Several data balancing models have been applied in the existing literature to overcome this problem, but existing research evaluates only the performance of the classification model. In addition to measuring classifier performance, this study therefore examines the contribution of the data balancing models themselves: Random Oversampling (ROS), Random Undersampling (RUS), and Synthetic Minority Oversampling Technique (SMOTE).
Methods: This research uses the Lending Club dataset, which has an imbalance ratio (IR) of 4.098, two classifiers (LightGBM and XGBoost), and 10-fold cross-validation to assess the performance of ROS, RUS, and SMOTE. The models are evaluated using accuracy, recall, precision, and F1-score.
Result: SMOTE shows superior performance as a data balancing model in P2P lending. The LightGBM+SMOTE model reaches an accuracy of 92.56% and the XGBoost+SMOTE model 92.32%, both better than the other models.
Novelty: This research concludes that SMOTE is the best-performing data balancing model for improving credit risk prediction in P2P lending. It also finds that the larger the training sample, the better the classification model performs in predicting credit risk in P2P lending.
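
The three balancing models compared in this abstract can be sketched in plain Python. This is a simplified illustration, not the study's code: the function names, the toy data, and the neighbour count `k=3` are assumptions, and the SMOTE variant shown is the basic interpolation scheme for two classes. The sketch also computes the imbalance ratio as majority count divided by minority count, which is the usual definition and is assumed to be the one behind the reported IR of 4.098.

```python
import random
from collections import Counter

def imbalance_ratio(y):
    """Majority class count divided by minority class count."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())

def _split(X, y):
    """Return (majority_label, minority_label, rows grouped by label)."""
    counts = Counter(y)
    major = max(counts, key=counts.get)
    minor = min(counts, key=counts.get)
    groups = {label: [] for label in counts}
    for row, label in zip(X, y):
        groups[label].append(row)
    return major, minor, groups

def random_oversample(X, y, seed=0):
    """ROS: duplicate random minority rows until the classes are equal."""
    rng = random.Random(seed)
    major, minor, groups = _split(X, y)
    need = len(groups[major]) - len(groups[minor])
    extra = [rng.choice(groups[minor]) for _ in range(need)]
    return list(X) + extra, list(y) + [minor] * need

def random_undersample(X, y, seed=0):
    """RUS: keep a random majority subset the size of the minority class."""
    rng = random.Random(seed)
    major, minor, groups = _split(X, y)
    kept = rng.sample(groups[major], len(groups[minor]))
    labels = [major] * len(kept) + [minor] * len(groups[minor])
    return kept + groups[minor], labels

def smote(X, y, k=3, seed=0):
    """Basic SMOTE: interpolate between a minority row and one of its
    k nearest minority neighbours. Needs at least two minority rows."""
    rng = random.Random(seed)
    major, minor, groups = _split(X, y)
    pts = groups[minor]
    need = len(groups[major]) - len(pts)
    synthetic = []
    for _ in range(need):
        base = rng.choice(pts)
        # Rank the other minority rows by squared Euclidean distance.
        neighbours = sorted(
            (p for p in pts if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return list(X) + synthetic, list(y) + [minor] * need

# Toy data (hypothetical): 12 majority rows and 3 minority rows.
X = [[float(i), float(i % 3)] for i in range(12)]
X += [[20.0, 5.0], [21.0, 6.0], [22.0, 5.5]]
y = [0] * 12 + [1] * 3

print("IR before:", imbalance_ratio(y))
for name, balance in [("ROS", random_oversample),
                      ("RUS", random_undersample),
                      ("SMOTE", smote)]:
    _, y_bal = balance(X, y)
    print(name, dict(Counter(y_bal)))
```

ROS and SMOTE grow the dataset to the majority class size, while RUS shrinks it to the minority class size, so the balancing choice also changes how much training data the classifier sees. For real experiments, the imbalanced-learn package provides `RandomOverSampler`, `RandomUnderSampler`, and `SMOTE` classes that cover these cases.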