Journal : Scientific Journal of Informatics

A Performance Comparison of Data Balancing Model to Improve Credit Risk Prediction in P2P Lending
Pertiwi, Dwika Ananda Agustina; Ahmad, Kamilah; Unjung, Jumanto; Muslim, Much Aziz
Scientific Journal of Informatics Vol. 11 No. 4: November 2024
Publisher : Universitas Negeri Semarang

DOI: 10.15294/sji.v11i4.14018

Abstract

Purpose: Imbalanced datasets often degrade the performance of classification models, including those used for credit risk prediction in P2P lending. The existing literature applies several data balancing models to address this problem, but it evaluates them only through the performance of the downstream classifier. In addition to measuring classifier performance, this study therefore examines the contribution of the data balancing models themselves: Random Oversampling (ROS), Random Undersampling (RUS), and Synthetic Minority Oversampling Technique (SMOTE).

Methods: This research uses the Lending Club dataset, which has an imbalance ratio (IR) of 4.098, two classifiers (LightGBM and XGBoost), and 10-fold cross-validation to assess the performance of ROS, RUS, and SMOTE. The models are evaluated using accuracy, recall, precision, and F1-score.

Result: SMOTE performs best as a data balancing model in P2P lending: the LightGBM+SMOTE model reaches an accuracy of 92.56% and the XGBoost+SMOTE model 92.32%, both higher than the other models.

Novelty: This research concludes that SMOTE is the best-performing of the compared data balancing models for improving credit risk prediction in P2P lending. It also finds that, in this case, the larger the training sample, the better the classification model performs in predicting credit risk in P2P lending.
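
To make the three balancing models named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names (`imbalance_ratio`, `random_oversample`, `random_undersample`, `smote_like`), the neighbour count `k=3`, and the toy two-feature data are all assumptions made for illustration; the toy data has 41 majority and 10 minority samples, chosen only so that its imbalance ratio is near the reported 4.098. The SMOTE function is a simplified version of the idea (interpolating between a minority sample and one of its nearest minority neighbours).

```python
import random
from collections import Counter

def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def random_oversample(X, y, rng):
    """ROS: duplicate random rows of smaller classes up to the majority size."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, rng):
    """RUS: keep a random subset of each class, sized to the minority class."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out

def smote_like(X, y, rng, k=3):
    """Simplified SMOTE: interpolate toward one of k nearest minority neighbours."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    target = max(counts.values())
    pool = [x for x, lab in zip(X, y) if lab == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(target - counts[minority]):
        base = rng.choice(pool)
        # Rank the other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in pool if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_out.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
        y_out.append(minority)
    return X_out, y_out

# Toy data (hypothetical): 41 majority (label 0) and 10 minority (label 1) samples.
rng = random.Random(0)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(41)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(10)]
y = [0] * 41 + [1] * 10

X_ros, y_ros = random_oversample(X, y, rng)
X_rus, y_rus = random_undersample(X, y, rng)
X_smo, y_smo = smote_like(X, y, rng)

print("IR before balancing:", imbalance_ratio(y))
print("ROS:", Counter(y_ros), "RUS:", Counter(y_rus), "SMOTE:", Counter(y_smo))
```

All three functions return a dataset with equal class counts; they differ in whether they duplicate minority rows (ROS), discard majority rows (RUS), or synthesize new minority rows (SMOTE). When combined with cross-validation, a common practice (not stated in the abstract) is to resample only the training folds so that the held-out fold keeps its original class distribution.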