Journal : MIND (Multimedia Artificial Intelligent Networking Database) Journal

RESULTANT: Data Preparation Techniques to Improve XGBoost Algorithm Performance
KURNIA RAMADHAN PUTRA; SOFIA UMAROH; NUR FITRIANTI; SATRIA NUGRAHA
MIND (Multimedia Artificial Intelligent Networking Database) Journal Vol 8, No 1 (2023): MIND Journal
Publisher : Institut Teknologi Nasional, Bandung

DOI: 10.26760/mindjournal.v8i1.42-51

Abstract

Credit scoring predictions are now widely used in peer-to-peer lending services by financial technology companies. One of the technologies used for credit scoring is data mining with the XGBoost machine learning algorithm, which has a high degree of accuracy. We present RESULTANT, a technique for maximizing the results of one stage of data mining, namely data preparation. The dataset used is Lending Club data with a total of 2,260,701 records and 151 variables. The stages carried out in RESULTANT are feature selection, missing-value handling, outlier handling, and imbalanced-data handling. The RESULTANT stage produces 44 final variables that are ready to be used to build models with the XGBoost algorithm. The results show that RESULTANT improves the performance of the XGBoost algorithm, with accuracy 99.17%, precision 99.28%, recall 99.05%, specificity 99.29%, ROC/AUC 99.94%, and F1-score 99.17%.

Keywords: XGBoost, Data Preparation, Feature Selection, Missing Value, Outlier
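
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, specificity, and F1-score are conventionally derived from a binary confusion matrix. It is not the authors' code: the names `ConfusionMatrix` and `classification_metrics` and the example counts are hypothetical, chosen only for illustration. ROC/AUC is omitted because it requires ranked prediction scores rather than a single confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def classification_metrics(cm: ConfusionMatrix) -> dict:
    """Return the standard metrics as fractions in [0, 1].

    Assumes every denominator is nonzero, i.e. each class occurs
    and at least one positive prediction was made.
    """
    total = cm.tp + cm.fp + cm.tn + cm.fn
    precision = cm.tp / (cm.tp + cm.fp)
    recall = cm.tp / (cm.tp + cm.fn)  # also called sensitivity
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": cm.tn / (cm.tn + cm.fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not taken from the paper.
    example = ConfusionMatrix(tp=90, fp=10, tn=80, fn=20)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Reporting specificity alongside recall matters on imbalanced credit data, where accuracy alone can look high even if one class is rarely predicted correctly.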