Jurnal FASILKOM (teknologi inFormASi dan ILmu KOMputer)
Vol. 16 No. 1 (2026)

Lead Scoring Prediction for Sales Optimization Using Random Forest and the SMOTE Technique (Prediksi Lead Scoring untuk Optimasi Penjualan Menggunakan Random Forest dan Teknik SMOTE)

Pratama Putra, Daffa
Agil Kusuma, Dimas
Al Akbar, M. Rizki
Ibrahim, Ali
Fathoni, Fathoni



Article Info

Publish Date
30 Apr 2026

Abstract

Accurate lead scoring has become a strategic necessity for organizations in data-driven marketing environments, because it allows high-value customer prospects to be identified systematically and sales conversion efficiency to be maximized. A fundamental challenge for conventional classification models is the class imbalance inherent in real-world marketing data, which biases predictions toward the majority class and substantially reduces sensitivity to minority-class prospects. This study proposes a Random Forest (RF)-based lead scoring prediction model integrated with the Synthetic Minority Over-sampling Technique (SMOTE) to address this limitation. The dataset is the Lead Scoring Dataset from Kaggle, comprising 9,240 customer prospect records from an educational company with a class imbalance ratio of 1.59:1. Preprocessing comprised missing value treatment, removal of attributes with more than 40% missing values, mode-based imputation, and categorical feature encoding. Following an 80:20 stratified split, SMOTE was applied exclusively to the training set to produce a balanced class distribution and prevent data leakage. The RF model was configured with n_estimators = 100, max_features = 'sqrt', and class_weight = 'balanced'. The proposed RF+SMOTE model achieved an accuracy of 88.80%, precision of 86.44%, recall of 84.13%, F1-Score of 85.27%, and AUC-ROC of 0.9453, outperforming the baseline on four of the five evaluation metrics. The most notable improvement was in recall, with a gain of 1.26 percentage points over the baseline. Stratified 5-Fold Cross-Validation confirmed robust generalization, with AUC-ROC values consistently between 0.94 and 0.95. These findings indicate that the hybrid RF+SMOTE approach improves detection of high-potential prospects while maintaining overall model stability for real-world Customer Relationship Management (CRM) deployment.
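The ordering described in the abstract (stratified split first, then SMOTE on the training portion only) can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the toy two-feature data, the helper names `stratified_split` and `smote`, the random seeds, and the choice of k = 5 neighbours are all assumptions made for the example, and the Random Forest training step is omitted.

```python
import random


def stratified_split(X, y, test_size=0.2, seed=0):
    """Hypothetical helper: split per class so both parts keep the class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(indices):
        return [X[i] for i in indices], [y[i] for i in indices]

    return pick(train_idx) + pick(test_idx)


def smote(X, y, minority, k=5, seed=0):
    """Hypothetical helper: binary SMOTE that balances the two classes.

    Each synthetic point lies on the line segment between a minority sample
    and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_pts)
    n_needed = n_majority - len(minority_pts)

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority_pts)
        others = [p for p in minority_pts if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


# Toy data (assumption): 159 majority vs 100 minority records, i.e. 1.59:1.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(159)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(100)]
y = [0] * 159 + [1] * 100

# 1) Stratified 80:20 split. 2) Oversample the training part only.
X_tr, y_tr, X_te, y_te = stratified_split(X, y, test_size=0.2)
X_bal, y_bal = smote(X_tr, y_tr, minority=1)

print("train before SMOTE:", y_tr.count(0), y_tr.count(1))
print("train after SMOTE: ", y_bal.count(0), y_bal.count(1))
print("test (untouched):  ", y_te.count(0), y_te.count(1))
```

On this toy data the split leaves 127 majority and 80 minority records for training, so SMOTE adds 47 synthetic minority points, while the test set keeps its original imbalance. Because no synthetic point is derived from a test record, evaluation metrics are computed on real data only, which is the leakage concern the abstract refers to.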

Copyrights © 2026






Journal Info

Abbrev

JIK

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management

Description

Jurnal FASILKOM (teknologi inFormASi dan ILmu KOMputer) is intended as a medium for scientific studies presenting research results, ideas, and critical analyses in Systems Engineering, Informatics Engineering, Information Technology, Computer Engineering, Informatics Management, and ...