Pratama, Samuel Effendi
Universitas Multi Data Palembang

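Penerapan Algoritma Random Forest Berbasis Shap Feature Importance dan GridsearchCV Untuk Deteksi Phishing (Application of the Random Forest Algorithm Based on SHAP Feature Importance and GridSearchCV for Phishing Detection)
Pratama, Samuel Effendi; Udjulawa, Daniel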
Progresif: Jurnal Ilmiah Komputer Vol 22, No 1 (2026): Januari
Publisher: STMIK Banjarbaru

DOI: 10.35889/progresif.v22i1.3345

Abstract

The rapid growth of internet use in Indonesia has increased the risk of cyberattacks, particularly phishing. Phishing is a form of digital fraud in which links are disguised to resemble official websites in order to steal users' sensitive information. This study develops a phishing link detection model using a machine learning approach. The dataset consists of 11,430 URL entries from Mendeley Data, with features such as URL length, suspicious symbols, and subdomain levels. The Random Forest algorithm was chosen for its ability to handle high-dimensional data and its resistance to overfitting. Feature selection was performed using SHAP (SHapley Additive exPlanations) to assess feature contributions, while model parameters were optimized with GridSearchCV. The best configuration, RF + GS + SHAP Threshold-P10, achieved an accuracy of 0.9650 and an F1-score of 0.9651, yielding an accurate, efficient, and interpretable phishing detection model.

Keywords: Phishing; Random Forest; GridSearchCV; SHAP; Machine Learning
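
The abstract names URL length, suspicious symbols, and subdomain levels among the dataset's features. As a rough illustration of what such lexical features might look like, here is a minimal Python sketch using only the standard library. The function name `lexical_features`, the symbol set `SUSPICIOUS_SYMBOLS`, and the rule that the last two host labels form the registered domain are all assumptions made for this example; they are not taken from the paper or from the Mendeley dataset's feature definitions.

```python
from urllib.parse import urlsplit

# Assumption: an illustrative symbol set, not the dataset's own definition.
SUSPICIOUS_SYMBOLS = "@-_%=?&~"


def lexical_features(url: str) -> dict:
    """Compute three simple lexical features from a URL string."""
    # hostname is None when the URL has no scheme/netloc; fall back to "".
    host = urlsplit(url).hostname or ""
    labels = [part for part in host.split(".") if part]

    # Assumption: the last two labels are the registered domain.
    # This is wrong for multi-part suffixes (e.g. "co.id") and for IP hosts.
    subdomain_levels = max(len(labels) - 2, 0)

    return {
        "url_length": len(url),
        "suspicious_symbols": sum(url.count(ch) for ch in SUSPICIOUS_SYMBOLS),
        "subdomain_levels": subdomain_levels,
    }


if __name__ == "__main__":
    # Hypothetical URL, used only to show the call.
    print(lexical_features("http://login.secure.example.com/verify?id=1@x"))
```

A feature table built this way could then be passed to a classifier. The study's actual feature set, SHAP-based selection, and GridSearchCV settings are not given in the abstract and are not reproduced here.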