INOVTEK Polbeng - Seri Informatika
Vol. 10 No. 3 (2025): November

Phishing Website Detection Using a Machine Learning Classification Approach

Ibnu Arifin
Chairani



Article Info

Publish Date
16 Sep 2025

Abstract

Phishing is an increasingly prevalent form of cybercrime, with millions of attacks recorded annually. This study develops a phishing website detection model using a machine learning classification pipeline comprising data preprocessing, feature selection, and model validation. The dataset, obtained from the UCI Machine Learning Repository, consists of 235,795 URLs, of which 100,945 are phishing and 134,850 are non-phishing, a relatively balanced distribution. After data cleaning and feature selection, 21 features were retained, all screened to avoid potential data leakage. Two algorithms, decision tree and random forest, were evaluated using 10-fold cross-validation to minimize data leakage and ensure reliable model evaluation. The decision tree achieved a slightly higher average accuracy (98.02%) than random forest (97.78%). However, random forest discriminated between the classes better, with a ROC-AUC of 99.73% and a PR-AUC of 99.78%, compared with 99.49% and 99.40% for the decision tree. The Wilcoxon test further confirmed that the performance difference between the two algorithms is statistically significant. Overall, although the decision tree demonstrates strong classification performance, random forest proves more consistent and reliable in detecting phishing websites, making it the preferable choice in the context of cybersecurity.
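The evaluation protocol described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. It assumes that the 10 folds are stratified by class and that the Wilcoxon test is the paired signed-rank variant applied to per-fold score differences; the abstract states neither. The function names (`fold_sizes`, `stratified_fold_sizes`, `wilcoxon_exact`) are hypothetical, and the differences passed to `wilcoxon_exact` are placeholders, not the study's per-fold results.

```python
from itertools import product


def fold_sizes(n_samples, k):
    """Split n_samples into k folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    return [base + 1 if i < extra else base for i in range(k)]


def stratified_fold_sizes(class_counts, k):
    """Per-fold class counts when each class is split evenly (assumed stratification)."""
    per_class = {label: fold_sizes(n, k) for label, n in class_counts.items()}
    return [{label: sizes[i] for label, sizes in per_class.items()} for i in range(k)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def wilcoxon_exact(diffs):
    """Two-sided exact Wilcoxon signed-rank test on paired differences.

    Returns (W, p) where W = min(W+, W-). Zero differences are dropped.
    The p-value enumerates all 2**n sign assignments, so keep n small.
    """
    nonzero = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w = min(w_plus, total - w_plus)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w:
            hits += 1
    return w, hits / 2 ** len(ranks)


# Class counts taken from the abstract; 10 folds as in the study.
folds = stratified_fold_sizes({"phishing": 100945, "non_phishing": 134850}, 10)
print(folds[0], folds[-1])

# Placeholder differences (model A minus model B) that all share one sign.
w_stat, p_value = wilcoxon_exact(list(range(1, 11)))
print(w_stat, p_value)
```

Under these assumptions each fold holds 13,485 non-phishing URLs and either 10,095 or 10,094 phishing URLs. With only 10 paired folds, the smallest two-sided exact p-value the signed-rank test can produce is 2/1024, reached when one model wins on every fold.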

Copyrights © 2025






Journal Info

Subject

Computer Science & IT

Description

The Journal of Innovation and Technology (INOVTEK Polbeng—Seri Informatika) is a distinguished publication hosted by the State Polytechnic of Bengkalis. Dedicated to advancing the field of informatics, this scientific research journal serves as a vital platform for academics, researchers, and ...