Jurnal Teknik Informatika C.I.T. Medicom
Vol 17 No 6 (2026): Computer Science

Comparative Study of CatBoost, XGBoost, Random Forest, and Decision Tree for Phishing Web Page Classification

Haryani, Haryani (Unknown)
Agustyaningrum, Cucu Ika (Unknown)



Article Info

Publish Date
11 Feb 2026

Abstract

Phishing is a form of fraud in which attackers use fake websites to steal user information such as login credentials and sensitive financial data. This study compares four machine learning algorithms, namely CatBoost, XGBoost, Random Forest, and Decision Tree, for classifying phishing websites efficiently and accurately. The study uses the Web Page Phishing Dataset. The workflow begins with data exploration and preprocessing (data cleaning, handling missing values, normalization, and feature selection), followed by testing. The data were split into training and test sets at a ratio of 80:20. The models were implemented in Python in Google Colaboratory. Model performance was evaluated using five metrics: accuracy, precision, recall, F1-score, and AUC. The experimental results indicate that CatBoost performed best, with 89.57% accuracy, 85.74% F1-score, 88.73% precision, 88.78% recall, and 89.00% AUC. XGBoost ranked second with very competitive performance, followed by Random Forest, which was relatively stable with an accuracy of 89.41% and an F1-score of 85.35%. The Decision Tree achieved the lowest performance, with an accuracy of 88.69% and an F1-score of 84.10%, which indicates its limitations in handling complex data and a tendency to overfit. Overall, ensemble boosting-based algorithms, especially CatBoost and XGBoost, outperform single trees in detecting phishing websites. These results should benefit the development of next-generation intelligent phishing detection systems based on machine learning. They also motivate future work focused on hyperparameter optimization, larger datasets, and real-time applications of phishing detection systems.
Furthermore, this work contributes to the application of ensemble algorithms in the cybersecurity field.
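
The evaluation protocol described in the abstract (an 80:20 split and the five metrics) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration, and the study itself presumably relied on library implementations of the classifiers and metrics.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it into 80% training and 20% test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores (hypothetical, not from the dataset).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

train_rows, test_rows = split_80_20(range(10))
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In a comparison like the one reported, the same held-out 20% would be scored once per model, so that the four classifiers are ranked on identical test data.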

Copyrights © 2026






Journal Info

Abbrev

JTI

Subject

Computer Science & IT

Description

The Jurnal Teknik Informatika C.I.T is a scientific journal of decision support systems, expert systems, and artificial intelligence, which includes scholarly writings on pure research and applied research in the field of information systems and information technology as well as a review-general review of ...