Hanif Journal of Information Systems
Vol. 3 No. 1 (2025): August Edition

Detecting Zero-Width Characters Obfuscated in Phishing URLs using the XGBOOST Algorithm

Asadel, Ahmad (Unknown)
Zulherry, Andi (Unknown)



Article Info

Publish Date
31 Jan 2026

Abstract

Phishing attacks are among the most common and damaging cyber threats, and their techniques continue to evolve to become more sophisticated and harder to detect. One recent evasion method of concern is the use of Zero-Width Characters (ZWC): invisible Unicode characters inserted into URLs to deceive both traditional detection systems and human visual perception. This research aims to develop and evaluate an effective and reliable machine learning model for detecting phishing URLs that have been obfuscated with ZWC. The eXtreme Gradient Boosting (XGBoost) algorithm was chosen for its strong track record on complex data and its performance optimization capabilities. The study used a public dataset from Kaggle consisting of 11,430 URL samples, which was then modified through a feature engineering process. Specifically, 50% of the phishing URLs were injected with one of five types of ZWC (ZWSP, ZWNJ, ZWJ, RLM, LRM), and a dedicated binary feature was created to flag the presence of these characters. Initial training revealed signs of minor overfitting. Consequently, the max_depth and min_child_weight hyperparameters were tuned to produce a more robust model. The final model was evaluated on a held-out test set comprising 20% of the data and achieved an Accuracy of 97.24%, Precision of 97.03%, Recall of 97.37%, and an AUC score of 0.9972. The high Recall is particularly important because it indicates that the model misses few phishing URLs. These results indicate that an XGBoost-based approach with targeted feature engineering can be an effective solution against advanced phishing attacks.
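
The injection and flagging step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names (has_zwc, inject_zwc, obfuscate_half), the choice of a random insertion position, and the fixed seed are all assumptions, since the abstract only states that 50% of the phishing URLs received one of the five characters and that a binary feature flags their presence.

```python
import random

# The five invisible characters named in the abstract, by Unicode code point.
ZERO_WIDTH_CHARS = {
    "ZWSP": "\u200b",  # zero width space
    "ZWNJ": "\u200c",  # zero width non-joiner
    "ZWJ": "\u200d",   # zero width joiner
    "LRM": "\u200e",   # left-to-right mark
    "RLM": "\u200f",   # right-to-left mark
}
ZWC_SET = frozenset(ZERO_WIDTH_CHARS.values())


def has_zwc(url: str) -> int:
    """Binary feature: 1 if the URL contains any of the five characters, else 0."""
    return int(any(ch in ZWC_SET for ch in url))


def inject_zwc(url: str, rng: random.Random) -> str:
    """Insert one randomly chosen character into the URL.

    Assumption: the insertion position is random; the abstract does not
    say where the characters were placed.
    """
    zwc = rng.choice(sorted(ZWC_SET))
    pos = rng.randint(1, max(1, len(url) - 1))
    return url[:pos] + zwc + url[pos:]


def obfuscate_half(phishing_urls: list, seed: int = 0) -> list:
    """Inject a character into a randomly selected half of the phishing URLs."""
    rng = random.Random(seed)
    n = len(phishing_urls)
    chosen = set(rng.sample(range(n), n // 2))
    return [
        inject_zwc(url, rng) if i in chosen else url
        for i, url in enumerate(phishing_urls)
    ]
```

In a pipeline of this shape, has_zwc would be applied to every URL and stored as one extra column alongside the dataset's existing features before the classifier is trained.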

Copyrights © 2025






Journal Info

Abbrev

hanif

Publisher

Subject

Computer Science & IT; Library & Information Science

Description

Hanif Journal of Information Systems aims to provide scientific literature, specifically studies of applied research in information systems (IS)/information technology (IT) and public review of the development of theory, methods, and applied sciences related to the subject. Hanif Journal of ...