Phishing attacks delivered through malicious URLs have become a critical cybersecurity threat, causing substantial financial losses and data exposure worldwide. Conventional defenses such as blacklisting and rule-based detection often fall behind as phishing methods grow more sophisticated, particularly for zero-day phishing URLs. In this research, machine learning models based on Random Forest and Gradient Boosting are designed and tested to identify phishing URLs accurately. The dataset, obtained from Kaggle, consists of 11,430 URLs with extracted features describing URL characteristics such as length, subdomain count, HTTPS status, and domain age. Both models were trained and validated using stratified train-test splits and cross-validation. Model performance was evaluated with accuracy, precision, recall, F1-score, and ROC AUC. The experimental results show that Gradient Boosting slightly outperforms Random Forest, achieving an accuracy of 98.0%, a precision of 98.1%, and an F1-score of 98.0%. The best-performing model was integrated into a web application built with Streamlit that provides real-time phishing detection for end users. This research contributes to the development of adaptive and efficient phishing URL detection systems that strengthen cybersecurity defenses against evolving phishing threats, and the implementation demonstrates practical applicability and ease of use for non-expert users.
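To make the feature description above concrete, the following is a minimal sketch of how lexical features such as URL length, subdomain count, and HTTPS status might be derived from a raw URL using only the Python standard library. The function name `extract_url_features` and the feature keys are hypothetical and are not taken from the Kaggle dataset or from the authors' implementation. The subdomain count naively assumes that the last two host labels form the registered domain, which is wrong for suffixes such as `co.uk` and for IP-address hosts. Domain age is omitted because it requires an external WHOIS lookup.

```python
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Derive a few simple lexical features from a URL (illustrative only)."""
    parsed = urlparse(url)
    # hostname is None when the URL has no scheme or network location
    host = parsed.hostname or ""
    labels = [part for part in host.split(".") if part]
    # Assumption: the last two labels are the registered domain,
    # so every label before them is counted as a subdomain.
    subdomain_count = max(len(labels) - 2, 0)
    return {
        "url_length": len(url),
        "hostname_length": len(host),
        "subdomain_count": subdomain_count,
        "uses_https": int(parsed.scheme == "https"),
        "num_dots": url.count("."),
        "has_at_symbol": int("@" in url),
    }


if __name__ == "__main__":
    for example in ("https://login.secure.example.com/account",
                    "http://example.com"):
        print(example, extract_url_features(example))
```

A dictionary like this could be computed per URL and stacked into a feature table before being passed to a classifier. A production system would more likely rely on a public-suffix list to separate subdomains from the registered domain.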
Copyright © 2025