Phishing is one of the most prevalent cybersecurity threats: attackers use malicious URLs to deceive users and steal sensitive information. This study proposes a URL-based phishing detection method built on the lightweight Transformer model TinyBERT and compares it with three baselines: a support vector machine (SVM) on character n-grams, a Random Forest on lexical URL features, and a character-level CNN (Char-CNN). The dataset consists of 49,750 URLs with multi-class labels (benign, defacement, malware, and phishing), which were binarized into phishing (label 1) and non-phishing (label 0). The data were divided by a stratified split into training, validation, and test sets in a 70%–15%–15% ratio. To address class imbalance, TinyBERT was trained with a loss function weighted by class weights. Evaluation used the confusion matrix, accuracy, precision, recall, and F1-score, together with ROC and Precision–Recall curves. In the experiments, TinyBERT achieved the best performance, with an accuracy of 0.9925, a phishing recall of 0.9512, and an F1-score of 0.9387. It also produced the fewest false negatives (22) among the compared models. These findings indicate that TinyBERT is more effective than the baselines at reducing the number of phishing URLs misclassified as non-phishing, which makes it the more suitable choice for URL-based phishing detection in cybersecurity systems.
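The data preparation and evaluation steps summarized above can be illustrated with a minimal sketch. It is not the implementation used in the study: the toy label counts, the function names, the random seed, and the "balanced" weighting heuristic (total samples divided by the number of classes times the class count) are all assumptions made for illustration, since the abstract does not state how the class weights were computed.

```python
import random
from collections import Counter, defaultdict


def binarize(label):
    """Map a multi-class label to 1 (phishing) or 0 (non-phishing)."""
    return 1 if label == "phishing" else 0


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return train/val/test index lists that preserve the class ratio."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def balanced_class_weights(labels):
    """Assumed heuristic: weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def phishing_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for the phishing class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical counts, not the study's data).
raw_labels = (["benign"] * 150 + ["defacement"] * 30
              + ["malware"] * 20 + ["phishing"] * 40)
y = [binarize(label) for label in raw_labels]

train_idx, val_idx, test_idx = stratified_split(y)
class_weights = balanced_class_weights([y[i] for i in train_idx])

# Tiny hand-made prediction example to exercise the metric function.
toy_metrics = phishing_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                               [1, 1, 0, 1, 0, 0, 0, 0])
```

In a weighted-loss setup of this kind, the minority phishing class receives the larger weight, so errors on phishing URLs contribute more to the training loss. The false-negative count (`fn`) is the quantity the study emphasizes, because it counts phishing URLs that the classifier lets through as non-phishing.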
Copyright © 2026