This research investigates the security risks posed by phishing emails and compares the performance of three classification approaches: random forest, support vector machine (SVM), and XGBoost combined with k-fold cross-validation. The dataset consists of 18,634 emails, of which 7,312 are labeled as phishing and 11,322 as safe. In the experiments, XGBoost with k-fold cross-validation performed best, achieving the highest accuracy of 0.9713 (about 97.13%). A graph of the email classification results shows how the predictions are distributed across the two classes, which helps in understanding patterns and trends in phishing detection. The ROC curve analysis shows that XGBoost with k-fold cross-validation attains a higher AUC than random forest and SVM, indicating a better ability to distinguish phishing emails from safe ones. The study concludes that combining k-fold cross-validation with XGBoost is an effective way to enhance email security, and that accuracy may improve further through parameter tuning.
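As a reading aid, the evaluation ideas named above (k-fold cross-validation, accuracy, and AUC) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the default of five folds, the fixed seed, and the `fit_predict` callable are all hypothetical, and no particular classifier library is assumed.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_val_accuracy(features, labels, fit_predict, k=5, seed=0):
    """Mean test accuracy over k folds.

    fit_predict is a hypothetical callable
    (train_X, train_y, test_X) -> list of predicted labels.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k, seed):
        preds = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        scores.append(accuracy([labels[i] for i in test_idx], preds))
    return sum(scores) / len(scores)
```

Any classifier can be plugged in through `fit_predict`, so the same loop could in principle compare random forest, SVM, and XGBoost on identical folds. The pairwise AUC above is quadratic in the number of samples and is meant for illustration only; a sort-based computation would be preferable on a dataset of this size.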
Copyright © 2024