This study compares the effectiveness of four machine learning models for email classification: Support Vector Machine (SVM), Decision Tree, Naive Bayes, and Neural Network. The datasets were obtained from the Kaggle website. The first dataset contains 18,650 emails (7,328 phishing and 11,322 non-phishing). The second dataset was formed by merging two different datasets of Indonesian spam emails, giving a total of 4,681 emails (2,670 spam and 2,011 non-spam). The merging was done to obtain a more representative amount of data for model evaluation. Averaged across the two datasets, the Neural Network achieved the highest accuracy at 96.60%, followed closely by SVM at 96.43%. Decision Tree reached a fairly high average accuracy of 92.38%, while Naive Bayes recorded the lowest at 90.22%. Although the Neural Network has the highest accuracy, other models may be more suitable depending on the needs of the system. Naive Bayes, despite its lower accuracy, can be more useful in systems with computational limitations because of its efficiency. SVM offers a balance between high accuracy and computational efficiency, making it a strong choice for systems that require high performance without a heavy computational burden. Decision Tree is the most interpretable of the four, making it suitable for applications that require transparency in decision making.
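The accuracy figures above are easier to read against the class balance of each dataset. The short sketch below is not part of the study: it only checks that the class counts quoted in the abstract sum to the stated totals and computes the accuracy a trivial classifier would reach by always predicting the larger class. The dictionary keys and the function name `majority_baseline` are illustrative assumptions, and the baseline itself is not a result reported by the authors.

```python
# Class counts as quoted in the abstract; the key names are illustrative only.
datasets = {
    "phishing": {"positive": 7328, "negative": 11322},
    "indonesian_spam": {"positive": 2670, "negative": 2011},
}


def majority_baseline(counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(counts.values())
    return max(counts.values()) / total


for name, counts in datasets.items():
    total = sum(counts.values())
    baseline = majority_baseline(counts)
    print(f"{name}: {total} emails, majority-class baseline = {baseline:.2%}")
```

Under these counts the baseline comes to roughly 61% for the phishing dataset and 57% for the Indonesian spam dataset, so all four reported averages sit well above what class imbalance alone would give.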
Copyrights © 2025