Spam emails, sent in bulk to large numbers of addresses, are a persistent nuisance. Combating them requires effective filtering, which can be framed as a classification task that separates spam from non-spam messages. One way to do this is to build an anti-spam model on text-mining features such as TF-IDF. Following the Knowledge Discovery in Databases (KDD) process, a study analyzed a dataset of 6046 entries, of which 77.2% were non-spam and 22.8% were spam. Logistic Regression achieved the highest accuracy at 98%, outperforming Support Vector Machine (95%) and Decision Tree (59%). On this dataset, Logistic Regression was therefore the best-performing algorithm for email classification.
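To make the TF-IDF step concrete, the following is a minimal sketch using only the Python standard library. The three toy emails and the names `tokenize` and `tf_idf` are illustrative assumptions, not taken from the study, and the weighting shown (relative term frequency times log(N/df)) is only one common variant; the study's exact formula and preprocessing are not stated here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, for illustration only.
emails = [
    "win a free prize now",
    "meeting agenda for monday",
    "free prize inside claim now",
]
weights = tf_idf(emails)
```

Terms that occur in fewer documents receive larger weights: in the first email, "win" (found in one email) is weighted above "free" (found in two). Vectors of such weights are the kind of input that a classifier such as Logistic Regression, Decision Tree, or Support Vector Machine would then be trained on.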
Copyright © 2024