Spam Email Classification Optimization With NLP-Based Naïve Bayes on TF-IDF and SMOTE
Andi Maslan; Azan Rahman; Umar Faruq; Rabei Raad Ali Al-Jawr
Jurnal Nasional Teknik Elektro dan Teknologi Informasi Vol 14 No 4: November 2025
Publisher: Department of Electrical and Information Engineering, Faculty of Engineering, Universitas Gadjah Mada.

DOI: 10.22146/jnteti.v14i4.20931

Abstract

The rapid advancement of information and communication technology has transformed the way people interact and exchange information. Email remains one of the most widely used digital communication tools, but it is often exploited to send spam. Spam emails can contain phishing links, malware, or unsolicited advertisements, posing significant risks to individuals and organizations, so accurate and efficient spam detection is increasingly important. This study proposes a lightweight spam email classification approach that combines the naïve Bayes algorithm with term frequency-inverse document frequency (TF-IDF) feature extraction and the synthetic minority oversampling technique (SMOTE) to address class imbalance. A series of preprocessing steps (tokenization, lemmatization, stopword removal, and TF-IDF transformation) was applied to normalize and vectorize the email text. SMOTE was applied only to the training dataset, which balances the class distribution while avoiding data leakage during evaluation. In the experiments, the naïve Bayes model initially achieved 88% accuracy, 86% recall, 100% precision, and a 92% F1 score. After proper application of SMOTE, the model achieved 100% accuracy, precision, recall, and F1 score, indicating perfect classification of spam and non-spam (ham) emails. These results indicate that proper class balancing significantly improves the model’s ability to detect spam emails. Overall, the study highlights the combination of TF-IDF, naïve Bayes, and SMOTE as a robust yet computationally efficient solution for modern spam detection, particularly suited to real-time and resource-constrained environments.
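
The abstract stresses that oversampling is done on the training data only. A minimal, standard-library-only sketch of that ordering is given here. It is not the authors' implementation: the function name `smote_oversample`, the neighbour count `k`, the seed, and the toy 2-D vectors (standing in for TF-IDF rows) are all illustrative assumptions. The held-out split is made first, and synthetic minority points are then interpolated from training samples alone, so no test information reaches the model.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy vectors in place of real TF-IDF features.
majority = [[0.1 * i, 0.2] for i in range(8)]  # larger class
minority = [[0.9, 0.1 * i] for i in range(4)]  # smaller class

# 1. Split FIRST, so the test set never influences the synthetic samples.
train_major, test_major = majority[:6], majority[6:]
train_minor, test_minor = minority[:3], minority[3:]

# 2. Oversample the training minority only, up to the training majority size.
n_new = len(train_major) - len(train_minor)
synthetic = smote_oversample(train_minor, n_new)
balanced_train_minor = train_minor + synthetic

# 3. The test split keeps its original, imbalanced distribution.
print(len(train_major), len(balanced_train_minor), len(test_major), len(test_minor))
```

In practice a maintained library such as imbalanced-learn would normally replace the hand-written interpolation; the point of the sketch is only the order of operations (split, then oversample the training portion).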