Jurnal Nasional Teknik Elektro dan Teknologi Informasi
Vol 14 No 4: November 2025

Spam Email Classification Optimization With NLP-Based Naïve Bayes on TF-IDF and SMOTE

Andi Maslan
Azan Rahman
Umar Faruq
Rabei Raad Ali Al-Jawr



Article Info

Publish Date
28 Nov 2025

Abstract

The rapid advancement of information and communication technology has transformed the way people interact and exchange information. Among digital communication tools, email remains one of the most widely used; however, it is often exploited to send spam. Spam emails can contain phishing links, malware, or unsolicited advertisements, posing significant risks to individuals and organizations, so accurate and efficient spam detection methods are increasingly important. This study proposes a lightweight spam email classification approach that combines the naïve Bayes algorithm with term frequency-inverse document frequency (TF-IDF) feature extraction and the synthetic minority oversampling technique (SMOTE) to address class imbalance. The email text was normalized and vectorized through a series of preprocessing steps: tokenization, lemmatization, stopword removal, and TF-IDF transformation. SMOTE was applied only to the training set, which balances the class distribution while avoiding data leakage during evaluation. The naïve Bayes model initially achieved 88% accuracy, 100% precision, 86% recall, and a 92% F1 score. After SMOTE was properly applied to the training data, the model achieved 100% accuracy, precision, recall, and F1 score, correctly classifying every spam and non-spam (ham) email in the evaluation data. These results indicate that proper class balancing substantially improves the model's ability to detect spam emails. Overall, this study highlights the effectiveness of combining TF-IDF, naïve Bayes, and SMOTE as a robust yet computationally efficient solution for modern spam detection, particularly suited to real-time and resource-constrained environments.
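The abstract names the pipeline but not its implementation. The following is a minimal, from-scratch NumPy sketch under stated assumptions, not the authors' code: the stopword list, the toy emails, the smoothed IDF formula (the same one scikit-learn uses by default), `k=3` neighbours, and all class and function names (`TfidfSketch`, `smote`, `MultinomialNBSketch`) are hypothetical, and lemmatization is omitted to keep it dependency-free. It illustrates the point the abstract stresses: the TF-IDF vocabulary is fitted and SMOTE is applied on the training split only, while test emails are merely transformed.

```python
import re
import numpy as np

# Hypothetical names and toy data throughout; a sketch, not the paper's implementation.
# Hypothetical mini stopword list; a real pipeline would use a full list (e.g. NLTK's).
STOPWORDS = {"a", "an", "the", "to", "you", "your", "is", "are", "at", "on", "of", "and", "in", "for"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords (lemmatization omitted here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


class TfidfSketch:
    """TF-IDF with smoothed IDF, idf = ln((1 + n) / (1 + df)) + 1, rows L2-normalized."""

    def fit(self, docs):
        tokenized = [tokenize(d) for d in docs]
        words = sorted({t for toks in tokenized for t in toks})
        self.vocab = {w: i for i, w in enumerate(words)}
        df = np.zeros(len(self.vocab))
        for toks in tokenized:
            for w in set(toks):
                df[self.vocab[w]] += 1
        self.idf = np.log((1 + len(docs)) / (1 + df)) + 1
        return self

    def transform(self, docs):
        X = np.zeros((len(docs), len(self.vocab)))
        for r, d in enumerate(docs):
            for w in tokenize(d):
                if w in self.vocab:  # words unseen during fit are ignored
                    X[r, self.vocab[w]] += 1
        X *= self.idf
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        return X / np.where(norms == 0, 1, norms)


def smote(X, y, minority, k=3, seed=0):
    """Add synthetic minority rows until both classes have equal counts (binary case)."""
    rng = np.random.default_rng(seed)
    Xm = X[y == minority]
    needed = int(np.sum(y != minority)) - len(Xm)
    if needed <= 0:
        return X, y
    if len(Xm) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    dists = np.linalg.norm(Xm[:, None, :] - Xm[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    k = min(k, len(Xm) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = []
    for _ in range(needed):
        i = rng.integers(len(Xm))
        j = neighbours[i, rng.integers(k)]
        # New point lies on the segment between a minority sample and one of its neighbours.
        synthetic.append(Xm[i] + rng.random() * (Xm[j] - Xm[i]))
    return np.vstack([X, np.array(synthetic)]), np.concatenate([y, np.full(needed, minority)])


class MultinomialNBSketch:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing on TF-IDF weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.log_prior = np.log([np.mean(y == c) for c in self.classes])
        counts = np.array([X[y == c].sum(axis=0) for c in self.classes]) + self.alpha
        self.log_lik = np.log(counts / counts.sum(axis=1, keepdims=True))
        return self

    def predict(self, X):
        return self.classes[np.argmax(X @ self.log_lik.T + self.log_prior, axis=1)]


# Invented toy data (not the paper's dataset): 6 ham (0) vs 2 spam (1) training emails.
train_docs = [
    "meeting moved to monday morning", "please review the attached project report",
    "lunch with the team tomorrow", "can you send the lecture notes",
    "project deadline is next friday", "see you at the seminar",
    "win a free prize now click here", "claim your free cash reward now",
]
train_y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
test_docs = ["free prize click now", "project meeting on friday"]

vec = TfidfSketch().fit(train_docs)                 # vocabulary and IDF from training text only
X_train = vec.transform(train_docs)
X_bal, y_bal = smote(X_train, train_y, minority=1)  # resample the training split only
model = MultinomialNBSketch().fit(X_bal, y_bal)
pred = model.predict(vec.transform(test_docs))      # test rows are transformed, never resampled
print(np.bincount(y_bal), pred)                     # prints: [6 6] [1 0]
```

In practice the same pipeline is usually assembled from library components, for example scikit-learn's `TfidfVectorizer` and `MultinomialNB` with imbalanced-learn's `SMOTE` inside `imblearn.pipeline.Pipeline`, which applies the sampler only during `fit`, so cross-validation folds and test data are never oversampled.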
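The initial figures reported in the abstract are mutually consistent: the F1 score is the harmonic mean of precision P and recall R, so with P = 1.00 and R = 0.86:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 1.00 \times 0.86}{1.00 + 0.86} = \frac{1.72}{1.86} \approx 0.925
```

which rounds to the reported 92%.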

Copyright © 2025






Journal Info

Abbrev

JNTETI

Subject

Computer Science & IT; Control & Systems Engineering; Electrical & Electronics Engineering; Energy Engineering
