Paradigma
Vol. 28 No. 1 (2026): March 2026 Period

Comparative Analysis of Email Spam Detection Using SVM with TF-IDF and Word2Vec on Multilingual Datasets

Katamsyi, Kaifa Ahlal (Unknown)
Akbar, Ahmad Taufiq (Unknown)
Nurkholis, Andi (Unknown)
Prapcoyo, Hari (Unknown)
Akbar, Bagus Muhammad (Unknown)
Saifullah, Shoffan (Unknown)



Article Info

Publish Date
31 Mar 2026

Abstract

The rapid growth of email communication has increased the prevalence of spam, which can disrupt productivity and compromise information security. This study presents a comparative analysis of two text representation methods, TF-IDF and Word2Vec, for spam email classification using a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel. The experiments used Indonesian and English email datasets totaling 5,421 emails, split into 75% training and 25% testing sets. Two scenarios were evaluated: a baseline with default parameters, and a tuned configuration obtained through hyperparameter optimization using Grid Search combined with K-Fold Cross Validation. The results indicate that TF-IDF consistently outperformed Word2Vec in both languages, reaching the highest accuracy of 0.9562 on the English dataset after tuning. Word2Vec improved substantially after parameter adjustment, narrowing the performance gap with TF-IDF. The findings highlight the importance of hyperparameter optimization for enhancing the quality of feature representations and improving classification performance. The study also shows that TF-IDF gives more stable results across different linguistic contexts, while Word2Vec benefits significantly from careful tuning. These results offer practical guidance for implementing efficient spam email detection systems in multilingual environments. Future research could explore additional classifiers, deep learning approaches, and contextual embeddings to further improve classification accuracy and robustness.
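As a rough illustration of the two components named in the abstract, the sketch below builds TF-IDF vectors for a few toy emails and compares them with an RBF kernel, the similarity measure an RBF-kernel SVM works with. It is not the authors' implementation: the three example emails, the whitespace tokenisation, the smoothed IDF variant, the L2 normalisation, and the value `gamma=1.0` are all assumptions made for the example, and the function names `tfidf_vectors` and `rbf_kernel` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document (smoothed IDF, L2-normalised)."""
    # Assumption: lowercase + whitespace tokenisation stands in for real preprocessing.
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed, commonly used variant) keeps every weight positive.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def rbf_kernel(u, v, gamma=1.0):
    """RBF kernel exp(-gamma * ||u - v||^2) on sparse dict vectors."""
    terms = set(u) | set(v)
    sq_dist = sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms)
    return math.exp(-gamma * sq_dist)


# Toy data invented for this example; not taken from the study's datasets.
emails = [
    "win a free prize now",         # spam-like
    "claim your free prize today",  # spam-like
    "meeting agenda for monday",    # ham-like
]
vecs = tfidf_vectors(emails)
k_spam_spam = rbf_kernel(vecs[0], vecs[1])
k_spam_ham = rbf_kernel(vecs[0], vecs[2])
print(f"spam vs spam: {k_spam_spam:.4f}")
print(f"spam vs ham:  {k_spam_ham:.4f}")
```

In this toy setting the two spam-like emails share terms, so their kernel value is higher than that of the spam-like and ham-like pair. The kernel width `gamma` is the kind of parameter a grid search with cross-validation would tune, along with the SVM regularisation strength.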

Copyrights © 2026






Journal Info

Abbrev

paradigma

Subject

Computer Science & IT

Description

The Paradigma Journal is intended as a medium for scientific studies presenting research, ideas, and critical analysis of issues in Computer Science, Information Systems, and Information Technology, at both the national and international level. Its scientific articles cover theoretical reviews and empirical studies ...