Comparative Analysis of Email Spam Detection Using SVM with TF-IDF and Word2Vec on Multilingual Datasets
Katamsyi, Kaifa Ahlal; Akbar, Ahmad Taufiq; Nurkholis, Andi; Prapcoyo, Hari; Akbar, Bagus Muhammad; Saifullah, Shoffan
Paradigma - Jurnal Komputer dan Informatika Vol. 28 No. 1 (2026): March 2026 Period
Publisher : LPPM Universitas Bina Sarana Informatika

DOI: 10.31294/p.v28i1.12339

Abstract

The rapid growth of email communication has increased the prevalence of spam, which can disrupt productivity and compromise information security. This study presents a comparative analysis of two text representation methods, TF-IDF and Word2Vec, for spam email classification using a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel. The experiments used Indonesian and English email datasets totaling 5,421 emails, split into 75% training and 25% testing sets. Two scenarios were evaluated: a baseline with default parameters, and a tuned configuration obtained through hyperparameter optimization using Grid Search combined with K-Fold Cross Validation. The results indicate that TF-IDF consistently outperformed Word2Vec in both languages, achieving the highest accuracy of 0.9562 on the English dataset after tuning. Word2Vec improved substantially after tuning, which narrowed its performance gap with TF-IDF. The findings highlight the importance of hyperparameter optimization for improving classification performance. The study also shows that TF-IDF gives more stable results across linguistic contexts, while Word2Vec benefits significantly from careful tuning. The results provide practical insights for implementing efficient spam email detection systems in multilingual environments. Future research could explore additional classifiers, deep learning approaches, and contextual embeddings to further improve classification accuracy and robustness.
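
To make the TF-IDF representation named in the abstract concrete, the following is a minimal pure-Python sketch of how term weights could be computed. It is not the authors' implementation: the abstract does not say which TF-IDF variant or tokenizer was used, so the smoothed inverse document frequency, the whitespace tokenizer, the function name `tfidf`, and the toy `emails` corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (tf * smoothed idf)."""
    # Assumption: lowercase + whitespace tokenization; the study's
    # preprocessing is not described in the abstract.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {}
        for term, count in counts.items():
            tf = count / len(tokens)  # relative term frequency
            # Smoothed idf (an assumed variant, not confirmed by the source).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            weights[term] = tf * idf
        vectors.append(weights)
    return vectors


# Hypothetical toy corpus, not data from the study.
emails = [
    "win a free prize now",
    "meeting agenda for monday",
    "free prize claim now now",
]
vectors = tfidf(emails)
```

In this sketch a term that occurs in fewer documents (such as "win") receives a larger weight than an equally frequent term that occurs in more documents (such as "free"). Vectors of this kind would then be passed to a classifier such as an RBF-kernel SVM.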