Garuda - Garba Rujukan Digital

P-Index (2021 - 2026): 0.23

Journals this author has published in: Paradigma

Katamsyi, Kaifa Ahlal

Unknown Affiliation

Author-ID : 9984148

Computer Science & IT

Published: 1 document

Articles

Comparative Analysis of Email Spam Detection Using SVM with TF-IDF and Word2Vec on Multilingual Datasets
Authors: Katamsyi, Kaifa Ahlal; Akbar, Ahmad Taufiq; Nurkholis, Andi; Prapcoyo, Hari; Akbar, Bagus Muhammad; Saifullah, Shoffan
Paradigma - Jurnal Komputer dan Informatika Vol. 28 No. 1 (2026): March 2026 Period
Publisher : LPPM Universitas Bina Sarana Informatika

DOI: 10.31294/p.v28i1.12339

The rapid growth of email communication has increased the prevalence of spam emails, which can disrupt productivity and compromise information security. This study presents a comparative analysis of two text representation methods—TF-IDF and Word2Vec—for spam email classification using a Support Vector Machine (SVM) with a Radial Basis Function kernel. The experiments utilized Indonesian and English email datasets totaling 5,421 emails, split into 75% training and 25% testing sets. Two scenarios were evaluated: baseline with default parameters and after hyperparameter optimization using Grid Search combined with K-Fold Cross Validation. The results indicate that TF-IDF consistently outperformed Word2Vec across both languages, achieving the highest accuracy of 0.9562 on the English dataset after tuning. Word2Vec showed substantial improvement following parameter adjustment, reducing the performance gap with TF-IDF. The findings highlight the importance of hyperparameter optimization for enhancing the quality of feature representations and improving classification performance. This study also demonstrates that TF-IDF provides more stable results across different linguistic contexts, while Word2Vec benefits significantly from careful tuning. The results provide practical insights for implementing efficient spam email detection systems in multilingual environments. Future research could explore additional classifiers, deep learning approaches, and contextual embeddings to further improve classification accuracy and robustness.
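To make the TF-IDF representation and the RBF kernel named in the abstract concrete, the following is a minimal standard-library sketch. It is not the authors' implementation: the abstract does not state which library, tokenizer, or TF-IDF variant was used. The smoothed IDF formula and L2 normalization here are assumptions chosen for illustration, and the three toy messages, the helper names, and the `gamma` value are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (the abstract does not
    # describe the preprocessing that was actually used).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for a list of texts."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vocab = sorted(df)

    # Assumed variant: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm if norm else 0.0 for v in vec])
    return vocab, vectors


def cosine(a, b):
    # The vectors are already unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, gamma):
    # RBF kernel: exp(-gamma * ||a - b||^2).
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


# Hypothetical toy messages, not drawn from the study's datasets.
docs = [
    "win free prize now",
    "free prize claim now",
    "meeting agenda for monday",
]
vocab, vectors = tfidf_vectors(docs)
gamma = 1.0  # hypothetical value; the study tuned its parameters by grid search

sim_spam = rbf_kernel(vectors[0], vectors[1], gamma)
sim_ham = rbf_kernel(vectors[0], vectors[2], gamma)
print(sim_spam > sim_ham)
```

In this sketch the two spam-like messages share three terms, so their kernel value is higher than that of the spam-like message paired with the unrelated one. An SVM with an RBF kernel builds its decision boundary from similarities of this kind.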

Co-Authors: Akbar, Ahmad Taufiq; Akbar, Bagus Muhammad; Nurkholis, Andi; Prapcoyo, Hari; Saifullah, Shoffan
