Jurnal Teknologi Informasi : Jurnal Keilmuan dan Aplikasi Bidang Teknik Informatika
Vol. 19 No. 2 (2025): Jurnal Teknologi Informasi : Jurnal Keilmuan dan Aplikasi Bidang Teknik Inform

SPAM EMAIL CLASSIFICATION USING SUPPORT VECTOR MACHINE (SVM) AND TF-IDF: A CASE STUDY WITH THE TREC 2007 AND ENRON-SPAM DATASETS

Paramartha, I Gusti Ngurah Darma
Sudestra, I Made Ardi
Gama, Adie Wahyudi Oktavia
Prathama, Gede Humaswara



Article Info

Publish Date
31 Aug 2025

Abstract

Spam email is a substantial problem in the digital landscape, burdening users with unsolicited messages. This study describes the use of a Support Vector Machine (SVM) combined with a TF-IDF vectorizer to classify emails as spam or non-spam. The model was developed using two publicly available, pre-processed datasets: the TREC 2007 Public Spam Corpus and the Enron-Spam Dataset. By combining TF-IDF, which assigns greater weight to infrequent yet relevant terms, with SVM, which is well established for text classification, the model achieves an accuracy of 99.04%, a precision of 98.57%, and a recall of 99.62%. These results indicate that the model detects spam reliably while keeping false positives low, which is critical for real-world applications where legitimate emails must not be misclassified as spam. The study also explains the rationale for choosing TF-IDF and SVM for spam email classification and reports evaluation results that are consistent with existing literature, in which the combination of SVM and TF-IDF has shown strong performance in spam detection.
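To make the pipeline in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a whitespace tokenizer, a smoothed IDF formula, and a linear SVM trained by plain subgradient descent on the hinge loss without a bias term. The abstract does not state the kernel, tokenizer, or hyperparameters used in the study. The function names (`tokenize`, `fit_idf`, `tfidf_vector`, `train_linear_svm`, `evaluate`), the values of `lam`, `eta`, and `epochs`, and the example messages are all hypothetical, and the messages are not taken from TREC 2007 or Enron-Spam.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Learn a vocabulary index and smoothed IDF weights from tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {term: i for i, term in enumerate(sorted(df))}
    # Rare terms get a larger weight; the +1 terms avoid division by zero.
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1.0 for term in vocab}
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """Turn one tokenized doc into an L2-normalized sparse vector {index: weight}."""
    counts = Counter(term for term in doc if term in vocab)
    vec = {vocab[t]: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {i: w / norm for i, w in vec.items()}


def score(w, x):
    """Dot product of the dense weight list with a sparse feature vector."""
    return sum(w[i] * v for i, v in x.items())


def train_linear_svm(X, y, dim, lam=0.01, eta=0.1, epochs=50):
    """Subgradient descent on the L2-regularized hinge loss (labels +1 / -1)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, label in zip(X, y):
            violated = label * score(w, x) < 1  # inside the margin or misclassified
            w = [wi * (1 - eta * lam) for wi in w]  # step on the L2 penalty
            if violated:
                for i, v in x.items():
                    w[i] += eta * label * v  # step on the hinge term
    return w


def predict(w, x):
    return 1 if score(w, x) > 0 else -1


def evaluate(y_true, y_pred):
    """Accuracy, precision, and recall with spam (+1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy, made-up messages (NOT drawn from TREC 2007 or Enron-Spam).
train_texts = [
    "win free prize now", "free cash offer click", "claim prize money now",
    "meeting agenda attached", "project report due friday", "lunch meeting on friday",
]
train_labels = [1, 1, 1, -1, -1, -1]  # +1 = spam, -1 = non-spam

train_docs = [tokenize(t) for t in train_texts]
vocab, idf = fit_idf(train_docs)
X_train = [tfidf_vector(d, vocab, idf) for d in train_docs]
w = train_linear_svm(X_train, train_labels, dim=len(vocab))

test_texts = ["free prize offer", "friday project meeting"]
test_labels = [1, -1]
X_test = [tfidf_vector(tokenize(t), vocab, idf) for t in test_texts]
predictions = [predict(w, x) for x in X_test]
accuracy, precision, recall = evaluate(test_labels, predictions)
```

Precision is the share of messages flagged as spam that really are spam, so a high value corresponds to few legitimate emails being misclassified. Recall is the share of actual spam that is caught. The toy data is trivially separable, so its scores say nothing about the figures reported in the abstract.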

Copyrights © 2025






Journal Info

Abbrev

JTI

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Electrical & Electronics Engineering; Engineering; Library & Information Science

Description

Jurnal Teknologi Informasi (JTI) is the journal of the Department of Informatics Engineering, Universitas Palangka Raya, with ISSN 1907-896X and E-ISSN 2656-0321. Jurnal Teknologi Informasi (JTI) is a journal of science and applications in the field of informatics engineering that presents research results focused on ...