Garuda - Garba Rujukan Digital

Brilliance: Research of Artificial Intelligence

Vol. 6 No. 1 (2026): Brilliance: Research of Artificial Intelligence, Article Research May 2026

Rizi, Muhammad Alfa (Unknown)
Rachmat, Nur (Unknown)

Publish Date
19 Jan 2026

Spam email remains a significant problem in digital communication, particularly for Indonesian-language emails, because of linguistic complexity, informal writing styles, and similarities between spam and legitimate (ham) messages. These factors often reduce the effectiveness of traditional spam filtering techniques. This study evaluates the performance of the Support Vector Machine (SVM) algorithm for classifying Indonesian spam emails using a combination of Term Frequency–Inverse Document Frequency (TF-IDF) and N-gram features. The proposed approach applies a text preprocessing pipeline (case folding, text cleaning, tokenization, stopword removal, and stemming) to reduce noise and improve feature representation. Text data are transformed into numerical vectors using TF-IDF with unigram and bigram configurations to capture both individual terms and the contextual phrase patterns commonly found in spam emails. A linear-kernel SVM is used as the classification model, and its performance is evaluated using K-Fold Cross-Validation to ensure robustness and reduce evaluation bias. The model is assessed using accuracy, precision, recall, and F1-score. Experiments are conducted on the Indonesian Email Spam Dataset, consisting of 2,636 emails: 1,368 spam messages and 1,268 non-spam (ham) messages. Across 10-fold cross-validation, the proposed model achieved an average accuracy of 98.71%, precision of 98.34%, recall of 99.20%, and F1-score of 98.76%. This study contributes an efficient and lightweight spam detection model for Indonesian-language emails and provides empirical evidence that SVM combined with TF-IDF and N-gram features remains a reliable alternative to more complex deep learning approaches for medium-sized text datasets.
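The feature extraction and the evaluation metrics described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. The helper names (`ngrams`, `tfidf`, `precision_recall_f1`), the whitespace tokenizer, the smoothed IDF formula, the L2 normalisation, and the three toy messages are all assumptions made for illustration; stopword removal, stemming, SVM training, and cross-validation are left out. The last lines compute the harmonic mean of the reported average precision and recall, which comes out close to the reported average F1-score; an exact match is not expected, since averaging F1 per fold is not the same as combining averaged precision and recall.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all contiguous n-grams for n = 1..n_max, joined by spaces."""
    out = []
    for n in range(1, n_max + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


def tfidf(docs, n_max=2):
    """Return one L2-normalised TF-IDF vector (a dict) per document.

    Assumptions: case folding plus whitespace tokenisation only, and a
    smoothed IDF of ln((1 + N) / (1 + df)) + 1. The abstract does not
    state which TF-IDF variant was used.
    """
    term_lists = [ngrams(doc.lower().split(), n_max) for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up toy messages (not taken from the Indonesian Email Spam Dataset).
docs = [
    "menang hadiah gratis sekarang",
    "rapat proyek besok pagi",
    "hadiah gratis klik sekarang",
]
vectors = tfidf(docs)
print(sorted(vectors[0]))  # unigram and bigram features of the first message

# Harmonic mean of the reported average precision and recall (in percent).
p_avg, r_avg = 98.34, 99.20
f1_from_averages = 2 * p_avg * r_avg / (p_avg + r_avg)
print(round(f1_from_averages, 2))
```

In this sketch the bigram `hadiah gratis` appears as its own feature next to the unigrams `hadiah` and `gratis`, which is the kind of phrase-level pattern the unigram-plus-bigram configuration is meant to capture.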

Journal Info

Brilliance: Research of Artificial Intelligence

Abbrev

brilliance

Publisher

Information Technology and Science

Subject

Decision Sciences, Operations Research & Management; Mathematics; Other

Description

Brilliance: Research of Artificial Intelligence is a scientific journal. Brilliance is published twice in one year, namely in February, May and November. Brilliance aims to promote research in the field of Informatics Engineering, with a focus on publishing quality papers about the latest ...

Indonesian-Language Spam Email Classification Using Support Vector Machine
