Jurnal Informatika
Vol. 12 No. 2 (2025): October

Comparative Study of DistilBERT and ELECTRA-Small Models in Spam Email Classification

Ferdy Agusman (Ministry of Finance of the Republic of Indonesia)



Article Info

Publish Date
03 Oct 2025

Abstract

Spam email detection is a challenging task in cybersecurity because spam content is highly variable. This variability makes spam hard to identify, and researchers have therefore developed a range of detection methods. Among these, Natural Language Processing (NLP) and machine learning techniques have shown strong results in classifying emails as spam or non-spam. Transformer-based models such as BERT have demonstrated high accuracy in text classification tasks. However, their computational requirements make them impractical in resource-limited environments. To mitigate this, smaller and more lightweight models, such as DistilBERT and ELECTRA-Small, have been developed. This paper presents a comparative study of DistilBERT and ELECTRA-Small for spam email classification. The objective is to evaluate the performance and computational efficiency of these two compact transformer architectures. Both models were fine-tuned on an email dataset comprising 5,728 samples. Our experimental results on the primary test set indicate that both models achieved an accuracy of almost 99%. However, when evaluated on a separate external validation set containing 10,000 emails, ELECTRA-Small achieved an accuracy of 86.53%, outperforming DistilBERT's 83.68%. Furthermore, ELECTRA-Small was more computationally efficient, with a training time of 00:02:00 compared to DistilBERT's 00:04:46. This study is one of the few to directly compare the performance and computational efficiency of these two models in the context of spam email detection, highlighting their potential as lightweight and effective solutions for real-world applications.
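To put the reported figures side by side, the short sketch below works through the arithmetic behind the comparison. It uses only the numbers stated in the abstract. The helper name `hms_to_seconds` and the variable names are illustrative, and two assumptions are made that the abstract does not confirm: that the training times are in HH:MM:SS format, and that each accuracy was computed over all 10,000 emails in the external validation set.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Figures reported in the abstract.
EXTERNAL_SET_SIZE = 10_000
accuracy_pct = {"DistilBERT": 83.68, "ELECTRA-Small": 86.53}
training_time = {"DistilBERT": "00:04:46", "ELECTRA-Small": "00:02:00"}

# Correct predictions implied by each accuracy
# (assumption: accuracy is computed over the full external set).
correct = {
    name: round(acc / 100 * EXTERNAL_SET_SIZE)
    for name, acc in accuracy_pct.items()
}
extra_correct = correct["ELECTRA-Small"] - correct["DistilBERT"]

# Accuracy gap in percentage points, rounded to avoid float noise.
gap_points = round(accuracy_pct["ELECTRA-Small"] - accuracy_pct["DistilBERT"], 2)

# Training time in seconds and the resulting ratio
# (assumption: the reported times are HH:MM:SS).
train_seconds = {name: hms_to_seconds(t) for name, t in training_time.items()}
speedup = train_seconds["DistilBERT"] / train_seconds["ELECTRA-Small"]

print(f"Accuracy gap: {gap_points} percentage points ({extra_correct} emails)")
print(f"Training time: {train_seconds} seconds")
print(f"DistilBERT / ELECTRA-Small training-time ratio: {speedup:.2f}")
```

Under these assumptions, the external-set gap of 2.85 percentage points corresponds to 285 more emails classified correctly by ELECTRA-Small, and the training times of 286 s and 120 s give a ratio of roughly 2.4 in ELECTRA-Small's favor. The abstract does not state the hardware or the number of epochs, so the ratio describes only the reported run.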

Copyrights © 2025






Journal Info

Abbrev

ji

Publisher

Subject

Computer Science & IT

Description

Jurnal Informatika, first published in 2014 (e-ISSN 2528-2247, p-ISSN 2355-6579), is a scientific journal for research in Informatics Engineering, Informatics Management, and Information Systems, published by Universitas Bina Sarana Informatika. It accepts articles that have never been published online or in print. The ...