Jurnal Computer Science and Information Technology (CoSciTech)
Vol 6 No 3 (2025): Jurnal Computer Science and Information Technology (CoSciTech)

Multilingual Email Spam Detection Using Cross-Lingual Transfer Learning

Mahalisa, Galih
Alfah, Rina
Sanjaya, Hendra



Article Info

Publish Date
26 Dec 2025

Abstract

To address the scarcity of labeled data for text classification in Indonesian, this research adapts the pre-trained language model BERT-base-multilingual-cased, which was trained on a large multilingual corpus. The strategy has two stages: first, the model is fine-tuned on a large English-language spam dataset; second, the resulting model is further fine-tuned on a much smaller Indonesian-language dataset. Quantitative evaluation shows strong and consistent performance in both languages. On the English dataset, the model reached an accuracy of 0.9738 and an F1-score of 0.9436. On the Indonesian dataset, it achieved an accuracy of 0.9492 and an F1-score of 0.9494. The comparable performance across the two languages, despite the much smaller Indonesian dataset, indicates that the semantic knowledge acquired from the source language (English) can be transferred efficiently to the same classification task in the target language (Indonesian). This research demonstrates how transfer learning can bridge the data resource gap and has important implications for developing NLP applications for low-resource languages.
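The abstract reports both accuracy and F1-score for each language, and on the English dataset the two differ noticeably (0.9738 versus 0.9436). As a reminder of how these metrics are computed and why they can diverge, here is a minimal sketch in plain Python. The function name `binary_metrics` and the toy labels are hypothetical illustrations, not taken from the paper; the abstract does not state the class balance of either dataset, so the imbalance in the toy data is only an assumption about one possible cause of such a gap.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier.

    `positive` is the label treated as the spam class (an assumption here).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical, imbalanced toy data: 3 spam (1) and 7 non-spam (0) messages.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

On this toy data the many correctly classified non-spam messages lift accuracy to 0.80, while F1, which ignores true negatives, is about 0.67. When the classes are balanced and errors are symmetric, the two metrics tend to sit close together, as they do in the reported Indonesian results.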

Copyrights © 2025






Journal Info

Abbrev

coscitech

Publisher

Subject

Computer Science & IT

Description

Jurnal CoSciTech (Computer Science and Information Technology) is a peer-reviewed journal published by the Informatics Engineering Study Program, Faculty of Computer Science, Universitas Muhammadiyah Riau (UMRI), since April 2020. Jurnal CoSciTech is registered with PDII LIPI under ISSN ...