Journal : Building of Informatics, Technology and Science

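Klasifikasi Spam Bahasa Indonesia dengan IndoBERT dan XLM-RoBERTa: Evaluasi Pooling, Stride, dan Late-Fusion [Indonesian Spam Classification with IndoBERT and XLM-RoBERTa: An Evaluation of Pooling, Stride, and Late Fusion]
Darmono, Darmono; Saputro, Rujianto Eko; Barkah, Azhari Shouni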
Building of Informatics, Technology and Science (BITS) Vol 7 No 2 (2025): September 2025
Publisher : Forum Kerjasama Pendidikan Tinggi

DOI: 10.47065/bits.v7i2.8034

Abstract

Spam detection for Indonesian short messages such as SMS and email remains challenging due to lexical variation, character obfuscation, and class imbalance. This study systematically evaluates configurations for Indonesian spam filtering to identify the one that best balances accuracy and efficiency. We compare two pretrained backbones (IndoBERT and XLM-RoBERTa), along with representation strategies (truncation versus chunking), summarization schemes (pooling), and feature-fusion approaches. The system follows a feature-based design with an emphasis on simplicity, and is assessed using macro F1, spam-class recall, AUPRC (Area Under the Precision-Recall Curve), and two efficiency metrics: embedding build time and training latency. Results indicate that IndoBERT achieves superior binary classification performance with high efficiency, while XLM-RoBERTa slightly outperforms it on AUPRC, making it more suitable for risk-ranking scenarios. Truncation combined with mean pooling consistently yields stable results. Although late fusion provides only marginal improvements, it remains relevant because it highlights the potential of domain-specific signals to enhance robustness under heavy obfuscation. The final recommendation for production is IndoBERT with truncation, mean pooling, and embedding-only features. Limitations include the focus on short messages and the lack of evaluation under extreme obfuscation. Future work should explore character-level augmentation, cross-domain evaluation, and cost-sensitive threshold tuning.
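
To make the representation choices named in the abstract concrete, the following is a minimal Python sketch of truncation, chunking with a stride, and masked mean pooling. It is an illustration under stated assumptions, not the authors' implementation: the function names, the interpretation of stride as the step between window starts, and the toy vectors are all hypothetical, and a real pipeline would pool the token embeddings produced by IndoBERT or XLM-RoBERTa.

```python
from typing import List, Sequence


def truncate(token_ids: Sequence[int], max_len: int) -> List[int]:
    """Keep only the first max_len tokens (the truncation strategy)."""
    return list(token_ids[:max_len])


def chunk_with_stride(token_ids: Sequence[int], max_len: int, stride: int) -> List[List[int]]:
    """Split a token sequence into windows of at most max_len tokens.

    Assumption: `stride` is the step between consecutive window starts,
    so windows overlap by max_len - stride tokens when stride < max_len.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + max_len]))
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
        start += stride
    return chunks


def mean_pool(vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the vectors whose mask entry is 1, ignoring padded positions."""
    kept = [v for v, m in zip(vectors, mask) if m]
    if not kept:
        raise ValueError("mask selects no vectors")
    dim = len(kept[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]


# Toy usage with made-up token ids and 2-dimensional "embeddings".
tokens = list(range(10))
windows = chunk_with_stride(tokens, max_len=4, stride=2)
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], mask=[1, 1, 0])
```

With truncation, a message yields one pooled vector; with chunking, each window would be embedded and pooled separately, and the per-window vectors would then need to be combined (for example by a second mean), which is one reason truncation is the cheaper option for short messages.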