Journal : Journal of Computing Theories and Applications

Quantifying the Impact of Text Preprocessing on IndoBERT Fine-Tuning for Indonesian Informal Culinary Sentiment Analysis
Rahmat Budianoor; Setyo Wahyu Saputro; Friska Abadi; Radityo Adi Nugroho; Andi Farmadi
Journal of Computing Theories and Applications Vol. 3 No. 4 (2026): JCTA 3(4) 2026
Publisher : Universitas Dian Nuswantoro

DOI: 10.62411/jcta.15980

Abstract

Indonesian culinary comments on social media platforms such as Instagram are characterized by informal spelling, regional language mixing, slang expressions, and emojis, posing substantial challenges for automated sentiment classification. While IndoBERT has demonstrated strong performance across Indonesian natural language processing tasks, the contribution of individual preprocessing components to fine-tuning performance on informal text remains underexplored, particularly in the culinary domain. This study addresses this gap by conducting a systematic preprocessing ablation study on IndoBERT-Base fine-tuning for Indonesian culinary sentiment classification, accompanied by a comparative evaluation against Naive Bayes with TF-IDF, SVM with TF-IDF, and BiLSTM as representative baselines. A dataset of 3,500 manually labeled Instagram culinary comments across three sentiment classes was used, with a stratified 80/10/10 split. Six preprocessing variants were evaluated under identical experimental conditions to isolate the contribution of each component. The results show that slang normalization is the most impactful single preprocessing step, yielding a macro F1-score gain of +0.0609 over the no-preprocessing baseline, while the full pipeline achieves an accuracy of 0.8800 and a macro F1-score of 0.8465. IndoBERT-Base with the full pipeline outperforms all baselines across all evaluation metrics. Per-class analysis reveals that the negative class achieves the lowest F1-score of 0.7600, with sarcastic expressions and Banjar regional vocabulary identified as primary sources of misclassification. These findings indicate that preprocessing decisions have a measurable and non-uniform effect on IndoBERT fine-tuning performance. In this study, slang normalization provides the most substantial individual contribution in bridging the vocabulary gap between informal user-generated text and the model’s pre-training distribution.
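
The two ideas the abstract leans on most, slang normalization and macro F1-score, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the slang dictionary, the tokenization rules, or the evaluation code, so the lexicon entries, the function names `normalize_slang` and `macro_f1`, and the toy labels below are all assumptions made for illustration. The macro F1 here is the unweighted mean of per-class F1, which is the usual definition and the one scikit-learn's `f1_score(average="macro")` follows.

```python
import re

# Hypothetical slang lexicon for illustration only; the dictionary used in
# the paper is not listed in the abstract.
SLANG_MAP = {
    "bgt": "banget",
    "enk": "enak",
    "gk": "tidak",
    "ga": "tidak",
    "yg": "yang",
}


def normalize_slang(text, lexicon=SLANG_MAP):
    """Lowercase the text and replace whole-token slang using the lexicon."""
    # Split into word tokens and single non-space symbols (punctuation, emoji).
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return " ".join(lexicon.get(tok, tok) for tok in tokens)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(normalize_slang("Enk bgt, gk nyesel!"))
    print(macro_f1(["pos", "neg", "neu", "neg"], ["pos", "neg", "neg", "neg"]))
```

Because every class counts equally in the average, a weak minority class pulls macro F1 below accuracy, which is consistent with the reported gap between 0.8800 accuracy and 0.8465 macro F1 when the negative class scores 0.7600.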