To address the scarcity of adequate labeled data for text classification in Indonesian, this research adapts the pre-trained language model BERT-base-multilingual-cased, which was trained on a large multilingual corpus. The strategy involves two stages: first, the model is fine-tuned on a rich English-language spam dataset, and second, the resulting model is further fine-tuned on a much smaller Indonesian-language dataset. Quantitative evaluation shows that the model achieved strong and consistent performance in both languages. On the English dataset, the model reached an Accuracy of 0.9738 and an F1-score of 0.9436. More significantly, on the Indonesian dataset, the model achieved an Accuracy of 0.9492 with an F1-score of 0.9494. The comparable performance across the two languages, despite the Indonesian dataset being much smaller, indicates that the semantic knowledge acquired from the source language (English) can be efficiently transferred to the same classification task in the target language (Indonesian). This research provides a strong demonstration of how transfer learning can bridge the data resource gap and has important implications for the development of NLP applications in the context of low-resource languages.
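As a minimal sketch, the two-stage fine-tuning strategy could be implemented with the Hugging Face transformers library roughly as follows; the dataset file names, column names, hyperparameters, and the fine_tune helper are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of sequential (two-stage) fine-tuning of bert-base-multilingual-cased.
# File names, hyperparameters, and column names are assumptions for illustration.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)


def tokenize(batch):
    # Truncate/pad each message to a fixed length for BERT.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)


def fine_tune(model, csv_path, output_dir):
    # Fine-tune the given model on a CSV with "text" and "label" columns.
    dataset = load_dataset("csv", data_files=csv_path)["train"].map(tokenize, batched=True)
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
    )
    Trainer(model=model, args=args, train_dataset=dataset).train()
    return model


# Stage 1: fine-tune on the larger English spam dataset.
model = fine_tune(model, "english_spam.csv", "out_en")
# Stage 2: continue fine-tuning the same weights on the smaller Indonesian dataset.
model = fine_tune(model, "indonesian_spam.csv", "out_id")
```

Because the second stage starts from the weights produced by the first, the Indonesian classifier inherits the task-specific knowledge learned from the English data rather than training from the base checkpoint alone.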