COMPARISON OF PRE-TRAINED BERT-BASED TRANSFORMER MODELS FOR REGIONAL LANGUAGE TEXT SENTIMENT ANALYSIS IN INDONESIA Taufiq Dwi Purnomo; Joko Sutopo
International Journal Science and Technology, Vol. 3 No. 3 (November 2024)
Publisher: Asosiasi Dosen Muda Indonesia

DOI: 10.56127/ijst.v3i3.1739

Abstract

This study compared the performance of eight pre-trained BERT-based models for sentiment analysis across ten regional languages of Indonesia. The objective was to identify the most effective model for sentiment analysis in these low-resource languages, given the growing need for automated sentiment analysis tools. Using the NusaX dataset, the study evaluated IndoBERT (IndoNLU), IndoBERT (IndoLEM), Multilingual BERT, and NusaBERT, each in base and large variants, with performance assessed by F1-score. The results indicated that models pre-trained on Indonesian data, specifically IndoBERT (IndoNLU) and NusaBERT, generally outperformed Multilingual BERT and IndoBERT (IndoLEM). IndoBERT-large (IndoNLU) achieved the highest overall F1-score of 0.9353. Performance varied across languages: Javanese, Minangkabau, and Banjar consistently showed high F1-scores, while Batak Toba proved challenging for all models. Notably, NusaBERT-base underperformed IndoBERT-base (IndoNLU) across all languages, despite being further pre-trained on Indonesian regional languages. This research provides valuable insights into the suitability of different pre-trained BERT models for sentiment analysis in Indonesian regional languages.
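
To make the evaluation setup the abstract describes concrete, the sketch below fine-tunes one of the compared checkpoints on one NusaX language and reports F1. It is a minimal reconstruction, not the authors' exact pipeline: the Hugging Face identifiers "indonlp/NusaX-senti" (dataset) and "indobenchmark/indobert-base-p1" (IndoBERT from IndoNLU), the "jav" language config, the macro averaging, and all hyperparameters are assumptions for illustration.

from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
import numpy as np

# Assumed identifiers (not confirmed by the paper):
# - "indonlp/NusaX-senti": public NusaX sentiment dataset, one config per language
# - "indobenchmark/indobert-base-p1": IndoBERT (IndoNLU) base checkpoint
MODEL_ID = "indobenchmark/indobert-base-p1"
LANGUAGE = "jav"  # Javanese; other configs cover the remaining NusaX languages

dataset = load_dataset("indonlp/NusaX-senti", LANGUAGE)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def tokenize(batch):
    # NusaX-senti examples are assumed to carry "text" and integer "label" columns
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True)

# Three sentiment classes: negative, neutral, positive
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=3)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # The abstract reports F1-score; macro averaging over the three classes
    # is an assumption made here
    return {"f1": f1_score(labels, preds, average="macro")}

args = TrainingArguments(
    output_dir="indobert-nusax-jav",
    learning_rate=2e-5,              # assumed hyperparameters, not the paper's
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate(tokenized["test"]))

Swapping MODEL_ID for checkpoints such as "bert-base-multilingual-cased", an IndoLEM IndoBERT, or a NusaBERT release, and looping over the NusaX language configs, would reproduce the kind of model-by-language comparison the study reports.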