The Sundanese language, once spoken by 48 million people, has experienced a significant decline, losing 2 million speakers in the past decade. This decline is attributed to weakened intergenerational transmission and the dominance of more widely used languages. Developing Natural Language Processing (NLP) tools for Sundanese is hampered by the lack of annotated corpora, trained language models, and adequate processing tools, complicating efforts to preserve the language and enhance its usability. This research addresses these challenges by implementing emotion classification for Sundanese text using Long Short-Term Memory (LSTM) and Bidirectional Encoder Representations from Transformers (BERT) models. The study uses a dataset of annotated Sundanese tweets, applying preprocessing steps such as cleansing, stopword removal, stemming, and tokenization to prepare the data for analysis. The results show that the BERT model significantly outperforms the LSTM model, achieving an accuracy of approximately 80% compared with LSTM's 70%. These findings highlight the potential of advanced NLP techniques for capturing emotional nuance in Sundanese communication and contribute to revitalizing the language in the digital age.
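The preprocessing steps listed above (cleansing, stopword removal, tokenization) can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the stopword list here contains only a few example Sundanese function words, and stemming is omitted because it would require a dedicated Sundanese stemmer.

```python
import re

# Illustrative stopword list only; a real pipeline would load a
# curated Sundanese stopword resource.
STOPWORDS = {"nu", "teh", "mah", "jeung", "ka", "di"}

def preprocess(tweet: str) -> list[str]:
    # Cleansing: lowercase, then strip URLs, mentions, and hashtags.
    text = re.sub(r"https?://\S+|[@#]\w+", " ", tweet.lower())
    # Cleansing: drop punctuation, digits, and other non-letter characters.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Tokenization: simple whitespace split.
    tokens = text.split()
    # Stopword removal. (A stemming step would follow here.)
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("Abdi bagja pisan! @user https://t.co/x"))
# → ['abdi', 'bagja', 'pisan']
```

The cleaned token lists would then be converted to model inputs, e.g. integer-indexed sequences for the LSTM or subword IDs via a BERT tokenizer.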
Copyright © 2024