Text classification is one of the most popular tasks in natural language processing, especially in the context of sentiment classification. Insufficient training data poses a significant challenge in many text classification studies. This research focuses on optimizing classification performance with the Passive Aggressive (PA) algorithm under limited training data. It compares conventional text representations such as TF-IDF with modern approaches based on word embeddings such as FastText and BERT. The primary dataset covers sentiment toward Kaesang Pangarep's appointment as chairman of PSI, gathered by crawling Twitter and labeled with positive, negative, and neutral sentiment. Two versions of the training data were used, each containing only 300 tweets balanced across the positive, negative, and neutral classes. The data were split 80% for training and 20% for validation in the search for an optimal model. External data on different issues, with pre-existing sentiment labels, was used to augment the training data. Experimental results show that BERT embeddings, used as input features for the Passive Aggressive method with hyperparameter tuning, outperformed TF-IDF features. Evaluation on the test data revealed that BERT features with Passive Aggressive achieved an F1-score of 0.52, surpassing the conventional TF-IDF representation with an F1-score of 0.42. The BERT language model thus contributed substantially to improving text classification performance, particularly for the Passive Aggressive method.
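To make the core method concrete, the sketch below shows the binary Passive Aggressive (PA-I) update rule in plain Python on a toy two-feature dataset. This is an illustrative minimal implementation, not the paper's code: the feature vectors, labels, and the aggressiveness parameter `C` are hypothetical, and a real experiment would use TF-IDF or BERT feature vectors with a multiclass PA classifier (e.g. scikit-learn's `PassiveAggressiveClassifier`).

```python
def pa_update(w, x, y, C=1.0):
    """One Passive-Aggressive (PA-I) update; labels y are in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)                   # hinge loss
    if loss == 0.0:
        return w                                    # "passive": margin already >= 1
    tau = min(C, loss / sum(xi * xi for xi in x))   # "aggressive" clipped step size
    return [wi + tau * y * xi for wi, xi in zip(w, x)]

# Hypothetical separable toy data: sign of the first feature decides the label.
data = [([2.0, 1.0], 1), ([1.5, -0.5], 1), ([-2.0, 0.5], -1), ([-1.0, -1.0], -1)]
w = [0.0, 0.0]
for _ in range(5):                                  # a few epochs over the stream
    for x, y in data:
        w = pa_update(w, x, y)

preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1 for x, _ in data]
print(preds)  # -> [1, 1, -1, -1], matching the true labels
```

The update only changes the weights when the margin constraint is violated, and the step size `tau` is exactly large enough to satisfy it (capped at `C`), which is why the method suits small, streamed training sets like the 300-tweet splits described above.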
Copyright © 2024