Rahmad, Bayu Aji Hamengku
Unknown Affiliation

Published : 1 document


Improving the Performance of Machine Learning Classifiers in Sentiment Analysis of Jenius Application Using Latent Dirichlet Allocation in Text Preprocessing
Prasetyo, Vincentius Riandaru; Benarkah, Njoto; Rahmad, Bayu Aji Hamengku
Jurnal Teknik Informatika (Jutif) Vol. 6 No. 5 (2025): JUTIF Volume 6, Number 5, October 2025
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2025.6.5.5238

Abstract

Sentiment analysis aims to classify a person's opinion into a specific sentiment, such as positive or negative, and the choice of preprocessing steps can influence the performance of a sentiment analysis model. Latent Dirichlet Allocation (LDA), commonly used for topic modelling, can be employed as an additional preprocessing step to identify words relevant to a particular sentiment label. This study assesses whether applying LDA in the preprocessing stage improves the performance of machine learning models, namely Naïve Bayes, Decision Tree, KNN, Logistic Regression, and SVM. The dataset comprises 1,800 reviews of the Jenius application, 900 labelled positive and 900 labelled negative. Words with an LDA score of at least 0.15 were given additional weight in the TF-IDF stage before model training. The trained models were then evaluated using accuracy, precision, recall, and F1-score. Using LDA in preprocessing improved the performance of all classification models by 1-3% on most evaluation metrics; Logistic Regression achieved the best performance, followed by SVM and KNN. This improvement is consistent with LDA reducing semantic noise and improving feature representation. The approach can also help monitor customer opinions in the digital banking sector, enabling priority issues to be identified quickly and accurately. Future research could compare performance against other topic modelling and feature extraction methods, expand the dataset, and use multiclass models.
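The LDA-weighted TF-IDF step and the evaluation metrics described in the abstract can be sketched roughly as follows. This is a minimal, standard-library-only Python illustration under stated assumptions, not the authors' implementation: it assumes reviews are already tokenized, treats a word's "LDA score" as its highest per-topic probability in an already fitted topic-word matrix, and multiplies the TF-IDF weight of every word scoring at least 0.15 by a hypothetical factor `BOOST` (the abstract does not state how much extra weight was used). The helper names `lda_word_scores`, `tfidf`, `boost_with_lda`, and `binary_metrics` are hypothetical, and the toy data is illustrative only.

```python
import math
from collections import Counter

LDA_THRESHOLD = 0.15  # threshold reported in the abstract
BOOST = 1.5           # hypothetical multiplier; the abstract does not give the weight


def lda_word_scores(topic_word):
    """Score each word by its highest per-topic probability.

    Assumption: topic_word is a list of {word: weight} dicts, one per topic,
    e.g. the topic-word pseudo-counts of an already fitted LDA model.
    """
    scores = {}
    for topic in topic_word:
        total = sum(topic.values())
        for word, weight in topic.items():
            scores[word] = max(scores.get(word, 0.0), weight / total)
    return scores


def tfidf(docs):
    """TF-IDF over tokenized docs: relative term frequency times a smoothed IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors


def boost_with_lda(vectors, scores, threshold=LDA_THRESHOLD, boost=BOOST):
    """Multiply the TF-IDF weight of every word whose LDA score >= threshold."""
    return [
        {t: w * boost if scores.get(t, 0.0) >= threshold else w for t, w in vec.items()}
        for vec in vectors
    ]


def binary_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy tokenized reviews and topic-word weights (illustrative, not the study's data).
    docs = [["good", "fast"], ["slow", "error"], ["good", "easy"]]
    topics = [{"good": 8, "fast": 1, "easy": 1}, {"slow": 3, "error": 5, "good": 2}]
    features = boost_with_lda(tfidf(docs), lda_word_scores(topics))
    print(features[0])
    print(binary_metrics(["positive", "negative"], ["positive", "positive"]))
```

In the study, the weighted vectors would then be used to train classifiers such as Logistic Regression or SVM. In practice, library components (for example scikit-learn's `LatentDirichletAllocation` and `TfidfVectorizer`) would normally replace these hand-written helpers; they are written out here only to make the thresholding and weighting step explicit.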