This author has published in the following journal:
Jurnal INFOTEL
Siti Khomsah
Telkom University, Indonesia

Published: 1 document
Articles


Semi-Supervised Sentiment Classification Using Self-Learning and Enhanced Co-Training
Agus Sasmito Aribowo; Siti Khomsah; Shoffan Saifullah
JURNAL INFOTEL Vol 17 No 3 (2025): August
Publisher: LPPM INSTITUT TEKNOLOGI TELKOM PURWOKERTO

DOI: 10.20895/infotel.v17i3.1344

Abstract

Sentiment classification is usually done manually by humans, but manual sentiment labeling is inefficient, so automated labeling using machine learning is essential. Building an automated labeling model is challenging when labeled data is scarce, which can reduce model accuracy. This study proposes a semi-supervised learning (SSL) framework for sentiment analysis with limited labeled data. The framework integrates self-learning and enhanced co-training. The co-training model combines three machine learning methods: Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR). We use TF-IDF and FastText for feature extraction. The co-training model generates pseudo-labels; the pseudo-labels from the three models (SVM, RF, LR) are then compared and the most confident prediction is selected, a step referred to as self-learning. This framework is applied to English and Indonesian language datasets, and we ran each dataset five times. The performance difference between the baseline model (without pseudo-labels) and SSL (with pseudo-labels) is not significant, as confirmed by the Wilcoxon Signed-Rank Test with a p-value < 0.05. Results show that SSL produces pseudo-labels on unlabeled data with quality close to the original labels. Although the significance test performs well on four datasets, SSL has not yet surpassed the performance of the supervised classification (baseline). Labeling with SSL proves more efficient than manual labeling, as evidenced by a processing time of around 10-20 minutes to label thousands to tens of thousands of samples. In conclusion, self-learning in SSL with co-training can effectively label unlabeled data in multilingual and limited datasets, but it has not yet converged across various datasets.
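
The pseudo-labeling loop described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the confidence threshold, the number of rounds, the function and variable names, and the use of TF-IDF alone (the paper also uses FastText features) are all assumptions made for the example.

# Hypothetical sketch of co-training (SVM, RF, LR) plus self-learning:
# each model labels the unlabeled pool, and only the single most confident
# prediction per sample is kept as a pseudo-label. Thresholds and names
# are assumptions, not taken from the paper.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def pseudo_label(labeled_texts, labeled_y, unlabeled_texts,
                 confidence_threshold=0.9, max_rounds=5):
    """Return pseudo-labels for unlabeled_texts (None where no confident label)."""
    vectorizer = TfidfVectorizer()  # the paper also extracts FastText features
    X_all = vectorizer.fit_transform(list(labeled_texts) + list(unlabeled_texts))
    X_lab = X_all[:len(labeled_texts)]
    X_unlab = X_all[len(labeled_texts):]
    y_lab = np.asarray(labeled_y)

    pseudo = np.full(X_unlab.shape[0], None, dtype=object)  # final pseudo-labels
    remaining = np.arange(X_unlab.shape[0])                 # indices still unlabeled

    models = [
        SVC(probability=True),
        RandomForestClassifier(),
        LogisticRegression(max_iter=1000),
    ]

    for _ in range(max_rounds):
        if remaining.size == 0:
            break
        # Co-training step: each model is trained on the current labeled pool
        # and predicts class probabilities for the remaining unlabeled samples.
        probas = [m.fit(X_lab, y_lab).predict_proba(X_unlab[remaining]) for m in models]
        per_model_conf = np.stack([p.max(axis=1) for p in probas])  # shape (3, n_remaining)
        best_model = per_model_conf.argmax(axis=0)                  # most confident model per sample
        best_conf = per_model_conf.max(axis=0)
        best_label = np.array([
            models[m].classes_[probas[m][i].argmax()]
            for i, m in enumerate(best_model)
        ])
        # Self-learning step: accept only pseudo-labels above the confidence threshold.
        accept = best_conf >= confidence_threshold
        if not accept.any():
            break
        accepted_idx = remaining[accept]
        pseudo[accepted_idx] = best_label[accept]
        # Grow the labeled pool with the newly accepted pseudo-labels and repeat.
        X_lab = vstack([X_lab, X_unlab[accepted_idx]])
        y_lab = np.concatenate([y_lab, best_label[accept]])
        remaining = remaining[~accept]

    return pseudo

In this sketch, samples that never reach the confidence threshold keep a None label, which mirrors the abstract's observation that the approach does not converge on every dataset.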