Sentiment classification is usually performed manually by humans, but manual sentiment labeling is inefficient, so automated labeling using machine learning is essential. Building an automated labeling model is challenging when labeled data is scarce, which can reduce model accuracy. This study proposes a semi-supervised learning (SSL) framework for sentiment analysis with limited labeled data. The framework integrates self-learning with enhanced co-training. The co-training model combines three machine learning methods: Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR), with TF-IDF and FastText used for feature extraction. The co-training model generates pseudo-labels; the pseudo-labels from the three models (SVM, RF, LR) are then compared and the most confident predictions are selected, a step referred to as self-learning. The framework is applied to English and Indonesian datasets, and each dataset is run five times. The performance difference between the baseline model (without pseudo-labels) and SSL (with pseudo-labels) is not significant; the Wilcoxon Signed-Rank Test confirms this, yielding a p-value < 0.05. Results show that SSL produces pseudo-labels on unlabeled data whose quality is close to that of the original labels. Although SSL performs well on four datasets, it has not yet surpassed the performance of supervised classification (the baseline). Labeling with SSL proves far more efficient than manual labeling, taking only around 10-20 minutes to label thousands to tens of thousands of samples. In conclusion, self-learning in SSL with co-training can effectively label unlabeled data in multilingual and limited datasets, but its performance has not yet converged across datasets.
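The pipeline the abstract describes can be sketched roughly as follows. This is a hedged, minimal illustration, not the paper's actual implementation: three classifiers (SVM, RF, LR) are trained on TF-IDF features of a small labeled set, their averaged class probabilities pseudo-label the unlabeled texts, and only predictions above a confidence threshold are kept (the self-learning selection step). The toy texts, the confidence threshold, the probability-averaging rule, and the accuracy values fed to the Wilcoxon test are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy data standing in for a real sentiment corpus (illustrative only).
labeled_texts = [
    "great product", "love it", "really good", "excellent quality", "very happy",
    "terrible service", "awful experience", "very bad", "poor quality", "hate it",
]
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # 1 = positive, 0 = negative
unlabeled_texts = ["good service", "bad product", "love this quality", "awful day"]

# TF-IDF features (the abstract also mentions FastText, omitted here for brevity).
vec = TfidfVectorizer()
X_lab = vec.fit_transform(labeled_texts)
X_unlab = vec.transform(unlabeled_texts)

# Co-training ensemble: the three learners named in the abstract.
models = [
    SVC(probability=True, random_state=0),
    RandomForestClassifier(random_state=0),
    LogisticRegression(max_iter=1000),
]
for m in models:
    m.fit(X_lab, labels)

# Self-learning step: average the three models' class probabilities and keep
# only pseudo-labels whose confidence clears an (assumed) threshold.
proba = np.mean([m.predict_proba(X_unlab) for m in models], axis=0)
confidence = proba.max(axis=1)
pseudo_labels = proba.argmax(axis=1)
THRESHOLD = 0.6  # assumed value, not taken from the paper
accepted = [(t, int(l), float(c))
            for t, l, c in zip(unlabeled_texts, pseudo_labels, confidence)
            if c >= THRESHOLD]
print(accepted)

# Paired comparison of baseline vs. SSL accuracy over five runs with the
# Wilcoxon Signed-Rank Test; these accuracy values are made up.
baseline_acc = [0.81, 0.79, 0.82, 0.80, 0.78]
ssl_acc = [0.80, 0.78, 0.81, 0.79, 0.77]
stat, pvalue = wilcoxon(baseline_acc, ssl_acc)
print(f"Wilcoxon p-value: {pvalue:.3f}")
```

Averaging probabilities is one simple way to combine the three views; the paper's enhanced co-training may instead use agreement between models or per-model thresholds, which this sketch does not attempt to reproduce.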
Copyright © 2025