Fazli Nugraha Tambunan
STIKOM Tunas Bangsa

SELENIUM–INDOBERT PIPELINE FOR PSEUDO-LABELING SENTIMENT ANALYSIS OF INDONESIAN YOUTUBE COMMENTS
Fazli Nugraha Tambunan; Heru Satria Tambunan; Doughlas Pardede
JURTEKSI (Jurnal Teknologi dan Sistem Informasi) Vol. 12 No. 2 (2026): March 2026
Publisher : Lembaga Penelitian dan Pengabdian Kepada Masyarakat (LPPM) STMIK Royal Kisaran

DOI: 10.33330/jurteksi.v12i2.4415

Abstract

YouTube has become a major platform for public discourse in Indonesia, yet large-scale sentiment analysis of its comments remains challenging due to dynamic content, informal language, and limited labeled data. This study proposes a Selenium–IndoBERT pipeline for sentiment analysis of Indonesian YouTube comments using a pseudo-labeling approach. Data were collected from ten YouTube videos discussing the One Piece flag phenomenon, yielding 10,842 comments after preprocessing. Selenium was employed to extract comments from dynamic pages, while IndoBERT was fine-tuned on a small manually labeled dataset and used to generate pseudo-labels for unlabeled data. Model performance was evaluated using probabilistic metrics, including Coverage, Expected Calibration Error (ECE), and Brier Score. At a confidence threshold of 0.75, 78.5% of comments received pseudo-labels, with an ECE of 0.095 and a Brier Score of 0.174. Manual validation showed substantial agreement with human annotations (Fleiss’ kappa = 0.72). The results indicate that the proposed pipeline enables scalable and reliable sentiment analysis with minimal manual annotation.
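
The thresholding and evaluation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 10-bin equal-width ECE, and the multiclass (sum over classes) form of the Brier Score are assumptions, since the abstract does not specify which variants were used. The input `probs` stands in for softmax outputs from a fine-tuned classifier such as IndoBERT.

```python
import numpy as np

def pseudo_label(probs, threshold=0.75):
    """Return argmax labels and a mask of rows confident enough to keep."""
    probs = np.asarray(probs, dtype=float)
    labels = probs.argmax(axis=1)
    mask = probs.max(axis=1) >= threshold  # keep only high-confidence rows
    return labels, mask

def coverage(mask):
    """Fraction of comments that received a pseudo-label."""
    return float(np.mean(mask))

def brier_score(probs, y_true):
    """Multiclass Brier score (assumed variant): mean squared distance
    between the predicted distribution and the one-hot true label."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.eye(probs.shape[1])[np.asarray(y_true)]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def expected_calibration_error(probs, y_true, n_bins=10):
    """ECE with equal-width confidence bins (assumed binning scheme)."""
    probs = np.asarray(probs, dtype=float)
    y_true = np.asarray(y_true)
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == y_true).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # weight each bin's |accuracy - confidence| gap by its share of samples
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)

# Toy example with three hypothetical classes (e.g. negative, neutral, positive).
toy_probs = [[0.90, 0.05, 0.05],
             [0.40, 0.35, 0.25],
             [0.10, 0.80, 0.10],
             [0.20, 0.20, 0.60]]
toy_true = [0, 1, 1, 2]

toy_labels, toy_mask = pseudo_label(toy_probs, threshold=0.75)
print("coverage:", coverage(toy_mask))
print("brier:", brier_score(toy_probs, toy_true))
print("ece:", expected_calibration_error(toy_probs, toy_true))
```

Coverage needs no ground truth, whereas ECE and the Brier Score must be computed on a held-out, manually labeled subset. Raising the threshold generally trades coverage for cleaner pseudo-labels.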