JURTEKSI
Vol. 12 No. 2 (2026): Maret 2026

SELENIUM–INDOBERT PIPELINE FOR PSEUDO-LABELING SENTIMENT ANALYSIS OF INDONESIAN YOUTUBE COMMENTS

Fazli Nugraha Tambunan (STIKOM Tunas Bangsa)
Heru Satria Tambunan (STIKOM Tunas Bangsa)
Doughlas Pardede (University Deli Sumatera)



Article Info

Publish Date
30 Mar 2026

Abstract

YouTube has become a major platform for public discourse in Indonesia, yet large-scale sentiment analysis of its comments remains challenging due to dynamic content, informal language, and limited labeled data. This study proposes a Selenium–IndoBERT pipeline for sentiment analysis of Indonesian YouTube comments using a pseudo-labeling approach. Data were collected from ten YouTube videos discussing the One Piece flag phenomenon, yielding 10,842 comments after preprocessing. Selenium was employed to extract comments from dynamic pages, while IndoBERT was fine-tuned on a small manually labeled dataset and used to generate pseudo-labels for unlabeled data. Model performance was evaluated using probabilistic metrics, including Coverage, Expected Calibration Error (ECE), and Brier Score. At a confidence threshold of 0.75, 78.5% of comments received pseudo-labels, with an ECE of 0.095 and a Brier Score of 0.174. Manual validation showed substantial agreement with human annotations (Fleiss’ kappa = 0.72). The results indicate that the proposed pipeline enables scalable and reliable sentiment analysis with minimal manual annotation.
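The thresholding and calibration metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give formulas, so the multiclass Brier Score, the 10 equal-width ECE bins, the three-class layout, and all function names and toy probabilities below are assumptions chosen for illustration. The probabilities stand in for softmax outputs of a fine-tuned classifier. ECE and Brier Score need reference labels, so they would be computed on a labeled validation subset.

```python
from typing import List, Optional, Sequence


def pseudo_label(probs: Sequence[Sequence[float]],
                 threshold: float = 0.75) -> List[Optional[int]]:
    """Keep the argmax class when its probability reaches the threshold, else None."""
    labels: List[Optional[int]] = []
    for p in probs:
        conf = max(p)
        labels.append(list(p).index(conf) if conf >= threshold else None)
    return labels


def coverage(labels: Sequence[Optional[int]]) -> float:
    """Fraction of samples that received a pseudo-label."""
    return sum(label is not None for label in labels) / len(labels)


def brier_score(probs: Sequence[Sequence[float]], y_true: Sequence[int]) -> float:
    """Multiclass Brier Score (assumed form): mean squared distance to the one-hot label."""
    total = 0.0
    for p, y in zip(probs, y_true):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(probs)


def expected_calibration_error(probs: Sequence[Sequence[float]],
                               y_true: Sequence[int],
                               n_bins: int = 10) -> float:
    """ECE with equal-width confidence bins (assumed binning scheme)."""
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, y_true):
        conf = max(p)
        correct = 1.0 if list(p).index(conf) == y else 0.0
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, correct))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(a for _, a in b) / len(b)
            ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


# Hypothetical class order: 0 = negative, 1 = neutral, 2 = positive.
toy_probs = [[0.90, 0.05, 0.05],
             [0.40, 0.35, 0.25],
             [0.10, 0.80, 0.10],
             [0.20, 0.20, 0.60]]
toy_true = [0, 1, 1, 2]

toy_labels = pseudo_label(toy_probs, threshold=0.75)
print("pseudo-labels:", toy_labels)
print("coverage:", coverage(toy_labels))
print("Brier score:", brier_score(toy_probs, toy_true))
print("ECE:", expected_calibration_error(toy_probs, toy_true))
```

Raising the threshold trades coverage for label quality: fewer comments receive a pseudo-label, but the retained labels come from more confident predictions.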

Copyrights © 2026






Journal Info

Abbrev

jurteksi

Publisher

Subject

Computer Science & IT

Description

JURTEKSI (Jurnal Teknologi dan Sistem Informasi) is a scientific journal published by STMIK Royal Kisaran. The journal is published twice a year, in December and June. It contains a collection of research in information technology and computer ...