JURTEKSI

Vol. 12 No. 2 (2026): Maret 2026

Fazli Nugraha Tambunan (STIKOM Tunas Bangsa)
Heru Satria Tambunan (STIKOM Tunas Bangsa)
Doughlas Pardede (University Deli Sumatera)

Publish Date
30 Mar 2026

YouTube has become a major platform for public discourse in Indonesia, yet large-scale sentiment analysis of its comments remains challenging due to dynamic content, informal language, and limited labeled data. This study proposes a Selenium–IndoBERT pipeline for sentiment analysis of Indonesian YouTube comments using a pseudo-labeling approach. Data were collected from ten YouTube videos discussing the One Piece flag phenomenon, yielding 10,842 comments after preprocessing. Selenium was employed to extract comments from dynamic pages, while IndoBERT was fine-tuned on a small manually labeled dataset and used to generate pseudo-labels for unlabeled data. Model performance was evaluated using probabilistic metrics, including Coverage, Expected Calibration Error (ECE), and Brier Score. At a confidence threshold of 0.75, 78.5% of comments received pseudo-labels, with an ECE of 0.095 and a Brier Score of 0.174. Manual validation showed substantial agreement with human annotations (Fleiss’ kappa = 0.72). The results indicate that the proposed pipeline enables scalable and reliable sentiment analysis with minimal manual annotation.
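The evaluation quantities named in the abstract can be illustrated with a minimal sketch. It assumes the fine-tuned classifier outputs one probability vector per comment, that confidence is the largest class probability, that ECE uses 10 equal-width confidence bins, and that the Brier Score is the multi-class form (sum of squared errors over classes, averaged over comments). The abstract does not spell out these definitions, so the function names, the binning scheme, and the Brier variant below are illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Sequence, Tuple


def pseudo_label(probs: Sequence[Sequence[float]],
                 threshold: float = 0.75) -> List[Tuple[int, int]]:
    """Return (row index, predicted class) for rows whose top probability
    reaches the confidence threshold (0.75 is the value in the abstract)."""
    selected = []
    for i, row in enumerate(probs):
        conf = max(row)
        if conf >= threshold:
            selected.append((i, list(row).index(conf)))
    return selected


def coverage(probs: Sequence[Sequence[float]], threshold: float = 0.75) -> float:
    """Fraction of comments that receive a pseudo-label at this threshold."""
    return len(pseudo_label(probs, threshold)) / len(probs)


def brier_score(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Multi-class Brier score: mean over comments of the squared distance
    between the probability vector and the one-hot true label (assumed variant)."""
    total = 0.0
    for row, y in zip(probs, labels):
        total += sum((p - (1.0 if k == y else 0.0)) ** 2 for k, p in enumerate(row))
    return total / len(probs)


def expected_calibration_error(probs: Sequence[Sequence[float]],
                               labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """ECE with equal-width bins [lo, hi); a confidence of 1.0 goes in the last bin."""
    # Per bin: [sample count, sum of confidences, number of correct predictions]
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for row, y in zip(probs, labels):
        conf = max(row)
        pred = list(row).index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += int(pred == y)
    n = len(probs)
    ece = 0.0
    for count, conf_sum, correct in bins:
        if count:
            # Weighted gap between bin accuracy and bin mean confidence
            ece += (count / n) * abs(correct / count - conf_sum / count)
    return ece
```

In this sketch, Coverage needs no ground truth, whereas ECE and the Brier Score must be computed on comments with known labels, such as a manually annotated validation subset.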

Journal Info

JURTEKSI

Abbrev

jurteksi

Publisher

STMIK Royal Kisaran

Subject

Computer Science & IT

Description

JURTEKSI (Jurnal Teknologi dan Sistem Informasi) is a scientific journal published by STMIK Royal Kisaran. The journal is published twice a year, in December and June. It contains a collection of research in information technology and computer ...

Article Info

Abstract

SELENIUM–INDOBERT PIPELINE FOR PSEUDO-LABELING SENTIMENT ANALYSIS OF INDONESIAN YOUTUBE COMMENTS
