YouTube has become a major platform for public discourse in Indonesia, yet large-scale sentiment analysis of its comments remains challenging due to dynamic content, informal language, and limited labeled data. This study proposes a Selenium–IndoBERT pipeline for sentiment analysis of Indonesian YouTube comments using a pseudo-labeling approach. Data were collected from ten YouTube videos discussing the One Piece flag phenomenon, yielding 10,842 comments after preprocessing. Selenium was employed to extract comments from dynamic pages, while IndoBERT was fine-tuned on a small manually labeled dataset and used to generate pseudo-labels for unlabeled data. Model performance was evaluated using probabilistic metrics, including Coverage, Expected Calibration Error (ECE), and Brier Score. At a confidence threshold of 0.75, 78.5% of comments received pseudo-labels, with an ECE of 0.095 and a Brier Score of 0.174. Manual validation showed substantial agreement with human annotations (Fleiss’ kappa = 0.72). The results indicate that the proposed pipeline enables scalable and reliable sentiment analysis with minimal manual annotation.
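As a minimal sketch of how the reported quantities could be computed, the following pure-Python example applies a confidence threshold to class probabilities and then derives Coverage, Brier Score, and ECE. It is illustrative only and not the study's implementation: the function names, the three-class toy probabilities, the multiclass (sum over classes) form of the Brier Score, and the ten equal-width confidence bins for ECE are all assumptions. In practice the probabilities would come from the softmax output of the fine-tuned IndoBERT model.

```python
from typing import List, Optional, Sequence


def pseudo_label(probs: Sequence[Sequence[float]], threshold: float = 0.75) -> List[Optional[int]]:
    """Assign the argmax class when its probability reaches the threshold, else None."""
    labels: List[Optional[int]] = []
    for row in probs:
        row = list(row)
        conf = max(row)
        labels.append(row.index(conf) if conf >= threshold else None)
    return labels


def coverage(pseudo: Sequence[Optional[int]]) -> float:
    """Fraction of samples that received a pseudo-label."""
    return sum(label is not None for label in pseudo) / len(pseudo)


def brier_score(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Multiclass Brier Score: mean squared distance to the one-hot label (assumed form)."""
    total = 0.0
    for row, y in zip(probs, labels):
        total += sum((p - (1.0 if k == y else 0.0)) ** 2 for k, p in enumerate(row))
    return total / len(probs)


def expected_calibration_error(
    probs: Sequence[Sequence[float]], labels: Sequence[int], n_bins: int = 10
) -> float:
    """ECE with equal-width confidence bins: weighted mean of |accuracy - confidence|."""
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for row, y in zip(probs, labels):
        row = list(row)
        conf = max(row)
        pred = row.index(conf)
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(a for _, a in b) / len(b)
            ece += (len(b) / n) * abs(acc - avg_conf)
    return ece


# Toy example (hypothetical values): classes 0 = negative, 1 = neutral, 2 = positive.
toy_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.20, 0.20, 0.60],
]
toy_labels = [0, 1, 1, 2]  # human annotations for a small validation subset

toy_pseudo = pseudo_label(toy_probs, threshold=0.75)
print("pseudo-labels:", toy_pseudo)
print("coverage:", coverage(toy_pseudo))
print("Brier score:", brier_score(toy_probs, toy_labels))
print("ECE:", expected_calibration_error(toy_probs, toy_labels))
```

Raising the threshold generally trades Coverage for label quality: fewer comments are pseudo-labeled, but the retained ones carry higher model confidence.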
Copyrights © 2026