Public engagement on YouTube provides a valuable source for examining audience responses to film productions; however, sentiment classification of Indonesian-language comments remains methodologically challenging due to informal expressions, noisy text, and imbalanced class distributions. This study evaluates the robustness of a classical machine learning pipeline for sentiment classification of YouTube comments on the trailer of the Indonesian animated film Merah Putih: One for All. A total of 5,469 comments were collected using the YouTube Data API v3. After preprocessing and lexicon-based pseudo-labeling, 5,192 comments were retained, consisting of 4,006 negative and 1,186 positive instances. Text features were represented using TF-IDF, while SMOTE was applied only to the training set after a stratified 80:20 split to prevent data leakage. Two classifiers were compared under identical experimental conditions: Multinomial Naïve Bayes and linear Support Vector Machine. The SVM model achieved 81.59% accuracy, 83% precision, 82% recall, and 82% F1-score on the original held-out test set, outperforming Naïve Bayes, which obtained 76.82% accuracy. The findings suggest that margin-based classification is more suitable than probabilistic classification for sparse, high-dimensional Indonesian YouTube comments, particularly when feature independence assumptions are likely violated. The study contributes a leakage-controlled evaluation of classical sentiment classification under imbalanced social-media conditions and highlights the methodological implications of pseudo-labeling and synthetic oversampling in Indonesian film-related opinion mining.
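The leakage control described above (split first, then oversample only the training portion) can be illustrated with a minimal, library-free sketch. This is not the study's code. The toy two-dimensional points stand in for TF-IDF vectors, the 40/12 class sizes are invented for illustration, and the helper names `stratified_split` and `smote` are hypothetical. The `smote` function is a simplified version of the SMOTE idea (interpolating between a minority point and one of its nearest minority neighbours). In practice one would more likely rely on the scikit-learn and imbalanced-learn libraries.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), holding out test_fraction of each class."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test_idx.extend(shuffled[:n_test])
        train_idx.extend(shuffled[n_test:])
    return train_idx, test_idx


def smote(minority, n_new, k, rng):
    """Simplified SMOTE: build n_new synthetic points, each lying on the
    segment between a minority point and one of its k nearest minority
    neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


rng = random.Random(0)

# Toy stand-ins for TF-IDF vectors (assumed data, not the study's corpus).
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(12)]
y = ["negative"] * 40 + ["positive"] * 12

# Step 1: stratified 80:20 split on the original, imbalanced data.
train_idx, test_idx = stratified_split(y, 0.2, rng)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

# Step 2: oversample the minority class in the training set only.
counts = Counter(y_train)
minority_points = [x for x, label in zip(X_train, y_train) if label == "positive"]
n_new = counts["negative"] - counts["positive"]
X_train_bal = X_train + smote(minority_points, n_new, k=5, rng=rng)
y_train_bal = y_train + ["positive"] * n_new

train_counts = Counter(y_train_bal)
test_counts = Counter(y_test)
print("balanced training set:", dict(train_counts))
print("untouched test set:", dict(test_counts))
```

In this toy run the training set ends up balanced at 32 instances per class, while the held-out set keeps its original 8:2 imbalance. Because synthetic points are generated only from training instances, no information from the test comments can influence them.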
Copyright © 2026