The proliferation of social media has transformed the communication landscape, positioning platforms such as YouTube as vital repositories of public sentiment. The exponential volume of user-generated content makes manual analysis infeasible, rendering automated solutions critical. This study addresses this challenge by evaluating the efficacy of three deep learning architectures, BiLSTM, RoBERTa, and DistilGPT-2, for multi-class sentiment classification, contributing a novel empirical comparison of recurrent, encoder-based, and generative models on noisy text. The research uses the "YouTube Comments Sentiment Dataset" sourced from Kaggle, which contains over one million entries distributed across Positive, Negative, and Neutral classes in a relatively balanced composition. Methodologically, the models were trained to convergence using early stopping and assessed on weighted F1-scores alongside training duration. The results show that the transformer-based models numerically outperformed the recurrent architecture: RoBERTa achieved the highest F1-score of 0.77, surpassing BiLSTM (0.71) by 6 percentage points. The transformers also exhibited superior efficiency, converging within 5 epochs compared to 16 for BiLSTM. Despite these numerical gaps, statistical analysis via ANOVA revealed that the performance differences were not significant (p > 0.05). In conclusion, RoBERTa offers the highest raw accuracy, but DistilGPT-2 emerges as the most practical choice for resource-constrained applications involving limited memory or computational power: it provides a strategic balance of comparable performance and rapid training, although challenges remain in distinguishing ambiguous neutral comments.
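The evaluation protocol described above (weighted F1-scores compared across models, with a one-way ANOVA significance test) can be sketched as follows. This is an illustrative sketch, not the authors' code: the label encoding, predictions, and per-run scores below are hypothetical placeholders.

```python
# Sketch of the two evaluation steps described in the abstract, using
# standard sklearn/scipy APIs. All data values here are hypothetical.
from sklearn.metrics import f1_score
from scipy.stats import f_oneway

# Hypothetical predictions over the three classes
# (0 = Negative, 1 = Neutral, 2 = Positive).
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

# Weighted F1 averages per-class F1 weighted by class support,
# appropriate for the dataset's roughly balanced three classes.
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
print(f"weighted F1: {weighted_f1:.2f}")

# Hypothetical per-run F1-scores for the three models; a one-way
# ANOVA tests whether the mean scores differ significantly.
bilstm_runs  = [0.71, 0.70, 0.72]
roberta_runs = [0.77, 0.76, 0.78]
gpt2_runs    = [0.75, 0.74, 0.76]
stat, p = f_oneway(bilstm_runs, roberta_runs, gpt2_runs)
print(f"ANOVA F = {stat:.2f}, p = {p:.4f}")
# With the study's actual per-run scores, p > 0.05 corresponds to the
# reported finding that the differences are not statistically significant.
```

The key design point is that a raw gap in F1 (0.77 vs. 0.71) only supports a significance claim once variance across runs is accounted for, which is what the ANOVA step provides.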
Copyright © 2026