Sentiment analysis is a core task in natural language processing that identifies and classifies the opinions or emotions expressed in text. This study compares four classification algorithms, namely Naïve Bayes, Support Vector Machine (SVM), Random Forest, and Long Short-Term Memory (LSTM), on 25,000 English-language movie reviews with balanced sentiment labels. Preprocessing consists of text cleaning and tokenization, followed by TF-IDF vectorization for the traditional models. For LSTM, randomly initialized (self-trained) embeddings and pre-trained embeddings are tested alongside TF-IDF input. Results, evaluated with accuracy, F1-score, and confusion matrices, show that SVM performs best at 89% accuracy, followed by Naïve Bayes and LSTM at 86% each and Random Forest at 82%. LSTM performs poorly with TF-IDF input or self-trained embeddings but improves significantly with pre-trained embeddings. These findings indicate that traditional models, especially SVM, remain highly effective for sentiment analysis on moderately sized datasets, whereas LSTM needs a suitable text representation to perform competitively.
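As an illustration of the evaluation metrics named above, the following minimal sketch computes a binary confusion matrix, accuracy, and F1-score in plain Python. The function names (`confusion_counts`, `accuracy`, `f1_score`) and the toy labels are assumptions made for this example only; they are not the study's code or data, and the study may have used a library implementation instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels; `positive` marks the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels (hypothetical, not results from the study): 1 = positive review, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy(tp, fp, fn, tn))
print("F1:", f1_score(tp, fp, fn))
```

On a balanced dataset such as the one described here, accuracy and F1-score tend to move together; reporting the confusion matrix alongside them shows whether errors fall mainly on positive or negative reviews.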
Copyrights © 2026