Background of study: Online reviews strongly shape consumer decisions, particularly in the hospitality industry, yet accurately extracting sentiment from them remains challenging due to the subjectivity of language, varied expression styles, and pronounced class imbalance, where positive reviews far outnumber negative and neutral ones. Traditional machine learning methods often fail to handle these issues effectively, biasing predictions toward the majority class.

Aims and scope of paper: This study employs BERT and LSTM deep learning models to classify hotel reviews into positive, neutral, and negative sentiment categories. The primary aim is to compare their performance in sentiment analysis and in handling imbalanced data, evaluating both models with and without under-sampling.

Methods: A dataset of 20,000 TripAdvisor reviews was pre-processed with stop-word and special-character removal, tokenization, stemming, and lemmatization. Star ratings were mapped to sentiment labels: 4-5 as positive, 3 as neutral, and 1-2 as negative. Random under-sampling was applied to the majority (positive) class to balance the dataset. BERT (bert-base-uncased) and LSTM models were trained with an 80:20 training-validation split and evaluated using accuracy, precision, recall, and F1-score under 5-fold cross-validation.

Results: BERT without under-sampling achieved the highest overall accuracy of 0.86, with strong F1-scores for the positive (0.93) and negative (0.79) classes. However, all models struggled with neutral sentiment (BERT F1-score: 0.43; LSTM: 0.25). Under-sampling improved neutral-class recall (BERT: 0.79) but lowered overall accuracy (BERT: 0.73; LSTM: 0.67) and positive-class precision.

Conclusion: BERT generally outperforms LSTM for hotel review sentiment analysis, particularly on imbalanced data. While under-sampling mitigates class imbalance by improving neutral recall, it incurs significant trade-offs, reducing overall accuracy and majority-class precision due to information loss. Future work should explore advanced resampling techniques (SMOTE, ADASYN) or transfer learning with other pretrained models (RoBERTa, XLNet) for better balance and improved neutral sentiment classification.
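
The label-mapping and under-sampling steps described in Methods can be illustrated with a minimal Python sketch. Column names (`review`, `rating`), the file name, and the random seed are assumptions for illustration, not details from the paper; the sampling shown here is a common variant that shrinks every class to the minority-class size, which in this dataset primarily reduces the positive majority class.

```python
# Sketch of rating-to-label mapping and random under-sampling.
# Column names and file name are hypothetical, not from the paper.
import pandas as pd

def rating_to_sentiment(stars: int) -> str:
    """Map star ratings to labels: 4-5 positive, 3 neutral, 1-2 negative."""
    if stars >= 4:
        return "positive"
    if stars == 3:
        return "neutral"
    return "negative"

df = pd.read_csv("tripadvisor_reviews.csv")  # assumed file name
df["sentiment"] = df["rating"].apply(rating_to_sentiment)

# Random under-sampling: draw the same number of rows from each class,
# equal to the size of the smallest (minority) class.
min_count = df["sentiment"].value_counts().min()
balanced = df.groupby("sentiment").sample(n=min_count, random_state=42)
```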
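Fine-tuning bert-base-uncased on the three sentiment classes could then proceed as below, continuing the sketch above. This uses the Hugging Face `transformers` Trainer API as one plausible setup; the 80:20 split matches the Methods description, while batch sizes, epoch count, and sequence length are illustrative assumptions.

```python
# Hedged sketch of fine-tuning bert-base-uncased for 3-class sentiment.
# Hyperparameters are illustrative, not the paper's reported settings.
import torch
from sklearn.model_selection import train_test_split
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = {"negative": 0, "neutral": 1, "positive": 2}

texts = balanced["review"].tolist()
labels = [LABELS[s] for s in balanced["sentiment"]]
# 80:20 training-validation split, stratified to preserve class ratios.
train_x, val_x, train_y, val_y = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

class ReviewDataset(torch.utils.data.Dataset):
    """Wraps tokenized reviews and labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             max_length=128)  # assumed max length
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)

args = TrainingArguments(output_dir="bert-sentiment",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         per_device_eval_batch_size=32)
trainer = Trainer(model=model, args=args,
                  train_dataset=ReviewDataset(train_x, train_y),
                  eval_dataset=ReviewDataset(val_x, val_y))
trainer.train()
```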
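The per-class precision, recall, and F1-scores reported in Results can be computed with scikit-learn's `classification_report`, continuing the same sketch:

```python
# Per-class evaluation on the validation split.
import numpy as np
from sklearn.metrics import classification_report

preds = trainer.predict(ReviewDataset(val_x, val_y))
pred_labels = np.argmax(preds.predictions, axis=-1)
print(classification_report(val_y, pred_labels,
                            target_names=["negative", "neutral", "positive"]))
```

Running such a report per class is what exposes the neutral-class weakness the paper highlights: overall accuracy can stay high while the neutral F1-score remains low.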