Background of study: The impact of online reviews on consumer behavior is especially relevant in the hospitality industry, yet the sentiment of these reviews is difficult to determine due to their subjectivity, disparate writing styles, and the pronounced class imbalance that arises because positive reviews outnumber negative and neutral ones. Standard machine learning approaches are biased toward the majority class and handle these problems poorly.

Aims and scope of paper: The present research uses BERT and LSTM deep learning models to classify hotel customer reviews into three categories: positive, neutral, and negative. The main focus is to analyze the models' performance on sentiment prediction and their handling of the data imbalance problem, and to benchmark the models with and without under-sampling.

Methods: A dataset comprising 20,000 reviews from the TripAdvisor platform was preprocessed through removal of stop words and special characters, tokenization, stemming, and lemmatization. The star ratings attached to the reviews were aggregated into sentiment labels: 4-5 stars as positive, 3 stars as neutral, and 1-2 stars as negative. Random under-sampling was applied to the positive class to balance the dataset. The BERT (bert-base-uncased) and LSTM models were trained with an 80:20 train-validation split and evaluated on the standard metrics of accuracy, precision, recall, and F1 score, using 5-fold cross-validation.

Results: Without under-sampling, BERT achieved the best overall performance, with an accuracy of 0.86, an F1 score of 0.93 for the positive class, and an F1 score of 0.79 for the negative class. However, both models struggled with neutral sentiment (F1 score: BERT 0.43, LSTM 0.25).
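The labeling and balancing steps described in the Methods section can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the stop-word list, column names, and toy rows are assumptions, and the full preprocessing (complete stop-word list, stemming, lemmatization) is omitted.

```python
import re
import pandas as pd

# Tiny illustrative stop-word list; a real pipeline would use a full list
# (e.g. NLTK's) plus stemming and lemmatization, omitted here for brevity.
STOP_WORDS = {"the", "a", "an", "it", "was", "is", "and", "very"}

def preprocess(text: str) -> str:
    # Lowercase, strip special characters, tokenize on whitespace,
    # and drop stop words.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

def label_sentiment(rating: int) -> str:
    # Aggregate star ratings: 4-5 -> positive, 3 -> neutral, 1-2 -> negative
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"

# Toy rows standing in for the 20,000 TripAdvisor reviews (illustrative only)
df = pd.DataFrame({
    "review": ["The room was very clean!", "An average stay.",
               "Terrible service...", "Lovely staff and a great pool",
               "It was fine.", "Quiet and comfortable"],
    "rating": [5, 3, 1, 4, 3, 5],
})
df["clean"] = df["review"].apply(preprocess)
df["sentiment"] = df["rating"].apply(label_sentiment)

# Random under-sampling: cap every class at the smallest class's size
n_min = df["sentiment"].value_counts().min()
balanced = df.groupby("sentiment").sample(n=n_min, random_state=42)
```

After this step each sentiment class contributes the same number of rows, which is what trades majority-class precision for minority-class recall in the results reported below.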
Under-sampling improved neutral-class recall (BERT: 0.79) but decreased overall accuracy (BERT: 0.73; LSTM: 0.67) and positive-class precision.

Conclusion: BERT generally outperforms LSTM for hotel review sentiment analysis, particularly on imbalanced data. While under-sampling helps address class imbalance by improving neutral recall, it incurs significant performance trade-offs, reducing overall accuracy and majority-class precision due to information loss. Future work should explore advanced resampling techniques (SMOTE, ADASYN) or transfer learning with other pretrained models (RoBERTa, XLNet) for better balance and improved neutral sentiment classification.