Social media platforms have emerged as essential channels for real-time crisis communication, offering valuable insights into public sentiment and humanitarian needs during emergencies. This study benchmarks state-of-the-art deep learning models for classifying sentiment and humanitarian relevance in crisis-related tweets. Using the publicly available datasets CrisisMMD, HumAID, and CrisisBench, we evaluate three architectures: IDBO-CNN-BiLSTM, BERTweet, and CrisisTransformers. The models were assessed with cross-validation and standard performance metrics (accuracy, F1-score, precision, and recall). Results indicate that CrisisTransformers outperform both traditional CNN-LSTM hybrids and general-purpose transformers, achieving an accuracy of 0.861 and an F1-score of 0.847. Domain-specific pretraining significantly enhances contextual understanding, particularly for multilingual and ambiguous tweets. While transformer models offer superior classification performance, their computational complexity poses challenges for real-time deployment. In addition, operational risks such as data bias and misinformation require careful management through structured human oversight and the integration of explainable-AI mechanisms. This research provides a robust comparison of NLP models for crisis applications and recommends strategies for effective deployment, including bias mitigation and fairness-aware learning. The findings contribute to building ethical and efficient NLP systems for humanitarian response.
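As an illustration of the evaluation metrics named above (accuracy, precision, recall, F1-score), the sketch below computes them for a toy binary "humanitarian-relevant" labeling task. The labels and the `classification_metrics` helper are illustrative assumptions, not data or code from the study.

```python
# Illustrative metric computation for a binary classification task
# (1 = humanitarian-relevant tweet, 0 = not relevant). Toy data only.

def classification_metrics(y_true, y_pred):
    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy predictions, not results from the benchmarked models.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

The paper additionally reports these metrics under cross-validation, i.e., averaged over repeated train/test splits rather than a single split as in this sketch.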
Copyright © 2025