This paper discusses the critical pre-processing steps for effective sentiment analysis (SA) in the educational domain, particularly when working with data from instant messaging applications such as WhatsApp and Telegram. Because these platforms generate noisy, unstructured, and multilingual messages containing textisms, emojis, and mixed-language expressions, proper data preparation is essential for reliable analytical outcomes. The primary goal of this work is to identify and validate pre-processing techniques that improve model performance on such data. To this end, we performed a systematic literature review (SLR) to establish best practices in text pre-processing for SA of informal, user-generated content. The techniques identified through the SLR, namely textism normalization, stop word removal, punctuation removal, stemming, translation of mixed-language text, and tokenization, were then applied to a dataset gathered from educational subject groups. Applying these techniques raised BERT model accuracy from 0.705 to 0.893. These results underscore the need for well-designed pre-processing pipelines when handling multilingual and unstructured text in educational communication channels. However, the study is limited to text data from WhatsApp and Telegram and covers only the Malay and English languages. Further studies could explore other languages, platforms, and more advanced normalization processes to continue enhancing the predictive capacity of pre-processing strategies for sentiment analysis across a wider array of educational contexts.
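The pipeline of techniques listed above can be sketched in a minimal, illustrative form. The textism dictionary, stopword list, and naive suffix-stripping stemmer below are toy stand-ins invented for this sketch; a real pipeline would rely on curated lexicons and established libraries (e.g. NLTK for stemming and stop words, a translation service for mixed-language text), and the function names are assumptions, not the paper's implementation:

```python
import re

# Toy resources for illustration only; real pipelines use curated lexicons.
TEXTISMS = {"thx": "thanks", "u": "you", "r": "are", "gr8": "great"}
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of"}

def stem(token):
    # Naive suffix stripping as a stand-in for a real stemmer (e.g. Porter).
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(message):
    text = message.lower()
    text = re.sub(r"[^\w\s]", " ", text)              # punctuation removal
    tokens = text.split()                             # tokenization
    tokens = [TEXTISMS.get(t, t) for t in tokens]     # textism normalization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stop word removal
    return [stem(t) for t in tokens]                  # stemming

print(preprocess("Thx! The lecture notes r gr8 and helpful"))
# → ['thank', 'lecture', 'note', 'are', 'great', 'helpful']
```

Translation of mixed-language (Malay–English) segments is omitted here, since it typically requires an external language-identification and translation step rather than a rule-based transform.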