Salem Abdullah Salem Garfan
Universiti Pendidikan Sultan Idris

Enhancing Sentiment Classification Accuracy Through Pre-Processing In Educational Text Messenger Data
Md Abdul Bakir; Suliana Sulaiman; Salem Abdullah Salem Garfan
bit-Tech Vol. 8 No. 2 (2025): bit-Tech
Publisher : Komunitas Dosen Indonesia

DOI: 10.32877/bt.v8i2.3377

Abstract

This paper examines the pre-processing steps required for reliable sentiment analysis (SA) in the educational domain, specifically for text data from instant messaging applications such as WhatsApp and Telegram. Because these platforms generate noisy, unstructured, and multilingual messages that include textisms, emojis, and mixed-language expressions, proper data preparation is essential for reliable analytical outcomes. The primary goal of this work is to identify and validate pre-processing approaches that improve model performance on such data. To do so, we conducted a systematic literature review (SLR) to establish best practices in text pre-processing for SA, focusing on methods applicable to informal, user-generated content. The techniques identified through the SLR, namely textism normalization, stop word removal, punctuation removal, stemming, translation of mixed-language text, and tokenization, were then applied to a dataset collected from educational subject groups. Applying these techniques raised the accuracy of the BERT model from 0.705 to 0.893, an absolute gain of 0.188. These results underscore the need for well-designed pre-processing pipelines for handling multilingual and unstructured text in educational communication channels. However, the study is limited to text data from WhatsApp and Telegram and covers only Malay and English. Future studies could explore other languages, platforms, and more advanced normalization techniques to further improve the benefit of pre-processing strategies for sentiment analysis across a range of educational contexts.
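
The pre-processing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lookup tables (TEXTISMS, MALAY_TO_ENGLISH, STOP_WORDS), the naive suffix-stripping stemmer, the function names, and the order of the steps are all assumptions made for illustration. A real pipeline would likely rely on a full textism lexicon, a translation service, and an established stemmer before passing the text to BERT.

```python
import string

# Hypothetical lookup tables, for illustration only; they are not from the paper.
TEXTISMS = {"thx": "thanks", "pls": "please", "u": "you"}
MALAY_TO_ENGLISH = {"cikgu": "teacher", "saya": "i", "tak": "not", "faham": "understand"}
STOP_WORDS = {"i", "this", "the", "a", "is", "to"}  # "not" is kept: it carries sentiment


def naive_stem(token: str) -> str:
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(message: str) -> list[str]:
    """Apply the abstract's steps in one assumed order and return the tokens."""
    # Punctuation removal (after lowercasing).
    text = message.lower().translate(str.maketrans("", "", string.punctuation))
    # Tokenization on whitespace.
    tokens = text.split()
    # Textism normalization.
    tokens = [TEXTISMS.get(t, t) for t in tokens]
    # Translation of mixed-language (Malay) words to English.
    tokens = [MALAY_TO_ENGLISH.get(t, t) for t in tokens]
    # Stop word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Stemming.
    return [naive_stem(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("Thx cikgu, saya tak faham this topic pls!!"))
    # → ['thank', 'teacher', 'not', 'understand', 'topic', 'please']
```

Word-level dictionary lookup keeps the sketch self-contained, but it cannot handle multi-word textisms, emojis, or context-dependent translation, which is where the more advanced normalization mentioned as future work would come in.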