Fauzan Ray T
Unknown Affiliation

Published: 1 document
Articles

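The Effect of Data Aggregation on Sentiment Classification for Limited Datasets Using the SGD Classifier (Pengaruh Agregasi Data pada Klasifikasi Sentimen untuk Dataset Terbatas Menggunakan SGD Classifier)
Fauzan Ray T; Surya Agustian; Febi Yanto; Pizaini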
Computer Science and Information Technology Vol 5 No 3 (2024): Jurnal Computer Science and Information Technology (CoSciTech)
Publisher : Universitas Muhammadiyah Riau

Abstract

Social media, especially Twitter (now X), is a rich source of data for sentiment analysis. However, limited datasets are a major obstacle to applying machine learning, particularly when fast and accurate sentiment analysis is required. This research applies data aggregation to expand the training dataset and tests various preprocessing steps: cleaning, case folding, normalization, stemming, and lexicon-based methods. Classification uses a Stochastic Gradient Descent (SGD) classifier, with text represented by word embeddings generated from the FastText language model. Lexicon-based preprocessing, particularly for emoji and emoticon handling, has a significant impact once data is added, because it captures additional emotion and context that conventional text analysis often overlooks. Experimental results show that adding data and optimizing preprocessing improved the F1 score from a baseline of 40% to 52.13%, surpassing the organizer's score of 51.28%. These findings underline the importance of data aggregation, preprocessing optimization, and parameter tuning with grid search for improving sentiment classification on limited datasets.
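
The preprocessing steps named in the abstract (cleaning, case folding, and lexicon-based handling of emoji and emoticons) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon entries, the regular expressions, and the function names `replace_emoticons` and `preprocess` are assumptions made for illustration, and normalization and stemming are omitted because the abstract does not say which tools were used.

```python
import re

# Hypothetical mini-lexicon mapping emoji and emoticons to sentiment words.
# The lexicon used in the paper is not given in the abstract.
EMOTICON_LEXICON = {
    ":)": "happy",
    ":(": "sad",
    ":D": "laugh",
    "😊": "happy",
    "😡": "angry",
    "😭": "cry",
}

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
NON_WORD = re.compile(r"[^a-z0-9\s]")
WHITESPACE = re.compile(r"\s+")


def replace_emoticons(text):
    """Replace each known emoji or emoticon with its lexicon word."""
    # Longest symbols first, so multi-character emoticons are matched
    # before any shorter symbol that might overlap with them.
    for symbol in sorted(EMOTICON_LEXICON, key=len, reverse=True):
        text = text.replace(symbol, f" {EMOTICON_LEXICON[symbol]} ")
    return text


def preprocess(text):
    """Apply cleaning, lexicon replacement, and case folding to one tweet."""
    text = URL_OR_MENTION.sub(" ", text)  # cleaning: drop URLs and @mentions
    text = replace_emoticons(text)        # run before lowercasing so ":D" still matches
    text = text.lower()                   # case folding
    text = NON_WORD.sub(" ", text)        # drop leftover punctuation and symbols
    return WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    print(preprocess("Great service @shop :D 😊 https://t.co/abc"))
```

Running the lexicon step before punctuation removal is the key design choice here: if punctuation were stripped first, emoticons such as `:(` would disappear before they could be mapped to a sentiment word, which is the information the abstract reports as most useful once more data is added.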