Jurnal Computer Science and Information Technology (CoSciTech)
Vol 5 No 3 (2024): Jurnal Computer Science and Information Technology (CoSciTech)

The Effect of Data Aggregation on Sentiment Classification for Limited Datasets Using the SGD Classifier

Fauzan Ray T
Surya Agustian
Febi Yanto
Pizaini



Article Info

Publish Date
27 Dec 2024

Abstract

Social media, especially Twitter (now X), is a rich source of data for sentiment analysis. However, limited dataset size is a major challenge for machine learning approaches that aim to produce fast and accurate sentiment analysis. This research applies data aggregation to expand the training dataset and tests various preprocessing steps: cleaning, case folding, normalization, stemming, and lexicon-based methods. Classification uses the Stochastic Gradient Descent (SGD) Classifier, with text represented by word embeddings generated from the FastText language model. Lexicon-based preprocessing, particularly emoji and emoticon handling, has a significant impact once data is added, because it captures additional emotion and context that conventional text analysis often overlooks. Experimental results show that adding data and optimizing preprocessing improved the F1 score from a baseline of 40% to 52.13%, surpassing the organizer's result of 51.28%. These findings emphasize the importance of data aggregation, preprocessing optimization, and parameter tuning with grid search for improving text sentiment classification on limited datasets.
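The preprocessing and aggregation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The lexicon entries, the function names `preprocess` and `aggregate`, and the choice to drop duplicate tweets after cleaning are all assumptions made for illustration; normalization, stemming, FastText embedding, and the SGD classifier itself are omitted.

```python
import re

# Hypothetical toy lexicons; the lexicon used in the paper is not given here.
EMOJI_LEXICON = {"😀": "happy", "😢": "sad", "😡": "angry"}
EMOTICON_LEXICON = {":)": "happy", ":(": "sad", ":D": "happy"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")
SPACE_RE = re.compile(r"\s+")


def replace_symbols(text, lexicon):
    """Replace each lexicon key with its sentiment word, longest keys first."""
    for symbol in sorted(lexicon, key=len, reverse=True):
        text = text.replace(symbol, f" {lexicon[symbol]} ")
    return text


def preprocess(text):
    """Cleaning, lexicon-based symbol mapping, then case folding."""
    text = URL_RE.sub(" ", text)       # cleaning: remove links
    text = MENTION_RE.sub(" ", text)   # cleaning: remove user mentions
    text = replace_symbols(text, EMOJI_LEXICON)
    # Emoticons are mapped before case folding because ":D" is case-sensitive.
    text = replace_symbols(text, EMOTICON_LEXICON)
    text = text.lower()                # case folding
    text = NON_WORD_RE.sub(" ", text)  # drop leftover punctuation and symbols
    return SPACE_RE.sub(" ", text).strip()


def aggregate(*datasets):
    """Merge several (text, label) datasets into one training set.

    Assumption: a tweet whose cleaned text was already seen is skipped,
    so the first label encountered for that text wins.
    """
    seen = set()
    merged = []
    for dataset in datasets:
        for text, label in dataset:
            clean = preprocess(text)
            if clean and clean not in seen:
                seen.add(clean)
                merged.append((clean, label))
    return merged


if __name__ == "__main__":
    original = [("Great service :D @user https://t.co/x 😀", "positive")]
    extra = [("Kecewa banget :( 😢", "negative"),
             ("great service :D 😀", "positive")]  # duplicate after cleaning
    for row in aggregate(original, extra):
        print(row)
```

Mapping emoji and emoticons to words before punctuation is stripped keeps their emotional signal in the token stream, so a downstream embedding model can represent it like any other word.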

Copyrights © 2024






Journal Info

Abbrev

coscitech

Publisher

Subject

Computer Science & IT

Description

Jurnal CoSciTech (Computer Science and Information Technology) is a peer-reviewed journal published by the Informatics Engineering Study Program, Faculty of Computer Science, Universitas Muhammadiyah Riau (UMRI) since April 2020. Jurnal CoSciTech is registered with PDII LIPI under ISSN ...