Journal : Journal of Computer System and Informatics (JoSYC)

Sentiment Classification on a Limited Dataset Using Random Forest and Word2Vec (Klasifikasi Sentimen pada Dataset Terbatas Menggunakan Random Forest dan Word2Vec)
Fitri, Dina Deswara; Agustian, Surya; Pizaini, Pizaini; Sanjaya, Suwanto
Journal of Computer System and Informatics (JoSYC) Vol 6 No 1 (2024): November 2024
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/josyc.v6i1.6246

Abstract

Measuring the sentiment of public opinion on social media is essential for understanding societal views on various issues, including public figures and political events. This research explores the effectiveness of the Random Forest algorithm with Word2Vec-based word representation for sentiment classification on a limited dataset. The case study uses tweets about Kaesang Pangarep as the Chairman of PSI, supplemented by external data on Covid-19 and general topics. The dataset was preprocessed with cleaning, case folding, stopword removal, stemming, and tokenization. Words were represented using a Word2Vec model with the Continuous Bag of Words (CBOW) architecture and a vector dimension of 500. Random Forest was then used to classify sentiment as positive, negative, or neutral. In the initial phase, the model was trained on 300 samples per label, which gave unsatisfactory performance: an F1-score of 49.00% and an accuracy of 50.00%. To improve performance, the dataset was expanded with 900 samples from the Kaesang topic and 1,080 samples from external topics. With the expanded data, the model reached an F1-score of 49.89%, an accuracy of 58.29%, a precision of 49.16%, and a recall of 56.47%, so accuracy rose by about 8 percentage points while the F1-score rose by less than one point. This research indicates that Random Forest with Word2Vec word representations can improve sentiment classification performance, even with a limited dataset, and contributes to the development of sentiment analysis techniques in machine learning.
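The abstract reports accuracy, precision, recall, and F1-score for a three-class task but does not say how the per-class scores were averaged. The sketch below shows one common choice, macro-averaging, in plain Python. Macro-averaging is an assumption here, not something the abstract confirms, and the function name `evaluate` and the toy labels are hypothetical. It also illustrates why a macro F1-score need not equal the harmonic mean of the reported macro precision and recall: it is the mean of the per-class F1 values.

```python
from collections import Counter

# The three sentiment classes named in the abstract.
LABELS = ("positive", "negative", "neutral")


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption; the abstract does not state
    which averaging scheme the authors used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,  # mean of per-class F1, not F1 of the means
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not the paper's data.
    y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
    y_pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]
    for name, value in evaluate(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

Under this scheme a class that the model rarely predicts correctly pulls the macro F1-score down even when overall accuracy improves, which is one possible reading of the gap between the reported accuracy gain and the nearly unchanged F1-score.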