TEKNIK INFORMATIKA
Vol 17, No 1: JURNAL TEKNIK INFORMATIKA

A Comparative Analysis of Random Forest, XGBoost, and LightGBM Algorithms for Emotion Classification in Reddit Comments

Nenny Anggraini (Syarif Hidayatullah State Islamic University Jakarta)
Syopiansyah Jaya Putra (Syarif Hidayatullah State Islamic University Jakarta)
Luh Kesuma Wardhani (Syarif Hidayatullah State Islamic University Jakarta)
Farid Dhiya Ul Arif (Syarif Hidayatullah State Islamic University Jakarta)
Nashrul Hakiem (Syarif Hidayatullah State Islamic University Jakarta)
Imam Marzuki Shofi (Syarif Hidayatullah State Islamic University Jakarta)



Article Info

Publish Date
20 May 2024

Abstract

This study compares the performance of three classification algorithms, Random Forest, XGBoost, and LightGBM, in classifying emotions in Reddit comments. Emotion classification in Reddit comments is a difficult problem because the comments vary widely in how they express emotion and are often ambiguous. The study uses the GoEmotions Fine-Grained dataset, filtered down to 7,325 Reddit comments labeled with 5 basic emotions. The method consists of data preprocessing, feature extraction using CountVectorizer and TF-IDF, and hyperparameter tuning with GridSearchCV for each algorithm. The models are then evaluated using cross-validation and a confusion matrix. The results indicate that Random Forest outperforms both XGBoost and LightGBM, with an accuracy of 75.38% compared with 69.05% for XGBoost and 66.63% for LightGBM.
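
The workflow named in the abstract (CountVectorizer and TF-IDF features, GridSearchCV tuning, cross-validation, and a confusion matrix) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code: the toy comments, the three placeholder labels, the parameter grid, the two-fold split, and the choice to chain CountVectorizer into TfidfTransformer are all assumptions made for the example, and only Random Forest is shown.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline

# Hypothetical toy comments and placeholder labels, for illustration only.
# The study itself uses 7,325 filtered GoEmotions comments with 5 labels.
texts = [
    "this made my day, thank you so much",
    "i love this community, you are all great",
    "so happy to finally see this happen",
    "best news i have heard all week",
    "this is outrageous and i am furious",
    "stop spamming, this makes me so mad",
    "i hate how the mods handled this",
    "what an infuriating and rude reply",
    "i miss my old dog every single day",
    "this story left me in tears",
    "feeling really down and lonely tonight",
    "so sad to hear about your loss",
]
labels = ["joy"] * 4 + ["anger"] * 4 + ["sadness"] * 4

# Raw token counts, reweighted by TF-IDF, then fed to the classifier.
pipeline = Pipeline([
    ("counts", CountVectorizer(lowercase=True)),
    ("tfidf", TfidfTransformer()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Assumed, deliberately small grid; the paper's actual grid is not given here.
param_grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [None, 10]}

# Stratified folds keep every label present in each training split.
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(texts, labels)

# Out-of-fold predictions from the tuned pipeline feed the confusion matrix.
predictions = cross_val_predict(search.best_estimator_, texts, labels, cv=cv)
class_names = sorted(set(labels))
cm = confusion_matrix(labels, predictions, labels=class_names)
accuracy = accuracy_score(labels, predictions)

print("best parameters:", search.best_params_)
print("confusion matrix (rows = true, columns = predicted):")
print(cm)
print(f"cross-validated accuracy: {accuracy:.2%}")
```

Accuracy here equals the sum of the confusion matrix diagonal divided by the total number of comments. Because the same folds are used for tuning and for evaluation, the estimate is optimistic; a held-out test set or nested cross-validation would give a fairer number. The `clf` step could likely be swapped for `xgboost.XGBClassifier` or `lightgbm.LGBMClassifier`, assuming those packages are installed; recent XGBoost versions expect integer-encoded labels.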

Copyright © 2024






Journal Info

Abbrev

ti

Publisher

Subject

Computer Science & IT

Description

Jurnal Teknik Informatika is a venue for researchers, lecturers, practitioners, students, and other members of the scientific community to publish articles reporting research, engineering work, and studies in the field of Information Technology. Jurnal Teknik Informatika is published 2 (two) times in ...