Garuda - Garba Rujukan Digital

ILKOM Jurnal Ilmiah

Vol 18, No 1 (2026)

Garonga, Melki (Unknown)
Rangga Punne, Mc Rore (Unknown)
Damayanti, Irene Devi (Unknown)

Publish Date
20 Apr 2026

This research investigates four feature extraction techniques (TF-IDF, Smoothed TF-IDF, Inverse Term Counting (ITC), and ITC Smoothed) to determine how effectively they improve text-based emotion classification on imbalanced datasets. The study also seeks to identify the most effective pairing of feature extraction method and classification algorithm. Its key contributions are a systematic side-by-side comparison of these less-studied TF-IDF variants and empirical evidence that linear models are resilient to class imbalance. The analysis used an Indonesian Twitter dataset of 4,132 tweets, categorized into six unequally distributed emotional states: anger, fear, joy, love, sadness, and neutrality. The four feature extraction approaches were evaluated with five classifiers: Naive Bayes, Logistic Regression, SVM, Random Forest, and KNN. Performance was measured by accuracy, precision, recall, and F1-score. The findings indicate that the linear classifiers, Logistic Regression and SVM, performed best, with accuracy between 93.71% and 94.44%. These models consistently outperformed both the probabilistic and the distance-based algorithms regardless of the feature extraction method applied. The impact of smoothing proved context-dependent: while applying smoothing to both TF-IDF and ITC boosted the performance of linear models over their unsmoothed counterparts, it reduced accuracy for the standard ITC method. This outcome questions the widely held belief that smoothing universally improves model performance. The combination of Logistic Regression with the ITC Smoothed method yielded the peak accuracy of 94.44%. The study offers practical guidance, suggesting the pairing of Logistic Regression with ITC as a highly effective strategy for text-based emotion classification. It also contributes theoretically by underscoring the particular aptitude of linear models for handling high-dimensional text data under class imbalance.
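To make the smoothed versus unsmoothed distinction concrete, the following is a minimal sketch of how IDF smoothing changes term weights. It is not the authors' implementation: the abstract does not give the paper's formulas, so the unsmoothed form uses the textbook log(N / df) and the smoothed form uses the log((1 + N) / (1 + df)) + 1 convention documented by scikit-learn. ITC is not shown because the abstract does not define it. The function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs, smooth=False):
    """Return {term: idf} for a list of tokenized documents.

    Assumed formulas (not taken from the paper):
      unsmoothed: log(N / df)
      smoothed:   log((1 + N) / (1 + df)) + 1
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    if smooth:
        return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    return {t: math.log(n_docs / d) for t, d in df.items()}


def tfidf_vector(tokens, idf):
    """Weight raw term counts of one document by the given IDF table."""
    tf = Counter(tokens)
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


# Hypothetical toy corpus, invented for illustration only.
docs = [
    "aku senang sekali hari ini".split(),
    "aku takut sekali".split(),
    "aku marah".split(),
]

plain = idf_weights(docs, smooth=False)
smoothed = idf_weights(docs, smooth=True)

# A term present in every document gets weight 0 without smoothing,
# so it vanishes from the feature vector; smoothing keeps it nonzero.
print(plain["aku"], smoothed["aku"])
print(tfidf_vector(docs[2], plain))
print(tfidf_vector(docs[2], smoothed))
```

Under these assumed formulas, smoothing keeps terms that occur in every document in the feature space and narrows the gap between rare and common terms, which is one plausible reason its effect could differ across classifiers.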

Journal Info

ILKOM Jurnal Ilmiah

Abbrev: ILKOM

Publisher: Universitas Muslim Indonesia

Subject: Computer Science & IT

Description: ILKOM Jurnal Ilmiah is an Indonesian scientific journal published by the Department of Information Technology, Faculty of Computer Science, Universitas Muslim Indonesia. ILKOM Jurnal Ilmiah covers all aspects of the latest outstanding research and developments in the field of Computer science, ...


Optimization of Text Emotion Classification through the Combination of ITC Smoothed and Linear Models
