This research investigates four feature extraction techniques (TF-IDF, Smoothed TF-IDF, Inverse Term Counting (ITC), and ITC Smoothed) to determine how effectively they enhance text-based emotion classification on imbalanced datasets. The study also seeks to identify the most effective pairing of feature extraction method and classification algorithm. Its key contributions are a systematic side-by-side comparison of these lesser-examined TF-IDF variants and empirical evidence that linear models are resilient to class imbalance. The analysis used an Indonesian Twitter dataset of 4,132 tweets, categorized into six unequally distributed emotion classes: anger, fear, joy, love, sadness, and neutral. The four feature extraction approaches were evaluated with five classifiers: Naive Bayes, Logistic Regression, SVM, Random Forest, and KNN. Performance was measured by accuracy, precision, recall, and F1-score. The findings indicate that the linear classifiers, Logistic Regression and SVM, performed best, achieving accuracies between 93.71% and 94.44%. These models consistently outperformed both the probabilistic and the distance-based algorithms regardless of the feature extraction method applied. The effect of smoothing proved context-dependent: smoothing TF-IDF improved the performance of linear models over the unsmoothed version, whereas smoothing ITC reduced accuracy relative to standard ITC. This outcome questions the widely held belief that smoothing universally enhances model performance. The combination of Logistic Regression with unsmoothed ITC yielded the peak accuracy of 94.44%. The study offers practical guidance, recommending the pairing of Logistic Regression with ITC as a highly effective strategy for text-based emotion classification.
It also contributes theoretically by underscoring the particular aptitude of linear models for handling high-dimensional text data under class imbalance.
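To make the role of smoothing concrete, the following minimal sketch contrasts unsmoothed and smoothed inverse document frequency weights on a toy corpus. It assumes the convention used by scikit-learn (unsmoothed: ln(N/df) + 1; smoothed: ln((1 + N)/(1 + df)) + 1), which may differ from the exact formulas used in the study. The function names and the three toy documents are hypothetical and are not drawn from the dataset. ITC is omitted because its definition is not given here.

```python
import math
from collections import Counter


def idf_weights(docs, smooth=False):
    """Return {term: idf} for a list of tokenized documents.

    Assumed formulas (scikit-learn convention, not confirmed by the study):
      unsmoothed: ln(N / df) + 1
      smoothed:   ln((1 + N) / (1 + df)) + 1
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    if smooth:
        return {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return {t: math.log(n_docs / d) + 1 for t, d in df.items()}


def tfidf_vector(doc, idf):
    """Weight raw term counts in one document by the given idf table."""
    tf = Counter(doc)
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


# Hypothetical toy corpus (made-up tokens, not taken from the dataset).
docs = [
    "aku senang sekali hari ini".split(),
    "aku takut sekali".split(),
    "aku marah".split(),
]

plain = idf_weights(docs)
smoothed = idf_weights(docs, smooth=True)

# Compare the weight of a term found in every document ("aku")
# with a term found in only one document ("marah").
for term in ("aku", "marah"):
    print(term, round(plain[term], 3), round(smoothed[term], 3))
```

Under these assumed formulas, smoothing lowers the weight of rare terms while leaving a term that occurs in every document unchanged, so the gap between rare and common terms narrows. That is one plausible reason smoothing can help some feature and classifier pairings and hurt others.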
Copyrights © 2026