Jurnal Sistem Informasi dan Informatika (SIMIKA)
Vol. 9 No. 1 (2026): Jurnal Sistem Informasi dan Informatika (Simika)

IMBALANCED DATA HANDLING FOR OPTIMIZING RANDOM FOREST IN SENTIMENT ANALYSIS OF EAST JAVA GUBERNATORIAL ELECTION

Rahma Putri Widyaiswari (Telkom University)
Anisa Dzulkarnain (Telkom University)
Alqis Rausanfita (Telkom University)

Article Info

Publish Date
09 Feb 2026

Abstract

Social media has become a strategic platform for expressing public opinion, especially during Regional Head Elections (Pilkada). The large volume of opinion data it produces creates opportunities to apply sentiment analysis to map public perception. A major challenge in sentiment classification is class imbalance, which can degrade model accuracy, particularly in recognizing minority classes. This study analyzes the impact of data balancing techniques on the performance of a Random Forest sentiment classification model for the 2024 East Java Regional Election. The research pipeline comprises data preprocessing, manual sentiment labeling, text preprocessing, TF-IDF term weighting, and model training on three data split ratios: 90:10, 80:20, and 70:30. Each ratio was tested under three scenarios: no balancing (baseline), undersampling with Tomek Links, and oversampling with Borderline-SMOTE. Across all scenarios, Borderline-SMOTE achieved the highest accuracy, 82.40% at the 80:20 ratio, an increase of 2.19% over the unbalanced baseline at the same ratio. These results show that data balancing can improve the model's ability to classify sentiment more proportionally across classes.
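The abstract names the two resampling methods but not how they were implemented. The following is a minimal, dependency-free Python sketch of both, assuming Euclidean distance on small dense feature vectors (real TF-IDF matrices are sparse and far larger), the "borderline-1" variant of Borderline-SMOTE, and neighbour counts m=5 and k=3 chosen only to suit the toy data. The helper names (`nearest`, `tomek_links_undersample`, `borderline_smote`) and the data are hypothetical; this illustrates the techniques and is not the authors' code. A real pipeline would more likely call library implementations, such as imbalanced-learn's `TomekLinks` and `BorderlineSMOTE`, on the TF-IDF output before fitting a Random Forest.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(X, i, k, candidates=None):
    """Indices of the k nearest neighbours of X[i], excluding i itself.

    If `candidates` is given, neighbours are searched only among those indices.
    """
    pool = range(len(X)) if candidates is None else candidates
    others = [j for j in pool if j != i]
    return sorted(others, key=lambda j: euclidean(X[i], X[j]))[:k]


def tomek_links_undersample(X, y):
    """Hypothetical helper: drop every non-minority sample in a Tomek link.

    A Tomek link is a pair of samples with different labels that are each
    other's single nearest neighbour.
    """
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    nn1 = [nearest(X, i, 1)[0] for i in range(len(X))]
    drop = {
        i for i, j in enumerate(nn1)
        if nn1[j] == i and y[i] != y[j] and y[i] != minority
    }
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def borderline_smote(X, y, target, n_new, m=5, k=3, seed=0):
    """Hypothetical helper: Borderline-SMOTE-1 style oversampling of `target`.

    Only "danger" samples (at least half, but not all, of their m nearest
    neighbours belong to other classes) are used as seeds. Each synthetic point
    is interpolated between a seed and one of its k nearest same-class
    neighbours.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == target]
    if len(minority_idx) < 2:
        return list(X), list(y)
    danger = []
    for i in minority_idx:
        n_other = sum(1 for j in nearest(X, i, m) if y[j] != target)
        if m / 2 <= n_other < m:  # all m neighbours foreign => noise, skipped
            danger.append(i)
    if not danger:
        return list(X), list(y)
    neighbours = {i: nearest(X, i, k, minority_idx) for i in danger}
    synthetic = []
    for t in range(n_new):
        i = danger[t % len(danger)]  # this sketch cycles through the seeds
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return list(X) + synthetic, list(y) + [target] * n_new


# Toy 2-D vectors standing in for TF-IDF features (hypothetical data).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.5, 0],
     [-1, -1], [-1, 0], [0, -1],
     [1.2, 1.2], [1.3, 1.1], [3, 3], [3.1, 3], [3, 3.1]]
y = ["neg"] * 9 + ["pos"] * 5

counts = Counter(y)
X_os, y_os = borderline_smote(X, y, "pos", counts["neg"] - counts["pos"])
print(Counter(y_os))  # both classes now have 9 samples

X_t = [[0.0], [0.4], [1.0], [1.1], [5.0]]
y_t = ["neg", "neg", "neg", "pos", "pos"]
print(tomek_links_undersample(X_t, y_t))
# ([[0.0], [0.4], [1.1], [5.0]], ['neg', 'neg', 'pos', 'pos'])
```

Two design points follow from the definitions rather than from the study. Tomek Links removes only samples sitting directly on a class boundary, so it typically changes class counts only slightly, whereas Borderline-SMOTE can raise the minority class to any target count. Resampling is usually applied to the training portion of each split only, so that the test set keeps its original class distribution; the abstract does not state how the study handled this.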

Copyright © 2026


Journal Info

Abbrev

jsii

Subject

Computer Science & IT; Control & Systems Engineering

Description

Jurnal Sistem Informasi dan Informatika aims to provide scientific literature specifically on studies of applied research in information systems (IS), information technology (IT) and public review of the development of theory, method, and applied sciences related to the ...