Garuda - Garba Rujukan Digital

Anisa Dzulkarnain

Telkom University

Author-ID : 9731496

Computer Science & IT; Control & Systems Engineering

Published : 3 Documents

Articles

IMBALANCED DATA HANDLING FOR OPTIMIZING RANDOM FOREST IN SENTIMENT ANALYSIS OF EAST JAVA GUBERNATORIAL ELECTION
Rahma Putri Widyaiswari; Anisa Dzulkarnain; Alqis Rausanfita
Jurnal Sistem Informasi dan Informatika (Simika) Vol. 9 No. 1 (2026): Jurnal Sistem Informasi dan Informatika (Simika)
Publisher : Program Studi Sistem Informasi, Universitas Banten Jaya

DOI: 10.47080/simika.v9i1.4131

Social media has become a strategic platform for conveying public opinion, especially during Regional Head Elections (Pilkada). The large volume of opinion data opens up opportunities to apply sentiment analysis to map public perception. One of the main challenges in sentiment classification is imbalanced class distribution, which can degrade model accuracy, especially in recognizing minority classes. This study analyzes the impact of data balancing techniques on the performance of a sentiment classification model for the 2024 East Java Regional Election using the Random Forest algorithm. The process includes data preprocessing, manual sentiment labeling, text preprocessing, word weighting with TF-IDF, and model training on three data ratios: 90:10, 80:20, and 70:30. Each ratio was tested in three scenarios: no balancing (baseline), undersampling with the Tomek Links method, and oversampling with Borderline-SMOTE. Across all scenarios, Borderline-SMOTE gave the highest accuracy, 82.40% at the 80:20 ratio, an increase of 2.19% over the unbalanced condition at the same ratio. These results show that data balancing can improve the model's performance in classifying sentiment more proportionally.
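
The Tomek Links undersampling named in this abstract can be illustrated with a minimal sketch. A Tomek link is a pair of samples from different classes that are each other's nearest neighbors; the usual undersampling variant drops the majority-class member of each pair. The toy 2-D points, the labels, and the function names below are hypothetical and are not taken from the paper, where the features would be TF-IDF vectors and the exact configuration is not stated.

```python
from math import dist


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def undersample(points, labels, majority):
    """Drop the majority-class member of every Tomek link."""
    drop = {
        k
        for pair in tomek_links(points, labels)
        for k in pair
        if labels[k] == majority
    }
    kept = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in kept], [labels[k] for k in kept]


# Hypothetical toy data: "neg" is the majority class (4 samples vs. 3).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.2, 5.0),
          (9.0, 9.0), (9.0, 10.0), (0.0, -2.0)]
labels = ["neg", "neg", "neg", "pos", "pos", "pos", "neg"]

links = tomek_links(points, labels)
clean_points, clean_labels = undersample(points, labels, majority="neg")
print(links, len(clean_points))
```

Here the only cross-class mutual nearest neighbors are the points at (5.0, 5.0) and (5.2, 5.0), so only the "neg" sample at (5.0, 5.0) is removed. This brute-force search is quadratic in the number of samples and is meant for illustration only.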

COMPARISON OF SPLIT DATA RATIO PERFORMANCE IN SENTIMENT ANALYSIS OF PON XXI ACEH-SUMUT 2024 USING SUPPORT VECTOR MACHINE WITH SMOTE APPLICATION
Karina Shafa Amalia; Anisa Dzulkarnain; Berlian Rahmy Lidyawati
Jurnal Sistem Informasi dan Informatika (Simika) Vol. 9 No. 1 (2026): Jurnal Sistem Informasi dan Informatika (Simika)
Publisher : Program Studi Sistem Informasi, Universitas Banten Jaya

DOI: 10.47080/simika.v9i1.4161

The 21st National Sports Week (PON) Aceh-North Sumatra 2024 is the largest multi-sport competition in Indonesia, sparking diverse public responses on social media platforms, particularly X (formerly Twitter). The high volume and diverse nature of comments related to PON XXI pose challenges in understanding public sentiment and communication patterns. This study compares the performance of several training and testing data split ratios for the Support Vector Machine (SVM) algorithm with an RBF kernel in sentiment classification of X platform data related to PON XXI Aceh-North Sumatra 2024. Data were collected with the Tweet Harvest library, gathering 2,503 Indonesian-language posts from 9 August to 20 October 2024. Text preprocessing included cleaning, case adjustment, normalisation, tokenisation, stop word removal, and stemming. The dataset was classified into three sentiment categories: positive, negative, and neutral. Four split ratios were evaluated: 90:10, 80:20, 70:30, and 60:40. SMOTE (Synthetic Minority Over-sampling Technique) was applied to address class imbalance. The results show that the 80:20 split ratio performed best, with an accuracy of 86.23%, precision of 86.10%, recall of 86.23%, and F1 score of 86.15%. These findings indicate that the choice of data split ratio significantly influences model performance, and they provide valuable insights for developing more accurate and representative public opinion analysis models for Indonesian social media content.
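
The abstract reports a recall identical to the accuracy (both 86.23%). That pattern is what support-weighted averaging produces, because weighted recall always reduces to overall accuracy; the abstract does not state which averaging was used, so this reading is an assumption. The sketch below computes the four metrics on hypothetical labels (not the study's data) to show the identity.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n         # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


# Hypothetical three-class predictions, for illustration only.
y_true = ["pos", "pos", "pos", "pos", "neg", "neg", "neu", "neu", "neu", "neu"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "neu", "neu", "neu", "neu", "pos"]

accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

Each class contributes (support / n) * (correct / support) to the weighted recall, which sums to the number of correct predictions divided by n, that is, the accuracy.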

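Optimalisasi Random Forest untuk Sentimen Pilkada Jawa Timur dengan Chi-Square dan Mutual Information [Optimizing Random Forest for East Java Regional Election Sentiment with Chi-Square and Mutual Information]
Rahma Putri Widyaiswari; Anisa Dzulkarnain; Alqis Rausanfita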
JUITA: Jurnal Informatika JUITA Vol. 13 Issue 3, November 2025
Publisher : Department of Informatics Engineering, Universitas Muhammadiyah Purwokerto

DOI: 10.30595/juita.v13i3.26778

The rise of social media has transformed the way people express opinions, including in political contexts. In the 2024 East Java Gubernatorial Election, social media platform X became a major outlet for public sentiment toward the governor and deputy governor candidates. This study aims to analyse public sentiment toward three candidate pairs by categorizing the data into three sentiment classes: positive, negative, and neutral. Feature selection was conducted by combining Term Frequency-Inverse Document Frequency (TF-IDF) with Chi-Square and Mutual Information (MI) methods to improve feature quality. The Random Forest algorithm was employed as the primary classification model. In addition, several other algorithms were tested for comparison. The results indicate that the TF-IDF and Chi-Square combination with Random Forest achieved the highest accuracy of 82.07%. These findings highlight the importance of feature selection in improving model performance for sentiment classification. The study provides insights into public opinion that can serve as a reference for strategic decision-making in the political and public sectors.
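
As a rough illustration of Chi-Square feature selection, the sketch below scores each term with the classic chi-square statistic of a 2x2 contingency table (term present or absent versus target class or any other class) and ranks the vocabulary by that score. The abstract does not describe the exact computation, so the one-vs-rest setup, the binary term presence, the toy documents, and the function names are all assumptions.

```python
def chi_square(docs, labels, term, target):
    """Chi-square of the 2x2 table: term present/absent vs. class target/other."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, target class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def rank_terms(docs, labels, target):
    """Vocabulary sorted from most to least class-dependent."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: -chi_square(docs, labels, t, target))


# Hypothetical tokenized posts, for illustration only.
docs = [{"good", "leader"}, {"good", "program"},
        {"bad", "leader"}, {"bad", "program"}]
labels = ["positive", "positive", "negative", "negative"]

score_good = chi_square(docs, labels, "good", "positive")
score_leader = chi_square(docs, labels, "leader", "positive")
top_terms = rank_terms(docs, labels, "positive")
print(score_good, score_leader, top_terms)
```

In this toy corpus "good" and "bad" separate the classes perfectly and score highest, while "leader" and "program" appear equally in both classes and score zero, so a selector keeping the top two terms would discard them.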
