Purpose: Sentiment analysis, also known as opinion mining, studies people's opinions, emotions, and attitudes toward various subjects. Although the Random Forest algorithm is widely used in sentiment classification tasks, its integration with Particle Swarm Optimization (PSO) for feature selection remains relatively underexplored. This study investigates whether PSO-based feature selection can enhance the predictive performance of Random Forest by selecting the most relevant textual features, leading to more accurate sentiment classification.

Methods: The research adopts a structured text-preprocessing pipeline comprising data cleansing, case folding, normalization, stop-word removal, and stemming. Term Frequency-Inverse Document Frequency (TF-IDF) is then applied to extract features, and PSO-driven feature selection refines the input set for the Random Forest classifier. The proposed model is evaluated on a Twitter sentiment dataset related to "Bawaslu", with performance measured by Out-of-Bag (OOB) error and accuracy.

Result: Incorporating PSO-based feature selection into the Random Forest model substantially lowers the OOB error to 20.42%, compared with 28.72% for the baseline Random Forest. The optimized model also achieves an accuracy of 78.35%, outperforming the standard approach. However, PSO-based feature selection increases computational demands, indicating a trade-off between classification accuracy and processing efficiency.

Novelty: This study introduces the integration of PSO-driven feature selection with Random Forest classification for sentiment analysis, addressing challenges posed by imbalanced text data. By optimizing feature selection through a metaheuristic approach, it improves model accuracy, albeit at the cost of additional computation.
The novelty lies in applying PSO to refine feature selection in text classification, offering new insights into improving machine-learning models for imbalanced datasets. Future research could explore reducing the computational overhead and investigating hybrid feature-selection techniques to further enhance scalability and performance.
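The pipeline described above can be illustrated with a minimal sketch: TF-IDF features, a simple binary PSO searching over feature masks, and a Random Forest whose OOB error serves as the fitness function. The toy corpus, swarm size, iteration count, and PSO coefficients below are illustrative assumptions, not the paper's settings or dataset.

```python
# Hypothetical sketch, not the authors' implementation:
# TF-IDF -> binary PSO feature selection -> Random Forest, scored by OOB error.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

rng = np.random.default_rng(0)

# Toy labeled corpus standing in for the preprocessed tweets (1 = positive).
docs = ["good fair election", "bad unfair process", "great transparent vote",
        "terrible biased count", "fair honest result", "awful corrupt outcome"] * 5
labels = np.array([1, 0, 1, 0, 1, 0] * 5)

X = TfidfVectorizer().fit_transform(docs).toarray()
n_features = X.shape[1]

def oob_error(mask):
    """Fitness: OOB error of a Random Forest trained on the selected features."""
    if mask.sum() == 0:                      # empty subset gets the worst fitness
        return 1.0
    rf = RandomForestClassifier(n_estimators=50, oob_score=True, random_state=0)
    rf.fit(X[:, mask.astype(bool)], labels)
    return 1.0 - rf.oob_score_

# Minimal binary PSO: continuous positions in [0, 1], thresholded into masks.
n_particles, n_iters = 8, 10
pos = rng.random((n_particles, n_features))
vel = np.zeros_like(pos)
pbest, pbest_err = pos.copy(), np.full(n_particles, np.inf)
gbest, gbest_err = None, np.inf

for _ in range(n_iters):
    for i in range(n_particles):
        err = oob_error((pos[i] > 0.5).astype(int))
        if err < pbest_err[i]:
            pbest_err[i], pbest[i] = err, pos[i].copy()
        if err < gbest_err:
            gbest_err, gbest = err, pos[i].copy()
    # Standard PSO velocity update (inertia 0.7, cognitive/social weights 1.5).
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.0, 1.0)

best_mask = (gbest > 0.5).astype(int)
print(f"selected {best_mask.sum()}/{n_features} features, OOB error {gbest_err:.3f}")
```

In this sketch the search keeps only feature subsets that reduce the forest's OOB error, mirroring the trade-off the study reports: each fitness evaluation retrains a full Random Forest, which is where the added computational cost comes from.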
Copyright © 2025