This study classifies user reviews of the SIREKAP 2024 application using Gemini, a Large Language Model (LLM), for text pre-processing, Term Frequency–Inverse Document Frequency (TF-IDF) for feature extraction, and the Random Forest algorithm for classification. The dataset consists of user reviews collected from the Google Play Store and labeled with five rating classes. Model performance was evaluated with 10-Fold Cross-Validation using the Macro F1-Score metric. Across the folds, the Macro F1-Score ranged from 31.87% (lowest) to 37.28% (highest), with an average of 34.62%. The narrow spread between folds indicates that Random Forest yields relatively stable classification performance, which is attributed to its ensemble learning mechanism that combines the predictions of multiple decision trees. However, performance remains limited by the imbalanced distribution of reviews across the rating classes. Random Forest therefore helps maintain prediction stability and reduce overfitting, but further development is required to improve classification performance on imbalanced review data.
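Macro F1 averages the per-class F1 scores without weighting them by class size, so a model that favors the majority rating scores poorly even when its accuracy looks acceptable. The sketch below illustrates the metric in standard-library Python, assuming integer labels 1 to 5 for the five rating classes. The function name `macro_f1` and the toy labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=(1, 2, 3, 4, 5)):
    """Unweighted mean of per-class F1 scores.

    Assumption: the five rating classes are encoded as integers 1..5.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # class t was missed
    f1_scores = []
    for c in labels:
        prec_den = tp[c] + fp[c]
        rec_den = tp[c] + fn[c]
        precision = tp[c] / prec_den if prec_den else 0.0
        recall = tp[c] / rec_den if rec_den else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    # Every class counts equally, regardless of how many reviews it has.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical imbalanced toy labels: rating 1 dominates.
y_true = [1, 1, 1, 1, 1, 1, 2, 3, 4, 5]
# A degenerate classifier that always predicts the majority rating.
y_pred = [1] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                              # 0.6
print(round(macro_f1(y_true, y_pred), 4))    # 0.15
```

Here the majority-class predictor reaches 60% accuracy but only 0.15 Macro F1, because each of the four minority ratings contributes an F1 of zero. In a 10-fold setup, the score is computed on each held-out fold and the ten values are averaged.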
Copyright © 2026