The rapid growth of e-commerce platforms has produced a large volume of unstructured user reviews, which makes scalable analysis difficult. This study proposes a multi-class topic classification framework for Indonesian-language reviews of the Shopee application that combines embedding-driven topic modeling with BERTopic and ensemble learning. A total of 23,956 reviews are analyzed. BERTopic is applied exclusively to the 19,167 training reviews to derive eight dominant topic labels, which then serve as pseudo-labels for supervised classification with CatBoost and Extra Trees. Model performance is evaluated on a held-out test set under two settings, a baseline without resampling and a hybrid resampling scheme intended to address severe class imbalance. The results show that hybrid resampling substantially improves balanced accuracy, particularly for CatBoost, while ROC-AUC remains consistently high in both settings, indicating robust class discrimination. An unlabeled 2025 dataset, used solely in a deployment-style setting, yields topic distributions that are semantically consistent with those found in training. Overall, the findings indicate that embedding-based topic modeling combined with ensemble learning is an effective and scalable approach to multi-class topic classification in highly imbalanced e-commerce review data, with clear separation between training, evaluation, and post-deployment analysis.
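As an illustration of why balanced accuracy is reported under class imbalance, the following minimal sketch computes it as the mean of per-class recall and compares it with plain accuracy. The toy labels and the helper names `balanced_accuracy` and `accuracy` are assumptions made for this example; they are not the study's data or code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[label] / count for label, count in support.items()]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: topic 0 dominates, topics 1 and 2 are rare.
y_true = [0] * 8 + [1] + [2]
# A degenerate classifier that always predicts the majority topic.
y_pred = [0] * 10

plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
print(f"accuracy={plain:.3f} balanced_accuracy={balanced:.3f}")
```

In this toy case the majority-only classifier scores well on plain accuracy but poorly on balanced accuracy, because each rare topic contributes a recall of zero with the same weight as the dominant topic.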
Copyright © 2026