CAUCHY: Jurnal Matematika Murni dan Aplikasi
Vol 11, No 1 (2026): CAUCHY: JURNAL MATEMATIKA MURNI DAN APLIKASI

A Study on Multi-Class Topic Prediction for E-commerce Review Data Using Ensemble Learning

Alifviansyah, Kevin



Article Info

Publish Date
30 May 2026

Abstract

The exponential growth of e-commerce platforms has generated massive volumes of unstructured user reviews, necessitating advanced automated analysis methods that can extract actionable insights for strategic decision-making. This study addresses multi-class text classification by integrating BERTopic-based topic modeling with ensemble learning algorithms to analyze Indonesian e-commerce reviews. A dataset of 24,000 customer reviews from the Google Play Store underwent systematic preprocessing and topic extraction with BERTopic, yielding eight distinct thematic clusters covering application performance, product quality, pricing, delivery logistics, and service reliability. The dataset exhibited severe class imbalance, with an imbalance ratio of 65:1: the dominant class accounted for 76.02% of instances, while the minority classes each accounted for less than 2.12%. Hybrid resampling that combined undersampling and oversampling reduced the imbalance ratio to 1.4:1. TF-IDF vectorization transformed the preprocessed text into numerical features, which were then classified with CatBoost and Extra Trees classifiers tuned through randomized hyperparameter search with stratified k-fold cross-validation. CatBoost performed best, achieving a balanced accuracy of 0.829, a recall of 0.829, and an AUC of 0.965, which the study attributes to its ordered boosting mechanism and its capacity to handle categorical and imbalanced data. Independent validation on 2025 data confirmed robust generalization, with prediction confidence exceeding 0.90, and revealed a marked temporal shift: product-related topics became dominant at 70.35%, pricing concerns rose from 6.58% to 16.57%, and application issues fell from 76.02% to 2.51%.
This research establishes a methodologically rigorous framework that integrates unsupervised topic discovery with supervised ensemble classification, offering a computationally efficient and scalable approach to automated review categorization.
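The abstract reports an imbalance ratio of 65:1 that hybrid resampling reduced to 1.4:1, but it does not name the specific samplers. The minimal standard-library sketch below assumes the imbalance ratio is the size of the largest class divided by the size of the smallest, and uses plain random undersampling and random oversampling with replacement as stand-ins for whatever techniques the study applied. The function names, the `cap`/`floor` thresholds, and the toy counts are hypothetical illustrations, not the study's settings.

```python
import random
from collections import Counter, defaultdict


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def hybrid_resample(texts, labels, cap, floor, seed=0):
    """Hypothetical hybrid resampler (assumes cap >= floor).

    Classes larger than `cap` are randomly undersampled to `cap`; classes smaller
    than `floor` are randomly oversampled (with replacement) up to `floor`;
    classes in between are kept unchanged. The resulting imbalance ratio is
    therefore at most cap / floor.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        if len(items) > cap:
            kept = rng.sample(items, cap)  # undersampling: drop surplus majority items
        elif len(items) < floor:
            extra = rng.choices(items, k=floor - len(items))  # oversampling: duplicate minority items
            kept = items + extra
        else:
            kept = list(items)
        out_texts.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_texts, out_labels


# Toy data (hypothetical): 650 majority-class reviews vs. 10 minority-class reviews.
labels = ["app"] * 650 + ["price"] * 10
texts = [f"review {i}" for i in range(len(labels))]
print(imbalance_ratio(labels))  # 65.0
_, resampled_labels = hybrid_resample(texts, labels, cap=140, floor=100)
print(imbalance_ratio(resampled_labels))  # 1.4
```

Resampling of this kind is normally applied to the training portion only, so that duplicated minority reviews do not leak into the evaluation data.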
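TF-IDF, which the abstract uses to turn preprocessed reviews into numerical features, weights a term by how often it occurs in a review and down-weights terms that occur in many reviews. The abstract does not state the exact variant or tokenizer, so the standard-library sketch below assumes raw term counts, the smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default, L2-normalized rows, and a simple lowercase whitespace tokenizer. The `tfidf` function and the toy Indonesian tokens are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) where each row is an L2-normalised TF-IDF vector.

    TF is the raw term count; IDF is the smoothed form
    idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents
    and df(t) is the number of documents containing t.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero for empty docs
        rows.append([v / norm for v in vec])
    return vocab, rows


# Toy reviews (hypothetical, already preprocessed).
docs = ["aplikasi lambat", "barang bagus", "barang murah"]
vocab, rows = tfidf(docs)
print(vocab)  # ['aplikasi', 'bagus', 'barang', 'lambat', 'murah']
```

In the second toy review, "bagus" receives a higher weight than "barang", because "barang" also appears in another review and therefore has a lower IDF.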
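The abstract states that the classifiers were tuned by randomized hyperparameter search with stratified k-fold cross-validation, without giving k, the search space, or the scoring metric. The standard-library sketch below shows only the scaffolding. `stratified_kfold` shuffles each class's indices and deals them round-robin into k folds so that minority reviews appear in every fold. `randomized_search` draws random candidate settings and keeps the one with the best mean fold score. Both function names, the toy labels, the search space, and the `score_fn` callback (which in practice would fit a classifier such as CatBoost on the training indices and score it on the test indices) are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k folds.

    Indices of each class are shuffled and dealt round-robin into the folds,
    so every fold keeps roughly the same class proportions as the full data.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j in range(k) if j != i for idx in folds[j])
        yield train_idx, test_idx


def randomized_search(labels, space, n_iter, k, score_fn, seed=0):
    """Return (best_params, best_mean_score) over n_iter random candidates.

    `space` maps each hyperparameter name to a list of allowed values;
    `score_fn(params, train_idx, test_idx)` is a hypothetical callback that
    fits a model on train_idx and returns a metric computed on test_idx.
    The same folds are reused for every candidate.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        scores = [score_fn(params, tr, te) for tr, te in stratified_kfold(labels, k, seed)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy labels (hypothetical): every test fold keeps 2 of the 10 minority reviews.
labels = ["app"] * 650 + ["price"] * 10
for train_idx, test_idx in stratified_kfold(labels, k=5):
    print(len(train_idx), len(test_idx), sum(labels[i] == "price" for i in test_idx))
```

Stratification matters at this level of imbalance: with unstratified splits, a class holding only about 1% of the reviews could be missing from some test folds, making per-class metrics undefined for those folds.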
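The abstract reports both a balanced accuracy and a recall of 0.829. Balanced accuracy is the unweighted mean of the per-class recalls, so it is identical to macro-averaged recall. If the reported recall is macro-averaged (the abstract does not say), the two figures must coincide. The standard-library sketch below computes both quantities on hypothetical toy predictions and shows why plain accuracy overstates performance when one class dominates. The function names and data are illustrative only.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall for each class: correctly predicted items / true items of that class."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in support}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recalls (identical to macro-averaged recall)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct, regardless of class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy predictions (hypothetical): the minority class is half-missed.
y_true = ["app"] * 8 + ["price"] * 2
y_pred = ["app"] * 8 + ["app", "price"]
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

Missing half of the minority class costs only 0.1 in plain accuracy but 0.25 in balanced accuracy, because each class contributes equally to the latter.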

Copyright © 2026






Journal Info

Abbrev

Math

Subject

Mathematics

Description

The CAUCHY journal is published regularly, twice (2) a year. The editors accept scientific papers reporting research results, literature reviews, and analyses and solutions of problems in mathematics (Algebra, Analysis, Statistics, Computation, and Applied Mathematics). Accepted manuscripts will be reviewed by ...