The rapid expansion of the fashion e-commerce sector has intensified the need for accurate sales segmentation to support targeted marketing and efficient inventory management. This study proposes a methodology for classifying fashion product sales into three categories (high-selling, moderately selling, and low-selling) using the Random Forest algorithm combined with the Synthetic Minority Over-sampling Technique (SMOTE) and hyperparameter optimization. A real-world dataset of over 20,000 product records from an online marketplace was preprocessed through missing-value handling, categorical encoding, and standardization of numerical features. Class labels were generated by quantile-based segmentation of sales volume, and the resulting classes were then balanced with SMOTE. The Random Forest model was tuned using RandomizedSearchCV and evaluated with accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (ROC-AUC). The model achieved an accuracy of 90.43%, macro-precision of 90.60%, macro-recall of 90.45%, macro-F1 of 90.50%, and macro ROC-AUC of 0.9783. Feature importance analysis showed that price, category, and customer ratings were the most influential predictors of sales segment. These findings support the effectiveness of ensemble learning combined with class-imbalance handling for multi-class classification in retail datasets. Scientifically, this research contributes a reproducible, data-driven framework for product segmentation in heterogeneous and imbalanced datasets. Practically, the proposed approach can guide fashion retailers in refining pricing strategies, optimizing marketing campaigns, and improving inventory decisions in competitive online marketplaces. The methodology is adaptable to other e-commerce domains, with broader implications for business intelligence and predictive analytics.
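The labeling and balancing steps described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names `label_by_quantile` and `smote_like_oversample`, the toy sales values, and the two-column feature layout are all hypothetical, and the oversampler is a simplified SMOTE-style interpolation written with the Python standard library only. In practice one would more likely use `SMOTE` from imbalanced-learn together with scikit-learn's `RandomForestClassifier` and `RandomizedSearchCV`.

```python
import math
import random
from statistics import quantiles


def label_by_quantile(sales, n_classes=3):
    """Map each sales volume to a class index: 0 = low, 1 = moderate, 2 = high."""
    # quantiles() returns n_classes - 1 cut points (tertiles when n_classes=3).
    cuts = quantiles(sales, n=n_classes, method="inclusive")
    # A value's class is the number of cut points it strictly exceeds.
    return [sum(value > cut for cut in cuts) for value in sales]


def smote_like_oversample(X, y, k=3, seed=0):
    """Balance classes by interpolating between same-class nearest neighbours.

    Simplified SMOTE-style sketch (an assumption, not the paper's code):
    every class is grown to the size of the largest class.
    """
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if len(rows) < 2:
            continue  # interpolation needs at least two original points
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            # k nearest original same-class neighbours, excluding the base point
            neighbours = sorted(
                (r for r in rows if r is not base),
                key=lambda r: math.dist(base, r),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            synthetic = [b + gap * (o - b) for b, o in zip(base, other)]
            X_out.append(synthetic)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: many products with zero sales make the classes uneven.
toy_sales = [0, 0, 0, 0, 0, 0, 1, 2, 5, 9, 40, 60]
toy_labels = label_by_quantile(toy_sales)
toy_features = [[float(i), float(s)] for i, s in enumerate(toy_sales)]
X_bal, y_bal = smote_like_oversample(toy_features, toy_labels)
print({c: y_bal.count(c) for c in sorted(set(y_bal))})
```

Ties in sales volume are what make tertile labels uneven in this toy example, which is why a balancing step follows the labeling step. Synthetic points are interpolated only between original samples of the same class, so they stay inside the region that class already occupies.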
Copyright © 2025