Sales demand forecasting is crucial for inventory optimization in retail, especially for Micro, Small, and Medium Enterprises (MSMEs). This study examines the effect of entropy-based feature selection on the performance of a two-stage machine learning framework comprising K-Means clustering and Naive Bayes classification. The research was conducted on transactional data collected from a footwear MSME in Palembang, Indonesia, covering January to December 2024. Shannon Entropy and Information Gain were applied to identify and retain the most informative features before the clustering and classification stages. Two experimental scenarios were investigated: (1) using all features without selection and (2) applying entropy-based feature selection with Information Gain thresholds of 0.4 and 0.5 for the category-based and quantity-based targets, respectively. The first scenario yielded lower performance, with a Silhouette Score of 0.5747 and a classification accuracy of 96.97%. In contrast, the second scenario achieved a Silhouette Score of 0.6261 and a classification accuracy of 99.49% when quantity sold was used as the target variable. These findings indicate that entropy-based feature selection reduces data dimensionality, improves cluster quality as measured by the Silhouette Score, and increases classification accuracy. This research contributes a practical framework for sales demand forecasting in retail environments. Future work will integrate additional contextual variables, such as seasonal trends and promotions, and validate the system in real-world retail settings.
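The feature-selection step can be illustrated with a minimal sketch of threshold-based Information Gain filtering. It computes Shannon Entropy H(Y) = -Σ p(c) log2 p(c) over the target classes and Information Gain IG(Y, X) = H(Y) - Σ P(X = v) H(Y | X = v) for each feature, then keeps the features whose gain meets the threshold. This is not the study's implementation: it assumes categorical (or already discretized) features, base-2 logarithms, and an inclusive `>=` comparison. The function names, column names, and toy rows are hypothetical and are not data from the study.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """H(Y) = -sum_c p(c) * log2 p(c) over the class frequencies in `labels`."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y, X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a categorical feature X."""
    n = len(labels)
    # Group the target labels by the value the feature takes in each row.
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * shannon_entropy(g) for g in groups.values())
    return shannon_entropy(labels) - conditional


def select_features(rows, target, threshold):
    """Keep features whose Information Gain w.r.t. `target` is >= `threshold` (assumed inclusive)."""
    labels = [row[target] for row in rows]
    features = [k for k in rows[0] if k != target]
    gains = {f: information_gain([row[f] for row in rows], labels) for f in features}
    selected = [f for f in features if gains[f] >= threshold]
    return selected, gains


# Hypothetical toy rows (not data from the study): categorical features plus a demand label.
rows = [
    {"size": "S", "color": "black", "promo": "yes", "demand": "high"},
    {"size": "S", "color": "white", "promo": "yes", "demand": "high"},
    {"size": "L", "color": "black", "promo": "yes", "demand": "low"},
    {"size": "L", "color": "white", "promo": "no", "demand": "low"},
]

selected, gains = select_features(rows, target="demand", threshold=0.4)
print(selected)  # ['size']
print({f: round(g, 3) for f, g in gains.items()})  # {'size': 1.0, 'color': 0.0, 'promo': 0.311}
```

In the abstract's setting, such a filter would be run once per target, with a threshold of 0.4 for the category-based target and 0.5 for the quantity-based target. The abstract does not state how continuous columns such as quantity were discretized, so any binning applied before this step would be an additional assumption.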
Copyright © 2025