In machine learning and data mining, the quality of a dataset strongly influences model performance. One common issue is class imbalance, where one class in a dataset has far fewer samples than another. This imbalance can bias models toward the majority class, resulting in poor predictive performance on minority-class instances. To address this issue, this study employs a resampling approach that combines MWMOTE (Majority Weighted Minority Oversampling Technique) with K-Means clustering. MWMOTE generates synthetic samples for the minority class, while K-Means clustering improves the distribution of the generated samples by grouping the data into well-structured clusters. Experimental results on 10 datasets demonstrate that the proposed MWMOTE + K-Means approach significantly improves classification performance: compared to the baseline accuracy of 70%, the proposed method improves precision by 10%, recall by 40%, and F-measure by 40%. The additional clustering step required for synthetic data generation slightly increases the computational cost. Despite the longer computation time, the gains in classification metrics suggest that integrating K-Means with MWMOTE is a promising technique for handling imbalanced data. Future research could optimize the computational efficiency of this approach and compare it with other oversampling techniques.
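
The general idea of cluster-guided oversampling can be illustrated with a minimal sketch. It is not the implementation evaluated in this study and it omits MWMOTE's majority-based sample weighting entirely: it only clusters the minority samples with a plain K-Means and interpolates between pairs drawn from the same cluster, so that synthetic points stay inside existing minority regions. The function names `kmeans` and `cluster_oversample`, the uniform choice of cluster and sample, and the toy data are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means on equal-length tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires k <= len(points)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority points by interpolating within K-Means clusters.

    Assumption: clusters and samples are picked uniformly at random.
    MWMOTE instead weights minority samples by their relation to the majority class.
    """
    rng = random.Random(seed)
    labels = kmeans(minority, k, seed=seed)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c] for c in range(k)]
    clusters = [c for c in clusters if c]  # drop empty clusters
    synthetic = []
    for _ in range(n_new):
        cluster = rng.choice(clusters)
        a, b = rng.choice(cluster), rng.choice(cluster)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Toy minority class with two well-separated groups (hypothetical data).
minority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
new_points = cluster_oversample(minority, n_new=10, k=2)
```

Because both endpoints of each interpolation come from the same cluster, no synthetic point falls in the empty region between the two groups, which is the failure mode that unclustered interpolation can produce.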
Copyright © 2025