This study investigates the impact of the Synthetic Minority Over-sampling Technique (SMOTE) on the classification of direct and indirect band gap types in an imbalanced ABO₃ perovskite oxide dataset. In the dataset used, the direct band gap class constitutes approximately 84% of the samples and the indirect class only 16%, which biases conventional classification models toward the majority class. To address this issue, SMOTE was employed to balance the class distribution, and its effect was evaluated using several machine learning algorithms: Multi-Layer Perceptron (MLP), Extra Trees, CatBoost, and Gradient Boosting. Model performance was assessed using 5-fold stratified cross-validation, with particular emphasis on F1-macro and recall so that the minority class is adequately evaluated. The results show that SMOTE left overall accuracy essentially unchanged (baseline: 0.89; SMOTE: 0.88) but enhanced the models’ ability to recognize the minority class. F1-macro increased from 0.76 to 0.78 for MLP and from 0.75 to 0.78 for CatBoost. These findings highlight the importance of using F1-macro as a more informative evaluation metric than accuracy for imbalanced datasets and provide methodological insights for developing more robust predictive models in materials informatics.
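The argument for preferring F1-macro over accuracy, and the interpolation idea behind SMOTE, can be illustrated with a minimal sketch in pure Python. It is not the implementation used in this study: the function names (`f1_macro`, `accuracy`, `smote_like`), the two-dimensional toy feature vectors, and the choice of `k=3` neighbours are assumptions made only for illustration. The sketch reproduces the 84%/16% class split on 100 hypothetical samples and scores a trivial model that always predicts the majority (direct) class.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_macro(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never hit gets F1 = 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the core interpolation step of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical labels: 0 = direct (84 samples), 1 = indirect (16 samples).
y_true = [0] * 84 + [1] * 16
y_pred = [0] * 100  # trivial model that always predicts the majority class

acc = accuracy(y_true, y_pred)
f1m = f1_macro(y_true, y_pred)
print(f"accuracy = {acc:.2f}, F1-macro = {f1m:.3f}")

# Hypothetical 2-D feature vectors for the 16 minority samples.
minority = [(0.1 * i, 0.2 * (i % 4)) for i in range(16)]
# Add 68 synthetic samples so the minority class matches the 84 majority samples.
synthetic = smote_like(minority, n_new=84 - len(minority))
print(len(minority) + len(synthetic), "minority samples after oversampling")
```

In this toy case the majority-only model reaches an accuracy of 0.84 while its F1-macro is only about 0.46, because the F1 score of the indirect class is zero. In a cross-validation setting, oversampling of this kind is normally applied to the training folds only, so that synthetic samples do not leak into the evaluation fold.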
Copyrights © 2026