Class imbalance can occur in many kinds of datasets, bank marketing datasets among them, and it degrades classification performance. The SMOTE method is a common remedy, but applying SMOTE alone can introduce class overlapping that again harms classification. This research therefore combines SMOTE with the undersampling methods ENN, NCL, and TomekLink. Logistic Regression is used as the classification algorithm, and model performance is evaluated with sensitivity, specificity, and g-means. The results show that the SMOTE-ENN combination is the most effective on the bank marketing dataset, with sensitivity, specificity, and g-means of 94.05%, 83.22%, and 88.47% respectively; on the credit card fraud dataset the combinations perform almost uniformly, with sensitivity, specificity, and g-means around 88.62%, 97.59%, and 93.00%. Finally, on the cerebral stroke dataset, SMOTE-ENN produces the highest sensitivity (80.10%), SMOTE-NCL the highest specificity (75.62%), and plain SMOTE the highest g-means (77.03%).
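The g-means values reported above can be reproduced from the sensitivity and specificity figures, since g-means is the geometric mean of the two. A minimal sketch in pure Python, using the numbers stated in the abstract:

```python
import math

def g_means(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity, the
    balanced-performance metric used to evaluate the models."""
    return math.sqrt(sensitivity * specificity)

# Figures reported for SMOTE-ENN on the bank marketing dataset
bank = g_means(0.9405, 0.8322)
print(f"bank marketing g-means: {bank:.2%}")      # ≈ 88.47%

# Figures reported for the credit card fraud dataset
fraud = g_means(0.8862, 0.9759)
print(f"credit card fraud g-means: {fraud:.2%}")  # ≈ 93.00%
```

Both computed values agree with the reported 88.47% and 93.00%, confirming the metrics are internally consistent.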
Copyright © 2024