Nadira Sri Belinda
Unknown Affiliation

Published: 2 documents

Journal: Proceedings of the International Conference on Data Science and Official Statistics

SMOTE and Nearmiss Methods for Disease Classification with Unbalanced Data: Case Study: IFLS 5
Anas Rulloh Budi Alamsyah; Salsabila Rahma Anisa; Nadira Sri Belinda; Adi Setiawan
Proceedings of The International Conference on Data Science and Official Statistics Vol. 2021 No. 1 (2021): Proceedings of 2021 International Conference on Data Science and Official Statistics
Publisher : Politeknik Statistika STIS

DOI: 10.34123/icdsos.v2021i1.240

Abstract

Unbalanced data are often encountered in practice, and they complicate the search for a model suitable for classification. In disease data, the imbalance arises because the number of individuals with a history of a disease is smaller than the number of individuals without one. We analyse the IFLS 5 data on the medical history of a set of patients, splitting the dataset into training and test subsets in the proportion 80:20. Both subsets are unbalanced, with only a small minority of patients who had a stroke. We apply the SMOTE and Nearmiss methods to the training data, which turns it into balanced data, and then evaluate the rate of correct classification to compare how effectively the two methods solve the problem of unbalanced data. Comparing accuracy, F-score, Kappa, sensitivity, and specificity for the SMOTE and Nearmiss methods, we conclude that the Nearmiss method is better than SMOTE at balancing this data.
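The abstract names the resampling methods and the evaluation measures but not the software or the classifier that was used. The following Python sketch (NumPy only) illustrates the workflow under stated assumptions and is not the authors' code. It uses a synthetic toy dataset instead of IFLS 5, a stratified 80:20 split, the NearMiss-1 variant (the abstract does not say which Nearmiss version was used), and a hypothetical nearest-centroid classifier. As in the abstract, only the training set is resampled, and the untouched, still unbalanced test set is scored with accuracy, sensitivity, specificity, F-score, and Cohen's kappa.

```python
import numpy as np


def split_80_20(X, y, rng=None):
    """Stratified 80:20 train/test split (stratification is an assumption)."""
    rng = np.random.default_rng(rng)
    test = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        test[idx[: round(0.2 * len(idx))]] = True  # 20% of each class goes to test
    return X[~test], y[~test], X[test], y[test]


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE-style oversampling: each synthetic point is placed at a random
    position on the segment between a minority sample and one of its k
    nearest minority neighbours."""
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def nearmiss1(X_maj, X_min, k=3):
    """NearMiss-1 undersampling: keep the majority samples whose mean distance
    to their k nearest minority samples is smallest, one per minority sample."""
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2)
    k = min(k, len(X_min))
    mean_knn = np.sort(d, axis=1)[:, :k].mean(axis=1)
    keep = np.argsort(mean_knn)[: len(X_min)]
    return X_maj[keep]


def balance(X, y, method, rng=None):
    """Balance a training set; label 1 is assumed to be the minority class."""
    X_min, X_maj = X[y == 1], X[y == 0]
    if method == "smote":
        X_min = np.vstack([X_min, smote(X_min, len(X_maj) - len(X_min), rng=rng)])
    elif method == "nearmiss":
        X_maj = nearmiss1(X_maj, X_min)
    else:
        raise ValueError(f"unknown method: {method}")
    y_bal = np.concatenate([np.zeros(len(X_maj), int), np.ones(len(X_min), int)])
    return np.vstack([X_maj, X_min]), y_bal


def metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, F-score and Cohen's kappa (positive = 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    n = tp + tn + fp + fn
    acc = (tp + tn) / n
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2  # chance agreement
    kappa = (acc - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return {"accuracy": acc, "sensitivity": sens, "specificity": spec,
            "f_score": f1, "kappa": kappa}


def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: predict the class with the closer mean."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    return (np.linalg.norm(X_test - c1, axis=1) < np.linalg.norm(X_test - c0, axis=1)).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # synthetic toy data, NOT IFLS 5: 475 "no stroke" rows vs 25 "stroke" rows
    X = np.vstack([rng.normal(0.0, 1.0, (475, 2)), rng.normal(1.5, 1.0, (25, 2))])
    y = np.array([0] * 475 + [1] * 25)
    X_tr, y_tr, X_te, y_te = split_80_20(X, y, rng)
    for method in ("smote", "nearmiss"):
        X_bal, y_bal = balance(X_tr, y_tr, method, rng)  # only training data is resampled
        print(method, np.bincount(y_bal), metrics(y_te, nearest_centroid(X_bal, y_bal, X_te)))
```

The design difference is visible in the class counts: SMOTE grows the minority class to the size of the majority class (here 380 vs 380 training rows), while NearMiss-1 shrinks the majority class to the size of the minority class (20 vs 20). For real analyses, the imbalanced-learn package provides `SMOTE` in `imblearn.over_sampling` and `NearMiss` in `imblearn.under_sampling` (with a `version` parameter), which implement these methods more completely; the hand-written versions here only make each step explicit.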