Prayogo, Rizal Dwi
Unknown Affiliation

Published: 1 document
Articles


Handling Imbalanced Data for Acute Coronary Syndrome Classification Based on Ensemble and K-Means SMOTE Method
Muzakki, Muhammad Faris; Prayogo, Rizal Dwi; Rizky A, M Afif
JOIV : International Journal on Informatics Visualization Vol 7, No 3-2 (2023): Empowering the Future: The Role of Information Technology in Building Resilien
Publisher : Society of Visual Informatics

DOI: 10.30630/joiv.7.3-2.1429

Abstract

Acute Coronary Syndrome (ACS) is a disease with a high mortality rate: 40% at 5 years after diagnosis. Despite the high mortality rate, the conventional process of overestimating ACS can be life-threatening. For this reason, several alternatives for prediagnosis have been investigated to reduce the detection of ACS intensively, one of which is a machine learning approach. The machine learning-based prediagnosis approach uses patient medical record data as input for building detection models. This approach can produce an optimal model when enough data is available and the labels are fairly balanced. However, in machine learning-based ACS detection studies, researchers often lack data that is balanced between positive and negative labels, and this imbalance can cause overfitting. The problem arises because obtaining additional data with specific labels is difficult. To solve the imbalance problem in ACS detection, we generated synthetic ACS data using the K-Means SMOTE method. The synthetic data is used as training data to build an ensemble-based machine learning model. In this study, we obtain an increase in the F1 score of more than 10% compared to machine learning models that do not use K-Means SMOTE for oversampling. In addition to the higher F1 score, the results are relatively more resistant to overfitting because the training set contains more diverse data.
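The oversampling idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it is a simplified, standard-library-only version of the K-Means SMOTE idea, written under stated assumptions. The function names (`kmeans`, `kmeans_smote`), the toy data, the farthest-first initialization, the "at least half minority" cluster filter, and the use of a single nearest neighbour are all hypothetical choices made for illustration. The published method also weights clusters by sparsity, which this sketch replaces with an even spread.

```python
import math
import random


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic farthest-first start (assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        for i in range(k):
            members = [p for p, a in zip(points, assign) if a == i]
            if members:  # keep the old center if a cluster becomes empty
                centers[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def kmeans_smote(X, y, minority=1, k=2, seed=0):
    """Simplified K-Means SMOTE: cluster, filter clusters, interpolate minority points."""
    rng = random.Random(seed)
    assign = kmeans(X, k)
    n_minority = sum(1 for label in y if label == minority)
    n_needed = (len(y) - n_minority) - n_minority  # samples needed to balance classes

    # Keep clusters where the minority class holds at least half of the points
    # and at least two minority points exist to interpolate between.
    pools = []
    for c in range(k):
        size = sum(1 for a in assign if a == c)
        mins = [p for p, lab, a in zip(X, y, assign) if a == c and lab == minority]
        if len(mins) >= 2 and 2 * len(mins) >= size:
            pools.append(mins)
    if not pools or n_needed <= 0:
        return list(X), list(y)  # nothing to do, or no safe cluster found

    synthetic = []
    for n in range(n_needed):
        pool = pools[n % len(pools)]  # even spread (the real method weights by sparsity)
        i = rng.randrange(len(pool))
        a = pool[i]
        # SMOTE step: interpolate towards the nearest other minority point in the cluster.
        b = min((p for j, p in enumerate(pool) if j != i), key=lambda p: math.dist(a, p))
        t = rng.random()
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


if __name__ == "__main__":
    # Toy data (hypothetical): 8 negative points near the origin, 3 positive far away.
    X = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (2, 1), (1, 2), (2, 2),
         (10, 10), (11, 10), (10, 11)]
    y = [0] * 8 + [1] * 3
    X_res, y_res = kmeans_smote(X, y)
    print(len(X_res), y_res.count(0), y_res.count(1))
```

Only the training split should be oversampled this way, so that evaluation metrics such as the F1 score are computed on real data. In practice, a maintained implementation such as the `KMeansSMOTE` class in the imbalanced-learn library is likely a better starting point than this sketch; the abstract does not say which implementation the authors used.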