Mila Desi Anasanti
Program Studi S2 Ilmu Komputer, Universitas Nusa Mandiri, Jakarta; Department of Information Studies, University College London, London; Bart and London Genome Center, Queen Mary University of London, London

Published : 1 Documents Claim Missing Document
Claim Missing Document
Check
Articles

Found 1 Documents
Search

Improving Cardiovascular Disease Prediction by Integrating Imputation, Imbalance Resampling, and Feature Selection Techniques into Machine Learning Model Fadlan Hamid Alfebi; Mila Desi Anasanti
IJCCS (Indonesian Journal of Computing and Cybernetics Systems) Vol 17, No 1 (2023): January
Publisher : IndoCEISS in colaboration with Universitas Gadjah Mada, Indonesia.

Show Abstract | Download Original | Original Source | Check in Google Scholar | DOI: 10.22146/ijccs.80214

Abstract

Cardiovascular disease (CVD) is the leading cause of death worldwide. Primary prevention is by early prediction of the disease onset. Using laboratory data from the National Health and Nutrition Examination Survey (NHANES) in 2017-2020 timeframe (N= 7.974), we tested the ability of machine learning (ML) algorithms to classify individuals at risk. The ML models were evaluated based on their classification performances after comparing four imputation, three imbalance resampling, and three feature selection techniques.Due to its popularity, we utilized decision tree (DT) as the baseline. Integration of multiple imputation by chained equation (MICE) and synthetic minority oversampling with Tomek link down-sampling (SMOTETomek) into the model improved the area under the curve-receiver operating characteristics (AUC-ROC) from 57% to 83%. Applying simultaneous perturbation feature selection and ranking (spFSR) reduced the feature predictors from 144 to 30 features and the computational time by 22%. The best techniques were applied to six ML models, resulting in Xtreme gradient boosting (XGBoost) achieving the highest accuracy of 93% and AUC-ROC of 89%.The accuracy of our ML model in predicting CVD outperforms those from previous studies. We also highlight the important causes of CVD, which might be investigated further for potential effects on electronic health records.