Heart disease is the leading cause of death worldwide. Reducing this high mortality rate requires accurate prediction that can warn people at risk early enough for the condition to be prevented and managed. This study uses a machine learning model to predict heart disease, using a dataset from the Centers for Disease Control and Prevention (CDC) obtained from Kaggle. Its purpose is to improve the ability of a classification model, Logistic Regression (LR), to predict heart disease, so that prediction errors that can harm patients are significantly reduced. Two approaches are used to achieve this goal: data preparation and model optimization. During data preparation, the data were found to be imbalanced between people with and without heart disease. To address this, the Neighborhood Cleaning Rule (NCL), an undersampling algorithm that removes majority-class samples, is used to correct the imbalance, and this step significantly improves the performance of the prediction model. During model optimization, GridSearchCV is used to find the best combination of hyperparameters for the LR algorithm, which further improves performance. In addition, the study implements Weighted Logistic Regression, which allows class weights to be set and also contributes to better performance. Evaluating the model with Accuracy, Recall, and Area Under the Curve (AUC) shows a clear improvement: recall increased from 0.10 to 0.93, and AUC increased from 0.83 to 0.98. With this improved ability to identify heart disease, it is hoped that the model can provide accurate early warning to individuals at risk and thereby significantly reduce mortality from heart disease.
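The data preparation and model optimization steps described above can be sketched in Python with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: a synthetic imbalanced dataset from `make_classification` stands in for the CDC data; `neighborhood_cleaning_rule` is a hypothetical, simplified re-implementation of NCL for binary labels (the imbalanced-learn library provides a full version as `NeighbourhoodCleaningRule`); and the hyperparameter grid and the choice of `scoring="recall"` are illustrative assumptions, not the values used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestNeighbors


def neighborhood_cleaning_rule(X, y, k=3, minority=1):
    """Hypothetical simplified NCL for labels in {0, 1}; returns a boolean keep-mask."""
    # With no query argument, kneighbors() excludes each point from its own neighbours.
    idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(return_distance=False)
    # Majority vote of the k neighbours (odd k, binary labels: no ties).
    minority_votes = (y[idx] == minority).sum(axis=1)
    predicted = np.where(minority_votes > k / 2, minority, 1 - minority)
    misclassified = predicted != y

    remove = np.zeros(len(y), dtype=bool)
    # Step 1 (edited nearest neighbours): drop majority samples that disagree with their neighbours.
    remove |= misclassified & (y != minority)
    # Step 2: for misclassified minority samples, drop their majority-class neighbours.
    for i in np.flatnonzero(misclassified & (y == minority)):
        nbrs = idx[i]
        remove[nbrs[y[nbrs] != minority]] = True
    return ~remove


# Synthetic stand-in for the CDC data (assumption): roughly 10% positive class.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Data preparation: clean only the training split so the test set keeps its original ratio.
keep = neighborhood_cleaning_rule(X_tr, y_tr)
X_res, y_res = X_tr[keep], y_tr[keep]

# Model optimization: illustrative grid over regularization strength and class weights;
# the class_weight options are what make this a weighted logistic regression.
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "class_weight": [None, "balanced", {0: 1, 1: 5}],
}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, scoring="recall", cv=5)
search.fit(X_res, y_res)

best = search.best_estimator_
rec = recall_score(y_te, best.predict(X_te))
auc = roc_auc_score(y_te, best.predict_proba(X_te)[:, 1])
print(search.best_params_, f"recall={rec:.2f}", f"AUC={auc:.2f}")
```

Because NCL only removes majority-class samples, the number of positive cases in the training split is unchanged while the negative class shrinks. Tuning on recall reflects the emphasis on reducing missed heart-disease cases, but it is one possible choice; optimizing AUC or a combined metric would be equally reasonable.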
Copyright © 2023