Cardiovascular disease causes a fairly large number of deaths. In Indonesia, cardiovascular disease is more commonly known as heart disease, a condition that can cause narrowing and blockage of blood vessels. Its risk factors fall into two groups. Modifiable risk factors include stress, high blood pressure, an unhealthy diet, elevated glucose levels, abnormal cholesterol, and lack of physical activity. Non-modifiable risk factors include family history of disease, gender, age, and obesity. In this research, we examine and analyze the performance of two leading classification algorithms, the decision tree and the random forest, in classifying cardiovascular disease based on its causes. Each algorithm's performance is evaluated using the Area Under the Curve (AUC), a classification report, k-fold cross-validation, and a confusion matrix. The dataset was taken from the Kaggle website and is the Cardiovascular Disease dataset, which consists of 68,205 rows (patient records) and 17 attributes. The highest AUC, 0.761, was obtained by the Random Forest algorithm on data balanced with random oversampling, while the lowest AUC, 0.592, was obtained by the Decision Tree algorithm on unbalanced data. These results indicate that the Random Forest algorithm with balanced data is the better of the two, where the balanced-data scenarios use the SMOTE and random oversampling techniques.
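As a minimal illustration of two ideas named above, the following Python sketch computes an AUC value and performs random oversampling using only the standard library. It is not the code used in this study: the function names `auc_score` and `random_oversample`, the toy labels and scores, and the fixed seed are all assumptions made for the example. AUC is computed here as the fraction of positive/negative pairs in which the positive example receives the higher score, which is a standard equivalent definition.

```python
import random
from collections import Counter


def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative; tied scores count as half a win. Labels are 0 or 1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)  # fixed seed (hypothetical) for repeatability
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data, invented purely for illustration.
toy_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
balanced_rows, balanced_labels = random_oversample(
    [[1], [2], [3], [4]], [0, 0, 0, 1]
)
```

In the toy call, three of the four positive/negative pairs are ranked correctly, so `toy_auc` is 0.75, and the oversampled labels contain three examples of each class. The pairwise AUC loop is quadratic in the number of rows, so a library implementation would be the practical choice for a dataset of the size described above.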
Copyright © 2024