NUMERICAL (Jurnal Matematika dan Pendidikan Matematika)
Vol. 6 No. 2 (2022)

Increasing Accuracy of Classification in C4.5 Algorithm by Applying Principal Component Analysis for Diabetes Diagnosis

Michael Sitanggang (Mathematics Department, State University of Medan)
Elmanani Simamora (Mathematics Department, State University of Medan)



Article Info

Publish Date
04 Nov 2022

Abstract

The data revolution in medical records has increased the automation of medical devices in determining the factors that cause disease, but it also poses challenges for analysis. According to the WHO, about 6% of the world's population, more than 420 million people, live with type 1 or type 2 diabetes, and this number is estimated to rise beyond half a billion by 2030, which means that one in ten adults will have diabetes in the future. With its rapid development, machine learning has been applied to many aspects of medical health. In this study, we used the C4.5 decision tree algorithm to predict diabetes mellitus. The research used a diabetes dataset obtained from the UCI Machine Learning Repository with 419 instances and 16 attributes. Most attributes in this dataset are continuous numeric values. The study reports the results of improving the C4.5 algorithm by applying Principal Component Analysis (PCA). Many algorithms have been proposed to overcome misclassification and overfitting in C4.5 decision tree classification. Feature reduction is one option, intended to eliminate irrelevant data and reduce the effect of outliers so as to increase classification accuracy. Based on the experimental results, applying PCA to C4.5 increased accuracy by 6.55%.
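The idea of reducing features with PCA before a C4.5-style split can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the function names (`first_component`, `gain_ratio`, `best_threshold`), and the use of power iteration are all assumptions made for illustration, and the split search is a simplified version of what a full C4.5 implementation does. The sketch projects two correlated continuous attributes onto their first principal component, then finds the threshold on that single derived attribute with the highest gain ratio.

```python
import math
from collections import Counter

def first_component(rows, iters=200):
    """Return (unit vector, scores) for the first principal component.
    Uses power iteration on the sample covariance matrix (an assumed,
    simple method; it needs a start vector not orthogonal to the answer)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the component to get one score per row.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    wl, wr = len(left) / len(labels), len(right) / len(labels)
    gain = entropy(labels) - wl * entropy(left) - wr * entropy(right)
    split_info = -(wl * math.log2(wl) + wr * math.log2(wr))
    return gain / split_info

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; keep the best."""
    pts = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))

# Hypothetical toy data: two correlated continuous attributes per instance.
rows = [[1.0, 1.1], [1.5, 1.4], [2.0, 2.2], [3.0, 2.9], [3.5, 3.6], [4.0, 4.1]]
labels = ["neg", "neg", "neg", "pos", "pos", "pos"]

component, scores = first_component(rows)
t = best_threshold(scores, labels)
print("component:", component)
print("threshold:", t, "gain ratio:", gain_ratio(scores, labels, t))
```

In this sketch the two original attributes collapse into one derived attribute, so the tree has fewer, less redundant candidates to split on. Whether this improves accuracy on a real dataset depends on the data and must be checked experimentally, as the study does.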

Copyright © 2022