Jurnal Ilmiah Kursor
Vol. 12 No. 3 (2024)

THE INFLUENCE OF DATA CATEGORIZATION AND ATTRIBUTE INSTANCES REDUCTION USING THE GINI INDEX ON THE ACCURACY OF THE CLASSIFICATION ALGORITHM MODEL

Willy Fernando
Deny Jollyta
Dadang Priyanto
Dwi Oktarina

Article Info

Publish Date
25 May 2024

Abstract

Problems with numerical data typically arise from a failure to understand the data and the results of processing it. To provide richer context and a deeper understanding of the facts, numerical data must be transformed into categories. However, such changes to the data can significantly affect the results of the analysis. The purpose of this study is to examine how transforming numerical data into categories affects the models produced by classification algorithms. The dataset used in this study is the Maternal Health Risk dataset. Categorization follows formal arrangements, and it is also accomplished by using the Gini Index to limit the number of instances of an attribute. Classification is carried out using the Random Forest (RF), K-Nearest Neighbor (K-NN), and Support Vector Machine (SVM) algorithms to produce the models. The influence of these data modifications on each model is evaluated with the confusion matrix across five different data-splitting ratios. The results suggest that converting numerical data into categorical data improved the performance of the SVM model, raising its accuracy from 76.92% to 80.77% at a 95/5 data split.
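The abstract does not detail how the Gini Index is used to limit the number of values of an attribute. A common approach, assumed here and not confirmed as the authors' exact method, is to choose cut points that minimize the weighted Gini impurity of the class label (as a decision tree does) and then map each numeric value to a bin index. The sketch below also computes accuracy from a confusion matrix as the number of correct predictions divided by the total. The function names, the `max_depth` limit, and the toy blood-pressure values are hypothetical illustrations, not data from the study; training RF, K-NN, or SVM on the binned features is not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold t (for x <= t vs x > t) that minimizes the weighted
    Gini impurity. Returns (None, parent_gini) if no split reduces impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_g = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # only cut between distinct attribute values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        g = (len(left) * gini(left) + len(right) * gini(right)) / n
        if g < best_g:  # strict: keep the first of equally good cuts
            best_t, best_g = t, g
    return best_t, best_g


def gini_thresholds(values, labels, max_depth=2):
    """Hypothetical helper: recursively split the attribute and collect
    cut points, giving at most 2**max_depth categories."""
    if max_depth == 0:
        return []
    t, _ = best_split(values, labels)
    if t is None:
        return []
    cuts = [t]
    for keep in (lambda v: v <= t, lambda v: v > t):
        part = [(v, y) for v, y in zip(values, labels) if keep(v)]
        cuts += gini_thresholds([v for v, _ in part],
                                [y for _, y in part], max_depth - 1)
    return sorted(cuts)


def categorize(value, cuts):
    """Map a numeric value to a category index 0..len(cuts)."""
    return sum(value > c for c in cuts)


def accuracy(confusion):
    """Accuracy = correct predictions (diagonal) / all predictions."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


if __name__ == "__main__":
    # Hypothetical systolic blood-pressure values and risk labels.
    values = [90, 95, 100, 120, 130, 140, 150, 160]
    labels = ["low", "low", "low", "mid", "mid", "high", "high", "high"]
    cuts = gini_thresholds(values, labels, max_depth=2)
    print(cuts)  # [110.0, 135.0]
    print([categorize(v, cuts) for v in values])  # [0, 0, 0, 1, 1, 2, 2, 2]
    print(accuracy([[5, 1, 0], [1, 4, 1], [0, 0, 8]]))  # 0.85
```

Here eight distinct values collapse into three categories, which is one way to read "limiting the number of instances of an attribute"; the depth limit controls how many categories can appear.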

Copyright © 2024

Journal Info

Abbrev

kursor

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management

Description

Jurnal Ilmiah Kursor was first published in January 2005 and has been accredited by the Directorate General of Higher Education in 2010, 2014, 2019, and up to the present. Jurnal Ilmiah Kursor seeks to publish original scholarly articles related (but not limited) to: Computer Science. Computational ...