Journal of Intelligent Systems
Vol 1, No 2 (2015)

Application of the Average Gain, Threshold Pruning, and Cost Complexity Pruning Methods for Attribute Splitting in the C4.5 Algorithm

Rahayu, Erna Sri (Pascasarjana Universitas Dian Nuswantoro)
Wahono, Romi Satria (Dian Nuswantoro University)
Supriyanto, Catur (Dian Nuswantoro University)



Article Info

Publish Date
29 Dec 2015

Abstract

C4.5 is a supervised learning classifier that builds a decision tree from data. Attribute splitting is the main process in forming a decision tree in C4.5. The attribute split in C4.5 does not take misclassification cost into account, which affects the performance of the classifier. After attribute splitting, the next process is pruning, which cuts or eliminates unnecessary branches. Unneeded branches or nodes can make the decision tree very large, a condition called over-fitting. Over-fitting remains a current research problem. Methods for attribute splitting include Gini Index, Information Gain, Gain Ratio, and Average Gain, which was proposed by Mitchell. Average Gain not only overcomes the weakness of Information Gain but also helps to solve the problems of Gain Ratio. The attribute split method proposed in this research uses the average gain value multiplied by the difference in misclassification. Pruning is done by combining threshold pruning and cost complexity pruning. In this research, the proposed method is applied to datasets, and its performance is compared with that of attribute split methods using Gini Index, Information Gain, and Gain Ratio. Selecting split attributes using average gain multiplied by the difference in misclassification can improve the classification performance of C4.5. This is demonstrated by the Friedman test, in which the proposed attribute split method, combined with threshold pruning and cost complexity pruning, ranks first in accuracy. The decision trees formed by the proposed method are also smaller.

Keywords: decision tree, C4.5, split attribute, pruning, over-fitting, gain, average gain.
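
The abstract names several split criteria without giving formulas. As a rough illustration only, the sketch below computes information gain, gain ratio, and average gain for a single categorical attribute. It assumes the textbook entropy-based definitions and takes average gain to be information gain divided by the number of distinct attribute values; that definition, the function names, and the toy data are assumptions, not taken from the paper. The "difference in misclassification" factor is not defined in the abstract, so it is left out.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def split_scores(values, labels):
    """Score one categorical attribute against the class labels.

    Assumed definitions (not confirmed by the abstract):
      information gain = H(labels) - weighted H(labels within each value)
      gain ratio       = information gain / H(values)
      average gain     = information gain / number of distinct values
    """
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)

    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return {
        "information_gain": gain,
        "gain_ratio": gain / split_info if split_info > 0 else 0.0,
        "average_gain": gain / len(groups),
    }


# Hypothetical toy data: a two-valued attribute and a unique-ID attribute.
labels = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rain", "rain"]
row_id = [1, 2, 3, 4]

print(split_scores(outlook, labels))
print(split_scores(row_id, labels))
```

On this toy data both attributes reach the same information gain, while gain ratio and average gain both score the unique-ID attribute lower, which is the kind of bias toward many-valued attributes these criteria are meant to reduce.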

Copyrights © 2015






Journal Info

Abbrev

JIS

Publisher

Subject

Computer Science & IT

Description

Journal of Intelligent Systems is a periodical scientific journal that publishes research results in the field of computing and intelligent systems, covering theoretical, practical, and application aspects. The journal publishes original papers, both technical papers and survey or review papers on the latest developments ...