Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Vol 7 No 1 (2023): February 2023

A Comparative Study of CatBoost and Double Random Forest for Multi-class Classification

Annisarahmi Nur Aini Aldania (IPB University)
Agus Mohamad Soleh (IPB University)
Khairil Anwar Notodiputro (IPB University)



Article Info

Publish Date
02 Feb 2023

Abstract

Multi-class classification poses challenges beyond those of binary classification, mainly because the interactions between the explanatory and response variables become increasingly complex. Ensemble-based methods such as boosting and random forest (RF) have been proven to handle classification problems. We conducted this research to study multi-class classification using CatBoost, a method built on gradient boosting, and double random forest (DRF), an extension of RF that is suited to cases where the resulting RF model underfits. The analysis was carried out using simulated and empirical data. In the simulation study, we generated data based on the distance between classes: high, medium, and low. The empirical data are industrial classification codes, namely KBLI. CatBoost and DRF both solve the multi-class classification problem correctly at a high distance, each reaching a balanced accuracy score of 100%. At a medium distance, CatBoost and DRF produce balanced accuracy scores of 99.25% and 97.54%, respectively; at a low distance, the scores are 32.37% and 23.97%. In the empirical study, CatBoost outperforms DRF by 4.27%. All the differences are statistically significant based on the t-test results. We also use LIME to explain individual predictions of CatBoost and to identify the words that contribute the most to the prediction of an example class.
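The balanced accuracy score reported above is commonly defined as the mean of the per-class recall values, so every class counts equally regardless of its size. The following is a minimal sketch of that standard definition in plain Python. It is not the authors' code, and the function name `balanced_accuracy` and the example labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("y_true must not be empty")

    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall of class c = correct[c] / total[c]; average over all true classes.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels for three imbalanced classes (illustration only).
y_true = ["A", "A", "A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "A", "A", "B", "A", "C", "B"]

score = balanced_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy: {score:.4f}, plain accuracy: {plain_accuracy:.4f}")
```

In this toy example the per-class recalls are 1, 0.5, and 0.5, so the balanced accuracy is lower than the plain accuracy: the majority class no longer dominates the score.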

Copyright © 2023

Journal Info

Abbrev

RESTI

Publisher

Subject

Computer Science & IT Engineering

Description

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is intended as a medium for scholarly study of research results, ideas, and critical-analytical studies concerning research in Systems Engineering, Informatics Engineering/Information Technology, Informatics Management, and Information Systems. As part of the spirit ...