Slamet Widodo
Universitas Amikom Purwokerto


A Comprehensive Evaluation of CatBoost and LightGBM Algorithms for Honorarium Prediction on Categorical Datasets with Class Imbalance
Slamet Widodo; Fandy Setyo Utomo; Berlilana
JUITA: Jurnal Informatika, Vol. 13, Issue 3, November 2025
Publisher : Department of Informatics Engineering, Universitas Muhammadiyah Purwokerto

DOI: 10.30595/juita.v13i3.27363

Abstract

Income in academic settings, including honoraria, is often determined manually and subjectively, which motivates a predictive model that sets honorarium amounts objectively. Developing such a model is difficult because of the dataset's characteristics: it contains categorical data and has an imbalanced class distribution. This research evaluates the predictive performance and computational resource efficiency of the CatBoost and LightGBM algorithms for honorarium prediction. The dataset comprises 58,332 actual employee honorarium records from higher education institution "A" in Purwokerto, covering January 2024 to February 2025. The method consists of data preprocessing; dataset splitting with stratified splitting; modeling with CatBoost, LightGBM, Random Forest, Neural Network, and Linear Regression; and evaluation using MSE, RMSE, MAE, and R², together with computational resource measures (execution time, memory usage, and CPU time). LightGBM achieved an RMSE of 665.960 and an R² of 0.54 and recorded the lowest memory usage, at only 2.67 MB. CatBoost produced an RMSE of 667.395 and an R² of 0.53 and excelled at processing categorical features without one-hot encoding. Linear Regression showed the lowest accuracy and high memory usage. These results confirm that LightGBM is the best choice for fast, efficient, and accurate honorarium prediction. However, this research was limited to testing in a laboratory environment. Further research is recommended to integrate the model directly with an active database, to apply information retrieval methods that enhance the effectiveness and security of real-time honorarium prediction, and to add interpretability methods such as SHAP to improve decision-making transparency.
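
The evaluation criteria named in the abstract (MSE, RMSE, MAE, R², execution time, memory, and CPU time) can be illustrated with a minimal sketch. This is not the authors' code: the function names `regression_metrics` and `measure` are hypothetical, the sample values are invented for illustration, and memory is measured with Python's `tracemalloc`, which only tracks allocations made through the Python allocator and may therefore undercount memory used inside native libraries such as LightGBM or CatBoost.

```python
import math
import time
import tracemalloc


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    # R² is undefined when the targets are constant; report NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


def measure(fn, *args, **kwargs):
    """Run fn once and report wall time, CPU time and peak Python memory."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    usage = {"wall_s": wall_s, "cpu_s": cpu_s, "peak_mb": peak_bytes / 1e6}
    return result, usage


# Toy values, invented for illustration only (not from the study's dataset).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
scores, usage = measure(regression_metrics, y_true, y_pred)
```

In a comparison like the one described, `measure` would wrap each model's training or prediction call so that every algorithm is timed and profiled the same way before the error metrics are computed on the held-out split.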