Hybrid LBFA-Based Feature Selection for Improving Machine Learning Classification Performance in Heart Disease Prediction
Hana Azizah; Eni Sumarminingsih; Adji Achmad Rinaldo Fernandes
UNP Journal of Statistics and Data Science, Vol. 4 No. 2 (2026)
Publisher: Departemen Statistika Universitas Negeri Padang

DOI: 10.24036/ujsds/vol4-iss2/478

Abstract

Feature selection and feature engineering are essential steps in developing accurate machine learning models, particularly when dealing with imbalanced datasets and redundant variables. However, feature augmentation methods are often applied without a consistent preprocessing strategy, which can reduce model reliability and increase the risk of information leakage. To address this issue, this study proposes a hybrid classification framework that combines CatBoost-based feature selection with two feature augmentation techniques: LOGIT transformation and Log Density Ratio (LDR). A structured preprocessing pipeline was designed to ensure consistency throughout the modeling process: one-hot encoding was applied for the LOGIT transformation, while numerical standardization was used for LDR estimation. The generated features were then combined with the selected original variables to produce richer feature representations for classification. The proposed framework was evaluated on the Heart Disease dataset with three gradient boosting algorithms: LightGBM, XGBoost, and CatBoost. Model performance was assessed using accuracy, precision, sensitivity, specificity, and F1-score. The results show that the proposed approach consistently improved classification performance across all three models. LightGBM combined with LOGIT and LDR performed best, with an accuracy of 0.9618, precision of 0.9485, sensitivity of 0.9620, specificity of 0.9625, and F1-score of 0.9552. These findings suggest that combining feature selection with structured feature augmentation can significantly improve predictive performance in imbalanced classification tasks.
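
For readers less familiar with the five evaluation metrics named in the abstract, the following is a minimal Python sketch of how they are conventionally derived from binary confusion-matrix counts. It is not taken from the paper: the function names `classification_metrics` and `f1_from` and the example counts are hypothetical, chosen only for illustration. The last lines check that the reported F1-score agrees with the harmonic mean of the reported precision and sensitivity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (each class is both
    present in the data and predicted at least once).
    """
    precision = tp / (tp + fp)      # share of predicted positives that are correct
    sensitivity = tp / (tp + fn)    # recall: share of actual positives found
    specificity = tn / (tn + fp)    # share of actual negatives found
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def f1_from(precision, sensitivity):
    """F1-score as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, for illustration only (not the paper's data).
metrics = classification_metrics(tp=40, fp=5, tn=50, fn=5)

# Consistency check on the figures reported in the abstract.
reported_f1 = round(f1_from(0.9485, 0.9620), 4)
print(metrics)
print(reported_f1)
```

On imbalanced data, sensitivity and specificity are worth reading alongside accuracy, since accuracy alone can look high when a model mostly predicts the majority class.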