Journal : International Journal of Engineering, Science and Information Technology

Hybrid Deep Fixed K-Means (HDF-KMeans)
Zuhanda, Muhammad Khahfi; Kohsasih, Kelvin Leonardi; Octaviandy, Pieter; Hartono, Hartono; Kurnia, Dian; Tarigan, Nurliana; Ginting, Manan; Hutagalung, Manahan
International Journal of Engineering, Science and Information Technology Vol 5, No 3 (2025)
Publisher : Malikussaleh University, Aceh, Indonesia

DOI: 10.52088/ijesty.v5i3.913

Abstract

K-Means is one of the most widely used clustering algorithms due to its simplicity, scalability, and computational efficiency. However, its practical application is often hindered by several well-known limitations, such as high sensitivity to initial centroid selection, inconsistency across different runs, and suboptimal performance when dealing with high-dimensional or non-linearly separable data. This study introduces a hybrid clustering algorithm named Hybrid Deep Fixed K-Means (HDF-KMeans) to address these issues. This approach combines the advantages of two state-of-the-art techniques: Deep K-Means++ and Fixed Centered K-Means. Deep K-Means++ leverages deep learning-based feature extraction to transform raw data into more meaningful representations while employing advanced centroid initialization to enhance clustering accuracy and adaptability to complex data structures. Complementarily, Fixed Centered K-Means improves the stability of clustering results by locking certain centroids based on domain knowledge or adaptive strategies, effectively reducing variability and convergence inconsistency. Integrating these two methods results in a robust hybrid model that delivers improved accuracy and consistency in clustering performance. The proposed HDF-KMeans algorithm is evaluated using five benchmark medical datasets: Breast Cancer, COVID-19, Diabetes, Heart Disease, and Thyroid. Performance is assessed using standard classification metrics: Accuracy, Precision, Recall, and F1-Score. The results show that HDF-KMeans outperforms traditional K-Means, K-Means++, and K-Means-SMOTE algorithms across all datasets, excelling in overall accuracy and F1-Score. While some trade-offs are observed in specific precision or recall metrics, the model maintains a solid balance, demonstrating reliability. This study highlights HDF-KMeans as a promising and effective solution for complex clustering tasks, particularly in high-stakes domains like healthcare and biomedical analysis.
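The centroid-locking idea the abstract describes can be illustrated with a minimal sketch: some centroids are fixed (standing in for the domain-knowledge centroids of Fixed Centered K-Means) while the remaining centroids are seeded k-means++-style and updated as usual. This is an assumption-laden illustration of the general technique, not the authors' HDF-KMeans implementation; the function name `kmeans_fixed` and its parameters are hypothetical.

```python
import numpy as np

def kmeans_fixed(X, k, fixed_centroids, n_iter=100, seed=0):
    """K-Means variant where the first len(fixed_centroids) centroids are locked.

    Illustrative sketch only -- NOT the paper's HDF-KMeans. `fixed_centroids`
    stands in for centroids chosen from domain knowledge; the remaining
    centroids are seeded with k-means++-style sampling and updated normally.
    """
    rng = np.random.default_rng(seed)
    fixed = np.asarray(fixed_centroids, dtype=float)
    n_fixed = len(fixed)

    # k-means++-style seeding for the free centroids: sample points with
    # probability proportional to squared distance from existing centroids.
    centroids = [c for c in fixed]
    while len(centroids) < k:
        d2 = np.min(((X[:, None, :] - np.array(centroids)[None]) ** 2).sum(-1), axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    C = np.array(centroids)

    for _ in range(n_iter):
        # assign each point to its nearest centroid
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        new_C = C.copy()
        for j in range(n_fixed, k):  # only the free centroids move
            pts = X[labels == j]
            if len(pts):
                new_C[j] = pts.mean(axis=0)
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, labels
```

Locking a centroid simply means skipping its mean-update step, which is why the method reduces run-to-run variability: the locked coordinates never drift with the random initialization.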
A Hybrid GDHS and GBDT Approach for Handling Multi-Class Imbalanced Data Classification
Hartono, Hartono; Zuhanda, Muhammad Khahfi; Syah, Rahmad; Rahman, Sayuti; Ongko, Erianto
International Journal of Engineering, Science and Information Technology Vol 5, No 3 (2025)
Publisher : Malikussaleh University, Aceh, Indonesia

DOI: 10.52088/ijesty.v5i3.894

Abstract

Multiclass imbalanced classification remains a significant challenge in machine learning, particularly when datasets exhibit high Imbalance Ratios (IR) and overlapping feature distributions. Traditional classifiers often fail to accurately represent minority classes, leading to biased models and suboptimal performance. This study proposes a hybrid approach combining Generalization potential and learning Difficulty-based Hybrid Sampling (GDHS) as a preprocessing technique with Gradient Boosting Decision Tree (GBDT) as the classifier. GDHS enhances minority class representation through intelligent oversampling while cleaning majority classes to reduce noise and class overlap. GBDT is then applied to the resampled dataset, leveraging its adaptive learning capabilities. The performance of the proposed GDHS+GBDT model was evaluated across six benchmark datasets with varying IR levels, using metrics such as Matthews Correlation Coefficient (MCC), Precision, Recall, and F-Value. Results show that GDHS+GBDT consistently outperforms other methods, including SMOTE+XGBoost, CatBoost, and Select-SMOTE+LightGBM, particularly on high-IR datasets like Red Wine Quality (IR = 68.10) and Page-Blocks (IR = 188.72). The method improves classification performance, especially in detecting minority classes, while maintaining high accuracy.
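The resample-then-boost pipeline shape described above can be sketched as follows. The resampler here is a deliberately naive stand-in (duplication-based oversampling of every minority class up to the majority count): the real GDHS additionally weighs generalization potential and learning difficulty and cleans the majority classes, none of which is reproduced. The helper names `naive_hybrid_resample` and `fit_pipeline` are hypothetical; the classifier is scikit-learn's `GradientBoostingClassifier` as a generic GBDT.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def naive_hybrid_resample(X, y, seed=0):
    """Stand-in for GDHS: duplicate minority-class samples until every class
    matches the majority count. Illustrates only the pipeline shape, not the
    generalization-potential / learning-difficulty logic of the paper."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Xs, ys = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            idx = rng.choice(np.flatnonzero(y == c), size=target - n, replace=True)
            Xs.append(X[idx])
            ys.append(y[idx])
    return np.vstack(Xs), np.concatenate(ys)

def fit_pipeline(X, y):
    """Resample first, then train a GBDT on the balanced data."""
    Xr, yr = naive_hybrid_resample(X, y)
    return GradientBoostingClassifier(random_state=0).fit(Xr, yr)
```

Because the classifier only ever sees the balanced resampled set, the boosting stages are no longer dominated by majority-class residuals, which is the mechanism behind the improved minority-class recall the abstract reports.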