Articles

Comparison Between K-Fold Cross Validation And Percentage Split In Decision Tree Algorithms For Anemia Classification
Rahmawati, Nanda Putri; Irwan Budiman; Muhammad Itqan Mazdadi; Andi Farmadi; Friska Abadi
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 8 No. 1 (2026): February
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v8i1.315

Abstract

Anemia is a significant global health challenge characterized by a pathological deficit in hemoglobin concentration, often leading to physiological instability. Accurate clinical diagnosis typically relies on complete blood count (CBC) tests, which provide critical hematological parameters for classification. While machine learning models have demonstrated high efficacy in diagnosing anemia, existing research often relies on static data partitioning strategies that may overlook evaluation reliability and performance stability. This study addresses this gap by shifting the focus from architectural benchmarking to validation robustness, specifically evaluating the C4.5 algorithm's performance across different data-splitting techniques. The research uses a dataset comprising 1,281 clinical records with 14 numerical features and 9 anemia-type labels. To assess stability, two distinct partitioning strategies were implemented: a static Percentage Split (ranging from 60:40 to 90:10) and iterative K-Fold Cross Validation (with K values of 3, 5, 7, 10, and 15). Experimental results demonstrate that the C4.5 algorithm reached its peak performance with the 90:10 Percentage Split, with an average accuracy of 99.46%, precision of 98.32%, and recall of 99.28%. In comparison, the K-Fold (K=10) approach yielded a slightly lower but more stable accuracy of 99.19% with a significantly reduced standard deviation (±0.09), highlighting its reliability for clinical applications. While the high-ratio percentage split maximizes training exposure and predictive potential, the K-Fold method provides a more objective, generalizable benchmark by accounting for the entire data distribution. The study further identifies challenges in classifying minority classes, such as Leukemia with thrombocytopenia, due to inherent data scarcity. Ultimately, this research confirms that the C4.5 algorithm, when paired with an optimal partitioning protocol, remains a robust and highly interpretable solution for clinical anemia screening, outperforming several complex modern architectures.

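The difference between the two partitioning strategies described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `percentage_split` and `k_fold_splits` are hypothetical, and the sketch splits indices in order without the shuffling or stratification a real experiment would likely use. Only the record count (1,281), the 90:10 ratio, and K=10 come from the abstract.

```python
def percentage_split(n_records, train_ratio):
    """Single static split: the first part trains, the remainder tests."""
    n_train = int(n_records * train_ratio)
    indices = list(range(n_records))
    return indices[:n_train], indices[n_train:]


def k_fold_splits(n_records, k):
    """K (train, test) pairs; every record lands in exactly one test fold."""
    indices = list(range(n_records))
    base, extra = divmod(n_records, k)
    splits = []
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional record.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


N_RECORDS = 1281  # dataset size reported in the abstract

train, test = percentage_split(N_RECORDS, 0.9)
print(len(train), len(test))  # 1152 129

folds = k_fold_splits(N_RECORDS, 10)
print([len(t) for _, t in folds])  # one fold of 129, nine folds of 128
```

With a 90:10 split only 129 of the 1,281 records are ever used for testing, whereas 10-fold cross validation tests on every record once. That is one plausible reading of why the abstract describes the K-Fold estimate as the more stable one.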
The Effect of Smote-Tomek on the Classification of Chronic Diseases Based on Health and Lifestyle Data
Muhammad Adika Riswanda; Friska Abadi; Muhammad Itqan Mazdadi; Mohammad Reza Faisal; Rudy Herteno
Indonesian Journal of Electronics, Electromedical Engineering, and Medical Informatics Vol. 8 No. 1 (2026): February
Publisher : Jurusan Teknik Elektromedik, Politeknik Kesehatan Kemenkes Surabaya, Indonesia

DOI: 10.35882/ijeeemi.v8i1.324

Abstract

Machine learning models for chronic disease prediction are often trained on imbalanced healthcare datasets, where non-disease cases dominate. This condition can lead to misleadingly high accuracy while failing to identify patients with chronic diseases, limiting clinical usefulness. This study aims to analyze the impact of class imbalance on model performance and to evaluate the effectiveness of the SMOTE–Tomek resampling technique in improving chronic disease prediction. This research provides empirical evidence that accuracy alone is insufficient for evaluating healthcare models and demonstrates that imbalance-aware preprocessing is essential for valid and reliable chronic disease detection. Five classification models (Support Vector Machine, Random Forest, K-Nearest Neighbors, Gradient Boosting, and XGBoost) were evaluated on a lifestyle-based chronic disease dataset under two conditions: without resampling and with SMOTE–Tomek. Model performance was assessed using accuracy, precision, recall, F1-score, and AUC. Without SMOTE–Tomek, all models failed to detect chronic disease cases, producing near-zero recall and F1-scores despite accuracy exceeding 80%. After applying SMOTE–Tomek, substantial improvements were observed across all models, particularly in recall and AUC. Support Vector Machine achieved the best overall performance, with an accuracy of 92.9%, a precision of 92%, a recall of 93.9%, an F1-score of 0.93, and an AUC of 0.98. The findings confirm that handling class imbalance is a prerequisite for meaningful chronic disease prediction. The consistent increase in recall and AUC across all evaluated models confirms that the improvement stems from enhanced class separability rather than metric inflation. The proposed approach supports more reliable early screening and decision-support systems in preventive healthcare.

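The abstract's point that accuracy can exceed 80% while recall stays near zero is easy to reproduce with a toy example. The sketch below is illustrative only: the 850/150 class counts are an assumption chosen for the demonstration (the abstract does not give the dataset's class distribution), and `binary_metrics` and `f1_from` are hypothetical helper names. The last line checks that the precision and recall reported for the Support Vector Machine are consistent with the reported F1-score.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Assumed toy distribution: 850 healthy records, 150 disease records.
y_true = [0] * 850 + [1] * 150
# A model that always predicts the majority class.
y_pred = [0] * 1000

print(binary_metrics(y_true, y_pred))
# accuracy is 0.85, yet precision, recall and F1 are all 0.0

# Consistency check on the SVM figures reported in the abstract.
print(round(f1_from(0.92, 0.939), 2))  # 0.93
```

Resampling methods such as SMOTE–Tomek change the training distribution so that a model can no longer score well by ignoring the minority class; the metric definitions themselves stay the same.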
Comparasion Of Weather Classification Methods On Weather Images Using GLCM Features With Random Forest And Catboost Algoritms
Noorhafizi, Muhammad; Saragih, Triando Hamonangan; Mazdadi, Muhammad Itqan; Muliadi, Muliadi; Herteno, Rudy; Rozaq, Hasri Awal Akbar
International Journal of Advances in Data and Information Systems Vol. 7 No. 1 (2026): April 2026 - International Journal of Advances in Data and Information Systems
Publisher : Indonesian Scientific Journal

DOI: 10.59395/ijadis.v7i1.1456

Abstract

Weather image classification is an essential process for improving automated weather information systems. However, most existing studies rely on numerical meteorological data and rarely utilize the textural characteristics embedded in atmospheric imagery. This study addresses that limitation by applying the Gray Level Co-Occurrence Matrix (GLCM) for texture feature extraction combined with Random Forest (RF) and CatBoost algorithms for classification. The dataset, obtained from Kaggle, consists of 1,125 weather images categorized into four classes: cloudy, rain, shine, and sunrise. All images were uniformly normalized and augmented using four rotation angles (0°, 45°, 90°, 135°). GLCM features were extracted with a pixel distance of 1 and gray-level quantization of 8, generating four statistical attributes: contrast, correlation, energy, and homogeneity. Both algorithms were optimized through parameter tuning and evaluated using a 5-fold cross-validation scheme with an 80:20 split ratio. Results show that the Random Forest model (n_estimators = 100, max_depth = 10, random_state = 42) achieved the highest accuracy of 92.43% (±1.12), precision of 92.50%, recall of 92.43%, and F1-score of 92.42%. In comparison, CatBoost (iterations = 100, learning_rate = 0.1, depth = 6) achieved an accuracy of 68.88% (±2.31). The findings demonstrate that GLCM feature extraction combined with Random Forest offers superior stability and accuracy for weather image classification, providing a foundation for efficient and interpretable weather information systems.
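The GLCM feature extraction step described in this abstract can be sketched in pure Python. This is a rough illustration under stated assumptions, not the authors' pipeline: the function names `glcm` and `glcm_features` are hypothetical, the matrix is non-symmetric and computed for a single horizontal offset, and the four statistics follow commonly used textbook definitions (energy taken as the square root of the sum of squared probabilities), which may differ from the ones used in the paper. Only the pixel distance of 1, the 8 gray levels, and the four feature names come from the abstract.

```python
import math


def glcm(image, levels=8, dx=1, dy=0):
    """Normalized co-occurrence matrix for the pixel offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for the requested offset")
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, correlation, energy and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum(v * (i - mu_i) ** 2 for i, _, v in cells)
    var_j = sum(v * (j - mu_j) ** 2 for _, j, v in cells)
    denom = math.sqrt(var_i * var_j)
    cov = sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells)
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        # Undefined for a constant image; treated as 1.0 here by convention.
        "correlation": cov / denom if denom > 0 else 1.0,
        "energy": math.sqrt(sum(v * v for _, _, v in cells)),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


# Tiny 4x4 image already quantized to gray levels below 8.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = glcm_features(glcm(image, levels=8, dx=1, dy=0))
print(features)
```

Each image would yield one such four-value feature vector per offset, and those vectors are what a classifier such as Random Forest or CatBoost would then be trained on.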
Co-Authors AA Sudharmawan, AA Abdilah, Muhammad Fariz Fata Abdullayev, Vugar Ade Agung Harnawan, Ade Agung Adela Putri Ariyanti Afifa, Ridha Ahdyani, Annisa Salsabila Ahmad Rusadi Ahmad Rusadi Ahmad Rusadi Arrahimi - Universitas Lambung Mangkurat) Ahmad Rusadi Arrahimi - Universitas Lambung Mangkurat) Ahmad Shofi Khairian Ahmad Tajali Aidil Akbar Al Ghifari, Muhammad Akmal Alamudin, Muhammad Faiq Amalia, Raisa Andi - Farmadi Andi Farmadi Andi Farmadi Anna Khumaira Sari Anshory, Muhammad Naufal Ansyari, Muhammad Ridho Antoh, Soterio Ardiansyah Sukma Wijaya Athavale, Vijay Anant Athavale, Vijay Annant budiman, irwan Buih, Putri Helena Junjung Deni Sutaji Dina Arifah Djordi Hadibaya Dodon Turianto Nugrahadi Dwi Kartini Dwi Kartini Dwi Kartini, Dwi Dzira Naufia Jawza Erdi, Muhammad Faisal, Mohammad Reza Fathmah, Siti Fatma Indriani Fayyadh, Muhammad Naufaldi Fitriani, Karlina Elreine Fitrinadi Friska Abadi Haekal, Muhammad Hafizah, Rini Helma Herlinda Herteno, Rudi Herteno, Rudy Indriani, Fatma Irwan Budiman Irwan Budiman Irwan Budiman Irwan Budiman M. Apriannur M. 
Khairul Rezki Mafazy, Muhammad Meftah Maulana, Muhammad Rafly Alfarizqy Muflih Ihza Rifatama Muhamad Fawwaz Akbar Muhamad Ihsanul Qamil Muhammad Adika Riswanda Muhammad Khairin Nahwan Muhammad Mada Muhammad Mirza Hafiz Yudianto Muhammad Mursyidan Amini Muhammad Reza Faisal, Muhammad Reza Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Muliadi Nabella, Putri Noorhafizi, Muhammad Normaidah, Normaidah Nugraha, Muhammad Amir Nursyifa Azizah P., Chandrasekaran Patrick Ringkuangan Prastya, Septyan Eka Putri Nabella Radityo Adi Nugroho Rahmah, Indah Noor Rahmat Hidayat Rahmat Ramadhani Rahmat Ramadhani Rahmawati, Nanda Hesti Rahmawati, Nanda Putri Ramadhan, Mita Azzahra Ramadhani, Muhammad Irfan Ramadhani, Rahmat Ratnapuri, Prima Happy Riadi, Agus Teguh Rifki Izdihar Oktvian Abas Pullah Rifki Rinaldi Rizky, Muhammad Miftahur Rozaq, Hasri Akbar Awal Rozaq, Hasri Awal Akbar Rudy Herteno Saputra, Adryan Maulana Saputro, Setyo Wahyu Saragih, Triando Hamonangan Satrio Yudho Prakoso Setyo Wahyu Saputro Shalehah Syahputra, Muhammad Reza Tajali, Ahmad Totok Wianto Wahyu Dwi Styadi Wijaya Kusuma, Arizha Yanche Kurniawan Mangalik YILDIZ, Oktay Yoga Pambudi Yudha Sulistiyo Wibowo Zaini Abdan