Articles

Optimizing Heart Disease Classification Using C4.5, Random Forest, and XGBoost with ANOVA, Chi-Square, and AdaBoost Pratama, Andika; Assegaff, Setiawan; Jasmir, Jasmir; Nurhadi, Nurhadi
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 2 (2026): JUTIF Volume 7, Number 2, April 2026
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.2.5430

Abstract

Heart disease remains one of the leading causes of mortality worldwide, underscoring the need for accurate and scalable prediction models within clinical informatics. This study proposes a leakage-safe machine learning pipeline that combines stratified splitting, SMOTE-based imbalance handling, and in-fold feature selection using ANOVA, Chi-Square, and AdaBoost-assisted ranking to enhance classification performance on a large heart-disease dataset of 10,000 samples and 21 attributes. Three widely used algorithms, C4.5, Random Forest, and XGBoost, were evaluated to determine the optimal combination of model and feature selection method for structured medical data. The results demonstrate that feature relevance contributes more to predictive performance than increased model complexity: Random Forest achieved the highest accuracy, precision, recall, and F1-score, at 98.43%, when combined with Chi-Square or ANOVA feature selection. C4.5 showed the greatest relative improvement, rising from 76.52% to 97.57% with AdaBoost-assisted selection, while XGBoost improved from 66.32% to 94.88% after statistical filtering. The dominant features identified, such as CRP, BMI, blood pressure, fasting glucose, LDL, triglycerides, and homocysteine, align with well-established cardiovascular biomarkers, supporting clinical validity. This research contributes to computer science by demonstrating an efficient and scalable hybrid FS-boosting framework capable of reducing unnecessary model complexity, improving generalization, and supporting low-latency deployment in clinical decision-support systems. The findings highlight the potential of structured-data machine learning to strengthen digital health diagnostics in resource-limited environments.
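
The "in-fold" feature selection idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names and the toy mean-gap relevance score are assumptions made for illustration (the paper uses ANOVA, Chi-Square, and AdaBoost-assisted ranking), and SMOTE is omitted. The point shown is only that features are scored on the training rows of each stratified fold, never on the held-out rows.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps roughly the same class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)  # deal each class round-robin
    return folds


def mean_gap_score(X, y, rows, col):
    """Toy relevance score (an assumption, not from the paper):
    absolute gap between the two class means, computed on `rows` only.
    Assumes binary labels 0/1 and that `rows` contains both classes."""
    pos = [X[i][col] for i in rows if y[i] == 1]
    neg = [X[i][col] for i in rows if y[i] == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def in_fold_selection(X, y, k, top_n):
    """For each fold, rank features using the training rows only and keep the top_n."""
    folds = stratified_folds(y, k)
    selected_per_fold = []
    for held_out in range(k):
        train_rows = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        scores = [(mean_gap_score(X, y, train_rows, c), c) for c in range(len(X[0]))]
        top = sorted(scores, reverse=True)[:top_n]
        selected_per_fold.append(sorted(col for _, col in top))
    return selected_per_fold


# Hypothetical data: column 0 separates the classes, column 1 does not.
X = [[1, 5], [2, 5], [1, 4], [2, 6], [9, 5], [8, 5], [9, 4], [8, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(in_fold_selection(X, y, k=2, top_n=1))
```

In a leakage-safe setup, any resampling step such as SMOTE would likewise be fitted on `train_rows` only, so that synthetic samples never carry information from the held-out fold.
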
Enhancement Of The C4.5 Decision Tree Algorithm With Anova For Predicting Academic Achievement Of Students At Smpn.16 Kota Jambi Osviarni, Rice; Assegaff, Setiawan; Jasmir, Jasmir; Nurhadi, Nurhadi
Jurnal Teknik Informatika (Jutif) Vol. 7 No. 2 (2026): JUTIF Volume 7, Number 2, April 2026
Publisher : Informatika, Universitas Jenderal Soedirman

DOI: 10.52436/1.jutif.2026.7.2.5431

Abstract

This study aims to improve the accuracy of predicting student academic achievement by integrating the Analysis of Variance (ANOVA) method with the C4.5 Decision Tree algorithm. In the context of information systems, this research is important for the development of more reliable Decision Support Systems (DSS) or early warning systems in school environments. The research was conducted at SMPN 16 Jambi City using secondary data from three academic years (2022/2023-2024/2025) covering academic variables, attendance, and parental income. The main issue addressed was the limitation of the C4.5 algorithm in handling irrelevant features and imbalanced data, which, at the system implementation level, can lead to inaccurate recommendations or alerts. The method followed a data mining approach with stages including data cleaning, numeric conversion, missing value imputation, formation of derived variables, and categorization of the target variable "Achievement." The initial C4.5 model produced 72.81% accuracy on the training data and 69.71% accuracy under cross-validation. After feature selection using ANOVA, one insignificant variable was removed, resulting in a hybrid C4.5+ANOVA model with nine key features. Test results showed an increase in accuracy to 80.44% on the training data and 73.66% under cross-validation, an improvement of 7.63 and 3.95 percentage points, respectively. This improvement in model performance directly enhances the quality of the information system's output, yielding more reliable reports and predictions for teachers and school management.
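
The abstract above does not spell out how ANOVA scores each feature, so the following is only a sketch of the textbook one-way ANOVA F-statistic, which is the usual basis for this kind of filter. The function name `anova_f` and the sample values are hypothetical. Each inner list holds one feature's values for one class of the target variable; a larger F suggests the feature's mean differs more between classes relative to its spread within them.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for a single numeric feature.

    `groups` is a list of lists: the feature's values grouped by target class.
    F = (between-group sum of squares / (k - 1)) / (within-group sum of squares / (N - k))
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical feature values for three achievement categories.
print(anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

Turning F into a significance decision requires the F-distribution with (k - 1, N - k) degrees of freedom, which is left out here to keep the sketch dependency-free.
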
Fitur Information Gain untuk Meningkatkan Nilai Performa Pengklasifikasi Machine Learning pada Analisis Sentimen Komentar Spam Pengguna Youtube Jasmir, Jasmir; Gunardi, Gunardi; Rohaini, Eni; Naibaho, Ronald; Sukoco, Bambang
Jurnal Teknologi Informasi dan Ilmu Komputer Vol 13 No 2: April 2026
Publisher : Fakultas Ilmu Komputer, Universitas Brawijaya

DOI: 10.25126/jtiik.132

Abstract

The rapid growth of social media has provided individuals with the opportunity to freely express their opinions, whether positive or negative, toward the content they encounter. The increasing ease of sharing opinions online has resulted in a massive volume of user reviews. However, the large number of reviews is difficult to analyze manually and may introduce bias in interpretation. To address this issue, sentiment classification is applied to automatically categorize user opinions into positive or negative classes. In this study, three machine learning algorithms were employed: Naïve Bayes (NB), K-Nearest Neighbor (KNN), and Random Forest (RF). The dataset was obtained from the public UCI Machine Learning repository. The main objective of this research is to improve classification performance by utilizing feature selection through the information gain method. Experimental results demonstrate that applying information gain consistently enhances the performance of all evaluated algorithms across multiple metrics, including accuracy, precision, recall, and F1-score. Without feature selection, Naïve Bayes achieved the highest accuracy of 74.33%. However, after applying information gain, KNN outperformed the other algorithms by reaching an accuracy of 81.28% and exhibited balanced results across all evaluation metrics. Random Forest also showed improvement but did not surpass the performance of KNN. Overall, these findings highlight the importance of feature selection in improving both the efficiency and effectiveness of sentiment classification. Furthermore, the use of information gain proves to be a promising approach for large-scale opinion analysis, particularly in handling the high dimensionality of textual data.
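
As a rough illustration of the information gain criterion named in the abstract above, the sketch below computes the standard definition, IG = H(labels) - H(labels | feature), for one discrete feature. The function names, the term-presence flags, and the "spam"/"ham" labels are hypothetical; the paper's actual preprocessing and tooling are not described here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting the samples by a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical example: 1 means the term occurs in the comment, 0 means it does not.
labels = ["spam", "spam", "ham", "ham"]
informative_term = [1, 1, 0, 0]    # splits the classes perfectly
uninformative_term = [1, 0, 1, 0]  # tells nothing about the class
print(information_gain(informative_term, labels))
print(information_gain(uninformative_term, labels))
```

Ranking every term by this score and keeping only the highest-scoring ones is one common way to reduce the dimensionality of bag-of-words text features before training a classifier.
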
An Adaptive Feature-Aware Hybrid Resampling Strategy for Imbalanced Diabetes Classification with Integrated Balanced Index Evaluation Jasmir, Jasmir; Pahlevi, Riza; Gunardi, Gunardi; Rohaini, Eni; Annisa, Tiko Nur
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol 10 No 2 (2026): April 2026
Publisher : Ikatan Ahli Informatika Indonesia (IAII)

DOI: 10.29207/resti.v10i2.7418

Abstract

Class imbalance remains a critical challenge in medical data classification, particularly in diabetes prediction, as it significantly degrades minority-class sensitivity. This study proposes an Adaptive Feature-Aware Hybrid Resampling Strategy (AHRS) that dynamically integrates oversampling and undersampling based on Imbalance Ratio (IR) and Feature Importance (FI). Unlike conventional static resampling methods, AHRS iteratively adjusts class distribution while preserving informative feature structures. In addition, this study introduces the Integrated Balanced Index (IBI), a bounded composite metric integrating precision, recall, and specificity to provide a fairer evaluation of classification performance on imbalanced medical datasets. The proposed approach was evaluated using the Pima Indian Diabetes Dataset (768 instances) with K-Nearest Neighbor, Naïve Bayes, and Random Forest classifiers under 5-fold stratified cross-validation. Experimental results demonstrate that AHRS consistently outperforms SMOTE, Random Oversampling, and Tomek Links, achieving accuracy improvements of 5–7% and recall gains of up to 10%. Random Forest combined with AHRS achieved the highest IBI score of 0.90, indicating strong balance between sensitivity and specificity. The findings suggest that adaptive, feature-aware resampling combined with balanced evaluation metrics provides a reliable and interpretable framework for fair medical classification systems and Clinical Decision Support Systems (CDSS).
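
The abstract above names the quantities that AHRS and IBI build on but does not give their formulas, so the sketch below covers only the standard building blocks: the Imbalance Ratio (majority count divided by minority count) and the three component metrics computed from a binary confusion matrix. How IBI combines precision, recall, and specificity is not stated in the abstract and is deliberately not reproduced. Function names and all counts are hypothetical.

```python
def imbalance_ratio(class_counts):
    """Majority-class size divided by minority-class size (1.0 means balanced)."""
    return max(class_counts.values()) / min(class_counts.values())


def component_metrics(tp, fp, tn, fn):
    """Precision, recall (sensitivity), and specificity from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, not taken from the paper.
print(imbalance_ratio({"negative": 100, "positive": 50}))
print(component_metrics(tp=40, fp=10, tn=90, fn=10))
```

Reporting specificity alongside precision and recall matters on imbalanced data because accuracy alone can stay high while the minority class is mostly missed.
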
Penulisan Jurnal Ilmiah Berbasis Teknologi Digital untuk Meningkatkan Kompetensi Publikasi Mahasiswa Marthiawati, Noneng; Rohayani, Hetty; Kurniawansyah, Kevin; Jasmir, Jasmir; Gustinar, Gustinar
Journal of Social Responsibility Projects by Higher Education Forum Vol 6 No 2 (2025): November 2025
Publisher : Forum Kerjasama Pendidikan Tinggi (FKPT)

DOI: 10.47065/jrespro.v6i2.9656

Abstract

This community service activity was conducted at Universitas Muhammadiyah Jambi, involving students from the Information Systems study program as the primary participants. The main problems identified among participants included a limited understanding of the structure of scientific articles, inadequate paraphrasing skills, and a lack of competence in using reference management applications. In addition, the utilization of digital technology to support scientific writing was not yet optimal, despite the availability of adequate technological facilities. This activity aimed to improve students’ competencies in writing scientific articles in accordance with journal standards and to enhance their ability to utilize digital technology as a supporting tool for writing and publication. The method applied was a training-based approach combined with intensive mentoring. The training materials covered the structure of scientific articles, proper academic writing techniques, paraphrasing strategies to avoid plagiarism, the use of reference management tools, and the application of digital technology in scientific writing. The contribution of this activity lies in strengthening students’ practical academic skills while fostering a productive and technology-oriented academic culture. The results indicated a significant improvement in students’ understanding and skills. The average score increased from 54.25 in the pre-test to 79.63 in the post-test, representing a 46.78 percent improvement. Furthermore, most participants were able to independently produce draft scientific articles with a more systematic structure and effectively use reference management tools. The activity also contributed to increased motivation among students to write and publish scientific work.
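
The reported improvement in the abstract above is a relative change over the pre-test mean, which can be checked with a few lines. The function name is hypothetical; the two scores are the ones quoted in the abstract.

```python
def relative_improvement(before, after):
    """Percentage change of `after` relative to `before`."""
    return (after - before) / before * 100


# Pre-test and post-test averages quoted in the abstract.
print(round(relative_improvement(54.25, 79.63), 2))
```

The result agrees with the 46.78 percent stated in the abstract; the absolute gain is 25.38 points.
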