Performance Comparison of Decision Tree, KNN, and Naive Bayes for Air Quality Classification
Thanri, Yan Yang; Iriani, Juli; Tanti, Lili; Zaidi, Luthfi
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Vol. 25 No. 2 (2026)
Publisher : Universitas Bumigora

DOI: 10.30812/matrik.v25i2.5121

Abstract

Air quality degradation has become a critical environmental and public health issue, necessitating accurate and reliable classification models to support effective monitoring systems. This study compares four machine learning algorithms, namely Decision Tree, k-Nearest Neighbor (kNN), Naive Bayes, and Stochastic Gradient Descent (SGD), for classifying air quality from environmental parameters, including particulate matter ≤ 2.5 μm (PM2.5), carbon monoxide (CO), temperature, humidity, nitrogen dioxide (NO2), and sulfur dioxide (SO2). The methodology employs supervised learning: each model is trained and then evaluated using classification accuracy, area under the receiver operating characteristic curve (AUC), F1-Score, precision, recall, and Matthews Correlation Coefficient (MCC), supported by ROC curve and confusion matrix analyses. The results show that the Decision Tree algorithm achieves the best overall performance, attaining a classification accuracy of 93.8% with balanced precision, recall, and F1-Score, indicating strong and consistent predictive capability. The kNN and Naive Bayes models record the highest AUC values (0.980 and 0.982, respectively), demonstrating excellent class separability, although their accuracy and F1-Score are lower than those of the Decision Tree. In addition, the SGD model, implemented with a modified Huber loss function and L2 regularization, provides interpretable feature-weight analysis, identifying PM2.5 and CO as dominant indicators of the Hazardous air quality class, while temperature and humidity significantly influence the Fair and Good classes. Based on the comprehensive evaluation, the Decision Tree algorithm is recommended as the most reliable model for accurate air quality classification, whereas the SGD model is particularly suitable for feature contribution analysis to enhance interpretability. These findings offer practical insights for selecting appropriate machine learning models in air quality monitoring and decision-support systems.
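The evaluation metrics named in this abstract can all be derived from a single confusion matrix. The following is a minimal sketch of that calculation, not the authors' code: the function name `summarize_confusion_matrix` and the three-class counts are hypothetical, and the MCC uses the standard multiclass generalization, which the abstract does not specify.

```python
from math import sqrt


def summarize_confusion_matrix(cm):
    """Return accuracy, macro precision/recall/F1, and multiclass MCC.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_counts = [sum(cm[i]) for i in range(k)]                       # row sums
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums

    precisions, recalls, f1s = [], [], []
    for i in range(k):
        p = cm[i][i] / pred_counts[i] if pred_counts[i] else 0.0
        r = cm[i][i] / true_counts[i] if true_counts[i] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Multiclass MCC computed from the trace, row sums, and column sums.
    numerator = correct * total - sum(p * t for p, t in zip(pred_counts, true_counts))
    denominator = sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    mcc = numerator / denominator if denominator else 0.0

    return {
        "accuracy": correct / total if total else 0.0,
        "precision_macro": sum(precisions) / k,
        "recall_macro": sum(recalls) / k,
        "f1_macro": sum(f1s) / k,
        "mcc": mcc,
    }


# Hypothetical counts for three air quality classes (illustration only,
# not data from the study).
example = [
    [45, 4, 1],
    [3, 44, 3],
    [0, 5, 45],
]
for name, value in summarize_confusion_matrix(example).items():
    print(f"{name}: {value:.3f}")
```

AUC is left out because it needs per-sample class scores rather than counts. That also explains how a model can lead on AUC while trailing on accuracy, as reported for kNN and Naive Bayes.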
Optimizing Decision Tree Performance with Recursive Feature Elimination for High-Dimensional Mushroom Classification
Lili Tanti; Safrizal; Yan Yang Thanri
JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer) Vol. 11 No. 2 (2025): JITK Issue November 2025
Publisher : LPPM Nusa Mandiri

DOI: 10.33480/jitk.v11i2.6816

Abstract

Classifying mushroom species presents a significant challenge within biological data analysis because of the wide variety of species and their distinct attributes. This research investigates the effectiveness of the Decision Tree classifier for mushroom categorization by comparing two splitting criteria, the Gini Index and Entropy. Additionally, the study employs the Recursive Feature Elimination (RFE) method for dimensionality reduction to enhance model efficiency and performance. The dataset was collected, cleaned, and explored before feature selection was conducted using RFE. The Decision Tree model was trained and evaluated using accuracy, precision, recall, and F1-score metrics. The results showed that applying RFE improved computational efficiency without compromising model accuracy. The Gini criterion provided more stable results across all metrics, while Entropy demonstrated higher precision in certain cases. Model optimization through parameter tuning produced the best parameter combination at max_depth = 5, min_samples_leaf = 5, and min_samples_split = 10. This study concludes that integrating RFE with the Decision Tree can significantly enhance the performance of high-dimensional dataset classification. The findings are expected to serve as a reference for developing efficient and accurate biological data classification models.
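The two splitting criteria compared in this abstract differ only in how they score the class mix at a node. The sketch below uses the textbook definitions of the Gini index and entropy; the helper names and the two-class counts are hypothetical illustrations, not taken from the study or its dataset.

```python
from math import log2


def gini(counts):
    """Gini impurity of a node, given its per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy (in bits) of a node, given its per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Empty classes contribute nothing, so skip them to avoid log2(0).
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def impurity_decrease(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical two-class node split by a single candidate feature.
parent = [40, 60]
children = [[35, 10], [5, 50]]
print("Gini decrease:   ", round(impurity_decrease(parent, children, gini), 4))
print("Entropy decrease:", round(impurity_decrease(parent, children, entropy), 4))
```

Both criteria are zero for a pure node and largest for an even class mix, so they often rank candidate splits similarly. Where they differ, the chosen features and thresholds can change, which is the kind of variation a Gini versus Entropy comparison is looking for.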