Acute Respiratory Infection (ARI) is among the most common respiratory diseases, and its diverse, overlapping clinical symptoms make initial identification difficult, which motivates a systematic, data-driven classification approach. This study compares the performance of the Decision Tree and K-Nearest Neighbor (KNN) algorithms in classifying ARI-related disease categories. The novelty of this research lies in constructing ARI labels in five distinct categories from the Pediatric Respiratory Infections dataset, combined with a feature selection process that handles mixed data types and the use of weighted evaluation metrics to account for class imbalance. The dataset consists of 801 patient records with 91 initial attributes. The classification label was constructed from the Main diagnostic column and grouped into five categories: Asthma/Bronchospasm/Wheezing, Pneumonia/Pneumopathy, Bronchiolitis, Laryngeal/Upper Respiratory, and Other. After feature selection to remove noise and redundancy, 54 features were retained: 24 numerical and 30 categorical. The research stages were preprocessing, label construction, missing value handling, categorical encoding, feature normalization for KNN, an 80:20 train-test split, and model evaluation. Decision Tree achieved the higher performance, with 67.08% accuracy and a 69.12% weighted F1-score, while KNN achieved 65.84% accuracy and a 64.18% weighted F1-score. On this dataset, Decision Tree therefore outperforms KNN and offers the additional advantage of interpretability.
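As an illustration of the weighted evaluation metric mentioned above, the following is a minimal sketch of a support-weighted F1-score next to plain accuracy. It assumes the standard definition, in which each class's F1 is weighted by its share of the true labels; the abstract does not state which implementation the study used. The function names and the ten toy labels are hypothetical and are not drawn from the study's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)  # number of true records per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # larger classes contribute more
    return score


# Toy, imbalanced labels for illustration only (NOT the study's data).
y_true = ["Bronchiolitis"] * 6 + ["Pneumonia/Pneumopathy"] * 3 + ["Other"]
y_pred = (
    ["Bronchiolitis"] * 5 + ["Pneumonia/Pneumopathy"]        # 5 of 6 correct
    + ["Pneumonia/Pneumopathy"] * 2 + ["Bronchiolitis"]      # 2 of 3 correct
    + ["Bronchiolitis"]                                      # minority class missed
)

print(f"accuracy    = {accuracy(y_true, y_pred):.4f}")
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.4f}")
```

Because the weighted F1-score combines per-class precision and recall before averaging, it need not equal accuracy and can fall on either side of it, which is consistent with the two metrics ranking close but not identical in the reported results.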
Copyright © 2026