This research compares three machine learning algorithms, Random Forest (RF), Decision Tree (DT), and K-Nearest Neighbors (KNN), for classifying illnesses influenced by climate, patient history, and clinical indicators. The dataset, obtained from Kaggle, contains 5,200 records that combine meteorological and symptom data. Two pre-processing scenarios were tested to examine their impact on model performance: (1) Min-Max normalization, and (2) Min-Max normalization followed by class balancing with the Synthetic Minority Over-sampling Technique (SMOTE). The results show that normalization substantially improves KNN's performance, raising its accuracy from 0.324 on raw data to 0.968. In the first scenario, Random Forest achieved the highest accuracy at 0.985, followed by Decision Tree at 0.974 and KNN at 0.968. After SMOTE was applied, Random Forest's accuracy remained at 0.985, while Decision Tree and KNN both decreased slightly to 0.964. These findings indicate that Random Forest is the most robust and consistent of the three algorithms for this classification task. The study also shows that SMOTE does not always improve accuracy and should be applied selectively. Information gain analysis identifies symptom features as the strongest predictors. Overall, this research provides guidance on selecting an algorithm and pre-processing strategy for building effective weather-related disease classification systems.

Keywords: Classification of Diseases, Decision Tree, K-Nearest Neighbors, Random Forest, SMOTE
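As an illustration of the two pre-processing steps named in the abstract, the following is a minimal sketch in plain Python, not the study's actual implementation (which the abstract does not describe). The feature names, the toy values, and the helper functions `min_max_scale` and `smote_like_oversample` are all hypothetical. The first function applies the Min-Max formula x' = (x - min) / (max - min) per column; the second imitates the core idea of SMOTE by interpolating between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random


def min_max_scale(rows):
    """Rescale each column to [0, 1] using x' = (x - min) / (max - min)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return scaled


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Hypothetical toy records: [temperature_celsius, humidity_percent, fever_flag]
rows = [
    [18.0, 40.0, 0.0],
    [25.0, 65.0, 1.0],
    [32.0, 90.0, 1.0],
    [21.0, 55.0, 0.0],
]
scaled = min_max_scale(rows)

# Assumption for illustration only: treat the first three rows as the minority class.
minority = scaled[:3]
new_points = smote_like_oversample(minority, n_new=2)
```

Scaling matters most for distance-based models such as KNN, because an unscaled feature with a large numeric range would otherwise dominate the distance computation. In practice, a library implementation would normally be used instead of these helpers, for example `MinMaxScaler` from scikit-learn and `SMOTE` from imbalanced-learn, and oversampling is usually applied to the training split only so that synthetic samples do not leak into the evaluation data.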
Copyrights © 2026