Thyroid disease is a disorder that requires prompt and accurate diagnosis. This research designs a classification model based on the Random Forest algorithm to detect the type of thyroid disease using data from the UCI Repository. In the data processing stage, KNNImputer handles missing data by replacing each missing value with the average of that feature over the nearest neighbors, selected by Euclidean distance, which improves data quality for model training. The model was evaluated using the confusion matrix and reached an accuracy of 98%, with weighted-average precision, recall, and F1 score also at 98%. These results indicate that the proposed model is highly reliable in detecting various types of thyroid disease, such as Negative, Hypothyroid, and Hyperthyroid. This research contributes to the application of data mining technology for medical diagnosis, indicating that careful data processing through methods such as KNNImputer can significantly improve model performance.
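The pipeline described above can be illustrated with a minimal scikit-learn sketch. It is not the study's code: the data are a synthetic three-class stand-in rather than the UCI thyroid data, and the feature count, missing-value rate, `n_neighbors=5`, `n_estimators=100`, and the 75/25 split are all assumptions chosen for illustration. `KNNImputer` with its default uniform weights fills each missing entry with the mean of that feature over the nearest neighbors, using a Euclidean distance that ignores missing coordinates.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the thyroid data (assumption, NOT the UCI dataset):
# three Gaussian clusters, one per class, with four numeric features.
rng = np.random.default_rng(0)
labels = ["negative", "hypothyroid", "hyperthyroid"]
centers = np.array([[0, 0, 0, 0], [3, 3, 0, 0], [0, 0, 3, 3]], dtype=float)
n_per_class = 200
X = np.vstack([rng.normal(c, 1.0, size=(n_per_class, 4)) for c in centers])
y = np.repeat(labels, n_per_class)

# Blank out one randomly chosen feature in about a third of the rows.
rows = rng.choice(len(X), size=len(X) // 3, replace=False)
cols = rng.integers(0, X.shape[1], size=rows.size)
X[rows, cols] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Imputation sits inside the pipeline, so the imputer is fitted on the
# training split only and then applied unchanged to the test split.
model = make_pipeline(
    KNNImputer(n_neighbors=5),  # hypothetical k; uniform weights = neighbor mean
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Confusion matrix plus the weighted-average metrics named in the abstract.
cm = confusion_matrix(y_test, y_pred, labels=labels)
accuracy = np.trace(cm) / cm.sum()
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted"
)

print(cm)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Fitting the imputer inside the pipeline is a design choice of this sketch, made so that no test-set information leaks into the imputed training values; the scores it prints describe the synthetic data only and are not comparable to the 98% reported here.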
Copyrights © 2025