At least 17 million Indonesians are estimated to suffer from thyroid disorders, and nearly 60% of those living with a thyroid disorder are never diagnosed. Research on methods for predicting thyroid disease is therefore needed. An accurate prediction model depends on a well-performing classifier, and a classifier in turn gives its best results when the data it is trained on are balanced. Data imbalance is a condition in which the classes in a dataset are unevenly represented, which can bias the resulting model toward the majority class. The main objective of this research is to improve the accuracy of early detection of thyroid disease by addressing data imbalance and applying appropriate classification algorithms. The study began with the collection and analysis of a dataset of 9172 records. Preprocessing then yielded 5321 training records and 1331 test records. In the testing phase, seven classification algorithms were combined with seven resampling methods and evaluated using a confusion matrix. The highest accuracy, 98%, was obtained by combining the Random Forest algorithm with the Random Over Sampling method. It can be concluded that combining the Random Forest algorithm with Random Over Sampling can improve the accuracy of early detection of thyroid disease.
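As an illustration of the two ideas the abstract relies on, the following is a minimal sketch of random oversampling and of accuracy computed from a confusion matrix. It uses only the Python standard library and is not the study's implementation; the function names `random_oversample` and `accuracy`, the toy records, and the example confusion matrix are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sampling with replacement; k is 0 for the largest class.
        extra = rng.choices(pool, k=target - n)
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def accuracy(confusion):
    """Accuracy = correctly classified cases (diagonal) / all cases."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Toy, made-up records: five "negative" cases and one "positive" case.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
labels = ["negative"] * 5 + ["positive"]

balanced_rows, balanced_labels = random_oversample(rows, labels, seed=1)
print(Counter(balanced_labels))

# Made-up 2x2 confusion matrix: rows are true classes, columns predictions.
print(accuracy([[50, 2], [3, 45]]))
```

In this sketch the balanced data would then be passed to a classifier such as Random Forest. Resampling is commonly applied to the training split only, so that the test split keeps the original class ratio; whether the study did so is not stated here.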
Copyrights © 2024