Early detection of thyroid cancer recurrence is a crucial factor in patient survival and treatment effectiveness. Missed or incorrect detection increases disease severity, treatment cost, and recovery time, and lowers service quality. The main challenges in developing a Machine Learning (ML)-based decision support system for detection are class imbalance in medical data and high feature dimensionality, both of which can reduce model accuracy and efficiency. This study proposes an approach that combines feature selection with class imbalance handling to improve the early detection of thyroid cancer recurrence. Four feature selection techniques, Information Gain (IG), Gain Ratio (GR), Gini Decrease (GD), and Chi-Square (CS), are used to select features according to their weighted ranking. To correct the imbalanced class distribution, we use the Synthetic Minority Over-Sampling Technique (SMOTE). Seven ML classification models, k-NN, Tree, SVM, Naive Bayes, AdaBoost, Neural Network (NN), and Logistic Regression (LR), are tested and evaluated using confusion-matrix metrics (accuracy, precision, and recall) together with computation time and log loss. Experimental results show that handling class imbalance significantly improves the prediction performance of the ML algorithms. In addition, Chi-Square feature selection combined with the Neural Network classifier (CS+NN) consistently showed the best performance. This study emphasizes the importance of data pre-processing and proper algorithm selection in the development of an ML-based thyroid cancer detection system.
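The two pre-processing steps named above, ranking features by a Chi-Square score and oversampling the minority class with SMOTE, can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names (`chi_square_score`, `rank_features`, `smote`), the parameters `k` and `seed`, and the choice of the Pearson contingency-table statistic are assumptions made for illustration, and the sketch assumes categorical features for ranking and numeric feature vectors for oversampling.

```python
import math
import random
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # contingency-table cell counts
    feature_totals = Counter(feature)
    class_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in class_totals.items():
            expected = f_count * c_count / n  # count expected under independence
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def rank_features(columns, labels):
    """Return feature names sorted from highest to lowest chi-square score."""
    scores = {name: chi_square_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

In a pipeline of this kind, oversampling would normally be applied to the training split only, so that synthetic samples do not leak into the data used for evaluation.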
Copyrights © 2025