This research focuses on predicting the life expectancy of lung cancer patients after thoracic surgery, using the C4.5 decision tree classification algorithm combined with adaptive synthetic sampling (ADASYN) to handle data imbalance. Data imbalance in the lung cancer patient dataset is a major obstacle to obtaining accurate predictions, especially when identifying minority classes. Applying ADASYN makes the class distribution more even, which improves the performance of the C4.5 model. The results show that combining these methods increased prediction accuracy from 67% to 87%. In addition, precision, recall, and F1-score for the minority classes, which the model previously had difficulty identifying, improved significantly. Combining the C4.5 algorithm with ADASYN thus proved effective in addressing data imbalance and produced better predictions for lung cancer cases. This study is expected to contribute to the field of medical classification and to serve as a reference for further research on similar cases.
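To illustrate the oversampling idea behind ADASYN, the following is a minimal sketch in plain Python and not the implementation used in this study. It generates more synthetic minority samples around minority points whose neighborhoods are dominated by the majority class. The function name `adasyn`, its parameters (`k`, `beta`, `seed`), and the toy data are assumptions made for illustration only; the study's dataset and settings are not reproduced here.

```python
import math
import random


def adasyn(X, y, minority, k=3, beta=1.0, seed=0):
    """Return synthetic minority samples generated in the style of ADASYN.

    Hypothetical helper for illustration: X is a list of numeric feature
    lists, y the matching labels, and `minority` the label to oversample.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(X) - n_min
    # Number of synthetic samples needed to reach the desired balance level.
    total = round((n_maj - n_min) * beta)
    if total <= 0 or n_min < 2:
        return []

    def k_nearest(i, candidates):
        # Indices of the (at most) k points closest to X[i], excluding i itself.
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # Hardness of each minority point: share of majority points among
    # its k nearest neighbors in the whole dataset.
    ratios = []
    for i in min_idx:
        neighbors = k_nearest(i, range(len(X)))
        ratios.append(sum(y[j] != minority for j in neighbors) / k)

    ratio_sum = sum(ratios)
    if ratio_sum == 0:
        # No minority point borders the majority class: spread evenly.
        weights = [1.0 / n_min] * n_min
    else:
        weights = [r / ratio_sum for r in ratios]

    synthetic = []
    for i, w in zip(min_idx, weights):
        minority_neighbors = k_nearest(i, min_idx)
        # Rounding per point means the final count can differ slightly from `total`.
        for _ in range(round(w * total)):
            j = rng.choice(minority_neighbors)
            lam = rng.random()
            # Interpolate between the point and a random minority neighbor.
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data (made up): 8 majority points (label 0) and 3 minority points (label 1).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8], [0.8, 0.2],
     [0.3, 0.3], [0.9, 0.9], [1.2, 1.1], [1.1, 1.3]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
synthetic = adasyn(X, y, minority=1, k=3, seed=0)
print(len(synthetic), "synthetic minority samples generated")
```

In a pipeline like the one described above, the synthetic samples would be appended to the training split only, so that the test data still reflects the original class distribution, before fitting the decision tree.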
Copyrights © 2025