Background: Lung disease is a leading cause of death globally, with more than 4 million cases each year, including 500,000 new cases in Indonesia, most of which are detected at an advanced stage.
Objective: This study compares the performance of three decision-tree-based algorithms, XGBoost, C4.5, and Random Forest, in detecting lung disease and identifies the best method based on evaluation metrics.
Methods: A total of 30,000 data samples from Kaggle were cleaned using the interquartile range (IQR) method, categorical attributes were encoded, and the data were split into 80% for training and 20% for testing. The classification models were XGBoost, C4.5, and Random Forest. Model performance was evaluated using a confusion matrix, accuracy, precision, recall, and F1-score.
Results: C4.5 performed best, with an accuracy of 94.33% and zero false negatives. XGBoost followed with an accuracy of 93.18%, while Random Forest had the lowest accuracy (90.07%).
Conclusion: These findings indicate that C4.5 has strong potential for use in an accurate early detection system, helping to reduce the risk of misdiagnosis, particularly false negatives, and supporting clinical decision making in health facilities.
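The abstract does not state the tooling used, so the following is only a minimal sketch of two steps named in the Methods: IQR-based cleaning and the metrics derived from a binary confusion matrix. The function names `iqr_filter` and `binary_metrics`, the conventional 1.5 multiplier, and the inclusive quartile method are assumptions made for illustration, not details taken from the study.

```python
from statistics import quantiles


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the common convention; the study's multiplier is not stated.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and the four metrics for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


if __name__ == "__main__":
    # Toy values only; these are not the study's data.
    print(iqr_filter([1, 2, 3, 4, 5, 100]))
    print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                         [1, 1, 0, 0, 0, 1, 1, 0]))
```

Because recall is TP / (TP + FN), a model with zero false negatives, as reported for C4.5, has a recall of 1.0 on the positive class whenever the test set contains at least one positive case.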
Copyrights © 2025