Tree seedling survival is a critical factor in forest regeneration and sustainable ecosystem management. Predicting it remains challenging, however, because of complex interactions among environmental conditions, soil biotic factors, and plant functional traits. This study compares the performance of two gradient boosting algorithms, CatBoost and Light Gradient Boosting Machine (LightGBM), in predicting tree seedling survival. The data come from the Tree Survival Prediction dataset on Kaggle and include environmental variables, soil interaction factors, and functional traits. The target variable is binary, indicating whether a seedling survives. Preprocessing involved handling missing values, encoding categorical variables, and normalization; the models were then validated using 10-fold cross-validation. Model performance was evaluated using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC-AUC). LightGBM outperforms CatBoost, achieving an accuracy of 0.8456, precision of 0.8718, recall of 0.8553, F1-score of 0.8635, and ROC-AUC of 0.9282, compared with an accuracy of 0.8223 and ROC-AUC of 0.9132 for CatBoost. Feature importance analysis indicates that arbuscular mycorrhizal fungi, phenolics, and lignin are the most influential predictors of seedling survival. These findings suggest that LightGBM is a reliable and efficient model for smart forestry applications, supporting data-driven decision-making in reforestation. The model can also be used to simulate planting scenarios, which may improve resource efficiency and restoration success rates.

Keywords: CatBoost, LightGBM, Machine Learning, Seedling Survival, Smart Forestry
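For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how they are defined for a binary target (1 = survived, 0 = died). It is not the study's code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and ROC-AUC is computed here by direct pairwise comparison rather than by any library routine.

```python
def f1_from(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = survived)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration, not taken from the Kaggle dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.35, 0.7, 0.4, 0.2, 0.6, 0.55, 0.1, 0.95]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics["accuracy"], auc)  # 0.8 0.875

# Consistency check on the reported LightGBM precision and recall.
print(round(f1_from(0.8718, 0.8553), 4))  # 0.8635
```

The last line shows that the reported LightGBM F1-score of 0.8635 agrees with the harmonic mean of the reported precision (0.8718) and recall (0.8553).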
Copyright © 2026