This study applies the Extreme Gradient Boosting (XGBoost) algorithm to predict the geological time period of trilobite fossils from geological and geospatial data. It addresses three challenges: the high complexity of paleontological data, missing values, and class imbalance in the target variable time_period, each of which can degrade predictive performance. The objective is to develop an accurate and robust fossil age prediction model through systematic data preprocessing, feature selection, and model optimization. The dataset was obtained from Kaggle, and the main features are longitude, latitude, lithology, environment, and collection_type. The workflow comprises data cleaning, missing value imputation, categorical feature encoding, a stratified train–test split, and class imbalance handling through class weight adjustment. The XGBoost model was trained on the training set and then tuned with RandomizedSearchCV to select the best-performing hyperparameter configuration among those sampled. On the test set, the tuned XGBoost model achieved an accuracy of 95%, precision of 90%, recall of 93%, and an F1-score of 91%, outperforming the model without hyperparameter tuning. These results indicate that combining geological and geospatial feature selection with hyperparameter tuning in XGBoost improves the prediction of trilobite fossil time periods. The approach is expected to serve as computational support in paleontology, helping determine fossil periods in a more objective, efficient, and data-driven manner.
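The stratified split and class weight adjustment named in the workflow can be illustrated with a minimal standard-library sketch. It is not the study's implementation: the abstract does not state which weighting formula was used, so the common "balanced" scheme (n_samples / (n_classes * class_count)) is assumed here, and the function names, the 80/20 split ratio, and the toy labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # Send at least one sample per class to the test side, unless the class has only one.
        n_test = max(1, round(len(idx) * test_fraction)) if len(idx) > 1 else 0
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def balanced_class_weights(labels):
    """Assumed 'balanced' scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Hypothetical toy labels standing in for time_period (not the study's data).
periods = ["Ordovician"] * 60 + ["Cambrian"] * 30 + ["Devonian"] * 10
train_idx, test_idx = stratified_split(periods, test_fraction=0.2, seed=42)

# Weights are computed on the training labels only, then expanded to one weight per sample.
weights = balanced_class_weights([periods[i] for i in train_idx])
sample_weight = [weights[periods[i]] for i in train_idx]
```

In a pipeline like the one described, per-sample weights of this kind would typically be passed to the classifier's `fit` call through its `sample_weight` argument, so that errors on rare periods cost more during training.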
Copyrights © 2026