The Spaceship Titanic dataset is synthetic but challenging: it mixes numerical and categorical features and contains missing values. This study evaluates three machine learning scenarios for classifying each passenger as "Transported" or "Not Transported". The scenarios are linear-like models, a combination of the Top 5 Diverse models, and tree-based/ensemble models, each built as a voting classifier. A voting classifier is used because it can combine the strengths of multiple algorithms, which can reduce bias and variance and thereby improve prediction accuracy and stability. It aggregates the predictions of several base classifiers using one of two strategies: hard voting, which selects the majority class, and soft voting, which averages the predicted class probabilities across models and selects the class with the highest average. The dataset was obtained from Kaggle and processed in four stages: data preprocessing, data splitting, model training, and evaluation. The tree-based/ensemble scenario achieved the highest accuracy at 90.38%, followed by the Top 5 Diverse combination at 87.31% and the linear-like scenario at 76.51%. Confusion matrices, ROC curves, and feature importance analysis further support the conclusion that tree-based ensemble models are better at capturing complex classification patterns. These findings suggest that, of the three scenarios, tree-based ensemble models are the most effective approach for classification on a dataset like Spaceship Titanic.
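The difference between the two voting strategies can be illustrated with a minimal sketch in plain Python. This is not the study's implementation: the function names `hard_vote` and `soft_vote` and the three probability vectors are hypothetical, chosen so that the two strategies disagree for a single passenger.

```python
from collections import Counter

def hard_vote(labels):
    """Return the class predicted by the most base classifiers."""
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probas, classes):
    """Average per-class probabilities across models, then pick the largest."""
    n_models = len(probas)
    avg = [sum(p[i] for p in probas) / n_models for i in range(len(classes))]
    best = max(range(len(classes)), key=lambda i: avg[i])
    return classes[best], avg

classes = ["Not Transported", "Transported"]

# Hypothetical predicted probabilities from three base classifiers
# for one passenger, ordered as in `classes`.
probas = [
    [0.55, 0.45],  # model A leans slightly towards "Not Transported"
    [0.60, 0.40],  # model B leans slightly towards "Not Transported"
    [0.10, 0.90],  # model C is confident in "Transported"
]

# Each model's hard label is its most probable class.
labels = [classes[max(range(len(classes)), key=lambda i: p[i])] for p in probas]

hard = hard_vote(labels)
soft, avg = soft_vote(probas, classes)

print("hard voting:", hard)
print("soft voting:", soft, [round(a, 3) for a in avg])
```

In this made-up case two weakly confident models outvote one confident model under hard voting, while soft voting lets the confident model's probability dominate the average. Which behaviour is preferable depends on how well calibrated the base classifiers' probabilities are.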
Copyright © 2025