Football is one of the world’s most widely followed sports, making it an appealing subject for predictive analytics. This study builds a predictive model for international football match outcomes, using the CRISP-DM methodology as the analytical framework. The dataset, international_matches.csv, covers the period 1993–2022 and underwent a series of preprocessing steps: data cleaning, feature engineering, encoding, imputation, and scaling. Three machine learning algorithms were evaluated: Logistic Regression, Random Forest, and HistGradientBoostingClassifier (HistGBM). The best-performing model was the optimized HistGBM, which was strongest at identifying home-team victories, achieving a recall of 78% for that outcome. This high sensitivity suggests that comparative features, such as rank difference and squad strength disparity across goalkeeper, defense, midfield, and attack attributes, play a crucial role in predicting dominant match outcomes. The trained model was subsequently deployed in an interactive Streamlit-based web application that lets users enter match-related information and obtain real-time predictions. Overall, the study shows that machine learning methods can effectively support data-driven analysis of international football match outcomes.
Copyrights © 2026