Purpose – The rapid integration of artificial intelligence (AI) in education has raised concerns about excessive student dependence, which can undermine critical thinking and learning autonomy. This study aims to identify the most effective machine learning algorithm for detecting AI dependency in learning activities and to examine the impact of the training–testing data proportion on predictive performance.

Methods – This study employs the CRISP-DM framework and applies two supervised classification algorithms, Random Forest and Support Vector Machine (SVM), to a synthetic dataset of 10,000 AI-assisted learning sessions. The target variable, perceived AI assistance level, was discretised into three categories (low, medium, and high). Model performance was evaluated under four dataset split scenarios (60:40, 70:30, 80:20, and 90:10) using accuracy, AUC, precision, recall, and F1-score.

Findings – The results show that Random Forest consistently outperforms SVM across all dataset proportions and evaluation metrics. The highest performance was achieved by Random Forest with a 60:40 split, yielding an accuracy of 67.6% and an AUC of 80.8%. Although SVM demonstrated stable performance, it required larger training datasets and remained inferior to Random Forest.

Research limitations – The use of synthetic data and a limited set of behavioural features restricts the generalisability of the findings. The moderate accuracy indicates that AI dependency is a complex construct not fully captured by the current model.

Originality – This study provides empirical evidence on the combined influence of algorithm selection and dataset proportion in detecting AI dependency, offering practical guidance for developing early-warning systems to support responsible AI use in education.
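The evaluation protocol described above can be sketched as follows. This is a minimal illustration, not the authors' code: the paper's feature set and synthetic-data generator are not specified, so a stand-in dataset from scikit-learn's `make_classification` is used (smaller than the paper's 10,000 sessions to keep the sketch fast), with default Random Forest and SVM hyperparameters.

```python
# Hedged sketch of the split-comparison protocol: two classifiers evaluated
# across four train:test proportions. Features and data generation here are
# illustrative stand-ins, not the study's actual synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score

# Stand-in dataset with a three-class target (low / medium / high assistance).
X, y = make_classification(n_samples=2_000, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

models = {"Random Forest": RandomForestClassifier(random_state=0),
          "SVM": SVC(probability=True, random_state=0)}

for test_size in (0.4, 0.3, 0.2, 0.1):  # 60:40, 70:30, 80:20, 90:10
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=0)
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        # Multi-class AUC via one-vs-rest on predicted probabilities.
        auc = roc_auc_score(y_te, model.predict_proba(X_te), multi_class="ovr")
        print(f"{int((1 - test_size) * 100)}:{int(test_size * 100)} {name}: "
              f"acc={accuracy_score(y_te, pred):.3f} auc={auc:.3f} "
              f"f1={f1_score(y_te, pred, average='macro'):.3f}")
```

Stratified splitting keeps the three class proportions comparable across the four scenarios, so differences between splits reflect training-set size rather than class imbalance.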
Copyright © 2026