Data infrastructure has become a strategic backbone of contemporary education because digital learning environments continuously generate student traces that can be transformed into actionable evidence for teaching, advising, and institutional planning. Yet the practical value of educational data depends on much more than storage capacity. Institutions must integrate heterogeneous sources, manage raw and curated data simultaneously, enforce privacy constraints, and deliver analytics outputs that are operationally useful and ethically defensible. This study develops a layered educational data infrastructure architecture that connects raw learning data, extract-transform-load processes, governance mechanisms, curated analytics repositories, and machine-learning services. This paper includes a reproducible empirical evaluation using the real xAPI-Edu-Data benchmark collected from the Kalboard 360 learning management environment. Three machine-learning models are compared under a common preprocessing pipeline, and an ablation analysis quantifies the incremental value of integrated behavioral, parental, and contextual features. The best-performing model achieves a test macro-F1 of 0.797 and a macro one-vs-rest ROC-AUC of 0.919, while the ablation study shows that the full integrated feature set clearly outperforms demographic-only and behavior-only alternatives. The paper contributes structured architecture, mathematical formalization of integrated learning analytics, and empirical evidence that richer, better-governed data pipelines produce more useful predictive signals for educational decision support.
Copyrights © 2025