International Journal of Information Technology and Computer Science Applications (IJITCSA)
Vol. 4 No. 1 (2025): January - April 2026

A Lakehouse-Oriented Big Data Infrastructure for Educational Analytics: Integrating Administrative and Assessment Data for Early Student Risk Prediction

Bhairav Kaphle (Madan Bhandari University of Science and Technology (MBUST))
Biswajit Shrestha (Madan Bhandari University of Science and Technology (MBUST))



Article Info

Publish Date
13 Apr 2026

Abstract

Educational institutions increasingly depend on heterogeneous digital systems, yet many analytics initiatives remain fragmented across student information, registration, assessment, and learning platforms. This paper proposes a lakehouse-oriented big data infrastructure for educational analytics and validates it through a reproducible early-risk prediction study using the Open University Learning Analytics Dataset (OULAD). The study integrates five public OULAD tables (student information, course registration, assessment metadata, student assessment submissions, and course presentation metadata) into temporally valid feature tables aligned to the student–module–presentation level. We define a windowed feature engineering framework that constructs actionable indicators such as submission rate, weighted completion score, average submission lag, and assessment coverage gap at 30%, 50%, 70%, and 100% of the course timeline. Two supervised classifiers, logistic regression and random forest, are evaluated under a stratified 80/20 protocol. The results show that administrative data alone provide weak discrimination (AUC 0.673), whereas integrating mid-course assessment evidence substantially improves performance. At the 50% course window, the random-forest model achieves an AUC of 0.947, an F1 of 0.879, and a recall of 0.829; even at the 30% window, it already reaches an AUC of 0.904. These findings demonstrate that the value of educational prediction depends not only on model choice but also on data integration architecture. The paper contributes (i) a lakehouse-oriented reference architecture for higher-education analytics, (ii) a temporally constrained feature engineering strategy for early-warning systems, and (iii) an empirical ablation showing that multi-source integration yields large and operationally meaningful gains.
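
The windowed, temporally constrained feature construction described in the abstract can be illustrated with a minimal sketch. The abstract names the indicators but does not give their formulas, so the definitions below (in particular treating the coverage gap as the count of due but unsubmitted assessments, and scoring missing work as zero) are assumptions, as are the record types `Assessment` and `Submission` and the function `window_features`. The only constraint taken from the text is that a feature at a given window uses evidence available up to that fraction of the course timeline.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Assessment:
    id_assessment: int
    due_day: int      # day offset from the start of the presentation
    weight: float     # contribution to the final mark


@dataclass(frozen=True)
class Submission:
    id_assessment: int
    day_submitted: int
    score: float      # 0..100


def window_features(assessments, submissions, course_length_days, fraction):
    """Features for one student-module-presentation at a course window.

    Only assessments due by the cutoff and submissions made by the cutoff
    are used, so no information from after the window leaks in.
    The formulas are illustrative assumptions, not the paper's definitions.
    """
    cutoff = fraction * course_length_days
    due = [a for a in assessments if a.due_day <= cutoff]
    seen = {s.id_assessment: s for s in submissions if s.day_submitted <= cutoff}
    done = [(a, seen[a.id_assessment]) for a in due if a.id_assessment in seen]

    n_due = len(due)
    total_weight = sum(a.weight for a in due)
    return {
        # share of due assessments that were submitted in time for the window
        "submission_rate": len(done) / n_due if n_due else 0.0,
        # weight-averaged score, counting unsubmitted work as zero
        "weighted_completion_score": (
            sum(a.weight * s.score for a, s in done) / total_weight
            if total_weight else 0.0
        ),
        # mean days between due date and submission (negative means early)
        "avg_submission_lag": (
            mean(s.day_submitted - a.due_day for a, s in done) if done else 0.0
        ),
        # number of due assessments with no submission inside the window
        "coverage_gap": n_due - len(done),
    }


# Hypothetical student: three assessments in a 200-day presentation.
assessments = [Assessment(1, 20, 10.0), Assessment(2, 60, 20.0), Assessment(3, 120, 30.0)]
submissions = [Submission(1, 18, 80.0), Submission(2, 65, 50.0)]

features_by_window = {
    f: window_features(assessments, submissions, 200, f)
    for f in (0.3, 0.5, 0.7, 1.0)
}
```

In this hypothetical example, the second assessment is submitted on day 65, after the 30% cutoff (day 60), so it counts as a gap at the 30% window and only becomes visible at the 50% window. That is the kind of temporal validity an early-warning feature table needs to preserve.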

Copyrights © 2025






Journal Info

Abbrev

jitcsa

Publisher

Subject

Computer Science & IT; Decision Sciences, Operations Research & Management; Education; Engineering

Description

The Journal of Information Technology and Computer Science Applications (JITCSA) is an information technology and computer science publication. Applications from both fields for solving real cases are also welcome. JITCSA accepts research articles, systematic reviews, literature studies, and other ...