Computer vision-based Human Activity Recognition (HAR) systems hold significant potential for applications in educational settings, particularly for monitoring student activities in laboratories or classrooms. Activities such as typing, smartphone usage, and resting are often visually indistinguishable because they share highly similar seated postures. This study proposes a spatiotemporal modeling approach to recognize such activities automatically and non-invasively. Body poses are extracted from video streams using MediaPipe Pose and represented as sequential feature vectors, which are then analyzed by a Long Short-Term Memory (LSTM) network to capture temporal dynamics. The model is trained on video data of students performing three primary activity classes. Evaluation on validation data yields a classification accuracy of 98.48%, with average precision, recall, and F1-score values of approximately 98%. However, testing on unseen videos shows a drop in accuracy to around 65%, primarily due to misclassification in segments with minimal movement. These findings suggest that the model struggles to discriminate subtle pose transitions, which are common in seated activities. Overall, the proposed approach demonstrates promising potential for automated student activity monitoring and provides a foundation for developing pose-based behavioral analysis systems in contextual learning environments.
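The pipeline described above converts per-frame pose landmarks into fixed-length sequences before they reach the LSTM. A minimal sketch of that preprocessing step is shown below; MediaPipe Pose emits 33 body landmarks per frame, each with x, y, z, and visibility values (132 features per frame), while the window length of 30 frames and the non-overlapping windowing scheme are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

NUM_LANDMARKS = 33   # MediaPipe Pose outputs 33 body landmarks per frame
FEATS_PER_LM = 4     # x, y, z, visibility for each landmark
SEQ_LEN = 30         # hypothetical window length (frames per sequence)

def frame_to_vector(landmarks):
    """Flatten one frame's (33, 4) landmark array into a 132-dim vector."""
    return np.asarray(landmarks, dtype=np.float32).reshape(-1)

def make_sequences(frames, seq_len=SEQ_LEN):
    """Stack consecutive frame vectors into non-overlapping windows of
    shape (num_windows, seq_len, 132), the layout expected as LSTM input."""
    n = len(frames) // seq_len
    stacked = np.stack([frame_to_vector(f) for f in frames[:n * seq_len]])
    return stacked.reshape(n, seq_len, NUM_LANDMARKS * FEATS_PER_LM)

# Synthetic stand-in for per-frame MediaPipe output: 90 frames of random poses
frames = [np.random.rand(NUM_LANDMARKS, FEATS_PER_LM) for _ in range(90)]
seqs = make_sequences(frames)
print(seqs.shape)  # (3, 30, 132)
```

Each resulting window is one training or inference sample; the actual study's window length, overlap, and any landmark normalization would need to match its experimental setup.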
Copyright © 2025