This study investigates multitask learning approaches for human motion forecasting and fall classification using pose data extracted from video sequences. A custom dataset, the TelUP HumanFall Forecasting Dataset, was developed, containing annotated video frames of fall and non-fall scenarios captured from six participants. Pose information was extracted with YOLOv11, producing 17 keypoints per frame, which were normalized and segmented into temporal sequences for training. Three deep learning architectures were implemented and evaluated: a Multilayer Perceptron (MLP), a Recurrent Neural Network (RNN), and a Long Short-Term Memory (LSTM) network. The models were assessed on a subject-independent test set consisting of two participants to ensure generalization. Quantitative evaluation measured forecasting error with the Mean Per Joint Position Error (MPJPE) and classification performance with accuracy. The MLP achieved the lowest MPJPE of 0.2630 (131.5 pixels), while the LSTM obtained the highest classification accuracy of 92.89%. Qualitative analysis revealed limitations in capturing complex joint dynamics. Although all models converged quickly during training, the results highlight a trade-off between forecasting precision and classification accuracy. Future work will explore more expressive architectures and improved pose extraction methods to enhance forecast realism.
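As a concrete illustration of the multitask setup described above, the sketch below pairs a shared LSTM encoder with two output heads, one forecasting the next pose and one classifying the sequence as fall or non-fall. The hidden size, sequence length, and head design are illustrative assumptions, not the configuration reported in this study.

```python
import torch
import torch.nn as nn

class MultitaskLSTM(nn.Module):
    """Shared LSTM encoder with two heads: one forecasts the next
    pose (17 keypoints x 2 coordinates) and one classifies the
    sequence as fall / non-fall. Layer sizes are illustrative
    assumptions, not this study's reported configuration."""

    def __init__(self, n_joints: int = 17, hidden: int = 128):
        super().__init__()
        self.encoder = nn.LSTM(input_size=n_joints * 2,
                               hidden_size=hidden,
                               batch_first=True)
        self.forecast_head = nn.Linear(hidden, n_joints * 2)
        self.class_head = nn.Linear(hidden, 2)  # fall vs. non-fall

    def forward(self, x):
        # x: (batch, time, 34) flattened keypoint sequences
        _, (h, _) = self.encoder(x)
        h = h[-1]  # final hidden state of the last LSTM layer
        return self.forecast_head(h), self.class_head(h)

model = MultitaskLSTM()
seq = torch.randn(8, 30, 34)  # 8 hypothetical 30-frame sequences
next_pose, fall_logits = model(seq)
print(next_pose.shape, fall_logits.shape)  # (8, 34), (8, 2)
```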
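The forecasting metric can likewise be made concrete. The following minimal sketch computes MPJPE as the Euclidean distance between predicted and ground-truth keypoints, averaged over all joints and frames; the array shapes and the 500-pixel scale (implied by the reported 0.2630 normalized error corresponding to 131.5 pixels) are assumptions for illustration.

```python
import numpy as np

def mpjpe(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean Per Joint Position Error.

    pred, target: (frames, joints, 2) arrays of (x, y) keypoint
    coordinates, e.g. (T, 17, 2) for 17 keypoints per frame.
    Returns the Euclidean joint-position error averaged over all
    joints and frames.
    """
    return float(np.linalg.norm(pred - target, axis=-1).mean())

# Hypothetical data: coordinates normalized to [0, 1].
rng = np.random.default_rng(0)
target = rng.random((30, 17, 2))
pred = target + rng.normal(scale=0.05, size=target.shape)

err = mpjpe(pred, target)
# Assuming a 500-pixel normalization scale, a normalized MPJPE of
# 0.2630 maps to the reported 131.5 pixels (0.2630 * 500).
print(f"MPJPE: {err:.4f} normalized, {err * 500:.1f} pixels")
```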