Manual monitoring of CCTV footage for anomalous movements, such as criminal activity, is inefficient and prone to human error, which motivates the need for automated surveillance systems. A key research gap is that most object detection models are purely spatial: they do not capture temporal context (movement patterns over time), which is essential for distinguishing normal from anomalous activity. This study proposes a hybrid deep learning model, YOLOv8-LSTM, to address this gap. Following the Define, Design, and Develop stages of the 4D R&D methodology, an architecture is designed in which YOLOv8 (yolov8m) acts as a spatial feature extractor, producing a 106-dimensional vector for each video frame. The resulting feature sequence is processed by a Bidirectional Long Short-Term Memory (Bi-LSTM) network with an attention pooling mechanism, which models temporal dependencies and classifies the movement. On the test set, the prototype achieves an AUC of 0.8646 and an F1-score of 0.6530. Qualitative analysis using 3D latent space visualization shows that spatial features that initially overlap (the YOLOv8 features fed to the LSTM) are mapped into clearly separated clusters for the normal and anomalous classes (the LSTM output). These results support the proposed hybrid architecture as an effective combination of spatial and temporal understanding for anomalous motion detection.
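As an illustration of the attention pooling step described above, the following minimal NumPy sketch collapses a sequence of per-frame hidden states into a single clip-level vector. The abstract does not specify how attention scores are computed, so the single scoring vector `w`, the function name `attention_pool`, and the sizes `T` and `D` are assumptions made for this example only, and random values stand in for real Bi-LSTM outputs.

```python
import numpy as np

def attention_pool(hidden, w):
    """Collapse per-frame hidden states (T, D) into one clip vector (D,).

    hidden: per-frame states, e.g. Bi-LSTM outputs (random stand-ins here).
    w:      scoring vector of shape (D,); learned in a real model (assumed form).
    """
    scores = hidden @ w                               # (T,) one score per frame
    scores = scores - scores.max()                    # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()   # (T,) non-negative, sums to 1
    pooled = weights @ hidden                         # (D,) weighted average over time
    return pooled, weights

rng = np.random.default_rng(0)
T, D = 16, 8                        # illustrative sizes, not taken from the study
hidden = rng.normal(size=(T, D))    # stand-in for the Bi-LSTM output sequence
w = rng.normal(size=D)              # stand-in for a learned scoring vector
pooled, weights = attention_pool(hidden, w)
```

Frames with higher scores contribute more to the pooled vector, so a classifier placed on top of it can focus on the part of the clip where the anomalous movement occurs. With a zero scoring vector the weights are uniform and the operation reduces to plain mean pooling over time.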
Copyrights © 2026