Bullying in close-contact environments, especially schools, is a serious social problem with deep psychological impact on victims. Recent advances in deep learning and remote monitoring technologies offer the potential to improve real-time detection and intervention strategies. In this work, we explore the effectiveness of remotely deployed deep neural networks (DNNs) for detecting and classifying trigger events in close-contact bullying incidents. Using spatio-temporal video data and multimodal sensor inputs, we present a hierarchical DNN framework that fuses real-time audio, video, and physiological signals to accurately identify bullying events. The proposed system applies transfer learning with pre-trained vision transformers (ViTs) and convolutional neural networks (CNNs) to extract key visual features, while Bidirectional Long Short-Term Memory (Bi-LSTM) networks analyze speech and contextual cues. A hierarchical classifier then categorizes events as verbal, physical, or psychological bullying. Deployed on edge devices with cloud-assisted inference, the system achieves low-latency, real-time detection.
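The fusion-and-classification stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two feature branches are mean-pooling stand-ins for the ViT/CNN and Bi-LSTM extractors, the feature dimensions and weights are toy values, and the single softmax head over the three event categories (verbal, physical, psychological) stands in for the hierarchical classifier.

```python
import math
import random

random.seed(0)

CLASSES = ["verbal", "physical", "psychological"]

def mean_pool(seq):
    # Average a sequence of per-frame feature vectors over time.
    # (Stand-in for the temporal modelling done by the ViT/CNN and
    # Bi-LSTM branches in the described framework.)
    dim = len(seq[0])
    return [sum(v[i] for v in seq) / len(seq) for i in range(dim)]

def linear(x, W, b):
    # One dense layer: W is a list of per-class weight rows.
    return [sum(xi * wi for xi, wi in zip(x, row)) + bj
            for row, bj in zip(W, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def classify(frames, mfccs, W, b):
    # Late fusion: concatenate pooled visual and audio features,
    # then apply a softmax head over the three bullying categories.
    fused = mean_pool(frames) + mean_pool(mfccs)
    probs = softmax(linear(fused, W, b))
    return CLASSES[probs.index(max(probs))], probs

# Toy inputs: 4 "frames" of 6-d visual features, 5 steps of 3-d audio
# features; weight shapes (3 classes x 9-d fused vector) are illustrative.
frames = [[random.gauss(0, 1) for _ in range(6)] for _ in range(4)]
mfccs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(5)]
W = [[random.gauss(0, 0.1) for _ in range(9)] for _ in range(3)]
b = [0.0, 0.0, 0.0]

label, probs = classify(frames, mfccs, W, b)
print(label)
```

In a deployed edge/cloud split of the kind the abstract mentions, the branch extractors would run on-device and only the compact fused features (or the final label) would be sent upstream, which is what keeps the detection path low-latency.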
Copyright © 2025