The exponential rise of short-form video platforms such as TikTok, Instagram Reels, and YouTube Shorts has transformed digital content consumption patterns, creating both opportunities and challenges for media analysis. One critical need is the efficient temporal segmentation and classification of these videos to enable applications in content moderation, targeted advertising, and audience behavior research. This study proposes a lightweight deep learning architecture that integrates Convolutional Neural Networks (CNNs) for visual feature extraction with Long Short-Term Memory (LSTM) networks for temporal sequence modeling. The proposed CNN-LSTM framework is optimized for computational efficiency while maintaining high classification accuracy, making it suitable for deployment in resource-constrained environments. Experimental evaluations on a curated short-form video dataset show that the model achieves performance competitive with larger architectures while substantially reducing memory usage and inference time. Furthermore, the temporal segmentation module effectively isolates meaningful audio-visual segments, enabling more precise classification. The results highlight the potential of lightweight architectures to meet the scalability demands of modern video analysis systems without sacrificing accuracy. This research contributes to the growing discourse on efficient multimedia processing by bridging the gap between high-performance models and practical, real-time applications in the evolving short-form video ecosystem.
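The abstract does not specify an implementation, but the general shape of a lightweight CNN-LSTM video classifier it describes can be sketched as follows. This is a minimal illustrative sketch under assumptions of my own: the MobileNetV3-Small backbone, the hidden size, and the `num_classes` value are placeholders, not the authors' configuration.

```python
# Illustrative sketch only: a lightweight CNN-LSTM video classifier of the kind
# described in the abstract. The backbone choice (MobileNetV3-Small), layer
# sizes, and num_classes are assumptions for demonstration, not the paper's
# actual configuration.
import torch
import torch.nn as nn
import torchvision.models as models


class CNNLSTMClassifier(nn.Module):
    def __init__(self, num_classes=5, hidden_size=256, feature_dim=576):
        super().__init__()
        # Lightweight CNN backbone extracts per-frame visual features.
        backbone = models.mobilenet_v3_small(weights=None)
        self.cnn = nn.Sequential(backbone.features, nn.AdaptiveAvgPool2d(1))
        # LSTM models temporal dependencies across the frame-level features.
        self.lstm = nn.LSTM(input_size=feature_dim, hidden_size=hidden_size,
                            batch_first=True)
        # Classification head over the final LSTM hidden state.
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        frames = clips.view(b * t, c, h, w)
        feats = self.cnn(frames).flatten(1)   # (b*t, feature_dim)
        feats = feats.view(b, t, -1)          # (b, t, feature_dim)
        _, (hidden, _) = self.lstm(feats)     # hidden: (1, b, hidden_size)
        return self.fc(hidden[-1])            # (b, num_classes)


if __name__ == "__main__":
    model = CNNLSTMClassifier()
    dummy = torch.randn(2, 16, 3, 224, 224)   # 2 clips of 16 frames each
    print(model(dummy).shape)                  # torch.Size([2, 5])
```

In this arrangement the CNN runs independently on each frame and the LSTM aggregates the resulting feature sequence, which keeps the per-frame compute low and matches the resource-constrained deployment goal stated in the abstract.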