Traditional Indonesian dances serve as a vital expression of cultural identity and regional heritage, yet their preservation through intelligent video recognition remains limited by motion complexity, costume variation, and the scarcity of annotated datasets. Prior research commonly employed Convolutional Neural Networks (CNNs) with manually defined hyperparameters, which often resulted in overfitting and poor adaptability on dynamic, real-world video inputs. To overcome these limitations, this study proposes a robust and adaptive classification framework built on a hyperparameter-tuned CNN. The approach automatically optimizes key training parameters, including learning rate, batch size, optimizer type, and epoch count, through iterative experimentation, thereby maximizing the model’s ability to generalize across both static and temporal data domains. The model was trained on image datasets representing three traditional dances (Gambyong, Remo, and Topeng) and subsequently tested on segmented frames extracted from YouTube videos. Results indicate strong performance: 99.67% accuracy on the training set and 100% accuracy, precision, recall, and F1-score across all testing videos. The proposed method bridges the gap between still-image learning and real-world motion recognition, making it suitable for practical applications in digital archiving and cultural documentation. This study’s contribution lies not only in the model’s technical effectiveness but also in its support for preserving intangible cultural assets through intelligent, automated video-based recognition. Future work may incorporate temporal modelling or multi-camera perspectives to further enrich motion understanding and extend the system to broader performance domains.
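The iterative hyperparameter search over learning rate, batch size, optimizer type, and epoch count can be sketched as an exhaustive grid search. The candidate values and the scoring stub below are illustrative assumptions only; the abstract does not state the actual search space, and a real run would train the CNN on the Gambyong/Remo/Topeng image dataset at each configuration.

```python
from itertools import product

# Candidate values for the four tuned hyperparameters (assumed values,
# not the paper's actual search space).
grid = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32, 64],
    "optimizer": ["adam", "sgd", "rmsprop"],
    "epochs": [20, 50, 100],
}

def evaluate(config):
    """Stand-in for training the CNN once and returning validation accuracy.

    In the real framework this would fit the model on the dance image
    dataset with the given configuration; here a fixed preference table
    keeps the sketch self-contained and deterministic.
    """
    score = 0.0
    score += {"adam": 0.4, "rmsprop": 0.3, "sgd": 0.2}[config["optimizer"]]
    score += {1e-3: 0.3, 1e-4: 0.2, 1e-2: 0.1}[config["learning_rate"]]
    score += {32: 0.2, 64: 0.15, 16: 0.1}[config["batch_size"]]
    score += {50: 0.1, 100: 0.08, 20: 0.05}[config["epochs"]]
    return score

def grid_search(grid, evaluate):
    """Evaluate every combination in the grid; return the best one."""
    keys = list(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best, score = grid_search(grid, evaluate)
print(best)
```

Exhaustive search is feasible here because the grid is small (3 × 3 × 3 × 3 = 81 configurations); for larger spaces, random or Bayesian search over the same parameters is the usual substitute.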