This study presents a music metadata recommendation system based on facial emotion detection with the Vision Transformer (ViT-B/16) model. The system classifies a user's facial expression into one of seven emotion categories, using the KDEF facial-expression dataset for training, and matches the result against music metadata (title, artist, genre, mood) labeled with corresponding emotion tags. The ViT-B/16 model was trained via transfer learning and evaluated with accuracy, precision, recall, and F1-score, achieving 89% accuracy and an average F1-score of 0.89. The recommendation system was assessed by 30 participants, 87% of whom indicated that the suggested song metadata matched the detected emotion. The system offers real-time emotion recognition and automatic mood-based song suggestions, although classification accuracy for visually similar emotions such as “fear” and “anger” remains a challenge. Future development may add audio and lyric analysis, as well as user-preference integration, to improve recommendation relevance.
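The paper's implementation is not shown here; the following is a minimal sketch of the transfer-learning setup the abstract describes, assuming PyTorch/torchvision. The emotion label list, the NUM_EMOTIONS constant, the catalog mapping, and the recommend helper are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

NUM_EMOTIONS = 7  # assumed label set; e.g., angry, disgust, fear, happy, neutral, sad, surprise

# Load ViT-B/16 pretrained on ImageNet and replace the classification head
# for 7-way emotion classification (the transfer-learning step in the abstract).
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_EMOTIONS)

# One common fine-tuning choice (an assumption, not stated in the abstract):
# freeze the backbone and train only the new head.
for name, param in model.named_parameters():
    if not name.startswith("heads"):
        param.requires_grad = False

# Hypothetical emotion-to-metadata mapping: each detected emotion indexes a
# list of mood-tagged metadata entries (title, artist, genre, mood).
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]
catalog = {
    "happy": [{"title": "...", "artist": "...", "genre": "...", "mood": "happy"}],
    # ... one list of metadata entries per emotion tag
}

@torch.no_grad()
def recommend(face_batch: torch.Tensor):
    """face_batch: preprocessed face crops, shape (N, 3, 224, 224)."""
    model.eval()
    logits = model(face_batch)
    emotion = EMOTIONS[logits.argmax(dim=1)[0].item()]
    return catalog.get(emotion, [])

In this sketch the recommendation step is a simple lookup from the predicted emotion tag to matching metadata entries; the paper's actual matching logic may differ.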