Optical Music Recognition (OMR) faces significant challenges when applied to polyphonic music scores, owing to high symbol density and overlapping notes. This study proposes a hybrid method that combines notehead detection using YOLOv12 with rule-based pitch inference, converting the spatial positions of detected noteheads into accurate pitch information. The dataset used in this study is DeepScoresV2-Dense, processed through annotation conversion, image normalization, and staff extraction, the last of which serves as the reference for inferring note pitches. The YOLOv12 model was trained for 30 epochs using a transfer learning approach, achieving an mAP50 of 0.75, a precision of 0.85, and a recall of 0.58 on the validation data. The rule-based pitch inference achieved a pitch accuracy of 0.87 and an F1 score of 0.87, demonstrating a balance between the correctness and completeness of predictions. These results show that the integration of YOLOv12 and rule-based pitch inference can be an effective solution for pitch extraction from polyphonic music scores, with potential applications in music information retrieval, digital score conversion, and artificial intelligence-based music learning systems.
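To make the rule-based inference step concrete, the sketch below shows one plausible way to map a detected notehead's vertical position to a pitch name using extracted staff line coordinates. It is a minimal illustration, not the paper's implementation: it assumes a treble clef with known staff line y-coordinates and ignores accidentals, key signatures, and clef changes, and names such as `infer_pitch` are hypothetical.

```python
# Minimal sketch of rule-based pitch inference for a treble-clef staff.
# Assumes the five staff line y-coordinates are known (e.g. from the staff
# extraction step) and ignores accidentals, key signatures, and clef
# changes; all names here are illustrative, not taken from the paper.

# Diatonic pitch names, repeating every 7 steps (C=0).
STEP_NAMES = ["C", "D", "E", "F", "G", "A", "B"]

def infer_pitch(notehead_y: float, staff_line_ys: list[float]) -> str:
    """Map a notehead's vertical center to a pitch name with octave.

    staff_line_ys: y-coordinates of the 5 staff lines, top to bottom.
    In a treble clef, the bottom line corresponds to E4.
    """
    top, bottom = staff_line_ys[0], staff_line_ys[-1]
    half_space = (bottom - top) / 8.0  # 4 gaps -> 8 half-spaces

    # Count half-spaces above the bottom line (negative = below the staff).
    steps_above_e4 = round((bottom - notehead_y) / half_space)

    # E4 sits at diatonic index 2 (C=0) in octave 4.
    diatonic = 2 + steps_above_e4
    octave = 4 + diatonic // 7
    return f"{STEP_NAMES[diatonic % 7]}{octave}"

if __name__ == "__main__":
    # Staff lines at y = 100..140, spaced 10 px apart.
    lines = [100.0, 110.0, 120.0, 130.0, 140.0]
    print(infer_pitch(140.0, lines))  # bottom line  -> E4
    print(infer_pitch(135.0, lines))  # first space  -> F4
    print(infer_pitch(100.0, lines))  # top line     -> F5
```

Because the mapping only depends on the notehead's center relative to the staff geometry, the same scheme extends naturally to ledger lines above and below the staff; other clefs would only shift the reference pitch assigned to the bottom line.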