AI music generation systems such as MusicGen and Stable Audio 2.0 are increasingly capable of producing stylistically coherent and musically rich compositions. However, it remains unclear whether these outputs reflect genuine creativity or mere replication of training data. This study evaluates memorization and creativity in these models using symbolic and audio-based metrics alongside perceptual assessments. A dual-model evaluation was conducted: symbolic outputs were assessed using chroma-based dynamic time warping (DTW), Smith–Waterman alignment, melodic n-grams, and MGEval metrics, while audio outputs were analyzed for waveform similarity and rated by listeners. Anti-Memorization Guidance (AMG) was introduced to reduce overfitting, and 50 outputs were generated per model under both standard and AMG conditions. Standard outputs showed significant memorization, indicated by high Replication Index scores and latent similarity clusters. AMG lowered memorization and increased Novelty Scores and Harmonic Surprise. In subjective tests using MUSHRA and Likert-style ratings, AMG-enhanced outputs were perceived as more creative but slightly less typical of their genre. Correlations between objective and subjective metrics further validated the hybrid evaluation framework. The study concludes that anti-memorization strategies can guide AI music systems toward greater originality. While achieving historical creativity remains challenging, perceptually and statistically creative outputs are attainable. The framework offers a replicable approach for evaluating creativity and informs ethical, legal, and design considerations in AI music generation.
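As an illustration of the symbolic similarity measures named above, the following is a minimal sketch of Smith–Waterman local alignment applied to melodic interval sequences. It is not the study's implementation: the scoring parameters (`match`, `mismatch`, `gap`), the interval encoding of MIDI pitches, and the helper `local_alignment_ratio` are assumptions chosen for illustration, and the ratio it returns should not be read as the Replication Index reported here.

```python
def intervals(pitches):
    """Convert MIDI pitches to successive semitone intervals.

    Assumption: interval encoding is used so that a transposed copy of a
    melody still aligns with the original.
    """
    return [b - a for a, b in zip(pitches, pitches[1:])]


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the scoring
    matrix, since only the maximum score is needed (no traceback).
    """
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def local_alignment_ratio(generated, reference, match=2):
    """Hypothetical helper: alignment score normalized to the range [0, 1].

    The score is divided by the best score the shorter interval sequence
    could reach if it matched perfectly.
    """
    gen, ref = intervals(generated), intervals(reference)
    shortest = min(len(gen), len(ref))
    if shortest == 0:
        return 0.0
    return smith_waterman(gen, ref, match=match) / (match * shortest)


# Example: a generated phrase that is the reference transposed up a whole tone.
reference = [60, 62, 64, 65, 67]
generated = [62, 64, 66, 67, 69]
print(local_alignment_ratio(generated, reference))  # 1.0
```

Under these assumptions, a ratio near 1.0 suggests that a generated phrase shares a long passage with a reference melody, possibly in transposition, while a ratio near 0.0 suggests little shared local material. Any threshold for flagging memorization would have to be calibrated on real data.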
Copyright © 2025