arXiv submission date: 2026-04-28
📄 Abstract - SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton

Generating symphonic music requires simultaneously managing high-level structural form and dense, multi-track orchestration. Existing symbolic models often struggle with a "complexity-control imbalance", in which scaling bottlenecks limit long-term granular steerability. We present SymphonyGen, a 3D hierarchical framework for contemporary cinematic orchestration. SymphonyGen employs a cascading decoder architecture that decomposes the Bar, Track, and Event axes, improving computational efficiency and scalability over conventional 1D or 2D models. We introduce "short-score" conditioning via a beat-quantized multi-voice harmony skeleton, enabling outline control while preserving textural diversity. The model is further refined using Group Relative Policy Optimization (GRPO) with a cross-modal audio-perceptual reward, aligning symbolic output with modern acoustic expectations. Additionally, we implement a dissonance-averse sampling algorithm to suppress unintended tonal clashes during inference. Objective evaluations show that both reinforcement learning and dissonance-averse sampling effectively enhance harmonic cleanliness while maintaining melodic expression. Subjective evaluations demonstrate that SymphonyGen outperforms baselines in musicality and preference for orchestral music generation. Demo page: this https URL
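GRPO (Group Relative Policy Optimization, introduced in DeepSeekMath) does not use a learned value baseline. Instead, it scores each generated sample relative to the other samples drawn for the same input. The sketch below covers only that group-relative advantage step. It assumes each sample already has one scalar reward, standing in for the paper's cross-modal audio-perceptual reward, whose exact form the abstract does not give. The function name and the `eps` guard are illustrative choices.

```python
import statistics
from typing import Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    `rewards` holds one scalar per sample generated for the SAME condition
    (e.g. the same harmony skeleton). The reward model itself is assumed,
    not taken from the paper.
    """
    if len(rewards) < 2:
        # A lone sample has no group baseline, so its advantage is zero.
        return [0.0 for _ in rewards]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every sample scores the same.
    return [(r - mean) / (std + eps) for r in rewards]


# Four samples for one condition: the best-scoring one gets a positive advantage.
print(group_relative_advantages([0.2, 0.5, 0.8, 0.5]))
```

In full GRPO, these advantages then weight a clipped, PPO-style policy-ratio term, and a KL penalty keeps the policy close to a reference model. Both parts are omitted here.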
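The abstract does not say how dissonance-averse sampling works. One hypothetical way to realize the idea is to lower the logit of each candidate pitch at every decoding step if that pitch forms a harsh interval with any pitch already sounding, and then sample from the adjusted distribution. The set of "harsh" interval classes, the `penalty` value, and all function names below are assumptions made for illustration.

```python
import math
import random
from typing import Iterable, Optional

# Hypothetical "clashing" interval classes in semitones mod 12:
# minor second / major seventh (1, 11) and the tritone (6).
HARSH_INTERVAL_CLASSES = frozenset({1, 6, 11})


def dissonance_adjusted_probs(
    pitch_logits: dict[int, float],
    sounding_pitches: Iterable[int],
    penalty: float = 4.0,
) -> dict[int, float]:
    """Subtract `penalty` from every candidate MIDI pitch that clashes with a
    sounding pitch, then softmax. penalty=math.inf masks clashes out entirely."""
    sounding = set(sounding_pitches)
    adjusted = {}
    for pitch, logit in pitch_logits.items():
        clashes = any((pitch - s) % 12 in HARSH_INTERVAL_CLASSES for s in sounding)
        adjusted[pitch] = logit - penalty if clashes else logit
    if all(v == -math.inf for v in adjusted.values()):
        # Every candidate clashes under a hard mask: fall back to the raw logits.
        adjusted = dict(pitch_logits)
    top = max(adjusted.values())  # subtract the max for numerical stability
    weights = {p: math.exp(v - top) for p, v in adjusted.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def dissonance_averse_sample(
    pitch_logits: dict[int, float],
    sounding_pitches: Iterable[int],
    penalty: float = 4.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one pitch from the dissonance-adjusted distribution."""
    rng = rng or random.Random()
    probs = dissonance_adjusted_probs(pitch_logits, sounding_pitches, penalty)
    pitches = list(probs)
    return rng.choices(pitches, weights=[probs[p] for p in pitches], k=1)[0]


# With C4 (60) sounding, C#4 (61) and F#4 (66) are suppressed; 60 or 64 is drawn.
print(dissonance_averse_sample({60: 0.0, 61: 0.0, 64: 0.0, 66: 0.0}, [60],
                               penalty=math.inf, rng=random.Random(0)))
```

A finite `penalty` only makes clashes less likely, so deliberate dissonance can still occur. `math.inf` turns the sketch into a hard mask.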

Top-level tags: audio, machine learning, model training
Detailed tags: symbolic music generation, orchestration, reinforcement learning, harmony skeleton, controllable generation

SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton


1️⃣ One-Sentence Summary

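The paper proposes SymphonyGen, a 3D hierarchical framework that decomposes music along three axes (bar, track, and event) and uses a beat-quantized multi-voice harmony skeleton as its control signal, addressing the trade-off between complexity and controllability in existing models and efficiently generating high-quality, multi-track contemporary orchestral music.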
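The summary names two structural ideas: nesting music along Bar, Track, and Event axes, and conditioning on a beat-quantized multi-voice harmony skeleton (a "short score"). The abstract does not give the token format or say how the skeleton is extracted. The hypothetical sketch below assumes notes are given as `(track, onset, duration, pitch)`, with times in beats and 4 beats per bar. The reduction rule (keep the bass plus the highest upper voices) and all names are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict
from typing import NamedTuple


class Note(NamedTuple):
    track: str       # instrument track name (illustrative)
    onset: float     # start time in beats
    duration: float  # length in beats
    pitch: int       # MIDI pitch number


def to_bar_track_events(notes: list[Note], beats_per_bar: int = 4) -> dict[int, dict[str, list[Note]]]:
    """Nest notes along three axes: bar -> track -> time-ordered events."""
    grid: dict[int, dict[str, list[Note]]] = defaultdict(lambda: defaultdict(list))
    for note in sorted(notes, key=lambda x: (x.onset, x.pitch)):
        grid[int(note.onset // beats_per_bar)][note.track].append(note)
    return {bar: dict(tracks) for bar, tracks in sorted(grid.items())}


def harmony_skeleton(notes: list[Note], num_beats: int, max_voices: int = 4) -> list[tuple[int, ...]]:
    """For each beat window [b, b+1), collect the distinct pitches sounding in any
    track, then keep at most `max_voices`: the bass plus the highest upper voices."""
    skeleton = []
    for b in range(num_beats):
        pitches = sorted({n.pitch for n in notes if n.onset < b + 1 and n.onset + n.duration > b})
        if len(pitches) > max_voices:
            # Bass note plus the (max_voices - 1) highest pitches.
            pitches = [pitches[0]] + pitches[len(pitches) - (max_voices - 1):]
        skeleton.append(tuple(pitches))
    return skeleton


notes = [
    Note("cello", 0, 4, 48), Note("viola", 0, 2, 55), Note("violin", 0, 1, 64),
    Note("violin", 1, 1, 67), Note("flute", 0, 1, 72), Note("violin", 4, 1, 65),
]
print(harmony_skeleton(notes, num_beats=5, max_voices=3))
print(to_bar_track_events(notes)[1])
```

The skeleton fixes which pitches sound on each beat but not which instrument plays them or how they are figured. That separation is what lets outline-level control coexist with textural variety in the generated orchestration.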

From arXiv: 2604.25498