Abstract - SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton
Generating symphonic music requires simultaneously managing high-level structural form and dense, multi-track orchestration. Existing symbolic models often struggle with a "complexity-control imbalance", in which scaling bottlenecks limit long-term granular steerability. We present SymphonyGen, a 3D hierarchical framework for contemporary cinematic orchestration. SymphonyGen employs a cascading decoder architecture that decomposes the Bar, Track, and Event axes, improving computational efficiency and scalability over conventional 1D or 2D models. We introduce "short-score" conditioning via a beat-quantized multi-voice harmony skeleton, enabling outline control while preserving textural diversity. The model is further refined using Group Relative Policy Optimization (GRPO) with a cross-modal audio-perceptual reward, aligning symbolic output with modern acoustic expectations. Additionally, we implement a dissonance-averse sampling algorithm to suppress unintended tonal clashes during inference. Objective evaluations show that both reinforcement learning and dissonance-averse sampling effectively enhance harmonic cleanliness while maintaining melodic expression. Subjective evaluations demonstrate that SymphonyGen outperforms baselines in musicality and preference for orchestral music generation. Demo page: this https URL
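The abstract names Group Relative Policy Optimization (GRPO) but does not say how the reward is turned into a training signal. As a rough illustration of the general GRPO idea only, the sketch below scores a group of samples drawn for the same condition and normalizes each reward against the group's mean and standard deviation. The function name, the `eps` constant, the use of the population standard deviation, and the reward values are all assumptions for illustration, not details taken from SymphonyGen.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    `rewards` holds one scalar per sampled piece, all generated from the
    same condition. Hypothetical helper, not SymphonyGen's actual code.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps guards against sigma == 0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up audio-perceptual reward scores for four pieces sampled from one harmony skeleton.
advantages = group_relative_advantages([0.2, 0.5, 0.5, 0.8])
```

Samples scoring above the group mean receive a positive advantage and those below receive a negative one, so the policy update needs no separately learned value function.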
SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton
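1️⃣ One-sentence summary
The paper proposes SymphonyGen, a 3D hierarchical framework that decomposes musical structure along three axes (bar, track, and event) and introduces a beat-quantized multi-voice harmony skeleton as a control signal. This addresses the imbalance between complexity and controllability in existing models, enabling efficient generation of high-quality, multi-track contemporary orchestral music.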