SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton

📄 Abstract

Generating symphonic music requires simultaneously managing high-level structural form and dense, multi-track orchestration. Existing symbolic models often struggle with a "complexity-control imbalance", in which scaling bottlenecks limit long-term granular steerability. We present SymphonyGen, a 3D hierarchical framework for contemporary cinematic orchestration. SymphonyGen employs a cascading decoder architecture that decomposes the Bar, Track, and Event axes, improving computational efficiency and scalability over conventional 1D or 2D models. We introduce "short-score" conditioning via a beat-quantized multi-voice harmony skeleton, enabling outline control while preserving textural diversity. The model is further refined using Group Relative Policy Optimization (GRPO) with a cross-modal audio-perceptual reward, aligning symbolic output with modern acoustic expectations. Additionally, we implement a dissonance-averse sampling algorithm to suppress unintended tonal clashes during inference. Objective evaluations show that both reinforcement learning and dissonance-averse sampling effectively enhance harmonic cleanliness while maintaining melodic expression. Subjective evaluations demonstrate that SymphonyGen outperforms baselines in musicality and preference for orchestral music generation. Demo page: this https URL
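The abstract names Group Relative Policy Optimization (GRPO) but does not spell out its core step. As a minimal sketch, GRPO scores several samples generated from the same prompt and normalizes each reward against the group's own mean and standard deviation, so no separate value network is needed. The reward values below are hypothetical stand-ins for the audio-perceptual reward, and the use of the population standard deviation and the `eps` constant are assumptions for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    rewards: scalar rewards for samples drawn from the same prompt.
    eps: small constant (an assumption) to avoid division by zero
         when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four generations from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged.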

1️⃣ One-Sentence Summary

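This paper proposes SymphonyGen, a 3D hierarchical framework that decomposes musical structure along three axes (bar, track, and event) and uses a beat-quantized multi-voice harmony skeleton as a control signal. This addresses the imbalance between complexity and controllability in existing models and enables efficient generation of high-quality, multi-track contemporary orchestral music.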
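To make the Bar, Track, and Event decomposition concrete, here is a small sketch of how a nested layout shortens the sequence any single decoder level has to handle compared with a flattened 1D token stream. The data layout, the instrument names, the token strings, and both helper functions are hypothetical and chosen only for illustration; the paper's actual tokenization and decoder design may differ.

```python
# piece[bar][track] -> list of event tokens (hypothetical token strings)
piece = [
    {"violin": ["C5", "E5", "G5"], "cello": ["C3"]},
    {"violin": ["D5", "F5"], "cello": ["G2", "B2"]},
]


def flat_length(piece):
    """Length of a single 1D sequence holding every event of every track."""
    return sum(len(events) for bar in piece for events in bar.values())


def hierarchical_lengths(piece):
    """Longest sequence seen along each axis: (bars, tracks per bar, events per track)."""
    n_bars = len(piece)
    max_tracks = max(len(bar) for bar in piece)
    max_events = max(len(events) for bar in piece for events in bar.values())
    return n_bars, max_tracks, max_events


total_events = flat_length(piece)
per_axis = hierarchical_lengths(piece)
```

In this toy piece the flattened stream has 8 events, while no single axis is longer than 3. For a full orchestral score the gap grows with the number of bars and tracks, which is the intuition behind the efficiency claim.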