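How Can Reinforcement Learning Achieve Expert-level Placement?
1️⃣ One-sentence summary
This paper proposes inferring the step-by-step placement trajectory backward from expert-designed final layouts and using these trajectories to train a reward model, enabling reinforcement learning to match or even surpass human experts on chip placement.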
Chip placement is a critical step in physical design. Reinforcement learning (RL)-based methods have recently emerged, but their training focuses primarily on wirelength optimization, so they often fail to achieve expert-quality layouts. We identify reward design as the primary cause of the performance gap with experts. Instead of formalizing the intricate processes experts follow, we circumvent the problem by learning a reward model directly from expert layouts. Our approach starts from the final expert layouts and infers step-by-step expert trajectories. Using these trajectories as demonstrations or preferences, we train a model that captures the implicit rewards latent in expert results. Experiments show that our framework can learn efficiently from even a single design and generalizes well to unseen cases.
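The two ideas in the abstract, turning a final layout into a step-by-step trajectory and learning a reward from preferences, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy data, the largest-area-first replay order, and the use of a Bradley-Terry loss are not taken from the paper, which does not specify these details in the abstract.

```python
import math


def infer_trajectory(final_layout, sizes):
    """Turn a finished layout into an ordered list of (macro, position) steps.

    Assumption: macros are replayed largest-area first (ties broken by name).
    The abstract does not state the ordering rule the authors use.
    """
    order = sorted(final_layout, key=lambda m: (-sizes[m][0] * sizes[m][1], m))
    return [(m, final_layout[m]) for m in order]


def preference_loss(reward_expert, reward_other):
    """Bradley-Terry negative log-likelihood that the expert trajectory wins.

    Assumption: this is one common preference loss, used here as a stand-in.
    """
    return math.log1p(math.exp(-(reward_expert - reward_other)))


# Hypothetical toy data: macro name -> (x, y) position and (width, height).
layout = {"A": (0, 0), "B": (4, 0), "C": (0, 3)}
sizes = {"A": (2, 2), "B": (4, 3), "C": (1, 1)}

trajectory = infer_trajectory(layout, sizes)
for step, (macro, position) in enumerate(trajectory):
    print(step, macro, position)
```

In this sketch the loss shrinks as the reward model scores the expert trajectory higher than the alternative, which is the signal a preference-trained reward model would be optimized on.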
Source: arXiv: 2604.25191