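Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure
1️⃣ One-Sentence Summary
This paper proposes a new framework that recovers the semantic structure of vector graphics by aggregating multiple weak predictions. It addresses the difficulty current vision-language models have in coordinating motion during automated animation, which stems from the fragmentation of graphic elements, and it significantly improves the coherence and interpretability of the resulting animations.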
Scalable Vector Graphics (SVG) are central to modern web design, and the demand to animate them continues to grow as web environments become increasingly dynamic. Yet automating the animation of vector graphics remains challenging for vision-language models (VLMs) despite recent progress in code generation and motion planning. VLMs routinely mishandle SVGs, since visually coherent parts are often fragmented into low-level shapes that offer little guidance on which elements should move together. In this paper, we introduce a framework that recovers the semantic structure required for reliable SVG animation and reveals the missing layer that current VLM systems overlook. This is achieved through a statistical aggregation of multiple weak part predictions, allowing the system to stably infer semantics from noisy predictions. By reorganizing SVGs into semantic groups, our approach enables VLMs to produce animations with far greater coherence. Our experiments demonstrate substantial gains over existing approaches, suggesting that semantic recovery is the key step that unlocks robust SVG animation and supports more interpretable interactions between VLMs and vector graphics.
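To make the idea of aggregating weak part predictions concrete, here is a minimal Python sketch. It assumes the simplest possible aggregation, a per-element majority vote over several noisy labeling runs, followed by grouping elements that share a label. The abstract does not specify the paper's actual statistical procedure, so this is only an illustration; the function names `aggregate_labels` and `group_by_label`, the element ids, and the part labels are all hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_labels(predictions):
    """Majority-vote a part label for each SVG element.

    predictions: list of dicts, one per weak prediction run, each mapping
    an element id to a predicted part label (hypothetical data layout).
    """
    votes = defaultdict(Counter)
    for run in predictions:
        for element_id, label in run.items():
            votes[element_id][label] += 1
    # Most frequent label wins; ties are broken alphabetically so the
    # result is deterministic.
    return {
        element_id: min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for element_id, counts in votes.items()
    }


def group_by_label(labels):
    """Collect element ids into semantic groups keyed by part label."""
    groups = defaultdict(list)
    for element_id in sorted(labels):
        groups[labels[element_id]].append(element_id)
    return dict(groups)


# Three noisy runs over the same three low-level shapes (made-up example).
runs = [
    {"path1": "wheel", "path2": "wheel", "path3": "body"},
    {"path1": "wheel", "path2": "body", "path3": "body"},
    {"path1": "wheel", "path2": "wheel", "path3": "window"},
]
semantic_groups = group_by_label(aggregate_labels(runs))
print(semantic_groups)
```

In this toy input, no single run is fully trusted, but the vote assigns `path1` and `path2` to the same group, which is the kind of "which elements should move together" signal the abstract says fragmented SVGs lack.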
Source: arXiv: 2512.14336