arXiv submission date: 2026-03-26
📄 Abstract - AnyID: Ultra-Fidelity Universal Identity-Preserving Video Generation from Any Visual References

Identity-preserving video generation offers powerful tools for creative expression, allowing users to customize videos featuring their beloved characters. However, prevailing methods are typically designed and optimized for a single identity reference. This underlying assumption restricts creative flexibility by inadequately accommodating diverse real-world input formats. Relying on a single source also constitutes an ill-posed scenario, creating an inherently ambiguous setting that makes it difficult for the model to faithfully reproduce an identity across novel contexts. To address these issues, we present AnyID, an ultra-fidelity identity-preserving video generation framework that features two core contributions. First, we introduce a scalable omni-referenced architecture that effectively unifies heterogeneous identity inputs (e.g., faces, portraits, and videos) into a cohesive representation. Second, we propose a primary-referenced generation paradigm, which designates one reference as a canonical anchor and uses a novel differential prompt to enable precise, attribute-level controllability. We conduct training on a large-scale, meticulously curated dataset to ensure robustness and high fidelity, and then perform a final fine-tuning stage using reinforcement learning. This process leverages a preference dataset constructed from human evaluations, in which annotators performed pairwise comparisons of videos based on two key criteria: identity fidelity and prompt controllability. Extensive evaluations validate that AnyID achieves ultra-high identity fidelity as well as superior attribute-level controllability across different task settings.

Top-level tags: video generation, AIGC, multi-modal
Detailed tags: identity preservation, video synthesis, reference unification, reinforcement learning fine-tuning, attribute controllability

AnyID: Ultra-Fidelity Universal Identity-Preserving Video Generation from Any Visual References


1️⃣ One-sentence summary

This paper proposes AnyID, a new framework that accepts identity references in multiple forms (faces, portraits, or videos) and generates ultra-high-fidelity customized videos with precise control over character attributes, overcoming the limitation of prior methods that rely on a single reference source.

Source: arXiv 2603.25188