📄 Abstract - OmniFashion: Towards Generalist Fashion Intelligence via Multi-Task Vision-Language Learning
Fashion intelligence spans multiple tasks, including retrieval, recommendation, recognition, and dialogue, yet remains hindered by fragmented supervision and incomplete fashion annotations. These limitations jointly restrict the formation of consistent visual-semantic structures, preventing recent vision-language models (VLMs) from serving as a generalist fashion brain that unifies understanding and reasoning across tasks. Therefore, we construct FashionX, a million-scale dataset that exhaustively annotates the visible fashion items within an outfit and organizes attributes from the global level down to the part level. Built upon this foundation, we propose OmniFashion, a vision-language framework that bridges diverse fashion tasks under a unified fashion dialogue paradigm, enabling both multi-task reasoning and interactive dialogue. Experiments across multiple subtask and retrieval benchmarks show that OmniFashion achieves strong task-level accuracy and cross-task generalization, offering a scalable path toward universal, dialogue-oriented fashion intelligence.
OmniFashion: Towards Generalist Fashion Intelligence via Multi-Task Vision-Language Learning
1️⃣ One-Sentence Summary
This paper proposes a unified vision-language framework called OmniFashion. By constructing a large-scale dataset and an innovative dialogue paradigm, it integrates multiple fashion tasks such as retrieval, recommendation, and recognition, achieving accurate understanding and reasoning across tasks and offering a viable path toward a universal, dialogue-driven fashion intelligence system.
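To make the "unified fashion dialogue paradigm" idea more concrete, here is a minimal Python sketch of the general pattern: several fashion tasks are each phrased as a chat-style instruction over an image, so a single vision-language model can serve them through one interface. The template strings, function name, and message format below are illustrative assumptions, not OmniFashion's actual API or data format.

```python
# Hypothetical sketch of a unified dialogue interface for fashion tasks.
# Templates and the message schema are assumptions for illustration only.

# Per-task instruction templates; <image> marks where the outfit photo goes.
TASK_TEMPLATES = {
    "recognition":    "List every visible item in <image> and its attributes.",
    "recommendation": "Suggest an item that completes the outfit in <image>.",
    "retrieval":      "Describe the query garment in <image> so similar items can be retrieved.",
}

def to_dialogue(task: str, image_ref: str) -> list[dict]:
    """Cast any of the fashion tasks as a single-turn dialogue for one generalist VLM."""
    prompt = TASK_TEMPLATES[task].replace("<image>", image_ref)
    return [{"role": "user", "content": prompt}]

if __name__ == "__main__":
    # All three tasks share the same input format, which is the point of
    # framing them as dialogue rather than as separate task-specific heads.
    for task in TASK_TEMPLATES:
        print(task, "->", to_dialogue(task, "outfit_001.jpg"))
```

Because every task flows through the same dialogue format, a single model can be trained and queried uniformly, which is what enables the cross-task generalization the paper reports.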