TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval

📄 Abstract - TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval

Composed Image Retrieval (CIR) is an important image retrieval paradigm that enables users to retrieve a target image using a multimodal query that consists of a reference image and modification text. Although research on CIR has made significant progress, prevailing setups still rely on simple modification texts that typically cover only a limited range of salient changes, which induces two limitations highly relevant to practical applications, namely Insufficient Entity Coverage and Clause-Entity Misalignment. To address these issues and bring CIR closer to real-world use, we construct two instruction-rich multi-modification datasets, M-FashionIQ and M-CIRR. In addition, we propose TEMA, the Text-oriented Entity Mapping Architecture, which is the first CIR framework designed for multi-modification while also accommodating simple modifications. Extensive experiments on four benchmark datasets demonstrate TEMA's superiority in both original and multi-modification scenarios, while maintaining an optimal balance between retrieval accuracy and computational efficiency. Our code and constructed multi-modification datasets (M-FashionIQ and M-CIRR) are available at this https URL.


1️⃣ One-Sentence Summary

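This paper proposes TEMA, a new image retrieval framework designed to address the poor performance of existing methods when a user requests several modifications at once. By constructing two multi-modification datasets and an efficient text-image matching architecture, it substantially improves the handling of complex multimodal queries while maintaining retrieval accuracy.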
