arXiv submission date: 2026-06-23
📄 Abstract - Universal Guideline-Driven Image Clustering via a Hybrid LLM Agent

Unifying image clustering across different clustering scenarios remains challenging due to fundamental gaps among tasks. We introduce a Guideline-Driven Image Clustering Agent, the first universal framework that bridges these gaps through textual guidelines. To incorporate complex guidelines without task-specific training, we propose Generative Concept Proxy Modeling, which generates guideline-aware embeddings via concept proxy extraction. For scenarios requiring automatic cluster discovery, we introduce LLM Traversal based on Minimum Spanning Tree that selectively applies LLM reasoning for complex semantic judgments. Our method generalizes across diverse clustering scenarios spanning from general to fine-grained categorization, from global to local criteria, and from balanced to long-tail distributions. Our framework consistently outperforms specialized methods across diverse clustering tasks.
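The abstract describes LLM Traversal only at a high level: it builds a Minimum Spanning Tree and applies LLM reasoning selectively to complex semantic judgments. The Python sketch below illustrates that general idea under assumptions of our own. It is not the authors' algorithm. It assumes that images are already embedded as non-zero vectors and that edge weights are cosine distances. MST edges below a hypothetical `low` threshold are kept, edges above `high` are cut, and only the ambiguous edges in between are passed to `judge`, a hypothetical callback standing in for the LLM. All function names and threshold values are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sketch: `judge` stands in for an LLM call, and the
# `low`/`high` thresholds are illustrative, not values from the paper.
Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def minimum_spanning_tree(points: Sequence[Vector]) -> List[Tuple[float, int, int]]:
    """Prim's algorithm on the complete cosine-distance graph, O(n^2).

    Returns the n - 1 tree edges as (distance, parent, child) tuples.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Tuple[float, int, int]] = []
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax the distances of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = cosine_distance(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_cluster(
    points: Sequence[Vector],
    judge: Callable[[int, int], bool],
    low: float = 0.1,
    high: float = 0.5,
) -> Tuple[List[List[int]], int]:
    """Keep or cut each MST edge; only ambiguous edges are sent to `judge`.

    Returns (clusters as sorted index lists, number of judge calls).
    """
    root = list(range(len(points)))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    judge_calls = 0
    for dist, u, v in sorted(minimum_spanning_tree(points)):
        if dist <= low:
            keep = True  # clearly similar: keep the edge without the judge
        elif dist >= high:
            keep = False  # clearly dissimilar: cut the edge
        else:
            judge_calls += 1
            keep = judge(u, v)  # ambiguous: defer to the expensive judge
        if keep:
            root[find(u)] = find(v)

    # Each connected component of the kept edges is one cluster.
    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values()), judge_calls


if __name__ == "__main__":
    pts = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (0.05, 0.99), (0.7, 0.7)]
    print(mst_cluster(pts, judge=lambda u, v: False))
    # → ([[0, 1], [2, 3], [4]], 2)
```

In this toy run, the two tight pairs merge without any judge call. Only the two MST edges that link the point (0.7, 0.7) fall into the ambiguous band, so the judge is consulted twice. Widening the band between `low` and `high` sends more edges to the judge, trading extra (LLM) calls for finer decisions.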

Top-level tags: llm, multi-modal, computer vision
Detailed tags: image clustering, llm agent, guideline-driven, minimum spanning tree, concept proxy

Universal Guideline-Driven Image Clustering via a Hybrid LLM Agent


1️⃣ One-sentence summary

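This paper proposes a universal image clustering framework that requires no task-specific training. It uses textual guidelines and a hybrid LLM agent to group images automatically across different clustering scenarios, such as fine-grained categories, local criteria, or long-tail distributions, and it outperforms existing specialized methods on a range of complex clustering tasks.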

Source: arXiv: 2606.24094