菜单

关于 🐙 GitHub
arXiv 提交日期: 2026-04-16
📄 Abstract - OmniGCD: Abstracting Generalized Category Discovery for Modality Agnosticism

Generalized Category Discovery (GCD) challenges methods to identify known and novel classes using partially labeled data, mirroring human category learning. Unlike prior GCD methods, which operate within a single modality and require dataset-specific fine-tuning, we propose a modality-agnostic GCD approach inspired by the human brain's abstract category formation. Our $\textbf{OmniGCD}$ leverages modality-specific encoders (e.g., vision, audio, text, remote sensing) to process inputs, followed by dimension reduction to construct a $\textbf{GCD latent space}$, which is transformed at test-time into a representation better suited for clustering using a novel synthetically trained Transformer-based model. To evaluate OmniGCD, we introduce a $\textbf{zero-shot GCD setting}$ where no dataset-specific fine-tuning is allowed, enabling modality-agnostic category discovery. $\textbf{Trained once on synthetic data}$, OmniGCD performs zero-shot GCD across 16 datasets spanning four modalities, improving classification accuracy for known and novel classes over baselines (average percentage point improvement of $\textbf{+6.2}$, $\textbf{+17.9}$, $\textbf{+1.5}$ and $\textbf{+12.7}$ for vision, text, audio and remote sensing). This highlights the importance of strong encoders while decoupling representation learning from category discovery. Improving modality-agnostic methods will propagate across modalities, enabling encoder development independent of GCD. Our work serves as a benchmark for future modality-agnostic GCD works, paving the way for scalable, human-inspired category discovery. All code is available $\href{this https URL}{here}$

顶级标签: multi-modal machine learning model evaluation
详细标签: generalized category discovery zero-shot learning modality-agnostic clustering representation learning 或 搜索:

OmniGCD:实现模态无关的广义类别发现的抽象方法 / OmniGCD: Abstracting Generalized Category Discovery for Modality Agnosticism


1️⃣ 一句话总结

这项研究提出了一种名为OmniGCD的新方法,它像人脑一样抽象地学习类别,只需在合成数据上训练一次,就能直接应用于图像、文本、音频等多种类型的数据,自动发现其中已知和未知的类别,而无需针对每个新数据集重新调整模型。

源自 arXiv: 2604.14762