OmniGCD: Abstracting Generalized Category Discovery for Modality Agnosticism
1️⃣ One-Sentence Summary
This work proposes a new method called OmniGCD that, like the human brain, learns categories abstractly. Trained only once on synthetic data, it can be applied directly to images, text, audio, and other data types to automatically discover both known and novel categories, without re-tuning the model for each new dataset.
Generalized Category Discovery (GCD) challenges methods to identify known and novel classes using partially labeled data, mirroring human category learning. Unlike prior GCD methods, which operate within a single modality and require dataset-specific fine-tuning, we propose a modality-agnostic GCD approach inspired by the human brain's abstract category formation. Our $\textbf{OmniGCD}$ leverages modality-specific encoders (e.g., vision, audio, text, remote sensing) to process inputs, followed by dimension reduction to construct a $\textbf{GCD latent space}$, which is transformed at test time into a representation better suited for clustering by a novel, synthetically trained Transformer-based model. To evaluate OmniGCD, we introduce a $\textbf{zero-shot GCD setting}$ in which no dataset-specific fine-tuning is allowed, enabling modality-agnostic category discovery. $\textbf{Trained once on synthetic data}$, OmniGCD performs zero-shot GCD across 16 datasets spanning four modalities, improving classification accuracy for known and novel classes over baselines (average percentage-point improvements of $\textbf{+6.2}$, $\textbf{+17.9}$, $\textbf{+1.5}$, and $\textbf{+12.7}$ for vision, text, audio, and remote sensing, respectively). This highlights the importance of strong encoders while decoupling representation learning from category discovery: improvements to modality-agnostic methods propagate across all modalities, enabling encoder development independent of GCD. Our work serves as a benchmark for future modality-agnostic GCD research, paving the way for scalable, human-inspired category discovery. All code is available $\href{this https URL}{here}$.
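The abstract's pipeline (modality-specific encoding, dimension reduction, a test-time transform, then clustering over known and novel classes) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`synthetic_embeddings`, `transform`, `kmeans`) are hypothetical, the encoder/dimension-reduction stages are collapsed into synthetic 2-D embeddings, and the synthetically trained Transformer is stood in for by an identity map, leaving plain k-means as the clustering stage.

```python
# Hedged sketch of a zero-shot GCD-style pipeline. All names are hypothetical;
# the real OmniGCD replaces `transform` with a Transformer trained once on
# synthetic data and applied unchanged to every dataset and modality.
import random

random.seed(0)

def synthetic_embeddings(n_per_class, centers, scale=0.3):
    """Stand-in for encoder + dimension reduction: one 2-D Gaussian blob per class."""
    data, labels = [], []
    for cls, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            data.append((cx + random.gauss(0, scale), cy + random.gauss(0, scale)))
            labels.append(cls)
    return data, labels

def transform(points):
    """Placeholder for the synthetically trained test-time transform (identity here)."""
    return points

def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, centroids, iters=20):
    """Plain k-means as the dataset-agnostic clustering stage."""
    k = len(centroids)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return assign

# Three well-separated classes: two "known" and one "novel" (unlabeled at test time).
data, labels = synthetic_embeddings(30, [(0, 0), (5, 0), (0, 5)])
# Seed centroids with one point per blob for a deterministic illustration.
assign = kmeans(transform(data), centroids=[data[0], data[30], data[60]])
```

With well-separated blobs, each recovered cluster is label-pure, so both the known classes and the held-out "novel" class are discovered without any dataset-specific tuning; in the paper this property is instead carried by the quality of the encoders and the once-trained transform.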
Source: arXiv: 2604.14762