arXiv submission date: 2026-02-24
📄 Abstract - SOM-VQ: Topology-Aware Tokenization for Interactive Generative Models

Vector-quantized representations enable powerful discrete generative models but lack semantic structure in token space, limiting interpretable human control. We introduce SOM-VQ, a tokenization method that combines vector quantization with Self-Organizing Maps to learn discrete codebooks with explicit low-dimensional topology. Unlike standard VQ-VAE, SOM-VQ uses topology-aware updates that preserve neighborhood structure: nearby tokens on a learned grid correspond to semantically similar states, enabling direct geometric manipulation of the latent space. We demonstrate that SOM-VQ produces more learnable token sequences in the evaluated domains while providing an explicit navigable geometry in code space. Critically, the topological organization enables intuitive human-in-the-loop control: users can steer generation by manipulating distances in token space, achieving semantic alignment without frame-level constraints. We focus on human motion generation (a domain where kinematic structure, smooth temporal continuity, and interactive use cases such as choreography, rehabilitation, and HCI make topology-aware control especially natural) and demonstrate controlled divergence and convergence from reference sequences through simple grid-based sampling. SOM-VQ provides a general framework for interpretable discrete representations applicable to music, gesture, and other interactive generative domains.
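The abstract does not spell out SOM-VQ's exact update rule. As a minimal sketch, assuming the classic Kohonen SOM update applied to a VQ codebook laid out on a 2-D grid, the code below shows how a "topology-aware update" differs from plain VQ: the nearest code (best-matching unit) is chosen as in standard VQ, but the BMU *and its grid neighbours* are pulled toward the input, weighted by a Gaussian of grid distance. All function names (`make_grid_codebook`, `quantize`, `som_update`, `train`) and the learning-rate and radius schedule are hypothetical illustrations, not the paper's implementation.

```python
import math
import random

def make_grid_codebook(rows, cols, dim, seed=0):
    """Hypothetical helper: rows*cols random code vectors; code k sits at grid cell divmod(k, cols)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows * cols)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(z, codebook):
    """Standard VQ assignment: index of the nearest code vector (best-matching unit)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))

def som_update(z, codebook, cols, lr=0.5, sigma=1.0):
    """Assumption: classic Kohonen update, not necessarily the paper's exact rule.
    Pull the BMU and its grid neighbours toward z, weighted by a Gaussian
    of their distance on the 2-D grid (not their distance in feature space)."""
    bmu = quantize(z, codebook)
    br, bc = divmod(bmu, cols)
    for k, code in enumerate(codebook):
        r, c = divmod(k, cols)
        h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
        for i in range(len(code)):
            code[i] += lr * h * (z[i] - code[i])
    return bmu

def train(data, rows, cols, epochs=20, seed=0):
    """Toy training loop: the neighbourhood radius and learning rate shrink over epochs."""
    codebook = make_grid_codebook(rows, cols, len(data[0]), seed)
    for e in range(epochs):
        frac = e / max(1, epochs - 1)
        sigma = max(0.3, (max(rows, cols) / 2.0) * (1.0 - frac))
        lr = 0.5 * (1.0 - frac) + 0.01
        for z in data:
            som_update(z, codebook, cols, lr=lr, sigma=sigma)
    return codebook

# Usage: fit a 4 x 4 grid codebook to random 2-D points, then tokenize a few of them.
rng = random.Random(1)
data = [[rng.random(), rng.random()] for _ in range(200)]
cb = train(data, rows=4, cols=4)
tokens = [quantize(z, cb) for z in data[:5]]
```

Because the neighbourhood is defined on the grid rather than in feature space, codes that are adjacent on the grid end up representing similar inputs, which is the property the abstract relies on for geometric control. In a full VQ-VAE this update would sit alongside the encoder and decoder training; that wiring is omitted here.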

Top-level tags: model training, multi-modal, aigc
Detailed tags: vector quantization, self-organizing maps, generative models, interpretability, human-in-the-loop

SOM-VQ: Topology-Aware Tokenization for Interactive Generative Models


1️⃣ One-sentence summary

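This paper proposes SOM-VQ, a method that combines vector quantization with self-organizing maps to learn discrete codes with an explicit topological structure for generative models, so that users can control and steer generation by intuitively manipulating distances in code space; it is particularly suited to interactive, human-in-the-loop domains such as human motion generation.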
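The paper describes steering only as "simple grid-based sampling" that produces controlled divergence from, and convergence back to, a reference sequence. A minimal sketch, assuming tokens live on a `rows x cols` grid and that each reference token is resampled uniformly within a per-step Manhattan radius: radius 0 reproduces the reference, larger radii let the output drift to nearby (and, under SOM-VQ's topology, semantically similar) tokens. The names `grid_neighbours`, `grid_distance`, and `steer`, and the uniform sampling itself, are hypothetical stand-ins; with a trained model one would more likely restrict or reweight the model's predicted token distribution to this neighbourhood.

```python
import random

def grid_neighbours(token, rows, cols, radius):
    """Tokens within Manhattan distance `radius` of `token` on a rows x cols grid."""
    r0, c0 = divmod(token, cols)
    out = []
    for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1)):
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1)):
            if abs(r - r0) + abs(c - c0) <= radius:
                out.append(r * cols + c)
    return out

def grid_distance(a, b, cols):
    """Manhattan distance between the grid cells of tokens a and b."""
    (ra, ca), (rb, cb) = divmod(a, cols), divmod(b, cols)
    return abs(ra - rb) + abs(ca - cb)

def steer(reference, rows, cols, radii, seed=0):
    """Hypothetical grid-based sampler: resample each reference token uniformly
    within its per-step grid radius. Radius 0 copies the reference token."""
    rng = random.Random(seed)
    return [rng.choice(grid_neighbours(t, rows, cols, rad))
            for t, rad in zip(reference, radii)]

# Usage: diverge mid-sequence, then converge back to the reference at the end.
reference = [0, 1, 2, 7, 12, 13]   # token ids on a 5 x 5 grid
radii = [0, 1, 2, 2, 1, 0]
variant = steer(reference, 5, 5, radii)
```

Shrinking the radius schedule toward 0 over time gives convergence to the reference; growing it gives divergence. The control knob is purely geometric, which is only meaningful because the codebook's grid layout is topology-preserving.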

Source: arXiv 2602.21133