Kathleen: Oscillator-Based Byte-Level Text Classification Without Tokenization or Attention
One-sentence summary
This paper proposes a new text classification model named "Kathleen" that requires neither tokenization nor a complex attention mechanism. It processes raw byte data directly and, through novel frequency-domain components, achieves performance comparable to or better than much larger models with a tiny parameter count.
We present Kathleen, a text classification architecture that operates directly on raw UTF-8 bytes using frequency-domain processing -- requiring no tokenizer, no attention mechanism, and only 733K parameters. Kathleen introduces three novel components: (1) RecurrentOscillatorBanks -- damped sinusoid convolutions with temporal memory for O(L) sequence processing; (2) an FFT-Rotate Wavetable Encoder that maps all 256 byte values using a single learnable vector (256 floats), replacing conventional embedding tables (65K parameters) while improving accuracy; (3) PhaseHarmonics -- a sinusoidal non-linearity with just 6 learnable phase parameters that our ablation identifies as the single most impactful component (+2.6% accuracy, <0.001% of model parameters). Through comprehensive ablation of a 1.8M-parameter predecessor, we show that frequency-domain components systematically outperform complex cognitive architectures: removing a 560K-parameter bio-inspired framework costs only -0.2%, while removing the 6-parameter PhaseHarmonics costs -2.6%. The resulting Kathleen-Clean achieves 88.6% on IMDB, 92.3% on AG News, and 83.3% on SST-2 -- outperforming a tokenized counterpart with 16x more parameters on IMDB (+1.6%) and AG News (+2.1%). Kathleen processes sequences in O(L) time and memory, enabling byte-level operation at sequence lengths where O(L^2) Transformers exhaust GPU memory.
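The abstract's most concrete claim is that a single learnable 256-float vector can replace a conventional 65K-parameter byte-embedding table. A minimal sketch of one plausible reading of the "FFT-Rotate Wavetable Encoder": treat the vector as a wavetable and give each byte value a distinct embedding by circularly rotating the table by that value (rotation in the time domain is equivalent to a phase multiplication in the FFT domain). The paper's exact mechanism is not specified in the abstract, so the function below is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# The only "embedding" parameters: a single 256-float learnable wavetable,
# versus a conventional 256 x d embedding table.
wavetable = rng.standard_normal(256)

def encode_bytes(data: bytes) -> np.ndarray:
    """Map each byte b in 0..255 to a 256-dim vector by circularly
    rotating the shared wavetable by b positions (np.roll).
    Hypothetical reading of the FFT-Rotate Wavetable Encoder."""
    return np.stack([np.roll(wavetable, b) for b in data])

x = encode_bytes("Kathleen".encode("utf-8"))
print(x.shape)  # (8, 256)
```

Each of the 256 rotations is a distinct vector, so the encoder still assigns every byte value a unique representation while storing only 256 floats.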
Source: arXiv: 2604.07969