Avey-B: An Efficient Non-Autoregressive Bidirectional Encoder
1️⃣ One-Sentence Summary
This paper adapts Avey, a model originally designed for autoregressive tasks, into an efficient bidirectional encoder called Avey-B. By introducing decoupled parameterizations, stability-oriented optimizations, and neural compression, Avey-B outperforms conventional Transformer encoders on a range of text-understanding tasks while handling long texts more efficiently.
Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.
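The abstract names the key ingredients of the encoder-only reformulation without spelling out their implementation. Below is a minimal, hypothetical sketch (not Avey-B's actual layer; the module name, gating scheme, and use of a depthwise convolution are all assumptions for illustration) of what an attention-free, bidirectional token mixer with decoupled static and dynamic parameterizations could look like:

```python
import torch
import torch.nn as nn

class BidirectionalMixerSketch(nn.Module):
    """Hypothetical sketch, not Avey-B's actual architecture.

    Illustrates two ideas named in the abstract:
      * bidirectional (non-causal) contextualization, here via a 1D
        convolution with symmetric padding so each token sees both its
        left and right neighbours;
      * decoupled static and dynamic parameterizations, here a learned
        input-independent convolution (static) modulated by an
        input-dependent gate (dynamic).
    """

    def __init__(self, dim: int, kernel_size: int = 7):
        super().__init__()
        assert kernel_size % 2 == 1, "odd kernel keeps padding symmetric"
        # Static path: weights are fixed after training, shared across inputs.
        self.static_mix = nn.Conv1d(
            dim, dim, kernel_size, padding=kernel_size // 2, groups=dim
        )
        # Dynamic path: per-token gate computed from the input itself.
        self.dynamic_gate = nn.Linear(dim, dim)
        # Stand-in for the paper's stability-oriented normalization.
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        h = self.norm(x)
        # Non-causal mixing: symmetric padding lets information flow from
        # both directions; an autoregressive variant would pad only on the left.
        mixed = self.static_mix(h.transpose(1, 2)).transpose(1, 2)
        gate = torch.sigmoid(self.dynamic_gate(h))
        return x + gate * mixed  # residual combination of the two paths


# Usage: contextualize a batch of token embeddings bidirectionally.
x = torch.randn(2, 32, 64)               # (batch, seq_len, dim)
y = BidirectionalMixerSketch(dim=64)(x)  # same shape, bidirectional context
```

The contrast with the autoregressive original is the padding/masking choice: an encoder-only model can let each position attend to (or mix with) context on both sides, which is what enables the bidirectional contextualization the abstract emphasizes.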
Source: arXiv: 2602.15814