arXiv submission date: 2026-02-02
📄 Abstract - LLM-based Embeddings: Attention Values Encode Sentence Semantics Better Than Hidden States

Sentence representations are foundational to many Natural Language Processing (NLP) applications. While recent methods leverage Large Language Models (LLMs) to derive sentence representations, most rely on final-layer hidden states, which are optimized for next-token prediction and thus often fail to capture global, sentence-level semantics. This paper introduces a novel perspective, demonstrating that attention value vectors capture sentence semantics more effectively than hidden states. We propose Value Aggregation (VA), a simple method that pools token values across multiple layers and token indices. In a training-free setting, VA outperforms other LLM-based embeddings, and even matches or surpasses the ensemble-based MetaEOL. Furthermore, we demonstrate that, when paired with suitable prompts, attention layer outputs can be interpreted as aligned weighted value vectors. Specifically, the attention scores of the last token function as the weights, while the output projection matrix ($W_O$) aligns these weighted value vectors with the common space of the LLM residual stream. This refined method, termed Aligned Weighted VA (AlignedWVA), achieves state-of-the-art performance among training-free LLM-based embeddings, outperforming the high-cost MetaEOL by a substantial margin. Finally, we highlight the potential of obtaining strong LLM embedding models through fine-tuning Value Aggregation.
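The core of Value Aggregation, as described in the abstract, is to pool per-token attention value vectors across layers and token positions instead of reading off the final-layer hidden state. The sketch below illustrates one way to do this with forward hooks in PyTorch; it assumes a LLaMA-style decoder whose attention modules expose a `v_proj` linear layer, and the model name, layer range, and mean pooling are illustrative assumptions rather than the paper's exact configuration.

```python
# A minimal sketch of Value Aggregation (VA): capture value vectors at every
# layer with forward hooks, then mean-pool over layers and token positions.
# Module paths (model.model.layers[i].self_attn.v_proj) are LLaMA-style and
# may differ for other architectures.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any LLaMA-style model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

captured_values = []  # one tensor of shape [batch, seq_len, value_dim] per layer

def make_hook(store):
    def hook(module, inputs, output):
        # `output` holds the value projections of every token at this layer
        store.append(output.detach())
    return hook

# Register a forward hook on each layer's value projection
handles = [
    layer.self_attn.v_proj.register_forward_hook(make_hook(captured_values))
    for layer in model.model.layers
]

def value_aggregation(sentence: str) -> torch.Tensor:
    """Pool attention value vectors across layers and token positions."""
    captured_values.clear()
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        model(**inputs)
    # Stack captured values: [num_layers, batch, seq_len, value_dim]
    values = torch.stack(captured_values, dim=0)
    # Mean-pool over layers and tokens -> [value_dim] sentence embedding
    return values.mean(dim=(0, 2)).squeeze(0)

embedding = value_aggregation("A quick example sentence.")
print(embedding.shape)
```

AlignedWVA, per the abstract, would go one step further: instead of uniform mean pooling, each token's value vectors are weighted by the last token's attention scores and then projected through the layer's output matrix $W_O$ so they live in the residual-stream space; that per-head weighting is omitted from the sketch above for brevity.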

Top-level tags: llm, natural language processing, model evaluation
Detailed tags: sentence embeddings, attention mechanisms, value aggregation, training-free methods, semantic representation

LLM-based Embeddings: Attention Values Encode Sentence Semantics Better Than Hidden States


1️⃣ One-Sentence Summary

This paper finds that the attention value vectors extracted from a large language model's attention mechanism capture the overall meaning of a sentence better than the conventionally used final-layer hidden states, and proposes a simple, effective aggregation method that achieves state-of-the-art sentence representations without any additional training.

Source: arXiv:2602.01572