arXiv submission date: 2025-12-22
📄 Abstract - Brain-Grounded Axes for Reading and Steering LLM States

Interpretability methods for large language models (LLMs) typically derive directions from textual supervision, which can lack external grounding. We propose using human brain activity not as a training signal but as a coordinate system for reading and steering LLM states. Using the SMN4Lang MEG dataset, we construct a word-level brain atlas of phase-locking value (PLV) patterns and extract latent axes via ICA. We validate axes with independent lexica and NER-based labels (POS/log-frequency used as sanity checks), then train lightweight adapters that map LLM hidden states to these brain axes without fine-tuning the LLM. Steering along the resulting brain-derived directions yields a robust lexical (frequency-linked) axis in a mid TinyLlama layer, surviving perplexity-matched controls, and a brain-vs-text probe comparison shows larger log-frequency shifts (relative to the text probe) with lower perplexity for the brain axis. A function/content axis (axis 13) shows consistent steering in TinyLlama, Qwen2-0.5B, and GPT-2, with PPL-matched text-level corroboration. Layer-4 effects in TinyLlama are large but inconsistent, so we treat them as secondary (Appendix). Axis structure is stable when the atlas is rebuilt without GPT embedding-change features or with word2vec embeddings (|r|=0.64-0.95 across matched axes), reducing circularity concerns. Exploratory fMRI anchoring suggests potential alignment for embedding change and log frequency, but effects are sensitive to hemodynamic modeling assumptions and are treated as population-level evidence only. These results support a new interface: neurophysiology-grounded axes provide interpretable and controllable handles for LLM behavior.
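The pipeline in the abstract — build a word-level atlas of phase-locking value (PLV) patterns from MEG, then extract latent axes with ICA — can be sketched as follows. This is a minimal illustration of the two standard techniques named in the abstract (PLV via the Hilbert analytic phase, axis extraction via FastICA), not the authors' code; the data shapes, channel counts, and component count are illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert
from sklearn.decomposition import FastICA

def plv(x, y):
    """Phase-locking value between two band-passed signals (0 = no locking, 1 = perfect)."""
    phase_diff = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * phase_diff)))

# Toy stand-in for word-aligned MEG epochs: (words, channels, time samples).
rng = np.random.default_rng(0)
n_words, n_channels, n_times = 200, 8, 128
meg = rng.standard_normal((n_words, n_channels, n_times))

# One PLV feature vector per word: PLV over all channel pairs.
pairs = [(i, j) for i in range(n_channels) for j in range(i + 1, n_channels)]
atlas = np.array([[plv(epoch[i], epoch[j]) for i, j in pairs] for epoch in meg])

# Latent "brain axes": independent components of the word-level PLV atlas.
ica = FastICA(n_components=5, random_state=0, whiten="unit-variance")
axes = ica.fit_transform(atlas)  # shape: (n_words, 5) — one coordinate per axis per word
print(axes.shape)
```

Each row of `axes` gives a word's coordinates along the latent brain axes; in the paper these coordinates are what the independent lexica and NER-based labels are validated against.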

Top-level tags: llm natural language processing theory
Detailed tags: brain-computer interface model interpretability neural representation model steering lightweight adapter

Brain-Grounded Axes for Reading and Steering LLM States


1️⃣ One-sentence summary

This work proposes using human brain activity (MEG data), rather than textual supervision, as an external, stable coordinate system for a large language model (LLM): lightweight adapters are trained to map LLM hidden states onto these "brain axes", enabling interpretable and controllable steering of model behavior without fine-tuning the model itself.


2️⃣ Key contributions

1. A coordinate system built from an atlas of human brain activity

2. Lightweight adapter mapping and steering without fine-tuning

3. Robust, cross-model validation of the brain axes

4. Brain-axis steering improves fluency over a text-only probe

5. Rigorous axis validation and ablation analysis


3️⃣ Main results and value

Result highlights

Practical value


4️⃣ Glossary

Source: arXiv:2512.19399