NeuronSpark: A Spiking Neural Network Language Model with Selective State Space Dynamics
1️⃣ One-Sentence Summary
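This paper presents NeuronSpark, a pure spiking neural network (SNN) language model. Using a set of new techniques, it shows for the first time that a pure SNN architecture, trained from random initialization and without Transformer distillation, can achieve promising results on large-scale language modeling.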
We ask whether a pure spiking backbone can learn large-scale language modeling from random initialization, without Transformer distillation. We introduce NeuronSpark, a 0.9B-parameter SNN language model trained with next-token prediction and surrogate gradients. The model combines selective state-space spiking dynamics, leakage-current inter-layer communication, PonderNet adaptive timesteps, fused Triton PLIF kernels, and stabilization techniques (residual centering, lateral-inhibition normalization, and natural-gradient compensation). Under a constrained budget (about 1.4B pretraining tokens and 6.5K SFT steps), NeuronSpark-0.9B reaches 3.6 pretraining loss and shows early multi-turn dialogue behavior after SFT. These results support the feasibility of end-to-end language modeling with a pure SNN architecture at this scale.
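For readers unfamiliar with the PLIF neurons and surrogate gradients mentioned in the abstract, the following is a minimal sketch of a generic parametric leaky integrate-and-fire neuron in plain Python. It is not NeuronSpark's fused Triton kernel, and it omits the selective state-space dynamics. The names `plif_forward` and `surrogate_grad`, the hard reset, the sigmoid-shaped surrogate, and all default values are assumptions chosen for illustration.

```python
import math


def plif_forward(inputs, w=0.0, v_threshold=1.0, v_reset=0.0):
    """Simulate one parametric LIF neuron over a sequence of input currents.

    The leak factor is sigmoid(w), so `w` plays the role of the learnable
    parameter (hypothetical name). Returns (spikes, potentials), where
    potentials holds the membrane potential before any reset.
    """
    decay = 1.0 / (1.0 + math.exp(-w))  # always in (0, 1)
    v = v_reset
    spikes, potentials = [], []
    for x in inputs:
        # Leaky integration: pull v toward (v_reset + x) at rate `decay`.
        v = v + decay * (x - (v - v_reset))
        potentials.append(v)
        fired = 1 if v >= v_threshold else 0
        spikes.append(fired)
        if fired:
            v = v_reset  # hard reset after a spike (an assumption)
    return spikes, potentials


def surrogate_grad(v_minus_threshold, alpha=4.0):
    """Derivative of sigmoid(alpha * x), used in the backward pass in place
    of the Heaviside step's derivative, which is zero almost everywhere."""
    s = 1.0 / (1.0 + math.exp(-alpha * v_minus_threshold))
    return alpha * s * (1.0 - s)


if __name__ == "__main__":
    spikes, potentials = plif_forward([1.5, 1.5, 1.5, 1.5])
    print(spikes)      # spike train for a constant input
    print(potentials)  # membrane potential before reset at each step
```

The forward pass emits binary spikes, which are not differentiable. Training with next-token prediction therefore relies on a smooth stand-in such as `surrogate_grad` during backpropagation.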
Source: arXiv: 2603.16148