arXiv submission date: 2026-03-16
📄 Abstract - Training-Free Generation of Protein Sequences from Small Family Alignments via Stochastic Attention

Most protein families have fewer than 100 known members, a regime where deep generative models overfit or collapse. We propose stochastic attention (SA), a training-free sampler that treats the modern Hopfield energy over a protein alignment as a Boltzmann distribution and draws samples via Langevin dynamics. The score function is a closed-form softmax attention operation requiring no training, no pretraining data, and no GPU, with cost linear in alignment size. Across eight Pfam families, SA generates sequences with low amino acid compositional divergence, substantial novelty, and structural plausibility confirmed by ESMFold and AlphaFold2. Generated sequences fold more faithfully to canonical family structures than natural members in six of eight families. Against profile HMMs, EvoDiff, and the MSA Transformer, which produce sequences that drift far outside the family, SA maintains 51 to 66 percent identity while remaining novel, in seconds on a laptop. The critical temperature governing generation is predicted from PCA dimensionality alone, enabling fully automatic operation. Controls confirm SA encodes correlated substitution patterns, not just per-position amino acid frequencies.
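The abstract describes the sampler as Langevin dynamics on a Boltzmann distribution defined by the modern Hopfield energy over the alignment, with a score function that reduces to a closed-form softmax attention operation. A minimal sketch of that idea in NumPy is below. This is not the paper's implementation: it works in a continuous embedding space over stacked family vectors, and the decoding back to amino acid sequences, the PCA-based choice of temperature, and all parameter names (`beta`, `eps`, `steps`) are assumptions for illustration.

```python
import numpy as np

def hopfield_score(x, patterns, beta):
    # Score (negative energy gradient) of the modern Hopfield energy
    #   E(x) = -(1/beta) * logsumexp(beta * patterns @ x) + 0.5 * ||x||^2,
    # which is a softmax attention readout over the family minus the state.
    a = beta * patterns @ x
    w = np.exp(a - a.max())
    w /= w.sum()                      # attention weights over family members
    return patterns.T @ w - x

def langevin_sample(patterns, T=1.0, beta=1.0, eps=0.01, steps=2000, rng=None):
    # Unadjusted Langevin dynamics targeting p(x) ∝ exp(-E(x) / T).
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(patterns.shape[1])
    for _ in range(steps):
        x += eps * hopfield_score(x, patterns, beta) / T   # drift toward the family
        x += np.sqrt(2 * eps) * rng.standard_normal(x.shape)  # thermal noise
    return x
```

Each drift step costs one matrix-vector product against the alignment, which is the linear-in-alignment-size, training-free property the abstract claims; the temperature `T` controls how tightly samples cluster around stored family members.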

Top-level tags: biology, model training, machine learning
Detailed tags: protein sequence generation, training-free sampling, stochastic attention, Langevin dynamics, generative models

Training-Free Generation of Protein Sequences from Small Family Alignments via Stochastic Attention


1️⃣ One-sentence summary

This work proposes "stochastic attention", a method that requires no training and no large datasets, and can rapidly generate structurally plausible, novel protein sequences from only a small number of known members of a protein family.

Source: arXiv 2603.14717