arXiv submission date: 2026-03-16
📄 Abstract - Training-Free Generation of Protein Sequences from Small Family Alignments via Stochastic Attention

Most protein families have fewer than 100 known members, a regime where deep generative models overfit or collapse. We propose stochastic attention (SA), a training-free sampler that treats the modern Hopfield energy over a protein alignment as a Boltzmann distribution and draws samples via Langevin dynamics. The score function is a closed-form softmax attention operation requiring no training, no pretraining data, and no GPU, with cost linear in alignment size. Across eight Pfam families, SA generates sequences with low amino acid compositional divergence, substantial novelty, and structural plausibility confirmed by ESMFold and AlphaFold2. Generated sequences fold more faithfully to canonical family structures than natural members in six of eight families. Against profile HMMs, EvoDiff, and the MSA Transformer, which produce sequences that drift far outside the family, SA maintains 51 to 66 percent identity while remaining novel, in seconds on a laptop. The critical temperature governing generation is predicted from PCA dimensionality alone, enabling fully automatic operation. Controls confirm SA encodes correlated substitution patterns, not just per-position amino acid frequencies.
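The abstract describes the sampler as Langevin dynamics on a Boltzmann distribution defined by the modern Hopfield energy over the alignment, with a score function that reduces to a closed-form softmax attention operation. A minimal sketch of that idea in NumPy is below. This is not the paper's implementation: it works in a continuous embedding space over stacked family vectors, and the decoding back to amino acid sequences, the PCA-based choice of temperature, and all parameter names (`beta`, `eps`, `steps`) are assumptions for illustration.

```python
import numpy as np

def hopfield_score(x, patterns, beta):
    # Score (negative energy gradient) of the modern Hopfield energy
    #   E(x) = -(1/beta) * logsumexp(beta * patterns @ x) + 0.5 * ||x||^2,
    # which is a softmax attention readout over the family minus the state.
    a = beta * patterns @ x
    w = np.exp(a - a.max())
    w /= w.sum()                      # attention weights over family members
    return patterns.T @ w - x

def langevin_sample(patterns, T=1.0, beta=1.0, eps=0.01, steps=2000, rng=None):
    # Unadjusted Langevin dynamics targeting p(x) ∝ exp(-E(x) / T).
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(patterns.shape[1])
    for _ in range(steps):
        x += eps * hopfield_score(x, patterns, beta) / T   # drift toward the family
        x += np.sqrt(2 * eps) * rng.standard_normal(x.shape)  # thermal noise
    return x
```

Each drift step costs one matrix-vector product against the alignment, which is the linear-in-alignment-size, training-free property the abstract claims; the temperature `T` controls how tightly samples cluster around stored family members.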

Top-level tags: biology, model training, machine learning
Detailed tags: protein sequence generation, training-free sampling, stochastic attention, Langevin dynamics, generative models

Training-Free Generation of Protein Sequences from Small Family Alignments via Stochastic Attention


1️⃣ One-sentence summary

This work proposes "stochastic attention", a method that requires no training and no large datasets, and can rapidly generate structurally plausible, novel protein sequences from only a small number of known members of a protein family.

Source: arXiv 2603.14717