Computational Modeling of Antibody-Antigen Complexes: PLM-Based and MSA-Based Approaches
1️⃣ One-Sentence Summary
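This work investigates why existing computational models perform poorly at predicting the structures of antibody-antigen complexes and proposes two improvement strategies: first, using protein language models to improve structure prediction accuracy for the antibody itself; second, refining the multiple sequence alignment and the recycling procedure to substantially improve the reliability of complex prediction without modifying model parameters.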
Antibodies play a central role in the immune response by specifically recognizing and neutralizing antigens, and therapeutic antibodies have become a major drug class for cancer and autoimmune diseases. However, antibody discovery still relies on extensive in vitro screening; accurate computational modeling of antibody structures and antibody-antigen interactions can prioritize candidates, reduce the experimental burden, and accelerate rational design. Despite recent advances in high-accuracy prediction of protein structures and complexes, antibody-related tasks still lag behind general protein-protein interactions, which limits downstream design. This thesis investigates why antibody-related tasks are harder and proposes improvements along two complementary directions. First, we investigate protein language model (PLM)-based methods for antibody and antibody-antigen structure prediction. Using embeddings from multiple PLMs, our approach achieves the best CDR-H3 accuracy among the compared PLM-based methods on antibody monomer prediction. However, this approach does not generalize to complex prediction: without co-evolutionary signals between antibody and antigen, single-sequence PLM representations do not reliably identify binding interfaces. Second, we develop two MSA-based interventions for antibody-antigen complex prediction: MSA refinement, which combines CDR-focused filtering with depth recovery from a larger sequence database, and convergence-aware recycling, which selects a stable intermediate recycle state for final diffusion sampling. Together, these interventions provide consistent gains over the AlphaFold3 baseline on a held-out antibody-antigen test set. Because the methods modify MSA construction and recycling behavior rather than model parameters, they can be applied without retraining or access to model weights.
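The abstract describes the two MSA-based interventions only at a high level, so the following is a minimal Python sketch under stated assumptions, not the thesis's implementation. The names `cdr_identity`, `refine_msa`, and `select_recycle_state` are hypothetical. It assumes that CDR-focused filtering keeps aligned hits whose identity to the query over the CDR columns meets a threshold, that depth recovery applies the same filter to hits from a larger database until a target depth is reached, and that convergence is measured as the mean change in pairwise residue distances between consecutive recycles. It does not call AlphaFold3 or read its outputs.

```python
import math
from itertools import combinations


# --- MSA refinement (hypothetical sketch) ---------------------------------
# Assumption: "CDR-focused filtering" = keep aligned hits whose identity to
# the query over the CDR columns is at least min_identity.
def cdr_identity(query, hit, cdr_cols):
    """Fraction of CDR columns where an aligned hit matches the query (gaps count as mismatches)."""
    matches = sum(1 for i in cdr_cols if hit[i] != "-" and hit[i] == query[i])
    return matches / len(cdr_cols)


def refine_msa(query, primary_hits, extra_hits, cdr_cols, min_identity, target_depth):
    """Filter primary hits on CDR identity, then top up depth from a larger database.

    Assumption: hits from the larger database pass the same CDR filter.
    """
    kept = [h for h in primary_hits if cdr_identity(query, h, cdr_cols) >= min_identity]
    for h in extra_hits:
        if len(kept) >= target_depth:
            break
        if h not in kept and cdr_identity(query, h, cdr_cols) >= min_identity:
            kept.append(h)
    return kept


# --- Convergence-aware recycling (hypothetical sketch) --------------------
def pairwise_distances(coords):
    """All pairwise Euclidean distances between residue coordinates (rotation-invariant)."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def select_recycle_state(recycle_coords, tol):
    """Return the index of the first recycle whose structure changed by less than `tol`
    (mean absolute change in pairwise distances) relative to the previous recycle.
    Falls back to the final recycle if no step converges. Needs >= 2 residues."""
    prev = pairwise_distances(recycle_coords[0])
    for i in range(1, len(recycle_coords)):
        cur = pairwise_distances(recycle_coords[i])
        delta = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if delta < tol:
            return i
        prev = cur
    return len(recycle_coords) - 1


if __name__ == "__main__":
    # Toy aligned sequences; CDR columns 1-3 are illustrative only.
    print(refine_msa("ARDYW", ["ARDYW", "AKDYW", "GGGYW"], ["SRDYF", "ARD-W"],
                     cdr_cols=[1, 2, 3], min_identity=0.6, target_depth=3))

    # Toy 3-residue "structures" from four recycles.
    r0 = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
    r1 = [(0, 0, 0), (2, 0, 0), (4, 0, 0)]
    r2 = [(0, 0, 0), (2.1, 0, 0), (4.1, 0, 0)]
    r3 = [(0, 0, 0), (2.1, 0, 0), (4.2, 0, 0)]
    print(select_recycle_state([r0, r1, r2, r3], tol=0.5))
```

In the toy run, `"GGGYW"` is dropped for low CDR identity and `"SRDYF"` is pulled in from the larger set to reach the target depth; recycle 2 is selected because it is the first state that barely differs from its predecessor, so the later recycle 3 is never used. Pairwise distances are compared instead of raw coordinates so that no structural superposition is needed between recycles.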
Source: arXiv: 2605.28886