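Be Your Own Teacher: Steering Protein Language Models via Unsupervised Reward Optimization
1️⃣ One-sentence summary
This paper proposes a method that needs no human annotation or experimental feedback: a protein language model improves itself using its own generated samples and a built-in reward signal that combines model uncertainty with semantic consistency. When generating new protein sequences with specified functions, it approaches the performance of supervised methods, substantially lowering the cost of biomolecular design.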
Protein language models (PLMs) have emerged as powerful tools for controllable biomolecular design, yet their post-training adaptation typically relies on costly wet-lab validation or curated preference datasets. To overcome this supervision bottleneck, we introduce unsupervised reward optimization of PLMs, a comprehensive framework for steerable protein generation without ground-truth labels. Our key insight is that task-agnostic rewards, which combine intrinsic model uncertainty with extrinsic semantic consistency informed by protein representation models, exhibit strong correlation with controllability measures across base models and temperature regimes. Building upon this discovery, we propose two offline algorithms: Soft Reward Optimization (SRO) and Binarized Reward Optimization (BRO), which effectively maximize the classical RLHF objective induced by these proxy rewards. Extensive experiments on compositional out-of-distribution prompts demonstrate that both methods significantly outperform competitive baselines (DPO, KTO), while approaching oracle performance across multiple sampling temperatures, model scales and protein families. Moreover, PLMs fine-tuned with unsupervised rewards can achieve consistently higher coverage compared to their base model in pass@k evaluations. By enabling self-improvement of PLMs through their own generated experience, our framework provides a scalable pathway toward controllable biomolecular design in settings where labeled preferences or experimental feedback are scarce or unavailable.
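The abstract reports coverage in pass@k evaluations without saying how pass@k is computed. As a minimal sketch, assuming the widely used unbiased estimator (draw n samples per prompt, count the c that satisfy the target property, and estimate the chance that at least one of k samples succeeds), the calculation could look like the following. The function names `pass_at_k` and `coverage` and the per-prompt success counts are illustrative assumptions, not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c were successful.

    Assumption: this is the standard unbiased estimator
    1 - C(n - c, k) / C(n, k); the paper may use a different protocol.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when there are fewer than k failures,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def coverage(success_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over prompts (hypothetical helper).

    success_counts[i] is the number of successful samples among
    the n samples drawn for prompt i.
    """
    if not success_counts:
        raise ValueError("success_counts must not be empty")
    return sum(pass_at_k(n, c, k) for c in success_counts) / len(success_counts)
```

Under this reading, comparing `coverage(...)` for a fine-tuned model and its base model at the same n and k is one way to express the "higher coverage" claim; what counts as a successful sample is task-specific and is not defined in this excerpt.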
Source: arXiv: 2606.18961