LLM-Based Scientific Equation Discovery via Physics-Informed Token-Regularized Policy Optimization
1️⃣ One-sentence summary
This paper proposes a new method called PiT-PO, which uses reinforcement learning to train a large language model to automatically discover mathematical equations from data that are both physically consistent and structurally concise, achieving strong performance on challenging problems such as fluid dynamics.
Symbolic regression aims to distill mathematical equations from observational data. Recent approaches have successfully leveraged Large Language Models (LLMs) to generate equation hypotheses, capitalizing on their vast pre-trained scientific priors. However, existing frameworks predominantly treat the LLM as a static generator, relying on prompt-level guidance to steer exploration. This paradigm fails to update the model's internal representations based on search feedback, often yielding physically inconsistent or mathematically redundant expressions. In this work, we propose PiT-PO (Physics-informed Token-regularized Policy Optimization), a unified framework that evolves the LLM into an adaptive generator via reinforcement learning. Central to PiT-PO is a dual-constraint mechanism that rigorously enforces hierarchical physical validity while simultaneously applying fine-grained, token-level penalties to suppress redundant structures. Consequently, PiT-PO aligns the LLM to produce equations that are both scientifically consistent and structurally parsimonious. Empirically, PiT-PO achieves state-of-the-art performance on standard benchmarks and successfully discovers novel turbulence models for challenging fluid dynamics problems. We also demonstrate that PiT-PO empowers small-scale models to outperform closed-source giants, democratizing access to high-performance scientific discovery.
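To make the dual-constraint idea concrete, here is a minimal, hypothetical sketch (not the paper's actual implementation) of how a reward signal might combine a hard physical-validity gate with fine-grained token-level penalties for redundant structure. The function names, patterns, and penalty weight `lam` are illustrative assumptions.

```python
import re

# Illustrative redundancy patterns; a real system would analyze the
# expression tree rather than match strings.
REDUNDANT_PATTERNS = [
    r"\*\s*1\b",             # multiplying by 1
    r"\+\s*0\b",             # adding 0
    r"\b(\w+)\s*-\s*\1\b",   # a term subtracted from itself, e.g. x - x
]

def token_reward(expr: str, fit_score: float, physically_valid: bool,
                 lam: float = 0.1) -> float:
    """Return the fit score minus per-token redundancy penalties.

    Physically invalid equations are gated to zero reward (the
    'hierarchical physical validity' constraint); valid ones lose
    lam per redundant token pattern (the token-level regularizer).
    """
    if not physically_valid:
        return 0.0
    penalty = sum(len(re.findall(p, expr)) for p in REDUNDANT_PATTERNS)
    return fit_score - lam * penalty

# A redundant candidate is penalized relative to an equivalent clean one,
# steering the policy toward parsimonious expressions.
clean = token_reward("a * x + b", fit_score=0.9, physically_valid=True)
noisy = token_reward("a * x * 1 + b + 0", fit_score=0.9, physically_valid=True)
```

In an actual RL fine-tuning loop, a reward of this shape would be computed per sampled equation and fed to a policy-gradient update of the LLM; the per-token granularity is what distinguishes it from a single equation-level complexity penalty.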
Source: arXiv: 2602.10576