arXiv submission date: 2026-04-15
📄 Abstract - Foresight Optimization for Strategic Reasoning in Large Language Models

Reasoning capabilities in large language models (LLMs) have advanced significantly. However, existing reasoning-based LLMs still struggle to make effective decisions in multi-agent environments, due to the absence of explicit foresight modeling. Strategic reasoning, the capability to anticipate a counterpart's behavior and foresee its possible future actions, is fundamental to effective decision-making in such environments, yet existing reasoning enhancement methods for LLMs do not explicitly capture this foresight. In this work, we introduce Foresight Policy Optimization (FoPO), which integrates opponent modeling principles into policy optimization, enabling an LLM to explicitly consider both its own interest and its counterpart's influence. Specifically, we construct two curated datasets, Cooperative RSA and Competitive Taboo, with well-designed rules and moderate difficulty to support a systematic investigation of FoPO in a self-play framework. Our experiments demonstrate that FoPO significantly enhances strategic reasoning across LLMs of varying sizes and origins. Moreover, models trained with FoPO generalize strongly to out-of-domain strategic scenarios, substantially outperforming standard LLM reasoning optimization baselines.
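The abstract's core idea, folding an opponent model into the reward a policy optimizes, can be sketched minimally as follows. This is an illustrative toy, not the paper's actual FoPO objective: the frequency-based `opponent_model`, the `payoff` table, and the 0.5 mixing weight are all assumptions for demonstration.

```python
import random

def opponent_model(history):
    """Hypothetical opponent model: predict the counterpart's next action
    as the one it has played most often so far (a frequency heuristic
    standing in for a learned model)."""
    if not history:
        return random.choice(["cooperate", "defect"])
    return max(set(history), key=history.count)

def foresight_reward(own_reward, predicted_action, payoff):
    """Shape the policy's reward with the anticipated counterpart move,
    so optimization accounts for both self-interest and opponent influence.
    The 0.5 mixing weight is an illustrative assumption."""
    return own_reward + 0.5 * payoff[predicted_action]

# Example: the opponent has mostly cooperated, so the shaped reward
# reflects an expected cooperative response.
payoff = {"cooperate": 1.0, "defect": -1.0}
history = ["cooperate", "cooperate", "defect"]
pred = opponent_model(history)          # "cooperate"
shaped = foresight_reward(2.0, pred, payoff)  # 2.0 + 0.5 * 1.0 = 2.5
```

In a self-play setup as described in the abstract, both agents would be instances of the same policy, and the shaped reward would feed a standard policy-gradient update instead of the raw self-interested reward.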

Top tags: llm agents model training
Detailed tags: strategic reasoning, foresight optimization, multi-agent environments, policy optimization, opponent modeling

Foresight Optimization for Strategic Reasoning in Large Language Models


1️⃣ One-Sentence Summary

This paper proposes a new method called Foresight Policy Optimization, which has a large language model not only weigh its own interest when making decisions but also predict and simulate an opponent's potential actions, significantly improving its strategic decision-making and generalization in cooperative and competitive multi-agent environments.

Source: arXiv:2604.13592