arXiv submission date: 2026-03-02
📄 Abstract - Agentic Code Reasoning

Can LLM agents explore codebases and reason about code semantics without executing the code? We study this capability, which we call agentic code reasoning, and introduce semi-formal reasoning: a structured prompting methodology that requires agents to construct explicit premises, trace execution paths, and derive formal conclusions. Unlike unstructured chain-of-thought, semi-formal reasoning acts as a certificate: the agent cannot skip cases or make unsupported claims. We evaluate across three tasks (patch equivalence verification, fault localization, and code question answering) and show that semi-formal reasoning consistently improves accuracy on all of them. For patch equivalence, accuracy improves from 78% to 88% on curated examples and reaches 93% on real-world agent-generated patches, approaching the reliability needed for execution-free RL reward signals. For code question answering on RubberDuckBench Mohammad et al. (2026), semi-formal reasoning achieves 87% accuracy. For fault localization on Defects4J Just et al. (2014), semi-formal reasoning improves Top-5 accuracy by 5 percentage points over standard reasoning. These results demonstrate that structured agentic reasoning enables meaningful semantic code analysis without execution, opening practical applications in RL training pipelines, code review, and static program analysis.
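The abstract describes semi-formal reasoning as requiring three explicit parts: premises, a trace of execution paths, and a formal conclusion, which together act as a certificate that the agent cannot skip cases. The paper does not publish a concrete schema, so the following Python sketch is purely illustrative: the class name, fields, and completeness check are assumptions about what such a certificate might look like.

```python
from dataclasses import dataclass, field

@dataclass
class SemiFormalCertificate:
    """Hypothetical container for a semi-formal reasoning certificate.

    The three required parts come from the abstract's description;
    the concrete structure here is an illustrative assumption, not
    the paper's actual format.
    """
    premises: list[str] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)  # one entry per traced case
    conclusion: str = ""

    def is_complete(self) -> bool:
        # A certificate with any empty section is rejected, so the
        # agent cannot jump straight to a verdict without showing work.
        return bool(self.premises) and bool(self.trace) and bool(self.conclusion)

# Example: judging whether two patches are semantically equivalent.
cert = SemiFormalCertificate(
    premises=["P1: both patches guard against an empty input list"],
    trace=[
        "Case input == []: both patches return 0",
        "Case input non-empty: both patches sum elements in the same loop",
    ],
    conclusion="EQUIVALENT",
)
print(cert.is_complete())  # True
```

The point of the completeness check is the "certificate" property the abstract emphasizes: an unstructured chain-of-thought can assert a verdict directly, whereas this structure forces every case to appear in the trace before a conclusion counts.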

Top-level tags: llm agents, model evaluation
Detailed tags: code reasoning, program analysis, structured prompting, semantic analysis, agent evaluation

Agentic Code Reasoning


1️⃣ One-Sentence Summary

This paper proposes a structured prompting method called "semi-formal reasoning" that enables LLM agents to explore codebases and understand code semantics without actually executing the code. By constructing explicit premises, tracing execution paths, and deriving formal conclusions, the method significantly improves accuracy across multiple code analysis tasks.

Source: arXiv 2603.01896