Evaluating LLM-Based Grant Proposal Review via Structured Perturbations
1️⃣ One-Sentence Summary
This study tests large language models' ability to review research grant proposals by systematically perturbing proposal content. It finds that section-by-section review works best, but models across the board are good at checking format compliance while struggling to assess overall quality and clarity, so for now they can serve only as an aid to human review.
As AI-assisted grant proposals outpace manual review capacity in a kind of "Malthusian trap" for the research ecosystem, this paper investigates the capabilities and limitations of LLM-based grant reviewing for high-stakes evaluation. Using six EPSRC proposals, we develop a perturbation-based framework probing LLM sensitivity across six quality axes: funding, timeline, competency, alignment, clarity, and impact. We compare three review architectures: single-pass review, section-by-section analysis, and a "Council of Personas" ensemble emulating expert panels. The section-level approach significantly outperforms alternatives in both detection rate and scoring reliability, while the computationally expensive council method performs no better than baseline. Detection varies substantially by perturbation type, with alignment issues readily identified but clarity flaws largely missed by all systems. Human evaluation shows LLM feedback is largely valid but skewed toward compliance checking over holistic assessment. We conclude that current LLMs may provide supplementary value within EPSRC review but exhibit high variability and misaligned review priorities. We release our code and any non-protected data.
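To make the perturbation protocol concrete, below is a minimal Python sketch of how such an evaluation loop might be structured. This is not the authors' released code: the `call_llm` stub, the prompt wording, and the keyword-based detection heuristic are all illustrative assumptions (the paper relies on human-validated judgments rather than string matching).

```python
from dataclasses import dataclass

# The six quality axes probed by the paper's perturbation framework.
AXES = ["funding", "timeline", "competency", "alignment", "clarity", "impact"]

@dataclass
class Perturbation:
    axis: str         # which quality axis the injected flaw targets
    section: str      # proposal section the flaw is injected into
    flawed_text: str  # replacement text containing the injected flaw

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; wire up a real client here."""
    raise NotImplementedError("replace with your LLM provider")

def review_section_by_section(sections: dict[str, str]) -> dict[str, str]:
    """Section-level architecture: review each section independently.
    The paper found this outperforms single-pass and persona-ensemble review."""
    return {
        name: call_llm(
            f"You are a grant reviewer. Critique the '{name}' section of this "
            f"EPSRC proposal, noting any flaws in funding, timeline, competency, "
            f"alignment, clarity, or impact:\n\n{text}"
        )
        for name, text in sections.items()
    }

def detected(review_text: str, axis: str) -> bool:
    """Crude detection check (illustrative only): does the review mention
    the perturbed axis at all?"""
    return axis.lower() in review_text.lower()

def run_probe(sections: dict[str, str], p: Perturbation) -> bool:
    """Inject one structured perturbation and test whether the review flags it."""
    perturbed = dict(sections, **{p.section: p.flawed_text})
    reviews = review_section_by_section(perturbed)
    return detected(reviews[p.section], p.axis)
```

Aggregating `run_probe` outcomes per axis is what surfaces the paper's headline contrast: high detection for alignment perturbations, low detection for clarity perturbations.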
Source: arXiv:2603.08281