ARA: Agentic Reproducibility Assessment For Scalable Support Of Scientific Peer-Review
1️⃣ One-Sentence Summary
This paper proposes an AI method named ARA that automatically extracts a study's workflow from the paper (e.g., data sources, methods, and experimental steps) and, like a human reviewer, judges whether that workflow can be reproduced, greatly scaling the reach and efficiency of reproducibility checks in peer review.
Scientific peer review increasingly struggles to assess reproducibility at the scale and complexity of modern research output. Evaluating reproducibility requires reconstructing experimental dependencies, methodological choices, data flows, and result-generating procedures, which often exceeds what human reviewers can feasibly provide. Agentic Reproducibility Assessment (ARA) formalizes reproducibility assessment as a structured reasoning task over scientific documents. Given a paper, ARA extracts a directed workflow graph linking sources, methods, experiments, and outputs, then evaluates its reconstructability using structural and content-based scores. Experiments on 213 ReScience C articles - the largest cross-domain benchmark of human-validated computational reproducibility studies considered to date - demonstrate ARA's generalizability and its consistent workflow reconstruction and assessment across LLMs, model temperatures, and scientific domains. ARA achieves ~61% accuracy on three benchmarks, including the highest accuracy reported on ReproBench (60.71% vs. 36.84%) and GoldStandardDB (61.68% vs. 43.56%), highlighting its potential to complement human review at scale and enable next-generation peer review. Code and data available: this https URL.
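The abstract describes ARA as extracting a directed workflow graph (sources, methods, experiments, outputs) and scoring how reconstructable it is with structural and content-based measures. The sketch below is only an illustration of what such a graph and scoring could look like in Python; the node types, field names, and the weighted score are assumptions made for exposition, not the authors' implementation.

```python
# Illustrative sketch only: the paper does not specify ARA's data model, so the
# node types, evidence fields, and scoring formula here are hypothetical.
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    SOURCE = "source"          # e.g., datasets, code repositories
    METHOD = "method"          # e.g., models, algorithms
    EXPERIMENT = "experiment"  # e.g., training or evaluation runs
    OUTPUT = "output"          # e.g., tables, figures, reported metrics


@dataclass
class WorkflowNode:
    node_id: str
    kind: NodeType
    description: str
    # Hypothetical content-based evidence extracted from the paper text.
    has_access_info: bool = False   # e.g., a dataset link or repo URL is given
    has_parameters: bool = False    # e.g., hyperparameters are reported


@dataclass
class WorkflowGraph:
    nodes: dict[str, WorkflowNode] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)  # directed (src, dst)

    def structural_score(self) -> float:
        """Fraction of output nodes reachable from at least one source node."""
        adj: dict[str, list[str]] = {}
        for src, dst in self.edges:
            adj.setdefault(src, []).append(dst)
        stack = [n.node_id for n in self.nodes.values() if n.kind is NodeType.SOURCE]
        reachable: set[str] = set()
        while stack:
            cur = stack.pop()
            if cur in reachable:
                continue
            reachable.add(cur)
            stack.extend(adj.get(cur, []))
        outputs = [n for n in self.nodes.values() if n.kind is NodeType.OUTPUT]
        if not outputs:
            return 0.0
        return sum(o.node_id in reachable for o in outputs) / len(outputs)

    def content_score(self) -> float:
        """Fraction of nodes documented well enough to redo the step."""
        if not self.nodes:
            return 0.0
        ok = sum(n.has_access_info or n.has_parameters for n in self.nodes.values())
        return ok / len(self.nodes)

    def reproducibility_score(self, alpha: float = 0.5) -> float:
        # Simple convex combination of the two scores; ARA's actual weighting
        # is not given in the abstract.
        return alpha * self.structural_score() + (1 - alpha) * self.content_score()
```

The structural score captures whether reported outputs can be traced back to concrete sources through the declared methods and experiments, while the content score captures whether each step carries enough detail to be re-run; combining the two mirrors, at a very coarse level, the abstract's distinction between structural and content-based assessment.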
Source: arXiv: 2605.02651