arXiv submission date: 2026-05-18
📄 Abstract - BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

Quantitative backtesting is essential for evaluating trading strategies but remains hampered by high technical barriers and limited scalability. Large Language Models (LLMs) offer a path to automating this complex, interdisciplinary workflow through code generation, tool use, and agentic planning, but progress is held back by the lack of a large-scale benchmark dedicated to automated quantitative backtesting. To bridge this gap, we introduce BacktestBench, the first large-scale benchmark for automated quantitative backtesting. Built from over 6 million real market records, it comprises 18,246 annotated question-answering pairs across four task categories: metrics calculation, ticker selection, strategy selection, and parameter confirmation. We also propose AutoBacktest, a multi-agent baseline that translates natural language strategies into reproducible backtests by coordinating a Summarizer for semantic factor extraction, a Retriever for validated SQL generation, and a Coder for Python backtesting implementation. Our evaluation of 23 mainstream LLMs, complemented by targeted ablations, identifies key factors that influence end-to-end performance and highlights the importance of grounded verification and standardized indicator representations.

Top-level tags: llm multi-agents financial
Detailed tags: benchmark backtesting quantitative finance multi-agent system code generation

BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting


1️⃣ One-sentence summary

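This paper introduces BacktestBench, the first large-scale benchmark for automated quantitative backtesting, comprising more than 18,000 question-answering tasks built on real market data. It also designs AutoBacktest, a multi-agent baseline system, to evaluate and advance the ability of large language models to automatically generate, execute, and verify trading strategies.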

Source: arXiv: 2605.17937