Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition
1️⃣ One-Sentence Summary
This paper introduces Market-Bench, an evaluation framework that tests the practical ability of large language models to manage economic resources and make trading decisions by simulating procurement and retail competition in a multi-agent supply chain. It finds that only a few models achieve consistent profits, while most hover around break-even.
The ability of large language models (LLMs) to manage and acquire economic resources remains unclear. In this paper, we introduce Market-Bench, a comprehensive benchmark that evaluates the capabilities of LLMs on economically relevant tasks through economic and trade competition. Specifically, we construct a configurable multi-agent supply chain economic model in which LLMs act as retailer agents responsible for procuring and retailing merchandise. In the procurement stage, LLMs bid for limited inventory in budget-constrained auctions. In the retail stage, LLMs set retail prices, generate marketing slogans, and deliver them to buyers, who make purchase decisions through a role-based attention mechanism. Market-Bench logs complete trajectories of bids, prices, slogans, sales, and balance-sheet states, enabling automatic evaluation with economic, operational, and semantic metrics. Benchmarking 20 open- and closed-source LLM agents reveals significant performance disparities and a winner-take-most phenomenon: only a small subset of LLM retailers consistently achieves capital appreciation, while many hover around the break-even point despite similar semantic matching scores. Market-Bench provides a reproducible testbed for studying how LLMs interact in competitive markets.
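To make the two-stage setup concrete, the following is a minimal, hypothetical sketch of one procurement-then-retail round. The function names, auction rules (sealed bids, highest bidders win single units), and profit accounting are illustrative assumptions for exposition, not the paper's actual protocol or metrics.

```python
def procurement_auction(bids, budgets, inventory):
    """Procurement stage (illustrative): a sealed-bid auction where the
    highest valid bidders each win one unit until inventory runs out.
    Bids exceeding an agent's remaining budget are rejected."""
    allocation = {agent: 0 for agent in bids}
    valid = [(price, agent) for agent, price in bids.items()
             if price <= budgets[agent]]
    for price, agent in sorted(valid, reverse=True):
        if inventory == 0:
            break
        allocation[agent] += 1   # agent wins one unit of inventory
        budgets[agent] -= price  # budget-constrained spending
        inventory -= 1
    return allocation, budgets

def retail_stage(allocation, retail_prices, unit_cost, demand):
    """Retail stage (illustrative): each retailer sells up to its realized
    demand; profit = (retail price - unit cost) * units sold."""
    profits = {}
    for agent, units in allocation.items():
        sold = min(units, demand.get(agent, 0))
        profits[agent] = (retail_prices[agent] - unit_cost) * sold
    return profits

# One round: agent C bids above its budget and is excluded; A and B split
# the two available units, then sell at their chosen retail prices.
alloc, remaining = procurement_auction(
    bids={"A": 10, "B": 8, "C": 20},
    budgets={"A": 15, "B": 15, "C": 5},
    inventory=2,
)
profits = retail_stage(alloc, {"A": 14, "B": 12, "C": 0},
                       unit_cost=10, demand={"A": 1, "B": 1})
```

In the benchmark itself, the bids, retail prices, and slogans would come from LLM agents rather than fixed numbers, and buyer demand would be mediated by the role-based attention mechanism; the sketch only shows how the balance-sheet bookkeeping chains the two stages together.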
Source: arXiv: 2604.05523