When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies
1️⃣ One-Sentence Summary
This study finds that although large language models can extract features from news that are predictive of stock prices, these features can actually disrupt a reinforcement-learning trading policy when the market regime shifts abruptly (e.g., under a macroeconomic shock), leaving it worse off than a simple policy that uses price information alone.
Can large language models (LLMs) generate continuous numerical features that improve reinforcement learning (RL) trading agents? We build a modular pipeline in which a frozen LLM serves as a stateless feature extractor, transforming unstructured daily news and filings into a fixed-dimensional vector consumed by a downstream PPO agent. We introduce an automated prompt-optimization loop that treats the extraction prompt as a discrete hyperparameter and tunes it directly against the Information Coefficient (the Spearman rank correlation between predicted and realized returns) rather than NLP losses. The optimized prompt discovers genuinely predictive features (IC above 0.15 on held-out data). However, these valid intermediate representations do not automatically translate into downstream task performance: during a distribution shift caused by a macroeconomic shock, LLM-derived features add noise, and the augmented agent underperforms a price-only baseline. In a calmer test regime the agent recovers, yet macroeconomic state variables remain the most robust driver of policy improvement. Our findings highlight a gap between feature-level validity and policy-level robustness that parallels known challenges in transfer learning under distribution shift.
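The Information Coefficient that the prompt-optimization loop maximizes can be sketched as follows. This is a minimal illustrative implementation, not code from the paper: the function name and the example arrays are hypothetical, and it assumes no tied values in either series (ties would need averaged ranks, as in `scipy.stats.spearmanr`).

```python
import numpy as np

def information_coefficient(predicted, realized):
    """Spearman rank correlation between predicted and realized returns.

    Ranks both series (double argsort; assumes no ties), then computes
    the Pearson correlation of the ranks.
    """
    pred_ranks = np.argsort(np.argsort(predicted))
    real_ranks = np.argsort(np.argsort(realized))
    return np.corrcoef(pred_ranks, real_ranks)[0, 1]

# Hypothetical example: LLM feature scores vs. realized next-day returns.
predicted = np.array([0.8, 0.1, -0.3, 0.5, -0.6])
realized = np.array([0.02, -0.015, -0.01, 0.01, -0.02])
ic = information_coefficient(predicted, realized)  # one adjacent rank swap -> 0.9
```

In the loop described above, a candidate prompt would be scored by averaging this IC over a validation window, and the prompt with the highest average IC retained; an IC above 0.15 (the paper's held-out figure) is a strong cross-sectional signal by quantitative-finance standards.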
Source: arXiv:2604.10996