When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies
1️⃣ One-Sentence Summary
This study finds that although large language models can extract features from news that are predictive of stock prices, these features can actually disrupt a reinforcement-learning trading policy when the market regime shifts abruptly (e.g., under a macroeconomic shock), leaving it worse off than a simple policy that uses price information alone.
Can large language models (LLMs) generate continuous numerical features that improve reinforcement learning (RL) trading agents? We build a modular pipeline in which a frozen LLM serves as a stateless feature extractor, transforming unstructured daily news and filings into a fixed-dimensional vector consumed by a downstream PPO agent. We introduce an automated prompt-optimization loop that treats the extraction prompt as a discrete hyperparameter and tunes it directly against the Information Coefficient (the Spearman rank correlation between predicted and realized returns) rather than NLP losses. The optimized prompt discovers genuinely predictive features (IC above 0.15 on held-out data). However, these valid intermediate representations do not automatically translate into downstream task performance: during a distribution shift caused by a macroeconomic shock, LLM-derived features add noise, and the augmented agent underperforms a price-only baseline. In a calmer test regime the agent recovers, yet macroeconomic state variables remain the most robust driver of policy improvement. Our findings highlight a gap between feature-level validity and policy-level robustness that parallels known challenges in transfer learning under distribution shift.
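The Information Coefficient that the prompt-optimization loop maximizes can be sketched as follows. This is a minimal illustrative implementation, not code from the paper: the function name and the example arrays are hypothetical, and it assumes no tied values in either series (ties would need averaged ranks, as in `scipy.stats.spearmanr`).

```python
import numpy as np

def information_coefficient(predicted, realized):
    """Spearman rank correlation between predicted and realized returns.

    Ranks both series (double argsort; assumes no ties), then computes
    the Pearson correlation of the ranks.
    """
    pred_ranks = np.argsort(np.argsort(predicted))
    real_ranks = np.argsort(np.argsort(realized))
    return np.corrcoef(pred_ranks, real_ranks)[0, 1]

# Hypothetical example: LLM feature scores vs. realized next-day returns.
predicted = np.array([0.8, 0.1, -0.3, 0.5, -0.6])
realized = np.array([0.02, -0.015, -0.01, 0.01, -0.02])
ic = information_coefficient(predicted, realized)  # one adjacent rank swap -> 0.9
```

In the loop described above, a candidate prompt would be scored by averaging this IC over a validation window, and the prompt with the highest average IC retained; an IC above 0.15 (the paper's held-out figure) is a strong cross-sectional signal by quantitative-finance standards.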
Source: arXiv:2604.10996