arXiv submission date: 2026-04-13
📄 Abstract - When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies

Can large language models (LLMs) generate continuous numerical features that improve reinforcement learning (RL) trading agents? We build a modular pipeline where a frozen LLM serves as a stateless feature extractor, transforming unstructured daily news and filings into a fixed-dimensional vector consumed by a downstream PPO agent. We introduce an automated prompt-optimization loop that treats the extraction prompt as a discrete hyperparameter and tunes it directly against the Information Coefficient - the Spearman rank correlation between predicted and realized returns - rather than NLP losses. The optimized prompt discovers genuinely predictive features (IC above 0.15 on held-out data). However, these valid intermediate representations do not automatically translate into downstream task performance: during a distribution shift caused by a macroeconomic shock, LLM-derived features add noise, and the augmented agent under-performs a price-only baseline. In a calmer test regime the agent recovers, yet macroeconomic state variables remain the most robust driver of policy improvement. Our findings highlight a gap between feature-level validity and policy-level robustness that parallels known challenges in transfer learning under distribution shift.
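The prompt-optimization loop described above selects prompts by the Information Coefficient (IC), i.e. the Spearman rank correlation between predicted and realized returns. As a minimal sketch of that metric (not the authors' code; the function name and pure-Python ranking are illustrative assumptions), it can be computed as the Pearson correlation of the two rank vectors:

```python
def _ranks(values):
    """1-based average ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # group equal values so ties receive the same average rank
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def information_coefficient(predicted, realized):
    """Spearman rank correlation between predicted and realized returns."""
    rp, rr = _ranks(predicted), _ranks(realized)
    n = len(rp)
    mp, mr = sum(rp) / n, sum(rr) / n
    cov = sum((a - mp) * (b - mr) for a, b in zip(rp, rr))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_r = sum((b - mr) ** 2 for b in rr)
    return cov / (var_p * var_r) ** 0.5

# A perfectly monotone prediction yields IC = 1.0;
# the paper reports IC above 0.15 on held-out data.
print(information_coefficient([0.1, 0.3, 0.2], [1.0, 5.0, 2.0]))  # → 1.0
```

Treating the extraction prompt as a discrete hyperparameter then amounts to scoring each candidate prompt's feature outputs with this function and keeping the prompt with the highest held-out IC.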

Top-level tags: llm reinforcement learning financial
Detailed tags: trading agents feature extraction prompt optimization distribution shift policy robustness

When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies


1️⃣ One-Sentence Summary

This study finds that although large language models can extract features from news with genuine predictive power for stock prices, these features disrupt the reinforcement learning trading policy during abrupt regime changes such as macroeconomic shocks, causing it to underperform a simple strategy that uses price information alone.

Source: arXiv: 2604.10996