Recurrent Deep Reinforcement Learning for Chemotherapy Control under Partial Observability

📄 Abstract

Chemotherapy dose optimization can be formulated as a dynamic treatment regime, requiring sequential decisions under uncertainty that must balance tumor suppression against toxicity. However, most reinforcement learning approaches assume full observability of the patient state, a condition rarely met in clinical practice. We investigate whether memory-augmented policies can improve chemotherapy control under partial observability. To this end, we employ a recurrent TD3-based approach with separate LSTM actor-critic networks and evaluate it on the AhnChemoEnv benchmark from DTR-Bench, considering both off-policy and on-policy recurrent architectures against feed-forward TD3 and Soft Actor-Critic. Pharmacokinetic and pharmacodynamic variability are held fixed to isolate hidden-state uncertainty and observation noise and to avoid confounding effects from inter-patient variability. Across ten random seeds, recurrence yields modest benefit under full observability but substantially stronger and more stable performance under partial observability, with more consistent tumor suppression and improved normal-cell preservation. These findings indicate that memory-based policies are particularly beneficial when clinically relevant state information is incomplete or noisy.
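
To make the partial-observability setting concrete, the sketch below shows one way to turn a full patient state into a masked, noisy observation and to keep a short window of recent observations that a recurrent policy could consume. It is an illustration under stated assumptions, not the paper's implementation or the AhnChemoEnv API: the names `make_partial_observer` and `ObservationHistory`, the state layout, and the Gaussian noise model are all hypothetical.

```python
import random
from collections import deque


def make_partial_observer(visible_idx, noise_std, seed=0):
    """Build an observation function (hypothetical helper, not from DTR-Bench).

    visible_idx: indices of the state components the controller may see.
    noise_std:   standard deviation of additive Gaussian measurement noise.
    """
    rng = random.Random(seed)  # private generator so runs are reproducible

    def observe(state):
        # Drop hidden components, then perturb each visible one with noise.
        return [state[i] + rng.gauss(0.0, noise_std) for i in visible_idx]

    return observe


class ObservationHistory:
    """Fixed-length window of recent observations, zero-padded at episode start."""

    def __init__(self, obs_dim, length):
        self.buf = deque([[0.0] * obs_dim for _ in range(length)], maxlen=length)

    def push(self, obs):
        # Appending to a full deque silently discards the oldest entry.
        self.buf.append(list(obs))

    def as_sequence(self):
        # Oldest observation first, as a recurrent network would read it.
        return [list(o) for o in self.buf]


if __name__ == "__main__":
    # Assumed state layout for illustration: [tumor, normal, immune, drug].
    observe = make_partial_observer(visible_idx=[0, 1], noise_std=0.05, seed=42)
    history = ObservationHistory(obs_dim=2, length=4)
    for state in ([1.0, 1.0, 0.5, 0.0], [0.9, 0.98, 0.5, 0.2]):
        history.push(observe(state))
    print(history.as_sequence())
```

A feed-forward policy sees only the latest entry of this window, whereas a recurrent policy can integrate the whole sequence, which is the intuition behind the reported advantage when state information is incomplete or noisy.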


1️⃣ One-Sentence Summary

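This study proposes a memory-augmented recurrent deep reinforcement learning algorithm for chemotherapy dose optimization under partial observability, that is, when clinical information is incomplete or noisy. Compared with conventional methods, it substantially improves tumor suppression, better preserves normal cells, and performs more stably and effectively.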
