Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls
1️⃣ One-Sentence Summary
Through an empirical study, this paper finds that feeding a large language model many demonstrations at inference time (many-shot prompting) can effectively improve its performance on structured tasks, but the effectiveness of this approach depends heavily on the example-selection strategy, and it offers limited benefit for open-ended generation tasks and can even be harmful.
Test-time adaptation enables large language models (LLMs) to modify their behavior at inference without updating model parameters. A common approach is many-shot prompting, where large numbers of in-context learning (ICL) examples are injected as an input-space test-time update. Although performance can improve as more demonstrations are added, the reliability and limits of this update mechanism remain poorly understood, particularly for open-source models. We present an empirical study of many-shot prompting across tasks and model backbones, analyzing how performance varies with update magnitude, example ordering, and selection policy. We further study Dynamic and Reinforced ICL as alternative test-time update strategies that control which information is injected and how it constrains model behavior. We find that many-shot prompting is effective for structured tasks where demonstrations provide high information gain, but is highly sensitive to selection strategy and often shows limited benefits for open-ended generation tasks. Overall, we characterize the practical limits of prompt-based test-time adaptation and outline when input-space updates are beneficial versus harmful.
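The input-space update described above amounts to concatenating many (input, output) demonstrations ahead of the query. A minimal sketch of this prompt construction, with illustrative names and formatting not taken from the paper:

```python
# Hypothetical sketch of many-shot prompting: inject k in-context
# demonstrations into the input before the query. The prompt template
# and helper names are illustrative assumptions, not the paper's code.

def build_many_shot_prompt(examples, query, k):
    """Concatenate the first k (input, output) demonstrations, then the query."""
    shots = [f"Input: {x}\nOutput: {y}" for x, y in examples[:k]]
    return "\n\n".join(shots + [f"Input: {query}\nOutput:"])

# Toy demonstrations; a real study would draw these via a selection policy,
# since the results above show performance is sensitive to that choice.
demos = [("2+2", "4"), ("3+5", "8"), ("7+1", "8")]
prompt = build_many_shot_prompt(demos, "6+4", k=2)
print(prompt)
```

Varying `k` corresponds to the "update magnitude" studied in the paper, while reordering or reselecting `demos` corresponds to the ordering and selection policies it analyzes.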
Source: arXiv: 2603.05829