
arXiv submission date: 2026-01-29
📄 Abstract - Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks

How much of LLM output variance is explained by prompts, by model choice, and by sampling stochasticity? We answer this by evaluating 12 LLMs on 10 creativity prompts with 100 samples each (N = 12,000). For output quality (originality), prompts explain 36.43% of variance, comparable to model choice (40.94%). But for output quantity (fluency), model choice (51.25%) and within-LLM variance (33.70%) dominate, with prompts explaining only 4.22%. Prompts are powerful levers for steering output quality, but given the substantial within-LLM variance (10-34%), single-sample evaluations risk conflating sampling noise with genuine prompt or model effects.
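As a rough illustration of how such a variance breakdown can be computed (the paper's exact method is not described on this page, so the approach below is an assumption), this minimal sketch partitions scores from a balanced model × prompt × sample design into between-model, between-prompt, within-cell (sampling), and interaction components via a standard sum-of-squares decomposition. The column names `model`, `prompt`, and `score` are hypothetical.

```python
# Minimal sketch (not the paper's confirmed method): sum-of-squares variance
# decomposition for a balanced two-way design, e.g. 12 models x 10 prompts
# x 100 samples = 12,000 rows in long format.
import pandas as pd


def variance_components(df: pd.DataFrame) -> pd.Series:
    grand_mean = df["score"].mean()
    ss_total = ((df["score"] - grand_mean) ** 2).sum()

    # Between-model and between-prompt sums of squares (main effects).
    ss_model = sum(
        len(g) * (g["score"].mean() - grand_mean) ** 2
        for _, g in df.groupby("model")
    )
    ss_prompt = sum(
        len(g) * (g["score"].mean() - grand_mean) ** 2
        for _, g in df.groupby("prompt")
    )

    # Within-LLM (sampling) variability: spread around each model x prompt cell mean.
    ss_within = sum(
        ((g["score"] - g["score"].mean()) ** 2).sum()
        for _, g in df.groupby(["model", "prompt"])
    )

    # Remaining variance is attributed to the model x prompt interaction.
    ss_interaction = ss_total - ss_model - ss_prompt - ss_within

    return pd.Series({
        "prompt %": 100 * ss_prompt / ss_total,
        "model %": 100 * ss_model / ss_total,
        "within-LLM %": 100 * ss_within / ss_total,
        "interaction %": 100 * ss_interaction / ss_total,
    })
```

With a balanced design (equal samples per model × prompt cell), the four percentages sum to 100, which is how figures like "prompts explain 36.43% of variance" can be read.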

Top tags: llm model evaluation natural language processing
Detailed tags: output variance prompt engineering creativity evaluation sampling stochasticity fluency vs originality

Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks


1️⃣ One-sentence summary

Through a large-scale experiment, this paper finds that when evaluating LLMs' creative abilities, the prompt influences output quality (e.g., originality) about as much as model choice does, but has little effect on output quantity (e.g., fluency); at the same time, within-model fluctuation from random sampling is large, so a single test run can give a misleading picture of a model's or prompt's true effect.

Source: arXiv 2601.21339