arXiv submission date: 2026-03-26
📄 Abstract - To Write or to Automate Linguistic Prompts, That Is the Question

LLM performance is highly sensitive to prompt design, yet whether automatic prompt optimization can replace expert prompt engineering in linguistic tasks remains unexplored. We present the first systematic comparison of hand-crafted zero-shot expert prompts, base DSPy signatures, and GEPA-optimized DSPy signatures across translation, terminology insertion, and language quality assessment, evaluating five model configurations. Results are task-dependent. In terminology insertion, optimized and manual prompts produce mostly statistically indistinguishable quality. In translation, each approach wins on different models. In LQA, expert prompts achieve stronger error detection while optimization improves characterization. Across all tasks, GEPA elevates minimal DSPy signatures, and the majority of expert-optimized comparisons show no statistically significant difference. We note that the comparison is asymmetric: GEPA optimization searches programmatically over gold-standard splits, whereas expert prompts in principle require no labeled data, relying instead on domain expertise and iterative refinement.
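The asymmetry noted at the end of the abstract can be made concrete with a toy sketch: an automatic optimizer scores candidate prompts against a labeled gold split and keeps the best one, while an expert prompt is written without labels. This is not GEPA's actual algorithm; the prompts, data, and mock model below are all hypothetical stand-ins.

```python
# Toy sketch of search-over-gold-splits prompt selection.
# mock_model is a deterministic stand-in for an LLM call; in this toy,
# only a sufficiently specific prompt reproduces the gold target.

def mock_model(prompt: str, source: str) -> str:
    """Stand-in for an LLM call; returns a deterministic 'translation'."""
    return source.upper() if "translate precisely" in prompt else source

def score(prompt: str, gold_split: list[tuple[str, str]]) -> float:
    """Fraction of gold (source, target) pairs the prompted model reproduces."""
    hits = sum(mock_model(prompt, src) == tgt for src, tgt in gold_split)
    return hits / len(gold_split)

def optimize(candidates: list[str], gold_split: list[tuple[str, str]]) -> str:
    """Pick the candidate prompt with the best score on the gold split."""
    return max(candidates, key=lambda p: score(p, gold_split))

gold = [("hund", "HUND"), ("katze", "KATZE")]          # hypothetical gold split
candidates = ["translate", "translate precisely into English"]
best = optimize(candidates, gold)
print(best)  # → translate precisely into English
```

The point of the sketch is the dependency, not the search strategy: `optimize` cannot run without `gold`, whereas an expert-written prompt has no such requirement.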

Top-level tags: llm, natural language processing, model evaluation
Detailed tags: prompt optimization, prompt engineering, dspy, gepa, zero-shot

To Write or to Automate Linguistic Prompts, That Is the Question


1️⃣ One-sentence summary

This paper presents the first systematic comparison of expert hand-crafted prompts and automatically optimized prompts on linguistic tasks (translation, terminology insertion, and language quality assessment), finding that the two perform comparably in most cases, that each has advantages on different models and tasks, and that automatic optimization significantly improves the performance of minimal baseline prompts.

Source: arXiv:2603.25169