GEO-Bench: Benchmarking Ranking Manipulation in Generative Engine Optimization
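1️⃣ One-sentence summary
This paper presents GEO-Bench, a unified benchmark that systematically compares attack methods for manipulating rankings produced by large language models; it finds a trade-off between attack effectiveness and stealth, shows that black-box content rewriting stands out in fluency and in evading detection, and provides a standardized tool for evaluating and defending against such manipulation.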
Large language models (LLMs) increasingly rank products, documents, and recommendations for user queries, which makes manipulating these rankings a growing concern for fairness and information integrity. Research on generative engine optimization (GEO) has produced many manipulation methods, but each is evaluated on its own dataset with its own metrics, so their relative strength and detectability stay unclear. We present GEO-Bench, a benchmark that evaluates GEO ranking-manipulation attacks under one protocol. It unifies black-box prompt-based attacks (TAP, Zero-Shot), white-box gradient-based attacks (STS, RAF, StealthRank), and ten white-hat C-SEO strategies. We score every method on five datasets against a fixed open-weight ranker (Llama-3.1-8B-Instruct), using metrics for both effectiveness (NRG, Success@α, Promote@α) and stealth (keyword violation rate, perplexity ratio). Our evaluation shows that effectiveness and stealth trade off across adversarial attacks; that black-box content rewriting matches or exceeds gradient-based attacks on rank promotion while producing more fluent text, and can evade both keyword- and perplexity-based detection on some domains; and that the access model does not predict attack strength. By standardizing datasets, attack implementations, and metrics, GEO-Bench enables the first direct comparison across these attack paradigms and supports the development of detection methods.
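The abstract names two stealth metrics, keyword violation rate and perplexity ratio, without defining them. The sketch below shows one plausible reading, not the benchmark's actual implementation. It assumes that the keyword violation rate is the fraction of manipulated texts containing at least one flagged phrase, and that the perplexity ratio divides the perplexity of the manipulated text by that of the original. The function names, the flagged-phrase list, and the use of precomputed token log-probabilities are all hypothetical.

```python
import math
from typing import Iterable, Sequence


def keyword_violation_rate(texts: Sequence[str], flagged: Iterable[str]) -> float:
    """Fraction of texts containing at least one flagged phrase.

    Assumed definition: case-insensitive substring matching.
    """
    flagged_lower = [phrase.lower() for phrase in flagged]
    if not texts:
        return 0.0
    hits = sum(
        any(phrase in text.lower() for phrase in flagged_lower) for text in texts
    )
    return hits / len(texts)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity as exp of the negative mean natural-log token probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def perplexity_ratio(
    attacked_logprobs: Sequence[float], original_logprobs: Sequence[float]
) -> float:
    """Assumed definition: perplexity(attacked) / perplexity(original).

    Under this reading, values well above 1 suggest the manipulated text
    is less fluent than the original under the scoring language model.
    """
    return perplexity(attacked_logprobs) / perplexity(original_logprobs)


if __name__ == "__main__":
    docs = ["Buy now, the best choice!", "A sturdy steel kettle."]
    print(keyword_violation_rate(docs, ["best choice"]))
    print(perplexity_ratio([math.log(0.25)] * 2, [math.log(0.5)] * 2))
```

In practice the token log-probabilities would come from a reference language model; they are passed in directly here so the sketch stays self-contained.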
Source: arXiv: 2605.29107