arXiv submission date: 2026-03-02
📄 Abstract - GMP: A Benchmark for Content Moderation under Co-occurring Violations and Dynamic Rules

Online content moderation is essential for maintaining a healthy digital environment, and reliance on AI for this task continues to grow. Consider a user comment that uses national stereotypes to insult a politician. This example illustrates two critical challenges in real-world scenarios: (1) co-occurring violations, where a single post violates multiple policies (e.g., prejudice and personal attacks); and (2) dynamic moderation rules, where the determination of a violation depends on platform-specific guidelines that evolve across contexts. The intersection of co-occurring harms and dynamically changing rules highlights a core limitation of current AI systems: although large language models (LLMs) are adept at following fixed guidelines, their judgment degrades when policies are unstable or context-dependent. In practice, such shortcomings lead to inconsistent moderation: either erroneously restricting legitimate expression or allowing harmful content to remain online. This raises a critical question for evaluation: does high performance on existing static benchmarks truly guarantee robust generalization of AI judgment to real-world scenarios involving co-occurring violations and dynamically changing rules?

Top-level tags: llm benchmark natural language processing
Detailed tags: content moderation evaluation benchmark policy compliance multi-policy violation dynamic guidelines

GMP: A Benchmark for Content Moderation under Co-occurring Violations and Dynamic Rules


1️⃣ One-sentence summary

This paper introduces a new benchmark called GMP to test AI systems' ability to handle two real-world challenges in content moderation: a single piece of content violating multiple rules at once, and moderation rules that change dynamically. The evaluation reveals that current large language models' judgment degrades in such complex, dynamic real-world scenarios.

Source: arXiv 2603.01724