
arXiv submission date: 2026-05-27
📄 Abstract - Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification

Recent years have witnessed the rapid development of Large Language Model-based Multi-Agent Systems (MAS), which excel at collaborative decision-making and complex problem-solving. However, malicious agents in MAS may inject misinformation to mislead other agents and disrupt system performance, giving rise to a new research direction that focuses on attack mechanisms and defense strategies in MAS. Prior studies largely assume malicious agents act independently and investigate the corresponding defense strategies. However, we argue that malicious agents may exhibit collaborative behaviors, enabling more effective attacks through internal information exchange. In this paper, we propose an adaptive cooperative attack framework, where malicious agents autonomously coordinate and dynamically adjust their attack strategies through multi-round interactions. Furthermore, we introduce Sentence-Level Trustworthiness Analysis and Rectification (STAR), a defense framework that identifies and rectifies misleading information at the sentence level within agent communications. Our experiments show that cooperative attacks lead to a significantly larger degradation in task success rate than independent attacks, resulting in a relative drop of 5.34%. Meanwhile, STAR effectively mitigates both cooperative and independent threats and improves task success rate by an average of 36.76%. The code is available at this https URL.

Top-level tags: llm agents natural language processing
Detailed tags: multi-agent systems cooperative attacks defense framework sentence-level rectification trustworthiness analysis

Defending LLM-based Multi-Agent Systems Against Cooperative Attacks with Sentence-Level Rectification


1️⃣ One-sentence summary

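This paper shows that malicious agents in a multi-agent system can mount cooperative attacks by exchanging information among themselves, proposes an attack framework in which they coordinate dynamically, and designs a sentence-level trustworthiness analysis and rectification defense that identifies and corrects misleading information in agent communications, substantially reducing the damage cooperative attacks do to the system's task success rate.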
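The abstract does not describe how STAR scores or rewrites individual sentences. As a rough, hypothetical illustration of the general idea of sentence-level filtering only (not the paper's method), the sketch below splits an agent message into sentences, scores each one with a pluggable trust scorer, and passes sentences below a threshold to a rectifier. The names `rectify_message`, `toy_scorer`, and `toy_rectifier`, the 0.5 threshold, and the arithmetic-checking scorer are all assumptions made for this example; a real system would more likely use an LLM as scorer and rectifier.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical sketch of sentence-level filtering; NOT the STAR implementation.
Scorer = Callable[[str], float]   # returns a trust score in [0, 1]
Rectifier = Callable[[str], str]  # returns a corrected sentence, or "" to drop it


@dataclass
class SentenceVerdict:
    text: str
    score: float
    rectified: bool


def split_sentences(message: str) -> List[str]:
    # Naive splitter: break on whitespace that follows ".", "!" or "?".
    parts = re.split(r"(?<=[.!?])\s+", message.strip())
    return [p for p in parts if p]


def rectify_message(message: str, scorer: Scorer, rectifier: Rectifier,
                    threshold: float = 0.5) -> Tuple[str, List[SentenceVerdict]]:
    """Score every sentence; rewrite (or drop) those below the threshold."""
    kept: List[str] = []
    verdicts: List[SentenceVerdict] = []
    for sent in split_sentences(message):
        score = scorer(sent)
        low_trust = score < threshold
        verdicts.append(SentenceVerdict(sent, score, low_trust))
        replacement = rectifier(sent) if low_trust else sent
        if replacement:  # an empty string means the sentence is removed
            kept.append(replacement)
    return " ".join(kept), verdicts


# Toy stand-ins (hypothetical): treat "a + b = c" claims as checkable facts.
ARITH = re.compile(r"(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)")


def toy_scorer(sentence: str) -> float:
    m = ARITH.search(sentence)
    if m is None:
        return 1.0  # nothing checkable, so trust it
    a, b, c = map(int, m.groups())
    return 1.0 if a + b == c else 0.0


def toy_rectifier(sentence: str) -> str:
    m = ARITH.search(sentence)
    if m is None:
        return ""  # cannot repair, so drop the sentence
    a, b, _ = map(int, m.groups())
    return ARITH.sub(f"{a} + {b} = {a + b}", sentence, count=1)


msg = "The task asks for 17 + 25. Agent B computed 17 + 25 = 43. Please verify."
cleaned, verdicts = rectify_message(msg, toy_scorer, toy_rectifier)
print(cleaned)  # The task asks for 17 + 25. Agent B computed 17 + 25 = 42. Please verify.
```

Keeping the scorer and rectifier as separate callables mirrors the "analysis" and "rectification" split named in STAR's title, so the toy functions could in principle be swapped for model-based ones without changing `rectify_message`.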

Source: arXiv: 2605.28104