Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization
1️⃣ One-sentence summary
This paper proposes a new method, Adversarially-Aligned Jacobian Regularization (AAJR), designed to improve the stability of autonomous agent systems driven by large language models. By precisely controlling an agent's sensitivity along adversarial attack directions, it preserves as much of the system's original performance as possible while ensuring robustness.
As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local curvature in the inner maximization. Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. We introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity strictly along adversarial ascent directions. We prove that AAJR yields a strictly larger admissible policy class than global constraints under mild conditions, implying a weakly smaller approximation gap and reduced nominal performance degradation. Furthermore, we derive step-size conditions under which AAJR controls effective smoothness along optimization trajectories and ensures inner-loop stability. These results provide a structural theory for agentic robustness that decouples minimax stability from global expressivity restrictions.
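The abstract contrasts global Jacobian bounds, which suppress sensitivity in all directions, with AAJR's penalty applied only along the adversarial ascent direction. A minimal sketch of that distinction on a toy two-dimensional policy (the model `f`, the target, and the finite-difference helpers are illustrative assumptions, not the paper's implementation): the directional penalty measures the squared Jacobian-vector product along the normalized loss gradient, while the global penalty is the squared Frobenius norm of the full Jacobian, so the directional term is always the weaker constraint.

```python
import math

# Toy "policy": f(x) = (tanh(w1.x), tanh(w2.x)) on R^2 (hypothetical example).
W = [[0.8, -0.3], [0.2, 0.9]]

def f(x):
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def loss(x, target):
    # Squared error against a fixed target output.
    y = f(x)
    return sum((yi - ti) ** 2 for yi, ti in zip(y, target))

EPS = 1e-5  # finite-difference step

def grad_loss(x, target):
    # Central-difference gradient of the loss w.r.t. the input x.
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += EPS
        xm[i] -= EPS
        g.append((loss(xp, target) - loss(xm, target)) / (2 * EPS))
    return g

def jvp(x, d):
    # Central-difference Jacobian-vector product J_f(x) d.
    xp = [xi + EPS * di for xi, di in zip(x, d)]
    xm = [xi - EPS * di for xi, di in zip(x, d)]
    return [(a - b) / (2 * EPS) for a, b in zip(f(xp), f(xm))]

def aajr_penalty(x, target):
    # Sensitivity only along the normalized adversarial ascent direction.
    g = grad_loss(x, target)
    n = math.sqrt(sum(gi * gi for gi in g)) or 1.0
    d = [gi / n for gi in g]
    return sum(v * v for v in jvp(x, d))

def global_penalty(x):
    # Squared Frobenius norm of the Jacobian: sum of ||J e_i||^2 over a basis.
    total = 0.0
    for i in range(2):
        e = [0.0, 0.0]
        e[i] = 1.0
        total += sum(v * v for v in jvp(x, e))
    return total

x, target = [0.5, -0.2], [1.0, 0.0]
directional, global_ = aajr_penalty(x, target), global_penalty(x)
```

Since the ascent direction is unit-norm, `||J d||^2 <= ||J||_F^2` always holds, which is the sense in which the aligned penalty admits a strictly larger policy class than a global bound at the same regularization strength.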
From arXiv: 2603.04378