arXiv submission date: 2026-05-19
📄 Abstract - Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains

Foundation models are increasingly deployed in socially sensitive domains such as education, mental health, and caregiving, where failures are often cumulative and context-dependent. Existing guardrail approaches -- ranging from training-time alignment to prompting, decoding constraints, and post-hoc moderation -- primarily provide empirical risk reduction rather than enforceable behavioral guarantees, and largely treat safety as a property of individual outputs rather than interaction trajectories. We reframe guardrails as a problem of runtime behavioral control over interaction trajectories, drawing on robotics to introduce formal constructs for constraint enforcement in uncertain, closed-loop systems. We instantiate these ideas in the Grounded Observer framework and apply it across three real-world deployments: small talk, in-home autism therapy, and behavioral de-escalation in schools. Across settings, the framework enables runtime interventions that mitigate drift into undesirable interaction regimes while adapting to diverse social contexts. We discuss extensions to the framework and propose research directions toward stronger guarantees.

Top-level tags: llm systems
Detailed tags: guardrails foundation models safety runtime control socially sensitive

Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains


1️⃣ One-sentence summary

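Drawing on ideas from robot control, this paper proposes a framework that constrains AI conversational behavior at runtime in sensitive domains such as education and mental health, preventing harmful interaction trajectories, and validates its effectiveness in real-world settings such as autism therapy and behavioral intervention in schools.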

Source: arXiv: 2605.19940