arXiv submission date: 2026-03-11
📄 Abstract - CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems

Large Language Models (LLMs) rely on optimizations like Automatic Prefix Caching (APC) to accelerate inference. APC reuses previously computed states for the beginning of a request (the prefix) when another request starts with the same text. While APC improves throughput, it introduces timing side channels: cache hits are faster than misses, creating observable latency differences. In multi-tenant systems, attackers can exploit these differences to infer sensitive information, e.g., by incrementally reconstructing another user's request from observed hit/miss patterns. Current defenses take a sledgehammer approach: they disable APC and cache sharing entirely, isolating users and sacrificing efficiency for regular users. This paper presents CacheSolidarity, a system that secures multi-tenant LLM serving systems against APC side channels without sacrificing performance and efficiency. CacheSolidarity monitors cache reuse across users, flags suspicious sharing, and selectively isolates prefixes, restricting their reuse only when necessary. Evaluation shows that CacheSolidarity enables up to 70% higher cache reuse and 30% lower inference latency compared to existing defenses that isolate users. CacheSolidarity's lightweight design demonstrates that security in LLM serving does not have to come at the cost of unnecessarily reduced performance or unbearable overheads.
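The incremental-reconstruction attack described in the abstract can be illustrated with a toy model: a server whose responses are faster when a request's prefix is cached, and an attacker who extends a guess one character at a time, keeping each extension whose latency indicates a hit. This is a minimal sketch under assumptions of our own (the class names, the fixed hit/miss latencies, and the threshold are all illustrative and not from the paper):

```python
# Toy model of an APC timing side channel, for illustration only.
# Latencies and class/function names are hypothetical, not from the paper.

class ToyPrefixCacheServer:
    """Simulates a server where prefixes of a victim's prompt are cached."""

    def __init__(self, secret_prompt: str):
        # After serving the victim, every prefix of their prompt is cached.
        self.cached_prefixes = {
            secret_prompt[:i] for i in range(1, len(secret_prompt) + 1)
        }

    def serve(self, prompt: str) -> float:
        # Return a simulated latency: a cache hit is observably faster.
        hit = prompt in self.cached_prefixes
        return 0.01 if hit else 0.05


def reconstruct(server: ToyPrefixCacheServer, alphabet: str, max_len: int) -> str:
    """Extend a guessed prefix one character at a time using the timing gap."""
    guess = ""
    for _ in range(max_len):
        for ch in alphabet:
            # Latency below the threshold reveals a cache hit on guess + ch.
            if server.serve(guess + ch) < 0.03:
                guess += ch
                break
        else:
            break  # no single-character extension hits the cache; stop
    return guess
```

With `ToyPrefixCacheServer("cab")` and alphabet `"abc"`, `reconstruct` recovers the full secret purely from latency observations; CacheSolidarity's monitoring targets exactly this kind of cross-user reuse pattern.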

Top-level tags: llm systems model evaluation
Detailed tags: prefix caching side channel attack multi-tenant security inference optimization cache management

CacheSolidarity: Preventing Prefix Caching Side Channels in Multi-tenant LLM Serving Systems


1️⃣ One-sentence summary

This paper proposes a new system, CacheSolidarity, that prevents the timing side-channel attacks caused by shared prefix caches in multi-tenant LLM serving without sacrificing performance, thereby protecting user data.

Source: arXiv:2603.10726