Floe: Federated Specialization for Real-Time LLM-SLM Inference
1️⃣ One-Sentence Summary
This paper proposes Floe, a hybrid federated learning framework that combines the general knowledge of a cloud-based large language model with the personalization capabilities of local small models, preserving user privacy while significantly reducing real-time inference latency on edge devices and improving model performance.
Deploying large language models (LLMs) in real-time systems remains challenging due to their substantial computational demands and privacy concerns. We propose Floe, a hybrid federated learning framework designed for latency-sensitive, resource-constrained environments. Floe combines a cloud-based black-box LLM with lightweight small language models (SLMs) on edge devices to enable low-latency, privacy-preserving inference. Personal data and fine-tuning remain on-device, while the cloud LLM contributes general knowledge without exposing proprietary weights. A heterogeneity-aware LoRA adaptation strategy enables efficient edge deployment across diverse hardware, and a logit-level fusion mechanism enables real-time coordination between edge and cloud models. Extensive experiments demonstrate that Floe enhances user privacy and personalization. Moreover, it significantly improves model performance and reduces inference latency on edge devices under real-time constraints compared with baseline approaches.
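The abstract describes a logit-level fusion mechanism that coordinates the edge SLM and the cloud LLM at inference time. The paper's exact fusion rule is not given here; the sketch below assumes a simple convex combination of the two models' logits with a hypothetical fusion weight `alpha`, followed by a softmax over the fused scores.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(slm_logits, llm_logits, alpha=0.5):
    """Logit-level fusion sketch: convex combination of edge-SLM and
    cloud-LLM logits. `alpha` is a hypothetical weight (not from the
    paper) controlling how much the personalized edge model dominates."""
    assert len(slm_logits) == len(llm_logits)
    return [alpha * s + (1 - alpha) * c
            for s, c in zip(slm_logits, llm_logits)]


# Toy vocabulary of 3 tokens: the edge model prefers token 0
# (personalized knowledge), the cloud model prefers token 2
# (general knowledge).
slm = [2.0, 0.5, 0.1]
llm = [0.1, 0.5, 2.0]

fused = fuse_logits(slm, llm, alpha=0.5)
probs = softmax(fused)
print(probs)
```

With `alpha=0.5` the two preferences balance out, so tokens 0 and 2 end up equally likely; skewing `alpha` toward 1 would let the personalized edge model dominate, which is one plausible way such a scheme could trade off personalization against general knowledge.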
Source: arXiv:2602.14302