arXiv submission date: 2026-04-29
📄 Abstract - Differentially-Private Text Rewriting reshapes Linguistic Style

Differential Privacy (DP) for text has matured from disjointed word-level substitutions to contiguous sentence-level rewriting by leveraging the generative capacity of language models. While this form of text privatization is best suited for balancing formal privacy guarantees with grammatical coherence, its impact on the register identity of text remains largely unexplored. By conducting a multidimensional stylistic profiling of differentially-private rewriting, we demonstrate that the cost of privacy extends far beyond lexical variation. Specifically, we find that rewriting under privacy constraints induces a systematic functional mutation of the text's communicative signature. This shift is characterized by the severe attrition of interactive markers, contextual references, and complex subordination. By comparing autoregressive paraphrasing against bidirectional substitution across a spectrum of privacy budgets, we observe that both architectures force convergence toward a non-involved and non-persuasive register. This register-blind sanitization effectively preserves semantic content but structurally homogenizes the nuanced stylistic markers that define human-authored discourse.
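
To make the idea of "attrition of interactive markers" and "complex subordination" concrete, here is a minimal sketch of how such a loss could be measured between an original text and its rewrite. It is not the paper's method: the marker lists, the function names (`tokenize`, `marker_rates`, `attrition`), and the example sentences are all hypothetical and chosen only for illustration, since the abstract does not specify the feature inventory used.

```python
import re

# Hypothetical marker lists, for illustration only; the abstract does not
# state which linguistic features the authors actually profile.
INTERACTIVE = {"i", "me", "my", "we", "us", "our", "you", "your"}
SUBORDINATORS = {"because", "although", "though", "while",
                 "since", "unless", "whereas", "if"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def marker_rates(text):
    """Return occurrences per 100 tokens for each marker family."""
    tokens = tokenize(text)
    if not tokens:
        return {"interactive": 0.0, "subordination": 0.0}
    scale = 100.0 / len(tokens)
    return {
        "interactive": scale * sum(t in INTERACTIVE for t in tokens),
        "subordination": scale * sum(t in SUBORDINATORS for t in tokens),
    }


def attrition(original, rewritten):
    """Rate difference (original minus rewritten); positive means markers were lost."""
    before = marker_rates(original)
    after = marker_rates(rewritten)
    return {name: before[name] - after[name] for name in before}


# Made-up sentence pair, not taken from the paper's data.
original = "I think you should go, because we promised."
rewritten = "The person should go."
print(attrition(original, rewritten))
```

Normalizing by token count keeps the comparison fair when the rewrite is shorter than the original, which matters here because a rewrite can drop markers simply by dropping words.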

Top-level tags: natural language processing, machine learning
Detailed tags: differential privacy, text rewriting, linguistic style, register analysis, privacy-preserving NLP

Differentially-Private Text Rewriting reshapes Linguistic Style


1️⃣ One-sentence summary

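This paper finds that applying differential privacy to text preserves semantic content and grammatical fluency but systematically erases the stylistic features of human writing that signal interactivity, contextual reference, and complex sentence structure, so that all text converges toward a single non-involved, non-persuasive register.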

Source: arXiv: 2604.26656