菜单

🤖 系统
📄 Abstract - Where Culture Fades: Revealing the Cultural Gap in Text-to-Image Generation

Multilingual text-to-image (T2I) models have advanced rapidly in terms of visual realism and semantic alignment, and are now widely utilized. Yet outputs vary across cultural contexts: because language carries cultural connotations, images synthesized from multilingual prompts should preserve cross-lingual cultural consistency. We conduct a comprehensive analysis showing that current T2I models often produce culturally neutral or English-biased results under multilingual prompts. Analyses of two representative models indicate that the issue stems not from missing cultural knowledge but from insufficient activation of culture-related representations. We propose a probing method that localizes culture-sensitive signals to a small set of neurons in a few fixed layers. Guided by this finding, we introduce two complementary alignment strategies: (1) inference-time cultural activation that amplifies the identified neurons without backbone fine-tuned; and (2) layer-targeted cultural enhancement that updates only culturally relevant layers. Experiments on our CultureBench demonstrate consistent improvements over strong baselines in cultural consistency while preserving fidelity and diversity.

顶级标签: multi-modal natural language processing model evaluation
详细标签: text-to-image cultural bias multilingual models model alignment representation activation 或 搜索:

文化褪色之处:揭示文本到图像生成中的文化鸿沟 / Where Culture Fades: Revealing the Cultural Gap in Text-to-Image Generation


1️⃣ 一句话总结

这篇论文发现,当前的多语言文本生成图像模型在处理不同语言提示时,常常产生文化中立或偏向英语文化的结果,其根源在于模型内部文化相关表征未被充分激活,而非缺乏文化知识;为此,作者提出了一种定位文化敏感神经元的方法,并设计了两种无需全面微调模型的策略来增强生成图像的文化一致性。


📄 打开原文 PDF