arXiv submission date: 2026-04-30
📄 Abstract - Language Ideologies in a Multilingual Society: An LLM-based Analysis of Luxembourgish News Comments

Detecting language ideologies is a valuable yet complex task for understanding how identities are constructed through discourse. In Luxembourg's multicultural and multilingual society, language ideologies reflect more than simple preferences: they carry deep cultural and social meanings, shaping identities and social belonging. Following recent developments in applying Natural Language Processing tools to linguistics and social science, this paper explores the potential of large language models to assist in the detection of language ideologies. We manually annotate a corpus of user comments in Luxembourgish with predefined ideological categories and then evaluate the performance of large language models under varying prompt conditions to assess their ability to replicate these human annotations. Since Luxembourgish is a small language and poorly represented in the LLMs' training data, we also investigate whether machine-translating the data to high-resource languages increases performance on the ideology detection task. Our findings suggest that, while LLMs are not yet fully optimized for a multi-class ideological annotation task, they are practical tools to identify language ideological content.

Top-level tags: llm natural language processing multilingual
Detailed tags: language ideology annotation multilingual society luxembourgish ideology detection

Language Ideologies in a Multilingual Society: An LLM-based Analysis of Luxembourgish News Comments


1️⃣ One-sentence summary

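This study uses large language models to automatically identify the language ideologies implicit in news comments from a multilingual society (Luxembourg), and finds that although the models are not yet optimal on the multi-class annotation task, they can already effectively pick out texts that contain ideological content, with performance improving somewhat when the low-resource Luxembourgish is translated into high-resource languages.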
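
The contrast between multi-class annotation and simply flagging ideological content can be illustrated with a small agreement calculation. The following is a minimal sketch, not the paper's evaluation code: the label names (`cat_a`, `cat_b`, `none`), the toy annotations, and the helper functions are all hypothetical, and the choice of accuracy and Cohen's kappa as metrics is an assumption.

```python
from collections import Counter


def accuracy(gold, pred):
    """Share of items where the model label equals the human label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohen_kappa(gold, pred):
    """Chance-corrected agreement between two label sequences."""
    n = len(gold)
    observed = accuracy(gold, pred)
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Agreement expected if both annotators labeled at random
    # according to their own label frequencies.
    expected = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label throughout
    return (observed - expected) / (1 - expected)


def to_binary(labels, none_label="none"):
    """Collapse every ideological category into a single 'ideological' label."""
    return ["ideological" if label != none_label else none_label for label in labels]


# Hypothetical toy annotations (not data from the paper).
human = ["cat_a", "none", "cat_b", "none", "cat_a", "cat_b"]
model = ["cat_b", "none", "cat_b", "none", "cat_b", "cat_a"]

multi_acc = accuracy(human, model)
multi_kappa = cohen_kappa(human, model)
binary_acc = accuracy(to_binary(human), to_binary(model))
binary_kappa = cohen_kappa(to_binary(human), to_binary(model))

print(f"multi-class: accuracy={multi_acc:.2f}, kappa={multi_kappa:.2f}")
print(f"binary:      accuracy={binary_acc:.2f}, kappa={binary_kappa:.2f}")
```

In this toy case the model confuses the two ideological categories with each other but never mistakes an ideological comment for a neutral one, so agreement is low on the multi-class task and perfect once the labels are collapsed to a binary "ideological or not" decision.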

Source: arXiv: 2604.27661