arXiv submission date: 2026-03-26
📄 Abstract - Approaches to Analysing Historical Newspapers Using LLMs

This study presents a computational analysis of the Slovene historical newspapers Slovenec and Slovenski narod from the sPeriodika corpus, combining topic modelling, large language model (LLM)-based aspect-level sentiment analysis, entity-graph visualisation, and qualitative discourse analysis to examine how collective identities, political orientations, and national belonging were represented in public discourse at the turn of the twentieth century. Using BERTopic, we identify major thematic patterns and show both shared concerns and clear ideological differences between the two newspapers, reflecting their conservative-Catholic and liberal-progressive orientations. We further evaluate four instruction-following LLMs for targeted sentiment classification in OCR-degraded historical Slovene and select the Slovene-adapted GaMS3-12B-Instruct model as the most suitable for large-scale application, while also documenting important limitations, particularly its stronger performance on neutral sentiment than on positive or negative sentiment. Applied at dataset scale, the model reveals meaningful variation in the portrayal of collective identities, with some groups appearing predominantly in neutral descriptive contexts and others more often in evaluative or conflict-related discourse. We then create NER graphs to explore the relationships between collective identities and places. We apply a mixed methods approach to analyse the named entity graphs, combining quantitative network analysis with critical discourse analysis. The investigation focuses on the emergence and development of intertwined historical political and socionomic identities. Overall, the study demonstrates the value of combining scalable computational methods with critical interpretation to support digital humanities research on noisy historical newspaper data.

Top-level tags: llm, natural language processing, data
Detailed tags: historical text analysis, sentiment analysis, topic modeling, named entity recognition, digital humanities

Approaches to Analysing Historical Newspapers Using LLMs


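1️⃣ One-sentence summary

This study combines topic modelling, LLM-based sentiment analysis, entity-relationship graphs, and other computational methods with qualitative analysis to show how two Slovene newspapers presented collective identities and political orientations at the turn of the twentieth century, and it evaluates which large language model is best suited to analysing historical texts.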

Source: arXiv: 2603.25051