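Introducing corpora Hlava Cor and Hlava AD: Human Label Variation in Coreference and Discourse Relations
1️⃣ One-sentence summary
Using two Czech corpora, each annotated independently by several annotators, the paper shows that subjective differences in interpretation are pervasive when humans annotate coreference and discourse relations, and finds that coreference cases on which automatic models disagree also tend to be more ambiguous for human annotators.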
As previous research on annotator disagreement in discourse phenomena has shown, understanding text coherence varies considerably from one individual to another. To explore this phenomenon, we created two corpora with multiple annotations of Czech texts, accompanied by annotators' explanations of their choices. The first corpus consists of 1,024 contexts annotated in parallel by three annotators. It captures differences in the identification of coreference across various text types and grammatical-semantic categories, including pronouns, full noun phrases, and anaphoric adverbials. The second corpus comprises 512 contexts, annotated in parallel by five annotators, and focuses on identifying discourse relations in attributive and non-attributive constructions. Both corpora achieve a comparable inter-annotator agreement of approximately 60-65%. For coreference annotation, agreement tends to be lower in cases where automatic coreference resolution models disagree, suggesting that when the models disagree, the examples tend to be more difficult or ambiguous for human annotators to interpret. The annotators' comments, both for coreference and discourse relations, further reveal differences in interpretation, varying levels of confidence in text understanding, and individual reading strategies.
Source: arXiv: 2606.25383