arXiv submission date: 2026-05-26
📄 Abstract - Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora

Annotation quality is difficult to sustain when campaigns span weeks or months with small annotator pools. We present a Setswana sentiment dataset of 3,565 tweets annotated by three native-speaker annotators across eight batches and examine why inter-annotator agreement (IAA) declines over time. Despite an aggregate Randolph's free-marginal kappa of $\kappa = 0.76$ (rated "excellent"), per-batch $\kappa$ falls by more than 32 points over the course of the annotation task. Through six targeted analyses, we find that (i) label confusion concentrates on the negative/neutral boundary, (ii) two annotators show run-length drift consistent with autopilot labeling, and (iii) the dominant predictor of $\kappa$ is temporal simultaneity: tweets labeled within one minute achieve $\kappa = 0.98$, while those labeled more than a day apart reach only $\kappa = 0.65$. Annotation speed and tweet-level linguistic features show no meaningful association with $\kappa$. We benchmark three open multilingual encoders and proprietary models (GPT-5 and Gemini) on three-class sentiment classification; fine-tuning yields gains of 29 to 43 macro-F1 points over pretrained baselines, with GPT-5 few-shot leading overall (62.2 macro-F1). We release the dataset, per-annotation timestamps, and analysis code to support reproducible quality auditing for future African language NLP resources.
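
As a rough illustration of the two quantities the abstract relies on, the sketch below computes Randolph's free-marginal multirater kappa, $\kappa = (P_o - 1/k)/(1 - 1/k)$ for $k$ categories, and then splits items by how far apart in time their annotations were made. This is not the authors' released analysis code: the function names (`randolph_kappa`, `kappa_by_spread`), the `(labels, timestamps)` record layout, and the single two-way threshold are assumptions made for illustration, and the paper may bucket time gaps differently.

```python
from collections import Counter
from datetime import timedelta


def randolph_kappa(items, num_categories):
    """Randolph's free-marginal multirater kappa.

    items: list of per-item label lists; every item must have the same
    number of raters (at least 2). Hypothetical helper, not the paper's code.
    """
    if not items:
        raise ValueError("need at least one item")
    n_raters = len(items[0])
    if n_raters < 2 or any(len(labels) != n_raters for labels in items):
        raise ValueError("every item needs the same number (>= 2) of labels")

    # Observed agreement: share of agreeing ordered rater pairs, averaged over items.
    pair_total = n_raters * (n_raters - 1)
    p_obs = sum(
        sum(c * (c - 1) for c in Counter(labels).values()) / pair_total
        for labels in items
    ) / len(items)

    # Free-marginal chance agreement: every category is assumed equally likely.
    p_exp = 1.0 / num_categories
    return (p_obs - p_exp) / (1.0 - p_exp)


def kappa_by_spread(records, num_categories, threshold=timedelta(minutes=1)):
    """Kappa for items whose annotations fall within `threshold` vs. beyond it.

    records: list of (labels, timestamps) pairs, one per item (assumed layout).
    Empty buckets are omitted from the result.
    """
    buckets = {"within": [], "beyond": []}
    for labels, stamps in records:
        spread = max(stamps) - min(stamps)  # gap between first and last annotation
        key = "within" if spread <= threshold else "beyond"
        buckets[key].append(labels)
    return {
        name: randolph_kappa(group, num_categories)
        for name, group in buckets.items()
        if group
    }
```

For instance, with three raters and three classes, `randolph_kappa([["pos", "pos", "pos"], ["neg", "neg", "neu"]], 3)` evaluates to about 0.5: the first item has full agreement, the second has one agreeing pair out of three, so $P_o = 2/3$ against a chance level of $1/3$.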

Top-level tags: natural language processing, data, benchmark
Detailed tags: annotation quality, sentiment analysis, inter-annotator agreement, setswana, temporal analysis

Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora


1️⃣ One-sentence summary

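This paper finds that in long-running annotation campaigns with small teams, the main driver of declining annotation quality is a lack of temporal simultaneity: samples that annotators label within the same time window show very high agreement, while samples labeled more than a day apart show markedly lower agreement, which offers a simple and effective predictor for monitoring and improving annotation quality.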

Source: arXiv: 2605.27239