An Annotation Scheme and Classifier for Personal Facts in Dialogue
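1️⃣ One-Sentence Summary
This paper proposes an improved annotation scheme for personal facts in dialogue (adding categories such as Demographics and Possessions, and attributes such as Duration and Validity) and uses it to train a lightweight multi-head classifier that identifies users' personal facts more accurately and more efficiently than existing LLM-based methods.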
The advancement of Large Language Models (LLMs) has enabled their application in personalized dialogue systems. We present an extended annotation scheme for personal fact classification that addresses limitations in existing approaches, particularly PeaCoK. Our scheme introduces new categories (Demographics, Possessions) and attributes (Duration, Validity, Followup) that enable structured storage, quality filtering, and identification of facts suitable for dialogue continuation. We manually annotated 2,779 facts from Multi-Session Chat and trained a multi-head classifier based on transformer encoders. Combined with the Gemma-300M encoder, the classifier achieves 81.6 ± 2.6% macro F1, outperforming all few-shot LLM baselines (best: GPT-5.4-mini, 72.92%) by nearly 9 percentage points while requiring substantially fewer computational resources. Error analysis reveals persistent challenges in semantic boundary disambiguation, temporal aspect interpretation, and pragmatic reasoning for followup assessment. The dataset and classifier are publicly available.
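The headline metric above is macro F1, which averages per-class F1 with equal weight so that rare classes count as much as frequent ones. The following is a minimal sketch of that computation for a single classification head, not the authors' evaluation code. The function name `macro_f1`, the toy labels, and the `Other` class are illustrative assumptions, and the abstract does not say how scores are aggregated across heads.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0  # no examples: define the score as 0 rather than divide by zero

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); a class with no hits and no misses scores 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data for one head (hypothetical labels; "Other" is an assumed catch-all class).
gold = ["Demographics", "Demographics", "Possessions",
        "Possessions", "Possessions", "Other"]
pred = ["Demographics", "Possessions", "Possessions",
        "Possessions", "Other", "Other"]

score = macro_f1(gold, pred)
print(f"macro F1 = {score:.4f}")
```

In this toy case each of the three classes has a per-class F1 of 2/3, so the macro average is also 2/3. With imbalanced classes, macro F1 can differ substantially from accuracy, which is one reason it is a common choice for multi-class annotation tasks.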
Source: arXiv: 2605.10339