Not All Subjectivity Is the Same! Defining Desiderata for the Evaluation of Subjectivity in NLP
1️⃣ One-sentence summary
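This position paper proposes seven desiderata for evaluating subjectivity-sensitive NLP models. The goal is to ensure that evaluation methods actually reflect diverse perspectives and account for the impact on users. The paper also identifies where current research still falls short, for example in distinguishing ambiguous from polyphonic input and in checking whether subjectivity is effectively expressed to the user.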
Subjective judgments are part of several NLP datasets, and recent work increasingly prioritizes models whose outputs reflect this diversity of perspectives. Such outputs allow us to shed light on minority voices, which are frequently marginalized or obscured by dominant perspectives. It remains an open question whether our evaluation practices align with these models' objectives. This position paper proposes seven evaluation desiderata for subjectivity-sensitive models, rooted in how subjectivity is represented in NLP data and models. The desiderata are constructed top-down, with the user-centric impact of such models in mind. We survey the experimental setups of 60 papers and show that various aspects of subjectivity remain understudied: the distinction between ambiguous and polyphonic input, whether subjectivity is effectively expressed to the user, and the interplay between different desiderata, among other gaps.
Source: arXiv: 2603.28351