Abstract - AI evaluation may bias perceptions: The importance of context in interpreting academic writing
This paper examines how estimates of AI use in scientific writing can be biased when evaluation methods ignore contextual differences across countries and fields. Using large-scale data on journal publications from Dimensions, we construct AI-likeness benchmarks based on differences between human-written and LLM-rephrased abstracts. We show that a pooled benchmark may confound pre-existing stylistic variation with AI-generated text, producing substantial distortions across country-field groups even in pre-LLM publications. In contrast, country-field-specific benchmarks attenuate such distortions and provide a more credible baseline for comparison. Applying these methods to publications in 2025 reveals that the pooled benchmark systematically overestimates AI use in certain countries and fields while underestimating it in others. These findings highlight the importance of context-aware measurement for accurate and equitable evaluation of AI use in science.
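The contrast between a pooled and a country-field-specific benchmark can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the paper's actual method: it assumes AI-likeness is summarized by the frequency of a single marker word, that observed text is a simple mixture of human-written and LLM-rephrased styles, and that the pooled benchmark is an unweighted average across groups. All group names and rates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    human_rate: float  # marker-word frequency in human-written abstracts
    llm_rate: float    # marker-word frequency in LLM-rephrased abstracts


def estimate_ai_share(observed_rate: float, bench: Benchmark) -> float:
    """Solve observed = (1 - a) * human_rate + a * llm_rate for a."""
    return (observed_rate - bench.human_rate) / (bench.llm_rate - bench.human_rate)


def pooled_benchmark(benchmarks: list[Benchmark]) -> Benchmark:
    """Unweighted average of the group benchmarks (an illustrative assumption)."""
    n = len(benchmarks)
    return Benchmark(
        human_rate=sum(b.human_rate for b in benchmarks) / n,
        llm_rate=sum(b.llm_rate for b in benchmarks) / n,
    )


# Hypothetical country-field groups with different pre-existing styles.
groups = {
    "country_A/field_X": Benchmark(human_rate=0.010, llm_rate=0.050),
    "country_B/field_Y": Benchmark(human_rate=0.030, llm_rate=0.070),
}
pooled = pooled_benchmark(list(groups.values()))

# Pre-LLM corpus: each group's observed rate equals its own human rate,
# so the true AI share is zero everywhere.
pooled_estimates = {
    name: estimate_ai_share(b.human_rate, pooled) for name, b in groups.items()
}
specific_estimates = {
    name: estimate_ai_share(b.human_rate, b) for name, b in groups.items()
}

for name in groups:
    print(name, round(pooled_estimates[name], 3), round(specific_estimates[name], 3))
```

In this toy setup the group-specific benchmark returns zero for pre-LLM text in both groups, while the pooled benchmark assigns a negative share to one group and a positive share to the other. The pooled error comes purely from the stylistic difference between the groups, which is the kind of distortion the abstract describes.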
AI evaluation may bias perceptions: The importance of context in interpreting academic writing
1️⃣ One-sentence summary
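The study finds that, when assessing whether academic papers were written with AI assistance, ignoring the stylistic differences inherent to different countries and disciplines (such as language habits and writing traditions) leads to misjudging the extent of AI use, overestimating it in some fields and underestimating it in others, and it therefore calls for country-field-specific evaluation benchmarks to obtain fairer results.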