arXiv submission date: 2026-02-19
📄 Abstract - Toward Trustworthy Evaluation of Sustainability Rating Methodologies: A Human-AI Collaborative Framework for Benchmark Dataset Construction

Sustainability or ESG rating agencies use company disclosures and external data to produce scores or ratings that assess the environmental, social, and governance performance of a company. However, sustainability ratings across agencies for a single company vary widely, limiting their comparability, credibility, and relevance to decision-making. To harmonize the rating results, we propose a universal human-AI collaboration framework for generating trustworthy benchmark datasets to evaluate sustainability rating methodologies. The framework comprises two complementary parts: STRIDE (Sustainability Trust Rating & Integrity Data Equation) provides principled criteria and a scoring system that guide the construction of firm-level benchmark datasets using large language models (LLMs), and SR-Delta, a discrepancy-analysis procedure, surfaces insights for potential adjustments to rating methodologies. The framework enables scalable and comparable assessment of sustainability rating methodologies. We call on the broader AI community to adopt AI-powered approaches to strengthen and advance sustainability rating methodologies in support of urgent sustainability agendas.
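The abstract names the two components but not their internals, so the following is a minimal, hypothetical Python sketch of how such a pipeline could fit together: a STRIDE-style step that aggregates per-criterion trust/integrity scores for one firm-level benchmark entry (here, a weighted mean), and an SR-Delta-style step that surfaces cross-agency rating discrepancies against a simple consensus. Every criterion name, weight, and formula below is an illustrative assumption, not the authors' method.

```python
# Hypothetical sketch of the two-part workflow described in the abstract.
# STRIDE's actual criteria/weights and SR-Delta's procedure are not
# specified there; all names and formulas here are illustrative assumptions.
from statistics import mean, pstdev


def stride_score(criteria_scores: dict[str, float],
                 weights: dict[str, float]) -> float:
    """Aggregate per-criterion trust/integrity scores for one firm-level
    benchmark entry into a single score (weighted mean, assumed)."""
    total_weight = sum(weights.values())
    return sum(criteria_scores[c] * w for c, w in weights.items()) / total_weight


def sr_delta(agency_ratings: dict[str, float]) -> dict:
    """Summarize rating discrepancies for one firm across agencies:
    a consensus value, the spread, and each agency's deviation."""
    values = list(agency_ratings.values())
    consensus = mean(values)
    return {
        "consensus": consensus,
        "spread": pstdev(values),
        "deviations": {a: r - consensus for a, r in agency_ratings.items()},
    }


if __name__ == "__main__":
    # Hypothetical LLM-assisted criterion scores for one benchmark entry.
    entry = {"source_traceability": 0.9, "disclosure_coverage": 0.7,
             "temporal_consistency": 0.8}
    weights = {"source_traceability": 0.5, "disclosure_coverage": 0.3,
               "temporal_consistency": 0.2}
    print(f"STRIDE score: {stride_score(entry, weights):.2f}")

    # Normalized 0-1 ratings for the same firm from three agencies.
    print(sr_delta({"agency_A": 0.72, "agency_B": 0.41, "agency_C": 0.58}))
```

In a real pipeline, the per-criterion scores would presumably come from LLM-assisted review of company disclosures, with human experts defining the criteria and auditing the outputs, per the paper's human-AI collaboration framing.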

Top tags: llm data benchmark
Detailed tags: sustainability rating esg human-ai collaboration dataset construction evaluation framework

Toward Trustworthy Evaluation of Sustainability Rating Methodologies: A Human-AI Collaborative Framework for Benchmark Dataset Construction


1️⃣ One-sentence summary

To address the problem that corporate sustainability (ESG) ratings from different agencies diverge widely and are hard to compare, this paper proposes a human-AI collaborative framework that combines human-expert principles with the capabilities of large language models (LLMs), aiming to construct trustworthy benchmark datasets efficiently and at scale, and thereby to evaluate and improve sustainability rating methodologies more reliably.

Source: arXiv:2602.17106