arXiv submission date: 2026-04-09
📄 Abstract - MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems

Industry classification schemes are integral parts of public and corporate databases as they classify businesses based on economic activity. Due to the size of the company registers, manual annotation is costly, and fine-tuning models with every update in industry classification schemes requires significant data collection. We replicate the manual expert verification by using existing or easily retrievable multimodal resources for industry classification. We present MONETA, the first multimodal industry classification benchmark with text (Website, Wikipedia, Wikidata) and geospatial sources (OpenStreetMap and satellite imagery). Our dataset enlists 1,000 businesses in Europe with 20 economic activity labels according to EU guidelines (NACE). Our training-free baseline reaches 62.10% and 74.10% with open and closed-source Multimodal Large Language Models (MLLM). We observe an increase of up to 22.80% with the combination of multi-turn design, context enrichment, and classification explanations. We will release our dataset and the enhanced guidelines.

Top-level tags: multi-modal, agents, benchmark
Detailed tags: industry classification, multimodal llm, geospatial data, multi-agent systems, data enrichment

MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems


1️⃣ One-sentence summary

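This paper presents MONETA, a multimodal industry classification benchmark that combines text sources (company websites, Wikipedia, Wikidata) with geospatial sources (OpenStreetMap and satellite imagery) to classify European businesses automatically, without extensive manual annotation or model retraining; combining multi-turn design, context enrichment, and classification explanations improves on the training-free baseline by up to 22.80%.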

Source: arXiv: 2604.07956