arXiv submission date: 2026-03-11
📄 Abstract - Catalogue Grounded Multimodal Attribution for Museum Video under Resource and Regulatory Constraints

Audiovisual (AV) archives in museums and galleries are growing rapidly, but much of this material remains effectively locked away because it lacks consistent, searchable metadata. Existing archiving methods require extensive manual effort. We address this by automating the most labour-intensive part of the workflow: catalogue-style metadata curation for in-gallery video, grounded in an existing collection database. Concretely, we propose catalogue-grounded multimodal attribution for museum AV content using an open, locally deployable video language model. We design a multi-pass pipeline that (i) summarises artworks in a video, (ii) generates catalogue-style descriptions and genre labels, and (iii) attempts to attribute title and artist via conservative similarity matching to the structured catalogue. Early deployments on a painting catalogue suggest that this framework can improve AV archive discoverability while respecting resource constraints, data sovereignty, and emerging regulation, offering a transferable template for application-driven machine learning in other high-stakes domains.

Top-level tags: multi-modal systems, model evaluation
Detailed tags: video-language model, metadata generation, museum archives, similarity matching, automated cataloging

Catalogue Grounded Multimodal Attribution for Museum Video under Resource and Regulatory Constraints


1️⃣ One-sentence summary

This paper proposes an automated method based on a locally deployed video language model: a multi-step pipeline generates descriptions, genre labels, and artist attributions for museum videos, linked to an existing collection catalogue, with the aim of improving the searchability of AV archives at low cost and in a regulation-compliant way.

Source: arXiv:2603.11147