arXiv submission date: 2026-04-05
📄 Abstract - Detecting Media Clones in Cultural Repositories Using a Positive Unlabeled Learning Approach

We formulate curator-in-the-loop duplicate discovery in the AtticPOT repository as a Positive-Unlabeled (PU) learning problem. Given a single anchor per artefact, we train a lightweight per-query Clone Encoder on augmented views of the anchor and score the unlabeled repository with an interpretable threshold on the latent ℓ2 norm. The system proposes candidates for curator verification, uncovering cross-record duplicates that had not been verified a priori. On CIFAR-10 we obtain F1=96.37 (AUROC=97.97); on AtticPOT we reach F1=90.79 (AUROC=98.99), an improvement of 7.70 F1 points over the best baseline (SVDD) with the same lightweight backbone. Qualitative "find-similar" panels show stable neighbourhoods across viewpoint and condition. The method avoids explicit negatives, offers a transparent operating point, and fits de-duplication, record linkage, and curator-in-the-loop workflows.
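The scoring and evaluation steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the Clone Encoder's architecture and training objective are not given here, so the encoder in the usage example is a hypothetical stand-in (the offset of an item from the anchor), and we assume that a *small* latent ℓ2 norm marks a likely clone, in the spirit of the SVDD baseline. The function names `propose_candidates`, `f1_score`, and `auroc` are illustrative only.

```python
import math
from typing import Callable, List, Sequence, Set

Vector = Sequence[float]


def l2_norm(z: Vector) -> float:
    """Euclidean (l2) norm of a latent vector."""
    return math.sqrt(sum(v * v for v in z))


def propose_candidates(
    encode: Callable[[Vector], Vector],
    repository: Sequence[Vector],
    tau: float,
) -> List[int]:
    """Indices of unlabeled items whose latent l2 norm is at most tau.

    ASSUMPTION: a small norm means "clone of the anchor"; the paper's
    encoder may use a different orientation or scale.
    """
    return [i for i, x in enumerate(repository) if l2_norm(encode(x)) <= tau]


def f1_score(predicted: Set[int], actual: Set[int]) -> float:
    """F1 of the proposed candidates against curator-verified clones."""
    tp = len(predicted & actual)
    if tp == 0:  # also covers an empty candidate list
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


def auroc(norms: Sequence[float], actual: Set[int]) -> float:
    """Probability that a true clone has a smaller norm than a non-clone.

    Ties count as 0.5; assumes at least one clone and one non-clone.
    """
    pos = [n for i, n in enumerate(norms) if i in actual]
    neg = [n for i, n in enumerate(norms) if i not in actual]
    wins = sum(1.0 if p < q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    anchor = [1.0, 2.0]

    def encode(x: Vector) -> List[float]:
        # HYPOTHETICAL stand-in for the trained Clone Encoder: offset from the anchor.
        return [a - b for a, b in zip(x, anchor)]

    repository = [[1.0, 2.1], [5.0, 5.0], [0.9, 2.0], [1.0, -3.0]]
    actual = {0, 2, 3}  # toy curator-verified clones; item 3 is missed by this encoder

    candidates = propose_candidates(encode, repository, tau=0.5)
    norms = [l2_norm(encode(x)) for x in repository]
    print(candidates)                                   # [0, 2]
    print(round(f1_score(set(candidates), actual), 4))  # 0.8
    print(round(auroc(norms, actual), 4))               # 0.8333
```

Because τ is a distance in latent space, a curator can read the operating point directly and move it to trade recall for review effort; AUROC, by contrast, summarizes ranking quality over all thresholds. The abstract reports both metrics in percent (F1=90.79 corresponds to 0.9079 here), and its 7.70-point gain implies an SVDD baseline F1 of 90.79 − 7.70 = 83.09 on AtticPOT.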

Top-level tags: machine learning, data systems
Detailed tags: positive unlabeled learning, duplicate detection, cultural heritage, representation learning, curator-in-the-loop



1️⃣ One-sentence summary

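This paper proposes a method that helps museum and archive curators find duplicate or near-identical items in large digital collections: from a single reference example supplied by the curator, it learns what counts as a clone and proposes likely duplicates for manual confirmation, reducing the manual effort of de-duplication.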

Source: arXiv 2604.04071