Transcoda: End-to-End Zero-Shot Optical Music Recognition via Data-Centric Synthetic Training
1️⃣ One-Sentence Summary
This paper proposes an optical music recognition system named Transcoda. By combining advanced synthetic data generation, encoding normalization, and grammar-constrained decoding, a small model trained in just 6 hours on a single GPU substantially outperforms existing large models in recognition accuracy on both synthetic scores and historical scanned scores.
Optical Music Recognition (OMR), the task of transcribing sheet music into a structured textual representation, is currently bottlenecked by a lack of large-scale, annotated datasets of real scans. This forces models to rely on either few-shot transfer or synthetic training pipelines that remain overly simplistic. A secondary challenge is encoding non-uniqueness: in the popular Humdrum **kern format for transcribing music, multiple different text encodings can render to the same visual sheet music. This one-to-many mapping creates a harder learning task and introduces high uncertainty during decoding. We propose Transcoda, an OMR system built on (i) an advanced synthetic data generation pipeline, (ii) a normalization of the **kern encoding to enforce a unique normal form, and (iii) grammar-based decoding to ensure the syntactic correctness of the output. This approach allows us to train a compact 59M-parameter model in just 6 hours on a single GPU that outperforms billion-parameter baselines. Transcoda achieves the best score among state-of-the-art baselines on a newly curated benchmark of synthetically rendered scores at 18.46% OMR-NED (compared to 43.91% for the next-best system, Legato) and reduces the error rate on historical Polish scans to 63.97% OMR-NED (down from 80.16% for SMT++).
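The encoding non-uniqueness described above can be illustrated with a toy normalizer (a hedged sketch under simplifying assumptions, not the paper's actual normalization scheme). Humdrum **kern allows the signifiers inside one data token to appear in flexible order, so a fixed canonical ordering of signifier classes maps visually equivalent encodings to a single normal form:

```python
# Toy **kern token normalizer (illustrative only; the real format has
# many more signifier classes than the four handled here).
# Hypothetical canonical order: duration, pitch letters, accidentals,
# then any remaining signifiers (articulations, beams, etc.).
def normalize_kern_token(token: str) -> str:
    duration   = "".join(c for c in token if c.isdigit() or c == ".")
    pitch      = "".join(c for c in token if c in "abcdefgABCDEFGr")
    accidental = "".join(c for c in token if c in "#-n")
    rest       = "".join(c for c in token
                         if c not in "0123456789.abcdefgABCDEFGr#-n")
    return duration + pitch + accidental + rest

# "4.cc#" and "cc#4." render to the same note (dotted-quarter c#5),
# but are distinct strings; both normalize to one form.
print(normalize_kern_token("4.cc#"))
print(normalize_kern_token("cc#4."))
```

With such a normal form applied to the training targets, the model learns a one-to-one mapping from image to encoding instead of a one-to-many one, which is the motivation the abstract gives for step (ii).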
Source: arXiv:2605.10835