
📄 Abstract - NVIDIA Nemotron Parse 1.1

We introduce Nemotron-Parse-1.1, a lightweight document parsing and OCR model that advances the capabilities of its predecessor, Nemoretriever-Parse-1.0. Nemotron-Parse-1.1 delivers improved capabilities across general OCR, markdown formatting, structured table parsing, and text extraction from pictures, charts, and diagrams. It also supports a longer output sequence length for visually dense documents. As with its predecessor, it extracts bounding boxes of text segments, as well as corresponding semantic classes. Nemotron-Parse-1.1 follows an encoder-decoder architecture with 885M parameters, including a compact 256M-parameter language decoder. It achieves competitive accuracy on public benchmarks, making it a strong lightweight OCR solution. We release the model weights publicly on Hugging Face, as well as an optimized NIM container, along with a subset of the training data as part of the broader Nemotron-VLM-v2 dataset. Additionally, we release Nemotron-Parse-1.1-TC, which operates on a shorter vision token sequence, offering a 20% speed improvement with minimal quality degradation.
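A quick breakdown of the parameter budget stated in the abstract. This assumes the 885M total includes the 256M decoder and that the remainder sits in the vision encoder and any connecting layers; the abstract does not give that split explicitly.

```latex
\underbrace{885\,\text{M}}_{\text{total}} - \underbrace{256\,\text{M}}_{\text{decoder}} = 629\,\text{M},
\qquad \frac{256}{885} \approx 0.29
```

So the language decoder accounts for roughly 29% of the parameters, and most of the capacity lies on the vision side.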

Top-level tags: computer vision, natural language processing, model training
Detailed tags: document parsing, OCR, multimodal transformer, table extraction, token compression

NVIDIA Nemotron-Parse 1.1: A Lightweight Document Parsing and OCR Model / NVIDIA Nemotron Parse 1.1


1️⃣ One-Sentence Summary

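NVIDIA Nemotron-Parse 1.1 is an 885M-parameter lightweight document parsing and OCR model that substantially improves on its predecessor in general OCR, Markdown formatting, structured table parsing, and text extraction from pictures and charts. Its token-compression variant delivers a 20% speed improvement.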
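The abstract says Nemotron-Parse-1.1-TC operates on a shorter vision token sequence but does not say how the reduction is performed. The sketch below uses 2x2 average pooling over the encoder's token grid purely as a hypothetical stand-in. The function name `compress_vision_tokens`, the row-major grid layout, and the pooling factor are all assumptions, not the paper's method.

```python
import numpy as np


def compress_vision_tokens(
    tokens: np.ndarray, grid_h: int, grid_w: int, factor: int = 2
) -> np.ndarray:
    """Hypothetical token compression: average-pool a flattened token grid.

    tokens: (grid_h * grid_w, dim) array of vision tokens in row-major order.
    Returns a (grid_h // factor * grid_w // factor, dim) array.
    """
    n, dim = tokens.shape
    if n != grid_h * grid_w:
        raise ValueError("token count does not match grid size")
    if grid_h % factor or grid_w % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    grid = tokens.reshape(grid_h, grid_w, dim)
    # Split each spatial axis into (blocks, factor) and average inside
    # every factor x factor window.
    pooled = grid.reshape(
        grid_h // factor, factor, grid_w // factor, factor, dim
    ).mean(axis=(1, 3))
    return pooled.reshape(-1, dim)


# Usage: a 4x4 grid of 3-dim tokens shrinks to a 2x2 grid (16 -> 4 tokens).
tokens = np.arange(48, dtype=float).reshape(16, 3)
print(compress_vision_tokens(tokens, 4, 4).shape)
```

With `factor=2` the decoder would cross-attend to a quarter as many vision tokens. How that translates into end-to-end speed depends on the rest of the pipeline, so the 20% figure reported for Nemotron-Parse-1.1-TC should not be inferred from this sketch.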


2️⃣ Key Innovations

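1. Lightweight document parsing model

2. Decoder without positional embeddings

3. Multi-token inference

4. NVpdftex data generation pipeline

5. Multi-format data augmentation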
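The second innovation, a decoder without positional embeddings, is listed without detail. A minimal sketch, assuming the standard "NoPE" idea: no positional vector is added to the token embeddings, and order information reaches the model only through the causal mask. The function name and the single-head, NumPy-only setup are hypothetical simplifications; this is not the model's actual decoder code.

```python
import numpy as np


def causal_self_attention_nope(
    x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray
) -> np.ndarray:
    """Single-head causal self-attention with no positional embeddings.

    x: (seq_len, d_model) token embeddings; note that nothing encoding
    position is added to x before the projections.
    """
    seq_len = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Causal mask: position i may attend only to positions j <= i.
    # This asymmetry is the only source of order information.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over each row.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((5, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
print(causal_self_attention_nope(x, w_q, w_k, w_v).shape)
```

Because the first position can only attend to itself, its output is just its own value vector, and later positions see a progressively longer prefix. That is how a causal decoder can infer position without explicit embeddings. A full encoder-decoder layer would also cross-attend to the vision encoder's tokens, which this sketch omits.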


3️⃣ Main Results and Value

Result Highlights

Practical Value


4️⃣ Glossary
