📄 Abstract - NVIDIA Nemotron Nano V2 VL

We introduce Nemotron Nano V2 VL, the latest model of the Nemotron vision-language series designed for strong real-world document understanding, long video comprehension, and reasoning tasks. Nemotron Nano V2 VL delivers significant improvements over our previous model, Llama-3.1-Nemotron-Nano-VL-8B, across all vision and text domains through major enhancements in model architecture, datasets, and training recipes. Nemotron Nano V2 VL builds on Nemotron Nano V2, a hybrid Mamba-Transformer LLM, and innovative token reduction techniques to achieve higher inference throughput in long document and video scenarios. We are releasing model checkpoints in BF16, FP8, and FP4 formats and sharing large parts of our datasets, recipes and training code.

Top-level tags: multi-modal, natural language processing, model training
Detailed tags: vision-language, document understanding, video comprehension, mamba-transformer, token reduction

📄 Paper Summary

NVIDIA Nemotron Nano V2 VL


1️⃣ One-Sentence Summary

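This paper introduces Nemotron Nano V2 VL, NVIDIA's latest vision-language model. Through improvements to the model architecture and training methods, it performs better on document understanding, long-video analysis, and reasoning tasks, while also processing long content more efficiently.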

