QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding
1️⃣ One-Sentence Summary
This paper introduces QCalEval, the first benchmark dedicated to evaluating how well vision-language models (VLMs) understand quantum computing calibration plots. It finds that general-purpose models perform reasonably well zero-shot, that open-weight models degrade under multi-image in-context learning, and that while fine-tuning improves zero-shot performance, it cannot close the gap in multimodal in-context learning.
Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task, yet no systematic evaluation exists of how well vision-language models (VLMs) interpret them. We introduce QCalEval, the first VLM benchmark for quantum calibration plots: 243 samples across 87 scenario types from 22 experiment families, spanning superconducting qubits and neutral atoms, evaluated on six question types in both zero-shot and in-context learning settings. The best general-purpose zero-shot model reaches a mean score of 72.3, and many open-weight models degrade under multi-image in-context learning, whereas frontier closed models improve substantially. A supervised fine-tuning ablation at the 9-billion-parameter scale shows that SFT improves zero-shot performance but cannot close the multimodal in-context learning gap. As a reference case study, we release NVIDIA Ising Calibration 1, an open-weight model based on Qwen3.5-35B-A3B that reaches 74.7 zero-shot average score.
Source: arXiv: 2604.25884