arXiv submission date: 2026-04-30
📄 Abstract - RuC: HDL-Agnostic Rule Completion Benchmark Generation

Large Language Models (LLMs) have rapidly improved in performance across code-related tasks, making their integration into Register Transfer Level (RTL) development increasingly attractive. Mimicking the behavior of inline code assistants, many benchmarks evaluate LLMs' capabilities in code completion, assessing either the generation of entire hardware modules or the completion of a single line within a module. However, neither approach can control the granularity of the code-completion sample or the syntactic range of completions. To overcome these limitations, we present a framework for language-agnostic rule completion (RuC), a grammar-driven, rule-selectable benchmark generator that automatically produces RTL code-completion tasks from a set of input hardware description sources. RuC uses the target Hardware Description Language (HDL) grammar to mask syntactically defined code regions and prompts a model to regenerate them using the surrounding unmasked code as context, enabling a controlled and scalable evaluation of the domain-specific model's code-understanding capabilities, ranging from assignments to the reconstruction of entire logic blocks. We use RuC to generate two SystemVerilog rule-completion benchmarks from the Tiny Tapeout shuttle TT07 and the CVE2 RISC-V core to demonstrate RuC's applicability to a broad range of designs, and conduct a comparative study of the code-completion capabilities of modern open-source LLMs across diverse settings. Results indicate that completion performance strongly depends on the model type, the grammatical structure of the masked region, and the prompting strategy. Specifically, the highest scores are obtained with Fill-in-the-Middle (FIM) prompting. These findings highlight the value of grammar-driven, arbitrarily granular benchmarks for meaningful evaluation of LLM capabilities in RTL development workflows.
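The masking step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a regular expression as a stand-in for a real grammar rule (a grammar-driven generator would take the span from a parse tree), and the names `CompletionTask`, `make_tasks`, and the three FIM sentinel strings are hypothetical placeholders; FIM-capable models each define their own special tokens.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical sentinel strings, used here only for illustration.
# Real FIM-capable models define their own special tokens.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

# Assumption: a regex approximating one grammar rule (a SystemVerilog
# continuous assignment). It stands in for a span selected from a parse tree.
CONTINUOUS_ASSIGN = re.compile(r"assign\s+\w+\s*=\s*[^;]+;")


@dataclass
class CompletionTask:
    prefix: str     # unmasked code before the masked region
    reference: str  # the masked region the model should regenerate
    suffix: str     # unmasked code after the masked region

    def fim_prompt(self) -> str:
        # Prefix-suffix-middle ordering: the model generates after FIM_MIDDLE.
        return f"{FIM_PREFIX}{self.prefix}{FIM_SUFFIX}{self.suffix}{FIM_MIDDLE}"


def make_tasks(source: str, rule: "re.Pattern[str]" = CONTINUOUS_ASSIGN) -> List[CompletionTask]:
    # One task per region matched by the selected rule.
    return [
        CompletionTask(source[: m.start()], m.group(0), source[m.end():])
        for m in rule.finditer(source)
    ]


EXAMPLE = """module and_gate(input logic a, input logic b, output logic y);
  assign y = a & b;
endmodule
"""

tasks = make_tasks(EXAMPLE)
for task in tasks:
    print(task.fim_prompt())
    print("reference:", task.reference)
```

Swapping `CONTINUOUS_ASSIGN` for a pattern that covers a larger construct changes the granularity of the masked region, which is the kind of control the abstract attributes to rule selection.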

Top-level tags: llm systems benchmark
Detailed tags: rtl code completion, hardware description language, grammar-driven benchmark, systemverilog, fill-in-the-middle

RuC: HDL-Agnostic Rule Completion Benchmark Generation


1️⃣ One-sentence summary

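This paper presents RuC, an automated framework that uses the grammar rules of a hardware description language to generate code-completion test tasks of varying granularity from arbitrary HDL code, allowing a finer-grained evaluation of large language models' code-understanding capabilities in register-transfer-level development. Experiments show that model performance depends significantly on factors such as grammatical structure and prompting strategy.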

Source: arXiv: 2604.27780