arXiv submission date: 2026-02-15
📄 Abstract - KernelBlaster: Continual Cross-Task CUDA Optimization via Memory-Augmented In-Context Reinforcement Learning

Optimizing CUDA code across multiple generations of GPU architectures is challenging, as achieving peak performance requires an extensive exploration of an increasingly complex, hardware-specific optimization space. Traditional compilers are constrained by fixed heuristics, whereas finetuning Large Language Models (LLMs) can be expensive. However, agentic workflows for CUDA code optimization have limited ability to aggregate knowledge from prior exploration, leading to biased sampling and suboptimal solutions. We propose KernelBlaster, a Memory-Augmented In-context Reinforcement Learning (MAIC-RL) framework designed to improve CUDA optimization search capabilities of LLM-based GPU coding agents. KernelBlaster enables agents to learn from experience and make systematically informed decisions on future tasks by accumulating knowledge into a retrievable Persistent CUDA Knowledge Base. We propose a novel profile-guided, textual-gradient-based agentic flow for CUDA generation and optimization to achieve high performance across generations of GPU architectures. KernelBlaster guides LLM agents to systematically explore high-potential optimization strategies beyond naive rewrites. Compared to the PyTorch baseline, our method achieves geometric mean speedups of 1.43x, 2.50x, and 1.50x on KernelBench Levels 1, 2, and 3, respectively. We release KernelBlaster as an open-source agentic framework, accompanied by a test harness, verification components, and a reproducible evaluation pipeline.
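The abstract describes accumulating optimization experience in a retrievable Persistent CUDA Knowledge Base so the agent can condition future decisions on past results. The paper does not give an implementation, but the retrieve → optimize → store loop can be sketched roughly as follows; all names (`KnowledgeBase`, `Record`, the toy word-overlap similarity, and the stubbed agent/profiler) are hypothetical illustrations, not KernelBlaster's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    task: str       # natural-language task description
    strategy: str   # optimization strategy that was applied
    speedup: float  # verified speedup over the baseline

@dataclass
class KnowledgeBase:
    """Hypothetical stand-in for a persistent, retrievable knowledge base."""
    records: list = field(default_factory=list)

    def retrieve(self, task: str, k: int = 2) -> list:
        # Toy similarity: count words shared between task descriptions.
        # A real system would use embeddings or structured kernel features.
        def sim(r: Record) -> int:
            return len(set(task.split()) & set(r.task.split()))
        return sorted(self.records, key=sim, reverse=True)[:k]

    def store(self, record: Record) -> None:
        self.records.append(record)

def optimize(task: str, kb: KnowledgeBase) -> Record:
    """One in-context step: retrieve prior experience, condition the
    (stubbed) agent on it, verify, and write the outcome back."""
    examples = kb.retrieve(task)
    if examples:
        # Stub for the LLM agent + profiler: reuse the best retrieved
        # strategy, discounted to reflect imperfect transfer.
        best = max(examples, key=lambda r: r.speedup)
        result = Record(task, best.strategy, best.speedup * 0.9)
    else:
        result = Record(task, "naive rewrite", 1.0)
    kb.store(result)  # experience persists across tasks
    return result

kb = KnowledgeBase()
kb.store(Record("matmul fp16 tensor cores", "tile + shared memory", 2.5))
r = optimize("matmul fp32 large tiles", kb)
print(r.strategy)  # the retrieved tiling strategy is reused
```

The key property this loop illustrates is cross-task continuity: each optimization episode enriches the knowledge base, so later retrievals bias the agent toward strategies that previously verified well, rather than sampling rewrites from scratch.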

Top-level tags: llm agents systems
Detailed tags: cuda optimization in-context reinforcement learning memory-augmented agents gpu programming code generation

KernelBlaster: Continual Cross-Task CUDA Optimization via Memory-Augmented In-Context Reinforcement Learning


1️⃣ One-sentence summary

This paper proposes an agentic framework called KernelBlaster that lets an AI assistant optimizing GPU compute code remember and reuse past successful experience, much like a human would, so it can continually and efficiently find the best performance optimizations across different generations of GPUs.

Source: arXiv:2602.14293