arXiv submission date: 2026-03-02
📄 Abstract - TiledAttention: a CUDA Tile SDPA Kernel for PyTorch

TiledAttention is a scaled dot-product attention (SDPA) forward operator for SDPA research on NVIDIA GPUs. Implemented in cuTile Python (TileIR) and exposed as a PyTorch-callable function, it is easier to modify than low-level CUDA templates while retaining realistic behavior via online softmax and tiled $K,V$ streaming. The approach is both performant and directly editable at the schedule level from Python (tile shapes, staging, shared-memory layout), enabling rapid, reproducible kernel research without template-heavy CUDA/CUTLASS rewrites. We benchmark TiledAttention on an NVIDIA DGX GB10 node with a reproducible harness and compare against PyTorch SDPA (auto-dispatch) and explicit unfused baselines across sequence length, head dimension, and precision (FP16/BF16). While production fused baselines remain stronger overall, TiledAttention delivers large speedups over standard eager attention paths and is available for direct use within PyTorch workflows, providing a practical balance between performance and customizability.
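The online-softmax with tiled $K,V$ streaming mentioned in the abstract can be illustrated with a minimal single-head NumPy sketch. This is an assumption-laden reconstruction of the general technique, not the paper's cuTile Python (TileIR) kernel; the function name, tile size, and running-max/denominator bookkeeping are illustrative.

```python
import numpy as np

def tiled_attention(Q, K, V, tile=64):
    """Illustrative single-head SDPA forward with online softmax,
    streaming K/V in tiles. A NumPy sketch of the general technique,
    not the paper's cuTile kernel."""
    Lq, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q, dtype=np.float64)
    m = np.full(Lq, -np.inf)        # running row-wise max of logits
    l = np.zeros(Q.shape[0])        # running softmax denominator
    for start in range(0, K.shape[0], tile):
        Kt, Vt = K[start:start + tile], V[start:start + tile]
        S = (Q @ Kt.T) * scale      # logits for this K/V tile
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)   # rescale previously accumulated state
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=1)
        out = out * alpha[:, None] + P @ Vt
        m = m_new
    return out / l[:, None]
```

Because each tile's contribution is rescaled by `alpha` as the running max updates, the result matches full-softmax attention without ever materializing the complete $L_q \times L_k$ score matrix, which is what makes the tiled streaming numerically safe.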

Top-level tags: systems, model training
Detailed tags: attention, kernel, gpu, optimization, pytorch, cuda, performance

TiledAttention: a CUDA Tile SDPA Kernel for PyTorch


1️⃣ One-sentence summary

This paper introduces TiledAttention, an easy-to-modify and well-performing attention kernel that exposes flexible control over the GPU computation through a high-level Python interface, enabling rapid research and custom optimization.

Source: arXiv 2603.01960