arXiv submission date: 2026-06-24
📄 Abstract - Energy-Efficient CNN Acceleration with MSDF Digit-Serial Arithmetic on FPGA

This paper presents an energy-efficient hardware acceleration of the convolutional layers in the U-Net architecture for image segmentation, implemented on FPGA. While digit-serial arithmetic, particularly most-significant-digit-first (MSDF) techniques, offers a compact hardware footprint, it suffers from initial latency before producing the first output digit. This delay accumulates in cascaded operations like multiplication followed by addition, where each unit introduces its own startup overhead. To overcome this, we propose a merged multiply-add (MMA) architecture that fuses these operations into a unified pipeline. Instead of incurring separate delays, the MMA introduces a single streamlined latency per iteration, shorter than the combined latency of conventional cascaded units, resulting in enhanced throughput and efficiency. The MMA units are designed to process spatial input depths in parallel, achieving significantly higher performance than both standalone MSDF-based and conventional designs. We evaluate the proposed design using U-Net as a target application. Despite operating at a lower frequency than a CPU, the FPGA-based accelerator achieves up to an order of magnitude higher energy efficiency, delivering up to $15.14$ GOPS/W compared to $1.93$ GOPS/W for CPU-based inference. The design also shows approximately $9\times$ reduction in energy consumption compared to MSDF-based FPGA implementations. These results highlight the efficacy of the merged arithmetic approach for resource-constrained, latency-sensitive edge applications in medical imaging and computer vision.
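The latency argument in the abstract can be illustrated with a simplified cycle-count model. The sketch below assumes the usual online-arithmetic convention that a unit with online delay δ emits its first result digit δ cycles after its first input digit and then one digit per cycle. The delay values and digit count are illustrative assumptions, not figures reported by the paper; only the two GOPS/W numbers come from the abstract.

```python
def cascaded_latency(online_delays, n_digits):
    """Cycles for a chain of MSDF units to emit all n_digits result digits.

    Each unit in the chain adds its own startup (online) delay before
    digits start flowing to the next unit.
    """
    return sum(online_delays) + n_digits


def merged_latency(merged_delay, n_digits):
    """Cycles for one fused multiply-add unit with a single online delay."""
    return merged_delay + n_digits


# Assumed, illustrative values (NOT from the paper).
DELTA_MUL = 3   # hypothetical online delay of a standalone MSDF multiplier
DELTA_ADD = 2   # hypothetical online delay of a standalone MSDF adder
DELTA_MMA = 3   # hypothetical online delay of the merged unit
N_DIGITS = 8    # hypothetical operand precision in digits

cascaded = cascaded_latency([DELTA_MUL, DELTA_ADD], N_DIGITS)
merged = merged_latency(DELTA_MMA, N_DIGITS)

# Energy-efficiency ratio from the figures quoted in the abstract.
FPGA_GOPS_PER_W = 15.14
CPU_GOPS_PER_W = 1.93
efficiency_ratio = FPGA_GOPS_PER_W / CPU_GOPS_PER_W

print(f"cascaded: {cascaded} cycles, merged: {merged} cycles")
print(f"FPGA vs CPU energy efficiency: {efficiency_ratio:.2f}x")
```

Under these assumed delays the merged unit saves the adder's startup cycles on every multiply-add, and the quoted efficiency figures work out to a ratio of about 7.8×.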

Top-level tags: systems, computer vision, machine learning
Detailed tags: FPGA acceleration, digit-serial arithmetic, U-Net, energy efficiency, image segmentation

Energy-Efficient CNN Acceleration with MSDF Digit-Serial Arithmetic on FPGA


1️⃣ One-sentence summary

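This paper proposes an FPGA hardware architecture that merges multiplication and addition into a single unit, replacing the startup latency that accumulates stage by stage in conventional digit-serial computation with one shorter latency per iteration; applied to the convolutional layers of U-Net for image segmentation, it reaches up to 15.14 GOPS/W versus 1.93 GOPS/W for CPU inference and roughly 9× lower energy consumption than MSDF-based FPGA implementations, making it well suited to resource-constrained, latency-sensitive edge applications.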

Source: arXiv: 2606.25562