LALE: Lightweight-Transformer Architecture for Land-Cover Estimation
1️⃣ One-sentence summary
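This paper proposes LALE, a lightweight remote-sensing image segmentation model that routes high-resolution local features to lightweight convolutional modules and low-resolution global features to Transformer modules, pairs this encoder with an all-MLP decoder, and cuts parameter count and compute severalfold relative to conventional models while maintaining high accuracy.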
Semantic segmentation of remote sensing imagery requires models that capture both global context and local detail under tight computational budgets. Prior work typically optimizes for one of these axes: attention for global context, convolution for local detail, or compactness for efficiency. While hybrid approaches aim to capture both global context and local detail, they require architectural changes and computationally expensive encoder backbones, limiting both efficiency and performance. We present LALE (Lightweight-transformer Architecture for Land-cover Estimation), an end-to-end architecture for remote sensing image segmentation that bifurcates its encoder by resolution: lightweight ConvMixer stages handle high-resolution local features, while transformer stages handle low-resolution global context, confining the quadratic cost of self-attention to deep, downsampled feature maps. An all-MLP multi-scale decoder, together with RMSNorm normalization and StarReLU activations used throughout, further reduces compute and parameter count. On the large-scale ARAS400k remote-sensing segmentation benchmark, LALE achieves a strong efficiency-performance trade-off against CNN, transformer, and hybrid baselines. Our smallest variant (just 1.6M parameters) comes within 2.6 F1 points of the best baseline (UPerNet) while using 4.5x fewer parameters, 7x less storage, and 17x fewer GMACs, and it delivers 1.8x higher throughput.
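To illustrate why confining self-attention to the downsampled stages matters, the rough multiply-accumulate (MAC) count below compares one global self-attention layer with one ConvMixer-style block (depthwise k x k conv plus pointwise 1x1 conv) at each stage of a generic four-stage hierarchical encoder. The 512 x 512 input, the strides 4/8/16/32, the channel widths, and the kernel size 7 are hypothetical choices for illustration, not LALE's published configuration; norms, activations, and softmax are ignored.

```python
# Rough MAC counts for one layer per stage of a generic hierarchical encoder.
# All sizes below are hypothetical and chosen only to illustrate scaling.

def tokens(side, stride):
    """Number of spatial positions for a square input downsampled by `stride`."""
    return (side // stride) ** 2

def attention_macs(n, c):
    """Global self-attention: Q/K/V/output projections (4*N*C^2) plus
    the N x N score matrix QK^T and the weighted sum over V (2*N^2*C)."""
    return 4 * n * c * c + 2 * n * n * c

def convmixer_macs(n, c, kernel=7):
    """ConvMixer-style block: depthwise k x k conv (N*C*k^2) plus pointwise 1x1 conv (N*C^2)."""
    return n * c * kernel * kernel + n * c * c

if __name__ == "__main__":
    side = 512                      # hypothetical input resolution
    strides = [4, 8, 16, 32]        # generic four-stage hierarchy
    widths = [32, 64, 160, 256]     # hypothetical channel widths
    for s, c in zip(strides, widths):
        n = tokens(side, s)
        att, conv = attention_macs(n, c), convmixer_macs(n, c)
        print(f"stride {s:2d}: {n:6d} tokens, attention {att / 1e6:9.1f}M MACs, "
              f"ConvMixer {conv / 1e6:6.1f}M MACs, ratio {att / conv:6.1f}x")
```

Under these assumptions the attention layer costs roughly 400x the ConvMixer block at stride 4 because of the N^2 term, while at stride 32 the gap shrinks to about 5x, which is the intuition behind assigning convolution to the high-resolution stages and attention to the low-resolution ones.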
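The abstract names RMSNorm and StarReLU without defining them. The sketch below gives their standard formulations on a single feature vector in plain Python: RMSNorm (Zhang and Sennrich) rescales by the root mean square with no mean subtraction or bias, and StarReLU (from the MetaFormer baselines work) computes s * ReLU(x)^2 + b with learnable scalars s and b. The constants for s and b follow from x ~ N(0, 1), where ReLU(x)^2 has mean 0.5 and variance 1.25; whether LALE initializes them this way is an assumption.

```python
import math

def rms_norm(x, gain=None, eps=1e-6):
    """RMSNorm on one feature vector: x / sqrt(mean(x^2) + eps), scaled by a learnable gain.
    Unlike LayerNorm, it subtracts no mean and adds no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    gain = gain if gain is not None else [1.0] * len(x)
    return [g * v / rms for g, v in zip(gain, x)]

# If x ~ N(0, 1), relu(x)^2 has mean 0.5 and variance 1.25, so these constants
# map it to zero mean and unit variance (an assumed initialization).
STAR_SCALE = 1.0 / math.sqrt(1.25)   # about 0.894
STAR_BIAS = -0.5 / math.sqrt(1.25)   # about -0.447

def star_relu(x, scale=STAR_SCALE, bias=STAR_BIAS):
    """StarReLU: scale * relu(x)^2 + bias; scale and bias are learnable in a real model."""
    return scale * max(x, 0.0) ** 2 + bias

if __name__ == "__main__":
    h = rms_norm([3.0, 4.0])
    print([round(v, 4) for v in h])
    print([round(star_relu(v), 4) for v in h])
```

Both operations are cheaper than their usual counterparts: RMSNorm skips the mean computation and bias of LayerNorm, and StarReLU replaces the erf or tanh evaluation of GELU with a square.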
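The parameter savings of an all-MLP decoder can be estimated by simple counting. The sketch below assumes a SegFormer-style layout, since the abstract does not spell out LALE's decoder: one linear projection per encoder stage to a shared width D, a linear layer that fuses the concatenated 4 * D features back to D, and a linear per-pixel classifier. Upsampling the projected maps to a common resolution is parameter-free and omitted. The stage widths, D = 256, and the 10-class output are hypothetical.

```python
# Parameter count of a hypothetical SegFormer-style all-MLP decoder.
# Per-pixel linear layers are equivalent to 1x1 convolutions.

def linear_params(c_in, c_out):
    return c_in * c_out + c_out       # weights + bias

def conv_params(c_in, c_out, kernel):
    return c_in * c_out * kernel * kernel + c_out

def all_mlp_decoder_params(stage_widths, dim, num_classes):
    project = sum(linear_params(c, dim) for c in stage_widths)  # per-stage projections
    fuse = linear_params(len(stage_widths) * dim, dim)          # fuse concatenated features
    classify = linear_params(dim, num_classes)                  # per-pixel class logits
    return project + fuse + classify

if __name__ == "__main__":
    widths, dim, classes = [32, 64, 160, 256], 256, 10          # all hypothetical
    print("all-MLP decoder params:", all_mlp_decoder_params(widths, dim, classes))
    # Swapping only the fusion layer for a 3x3 conv multiplies its weights by 9.
    print("linear fuse:", linear_params(4 * dim, dim),
          "vs 3x3 conv fuse:", conv_params(4 * dim, dim, 3))
```

Under these assumptions the entire decoder has about 0.4M parameters, whereas a single 3x3 convolution used as the fusion layer would by itself need about 2.4M.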
Source: arXiv: 2606.02092