arXiv submission date: 2026-02-10
📄 Abstract - FD-DB: Frequency-Decoupled Dual-Branch Network for Unpaired Synthetic-to-Real Domain Translation

Synthetic data provide low-cost, accurately annotated samples for geometry-sensitive vision tasks, but appearance and imaging differences between synthetic and real domains cause severe domain shift and degrade downstream performance. Unpaired synthetic-to-real translation can reduce this gap without paired supervision, yet existing methods often face a trade-off between photorealism and structural stability: unconstrained generation may introduce deformation or spurious textures, while overly rigid constraints limit adaptation to real-domain statistics. We propose FD-DB, a frequency-decoupled dual-branch model that separates appearance transfer into low-frequency interpretable editing and high-frequency residual compensation. The interpretable branch predicts physically meaningful editing parameters (white balance, exposure, contrast, saturation, blur, and grain) to build a stable low-frequency appearance base with strong content preservation. The free branch complements fine details through residual generation, and a gated fusion mechanism combines the two branches under explicit frequency constraints to limit low-frequency drift. We further adopt a two-stage training schedule that first stabilizes the editing branch and then releases the residual branch to improve optimization stability. Experiments on the YCB-V dataset show that FD-DB improves real-domain appearance consistency and significantly boosts downstream semantic segmentation performance while preserving geometric and semantic structures.
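
To make the dual-branch idea in the abstract more concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a simple parameterization of four of the listed edits (white balance, exposure, contrast, saturation; blur and grain are omitted), uses a box filter as the low-pass operator, and enforces the frequency constraint by high-pass filtering the residual before gated fusion. The function names `box_blur`, `apply_edits`, and `fuse`, along with every parameter, are hypothetical and chosen only for illustration. In the paper the editing parameters, residual, and gate are predicted by networks; here they are plain inputs.

```python
import numpy as np

def box_blur(img, k=5):
    """Separable box low-pass filter with edge padding; img is HxWxC float, k odd."""
    pad = k // 2
    kernel = np.ones(k) / k
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Filter along the height axis, then along the width axis.
    out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, padded)
    out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, out)
    return out

def apply_edits(img, wb_gain, exposure_ev, contrast, saturation):
    """Interpretable branch (assumed form): global edits on an RGB image in [0, 1]."""
    out = img * np.asarray(wb_gain, dtype=float).reshape(1, 1, 3)  # per-channel white balance
    out = out * (2.0 ** exposure_ev)                               # exposure in stops
    mean = out.mean()
    out = (out - mean) * contrast + mean                           # contrast around the global mean
    luma = out @ np.array([0.299, 0.587, 0.114])                   # Rec. 601 luma weights
    out = luma[..., None] + (out - luma[..., None]) * saturation   # saturation around luma
    return np.clip(out, 0.0, 1.0)

def fuse(base, residual, gate, k=5):
    """Gated fusion (assumed form): only the high-pass part of the residual is added."""
    high = residual - box_blur(residual, k)  # remove the residual's low-frequency content
    return np.clip(base + gate * high, 0.0, 1.0)

# Usage with random stand-ins for the synthetic image and the free-branch output.
rng = np.random.default_rng(0)
synthetic = rng.random((16, 16, 3))
base = apply_edits(synthetic, wb_gain=(1.05, 1.0, 0.95), exposure_ev=0.2,
                   contrast=1.1, saturation=0.9)
residual = 0.1 * rng.standard_normal((16, 16, 3))
translated = fuse(base, residual, gate=0.5)
```

In this sketch a spatially constant residual is filtered out entirely, so the free branch cannot shift overall brightness or color; that is one simple way to read "limit low-frequency drift". The gate may be a scalar or an HxWx1 map that broadcasts over channels.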

Top-level tags: computer vision, model training, machine learning
Detailed tags: domain adaptation, image translation, frequency decoupling, synthetic-to-real, unpaired translation

FD-DB: Frequency-Decoupled Dual-Branch Network for Unpaired Synthetic-to-Real Domain Translation


1️⃣ One-sentence summary

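This paper proposes FD-DB, a method that splits image appearance translation into two branches, low-frequency interpretable editing and high-frequency detail compensation, to address the difficulty of balancing photorealism against structural stability when translating synthetic data to the real domain, improving downstream vision task performance while better preserving the geometric structure of the original content.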

Source: arXiv: 2602.09476