Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
Overview
(a) Overview of Semantic-First Diffusion (SFD). Semantics (dashed curve) and textures (solid curve) follow asynchronous denoising trajectories. SFD operates in three phases. Stage I: Semantic initialization, where semantic latents denoise first. Stage II: Asynchronous generation, where semantics and textures denoise jointly but asynchronously, with semantics ahead of textures. Stage III: Texture completion, where only textures continue refining. After denoising, the generated semantic latent s is discarded, and the final image is decoded solely from the texture latent z.
(b) Training convergence on ImageNet 256×256 without guidance. SFD converges approximately 100× faster than DiT-XL/2 and approximately 33.3× faster than LightningDiT-XL/1.
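The three phases described in (a) can be pictured as two clocks driven by one shared counter, with the texture clock lagging behind the semantic clock by a fixed offset. The following is a minimal sketch of that idea, not the authors' implementation. The function name `async_timesteps`, the shared clock `tau`, the linear spacing, and the convention that t = 0 is pure noise and t = 1 is fully denoised are all assumptions made for illustration. The offset value 0.3 in the usage example is the Δt reported in the Quantitative Results section.

```python
from typing import List, Tuple


def async_timesteps(num_steps: int, delta_t: float) -> List[Tuple[float, float, str]]:
    """Hypothetical helper: per-step (t_semantic, t_texture, stage) on a shared clock.

    Assumed convention: t = 0 is pure noise and t = 1 is fully denoised.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if not 0.0 <= delta_t <= 1.0:
        raise ValueError("delta_t must lie in [0, 1]")

    schedule = []
    for i in range(num_steps + 1):
        # The shared clock runs from 0 to 1 + delta_t so textures can finish late.
        tau = i / num_steps * (1.0 + delta_t)
        t_sem = min(tau, 1.0)                      # semantics start immediately
        t_tex = min(max(tau - delta_t, 0.0), 1.0)  # textures lag by delta_t
        if tau < delta_t:
            stage = "I"    # only semantics are denoising
        elif tau <= 1.0:
            stage = "II"   # both denoise, semantics ahead of textures
        else:
            stage = "III"  # semantics finished, textures keep refining
        schedule.append((t_sem, t_tex, stage))
    return schedule


if __name__ == "__main__":
    for t_sem, t_tex, stage in async_timesteps(num_steps=13, delta_t=0.3):
        print(f"stage {stage:>3}  t_sem={t_sem:.2f}  t_tex={t_tex:.2f}")
```

In this sketch the semantic time is never behind the texture time, and the gap between them equals the offset throughout Stage II. A real sampler would feed both times to the denoising network at every step; how SFD does that is not specified here.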
Highlights
- We propose Semantic-First Diffusion (SFD), a novel latent diffusion paradigm that performs asynchronous denoising on semantic and texture latents, allowing semantics to denoise earlier and subsequently guide texture generation.
- SFD achieves a state-of-the-art FID of 1.04 on ImageNet 256×256 generation.
- SFD converges approximately 100× and 33.3× faster in training than DiT and LightningDiT, respectively.
Quantitative Results
Explicitly leading semantics ahead of textures with a moderate offset (Δt = 0.3) achieves an optimal balance between early semantic stabilization and texture collaboration, effectively harmonizing their joint modeling.
With AutoGuidance
| Model | Epochs | #Params | FID (NPU) |
|---|---|---|---|
| SFD-XL | 80 | 675M | 1.30 |
| SFD-XL | 800 | 675M | 1.06 |
| SFD-XXL | 80 | 1.0B | 1.19 |
| SFD-XXL | 800 | 1.0B | 1.04 |
Visual Results
Links
- Project Page: https://yuemingpan.github.io/SFD.github.io/
- Paper (arXiv): https://arxiv.org/pdf/2512.04926
- Code: https://github.com/yuemingPAN/SFD
- License: MIT
Citation
@article{Pan2025SFD,
title={Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion},
author={Pan, Yueming and Feng, Ruoyu and Dai, Qi and Wang, Yuqi and Lin, Wenfeng and Guo, Mingyu and Luo, Chong and Zheng, Nanning},
journal={arXiv preprint arXiv:2512.04926},
year={2025}
}