DifBFSR: Blind Face Super-Resolution via Conditional Diffusion Contraction
DOI:
https://doi.org/10.31577/cai_2024_2_369
Keywords:
Blind face super-resolution, diffusion model, face restoration, image generation
Abstract
Blind Face Super-Resolution (BFSR), which aims to super-resolve Low-Resolution (LR) face images with complex unknown degradation into High-Resolution (HR) face images, has recently gained widespread attention. However, existing BFSR methods suffer from two major limitations. First, most of them are trained on synthetic data pairs generated with pre-defined degradation models, which leads to poor performance because these degradations do not match the unknown, complex degradations found in real-world scenarios. Second, some methods rely on hand-crafted face priors as constraints, such as facial landmarks and parsing maps, which require additional annotations and laborious hyperparameter tuning for real cases. To tackle these issues, we propose DifBFSR, a simple and effective self-supervised cooperative learning framework for BFSR based on a conditional diffusion contraction method. It establishes the posterior distribution of HR images given degraded LR images with unknown degradation via a powerful diffusion model, without expensive supervised training or additional constraint design. Specifically, we first transform the degraded LR face image into a degradation-invariant intermediate HR face prediction with a simple Super-Resolution Module (SRM), which relies only on self-supervised optimization. To enhance this prediction, we propose a Contraction Filter Module (CFM) that gradually contracts the restoration error through adaptive dynamic filtering and efficiently leverages the rich natural face prior encapsulated in the pre-trained diffusion model through conditional posterior sampling. Finally, by combining the SRM, CFM, and diffusion model in a self-supervised cooperative learning framework, DifBFSR can robustly handle unknown complex degradations while avoiding cumbersome training and parameter tuning.
Extensive qualitative and quantitative experiments on complex degraded synthetic and real-world datasets show that our method outperforms state-of-the-art BFSR methods.
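
For readers unfamiliar with the "pre-defined degradation models" mentioned above, the following is a minimal sketch of one common way synthetic LR/HR training pairs are produced: blur the HR image, subsample it, and add noise. It is not taken from the paper. The function names (`gaussian_kernel`, `blur`, `degrade`), the 7x7 kernel, the scale factor, and the noise level are all illustrative assumptions, and the pipeline is deliberately simplified (grayscale only, no compression artifacts).

```python
import math
import random

def gaussian_kernel(size, sigma):
    """Return a normalized size x size Gaussian kernel (size should be odd)."""
    half = size // 2
    k = [[math.exp(-((i - half) ** 2 + (j - half) ** 2) / (2 * sigma ** 2))
          for j in range(size)] for i in range(size)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def blur(img, kernel):
    """Filter a 2-D grayscale image with the kernel, replicating edge pixels."""
    h, w = len(img), len(img[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, row in enumerate(kernel):
                for j, kv in enumerate(row):
                    yy = min(max(y + i - half, 0), h - 1)  # clamp to image
                    xx = min(max(x + j - half, 0), w - 1)
                    acc += kv * img[yy][xx]
            out[y][x] = acc
    return out

def degrade(hr, scale=4, sigma=2.0, noise_std=0.02, seed=0):
    """Hypothetical fixed degradation: blur, subsample by `scale`, add noise."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    blurred = blur(hr, gaussian_kernel(7, sigma))
    lr = [row[::scale] for row in blurred[::scale]]
    # Add Gaussian noise and clip intensities back to [0, 1].
    return [[min(max(v + rng.gauss(0.0, noise_std), 0.0), 1.0) for v in row]
            for row in lr]

# Toy 16x16 "HR image": a horizontal intensity gradient in [0, 1].
hr = [[x / 15 for x in range(16)] for _ in range(16)]
lr = degrade(hr)
print(len(lr), len(lr[0]))  # 4 4
```

A model trained only on pairs produced by a fixed recipe like this sees one blur family and one noise level, which is the degradation mismatch with real-world inputs that the abstract describes.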