Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion

¹University of Texas at Austin   ²Google Research   ³Google DeepMind

Abstract

Sampling from the posterior distribution poses a major computational challenge in solving inverse problems using latent diffusion models. Common methods rely on Tweedie's first-order moments, which are known to induce a quality-limiting bias. Existing second-order approximations are impractical due to prohibitive computational costs, making standard reverse diffusion processes intractable for posterior sampling.
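Tweedie's formula gives the posterior mean E[x_0 | x_t] from the first derivative (score) of log p_t, and the posterior covariance from its second derivative (Hessian). The page does not state its noise parameterization, so the sketch below assumes the common variance-preserving form x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. It uses a hypothetical 1-D Gaussian prior, for which the score and its derivative are known exactly, and checks both Tweedie estimates against the closed-form posterior. It only illustrates first- and second-order Tweedie. It is not the STSL sampler.

```python
import numpy as np

def tweedie_moments(x_t, alpha_bar, score, score_grad):
    """First- and second-order Tweedie estimates of E[x_0 | x_t] and Var[x_0 | x_t],
    assuming x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    sigma2 = 1.0 - alpha_bar
    mean = (x_t + sigma2 * score) / np.sqrt(alpha_bar)          # first order: uses the score
    var = sigma2 / alpha_bar * (1.0 + sigma2 * score_grad)       # second order: uses the Hessian
    return mean, var

# Hypothetical toy prior x_0 ~ N(0, s2), so p_t is Gaussian with a closed-form score.
s2, alpha_bar, x_t = 4.0, 0.6, 1.3
var_t = alpha_bar * s2 + (1.0 - alpha_bar)   # Var[x_t]
score = -x_t / var_t                         # d/dx log p_t at x_t
score_grad = -1.0 / var_t                    # d^2/dx^2 log p_t at x_t

mean, var = tweedie_moments(x_t, alpha_bar, score, score_grad)

# Exact Gaussian posterior p(x_0 | x_t), for comparison.
post_var = 1.0 / (1.0 / s2 + alpha_bar / (1.0 - alpha_bar))
post_mean = post_var * np.sqrt(alpha_bar) * x_t / (1.0 - alpha_bar)
print(mean, post_mean, var, post_var)
```

In high dimensions the second-order term needs the full Hessian of log p_t, which is what makes naive second-order samplers expensive.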

This paper introduces the Second-order Tweedie sampler from Surrogate Loss (STSL), a novel sampler that offers efficiency comparable to first-order Tweedie while keeping the reverse process tractable under a second-order approximation. Our theoretical results show that the second-order approximation is lower bounded by our surrogate loss, which requires only O(1) compute using the trace of the Hessian. From this lower bound we derive a new drift term that makes the reverse process tractable. Our method surpasses the state-of-the-art solvers PSLD [3] and P2L [4], reducing neural function evaluations by 4× and 8×, respectively, while notably improving sampling quality on the FFHQ, ImageNet, and COCO benchmarks. In addition, we show that STSL extends to text-guided image editing and addresses the residual distortions that leading text-guided image editing methods leave when the input image is corrupted.

To the best of our knowledge, this is the first work to offer an efficient second-order approximation for solving inverse problems with latent diffusion models and for editing corrupted real-world images.
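The abstract states that the surrogate loss needs only O(1) compute using the trace of the Hessian, but this page does not say how that trace is obtained. One standard way to estimate a Hessian trace without forming the d×d Hessian is Hutchinson's estimator, tr(H) ≈ (1/K) Σ_k z_kᵀ H z_k with random ±1 probes z_k. It costs K Hessian-vector products no matter how large d is. The sketch below assumes this technique purely for illustration. The toy quadratic log-density, the finite-difference Hessian-vector product, and all function names are hypothetical and are not STSL's implementation.

```python
import numpy as np

def hvp_fd(grad_fn, x, v, eps=1e-4):
    """Hessian-vector product H(x) @ v via central finite differences of the gradient."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2.0 * eps)

def hutchinson_trace(grad_fn, x, num_probes=1000, rng=None):
    """Estimate tr(H(x)) as the average of z^T H z over random Rademacher (+/-1) probes z."""
    rng = np.random.default_rng(0) if rng is None else rng
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=x.shape)
        total += z @ hvp_fd(grad_fn, x, z)
    return total / num_probes

# Hypothetical toy log-density f(x) = -0.5 x^T A x, so grad f = -A x and the Hessian is -A.
d = 50
rng = np.random.default_rng(1)
M = rng.standard_normal((d, d))
A = M @ M.T / d + np.eye(d)      # symmetric positive definite
grad_fn = lambda x: -A @ x
x0 = rng.standard_normal(d)

estimate = hutchinson_trace(grad_fn, x0, num_probes=2000)
exact = -np.trace(A)
print(f"Hutchinson: {estimate:.2f}, exact: {exact:.2f}")
```

With an autodiff framework, the finite-difference product would normally be replaced by an exact Hessian-vector product. The number of probes trades variance against compute.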

STSL for Image Inversion

Qualitative results on Motion Deblurring:

Qualitative results on Super-Resolution (8X):

Qualitative results on Gaussian Deblurring:

Qualitative results on Free-form Inpainting:


Comparison with existing methods:
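Each task above is an inverse problem of the form y = A(x) + n: recover the clean image x from a degraded, noisy measurement y. The page does not list the exact operators, mask shapes, or noise levels, so the sketch below uses hypothetical choices for two of them. It models 8× super-resolution as 8×8 average pooling and inpainting as an elementwise binary mask, and it omits the blur kernels. The function names and `noise_std` are illustrative only.

```python
import numpy as np

def downsample_8x(x):
    """Hypothetical 8x super-resolution operator: average-pool 8x8 blocks of an (H, W, C) image."""
    h, w, c = x.shape
    assert h % 8 == 0 and w % 8 == 0, "height and width must be divisible by 8"
    return x.reshape(h // 8, 8, w // 8, 8, c).mean(axis=(1, 3))

def inpaint_mask(x, mask):
    """Hypothetical inpainting operator: keep pixels where mask == 1 and zero the rest."""
    return x * mask[..., None]

def measure(x, operator, noise_std=0.01, rng=None):
    """Noisy measurement y = A(x) + n with i.i.d. Gaussian noise n."""
    rng = np.random.default_rng(0) if rng is None else rng
    y = operator(x)
    return y + noise_std * rng.standard_normal(y.shape)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(256, 256, 3))
mask = np.ones((256, 256))
mask[64:192, 96:160] = 0.0   # a rectangular hole stands in for a free-form mask

y_sr = measure(x, downsample_8x, rng=rng)
y_inp = measure(x, lambda im: inpaint_mask(im, mask), rng=rng)
print(y_sr.shape, y_inp.shape)   # (32, 32, 3) (256, 256, 3)
```

A posterior sampler then searches for an x that is likely under the diffusion prior and consistent with y under the chosen A.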

STSL for Image Editing

We introduce a new framework for high-fidelity editing of real-world images that contain corruptions. To the best of our knowledge, this is the first framework that can handle corruptions within an image editing pipeline.


Image editing from corrupted images:

"a high quality photo of a tiger face" → "a high quality photo of a leopard face"


"a high quality photo of a cat face" → "a high quality photo of a fox face"


Motion Blur

Super-Resolution (8X)

Gaussian Blur


Comparison with existing methods:

References

[1] Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-Prompt Image Editing with Cross Attention Control. ICLR, 2023.
[2] Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text Inversion for Editing Real Images using Guided Diffusion Models. CVPR, 2023.
[3] Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alex Dimakis, and Sanjay Shakkottai. Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models. NeurIPS, 2023.
[4] Hyungjin Chung, Jong Chul Ye, Peyman Milanfar, and Mauricio Delbracio. Prompt-tuning Latent Diffusion Models for Inverse Problems. arXiv, 2023.

Acknowledgment

This research has been partially supported by NSF Grant 2019844, Google Research, and the UT Austin Machine Learning Lab (MLL). Litu Rout has been supported by the Ju-Nam and Pearl Chew Endowed Presidential Fellowship in Engineering and the George J. Heuer, Jr. Ph.D. Endowed Graduate Fellowship.

BibTeX

@inproceedings{rout2023secondorder,
      title={Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion},
      author={Rout, L and Chen, Y and Kumar, A and Caramanis, C and Shakkottai, S and Chu, W},
      booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2024}
}