FreeInv: Free Lunch for Improving DDIM Inversion

Alibaba Group · Beihang University

Abstract

The naive DDIM inversion process usually suffers from a trajectory deviation issue, i.e., the latent trajectory during reconstruction deviates from the one traced during inversion. To alleviate this issue, previous methods either learn to mitigate the deviation or design cumbersome compensation strategies to reduce the mismatch error, incurring substantial time and computation costs. In this work, we present a nearly free-lunch method (named FreeInv) to address the issue more effectively and efficiently. In FreeInv, we randomly transform the latent representation and keep the transformation the same between each corresponding pair of inversion and reconstruction timesteps. This design is motivated by a statistical observation: an ensemble of DDIM inversion processes over multiple trajectories yields a smaller trajectory mismatch error in expectation. Moreover, through theoretical analysis and empirical study, we show that FreeInv performs an efficient ensemble of multiple trajectories. FreeInv can be freely integrated into existing inversion-based image and video editing techniques. Especially for inverting video sequences, it brings more significant fidelity and efficiency improvements. Comprehensive quantitative and qualitative evaluation on the PIE benchmark and the DAVIS dataset shows that FreeInv remarkably outperforms conventional DDIM inversion and is competitive among previous state-of-the-art inversion methods, with superior computation efficiency.
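To make the core idea concrete, the following is a minimal numpy sketch under our reading of the abstract; it is not the paper's implementation. A toy latent stands in for a diffusion latent, `eps_model` stands in for the noise-prediction UNet (here a hypothetical stand-in function), and the random transforms are 90-degree rotations. The key point is that the transform sampled at inversion step t is recorded and replayed at the corresponding reconstruction step, so per-step mismatch errors are incurred in the same transformed space and cancel.

```python
import numpy as np

def random_transform(rng):
    """Sample a random invertible latent transform (here: one of four
    90-degree rotations of the spatial axes). Returns (forward, inverse)."""
    k = int(rng.integers(0, 4))
    fwd = lambda z: np.rot90(z, k, axes=(-2, -1)).copy()
    inv = lambda z: np.rot90(z, -k, axes=(-2, -1)).copy()
    return fwd, inv

def freeinv_invert(z0, eps_model, alphas, seed=0):
    """DDIM inversion with a fresh random transform at every step.
    Returns the noisy latent and the per-step transforms so that
    reconstruction can replay them at the corresponding timesteps."""
    rng = np.random.default_rng(seed)
    z, transforms = z0, []
    for t in range(len(alphas) - 1):
        fwd, inv = random_transform(rng)
        transforms.append((fwd, inv))
        a, a_next = alphas[t], alphas[t + 1]
        zt = fwd(z)                          # move to the transformed space
        eps = eps_model(zt, t)
        zt = (np.sqrt(a_next / a) * (zt - np.sqrt(1 - a) * eps)
              + np.sqrt(1 - a_next) * eps)   # DDIM inversion step
        z = inv(zt)                          # map back
    return z, transforms

def freeinv_reconstruct(zT, eps_model, alphas, transforms):
    """DDIM sampling that reuses the SAME transform at each corresponding
    timestep, the pairing that FreeInv's abstract describes."""
    z = zT
    for t in reversed(range(len(alphas) - 1)):
        fwd, inv = transforms[t]
        a, a_next = alphas[t], alphas[t + 1]
        zt = fwd(z)
        eps = eps_model(zt, t)
        zt = (np.sqrt(a / a_next) * (zt - np.sqrt(1 - a_next) * eps)
              + np.sqrt(1 - a) * eps)        # DDIM sampling step
        z = inv(zt)
    return z
```

In a real pipeline the latent would come from a VAE encoder and `eps_model` from a pretrained diffusion UNet; the sketch only illustrates the bookkeeping. Note that when the reconstruction step reuses the wrong transform, the per-step errors no longer cancel, which is the trajectory deviation the pairing is meant to avoid.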

Experiments

Image-Editing Results.

Comparison with state-of-the-art inversion methods.

We conduct a comparison with state-of-the-art inversion-enhancing techniques, covering

  • Null-Text Inversion (NTI) [1]
  • EDICT [2]
  • DDPM Inversion (DI) [3]
  • Virtual Inversion (VI) [4]
  • BELM [5]
  • PnP Inversion (PI) [6]
We adopt P2P [7] as the baseline editing framework and integrate each inversion approach into it. The results show that FreeInv is competitive with state-of-the-art inversion methods while offering superior computation efficiency.

Plugging in PnP and MasaCtrl.

Thanks to its operational simplicity, FreeInv can be readily plugged into existing inversion-based image editing frameworks. Besides P2P, we compare the image editing results of PnP [8] and MasaCtrl [9] with and without FreeInv.


Video-Editing Results.

Our results.

We integrate FreeInv into a representative inversion-based video editing method, TokenFlow [10]. The video editing results are presented as follows.


[Video gallery: each row shows the input clip followed by edits for the listed prompts.]

  • "Lionel Messi", "LeBron James", "Will Smith"
  • "Pixar Animation", "A Tiger", "An Orange Cat"
  • "8-bit pixel art", "A marble sculpture", "Pixar animation"
  • "In the forest", "Brown trousers", "A silver robot"
  • "A car drifting on the ice"

Comparisons to baselines.

We compare video editing results among

  • TokenFlow [10],
  • TokenFlow + STEM-Inv [11],
  • TokenFlow + FreeInv (Ours).

[Video comparisons: for each prompt, "Lionel Messi", "Pixar animation", and "A black SUV", we show Ours alongside TokenFlow [10] and STEM-Inv [11].]

Comparison to DDIM Inversion.

We make a comparison of reconstruction results between DDIM inversion and FreeInv. Additionally, we show the editing results with the inverted latent, denoted as DDIM editing and FreeInv editing, respectively. The visualization demonstrates that FreeInv boosts the reconstruction fidelity and further benefits editing quality.

[Image grids, columns: Input, DDIM recon., FreeInv recon., DDIM editing, FreeInv editing.]

BibTeX

@article{bao2025freeinv,
  title={FreeInv: Free Lunch for Improving DDIM Inversion},
  author={Bao, Yuxiang and Liu, Huijie and Gao, Xun and Fu, Huan and Kang, Guoliang},
  journal={arXiv preprint arXiv:2503.23035},
  year={2025}
}

References

[1] Ron Mokady, et al. "Null-text inversion for editing real images using guided diffusion models." In CVPR, 2023.

[2] Bram Wallace, et al. "EDICT: Exact diffusion inversion via coupled transformations." In CVPR, 2023.

[3] Inbar Huberman-Spiegelglas, et al. "An edit-friendly DDPM noise space: Inversion and manipulations." In CVPR, 2024.

[4] Sihan Xu, et al. "Inversion-free image editing with natural language." In CVPR, 2024.

[5] Fangyikang Wang, et al. "BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models." In NeurIPS, 2024.

[6] Xuan Ju, et al. "PnP Inversion: Boosting diffusion-based editing with 3 lines of code." In ICLR, 2024.

[7] Amir Hertz, et al. "Prompt-to-prompt image editing with cross attention control." In ICLR, 2023.

[8] Narek Tumanyan, et al. "Plug-and-play diffusion features for text-driven image-to-image translation." In CVPR, 2023.

[9] Mingdeng Cao, et al. "MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing." In ICCV, 2023.

[10] Michal Geyer, et al. "TokenFlow: Consistent diffusion features for consistent video editing." In ICLR, 2024.

[11] Maomao Li, et al. "A video is worth 256 bases: Spatial-temporal expectation-maximization inversion for zero-shot video editing." In CVPR, 2024.