Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Existing single-image-to-3D methods typically follow a two-stage process: they first generate multi-view images and then use those images for 3D reconstruction. However, training the two stages separately introduces significant data bias at inference time, which degrades the quality of the reconstructed results. We introduce Ouroboros3D, a unified 3D generation framework that integrates diffusion-based multi-view image generation and 3D reconstruction into a single recursive diffusion process. In our framework, the two modules are trained jointly through a self-conditioning mechanism, allowing them to adapt to each other's characteristics for robust inference. During multi-view denoising, the multi-view diffusion model uses the 3D-aware maps rendered by the reconstruction module at the previous timestep as additional conditions. This recursive diffusion framework with 3D-aware feedback unifies the entire process and improves geometric consistency. Experiments show that our framework outperforms both separately trained two-stage pipelines and existing methods that combine the two stages only at inference time.

Concept comparison between Ouroboros3D and previous two-stage methods. Instead of directly combining a multi-view diffusion model with a reconstruction model, our self-conditioned framework trains the two models jointly and links them recursively. At each step of the denoising process, the rendered 3D-aware maps are fed to the multi-view generation model in the next step.

Concept of 3D-aware recursive diffusion. During multi-view denoising, the diffusion model uses 3D-aware maps rendered by the reconstruction module at the previous step as conditions.

Overview of Ouroboros3D. In the denoising sampling loop, we decode the predicted x0 into noise-corrupted images, from which a feed-forward reconstruction model recovers a 3D representation. The rendered color images and coordinate maps are then encoded and fed into the next denoising step.
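
The control flow of this sampling loop can be illustrated with a toy sketch. It is not the authors' implementation: `predict_x0`, `reconstruct`, `render`, and `recursive_sample` are hypothetical stand-ins written in NumPy, the arithmetic inside them is arbitrary, and the latent decoding and encoding steps are omitted. Only the feedback structure follows the description above: predict x0, reconstruct, render color and coordinate maps, then condition the next step on those maps.

```python
import numpy as np

def predict_x0(x_t, t, cond):
    """Hypothetical stand-in for the multi-view diffusion model (toy arithmetic)."""
    x0 = 0.5 * x_t
    if cond is not None:
        color, _coords = cond          # 3D-aware maps rendered at the previous step
        x0 = 0.5 * x0 + 0.5 * color    # toy way of "using" the condition
    return x0

def reconstruct(images):
    """Hypothetical stand-in for the feed-forward reconstruction model."""
    return images.mean(axis=0)         # toy "3D representation": average of the views

def render(rep, num_views):
    """Hypothetical renderer returning color maps and coordinate maps per view."""
    h, w, _ = rep.shape
    color = np.repeat(rep[None], num_views, axis=0)
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w), indexing="ij")
    coords_one = np.stack([xs, ys, np.zeros_like(xs)], axis=-1)
    coords = np.repeat(coords_one[None], num_views, axis=0)
    return color, coords

def recursive_sample(num_views=4, size=8, steps=5, seed=0):
    rng = np.random.default_rng(seed)
    x_t = rng.standard_normal((num_views, size, size, 3))
    cond = None                        # no 3D-aware feedback exists before the first step
    used_feedback = []
    for t in range(steps, 0, -1):
        used_feedback.append(cond is not None)
        x0 = predict_x0(x_t, t, cond)  # denoise, conditioned on the previous render
        rep = reconstruct(x0)          # recover a 3D representation from the views
        cond = render(rep, num_views)  # maps fed into the next denoising step
        noise_level = (t - 1) / steps  # toy schedule that reaches zero at the last step
        x_t = x0 + noise_level * rng.standard_normal(x0.shape)
    return x_t, used_feedback

views, used_feedback = recursive_sample()
print(views.shape, used_feedback)
```

In this sketch the first step runs without feedback because nothing has been rendered yet; every later step receives the maps rendered from the previous step's reconstruction.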

BibTeX

@article{wen2024ouroboros3d,
  title={Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion},
  author={Wen, Hao and Huang, Zehuan and Wang, Yaohui and Chen, Xinyuan and Sheng, Lu},
  journal={arXiv preprint arXiv:2406.03184},
  year={2024}
}

CVPR 2025

Results on GSO Dataset

More Results
