arxiv:2501.13087

Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation

Published on Jan 22

Authors:

Abstract

Orchid, a unified latent diffusion model, efficiently generates color, depth, and surface normal images from text and supports joint monocular depth and normal estimation, demonstrating competitive performance in geometry prediction and realism in inpainting.

AI-generated summary

We introduce Orchid, a unified latent diffusion model that learns a joint appearance-geometry prior to generate color, depth, and surface normal images in a single diffusion process. This unified approach is more efficient and coherent than current pipelines that use separate models for appearance and geometry. Orchid is versatile - it directly generates color, depth, and normal images from text, supports joint monocular depth and normal estimation with color-conditioned finetuning, and seamlessly inpaints large 3D regions by sampling from the joint distribution. It leverages a novel Variational Autoencoder (VAE) that jointly encodes RGB, relative depth, and surface normals into a shared latent space, combined with a latent diffusion model that denoises these latents. Our extensive experiments demonstrate that Orchid delivers competitive performance against SOTA task-specific methods for geometry prediction, even surpassing them in normal-prediction accuracy and depth-normal consistency. It also inpaints color-depth-normal images jointly, with more qualitative realism than existing multi-step methods.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2501.13087 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2501.13087 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2501.13087 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.