PetCaseFinder

VolGen: Volumetric Latent Diffusion Models for 3D Object Generation.

Year: 2025
Authors: Tang J et al.

Abstract

We propose to extend 2D latent diffusion models, well known from the Stable Diffusion series, to volumetric latent diffusion models for 3D object generation. Specifically, we first train a Volumetric Variational Auto-Encoder (VVAE) that compresses a $512^{3}$ occupancy grid into a $32^{3}$ latent code. We then train a diffusion model on this latent space, using 3D convolutions and cross-attention layers for image conditioning. This Volumetric Latent Diffusion Model (VLDM) generates accurate and smooth mesh surfaces from single-view image inputs, with inference taking around 10 seconds, and generalizes well to unseen domains. Our key insight is that a simple volume-based latent diffusion model can also perform well on 3D generation tasks, without relying on sparse representations such as point clouds or 3D-specific techniques such as triplane Neural Radiance Fields (NeRF). Extensive experiments demonstrate the effectiveness of our latent diffusion models in the 3D domain, indicating a promising direction for 3D generation tasks.
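To make the scale of the VVAE compression concrete, the following is a minimal arithmetic sketch based only on the two resolutions quoted in the abstract. The function name `compression_stats` is hypothetical, and the assumption that the encoder downsamples by repeated factor-of-2 stages is ours, not the paper's. The abstract does not state the number of latent channels, so the figure below is a spatial reduction only, not a full compression ratio.

```python
import math


def compression_stats(grid_res: int, latent_res: int) -> dict:
    """Spatial reduction from a cubic occupancy grid to a cubic latent grid.

    Hypothetical helper for illustration; it ignores latent channels,
    which the abstract does not specify.
    """
    if grid_res % latent_res != 0:
        raise ValueError("grid_res must be a multiple of latent_res")
    per_axis = grid_res // latent_res  # downsampling factor along each axis
    stages = math.log2(per_axis)       # stride-2 stages, if an exact power of 2
    return {
        "per_axis_factor": per_axis,
        "spatial_reduction": per_axis ** 3,
        "stride2_stages": int(stages) if stages.is_integer() else None,
    }


# Resolutions taken from the abstract: 512^3 occupancy grid, 32^3 latent code.
stats = compression_stats(512, 32)
print(stats)
```

Under these assumptions, the latent grid is 16 times smaller along each axis, which corresponds to 4096 times fewer spatial cells, and would be reachable with four stride-2 downsampling stages if the encoder were built that way.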

Original publication: https://europepmc.org/article/MED/40758522