Diffusion Models are Efficient Data Generators for Human Mesh Recovery.
- Year: 2025
- Authors: Ge Y et al.
Abstract
Despite remarkable progress on 3D human pose and shape estimation (HPS), current state-of-the-art methods rely heavily on either confined indoor mocap datasets or datasets generated by a rendering engine using computer graphics (CG). Both categories of datasets fall short in providing sufficiently diverse human identities and authentic in-the-wild background scenes, which are crucial for accurately simulating real-world distributions. In this work, we show that synthetic data created by generative models is complementary to CG-rendered data for achieving remarkable generalization performance on diverse real-world scenes. We propose an effective data generation pipeline based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations. Specifically, we first collect a large-scale human-centric dataset with comprehensive annotations, e.g., text captions, depth maps, and surface normal images. To generate a wide variety of human images with initial labels, we train a customized, multi-condition ControlNet model. The key to this process is using a 3D parametric model, e.g., SMPL-X, to easily create precise 2D keypoints, depth maps, and surface normal images by rendering the 3D mesh with specific camera parameters. As the initial labels inevitably contain noise, we apply an off-the-shelf foundation segmentation model to filter negative data samples, and a 2D vertex estimator to rectify the SMPL-X parameters by using SMPLify. Our data generation pipeline is both flexible and customizable, making it adaptable to various real-world tasks, such as human interaction in complex scenes and humans captured by wide-angle lenses. By relying solely on generative models, we can produce large-scale, in-the-wild human images with high-quality annotations, significantly reducing the need for manual image collection and annotation.
The generated dataset encompasses a wide range of viewpoints, environments, and human identities, ensuring its versatility across different scenarios. To verify the effectiveness of the generated data, we perform comprehensive data ablation studies on top of both the generated data and existing datasets, and evaluate on a wide range of HPS benchmarks. We hope our work could pave the way for scaling up 3D human recovery to in-the-wild scenes.
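The step in which 2D keypoints are obtained by rendering a 3D mesh with specific camera parameters can be illustrated with a simple pinhole projection. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function name `project_to_keypoints`, the camera values, and the two example joints are hypothetical, and a real pipeline would take joints from a parametric body model such as SMPL-X and typically also apply a camera rotation.

```python
def project_to_keypoints(joints_3d, focal, principal_point,
                         translation=(0.0, 0.0, 0.0)):
    """Project 3D joints to 2D pixel keypoints with a pinhole camera.

    Hypothetical helper for illustration only. Assumes the camera looks
    down the +z axis with no rotation and no lens distortion.
    """
    fx, fy = focal
    cx, cy = principal_point
    tx, ty, tz = translation
    keypoints = []
    for x, y, z in joints_3d:
        # Move the joint from body coordinates into camera coordinates.
        xc, yc, zc = x + tx, y + ty, z + tz
        if zc <= 0:
            raise ValueError("joint lies behind or on the camera plane")
        # Perspective divide, then scale by focal length and shift
        # by the principal point to get pixel coordinates.
        keypoints.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return keypoints


# Two made-up joints (metres), placed 2.5 m in front of the camera.
joints = [(0.0, 0.0, 0.0), (0.2, -0.5, 0.0)]
keypoints_2d = project_to_keypoints(
    joints,
    focal=(1000.0, 1000.0),          # assumed focal length in pixels
    principal_point=(256.0, 256.0),  # assumed centre of a 512x512 image
    translation=(0.0, 0.0, 2.5),
)
print(keypoints_2d)
```

A joint on the optical axis lands on the principal point, and joints farther from the camera move toward it, which is why the same mesh yields different keypoint labels under different camera parameters.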
Original publication: https://europepmc.org/article/MED/41252231