How text prompts create detailed 4D human avatars with motion

By Youwang K et al. · 2026

PetCaseFinder translated the abstract of this peer-reviewed paper into plain English so pet owners can read it. We do not publish original research; every detail traces back to the citation above.

Original publication title: CLIP-Actor-X: Text-driven 4D Human Avatar Generation via Cross-modal Synthesis-through-Optimization.

Plain-English summary

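This paper is about computer graphics, not veterinary research or pet health. It describes CLIP-Actor-X, a system that turns a written description into an animated 3D human character (a "4D avatar": a 3D body that moves over time). The system works in two steps. First, it generates realistic body motion that matches the text, using a generative model supplemented by retrieval and built on a text-to-motion diffusion model. Second, it adds surface detail and texture to a plain human body template so that the character's shape and appearance match the text consistently across frames and poses. Because the result is already animatable, it needs no extra steps such as re-alignment, retargeting, or rigging. The authors also propose ways to stabilize the optimization, including a way of handling poorly rendered views.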

Abstract

We propose CLIP-Actor-X, a text-driven motion generation and neural mesh stylization system for 4D human avatar generation. CLIP-Actor-X generates a detailed 3D human mesh, motion animation, and texture that conform to a text prompt given by a user. The CLIP-Actor-X system mainly consists of two modules. First, for generating realistic human motion, we build a text-driven human motion synthesis module modeled by a retrieval-augmented generative model, powered by a text-to-motion diffusion model. Second, our novel zero-shot neural style optimization module adds detail and texture to the sampled sequence of a neutral human mesh template, such that the resulting mesh and appearance comply with the input text prompt in a temporally consistent and pose-agnostic manner. In contrast to prior art that uses an artist-designed, non-animatable mesh as input, our output representation is animatable and better aligned with the input text, without additional post-processing such as re-alignment, retargeting, or rigging. We further propose ways to stabilize the optimization process: spatio-temporal view augmentation and visibility-aware embedding attention, which deals with poorly rendered views. We demonstrate that CLIP-Actor-X produces perceptually plausible, human-recognizable human avatars in motion with detailed geometry and texture solely from a natural language prompt.
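
The abstract names visibility-aware embedding attention but does not say how it is computed. As a rough illustration of the general idea only, the Python sketch below down-weights poorly visible rendered views when scoring how well a set of views matches a text prompt. Everything in it is an assumption made for illustration and not the authors' method: the function names, the softmax weighting, the `temperature` parameter, and the use of plain NumPy arrays in place of real image and text embeddings.

```python
import numpy as np


def normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def visibility_weighted_similarity(view_embeddings, text_embedding,
                                   visibility, temperature=0.1):
    """Hypothetical scoring of rendered views against a text embedding.

    view_embeddings: array of shape (V, D), one embedding per rendered view.
    text_embedding:  array of shape (D,), embedding of the text prompt.
    visibility:      array of shape (V,), assumed score in [0, 1] for how
                     much of the body is visible in each view.
    Returns (score, weights), where weights sum to 1 over the views.
    """
    views = normalize(np.asarray(view_embeddings, dtype=float))
    text = normalize(np.asarray(text_embedding, dtype=float))
    vis = np.asarray(visibility, dtype=float)

    # Softmax over visibility: well-rendered views receive more attention.
    logits = vis / temperature
    logits = logits - logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()

    # Cosine similarity between each view and the text prompt.
    similarities = views @ text

    # Attention-weighted average similarity across views.
    return float(weights @ similarities), weights


if __name__ == "__main__":
    views = np.array([[1.0, 0.0],   # view that matches the text direction
                      [0.0, 1.0]])  # view unrelated to the text
    text = np.array([1.0, 0.0])
    score, weights = visibility_weighted_similarity(views, text, [1.0, 0.0])
    print(score, weights)
```

In an optimization loop, the negative of such a score could serve as a loss, so that views with little visible surface contribute less to the gradient. Whether CLIP-Actor-X does this in the same way is not stated in the abstract.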

Original publication on Europe PMC: https://europepmc.org/article/MED/41729673