How AI creates realistic two-person human interaction motions

By Manjotho AA et al. · 2026

PetCaseFinder translated the abstract of this peer-reviewed paper into plain English so pet owners can read it. We do not publish original research; every detail traces back to the citation above.

Original publication title: CoShMDM: Contact and Shape-Aware Latent Motion Diffusion Model for Human Interaction Generation.

Plain-English summary

This research aims to improve how computers generate realistic movements for two people interacting, based on a text description. Existing methods often fail to produce believable contact between the two characters and assume a single standard body shape, so the characters' bodies visibly pass through each other. The new approach, called CoShMDM, tracks where the two body surfaces touch, accounts for different body shapes, and trains a separate component to keep the bodies from overlapping while preserving contact. On two benchmark datasets, InterHuman and InterX, it reduced body overlap by 19% and 17.3% and improved contact similarity by 17.8% and 33.2%, respectively, while also achieving the best overall motion-quality (FID) scores among the methods compared.

Abstract

Generating realistic two-person interaction motions from text holds immense potential in computer vision and animations. While existing latent motion diffusion models offer compact and efficient representations, they often fail to produce physically plausible contacts and are typically constrained to a single canonical body shape. As a result, the generated motion sequences exhibit substantial mesh penetrations and lack interaction realism. To address these limitations, we propose a contact and shape-aware latent motion representation and diffusion model (CoShMDM) for generating realistic two-person interactions from text. Our framework begins by constructing contact-compatible motion using SMPL-based meshes and a normal alignment-based mesh contact matrix to capture fine-grained mesh-level contacts. To account for shape diversity, we incorporate SMPL shape parameters and iteratively learn contact dynamics across different body shapes. Additionally, a reinforcement learning-based mesh penetration avoidance policy network, guided by signed distance fields, is introduced to minimize mesh penetrations while preserving contact fidelity and shape-aware motion. We further employ a dual-encoder VQ-VAE to learn disentangled latent representations for motion and contacts, which are then utilized in a text- and body-shape-conditioned diffusion model. To ensure spatial, temporal, and semantic coherence, we integrate a novel contact and motion consistency module into the diffusion transformer. Extensive evaluations on the InterHuman and InterX datasets demonstrate that our method outperforms state-of-the-art approaches, achieving the lowest FID scores (4.801 and 0.013), with 19% and 17.3% reductions in mesh penetrations, and 17.8% and 33.2% gains in contact similarity, respectively.
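
The abstract does not define the normal alignment-based mesh contact matrix, so the following is only a minimal sketch of one plausible reading: two vertices count as "in contact" when they are close together and their surface normals point toward each other. The function name `contact_matrix`, the thresholds `max_dist` and `max_cos`, and the toy vertex data are all assumptions made for illustration, not the authors' formulation.

```python
import math

def contact_matrix(verts_a, normals_a, verts_b, normals_b,
                   max_dist=0.02, max_cos=-0.5):
    """Binary contact matrix between two sets of mesh vertices.

    Entry [i][j] is 1 when vertex i of person A and vertex j of person B
    are within max_dist of each other AND their unit normals oppose each
    other (dot product at or below max_cos). Both thresholds are
    illustrative assumptions, not values from the paper.
    """
    matrix = []
    for pa, na in zip(verts_a, normals_a):
        row = []
        for pb, nb in zip(verts_b, normals_b):
            dist = math.dist(pa, pb)                   # Euclidean distance
            cos = sum(x * y for x, y in zip(na, nb))   # normal alignment
            row.append(1 if dist <= max_dist and cos <= max_cos else 0)
        matrix.append(row)
    return matrix

# Toy data (hypothetical): two vertices on person A, three on person B.
verts_a = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
normals_a = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
verts_b = [(0.01, 0.0, 0.0), (0.01, 1.0, 0.0), (0.5, 0.0, 0.0)]
normals_b = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]

result = contact_matrix(verts_a, normals_a, verts_b, normals_b)
print(result)
```

In this toy case only the first pair registers as contact: the second pair is close but its normals point the same way, and the third vertex of B faces A but is too far away. A distance-only test would have flagged the second pair as well, which is the kind of false contact a normal-alignment check is presumably meant to rule out.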

Original publication on Europe PMC: https://europepmc.org/article/MED/41855059