Peer-reviewed paper

Realistic talking head videos for multi-turn conversations

By Chen P et al. · 2026

PetCaseFinder translated the abstract of this peer-reviewed paper into plain English so pet owners can read it. We do not publish original research; every detail traces back to the citation above.

Original publication title: RSATalker: Realistic Socially-Aware Talking Head Generation for Multi-Turn Conversation.

Plain-English summary

This research focuses on creating realistic talking avatars for virtual reality (VR) that can hold multi-turn conversations. Current methods have trade-offs: some can model two-person dialogue but lack realistic textures, some look natural but are too expensive to run, and some render efficiently and realistically but only model a single speaker and ignore social relationships. The new system, called RSATalker, first generates 3D facial movements from speech and then renders them with realistic textures, producing lifelike avatar videos that also reflect social relationships, such as whether two people are related by blood or whether they are equals. The researchers designed a three-stage training process and built a dataset annotated with these relationships so the avatars can learn such social cues. The authors report state-of-the-art results in both realism and social awareness, and they plan to release their code and dataset.

Abstract

Talking head generation is increasingly important in virtual reality (VR), especially for social scenarios involving multi-turn conversation. Existing approaches face notable limitations: mesh-based 3D methods can model dual-person dialogue but lack realistic textures, while large-model-based 2D methods produce natural appearances but incur prohibitive computational costs. Recent 3D Gaussian Splatting (3DGS)-based methods achieve efficient and realistic rendering but remain speaker-only and ignore social relationships. We introduce RSATalker, the first framework that leverages 3DGS for realistic and socially-aware talking head generation, with support for multi-turn conversation. Our method first drives mesh-based 3D facial motion from speech, then binds 3D Gaussians to mesh facets to render high-fidelity 2D avatar videos. To capture interpersonal dynamics, we propose a socially-aware module that encodes social relationships (blood versus non-blood, and equal versus unequal) into high-level embeddings through a learnable query mechanism. We design a three-stage training paradigm and construct the RSATalker dataset with speech-mesh-image triplets annotated with social relationships. Our method supports applications such as VR telepresence, social VR, and embodied conversational agents. The socially-aware conditioning can also be extended to other human motion generation tasks. Extensive experiments demonstrate that RSATalker achieves state-of-the-art performance in both realism and social awareness. The code and dataset will be released.
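
The abstract says that 3D Gaussians are bound to mesh facets, but it does not say how the binding is parameterized. The sketch below shows one common way such a binding can work, purely as an illustration: each Gaussian stores a triangle index, barycentric coordinates inside that triangle, and an offset along the triangle's normal, so its center follows the mesh when speech-driven motion deforms it. The names `BoundGaussian`, `face_normal`, and `gaussian_center` are hypothetical and are not taken from RSATalker.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BoundGaussian:
    """Hypothetical record tying one Gaussian to one mesh triangle."""
    face: int             # index into the list of triangles
    bary: Vec3            # barycentric weights inside the triangle (sum to 1)
    normal_offset: float  # signed distance along the triangle's unit normal


def face_normal(a: Vec3, b: Vec3, c: Vec3) -> Vec3:
    """Unit normal of triangle (a, b, c) via the cross product of two edges."""
    e1 = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    e2 = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    n = (
        e1[1] * e2[2] - e1[2] * e2[1],
        e1[2] * e2[0] - e1[0] * e2[2],
        e1[0] * e2[1] - e1[1] * e2[0],
    )
    length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    if length == 0.0:
        raise ValueError("degenerate triangle has no normal")
    return (n[0] / length, n[1] / length, n[2] / length)


def gaussian_center(
    g: BoundGaussian,
    vertices: List[Vec3],
    faces: List[Tuple[int, int, int]],
) -> Vec3:
    """World-space center of a bound Gaussian for the current mesh pose."""
    i, j, k = faces[g.face]
    a, b, c = vertices[i], vertices[j], vertices[k]
    w0, w1, w2 = g.bary
    n = face_normal(a, b, c)
    # Point on the triangle plus a displacement along its normal.
    return tuple(
        w0 * a[d] + w1 * b[d] + w2 * c[d] + g.normal_offset * n[d]
        for d in range(3)
    )


# One triangle in a rest pose, then the same triangle after the mesh moves.
faces = [(0, 1, 2)]
rest_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved_pose = [(x + 1.0, y + 1.0, z + 2.0) for (x, y, z) in rest_pose]

gaussian = BoundGaussian(face=0, bary=(0.25, 0.25, 0.5), normal_offset=0.5)
print(gaussian_center(gaussian, rest_pose, faces))
print(gaussian_center(gaussian, moved_pose, faces))
```

Because the Gaussian is stored relative to its triangle, animating the mesh is enough to move every Gaussian with it; a full renderer would also carry each Gaussian's rotation, scale, color, and opacity, which this sketch leaves out.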

Original publication on Europe PMC: https://europepmc.org/article/MED/41921161