Audio-driven single image talking face animation with transformers.
- Year: 2026
- Authors: Li Y & Shen X.
- Affiliation: Department of Intelligent Technology · China
Abstract
Audio-driven talking-head video generation is a critical task in cross-modal expressive synthesis, with applications in virtual humans, digital content creation, and human-computer interaction. Existing methods, however, often suffer from unnatural lip movements and distortions in non-speech facial regions, especially under exaggerated expressions or emotional variations. These issues arise from the entanglement of linguistic content, prosodic emotion, and speaker-specific attributes within the audio signal. To address these challenges, we propose ExpNet, a Transformer-based expression regression framework that decouples global head motion from local facial expressions using 3DMM coefficients. A conditional VAE generates head pose coefficients robustly, while a CNN-Transformer architecture regresses expression coefficients. ExpNet introduces an ALiBi-based relative positional bias into the self-attention mechanism, which captures long-range dependencies while focusing on local temporal context. It also conditions on the first-frame expression coefficient to preserve identity and emotion consistency throughout the video. Experimental evaluations on multiple datasets, including HDTF, MEAD, and LRS3, demonstrate that ExpNet outperforms existing methods in expression realism, lip synchronization, and video quality. Ablation studies show that ALiBi, landmark supervision, and the Transformer module are each crucial for improving temporal stability, reducing lip jitter, and enhancing overall facial animation consistency.
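The ALiBi-style positional bias named in the abstract can be illustrated with a small sketch. This is not ExpNet's implementation: the abstract gives no details, so the symmetric distance penalty `-slope * |i - j|`, the power-of-two head count, and every function name below (`alibi_slopes`, `alibi_bias`, `biased_attention_weights`) are assumptions chosen for illustration. The sketch only shows the general idea that a linear penalty on frame distance makes each frame attend most strongly to its temporal neighbours while leaving distant frames reachable.

```python
import math


def alibi_slopes(num_heads):
    """Per-head slopes as a geometric sequence (assumes num_heads is a power of two)."""
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(num_frames, slope):
    """Symmetric bias matrix (an assumption): entry [i][j] = -slope * |i - j|."""
    return [[-slope * abs(i - j) for j in range(num_frames)]
            for i in range(num_frames)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention_weights(scores, slope):
    """Add the distance bias to raw attention scores, then normalise each row."""
    n = len(scores)
    bias = alibi_bias(n, slope)
    return [softmax([scores[i][j] + bias[i][j] for j in range(n)])
            for i in range(n)]


if __name__ == "__main__":
    # Uniform raw scores: any preference in the output comes from the bias alone.
    uniform_scores = [[0.0] * 5 for _ in range(5)]
    for slope in alibi_slopes(8)[:2]:
        weights = biased_attention_weights(uniform_scores, slope)
        print(slope, [round(w, 3) for w in weights[0]])
```

With uniform scores, the weights in each row decay with distance from the query frame, and heads with smaller slopes decay more slowly, which is how a bias of this kind can combine local focus with longer-range context.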
Original publication: https://europepmc.org/article/MED/41554855