Peer-reviewed article
A multistream attention based neural network for visual speech recognition and sign language understanding.
- Year: 2025
- Authors: Talaat FM & Hassan BM.
- Affiliation: Faculty of Artificial Intelligence
Abstract
This paper introduces SignKeyNet, a novel multi-stream, keypoint-based neural network designed to improve lip-reading recognition and support foundational sign language understanding. The architecture decouples signer movements into three primary streams (hands, face, and body) using 133 pose keypoints extracted via pose estimation. Each stream is processed independently by a specialized attention module, and an attention-based fusion mechanism then models cross-modal spatiotemporal dependencies across the streams. SignKeyNet is evaluated on the MIRACL-VC1 lip-reading dataset, where it outperforms baseline models such as HMMs, DTW, CNNs, LSTMs, and Two-stream ConvNets, reaching an accuracy of 0.85, a Word Error Rate (WER) of 0.12, and a Character Error Rate (CER) of 0.06. These results highlight the effectiveness of attention-driven, multi-modal architectures for visual speech recognition tasks. While the current evaluation focuses on lip-reading due to dataset constraints, the proposed architecture is extendable to full Sign Language Translation (SLT) systems. SignKeyNet demonstrates strong potential for real-time deployment in accessibility technologies, particularly for the deaf and hard-of-hearing communities.
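The abstract reports WER and CER as fractions (0.12 and 0.06) but does not describe the scoring procedure. As a minimal sketch, assuming the conventional definitions (edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER), the two metrics can be computed as below. The function names and the example strings are hypothetical and are not taken from the paper or from MIRACL-VC1.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances for the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from ref
                curr[j - 1] + 1,     # insertion into ref
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical reference/prediction pair, for illustration only.
    reference = "stop navigation now"
    hypothesis = "stop navigating now"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Under these definitions one wrong word out of three costs far more at the word level than at the character level, which is consistent with the reported CER being lower than the reported WER.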
Original publication: https://europepmc.org/article/MED/41444251