Hand gesture 3D pose estimation method based on swin transformer and CNN.

Year:
2026
Authors:
Dang R & Feng G.
Affiliation:
School of Architecture · China

Abstract

Existing gesture pose estimation methods commonly rely on a single mode of feature extraction for hand characteristics and neglect long-range topological relationships between joints, which limits their prediction accuracy. To address these issues, this study proposes a gesture pose estimation method that takes depth images as input. First, a convolutional network extracts coarse gesture features, while a Swin Transformer module uses spatial information to capture global features and the topological relationships between joints. A U-shaped network then processes the features hierarchically, preserving local joint information at several resolutions and fusing it with the global features. Two-dimensional Gaussian heatmaps represent the distribution of keypoints, improving network supervision for target feature regression. The backend network outputs the final keypoint coordinates. We evaluated the method on a newly constructed dataset, where it achieved an average squared error 7.012-4.776 mm lower than that of the baseline model. The experimental results demonstrate the superior performance of the proposed method compared with state-of-the-art mainstream pose estimation networks.
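
The heatmap representation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 16 x 16 grid, the value of sigma, and the argmax decoding step are all assumptions chosen for illustration. It shows only the general idea of encoding one keypoint as a two-dimensional Gaussian and recovering its location from the peak.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Build a height x width grid whose values follow an unnormalized
    2D Gaussian centered on the keypoint (cx, cy). Hypothetical helper."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def decode_keypoint(heatmap):
    """Recover (x, y) as the location of the largest heatmap value.
    Argmax decoding is an assumption; the source does not say how the
    backend network produces coordinates."""
    best_val, best_xy = -1.0, (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_val:
                best_val, best_xy = value, (x, y)
    return best_xy

# Encode a keypoint at column 5, row 9, then decode it again.
heatmap = gaussian_heatmap(16, 16, cx=5, cy=9, sigma=2.0)
peak = decode_keypoint(heatmap)
```

In a training setup, a grid like this would typically serve as the supervision target for one joint, with one heatmap per joint. A larger sigma spreads the target over more pixels, which tolerates small localization errors at the cost of a less sharply defined peak.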

Original publication: https://europepmc.org/article/MED/41772146