STFTransNet: A Transformer Based Spatial Temporal Fusion Network for Enhanced Multimodal Driver Inattention State Recognition System.

Year: 2025
Authors: Kim M & Choi G.
Affiliation: Department of Artificial Intelligence Engineering · South Korea

Abstract

Driver inattention state recognition is being actively studied as an advanced mobility application technology for preventing traffic accidents caused by driver drowsiness and distraction. Such systems recognize drowsiness and distraction from driver behavior, biosignals, and vehicle data. Existing driver drowsiness detection systems struggle when accessories worn by the driver partially occlude facial features and when light scattering from changes in internal and external lighting momentarily degrades image resolution, both of which make the driver's condition difficult to recognize. In this paper, we propose a transformer-based spatial temporal fusion network (STFTransNet) that fuses multimodal information to improve driver inattention state recognition in such images. The proposed STFTransNet consists of (i) a MediaPipe Face Mesh-based facial landmark extraction process for facial feature extraction, (ii) an RCN-based two-stream cross-attention process for learning spatial features of driver face and body action images, (iii) a TCN-based temporal feature extraction process for learning temporal features of the extracted features, and (iv) an ensemble of spatial and temporal features followed by a classification process that recognizes the final driver state. In experiments, the proposed STFTransNet achieved accuracy 4.56% higher than the existing VBFLLFA model on the NTHU-DDD public DB, 3.48% higher than the existing InceptionV3 + HRNN model on the StateFarm public DB, and 3.78% higher than the existing VBFLLFA model on the YawDD public DB.
The proposed STFTransNet is designed as a two-stream network that takes the driver's face and action images as input, and it addresses the degradation in driver inattention state recognition performance caused by partial facial feature occlusion and light blur by fusing spatial and temporal features.
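
To make the two-stream idea concrete, the following is a minimal sketch of cross-attention between a face stream and a body-action stream, followed by a simple temporal convolution. It is not the authors' implementation: the abstract gives no layer details, so the function names (`cross_attention`, `two_stream_fuse`, `causal_dilated_conv`), the absence of learned projections, the mean pooling, and the use of a causal dilated convolution as the temporal layer are all assumptions chosen for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Scaled dot-product attention: queries come from one stream,
    keys and values from the other (no learned projections, an assumption)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out

def mean_pool(tokens):
    """Average a list of feature vectors into one vector."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]

def two_stream_fuse(face_tokens, body_tokens):
    """Each stream attends to the other; pooled results are concatenated."""
    face_ctx = cross_attention(face_tokens, body_tokens)
    body_ctx = cross_attention(body_tokens, face_tokens)
    return mean_pool(face_ctx) + mean_pool(body_ctx)

def causal_dilated_conv(seq, kernel, dilation=1):
    """1-D causal convolution over scalars with implicit left zero padding,
    the kind of layer commonly used in TCNs (assumed, not from the paper)."""
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * seq[idx]
        out.append(acc)
    return out

# Toy per-frame features: 2 face tokens and 3 body tokens, each 2-dimensional.
face_tokens = [[1.0, 0.0], [0.0, 1.0]]
body_tokens = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
fused = two_stream_fuse(face_tokens, body_tokens)  # 4-dimensional fused vector

# Toy temporal signal (one fused feature tracked over four frames).
temporal = causal_dilated_conv([1, 2, 3, 4], kernel=[1, 1], dilation=2)
```

In this sketch each frame would yield one fused vector, and the temporal layer would then run along the frame axis of each feature dimension; a real model would add learned projections, multiple heads, and a classifier on top.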

Original publication: https://europepmc.org/article/MED/41013081