TriNet-HAR: A trimodal framework for human activity recognition
2026, vol. 18, no. 1, pp. 45-56
Article [2026-01-05]
Human Activity Recognition (HAR) is vital in healthcare, security, and human-computer interaction. Traditional unimodal or bimodal approaches often fail to capture the full complexity of human actions. To overcome this limitation, this paper proposes TriNet-HAR, a novel trimodal deep learning framework that integrates depth maps, 3D skeletal joints, and inertial signals within a unified, explainable architecture. Each modality is processed by a dedicated module: a Convolutional Neural Network (CNN) for depth maps, joint-wise attention layers that mimic a Graph Convolutional Network (GCN) for skeletons, and a Long Short-Term Memory (LSTM) network enhanced with positional encoding for Inertial Measurement Unit (IMU) data. A fusion strategy combining multi-head self-attention and cross-attention refines inter-modality dependencies. SHapley Additive exPlanations (SHAP) and Gradient-weighted Class Activation Mapping (Grad-CAM) are employed for interpretability. Evaluated on the University of Texas at Dallas Multimodal Human Action Dataset (UTD-MHAD), TriNet-HAR achieves 98% accuracy and outperforms unimodal and bimodal baselines in precision, recall, and F1-score, demonstrating superior robustness, adaptability, and transparency.
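The abstract does not give the fusion layer's embedding size, head count, or projection weights. The block below is therefore only a minimal, dependency-free sketch of one way that self-attention followed by cross-attention can combine three modality embeddings. It assumes each branch (depth CNN, skeleton attention, IMU LSTM) has already reduced a sample to one fixed-length vector (a "modality token"). It also deliberately omits the learned query/key/value projections, residual connections, and normalization that a real implementation would include. The function names `fuse_modalities` and `multi_head_attention` and the toy input values are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    # Output is a weighted (convex) combination of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def multi_head_attention(query: Vector, keys: List[Vector], values: List[Vector],
                         num_heads: int) -> Vector:
    """Split vectors into equal per-head slices, attend per head, then concatenate.

    Simplifying assumption: no learned projections, so each head is just a slice.
    """
    dim = len(query)
    if dim % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    head_dim = dim // num_heads
    out: Vector = []
    for h in range(num_heads):
        s = slice(h * head_dim, (h + 1) * head_dim)
        out.extend(scaled_dot_attention(query[s], [k[s] for k in keys], [v[s] for v in values]))
    return out


def fuse_modalities(tokens: Dict[str, Vector], num_heads: int = 2) -> Vector:
    """Hypothetical fusion: self-attention, then cross-attention, then mean pooling."""
    vecs = list(tokens.values())
    # Stage 1: self-attention; each modality token attends to all tokens, itself included.
    refined = [multi_head_attention(v, vecs, vecs, num_heads) for v in vecs]
    # Stage 2: cross-attention; each refined token queries only the other modalities.
    crossed = []
    for i, q in enumerate(refined):
        others = [r for j, r in enumerate(refined) if j != i]
        crossed.append(multi_head_attention(q, others, others, num_heads))
    # Stage 3: mean-pool into one fused vector for a downstream classifier.
    dim = len(vecs[0])
    return [sum(c[d] for c in crossed) / len(crossed) for d in range(dim)]


if __name__ == "__main__":
    fused = fuse_modalities({
        "depth": [0.2, 0.1, 0.4, 0.3],
        "skeleton": [0.5, 0.0, 0.1, 0.2],
        "imu": [0.1, 0.3, 0.2, 0.6],
    })
    print(len(fused), fused)
```

Splitting self-attention and cross-attention into separate stages reflects the abstract's description: the first stage lets every modality see the whole set, and the second forces each modality to draw information specifically from the other two. In a trained model, the attention weights would be driven by learned projections rather than by raw dot products of the embeddings.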
human activity recognition, deep learning, multi-modal fusion, encoding, data fusion
https://doi.org/10.59035/XAHH5647
Basamma Umesh Patil, Chetan R, D V Ashoka, Ashok V Sutagundar, Nagamani H Shahapure. TriNet-HAR: A trimodal framework for human activity recognition. International Journal on Information Technologies and Security, vol. 18, no. 1, 2026, pp. 45-56. https://doi.org/10.59035/XAHH5647