This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Analytical Modeling of Hybrid CNN-Transformer Dynamics for Emotion Classification

by Ergashevich Halimjon Khujamatov 1,*, Mirjamol Abdullaev 2 and Sabina Umirzakova 1
1 Department of Computer Engineering, Gachon University, Seongnam-daero, Sujeong-gu, Seongnam-si 1342, Republic of Korea
2 Department of Information Systems and Technologies, Tashkent State University of Economics, Tashkent 100066, Uzbekistan
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(1), 85; https://doi.org/10.3390/math14010085
Submission received: 9 October 2025 / Revised: 6 December 2025 / Accepted: 23 December 2025 / Published: 25 December 2025

Abstract

Facial expression recognition (FER) is crucial for affective computing and human–computer interaction, yet it remains difficult under real-world variations in lighting, occlusion, and pose. This work presents a lightweight hybrid network, SE-Hybrid + Face-ViT, which merges convolutional and transformer architectures through multi-level feature fusion and adaptive channel attention. The network combines a convolutional stream that captures fine-grained image texture with a retrained Face-ViT branch that provides high-level semantic context. Squeeze-and-Excitation (SE) modules recalibrate channel responses at multiple levels, allowing the network to emphasize emotion-salient cues and suppress redundant features. Trained and evaluated on the large-scale AffectNet benchmark, the proposed architecture achieves 70.45% accuracy and 68.11% macro-F1, outperforming recent state-of-the-art models such as TBEM-Transformer, FT-CSAT, and HFE-Net by approximately 2–3%. Grad-CAM visualizations confirm that the model attends to the most informative facial regions, consistent with its improved recognition of subtle expressions such as fear and contempt. These findings indicate that SE-Hybrid + Face-ViT is a computationally efficient yet highly discriminative FER approach that reconciles the preservation of local detail with global contextual reasoning.
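The abstract says that SE modules recalibrate channel responses but gives no implementation details. As a minimal sketch, assuming the standard squeeze-and-excitation formulation (global average pooling, a two-layer bottleneck with reduction ratio r, and sigmoid gating), a single SE block over a C×H×W feature map can be written in plain Python. The function names, the toy feature map, the reduction ratio r = 2, and all weights are hypothetical illustrations, not the authors' configuration or code.

```python
import math
from typing import List, Tuple

# Feature map laid out as [channels][height][width].
FeatureMap = List[List[List[float]]]
Matrix = List[List[float]]


def squeeze(x: FeatureMap) -> List[float]:
    """Squeeze: global average pooling gives one descriptor per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def affine(w: Matrix, v: List[float], b: List[float]) -> List[float]:
    """Fully connected layer: w @ v + b."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def excitation(z: List[float], w1: Matrix, b1: List[float],
               w2: Matrix, b2: List[float]) -> List[float]:
    """Excitation: C -> C/r bottleneck with ReLU, then C/r -> C with sigmoid gates."""
    hidden = [max(0.0, h) for h in affine(w1, z, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in affine(w2, hidden, b2)]


def se_block(x: FeatureMap, w1: Matrix, b1: List[float],
             w2: Matrix, b2: List[float]) -> Tuple[FeatureMap, List[float]]:
    """Scale: multiply every value in channel c by its gate s_c in (0, 1)."""
    gates = excitation(squeeze(x), w1, b1, w2, b2)
    scaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, x)]
    return scaled, gates


if __name__ == "__main__":
    C = 4  # channels; hypothetical reduction ratio r = 2, so the bottleneck has C // 2 units
    # Toy 4 x 2 x 2 feature map: channel c is filled with the value c + 1.
    x = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(C)]
    # Hypothetical weights for illustration only; a trained network learns these.
    w1 = [[0.5, -0.5, 0.25, 0.0], [0.0, 0.25, -0.5, 0.5]]    # (C/r) x C
    w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # C x (C/r)
    y, gates = se_block(x, w1, [0.0, 0.0], w2, [0.0] * C)
    print([round(g, 3) for g in gates])
```

With these toy weights the gates come out at roughly 0.562, 0.731, 0.438 and 0.269, so the second channel keeps most of its response while the fourth is attenuated. In a trained network the excitation weights are learned, which is what lets the gates favor channels carrying emotion-relevant cues; the sketch only shows the arithmetic of one block.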
Keywords: facial expression recognition; analytical modeling; multi-level feature fusion; emotion classification; mathematical representation of deep networks

Share and Cite

MDPI and ACS Style

Khujamatov, E.H.; Abdullaev, M.; Umirzakova, S. Analytical Modeling of Hybrid CNN-Transformer Dynamics for Emotion Classification. Mathematics 2026, 14, 85. https://doi.org/10.3390/math14010085

AMA Style

Khujamatov EH, Abdullaev M, Umirzakova S. Analytical Modeling of Hybrid CNN-Transformer Dynamics for Emotion Classification. Mathematics. 2026; 14(1):85. https://doi.org/10.3390/math14010085

Chicago/Turabian Style

Khujamatov, Ergashevich Halimjon, Mirjamol Abdullaev, and Sabina Umirzakova. 2026. "Analytical Modeling of Hybrid CNN-Transformer Dynamics for Emotion Classification." Mathematics 14, no. 1: 85. https://doi.org/10.3390/math14010085

APA Style

Khujamatov, E. H., Abdullaev, M., & Umirzakova, S. (2026). Analytical Modeling of Hybrid CNN-Transformer Dynamics for Emotion Classification. Mathematics, 14(1), 85. https://doi.org/10.3390/math14010085

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
