Dual-Branch Network with Dynamic Time Warping: Enhancing Micro-Expression Recognition Through Temporal Alignment
Abstract
1. Introduction
- (i) We introduce a novel DTW implementation that resolves variable-length sequence alignment through path-weighted averaging.
- (ii) We propose a dual-branch spatiotemporal architecture that combines a 1D-CNN pathway with spatial dropout for local AU detection, a BiLSTM pathway, and feature fusion via channel-wise concatenation.
- (iii) We develop a hybrid augmentation pipeline that applies temporal warping and universal Gaussian noise injection to landmark coordinates.
- (iv) We deliver an end-to-end real-time optimization framework that achieves 142 FPS through frame decimation and circular buffering with an overwrite strategy.
2. Related Work
2.1. Traditional Micro-Expression Recognition Methods
2.2. Deep Learning Approaches in Facial Expression Analysis
2.3. Temporal Modeling Techniques
2.4. Facial Landmark-Based Approaches
2.5. Research Gaps and Motivation
3. Methodology
3.1. System Overview
End-to-End Processing Pipeline
3.2. Facial Landmark Extraction
3.2.1. MediaPipe Facial Mesh Architecture
3.2.2. 3D Coordinate Normalization
- Centroid Alignment
- Scale Invariance
- Depth Enhancement
  - Magnifies micro-expressions in the z-direction
  - Compensates for perspective foreshortening effects
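```python
# Geometric normalization implementation
import numpy as np

# face_landmarks: iterable of MediaPipe landmarks exposing .x, .y, .z
landmarks = np.array([[lm.x, lm.y, lm.z] for lm in face_landmarks])
centroid = np.mean(landmarks, axis=0)                   # centroid alignment
normalized = landmarks - centroid
max_norm = np.max(np.linalg.norm(normalized, axis=1))   # largest distance from the centroid
normalized /= max_norm                                  # scale invariance
normalized[:, 2] *= 1.5                                 # depth enhancement
```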
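The three normalization steps in the snippet above can also be written as equations. The symbols below ($p_i$ for the $i$-th of $N$ landmarks, $c$ for the centroid) are notation introduced here for illustration; only the operations and the depth factor 1.5 come from the code.

```latex
c = \frac{1}{N}\sum_{i=1}^{N} p_i, \qquad
\hat{p}_i = \frac{p_i - c}{\max_{j}\,\lVert p_j - c \rVert_2}, \qquad
\tilde{p}_i = \bigl(\hat{x}_i,\; \hat{y}_i,\; 1.5\,\hat{z}_i\bigr)
```

After the second step every landmark lies within the unit ball around the origin; the third step then stretches only the depth axis.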
3.3. Dynamic Sequence Alignment
3.3.1. Dynamic Time Warping (DTW) Algorithm
3.3.2. Reference Sequence Generation
3.3.3. Path-Weighted Averaging Technique
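As a rough illustration of how DTW alignment with path-weighted averaging might map a variable-length landmark sequence onto a fixed-length reference, the sketch below runs classic DTW and then averages all frames matched to each reference index. The inverse-distance weighting, the function names `dtw_path` and `align_to_reference`, and the `eps` parameter are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def dtw_path(seq, ref):
    """Classic DTW between seq (n, d) and ref (m, d).

    Returns the optimal warping path as (seq_index, ref_index) pairs
    together with the pairwise frame-distance matrix.
    """
    n, m = len(seq), len(ref)
    # Pairwise Euclidean distances between frames, shape (n, m)
    dist = np.linalg.norm(seq[:, None, :] - ref[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1]
            )
    # Backtrack from the last cell to recover the optimal path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], dist

def align_to_reference(seq, ref, eps=1e-8):
    """Resample seq to len(ref) frames by averaging along the DTW path.

    Assumed weighting: each matched frame contributes with weight
    1 / (frame distance + eps), so closer matches count more.
    """
    path, dist = dtw_path(seq, ref)
    aligned = np.zeros_like(ref, dtype=float)
    weight_sum = np.zeros(len(ref))
    for i, j in path:
        w = 1.0 / (dist[i, j] + eps)
        aligned[j] += w * seq[i]
        weight_sum[j] += w
    return aligned / weight_sum[:, None]
```

Because a DTW path visits every reference index at least once, the output always has exactly as many frames as the reference, whatever the input length.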
3.3.4. Sequence Standardization (z-Score Normalization)
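A minimal sketch of z-score standardization for one aligned sequence is shown below. It assumes the statistics are computed per feature over the time axis; the paper may instead use dataset-wide statistics, and the name `zscore_sequence` and the `eps` guard are illustrative.

```python
import numpy as np

def zscore_sequence(seq, eps=1e-8):
    """Standardize a (T, D) sequence to zero mean and unit variance per feature."""
    mean = seq.mean(axis=0, keepdims=True)   # per-feature mean over time
    std = seq.std(axis=0, keepdims=True)     # per-feature standard deviation over time
    return (seq - mean) / (std + eps)        # eps avoids division by zero for static features
```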
3.4. Hybrid Data Augmentation
3.4.1. Temporal Warping Strategy
3.4.2. Gaussian Noise Injection
3.4.3. Augmentation Strategy
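The two augmentations named in this section could be sketched as follows. This is one plausible reading rather than the authors' pipeline: temporal warping is implemented as resampling along a randomly perturbed, monotonic time axis, and the noise is added to every coordinate. The function names and the `strength` and `sigma` defaults are hypothetical.

```python
import numpy as np

def temporal_warp(seq, strength=0.2, rng=None):
    """Resample a (T, D) sequence along a randomly perturbed time axis.

    strength must stay below 1 so that every time step remains positive.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = seq.shape[0]
    # Random positive step sizes give a monotonic, non-uniform time axis
    steps = 1.0 + strength * rng.uniform(-1.0, 1.0, size=T - 1)
    warped_t = np.concatenate([[0.0], np.cumsum(steps)])
    warped_t *= (T - 1) / warped_t[-1]       # rescale so the axis spans [0, T - 1]
    orig_t = np.arange(T)
    # Linearly interpolate each feature column at the warped time stamps
    cols = [np.interp(warped_t, orig_t, seq[:, k]) for k in range(seq.shape[1])]
    return np.stack(cols, axis=1)

def add_gaussian_noise(seq, sigma=0.01, rng=None):
    """Add zero-mean Gaussian noise with standard deviation sigma to every coordinate."""
    rng = np.random.default_rng() if rng is None else rng
    return seq + rng.normal(0.0, sigma, size=seq.shape)
```

Both functions preserve the sequence shape, so they can be chained before the network input without further resampling.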
3.5. Dual-Branch Spatiotemporal Network
3.5.1. CNN Branch Architecture
3.5.2. BiLSTM Branch Architecture
3.5.3. Feature Fusion Mechanism (Channel-Wise Concatenation)
3.5.4. Classification Head
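For orientation, a dual-branch model of the kind outlined in this section might be assembled in Keras as below. Only the overall layout (1D-CNN branch with spatial dropout, BiLSTM branch, channel-wise concatenation) and the dropout rates 0.3 and 0.5 come from the paper; the filter and unit counts, the pooling choice, the input width of 1404 (468 MediaPipe landmarks times 3 coordinates), the three output classes, and the name `build_dual_branch` are assumptions for this sketch.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dual_branch(seq_len=20, n_features=1404, n_classes=3):
    """Two parallel branches over the same landmark sequence, fused by concatenation."""
    inputs = layers.Input(shape=(seq_len, n_features))

    # CNN branch: local temporal patterns, regularized with spatial dropout
    c = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(inputs)
    c = layers.SpatialDropout1D(0.3)(c)
    c = layers.GlobalMaxPooling1D()(c)              # -> (batch, 64)

    # BiLSTM branch: longer-range dynamics in both time directions
    r = layers.Bidirectional(layers.LSTM(64))(inputs)   # -> (batch, 128)

    # Channel-wise concatenation of the two feature vectors
    fused = layers.Concatenate(axis=-1)([c, r])     # -> (batch, 192)

    # Classification head
    x = layers.Dropout(0.5)(fused)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)
```

Concatenation keeps the two feature sets separate until the classifier, so the dense layer can weight CNN and BiLSTM features independently.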
3.6. Training Configuration
3.6.1. Loss Function
3.6.2. Optimizer Settings
3.6.3. Regularization Strategies (Weight Decay, Dropout)
3.6.4. Early Stopping Criteria
4. Experimental Setup
4.1. Dataset Preparation
4.1.1. CASMEII Dataset Description
4.1.2. Data Splitting Strategy
4.1.3. Class Distribution Analysis
4.1.4. SAMM Dataset Description
4.2. Evaluation Metrics
4.2.1. Primary Metrics
4.2.2. Secondary Metrics
4.2.3. Statistical Significance Testing
4.3. Baseline Methods
4.3.1. 3D-CNN Baseline
4.3.2. LSTM-Based Sequence Model
4.4. Implementation Details
4.4.1. Hardware Configuration
- Processor: AMD Ryzen 7 7840H with Radeon 780M Graphics (8 cores, 16 threads @3.8 GHz)
- Graphics Processing Unit: AMD Radeon 780M Graphics
- Memory: 32 GB LPDDR5 (6400 MHz, dual-channel configuration)
- Storage: 1 TB UMIS PREYJ1T24MKN2QWY NVMe SSD
- Peripheral: SunplusIT Integrated Camera (1080p) for real-time validation
4.4.2. Software Environment
- Operating System: Microsoft Windows 11 23H2 (OS Version 22631.5624, Microsoft Windows NT kernel 10.0.22631.5624)
- Deep Learning Framework: TensorFlow 2.18.0 (CPU)
- OpenCV 4.8.1.78 (video I/O and optical flow)
- MediaPipe 0.10.21 (real-time facial landmark detection)
- NumPy 1.26.4 (array operations)
- SciPy 1.11.4 (DTW optimization)
- Evaluation Toolkit: scikit-learn 1.5.1 (metrics computation)
4.4.3. Hyperparameter Settings
4.4.4. Real-Time Optimization Techniques
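The frame decimation and circular buffering with an overwrite strategy named in the contributions can be illustrated with a small ring buffer. This is a sketch under assumptions: the class name `FrameRingBuffer`, the `stride` parameter, and the choice to return frames oldest-first are all hypothetical, and the paper's actual buffer sizes and decimation factor are not reproduced here.

```python
import numpy as np

class FrameRingBuffer:
    """Fixed-size buffer that keeps every stride-th frame and overwrites the oldest."""

    def __init__(self, capacity, feature_dim, stride=1):
        self.buf = np.zeros((capacity, feature_dim))
        self.capacity = capacity
        self.stride = stride
        self.seen = 0    # frames offered so far
        self.count = 0   # frames currently stored (at most capacity)
        self.head = 0    # index of the next write position

    def push(self, frame):
        """Offer one frame; returns True if it was stored, False if decimated."""
        keep = self.seen % self.stride == 0
        self.seen += 1
        if not keep:
            return False
        self.buf[self.head] = frame                      # overwrite the oldest slot
        self.head = (self.head + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)
        return True

    def window(self):
        """Return the stored frames in chronological order."""
        if self.count < self.capacity:
            return self.buf[:self.count].copy()
        return np.concatenate([self.buf[self.head:], self.buf[:self.head]])
```

Because memory is allocated once and writes are constant-time, the per-frame cost stays flat no matter how long the camera stream runs.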
4.4.5. Reproducibility Protocol
5. Results and Analysis
5.1. Overall Performance
Comparison with Baselines
5.2. Model Training Convergence Analysis
5.3. Ablation Studies
5.3.1. Impact of DTW Alignment (+2.44% Accuracy)
5.3.2. Contribution of Dual-Branch Architecture (+1.84% F1-Score)
5.4. Real-Time Performance
5.4.1. Frame Rate Analysis (142.3 FPS)
5.4.2. Conclusion
6. Discussion
6.1. Interpretation of Key Findings
6.2. Advantages of Proposed Approach
6.3. Comparison with State-of-the-Art
6.4. Limitations and Challenges
6.5. Practical Implications for Psychological Diagnostics
7. Conclusions and Future Work
7.1. Summary of Contributions
7.2. Potential Applications
7.3. Future Research Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhao, G.; Li, X.; Li, Y.; Pietikäinen, M. Facial Micro-Expressions: An Overview. Proc. IEEE 2023, 111, 1215–1235.
- Oh, Y.H.; See, J.; Le Ngo, A.C.; Phan, R.C.; Baskaran, V.M. A Survey of Automatic Facial Micro-Expression Analysis: Databases, Methods, and Challenges. Front. Psychol. 2018, 9, 1128.
- Yang, J.; Wu, Z.; Wu, R. Micro-expression recognition based on contextual transformer networks. Vis. Comput. 2025, 41, 1527–1541.
- Gan, Y.S.; Liu, K.-H.; Liong, G.-B.; Liong, S.-T. Micro-expression recognition in wild video environments: Latent feature-based ANN (LFANN) from 3D reconstructed faces. Neurocomputing 2025, 625, 129480.
- Ben, X.; Ren, Y.; Zhang, J.; Wang, S.J.; Kpalma, K.; Meng, W.; Liu, Y.J. Video-Based Facial Micro-Expression Analysis: A Survey of Datasets, Features and Algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 5826–5846.
- Liu, S.; Huang, Y.; Yu, H.; Xu, Y. AMNet: An attention-enhanced multi-branch network for micro-expression recognition. Vis. Comput. 2025, 41, 6521–6532.
- Zhao, S.; Tao, H.; Zhang, Y.; Xu, T.; Zhang, K.; Hao, Z.; Chen, E. A two-stage 3D CNN based learning method for spontaneous micro-expression recognition. Neurocomputing 2021, 448, 276–289.
- Wang, Z.; Zhang, K.; Luo, W.; Sankaranarayana, R. HTNet for micro-expression recognition. Neurocomputing 2024, 602, 128196.
- Liong, S.-T.; See, J.; Wong, K.; Phan, R.C.W. Less is more: Micro-expression recognition from video using apex frame. Signal Process. Image Commun. 2018, 62, 82–92.
- Yildirim, S.; Chimeumanu, M.S.; Rana, Z.A. The influence of micro-expressions on deception detection. Multimed. Tools Appl. 2023, 82, 29115–29133.
- Nikbin, S.; Qu, Y. A Study on the Accuracy of Micro Expression Based Deception Detection with Hybrid Deep Neural Network Models. Eur. J. Electr. Eng. Comput. Sci. 2024, 8, 14–20.
- Yuan, S.; Shao, Z.; Ma, Z.; Cao, T.; Xing, H.; Liu, Y.; Cao, Y. Deception detection based on micro-expression and feature selection methods. EURASIP J. Image Video Process. 2025, 2025, 8.
- Gilanie, G.; Cheema, S.; Latif, A.; Saher, A.; Ahsan, M.; Ullah, H.; Oommen, D. A robust method of bipolar mental illness detection from facial micro expressions using machine learning methods. Intell. Autom. Soft Comput. 2024, 39, 57–71.
- Sumi, K.; Ueda, T. Micro-Expression Recognition for Detecting Human Emotional Changes. In Proceedings of the Human-Computer Interaction. Novel User Experiences, Toronto, ON, Canada, 17–22 July 2016; pp. 60–70.
- Esmaeili, V.; Shahdi, S.O. Automatic micro-expression apex spotting using Cubic-LBP. Multimed. Tools Appl. 2020, 79, 20221–20239.
- Wei, J.; Lu, G.; Yan, J. A comparative study on movement feature in different directions for micro-expression recognition. Neurocomputing 2021, 449, 159–171.
- Liong, S.T.; Phan, R.C.W.; See, J.; Oh, Y.H.; Wong, K. Optical strain based recognition of subtle emotions. In Proceedings of the 2014 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 1–4 December 2014; pp. 180–184.
- Liu, Y.J.; Zhang, J.K.; Yan, W.J.; Wang, S.J.; Zhao, G.; Fu, X. A Main Directional Mean Optical Flow Feature for Spontaneous Micro-Expression Recognition. IEEE Trans. Affect. Comput. 2016, 7, 299–310.
- Liu, Y.; Du, H.; Zheng, L.; Gedeon, T. A Neural Micro-Expression Recognizer. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; pp. 1–4.
- Xia, Z.; Hong, X.; Gao, X.; Feng, X.; Zhao, G. Spatiotemporal Recurrent Convolutional Networks for Recognizing Spontaneous Micro-Expressions. IEEE Trans. Multimed. 2020, 22, 626–640.
- Xia, Z.; Peng, W.; Khor, H.Q.; Feng, X.; Zhao, G. Revealing the Invisible With Model and Data Shrinking for Composite-Database Micro-Expression Recognition. IEEE Trans. Image Process. 2020, 29, 8590–8605.
- Zeng, X.; Zhao, X.; Zhong, X.; Liu, G. A Survey of Micro-expression Recognition Methods Based on LBP, Optical Flow and Deep Learning. Neural Process. Lett. 2023, 55, 5995–6026.
- Goh, K.M.; Ng, C.H.; Lim, L.L.; Sheikh, U.U. Micro-expression recognition: An updated review of current trends, challenges and solutions. Vis. Comput. 2020, 36, 445–468.
- Huang, X.; Zhao, G.; Hong, X.; Zheng, W.; Pietikäinen, M. Spontaneous facial micro-expression analysis using Spatiotemporal Completed Local Quantized Patterns. Neurocomputing 2016, 175, 564–578.
- Huang, X.; Wang, S.J.; Liu, X.; Zhao, G.; Feng, X.; Pietikäinen, M. Discriminative Spatiotemporal Local Binary Pattern with Revisited Integral Projection for Spontaneous Facial Micro-Expression Recognition. IEEE Trans. Affect. Comput. 2019, 10, 32–47.
- Zhi, R.; Xu, H.; Wan, M.; Li, T. Combining 3D Convolutional Neural Networks with Transfer Learning by Supervised Pre-Training for Facial Micro-Expression Recognition. IEICE Trans. Inf. Syst. 2019, E102.D, 1054–1064.
- Li, J.; Wang, Y.; See, J.; Liu, W. Micro-expression recognition based on 3D flow convolutional neural network. Pattern Anal. Appl. 2019, 22, 1331–1339.
- Quang, N.V.; Chun, J.; Tokuyama, T. CapsuleNet for Micro-Expression Recognition. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; pp. 1–7.
- Chen, H.; Cui, J.; Zhang, Y.; Zhang, Y. VIT and Bi-LSTM for Micro-Expressions Recognition. In Proceedings of the 2022 IEEE 5th International Conference on Information Systems and Computer Aided Education (ICISCAE), Dalian, China, 23–25 September 2022; pp. 946–951.
- Jeong, S.-D. Speaker Identification Using Dynamic Time Warping Algorithm. J. Korea Acad.-Ind. Coop. Soc. 2011, 12, 2402–2409.
- Mayya, V.; Pai, R.M.; Pai, M.M.M. Combining temporal interpolation and DCNN for faster recognition of micro-expressions in video sequences. In Proceedings of the 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Jaipur, India, 21–24 September 2016; pp. 699–703.
- Ngo, A.C.L.; Liong, S.T.; See, J.; Phan, R.C.W. Are subtle expressions too sparse to recognize? In Proceedings of the 2015 IEEE International Conference on Digital Signal Processing (DSP), Singapore, 21–24 July 2015; pp. 1246–1250.
- Iqtait, M.; Mohamad, F.; Mamat, M. Feature extraction for face recognition via active shape model (ASM) and active appearance model (AAM). In Proceedings of the IOP Conference Series: Materials Science and Engineering, Suzhou, China, 22–24 June 2018; p. 012032.
- Alsarayreh, A.; Mohamad, F. Enhanced Constrained Local Models (CLM) for Facial Feature Detection. Int. J. Eng. Res. Technol. 2020, 13, 3217.
- Yang, Z.; Ge, W.; Zhang, Z. Face Recognition Based on MTCNN and Integrated Application of FaceNet and LBP Method. In Proceedings of the 2020 2nd International Conference on Artificial Intelligence and Advanced Manufacture (AIAM), Manchester, UK, 15–17 October 2020; pp. 95–98.
- Khan, S.S.; Sengupta, D.; Ghosh, A.; Chaudhuri, A. MTCNN++: A CNN-based face detection algorithm inspired by MTCNN. Vis. Comput. 2024, 40, 899–917.
- Baltrušaitis, T.; Robinson, P.; Morency, L.P. OpenFace: An open source facial behavior analysis toolkit. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–10.
- Lo, L.; Xie, H.X.; Shuai, H.H.; Cheng, W.H. MER-GCN: Micro-Expression Recognition Based on Relation Modeling with Graph Convolutional Networks. In Proceedings of the 2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), Shenzhen, China, 6–8 August 2020; pp. 79–84.
- Wang, W.; Bi, B.; Yan, M.; Wu, C.; Bao, Z.; Peng, L.; Si, L. StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding. arXiv 2019, arXiv:1908.04577.
- Lopez, E.; Uncini, A.; Comminiello, D. Hierarchical Hypercomplex Network for Multimodal Emotion Recognition. In Proceedings of the 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP), London, UK, 22–25 September 2024; pp. 1–6.
- Saha, P.; Ansaruddin Kunju, A.K.; Majid, M.E.; Bin Abul Kashem, S.; Nashbat, M.; Ashraf, A.; Hasan, M.; Khandakar, A.; Shafayet Hossain, M.; Alqahtani, A.; et al. Novel multimodal emotion detection method using Electroencephalogram and Electrocardiogram signals. Biomed. Signal Process. Control 2024, 92, 106002.
- Wang, S.; Qu, J.; Zhang, Y.; Zhang, Y. Multimodal Emotion Recognition from EEG Signals and Facial Expressions. IEEE Access 2023, 11, 33061–33068.
- Li, Q.; Zhan, S.; Xu, L.; Wu, C. Facial micro-expression recognition based on the fusion of deep learning and enhanced optical flow. Multimed. Tools Appl. 2019, 78, 29307–29322.
- Peng, Y.; Wang, W.; Kong, W.; Nie, F.; Lu, B.L.; Cichocki, A. Joint Feature Adaptation and Graph Adaptive Label Propagation for Cross-Subject Emotion Recognition From EEG Signals. IEEE Trans. Affect. Comput. 2022, 13, 1941–1958.
- Alghamdi, A.M.; Ashraf, M.U.; Bahaddad, A.A.; Almarhabi, K.A.; Al Shehri, W.A.; Daraz, A. Cross-subject EEG signals-based emotion recognition using contrastive learning. Sci. Rep. 2025, 15, 28295.
- Samal, P.; Hashmi, M.F. A dynamic spectrum driven network for enhanced multimodal emotion recognition with EEG and ECG signals. Biocybern. Biomed. Eng. 2026, 46, 139–161.




| Technique | Scope | Parameter |
|---|---|---|
| Regularization | Dense layers | |
| Dropout | CNN branch | rate = 0.3 |
| Dropout | Classifier | rate = 0.5 |

| Hyperparameter | Symbol | Value |
|---|---|---|
| Batch Size | | 2 |
| Maximum Epochs | | 100 |
| Learning Rate | | |
| Regularization | | |
| Gradient Clipping | | None |

| Characteristic | Value | Description |
|---|---|---|
| Subjects | 26 | Chinese participants |
| Expression types | 5 | Happiness, Surprise, Disgust, Repression, Others |
| Video resolution | 640 × 480 | Captured at 200 FPS |
| Total samples | 247 | Micro-expression clips |

| Expression | Samples | Proportion |
|---|---|---|
| Happiness | 33 | 13.36% |
| Surprise | 25 | 10.12% |
| Disgust | 60 | 24.29% |
| Repression | 27 | 10.93% |
| Others | 102 | 41.30% |

| Expression | Samples | Proportion |
|---|---|---|
| Happiness | 26 | 16.35% |
| Contempt | 12 | 7.55% |
| Surprise | 15 | 9.43% |
| Anger | 57 | 35.85% |
| Disgust | 9 | 5.66% |
| Sadness | 6 | 3.77% |
| Fear | 8 | 5.03% |
| Others | 26 | 16.36% |

| Parameter | Value | Scope | Optimization Method |
|---|---|---|---|
| Batch Size | 2 | Training | Empirical validation |
| Max Epochs | 20 | Training | Early stopping criterion |
| Learning Rate | | Optimization | Adam default |
| Regularization | | Regularization | Grid search |
| CNN Dropout Rate | 0.3 | Feature extraction | Bayesian optimization |
| Classifier Dropout | 0.5 | Classification | Cross-validation |
| Sequence Length | 20 | Preprocessing | Ablation study (Section 5.3.2) |
| Minimum Frames | 10 | Real-time detection | Frame threshold analysis |

| Method | CASMEII Accuracy (%) | CASMEII F1-Score | SAMM Accuracy (%) | SAMM F1-Score |
|---|---|---|---|---|
| Optical Flow + SVM [43] | 58.03 | Null | Null | Null |
| 3D-FCNN [27] | 59.11 | Null | Null | Null |
| 3D-CNNs (train from scratch) [26] | 94.20 | Null | 95.80 | Null |
| 3D-CNNs (with transfer learning) [26] | 97.60 | Null | 97.40 | Null |
| BiLSTM + Attention [29] | 70.00 | 0.7220 | Null | Null |
| Ours | 99.22 | 0.9949 | 98.74 | 98.64 |
| Improvement vs. best baseline | +1.66% | +37.80% | +1.38% | - |
| Ours (5-fold) | 96.77 | 0.9586 | - | - |
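
The "Improvement vs. best baseline" row appears to report relative rather than absolute gains. The arithmetic below is a check performed on the table's own numbers, not a formula stated by the authors; it reproduces all three reported percentages.

```latex
\frac{99.22 - 97.60}{97.60} \times 100\% \approx 1.66\%, \qquad
\frac{98.74 - 97.40}{97.40} \times 100\% \approx 1.38\%, \qquad
\frac{0.9949 - 0.7220}{0.7220} \times 100\% \approx 37.80\%
```

Read as absolute differences, the accuracy gains would instead be 1.62 and 1.34 percentage points.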

| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| happiness | 1.00 | 0.88 | 0.93 |
| others | 0.95 | 1.00 | 0.98 |
| surprise | 1.00 | 0.96 | 0.98 |
| Weighted Avg | 0.97 | 0.97 | 0.97 |

| Alignment Method | Accuracy (%) | ΔAcc |
|---|---|---|
| Linear interpolation | 96.86 | Base |
| DTW (Ours) | 99.22 | +2.44% |

| Configuration | F1-Score (%) |
|---|---|
| CNN branch only | 94.74 |
| LSTM branch only | 97.65 |
| Concatenation (Ours) | 99.49 |

| Component | AMD Ryzen 7 7840H (FPS) | Latency (ms) |
|---|---|---|
| End-to-End | 142.3 | 7 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yao, Q.; Wang, M.; Chen, D.; Liu, D.; Li, Y. Dual-Branch Network with Dynamic Time Warping: Enhancing Micro-Expression Recognition Through Temporal Alignment. Symmetry 2026, 18, 775. https://doi.org/10.3390/sym18050775

