Micro-Expression Recognition Using Transformers Neural Networks
Abstract
1. Introduction
1.1. Problem Statement
1.2. Related Works
2. Materials and Methods
2.1. Data Set
2.2. Segmentation and Frame Extraction
2.3. Applied Approach
- Modules
- Data Flow: This module is present in each of the subsequent modules. First, channel attention is computed and multiplied by the input, producing an enhanced representation of the most significant channels. Then, spatial attention is calculated and multiplied by the output of the channel attention step, generating a final representation that emphasizes both relevant channels and spatial regions. The final output is the input image with channel and spatial attention applied, allowing the network to focus on the most relevant features for the task.
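The channel-then-spatial data flow described above can be sketched in NumPy. This is a minimal illustration with hypothetical names, not the authors' implementation; in particular, a fixed average/max combination stands in for CBAM's learned spatial convolution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Channel attention: squeeze (global average pooling) + two-layer MLP."""
    s = x.mean(axis=(1, 2))                    # (c,) per-channel descriptor
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # (c,) attention weights
    return x * e[:, None, None]                # scale each channel

def spatial_attention(x):
    """Spatial attention: channel-wise average and max, sigmoid mask.
    (A fixed combination replaces CBAM's learned k x k convolution.)"""
    mask = sigmoid(x.mean(axis=0) + x.max(axis=0))  # (h, w) mask
    return x * mask[None, :, :]

def cbam(x, w1, w2):
    # Data flow: channel attention first, then spatial attention on its output
    return spatial_attention(channel_attention(x, w1, w2))
```

The output keeps the input's shape: only the relative weighting of channels and spatial positions changes.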
- Transformer DualHybridFace_IncepCBAM: This configuration uses only the Inception and CBAM branch (incep_sca), excluding the vit_pos branch. It includes the following hyperparameters:
- in_channels: Number of input channels (default: 3 for RGB and 2 for grayscale images).
- num_classes: Number of classes for the classification task (default: 3 for RGB images and 2 for grayscale images).
- fc: A sequence of fully connected layers that combines the extracted features and performs classification.
- Transformer DualHybridFace_ViT: This configuration uses only the vit_pos branch based on the Vision Transformer module; it does not include the incep_sca branch nor the LSTM module. It includes the following hyperparameters:
- in_channels: Number of input channels (default: 3 for RGB images and 2 for grayscale images).
- num_classes: Number of classes for the classification task (default: 3 for RGB images and 2 for grayscale images).
- fc: A sequence of fully connected layers that combines the extracted features and performs classification.
- Transformer DualHybridFace_LSTM: This module is similar to DualHybridFace but incorporates an LSTM layer after the vit_pos branch. The output of the LSTM is combined with the features from the incep_sca branch and passed through a fully connected layer for classification. It includes the following hyperparameters:
- in_channels: Number of input channels (default: 3 for RGB images and 2 for grayscale images).
- num_classes: Number of classes for the classification task (default: 3 for RGB images and 2 for grayscale images).
- hidden_dim: Dimensionality of the additional feature representation (default: 512 for RGB images).
- fc: A sequence of fully connected layers that combines the features from both branches and performs classification.
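The fusion step shared by these configurations can be sketched as follows. All names are hypothetical; in DualHybridFace_LSTM, `vit_feat` would be the LSTM's final hidden state rather than the raw vit_pos output.

```python
import numpy as np

def classify_dual(vit_feat, incep_feat, w_fc, b_fc):
    """DualHybridFace-style head (sketch): concatenate the features from the
    vit_pos branch with the incep_sca branch, then map to class logits with
    a fully connected layer."""
    fused = np.concatenate([vit_feat, incep_feat])  # joint representation
    logits = w_fc @ fused + b_fc                    # fc classification layer
    return int(np.argmax(logits))                   # predicted class index
```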
- Hyperparameters
- in_channels: The number of input channels (default: 3 for RGB images and 2 for grayscale images).
- num_classes: The number of output classes for each convolution branch (default: 3 for RGB images and 2 for grayscale images).
- Hybrid Architecture
- Inception: Enables efficient extraction of high-frequency features such as textures and local details, which are crucial in many vision tasks. While CNNs excel at this, pure Transformer models tend to focus more on low-frequency, global dependencies. Combining Inception with ViT allows the system to leverage the strengths of both approaches.
- Spatial and Channel Attention with CBAM: By introducing spatial and channel attention methods, the CBAM improves performance in tasks like object detection and semantic segmentation by enabling the model to selectively focus on the most informative regions and channels.
- Global Dependency Capture with ViT: The primary benefit of ViT is its capacity to use self-attention mechanisms to capture long-range dependencies across the entire image. This is especially helpful for tasks that require a holistic understanding of the scene.
- Efficiently extract both high- and low-frequency features (Inception)
- Selectively emphasize relevant spatial regions and channels (CBAM)
- Capture global dependencies across the image (ViT)
- DualHybridFace Transformer Model
2.4. Transformer
- The model receives an input image to be classified. The ViT partitions this image into small blocks called patches, which are then transformed into numerical vectors through a process known as linear embedding, analogous to describing the colors of a visual scene using descriptive terms.
- After embedding the patches, the model incorporates positional embeddings, which allow it to retain information about the original spatial arrangement of each patch. This step is critical, as the meaning of visual components may depend on their spatial relationships.
- Once the patches have been embedded and assigned positional information, they are arranged into a sequence and processed through a Transformer encoder. This encoder functions as a mechanism that learns the relationships between patches, forming a holistic representation of the image.
- Finally, to enable image classification, a special classification token is appended at the beginning of the sequence. This token is trained jointly with the rest of the model and ultimately contains the information necessary to determine the image category.
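The steps above (patchify, linearly embed, prepend the classification token, add positional embeddings) can be sketched in NumPy. All names here are illustrative, not the authors' code.

```python
import numpy as np

def patchify(img, p):
    # (h, w, c) image -> (n, p*p*c) flattened patches, n = (h//p) * (w//p)
    h, w, c = img.shape
    blocks = img.reshape(h // p, p, w // p, p, c).swapaxes(1, 2)
    return blocks.reshape(-1, p * p * c)

def embed_image(img, p, proj, cls_token, pos_embed):
    """ViT front end: linear patch embedding, prepended classification token,
    and added positional embeddings."""
    x = patchify(img, p) @ proj      # (n, d) patch embeddings
    x = np.vstack([cls_token, x])    # (n+1, d) with the [class] token first
    return x + pos_embed             # retain spatial arrangement information
```

The resulting (n+1, d) token sequence is what the Transformer encoder consumes.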
- Patch Embedding: The image is divided into patches and converted into linear embeddings. This is the initial step in preparing the image so that the Transformer can interpret it.
- Classification Token: A special token added to the sequence of embeddings that, after passing through the Transformer, contains the necessary information for image classification.
- Positional Embeddings: Incorporated into the patch embeddings to preserve spatial details about the original position of each patch in the image.
- Transformer Blocks: A series of blocks that sequentially process the embeddings using attention mechanisms to understand relationships among the different patches.
- Layer Normalization: Applied to stabilize the embedding values before and after passing through the Transformer blocks.
- Representation Layer or Pre-Logits: An optional layer that may transform the extracted features before final classification, depending on whether a representation size has been defined.
- Classification Head: The final component of the model that maps the processed features to the predicted classes.
- Mask Generation: An additional layer suggesting that the model may also be designed for segmentation tasks by producing a mask for the image.
- Weight Initialization: Functions that initialize the weights and biases of linear and normalization layers with specific values, providing a suitable starting point for training.
- Additional Functions: Supplementary functions required to exclude certain parameters from weight decay, manipulate the classification head, and define the data flow throughout the model.
- Image Size: Defines the size of the input image and determines how it will be divided into patches (14).
- Patch Size: Specifies the dimensions of each patch (1).
- Input Channels: Indicates the number of channels in the input image (3).
- Number of Classes: Determines the number of output categories for the classification head (1000).
- Embedding Dimension: The embedding dimension for each patch, representing the feature space in which the Transformer operates (512).
- Depth: The depth of the Transformer, referring to the number of sequential Transformer blocks in the model (3).
- Number of Attention Heads: The number of attention heads in each Transformer block, allowing the model to attend to different parts of the image simultaneously (4).
- MLP Ratio: The ratio between the hidden layer size of the multilayer perceptron (MLP) and the embedding dimension (2).
- Query-Key-Value Attention Bias: Enables bias terms in the query, key, and value projections when set to true (True).
- Attention Dropout Rate: The dropout rate applied specifically to the attention mechanism for regularization (0.3).
- Attention Head Dropout Type: Specifies the dropout strategy applied to the attention heads (e.g., HeadDropOut).
- Attention Head Dropout Rate: The dropout rate applied to the attention heads (0.3).
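For reference, the hyperparameters listed above can be collected into a configuration dictionary. The key names are illustrative, not the authors' actual variable names.

```python
# Hypothetical configuration mirroring the hyperparameters listed above.
vit_config = {
    "img_size": 14,        # input image side length
    "patch_size": 1,       # side length of each patch
    "in_channels": 3,
    "num_classes": 1000,
    "embed_dim": 512,
    "depth": 3,            # number of sequential Transformer blocks
    "num_heads": 4,
    "mlp_ratio": 2,        # MLP hidden size = 2 * embed_dim
    "qkv_bias": True,
    "attn_drop_rate": 0.3,
    "head_drop_rate": 0.3,
}

# With a 14 x 14 input and 1 x 1 patches the encoder sees 196 patch tokens,
# plus one classification token.
num_tokens = (vit_config["img_size"] // vit_config["patch_size"]) ** 2 + 1
```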
2.5. Mathematical Foundation
- Each image patch is flattened into a vector of length P²·C, where N = HW/P² is the resulting number of patches (for an H × W image with C channels and P × P patches).
- A series of embedded image patches is produced by mapping the flattened patches into D dimensions using a trained linear projection E.
- A learnable class embedding xclass is prepended to the sequence of embedded image patches; its state at the encoder output represents the classification output y.
- Single-Head Attention
- Q is the query matrix.
- K is the key matrix.
- V is the value matrix.
- dk is the key dimension.
- T is the sequence length.
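These quantities enter the standard scaled dot-product attention of Vaswani et al., softmax(QKᵀ/√dk)·V; a minimal NumPy sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (T, T) pairwise similarities
    return softmax(scores, axis=-1) @ V
```

Each output row is a convex combination of the value vectors, since the softmax weights in every row sum to one.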
- Multi-head Attention
- i is the token number.
- j is the projection of the token length.
- h is the height of the image.
- w is the width of the image.
- s is the stride, which can be expressed as s = ⌈k/2⌉.
- p is the padding, which can be expressed as p = ⌈k/4⌉.
- k is the kernel.
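As a quick check of these formulas for a 3 × 3 kernel:

```python
import math

k = 3                  # kernel size
s = math.ceil(k / 2)   # stride:  ceil(3/2) = 2
p = math.ceil(k / 4)   # padding: ceil(3/4) = 1
```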
- Convolutional Block Attention Module (CBAM) Model
- Channel (channel: 48): The number of input channels.
- Reduction (reduction: 16): Used to reduce the channel dimension in the fully connected layers, enabling greater computational efficiency.
- Kernel Size (k_size: 3): The kernel size for the 2D convolution used in spatial attention.
- 1. Grouping Channel
- X is the input tensor of dimensions c × h × w.
- i and j are the spatial coordinates.
- c is the number of channels.
- h, w are the height and width of the image, respectively.
- 2. Convolution Layer
- Y is the convolution output.
- X is the input.
- W is the convolution kernel.
- k is the kernel size.
- 3. Batch Normalization
- μ and σ² are the mean and variance of Y calculated over the batch.
- γ and β are the learned scale and bias parameters, respectively.
- ϵ is a constant for numerical stability.
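With μ, σ², γ, β, and ϵ as defined above, the batch normalization step takes the standard form:

```latex
\hat{Y} = \gamma \cdot \frac{Y - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta
```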
- 4. Sigmoid Activation
- Z is the input of the sigmoid function, which is the output of the Batch Normalization layer.
- 5. Spatial Attention Mask
- X is the input tensor.
- Z is the output of the convolution layer, followed by Batch Normalization and Sigmoid Activation.
- Squeeze Operation: The “squeeze” operation reduces the spatial dimensions of a feature tensor X (dimensions c × h × w) to a per-channel descriptor S (dimensions c × 1 × 1). This is achieved through Global Average Pooling:
- Si represents the “squeeze” value for channel i, indicating the importance of the channel relative to the other channels.
- Excitation Operation: The “excitation” operation uses fully connected layers to model the relationships between channels and to learn attention weights.
- δ represents an activation function (in this case, ReLU).
- W1 and W2 are learned weight matrices.
- Scale Operation: The “scale” operation scales the original feature channels using the attention weights calculated in the “excitation” stage.
- Yijk is the final value of the output tensor Y after applying channel attention, where each value of channel i at spatial position (j,k) is scaled by the excitation weight Ei.
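In standard squeeze-and-excitation notation, which matches the symbols above, the three stages read:

```latex
S_{i} = \frac{1}{h \cdot w} \sum_{j=1}^{h} \sum_{k=1}^{w} X_{ijk}, \qquad
E = \sigma\left( W_{2}\, \delta\left( W_{1} S \right) \right), \qquad
Y_{ijk} = E_{i} \cdot X_{ijk}
```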
- Inception Model Architecture
- in_channels: 3 input channels.
- out_channels: 6 output channels for each convolutional branch.
- N is the number of patch tokens.
- C denotes the feature dimension.
- Xh is the input to the High-Frequency Mixer.
- Xl is the input to the Low-Frequency Mixer.
- Ch denotes the feature dimension of the High-Frequency Mixer.
- Cl denotes the feature dimension of the Low-Frequency Mixer.
- Yh1 and Yh2 denote the outputs of the high-frequency mixers.
- FC is the Fully Connected layer, referring to a linear or dense layer.
- MaxPool performs max subsampling to reduce resolution and capture invariant features.
- DwConv refers to the Depthwise Convolution layer (channel-wise separable) and efficiently applies convolutions separately for each channel to capture spatial and channel-wise patterns.
- Yl is the output of the Low-Frequency Mixer.
- Upsample is an operation that improves the spatial resolution of a feature or feature map.
- MSA (Multi-Head Self-Attention) enables capturing global dependencies among tokens.
- AvePooling (Average Pooling) performs subsampling by averaging regions to reduce resolution and smooth features.
- ITM is the Inception Token Mixer.
- FFN is the Feedforward Neural Network.
- LN denotes Layer Normalization.
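Assembling these symbols into the formulation of the Inception Transformer (Si et al., NeurIPS 2022), on which this branch is based, the Inception Token Mixer combines its branches approximately as:

```latex
Y_{h1} = \mathrm{FC}\!\left(\mathrm{MaxPool}(X_{h1})\right), \qquad
Y_{h2} = \mathrm{DwConv}\!\left(\mathrm{FC}(X_{h2})\right), \qquad
Y_{l} = \mathrm{Upsample}\!\left(\mathrm{MSA}\!\left(\mathrm{AvePooling}(X_{l})\right)\right)
```

with the mixer output obtained by fusing the concatenation of Yh1, Yh2, and Yl; each block then applies LN and an FFN with residual connections, as in that paper.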
- Long Short-Term Memory (LSTM)
- input_size: Number of input features per time step.
- hidden_size: Dimensionality of the hidden vector (ht) and the cell state (Ct).
- batch_first: If True, the LSTM input and output will have the shape (batch, sequence, features).
- ft is the activation vector of the forget gate.
- σ serves as the sigmoid function, which maps input values into the range [0, 1].
- Wf is the weight matrix for the forget gate.
- [ht−1, xt] contains the concatenation of the previous hidden state and the current input.
- bf is the bias of the forget gate.
- it is the activation vector of the input gate.
- σ serves as the sigmoid function, which regulates the amount of new information added.
- C̃t is the new candidate memory that can be added to the cell state.
- tanh is the hyperbolic tangent function, which maps input values into the range [−1, 1].
- Wi, bi are the weights and bias of the input gate, respectively.
- Wc, bc are the weights and bias for the candidate memory, respectively.
- Ct is the updated cell state.
- ⊙ denotes the element-wise product operation.
- ft⊙Ct−1 represents the information retained from the previous cell state.
- it⊙C̃t represents the new information added to the cell state.
- ot is the activation vector of the output gate.
- ht is the hidden state and output of the LSTM at the current time step.
- Wo is the weight matrix for the output gate.
- [ht−1, xt] contains the concatenation of the prior hidden state and the current input.
- bo is the bias of the output gate.
- σ serves as the sigmoid function, which regulates the amount of information emitted.
- tanh is the hyperbolic tangent function applied to the cell state.
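The gate equations spelled out above can be collected into a single NumPy time-step function. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the gate equations above.
    W and b are dicts of weight matrices / biases keyed by gate:
    'f' (forget), 'i' (input), 'c' (candidate), 'o' (output)."""
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])    # candidate memory
    c_t = f_t * c_prev + i_t * c_hat        # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                # hidden state / output
    return h_t, c_t
```

Because o_t lies in (0, 1) and tanh in (−1, 1), every component of h_t is strictly inside (−1, 1).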
- Optical Flow
3. Results and Discussion
- Stochastic Gradient Descent
- Optical Flow
- Testing with Proprietary Data
- Comparison of Machine Learning Models
- Novelty of the B-LiT/DualHybridFace
- The information related to the ethical
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Feature | CASME II | SAMM | SMIC |
|---|---|---|---|
| Number of Samples | 255 | 159 | 164 |
| Participants | 35 | 32 | 16 |
| Ethnicities | Chinese | Chinese and 12 more | Chinese |
| Facial Resolution | 280 × 340 | 400 × 400 | 640 × 480 |
| Categories | Happiness (25), Surprise (15), Anger (99), Sadness (20), Fear (1), Others (69) | Happiness (24), Surprise (13), Anger (20), Sadness (3), Fear (7), Others (84) | Positive (51), Negative (70), Surprise (43) |
| Model Version | Number of Epochs | Training Time | Training Loss | Accuracy | F1 Score |
|---|---|---|---|---|---|
| Without Optical Flow | 60 | 1 h 45 min 58 s | 0.2108 | 90.28% | 0.8453 |
| With Optical Flow | 100 | 23 min 55 s | 0.0986 | 90.00% | 0.8556 |
| Training Loss | Accuracy | Epochs | Training Time | CPU Usage | Memory Usage |
|---|---|---|---|---|---|
| 0.1917 | 0.9474 | 60 | 8 min 38 s | 13.3% | 72.1% |
| Model | FLOPs (approx.) | Parameters | Estimated FPS (GTX 1060, Batch) |
|---|---|---|---|
| ResNet-50 | 4.1 GFLOPs | 25.6 M | 35–50 FPS |
| Vision Transformer (ViT-B/16) | ~17.6 GFLOPs | 86 M | 8–12 FPS |
| Vision Transformer (ViT-L/16) | ~60 GFLOPs | 307 M | 2–4 FPS |
| F1-Score | Precision | Accuracy | Recall | Mean Square Error (MSE) | Mean Absolute Error (MAE) | |
|---|---|---|---|---|---|---|
| 0.8574 | 0.8127 | 0.8519 | 0.8127 | 0.2278 | 0.2175 | 0.8243 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Romero-Herrera, R.; Sánchez García, F.T.; Álvarez Peñaloza, N.A.; López Lin, B.Y.L.; Utrilla, E.J.J. Micro-Expression Recognition Using Transformers Neural Networks. Computers 2025, 14, 559. https://doi.org/10.3390/computers14120559