Article

Uncovering Several Degrees of Anxiety in Mexican Students Through Advanced Deep Learning Techniques

by Marco A. Moreno-Armendáriz 1,*, Arturo Lara-Cázares 1,*, Jared Castillo-González 2 and Halder V. Galdo-Navarro 2

1 Centro de Investigación en Computación, Instituto Politécnico Nacional, Ciudad de México 07738, México
2 Escuela Superior de Cómputo, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz s/n, Col. Lindavista, Ciudad de México 07738, México
* Authors to whom correspondence should be addressed.
Algorithms 2026, 19(3), 235; https://doi.org/10.3390/a19030235
Submission received: 7 January 2026 / Revised: 16 February 2026 / Accepted: 18 March 2026 / Published: 20 March 2026
(This article belongs to the Special Issue Modern Algorithms for Image Processing and Computer Vision)

Abstract

Emotion identification via computer vision has made continuous progress over the last few years. Although images have been the gold standard for the past two decades, video is increasingly common. Video is particularly suitable for the study of emotions, as it allows them to be treated as spatiotemporal phenomena. In particular, detecting anxiety among Mexican students is a key element for improving their learning in the classroom. In pursuit of this goal, we focused on the following challenges: first, the scarcity of specialized datasets for this task, which prompted us to develop an experimental protocol to generate a specific dataset; second, the need for a thorough study of the appropriate number of emotional-intensity levels; and third, the design of a suitable deep learning architecture. Our pivotal results include a new dataset labeled with three different sets of emotion levels and appropriate ConvNet architectures, complemented by a study of the various intensity levels. The optimal architecture achieved an F1-score of 0.7620 across five intensity levels and provides an adequate baseline for multiclass classification.

1. Introduction

Anxiety is a universal human experience characterized by feelings of worry, nervousness, or unease in response to uncertainty. However, anxiety is not inherently negative. At optimal levels, it is a normal, motivating, and protective response that can help individuals cope with adversity [1]. In the academic environment, high anxiety is associated with greater burnout [2], negatively affects students’ engagement with their studies [3], is strongly correlated with increased perceived academic procrastination [4], and contributes to lower psychological well-being [5]. Overall, anxiety among undergraduate students has become an increasingly prevalent mental health concern. For instance, a study conducted by the Autonomous University of Aguascalientes reported that approximately 86.3% of university students in México experience moderate anxiety related to their studies [6].
Traditionally, anxiety is measured using questionnaires that quantify the intensity of the emotion based on responses. The Generalized Anxiety Disorder-7 (GAD-7) is one of the main anxiety measurement instruments; it uses four levels of emotion (minimal, mild, moderate, and severe). Its main advantage is its validation across different countries and populations, and it can be administered in approximately two minutes [7]. Another popular anxiety measurement instrument is the Depression Anxiety Stress Scale (DASS), which measures physiological arousal, panic-like symptoms, autonomic activation, and subjective fear to assess anxiety, stress, and depression on five scales [8]. In general, questionnaire-based tests have several disadvantages: they measure emotional states retrospectively, over the preceding days, rather than estimating current emotional levels; they do not detect behavioral or physiological signs of anxiety; and users may under- or overestimate their symptoms. They also cannot detect situational anxiety during interactions such as exams or interviews [7,8]. Furthermore, a recent study indicated that different anxiety scales, such as the DASS and GAD-7, differ in their conceptualization of anxiety, even though they measure the same variable [9].
New approaches to measuring anxiety, supported by technologies such as deep learning, specifically in the area of computer vision (CV), address some of the limitations of questionnaire tests by taking into account information that traditional tests miss, such as facial expressions and body posture [10], providing continuous real-time analysis of different stimuli during social interactions [11], and considering behavioral and physiological signs such as facial movements, head position, blink rate, and eye movements [12]. To apply computer vision techniques, digital data are required, particularly video recordings containing image and audio sequences, which deep learning algorithms need in order to learn from different examples and capture the events recorded by cameras [13].
To process video data, various specialized deep learning architectures have been developed. For example, 3D CNNs (three-dimensional convolutional neural networks) use multidimensional kernels to perform convolution operations directly on image sequences [14]. Alternatively, CNN (two-dimensional convolutional neural network) + LSTM (long short-term memory) architectures apply 2D convolution to each frame to extract spatial features and then use LSTM layers to capture temporal information in the video [15]. More recently, transformer-based architectures have gained popularity by extracting spatiotemporal tokens from videos and encoding them through a series of transformer layers [16].
In this work, we focused on developing and generating our own dataset, conducting an in-depth study of the appropriate number of emotional-intensity levels (crucial for adequately representing the phenomenon under study), and designing a suitable deep learning architecture using diverse metrics and comparisons as references. The following sections present a review of relevant work on the anxiety classification problem, followed by the methodology, experiments, results, and conclusions.

2. Related Works

This section addresses several essential works on the classification of anxiety using deep learning algorithms for video processing. Some focus on multiclass classification to determine the anxiety scale from facial gestures, while others are limited to binary classification and do not account for the emotion's scale. In general, anxiety is not the main focus of the related works; authors typically mix it with other conditions such as depression or stress, so its particular gesticulations are not studied in isolation.
Gavrilescu et al. (2019) [17] proposed an artificial neural network model, the Face Depression Anxiety Stress Scale Neural Network (FDASSNN), that estimates levels of depression, stress, and anxiety using the DASS scale, with the following levels: normal, mild, moderate, severe, and extremely severe. In their work, they constructed a dataset comprising 128 Caucasian subjects (64 men and 64 women), aged 18 to 35, who were exposed to multimedia stimuli from the LIRIS-ACCEDE database and assessed using the Self-Analysis Questionnaire (SAQ) to classify each participant’s anxiety level. They obtained an accuracy of 87.2% for depression, 77.9% for anxiety, and 90.2% for stress.
Wang et al. (2021) [18] addressed the problem of classifying anxiety and depression using deep learning, combining convolutional neural networks (CNNs) and long short-term memory (LSTM) architectures to process video data in both the spatial and temporal dimensions. Their work focused not only on classifying individuals as anxious, depressed, or without disorders, but also on the binary classification of anxiety, with the latter being of greater interest to this research. They collected facial recordings of 303 participants from the Affiliated Hospital of Guangdong Medical University while they completed the Self-Rating Depression Scale (SDS) and the Anxiety Self-Rating Scale (SAS). Subsequently, clinicians used scales such as the Hamilton Depression Rating Scale (HDRS) and the SCL-90-R Symptom Checklist to label the data. At the end of data collection, 103 participants were classified as having anxiety, 94 as having depression, and 106 as having no disorders. The authors reported a precision of 0.7208 for the binary classification of anxiety.
Grimm et al. (2022) [19] identified shortcomings in the GAD-7 (Anxiety Screening Questionnaire) because, as a self-reported measure, patients may over- or underestimate the severity of their condition, leading to a mismatch in the risk assessment. Furthermore, the GAD-7 does not account for information about the user’s physical characteristics, such as facial expressions, sounds, or language itself. This study proposed the GAD-V, a video-based anxiety assessment tool that leverages pre-trained multimodal transformer networks to recognize patterns in text, audio, and facial gestures. Specifically, they employed the BERT, Hubert, and Affectiva models for text, audio, and facial gesture processing, respectively. The authors conducted the experimentation phase using a proprietary dataset collected via an online survey from Videra Health, comprising 955 participants (69% women, 31% men) and 4775 videos. At the end of the study, they presented their results, evaluating their model using the area under the curve (AUC) and the Pearson correlation, obtaining values of 0.909 and 0.799, respectively.
Li et al. (2023) [20] focused on detecting depression and anxiety using multiple deep learning models, highlighting ResNet-18 as the backbone for extracting features from image sequences and subsequently processing them with the authors’ proposed attention mechanisms. The Voluntary Facial Expression Mimicry (VFEM) dataset was used for the anxiety/depression detection task and binary classification among patients with these disorders. The authors identified an imbalance in their dataset, so they used the F1-score to appropriately compare it with SOTA, achieving a score of 0.82 in binary anxiety classification.
Wu et al. (2024) [21] developed Adv-FVMamba, a deep learning model explicitly designed to detect anxiety disorders in imbalanced video data using an adversarial entropy loss, and used it to perform binary classification of anxiety. They also created the Qingdao Anxiety Disorders of Adolescents Video Dataset (QADAVB), which comprises facial videos of 112 adolescents (55 boys and 57 girls) performing a human–computer interaction task and responding to the Mental Health Testing (MHT) scale. This dataset was subsequently used as the primary grouping criterion to assess participants with anxiety. Like Li et al. (2023) [20], they reported the F1-score (0.751) because of the dataset imbalance.
Xu et al. (2024) [22] proposed using machine learning algorithms, such as support vector machines, random forests, XGBoost, and ExtraTrees, to classify anxiety, depression, and stress; the DASS-21 test, comprising five measurement scales, was used as the reference. The authors selected the Facial Analysis for Clinical Emotional States (FACES) dataset for their experiment, which comprises high-quality facial videos of 11,427 participants, each labeled as depressed, anxious, or stressed using the DASS-21 scale. The authors reported a mean absolute error (MAE) of 5.249 for anxiety using random forest, whereas for binary classification, they achieved an F1-score of 0.66 with the ExtraTrees model. The most important conclusion of this article was the successful identification of two new emotional subgroups that combine depression, anxiety, and stress.
Lu et al. (2025) [23] integrated multiple deep learning modules to develop SFE-Former, a sequential facial expression recognition model specifically designed to detect depression and anxiety. To train the model, they used their own dataset called the Voluntary Facial Expression Mimicry Experiment (VFEM), which contains recordings of 139 participants in the control group and 184 individuals with a diagnosed disorder (84 with depression and 100 with anxiety). The data collection took place in a soundproof laboratory at a mental health center, where the participants observed and mimicked cartoon images depicting seven basic expressions (anger, disgust, fear, happiness, sadness, surprise, and neutrality). The authors emphasize that the conditions of most of these subjects were relatively severe and complex, as they were hospitalized. During the testing phase, they used the AVEC 2014 dataset to make an accurate comparison with SOTA. This dataset consists of 84 subjects aged 18–63 years and 300 videos. The authors’ results for anxiety detection were 0.889 for accuracy and 0.882 for the F1-score.
Taken together, these works reveal a consistent trend: anxiety recognition is usually approached through classification, whether binary or multiclass. Authors such as Grimm et al. (2022) [19] have pointed out the limitations of questionnaire-based tests, which fail to utilize physical data from individuals, and have proposed a new way to assess anxiety. Meanwhile, Xu et al. (2024) [22] identified two new emotional subgroups representing complex combinations of depression, anxiety, and stress, paving the way for deep learning algorithms to support emotion classification. Notably, we did not find any references questioning the scales used to measure anxiety derived from multimedia data (videos), a gap we explore in the following sections.

3. Methodology

In Figure 1, we present the proposed methodology. In the first part of the pipeline, dataset creation defines the experimental protocol and labeling process. Then, the anxiety dataset is constructed, and a preprocessing stage applies video modifications. Next, the model design stage includes training and testing of multiple models. Finally, model selection evaluates the candidates and selects the model with the highest F1-score. We explain each component in detail below.

3.1. Dataset Creation: Anxiety Dataset

3.1.1. Experimental Protocol and Videotaping

To ensure the validity and representativeness of the collected dataset, we established inclusion, exclusion, and elimination criteria for participants. These criteria enabled us to delimit the target population and ensure that the data were collected under homogeneous conditions, consistent with the study’s objective. The definition of these criteria follows the methodological approach proposed by Hernández Sampieri et al. [24] for sample selection in experimental studies. Specifically, the exclusion criteria eliminated candidates who, despite meeting the inclusion criteria, could introduce noise or unwanted variability into the recordings. For this study, the exclusion criteria were as follows:
  • Requiring glasses for close vision during the test. We considered that glasses could obscure relevant facial gestures or introduce image distortions.
  • An inability to tie hair away from the face. Loose hair or hair that wholly or partially covers the face may prevent the correct detection of facial gestures.
We made the recordings in a controlled environment, i.e., with constant lighting, a white background, and a fixed camera position, thereby preventing noise that could bias the model’s training. We presented each participant with different audiovisual stimuli: the first to elicit happiness and the second to elicit anxiety. Figure 2 shows the experimental pipeline. It begins with a 5 s black screen, followed by a 10 s depiction of a happy face; then shows a 50 s comedic video with a 5 s interlude; then presents an image of anxiety for 25 s; and ends with a 35 s anxiety video. All recordings were made with 1080p cameras at 30 fps, using a white background, consistent lighting, a standardized distance between the camera and the participant, and an environment free of visual distractions.
Of all the recorded videos, 58 participants met the previously mentioned criteria. The dataset consisted of 47 male and 11 female participants, with a mean age of 21.58 ± 1.83 years. We cropped each participant’s video every 50 frames, yielding 3022 distinct clips, and then moved each clip to the labeling process. In Figure 3, we present a representative video of the anxiety dataset. The subjects participated in accordance with the Declaration of Helsinki [25] and provided written informed consent, ensuring that they understood the study’s purpose and the use of their data.
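As a rough sketch of this segmentation step, the following OpenCV-based function slices a recording into consecutive 50-frame clips; the reader, function names, and the handling of trailing frames are illustrative assumptions, not the authors' exact tooling.

```python
# Hypothetical sketch of the 50-frame clip segmentation described above.
import cv2
import numpy as np

def split_into_clips(video_path: str, clip_len: int = 50) -> list[np.ndarray]:
    """Read a recording and slice it into consecutive 50-frame clips."""
    cap = cv2.VideoCapture(video_path)
    frames, clips = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        if len(frames) == clip_len:
            clips.append(np.stack(frames))  # shape: (50, H, W, 3)
            frames = []
    cap.release()
    return clips  # trailing frames shorter than clip_len are discarded
```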

3.1.2. Labeling

Scales for measuring the intensity of an emotion are not rigid; authors such as Hamilton [26] have proposed using five levels of anxiety, ranging from 0 (no anxiety) to 4 (very severe anxiety). Meanwhile, Villarreal-Zegarra et al. [27] concluded that effective management involves administering the GAD-7, which classifies anxiety into four levels (minimal, mild, moderate, and severe). The discrepancies among the scales proposed by different authors raise a new question: how many and which categories are best suited to measuring an emotion? In our results section, we seek an answer.

3.2. Preprocessing

In Figure 4, the process begins with face detection and frame-level positioning using the MediaPipe tool [28]. The frame size is then reduced from 1920 × 1080 to 240 × 240, lowering the computational cost during training. Next, each pixel value is normalized by dividing it by 255 (since the frames are 8-bit). After processing all videos, we split them into an 80% training set (2417 videos) and a 20% test set (605 videos). Finally, we augmented the training data by applying a horizontal flip, resulting in a total of 4834 videos. We used only horizontal flipping during preprocessing to keep data augmentation minimal and obtain a model that generalizes accurately to the problem.
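As a concrete illustration of this pipeline, the sketch below crops the face from each frame with the legacy MediaPipe FaceDetection solution, resizes it to 240 × 240, scales it to [0, 1], and flips a clip horizontally; the function names, detection threshold, and model selection are our assumptions, not the authors' exact implementation.

```python
# A minimal per-frame preprocessing sketch, assuming the legacy MediaPipe
# FaceDetection solution; the 240x240 target and /255 scaling follow the
# text, everything else is illustrative.
import cv2
import numpy as np
import mediapipe as mp

face_detector = mp.solutions.face_detection.FaceDetection(
    model_selection=1, min_detection_confidence=0.5)

def preprocess_frame(frame_bgr: np.ndarray):
    """Detect the face, crop it, resize to 240x240, and scale to [0, 1]."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    result = face_detector.process(rgb)
    if not result.detections:
        return None  # no face found in this frame
    box = result.detections[0].location_data.relative_bounding_box
    h, w = frame_bgr.shape[:2]
    x, y = int(box.xmin * w), int(box.ymin * h)
    bw, bh = int(box.width * w), int(box.height * h)
    face = frame_bgr[max(y, 0):y + bh, max(x, 0):x + bw]
    face = cv2.resize(face, (240, 240))
    return face.astype(np.float32) / 255.0  # 8-bit frames -> [0, 1]

def augment_clip(clip: np.ndarray) -> np.ndarray:
    """Horizontal flip, the only augmentation used in this work."""
    return clip[:, :, ::-1, :]  # flip the width axis of (F, H, W, C)
```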

3.3. Model Training: AnxietyNet

With the dataset ready, we propose three distinct architectures for clip classification: CNN3D, CNN3D with attention, and a time-distributed CNN. We selected these architectures because the data are sequential and require spatiotemporal processing, given the intrinsic inter-frame correlation in human emotion [29]. In other words, analyzing individual frames (e.g., using only a 2D CNN) does not provide the necessary information for this problem. Furthermore, video processing enables us to capture transitions between emotion scales that a single image cannot represent [29].
Figure 5 shows the architectures proposed in this work, starting with the CNN3D at the top. Its primary focus is on finding relationships with 3D kernels to extract features from videos. This network directly captures the motion information encoded in the sequence of frames, a capability that 2D CNNs cannot achieve on their own [30]. We built this model by stacking 3D convolutional layers that extract features, alternating with pooling layers that reduce the spatial resolution to obtain translation-invariant features with minimal distortion. Within the same framework, we present a variant that applies an attention mechanism before the classifier's output layer, yielding the CNN3D + attention architecture. This architecture integrates spatial attention during 3D-CNN training to focus on regions of interest, simulating human vision by prioritizing relevant areas and ignoring unimportant backgrounds. This mechanism enables more effective learning of spatiotemporal features without prior knowledge. Once the network processes the video clips, temporal attention determines which clips contain more significant movements [31].
The lower portion of Figure 5 represents the time-distributed CNN (TD-CNN) architecture, whose primary motivation is to apply the convolution operation with 2D kernels to each frame of the video while maintaining its temporal relationship; in this way, this ConvNet extracts spatial and temporal features [32]. TD-CNN addresses the limitations of standard CNNs by incorporating time-distributed layers. These layers enable each frame to be processed independently, facilitating learning of the spatiotemporal features in sequential data, such as video clips, without the high computational costs of other architectures, since it uses parallel training by distributing convolution, pooling, activation, and dense-layer operations along the time axis [33]. We trained all the architectures with the hyperparameters shown in Table 1.
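For concreteness, a minimal Keras training sketch matching Table 1 might look as follows; the framework choice and all variable names are our assumptions, since the paper does not state its implementation stack.

```python
# Training configuration from Table 1: 35 epochs, batch size 8,
# Adam at 1e-3, categorical cross-entropy. `model`, `x_train`, and
# `y_train` are placeholders.
import tensorflow as tf

def train(model: tf.keras.Model, x_train, y_train):
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    # x_train: (N, 50, 240, 240, 3) clips; y_train: one-hot class labels
    return model.fit(x_train, y_train, epochs=35, batch_size=8)
```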
As an insightful complement to the architectures presented in Figure 5, we introduce the mathematical definitions of each model. We defined our CNN3D model as shown in Equation (1). Similarly, we defined our CNN3D + Att model as shown in Equation (2).
$$Y_1 = F_1(X_0) \tag{1}$$

$$Y_2 = F_2(X_0) \tag{2}$$
where $X_0 \in \mathbb{R}^{F \times H \times W \times C}$ denotes the video input with dimensions $F$ (frames), $H$ (height), $W$ (width), and $C$ (channels). $Y_1$ represents the anxiety classification obtained using Equation (3), which defines the CNN3D architecture shown in Figure 5a. Similarly, $Y_2$ represents the anxiety classification obtained using Equation (4), which defines the CNN3D + Att architecture shown in Figure 5b.
$$\begin{aligned}
&X_1 = \mathrm{ReLU}(K_1 \circledast X_0), \quad X_2 = \mathrm{ReLU}(K_2 \circledast X_1), \quad X_3 = \mathrm{BN}(X_2), \quad X_4 = \mathrm{MP}(X_3),\\
&X_5 = \mathrm{ReLU}(K_3 \circledast X_4), \quad X_6 = \mathrm{BN}(X_5), \quad X_7 = \mathrm{MP}(X_6), \quad X_8 = \mathrm{ReLU}(K_4 \circledast X_7),\\
&X_9 = \mathrm{BN}(X_8), \quad X_{10} = \mathrm{MP}(X_9), \quad X_{11} = \mathrm{FL}(X_{10}), \quad X_{12} = \mathrm{ReLU}(w_1^T X_{11} + b_1),\\
&X_{13} = \mathrm{ReLU}(w_2^T X_{12} + b_2), \quad X_{14} = \mathrm{ReLU}(w_3^T X_{13} + b_3), \quad X_{15} = \mathrm{ReLU}(w_4^T X_{14} + b_4),\\
&X_{16} = \mathrm{Softmax}(w_5^T X_{15} + b_5)
\end{aligned} \tag{3}$$
$$\begin{aligned}
&X_1 = \mathrm{ReLU}(K_1 \circledast X_0), \quad X_2 = \mathrm{ReLU}(K_2 \circledast X_1), \quad X_3 = \mathrm{BN}(X_2), \quad X_4 = \mathrm{MP}(X_3),\\
&X_5 = \mathrm{ReLU}(K_3 \circledast X_4), \quad X_6 = \mathrm{BN}(X_5), \quad X_7 = \mathrm{MP}(X_6), \quad X_8 = \mathrm{ReLU}(K_4 \circledast X_7),\\
&X_9 = \mathrm{BN}(X_8), \quad X_{10} = \mathrm{MP}(X_9), \quad X_{11} = \mathrm{FL}(X_{10}), \quad X_{12} = \mathrm{ReLU}(w_1^T X_{11} + b_1),\\
&X_{13} = \mathrm{ReLU}(w_2^T X_{12} + b_2), \quad X_{14} = \mathrm{ReLU}(w_3^T X_{13} + b_3), \quad X_{15} = \mathrm{ReLU}(w_4^T X_{14} + b_4),\\
&X_{16} = \mathrm{Att}(X_{15}), \quad X_{17} = \mathrm{Softmax}(w_5^T X_{16} + b_5)
\end{aligned} \tag{4}$$
where $K_i \in \mathbb{R}^{N_k \times F \times H \times W \times C}$ is the convolution kernel ($N_k$ denotes the number of convolution kernels), $\circledast$ is the convolution operation, and $\mathrm{ReLU}$ is defined by Equation (5). $\mathrm{BN}$ is defined by Equation (6), where $\gamma$ is a learned scaling factor initialized as ones, $\beta$ is a learned offset factor initialized as zeros, and $\hat{X_i}$ is defined in Equation (7). $\mathrm{MP}$ is defined by Equation (8), where $X$ represents the values of the input image in the pooling region. $\mathrm{FL}$ is the flatten layer, $w_i$ and $b_i$ are the weights and bias of the fully connected layers, and the function $\mathrm{Softmax}$ is defined in Equation (11). $\mathrm{Att}$ represents the attention layer. It is important to note that we trained two models: one with the attention layer activated and another without.
$$\mathrm{ReLU}(X) = \max(0, X) \tag{5}$$

$$\mathrm{BN}(X_i) = \gamma \hat{X_i} + \beta \tag{6}$$

$$\hat{X_i} = \frac{X_i - \mathrm{E}[X_i]}{\sqrt{\mathrm{Var}[X_i]}} \tag{7}$$

$$\mathrm{MP}(X) = \max\left([x_i]_{i=0}^{N}\right) \tag{8}$$
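To make Equations (3)–(8) concrete, a Keras sketch of the CNN3D and CNN3D + Att forward passes could look as follows; the filter counts and dense widths are our assumptions (the paper does not list them here), and the single-vector Attention placeholder only marks where $\mathrm{Att}(X_{15})$ sits in Equation (4), not the spatiotemporal attention mechanism of [31].

```python
# A sketch of Equations (3) and (4), assuming 50-frame clips of
# 240 x 240 RGB faces and five output classes.
import tensorflow as tf
from tensorflow.keras import layers

def build_cnn3d(num_classes: int = 5, use_attention: bool = False) -> tf.keras.Model:
    inp = layers.Input(shape=(50, 240, 240, 3))        # X0
    x = layers.Conv3D(16, 3, activation="relu")(inp)   # X1 = ReLU(K1 * X0)
    x = layers.Conv3D(16, 3, activation="relu")(x)     # X2
    x = layers.BatchNormalization()(x)                 # X3 = BN(X2)
    x = layers.MaxPooling3D(2)(x)                      # X4 = MP(X3)
    x = layers.Conv3D(32, 3, activation="relu")(x)     # X5
    x = layers.BatchNormalization()(x)                 # X6
    x = layers.MaxPooling3D(2)(x)                      # X7
    x = layers.Conv3D(64, 3, activation="relu")(x)     # X8
    x = layers.BatchNormalization()(x)                 # X9
    x = layers.MaxPooling3D(2)(x)                      # X10
    x = layers.Flatten()(x)                            # X11 = FL(X10)
    for units in (256, 128, 64, 32):                   # X12 .. X15
        x = layers.Dense(units, activation="relu")(x)
    if use_attention:                                  # X16 = Att(X15), placeholder
        q = layers.Reshape((1, 32))(x)
        x = layers.Flatten()(layers.Attention()([q, q]))
    out = layers.Dense(num_classes, activation="softmax")(x)  # Softmax head
    return tf.keras.Model(inp, out)
```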
On the other hand, the TD-CNN model is presented in Equation (9), where $Y_3$ is the classification obtained using Equation (10), which defines the TD-CNN architecture shown in Figure 5c. Here, $\mathrm{TD}$ is the time-distributed layer, $K_i \in \mathbb{R}^{N_k \times F \times H \times W \times C}$ is the convolution kernel ($N_k$ denotes the number of convolution kernels), $\circledast$ is the convolution operation, $\mathrm{ReLU}$ is defined by Equation (5), $\mathrm{BN}$ by Equation (6), and $\mathrm{MP}$ by Equation (8). $\mathrm{FL}$ is the flatten layer, $w_i$ and $b_i$ are the weights and bias of the fully connected layers, the function $\mathrm{Softmax}$ is defined by Equation (11), and $\mathrm{DO}$ represents the dropout layer.
$$Y_3 = F_3(X_0) \tag{9}$$
$$\begin{aligned}
&X_1 = \mathrm{TD}(\mathrm{ReLU}(K_1 \circledast X_0)), \quad X_2 = \mathrm{BN}(X_1), \quad X_3 = \mathrm{TD}(\mathrm{MP}(X_2)),\\
&X_4 = \mathrm{TD}(\mathrm{ReLU}(K_2 \circledast X_3)), \quad X_5 = \mathrm{BN}(X_4), \quad X_6 = \mathrm{TD}(\mathrm{MP}(X_5)),\\
&X_7 = \mathrm{TD}(\mathrm{ReLU}(K_3 \circledast X_6)), \quad X_8 = \mathrm{BN}(X_7), \quad X_9 = \mathrm{TD}(\mathrm{MP}(X_8)),\\
&X_{10} = \mathrm{FL}(X_9), \quad X_{11} = \mathrm{ReLU}(w_1^T X_{10} + b_1), \quad X_{12} = \mathrm{BN}(X_{11}),\\
&X_{13} = \mathrm{ReLU}(w_2^T X_{12} + b_2), \quad X_{14} = \mathrm{BN}(X_{13}), \quad X_{15} = \mathrm{DO}(X_{14}),\\
&X_{16} = \mathrm{ReLU}(w_3^T X_{15} + b_3), \quad X_{17} = \mathrm{BN}(X_{16}), \quad X_{18} = \mathrm{DO}(X_{17}),\\
&X_{19} = \mathrm{Softmax}(w_4^T X_{18} + b_4)
\end{aligned} \tag{10}$$
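Analogously, a Keras sketch of Equation (10) wraps each 2D operation in a TimeDistributed layer so that it is applied frame by frame; again, the layer widths and dropout rates are illustrative assumptions.

```python
# A sketch of Equation (10); TimeDistributed applies each 2D operation
# independently to every frame along the 50-frame axis.
import tensorflow as tf
from tensorflow.keras import layers

def build_td_cnn(num_classes: int = 5) -> tf.keras.Model:
    inp = layers.Input(shape=(50, 240, 240, 3))                             # X0
    x = layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu"))(inp)  # X1
    x = layers.BatchNormalization()(x)                                      # X2
    x = layers.TimeDistributed(layers.MaxPooling2D(2))(x)                   # X3
    x = layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"))(x)  # X4
    x = layers.BatchNormalization()(x)                                      # X5
    x = layers.TimeDistributed(layers.MaxPooling2D(2))(x)                   # X6
    x = layers.TimeDistributed(layers.Conv2D(64, 3, activation="relu"))(x)  # X7
    x = layers.BatchNormalization()(x)                                      # X8
    x = layers.TimeDistributed(layers.MaxPooling2D(2))(x)                   # X9
    x = layers.Flatten()(x)                                                 # X10 = FL(X9)
    x = layers.Dense(256, activation="relu")(x)                             # X11
    x = layers.BatchNormalization()(x)                                      # X12
    x = layers.Dense(128, activation="relu")(x)                             # X13
    x = layers.BatchNormalization()(x)                                      # X14
    x = layers.Dropout(0.5)(x)                                              # X15 = DO(X14)
    x = layers.Dense(64, activation="relu")(x)                              # X16
    x = layers.BatchNormalization()(x)                                      # X17
    x = layers.Dropout(0.5)(x)                                              # X18
    out = layers.Dense(num_classes, activation="softmax")(x)                # X19
    return tf.keras.Model(inp, out)
```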
We treated the problem as a classification task. We applied the Softmax function to the output layer of each architecture to obtain the class probabilities for each prediction. The Softmax function is described in Equation (11), with the following representations:
  • $z_i$ represents the input value (or logit) for the $i$-th class.
  • $\exp(z_i)$ is the exponential of the input value for the $i$-th class.
  • $\sum_{j=1}^{K} \exp(z_j)$ represents the sum of the exponentials of all $K$ input values (logits).
  • $\mathrm{Softmax}(z_i)$ is the probability of the $i$-th class, with all probabilities summing to 1.
$$\mathrm{Softmax}(z_i) = \frac{\exp(z_i)}{\sum_{j=1}^{K} \exp(z_j)} \tag{11}$$
For Softmax classification, we used categorical cross-entropy as the loss function during training. The loss function is described by Equation (12), with the following representations:
  • $\mathcal{L}$ is the average categorical cross-entropy loss over the $N$ samples.
  • $N$ is the number of samples in the batch, and $M$ is the number of classes.
  • $y_{ic}$ is a binary indicator (0 or 1) of whether class $c$ is the correct classification for sample $i$.
  • $p_{ic}$ is the predicted probability that sample $i$ belongs to class $c$.
$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \log(p_{ic}) \tag{12}$$
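A small worked example of Equations (11) and (12) in NumPy, using made-up logits for two samples and five classes:

```python
# Softmax (Equation (11)) and categorical cross-entropy (Equation (12)).
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1, -1.0, 0.5],
                   [0.3, 0.2, 2.5, 0.0, -0.5]])   # made-up network outputs
y_true = np.array([[1, 0, 0, 0, 0],
                   [0, 0, 1, 0, 0]])              # one-hot ground truth

p = softmax(logits)                               # class probabilities, rows sum to 1
loss = -np.mean(np.sum(y_true * np.log(p), axis=1))  # Equation (12)
print(p.round(3), loss)
```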

4. Results

4.1. Labeling Assignments

To answer the question posed in Section 3.1.2, this study examined the intensity levels of the emotions of happiness and anxiety. To achieve this, we used a Likert scale [34] and recruited three labelers. We conducted the following labeling rounds:
  • Initial round: This round comprised 11 labeling levels, ranging from −5 (intense happiness) to +5 (intense anxiety); the labelers' consensus determined the final label for each clip.
    Result: We observed a high concentration of data in a few categories and very little in the rest, creating a significant dataset imbalance that could bias the model toward the majority categories and reduce its ability to detect minority emotions. We therefore discarded these labels.
  • Second round: The number of levels was reduced to seven, as shown in Figure 6.
  • Third round: The number of levels was reduced to six. In this version, we merged class 5 (medium anxiety) with class 6 (high anxiety) and renamed the merged class as class 5 (severe anxiety). Therefore, we used the first four labels in purple and the label in blue in Figure 7.
  • Final round: We reduced the labels to five classes. We kept the previous four purple labels (high happiness, medium happiness, low happiness, and neutral) and combined purple class 4 (low anxiety) with the new blue class 5 (severe anxiety), yielding the new green class 4 (anxiety), as seen in Figure 7.
Following the analysis of the different numbers of intensity labels in Figure 7, we performed t-SNE to visualize the data structure in a low-dimensional space. First, Figure 8a shows the dataset with seven classes, each represented by a specific color. Groupings can be observed, primarily formed by the classes associated with happiness, such as classes 0 and 1, which are distributed across a broad region on the left side of the figure. Meanwhile, the classes related to anxiety are more concentrated in the lower right, suggesting a grouping tendency by emotion type.
Next, we used a t-SNE visualization to explore the representation with six classes, as shown in Figure 8b. In this representation, the overlap between classes representing different emotions decreased. The classes linked to happiness are concentrated and “isolated” on the left side. In contrast, the classes associated with anxiety (contrary to Figure 8a) occupy a larger area on the right, with several notable divisions that capture the intensity of this emotion. Class 2 serves as a point of comparison between the two emotions.
Finally, we used t-SNE to visualize the representation of five classes. As shown in Figure 9, this revealed notable divisions between the classes. The classes associated with happiness are grouped on the left, while those associated with anxiety are on the right. In general, the classes appear more evenly distributed, though some overlap remains.
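For reproducibility, a t-SNE projection of this kind can be obtained with scikit-learn as sketched below; the feature vectors and labels are placeholders, since the text does not state which clip representation was embedded.

```python
# A sketch of the t-SNE projections in Figures 8 and 9, assuming each
# clip has already been reduced to a feature vector.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

features = np.random.rand(3022, 512)    # stand-in for per-clip descriptors
labels = np.random.randint(0, 5, 3022)  # stand-in for the 5-class labels

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=4)
plt.colorbar(label="class")
plt.show()
```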
The natural frequency of affect is apparent in human interactions, in which neutral or low-intensity emotional states are more frequent and longer-lasting. Russell’s model supports this phenomenon [35], defining a neutral feeling as the origin, or “level of adaptation,” of the affective space. Furthermore, he observed that labelers rely on their own cognitive structures to interpret and categorize emotions. For this reason, when faced with subtle expressions (of low intensity or close to the neutral origin) or ambiguous ones (located in the overlap zone between two categories), labelers are much more likely to converge on the neutral category because it is the easiest option to choose.
The videos for the five labels in the dataset are quantified in Figure 10a. This figure reveals a concentration of data in class 3, with 838 clips, while the extreme classes, class 0 and class 4, were the least represented, with 130 and 145 samples, respectively; this indicates a significant class imbalance in emotion recognition. This distribution was due to two factors: the natural frequency of affect and convergence in annotation. Figure 10b shows the final number of training videos after data augmentation.
From the preceding analyses, it is clear that the problem’s complexity lies in distinguishing among levels of emotional intensity, as they exhibit significant overlap. Therefore, it is necessary to design deep learning architectures capable of distinguishing between subtle characteristics. We present our design results below.

4.2. Testing Models

We trained the three ConvNets described in Section 3.3 using the three labeling combinations described in Section 4.1. From this point on, we refer to the labels as classes. Figure 11 shows the behavior of the accuracy metric during the training of each ConvNet for the three types of labeling. The magenta lines correspond to seven intensity levels; this configuration exhibited the most oscillations and the worst performance in all cases. On the other hand, the green (six classes) and blue (five classes) lines showed similar behavior.
Figure 12 shows the confusion matrices for five intensity levels. We observed that the lowest-performing classes were the second (low happiness) and the fourth (anxiety). The latter was primarily affected by the limited data available for this class; moreover, facial expressions of anxiety are generally of low to moderate intensity rather than severe [35]. Using these matrices, the performance metrics [36] shown in Table 2 were obtained.
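To connect Figure 12 with Table 2, the sketch below derives macro-averaged metrics from a confusion matrix; under this textbook definition, recall and sensitivity coincide, which matches the identical Recall and Sensitivity columns in Table 2. This is a generic computation, not the authors' evaluation code.

```python
# Macro-averaged metrics from a multiclass confusion matrix `cm`,
# where rows are true classes and columns are predictions.
import numpy as np

def metrics_from_confusion(cm: np.ndarray) -> dict:
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # identical to per-class sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
        "specificity": specificity.mean(),
    }
```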
Using the radar chart in Figure 13, we visually compared the performance of the models across the represented metrics. Fewer classes resulted in higher scores: five classes performed best (purple area) across all models, and seven classes performed worst (green area). These results demonstrate that the more emotional-intensity scales there are, the more difficult it is to distinguish between them; therefore, it is preferable to group them into five classes rather than the original seven proposed.
Figure 14a shows an example of correct anxiety detection; Figure 14b illustrates a failure case in which the participant opens their mouth, expressing a surprise-related emotion; and Figure 14c shows another failure of anxiety detection, in which the participant’s mouth and cheeks resemble the low-happiness example shown in Figure 6. We suggest that these failures are a consequence of the very similar facial movements associated with low happiness and anxiety. We obtained these examples using the best-performing model.
As an alternative, we also addressed this problem using a regression formulation. We selected the best-performing architecture from the classification task and modified the output layer to a single neuron with a ReLU activation function to generate a continuous output. After training the model with the same parameters and evaluating it under both the regression and classification formulations (with the regression output truncated according to Equation (13)), we observed better performance with the classification formulation. Table 3 presents the model results and compares the two problem formulations.
$$c = \max(0, \min(\mathrm{output}, u)) \tag{13}$$
where u is the number of anxiety levels and c is the classified anxiety level.
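Equation (13) translates directly to code; mapping the clamped value to a discrete class (e.g., by rounding) is our assumption, as the text specifies only the clamping:

```python
# Equation (13): clamp the regressor's continuous output to [0, u].
# Rounding to the nearest class index is our assumption.
def truncate_output(output: float, u: int) -> int:
    c = max(0.0, min(output, float(u)))
    return round(c)
```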

4.3. State-of-the-Art Comparison

As a complement to the above results, Table 4 summarizes the most essential works in the state of the art and provides a detailed analysis of the key aspects relevant to our study. Note that no publicly available anxiety dataset is currently available in the literature. Gavrilescu et al. (2019) [17] presented a five-level classification but reported only an accuracy of 77.9%. Grimm et al. (2022) [19] reported a four-level classification; however, they reported only an AUC of 0.909. The other authors presented F1-score values for binary classification. Our best-performing model achieved an F1-score of 0.7620 for five intensity levels, with a Spearman correlation of 0.8827 and a confidence interval from 0.84 to 0.90, estimated using the bias-corrected and accelerated (BCa) bootstrap method with the support of the scikits.bootstrap statistical package.
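As an illustration of this interval estimation, a BCa bootstrap with the scikits.bootstrap package named above might look as follows; the per-sample score array is a placeholder, since the exact resampled statistic is not specified in the text.

```python
# A sketch of a 95% BCa bootstrap confidence interval.
import numpy as np
import scikits.bootstrap as boot

per_sample_scores = np.random.rand(605)  # stand-in for test-set scores
low, high = boot.ci(per_sample_scores, statfunction=np.mean,
                    alpha=0.05, n_samples=10_000, method="bca")
print(f"95% BCa CI: [{low:.2f}, {high:.2f}]")
```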

5. Discussion

During dataset creation, we standardized the recording conditions for each participant to minimize irrelevant stimuli, such as the background and clothing. When it came to labeling, the most significant decision was to switch from an 11-level to a 7-level scale, as the initial scale was prone to labeler subjectivity and exhibited a notable class imbalance, which could negatively impact model training. Therefore, regrouping into seven classes was not only a proposed solution to class imbalance but also a means of defining emotional categories more clearly. Using the t-SNE visualization as a reference, considerable overlap existed even with five classes; this suggests that grouping is more effective, though not perfect, with fewer classes.
The t-SNE visualization showed effective separation of elements into classes 0, 1, and 2, while the remaining classes formed a linearly non-separable cluster. We reinforced this observation with research on expression dissociation, which shows that facial expressions of happiness are recognized more quickly and accurately than those of other emotions. This finding is an essential factor motivating the grouping of the neutral and anxiety classes, given their greater recognition complexity.
Given the small sample size, we augmented the data by applying a horizontal flip to the videos. The main objective of this technique was to provide more relevant information to the deep learning models. Since we collected the dataset in a static environment with constant lighting, further modifications such as noise addition and other video transformations were omitted. However, we cropped the faces from each frame to encourage the models to focus exclusively on facial features, thereby enabling accurate generalization.
The use of various sets of anxiety levels shows that there are significant similarities between adjacent levels. Based on the results in Figure 13, we can conclude that the neutral and low-anxiety categories share features that are difficult to differentiate, as do those of medium and high anxiety. This suggests that rigid categories for an emotion are not always better; in this study, the facial features of anxiety are so similar that a broader grouping is necessary to differentiate them effectively.

6. Conclusions

Studying anxiety as a multiclass problem is challenging. Our key finding is that five classes provide the best fit for inferring anxiety emotion. To establish this result, we first labeled our dataset with 11 classes (from −5 to 5, with neutrality at level 0), which made the problem overly complex. We then reduced the number of classes to seven as a starting point for the experiments.
We conducted several experiments with three deep learning models and analyzed the quality of their learning using a range of metrics. Due to data imbalance, we used the F1-score as the primary metric. We noted that seven classes were still too many, so we proposed reducing them to six or five classes. Our ablation study comprised nine deep neural networks, of which the best model was CNN3D-ATT, with an F1-score of 0.7620. This result provides an adequate baseline for the direct study of the anxiety emotion, as noted in Table 4. We used deep learning models specifically designed to identify patterns in videos, since state-of-the-art research suggests that emotions are dynamic phenomena and that treating them as static using a single image may lead to misinterpretation of a gesture or even of a transition in its natural movement.
In general, classifying anxiety using computer vision is a complex task due to the limited availability of real-life emotion datasets, the processing of large volumes of data, and the need for high-performance computing to train deep learning models. However, the main problems encountered are variability in participants’ gestures and in the measurement of anxiety. As future work, we plan to incorporate additional verification of the labels assigned to each video by a psychologist, which could improve AnxietyNet’s ability to accurately separate classes. Another aspect to include is multimodal models; in this case, students could be recorded giving presentations, enabling these modalities to be considered alongside facial expressions, body posture, and voice. At the same time, including data from participants who wear glasses or have facial hair is important for improving anxiety detection in real-world environments. We can address this challenge using cropping techniques or multimodal models, which could also improve AnxietyNet’s performance metrics.
Furthermore, incorporating new approaches that enable the differentiation of anxiety levels across genders and spontaneous anxiety manifestations represents an important direction for further exploration. Finally, expanding the anxiety dataset to include other academic activities and incorporating participants with diverse demographic characteristics could improve the model’s class separation. This is particularly important in the multiclass case, where classes exhibit significant overlap.

Author Contributions

Conceptualization, M.A.M.-A. and A.L.-C.; methodology, M.A.M.-A. and A.L.-C.; software, A.L.-C., J.C.-G. and H.V.G.-N.; validation, A.L.-C., J.C.-G. and H.V.G.-N.; formal analysis, M.A.M.-A.; investigation, A.L.-C. and J.C.-G.; resources, J.C.-G. and H.V.G.-N.; data curation, J.C.-G. and H.V.G.-N.; writing—original draft preparation, M.A.M.-A.; writing—review and editing, A.L.-C. and H.V.G.-N.; visualization, A.L.-C.; supervision, M.A.M.-A.; project administration, M.A.M.-A.; funding acquisition, M.A.M.-A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Instituto Politécnico Nacional (IPN) through the Secretaría de Investigación y Posgrado (IPN-SIP), the Comisión de Operación y Fomento de Actividades Académicas (IPN-COFAA), and the Programa de Estímulos al Desempeño de los Investigadores (IPN-EDI), as well as by the Secretaría de Ciencia, Humanidades, Tecnología e Innovación through the Sistema Nacional de Investigadores (SECIHTI-SNII).

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the Regulations of the General Health Law on Health Research in México (https://clinregs.niaid.nih.gov/sites/default/files/documents/mexico/HlthResRegs-GoogleTranslation.pdf accessed on 5 January 2026).

Informed Consent Statement

We obtained informed consent from all the subjects involved in this study.

Data Availability Statement

The data presented in this study are available from the corresponding author upon request, as the data obtained are currently copyrighted by the Mexican Institute of Industrial Property of the Mexican Government.

Acknowledgments

We are grateful to Jesús Adiel García Velázquez and Gael Hernández Solís for their assistance in creating the dataset.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cooray, S.E.; Bakala, A. Anxiety disorders in people with learning disabilities. Adv. Psychiatr. Treat. 2005, 11, 355–361. [Google Scholar] [CrossRef]
  2. Liu, W.; Zhang, R.; Wang, H.; Rule, A.; Wang, M.; Abbey, C.; Singh, M.K.; Rozelle, S.; She, X.; Tong, L. Association between anxiety, depression symptoms, and academic burnout among Chinese students: The mediating role of resilience and self-efficacy. BMC Psychol. 2024, 12, 335. [Google Scholar] [CrossRef] [PubMed]
  3. Lizarte Simón, E.J.; Gijón Puerta, J.; Galván Malagón, M.C.; Khaled Gijón, M. Influence of self-efficacy, anxiety and psychological well-being on academic engagement during university education. Educ. Sci. 2024, 14, 1367. [Google Scholar] [CrossRef]
  4. Ghattas, A.H.S.; El-Ashry, A.M. Perceived academic anxiety and procrastination among emergency nursing students: The mediating role of cognitive emotion regulation. BMC Nurs. 2024, 23, 670. [Google Scholar] [CrossRef] [PubMed]
  5. Rastogi, S.; Gupta, S.; Deepak, D.; Mishra, B.N.; Gore, R.; Singh, V. A Systematic Literature Review on Anxiety Among Undergraduate Students: Causes and Coping Strategies. Ann. Neurosci. 2025, 09727531251366078. [Google Scholar] [CrossRef]
  6. Silva-Ramos, M.F.; López-Cocotle, J.J.; Meza-Zamora, M.E.C. Estrés académico en estudiantes universitarios. Investig. Cienc. 2020, 28, 75–83. [Google Scholar] [CrossRef]
  7. Spitzer, R.L.; Kroenke, K.; Williams, J.B.W.; Löwe, B. A brief measure for assessing generalized anxiety disorder: The GAD-7. Arch. Intern. Med. 2006, 166, 1092–1097. [Google Scholar] [CrossRef]
  8. Lovibond, P.F.; Lovibond, S.H. The structure of negative emotional states: Comparison of the Depression Anxiety Stress Scales (DASS) with the Beck Depression and Anxiety Inventories. Behav. Res. Ther. 1995, 33, 335–343. [Google Scholar] [CrossRef]
  9. Ahmed, I.; Hazell, C.M.; Edwards, B.; Glazebrook, C.; Davies, E.B. A systematic review and meta-analysis of studies exploring prevalence of non-specific anxiety in undergraduate university students. BMC Psychiatry 2023, 23, 240. [Google Scholar] [CrossRef]
  10. Zhang, H.; Feng, L.; Li, N.; Jin, Z.; Cao, L. Video-based stress detection through deep learning. Sensors 2020, 20, 5552. [Google Scholar] [CrossRef]
  11. Ding, D.; Xu, W.; Liu, X.; Zhu, T. Facial video based stress detection for enhancing ecological validity. Acta Psychol. 2025, 255, 104877. [Google Scholar] [CrossRef]
  12. Singh, A.; Kumar, D. Computer assisted identification of stress, anxiety, depression (SAD) in students: A state-of-the-art review. Med. Eng. Phys. 2022, 110, 103900. [Google Scholar] [CrossRef] [PubMed]
  13. Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. Adv. Neural Inf. Process. Syst. 2014, 27. Available online: https://proceedings.neurips.cc/paper_files/paper/2014/file/ca007296a63f7d1721a2399d56363022-Paper.pdf (accessed on 5 January 2026).
  14. Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4489–4497. [Google Scholar]
  15. Donahue, J.; Anne Hendricks, L.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2625–2634. [Google Scholar]
  16. Arnab, A.; Dehghani, M.; Heigold, G.; Sun, C.; Lučić, M.; Schmid, C. Vivit: A video vision transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 6836–6846. [Google Scholar]
  17. Gavrilescu, M.; Vizireanu, N. Predicting depression, anxiety, and stress levels from videos using the facial action coding system. Sensors 2019, 19, 3693. [Google Scholar] [CrossRef]
  18. Wang, C.; Liang, L.; Liu, X.; Lu, Y.; Shen, J.; Luo, H.; Xie, W. Multimodal fusion diagnosis of depression and anxiety based on face video. In Proceedings of the 2021 IEEE International Conference on Medical Imaging Physics and Engineering (ICMIPE), Hefei, China, 13–14 November 2021; pp. 1–7. [Google Scholar]
  19. Grimm, B.; Talbot, B.; Larsen, L. PHQ-V/GAD-V: Assessments to identify signals of depression and anxiety from patient video responses. Appl. Sci. 2022, 12, 9150. [Google Scholar] [CrossRef]
  20. Li, X.; Lu, L.; Yi, X.; Wang, H.; Zheng, Y.; Yu, Y.; Wang, Q. LI-FPN: Depression and anxiety detection from learning and imitation. In Proceedings of the 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Istanbul, Turkey, 5–8 December 2023; pp. 567–573. [Google Scholar]
  21. Wu, J.; Chen, D.; Ren, Z.; Li, Y.; Li, H.; Liu, Z. Adv-FVMamba: Anxiety Disorders Recognition in Imbalanced Video Datasets Using Adversarial Entropy Loss. In Proceedings of the 2024 17th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Shanghai, China, 26–28 October 2024; pp. 1–6. [Google Scholar]
  22. Xu, X.; Zhang, X.; Zhang, Y. Faces of the Mind: Unveiling Mental Health States Through Facial Expressions in 11,427 Adolescents. arXiv 2024, arXiv:2405.20072. [Google Scholar] [CrossRef]
  23. Lu, L.; Jiang, Y.; Li, X.; Wang, H.; Zou, Q.; Wang, Q. Depression and anxiety detection method based on serialized facial expression imitation. Eng. Appl. Artif. Intell. 2025, 149, 110354. [Google Scholar] [CrossRef]
  24. Hernández Sampieri, R.; Fernández Collado, C.; Baptista Lucio, P. Metodología de la Investigación, 6th ed.; McGraw-Hill: Mexico City, Mexico, 2014. [Google Scholar]
  25. World Medical Association. World Medical Association Declaration of Helsinki: Ethical principles for medical research involving human participants. JAMA 2025, 333, 71–74. [Google Scholar] [CrossRef]
  26. Hamilton, M. The assessment of anxiety states by rating. Br. J. Med. Psychol. 1959, 32, 50–55. [Google Scholar] [CrossRef]
  27. Villarreal-Zegarra, D.; Paredes-Angeles, R.; Mayo-Puchoc, N.; Arenas-Minaya, E.; Huarcaya-Victoria, J.; Copez-Lonzoy, A. Psychometric properties of the GAD-7 (General Anxiety Disorder-7): A cross-sectional study of the Peruvian general population. BMC Psychol. 2024, 12, 183. [Google Scholar] [CrossRef]
  28. Lugaresi, C.; Tang, J.; Nash, H.; McClanahan, C.; Uboweja, E.; Hays, M.; Zhang, F.; Chang, C.L.; Yong, M.G.; Lee, J.; et al. Mediapipe: A framework for building perception pipelines. arXiv 2019, arXiv:1906.08172. [Google Scholar] [CrossRef]
  29. Wang, Y.; Yan, S.; Liu, Y.; Song, W.; Liu, J.; Chang, Y.; Mai, X.; Hu, X.; Zhang, W.; Gan, Z. A survey on facial expression recognition of static and dynamic emotions. arXiv 2024, arXiv:2408.15777. [Google Scholar] [CrossRef]
  30. Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 221–231. [Google Scholar] [CrossRef] [PubMed]
  31. Huang, J.; Zhou, W.; Li, H.; Li, W. Attention-based 3D-CNNs for large-vocabulary sign language recognition. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 2822–2832. [Google Scholar] [CrossRef]
  32. Montaha, S.; Azam, S.; Rafid, A.K.M.R.H.; Hasan, M.Z.; Karim, A.; Islam, A. Timedistributed-cnn-lstm: A hybrid approach combining cnn and lstm to classify brain tumor on 3d mri scans performing ablation study. IEEE Access 2022, 10, 60039–60059. [Google Scholar] [CrossRef]
  33. Chowanda, A. Spatiotemporal features learning from song for emotions recognition with time distributed CNN. In Proceedings of the 2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI), Jakarta, Indonesia, 28 October 2021. [Google Scholar]
  34. Joshi, A.; Kale, S.; Chandel, S.; Pal, D.K. Likert scale: Explored and explained. Br. J. Appl. Sci. Technol. 2015, 7, 396–403. [Google Scholar] [CrossRef]
  35. Russell, J.A. A circumplex model of affect. J. Personal. Soc. Psychol. 1980, 39, 1161–1178. [Google Scholar] [CrossRef]
  36. Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1. [Google Scholar]
Figure 1. Main tasks of the proposed methodology.
Figure 2. Videotaping for each participant.
Figure 3. The 50 frames in a video from the anxiety dataset. The participant shown in the figure is part of the anxiety dataset, and we obtained the participant’s consent.
Figure 4. Preprocessing pipeline.
Figure 5. Candidate ConvNets for AnxietyNet (each block shows the variable names of the corresponding equations): (a) CNN3D architecture (Equation (1)), (b) CNN3D + Att architecture (Equation (2)), and (c) TD-CNN architecture (Equation (9)).
Figure 6. Levels of intensity of the emotions of anxiety and happiness.
Figure 7. Combinations of labels included in the anxiety dataset.
Figure 8. t-SNE visualization of the seven and six labels in the anxiety dataset.
Figure 9. t-SNE visualization of the five labels in the anxiety dataset.
Figure 10. Number of videos for each class.
Figure 11. Learning evolution graphs using the accuracy metric.
Figure 12. Confusion matrix of five classes.
Figure 13. Radar graphs.
Figure 14. Examples of correct anxiety detection (a) and incorrect anxiety detection (b,c).
Table 1. Hyperparameter configuration.

Hyperparameter | Value
Epochs | 35
Batch size | 8
Learning rate | 1 × 10⁻³
Optimizer | Adam
Loss function | Categorical cross-entropy
Table 2. Ablation study. Metric results for the happiness and anxiety emotions: *1 using five classes; *2 using six classes; *3 using seven classes. The values in bold represent the best value for each metric.

Architecture | Accuracy | Precision | Recall | F1-Score | Specificity | Sensitivity
CNN3D *1 | 0.7996 | 0.8070 | 0.7096 | 0.7423 | 0.9416 | 0.7096
TD-CNN *1 | 0.8071 | 0.7849 | 0.7340 | 0.7547 | 0.9443 | 0.7340
CNN3D-ATT *1 | 0.7996 | 0.8112 | 0.7352 | 0.7620 | 0.9419 | 0.7352
CNN3D *2 | 0.7041 | 0.7481 | 0.6900 | 0.7082 | 0.9374 | 0.6900
TD-CNN *2 | 0.7340 | 0.7590 | 0.7368 | 0.7453 | 0.9443 | 0.7368
CNN3D-ATT *2 | 0.7228 | 0.7416 | 0.7178 | 0.7261 | 0.9421 | 0.7178
CNN3D *3 | 0.6947 | 0.6331 | 0.6242 | 0.6244 | 0.9458 | 0.6242
TD-CNN *3 | 0.7078 | 0.6868 | 0.6420 | 0.6568 | 0.9476 | 0.6420
CNN3D-ATT *3 | 0.7022 | 0.6970 | 0.6378 | 0.6588 | 0.9462 | 0.6378
Table 3. Performance comparison between the regression and classification tasks using five classes: *1 classification task; *2 regression task. The values in bold represent the best value for each metric.

Architecture | Accuracy | Precision | Recall | F1-Score | Specificity | Sensitivity | MAE | RMSE | MSE
CNN3D-ATT *1 | 0.7996 | 0.8112 | 0.7352 | 0.7620 | 0.9419 | 0.7352 | 0.2134 | 0.4971 | 0.2471
CNN3D-ATT *2 | 0.5224 | 0.4369 | 0.5265 | 0.44652 | 0.8758 | 0.5265 | 0.33472 | 0.4765 | 0.2271
Table 4. Study of related works: *1 the dataset was not available; *2 obtained for binary classification; *3 studied the anxiety emotion directly; *4 multiclass result.

Authors | Dataset | Emotions | Used Models/Levels of Intensity | Reported Metrics
Gavrilescu et al. (2019) [17] | FACS–DASS *1 | Depression, stress, anxiety | FDASSNN/5 levels | Accuracy = 77.9%
Wang et al. (2021) [18] | By authors *1 | Depression, anxiety, without disorders | CNN + LSTM/binary classification | Precision = 0.7208
Grimm et al. (2022) [19] | By authors *1 | Anxiety | GAD-V/4 levels | AUC = 0.909
Li et al. (2023) [20] | VFEM *1 | Depression, anxiety | ResNet-18/binary classification | F1-score = 0.82 *2
Wu et al. (2024) [21] | QADAVB *1 | Anxiety | Adv-FVMamba/binary classification | F1-score = 0.751 *2
Xu et al. (2024) [22] | FACES *1 | Depression, stress, anxiety | ML algorithms/binary classification | F1-score = 0.66 *2
Lu et al. (2025) [23] | VFEM *1 | Depression, anxiety | SFE-Former/binary classification | F1-score = 0.882 *2
AnxietyNet | Anxiety dataset *3 | Happiness, anxiety | CNN3D, TD-CNN, CNN3D + attention/7, 6, 5 levels | F1-score = 0.7620 *4
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
