Mathematics
  • Article
  • Open Access

9 September 2025

RAMHA: A Hybrid Social Text-Based Transformer with Adapter for Mental Health Emotion Classification

1 Department of Computer Science, Mir Chakar Khan Rind University, Sibi 82000, Pakistan
2 Department of AI and SW, Gachon University, Seongnam 13120, Republic of Korea
* Authors to whom correspondence should be addressed.
This article belongs to the Section E1: Mathematics and Computer Science

Abstract

Depression, stress, and anxiety are mental health disorders that pose a growing challenge in the digital age, making early detection critical. Social media is a rich and complex source of emotional expression, requiring intelligent systems that can decode subtle psychological states from natural language. This paper presents RAMHA (RoBERTa with Adapter-based Mental Health Analyzer), a hybrid deep learning model that combines RoBERTa, parameter-efficient adapter layers, BiLSTM, and attention mechanisms, further optimized with focal loss to address the class imbalance problem. When tested on three filtered versions of the GoEmotions dataset, RAMHA shows outstanding results, with a maximum accuracy of 92% in binary classification and 88% in multiclass tasks. Extensive experiments compare RAMHA with eight standard baseline models, including SVM, LSTM, and BERT; RAMHA consistently outperforms them in accuracy, precision, recall, and F1-score. Ablation studies further confirm the contributions of the individual components of the architecture, and comparative analysis demonstrates that RAMHA surpasses the best previously reported F1-scores by a substantial margin. Our results not only indicate the potential of adapter-enhanced Transformers in emotion-aware mental health screening but also establish a solid basis for their use in clinical and social settings.

1. Introduction

Mental health is a complex subject that significantly impacts the way individuals think, feel, and behave. Psychological health problems encompass more than 200 distinct conditions, among which depression and anxiety disorders—including post-traumatic stress disorder (PTSD) and obsessive–compulsive disorder (OCD) []—are common. The International Classification of Diseases (ICD), maintained by the World Health Organization (WHO), provides a standardized system of classifications and codes for all known illnesses and health conditions worldwide. The most recent edition, ICD-11, contains over 55,000 codes [] covering diseases, symptoms, injuries, and risk factors, and it is widely used for diagnosis, treatment, research, insurance, and healthcare management.
Mental health disorders affect approximately one in seven children and adolescents aged 10–19, accounting for about 15% [] of the global disease burden in this age group. Among these, depression, anxiety, and behavioral disorders are particularly prominent and constitute major causes of morbidity and disability in adolescents. Globally, it is estimated that around 3.8% of the population suffers from depression, with women and older adults at the highest risk, amounting to roughly 280 million individuals. Depression is especially prevalent among women during pregnancy and postpartum. Furthermore, suicide is the third leading cause of death worldwide, claiming more than 700,000 lives annually, and it ranks fourth among individuals aged 15–29 years [].
Depression, anxiety, and stress-related disorders are recognized as major public health problems worldwide, particularly in connection with the increased use of online communication and social media. As more individuals—including younger cohorts—share their feelings and experiences online, vast volumes of user-generated text data have become available, offering a rich opportunity for automated mental health assessment via natural language processing (NLP) methods []. Without timely treatment or diagnosis, these conditions can severely degrade quality of life and potentially develop into chronic psychological disorders or lead to suicidal tendencies [].
Online forums and social media platforms have proven to be valuable for mental health research and sentiment analysis [,], particularly in monitoring populations []. However, a key challenge remains: the lack of reliable methods to classify and assess the severity of mental illnesses based on user-generated text, ideally validated with clinical confirmation []. Progress is further constrained by the scarcity of high-quality, large-scale, clinically validated datasets specifically targeting depression severity [].
Traditional machine learning methods, such as SVMs or Naive Bayes, have also been applied to predict psychological distress from text. However, such models are constrained by the shallow nature of bag-of-words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), or hand-crafted lexicon features. Although these features capture surface-level word frequencies or hypothesis-driven patterns, they cannot account for the emotional connotations of language or the subtleties of context. This limitation is even clearer when discriminating between sentiment (an overall positive, negative, or neutral attitude) and emotion (specific psychological states like anxiety, stress, or depression), which requires more detailed representations [,]. The emergence of Transformer-based models, notably BERT [] and RoBERTa [], has significantly advanced the field by providing context-aware representations of textual units, enhancing robustness for sentiment and intent analysis. Despite these improvements, challenges remain: these models are computationally expensive to fine-tune, often struggle to adapt to new application domains, and frequently act as black boxes, with internal mechanisms that are difficult to interpret, particularly in high-stakes scenarios such as mental health diagnosis [,].

Problem Statement

Although NLP has made significant progress, existing models for mental health detection face substantial limitations: limited understanding of subtle context, inability to capture long-range dependencies, high computational costs, and difficulty detecting implicit emotional cues. These limitations hinder the accurate and timely classification of depression, stress, and anxiety in user-generated text.
To address these challenges, we propose a new hybrid deep learning model with enhanced capacity to extract meaningful emotional signals from text while maintaining efficiency and accuracy. Specifically, we introduce RAMHA (RoBERTa Adapter Mental Health Attention), a hybrid model designed to automatically identify depression, anxiety, and stress from textual data. RAMHA aims to balance semantic performance and computational efficiency. Its architecture consists of a RoBERTa-large Transformer backbone with adapter-tuning via Pfeiffer Adapter Fusion [], which enables task-conditioned adaptation without retraining the entire model. This is followed by a BiLSTM layer that captures sequential emotional dependencies often expressed across multiple tokens, as BiLSTMs process [] text bidirectionally. A token-level attention mechanism highlights emotionally salient cues, supporting interpretable mental health classification [].
RAMHA distinguishes itself from prior hybrid Transformer–BiLSTM–attention models in several ways: (i) it employs adapter-based fine-tuning for parameter-efficient training, (ii) integrates task-specific attention after BiLSTM for token-level interpretability, (iii) uses focal loss to address class imbalance, and (iv) is evaluated on curated GoEmotions subsets for both binary and ternary classification, achieving consistent improvements over conventional fine-tuning approaches.
The major theoretical and empirical contributions of this work are as follows:
  • RAMHA Architecture: We propose a novel multi-stage hybrid model that integrates RoBERTa-large, lightweight adapters, BiLSTM, and attention mechanisms in an interpretable and parameter-efficient framework for mental health classification.
  • Emotion-to-Mental Mapping: We perform a psychologically meaningful remapping of an emotion-rich dataset into three diagnostic classes—depression, anxiety, and stress—enhancing real-world applicability.
  • Adapter Efficiency: Leveraging Pfeiffer Adapter Fusion, RAMHA fine-tunes task-specific parameters while largely preserving the generalization capabilities of the pretrained language model, with minimal training costs.
  • Empirical Verification: RAMHA is evaluated on GoEmotions data and demonstrates superior performance compared to baseline Transformer-based classifiers, particularly in detecting subtle emotional signals indicative of mental health conditions.
  • Binary and Ternary Classification: The model addresses both binary classification tasks (e.g., detecting the presence or absence of depression) and ternary classification tasks (e.g., distinguishing between depression, anxiety, and control).
RAMHA combines the semantic richness of pretrained language models, the emotional sensitivity of recurrent attention modules, and the flexibility of adapter-based tuning to provide a scalable framework for early detection of mental health conditions. This approach is valuable for digital mental health platforms, clinical decision-support systems, and personalized self-care tools, particularly in low-resource or privacy-sensitive settings. Although the current implementation uses an English RoBERTa backbone, RAMHA is language-agnostic by design; replacing the encoder with a multilingual model or training language-specific adapters allows deployment in other languages with minimal retraining.
The paper is structured as follows: Section 2 reviews related work, Section 3 describes the methodology and model design, Section 4 details the dataset and hyperparameters, Section 5 presents results, discussion, and comparative analysis, and Section 6 concludes the study and outlines future directions.

3. Methodology and Model Design

In this section, we present the methodology and model design utilized in our study to characterize mental health states—specifically depression, anxiety, and stress—based on textual data and associated emotional cues. The proposed framework comprises five key components: data preparation, tokenization, model architecture, training, and evaluation. Each component has been carefully designed to enhance both model performance and interpretability, particularly in the context of emotion-driven mental health analysis. Figure 1 illustrates the overall workflow, while Algorithm 1 provides a detailed step-by-step description of the proposed model.
Algorithm 1 RAMHA: RoBERTa–Adapter–BiLSTM–Attention framework.
Require: Input text x
Ensure: Predicted class probabilities p
 1: T ← RoBERTaTokenizer(x)
 2: E ← RoBERTa(T)
 3: E ← Adapter(E)
 4: H ← BiLSTM(E)
 5: for each hidden state h_i ∈ H do
 6:     s_i ← score(h_i)
 7: end for
 8: α_i ← exp(s_i) / Σ_j exp(s_j)
 9: C ← Σ_i α_i h_i
10: C ← Dropout(C, p)
11: z ← W C + b
12: p ← Softmax(z)
13: FL(p_t) ← −α_t (1 − p_t)^γ log(p_t)
14: return p
Figure 1. RAMHA model training pipeline.
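As a companion to Algorithm 1, the following is a minimal PyTorch sketch of the forward pass. It is illustrative only: the module sizes, the linear attention scorer, and the assumption of a Hugging Face-style encoder exposing last_hidden_state are ours, not the authors' exact implementation.

    import torch
    import torch.nn as nn

    class RAMHASketch(nn.Module):
        """Skeleton of Algorithm 1: encoder -> adapters -> BiLSTM -> attention -> classifier."""
        def __init__(self, encoder, hidden=1024, lstm_hidden=256, num_classes=3, p_drop=0.3):
            super().__init__()
            self.encoder = encoder                      # frozen RoBERTa-large with adapters (assumed)
            self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True, bidirectional=True)
            self.score = nn.Linear(2 * lstm_hidden, 1)  # token-level attention scorer (step 6)
            self.dropout = nn.Dropout(p_drop)
            self.cls = nn.Linear(2 * lstm_hidden, num_classes)

        def forward(self, input_ids, attention_mask):
            # Steps 1-3: tokenized input through the adapter-augmented encoder.
            E = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
            H, _ = self.bilstm(E)                                  # step 4
            s = self.score(H).squeeze(-1)                          # steps 5-7
            s = s.masked_fill(attention_mask == 0, float("-inf"))  # ignore padding tokens
            alpha = torch.softmax(s, dim=-1)                       # step 8
            C = torch.einsum("bt,bth->bh", alpha, H)               # step 9: weighted context
            z = self.cls(self.dropout(C))                          # steps 10-11
            return torch.softmax(z, dim=-1)                        # step 12

During training, these probabilities would feed the focal loss of step 13; a sketch of that loss is given after the classification head in Section 3.9.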

3.1. Data Preparation

The first step is loading and preparing the GoEmotions dataset, a large corpus built by Google consisting of 58k English Reddit comments labeled with 28 emotion types plus a neutral label. The raw units are pieces of text and their emotion labels.
Figure 2 presents the step-by-step mechanism of the proposed architecture. To adapt this dataset to the mental health theme, the following sub-steps were undertaken (an illustrative code sketch follows the list):
  • Loading and Preprocessing: The data is read in CSV format. Each text item is preprocessed: unnecessary punctuation marks and symbols (e.g., dots, commas, and special characters), HTML tags, and excess whitespace are removed. This makes the input noise-free and ready for downstream tokenization and modeling.
  • Mapping Emotions to Mental Health Categories: A mapping dictionary was formulated that assigns the original 28 emotion labels to three high-level mental health categories:
    Depression: sadness, grief, loneliness, and disappointment.
    Anxiety: fear, nervousness, worry, and embarrassment.
    Stress: annoyance, confusion, nervousness, disappointment, embarrassment, and surprise.
  • Choice of Primary Labels: A particular Reddit comment can be associated with multiple emotions; an argmax operation therefore selects the most salient mental health label on a per-sample basis, reducing the multi-label framework to a multi-class classification problem.
  • Label Encoding: Label encoding translates the mental health category labels (depression, anxiety, stress) into integers (e.g., 0, 1, 2), making them compatible with PyTorch models.
  • Data Balancing: To mitigate the problem of class imbalance that exists in the combined datasets, we conducted proportional sampling so that each target class—depression, anxiety, and stress—would have the same number of samples before the data was split.
  • Data Splitting: The dataset was divided into training and validation subsets using an 80/20 split. The validation set was employed to monitor model performance during training and to mitigate the risk of overfitting. To ensure fairness and balanced representation of classes, we applied class-balancing techniques along with focal loss, making the validation set a reliable indicator of the model’s generalization performance, effectively serving as a substitute for a separate test set.
Figure 2. Block diagram of step-by-step mechanism of Algorithm 1.
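A minimal sketch of the preparation steps above, assuming pandas and scikit-learn. The file name, column names, and flat emotion-to-class assignments are illustrative; overlapping labels such as nervousness are resolved by the argmax step in the actual pipeline, which a flat dictionary cannot express.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Illustrative emotion-to-class mapping paraphrased from Section 3.1.
    MAPPING = {
        "sadness": "depression", "grief": "depression",
        "loneliness": "depression", "disappointment": "depression",
        "fear": "anxiety", "nervousness": "anxiety",
        "worry": "anxiety", "embarrassment": "anxiety",
        "annoyance": "stress", "confusion": "stress", "surprise": "stress",
    }
    LABEL2ID = {"depression": 0, "anxiety": 1, "stress": 2}

    df = pd.read_csv("goemotions.csv")            # hypothetical file name
    df["target"] = df["emotion"].map(MAPPING)     # hypothetical column names
    df = df.dropna(subset=["target"])

    # Data balancing: downsample every class to the minority-class count.
    n_min = df["target"].value_counts().min()
    df = df.groupby("target", group_keys=False).sample(n=n_min, random_state=42)

    # Label encoding and the 80/20 stratified split described above.
    df["label"] = df["target"].map(LABEL2ID)
    train_df, val_df = train_test_split(df, test_size=0.2,
                                        stratify=df["label"], random_state=42)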

3.2. Tokenization

The RoBERTa-Large tokenizer in the Hugging Face Transformers library is used to convert textual inputs into numerical representations. RoBERTa uses a BPE (byte pair encoding)-based tokenizer to split each word, as described by the following equations. A sentence is first represented as a sequence of words:
S = {w_1, w_2, …, w_n}
The tokenizer then carries out sub-word tokenization of each word w_i:
w_i = {s_i1, s_i2, …, s_im}
The final token sequence is
T = BPE(S) = {t_1, t_2, …, t_k}
Here, k ≤ max_length and t ∈ V, with V being RoBERTa's vocabulary of subword tokens.
  • Truncation and Padding: Each input sequence is truncated to the maximum model length or padded to a fixed length for consistency.
  • Special Tokens: RoBERTa-specific tokens and padding tokens are added where necessary.
  • Token Type and Attention Masks: These are generated automatically to distinguish real tokens from padding during attention-based processing.
The encoded labels and attention masks, together with the tokenized sequences, are organized into PyTorch Dataset and DataLoader objects for efficient minibatch training.
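A hedged sketch of this step with the Hugging Face tokenizer; the max_length of 128, the batch size, and the example texts are assumptions.

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from transformers import RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained("roberta-large")

    texts = ["I feel so alone these days.", "My chest tightens before every exam."]
    labels = torch.tensor([0, 1])  # depression, anxiety (illustrative)

    # Truncation and padding to a fixed length; special tokens and attention
    # masks are added automatically by the tokenizer.
    enc = tokenizer(texts, truncation=True, padding="max_length",
                    max_length=128, return_tensors="pt")

    dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], labels)
    loader = DataLoader(dataset, batch_size=16, shuffle=True)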

3.3. Model Architecture

The proposed model combines Transformer representation learning with sequence modeling and an attention mechanism to learn the emotional dynamics of mental-health-related texts. The model architecture is illustrated in Figure 3:
Figure 3. RAMHA model architecture starts with pre-processing of the data and mapping labels; then, it uses RoBERTa as a tokenizer. It uses a RoBERTa encoder with Pfeiffer adapter incorporation, a subsequent BiLSTM and attention mechanism to capture the contextual information, and a softmax layer to create multi-class classification (depression, anxiety, and control) predictions.

3.4. Pretrained Encoder (RoBERTa-Large)

In this paper, we use RoBERTa-Large as the encoder that generates deep contextualized word embeddings from raw textual data. RoBERTa is an enhanced form of BERT that contains 24 Transformer layers; each layer has a hidden dimension of 1024, 16 attention heads, and a feedforward dimension of 4096, for a total of approximately 355 million parameters. In each layer, RoBERTa uses multi-head self-attention to derive token representations, which can be expressed as follows:
Attention(Q, K, V) = softmax(QKᵀ / √d_k) V
where Q, K, and V are the query, key, and value matrices formed from the hidden states, and d_k is the dimension of the key vectors. This formulation allows the model to capture long-range dependencies, since every token attends to all tokens in the sequence regardless of position. A key question arises as to why RoBERTa is considered the most feasible choice; this is illustrated in Figure 4.
Figure 4. Comparison highlighting RoBERTa as the most feasible choice.
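The equation above is the standard scaled dot-product attention, which RoBERTa applies internally per head. A compact reference implementation (ours, not the authors' code):

    import math
    import torch

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.size(-1)
        scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
        return torch.softmax(scores, dim=-1) @ V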
Nevertheless, RoBERTa-Large is computationally feasible in our implementation because the adapters add only a few task-specific parameters while the main Transformer weights remain frozen. The output of RoBERTa is a set of contextual embeddings, which are passed to the BiLSTM and attention layers for further processing; these embeddings capture the sequential and emotional dependencies needed for mental health classification.

3.5. Adapter Integration: Pfeiffer Configuration

In order to scale RoBERTa to domain-specific analysis of mental health without sacrificing training scalability, we incorporate adapter modules with the AdapterTransformers library. Adapters are small feedforward neural modules that are inserted into every Transformer layer and can be fine-tuned to a specific task without re-training all 355 million weights of RoBERTa-Large. Specifically, we use the Pfeiffer configuration, a widely adopted adapter setup that introduces a bottleneck architecture after the feed-forward sublayer in each Transformer block. Every adapter is composed of a down-projection, non-linearity, and up-projection layer. The adapter transforms a hidden representation in the following form.
Adapter(h) = h + W_up · σ(W_down · h)
The trainable parameters are those of the adapters (usually <1–3% of the full model), while the RoBERTa parameters remain frozen, which considerably lowers the chance of catastrophic forgetting and results in more efficient computation. This modular architecture allows domain adaptation with a low resource footprint and facilitates fast task switching. Adapter integration is illustrated in Figure 5.
Figure 5. Adapter integration: Pfeiffer configuration.
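A minimal sketch of the bottleneck transform above. The bottleneck width (64) and the ReLU non-linearity are assumptions, and in practice the AdapterTransformers library inserts and manages these modules rather than hand-rolled code.

    import torch.nn as nn

    class BottleneckAdapter(nn.Module):
        """Adapter(h) = h + W_up * sigma(W_down * h), with a residual connection."""
        def __init__(self, hidden=1024, bottleneck=64):
            super().__init__()
            self.down = nn.Linear(hidden, bottleneck)  # down-projection
            self.act = nn.ReLU()                       # non-linearity (assumed)
            self.up = nn.Linear(bottleneck, hidden)    # up-projection

        def forward(self, h):
            return h + self.up(self.act(self.down(h)))

With hidden = 1024 and bottleneck = 64, each adapter adds roughly 2 × 1024 × 64 ≈ 131k parameters per layer, or about 3M across 24 layers, which is consistent with the small trainable fraction quoted above.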

3.6. BiLSTM

To further capture the temporal dependencies and emotional evolution in the text, the contextual embeddings returned by RoBERTa are fed into a Bidirectional Long Short-Term Memory (BiLSTM) network. Unlike unidirectional models, a BiLSTM processes the sequence in both the forward and backward directions, allowing the network to attend to past and future context simultaneously, a requirement for recognizing subtle emotional expression in mental health narratives. The relationships among the gates, hidden states, and memory cells for the forward layer, whose hidden state is a_t, are formally defined as follows.
i_t^a = σ(U_i^a x_t + W_i^a a_{t−1} + b_i^a)
f_t^a = σ(U_f^a x_t + W_f^a a_{t−1} + b_f^a)
o_t^a = σ(U_o^a x_t + W_o^a a_{t−1} + b_o^a)
Ũ_t^a = tanh(U_u^a x_t + W_u^a a_{t−1} + b_u^a)
C_t^a = i_t^a ⊙ Ũ_t^a + f_t^a ⊙ C_{t−1}^a
a_t = o_t^a ⊙ tanh(C_t^a)
Formal mathematical expressions for the backward layer, whose hidden state is c_t, are presented below.
i_t^c = σ(U_i^c x_t + W_i^c c_{t−1} + b_i^c)
f_t^c = σ(U_f^c x_t + W_f^c c_{t−1} + b_f^c)
o_t^c = σ(U_o^c x_t + W_o^c c_{t−1} + b_o^c)
Ũ_t^c = tanh(U_u^c x_t + W_u^c c_{t−1} + b_u^c)
C_t^c = i_t^c ⊙ Ũ_t^c + f_t^c ⊙ C_{t−1}^c
c_t = o_t^c ⊙ tanh(C_t^c)
The final output is obtained by concatenating the two directions:
h_t^→ = LSTM_forward(c_t), t ∈ {1, …, m}
h_t^← = LSTM_backward(c_t), t ∈ {m, …, 1}
h_t = h_t^→ ⊕ h_t^←
Here, the combined vector encodes information from both directions, allowing the model to better understand contextual information, linguistic polarity, and emotional shifts along the sequence. These enhanced embeddings are then fed into an attention mechanism for per-token importance weighting.
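A shape-level sketch of this step; the hidden size of 256 is an assumed value.

    import torch
    import torch.nn as nn

    # RoBERTa-large token embeddings (batch, seq_len, 1024) enter the BiLSTM;
    # each output token concatenates the forward and backward hidden states.
    bilstm = nn.LSTM(input_size=1024, hidden_size=256,
                     batch_first=True, bidirectional=True)
    E = torch.randn(8, 128, 1024)   # dummy encoder output
    H, _ = bilstm(E)
    print(H.shape)                  # torch.Size([8, 128, 512]) = 2 * hidden_size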

3.7. Attention Mechanism

In order to increase the model’s focus on the most emotionally informative sections of a sentence, we use an attention mechanism on the BiLSTM outputs. This system attaches learned weights to every token, allowing the model to use words that are emotionally significant towards mental health disorders.
Suppose that the BiLSTM output sequence is H = [h_1, h_2, …, h_T], where h_t ∈ ℝ^{2d}. For each token t, an attention weight is computed as
α_t = exp(score(h_t)) / Σ_{t′} exp(score(h_{t′}))
and the attended sentence representation is
S_Aw = Σ_t α_t h_t
Here, each h_t concatenates the forward and backward hidden states obtained from the BiLSTM, and the scoring function uses trainable parameters.
The term S_Aw is the attention-weighted sum of the LSTM hidden states, as illustrated in Figure 6, where the attention score is calculated using the following equation:
score(h_t) = vᵀ tanh(W h_t + b)
Figure 6. Attention mechanism adopted from study [,].
The attention mechanism is applied to the h t LSTM outputs, allowing the model to focus more or less on specific terms within the input comment. This mechanism enables the model to selectively emphasize informative words by updating the feature vector, as demonstrated in Equation (23).
u_t^LSTM = tanh(W_w^LSTM h_t^LSTM + b_w^LSTM)
α_t^LSTM = exp((u_t^LSTM)ᵀ u_w^LSTM) / Σ_t exp((u_t^LSTM)ᵀ u_w^LSTM)
S^LSTM = Σ_t α_t^LSTM h_t^LSTM
Here, u t denotes the hidden representation of h t , while u w is a context vector that is initialized and learned during training. The importance of each word is determined by comparing u t with u w , followed by normalization, as shown in Equation (24). The resulting attention weights α t are then used to compute the sentence representation S through a weighted sum, as defined in Equation (25). The sentence vector S effectively captures the most relevant information from the entire sequence and is subsequently passed on for downstream classification tasks.
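A sketch of Equations (23)-(25), assuming a 512-dimensional BiLSTM output; parameter names are ours.

    import torch
    import torch.nn as nn

    class TokenAttentionPool(nn.Module):
        """u_t = tanh(W h_t + b); alpha_t = softmax(u_t . u_w); S = sum_t alpha_t h_t."""
        def __init__(self, dim=512):
            super().__init__()
            self.proj = nn.Linear(dim, dim)
            self.u_w = nn.Parameter(torch.randn(dim))  # learned context vector

        def forward(self, H):                             # H: (batch, seq_len, dim)
            u = torch.tanh(self.proj(H))                  # Eq. (23)
            alpha = torch.softmax(u @ self.u_w, dim=-1)   # Eq. (24)
            return torch.einsum("bt,btd->bd", alpha, H)   # Eq. (25): sentence vector S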

3.8. Dropout

To reduce overfitting and improve generalization to unseen data, dropout regularization is applied to the attention output. Dropout randomly disables a fraction of the neurons at training time, preventing feature co-adaptation and promoting robustness. In particular, we apply a dropout rate of 30%, i.e., during each forward pass, 30% of the units in the attention-derived sentence representation s are randomly set to zero.
s′ = Dropout(s, p = 0.3)
The regularized vector s′ is then fed into the final classification layer. Dropout is one of the simplest and most effective methods for decreasing variance and improving performance across contrasting mental health manifestations.

3.9. Classification Head

The last part of the architecture is the classification head, which maps the high-level representation s′ to mental health categories. A fully connected (linear) layer followed by a softmax operation yields the probabilities of depression, anxiety, or stress:
y = softmax(W_c s′ + b_c)
The above equation produces the output probability distribution using learnable weights and biases; this layer also enables interpretable multi-class prediction based on the emotionally salient information extracted by the earlier modules.
The adapter-based design also facilitates parameter-efficient transfer learning, allowing RAMHA to be adapted to other classification tasks, including International Classification of Diseases (ICD)-related clinical coding, by training new task-specific adapters and replacing the classification head.
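Training pairs this head with the focal loss of Algorithm 1 (step 13) rather than plain cross-entropy. A minimal sketch, treating α as a scalar (per-class α weighting is a straightforward extension):

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, alpha=1.0, gamma=2.0):
        """FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t); gamma = 2 per Section 4.2."""
        log_p = F.log_softmax(logits, dim=-1)
        log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of true class
        pt = log_pt.exp()
        return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()

The (1 − p_t)^γ factor down-weights easy, well-classified examples, so gradients concentrate on the hard and often under-represented classes.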

4. Dataset and Hyper-Parameters

This study establishes a strong experimental framework for evaluating emotion classification with Transformer-based architectures. We use the widely accepted GoEmotions dataset with its fine-grained emotional annotations, and all models are tuned with carefully chosen hyperparameters to improve performance and guarantee a fair analysis across all experimental settings.

4.1. Dataset

This paper uses the GoEmotions dataset, a comprehensive and popular benchmark for emotion classification tasks. The dataset has three sub-versions: GoEmotions1 with 70,000 instances, GoEmotions2 with 70,000 instances, and GoEmotions3 with 70,226 instances, offering a total of about 210,226 labeled examples and thus a strong and varied basis for training and evaluation. Each instance in the dataset is annotated with one or more of 28 fine-grained emotion categories, which are listed in Table 2. The dataset is publicly accessible on Kaggle.com (https://www.kaggle.com/datasets/debarshichanda/goemotions, accessed on 1 August 2025). Figure 7 and Figure 8 illustrate the datasets' configuration and the distribution of labels, respectively.
Table 2. Target classes and their corresponding labels.
Figure 7. Configuration of the three datasets used in this study, showing the number of instances, labels, and label names.
Figure 8. Comparative visualization of dataset distributions.

4.2. Computational Environment and Hyper-Parameters

All experiments were conducted on a system equipped with an Intel Core i9-12900K CPU, 64 GB RAM, and an NVIDIA GeForce GTX 1080 Ti GPU with 24 GB VRAM. The operating system was Ubuntu 22.04 LTS. The implementation used Python 3.10, PyTorch 2.1, and Hugging Face Transformers 4.39. Other key libraries included NumPy. To ensure reproducibility, random seeds were fixed at 42 across all experiments.
Input text was preprocessed by lowercasing, punctuation removal, and tokenization using the RoBERTa tokenizer. For model training, hyperparameters were selected based on a grid search over a validation set. Specifically, the learning rate was set to 2 × 10⁻⁵, the dropout rate to 0.5, and the focal loss parameter γ to 2. These values were chosen to balance convergence speed and generalization, as well as to handle class imbalance in the curated GoEmotions subsets. The model was trained until validation loss stabilized. Furthermore, Table 3 highlights hyperparameter settings for the proposed model.
Table 3. Hyperparameter settings of the proposed model.
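A sketch of the reproducibility and optimizer setup described above. The seed, learning rate, dropout, and γ follow the text; the optimizer choice (AdamW) and batch size are assumptions.

    import random
    import numpy as np
    import torch

    def set_seed(seed=42):
        """Fix all relevant RNGs, as described in Section 4.2."""
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

    set_seed(42)

    config = {"learning_rate": 2e-5, "dropout": 0.5, "focal_gamma": 2.0, "batch_size": 16}
    model = torch.nn.Linear(10, 3)  # stand-in for the full RAMHA model
    optimizer = torch.optim.AdamW(model.parameters(), lr=config["learning_rate"])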

5. Results and Discussion

The dataset is challenging due to overlapping labels, context-dependent meanings, and class imbalance; attention-based deep learning algorithms are therefore better suited to capturing these nuances and separating classes. The evaluation covers eight baseline classification models: LR, SVM, NB (Naive Bayes), LSTM, R-CNN, GRU, Bi-GRU, and BERT (SVM, LR, and NB are considered weak learners, whereas the remaining models are considered strong learners). We use three sub-versions of the GoEmotions dataset, GoEmotions-1, GoEmotions-2, and GoEmotions-3, each containing about 70,000 labeled samples divided into 28 emotion categories. These models were tuned on our task-specific data to classify text as depression, stress, or anxiety. The performance of each model was evaluated using accuracy, precision, recall, and F1-score to provide a comprehensive assessment of effectiveness. Table 4 presents a comparative analysis of the eight models. Among traditional machine learning approaches, SVM and Logistic Regression (LR) achieved accuracies of 0.66 and 0.68, respectively, whereas Naive Bayes performed poorly, with an accuracy of 0.61 and an F1-score of 0.54. In contrast, deep learning models demonstrated superior performance: LSTM reached an accuracy of 0.72 with an F1-score of 0.67, while R-CNN and GRU also showed improved predictive capability. Notably, the Bi-GRU model outperformed the other RNN-based baselines, achieving the highest accuracy of 0.75 and an F1-score of 0.71.
Table 4. Performance comparison of baseline models on GoEmotions datasets using accuracy, precision, recall, and F1-score metrics.
Transformer-based BERT delivered the strongest overall performance across all datasets, achieving an accuracy of 0.78 on GoEmotions-1 and similarly high results on GoEmotions-2 and -3, with consistent F1-scores of 0.75. These figures demonstrate that pre-trained language models such as BERT generalize well and learn strong representations, particularly for multi-label, fine-grained emotion classification.
Table 5 compares six advanced Transformer-based models (BERT, DistilBERT, ALBERT, XLNet, RoBERTa, and RoBERTa+BiLSTM) with our proposed model RAMHA on GoEmotions-1, GoEmotions-2, and GoEmotions-3. Accuracy, precision, recall, and F1-score were used to assess these models. While commonly used Transformer-based architectures such as BERT, ALBERT, and DistilBERT performed reliably (with accuracies of 0.77–0.79), XLNet and RoBERTa performed even better, reaching F1-scores of 0.78.
Table 5. Achieved result on GoEmotions datasets against various models.
Combining BiLSTM with RoBERTa further increased the model's ability to capture sequential dependencies, yielding steady improvements and F1-scores of 0.80. Nevertheless, our proposed hybrid model, RAMHA, substantially outperformed all baselines on all datasets. On GoEmotions-1, it scored an accuracy of 0.88 and an F1-score of 0.85, with comparably strong results on GoEmotions-2 and GoEmotions-3, peaking at an accuracy of 0.89 and an F1-score of 0.87.
One notable point here is that GoEmotions-3 consistently performed best across all models in Table 5, including BERT, DistilBERT, ALBERT, XLNet, and our proposed RAMHA. This dataset appears more suitable for depression–stress–anxiety classification, as it is better aligned with clinically relevant categories and provides clearer class separability with less semantic overlap. In particular, its narrower and more coherent label space reduces noise, making adapter-based fine-tuning more effective. RAMHA's performance advantage can further be explained by its integration of adapter-based tuning, bidirectional temporal feature extraction, attention-driven focus on key emotional cues, and the robustness of focal loss in handling class imbalance.
On the whole, the findings verify that RAMHA not only outperforms the traditional deep learning and Transformer baselines but also offers a robust and generalizable fine-grained emotion classification solution for large-scale multi-label data.
To further assess the robustness and versatility of our proposed model, RAMHA, we evaluated its performance on the GoEmotions-1, GoEmotions-2, and GoEmotions-3 datasets across different emotion-to-class mappings. As shown in Table 6, RAMHA consistently outperformed other models, achieving high scores across all evaluation metrics, including accuracy, precision, recall, and F1-score. The model demonstrated strong performance in multi-class classification of depression, stress, and anxiety, attaining a maximum accuracy of 89% and an F1-score of 0.87, highlighting its effectiveness in multi-class emotion-based mental health classification.
Table 6. Performance of RAMHA (proposed model) on GoEmotions datasets across different emotion-to-class mapping combinations, compared against various baseline models.
Notably, performance improved further under binary or alternative tri-class mappings. For instance, using the depression–stress–control mapping, RAMHA achieved an accuracy of 0.90 and an F1-score of 0.88. In the binary depression vs. control scenario, it attained its highest performance, with 93% accuracy, precision and recall between 0.91 and 0.92, and an F1-score of 0.92. These results underscore the generalization capability of RAMHA across different emotion groupings, demonstrating its practical applicability for mental health classification. The combination of adapter-based fine-tuning, BiLSTM, attention mechanisms, and focal loss proves effective for both fine-grained and coarse-grained mappings, providing a scalable and precise solution.
As a whole, Table 6 provides a broader perspective on how class similarity and opposition affect model performance by examining different class combinations. When depression, stress, and anxiety are grouped, their overlapping and semantically similar vocabulary makes it harder for the models to distinguish between them, leading to slightly lower scores. In contrast, combinations that include the control class achieve higher results, as control exhibits vocabulary patterns that are more distinct from mental health categories, making separation easier. Furthermore, reducing the task to two-class settings consistently improves performance due to less inter-class competition, with the depression–control pair achieving the highest accuracy because of its clear linguistic contrast.
Compared to full Transformer fine-tuning, RAMHA trains substantially fewer parameters, resulting in lower memory usage and faster convergence, while inference adds only a modest BiLSTM-attention overhead, preserving near-backbone latency.
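The parameter-efficiency claim is easy to verify empirically; a small helper, assuming the backbone's weights have requires_grad=False while the adapters and head remain trainable:

    def count_parameters(model):
        """Return (trainable, total, trainable share in percent) for a PyTorch model."""
        total = sum(p.numel() for p in model.parameters())
        trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
        return trainable, total, 100.0 * trainable / total

    # With RoBERTa-Large frozen, only the adapters, BiLSTM, attention, and the
    # classification head count as trainable, typically a few percent of ~355M.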

5.1. Ablation Study

Table 7 presents the ablation study results, evaluating the contribution of each architectural component in the RAMHA model across three GoEmotions datasets: GoEmotions-1, GoEmotions-2, and GoEmotions-3. Metrics reported include accuracy, precision, recall, and F1-score.
Table 7. Ablation study: effect of architectural components on performance.
Using only RoBERTa establishes a baseline, achieving accuracies between 0.80–0.81 and F1-scores of 0.77–0.79, demonstrating the effectiveness of contextual embeddings but revealing limitations in sequential modeling and interpretability. Incorporating Pfeiffer Adapters allows task-specific adaptation without retraining the full model, slightly improving performance (accuracy: 0.80–0.82, F1-score: 0.77–0.80).
Adding a BiLSTM layer captures sequential dependencies, yielding minor gains over RoBERTa alone (accuracy: 0.81–0.83, F1-score: 0.78–0.80). The combination of Adapters + BiLSTM integrates task-adaptive embeddings with sequential modeling, further improving all metrics (accuracy: 0.83–0.84, F1-score: 0.80–0.82).
Introducing Attention focuses the model on emotionally salient tokens, enhancing performance consistently across datasets (accuracy: 0.84–0.85, F1-score: 0.82–0.83). Finally, the complete RAMHA model, which combines RoBERTa, adapters, BiLSTM, attention, and focal loss, achieves the highest performance (accuracy: 0.87–0.89, F1-score: 0.85–0.87), demonstrating the complementary contributions of all components.
Overall, the incremental improvements confirm that RAMHA’s hybrid architecture effectively captures complex emotional signals in text, offering a robust and interpretable framework for multi-class mental health classification.

5.2. Comparative Study

Table 8 presents a comparative analysis of recent studies that employed the GoEmotions dataset and other emotion-related corpora. Earlier approaches, such as [], implemented classical fine-tuning methods, including SBERT with WordNet and KEA-ELECTRA, achieving relatively low F1-scores of 49.00 and 49.60, respectively. Subsequent models, such as BERT and its variants with Masked Language Modeling (MLM) and Curriculum Prompting Design (CPD), as explored in [,], yielded modest improvements, raising the F1-score to 51.25–52.34. Further gains were reported in [], where Seq2Emo, combined with concatenated datasets (SemEval18 + GoEmotions), achieved an F1-score of 59.57, highlighting the advantages of multi-source learning.
Table 8. Comparative study of existing methods based on F1-score.
The CNN+BERT+RoBERTa ensemble presented in [] demonstrated a substantial performance increase, achieving an F1-score of 84.58, indicating the potential of hybrid Transformer architectures. Building on this trend, our proposed RAMHA model—which integrates RoBERTa, adapter-based fine-tuning, BiLSTM, attention mechanisms, and focal loss—achieves a new state-of-the-art performance with an F1-score of 88.00. Moreover, RAMHA with different class mappings, such as the binary classification of depression versus control, reached a 92% F1-score and 93% accuracy.
The high F1-score of RAMHA can be attributed to a combination of dataset balancing strategies and loss optimization. Given that the original GoEmotions dataset was imbalanced across mental health categories, we applied class-balancing techniques to ensure equitable representation of depression, anxiety, and stress samples. Additionally, integrating focal loss mitigated bias toward over-represented classes and emphasized challenging-to-classify examples, which are prevalent in under-represented categories. RAMHA effectively balances deep contextual embeddings with sequential emotional dynamics, a synergy that baseline models, such as RoBERTa-only or BiLSTM-only, could not achieve, either lacking refined contextual features or failing to capture bidirectional sequential dependencies. This combination of advanced architecture and imbalance handling enabled RAMHA to detect subtle patterns across all classes, resulting in the highest F1-scores reported in Table 8.
Figure 9 (parts a,c,e) depicts the classification accuracy over 15 epochs for GoEmotions-1, GoEmotions-2, and GoEmotions-3, respectively, in a three-class setting comprising depression, anxiety, and stress. These graphs demonstrate consistent convergence and steadily increasing accuracy, highlighting the robustness of our model. Correspondingly, Figure 9 (parts b,d,f) shows the training and validation loss curves for the same three-class configurations.
Figure 9. Graph of accuracy, training loss, and validation loss of proposed model RAMHA.
Figure 9 (parts g,h) presents an alternative three-class configuration (depression, stress, and control) without anxiety, where the model maintains high performance with well-aligned loss curves. Finally, Figure 9 (parts i,j) illustrates the model's performance in a binary classification scenario (depression vs. stress), demonstrating high precision and smooth loss reduction. These results collectively indicate the efficiency and adaptability of RAMHA, even when applied to simplified or alternative mappings of emotional categories.
Figure 10 (parts a–c) depicts the confusion matrices for GoEmotions-1, GoEmotions-2, and GoEmotions-3, respectively, mapped to three classes: depression, anxiety, and stress. These matrices highlight the model's ability to differentiate closely related emotional states, where high diagonal values indicate a high true positive rate and low inter-class confusion.
Figure 10. Confusion matrices of proposed model RAMHA.
Figure 10 (part d) shows the adjusted three-class mapping (depression, stress, and control), where the exclusion of anxiety leads to clearer separability among classes, demonstrating enhanced differentiation. Figure 10 (parts e,f) examines binary classification scenarios: depression vs. stress and depression vs. control. In these cases, class boundaries are even more pronounced, with minimal misclassification, underscoring the model's robust and discriminative performance in simpler settings. Overall, the confusion matrices confirm that RAMHA achieves strong discriminative capability across both complex and reduced emotional label spaces.
Potential limitations include domain shift when applying RAMHA to different social platforms, languages, or clinical modalities. Ethical considerations include the risk of misclassification and privacy concerns; deployment should be accompanied by human oversight and compliance with data protection regulations.

6. Conclusions and Future Work

This paper introduced RAMHA (RoBERTa with Adapter-based Mental Health Analyzer), a hybrid deep learning framework designed for early detection of mental health disorders through social media text. By integrating RoBERTa with adapter layers, BiLSTM, attention mechanisms, and focal loss, RAMHA demonstrated the ability to capture nuanced emotional expressions while effectively addressing class imbalance. Experimental evaluations on three filtered versions of the GoEmotions dataset showed that RAMHA consistently outperforms traditional machine learning models (SVM, LSTM) and Transformer-based baselines (BERT, RoBERTa), achieving up to 92% accuracy in binary classification and 88% accuracy in multiclass classification. The GoEmotions-3 dataset consistently yielded superior performance across all models, reflecting its closer alignment with clinically relevant categories and clearer class boundaries. The performance differences across class groupings (Table 6) highlight the role of semantic overlap versus distinct vocabulary patterns: simplified or contrasting class arrangements improved accuracy further, underscoring the importance of dataset design. The ablation studies further confirmed the contribution of each architectural component, while comparative analysis highlighted that RAMHA surpasses the best previously reported F1-scores by a significant margin.
Although the results are encouraging, there are several avenues for future research. First, validation of RAMHA on larger and more diverse datasets, including multilingual and cross-cultural corpora, would enhance its generalizability. Second, in the future, the RAMHA model can be extended to incorporate acoustic and image modalities for mental health classification. In summary, RAMHA not only advances the state of the art in emotion-based mental health detection but also lays the groundwork for future developments in AI-driven mental healthcare systems.

Author Contributions

Conceptualization, M.K.; methodology, M.K. and L.K.; software, M.K.; validation, L.K.; formal analysis, L.K. and A.C.; investigation, M.K.; resources, A.C.; data curation, M.K.; writing—original draft preparation, M.K. and L.K.; writing—review and editing, L.K. and A.C.; supervision, L.K. and A.C.; project administration, L.K. and A.C.; funding acquisition, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI22C0646) and supported by the National Institute of Health (NIH) research project in South Korea (project No. 2024ER080300).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in Kaggle at URL https://www.kaggle.com/datasets/debarshichanda/goemotions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. What Are the Types of Mental Disorders?—my.clevelandclinic.org. Available online: https://my.clevelandclinic.org/health/diseases/22295-mental-health-disorders (accessed on 1 August 2025).
  2. International Classification of Diseases (ICD)—who.int. Available online: https://www.who.int/standards/classifications/classification-of-diseases?utm_source=chatgpt.com (accessed on 17 August 2025).
  3. Mental Health of Adolescents—who.int. Available online: https://www.who.int/news-room/fact-sheets/detail/adolescent-mental-health (accessed on 1 August 2025).
  4. Depressive Disorder (Depression)—who.int. Available online: https://www.who.int/news-room/fact-sheets/detail/depression (accessed on 1 August 2025).
  5. Jin, Y.; Liu, J.; Li, P.; Wang, B.; Yan, Y.; Zhang, H.; Ni, C.; Wang, J.; Li, Y.; Bu, Y.; et al. The Applications of Large Language Models in Mental Health: Scoping Review. J. Med. Internet Res. 2025, 27, e69284. [Google Scholar] [CrossRef]
  6. Ilias, L.; Mouzakitis, S.; Askounis, D. Calibration of transformer-based models for identifying stress and depression in social media. IEEE Trans. Comput. Soc. Syst. 2023, 11, 1979–1990. [Google Scholar] [CrossRef]
  7. Kumar, M.; Khan, L.; Chang, H.T. Evolving techniques in sentiment analysis: A comprehensive review. PeerJ Comput. Sci. 2025, 11, e2592. [Google Scholar] [CrossRef]
  8. Khan, L.; Amjad, A.; Afaq, K.M.; Chang, H.T. Deep sentiment analysis using CNN-LSTM architecture of English and Roman Urdu text shared in social media. Appl. Sci. 2022, 12, 2694. [Google Scholar] [CrossRef]
  9. Conway, M.; O’Connor, D. Social media, big data, and mental health: Current advances and ethical implications. Curr. Opin. Psychol. 2016, 9, 77–82. [Google Scholar] [CrossRef]
  10. Ernala, S.K.; Birnbaum, M.L.; Candan, K.A.; Rizvi, A.F.; Sterling, W.A.; Kane, J.M.; De Choudhury, M. Methodological gaps in predicting mental health states from social media: Triangulating diagnostic signals. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Scotland, UK, 4–9 May 2019; pp. 1–16. [Google Scholar]
  11. Tolentino, J.C.; Schmidt, S.L. DSM-5 criteria and depression severity: Implications for clinical practice. Front. Psychiatry 2018, 9, 450. [Google Scholar] [CrossRef]
  12. Hassan, A.U.; Hussain, J.; Hussain, M.; Sadiq, M.; Lee, S. Sentiment analysis of social networking sites (SNS) data using machine learning approach for the measurement of depression. In Proceedings of the 2017 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 18–20 October 2017; pp. 138–140. [Google Scholar]
  13. Khan, L.; Shahreen, M.; Qazi, A.; Jamil Ahmed Shah, S.; Hussain, S.; Chang, H.T. Migraine headache (MH) classification using machine learning methods with data augmentation. Sci. Rep. 2024, 14, 5180. [Google Scholar] [CrossRef] [PubMed]
  14. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  15. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  16. Pfeiffer, J.; Kamath, A.; Rücklé, A.; Cho, K.; Gurevych, I. Adapterfusion: Non-destructive task composition for transfer learning. arXiv 2020, arXiv:2005.00247. [Google Scholar]
  17. Han, S.; Mao, R.; Cambria, E. Hierarchical attention network for explainable depression detection on Twitter aided by metaphor concept mappings. arXiv 2022, arXiv:2209.07494. [Google Scholar] [CrossRef]
  18. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef]
  19. Kerasiotis, M.; Ilias, L.; Askounis, D. Depression detection in social media posts using transformer-based models and auxiliary features. Soc. Netw. Anal. Min. 2024, 14, 196. [Google Scholar] [CrossRef]
  20. Chiong, R.; Budhi, G.S.; Dhakal, S.; Chiong, F. A textual-based featuring approach for depression detection using machine learning classifiers and social media texts. Comput. Biol. Med. 2021, 135, 104499. [Google Scholar] [CrossRef] [PubMed]
  21. Kabir, M.; Ahmed, T.; Hasan, M.B.; Laskar, M.T.R.; Joarder, T.K.; Mahmud, H.; Hasan, K. DEPTWEET: A typology for social media texts to detect depression severities. Comput. Hum. Behav. 2023, 139, 107503. [Google Scholar]
  22. Resnik, P.; Armstrong, W.; Claudino, L.; Nguyen, T.; Nguyen, V.A.; Boyd-Graber, J. Beyond LDA: Exploring supervised topic modeling for depression-related language in Twitter. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Denver, CO, USA, 5 June 2015; pp. 99–107. [Google Scholar]
  23. Coppersmith, G.; Leary, R.; Whyne, E.; Wood, T. Quantifying suicidal ideation via language usage on social media. In Proceedings of the Joint Statistics Meetings Proceedings, Statistical Computing Section, JSM, Seattle, WA, USA, 8–13 August 2015; Volume 110, p. 8. [Google Scholar]
  24. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  25. AlSagri, H.S.; Ykhlef, M. Machine learning-based approach for depression detection in twitter using content and activity features. IEICE Trans. Inf. Syst. 2020, 103, 1825–1832. [Google Scholar] [CrossRef]
  26. Sato, J.R.; Moll, J.; Green, S.; Deakin, J.F.; Thomaz, C.E.; Zahn, R. Machine learning algorithm accurately detects fMRI signature of vulnerability to major depression. Psychiatry Res. Neuroimaging 2015, 233, 289–291. [Google Scholar] [CrossRef]
  27. Mehmood, F.; Mumtaz, N.; Mehmood, A. Next-Generation Tools for Patient Care and Rehabilitation: A Review of Modern Innovations. Actuators 2025, 14, 133. [Google Scholar] [CrossRef]
  28. Amjad, A.; Khan, L.; Chang, H.T. Effect on speech emotion classification of a feature selection approach using a convolutional neural network. PeerJ Comput. Sci. 2021, 7, e766. [Google Scholar] [CrossRef] [PubMed]
  29. Amjad, A.; Khan, L.; Ashraf, N.; Mahmood, M.B.; Chang, H.T. Recognizing semi-natural and spontaneous speech emotions using deep neural networks. IEEE Access 2022, 10, 37149–37163. [Google Scholar] [CrossRef]
  30. Ashraf, N.; Khan, L.; Butt, S.; Chang, H.T.; Sidorov, G.; Gelbukh, A. Multi-label emotion classification of Urdu tweets. PeerJ Comput. Sci. 2022, 8, e896. [Google Scholar] [CrossRef]
  31. Khan, L.; Amjad, A.; Ashraf, N.; Chang, H.T.; Gelbukh, A. Urdu sentiment analysis with deep learning methods. IEEE Access 2021, 9, 97803–97812. [Google Scholar] [CrossRef]
  32. Orabi, A.H.; Buddhitha, P.; Orabi, M.H.; Inkpen, D. Deep learning for depression detection of twitter users. In Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, New Orleans, LA, USA, 5 June 2018; pp. 88–97. [Google Scholar]
  33. Kim, J.; Lee, J.; Park, E.; Han, J. A deep learning model for detecting mental illness from user content on social media. Sci. Rep. 2020, 10, 11846. [Google Scholar] [CrossRef] [PubMed]
  34. Mehmood, F.; Mehmood, A.; Whangbo, T.K. Alzheimer’s Disease Detection in Various Brain Anatomies Based on Optimized Vision Transformer. Mathematics 2025, 13, 1927. [Google Scholar] [CrossRef]
  35. Amanat, A.; Rizwan, M.; Javed, A.R.; Abdelhaq, M.; Alsaqour, R.; Pandya, S.; Uddin, M. Deep learning for depression detection from textual data. Electronics 2022, 11, 676. [Google Scholar] [CrossRef]
  36. Vandana; Marriwala, N.; Chaudhary, D. A hybrid model for depression detection using deep learning. Meas. Sens. 2023, 25, 100587. [Google Scholar] [CrossRef]
  37. Karim, M.R.; Syeed, M.M.; Fatema, K.; Hossain, S.; Khan, R.H.; Uddin, M.F. AnxPred: A Hybrid CNN-SVM Model with XAI to Predict Anxiety among University Students. In Proceedings of the 2024 IEEE 17th International Scientific Conference on Informatics (Informatics), Poprad, Slovakia, 13–15 November 2024; pp. 132–137. [Google Scholar]
  38. Khan, L.; Amjad, A.; Ashraf, N.; Chang, H.T. Multi-class sentiment analysis of urdu text using multilingual BERT. Sci. Rep. 2022, 12, 5436. [Google Scholar] [CrossRef]
  39. Mitra, S. Suicidal Intention Detection in Tweets Using BERT-Based Transformers. In Proceedings of the 2022 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India, 4–5 November 2022. [Google Scholar]
  40. Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; De Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-efficient transfer learning for NLP. In Proceedings of the International Conference on Machine Learning. PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 2790–2799. [Google Scholar]
  41. Qasim, A.; Mehak, G.; Hussain, N.; Gelbukh, A.; Sidorov, G. Detection of depression severity in social media text using transformer-based models. Information 2025, 16, 114. [Google Scholar] [CrossRef]
  42. Tavchioski, I.; Robnik-Šikonja, M.; Pollak, S. Detection of depression on social networks using transformers and ensembles. arXiv 2023, arXiv:2305.05325. [Google Scholar] [CrossRef]
  43. Pfeiffer, J.; Rücklé, A.; Poth, C.; Kamath, A.; Vulić, I.; Ruder, S.; Cho, K.; Gurevych, I. Adapterhub: A framework for adapting transformers. arXiv 2020, arXiv:2007.07779. [Google Scholar] [CrossRef]
  44. Chakravarthi, B.R.; Rajiakodi, S.; Ponnusamy, R.; Sivagnanam, B.; Thakare, S.Y.; Thangasamy, S. Detecting caste and migration hate speech in low-resource Tamil language. In Language Resources and Evaluation; Springer: Cham, Switzerland, 2025; pp. 1–36. [Google Scholar]
  45. Sun, C.; Huang, L.; Qiu, X. Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. arXiv 2019, arXiv:1903.09588. [Google Scholar] [CrossRef]
  46. Qorich, M.; Ouazzani, R.E. BERT-Based Models with BiLSTM for Self-chronic Stress Detection in Tweets. In Proceedings of the International Conference on Artificial Intelligence and Smart Environment, Errachidia, Morocco, 23–25 November 2023; Springer: Cham, Switzerland, 2023; pp. 376–383. [Google Scholar]
  47. Zanwar, S.; Wiechmann, D.; Qiao, Y.; Kerz, E. Exploring hybrid and ensemble models for multiclass prediction of mental health status on social media. arXiv 2022, arXiv:2212.09839. [Google Scholar] [CrossRef]
  48. Thekkekara, J.P.; Yongchareon, S.; Liesaputra, V. An attention-based CNN-BiLSTM model for depression detection on social media text. Expert Syst. Appl. 2024, 249, 123834. [Google Scholar] [CrossRef]
  49. Hossain, M.M.; Hossain, M.S.; Mridha, M.; Safran, M.; Alfarhood, S. Multi task opinion enhanced hybrid BERT model for mental health analysis. Sci. Rep. 2025, 15, 3332. [Google Scholar] [CrossRef]
  50. Khan, L.; Qazi, A.; Chang, H.T.; Alhajlah, M.; Mahmood, A. Empowering Urdu sentiment analysis: An attention-based stacked CNN-Bi-LSTM DNN with multilingual BERT. Complex Intell. Syst. 2025, 11, 10. [Google Scholar] [CrossRef]
  51. Sardelich, M.; Manandhar, S. Multimodal deep learning for short-term stock volatility prediction. arXiv 2018, arXiv:1812.10479. [Google Scholar] [CrossRef]
  52. Suresh, V.; Ong, D.C. Using knowledge-embedded attention to augment pre-trained language models for fine-grained emotion recognition. In Proceedings of the 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII), Nara, Japan, 28 September–1 October 2021; pp. 1–8. [Google Scholar]
  53. Wang, K.; Jing, Z.; Su, Y.; Han, Y. Large language models on fine-grained emotion detection dataset with data augmentation and transfer learning. arXiv 2024, arXiv:2403.06108. [Google Scholar] [CrossRef]
  54. Singh, G.; Brahma, D.; Rai, P.; Modi, A. Fine-grained emotion prediction by modeling emotion definitions. In Proceedings of the 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII), Nara, Japan, 28 September–1 October 2021; pp. 1–8. [Google Scholar]
  55. Huang, C.; Trabelsi, A.; Qin, X.; Farruque, N.; Mou, L.; Zaiane, O.R. Seq2Emo: A sequence to multi-label emotion classification model. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 4717–4724. [Google Scholar]
  56. Sitoula, R.S.; Pramanik, M.; Panigrahi, R. Fine-Grained Classification for Emotion Detection Using Advanced Neural Models and GoEmotions Dataset. J. Soft Comput. Data Min. 2024, 5, 62–71. [Google Scholar] [CrossRef]
