A Multi-Class Deep Learning Approach for Early Detection of Depressive and Anxiety Disorders Using Twitter Data

Bendebane, Lamia; Laboudi, Zakaria; Saighi, Asma; Al-Tarawneh, Hassan; Ouannas, Adel; Grassi, Giuseppe

doi:10.3390/a16120543


by Lamia Bendebane ¹,*, Zakaria Laboudi ²,*, Asma Saighi ³, Hassan Al-Tarawneh ⁴, Adel Ouannas ⁵ and Giuseppe Grassi ⁶

¹ Research Laboratory on Computer Science’s Complex Systems (ReLa(CS)2), University of Oum El Bouaghi, Oum El Bouaghi 04000, Algeria

² Department of Networks and Telecommunications, University of Oum El Bouaghi, Oum El Bouaghi 04000, Algeria

³ Laboratory of Artificial Intelligence and Autonomous Things (LIAOA), University of Oum El Bouaghi, Oum El Bouaghi 04000, Algeria

⁴ Department of Data Sciences and Artificial Intelligence, Al-Ahliyya Amman University, Amman 11942, Jordan

⁵ Department of Mathematics and Computer Science, University of Larbi Ben M’hidi, Oum El Bouaghi 04000, Algeria

⁶ Dipartimento Ingegneria Innovazione, Universita del Salento, 73100 Lecce, Italy

* Authors to whom correspondence should be addressed.

Algorithms 2023, 16(12), 543; https://doi.org/10.3390/a16120543

Submission received: 27 October 2023 / Revised: 22 November 2023 / Accepted: 22 November 2023 / Published: 27 November 2023

(This article belongs to the Special Issue Supervised and Unsupervised Classification Algorithms (2nd Edition))


Abstract:

Social media occupies an important place in people’s daily lives, where users share content on topics such as thoughts, experiences, events and feelings. The massive use of social media has generated huge volumes of data. These data constitute a treasure trove from which large amounts of relevant information can be extracted, particularly with deep learning techniques. In this context, various research studies have investigated the detection of mental disorders, notably depression and anxiety, through the analysis of data extracted from the Twitter platform. However, although these studies achieved very satisfactory results, they relied mainly on binary classification models that treat each mental disorder separately. It would be better to develop systems capable of dealing with several mental disorders at the same time. To address this point, we propose a well-defined methodology that uses deep learning to develop effective multi-class models for detecting both depression and anxiety disorders through the analysis of tweets. The idea is to test a large number of deep learning models, ranging from simple to hybrid variants, in order to examine their strengths and weaknesses. Moreover, we use the grid search technique to find suitable values for the learning rate hyper-parameter, given its importance in training models. Our work is validated through several experiments and comparisons involving various datasets and other binary classification models. The aim is to show the effectiveness of both the assumptions used to collect the data and the use of multi-class models rather than binary class models. Overall, the results obtained are satisfactory and very competitive compared to related works.

Keywords:

depressive disorder; anxiety disorder; Twitter data; deep learning; grid search

1. Introduction

In this research, we are interested in analyzing social Twitter data (tweets) to help detect psychological disorders, more specifically depression and anxiety disorders. Millions of people are now living with mental disorders, which are one of the leading causes of ill health worldwide. Therefore, early detection is crucial for rapid intervention in order to reduce the escalation of these disorders. In what follows, we first provide an overview of depression and anxiety disorders, then highlight the use of the Twitter platform to help deal with them and finally summarize the paper structure.

1.1. Overview of Depression and Anxiety Disorders

According to the World Health Organization (WHO), one in eight people (1/8) in the world suffers from a mental disorder [1]. A mental disorder is a psychiatric disorder characterized by a major alteration at a clinical level of the cognitive state, the regulation of emotions or an individual’s behavior. It is usually accompanied by a feeling of distress or functional impairments in important areas. Depressive and anxiety disorders are major social issues that are increasing every day. Indeed, millions of people suffer from depression and anxiety disorders; however, only a few of them undergo proper treatments [2].

As stated in [2], depressive disorders can take several forms and levels such as disruptive mood dysregulation disorder, major depressive disorder, persistent depressive disorder and so on. The common feature of all these variants is the presence of sad, empty or irritable moods accompanied by associated changes that affect an individual’s ability to perform their functions (e.g., somatic and cognitive changes). The difference between them lies in the duration, timing or presumed etiology. For instance, the main feature of a major depressive episode is a period of at least two weeks, during which there is either a depressed mood or loss of interest or pleasure in all or almost all activities most of the time. This condition affects women more often than men.

Anxiety disorders are characterized by excessive worry about some events or activities, occurring more days than not for at least 6 months [2]. They generally affect women more than men, especially people aged 35 to 45 years. As mentioned in [2], the diagnostic criteria for generalized anxiety disorder require the presence of at least three of the following six symptoms: restlessness or feeling keyed up or on edge, fatigue, difficulty concentrating or memory lapses, irritability, muscle tension and sleep disturbance. Table 1 presents a comparison between depressive and anxiety disorders.

1.2. Detection of Depression and Anxiety Disorders on the Twitter Platform

In general, social media allows users to post and share their feelings and moods. Analyzing this content has helped significantly in understanding several mental disorders and making predictions accordingly. More specifically, the growing popularity of Twitter (currently known as X) has made it an excellent data source for such content analyses, in particular for depression and anxiety detection. Indeed, people with severe symptoms of mental disorders are affected in their professional, family and social lives. This is why the automatic detection of these symptoms through social media would have important implications for those affected.

In this paper, we focus on the analysis of data extracted from the Twitter platform (i.e., tweets) with the aim of developing models capable of detecting mental disorders in users, more specifically depression and anxiety. In this regard, much research has been conducted in order to understand the statements expressed through tweets and to classify them into positive and negative sentiments while taking into account certain parameters (e.g., population, language, etc.). Traditional approaches used classic machine learning algorithms such as decision trees and SVMs (support vector machines) (see for instance [3,4,5,6,7,8,9]). However, as the data volumes have become very large, recent research has shifted towards deep learning techniques such as recurrent neural networks (RNN) and convolutional neural networks (CNN) (see for example [10,11]).

Even if the detection of depressive and anxiety disorders using deep learning can give satisfactory results, these approaches mainly rely on binary classification models that treat each mental disorder separately (i.e., depressive or non-depressive/anxious or non-anxious). This is because dealing with a single mental disorder is easier. Table 1 shows how difficult it is to distinguish between these mental disorders, given that they share several symptoms (e.g., disturbed sleep, fluctuations, etc.). On the other hand, some symptoms that are not shared by depression and anxiety disorders (e.g., dizziness, heart palpitations, etc.) can overlap with other conditions such as heart disease and cancer. Thus, it would be better to develop effective models capable of handling more than one mental disorder at the same time.

To fill this gap, we propose a well-defined methodology that uses deep learning to develop efficient multi-class models for detecting depression and anxiety via tweet analysis. The objective is to classify tweets into three distinct classes: normal, potentially depressive and potentially anxious. This multi-class approach should allow a better understanding and a more precise assessment of the different nuances linked to these two mental disorders when they are expressed in tweets and thus improve the sensitivity and specificity of their detection.

The basic idea of our proposal is to build and test several multi-class deep learning models, covering both simple variants and hybrid variants that combine different models. To validate our proposal, we first evaluate the performance of the tested models using different metrics. Then, the well-performing models are used to classify tweets from other datasets. Finally, we compare their performance with binary deep learning models that classify depressive and anxiety disorders separately. As a result, the accuracy of our models reaches up to 93%, which is very competitive with other related works and higher than that of binary models that predict depressive and anxiety disorders separately.

1.3. Paper Structure

The rest of this paper is organized as follows: Section 2 reviews and summarizes some related works on depression and anxiety detection with a special focus on those involving the Twitter platform. Section 3 provides the details of the proposed methodology for the detection of depressive and anxious disorders using multi-class deep learning models. Section 4 summarizes the experimental stage, gives a set of numerical results and discusses and analyzes the obtained results. Finally, Section 5 provides some concluding remarks.

2. Related Works

Many people around the world suffer from mental disorders due to several factors such as quality of life and stress. Consequently, intensive research efforts have been made in terms of diagnosis and management. In this regard, the evolution of computing technologies has further supported these efforts in different ways, notably by involving artificial intelligence [12]. Indeed, as reported in [13], artificial intelligence methods could improve psychotherapy by providing therapists and patients with real-time or near-real-time recommendations based on the patient’s response to treatment, especially since 40% of patients do not respond to psychotherapy as planned. In particular, machine learning and data mining techniques can be used to analyze a patient’s history to diagnose a problem, thereby helping to mimic human reasoning or make logical decisions [12].

Much research has been conducted on the detection of depressive and anxiety mental disorders through social media platforms [3,4,5,6,7,8,9,10,11,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38], in particular using Twitter, while considering different factors such as population, period, language, etc. Most such studies rely on supervised machine learning models for text classification using either traditional learning techniques such as SVM, RF, NB and LR or deep learning approaches such as RNN, LSTM, GRU, Bi_RNN, Bi_LSTM and Bi_GRU. In addition, some approaches are designed around the hybridization of different models, such as combining different variants of CNN with RNN (see for instance [33,37]). The general scheme of this kind of analysis consists of collecting data according to some assumptions and hypotheses (i.e., keywords, location, etc.), preprocessing these data, labeling the data according to the target classes, extracting the features, training the adopted models and finally evaluating their performance so that they can be deployed (i.e., they become ready for use). Table 2, Table 3 and Table 4 summarize and compare some typical research studies according to the classification techniques used.

3. Research Methodology

The proposed process uses multi-class classification models to categorize tweets as “normal”, “potentially depressed” or “potentially anxious”. In order to achieve these objectives, we rely on a rigorous methodology which allows us to obtain efficient classifiers by exploiting Twitter data. This process carries out a clear sequence of well-defined phases, as illustrated in Figure 1. In the following, we detail each phase by providing explanations on its role within the system.

3.1. Dataset Preparation

The goal of this phase is to obtain a large number of relevant tweets. To do so, four steps are required. First, raw data are collected using dedicated tools. Then, these data are preprocessed to make them ready for use. Next, the preprocessed data are labeled in order to bind them to one among the three classes, namely “normal”, “potentially depressed” and “potentially anxious”. Finally, the labeled data are balanced so that their numbers are approximately equal.

3.1.1. Data Collection

The aim of this step is to collect a large dataset of tweets written in English. The period of tweets related to depression and anxiety is from 1 December 2019 to 31 December 2021. This period corresponds to the circumstances of the COVID-19 pandemic, where many people were affected by the requirements of confinement, isolation, risk of illness, loss of loved ones, etc. These poor living conditions have encouraged people to use social media to express their feelings. In contrast, the period of the tweets related to normal behaviors is from 25 January 2022 to 31 January 2022.

The keywords used to collect the data were carefully inspired by the symptoms of depression and anxiety summarized in Table 1. This procedure for collecting data from Twitter is widely adopted by deep learning approaches for many purposes. In what follows, we give some typical cases. For instance, Shen et al. collected data for depression detection using keywords close to “(I’m/I was/I am/I’ve been) diagnosed depression” [36]. These data were reused in other works [5,28,36,37,38] for different purposes. Chang et al. used the disease names ‘Borderline, bpd, bipolar’ as keywords to predict borderline personality disorder (BPD) and bipolar disorder (BD) [39]. In [40], Wang collected data based on the names of five dietary supplements ‘Melatonin, Kava, Ginkgo, Biloba, Ginseng’ to predict depression, anxiety and mood disorders. Note that the use of a single word as a keyword (e.g., the name of a disease or a food supplement) does not confirm that the user is sick, so the ambiguity rate is systematically high. In contrast, using these words together with one or more symptoms within an explanatory sentence may reduce the rate of ambiguity. This is because such sentences correspond to user statements, and thus their content is more likely to contain negative sentiments and expressions that help train models.

To generate depressive and anxiety tweets, we first used patterns close to: “I am/was/have been diagnosed/identified with depression/anxiety”. The aim is to target users who self-report their issues. Then, we intensified the search around these data using other keywords related to both common and non-common symptoms between depression and anxiety disorders. For common symptoms, we used several verbs like “feel”, “suffer”, “want”, “can”, “be”, “have” under several forms (conjugated in the past and the present according to negative and affirmative forms, depending on the meaning targeted) combined with words related to “sleep”, “appetite”, “fatigue”, “suicide”, “death”, “sadness”, “melancholy”, “fear”, “worry”, under several forms (nouns, adjectives, gerunds in addition to some of their synonyms). The degree of a given symptom was expressed using adverbs such as “so”, “very”, “little” (e.g., so sad, little sad).

In the same way, we have generated depressive and anxiety tweets based on the symptoms which are not in common. For depression disorder, we used keywords close to “loss of pleasure”, “despair about the future”, “feelings of failure”. Regarding anxiety disorder, we used keywords close to “Dizziness”, “heart palpitations”, “panic attack”. All these keywords were involved under several forms such as nouns, adjectives, gerunds in addition to some of their synonyms. Finally, normal tweets were generated based on keywords related to positive sentiments and feelings such as “happiness”, “love” and “beauty”. Table 5 gives typical examples of such keywords used within some parts of sentences that can appear in tweets.
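To make the pattern-based search concrete, the following is a minimal sketch of how the self-report pattern quoted above could be expanded into individual search phrases. It is only an illustration: the function name `self_report_patterns` and the exhaustive expansion are assumptions, and the paper does not state how the authors actually generated their queries.

```python
from itertools import product


def self_report_patterns(disorders=("depression", "anxiety")):
    """Expand 'I am/was/have been diagnosed/identified with <disorder>'.

    Hypothetical helper: the word lists come from the pattern quoted in the
    text, but the exhaustive combination is an assumption.
    """
    auxiliaries = ("am", "was", "have been")
    participles = ("diagnosed", "identified")
    return [
        f"I {aux} {verb} with {disorder}"
        for aux, verb, disorder in product(auxiliaries, participles, disorders)
    ]


def degree_phrases(adjective, adverbs=("so", "very", "little")):
    """Attach the degree adverbs mentioned in the text to a symptom adjective."""
    return [f"{adverb} {adjective}" for adverb in adverbs]


patterns = self_report_patterns()
```

Each resulting phrase would then be submitted as a search query; symptom-based keywords could be expanded in the same way from their own word lists.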

Our choice to create our own dataset rests on two main points. First, in the context of deep learning, it is better to rely on large volumes of data in the hope that they lead to good performance. Second, as one of the goals of our paper is to show the effects of the nuances between depression and anxiety disorders on the training process, it is better to rely on our own datasets, provided that they follow a robust method leading to reliable data. One might also ask whether our models could be trained using data extracted from other sources such as statements, reports and questionnaires of those affected in hospitals and clinics. Unfortunately, social media have their own specificities (post format, language used, emoticons, multimedia content, etc.). So, even if a given user is affected by a mental disorder, she/he will most likely adapt to the way social media are used. Therefore, ideally, the models should be trained using data extracted from social media platforms.

3.1.2. Preprocessing of Data

The data collection phase results in three datasets, denoted as D0, D1 and D2, with a total size of over seven million tweets, as shown in Table 6. Unfortunately, these data are noisy, incomplete and unstructured, and they contain errors and redundancy; therefore, it is not recommended to analyze them directly. This is why data preprocessing is a much-needed step to obtain relevant data. In our methodology, we adopted 14 preprocessing operations: removing (1) emojis, (2) emoticons, (3) URLs, (4) hashtags (#), (5) mentions (@name), (6) special characters, (7) punctuation, (8) symbols, (9) digits, (10) repeated letters in words and (11) extra whitespace; (12) converting uppercase letters to lowercase; (13) expanding contractions (e.g., “It’s” becomes “It is”); and (14) removing NaN values and duplicates in the text column. Table 6 gives the numbers of tweets before and after preprocessing the collected data.

Figure 2 gives the word clouds, i.e., a visual representation of the most frequent keywords (tags) in the preprocessed data of datasets D0, D1 and D2.
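The preprocessing operations listed above can be sketched with regular expressions from the Python standard library. This is an illustrative approximation rather than the authors' pipeline: the contraction table is a tiny sample, whole hashtags and mentions are dropped (the paper does not say whether only the `#`/`@` sign is removed), and runs of three or more identical letters are squeezed to two. All of these choices are assumptions.

```python
import re

# Illustrative subset only; a real pipeline would need a fuller table.
CONTRACTIONS = {"it's": "it is", "i'm": "i am", "can't": "cannot", "don't": "do not"}


def clean_tweet(text):
    """Apply a simplified version of preprocessing steps (1)-(13)."""
    text = text.lower().replace("’", "'")                # (12) lowercase
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # (3) URLs
    text = re.sub(r"[@#]\w+", " ", text)                 # (4), (5) hashtags, mentions
    for short, full in CONTRACTIONS.items():             # (13) contractions
        text = text.replace(short, full)
    text = re.sub(r"[^a-z\s]", " ", text)                # (1), (2), (6)-(9) non-letters
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)       # (10) repeated letters
    return re.sub(r"\s+", " ", text).strip()             # (11) extra whitespace


def clean_corpus(tweets):
    """Clean every tweet, then drop missing, empty and duplicate entries (14)."""
    seen, cleaned = set(), []
    for tweet in tweets:
        if not isinstance(tweet, str):   # skips None / NaN-like values
            continue
        result = clean_tweet(tweet)
        if result and result not in seen:
            seen.add(result)
            cleaned.append(result)
    return cleaned
```

Stripping everything outside `a-z` is a blunt way to cover emojis, emoticons, punctuation, symbols and digits in a single pass; it is only appropriate here because the collected tweets are in English.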

3.1.3. Data Labeling

The next step is data labeling, which assigns a label to each tweet in the datasets based on its class. The tweets from datasets D0, D1 and D2 are bound to the three classes “normal”, “potentially depressed” and “potentially anxious”, respectively. Therefore, we labeled tweets from dataset D0 with value ‘0’, tweets from dataset D1 with value ‘1’ and tweets from dataset D2 with value ‘2’. This labeling is intended to produce models that only classify individual tweets as potentially indicative of depressive or anxiety disorders, or not; the analysis is therefore done at the tweet level. When a tweet is flagged, the behavior of the user concerned on social media platforms can be analyzed by other systems that further process user data in order to make decisions (user-level analysis).

In general, data collected from social media should always be taken with a certain degree of confidence. This is why we collected a large volume of data relating to users self-reporting their cases, in order to increase the degree of confidence in the statements contained in the tweets. Moreover, according to the above-stated objectives, our models may allow a certain tolerance regarding the confidence of tweets toward mental disorders because they do not make decisions about users but only classify tweets for further processing. In addition, large volumes of data are generally more suitable for deep learning approaches in order to obtain good results.

3.1.4. Balancing Data

After labeling, datasets D0, D1 and D2 are merged into a single dataset denoted as Main_dataset. Imbalanced datasets are those in which the target classes have an uneven distribution of observations, leading to the appearance of minority and majority classes [41]. This risks producing models with poor predictive performance, particularly for minority classes. Regarding our dataset, Table 6 shows that, after preprocessing, the contents of datasets D0, D1 and D2 represent approximately 32.00%, 32.63% and 35.37% of the tweets, respectively. Consequently, our main dataset is quite balanced. Next, Main_dataset is randomly divided into three balanced datasets that we refer to as Train_dataset, Test_dataset and Eval_dataset, as shown in Figure 3. Train_dataset contains 70% of the tweets from each of the datasets D0, D1 and D2, which represents 70% of the total tweets from Main_dataset; it is used to train the models. Test_dataset contains 15% of the tweets from each of the datasets D0, D1 and D2, which represents 15% of the total tweets from Main_dataset; it is used as a test dataset throughout model training. Finally, Eval_dataset contains the remaining tweets (about 15% of the total); it is used in the evaluation phase.
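The 70/15/15 division described above amounts to a stratified split: each class is shuffled and cut separately so that the three subsets keep the class proportions of Main_dataset. The sketch below shows one way to do this with the standard library; the function name, the fixed seed and the `(text, label)` tuple layout are assumptions made for illustration.

```python
import random


def stratified_split(samples, train=0.70, test=0.15, seed=42):
    """Split (text, label) pairs into train/test/eval sets, class by class.

    The remaining share (about 15% with the defaults) goes to the eval set.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))

    train_set, test_set, eval_set = [], [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_train = round(len(items) * train)
        n_test = round(len(items) * test)
        train_set += items[:n_train]
        test_set += items[n_train:n_train + n_test]
        eval_set += items[n_train + n_test:]
    return train_set, test_set, eval_set
```

Because the cut is made per class, a class that holds 32% of Main_dataset also holds roughly 32% of each subset.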

3.2. Tokenization

Tokenization is a crucial procedure in our process. It breaks up each tweet in the dataset into words called tokens. These tokens help understand the context and thus develop the model for natural language processing tasks. In our dataset, the maximum length of tweets is 131 words.
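As a rough illustration of this step, the sketch below splits cleaned tweets on whitespace, maps each token to an integer index and pads every sequence to the maximum length of 131 words mentioned above. The helper names and the convention of reserving index 0 for padding (and, for simplicity, for unknown words) are assumptions; the paper does not specify which tokenizer was used.

```python
MAX_LEN = 131  # maximum tweet length (in words) reported for the dataset


def build_vocab(tweets):
    """Assign an integer index to each distinct token; 0 is kept for padding."""
    vocab = {}
    for tweet in tweets:
        for token in tweet.split():
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def encode(tweet, vocab, max_len=MAX_LEN):
    """Convert a tweet to a fixed-length list of token indices.

    Unknown tokens are mapped to 0 here, which is a simplification.
    """
    ids = [vocab.get(token, 0) for token in tweet.split()][:max_len]
    return ids + [0] * (max_len - len(ids))
```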

3.3. Feature Extraction

This phase aims to extract the most important features from tweets. In our case, we use word embedding, which is one of the most popular representations of document vocabulary. It helps capture many useful features of a given word in a document (e.g., context, semantics, etc.). For this task, we rely on the GloVe model (Global Vectors), which produces vector representations for words by integrating global word co-occurrence statistics [42]. GloVe was developed as an open-source project at Stanford University and released in 2014. In our work, the pre-trained word vectors used are the GloVe Twitter word embeddings (200 d), which were trained on 2 billion tweets (containing 27 billion tokens and a vocabulary of 1.2 million words). These data are made available under the Public Domain Dedication and License v1.0 [43].
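Pre-trained GloVe vectors are distributed as plain text, one word per line followed by its space-separated vector components. A minimal sketch of turning such a file into an embedding matrix aligned with a token-index vocabulary is given below. The helper names are assumptions, as is the decision to leave words missing from GloVe as zero vectors; the paper does not describe how out-of-vocabulary words were handled.

```python
def load_glove(lines, dim=200):
    """Parse GloVe text lines ('word v1 v2 ... v_dim') into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:   # skip malformed lines
            continue
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def embedding_matrix(vocab, vectors, dim=200):
    """Build a (len(vocab) + 1) x dim matrix; row 0 is the padding vector.

    Words absent from the GloVe vectors keep an all-zero row (an assumption).
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for word, index in vocab.items():
        if word in vectors:
            matrix[index] = vectors[word]
    return matrix
```

In practice `lines` would be an open file handle on the 200-dimensional Twitter vectors, and the resulting matrix would initialize the embedding layer of each model.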

3.4. Training the Models

In order to build well-performing models for classifying normal, depression and anxiety cases, our proposal is based on two elements:

A hybridization that combines a CNN model with other types of neural networks, namely (1) Simple RNN, (2) LSTM, (3) GRU, (4) Bidirectional RNN (BiRNN), (5) BiLSTM and (6) BiGRU, to take advantage of their respective strengths. We thus build hybrid multi-class classifiers on our three-class labeled dataset of tweets;
The optimization of the learning rate, which is considered one of the most important parameters in deep learning-based tasks. To do so, we first adopt the Adam optimizer and initialize the learning rate to 0.0001 (the smallest value considered). Then, we use grid search optimization to find the best learning rate value for each model in the interval [0.0001, 0.001].

The result of each deep learning classifier is represented as knowledge (model.h5) in order to be used to predict normal cases and depressive and anxious disorders.
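The learning-rate search described above can be sketched independently of any deep learning framework. In the snippet below, `train_and_score` is a hypothetical callable that would train one model with the given learning rate and return its validation accuracy; the evenly spaced grid of ten values is also an assumption, since the paper only states the interval [0.0001, 0.001].

```python
def learning_rate_grid(low=0.0001, high=0.001, steps=10):
    """Evenly spaced candidate learning rates between low and high (inclusive)."""
    return [round(low + i * (high - low) / (steps - 1), 6) for i in range(steps)]


def grid_search(train_and_score, grid):
    """Return the (learning rate, score) pair with the highest score.

    train_and_score(lr) is a placeholder for 'train the model with Adam at
    this learning rate and report validation accuracy'.
    """
    best_lr, best_score = None, float("-inf")
    for lr in grid:
        score = train_and_score(lr)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr, best_score
```

Since every candidate requires a full training run, the cost of the search grows linearly with the number of grid points, which is one reason to keep the grid coarse.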

3.5. Evaluation of Models

In this phase, we evaluate the performance of all the models built. For this purpose, we use the four metrics given by Formulas (1)–(4), namely accuracy, precision, recall and F1-score, due to their wide use in the literature. These measures are calculated from the confusion matrix, which summarizes the number of correct and incorrect predictions made by a given classifier, as shown below.

\mathrm{Accuracy} = \frac{TN + TP}{TN + FP + TP + FN} \qquad (1)

\mathrm{Precision} = \frac{TP}{FP + TP} \qquad (2)

\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (3)

\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (4)

N: Negative, P: Positive, T: True, F: False

	Predicted N	Predicted P
Actual N	TN	FP
Actual P	FN	TP

(1): True Positives: when the actual and predicted values are both positive with respect to a given class (i.e., both the actual label and the label output by the model match the class label);
(2): True Negatives: when the actual and predicted values are both negative with respect to a given class (i.e., neither the actual label nor the label output by the model matches the class label);
(3): False Positives: when the actual value is negative while the predicted value is positive with respect to a given class;
(4): False Negatives: when the actual value is positive while the predicted value is negative with respect to a given class.
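For a three-class problem, the definitions above are applied to each class in turn (one class against the other two). The sketch below derives per-class precision, recall and F1-score from a multi-class confusion matrix whose rows are actual classes and whose columns are predicted classes. Computing overall accuracy as the share of diagonal entries is the usual multi-class convention and is assumed here; the function name is hypothetical.

```python
def per_class_metrics(cm):
    """Compute overall accuracy and per-class (precision, recall, F1).

    cm[i][j] is the number of samples of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(len(cm))) / total

    report = {}
    for k in range(len(cm)):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                 # actual k, predicted otherwise
        fp = sum(row[k] for row in cm) - tp  # predicted k, actually otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[k] = (precision, recall, f1)
    return accuracy, report
```

With labels 0, 1 and 2 standing for the normal, depression and anxiety classes, `report[1]` would hold the three scores for the depression class.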

4. Experiments, Numerical Results and Discussion

4.1. Software and Hardware Configuration

The training of our models was performed on an AMD Ryzen 5 4600H laptop endowed with a 3.00-GHz Radeon processor and 16 GB of RAM. The tweets composing the datasets were collected using the Twitter API and the Twarc2 Python library. Regarding the parameters of the training process, we set them empirically as follows: the number of epochs is 20, the batch size is 256, the maximum tweet length is 131 words, the embedding is GloVe 200 d and Adam is adopted as the default optimization algorithm.

4.2. Performance of the Developed Models

To build multi-class models for predicting normal, depressive and anxiety tweets, we tested around 100 models ranging from simple to hybrid models combining different types of neural network layers: convolution, recurrent, attention and bidirectional. We found that the following hybrid multi-classifiers are the most representative cases of both success and failure: CNN_RNN, CNN_LSTM, CNN_GRU, CNN_BiRNN, CNN_BiLSTM and CNN_BiGRU. The CNN_BiRNN, CNN_BiLSTM and CNN_BiGRU models perform best in all experiment instances, while the CNN_RNN and CNN_GRU models benefit the most from the grid search technique. Finally, the CNN_LSTM model represents a failure case where the grid search technique was unable to provide performance improvements. Figure 4 shows the performance of these models in terms of training accuracy and training loss. In particular, the best-performing model is CNN_BiGRU with a learning rate of 0.001.

With the learning rate set to 0.001, CNN_RNN was the worst model as it recorded poor accuracy. Moreover, CNN_LSTM and CNN_GRU also showed significant overfitting (the red and blue curves are far from each other). However, this unwanted overfitting effect gradually disappeared when the learning rate was set to 0.0001. In contrast, a learning rate of 0.001 led to better performance for CNN_BiRNN, CNN_BiLSTM and CNN_BiGRU than 0.0001, in addition to good behavior regarding overfitting. Figure 4 shows the associated curves (the curves on the left concern learning rate value 0.0001 while the curves on the right concern learning rate value 0.001).

The above results suggest that changing the learning rate of the Adam optimizer can influence the performance of each model positively or negatively. Thus, we need efficient methods to define such a value in order to obtain efficient models. In this respect, we adopt grid search, a well-known technique that serves as a hyperparameter optimizer for each model. The results are given in Table 7 and Table 8.

According to Table 7 and Table 8, the best accuracy achieved is 93.38%; it corresponds to the CNN_BiGRU model, for which the F1-score is 96% for the Normal class, 91% for the Depression class and 93% for the Anxiety class. Figure 5 shows the confusion matrices for both cases, i.e., learning rates found by grid search and fixed learning rates. It can be seen that grid search yields improvements in some cases, where the diagonal holds the largest number of correct predictions.

4.3. Evaluation and Analysis of the Well-Performing Models

In this section, we evaluate our approach regarding the quality of the data collected and the models built. The objective is twofold: (1) verify the effectiveness of the assumptions used to collect data and (2) show the effectiveness of using multi-class models rather than binary class models. To this end, we leverage the dataset used in [36] to perform an evaluation using binary class models for depression and anxiety detection. Thus, we randomly selected 12,982 tweets from Depression Dataset D1 and 2658 tweets from Non-Depression Dataset D2. After preprocessing these data, we obtained 5955 tweets labeled ‘1’ and 2325 tweets labeled ‘0’; the resulting dataset is denoted as Shen_dataset. These data are then tested with the well-performing models discussed in Table 7 and Table 8. The results are given in Table 9.

According to Table 9, the prediction accuracy on Shen_dataset is only average. This is because many depressive tweets were classified as anxious tweets by our models. Indeed, as mentioned in Table 1, depressive and anxiety disorders share some symptoms, which may lead to classification errors. Given that the tweets of Shen_dataset were collected using some keywords that overlap with anxiety disorders (e.g., “I am depressed and anxious”, “I am too tired”, “I am so sad” and “I have depression anxiety suicidal thoughts”), our models most likely classify them as anxiety tweets instead of depressive ones.

To check this issue, we reused our dataset to build two binary class models for predicting depression and anxiety separately while keeping the same parameter values. These models are based on the hybridization of CNN and Bi-GRU. Hence, Main_dataset was divided into two datasets denoted as Dataset1 and Dataset2. Dataset1 contains only normal and depressive tweets labeled, respectively, with ‘0’ and ‘1’, while Dataset2 contains only normal and anxiety tweets labeled, respectively, with ‘0’ and ‘1’. Once these models are built, we test them on Eval_dataset, Shen_dataset, Dataset1 and Dataset2 to make comparisons and thus draw conclusions. The results are given in Table 10.
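Deriving Dataset1 and Dataset2 from the three-class data is a matter of filtering and relabeling. The following is a minimal sketch assuming samples are stored as `(text, label)` pairs with labels 0 (normal), 1 (depressive) and 2 (anxious); the function name is hypothetical.

```python
def binary_subset(samples, positive_label):
    """Keep normal tweets (label 0) and one disorder class, relabeled as 1."""
    return [
        (text, 0 if label == 0 else 1)
        for text, label in samples
        if label in (0, positive_label)
    ]
```

Under these assumptions, `binary_subset(main, 1)` would give Dataset1 and `binary_subset(main, 2)` would give Dataset2.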

According to Table 10, both binary class models classify depressive tweets from Shen_dataset as depressive and anxiety tweets with very high accuracy. Regarding our datasets, the obtained results are much better. For instance, Model_2 was trained to classify depressive tweets. By evaluating Dataset2 (anxiety dataset), the accuracy is about 86.35% which means that many anxious tweets were classified as non-depressive. Likewise, by evaluating Dataset1 (depressive dataset) using Model_3, the accuracy is about 62.96%; this means that most of depressive tweets were classified as non-anxious. The conclusions we draw from these results can be summarized as follows:

The improved accuracy of the studied models stems from the way the data were collected, relying on both common and non-common symptoms instead of only using keywords related to symptoms common to depressive and anxiety disorders.
Our multi-class models seem to be more effective than the corresponding binary class models as they can resolve ambiguities. Indeed, as depressive and anxiety disorders present certain intersections, binary models most likely classify them as positive tweets (i.e., either depressive or anxious tweets) regardless of the model used (see for instance the results of using Model_2).

It should be noted that the conclusions drawn concern only the context of our work and can in no way be generalized.

4.4. Assessment of Our Proposal

Finally, we objectively assess our proposal against related works. Table 11 provides a comparison between our proposal and some other related works within the same context (i.e., those dealing with depression and/or anxiety disorders based on Twitter data), according to the following criteria:

C1: Mental disorder: this refers to the mental disorder studied, which can be either depression (denoted as Dep) or anxiety (denoted as Anx) disorders.
C2: Data collection: this refers to whether the training data were collected using keywords (e.g., symptoms, usernames, etc.) or reused from other datasets.
C3: Dataset size: this refers to the total number of tweets used to train the models.
C4: Type of learning model: this refers to whether the well-performing classifier adopts simple variants (denoted as S) or hybridization (denoted as H) of models.
C5: Type of classification: this refers to whether the well-performing classifier is a binary (denoted as B) or a multi-class (denoted as M) model.
C6: Accuracy achieved: this refers to the accuracy achieved by the well-performing classifier (measured as a percentage).

In view of the foregoing, the main potential advantage of our study is that it can be viewed as complementary to existing research on the detection of depression and anxiety disorders, for the following reasons:

In contrast to many related works that rely on binary classification, our approach is based on multi-class models;
Our study showed that multi-class classification may be more effective than binary class models as it could better resolve ambiguity issues, although this cannot be generalized;
The data were collected based on assumptions involving both common and non-common symptoms between depression and anxiety disorders.

Our approach also has some drawbacks, which are discussed below along with possible solutions. It should be noted that these limitations concern not only our approach but also much of the research conducted in the same context.

Although the data were generated according to a well-defined process, we still lack more efficient methods for collecting and labelling data (tweets). This remains a big challenge for large volumes of data, in contrast to small volumes that can be processed and annotated within a reasonable time. As ongoing work, we are currently studying the use of semantics to help collect and label the data through ontology-computing while considering emoji, emoticons and related contents.
In fact, many researchers have embarked on a frantic race to design or improve classification models for detecting mental disorders through the Twitter platform. Undoubtedly, this is very important, but it should not be an end in itself; what matters more is leveraging these models to perform useful tasks. In this line of thinking, we are currently working to deploy our models within a syndromic surveillance system in order to improve public health systems. At this level, our classification models are only used to classify tweets as potentially positive for depression or anxiety disorders or not. If a tweet is positive, the user concerned is taken into account so that their behavior on social media platforms can be studied and monitored through the syndromic surveillance system, which further processes user data (tweets) in order to make decisions and perform the required actions. Indeed, it is far from easy to decide whether a given user is affected by a mental disorder by analyzing only one or a few tweets. Such models therefore support early detection of both the onset of mental disorders in some people and the start of new episodes in those already affected. In both cases, early identification helps minimize the damage. In addition, we plan to study how the future syndromic surveillance system may help build labelled datasets with relevant data, since user behaviors undergo deeper analysis at this stage.
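
One way to picture the hand-off from tweet-level classification to user-level monitoring is sketched below. It assumes that tweet predictions arrive as (user_id, predicted_class) pairs and that a user is put forward for follow-up only when both a minimum number of tweets and a minimum share of positive tweets are reached. The thresholds, names and data are hypothetical; they illustrate the idea only and are not part of the surveillance system under development.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

POSITIVE_CLASSES = {1, 2}  # assumed encoding: 1 = depressive, 2 = anxious, 0 = normal


def flag_users(
    predictions: Iterable[Tuple[str, int]],
    min_tweets: int = 5,
    min_positive_share: float = 0.4,
) -> List[str]:
    """Return users whose share of positive tweets meets the (hypothetical) thresholds."""
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for user_id, predicted_class in predictions:
        totals[user_id] += 1
        if predicted_class in POSITIVE_CLASSES:
            positives[user_id] += 1
    return sorted(
        user
        for user, n in totals.items()
        if n >= min_tweets and positives[user] / n >= min_positive_share
    )


# Hypothetical tweet-level predictions for three users
preds = [("u1", 1)] * 3 + [("u1", 0)] * 2 + [("u2", 2)] + [("u3", 0)] * 6
flagged = flag_users(preds)
```

Requiring a minimum number of tweets reflects the point made above: a single positive tweet is not enough to draw conclusions about a user.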

5. Conclusions

The objective of this work was to study the detection of depression and anxiety disorders through data extracted from the Twitter platform (tweets) using multi-class models based on deep learning. To this end, we adopted a well-defined methodology comprising several steps: data preparation (data collection, preprocessing, labelling and balancing), tokenization, feature extraction, model training and model evaluation. The training relied on different simple multi-class models such as LSTM and GRU and hybrid ones such as CNN-LSTM and CNN-Bi-GRU. Finally, the performance of these models was evaluated through experiments, measurements and comparisons on different datasets. The experiments and analyses showed that some of the tested models predict depressive, anxious and normal tweets more effectively than the others. These models also outperformed the corresponding binary class models when tested separately. Overall, the results obtained in this research work were very satisfactory, encouraging and promising.

Author Contributions

Conceptualization, L.B., Z.L., A.S., H.A.-T., A.O. and G.G.; methodology, L.B., Z.L., A.S., H.A.-T., A.O. and G.G.; software, L.B.; validation, L.B., Z.L. and A.S.; formal analysis, H.A.-T., A.O. and G.G.; investigation, Z.L. and A.S.; resources, L.B.; data curation, L.B., Z.L., A.S., H.A.-T., A.O. and G.G.; writing—original draft preparation, L.B., Z.L., A.S., H.A.-T., A.O. and G.G.; writing—review and editing, L.B., Z.L., A.S. and A.O.; visualization, L.B., Z.L. and A.S.; supervision, Z.L. and A.S.; project administration, Z.L.; funding acquisition, L.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research has received no external funding.

Data Availability Statement

Research data used in this article may be available upon request to the corresponding author (L.B.).

Acknowledgments

The authors would like to thank the General Directorate of Scientific Research and Technological Development (DGRSDT) in Algeria, for supporting this work as a doctoral research project at the University of Oum El Bouaghi—Algeria.

Conflicts of Interest

The authors declare that they have no conflict of interest.

List of Abbreviations

WHO	World Health Organization	LR	Logistic Regression
AI	Artificial Intelligence	SVM	Support Vector Machine
NLP	Natural Language Processing	SVM-NB	Support Vector Machine-Naive Bayes
ML	Machine Learning	GBDT	Gradient-Boosted Decision Trees
DL	Deep Learning	AdaBoostM1	Adaptive Boosting M1
CNN	Convolutional Neural Network	Liblinear	Library for Large Linear Classification
RNN	Recurrent Neural Network	KNN	K-Nearest Neighbors
LSTM	Long Short-Term Memory	DT	Decision Tree
GRU	Gated Recurrent Unit	LDA	Linear Discriminant Analysis
Bi	Bidirectional	GNB	Gaussian Naive Bayes
MNB	Multinomial Naive Bayes	MDL	Minimum Description Length
SVR	Support Vector Regression	BERT	Bidirectional Encoder Representations from Transformers
LogReg	Logistic regression	USE	Universal Sentence Encoder
H, M, L	High, Medium, Low	MDHAN	Multi-Aspect Depression Detection with Hierarchical Attention Network
N, Mi, Mo, S	Normal, Mild, Moderate, Severe
ECG	Electrocardiogram
XGBoost	eXtreme Gradient Boosting
RFT	Random Forest Tree
GBC	Gradient Boosting Classifier

References

Mental Disorders. Available online: https://www.who.int/news-room/fact-sheets/detail/mental-disorders (accessed on 1 June 2023).
American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, 5th ed.; Text Revision DSM-5-TR™; American Psychiatric Association: Washington, DC, USA, 2022; pp. 178–181, 218, 250.
Jain, S.; Narayan, S.P.; Dewang, R.K.; Bhartiya, U.; Meena, N.; Kumar, V. A machine learning based depression analysis and suicidal ideation detection system using questionnaires and twitter. In Proceedings of the 2019 IEEE Students Conference on Engineering and Systems (SCES), Allahabad, India, 29–31 May 2019.
Victor, D.B.; Kawsher, J.; Labib, M.S.; Latif, S. Machine learning techniques for depression analysis on social media-case study on bengali community. In Proceedings of the 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 5–7 November 2020.
Skaik, R.; Inkpen, D. Using twitter social media for depression detection in the canadian population. In Proceedings of the 2020 3rd Artificial Intelligence and Cloud Computing Conference, Kyoto, Japan, 18–20 December 2020.
Azam, F.; Agro, M.; Sami, M.; Abro, M.H.; Dewani, A. Identifying depression among twitter users using sentiment analysis. In Proceedings of the 2021 International Conference on Artificial Intelligence (ICAI), IEEE, Islamabad, Pakistan, 5–7 April 2021.
de Jesús Titla-Tlatelpa, J.; Ortega-Mendoza, R.M.; Montes-y-Gómez, M.; Villaseñor-Pineda, L. A profile-based sentiment-aware approach for depression detection in social media. EPJ Data Sci. 2021, 10, 1–18.
Musleh, D.A.; Alkhales, T.A.; Almakki, R.A.; Alnajim, S.E.; Almarshad, S.K.; Alhasaniah, R.S.; Aljameel, S.S.; Almuqhim, A.A. Twitter arabic sentiment analysis to detect depression using machine learning. CMC 2022, 71, 3463–3477.
Mustafa, R.U.; Ashraf, N.; Ahmed, F.S.; Ferzund, J.; Shahzad, B.; Gelbukh, A. A multiclass depression detection in social media based on sentiment analysis. In Proceedings of the 17th International Conference on Information Technology—New Generations (ITNG 2020), Las Vegas, NV, USA, 5–8 April 2020; Springer International Publishing: Cham, Switzerland, 2020.
Ziwei, B.Y.; Chua, H.N. An application for classifying depression in tweets. In Proceedings of the 2nd International Conference on Computing and Big Data, Taichung, Taiwan, 18–20 October 2019.
Uddin, A.H.; Bapery, D.; Arif, A.S.M. Depression analysis of bangla social media data using gated recurrent neural network. In Proceedings of the 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh, 3–5 May 2019.
Pintelas, E.G.; Kotsilieris, T.; Livieris, I.E.; Pintelas, P. A review of machine learning prediction methods for anxiety disorders. In Proceedings of the 8th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-Exclusion, Thessaloniki, Greece, 20–22 June 2018; pp. 8–15.
Gual-Montolio, P.; Jaén, I.; Martínez-Borba, V.; Castilla, D.; Suso-Ribera, C. Using Artificial Intelligence to Enhance Ongoing Psychological Interventions for Emotional Problems in Real- or Close to Real-Time: A Systematic Review. Int. J. Environ. Res. Public Health 2022, 19, 7737.
Stephen, J.J.; Prabu, P. Detecting the magnitude of depression in Twitter users using sentiment analysis. IJECE 2019, 9, 3247–3255.
Al Asad, N.; Pranto, M.A.M.; Afreen, S.; Islam, M.M. Depression detection by analyzing social media posts of user. In Proceedings of the 2019 IEEE International Conference on Signal Processing, Information, Communication & Systems (SPICSCON), Dhaka, Bangladesh, 28–30 November 2019.
Almouzini, S.; Alageel, A. Detecting arabic depressed users from Twitter data. Procedia Comput. Sci. 2019, 163, 257–265.
Arora, P.; Arora, P. Mining twitter data for depression detection. In Proceedings of the 2019 International Conference on Signal Processing and Communication (ICSC), Noida, India, 7–9 March 2019.
Zhou, J.; Zogan, H.; Yang, S.; Jameel, S.; Xu, G.; Chen, F. Detecting community depression dynamics due to COVID-19 pandemic in Australia. IEEE Trans. Comput. Soc. Syst. 2021, 8, 982–991.
AlSagri, H.; Ykhlef, M. Quantifying feature importance for detecting depression using random forest. IJACSA 2020, 11, 628–635.
Kamite, S.R.; Kamble, V.B. Detection of depression in social media via twitter using machine learning approach. In Proceedings of the 2020 International Conference on Smart Innovations in Design, Environment, Management, Planning and Computing (ICSIDEMPC), Aurangabad, India, 30–31 October 2020.
Safa, R.; Bayat, P.; Moghtader, L. Automatic detection of depression symptoms in twitter using multimodal analysis. J. Supercomput. 2021, 78, 4709–4744.
Shetty, N.P.; Muniyal, B.; Anand, A.; Kumar, S.; Prabhu, S. Predicting depression using deep learning and ensemble algorithms on raw Twitter data. IJECE 2020, 10, 3751–3756.
Kelley, S.W.; Mhaonaigh, C.N.; Burke, L.; Whelan, R.; Gillan, C.M. Machine learning of language use on Twitter reveals weak and non-specific predictions. NPJ Digit. Med. 2022, 5, 35.
Kim, J.; Lee, J.; Park, E.; Han, J.; Kim, J.J.; Parker, S.L.; Doty, J.R.; Cunnington, R.; Gilbert, P.; Kirby, J.N. A deep learning model for detecting mental illness from user content on social media. Sci. Rep. 2020, 10, 1–6.
Lin, C.; Hu, P.; Su, H.; Li, S.; Mei, J.; Zhou, J.; Leung, H. Sense-mood: Depression detection on social media. In Proceedings of the 2020 International Conference on Multimedia Retrieval, Dublin, Ireland, 8–11 June 2020.
Ghosh, S.; Anwar, T. Depression intensity estimation via social media: A deep learning approach. IEEE Trans. Comput. Soc. Syst. 2021, 8, 1465–1474.
Basiri, M.E.; Nemati, S.; Abdar, M.; Asadi, S.; Acharya, U.R. A novel fusion-based deep learning model for sentiment analysis of COVID-19 tweets. Knowl.-Based Syst. 2021, 228, 107242.
Almars, A.M. Attention-based Bi-LSTM model for Arabic depression classification. CMC-Comput. Mater. Contin. 2022, 71, 3091–3106.
Pradhan, R.; Sharma, D.K. An ensemble deep learning classifier for sentiment analysis on code-mix Hindi–English data. Soft Comput. 2022, 27, 11053.
Kute, R. Mental health analyzer for depression detection based on textual analysis. J. Adv. Inf. Technol. 2022, 13, 67–77.
Ma, L.; Wang, Y. Constructing a semantic graph with depression symptoms extraction from twitter. In Proceedings of the 2019 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Siena, Italy, 9–11 July 2019.
Zogan, H.; Razzak, I.; Wang, X.; Jameel, S.; Xu, G. Explainable depression detection with multi-aspect features using a hybrid deep learning model on social media. World Wide Web 2022, 25, 281–304.
Bendebane, L.; Laboudi, Z.; Saighi, A. Mental Disorders Prediction from Twitter Data: Application to Syndromic Surveillance Systems. In Proceedings of the Novel & Intelligent Digital Systems Conferences, Athens, Greece, 28–29 September 2023.
Govindasamy, K.A.; Palanichamy, N. Depression detection using machine learning techniques on twitter data. In Proceedings of the 2021 5th international conference on intelligent computing and control systems (ICICCS), Madurai, India, 6–8 May 2021.
Santos, W.R.D.; de Oliveira, R.L.; Paraboni, I. SetembroBR: A social media corpus for depression and anxiety disorder prediction. Lang. Resour. Eval. 2023, 1–28.
Shen, G.; Jia, J.; Nie, L.; Feng, F.; Zhang, C.; Hu, T.; Chua, T.-S.; Zhu, W. Depression detection via harvesting social media: A multimodal dictionary learning solution. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017.
Kour, H.; Gupta, M.K. An hybrid deep learning approach for depression prediction from user tweets using feature-rich CNN and bi-directional LSTM. Multimed. Tools Appl. 2022, 81, 23649–23685.
Shen, T.; Jia, J.; Shen, G.; Feng, F.; He, X.; Luan, H.; Tang, J.; Tiropanis, T.; Chua, T.S.; Hall, W. Cross-domain depression detection via harvesting social media. In Proceedings of the International Joint Conferences on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018.
Chang, C.H.; Saravia, E.; Chen, Y.S. Subconscious Crowdsourcing: A feasible data collection mechanism for mental disorder detection on social media. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, CA, USA, 18–21 August 2016.
Wang, Y.; Zhao, Y.; Bian, J.; Zhang, R. Detecting signals of associations between dietary supplement use and mental disorders from Twitter. In Proceedings of the 2018 IEEE International Conference on Healthcare Informatics Workshop (ICHI-W), New York, NY, USA, 4–7 June 2018.
Tyagi, S.; Mittal, S. Sampling approaches for imbalanced data classification problem in machine learning. In Proceedings of the ICRIC 2019: Recent Innovations in Computing, Jammu & Kashmir, India, 8–9 March 2019; Springer International Publishing: Berlin/Heidelberg, Germany, 2019.
Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, 25–29 October 2014.
Pennington, J.; Socher, R.; Manning, C.D. “GloVe: Global Vectors for Word Representation”. Available online: https://nlp.stanford.edu/projects/glove/ (accessed on 18 June 2023).
Gruda, D.; Hasan, S. Feeling anxious? Perceiving anxiety in tweets using machine learning. Comput. Hum. Behav. 2019, 98, 245–255.
Leung, J.; Chung, J.Y.C.; Tisdale, C.; Chiu, V.; Lim, C.C.W.; Chan, G. Anxiety and Panic Buying Behaviour during COVID-19 Pandemic—A Qualitative Analysis of Toilet Paper Hoarding Contents on Twitter. Int. J. Environ. Res. Public Health 2021, 18, 1127.
Al-Laith, A.; Alenezi, M. Monitoring People’s Emotions and Symptoms from Arabic Tweets during the COVID-19 Pandemic. Information 2021, 12, 86.

Figure 1. The proposed methodology for building effective classifiers of mental disorders detection.

Figure 2. Word cloud of each dataset after preprocessing; (a) Word cloud of dataset D0; (b) Word cloud of dataset D1; (c) Word cloud of dataset D2.

Figure 3. Distribution of the tweets from Main_dataset.

Figure 4. Comparison between training and test for accuracy and loss of hybrid models; (a) learning rate 0.001; (b) learning rate 0.0001.

Figure 5. Comparison between confusion matrix of the hybrid model; (a) CNN_RNN; (b) CNN_LSTM; (c) CNN_GRU; (d) CNN_BiRNN; (e) CNN_BiLSTM; (f) CNN_BiGRU.
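
Per-class F1-scores and overall accuracy of the kind reported in Tables 7 and 8 can be derived directly from a confusion matrix such as those shown in Figure 5. The sketch below uses the standard definitions; the 3×3 matrix is hypothetical and does not reproduce any panel of the figure.

```python
from typing import List


def per_class_f1(cm: List[List[int]]) -> List[float]:
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted as another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def accuracy(cm: List[List[int]]) -> float:
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total if total else 0.0


# Hypothetical confusion matrix (rows: true class, columns: predicted class)
cm = [
    [90, 5, 5],    # normal
    [4, 80, 16],   # depressive
    [6, 10, 84],   # anxious
]
f1 = per_class_f1(cm)
acc = accuracy(cm)
```

In this toy matrix most errors fall between the depressive and anxious classes, which is the kind of confusion discussed for Shen_dataset.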

Table 1. Differences and commonalities between depressive and anxiety disorders [2].

Type of Symptoms		Depressive Disorder	Anxiety Disorder
Physical Diagnoses	Age	-	35–45
	Duration of the disorder	15 days	6 months
	Gender	Women > Male
	In common with the same degree	Disturbed sleep, fluctuations in appetite or weight, agitation, anxiety, isolation (absenteeism) and sexual inhibition.
	In common but of different degree	Intense fatigue (loss of energy) * Suicidal thoughts *	Intense fatigue (loss of energy) * Suicidal thoughts *
	Which are not common points	-	Dizziness, heart palpitations.
Psychological diagnoses	In common with the same degree	Difficulty concentrating, fear, excessive worry and nightmares.
	In common but of different degree	Sad/melancholy ***	Sad/melancholy *
	Which are not common points	Loss of interest (loss of pleasure = anhedonia, despair about the future), feelings of guilt or failure, low self-esteem.	Panic attack

The symbols ‘*’, ‘**’ and ‘***’ indicate the degree of a symptom, corresponding to 33%, 66% and 100%, respectively.

Table 2. Comparison of recent studies using traditional machine learning approaches to detect mental disorders from different data sources.

Ref.	Year	Data Source	Language	Prediction	ML Approaches	Accuracy (%)	F1-Score (%)
[3]	2019	Questionnaire (D1) Twitter (D2)	English	5 Levels of Depression	[RFT, XGBoost, LR, SVM]	D1: [76.34, 83.87, 59.22, 76.50] D2: [82.05, 84.02, 86.45, 85.44]	-
[4]	2020	Twitter	Bengali	Depression	[DT, RF, SVM, LR, NB, KNN]	[90.0, 90.3, 90.1, 90.2, 90.2, 90.2]	[90.1, 90.3, 90.3, 90.3, 90.3, 90.2]
[5]	2020	Twitter	English	Depression	[SVM, LR, RF, GBDT, XGBoost]	D1: [91.2, 92.7, 94.4, 96.0, 96.4] D2: [84.8, 87.9, 89.3, 91.1, 86.4]	D1: [89.9, 91.6, 93.5, 96.1, 95.8] D2: [80.0, 78.4, 77.9, 81.1, 88.7]
[6]	2021	Twitter	English	Depression	[RF, SVM]	[77.0, 73.0]	-
[8]	2022	Twitter	Arabic	Depression	[SVM, RF, LR, KNN, AdaBoost, NB]	RF: [82.39]	RF: [82.53]
[9]	2020	Twitter	English	Depression	SVM [H, M, L]	[86, 91, 86]	[84, 85, 85]
[9]	2020	Twitter	English	Depression	RF [H, M, L]	[80, 83, 83]	[72, 66, 84]
[11]	2019	Twitter	Bangla	Depression	GRU	75.7	-
[15]	2019	Twitter + Facebook	English	6 Level of Depression	SVM-NB	74	-
[16]	2019	Twitter + Patient Health Questionnaire (PHQ-9)	Arabic	Depression	[RF, NB, AdaBoostM1, Liblinear]	[83, 75.6, 55.2, 87.5]	[82.8, 75.6, 53.2, 87.5]
[17]	2019	Twitter	English	Depression	[MNB, SVR]	[78, 79.7]	-
[18]	2021	Twitter	English	Depression	Multi Model + TF-IDF feature: [LR, LDA, GNB]	[90.3, 90.4, 87.9]	[90.2, 90.3, 87.8]
[19]	2020	Twitter	English	Depression	RF	84.7	66.7
[20]	2021	Twitter	English	Depression	[NB, RF]	-	[94.87, 99.89]
[21]	2022	Twitter	English	Depression	GBC	91	89
[22]	2020	Twitter	English	Depression	[LSTM, CNN]	[93, 95]	-
[23]	2023	Twitter	English	Depression	[SVM, RF]	[59, 57]	[54, 53]

Table 3. Comparison of recent studies using simple deep learning approaches to detect mental disorders from different data sources.

Ref.	Year	Source	Language	Prediction	DL Approach	Accuracy (%)	F1-Score (%)
[24]	2020	Reddit	English	Depression and Non-Depression	[XGBoost, CNN]	[71.69, 75.13]	(depression, N-depression) [(58.02, 78.65), (79.49, 68.41)]
				Anxiety and Non-Anxiety	[XGBoost, CNN]	[70.41, 77.81]	(Anxiety, N-Anxiety) [(55.92, 77.73), (56.25, 85.14)]
				Bipolar and Non-Bipolar	[XGBoost, CNN]	[85.53, 90.20]	(Bipolar, N-Bipolar) [(53.59, 91.43), (52.95, 94.53)]
				BPD and Non-BPD	[XGBoost, CNN]	[85.14, 90.49]	(BPD, N-BPD) [(46.43, 91.37), (48.21, 94.76)]
				Schizophrenia and Non-Schizophrenia	[XGBoost, CNN]	[86.72, 94.33]	(Schizo, N-Schizo) [(40.97, 92.52), (38.07, 97.03)]
[25]	2020	Twitter	English	Depression	SenseMood system	88.39	93.60
[26]	2021	Twitter	English	Depression	LSTM-MDL-fine tuner	87.14	-
[27]	2022	Twitter + Google trends	English	Positive or Negative Opinions about COVID-19	[Proposed Model, CNN, BiGRU, FastText, NBSVM, DistilBERT]	[85.8, 81.6, 79.7, 79.6, 79.8, 85.5]	[85.8, 81.5, 79.7, 79.6, 79.8, 85.5]
[28]	2022	Twitter	Arabic	Depression	Attention-based Bi-LSTM	83	-
[29]	2022	Twitter	Hindi-English	Depression	[LSTM, BERT, USE, Proposal]	[65, 60, 60, 67]	-
[30]	2022	Twitter	Indian	Depression	[CNN, LSTM, Bi-LSTM]	[98.00, 94.84, 97.10]	-

Table 4. Comparison of recent studies using hybrid deep learning approaches to detect depression and anxiety disorders from different data sources.

Ref	Year	Data Source	Language	Prediction	Hybrid Approach	Accuracy (%)	F1-Score (%)
[32]	2022	Twitter	English	Depression	MDHAN	89.5	89.3
[33]	2023	Twitter	English	Normal, Depression and Anxiety	CNN-BiLSTM	88.93	[Normal, Dep, Anx]: [86, 90, 91]
[34]	2021	Twitter	English	Depression	[NB, NBTree]	D1: [92.34, 97.31] D2: [92.34, 97.31]	-
[35]	2023	Twitter	Portuguese	Depression and Anxiety	[LogReg, LSTM, CNN, BERT]	-	Dep: [58, 53, 52, 63] Anx [55, 50, 47, 61]

Table 5. Typical keywords used as parameters to collect our dataset.

Normal Tweets (D0)	Depressed Tweets (D1)	Anxious Tweets (D2)
To be full of the joys of spring. Feel relaxed/good/excited/alright/buzzing/in love. Enjoy my life. Walking on air. On top of the world. Over the moon. I am happy. Beautiful life. Peaceful mind.	I am/was/have been diagnosed with depression. I am/was/have been identified as depressed. I am depressed. I feel depressed. People do not die from suicide they die from sadness. Sometimes I am sad tired miserable for no reason at all. Nothing more depressing. I feel lost inside of myself.	I am/was/have been diagnosed with anxiety. I am/was/have been identified as anxious. I am anxious. I feel anxious. I am/feel scared. I am terrified. I have had dizziness for more than six months. I have had heart palpitations for more than six months.

Table 6. Number of tweets before and after preprocessing sub-steps.

Datasets	Tweets before Preprocessing	Tweets after Preprocessing	Percentage of Data after Preprocessing (%)
D0 (Normal)	2,892,049	1,017,101	32.00
D1 (Depressed)	2,295,038	1,037,050	32.63
D2 (Anxious)	1,996,568	1,124,419	35.37
Total Dataset	7,183,655	3,178,570	100.00
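
The last column of Table 6 gives each class's share of the cleaned dataset rather than the fraction of tweets that survived preprocessing. The short check below recomputes both quantities from the counts in the table; the retention figures are derived here for illustration and are not reported in the original table.

```python
# Tweet counts taken from Table 6: (before preprocessing, after preprocessing)
counts = {
    "D0 (Normal)": (2_892_049, 1_017_101),
    "D1 (Depressed)": (2_295_038, 1_037_050),
    "D2 (Anxious)": (1_996_568, 1_124_419),
}

total_before = sum(before for before, _ in counts.values())
total_after = sum(after for _, after in counts.values())

# Share of the cleaned dataset held by each class (last column of Table 6)
share_after = {name: 100 * after / total_after for name, (_, after) in counts.items()}

# Fraction of each class that survived preprocessing (derived, not in the table)
retained = {name: 100 * after / before for name, (before, after) in counts.items()}

for name in counts:
    print(f"{name}: share {share_after[name]:.2f}%, retained {retained[name]:.2f}%")
```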

Table 7. The evaluation of our models on the evaluation dataset (Eval_dataset), based on fixed learning rate values for Adam optimizer.

N°	Models	Fixed Learning Rate	Accuracy (%)	F1-Score Class 0 (%)	F1-Score Class 1 (%)	F1-Score Class 2 (%)
1	CNN_RNN [33]	0.0001	36.07	69.00	19.00	2.00
2	CNN_LSTM [33]	0.0001	72.76	62.00	67.00	88.00
3	CNN_GRU [33]	0.0001	80.17	85.00	79.00	77.00
4	CNN_BiRNN [33]	0.0001	87.27	92.00	84.00	86.00
5	CNN_BiLSTM [33]	0.0001	88.93	86.00	90.00	91.00
6	CNN_BiGRU [33]	0.0001	87.94	85.00	87.00	92.00
7	CNN_RNN	0.001	35.42	0.00	0.00	52.00
8	CNN_LSTM	0.001	57.02	49.00	56.00	64.00
9	CNN_GRU	0.001	78.22	77.00	73.00	83.00
10	CNN_BiRNN	0.001	89.65	93.00	87.00	89.00
11	CNN_BiLSTM	0.001	91.82	92.00	91.00	93.00
12	CNN_BiGRU	0.001	93.38	96.00	91.00	93.00
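
The effect of the two fixed learning rates can be read off Table 7 programmatically. The snippet below copies the accuracy values from the table and picks, for each architecture, the learning rate that gave the higher accuracy; the variable names are illustrative only.

```python
# Accuracy (%) from Table 7, keyed by architecture and fixed learning rate
accuracy = {
    "CNN_RNN": {0.0001: 36.07, 0.001: 35.42},
    "CNN_LSTM": {0.0001: 72.76, 0.001: 57.02},
    "CNN_GRU": {0.0001: 80.17, 0.001: 78.22},
    "CNN_BiRNN": {0.0001: 87.27, 0.001: 89.65},
    "CNN_BiLSTM": {0.0001: 88.93, 0.001: 91.82},
    "CNN_BiGRU": {0.0001: 87.94, 0.001: 93.38},
}

# For each architecture, the learning rate with the higher accuracy
best_lr = {model: max(by_lr, key=by_lr.get) for model, by_lr in accuracy.items()}

# Overall best (model, learning rate) pair
best_model, best_rate = max(
    ((m, lr) for m, by_lr in accuracy.items() for lr in by_lr),
    key=lambda pair: accuracy[pair[0]][pair[1]],
)
```

On these numbers the unidirectional variants do better with 0.0001, while the bidirectional variants do better with 0.001.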

Table 8. The evaluation of our models on the evaluation dataset (Eval_dataset), using grid search to determine the learning rate value for Adam optimizer.

N°	Models	Learning Rate (Grid Search)	Accuracy (%)	F1-Score Class 0 (%)	F1-Score Class 1 (%)	F1-Score Class 2 (%)
13	CNN_RNN_gs	0.0002	73.17	72.00	73.00	75.00
14	CNN_LSTM_gs	0.0008	55.74	33.00	55.00	75.00
15	CNN_GRU_gs	0.0006	88.24	89.00	84.00	90.00
16	CNN_BiRNN_gs	0.0001	88.51	92.00	86.00	87.00
17	CNN_BiLSTM_gs	0.0007	92.20	95.00	90.00	92.00
18	CNN_BiGRU_gs	0.0006	92.75	96.00	91.00	92.00
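
A grid search over the learning rate amounts to training once per candidate value and keeping the best-scoring one. The sketch below is library-agnostic: `train_and_score` is a hypothetical callback standing in for a full training and validation run, and the candidate grid (0.0001 to 0.001 in steps of 0.0001) is an assumption suggested by the values in Table 8, not a setting confirmed by the source.

```python
from typing import Callable, Iterable, Tuple


def grid_search_lr(
    candidates: Iterable[float],
    train_and_score: Callable[[float], float],
) -> Tuple[float, float]:
    """Return (best_learning_rate, best_score) over the candidate grid."""
    best_lr, best_score = None, float("-inf")
    for lr in candidates:
        score = train_and_score(lr)  # e.g. validation accuracy after training with this lr
        if score > best_score:
            best_lr, best_score = lr, score
    if best_lr is None:
        raise ValueError("candidates must not be empty")
    return best_lr, best_score


# Assumed grid: 0.0001, 0.0002, ..., 0.001
grid = [round(i * 1e-4, 4) for i in range(1, 11)]


def toy_score(lr: float) -> float:
    """Stand-in objective peaking at lr = 0.0006; a real run would train a model."""
    return 1.0 - abs(lr - 0.0006) * 1000


best_lr, best_score = grid_search_lr(grid, toy_score)
```

In practice the callback would build the model, compile it with the Adam optimizer at the given learning rate, train it, and return a validation metric.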

Table 9. Prediction of tweets from Shen_dataset using our well-performing models.

N°	Models	Accuracy (%)	Predict Class 0 (Tweets)	Predict Class 1 (Tweets)	Predict Class 2 (Tweets)	Correct Prediction	Convergence Ratio (%)
15	CNN_GRU_gs	88.24	2241	4778	1261	6832	82.51
16	CNN_BiRNN_gs	88.51	1740	2771	3769	4259	51.44
17	CNN_BiLSTM_gs	92.20	1884	3630	2766	5410	65.34
18	CNN_BiGRU_gs	92.75	1474	4002	2804	5213	62.96
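
The convergence ratio in Table 9 appears to be the share of correct predictions among all evaluated tweets; this reading is inferred from the numbers rather than stated in the table. The check below recomputes it from the table's own columns.

```python
# Rows of Table 9: model -> (predicted class 0, class 1, class 2, correct predictions)
table9 = {
    "CNN_GRU_gs": (2241, 4778, 1261, 6832),
    "CNN_BiRNN_gs": (1740, 2771, 3769, 4259),
    "CNN_BiLSTM_gs": (1884, 3630, 2766, 5410),
    "CNN_BiGRU_gs": (1474, 4002, 2804, 5213),
}

totals = {}
convergence = {}
for model, (c0, c1, c2, correct) in table9.items():
    totals[model] = c0 + c1 + c2  # number of tweets evaluated for this model
    convergence[model] = round(100 * correct / totals[model], 2)
```

Every row sums to the same number of evaluated tweets, and the recomputed ratios match the last column of the table.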

Table 10. The CNN-BiGRU classifiers used to predict normal cases and depression and anxiety disorders on different datasets.

Models	Training Dataset	Type of Classification	Prediction	Evaluation Dataset	Accuracy (%)
Model_1	Train_Dataset	Multi-class	Normal, Depressed and Anxiety	Eval_dataset	92.75
Model_1	Train_Dataset	Multi-class	Normal, Depressed and Anxiety	Shen_dataset	62.96
Model_2	Dataset1	Binary-class	Normal and Depressed	Dataset2	86.35
Model_2	Dataset1	Binary-class	Normal and Depressed	Shen_dataset	95.34
Model_3	Dataset2	Binary-class	Normal and Anxiety	Dataset1	69.97
Model_3	Dataset2	Binary-class	Normal and Anxiety	Shen_dataset	94.84

Table 11. Comparison of our proposal with some related works.

Work	C1		C2		C3	C4		C5		C6
Work	Dep	Anx	Keyword-Based	Reused		S	H	B	M	(%)
[5]	X		-	from [36]	D1: 292,564	X		X		96.40
[6]	X		Diagnosis	-	89,776	X		X		77.00
[8]	X		Diagnosis	-	4542	X			X	82.39
[9]	X		Not mentioned	-	156,511	X			X	91.00
[15]	X		Tweets of specific users	-	2832		X		X	74.00
[18]	X		Tweets during COVID-19	-	94,707,264	X			X	90.40
[19]	X		Diagnosis	-	1 million	X		X		84.70
[25]	X		-	from [36]	D1: 292,564 D2: 10 billion D3: 35 million		X	X		88.39
[26]	X		-	from [36]	D1: 292,564	X			X	87.14
[36]	X		Diagnosis	-	D1: 292,564 D2: >10 billion D3: 35,067,677	X		X		85.00
[44]		X	Work and feeling	-	D1: 600	X		X		-
[44]		X	1418 users	-	D2: >3 million	X		X		-
[45]		X	Hashtags on toilet paper (COVID-19)	-	255,171	X			X	-
[46]		X	Hashtags on COVID-19	-	300,000	X		X		75.00
Our proposal	X	X	Diagnosis and symptoms	-	3,178,570		X		X	93.38

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

