An Encoder-Only Transformer Model for Depression Detection from Social Network Data: The DEENT Approach
Abstract
1. Introduction
- Two encoder-only-based models (DEENT-Generic and DEENT-Bert) that effectively detect depression during the COVID-19 pandemic from a Twitter dataset, in terms of accuracy, balanced accuracy, precision, and F1-score.
- A labeled dataset, built using BERT and K-means clustering, containing non-depressive and depressive tweets.
2. Related Work
3. DEENT
3.1. Pipeline
3.2. Business and Data Understanding
3.3. Data Engineering
3.4. Depression Twitter Dataset
- A pre-trained model called covid-twitter-bert-v2-mnli [38] (available on Hugging Face [39]) and based on BERT was used to compute, for each tweet (only the text column of the sentiment-oriented dataset was used), the probability of belonging to each of two candidate labels (depressive and non-depressive). This model was chosen because it had already been fine-tuned for classification problems related to the COVID-19 pandemic. Furthermore, this Transformer-based model generates word representations that depend on the context in which the words appear, allowing it to handle ambiguous tweets. Remarkably, covid-twitter-bert-v2-mnli had not previously been used for mental illness classification as we do in this paper. From a development perspective, version 4.40.2 of the Hugging Face Transformers library was used to download covid-twitter-bert-v2-mnli and compute the depression-related probability scores (see the first sketch after this list).
- The K-means clustering algorithm was applied to the dataset containing the probability scores obtained in the previous step. We used K-means since it can handle borderline cases by assigning data points to the closest centroid, effectively defining decision boundaries for each label. In particular, we varied the number of clusters k from 2 to 9 and evaluated the K-means outcome with the Silhouette coefficient (values close to 1 are desirable) to determine the quality of the clustering process. Results revealed that the coefficient was highest (0.613) when K-means operated with k = 2. Remarkably, K-means grouped the tweets as depressive (label = 1) and non-depressive (label = 0) and, consequently, produced a dataset for binary depression classification. The resultant dataset (see Table 3) was imbalanced, including 70,509 (56.87%) non-depressive tweets and 53,475 (43.13%) depressive ones. From a development perspective, Scikit-learn 1.4.1 was used to build the K-means clustering model and Matplotlib 3.7.1 to plot the figures and analyze the results (see the second sketch after this list).
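The first sketch below illustrates the zero-shot labeling step with the Hugging Face zero-shot-classification pipeline and the covid-twitter-bert-v2-mnli checkpoint. It is a minimal illustration rather than the authors' code: the input/output file names and the text column name are assumptions.

```python
# Minimal sketch of the zero-shot scoring step (not the authors' exact code).
# The file names and the "text" column name are assumptions.
from transformers import pipeline
import pandas as pd

classifier = pipeline(
    "zero-shot-classification",
    model="digitalepidemiologylab/covid-twitter-bert-v2-mnli",
)
candidate_labels = ["depressive", "non-depressive"]

df = pd.read_csv("sentiment_tweets.csv")  # hypothetical file name
results = classifier(df["text"].tolist(), candidate_labels)

# Keep the probability assigned to the "depressive" label for each tweet.
df["depressive_score"] = [
    dict(zip(r["labels"], r["scores"]))["depressive"] for r in results
]
df.to_csv("tweets_with_scores.csv", index=False)
```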
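The second sketch covers the clustering step with Scikit-learn's KMeans and the Silhouette coefficient. The feature layout (a single depressive_score column), the random seed, and the silhouette sample size are assumptions made to keep the example self-contained and tractable.

```python
# Minimal sketch of the K-means labeling step (not the authors' exact code).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

df = pd.read_csv("tweets_with_scores.csv")  # hypothetical file name
X = df[["depressive_score"]].to_numpy()

# Vary k from 2 to 9 and keep the model with the highest Silhouette coefficient.
best_score, best_km = -1.0, None
for k in range(2, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    score = silhouette_score(X, km.labels_, sample_size=10_000, random_state=42)
    print(f"k={k}: silhouette={score:.3f}")
    if score > best_score:
        best_score, best_km = score, km

# With k = 2, map the cluster whose centroid has the higher depressive score
# to label 1 (depressive) and the other to label 0 (non-depressive).
depressive_cluster = int(best_km.cluster_centers_[:, 0].argmax())
df["label"] = (best_km.labels_ == depressive_cluster).astype(int)
df.to_csv("depression_dataset.csv", index=False)
```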
3.5. DEENT-Generic
3.6. DEENT-Bert
4. Evaluation
4.1. Performance Metrics
4.2. DEENT Training
4.3. Baseline
- SVM [16] analyzes the training data and finds the optimal separating hyperplane, i.e., the decision boundary that maximizes the margin (defined by the support vectors) between the two classes in binary classification. SVM was configured with a linear kernel.
- RF [15] is an ensemble ML algorithm that combines many decision tree predictors and aggregates their outputs to achieve more accurate and robust predictions. This algorithm was configured with 100 estimators, a maximum tree depth of 150, and a random state equal to 42 to ensure reproducibility.
- XGBoost [17] is an optimized distributed gradient boosting algorithm designed for efficient and scalable ML model training, combining predictions from multiple weak models to produce a more robust prediction. This algorithm was trained with a learning rate equal to 0.1, a maximum tree depth of 15, and 150 estimators (a sketch of the SVM, RF, and XGBoost baselines is given after this list).
- RNNs [18] are neural networks with feedback connections, which allow them to maintain internal states. These internal states provide a memory that can hold information about previous inputs. We set up an LSTM with embedding and dense layers, an Adam optimizer with a learning rate equal to 0.001, and a binary cross-entropy loss function. The LSTM included 64 neurons and a dropout equal to 0.5, followed by a first dense layer with 32 neurons, a second with 8, and a third with 2 (all using the Tanh activation function), and a final layer with a single neuron and a Sigmoid activation function.
- CNN [19] has convolutional and pooling layers. Each convolutional layer computes dot products between the weights of its filters and a local region of the input volume to which they are connected. Pooling layers are placed between the convolutional layers to reduce the number of parameters and computations. We trained a CNN with an embedding layer, a Conv1D layer, dense layers, an Adam optimizer with a learning rate equal to 0.001, and a binary cross-entropy loss function. The Conv1D layer included 64 filters activated with the Tanh function. Before the dense layers, a dropout equal to 0.5 was set up to prevent overfitting during training. The first dense layer included 32 neurons, the second 8, and the third 2, all with the Tanh activation function. The last layer used a single neuron with a Sigmoid activation function (a Keras-style sketch of the LSTM and CNN baselines is given after this list).
- MentalBERT [47] is a model based on the encoder-only Transformer architecture, proposed for detecting mental illnesses such as depression, anxiety, and suicidal ideation. Notably, as this model was pre-trained on data related to mental health, we fine-tuned it with the depression-oriented dataset, the Adam optimizer, and a learning rate of 1.5 × 10⁻⁴ (a fine-tuning sketch is given after this list).
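The sketch below reproduces the SVM, RF, and XGBoost baselines with the hyperparameters reported above. The feature extraction (a TF-IDF vectorizer), the train/test split, and the file name are assumptions, since those details are not specified in this list.

```python
# Minimal sketch of the classical baselines; vectorizer settings, split, and
# file name are assumptions, while the model hyperparameters follow the text.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

df = pd.read_csv("depression_dataset.csv")  # hypothetical file name
train_texts, test_texts, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

vectorizer = TfidfVectorizer(max_features=20_000)  # assumed setting
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

baselines = {
    "SVM": SVC(kernel="linear"),
    "RF": RandomForestClassifier(n_estimators=100, max_depth=150, random_state=42),
    "XGBoost": XGBClassifier(learning_rate=0.1, max_depth=15, n_estimators=150),
}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.4f}")
```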
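A Keras-style sketch of the LSTM and CNN baselines follows. The vocabulary size, embedding dimension, Conv1D kernel size, and pooling layer are assumptions not stated above; the layer widths, dropout, activations, optimizer, and loss follow the descriptions.

```python
# Minimal sketch of the LSTM and CNN baselines; VOCAB_SIZE, EMBED_DIM, the
# Conv1D kernel size, and the pooling layer are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM = 20_000, 128  # assumed values

def build_lstm():
    return models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.LSTM(64),
        layers.Dropout(0.5),
        layers.Dense(32, activation="tanh"),
        layers.Dense(8, activation="tanh"),
        layers.Dense(2, activation="tanh"),
        layers.Dense(1, activation="sigmoid"),
    ])

def build_cnn():
    return models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.Conv1D(64, kernel_size=5, activation="tanh"),  # kernel size assumed
        layers.GlobalMaxPooling1D(),
        layers.Dropout(0.5),
        layers.Dense(32, activation="tanh"),
        layers.Dense(8, activation="tanh"),
        layers.Dense(2, activation="tanh"),
        layers.Dense(1, activation="sigmoid"),
    ])

for build in (build_lstm, build_cnn):
    model = build()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=["accuracy"],
    )
    # model.fit(padded_sequences, labels, epochs=..., batch_size=...)
```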
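Finally, the MentalBERT baseline can be fine-tuned with the Hugging Face Trainer as sketched below. The checkpoint name, maximum sequence length, batch size, and number of epochs are assumptions; only the learning rate of 1.5 × 10⁻⁴ is taken from the description above, and the Trainer's default AdamW optimizer stands in for Adam.

```python
# Minimal sketch of the MentalBERT fine-tuning baseline; the checkpoint name,
# sequence length, batch size, and epochs are assumptions.
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "mental/mental-bert-base-uncased"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

df = pd.read_csv("depression_dataset.csv")  # hypothetical file name
dataset = Dataset.from_pandas(df[["text", "label"]])
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=64),
    batched=True,
)
dataset = dataset.train_test_split(test_size=0.2, seed=42)

args = TrainingArguments(
    output_dir="mentalbert-depression",
    learning_rate=1.5e-4,            # value reported in the text
    per_device_train_batch_size=32,  # assumed
    num_train_epochs=3,              # assumed
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```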
4.4. Results and Analysis
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
AI | Artificial Intelligence
AUROC | Area Under the Receiver Operating Characteristics
BERT | Bidirectional Encoder Representations from Transformers
BiGRU | Bidirectional Gated Recurrent Unit
CNN | Convolutional Neural Network
DL | Deep Learning
FN | False Negatives
FP | False Positives
GAI | Generative Artificial Intelligence
HAN | Hierarchical Attention Network
KNN | K-Nearest Neighbors
LSTM | Long Short-Term Memory
MDHAN | Multi-Aspect Depression Detection with Hierarchical Attention Network
ML | Machine Learning
NB | Naive Bayes
NLP | Natural Language Processing
RF | Random Forest
RNN | Recurrent Neural Network
SVM | Support Vector Machine
TF-IDF | Term Frequency-Inverse Document Frequency
TN | True Negatives
TP | True Positives
WHO | World Health Organization
XGBoost | Extreme Gradient Boosting Machine
References
- Depression. Available online: https://www.who.int/es/news-room/fact-sheets/detail/depression (accessed on 10 December 2024).
- Mental Disorders. Available online: https://www.who.int/news-room/fact-sheets/detail/mental-disorders (accessed on 10 December 2024).
- Depression. Available online: https://www.nimh.nih.gov/health/topics/depression (accessed on 10 December 2024).
- Deshpande, M.; Rao, V. Depression detection using emotion artificial intelligence. In Proceedings of the International Conference on Intelligent Sustainable Systems, ICISS 2017, Palladam, India, 7–8 December 2017; pp. 858–862. [Google Scholar] [CrossRef]
- Cacheda, F.; Fernandez, D.; Novoa, F.J.; Carneiro, V. Early detection of depression: Social network analysis and random forest techniques. J. Med. Internet Res. 2019, 21, e12554. [Google Scholar] [CrossRef] [PubMed]
- Sher, L. Post-COVID syndrome and suicide risk. QJM 2021, 114, 95–98. [Google Scholar] [CrossRef]
- Latoo, J.; Haddad, P.M.; Mistry, M.; Wadoo, O.; Islam, S.M.S.; Jan, F.; Iqbal, Y.; Howseman, T.; Riley, D.; Alabdulla, M. The COVID-19 pandemic: An opportunity to make mental health a higher public health priority. BJPsych Open 2021, 7, e172. [Google Scholar] [CrossRef] [PubMed]
- Li, S.; Wang, Y.; Xue, J.; Zhao, N.; Zhu, T. The impact of COVID-19 epidemic declaration on psychological consequences: A study on active weibo users. Int. J. Environ. Res. Public Health 2020, 17, 2032. [Google Scholar] [CrossRef]
- Adikari, A.; Nawaratne, R.; de Silva, D.; Ranasinghe, S.; Alahakoon, O.; Alahakoon, D. Emotions of COVID-19: Content analysis of self-reported information using artificial intelligence. J. Med. Internet Res. 2021, 23, e27341. [Google Scholar] [CrossRef] [PubMed]
- Simjanoski, M.; Ballester, P.L.; da Mota, J.C.; De Boni, R.B.; Balanzá-Martínez, V.; Atienza-Carbonell, B.; Bastos, F.I.; Frey, B.N.; Minuzzi, L.; Cardoso, T.d.A.; et al. Lifestyle predictors of depression and anxiety during COVID-19: A machine learning approach. Trends Psychiatry Psychother. 2022, 44, e20210365. [Google Scholar] [CrossRef]
- Jha, I.P.; Awasthi, R.; Kumar, A.; Kumar, V.; Sethi, T. Learning the mental health impact of COVID-19 in the United States with explainable artificial intelligence: Observational study. JMIR Ment. Health 2021, 8, e25097. [Google Scholar] [CrossRef]
- Huma; Sohail, M.K.; Akhtar, N.; Muhammad, D.; Afzal, H.; Mufti, M.R.; Hussain, S.; Ahmed, M. Analyzing COVID-2019 Impact on Mental Health through Social Media Forum. Comput. Mater. Contin. 2021, 67, 3737–3748. [Google Scholar] [CrossRef]
- Adarsh, V.; Arun Kumar, P.; Lavanya, V.; Gangadharan, G. Fair and Explainable Depression Detection in Social Media. Inf. Process. Manag. 2023, 60, 103168. [Google Scholar] [CrossRef]
- Zogan, H.; Razzak, I.; Wang, X.; Jameel, S.; Xu, G. Explainable depression detection with multi-aspect features using a hybrid deep learning model on social media. World Wide Web 2022, 25, 281–304. [Google Scholar] [CrossRef]
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2323. [Google Scholar] [CrossRef]
- Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489. [Google Scholar] [CrossRef]
- Thushari, P.D.; Aggarwal, N.; Vajrobol, V.; Saxena, G.J.; Singh, S.; Pundir, A. Identifying discernible indications of psychological well-being using ML: Explainable AI in reddit social media interactions. Soc. Netw. Anal. Min. 2023, 13, 141. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
- Liu, M.; Xue, J.; Zhao, N.; Wang, X.; Jiao, D.; Zhu, T. Using Social Media to Explore the Consequences of Domestic Violence on Mental Health. J. Interpers. Violence 2021, 36, NP1965–NP1985. [Google Scholar] [CrossRef]
- Kalt, T. A New Probabilistic Model of Text Classification and Retrieval; Technical Report; University of Massachusetts: Amherst, MA, USA, 1998. [Google Scholar]
- Krasker, W.S. Estimation in linear regression models with disparate data points. Econometrica 1980, 48, 1333. [Google Scholar] [CrossRef]
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. [Google Scholar]
- Batool, A.; Byun, Y.C. Enhanced Sentiment Analysis and Topic Modeling During the Pandemic Using Automated Latent Dirichlet Allocation. IEEE Access 2024, 12, 81206–81220. [Google Scholar] [CrossRef]
- Meng, Q.M.; Wu, W.G. Artificial emotional model based on finite state machine. J. Cent. S. Univ. Technol. 2008, 15, 694–699. [Google Scholar] [CrossRef]
- Pourkeyvan, A.; Safa, R.; Sorourkhah, A. Harnessing the Power of Hugging Face Transformers for Predicting Mental Health Disorders in Social Networks. IEEE Access 2024, 12, 28025–28035. [Google Scholar] [CrossRef]
- Bombieri, M.; Rospocher, M.; Dall’Alba, D.; Fiorini, P. Automatic detection of procedural knowledge in robotic-assisted surgical texts. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 1287–1295. [Google Scholar] [CrossRef] [PubMed]
- Mehta, D.; Dwivedi, A.; Patra, A.; Anand Kumar, M. A transformer-based architecture for fake news classification. Soc. Netw. Anal. Min. 2021, 11, 39. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar] [CrossRef]
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar] [CrossRef]
- Metzner, C.S.; Gao, S.; Herrmannova, D.; Lima-Walton, E.; Hanson, H.A. Attention Mechanisms in Clinical Text Classification: A Comparative Evaluation. IEEE J. Biomed. Health Inf. 2024, 28, 2247–2258. [Google Scholar] [CrossRef]
- Studer, S.; Bui, T.B.; Drescher, C.; Hanuschkin, A.; Winkler, L.; Peters, S.; Müller, K.R. Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology. Mach. Learn. Knowl. Extr. 2021, 3, 392–413. [Google Scholar] [CrossRef]
- Mann, S. Depressive/non-Depressive Tweets Between Dec’19 to Dec’20. Available online: https://ieee-dataport.org/open-access/depressivenon-depressive-tweets-between-dec19-dec20 (accessed on 10 December 2024).
- Loper, E.; Bird, S. NLTK: The Natural Language Toolkit. arXiv 2002, arXiv:cs/0205028. [Google Scholar] [CrossRef]
- Müller, M.; Salathé, M.; Kummervold, P.E. COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter. arXiv 2020, arXiv:2005.07503. [Google Scholar]
- Hugging Face: covid-twitter-bert-v2-mnli. Available online: https://huggingface.co/digitalepidemiologylab/covid-twitter-bert-v2-mnli (accessed on 10 December 2024).
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Int. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
- Hugging Face BERT Model. Available online: https://huggingface.co/tiya1012/swmh4_bert (accessed on 10 December 2024).
- Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation. In Proceedings of the AI 2006: Advances in Artificial Intelligence, Hobart, Australia, 4–8 December 2006; pp. 1015–1021. [Google Scholar] [CrossRef]
- García, V.; Mollineda, R.; Sánchez, J. Index of balanced accuracy: A performance measure for skewed class distributions. In Pattern Recognition and Image Analysis; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5524, pp. 441–448. [Google Scholar] [CrossRef]
- Boutaba, R.; Salahuddin, M.A.; Limam, N.; Ayoubi, S.; Shahriar, N.; Solano, F.E.; Rendón, O.M.C. A comprehensive survey on machine learning for networking: Evolution, applications and research opportunities. J. Internet Serv. Appl. 2018, 9, 16. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar] [CrossRef]
- Ruby, U.; Yendapalli, V. Binary cross entropy with deep learning technique for Image classification. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 5393–5397. [Google Scholar] [CrossRef]
- Ji, S.; Zhang, T.; Ansari, L.; Fu, J.; Tiwari, P.; Cambria, E. MentalBERT: Publicly Available Pretrained Language Models for Mental Healthcare. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France, 20–25 June 2022; pp. 7184–7190. [Google Scholar] [CrossRef]
Reference | Data Source | Samples | Collection Time | Domain | Algorithm | Metric | Transformer Architecture
---|---|---|---|---|---|---|---
[8] | | 17,864 | During COVID-19 | Sentiment Analysis | Naive Bayes, Linear Regression | |
[9] | | 73,000 | During COVID-19 | Sentiment Analysis | Latent Dirichlet Allocation | |
[10] | Google Form Survey | 22,562 | During COVID-19 | Depression and Anxiety Classification | RF, XGBoost | Balanced Accuracy, Sensitivity, Specificity |
[11] | Google Form Survey | 17,764 | During COVID-19 | Mental Illnesses Classification | RF, SVM | Accuracy, Sensitivity, Specificity, AUROC |
[12] | | 5877 msgs; 1000 related to depression | During COVID-19 | Mental Illnesses Classification | SVM, RF, NB | Precision, Recall, F1-Score |
[13] | | 12,911 msgs; 4996 related to depression | During COVID-19 | Depression and Suicide Classification | SVM, RF, XGBoost, CNN, SVM + KNN | Accuracy, Precision, Recall, F1-Score, AUROC |
[14] | | 447,856 | Before COVID-19 (2009 to 2016) | Depression Classification | MDHAN, SVM, BiGRU, MBiGRU, CNN, MCNN, HAN | Accuracy, Precision, Recall, F1-Score |
[21] | | 46,103 msgs; 16,205 related to depression | Before COVID-19 | Mental Illnesses Classification | XGBoost, RF, CNN, LSTM, BERT, and MentalBERT | Accuracy, Precision, Recall, F1-Score | Encoder-only
DEENT | | 123,984 tweets; 53,475 related to depression | During COVID-19 (December 2019 to December 2020) | Depression Classification | DEENT-Generic and DEENT-Bert | Accuracy, Balanced Accuracy, Precision, Recall, F1-Score | Encoder-only
Index | Text | Sentiment * |
---|---|---|
0 | rising cases of covid does not alarm me rising death rate does more testing capacity means more cases are detected earlier and asymtomatics and mild cases are identified india is in scary place go check out their graphs | 1 |
1 | please vote for chicagoindiaresolution marking india independence shared values of democracy human rights secularism | 0 |
2 | wishing all of you eidaladha hazrat ibrahim as ki sunnah aap sab ko mubarak in most parts of india | 1 |
3 | daily coronavirus cases in india top for first time covid | 1 |
4 | sitting here india style watching the raindrops hit this big ass pond listening to amy winehouse finallay understand what zahree was talking about | 0 |
Index | Text | Target * |
---|---|---|
0 | rising cases covid alarm rising death rate testing capacity means cases detected earlier asymtomatics mild cases identified india scary place go check graphs | 0 |
1 | please vote chicagoindiaresolution marking india independence shared values democracy human rights secularism | 1 |
2 | wishing eidaladha hazrat ibrahim ki sunnah aap sab ko mubarak parts india | 0 |
3 | daily coronavirus cases india top first time covid | 1 |
4 | sitting india style watching raindrops hit big ass pond listening amy winehouse finallay understand zahree talking | 0 |
Algorithm | Balancing Method | Recall
---|---|---
DEENT-Bert | None | 77.77%
DEENT-Bert | Weighted Loss Function | 80.54%
DEENT-Generic | None | 76.31%
DEENT-Generic | SMOTE | 80.72%
MentalBERT | None | 77.54%
MentalBERT | Weighted Loss Function | 79.98%
RNN | None | 76.83%
RNN | SMOTE | 80.38%
CNN | None | 77.19%
CNN | SMOTE | 77.92%
SVM | None | 74.66%
SVM | SMOTE | 78.16%
XGBoost | None | 69.70%
XGBoost | SMOTE | 71.81%
RF | None | 70.78%
RF | SMOTE | 75.82%