MM-EMOG: Multi-label Emotion Graph Representation for Mental Health Classification on Social Media

Abstract

More than 80% of people who commit suicide disclose their intention to do so on social media. The main information we can use from social media is user-generated posts, since personal information is not always available. Identifying all possible emotions in a single textual post is crucial to detecting the user's mental state; however, human emotions are very complex, and a single text instance likely expresses multiple emotions. This paper proposes a new multi-label emotion graph representation for social media post-based mental health classification. We first construct a word-document graph tensor to describe emotion-based contextual representation using emotion lexicons. It is then trained with multi-label emotions, performs graph propagation to harmonise heterogeneous emotional information, and is applied to textual graph mental health classification. We perform extensive experiments on three publicly available social media mental health classification datasets, and the results show clear improvements.

1 Warning: This paper contains examples that are suicidal and depressive in nature.
2 All relevant code and data will be made available on GitHub upon acceptance.

1 Introduction

While more contextual sources may be ideal for assessing an individual's mental health state, access to these data has become increasingly restricted due to heightened data privacy concerns. This complicates research reproducibility, since each study selects features based on which social media components are available to it. Due to this trend, the main information that can be used to detect mental health issues from social media is user-generated posts. Our research focuses on detecting mental illnesses by analysing only social media textual posts, guided by the question: 'What would be the most important component from which we can identify the mental health condition using pure text from social media?' The answer can be found in the WHO's definition of mental disorder, stating that 'A mental disorder is characterized by a clinically significant disturbance in an individual's cognition, emotional regulation, or behaviour' (World Health Organization, 2022). The ideal setup for mental state detection via textual posts would identify all possible emotions and integrate those feelings and emotional statuses.
Recent studies use deep learning to fine-tune contextual embeddings using mental health classification as a downstream task (Lara et al., 2021; Sawhney et al., 2021c). However, these studies focus on learning a single emotion for a word or (Yao et al., 2019; Liu et al., 2020; Wang et al., 2022).
We adapt TextGCN (Yao et al., 2019) to learn local and global emotional trends via a graph-based structure G = (V, E, A), where V is the set of word and document nodes, E is the set of word-word, word-doc, and doc-doc edges, and A holds the edge weights.
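As a concrete illustration, propagation over G = (V, E, A) follows the standard GCN layer. The sketch below shows the symmetric-normalised propagation rule commonly used with TextGCN; it is a minimal numpy illustration, not the paper's exact implementation.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step over G = (V, E, A): add self-loops,
    symmetrically normalise the adjacency, then propagate features."""
    A_hat = A + np.eye(A.shape[0])               # self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt     # D^{-1/2} (A+I) D^{-1/2}
    return np.maximum(0.0, A_norm @ H @ W)       # ReLU activation

# toy graph: two word nodes and one document node, fully connected
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))   # initial node features
W = rng.normal(size=(4, 2))   # layer weight matrix
out = gcn_layer(A, H, W)      # one propagated representation per node
```

Stacking two such layers gives the two-layer GCN used throughout the paper.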

Node Construction
We first preprocess the post text in two steps: further de-identification of emails, usernames, and URLs by replacing them with placeholder tokens; and emoticon preservation, which retains emoticons and emojis so they are contextualised as individual tokens. We then create nodes by using each post as a document node and each token in the corpus as a word or token node. Token nodes are created through either 1) word split tokenisation (W) or 2) word-piece tokenisation (WP) using the BERT tokeniser.
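A minimal sketch of these two preprocessing steps is given below. The regular expressions and placeholder tokens are hypothetical, since the paper does not specify its exact de-identification rules.

```python
import re

# Hypothetical patterns and placeholder tokens; the paper does not
# specify its exact de-identification rules.
EMAIL = re.compile(r"\S+@\S+\.\S+")
URL = re.compile(r"https?://\S+|www\.\S+")
USER = re.compile(r"@\w+")
EMOTICON = re.compile(r"[:;=8][\-o\*']?[\)\]\(\[dDpP/\\]")

def preprocess(post: str) -> str:
    """De-identify emails, URLs, and usernames, and pad emoticons with
    spaces so tokenisers keep them as individual tokens."""
    post = EMAIL.sub("[EMAIL]", post)
    post = URL.sub("[URL]", post)
    post = USER.sub("[USER]", post)
    return EMOTICON.sub(lambda m: f" {m.group(0)} ", post).strip()
```

For example, `preprocess("see https://example.com :)")` keeps `:)` as a standalone token while masking the URL.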

Mental Health Post Classification
We evaluate MM-EMOG through a mental health post classification task (Figure 1, Step 2). Similar to Step 1, we leverage the corpus-wide co-occurrence information from TextGCN using the same graph construction method. For token node representations, we concatenate BERT and MM-EMOG embeddings and average all tokens for each document representation. Finally, the graph is passed to two layers of GCN with a final output dimension equal to the number of mental health classes. Categorical cross-entropy is used for single-label classification.
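The "concatenate and average" document representation can be sketched with numpy. The 768-dimensional sizes follow BERT; any pooling detail beyond concatenation and token averaging is our assumption.

```python
import numpy as np

def doc_representation(bert_tokens, emog_tokens):
    """Concatenate BERT and MM-EMOG embeddings per token,
    then mean-pool over tokens into one document vector."""
    concat = np.concatenate([bert_tokens, emog_tokens], axis=-1)  # (T, d1 + d2)
    return concat.mean(axis=0)                                    # (d1 + d2,)

rng = np.random.default_rng(0)
bert = rng.normal(size=(5, 768))   # 5 tokens, BERT hidden size
emog = rng.normal(size=(5, 768))   # matching MM-EMOG token embeddings
doc = doc_representation(bert, emog)   # a single 1536-dim document vector
```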

embeddings based on the lexicon used to train them. Twitter-based datasets achieve better performance when trained with TEC and SenticNet, which both include hashtags, emoticons, or emojis more frequently used on Twitter than on Reddit. This implies the importance of including these components in learning emotion representations for social media. We also compare the effect of different tokenisation methods and of further de-identification and emoticon preservation (Section 2.1). We observe that Twitter-based datasets perform better under the de-identified and emoticon-preserved setups. This may be due to the frequent use of such text on the platform.

Due to the complexity of human emotions, it is very likely that multiple emotions are expressed by a single textual post and that those emotions can be correlated. To represent emotions and their correlation with the text, we can consider two types of textual representation techniques: sequential text representation and graph-based text representation. While sequential text representation captures text features from local consecutive word sequences, graph-based text representation has attracted widespread attention for successfully modelling word and document relationships.

For wordpiece tokens, we incorporate emoticons into the tokeniser vocabulary for emoticon preservation and only apply lowercasing without additional cleaning. For word split tokens, we employ a simple text cleaning process that removes some punctuation and separates contractions. Stopwords are kept to retain negation words. Finally, for word split tokens, we initialise token nodes using GloVe embeddings and average the weights of all token nodes to represent the document node. For wordpiece tokens, we use BERT embeddings, where the learned vector for [CLS] initialises the document node and the minimum of all learned vectors for each token is used for the token nodes.

Edge Construction

We leverage all types of co-occurrence relationships between tokens and documents, using Pointwise Mutual Information (PMI) for word-to-word edges, TF-IDF for word-to-document edges, and Jaccard similarity for document-to-document edges (Han et al., 2022).
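The three edge weights can be sketched as follows. These are the textbook definitions of PMI, TF-IDF, and Jaccard similarity; the paper's exact smoothing, windowing, and normalisation choices may differ.

```python
import math

def pmi(co, cx, cy, n):
    """PMI for a word-word edge from sliding-window counts:
    co = joint count, cx/cy = individual counts, n = total windows."""
    return math.log((co / n) / ((cx / n) * (cy / n)))

def tfidf(tf, df, n_docs):
    """TF-IDF weight for a word-document edge:
    tf = term frequency in the document, df = document frequency."""
    return tf * math.log(n_docs / df)

def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between two documents' token sets
    (document-document edge)."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b)
```

Each function yields one entry of the adjacency matrix A for its edge type; edges with non-positive PMI are conventionally dropped.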
Document Emotions

We first generate document-level, multi-label emotion classes to use as targets. We leverage emotion lexicons that contain word-emotion associations³. Assume a document with words W = {w_1, ..., w_p}, where p is the number of unique words, and a lexicon containing terms K = {k_1, ..., k_q}, where q is the number of lexicon terms. Each lexicon term k_j is associated with one or more emotions EM = {em_1, ..., em_r}, where r is the number of emotion classes⁴ in the lexicon. When w_i = k_j, we extract the emotions EM_{k_j} associated with w_i. The final multi-label emotion class for the document is the union of all emotions associated with all of the words in the document: EM_d = EM_{w_1} ∪ EM_{w_2} ∪ ... ∪ EM_{w_p}.

Multi-label Emotion Training

To incorporate complex emotions into contextual embeddings, the node representations V and the adjacency matrix A are passed to a two-layer GCN, where the second layer has an output dimension of s, followed by a linear layer with an output dimension of r. We set s to 768 to match popular pretrained embeddings. Graph propagation takes the input and maps each instance to multiple emotions. ReLU is used with binary cross-entropy loss for multi-label learning. Backpropagation updates the initial representations to incorporate emotion information during model training. The learned token node representations from the second GCN layer are extracted and used as the initial weights for BERT. During BERT training, the hidden layer of the [CLS] token is used for multi-label classification through a linear layer with an output dimension of r. Similarly, we use a binary cross-entropy loss function. The learned weights are extracted as multi-emotion contextual representations: MM-EMOG EmoWord (EW) or EmoWordPiece (EWP) embeddings.
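The document-level labelling step follows directly from the union definition above and can be sketched in a few lines. The toy lexicon entries are illustrative only, not actual NRC or SenticNet associations.

```python
def document_emotions(words, lexicon):
    """Multi-label target for a document: the union of the lexicon
    emotions of every word found in the lexicon
    (EM_d = EM_w1 ∪ EM_w2 ∪ ... ∪ EM_wp)."""
    emotions = set()
    for w in words:
        emotions |= set(lexicon.get(w, ()))
    return sorted(emotions)

# toy lexicon in the style of word-emotion associations (illustrative only)
LEX = {"alone": ["sadness"], "hurt": ["sadness", "fear"], "hope": ["joy"]}
labels = document_emotions(["i", "feel", "alone", "and", "hurt"], LEX)
# labels -> ['fear', 'sadness']
```

Words absent from the lexicon contribute nothing, so a document with no lexicon matches receives an empty label set.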

Figure 2: Emotion comparison using the SenticNet lexicon.

De-identification reduces noise during model training, while preserving emoticons as separate tokens contextualises them like words. Comparing tokenisation setups, both CSSRS and Depression achieve better performance when wordpiece-tokenised, while a simple word split is better for TwitSuicide. We note that during graph construction under the word split setup, TwitSuicide's vocabulary size is only 330, while Depression and CSSRS have 1,178 and 2,673 respectively. The smaller graph of TwitSuicide allowed it to perform better in the word split setup. Longer and larger datasets benefit more from wordpiece tokenisation because of the deconstruction of out-of-vocabulary words. Finally, we compared concatenating MM-EMOG embeddings with BERT and with MentalBERT embeddings. There were no significant improvements in using one over the other, so we retain BERT embeddings for the rest of the experiments.

6 Conclusion

Mental illness detection through individual social media posts is a challenging task due to limited information. Since mental health is deeply rooted in emotions, identifying all possible emotions within the text is crucial to further enrich contextual representations. We introduced MM-EMOG (Multi-label Mental Health Emotion Graph), which contextualises and harmonises complex heterogeneous emotions through a corpus-based, multi-label learning framework. Our results show that MM-EMOG outperforms baselines on three social media mental health datasets, with notable improvements on the most concerning classes. In the future, we aim to release a pretrained MM-EMOG model with generalised emotion representations for mental health downstream tasks.

Figure 1: Class Distribution

Phrases like "…mean" and "I feel the same way" are frequently expressed in the Supportive (SU) class; however, these are directed toward situations that trigger negative emotions, like having no one to talk to or being in an unpleasant environment. For the ID class, empathy is expressed towards hopelessness and self-harm. MM-EMOG captures the emotional context that differentiates these better.

[Masked example posts with per-model label predictions omitted.]

Table 3: Best-found hyperparameters for all datasets, all lexicons, and all preprocessing setups.

Table 4: Qualitative comparison of MM-EMOG predictions against the two best-performing baseline models: BERT and MentalBERT (MBERT). Examples are masked to prevent reverse search for each post.