Accurate Diagnosis of Suicide Ideation/Behavior Using Robust Ensemble Machine Learning: A University Student Population in the Middle East and North Africa (MENA) Region

Suicide is one of the most critical public health concerns in the world and the second leading cause of death among young people in many countries. However, to date, no study has used a machine learning approach to diagnose suicide ideation/behavior among university students in the Middle East and North Africa (MENA) region. Therefore, stability feature selection and stacked ensemble decision trees were employed in this classification problem. A total of 573 university students responded to a battery of questionnaires. Three-fold cross-validation with a variety of performance indices was used. The proposed diagnostic system had excellent balanced diagnosis accuracy (AUC = 0.90 [CI 95%: 0.86–0.93]) with a high correlation between predicted and observed class labels, fair discriminant power, and excellent class labeling agreement rate. Results showed that 23 items out of all items could accurately diagnose suicide ideation/behavior. These items were psychological problems and the way trauma was experienced (from the demographic variables), nine items from the Post-Traumatic Stress Disorder Checklist (PCL-5), two items from Post-Traumatic Growth (PTG), two items from the Patient Health Questionnaire (PHQ), six items from the Positive Mental Health (PMH) questionnaire, and one item related to social support. Such features could be used as a screening tool to identify young adults who are at risk of suicide ideation/behavior.


Introduction
According to the World Health Organization's (WHO) report [1], close to 800,000 people die by suicide globally each year. Suicide, as one of the most critical public health concerns, is the second leading cause of death among young people between the ages of 15 and 29 around the world. In many countries, this age group includes high school and university students. In England, the rate of university students' suicide has increased since 2009 [2], and, in the USA, death by suicide was ranked second among the same age group [3]. Suicide is more prevalent in developing countries, and, according to global statistics, around 79% of annual suicides occur in developing countries [1]. Despite some claims that the suicide rate is lower in countries with a dominant religious culture, a recent study including 12 countries with large Muslim populations in Asia, the Middle East, and Africa found that the rate of university student suicide in these countries is comparable to the global data. Iran, as one of the developing countries located in the Middle East and North Africa (MENA) region, followed the global trend of suicide among university students. The university student suicide rates in Iran for men, women, and both were 4.20%, 2.90%, and 3.60%, respectively [4]. In one epidemiological survey of attempted suicide in Iran during 2007-2011, 72% of cases were below 30 years of age [5]. In another study, the outcome and varieties of violent suicides during 20 years in the Southwest of Iran were investigated, and the age group of 25-34 years was found to have the highest number of attempted suicides [6]. Bakhtar and Rezaeian [7] found that the rate of suicide attempts in the Iranian population was between 1.8% and 3.5%, and that of suicide ideation between 6.2% and 42.7%.
Because suicide is one of the leading causes of death for young people, suicide and measures to prevent it are among the most critical health priorities in the "Mental Health Action Plan 2013-2020" [8]. To prevent suicide, researchers from different disciplines have tried to explore its predictors. Social and economic factors such as marital conflicts [5], a romantic breakup [9,10], and unemployment [11] are associated with suicide ideation and behavior. Yet, psychological issues are often considered the leading reasons for suicide. Depression is strongly related to suicide in many studies [5,12,13,14].
Exposure to trauma and post-traumatic stress disorder (PTSD) is yet another psychological issue that is significantly associated with suicide [15][16][17][18]. Trauma is defined as a deeply distressing and unpredictable event, which can occur directly or indirectly to individuals, and has ongoing adverse effects on the person's physical, social, emotional, or psychological functioning [19]. The most common age for exposure to traumatic events is 16 to 20 [20], which overlaps with the typical age group for suicide ideation and behavior. Repeated trauma exposure increases the risk of suicide behavior, and this finding is true for both natural and human-made traumatic events [21]. In the WHO mental health survey [22], 102,245 households in 21 countries reported a range of traumatic events that were associated with suicidal behaviors. The most prevalent types of traumatic events across both developed and developing countries were the death of a loved one, witnessing interpersonal violence, war, accidents, or a traumatic event happening to a loved one. In this study, 9.6% of the population reported suicide ideation, and 2.8% had attempted suicide. Human-made trauma such as sexual and interpersonal violence had the most substantial effect on suicide behavior, and the number of traumatic events influenced suicide ideation and attempts.
The time gap between suicide ideation and a suicide attempt is critical in mental healthcare. Many researchers have found a relationship between experiencing a traumatic event or post-traumatic stress disorder (PTSD) and a higher risk of suicide ideation, and there is an association between the number and type of traumatic events and suicidality [22]. Several factors can protect people from suicidal behaviors. Positive mental health (PMH) [23] is one of these protective factors. Some recent studies found that people with positive mental health (psychological and subjective well-being) are less likely to attempt suicide, even if they have suicide ideation [24][25][26][27][28]. Teismann and colleagues [29] found a moderating effect of PMH on the association of depression with suicidal thoughts among university students. Post-traumatic growth (PTG) can also affect suicide ideation and behaviors after a traumatic event [30][31][32]. Post-traumatic growth, as Tedeschi and Calhoun [33] defined it, refers to significant positive psychological changes in cognitive and emotional life resulting from the struggle with new conditions following traumatic or extremely stressful events. In particular, there are five domains of PTG: a greater appreciation of life, stronger social connections, developing a sense of power, recognizing new opportunities in life, and more spirituality [34]. People who experience PTG may be less likely to attempt suicide. Other factors, such as social support [30], resilience [35], and cognitive distortion [36], have also been shown to affect the relationship between suicide ideation and a suicide plan/attempt.
Although some studies have attempted to predict the risk of suicide by investigating risk and protective factors, there is little knowledge of clinical prediction rules for university students' suicide ideation/behavior. Since suicide is a multi-faceted phenomenon with complex risk and protective factors, predicting suicide ideation/behavior requires algorithms that "model complex relationships among a large number of factors" [37]. Recently, machine learning (ML) and related techniques have started to enter the field of psychiatry and psychology with regard to suicide [38,39]. Passes and colleagues [17] predicted suicide in a sample of people with a mood disorder and found that previous hospitalization for major depression, PTSD, drug dependence, and psychosis could strongly predict the risk of death by suicide. In another study, Kessler and colleagues [40] predicted suicide in a sample of soldiers and veterans and found male sex, criminal offenses, and previous suicide ideation to be the most influential risk factors for completed suicide. Kuroki [41] used a data mining approach to predict suicidal behaviors among 624 Filipino Americans and found depression and substance use disorder to be the critical predictors of suicidal ideation. This study, however, did not focus on the sensitivity and specificity of the proposed diagnosis system. DelPozo-Banos and colleagues [13] explored the feasibility of using artificial neural networks to identify suicide risk among 2604 completed suicide cases. Prescription of psychotropics, depression, anxiety, and self-harm increased the risk of suicide in this system. The sensitivity, specificity, and accuracy of the proposed system were 64.57%, 81.86%, and 73.22%, respectively. Walsh and colleagues [42] utilized the ML approach to predict suicide attempts among adolescents under the age of 18. The data used for this analysis were extracted from electronic health records between 1998 and 2015 from a single medical center. The AUC in this system was 0.9. However, the sample group (adolescents) was different from that of the current study. In another study, Walsh and colleagues [43] conducted the same study among the adult population and found that the ML approach could predict the risk of suicide with an accuracy of 0.84. Both studies focused on clinical samples that differ from the target group of the current study. Another study comparable to ours is that of Ribeiro and colleagues [44]. Although it showed a good performance (AUC: 0.9), the population was different (adults with a history of self-injury), and it did not focus on important psychological variables that have been shown to be related to suicide, such as positive mental health, post-traumatic stress disorder, and post-traumatic growth. Apparently, the existing studies that used big data to predict suicide used a limited number of inputs or omitted psychologically important inputs such as PTSD, although, as mentioned before, experiencing traumatic events and suicide risk are correlated. Moreover, there is an overlap between the age of trauma exposure [20] and the age of suicide risk [1] among university students. To the best of our knowledge, no study has diagnosed suicide ideation/behavior among university students in the MENA region, and likely in the world, with the help of machine learning.
Moreover, a reliable medical diagnosis system must simultaneously meet the following conditions: sensitivity, specificity, positive predictive value, and diagnostic odds ratio higher than 80%, 95% [45], 95% [46], and 100% [47], respectively. The methods developed in the literature do not meet such conditions. Therefore, this study aimed to diagnose suicide ideation/behavior through a machine learning approach toward a clinically reliable diagnosis.

Sample Size Calculation and Sampling
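Assuming a suicide ideation prevalence of p = 26% [48], the total sample size (N) for a precision of d = 5% and a type I error of α = 0.01 can be calculated as follows [49,50]:
$$N = \frac{Z_{1-\alpha/2}^{2}\; p\,(1-p)}{d^{2}}$$
where $Z_{1-\alpha/2}$ is the corresponding quantile of the standard normal distribution.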
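The sample-size arithmetic can be checked with a short sketch. It assumes the standard single-proportion formula N = Z²·p·(1 − p)/d², which reproduces the 511 subjects reported for this study; the function name is illustrative and does not come from the original analysis.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, d, alpha):
    """Minimum N to estimate a prevalence p with precision d at type I error alpha.

    Assumes the usual normal-approximation formula for a single proportion.
    """
    # Two-sided standard normal quantile, e.g. about 2.576 for alpha = 0.01.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil(z * z * p * (1.0 - p) / (d * d))


# Values stated in the text: prevalence 26%, precision 5%, alpha 0.01.
n_required = sample_size_for_proportion(p=0.26, d=0.05, alpha=0.01)
print(n_required)
```

With the stated inputs the unrounded value is about 510.6, so rounding up gives the minimum number of respondents.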
A total of 511 subjects were sufficient. We used an online sampling method to recruit the participants. This method is suggested to be an efficient, affordable, and practical method of sampling in national and international studies [51], and has been used in a number of studies with student populations [52][53][54]. A link to the questionnaire was created through Google Forms and was distributed through different platforms, including the universities' WhatsApp, Telegram, or Instagram groups. The recruitment process began in March 2020 and was stopped in May 2020 when enough completed questionnaires had been received; finally, 573 questionnaires with complete information were analyzed. Demographic characteristics consisted of age, gender, grade and educational level, the field of study, marital and occupational status, and history of psychological illness diagnosis. Moreover, the following six questionnaires were used.

PTSD Checklist (PCL-5)
The PTSD Checklist (PCL-5) [55] is a self-report checklist. This checklist has 20 items and screens the symptoms of PTSD and their severity in the last month. The checklist determines the most prevalent symptoms of PTSD, including intrusion, avoidance, cognitive and mood alteration, and arousal and reactivity alteration, on a Likert scale from 0 (not at all) to 4 (extremely). The scores range between 0 and 80, and 31 is considered the cut-off score. A score of 31 or higher on this scale indicates more PTSD symptoms, with a specificity, sensitivity, and efficiency of 0.95, 0.85, and 0.95, respectively [56]. In this study, the Iranian version of the PCL-5 was used [57]. A Cronbach's alpha of 0.939 was found for the current sample.

Post-Traumatic Growth Inventory (PTGI)
The Post-Traumatic Growth Inventory (PTGI) [58] assesses growth after traumatic events. This inventory has 21 items that focus on the five domains that may positively change after traumatic events. These domains include more and greater social connections, new possibilities, higher perceived skills and resources, more appreciation of life, and strengthened spiritual beliefs. The inventory is evaluated on a 6-point Likert scale (0-5). Scores range between 0 and 105 with a cut-off point of 45, and higher scores indicate more growth [59]. The PTGI has well-established validity and reliability, with Cronbach's alpha values between 0.67 and 0.90 [58]. In this study, the Iranian version of the PTGI was used [60]. The Cronbach's alpha for the current sample was 0.937.

Multidimensional Scale of Perceived Social Support (MSPSS)
"Family, Friends, and Significant Others" are the three domains of perceived social support that are assessed by the Multidimensional Scale of Perceived Social Support (MSPSS), a self-report scale [64]. A 7-point Likert scale (1-7) assesses the level of perceived social support, and the total score ranges between 12 and 84 with a cut-off point of 48. Higher scores show a higher sense of having social support. In this study, the Iranian version of the MSPSS with good validity and reliability was used [65]. The Cronbach's alpha for the current sample was 0.922.

Positive Mental Health Scale (PMH)
The Positive Mental Health Scale (PMH) [66] assesses emotional and psychological aspects of well-being using nine items. Each item scores between 0 to 3 (do not agree-agree) and the higher scores show higher PMH with a cut-off point of 15. The scale showed good validity in different populations. In this study, the Iranian version of the PMH was used [67]. In the current sample, the Cronbach's alpha was 0.905.
The cut-off values for the PMH and MSPSS questionnaires were calculated using the cut-off estimation method for the receiver operating characteristic (ROC) curve proposed by Unal [68]. In this method, the "optimal" cut-point (c) is defined as the point that minimizes the following error function:
$$IU(c) = \left|Se(c) - AUC\right| + \left|Sp(c) - AUC\right|$$
where Se and Sp are the sensitivity and specificity obtained when comparing the results with the gold standard (high and low suicide risk), and AUC is the area under the ROC curve.
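The cut-point search can be illustrated with a minimal sketch. It assumes that the error function is the Index of Union associated with Unal's method, IU(c) = |Se(c) − AUC| + |Sp(c) − AUC|, and that higher scores indicate the positive class; for scales such as PMH or MSPSS, where lower scores indicate higher risk, the scores would need to be negated first. The data and function names are invented for illustration only.

```python
def auc_mann_whitney(pos, neg):
    """AUC as the probability that a positive case outscores a negative one."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def index_of_union_cutoff(scores, labels):
    """Return (cutoff, IU, Se, Sp) minimizing |Se - AUC| + |Sp - AUC|.

    A case is called positive when its score is >= cutoff.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    auc = auc_mann_whitney(pos, neg)
    best = None
    for c in sorted(set(scores)):
        se = sum(s >= c for s in pos) / len(pos)
        sp = sum(s < c for s in neg) / len(neg)
        iu = abs(se - auc) + abs(sp - auc)
        if best is None or iu < best[1]:
            best = (c, iu, se, sp)
    return best, auc


# Hypothetical toy data: eight respondents, label 1 marks the high-risk group.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8]
toy_labels = [0, 0, 1, 0, 1, 1, 1, 1]
(best_cutoff, best_iu, best_se, best_sp), toy_auc = index_of_union_cutoff(
    toy_scores, toy_labels
)
print(best_cutoff, round(best_iu, 3), round(toy_auc, 3))
```

The criterion favors a cut-point at which sensitivity and specificity are both close to the AUC, rather than maximizing only one of them.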

Suicide Behaviors Questionnaire-Revised (SBQ-R)
The Suicide Behaviors Questionnaire-Revised (SBQ-R) [69] contains four multiple-choice items that assess the frequency and severity of suicide ideation, suicidal attempts in the past year, and the possibility of suicide behavior in the future. The total SBQ-R severity scores range from 3 to 18 and the cut-off score of ≥8 can identify high and low risk groups. In addition, the cut-off score for the first question is ≥2. The Cronbach's alpha of 0.76 and 0.88 in nonclinical and clinical samples was reported in Osman et al. [69]. In this study, the Iranian version of the SBQ-R was used [70]. For the current sample, the Cronbach's alpha was 0.828.
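How the binary risk label follows from the SBQ-R total can be sketched as below. The sketch assumes that the four item responses have already been converted to their numeric scores; only the 3 to 18 range and the ≥8 cut-off come from the text, and the function name is hypothetical.

```python
def sbqr_risk_group(item_scores, total_cutoff=8):
    """Sum four already-scored SBQ-R items and label the respondent.

    Returns (total, "high") when total >= total_cutoff, otherwise (total, "low").
    """
    if len(item_scores) != 4:
        raise ValueError("SBQ-R has exactly four items")
    total = sum(item_scores)
    if not 3 <= total <= 18:
        raise ValueError("SBQ-R total must lie between 3 and 18")
    return total, ("high" if total >= total_cutoff else "low")


# Hypothetical respondents.
print(sbqr_risk_group([2, 3, 1, 2]))  # total of 8 reaches the cut-off
print(sbqr_risk_group([1, 1, 1, 0]))  # minimum possible total
```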

Ethical Considerations
Ethics approval of the research was granted by the University of Isfahan's Ethic Committee with the number IR.UI.REC.1399.008 (approval date: 15 February 2020). To ensure confidentiality, the questionnaires were anonymous. Respondents could refrain from answering anytime they wished. There was no incentive for participation. However, participants could write an email and receive their test interpretations.

Methods
Choosing informative, discriminating, and stable features is a crucial step in designing a classification model [71]. In this research, stability feature selection was performed [72]. The idea of stability selection is to inject more noise into the original problem by generating bootstrap data batches and to use a baseline feature selection algorithm to investigate which features are essential in each sampled version of the data. The results of each bootstrap sample are then used to calculate the stability score for each feature. For this purpose, logistic regression [73] was utilized to analyze feature importance in each bootstrap step, and, if a feature was significant in most of the iterations, it was selected as a stable feature. Such a selection was made based on a type I error of 0.05 or less.
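The bootstrap procedure described above can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes NumPy, fits an unpenalized logistic regression by Newton-Raphson, treats a feature as significant when its two-sided Wald test has p < 0.05 (|z| > 1.96), and interprets "most of the iterations" as a selection frequency above 0.5. The synthetic data, the number of bootstrap samples, and all function names are assumptions.

```python
import numpy as np


def _prob_and_hessian(Xd, beta, ridge=1e-6):
    """Fitted probabilities and (slightly damped) Hessian of the log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-np.clip(Xd @ beta, -30.0, 30.0)))
    w = p * (1.0 - p)
    hessian = Xd.T @ (Xd * w[:, None]) + ridge * np.eye(Xd.shape[1])
    return p, hessian


def wald_z_scores(X, y, n_iter=25):
    """Logistic regression via Newton-Raphson; Wald z-score per feature."""
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p, hessian = _prob_and_hessian(Xd, beta)
        beta = beta + np.linalg.solve(hessian, Xd.T @ (y - p))
    _, hessian = _prob_and_hessian(Xd, beta)
    std_err = np.sqrt(np.diag(np.linalg.inv(hessian)))
    return (beta / std_err)[1:]  # drop the intercept


def stability_scores(X, y, n_boot=50, z_crit=1.96, seed=0):
    """Fraction of bootstrap samples in which each feature is significant."""
    rng = np.random.default_rng(seed)
    n = len(y)
    hits = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        hits += np.abs(wald_z_scores(X[idx], y[idx])) > z_crit
    return hits / n_boot


# Synthetic example: only feature 0 carries signal.
rng = np.random.default_rng(42)
y = np.repeat([0.0, 1.0], 150)
X = rng.normal(size=(300, 5))
X[:, 0] += 1.5 * y
scores = stability_scores(X, y)
stable = scores > 0.5  # hypothetical "most of the iterations" threshold
print(scores, stable)
```

A feature that is significant only in a few resamples receives a low score and is discarded, which is what makes the selected set less sensitive to sampling noise.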
Ensemble learning is an approach in which multiple weak learners are trained to solve the same problem and are then combined to achieve better performance. Stacked ensemble methods train various weak learners separately, and a meta-model is learned on top of all weak learners to predict the outputs. In this paper, the decision tree (DT) C4.5 was used as the weak learner, and the selected features were used for training a meta DT classifier built from stacked ensemble DTs [74]. Each DT was tuned during training using a grid search. The tuned parameters included the depth of the tree and the feature importance metrics. In addition, to create the stacked ensemble DT model, the data were over-sampled using random sampling to overcome the class imbalance of the dataset [75]. The minority class was given more weight when training each DT [76]. The performance of the system was evaluated using three-fold cross-validation. The block diagram of the proposed method is depicted in Figure 1. The demographic variables and the questionnaires PCL-5, PTGI, MSPSS, PHQ-9, and PMH were used as the inputs of the prediction system, while the suicidal high- and low-risk groups (calculated from the SBQ-R questionnaire) were used as the binary output.
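A stacked ensemble of tuned decision trees with random over-sampling and three-fold cross-validation could be assembled roughly as in the sketch below. It assumes scikit-learn and NumPy and is not the authors' code: scikit-learn implements CART rather than C4.5 (the entropy criterion is the closest available option), and the number of base trees, the parameter grid, `max_features="sqrt"`, and the synthetic data are all illustrative choices.

```python
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def random_oversample(X, y, rng):
    """Duplicate minority-class rows at random until classes are equal in size."""
    classes, counts = np.unique(y, return_counts=True)
    parts = []
    for c, k in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=counts.max() - k, replace=True)
        parts.append(np.concatenate([members, extra]))
    idx = np.concatenate(parts)
    return X[idx], y[idx]


def make_stacked_trees(seed=0, n_trees=3):
    """Grid-searched, class-weighted trees stacked under a meta decision tree."""
    grid = {"max_depth": [2, 3, 5], "criterion": ["gini", "entropy"]}
    base = [
        (
            f"dt{i}",
            GridSearchCV(
                DecisionTreeClassifier(
                    class_weight="balanced",  # extra weight on the minority class
                    max_features="sqrt",      # makes the trees differ by seed
                    random_state=seed + i,
                ),
                grid,
                cv=3,
                scoring="balanced_accuracy",
            ),
        )
        for i in range(n_trees)
    ]
    meta = DecisionTreeClassifier(max_depth=3, random_state=seed)
    return StackingClassifier(estimators=base, final_estimator=meta, cv=3)


def cross_validated_confusion(X, y, n_splits=3, seed=0):
    """Sum the test-fold confusion matrices; only training folds are over-sampled."""
    rng = np.random.default_rng(seed)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    total = np.zeros((2, 2), dtype=int)
    for train, test in folds.split(X, y):
        X_tr, y_tr = random_oversample(X[train], y[train], rng)
        model = make_stacked_trees(seed).fit(X_tr, y_tr)
        total += confusion_matrix(y[test], model.predict(X[test]), labels=[0, 1])
    return total


# Synthetic, imbalanced toy data (about 25% positives); feature 0 is informative.
rng = np.random.default_rng(0)
y_toy = (rng.random(300) < 0.25).astype(int)
X_toy = rng.normal(size=(300, 6))
X_toy[:, 0] += 2.0 * y_toy
cm = cross_validated_confusion(X_toy, y_toy)
print(cm)  # rows: true class, columns: predicted class
```

Over-sampling is applied inside each training fold so that duplicated minority cases never leak into the corresponding test fold.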

Validation
Three-fold cross-validation was used to guard against testing hypotheses suggested by the data (Type III errors) [77]. The following performance indices were calculated on each test fold as well as the cross-validated confusion matrix [78].
$$Se = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP}, \qquad PPV = \frac{TP}{TP + FP}$$
$$AUC = \frac{Se + Sp}{2}, \qquad DOR = \frac{TP \times TN}{FP \times FN}$$
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
$$K(C) = \frac{p_o - p_e}{1 - p_e}, \qquad DP = \frac{\sqrt{3}}{\pi}\left[\ln\frac{Se}{1 - Se} + \ln\frac{Sp}{1 - Sp}\right]$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, and p_o and p_e are the observed and chance-expected agreement rates. Also, Se, Sp, and PPV are the sensitivity, specificity, and positive predictive value (a.k.a., precision). The composite indices AUC, MCC, K(C), DOR, and DP are the area under the ROC curve (a.k.a., the balanced diagnostic accuracy), the Matthews correlation coefficient [79], Cohen's Kappa coefficient [80], the diagnostic odds ratio, and the discriminant power [81], respectively. Moreover, following the Standards for the Reporting of Diagnostic Accuracy Studies (STARD) guideline [82], the CI 95% of the performance indices of the cross-validated confusion matrix was presented. The selected features (i.e., demographic and questionnaire items) were also reported. Note that accuracy was not reported because it is biased toward the majority class in an imbalanced dataset and could overestimate the overall performance. Alternative objective indices such as AUC and MCC are preferred [79,81].
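The indices named above can be computed from a single confusion matrix as in the following sketch. It uses the standard textbook definitions (with AUC taken as the balanced accuracy of one operating point, matching the "balanced diagnostic accuracy" wording); the confusion-matrix counts are hypothetical and are not results of the study.

```python
import math


def diagnostic_indices(tp, tn, fp, fn):
    """Standard diagnostic indices from a 2x2 confusion matrix."""
    n = tp + tn + fp + fn
    se = tp / (tp + fn)                      # sensitivity
    sp = tn / (tn + fp)                      # specificity
    ppv = tp / (tp + fp)                     # positive predictive value
    auc = (se + sp) / 2.0                    # balanced diagnostic accuracy
    dor = (tp * tn) / (fp * fn)              # diagnostic odds ratio
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    p_obs = (tp + tn) / n                    # observed agreement
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_obs - p_exp) / (1.0 - p_exp)  # Cohen's kappa
    dp = (math.sqrt(3.0) / math.pi) * (
        math.log(se / (1.0 - se)) + math.log(sp / (1.0 - sp))
    )                                        # discriminant power
    return {"Se": se, "Sp": sp, "PPV": ppv, "AUC": auc,
            "DOR": dor, "MCC": mcc, "Kappa": kappa, "DP": dp}


# Hypothetical counts for illustration only.
idx = diagnostic_indices(tp=80, tn=180, fp=20, fn=20)
for name, value in idx.items():
    print(f"{name}: {value:.3f}")
```

Note that DOR and DP are undefined when any cell of the matrix is zero; a continuity correction would be needed in that case.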

Results
The average age of all participants was M = 24.45 (SD = 6.651) years, and 72.1% were female. In addition, 63% were undergraduates, while 37% were graduate students. Furthermore, 60.7%, 5.8%, 23.6%, and 7.2% were liberal arts, basic sciences, engineering sciences, and medical sciences students, respectively. Among the 573 participants, 25% were at high risk of suicide (SBQ-R ≥ 8). The characteristics of participants belonging to the low- and high-risk groups are shown in Table 1, and further demographic information is given in Table A1. The relative frequencies of the questionnaire items of the PCL, PTG, MSPSS, PHQ, PMH, and SBQ-R in the low- and high-risk groups are shown in Tables A2-A7. The selected features, along with their importance, are shown in Table 2. The weight parameter ranges from zero to one; the higher it is, the more important the feature is. The performance of the prediction system is displayed in Table 3, where the performance indices are shown for each test fold, as well as for the cross-validated confusion matrix. The CI 95% of the latter indices are also provided.
The performance of the proposed diagnosis system can be interpreted from the overall AUC, MCC, DP, and K(C) scores as follows: the proposed prediction system has excellent balanced diagnosis accuracy, a high correlation between predicted and observed class labels, fair discriminant power, and an excellent class labeling agreement rate.

Discussion
To the best of our knowledge, this is the first study of its kind to make an accurate diagnosis of suicide ideation/behavior using robust ensemble learning in a university student sample in the MENA region. According to the results, we were able to identify 23 items out of 92 items that could be used to create a screening tool to distinguish young adults with high and low suicide ideation/behavior. The screening tool may have important implications for universities and policymakers that are responsible for university students' suicide prevention and treatment.
Based on our results, exposure to trauma, history of psychological illness, PTSD symptoms (9 items out of 20 items in PCL-5), positive mental health (6 items out of 9 in PMH), depression symptoms (3 items out of 9 in PHQ), post-traumatic growth (2 items out of 21 in PTGI), and social support (1 out of 12 items in the MSPSS) were recognized as the main variables for predicting suicide ideation and attempt.
The first important finding was that exposure to trauma and PTSD symptoms could diagnose suicide ideation/behavior. The main items here were the type of exposure to trauma (first-hand or witnessing trauma) and PTSD symptoms. These findings complement previous findings showing that there is a strong link between trauma exposure and suicide risk [15,16,17,18,36,85,86]. PTSD symptoms such as cognitive distortion, irritability, and an intense feeling of distress when reminded of the traumatic event were the main variables that could predict suicide risk. Previous studies also found that PTSD has a strong and significant relationship with suicide ideation among young adults [87,88]. As Shafiei and colleagues [89] and Whiteman and colleagues [36] stated, people with traumatic experiences often face negative emotions such as anger, fear, horror, guilt, and shame. These negative emotions may decrease meaning in life, and, along with cognitive distortion, may decrease the ability to solve problems and make decisions.
Psychological illness is another input feature in three prediction systems that could predict suicide ideation/behavior. This is in line with previous studies that found a strong relationship between psychological distress and suicide ideation [85,90,91,92]. In our study, the main psychological problems that could predict suicide ideation/behavior were depression, anxiety, and obsessive-compulsive disorder. Previous studies also found that depression [13,41,86,93] and anxiety [13] predict suicide ideation. In line with this, feeling bad about oneself, lack of self-worth, and shame were selected by the proposed diagnostic system as further essential factors predicting suicide ideation/behavior.
In our study, a lack of positive mental health, i.e., psychological and subjective well-being, was selected by the proposed diagnostic system, as another main predictor for suicide ideation/behavior. From the nine items of the PMH scale, six items were selected by the system as effective factors on suicide ideation/behavior. This is in agreement with some previous studies that have shown that positive mental health moderates the effects of stressful life events on suicidal thoughts [24]. In several recent studies, PMH is introduced as an essential target in suicide prevention and treatment programs because of its substantial impact on suicide ideation [26,27].
One of the notable factors in our research was post-traumatic growth (PTG) items. According to the literature, PTG moderates suicide risk among students who experienced traumatic events [32], and lower levels of PTG can increase suicidality [30]. In our study, two items from the PTG questionnaire were selected as predictors for suicidality, and both items are related to the appreciation of life. Accordingly, people who appreciate the value of their life and can appreciate each day are among the low-risk population for suicide ideation/behavior. This finding is consistent with previous studies that show having a reason to live [93] and gratitude [94] are protective factors for suicidality.
Social support is one of the well-studied buffering factors in the suicide literature [95,96], yet our system largely did not select it as a predictor of high vs. low suicide ideation/behavior. Although there were significant correlations between the social support items and some of the items of the PTGI and PMH, the only item selected out of the 12 items in the MSPSS was: "I can talk about my problems with my family." Given that Iran is a collectivist society in which family and kinship have a strong influence on individuals' lives, this finding is not surprising. For the majority of Iranians, the family is the first place to seek help and comfort. The role of parental connectedness in decreasing the risk of suicide has been found in some other studies as well [7,97,98].

Limitations and Strengths
The advantage of our study is the high accuracy of the proposed statistical model. It is the first study to analyze the comprehensive factors related to suicide ideation/behavior and to design a diagnostic system for university students at risk of suicide ideation/behavior in the MENA region. However, since a pure student sample was studied with the majority of the participants being female, one should refrain from generalizations to other samples. Furthermore, all variables were assessed using self-report measures. This is of great advantage with regard to the development of an easy-to-administer screening instrument. At the same time, bias effects cannot be ruled out. Therefore, it might be beneficial to cross-validate the described diagnostic model using interviewer-based assessments in future studies. Another limitation of the study is related to the distribution of gender and major among the sample. According to the Statistical Centre of Iran [99], the number of female students in all majors except the engineering field was higher than male students in 2016. Therefore, it was not surprising to have more female respondents than males in this study. The number of students in different majors, however, did not match with the country's student population and we have more respondents from the fields of liberal arts and engineering than from basic sciences and medicine. We see this as the limitation of the study. However, the diagnostic system did not select these features and it is, thus, not problematic in the computer-aided-diagnostic systems.

Implications
As mentioned before, the findings of this study have practical implications for policymakers and universities. Given that trauma exposure and PTSD symptoms were selected as the main variables in the diagnosis of suicide ideation/behavior, routine assessment of cognitive distortion and PTSD symptoms among students at risk is important. The majority of Iranian universities have free-of-charge counseling services. Having a trauma-informed counseling service to identify at-risk students is essential. As a lack of positive mental health and a lack of some of the post-traumatic growth items were selected as other predictors of suicide ideation/behavior, creating national programs to enhance PMH and PTG for at-risk students is recommended. Based on the findings of this research, a screening tool will be designed, and the application can be used in universities to screen high-risk and low-risk students for suicide ideation/behavior for future prevention and intervention programs. Prospective studies should follow to further investigate the efficiency of the described screening tool.

Conclusions
To the best of our knowledge, this is the first study to use robust ensemble learning to reliably and accurately diagnose suicide ideation/behavior among university students in the MENA region. Based on our results, we found that trauma exposure and PTSD symptoms, psychological problems (depression, anxiety, and obsessive-compulsive disorder), low mood, and low self-esteem can diagnose students who are at high risk for suicide ideation/behavior. We also found that positive mental health can strongly affect suicide ideation/behavior, meaning that individuals who had high scores in PMH had less suicide ideation/behavior. People who appreciate life also showed less suicidality. Therefore, two items related to post-traumatic growth will be included in the final screening tool. Social support was only relevant in terms of family support in our prediction system. This study was performed to create a screening tool to identify university students at risk of suicide ideation/behavior and to help policymakers and universities to develop appropriate and timely prevention and early intervention programs.
Author Contributions: A.N.: Conceptualization, methodology, data curation, investigation, resources, writing-original draft preparation, writing-review and editing, visualization, project administration, formal analysis, and supervision. T.T.: Conceptualization, methodology, data curation, resources, investigation, writing-original draft, writing-review and editing, visualization, project administration, formal analysis. Z.A.: Conceptualization, data curation, investigation, resources, and writing-review and editing. M.R.M.: methodology, software, validation, data curation, formal analysis, and resources. M.M.: Conceptualization, methodology, software, validation, data curation, formal analysis, resources, writing-review and editing, visualization, supervision, project administration, and funding acquisition. M.A.M.: methodology, software, validation, data curation, formal analysis, resources, supervision, and funding acquisition. All authors have read and agreed to the published version of the manuscript and are accountable for all aspects of the work and its integrity and accuracy.
Acknowledgments: The authors would like to thank the team working on this national project, as well as all university students who participated in this project.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.