Using Decision Trees to Examine Environmental and Behavioural Factors Associated with Youth Anxiety, Depression, and Flourishing

Modifiable environmental and behavioural factors influence youth mental health; however, past studies have primarily used regression models that quantify population average effects. Decision trees are an analytic technique that examines complex relationships between factors and identifies high-risk subgroups to whom intervention measures can be targeted. This study used decision trees to examine associations of various risk factors with youth anxiety, depression, and flourishing. Data were collected from 74,501 students across Canadian high schools participating in the 2018–2019 COMPASS Study. Students completed a questionnaire including validated mental health scales and 23 covariates. Decision trees were grown to identify key factors and subgroups for anxiety, depression, and flourishing outcomes. Females lacking both happy home life and sense of connection to school were at greatest risk for higher anxiety and depression levels. In contrast with previous literature, behavioural factors such as diet, movement and substance use did not emerge as differentiators. This study highlights the influence of home and school environments on youth mental health using a novel decision tree analysis. While having a happy home life is most important in protecting against youth anxiety and depression, a sense of connection to school may mitigate the negative influence of a poor home environment.


Introduction
Mental illness has garnered increased global concern in recent years as a leading contributor to global disease burden [1,2]. Youth have been identified as a priority group for addressing mental health concerns [3,4], given that the onset of mental illness primarily occurs during adolescence [5] and untreated mental illness during adolescence can lead to negative consequences in adulthood [6]. Depression and anxiety are among the mental illnesses associated with highest suicide risk [7], and have also been associated with increased substance use during adolescence [8,9]. While previous efforts around youth mental health have primarily focused on combating mental illnesses such as anxiety and depression, recent approaches have also emphasized the importance of enhancing mental well-being [10][11][12]. Flourishing, defined as a state of psychosocial well-being, has been associated with increased life expectancy [13]. Among youth, flourishing has also been associated with lower likelihood of substance use [8,9,14] and improved academic performance [15][16][17].
Following Bronfenbrenner's social-ecological model [18], the causal mechanisms driving mental illness onset in youth involve complex interactions between a hierarchical network of individual (e.g., genetic, biological) and environmental (e.g., interpersonal, organizational, community, public policy) factors. Past studies have found widely varying estimates of the proportion of mental illness onset attributable to genetic vs. environmental influences: anywhere from 15% to 80% of youth-onset depression [19] and 18% to 35% of youth-onset anxiety [20] is heritable, with the remainder attributable to environmental factors. Genetic and environmental influences on flourishing are less understood, though one past study examining related well-being constructs found heritability estimates of 34% for subjective happiness and 44% for life satisfaction [21]. Thus, while there is evidence of a genetic component to youth mental illness and well-being, the contextual environment plays an influential role.
Int. J. Environ. Res. Public Health 2022, 19, 10873
From a public health perspective, the contextual environment is important as many environmental risk and protective factors can be considered modifiable and hence potential intervention leverage points. The importance of context on youth mental health outcomes is recognized within national public policy guidance. The Mental Health Strategy for Canada [22] published by the Mental Health Commission of Canada (MHCC) prioritizes support for youth mental health with calls to "increase the capacity of families, caregivers, schools, post-secondary institutions and community organizations". Publicly funded community- and school-based supports can act as universal access points for prevention and early intervention efforts and are consistently highlighted as pillars in federal [12,22] and provincial mental health strategies. Related interpersonal factors such as family relationships [23][24][25], peer relationships [26,27], bullying [28], and school connectedness [23,27,29] have previously been linked to youth mental health outcomes. Previous research has also found associations with modifiable behavioural factors such as diet [30], movement behaviours [30][31][32], sleep [33], and substance use [34]. However, two major limitations of past studies are that associations with domains of risk and protective factors are generally examined in isolation, and that the primarily regression-based analytic methods focus on quantifying average effects across the study population without consideration for potential high-risk subgroups.
Decision trees are a machine learning-based analytic technique comprising several classes of modeling algorithms [35], which group similar subjects with respect to an outcome using a tree structure. While more commonly used in medical screening and diagnostics for disease prediction, decision trees have seen recent increasing use in public health research [36] to examine complex relationships between outcomes and risk factors and identify high-risk subgroups to whom prevention and intervention measures can be targeted. Decision trees have previously been used to examine depression outcomes in various adult populations; past studies involving various environmental factors have found social connection [37] and aspects of financial stability [37,38] to be important, while substance use was only found to be a risk factor among certain subgroups [38]. However, youth face distinct contextual risk factors, and previous research using decision trees to examine youth mental health is limited. Seeley et al. [39] examined major depressive disorder (MDD) onset among adolescent females and found that the subgroup of previously depressed females with poor school functioning was at greatest risk for MDD onset, while family support was only a protective factor among the subgroup of females without previous depressive symptoms. Hill et al. [40] found friend support to be a protective factor against the development of MDD among those with subclinical symptoms, while subgroups with history of anxiety and substance use disorder were at higher risk. These studies highlight the importance of interpersonal factors (school and family support) and behavioural factors (substance use); however, sample sizes in both studies were small. To our knowledge, no previous studies have used decision trees to examine anxiety or flourishing outcomes among youth.
Given the importance of environmental factors on youth mental health, the purpose of this study is therefore to use decision tree analysis to examine associations of modifiable behavioural and interpersonal risk factors with youth anxiety, depression, and flourishing, with a focus on characterizing groups at highest risk of mental ill-health. Results of this exploratory analysis are contrasted against those of traditional regression-based analysis and compared to findings from previous literature to highlight the unique insights gleaned from decision tree analysis.

Study Design and Sample
COMPASS is a prospective cohort study (2012-2021) designed to examine the impact of policies and environmental characteristics on Canadian secondary school students [41]. COMPASS collects data on multiple health behaviours and risk factors including mental health, substance use, healthy eating, movement behaviours, bullying and academics. Additional details about the COMPASS study design and methods are available in print [41] and online (https://uwaterloo.ca/compass-system, accessed on 13 July 2022). The COMPASS study received ethics clearance from the University of Waterloo Research Ethics Board (ORE 30118) and participating school boards.
The current study uses student-level data from 2018-2019 (Year 7) of the COMPASS Study. The sample consists of 74,501 students from 136 schools in Ontario (61 schools), Alberta (8 schools), British Columbia (15 schools) and Quebec (52 schools). Schools were purposefully recruited into the COMPASS study according to their use of active-information, passive-consent protocols, which have been shown to be important for collecting unbiased data among youth [42]. Further details on general school recruitment procedures [43] and 2018-2019 sample recruitment [44] are available. All students within a recruited school who received passive parental permission [41] were invited to participate, and students could withdraw at any time. The participation rate for 2018-2019 was 81.9%, with the primary reason for non-participation being absenteeism at the time of data collection.

COMPASS Student Questionnaire
The COMPASS student questionnaire is an anonymized, self-administered, paper-based questionnaire. The questionnaire is completed during class time and takes approximately 40 min to complete. Data collection procedures for student questionnaire administration are documented [45]. This study examined three mental health scale outcomes measuring anxiety, depression, and flourishing, as well as 23 predictor measures related to questionnaire items on demographics, body weight, healthy eating, movement behaviours, substance use, bullying, academics, and perceived school, family, and friend support.

Mental Health Outcome Measures
Depression is measured using the Center for Epidemiologic Studies Depression Scale 10-Revised (CESD-10) [46,47]. The CESD-10 is measured as a continuous score ranging from 0 to 30, with higher scores indicating greater degrees of depressive symptomatology, and scores at or above 10 indicating clinically relevant depressive symptoms [46]. Anxiety is measured using the Generalized Anxiety Disorder 7-item Scale (GAD-7) [48]. The GAD-7 is measured as a numeric score ranging from 0 to 21, with higher scores indicating greater levels of anxiety, and scores at or above 10 indicating clinically relevant anxiety symptoms [48]. Flourishing is measured using a modified version of Diener's Flourishing Scale (FS) [49]. The FS is a numeric score ranging from 8 to 40, with higher scores indicating greater levels of flourishing. Consistent with recommendations for Likert-style scales [50,51], all individual mental health scale items were person-mean imputed for students missing 1 or 2 items. Students missing three or more scale items on the GAD-7, CESD-10, or FS outcomes were not found to be significantly different on any predictor measures from students missing two or fewer values and were therefore excluded from the respective analyses.
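The person-mean imputation rule described above can be illustrated with a minimal sketch. It assumes that a skipped item is recorded as None and that an imputed item takes the mean of the items the same student did answer; the function names and the example responses are hypothetical and are not taken from the COMPASS data or analysis code.

```python
from typing import Optional, Sequence


def score_scale(items: Sequence[Optional[int]], max_missing: int = 2) -> Optional[float]:
    """Return the scale total after person-mean imputation.

    Returns None when more than `max_missing` items are unanswered,
    mirroring the exclusion of students missing three or more items.
    """
    answered = [v for v in items if v is not None]
    n_missing = len(items) - len(answered)
    if n_missing > max_missing or not answered:
        return None
    person_mean = sum(answered) / len(answered)
    # Each missing item is replaced by the student's own mean item score.
    return sum(answered) + n_missing * person_mean


def clinically_relevant(score: Optional[float], cutoff: float = 10.0) -> Optional[bool]:
    """Flag scores at or above the cutoff (10 for both GAD-7 and CESD-10)."""
    if score is None:
        return None
    return score >= cutoff


# Hypothetical GAD-7 responses (7 items, each scored 0 to 3).
complete = [1, 2, 0, 3, 1, 2, 1]
one_missing = [2, 2, None, 2, 2, 2, 2]
too_many_missing = [1, None, None, None, 2, 0, 1]

for responses in (complete, one_missing, too_many_missing):
    total = score_scale(responses)
    print(total, clinically_relevant(total))
```

Because the imputed value is the student's own item mean, the imputed total is equivalent to rescaling the answered items to the full scale length.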

Predictor Measures
Demographics
Students are asked to indicate their sex (male, female) and age (12 to 18 years). Students self-identify their ethnicity with options for White, Black, Asian, Hispanic, and Other, with the option to select multiple ethnicities. Weekly spending money is measured as a proxy for socioeconomic status, with options ranging from "$0" to "More than $100".

Weight Status and Perception
Students are asked how they describe their weight with options for Slightly/Very Underweight, About the Right Weight, or Slightly/Very Overweight. An objective measure of Body Mass Index (BMI) is calculated based on self-reported height and weight, and classified into Underweight, Normal Weight, Overweight, or Obese based on World Health Organization age-and sex-adjusted cut-offs. Students with missing height or weight data are included in a separate Not Stated category due to the tendency for BMI data to have non-random missingness mechanisms [52].

Diet and Eating Behaviours
Students are asked whether they eat breakfast daily and their number of daily servings of fruits and vegetables.

Movement Behaviours
Daily moderate-to-vigorous physical activity is measured by asking students the amount and the intensity of activity performed on each of the last seven days. Total daily screen time is measured by asking students the amount of time they usually spend texting/messaging/emailing, playing video/computer games, talking on the phone, watching TV/movies and surfing the internet. Daily sleep time is also measured by asking how much time they usually spend sleeping. These measures have been shown to have moderate validity when compared to objective measures and high test-retest reliability [53].

Substance Use
Current use of cigarettes and e-cigarettes is measured based on students indicating any use in the last 30 days. Current use of cannabis is measured based on use at least once a month in the past 12 months. Current binge drinking is measured based on having five or more drinks at least once a month in the past 12 months.

Bullying and Academics
Bullying is measured using two indicators of whether students have been bullied or have bullied others in the past 30 days. Academic expectations are measured based on students indicating expectation to attend some form of post-secondary education. Truancy is measured based on the number of classes skipped in the past four weeks.

School Connectedness
School connectedness (SC) is measured using an adapted version of the National Longitudinal Study of Adolescent Health SCS-5 item scale [54]. The SC scale is a numeric score ranging from 6 to 24, with higher scores indicating greater SC. Scale items include the SCS-5 measures "I feel close to people at my school", "I feel I am part of my school", "I am happy to be at my school", "I feel the teachers at my school treat me fairly", and "I feel safe in my school", and an additional measure "Getting good grades is important to me", with response options ranging from Strongly Agree to Strongly Disagree.

Social Support
Family and friend support are measured based on three individual items from the Multidimensional Scale of Perceived Social Support [55]. Students are asked to indicate level of agreement with the statements "I have a happy home life", "I can talk about my problems with my family", and "I can talk about my problems with my friends".

School-Level Census Data
Province and school enrolment size are recorded for each participating school. School area median income and school urbanicity are measured by linking to Statistics Canada 2016 Census data based on each school's forward sortation area [56,57].

Analysis
Mixed effects regression trees were separately grown for GAD-7, CESD-10 and FS outcomes including all predictor variables. Random Effects EM (RE-EM) trees were used following the algorithm proposed by Sela and Simonoff [58] and Hajjem [59] to account for school-level clustering based on the assumption that students from the same school may have greater similarity in responses than students from different schools. Students with missing values on a given outcome were excluded from the analysis, while students with missing predictor values were retained, with missingness handled using surrogate splitting. Given the large sample size, a splitting rule was set requiring a minimum increase in adjusted R-squared (R²adj) of 0.005 to limit splits that would be unlikely to improve overall prediction accuracy. Tree pruning using 10-fold cross-validation was performed to limit overfitting to the sample data. The smallest tree within one standard deviation of the minimum cross-validation error was chosen. The R software was used for all analyses [60]; package "REEMtree" [61] was used to grow the trees, and the package "rpart.plot" [62] was used for plotting.
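The pruning rule described above (choose the smallest tree whose cross-validation error lies within one standard deviation of the minimum) can be sketched as follows. This is an illustration in Python rather than the R workflow used in the study; the candidate trees, their error values, and all names are hypothetical.

```python
from typing import List, NamedTuple


class PrunedTree(NamedTuple):
    n_leaves: int     # size of the candidate subtree
    cv_error: float   # mean cross-validation error
    cv_sd: float      # standard deviation of that error across folds


def select_by_one_sd_rule(candidates: List[PrunedTree]) -> PrunedTree:
    """Pick the smallest tree within one SD of the minimum CV error."""
    best = min(candidates, key=lambda t: t.cv_error)
    threshold = best.cv_error + best.cv_sd
    eligible = [t for t in candidates if t.cv_error <= threshold]
    return min(eligible, key=lambda t: t.n_leaves)


# Hypothetical pruning sequence from a 10-fold cross-validation.
candidates = [
    PrunedTree(1, 1.000, 0.010),
    PrunedTree(3, 0.820, 0.009),
    PrunedTree(6, 0.775, 0.009),
    PrunedTree(9, 0.770, 0.009),
    PrunedTree(14, 0.772, 0.009),
]
chosen = select_by_one_sd_rule(candidates)
print(chosen.n_leaves, chosen.cv_error)
```

In this made-up sequence the 9-leaf tree has the lowest error, but the 6-leaf tree falls within the tolerance and is preferred for being simpler.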
To provide a comparison of the RE-EM tree results, linear mixed effects regression (LME) models were also fit for each outcome including all predictor variables, using the R package "nlme" [63]. Students with missing values on a given outcome or any predictors were excluded from the analysis; maximum likelihood estimation is used within LME to account for missing at random data. A random intercept term was included to account for school-level clustering. Backward variable selection was implemented based on Akaike's Information Criterion (AIC). Intraclass correlation coefficients (ICCs) were calculated on null LME models to quantify the amount of variability in mental health outcomes that can be attributed to differences between schools.
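The ICC expresses between-school variance as a share of total variance. The study derived ICCs from null LME models; the sketch below swaps in the simpler one-way ANOVA estimator for balanced groups, purely to show the quantity being computed. The function name and the toy scores are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way ANOVA estimate of the ICC for equally sized groups.

    ICC = var_between / (var_between + var_within), where the variance
    components are estimated from the between- and within-group mean squares.
    """
    k = len(groups)
    n = len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 groups of equal size >= 2")
    grand = mean([v for g in groups for v in g])
    group_means = [mean(g) for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (k * (n - 1))
    # Negative estimates are truncated at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)
    total = var_between + ms_within
    return var_between / total if total > 0 else 0.0


# Hypothetical scores for students nested in three schools.
schools = [[4, 6, 5, 7], [9, 8, 10, 9], [5, 5, 6, 4]]
print(round(icc_oneway(schools), 3))
```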

Sample Characteristics
Sample characteristics are shown in Table 1. The mean GAD-7 score in the sample was 6.2 (SD 5.6) with 24.0% of the sample having scores of 10 or higher, which indicates clinically relevant anxiety symptoms. The mean CESD-10 score was 8.8 (SD 6.1) with 37.0% of the sample having scores of 10 or higher, indicating clinically relevant depressive symptoms. The average FS score was 32.2 (SD 5.7). The sample was 49.1% female with mean age 15.2 (SD 1.5) and predominantly identified as white (68.5%). ICCs showed modest between-school variability of 3.35% in student GAD-7 scores, 2.12% in CESD-10 scores, and 4.29% in FS scores.

GAD-7
The RE-EM tree fitted to the GAD-7 outcome is provided in Figure 1. The R²adj for the model was 0.23. Having a happy home life was identified as the primary splitting factor; that is, the factor that best distinguishes between high and low GAD-7 scores. Among students without a happy home life, school connectedness (SC) was identified as a protective factor. The highest risk subgroup comprised students without a happy home life and with low SC (score < 15.3); the average GAD-7 score in this group was 11.9, which is above the threshold of 10 for having clinically relevant anxiety symptoms. This subgroup constituted 7% of the total sample. Among students with higher SC (score ≥ 15.3) females had average GAD-7 scores nearly 3 points higher than their male counterparts (9.89 compared to 6.97), closely approaching the clinical threshold.
Among those with a happy home life, sex was identified as a key differentiating factor; however, SC was a protective factor for both males and females. Both subgroups of males with high and low SC had lower average GAD-7 scores than females, except for the small subgroup of females with very high SC scores. Notably, females with low SC (score < 17.5) had much higher average GAD-7 scores than their male counterparts (9.07 compared to 5.74). The largest final subgroup comprised males with high SC who indicated having a happy home life (31.2% of sample), and this group had the lowest average GAD-7 score of 3.47.
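A fitted tree can be read as a set of nested rules. The sketch below encodes only the branch for students without a happy home life, using the cut point and subgroup means reported in the text; the branch for students with a happy home life is deliberately left unencoded because not all of its leaves are reported here. The function name and argument coding are hypothetical.

```python
from typing import Optional


def gad7_subgroup_mean(happy_home: bool, sc_score: float, sex: str) -> Optional[float]:
    """Return the reported average GAD-7 score for a student's subgroup.

    Only the 'no happy home life' branch is encoded; None is returned
    for the other branch, whose leaves are not fully given in the text.
    """
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    if happy_home:
        return None
    if sc_score < 15.3:
        # Highest-risk subgroup (about 7% of the sample), both sexes.
        return 11.9
    return 9.89 if sex == "female" else 6.97


# Hypothetical students.
print(gad7_subgroup_mean(False, 12.0, "female"))
print(gad7_subgroup_mean(False, 18.0, "male"))
print(gad7_subgroup_mean(True, 20.0, "male"))
```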
The LME model for GAD-7 score is provided in

CESD-10
The RE-EM tree fitted to the CESD-10 outcome is provided in Figure 2. The R²adj for the model was 0.30. Like the GAD-7 tree, having a happy home life was identified as the primary splitting factor. Among those without a happy home life, SC was the most important factor, followed by sex, with the highest risk subgroup comprising females without a happy home life and with low SC (average CESD-10 score 16). Among both subgroups with low and high SC, males had lower average CESD-10 scores than females. Notably, the average CESD-10 score met or exceeded the threshold for clinically relevant depressive symptoms of 10 or higher among all subgroups without a happy home life.
Among those with a happy home life, SC was again the most important factor. Females with a happy home life but low SC (score < 17.5) had an average CESD-10 score of 11.9, exceeding the threshold for clinically relevant depressive symptoms. Students of both sexes with a happy home life were further differentiated by whether they felt comfortable talking about problems with their family. Among those who did not feel comfortable, females had higher average CESD-10 scores than males. Those who did feel comfortable were further split based on having very high SC, with those students having the lowest average CESD-10 score of 5.18, followed by those with moderately high SC scores who had an average CESD-10 score of 6.85. Notably, being able to talk about problems with family was identified as a protective factor only among the subgroup of students with a happy home life and high SC.
The LME model for CESD-10 score is provided in

Flourishing Scale
The RE-EM tree fitted to the FS outcome is provided in Figure 3. The R²adj for the model was 0.42. The primary splitting variable was SC score. Among those with both moderately high and very high SC, having a happy home life was identified as the next most important factor. Students with very high SC and a happy home life had the highest average flourishing score of 36.9. Among students with moderately high SC and a happy home life, those who felt able to talk about problems with family had higher FS scores than those without (34.55 vs. 32.6), though this factor was not identified as important among students who did not already have a happy home life.
Among those with low SC, having a happy home life was again identified as the most important factor, and being able to talk about problems with family was identified as important among those with a happy home life. The tree further differentiated subgroups by SC among those either without a happy home life or who felt unable to talk about problems with family. The highest risk subgroups comprised students without a happy home life and with low or very low SC, having average FS scores of 23.8 and 15.3, respectively.
The LME model for FS score is provided in . While no sex differences were identified in the RE-EM tree, male sex was significantly associated with higher FS score in the LME model (Est. 0.10 [0.04,0.17]), though the magnitude of association was small.

Discussion
This study used decision trees to examine associations between a range of behavioural and interpersonal risk factors and anxiety, depression, and flourishing outcomes among a large sample of Canadian youth. For all outcomes, the two factors that consistently emerged from the decision trees models as most important were having a happy home life and strong sense of connection to school. The consistency in association seen across three related but distinct measures of mental health provides strong support for the importance of positive home and school environments. Notably, while this study also included a wide array of modifiable behavioural measures that have previously been shown to be related to youth mental health outcomes [30][31][32][33][34] none of these emerged as important in the final tree models. This suggests that interpersonal relationships, particularly those related to home and school environments, are more strongly associated with youth anxiety, depression and flourishing than the individual health behaviours more commonly examined in isolation in the literature. This is important as some characteristics of social support related to school connectedness (SC) and happy home life are potentially modifiable through prevention and intervention efforts by schools and public health professionals. These findings support calls by the MHCC and provincial mental health strategies for prioritization of resources to families and schools for mental health promotion and primary prevention efforts.
The decision tree analysis used in this study is a hypothesis-generating approach in which all available potential risk factors are entered into the models without a priori assumptions. This contrasts with most past research in this field, which has generally taken a hypothesis-testing approach based on theorized associations to a particular risk factor or domain of factors. Despite the difference in approach, the results of the current study align with previous research into the influence of home [24,25][64][65][66][67][68] and school environments [27,29,37,68,69] on youth mental health. However, behavioural factors such as diet, movement behaviours, and substance use, which have previously been associated with mental health outcomes [30][31][32][33][34], were not identified as important differentiating factors within the decision tree models in the present study. In fact, the most important factors identified here around social support are typically not included in traditional analyses examining behavioural factors. Given that decision trees also tend to be more parsimonious than regression models in isolating key differentiating factors, the current findings do not necessarily contradict the associations seen in past studies, but rather suggest that the interpersonal relationships from home and school environments are influential factors that require additional consideration in the literature moving forward.
Having a happy home life was identified as the primary distinguishing factor between groups with low and high anxiety and depression scores. Students who indicated not having a happy home life had the highest average GAD-7 scores, with values for females approaching or exceeding the threshold for clinically relevant anxiety symptoms even among those with high SC. Average CESD-10 scores also approached or exceeded the clinical threshold for students of both sexes who indicated not having a happy home life. The influence of the home environment on youth anxiety and depression is well-documented. Past reviews have found consistent associations of parenting style [24,64], interparental conflict [65], and early life stressors [66] with anxiety and depression during adolescence. A review of various sources of social support also found parents and family to be among the most important sources of support to protect against depression in children and adolescents, especially for females [25]. These findings also align with previous decision tree results from Seeley et al. [39], which found family support to be protective among females without previous MDD. In the current study, the home environment was also influential on flourishing: students who indicated having a happy home life had higher average FS scores across all subgroups. While this area of research is newer, these findings are consistent with past studies which have found family resilience and connection to be associated with greater flourishing [67] and adverse family experiences to be associated with lesser flourishing [68] in children and youth. The measure of home environment used in the current study does not provide a definition of the term "happy home life" and is therefore subject to an individual respondent's interpretation. Nevertheless, the strong differentiation seen on this measure justifies the need for future validation work to understand how it is interpreted by students.
Given that this study also included a measure on feeling able to talk about problems with family, this suggests that the concept of happy home life in relation to mental health is broader than merely the perception of open communication. The perception of happy home life could also be affected by early childhood experiences. While some elements of home life such as parenting style may be considered modifiable through educational interventions, other factors surrounding family dynamic may not be considered modifiable from the perspective of external policymakers and public health professionals. Future work should examine more specifically which aspects of perceived happy home life are contributing to the protective effect seen in this study.
SC was also identified as a key differentiating factor across all outcomes, highlighting the importance of a positive school environment to youth mental health. Past research has similarly found SC and belonging to be protective against depression [27,29]. In the current study, SC was protective among students without a happy home life; average GAD-7 scores were at or below the clinical threshold for those who had high SC, compared to exceeding the threshold for those with low SC. Average CESD-10 scores were also over 4 points lower for those with high SC among both sexes without a happy home life. This is consistent with past research which found that SC moderated the relationship between family obligations and emotional distress among middle and high school students [23]. SC was also identified as the primary distinguishing factor between groups with low and high FS scores. Smaller studies have found consistent associations between sense of school community or belonging with measures of wellbeing [69,70]. This finding has important implications for school-based interventions since it suggests that schools can play a meaningful role in increasing mental wellbeing among students, even among those who may not have a happy home life, by cultivating a climate of connection and belonging. Further research into evidence-based policy and program interventions for increasing school connection is warranted.
Consistent with literature regarding adolescent mental illness prevalence [1], differences by sex were identified for anxiety and depression outcomes in the decision trees, with female subgroups having consistently higher average GAD-7 and CESD-10 scores than corresponding male subgroups. These differences are commonly posited to be related to sociocultural gender norms [1], with females being more likely than males to exhibit internalizing symptoms [71,72]. Notably, no differences by sex emerged in the decision tree for flourishing. This is an important finding in the context of school-based intervention as it suggests that males and females could benefit equally from initiatives to increase school connection. Other demographic factors such as ethnicity and age were found to be associated with mental health outcomes in the LME models but not the decision tree models.
While the decision tree results provide insight into distinguishing factors and high-risk subgroups, the LME results describe the average effect of each factor on the total sample after controlling for all other factors. The LME models in this study had higher fit indices, as measured by adjusted R², than the corresponding tree models but were also much more complex. Notably, adjusted R² was at or below 50% for all models, which is unsurprising given a likely genetic component to youth mental health outcomes that cannot be explained by environmental factors. For most variables identified as statistically significant in the LME models but not included in the corresponding decision trees, the LME magnitude of association was small. One exception to this was having been bullied in the past 30 days, which had a large magnitude of association with anxiety and depression outcomes. Post hoc t-tests found a moderately strong negative association between bullying and SC (p < 0.0001, Cohen's d = 0.588), suggesting that the impact of bullying on groups with higher GAD-7 and CESD-10 scores may already be accounted for through differentiation on SC in the tree models. This hypothesis is supported by previous studies which have found SC to be a mediating factor in the relationship between bullying and mental health indicators [73,74]. Aside from this factor, the decision trees captured the key distinguishing factors in more parsimonious, flexible, and easily interpretable models than LME, allowing for effective knowledge translation. Decision trees also identified underlying non-linear associations for SC, as can be seen by this factor being split recursively across different cut points. This highlights the ability of decision trees to capture complex relationships that are often missed when using standard regression analysis.
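The post hoc comparison of SC between bullied and non-bullied students reported above rests on a standardized mean difference. As a minimal sketch of how such an effect size can be computed, the Python below calculates Cohen's d with a pooled standard deviation, along with the matching equal-variance t statistic. The function names and the example scores are hypothetical and are not taken from the COMPASS data, and the study's exact test specification (for example, pooled versus Welch) is not stated here, so the pooled form is an assumption.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed form)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def pooled_t_statistic(group_a, group_b):
    """Equal-variance two-sample t statistic, expressed through Cohen's d."""
    n_a, n_b = len(group_a), len(group_b)
    return cohens_d(group_a, group_b) / math.sqrt(1 / n_a + 1 / n_b)


# Hypothetical school connectedness scores, for illustration only.
sc_not_bullied = [20, 22, 19, 24, 21, 23]
sc_bullied = [18, 17, 21, 19, 16, 20]

print("Cohen's d:", cohens_d(sc_not_bullied, sc_bullied))
print("t statistic:", pooled_t_statistic(sc_not_bullied, sc_bullied))
```

By the usual rule of thumb, values of d near 0.5 are read as a medium effect, which matches the "moderately strong" description of the reported d = 0.588.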

Strengths and Limitations
This is one of the first studies to use decision tree methods to examine youth depression, anxiety, and flourishing outcomes and associated behavioural and interpersonal risk factors. Unlike past research which commonly used regression approaches, the use of decision trees allows for the identification of key differentiating factors and high-risk subgroups. This study also used hierarchical RE-EM trees which properly account for the clustered nature of the data and are novel to public health research. However, decision tree techniques have limitations, including lower prediction accuracy than other methods, and a tendency to overfit the sample data which is only partially mitigated by pruning. Additionally, while this study benefits from a large sample size, the sampling method used is not representative and therefore results may not be reflective of all Canadian youth. Moreover, this study uses self-report data and thus mental health indicators are not based on clinical assessment. Further, this study is cross-sectional and thus temporality between risk factors and mental health outcomes cannot be inferred. Notably, perceptions of happy home life and school connection may be consequences of or bi-directionally associated with mental health status. Further longitudinal research into the directionality of associations is warranted. Lastly, some measures used in this study contained meaningful amounts of missing data, which could introduce bias if missingness does not occur completely at random. LME models use maximum likelihood estimation and are unbiased when outcome missingness can be explained by the observed covariates; however, this assumption is untestable. RE-EM trees handle missing covariate data using surrogate splits but cannot correct for missing outcome data. Mental health scales were person-mean imputed for students missing 1 or 2 items to partially recover missing responses.
Multiple imputation has been suggested as the preferred approach to handle missing data in Likert-type scales such as the CESD [75]; however, there is limited research into how to apply multiply imputed datasets to the generation of a single interpretable decision tree.
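The person-mean imputation rule described above (fill in a scale only when a student is missing 1 or 2 items) can be illustrated with a short sketch. The Python below is an assumption-laden illustration and not the study's code: the function name, the use of None for a missing item, and the choice to score the scale as a sum of item responses are all hypothetical.

```python
def person_mean_impute(items, max_missing=2):
    """Return a scale total with missing items replaced by the person's mean.

    `items` is a list of item responses with None marking a missing item.
    If more than `max_missing` items are missing (or none are observed),
    the scale score is left missing and None is returned.
    """
    observed = [x for x in items if x is not None]
    n_missing = len(items) - len(observed)
    if n_missing > max_missing or not observed:
        return None
    person_mean = sum(observed) / len(observed)
    # Replace each missing item with the mean of this student's observed items.
    filled = [person_mean if x is None else x for x in items]
    return sum(filled)


# Hypothetical 10-item responses scored 0 to 3, for illustration only.
two_missing = [1, 2, None, 3, 0, 1, 2, None, 3, 0]
three_missing = [1, None, None, 3, 0, 1, 2, None, 3, 0]

print(person_mean_impute(two_missing))
print(person_mean_impute(three_missing))
```

Under this rule, a student with three or more missing items keeps a missing scale score, which is why the approach only partially recovers missing outcome data.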

Conclusions
This study found that, across a range of interpersonal and behavioural factors, having a happy home life and SC were key differentiators of youth anxiety, depression, and flourishing levels. This highlights the influence of home and school environments on youth mental health and supports calls for national policy focus and investment in family and school resources. While having a happy home life is most important in protecting against youth anxiety and depression, a sense of connection to school may mitigate the negative influence of a poor home environment. Schools can also play a meaningful role in contributing to positive mental health among students by cultivating a sense of belonging.