1. Introduction
Improved assessment of male depression is gaining momentum internationally as a means of reducing male suicide, and the construct of a distinct clinical phenotype is central to this work [1,2,3]. Meta-analytic research shows that depression is a significant risk factor for suicide [4], with both male gender and the misuse of alcohol or drugs as important predictors [5]. Underscoring the gendered nature of the problem, worldwide, suicide occurs 1.8 times more frequently among men than women [6]. The growing recognition of suicide as a gendered phenomenon has led to a greater focus on risk factors experienced by men [7,8]. Building on early qualitative work introducing the possibility of a unique profile of externalising and male-type symptoms experienced by some depressed men [9,10,11,12,13,14], practitioners have developed and validated a number of male-specific depression screening tools [15,16,17,18,19]. Currently available male-specific measures have sought to assess a broader range of symptoms (relative to prototypic depression measures) that align with men’s socialisation and gender norm processes. For example, emotional restrictiveness and self-reliance are often promoted among men [20], while externalising behaviours (anger, aggression, alcohol use) may be condoned as responses to male distress [21,22,23]. In seeking to assess these broader domains, male-specific depression measures include symptoms assessing anger and irritability, substance misuse, risk-taking and recklessness, and non-externalising manifestations, including emotion suppression and somatic symptoms, all of which largely fall outside the prototypic symptoms of major depressive disorder [24]. Whereas use of male-specific depression measures continues to grow, psychometric studies are lacking [25].
To date, the most widely validated of the currently available male-specific tools is the Male Depression Risk Scale (MDRS; [26]). Developed using exploratory and confirmatory factor analysis, the MDRS has shown test-retest stability [27], in addition to good sensitivity in detecting men’s suicide risk and cross-nation factor structure stability [28]. The MDRS progresses the early pioneering work of the Gotland research programme into men’s depression, established in Sweden in the 1990s. Whereas the male-type depression syndrome, as originally articulated by Rutz et al. [17], has become a topic of significant interest in the men’s mental health literature [2], conclusive studies are yet to categorically support or refute the construct, with debate enduring [29].
One of the main methodological challenges for advancing the field is the predominance of cross-sectional studies and absence of men’s depression symptom trajectories modelled over time [30]. In seeking to address this gap, foundational work by Rice et al. [27] examined male depression trajectories over 16 weeks relative to stressful life events. In comparison to females, males experiencing stressful life events reported elevated MDRS scores. However, while these early results provided some supporting evidence for a differential male depression symptom trajectory, the design was limited by data collection at only two time points, precluding opportunities for complex modelling to account for within- and between-person changes beyond simple group means.
Using three waves of data (baseline, 3 months, 6 months), we examined a multiple-group (i.e., current treatment yes/no), multiple-domain latent growth model (MDLGM) comparing longitudinal trajectories for the MDRS and Patient Health Questionnaire–Depression Module (PHQ; [31]), a widely used screening tool of the nine criterion symptoms contributing to a diagnosis of major depression [24]. The present study had four overarching aims. First, the study aimed to provide psychometric reliability data on the MDRS, benchmarked against the PHQ, to determine the relative longitudinal internal consistency values using omega and alpha coefficients across the three waves (Aim 1). Second, the study aimed to evaluate structural equation model fit indices for a multiple-group, multiple-domain latent growth model approach including treatment as a covariate predictor (Aim 2). Third, the study sought to determine whether baseline differences existed on PHQ and MDRS scores according to whether men self-reporting a mental health problem were or were not accessing treatment at baseline (Aim 3). Finally, the study aimed to assess whether trajectories of change for PHQ and MDRS varied as a function of experiencing mental health problems either with or without treatment engagement (Aim 4).
3. Results
3.1. Sample Characteristics
On average, the participants were aged 38.35 years, standard deviation (SD) = 14.09 (range = 18–73). Men who were currently accessing treatment were, on average, 4.5 years older (M = 40.37, SD = 13.67) than those not currently in treatment (M = 35.69, SD = 14.27; t(232) = −2.54, p = 0.012), and tended to be in higher income brackets (χ²(5) = 11.85, p = 0.037). There were no group differences for sexual orientation (76.1% heterosexual; 10.7% homosexual; 11.1% bisexual), student status (22.2%), ethnicity (2.6% Aboriginal; 0.9% African; 2.1% Asian; 0.9% Hispanic; 84.2% Caucasian; 6.0% multiple ethnicities; 3.4% other), relationship status (45.7% single; 29.1% married; 15.4% committed relationship; 6.0% divorced; 3.8% separated) or self-rated general health (5.1% excellent; 18.4% very good; 40.2% good; 30.3% fair; 6.0% poor). Among the 234 men self-reporting a mental health problem at baseline, 133 (56.8%) reported that they were currently accessing mental health treatment. This decreased to 78 (33.3%) and 71 (30.3%) at 3 and 6 months, respectively. Most participants resided in Canada (n = 138; 59.0%), with the remaining participants residing in the US (n = 42; 17.9%), UK (n = 18; 7.7%), Australia (n = 19; 8.1%) or elsewhere (n = 17; 7.3%).
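The age comparison above can be checked from the published summary statistics alone. The sketch below assumes a pooled-variance (Student) t-test, which is consistent with the reported 232 degrees of freedom, and assumes group sizes of 133 (in treatment) and 101 (234 minus 133, not in treatment). The function name is illustrative, and the sign of t depends on the order in which the groups are subtracted.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t for two independent groups, computed from summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df

# Group a: not in treatment (n assumed to be 234 - 133 = 101)
# Group b: currently in treatment (n = 133)
t_value, df = pooled_t(35.69, 14.27, 101, 40.37, 13.67, 133)
print(f"t({df}) = {t_value:.2f}")
```

Because the means and standard deviations are rounded to two decimals in the text, the recomputed statistic may differ from the reported value in the second decimal place.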
3.2. Baseline Differences—PHQ, MDRS
Skewness and kurtosis values were all within the normal range of ±2.0, supporting univariate normality, with multivariate normality established via elliptical plots [39]. Descriptive statistics for the individual MDRS items and MDRS and PHQ total scores are presented in Table 1.
Consistent with a help-seeking population, on average, at baseline, participants were in the ‘moderate depression’ range on the PHQ (M = 16.52, SD = 6.34) and the ‘elevated risk’ range for the MDRS (M = 45.00, SD = 21.10). Three MANCOVAs were conducted with baseline PHQ and MDRS items, and the six MDRS subscales as dependent variables, current treatment engagement as the independent variable and age as covariate. There was no multivariate effect observed for current treatment for the PHQ items (Λ = 0.947, F(9, 223) = 1.40, p = 0.191, partial η² = 0.053). In contrast, there was a large multivariate effect for the 22 MDRS items (Λ = 0.835, F(22, 210) = 1.89, p = 0.012, partial η² = 0.165), which attenuated to a moderate multivariate effect when the six MDRS subscales were evaluated as dependent variables (Λ = 0.931, F(6, 226) = 2.78, p = 0.012, partial η² = 0.069). Age was not a significant covariate, at either the multivariate or univariate level, for any analysis. As can be seen from Table 1, three MDRS item scores were significantly higher for men not in treatment than those currently in treatment. At the MDRS subscale level, those not in treatment reported higher scores than those currently in treatment for the emotion suppression (F(1, 231) = 8.91, p < 0.001, partial η² = 0.037) and risk-taking (F(1, 231) = 7.69, p = 0.006, partial η² = 0.032) domains. Finally, ANCOVAs were undertaken for the MDRS-22 and PHQ total scores. As shown in Table 1, higher baseline MDRS-22 scores (but not PHQ-9 scores) were observed for those not engaged in current treatment (F(1, 231) = 6.35, p = 0.012, partial η² = 0.027).
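The effect sizes reported in this section can be recovered from the test statistics themselves. As a minimal sketch, partial η² follows from an F value and its degrees of freedom, and for a two-group (single degree of freedom) multivariate effect it equals one minus Wilks' Λ. The function names are illustrative and not taken from any statistics package.

```python
def partial_eta_sq_from_f(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def partial_eta_sq_from_wilks(wilks_lambda):
    """Multivariate partial eta squared for a single-df effect (two groups)."""
    return 1.0 - wilks_lambda

# F statistics and Wilks' lambda values as reported in the text
print(round(partial_eta_sq_from_f(1.40, 9, 223), 3))    # PHQ items -> 0.053
print(round(partial_eta_sq_from_f(1.89, 22, 210), 3))   # MDRS items -> 0.165
print(round(partial_eta_sq_from_f(2.78, 6, 226), 3))    # MDRS subscales -> 0.069
print(round(partial_eta_sq_from_f(6.35, 1, 231), 3))    # MDRS-22 total -> 0.027
print(round(partial_eta_sq_from_wilks(0.835), 3))       # MDRS items -> 0.165
```

Both routes give the same multivariate values here, which is a quick consistency check on the reported Λ, F and partial η² figures.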
3.3. Internal Consistency and Correlations Across Waves
To examine Aim 1, reliability coefficients were evaluated. Both alpha and omega coefficients supported the reliability of the PHQ and MDRS, with comparable values reported for each scale across the three time points (see Table 1 for MDRS subscales; Table 2 for total scores). Robust (all p < 0.001) intercorrelations were observed between the PHQ and MDRS total scores, ranging from moderate to strong associations.
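For readers less familiar with the two reliability coefficients, the following sketch shows how each is commonly computed. It assumes complete item-level responses for alpha and standardised loadings from a one-factor model for omega. The toy data and loadings are made up for illustration and are not study values, and the study's own software and estimation settings are not described in this section.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def mcdonald_omega(loadings):
    """Omega total from standardised loadings of a one-factor model."""
    common = sum(loadings) ** 2
    unique = sum(1 - lam**2 for lam in loadings)
    return common / (common + unique)

# Made-up responses (4 respondents x 3 items), for illustration only
toy = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 5, 5]]
print(round(cronbach_alpha(toy), 3))
print(round(mcdonald_omega([0.7, 0.7, 0.7]), 3))
```

Alpha treats all items as equally related to the construct, whereas omega weights items by their loadings, which is one reason omega is often regarded as the more rigorous of the two.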
3.4. Latent Growth Modelling
To examine the subsequent aims, we first fitted an MDLGM to establish model fit and then added current treatment at baseline as a time-invariant predictor. Linearity was confirmed, as we observed a decrement to the chi-square value for the competing quadratic (i.e., curved) model. As significant associations were expected between modelled slope and intercept values for the PHQ and MDRS, these terms were allowed to correlate in the model. Initial model fit indices indicated that there was need for model improvement (CFI = 0.954, TLI = 0.902, RMSEA = 0.182, SRMR = 0.0259). Significant covariance estimates were observed between the PHQ and MDRS intercepts and slopes, indicating that these variables tended to vary in similar ways across the time points. Modification indices were inspected, indicating that a substantial parameter change would occur by freeing (i.e., correlating) the error terms for the PHQ and MDRS at the 3-month time point. Atheoretical post-hoc model re-specification should be avoided, as it risks incorrect model specification [44]. However, given that previous research has highlighted the significant positive longitudinal association between the PHQ and MDRS [27], there was a rationale for permitting the error estimates for these variables to correlate, especially given the association between these constructs within the same (i.e., 3-month) time point [45].
Addressing Aim 2, the re-specified model yielded excellent fit statistics (CFI = 0.992, TLI = 0.980, RMSEA = 0.070, SRMR = 0.0281), and this model became the basis of interpretation and further analysis. The slope values for both the PHQ (−1.376, p < 0.001) and MDRS (−3.917, p < 0.001) were negative, showing that scores for both the PHQ and MDRS, on average, decreased between baseline and 6 months (i.e., symptoms marginally improved, with scores becoming less severe by one point on the PHQ and almost four points on the MDRS).
When the within-domain covariance was examined (i.e., covariance between the intercept and slope related to the same construct), the estimated covariance between the intercept and slope factors for PHQ was not statistically significant (p = 0.193). This indicated no difference in the PHQ rate of change between baseline and 6 months relative to baseline PHQ scores. In contrast, the estimated covariance between the intercept and slope factors for MDRS was statistically significant (p < 0.001). The negative estimate value (−66.720) suggests that men whose MDRS scores were high at baseline demonstrated a lower rate of change in these scores over the 6-month period than was the case for men whose MDRS scores were lower at Time 1 (even though, on average, MDRS scores went down over time, men with higher baseline MDRS scores improved less quickly than men with lower MDRS scores). Turning to the first between-domain covariance (MDRS slope/PHQ slope), a very strong relationship between the standardised coefficients (r = 0.823; p < 0.001) indicated a longitudinal association between the MDRS and PHQ (as men’s MDRS scores between baseline and 6 months underwent a strong decrease, so too did their PHQ scores). Similarly, the covariance for the PHQ and MDRS intercepts was also significant (r = 0.667; p < 0.001), indicating that men reporting higher MDRS scores also tended to have higher PHQ scores. These findings revealed robust inter-individual differences in both the initial scores of PHQ and MDRS at baseline and their change over 6 months. Such evidence of inter-individual differences provides powerful support for further investigation of variability related to the growth trajectories [39], in particular, the incorporation of predictors into the model to explain variability.
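To make the intercept and slope factors more concrete, the sketch below fits an ordinary least-squares line to each person's three scores and then takes the covariance between the resulting intercepts and slopes. This is only an intuition aid under stated assumptions: it is not the structural equation estimator used for the MDLGM, the time codes 0, 1, 2 are assumed, and the scores are made up rather than drawn from the study.

```python
from statistics import mean

WAVES = [0, 1, 2]  # assumed time codes: baseline, 3 months, 6 months

def person_trajectory(scores, times=WAVES):
    """Ordinary least-squares intercept and slope for one person's scores."""
    t_bar, y_bar = mean(times), mean(scores)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

def sample_cov(xs, ys):
    """Sample covariance between two equally long sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up scores for three people across the three waves (illustration only)
toy_scores = [[60, 55, 50], [40, 38, 36], [20, 19, 18]]
intercepts, slopes = zip(*(person_trajectory(s) for s in toy_scores))
print(intercepts, slopes, sample_cov(intercepts, slopes))
```

In a latent growth model the intercepts and slopes are latent factors estimated jointly with measurement error, so the model-based covariance will generally differ from this descriptive version.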
3.5. Effect of Current Treatment
Provided with evidence of inter-individual differences, we then asked whether, and to what extent, current treatment might explain this heterogeneity. In particular, we asked if PHQ and MDRS scores differed for those who were either currently accessing or not accessing treatment (Aim 3). Additionally, we asked if trajectories of change for the PHQ and MDRS varied as a function of experiencing mental health problems either with or without current treatment (Aim 4). The subsequent model, including the predictor of baseline current treatment, showed good model fit, χ²(8) = 20.23, p = 0.010, CFI = 0.964, TLI = 0.986, RMSEA = 0.081, SRMR = 0.033 (see Figure 1). Table 3 shows that current treatment was not a statistically significant predictor of PHQ scores at baseline (−1.202, p = 0.145), but current treatment did predict PHQ rate of change (0.897, p = 0.033). Given a coding of 0 for ‘no current treatment’ and 1 for ‘current treatment,’ these findings suggest that the rate of change was faster (by 0.897 PHQ points over 6 months) for those reporting baseline current treatment than for those reporting no current treatment. Results for the MDRS indicated that current treatment was a statistically significant predictor of both initial MDRS severity (−7.476, p = 0.006) and rate of MDRS change (2.749, p = 0.018). These findings suggest that MDRS scores were lower (i.e., better) for men reporting current treatment, and men reporting current treatment reported a faster rate of improvement on MDRS scores by 2.749 points over the 6-month period compared to men not accessing treatment. When current treatment at 3 and 6 months was introduced as a time-variant predictor, the multidomain model showed very poor model fit according to all indices (CFI = 0.886, TLI = 0.773, RMSEA = 0.172, SRMR = 0.129). This indicated that the present dataset was unable to test the longitudinal impact of treatment, a likely function of the comparatively small sample size for complex SEM models.
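The RMSEA reported for the model with baseline current treatment can be reproduced from its chi-square statistic. The sketch below assumes the full sample of N = 234 and the common formulation that divides by df × (N − 1); some software divides by df × N instead, which gives the same value to three decimals here. The function name is illustrative.

```python
import math

def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its df and the sample size."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Chi-square(8) = 20.23 as reported above, with N assumed to be 234
print(round(rmsea(20.23, 8, 234), 3))  # -> 0.081
```

Because the sample size sits in the denominator, the same chi-square excess over its degrees of freedom yields a smaller RMSEA in larger samples.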
In terms of the effect magnitude (R² values), while still proportionally low, current treatment accounted for over three times as much variance in MDRS intercept values (3.4%) as it did in PHQ intercept values (1.1%). There was no differentiation for current treatment on the rate of change for the PHQ or MDRS (3.2% each). For baseline PHQ scores, 76.0% of the variance was accounted for by predictors (i.e., by the intercept, slope, and current treatment). In contrast, 92.0% of the variance in baseline MDRS scores was explained. This indicates that the MDRS trajectory model was better able to account for baseline MDRS scores than the PHQ trajectory model was for predicting baseline PHQ scores. At 3 and 6 months, the proportion of variance accounted for was largely equivalent between the PHQ (3 months: 64.1%; 6 months: 86.2%) and MDRS (3 months: 65.5%; 6 months: 88.3%). This shows that the PHQ and MDRS appeared to have similar measurement utility over time, while also suggesting that, due to the higher R² value at baseline, the MDRS may better identify men’s symptom domains relative to current treatment compared to the PHQ.
4. Discussion
Longstanding commentaries and emergent empirical work have highlighted the possibility that men’s depression may be missed clinically as a by-product of residing outside generic screens (e.g., the PHQ), which may be insensitive to men’s socialisation processes and internalised traditional gender norms [46,47]. By comparing the MDRS and PHQ longitudinally in a sample of men who were in and out of treatment, the current study raises clinically important considerations. The PHQ is a widely used and validated measure of prototypic depression symptoms [32] and is therefore an important point of comparison for the MDRS. In this study (and consistent with prior work [27]), both scales showed satisfactory internal consistency across the three waves of data for both the alpha and the more rigorous omega reliability coefficients. The finding that reliability coefficients were comparable between the two scales indicates that the MDRS and PHQ consistently measure their target constructs. That said, in comparison to the PHQ, the MDRS may have greater sensitivity to detecting change for men currently in treatment (and who were reporting a mental health (MH) problem) compared to those not in treatment (and who were reporting an MH problem). Supporting the utility and responsiveness to treatment, for both PHQ and MDRS, improvement was faster for men reporting a mental health problem who were currently accessing treatment. Of note, the baseline between-group analysis indicated no difference for PHQ scores but a significant difference for MDRS scores relative to current treatment, which suggests that the MDRS may be better able to differentiate men’s treatment response to mental health intervention than the PHQ. Nonetheless, this is a finding that needs to be replicated in future work.
Broadly speaking, mean baseline MDRS subscale and item scores were higher in the present sample compared to prior MDRS research undertaken with the general population [28]. This is perhaps unsurprising given that the participants in the present sample were seeking information on men’s depression via the HeadsUpGuys website [33], and were therefore more likely to be symptomatic compared to men in the general population. Of the six MDRS subscales, the emotion suppression and risk-taking domains were significantly higher at baseline for men not accessing treatment than treatment-engaged men. At the individual MDRS item level, these effects appeared largely driven by three items assessing stoicism (e.g., working things out independently) and recklessness (e.g., stopping caring about consequences of actions, taking unnecessary risks), though nonsignificant univariate trends (p < 0.10) for higher scores in men not engaged in treatment were also observed for items assessing bottling up negative feelings, overreaction with aggressive behaviour, requiring drugs to cope, and somatic symptoms (e.g., heartburn, aches). Though speculative, it may be the case that these domains serve as treatment barriers in their own right, with men trying to avoid or suppress uncomfortable emotions deliberately (a cognitively demanding state that confers health risks [48,49]) or alternatively through distraction routines that may co-occur with risk-taking behaviours (possibly as a means of enacting a sense of control [9]). The MDRS total score subsumes these domains, which are strongly correlated with the PHQ, yet also distinct from the internalising prototypic depression symptoms that are assessed by the PHQ (e.g., anhedonia, sadness, guilt), which is a strength of the scale in providing a broader perspective on men’s depression or distress.
When the dual growth curve models were evaluated without current treatment as a predictor, the within-domain covariance indicated that there was no difference on the PHQ rate of change between baseline and 6 months relative to the baseline PHQ score. In contrast, the MDRS intercept-slope covariance was significant (p < 0.001) with a negative estimate value, indicating that men with higher baseline MDRS scores improved more slowly than men with lower MDRS scores. This suggests that MDRS severity at baseline (but not PHQ severity) reduces the rate of change that can be expected, indicating that when MDRS domains are more severe, change is harder to achieve. This finding suggests a potential differential responsiveness to treatment assessed by the PHQ and MDRS, especially for those men at the severe end of the scales. The strong between-domain covariance (r = 0.823, p < 0.001) indicated that the trajectory of MDRS score changes co-occurred with PHQ changes, and the moderate-strong intercept correlation indicated that men reporting higher MDRS scores also reported higher PHQ scores. These findings show that the MDRS and PHQ domains ‘travel together in time,’ supportive of the putative function of domains assessed by the MDRS that place men at risk of major depression.
The observed inter-individual variation justified the inclusion of a predictor to explore potential group differences. Given that the entire sample self-indicated the presence of a mental health problem, engagement in current treatment was considered an important variable to explore. Goodness-of-fit indices indicated that including current treatment as a predictor resulted in excellent model fit (although if the more stringent criterion of RMSEA < 0.08 is applied [39], the RMSEA value could be considered marginal), highlighting the importance of this variable in accounting for the observed inter-individual variation. Results indicate that men currently in treatment reported significantly lower MDRS scores at baseline, but they did not report significantly lower PHQ scores at baseline. This is of note, as the initial dual model indicated that, unlike for PHQ scores, men with higher MDRS scores experienced less improvement over time. Whereas the present data did not allow us to identify how long men were in treatment (which may impact PHQ and MDRS scores), the models included three waves of data from all participants, and the MDLGM approach accounted for inter-individual difference. Regardless of the amount of time men had received current treatment, those currently in treatment tended to have lower MDRS scores, but not lower PHQ scores, than men reporting a mental health problem and not in treatment. Therefore, while MDRS severity resulted in less change over time, MDRS domains appeared amenable to intervention, and this change can be assessed by the scale. Further, when examining the slope statistics (i.e., improvement over time), both the PHQ and MDRS improved more rapidly for men currently in treatment compared to those not in treatment (again, this shows that both scales are able to detect change associated with current treatment). These findings support application of the MDRS in clinical settings.
A range of study limitations and future directions should be considered. Whereas the present sample was strengthened by the use of three waves of data, it was limited by size, as samples of 200 are considered the minimum for valid growth curve analyses. Nonetheless, we observed a robust correlation between the PHQ and MDRS slopes, and Lee and Whittaker [37] suggest that researchers can be confident in statistically significant group differences when effect sizes are at least moderate (e.g., 0.50) in sample sizes as small as 200. That said, had the sample exceeded 400, we could have expected sufficient power to estimate time-variant effects and potentially a more favourable RMSEA value [41]. It is also important to bear in mind that results may be biased as a function of the comparatively small (8.3%) proportion of study respondents from the larger baseline sample (n = 3769) who identified a mental health problem and provided data at waves 2 and 3. Our sampling method also introduced the risk of bias, given that participants visiting the HeadsUpGuys site were help-seeking or proactively seeking information on men and depression. This limits the generalisability of findings to non-help-seeking populations. Nonetheless, severity of prototypic depression symptoms was equivalent at baseline between those accessing and not accessing mental health support. Hence, group effects were not due to differences in depressive severity. Furthermore, while analyses explored whether men were engaged in current treatment at baseline, model fit indices indicated that inclusion of current treatment as a time-variant covariate at 3 and 6 months yielded a poor fit to the data. We were unable to determine whether this was a function of sample size (which we believe is the likely explanation given the good model fit achieved when baseline current treatment was a covariate), or was instead suggestive that the longitudinal modelling of current treatment inadequately explains differing growth trajectories for the PHQ or MDRS. Future longitudinal studies drawing on larger samples are therefore needed. In the present study, participants self-identified as having a mental health problem. This was not validated by clinical interview or diagnosis. Further, the actual mental health problem(s) that men were referring to was not captured, and data were not captured on the severity or duration of this problem or treatment modality accessed. All participants were recruited via the HeadsUpGuys website, which provides men’s depression psychoeducation. Results should be verified in a broader population of men who are not necessarily actively seeking mental health information. It is recommended that further psychometric evaluation of the MDRS be undertaken, including invariance testing and the further establishment of a hierarchical factor structure model where the six MDRS domains load on a latent factor for male depression risk, e.g., [26], or testing a unidimensional model of the MDRS regarding use of the scale total score, e.g., [50]. At present, the MDRS uses an 8-point response scale, and as a 22-item tool, it may be too lengthy for use in primary care settings. Efforts are currently underway to validate an MDRS short form using a condensed response format, in addition to cross-nation validation and translation [51]. These developments may increase the likelihood of the MDRS being used adjunctively with brief prototypic depression measures such as the PHQ.
The construct of intersectionality and its application to the field of men’s mental health is growing [52]. Intersectionality focuses on the connecting and overlapping aspects of personhood and identity [53] and is increasingly used in the field of gender and health internationally [54]. Better understanding the ways in which prototypic and male-specific symptoms of depression intersect with domains of sexuality, social class, race/ethnicity and income, and corresponding links to maladaptive behaviours and suicide risk, is an important future endeavour for the field [55]. Finally, given that the present data were collected from a help-seeking population, they were not suited to evaluating whether the MDRS is able to detect a subgroup of men that may be missed on the PHQ but who go on to develop a major depressive illness. An answer to this question is needed to evaluate the true value and utility of the MDRS as (i) a measure of a potential prodromal depression state for men or (ii) a means of conclusively determining whether the putative male depression subtype exists.