Gender Differences in Treatment Outcomes for Eating Disorders: A Case-Matched, Retrospective Pre–Post Comparison

Eating disorders (EDs) are increasingly emerging as a health risk in men, yet men remain underrepresented in ED research, including interventional trials. This underrepresentation of men may have facilitated the development of women-centered ED treatments that result in suboptimal outcomes for men. The present study retrospectively compared pre- vs. post-treatment outcomes between age-, diagnosis-, and length-of-treatment-matched samples of n = 200 men and n = 200 women with Anorexia Nervosa (AN), Bulimia Nervosa (BN), Binge Eating Disorder (BED), or Eating Disorder Not Otherwise Specified (EDNOS), treated in the same setting during the same period, and using the same measurements. Compared to women, men with AN showed marked improvements in weight gains during treatment as well as in ED-specific cognitions and general psychopathology. Likewise, men with BED showed marked weight loss during treatment compared to women with BED; ED-specific cognitions and general psychopathology outcomes were comparable in this case. For BN and EDNOS, weight, ED-specific cognitions, and general psychopathology outcomes remained largely comparable between men and women. Implications for treatments are discussed.


Introduction
Eating disorders (EDs) are of increasing public health concern [1]. Characterized by body image disturbances, abnormal eating patterns, and weight-control behaviors [2], an estimated 2.6 to 8.4% (women) and 0.7 to 2.2% (men) of the global population suffer from Anorexia Nervosa (AN), Bulimia Nervosa (BN), Binge Eating Disorder (BED), and other EDs during their lifetime [3][4][5]. EDs pose one of the highest mortality risks among mental disorders [6] and are associated with adverse physical and mental health outcomes across multiple domains of functioning [7]. Globally, the disability-adjusted life years (DALYs) for EDs amount to 43.36 per 100,000 individuals, with data for Western Europe suggesting a burden of 112.27 DALYs [8]. The healthcare and economic costs of untreated EDs are substantial [9], emphasizing the importance of tailoring treatments toward the needs of patients with EDs.
Counter to the widespread perception that EDs primarily affect adolescent girls and women [10,11], EDs increasingly emerge as a health risk in men [12][13][14]. Although overall still lower, men's prevalence rates have increased faster than women's prevalence rates since 1990 (by 22% vs. 12% to 117.9 vs. 231.5 per 100,000 men and women in 2019, respectively [15]). Similarly, men's DALYs increased by 0.70% annually, compared to a 0.63% annual increase for women [8]. These findings suggest that men could make up every third clinical ED case, although there is agreement that the available data still [...]
[...] have you consciously tried to limit the amount of food you eat to affect your figure or weight?"), Eating Concern (e.g., " . . . has thinking about food, eating, or calories made it difficult for you to focus on things that interest you?"), Weight Concern (e.g., " . . . did you have a strong desire to lose weight?"), and Shape Concern (e.g., " . . . did you feel fat?"). Items were rated on a 7-point scale (from 1, never, up to 7, every day). Mean scores are computed for each subscale and a global score is computed for the overall questionnaire. Six additional open-ended questions assess the frequency of compensatory behaviors and objective binge episodes.
Patients' body perception and body image were assessed using the FBeK (Fragebogen zur Beurteilung des eigenen Körpers), which is a widely used questionnaire in Germany for assessing individuals' subjective views of their own bodies [57]. The FBeK includes 52 statements evaluated in a yes or no format and assesses body perception and body image on four subscales related to Physical Attractiveness and Self-confidence (e.g., "I am satisfied with my weight and with my size"), Accentuation of Physical Appearance (e.g., "I often and gladly look at myself"), Insecurities and Concerns related to bodily processes (e.g., "My body has a mind of its own"), and Physical/Sexual Discomfort (e.g., "I do not like being touched"). The manual provides gender-based percentile ranks of subscale means that were used for analysis.

General Psychopathology
General psychopathology was assessed using the symptom checklist SCL-27-plus, a short, multidimensional screening instrument that contains 28 items across five subscales for depressive (e.g., "loss of joy"), vegetative (e.g., "nausea"), agoraphobic (e.g., "fear of leaving the house alone"), and sociophobic symptoms (e.g., "feeling of being unwanted") as well as a subscale for pain (e.g., "headache"), a global symptom severity index, a lifetime assessment for depressive symptoms, and screening questions for suicidality [58]. Symptoms are rated on a 5-point scale (from 0, never, to 4, very often), with additional dichotomous ratings for lifetime depression (occurrence of depressive symptoms for more than two weeks), and frequency estimates for suicidal ideations and suicide attempts. Among patients with EDs, the SCL-27-plus mean scores have demonstrated good reliabilities and sensitivity to change [59].
The Beck Depression Inventory (BDI-II) [60] was also included as a widely used self-report inventory to measure the severity of depression in adults. The BDI-II contains 21 items, each scored on a 4-point scale, with sum scores ranging between 0 and 63.
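The sum scoring described above can be sketched as follows. This is only an illustration: the function name is hypothetical, and the 0 to 3 item coding is inferred from the stated 4-point scale and the 0 to 63 score range rather than quoted from the inventory's manual.

```python
def bdi_ii_sum(responses):
    """Sum score for 21 items, each assumed to be coded 0..3."""
    if len(responses) != 21:
        raise ValueError("expected exactly 21 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item must be coded 0, 1, 2, or 3")
    return sum(responses)


# The two extremes reproduce the stated range of the sum score.
lowest = bdi_ii_sum([0] * 21)
highest = bdi_ii_sum([3] * 21)
```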

Statistical Analysis
Patient and treatment characteristics were compared between ED and gender groups using univariate analysis of variance (ANOVA).
Weight outcomes were transformed to body mass index (BMI) (kg/m²) scores using patient admission height data. Similar to Strobel et al. [51], raw BMI values were further transformed into age- and gender-standardized z-scores using the lambda-mu-sigma (LMS) method [61] and German general population reference data [62,63]. Z-scores indicate the deviation of patient BMI relative to the population mean, allow extremes to be quantified outside the percentile range, and are comparable independent of age and sex [64]. We compared admission zBMI using independent samples t-tests. Changes from admission to end-of-treatment and gender-based comparisons of zBMI change were analyzed using univariate analyses of covariance (ANCOVAs), with initial admission levels and length of treatment as co-variates. Weight trajectories, i.e., the timing-dependent changes from admission at different timepoints during treatment (zBMI_timepoint − zBMI_admission), were further analyzed as a function of gender, ED group, and treatment timepoint using linear mixed-effects (LME) modeling in R package lme4 Version 1.1.28 (R Core Team, Vienna, Austria) [65]. LME models describe an outcome as the linear combination of fixed effects, i.e., the independent predictors, and random effects, such as patient variance. They are ideally suited to analyze continuous data from mixed designs in which each case provides a differently sized dataset [66]. Models were fitted using restricted maximum likelihood (REML) estimation and built empirically using likelihood ratio tests for model comparisons via R's anova() command. For model comparisons involving differences in fixed effects, models were refitted using maximum likelihood (ML) estimation. For parameter estimates of the fixed effects, p-values are based on Type III ANOVA as implemented in the R package car version 3.0.12 (R Core Team, Vienna, Austria) [67]. Pairwise comparisons used R package emmeans version 1.7.2 [68].
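As an illustration of the BMI standardization step, the following minimal sketch applies the standard LMS transformation, z = ((BMI/M)^L − 1)/(L·S), with the logarithmic form when L = 0. The function names and the reference parameters are hypothetical placeholders; in practice L, M, and S would be looked up by age and gender in the reference data, which are not reproduced here.

```python
import math


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 (height taken at admission)."""
    return weight_kg / height_m ** 2


def lms_z(value, L, M, S):
    """LMS z-score: L = Box-Cox power, M = median, S = coefficient of variation."""
    if abs(L) < 1e-12:
        # Limiting case of the transformation when L is zero
        return math.log(value / M) / S
    return ((value / M) ** L - 1.0) / (L * S)


# Hypothetical reference parameters for one age and gender stratum (not real data)
REF = {"L": -1.5, "M": 22.0, "S": 0.12}

# Hypothetical patient: weight at admission and at a later timepoint, same height
z_admission = lms_z(bmi(45.0, 1.70), **REF)
z_timepoint = lms_z(bmi(47.5, 1.70), **REF)

# Trajectory value used as the outcome: zBMI at timepoint minus zBMI at admission
zbmi_change = z_timepoint - z_admission
```

A positive `zbmi_change` indicates weight gain relative to admission, and a negative value indicates weight loss.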
Patient questionnaire responses were aggregated according to each questionnaire's specifications. Admission differences were evaluated using independent samples t-test. In case of violated assumptions about homoscedasticity, t-tests with adjusted degrees of freedom (df) are reported. Changes from admission to end-of-treatment, and gender-based comparisons of change, were analyzed using univariate ANCOVAs with initial admission levels and length of treatment as co-variates.
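The t-test with adjusted degrees of freedom can be sketched as below. This assumes the adjustment is the Welch-Satterthwaite approximation, which the text does not name explicitly; the function name and the example values are hypothetical, and the p-value lookup is omitted because it requires the t distribution.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for an unequal-variance independent samples t-test."""
    na, nb = len(a), len(b)
    va = variance(a) / na  # squared standard error of mean(a)
    vb = variance(b) / nb  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical admission scores for two groups with unequal spread
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_value, adjusted_df = welch_t(group_a, group_b)
```

With equal group sizes and equal variances, the adjusted df reduces to the usual n1 + n2 − 2.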
For a subsample of 104 men who provided information on previous external inpatient or outpatient treatment, we conducted an additional set of analyses, as described above, using a matched subsample of 104 women, and using the number of previous treatments as an additional covariate. However, because these analyses yielded descriptively similar findings concerning gender differences in treatment outcomes compared to the full sample, with deviations in inferential statistics due to reduced power, these analyses are not reported in detail here.
Descriptive results are reported as means and standard deviations (SDs). The significance level for all analyses was set at p ≤ 0.05. Post hoc pairwise comparisons report Bonferroni-adjusted p-values for multiple comparisons. Effect sizes are reported as η² and Cohen's d. Instead of classical power calculation, which evaluates the strength of evidence against an arbitrarily defined effect, evidence strength for gender differences in treatment outcomes was evaluated using the inclusion Bayes factor in Bayesian ANCOVA [69]. The inclusion Bayes factor provides a continuous measure of support for either H1 (gender modulates an outcome) or H0 (gender does not modulate an outcome) by quantifying the change from prior inclusion odds (i.e., the probability that gender is included as a predictor in a specific statistical model before seeing the data) to posterior inclusion odds (i.e., the probability of including gender in the statistical model after seeing the data). By convention, factors greater than three are considered as evidence for H1 and, vice versa, a Bayes factor smaller than 1/3 indicates evidence in favor of the null [70]. In other words, if the data are three times more likely with gender as a predictor than without gender in the model (BF_incl ≥ 3), the data support H1. If, however, the data are three times more likely in the absence of gender than in its presence (BF_incl ≤ 1/3), the data support rejecting H1 and accepting H0. Though any BF_incl > 1 supports H1 and any BF_incl < 1 supports H0, BF_incl ranging from 1/3 to 3 are considered "anecdotal", suggesting that further research is needed.
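The interpretation rule above can be made concrete with a small sketch. It computes an inclusion Bayes factor as the ratio of posterior to prior inclusion odds and applies the thresholds of 3 and 1/3 described in the text. The function names and the example probabilities are hypothetical; actual values would come from the Bayesian ANCOVA output.

```python
def inclusion_bayes_factor(prior_incl_prob, posterior_incl_prob):
    """Change from prior inclusion odds to posterior inclusion odds."""
    prior_odds = prior_incl_prob / (1.0 - prior_incl_prob)
    posterior_odds = posterior_incl_prob / (1.0 - posterior_incl_prob)
    return posterior_odds / prior_odds


def classify_evidence(bf_incl):
    """Apply the conventional thresholds of 3 and 1/3."""
    if bf_incl >= 3.0:
        return "evidence for H1"
    if bf_incl <= 1.0 / 3.0:
        return "evidence for H0"
    return "anecdotal"


# Hypothetical example: prior inclusion probability 0.5, posterior 0.75
bf_example = inclusion_bayes_factor(0.5, 0.75)
label_example = classify_evidence(bf_example)
```

In this hypothetical case the prior odds are 1 and the posterior odds are 3, so the factor sits exactly at the threshold for H1.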
Statistical analyses and case-control matching were conducted using SPSS Statistics version 28 for Windows (SPSS Inc., Chicago, IL, USA) [71]. Mixed models and plots were calculated in R version 4.1.3 (R Core Team, Vienna, Austria) [72]. Bayes factors were computed using Bayes ANCOVA in JASP version 0.16.1 (JASP Team, Amsterdam, The Netherlands) [73].
The resulting model (see Table 2) revealed significant main effects for treatment timepoint, p < 0.001, ED group, p = 0.004, ED group × timepoint and gender × timepoint interactions, ps < 0.001, which were qualified by a significant timepoint × gender × ED group interaction, p < 0.001. We investigated the interaction further by comparing gender groups within each diagnostic category at every seventh day of treatment, starting at admission (timepoint = 0) and ending after nine weeks (timepoint = 63), at which 90% of patients had concluded their treatment. With increased temporal resolution compared to an ANCOVA (see above), the LME-based comparisons (see Appendix A Table A3) revealed a persistent and significant advantage in weight gain for men over women with AN after the first week of treatment and a persistent and significant advantage in weight loss in men over women with BED after the first week of treatment. Weight change in patients with BN remained comparable until seven weeks of treatment, at which point the model estimated more weight loss in men compared to women, although men remained within the overweight BMI range and women remained within the normal BMI range. No significant gender differences in weight change were estimated at any point during treatment for patients with EDNOS.
Figure 1. Patients' weight trajectories, i.e., the time-dependent changes in age- and gender-standardized BMI (zBMI) from admission to different timepoints during treatment (zBMI_timepoint − zBMI_admission), as a function of gender, ED group, and treatment timepoint. Curved regression lines (with 95% confidence bands) were fitted using function geom_smooth(), method ("gam"), as implemented in R package ggplot2 v. 3.3.5 (R Core Team, Vienna, Austria) [74]. ED = Eating Disorder, AN = Anorexia Nervosa, BN = Bulimia Nervosa, BED = Binge Eating Disorder, EDNOS = Eating Disorder Not Otherwise Specified, BMI = body mass index.
We further evaluated the significance of week-by-week changes in zBMI scores within each gender and ED group, using area under the curve formulae for time-dependent changes [75], to determine the timepoints at which weight changes occurred during treatment. LME-based estimates revealed significant week-by-week weight increases in men with AN over nine weeks of treatment, all ps < 0.001. Women with AN showed weight increases starting in week 4, until week 9 of treatment, all ps < 0.01. Men with BED showed weekly weight reductions between week 1 and week 7 of treatment, all ps < 0.03, whereas women with BED showed weekly weight reductions between week 1 and 5 of treatment, all ps < 0.003. For men with BN, weight reductions were observed between weeks 1 and 4, all ps < 0.01, whereas women with BN showed weight reduction between weeks 1 and 3, all ps < 0.05. Men with EDNOS showed weight reductions during weeks 1 and 2, ps < 0.05, and a weight increase in week 9, p = 0.03. Week-by-week changes were not detectable among women with EDNOS, all ps > 0.13.

Eating Disorder Symptoms' Outcomes
Appendix A Table A1 includes the summary of EDE-Q outcomes. At admission, EDE-Q total scores and subscale scores for Restraint, Eating Concerns, Weight Concerns, and Shape Concerns were comparable between men and women with AN, all |t|s < 0.88, all ps > 0.39, men and women with BN, all |t|s < 0.57, all ps > 0.58, and men and women with EDNOS, all |t|s < 1.29, all ps > 0.22. However, men with BED had lower EDE-Q total and subscale scores than women with BED, all ts < −2.42, all ps < 0.02, suggesting overall lower ED symptom severity at the start of their treatment. There were no admission differences in the frequency of self-reported compensatory behaviors and objective binge episodes between genders within any ED group, all |t|s < 1.7, all ps > 0.10.
Finally, we examined BDI outcomes (see Appendix A Table A1). Men with BED presented with lower severity of depression at admission compared to women with BED, t(225) = −2.25, p = 0.013, d = −0.29, 95% CI (−0.55, −0.04); other admission differences were not significant, all |t|s < 0.9, all ps > 0.38. There were, overall, no significant improvements in BDI scores from admission to end-of-treatment across groups, all Fs < 1.5, all ps > 0.24, and no gender differences for the comparison of change scores, all Fs < 3.8, all ps > 0.07.
Figure 2 plots the inclusion Bayes Factors obtained from Bayesian ANCOVA on gender differences in ED treatment outcomes, separated by ED group (see also Appendix A Table A2). Except for weight outcomes and SCL-27-plus lifetime depression, which provide strong evidence for H1 (i.e., that gender modulates these outcomes), most outcomes compared between men and women with BED (19 of 26) favor H0 (gender parity). In other words, for most outcomes in patients with BED, the data are at least three times more likely under statistical models that do not include gender than under models with that predictor. For patients with AN, about a third of outcomes (8 of 26) provide at least moderate support for H0, with evidence for the remaining outcomes remaining anecdotal, despite nominally supporting H0. Due to their smaller sample sizes, most outcomes comparisons for patients with BN and EDNOS remain within the anecdotal range, though generally favoring H0 over H1.

Discussion
EDs increasingly emerge as a health risk in men [15], yet men remain underrepresented in ED research and interventional trials [18]. Addressing concerns that men's underrepresentation may have facilitated the development of women-centered ED treatments that result in suboptimal outcomes for men [26], we systematically compared immediate treatment outcomes between age-, diagnosis-, and length-of-treatment-matched samples of men and women with AN, BN, BED, and EDNOS, treated at the same clinic during the same time period, and using the same measurements. Compared to their female counterparts, men with AN showed improved weight gains during treatment and improved in ED-related cognitions and general psychopathology. Likewise, men with BED showed improved weight loss during treatment compared to women with BED, with ED-related cognitions and general psychopathology outcomes remaining comparable. For BN and EDNOS, weight, ED-related cognitions, and general psychopathology outcomes remained largely comparable between men and women.
The present findings add to an emerging yet, overall, still sparse body of studies that systematically compare treatment outcomes between men and women with EDs. Consistent with adolescent [45,46] and adult AN samples [51], we observed improved weight gains in men compared to women with AN throughout treatment. Although there were no significant gender differences when comparing age- and gender-standardized BMIs at admission to end-of-treatment, men showed more pronounced weight increases after the first week of treatment. However, the mechanisms responsible for these improved weight gains remain elusive. Moreover, and in contrast to Strobel et al. [51], who observed more pronounced reductions in ED-related cognitions in men with AN long-term, immediate reductions in ED-related cognitions at end-of-treatment remained comparable between men and women with AN in our sample. The increased weight change in the absence of gender differences in ED-related cognitions might suggest higher levels of therapy adherence (i.e., higher capacity for men to implement behavioral change despite the presence of ED-related cognitions) as a possible explanation for gender differences in weight gains. As noted above, however, the possibility remains that traditional ED-specific assessments may simply not have captured improved ED-related cognition outcomes due to traditional measures failing to account for men-associated symptomatology [19]. Further research is needed on the underlying mechanisms of improved weight gains in men with AN.
We observed a complementary pattern of increased weight loss in men with BED throughout treatment, consistent with gender differences found in clinical trial data [50]. The similarities further extend toward ED-specific psychopathology: men with BED in our study presented with less severe ED-related cognitions and a more positive body image than women with BED, although men showed similar improvements to women with BED due to their treatment. Again, the reasons for pronounced weight reductions with simultaneous parity in ED-related cognition outcomes remain elusive, as we cannot exclude that traditional ED-specific assessments may be less sensitive toward capturing men-specific ED psychopathology. Men with BED may show pronounced weight reductions due to higher levels of energy expenditure [76], although increased weight reductions could also reflect differences in therapy adherence. Given the paucity of studies on gender differences, especially for treatment outcomes in BED, further substantiation of these observations and their long-term consequences is needed. Shedding light on the reasons for such differences between men and women may also advance our understanding of the more tenacious course of weight gain in women with AN and support the design of corresponding interventions.
Gender differences were further examined for patients with BN and EDNOS. Like Strobel et al. [51], but unlike Fernández-Aranda et al. [49], we did not observe gender differences between men and women treated for BN. However, given the limited number of patients involved in the comparisons, we caution against strong interpretations. Similarly, we caution against interpreting the absence of gender differences among patients with EDNOS, although the overall pattern of findings favors gender parity.
Thus far, the role of gender and other diversity aspects remains poorly explored in ED treatment settings [13], raising the question of whether men and women with EDs should be treated differently. The present findings suggest two possible implications: First, evidence for gender parity in levels of ED-related and general psychopathology suggests that current diagnostics provide adequate tools for ED assessment across gender groups. At the same time, until future research has thoroughly examined, established or refuted the validity of these tools for cross-gender ED assessments, presumptions and stereotypical expectations about gendered ED presentation should not preclude men from receiving comprehensive diagnostics. Second, observed gender differences in the speed and magnitude of weight changes for AN and BED groups suggest that therapists should employ different criteria when evaluating ED treatment outcomes for men and women. However, further large-scale and controlled comparisons are required in order to develop more specific recommendations.

Strengths and Limitations
To the best of our knowledge, this is one of the first large-scale pre-/post-treatment comparisons of gender differences in ED treatment outcomes involving diagnosis-, age-, and length-of-treatment-matched men and women with AN, BN, BED, and EDNOS. All patients were treated in the same clinic during the same timeframe and completed the same standardized ED-specific and general psychopathology measures. Although the study design precluded strict control of treatment application, the resulting naturalistic setting allows for a more direct evaluation of ED treatment effectiveness across men and women.
Interpreting the current findings is subject to limitations. With data collected exclusively at an ED specialty clinic, evidence for gender parity could be limited to the more severe ED cases admitted to inpatient treatment, or to treatment settings with high levels of expertise and experience. Moreover, given possible deviations from treatment protocols under naturalistic conditions and the retrospective nature of the study, we cannot exclude that therapists may have compensated for specific men's needs that are not addressed during standard treatment. We also only report on immediate treatment outcomes, raising the question of whether long-term outcomes would remain comparable among these patients.
Moreover, as mentioned above, diagnostic criteria and ED-specific assessment tools were developed primarily based on EDs in women's samples, questioning whether their use may have promoted phenotypical homogeneity among the men and women with EDs that were included in our sample. For example, we observed similar levels in weight and shape concerns between gender groups at admission, while previous research shows that men with EDs are often less concerned with thinness [22][23][24] and may seek to increase body mass and muscularity instead [25,26]. "Gold standard" measurement tools of ED psychopathology such as the EDE-Q used within this study [77] do not currently distinguish between drives for muscularity and thinness as non-exclusive causes of shape and weight concerns, leaving the possibility that gender differences in ED psychopathology could have been present without being detected. The extent to which men-associated body image concerns may have or may not have been adequately addressed thus cannot be answered based on the present data.
Finally, the current data provide only limited insights into gender differences concerning risk factors and antecedents of ED development. Still, the observed similarities in ED and general psychopathology at admission may suggest shared risk factors across genders, though their particularities might vary. For example, engaging in sharing technologically enhanced ("filtered") images and comparisons on social networking sites has been linked to body dissatisfaction and eating disorder risk in adolescent women [78], and similar patterns have been observed concerning idealized representations of muscularity in men [79]. However, similar to other aspects of diversity in ED research, gender differences in ED development remain poorly explored [13]. Therefore, further research on ED presentation, assessment, and treatment in men is warranted.

Conclusions
Gender differences in ED treatment outcomes remain under-explored. Our data provide at least moderate support for gender parity and against gender differences among ED and general psychopathology outcomes in BED and AN treatment, with weight outcomes favoring men over women. Further research on underlying mechanisms and gender differences among men-associated ED outcomes is needed.

Data Availability Statement:
The data presented in this study are available upon reasonable request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Tables A1 and A2 present additional descriptive and inferential statistics for ED outcomes analyses, respectively. Table A3 includes additional results for LME modelling of weight outcomes.
* Inferential statistics for univariate analysis. Admission differences were evaluated using independent samples t-test. In case of violated assumptions about homoscedasticity, t-tests with adjusted degrees of freedom (df) are reported. Changes from admission to end-of-treatment, and gender-based comparisons of change, were analyzed using univariate ANCOVAs with initial admission (baseline) levels and length of treatment as co-variates. See