Gender Differences in Developing Biomarker-Based Major Depressive Disorder Diagnostics.

The identification of biomarkers associated with major depressive disorder (MDD) holds great promise to develop an objective laboratory test. However, current biomarkers lack discriminative power due to the complex biological background, and not much is known about the influence of potential modifiers such as gender. We first performed a cross-sectional study on the discriminative power of biomarkers for MDD by investigating gender differences in biomarker levels. Out of 28 biomarkers, 21 biomarkers were significantly different between genders. Second, a novel statistical approach was applied to investigate the effect of gender on MDD disease classification using a panel of biomarkers. Eleven biomarkers were identified in men and eight in women, three of which were active in both genders. Gender stratification caused a (non-significant) increase of Area Under Curve (AUC) for men (AUC = 0.806) and women (AUC = 0.807) compared to non-stratification (AUC = 0.739). In conclusion, we have shown that there are differences in biomarker levels between men and women which may impact accurate disease classification of MDD when gender is not taken into account.


Introduction
Major depressive disorder (MDD) is a major cause of disability, economic burden and mortality around the world [1]. Appropriate treatment early in the course of the disease has been shown to improve prognosis [2] but requires timely and accurate diagnosis.
Traditionally, the diagnosis of major depressive disorder is based on a clinical interview in which subjectively experienced and in part observable symptoms are identified and categorized according to the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases (ICD) [3,4]. However, the diagnosis of major depressive disorder as produced by these
Mean body mass index (BMI) was slightly higher in the MDD than in the control group, and this difference was statistically significant (Table 1). The average number of symptoms (range) was 7.38 (0-9) for males and 7.73 (0-9) for females. Medication was more prevalent in MDD cases than in controls, but no substantial differences between men and women were observed. Medicines used belonged to one of the following categories: neuropsychotropic, cholesterol, cardiovascular, blockers, immune system, metabolic, corticosteroids, (para)sympathetic and other types of medication. Several participants used multiple types of medication.

Gender Differences in Biomarker Levels Irrespective of MDD Status
Serum and urine biomarker concentrations for men and women in the total cohort are presented in Table 2. Out of 28 biomarkers analyzed, concentrations of nine biomarkers in serum and twelve in urine differed significantly between men and women. In serum, concentrations of BDNF total, endothelin and TNF receptor 2 were significantly higher in men, whereas concentrations of alpha-1-antitrypsin, apolipoprotein A1, cortisol, leptin, prolactin and resistin were significantly higher in women. In urine, concentrations of alpha-1-antitrypsin and midkine were significantly higher in males, whereas concentrations of aldosterone, calprotectin, cGMP, cortisol, HVEM, isoprostane, leptin, myeloperoxidase, resistin and substance P were significantly higher in females. Among all significant biomarkers, the effect sizes were highest for leptin in serum and cGMP in urine. The lowest effect sizes were found for alpha-1-antitrypsin in urine and BDNF total in serum.

Gender Differences in Biomarker Levels According to MDD Status
For all MDD cases, we assume that MDD status was retained irrespective of the day of sample drawing (see Supplementary Figure S1 for the Kaplan-Meier analysis). Mean biomarker concentrations according to MDD status are presented in Table 3 with and without gender stratification. Mann-Whitney U and Levene's test values are presented in Supplementary Table S1. In the total population, concentrations of eight biomarkers in serum and five biomarkers in urine were significantly different between MDD and controls. Resistin was the only marker that was significantly different in both serum and urine. When gender stratification was applied, seven biomarkers in serum and four in urine differed between MDD and the control group in men. Among women, four biomarkers in serum and another four in urine differed between MDD and the control group. Of all biomarkers identified by applying gender stratification, six were identified only after stratification and not during the initial analysis: three in serum (apolipoprotein A1, thromboxane B2 and BDNF total) and three in urine (calprotectin, cortisol and α1-antitrypsin). All three of these serum markers were significant in men. Of the urine biomarkers, calprotectin was significant in women, whereas cortisol and α1-antitrypsin were significant only in men.
A comparison of biomarker levels between men and women with MDD, as presented in Table 4, revealed that out of 28 biomarkers, levels of 16 biomarkers were significantly different: seven in serum and nine in urine.

Gender Stratification and Biomarker Panel Selection with QBP
Table 5 presents the relevant biomarkers identified by quantile-based prediction (QBP) with and without stratification. Twenty-eight out of 29 biomarkers were deemed relevant in the non-stratified group. When stratified for gender, the QBP method identified different sets of relevant biomarkers for each gender separately. Endothelin and leptin within serum, and myeloperoxidase and midkine within urine, seemed to be more involved in women's MDD pathophysiology, whereas apolipoprotein A1, EGF and myeloperoxidase within serum and cortisol, substance P and thromboxane B2 seemed to be more involved in men's MDD pathophysiology. Of all biomarkers identified within the stratified groups by QBP, endothelin (serum), myeloperoxidase (urine) and midkine (urine) were identified only by QBP and not previously by the Mann-Whitney test.
Table 5. Overview of biomarkers actively contributing to the AUC of the bio depression score. Under all conditions, BMI is also an active biomarker; biomarkers in bold are significantly different between men and women.

BDS and Biomarker Panel Performance in the Total Group and That Stratified for Gender
The Bio Depression Score (BDS) was calculated based on the optimized inclusion/exclusion criteria for relevant biomarker tails. For the total group, criteria were set at 20%/6%; criteria for men were set at 17%/6% and for women at 17%/7%. Permutation analysis with all 29 biomarkers (including BMI) showed significant BDS discrimination for the non-stratified (p < 0.0001) as well as the stratified biomarker panels (men p < 0.01, women p < 0.05). Figure 1 depicts the distribution of the BDS in the total study group as well as that stratified for gender. There was no significant difference between the mean BDS scores of all MDD groups (non-stratified and stratified). Regression analysis showed no direct interaction between gender, BDS score and MDD status. However, gender seems to have a small confounding effect, as its inclusion in the regression model increased the odds ratio of the BDS score from 1.177 (CI: 1.125-1.232) to 1.205 (CI: 1.147-1.267).

Figure 1.
Bio depression score distribution with and without gender stratification. Each generated bio depression score (BDS) is based on all the actively contributing biomarkers within the quantile-based prediction (QBP) for each configuration.

Disease Classification with ROC Analysis
Without gender stratification, an Area Under Curve (AUC) of 0.739 was calculated. Subsequent gender-specific AUC calculation (with the BDS based on the total group) resulted in an AUC of 0.760 for men and 0.751 for women. Stratified for gender and based on gender-specific criteria in the BDS, the AUC in men (AUC = 0.806) and women (AUC = 0.807) increased, although not significantly (total group BDS vs. men BDS, p = 0.089; total group BDS vs. women BDS, p = 0.090). Receiver operator characteristic (ROC) curves of the non-stratified BDS and gender stratified groups are visualized in Figure 2.
Without gender stratification, 37% of the participants classified by the BDS as having MDD were classified correctly, whereas 33% of the participants classified by the BDS as controls were classified correctly. Correct identification increased when gender stratification was applied: the BDS correctly classified MDD in 40% of men and 44% of women. The number of false positives increased in women, whereas it decreased in men compared to no gender stratification. The percentage of false negatives decreased when gender stratification was applied.

Discussion
The aim of the current paper was to investigate the effect of gender on the discriminative power of potential biomarkers for MDD in serum and urine. First, we focused on gender differences in serum and urine levels of MDD-associated biomarkers, irrespective of MDD, and found that there are several differences in biomarker levels between men and women. Second, we focused on associations of biomarker levels with MDD with and without gender stratification. Without gender stratification, eight biomarkers in serum and five biomarkers in urine were significantly different between the MDD and control group. When gender stratification was applied, differences concerned seven biomarkers in serum and five in urine in men, whereas in women, differences concerned four in serum and four in urine. Six of those biomarkers were only identified as different between MDD and control after applying gender stratification. Last, we investigated the effects of gender stratification on MDD disease classification by a panel of biomarkers using a novel statistical method called quantile-based prediction (QBP). This analysis revealed that when the QBP was applied for each gender separately, the predictive accuracy improved, as evident from an increase in AUC from 0.739 calculated for the total population to 0.806 in men and 0.807 in women, although this increase was not statistically significant. To our knowledge, this is the first study to investigate the modifying effects of gender on the discriminative power of serum- and urine-based biomarkers for MDD.
Before we can accept the findings of our study, some limitations need to be addressed. One of the major issues in the development of biomarker panels is that the performance of specific biomarker panels is measured against the disease state, which in turn is determined by the use of diagnostic instruments. Although all diagnostic instruments are in principle clinically valid, they have their limitations with respect to correctly diagnosing psychiatric disorders like MDD, which makes a biomarker performance assessment only as good as the reference tool used for the diagnosis. Within the current study, we used the Mini-International Neuropsychiatric Interview (MINI), which is a valid instrument for diagnosing DSM-IV disorders [23]. The sensitivity of the MINI is highest for MDD, albeit with a high false-positive scoring rate [23][24][25], which increases the chance of falsely linking biomarkers to MDD. To reduce the possibility of false-positive MDD classification by the MINI, we selected participants with moderate-to-severe MDD as defined by the presence of at least some disability (i.e., staying in bed due to psychopathology, being unable to do normal activity/work at all, being unable to do the normal amount of activity/work or being unable to have normal quality in activity/work). Another important factor is the time between the administration of the questionnaire and drawing biological samples. For a direct correlation, a limited time between the questionnaire and sample drawing is essential because levels of biomarkers are to be related to the disease state at that moment. However, within the current study, the time between the MINI and drawing of biological samples varied greatly; for one participant, this time lag exceeded 512 days. The difference in time makes it more difficult to directly correlate biological markers to disease status, especially for subjects with a larger time difference. 
In these patients, one can question disease status based on the duration of MDD [26]. However, in combination with data on the prevalence of MDD [27], we estimated the chance of false positives in the cohort over time by using a Kaplan-Meier survival model. Results showed that of all participants initially diagnosed with MDD, 90% were still depressed at the time of biological sample drawing. Although 90% seems sufficiently high to assess correlations between biomarker data and disease state, the 10% of misdiagnoses may have had an influence on the performance of our biomarker panel. It is, however, unlikely to have had an effect on the 21 biomarkers which we identified to be significantly different between men and women. With respect to the observed biomarker differences between men and women irrespective of disease status, there are some points worth considering. For this analysis, we used biomarker data from the combined group of controls and MDD patients, which could have skewed the individual differences in biomarker levels, leading to the detection of gender differences that were actually absent. Other factors that may have influenced the differences between men and women and also between control and MDD are BMI, age and smoking. For BMI and age, it is known that they can influence biomarker levels: leptin levels, for example, increase with increasing BMI [28], whereas age affects multiple biomarkers of the Hypothalamic Pituitary Adrenal axis (HPA axis), the immune system and neurotrophic factors, leading either to an increase or decrease of levels [29][30][31]. Although not explicitly investigated within the current study, the previously observed positive correlation between BMI and leptin levels seems to be supported by our data showing that higher BMI and higher leptin levels within serum are associated with MDD (data not shown). 
The effects of age on biomarker levels were not investigated within the current scope of the study, but participants were matched for age, therefore limiting its potential influence. At the same time, this procedure may limit extrapolation to a larger age group. As with age, smoking is another factor which was not within the scope of the study but could potentially explain differences between men and women as well as between controls and MDD. It was previously shown that tobacco use increases plasma levels of BDNF within MDD participants as compared to nonsmoking MDD participants [32]. Next to age, BMI and smoking, the use of prescription medicine may have altered differences in biomarker levels. A recent study has shown that prescription drugs can affect 1 to 250 different proteins/biomarkers and that even after correction for other covariates such as gender, age and smoking, medicine use still accounts for a substantial part of the observed variance in biomarker levels at the population level [33]. Another factor which could have had a substantial effect on the performance of our biomarker panel is the Caucasian descent of our study population, which may limit extrapolation to other ethnic groups. In a recent study, van Buel et al. (2019) showed that the AUC varies using the same set of biomarkers when stratified for Caucasians and other ethnicities [15].
Previous research showed that one of the major issues in developing suitable biomarker panels is finding biomarkers that have, when combined, sufficient discriminative power to be of diagnostic value [5,34]. Ideally, these are biomarkers which are associated with a specific biological pathway. However, due to the heterogeneous nature of MDD, it is hard to determine specific biological markers for MDD, leading to discussions about the validity of a potential biomarker (and by extension biomarker panels). With respect to the currently investigated biomarker panel, biomarkers were included which were carefully selected and covered various biological mechanisms associated with MDD (see Supplementary Table S2). Still, only a subset of biomarkers could be associated with MDD within the study population. It could be that within the current population only a subset of biological pathways were actively involved. The association between individual biomarkers and MDD is, however, not solely dependent on specific biological pathways but also on gender. By applying gender stratification, a different pattern of associated biomarkers emerged which, interestingly, was associated with different biological pathways. Our data suggest that biological mechanisms such as the HPA axis/stress, neuroplasticity, endothelial dysfunction and neuro immune-inflammation are more likely to be involved in male MDD pathophysiology, whereas a strong immune response and endothelial dysfunction/oxidative stress are more likely to be involved in female MDD pathophysiology. Gender-specific pathophysiology patterns are also supported by recent studies showing gender-specific neuroimmune dysregulation [35], male-specific HPA axis dysregulation within MDD related to alcohol abuse [36], and gender-specific differences at the level of immune responses [37].
After gender stratification, the AUC increased in our study. Albeit not significantly, the increase of AUC after stratification suggests a better biomarker performance, which is also evidenced by the change in contributing biomarkers. Within both genders, fewer biomarkers contributed to a higher AUC, suggesting that these biomarkers more closely reflect the underlying biological pathways. Furthermore, a recent analysis performed by our group involving a larger cohort showed that after gender stratification, the AUC increases with fewer contributing biomarkers compared to the non-gender-stratified group. The difference in AUC remained after performing a 5-fold cross-validation (see Supplementary Table S3) [38]. An advantage of the new advanced statistical method we used is that we were able to utilize the full informative potential of our biomarkers with respect to underlying biomarker dynamics and the contribution to disease status, thereby limiting the effects of the high variability in biological pathways involved in MDD. The application of advanced statistical methods is not new: others have shown that by utilizing potential information buried within biomarker data, one can develop biomarker panels with high discriminative power [5,15]. This concept of identifying contributing biomarkers is in line with our QBP method, which showed not only that with gender stratification, different patterns of contributing biomarkers can be identified, but that this method also eliminated non-relevant biomarkers which were previously identified as biomarkers associated with MDD and vice versa.
In future studies utilizing QBP, transforming biomarker data before applying QBP might positively affect performance. Papakostas et al. (2013) have previously shown that data transformation improved their algorithm, resulting in the detection of MDD with high sensitivity and specificity [17]. The QBP method we used can be further optimized by selecting only relevant tails (disease or non-disease-associated) from one biomarker, thereby reducing potential background noise which could interfere with the analysis. Still, without performing extensive data transformations, our method could quite well distinguish between MDD and controls.
In summary, we have shown that gender differences play an important role in not only biomarker identification but also the development of biomarker panels for MDD. Gender stratification increased the discriminative power of our QBP biomarker panel. Further, several points were mentioned which could further increase the performance of our biomarker panel.

Study Design
The present study was a cross-sectional analysis of participant data and biological specimens (serum and urine) from the Lifelines Cohort Study and biobank [39,40]. Lifelines is a large multi-disciplinary prospective population-based cohort study examining risk factors for multifactorial diseases among 167,729 persons living in the north of the Netherlands. It employs a broad range of investigative procedures in assessing the biomedical, socio-demographic, behavioral, physical and psychological factors which contribute to the health and disease of the general population, with a special focus on multi-morbidity and complex genetics. Data and biological specimens are collected every five years. Baseline assessments took place between 2006 and 2013. The present analysis was performed using data from the first two waves. The study complied with the principles enunciated in the Declaration of Helsinki and was approved by the medical ethics review committee of the University Medical Center Groningen. Informed consent was acquired from all participants.

Study Population
The study population consisted of 200 participants with a current MDD diagnosis and 200 age- and gender-matched controls (1:1) from the total Lifelines cohort. Current MDD was assessed according to DSM-IV-TR criteria with a standardized diagnostic interview based on the Mini-International Neuropsychiatric Interview (MINI). Missing data were allowed when they did not interfere with the establishment of a diagnosis (e.g., no data were required for the additional symptoms of MDD when the core criteria were both absent). We selected only MDD subjects with at least one day of disability a month, according to the four disability questions from the MINI (i.e., staying in bed due to psychopathology, being unable to do normal activity/work at all, being unable to do the normal amount of activity/work or being unable to have normal quality in activity/work). Medication use was measured by asking the participants to bring their medication to the baseline interview, where the research assistant noted the corresponding Anatomic Therapeutic Chemical (ATC) code. Controls were selected when they reported no MDD symptoms and did not qualify for another MINI diagnosis. Exclusion criteria included non-Caucasian descent, pregnancy, self-reported substance abuse disorder (question: did you have contact with addiction care in the past 12 months) or self-reported physical illness (i.e., kidney problems, cardiovascular problems, cancer, diabetes type 1 or 2, and thyroid problems).

Biomarker Selection and Measurements
The selection of biomarkers was based on a previous study by our group [15] which identified seven biomarkers in serum and eleven in urine. These biomarkers were supplemented with an additional seven biomarkers in serum and three in urine, based on a literature search using various combinations of the terms MDD, depression, biomarkers, urine, serum and pathophysiology. Table 6 shows an overview of biomarkers selected for the present study. Supplementary Table S2 shows an overview of the biological mechanisms in which they are involved. The concentrations of the biomarkers measured in urine are all expressed relative to the amount of creatinine, calculated as the concentration of the biomarker in urine divided by the concentration of creatinine. In addition, BMI was added as an extra non-matrix-based biomarker due to the high association between BMI and MDD [41].
Biomarker levels in serum and urine were determined by Enzyme-Linked Immunosorbent Assays (ELISA). ELISA kits were obtained from various vendors (see Table 6). ELISA procedures were performed using specific Standard Operating Procedures (SOPs) in which all experimental variables are recorded to generate full experimental traceability for each run. Each SOP was set up according to the manufacturer's instructions with minor modifications (such as adding extra calibrators or altering sample dilutions). An ELISA plate washer (Biorad PW40) was used for all washing steps. TMB absorption measurements were performed on a microtiter plate reader (Thermo Multiskan Spectrum) at 450 nm using 620 nm as a reference wavelength. Unknown biomarker concentrations were determined by the use of a 4-parameter logistic regression (4-PL) model without weight factors [42]. In short, unknown sample concentrations are calculated based on the optimal fit of the standard curve (based on the Optical Density values) calculated within the 4-PL model. A weight factor can be applied to better fit samples with a relatively low concentration, since these samples tend to fit worse than samples falling within the middle and upper parts of the curve [42].
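The back-calculation step of the 4-PL model can be illustrated with a minimal Python sketch. It assumes the common parameterization with a lower asymptote `a`, slope `b`, inflection point `c` and upper asymptote `d`; these names and the example values are hypothetical and are not taken from the assay runs of this study. Fitting the parameters to the calibrators is deliberately left out.

```python
def four_pl(conc, a, b, c, d):
    """Optical density predicted by a 4-PL curve at concentration `conc`.

    a: response at zero concentration, d: response at infinite concentration,
    c: inflection point (concentration at the half-way response), b: slope.
    """
    return d + (a - d) / (1.0 + (conc / c) ** b)


def inverse_four_pl(od, a, b, c, d):
    """Back-calculate a concentration from a measured optical density."""
    low, high = min(a, d), max(a, d)
    if not low < od < high:
        # Outside the asymptotes the curve has no solution.
        raise ValueError("optical density outside the range of the fitted curve")
    return c * ((a - d) / (od - d) - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters for one calibration curve (illustration only).
params = dict(a=0.05, b=1.2, c=50.0, d=2.5)
od_at_inflection = four_pl(50.0, **params)
recovered_conc = inverse_four_pl(od_at_inflection, **params)
```

At the inflection point the predicted optical density lies half-way between the two asymptotes, and inverting that value returns the inflection concentration, which is a convenient sanity check on any fitted curve.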
Since the ELISAs used are provided at the research and development level, performance parameters such as calibration and reproducibility are not well controlled, potentially leading to biomarker variations when measured over various runs. To account for this, biomarker testing was designed to minimize uncontrolled variability by measuring all samples in one run for each biomarker. The sample plate positions were randomized for each run.

Statistical Analysis
Descriptive statistics were calculated for demographic and clinical characteristics of the study population according to MDD status and gender. These characteristics were first compared between MDD cases and controls and subsequently within MDD status between men and women.
Mean levels of biomarkers were first compared in the total population between men and women. Thereafter, we assessed differences in these levels according to MDD status in the total group and while stratifying for gender.
Categorical data were compared using chi-square tests, and continuous data, including differences in biomarker levels, were compared using the non-parametric Mann-Whitney test because they showed a non-normal distribution. Levene's test was applied to assess differences in variance for each individual biomarker. A Kaplan-Meier analysis was performed in order to investigate the effect of the time between the initial diagnosis and sample drawing, which on some occasions was more than 30 days.
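For readers unfamiliar with the Mann-Whitney test, the following Python sketch shows how the U statistic and an approximate two-sided p-value can be obtained. It is a simplified illustration, not the implementation used in GraphPad Prism or SPSS: it relies on the normal approximation without tie or continuity correction, and the function name `mann_whitney_u` is our own.

```python
import math


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, approximate two-sided p-value).

    U counts the pairs in which the value from group_a exceeds the value
    from group_b; tied pairs count one half. The p-value uses the normal
    approximation without tie or continuity correction (an assumption).
    """
    n_a, n_b = len(group_a), len(group_b)
    u = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_two_sided
```

Dividing U by the number of pairs gives the probability that a randomly chosen value from the first group exceeds one from the second, which is the same quantity as the AUC used later for the bio depression score.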
For the binary classification of MDD presence, a newly developed method called quantile-based prediction (QBP) was used [15]. In short, QBP assigns scores to certain percentiles in the left and right tails of empirical biomarker distributions. Tails in which a shift between cases and controls is observed are assigned a non-zero value, and tails where no difference is found are assigned 0. The further the percentile is removed from the 50th, the higher the absolute value of the score. For relevant tails (tails that represent either disease or control), three percentiles (P10, P5 and P1, or P90, P95 and P99) are selected and receive a value of 1, 2 or 3, respectively. For each single biomarker, a positive or negative score is assigned such that a positive score is associated with MDD and a negative score with control status. The sum of scores for all biomarkers is the Bio Depression Score (BDS) for each participant based on the relevant tails. Next, we performed a receiver operator characteristic (ROC) analysis by relating the score to the presence of MDD for various cut-offs. The AUC was calculated as a cut-off-independent measure of the discriminatory power of the score. The next step involved optimization of the selection criteria of relevant biomarker tails by empirically making the criteria for inclusion of relevant biomarkers more or less stringent. This was accomplished by varying the threshold on the exceed ratio (ratio between the dominant and non-dominant group for each of the three percentiles). By varying the threshold at the 90th and 95th percentiles and the 5th and 10th percentiles, the AUC increased or decreased. The optimal criteria were set at the level where the AUC started to decrease and are expressed as percentages. The AUCs of the total as well as the gender-specific groups were determined based on the BDS score calculated by the QBP method including the total population. 
For the gender-stratified analysis, the AUCs were determined based on the BDS estimated in men and women separately.
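The scoring step of QBP described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the LabVIEW implementation of [15]: the percentile cut-offs are taken from a single reference distribution, the relevance and direction of each tail are supplied by the caller rather than derived from the exceed ratio, and all function and marker names are hypothetical.

```python
from statistics import quantiles


def tail_cutoffs(values):
    """P1, P5, P10, P90, P95 and P99 of an empirical biomarker distribution."""
    cuts = quantiles(values, n=100, method="inclusive")  # cuts[k - 1] is the k-th percentile
    return {p: cuts[p - 1] for p in (1, 5, 10, 90, 95, 99)}


def tail_score(value, cut, left_sign=0, right_sign=0):
    """Score one biomarker value.

    left_sign / right_sign: +1 if that tail is associated with MDD,
    -1 if it is associated with control status, 0 if the tail is not relevant.
    The absolute score grows with distance from the median (1, 2 or 3).
    """
    for pct, depth in ((1, 3), (5, 2), (10, 1)):
        if value <= cut[pct]:
            return left_sign * depth
    for pct, depth in ((99, 3), (95, 2), (90, 1)):
        if value >= cut[pct]:
            return right_sign * depth
    return 0  # value lies between P10 and P90


def bio_depression_score(sample, cutoffs, signs):
    """Sum the tail scores of all biomarkers for one participant."""
    return sum(
        tail_score(sample[name], cutoffs[name], *signs[name])
        for name in signs
    )


# Hypothetical two-marker panel: high marker_a points to MDD, low marker_b to control.
reference = list(range(1, 101))
cutoffs = {"marker_a": tail_cutoffs(reference), "marker_b": tail_cutoffs(reference)}
signs = {"marker_a": (0, 1), "marker_b": (-1, 0)}
example_bds = bio_depression_score({"marker_a": 100, "marker_b": 3}, cutoffs, signs)
```

In the actual method, whether a tail is relevant depends on the exceed ratio thresholds that are optimized against the AUC; here that decision is represented only by the `signs` argument.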
The statistical significance of the discriminative power of the optimal BDS and calculated AUC was assessed by performing a permutation analysis (approx. 2000 permutations) in which the case/control indicator was randomly redistributed over the original biomarker data, generating a distribution of random AUCs. Possible interaction between gender and the BDS score in relation to MDD status was assessed by testing the statistical significance of a product term gender*BDS as an independent variable in a binary logistic regression model including the total population. For the assessments of discriminative power, the Youden index was used.
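The AUC, the permutation analysis and the Youden index can be illustrated with the short Python sketch below. It is a simplification and an assumption on our part: the scores are kept fixed while the case/control labels are shuffled, whereas the original analysis redistributed the labels over the biomarker data before recomputing the AUC. The function names, the seed and the default number of permutations are illustrative.

```python
import random


def auc(scores, labels):
    """Probability that a random case scores higher than a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def permutation_p_value(scores, labels, n_perm=2000, seed=0):
    """One-sided p-value: share of label shuffles with an AUC at least as high as observed."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def youden_cutoff(scores, labels):
    """Cut-off maximizing J = sensitivity + specificity - 1 (score >= cut-off means MDD)."""
    n_case = sum(labels)
    n_control = len(labels) - n_case
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        sens = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut) / n_case
        spec = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut) / n_control
        if sens + spec - 1.0 > best_j:
            best_j, best_cut = sens + spec - 1.0, cut
    return best_cut, best_j
```

With ties between cut-offs, this sketch keeps the lowest cut-off that reaches the maximal Youden index; other tie-breaking rules are equally defensible.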
The statistical significance level was set at 0.05, two-sided. QBP analysis, calculation of ROC curves, the AUC permutation analysis and the accompanying sensitivity and specificity calculations were performed in LabVIEW (see [15]) and in MedCalc version 18.11.3. Group comparisons were performed in GraphPad Prism 8 and IBM SPSS Statistics 26. The binary logistic regression was performed with IBM SPSS Statistics 26.

Conclusions
We have demonstrated that numerous biomarker differences exist between genders, not only irrespective of disease but also between controls and MDD patients. We have also shown that some of these biomarker differences are specific to either males or females and that, when gender differences are not taken into account, possible MDD candidate biomarkers can be missed. Next, we have shown that gender differences likely play an important role in the performance of biomarker panels. Stratifying for gender increased the discriminative power of our biomarker panel, although not significantly. Our results indicate that there might be a need to focus on gender differences, but more studies are needed to confirm this.
Supplementary Materials: Supplementary materials can be found at http://www.mdpi.com/1422-0067/21/9/3039/s1. Figure S1: Kaplan-Meier analysis of MDD status irrespective of day of sample drawing; Table S1: Mann-Whitney U and Levene's test results; Table S2: Overview of biological mechanisms of MDD and associated biomarkers; Table S3: Comparison of results with and without gender separation within a new cohort.
Conflicts of Interest: M. Meddens holds stocks in Brainscan. Brainscan has filed separate patents covering the diagnostic use of the biomarkers as described in this manuscript and the statistical methodology applied. The other authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.