Risk of Interruption of Doctoral Studies and Mental Health in PhD Students

PhD students report a higher prevalence of mental illness symptoms than highly educated individuals in the general population. This situation presents a serious problem for universities, so knowledge about this phenomenon is of great importance in decision-making. In this paper we use the Nature PhD Survey 2019 and estimate several binomial logistic regression models to analyze the risk of interrupting doctoral studies. This risk is measured through the desire to change either the supervisor or the area of expertise, or the wish not to pursue a PhD. Among the explanatory factors, we focus on the influence of anxiety/depression, discrimination, and bullying. As control variables we use demographic characteristics and others related to the doctoral program. Insufficient contact time with supervisors and excessive time spent studying (crossing the 50-hour-per-week barrier) are risk factors for interrupting PhD studies, but the most decisive risk factor is poor mental health. Universities should therefore foster an environment of well-being that allows PhD students to develop autonomy and resilience and, when necessary, conflict resolution skills.


Introduction
Recent studies [1][2][3] have again brought to light the mental health problems suffered by students during their doctoral studies (hereafter referred to as PhD students). Although interest in the well-being of students in higher education is not new [4][5][6], most authors have restricted their analyses to undergraduate and master's studies. However, a PhD has its own characteristics, which make it worthy of a separate analysis from other postgraduate studies.
PhD students often complain of social isolation, loss of motivation, and communication difficulties with the supervisor [7][8][9]. The study of these factors has focused mainly on explaining success or failure in the studies [10,11] as well as on the assessment of mental illness symptoms [9,12,13,14].
Rates of anxiety and depression are high among PhD students [14]. Moreover, PhD students report a higher prevalence of mental illness symptoms than both highly educated individuals in the general population and other higher education students [14].
The six-factor model of psychological well-being provides a theoretical framework to understand the doctoral degree context [2]. Some authors analyze social support [15,16] and autonomy [17]. Other studies focus on conceptualizing and measuring the well-being of PhD students [18]. Still others show that levels of well-being are correlated with progress in doctoral studies, professional development, and scientific productivity [10,19].
Low levels of well-being in PhD students pose serious problems for universities. PhD students make a significant contribution to the overall research output of universities [20,21], and low levels of well-being reduce the quality and quantity of that output [22].
Universities are responsible for maintaining an environment that supports PhD students' well-being. In this respect, knowledge of the mental health situation of PhD students and its influence on the risk of interruption of doctoral studies is of great importance for decision-making. Poor mental health, which is linked to low levels of well-being, is associated with an increased risk of interruption of the studies [23][24][25][26].
In this paper we also analyze the risk of interruption of doctoral studies, but using a different methodology from that of previous studies, as well as an updated database. Specifically, we estimate several binomial logistic regression models using the data of the Nature PhD Survey 2019. The risk of interruption is measured through the desire to change either the area of expertise or the supervisor, or the wish not to pursue a PhD. Among the explanatory factors, we focus on the influence of anxiety/depression, discrimination, and bullying. As controls we use demographic variables and others related to the doctoral program.

The Data
The Nature team have run a biennial PhD Career survey since 2011. In this research we use the latest wave of the survey, namely the Nature PhD Survey 2019. This online survey was developed in collaboration with Nature and sent to their database and subscribers via different channels. In order to boost response in specific regions that had previously been under-represented, the survey was translated into four languages (Mandarin Chinese, Portuguese, Spanish and French) in addition to English. The survey was live for approximately six weeks during May and June of 2019. The final usable sample, after removing poor quality responses and missing data, comprises 6320 records [27].
The survey included up to 56 questions, and we focus our research on the following one: "What would you do differently right now if you were starting your PhD program?" The risk of interruption, thus, is measured through the desire to change either the area of expertise or the supervisor, or the wish not to pursue a PhD. We consider that a desire to change (not to change) shows dissatisfaction (satisfaction) of the PhD student with the PhD studies, and that dissatisfaction can later materialize in an interruption of studies. As can be seen in Figure 1, more than half of the respondents to the Nature PhD Survey 2019 would radically change the beginning of their PhD (changing area, changing supervisor, or not pursuing a PhD at all). Less than half are satisfied with their initial choice, as they would not change anything.

Method
The variable under study distinguishes between more options than the ones we are interested in. That is why we chose to combine the different alternatives into only two options: (a) not changing anything, (b) changing something (area, supervisor or not doing a PhD at all). In this way we transformed the variable under study into a dichotomous one and estimated the probability of belonging to the group of students who would not change anything, as opposed to the probability of belonging to the group of students who would change something.
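The recoding described above can be illustrated with a minimal Python sketch. The answer labels below are hypothetical placeholders, not the survey's actual wording, and the text does not say how any remaining answer options were handled; the sketch simply sets them aside.

```python
# Hypothetical answer labels (assumption); the survey's exact wording may differ.
NO_CHANGE = "nothing"
CHANGE_OPTIONS = {"change_area", "change_supervisor", "not_pursue_phd"}


def dichotomize(answer):
    """Return 1 for 'would change nothing', 0 for 'would change something'.

    Any other answer returns None, because the text does not state how
    such answers were treated (assumption of this sketch).
    """
    if answer == NO_CHANGE:
        return 1
    if answer in CHANGE_OPTIONS:
        return 0
    return None


# Made-up responses, for illustration only.
answers = ["nothing", "change_area", "not_pursue_phd",
           "nothing", "change_supervisor", "other"]

coded = [dichotomize(a) for a in answers]
y = [c for c in coded if c is not None]   # keep only the two groups of interest
share_no_change = sum(y) / len(y)         # fraction who would change nothing
```

Coding "would change nothing" as 1 matches the convention used for the dependent variable in the models, so an odds ratio above one means a higher probability of keeping the original choice.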
To estimate this probability, we chose to carry out the maximum likelihood estimation of a binomial logit model [28], in which we introduced a set of explanatory control variables (related to demographic characteristics of the students, as well as to characteristics of the doctoral program itself), together with the variables under study (related to mental health characteristics). In this way we can see which personal, doctoral program or mental health characteristics increase the probability of belonging to one group or the other.
As there are some missing values among the explanatory variables, the sample is reduced from 6320 to 5911 PhD students. Despite this loss of sample size, we decided not to use multiple imputation techniques, because of the uncertainty about the quality of the imputed values [29]. The logit model is estimated for the entire sample without missing values (5911), and then re-estimated after splitting the sample in two: students who had no mental health problems (2908) and those who had mental health problems, that is, anxiety or depression, discrimination, or bullying (3003). In this way we try to identify different patterns of behaviour in terms of the explanatory variables and the probability of belonging to each of the groups (those who would not change anything vs. those who would).
Finally, we repeat the estimations with other subsamples, in this case differentiating the students with some problem according to the type of problem. In this way we try to see whether the explanatory variables have a different influence on the group of those who have suffered anxiety/depression (2164), those who have suffered discrimination (1252) and those who have suffered bullying (1280).
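The sample construction can be sketched as follows. The two main groups partition the complete-case sample (2908 + 3003 = 5911), whereas the three problem-specific subsamples add up to 4696, more than 3003, which indicates that they overlap: a student may report more than one problem. The field names and records below are hypothetical, made up only to illustrate the logic.

```python
# Hypothetical field names and made-up records (assumptions for illustration).
PROBLEM_FLAGS = ("anxiety_depression", "discrimination", "bullying")

records = [
    {"age": 27,   "anxiety_depression": 0, "discrimination": 0, "bullying": 0},
    {"age": None, "anxiety_depression": 1, "discrimination": 0, "bullying": 0},
    {"age": 31,   "anxiety_depression": 1, "discrimination": 0, "bullying": 1},
    {"age": 29,   "anxiety_depression": 0, "discrimination": 1, "bullying": 0},
    {"age": 25,   "anxiety_depression": 0, "discrimination": 0, "bullying": 0},
    {"age": 34,   "anxiety_depression": 1, "discrimination": 1, "bullying": 1},
]


def is_complete(record):
    """Complete-case rule: keep a record only if no variable is missing."""
    return all(value is not None for value in record.values())


def has_any_problem(record):
    """True if the student reports at least one of the three problems."""
    return any(record[flag] == 1 for flag in PROBLEM_FLAGS)


complete = [r for r in records if is_complete(r)]

# Two mutually exclusive groups that together cover the complete sample.
no_problem = [r for r in complete if not has_any_problem(r)]
with_problem = [r for r in complete if has_any_problem(r)]

# Problem-specific subsamples: these may overlap with each other.
by_problem = {flag: [r for r in complete if r[flag] == 1]
              for flag in PROBLEM_FLAGS}
```

Because the problem-specific subsamples overlap, each of them can include the other two problems as explanatory variables, as the later models do.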
All the estimated models analyse which variables increase the probability of belonging to the group of those who would not modify their choice of PhD, since the generated dichotomous variable $Y_i$ takes the value 1 when students report that choice and the value 0 when they report that they would modify something (area, supervisor or not doing a PhD at all). Equation (1) shows this probability as a function of the $K$ explanatory variables ($X_{ki}$), whose summary statistics are shown in the Appendix (Table A1):

$$P_i = E(Y_i \mid X_{1i}, \dots, X_{Ki}) = \Pr(Y_i = 1 \mid X_{1i}, \dots, X_{Ki}) \tag{1}$$

The expected values of the dependent variable $Y_i$, using the logistic function, can be transformed into the logistic model (2):

$$\ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \sum_{k=1}^{K} \beta_k X_{ki} \tag{2}$$

where $P_i$ is the probability of not modifying anything related to the PhD choice, $\beta_0$ is the constant term, which captures the mean probability for the reference student on the log-odds scale, and $\beta_k$ are the coefficients of the $K$ explanatory variables. Thus, the probability of not modifying anything related to the PhD choice can be expressed as (3):

$$P_i = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{k=1}^{K} \beta_k X_{ki}\right)}} \tag{3}$$

Equation (3) has been estimated with the sample and subsamples described above. All estimations and post-estimations have been performed with the StataSE 15 statistical package using the logit command [30].
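To make the estimation step concrete, the following Python sketch fits a logit model by maximum likelihood with Newton-Raphson iterations, using the logistic form P_i = 1 / (1 + exp(-(b0 + b1 x_i))) with a single binary explanatory variable. It is not the Stata routine used in this paper; the function name and the data are invented for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit_one_predictor(x, y, iterations=25):
    """Newton-Raphson maximum likelihood for P(y=1) = sigmoid(b0 + b1 * x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, with an explicit 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up data: x = 1 marks a hypothetical risk factor, y = 1 means
# "would change nothing". 7 of 10 students with x = 0 have y = 1,
# against 3 of 10 students with x = 1.
x = [0] * 10 + [1] * 10
y = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7

b0, b1 = fit_logit_one_predictor(x, y)
odds_ratio = math.exp(b1)   # below 1: the factor lowers the odds of y = 1
```

With one binary regressor the maximum likelihood solution has a closed form (the log-odds of each group), which gives a simple check on the iterations. The models in this paper include many regressors, for which a statistical package is the practical choice.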

Results
Equation (3) is estimated for the whole sample (Logit 1), for the subsample of students with no mental health problems (Logit 2), and finally for the subsample of students with mental health problems (Logit 3). Results are shown in Table 1, where it is easy to see which variables increase the probability of not wanting to change anything related to the PhD, and which variables decrease this probability or, looked at another way, increase the probability of wanting to change something.
As all variables except age are categorical, Table 1 shows the estimations in the form of odds ratios. When the value of the odds ratio is greater (lower) than one, the variable increases (decreases) the probability of not wanting to change anything. Therefore, students with the characteristics of those variables that show an odds ratio greater than one are more likely to maintain their choice of PhD without any modification, and those with characteristics that show an odds ratio lower than one are more likely to change area or supervisor or, directly, not to pursue a PhD.
In addition to the odds ratios, Table 1 shows the 95% confidence intervals, so that, for those variables that are statistically significant, it can be checked whether their confidence intervals include the value 1, in which case it cannot be stated whether the variable increases or decreases the corresponding probability.
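As an illustration of how odds ratios and their intervals relate to the underlying logit coefficients, the sketch below exponentiates a coefficient and the endpoints of its Wald interval. The coefficients and standard errors are invented for the example and are not values from Table 1, and the function names are hypothetical.

```python
import math


def odds_ratio_with_ci(coef, std_err, z=1.96):
    """Return (odds ratio, lower bound, upper bound).

    The interval is the Wald interval for the coefficient,
    coef +/- z * std_err, exponentiated onto the odds-ratio scale.
    """
    return (math.exp(coef),
            math.exp(coef - z * std_err),
            math.exp(coef + z * std_err))


def ci_includes_one(lower, upper):
    """True when the interval does not rule out 'no effect' (odds ratio of 1)."""
    return lower <= 1.0 <= upper


# Invented numbers, for illustration only.
or_a, lo_a, hi_a = odds_ratio_with_ci(coef=-0.5, std_err=0.2)
or_b, lo_b, hi_b = odds_ratio_with_ci(coef=0.1, std_err=0.2)

clear_direction_a = not ci_includes_one(lo_a, hi_a)   # interval entirely below 1
clear_direction_b = not ci_includes_one(lo_b, hi_b)   # interval straddles 1
```

In the first case the whole interval lies below one, so the variable can be said to lower the odds of keeping everything unchanged; in the second the interval contains one and no direction can be asserted.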
The three estimated models show good results in terms of goodness of fit, since they manage to correctly classify more than 70% of the sample and have a pseudo coefficient of determination of 0.20, 0.13, and 0.19, respectively.
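The two goodness-of-fit measures mentioned above can be computed as in the following sketch. The text does not say which pseudo coefficient of determination is reported; the sketch assumes McFadden's version, which is the one Stata's logit output displays, and a 0.5 cutoff for classification. Both are assumptions, and the outcomes and fitted probabilities are made up.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y given fitted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))


def mcfadden_pseudo_r2(y, p):
    """1 - lnL(model) / lnL(null), where the null model has only a constant."""
    p_null = sum(y) / len(y)
    ll_null = log_likelihood(y, [p_null] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null


def correct_classification_rate(y, p, cutoff=0.5):
    """Share of observations whose predicted class matches the observed one."""
    hits = sum(1 for yi, pi in zip(y, p) if (pi >= cutoff) == (yi == 1))
    return hits / len(y)


# Made-up outcomes and fitted probabilities, for illustration only.
y = [1, 1, 1, 0, 0, 0, 1, 0]
p = [0.8, 0.7, 0.6, 0.3, 0.4, 0.2, 0.4, 0.6]

r2 = mcfadden_pseudo_r2(y, p)
rate = correct_classification_rate(y, p)   # 6 of the 8 cases are classified correctly
```

Pseudo-R2 values are not directly comparable to the R2 of a linear regression, which is why the classification rate is reported alongside them.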
Out of all demographic variables, only age is significant in the Logit 1 and Logit 3 estimates. Its odds ratio is less than 1, so as the student's age increases the probability of wanting to change something also increases. All other variables, such as gender, race, region of birth, having childcare or eldercare responsibilities, or working while studying, do not have a statistically significant influence on this probability.
Regarding the characteristics of the doctoral program, the estimations show that the most decisive factor is the students' previous expectations. When the doctoral program does not meet the original expectations, the student's probability of not wanting to change anything decreases very sharply, and when it exceeds those expectations the opposite occurs (the probability of not wanting to change anything increases greatly).
The variable with the next-highest odds ratio is the weekly contact time with the supervisor. As this weekly contact time increases, so does the probability of not wanting to change anything in all three models (Logit 1, Logit 2, and Logit 3).
On the other hand, the weekly time spent on the PhD program is only significant, with an odds ratio lower than 1, when it exceeds the fifty-hour-per-week barrier. In that case, the probability of not wanting to change anything decreases (or the probability of wanting to change something increases). However, in the group of those who have had some kind of problem (Logit 3), this variable is not statistically significant.
Finally, regarding Table 1, the three variables related to mental health are statistically significant in the two models in which they appear (Logit 1 and Logit 3), showing that students who have suffered anxiety or depression, discrimination or bullying are the least likely to want to keep everything unchanged, especially those who have experienced bullying.
The Logit 3 estimation was made on the subsample of students who stated that they had suffered some kind of problem, without distinguishing the type of problem. In order to check whether the influence of the variables differs according to the problem suffered by the students, we estimated three new binomial logistic regressions (Logit 4, Logit 5, and Logit 6) and show their results in a separate table. The models have been estimated with the same explanatory variables as those in Table 1, but here we show only the variables of interest: those related to the doctoral program and to mental health.
Concerning the variables related to the doctoral program, it can be seen that expectations are still relevant. When the program does not meet the student's expectations, the probability of wanting to keep everything unchanged decreases in all three groups, especially among those who have suffered discrimination (although the confidence intervals in the three models overlap, so the difference between them is probably not significant). However, for the subsample of those who have had anxiety/depression (Logit 4), the fact that the program exceeds the students' expectations is not relevant to the probability under study, unlike in the other two subsamples (Logit 5 and Logit 6).
Regarding the contact time with the supervisor, there are again differences between subsamples. The subsample of those who have had anxiety or depression (Logit 4) presents odds ratios for this variable almost identical to those of the total sample (Logit 1). However, for the subsamples of those who have suffered discrimination or bullying, only weekly contact with the supervisor of between one and three hours has a positive effect; contact of more than three hours a week does not.
Finally, in the subsample of those who have suffered from anxiety or depression (Logit 4), having also suffered from bullying is a determining factor in reducing the probability of wanting to keep everything unchanged, while discrimination is not statistically significant. In the group of those who have suffered discrimination, both having had anxiety/depression and having experienced bullying decrease in the same proportion the probability of wanting to keep everything unchanged. However, in the group of those who have experienced bullying, neither anxiety/depression nor discrimination is statistically significant.

Discussion and Conclusions
Low levels of well-being in PhD students are a serious problem for universities. In decision-making, awareness of the mental health situation of PhD students and its influence on the risk of interruption of doctoral studies is of great relevance.
In this respect we estimated several binomial logistic regression models on a large-scale survey of about six thousand PhD students. The risk of interruption of the studies was measured through the desire to change the area of expertise or the supervisor, or regret at having chosen to pursue a PhD. Among the explanatory factors, we focused on the influence of three mental health aspects: anxiety/depression, discrimination, and bullying. As control variables we used some demographic variables and some others related to the doctoral program itself.
Regarding the demographic variables, only age was statistically significant for the risk of interruption: as the student's age increases, the risk of interruption also increases. All other demographic variables, such as gender, race, region of birth, having childcare or eldercare responsibilities, or working while studying, do not have a statistically significant influence on the risk of interruption of the studies.
Regarding the characteristics of the doctoral program itself, the most decisive risk factor is not meeting the students' original expectations. When the doctoral program does not meet the original expectations, the risk of interruption increases considerably, and when it exceeds those expectations the opposite occurs. The second most decisive risk factor is insufficient contact time with the supervisor. As the weekly contact time decreases, the risk of interruption increases. On the other hand, the weekly time spent on the PhD program is only significant when it exceeds the fifty-hour-per-week barrier. In that case, the risk of interruption increases. However, in the group of those who have had some mental health problem, this variable is not statistically significant, which suggests that these students may have a higher level of resilience or endurance and that, therefore, it is not relevant for them whether their weekly time spent on the PhD program exceeds fifty hours.
Finally, the three variables related to mental health are statistically significant in the models for the whole sample and for the subsample of students with some mental health problem, showing that students who have suffered anxiety or depression, discrimination or bullying are more likely to interrupt their doctoral studies, especially those who have experienced bullying.
Distinguishing by subsamples according to the mental health problem, the initial expectations are still relevant. When the doctoral program does not meet the student's expectations, the risk of interruption increases. However, for the subsample of those who have had anxiety/depression the fact that the program exceeds the students' expectations is not relevant in reducing the risk of interruption, unlike the other two subsamples of those who have suffered discrimination or bullying.
Regarding the contact time with the supervisor, there are again differences between subsamples. The subsample of those who have had anxiety or depression presents values almost identical to those of the total sample. However, for the subsamples of those who have suffered discrimination or bullying, only weekly contact with the supervisor of between one and three hours has a positive effect; contact of more than three hours a week does not. It is possible that, due to their previous poor experience of interpersonal relationships, these students do not see more than three hours of contact per week with the supervisor as a sign of a better-quality doctoral program.
Finally, in the subsample of those who have suffered from anxiety or depression, having also suffered from bullying is a determining factor in the risk of interruption, while discrimination is not statistically significant. In the group of those who have suffered discrimination, both having had anxiety/depression and having experienced bullying increase the risk of interruption in the same proportion. However, in the group of those who have experienced bullying, neither anxiety/depression nor discrimination is statistically significant.
Universities are in charge of maintaining an environment that supports PhD students' well-being. Suggestions for improving the well-being of doctoral students include three levels of intervention: firstly, actions to develop student resilience and autonomy [31,32]; secondly, actions to promote positive relationships between students and their supervisors, with clear guidelines on the expectations of both; and finally, training for both students and supervisors in conflict resolution and relationship boundaries [16,17,33].
The contribution of this paper to the literature is mainly methodological. We analyzed the risk of interruption of doctoral studies using a different methodology from that of previous studies in the literature, as well as an updated database. We estimated several binomial logistic regression models using data from a 2019 survey. We measured the risk of interruption through the desire to change either the area of expertise or the supervisor, or the wish not to pursue a PhD. Among the explanatory factors we focused on the influence of anxiety/depression, discrimination, and bullying, while using demographic variables, as well as variables related to the doctoral program, as controls.
The main limitation of this study lies in the fact that we can only observe the 'risk' of abandoning doctoral studies, since the survey was carried out among students who were actually doing their doctorate and therefore none of them had abandoned it. In fact, the desire to change the area of study or supervisor does not necessarily imply abandonment of studies. Thus, we cannot speak of a 'dropout rate', but only of a risk of abandonment. However, the higher the risk, the greater the likelihood of dropout.
Future research could extend the study to a sample that includes both types of students, those who have completed their doctoral studies and those who have left their studies uncompleted, in order to explore what the main dropout factors may have been.