Which Risk Factors Matter More for Psychological Distress during the COVID-19 Pandemic? An Application Approach of Gradient Boosting Decision Trees

Background: A growing body of scientific literature indicates that risk factors for COVID-19 contribute to a high level of psychological distress. However, there is no consensus on which factors contribute more to predicting psychological health. Objectives: The present study quantifies the importance of related risk factors for the level of psychological distress and further explores the threshold effect of each risk factor on the level of psychological distress. Both subjective and objective measures of risk factors are considered in the model. Methods: We analysed 937 responses obtained from an online questionnaire between 20 January and 13 February 2020 in China. Objective risk factors were measured in terms of direct distance from respondents’ housing to the nearest COVID-19 hospital, direct distance from respondents’ housing to the nearest park, and the air quality index (AQI). Perceived risk factors were measured in regard to perceived distance to the nearest COVID-19 hospital, perceived air quality, and perceived environmental quality. Psychological distress was measured with the Kessler psychological distress scale K6 score. The following health risk factors and sociodemographic factors were considered: self-rated health level, physical health status, physical activity, current smoker or drinker, age, gender, marital status, educational attainment level, residence location, and household income level. A gradient boosting decision tree (GBDT) was used to analyse the data. Results: Health risk factors were the greatest contributors to predicting the level of psychological distress, with a relative importance of 42.32% among all influential factors. Objective risk factors had a stronger predictive power than perceived risk factors (23.49% vs. 16.26%). Furthermore, the level of psychological distress rose sharply towards the moderate level at AQI thresholds of between 40 and 50 and between 110 and 130. Gender-sensitive analysis revealed that the relative importance of the risk factors for psychological distress differed between women and men. Conclusion: We found evidence that perceived indoor air quality played a more important role in predicting psychological distress compared to ambient air pollution during the COVID-19 pandemic.


Introduction
The 2019 coronavirus (COVID-19) pandemic is a health threat that has spread throughout the world [1][2][3]. Patients with COVID-19 can suffer from severe pneumonia, pulmonary oedema, or acute renal injury and eventually die of multiple organ failure. As of January 2021, over one hundred million cases of coronavirus have been registered, and more than two million people have died from this virus worldwide. Although different countries have used lockdown measures and vaccine recommendations to control the spread of the virus, the direct effects of the COVID-19 risk factors on people's mental health should not be overlooked. Correspondingly, a variety of studies have investigated the direct effects of COVID-19 on people's mental health outcomes among the general population, e.g., the association between COVID-19-related factors and levels of depression [2,4,5], anxiety [6][7][8], and psychological distress and stress [9][10][11][12][13]. Other studies have evaluated such an association but have focused on health professionals [12][14][15][16][17], college students [6,18,19], and patients [20][21][22]. While these studies provided a substantial evidence base showing that COVID-19-related risk factors are significantly associated with psychological status, to the best of our knowledge, no previous study has identified which related risk factors have the greatest impact on the level of psychological distress.
General studies on human health note that air pollution is a significant risk factor [23]. Previous studies have paid more attention to the association between air pollution and physical health outcomes, such as various adverse respiratory and cardiovascular diseases [24], while recent epidemiological studies have indicated a feasible correlation between air pollution and psychological health. Specifically, a growing number of studies have examined the association between air pollutants and a person's depression and psychological distress. Studies conducted in the US, Canada, and China provide most of the literature on this topic though the results have been mixed. One study conducted in the US found no association between the short-term air pollutant level and the depressive symptoms of participants (age over 65), while two other studies reported positive relationships between pollution effects and depression and anxiety symptoms [25,26]. A longitudinal study conducted in the US confirmed that PM2.5 was significantly associated with increased psychological distress [27]. One Canadian study found a positive relationship between ambient air pollution (PM2.5 and NO) and psychological distress through using the measure of the Kessler Psychological Distress Scale (K10) [28]. Finally, a recent study in China indicated that an increase in the previous week's PM2.5 contributed to an increase in the prevalence of depression, and such effects were more pronounced in the spring and summer and in eastern and southern areas [29]. The findings from these works are promising but not conclusive, since many of these studies emphasized only linear or non-linear associations, or they utilized inconsistent measures and methodologies [30].
In addition, there are other influences. Recent studies have pointed out the importance of perceived factors on psychological health status. In 2008, the Blacksmith Institute World's Worst Polluted Places report listed indoor air pollution as one of the world's worst toxic pollution problems [31]. Indeed, evidence has demonstrated that indoor air quality plays an essential role in determining people's health, since people in modern society spend over 65% of their time in their own residence [32]. Research has found that indoor air quality within buildings and homes might be worse than outdoor air quality, even in industrial cities [33]. However, the findings of previous studies investigating the relationship between indoor air quality and health symptoms are mixed. For example, one study found that the perceived indoor environmental quality was associated with psychological distress with respect to the workplace by using a seven-point Likert scale: 1 (unsatisfactory) to 7 (satisfactory) [34]. Another study in China examined the association between living environment and self-reported health, with a special focus on the differences between rural and urban residents in relation to environmental health [35]. However, they found no significant association between exposure to self-reported air pollution and self-rated health. Limited studies have evaluated the direct effect of self-reported or perceived air quality on psychological health benefits, especially during the COVID-19 outbreak, when respondents in the majority of Chinese cities were required to stay in their residence and were allowed to venture out only to obtain necessary medical help or to purchase daily food. Prolonged quarantine restrictions increase the time people spend indoors, so a lower level of self-reported indoor air quality might exacerbate the level of psychological distress.
Additionally, studies have indicated that risk factors related to sociodemographic characteristics and health are associated with symptoms of psychological distress. For example, studies reported that females were more likely to develop symptoms than their male counterparts [12,36,37]. Younger participants were more likely to perceive a higher level of psychological distress compared to the elderly [13,37]; participants who had a chronic disease and reported a lower self-rated health level tended to have a higher probability of suffering from depressive symptoms compared to those who did not [12,38]; and participants who were current smokers or who had a current specific level of alcohol consumption were associated with a higher level of psychological distress [39], while participants who took part in moderate or sufficient physical activity reported less psychological distress [40]. Nevertheless, these studies quantified the significant associations between sociodemographic characteristics and psychological distress, and thus further investigation is required to identify which sociodemographic characteristics have a greater influence on psychological distress.
The novelty of this study is that it examines how different risk factors may affect psychological distress by applying gradient-boosting decision trees (GBDT) to national web-based data on Chinese residents during the early stage of the COVID-19 outbreak, emphasizing the threshold effects of both objective and perceived risk factors on the level of psychological distress. This study aims to address the following research questions: (1) how important is the effect of objective and perceived risk factors in predicting the level of psychological distress, (2) which risk factors most affect the level of psychological distress, and (3) do different risk factors have threshold effects in predicting the level of psychological distress? Uncovering these differences in the importance of risk factors should help to advance epidemiological research in this area.

Method
This study applied the GBDT, a machine-learning approach that is relatively new to the field of urban planning and development. By using this approach, recent studies have investigated the relationship between the built environment and travel behavior [41], and between population density and waist-hip ratio [42]. Compared to the traditional regression model, GBDT has several advantages. Firstly, GBDT can efficiently handle complex and non-linear correlations while maintaining a relatively high prediction accuracy [41,42]. One reason is that the GBDT approach uses decision trees to partition the predictors and estimates the outcome by minimizing a loss function. As noted in the study, "Models are fit by minimizing a loss function averaged over the training data, such as the squared-error or a likelihood-based loss function" [43]. Secondly, the GBDT can effectively handle missing data by treating missingness as potentially informative rather than as missing at random, so that a missing value is treated as a separate category during tree building. Lastly, the GBDT can compute and rank the relative importance of the predictors contributing to the response variable, which is difficult to achieve with traditional statistical models [44].
The algorithm follows the research [43] and is summarized as follows:

1. Initialize the model with the optimal constant:

$$f_0(x) = \arg\min_{\gamma} \sum_{i=1}^{N} L(y_i, \gamma) \tag{1}$$

2. For m = 1 to M:

(a) For i = 1, 2, . . . , N compute

$$r_{im} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = f_{m-1}} \tag{2}$$

(b) Fit a regression tree to the targets $r_{im}$, giving terminal regions $R_{jm}$, j = 1, 2, . . . , $J_m$.

(c) For j = 1, 2, . . . , $J_m$ compute

$$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\bigl(y_i, f_{m-1}(x_i) + \gamma\bigr) \tag{3}$$

(d) Update

$$f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J_m} \gamma_{jm}\, I(x \in R_{jm}) \tag{4}$$

3. Output $\hat{f}(x) = f_M(x)$.

Three steps are conducted to generate the model. Firstly, the approach initializes the optimal constant model that minimizes the loss function L(y, f(x)) (Equation (1)). Then, in the second step, the model is constructed through four sub-steps at each iteration m. The negative gradient of the loss function is calculated as an estimate of a generalized or pseudo residual (Equation (2)). A regression tree is then fitted to these targets. It is important to note that two layers of loops are nested in the algorithm, namely, over the iteration rounds m and over the samples i. The regression tree of round m is obtained by fitting a CART regression tree to the pairs $(x_i, r_{im})$. In terms of sub-step (Equation (3)), $J_m$ represents the size of each of the constituent trees. Then, the algorithm calculates the optimal fitting value $\gamma_{jm}$ for each leaf region (loss function minimization) and updates the strong learner (Equation (4)). Lastly, it reports the final model.
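As an illustration, consider the squared-error loss, which corresponds to the Gaussian case. The following derivation is a sketch in the notation of [43], where $f_{m-1}$ is the model after $m-1$ rounds and $R_{jm}$ is the $j$-th terminal region of the $m$-th tree. The factor 1/2 in the loss is a convention assumed here for convenience. Under this loss, the pseudo residual is the ordinary residual and the optimal leaf value is the mean residual in that leaf.

```latex
\begin{align}
L(y, f(x)) &= \tfrac{1}{2}\,\bigl(y - f(x)\bigr)^2 \\
f_0(x) &= \arg\min_{\gamma} \sum_{i=1}^{N} \tfrac{1}{2}\,(y_i - \gamma)^2 = \bar{y} \\
r_{im} &= -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = f_{m-1}} = y_i - f_{m-1}(x_i) \\
\gamma_{jm} &= \arg\min_{\gamma} \sum_{x_i \in R_{jm}} \tfrac{1}{2}\,\bigl(y_i - f_{m-1}(x_i) - \gamma\bigr)^2
             = \frac{1}{|R_{jm}|} \sum_{x_i \in R_{jm}} r_{im}
\end{align}
```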
GBDTs build models stagewise and update models by minimizing a loss function's expected value. A shrinkage strategy is applied by the GBDT to prevent overfitting and improve prediction accuracy [45,46]. However, an overfitting problem might exist when training data are fitted too closely. Therefore, three parameters are introduced to prevent the overfitting problem and promote prediction accuracy, namely, the number of trees (M), learning rate (ξ), and tree complexity (C). It should be noted that the tree complexity determines the model complexity and how well the model fits, while the learning rate (shrinkage) scales the contribution of each base tree model by a factor of ξ (0 < ξ ≤ 1) as shown below:

$$f_m(x) = f_{m-1}(x) + \xi \sum_{j=1}^{J_m} \gamma_{jm}\, I(x \in R_{jm}), \quad 0 < \xi \le 1$$

where $\gamma_{jm}$ is the fitted value for terminal region $R_{jm}$ of the m-th tree and $J_m$ is the number of terminal regions of that tree. The smaller ξ is, the greater the shrinkage becomes, which indicates that the overfitting problem can be mitigated by reducing or shrinking the impact of each tree. However, it is worth noting that a smaller ξ generally requires a larger number of trees to be added to the model. Tree complexity C, which denotes the number of splits (or the number of nodes), is set for fitting each decision tree. It represents the depth of variable interaction in a tree. In accordance with the research [43], the model generally works well with 2 ≤ C ≤ 5. Optimal performance of the model depends on choosing the combination of the number of trees (M), learning rate (ξ), and tree complexity (C).
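The interplay between the number of trees and the learning rate can be illustrated with a minimal sketch. It is not the implementation used in this study. It assumes a squared-error loss, a single predictor, and one-split trees (stumps) instead of deeper trees. The toy data and all function names are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression tree (two terminal regions) to the residuals."""
    thresholds = sorted(set(xs))[:-1]  # drop the maximum so the right region is never empty
    if not thresholds:  # a single distinct x value: fall back to a constant
        m = mean(residuals)
        return xs[0], m, m
    best = None
    for threshold in thresholds:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def boost(xs, ys, n_trees, learning_rate):
    """Gradient boosting for squared-error loss with shrinkage."""
    f0 = mean(ys)                      # optimal constant model
    preds = [f0] * len(ys)
    trees = []
    for _ in range(n_trees):
        # For squared error the pseudo residual is the ordinary residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        trees.append((threshold, left_value, right_value))
        # Shrink the contribution of the new tree by the learning rate.
        preds = [p + learning_rate * (left_value if x <= threshold else right_value)
                 for x, p in zip(xs, preds)]
    return f0, trees, preds


def mse(ys, preds):
    """Mean squared error on the training data."""
    return mean((y - p) ** 2 for y, p in zip(ys, preds))


# Hypothetical toy data: one predictor and a step-like response.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [7.0, 7.0, 8.0, 9.0, 9.0, 12.0, 13.0, 13.0]

for n_trees in (10, 100):
    _, _, preds = boost(xs, ys, n_trees=n_trees, learning_rate=0.1)
    print(n_trees, "trees, training MSE:", mse(ys, preds))
```

With a small learning rate each tree removes only a fraction of the remaining residual, so more trees are needed to reach the same training error. In practice the number of trees would be chosen by cross-validation rather than by training error.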
Generally, predictors are rarely equally relevant, as they have different influences on the response variables during the data mining process. Thus, it is necessary to learn the relative importance or contribution of each input predictor in estimating the response. Compared to the traditional modelling approach, the GBDT method can systematically identify and rank the influences of predictors on response predictions.
For a single decision tree T, in accordance with the research [47], the relative importance of predictor $x_\kappa$ can be written as follows:

$$I_\kappa^2(T) = \sum_{t=1}^{J-1} \hat{\tau}_t^2 \, I\bigl(v(t) = \kappa\bigr)$$

where the summation applies to the non-terminal nodes t of the J-terminal node tree T, $x_\kappa$ denotes the splitting variable associated with node t (that is, $v(t) = \kappa$), and $\hat{\tau}_t^2$ denotes the corresponding improvement in squared error obtained by splitting on $x_\kappa$ at non-terminal node t.
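The following sketch shows how such per-tree importances could be turned into percentages. It assumes that the squared-error improvements are summed per splitting variable, averaged over all trees, and rescaled to sum to 100%, as the percentages reported in this study do. The tree representation, variable names, and improvement values are hypothetical.

```python
from collections import defaultdict


def relative_importance(trees):
    """Return the relative importance (%) of each splitting variable.

    trees: list of trees, each given as a list of
           (variable_name, squared_error_improvement) pairs,
           one pair per non-terminal node.
    """
    totals = defaultdict(float)
    for splits in trees:
        for variable, improvement in splits:
            # Sum the improvements of all nodes that split on this variable.
            totals[variable] += improvement
    # Average over the trees, then rescale so the importances sum to 100.
    averaged = {v: s / len(trees) for v, s in totals.items()}
    grand_total = sum(averaged.values())
    return {v: 100.0 * s / grand_total for v, s in averaged.items()}


# Hypothetical ensemble of two small trees.
example_trees = [
    [("disease", 4.0), ("aqi", 1.0)],
    [("disease", 2.0), ("self_rated_health", 3.0)],
]

for name, share in sorted(relative_importance(example_trees).items(),
                          key=lambda item: item[1], reverse=True):
    print(f"{name}: {share:.1f}%")
```

A variable that is never chosen for a split receives no entry, which corresponds to a relative importance of zero.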

Study Area and Data Description
The data were collected using an online questionnaire between 20 January and 13 February 2020, that is, during the early stage of the outbreak of COVID-19 in China. Using the crowdsourcing platform 'Wenjuanxing', 1037 responses were gathered from the online questionnaire. Following the data cleaning process, 100 samples were dropped from the study because respondents had provided invalid information in response to other questions relating to the quarantine location. This left a total of 937 responses that could be used for the statistical analysis. The survey explored individuals' demographic characteristics, health risk factors, psychological distress, and perceived risk factors. The study spanned 230 cities in China. The study received ethical approval from the School of Energy, Geoscience, Infrastructure and Society, at Heriot-Watt University.

Response Variable
The response variable in this study is psychological distress. Psychological distress was assessed with the Kessler Psychological Distress Scale K6 score [48]. It is acceptable to use K6 to measure people's distress level, since it has been widely used and has evidenced reliability and validity across a wide variety of mental health surveys [49,50]. The scale includes six items associated with psychological distress in the previous 4 weeks. It is based on six questions, which were used in our survey as follows: How often have you been feeling (a) nervous, (b) restless or fidgety, (c) so sad nothing could cheer you up, (d) hopeless, (e) everything was an effort, and (f) worthless? The answer to each question is given via a 5-point Likert scale ranging from 0 'never' to 4 'very often'. A K6 result of 13 or above indicates high psychological distress, a score of between 8 and 12 denotes moderate psychological distress, and a score of between 0 and 7 denotes low psychological distress [48].
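The scoring rule described above can be expressed as a short sketch. The function names are hypothetical. The item range (0 to 4) and the cut-offs (0 to 7 low, 8 to 12 moderate, 13 or above high) are those stated in the text.

```python
def k6_score(items):
    """Sum six K6 item responses, each coded 0 ('never') to 4 ('very often')."""
    if len(items) != 6:
        raise ValueError("the K6 has exactly six items")
    if any(item not in (0, 1, 2, 3, 4) for item in items):
        raise ValueError("each item must be an integer from 0 to 4")
    return sum(items)


def k6_category(score):
    """Map a K6 total score (0 to 24) to the distress level used in the text."""
    if not 0 <= score <= 24:
        raise ValueError("a K6 total score lies between 0 and 24")
    if score >= 13:
        return "high"
    if score >= 8:
        return "moderate"
    return "low"


# Hypothetical respondent: nervous, restless, sad, hopeless, effort, worthless.
responses = [2, 2, 1, 1, 2, 1]
total = k6_score(responses)
print(total, k6_category(total))
```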

Objective Predictors
In this study, objective predictors included three key risk factors that were derived from recent research: direct distance from the respondent's housing to the nearest COVID-19 hospital [10], direct distance from the respondent's housing to the nearest park [51], and the air quality index (AQI) [52]. The data regarding the parks were obtained from Beijing City Lab (Beijing City Lab, 2019, Data 40, Urban green lands in main Chinese cities 2017, http://www.beijingcitylab.com (accessed on 30 May 2021)) which shared information on 16,721 urban green spaces in 287 Chinese cities in 2017. The data package included the size of different parks, the landscape shape index, and the geocoordinate, which made it possible to conduct the spatial analysis. Data regarding distance to the nearest hospital were obtained from the website (http://file.caixin.com/datanews_mobile/interactive/2020/fever accessed on 9 February 2020), which recorded the location (including the geocoordinate) of hospitals that were specifically used for curing COVID-19 patients. Regarding the AQI, data were acquired from a real-time remote inquiry website (Airborne Fine Particulate Matter and Air Quality Index. Secondary Airborne Fine Particulate matter and Air Quality Index 2020. Available online: http://www.pm25.in/ (accessed on 7 July 2020)) that provides an hourly value for the AQI with real-time concentration and the 24 h moving average of air pollutant indicators, such as PM2.5, PM10, SO 2 , CO, NO 2 , and O 3 . AQI observation data from 20 January to 13 February were collected and cleaned and then connected to the survey data through the 'Near' id. The 'Near' id was generated through the 'Near' tool. In this study, the spatial 'Near' tool in ArcGIS was used to calculate the direct distance from each property to the nearest COVID-19 hospital and the nearest park. We also measured the direct distance from each respondent's location to the nearest monitoring stations that provided the AQI.
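The nearest-facility distance can be approximated outside a GIS as well. The sketch below is not the ArcGIS 'Near' tool used in this study. It substitutes the haversine great-circle distance on a spherical Earth, which is an assumption, and all coordinates and names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(origin, candidates):
    """Return (name, distance_km) of the candidate closest to origin.

    origin: (lat, lon); candidates: dict mapping a name to (lat, lon).
    """
    distances = ((name, haversine_km(origin[0], origin[1], lat, lon))
                 for name, (lat, lon) in candidates.items())
    return min(distances, key=lambda pair: pair[1])


# Hypothetical respondent location and designated hospitals.
home = (39.90, 116.40)
hospitals = {"Hospital A": (39.95, 116.40), "Hospital B": (40.50, 116.40)}
name, distance = nearest(home, hospitals)
print(name, round(distance, 1), "km")
```

The same function could be applied to park and monitoring-station coordinates to obtain the other two distances.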

Perceived Predictors
Perceived predictors are defined by a limited number of characteristics applicable to the Chinese context, including perceived distance to the nearest COVID-19 hospital, perceived indoor air quality, and perceived environmental quality [10]; these were measured on 5-point Likert scales. Perceived distance to the COVID-19 hospital was measured by the following question: 'How would you rate the distance from your house to the nearest COVID-19 hospital?', which referred to a hospital specifically used for treating COVID-19 patients. The options were (1) very far, at least an hour's drive; (2) far, at least half an hour's drive; (3) close, a 10 to 30 min drive; and (4) very close, a 5-min drive. Perceived air quality was assessed with the following question: 'How would you rate your indoor air quality level overall?' Response categories ranged from (1) extremely bad to (5) extremely good. Perceived neighborhood environmental quality was measured by the following question: 'How would you rate your neighborhood environment overall?' Response categories ranged from (1) neighborhood environment maintained in very poor quality to (5) neighborhood environment maintained in very good quality.

Health and Sociodemographic Predictors
Many studies have identified several lifestyle variables as significant predictors of psychological distress; physical health status, physical activity, and behaviors such as smoking and drinking are significantly associated with psychological distress [53][54][55][56][57][58]. Accordingly, in this study, we investigated five potential health risk factors that may affect the response variable: self-rated health level, physical health status, physical activity, current smoking, and current drinking. Respondents' self-rated health was measured with the question: 'In general, how would you rate your general health status?' with possible choices ranging from (1) very poor to (5) very good. Respondents' physical health status was measured with the question: 'Have you been diagnosed with a chronic disease in the past six months?' This item was then coded (1) yes or (0) no. Respondents' regular physical exercise was assessed with the question: 'How often have you exercised during the outbreak of COVID-19?' Response categories ranged from (1) never to (5) very often. Regarding behaviors, we measured smoking and drinking with two items. Smoking was measured with the following question: 'Have you smoked in the last month?' This item was coded (1) for smokers or (0) otherwise. Similarly, we measured drinking by the following question: 'Have you drunk alcohol more than 3 times per week in the last month?' This item was coded (1) for drinkers or (0) otherwise. Additionally, we controlled for sociodemographic predictors, such as age, gender, marital status, educational attainment level, and household income level.

Reliability and Validity
Before running the GBDT model, the variables were first analyzed in SPSS (IBM, Armonk, NY, USA) to test their reliability. Cronbach's alpha is the coefficient used to estimate the reliability of instruments based on internal consistency [59]. Cronbach's alpha reliability coefficient normally has values from 0 to 1, where a higher value refers to a greater internal consistency of the items in the scale. The results show an acceptable reliability of the K6, with a reliability coefficient (alpha) of 0.73, which is above 0.70. A reliability coefficient (alpha) of 0.70 or higher is considered acceptable in previous studies [60][61][62].
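For readers who wish to reproduce the reliability check outside SPSS, the standard formula for Cronbach's alpha can be sketched as follows. With k items, alpha equals k / (k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The function name and the example responses are hypothetical and do not come from the survey data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the total (summed) score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five people to the six K6 items (0 to 4 each).
example = [
    [0, 1, 0, 0, 1, 0],
    [2, 2, 1, 1, 2, 1],
    [3, 3, 2, 2, 3, 3],
    [1, 1, 1, 0, 1, 0],
    [4, 3, 3, 2, 4, 3],
]
print(round(cronbach_alpha(example), 2))
```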

Results
We used the "gbm" package in R to fit the GBDT model [63]. Five-fold cross-validation was applied to limit estimation error and overfitting. We specified a Gaussian distribution so that the model minimizes the squared error. We set a maximum of 10,000 trees, a learning rate of 0.001, an interaction depth of 5, a bag fraction of 0.5, and a training fraction of 0.5. The model generated the best results after 2286 boosting iterations.
Table 1 defines the variables used in this study and presents the descriptive characteristics of the sample (n = 937). The respondents were predominantly young, with 59% aged between 18 and 34 years, and 34.7% were male. In addition, 60.8% of the respondents were married, 24.5% of the respondents had attained a degree and above, and 35.5% of the respondents earned over 10,000 yuan monthly, which was higher than the national average of 4340 yuan according to the 2019 wave of the China Household Finance Survey (CHFS) (CHFS has conducted several follow-up surveys since 2011). Regarding respondents' lifestyle characteristics, 21.3% of the respondents reported that they slept less than 7 h per day, and 24.6% reported having slept over 9 h per day. Nearly half of the respondents lived a sedentary lifestyle, exercising only once a week or less. Additionally, 21.7% of the respondents reported having smoked in the previous four weeks, which was lower than the national average of 26. . Furthermore, 20.1% of the respondents reported having consumed alcohol more than three times per week in the previous month, which is higher than the national average of 14.0% reported by the fourth wave of the CFPS conducted in 2018. Less than 50% of the respondents reported being satisfied with their neighborly relationships, while about 38% of the respondents perceived that their environment was maintained in good or very good quality.
Furthermore, less than 15.5% of the respondents rated their indoor air quality as bad or extremely bad. Of note, the average distance from residence to the nearest park was 37.8 km, and the average distance from residence to the nearest COVID-19 hospital was 67.1 km, while 21.9% of the respondents reported living very far from a COVID-19 hospital. Overall, the sample primarily contains young, relatively well-educated adults; this perhaps explains the average K6 score of 9.2, which suggests that the respondents perceived only moderate psychological distress. In comparison, the average K6 score reported in the CFPS 2018 was 9.4, which is higher than the K6 score in our study.
Table 2 shows the relative importance of all independent variables for predicting respondents' psychological distress. All independent variables were ranked in accordance with the size of their relative importance. The results from Table 2 show that health predictors make the largest contribution to predicting the psychological distress level, accounting for 42.32%. Next are objective predictors (accounting for 23.49%), sociodemographic predictors (accounting for 17.91%), and perceived predictors (accounting for 16.26%).
Table 2. Importance of independent variables in predicting psychological distress K6 score.

Predictors | Relative Importance (%) | Rank

Regarding the health predictors, the two most important predictors among all the independent variables are disease and self-rated health, with 17.46% and 16.23% of the predictive power, respectively. One plausible explanation is that respondents who had a chronic disease or a low level of self-rated health were more likely to have a higher level of psychological distress. Being a current drinker and being a current smoker were the third and fourth most important predictors among the health characteristics, contributing 3.45% and 3.14% of the predictive power, respectively. Physical exercise was ranked last and accounted for only 2.04%. The most important predictor among the objective predictors is the distance to parks, which accounted for 9.38%. This is followed by the distance to a COVID-19 hospital (accounting for 7.47%), and AQI ranked third (accounting for 6.64%). Furthermore, neighborly relationships had an important influence on psychological distress, with a relative importance of 6.96%. Perceived indoor air quality played an essential role in affecting respondents' psychological distress level, with a relative importance of 11.62%. This result is plausible since previous research suggests that residents who have a lower level of perceived indoor air quality are more likely to perceive a higher level of psychological distress.
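The group-level contributions are the sums of the individual relative importances. The short check below uses only the percentages reported in the text for the health and objective predictors. The dictionary and function names are hypothetical.

```python
# Relative importances (%) of individual predictors as reported in the text.
health = {
    "chronic disease": 17.46,
    "self-rated health": 16.23,
    "current drinker": 3.45,
    "current smoker": 3.14,
    "physical exercise": 2.04,
}
objective = {
    "distance to park": 9.38,
    "distance to COVID-19 hospital": 7.47,
    "AQI": 6.64,
}


def group_total(group):
    """Sum the relative importances of one predictor group, rounded to 2 decimals."""
    return round(sum(group.values()), 2)


print("health predictors:", group_total(health))        # reported as 42.32
print("objective predictors:", group_total(objective))  # reported as 23.49
```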

Association between High-Ranking Predictors and Psychological Distress
The results from Figure 1 suggest that respondents who had a chronic disease and low self-rated health level were more likely to perceive a higher level of psychological distress. Furthermore, a higher level of perceived indoor air quality contributed to a lower level of psychological distress. The association between distance to parks and psychological distress was an inverse V-shaped curve. There was a dramatic increase in the level of psychological distress as the distance from residence to parks increased in terms of respondents living adjacent to a green space. Eventually, the level of psychological distress tended to be stable when the distance from residence to parks reached a certain distance. This indicates that respondents' psychological distress level is sensitive to the distance from their residence to parks, and such sensitivity seems to be mitigated as the distance increases to a certain value. Similarly, we observed a continuous V-shaped curve effect of distance to a COVID-19 hospital on psychological distress. Eventually, it reached a moderate level of psychological distress as the distance from the COVID-19 hospital approached a certain value, which accounts for a score of approximately 9. Moreover, we found that the psychological distress level increased rapidly with respect to an AQI ranging between 40 and 50. The trend levelled off when the AQI was between 50 and 100. Once the AQI exceeded 100, the number of respondents suffering from psychological distress spiked dramatically and then decreased and remained stable for a long time. Regarding the sociodemographic predictors, the results suggest that respondents who had a higher educational attainment level and a good relationship with neighbors were more likely to perceive a lower level of psychological distress. Such effects showed a downward gradient. Table 3 further displays the gender-sensitive analysis of independent variables on predicting respondents' psychological distress. 
Similar to Table 2, all independent variables were ranked in accordance with the size of their relative importance.

Gender-Sensitive Analysis
For women, the results show that both health predictors and objective predictors make a considerable contribution to predicting the psychological distress level, accounting for 32.05% and 32.01%, respectively. Next are perceived predictors (accounting for 21.13%) and sociodemographic predictors (accounting for 14.80%). In contrast, it is worth noting that there is a slight change in the relative importance ranking among men. The results show that health predictors make a considerable contribution to predicting psychological distress level, accounting for 31.31%. Next are sociodemographic predictors (accounting for 27.57%), objective predictors (accounting for 25.10%), and perceived predictors (accounting for 18.93%).
Regarding the health predictors, we found that disease, which accounted for 16.87%, played a crucial role in influencing psychological distress in men, while self-rated health, which accounted for 25.20%, had the greatest influence on psychological distress among women. The effect of smoking on psychological distress was more pronounced in men (accounting for 1.56%) compared to women (accounting for 0.57%). In contrast, the influence of drinking on psychological distress was more pronounced in women (accounting for 0.80%) compared to men (accounting for 0.63%). In terms of objective predictors, we found that men were more sensitive to the impact of AQI on psychological distress (accounting for 9.74%) than women (accounting for 8.98%), while women were more sensitive to the influence of the direct distance to the nearest COVID-19 hospital on psychological distress (accounting for 14.21%) compared to men (accounting for 8.89%). Regarding the sociodemographic predictors, interestingly, we found that neighborly relationship (accounting for 16.7%) played an important role in influencing psychological distress in men but not in women (accounting for 2.43%). Lastly, in terms of perceived predictors, the results show that perceived indoor air quality (accounting for 13.67%) played an essential role in affecting women's psychological distress, while perceived environment had a greater effect on men's psychological distress (accounting for 8.84%).

Main Findings
Numerous studies have examined the association between different risk factors and mental health during the COVID-19 pandemic and have produced mixed results depending on the different cultural contexts. Some studies have focused on evaluating the effects of the COVID-19 risk factor on psychological health while others have focused more on how sociodemographic characteristics affect psychological health. Yet, it remains unknown as to which factor has a substantial impact on the psychological health level. The present study fills this gap by exploring the varied importance of different risk factors in predicting the level of psychological distress, and it further examines the threshold effect of each risk factor for affecting the level of psychological distress. We found evidence that health risk factors made the greatest contribution to predicting the level of psychological distress. Meanwhile, objective risk factors had a stronger predictive power than perceived risk factors, whereas perceived indoor air quality played a more important role in predicting psychological distress compared to the ambient air pollution during the COVID-19 pandemic.

Evidence on the Association between Risk Factors and the Level of Psychological Distress
First, we found that health and objective predictors played a more substantial role in predicting the psychological distress level than sociodemographic and perceived predictors. This finding is partially consistent with existing studies of the link between the built environment and body mass index [42,64], which found that built environment predictors played a more essential role in affecting the related health outcomes than other predictors. Regarding the health predictors, we found that whether respondents had a chronic disease and respondents' self-rated health together contributed over one third to predicting psychological distress. This finding is in line with current COVID-19 studies reporting that respondents with a chronic disease or with poor self-rated health had a significantly higher level of psychological distress [10,12,38]. This finding is plausible, since an unexpected virus outbreak could exacerbate psychological distress among respondents with a lower self-rated health level or a chronic disease.
Regarding the objective predictors, an interesting finding of this study was that both distance to a park and distance to a COVID-19 hospital showed a non-linear relationship with psychological distress. In general, this relationship follows an inverted V-shaped curve. This finding is in line with previous empirical studies suggesting that proximity to urban greenness had a significant inverted U-shaped effect on health and wellbeing [65]. A non-linear association was also found between AQI and the level of psychological distress. We observed a dramatic rise in the moderate level of psychological distress at AQI values between 40 and 50 and between 110 and 130. This result is consistent with a similar study, which indicated that the potential threshold effect of NO2 played a substantial role in health-based risk assessment [66]. One underlying mechanism might be that people are more sensitive to sharp increases in air pollutants that pose a threat to psychological health. In terms of the perceived predictors, an interesting finding was that perceived indoor air quality contributed 11.62% to predicting the level of psychological distress, whereas AQI contributed 6.64%. This finding indicates that perceived indoor air quality contributed more to predicting psychological distress than ambient air pollution during the COVID-19 pandemic. One possible explanation might be that residents spent the majority of their time at home due to the lockdown restrictions during the very early stage of the pandemic.
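Threshold effects of this kind are typically read off a partial dependence curve, which averages the model's prediction over the sample while one predictor is fixed at each value of a grid. The Python sketch below shows the brute-force version of that calculation and then picks out the grid intervals with the steepest rise. It is a minimal illustration, not the analysis pipeline used in this study. The function `toy_predict` is a hypothetical stand-in for a fitted GBDT, and its jump points were chosen only to mimic the shape described above. The rows and the grid are likewise made up.

```python
def partial_dependence(predict, rows, feature, grid):
    """Brute-force partial dependence: for each grid value, overwrite `feature`
    in every row, predict, and average the predictions over the sample."""
    curve = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve


def steepest_steps(grid, curve, top=2):
    """Return the `top` adjacent grid intervals with the largest rise in the curve."""
    steps = [
        ((grid[i], grid[i + 1]), curve[i + 1] - curve[i])
        for i in range(len(grid) - 1)
    ]
    return sorted(steps, key=lambda step: step[1], reverse=True)[:top]


def toy_predict(row):
    """Hypothetical stand-in for a fitted model; NOT estimated from the study data."""
    score = 5.0
    if row["aqi"] > 45:
        score += 2.0
    if row["aqi"] > 120:
        score += 3.0
    if row["poor_health"]:
        score += 1.5
    return score


# Made-up respondents; only the structure matters for the illustration.
ROWS = [
    {"aqi": 30, "poor_health": False},
    {"aqi": 80, "poor_health": True},
    {"aqi": 150, "poor_health": False},
    {"aqi": 60, "poor_health": True},
]
GRID = list(range(20, 161, 10))  # AQI values 20, 30, ..., 160

if __name__ == "__main__":
    curve = partial_dependence(toy_predict, ROWS, "aqi", GRID)
    for interval, rise in steepest_steps(GRID, curve):
        print(f"AQI {interval[0]}-{interval[1]}: predicted distress rises by {rise:.2f}")
```

With a real fitted model, `predict` would wrap the model's prediction call, and the grid resolution would determine how precisely a threshold interval can be located.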
Regarding sociodemographic predictors, we found that respondents' gender, age, marital status, household income, and geolocation did not affect the level of psychological distress significantly more than other risk factors, apart from respondents' neighborly relationships and educational attainment level. This finding partially challenges common assumptions in the COVID-19 psychological health literature, as most studies have suggested that sociodemographic characteristics play a significant role in influencing the level of psychological distress [12]. However, these studies have reported only whether the relationship between sociodemographic characteristics and the outcome is significant; they have not quantified which sociodemographic characteristics play a more important role in affecting the level of psychological distress. Overall, this study has provided a potential pathway to quantify the importance of different predictors of psychological health and, by using the GBDT model, has produced a more realistic picture of the association between related risk factors and psychological health. Further studies are advised to apply the GBDT to evaluate the direct and indirect effects of different noise exposures, in both outdoor and perceived environments, on mental health and to explore the potential preventive benefits of psychological noise attenuation by the urban environment [67–69]. Additional research using the GBDT to explore the potential pathways between green space exposure, air pollution, and psychological wellbeing is also recommended [70–73].

Evidence from Gender Sensitive Analysis
Though we found that respondents' gender did not have a significant effect on the level of psychological distress, we conducted an additional gender-sensitive analysis to explore whether the risk factors contribute differently to psychological distress in men and women. The results regarding the relative importance of different predictors among women were largely consistent with the results in the total sample, while the results in men were partially in line with the results in the total sample. Specifically, for men, disease was the main risk factor contributing to a higher level of psychological distress, while self-rated health contributed most to a higher level of psychological distress among women. This finding indicates the robustness of the main findings and is in line with the results in the total sample, which suggest that health predictors contribute most to psychological distress for both men and women. Furthermore, we found that the influence of drinking on psychological distress was more pronounced for women than for men, which is consistent with previous studies indicating that the relationship between drinking and stress was more relevant to women than men [74,75]. In terms of the objective predictors, the results in men are largely consistent with the findings in the total sample. However, an interesting finding was that the distance to the nearest COVID-19 hospital contributed significantly to psychological distress in women. One possible explanation might be that women were more likely to perceive severe symptoms during the pandemic than men [76,77], and living close to the nearest COVID-19 hospital exacerbated such influences.
Additional results show that men's psychological distress was more sensitive to the perceived environment, whereas women's psychological distress was more sensitive to perceived indoor air quality; this difference warrants further study.

Limitations
This study has several limitations. First, since our study applied a cross-sectional design, we cannot establish causal relationships, and the estimates may therefore be biased. Nevertheless, our sample was collected at a relatively early stage of the outbreak of COVID-19 in China, which might minimize such bias to some extent. Second, this study used a non-random sampling design based on network invitation through WeChat, which might cause specific populations across the country to be under-represented in the sample. In our study, one underlying reason for the limited sample size of older adults might be that elderly people have limited access to mobile phones. Third, the perception variables were self-reported and reflect the survey participants' own perceptions, which might also introduce bias. Lastly, an observational study of this kind cannot test the hypothesis under controlled conditions; an experimental method should be considered for future research if the necessary data are available. These limitations should be addressed in future research.

Conclusions
Using individual-level data (n = 937) obtained from an online questionnaire between 20 January and 13 February 2020, the early stage of the coronavirus outbreak, we quantified the relative importance of different risk factors in predicting the level of psychological distress by using the GBDT. The results from this study indicate that, among all predictors, health predictors played the most important role in predicting the level of psychological distress. Though objective predictors contributed slightly more to predicting the level of psychological distress than perceived predictors, perceived indoor air quality played a more important role in predicting psychological distress than ambient air pollution. This finding might be more significant during the COVID-19 pandemic, when respondents were compulsorily quarantined at home. Finally, we found that the risk factors contributing to psychological distress differ between women and men.

Institutional Review Board Statement: The study has received ethical approval from the School of Energy, Geoscience, Infrastructure and Society, at Heriot-Watt University.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data sharing not applicable.