Self-Perceived Health, Life Satisfaction and Related Factors among Healthcare Professionals and the General Population: Analysis of an Online Survey, with Propensity Score Adjustment

Healthcare professionals (HCPs) often suffer high levels of depression, stress, anxiety and burnout. Our main study aims were to estimate the prevalences of poor self-perceived health, life dissatisfaction, chronic disease and unhealthy habits among HCPs and to explore the use of machine learning classification algorithms to remove selection bias. A sample of Spanish HCPs was asked to complete a web survey. Risk factors were identified by multivariate ordinal regression models. To counteract the absence of probabilistic sampling and representation, the sample was weighted by propensity score adjustment algorithms. The logistic regression algorithm was considered the most appropriate for dealing with misestimations. Male HCPs had significantly worse lifestyle habits than their female counterparts, together with a higher prevalence of chronic disease and of health problems. Members of the general population reported significantly poorer health and less satisfaction with life than the HCPs. Among HCPs, the prior existence of health problems was most strongly associated with worsening self-perceived health and decreased life satisfaction, while obesity had an important negative impact on female practitioners' self-perception of health. Finally, the HCPs who worked as nurses had poorer self-perceptions of health than other HCPs, and the men who worked in primary care had less satisfaction with their lives than those who worked in other levels of healthcare. Accurate knowledge of HCPs' self-perceived health, life satisfaction and associated factors is essential to enabling policy makers and healthcare managers to design and implement effective programmes to improve the attention paid to human resources. The study results we report can be used as a baseline for monitoring the health effects produced in HCPs by the COVID-19 pandemic and for assessing interventions to benefit the welfare of these professionals, whose current role makes them priority beneficiaries of such attention.


Introduction
One of the elements of the physician's pledge in the 2017 revision of the Declaration of Geneva, adopted by the World Medical Association (WMA), states: 'I will attend to my own health, well-being, and abilities in order to provide care of the highest standard' [1]. This addition to the previous Declaration of Geneva acknowledges that patients suffer when the well-being of healthcare professionals (HCPs) is compromised [2] and was adopted in response to the growing awareness that physicians and nurses present high levels of depression, stress, anxiety and burnout [3]. In fact, suicide is the only cause of death that has a higher prevalence among physicians than in the general population [4], and the situation among nurses is likely to be similar [5]. Moreover, the prevalence of substance abuse and/or addiction among physicians is likely to be similar to that found among the general public, or even higher [6].
The WMA recommends that more research be conducted into physicians' health and well-being and into the impact of these parameters on the patient care provided [7]. In view of these considerations, the main objectives of this research were to estimate the prevalence among HCPs of ill health, dissatisfaction, chronic disease and unhealthy lifestyle habits and to identify and analyse factors associated with life satisfaction and perceived health status.
We addressed these study goals by means of an online survey, an approach that offers substantial advantages over traditional survey techniques in terms of financial and time savings. Health surveys have traditionally used probability sampling of addresses and data collection facilitated by an interviewer who visits each address, but this traditional approach has some limitations, such as its high financial and time costs and its susceptibility to nonresponse bias. The main motivation for using nonprobability samples (such as volunteer web surveys) is their low cost, low respondent burden and quick turnaround, since they allow estimates to be produced shortly after the information needs have been identified.
Although the validity of internet research for subjective surveys of personal well-being is well established [8] and online questionnaires are recognised as an important tool for epidemiological research [9], many surveys of this type are subject to self-selection [10,11]. In a health study, Ref. [12] found that the bias in web surveys remained substantial even when additional quotas were set. Statistical adjustments are therefore key to obtaining reliable estimates from online survey data. Among the various techniques for removing bias in web surveys, propensity score adjustment (PSA) stands out. This method, originally developed for reducing selection bias in non-randomised clinical trials [13], was adapted to nonprobability surveys in [14,15]. PSA aims to estimate the propensity of each individual to participate in a survey, typically by using logistic regression. Ref. [16] assessed the ability of PSA to remove bias in the context of sensitive sexual health research and the potential of web panel surveys to replace or supplement probability surveys.
Another goal of this research was to explore the use of machine learning (ML) classification algorithms to remove selection bias by reweighting the study variables via PSA. ML techniques are commonly employed in epidemiology [17][18][19], and statistical algorithms have been used to weight variables in recent health surveys [20][21][22]. These techniques have also shown good properties in terms of bias reduction on simulated data [23,24], but at the cost of increasing the variance of the estimates. However, the mean square error (MSE), which combines bias and variance, is reduced with PSA in some situations, meaning that its application can be recommended in nonprobability sampling contexts. The objective of this study was to compare the performance and applicability of ML algorithms for PSA in a real-world context, using several transformations to convert the probabilities provided by PSA into weights. This work pioneers the use of ML techniques to adjust for voluntary response bias in a real health survey and shows the capabilities of the different methods compared with the usual unadjusted methodology.
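The trade-off invoked here rests on the standard decomposition of the mean square error. For a generic estimator $\hat{p}$ of a population proportion $p$ (the notation is ours, not the paper's):

```latex
\mathrm{MSE}(\hat{p}) = E\left[(\hat{p}-p)^{2}\right] = \left(E[\hat{p}]-p\right)^{2} + \mathrm{Var}(\hat{p})
```

PSA therefore lowers the MSE whenever the reduction it achieves in the squared bias exceeds the variance it adds.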

Target Population
In 2014, according to census data, the Public Health System of Andalusia (SAS) employed 137,882 HCPs. However, for the purposes of this study, only those with a university degree were considered for inclusion, and so the target population was composed of the 73,465 HCPs who had this academic qualification.

Sample
In 2014, the participants in an online course on holistic care for patients with chronic diseases were asked to complete a web survey. These participants (n = 1797) were all university graduates working in the SAS as HCPs.

Variables
The following variables were present in both datasets (web survey and census): sex, age, degree and type of medical care provided (Table 1). In addition to the variables presented in the table, the following variables were also addressed in the web survey: In order to make the prevalences of the healthcare professional survey comparable with those of the general population, the same categorisation and cut-off points of the Andalusian Health Survey [26] were applied for those study variables considered in both surveys, as follows: poor health ≤3 (i.e., fair, bad or very bad); dissatisfaction with life ≤6; ≥1 alcoholic drink per month; and insufficient sleep <7 h of sleep per night.
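As a minimal sketch of how these cut-offs turn raw responses into the binary indicators behind the prevalences, the Python below assumes a 1-5 self-rated health scale (1 = very bad, 5 = very good), a 0-10 life satisfaction scale and hypothetical field names; none of these coding details are given in the text.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical respondent record; scales and field names are assumptions."""
    self_rated_health: int   # assumed 1 (very bad) .. 5 (very good)
    life_satisfaction: int   # assumed 0 .. 10
    drinks_per_month: float  # alcoholic drinks per month
    sleep_hours: float       # hours of sleep per night


def indicators(r: Response) -> dict:
    """Binary study indicators (1 = condition present) using the stated cut-offs."""
    return {
        "poor_health": int(r.self_rated_health <= 3),      # fair, bad or very bad
        "dissatisfied": int(r.life_satisfaction <= 6),     # dissatisfaction with life
        "drinks_monthly": int(r.drinks_per_month >= 1),    # >= 1 drink per month
        "insufficient_sleep": int(r.sleep_hours < 7),      # < 7 h per night
    }


print(indicators(Response(3, 8, 0.0, 6.5)))
# {'poor_health': 1, 'dissatisfied': 0, 'drinks_monthly': 0, 'insufficient_sleep': 1}
```

Prevalences are then weighted means of these indicators within each sex.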

Sampling Weights
As shown in Table 1, HCPs aged 36-55 years and primary care HCPs were over-represented in the web survey sample with respect to the target population, whereas HCPs with a degree in nursing were under-represented.
Given a volunteer survey $s_v$, the usual estimator of the population proportion is the Horvitz-Thompson estimator, given by

$$\hat{p}_{HT} = \frac{1}{N}\sum_{i \in s_v} w_i A_i,$$

where $N$ is the population size, $A_i = 1$ if unit $i$ in the sample $s_v$ has the desired characteristic and 0 otherwise, and $w_i$ is the weight (the inverse of the sampling rate). To adjust for the lack of probability sampling and the resulting non-representativeness, the sample was weighted using the standard procedure of propensity score adjustment (PSA) for web surveys [14,15].
This approach aims to estimate the propensity of an individual to be included in the nonprobability sample by combining the data from the sample $s_v$ with a reference probability sample $s_r$ and training a predictive model on the variable $\delta$, with $\delta_i = 1$ if $i \in s_v$ and $\delta_i = 0$ if $i \in s_r$. PSA assumes that the selection mechanism of $s_v$ is ignorable and follows a parametric model:

$$\pi_i = P(\delta_i = 1 \mid x_i) = \pi(x_i, \gamma),$$

for some function $\pi$ of the observed covariates $x_i$ and a parameter $\gamma$. The usual procedure is to estimate the parameter $\gamma$ by using logistic regression and to transform the estimated propensities into weights by inverting them:

$$w_i = \frac{1}{\hat{\pi}(x_i)},$$

where $\hat{\pi}(x_i)$ denotes the estimated propensity for the individual $i \in s_v$. Plugging these weights into a ratio estimator yields the Hajek estimator of the population proportion:

$$\hat{p}_{H} = \frac{\sum_{i \in s_v} w_i A_i}{\sum_{i \in s_v} w_i}.$$

An alternative that takes into account the fact that individuals of $s_v$ must be excluded from the target population of $s_r$ is the formula presented in [27]:

$$w_i = \frac{1 - \hat{\pi}(x_i)}{\hat{\pi}(x_i)}.$$

We considered the following algorithms for estimating the aforementioned propensities:
• The k-nearest neighbours algorithm, with k = 5 (5-NN)
• Naïve Bayes with no Laplace smoothing
• Random forest with 500 trees
• Gradient boosting machine (GBM) with 100 trees, interaction depth of 1 and learning rate of 0.1
• Feed-forward neural networks with one hidden layer, initialising weights to 0 and considering three cases with 1, 3 and 5 units in the hidden layer

In all cases, the probabilities calculated in PSA were transformed into weights for Hajek estimators, following the formula for $\hat{p}_{PSA2}$ stated in [27]. Weights for Horvitz-Thompson estimators were also calculated, in accordance with [15]. PSA was performed in R 3.5.1 [29] using the packages sampling [30], survey [31], C50 [32], randomForest [33], gbm [34], e1071 [35], caret [36] and nnet [37].
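The PSA steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R code: covariates are already numeric, the reference sample is treated as unweighted, and a hand-written Newton-Raphson fit stands in for a statistical package. The helper names (`fit_logistic`, `psa_weights`, `hajek`) are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson estimate of gamma in P(delta=1 | x) = 1 / (1 + exp(-x'gamma))."""
    gamma = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ gamma))
        H = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information matrix
        gamma = gamma + np.linalg.solve(H, X.T @ (y - p))
    return gamma


def psa_weights(X_v, X_r, kind="odds"):
    """Weights for the volunteer sample s_v, fitted on pooled s_v and s_r data.

    kind="inverse": w_i = 1 / pi_i
    kind="odds":    w_i = (1 - pi_i) / pi_i  (s_v excluded from the target of s_r)
    """
    X = np.vstack([X_v, X_r])
    delta = np.r_[np.ones(len(X_v)), np.zeros(len(X_r))]   # delta_i = 1 if i in s_v
    X1 = np.column_stack([np.ones(len(X)), X])             # add an intercept column
    gamma = fit_logistic(X1, delta)
    pi = 1.0 / (1.0 + np.exp(-X1[: len(X_v)] @ gamma))     # propensities for s_v only
    return 1.0 / pi if kind == "inverse" else (1.0 - pi) / pi


def hajek(A, w):
    """Hajek estimate: weighted proportion normalised by the total weight."""
    return float(np.sum(w * A) / np.sum(w))


# Toy data: volunteers over-represent x = 1 (60% vs 30% in the reference sample).
rng = np.random.default_rng(0)
X_r = rng.binomial(1, 0.3, size=(2000, 1)).astype(float)
X_v = rng.binomial(1, 0.6, size=(1000, 1)).astype(float)
A_v = X_v[:, 0]   # outcome tied to x purely for illustration
print("unweighted:", A_v.mean())
print("PSA (odds):", hajek(A_v, psa_weights(X_v, X_r, kind="odds")))
print("reference share:", X_r.mean())
```

With a single binary covariate the logistic model is saturated, so in this toy setting the odds weights reproduce the reference-sample share of x = 1 exactly, while the inverse weights reproduce the share in the pooled $s_v \cup s_r$ data. This is one way to see why [27]'s variant suits a target population that excludes the volunteers.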
The weights for the Horvitz-Thompson estimators were discarded, as they were unstable and produced unacceptably high variances. In general, the Horvitz-Thompson weights, although they correlated with the Hajek weights obtained by the same methods, presented higher levels of skewness, probably caused by the grouping features of the weighting method (see Appendix A). Moreover, the weights obtained by PSA using decision trees and neural networks with five units were also discarded, as they were found to be equal to the design weights and so provided the same outputs as in the unadjusted case.
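To make the distinction between the two estimators concrete, here is a minimal sketch with illustrative numbers only (four respondents, weights summing to 100, an assumed population size of 120):

```python
def horvitz_thompson(A, w, N):
    """p_HT = (1/N) * sum_{i in s_v} w_i * A_i."""
    return sum(wi * ai for wi, ai in zip(w, A)) / N


def hajek(A, w):
    """p_H = sum_i w_i * A_i / sum_i w_i (normalised by the total weight)."""
    return sum(wi * ai for wi, ai in zip(w, A)) / sum(w)


A = [1, 0, 0, 1]
w = [10.0, 20.0, 30.0, 40.0]
print(round(horvitz_thompson(A, w, N=120), 4))  # 0.4167
print(hajek(A, w))                               # 0.5
```

Because the Hajek form divides by the total weight rather than by $N$, it is unaffected by any common rescaling of the weights, whereas the Horvitz-Thompson form depends directly on how closely the weights sum to $N$.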

Statistical Analysis
Several weighting schemes were applied in estimating the prevalence of each of the variables considered. To reflect potential differences between male and female HCPs in these prevalence values, sex was taken as a stratification variable. The variances of the proportion estimators were calculated using the leave-one-out jackknife algorithm [38], implemented in the bootstrap package in R [39]. Prevalence values for the study population were compared with those for the general population [26] in the same age range (22-67 years).
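A minimal sketch of a delete-1 (leave-one-out) jackknife variance for a weighted Hajek proportion, in the spirit of the jackknife described above. It assumes units are independent and that each replicate simply drops one unit together with its weight; design refinements are not modelled. It would be applied separately within each sex stratum, and the data below are hypothetical.

```python
def hajek(A, w):
    """Weighted proportion normalised by the total weight."""
    return sum(a * wi for a, wi in zip(A, w)) / sum(w)


def jackknife_variance(A, w):
    """Delete-1 jackknife: (n-1)/n * sum_i (theta_(i) - mean theta)^2."""
    n = len(A)
    # Replicate estimates, each computed with unit i left out.
    reps = [hajek(A[:i] + A[i + 1:], w[:i] + w[i + 1:]) for i in range(n)]
    mean_rep = sum(reps) / n
    return (n - 1) / n * sum((r - mean_rep) ** 2 for r in reps)


# Hypothetical weighted sample for one stratum.
A = [1, 0, 0, 1, 0, 1, 0, 0]
w = [8.0, 8.0, 12.0, 20.0, 8.0, 15.0, 9.0, 8.0]
est = hajek(A, w)
se = jackknife_variance(A, w) ** 0.5
print(f"prevalence = {est:.3f}, 95% CI = ({est - 1.96 * se:.3f}, {est + 1.96 * se:.3f})")
```

With equal weights this reduces to the familiar $\hat{p}(1-\hat{p})/(n-1)$ for a sample proportion, which is a quick sanity check of the implementation.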
Multivariate ordinal logistic regression models were run to characterise the ordinal variables of life satisfaction and self-perceived health status. Sampling weights were applied in the models, which were constructed independently for male and female HCPs. In the statistical analysis, the scales for life satisfaction and self-perceived health status were inverted; thus, odds ratios (OR) >1 mean that the explanatory variable increases the probability of dissatisfaction with life or of poor self-perceived health. In addition, for each explanatory variable, the reference category was chosen so as to ease the interpretation of the odds ratios (i.e., OR > 1). The following explanatory variables were included in the models: Multicollinearity of the independent variables was assessed using the variance inflation factor (VIF) [40], which indicates collinearity when it takes large values; variables with VIF > 3 were discarded [41]. Therefore, 'chronic diseases' and 'physical, mental or sensorial disability' were not included in the final model. Alcohol consumption was also excluded because of its low association with the dependent variables of the models, as assessed in a preliminary regression analysis in which the alcohol variable was not significant and had a beta coefficient close to zero; the remaining coefficients and test statistics were almost unchanged when this variable was removed. To indicate the range of values within which the coefficients would be expected to lie for the entire population, 95% confidence intervals were calculated. Hypothesis testing of the beta coefficients was performed with the Wald test. Statistical and graphical analyses were performed in R 3.5.1 using the packages poliscidata [42] and ggplot2 [43], respectively, in addition to those mentioned above.
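The VIF screening step can be sketched as below, using $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from an ordinary least squares regression of covariate $j$ on all the others plus an intercept. Only the VIF > 3 rule comes from the text; the simulated covariates and the `vif` helper are illustrative assumptions, and categorical predictors would first need to be dummy-coded.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n x k, numeric)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + 0.3 * rng.normal(size=500)   # nearly collinear with x1
x3 = rng.normal(size=500)              # unrelated covariate
v = vif(np.column_stack([x1, x2, x3]))
print(np.round(v, 2), "-> drop candidates:", np.where(v > 3)[0])
```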

Prevalence Estimations
According to the results provided by PSA with logistic regression, 10.3% of male HCPs (Table 2) and 12.6% of female HCPs (Table 3) were dissatisfied with their life, and 8.4% of male and 7.8% of female professionals perceived their own health as poor. Regarding lifestyle habits, 62.3% of the men and 42.8% of the women drank alcohol at least once a week, while 31.1% of the men and 26.7% of the women slept for less than seven hours a day. Finally, 31.8% of the men and 22.3% of the women reported having at least one chronic disease. Moreover, 26.3% of the men and 20.6% of the women had one health problem, 10.4% and 6%, respectively, had two or more health problems, and 7% of men and 6% of women had a disability (Tables 2 and 3). Figures A8 and A9 of Appendix B show the 95% confidence intervals for the prevalence of each of the variables considered. All of the estimates were very similar, whichever method was applied, although some point estimates varied slightly due to the influence of certain algorithms on the propensity estimation step. Consequently, there were no statistically significant differences between the prevalences estimated with the different weighting methods. The logistic regression algorithm obtained the best results in terms of both prevalence and variance deviations compared with no weighting adjustment (see Tables A3 and A4 of Appendix B). As stated before, PSA increased the variance of the estimators but reduced their bias, meaning that the PSA-based estimates might be more valuable, as they mitigated the effect of non-sampling errors in the final estimates. Given that the estimates provided by PSA with different algorithms were very similar (and might therefore reduce the bias by the same amount), the estimate with the lowest variance might be the one that minimises the MSE. Table 4 shows the prevalences of the study variables for the general population [26] and the HCPs.
The latter group self-reported significantly better health and greater satisfaction with life than the general population. In addition, while women in the general population reported a significantly worse perception of their health than men (17.5% and 12.1%, respectively, reported poor health), female HCPs had a slightly better, although not significantly different, perception in this respect compared with their male counterparts (7.8% and 8.5%, respectively). By contrast, women reported significantly less satisfaction with their life than men, both in the general population (19.2% vs. 16.3%, respectively) and among the HCPs (12.6% vs. 10.3%, respectively). With respect to alcohol consumption (at least once a month), men reported significantly higher prevalences than women, both in the general population and among HCPs. In addition, alcohol consumption was significantly more prevalent among HCPs than in the general population, for both men (79.8% vs. 62.5%) and women (60% vs. 37.1%). Regarding hours of sleep per day, significantly more HCPs than persons in the general population slept for less than 7 h. This difference was especially marked among men (31.2% vs. 17.7%, respectively). In addition, significantly more male than female HCPs slept for less than 7 h per day (31.2% vs. 26.7%, respectively), which is contrary to the pattern observed in the general population.
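The text does not state which test underlies these significance statements. One simple possibility, shown here only as a hypothetical sketch, is a two-sided Wald z-test on the difference between two independent prevalence estimates, each with its estimated variance (for example, a jackknife variance for the HCP sample). The numbers below are made up and are not taken from Table 4.

```python
from math import erfc, sqrt


def compare_prevalences(p1, var1, p2, var2):
    """Two-sided Wald z-test of H0: p1 == p2, assuming independent, roughly normal estimates."""
    z = (p1 - p2) / sqrt(var1 + var2)
    p_value = erfc(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = compare_prevalences(0.30, 0.0004, 0.20, 0.0005)
print(f"z = {z:.2f}, p = {p:.4f}")
```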
The presence of chronic disease was much more prevalent among women in the general population than among female HCPs (45.3% vs. 22.3%, respectively), but no such difference was observed between the two groups of men (35.9% vs. 31.8%, respectively). The prevalence of disability was almost twice as high among HCPs as in the general population (6% vs. 3.5%, respectively). In this respect, there were no differences between men and women.

Regression Modelling
As described above, the regression modelling was performed using three types of weighting: no adjustment, PSA using logistic regression and PSA using a neural network with one hidden unit, the latter two being the same weightings used for prevalence estimation. These weighting methods were selected because the weights produced by the different methods varied little among themselves, so some of them could be discarded to avoid redundancy (see Appendix A for further information on the similarity among weights). In almost every case, the strength of evidence against a null effect of the explanatory variable weakened with reweighting, not only because the variance increased (reflected in wider confidence intervals) but also because the beta coefficient shifted towards zero (i.e., the OR shifted towards one; see Tables 5-8). In other words, reweighting addressed the misestimation of the associations with the explanatory variables caused by the nonprobabilistic sampling method applied in the survey. Tables 5 and 6 depict the results for the models assessing self-perceived health, and Tables 7 and 8 depict those concerning satisfaction with life. Figures 1 and 2 illustrate the OR for self-perceived health and satisfaction with life, respectively, for male and female participants. The strongest OR for poor self-perceived health was obtained when the respondent had one or more pre-existing health problems. Thus, the prior existence of one health problem increased the odds of poor health approximately 3-fold for men and 2-fold for women. In the case of two or more health problems, the odds rose approximately 8-fold and 10-fold, respectively (see Tables 5 and 6). In addition, there was evidence that obesity, according to the body mass index (BMI), was significantly associated with a lower probability of good health among women (OR = 2.1).
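The ORs reported in these models compare odds, not probabilities. A minimal sketch, assuming a proportional-odds specification (where each OR applies to the cumulative odds of being in a worse category) and using a hypothetical coefficient and standard error, of how a coefficient becomes an OR with a 95% Wald interval and p-value, and of why an OR of 3 is not a tripling of probability:

```python
from math import erfc, exp, log, sqrt

Z975 = 1.959963984540054   # standard normal 97.5% quantile


def or_summary(beta, se):
    """OR = exp(beta), 95% Wald CI = exp(beta -/+ 1.96*SE), two-sided Wald p-value."""
    z = beta / se
    return {
        "OR": exp(beta),
        "CI95": (exp(beta - Z975 * se), exp(beta + Z975 * se)),
        "p_value": erfc(abs(z) / sqrt(2)),
    }


def prob_after_or(p0, odds_ratio):
    """Probability implied by applying an odds ratio to a baseline probability p0."""
    odds = odds_ratio * p0 / (1.0 - p0)
    return odds / (1.0 + odds)


print(or_summary(log(3.0), 0.25))
# An OR of 3 does not triple the probability: from 10% it reaches 25%.
print(round(prob_after_or(0.10, 3.0), 2))   # 0.25
```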
Regarding the type of university degree held, nursing qualifications were significantly associated with poorer self-perceived health compared with a degree in medicine, regardless of sex (OR = 1.8); among women, the same was true of degrees in subjects other than medicine or nursing (OR = 2). However, no significant differences in OR were observed between those who worked in primary care and those who worked in other levels of healthcare.
In relation to lifestyle habits, smoking every day was associated with a greater likelihood of poorer self-perceived health in women; no physical activity or only occasional activity was also associated with poorer self-perception of health, especially in men, as was sleeping less than seven hours per night.

Figure 1. Confidence intervals at 95% for the odds ratio for each explanatory variable on self-perception of health, using logistic regression for the propensity score adjustment. Reference classes for categorical variables: no health problems, never smoked, ≥7 h of sleep, physical exercise several days a week, normal weight or underweight, other level of healthcare and degree in medicine. The x axis scale is logarithmic to facilitate interpretation of the data.

Figure 2. Confidence intervals at 95% for the odds ratio for each explanatory variable on self-perceived life satisfaction after applying logistic regression to the propensity score adjustment. The following reference classes are assumed for the qualitative variables: no health problems, never smoked, seven or more hours of sleep per night, physical exercise several days a week, normal weight or underweight, working in another level of healthcare and holding a degree in medicine. The x axis scale is logarithmic to facilitate interpretation of the data.

The results obtained from the analysis of self-perceived life satisfaction are detailed in Tables 7 and 8 and illustrated in Figure 2. As in the case of self-perceived health, the strongest negative association with life satisfaction was measured for prior health problems, and this association became significantly stronger for both male and female respondents as the number of pre-existing health problems increased. For men, furthermore, working in primary rather than other levels of healthcare was also associated with less life satisfaction. Another important factor was physical inactivity, which was also associated with lower levels of life satisfaction, especially among men, although the difference from women in this respect was not statistically significant. Thus, male and female HCPs who performed no physical activity at all had approximately 5 and 2.5 times, respectively, the odds of lower life satisfaction of their more physically active counterparts. With respect to tobacco consumption, women who smoked (whether every day or less frequently) were more likely to report lower levels of life satisfaction than those who had never smoked. Finally, HCPs who slept less than seven hours per night had around 1.5 and 1.8 times (for men and women, respectively) the odds of reporting low levels of life satisfaction of those who slept for longer, assuming all other variables remained constant.

Discussion
The stress of addressing the COVID-19 pandemic is having significant ill effects on HCPs' mental and physical health [44]. In consequence, the analysis of relevant data compiled before the present crisis is of crucial assistance to efforts to maintain and/or improve HCPs' well-being and to facilitate the application of more effective supportive interventions targeting policies, institutions and individuals [45]. In this regard, attention to personal welfare and service quality is of the utmost importance [46].
Regarding the methodological aspects of this study, in the analysis of nonprobability samples, any inference drawn must take into account the selection bias inherent in the sampling procedure, which in most internet surveys is equivalent to self-selection bias. Propensity score adjustment can be a useful means of overcoming the effects of this kind of bias, although additional calibration may be needed to remove the bias completely [47,48]. In our study, PSA alone produced no substantial changes in the estimates except for the effect of certain variables on the indicators of health and life satisfaction. From this, we conclude that either the original sample was sufficiently representative of the target population or the variables in question did not properly model the self-selection mechanism.
Among the algorithms used as alternatives to logistic regression for estimating propensities, decision trees and 5-unit neural networks produced weights equal to the design weights, and hence the same outcomes as the unadjusted analysis. In the first case, this was because the algorithm was unable to grow any branch of the tree, as it did not detect any variable enabling it to classify an individual as belonging to either the self-selected or the reference sample. In the second case, the feed-forward network converged in the first iteration and therefore produced no adjustment (see Appendix A for further information). Either or both of these cases might reflect a lack of predictive power in the covariates available for both samples. On the other hand, the Horvitz-Thompson weights, which were also obtained for each PSA performed, had to be discarded, as they resulted in a higher variance of the estimators and produced unstable and misleading point estimates.
The study has several limitations that have to be pointed out. First of all, there were no available measures to assess whether the bias removal had been successful or not. It is reasonable to assume that adjustments to mitigate selection bias may have a significant effect; however, model misspecification in PSA can increase the bias of the estimates, although the logistic regression model that was used as the reference result showed a relative robustness to changes in the covariates or sample size [23]. Further studies could consider the use of estimators that ensure robustness against model misspecifications, such as the doubly robust estimator proposed in [49].
Moreover, the available covariates did not show a very different behaviour in the online sample in comparison with the full population. This can indicate that the online sample was fairly representative of the population but can also indicate that the available covariates failed to capture the differences between the sampled and the non-sampled population, which could reduce the potential of PSA to mitigate the selection bias.
It was also observed that PSA increased the variance of the estimators in comparison with the unadjusted case. As stated in the Introduction, PSA is known to reduce selection bias at the cost of increasing the variance, because of the complexity added by the predictive models. However, the bias-variance trade-off is often favourable, as the mean square error is reduced after the application of PSA in certain situations, according to the literature [11,14,15,23,24].
Our analysis shows that, although there were no significant differences between male and female HCPs regarding self-rated health and dissatisfaction with life, male personnel had significantly poorer lifestyle habits than their female counterparts, together with a higher prevalence of chronic disease, of disability and of health problems. A different tendency was observed in sleep, chronic disease and health problems when compared with the general population. Further research is needed in this area in order to justify interventions that encourage male HCPs to modify their lifestyle habits, so as to prevent problems from spiralling through the burnout cascade stages of reduced activity, distress and despair [50].
In our study, members of the general population reported significantly poorer health and less satisfaction with life than the HCPs consulted. Although female HCPs consumed alcohol at least once a month significantly more often than women in the general population, they were only half as likely to suffer chronic disease. A limitation of this result is that the quantity of alcohol consumed was not recorded in the survey. Other studies have also found a lower prevalence of chronic diseases among physicians than in the general population, with percentages similar to ours, ranging from 13% to 44% [51,52]. Nevertheless, further detailed, up-to-date research is needed in this area.
Among HCPs, the prior existence of health problems was the factor most strongly associated with worsening self-perceived health and decreased life satisfaction, while obesity had an important negative impact on female practitioners' self-perceived health. Our study did not include work environment, workplace characteristics or other factors such as quality of management, professional development and colleague support/team spirit, all of which have a stronger positive association with HCPs' satisfaction than personal and intrinsic factors [53].

Conclusions
For almost all of the explanatory variables, any misestimations caused by the nonprobabilistic nature of the sampling process for the online survey were corrected by reweighting. There were some differences across the estimates provided by the different adjustments and estimators, although several groups of PSA algorithms with similar behaviour could be identified according to the weights they provided. Horvitz-Thompson estimates had larger estimated variances, and tree-based bagging algorithms provided more skewed weights, which contributed to an increase in the variance of the estimates. The point estimates finally considered were similar, meaning that they probably removed bias to the same extent, but some adjustments presented lower variances, which made them more desirable in terms of reducing estimation error.

According to our analysis, male HCPs reported poorer lifestyle habits and health conditions than their female counterparts, although men and women had similar perceptions of health and life satisfaction. All HCPs self-reported much better health conditions and life satisfaction than the general population. The prevalence of chronic disease among female HCPs was half that measured in the general population, but that of disability among all HCPs was almost twice that of the general population. Prior health problems, sleeping for less than seven hours per night, physical inactivity and smoking (by women) were all associated with the perception of poorer health, as were obesity (among women) and working as a nurse, while working in primary healthcare was associated with less satisfaction with life among male HCPs. Accurate knowledge of HCPs' self-perceived health, life satisfaction and associated factors is essential to enabling policy makers and healthcare managers to design and implement effective programmes to improve the attention paid to human resources.
The study results we report can be used as a baseline for monitoring the health effects produced in HCPs by the COVID-19 pandemic and for assessing interventions to benefit the welfare of these professionals, whose current role makes them priority beneficiaries of such attention.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Descriptive statistics of the weights obtained through PSA with Horvitz-Thompson weighting for each predictive algorithm are shown in Table A1. The weights obtained using C5.0 and neural networks with 5 units in the hidden layer for propensity estimation are constant, which is equivalent to making no adjustment at all and using the design weights. The remaining weights take similar values, given the similarity of their means (except for the weights using random forest in PSA), but their variability differs. More precisely, the variability of the weights is relatively low after using logistic regression, naïve Bayes, neural networks with 1 unit in the hidden layer or gradient boosting machines. Variability becomes relatively high when 3 units are placed in the hidden layer of the neural network, and very high when using random forest and 5-NN; in these last two cases, very large outliers are present. All of the weightings present high skewness, along with high kurtosis in most cases.
Figures A1 and A2 show histograms and boxplots for each weighting, in which some of the patterns detected in the descriptive statistics are evident. All weightings are positively skewed, but whereas some are relatively uniform (such as the weights obtained with logistic regression in PSA), in others the positive skew is more pronounced or even attributable exclusively to outliers. For example, when GBM is used in PSA, most of the weights are below 80, except for only 65 of them (3.6% of the individuals), which take values over 220. However, the most striking cases are those of random forest and 5-NN. With random forest, all of the individuals have a weight of 7.96 except for 126 individuals (around 7% of the sample), whose weight of 117.85 is much higher than the rest, increasing both skewness and variability. Weighting with 5-NN in PSA provides weights under 200 for almost all individuals (most of them under 36.8, as described in Table A1), while a small subset of 11 individuals (0.6% of the sample) has a weight of almost 1400. This pattern greatly increases both variability and skewness.
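Why a small group of large weights produces such strong skewness can be checked with a short calculation. As an idealisation of the random forest case, suppose that a proportion p of the individuals receive weight b and the rest receive weight a < b. The weight can then be written as a + (b - a)X, with X a Bernoulli(p) variable, so its skewness equals that of X and does not depend on a or b:

$$
\gamma_1 = \frac{1-2p}{\sqrt{p(1-p)}}, \qquad p \approx 0.07 \;\Rightarrow\; \gamma_1 \approx \frac{0.86}{\sqrt{0.0651}} \approx 3.37 .
$$

Taking p as the approximate 7% share quoted above (the exact sample size is not restated in this appendix), the skewness is about 3.4 even though only two distinct weight values occur. This is the population moment value; software applying a small-sample correction would report a slightly different figure.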
Mathematics 2021, 9, x FOR PEER REVIEW 17 of 28

Figure A1. Histograms of Horvitz-Thompson weights.
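Because the weights in this appendix are built for two different estimators, it may help to recall how the estimators themselves differ. The sketch below shows the two standard weighted estimators of a prevalence, assuming a 0/1 indicator `y` for each respondent, PSA weights `w` and a known population size `N`. The function names and toy data are hypothetical, and the variance estimation used in the paper is not reproduced. The Horvitz-Thompson form divides the weighted total by the fixed `N`, so any mismatch between the weight total and `N` passes directly into the estimate, whereas the Hajek form divides by the realised weight total, which typically makes it more stable when the weights are highly variable.

```python
import numpy as np

def horvitz_thompson_mean(y, w, N):
    """Weighted total divided by the known population size N."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y) / N)

def hajek_mean(y, w):
    """Weighted total divided by the sum of the weights (a ratio estimator)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * y) / np.sum(w))

# Hypothetical toy data: 6 respondents, 1 = reports poor self-perceived health.
y = np.array([1, 0, 0, 1, 0, 0])
w = np.array([10.0, 20.0, 20.0, 30.0, 10.0, 10.0])  # weights sum to 100

print(horvitz_thompson_mean(y, w, N=100))  # weights sum to N: same as Hajek
print(hajek_mean(y, w))
print(horvitz_thompson_mean(y, w, N=120))  # weights no longer sum to N
```

When the weights sum exactly to `N` the two estimates coincide; when they do not, only the Hajek estimate is guaranteed to remain a proper proportion between 0 and 1.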
Table A2 shows descriptive statistics of the weights obtained through PSA with Hajek weighting for each predictive algorithm. Weights obtained for Hajek estimators are more stable than those obtained for Horvitz-Thompson estimators. Within each weighting, values lie around the same numbers (the mean is identical in all cases), and the coefficient of variation is relatively low in all cases and below its counterpart for the Horvitz-Thompson weights. Skewness coefficients again show that the weights tend to be right-skewed, except for PSA with random forest, which provides strongly left-skewed values. Kurtosis coefficients are also high, indicating leptokurtic distributions.

Figures A3 and A4 show histograms and boxplots of the Hajek weights obtained with each algorithm in PSA. In this case, skewness appears more smoothly because the propensities were not grouped into strata, as was done for the Horvitz-Thompson weights. This keeps the weights closer to the arithmetic mean, which explains the lower variability noted above. The use of 5-NN or random forest produces the most unstable weightings because of the presence of outliers.

Following this one-dimensional analysis, Pearson bivariate correlations between the weightings were analysed; the results are shown in Figures A5 and A6. Correlations are generally positive and relatively high, with two exceptions: Horvitz-Thompson weighting using 5-NN in PSA and using random forest. In the former case, correlations with the other weightings are positive but weaker than in the remaining cases (the relationship is only slightly stronger with the Hajek weights obtained with the same algorithm).
The random forest case is more remarkable: correlations with every other set of weights are very low, except for the Hajek weights obtained with the same algorithm, with which the correlation is strongly negative. This lack of correspondence is likely caused by the propensities estimated by the random forest algorithm, which assigns probabilities very close to the limits 0 and 1; the correlation therefore depends almost exclusively on the few individuals assigned probabilities far from those limits.
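The difference between stratified and unstratified weights, and the Pearson correlations used to compare weightings, can be sketched as follows. This is a hypothetical illustration, not the paper's exact procedure: it assumes weights proportional to the inverse of the estimated propensity of belonging to the volunteer sample, five propensity-quantile strata for the Horvitz-Thompson-type weights, and rescaling to an assumed population size `N` for the Hajek-type weights. The propensities are simulated, and `N`, the number of strata and the function names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2021)

def stratified_weights(p, n_strata=5):
    """Assumed HT-type weights: inverse of the mean propensity within
    propensity-quantile strata, so every unit in a stratum shares one weight."""
    edges = np.quantile(p, np.linspace(0, 1, n_strata + 1))
    # Stratum index 0..n_strata-1, using only the inner quantile edges.
    strata = np.searchsorted(edges[1:-1], p, side="right")
    w = np.empty_like(p)
    for s in np.unique(strata):
        in_s = strata == s
        w[in_s] = 1.0 / p[in_s].mean()
    return w

def normalised_inverse_weights(p, N):
    """Assumed Hajek-type weights: inverse propensities rescaled to sum to N,
    so the mean weight is N / n whatever algorithm produced p."""
    raw = 1.0 / p
    return N * raw / raw.sum()

n, N = 500, 20_000  # hypothetical sample and population sizes
p_smooth = rng.uniform(0.02, 0.10, size=n)
# Crude stand-in for a classifier pushing many propensities to a limit.
p_extreme = np.clip(p_smooth + rng.normal(0, 0.2, size=n), 0.001, 0.999)

weights = {
    "ht_smooth": stratified_weights(p_smooth),
    "hajek_smooth": normalised_inverse_weights(p_smooth, N),
    "hajek_extreme": normalised_inverse_weights(p_extreme, N),
}
names = list(weights)
# Pearson correlation matrix: one row per weighting.
corr = np.corrcoef(np.vstack([weights[k] for k in names]))
for name, row in zip(names, corr):
    print(name, np.round(row, 2))
```

Because the stratified weights take only as many distinct values as there are strata, their correlation with the unstratified weights built from the same propensities is typically high but below 1.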
To better visualise the relationships between the weightings, the correlation matrix was used as input for a two-dimensional multidimensional scaling (MDS), which explains 89.65% of the total variance. The results are shown in Figure A7.
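Classical (Torgerson) MDS on a correlation matrix can be sketched as below. The conversion of correlations to squared distances, d_ij^2 = 2(1 - r_ij), is an assumption, since the text states only that the correlation matrix was the input; the share of variance is computed as the sum of the two largest eigenvalues over the sum of the positive eigenvalues, which is one common convention. The function name and the correlation matrix are hypothetical.

```python
import numpy as np

def classical_mds(corr, k=2):
    """Torgerson MDS of a correlation matrix; returns k-dimensional
    coordinates and the share of variance captured by those k dimensions."""
    corr = np.asarray(corr, dtype=float)
    d2 = 2.0 * (1.0 - corr)                 # squared distances (assumed transform)
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centring matrix
    B = -0.5 * J @ d2 @ J                   # double-centred Gram matrix
    eigval, eigvec = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]        # sort descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    positive = eigval[eigval > 1e-12]
    share = eigval[:k].sum() / positive.sum()
    coords = eigvec[:, :k] * np.sqrt(np.maximum(eigval[:k], 0.0))
    return coords, share

# Hypothetical correlations between four weightings forming two tight pairs.
R = np.array([
    [1.00, 0.95, 0.40, 0.35],
    [0.95, 1.00, 0.45, 0.40],
    [0.40, 0.45, 1.00, 0.90],
    [0.35, 0.40, 0.90, 1.00],
])
coords, share = classical_mds(R)
print(np.round(coords, 3))
print(f"variance explained by 2 dimensions: {share:.1%}")
```

In such a plot, weightings that are highly correlated land close together, which is how groups of algorithms with similar behaviour become visible.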
The scaling reveals two distinct groups: one composed of the weights obtained using PSA with logistic regression, GBM and naïve Bayes, and another composed of those obtained with neural networks and 5-NN (for Hajek estimators). When Horvitz-Thompson weighting is used with 5-NN, the weights separate from both groups but remain closer to the second group than to the first. Weights obtained with PSA using random forest lie far from all the other weightings, regardless of the estimator for which they were developed.

Figure A8. The 95% confidence intervals for the prevalence of variables related to self-perceived health and lifestyle satisfaction among male HCPs, according to the algorithms used in the propensity score adjustment (facets are sorted by confidence interval values in order to obtain common y axis limits in each row).

Appendix B
Figure A9. The 95% confidence intervals for the prevalence of variables related to self-perceived health and lifestyle satisfaction among female HCPs, according to the algorithms used in the propensity score adjustment (facets are sorted by confidence interval values in order to obtain common y axis limits in each row).