Socioeconomic Status and Morbidity Rate Inequality in China: Based on NHSS and CHARLS Data

Previous studies have found no consistent and robust association between socioeconomic status and morbidity rates. This study examines the relationship between socioeconomic status and morbidity rates in China, adding new evidence on this fragmentary relationship. Data from the National Health Services Survey (NHSS) and the China Health and Retirement Longitudinal Study (CHARLS) are used to examine whether the association holds in both all-age cohorts and older-only cohorts. Three morbidity outcomes (two-week incidence rate, prevalence of chronic diseases, and number of sick days per thousand people) and two socioeconomic status indicators (income and education) are examined. The results indicate quadratic relationships between income per capita and morbidity. This non-linear correlation is similar to the patterns found in European countries. Meanwhile, there is no association between education years and morbidity in China: neither the two-week incidence rate nor the prevalence of chronic diseases has a statistically significant relationship with education level.


Introduction
An abundance of research has been undertaken on the relationship between socioeconomic status and mortality rates. Studies carried out in the United States [1,2], Canada [3,4], Europe [5,6], and China [7,8] found that socioeconomically disadvantaged people usually have higher mortality rates than those with higher education or income levels. However, mortality is not the only outcome that matters. Morbidity, another basic element of health, is at least as important as mortality [9]. Morbidity has a vital impact on life expectancy and the length of dependent life [10] when its geographic distribution differs from that of mortality. In China specifically, morbidity varies widely across regions: 2008 statistics show that the Chengguan District in Tibet had the lowest two-week incidence rate, 5.2%, while the Dongcheng District in Beijing had the highest, 53.2%. As for the prevalence of chronic diseases, the Chengguan District in Tibet had the lowest rate, 5.4%, while the Luwan District in Shanghai had the highest, 33.6%.
Studies have tried to explain how income and education level relate to mortality rates. Those who are wealthy and highly educated are less likely to die young, as they tend to enjoy better access to health-enhancing resources [11][12][13][14][15][16][17][18] and are more likely to live in well-built houses situated in safe neighborhoods in a non-toxic environment [19][20][21][22]. In addition to affording better-quality care [23][24][25], people with higher socioeconomic status tend to better understand and follow the instructions given by their health care providers [21,26]. Similarly, people with higher socioeconomic status may more easily adopt healthy lifestyles [25,27,28], which decreases their exposure to material deprivation and stressful psychosocial environments [25,29,30,31].
However, while the inverse effects of socioeconomic factors on mortality have been reported by a number of studies, no consistent and robust associations have been found between socioeconomic status and morbidity rates [4,9]. Higher income levels were associated with lower levels of morbidity in England [32,33], European countries [9], and Nordic countries [34], while no association between income and morbidity was observed in Canada [4]. Meanwhile, higher education levels were associated with lower levels of morbidity in the United States [33] and European countries [9,35], while no association between education and morbidity was observed in Canada [4], England [32], and Nordic countries [34]. In China, the relationship between socioeconomic status and overall morbidity rates has received less attention, although one study found no wealth or education gradients in the prevalence of hypertension [36], and another found a lack of socioeconomic gradients in the overall incidence of non-hospitalized injuries among children [37]. It is therefore important to study the morbidity rate itself, not only because of the lack of literature but also because simple associations between mortality and morbidity cannot be made, nor can the trend of morbidity be inferred from the trend of mortality, given the development of vaccinations and medical technology [38]. We cannot directly apply conclusions about mortality in China to morbidity in China, where morbidity also has an important impact on life expectancy.
This study focuses on the relationship between socioeconomic status and morbidity rates in China, adding new evidence on the fragmentary relationship between socioeconomic status and morbidity [9]. The paper makes three main contributions. First, we examine the relationship between socioeconomic status and morbidity rates in China, whereas previous research mainly focused on other countries. Second, we examine socioeconomic status and morbidity rates in both all-age cohorts and older-only cohorts. Previous work has addressed the relationship between socioeconomic status and health at all ages [1,2,4,33,39] and at older ages only [3,7,8,9,25]. For China, it is meaningful to examine whether the relationship between socioeconomic status and morbidity found in all-age cohorts still holds in older cohorts. We use two data sources: the NHSS (National Health Services Survey) questionnaire collects data from Chinese residents of all ages, while the CHARLS (China Health and Retirement Longitudinal Study) questionnaire collects data from Chinese residents aged 45 and older. Third, we identify non-linearity in the association between income and the morbidity rate in China by incorporating the quadratic term of income into the regression model. Previous studies [9,32] have found a non-linear relationship between income and morbidity; we further examine whether this relationship is quadratic.
In summary, this study provides a detailed analysis of the relationship between socioeconomic status and morbidity in China at all ages and at older ages only, using three morbidity indicators and two socioeconomic status indicators. The structure of the paper is as follows: in Section 2, we describe the data used in our empirical analysis and outline the model; we present empirical results in Section 3; finally, Section 4 contains our conclusion.

Data Sources
Our data come from the National Health Services Survey (NHSS) in China and the China Health and Retirement Longitudinal Study (CHARLS) [40]. The NHSS began in 1993 and is conducted every five years. In this paper, we use the data from 1998, 2003, and 2008. The NHSS data from 1993 and from 2013 onward are not used because some important socioeconomic variables were not collected in 1993, and the detailed data for 2013 and later had not been published when we conducted this research. As for the CHARLS survey, we use the Harmonized CHARLS data (Version C) published in April 2018. Table 1 displays the definitions of the original variables selected from the NHSS data. In the NHSS data, health indicators of morbidity rates include: cutting down on daily activities due to a physical or mental problem [9,41], long-term disability [9], incidence of severe illness [42], number of bedridden days [42], multimorbidity [39,43], chronic medical morbidity [44], etc. Considering the availability of data, three health outcomes are taken as our dependent variables: two-week incidence rate (illnessratio), number of sick days per thousand people (illnessday), and prevalence of chronic diseases (chronicratio). The two-week incidence rate measures the respondents' feelings regarding disease, mainly from the perspective of health services. It covers three kinds of reaction to sickness within two weeks: receiving medical treatment in a health institution, self-administering medicine or some other adjuvant therapy, and resting for at least one day without receiving medical treatment or taking medicine. The number of sick days is defined as the average number of sick days in two weeks per 1000 surveyed people, which measures the severity of illness. It is highly correlated with the two-week incidence rate, with a correlation coefficient of 0.946.
The prevalence of chronic diseases (chronicratio) refers to the prevalence of chronic diseases among the surveyed population. The variable is positively related to the other two, with both correlation coefficients above 0.7. Frequently used socioeconomic status variables include income [3,41,45], education [3,41], occupational prestige [3], and housing tenure [46]. We focus on income and education in this paper. Real income per capita (income) is defined as the average annual income per capita, deflated by each city's GDP deflator to its 1998 purchasing value. The deflator data are from the China Statistical Yearbook for Regional Economy (1998-2008). The weighted education years (edu) is calculated from the composition of completed years of education. In addition, several kinds of control variables are usually considered in the literature: demographic factors [41], consumption and health expense levels [42,47], accessibility and affordability of health services [48,49], and environmental factors [19]. The demographic factors in this paper are measured by four variables: the average age across all age groups weighted by group size (average), the population over 65 years old (age65), the male population proportion (male), and a dummy variable for urbanization (urban). Consumption and health service characteristics are measured by the following indicators: total annual consumption per capita (expend), proportion of family health expenditure in total living expenses (mediratio), average annual medical treatment cost (permedicost), and average expense per hospitalization (perhospitalcost). The accessibility of health services has two aspects: geographical accessibility and economic accessibility. Geographical accessibility considers the distance and travel time to medical institutions, which measure the physical convenience of accessing health services.
Economic accessibility measures people's capacity to afford medical bills, i.e., a patient's income level and whether he/she has medical insurance [48]. The accessibility and affordability of health services are measured by the following indicators: proportion of the population within 1 km of the nearest medical and health unit (distance), proportion of the population whose travel time to the nearest hospital is less than 10 min (time10), and proportion of the population with medical insurance (insurance). The environmental factor is measured by the proportion of hygienic toilets (washroom).
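The two constructed socioeconomic variables described above, real income per capita and weighted education years, can be illustrated with a minimal sketch. The function names, the deflator index values, the education shares, and the years assigned to each education level are all hypothetical choices for illustration; the paper does not report the exact mapping used for the NHSS data.

```python
def real_income(nominal_income, deflator, base_deflator):
    """Express nominal income in base-year (here 1998) purchasing value."""
    return nominal_income * base_deflator / deflator


def weighted_education_years(shares, years):
    """Average years of schooling, weighting each level by its population share."""
    total = sum(shares.values())
    return sum(shares[level] * years[level] for level in shares) / total


# Hypothetical county: GDP deflator index of 100 in 1998 and 125 in 2008.
income_2008 = real_income(5000.0, deflator=125.0, base_deflator=100.0)

# Hypothetical education composition and an assumed years-per-level mapping.
shares = {"none": 0.1, "primary": 0.3, "junior": 0.4, "senior": 0.2}
years = {"none": 0, "primary": 6, "junior": 9, "senior": 12}
edu = weighted_education_years(shares, years)
```

Dividing by the sum of the shares keeps the result well defined even when the reported composition does not add up to exactly one.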
In this paper, the variables from the Harmonized CHARLS data are selected and then aggregated to correspond with the variables selected from the NHSS data. Table A1 in Appendix A displays the definitions of the aggregated variables. Corresponding to the dependent variables in the NHSS dataset, we use the prevalence of chronic diseases (CHRONIC_RATIO) as our dependent variable in the analysis of the CHARLS dataset. Meanwhile, the earned income per capita (AVGINDIINCOME_EARN) and average education years (AVGEDU) are the two main socioeconomic variables that we study in the CHARLS dataset. All variables in the CHARLS dataset can be roughly classified into eight sub-categories: morbidity, income, education, demographic backgrounds, health expenditure, health insurance, job status, and family relationships. The first six categories correspond to the variables selected from the NHSS data. However, because the respondents of CHARLS are older people who may have retired or may be receiving extra financial support from other family members, we add the variables of the last two categories to control for the effects of non-earned income, such as transfer payments from children.

Descriptive Statistics
In this paper, we use two panel datasets to perform econometric analysis: a 3-year (1998, 2003, 2008)  In addition to simply aggregating variables, we also adjusted the income variables: the real income per capita (income) in the NHSS dataset and the average earned income per capita (AVGINDIINCOME_EARN) in the CHARLS dataset. In the NHSS dataset, because of the collinearity (correlation coefficient of 0.639) between real income per capita and weighted education years, the original real income per capita was first regressed on the weighted education years by OLS (ordinary least squares) and then replaced by the regression residuals. However, given the lower correlation coefficient (0.532) between earned income per capita and education years in the CHARLS dataset, we did not adjust the earned income per capita. Then, to test whether there is a quadratic relationship between income per capita and health outcomes, we added the squared real income per capita (income2) to the NHSS dataset and the squared earned income per capita (AVGINDIINCOME_EARN2) to the CHARLS dataset. Table 2 displays the general descriptive statistics of the health outcome indicators, income indicators, and education indicators of the NHSS and CHARLS datasets. Because we used principal component analysis (PCA) to aggregate and rotate our candidate variables into orthogonal components, their descriptive statistics are not reported. In Section 2.4, we introduce how we performed the PCA. To examine the inequality of health outcomes, we compute the Gini coefficients of the health outcome indicators at the county/city level: the two-week incidence rate, the number of sick days per thousand people, and the prevalence of chronic diseases in the NHSS dataset, and the prevalence of chronic diseases in the CHARLS dataset. Table 3 reports the Gini coefficients for every year.
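The income adjustment described above, regressing income on education years and keeping the residuals, can be sketched as follows. The data values are invented for illustration and are not NHSS observations. Squaring the adjusted series to form the quadratic term is an assumption of this sketch, since the text does not state whether the square is taken before or after the adjustment.

```python
def ols_residuals(y, x):
    """Residuals from a simple OLS regression of y on x with an intercept."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical county-level values (not NHSS data).
edu = [5.0, 6.5, 7.0, 8.5, 10.0]
income = [2.1, 2.9, 3.0, 4.2, 5.5]

income_resid = ols_residuals(income, edu)       # income net of education
income_resid2 = [r ** 2 for r in income_resid]  # assumed form of the squared term
```

By construction the residuals are uncorrelated with education years, which removes the collinearity between the two main regressors that motivated the adjustment.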
The Gini coefficients of the three indicators in the NHSS dataset reveal a moderate inequality of health outcomes. However, the prevalence of chronic diseases in the CHARLS dataset shows a lack of variance. This is partly because the mean prevalence (75%) is much higher than its counterpart (16.35%) in the NHSS dataset: if most people have at least one chronic disease, differences in whether people have a chronic disease lead to a smaller variance in the numbers. In addition to Gini coefficients, we also compute other types of inequality indices as a reference. Table A2 displays the complete table, which shows a similar inequality of health outcomes.
Note: "prop" indicates proportions, a value in the range from 0 to 1.
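As an illustration of the inequality measure used here, the following is a minimal sketch of a Gini coefficient computed over county/city-level rates. The rank-weighted formula is one standard formulation and is not necessarily the exact estimator behind Table 3; the prevalence values are invented.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical prevalence rates with the same absolute spread but different means.
high_mean = [0.70, 0.75, 0.80]
low_mean = [0.10, 0.15, 0.20]
gini_high = gini(high_mean)
gini_low = gini(low_mean)
```

In this toy comparison the series with the higher mean yields the smaller coefficient, which is consistent with the explanation given for the low inequality of chronic disease prevalence in the CHARLS dataset.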
The inequality of health outcomes is related to the differences among counties/cities. For example, Figure 1 displays this relationship using the prevalence of chronic diseases. In Figure 1, every circle marks a county (NHSS) or city (CHARLS) whose X-Y coordinates are defined by real longitudes and latitudes. The circles' color indicates the relative level of a county's or city's prevalence of chronic diseases compared with other counties or cities: the red becomes deeper as chronic disease prevalence increases. To make the two sub-figures comparable, we colored the circles by quantile rather than by the absolute value of the prevalence of chronic diseases, i.e., a circle's color indicates the county's or city's relative level of chronic disease prevalence in the whole sample. Both sub-figures show a consistent geographic inequality of chronic disease prevalence among counties/cities, e.g., coastal metropolises such as Shanghai (N31.23°, E121.47°) and Guangzhou (N23.13°, E113.27°) have a relatively higher prevalence of chronic diseases, and even counties/cities in the same province may have different levels of chronic disease prevalence. However, this kind of descriptive inequality among counties/cities cannot conclusively answer whether the inequality of the prevalence of chronic diseases is significantly related to the counties/cities themselves.
In addition to the counties/cities themselves, the inequality of chronic disease prevalence is also descriptively related to income per capita in counties/cities. In Figure 1, circle size denotes the level of income per capita: a larger circle means the county/city has a higher average income level. Combining the circles' color and size, Figure 1 displays an intuitive pattern in which counties/cities with a higher level of income per capita tend to have a relatively higher prevalence of chronic diseases. Cities with higher income levels, such as Shanghai, Beijing (N39.90°, E116.40°), and Chengdu (N30.67°, E104.07°), usually have higher chronic disease prevalence. However, this pattern is not universal, e.g., Luoyang (N36.03°, E103.73°) and Lanzhou (N36.07°, E103.82°) in the CHARLS sub-figure have higher incomes per capita but a relatively lower prevalence of chronic diseases. Meanwhile, similar patterns can be observed in Figure 2, where the circles' color and size denote chronic disease prevalence and average education years, respectively. In the first sub-figure of Figure 2, counties with higher average education years tend to have a higher prevalence of chronic diseases, while this pattern is less evident in the second sub-figure. Nevertheless, all these possible correlations are descriptive and lack statistical proof: individual differences are not considered, other socioeconomic factors are not controlled, etc. Therefore, further econometric analysis is required to determine whether there are significant correlations between health outcomes and income per capita or education years.

Model Specification
In this paper, we use panel data models with individual effects as our benchmark models, because individual differences need to be controlled to evaluate the real correlations between health outcomes and the main socioeconomic status indicators, such as income per capita and education years. On the one hand, we use panel datasets of both NHSS and CHARLS data. On the other hand, the descriptive statistics, e.g., Figure 1, display possible individual differences at the county/city level. Nevertheless, there are 95 counties from 31 provinces in the NHSS dataset and 126 cities from 28 provinces in the CHARLS dataset, while only three years of data are available. Given the incidental parameter problem [50], we use province-level individual fixed effects and county/city-level individual random effects to obtain consistent estimates. In this context, Hausman tests are no longer required if the results of different model specifications are consistent and robust.
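To make the role of province-level fixed effects concrete, the sketch below applies the within (demeaning) transformation by province and then fits OLS on income, squared income, and education years. It is a deliberately simplified illustration that assumes NumPy is available: it omits the county/city random effects and the FGLS estimation used in the paper, and the data are synthetic with made-up coefficients.

```python
import numpy as np


def within_ols(y, X, groups):
    """OLS after demeaning y and X within each group (fixed-effects estimator)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    y_dm = y.copy()
    X_dm = X.copy()
    for g in np.unique(groups):
        mask = groups == g
        # Subtracting group means removes any constant province effect.
        y_dm[mask] -= y[mask].mean()
        X_dm[mask] -= X[mask].mean(axis=0)
    coef, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return coef


# Synthetic data: 3 hypothetical provinces, outcome quadratic in income.
rng = np.random.default_rng(0)
province = np.repeat([0, 1, 2], 20)
income = rng.normal(size=60)
edu = rng.normal(size=60)
province_effect = np.array([1.0, -2.0, 0.5])[province]
y = province_effect - 0.3 * income + 0.2 * income**2 + 0.0 * edu

X = np.column_stack([income, income**2, edu])
coef = within_ols(y, X, province)  # order: income, income squared, education
```

Because the synthetic outcome contains no noise, the demeaned regression recovers the slope coefficients regardless of the province-specific intercepts, which is the property the fixed effects are meant to provide.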
Through principal component analysis (PCA), the socioeconomic indicator variables other than income per capita and education years are converted into orthogonal components, which are used as control variables in our regression analysis. There are two reasons not to use the original socioeconomic indicator variables. One is that many similar variables can together describe a specific aspect of socioeconomic status, e.g., consumption can be described by both the average amount of annual food consumption and Engel's coefficient. The other is that similar or related socioeconomic indicator variables usually have severe collinearity. Selecting among the original socioeconomic indicator variables is arbitrary and may result in the failure of estimation. Meanwhile, trading off similar but different variables may also result in omitted variable bias. Therefore, we use dimensionality reduction techniques to solve these problems.
PCA, one of the most popular dimensionality reduction techniques, has been widely used to construct socioeconomic status indices [51][52][53][54], because surveys collect many similar variables, and similar information may be covered by different variables. PCA can eliminate variable duplication, distinguish dimensions of information, and preserve as much variance as possible while reducing dimensionality [55]. Thus, considering that we have many candidate socioeconomic indicator variables with similar but different economic meanings, we use PCA to summarize these variables into several interpretable components. These components, rather than the original socioeconomic indicator variables, are used as control variables in our final model specifications. Specifically, we perform the regression-based PCA with maximum variance on the centralized and scaled socioeconomic indicator variables. Components are selected according to their eigenvalues (greater than or equal to 1). Finally, these components are named according to the loading matrices. In Appendix B, Figure A1 reports the scree plots, when
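The component selection rule described in this section, PCA on centered and scaled variables keeping components with eigenvalues of at least 1, can be sketched as follows. This is an illustrative sketch assuming NumPy is available; it uses synthetic indicators and omits the rotation step mentioned in the text, so it should not be read as the exact procedure behind the reported components.

```python
import numpy as np


def pca_kaiser(X, threshold=1.0):
    """PCA on standardized columns; keep components with eigenvalue >= threshold."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= threshold
    scores = Z @ eigvecs[:, keep]      # orthogonal component scores
    return eigvals, eigvecs[:, keep], scores


# Synthetic indicators: two latent factors, each measured by two noisy variables.
rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 200))
X = np.column_stack([
    f1 + 0.1 * rng.normal(size=200),
    f1 + 0.1 * rng.normal(size=200),
    f2 + 0.1 * rng.normal(size=200),
    f2 + 0.1 * rng.normal(size=200),
])
eigvals, loadings, scores = pca_kaiser(X)
```

With four indicators driven by two underlying factors, the rule retains two components whose scores are mutually uncorrelated, which is what allows them to serve as control variables without the collinearity of the original indicators.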

Results
In the regression analysis on both the NHSS and CHARLS datasets, we use feasible generalized least squares (FGLS) estimators to avoid possible heteroscedasticity problems. In addition to the individual fixed effect model and the individual random effect model, three kinds of regressions are reported as robustness checks: two-way fixed effect models are used to exclude the effect of time in short panel datasets; pooling models with the FGLS estimator are used to see whether individual effects significantly affect the estimation results; and pooling models with the OLS estimator are reported as the most conservative estimates. Table 4 displays the main results of the regression analysis of the county-level NHSS dataset, and Table 5 reports the main results of the analysis of the city-level CHARLS dataset. In Appendix C, we provide complete regression results without effect terms.
Note: Models marked with ⱡ use province-level individual fixed effects. Models marked with ⱴ use city-level individual random effects. *** p < 0.01, ** p < 0.05.
Table 4 presents the estimates of the effects of income per capita, squared income per capita, and education years on the three kinds of morbidity. Every column summarizes the result of a specific model, where the standard errors of coefficients are reported in parentheses. In regressions on the three morbidities, the squared income per capita has significant positive coefficients whose estimates are robust among different models. Education years, however, do not display a statistically significant impact on any of the three dependent variables. Meanwhile, the income per capita shows a significant negative effect on the prevalence of chronic diseases, while it does not show such an impact on the two-week incidence rate or the number of sick days per thousand people.
The coefficients of squared income per capita indicate non-linear correlations between income per capita and our three kinds of morbidity: the two-week incidence rate, the number of sick days, and chronic disease prevalence. Table C1 in Appendix C displays the variance inflation factors (VIF), whose values are less than 5. The VIF excludes the possibility that the significance of the coefficient of squared income per capita is a spurious one raised by collinearity with the linear term (income). When correlations can be profiled with quadratic curves, there are turning-point levels of income per capita: when people's income is lower than these levels, the three health outcome indicators decline with the growth of income level; however, when people's income is higher than the turning-point levels, incidence and prevalence are positively correlated with income per capita. In this context, the linear term of income per capita (income) does not solely reflect income's correlation with morbidity, but its coefficient decides the turning-point level of income per capita together with the coefficient of squared income per capita. However, in our analysis of the two-week incidence rate and the number of sick days per thousand people, the specific turning-point level of income per capita cannot be determined, because the coefficients of the linear term of income per capita are neither statistically significant nor robust among different models.
When age and other socioeconomic indicators are controlled, the analysis on the CHARLS dataset presents similar estimation results: the coefficient of squared earned income per capita is significantly positive among different models; the average education years has no robust significant impact on the
Education years, however, do not display a statistically significant impact on all the three dependent variables. Meanwhile, the income per capita shows a significant negative effect on the prevalence of chronic diseases, while it does not show such an impact on two-week incidence rate and the number of sick days per thousand people.
The coefficients of squared income per capita indicate non-linear correlations between income per capita and our three kinds of morbidities: two-week prevalence, the number of sick days and chronic disease prevalence. Table C1 in Appendix C displays the variance inflation factor (VIF) whose values are less than 5. The VIF excludes the possibility that the significance of the coefficient of squared income per capita is a fake one raised by the collinearity with the linear term (income). When correlations can be profiled with quadratic curves, it means that there are turning-point levels of income per capita: when people's income is lower than these levels, the three health outcome indicators decline with the growth of income level; however, when people's income is higher than the turning-point levels, the increase of incidence and prevalence is positively correlated with income per capita. In this context, the linear term of income per capita (income) does not solely reflect income's correlation with morbidity, but its coefficient decides the turning-point level of income per capita together with the coefficient of squared income per capita. However, in our analysis on two-week incidence rate and the number of sick days per thousand people, the specific turning-point level of income per capita cannot be determined, because the coefficients of the linear term of income per capita are not statistically significant but also not robust among different models. When age and other socioeconomic indicators controlled, the analysis on the CHARLS dataset presents similar estimation results: the coefficient of squared earned income per capita is significantly positive among different models; the average education years has no robust significant impact on the use province-level individual fixed effects. Models marked with ducation years on three kinds of morbidities. 
Every column summarizes the result of a specific odel, where the standard errors of coefficients are reported in parentheses. In regressions on the hree morbidities, the squared income per capita has significant positive coefficients whose estimates re robust among different models. Education years, however, do not display a statistically ignificant impact on all the three dependent variables. Meanwhile, the income per capita shows a ignificant negative effect on the prevalence of chronic diseases, while it does not show such an mpact on two-week incidence rate and the number of sick days per thousand people.
Table 4 presents the estimates of the effects of income per capita, squared income per capita, and education years on the three kinds of morbidity. Each column summarizes the result of a specific model, with the standard errors of the coefficients reported in parentheses. In the regressions on all three morbidities, squared income per capita has significantly positive coefficients whose estimates are robust across models. Education years, however, do not show a statistically significant effect on any of the three dependent variables. Meanwhile, the linear term of income per capita shows a significant negative effect on the prevalence of chronic diseases, but no such effect on the two-week incidence rate or the number of sick days per thousand people.
The coefficients of squared income per capita indicate non-linear correlations between income per capita and our three kinds of morbidity: the two-week incidence rate, the number of sick days, and the prevalence of chronic diseases. Table C1 in Appendix C displays the variance inflation factors (VIF), all of which are below 5. These values indicate that the significance of the coefficient of squared income per capita is not a spurious result of collinearity with the linear term (income). When the correlations can be described by quadratic curves, there are turning-point levels of income per capita: when people's income is below these levels, the three health outcome indicators decline as income grows; when people's income is above the turning-point levels, incidence and prevalence increase with income per capita. In this context, the linear term of income per capita (income) does not by itself reflect income's correlation with morbidity; rather, its coefficient determines the turning-point level of income per capita together with the coefficient of squared income per capita. However, in our analysis of the two-week incidence rate and the number of sick days per thousand people, the specific turning-point level of income per capita cannot be determined, because the coefficients of the linear term of income per capita are neither statistically significant nor robust across models.
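The collinearity check described above can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the column choices and the helper name `vif` are hypothetical, and centering income before squaring is an assumption of this sketch (the paper does not say how the squared term was constructed). The VIF of a regressor is 1 / (1 - R²), where R² comes from regressing that regressor on all the others.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # add an intercept
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        r_squared = 1.0 - resid.var() / target.var()
        factors.append(1.0 / (1.0 - r_squared))
    return np.array(factors)

# Synthetic regressors for illustration only (NOT the NHSS or CHARLS data).
rng = np.random.default_rng(0)
income = rng.uniform(1.0, 30.0, size=500)      # thousand yuan, hypothetical
education = rng.uniform(0.0, 12.0, size=500)   # years, hypothetical
centered = income - income.mean()              # assumption: center before squaring
X = np.column_stack([centered, centered**2, education])

factors = vif(X)
all_below_5 = bool((factors < 5).all())        # the rule of thumb used in the text
```

A VIF close to 1 means a regressor is nearly uncorrelated with the others, while the threshold of 5 used in the text is a common rule of thumb for problematic collinearity.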
When age and other socioeconomic indicators are controlled, the analysis of the CHARLS dataset presents similar estimation results: the coefficient of squared earned income per capita is significantly positive across models, and average education years have no robust significant impact on the prevalence of chronic diseases. Nevertheless, the coefficient of the linear term of earned income per capita becomes significantly negative in the analysis of the CHARLS dataset. Therefore, we estimate that the turning-point level of earned income per capita lies in the range from 11.1 thousand yuan to 12.1 thousand yuan.
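For a quadratic specification of the form morbidity = b0 + b1 * income + b2 * income² + controls, the turning point is the vertex of the parabola, income* = -b1 / (2 * b2), which is a minimum when b2 > 0. The following is a minimal sketch of how a range of turning points can be derived from several models. The function name `turning_point` and the coefficient pairs are hypothetical and are not the paper's estimates.

```python
def turning_point(b_linear, b_squared):
    """Vertex of y = b0 + b_linear*x + b_squared*x**2 (a minimum when b_squared > 0)."""
    if b_squared <= 0:
        raise ValueError("a U-shaped curve requires a positive squared-term coefficient")
    return -b_linear / (2.0 * b_squared)

# Hypothetical (linear, squared) coefficient pairs, one per model.
# These numbers are made up for illustration and are NOT taken from the paper.
models = {
    "model_a": (-0.6, 0.03),
    "model_b": (-0.8, 0.025),
}

points = {name: turning_point(b1, b2) for name, (b1, b2) in models.items()}
low, high = min(points.values()), max(points.values())
print(f"turning point between {low:.1f} and {high:.1f} (units of income)")
```

Because the vertex depends on the ratio of the two coefficients, an imprecisely estimated linear term makes the turning point unstable, which is why it can only be reported where the linear coefficient is significant.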
The regression analyses of the NHSS and CHARLS datasets do not fully support the conclusions of the descriptive statistics: a non-linear correlation between morbidity and income per capita was found, whereas education was found to have no significant impact on morbidity. The quadratic relationship between income per capita and morbidity offers a new answer to the question of whether there is a universal income-morbidity correlation in China. So far, different correlations have been reported in different countries: a negative correlation was found in the U.S. and Europe [9,33,46], while no specific correlation was found in Canada [4]. Meanwhile, a study in Europe [9] pointed out that the negative income-morbidity correlation in Europe is non-linear across income strata. What, then, is the case in China? In our analysis, both negative and positive income-morbidity correlations are found in China, and the relationship is also non-linear: morbidity decreases as per capita income grows; however, once income per capita exceeds a specific turning-point level, morbidity begins to increase with further income growth. This non-linear relationship can be described by a quadratic curve. Therefore, this paper suggests distinguishing between income groups when discussing the relationship between income and morbidity in China, e.g., when designing graduated contribution policies for health insurance plans. As for education, the effect of education years in our analysis is consistent with previous research conducted in China [36], i.e., education years have no significant effect on morbidity in different cohorts. However, a negative correlation between education and morbidity has been found in the older population in other countries, e.g., the U.S. [33]. This paper does not discuss the reasons for this difference, which should be examined with causal analysis.

Conclusions
This study focuses on the relationship between socioeconomic status and morbidity rates in China. It considers both all-age and older cohorts in order to add new evidence on the fragmentary relationship between socioeconomic status and morbidity. In our regression analysis of the NHSS and CHARLS datasets, three morbidity measures are used as dependent variables: the two-week incidence rate, the number of sick days per thousand people, and the prevalence of chronic diseases. Meanwhile, we use PCA to convert the socioeconomic indicator variables into several interpretable components, which serve as control variables in our model specifications. These specifications are then estimated with five models, and our robustness check shows consistent estimates across models.
A quadratic relationship between income per capita and morbidity was found in both the NHSS and CHARLS data. This relationship is statistically robust across models and across all three dependent variables. Such a non-linear correlation means that there is an all-age quadratic pattern between income and morbidity in China. This correlation is similar to the patterns in European countries [9] and England [32]. Meanwhile, our study found no correlation between education level and the two-week incidence rate in either the all-age or the old-age cohorts in China. The same conclusion holds for the number of sick days and chronic disease prevalence in the NHSS data, as well as for chronic disease prevalence in the CHARLS data. Our conclusions indicate that the relationship between education and morbidity rates in China is consistent with the cases of Canada [4], England [32], and the Nordic countries [34]. This relationship differs from the cases of the United States [33] and European countries [9,35], where previous research found that education is associated with the onset of health problems. However, the average education level may noticeably affect the correlation between education and morbidity rates. Previous studies suggest that education affects morbidity rates through people's medical knowledge. Yet average education is about 7 years in the NHSS data and 2 years in the CHARLS data, where CHARLS interviews people over 45 years old in China. These respondents were all born no later than 1970 and received less education than the younger generations in China. This suggests that the low education level and generally insufficient medical knowledge of this cohort may be one of the reasons for the absence of an association between education level and morbidity in China.
In summary, the negative correlation between socioeconomic status and morbidity may not be a general pattern, but instead depends on the country studied. In this paper, the relationship between socioeconomic status and morbidity in China was shown not to be universally negative, which is similar to the patterns in some other countries.

This analysis uses data or information from the Harmonized CHARLS dataset and Codebook, Version C as of April 2018, developed by the Gateway to Global Aging Data. The development of the Harmonized CHARLS was funded by the National Institute on Aging (R01 AG030153, RC2 AG036619, R03 AG043052). For more information, please refer to www.g2aging.org. Finally, we want to express our gratitude to all the reviewers who provided many valuable suggestions.

Conflicts of Interest:
The authors declare no conflict of interest.

Figure B1. Scree plots.

Appendix A. Variables and other descriptive statistics
Notes to the regression tables: Models marked with ⱡ use province-level individual fixed effects. Models marked with ⱴ use city-level individual random effects. *** p < 0.01, ** p < 0.05, * p < 0.1.