Does “Rural” Always Mean the Same? Macrosocial Determinants of Rural Populations’ Health in Poland

Rural areas, like urban ones, are not homogeneous in terms of social and economic conditions. Those surrounding large urban centers (suburban rural areas) play different roles than those located in remote areas. This study aims to measure the level of inequalities in social determinants of health (SDH) between these two categories of rural areas. We pose the following research hypotheses: (hypothesis H1) rural areas in Poland are relatively homogenous in the context of SDH and (hypothesis H2) SDH affect the life expectancy of rural residents. Based on data covering all rural territories, we found that rural areas in Poland are homogenous in terms of SDH. We also found important determinants of health rooted in the demographic structure: the feminization index and the share of the working-age population. On the other hand, we could not confirm the influence of two commonly used SDH: GDP and the unemployment rate.


Introduction
Social determinants of health (SDH) comprise economic and social conditions, as well as their distribution in particular populations. The most commonly cited are income, education, unemployment and job security, employment and working conditions, early childhood development, food insecurity, housing, social exclusion, social safety network, health services, aboriginal status, gender, race, and disability [1]. SDH therefore cover all factors determining the condition of a human organism, both physical and psychological. In other words, they describe the non-medical conditions that influence a population's health [2][3][4][5][6]. These factors, combined or individually, can exert a favorable or unfavorable impact on the health of individuals, as well as of the entire population. The catalogue of potential SDH is virtually unrestricted, so authors of previous studies employ a wide range of factors (Table 1), and the choice of analyzed factors is partly determined by the characteristics of the research group (e.g., children, immigrants, or older people). On the other hand, some authors propose grouping SDH into clusters that share the same origin or characteristics. Orpana and Lemyre introduce a division into three broad categories: material/structural, behavioral/lifestyle, and psychosocial mechanisms [7]. Bethune and colleagues propose grouping SDH into two categories: structural factors, which are distal and rooted in socioeconomics and politics (such as income, education, and gender), and intermediary determinants, which are proximal and flow from the structural determinants (such as stressors or social isolation) [8].
Table 1 (fragment). Accumulated occupational class position (unskilled manual workers, skilled manual workers, assistant non-manual employees, intermediate non-manual employees, higher level non-manual employees, self-employed, and farmers), years in work, age, and education [21].
Hence, the place of residence, through a specific combination of SDH, influences an individual's state of health [33][34][35][36][37][38]. Usually, the dissimilarity between urban and rural residence is the centre of attention, as a result of clear differences in SDH. Cities offer a denser social and institutional network, which contributes to better health in several ways [39,40]. Rural areas, as territories of lower urbanization, are exposed to several negative consequences. Residents of rural areas are particularly vulnerable to social deprivation [16]: they are, on average, less educated, which translates into lower earnings, they more often perform physical work, and they are exposed to harmful factors related to agricultural production. Moreover, villagers usually have worse access to health care benefits [41,42]. Village residents rarely benefit from regular doctor visits [43,44], including preventive services [45], and tend to have weaker access to emergency services [46,47], which are usually located in urban areas. These differences in SDH contribute to health inequalities expressed by dissimilar life expectancies or healthy life expectancies [33].
Life expectancy (LE) is the most commonly used indicator that provides an overall assessment of a population's health [48]. LE is usually estimated for a newborn and for a person aged 65 years; the latter is based on the assumption that this age marks the onset of increasing demand for health benefits. The weakness of LE is that it does not take into account the quality of life associated with the burden of disease. Factors such as mortality and morbidity are incorporated into other indicators, like disability-free life expectancy (DFLE) or healthy life expectancy (HLY). Both DFLE and HLY are calculated using the Sullivan method, based on data on incidence, prevalence, and disability distributions for selected diseases [49][50][51].
Previous research in this area focuses primarily on the differences in health state between urban and rural populations, explained by SDH. Generally, LE and HLY, as well as DFLE, are lower for rural residents [52]. However, rural areas should not be perceived as homogeneous in terms of social and economic conditions. The level of socio-economic development of rural regions varies: the most developed rural areas are located in the vicinity of large and medium-sized cities [53]. Rural areas surrounding large urban centers (suburban rural areas) play different roles than those located in remote areas. These differences include almost all factors defined as SDH: employment and income structure, education level, and access to social infrastructure. The areas around large cities are more densely populated and usually serve as a reservoir of labor for those cities. The proximity of highly urbanized areas can often improve the financial situation of the surrounding municipalities, so that they can offer public services at a higher level. Accordingly, important research questions arise: do rural areas differ in the context of SDH? If such differences exist, do they affect the state of health of rural residents, and how?

Methodology and Data
This study aimed to measure the level of inequalities in SDH. Using measures of inequality, we analyze the degree of inequalities for selected SDH inside macroregions and between two categories of rural territories: those located in the direct proximity of large cities (PLM) and areas located away from large urban centers (PLW). The research hypotheses were as follows: Hypothesis H1: rural areas in Poland are relatively homogenous in the context of SDH; Hypothesis H2: SDH affects life expectancies of rural residents.
In constructing hypothesis H1, we assumed that rural areas in Poland, despite their various functions and characteristics, are homogenous in the context of the selected SDH. That does not preclude the study from identifying regions characterized by higher or lower differentiation. We measured the level of inequality using two indicators: the Herfindahl-Hirschman Index (HHI) and the Gini coefficient.
We also assumed that the selected SDH affect the rural population's state of health, expressed by LE for a newborn (hypothesis H2). We were forced to use LE for a newborn because other health status indicators are unavailable for the analyzed territorial units.

Research Sample
Poland is located in Central Europe. It borders Russia and Lithuania to the north, Belarus and Ukraine to the east, Slovakia and the Czech Republic to the south, and Germany to the west. Poland has a population of 38.1 million and an area of 312,685 km² (2018). It is the largest of the Member States that joined the European Union in or after 2004. The Human Development Index for Poland is 0.865 (2018), which ranks it 33rd in the world. Health services are financed through compulsory public health insurance. About 40% of the population lives in rural areas, characterized as "a thinly populated area". Rural inhabitants are, on average, less educated (12% have higher education, compared with 28% in cities) and achieve lower incomes (21.9% of rural inhabitants are in the first income quintile, compared with 10.8% of urban inhabitants, while 8.2% of rural inhabitants and 21.2% of urban ones are in the fifth quintile).

Variables
Based on previous studies (Table 1) and data availability, we selected the following categories of SDH: demography, labor market, education, communities' economic situation, and households' access to infrastructure (Table 2) [54]. The data come from the research "Statistical information system of rural areas", which is based on a census and official sources and covers the years 2006-2016. The data therefore cover the whole rural population. The methodology for identifying and dividing subregions complies with the OECD regional typology, which is based on the degree of urbanization [55][56][57][58].

Statistical Analysis
In the first step of the research, we employed basic descriptive statistics to describe the distributions of the selected variables. For each variable, we analyzed the ratio of the mean value to the maximum value (in percent), kurtosis, skewness, and variability. The ratio of the mean to the maximum shows how disproportionate a variable is: the lower the ratio, the higher the disproportion.
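The descriptive measures listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `describe` and the sample values are hypothetical, variability is taken to be the coefficient of variation, and skewness and kurtosis use plain moment-based estimators, which may differ slightly from the small-sample corrections applied by a given statistical package.

```python
import numpy as np

def describe(values):
    """Descriptive measures for one variable observed across sub-regions.

    Assumes the values are not all identical (otherwise the standard
    deviation is zero and the standardized moments are undefined).
    """
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    z = (x - mean) / x.std(ddof=0)  # standardized values (population SD)
    return {
        # share of the mean in the maximum; lower means more disproportion
        "mean_to_max_pct": 100.0 * mean / x.max(),
        # coefficient of variation based on the sample standard deviation
        "cv_pct": 100.0 * x.std(ddof=1) / mean,
        # moment-based skewness: negative = left tail, positive = right tail
        "skewness": float(np.mean(z ** 3)),
        # excess kurtosis: negative = platykurtic, positive = leptokurtic
        "excess_kurtosis": float(np.mean(z ** 4) - 3.0),
    }

# Hypothetical values of one indicator in five sub-regions
stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)
```

For a symmetric sample such as the one above, the skewness is zero and the excess kurtosis is negative, which corresponds to a platykurtic distribution.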
Next, we used the Herfindahl-Hirschman Index (HHI), which allows assessing the level of concentration and, as a consequence, the level of inequality of the analyzed variables. The index is calculated as the sum of the squares of the shares of each sub-region in the overall sum of a variable (characteristic), according to formula (1):
$$\mathrm{HHI} = \sum_{i=1}^{n} \omega_i^{2} \qquad (1)$$
where $\omega_i$ is the ratio of the value of the variable in sub-region $i$ to its value in all sub-regions, and $n$ is the number of sub-regions.
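Moreover, we employed the Gini coefficient as a measure of concentration (inequality):
$$G = \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_i}{n^{2}\, \bar{x}}$$
where $x_i$ is the value of the analyzed phenomenon for the $i$-th unit, $\bar{x}$ is the arithmetic mean, $i$ is the position of the unit in the series ordered by increasing value, and $n$ is the sample size.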
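A minimal sketch of the two inequality measures follows. The HHI is computed as the sum of squared shares, as described in the text. The Gini coefficient is computed in a common rank-based form, G = Σ(2i − n − 1)·x_i / (n²·x̄) over values sorted in increasing order; choosing this particular form is an assumption, made because it uses exactly the symbols listed above. The function names and sample data are hypothetical.

```python
import numpy as np

def hhi(values):
    """Herfindahl-Hirschman Index: sum of squared shares in the total."""
    x = np.asarray(values, dtype=float)
    shares = x / x.sum()
    return float(np.sum(shares ** 2))

def gini(values):
    """Rank-based Gini coefficient (assumed form, see the lead-in)."""
    x = np.sort(np.asarray(values, dtype=float))  # increasing order
    n = x.size
    i = np.arange(1, n + 1)                       # position in the series
    return float(np.sum((2 * i - n - 1) * x) / (n ** 2 * x.mean()))

# Hypothetical values of one SDH indicator in four sub-regions
equal = [10.0, 10.0, 10.0, 10.0]       # perfectly even distribution
concentrated = [0.0, 0.0, 0.0, 40.0]   # everything in one sub-region

print(hhi(equal), gini(equal))
print(hhi(concentrated), gini(concentrated))
```

With n equal units the HHI reaches its minimum of 1/n and the Gini coefficient is 0. With everything concentrated in one unit, the HHI is 1 and this form of the Gini coefficient equals (n − 1)/n.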
In the second step, we estimated a single-equation econometric model using the Ordinary Least Squares (OLS) method, in which the dependent variable is life expectancy for a newborn (LE). We wanted to measure the strength and direction of the relationship between the selected SDH and LE. Since the study did not focus on health-related factors, we used LE for a newborn. We could not use indicators such as HLE or DFLE because of the data structure.
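The estimation step can be illustrated with a short NumPy sketch. It is not the authors' Gretl workflow: the function `fit_ols`, the synthetic data, and the coefficient values are all hypothetical. The two checks mirror the usual OLS requirements of more observations than parameters and no exact linear dependence among the regressors.

```python
import numpy as np

def fit_ols(X, y):
    """OLS with an intercept; returns coefficients, R^2, residual std. error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # add intercept column
    n, k = design.shape
    if n <= k:
        raise ValueError("need more observations than parameters")
    if np.linalg.matrix_rank(design) < k:
        raise ValueError("regressors are linearly dependent")
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot                 # coefficient of determination
    se = float(np.sqrt(ss_res / (n - k)))      # standard error of residuals
    return beta, r2, se

# Synthetic, noise-free data: 30 hypothetical sub-regions, two regressors
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(30, 2))
y = 1.0 + 0.5 * X[:, 0] - 0.2 * X[:, 1]

beta, r2, se = fit_ols(X, y)
print(beta, r2, se)
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients and R² is essentially 1. With real data the residual standard error shows how far the fitted values lie from the observed ones on average.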
The model should be linear in its parameters, and the number of observations must be higher than the number of parameters. There should be no linear dependencies among the exogenous variables. In the final stage of this empirical study, the estimated econometric models were verified. The number of observations only allows estimating a simple regression. Moreover, such a small number of observations does not permit proper identification of the shape of the dependency. The use of more extensive equations would require more degrees of freedom. That is why we were forced to use the methods of spatial econometrics. The calculations were performed in Gretl.

Results
Table 3 presents the ratio of the mean value to the maximum value, and Table 4 presents the descriptive statistics for all analyzed variables. First, all rural areas were relatively homogenous in terms of feminization (FM), the working-age population (WAP), the old-age dependency ratio (ODR), and the community's income per capita (FR). We can also observe high homogeneity in the case of selected public services, such as access to pre-school education (PPE) or the water supply system (WSS). For the remaining variables, the level of homogeneity was lower, which we could interpret as relative homogeneity.
In financial terms, there were larger differences (in favor of municipalities around large cities) in the own resources of municipalities (OSR), targeted public grants (TG), and the general subvention from the state budget (GS). However, this did not significantly change the overall financial situation of municipalities: per capita indicators only slightly favored suburban municipalities, which by nature are more densely populated. Location near a large urban centre was also associated with higher GDP but, on the other hand, with higher levels of recorded unemployment (UR). That does not necessarily mean lower unemployment levels in remote rural areas, where unemployment is sometimes hidden. Regardless of that, we could not clearly state that rural areas surrounding large cities were in a more favorable situation in the context of SDH.
Table 4 presents basic descriptive statistics for macroregions (PL2-PL9) and the two types of rural areas: those surrounding big urban centers (PLM) and more remote ones (PLW). Kurtosis helps to detect the "tailedness" of the empirical distributions, while skewness indicates the direction of the "long tails". The Gini coefficient measures the level of inequality.
For all variables, we can observe substantial differences in kurtosis. For the same variable, the distributions took different shapes, from platykurtic (negative values) to leptokurtic (positive values). That means that the values of the variables were, depending on the macroregion, more or less concentrated around the centre point; only in a few cases was the distribution similar to a normal one. Skewness behaved similarly: for the same variable, we observed both distributions with a left tail (negative values) and distributions with a right tail (positive values).
On the other hand, the Gini coefficient did not vary much within macroregions and was comparable between them. The level of inequality was generally low (FM, ODR, ER2, UR, WAP, GDP, OSR, FR, PPE, WSS, and SS) or moderate.
Except for the demographic variables (FM and ODR) and WAP, which is strongly related to the demographic situation, all variables showed medium or high variability. The low variability of the demographic variables indicates only small differences in demographic structure, both within sub-regions and between them. In particular, we did not see differences in variability between the two categories of rural areas (PLW and PLM; Table 4). Rural areas, however, had a very diverse economic structure. The characteristics that differed most across these areas were the employment structure, expressed by the ratios of the population employed in industry, services, and the financial sector (ER2, ER3, and ER4), and unemployment (UR). Rural areas surrounding large city centers were more homogenous in terms of generated GDP. Rural areas, on the other hand, were more diverse in terms of access to infrastructure (WSS, SS, and GSS). The mean variability was lowest for WSS, PPE, and FR. For these characteristics, the average coefficient of variation was lower than 10%, indicating that these variables varied too little to differentiate rural areas meaningfully.
The combination of measures presented above allows a detailed assessment of each variable. For example, access to a gas supply system (GSS) was characterized by a moderate level of inequality, which was higher in remote rural areas. In four macroregions the distributions were platykurtic, while in three they were leptokurtic. In more remote areas, values were more concentrated around the centre of the distribution than in PLM areas. The same pattern was visible in the coefficient of variation. In almost all regions, the distributions had a right tail. We could conclude that access to a gas supply system was moderately unequal in macroregions and that PLW areas were more diversified in this respect than PLM areas.
In the next step, we estimated the concentration level for the selected SDH, using nominal data because the index is built from structure indicators. The HHI generally shows a low level of concentration for the variables describing the selected SDH (Table 5). However, some regions were characterized by a moderate or high concentration of SDH. Rural areas in three regions, the southwestern region, the central region, and the Masovian (metropolitan) voivodeship, were characterized by a moderate or even very high level of inequality in terms of SDH. Each of these regions contains a very fast growing economic centre (Warsaw, Wrocław, or Łódź) surrounded by relatively less developed territories. Interestingly, when we compared the two types of rural areas (PLW and PLM), we did not observe significant inequality; concentration coefficients for the areas around large cities were slightly higher, but still indicated small inequalities.
Additionally, we estimated the Gini coefficients separately for the two categories of rural territories: those surrounding big city centers (PLM) and more remote areas (PLW). For the PLM group the coefficient was equal to 0.19, while for the PLW group it was 0.20. This indicates that, first, the level of SDH inequality in the two types of rural areas was virtually the same and, secondly, that this level of inequality was low. These findings allowed us to accept hypothesis H1, suggesting that rural areas are homogenous in terms of SDH. This homogeneity was expressed not only by average values but also by the same level of inequality (HHI and Gini).
In order to verify hypothesis H2, we estimated an econometric model separately for men and women. Estimation results for the male population are presented in Tables 6 and 7. All variables were highly statistically significant. We checked the normality of the distribution of the random component (H0: the random component has a normal distribution); the test statistic was chi-square(2) = 1.34456 with p = 0.510543, so H0 cannot be rejected. The model indicates five variables that correlated positively with LE for the male population: FM, GS, PPE, SS, and GSS. A more feminized environment seems to encourage the prolongation of men's lives. When the feminization index rises by 1%, life expectancy extends by 0.17%, provided that the values of the other variables do not change. The model also indicates variables that correlated negatively with LE for men: ODR, WAP, OSR, TG, and WSS. In the case of OSR, TG, and WSS, the interpretation of the obtained results was somewhat tricky. We could only interpret the relationship between the situation on the labor market and life expectancy: the model suggests that a higher share of the working-age population was associated with lower life expectancy. If the percentage of the working-age population falls by 1%, LE for men extends by 0.23%, if the values of the other variables are constant. The interpretation of the other coefficients is analogous. However, their values were very low, so their potential to stimulate or destimulate LE was rather weak.
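To give a sense of scale, the percentage effects quoted above can be converted into years. The calculation below is a rough illustration only: it assumes a baseline male life expectancy of 74.4 years (the 2017 figure for men quoted in this paper), treats the coefficients as elasticities exactly as they are interpreted in the text, and uses a first-order approximation.

```latex
% Assumed baseline: LE_0 = 74.4 years (men, 2017), used for illustration only.
\begin{align*}
\Delta LE_{\mathrm{FM}}  &\approx 0.17\% \times 74.4 = 0.0017 \times 74.4 \approx 0.13 \text{ years}
  && \text{(feminization index } +1\%\text{)} \\
\Delta LE_{\mathrm{WAP}} &\approx 0.23\% \times 74.4 = 0.0023 \times 74.4 \approx 0.17 \text{ years}
  && \text{(working-age share } -1\%\text{)}
\end{align*}
```

Under these assumptions, each effect amounts to roughly one and a half to two months of life expectancy per 1% change in the respective variable.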
In the estimated model, the coefficient of determination was 0.84, which means that the equation explains 84% of the variability of the dependent variable. Hence, the model fitted the data well. The standard error of the residuals, that is, the square root of the residual variance, describes how far the fitted values lie from the observed values of the dependent variable. For model 1, it was equal to 1.004, which means that the estimated LE0(m) deviates from the observed value, on average, by ±1.004 units. H0 hypothesis: the empirical distribution is normal. Asymptotic test statistic: z = −0.197331 with p = 0.843568. Figure 1 illustrates the empirical and fitted values of the dependent variable in model 1.
Estimation results for the female population are presented in Tables 8 and 9. We generally observed the same relationships as in model 1, except for two variables that had the opposite direction: FM and ODR. A higher share of the female population seemed to limit LE, while a higher value of the dependency ratio stimulated LE positively, although this influence was minimal. It could also be concluded that the linear relationship between the analyzed variables and women's life expectancy was weaker than for the male population (all coefficients in model 2 had lower values than in model 1). A possible source of this relationship is that the share of women in the population increases with age, as an effect of women's higher life expectancy in all age groups (81.8 years, compared with 74.4 for men, in 2017).
The determination coefficient for model 2 was 0.81, which shows that the equation explains 81% of the variability of the dependent variable. The model fitted the empirical data very well. The standard error was 1.010, which means that the estimated LE0 deviates from the observed value, on average, by ±1.010 units. H0 hypothesis: the empirical distribution is normal. Doornik-Hansen test (1994), transformed skewness and kurtosis: chi-square(2) = 0.572684 with p = 0.751006.
Figure 2 shows the empirical and fitted values of the dependent variable in model 2.
To summarize, the estimated models allowed us to accept hypothesis H2, but with some limitations. Although both models explained a large part of the variability of the dependent variable (R-squared), the strength of the variables' influence on life expectancy was relatively small (small coefficient values). This pattern was particularly visible in the model estimated for the female population. The exceptions were two variables, the feminization index and the working-age population ratio, but this effect was important (in terms of strength) only for the male population.

Discussion and Conclusions
The study was inspired by the observed diversity of rural areas in terms of their functions, level of economic development, and access to social and technical infrastructure. We therefore asked whether this diversity would be visible at the level of SDH. We could conclude that, in terms of the analyzed SDH, rural areas were quite homogenous. Where differences were observed, they acted in opposite directions (as with GDP and the unemployment rate), so we could not accept the assumption that some areas were in a more favorable situation than others.
This pattern also applies to the problem of inequality in SDH. We did not confirm the existence of inequalities in SDH between rural areas located around large cities and those more remote. On the other hand, the study shows that three macroregions in Poland were characterized by moderate or even high inequality in demographic structure, labor market, economic development, or access to technical infrastructure. This is an important signal for stakeholders responsible for cohesion policy and public health.
The estimated econometric models also confirmed the impact of the selected SDH on the life expectancy of women and men, although in most cases this relationship was quite weak, especially for the female population. However, it should be stressed that this study employed variables that have so far rarely been used in research in this area.
In the case of the male population, two factors drew our attention in particular. The impact of the feminization index on the life expectancy of women and men was a notable finding of the presented study. Previous studies identify gender as an essential determinant of health state or life expectancy. We found that a highly feminized environment positively affected men's life expectancy but negatively influenced the life expectancy of women. This conclusion requires further in-depth research. A high or low rate of participation of women in society can be linked, directly or indirectly, to other factors studied earlier, such as a friendly neighborhood [9,12,17], psycho-social environment [9,15], social support [22,27], marital status [23,31], or even culture [8,16]. The second interesting determinant was the percentage of the working-age population. This factor substantially and negatively affected male life expectancy.
Somewhat surprisingly, neither model confirmed the relationship between GDP and life expectancy. We had expected to confirm this link based on previous studies that identify income as a major SDH [2,8,9,11,13,16,19,22,28] and, additionally, on research that positively verified the impact of GDP or national wealth [17,23].
We also expected to confirm the impact of unemployment on life expectancy [13,15,19,22,31]. However, neither model confirmed this relationship, although many researchers consider it self-evident.
SDH remain a topical subject of research around the world. We hope that the presented study provides new evidence in this area and will contribute to the discussion on the future shape of social policy. This paper contributes to science in several ways. Previous research generally focuses on the differences between rural and urban areas. We tried to assess the homogeneity of rural areas in the context of SDH.
Additionally, we provided new evidence in the area of SDH: we showed the role of the demographic structure (especially the feminization index). At the same time, we did not confirm the importance of factors well established in previous research. We also proposed using the Herfindahl-Hirschman Index (HHI) as a tool for measuring inequalities in SDH.