Spatial Analysis of the Distribution, Risk Factors and Access to Medical Resources of Patients with Hepatitis B in Shenzhen, China

Considering the high morbidity of hepatitis B in China, many epidemiological studies based on classic medical statistical analysis have been conducted, but they lack spatial information. However, spatial information such as the spatial distribution, autocorrelation and risk factors of the disease is of great help in studying patients with hepatitis B. This study examined 2851 cases of hepatitis B that were hospitalized in Shenzhen in 2010 and studied the spatial distribution, risk factors and spatial access to health services using spatial interpolation, Pearson correlation analysis and the improved two-step floating catchment area method. The results showed that the spatial distribution of hepatitis B, its risk factors and spatial access to regional medical resources were uneven and mainly concentrated in the south and southwest of Shenzhen in 2010. In addition, the distribution characteristics revealed a positive correlation between hepatitis B morbidity and the density of four types of service establishments regarded as risk factors for the disease. The Pearson correlation coefficients were 0.566, 0.515, 0.626 and 0.538 for bath centres, beauty salons, massage parlours and pedicure parlours, respectively (p < 0.05). Additionally, the allocation of medical resources for hepatitis B was adequate, as most patients could be treated at nearby hospitals.

In addition, the prevention and control of disease is the main goal of spatial epidemiology, and one important research subject is spatial access to health services. Spatial access to health services refers to overcoming spatial obstacles to medical facilities [22,23]. Based on different application environments and requirements, many different research methods are used to measure spatial access. Among the numerous methods, the gravity model (also known as the potential model) and the two-step floating catchment area method are the most widely used [24,25]. However, both methods have flaws: the gravity model is somewhat abstract and difficult to understand, and the two-step floating catchment area method ignores both the differences in spatial access among demand points within the same search scope and the spatial access values of demand points outside the search scope [26]. It is also difficult to set a reasonable search radius for each medical facility [27]. The current study used an improved two-step floating catchment area model to overcome these limitations. In particular, this model accounts for distance decay by setting weighted distance values, and it also provides an effective way to set a reasonable search radius for each medical facility. These two changes significantly improve the accuracy of measuring spatial access to health services.

Study Area
Shenzhen is a coastal city located in the south of China, northeast of the Pearl River Estuary. This city is located south of the Tropic of Cancer, from 113°46′ to 114°37′ east longitude and between 22°27′ and 22°52′ north latitude, and the total area of Shenzhen is approximately 1952.84 km² [27]. Shenzhen is located at the border between subtropical monsoon and tropical marine climates with abundant rainfall and beautiful scenery. The average annual precipitation is approximately 1924.7 mm. With 230 km of coastline, Shenzhen is rich in marine resources such as excellent ports and abundant fisheries. Shenzhen was the first special economic zone in China. With its rapid economic development, Shenzhen occupies a position of importance in China. At present, there are a total of 10 districts and 57 sub-districts in Shenzhen. Figure 1 shows the location of Shenzhen in China and the names of its sub-districts.

Study Data
The study area for this paper was the whole of Shenzhen City, which includes 10 districts and 57 sub-districts. The study mainly collected geographic, demographic, and hepatitis B case data, along with some service facility and medical resource data:
1. Basic geographic data: Administrative data for the division of Shenzhen were obtained from the Urban Planning, Land and Resources Commission of the Shenzhen Municipality [9].
2. Demographic data: These data, for all 57 sub-district administrative regions, were obtained from the 6th national population census [28].
3. Hepatitis B case data: These data were obtained from the Shenzhen Centre for Health Information (SCHI), an institute directly administered by the Health, Population and Family Planning Commission of Shenzhen Municipality. They were drawn from the 2010 medical records of hospitalized patients and include the patients' home addresses, ages, sexes, etc.
4. Medical facility data: These data were also obtained from the Shenzhen Centre for Health Information and include the addresses and service levels of most hospitals.
5. Service facility data: Certain service facilities may promote hepatitis B infection. Their address data were obtained by searching electronic maps on the Internet.

Study Methods
This study mainly applied spatial interpolation (kriging), correlation analysis of spatial risk factors and analysis of spatial access to health services. The Shenzhen Centre for Health Information (SCHI), an institute directly administered by the Health, Population and Family Planning Commission of Shenzhen Municipality, gave permission for the research involving hepatitis B cases and approved this retrospective study. Written informed consent was obtained from the participants, and their personal information cannot be disclosed to the public. In addition, the study was approved by the Wuhan University Institutional Ethics Committee, and the patients' information must be kept confidential.

Spatial Interpolation
Spatial interpolation is a process of intelligent guesswork in which the investigator attempts to make a reasonable estimate of the value of a continuous field in places where the field has not actually been measured [29]. This operation only makes sense from the continuous-field perspective. In addition, spatial interpolation can weaken the influence of administrative boundaries on the spatial distribution of certain attributes, and its methods include point and areal interpolation [30]. This paper adopted point interpolation, which is widely used in disease mapping to predict the disease morbidity in a specific area. Many methods can be used for spatial interpolation, such as inverse distance weighting, global polynomial, local polynomial, spline function, kriging, etc.
Among the various interpolation methods, kriging combines the advantages of other methods and also considers spatial variation factors, which are critical in the study of spatial epidemiology. Kriging is the best linear unbiased interpolation method when the data satisfy its hypotheses (either a normal distribution and second-order stationary hypothesis or an intrinsic hypothesis) [31]. Consequently, it can describe the random spatial processes of illness precisely.
Kriging uses variograms, which are widely used to describe regionalised variables. A variogram is defined as a type of variance in regionalised variables and is important in geostatistical analysis [32]. Under the conditions of either a second-order stationary hypothesis or an intrinsic hypothesis, the variogram for any separation distance h can be indicated by the following formula:

$$\gamma(h) = \frac{1}{2} E\left[\left(Z(x) - Z(x+h)\right)^2\right] \quad (1)$$

If the sample points are presented as a discrete distribution, then the formula is:

$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[Z(x_i) - Z(x_i+h)\right]^2 \quad (2)$$

where $\gamma(h)$ is the variogram, $N(h)$ is the number of pairs of sample points separated by the spatial distance h, and $Z(x_i)$ and $Z(x_i+h)$ are the values of the regionalised variable at the locations $x_i$ and $x_i+h$.

In the variogram model (Figure 2), spatial structures of geographical phenomena are mainly measured as nuggets, sills and ranges. The nugget is a random value that reflects small-scale spatial variability and measurement error. When the distance between sample points is approximately equal to zero, random error and spatial variability lead to variations in the variogram values, which will not be zero at the origin. The range is the distance within which sample points are spatially dependent. If the distance between two points is greater than the range, the differences between sample point values will be stable. The sill is the plateau value that the variogram reaches at the range, and it is used to measure variation in the entire system.
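The discrete variogram estimator described above can be illustrated with a short script. This is a minimal sketch rather than the procedure used in the study: the function name `empirical_variogram`, the lag-binning rule (each pair goes to the nearest multiple of `lag_width`) and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def empirical_variogram(points, values, lag_width, n_lags):
    """Estimate gamma(h) for lags 1..n_lags (in units of lag_width).

    Each pair of points is assigned to the nearest multiple of lag_width;
    pairs closer than half a lag or beyond n_lags are ignored.
    Returns {lag_distance: (gamma, pair_count)}.
    """
    squared_diff_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            lag = int(h / lag_width + 0.5)  # nearest lag class
            if lag == 0 or lag > n_lags:
                continue
            squared_diff_sums[lag] += (values[i] - values[j]) ** 2
            pair_counts[lag] += 1
    # gamma(h) = sum of squared differences / (2 * number of pairs)
    return {
        lag * lag_width: (squared_diff_sums[lag] / (2 * pair_counts[lag]), pair_counts[lag])
        for lag in sorted(pair_counts)
    }


# Hypothetical toy data: four sample points on a line, one unit apart.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
variogram = empirical_variogram(points, values, lag_width=1.0, n_lags=3)
for lag_distance, (gamma, pairs) in variogram.items():
    print(f"h={lag_distance:.1f}  gamma={gamma:.3f}  pairs={pairs}")
```

In practice a model curve (for example spherical or exponential) would then be fitted to these empirical values to read off the nugget, sill and range.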

Correlation Analysis of Spatial Risk Factors
Correlation analysis of spatial risk factors involves measuring and evaluating the correlations between a disease and its risk factors in a small area [7], and the general calculation for measuring risk factor and disease correlations is the Pearson correlation coefficient [33].
The Pearson correlation coefficient describes the degree of association between two variables. The coefficient is represented by the letter "r", and the formula for the calculation is as follows:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma_X}\right) \left(\frac{Y_i - \bar{Y}}{\sigma_Y}\right) \quad (3)$$

where n is the sample number, $X_i$ and $Y_i$ are the observed values of the two variables, $\bar{X}$ and $\bar{Y}$ are the averages of the two variables, and $\sigma_X$ and $\sigma_Y$ are the sample standard deviations of the two variables. The coefficient r measures the degree of correlation between the two variables.

The value of r is between −1 and 1. If r > 0, there is a positive correlation between the two variables, whereas if r < 0, there is a negative correlation between the variables. The larger the absolute value of r, the stronger the correlation between the two variables. If r = 0, there is no linear correlation between the two variables.

Table 1 shows the interpretation of the Pearson correlation coefficient put forward by some scholars [34,35]. However, it should be noted that all of these criteria are, to a certain extent, arbitrary rather than strict [35]. The interpretation of correlation coefficients depends on the specific application background and purpose. Because the subject of this study was affected by various and complicated social factors, the requirements for the calculation results were not strict; thus, the criteria in the table can be considered reliable and reasonable.
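As a worked illustration of the coefficient, the following sketch computes r from standardised deviations using sample standard deviations. The function name `pearson_r` and the two short data series are hypothetical and are not taken from the study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from standardised deviations (sample std, n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ) / (n - 1)


# Hypothetical values: facility density and morbidity in five areas.
density = [1.0, 2.0, 3.0, 4.0, 5.0]
morbidity = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(density, morbidity)
print(f"r = {r:.3f}")
```

A value of r close to 1 for an exactly linear pair such as `[1, 2, 3]` and `[2, 4, 6]` is a quick sanity check of the implementation.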

Analysis of Spatial Access
Spatial access is a loosely defined concept whose meaning depends on the specific occasion or practical problem [27]. In this paper, spatial access was defined as the extent to which spatial obstacles are overcome. Combining two distinctions, potential versus realized and spatial versus non-spatial, access can be divided into four types: potential spatial access, potential non-spatial access, realized spatial access and realized non-spatial access [36]. Realized access refers to the actual consumption of services, whereas potential access refers to the possibility of service consumption. In this study, potential spatial access to medical resources was the study focus, because this form of access can be used to evaluate whether the regional distribution of medical resources is reasonable.
In the study of spatial access to medical resources, there are three main factors: the spatial distribution and supply of medical services; the spatial distribution of the residential population; and the spatial relationship between the population and medical services [27].
There are many methods for evaluating the spatial access. Generally speaking, the gravity model and the two-step floating catchment area method are extensively used in this field. This paper used the improved two-step floating catchment area method, which is different from the enhanced two-step floating catchment area method (E2SFCA, Luo W) [37], to measure the potential spatial interaction between patients and hospitals across administrative regions.
The improved two-step floating catchment area method is based on the method proposed by Radke and Mu but includes two improvements. First, the improved method considers the effect of distance decay in the search scope, thus overcoming the shortcoming that all of the demand points have the same access within the search scope. Second, the improved method offers an effective way to set the search radius for each hospital.
The improved two-step floating catchment area method includes the following two main steps, which are further elaborated below.

(1) Setting the search radius value and calculating the ratio of supply and demand.
(a) The search radius value can be set by the service level and capacity of each hospital. Hospitals are divided into three levels, with a higher service level indicating a greater search radius value.
(b) To set a more accurate hospital service radius, a group of gradient search radius values can be set for the same hospital level. Each level has three search radius values.
(c) Each hospital's service capacity can be calculated from its resources, which mainly depend on the number of beds and the number of health technical personnel.
In the formula for the hospital's supply and demand ratio $R_{jl}$ (Equation (4)), l is the hospital level, j indexes the hospitals at that level, k indexes the study units, $S_{jl}$ is the hospital's service capacity and $D_r$ is the search radius. Each level has three search radius values. In addition, $d_{kj}$ represents the distance between a residential area and a hospital, and $P_k$ is the population of each study unit.

(2) Calculating the spatial access in each study unit.
To consider the effect of distance decay within the search scope, a weight that is inversely proportional to the distance can be calculated according to the distance between residential areas and hospitals. In the formula for the spatial access of each study unit, $A_k$ (Equation (5)), the symbols l, j, k, $S_{jl}$, $D_r$ and $d_{kj}$ have the same meanings as in Equation (4), and $\beta$ is the hospital level coefficient.
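The two steps can be sketched in code. The exact weighting in Equations (4) and (5) is not reproduced here, so the following is only one plausible reading and not the study's implementation: step 1 divides each hospital's capacity by the population inside its search radius, and step 2 sums the ratios of reachable hospitals with an assumed weight of the level coefficient divided by distance. All names, the `MIN_DISTANCE_KM` guard and the toy data are hypothetical; the level coefficients 3, 2 and 1 follow the values reported later in the paper.

```python
MIN_DISTANCE_KM = 0.1  # assumed floor to avoid division by zero


def supply_demand_ratios(hospitals, units, dist):
    """Step 1: capacity of each hospital / population within its search radius."""
    ratios = {}
    for j, hospital in hospitals.items():
        served = sum(
            unit["population"]
            for k, unit in units.items()
            if dist[k][j] <= hospital["radius_km"]
        )
        ratios[j] = hospital["capacity"] / served if served > 0 else 0.0
    return ratios


def spatial_access(hospitals, units, dist, ratios, level_coeff):
    """Step 2: distance-weighted sum of ratios over hospitals that reach unit k."""
    access = {}
    for k in units:
        total = 0.0
        for j, hospital in hospitals.items():
            d = dist[k][j]
            if d <= hospital["radius_km"]:
                # Assumed weight: level coefficient, inversely proportional to distance.
                weight = level_coeff[hospital["level"]] / max(d, MIN_DISTANCE_KM)
                total += weight * ratios[j]
        access[k] = total
    return access


# Hypothetical toy data: two hospitals and three study units.
hospitals = {
    "H1": {"level": "A", "capacity": 100.0, "radius_km": 6.0},
    "H2": {"level": "C", "capacity": 20.0, "radius_km": 1.7},
}
units = {
    "U1": {"population": 1000},
    "U2": {"population": 3000},
    "U3": {"population": 2000},
}
dist = {
    "U1": {"H1": 2.0, "H2": 1.0},
    "U2": {"H1": 5.0, "H2": 4.0},
    "U3": {"H1": 8.0, "H2": 9.0},
}
level_coeff = {"A": 3, "B": 2, "C": 1}

ratios = supply_demand_ratios(hospitals, units, dist)
access = spatial_access(hospitals, units, dist, ratios, level_coeff)
print(ratios)
print(access)
```

In this toy case unit U3 lies outside every search radius and receives an access value of zero, which is the situation that requires special handling when the method is applied.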

Hepatitis B Data Processing
The statistical data for this paper included all new hospitalized hepatitis B cases in Shenzhen in 2010. In this group of data, there were 2851 new hepatitis B cases. As shown in Figure 3, the distribution of hepatitis B cases was uneven, as cases were concentrated in the southwest of Shenzhen but sparse in the rest of the city. The home addresses of the cases were accurate to the sub-district level; as a result, the geometric centre point of each sub-district could be used to represent its spatial position (shown in Figure 4). These points represent the number of hepatitis B cases in each sub-district, as shown in Table 2. According to the table, Nantou, Lianhua and Yuehai had the most cases in Shenzhen in 2010. The raw statistical data based on the regional administrative units indicated the number of hepatitis B cases. These counts were not suitable for spatial interpolation, but they can be converted to morbidity rates, as is usual in epidemiological studies. The morbidity of hepatitis B was expressed as:

$$M = \frac{N_c}{P} \quad (6)$$

where M is the morbidity, $N_c$ is the number of new cases, and P is the exposed population.
In fact, with a larger population in a specific region, the morbidity from this calculation could be more accurate, and with a smaller population, the morbidity might be less accurate; thus, these morbidity data cannot reflect the true spatial distribution of hepatitis B. Based on the population in Shenzhen in 2010, there was a noticeable disparity among the different regions, making it necessary to adjust the morbidity in each sub-district to improve the accuracy.
In this adjustment, $w_i$ represents a weight with a value between 0 and 1, $n_i$ is the population of each sub-district, $\mu_i$ represents the mathematical expectation and $\sigma_i^2$ represents the variance of hepatitis B morbidity. In Equations (7) and (8), $\mu$ and $\bar{n}$ represent the mean values of hepatitis B morbidity and population in the study area, respectively, and $y_i$ is the number of new hepatitis B cases in each sub-district.

Table 3 shows the adjusted hepatitis B morbidity in each sub-district, which was used for kriging interpolation after preprocessing. Before using kriging interpolation, morbidity data must be converted using a statistical method. Logarithmic conversion is widely used in this field, as it tends to shift the data towards a normal distribution and helps satisfy the stationary hypothesis [39]. After the conversion, the morbidity data met the requirements of kriging interpolation.

Figure 5 shows the spatial distribution of hepatitis B morbidity; Figure 5a used no interpolation method, whereas Figure 5b used kriging interpolation. In general, the locations of high morbidity on both graphs were the same, and the highest morbidity was observed in the southwest of Shenzhen. Nantou and its vicinity exhibited the most hepatitis B infections, and Guangming, Pingdi, Shiyan, Pingshan, Lianhua, Donghu, Nan'ao, Shekou (S) and their respective vicinities represented the other high-morbidity areas in Shenzhen in 2010. Comparing these two graphs, Figure 5a is a graded statistical map, whereas Figure 5b is a contour map that weakens the influence of administrative boundaries on the spatial distribution of hepatitis B morbidity. Because the population flows constantly from one district to another, the results shown in Figure 5b better reflect reality.
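The adjustment and log conversion described above can be sketched as follows. The adjustment formulas themselves are not reproduced in the text, so this example assumes a global empirical Bayes (method-of-moments) shrinkage estimator of the kind commonly used in disease mapping, which matches the symbols defined above but may differ in detail from the study's Equations (7) and (8). The function name `smooth_morbidity` and the toy counts are hypothetical.

```python
import math


def smooth_morbidity(cases, populations):
    """Shrink crude rates towards the overall mean; small populations shrink more.

    Returns (adjusted_rates, weights, overall_mean_rate).
    """
    total_pop = sum(populations)
    mu = sum(cases) / total_pop                 # overall mean morbidity
    n_bar = total_pop / len(populations)        # mean population per unit
    crude = [y / n for y, n in zip(cases, populations)]
    # Assumed moment estimate of the between-area variance, floored at zero.
    variance = sum(
        n * (m - mu) ** 2 for n, m in zip(populations, crude)
    ) / total_pop - mu / n_bar
    variance = max(variance, 0.0)
    adjusted, weights = [], []
    for m, n in zip(crude, populations):
        denominator = variance + mu / n
        w = variance / denominator if denominator > 0 else 0.0  # weight in [0, 1]
        weights.append(w)
        adjusted.append(w * m + (1 - w) * mu)
    return adjusted, weights, mu


# Hypothetical counts: one small and two large sub-districts.
cases = [5, 30, 50]
populations = [1000, 20000, 20000]
adjusted, weights, mu = smooth_morbidity(cases, populations)

# Log conversion applied before interpolation (requires positive rates).
log_adjusted = [math.log(rate) for rate in adjusted]
print(adjusted, weights, log_adjusted)
```

The small sub-district receives the lowest weight, so its crude rate is pulled most strongly towards the overall mean, which is the behaviour the adjustment is meant to provide.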

The Types of Risk Factors
The kriging interpolation results show that there were areas of high hepatitis B morbidity, and it can be inferred that some risk factors may have had an effect on this high morbidity. Hepatitis B is an infectious disease caused by HBV that is mainly transmitted through blood or body fluids. According to the conclusions of previous studies, some service facilities such as bath centres, beauty salons, massage parlours and pedicure parlours contribute to a high risk for HBV infection [40][41][42][43]. The shared utensils used at these facilities are often unsterile, and the skin of customers and service staff members can easily be broken during these services. Therefore, these four types of service facilities can be regarded as dangerous point sources of HBV infection.

Data Processing of Risk Factors
Data concerning the numbers and addresses for these four types of service facilities were extracted from Baidu Maps. As shown in Figure 6, the spatial distribution of the four types of service facilities was uneven, with larger numbers found in southwest Shenzhen.
To accurately reflect the true spatial distribution of the four types of facilities, their density in each sub-district could be calculated with the following formula:

$$D_f = \frac{N_f}{S}$$

where $D_f$ is the density of the facilities in each sub-district, $N_f$ is the number of each type of facility in each sub-district, and S is the area of each sub-district. Figure 6 shows the spatial distribution density of each type of service facility in each sub-district. The highest density was observed in the south and southwest of Shenzhen. Comparing Figure 5a,b with Figure 6 suggests a positive correlation between the hepatitis B morbidity distribution and the density distribution of the four types of service facilities. The correlation between these two factors could be measured by calculating the Pearson correlation coefficient.

Figure 6. The spatial distribution density of the four types of service facilities.

Table 4 shows the Pearson correlation coefficients between each type of facility and hepatitis B morbidity, which were tested for significance with a t-test (p < 0.05). In general, there was a positive correlation between these types of facilities and hepatitis B morbidity, which suggests that these facilities may promote the spread of hepatitis B. Among the Pearson correlation coefficient results, the coefficient between massage parlours and hepatitis B morbidity was the largest of the four types of facilities, followed by bath centres, pedicure parlours and beauty salons. Interpretation of the Pearson correlation coefficients is shown in Table 1. Because hepatitis B studies are related to the social sciences, the interpretation of coefficients in this table is considered to be reasonable and reliable. From the calculation results, each of the four types of service facilities showed a positive correlation with hepatitis B morbidity in Shenzhen in 2010.
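The density calculation and the significance check can be illustrated with a short sketch. It uses the four coefficients reported in the abstract and assumes that all 57 sub-districts entered each correlation, which the text does not state explicitly; the critical value 2.004 is the approximate two-sided 5% point of Student's t with 55 degrees of freedom. The function names and the density example numbers are hypothetical.

```python
import math


def facility_density(n_facilities, area_km2):
    """Density = number of facilities / area of the sub-district."""
    return n_facilities / area_km2


def t_statistic(r, n):
    """t statistic for testing a Pearson r against zero with n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical sub-district: 30 facilities in 12 square kilometres.
example_density = facility_density(30, 12.0)

# Coefficients reported in the paper.
reported_r = {
    "bath centres": 0.566,
    "beauty salons": 0.515,
    "massage parlours": 0.626,
    "pedicure parlours": 0.538,
}
N_UNITS = 57        # assumption: one observation per sub-district
T_CRITICAL = 2.004  # approximate two-sided 5% critical value, 55 degrees of freedom

t_values = {name: t_statistic(r, N_UNITS) for name, r in reported_r.items()}
all_significant = all(t > T_CRITICAL for t in t_values.values())
ranking = sorted(reported_r, key=reported_r.get, reverse=True)

for name in ranking:
    print(f"{name}: r={reported_r[name]:.3f}, t={t_values[name]:.2f}")
```

Under these assumptions every t statistic exceeds the critical value, which is consistent with the reported p < 0.05, and the ranking reproduces the order given in the text.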
Although these four types of facilities can, to an extent, make people's lives more convenient and drive local economic development, they greatly increase the risk of hepatitis B infection. Because of factors such as the large number of HBV-infected individuals in China, the troubling hygiene conditions in the service facilities and a lack of disease prevention awareness, these four types of service establishment can be regarded as dangerous sources of HBV infection. Distinguishing between spatial risk factors not only helps to analyze how they influence the spatial spread of epidemic diseases, but also contributes to establishing disease prevention and control measures.

The Spatial Distribution of Medical Resources
According to statistics from the Shenzhen Centre for Health Information, there were 139 hospitals in Shenzhen in 2010, and 65 hospitals with level certificates were related to the prevention and control of liver diseases. On the basis of the scale and level of medical treatment, the hospitals were divided into three levels in descending order: A, B and C. As shown in Figure 7, most hospitals were located in the southwest or south of Shenzhen. Specifically, there were 9 level-A hospitals, which are marked with the largest red crosses; 22 level-B hospitals, with slightly smaller crosses; and 34 level-C hospitals, with the smallest crosses. Most of the hospitals with high service ability are also located in the southwest or south of Shenzhen. Thus, it can be concluded that Shenzhen lacks high-quality medical resources and that the spatial distribution of medical resources is extremely uneven in terms of both quantity and quality.

The Calculation of Spatial Access
The spatial access to medical resources was measured using the improved two-step floating catchment area method. The main calculation process is described below.

1. Setting the value of the search radius
Based on the mean population density of Shenzhen and the number of people the hospitals provide with medical service [44], the search radius values for the different hospital levels were calculated. The result indicated that the level-A hospital search radius was 6 km, the level-B hospital search radius was 3 km, and the level-C hospital search radius was 1.7 km.
To more accurately determine the hospital service radius values, a group of gradient search radius values was set for each hospital level. Because Shenzhen lacked medical resources, the search radius value had to be somewhat larger. Therefore, for the level-A hospitals, the gradient search radius values were 6, 9 and 12 km; for the level-B hospitals, they were 3, 4 and 5 km; and for the level-C hospitals, they were 1.7, 2 and 2.3 km.

2. Calculating the ratio of supply and demand
The calculation of spatial access to medical resources could be divided into three groups according to the different search radius values. Specifically, values of 6, 3 and 1.7 km were used for the first group; 9, 4 and 2 km for the second group; and 12, 5 and 2.3 km for the third group.
Using the different search radius values for each group, search circles could be drawn using buffer analysis in ArcGIS Desktop (Environmental Systems Research Institute, Inc. [ESRI], Redlands, CA, USA). The number of points in each search circle was counted and the corresponding points in each sub-district were determined. Finally, each hospital's supply and demand ratio was calculated using Equation (4).

3. Calculating the spatial access in each sub-district
Equation (5) uses two parameters: the distance between a residential area and a hospital, and each hospital's level coefficient. The distance between the residential areas and hospitals was measured using the tools in ArcGIS. The hospital level coefficients, which were determined by hospital level, were 3, 2 and 1, corresponding to hospital levels A, B and C. Then, spatial access to the hospitals in each sub-district could be calculated using Equation (5). However, some sub-districts may not have been included in any search circle; this would indicate that no services could be obtained for these sub-districts, which was highly unlikely. To resolve this problem, we selected the hospital nearest to the sub-district and calculated its spatial access using the same process. Table 5 shows the spatial access to hospitals in each sub-district for the different search radii.
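The three radius groups and the nearest-hospital fallback can be sketched as follows. The radius values are those stated above; the function name `reachable_hospitals`, the hospital identifiers and the distances are hypothetical, and the sketch only selects the hospitals that would enter the access calculation for one sub-district.

```python
# Search radius groups (km) per hospital level, as stated in the text.
SEARCH_RADIUS_GROUPS_KM = [
    {"A": 6.0, "B": 3.0, "C": 1.7},
    {"A": 9.0, "B": 4.0, "C": 2.0},
    {"A": 12.0, "B": 5.0, "C": 2.3},
]


def reachable_hospitals(distances_km, hospital_levels, radii_km):
    """Hospitals whose search circle covers the sub-district.

    If no circle covers it, fall back to the single nearest hospital.
    """
    in_range = [
        j for j, d in distances_km.items() if d <= radii_km[hospital_levels[j]]
    ]
    if in_range:
        return in_range
    nearest = min(distances_km, key=distances_km.get)
    return [nearest]


# Hypothetical sub-district with distances (km) to three hospitals.
hospital_levels = {"H1": "A", "H2": "B", "H3": "C"}
distances_km = {"H1": 10.0, "H2": 4.5, "H3": 7.0}

results = [
    reachable_hospitals(distances_km, hospital_levels, radii)
    for radii in SEARCH_RADIUS_GROUPS_KM
]
for group_number, hospitals_found in enumerate(results, start=1):
    print(f"group {group_number}: {hospitals_found}")
```

In this toy case the first two groups cover no hospital, so the nearest one is used, while the third group reaches two hospitals directly.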

Kriging interpolation of spatial access
Kriging interpolation can also be used to determine the spatial access to medical resources. Using the same method of logarithmic conversion, the spatial access data for each sub-district were transformed to approximate the normal distribution and satisfy the stationary data hypothesis. The search radius values used for Figure 8b,c were 9, 4, 2 km and 12, 5, 2.3 km, respectively. These values showed the same corresponding relationship with hospital level. Within the three groups, Cuizhu, Huaqiang North and Huafu were the sub-districts with the highest spatial access. Figure 8d-f shows three contour maps, for which kriging interpolation was used, that were also based on the different search radius values. The locations of high and low spatial access were approximately the same, and the distribution of access was uneven as well. In general, there were three centres with high spatial access, which were concentrated in the southwest, south and northeast, and two low-access centres, which were concentrated in the northwest and southeast. The reason for this distribution pattern is that most of the high-quality medical resources are concentrated in the southern area, which is the economic centre of Shenzhen. In addition, most of the level-A hospitals, which have sufficient paramedics, advanced medical facilities and professional medical staff, are located in the same region, whereas in areas with low spatial access, there is a lack of medical resources to meet the demands of the surrounding population.

The Distribution of Spatial Access

Relationship between Hepatitis B Morbidity and Spatial Access
The adequacy of medical resources can be measured by the spatial access index, and comparing regional access to medical resources with disease morbidity makes it possible to determine whether the regional medical resource distribution is adequate for this particular disease. In Figure 9, the green column represents access to regional medical resources, and the orange column represents hepatitis B morbidity. It can be seen from these three figures that, regardless of the search radius, the spatial access index was generally higher in sub-districts with high hepatitis B morbidity and generally lower in sub-districts with low morbidity. This result indicated that hospitals were able to provide sufficient medical services for neighbouring hepatitis B patients. In general, the allocation and spatial distribution of medical resources for hepatitis B were adequate.

Conclusions
1. In epidemiological research, mining geographic spatial data can not only reveal the spatial distribution characteristics of infectious diseases but also identify risk factors that have a major impact on the spread of these diseases. In addition, assessing the spatial access to medical resources can reveal whether the spatial distribution of medical resources is reasonable. These studies have important significance for establishing infectious disease prevention and control measures.
2. The spatial distribution of hepatitis B, the distribution of high-transmission-risk facilities (bath centres, beauty salons, massage parlours and pedicure parlours) and access to regional medical resources were all uneven in Shenzhen in 2010, with a main concentration in the south and southwest, which correspond to the economic centre regions of Shenzhen. On the one hand, these distribution characteristics indicated that hepatitis B infection had a positive correlation with the four high-risk service facilities. On the other hand, the allocation and spatial distribution of medical resources for hepatitis B were adequate.
3. In view of these conditions, measures must be taken to strengthen the prevention and control of hepatitis B, such as improving the sanitary conditions in service facilities, raising awareness about the spread and prevention of hepatitis B and strengthening the construction of medical facilities in economically less-developed areas in Shenzhen.
4. This paper did not examine how the spatial distribution of hepatitis B changed over time because of limited data. In addition, the correlation between hepatitis B and high-risk locations was not very strong, and thus further studies are necessary to determine whether hepatitis B correlates with certain geographical or other elements in Shenzhen city. Furthermore, the improved two-step floating catchment area model used to evaluate the spatial access to medical resources needs further refinement to produce more accurate evaluation results. These limitations will be studied further in the future.