Environmental and Socioeconomic Factors for Gastric Cancer in 14 Counties of the Huai River Basin from 2014 to 2018

To explore the potential relationship between environmental and socioeconomic factors and the risk of gastric cancer (GC) in the Huai River Basin, GC incidence rate (GIR) and GC mortality rate (GMR) data from 2014 to 2018 for 14 counties of the Huai River Basin were collected from the Chinese Cancer Registration Annual Report. Environmental and socioeconomic parameters were collected through the Statistical Yearbook. The 14 counties were classified into three groups with low, moderate, and high risk of GC according to the point density of environmental factors (PDF) and the index of socioeconomic factors (ISF). Significant differences in GIR were found among counties with different PDF (χ2 = 21.36, p < 0.01) and ISF (χ2 = 11.37, p < 0.05) levels. Significant differences in GMR were likewise observed among counties with different PDF (χ2 = 11.25, p < 0.01) and ISF (χ2 = 18.74, p < 0.01) levels; GIR and GMR decreased as ISF and PDF increased. We also used two models to explore the lag effects of ISF and PDF on GIR and GMR: the coefficient of one-period-lagged ISF on GIR was −2.9768 and that of one-period-lagged PDF on GIR was −0.9332, both significant at more than 95% probability. Thus, the higher the ISF and PDF in the previous period, the greater the reduction in GIR, whereas the effect of lagged ISF and PDF on mortality was not significant. Re-estimation with difference GMM gave consistent results, indicating that the findings were relatively robust. Overall, GIR and GMR decreased with increasing point density of environmental factors and index of socioeconomic factors.


Introduction
The Huai River Basin (HRB) is located between the Yangtze River Basin and the Yellow River Basin and spans over 1000 km across Henan, Anhui, Jiangsu, and Shandong provinces. With the aggravation of pollution in the HRB, reports have emerged about villages with high incidences of gastric cancer [1]. Gastric cancer in the counties of the HRB has therefore drawn increasing attention.
Gastric cancer (GC) has a high prevalence worldwide. According to the latest statistics released by the International Agency for Research on Cancer, there were approximately 608,000 new cases of GC and 433,800 GC deaths worldwide in 2020 [2]. The GC incidence rates (GIRs) and GC mortality rates (GMRs) were significantly higher in East Asia (e.g., Mongolia, Japan, and South Korea) than in Northern Europe [3,4]. GC was the third most common malignant tumor in China and one of the primary disease burdens faced by Chinese residents [5]. The detrimental effects of GC are mainly physical and psychological [6]. Furthermore, GC develops quickly, and most families cannot afford the high cost of treatment; GC has therefore attracted intense attention.
[...] primary industry, added value of secondary industry, savings balance of urban and rural residents, general public budget income, gross industrial output (current price), and number of hospital beds. These variables were selected because they potentially affect health outcomes throughout the life course. To date, no studies on this topic have been conducted in the HRB. In consideration of the above, we used GIS and SAS software to analyze the differences in GIR and GMR among 14 counties in the HRB based on the point density of environmental factors and the index of socioeconomic factors. Due to the heterogeneous distribution of environmental exposure and socioeconomic status, the GIR showed significant spatial differences. The geographical distribution of GC in the 14 counties can be used to improve our understanding of the disease and to help identify high- and low-risk areas [24]. We chose the 14 counties as the unit of analysis because the research question was formulated at the area level, and the main construct investigated (environmental and socioeconomic determinants) is conceptualized as an area-level attribute. Environmental and socioeconomic determinants are conceptualized as a group attribute that affects all individuals living within the community, and the interest is in drawing inferences regarding differences between areas.

Data Source
The annual GC incidence and mortality data at the county level from 2014 to 2018 were obtained from the Chinese Cancer Registry Annual Report released by the National Cancer Center of China. The indicators included the numbers of cases and deaths, together with the GIR and GMR standardized to the age composition of Segi's world population in 2000.
The data used to calculate the Point Density of Environmental Factors came from a survey conducted in the 14 counties of the HRB by the National Institute of Environmental Health, Chinese Center for Disease Control and Prevention from 2012 to 2018. The survey focused on the natural environment, social environment, rural living environment, and sanitation conditions in the 14 counties. In this study, three indicators (harmless disposal points, sewage treatment points, and concentrated stacking points of domestic waste) were selected to represent the efficiency of local sewage and garbage treatment.
Socioeconomic factors from 2012 to 2018 were obtained from the Chinese Yearbook of Social and Economic Statistics issued by the Chinese Bureau of Statistics. Six county-level indicators were selected in this study to characterize the socioeconomic status of the 14 counties: added value of primary industry, added value of secondary industry, savings balance of urban and rural residents, general public budget income, gross industrial output (current price), and number of hospital beds.

Statistical Analyses
The vectorized county boundary map of the 14 counties in the HRB was taken as the base map in this study. The GIR and GMR values from 2014 to 2018 in the HRB were imported into the GIS system to establish the database. Using ArcGIS 10.7, the point density of environmental factors was calculated with the GIS spatial connection method. The index of socioeconomic factors was obtained using SAS 9.4 software based on principal component analysis (PCA). The differences in GIR and GMR values among counties with different levels of the point density of environmental factors and of the index of socioeconomic factors were evaluated using Chi-square tests.

Standardization of GIR and GMR Values
The age structure of Segi's world population composition in 2000 was used to standardize the GIR and GMR values in the 14 counties of the HRB from 2014 to 2018. The age-standardized rates were calculated as follows: (1) calculate the age-specific GIRs/GMRs; (2) multiply the age-specific incidence/mortality rates by the corresponding age-structured percentage of the world standard population to obtain the corresponding theoretical incidence/mortality rates; and (3) sum the theoretical age-specific incidence/mortality rates to obtain the age-standardized incidence/mortality rates, as shown in Equation (1):

$$\text{Standardized incidence or mortality rate} = \frac{\sum \left(\text{standard population age composition} \times \text{age-specific incidence or mortality rate}\right)}{\sum \text{standard population age composition}} \tag{1}$$

where the standardized incidence/mortality rate is the GC incidence/mortality rate in one county after standardization, in units of 1/100,000; the standard population age composition is Segi's world population composition in 2000; the sums run over the entire age structure of the county; and the age-specific incidence/mortality rates are the GC incidence/mortality rates in the county before standardization, in units of 1/100,000.
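The three standardization steps can be illustrated with a minimal sketch. The function name, the three broad age groups, and all numbers below are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from this study.

```python
def age_standardized_rate(age_specific_rates, standard_weights):
    """Direct age standardization: weighted mean of age-specific rates.

    age_specific_rates: rates per 100,000 for each age group.
    standard_weights: standard population composition for the same age groups.
    """
    if len(age_specific_rates) != len(standard_weights):
        raise ValueError("rates and weights must cover the same age groups")
    total_weight = sum(standard_weights)
    if total_weight <= 0:
        raise ValueError("standard weights must sum to a positive value")
    # Step 2: weight each age-specific rate; step 3: sum and divide by total weight.
    weighted_sum = sum(w * r for w, r in zip(standard_weights, age_specific_rates))
    return weighted_sum / total_weight


# Hypothetical example with three age groups (rates per 100,000).
rates = [2.0, 40.0, 150.0]
weights = [60.0, 30.0, 10.0]  # hypothetical standard population percentages
asr = age_standardized_rate(rates, weights)
print(asr)
```

Dividing by the sum of the weights means the weights may be given either as percentages or as population counts.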

Analysis of Point Density of Environmental Factors
The number of each of the three types of environmental point features (sewage treatment plants, garbage treatment plants, and garbage dumps) in a single county was determined using ArcGIS software according to a previously reported method [25,26]. The point density of environmental factors (m), which refers to the number of environmental factors per unit area in each county (number/km²), was obtained using the Spatial Connection tool in ArcGIS software [27], as shown in Equation (2):

$$m = \frac{\text{data of environmental factors in the county}}{\text{county area}} \tag{2}$$

where data of environmental factors in the county is the number of vector features located by longitude and latitude (in units of number), and county area is the area of the county in km².
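As a minimal sketch of the point density described above (number of environmental point features divided by county area), the calculation could look as follows. The county names, counts, and areas are hypothetical placeholders, and the sketch assumes the points have already been assigned to counties, which in this study was done in ArcGIS.

```python
def point_density(n_points, county_area_km2):
    """Number of environmental point features per square kilometre."""
    if county_area_km2 <= 0:
        raise ValueError("county area must be positive")
    return n_points / county_area_km2


# Hypothetical counties: (number of point features, area in km^2).
counties = {
    "County A": (120, 1500.0),
    "County B": (45, 900.0),
}
density = {name: point_density(n, area) for name, (n, area) in counties.items()}
for name, m in density.items():
    print(name, m)
```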

Analysis Index of Socioeconomic Factors
The index of socioeconomic factors was evaluated based on six economic factors: the added value of primary industry, the added value of secondary industry, savings balance of urban and rural residents, general public budget income, the total industrial output value (current price), and the number of hospital beds [28]. First, each factor was normalized as shown in Equation (3):

$$\text{Normalization index} = \frac{\text{Original value} - \text{Min value}}{\text{Max value} - \text{Min value}} \tag{3}$$

where Normalization index is the normalized value of the factor; Original value is the original value of the factor; Max value is the maximum value of the factor; and Min value is the minimum value of the factor. After normalization, PCA was carried out at the county level, and the principal components were extracted from the normalized data. Finally, according to the value and contribution of each principal component, Equation (4) was used to calculate the socioeconomic factor score of each county (SEvalue), with a larger SEvalue indicating a more developed social economy:

$$\text{SEvalue} = \frac{\sum_{i=1}^{n} F_i \times \text{Contribution}_i}{\text{Accumulative Contribution}} \tag{4}$$

where F_1 through F_n are the principal component values obtained by PCA; Contribution_1 through Contribution_n are the contribution rates of the principal components; and Accumulative Contribution is the total contribution rate of all components.
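The normalization and scoring steps can be sketched as below. This is an illustration under stated assumptions, not the SAS procedure used in the study: the principal component values are taken as given (the PCA itself is not performed here), the score is assumed to be the contribution-weighted sum of component values divided by the cumulative contribution, and all function names and numbers are hypothetical.

```python
def min_max_normalize(values):
    """Rescale one factor to [0, 1] across counties."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values identical; min-max normalization is undefined")
    return [(v - lo) / (hi - lo) for v in values]


def composite_score(component_values, contributions):
    """Contribution-weighted score from principal component values (assumed form)."""
    if len(component_values) != len(contributions):
        raise ValueError("one contribution rate is needed per component")
    cumulative = sum(contributions)
    if cumulative <= 0:
        raise ValueError("cumulative contribution must be positive")
    weighted = sum(f * c for f, c in zip(component_values, contributions))
    return weighted / cumulative


# Hypothetical hospital bed counts for three counties.
beds = [200.0, 500.0, 800.0]
normalized = min_max_normalize(beds)

# Hypothetical component values and contribution rates for one county.
score = composite_score([1.2, -0.4], [0.6, 0.2])
print(normalized, score)
```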

GIR and GMR Values in the 14 Counties of the HRB from 2014 to 2018
From 2014 to 2018, the GIR and GMR values in the 14 counties in the HRB showed a decreasing trend over time, and the gender-specific GIR and GMR values in males showed similar trends (Figures 2 and 3). As shown in Figure 3, the average GIR and GMR values from 2014 to 2018 in the 14 counties ranged from 32.84/100,000 to 93.05/100,000 and from 26.19/100,000 to 66.55/100,000, respectively. In ten counties (YD, MC, LB, JY, YQ, XY, LS, JH, SX, and SY), GIR decreased over time, with annual decreases ranging from 0.20% to 19.73%. In contrast, GIR increased over time in four counties (WS, XP, FG, and SQ). Similarly, the GMRs in the same ten counties decreased over time, with annual decreases ranging from 1.91% to 22.7%, while GMR increased in the four counties WS, XP, FG, and SQ, as shown in Figure 4.

Point Density of Environmental Factors
From the Environmental Factors Survey Project conducted in the 14 counties of the HRB from 2012 to 2018, 1826 geographical records of sewage treatment plants, garbage treatment plants, and garbage dumps were collected for the analysis of the point density of environmental factors. Based on the longitude and latitude of the selected environmental point sites, all point information was imported into ArcMap 10.7. Combined with the area of each of the 14 counties in the HRB, the comprehensive point density of environmental factors was calculated (Table 1).

Interaction between the Point Density of Environmental Factors and Index of Socioeconomic Factors
An analysis using SPSS software showed no interaction between the point density of environmental factors and the index of socioeconomic factors. Therefore, we analyzed only the relationships between GIR/GMR and the point density of environmental factors and between GIR/GMR and the index of socioeconomic factors.

Relationships between GIR and GMR with the Point Density of Environmental Factors and Index of Socioeconomic Factors
The differences in GIR and GMR among regions with different values of the point density of environmental factors and the index of socioeconomic factors were analyzed by Chi-square tests. Statistically significant differences in GIR were observed between counties with different point density of environmental factors scores (χ2 = 21.36, p < 0.01) and between counties with different index of socioeconomic factors scores (χ2 = 11.37, p < 0.05). As shown in Table 3, a higher point density of environmental factors corresponded to a lower GIR. Significant differences were also observed in GMR between counties with different levels of point density of environmental factors and between counties with different levels of index of socioeconomic factors (χ2 = 11.25, p < 0.01 and χ2 = 18.74, p < 0.01, respectively). Higher values of point density of environmental factors and index of socioeconomic factors corresponded to lower GMR (Table 4).
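For readers unfamiliar with the test, the Pearson Chi-square statistic for a contingency table can be sketched as follows. How the contingency tables in this study were constructed is not stated in the text, so the 2 × 2 table below (two hypothetical groups by two hypothetical outcome categories) and the function name are assumptions for illustration only.

```python
def chi_square_statistic(table):
    """Pearson Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are two groups, columns are two outcome categories.
table = [[30, 10], [20, 40]]
stat, dof = chi_square_statistic(table)
print(round(stat, 4), dof)
```

The statistic is then compared with the Chi-square distribution with the returned degrees of freedom to obtain the p-value.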

Relevant Tests of Panel Data
In this paper, we first carried out the relevant tests on the panel data and, after processing the data accordingly, computed descriptive statistics to obtain a basic picture of the variables. The model was then estimated with dynamic panel system GMM, and the regression was re-estimated with difference GMM; if the results remained consistent, the model results in this paper were considered relatively robust.

Build a Model
By specifying the explained variables and the explanatory variables, the model was built as follows:
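The model equation is not reproduced in this text. As a hedged sketch only, a dynamic panel specification consistent with the later description of the results (a one-period lag of the explained variable and one-period lags of ISF and PDF) could take the following form. The symbols α, ρ, β_1, β_2, μ_i, and ε_{i,t} are assumptions introduced here for illustration and are not the authors' notation; Y stands for either GIR or GMR in county i and year t.

```latex
Y_{i,t} = \alpha + \rho\, Y_{i,t-1}
        + \beta_1\, \mathrm{ISF}_{i,t-1}
        + \beta_2\, \mathrm{PDF}_{i,t-1}
        + \mu_i + \varepsilon_{i,t}
```

In a form like this, μ_i would capture unobserved county-specific effects, and the lagged explained variable on the right-hand side is what motivates GMM estimation.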

Descriptive Statistics
Descriptive statistics were computed for the sample data of each variable to give a basic understanding of the research data in this article; they are shown in Table 5. The mean GIR was 53.2057, the mean GMR was 38.3057, the mean ISF was 0.0003, and the mean PDF was 0.000017. All of these values are positive, indicating that the average growth rate of the economy was positive or growing.

Dynamic Panel Regression
Because the lagged explained variable may affect its value in the current period, and to address the endogeneity that the model might otherwise produce, the model was estimated with dynamic panel system GMM, using the explained variable as an instrumental variable. The results are shown in Table 6.
Note to Table 6: **: significant effect at more than 99% probability; *: significant effect at more than 95% probability. The absence of an asterisk indicates no significant effect. t-values are given in parentheses. L denotes a one-period lag.
In the Arellano-Bond autocorrelation test, rejecting the null hypothesis for AR(1) while failing to reject it for AR(2) indicates that the dynamic panel model has no serial correlation problem. Here, the p-values for AR(1) were 0.0328 and 0.0436, both less than 0.1, and the p-values for AR(2) were 0.9374 and 0.2618, both larger than 0.1, so there was no autocorrelation problem in the model. The p-values of the Sargan test were 0.4746 and 0.7873, both greater than 0.1, indicating that there was no over-identification problem with the instrumental variables and that their specification was reasonable. Judging from the regression results, the explained variables lagged by one period had a relatively clear positive effect on the explained variables of the current period; that is, the GIR and GMR of the previous period affected the GIR and GMR of the current period. In addition, the coefficient of one-period-lagged ISF on GIR was −2.9768 and that of one-period-lagged PDF on GIR was −0.9332, both significant at more than 95% probability. Thus, the higher the ISF and PDF in the previous period, the greater the reduction in GIR, whereas the effect of lagged ISF and PDF on mortality was not significant.
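The decision rule applied to these diagnostics can be written out as a short sketch. The function name and the labels "model 1" and "model 2" are hypothetical; the p-values are those reported above, paired in the order in which they are listed, and the 0.1 threshold is the one used in the text.

```python
def gmm_diagnostics_ok(ar1_p, ar2_p, sargan_p, alpha=0.1):
    """Return True when the reported diagnostics pass the rule used in the text.

    AR(1) should reject its null (p < alpha), AR(2) should not (p > alpha),
    and the Sargan test should not reject instrument validity (p > alpha).
    """
    return ar1_p < alpha and ar2_p > alpha and sargan_p > alpha


# Reported p-values as (AR(1), AR(2), Sargan); model labels are placeholders.
reported = {
    "model 1": (0.0328, 0.9374, 0.4746),
    "model 2": (0.0436, 0.2618, 0.7873),
}
results = {name: gmm_diagnostics_ok(*p) for name, p in reported.items()}
print(results)
```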

Robustness Testing
The model above was estimated with system GMM; as a robustness check, it was re-estimated with difference GMM. If the results remained consistent, the model results in this paper were considered relatively robust.
As shown in Table 7, in the Arellano-Bond autocorrelation test for difference GMM, the p-values for AR(1) were 0.0173 and 0.0589, both less than 0.1, and the p-values for AR(2) were 0.9211 and 0.2052, both greater than 0.1. Therefore, the model did not have an autocorrelation problem. The p-values of the Sargan test were 0.2517 and 0.5545, both larger than 0.1, indicating that there was no over-identification problem with the instrumental variables and that their specification was reasonable. The coefficients of one-period-lagged ISF and PDF on GIR were −2.8586 and −0.9624, so there was still a significant negative effect, and the effect of lagged ISF and PDF on mortality was again not significant. Thus, whether system GMM or difference GMM was used, the results were relatively consistent, and the research results were relatively robust.
Note to Table 7: **: significant effect at more than 99% probability; *: significant effect at more than 95% probability. The absence of an asterisk indicates no significant effect. t-values are given in parentheses. L denotes a one-period lag.

Discussion
We investigated the environmental indicators (garbage treatment plants, garbage dumps, and sewage treatment plants) in 14 counties of the HRB from 2014 to 2018. As the point density of environmental factors increased, the GIR and GMR decreased. This suggests that the risk of GC might be related to the point density of environmental factors; as the treatment of sewage and garbage improved, the GIR and GMR declined. We also collected county-level economic factors (added value of primary industry, added value of secondary industry, savings balance of urban and rural residents, general public budget income, gross industrial output value (current price), and number of hospital beds) in the 14 counties of the HRB from 2012 to 2018. We found that economic growth was associated with an improvement in people's living standards. For example, economic development can improve family hygiene, facilitate healthy eating habits, and allow refrigerator use, all of which reduce the risk of GC. In this study, as the index of socioeconomic factors increased, the GIR and GMR values decreased. Torre et al. reported that improved social security benefits and financial strength in recent years had also provided some guarantees for local cancer treatment [29].
The lack of effective waste treatment might lead to scattered landfills or the burning of garbage, which would contribute significantly to environmental pollution. In addition, the relatively high GIR and GMR values in males might be due to a lack of knowledge about the disease and greater exposure to bacterial infection and other occupational risk factors compared with females [23]. Tian et al. reported on digestive tract cancer in middle- and high-risk areas, classified according to GC, and analyzed the relationship between GC and centralized waste and litter treatment; they found that the higher the GIR, the more centralized waste and litter treatment equipment was present [30]. Wei et al. showed that untreated wastewater contains many nitrate carcinogens [31]. Wastewater also contains many nitrogen-containing compounds; without effective wastewater disposal, these compounds are converted into carcinogenic nitrate compounds by various microorganisms. In 2016, the Huai River Commission of the Ministry of Water Resources inspected 2139 sewage outlets along rivers and imposed pollution restrictions on several enterprises. The Huai River Water Resources Commission of the Ministry of Water Resources is a full-time organization for comprehensive water resources planning, treatment, and development in the HRB [32]; its actions therefore had a significant impact on local disease control.
Socioeconomic conditions also affect malignant tumors. The incidence of GC is higher in areas with relatively low socioeconomic status [12,16]. Vries reported that areas with a monthly income of less than 1000 Won had a higher risk of GC than those with a monthly income higher than 5000 Won [33]. Thus, improvement in the local economy is expected to reduce the occurrence of GC. Improvements in living conditions, as reflected in the savings balance of urban and rural residents, general public budget income, and the total industrial output value (current price) of an area, could also reduce GC risk and decrease the disease rates. The trend toward high incidence of gastric cancer among low-poverty populations may be explained by greater participation of members of wealthier socioeconomic classes in screening programs. This suggests that socioeconomic factors cannot be considered a determining factor of cancer in a biological sense, but since several cancer risk factors are associated with socioeconomic conditions, poverty is a "cause of the cause".
Finally, we also examined lag effects in this study. In the regression analysis above, the estimates for lagged GIR and lagged GMR were 0.8920 and 0.9795, respectively, both significant at more than 99% probability. This means that the explained variables lagged by one period do affect the explained variables of the current period, with a relatively clear positive relationship: the GIR and GMR of the previous period affect the GIR and GMR of the current period. Meanwhile, the higher the ISF and PDF in the previous period, the lower the incidence, whereas the effect of the lagged economic index on mortality was not significant. Therefore, we conclude that ISF and PDF have a one-period lag effect on the incidence of gastric cancer.
There are some limitations to our study. First, the disease data were extracted from the Chinese Cancer Registration Annual Report, and some data were missing. Missing values were filled in with data from neighboring counties, which may have introduced information bias. In addition, the economic factors used in this study (e.g., general public budget revenue and the savings balances of urban and rural residents) may not fully represent income levels at the village or family scale. In future studies, other economic variables should be obtained for inclusion in the analysis. Finally, digestive tract tumors generally develop over 10 or 20 years or even longer. The disease data analyzed in this study were from 2014 to 2018, while the point density of environmental factors and the index of socioeconomic factors were from 2012 to 2018. Both of these periods are shorter than 10 years; however, the GIR and GMR values differed significantly between counties and showed the same downward trend over time, and the effects of environmental and socioeconomic factors accumulated year by year, so the results were still convincing.
We provide the following suggestions for future work. First, more reliable variables should be obtained, including the income status and health intervention measures in villages and households. Second, some spatiotemporal models (e.g., geographically weighted regression and the GeoDetector model) should be used to analyze the spatial heterogeneity of factors influencing GIR and GMR [34].

Conclusions
In this study, we demonstrated the relationships between PDF/ISF and GIR/GMR. Larger numbers of garbage dumps, garbage treatment plants, and sewage treatment plants were associated with lower GIR and GMR. In addition, as the six evaluated socioeconomic factors (added value of primary industry, added value of secondary industry, savings balance of urban and rural residents, general public budget income, total industrial output value (current price), and number of hospital beds) increased, the GIR and GMR values in the 14 counties of the HRB decreased. Thus, as concrete measures to increase economic investment and improve resident income have taken effect, cancer prevention and control have improved in the HRB, and better control of environmental sewage and garbage dumps has contributed significantly to the reduction of GC.
Understanding the associations between different risk factors and GIR/GMR is helpful for identifying interventions to control GC. The findings of this study are valuable for epidemiologists as they provide information on the potential link between cancer development and environmental and economic factors in the HRB. The results suggest that enhanced local government measures to improve sewage and garbage treatment have helped to control GC, and that economic growth has been accompanied by a decrease in GC.
Data Availability Statement: The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.