Risk Analysis of Thyroid Cancer in China: A Spatial Analysis

Abstract: Thyroid cancer (TC) is the fastest-growing cancer in China, and many of its influencing factors can be modified to reduce its incidence. In this article, we aimed to identify the risk factors of TC. The regional TC data for 2016 were obtained from the China Cancer Registry Annual Report published by the National Cancer Center (NCC). Univariate correlation analysis and generalized linear Poisson regression analysis were used to determine risk factors for TC morbidity at the provincial and prefecture levels. A high urbanization rate (UR) (RR = 1.109, 95%CI: 1.084, 1.135), high GDP per capita (PGDP) (RR = 1.013, 95%CI: 1.007, 1.018), high aquatic product consumption (RR = 1.047, 95%CI: 1.020, 1.075) and high dry and fresh fruit consumption (RR = 1.024, 95%CI: 1.007, 1.040) were associated with higher TC incidence and were therefore identified as risk factors. Our results may provide analytical ideas and methodological references for the targeted, regionalized prevention and control of TC.


Introduction
Thyroid cancer (TC) is one of the most common malignant tumors of the endocrine system, accounting for about 1% of systemic tumors. The main pathological types are follicular thyroid carcinoma (FTC), papillary thyroid carcinoma (PTC), anaplastic thyroid carcinoma and medullary thyroid carcinoma (MTC). Among them, PTC accounts for the majority of cases (92.98%), but it has a low degree of malignancy and a good prognosis. Studies show that approximately 2.7% of women and 0.7% of men worldwide suffer from TC [1]. According to global cancer statistics, about 18,078,957 new cancer cases occur worldwide every year, of which 4% are TC [2]. In 2016, there were 238,000 TC patients worldwide, yet the number reached 586,000 in just four years, a 1.46-fold increase [3,4]. It is predicted that by 2030, TC will surpass colorectal cancer to become the fourth major malignant tumor after lung cancer, breast cancer and prostate cancer [5]. TC incidence in China has been increasing in recent years, similar to the global trend. According to the Annual Report of China Cancer Registry, the incidence of TC increased from 4.3/100,000 (2005) to 13.22/100,000 (2016). Researchers have predicted that the incidence of TC in the mainland of China will continue increasing for at least 20 years, with more than 3.7 million new cases between 2028 and 2032 [6].
Previous studies have discussed many risk factors associated with TC, e.g., ionizing radiation, weight, iodine intake, environmental pollutants, diet, behaviors and economic level. Studies show that exposure to ¹³¹I in childhood increases the risk of TC; the exposure dose is correlated with the degree of risk, and the two are linearly related when the radiation dose is 1.5-2.0 Gy [7]. The mechanism of obesity-induced TC is very complex and results from the combined action of a variety of independent and synergistic factors. A cohort study of adults in the United Kingdom found that for every 5 kg/m² increase in BMI, the risk of TC increased by 1.09 times [8]. Iodine is one of the main raw materials for the synthesis of thyroid hormone. Its deficiency and excess both

Note: crude incidence rate (CIR), age-standardized incidence (IR), urbanization rate (UR), GDP per capita (PGDP), number of beds per 1,000 people (beds), coverage rate of basic medical insurance (MIR), green space coverage rate in urban built-up areas (GR), National Aeronautics and Space Administration (NASA). Data on socioeconomic factors were at the prefecture level and data on dietary behavior factors were at the provincial level. Except for average years of education, which came from the sixth national census in 2010, all data were for 2016. * indicates that the report was published by NCC and that there was a three-year lag in the published data. Influencing factors and geographic information layers were linked by administrative codes.

Data of TC
We entered the number of TC cases, crude incidence rate (CIR) and age-standardized incidence rate (standardized according to the world standard population; hereinafter referred to as incidence, IR) in 2016 and organized them by prefecture and province. If a prefecture was not a tumor registration site but two or more of its subordinate counties were, we used the average incidence of these counties to estimate its TC incidence. We called the resulting data estimated prefecture level data. If only one county in a prefecture was a registration site, we assumed that the incidence of TC in this county represented the entire prefecture. Both cases are covered by Equation (1):
$$IR_{eci} = \frac{1}{N}\sum_{j=1}^{N} IR_{aco,j} \qquad (1)$$
In the formula, $IR_{eci}$ was the estimated prefecture level IR, $IR_{aco,j}$ was the actual county level IR of the $j$-th registered county under the prefecture, and $N$ was the number of county tumor registries in the prefecture.
Similarly, provincial TC data were estimated using data from subordinate prefectures. If a prefecture had both its own and county level TC incidence, only the prefecture level data were used to estimate the TC incidence of its province. In other words, the IR of a province was equal to the average of the actual and estimated IR of its subordinate prefectures (Equation (2)):
$$IR_{epr} = \frac{\sum_{i=1}^{\alpha} IR_{eci,i} + \sum_{k=1}^{\beta} IR_{aci,k}}{\alpha + \beta} \qquad (2)$$
In the formula, $IR_{epr}$ was the estimated provincial level IR, $IR_{aci,k}$ was the actual prefecture level IR of the $k$-th registered prefecture under the province, $\alpha$ and $\beta$ were the numbers of estimated prefectures and actual prefectures, respectively, and $IR_{eci}$ (together with the $N$ it depends on) was the same as in Equation (1).
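The two averaging rules described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and all incidence values are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
from statistics import mean

def estimate_prefecture_ir(county_irs):
    """Equation (1): average the IR of the N registered counties of a prefecture."""
    if not county_irs:
        raise ValueError("at least one registered county is required")
    return mean(county_irs)

def estimate_province_ir(actual_prefecture_irs, counties_of_estimated_prefectures):
    """Equation (2): average over actual and estimated prefecture-level IRs."""
    estimated = [estimate_prefecture_ir(c) for c in counties_of_estimated_prefectures]
    combined = list(actual_prefecture_irs) + estimated
    if not combined:
        raise ValueError("the province has no registered prefecture or county")
    return mean(combined)

# Hypothetical incidence values per 100,000 (not taken from the study).
prefecture_two_counties = estimate_prefecture_ir([8.0, 12.0])  # N = 2
prefecture_one_county = estimate_prefecture_ir([6.0])          # N = 1
province_ir = estimate_province_ir(
    actual_prefecture_irs=[14.0],                        # beta = 1 actual prefecture
    counties_of_estimated_prefectures=[[8.0, 12.0], [6.0]],  # alpha = 2 estimated
)
print(prefecture_two_counties, prefecture_one_county, province_ir)
```

With a single registered county the average reduces to that county's value, which matches the assumption that the county represents the whole prefecture.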

Geographic Information Data
The spatial coordinate system adopted the 2000 National Geodetic Coordinate System, the 1985 National Elevation Datum, and the latitude and longitude coordinates. This study obtained geographic information data of provinces, prefectures, and counties across the country in 2021, including administrative boundaries and geographic coordinates. The administrative codes of 2019 were used for registered prefectures. The cropping and merging function of ArcGIS 10.8.1 software was used to obtain the geographic information of the national tumor registration prefectures (including the actual registered prefectures and estimated registered prefectures) in 2016.

Socioeconomic Factors Data
The details were shown in Table 1.

Dietary Behavior Factors Data
Salt iodine data came from the "2011 National Iodine Deficiency Disease Surveillance" and water iodine data came from the "2017 National Survey on Iodine Content in Drinking Water"; all consumption data for each province were from the "2017 China Statistical Yearbook". Data on daily salt intake per capita in Hebei Province were from the "2007 Survey on Chronic Diseases and Risk Factors of Adults in Hebei Province", and these data were missing for Tibet. Smoking rates came from reports published by each province, except for some provinces whose data came from local research. Among them, the smoking data of Inner Mongolia Autonomous Region were from 2021, those of Yunnan were from 2019, those of Jilin, Guizhou and Gansu were from 2013, and those of other provinces were from 2014-2018. These were the latest data available to us; because of limited data availability, data for Tibet, Qinghai and Xinjiang were missing. Obesity rate data came from the "Chronic Diseases and Risk Factors Surveillance Data" report released by the Chinese Center for Disease Control and Prevention in 2017.

PM2.5 Concentration
In our dataset, PM2.5 concentrations were derived from aerosol optical depth (AOD) retrieved by the MISR and SeaWiFS satellite instruments and by the Moderate Resolution Imaging Spectroradiometer (MODIS). On this basis, an atmospheric chemical transport model was used to establish a mapping relationship with ground-level PM2.5 concentration, and a geographically weighted regression model was then used to calibrate the estimated PM2.5 concentration against ground-measured data. The final output raster layer had a resolution of 0.01°. Based on other studies [10,17], we selected compromise PM2.5 lag periods of 0, 5, and 10 years. Thus, this study obtained PM2.5 concentration data for 2006, 2011 and 2016. We extracted the PM2.5 concentrations for each tumor registry in the three years using the spatial clipping, spatial analysis and spatial data extraction functions of ArcGIS 10.8.1.

Statistical Analysis
This study examined TC from two aspects: spatial analysis and risk factor analysis. The details are shown in Figure 1.

Descriptive and Geographical Analysis
The national and sex-specific TC incidence rates in 2016 were described. Moreover, based on the incidence data of the existing tumor registries, we used ArcGIS 10.8.1 to draw the nationwide incidence map of TC in 2016 at the prefecture level (including both actual and estimated registration prefectures).

Spatial Autocorrelation Analysis
Spatial autocorrelation refers to the spatial dependence of the same variable between adjacent spatial units and shows how values of a variable are correlated across space [18].
It is divided into two categories: global autocorrelation and local autocorrelation. Because global autocorrelation cannot accurately indicate the location of an aggregation area, we used ArcGIS 10.8.1 to perform local autocorrelation analysis on the incidence of TC in 2016. Local autocorrelation analysis examines the spatial distribution of a variable within specific local areas and can identify both the type and the location of aggregation. It is mainly measured by the local Moran's I coefficient, a local indicator of spatial association (LISA). With LISA, the specific aggregation type, its location and the corresponding hot and cold areas of the disease can be explored. Thus, we used LISA to explore the high-high, high-low, low-high, and low-low clusters of TC incidence and to identify high risk areas of TC incidence.
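The cluster labels described above can be illustrated with a minimal Python sketch of the local Moran's I statistic. It uses a common textbook form with row-standardized binary contiguity weights, which may differ in detail from the ArcGIS implementation used in the study, and it omits the permutation test that decides whether a cluster is statistically significant. The function name, the neighbor structure and the incidence values are all hypothetical.

```python
def local_morans_i(values, neighbors):
    """Return (local I, cluster label) per unit, using row-standardized weights.

    values    -- incidence per spatial unit
    neighbors -- neighbors[i] is the list of indices adjacent to unit i
    """
    n = len(values)
    mean_x = sum(values) / n
    z = [x - mean_x for x in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n            # second moment used for scaling
    if m2 == 0:
        raise ValueError("all values are identical")
    results = []
    for i in range(n):
        nbrs = neighbors[i]
        # Spatial lag: average deviation of the neighbors (row-standardized).
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        local_i = z[i] * lag / m2
        if z[i] > 0 and lag > 0:
            label = "high-high"
        elif z[i] < 0 and lag < 0:
            label = "low-low"
        elif z[i] > 0 and lag < 0:
            label = "high-low"
        elif z[i] < 0 and lag > 0:
            label = "low-high"
        else:
            label = "not classified"
        results.append((local_i, label))
    return results

# Five hypothetical prefectures arranged along a line (values per 100,000).
incidence = [30.0, 28.0, 5.0, 6.0, 4.0]
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3]]
lisa = local_morans_i(incidence, adjacency)
labels = [label for _, label in lisa]
print(labels)
```

A positive local I marks a unit that resembles its neighbors (high-high or low-low), while a negative value marks a spatial outlier (high-low or low-high).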

Analysis of Influencing Factors of TC
First, socioeconomic variables were available at the prefecture level, so we studied their impact on TC incidence at the prefecture level. Dietary behavior variables, however, were available only at the provincial level. Because they are closely associated with the occurrence of TC, we explored their impact on TC at the provincial level.

(a) Univariate Correlation Analysis
In this study, the incidence of TC and the corresponding covariates were not all normally distributed, so we chose Spearman's rank correlation in R 4.2.0 for univariate correlation analysis.
At the prefecture level, all socioeconomic factors were used as independent variables and the IR of TC was used as the dependent variable. At the provincial level, dietary behavior factors were used as independent variables and the IR of TC in each province was used as the dependent variable. A two-sided p value of less than 0.05 was considered statistically significant, and the "corrplot" package in R 4.2.0 was used to draw a correlation heat map.
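As a rough illustration of the rank correlation used here, the following Python sketch computes Spearman's rho as the Pearson correlation of average ranks. The study itself used R 4.2.0; the function names and the paired values below are hypothetical, and the sketch does not compute the p value.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1          # average of the 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical prefecture-level pairs: urbanization rate (%) and TC incidence.
urbanization = [35.0, 48.0, 52.0, 60.0, 71.0, 88.0]
tc_incidence = [3.1, 4.0, 3.8, 7.5, 9.9, 21.0]
rho = spearman_rho(urbanization, tc_incidence)
print(round(rho, 3))
```

Because only ranks enter the calculation, the strongly skewed incidence value of 21.0 has no more influence than any other top-ranked observation, which is why the method suits non-normal data.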

(b) Generalized Linear Poisson Regression Model (GLM)
A generalized linear model is an extension of the general linear model that relates the mathematical expectation of the response variable to a linear combination of predictor variables through a link function. In generalized linear models, independent variables can be continuous, binary, or ordinal. The dependent variable can follow a binomial, Poisson, negative binomial, gamma, or inverse Gaussian distribution, among others; these distributions are collectively referred to as the exponential family. The distribution of health outcome variables (such as morbidity, mortality, and hospital visits) is usually described by a Poisson distribution, so a Poisson GLM was used to analyze the factors associated with TC. The relative risk (RR) was used to describe the relationship between TC and each factor.
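For reference, the relationship between a Poisson GLM coefficient and the RR can be written out as follows. This is a sketch that assumes the canonical log link, which the text does not state explicitly, and a normal-approximation 95% confidence interval; the symbols $\beta_k$, $x_{ik}$ and $p$ are introduced here only for illustration.

```latex
\[
\log \mathbb{E}[Y_i] = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik},
\qquad
\mathrm{RR}_k = e^{\beta_k},
\qquad
95\%\ \mathrm{CI}_k = \exp\!\left(\hat{\beta}_k \pm 1.96\,\mathrm{SE}(\hat{\beta}_k)\right)
\]
```

Under this form, a one-unit increase in $x_k$ multiplies the expected incidence by $\mathrm{RR}_k$, and a factor is regarded as associated with incidence when its confidence interval excludes 1.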
In this study, the variables that were statistically significant in the univariate correlation analysis were used as independent variables at the prefecture and provincial levels, and the incidence of TC was used as the dependent variable. The GLM was fitted in R 4.2.0. To limit multicollinearity among the included variables, we used the "leaps" package in R 4.2.0 to perform full-subset regression and select the optimal and simplest GLM. After testing for and accounting for overdispersion, we refitted the Poisson regression to identify risk factors for TC occurrence.
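One simple way to see why a Poisson model may need refitting is the Pearson dispersion statistic, sketched below in Python. This is a common diagnostic and not necessarily the test the authors applied; the counts are hypothetical, and an intercept-only model (fitted value equal to the sample mean) stands in for a real regression fit.

```python
from math import sqrt

def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values well above 1 suggest overdispersion relative to the Poisson model.
    """
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("not enough observations for the number of parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / df

def quasi_poisson_se(poisson_se, dispersion):
    """Quasi-Poisson keeps the coefficients but scales each SE by sqrt(dispersion)."""
    return poisson_se * sqrt(dispersion)

# Hypothetical case counts; the intercept-only fit predicts the mean everywhere.
counts = [2, 3, 1, 40, 4, 2, 35, 3]
mu = sum(counts) / len(counts)
fitted = [mu] * len(counts)
phi = pearson_dispersion(counts, fitted, n_params=1)
print(round(phi, 2), round(quasi_poisson_se(0.01, phi), 4))
```

A dispersion far above 1, as in this toy example, widens the confidence intervals under a quasi-Poisson refit without changing the point estimates.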
(c) Sensitivity Analysis
After obtaining the optimal and simplest GLM, we used K-fold cross-validation to test its robustness. In cross-validation, a certain proportion of the samples is selected as training samples and the remainder as holdout samples; the regression equation is first obtained from the training samples and is then used to predict the holdout samples. Because the holdout samples do not participate in model building, they can be used to estimate the accuracy of the model. In K-fold cross-validation, the samples are divided into K subsets; in turn, K-1 subsets are used as the training set and the remaining subset as the holdout set, and the prediction results are then averaged. The smaller the value of K, the larger the bias; the larger the value of K, the more unstable the result, so K = 10 is usually recommended. In this study, K = 10.
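The procedure just described can be sketched in Python as follows. For brevity the sketch fits an ordinary least-squares line with a single predictor instead of the GLM used in the study, and it reports a cross-validated R² computed from the holdout predictions; the function names, the seed and the synthetic data are assumptions made for illustration.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def k_fold_r2(xs, ys, k=10, seed=0):
    """R-squared computed only from predictions on held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)           # random but reproducible folds
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:                     # predict samples the model never saw
            preds[i] = b0 + b1 * xs[i]
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Synthetic data: a linear trend plus a small deterministic perturbation.
xs = [float(i) for i in range(30)]
ys = [2.0 + 0.5 * x + 0.3 * ((i * 7) % 5 - 2) for i, x in enumerate(xs)]
cv_r2 = k_fold_r2(xs, ys, k=10)
print(round(cv_r2, 3))
```

Comparing this holdout-based R² with the R² obtained on the full data gives a sense of how much the fit degrades on unseen samples, which is the notion of robustness tested here.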

Descriptive Analysis
In 2016, the number of TC cases nationwide was 50,424, of which 12,240 (24.27%) were male patients and 38,184 (75.73%) were female patients; the number of female cases was more than three times the number of male cases. The national crude incidence rate (CIR) of TC was 13.22/100,000, and the incidence rate (IR) was 9.70/100,000; both the CIR and the IR were higher in women than in men (Table 2).
Note: crude incidence rate (CIR), age-standardized incidence rate according to the world standard population (IR).

Geographical Analysis
On the whole, the incidence of TC in most areas of China was at a low level. The high incidence areas were mainly coastal, especially Zhejiang Province on the southeast coast, which had the highest incidence of TC in China. Hangzhou had the highest incidence (37.87/100,000) nationwide. The incidence of TC in Zhoushan, Jiaxing, Wenzhou, Shaoxing, Lishui, Ningbo and other prefectures in Zhejiang Province was also at a high level. In addition, Shenzhen, Dalian and Shanghai also had high incidence levels, which were 33.97/100,000, 31.49/100,000, and 31.49/100,000, respectively (Figure 2).
Note: The blank areas had no available registration data. The blue lines of different shades and thicknesses represented the water system. The darker and thicker the line, the higher the stream order. The first order stream was the highest order.

Spatial Autocorrelation Analysis
There was a positive spatial autocorrelation in the incidence of TC nationwide in 2016 (Z score = 8.03, p < 0.001). Furthermore, there were 13 high-high clusters of TC, including nine prefectures in Zhejiang Province (Hangzhou, Ningbo, Wenzhou, Jiaxing, Huzhou, Jinhua, Quzhou, Lishui, and Taizhou). Moreover, five prefectures (Bayan Naoer, Taiyuan, Changzhou, Liuzhou and Loudi) were high-low clusters, and Xuancheng was a low-high cluster. Finally, the light blue areas in the map, including Yangzhou, Nanchang, Chongqing and the other 24 prefectures, were all low-low clusters (Figure 3).
Note: The blank areas had no available registration data. The blue lines of different shades and thicknesses represented the water system. The darker and thicker the line, the higher the stream order. The first order stream was the highest order.

Univariate Correlation Analysis
The Kolmogorov-Smirnov (K-S) test showed that the incidence of TC was not normally distributed (p < 0.001), so Spearman correlation analysis was used to explore the factors associated with TC incidence.

Generalized Linear Poisson Regression Model
(a) Prefecture Level-Socioeconomic Factors
Some socioeconomic factors associated with TC morbidity were correlated with each other (Figure 4). To limit multicollinearity among these factors, we called the regsubsets function in the R "leaps" package to perform full-subset regression on the prefecture-level TC incidence and influencing-factor data. The optimal and simplest GLM was selected by preferring a larger adjusted R² and a smaller BIC. When the included independent variables were UR and PGDP, the adjusted R² of the model was the largest at 0.33, and its BIC was −69, the second smallest. Therefore, the model that incorporated UR and PGDP was the optimal model (Figure 5). We fitted a Poisson regression for this model and called the qcc.overdispersion.test function in the "qcc" package in R 4.2.0 to test for overdispersion. We observed overdispersion (p < 0.05), so quasi-Poisson regression was used to refit the model.
The model results showed that the higher the UR, the higher the incidence of TC (RR = 1.109, 95%CI: 1.084 to 1.135); that is, for every 1% increase in UR, the incidence of TC increased by a factor of 1.109. The higher the PGDP, the higher the incidence of TC (RR = 1.013, 95%CI: 1.007 to 1.018); that is, for every CNY 10,000 increase in PGDP, the incidence of TC increased by a factor of 1.013 (Table 5).

(b) Provincial Level-Dietary Behavior Factors
Similarly, some dietary behavior factors associated with TC incidence were correlated with each other (Figure 6). To limit their mutual influence, we again called the regsubsets function in the "leaps" package to perform full-subset regression on the variables that were statistically significant in the univariate correlation analysis. The optimal model was selected by the same principle as at the prefecture level, and overdispersion was tested in the same way. When the included independent variables were salt iodine, aquatic product consumption, dry and fresh fruit consumption and grain consumption, the BIC was the smallest at −11 and the corresponding adjusted R² was 0.56, only 0.01 less than the maximum adjusted R² of 0.57. Therefore, the model including salt iodine, aquatic products, dried and fresh fruits and grain was the optimal model (Figure 7). Overdispersion was again observed (p < 0.05), so quasi-Poisson regression was used for refitting.
The higher the aquatic product consumption per capita, the higher the incidence of TC (RR = 1.047, 95%CI: 1.020 to 1.075); that is, for every 1 kg increase in aquatic product consumption per capita, the incidence of TC increased by a factor of 1.047. The higher the consumption of fresh and dried fruits per capita, the higher the incidence of TC (RR = 1.024, 95%CI: 1.007 to 1.040); that is, the incidence of TC increased by a factor of 1.024 for every 1 kg increase in the consumption of fresh and dried fruits. However, salt iodine and grain consumption per capita were not associated with TC incidence, as their RR confidence intervals included 1 (Table 6).
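To make the reported RRs easier to read, the short Python sketch below converts an RR into a percentage change in incidence and checks whether a confidence interval excludes 1. It assumes a multiplicative (log-link) model, so that the effect of several units compounds as a power of the RR; the helper names are hypothetical, while the RR values are those reported above.

```python
def percent_change(rr, units=1.0):
    """Percentage change in incidence for a given increase, assuming a log link."""
    return (rr ** units - 1.0) * 100.0

def excludes_one(ci_low, ci_high):
    """True when the 95% CI of an RR does not contain 1."""
    return ci_low > 1.0 or ci_high < 1.0

# RR and 95% CI as reported for each factor (per unit stated in the text).
reported = {
    "UR (per 1%)": (1.109, 1.084, 1.135),
    "PGDP (per CNY 10,000)": (1.013, 1.007, 1.018),
    "aquatic products (per 1 kg)": (1.047, 1.020, 1.075),
    "dry and fresh fruit (per 1 kg)": (1.024, 1.007, 1.040),
}
for name, (rr, low, high) in reported.items():
    print(name, round(percent_change(rr), 1), excludes_one(low, high))

# Compounded effect of a 10 kg difference in aquatic product consumption.
ten_kg_aquatic = percent_change(1.047, units=10)
print(round(ten_kg_aquatic, 1))
```

Read this way, an RR of 1.109 corresponds to roughly a 10.9% higher incidence per unit, rather than an incidence that is 1.109 times higher in addition to the baseline.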

Sensitivity Analysis
(a) Prefecture Level-Socioeconomic Factors
The results of the 10-fold cross-validation showed that the proportion of incidence actually explained by the final model was 0.29; the variability (equivalent to error) was very small at 0.02. The relative importance of the variables is shown in Figure 8.

(b) Provincial Level-Dietary Behavior Factors
The 10-fold cross-validation results showed that when the two variables that were not statistically significant, salt iodine and grain consumption, were included in the model, the variability of the model was 0.40. When only the consumption of aquatic products and of dried and fresh fruits was included, the variability of the model fell to 0.22, indicating better robustness. The relative importance of the variables is shown in Figure 9.

Discussion
There were significant regional differences in the incidence of TC: incidence was higher in eastern and coastal areas than in inland areas, in urban areas than in rural areas, and in socio-economically developed areas than in less-developed areas [19]. This study found that coastal areas, especially southeastern coastal areas, had higher TC incidence, which may be related to economic development in those regions. With regional economic development, thyroid examination has been included in residents' routine physical examinations, which greatly increased the lead time of TC detection. In addition, the more developed the economy, the higher the accessibility of local health resources and the more advanced the medical technology. At the same time, residents with a higher education level had more knowledge about health and paid more attention to disease prevention. High levels of these factors can strengthen people's awareness of health and disease screening, prompting them to seek medical treatment in the early stage of disease. Importantly, with the emergence of high-resolution B-mode ultrasound and the application of fine needle aspiration, more early TCs were found, resulting in lead time bias and over-diagnosis bias, which contributed to the high TC incidence in economically developed areas of China. A study found that from 2008 to 2012, 85% of female TC patients and 82.8% of male TC patients in Shanghai were over-diagnosed. Similarly, over-diagnosed patients accounted for 93.2% of all female TC patients in Jiaxing and 92.9% in Hangzhou [20]. Moreover, we found that TC incidence was high in most prefectures of Zhejiang Province. In addition to over-diagnosis, this may be related to the dietary structure of local residents.
Aquatic product consumption per capita in Zhejiang Province was 23.3 kg, ranking fourth nationwide and much higher than the national average of 11.4 kg [21]. Aquatic products are rich in iodine, which is one of the main raw materials for the synthesis of thyroid hormones. Iodine deficiency can lead to decreased thyroid hormone synthesis and secretion, resulting in increased thyroid-stimulating hormone (TSH) levels. Long-term chronic TSH stimulation can lead to hypertrophy of thyroid follicles and even to the formation of nodules or cancer. A diet high in iodine can also cause changes in the structure and function of the thyroid gland. A study found that the incidence of TC in Shanghai increased significantly 5 and 8 years after salt iodization [22].
In this study, we found that high UR, high PGDP, and high per capita consumption of aquatic products and of dried and fresh fruit were risk factors for TC. As mentioned above, the impact of high UR and PGDP on the incidence of TC was probably due to the advances in medical technology and practice that accompany economic development, which result in over-diagnosis bias. The effect of high aquatic product consumption may be related to iodine intake. Studies have shown that eating more marine animal foods (OR = 9.484) and eating too much seafood (OR = 1.933) are risk factors for TC [23]. As for the effect of high per capita consumption of dried and fresh fruits, we found no research exploring its association with the occurrence of TC; most existing studies have explored only the effect of fresh fruit intake. For example, a study in South Korea found that high consumption of persimmons and oranges may reduce the risk of TC to a certain extent and play a preventive role in early TC [24]. This protective effect of fruits might be related to their vitamin and antioxidant content [25]. Another study found no link between fruit intake and TC [26]. These results differ from ours, probably because fruit consumption in our study included dried fruits, such as peanuts, walnuts, and cashews, most of which are rich in protein; a related study found that excessive protein intake was related to the incidence of differentiated thyroid cancer (DTC) [27]. Moreover, we found that water iodine content, salt iodine content and daily salt intake per capita were not associated with the occurrence of TC, possibly because they did not represent the actual iodine intake of the population. Firstly, the main source of iodine intake was salt, so salt iodine content and salt intake jointly determined iodine intake. Secondly, the degree of salt iodization varied between regions.
We also found that PM2.5 concentration had no statistical association with the occurrence of TC, which differed from previous studies. For example, one study showed that an increase in the incidence of PTC was significantly associated with an increase in PM2.5 concentrations with a lag of 2 and 3 years [10]. The absence of an effect of PM2.5 in our study may be related to other air pollutants as well as to people's health behavior and awareness. Firstly, our study only included PM2.5, without adjusting for the effects of other pollutants, such as PM10 and SO2, on the incidence of TC. Secondly, the PM2.5 concentration in our study referred only to the external exposure level rather than the actual internal dose. The higher the concentration of PM2.5 in a given area, the more likely people are to take protective measures, such as wearing masks and going out less, to reduce their exposure. These factors might explain the unexpected null result for PM2.5 to a certain extent.
Finally, a large number of studies have shown that greening rate and green space are closely related to human health. For example, studies found that people with diabetes who lived in areas with relatively high levels of greenery had a significantly lower risk of cardiovascular disease [28]. Increasing residential green exposure was significantly associated with a decrease in all-cause mortality [29]. Studies have also found that urban greenness was inversely associated with childhood respiratory outcomes, especially among children exposed to high levels of tobacco smoke pollution [30]. Therefore, we used GR as an influencing factor for the first time to explore its impact on the incidence of TC. However, we found no association between them. On the one hand, GR may indeed not be an influencing factor of TC. On the other hand, the GR in this study referred to the green space coverage rate of urban built-up areas. Generally speaking, a high green space coverage rate in urban built-up areas reflects, to a certain extent, a high level of development of the prefecture. Such development is usually accompanied by accelerated industrialization, which produces a series of environmental problems that may mask or even reverse any protective effect of green space coverage on TC.
This study had several strengths. First of all, it was a nationwide study, and was therefore relatively representative in describing patterns of TC in China. Secondly, it examined the impact of socioeconomic factors and dietary behavioral factors on TC at the prefecture level and the provincial level, respectively.
This study also had unavoidable limitations. Firstly, the incidence data of TC in some prefectures and all provinces were estimates, and their accuracy was limited by the number of tumor registration sites in their subordinate administrative regions. Secondly, this was an ecological study; individual-level data could not be obtained, so the results may be subject to the ecological fallacy. Thirdly, due to data availability, and although we tried our best to keep the time periods consistent, behavioral factors such as smoking rates and obesity rates were from different years and came from local research or active reports from provinces. The quality and comparability of these data were difficult to guarantee. Furthermore, because relevant data could not be obtained, we did not take migration into account. Migration between rural areas and cities and across regions may be quite substantial in China and could affect all the predictors, which could potentially undermine the spatial cross-sectional analysis, although we are optimistic that the results would not change much.

Conclusions
High UR, high PGDP, and high consumption of aquatic products and of dried and fresh fruits were risk factors for TC. By exploring the factors associated with TC incidence in China, we found that the incidence of TC presented a special phenomenon of "economic differentiation": to a certain extent, the more developed the regional economy, the higher the incidence of TC. The likely reason behind this was over-diagnosis resulting from advanced medical technology and well-developed medical systems in economically developed areas. In order to eliminate this differentiation, the state should adjust and unify the diagnostic criteria for TC at a reasonable level and improve the medical and health system and policy nationwide as much as possible. For example, screening for thyroid-related diseases should be included in the routine physical examination items of residents, especially in less-developed regions, such as the western areas. Targeted health education should also be carried out according to people's dietary habits in areas at high risk of TC, to correct unhealthy eating habits and improve health awareness, so as to reduce the incidence of TC in these areas. Finally, individual-level studies and more precise data should be used to further explore the risk factors of TC in the future.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.