Identifying the Local Influencing Factors of Arsenic Concentration in Suburban Soil: A Multiscale Geographically Weighted Regression Approach

Exploring the local influencing factors and sources of soil arsenic (As) is crucial for reducing As pollution, protecting soil ecology, and ensuring human health. Based on geographically weighted regression (GWR), multiscale GWR (MGWR) considers the different influence ranges of explanatory variables and thus adopts an adaptative bandwidth. It is an effective model in many fields but has not been used in exploring local influencing factors and sources of As. Therefore, using 200 samples collected from the northeastern black soil zone of China, this study examined the effectiveness of MGWR, revealed the spatial non-stationary relationship between As and environmental variables, and determined the local impact factors and pollution sources of As. The results showed that 49% of the samples had arsenic content exceeding the background value, and these samples were mainly distributed in the central and southern parts of the region. MGWR outperformed GWR with the adaptative bandwidth, with a lower Moran’s I of residuals and a higher R2 (0.559). The MGWR model revealed the spatially heterogeneous relationship between As and explanatory variables. Specifically, the road density and total nitrogen, clay, and silt contents were the primary or secondary influencing factors at most points. The distance from an industrial enterprise was the secondary influencing factor at only a few points. The main pollution sources of As were thus inferred as traffic and fertilizer, and industrial emissions were also included in the southern region. These findings highlight the importance of considering adaptative bandwidths for independent variables and demonstrate the effectiveness of MGWR in exploring local sources of soil pollutants.


Introduction
In recent years, soil arsenic (As) pollution caused by industrialization, agricultural production, and urbanization has become an important threat to soil ecology, food security, and human health [1][2][3].The 2014 Chinese Soil Pollution Survey Bulletin reported that As is one of the main pollutants in the soil of arable land, mining areas, industrial parks, and both sides of trunk roads.High concentrations of As will disrupt soil microbial metabolism and plant growth, leading to the soil's ecological deterioration and reduced food production [4,5].More importantly, crops can absorb and enrich soil arsenic, posing a threat to human health [6,7].Therefore, exploring the main sources and influencing factors of soil As is crucial for reducing soil arsenic pollution, protecting soil ecology, and ensuring human health.The Kriging model [8], principal component analysis/cluster analysis [9], multiple linear regression (MLR) [10], geographical detectors [11,12], spatial lag models [13], and machine learning methods [14][15][16] have been used to explore the main influencing factors of As.These studies revealed that the main sources of As include mineral mining, industrial smelting, agricultural production, fossil energy combustion, traffic emissions, and some As-rich minerals (e.g., hutchinsonite and arsenopyrite).However, these global models assume that the relationship between the dependent and independent variables is the same in different subregions.This assumption may lead to the poor fitting performance of these models in areas with strong landscape heterogeneity [10,17].Moreover, the global regression model can only reveal the average relationship between As and covariates throughout the entire study area and thus cannot provide data support for local As pollution treatment.
Geographically weighted regression (GWR) is a local spatial regression model that can determine the local impact factors of soil pollution at different locations [18].It has been widely used to reveal the spatially heterogeneous relationships between soil pollutants and environmental variables [19][20][21].The bandwidth is the core parameter of the GWR model, which reflects the influence range and scale effect of the independent variable on the dependent variable.However, GWR applies a fixed bandwidth to all covariates [22], which ignores that the impact scales of different covariates often vary in space.For example, climate factors and parent material typically control the spatial distribution of soil properties on a large scale, whereas human activities such as agricultural production and industrial emissions mainly affect the local soil [23,24].To address this deficiency, Fotheringham et al. (2017) (Fotheringham, et al. [25]) proposed the multiscale GWR (MGWR) model, which fits an appropriate bandwidth for each explanatory variable.MGWR has been successfully used in the analysis of local impact factors on housing prices [26], the urban built environment [27], COVID-19 incidence rates [28], landslide susceptibility [29], ecological environment quality [30], and soil carbon storage [31].Therefore, MGWR has great potential in determining the local pollution sources of As and proposing targeted pollution control measures.
Kuancheng District is located in the northeastern black soil zone of China and has a large area of fertile farmland.However, as a regional transportation hub, it undergoes rapid urbanization and industrialization, and its soil is at risk of arsenic pollution [32].Therefore, this study aimed to evaluate the effectiveness of MGWR by comparing it with MLR and GWR, investigate the spatial non-stationary relationship between As and environmental variables, and explore the local impact factors and pollution sources of As.

Study Area
The study area (43 1), and is typical suburban cultivated land affected by urbanization in the main grain-producing area of Northeast China.The research area is approximately 240 km 2 , with flat terrain and elevations ranging from 187 m to 246 m.Its climate is a typical continental monsoon climate, with an average annual precipitation of 550 mm and an average annual temperature of 4 • C. The research area mainly comprises cropland and construction land, with a small amount of forest, grassland, and water areas.The main lithology is a Holocene system, which contains sandstone and pebble sand wedge.The main soil types are meadow soil, phaeozem, and chernozeme, which are highly fertile.

Soil Sampling and Measurement
The urban-rural transition zone selected for the study includes different cultivated land utilization environments from urban to rural.Based on the land use map, the sampling locations were pre-determined to keep the sample density at 1 per km 2 , considering the land use type and the topographic conditions to ensure a uniform distribution of the sampling sites.Finally, 200 topsoil samples (0-20 cm) were taken in September 2017 (Figure 1).
Each sample was composited by mixing three to five subsamples within a 1 m 2 area.A stainless-steel spade was used for soil sampling and was washed with deionized water and wiped dry with paper towels between each use to avoid cross-contamination.The collected soil samples were air-dried to a constant weight, passed through a 2 mm sieve, and ground before being stored in the laboratory until soil As content analysis.
For the analysis of total As, 0.25 g aliquots of the dried soil were digested in aqua regia (65% HNO3 to 37% HCl as 1:3) and analyzed using atomic fluorescence spectrometry (AFS200T, Skyray Instrument, Suzhou City, Jiangsu Province, China).Soil organic carbon (SOC) content was analyzed for all the samples following the Walkley-Black method [33], and soil organic matter (SOM) was determined by the van Bemmelen conversion factor along with the SOC concentration.Total nitrogen (TN) was determined using the Kjeldahl method.Soil pH was measured for a 1:2.5 soil/water ratio using a pH meter (PHS-3E, Leici, Shanghai, China).All sample analyses included batch replicates, reagent blanks, and standard reference materials from the National Research Center for Certified Reference Materials of China (GBW07424) to ensure analytical accuracy [34,35].

Soil Sampling and Measurement
The urban-rural transition zone selected for the study includes different cultivated land utilization environments from urban to rural.Based on the land use map, the sampling locations were pre-determined to keep the sample density at 1 per km 2 , considering the land use type and the topographic conditions to ensure a uniform distribution of the sampling sites.Finally, 200 topsoil samples (0-20 cm) were taken in September 2017 (Figure 1).
Each sample was composited by mixing three to five subsamples within a 1 m 2 area.A stainless-steel spade was used for soil sampling and was washed with deionized water and wiped dry with paper towels between each use to avoid cross-contamination.The collected soil samples were air-dried to a constant weight, passed through a 2 mm sieve, and ground before being stored in the laboratory until soil As content analysis.
For the analysis of total As, 0.25 g aliquots of the dried soil were digested in aqua regia (65% HNO 3 to 37% HCl as 1:3) and analyzed using atomic fluorescence spectrometry (AFS200T, Skyray Instrument, Suzhou City, Jiangsu Province, China).Soil organic carbon (SOC) content was analyzed for all the samples following the Walkley-Black method [33], and soil organic matter (SOM) was determined by the van Bemmelen conversion factor along with the SOC concentration.Total nitrogen (TN) was determined using the Kjeldahl method.Soil pH was measured for a 1:2.5 soil/water ratio using a pH meter (PHS-3E, Leici, Shanghai, China).All sample analyses included batch replicates, reagent blanks, and standard reference materials from the National Research Center for Certified Reference Materials of China (GBW07424) to ensure analytical accuracy [34,35].

Sources and Pre-Processing of Environmental Variables
A total of 14 possible explanatory variables were selected from two aspects (Table 1) [36,37]: (i) possible sources of As, including distance from an industrial enterprise (Dis_IE), road density (RD), population density (PD), land use types (LU), total nitrogen (TN), total phosphorus (TP), and soil types (ST); (ii) migration-related factors of As, including clay content, silt content, pH, soil organic matter (SOM), elevation, and topographic wetness index (TWI).We crawled the locations of industrial enterprises in the study area from Amap, as shown in Figure 2a.We calculated the distance from each sample point to the nearest industrial enterprise (Dis_IE).The land use type was derived from the China Land Cover Dataset (https://zenodo.org/,accessed on 1 September 2023), with a resolution of 30 m.The main land use types in the research area are cropland and construction land, as well as a small number of water bodies and a greenbelt (Figure 1).Road data were obtained from OpenStreetMap (https://www.openstreetmap.org/,accessed on 1 September 2023), including highways, city roads, suburban rural roads, and road density, which were determined using a 1 km raster, as shown in Figure 2a.The population density data were derived from LandScan Global Population Data (https://landscan.ornl.gov/,accessed on 1 September 2023), with a resolution of 1 km, as shown in Figure 2b.Total nitrogen and phosphorus and soil type are agricultural As sources.Total nitrogen was measured in the laboratory, and Figure 2c was obtained by ordinary Kriging interpolation with a resolution of 30 m.The total phosphorus from the National Earth System Science Data Center of China (http://soil.geodata.cn/ztsj.html,accessed on 1 September 2023), with a resolution of 90 m, is shown in Figure 2d.The soil type represents the natural source of As.The soil type data were obtained from the Chinese Resource and Environment Science and Data Center (https://www.resdc.cn/,accessed on 1 September 2023) with a resolution of 1 km.Soil types in the study area include phaeozem, chernozem, and meadow soil, as shown in Figure 2e.The clay and silt contents were collected from SoilGrids (https://soilgrids.org/,accessed on 1 September 2023) with a resolution of 250 m.The silt content ranges from 0 to 52.7%, and the clay content ranges from 0 to 30.2%, as shown in Figure 2f,g, respectively.The data sources of SOM and pH are the same as that of total nitrogen, which was measured in the laboratory.The raster maps were obtained by Kriging interpolation with a resolution of 30 m, as shown in Figure 2h,i.The elevation data were obtained from NASA's Earthdata website (https://nasadaacs.eos.nasa.gov/,accessed on 1 September 2023) with a resolution of 12.5 m, as shown in Figure 2j.The TWI was calculated based on the elevation data with a resolution of 30 m (Figure 2k).The ArcGIS software platform was used to complete all the environmental variable grid statistics, distance calculation, and spatial interpolation pre-processing work (version 10.7).

Spatial Local Regression Model 2.4.1. GWR
GWR is an effective local spatial regression model [18,38].It assumes that the relationship between dependent variables and covariates has spatial heterogeneity and thus establishes local linear regression equations at each sampling point.The formula of the GWR of the ith point with the coordinates (u i , v i ) is as follows: where y i is the dependent variable, x i,j is the jth explanatory variable, ) is a constant term, and ε i is the stochastic error term.The bandwidth is the kernel function's coverage range and is the GWR model's core parameter.It determines the parameter estimation of the local regression equation using samples within a certain spatial range based on the scale effect of the independent variable's influence on the dependent variable.An excessive bandwidth can lead to excessive bias in regression parameter estimation, whereas too small a bandwidth can lead to excessive variance in regression parameter estimation.Therefore, determining the optimal bandwidth size is crucial.However, GWR applies a fixed bandwidth to all covariates, which ignores that the impact scales of different covariates often vary in space.This assumption results in a biased estimation of regression parameters.In addition, GWR cannot robustly deal with parameter instability caused by outliers, multicollinearity, and spatial autocorrelation [39].

MGWR
To address this deficiency, Fotheringham et al. (2017) (Fotheringham, Yang and Kang [25]) proposed the MGWR model.MGWR relaxes the assumption that all modeling processes are conducted on the same spatial scale.It adopts adaptative bandwidths for all explanatory variables to describe the scale effects of different covariates on the dependent variable.Thus, this model can reduce over-fitting with adaptative bandwidths and mitigate the concurrency of GWR [40].The formula of MGWR is as follows: where bw in β bw,j is the adaptative bandwidth of x i,j , and the meanings of other parameters are the same as those in Formula (1).When estimating local parameters, MGWR first sets the parameters of the GWR model as the initial parameters using the weighted least-squares method.Then, it uses the back-fitting algorithm to optimize the model parameters.Given the uneven distribution of sampling points, an adaptive bi-square kernel function and the Akaike information criterion (AIC) were used to obtain the optimal bandwidth of each explanatory variable.This study used MLR, GWR, and MGWR models to explore the relationships between As and the explanatory variables.Their modeling and parameter estimation were conducted using SPSS (version 24.0) and MGWR software (version 2.2).Before modeling, the Moran's I value of the As dataset was calculated using Geoda software (version 1.16) [41].
The use of GWR and MGWR spatial models is only meaningful if Moran's I is significant.The log-likelihood, AIC, determination coefficient (R 2 ), and residual Moran's I were used to evaluate the performance of the three models.A well-performing model has a low AIC and residual Moran's I and high log-likelihood and R 2 values [42].

Descriptive Statistics of As
Table 2 exhibits the descriptive statistics results of As, TN, SOM, and pH.The As content ranged from 6.239 mg/kg to 15.966 mg/kg.Its mean and median values were 10.241 mg/kg and 10.088 mg/kg, respectively, close to the local background value of As (i.e., 10.25 mg/kg) [34].The mean values of TN, SOM, and pH were 1.495 g/kg, 26.45 g/kg, and 7.085, respectively.The coefficients of variation of the four soil properties were 18.3%, 39.8%, 45.5%, and 13.2%.CV: coefficient of variation.
Table 3 shows the one-way ANOVA results of soil and land use types.The mean As content among different soil and land use types was different, but they did not pass the F test (p > 0.05).The least significant difference test results showed that the mean As content of meadow soil was higher than that of black soil and significantly higher than that of chernozem.The mean As content of construction land was higher than that of cropland.These results indicated that soil and land use types affect the As content.

Spatial Distribution and Pollution Levels of As
Figure 3 shows the spatial distribution of As content.The As content was higher in the central southern part of the study area, which is close to residential and industrial areas.It is lower in the northern part of the study area, which is close to agricultural areas.The As content in 49% of the samples exceeded the background value (10.25 mg/kg) [34].However, none reached the risk screening value recorded in "Soil environmental quality-Risk control standard for soil contamination of agricultural land and development land" (Chinese GB36600-2018 [43] and GB15618-2018 [44]), indicating that agricultural production is still within a safe range.These samples are mainly distributed in the central and southern parts of the study area.

Model Comparison
The Moran's I value of As content was 0.477 (p < 0.01), which confirms that As content has spatial autocorrelation and should be fitted via the GWR and MGWR models.Figure 4 exhibits the optimal bandwidth of explanatory variables for GWR and MGWR models.The optimal bandwidth in GWR was 169.In MGWR, the optimal bandwidths of land use, TN, TP, soil type, clay and silt contents, elevation, and TWI ranged from 181 to 199, which were close to the whole study area.SOM had the smallest optimal bandwidth of 43, followed by road density, Dis_IE, and pH.These results demonstrate the importance of considering an adaptative bandwidth in the MGWR model.Figure 5a displays the model evaluation results of MLR, GWR, and MGWR.MGWR outperformed GWR and MLR, with a higher R2 and log-likelihood and lower AIC and Moran's I of residuals.In particular, the residuals' Moran's I of MLR and GWR was significant, whereas the residuals' Moran's I of MGWR was not significant.These results indicated that MGWR was the optimal model.Thus, we only show the regression results of MGWR in the following sections.
The local R 2 of the MGWR model ranged from 0.445 to 0.631.Its spatial distribution is shown in Figure 5b.The local R 2 was highest in the western region and decreased from west to east.

Model Comparison
The Moran's I value of As content was 0.477 (p < 0.01), which confirms that As content has spatial autocorrelation and should be fitted via the GWR and MGWR models.Figure 4 exhibits the optimal bandwidth of explanatory variables for GWR and MGWR models.The optimal bandwidth in GWR was 169.In MGWR, the optimal bandwidths of land use, TN, TP, soil type, clay and silt contents, elevation, and TWI ranged from 181 to 199, which were close to the whole study area.SOM had the smallest optimal bandwidth of 43, followed by road density, Dis_IE, and pH.These results demonstrate the importance of considering an adaptative bandwidth in the MGWR model.

Model Comparison
The Moran's I value of As content was 0.477 (p < 0.01), which confirms that As content has spatial autocorrelation and should be fitted via the GWR and MGWR models.Figure 4 exhibits the optimal bandwidth of explanatory variables for GWR and MGWR models.The optimal bandwidth in GWR was 169.In MGWR, the optimal bandwidths of land use, TN, TP, soil type, clay and silt contents, elevation, and TWI ranged from 181 to 199, which were close to the whole study area.SOM had the smallest optimal bandwidth of 43, followed by road density, Dis_IE, and pH.These results demonstrate the importance of considering an adaptative bandwidth in the MGWR model.Figure 5a displays the model evaluation results of MLR, GWR, and MGWR.MGWR outperformed GWR and MLR, with a higher R2 and log-likelihood and lower AIC and Moran's I of residuals.In particular, the residuals' Moran's I of MLR and GWR was significant, whereas the residuals' Moran's I of MGWR was not significant.These results indicated that MGWR was the optimal model.Thus, we only show the regression results of MGWR in the following sections.
The local R 2 of the MGWR model ranged from 0.445 to 0.631.Its spatial distribution is shown in Figure 5b.The local R 2 was highest in the western region and decreased from west to east. Figure 5a displays the model evaluation results of MLR, GWR, and MGWR.MGWR outperformed GWR and MLR, with a higher R2 and log-likelihood and lower AIC and Moran's I of residuals.In particular, the residuals' Moran's I of MLR and GWR was significant, whereas the residuals' Moran's I of MGWR was not significant.These results indicated that MGWR was the optimal model.Thus, we only show the regression results of MGWR in the following sections.

Spatial Distribution of MGWR Regression Coefficients
Table 4 shows the descriptive statistics of the standardized regression coefficients of the explanatory variables for the MGWR model.The absolute value of the mean standardized coefficient is the highest for road density, which was 1.521, followed by silt content, clay content, TN, and TP, which were 0.383, 0.305, 0.286, and 0.219, respectively.The absolute values of the mean standardized coefficient of other factors were relatively small, all below 0.200.The significant number of explanatory variables was also counted.The silt and clay contents were substantial at all 200 locations.The RD, land use, TP, and pH were significant at 116 to 198 locations.Dis_IE, TN, and SOM were only significant at several locations, whereas other variables were not significant at any locations.The local R 2 of the MGWR model ranged from 0.445 to 0.631.Its spatial distribution is shown in Figure 5b.The local R 2 was highest in the western region and decreased from west to east.

Spatial Distribution of MGWR Regression Coefficients
Table 4 shows the descriptive statistics of the standardized regression coefficients of the explanatory variables for the MGWR model.The absolute value of the mean standardized coefficient is the highest for road density, which was 1.521, followed by silt content, clay content, TN, and TP, which were 0.383, 0.305, 0.286, and 0.219, respectively.The absolute values of the mean standardized coefficient of other factors were relatively small, all below 0.200.The significant number of explanatory variables was also counted.The silt and clay contents were substantial at all 200 locations.The RD, land use, TP, and pH were significant at 116 to 198 locations.Dis_IE, TN, and SOM were only significant at several locations, whereas other variables were not significant at any locations.Figure 6 displays the spatial distribution of significant standardized coefficients.Variables that were not significant at any location are not shown.Dis_IE was only significant at some points in the southeastern part of the study area.The highest absolute values of RD, silt content, and SOM all occurred in the southeast of the region and decreased along the southeast-northwest direction.However, the RD and SOM were not significant in the central region.The coefficients of TN and TP showed a decreasing trend from north to south, but TN was only significant in the northwest of the region.The spatial distribution of the clay content coefficient was symmetrical with the spatial distribution of silt content.It decreased in the northeast-southwest direction.The coefficient values of pH and LU had relatively small spatial differences, with an overall trend of high values in the west and east.
Figure 6 displays the spatial distribution of significant standardized coefficients.Variables that were not significant at any location are not shown.Dis_IE was only significant at some points in the southeastern part of the study area.The highest absolute values of RD, silt content, and SOM all occurred in the southeast of the region and decreased along the southeast-northwest direction.However, the RD and SOM were not significant in the central region.The coefficients of TN and TP showed a decreasing trend from north to south, but TN was only significant in the northwest of the region.The spatial distribution of the clay content coefficient was symmetrical with the spatial distribution of silt content.It decreased in the northeast-southwest direction.The coefficient values of pH and LU had relatively small spatial differences, with an overall trend of high values in the west and east.To sum up, the coefficients of industrial and traffic factors were higher in the southeastern part of the study area, whereas the coefficients of agricultural factors were higher in the northern part of the region.The coefficients of soil environmental factors varied less spatially in the study area.

Local Influencing Factors and Sources of As
The absolute values of the coefficients of each variable were sorted from highest to lowest, and the primary and secondary influencing factors for As content at each sampling point were determined (Figure 7).The primary influencing factor of As content for most points is road density.The primary influencing factor of As content for some samples in the southwestern region was SOM, silt content, or TN.The second influencing factor of As content for most points was SOM, silt content, or TN, whereas the second influencing factor of As content for only a small number of points was clay content, Dis_IE, or RD. Figure 6.Spatial distribution maps of significant standardized coefficients for the MGWR model.Dis_IE: distance from the industrial enterprise; RD: road density; LU: land use type; TN: total nitrogen; TP: total phosphorus; SOM: soil organic matter content.
To sum up, the coefficients of industrial and traffic factors were higher in the southeastern part of the study area, whereas the coefficients of agricultural factors were higher in the northern part of the region.The coefficients of soil environmental factors varied less spatially in the study area.

Local Influencing Factors and Sources of As
The absolute values of the coefficients of each variable were sorted from highest to lowest, and the primary and secondary influencing factors for As content at each sampling point were determined (Figure 7).The primary influencing factor of As content for most points is road density.The primary influencing factor of As content for some samples in the southwestern region was SOM, silt content, or TN.The second influencing factor of As content for most points was SOM, silt content, or TN, whereas the second influencing factor of As content for only a small number of points was clay content, Dis_IE, or RD.Based on the relative importance of source-related variables, the main sources of As at each point were inferred and are exhibited in Figure 8.The traffic source characterized by the road density was the primary or secondary source of As for almost all locations.The agricultural source, represented by TN and TP, was the second source for most points and the primary source for a small number of points located in the southwest of the region.The primary or secondary source, with only a few points located in the southeast of the region, was industrial emissions, characterized by Dis_IE.
To sum up, the main influencing factors and sources of As varied greatly at different locations.Traffic, fertilization, and industrial emissions were the primary sources of As in the study area, and soil environmental factors were crucial for soil As accumulation.Based on the relative importance of source-related variables, the main sources of As at each point were inferred and are exhibited in Figure 8.The traffic source characterized by the road density was the primary or secondary source of As for almost all locations.The agricultural source, represented by TN and TP, was the second source for most points and the primary source for a small number of points located in the southwest of the region.The primary or secondary source, with only a few points located in the southeast of the region, was industrial emissions, characterized by Dis_IE.
To sum up, the main influencing factors and sources of As varied greatly at different locations.Traffic, fertilization, and industrial emissions were the primary sources of As in the study area, and soil environmental factors were crucial for soil As accumulation.

Impact of Environmental Variables on As Accumulation
Based on the mean absolute standardized coefficients, the road density, silt content, clay content, TN, and TP were the main influencing factors of As content, and traffic and agricultural sources were the main sources of As.
Kuancheng District is the transportation hub of Changchun City, with two railway stations, long-distance bus stations, and a high road density within the territory [32,34].With rapid urbanization, the road density and residential car ownership in the study area are rapidly increasing.Given that road surface materials and vehicle components contain As, an increasing road density will lead to more As accumulation in soil [45].Zechmeister et al. (2005) [46] found that friction between heavy-duty vehicle tires and the road is particularly conducive to the accumulation of pollutants such as As in roadside soil and moss.Many studies have observed arsenic pollution in the soil on both sides of the road [47][48][49][50].In this study, we found that As was positively related to the road density, which is consistent with the research of Seker et al. (2022) (Seker,et al. [51]) and Qiao et al. (2022) (Qiao,et al. [52]).
Using arsenical animal feed and phosphorus fertilizers in agricultural production is another important source of soil arsenic [53][54][55][56].For example, arsenic compounds such as roxarsone are commonly used as feed additives in livestock farming and enter the soil through livestock manure [57,58].In addition, several studies found As in phosphate and compound fertilizers [59,60].Using these feeds and fertilizers increases total nitrogen, total phosphorus, and arsenic contents in the soil.As a result, TN and TP showed a positive relation with As content.
The emissions from non-ferrous metal smelting and fossil fuel combustion are commonly considered the most important sources of arsenic pollution [61,62].Arsenic diffuses outward with industrial waste gas and wastewater.As a result, As is often negatively correlated with the distance to a factory [63,64].The present study also found such a relationship, but the mean absolute standardized coefficient of Dis_IE was relatively small.The reason may be that most enterprises in the research area are food processing, weaving, and clothing factories, while mining and metal smelting factories are fewer.
Soil properties, including pH, clay content, and silt content, affect As accumulation by affecting the adsorption/desorption of arsenic [65,66].Specifically, the increase in pH

Impact of Environmental Variables on As Accumulation
Based on the mean absolute standardized coefficients, the road density, silt content, clay content, TN, and TP were the main influencing factors of As content, and traffic and agricultural sources were the main sources of As.
Kuancheng District is the transportation hub of Changchun City, with two railway stations, long-distance bus stations, and a high road density within the territory [32,34].With rapid urbanization, the road density and residential car ownership in the study area are rapidly increasing.Given that road surface materials and vehicle components contain As, an increasing road density will lead to more As accumulation in soil [45].Zechmeister et al. (2005) [46] found that friction between heavy-duty vehicle tires and the road is particularly conducive to the accumulation of pollutants such as As in roadside soil and moss.Many studies have observed arsenic pollution in the soil on both sides of the road [47][48][49][50].In this study, we found that As was positively related to the road density, which is consistent with the research of Seker et al. (2022) (Seker,et al. [51]) and Qiao et al. (2022) (Qiao, et al. [52]).
Using arsenical animal feed and phosphorus fertilizers in agricultural production is another important source of soil arsenic [53][54][55][56].For example, arsenic compounds such as roxarsone are commonly used as feed additives in livestock farming and enter the soil through livestock manure [57,58].In addition, several studies found As in phosphate and compound fertilizers [59,60].Using these feeds and fertilizers increases total nitrogen, total phosphorus, and arsenic contents in the soil.As a result, TN and TP showed a positive relation with As content.
The emissions from non-ferrous metal smelting and fossil fuel combustion are commonly considered the most important sources of arsenic pollution [61,62].Arsenic diffuses outward with industrial waste gas and wastewater.As a result, As is often negatively correlated with the distance to a factory [63,64].The present study also found such a relationship, but the mean absolute standardized coefficient of Dis_IE was relatively small.The reason may be that most enterprises in the research area are food processing, weaving, and clothing factories, while mining and metal smelting factories are fewer.
Soil properties, including pH, clay content, and silt content, affect As accumulation by affecting the adsorption/desorption of arsenic [65,66].Specifically, the increase in pH causes the soil to carry more negative charges and repel arsenate, leading to the desorption of solid arsenic in the soil [67,68].This increase will lead to an increase in available arsenic content and a decrease in soil arsenic content.As the soil particles become smaller, the specific surface area increases, and the charge density increases, which enhances the soil's ability to adsorb arsenic [69,70].As a result, pH was negatively related to As content, while clay and silt contents were positively related to As content.

Scale Effect of Explanatory Variables
The results of MGWR show great differences in the optimal bandwidth for different explanatory variables, indicating that the adaptative bandwidth is the main reason that MGWR outperforms GWR.This result also reflects the significant differences in the range of influence of different explanatory variables on As, which may be related to the spatial heterogeneity and inherent properties of the explanatory variables.
Natural factors such as elevation, TWI, pH, clay content, and silt content have larger bandwidths, whereas road density and Dis_IE have smaller bandwidths.On the one hand, natural factors typically control the spatial distribution of soil attributes on a larger scale, whereas human activity factors typically affect soil spatial patterns on a smaller scale [23,24].On the other hand, the number of local factories is small, and no heavily polluting industries are identified.Thus, the pollution impact range of industrial enterprises is relatively small.
The proper bandwidth of SOM was the smallest of all variables, only 43.The reason may be that farmers often adopt different cropping strategies.After long-term cultivation, the spatial dependence of SOM decreases, while spatial heterogeneity increases (Figure 2i) [71,72].Therefore, the impact range of SOM on As is relatively small.These results emphasize the importance of considering adaptative bandwidths for independent variables and confirm the effectiveness of MGWR.

Limitations
Although the 14 explanatory variables can explain 55.9% of As variation, some important factors were not included in the MGWR model due to data acquisition limitations.For example, iron minerals, aluminum oxides, and soil sulfur cycling all affect the adsorption/desorption of As [65,73,74].In future research, combining parent materials, high-resolution lithology, and Fe, Al, and S contents may further enhance the R 2 of the regression model.
The MGWR model assumes that the relationship between As and a covariate was linear.However, several studies have used machine learning methods to reveal non-linear correlations between soil properties and covariates [14][15][16].How to combine MGWR with machine learning to obtain an MGWR-machine learning model remains to be explored.

Conclusions
This study explored the effectiveness of MGWR, revealed the spatial non-stationary relationship between As and environmental variables, and determined the local impact factors and pollution sources of As in the Kuangcheng District.The results showed that 49% of the samples had arsenic content exceeding the background value, and these samples were mainly distributed in the central and southern parts of the region.The optimal bandwidth of different variables varies greatly in the MGWR model.MGWR outperformed GWR and MLR, and its R 2 reached 0.559.The MGWR model revealed spatially heterogeneous relationships between As and the explanatory variables.In particular, the road density, TN, clay, and silt content were primary or secondary influencing factors at most points, whereas Dis_IE and SOM were secondary influencing factors at only a few points.The main pollution sources of As were thus inferred as traffic and fertilizer, and industrial emission was also included in the southern region.These findings highlight the importance of considering adaptative bandwidths for independent variables and demonstrate the effectiveness of MGWR in exploring local sources of soil pollutants.

Figure 1 .
Figure 1.Location of study area and spatial distribution of sampling points.

Figure 1 .
Figure 1.Location of study area and spatial distribution of sampling points.

Figure 2 .
Figure 2. Spatial distribution maps of possible explanatory variables.(a) Spatial distribution map of industrial enterprises and road density.(b) Spatial distribution map of population density.(c) Spatial distribution map of total nitrogen.(d) Spatial distribution map of total phosphorus.(e) Spatial distribution map of soil types.(f) Spatial distribution map of silt content.(g) Spatial distribution map of clay content.(h) Spatial distribution map of soil organic matter content.(i)

Figure 2 .
Figure 2. Spatial distribution maps of possible explanatory variables.(a) Spatial distribution map of industrial enterprises and road density.(b) Spatial distribution map of population density.(c) Spatial distribution map of total nitrogen.(d) Spatial distribution map of total phosphorus.(e) Spatial distribution map of soil types.(f) Spatial distribution map of silt content.(g) Spatial distribution map of clay content.(h) Spatial distribution map of soil organic matter content.(i) Spatial distribution map of pH.(j) Spatial distribution map of elevation.(k) Spatial distribution map of topographic wetness index.TN: total nitrogen; TP: total phosphorus; SOM: soil organic matter content; TWI: topographic wetness index.

Figure 3 .
Figure 3. Spatial distribution maps of As content and contamination situation.

Figure 4 .
Figure 4. Optimal bandwidths of explanatory variables for GWR and MGWR models.Dis_IE: distance from an industrial enterprise; RD: road density; LU: land use type; PD: population density; TN: total nitrogen; TP: total phosphorus; ST1, 2: dummy variable for soil type; SOM: soil organic matter content; TWI: topographic wetness index.

Figure 3 .
Figure 3. Spatial distribution maps of As content and contamination situation.

Toxics 2024 , 18 Figure 3 .
Figure 3. Spatial distribution maps of As content and contamination situation.

Figure 4 .
Figure 4. Optimal bandwidths of explanatory variables for GWR and MGWR models.Dis_IE: distance from an industrial enterprise; RD: road density; LU: land use type; PD: population density; TN: total nitrogen; TP: total phosphorus; ST1, 2: dummy variable for soil type; SOM: soil organic matter content; TWI: topographic wetness index.

Figure 4 .
Figure 4. Optimal bandwidths of explanatory variables for GWR and MGWR models.Dis_IE: distance from an industrial enterprise; RD: road density; LU: land use type; PD: population density; TN: total nitrogen; TP: total phosphorus; ST1, 2: dummy variable for soil type; SOM: soil organic matter content; TWI: topographic wetness index.

Figure 5 .
Figure 5. Results of model evaluation and spatial distribution map of local R 2. (a) Results of model evaluation.(b) Results of spatial distribution map of local R 2 .

Figure 5 .
Figure 5. Results of model evaluation and spatial distribution map of local R 2. (a) Results of model evaluation.(b) Results of spatial distribution map of local R 2 .

Figure 6 .
Figure 6.Spatial distribution maps of significant standardized coefficients for the MGWR model.Dis_IE: distance from the industrial enterprise; RD: road density; LU: land use type; TN: total nitrogen; TP: total phosphorus; SOM: soil organic matter content.

Figure 7 .
Figure 7. Spatial distribution maps of main influencing factors at each sampling point.(a) Spatial distribution map of the primary influencing factor.(b) Spatial distribution map of the second influencing factor.Dis_IE: distance from the industrial enterprise; SOM: soil organic matter content; TN: total nitrogen.

Figure 7 .
Figure 7. Spatial distribution maps of main influencing factors at each sampling point.(a) Spatial distribution map of the primary influencing factor.(b) Spatial distribution map of the second influencing factor.Dis_IE: distance from the industrial enterprise; SOM: soil organic matter content; TN: total nitrogen.

Figure 8 .
Figure 8. Spatial distribution maps of the main sources of As at each sampling point.(a) Spatial distribution map of the primary source.(b) Spatial distribution map of the second source.

Figure 8 .
Figure 8. Spatial distribution maps of the main sources of As at each sampling point.(a) Spatial distribution map of the primary source.(b) Spatial distribution map of the second source.

Table 1 .
Possible influencing factors of As and their data sources.

Table 2 .
Descriptive statistics of measured soil properties.

Table 3 .
One-way ANOVA results of soil and land use types.
a,b indicate significant differences via least significant difference test.

Table 4 .
Standardized regression coefficients of explanatory variables for the MGWR model.

Table 4 .
Standardized regression coefficients of explanatory variables for the MGWR model.