Hierarchical Spatially Varying Coefﬁcient Process Regression for Modeling Net Anthropogenic Nitrogen Inputs (NANI) from the Watershed of the Yangtze River, China

: The increasing discharge of nitrogen nutrients into watersheds calls for assessing and predicting nitrogen inputs, as an important basis for formulating management strategies. The traditional net anthropogenic nitrogen inputs (NANI) budgeting model relies on 45 predictor variables, for which data are sourced from local or national statistical yearbooks. The large number of predictor variables involved makes NANI accounting difﬁcult, and the missingness of data reduces its accuracy. This study aimed to build a prediction model for NANI based on as few predictor variables as possible. We built a prediction model based on the last 30 years of NANI data from the watershed of the Yangtze River in China, with readily available and complete socio-economic predictor variables (per gross domestic product, population density) through a hierarchical spatially varying coefﬁcient process model (HSVC), which exploits underlying spatial associations within 11 sub-basins and the spatially varying impacts of predictor variables to improve the accuracy of NANI prediction. The results showed that the hierarchical spatially varying coefﬁcient model performed better than the Gaussian process model (GP) and the spatio-temporal dynamic linear model (DLM). The predicted NANIs within the entire catchment of the Yangtze River in 2025 and in 2030 were 11,522.87 kg N km − 2 to 12,760.65 kg N km − 2 , respectively, showing an obvious increasing trend. Nitrogen fertilizer application was predicted to be 5755.1 kg N km − 2 in 2025, which was the most signiﬁcant source of NANI. In addition, the point prediction and 95% interval prediction of NANI in the watershed of the Yangtze River for 2025 and 2030 were also provided. Our approach provides a simple and easy-to-use method for NANI prediction.


Introduction
Nitrogen is a common element in nature, and controls the processes, functions, and diversity of ecosystems [1]. As a result of the industrial revolution and human activities, new sources of nitrogen, such as fertilizer application, crop cultivation, nitrogen deposition, and food inputs, have had a significant impact on nitrogen cycling at the global and the regional level [2][3][4][5]; in some cases, natural nitrogen fixation is exceeded [6]. However, when nitrogen inputs exceed the nitrogen uptake and storage capacity of regional ecosystems, the remaining nitrogen enters watersheds, with long-term negative impacts [7,8], such as groundwater contamination, soil acidification, biodiversity decline, water eutrophication, and hypoxia [9]. The increasing nitrogen concentrations and fluxes in watersheds have become major factors contributing to the eutrophication of lakeshores, estuaries, and coastal areas [4][5][6][7][8][9].
Driven by resource scarcity, population expansion, and economic growth, human activities greatly interfere with the natural nitrogen cycle [10], and it has been shown that the nitrogen input from human activities has exceeded the natural nitrogen fixation in most river watersheds in China, such as the watershed of the Yangtze River and the watershed of the Pearl River [11]. For example, in 2010, 61.6% of the 26 state-controlled key lakes (reservoirs) in China had a water quality of V (the lowest grade), with total nitrogen being one of the main pollutants; in addition, inorganic nitrogen has become one of the common pollution indicators in the major sea areas of China [12]. Overall, excessive nitrogen input from human activities has become an important factor restricting environmental sustainability in China.
The high load of anthropogenic nitrogen input has attracted widespread attention from researchers, organizations, and the international scientific community, to study the anthropogenic biogeochemical cycling of nitrogen for sustainable development [13,14]. Quantitative source analysis of nitrogen in watershed ecosystems has become an international research hotspot, guiding studies on the optimization of nitrogen management in ecosystems at the watershed scale [15,16].
The most widely used model for accounting for anthropogenic nitrogen inputs into watersheds is the NANI model proposed by [17] and consisting of four components: atmospheric nitrogen deposition, nitrogen fertilizer consumption, crop nitrogen fixation, and food/feed imports, which represent the main types of nitrogen inputs due to human activities. The calculation results can be used to characterize the impacts of nitrogen produced by human activities on watersheds. The NANI model and its improved methods are widely used at various watershed and regional scales in the U.S., Europe, and Asia [18][19][20][21]. Han et al. estimated NANI for 99% of the world from 1961 to 2009 and suggested that the global NANI levels dramatically increased due to higher net nitrogen inputs in Asia; furthermore, the NANI values in Asia, Europe, and North America were above average [2]. To calculate NANI across United States watersheds, a consistent method was proposed and showed that yield-based estimation of NANI differed significantly from area-based estimation [18]. Hong et al. investigated regional variation parameters for the NANI Calculator Toolbox and observed significant regional variation in NANI across the Baltic Sea Basin, where the associated relationship with riverine nitrogen fluxes suggested that NANI contributed to a portion of the riverine nutrient fluxes [19]. Swaney et al. calculated NANI in Indian watersheds, showing that agricultural fertilizer is the main source of NANI, followed by crop nitrogen fixation, but the regression relationship between NANI and riverine nitrogen fluxes is problematic, due to the limited availability of river data [20]. A study about the spatiotemporal differences of NANI in mainland China from 1981 to 2009 revealed that the NANI had more than doubled [21].
To guide nitrogen management at the macroscopic scale, assessing the impacts of human activities on nitrogen inputs into watersheds is of great significance. The indicators involved in the NANI accounting model include the urban population, the rural population, the number of livestock, crop yield, nitrogen fertilizer consumption, compound fertilizer use, planting area, and energy consumption, among others, with a total of 45 indicators [22]. Data for these indicators are usually obtained from local or national statistical yearbooks. At longer time scales, the data usually have heavy missingness because of changes in the accounting of statistical yearbooks or omissions in data aggregation. The large number of indicators involved increases the difficulty of NANI accounting, and the missing data make NANI accounting less accurate. In this paper, we attempt to build a NANI prediction HSVC model based on as few predictor variables as possible for the Yangtze River Basin, China.The HSVC model is a generalization of the variable coefficient model, aiming to determine the spatial variations of data by allowing the coefficients to be functions of location. Since the model has obvious indigenous effects on analyzing spatial heterogeneity, and the coefficient function has a strong flexibility, it has been widely used in environmental science, ecology, and epidemiology [23][24][25][26].
In this study, a hierarchical spatially varying coefficient process regression (HSVC) model was built to elaborate the mechanisms of the predictor variables per gross domestic product (PGDP) and population density (PD) for NANI, and to forecast the annual values of NANI within the watershed of the Yangtze River. The main objectives were as follows: (1) to analyze the spatiotemporal variations of NANI, PGDP, and PD; (2) to explore the relationship between watershed NANI, PGDP, and PD, and to compare the fitting and prediction accuracy of HSVC, GP, and DLM models; (3) to predict the values and 95% interval predictions of NANI for the years 2025 and 2030. This study not only provides a simple and easy-to-use method for NANI prediction but also provides a watershed nitrogen management strategy, which is useful for nitrogen pollution control.

Study Area and Data Sources
The Yangtze River Basin, displayed in Figure 1, is located between 90 • -122 • E and 24 • -35 • N, covering an area of approximately 1.8 million km 2 [27]. It originates southwest of the Tang  In previous research, Cui et al. [28], studied the linear relationship between NANI, per capita GDP (PGDP), and population density (PD). In addition, the correlation coefficients between PGDP and PD were below 0.7, and we did not consider collinearity, as per reference [29]. Therefore, we directly adopted PGDP and PD as explanatory variables. The data for modeling were derived from a published master's thesis [22], including the response variable NANI and its four components; the predictor variables GDP; and the population in each subcatchment for the years 1980, 1985, 1990, 1995, 2000, 2005, 2010, and 2012, with a total of 88 samples. To ensure normality, NANI was modeled on the square root scale, PGDP and PD were centralized. Model fitting was performed using data between 1980 and 2010; for validation, data from 2012 were used.

Hierarchical Spatially Varying Coefficient Process Regression
In the Bayesian hierarchical setting, the hierarchical spatially varying coefficient model is specified, as follows [30,31]: Let Y(u i , t), i = 1, · · · , N, t = 1, · · · , T be the response variable of the observed data at site u i for time t, X j (u i , t), j = 1, · · · , p denote the predictor variables. The HSVC model has three stages, with the first stage equation given by where µ(u i , t) is the underlying mean process, ε(u i , t) is an independent error term of the model, assumed to follow a normal distribution ε(u i , t) ∼ N(0, σ 2 ε ). In the second stage, specifying the underlying mean process µ(u i , t) as where γ(u i , t) is the mean surface modeled by the following equation: here, α is the global intercept,β j = β j0 + β j (u i ) and β j0 , β j (u i ) are the fixed coefficient and the spatially varying coefficient at site u i , respectively. η(u i , t) is a spatially correlated random effect assumed to follow a Gaussian process with mean 0 and spatial correlated covariance Σ η = σ 2 η H η (φ). In particular, for any t, the covariance of η(u i , t) and η(u j , t) is where σ 2 η is an unknown invariant variance parameter, H η (φ) is an exponential covariance function, which depends on the distance u i − u j between sites u i and u j , φ is the parameter of correlation, controlling the decay rate, so the larger φ is, the faster (Σ η ) ij decrease.
are therefore written in vector and matrix notations: where β 0 = (α, β 10 , · · · , β p0 ) is an unknown p + 1 dimensional vector of fixed parameters, is an unknown N dimensional spatially varying parameter vector. In this study, we assume β as an unknown hyper-parameter and β 0 ∼ N(µ 0 , Σ 0 ) in the last stage [26,32]. The term X t is the N × (p + 1) matrix and defined as the term Z jt is an N × N diagonal matrix and defined as To estimate the parameters of HSVC model in Equation (4), it is necessary to obtain the posterior distribution of the parameters conditional on the observed data based on Bayes rule, which is as follows: is the prior distribution of the parameters θ. Let n = NT, Y = {Y 1 , · · · , Y T } denote the observed data, the posterior distribution of the parameters θ given by the observed data Y can be written as follows: , and π(φ) are the prior distribution.

Temporal Forecasting and Model Assessment
Suppose we forecast Y(u i , T + 1), the value at location u i for time point T + 1. In terms of (1) and (2), Y(u i , T + 1) has a normal distribution As a result, the posterior predictive distribution of Y(u i , T + 1) conditional on Y is followed by To derive the distribution π(Y(u i , T + 1)|Y), some assumptions need to be made for the terms on the right-hand side of (6). The terms µ(u i , T + 1) and µ T+1 follow a multivariate normal distribution, , where X T+1 and µ T+1 are X t and µ t with t being replaced by T + 1. Subsequently, the conditional distribution π(µ(u i , ), S 12 is the 1 × N vector with the ith entry as exp(−φ u i − u j ) and S 12 = S 21 . Furthermore, the coefficients β j (u i ) and β j (u) have the following joint distribution, Hence, we obtain the conditional distribution β j (u i )|θ: The conditional distribution π(θ|Y) is provided in (5), where the parameters of the posterior distribution are not tractable, but sampling from the posterior predictive distribution π(Y(u i , T + 1)|Y) using the Markov chain Monte Carlo(MCMC) algorithm is straightforward [33][34][35].
In this paper, model performance was assessed using predictive model choice criteria (PMCC) [36], which are expressed as follows: where Y(u i , t) new is a new observation, sampled from the distribution π(Y(u i , t) new |Y) for the given data Y(u i , t). The smaller the value of PMCC, the better the model fitting performance. Model prediction capability was assessed using the root mean square error (RMSE), mean absolute error (MAE), and relative bias (RB) [37], which are defined as follows: whereŶ(u i , t) is the prediction value, which could be the mean of the posterior distribution π(Y(u i , t)|Y), andȲ is the mean of the observations, a model with a lower value is preferred for prediction.

Spatial Correlation Test
Global Moran's I index was determined, to examine the spatial correlation of NANI across the Yangtze River Basin. Based on the locations and values of each sub-basin simultaneously, global Moran's I measures spatial autocorrelation and evaluates whether the Yangtze River Basin shows a positive correlation, negative correlation, or random correlation [38]. The global Moran's I for spatial autocorrelation across the Yangtze River Basin of the t year is given as follows: where w ij is the spatial weight between sub-basin i and j, if i and j are adjacent, w ij = 1; otherwise, The z-score is computed as follows: According to [39], E(I t ) = − 1 N−1 and lim N→∞ Var( Figure 2a shows the year-by-year changes in the overall spatial pattern of NANI in the Yangtze River Basin. To ensure the clarity of the standard deviation ellipse trajectory, only the elliptical trajectories of NANI in 1980,1990,2000, and 2010 are shown. The dynamic feature is that the average center of NANI moved from east to west during 1980-2010. To analyze the spatial variations of NANI, PGDP, and PD, the average NANI, PGDP, and PD values in each subcatchment were calculated. Figure 2b shows that NANI, PGDP, and PD had significant spatial variations in 11 subcatchments. The spatial differences in the PDGP and PD distributions were highly influenced by the NANI distribution across the catchment of the Yangtze River. The TH subcatchment had the highest PDGP/PD and the highest NANI values, and the JSJ subcatchment had the lowest PGDP/PD and the lowest NANI values. The results of Global Moran's I for NANI across the catchment of the Yangtze River from 1980 to 2012 are listed in Table 1. The values of global Moran's I ranged between 0.211 and 0.304, and all p-values were below 0.05, indicating that the NANI of the basin was significantly positively spatially correlated at a 5% significance level.

Exploratory Analysis of Spatiotemporal Availability and Variability of the Observations
In the watershed of the Yangtze River, the total NANI gradually increased from 3514.7 kg N km −2 in 1980 to 8138.2 kg N km −2 in 2012. However, in the same period, the mean values of NANI were 2360 km −2 in 1980 and 5013 kg N km −2 in 2009 in mainland China [21], lower than those for the Yangtze River Basin. At the same time, the temporal variations in NANI, PGDP, and PD are shown in Figure 3, illustrating that they steadily increased from 1980 to 2012. The PGDP grew at a rate of approximate 2.5 times per year, which was the highest, whereas PD increased at a rate of approximate 5% each year, which was the lowest.

Model Fitting Based on Hierarchical Spatially Varying Coefficient Process Model
To complete the estimation of model parameters, for the parameters σ 2 ε , σ 2 η , and σ 2 β , we assigned an inverse Gamma prior with hyperparameters 1 and 2. Considering a normal distribution with hyperparameters Σ 0 = 10 4 I 3 and µ 0 = 0 for parameter β 0 . For the parameter φ, assuming a Gamma distribution, the location parameter and scale parameter were set to 2 and 1, respectively [26,40]. The calculations were performed using the software R and the package spTDyn proposed by [32]. The MCMC chain was run for 5000 iterations, with the first 1000 as burn-in iterations. Satisfactory convergence of the chain was obtained in 5000 iterations.
The estimated results of the parameters based on fitting data for the HSVC model are displayed in Table 2, which shows that the predictor variables PD and PGDP had a significant and positive effect on NANI. The variance estimators of the random error term σ 2 ε and the spatially correlated error term σ 2 η were 0.0302 and 61.2185, respectively. Thus, the variability in NANI could be attributed to the spatially correlated error term. This indicates the necessity of using the HSCV modeling approach for NANI. The estimated spatial decay parameter φ was 0.4079, implying that the effective spatial correlation distance in NANI among sub-basins was 8 km, determined by exp(−φd 0 ) ≈ 0.05 [41].  Figure 4 shows the spatially varying coefficient values for PGDP and PD by sub-basin. As expected, the coefficient estimators varied by location, which also validated the HSVC model. The positive effect of PGDP on NANI was significant across sub-basins JSJ, DTH, and MM, and the negative effect of PGDP on NANI was seen across sub-basins WJ and JLJ. PD had a positive effect on NANI and was significant across sub-basins WJ, UM, and JLJ, and in the DTH and HJ sub-basins, PD had a significant positive effect on NANI.

Forecasting Based on the Hierarchical Spatially Varying Coefficient Process Model
A fitted HSVC model was applied for the temporal prediction of NANI. For the purpose of comparison among prediction models, we also fitted a GP model and a DLM with the same response and predictor variables. For more details of these two models, see previous studies [34,42]. Based on Table 3, the PMCC for the HSVC model was 4952.56, and those for the other two models were 23,125.30 and 1,762,757.79, indicating a better performance of the HSVC model in model fitting. To access the performance in prediction, we predicted the NANI for the 11 sub-basins in 2012 based on the HSVC model. In Table 4, the results show that the RMSE, MAE, and RB values were lower for the HSVC model. Thus, the HSVC model had a better prediction than the other two models.  Plotting the validation values against predicted values based on validation data is another way to validate the prediction accuracy. As shown in Figure 5, for the HSVC model, most of the validation values ranged between 4000 and 21,000. The predicted values in this interval had a higher accuracy. Based on the results, the HSVC model had a better prediction performance. We also predicted the future NANI in 2025 and 2030 using GDP and the natural rate of population growth as base conditions. In this study, GDP growth was 5.35%, the natural growth rate of the population continued to decline to 1.76 per 1000 per year, and 2021 was the peak year of China's population [43,44]. For simplicity, we set the population size in 2030 at the same level as in 2025. The predicted PGDP and PD values in each sub-basin are given in Table 5.
By plugging the predicted PGDP and PD values for 2025 and 2030 into the fitted HSVC model, the future NANI values in each sub-basin were predicted (Table 6 and Figure 6). As shown in Figure 6, the spatial variability was well captured by the spatial distribution of the predicted NANI. High NANIs appeared in almost all eastern and northern subbasins, indicating that economically developed and densely populated areas tended to form agglomerations of high NANIs.
The NANI within the catchment of the Yangtze River was calculated as follows: where area denotes the entire area of the watershed of the Yangtze River, area i denotes the area of the ith subcatchment, and NANI i denotes the predicted NANI of the ith subcatchment. As a result, the predicted NANIs within the entire catchment of the Yangtze River in 2025 and in 2030 were 11,522.87 kg N km −2 and 12,760.65 kg N km −2 , respectively. By mimicking the above procedure of the prediction, the NANI values of four components was also predicted, namely the atmospheric nitrogen deposition, nitrogen fertilizer application, crop nitrogen fixation, and food/feed imports in 2025 (Figure 7). Based on this, NANI from atmospheric nitrogen deposition was the main source, whereas that from crop nitrogen fixation was the most insignificant.  Although these predictions were affected by the uncertainty of the predicted values of PGDP and PD in each sub-basin in 2025 and 2030, these predicted results roughly reflect the potential impact of PGDP and PD on NANI in the Yangtze River Basin.The response of the NANI yield can provide a reference for protecting the estuarine and coastal environment.

Characteristics of the Predictor Variables
In this study, we modeled NANI based on as few predictor variables as possible, i.e., PGDP and DP across the Yangtze River Basin for 1980-2012 through a HSVC model. The NANI, PGDP, and PD in the Basin showed remarkable spatiotemporal variations.
Specifically, PGDP and PD had a positive effect on NANI. This is consistent with the results of Cui et al. [28], where PGDP and PD contributed to the increase in NANI in the watershed of the Pearl River. However, in some sub-basins such as WJ and DTH, the spatially varying coefficient values for PGDP and PD were negatively related to NANI, respectively. As a result, the predicted NANIs within the WJ sub-basin decreased from 2025 to 2030, this conclusion contradicts the fact that NANl increases over time.

Limitations and Future Research
The limitations of this research are as follows: (1) Due to the data available for analysis being eight years, we only used one year of data from 2012 for model validation. Therefore, the validation measures, RMSE, MAE, and RB were calculated for a fixed time, which may have caused the prediction to be inaccurate. (2) When predicting the future NANI in 2025 and 2030, the GDP growth and the natural rate of population growth were based on previous expert evaluations [43,44]. However, GDP growth and the natural rate of population growth can be divided into various scenarios, and we only used one of them for prediction, which is an approximate way to achieve this.NANI can be influenced by various factors, such as land-use factors and social factors [15], which still need further investigation. Further research could explore the explanatory variables of NANI, to gain a deeper understanding of the factors influencing NANI and to make more accurate predictions. Additionally, exploring more scientific model and research methods, such as geographically and temporally weighted regression, could provide a more comprehensive and profound understanding of modeling NANI.

Implications for Watershed Nitrogen Management
With the ongoing development of globalization in China, the Yangtze River Basin will play a increasing role in the future of the regional economy. The prediction of NANI is of great significance for nitrogen management in watersheds, since nitrogen concentrations in watersheds have become an indicator of ecological restoration and eutrophication mitigation. Based on the results of this study, watershed nitrogen management strategies could be implemented. First, partially organic alternatives to nitrogen fertilizer should be explored, given the high predicted value of nitrogen fertilizer application. Second, more attention should be given to atmospheric nitrogen deposition and reducing the amount of nitrogen emissions. Finally, from the perspective of nitrogen pollution control, effective riparian buffer strips and wetland treatment techniques could be built downstream of NANI hotspot areas, to intercept nitrogen pollution [45].

Conclusions
The purpose of this study was to use a hierarchical spatially varying coefficient process regression (HSVC) model to elaborate the mechanisms of the predictor variables per gross domestic product (PGDP) and population density (PD) for NANI, as well as to forecast the annual values of NANI within the watershed of the Yangtze River. First, we analyzed the spatiotemporal availability and variability of NANI, PGDP, and PD, and a significant (p < 0.05) spatial autocorrelation of NANI was diagnosed using global Moran's I from 1980 to 2012 in the watershed of the Yangtze River. Then, through a case study of NANI within the watershed of the Yangtze River, the performance of the proposed NANI prediction model was compared with those of the Gaussian process (GP) model and the spatiotemporal dynamic linear model (DLM). It was found that the HSVC model could extract more information on the relationships among NANI and predictor variables, thus improving the fitting and prediction accuracy. Finally, we predicted the NANI values for the years 2025 and 2030, and the values increased from 11,522.87 kg N km −2 to 12,760.65 kg N km −2 . In addition, the point predictions and 95% interval predictions of NANI within the watershed of the Yangtze River for the years 2025 and 2030 were provided. This paper provides a simple and easy-to-use method for NANI prediction.