Application of the Regression-Augmented Regionalization Approach for BTOP Model in Ungauged Basins

: Ten years after the Predictions in Ungauged Basins (PUB) initiative was put forward, known as the post-PUB era (2013 onwards), reducing uncertainty in hydrological prediction in ungauged basins still receives considerable attention. This integration or optimization of the traditional regionalization approaches is an effective way to improve the river discharge simulation in the ungauged basins. In the Jialing River, southwest of China, the regression equations of hydrological model parameters and watershed characteristic factors were ﬁrstly established, based on the block-wise use of TOPMODEL (BTOP). This paper explored the application of twelve regionalization approaches that were combined with the spatial proximity, physical similarity, integration similarity, and regression-augmented approach in ﬁve ungauged target basins. The results showed that the spatial proximity approach performs best in the river discharge simulation of the studied basins, while the regression-augmented regionalization approach is satisfactory as well, indicating a good potential for the application in ungauged basins. However, for the regression-augmented approach, the number of watershed characteristic factors considered in the regression equation impacts the simulated effect, implying that the determination of optimal watershed characteristic factors set by the model parameter regression equation is a crux for the regression-augmented approach, and the regression strength may also be an inﬂuencing factor. These ﬁndings provide meaningful information to establish a parametric transfer equation, as well as references for the application in data-sparse regions for the BTOP model. Future research should address the classiﬁcation of the donor basins under the spatial distance between the reference basin and the target basin, and build regression equations of model parameters adopted to regression-augmented regionalization in each classiﬁcation group, to further explore this approach’s potential.


Introduction
An ungauged basin is a relative concept, which mainly refers to a basin lacking high temporal and spatial resolution (higher observing site density), long historical records, and continuous ground precipitation and river flow data [1,2]. From a global perspective, due to the limited number of hydrological and meteorological observations, plenty of ungauged basins are widely distributed worldwide, especially in developing countries. Small and medium-sized basins are usually defined as basins with a catchment area of 50-3000 km 2 [3]. Compared with large basins, these small and medium-sized basins are more common and typical ungauged basins, due to natural conditions or social and economic constraints [1,4,5]. Moreover, discharge simulation is an essential part of water resources management [6], water environment protection [7] and risk management [8]. Therefore, it is a significant challenge for hydrologists to simulate discharge in small and medium watersheds with insufficient data.
In 2003, the International Association for Hydrological Sciences (IAHS) issued a hydrological ten-year plan called the Prediction in Ungauged Basins (PUB) for the next ten years (2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012) at the 23rd International Union of Geophysics and Geodesy. It aims to vigorously promote the development of hydrological forecasting in ungauged basins [1,4,9]. The implementation of this project has promoted the development of parametric regionalization approaches for ungauged basins [5,10,11]. Among them, there are three regionalization approaches widely used in river discharge simulation with relatively stable simulation results: regression approach [11][12][13], spatial proximity approach (SP) [11,14,15], and physical similarity approach (PS) [1,16,17]. The regression approach is the most popular approach in regionalization, and usually utilizes hydrological reference information of the basin to establish the regression equation between model parameters and watershed characteristic factors, such as basin area, the annual average rainfall, and so on [5,11]. SP is based on the climate, and underlying surface conditions change uniformly in the space between the donor basins and the target one [18]. At present, there have been many improvements to the SP, such as the use of inverse distance weighting (IDW) and Kriging for parameter interpolation in the participating basins [19,20]. PS was founded on the existence of similar watershed attributes among basins. For example, Oudin et al. (2008) [11] proposed an integration similarity approach (SP − PS), which takes the spatial distance as an attribute, showing certain advantages over the PS in the 913 studied watersheds in France.
Although nearly ten years have passed since the end of PUB in 2012, the research on regionalization approaches for river discharge simulation in the ungauged basin has never stopped. Arsenault and Brissette (2014) [13] introduced the regionalization approach of regression-augmented (RA) for the first time, essentially combining the multiple linear regression approach with the SP and PS approach, and found RA outperformed SP and PS regionalization approaches in 268 Canadian watersheds. Hong et al. (2017) [21] carried out spatial interpolation for a runoff simulation in the ungauged basin, based on the information diffusion model of the genetic algorithm in the Yellow River Basin. The results showed SP combined with a machine learning method has better simulation results than the SP combined with the IDW method or Cokriging method. In Tyrol, Pugliese et al. (2018) [22] simulated rainfall-runoff on a macro-scale. The results showed that the SP combined with the data assimilation method could significantly improve the runoff prediction of the basin with very low station density (one gauge per 2000 km 2 ). Arsenault et al. (2019) [18] used six regionalization approaches (SP, SP-IDW, SP-IDW-RA, PS, PS-IDW, PS-IDW-RA) to predict streamflow, based on three hydrological models in ungauged catchments in Mexico. The results showed that the SP approach performed best, and the regression-augmented spatial similarity approach was also a great choice in certain conditions. Kanishka and Tieldho (2020) [23] pointed out that before utilizing regionalization approaches for river discharge simulation in an ungauged basin, isomap and principal component analysis could be used to classify the reference basins. This kind of procedure can improve the result of river discharge simulation in the target basin.
More studies have shown that optimizing and combining the spatial proximity approach, physical similarity approach, or regression approach is a favorable method to improve the accuracy of simulation results of the regionalization approach [13,18,[21][22][23]. However, it is difficult to determine the best proper regionalization approach, due to various factors, such as the heterogeneity of the basin surface condition and the diversity of hydrological models [1]. As a semi-physically distributed hydrological model, the blockwise use of TOPMODEL (BTOP) has been applied in many watersheds in the world [24][25][26][27][28][29][30][31]. The model has fewer tuning parameters, and each parameter holds the physical interpreta-tion. Therefore, it has the potential for its parameters to be related to basin features, and can show superiority in the parameter transfer functions in the ungauged basins [32]. However, limited research with inadequate data is related to regionalization approaches based on the BTOP, to the authors' best knowledge. Ao et al. (2006) [32] linked the parameters of the BTOP model with the physical features of the watershed through multiple regression equations, to explore the possibility of establishing an improved physically-based a priori parameter estimation technique. They found that 'improved' a priori parameter estimation techniques have the potential to reduce parameter uncertainty and enable prediction in ungauged basins. Huang et al. (2015) [33] attempted to build a simple linear relationship between BTOP sensitive parameters and possible watershed features. They defined two new characteristics (mean soil particle size and mean root depth) based on the model input data, showing that the newly defined characteristics played an essential role in building the relationship between the stability of parameters and watershed features. However, none of the above research specifically studied the regionalization approaches for the BTOP model in the ungauged basin. Accordingly, this paper is committed to filling the gap.
This study explores whether the RA regionalization approach could improve discharge simulation, compared with the traditional regionalization approach. We selected 11 small and medium basins in the middle and lower reaches of the Jialing River, the largest tributary of the Yangtze River in the southwest of China. Then, a total of 12 regionalization approaches, which combined with the SP, PS, SP − PS, and the regression approach, are applied to the study area to facilitate the BTOP model in obtaining river discharge simulation results. Finally, an evaluation is conducted to compare the RA regionalization approach with the traditional regionalization approach.

Study Area
Jialing River is an important tributary of the Yangtze River, and is an essential source of flood water and sediments for the Three Gorges Reservoir [34][35][36]. We selected 11 small and medium basins in the middle and lower reaches of the Jialing River basin with similar distance and discharge observation data of sub-basins as the study area ( Figure 1). The selected basins are located in mountainous regions, and can be considered a typical small and medium-sized ungauged basin, due to sparse gauges density.
As shown in Figure 1, the studied catchments are located between 30-32 • N and 105-109 • E, and each catchment area is monitored by a river gauge. The study basins belong to a subtropical monsoon humid climate, with an annual average rainfall of about 1000 mm and an annual average temperature range of 14.7-17.6 • C. Except for MB(ID6), which is located at the edge of the Sichuan Basin and has a high altitude and low temperature, there is little difference in other basins. Most of the basins in the study area have a relatively small mean slope, with a range of 3.93~23.60 • . Study area has similar spatial distance and hydrological characteristics. The 11 basins selected in this study can be considered as a typical ungauged mountainous catchments, which have a great reference significance for the discharge simulation of ungauged basins worldwide.

BTOP Model
The BTOP model, a semi-distributed hydrological model, is designed to perform simulations of rainfall-runoff, soil moisture, unsaturated and saturated groundwater, and river discharge processes at each BTOP grid in watersheds [37,38]. Moreover, BTOP is referred to as BTOPMC when the Muskingum-Cunge (MC) method is applied for the flow routing phase [32,39]. As a significant part of the Yamanashi Hydrological Model (YHyM) system for river basin simulation, BTOP has been applied to many river basins worldwide, especially in warm, humid catchments [25,33,40,41]. The current version of the BTOP model has five calibrated parameters of n 0c , m, α, SD bar , and D o (which is calculated by the fraction and dischargebility of sand, silt, and clay). Table 1 summarizes a range of parameter values with the default BTOP model parameters obtained for the Mekong River basin [24,38]. Compared with other conceptual hydrological models, the BTOP model parameters have specific physical interpretations. Moreover, the number of tuning parameters is fewer than other physically-based distributed hydrological models, such as SWAT, which means that the parameter interaction and uncertainties are less [32]. Therefore, a robust relationship between the model parameters and the physical characteristics of the river basin is more likely to be established for the BTOP model. In summary, applying the BTOP model to the study of the regionalization approach of the data-sparse watershed has apparent advantages.

Core of BTOPMC
The structure of BTOP includes core components and optional modules. For the runoff simulation section, the BTOP core modules are briefly introduced as follows [33].

(a) Topographic sub-module
The topographic module of BTOP is based on DEM and soil type data. Topographic processing includes digital river network generation, sub-basin division using river station locations for delineating drainage area boundaries, effective contributing area calculation, and topographic index calculation [42].

(b) Runoff generation sub-module
The runoff generation in BTOP is calculated in each grid cell. The processing divides into four zones vertically: vegetation, root, unsaturated, and subsurface [38]. There are four model parameters involving calibration and optimization for each BTOP block, which can be less or equal to the number of study basins (see Table 1). Furthermore, the Hortonian overland flow option was selected in our study. The detailed rainfall-runoff calculation of each zone can be referred to in the user manual of BTOP [38].

(c) Flow routing sub-module
The flow routing sub-module of BTOP was constructed by the Muskingum-Cunge (MC) method [43,44], but the original MC method could not explain the backwater phenomena. Thus, the modified MC method was developed to overcome this problem [45,46], by solving the inherent negative outflow problem and ensuring the flow routing accuracy of BTOP. The modified MC method also indicated a better performance than the original MC method implemented in earlier applications of BTOP [32,37]. Moreover, at this stage, there is just one model parameter, the block-average Manning's coefficient n 0c , that needs to be optimized and calibrated, and n 0c is used to calculate the Manning's coefficient at each grid using grid slope.

Evaluation Criteria
An appropriate objective function should be used to evaluate the proximity between simulated and measured river discharge. In this study, the coefficient of determination (R 2 ) and Nash-Sutcliffe Efficiency (NSE) [47] were selected as the objective function of the model simulation evaluation. The formula of evaluation criteria are as follows: where, Q s and Q o represent the simulated flow rate and the measured flow rate, respectively. The i is the number of time series of discharge, n is time steps in the evaluation period. The value range of R 2 is 0~1. The closer R 2 is to 1, the more consistent the change process lines between the simulated value and the measured value. The value range of NSE is −∞-1. The closer NSE is to 1, the better is the simulation efficiency [47].

Data
We collected 37 rainfall stations and 11 hydrological stations in the study basins. Each basin includes at least three rainfall stations and a hydrological station at the outlet. Daily streamflow records and daily rainfall records from 1981-1987 were derived from hydrologic year books published by the Hydrologic Bureau of the Ministry of Water Resources of China. Other input data of the BTOP model are globally available data: the topography is represented by the DEM with an original resolution of 30 m, obtained from the geospatial data cloud [48]. The land cover dataset on a grid-scale of 500 m was accessed from the USGS Land Cover Institute [49]. Food and Agriculture Organization (FAO) provides the soil map at the scale of 1:5 million [50]. The National Centers for Environmental Information (NCEI) provides the leaf area index (LAI) with an original resolution of 0.05 degree [51]. We employed the potential evapotranspiration from the Climatic Research Unit (CRU), with an original resolution of 0.5 degree [52]; the potential intercept evaporation with an original resolution of 0.25 degree was obtained from the Global Land Evaporation Amsterdam Model (GLEAM) [53]. All the above datasets were resampled to 500 m for BTOP model simulation. The BTOP models were set up for 11 study catchments of Jialing River, and each catchment boundary was controlled by the river hydrological station. Figure 2 shows the framework of this research. First, the model calibration was carried out for 11 study basins to obtain the parameter calibration values of the hydrological model. Then, each type of model calibrated parameter and each type of watershed attribute characteristic factor were calculated as correlation coefficients to select the influential watershed characteristic factors. After that, the SPSS software [54,55] was used to establish the linear regression equation between 5 parameters of the BTOP model and the watershed attribute characteristic factors with a higher correlation coefficient (R > 0.6). In the establishment of the regression equation for model parameters, to explore whether the more consideration of the basin characteristic factors leads to a better model fitting effect, we established a one-variable linear regression equation (regression I), two-variable linear regression equation (regression II), and a multivariate linear regression equation (regression III) for the corresponding model parameters. In regression III, the basin characteristic factors that had an R greater than 0.6 with parameters were considered. The fitting effect of the regression equation was evaluated by R 2 .

Methodology
According to the hydrologic year books, in the 1986 daily river discharge data, the YH and MB basins were missing for several days, and there were a few discharge data estimated by hydrological methods in the basins of SXZ, YT, and ZJC. Therefore, those five basins were selected as basins with a lack of river discharge data, and SP, PS, and SP-PS approaches were analyzed in turn. According to the target data-deficient basin, the similarity ranking of gauged basins was carried out, and the appropriate donor basins were selected.
The next step was based on the BTOP model, using traditional regionalization approaches and RA regionalization approaches to simulate discharge in the ungauged basin. The output averaging approach was adopted to deal with the discharge simulation results.
Finally, we compared the evaluation criteria of simulation results of traditional and RA regionalization approaches. The details of the methodology are described in Sections 3.1-3.4. Figure 2. The framework of the study. SP, PS, and SP-PS are abbreviations for spatial proximity approach, physical similarity approach, and an integration similarity approach (SP-PS); the R I, RII, and RⅢ represent regression equations with different numbers of independent variables. According to the hydrologic year books, in the 1986 daily river discharge data, the YH and MB basins were missing for several days, and there were a few discharge data estimated by hydrological methods in the basins of SXZ, YT, and ZJC. Therefore, those five basins were selected as basins with a lack of river discharge data, and SP, PS, and SP − PS approaches were analyzed in turn. According to the target data-deficient basin, the similarity ranking of gauged basins was carried out, and the appropriate donor basins were selected.

Selection of Watershed Characteristic Factors
The next step was based on the BTOP model, using traditional regionalization approaches and RA regionalization approaches to simulate discharge in the ungauged basin. The output averaging approach was adopted to deal with the discharge simulation results.
Finally, we compared the evaluation criteria of simulation results of traditional and RA regionalization approaches. The details of the methodology are described in Sections 3.1-3.4.

Selection of Watershed Characteristic Factors
When using the PS and SP − PS to carry out parameter transplantation in the ungauged basins, it involved the selection of basin descriptors. Basin characteristic descriptors selection in the regionalization study of runoff could be classified into two categories: physical descriptors and hydroclimatic descriptors. Basin area (Ac, km 2 ), average elevation (E, m), average slope (S, • ), forest cover fraction (W, %), and average topographic index (λ) are usually selected as physical descriptors. The annual average rainfall (P, mm) and annual runoff coefficient (RC) are generally chosen as the hydroclimatic descriptors [11]. In addition, soil type and land cover type are two essential characteristics of the basin. They are also important influencing factors in the process of rainfall and runoff simulation of hydrological models. According to the simple linear modeling method explored by Huang et al. (2015) [33], two new watershed characteristic factors, mean soil particle size (Φ) and mean root depth (r), are employed as factors in this study as well. The Φ not only introduces the proportion information of each component of soil type but also introduces the ordering information of soil particle size. Its formula is as follows: where, Φ i is the proportion of i kinds of soil components, r i refers to the sorting number of the size of the i kinds of soil components (each soil type of BTOP is composed of clay, soil, and sand), n refers to total types of soil components. Similarly, land-cover types in the BTOP model were classified into 17 categories by IGBP of USGS Land Cover Research Institute. The r introduces information on the proportion of each type of land use type in the watershed and its root depth information: where, r i is the root depth corresponding to each land-use type, t i is the proportion of the area occupied by each land-cover type in the watershed, and m is 17 categories of land-cover in the research watershed. In summary, nine watershed characteristic factors selected in this study are shown in Table 2. The basic principle of the SP approach is that the climate and underlying surface conditions change uniformly in space, and it is believed that similar fields have similar hydrological behaviors [18]. In this study, the centroid coordinates of each watershed was calculated by ArcGIS software, and the spatial distance value from each participating basin to the target basin was obtained, respectively.

Physical Similarity Approach (PS)
The PS approach refers to the approach of transmitting hydrological information in a data-deficient basin using reference basins that are similar in terms of basin characteristics. It is usually divided into two categories: basin feature ranking and physical weight methods [11,18]. According to the hydrological characteristic values of each basin, the PS is used to calculate the physical similarity values of each reference basin to the ungauged basin. The smaller the attribute similarity value is, the higher the watershed similarity is [18]. The formula of the physical similarity approach is as follows: where, x i G is the i-th watershed characteristic factor of the reference basin; x i U is the corresponding watershed characteristic factor of the data-sparse basin; ∆x i is the difference between the maximum and minimum value of the i-th watershed characteristic factor of the donor basins; and k is the number of selected watershed characteristic factors.

Integration Similarity Approach (SP − PS)
Oudin et al. (2008) [11] proposed an integration similarity approach (SP − PS) combining the SP approach and the PS approach in the comparative study of 913 watersheds' regionalization approaches in France in 2008. This approach takes the spatial distance as an attribute. Then, the PS approach formula is used to calculate the similarity value between the donor basin and the target basin.

'Regression-Augmented' (RA) Regionalization
The regression-augmented (RA) regionalization approach combines the regressionbased approach with the SP, PS, and SP − PS approach: the regression equation was used to obtain the relevant model parameters, and the remaining parameters of the model were transplanted according to the reference basins selected by the SP, PS, and SP − PS approach. The regression-based approach establishes regression equations for the model parameters that have a high correlation with watershed attributes. It is generally believed that the correlation coefficient between model parameters and watershed characteristic factors is greater than 0.5, indicating that the established regression equation has application significance [11,18]. Usually, to build the regression equation of model parameters and watershed characteristic factors, only the basins with better simulation performance judged by model calibration results are selected in the study. Then, the corresponding model parameters in the data-deficient area can be estimated based on the regression equations. Thus, each model parameter is independently evaluated in the regression-based approach. However, the model parameter set cannot be evaluated and established simultaneously, which leads to the partial cohesion of the hydrological model parameter set being lost, and affects the simulation performance of the model [11,18]. The RA regionalization approach overcomes the shortcoming of the regression-based approach and makes full use of the watershed attribute characteristics of the ungauged basin and the parameter set of the reference catchment in the parameter transfer process, thus retaining the cohesion of the transferred parameter set [13].

Output Averaging of Regionalization
When the regionalization approach of parameter transplantation is applied to the ungauged basin, the number of reference basins will affect river discharge simulation and the prediction of the target basin [1]. When 2-5 reference basins are used for the regionalization approach, the model simulation efficiency is more satisfactory [56]. In this study, three reference basins were selected when the regional approach was used for parameter transplantation of the ungauged basin. When the traditional regionalization approach uses parameter sets of reference basin to simulate the runoff of the target watershed based on the hydrological model, there are two approaches to deal with the transplanting process: one is the parameter averaging, and the other is the output averaging [11,13]. Parameter averaging is conducted by taking the average value of the model parameters of the selected reference basins, and then transplanting them to the target basin for river discharge simulation. The output averaging indicates that the calibrated parameter sets of each reference basin are transplanted to the target basin respectively, and then the average river discharge simulation results are taken. Previous studies have shown that some reference watershed information may be lost in the process of parameter averaging, and the simulation results are often not as good as the output averaging [11,13,57]. Moreover, in this study, some parameters of the RA approach were obtained by regression equations, which could not be averaged. After comprehensive consideration, the output averaging approach was adopted to obtain the river discharge simulation results of the target basin in this study. Table 3 shows the calibration and validation results of river discharge simulation using the BTOP model of each basin, and Table 4 shows their calibrated parameters. The NSE of the research basin during calibration was basically above 0.60 (except for the QX (ID11), the model performance was generally good, and the model performance of small basins, such as JB (ID8), MB (ID7) and MYT (ID10) were excellent. The R 2 of BTOP simulation results were all greater than 0.6, and the hydrographs of river discharge simulation results and the measured hydrographs had a good fit degree. In the validation period, the NSE values were all above 0.50, and the R 2 values were above 0.60, reflecting good performance of the BTOP model simulation. The NSE values of the BY (ID2), JB (ID8), MB (ID7), QX (ID11), SXZ (ID1), and YT (ID5) during the validation period were even higher than that during the calibration period. Since the study basin belongs to the small and medium-sized mountain basins, and in the 1980s, the hydrological and meteorological monitoring had certain limitations, and the rainfall data itself had a large error. Therefore, according to the simulation results in Table 3, the BTOP hydrological model presented a good simulation effect in the daily river discharge simulation of 11 study basins, and the regression equation between the model calibration parameters and the characteristic factors of the basin were reliable.

Regression Equations Establishment
This study individually analyzed the correlation between the BTOP calibrated parameters of the 11 small and medium basins in Table 4, and the watershed characteristic factors in Table 2. We found that only three parameters (SD bar , m, and α) in the BTOP model had correlation coefficient values higher than 0.6 with watershed characteristic factors. As shown in Figure 3a, the correlation coefficient of parameter SD bar with the mean soil particle size ϕ, the forest coverage rate W, the average root depth r, the average watershed elevation E are −0.773, 0.739, 0.696, and 0.620, respectively. The parameter SD bar was determined in the unsaturated zone of the model river discharge simulation, which describes the condition of soil water content [38], and the ϕ, W, r, and E had an apparent impact on the soil water content in the runoff phase. Thus, the research results are reasonable and have physical significance. In Figure 3b, the correlation coefficient between the model parameter m and W was 0.813, the annual runoff coefficient RC was 0.781, and the r was 0.690. The m is the decay factor of the lateral transmittance of the groundwater [33]. Therefore, the research results show that the high correlation between this parameter and W (reflecting land-cover), r (reflecting land-cover), and RC (affected by soil type) can be explained. As for parameter α shown in Figure 3c, the correlation coefficient of α and W, r, ϕ, and E were 0.810, 0.711, −0.690, 0.645. The parameter α is the drying coefficient of the root zone, which reflects the drying capacity of the soil surface in the root zone. Therefore, soil type and land use are statistically significant influencing factors.
The multiple linear regression analysis module of the SPSS software was used to establish the regression equation between model parameters and basin characteristic factors. According to the results of correlation analysis, each of the three model parameters, SD bar , m, and α, had more than three basin characteristic factors with their correlation coefficient R > 0.6. The SPSS fitting results of each regression approach are shown in Table 5, and we can see that the deterministic coefficients of regression I, II, and III increased successively. It indicates that the more watershed characteristic factors were considered, the better the fitting effect between model parameters and selected watershed characteristic factors would be. However, when the regression equation was combined with the traditional regionalization approach, the combination effect of regression III may not be as good as regression I and II. This issue is analyzed in the following two sections.

Regression Equations Establishment
This study individually analyzed the correlation between the BTOP calibrated parameters of the 11 small and medium basins in Table 4, and the watershed characteristic factors in Table 2. We found that only three parameters (SDbar, m, and α) in the BTOP model had correlation coefficient values higher than 0.6 with watershed characteristic factors. As shown in Figure 3a, the correlation coefficient of parameter SDbar with the mean soil particle size φ, the forest coverage rate W, the average root depth r, the average watershed elevation E are −0.773, 0.739, 0.696, and 0.620, respectively. The parameter SDbar was determined in the unsaturated zone of the model river discharge simulation, which describes the condition of soil water content [38], and the φ, W, r, and E had an apparent impact on the soil water content in the runoff phase. Thus, the research results are reasonable and have physical significance. In Figure 3b, the correlation coefficient between the model parameter m and W was 0.813, the annual runoff coefficient RC was 0.781, and the r was 0.690. The m is the decay factor of the lateral transmittance of the groundwater [33]. Therefore, the research results show that the high correlation between this parameter and W (reflecting land-cover), r (reflecting land-cover), and RC (affected by soil type) can be explained. As for parameter α shown in Figure 3c, the correlation coefficient of α and W, r, φ, and E were 0.810, 0.711, −0.690, 0.645. The parameter α is the drying coefficient of the root zone, which reflects the drying capacity of the soil surface in the root zone. Therefore, soil type and land use are statistically significant influencing factors.
(a) The multiple linear regression analysis module of the SPSS software was used to establish the regression equation between model parameters and basin characteristic factors. According to the results of correlation analysis, each of the three model parameters, SDbar, m, and α, had more than three basin characteristic factors with their correlation coefficient R > 0.6. The SPSS fitting results of each regression approach are shown in Table 5, and we can see that the deterministic coefficients of regression I, II, and Ⅲ increased successively. It indicates that the more watershed characteristic factors were considered, the better the fitting effect between model parameters and selected watershed characteristic factors would be. However, when the regression equation was combined with the traditional regionalization approach, the combination effect of regression Ⅲ may not be as good as regression I and II. This issue is analyzed in the following two sections.

Similarity Ranking of Basins
While exploring the effect of each regionalization approach on river discharge simulation of the data-deficient basin, an independent "data-deficient basin" should be selected. According to Figure 1, except for the JB (ID8), which contains an inset sub-basin, SXZ (ID1), all other watersheds were independent catchments. Here, we selected SXZ (ID1), YH (ID3), YT (ID5), MB (ID7), and ZJC (ID9) as the data-deficient basins, which were drained by different tributaries of the Jialing River. The similarity sorting was performed for each datadeficient basin according to the SP, PS, and SP − PS, and are shown in Figure 4. Table 6 displays the three most similar reference basins selected, according to the above three approaches and their similar values to the ungauged basin. The river discharge simulation results of the three reference basins corresponding to the target basin were obtained based on the weight of similarity values in Section 4.4.

Similarity Ranking of Basins
While exploring the effect of each regionalization approach on river discharge simulation of the data-deficient basin, an independent "data-deficient basin" should be selected. According to Figure 1, except for the JB (ID8), which contains an inset sub-basin, SXZ (ID1), all other watersheds were independent catchments. Here, we selected SXZ (ID1), YH (ID3), YT (ID5), MB (ID7), and ZJC (ID9) as the data-deficient basins, which were drained by different tributaries of the Jialing River. The similarity sorting was performed for each data-deficient basin according to the SP, PS, and SP-PS, and are shown in Figure 4. Table 6 displays the three most similar reference basins selected, according to the above three approaches and their similar values to the ungauged basin. The river discharge simulation results of the three reference basins corresponding to the target basin were obtained based on the weight of similarity values in Section 4.4. The ordinates of (a-c) represent the spatial distance (km) of the center of mass, the physical similarity value, and the integration similarity value between the reference basin and the target basin, respectively. The horizontal axis is the 11 study basins: the solid triangle represents the target basin. The solid circle reflects the three most similar reference basins selected for the corresponding target basin.  The ordinates of (a-c) represent the spatial distance (km) of the center of mass, the physical similarity value, and the integration similarity value between the reference basin and the target basin, respectively. The horizontal axis is the 11 study basins: the solid triangle represents the target basin. The solid circle reflects the three most similar reference basins selected for the corresponding target basin.

The Results of Regionalizations
In this study, a total of 12 regionalization approaches were used for each data-deficient basin. Therefore, it is necessary to conduct an overall evaluation of the river discharge simulation results of the regionalization approaches for each ungauged basin. Figure 5 shows the Taylor diagram [58] of each BTOP simulation result using different regionalization approaches for five target basins. It uses three indicators (correlation coefficient (R), normalized standard deviation (NSD), and normalized root mean square deviation (NRMSD)) in one graph to evaluate how close the simulation is to the measured discharge. The R shows correlation, NSD reflects the degree of dispersion, and NRMSD is sensitive to outliers in data series. In general, the river discharge simulation results of all regionalization approaches in the study basin were inferior to the simulation results during the model validation period, except for the YH(ID3). Both the traditional and RA regionalization approaches showed satisfied simulation results in the YH basin, even better than the validated simulation results (as other markers are closer to the reference point (REF) than the black circle in Figure 5b). However, for the regionalization approaches, the traditional approach was superior to the regression-augmented ones. For the SXZ(ID1), YH(ID3), and ZJC(ID9) basins, as shown in Figure 5a,b,e, the triangle is closer to the REF, followed by the square and the five-star icon, indicating that "SP/PS/SP − PS + RI" and "SP/PS/SP − PS + RII" have more significant simulation effects than "SP/PS/SP − PS + RIII". In the same way, for the YT(ID5) and MB(ID7) basins, the simulation effect of "SP/PS/SP − PS + RI" is better, and the simulation effect of "SP/PS/SP − PS + RII" and "SP/PS/SP − PS + RIII" are very close. The evaluation of these regionalizations in each target basin provide some implications for the following regionalization analysis. Table 7 shows the BTOP model simulation evaluation results of all regionalizations conducted in this study. The application of traditional regionalization approaches in the study area showed that the simulation effect in SXZ(ID1) and YH(ID3) is good. The simulation results were satisfactory in target basin YT(ID5). In watershed MB(ID7), the simulation results were generally good. The traditional regionalization approach had a favorable application effect in the five selected basins. On the whole, among the three traditional regionalization approaches of SP, PS, and SP − PS, SP had the best simulation effect in the research area, which is consistent with Figure 5.
the SXZ(ID1), YH(ID3), and ZJC(ID9) basins, as shown in Figure 5a,b,e, the triangle is closer to the REF, followed by the square and the five-star icon, indicating that "SP/PS/SP-PS+RI" and "SP/PS/SP-PS+RII" have more significant simulation effects than "SP/PS/SP-PS+RⅢ". In the same way, for the YT(ID5) and MB(ID7) basins, the simulation effect of "SP/PS/SP-PS+RI" is better, and the simulation effect of "SP/PS/SP-PS+RII" and "SP/PS/SP-PS+RⅢ" are very close. The evaluation of these regionalizations in each target basin provide some implications for the following regionalization analysis. REF is the observed discharge; Ori represents the original simulated results (by auto-calibrating the model); RI, RII, and RⅢ are the approaches with regression I, regression II, and regression Ⅲ, respectively. For example, the red triangle means the river discharge simulation results using the SP+RI approach.  The RA regionalizations generally presented a good simulation effect in SXZ (ID1), YH (ID3), and YT (ID5). However, the results in MB (ID7) and ZJC (ID9) were not ideal. Taking YH (ID3) as an example, the R 2 and NSE of river discharge simulation results of the SP + RI approach were 0.70 and 0.63, respectively. The R 2 and NSE of SP + RII were 0.70 and 0.64, respectively, representing an excellent simulation effect and indicating a certain simulation potential. As shown in Figure 6a, the SP performed better, and SP − PS + RIII performed the least, but overall, the difference was not prominent. The river discharge simulation results of the regional approach showed that the low-flow simulation was on the high side, while the high-flow simulation was on the low side, and the difference in simulation results of various regional approaches was mainly reflected in the maximum flow peak simulation.  Table 7, the simulation effect of RI and RII combined with the traditional regionalization approaches were generally better than that of RIII in five study basins. The distinction of RI, RII, and RIII lied in the set of watershed characteristic factors, taken into account by the regression equation in the RA approach, which implies that the watershed characteristic factors set of the model parameter regression equation had a non-negligible influence on the simulation results. Therefore, determining the optimal characteristic factors set in the parameter transfer equation of the BTOP model parameters should be focused on in future research. Figure 6e shows that "SP + RII" had the best simulation effect among "SP/PS/SP − PS + RII". It indicates that the regression-augmented spatial similarity approach could get a good regionalization in some circumstances, which agrees with the findings from Arsenault et al. (2019) [18]. Moreover, there are three parameters (see Table 5) that were replaced by linear regression in the RA regionalization approaches, accounting for 60% of the calibration parameters of the BTOP model. This regression strength analysis showed that in the RA Moreover, there are three parameters (see Table 5) that were replaced by linear regression in the RA regionalization approaches, accounting for 60% of the calibration parameters of the BTOP model. This regression strength analysis showed that in the RA approach, more than half of the parameters of the BTOP model were obtained from the regression equation. Thus, its simulation performance was directly related to the regression equation, and this analysis leads to additional considerations for the RA approach, in that the regression strength may also be a significant factor affecting the simulation result.
In summary, the five "data-deficient basins" results in this study indicated that the traditional regionalization approach is superior to the RA regionalization approach. We summarized several possible reasons for the outcome. First, due to the limited number of small watersheds, the established relationship between the parameters and the watershed characteristic factors was not accurate enough. Thus, there is still room for improvement in the regression equation, as the regression is vital for the RA approach. Secondly, the regression strength was an influence coefficient for the simulation result of the RA approach. Lastly, compared with previous research which adopts the SP approach [11,18], the small and medium basins in the study area were relatively closely situated. Traditional regionalization approaches are based on the characteristics of watersheds or spatial distances, which makes the traditional regionalization approaches have more advantages than RA. In contrast, the preponderance of RA regionalization approaches have not been highlighted. Introducing more donor basins and establishing representative regression equations that are combined with traditional regionalization approaches are critical to improving the applicability of RA regionalization approaches in ungauged basins.

Conclusions
Relying on the nine processed watershed characteristic factors (including two new characteristic factors: mean soil particle size (Φ) and mean root depth (r)), this study carried out correlation analysis on the five model parameters of the BTOP model. We established the regression equation between the parameters and the selected watershed characteristic factors with 12 regionalization approaches, combining regression equations with traditional regionalization approaches. Overall, we explored the application of the regression-augmented approach for the BTOP model in five small ungauged target basins in the Jialing River Basin, China. The main conclusions are as follows: (1) This paper initially determined that among the five parameters of the BTOP model, the three parameters SD bar , m, and α have high correlation (R > 0.6) basin characteristic factors, and the Φ and r both played a significant role in establishing regression equations. We constructed linear regression equations between these three parameters, and the relevant watershed characteristic factors, which could contribute to the establishment of the parameter transfer function of the BTOP model in sparsely gauged regions.
(2) The present work verifies the applicability of the regression-augmented (RA) regionalization approach. In the case of the limited number of the research basins, the RA regionalization approach showed a considerable simulation effect in specific circumstances, indicating that the RA approach has some great potential.
(3) This paper fills the gap in the regionalization research of the BTOP model in the data-deficient basin. It also preliminarily evaluated the application of traditional and RA regionalization approaches, based on BTOP in the ungauged basins. The outcome of this study provides a reference for seeking the optimal BTOP regionalization approach in the target catchment, and a valuable experience for the river discharge simulation of small and medium-sized data-sparse catchments worldwide, especially for some developing countries with limited conditions. Future work of the regression-augmented regionalization approach of ungauged or sparsely gauged basins, to establish a more representative regression equation and find a regression equation that better combines with the traditional regionalization approach, should focus on the following aspects: introducing more donor basins, classifying the participating basins in advance, increasing the number of watershed characteristic factors, and establishing the regression equation by determining the optimal set of watershed characteristic factors for each sensitive parameter.
Author Contributions: T.A., L.Z. and Y.Z. designed this research; Y.Z., L.L., F.Q., L.Z., X.Z., T.C. and X.L. collected the data; Y.Z., L.L. and F.Q. analyzed the data and wrote the draft. L.Z. and T.A. revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding:
We gratefully acknowledge the Regional Innovation Cooperation Program (2020YFQ0013) and Key R&D Project (2021YFS028) from the Science and Technology Department of Sichuan Province, China Scholarship Council (201906240035).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The study did not report any data.