Setting the Flow Accumulation Threshold Based on Environmental and Morphologic Features to Extract River Networks from Digital Elevation Models

Determining the flow accumulation threshold (FAT) is a key task in the extraction of river networks from digital elevation models (DEMs). Several methods have been developed to extract river networks from Digital Elevation Models. However, few studies have considered the geomorphologic complexity in the FAT estimation and river network extraction. Recent studies estimated influencing factors’ impacts on the river length or drainage density without considering anthropogenic impacts and landscape patterns. This study contributes two FAT estimation methods. The first method explores the statistical association between FAT and 47 tentative explanatory factors. Specifically, multi-source data, including meteorologic, vegetation, anthropogenic, landscape, lithology, and topologic characteristics are incorporated into a drainage density-FAT model in basins with complex topographic and environmental characteristics. Non-negative matrix factorization (NMF) was employed to evaluate the factors’ predictive performance. The second method exploits fractal geometry theory to estimate the FAT at the regional scale, that is, in basins whose large areal extent precludes the use of basin-wide representative regression predictors. This paper’s methodology is applied to data acquired for Hubei and Qinghai Provinces, China, from 2001 through 2018 and systematically tested with visual and statistical criteria. Our results reveal key local features useful for river network extraction within the context of complex geomorphologic characteristics at relatively small spatial scales and establish the importance of properly choosing explanatory geomorphologic characteristics in river network extraction. The multifractal method exhibits more accurate extracting results than the box-counting method at the regional scale.


Introduction
River networks constitute topographic features that are widely used in the analysis of regional geomorphic features and hydrogeological environments [1,2]. The accurate extraction of river networks is necessary for delineating water pollution sources, monitoring disturbances on rivers caused by human activities, determining flood levels in periods of heavy rain, assessing soil and water conservation measures, and enacting comprehensive management practices [3][4][5][6].
Much effort has been devoted to extracting river networks [7][8][9]. GeoNet [10] is a well-developed and representative open-source software for extracting channel networks based on geodesic minimization principles. However, the extraction efficiency is not high due to its complex geometric-based channel head recognition algorithm and the loosely ISPRS Int. J. Geo-Inf. 2021, 10, 186 2 of 26 coupled program design. The flow accumulation method still dominates in large-scale drainage network extraction from digital elevation model (DEM) data because of its simple form and efficient computational design [11][12][13][14][15][16]. The flow accumulation threshold (FAT) is a user-defined parameter that directly affects the structure and density of extracted river networks from DEMs [17]. Too large a FAT would omit useful details of river networks, and too small a FAT would extract pseudo rivers. This work focuses on the determination of the FAT seeking to maximize river extraction accuracy with computational efficiency.
Several methods have been proposed to determine the FAT. Tarboton et al. (1991) reported a quantitative method to determine the FAT through power functions involving slope and drainage area [18]. However, power functions must be validated further to evaluate the influences of land morphology, soil, and climate on channel initiation [19]. Widely used GIS tools apply 1% of the maximum flow accumulation value as a default FAT [20]. The FAT proposed by Tang (2000) and Olivera et al. (2002) is easily determined from flow accumulation statistics, but the accuracy is relatively low [20,21]. Jones (2002) applied a trial-and-error search to determine the FAT [22]. Trial-and-error is subjective and time-consuming for determining a suitable FAT. Lin et al. (2006) determined FAT with the headwater-tracing approach with a fitness index in Taiwan's upstream watersheds [19]. However, this method requires comparison with known stream data that must be obtained from aerial photography or a topographic map and it is time-consuming when applying it to multiple sub-basins. Tantasirin et al. (2016) determined the optimal FAT by considering the rate of change of the trend line of the "error stream cells" obtained with multiple thresholds [23]. FAT was obtained at the point where the difference of variation decreases, and the trend line approaches zero. This method is mainly used to produce a drainage network consisting of the shortest length of stream links (stream tributaries) and to delineate a watershed area into smaller-sized hillslopes. Gökgöz et al. (2006) assessed the effect of the FAT selection on the extraction of drainage lines and watershed delineation by considering the adjacency and direction relationships between the cells including the beginning points of the lines in the drainage network to be derived [24]. Ibrahim and Gökgöz (2018) improved the method proposed by Gökgöz et al. (2006) [17,24]. FAT was determined by considering the adjacency and direction functions between all cells including the drainage lines. The latter two algorithms are based on the assumption that among the cells containing parallel lines there must be other cells in the model to describe a ridge, which may or may not be the case.
Initiation of natural river networks depends on the climate, slope, lithology, vegetation cover, soil properties, and drainage density (δ), and is spatially variable [10,25,26]. The cumulative action of erosional processes over long timescales drives the formation of river networks commensurate with the evolution of large-scale topography [27]. Drainage (or, perhaps more accurately, river) density is defined as the total length of river reaches (l) within a basin divided by the basin area. It is a macroscale measure of the amount of area (say, in km 2 ) encompassing one unit length (in km) of a river reach [28]. Several studies have analyzed the spatial dependence of δ on environmental parameters at the continental scale. Luo et al. (2016) extracted the land dissection density (considered equivalent to the drainage density) within the United States with a geomorphological detection method and a 30 m resolution DEM and discerned a spatial dependence on climate, lithology, and several terrain-based attributes [29]. Schneider et al. (2017) proposed a new method to extract river networks based on the spatial variable δ as a function of slope, lithology, and climate with the 15" HydroSHEDS DEM in France and Australia [30]. In the Holocene, or perhaps more accurately, in the Anthropocene, human activities have gradually become the main driving force of global ecosystem degradation [31,32]. Song et al. (2019) concluded that the anthropogenic influence on river network structure is more significant than those of morphology and lithology in the urbanized regions by evaluating the river network change and its multifractal dynamics in the Yangtze River Delta [33]. River networks have become increasingly degraded, altered, or modified by anthropogenic actions such as road and reservoir construction [34]. Urban expansion has exerted dramatic changes The common procedure of extracting a river network involves filling sinks, calcul ing the flow direction, calculating the flow accumulation, and extracting the river netwo [48]. A deterministic eight-node (D8) algorithm is herein applied to obtain the flow dire tion with ArcGIS 10.6 [49]. The D8 algorithm determines the flow direction of each gr cell by choosing the steepest among a set of slopes oriented towards the neighboring eig cells. The FAT must be specified prior to extracting a river network. The determination the optimal FAT is herein made with two methods, as shown in the methodolo flowchart displayed in Figure 1. The first method calculates the FAT at the small bas scale by means of an empirical scaling equation (see Section 2.1). The second method c culates the FAT at the regional scale with the fractal method (see Section 2.2). The seco method is intended for FAT determination when the large areal extent of a river bas precludes the use of basin-wide representative regression predictors. The application the fractal method at the small basin scale is not addressed in this study. The accurac of the river-network extracting methods are assessed based on the similarity score eval ation index (SSEI), river length error (err(l)), root mean square error (RMSE), and quali tive visual inspection (see Section 2.6). The first method is also compared with seve classical FAT determination methods and is evaluated with a robustness test.

Empirical Scaling Equation Relating the Drainage Density and the FAT
The drainage density of basin is given by Equation (1): = in which , , and denote the drainage area, the length of river reaches, and t drainage area of basin , respectively. Tarboton et al. (1991) [18] proposed an empiric function expressing the drainage density in terms of the flow accumulation threshold basin ( ) and drainage length is given by Equation (2):  The drainage density of basin i is given by Equation (1): in which δ i , l i , and A i denote the drainage area, the length of river reaches, and the drainage area of basin i, respectively. Tarboton et al. (1991) [18] proposed an empirical function expressing the drainage density in terms of the flow accumulation threshold of basin i (FAT i ) and drainage length is given by Equation (2): ISPRS Int. J. Geo-Inf. 2021, 10, 186

of 26
K denotes a proportionality constant that is empirically determined. Equation (2) may be interpreted as an empirical "scaling law" for the FAT given that it can be solved as a function of the drainage density.

Estimation of the River Length
The river length is estimated by multilinear regression involving multiple factors or regressor variables. A list of considered factors is presented in Table 1. The dependent variable of the regression equation is the total length of the river network within each basin. Predictive factors exhibiting strong correlations (determined with the Pearson's correlation coefficient) are filtered for the purpose of reducing data redundancy and model complexity. The reference river networks are obtained from the National Earth System Science Data Center, National Science and Technology Infrastructure of China, after being corrected by remote sensing images (0.4 m spatial resolution from GeoEye) accessed from Google Earth. The total length of the reference river network within each basin is calculated with the ArcGIS software. The degree of association between selected regression factors was further verified by nonnegative matrix factorization (NMF). NMF has previously been shown to be a useful decomposition for multivariate data belonging to unsupervised learning [50]. This study introduces the NMF to reflect the effect of several environmental factors on the estimation of river length [51]. A detailed description of NMF is presented in Section 3.1.3. NMF captures the data traits by identifying the correlation between data parts and finding internal patterns of association among the data. The final model was constructed by the multilinear stepwise regression method with the software SPSS 22.0, as described in the Results section.

The Estimation of the FAT at the Regional Scale
The modified box-counting algorithm and the multifractal algorithm for FAT calculation applied in this study were proposed by De Bartolo et al. (2004) and Ariza-Villaverde et al.
(2015) [40,52]. The determination of the FAT by box-counting and the multifractal method are compared in this study.

The Box-Counting Method
A mesh of d-dimensional cubes of size r d is used to discretize the region within which a river network is defined [53]. Let N (r) denote the smallest number of cubes of size r needed to cover a region. This work extracts river networks relying on thirteen FATs which range from 1000 to 25,000. The corresponding fractal dimensions of the thirteen extracted river networks were calculated.
The formula for computing the fractal dimension D f is as follows [54]: Taking the logarithm (log) on both sides of Equation (3) yields: D f equals the slope of a plot of N(r) vs. r in logarithmic scales. The rate of change d i of the fractal dimension is given by Equation (5): where V i , V i+1 denote the ith and (i+1)th thresholds, respectively (the FAT = 1000 corresponds to V 1 ). D i , D i+1 denote respectively the box-counting fractal dimensions corresponding to the ith and (i+1)th flow accumulation thresholds. D i+1 − D i equals the fractal dimension difference of the V i , V i+1 thresholds, and V i+1 − V i denotes the difference of the thresholds.
There is a fractal dimension corresponding to a river network extracted with a FAT. That is, there is an optimal FAT associated with the shape of the function of the rate of change d i . The FAT associated with the optimal estimate of the river network is determined from the shape of the function of the rate of change d i . Specifically, the FAT for river network extraction occurs at that point defining where the change rate of the fractal dimension exhibits a rapidly declining trend. This method is illustrated in the Results section (Section 3.2.1, specifically).

The Multifractal Method
The length of a river network is herein employed as the performance variable with which to assess the performance of river extraction methods. The box-counting method is implemented to calculate the probability distribution of the river network's length [55]. Specifically, N(ε) squares of size ε · ε are used to superimpose a grid over the river network. The sum of a river network's length within a box of size ε · ε is calculated and divided by the sum of the lengths of all rivers within the entire basin. The river distribution ratio, Q i (ε), with box size ε · ε is given by the following expression: where M i (ε) equals the summed lengths of the rivers in a box of size ε · ε, and M denotes the total lengths of the rivers within the river network spanning the entire study area. The partition function X q (ε) is defined as follows [56]: The partition function X q (ε) denotes the q-th quadratic weighted summation of the probability, where N(ε) represents the number of grid cells with box size ε, and q ∈ (−∞, +∞) denotes the moment order. τ(q) denotes the sequence of mass exponents estimated by the slope of the fitted function ln X q (ε) vs ln(ε). The value of q reflects the importance of the subset of the distribution ratio Q i (ε) in the partition function [57]. When q→+∞ the maximum probability plays a significant role. X q (ε) reflects the nature of the subset with the largest probability. On the other hand, when q →−∞ the minimum probability is dominant, representing the subset with the smallest probability. The multifractal dimension D(q) is calculated as follows: τ(q) is zero when q = 1. A larger weight is assigned to cells having a larger amount of information (i.e., a greater mass) when q > 1; smaller weights are assigned to cells where the quantity to be measured is less concentrated (when q < 1). The singular exponent α(q) and the multifractal spectrum f (α(q)) are deduced from the Legendre transform as follows: where the smallest and largest values of the singular exponent α (i.e., α min and α max , respectively) constitute the minimum and maximum Lipschitz-Holder exponents, respectively, and ∆α = α max − α min denotes the range of the multifractal exponent that quantitatively characterizes the degree of difference and the range of variation within the river network. A large value of ∆α indicates the river network exhibits greater heterogeneity and complexity, and vice versa. The singular exponent α reflects the structural characteristics of the river network. The singular exponent changes with changes in the FAT. When the FAT is too large the extracted river network omits details in the river network; when it is too small a virtual "pseudo-river network" is extracted. The river network extracted with an optimal FAT captures the watershed-scale structure and does not include small, artificial, river networks. The optimal FAT corresponds to an extremal point of the line of the maximum singular exponent expressed as a function of the FAT, as shown in the Results section. f (α max ) and f (α min ) represent the maximum and minimum spectral values, respectively.
∆ f = f (α max ) − f (α min ) denotes the multifractal spectral elevation difference, which indicates the structural changes between subsets with different probabilities associated with the fractal dimension. ∆ f > 0 when the number of maximum probability subsets exceeds that of the minimum probability subsets. Similarly, ∆ f < 0 when the number of minimum probability subsets exceeds that of the maximum probability subsets. Furthermore, ∆ f = 0 when the number of maximum probability subsets is equal to that of the minimum probability subsets.

Study Area and Original Data
Two regions in China were selected as study areas to implement this paper's methods, namely the middle Yangzi River delta and the northeastern Tibetan Plateau. The Yangzi River delta lies in eastern China and encompasses 14 municipalities, covering an area equal to 95,400 km 2 . It produces 37.6% of exports and attracts 41.8% of transnational investments, contributing 17.6% to the GNP of China, and harboring 7.6% of the Chinese population in 2010 [58]. Hubei Province (108 • 21 42" to 116 • 07 50" E, 29 • 01 53" to 33 • 06 47" N) is located in the middle part of the Yangzi River delta. It features numerous rivers and lakes, and thus has been called the "land of a thousand lakes". Most rivers in Hubei Province are part of the Yangtze River fluvial network. In addition, this province is host to a variety of landforms, including mountains (about 56%), plains (about 20%), and hills (about 24%) [59]. Hubei Province has a sub-tropical, humid, and monsoonal climate. Its annual mean temperature ranges from 13 • C to 18 • C, and the average annual precipitation ranges between 800 and 1609 mm depending on location, most of which falls in the wet season [60]. The wet season ranges from May through September. The geographical distribution of vegetation in the Yangtze River Basin is heterogeneous. The climate, topography, and vegetation of Hubei Province interact to form the Yangtze River network [61,62].
The Tibetan Plateau is the source of several major rivers such as the Yangtze, Yellow, and LanCang rivers (the three rivers). Qinghai province (89 • 35 to 103 • 04 E, 31 • 9 to 39 • 19 N) lies in the northeastern part of the Qinghai-Tibet Plateau, with average elevation ranging from 3000 to 5000 m. The climate is dry and cold. There are many rivers and lakes in Qinghai Province, and its topography and climate patterns are complex [63]. The speed of industrialization and urbanization is accelerating. The locations of the study areas are shown in Figure 2. The data used for model building span from 2001 to 2018 (May to September), and include ten main components: (1) Actual river network dataset: The 1:25,0000 reference river networks were obtained from the National Earth System Science Data Center, National Science and Technology Infrastructure of China (http://www.geodata.cn, accessed on 1 August 2020). The data were further corrected by remote sensing images (0.4-m spa- The data used for model building span from 2001 to 2018 (May to September), and include ten main components: (1) Actual river network dataset: The 1:25,0000 reference river networks were obtained from the National Earth System Science Data Center, National Science and Technology Infrastructure of China (http://www.geodata.cn, accessed on 1 August 2020). The data were further corrected by remote sensing images (0.4-m spatial resolution from GeoEye) accessed from Google Earth. Read Yan et al. (2019) for the datacorrection procedure [64]. (2) The normalized difference vegetation index (NDVI): The NDVI data used in this study were obtained from the MOD13A3 version 6 products ( https://search.earthdata.nasa.gov/search, accessed on 1 August 2020), which provide monthly measurements at a 1 km spatial resolution. (3) Topographic data: The ALOS DEM with 12.5 m resolution, downloaded from the Alaska Satellite Facility (ASF) Distributed Active Archive Center (DAAC), was used to identify and extract drainage networks in the study area. The ALOS DEM from which maximum altitude, average altitude, maximum slope, average slope, maximum surface roughness, and average surface roughness were extracted has a resolution of 12.5 m at the equator. (4) Precipitation data: the data issued from the GPM_3IMERGM 06 product (https://disc.gsfc.nasa.gov/, accessed on 1 August 2020) with a resolution of 0.1 • . (5) Surface runoff and evapotranspiration data: the data were provided by GLDAS_NOAH025_M 2.1 (https://disc.gsfc.nasa.gov/, accessed on 1 August 2020). Surface runoff and evapotranspiration data were extracted from the datasets with a resolution of 0.25 • . (6) Surface temperature: the data acquired from the MOD11C3 product (https://search.earthdata.nasa.gov/search, accessed on 1 August 2020) had a spatial resolution of 0.05 • . (7) Population data: The population distribution was provided by the LandScan TM (2018) High-Resolution Global Population Data Set (https://landscan. ornl.gov/, accessed on 1 August 2020), which has a spatial resolution of approximately 1 km (30" × 30"). (8) Luojia 1-01 Night-time Light Imagery: the images were jointly provided by Wuhan University and the Gaofen Hubei Center (http://www.hbeos.org.cn/, accessed on 1 June 2020) with a resolution of 130 m. Night-time lights mainly reflect human activities at night and include residential lighting, traffic lights, commercial lighting, and factory lighting. Therefore, night-time lighting area reflects the spatial distribution of residential, transportation, commercial, and factory lighting [65]. (9) Land use data: these data were acquired from the MCD12Q1 product (https://search.earthdata.nasa.gov/search, accessed on 1 June 2020) with a 500 m resolution and comprise seventeen types (see Table S1 in the online Supplemental Materials). The last five land-use types of Table S1 are not found in Hubei province. The evergreen broadleaf forests closed shrublands, and cropland/natural vegetation mosaics are not found in Qinghai province. Land pattern indicators were computed from the land use data. (10) Lithology describes the geochemical, mineralogical, and physical properties of rocks [66]. Global lithology data acquired from the Global Lithological Map Database v1.0 (http://dx.doi.org/10.1594/PANGAEA.788537, accessed on 1 July 2020) with a 0.5 • resolution and comprised sixteen types. The Hubei study area includes seven types of soil and rock (su, ss, sm, sc, mt, pa , and py). The Qinghai study area includes twelve types of soil and rock (su, ss, sm, sc, mt, pa, pb, va, vi, wb, pi and ig). The full name of lithology is listed in Table S2 in the online Supplemental Materials. The data description with 47 features is listed in Table 1. The detailed description of each feature is displayed in Table S3.

Data Pre-Processing and Normalization
The topographic indexes were extracted with the spatial analyst toolbox of ArcGIS 10.6, namely, maximum terrain altitude, mean terrain altitude, maximum slope, mean slope, maximum surface roughness, and mean surface roughness. The surface area ratio grids are applied to measure the surface roughness [67]. The steps to calculate the surface roughness are as follows: (1) calculate the slope of DEM; (2) calculate the surface roughness = 1 /Cos ([angle of slope of DEM] * π/180) [68][69][70].
The population ratio (Q) is calculated with Equation (13). Two light indicators were constructed from Night-time Light Imagery: basin light area (L 1 ) and basin light intensity (L 2 ), as shown in Equation (14) and Equation (15), respectively. Equation (16) shows how the digital numbers (DN) are converted to the radiance for accurate analysis of lighting brightness and discrepancy. Thirdly, landscape pattern indexes including landscape percentage, patch density, Shannon's diversity index, and Species evenness were computed by FRAGSTATS 4.2 [71]. Finally, all data were projected to the same coordinate system.
where U i denotes the population size of basin i. A denotes the population of the province.
where L 1 , S N , and S denote the urbanization sprawl, the lighting area of a basin, that is, the total area of lights in a basin, and the total area of a basin, respectively [65].
where L 2 , G i , G max , and n denote the urbanization intensity, the radiance of the i-th pixel in a basin (indicating light intensity of the i-th pixel), the maximum radiance of the basin, and the total number of lighted pixels in a basin, respectively [65].
where G, and D denote the radiance of a pixel in the Night-time Light Image (W/(m 2 ·sr·µm)) and the Digital Number (DN) of a pixel [72]. Z-score normalization was adopted for comparison purposes to render the data dimensionless according to Equation (17): where X, µ, and σ denote the random variable that takes values equal to x, the mean value, and the standard deviation of the independent or dependent variable X, respectively.

Sample and Test Areas
The sample and test areas feature unique morphological characteristics with variable elevations, slopes, surface roughness, landform types, and network morphology. This study employs 27 sample areas in Hubei and Qinghai Provinces for model building. The sample and test areas are depicted in Figure 2a,b, respectively. These areas were obtained from a global dataset (https://doi.org/10.6084/m9.figshare.8044184.v6, accessed on 1 July 2020) with a 3-arcsecond resolution which has higher precision than the HDMA dataset released by the USGS [64].
The test areas are employed for model testing under a wide range of conditions. There are eight and three test areas in Hubei and Qinghai Provinces, respectively. The NDVI and average elevations of test areas of Hubei Province are significantly lower than those of Qinghai Province. Of the eight test areas of Hubei three serve to test the determination of the FAT based on an empirical scaling equation and statistical regression. These three independent sub-basins are depicted as yellow polygons in Figure 2a. The average elevation of test area 2 is above 1000 m a.s.l., which is significantly higher than those of the other test areas. The average elevations of test area 1 is near 500 m a.s.l., and has fewer mountainous features than test area 2. Test area 3 encompasses plain landform, with average elevation near 200 m a.s.l., and features relatively mild slopes compared with other test areas of Hubei Province. The other five test areas in Hubei Province were obtained with the fishnet tool of ArcGIS 10.6. Each has an area equal to 400 km 2 area. These five test areas are used for the evaluation of the FAT algorithm at the regional scale and are shown as yellow rectangles in Figure 2a. The test areas in Qinghai Province are depicted as yellow polygons in Figure 2b. The average elevations of test areas (Figure 2b) are over 4400 m a.s.l. The test areas in Qinghai Province are used for evaluating the FAT model based on an empirical scaling equation and statistical regression.

Validation Method
This paper compares the extracted with the actual river network in each area based on three validation criteria. One is the modified matching algorithm based on the Frobenius norm [73] and SSEI. The steps to implement this criterion are as follows: (1) Image pre-processing: Carrying out image binarization with 0 and 1 representing background (i.e., non-river) area and the river network, respectively; (2) The Frobenius norm of the actual river network is calculated; (3) Difference matrices between the actual and extracted river networks are calculated with several algorithms. This is followed by a calculation of the Frobenius norm of each difference matrix.
(4) The evaluation score is determined by the Frobenius norm calculated in step (3) divided by the Frobenius norm of the actual river network. The SSEI is given by Equation (18): where score(i) denotes the similarity between the actual and extracted river networks in basin i; a represents the binary matrix of the actual river network and i is that of the extracted river network calculated by different algorithms, norm(a) denotes the value of the Frobenius norm of the actual river network, and norm(a − i) denotes the Frobenius norm of the matrix difference between the actual and the extracted network in basin i. The similarity between the actual and the extracted river networks increases with decreasing value of the SSEI. The second evaluation criterion is the percentage error of the river length, which is given by the following equation: where err(l), l e , and l denote the percentage river length error, the total length of rivers within the extracted river network, and the total length of the rivers in the actual river network. The err(l) is a quantitative criterion well suited to assess the accuracy of extracted river networks. It is herein supplemented with visual inspection by comparison of the extracted and actual rivers networks as shown in the Results section. The third evaluation criterion is the root mean square error (RMSE), which is expressed as follows: where N denotes the total number of rivers. The l e and l have been defined in Equation (19).  (21) where l th denotes the estimated length of rivers in Hubei province (in m) based on the environmental factors within the basins. The meanings of the symbols of the predictive factors listed on the right-hand side of Equation (21) are found in Table 1. The coefficient of determination (R 2 ) associated with the regression Equation (21) equals 0.902.

Results and Discussion
Four characteristics were selected from among the 40 characteristics (Table S3) considered as predictive factors to predict the river length of Qinghai province according to multiple stepwise regression. The estimation of river length is based on Equation (22): where l tq denotes the estimated length of rivers in Qinghai province (in m). The meaning of the symbols and parameters can reference Equation (21). The coefficient of determination (R 2 ) associated with the regression Equation (22) equals 0.725. The formation of the river networks is governed by physical-geographic characteristics. Hubei Province has a sub-tropical and humid climate with relatively high temperatures in the wet season (May through to September). It features a relatively high population density. Evapotranspiration, surface temperature, and population ratio play a significant role in the formation of river networks (see Equation (21)). The climate of Qinghai Province, on the other hand, is dry and cold; its population density is relatively sparse (319/km 2 ) compared with that of Hubei province (7.80/km 2 ). Equations (21) and (22) indicate that anthropogenic factors and surface roughness are useful predictors of river length. The regression factors vary from region to region, in general.

Estimating the FAT
The river networks were extracted from DEMs with the ArcHydro tools of ArcGIS 10.6 software within basins with different FATs and calculated the lengths of corresponding extracted river networks. Graphs relating the drainage density to the FAT in Hubei Province are shown in Figure 3a-c. An interval equal to 1% was applied to calculate the percentile of flow accumulation. Notice the fitted equations in Figure 3a-c take the form δ = KFAT −b with an average of b = 0.50 as implied by Equation (2). The optimal FAT was estimated by the equations fitted to each basin shown in Figure 3a-c based on the estimated river length of Equation (21). The optimal FAT is given by Equation (23): Figure 3d-f depicts the captured and missed streams in test areas 1, 2, and 3 in Hubei Province.
Graphs relating the drainage density to the FAT in Qinghai Province are shown in Figure 4a-c. An interval equal to 1% was applied to calculate the percentile of flow accumulation. Notice the fitted equations in Figure 4a-c take the form δ = KFAT −b with an average of b = 0.50 as implied by Equation (2). The optimal FAT was estimated by the equations fitted to each basin shown in Figure 4a-c based on the estimated river length of Equation (22), in a manner analogous to the approach followed with Hubei Province areas.

NMF
The NMF method was applied to reflect the importance of different features in Equation (21) (for Hubei Province) and Equation (22) (for Qinghai Province). Taking Equation (21) as an example, the NMF quantifies the strength of association between factors (see a list in Table 1) and river length. Environmental, human, and topographic factors have multiple interactions and influence river length. Matrix decomposition in NMF performs a mapping of the original matrix to a subspace. The mapping is performed by approximating the low rank of the original matrix. ALT, SR max , LTD, Q, l, and EVP data from 27 sample sub-basins were normalized by means of Equation (17). M denotes a 27 row × 6 column matrix: The mean value of the NMF was calculated 1000 times and the relations between the factors were expressed by a heat map ( Figure 5). The third row of Figure 5a shows the influence of SR max on l is relatively weak in comparison to other factors. Thus, the factor SR max was excluded from the original matrix, and NFM was performed on the remaining 5 factors of the matrix. The last two rows of Figure 5b denote the influences of ALT on l is less significant, and NMF was performed on the remaining 4 factors, excluding ALT. Figure 5c shows that EVP is closely related to the LTD (see the first row), and river length exhibits a high correlation with EVP in the last row, which demonstrates the relatively large contribution of factor EVP. Overall, EVP, Q, and LTD ranked among the top three factors in terms of importance. The same procedure was repeated to the factors of Equation (22). L 2 ranked as the top factor in terms of importance. The introduction of the anthropogenic factor significantly enhanced both models' accuracy. The NMF clarifies the identification of key factors employed in Equations (21) and (22). is less significant, and NMF was performed on the remaining 4 factors, excluding ALT ̅̅̅̅̅ . Figure 5c shows that EVP is closely related to the LTD (see the first row), and river length exhibits a high correlation with EVP in the last row, which demonstrates the relatively large contribution of factor EVP. Overall, EVP, Q, and LTD ranked among the top three factors in terms of importance. The same procedure was repeated to the factors of Equation (22). L2 ranked as the top factor in terms of importance. The introduction of the anthropogenic factor significantly enhanced both models' accuracy. The NMF clarifies the identification of key factors employed in Equations (21)  The correlations between actual drainage density in the sample areas and rock types are shown in Table 2 (taking Hubei province as an example), where the areas are arranged and ranked in descending order by the magnitude of δ. Rock type which covers less than 30% of sub-basin areas was not included as a dominant factor. δ is largest for the carbonate sedimentary rocks and unconsolidated sediments compared to other rock types in Hubei province. δ is the largest for the mixed sedimentary rocks in Qinghai province. Table 2. Calculated drainage density (δ) and rock type in the sample areas in Hubei Province (see Table 1 for symbology).  The correlations between actual drainage density in the sample areas and rock types are shown in Table 2 (taking Hubei province as an example), where the areas are arranged and ranked in descending order by the magnitude of δ. Rock type which covers less than 30% of sub-basin areas was not included as a dominant factor. δ is largest for the carbonate sedimentary rocks and unconsolidated sediments compared to other rock types in Hubei province. δ is the largest for the mixed sedimentary rocks in Qinghai province. Table 2. Calculated drainage density (δ) and rock type in the sample areas in Hubei Province (see Table 1 for symbology).

Validation
The SSEI was applied to measure the extracted river network with the actual river network based on image similarity. The extracted river network was calculated by the ArcHydro Tools of ArcGIS 10.6 based on the estimated optimal FAT. The performance of the drainage density-FAT model was tested statistically, with results for Hubei Province listed in Table 3. The image similarity of test area 2 is the highest with a low SSEI (0.356) while that of test area 1 is the lowest with a relatively high SSEI (0.441). The results evaluated by means of the river length error do not generally coincide with those from the SSEI. The smallest length error of the extracted river network was 1.17% (corresponding to test area 1) and the average length error was 4.89%. All length errors were smaller than 15%. The RMSE of extracted river length equaled 1.109 × 10 4 m. Overall, the three indexes (SSEI, err(l), RMSE) demonstrated that the drainage density-FAT model performed well in basins of Hubei Province.  Table 4 lists the statistical criteria values for the comparison of the calculated and actual river length in test areas of Qinghai Province (Figure 2b). The smallest length error of the extracted river network was 3.56% (corresponding to test area 1) and the average length error was 11.44% in test areas of Qinghai Province. The RMSE of extracted river length equaled 2.135 × 10 4 m. The image similarity of test area 3 is the highest with a low SSEI (0.478), while that of test area 1 is the lowest which does not coincide with the results of length error. Thus, a comprehensive evaluation must consider the image similarity and length error of river networks. Tables 3 and 4 provide evidence that the river length error in test area 2 is larger than in areas 1 and 3. This may be due to the fact that the average altitude of test area 2 is higher than those of areas 1 and 3 in the Hubei and Qinghai provinces. The accuracy of regression factors significantly influences the performance of the drainage density-FAT method. The satellite products that are bias-corrected are more accurate than the satellite-only products because the meteorological stations are densely distributed in flat areas. The performance of the MODIS satellite products is relatively poor in the high-altitude area with complex terrain [74]. Moreover, previous studies have shown that the ecosystem exhibits greater vulnerability [75] with increasing altitude, which may influence the land pattern indicators.
In addition, the drainage density-FAT method was compared with Lin's method, known as the Fitness Index method hereafter [19], the OnePercent method [20], and the Mean method [21]. Choosing test area 2 of Qinghai Province as an example, the FAT was estimated by the Fitness Index method at 135,000. The FAT was calculated as 1,884,010 × 0.01 = 18840.1 based on the maximum accumulation value according to the OnePercent method. The mean accumulation value for the FAT was 1908 according to the Mean method. Table 5 indicates that a drainage network of acceptable accuracy was obtained by the drainage density-FAT method. In addition, the drainage density-FAT method is less computationally burden than the Fitness Index method. The 10-fold cross-validation [76] was applied to quantify the variation in performance due to the choice of sample and test cases (choosing Qinghai province as an example). The results indicate the average length error estimated by different models ranges from 11.42% to 30.35%. Most river length errors are within 25%. The model accuracy would slightly fluctuate with the change of sample and test datasets, but its accuracy remains acceptable.
Next, test area 2 of Qinghai province was discarded from the dataset. The 10-fold crossvalidation was applied to the remaining dataset. The river length of test area 2 (Qinghai Province) was calculated by different models. The results indicate that the regression factors L 2 , SHDI, and SEI play a significant role in ten models, which is consistent with the factors selected by Equation (21). The parameters of the objective equation changed with different training groups, yet, the average length error estimated by models only slightly changed from 24.63% (estimated by Equation (21)) to 22.37%. This robust performance confirms the accuracy of the drainage density-FAT model.

Estimation of the FAT at the Regional Scale (Hubei Province)
Fractal geometry methods are applied in this study to estimate the optimal FAT by considering the morphologic characteristics of river networks (using Hubei Province data), followed by an evaluation of the corresponding results.

Estimation of the Optimal FAT with the Box-Counting Method
Box-counting was applied to ascertain the relation between the fractal dimension and the FAT (see Table 6). The fractal dimension of river networks whose cube sizes r equaled 500, 1000, 2000, 5000, 10,000, 15,000, and 20,000 were calculated. The values of r and FAT used in this study may be not suitable for other regions. The selection of the r and FAT data must be based on the river network data's statistical characteristics (i.e., drainage density, area). The variable difference between chosen FATs (e.g., 500, 1000, 2000, 5000, etc.) does not alter the box-counting method results. The fractal dimension of the river network declined from 1.851 to 1.432 as the FAT increased respectively from 1000 to 25,000, implying the fit between the fractal dimension and the FAT was excellent. It is seen in Table 6 the rate of change of the fractal dimension stabilizes at 10,000, and the change rate diminished more rapidly when the FAT is equal to or larger than 15,000. Thus, the optimal FAT is estimated to be close to 10,000 for the entire study area. According to Figure 6, the fractal dimension decreases with increasing FAT. When the FAT increases, the slope of the function decreases. The fractal dimension of the reference river network (i.e., the river network) was estimated to be 1.543 which corresponds to a FAT equal to 10,000. This is the ascertained optimal FAT of the study area established with the box-counting method.
RS Int. J. Geo-Inf. 2021, 10, 186 19 o used in this study may be not suitable for other regions. The selection of the r and F data must be based on the river network data's statistical characteristics (i.e., drain density, area). The variable difference between chosen FATs (e.g., 500, 1000, 2000, 5 etc.) does not alter the box-counting method results. The fractal dimension of the r network declined from 1.851 to 1.432 as the FAT increased respectively from 100 25,000, implying the fit between the fractal dimension and the FAT was excellent. It is s in Table 6 the rate of change of the fractal dimension stabilizes at 10,000, and the cha rate diminished more rapidly when the FAT is equal to or larger than 15,000. Thus, optimal FAT is estimated to be close to 10,000 for the entire study area. According to Figure 6, the fractal dimension decreases with increasing FAT. W the FAT increases, the slope of the function decreases. The fractal dimension of the re ence river network (i.e., the river network) was estimated to be 1.543 which correspo to a FAT equal to 10,000. This is the ascertained optimal FAT of the study area establis with the box-counting method.

Estimation of the Optimal FAT with Multifractal Analysis
River networks were extracted with FATs set equal to 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 15,000, 20,000, 25,000. The distribution index Q(ε) was calculated based on box sizes 500 × 500, 1000 × 1000, 2000 × 2000, 5000 × 5000, 10,000 × 10,000, 15,000 × 15,000, and 20,000 × 20,000. The q-th order moment was set in the range [−3,3], and the step ∆q was 0.5. Figure 7 displays thirteen sequences of mass exponents τ(q), thereby illustrating τ(q) is a convex function, which indicates there is a nonlinear function between τ(q) and q. Therefore, the river network in Hubei Province exhibits multiple fractal characteristics. The maximum singular exponent αmax exhibited a decreasing trend with increasing FAT. Figure 8 depicts a graph of the change of αmax vs. FAT. It is seen in Figure 8 there are three extremal points corresponding to FATs equal to 3000, 5000, and 9000. Thus 3000, 5000, and 9000 are possible optimal FATs of Hubei province. The optimal FAT was further estimated by the range of the singularity exponent ∆ . The results listed in Table 7  ∆ describes the degree of heterogeneity of the probability distribution of river networks. It is demonstrated below there is no significant difference between the probability density distributions of the actual river networks in Hubei Province, that is, their distributions are similar. Comparing the ∆ associated with FATs equal to 3000, 5000, and 9000, shows the ∆ changes significantly with changing FAT. ∆ is smaller when the FAT = 9000 than when it equals 3000 or 5000. A FAT = 9000 is estimated as being optimal within the entire study area given the heterogeneity of river networks in Hubei province.  The maximum singular exponent α max exhibited a decreasing trend with increasing FAT. Figure 8 depicts a graph of the change of α max vs. FAT . It is seen in Figure 8 there are three extremal points corresponding to FATs equal to 3000, 5000, and 9000. Thus 3000, 5000, and 9000 are possible optimal FATs of Hubei province. The optimal FAT was further estimated by the range of the singularity exponent ∆α. The results listed in Table 7 indicate α min ∈ [0.926,1.448], α max ∈ [2.033,2.937], f (α max ) ∈ [0.209,1.317], and f (α min ) ∈ [−0.239,0.951]. ∆α describes the degree of heterogeneity of the probability distribution of river networks. It is demonstrated below there is no significant difference between the probability density distributions of the actual river networks in Hubei Province, that is, their distributions are similar. Comparing the ∆α associated with FATs equal to 3000, 5000, and 9000, shows the ∆α changes significantly with changing FAT. ∆α is smaller when the FAT = 9000 than when it equals 3000 or 5000. A FAT = 9000 is estimated as being optimal within the entire study area given the heterogeneity of river networks in Hubei province. The maximum singular exponent αmax exhibited a decreasing trend with increasing FAT. Figure 8 depicts a graph of the change of αmax vs. FAT. It is seen in Figure 8 there are three extremal points corresponding to FATs equal to 3000, 5000, and 9000. Thus 3000, 5000, and 9000 are possible optimal FATs of Hubei province. The optimal FAT was further estimated by the range of the singularity exponent ∆ . The results listed in Table 7  ∆ describes the degree of heterogeneity of the probability distribution of river networks. It is demonstrated below there is no significant difference between the probability density distributions of the actual river networks in Hubei Province, that is, their distributions are similar. Comparing the ∆ associated with FATs equal to 3000, 5000, and 9000, shows the ∆ changes significantly with changing FAT. ∆ is smaller when the FAT = 9000 than when it equals 3000 or 5000. A FAT = 9000 is estimated as being optimal within the entire study area given the heterogeneity of river networks in Hubei province.    (1) minimum singular exponent; (2) fractal spectrum corresponding to αmin; (3) maximum singular exponent; (4) fractal spectrum corresponding to α max ; (5) range of the singularity exponent; (6) multifractal spectral elevation difference. Figure 9 demonstrates that multifractal spectral curves of the river networks have similar shapes with different thresholds. The multifractal spectrum f (α) first shows an increasing trend with increasing α, followed by a decreasing trend ∆f (α) < 0 (see Table 7). These results reveal the number of minimum probability subsets in the river networks in Hubei Province exceeds that of the maximum probability subsets. Therefore, the minimum probability subsets are dominant, indicating the distribution of differences between the river networks in all directions is small and the river networks are relatively homogeneous in most areas. This result is consistent with the actual conditions in Hubei province. (1) minimum singular exponent; (2) fractal spectrum corresponding to αmin; (3) maximum singular exponent; (4) fractal spectrum corresponding to αmax; (5) range of the singularity exponent; (6) multifractal spectral elevation difference. Figure 9 demonstrates that multifractal spectral curves of the river networks have similar shapes with different thresholds. The multifractal spectrum f(α) first shows an increasing trend with increasing α, followed by a decreasing trend Δf(α) < 0 (see Table 7). These results reveal the number of minimum probability subsets in the river networks in Hubei Province exceeds that of the maximum probability subsets. Therefore, the minimum probability subsets are dominant, indicating the distribution of differences between the river networks in all directions is small and the river networks are relatively homogeneous in most areas. This result is consistent with the actual conditions in Hubei province.

Validation
Five test areas were chosen to further verify the accuracy of the optimal FAT estimated by the box-counting and the multifractal method (the test areas are depicted in Figure 2a with the rectangular shape).

Validation
Five test areas were chosen to further verify the accuracy of the optimal FAT estimated by the box-counting and the multifractal method (the test areas are depicted in Figure 2a with the rectangular shape).
(1) Similarity score Table 8 displays the SSEI calculated by two algorithms which shows the degree of image matching between the actual and the extracted river networks. The river network extracted with the multifractal method achieved greater similarity with the actual river network than the river networks extracted with the box-counting method, featuring a lower similarity score than box-counting except in test area 02. (2) River length error The percentage errors associated with the river length among the 5 test areas with boxcounting and multifractal models are listed in Table 8. In terms of the box-counting model, the err(l) of test areas 03, 04, and 05 were over 25%. Compared with the average err(l) (26.18%) and RMSE (2.728 × 10 4 m) of box-counting, the average err(l) of the multifractal model's calculation of the river length of the five test areas equaled 21.44%, and the RMSE of the river length equaled 2.222 × 10 4 m.
The fractal dimension reflects the morphologic characteristics of basins and indicates the degree of land erosion and river network evolution. The larger the fractal dimension is, the more advanced the evolutionary stage of a river network. The fractal dimension calculated in this study was in the range of 1.432 ≤ D f ≤ 1.851. The method for classifying erosional stages proposed by He and Zhao (1996) suggested the basins of Hubei Province are in their early stage of erosion [77]. Lateral and down-cutting erosion played a central role during this stage.
Overall, the image similarity reflected by SSEI and err(l) indicate the modified multifractal performed better in estimating the optimal FAT than the modified box-counting model at the regional scale.

Discussion
This work's results provide a theoretical basis for the accurate determination of the FAT and river network extraction. The D8 algorithm is relatively simple and convenient and has found wide applicability in hydrologic analysis. Its popularity stems from its simplicity and reasonable representation for convergent flow conditions, and it preserves the consistency between the flow patterns, calculated contributing area, and spatial delineation of subcatchments [44,78]. Thus, it was applied in this study. Other algorithms (e.g., D∞ and MD∞) may improve the calculation of flow directions on a topographic surface, and their evaluation is the topic of future research. These authors will evaluate the performance of the multifractal method at the basin scale in future work.

Conclusions
The hydrologic analysis must produce extracted drainage networks with accuracy for numerous applications. Depending on the intended application the FAT is not unique. For other purposes, drainage networks at a more or less detailed level may be needed. This study extracted river networks with suitable accuracy with the drainage density-FAT and multifractal methods. The drainage density-FAT method is a quantitative procedure that considers the physiographic and geomorphologic features of basins. The fractal method can be seen as complementary to the drainage density-FAT method when the large areal extent precludes the use of basin-wide representative regression predictors. The methods' performances were evaluated in this work with statistical indexes and visual inspection, and compared with several FAT determination methods with data from Hubei and Qinghai Provinces, China. These authors conclude the density-FAT and the multifractal methods are useful for FAT determination. Based on this study's results it is concluded that: The drainage density-FAT method performed well by fully considering possible environmental, topographic, and human activity factors, and selected the significant regression factor from the 47 factors in basins where the regression factors are representative of basinwide characteristics. This method can be used to estimate the optimal FAT in areas where basin-wide regression factors are representative for the purpose of river network extraction. River initiation depends on environmental and topographical characteristics. The best regression factors in a region might not be the best in other regions due to differences in topography, climate, and human activities. Anthropogenic factors improve the extraction accuracy of river networks in urbanized regions. The topographic characteristics, specifically, the surface roughness, play a significant role in the estimation of FAT. The regression factors of the extraction models account for basin-wide features and can be obtained from accurate satellite datasets.
There are also limitations with the proposed drainage density-FAT method that may lead to systematic or random errors in extracting river networks. First, the model based on empirical scaling equation and statistical regression was developed in areas with abundant water resources (Hubei Province) and in headwater areas (Qinghai Province). The statistical association between river length and model parameters applied in this study may not always be represented accurately in dry areas or in non-urbanized regions. Second, the satellite datasets may introduce random errors in building the regression model used to estimate the FAT (errors that may arise from extreme weather and radiation-limited ecosystems). The drainage density-FAT model may not be applicable at large spatial scales when regression factors are not basin-wide representatives.
The fractal geomorphology is efficient in the estimation of the FAT, overcoming the limitation of the drainage density-FAT method at the regional scale. This work investigated the performance of two classical fractal methods on the estimation of the FAT. The multifractal method featured a better performance than the box-counting method. The parameters of the multifractal spectrum were applied to describe the hierarchical structure of river networks and highlight the variability of characteristics across study areas. The accuracy of fractal geomorphology in relatively small basins is not addressed in this study. This constitutes the theme of future work.
The methods proposed in this study are applicable with DEMs of resolutions other than the one used in this work. This work indicates, however, that river networks extracted from very high-resolution DEMs are the most accurate [30,36,79]. Experimental results using data from Hubei and Qinghai Provinces, China, provide us with insights as to how the regression and multifractal methods may be applied and expected to perform in other regions with spatially variable topographic and environmental characteristics. The drainage density-FAT method is applicable to dendritic networks, meanders, braided networks, and trellis networks in urbanized basins where the regression factors are representative of basin-wide characteristics. The drainage density-FAT method presented in this paper may have limitations in the extraction of coastal river networks. Complex environmental phenomena such as hydrologic alteration, landscape fragmentation, coastal cutting and filling, land subsidence, sea-level rise, and hurricane activity may introduce uncertainties in extracting coastal river networks [80]. Nevertheless, the accuracy of extracted river networks may be high in some instances by means of the density-FAT method applied to coastal regions. This paper applied the Frobenius norm matching algorithm, the percentage error of the river length combined with a conventional index (root mean square error) by the mining of the contents of images, and the river length error to evaluate the accuracy of extracted river networks. A thorough procedure for the evaluation of river-network extraction methods must consider image similarity and the length error of river networks, as herein demonstrated.
Supplementary Materials: The following are available online at https://www.mdpi.com/2220-996 4/10/3/186/s1, Table S1: The land use types of MCD12Q1 products. Table S2: The full name and corresponding abbreviation of lithology datasets. Table S3: All factors extracted from the original data (/= dimensionless). Table S4: All factors of Qinghai Province for estimating the river length and the FAT.