Model Prediction of the Soil Moisture Regime and Soil Nutrient Regime Based on DEM-Derived Topo-Hydrologic Variables for Mapping Ecosites

Ecosites are required for stand-level forest management and can be determined within a two-dimensional edatopic grid with soil nutrient regimes (SNRs) and soil moisture regimes (SMRs) as coordinates. This study introduces a new modeling method to map SNRs and SMRs at high resolution and then to derive ecosites in Nova Scotia, Canada. Using coarse-resolution soil maps and nine topo-hydrologic variables derived from high-resolution digital elevation model (DEM) data as model inputs, 511 artificial neural network (ANN) models were developed with 10-fold cross-validation on 1507 field samples to estimate 10 m resolution SNR and SMR maps. The optimal models for mapping SNR and SMR used eight and seven topo-hydrologic variables, respectively, together with three coarse-resolution soil maps, as model inputs; 82% of model-estimated SNRs were identical to field assessments, compared with 61% for SMRs, and the resulting ecosite maps were 67-68% correct. According to the error matrix, the predicted SNR and SMR maps greatly reduced poor prediction in areas with extreme nutrient or moisture conditions (e.g., very poor or very rich, wet, or very dry). Thus, the new method for modeling high-resolution SNR and SMR could be used to produce ecosite maps in areas that are difficult to access.


Introduction
Ecosites, as the stand-level forest units within the ecological land classification system, define a suite of site conditions that represent the general productivity of forest lands and describe an ecological setting for grouping forest and soil classes [1]. Ecosite maps are required for stand-level forest management and planning, such as forest best management practices and timber supply analysis [2]. Ecosites are identified in the field mainly from a number of easily observable topographic, soil, and vegetation indicators [3]. However, producing ecosite maps with adequate resolution and accuracy by interpolating field-identified ecosite points requires a large number of field surveys, because ecosites vary strongly across landscapes. Ecosites can instead be determined within a two-dimensional edatopic grid using soil nutrient regimes (SNRs) and soil moisture regimes (SMRs) as coordinates [4]. If SNRs and SMRs can themselves be mapped at high resolution, high-resolution ecosite maps can be produced without a large number of field soil surveys.
SNRs and SMRs are indices that reflect the combined effects of climatic, topographic, geological, and biophysical factors on forest growth and yield [5]. They are more tangible and operationally useful for mapping ecosites than plant-based methods [6]. In addition to field assessments using easily observable site features, indicator vegetation, and easily identified soil properties [7], many studies have focused on model predictions of SNRs and SMRs. These models are often based on interpolation schemes [8] and statistics-based schemes [9,10] using a variety of predictors, including field-based plant indicators [7], model-based clay content [9], model-based soil drainage [11], remote sensing data [12], and map-based soil texture [10], with varying numbers of classes, map resolutions, and model accuracies. However, few modeling studies have estimated SNRs and SMRs at both high resolution (i.e., ≤10 m) and high accuracy using easily accessible predictors.
It has long been known that soil properties at a landscape or regional level can be strongly affected by bedrock and geology, because soil particles and nutrient elements are derived mainly from bedrock and parent material deposits [13]. At a local level, hydrological processes associated with topography can heavily modify soil properties, because the weathering products of rock and parent material, such as nutrient elements, sediment, and soil particles, are continually transported by water moving along topographic gradients [14]. Some researchers [15,16] reported that existing coarse-resolution soil maps can represent general information on bedrock, parent materials, and geological formations that relates to average soil properties at a large scale, whereas topo-hydrological parameters derived from high-resolution digital elevation model (DEM) data can capture the hydrological processes that tune soil properties at a local level. It may therefore be possible to model SNRs and SMRs from existing coarse-resolution soil maps and high-resolution DEM data. However, the relationships between SNRs and SMRs and these predictors are complex and sometimes unknown. Existing empirical models and interpolation methods for mapping SNRs and SMRs may obscure the true relationships because they rely on hypotheses that predetermine the form of those relationships [17]. In recent years, artificial intelligence models, especially artificial neural networks (ANNs), have proven effective at capturing complex relationships among multisource inputs without such hypotheses, and ANN models have been increasingly used to map soil properties such as soil texture [18], soil drainage [19], and soil electrical conductivity [20]. Thus, a combined scheme, i.e., an ANN model built with coarse-resolution soil maps and predictors derived from easily accessible DEM data, should be able to estimate high-resolution SNRs and SMRs.
The main objective of this research was to introduce new models to estimate SNRs and SMRs for mapping forest ecosites at high resolution. The specific objectives were to (1) model SNRs and SMRs using coarse-resolution soil maps and high-resolution DEM-derived variables with ANN models and (2) map ecosites using the produced SNR and SMR maps and test their accuracy.

Study Area
The province of Nova Scotia, Canada, covers about 5.5 × 10^6 ha, 78% of which is forested. Its climate is predominantly continental, with minor influence from the Atlantic Ocean. Different combinations of topography, parent material, and climate have produced a variety of soil and forest types within the province. Hundreds of short rivers, streams, and lakes dot the countryside and add to the complexity of the landscape. The most common mixed-forest types in the region are made up of major conifers, such as pines (Pinus spp.), balsam fir (Abies balsamea), and spruce (Picea spp.), and common hardwoods, such as aspen (Populus spp.), birch (Betula spp.), and maple (Acer spp.). The moisture and nutrient conditions of the common soils in the area, such as Regosols, Podzols, Gleysols, Brunisols, and Luvisols, vary with topographic position and soil texture [1].

Sample Plots
Field samples for this study came from 1507 fixed-area forest ecosystem classification (FEC) plots surveyed by the Nova Scotia Department of Natural Resources since 2000 [21]. The plots were established to cover the full range of SNRs and SMRs expected within the province and to produce an accurate FEC system. Combinations of field-identified soil properties, including soil depth, soil texture, forest floor humus form, and A horizon type, are used to assign SNR and SMR classes according to the FEC guides. SMRs included seven classes: wet, moist/wet, moist, fresh/moist, fresh, dry, and very dry. SNRs included five classes: very rich, rich, medium, poor, and very poor. The ecosite type of each plot was assigned from the field-identified vegetation type and the field-assessed SNR and SMR classes within the rectangle-based edatopic grid. Based on tree growth data from all FEC plots, the forested area of Nova Scotia was divided into the Maritime Boreal and Acadian regions, each with 10 ecosites [11].
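Because R² and RMSE are reported for class predictions, the classes are presumably coded as ordered numbers. A minimal sketch of such a coding, assuming consecutive integers ordered from the driest or poorest class upward (the ordering direction and the helper names `encode` and `decode` are our assumptions, not taken from the FEC guides), shows how a continuous network output can be mapped back to a class:

```python
# Ordinal class lists from the FEC description above. Ordering them from the
# driest/poorest class upward is an assumption made for illustration.
SMR_CLASSES = ["very dry", "dry", "fresh", "fresh/moist",
               "moist", "moist/wet", "wet"]
SNR_CLASSES = ["very poor", "poor", "medium", "rich", "very rich"]


def encode(label, classes):
    """Return the 0-based ordinal index of a class label (hypothetical helper)."""
    return classes.index(label)


def decode(value, classes):
    """Round a continuous model output to the nearest class and return its label."""
    idx = int(round(value))
    idx = min(max(idx, 0), len(classes) - 1)  # clip outputs beyond the class range
    return classes[idx]


print(encode("fresh", SMR_CLASSES))   # 2
print(decode(3.4, SNR_CLASSES))       # rich
print(decode(7.9, SMR_CLASSES))       # wet
```

Python's `round()` rounds exact halves to the nearest even integer, so an output of exactly 2.5 would map to class 2; a different tie rule may be preferable in practice.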

Predictors for Modeling
The candidate model predictors in this study comprised coarse-resolution soil data and high-resolution DEM-derived variables. Coarse geology, landscape, and topography variables were obtained from the biophysical land classification map compiled by the Nova Scotia Department of Natural Resources [22]. The map was delineated from 1:50,000-scale aerial photographs, and each map unit was defined as an area of forest land with a recurring pattern of landforms, soils, and vegetation. Polygon size varied with topography but typically ranged from 400 to 1600 ha. Coarse geology represented the major rock types associated with the origin of the soils, including sedimentary (six types), metamorphic (five types), and igneous (six types) rocks. The landscape layer delineated conspicuous topographic features, including glacial (six types), organic (four types), marine and glacial marine (four types), alluvial (five types), and glacial fluvial (six types) features. The topography layer reported the topographic pattern, namely, smooth, hummocky, knob and knoll, drumlins, dissections, and ridged. The DEM data were obtained from Service Nova Scotia, Canada; the original elevation points were compiled from SPOT 5 images and then interpolated into a 10 m resolution grid. Using forest hydrology tools and the Spatial Analyst extension of ArcGIS [23], nine topo-hydrological variables were derived from the DEM data [18], namely, flow direction (FD), potential solar radiation (PSR), flow length (FL), aspect, depth to water (DTW), sediment delivery ratio (SDR), slope, topographic position index (TPI), and soil terrain factor (STF).
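Two of the nine variables, TPI and slope, can be illustrated with plain NumPy. The sketch below is not the ArcGIS workflow used in the study: it assumes a simplified TPI (cell elevation minus the mean of a square window that includes the centre cell, with edge cells repeated at the border) and slope from central differences on the 10 m grid. The function names and the window radius are hypothetical.

```python
import numpy as np


def tpi(dem, radius=1):
    """Simplified topographic position index: elevation of each cell minus the
    mean elevation of its (2*radius+1) x (2*radius+1) window (centre included).
    Edges are handled by repeating the border cells."""
    dem = np.asarray(dem, dtype=float)
    size = 2 * radius + 1
    padded = np.pad(dem, radius, mode="edge")
    window_sum = np.zeros_like(dem)
    for di in range(size):
        for dj in range(size):
            window_sum += padded[di:di + dem.shape[0], dj:dj + dem.shape[1]]
    return dem - window_sum / size**2


def slope_degrees(dem, cell_size=10.0):
    """Slope in degrees from central differences on a grid with cell_size spacing (m)."""
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_drow, dz_dcol)))


# A 3 x 3 knoll: the centre sits above its neighbourhood, so its TPI is positive.
knoll = np.array([[10, 10, 10], [10, 20, 10], [10, 10, 10]])
print(tpi(knoll)[1, 1] > 0)           # True
# A plane rising 1 m per 10 m cell has a slope of atan(0.1), ~5.71 degrees.
plane = np.tile(np.arange(5.0), (5, 1))
print(slope_degrees(plane)[2, 2])
```

Positive TPI marks cells above their surroundings (ridges, knolls) and negative TPI marks cells below them (valleys); the window size controls the scale of the landforms detected.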

Artificial Neural Network Models
ANNs were used to estimate SNRs and SMRs because they can approximate arbitrary nonlinear mappings between model inputs and outputs without predefined hypotheses [24]. A three-layer structure was used for the ANNs in this research (Figure 1): an input layer containing the predictor variables, an output layer consisting of the estimated SNRs and SMRs, and a hidden layer whose nodes linked the input and output layers and reflected the complexity of the ANN. The ANNs were trained by backpropagation with the Levenberg-Marquardt algorithm [25].
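The three-layer structure and Levenberg-Marquardt training can be sketched in NumPy. This is a minimal illustration, not the authors' MATLAB implementation: the tanh hidden activation, the single linear output node, the forward-difference Jacobian (used here instead of analytic backpropagation), the damping schedule, and the toy data are all assumptions, and early stopping is omitted.

```python
import numpy as np


def n_params(n_in, n_hidden):
    """Hidden weights + hidden biases + output weights + output bias."""
    return n_hidden * n_in + n_hidden + n_hidden + 1


def predict(theta, X, n_hidden):
    """Input layer -> tanh hidden layer -> single linear output node."""
    n_in = X.shape[1]
    k = n_hidden * n_in
    W1 = theta[:k].reshape(n_hidden, n_in)
    b1 = theta[k:k + n_hidden]
    w2 = theta[k + n_hidden:k + 2 * n_hidden]
    b2 = theta[-1]
    return np.tanh(X @ W1.T + b1) @ w2 + b2


def jacobian(theta, X, n_hidden, eps=1e-6):
    """Forward-difference Jacobian of the network outputs w.r.t. the parameters."""
    base = predict(theta, X, n_hidden)
    J = np.empty((X.shape[0], theta.size))
    for j in range(theta.size):
        shifted = theta.copy()
        shifted[j] += eps
        J[:, j] = (predict(shifted, X, n_hidden) - base) / eps
    return J


def train_lm(X, y, n_hidden=5, n_iter=50, mu=1e-2, seed=0):
    """Levenberg-Marquardt: solve (J^T J + mu*I) delta = J^T r and adapt mu.
    Returns the fitted parameters and the mean-squared error after each iteration."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.5, size=n_params(X.shape[1], n_hidden))
    loss = np.mean((y - predict(theta, X, n_hidden)) ** 2)
    history = [loss]
    for _ in range(n_iter):
        r = y - predict(theta, X, n_hidden)          # residuals
        J = jacobian(theta, X, n_hidden)
        A = J.T @ J + mu * np.eye(theta.size)
        candidate = theta + np.linalg.solve(A, J.T @ r)
        new_loss = np.mean((y - predict(candidate, X, n_hidden)) ** 2)
        if new_loss < loss:          # accept: behave more like Gauss-Newton
            theta, loss, mu = candidate, new_loss, max(mu * 0.1, 1e-10)
        else:                        # reject: behave more like gradient descent
            mu *= 10.0
        history.append(loss)
    return theta, history


# Toy data standing in for predictors (columns) and an encoded class (target).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
theta, history = train_lm(X, y)
print(history[0], history[-1])
```

The damping term mu blends Gauss-Newton steps (small mu) with short gradient-descent steps (large mu); steps that do not reduce the error are rejected, so the recorded error never increases.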
The candidate predictors made up the model inputs, and each candidate combination of inputs corresponded to one ANN model. The coarse-resolution soil data were included in every combination, whereas the DEM-derived topo-hydrologic variables were added in combinations of one to nine variables. For example, there were nine combinations when taking one of the nine DEM-derived variables at a time (C(9,1) = 9) and 84 combinations when taking six of the nine at a time (C(9,6) = 84). Thus, there were a total of 511 combinations (C(9,1) + C(9,2) + C(9,3) + C(9,4) + C(9,5) + C(9,6) + C(9,7) + C(9,8) + C(9,9) = 2^9 − 1 = 511), corresponding to 511 ANNs.
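The enumeration of candidate input sets can be checked with the Python standard library. In this sketch the soil-map labels are shorthand chosen for illustration; the DEM variable abbreviations follow the Predictors section.

```python
from itertools import combinations
from math import comb

# The nine DEM-derived variables listed in the Predictors section.
DEM_VARS = ["FD", "PSR", "FL", "Aspect", "DTW", "SDR", "Slope", "TPI", "STF"]
# The three coarse-resolution soil maps, present in every input set.
SOIL_MAPS = ["geology", "landscape", "topography"]


def input_sets():
    """Yield each candidate input set: all soil maps plus k DEM variables, k = 1..9."""
    for k in range(1, len(DEM_VARS) + 1):
        for dem_subset in combinations(DEM_VARS, k):
            yield SOIL_MAPS + list(dem_subset)


sets = list(input_sets())
print(len(sets))                                          # 511
print(comb(9, 6))                                         # 84
print(sum(comb(9, k) for k in range(1, 10)) == 2**9 - 1)  # True
```

The total equals the number of non-empty subsets of nine variables, which is why it comes to 2^9 − 1.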

Model Calibration, Validation, and Assessment
A 10-fold cross-validation [16] was used to build the ANNs and evaluate model performance with a total of 1507 field-assessed SNRs and SMRs. The entire dataset (1507 plots) was randomly divided into 10 approximately equal subsets; nine subsets (the calibration data) were used to calibrate an ANN model, and the remaining subset was used to validate it. This process was repeated 10 times until every subset had served once as validation data. Because the ANNs were developed with early stopping, within each model-building process (each "fold") the calibration data were randomly subdivided into a testing dataset (15% of the calibration data) and a training dataset (85%), and this subdivision was repeated 100 times to obtain the optimal prediction model for that fold.
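The nested data splitting can be sketched as an index generator. This is a minimal illustration under stated assumptions: the generator name, the random seed, and the use of `np.array_split` (which gives folds of 151 or 150 plots for 1507 samples) are ours. Inside each repeat, a network would be trained on the training indices with early stopping on the test indices, and the best of the 100 repeats would be kept and scored on the validation fold.

```python
import numpy as np


def kfold_with_inner_split(n_samples, k=10, test_frac=0.15, n_repeats=100, seed=0):
    """Yield (fold, repeat, train_idx, test_idx, val_idx) index arrays.

    Outer loop: k roughly equal folds; each fold is the validation set once.
    Inner loop: the remaining (calibration) indices are randomly re-split into
    training (1 - test_frac) and early-stopping test (test_frac) subsets.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for f, val_idx in enumerate(folds):
        calib = np.concatenate([folds[j] for j in range(k) if j != f])
        n_test = int(round(test_frac * calib.size))
        for r in range(n_repeats):
            shuffled = rng.permutation(calib)
            yield f, r, shuffled[n_test:], shuffled[:n_test], val_idx


f, r, train_idx, test_idx, val_idx = next(kfold_with_inner_split(1507))
print(train_idx.size, test_idx.size, val_idx.size)   # 1153 203 151
```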
Three accuracy metrics from the 10-fold cross-validation were used to screen the optimal ANN models: the coefficient of determination (R²), root-mean-squared error (RMSE), and overall accuracy (OA). Building the ANNs, assessing model accuracy, and screening the ANNs were programmed in MATLAB.
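The three screening metrics can be written down compactly. The sketch assumes classes are coded as ordinal integers and uses the common definition R² = 1 − SS_res/SS_tot; the study may instead have used the squared correlation between observed and predicted values, which can differ. The function names are hypothetical.

```python
import numpy as np


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (one common definition)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(obs, pred):
    """Root-mean-squared error in class units."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def overall_accuracy(obs, pred):
    """Share of plots whose predicted class equals the field-assessed class."""
    return float(np.mean(np.asarray(obs) == np.asarray(pred)))


field = [0, 1, 2, 3]      # field-assessed classes (ordinal indices)
model = [0, 1, 2, 2]      # model-estimated classes
print(r_squared(field, model), rmse(field, model), overall_accuracy(field, model))
# 0.8 0.5 0.75
```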

Mapping Forest Ecosites
Before mapping forest ecosites, the entire province was divided into the Acadian and Maritime Boreal regions according to Table 1 in Keys [1]. For each map unit (a 10 m × 10 m grid cell, equal to the size of an FEC plot) within each region, the model-estimated SNR and SMR were used to project the map unit onto the rectangle-based edatopic grid [11], and the ecosite type that best matched its position on the grid was assigned. The produced ecosite maps were assessed against field-identified ecosites from the FEC plots.
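The projection onto the edatopic grid amounts to a lookup keyed by the (SNR, SMR) pair of each map unit. The sketch below simplifies "best match" to an exact cell lookup, and its grid entries are invented placeholders, not the Nova Scotia FEC grid; a separate grid would be needed for each region.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical edatopic grid: (SNR class, SMR class) -> ecosite label.
# These entries are invented placeholders, NOT the Nova Scotia FEC grid;
# a separate grid would be needed for each region (Acadian, Maritime Boreal).
EdatopicGrid = Dict[Tuple[str, str], str]


def assign_ecosite(snr: str, smr: str, grid: EdatopicGrid) -> Optional[str]:
    """Return the ecosite for one map unit's (SNR, SMR) cell, or None if unmapped."""
    return grid.get((snr, smr))


def map_ecosites(snr_map: List[List[str]], smr_map: List[List[str]],
                 grid: EdatopicGrid) -> List[List[Optional[str]]]:
    """Apply the lookup cell by cell to two equally shaped class rasters."""
    return [[assign_ecosite(n, m, grid) for n, m in zip(n_row, m_row)]
            for n_row, m_row in zip(snr_map, smr_map)]


toy_grid = {("poor", "dry"): "ecosite A", ("medium", "fresh"): "ecosite B"}
print(map_ecosites([["poor", "medium"]], [["dry", "fresh"]], toy_grid))
# [['ecosite A', 'ecosite B']]
```

Where an edatopic cell is shared by several ecosites, a real implementation would need an additional rule (or vegetation information) to break the tie.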

SMR Model
Prediction accuracies of the 511 ANN models for SMR, assessed with 10-fold cross-validation, are shown in Table 1. In general, model performance improved markedly as DEM-derived variables were added. However, the improvement was limited once the number of DEM-derived variables exceeded a certain number. The OA of the ANN models increased as DEM-derived variables were added from one to seven, taking values of 41%, 45%, 45%, 48%, 52%, 56%, and 61%, respectively, but decreased slightly when the eighth and ninth variables were added. RMSE and R² followed the same trend. This result was not surprising because the DEM-derived variables were added in the same order as their importance for SMR. Thus, the best model for estimating SMR classes in Table 1 was the ANN that used seven DEM-derived variables, together with three coarse-resolution soil maps, as model inputs.
The best combinations of DEM-derived variables for SMR in Table 1 also showed that TPI was the most important DEM-derived variable for estimating SMRs: together with the three coarse-resolution soil maps, it directly accounted for 53% of the total variation, followed by SDR and slope, which accounted for 10% and 5%, respectively. This result was expected, since the spatial distribution of SMR is strongly affected by the local topography that governs surface water movement [21], and it agrees with a similar study [26].
An error matrix was calculated from field-assessed SMRs at the FEC plots and model-estimated SMRs from the optimal ANN with seven DEM-derived variables in Table 1 (Table 2). According to this matrix, 61% of model-estimated SMRs were identical to field assessments, and 90% were within ±1 class.
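The error-matrix statistics used here and in the comparison with earlier studies (exact agreement, agreement within ±1 class, and the kappa index) can be computed as follows. This sketch assumes rows hold field-assessed classes and columns hold model-estimated classes, and it takes the kappa index to be unweighted Cohen's kappa; both are assumptions, and the function names are hypothetical.

```python
import numpy as np


def error_matrix(obs, pred, n_classes):
    """Rows: field-assessed class; columns: model-estimated class (0-based indices)."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for o, p in zip(obs, pred):
        m[o, p] += 1
    return m


def within_one_class(m):
    """Share of plots on the diagonal or one class away from it."""
    i, j = np.indices(m.shape)
    return float(m[np.abs(i - j) <= 1].sum() / m.sum())


def cohen_kappa(m):
    """Unweighted Cohen's kappa: observed agreement corrected for chance agreement."""
    n = m.sum()
    p_o = np.trace(m) / n
    p_e = np.sum(m.sum(axis=0) * m.sum(axis=1)) / n**2
    return float((p_o - p_e) / (1 - p_e))


field = [0, 1, 2, 2]
model = [0, 1, 1, 2]
m = error_matrix(field, model, n_classes=3)
print(m.tolist())                # [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
print(within_one_class(m))       # 1.0
print(round(cohen_kappa(m), 3))  # 0.636
```

Kappa is lower than the raw agreement (0.75 here) because it discounts matches expected by chance from the class frequencies, which is why it exposes predictions that clump into the middle classes.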

SNR Model
The performance of the ANN models in estimating SNRs is also reported in Table 1 and shows patterns similar to those of the SMR models. Up to a certain number of variables, SNR model accuracy also clearly improved as more DEM-derived variables were used. The optimal model for estimating SNR was the ANN that used eight DEM-derived variables, together with three coarse-resolution soil maps, as model inputs, with an OA of 90%, R² of 0.81, and RMSE of 0.51.
The results of screening DEM-derived variables for SNR in Table 1 showed that, as with SMR, TPI was the most important DEM-derived variable for estimating SNR; together with the three coarse-resolution soil maps, it accounted for 51% of the total variation, followed by aspect, which accounted for 12%. These results were generally expected because soil nutrients are usually redistributed along hillslopes at a local level.
When the optimal ANN in Table 1 with eight DEM-derived variables was used to produce the SNR map, the error matrix showed that 90% of model-estimated SNRs were identical to field assessments, and 99% were within ±1 class (Table 2).

Mapped Forest Ecosite
After each map unit (10 m × 10 m grid cell) was projected onto the rectangle-based edatopic grid using its model-estimated SNR and SMR classes, the accuracies of the produced ecosite maps for the Maritime Boreal and Acadian regions were assessed (Figure 2). For the 10 Acadian region ecosites, overall correctness was 67%, ranging from 23% to 76% across ecosites. Ecosites offset from the middle of the edatopic grid, e.g., ecosites 1 and 10, were less well represented than those near the center, such as ecosites 5 and 6 (Figure 3). For the 10 Maritime Boreal region ecosites, overall correctness was 68%, ranging from 25% to 94%. Most ecosites along the edges of the edatopic grid, such as ecosites 9 and 10, were less well represented than those in the center, such as ecosites 5 and 6 (Figure 3).

The Performance of SMR and SNR Models and Ecosite Maps
The accuracy of the produced SMR map, i.e., 61% correctness (with 90% within ±1 class), was higher than the 48% correctness (with 83% within ±1 class) reported when SMRs were derived from modeled soil drainage classes using the same database as this study [11], and higher than the 55% correctness (with 94% within ±1 class) reported when six-class SMRs were estimated from plant indicators [7], but slightly lower than the 65% correctness reported for four-class SMRs generated by a rule-based GIS model [10]. Compared with Yang's error matrix [11], many very dry plots in the estimated SMR map were still classified as dry or fresh, and several wet plots were classified as moist or moist/wet in this study; nevertheless, accuracy increased from 0% to 37% for very dry plots and from 23% to 50% for wet plots. The biggest problem in Yang's report [11] was that estimated SMRs clumped into the middle SMR classes (Figure 3), producing too many mesic plots on the SMR map; this problem was greatly alleviated in this study, as confirmed by the increase in the overall kappa index from 0.24 to 0.45. Having exploited the potential of DEM-derived data in this study, introducing plant information, as suggested by Wang's research [7], would be a good next step for improving model predictions.
The accuracy of the produced SNR map, 82% correctness (with 99% within ±1 class), was markedly higher than the 58% correctness reported when SNRs were derived from modeled clay content using the same database as this study [9], and higher than the 59% correctness obtained by Wang [7] using plant-based indicators. As with SMRs, accuracy increased from 14% to 44% for very poor plots and from 19% to 44% for very rich plots, and the overall kappa index increased from 0.47 to 0.76, confirming that the produced SNR map greatly alleviated the clumping into the middle SNR classes reported in Yang's research [9] (Figure 3). Because most ecosites span more than one SNR class [11] (for instance, the SNR for Maritime Boreal region ecosite 1 ranges from poor to very poor), the 82% correctness of model-estimated SNRs, with 99% within ±1 class, should be sufficient for mapping ecosites in this study.
The accuracy of the produced ecosite maps (67-68% correctness) was higher than the 59-61% correctness reported by Yang [11] using the same database as this study, and close to the 66-70% correctness reported by MacMillan [27] for 25 m resolution ecosite maps of British Columbia, Canada, produced with a mix of manual and automated procedures that encoded ecology-landform relationships in rule-based models.

Effects of Coarse-Resolution Soil Maps and High-Resolution DEM-Derived Maps
Coarse-resolution geology, landscape, and topography maps were included in every ANN model input to represent general regional conditions. The boundaries of these maps remained visible in the produced SNR, SMR, and ecosite maps (Figure 4g-j), reflecting their influence on SNR, SMR, and ecosite distributions. For instance, the estimated SNRs (Figure 4g) within an area of shale-based soil (Figure 4b) were clearly richer than those of granite- and quartzite-based soils (Figure 4b), and the estimated SMRs (Figure 4h) within the area of granite-based soil (Figure 4b) were generally drier than those of soils formed on other parent materials. Consequently, the nutrient-poor, dry area of granite-based soil (Figure 4b) contained more of Acadian region ecosite 1 (Figure 4i,j), in which jack pine/black spruce is the main forest type [1]. These results show that the coarse-resolution maps affected the distributions of SNR, SMR, and ecosites at a large scale.
High-resolution DEM-derived predictors were included in every ANN model input to express how soil nutrients and moisture are redistributed by hydrological processes within the local topography. On the one hand, the TPI (Figure 4d), aspect (Figure 4e), and SDR (Figure 4f) derived from the DEM (Figure 4c) depicted in detail the relative topographic position, the direction of the steepest slope, and the efficiency of sediment transport, respectively, showing that DEM-derived topo-hydrological predictors can capture the hydrological processes that tune soil properties at a local level. On the other hand, compared with the existing coarse-resolution soil maps (Figure 4b), the estimated SNR, SMR, and ecosite maps (Figure 4g-j) contained more detail and reflected the influence of the high-resolution topo-hydrological maps. For example, the estimated SNR map (Figure 4g) and SMR map (Figure 4h) resembled the TPI map (Figure 4d), indicating that TPI played an important role in the distribution of SNR and SMR, in keeping with the screening of model inputs discussed above. As Figure 4d also shows, TPI values indicated ridges in areas far from waterways, which typically corresponded to poor SNR classes (Figure 4g) and dry SMR classes (Figure 4h), whereas TPI values indicating valleys usually corresponded to wet SMR classes. These variations cannot be distinguished in the coarse-resolution maps, which assign a single value to each polygon. These results show that the high-resolution topo-hydrological predictors shaped the detailed distributions of SNR, SMR, and ecosites.

Limitations and Future Improvements
Analysis of the plot data indicated that the lower accuracies in areas of extreme conditions, such as ecosites 1, 3, 7, and 10 from the Acadian region and ecosites 9 and 10 from the Maritime Boreal region, were probably caused mainly by errors in the model estimates of SNR and SMR (Figure 3). This indicates that extreme nutrient and moisture conditions, such as very rich/very poor and very dry/wet, were difficult to capture using only the models established in this study, which is consistent with the error analysis discussed above (Table 2). Since 80-84% of the Acadian and Maritime Boreal region ecosites could be correctly assigned using field-assessed SNRs and SMRs alone (Figure 1), and the remainder (16-20%) depended on the field-identified vegetation type, introducing vegetation information in the future could improve the accuracy of model-estimated ecosites in areas with extreme conditions. In addition to high-resolution vegetation information from remote sensing imagery, ecozone, ecoregion, and ecodistrict maps from the current ecological land classification system of Nova Scotia, which capture macroclimatic and macro-topographic differences in vegetation [28], may also be good candidates.
Although the best combinations of the nine DEM-derived terrain attributes, together with three coarse-resolution soil maps, accounted for 81% of the total variation in SNR and 78% of that in SMR, introducing more topo-hydrological indices [29] and more advanced neural networks, e.g., deep neural networks with multiple hidden layers and improved training algorithms [30] (if a large amount of field data is available), could improve model performance. It should also be noted that combining k-fold cross-validation with early stopping can reduce overfitting (which causes poor generalization outside the training zone) to some extent, so the models should perform well in areas whose environment resembles the training zone; however, their predictions would not be justified in areas with a significantly different environment. Thus, the best SNR and SMR models for Nova Scotia should be retrained before being applied to such areas. Furthermore, an ANN is a kind of "black box" whose internal behavior is difficult or impossible to interpret [31]; deriving equations from the ANNs [32] to describe how the various predictors affect SNR and SMR is therefore a task for future work.

Conclusions
A new modeling method was introduced to map SNRs and SMRs at high resolution and then to produce ecosite maps in Nova Scotia, Canada. Using the biophysical land classification maps and high-resolution DEM-derived variables as model inputs, 511 ANNs were developed and evaluated on 1507 field plots with 10-fold cross-validation. The best models for mapping SMRs and SNRs used seven and eight DEM-derived variables, respectively, together with three coarse-resolution soil maps, as model inputs; 61% of model-estimated SMRs were identical to field assessments, compared with 82% for SNRs. The Acadian region and Maritime Boreal region ecosite maps were 67% and 68% correct, respectively. These accuracies were generally higher than those obtained in previous studies. Given the role of field-identified vegetation type in assigning SMR, SNR, and ecosite type to plots on the edatopic grid, introducing forest-cover information in the future could improve the accuracy of model-estimated ecosites in areas with extreme conditions. Meanwhile, introducing more topo-hydrological indices and applying deep learning neural networks could further improve model performance; however, the best ANNs for Nova Scotia will need to be retrained before they are applied to areas with a significantly different environment.