Impacts of Soil Information on Process-Based Hydrological Modelling in the Upper Goukou Catchment, South Africa

Although soils form an integral part of landscape hydrological processes, the importance of soil information in hydrological modelling is often neglected. This study investigated the impact of soil information on streamflow modelling accuracy and hydrological process representation. Simulations based on two different levels of soil information were compared against long-term streamflow in the upper Goukou catchment (230 km²), South Africa, over a period of 23 years using the Soil and Water Assessment Tool (SWAT+). The land-type soil map (LTSM) dataset was less detailed and was derived from the best readily available soil dataset currently covering South Africa. The hydrological soil map (HSM) dataset was more detailed and was created using infield hydropedological soil observations combined with digital soil-mapping techniques. Monthly streamflow simulation was similar for both soil datasets, with Nash–Sutcliffe efficiency and Kling–Gupta efficiency values of 0.57 and 0.59 (HSM) and 0.56 and 0.60 (LTSM), respectively. It is, however, important to assess which hydrological processes generated these streamflow values and how those processes were distributed spatially within the catchment. Upon further assessment, the representation of hydrological processes within the catchment differed greatly between the two datasets, with the HSM more accurately representing the internal hydrological processes, as it was based on infield observations. It was concluded that hydropedological information could be of great value in effective catchment management strategies since it improves the representation of internal catchment processes.


Introduction
Soils are a primary control on hydrological processes within a landscape because they partition precipitation into the different components of the water balance. This is the result of the ability of soils to absorb, store, and transmit water at different spatial and temporal scales [1]. Hydrological processes in the soil determine the volume, variability, and residence times of water resources within a landscape, which in turn affect the functionality and diversity of ecosystems [2]. However, because measuring hydrological processes at the landscape scale is logistically impractical, these processes are most practically described using hydrological models that simplify and represent real-world hydrological systems [3,4].
As the need for more sustainable water resources has grown, the need for more accurate hydrological models has followed [5,6], particularly in ungauged basins [7]. Soil information is an important input parameter in physically-based hydrological models [8,9], yet the required soil information is often not readily available [10]. Reasons include that existing soil maps were not primarily designed for hydrological modelling purposes [11], and detailed hydrological soil surveys are costly and time-consuming to conduct. Hence, soil information employed in hydrological modelling has remained relatively crude compared
The 230 km² upper Goukou catchment is located within the Western Cape province of South Africa. Elevation of the catchment area is between 92 and 1512 m.a.s.l. (Figure 1), and it forms part of the mountains of the Cape Fold Belt [38]. The predominant geological formations include quartzitic sandstone from the Table Mountain Group, shale and siltstone from the Bokkeveld Group, as well as conglomerate and sandstone from the Uitenhage Group [39]. The region experiences a bimodal rainfall pattern with peaks in spring and autumn. Annual rainfall is strongly related to topography and ranges from 1200 to 1400 mm on the highest mountain peaks, while southern slopes receive between 500 and 800 mm on average [40]. Daily maximum temperatures average 22 °C in January (summer) and 16 °C in July (winter) [41]. Average daily minimum temperatures are approximately 15 °C in January and 7 °C in July [41]. Vegetation in the catchment primarily consists of Fynbos and Renosterveld as well as an alluvial wetland [42].

SWAT+ Setup, Topography, Land-Use, Streamflow, and Climate Inputs
SWAT+ (rev 60.5.2), downloaded from https://swat.tamu.edu/software/plus/ (accessed on 15 April 2021), was used as the hydrological model within QSWAT+ (v. 2.0.6) to set up the watershed.
SWAT+ is a process-based, semi-distributed catchment-scale model and is a revised version of the well-known Soil and Water Assessment Tool (SWAT) [43]. The SWAT model is widely used to simulate water quality and quantity and to assess the impacts of physical changes to a catchment, such as land use and climate changes. SWAT+ divides the catchment into landscape units (LSUs), each of which comprises a number of similar hydrological response units (HRUs). An HRU is a homogeneous area in terms of soils, land use, and slope. For each HRU, the model then calculates various components of the water balance, such as infiltration, overland flow, lateral flow, percolation, evapotranspiration, and discharge to the stream. A complete description of the SWAT model is given by Neitsch et al. [44], and for changes in the SWAT+ version, see Bieger et al. [45].

Topography and Land Use
Elevation of the catchment area (Figure 1) was obtained from a 30-m Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) [46]. Current land use was obtained from the 2013–2014 SA National Land-Cover Map dataset [47]. Land cover was re-grouped according to SWAT land uses with pre-defined parameters for each use similar to real-world conditions (Figure 2). Table 1 provides the corresponding generic SWAT land use.

Streamflow and Climate Data
Daily rainfall as well as maximum and minimum temperatures were obtained from a nearby weather station of the South African Weather Service. Daily solar radiation, relative humidity, and wind speed were obtained from the Climate Forecast System Reanalysis project [48], run by the National Center for Environmental Prediction. To account for the orographic rainfall effect of the mountains, three virtual weather stations were created to imitate the differential rainfall distribution over the catchment. To generate daily rainfall values for each virtual weather station, a rainfall coefficient was calculated by dividing the mean annual precipitation of that virtual station [40] by the mean annual precipitation of the actual weather station. Daily rainfall for each virtual station was then obtained by multiplying the daily measured rainfall from the actual weather station by the coefficient of that virtual station.
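The coefficient-based scaling described above can be illustrated with a minimal Python sketch. The function names, station names, and all rainfall values below are hypothetical and chosen for illustration only; they are not the values used in the study.

```python
def rainfall_coefficient(map_virtual_mm, map_actual_mm):
    """Ratio of the mean annual precipitation (MAP) at a virtual station
    to the MAP at the actual weather station."""
    if map_actual_mm <= 0:
        raise ValueError("MAP of the actual station must be positive")
    return map_virtual_mm / map_actual_mm


def scale_daily_rainfall(daily_mm, coefficient):
    """Multiply every measured daily rainfall value by the station coefficient."""
    return [value * coefficient for value in daily_mm]


# Hypothetical values for illustration only (not taken from the study).
MAP_ACTUAL_MM = 600.0
MAP_VIRTUAL_MM = {"virtual_1": 750.0, "virtual_2": 900.0, "virtual_3": 1200.0}
observed_daily_mm = [0.0, 12.0, 3.5, 0.0, 20.0]

virtual_series = {
    name: scale_daily_rainfall(
        observed_daily_mm, rainfall_coefficient(map_mm, MAP_ACTUAL_MM)
    )
    for name, map_mm in MAP_VIRTUAL_MM.items()
}
```

Because every virtual series is a scaled copy of the measured record, all stations share the same wet and dry days; only rainfall depth varies between them.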
Long-term daily streamflow data were obtained for the Goukou River at the outflow gauging station (H9H005) of the catchment, managed by the Department of Water and Sanitation. The daily streamflow was aggregated to monthly discharge volumes, against which the model simulations were compared [49].
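The aggregation from daily streamflow to monthly discharge volumes can be sketched as follows. This sketch assumes that the daily record holds mean daily discharge in m³/s, which is not stated in the text; the function name and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date

SECONDS_PER_DAY = 86_400


def monthly_volumes(daily_flow):
    """Aggregate (date, mean daily discharge in m^3/s) pairs into monthly
    discharge volumes in m^3, keyed by (year, month)."""
    totals = defaultdict(float)
    for day, discharge_m3_per_s in daily_flow:
        totals[(day.year, day.month)] += discharge_m3_per_s * SECONDS_PER_DAY
    return dict(totals)


# Hypothetical three-day record spanning a month boundary.
sample = [
    (date(1991, 1, 30), 1.0),
    (date(1991, 1, 31), 2.0),
    (date(1991, 2, 1), 0.5),
]
volumes = monthly_volumes(sample)
```

Days with missing observations would need explicit handling (for example, excluding incomplete months) before such totals are compared with simulated volumes; the sketch does not address this.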

Soil Datasets
SWAT+ requires a soil dataset as an input spatial layer. For each soil layer, information on depth, particle size distribution, saturated hydraulic conductivity (Ksat), bulk density (BD), carbon content, and available water capacity (AWC) is required. The soil information provided as input affects the distribution of hydrological processes throughout the catchment. Two levels of soil information were used in this study.

Land Type Soil Map (LTSM)
The first soil dataset was derived from the South African Land Type Survey maps [50,51], developed between 1972 and 2002, representing the best available soil dataset covering the whole of South Africa at a scale of 1:250,000. A land type polygon is defined as "a homogeneous, unique combination of terrain type, soil pattern and macroclimate zone." The Land Type Survey identified 7070 unique land type polygons based on some 400,000 soil observations (approximately 1 observation per 300 ha).
A total of 11 different land types are present within the catchment area. Except for Ksat, all the SWAT+ soil parameters for the LTSM dataset are available from Schulze [50] and are summarized in Table 2. The ROSETTA model [52] was used to derive Ksat for the different horizons from the texture classes of each land type.

Hydrological Soil Map (HSM)
The second soil dataset was a hydrological soil map (HSM) dataset that was developed by combining an infield hydropedological survey and a digital soil-mapping tool (SoLIM) [53–55], based on hydrological mapping techniques as described by Van Zijl et al. [56].
Environmental covariates, including elevation, slope, profile curvature, and normalized difference vegetation index (NDVI), were obtained for the catchment area [46,51–53]. NDVI values were obtained for the catchment from the National Oceanic and Atmospheric Administration Climate Data Record NDVI database, from which the 2013 average NDVI value for each pixel was calculated. These covariates were used to design a conditioned hypercube sampling scheme for the catchment area. For the hydropedological survey, a total of seven hillslope transects were selected to represent the soils of the catchment. Soil observations were made at crest, midslope, footslope, and valley bottom terrain positions or a combination of these terrain positions where applicable. A total of 50 soil observations were made by hand auger within the catchment, both on the hillslope transects and at easily accessible critical landscape areas. Legacy wetland soil data [57] were also used because the wetland area was inaccessible. Soil morphology was classified in accordance with "Soil Classification: A Natural and Anthropogenic System for South Africa" [58]. The classified soil observations were then grouped by hydropedological type [59] for digital mapping purposes (Table 3).

Table 3 (excerpt). Hydropedological soil types, soil forms, reference soil groups, and expected hydrological behaviour.
Hydropedological type | Soil forms | Reference soil group | Hydrological behaviour
Responsive (shallow) | Graskop, Glenrosa, Mispah | Leptosols | Shallow soils with bleached colours in the topsoil indicate that underlying bedrock is relatively impermeable. Small storage capacity will be exceeded following rain events and promote overland flow generation.
Responsive (saturated) | Champagne | Gleysols | Gleyed subsoils indicate long periods of saturation, typical of wetland soils. Soils will respond quickly to rain events and promote overland flow due to saturation excess.

The hillslope transects and soil observations were conceptualized into a rule-based design (Table 4), from which the HSM was created using SoLIM. SoLIM was used to spatially map the four identified hydropedological soil types by selecting, for each pixel of the DEM grid, the soil mapping unit with the highest probability of occurrence. Based on infield investigations of soil distribution patterns, a 30-m buffer zone was created for streams above 350 m.a.s.l. to represent responsive saturated soils, while the wetland was delineated using the national wetland map of South Africa [42]. The SoLIM rules used to create the HSM were designed to reflect field observations made in the catchment during the hydropedological soil survey. The HSM was validated using 50 soil observations throughout the catchment. The final HSM had an evaluation point accuracy of 74% and a Kappa statistic value of 0.68, indicating substantial agreement with the field observations. Undisturbed core samples were collected from diagnostic horizons and were used to determine bulk density (BD) and particle size distribution. Ksat was determined using the ROSETTA model [52], while available water capacity (AWC) was obtained through pedotransfer functions [61]. Lastly, the hydraulic properties used as SWAT+ inputs for the different horizons of each hydrological soil type were obtained by averaging the infield measured values for each soil form in the specific hydrological soil type (Table 5).
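The two validation measures reported above, evaluation point accuracy and the Kappa statistic, can be computed from paired observed and mapped soil classes. The following is a minimal sketch of Cohen's kappa; the function name, class labels, and sample points are hypothetical and do not reproduce the validation data of the study.

```python
from collections import Counter


def accuracy_and_kappa(observed, mapped):
    """Return (overall accuracy, Cohen's kappa) for paired class labels."""
    if len(observed) != len(mapped) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    # Observed agreement: share of points where the map matches the field class.
    p_observed = sum(o == m for o, m in zip(observed, mapped)) / n
    # Chance agreement: expected from the marginal class frequencies.
    observed_counts = Counter(observed)
    mapped_counts = Counter(mapped)
    p_chance = sum(
        observed_counts[c] * mapped_counts[c] for c in observed_counts
    ) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return p_observed, kappa


# Hypothetical validation points (field class versus mapped class).
field_classes = ["recharge", "recharge", "interflow", "interflow"]
map_classes = ["recharge", "recharge", "interflow", "recharge"]
accuracy, kappa = accuracy_and_kappa(field_classes, map_classes)
```

Kappa is lower than the raw accuracy because it discounts the agreement that would be expected by chance given how often each class occurs.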

Validation and Statistical Comparison of Simulations
Uncalibrated model simulations using the two soil datasets were conducted for the years 1987 to 2013. This period comprised an initial four-year settling (warm-up) period and a 23-year simulation period. For statistical comparison, four widely used statistical indicators were employed, namely the coefficient of determination (R²), percent bias (PBIAS), Nash-Sutcliffe efficiency (NSE), and Kling-Gupta efficiency (KGE).
Percent bias (PBIAS) measures the average tendency of the simulated data to be larger or smaller than their observed counterparts. The optimal value of PBIAS is 0.0, with low-magnitude values indicating accurate model simulation. Positive values indicate model underestimation bias, while negative values indicate model overestimation bias [62]. PBIAS is generally expressed as a percentage and is calculated using Equation (1):
$$\mathrm{PBIAS} = \frac{\sum_{i=1}^{n} \left(V_{oi} - V_{ei}\right)}{\sum_{i=1}^{n} V_{oi}} \times 100 \qquad (1)$$
where $V_{oi}$ and $V_{ei}$ are, respectively, the observed and simulated volumes of water for day $i$, and $n$ is the number of values compared.
Nash-Sutcliffe efficiency (NSE) is a normalized statistic that determines the relative magnitude of the residual variance ("noise") compared to the measured data variance ("information"). NSE indicates how well the plot of observed versus simulated data fits the 1:1 line [63]. NSE is computed using Equation (2):
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} \left(V_{oi} - V_{ei}\right)^{2}}{\sum_{i=1}^{n} \left(V_{oi} - \bar{V}_{o}\right)^{2}} \qquad (2)$$
where $V_{oi}$ and $V_{ei}$ are, respectively, the observed and estimated discharge of day $i$, and $\bar{V}_{o}$ is the mean of the observed discharges. The optimum value is 1.0, with higher values indicating better model performance.
The Kling-Gupta efficiency (KGE), which incorporates correlation, variability bias, and mean bias [64], is increasingly used for model calibration and evaluation. It is expressed using Equation (3):
$$\mathrm{KGE} = 1 - \sqrt{\left(r - 1\right)^{2} + \left(\frac{\sigma_{e}}{\sigma_{o}} - 1\right)^{2} + \left(\frac{\bar{Q}_{e}}{\bar{Q}_{o}} - 1\right)^{2}} \qquad (3)$$
where $r$ is the correlation coefficient between observed and simulated flows, $\sigma_{o}$ is the standard deviation of the observed streamflow, and $\sigma_{e}$ is the standard deviation of the simulated streamflow. $\bar{Q}_{e}$ and $\bar{Q}_{o}$ represent the mean of the simulated and observed discharges, respectively.
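The four indicators can be sketched in plain Python as follows. This is a minimal illustration that follows the sign convention stated above (negative PBIAS indicates overestimation) and the standard definitions of NSE and KGE; the function names and the sample series are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    """Population standard deviation."""
    mean = _mean(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def pbias(obs, sim):
    """Percent bias: positive = underestimation, negative = overestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency (optimum 1.0)."""
    mean_obs = _mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def pearson_r(obs, sim):
    """Linear correlation coefficient between observed and simulated values."""
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / len(obs)
    return cov / (_std(obs) * _std(sim))


def r_squared(obs, sim):
    """Coefficient of determination as the squared correlation coefficient."""
    return pearson_r(obs, sim) ** 2


def kge(obs, sim):
    """Kling-Gupta efficiency (optimum 1.0)."""
    r = pearson_r(obs, sim)
    variability_ratio = _std(sim) / _std(obs)
    bias_ratio = _mean(sim) / _mean(obs)
    return 1.0 - math.sqrt(
        (r - 1.0) ** 2 + (variability_ratio - 1.0) ** 2 + (bias_ratio - 1.0) ** 2
    )


# Hypothetical series: the simulation overestimates every value by one unit.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
```

For this hypothetical series the correlation is perfect, yet PBIAS is negative and both NSE and KGE fall below 1.0, which shows why correlation alone is not a sufficient indicator of model performance.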

Data Used
The hydro-meteorological data used for the SWAT+ model in this study were precipitation and minimum and maximum air temperature. Table 6 contains all the input data required by the model.

Soil Datasets
The two soil datasets (HSM and LTSM) differed both spatially (Figure 3) and in their hydrological properties (Tables 2 and 5). The LTSM dataset contained a greater variety of soils (11 soil types vs. 6 in the HSM dataset) and hence also a wider array of hydrological soil properties based on pedotransfer functions. Although the HSM dataset had less variety in soil types, it was spatially more representative of the actual catchment soils. The HSM was also based on infield measured soil properties and therefore represented the hydrological soil properties of the catchment more accurately.

Model Streamflow Simulations
The two model set-ups each contained an identical number of sub-basins (26) and landscape units (2642) since both models used the same DEM dataset. The number of HRUs, however, differed greatly with the HSM dataset containing 4216 HRUs, while the LTSM dataset contained 2836 HRUs. This is the result of the increased level of spatial detail in the HSM compared to the LTSM dataset as well as both models being run with all HRUs created, meaning that no HRU threshold was applied to either model.
Statistical comparison between the two models over the 23-year simulation period indicated that the two models simulated streamflow very similarly (Figure 4). Both model simulations yielded acceptable correlation values (R² ≥ 0.5) and also produced highly similar NSE and KGE values of 0.57 and 0.59 (HSM) and 0.56 and 0.60 (LTSM), respectively. PBIAS indicated that both models substantially overestimated baseflow conditions within the catchment while underestimating during peak flow periods; however, the HSM performed better than the LTSM, with considerably lower estimation bias. Hence, despite substantial differences between the HSM and LTSM datasets, both spatially and regarding physical soil properties, these differences were not carried over into statistical differences for long-term streamflow simulations.

Hydrological Processes
When comparing internal catchment processes, the impact of soil information on hydrological modelling became evident. Based on their importance for accurate streamflow prediction, average annual lateral flow and overland flow were used as examples to illustrate the differences in hydrological process representation between the two soil datasets. The HSM resulted in a very clear pattern, showing that the bulk of lateral flow volumes originate in the mountainous regions of the catchment and diminish towards the wetland (Figure 5). This coincides with the transition from sandy, shallow mountain soils to deeper, clayey interflow soils on the lower slopes, to extremely deep alluvial soils within the riparian zone. The LTSM, on the other hand, produced a haphazard spatial distribution of lateral flow generation processes with no regard for the delineated riparian wetland. When analysing overland flow or surface runoff generation processes, the differences between soil datasets became even more apparent (Figure 6). The HSM simulated the majority of surface runoff within the mountainous regions of the catchment, with comparatively small volumes of surface runoff (<10 mm) occurring on the southern footslopes. The LTSM dataset generated localized pockets with high runoff volumes, with the majority of these falling outside the mountainous area of the catchment.

Long-Term Streamflow Simulations
According to Knoben et al. [65], a KGE value greater than −0.41 implies that a model prediction is a better fit than the mean observed values. Therefore, it can be accepted that both uncalibrated model simulations in the current study produced satisfactory KGE values. Moriasi et al. [66] presented improved evaluation criteria for hydrologic and water-quality models. For streamflow simulations, R² > 0.6, NSE > 0.5, and PBIAS within ±15% were regarded as satisfactory. Although both models in the current study just fall short of the R² > 0.6 norm, both models produced acceptable NSE values above the 0.5 threshold. The HSM model also stayed within the 15% PBIAS limit (PBIAS = −11.76%), resulting in the HSM slightly outperforming the LTSM model. Similar hydrological modelling studies obtained R² values of anywhere between 0.15 and 0.74 [27–29].
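The comparison against these thresholds can be written out explicitly. The sketch below uses only the NSE, KGE, and PBIAS values reported in this study; R² is omitted because exact values are not given in the text. Treating the PBIAS criterion as a limit on the absolute value is an assumption of this sketch, and the function and variable names are hypothetical.

```python
KGE_MEAN_FLOW_BENCHMARK = -0.41  # Knoben et al. [65]
NSE_SATISFACTORY_MIN = 0.5       # Moriasi et al. [66]
PBIAS_SATISFACTORY_MAX = 15.0    # percent, applied to |PBIAS| (assumption)


def evaluate(nse, kge, pbias):
    """Check reported indicators against the thresholds cited in the text."""
    return {
        "nse_satisfactory": nse > NSE_SATISFACTORY_MIN,
        "pbias_satisfactory": abs(pbias) <= PBIAS_SATISFACTORY_MAX,
        "kge_beats_mean_flow": kge > KGE_MEAN_FLOW_BENCHMARK,
    }


# Indicator values as reported in this study.
reported = {
    "HSM": {"nse": 0.57, "kge": 0.59, "pbias": -11.76},
    "LTSM": {"nse": 0.56, "kge": 0.60, "pbias": -21.55},
}
results = {name: evaluate(**metrics) for name, metrics in reported.items()}
```

Under these assumptions, the two models differ only in the PBIAS check, which is the basis for the statement that the HSM slightly outperformed the LTSM.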
The clear overestimation of baseflow, which outweighed the underestimation of peak flow, resulted in an overall overestimation of streamflow by both model simulations (PBIAS = −11.76% for the HSM and −21.55% for the LTSM). Such an overestimation could be corrected by adjusting groundwater parameters, e.g., by decreasing the threshold depth of the groundwater before re-evaporation occurs (REVAPMN), increasing the threshold depth for return flow to occur (GWQMN), or increasing the coefficient of re-evaporation from the groundwater (GW_REVAP) for selected HRUs. The most likely explanations for the observed overestimation of baseflow are (a) the large degree of uncertainty in rainfall distribution over the catchment area, which could not be accurately quantified by means of virtual rainfall stations, and/or (b) the location of the catchment in the transition zone between the winter and summer rainfall regions, which adds to the complexity of rainfall variability within the catchment.
Although the two models performed remarkably similarly, the HSM slightly outperformed the LTSM when considering PBIAS values. The more detailed soil information of the HSM provided some degree of improvement in simulation accuracy in the overall water balance. This may be attributed to a more accurate representation of soil physical parameters, such as profile depths, hydraulic conductivity values, and available water capacities.

Importance of Internal Catchment Processes
Figures 5 and 6 represent the dominant hydrological processes (subsurface lateral flow and overland flow) of the upper Goukou catchment. These figures illustrate the large discrepancies between the two soil datasets when simulating different hydrological processes. These differences are a result of how hydrological processes are simulated by the SWAT+ model. The HSM dataset, based on infield observations and measurements, more accurately represents actual hydrological processes of the catchment. The high validation accuracy of the HSM, as well as the reasonable modelling efficiency without calibration, supports the ability of the HSM to represent hydrological processes within the catchment.
There is a noticeable difference between the two datasets in the spatial transition between higher and lower subsurface lateral flow areas. For the HSM data, there is a clear transition from high lateral flow volumes in the mountains to lower volumes on the footslopes and finally minimal lateral flow within the alluvial floodplain. Subsurface lateral flow is calculated by SWAT using a kinematic storage model, which simulates the movement of water in a two-dimensional cross-section of a hillslope. The kinematic approximation method assumes that the flow paths are parallel to the bedrock and that the hydraulic gradient equals the slope of the hill. Primarily, subsurface lateral flow is calculated as a function of the excess soil water volume after saturation, the hydraulic gradient of the HRU, and the drainable porosity of the soil. It is therefore expected that simulated lateral flow would decrease as the slope decreases from mountainous soils through footslope soils to floodplain soils.
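The dependence of lateral flow on slope can be illustrated with a minimal sketch of the daily kinematic storage relation. The form used here, Q_lat = 0.024 · (2 · SW_excess · Ksat · slp) / (φ_d · L_hill), is assumed from the SWAT theoretical documentation and should be checked against the SWAT+ source before use; the function name and all parameter values are hypothetical, and limits that the model may place on the result are not reproduced.

```python
def lateral_flow_mm_per_day(
    sw_excess_mm,
    ksat_mm_per_h,
    slope_m_per_m,
    drainable_porosity,
    hillslope_length_m,
):
    """Daily lateral flow (mm) from a soil layer under the kinematic storage
    approximation (assumed form; see lead-in)."""
    if drainable_porosity <= 0 or hillslope_length_m <= 0:
        raise ValueError("porosity and hillslope length must be positive")
    if sw_excess_mm <= 0:
        return 0.0  # no drainable water, hence no lateral flow
    return (
        0.024
        * (2.0 * sw_excess_mm * ksat_mm_per_h * slope_m_per_m)
        / (drainable_porosity * hillslope_length_m)
    )


# Hypothetical soil layer evaluated at three hypothetical terrain positions.
common = dict(
    sw_excess_mm=50.0,
    ksat_mm_per_h=10.0,
    drainable_porosity=0.2,
    hillslope_length_m=50.0,
)
mountain = lateral_flow_mm_per_day(slope_m_per_m=0.4, **common)
footslope = lateral_flow_mm_per_day(slope_m_per_m=0.1, **common)
floodplain = lateral_flow_mm_per_day(slope_m_per_m=0.01, **common)
```

With all soil properties held constant, the computed lateral flow scales linearly with slope, which is consistent with the decrease from mountain to floodplain described above.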
The LTSM simulated far higher lateral flow volumes for the catchment than the HSM, which could be attributed to the overall shallower nature of the LTSM soils compared to HSM soils, especially at the footslope and valley-bottom terrain positions. Shallower soils allow for less water to be stored within the soil profile, which means that more water percolates down to the soil/bedrock interface and is available for lateral flow. The distribution of lateral flow as simulated by the LTSM was also not primarily determined by soil information, as was the case with the HSM simulation, but rather by a combination of soil, land use, and topography for each HRU.
The mountains of the upper Goukou catchment form part of the areas identified as strategic water-source areas for surface water in South Africa [67]. However, the LTSM dataset simulated low volumes of surface runoff for this critical area of the catchment (Figure 6a). Differences between the soil datasets in terms of overland flow volumes in the mountainous regions of the catchment were mainly attributed to differences in hydraulic soil information, as both datasets contained similarly shallow mountain soils (<300 mm). The HSM dataset had far lower Ksat values and higher clay percentages than the LTSM. The HSM data also accounted for rock-dominated soils that cover most of the mountainous areas of the catchment. These differences resulted in less permeable topsoils for the HSM, which led to more runoff generation. The SWAT model calculates surface runoff using the Soil Conservation Service curve number or the Green-Ampt infiltration method. Surface runoff only occurs when excess water is available at the soil surface. Topsoil hydraulic conductivity, available water capacity, and bulk density all impact the potential volume of surface runoff generated. Infield measurements of these soil properties are imperative for accurately simulating surface runoff.
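The curve number approach mentioned above can be illustrated with a minimal sketch of the standard Soil Conservation Service relation in metric units, with the initial abstraction taken as 0.2 of the retention parameter. The sketch uses a fixed curve number and does not reproduce any adjustment of the retention parameter that the model applies during a simulation; the function name and input values are hypothetical.

```python
def scs_runoff_mm(rain_mm, curve_number):
    """Daily surface runoff (mm) from the SCS curve number equation."""
    if not 0 < curve_number <= 100:
        raise ValueError("curve number must be in the range (0, 100]")
    # Retention parameter S (mm) and initial abstraction Ia = 0.2 * S.
    retention_mm = 25.4 * (1000.0 / curve_number - 10.0)
    initial_abstraction_mm = 0.2 * retention_mm
    if rain_mm <= initial_abstraction_mm:
        return 0.0  # all rainfall is abstracted; no runoff is generated
    return (rain_mm - initial_abstraction_mm) ** 2 / (rain_mm + 0.8 * retention_mm)


# Hypothetical comparison: a less permeable surface (higher curve number)
# generates more runoff from the same 50 mm rain event.
runoff_permeable = scs_runoff_mm(50.0, 60.0)
runoff_less_permeable = scs_runoff_mm(50.0, 80.0)
```

The comparison mirrors the argument in the text: for the same rainfall, a surface with lower permeability yields a larger runoff volume.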
The inability of the LTSM to accurately simulate this important water balance component in a strategic water source area illustrates why the accurate representation of internal catchment processes is imperative to water-management strategies. This also highlights the importance of soil information in ungauged basins. For example, because the HSM provides a more accurate representation of hydrological processes over the catchment, it might aid in improved planning of surface drainage infrastructure, improve rehabilitation and maintenance of wetland water regimes, and assist in spatially quantifying areas for optimal clearing of invasive plant species.
Internal catchment processes should be adequately reflected through the hydrological model structure [35,68]. It was clear that the more detailed HSM dataset, produced for hydrological modelling using infield soil observations, did add value to the hydrological modelling process. The true value of more detailed soil information may therefore lie in the more accurate representation of internal catchment processes and its effect on catchment management strategies. For instance, if the upper Goukou catchment had been ungauged, the improved hydrological process representation of the HSM would have enabled more appropriate catchment development and management practices than the LTSM. This gives credence to the use of hydropedological information as a "soft data" tool to better represent internal catchment processes, especially in ungauged basins [29].

Conclusions
This study presented the findings of uncalibrated hydrological simulations using two different levels of soil information in the upper Goukou catchment, South Africa. Detailed hydropedological soil information, using infield soil measurements and digital soil mapping techniques, resulted in a more accurate representation of internal catchment hydrological processes.
Although the statistical improvement in long-term streamflow simulation was not as significant as expected, the improved hydrological process representation does give credence to the importance of more detailed soil information for improved catchment management strategies. This is particularly true for ungauged basins, where long-term streamflow gauging stations are absent for calibration purposes.