Improved Process Representation in the Simulation of the Hydrology of a Meso-Scale Semi-Arid Catchment

The variability of rainfall and climate, combined with land use and land cover changes and variation in geology and soils, makes it difficult to accurately describe the key hydrological processes in a catchment. With the aim of better understanding the key hydrological processes and runoff generation mechanisms in the semi-arid meso-scale Kaap catchment in South Africa, a hydrological model was developed using the open source STREAM model. Dominant runoff processes were mapped using a simplified Height Above the Nearest Drainage (HAND) approach combined with geology. The Prediction in Ungauged Basins (PUB) framework of runoff signatures was used to analyse the model results. Results show that in the headwater sub-catchments of Noordkaap and Suidkaap, plateaus dominate, associated with slow flow processes. Therefore, these catchments have high baseflow components and are likely the main recharge zone for regional groundwater in the Kaap. In the Queens sub-catchment, hillslopes associated with intermediate and fast flow processes dominate. However, this catchment still has a strong baseflow component, but it seems to be more impacted by evaporation depletion, due to different soils and geology, especially in drier years. At the Kaap outlet, the model indicates that hillslopes are important, with intermediate and fast flow processes dominating and most runoff being generated through direct runoff and shallow groundwater components, particularly in wetter months and years. There is a high impact of water abstractions and evaporation during the dry season, affecting low flows in the catchment. Results also indicate that the root zone storage and the parameters of effective rainfall separation (between unsaturated and saturated zone), quickflow coefficient and capillary rise were very sensitive in the model. The inclusion of capillary rise (feedback from the saturated to unsaturated zone) greatly improved the simulation results, especially at the Kaap outlet.


Introduction
In many regions of the world, including in most parts of Southern Africa, data availability and resources for detailed field investigations are limited [1]. Large catchments need to be modelled with limited input data, yet the results are needed to manage water resources that are crucial to

• Interpret key landscape elements with respect to their hydrological functioning, and gather available data for hydrological modelling in the study area;
• Analyse runoff signatures and processes from available gauged catchments;
• Set up a process-based hydrological model that can utilize spatial (gridded) data and that is easy to adapt to different hydrological processes;
• Gradually increase model complexity and assess model sensitivity to different inputs and parameters; and
• Understand the key hydrological processes and runoff generation mechanisms in the catchment, and how this could improve hydrological modelling.

Study Area
The Kaap catchment (1640 km²) is located in the South African Lowveld in the northeast of the Mpumalanga province, forming a sub-catchment of the Incomati river basin. The Kaap catchment contains three main tributaries: the Queens, the Suidkaap, and the Noordkaap (Figure 1). The elevation ranges from 300 m to 1800 m above sea level. The climate is semi-arid with cool dry winters (April to September) and hot wet summers (October to March). Precipitation ranges from 583 mm year⁻¹ to 1243 mm year⁻¹ in the highest parts of the catchment [26]. The mean potential evaporation is 1435 mm year⁻¹. Streamflow is highly seasonal with the highest average flow occurring in February and the lowest flow at the end of the dry season in September. The mean observed annual streamflow ranges from 66 mm year⁻¹ in the outlet to 144 mm year⁻¹ in the tributaries (Table A1, Appendix A). The catchment is fairly well monitored with five streamflow gauges available within the catchment (Figure 1). The dominant land covers in the Kaap Valley are Bushveld and grasslands (68% of the catchment area) as observed in Figure 1B. The upstream areas are mostly covered by exotic pine and Eucalyptus plantations, which may tap groundwater (25% of catchment area). Sugar cane, cash crops and citrus trees are found downstream and are irrigated (6% of catchment area). There are some mines and urban settlements, Barberton being the main town. Biotite granite is the predominant formation in the Kaap valley. Sandstones and shales are found in closer proximity to the Kaap River and in the south of the catchment [27]. Other formations present include lava, gneiss, ultramafic rocks, quartzite and dolomite (see Figure 1C).
Saraiva Okello, et al. [28] conducted an extensive analysis of rainfall and streamflow in the Incomati basin, and all the gauges within the Kaap were analysed. The analysis revealed no significant upward or downward trend in the catchment rainfall over the past 60 years; rather, seasonal variability dominated. The streamflow was analysed using the Indicators of Hydrological Alteration tool, and several significant trends were found in the streamflow records. The Noordkaap gauge (X2H010), for example, showed significant decreasing trends in mean monthly flows, low flows and 7-day minimum flow, among others. Further investigation of this shift in the flow regime identified the change of land use, that is, the increase in forestry plantation, as the main driver of the decreasing trends.
Camacho Suarez, et al. [24] conducted an intense tracer study during the rainy season of 2013-2014 in the Kaap catchment. They installed rainfall samplers in two locations in the catchment, and an automatic water sampler at the outlet of the Kaap. Furthermore, grab samples were collected in several locations before rainfall events to provide a snapshot of water quality of the catchment in baseflow conditions. Four major events were sampled and analysed, using isotope and hydro-chemical hydrograph separation, as well as end member mixing analysis. The study revealed great dominance of pre-event water in the streamflow. A three-component hydrograph separation highlighted a major contribution of shallow groundwater, which was enriched with potassium and isotopes. Two main sources of groundwater were identified, the upstream area with fractured granite, characterized by lower ionic content, and the downstream area, with more diverse geology and higher ionic content. Furthermore, a strong correlation was found between antecedent precipitation index and direct runoff. This means that when the catchment is wet from previous rainfall events, and the storages filled, the connectivity of the catchment increases and more direct runoff is generated.
Saraiva Okello, et al. [25] further explored hydrograph separation in the Kaap, using long-term records of water quality, particularly electrical conductivity (EC). They computed baseflow and quickflow components, using a calibrated recursive digital filter, at monthly and annual scales. The digital filter was calibrated using long-term EC and observed flow data. Hydrograph separation showed that all catchments contribute highly to baseflow.
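To make the idea of a recursive digital filter concrete, the following is a minimal sketch. The text does not state which filter form was used in [25], so a widely used two-parameter recursive filter (Eckhardt type) is assumed here purely for illustration. The function name, the parameter values `alpha` and `bfi_max`, and the sample discharges are hypothetical and are not the calibrated values from the study.

```python
def eckhardt_baseflow(flow, alpha=0.98, bfi_max=0.8):
    """Split total flow into baseflow with a two-parameter recursive filter.

    alpha   : recession constant (placeholder value, not calibrated)
    bfi_max : maximum baseflow index (placeholder value, not calibrated)
    """
    if not flow:
        return []
    # Assumed initial condition: baseflow starts at bfi_max times the first flow.
    baseflow = [min(flow[0], bfi_max * flow[0])]
    for q in flow[1:]:
        b_prev = baseflow[-1]
        b = ((1.0 - bfi_max) * alpha * b_prev + (1.0 - alpha) * bfi_max * q) / (
            1.0 - alpha * bfi_max
        )
        baseflow.append(min(b, q))  # baseflow can never exceed total flow
    return baseflow


daily_flow = [2.0, 2.0, 8.0, 5.0, 3.0, 2.5, 2.2]  # made-up discharges
base = eckhardt_baseflow(daily_flow)
quick = [q - b for q, b in zip(daily_flow, base)]  # quickflow is the remainder
```

In a tracer-calibrated setup, the two parameters would be adjusted until the filtered baseflow agrees with the baseflow inferred from EC; that calibration step is not shown here.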

Data Used
Hydrological data in the catchment, including precipitation, evaporation and streamflow records, were collected from the South African Department of Water & Sanitation (DWS, formerly DWA), the South African Weather Service (SAWS) and the South African Sugarcane Research Institute (SASRI). Figure 1A shows the locations of rainfall and streamflow stations, as well as sub-catchment delineation and topography. To analyse the flow behaviour at the outlet and tributaries, average daily discharges at the X2H022 (Outlet), X2H008 (Queens), X2H031 and X2H024 (Suidkaap) and X2H010 (Noordkaap) stream gauges were obtained from the DWS. Land use from the Watplan project was used for this analysis [29]. Topographic information was derived from SRTM images with 90 m pixel resolution.
In addition to water use by natural vegetation, the main water users in the catchment are:
• Irrigated sugarcane (98 km² area) with a crop water requirement of 92 × 10⁶ m³ year⁻¹ [30]. However, Mallory and Beater [30] report that only 62 × 10⁶ m³ year⁻¹ are supplied from the river.
• Domestic water supply to the Umjindi Local Municipality (population over 71,200), with a demand of 3.9 × 10⁶ m³ year⁻¹; this is supplied by an interbasin transfer from the neighbouring Lomati dam (part of the Komati catchment) [30].
There are no major reservoirs in the catchment, and the industrial water requirements are considered insignificant [30]. The WR2012 study [31] simulated naturalized flows for the catchment, and these are reported in Table A1 (Appendix A) for comparison. The observed flow constitutes only 57 to 69% of naturalized flows (flows that would occur without human interventions). This means that 31 to 43% of the runoff generated is abstracted due to human activities (irrigation, forestry plantation and other abstractions). The relative difference between observed and naturalized flow is highest for the Kaap and Suidkaap catchments.
Water 2018, 10, 1549

Landscape Classification
SRTM images with 90 m pixel resolution were used to define topography. Furthermore, a landscape analysis was conducted to define zones with similar landscape features, which are presumed to have similar runoff generation processes.
The HAND value was computed as per the procedure of Rennó, et al. [32] and Gharari, et al. [13]. The HAND value was then combined with the slope map, and thresholds were defined to differentiate Wetlands, Hillslopes and Plateaus (or valley bottom) [12,13] (Figure 2A and Table A1). The thresholds were defined using expert knowledge and some site verifications. Gharari, et al. [13] present an extended calibration procedure to assess the sensitivity of the HAND model. The thresholds used to define the zones were:
• Stream initiation at 1000 m.
• The HAND threshold to separate wetlands from Plateau and Hillslope was 10 m.
• The slope threshold to separate Hillslope from Plateau was 12%.
Several runs of the model were conducted and compared with verification locations to adjust the parameters.
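The threshold rules listed above can be sketched as a simple per-cell classifier. This is an illustration only: the two threshold values come from the text, but the direction of the inequalities, the handling of values exactly at a threshold, and the order in which the rules are applied are assumptions, and the function names are hypothetical.

```python
HAND_THRESHOLD_M = 10.0     # HAND threshold from the text
SLOPE_THRESHOLD_PCT = 12.0  # slope threshold from the text


def classify_cell(hand_m, slope_pct):
    """Assign one grid cell to a landscape class.

    Assumed rule order: low HAND means wetland regardless of slope;
    otherwise slope decides between hillslope and plateau.
    """
    if hand_m < HAND_THRESHOLD_M:
        return "wetland"
    if slope_pct >= SLOPE_THRESHOLD_PCT:
        return "hillslope"
    return "plateau"


def classify_grid(hand, slope):
    """Classify two equally shaped nested lists (rows of cells)."""
    return [
        [classify_cell(h, s) for h, s in zip(hand_row, slope_row)]
        for hand_row, slope_row in zip(hand, slope)
    ]


# Made-up 2 x 2 example grids (HAND in m, slope in %).
hand = [[2.0, 35.0], [60.0, 8.0]]
slope = [[3.0, 25.0], [5.0, 40.0]]
classes = classify_grid(hand, slope)
```

Adjusting the two constants and comparing the resulting map with verification locations mirrors the iterative procedure described in the text.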

Dominant Runoff Generation Zones
After landscape analysis, and in combination with other physiographic information and previous fieldwork [24], the dominant runoff generation processes were identified in the catchment. The combination of the HAND zones and the geology map helped define zones of slow flow, intermediate (or delayed) flow and fast flow (Figure 2B). All wetlands and sealed areas (urban areas and mines) were considered fast flow generation areas. The plateaus had two dominant mechanisms: plateaus with underlying geology consisting of quartzite and gneiss were classified as intermediate (or delayed) zones, because both vertical and horizontal flows occur. Plateaus with weathered granite and sedimentary rocks were considered slow flow zones because vertical percolation and recharge to deep groundwater through fissures of the bedrock are the predominant processes [24].
Hillslopes, due to the steep topography, have mostly quickflow occurring through overland flow. When the hillslope has granite, quartzite or gneiss geology, some delayed runoff occurs, as subsurface lateral flow dominates. However, antecedent precipitation can change the dominant processes, in which case, quickflow is sourced from the intermediate runoff zone as well [24].
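The mapping from landscape class and geology to a dominant runoff zone can be written down as a small rule function. The sketch below is only a reading of the rules described in this section: the string labels, the function name, and the "unspecified" fallback for plateau geologies that the text does not mention are assumptions, and the dependence on antecedent wetness is deliberately left out.

```python
def runoff_zone(landscape, geology, sealed=False):
    """Return 'fast', 'intermediate', 'slow' or 'unspecified' for one cell."""
    # Wetlands and sealed areas (urban areas, mines) generate fast flow.
    if sealed or landscape == "wetland":
        return "fast"
    if landscape == "plateau":
        if geology in {"quartzite", "gneiss"}:
            return "intermediate"
        if geology in {"weathered granite", "sedimentary"}:
            return "slow"
        return "unspecified"  # assumption: the text gives no rule for other rocks
    if landscape == "hillslope":
        # Delayed subsurface lateral flow on granite, quartzite or gneiss.
        if geology in {"granite", "quartzite", "gneiss"}:
            return "intermediate"
        return "fast"
    raise ValueError(f"unknown landscape class: {landscape!r}")


examples = [
    runoff_zone("plateau", "weathered granite"),
    runoff_zone("hillslope", "gneiss"),
    runoff_zone("hillslope", "shale"),
    runoff_zone("plateau", "gneiss", sealed=True),
]
```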

The STREAM Model
The Spatial Tools for River basin Environmental Analysis and Management (STREAM) model [33] has been used in several locations internationally and at different spatial/temporal resolutions. It is a spatially distributed and conceptual model, where the non-linear behaviour of the river basins is explained by a combination of thresholds and linear reservoirs. The model is based on a raster GIS which calculates the water balance of each grid cell and routes this through a stream channel network which is based on the digital elevation model (DEM). There is no routing of the surface runoff; it is removed from the model within the same time step as it is generated. A detailed description of model genesis and configuration can be found in several publications [34][35][36]. The model was selected because of its ability to use distributed (raster) data, and its ease of configuration in the open source PCRaster dynamic programming language. The main model parameters and variables are presented in Table 1.
The model was used as a tool to test our process understanding in the studied catchments and to highlight shortcomings in process representation in the model. The model structure included some of the main processes expected in a semi-arid catchment such as precipitation, interception, evaporation, and runoff generation (Figure 3). After interception, the effective rainfall is partitioned using the cr coefficient between the unsaturated and saturated zones. The portion in the unsaturated zone is available for the transpiration process, which is computed using the soil water balance and is regulated by the maximum unsaturated zone storage, Sumax. The portion in the saturated zone can generate runoff if certain groundwater storage thresholds are exceeded. Initially, the capillary rise process, whereby water from the saturated zone returns to the unsaturated zone, was not simulated, but in later model runs this process was also included. Previous research showed that plantation forest (Eucalyptus) can access water from great depths, depleting the groundwater storage [37][38][39]. To mimic this process, the Sumax parameter, which was defined based on rooting depth of dominant land use and available water content (based on soil hydraulic properties), was made much larger under forest and plantation land uses. Where shallow vegetation predominates, the Sumax parameter was set at lower values. Note that the hydrological model did not consider direct abstractions of water for irrigation and other purposes.
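The sequence of processes described above can be illustrated with a single-cell, single-day bucket sketch. This is not the STREAM source code or its equations: the direction of the cr split (fraction sent to the saturated zone), the spill of excess soil water to groundwater, the linear transpiration reduction, the separate quickflow threshold, and the reservoir constants `Kq` and `Ks` are all assumptions chosen only to show how thresholds and linear reservoirs can be combined. All parameter values are made up.

```python
def stream_like_step(su, gw, rain, pet, par):
    """Advance unsaturated (su) and saturated (gw) storage by one day (mm)."""
    # 1. Interception: up to D mm per day never reaches the soil.
    interception = min(rain, par["D"])
    effective = rain - interception
    # 2. Partition effective rainfall (assumed: fraction cr goes to groundwater).
    to_gw = par["cr"] * effective
    su += effective - to_gw
    gw += to_gw
    # 3. Soil water above Sumax spills to the saturated zone (assumption).
    spill = max(0.0, su - par["Sumax"])
    su -= spill
    gw += spill
    # 4. Capillary rise: feedback from saturated to unsaturated zone.
    cap = min(par["Cflux"], gw, par["Sumax"] - su)
    gw -= cap
    su += cap
    # 5. Transpiration, reduced linearly with relative soil moisture (assumption).
    demand = max(0.0, pet - interception)
    transpiration = min(su, demand * su / par["Sumax"])
    su -= transpiration
    # 6. Runoff: threshold excess plus two linear reservoirs.
    overland = max(0.0, gw - par["GWSmax"])
    gw -= overland
    quick = max(0.0, gw - par["quick_threshold"]) / par["Kq"]
    gw -= quick
    slow = gw / par["Ks"]
    gw -= slow
    return su, gw, interception + transpiration, overland + quick + slow


params = {"D": 2.0, "cr": 0.3, "Sumax": 150.0, "Cflux": 1.0,
          "GWSmax": 300.0, "quick_threshold": 100.0, "Kq": 5.0, "Ks": 60.0}
su, gw = 80.0, 120.0
for rain, pet in [(25.0, 4.0), (0.0, 5.0), (0.0, 5.0)]:
    su, gw, evap, runoff = stream_like_step(su, gw, rain, pet, params)
```

Setting `Cflux` to zero switches the capillary rise feedback off, which corresponds conceptually to the initial model runs described in the text.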

Model Inputs, Parameters and Setup
The hydrological model was configured to simulate stream flow for the period 2003-2013 for the Kaap catchment, with a daily time step and 90 m cell grid size.
Daily rainfall station data (Figure 1) were used for precipitation. The data were interpolated with the Inverse Distance Weighting (IDW) method and were corrected with an elevation factor derived from the Mean Annual Precipitation map [40], according to the methodology described in Sieber and Uhlenbrook [41]. Interception was defined using a fixed daily threshold coefficient D based on the land use and land cover map, listed in Table 2. Potential Evaporation was also derived from the station data, and interpolated using the IDW method. Actual Transpiration was computed from the soil moisture water balance in the unsaturated zone. The Sumax parameter was derived from a combination of available water content (field capacity minus wilting point of each soil type) and rooting depth of each respective land cover. All model parameters were derived from careful analysis of the literature, local expert knowledge and by calibration, as explained in Table 1.
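As a rough illustration of the IDW step, the sketch below estimates a value at one location from a handful of stations. The power of 2, the coordinate units and the station values are assumptions, and the elevation correction based on the Mean Annual Precipitation map is not included.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations : list of (x, y, value) tuples
    power    : distance exponent (2 is a common default, assumed here)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # the target coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Made-up stations: (easting in km, northing in km, daily rainfall in mm).
gauges = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 4.0, 0.0)]
rain_at_cell = idw(gauges, 1.0, 0.0)
```

Applied to every grid cell and every day, this yields the gridded rainfall (or potential evaporation) field that a distributed model needs as input.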

Model Simulations
Several model configurations with stepwise variation of model inputs, parameters and processes of differing complexity were tested.
The following were the main simulation comparisons conducted:

• Rainfall input (station data with Thiessen regionalization, station data with Inverse Distance Weighting and elevation correction, and remote sensing precipitation from the CHIRPS database).
• Maximum groundwater storage in the saturated zone, parameter GWSmax (mm), derived from DEM or from HAND maps.
• Implementation of the capillary rise process, with different thresholds of Cflux (mm/d).
In addition, the HBV model [43,44] was set up for the catchments for comparison. The model was configured using similar input data (precipitation, temperature and potential evaporation), but only vegetation and elevation band zones were used to discretize the model. Automatic calibration was applied to obtain the best performing parameter sets.

Runoff Signatures and Assessment of Model Performance
The Prediction in Ungauged Basins (PUB) book [3] suggests a framework for hydrological understanding of catchments, by focusing on their runoff signatures. There is a myriad of possible signatures, but we chose to focus on the key signatures suggested by Blöschl, et al. [3], which are commonly used in the region as well (e.g., [45]): annual runoff, seasonal runoff, flow duration curve (FDC), low flows, floods, and runoff hydrographs.
The model performance was also assessed visually and statistically using different indicators of goodness of fit of the hydrographs: the Nash-Sutcliffe efficiency (NSE), the Logarithmic Nash-Sutcliffe efficiency (LogNSE), Bias and percentage Bias (PBias), Mean absolute error (MAE), the Pearson R², Root mean square error (RMSE), and Kling-Gupta efficiency (KGE) coefficient [46,47]. The NSE varies between −∞ and 1.0, with 1.0 being the optimal value. Values between 0 and 1 are considered acceptable, whereas values below 0 indicate unacceptable performance. LogNSE has a similar range, but the flow values are log-transformed to better analyse low flows. Bias, MAE and RMSE have the same unit as observed flow, whereas PBias is the percentage of bias in relation to mean flow; for all of these, the closer the value is to 0, the better the model performance. Positive values indicate model underestimation bias, and negative values indicate model overestimation bias. R² varies between 0 and 1, whereas KGE varies between −∞ and 1. In both cases, values between 0.7 and 1 are considered good; between 0.5 and 0.7, acceptable; and below 0.5, poor. The KGE also offers diagnostic insights into the model performance because of its decomposition into a correlation term, a bias term and a variability term. From a hydrologic perspective, the use of KGE assists in reproducing temporal dynamics, as well as preserving the distribution of flows; therefore, it was adopted as the main indicator of goodness of fit.
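The main indicators can be computed in a few lines. The sketch below uses the standard textbook definitions; the KGE variant is assumed to be the original formulation with correlation, standard deviation ratio and mean ratio, since the text does not state which variant from [46,47] was used. The PBias sign convention follows the text (positive means underestimation). The sample series are made up.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    mean_obs = _mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def log_nse(obs, sim):
    """NSE on log-transformed flows; fails if any flow is zero."""
    return nse([math.log(o) for o in obs], [math.log(s) for s in sim])


def pbias(obs, sim):
    """Percentage bias; positive values mean the model underestimates."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed original three-term form)."""
    mo, ms = _mean(obs), _mean(sim)
    sd_o = math.sqrt(sum((o - mo) ** 2 for o in obs) / len(obs))
    sd_s = math.sqrt(sum((s - ms) ** 2 for s in sim) / len(sim))
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (sd_o * sd_s)   # correlation term
    alpha = sd_s / sd_o       # variability term
    beta = ms / mo            # bias term
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


observed = [1.0, 2.0, 4.0, 8.0]          # made-up flows
simulated = [0.5 * q for q in observed]  # a model that halves every flow
scores = {"NSE": nse(observed, simulated), "PBias": pbias(observed, simulated),
          "KGE": kge(observed, simulated)}
```

Because the logarithm of zero is undefined, `log_nse` raises an error when a simulated or observed flow is zero, which is the same reason LogNSE cannot be reported for a run that simulates zero flow.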

Model Parameterization
The final STREAM model parameters used for the comparison simulations are listed in Tables 2 and 3, and illustrated in Figure 4. Several manual calibration runs were conducted, where each parameter was varied while others kept constant. The best performing parameter sets were retained for subsequent simulations.
For the Sumax parameter, the areas with forest and plantations had higher Sumax values, because these occur on locations with deeper soils and stronger baseflow, indicative of larger water storage; research has shown that these vegetation types can tap deep water stores.
The cr parameter was derived from a combination of land use, soil texture and slope [42]. The qc parameter, however, was mainly driven by soil type. Coarser soils, such as sandy loams or sandy clays have a lower qc threshold, because these soils allow for quicker response, the threshold to initiate quickflow being lower. The finer clayey soils, in contrast, can hold water for longer periods of time, increasing the qc threshold value. The GWSmax parameter followed closely the elevation pattern, and the relationship derived by Gerrits [36] was used.

Model Simulations
Over 70 model runs were conducted, but only a sample of four representative runs is presented and discussed (Table 4). Overall, the model simulations were able to capture the flow dynamics well. However, in several runs, the model overestimated peak flows and baseflows, especially at the Kaap outlet and in Queens. The goodness of fit indicators were compared to see which model setup better represents the actual catchment conditions (Table 5). Overall, the Pearson correlation was good, ranging from 0.75 to 0.84, meaning that the simulated flows generally followed the observed flow pattern well. The NSE was poor to acceptable, mostly due to the overestimation of flow in some runs (e.g., run 53 and run 60), and the seasonality of the flow. In the best performing runs, the NSE was 0.5 to 0.66. Run 60 simulated zero flows during the low flow season, and thus it was not possible to calculate LogNSE. In terms of KGE, which is the most integrated indicator, run 64 was the best for the Noordkaap and Suidkaap catchments, with KGE of 0.67 and 0.75, respectively, whereas run 67 was the best for the Queens and Kaap catchments, with KGE of 0.79 and 0.83, respectively. The Bias was very high, especially at the Kaap outlet. These results show how different model setups are needed for the different catchments. However, there is a similarity between Noordkaap and Suidkaap, and also between the Queens and the Kaap.

Furthermore, a visual analysis of the different hydrological signatures was conducted to further understand which processes were better represented by each model setup. From the other model simulations (not reported here), the parameters cr and qc proved to be very sensitive.

Annual Runoff
The annual runoff, which is a key component of the water balance, was computed for all hydrological years. There was difficulty in closing the water balance in the initial runs, when a simple setup without capillary rise (or feedback from the groundwater storage to the unsaturated zone) was implemented. Figure 5 illustrates the results of run 64, compared to observed flow for the four catchments.
Regarding the annual dynamics, one can see that the model tends to better capture the flows generated in wetter years than in drier years. This may be due to more uncertainty in the storage conditions during drier years, and the impact of water abstractions for irrigation. The naturalized flow is also 17 to 51% higher than observed flow, which implies that water abstractions and reductions in streamflow could be up to 50%, particularly in the dry season.
One important aspect to consider is that in semi-arid and sub-humid areas, the evaporation component of the water balance is very large. Actually, the potential evaporation is much larger than rainfall, which reflects in more than 90% of the water balance being attributed to evaporation, and only 10% or less to runoff generation (Table 6). Therefore, uncertainties related to the computation of evaporation, such as the parameters used, interception, and the interpolation of input data, can greatly affect the results of model simulations. This is illustrated by the great difference between evaporation estimates from different model runs (Table 6). A comparison was also made between evaporation generated by the water balance model, and evaporation from remote sensing products (Table 6). A more detailed description of the remote sensing evaporation products is available in the Supplementary Material. Comparing monthly and annual scales revealed that both ALEXI [48,49] and CMERST [50] products generally overestimate actual evaporation, whereas SSEBop [51,52] results in an underestimate.
Note: FlowObsv is observed flow, RC is runoff coefficient, Qnat is the naturalized flow obtained from WR2012 database, Fm53 to 67 are the simulated flows for runs 53 to 67, PminFlowOb is the difference between precipitation and observed flow, ETfao is the potential evaporation using FAO method, ETm53 to 67 are the simulated actual evaporation for model runs 53 to 67, ETal is evaporation from ALEXI product, ETcm is evaporation from the CMERST product, and ETss is evaporation from the SSEBop product.

Seasonal Runoff
The average monthly streamflow graphs (Figure 6) show the strong seasonality of streamflow in the catchments. The flow components analysis demonstrates that quickflow and saturated overland flow are only active for a few months of the year (November to March, with peak contributions in January/February). For the Noordkaap catchment, model run 64, which included capillary rise and a GWSmin threshold, was able to capture the monthly flow pattern relatively well. This pattern was also well captured in the Suidkaap catchment. For the Queens and Kaap catchments, however, this model run overestimates flow, especially during the wet months. Another model configuration (model 67, available in the Supplementary Material), which included a higher coefficient for capillary rise, generated better results for these latter two catchments. Overall, from the results of run 64, the baseflow component accounted for 85% of the flow in the Queens catchment, 95% in the Noordkaap, 94% in the Suidkaap, and 93% in the Kaap. The quickflow contribution ranged between 4 and 13%, whereas saturated overland flow was about 1% or less for all catchments.

Flow Duration Curves
The flow duration curves (Figure 7) give a comprehensive overview of the flow regime. Once more it is evident that the flow variation for the Noordkaap and Suidkaap catchments is fairly well represented. However, for the Queens and Kaap catchments the model overestimates the middle and low flows. This overestimation could be because of the representation of subsurface flow processes in the catchment model. Both Queens and Kaap have higher percentages of area with hillslopes, and also more diverse geology and soils, including a variety of sedimentary rocks. Apparently a more complex representation of flow processes and groundwater is required to capture such variability. Furthermore, the Kaap catchment has higher water abstractions for irrigation, and most of this water is abstracted during the dry season and in drier years. The naturalized flow for the Kaap catchment is 42% higher than the observed flow, which can largely be attributed to water abstractions for irrigation. Most of the irrigated sugarcane in the catchment is located in the Lower Kaap valley, with an estimated irrigation demand of 92 × 10⁶ m³/year.

Low Flows and Floods
The low flows and floods can be characterized by their magnitude, frequency and duration. The flow duration curve provides commonly used indicators of the low flow and high flow regimes. Q95 (flow exceeded 95% of the time) is the most frequently used indicator of low flow. Q75 is frequently used in South Africa for yield estimation, and the Q50 and Qmean are also reported. On the high flow side, Q1 and Q5 are used, as well as floods with 100-year recurrence intervals. However, the model dataset is only 10 years long, which is not sufficient to perform a flood frequency analysis. For the sake of comparing the model runs, Q1 and Q5 for modelled and observed time series were instead compared as indicators of high flows. Table 7 compares flow percentiles for observed and modelled streamflow under different model setups. The slope of the flow duration curve, computed as the slope between Q30 and Q70, is also reported for comparison. This set of signatures reveals once more the difficulty of having one single model setup performing equally well for both high and low flows, while getting the same slope of the FDC. This is likely due to the fact that in these catchments, different processes control the low flow and high flow generation; apparently the simple model structure did not fully capture these differences.
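The flow percentiles and the FDC slope can be derived directly from a daily flow series. The sketch below is illustrative: the linear interpolation between ranked flows and the use of log-transformed flows in the slope are assumptions, since the text only states that the slope is taken between Q30 and Q70. The function names and the sample series are hypothetical.

```python
import math


def exceedance_flow(flows, pct):
    """Flow equalled or exceeded pct percent of the time (e.g. pct=95 for Q95)."""
    ranked = sorted(flows, reverse=True)         # largest flow first
    position = pct * (len(ranked) - 1) / 100.0   # fractional rank
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ranked[lower] * (1.0 - fraction) + ranked[upper] * fraction


def fdc_slope(flows, upper_pct=30, lower_pct=70):
    """Slope of the flow duration curve between two exceedance percentiles.

    Assumed form: difference in log flow divided by the difference in
    exceedance probability; requires strictly positive flows.
    """
    q_high = exceedance_flow(flows, upper_pct)
    q_low = exceedance_flow(flows, lower_pct)
    return (math.log(q_high) - math.log(q_low)) / ((lower_pct - upper_pct) / 100.0)


series = [float(q) for q in range(1, 102)]  # made-up flows 1, 2, ..., 101
signatures = {
    "Q1": exceedance_flow(series, 1),
    "Q5": exceedance_flow(series, 5),
    "Q50": exceedance_flow(series, 50),
    "Q95": exceedance_flow(series, 95),
    "slope": fdc_slope(series),
}
```

Computing the same dictionary for observed and simulated series gives a side-by-side comparison of the kind reported in Table 7.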

3.3.5. Hydrographs
Figure 8 shows the hydrographs for the catchments aggregated at monthly time scale for the entire simulation period. During a sequence of wet years (2010 to 2013), the model was able to represent the flow dynamics fairly well with no systematic under- or over-prediction of flows. This confirms that the model performed better under wetter conditions than under drier ones. In drier years, better characterization of the evaporation processes is required, as well as groundwater discharge, storage and water abstractions, as this greatly influences the water balance. Small, localised rain events are also very difficult to capture with the given monitoring network.
Water abstractions for irrigation and other uses are relatively higher in drier years, so explicit representation of the irrigation management and other water uses would be required to improve the model.

Implications for Hydrological Process Understanding
The research provided a framework to evaluate model performance considering a range of information sources. The use of landscape features to map dominant runoff mechanisms also assisted in estimating parameters for the STREAM model. Model parameter estimates derived from this study could guide parameterisation in the case of other semi-arid catchments with similar characteristics. Furthermore, the research highlighted gaps in understanding of key hydrological processes in the study catchments, which seem typical for semi-arid areas. The computation of all evaporation fluxes and the accurate quantification of water uses is very important for the water balance estimates of semi-arid areas, given that the greatest component of the water balance is attributed to these processes. The use of information from previous hydrograph separation studies (using digital filters or water quality data) proved useful to aid in understanding the flow components and dominant flow generation processes in the catchments as well. Therefore, it is relevant for other areas to explore these types of data and information, in addition to traditional hydrological data.
Notably, the soil grids dataset that is freely available at global scale proved useful as model input.

Implications for Water Resource Management
The results reveal that there are great heterogeneities in the catchments studied. Water resource management decisions are made at the scale of the catchments studied; therefore, it is important that a better process understanding is included in current management models and tools.
The Queens and Kaap catchments in particular seem to have higher levels of groundwater/surface water interaction. These catchments are more impacted by water abstractions and evaporation, particularly during the dry season and in drier years, which is evident from the steeper shape of the flow duration curves in the low flow portion (Figure 7). Therefore, the impacts of pollution and water abstraction could be more critical for these catchments, particularly on their low flows.
The Noordkaap and Suidkaap catchments, in contrast, are dominated by subsurface runoff, as a result of deeper soils and mostly fractured granite lithology. The baseflow is steadier in these catchments, and they are likely the recharge areas of the regional groundwater body. Research using hydro-chemical tracers and stable isotopes at the Kaap outlet over the rainy season of 2013-2014 [24] revealed that 64 to 98% of flow in the Kaap was from shallow and deep groundwater components. During wet conditions, up to 41% of total runoff was attributed to direct runoff, and strong correlations were found between antecedent precipitation conditions and direct runoff.
Saraiva Okello, et al. [25] also found high contributions of baseflow to total flow in the Kaap catchment and tributaries using calibrated recursive digital filters for hydrograph separation. They reported baseflow contributions ranging from 45 to 70%, with very high inter- and intra-annual variability.
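To illustrate what a recursive digital filter does, the sketch below implements a single forward pass of a one-parameter filter of the Lyne-Hollick type. It is not necessarily the filter calibrated in [25]; the filter form, the default parameter `alpha = 0.925` and the function names are assumptions made for illustration only.

```python
def baseflow_filter(q, alpha=0.925):
    """Single forward pass of a one-parameter recursive digital filter.

    q     : list of total streamflow values at a regular time step
    alpha : filter parameter (assumed value; normally calibrated)
    Returns the baseflow series, constrained to 0 <= baseflow <= q.
    """
    quick = [0.0]          # assume the first time step is all baseflow
    base = [q[0]]
    for t in range(1, len(q)):
        f = alpha * quick[-1] + 0.5 * (1.0 + alpha) * (q[t] - q[t - 1])
        f = min(max(f, 0.0), q[t])   # quickflow cannot be negative or exceed total flow
        quick.append(f)
        base.append(q[t] - f)
    return base

def baseflow_index(q, alpha=0.925):
    """Fraction of total flow volume attributed to baseflow."""
    return sum(baseflow_filter(q, alpha)) / sum(q)
```

A baseflow index computed this way per hydrological year would show the kind of inter-annual variability reported above, although the absolute values depend strongly on the chosen filter and its parameter.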
Water abstractions for irrigated agriculture significantly impact the streamflow in the Kaap, particularly in the dry season and in drier years. This impact seems to be more pronounced at the Kaap outlet, and in the Queens catchment, and is attributed to the different processes that are dominant in these catchments. The Suidkaap catchment, in spite of also experiencing high streamflow reductions (mostly due to commercial forestation), still sustains significant baseflow contributions. Some water is imported to augment irrigation in the Kaap valley, but these water imports do not significantly affect the water resources of the Kaap catchment, as return flows are small.
The reduction of water quantity in the Kaap negatively affects the water quality in the catchment. Saraiva Okello, et al. [25] reported higher loads of EC and other water quality parameters, due to the reduced dilution capacity of the system, particularly during the dry season and in dry years. Deksissa, et al. [53] and Slaughter and Hughes [54] supported this finding, attributing it to the combination of abandoned mines and irrigation return flows that occur in the Kaap. Therefore, in order to improve water quality in the Kaap and in the Crocodile catchment overall, it is important to reduce water abstractions and/or better control and restrict water pollution at its sources.

Input Uncertainty and Model Structure
The rainfall input greatly impacts the runoff generation. The regionalization of rainfall from the available stations using the inverse distance weighting (IDW) method could have increased the number of rainy days and reduced the magnitude of rain events. This results in higher interception and an underestimation of flow peaks. The IDW regionalization was compared with the Thiessen method and with remote sensing rainfall (CHIRPS), but the IDW method still provided better simulation results, likely due to the altitude correction using the MAP pattern.
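The smoothing effect of IDW can be seen in a minimal sketch. The code below is plain inverse distance weighting with an assumed power of 2 and does not include the altitude correction with the MAP pattern used in this study; the function name and the station layout are hypothetical.

```python
import math

def idw(stations, x, y, power=2.0):
    """Inverse distance weighted rainfall at point (x, y).

    stations : list of (x_station, y_station, rainfall) tuples
    power    : distance exponent (assumed value of 2)
    """
    num = 0.0
    den = 0.0
    for xs, ys, value in stations:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return value            # point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical day: one station records a 20 mm storm, the other stays dry
stations = [(0.0, 0.0, 20.0), (10.0, 0.0, 0.0)]
midpoint_rain = idw(stations, 5.0, 0.0)
```

Every grid cell between the two stations receives some rain on this day (more rainy days), but always less than the 20 mm recorded at the wet station (smaller event magnitude), which is the behaviour described in the paragraph above.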
Currently, the model only accounts for human activity in terms of modified land use, and not through explicit water abstractions. Improved monitoring of water use in the catchment would greatly assist in better hydrological simulation, as such information can be used to develop an irrigation routine in the model.
Another source of uncertainty is the configuration of the groundwater reservoirs. From the shape of the observed hydrograph it appears that the recession is not linear, but rather logarithmic or another non-linear function. The research using tracers [24] revealed that there are two distinct groundwater components at the Kaap outlet, which can be indicative of different reservoirs that operate with distinct dynamics. Shallow groundwater responds quickly to rainfall events, and has the highest contribution to flow, particularly when the antecedent moisture in the catchment is already high. The other groundwater component is from deeper sources, which could be the regional groundwater, recharged in the headwaters of the catchment. This component is responsible for sustaining the baseflow during most of the year. In the months of February to April, when the catchments are already quite wet, most of the runoff is generated through direct runoff [25].
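One possible way to obtain a non-linear recession from simple building blocks is to place a fast (shallow) and a slow (deep) linear reservoir in parallel. The sketch below is only an illustration of that idea, not the groundwater configuration of STREAM; the function names, initial discharges and residence times are hypothetical.

```python
import math

def recession(q0_fast, k_fast, q0_slow, k_slow, days):
    """Rain-free recession of two parallel linear reservoirs.

    Each reservoir drains as Q(t) = Q0 * exp(-t / k), with k in days.
    """
    return [
        q0_fast * math.exp(-t / k_fast) + q0_slow * math.exp(-t / k_slow)
        for t in range(days)
    ]

def apparent_rate(q):
    """Day-to-day recession rate ln(Q_t / Q_t+1); constant for one linear reservoir."""
    return [math.log(q[t] / q[t + 1]) for t in range(len(q) - 1)]

# Hypothetical values: shallow store drains in days, deep store in months
single = apparent_rate(recession(0.0, 5.0, 2.0, 100.0, 60))
double = apparent_rate(recession(8.0, 5.0, 2.0, 100.0, 60))
```

For the single reservoir the recession rate is constant, whereas for the two-reservoir case it starts high and decays towards the rate of the slow store, so the hydrograph no longer plots as a straight line on a semi-logarithmic scale.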
The HBV model [43,44] was set up for the catchments using similar input data, but only vegetation and elevation band zones were used to discretize the model. Precipitation, temperature and potential evaporation were used to drive the model, and automatic calibration was applied to obtain the best performing parameter sets. Results of this model (available in the Supplementary Material) were comparable to the STREAM model outputs. The HBV model simulated the water balance at the Kaap outlet better, but the shape of the hydrographs and FDCs was better captured by the STREAM model. This confirms that the distribution of input data, as well as the understanding of dominant runoff generation zones, indeed assisted in informing model parameters and model simulation. However, both models still lack the complexity to fully capture the runoff processes occurring in the Kaap and its tributaries.
The assessment of model performance should not rely only on statistical measures, but also on other aspects such as the shape of the hydrograph, the flow duration curves and other hydrological signatures.

Limitations and Gaps in Process Understanding
Even though great effort was made in selecting the best available input data, and making use of all available information and process understanding, there are some limitations and gaps in process understanding.
The representation of evaporation processes in the model is simplified, but is largely consistent with the water balance. The interception process is represented by a single daily threshold, and the interception storage is not simulated dynamically.
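A single daily interception threshold can be written down very compactly. The sketch below assumes that all rainfall up to the threshold is intercepted each day and that the remainder becomes effective rainfall; the function name and the 2 mm/day threshold are hypothetical and only serve the illustration.

```python
def split_interception(precip, threshold):
    """Split daily rainfall (mm) into (intercepted, effective) using one threshold."""
    intercepted = min(precip, threshold)
    return intercepted, precip - intercepted

def total_interception(daily_precip, threshold):
    """Sum of intercepted rainfall (mm) over a sequence of days."""
    return sum(split_interception(p, threshold)[0] for p in daily_precip)

# The same 20 mm delivered as one storm or spread over four rainy days
one_storm = [20.0, 0.0, 0.0, 0.0]
spread_out = [5.0, 5.0, 5.0, 5.0]
threshold_mm = 2.0   # hypothetical daily interception threshold
```

With these numbers the single storm loses 2 mm to interception and the spread-out rainfall loses 8 mm, which shows why a rainfall field with more rainy days and smaller events leads to higher simulated interception when the storage is not simulated dynamically.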
The availability of data for potential evaporation is a further limitation. It is likely that the spatial variability of evaporation is higher than that simulated, given that the stations used to interpolate climatic data were located at low altitudes. Furthermore, potential and actual evaporation vary in time with the growth stage and physiology of the vegetation cover. These aspects were not captured in the current model configurations.
An attempt was made to use remote sensing actual evaporation products, such as ALEXI [48,49], SSEBop [51,52] and CMERST [50]. However, these products have different time scales (weekly and monthly, respectively) and there were challenges with temporal interpolation of these data. We were able to downscale the ALEXI product and use it to drive the model, but the results were disappointing. Comparison at monthly and annual scale revealed that both the ALEXI and CMERST products generally overestimate actual evaporation, whereas SSEBop results in underestimation. A potential way forward would be to use an ensemble product, and to explore bias correction of the evaporation.
The runoff generation module of STREAM requires further development for model applications at daily time steps. The routing of runoff, the lateral flow process and the percolation were some of the gaps in the current model setup. In the literature, most STREAM model applications were carried out at monthly [34,36,55,56] and weekly [35] time steps. The reported daily model applications were also at a much coarser spatial scale [57-59].

Conclusions
This study combined hydrological modelling with mapping of dominant runoff generation processes, and a runoff signatures approach to improve the understanding of hydrological processes and runoff generation in a semi-arid African catchment.
Several data sources, parameter input values, and model structures were explored, in order to better understand the dominant processes in the catchment. Runoff response was sensitive to the parameter governing the partitioning of rainfall between the unsaturated and saturated zones (cr), as well as to the threshold for initiation of quickflow (qc). Moreover, the inclusion of the feedback process from the saturated zone to the unsaturated zone, termed capillary rise, proved critical to improve model simulations. This was particularly the case for the Kaap and Queens catchments, which have a more diverse geology, coarser soils and a larger share of hillslopes. The Noordkaap and Suidkaap catchments have mostly fractured granite bedrock, deeper soils and more plateaus, which results in more subsurface flow.
The results of model simulations were analysed using the hydrological signatures framework as well as standard goodness of fit parameters. Annual runoff, seasonal runoff, flow duration curves and hydrographs of the different model runs were compared. The annual runoff showed that these catchments have high inter-annual variability, driven mostly by the variability of rainfall. The models were able to better simulate flows in wetter years (2010-2013) than in drier years (2004-2006). The seasonal flow analysis also revealed that there is strong seasonality in the flow generation. The capillary rise process in the model required a minimum groundwater storage threshold for its initiation (GWSmin) to prevent the groundwater storage from running completely dry, which is not the case in the observed streamflow series.
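The role of the GWSmin threshold can be illustrated with a toy flux function. The formulation below is an assumption and not the STREAM equation: capillary rise is taken to scale with the moisture deficit of the unsaturated zone and is switched off once groundwater storage drops to GWSmin. All names other than GWSmin are hypothetical.

```python
def capillary_rise(gws, gws_min, su, su_max, c_max):
    """Upward flux (mm/day) from the saturated to the unsaturated zone.

    gws     : current groundwater storage (mm)
    gws_min : minimum storage below which capillary rise stops (GWSmin)
    su      : current unsaturated zone storage (mm)
    su_max  : unsaturated zone storage capacity (mm)
    c_max   : maximum capillary rise rate (mm/day), assumed parameter
    """
    if gws <= gws_min:
        return 0.0                            # protect the groundwater store
    demand = c_max * (1.0 - su / su_max)      # drier root zone draws more water
    available = gws - gws_min                 # never draw storage below GWSmin
    return max(0.0, min(demand, available))
```

Without the guard on `gws_min`, repeated capillary rise during long dry spells could empty the groundwater store in such a formulation and drive simulated baseflow to zero, which the observed streamflow series do not show.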
FDCs were the signature that best revealed the performance of different model simulations. In most cases the model was able to capture the slope of the FDC up to Q50/60, but missed the slope during the low flows. This was the case especially in the Kaap and Queens catchments. This finding reflects the importance of improving the representation of the evaporation and groundwater-surface water interaction processes, as well as water abstractions in the model setups, to better simulate the low flows.
Finally, the daily and monthly hydrographs were compared, and goodness of fit parameters computed between observed and modelled streamflow. Even though the goodness of fit results were average (0.75 to 0.84 Pearson R², 0.5 to 0.66 NSE in the best simulations), visual comparison shows that the models were able to capture the flow variability well, but missed the simulation of peak flows and overestimated baseflows.
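The two goodness of fit measures quoted above follow standard definitions and can be computed as in the sketch below. The function names are hypothetical, and the example series is synthetic and only meant to show how the two measures can diverge.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance

def pearson_r2(obs, sim):
    """Squared Pearson correlation between observed and simulated flows."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)

# Synthetic example: the simulation has the right pattern but twice the volume
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]
```

In this synthetic case R² stays at 1 because the timing and pattern are reproduced, while NSE becomes strongly negative because of the volume bias. This is one reason why statistical measures are best read together with hydrographs and flow duration curves.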
A comparison was also made between the STREAM and HBV models. This yielded very similar results in terms of goodness of fit statistics for the headwater catchments, but HBV performed better for the Kaap outlet. However, when the results were visually compared in terms of the various signatures used, the STREAM model better captured the hydrograph shape and the flow duration curve, particularly for baseflows.
This study clearly shows that there is no single model setup that can represent all the processes equally well for all the catchments. Due to the differences in landscape, geology, soils, land use and land cover, different model configurations are better suited for each catchment. However, the distribution of input data, as well as the understanding of dominant runoff generation zones, assisted in informing the STREAM model. There is a benefit in combining process studies and modelling. The models highlight the shortcomings in process understanding, illustrating gaps in our knowledge. Process studies in this catchment assisted in filling some of these knowledge gaps, but other shortcomings were identified. Future improvements in the model should include the explicit accounting for irrigation and water transfers.
In terms of water management, the research findings reveal that the Queens and Kaap catchments are more sensitive to pollution, particularly during low flows, due to a higher level of groundwater/surface water interaction. It is important to improve monitoring of water use, given the high impact of water abstractions in the catchment. The use of remote sensing products could assist in this, but more research is required for bias correction and calibration of the products. Improvement in the calculation of actual evaporation is also required, as this constitutes the major component of the water balance, and there is high uncertainty in the parameters used and among the different evaporation products.

Acknowledgments:
We particularly thank the Department of Water and Sanitation and the Inkomati-Usuthu Catchment Management Agency for hydrometric data provided. We thank the South African Weather Service for climatic data provided. Dr. Shervan Gharari is thanked for assistance in the HAND analysis. Prof. Hubert Savenije provided valuable ideas for the start of the modelling exercise. We thank Tim Hessels for kindly providing remote sensing data. The STREAM model setup in Python was initially coded by Benson Bashange and Dr. Hans van der Kwast. Micah Mukolwe's assistance with coding and Python is also highly appreciated.

Conflicts of Interest:
The authors declare that they do not have a conflict of interest.