1. Introduction
A sound understanding of the hydrologic cycle of a river basin is crucial for planning and managing water resources. Reliable predictions from distributed hydrological models require extensive datasets for setup, calibration and validation. Usually, time series of discharge and groundwater head are used to assess hydrological behaviour. However, in data-scarce regions these time series are often too short or corrupt, or not available at all. In addition, the information provided by a hydrograph is integrated in space, so it cannot resolve spatial patterns within the basin. In contrast, evapotranspiration (ET) data based on multi-temporal remote sensing products are now increasingly available for many parts of the world, and hence can be used to provide extensive information about hydrological behaviour in river basins. ET data directly quantify a hydrological flux that is second only to precipitation and exceeds groundwater recharge and runoff in many regions of the world. Therefore, sophisticated analysis of remotely sensed ET data might even enable us to infer spatio-temporally distributed information about boundary conditions for groundwater, climate, elevations, slopes, soil texture, land use, and long-term trends for single variables of influence, all of which are of vital importance for management as well as for hydrological modelling. This information may help to reduce the number of computational units in a distributed hydrological model and to constrain it.
Although ET is at the centre of hydrological processes in river basins, conventional measurements or estimates of it (pan evaporation, lysimeter data, etc.) are not available at the desired spatial and temporal resolutions. Therefore, remote sensing is the only viable option to map ET of relatively large areas in a globally consistent and economically feasible manner [1,2]. Remotely sensed ET data provide only a snapshot of ET at the instant of image acquisition, but repeated images capture seasonal changes and long-term trends. The recent availability of satellite images of medium spatio-temporal resolution from the MODerate resolution Imaging Spectroradiometer (MODIS) imagery program has increased the application potential of remote sensing [3,4]. One of the products of this imagery program is the MODIS evapotranspiration (MODIS ET) data.
In Africa, and specifically Sub-Saharan Africa, the MODIS ET data have been widely applied [3,4,5,6,7,8,9,10]. In Tanzania in particular, the MODIS ET data have been applied in three river basins, including the Wami River basin [3,9,10]. For example, the data have been applied in the Internal Drainage basin, to the northwest of the Wami River basin, to assess the dynamics of Lake Manyara [9]. In that study, Deus and Gloaguen [9] evaluated the suitability of the MODIS ET by comparing it with measured pan ET and found coefficients of correlation of 0.60 and 0.73 for monthly and annual ET respectively. In the Pangani River basin, to the north of the Wami River basin, the data were used to map ET trends [3]. In that basin, Kiptala et al. [3] found that on vegetated land surfaces the MODIS ET has a coefficient of spatial correlation of 0.74 when compared with monthly ET derived from the surface energy balance procedure. In the Wami River basin, the data have also been used to assess the seasonal variation of ET [10]. These examples demonstrate that the MODIS ET data have great potential for hydrological studies in this region.
Interpreting a long time series with high spatio-temporal variability is not a trivial task [11,12,13]. Exploratory data analysis using multivariate statistical techniques for dimensionality reduction, for example principal component analysis (PCA), has been used in various water-related studies [14,15,16,17,18,19,20,21,22,23]. In remote sensing studies, methods such as minimum noise fraction (MNF) and PCA have been applied for dimensionality reduction, pattern recognition, land cover change detection, land cover characterization, image transformation, image classification, and image fusion [24,25,26,27,28,29,30,31,32]. In this study, PCA was used for dimensionality reduction because it yields a higher signal-to-noise ratio than MNF when the data contain mixed pixels (i.e., pixels that each map several different ground objects) [23,33]. Such mixing of neighbouring ground areas within one pixel is common in MODIS data because of the whiskbroom design and observation geometry of the MODIS instrument and because of spatial uncertainty in the registration of images [34]. Moreover, Xavier et al. [35] argue that PCA is able to reveal interannual and non-periodical signals, such as long-term and short-term trends, as well as seasonal variation in time series of remotely sensed data.
In some studies, remote sensing data have been primarily used to assess spatial patterns of variables like land cover [28,30] that could then be fed directly into hydrological models. This study followed a different approach. An extensive dataset of multi-temporal images of ET was systematically analysed to extract as much information as possible about features that help to better understand hydrological processes in a data-scarce region. For example, enhanced ET during long dry periods might indicate regions of extensive crop irrigation or of shallow groundwater within reach of plant roots, and long-term shifts of ET patterns could reflect major land use changes. Thus, the focus of the analysis was on spatial patterns under certain boundary conditions, or even on changes of spatial patterns. This approach aims at making maximum use of the potential of remote sensing data rather than at calibrating or validating remote sensing data in order to arrive at absolute values. Once this information has been compiled, it can extend the basis for sound water resources research and constrain the respective spatially distributed hydrological models from the outset.
The rest of the paper is organised as follows. Section 2 introduces the study area, the data used, the applied principal component analysis and the statistical tests. Section 3 presents the results. Section 4 discusses the main findings. Finally, Section 5 offers an outlook on future studies.
4. Discussion
4.1. General Approach
The objective of this study was to infer information about hydrological behaviour in the Wami River basin as a basis for sustainable water resources and land use management as well as constraining subsequent hydrological models in a data-scarce environment. Thus, contrary to numerous other applications of the PCA approach, here the basic idea was not to assign processes or specific variables of influence to the PCs, but to identify prevailing and peculiar spatio-temporal patterns that were restricted to certain specific boundary conditions.
Correspondingly, this study focused on the spatial patterns of ET and how these changed under specific boundary conditions rather than on a quantitative assessment of water fluxes. The latter is known to be prone to substantial uncertainties and to require substantial ground truth measurements. In contrast, the approach followed in this study makes maximum use of the potential of remote sensing data, i.e., it captures spatial patterns rather than giving absolute numbers. However, there are still some sources of uncertainty that need to be considered.
The extent of meaningful physical information extracted from a PCA of remotely sensed ET can vary with the resolution (e.g., <1 km) and the type of remotely sensed data used. Yang et al. [69] illustrated that different remote sensing ET products use different formulae and assumptions, which result in different estimates of ET. Uncertainties in remote sensing ET estimates also depend on geographic location, which affects climate (e.g., cloud cover and solar radiation intensity). Since this study was conducted in the equatorial region in a tropical climate (e.g., with high solar radiation intensity) and focused on relative spatial patterns rather than on absolute ET values, the uncertainties of the MODIS ET were considered minimal. In addition, systematic errors of remote sensing data have little impact on long-term anomalies [5].
Neither rainfall nor air temperature exhibited a significant trend during the study period. Consequently, meteorological effects between the years 2000 and 2012 are considered to be negligible. Thus, significant trends in the scores of individual PCs offer some evidence for systematic shifts in land cover. The following sections discuss the inferences on hydrological behaviour and possible land surface drivers.
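The trend reasoning in this paragraph can be illustrated with a small sketch. The trend test actually used in the study belongs to the methods section and is not restated here, so the Mann-Kendall test below is an assumption, chosen only because it is a common non-parametric trend test. The function name `mann_kendall` and the synthetic series are hypothetical.

```python
import math
import numpy as np

def mann_kendall(y):
    """Two-sided Mann-Kendall trend test (normal approximation, ties corrected).

    Assumed stand-in for the trend test of the study; returns (S, Z, p).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # S counts increasing pairs minus decreasing pairs over all i < j.
    diff = y[None, :] - y[:, None]          # diff[i, j] = y[j] - y[i]
    s = np.sign(diff[np.triu_indices(n, k=1)]).sum()
    # Variance of S, reduced for groups of tied values.
    _, counts = np.unique(y, return_counts=True)
    var_s = (n * (n - 1) * (2 * n + 5)
             - (counts * (counts - 1) * (2 * counts + 5)).sum()) / 18.0
    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided p-value
    return s, z, p

# Synthetic example: a weak upward drift hidden in noise (not study data).
rng = np.random.default_rng(0)
series = 0.02 * np.arange(100) + rng.normal(scale=0.3, size=100)
s, z, p = mann_kendall(series)
print(f"S = {s:.0f}, Z = {z:.2f}, p = {p:.3g}")
```

Applying the same function to rainfall, air temperature and the loadings of each PC would make the argument explicit: a significant trend in the loadings combined with no significant trend in the meteorological series points to a non-meteorological driver.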
4.2. First Principal Component: Mean Behaviour of ET
All 644 images were strongly positively correlated with PC1, and this correlation did not change in the long term. In fact, PC1 typically carries the largest share of the information in a dataset [30]. The similarity between the map of average ET and the scores of PC1 suggests that PC1 mainly reflects the mean behaviour of ET in the river basin. Similarly, PCA of time series often yields a first component that depicts the mean behaviour [17,18,59,70]. This component explained 63.0% of the variance of spatial and temporal patterns of ET; thus it represents the prevailing conditions in the river basin. The respective spatial pattern of ET likely reflects the spatial patterns of vegetation type and density in the river basin. For example, the downstream part of the river basin exhibited higher ET than the upstream part. Indeed, Figure 9 shows higher canopy density in the former than in the latter.
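For readers who want to see how loadings, scores and explained variance relate to one another, the following is a minimal sketch. It assumes that each image is treated as a variable and each pixel as an observation, which is consistent with the loadings being time series and the scores being maps, but it is not the authors' implementation. The function name `pca_images` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_images(X, n_components=5):
    """PCA of an image stack via SVD of the standardised data matrix.

    X has shape (n_pixels, n_images): one column per flattened image.
    Returns scores (maps), loadings (one value per image) and the
    fraction of variance explained by each retained component.
    """
    # Standardise every image, i.e. work on the correlation matrix of images.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = S**2 / (Z.shape[0] - 1)
    explained = eigvals / eigvals.sum()
    k = n_components
    scores = U[:, :k] * S[:k]                  # (n_pixels, k)
    # Loading = correlation between an image and the component scores.
    loadings = Vt[:k].T * np.sqrt(eigvals[:k])  # (n_images, k)
    # The SVD sign is arbitrary; make the loadings sum to a positive value.
    signs = np.sign(loadings.sum(axis=0))
    signs[signs == 0] = 1.0
    return scores * signs, loadings * signs, explained[:k]

# Synthetic stack: one shared spatial pattern plus noise (not study data).
rng = np.random.default_rng(0)
pattern = rng.normal(size=500)
X = 3.0 * pattern[:, None] + rng.normal(size=(500, 40))
scores, loadings, explained = pca_images(X, n_components=3)
print(explained.round(3), loadings[:, 0].min().round(2))
```

In such a setup a first component on which all images load strongly and positively is exactly what a dominant, persistent spatial pattern produces, which is how PC1 is interpreted here.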
The negative correlation of the scores of PC1 with elevation indicated higher water availability for ET at low elevations than at high elevations. This is because the EAMs prevent Indian Ocean cyclonic rains from reaching the upstream area, so that rainfall is heavier at the low elevations of the downstream area than at the high elevations of the upstream area [45,71]. The significant but small positive correlation of the scores of PC1 with slope was caused by the relatively high ET in the natural forest along the EAMs, where steeper slopes are found.
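The kind of relationship described in this paragraph can be checked with a correlation between pixel scores and a terrain attribute. The sketch below uses Pearson's r with a permutation-based p-value; both choices are assumptions made for illustration, not the statistical test of the study, and the function name and synthetic data are hypothetical. Because neighbouring pixels are spatially autocorrelated, p-values from pixel-wise tests of this kind tend to be optimistic.

```python
import numpy as np

def correlation_with_permutation(x, y, n_perm=999, seed=0):
    """Pearson correlation of x and y with a two-sided permutation p-value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        # Shuffling y destroys any real association with x.
        r_perm = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r_perm) >= abs(r_obs):
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return r_obs, p_value

# Synthetic pixels: scores decrease with elevation (not study data).
rng = np.random.default_rng(2)
elevation = rng.uniform(0, 2000, size=300)
pc1_scores = -0.001 * elevation + rng.normal(scale=0.5, size=300)
r, p = correlation_with_permutation(elevation, pc1_scores)
print(f"r = {r:.2f}, p = {p:.3f}")
```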
All soil texture classes differ significantly with respect to the scores of PC1. This could partly be ascribed directly to the different water retention capacities of different soils. On the other hand, land cover and correspondingly vegetation density are also related to soil texture, which has an indirect effect on ET. Soil texture classes are not randomly distributed within the catchment, thus different ET from different classes might partly be related to the spatial pattern of rainfall as well. The similarity of ET between the irrigation areas and woodland suggests that, due to irrigation, the rice and sugar cane plantations evaporate and transpire as much water as woodland, which has deeper roots. Correspondingly, the similarity between bushland and the ranch region could be due to the fact that bushes remain at the ranch even after grazing and may transpire similarly to bushland. The widest distributions of ET in loam, sandy-clay-loam, cropland, and bushland areas were caused by the scattering of these soil texture and land use classes across the river basin, thus they exhibited both low and high ET in the upstream and downstream areas respectively (Figure 2 and Figure 4). In contrast, the narrowest distributions of ET in clay, clay-loam, irrigation and ranch areas were caused by the localized nature of these soil texture and land use classes, thus they exhibited localized high and low ET in the upstream and downstream areas respectively.
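The notion of a wide or narrow distribution per class can be made concrete with a simple spread measure. The sketch below summarises component scores by class using the median and the interquartile range; the choice of the interquartile range, the function name `class_spread` and the synthetic classes are assumptions for illustration only.

```python
import numpy as np

def class_spread(scores, classes):
    """Median and interquartile range of component scores for each class."""
    scores = np.asarray(scores, dtype=float)
    classes = np.asarray(classes)
    summary = {}
    for c in np.unique(classes):
        q25, q50, q75 = np.percentile(scores[classes == c], [25, 50, 75])
        summary[str(c)] = {"median": q50, "iqr": q75 - q25}
    return summary

# Synthetic pixels: a scattered class and a localized class (not study data).
rng = np.random.default_rng(3)
scores = np.concatenate([rng.normal(0.0, 2.0, 500), rng.normal(1.0, 0.2, 500)])
classes = np.array(["loam"] * 500 + ["clay"] * 500)
for name, stats in class_spread(scores, classes).items():
    print(name, round(stats["median"], 2), round(stats["iqr"], 2))
```

A class that occurs in both the upstream and downstream areas samples both ends of the score range and therefore shows a large interquartile range, whereas a class confined to one area shows a small one.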
4.3. Second Principal Component: Dry Season Effects
The clear seasonal patterns shown by the loadings on PC2 in September–October and March represented the end of the dry period and the middle of the major rainy season (wet period) respectively. In contrast, the significant trend of the loadings on PC2 seemed primarily to be due to single, extreme dry events in the years 2002, 2006, and 2007 which led to relatively low ET across the river basin. Thus, we assume that the trend did not really reflect a continuous increase over time.
The spatial pattern of the scores of PC2 exhibited enhanced ET at the end of the dry period in the western part, the north-western part and further downstream. In these regions the depths to static water levels in the wells were less than or equal to 10 m below the ground [10,49]. The most plausible reason is that plants have access to shallow groundwater that is related to geological structures in these regions. In the western part of the river basin, granites and migmatites are abundant, and the graben and horst formed by geological faults striking NNE-SSW are found [10]. In this area, groundwater is mainly restricted to the weathered and fractured part. The faults increase the water-holding capacity of granitic aquifers. In the north-western part, the graben and horst formed by NNE-SSW-striking faults act as the main groundwater reservoir [10]. Further downstream, composite metamorphic crust domains and granulites, gneisses and migmatites are found, but geological faulting is not well developed. Correspondingly, the aquifers are not as productive as in the western part of the river basin [10].
Apart from this, the clearly higher-than-average ET detected at the end of the dry season in natural forest areas at high elevations cannot be related to shallow groundwater. These areas clearly stood out compared to all other vegetation types, especially with respect to exceptionally sharp boundaries to adjacent areas with different vegetation cover. It is remarkable that this holds for PC1, PC3, and PC5 as well. In the case of PC2, the sustained ET even at the end of the dry season is attributed to the canopies of evergreen forest intercepting nightly fog [72]. Trees in these cloud forests do not shed leaves even in the dry period.
Usually, during the dry season most of the forest plants shed their leaves due to the decrease of soil moisture. However, in the upstream area, vegetation in the areas with water table close to the surface exhibited relatively high ET as compared to the downstream area. That explains the significant positive correlations of scores of PC2 against both elevations and slopes.
Significantly differing scores for the different soil texture classes, except for loam and sandy-clay-loam, give some evidence that different water retention properties play a role, especially during the dry season. The similar ET of loam and sandy-clay-loam is due to shallow groundwater, which is largely found in the upstream area where the two soil texture classes occur. No significant differences in ET were found between irrigation areas and bushland. Both land cover classes exhibited low ET at the end of the dry period. This similarity might be coincidental: the bushland ET was presumably restricted by low water availability, whereas in the irrigation areas this is the crop harvesting period (end of June–end of September). The widest distributions of ET in loam, sandy-clay-loam and natural forest areas were caused by the north-south spread of loam and natural forest as well as the presence of sandy-clay-loam in the upstream and downstream areas, thus they exhibited both high and low ET respectively (Figure 2 and Figure 5). In contrast, the narrowest distributions of ET in clay and ranch areas were caused by their localized presence in the downstream and upstream areas respectively.
4.4. Third Principal Component: General Spatial Patterns of Rainfall
The highest loadings on PC3 in the January–February, May–June, and October–November periods coincided with the peaks of the ONDJFMA, MAM, and OND rainfall seasons. Thus, the loadings on PC3 primarily reflected ET effects due to rainfall distribution in the river basin. The spatial pattern of PC3 shows that the rainy period of January–February caused higher ET in the upstream area due to the unimodal (ONDJFMA) rainfall, whereas in the downstream areas there was no rainfall in that period. As a result, the scores of PC3 are positively correlated with both elevations and slopes in the January–February period. The observed significant trend of the loadings on PC3 was caused by relatively heavier January–February rainfall events in the years 2010, 2011, and 2013 in the upstream area. Thus, we assume that the trend did not really reflect a continuous increase. In the ranch region, low ET might be due to late sprouting of vegetation. The natural forests in the downstream area exhibited relatively high ET in the January–February period, when there were no heavy rainfall events, because the forests in the EAMs are cloud forests [72].
The scores of PC3 for different soil texture classes differed significantly. In addition to the shallow groundwater (see Section 4.3) and rainfall effects during the peak of ONDJFMA, the larger water-holding capacity of loam soil in the upstream area made more water available than the sandy soil in the downstream area. In addition, different vegetation cover played different roles. The similarity of scores of PC3 between bushland and woodland in the January–February period may be caused by sprouting of the deciduous trees in the woodland, which shed their leaves in the June–September/early October period. Thus, during the sprouting period the rate of transpiration from bushland and woodland may not be very different.
4.5. Fourth Principal Component: Single Major Rainstorms
The spatial pattern of PC4 differs remarkably from those of the other four PCs because it is not clearly related to any topographical structure or vegetation pattern. The transition zone between positive and negative scores west of the EAMs, stretching in a nearly perfect north-south direction (Figure 7b), provides strong evidence for atmospheric forcing. In addition, the lowest values, which were encountered west of the EAMs but not in the areas further upstream, reflected a lee effect with respect to moisture transport from the east and the west. The time series of loadings on this component exhibited less clear seasonal patterns than those of PC3. High loadings on PC4 were restricted to single, relatively short periods, especially at the end of January 2007 and in March 2013 (Figure 7a). Thus, this component seems to reflect deviations from the usual spatial pattern of rainfall and subsequent ET during single periods. Since the focus of the study is on the prevailing effects or processes and not on single extreme events, we discuss only the event at the end of January 2007, and only briefly, as an example of a major extreme event.
At the end of January 2007 the relatively high ET in the upstream and downstream areas is ascribed to the preceding heavy rainfall in OND 2006. The OND 2006 rainfall caused flooding in most of eastern Tanzania. It was related to exceptionally high sea surface temperature in the western Indian Ocean and Indo-Pacific El Niño effects [73]. That was the largest extreme rainfall event since OND 1997. The relatively lower ET in the midstream area could be caused by the lee effect of the EAMs with respect to strong fluxes of humid easterlies and weak fluxes of low-level humid westerlies from the southern Congo basin [71,74], which differ in their moisture transport velocities. The negative correlation of the scores of PC4 with elevations and slopes might be caused by relatively low ET in the midstream region, where elevations and slopes are the highest in the EAMs.
4.6. Fifth Principal Component: Long-Term Trend of Land Use
Although PC5 explained less variance than any of the first four PCs, the loadings on PC5 exhibited the strongest long-term trend, including a distinctive, almost stepwise increase in 2010. Because neither rainfall nor temperature exhibited corresponding trends, land use change was considered the most probable cause. In fact, components of lower rank are often associated with long-term changes [29,53]. Combining the trend of the loadings with the spatial pattern of the component scores, the results pointed to deteriorating land cover in the west and north of the EAMs. Visible satellite imagery also showed diminishing forest cover in the north of the river basin and along the EAMs as compared to other areas. Positive scores further downstream, in the lowlands east of the EAMs and in the south of the river basin, can be interpreted as reflecting areas that did not exhibit any trend at all. This part might already have been intensively used before the start of the MODIS ET dataset.
Schaafsma et al. [41] found that commercial charcoal production is still practised in the lower woodland areas of the EAMs (with some production centres within the boundaries of the EAM blocks), thus causing degradation of woodland in the EAMs. FBD [75] also reported that the EAMs exhibit rapid land cover change, having lost 11% of their primary forests and 41% of their woodland vegetation since 1975. This conversion is driven by clearance for farmland, as well as by increasing demand for timber and fuel wood [40]. This supports the argument that land cover change was caused by the deforestation of woodland vegetation.
Despite being in the same area, some natural forests did not exhibit deteriorating land cover because they lie within conservation areas [42,76,77] and were thus not affected by anthropogenic activities. Currently, the vast majority of natural forests in the EAMs are under different forms of legal protection, with most falling within the category of “National Forest Reserves” managed for protection of water resources, soil erosion prevention, and biodiversity conservation [78,79,80]. However, Tabor et al. [81] argued that forest loss was still happening in the protected areas, though to a far lower extent than outside protected areas.
Correlations of the scores of PC5 with elevations and slopes were significant but rather weak, and thus will not be discussed in more detail. The similarity of scores of PC5 between cropland and grassland may be caused by the growth of grass in areas that were initially cultivated but later abandoned by local farmers for various reasons, including loss of soil fertility after some time of use.
4.7. Implications for Water Resources Management and Distributed Hydrological Models
A lot of information is required for setting up and calibrating distributed hydrological models. Although some of the calibration parameters are closely related to land surface conditions, the spatial variability of these parameters in a large river basin such as the Wami River basin is very difficult to assess. We hypothesized that much can be learned about hydrological behaviour and hydrologically relevant structures by analysing time series of ET data determined by remote sensing. Findings gained using this approach are summarized in the following paragraphs.
About 63% of the spatio-temporal variance of the ET data was already captured by the mean spatial pattern depicted by PC1. It significantly depended on elevations and slopes. The widest distributions of ET shown by PC1 in loam, sandy-clay-loam, cropland, and bushland areas suggest that these soil texture and land use classes play a crucial role in determining the mean behaviour of ET in the river basin. In contrast, the narrowest distributions of ET in clay, clay-loam, irrigation, and ranch areas indicate that these classes have little effect on the mean behaviour of ET in the river basin. However, PC1 also indicated that the effects of different soil textures differ significantly; thus all soil texture classes should be retained in the hydrological model of the river basin. Likewise, most of the land use classes differ significantly, except for woodland and irrigation areas, and bushland and ranch areas. Therefore, from the perspective of modelling the mean behaviour of ET, the land use classes of irrigation and ranch areas can be replaced with woodland and bushland respectively, without any loss of information.
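The proposed reduction of land use classes amounts to a reclassification of the land use raster before model setup. The sketch below shows one way to do this with a lookup table. The integer class codes, the dictionary names and the function `merge_classes` are hypothetical and do not come from the study's data.

```python
import numpy as np

# Hypothetical integer codes for a land use raster (assumption).
LAND_USE = {"woodland": 1, "bushland": 2, "cropland": 3,
            "irrigation": 4, "ranch": 5}
# Classes that did not differ significantly in their PC1 scores.
MERGE = {"irrigation": "woodland", "ranch": "bushland"}

def merge_classes(raster, codes=LAND_USE, merge=MERGE):
    """Reassign merged classes to their target class via a lookup table."""
    lut = np.arange(max(codes.values()) + 1)   # identity mapping by default
    for source, target in merge.items():
        lut[codes[source]] = codes[target]
    return lut[np.asarray(raster)]

raster = np.array([[1, 4, 5],
                   [2, 3, 4]])
merged = merge_classes(raster)
print(merged)
print("classes before:", np.unique(raster).size,
      "after:", np.unique(merged).size)
```

Note that this merge is justified here only for the mean behaviour of ET; whether it also holds for dry-season behaviour would have to be checked against the other components.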
However, PC1 did not explicitly capture the effects of irrigation. This was not consistent with our expectations prior to the analysis. It cannot be explained by insufficient spatial resolution of the ET data. In fact, irrigation areas are known to extend over more than 15 km² in the eastern part of the river basin, thus they should have been discernible. Instead, it has to be concluded that the current irrigation density is relatively low and does not affect the hydrologic cycle in the river basin to the extent we presumed. As with PC1, the irrigation areas did not stand out in the spatial pattern depicted by PC2. This is presumably because harvest occurs in the dry season, so there is hardly any need for extended irrigation during this season. However, information provided by PC2 helped to better understand the reason for enhanced ET in other parts of the river basin. Sustained high ET in the western and north-western parts at the end of the dry season pointed to shallow groundwater that would be available for plant root uptake. Conducting sufficient spatially distributed measurements of the shallow groundwater or aquifer in large river basins is not economically feasible. Therefore, the shallow aquifer boundary condition from PC2 can be used to emulate spatially distributed critical depths of water in shallow aquifer routines in hydrological models. In many distributed hydrological models this information is used to control the movement of water from the aquifers to the root zones.
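One hedged way to turn the PC2 pattern into a model input is to flag pixels with unusually high scores as candidate zones of shallow groundwater. The sketch below assumes that positive PC2 scores correspond to enhanced ET at the end of the dry season and uses an arbitrary quantile threshold; both are assumptions, as is the function name. Such a mask would still have to be separated from the fog-fed natural forests and checked against well data before use.

```python
import numpy as np

def candidate_shallow_groundwater(pc2_scores, quantile=0.8):
    """Boolean mask of pixels whose PC2 score exceeds the given quantile.

    NaN pixels (e.g. outside the basin) are ignored when computing the
    threshold and are never flagged.
    """
    scores = np.asarray(pc2_scores, dtype=float)
    threshold = np.nanquantile(scores, quantile)
    return scores > threshold

# Synthetic 10 x 10 score map (not study data).
score_map = np.arange(100, dtype=float).reshape(10, 10)
mask = candidate_shallow_groundwater(score_map, quantile=0.8)
print(mask.sum(), "of", mask.size, "pixels flagged")
```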
In contrast, shallow groundwater cannot explain the outstandingly high ET of natural forests with clear-cut boundaries to adjacent land cover classes. This feature was dominant for PC1, PC2, PC3, and PC5. It sheds some light on the effects of land cover changes around the natural forest areas. In terms of ET, even intensively cultivated areas cannot match the high water availability of natural forests as long as the former are not irrigated. Apparently this is due to interception of nightly fog in the canopies of evergreen trees in the natural forest areas, which ensures high water availability even during extended dry periods [72]. Thus, any conversion or impairment of the natural forests would have major effects on the hydrologic cycle in the river basin. The most important difference between natural forests and other land cover classes seems to be their ability to maintain high physiological activity even under dry conditions, pointing to better adaptation to local climatic conditions. On the other hand, PC5 provided some evidence that the protection of the natural forests has been highly effective during recent years. However, close to the nature reserve regions, land use change seemed to have had a major impact, especially in the northern part of the river basin close to the EAMs. This land cover change, especially the step shift around 2010, should be considered both for water resources management and in spatially distributed hydrological models of the Wami River basin.
The loadings on PC3 clearly reflected the unimodal seasonal rainfall in the upstream area and the bimodal rainfall in the downstream area. The spatial pattern of ET depicted by PC3 shows that the effects of January–February rainfall during ONDJFMA are substantial in the upstream areas as compared to the downstream areas. This is the minimum spatial differentiation of rainfall that a distributed hydrological model of the Wami River basin should account for. The pattern of PC4 reflected the lee effect during strong easterly rainfall periods and the weak westerly rainfall effects from the southern Congo basin [71,74]; however, the pattern is more pronounced during extreme rainfall events.
It is remarkable that the remote sensing ET data (i.e., the PCs) gave information about prevailing and peculiar patterns of rainfall as well. In the short term, ET was likely to decrease during single rainstorms due to high air humidity and low irradiation and temperature. However, given the limited temporal resolution of the MODIS ET data, this effect seemed to be negligible. In contrast, in the mid-term after heavy rainfall more water was available for plant uptake and thus likely boosted ET, especially in regions with previously limited water supply. This plant response might be delayed by the time needed for budding of perennial plants or for germination and growth of annual plants; it therefore depends on the vegetation type and reflects its spatial patterns. The findings for the principal components and their implications for water resources management and distributed hydrological modelling in the Wami River basin are summarized in Table 4.
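The delayed plant response described above could be examined with a lagged correlation between a rainfall series and an ET series. The sketch below is an illustration only and was not part of the study. It assumes that both series share the same time step (for example the compositing period of the ET product), and the function name `best_lag` and the synthetic series are hypothetical.

```python
import numpy as np

def best_lag(rain, et, max_lag=10):
    """Correlation between rainfall and ET for lags 0..max_lag time steps.

    A lag of k pairs rainfall at step t with ET at step t + k.
    Returns the lag with the highest correlation and all correlations.
    """
    rain = np.asarray(rain, dtype=float)
    et = np.asarray(et, dtype=float)
    corrs = []
    for lag in range(max_lag + 1):
        x = rain[: rain.size - lag]
        y = et[lag:]
        corrs.append(np.corrcoef(x, y)[0, 1])
    corrs = np.array(corrs)
    return int(np.argmax(corrs)), corrs

# Synthetic series: ET follows rainfall three time steps later (not study data).
rng = np.random.default_rng(4)
rain = rng.gamma(2.0, 5.0, size=200)
et = np.concatenate([np.zeros(3), rain[:-3]]) + rng.normal(scale=0.1, size=200)
lag, corrs = best_lag(rain, et, max_lag=8)
print("best lag:", lag)
```

Computing such a lag separately for each vegetation type would show whether the response time differs between, for instance, perennial and annual vegetation, as the paragraph suggests.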
5. Conclusions
Planning and managing water resources, and the modelling that supports them, require a sound understanding of the hydrologic cycle of the river basin. However, many regions of the world lack the extensive monitoring networks of river gauges and groundwater observation wells on which hydrologists base their advice and their models. On the other hand, high-quality remote sensing products are now globally available. We recommend using these products for water resources management and planning, and for hydrological system analysis prior to setting up and testing models. In this study, inferences regarding hydrological behaviour were derived from a PCA of MODIS ET data.
We showed how a time series of MODIS ET data could be used to assess hydrological behaviour in the data-scarce region of the Wami River basin in Sub-Saharan Africa. A PCA was applied to extract prevailing and peculiar spatial patterns for different seasons and possible long-term trends. The main results elucidated the mean behaviour of ET, illustrated the distinctive behaviour of natural forests, helped to identify regions of shallow groundwater, and clearly pointed to long-term degradation of vegetation cover in parts of the river basin. In addition, the results allowed a reduction of the number of land use classes to be considered in the distributed hydrological model of the river basin. We conclude that these findings will substantially facilitate setting up, calibrating and validating the subsequent distributed hydrological model of the Wami River basin.