Evaluation of Spatio-Temporal Patterns of Remotely Sensed Evapotranspiration to Infer Information about Hydrological Behaviour in a Data-Scarce Region

Information about the hydrological behaviour of a river basin prior to setting up, calibrating and validating a distributed hydrological model requires extensive datasets that are rarely available for many parts of the world due to insufficient monitoring networks. In this study, the focus was on prevailing spatio-temporal patterns of remotely sensed evapotranspiration (ET) that enabled conclusions to be drawn about the hydrological behaviour and spatial peculiarities of a river basin at rather high spatial resolution. The prevailing spatio-temporal patterns of ET were identified using a principal component analysis of a time series of 644 images of MODIS ET covering the Wami River basin (Tanzania) between the years 2000 and 2013. The time series of the loadings on the principal components were analysed for seasonality and significant long-term trends. The spatial patterns of principal component scores were tested for significant correlation with elevations and slopes, and for differences between different soil texture and land use classes. The results suggested that the temporal and spatial patterns of ET were related to those of preceding rainfalls. At the end of the dry season, high ET was maintained only in areas of shallow groundwater and in cloud forest nature reserves. A region of clear long-term reduction of ET was related to massive land use change. The results also confirmed that most soil texture and land use classes differed significantly. Moreover, ET was exceptionally high in natural forests and loam soil, and very low in bushland and sandy-loam soil. This approach has shown the great potential of publicly available remote sensing data for providing a sound basis for water resources management as well as for distributed hydrological models in data-scarce river basins at lower latitudes.


Introduction
A sound understanding of hydrologic cycles of river basins is a crucial part of planning and managing water resources. Reliable predictions from distributed hydrological models require extensive datasets for setup, calibration and validation. Usually, time series of discharge and groundwater head are used to assess hydrological behaviour. However, these time series are often too short, corrupt or not available at all in data-scarce regions. In addition, information provided by a hydrograph is integrated in space. In contrast, evapotranspiration (ET) data based on multi-temporal remote of remote sensing data rather than aiming at calibration or validation of remote sensing data in order to come up with absolute values. Once this information has been compiled, it can extend the basis for sound water resources research as well as constrain respective spatially distributed hydrological models right from the beginning.
The layout of the rest of the paper is as follows: Section 2 introduces the study area, data used, applied principal component analysis and statistical tests. Section 3 presents the results. Section 4 discusses the main findings. Finally, Section 5 offers an outlook on future studies.

Study Area
The study region is the Wami River basin, located between 5°00′-7°27′ S and 36°00′-39°00′ E in east-central Tanzania (Figure 1). It has an area of approximately 41,170 km² and its elevation ranges from 0 to 2360 m a.s.l. (Figure 1). The river basin is separated into two major parts by the Eastern Arc Mountains (EAMs), which comprise the Rubeho, Ukaguru, Nguru, and Nguu mountain ranges (Figure 1). The geology comprises diverse lithologies derived from cratonic granitoids of Precambrian age in the west, highly metamorphosed rocks of the orogenic belts in the central part, and Neogene deposits in the eastern part [10]. The river basin has been affected by faults, causing terrace and cascade flows at the western boundary of the coastal plain [10]. The topographic slopes are much steeper in the EAMs than in other parts of the river basin, and the slope angles range between 0° and 27° (Figure 2a). The predominant soil textures are loam and sandy-clay-loam, which constitute 38% and 41% of the river basin area, respectively (Figure 2b) [36]. The predominant land use classes are bushland (savannah), woodland (deciduous trees), grassland, ranch (savannah grassland), cropland (small-scale farming), irrigation areas, and natural forests (evergreen trees). Bushland, woodland and grassland cover 23%, 44%, and 20% of the basin area, respectively [37] (Figure 2c). Ranch, cropland and irrigation areas cover about 10%. Natural forests, which are predominantly located along the EAMs, cover about 3%. The area of the EAMs is a globally important eco-region [38,39] and one of the world's hotspots of biological diversity [40,41]. Madoffe et al. [42] reported that in the year 1900 there was three times the amount of natural forest cover compared to the 2000s. In order to reduce further losses of biodiversity, logging was banned in the EAMs in the mid-1980s and 1990s and forest boundaries were restored in most reserves [42]. Other activities such as agriculture development, fuel wood collection, and charcoal burning were also prohibited in the protected areas. The river basin comprises the Saadani National Park, which is also very important for the downstream ecosystem.
The average daily air temperature in the river basin is between 21 and 27 °C (2000-2012), and the average rainfall is between 585 and 1175 mm per annum (2000-2012). The river basin has two major rainfall zones: a unimodal rainfall zone with one heavy rainfall season in the upstream area and a bimodal rainfall zone with two rainfall seasons in the downstream area. The unimodal rainfall occurs from late October to April (ONDJFMA). The bimodal rainfall comprises the light rainfall season between late October and December (OND) and the heavy rainfall season from March to May (MAM). The dry period starts in June and ends in September or early October. This is the period with no or little rainfall in the river basin from upstream to downstream. During this period the deciduous trees shed their leaves, most parts of the river flows are confined within the banks and most flood plains dry up. The average MODIS ET is between 353 and 1637 mm/year (2000-2013), with higher ET in the downstream area than in the upstream area (Figure 3). To some extent the hydrologic cycle of the Wami River basin is affected by domestic water supply, irrigation, recently introduced rainwater harvesting agriculture, and increasing demand for charcoal, fuel wood, and timber [43,44]. The irrigation activities (e.g., sugarcane and rice plantations) accounted for an average of 96% of total abstracted water in the year 2010 [45,46].

Land Surface and Meteorological Data
The land surface data for correlation and difference tests included elevation, slopes, land use, and soil texture data from various databases. A digital elevation model of 90 m resolution was downloaded from the Shuttle Radar Topography Mission (SRTM) database [47] and was used to classify topography and to compute slopes. Land use classes were derived from land use data available at a scale of 1:250,000 from the Africover database [37]. The FAO [37] produced land use data from digitally enhanced high-resolution LANDSAT TM images acquired mainly in the year 1997. The soil texture classes of the year 2003 were derived from soil texture data available at a scale of 1:2,000,000 from the FAO-ISRIC [36] database.
The daily rainfall and air temperature data for climate variability tests were obtained from the Tanzania Meteorological Agency (TMA). To supplement the measured meteorological data, the reanalysed precipitation and air temperature data from the Inter-Sectoral Impact Model Intercomparison Project (ISI-MIP, [48]) database were used.
Data for verifying the inferred hypotheses were depths to static water levels of wells and visible satellite imagery. The depths to static water levels of groundwater wells in the Wami River basin were obtained from the Wami/Ruvu Basin Water Office [49]. The visible surface reflectance images (MOD09) of 10 June 2001 and 10 June 2012, both at a resolution of 500 m, were downloaded from the United States Geological Survey (USGS) database (http://earthexplorer.usgs.gov/, accessed on 20 September 2016).

MODIS ET Data and Pre-Processing
The ET data used were provided by MODIS. MODIS is an extensive program using the Terra and Aqua satellite data to provide a comprehensive series of global observations of the earth's land, oceans, and atmosphere in the visible and infrared regions of the electromagnetic spectrum [50]. The Terra earth observation system was launched in the year 1999 and the Aqua earth observation system in the year 2002 [50]. The Terra satellite goes from north to south across the equator and Aqua passes south to north over the equator [3].
The MODIS ET was calculated based on the Penman-Monteith equation [51]. The calculation was done on vegetated regions, but it did not include urban areas, permanent wetlands, and water bodies [51]; thus, these areas were also not considered in this study. The input data for the MODIS ET calculation were other MODIS products and meteorological reanalysis data. The MODIS products used were the global 1 km Collection 4 MODIS land cover type 2 (MOD12Q1), the Collection 4 0.05-degree Climate Modeling Grid MODIS albedo (MOD43C1), and the global 1 km MODIS Collection 5 leaf area index and fraction of photosynthetic absorbed radiation (MOD15A2) [51]. The global daily meteorological reanalysis data used were air temperature, air pressure, humidity, and solar radiation at a resolution of 1.00° by 1.25° from NASA's Global Modeling and Assimilation Office [51]. Since the MODIS ET product is a composite index of a number of remote sensing products of various spatio-temporal resolutions and some global meteorological data, it was only used as a convenient index that has a hydrological interpretation. Thus, the focus of this study was not on the accuracy of the MODIS ET but rather on its spatio-temporal patterns.
For this study the most recent version (version 5: V005) of 8-day ET data with a spatial resolution of 1 km (MOD16A2) was used. The dataset was downloaded from the Numerical Terradynamic Simulation Group database of the research laboratory at the University of Montana in Missoula (http://www.ntsg.umt.edu/project/mod16, accessed on 17 October 2014) [51]. The downloaded MOD16A2 dataset contained 644 images between 1 January 2000 and 27 December 2013 (see Video S1 in the Supplementary Materials).
The 644 images of MOD16A2 with their corresponding quality control datasets (MOD16A2 QC) were extracted from HDF files and converted into the GeoTIFF format. Then the images were clipped to the extent of the Wami River basin. The 644 MOD16A2 QC images were analysed for cloud cover. The average cloud cover per MOD16A2 image was 19%. Since clouds rarely recurred in the same areas, distortion due to clouds was negligibly small. Therefore, the computation of cloud masks was omitted, as heavy fragmentation of the time series would occur if the masks were applied for even small clouds in every affected image and cumulatively applied to the entire time series [25]. In addition, MOD16A2 uses daily global reanalysis weather data as part of its input, which potentially mitigates the effect of missing data due to cloudiness [52]. For the purpose of consistency in the following sections, the MOD16A2 is referred to as the MODIS ET.

Principal Component Analysis
The primary objective of applying principal component analysis (PCA) to the MODIS ET dataset was to extract the most typical spatial patterns from the whole dataset of 644 images. The typical spatial patterns could include long-term prevailing patterns as well as peculiar patterns that were restricted to a single season or became clearly stronger or weaker in the long-term. Patterns of long-term changes usually point to possible effects of climate or land use change [29,53].
PCA, also known as the Karhunen-Loeve transformation or empirical orthogonal function [11,19], is a linear method that can be used to reduce the number of variables down to a few principal components (PCs) which explain most of the variance of the dataset [54-57]. The PCA approach basically decomposes a correlation matrix into eigenvectors and eigenvalues. The resulting eigenvectors, i.e., the principal components, are uncorrelated and are ordered by the fraction of variance they explain in descending order [58]. When PCA is applied to a set of time series, the first component usually describes the mean behaviour of the entire dataset and the following components reflect typical deviations from that mean behaviour [59].
Prior to the application of PCA, the 644 images of MODIS ET data covering the Wami River basin were flattened and arranged in matrix X (Equation (1)), where x_ij represents the value of pixel i in image j. The variable n is the number of pixels in an image; thus, each column in matrix X contains all pixels of a single image (48,132 pixels), whereas m is the number of images in the dataset (644 images).
Each column in matrix X was normalized to a mean of zero and unit standard deviation using Equation (2) to obtain matrix Z (Equation (3)), where x̄_j and σ_j are the column mean and column standard deviation of matrix X, respectively. Normalization of the matrix prevented images with extreme conditions from dominating the PCA process.
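The flattening and column-wise normalization described above can be sketched as follows. This is a minimal illustration on a hypothetical toy stack of three 2 x 2 images rather than the 644 MODIS ET images; the function name standardize_columns and the use of the population standard deviation (ddof=0) are assumptions, since the text does not state which form was used.

```python
import numpy as np

def standardize_columns(X):
    """Scale each column (one image) to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)        # column means, one per image
    std = X.std(axis=0, ddof=0)  # column standard deviations (population form, an assumption)
    return (X - mean) / std

# Hypothetical toy stack: 3 images of 2 x 2 pixels (not real MODIS ET values).
images = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 2.0], [6.0, 6.0]],
    [[0.0, 5.0], [5.0, 10.0]],
])

# Flatten each image into one column: rows are pixels (n), columns are images (m).
X = images.reshape(images.shape[0], -1).T
Z = standardize_columns(X)
```

Each column of Z then has zero mean and unit standard deviation, so an image with unusually high or low ET carries the same weight in the subsequent steps as any other image.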
After that, the correlation matrix S (Equation (4)) containing the Pearson correlation coefficients between each column vector and the others in matrix Z was calculated.
The Python routine numpy.linalg.svd() [60-62] was then used to perform the eigenvalue decomposition of the correlation matrix S, which resulted in eigenvectors U and eigenvalues Λ, where T stands for transpose (Equation (5)). The eigenvalues are proportional to the fraction of variance explained by the respective eigenvectors [58].
The complete set of typical patterns was determined by multiplying the standardised dataset Z with the eigenvectors U to obtain the matrix of the principal component scores P (Equation (6)).
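The three steps above (correlation matrix, decomposition with numpy.linalg.svd, and projection onto the eigenvectors) can be sketched as follows. The function name pca_of_images and the random toy data are hypothetical, and dividing by n assumes that Z was standardized with the population standard deviation; the sketch is not the code used in the study.

```python
import numpy as np

def pca_of_images(Z):
    """Decompose the correlation matrix of standardized image columns."""
    n = Z.shape[0]
    S = Z.T @ Z / n                      # m x m correlation matrix between images
    # S is symmetric and positive semi-definite, so its singular values
    # equal its eigenvalues and are returned in descending order.
    U, eigvals, _ = np.linalg.svd(S)
    P = Z @ U                            # n x m matrix of principal component scores
    explained = eigvals / eigvals.sum()  # fraction of variance per component
    return S, U, eigvals, P, explained

# Hypothetical toy data: 200 "pixels" in 5 "images" sharing a common signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) + 2.0 * rng.normal(size=(200, 1))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

S, U, eigvals, P, explained = pca_of_images(Z)
```

In this sketch the columns of P are mutually uncorrelated and their variances equal the eigenvalues, which is why the ratio of each eigenvalue to their sum can be read as a fraction of explained variance.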
Each column vector of matrix P was then transformed to a 2-D pattern, inversely to the procedure described in Equation (1). Thus, any single principal component reflects only a fraction of the spatial pattern of a given MODIS ET image that has not been represented by any other principal component. These spatial patterns were then compared to those of elevation and slope, and also used to check for significant differences between different soil texture and land use classes in order to determine the land surface drivers of the hydrological behaviour in the river basin.
The extent to which a spatial pattern depicted by a single principal component plays a role for a single image of the MODIS ET dataset was quantified by the loadings [14]. Loadings L were calculated as the Pearson correlation coefficients of the pairwise comparisons of the scores of a single principal component from matrix P with standardized images from matrix Z (Equation (7)). Therefore, the spatial pattern of scores of a component was most pronounced at the highest absolute values of the loadings, i.e., the extremes of the loadings denoted periods where the respective spatial pattern was especially important.
Because the 644 images of the MODIS ET constituted a time series, their respective loadings on principal components also constituted time series. Moreover, any single MODIS ET image can be reconstructed by adding the scores of the PCs multiplied by their respective loadings.
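The loadings and the reconstruction of an image from scores and loadings can be illustrated with a small sketch. The toy data and the function name loadings are hypothetical. One detail is an assumption made explicit here because the text does not state it: for the reconstruction to return the standardized images exactly, the scores are first scaled to unit variance by dividing by the square root of the respective eigenvalue.

```python
import numpy as np

def loadings(P, Z):
    """Pearson correlation between each component's scores and each standardized image."""
    m, k = Z.shape[1], P.shape[1]
    L = np.empty((m, k))
    for j in range(m):          # image index
        for c in range(k):      # component index
            L[j, c] = np.corrcoef(P[:, c], Z[:, j])[0, 1]
    return L

# Hypothetical toy data: 300 "pixels" in 4 "images" sharing a common signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4)) + rng.normal(size=(300, 1))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
S = Z.T @ Z / Z.shape[0]
U, eigvals, _ = np.linalg.svd(S)
P = Z @ U

L = loadings(P, Z)                # row j holds the loadings of image j on all components
P_unit = P / np.sqrt(eigvals)     # scores scaled to unit variance (assumption, see lead-in)
Z_rebuilt = P_unit @ L.T          # sum over components of scaled scores times loadings
```

Under these assumptions the loadings coincide with the eigenvector entries multiplied by the square root of the eigenvalues, so for large datasets they could be obtained directly from U and the eigenvalues instead of the double loop shown here.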
The time series of loadings on selected PCs were analysed for clear seasonal patterns and long-term trends. For a more intuitive interpretation of the components, the scores and loadings on the first, third, and fifth PCs were reversed (multiplied by −1). The sign reversal led to the positive and negative scores and loadings representing high and low ET, respectively. Subsequent analyses used these reversed PCs. To check for major land use changes, bands 1 (648 nm), 4 (555 nm), and 3 (470 nm) of the surface reflectance data were mapped as red, green, and blue (RGB) colours, respectively, to obtain the visible appearance of the Wami River basin in the years 2001 and 2012.

Trend Tests
The loadings on each PC constituted a time series of 644 points spanning between 1 January 2000 and 27 December 2013 at 8-day intervals. In order to check for temporal patterns of PCs, the loadings were tested for long-term trends using the pre-whitened Mann-Kendall test and Sen's slope [63-66]. The trend-free pre-whitening procedure was used to correct the time series of loadings for autocorrelation prior to the Mann-Kendall test [66]. The Mann-Kendall test was then used to look for a significant monotonic trend in a given dataset of loadings, and Sen's slope was used to determine the magnitude of that trend, assuming that it was linear. For a p value less than or equal to 0.05 we rejected the null hypothesis of no long-term trend in the time series of loadings.
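The core of the trend test can be sketched in a few lines. This is a simplified illustration only: it implements the basic Mann-Kendall statistic with a normal approximation and Sen's slope, and it deliberately omits the trend-free pre-whitening step and any correction for ties that the study applied. The function name mann_kendall and the toy series are hypothetical.

```python
import math
from statistics import median

def mann_kendall(series):
    """Return (S, z, two-sided p value, Sen's slope) for an evenly spaced series."""
    n = len(series)
    s = 0
    slopes = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)      # sign of each pairwise difference
            slopes.append(diff / (j - i))     # pairwise slope per time step
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S without tie correction
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided p value from the normal distribution
    return s, z, p, median(slopes)

# Hypothetical toy series: a steady increase of 2 units per time step.
rising = [2.0 * t + 1.0 for t in range(10)]
s_stat, z_stat, p_value, sen_slope = mann_kendall(rising)

# Hypothetical toy series without any trend.
flat = [1.0] * 10
s_flat, z_flat, p_flat, slope_flat = mann_kendall(flat)
```

For the rising series the null hypothesis of no trend would be rejected at the 5% level and Sen's slope recovers the step of 2 units, whereas the flat series yields a p value of 1.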
In order to investigate the effects of atmospheric forcing on the patterns of PCs, a meteorological effects test was also conducted. Prior to this test, the daily rainfall and air temperature data spanning between the years 2000 and 2012 were converted into 8-day periods to yield the same temporal resolution as the loadings. The new time series of rainfall and air temperature at the stations and grid points were then tested for long-term trends using the pre-whitened Mann-Kendall test and Sen's slope [63-66]. For a p value less than or equal to 0.05 we rejected the null hypothesis which stated that there was no long-term trend in the time series of rainfall or air temperature data.
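The conversion of daily records into 8-day periods can be sketched as follows. The sketch assumes that the periods are aligned with the MODIS 8-day composites, which restart on 1 January of each year and therefore give 46 periods per year with a shorter last period; this matches 644 images in 14 years, but the text does not describe the alignment. Whether values were summed or averaged is likewise not stated, so the reducer is left as a parameter. The function name eight_day_periods and the toy rainfall record are hypothetical.

```python
from datetime import date, timedelta

def eight_day_periods(daily, reducer=sum):
    """Aggregate {date: value} into {period start date: aggregated value}."""
    groups = {}
    for day, value in daily.items():
        doy = day.timetuple().tm_yday                 # day of year, starting at 1
        start_doy = ((doy - 1) // 8) * 8 + 1          # first day of the 8-day period
        start = date(day.year, 1, 1) + timedelta(days=start_doy - 1)
        groups.setdefault(start, []).append(value)
    return {start: reducer(values) for start, values in sorted(groups.items())}

# Hypothetical toy record: 1 mm of rain on every day of 2001.
daily_rain = {date(2001, 1, 1) + timedelta(days=d): 1.0 for d in range(365)}
rain_8day = eight_day_periods(daily_rain)
```

For air temperature, a mean would usually be more meaningful than a sum, e.g. eight_day_periods(daily_temperature, reducer=statistics.mean) after importing the statistics module.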

Statistical Dependence Test for Metric Data
The relationships between elevations or slopes and individual principal component scores were tested for significance using Kendall's rank correlation test [64]. This test is based on the ranks of data and uses the τ coefficient to measure the statistical dependence. Kendall's τ takes values between −1 and +1, where a positive τ value indicates a positive relationship between two variables and a negative τ value an opposing relationship; values of +1 and −1 denote perfect agreement and perfect disagreement of the rankings, respectively. For a p value less than or equal to 0.05 we rejected the null hypothesis which stated that Kendall's τ between principal component scores and elevations or slopes was equal to zero.
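How Kendall's τ measures rank agreement can be shown with a short sketch. It computes the tie-adjusted variant (τ-b) by counting concordant and discordant pairs; which variant the study used is not stated, and the significance test is omitted. The function name kendall_tau_b and the elevation and score values are hypothetical. For the roughly 48,000 pixels of the basin, a library routine such as scipy.stats.kendalltau would be far more efficient than this pairwise loop.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant and discordant pairs, adjusted for ties."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            if dx == 0:
                ties_x += 1            # pair tied in x
            if dy == 0:
                ties_y += 1            # pair tied in y
            if dx * dy > 0:
                concordant += 1        # both variables change in the same direction
            elif dx * dy < 0:
                discordant += 1        # variables change in opposite directions
    n0 = n * (n - 1) // 2              # total number of pairs
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))

# Hypothetical values: scores that decrease steadily with elevation.
elevation = [100, 400, 900, 1500, 2100]
scores = [1.2, 0.8, 0.3, -0.4, -1.1]
tau_opposing = kendall_tau_b(elevation, scores)

# Hypothetical values: mostly, but not perfectly, agreeing rankings.
tau_partial = kendall_tau_b([1, 2, 3, 4], [1, 3, 2, 4])
```

The first example yields τ = −1 because every pair of pixels is ordered oppositely in the two variables; the second yields a positive value below +1 because one of the six pairs is discordant.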

Statistical Difference Test for Nominal Data
The Kruskal-Wallis [67] and Wilcoxon [68] tests were used to check for significant differences of the scores of PCs for different soil texture and land use classes. Prior to the tests, all pixels of scores of a component in question were sorted and grouped in accordance with the soil texture or land use classes. Firstly, the Kruskal-Wallis test was used to detect differences in the scores between at least two of all soil texture or land use classes. When that was the case, the Wilcoxon post-hoc test was subsequently applied for pairwise comparison of scores of soil texture or land use classes. For both tests a p value less than or equal to 0.05 was used for rejection of the null hypothesis of no differences between different classes.
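The first step of this procedure, the Kruskal-Wallis test, can be sketched as follows. The sketch computes the H statistic from average ranks without a tie correction and leaves out the Wilcoxon post-hoc comparisons. The p value shown uses the closed form exp(−H/2), which is valid only for three groups (chi-square distribution with two degrees of freedom); for other numbers of classes a chi-square routine from a statistics library would be needed. All function names and score values are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..N, assigning the mean rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)

# Hypothetical component scores grouped by three soil texture classes.
loam = [1.1, 1.4, 0.9, 1.6, 1.2]
sandy_loam = [-0.8, -1.1, -0.5, -0.9, -1.3]
clay = [0.1, -0.2, 0.3, 0.0, 0.2]

h_stat = kruskal_wallis_h([loam, sandy_loam, clay])
p_value = math.exp(-h_stat / 2.0)   # valid for exactly three groups only
```

With these clearly separated toy groups the null hypothesis of no difference between classes would be rejected at the 5% level, which in the procedure described above would trigger the pairwise post-hoc comparisons.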

Trends in Meteorological Data
The meteorological trend tests between the years 2000 and 2012 showed that the p values for rainfall ranged between 0.31 and 0.97 (>0.05), and the p values for air temperature ranged between 0.08 and 0.56 (>0.05). Therefore, long-term trends of rainfall and air temperature were not significant at the 5% level. Thus, any trend found for an individual PC necessarily had to be ascribed to different causes. However, these findings did not necessarily rule out the possibility that individual extreme meteorological events could have caused major effects on some ET patterns.

Principal Components
This study did not aim at a full understanding of the spatio-temporal patterns of ET. Instead, it aimed at extracting only the most dominant and peculiar information from 644 images of the MODIS ET dataset. Therefore, only PCs that reflected meaningful effects were considered. The study also followed the assumption that each single PC might reflect more than one effect or process that affects the hydrologic cycle of the river basin, partly only during very rare and extreme conditions. Thus, only the first five out of 644 PCs were selected based on the interpretability of their spatio-temporal patterns with respect to physical effects or processes. These first five PCs explained 81.2% of the variance in the dataset (Table 1). Time periods on loadings were identified where the spatial pattern represented by the scores of an individual principal component played a major role (see Videos S2 to S6 in the Supplementary Materials). Correspondingly, a trend in the time series of loadings indicated whether the role of the respective spatial pattern increased or decreased in the long-term. Because data were normalized prior to the PCA, zero component scores represented average scores, positive scores denoted a higher than average markedness of the respective effect, and negative scores a lower than average markedness. However, what was considered an effect could equally be considered an inverse effect with a reversed sign (see Videos S3 and S4 in the Supplementary Materials). For example, a strong reduction in ET in the dry period compared to the mean pattern is necessarily inverse to a strong increase in ET in the wet period compared to the mean pattern.
The following sections present results of the five selected scores and loadings on PCs. The differences in the scores of PCs for different soil texture and land use classes are also presented. The long-term trends of loadings on these PCs are presented in Table 2. The results for relationships between scores and elevations or slopes are presented in Table 3.

First Principal Component
The first principal component (PC1) covered the largest fraction of explained variance (63.0%) (Table 1). The loadings on PC1 were usually close to 0.9 throughout the period, except for the first months of the year, and did not show any significant trend (Figure 4a, Table 2). The map of average ET throughout the whole period and the scores of PC1 were nearly identical (cf. Figures 3 and 4b), with a coefficient of spatial correlation equal to 99.9%. Consequently, higher-order PCs depict typical deviations from PC1 during respective specific conditions. The scores of PC1 divided the river basin into two parts which showed high and low ET in the downstream and upstream areas, respectively (Figure 4b). In addition, natural forests exhibited relatively high scores indicative of systematically high ET (cf. Figures 2c and 4b). The Wami River riparian zone showed relatively high ET in the midstream and downstream areas, but it showed low ET in the upstream area (Figure 4b).
The scores of PC1 showed a significant negative correlation with elevations, and a significant positive correlation with slopes (Table 3). The scores of PC1 also exhibited significant differences among all soil texture classes, with loam and sandy-loam soils exhibiting the highest and lowest medians, respectively (Figure 4c). In addition, clay and sandy-clay-loam soils exhibited the narrowest and widest distributions of scores, respectively. The scores between irrigation areas and woodland (p value = 0.15) as well as between bushland and ranch areas (p value = 0.14) did not differ significantly, but the remaining land use classes exhibited significant differences (Figure 4d). Natural forests exhibited the highest median, whereas bushland and ranch areas exhibited the lowest median of the scores of PC1. However, cropland and ranch areas exhibited the widest and narrowest distributions of scores, respectively.

Second Principal Component
The second principal component (PC2) covered 8.7% of explained variance (Table 1). The loadings on PC2 showed an increasing trend (Table 2), but there were also single extreme events in the years 2002, 2006, and 2007 (Figure 5a). The time series of loadings exhibited a clear seasonal pattern with the highest loadings (about 0.6) at the end of the annual dry season in September and October (Figure 5a). Negative peaks of loadings occurred in March but did not fall below −0.4. This illustrates that the spatial pattern of PC2 predominantly captured the September-October period as compared to March.
Low scores, i.e., lower than average ET in the September-October period, were found in most lowland areas in the downstream part of the river basin (cf. Figures 1 and 5b). Higher than average ET in this period was found in the loam soil region in the downstream part, and in the sandy-clay-loam soil region in the western and north-western parts of the river basin (cf. Figures 2b and 5b). In these regions the depth to static water levels was less than or equal to 10 m below the ground (Figure 5b). Assuming an inverse relationship, higher and lower than average ET in the September-October period might correspond to static water depths of less than or equal to 10 m and greater than or equal to 23 m below the ground, respectively.
Natural forest areas clearly stand out as sharply delineated regions of high scores, i.e., higher than average ET in the September-October period (cf. Figures 2c and 5b). Groundwater data were not available for these regions in the mountainous region of the EAMs. However, a shallow depth to groundwater is not very likely here. In the September-October period, the scores of PC2 significantly increased with both elevations and slopes, although the relationship was stronger for elevations than for slopes (Table 3). Unlike other soil texture classes, the scores of PC2 for sandy-clay-loam and loam soils did not differ significantly (Figure 5c). However, sandy-loam and clay soils exhibited the highest and lowest medians, respectively. The narrowest and widest distributions of the scores were exhibited by clay-loam and loam soils, respectively. Among the land use classes, only the scores of PC2 for bushland and irrigation areas did not differ significantly. Moreover, natural forests and woodland exhibited by far the highest and lowest medians, respectively (Figure 5d). Nevertheless, natural forest and irrigation areas exhibited the widest and narrowest distributions of scores of PC2, respectively.

Third Principal Component
The third principal component (PC3) explained 5.7% of the variance (Table 1). Figure 6a shows that the loadings on PC3 exhibited a clear seasonal pattern with highly positive peaks (about 0.6) in the January-February period and minor peaks in the August-September period. Moderate negative peaks (about −0.4) occurred in the May-June period and in the October-November period. The loadings showed a significant increase and individual strong events in the January-February periods in the years 2010, 2011, and 2013 (Figure 6a, Table 2).
High scores of PC3 indicated higher than average ET in the upstream area of the river basin during the January-February period (Figure 6b). In contrast, ET was relatively low in the downstream area and in the ranch region in the upstream area during this period (cf. Figures 2c and 6b). As for the previous PCs, natural forests exhibited relatively high ET during this period.
The scores of PC3 correlated positively and equally well with both elevations and slopes (Table 3). There were significant differences between all soil texture classes, with sandy-loam and clay-loam soils exhibiting the highest and lowest medians respectively (Figure 6c). However, sandy-loam and clay had the widest and narrowest distributions of scores respectively. The scores of PC3 for land use classes showed that bushland and woodland were similar but the remaining classes differed significantly. The highest and lowest medians were exhibited by natural forest and irrigation areas respectively (Figure 6d). Nevertheless, bushland and woodland exhibited the widest distributions, whereas irrigation areas exhibited the narrowest distribution.
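The medians and distribution widths reported per class can be illustrated with a minimal sketch. It assumes that the "width" of a distribution is read as the interquartile range of the scores within a class; the function name `class_summary`, the class labels and the numbers are hypothetical and are not taken from the study data.

```python
from collections import defaultdict
from statistics import quantiles


def class_summary(scores, classes):
    """Median and interquartile range (IQR) of PC scores per class.

    scores  : one PC score per pixel
    classes : the soil texture or land use label of each pixel
    Each class needs at least two pixels for the quartiles to be defined.
    """
    groups = defaultdict(list)
    for score, label in zip(scores, classes):
        groups[label].append(score)

    summary = {}
    for label, values in groups.items():
        # Quartiles with linear interpolation between data points.
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        summary[label] = {"median": q2, "iqr": q3 - q1, "n": len(values)}
    return summary


# Hypothetical scores for two land use classes (illustration only).
demo_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.1, 0.2, 0.3, 0.4]
demo_classes = ["natural forest"] * 5 + ["irrigation"] * 5
demo_summary = class_summary(demo_scores, demo_classes)
for label, stats in demo_summary.items():
    print(label, stats)
```

Whether two classes differ significantly is a separate question that requires a statistical test; the sketch only reproduces the descriptive part (medians and spread).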

Fourth Principal Component
The fourth principal component (PC4) explained 2.1% of the variance (Table 1). In contrast to the first three principal components, the loadings on PC4 exhibited only weak seasonal patterns with maximum values between October and February, but with pronounced short-term fluctuations (Figure 7a). The loadings exhibited a significant increase with extreme positive values at the end of January 2007 and extreme negative values in March 2013 (Figure 7a, Table 2).
The scores of PC4 showed a remarkable spatial pattern, clearly independent of the topography (cf. Figures 1, 2a and 7b). The scores divided the river basin into three nearly homogeneous parts with high ET in the upstream and downstream areas, and low ET in the midstream area, west of the mountainous regions.
The scores of PC4 correlated negatively with both elevations and slopes (Table 3). However, the relationship was stronger for elevations than for slopes. The scores of PC4 also showed that all soil texture classes exhibited significantly different distributions. Sandy-loam and loam soils exhibited the highest and lowest medians respectively (Figure 7c). However, the widest distribution of scores was exhibited by sandy-clay-loam, whereas the narrowest distribution was exhibited by clay-loam. For land use, all classes were significantly different except for bushland and cropland. The land use classes which exhibited the highest and lowest medians of the scores of PC4 were bushland and ranch areas respectively (Figure 7d). Irrigation areas exhibited the narrowest distribution of scores, whereas woodland and bushland exhibited the widest distributions of scores (Figure 7d).
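The sign and strength of such relationships between scores and terrain attributes can be sketched as follows. The correlation measure behind Table 3 is not restated in this section, so the sketch assumes a Spearman rank correlation as one plausible choice; the helper names (`average_ranks`, `pearson`, `spearman`) and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pixels: scores fall monotonically with elevation (m a.s.l.).
elevation = [300, 450, 800, 1200, 1900]
pc_scores = [1.4, 0.9, 0.2, -0.3, -1.1]
rho = spearman(elevation, pc_scores)
print(round(rho, 3))
```

A rank-based measure only captures monotonic association, so it would not distinguish a linear decrease with elevation from any other strictly decreasing relationship.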

Fifth Principal Component
The fifth principal component (PC5) explained 1.7% of the variance (Table 1). The time series of loadings on PC5 exhibited substantial short-term variation and lacked a clear seasonal pattern (Figure 8a). However, they exhibited the clearest long-term trend among the first five components with an average increase of 2% per annum (Table 2).
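A monotonic long-term trend in a series of loadings can be tested and quantified as sketched below. The trend test behind Table 2 is not restated in this section; as an assumption, the sketch uses the Mann-Kendall test together with Sen's slope, without corrections for ties, seasonality or autocorrelation, all of which would matter for real 8-day loadings. The function names and the example series are hypothetical.

```python
import math
from statistics import median


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test.

    Returns (S, z, p). Uses the normal approximation and applies no
    correction for ties or serial correlation.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the later-minus-earlier difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return s, z, p


def sen_slope(values):
    """Median of all pairwise slopes, in value units per time step."""
    n = len(values)
    return median(
        (values[j] - values[i]) / (j - i)
        for i in range(n - 1)
        for j in range(i + 1, n)
    )


# Hypothetical loadings: a weak upward drift plus a periodic fluctuation.
loadings = [0.01 * t + 0.1 * math.sin(t) for t in range(60)]
s_stat, z_stat, p_value = mann_kendall(loadings)
print(s_stat, round(z_stat, 2), p_value, sen_slope(loadings))
```

Sen's slope is expressed per time step; converting it into a change per annum requires the number of images per year.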
High scores were found in the downstream parts of the river basin and in the lowlands east and south of the EAMs (cf. Figures 1 and 8b). In contrast, consistently low scores occurred especially in the region north and west of the EAMs, except for regions of natural forests that exhibited relatively high scores irrespective of their location in the river basin (cf. Figures 2c and 8b).

Both elevations and slopes were significantly correlated with the scores of PC5 (Table 3). We also found significant differences among the scores of PC5 for all soil texture classes, with the highest and lowest medians exhibited by clay and loam soils respectively (Figure 8c). However, loam soil also had the widest distribution of scores, whereas clay had the narrowest. For land use, cropland and grassland showed similar patterns of scores, but the other classes were significantly different (Figure 8d). The highest and lowest medians of the scores of PC5 for land use classes were exhibited by natural forest and ranch areas respectively. The widest distributions of scores were exhibited by natural forest and woodland, whereas the narrowest distributions were found in irrigation and ranch areas.
Figure 9 shows the land covers in the Wami River basin in the years 2001 and 2012. The images show that in the period between the years 2001 and 2012, forests diminished in the north of the river basin and along the EAMs (cf. Figures 1, 2c, 8b and 9). However, natural forests appeared to be the same during that period (cf. Figures 8b and 9). Moreover, slight changes of land cover in the west of the river basin were also found.

General Approach
The objective of this study was to infer information about hydrological behaviour in the Wami River basin as a basis for sustainable water resources and land use management as well as constraining subsequent hydrological models in a data-scarce environment. Thus, contrary to numerous other applications of the PCA approach, here the basic idea was not to assign processes or specific variables of influence to the PCs, but to identify prevailing and peculiar spatio-temporal patterns that were restricted to certain specific boundary conditions.
Correspondingly, this study focused on the spatial patterns of ET and how these changed during specific boundary conditions rather than aiming at a quantitative assessment of water fluxes. The latter is known to be prone to substantial uncertainties and to require substantial efforts of ground truth measurements. In contrast, the approach followed in this study makes maximum use of the potential of remote sensing data, i.e., grasping the spatial patterns rather than giving absolute numbers. However, there are still some sources of uncertainty that need to be considered.

The extent of meaningful physical information extracted from a PCA of remotely sensed ET can vary with the resolution (i.e., <1 km) and the type of remotely sensed data used. Yang et al. [69] illustrated that different remote sensing ET products use different formulae and assumptions, which result in different estimates of ET. Uncertainties in remote sensing ET estimates also depend on geographic location, which affects climate (e.g., cloud cover and solar radiation intensity). Since this study was conducted in the equatorial region in a tropical climate (e.g., with high solar radiation intensity) and focused on relative spatial patterns rather than on absolute ET values, the uncertainties of the MODIS ET were considered minimal. In addition, systematic errors of remote sensing data have little impact on long-term anomalies [5].
Neither rainfall nor air temperature exhibited a significant trend during the study period. Consequently, meteorological effects between the years 2000 and 2012 are considered to be negligible. Thus, significant trends in the scores of individual PCs offer some evidence for systematic shifts in land cover. The following sections discuss the inferences on hydrological behaviour and possible land surface drivers.

First Principal Component: Mean Behaviour of ET
All 644 images were strongly positively correlated with PC1, which did not change in the long-term. In fact, PC1 often expresses the most cumulative information from the dataset [30]. The similarity between the map of average ET and the scores of PC1 suggests that PC1 mainly reflects the mean behaviour of ET in the river basin. Similarly, PCA of time series often yields a first component that depicts the mean behaviour [17,18,59,70]. This component explained 63.0% of the variance of spatial and temporal patterns of ET, thus it represents the most prevailing conditions in the river basin. The respective spatial pattern of ET likely reflects the spatial patterns of vegetation type and density in the river basin. For example, the downstream part of the river basin exhibited higher ET than the upstream part. In fact, Figure 9 attests to higher canopy density in the former than in the latter.
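The relationship between loadings, scores and the mean map can be illustrated with a minimal sketch on synthetic data. It assumes a PCA in which each image is a variable and each pixel an observation, computed on the correlation matrix of the images, so that a loading is the correlation between an image and a component. The function name `pca_of_image_series`, the array shapes and the synthetic patterns are hypothetical and do not reproduce the MODIS data or the exact processing chain of the study.

```python
import numpy as np


def pca_of_image_series(images):
    """PCA of an image time series on the correlation matrix of the images.

    images : array (n_images, n_pixels), one flattened map per row;
             every image must vary across pixels (non-zero std).
    Returns
      explained : fraction of variance per component, (n_images,)
      loadings  : correlation of each image with each component,
                  (n_images, n_components)
      scores    : component value at each pixel, (n_pixels, n_components)
    """
    x = np.asarray(images, dtype=float)
    # Standardise every image across pixels.
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    corr = z @ z.T / z.shape[1]

    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]           # largest eigenvalue first
    eigval = np.clip(eigval[order], 0.0, None)  # remove tiny negative round-off
    eigvec = eigvec[:, order]

    # Eigenvector signs are arbitrary: make the mean loading non-negative.
    signs = np.sign(eigvec.sum(axis=0))
    signs[signs == 0] = 1.0
    eigvec = eigvec * signs

    explained = eigval / eigval.sum()
    loadings = eigvec * np.sqrt(eigval)
    scores = z.T @ eigvec
    return explained, loadings, scores


# Synthetic series: a fixed mean pattern, a seasonally modulated second
# pattern, and noise (all hypothetical).
rng = np.random.default_rng(0)
n_images, n_pixels = 40, 500
mean_pattern = rng.normal(size=n_pixels)
seasonal_pattern = rng.normal(size=n_pixels)
season = 0.4 * np.sin(2 * np.pi * np.arange(n_images) / 10)
images = (mean_pattern[None, :]
          + season[:, None] * seasonal_pattern[None, :]
          + 0.2 * rng.normal(size=(n_images, n_pixels)))

explained, loadings, scores = pca_of_image_series(images)
mean_map = images.mean(axis=0)
r_mean = np.corrcoef(mean_map, scores[:, 0])[0, 1]
print(explained[:3], loadings[:, 0].min(), r_mean)
```

In this synthetic case the loadings of all images on the first component are positive and its scores closely follow the temporal mean map, which is the pattern of behaviour described for PC1 above.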
The negative correlation of the scores of PC1 with elevations indicated higher water availability for ET at low elevations than at high elevations. This is due to higher rainfall at low elevations in the downstream area than at high elevations in the upstream area, because the EAMs prevent the Indian Ocean cyclonic rains from reaching the upstream area. Hence, heavier rainfall is encountered in the downstream area than in the upstream area [45,71]. The significant but small positive correlation of the scores of PC1 with slopes was caused by the relatively high ET in the natural forest along the EAMs, where steeper slopes are found.
All soil texture classes differed significantly with respect to the scores of PC1. This could partly be ascribed directly to the different water retention capacities of different soils. On the other hand, land cover and correspondingly vegetation density are also related to soil texture, which has an indirect effect on ET. Soil texture classes are not randomly distributed within the catchment, thus different ET from different classes might partly be related to the spatial pattern of rainfall as well. The similarity of ET between the irrigation areas and woodland suggests that, due to irrigation activities, the rice and sugar cane plantations evaporate and transpire as much water as woodland, which has deeper roots. Correspondingly, the similarity between bushland and the ranch region could be due to the fact that bushes remain at the ranch even after grazing and may transpire similarly to bushland. The widest distributions of ET in loam, sandy-clay-loam, cropland, and bushland areas were caused by the scattering of these soil texture and land use classes across the river basin, thus they exhibited both low and high ET in the upstream and downstream areas respectively (Figures 2 and 4). However, the narrowest distributions of ET in clay, clay-loam, irrigation and ranch areas were caused by the localized nature of these soil texture and land use classes, thus they exhibited localized high and low ET in the upstream and downstream areas respectively.

Second Principal Component: Dry Season Effects
The clear seasonal patterns shown by the loadings on PC2 in September-October and March represented the end of the dry period and the middle of the major rainy season (wet period) respectively. In contrast, the significant trend of the loadings on PC2 seemed primarily to be due to single, extreme dry events in the years 2002, 2006, and 2007 which led to relatively low ET across the river basin. Thus, we assume that the trend did not really reflect a continuous increase over time.
The spatial pattern of the scores of PC2 exhibited enhanced ET at the end of the dry period in the western part, the north-western part and further downstream. In these regions the depths to static water levels in the wells were less than or equal to 10 m below the ground [10,49]. The most plausible reason is that plants have access to shallow groundwater that is related to geological structures in these regions. In the western part of the river basin, granites and migmatites are abundant, where the graben and horst formed by geological faults striking NNE-SSW are found [10]. In this area, groundwater is mainly restricted to the weathered and fractured part. The faults increase the water-holding capacity of granitic aquifers. In the north-western part, the graben and horst formed by NNE-SSE-striking faults act as the main groundwater reservoir [10]. Further downstream, composite metamorphic crust domains and granulites, gneisses and migmatites are found, but the geological fault is not well developed. Correspondingly, the aquifers are not as productive as in the western part of the river basin [10].
Apart from this, the clearly higher than average ET detected at the end of the dry season in natural forest areas at high elevations cannot be related to shallow groundwater. These areas clearly stood out compared to all other vegetation types, especially with respect to exceptionally sharp boundaries to adjacent areas with different vegetation cover. It is remarkable that this holds for PC1, PC3, and PC5 as well. In the case of PC2, which shows sustained ET even at the end of the dry season, this is due to canopies of evergreen forest intercepting nightly fog [72]. Trees in these cloud forests do not shed leaves even in the dry period.
Usually, during the dry season most of the forest plants shed their leaves due to the decrease of soil moisture. However, in the upstream area, vegetation in the areas with a water table close to the surface exhibited relatively high ET as compared to the downstream area. That explains the significant positive correlations of the scores of PC2 with both elevations and slopes.
Significantly differing scores for different soil texture classes, except for loam and sandy-clay-loam, give some evidence for different water retention properties that play a role especially during the dry season. The similar ET between loam and sandy-clay-loam is due to shallow groundwater that is largely found in the upstream area where the two soil texture classes exist. No significant differences in ET were found between irrigation areas and bushland. Both land cover classes exhibited low ET at the end of the dry period. This might be a coincidence: the bushland ET was presumably restricted by low water availability, whereas in the irrigation areas this is the crop harvesting period (end of June to end of September). The widest distributions of ET in loam, sandy-clay-loam and natural forest areas were caused by the north-south spread of loam and natural forest as well as the presence of sandy-clay-loam in the upstream and downstream areas, thus they exhibited both high and low ET respectively (Figures 2 and 5). In contrast, the narrowest distributions of ET in clay and ranch areas were caused by their localized presence in the downstream and upstream areas respectively.

Third Principal Component: General Spatial Patterns of Rainfall
The peak loadings on PC3 in the January-February, May-June, and October-November periods coincided with the peaks of the ONDJFMA, MAM, and OND rainfall seasons. Thus, the loadings on PC3 reflected primarily ET effects due to rainfall distribution in the river basin. The spatial pattern of PC3 shows that the rainy period of January-February caused higher ET in the upstream area due to the unimodal (ONDJFMA) rainfall, whereas in the downstream areas there was no rainfall in that period. As a result, the scores of PC3 were positively correlated with both elevations and slopes in the January-February period. The observed significant trend of the loadings on PC3 was caused by relatively heavier January-February rainfall events in the years 2010, 2011, and 2013 in the upstream area. Thus, we assume that the trend did not really reflect a continuous increase. In the ranch region, low ET might be due to late vegetation sprout. The natural forests in the downstream area exhibited relatively high ET in the January-February period, when there were no heavy rainfall events, because the forests in the EAMs are cloud forests [72].
The scores of PC3 for different soil texture classes differed significantly. Apart from the shallow groundwater (see Section 4.3) and rainfall effects during the peak of ONDJFMA, the larger water holding capacity of loam soil in the upstream area contributed to the availability of more water than in the sandy soil in the downstream area. In addition, different vegetation cover played different roles. The similarity of scores of PC3 between bushland and woodland in the January-February period may be caused by vegetation sprout of the deciduous trees in the woodland, which shed their leaves in the June-September/early October period. Thus, during the sprout period the rate of transpiration from bushland and woodland may not be very different.

Fourth Principal Component: Single Major Rainstorms
The spatial pattern of PC4 differs remarkably from the other four PCs because it is not clearly related to any topographical structure or vegetation pattern. The transition zone between positive and negative scores west of the EAMs, stretching in the nearly perfect north-south direction (Figure 7b), provides strong evidence for atmospheric forcing. In addition, the lowest values encountered west of the EAMs, but not in the further upstream areas, reflected a lee effect with respect to moisture transport from the east and the west. The time series of loadings on this component exhibited less clear seasonal patterns compared to those of PC3. High loadings on PC4 were restricted to single, relatively short periods, especially at the end of January 2007 and in March 2013 (Figure 7a). Thus, this component seems to reflect deviations from the usual spatial pattern of rainfall and subsequent ET during single periods. Since the focus of the study is on the prevailing effects or processes and not on single extreme events, we will discuss the event at the end of January 2007 only briefly, as a major extreme event.
At the end of January 2007 the relatively high ET in the upstream and downstream areas is ascribed to the preceding heavy rainfall in OND 2006. The OND 2006 rainfall caused flooding in most of eastern Tanzania. It was related to exceptionally high sea surface temperature in the western Indian Ocean and Indo-Pacific El Niño effects [73]. That was the largest extreme rainfall event since OND 1997. The relatively lower ET in the midstream area could be caused by the lee effect of the EAMs with respect to strong fluxes of humid easterlies and weak fluxes of low level humid westerlies from the southern Congo basin [71,74], which differ due to different air humidity transport velocities. The negative correlation of the scores of PC4 with elevations and slopes might be caused by relatively low ET in the midstream region, where elevations and slopes are the highest in the EAMs.

Fifth Principal Component: Long-Term Trend of Land Use
Although PC5 explained less variance than any of the first four PCs, the loadings on PC5 exhibited the strongest long-term trend, including a distinctive, almost stepwise increase in 2010. Because neither rainfall nor temperature exhibited corresponding trends, land use change was considered the most probable cause. In fact, components of lower rank order are often associated with long-term changes [29,53]. Combining the trend of the loadings with the spatial pattern of the component scores, the results pointed to deteriorating land cover in the west and north of the EAMs. The visible satellite imagery also showed diminishing forest cover in the north of the river basin and along the EAMs as compared to other areas. Positive scores further downstream, in the lowlands east of the EAMs and in the south of the river basin, can be interpreted as reflecting areas that did not exhibit any trend at all. This part might have been intensively used already before the start of the MODIS ET dataset.
Schaafsma et al. [41] found that commercial charcoal production is still practiced in the lower woodland areas of the EAMs (with some production centres within the boundaries of the EAM blocks), thus causing degradation of woodland in the EAMs. FBD [75] also reported that the EAMs exhibit rapid land cover change, having lost 11% of their primary forests and 41% of their woodland vegetation since 1975. This conversion is driven by clearance for farmland, as well as by increasing demand for timber and fuel wood [40]. This supports the argument that land cover change was caused by the deforestation of woodland vegetation.
Despite being in the same area, some natural forests did not exhibit deteriorating land cover because they are within the conservation areas [42,76,77], thus they were not affected by anthropogenic activities. Currently, the vast majority of natural forests in the EAMs are under different forms of legal protection, with most falling within the category of "National Forest Reserves" managed for protection of water resources, soil erosion prevention, and biodiversity conservation [78][79][80]. However, Tabor et al. [81] argued that forest loss was still happening in the protected areas, though to a far lower extent than outside protected areas.
Correlations of the scores of PC5 with elevations and slopes were significant but rather weak, and thus will not be discussed in more detail. The similarity of scores of PC5 between cropland and grassland may be caused by the growth of grass in areas that were initially cultivated, but later abandoned by local farmers for various reasons, including loss of soil fertility after being used for some time.

Implications for Water Resources Management and Distributed Hydrological Models
A lot of information is required for setting up and calibrating distributed hydrological models. Although some of the calibration parameters are closely related to land surface conditions, the spatial variability of these parameters in a large river basin such as the Wami River basin is very difficult to assess. We hypothesized that much can be learned about hydrological behaviour and hydrologically relevant structures by analysing time series of ET data determined by remote sensing. Findings gained using this approach are summarized in the following paragraphs. About 63% of the spatio-temporal variance of the ET data was already captured by the mean spatial pattern depicted by PC1. It significantly depended on elevations and slopes. The widest distributions of ET shown by PC1 in loam, sandy-clay-loam, cropland, and bushland areas suggest that these soil texture and land use classes play a crucial role in determining the mean behaviour of ET in the river basin. In contrast, the narrowest distributions of ET in clay, clay-loam, irrigation, and ranch areas indicate that they have little effect on the mean behaviour of ET in the river basin. However, PC1 also indicated that the effects of different soil textures are significantly different, thus all soil texture classes are crucial for the hydrological model of the river basin. Likewise, most of the different land use classes are significantly different, except for woodland and irrigation areas, and bushland and ranch areas. Therefore, from the perspective of modelling the mean behaviour of ET, the land use classes of irrigation and ranch areas can be replaced with woodland and bushland respectively, without any loss of information.
However, PC1 did not explicitly capture the effects of irrigation. This was not consistent with our expectations prior to the analysis. It cannot be explained by insufficient spatial resolution of the ET data. In fact, irrigation areas are known to extend over more than 15 km² in the eastern part of the river basin, thus they should have been discernible. Instead, it has to be concluded that the current irrigation density is relatively low and does not affect the hydrologic cycle in the river basin to the extent we presumed. Similar to PC1, the irrigation areas did not stand out in the spatial pattern depicted by PC2. This is presumably due to the fact that harvest occurs in the dry season. Thus, there is hardly any need for extended irrigation during this season. However, information provided by PC2 helped to better understand the reason for enhanced ET in other parts of the river basin. Sustained high ET in the western and north-western parts at the end of the dry season pointed to shallow groundwater that would be available for plant root uptake. Conducting sufficient spatially distributed measurements of the shallow groundwater or aquifer in large river basins is not economically feasible. Therefore, the shallow aquifer boundary condition from PC2 can be used to emulate spatially distributed critical depths of water in shallow aquifer routines in hydrological models. In many distributed hydrological models this information is used to control the movement of water from the aquifers to the root zones.
In contrast, shallow groundwater cannot explain the outstandingly high ET of natural forests with clear-cut boundaries to adjacent land cover classes. This feature was dominant for PC1, PC2, PC3, and PC5. It sheds some light on the effects of land cover changes around the natural forest areas. In terms of ET, even intensively cultivated areas cannot compensate for the high water availability of natural forests as long as the former are not irrigated. Apparently this is due to interception of nightly fog in the canopies of evergreen trees in the natural forest areas, thus ensuring high water availability even during extended dry periods [72]. Thus, any conversion or impairment of the natural forests would have major effects on the hydrologic cycle in the river basin. The most important difference between natural forests and other land cover classes seems to be their ability to maintain high physiological activity even under dry conditions, pointing to better adaptation to local climatic conditions. On the other hand, PC5 provided some evidence that the protection of the natural forests seems to have been highly effective during recent years. However, close to the nature reserve regions, land use change seemed to have had a major impact, especially in the northern part of the river basin close to the EAMs. This land cover change, especially the step shift around 2010, should be considered both for water resources management and in spatially distributed hydrological models of the Wami River basin.
The loadings on PC3 clearly reflected the unimodal seasonal rainfall in the upstream area and the bimodal rainfall in the downstream area. The spatial pattern of ET depicted by PC3 shows that the effect of January-February rainfall during ONDJFMA is substantial in the upstream areas as compared to the downstream areas. This is the minimum spatial differentiation of rainfall that a distributed hydrological model in the Wami River basin should account for. The pattern of PC4 reflected the lee effect during strong easterly rainfall periods and the weak westerly rainfall effects from the southern Congo basin [71,74]; however, the pattern is more pronounced during extreme rainfall events. It is remarkable that remote sensing ET data (i.e., PCs) gave information about prevailing and peculiar patterns of rainfall as well. In the short-term, ET was likely to decrease during single rainstorms due to high air humidity and low irradiation and temperature. However, given the limited temporal resolution of the MODIS ET data, this effect seemed to be negligible. In contrast, in the mid-term after heavy rainfall more water was available for plant uptake and thus likely boosted ET, especially in regions with antecedent limited water supply. This plant response might be delayed due to the time needed for budding of perennial plants or for germination and growth of annual plants, thus depending on the vegetation type and reflecting its spatial patterns. The summary of findings of principal components and their inferences for water resources management and distributed hydrological modelling in the Wami River basin is shown in Table 4.

Table 4.
PC | Finding | Inference
3 | Unimodal (October-April (ONDJFMA)) and bimodal (October-December (OND) and MAM) rainfall distributions in the upstream and downstream parts of the river basin respectively. | ONDJFMA rainfall during the January-February periods increases ET in the upstream part of the river basin, at high elevations and steep slopes.
4 | Lee effect of strong humid easterlies and effects of weak humid westerlies. | Effect on the spatial pattern of ET in the river basin due to strong rainfall from the east and weak rainfall from the west of the Eastern Arc Mountains (EAMs).
5 | Long-term change of land use. | Long-term and spatially almost homogeneous reduction of ET due to massive deforestation of woodland vegetation northwest of the EAMs, except for the forest nature reserves.

Conclusions
Planning and managing water resources, and the modelling that supports them, require a sound understanding of the hydrologic cycle of the river basin. However, many regions of the world lack the extensive monitoring networks of river gauges and groundwater observation wells on which hydrologists base their advice and their models. On the other hand, high-quality remote sensing products are now globally available. We recommend using these products for water resources management and planning, and for hydrological system analysis prior to setting up and testing models. In this study, inferences regarding hydrological behaviour were derived from a PCA of MODIS ET data.
We showed how a time series of MODIS ET data could be used to assess hydrological behaviour in the data-scarce region of the Wami River basin in Sub-Saharan Africa. A PCA was applied to extract prevailing and peculiar spatial patterns for different seasons and possible long-term trends. The main results elucidated the mean behaviour of ET, illustrated the distinctive behaviour of natural forests, helped to identify regions of shallow groundwater, and clearly pointed to long-term shifts in degrading vegetation cover in parts of the river basin. In addition, the results allowed reduction of the number of land use classes to be considered in the distributed hydrological model of the river basin.
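The PCA step summarised above can be illustrated as follows. This is a minimal sketch, not the authors' implementation: it assumes that each image is treated as one variable and each pixel as one observation (so that loadings form a time series and scores form a map), and that each image is standardised before the decomposition, which this section does not state. The function name `pca_image_stack`, the array layout and the NumPy SVD route are assumptions made for illustration.

```python
import numpy as np


def pca_image_stack(stack, n_components=5):
    """PCA of an image time series of shape (n_times, n_rows, n_cols).

    Returns loadings (n_times, k), score maps (k, n_rows, n_cols) and the
    fraction of variance explained by each of the k components.
    """
    n_times, n_rows, n_cols = stack.shape
    # Rows are pixels (observations), columns are images (variables).
    X = stack.reshape(n_times, -1).T
    # Keep only pixels that have a finite value in every image.
    valid = np.all(np.isfinite(X), axis=1)
    Xv = X[valid]
    # Standardise each image (assumption: correlation-matrix PCA).
    Xv = (Xv - Xv.mean(axis=0)) / Xv.std(axis=0, ddof=1)
    # SVD of the standardised data gives components sorted by variance.
    U, s, Vt = np.linalg.svd(Xv, full_matrices=False)
    eigenvalues = s ** 2 / (Xv.shape[0] - 1)
    explained = eigenvalues / eigenvalues.sum()
    k = n_components
    # Loadings: eigenvectors scaled by the square root of the eigenvalues,
    # i.e. the correlation of each image with each component.
    loadings = Vt[:k].T * np.sqrt(eigenvalues[:k])
    # Scores: projection of each valid pixel onto the components.
    scores = np.full((X.shape[0], k), np.nan)
    scores[valid] = U[:, :k] * s[:k]
    score_maps = scores.T.reshape(k, n_rows, n_cols)
    return loadings, score_maps, explained[:k]
```

With such a layout, `loadings[:, 0]` would correspond to the PC1 loading time series and `score_maps[0]` to the PC1 score map; pixels with gaps in any image are returned as NaN.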

Figure 4. Time series of the loadings on the first principal component (PC1) (a). Map of the scores of PC1 and natural forest areas (b). Quartiles of the scores of PC1 for different soil texture (c) and land use (d) classes; similar score distributions at the 5% level of significance are marked with the same small letter.

Figure 5. Time series of the loadings on the second principal component (PC2) (a). Map of the scores of PC2, natural forest areas and depths to static water levels (DSWL) measured below the ground (b). Quartiles of the scores of PC2 for different soil texture (c) and land use (d) classes; similar score distributions at the 5% level of significance are marked with the same small letter.

Figure 6. Time series of the loadings on the third principal component (PC3) (a). Map of the scores of PC3 and natural forest areas (b). Quartiles of the scores of PC3 for different soil texture (c) and land use (d) classes; similar score distributions at the 5% level of significance are marked with the same small letter.

Figure 7. Time series of the loadings on the fourth principal component (PC4) (a). Map of the scores of PC4 (b). Quartiles of the scores of PC4 for different soil texture (c) and land use (d) classes; similar score distributions at the 5% level of significance are marked with the same small letter.

Figure 8. Time series of the loadings on the fifth principal component (PC5) (a). Map of the scores of PC5 and natural forest areas (b). Quartiles of the scores of PC5 for different soil texture (c) and land use (d) classes; similar score distributions at the 5% level of significance are marked with the same small letter.

Figure 9. Satellite images for the years 2001 (a) and 2012 (b), mapped as red, green, and blue (RGB) composites (648, 555, and 470 nm). The deep and light green colours represent dense and sparse forests, respectively; other colours represent scattered vegetation (not discernible at 500 m resolution).

Table 1. Eigenvalues, fractions of explained variance and cumulative proportions of the first five principal components (PCs).

Table 2. Trend analysis of the time series of PC loadings at 5% level of significance.

Table 3. Kendall's rank correlation between the scores of the principal components and elevations or slopes.

Table 4. Findings and inferences for water resources management and distributed hydrological modelling.

1
Clear dichotomy between the upstream (low evapotranspiration (ET)) and downstream (high ET) parts of the river basin, partly due to a heavier March-May (MAM) rainy season in the latter.
ET was exceptionally high in natural forests and loam soil, and very low in bushland and sandy-loam soil. No significant differences between ET of bushland and ranch areas. Irrigation of rice and sugar cane plantations obviously resulted in ET as high as in woodland. Loam, sandy-clay-loam, bushland and cropland areas have widespread effects on average ET across the river basin. Clay, clay-loam, current irrigation and ranch areas have localized effects on average ET in the river basin.

2
Regions of extended high ET at the end of the dry season.
Regions of shallow groundwater, accessible by plant roots in the dry season. Outstanding role of fog interception in regions of natural cloud forests. Effect of irrigation not visible during the dry season due to earlier harvest. No significant differences between loam and sandy-clay-loam during the dry season. High importance of this dry season pattern in the June-September periods in the years 2002, 2006, and 2007.