An Assessment of the Hydrological Trends Using Synergistic Approaches of Remote Sensing and Model Evaluations over Global Arid and Semi-Arid Regions

: Drylands cover about 40% of the world’s land area and support two billion people, most of them living in developing countries that are at risk due to land degradation. Over the last few decades, there has been warming, with an escalation of drought and rapid population growth. This will further intensify the risk of desertification, which will seriously affect the local ecological environment, food security and people’s lives. The goal of this research is to analyze the hydrological and land cover characteristics and variability over global arid and semi-arid regions over the last decade (2010–2019) using an integrative approach of remotely sensed and physical process-based numerical modeling (e.g., Global Land Data Assimilation System (GLDAS) and Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System (FLDAS) models) data. Interaction between hydrological and ecological indicators including precipitation, evapotranspiration, surface soil moisture and vegetation indices are presented in the global four types of arid and semi-arid areas. The trends followed by precipitation, evapotranspiration and surface soil moisture over the decade are also mapped using harmonic analysis. This study also shows that some hotspots in these global drylands, which exhibit different processes of land cover change, demonstrate strong coherency with noted groundwater variations. Various types of statistical measures are computed using the satellite and model derived values over global arid and semi-arid regions. Comparisons between satellite- (NASA-USDA Surface Soil Moisture and MODIS Evapotranspiration data) and model (FLDAS and GLDAS)-derived values over arid regions (BSh, BSk, BWh and BWk) have shown the over and underestimation with low accuracy. Moreover, general consistency is apparent in most of the regions between GLDAS and FLDAS model, while a strong discrepancy is also observed in some regions, especially appearing in the Nile Basin downstream hyper-arid region. Data-driven modelling approaches are thus used to enhance the models’ performance in this region, which shows improved results in multiple statistical measures ((RMSE), bias ( ψ ), the mean absolute percentage difference ( (cid:12)(cid:12)(cid:12) ψ (cid:12)(cid:12)(cid:12) ) ) and the linear regression coefficients (i.e., slope, intercept, and coefficient of determination (R 2 )). GLDAS of ground observations. satellite-based dataset monitoring SSM and ET in this the satellite-based dataset inaccessible due to atmospheric conditions (e.g., clouds) and sensor operating issues. Therefore, additional investigations are conducted in this region to compare FLDAS and GLDAS’s coherency with satellite products to improve their performance through machine learning modeling. Scatterplots are applied to compare the satellite products with FLDAS, GLDAS and the proposed model. The results are presented in Figures 12 and 13 for SSM and ET, respectively. It is noted that both FLDAS and GLDAS have undesired results with near-zero R 2 and high RMSE, (cid:12)(cid:12)(cid:12) ψ (cid:12)(cid:12)(cid:12) and ψ . The proposed models in this study remarkably improve the results with a very good correlation coe ﬃ cient R 2 ( > 0.8) and lower RMSE and bias.


Introduction
Arid and semi-arid regions, also known as drylands, are areas where the annual total surface evaporation and vegetation transpiration substantially exceeds precipitation. Recent studies show that they account for about 40% of the Earth's land surface [1] and play an important role in the process of global climate change as a regulator of trends and variabilities of atmospheric carbon dioxide (CO 2 ) concentrations [2][3][4][5]. Anders et al. [2] showed that the mean sink, trend, and interannual variability in CO 2 uptake by terrestrial ecosystems are influenced by diverse biogeographic regions where the trend and interannual variability of the sink are largely influenced by semi-arid ecosystems. In fact, Joel et al. [3] suggested that the contribution of the dryland regions towards the changing regional and global CO 2 exchange may be three to five times larger than current estimates. Benjamin et al. [5] found that the global carbon cycle inter-annual variability is driven by the carbon pools in semi-arid biomes. Due to the lack of water supplies, low vegetation coverage, and fragile ecological environment in these areas, these regions are suitable for activities with lower annual productivity and more sensitive and rapid in response to external forces. In the past 100 years, the arid and semi-arid regions have been the areas with the most significant land temperature increase and area expansion [6,7]. The situation is projected to intensify, as indicated by climate models [8,9], likely resulting in an increase in aridity, warming, along with land degradation and desertification in the drylands of developing countries [7]. The drylands would also undergo the consequences of climate change through emissions from humid lands [7,10].
Nowadays, more than two billion people live in these regions, with about 90% of them from developing countries where people's living standards and technological capabilities lag far behind developed countries [1]. The interaction of anthropogenic activities (e.g., farming, grazing and irrigation) and natural processes is suggested to have a strong influence on the water balance. The increasing demand for water resources in support of these activities can exacerbate the severity of existing drought conditions. Mitigation actions such as groundwater exploitation for irrigation and industry would alleviate the ongoing situation yet increase the vulnerability for upcoming droughts. Therefore, moderate to severe droughts, combined with the depletion of surface and groundwater resources, pose significant risks to water and food security, leading to wide-ranging and long-term impacts, including conflicts among dryland countries over shared transboundary river basins. As a result, there is a pressing need to improve our understanding of the ongoing situations and the future trends of hydrological parameters in these global arid and semi-arid regions to perform proper mitigation assessments for these important and vulnerable ecosystems.
Dryland vegetation is strongly governed by the spatial and temporal variation of water availability. These variables show apparent links between hydrology and ecology in these regions, where the temporal variation of precipitation is more substantial than other areas at different scales (diurnal, seasonal and interannual). Generally, precipitation pulses trigger a cascade of pulsed ecosystem responses to end dry periods. Meanwhile, they can also lead to a rapid increase in evapotranspiration (ET), which consists mainly of evaporation (E) from plant and soil surfaces and plant transpiration (T) through leaf stomata. Consequently, transpiration can contribute significantly to total ET if the water added is adequate to wet the root zone, indicated by the surface/sub-surface soil moistures [3,[11][12][13][14]. In drylands, ET is an effective indicator of vegetation resilience and resistance to drought conditions, since water is the major constraint on plant productivity [15,16]. In the hydrological cycle, ET also serves as a key process that links energy, carbon and water cycles [17].
Ground-based observation networks, mainly from eddy covariance flux towers, have been established to estimate various variables such as ET and gross primary productivity (GPP). Some typical networks include FLUXNET [18], AmeriFlux [19], ILTER [20] and NEON [21]. These sites are mostly situated in more humid and developed areas, while large drylands in Africa, South America, Middle East, and central Asia have limited representation. Remote sensing has been functional in demonstrating the role of GPP and ET, as well as ecosystem structure within the context of the broader Earth system [4,5]. Favorable atmospheric conditions (lower cloud coverage) in drylands offer advantages for optical remote Remote Sens. 2020, 12, 3973 3 of 28 sensing and increase the chances of high-quality observations. In addition, pioneering technologies were developed in drylands, including the retrieval of surface reflectance [22,23], as well as quantitative estimates of vegetation conditions and photosynthetic capacity in the rangelands of the Great Plains [24]. The later research afterward became the Normalized Difference Vegetation Index (NDVI) [25], calculated as NDVI = (NIR − R)/(NIR + R), which is generally based on the high reflectivity of plants in the near-infrared (NIR) wavelength range and the high absorption by plants in the red (R) wavelength range for photosynthesis. With spectral normalization, NDVI minimizes effects from topographic variation, sun-sensor geometry and shadows [26], thus becoming one of the most widely used vegetation indices [26][27][28]. The long-term and high-frequency NDVI time series retrievals, like the 23 year time series generated for long-term studies of the Sahel region [26], enhanced our understanding of the long-term trends and driving forces behind regional dryland dynamics. In Australia, an analysis to check whether vegetation cover increased was conducted using calibrated advanced very high-resolution radiometer data spanning 1981-2006 [27]. In the same region, an analysis of the decadal time-scale changes was conducted using AVHRR GIMMS NDVI dataset to check the relationship between a proxy for vegetation productivity and annual rainfall [28]. Subsequently, Randall et al. [29] observed a greening of the globe over recent decades which they attributed to CO2 fertilization effect. A study by Ramus et al. [30] to analyze trends in vegetation greenness of semi-arid areas showed an increase in greenness found both in semi-arid areas where precipitation is the dominating limiting factor for plant production and where air temperature is the primary growth constraint. Similarly, Ulf et al. [31] in their research using NOAA AVHRR data in Mediterranean basin, the Sahel from the Atlantic to the Red Sea, major parts of the drylands of Southern Africa, China-Mongolia and the drylands of South America showed a greening-up. Other studies conducted by Kolby et al. [32], which showed an increase in net primary productivity (NPP), and by Zaichun et al. [33], which discussed the greening of earth and its drivers, attributed the major cause to be CO 2 fertilization. These investigations recently elucidated that global drylands greening is largely due to CO 2 fertilization's effects in the context of global climate change [29][30][31][32][33]. Since the 2000s, there has been an exponential increase in the volume of remote sensing data, resulting from newly launched sensors and products, as well as higher resolution domains (spatial, temporal, and spectral). Such expansion of big data revolutionized our understanding of the Earth system, while posing a higher request for advances in data management and processing capability. Accordingly, increasing the availability of cloud-based computing and analytical platforms, such as Amazon Web Service (AWS) and Google Earth Engine [34], is being widely used for dryland research [35][36][37][38][39][40]. Using the tools within Google Earth Engine, it was found that global dryland tree cover was significantly underrated and recent estimates exceeded previous ones by over 40% [41].
The goal of this paper is to evaluate hydrologic and land cover changes in global arid/semi-arid regions over the last decade (2010-2019), through time series and trends analysis, correlation and interactions of multiple hydrological indicators and vegetation indices. Another key point of this research is to evaluate two major process-based land surface modeling and data assimilation models (GLDAS and FLDAS) by satellite-based hydrological products. Additional analysis displayed an overestimation of surface soil moisture (SSM) and an underestimation of evapotranspiration (ET), as well as a discovery of a strong discrepancy observed in Egypt by the aforementioned models. To mitigate this discrepancy, machine learning models using extra data from satellites (Moderate Resolution Imaging Spectroradiometer (MODIS)) were developed which increased the process-based model performance significantly and provided results with lower RMSE and bias. This optimization effort highlights the promising applications of the synergistic approach of data-driven and process-based modeling for hydrological studies.
This paper is organized into four sections. The information regarding the study area, data sources and methods are described in Section 2. Section 3.1 illustrates the time series boxplots of multiple hydrological parameters used over arid regions. In Section 3.2, the trends of ET, soil moisture and precipitation during the period of 2010-2019 over global drylands are presented. Section 3.3 shows the Remote Sens. 2020, 12, 3973 4 of 28 landcover and water storage changes of selected areas. In Section 3.4, the comparison between FLDAS and GLDAS surface soil moisture and ET products is shown by correlation maps, followed by the approach of data-driven modelling of the aforementioned products. Finally, the results are discussed in detail in Section 4 and a conclusion in Section 5 is provided for the paper.

Study Area
The global arid and semi-arid regions were selected from the updated version of the Köppen-Geiger climate map [42], which shows five main climate groups, with sub-division of each group based on seasonal precipitation and temperature patterns. The five main groups are A (tropical), B (dry), C (temperate), D (continental), and E (polar). In this research, the subgroups starting with the letter "B" (dry) were identified as arid and semi-arid classes, namely, BWh (hot desert climates), BWk (cold desert climates), BSh (hot steppe climates) and BSk (cold steppe climates) (Figure 1a). The corresponding biome type distribution over these regions is also presented in Figure 1b using OpenLandMap Potential distribution of biomes dataset [43].
Remote Sens. 2020, 12, x FOR PEER REVIEW 4 of 30 moisture and precipitation during the period of 2010-2019 over global drylands are presented. Section 3.3 shows the landcover and water storage changes of selected areas. In Section 3.4, the comparison between FLDAS and GLDAS surface soil moisture and ET products is shown by correlation maps, followed by the approach of data-driven modelling of the aforementioned products. Finally, the results are discussed in detail in Section 4 and a conclusion in Section 5 is provided for the paper.

Study Area
The global arid and semi-arid regions were selected from the updated version of the Köppen-Geiger climate map [42], which shows five main climate groups, with sub-division of each group based on seasonal precipitation and temperature patterns. The five main groups are A (tropical), B (dry), C (temperate), D (continental), and E (polar). In this research, the subgroups starting with the letter "B" (dry) were identified as arid and semi-arid classes, namely, BWh (hot desert climates), BWk (cold desert climates), BSh (hot steppe climates) and BSk (cold steppe climates) (Figure 1a). The corresponding biome type distribution over these regions is also presented in Figure 1b using OpenLandMap Potential distribution of biomes dataset [43].  The precipitation data used in this study were obtained from the Climate Hazards Group Infrared Precipitation with Station (CHIRPS) dataset (version 2.0) [44], with 0.05 • spatial resolution satellite imagery, and were converted into monthly temporal resolution. Developed by Earth Resource Observation and Science Center and the United States Geological Survey (USGS), CHIRPS is a 30+ year quasi-global rainfall dataset incorporating satellite and station data and to create gridded precipitation time series for hydrology monitoring and correlational and trend studies.

•
Surface Soil Moisture Data The surface soil moisture data were obtained from the National Aeronautics and Space Administration (NASA)-United States Department of Agriculture (USDA) Global Soil Moisture Dataset [45][46][47][48], with 0.25 • spatial resolution and monthly temporal resolution. This dataset is generated through the data assimilation approach using a modified two-layer Palmer model using a 1-D ensemble Kalman filter (EnKF) to integrate the satellite-derived soil moisture observations. This dataset includes products of soil moisture profile (%), surface and subsurface soil moisture, as well as their standardized anomalies values computed using a 31 day moving window. •

Water Storage Data
The changes in groundwater storage are represented by the variations in the water equivalent height (in cm) observed from the Gravity Recovery and Climate Experiment (GRACE) and GRACE Follow-on mission monthly gravitational anomalies datasets [49][50][51] in 1 • spatial resolution. GRACE is a twin satellite setup that measures global changes in water anomalies. These anomalies depict the changes in the total water column represented by water equivalent height, which includes surface, sub-surface, and groundwater components. •

Vegetation Indices Data
The Moderate Resolution Imaging Spectroradiometer (MODIS) sensor onboard of Terra satellite provides three vegetation index products: evapotranspiration (ET) [52], NDVI [53], Leaf Area Index (LAI) and fraction of absorbed photosynthetically active radiation (FPAR) [54]. All the products are regridded and converted into monthly data over 1 km spatial resolution. The calculation of MODIS ET is affected by other variables, including albedo, biomes type and LAI [52]. LAI is defined as half the total leaf area per unit ground area [55]. FPAR is defined as the fraction of the incoming solar radiation that is absorbed by a photosynthetic organism in the photosynthetically active radiation spectral region. The MODIS products are proved to effectively capture the dominant seasonal LAI and FPAR patterns in many arid and semi-arid ecosystems [56].

Data from Process-Based Models
The Land Data Assimilation Systems (LDAS) use data from multiple ground and space observations that are integrated with advanced numerical models of physical processes to simulate fields of water and energy states and fluxes, which helps in filling gaps and minimizing errors in the observations. Different projects (e.g., NLDAS, GLDAS, FLDAS) of LDAS have been configured for specific domains and purposes [57]. There are some known issues with GLDAS version 1 data, which might present some challenges in the runoff, soil moisture and ET analysis performed with the GLDAS model. These issues are suggested to be solved in the newer version (GLDAS version 2.1) used in this study. The GLDAS 2.1 simulation started on 1 January 2000 using the conditions from the GLDAS version 2.0 simulation. The GLDAS-2.1 is revised with upgraded models using National Oceanic and Atmospheric Administration (NOAA)/Global Data Assimilation System (GDAS) atmospheric analysis fields [58], the disaggregated Global Precipitation Climatology Project (GPCP) precipitation fields [59] and the Air Force Weather Agency's AGRiculturalMETeorological modeling system (AGRMET) radiation fields. In this study, the surface soil moisture and evapotranspiration were selected from the GLDAS-2.1 Noah model from 2010 to 2019, with its 0.25 degree spatial resolution and monthly temporal resolution.
FLDAS [60] is designed for semi-arid, food-insecure regions of Africa. Unlike GLDAS, which relies on global rainfall products (e.g., CMAP [61] and Princeton [62]), FLDAS uses a combination of the forcing data from Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2) data and CHIRPS [44], which is designed for seasonal drought monitoring and trend analysis. In this study, the surface soil moisture and evapotranspiration were selected from the FLDAS dataset in Noah version 3.6.1 from 2010 to 2019, with its 0.1 degree spatial resolution and monthly temporal resolution.

Harmonic Analysis
Harmonic analysis is a method of expressing a function or signal as a superposition of fundamental waves as Equation (1). In this study, to estimate the variation of multiple variables (e.g., SSM, precipitation), a harmonic model H(t) was built with elements of a constant band (β 0 ), a linear term of slope (β 1 ) and harmonic terms of amplitudes (β 2 , β 3 , β 4 and β 5 ). The constant band β 0 represents the extent of consistency of the time series. β 1 shows linear trend of a time series in spite of seasonal variations. Therefore, the term β 1 represents the annually increasing/decreasing trend. Additionally, f represents the fundamental frequency and harmonic terms of amplitudes (β 2 , β 3 , β 4 and β 5 ) are used to simulate seasonal variations.
Following the methods in [38,39], harmonic analysis was applied to show a yearly trend followed by precipitation, ET and surface soil moisture in global drylands from 1 January 2000 to 31 December 2019 using monthly values (units in mm). The units of SSM/ET outputs from FLDAS and GLDAS products were also transformed into mm to compare with MODIS products.

Correlational Analysis
For the purpose of finding the correlational relationship between the multiple variables, correlation maps were developed to show the standard correlation using the Pearson correlation coefficient (r) in the range from −1 (anti-correlation) to +1 (perfect correlation), between these two monthly time series x and y, with N elements as Equation (2): with Cov being the covariance function, x and y the average and σ x and σ y the standard deviations for x and y, respectively. In this study, the association between the GLDAS and FLDAS products in both SSM and ET variables over global arid and semi-arid regions was studied by the Pearson correlation analysis using their anomalies value X a calculated as Equation (3): with X as the monthly value and X as the monthly mean value. The anomalies value X a can better demonstrate the effect of the change in a variable on the corresponding variable in the correlational

Data-Driven Modelling of Selected Variables
In this study, the monthly data from dryland area in Egypt (mainly the Nile Delta region) during the years 2010 to 2019 were sampled and selected for the SSM and ET model development and evaluation. This region has 50 randomly selected sampling locations with a 10 km scale. A total of 29 parameters, including MODIS SSM and ET (modeling objective), as well as GLDAS and FLDAS outputs (e.g., Heat flux, specific humidity, total precipitation rate, soil temperature, wind speed, etc.) were used in the modelling (see comprehensive list in Table S1). The modelling dataset (~3000 records) was randomly split into a training set and testing set, with a ratio of 8:2, respectively.
Traditionally, intensive manual experiments are needed for model selection and parameterization, because it is difficult to know which model is most suitable for the problem before researchers try all the models. Therefore, the automatic optimization methods were proposed to overcome this issue, which try to train and evaluate model groups by parameters sequentially, and then include new parameter groups according to the updated results. In this study, automated model selection and tuning techniques were performed using Bayesian parameter optimization on the OptiML platform (https://bigml.com/api/optimls), which tried different supervised models (101 in total) for the regression tasks, including one linear regression, 89 tree models, 9 ensembles (up to 256 trees), and two deep neural networks. During the process, the platform used the Monte Carlo cross-validation [65], which iteratively generated new original data sets of training and test segmentation for the upper half of the model, and discarded the remaining half of the model due to poor performance. Therefore, this approach has an advantage over k-fold cross validation for the training and validation subsets, in that it is independent of the number of iterations. For both SSM and ET, deep neural networks outperformed other candidate models with R 2 values of 0.87 and 0.81 (see comprehensive metrics in Table S2), respectively, and thus applied for the simulation.

Model Accuracy Assessment
Following [66], accuracy assessments of the algorithms are presented through standard statistical errors (root mean square error (RMSE), bias (ψ), the mean absolute percentage difference (|ψ|)), and the linear regression (i.e., slope, intercept, and coefficient of determination (R 2 )). Mathematically, the value of ψ is derived from the following: where N is the number of data points and ψ i is derived from: In addition, the value of ψ is calculated from the following: The quantity ψ determines the bias, while ψ illustrates the scattering of data points.

The Time Series of Multiple Hydrological Parameters
Time series boxplots of multiple variables, in the order of ET, precipitation, FPAR, LAI, NDVI and SSM (surface soil moisture), are presented in        Figure 7 shows the land cover changes between 2010 and 2019 over BSh, BSk, BWh and BWk regions. Table 1 shows the pixel count of the areas that changed and the percentages for each type: BSh has the highest change of 10.57%, while BWh has the lowest change of 3.92%. It is noted that such changes are most plausible in four regions: (1) Figure 7 shows the land cover changes between 2010 and 2019 over BSh, BSk, BWh and BWk regions. Table 1 shows the pixel count of the areas that changed and the percentages for each type: BSh has the highest change of 10.57%, while BWh has the lowest change of 3.92%. It is noted that such changes are most plausible in four regions: (1)          Moreover, the comparisons between satellite-based products (NASA-USDA SSM and MODIS ET) and model products obtained from FLDAS and GLDAS over arid regions (BSh, BSk, BWh and BWk) are presented in Figure 10 (SSM) and Figure 11 (ET), respectively. It is noted that both models tend to overestimate SSM (linear regression coefficient <1) but underestimate ET (linear regression coefficient >1), which is indicated by the positive (SSM) and negative (ET) biases (ψ). The values of ψ and ψ are nearly identical for all plots in Figure 10. The absolute values of ψ and ψ are also almost identical in the cold areas (BSK and BWk), while different in the hot areas (BSh and BWh). This means model outputs of ET sometimes are estimated as being higher than ET observed from MODIS satellites in the hot areas but not in the cold areas. For SSM products (Figure 10), satellite SSM values fall into the range of around [0, 25], while ranges of [12,48] and [0,45] are set for FLDAS and GLDAS, respectively. For both FLDAS and GLDAS, the hot areas (BSh and BWh) have better consistency with NASA-USDA products than the cold areas (BSK and BWk). Compared with GLDAS, FLDAS achieves better consistency in the hot areas with higher R 2 , yet is slightly worse in the cold areas. For ET products (Figure 11), the highest consistency is found in BSh, yet the lowest consistency in BWk. FLDAS has slightly better coherency with MODIS ET data than GLDAS in all regions. Moreover, the comparisons between satellite-based products (NASA-USDA SSM and MODIS ET) and model products obtained from FLDAS and GLDAS over arid regions (BSh, BSk, BWh and BWk) are presented in Figure 10 (SSM) and Figure 11 (ET), respectively. It is noted that both models tend to overestimate SSM (linear regression coefficient <1) but underestimate ET (linear regression coefficient >1), which is indicated by the positive (SSM) and negative (ET) biases ( ). The values of   As shown in Figure 9, strong disagreement between FLDAS and GLDAS products mainly exists in the Egypt territory. It is not decided which product has better performance due to a lack of ground observations. Therefore, the satellite-based dataset becomes a relatively reliable source of monitoring SSM and ET in this region. However, the satellite-based dataset is occasionally inaccessible due to atmospheric conditions (e.g., clouds) and sensor operating issues. Therefore, additional investigations are conducted in this region to compare FLDAS and GLDAS's coherency with satellite products to improve their performance through machine learning modeling. Scatterplots are applied to compare the satellite products with FLDAS, GLDAS and the proposed model. The results are presented in Figures 12 and 13 for SSM and ET, respectively. It is noted that both FLDAS and GLDAS have undesired results with near-zero R 2 and high RMSE, ψ and ψ. The proposed models in this study remarkably improve the results with a very good correlation coefficient R 2 (>0.8) and lower RMSE and bias.

The Comparison and Modeling of FLDAS and GLDAS ET and SSM Products
Remote Sens. 2020, 12, x FOR PEER REVIEW 21 of 30 As shown in Figure 9, strong disagreement between FLDAS and GLDAS products mainly exists in the Egypt territory. It is not decided which product has better performance due to a lack of ground observations. Therefore, the satellite-based dataset becomes a relatively reliable source of monitoring SSM and ET in this region. However, the satellite-based dataset is occasionally inaccessible due to atmospheric conditions (e.g., clouds) and sensor operating issues. Therefore, additional investigations are conducted in this region to compare FLDAS and GLDAS's coherency with satellite products to improve their performance through machine learning modeling. Scatterplots are applied to compare the satellite products with FLDAS, GLDAS and the proposed model. The results are presented in Figures 12 and 13 for SSM and ET, respectively. It is noted that both FLDAS and GLDAS have undesired results with near-zero R 2 and high RMSE, | | and . The proposed models in this study remarkably improve the results with a very good correlation coefficient R 2 (>0.8) and lower RMSE and bias.

Discussions
At local and regional scales, previous studies have been conducted on the connections between hydrological and environmental/climatic variations in arid and semi-arid regions. A study at China's Yarlung Zangbo River (YLZR) basin showed the negatively correlated relationship between the climatic indices, including Pacific decadal oscillation (PDO) and the multivariate El Niño-Southern Oscillation (ENSO) index (MEI)) and precipitation, annual average temperature, and the streamflow of selected hydrological stations [67]. Similarly, this negative relationship between precipitation and ENSO indices was also found for Jana and Karan Islands in the nearshore Saudi Arabia area [36]. The monthly precipitation datasets (the time period 1961-2007) were compared over the arid Balochistan province of Pakistan, showing an increase in drought with global warming, indicated by the time series analysis of gauge-based records and climate models [68]. The study in the Upper Omo-Ghibe river basin of Ethiopia showed the trends in precipitation, temperature, and streamflow in the period from 1981 to 2008. A detailed review of the studies in global-scale monthly hydroclimatic time series was reported [69]. Therefore, our satellite-based approach on the analysis of monthly hydroclimatic time series during the last decade (2010-2019) is an updated continuing study of the dryland hydrology at a global scale. However, one of the fundamental challenges of dryland remote sensing is the decoupling of the ecosystem function (e.g., vegetation photosynthetic ability and stomatal conductance) from optical reflectance [70]. During the drought periods of extreme moisture shortage, some vegetation types, especially drought-tolerant evergreen shrubs, can keep constant greenness even if their vegetation and soil function decrease. This situation can be found in the boxplots in Figures 2-5, where it is noted that the indices (FPAR, LAI and NDVI) do not have strong responses for ET as compared to the precipitation pulses. Additionally, high aerosol concentrations, leaves

Discussions
At local and regional scales, previous studies have been conducted on the connections between hydrological and environmental/climatic variations in arid and semi-arid regions. A study at China's Yarlung Zangbo River (YLZR) basin showed the negatively correlated relationship between the climatic indices, including Pacific decadal oscillation (PDO) and the multivariate El Niño-Southern Oscillation (ENSO) index (MEI)) and precipitation, annual average temperature, and the streamflow of selected hydrological stations [67]. Similarly, this negative relationship between precipitation and ENSO indices was also found for Jana and Karan Islands in the nearshore Saudi Arabia area [36]. The monthly precipitation datasets (the time period 1961-2007) were compared over the arid Balochistan province of Pakistan, showing an increase in drought with global warming, indicated by the time series analysis of gauge-based records and climate models [68]. The study in the Upper Omo-Ghibe river basin of Ethiopia showed the trends in precipitation, temperature, and streamflow in the period from 1981 to 2008. A detailed review of the studies in global-scale monthly hydroclimatic time series was reported [69]. Therefore, our satellite-based approach on the analysis of monthly hydroclimatic time series during the last decade (2010-2019) is an updated continuing study of the dryland hydrology at a global scale. However, one of the fundamental challenges of dryland remote sensing is the decoupling of the ecosystem function (e.g., vegetation photosynthetic ability and stomatal conductance) from optical reflectance [70]. During the drought periods of extreme moisture shortage, some vegetation types, especially drought-tolerant evergreen shrubs, can keep constant greenness even if their vegetation and soil function decrease. This situation can be found in the boxplots in  where it is noted that the indices (FPAR, LAI and NDVI) do not have strong responses for ET as compared to the precipitation pulses. Additionally, high aerosol concentrations, leaves covered by dust, marked adjacency effect, as well as surface anisotropy in sparse and heterogeneous canopies, make high-quality surface reflectance retrievals over drylands still a challenging process [71][72][73]. In arid and semi-arid ecosystems, the presence of soil background, standing litter and senesced vegetation, can have a significant impact on the reflectance and vegetation indices (e.g., NDVI), also adding considerable noise to LAI and the retrieval of FPAR [74][75][76][77]. The spurious changes induced by variable soil background reflectance also make it difficult to validate the actual vegetation change [76,78,79]. Therefore, the Soil-Adjusted Vegetation Index (SAVI) was introduced with a soil adjustment factor (L), typically set as L = 0.5 [75], to account for differences from the soil background in red and near-infrared reflectance. However, SAVI is limited to differentiate vegetation from soil under sparse vegetation cover conditions [79]. Prior knowledge is also required to select the optimal L value [75]. In addition, one of the major assumptions of global MODIS LAI and FPAR products is that each pixel contains only one biome type with constant within-biome soil and vegetation properties [80]. However, this simplified assumption is unlikely to hold in arid and semi-arid areas due to their heterogeneous characteristics.
Sustainable water resources management requires knowledge regarding the effects of land use and land cover change (LULCC) on groundwater recharge and surface runoff [81]. The tropical and sub-tropical semi-arid regions use groundwater to meet improving urban, industrial and agricultural water requirements [82]. Hence, a thorough understanding of these effects and the subsequent changes is required to maintain a sustainable water supply in the arid and semi-arid regions. In response to the growing food demand of an increasing human population, semi-arid regions are experiencing a major conversion of natural vegetation to agricultural land, leading to land exploitation due to agricultural practices [83]. Figure 7 shows the situation of land cover changes over drylands, where some hotspot regions were also experiencing LULCC along with groundwater variations. Some regions in Iraq depend on groundwater for sprinkler irrigation to overcome the issue of less water due to a shortage of rainfall. Before this, groundwater was one of the main sources of freshwater and for domestic uses. Because of this change, many government and private wells have been drilled to use for irrigation [84]. In Syria, during the conflict with ISIS, the dependence on groundwater for agriculture grew at an alarming rate, as ISIS controlled the Euphrates river dam [85]. In many parts of Ethiopia, such as the Shinile catchment found in the Somali regional state, the pastoral and semi pastoral communities travel a long distance in search of water and grazing land. It is largely an arid and semi-arid watershed with a large potential for groundwater which the communities are unaware of [86]. New South Wales in Australia has undergone transitions between pasture/scrubland, vineyard and built-up land [87]. These transitions due to tourism and economic development would affect the groundwater recharge in the region. In Argentina, there is evidence of an increase in groundwater recharge in areas where croplands replaced forests [88]. Some studies also show that land cover changes influence ecosystem water fluxes as well as salinity patterns [89,90]. It is noted that land cover and groundwater change during the later 2010s is possibly connected with global warming conditions between 2014 and 2019, which are among the warmest years on record, and the year 2020 is likely to become the second warmest year on record (https://www.ncdc.noaa.gov/cag/global/haywood/globe/land_ocean).
Both GLDAS and FLDAS do provide a suite of outputs over a global domain in order to provide physically consistent and spatially and temporally continuous products that are critical for global dryland studies. In particular, FLDAS has been recommended in hydroclimate studies and early warning applications over areas of Eastern, Southern, and Western Africa [57]. However, in Figure 9, the strong discrepancy between GLDAS and FLDAS for certain areas excludes North Africa from the recommendatory list. The analysis in Figures 10 and 11 shows the modeling biases in cold and dry areas from both FLDAS and GLDAS SSM/ET products, showing that they systematically overestimate the SSM and underestimate the ET, respectively. However, the higher SSM values from models may be due to their soil layer depth being configured as 0-10 cm, while the satellite generally detects the soil moisture at depths of 0-5cm. A study on the central Tibetan Plateau [91] suggested that the modeling biases were contributed by soil texture and soil organic carbon (SOC) content that alters soil thermal/hydraulic properties, which causes the significant underestimation of the surface soil moisture. The present study also suggested that products retrieved from the Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) evidently overestimated soil moisture, indicating that the satellite-based retrieval algorithms must be improved for the cold semi-arid regions. In arid and semi-arid ecosystems, it is still a theoretical and technical challenge to partition ecosystem scale ET fluxes between plant transpiration and soil/canopy evaporation [92]. Both numerical models and satellite products require ground-based data for calibration and validation. However, the observational data (chiefly from flux towers) are very limited, which results in inaccurate, poorly constrained estimates of energy, water, and carbon fluxes in the dryland to study its ecosystem's structural and functional dynamics.
In Figures 12 and 13, the proposed SSM/ET products by using machine-learning models demonstrate an appreciable improvement over the LDAS model. The previous studies applying similar automatic methodologies have been discussed [35,40,93], highlighting the importance of balancing the performance and explainability of the machine learning models for scientific studies. The machine learning models, along with numerical models based on physical processes (e.g., FLDAS), are the most important sources for hydrological studies in the dryland countries due to their limited scientific data and technological ability. For example, countries in the Nile River Basin (shared by Egypt, Sudan, Tanzania, Ethiopia, etc.) are undergoing accelerated population growth and urbanization [39]. The ongoing construction and filling of Ethiopia's Grand Ethiopian Renaissance Dam have brought up intensifying tensions between Ethiopia and downstream dryland countries including Egypt and Sudan. However, scientific studies are restricted to support the decision making and collaborations due to limited on-site observations, while the number of functional hydrometeorological gauging stations declined by 88% between the early 1960s and 2014 [94]. It is noted that our approach to the correction of the process-based hydrological models is also known as a process of hydrological post-processing to obtain "point" predictions. By contrast, new data-driven practices [95,96] of delivering "probabilistic" hydrological predictions (e.g., quantile regression [97]) were introduced recently with their potential in expressing complex hydrological processes considering predictive uncertainties [98].
Fortunately, new and emerging advances leverage significant opportunities for overcoming previous challenges in dryland remote sensing and thus revolutionize our ability to contextualize arid and semi-arid regions within the broader Earth system. Accordingly, the following strategies are recommended [99]: (1) exploring novel fusions of sensors and techniques across different spatiotemporal scales (e.g., solar-induced fluorescence, thermal, microwave, hyperspectral, and LiDAR) to understand dryland structural and functional dynamics; (2) capturing instant responses of dryland ecosystems to diurnal variation in water stress by utilizing near-continuous observations from geostationary satellites; (3) building up new ground observational networks for validation and calibration to better represent the heterogeneity of dryland system; (4) developing algorithms specifically for dryland ecosystems; (5) coupling physical process-based models with remote sensing observations using data assimilation to improve ecological forecasts and long-term projections.

Conclusions
In this paper, we investigated the hydrological conditions over global arid and semi-arid regions, including areas of BSh (Hot semi-arid climate), BSk (Cold semi-arid climate), BWh (Hot desert climate) and BWk (Cold desert climate), through synergistic approaches of remote sensing and modelling. The time series analysis during the last decade (2010-2019) of multiple hydrological variables (ET, SSM and precipitation) and vegetation indices (FPAR, LAI and NDVI) highlighted precipitation pulses that showed less impact on the vegetation indices compared to ET and SSM in some arid regions. Moreover, the trends of ET, SSM and precipitation, as well as land cover changes, that were demonstrated over the same period, emphasized drier trends in the south of Africa and the east of Australia and wetter trends in Mesopotamia and North America. Finally, the comparison of widely-used FLDAS and GLDAS models showed a general consistency, while a strong discrepancy was observed in the downstream Nile Basin, also demonstrating the fact that both LDAS models overestimate SSM and underestimate ET while model outputs of hot areas have better consistency with satellite products than cold areas.
This work further confirmed the scientific fact of decoupling between ecosystem function and observed optical characteristics (e.g., greenness indicated by vegetation indices), which calls for novel techniques and applications of remote sensing in the arid and semi-arid regions. Some areas (e.g., Mesopotamia, Eastern Africa, Australia and Argentina) experienced intense land cover changes along with noted water storage variations, indicating complex hydrological processes which need further investigations. Our data-driven modelling as a post-processing approach effectively improved the process-based models' performance and recommended that both process-based and satellite-based hydrological products have a higher potential to be improved for the arid and semi-arid regions. However, considering complex hydrological processes and predictions uncertainties, new practices of delivering data-driven probabilistic modelling are suggested for future studies. In addition, the study was fully deployed on the open-source cloud computing and analytical platforms (Google Earth Engine and BigML), which posed their great potential in other global-scale environmental studies.