A Novel Scheme for Merging Active and Passive Satellite Soil Moisture Retrievals Based on Maximizing the Signal to Noise Ratio

In this research, we developed and evaluated a new scheme for merging soil moisture (SM) retrievals from passive and active microwave satellites. The scheme maximizes the signal-to-noise ratio and uses least-squares theory to produce improved SM products. The fractional mean-squared-error (fMSE) derived from the triple collocation method (TCM) was used for this purpose. A threshold between signal and noise at fMSE equal to 0.5 was applied to retain the high-quality SM observations. For regions where TCM is unreliable, we propose four scenarios based on the significance levels (i.e., p-values) of the correlations between the three SM products of the TCM triplet. The proposed scheme was applied to combine SM retrievals from Soil Moisture Active Passive (SMAP), Advanced Scatterometer (ASCAT), and Advanced Microwave Scanning Radiometer 2 (AMSR2) to produce SMAP+ASCAT and AMSR2+ASCAT SM datasets at a global scale for the period from June 2015 to December 2017. The performance of the merged SM datasets was assessed against ground measurements from the international soil moisture network (ISMN) and against the Global Land Data Assimilation System-Noah (GLDAS-Noah) and ERA5 datasets. The results show that the two merged SM datasets improved significantly on their parent products, with higher average temporal correlation coefficients (R) and lower root mean squared differences (RMSE) against in-situ measurements over different ISMN networks. Moreover, these datasets outperformed their parent products over different land cover types in most regions of the world, with higher overall average temporal R and lower overall average RMSE against GLDAS and ERA5. In addition, the suggested scenarios improved SM performance in the regions where TCM is unreliable.


Introduction
Soil moisture (SM) is a critical factor in atmosphere-land interactions, the control of energy and water fluxes, and carbon exchange. Therefore, an understanding of SM dynamics informs applications in agricultural research, hydrology, drought monitoring, and climate change monitoring [1][2][3][4]. The most widely used sources of SM data are ground measurements, remotely sensed retrievals, and values extracted from land surface models [5,6].

Reanalyzed SM Datasets
Ground SM observations can be considered the ideal reference dataset for evaluating the performance of SM datasets [38,39]. However, ground observations are limited by their spatial coverage and by mismatches between them and the footprints of large-scale satellite measurements [40][41][42]. To overcome these challenges, SM retrievals can be assessed by comparison with reanalysis datasets [43].
The Global Land Data Assimilation System (GLDAS) was developed by the National Oceanic and Atmospheric Administration (NOAA) and NASA. The GLDAS-Noah model provides 3-hourly SM time series for four layers of different depths at a spatial resolution of 0.25° [44]. The SM of the top layer (0-10 cm) from GLDAS Noah version 2.1 was used as one of the triple collocation method (TCM) datasets and to assess the performance of the merged and parent SM datasets. GLDAS Noah has been used as a reference dataset in several studies on blending satellite SM datasets [13,22,45]. Moreover, several studies have used the top 10 cm SM layer from reanalysis products to validate and improve SM retrievals from different satellites [35,46].
The daily average of the ERA5 reanalysis SM dataset, at a spatial resolution of 0.25° for the period from June 2015 to December 2017, was used to evaluate the performance of the merged and parent SM datasets. The European Center for Medium-Range Weather Forecasts (ECMWF) provides the global ERA5 atmospheric reanalysis to the public. Previous global-scale evaluations have shown that ERA5 is a reliable reference in the absence of ground measurements [47][48][49]. More information about this product is available at https://apps.ecmwf.int/datasets/.

In-Situ SM Dataset
The ground SM measurements from various networks of the international soil moisture network (ISMN) were used to evaluate the performance of the merged and parent SM products. The locations of the ISMN ground stations used in this study are displayed in Figure 1. The ISMN is a website that stores and organizes SM measurements from in-situ stations of different global networks and makes them freely available at https://ismn.geo.tuwien.ac.at. The ISMN data were collected from approximately 1400 ground stations. The ISMN SM measurements are important for evaluating the performance of SM retrievals from satellites and land surface models and for studying the climate [15,40,50]. The main features of the ISMN validation sites used in this study are listed in Table 1.

Land Cover Dataset
The quality of SM data from both satellites and land surface models changes with land cover type [14,57]. In this study, the MODIS (Moderate Resolution Imaging Spectroradiometer) land cover product (MCD12Q1), with a grid resolution of 0.5 km, was used to assess the performance of the merged and parent SM products over different land covers on a global scale. The MCD12Q1 product provides yearly global land cover from 2001 to the present and is freely available at https://search.earthdata.nasa.gov/ [58]. Figure 2 shows a global map of the MODIS land cover.

Methods
A flowchart of the proposed blending scheme for SM datasets is shown in Figure 3. The blending scheme is based on maximizing signal-to-noise ratios. The fractional mean-squared-error (fMSE) estimates derived from the TCM were used to estimate the weights with the least-squares method. A threshold between signal and noise at fMSE equal to 0.5 was applied to retain the high-quality SM observations. For regions where TCM is unreliable, we proposed four scenarios based on the significance levels (i.e., p-values) of the correlations between the three SM datasets of the TCM triplet. These scenarios are: using the passive SM product only, using the active SM product only, using an unweighted average, or excluding the pixels without any retrieval skill. The proposed merging scheme was applied to SM retrievals from ASCAT, SMAP, and AMSR2 for the period from June 2015 to December 2017 to produce the SMAP+ASCAT and AMSR2+ASCAT SM datasets. The two merged SM datasets were evaluated against SM estimates from in-situ observations of the ISMN and from reanalysis datasets (GLDAS and ERA5). More details about the merging scheme are given in the following sub-sections.


Maximized SNR for Merging SM Datasets
There are different ways to define optimality. In this research, the least-squares method was used to determine the optimal weights. Least-squares has been widely used for data assimilation since its development [59,60]. The method uses a weighted average to combine the parent datasets of the merged product [22,61,62]. The original formula of the weighted mean can be written as,

$$SM_c = \sum_{i} W_i \, SM_i \qquad (1)$$

where $SM_c$ is the averaged SM estimate; $SM_i$ are the SM estimates of the different datasets; and $W_i$ are the weights given to the SM estimates when the output product is blended as a linear combination of single products. To merge the independent, uncorrelated observations of active and passive satellite SM, Equation (1) can be expressed as,

$$SM_c = W_{ac} \, SM_{ac} + W_{pa} \, SM_{pa} \qquad (2)$$

where $W_{ac}$ and $W_{pa}$ are the weights of the active and passive satellite SM datasets, and $SM_{ac}$ and $SM_{pa}$ are the values of the active and passive SM retrievals, respectively. To avoid making the relative weights of the merged product depend on a specific model, the weights given to the satellite datasets were calculated by maximizing the signal-to-noise ratios of the SM estimates. The fractional mean-squared-error (fMSE) was used for this purpose. The fMSE is a normalized representation of the signal-to-noise ratio (SNR) and ranges from 0 to 1. A higher fMSE indicates a noisier SM signal, and a lower fMSE a clearer one. An fMSE of 0 means that the SM observations are free from noise, and an fMSE of 1 means that the SM observations contain only noise. The SM signal is stronger than its noise when the fMSE is lower than 0.5 [13,20].
The weights can be determined using the following equations,

$$W_{ac} = \frac{fMSE_{pa}}{fMSE_{ac} + fMSE_{pa}} \qquad (3)$$

$$W_{pa} = \frac{fMSE_{ac}}{fMSE_{ac} + fMSE_{pa}} \qquad (4)$$

where $W_{ac}$ and $W_{pa}$ are the weights of the active and passive SM products, respectively, and $fMSE_{ac}$ and $fMSE_{pa}$ are the fractional mean squared errors of the active and passive SM datasets, respectively. To retain the high-quality SM observations, the proposed scheme was applied using a threshold between signal and noise at fMSE equal to 0.5. The different cases for using the fMSE threshold to retain high-quality SM retrievals are listed in Table 2. The weighted average was used only when the signal of both the active and passive datasets exceeds the noise (i.e., fMSE < 0.5) or when the noise of both exceeds the signal (i.e., fMSE > 0.5). When the signal of one parent dataset exceeds the noise (i.e., fMSE < 0.5) and the noise of the other exceeds the signal (i.e., fMSE > 0.5), only the dataset with the stronger signal is used.
Moreover, prior to merging the datasets with the estimated weights, the systematic differences between them were removed. This was achieved by rescaling the SM products into a mutual data space. In this research, the fMSE estimates and rescaling coefficients were derived using TCM [20,63,64,65].
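The weighting and threshold logic described above can be sketched for a single pixel as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the inputs are assumed to be already rescaled, and the handling of an fMSE exactly equal to 0.5 (treated here as "noise not exceeded by signal") and of two zero fMSE values (equal weights) are assumptions that the text does not specify.

```python
FMSE_THRESHOLD = 0.5  # boundary between signal-dominated and noise-dominated retrievals


def merge_weights(fmse_ac, fmse_pa, threshold=FMSE_THRESHOLD):
    """Return (w_ac, w_pa) for one pixel from the fMSE of each parent dataset."""
    ac_good = fmse_ac < threshold  # active signal stronger than its noise
    pa_good = fmse_pa < threshold  # passive signal stronger than its noise
    # One dataset is signal-dominated and the other is not: keep only the good one.
    if ac_good and not pa_good:
        return 1.0, 0.0
    if pa_good and not ac_good:
        return 0.0, 1.0
    # Both good or both noisy: each weight is proportional to the other's fMSE.
    total = fmse_ac + fmse_pa
    if total == 0.0:
        return 0.5, 0.5  # assumption: two noise-free datasets share the weight
    return fmse_pa / total, fmse_ac / total


def merge_pixel(sm_ac, sm_pa, fmse_ac, fmse_pa):
    """Linear combination of the (already rescaled) active and passive SM values."""
    w_ac, w_pa = merge_weights(fmse_ac, fmse_pa)
    return w_ac * sm_ac + w_pa * sm_pa
```

For example, with fMSE values of 0.2 (active) and 0.3 (passive), both datasets are signal-dominated and the active product receives the larger weight, because its fMSE is lower.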

Triple Collocation Method
The triple collocation method (TCM) was used to determine the fMSE and rescaling coefficients of the active and passive SM datasets. TCM is considered a promising method for validating remotely sensed SM datasets [63,65], because the random error variance and SNR of these datasets can be estimated at large scales without the need for ground reference information [66]. Furthermore, it provides an optimal solution for rescaling SM datasets that takes into consideration their individual random error properties and matches the variability of the jointly observed signal [67]. Further details about the TCM derivation are presented in previous research [20,68].
To remove the dependence of the scaled error pattern on the spatial climatology of the selected scaling reference, Draper et al. [68] suggested normalizing the estimates of the un-scaled error variance with the variance of the corresponding datasets. The fMSE of the active and passive SM datasets can be determined as,

$$fMSE_{ac} = \frac{1}{1 + SNR_{ac}} \qquad (5)$$

$$fMSE_{pa} = \frac{1}{1 + SNR_{pa}} \qquad (6)$$

where $SNR_{ac}$ and $SNR_{pa}$ are the signal-to-noise ratios of the active and passive SM products, respectively. The signal-to-noise ratios of the active and passive SM datasets can be calculated using the following formulas,

$$SNR_{ac} = \frac{\sigma_{ac,pa} \, \sigma_{ac,mo}}{\sigma^2_{ac} \, \sigma_{pa,mo}} \qquad (7)$$

$$SNR_{pa} = \frac{\sigma_{pa,ac} \, \sigma_{pa,mo}}{\sigma^2_{pa} \, \sigma_{ac,mo}} \qquad (8)$$

where $\sigma_{ac,pa}$, $\sigma_{pa,mo}$, and $\sigma_{ac,mo}$ are the covariances between the datasets (the subscript $mo$ denotes the model dataset), and $\sigma^2_{ac}$ and $\sigma^2_{pa}$ are the variances of the active and passive datasets, respectively. Draper et al. [69] recommended normalizing the SM datasets against one reference dataset to remove the systematic differences between them. To remove these systematic differences, the rescaling coefficients are estimated as,

$$B^{RES}_{pa} = \frac{\sigma_{ac,mo}}{\sigma_{pa,mo}}$$

$$SM^{RES}_{pa} = B^{RES}_{pa} \left( SM_{pa} - \overline{SM}_{pa} \right) + \overline{SM}_{ac}$$

where $B^{RES}_{pa}$ is the coefficient used to linearly rescale the passive SM dataset against the active one, and $\overline{SM}_{pa}$ and $\overline{SM}_{ac}$ are the averages of the SM retrievals of the passive and active datasets, respectively. The choice of reference dataset in the rescaling process does not matter, as it does not affect the rescaled dataset or the merged time series.
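The relation between SNR and fMSE, and the linear rescaling of the passive dataset into the data space of the active one, can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the SNR is taken as a linear ratio (not in decibels), fMSE is computed as 1 / (1 + SNR), and the rescaling coefficient is taken as the ratio of the covariances of each satellite dataset with the model dataset, which is a common triple-collocation choice. The three input series are assumed to be collocated and free of gaps.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance of two equally long, collocated series."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def fmse_from_snr(snr):
    """Fractional mean-squared-error for a linear signal-to-noise ratio."""
    return 1.0 / (1.0 + snr)


def rescale_passive_to_active(sm_pa, sm_ac, sm_mo):
    """Linearly rescale the passive series against the active series.

    Assumption: the coefficient is cov(active, model) / cov(passive, model).
    """
    beta = covariance(sm_ac, sm_mo) / covariance(sm_pa, sm_mo)
    mean_pa, mean_ac = mean(sm_pa), mean(sm_ac)
    return [beta * (value - mean_pa) + mean_ac for value in sm_pa]
```

With this convention an SNR of 1 (signal variance equal to noise variance) gives an fMSE of exactly 0.5, which is the threshold used by the merging scheme.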

Blending of Unreliable TCM Observations
TCM is deemed unreliable when the number of observations is limited (i.e., <100). This occurs when one or more of the SM datasets in the TCM triplet has retrieval issues such as spatial coverage gaps due to masking procedures or radio frequency interference (RFI), differences in overpass times, or low time-series coverage. TCM results can also be unreliable if one or more SM datasets of the TCM triplet are not significantly correlated with the other datasets, as indicated by a Student's t-test [42,65]. Unreliable areas may be found in certain regions, such as deserts and high latitudes.
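The two reliability conditions above can be expressed as a simple per-pixel check. The sketch below is only illustrative: the function names are hypothetical, missing retrievals are assumed to be stored as None or NaN, the pairwise p-values are assumed to have been computed beforehand with a t-test on the correlations, and the significance level of 0.05 is taken from the criteria used in this work.

```python
import math

MIN_OBSERVATIONS = 100  # fewer collocated triplets than this is deemed unreliable
ALPHA = 0.05            # significance level for the pairwise correlations


def count_collocated(sm_ac, sm_pa, sm_mo):
    """Count time steps at which all three datasets have a valid value."""
    def is_valid(value):
        return value is not None and not math.isnan(value)

    return sum(
        1
        for a, p, m in zip(sm_ac, sm_pa, sm_mo)
        if is_valid(a) and is_valid(p) and is_valid(m)
    )


def tcm_is_reliable(n_collocated, pairwise_p_values,
                    min_obs=MIN_OBSERVATIONS, alpha=ALPHA):
    """True if there are enough triplets and every pairwise correlation is significant."""
    enough_data = n_collocated >= min_obs
    all_significant = all(p < alpha for p in pairwise_p_values)
    return enough_data and all_significant
```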
In this work, we proposed four scenarios to increase the temporal and overall spatial coverage of the merged datasets in regions where the TCM is unreliable. These scenarios are: passive SM product only, active SM product only, an unweighted average, or excluded pixels. The selection of a suitable scenario is based on the retrieval quality of the SM datasets. The correlations between the three SM datasets in the TCM (i.e., passive and active products, active product and reanalysis, passive product and reanalysis) were tested for significance (i.e., p-value < 0.05) as an indication of retrieval quality. The criteria for choosing the different scenarios in unreliable TCM regions are as follows:

• Only the active SM retrievals were used in areas where the passive product is not significantly correlated with either the model or the active product, while the active product is significantly correlated with the model. Moreover, the active SM pixels were used in regions where the correlations of the active product with both the passive product and the model are significant, while the passive product is not significantly correlated with the model.

• Only the passive SM dataset was used in regions where the active product is not significantly correlated with either the model or the passive product, while the passive product is significantly correlated with the model. The passive SM product is also the best choice if the correlations of the passive product with both the active product and the model are significant, while the active product is not significantly correlated with the model.

• The unweighted average was used when the correlations of both the active and passive SM datasets with the reanalysis are significant, but their correlation with each other is not. The unweighted average also outperformed the other methods when the active and passive SM datasets are not significantly correlated with the model but the correlation between the active and passive products is significant.

• Pixels were excluded where none of the correlations between the three SM datasets of the TCM is significant. These pixels also showed an insignificant correlation against the independent reference dataset under both the single-sensor and the unweighted-mean scenarios.
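The selection criteria listed above can be written as a small decision function over the three pairwise significance flags. This is a sketch under assumptions, not the authors' implementation: the function and label names are hypothetical, the second unweighted-average case is read as "both products insignificant against the model but significant against each other", and the case in which all three correlations are significant is not covered by the four scenarios, so it is returned as a separate label.

```python
def select_scenario(sig_ac_pa, sig_ac_mo, sig_pa_mo):
    """Choose a blending scenario for a pixel where TCM is unreliable.

    Each argument is True when the corresponding correlation is significant:
    active vs. passive, active vs. model, passive vs. model.
    """
    if sig_ac_mo and not sig_pa_mo:
        return "active_only"   # only the active product agrees with the model
    if sig_pa_mo and not sig_ac_mo:
        return "passive_only"  # only the passive product agrees with the model
    if sig_ac_mo and sig_pa_mo:
        # Both agree with the model; average if they disagree with each other.
        return "not_covered" if sig_ac_pa else "unweighted_average"
    # Neither agrees with the model (assumed reading of the second average case).
    return "unweighted_average" if sig_ac_pa else "exclude"


def blend_pixel(scenario, sm_ac, sm_pa):
    """Apply the chosen scenario to one pair of rescaled SM values."""
    if scenario == "active_only":
        return sm_ac
    if scenario == "passive_only":
        return sm_pa
    if scenario == "unweighted_average":
        return 0.5 * (sm_ac + sm_pa)
    if scenario == "exclude":
        return None  # pixel without any retrieval skill
    raise ValueError("scenario not defined for this pixel: " + scenario)
```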

Hovmöller Diagrams
SM varies strongly in time and space. This variability depends mainly on latitude or longitude and on season [23,70]. Hovmöller diagrams were used to study the spatiotemporal variability of the merged and parent SM datasets [14,71,72]. A Hovmöller diagram represents the time variability of spatial data. Time is presented on the x-axis, and the values of the datasets, averaged over either all latitudes or all longitudes, are displayed along the y-axis [73]. In this research, the longitudinal averages of the SM values were used to study the consistency between the parent and merged SM products.
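The longitudinal averaging behind such a diagram can be sketched as follows. The example assumes a hypothetical layout in which the SM field is a nested list indexed as field[time][latitude][longitude], with missing retrievals stored as NaN; the function name is also hypothetical. The result has one row per latitude and one column per time step, which can then be drawn with time on the abscissa and latitude on the ordinate.

```python
import math


def longitudinal_means(field):
    """Average an SM field over longitude, ignoring NaN values.

    field[t][i][j] holds SM at time step t, latitude index i, longitude index j.
    Returns hov[i][t]; a cell stays NaN if a latitude band has no valid value.
    """
    n_time = len(field)
    n_lat = len(field[0])
    hov = [[math.nan] * n_time for _ in range(n_lat)]
    for t, grid in enumerate(field):
        for i, row in enumerate(grid):
            valid = [value for value in row if not math.isnan(value)]
            if valid:
                hov[i][t] = sum(valid) / len(valid)
    return hov
```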

Error Statistics
In this research, statistical metrics were used to evaluate the merged and parent SM datasets against the reference (in-situ and reanalysis) datasets. These metrics are the Pearson correlation coefficient (R) and the root mean squared difference (RMSE) [74]. The R and RMSE between the SM datasets and the reference datasets are calculated using the following equations,

$$R = \frac{E\left[ \left( SM(t) - E[SM(t)] \right) \left( SM_r(t) - E[SM_r(t)] \right) \right]}{\sigma \, \sigma_r}$$

$$RMSE = \sqrt{E\left[ \left( SM(t) - SM_r(t) \right)^2 \right]}$$

where $E[\cdot]$ represents the average value; $t$ is the time index of the dataset; $SM(t)$ represents the SM observations of the merged or parent dataset at time $t$; $SM_r(t)$ represents the SM observations of the reference dataset; $\sigma$ represents the standard deviation of the merged or parent SM dataset; and $\sigma_r$ represents the standard deviation of the reference SM dataset.
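Both metrics can be computed for a single pixel or station as in the sketch below. The function names are hypothetical, and the two series are assumed to be collocated in time with missing values already removed; the averages and standard deviations are taken over the same set of time steps.

```python
import math


def pearson_r(sm, sm_ref):
    """Pearson correlation coefficient between a dataset and its reference."""
    n = len(sm)
    mean_sm = sum(sm) / n
    mean_ref = sum(sm_ref) / n
    cov = sum((a - mean_sm) * (b - mean_ref) for a, b in zip(sm, sm_ref)) / n
    std_sm = math.sqrt(sum((a - mean_sm) ** 2 for a in sm) / n)
    std_ref = math.sqrt(sum((b - mean_ref) ** 2 for b in sm_ref) / n)
    return cov / (std_sm * std_ref)


def rmse(sm, sm_ref):
    """Root mean squared difference between a dataset and its reference."""
    n = len(sm)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sm, sm_ref)) / n)
```

A series that differs from its reference only by a constant offset has R equal to 1 but a non-zero RMSE, which is why the two metrics are reported together.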

Spatiotemporal Variability of SM Datasets
The performance of the merged datasets can be examined by constructing Hovmöller diagrams. Hovmöller diagrams of the SMAP+ASCAT, AMSR2+ASCAT, SMAP, AMSR2, ASCAT, GLDAS, and ERA5 SM datasets on the global scale are presented in Figure 4. In these diagrams, time from June 2015 to December 2017 is represented on the abscissa and latitude from 90°S to 90°N on the ordinate. These diagrams show that the overall global spatial patterns of all SM datasets agree.

Optimal Weighting Factors
The optimal weights indicate the relative contribution of the remotely sensed SM datasets against each other. The optimal weights also provide evidence about the weaknesses and strengths of the satellite SM products over different land covers. The relative optimal weights of the parent SM products (SMAP, AMSR2, and ASCAT) for the SMAP+ASCAT and AMSR2+ASCAT merged datasets are presented in Figures 5 and 6. These weights were derived from the proposed scheme as discussed. In both merged SM datasets, blue indicates that the majority of the weight comes from the ASCAT product, while red indicates that the majority comes from SMAP or AMSR2. ASCAT was given a higher relative weight than SMAP and AMSR2 in their merged products over more densely vegetated regions (forests, savannas, and crops; see the land cover map in Figure 2), which are mostly concentrated in high-temperature regions below and above the equator. In contrast, SMAP and AMSR2 were given higher relative weights than ASCAT in the SMAP+ASCAT and AMSR2+ASCAT merged datasets over areas of moderate and sparse vegetation (grassland, shrub, and desert; see the land cover map in Figure 2).
Comparing the weights given to SMAP and AMSR2 in their merged datasets over the various land covers, we found that SMAP was given a higher relative weight than AMSR2.

Evaluation of Merged SM Datasets
A key objective of the proposed scheme was to develop two merged SM products that outperform their parent products. Moreover, the selection of an SM dataset for some critical applications is based on its quality or accuracy. Therefore, the performance of the SMAP+ASCAT and AMSR2+ASCAT merged SM products was assessed by comparison with independent reference datasets, including in-situ measurements, GLDAS, and ERA5, using the R and RMSE metrics.

Validation of Merged Products against In-Situ Measurements
The SMAP+ASCAT and AMSR2+ASCAT products were assessed against the SM measurements of ISMN networks using correlation coefficients and RMSE to understand how the merged products perform relative to their parent products. The parent products (SMAP, AMSR2, and ASCAT) were evaluated against the same ISMN networks, which indicates the strengths of these products and how they contribute to the merged products. The merged and parent datasets were compared with the ground stations of six networks available from the ISMN and concentrated over America and Europe: PBO-H2O, SNOTEL, SCAN, HOPE, SMOSMANIA, and USCRN. The overall averages of R and RMSE for the comparison of the merged and parent SM datasets with the in-situ SM observations of the different ISMN networks are listed in Table 3. The overall averages of the correlation coefficients for all ISMN networks were 0.75, 0.70, 0.73, 0.66, and 0.60 for SMAP+ASCAT, AMSR2+ASCAT, SMAP, ASCAT, and AMSR2, respectively. In terms of overall average correlation, each merged dataset showed a higher R with the in-situ measurements over the six ISMN networks than either of its own parent datasets (SMAP and ASCAT for SMAP+ASCAT; AMSR2 and ASCAT for AMSR2+ASCAT).
The overall averages of RMSE for all ISMN networks were 0.073, 0.091, 0.079, 0.098, and 0.143 m³ m⁻³ for SMAP+ASCAT, AMSR2+ASCAT, SMAP, ASCAT, and AMSR2, respectively. In terms of overall average RMSE, each merged product showed a lower RMSE with the in-situ measurements over the six ISMN networks than either of its own parent datasets.
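The size of these improvements follows directly from the reported overall averages. The short sketch below simply tabulates the values quoted above and computes, for each merged product, the gain in R and the reduction in RMSE relative to each of its parents; the dictionary and function names are hypothetical.

```python
# Overall averages over all ISMN networks, as reported in the text (RMSE in m3 m-3).
REPORTED = {
    "SMAP+ASCAT": {"R": 0.75, "RMSE": 0.073},
    "AMSR2+ASCAT": {"R": 0.70, "RMSE": 0.091},
    "SMAP": {"R": 0.73, "RMSE": 0.079},
    "ASCAT": {"R": 0.66, "RMSE": 0.098},
    "AMSR2": {"R": 0.60, "RMSE": 0.143},
}

PARENTS = {
    "SMAP+ASCAT": ("SMAP", "ASCAT"),
    "AMSR2+ASCAT": ("AMSR2", "ASCAT"),
}


def gains_over_parents(merged):
    """Return {parent: (gain in R, reduction in RMSE)} for a merged product."""
    result = {}
    for parent in PARENTS[merged]:
        r_gain = REPORTED[merged]["R"] - REPORTED[parent]["R"]
        rmse_drop = REPORTED[parent]["RMSE"] - REPORTED[merged]["RMSE"]
        result[parent] = (round(r_gain, 3), round(rmse_drop, 3))
    return result
```

For SMAP+ASCAT this gives a gain in R of 0.02 over SMAP and 0.09 over ASCAT; for AMSR2+ASCAT the gain is 0.10 over AMSR2 and 0.04 over ASCAT. All RMSE reductions are positive as well.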
To analyze the full performance of the two merged datasets relative to their parent datasets, we constructed box plots. The overall performance of the merged and parent SM products against the in-situ SM observations of the six ISMN networks is displayed in the box plots of Figures 7 and 8. Overall, the SMAP+ASCAT and AMSR2+ASCAT datasets outperformed the parent datasets (SMAP, ASCAT, and AMSR2) over the six ISMN networks. These improvements are visible in the box plots of correlation coefficients for the merged and parent products against ground measurements (see the median, the third quartile Q3, and the first quartile Q1).
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 23

Evaluation of Merged Products against Modeling Data
Although SM measurements from ground stations are considered the most robust and reliable way to assess the performance of satellite SM retrievals, they still have limited temporal and spatial coverage worldwide. Ground observations are also affected by mismatches between them and the footprints of large-scale satellite measurements [40][41][42]. Therefore, to overcome these problems and fully evaluate the merged and parent SM datasets over different land covers, comparison with reanalysis data is the alternative solution. The merged and parent SM products were assessed by comparing them with the GLDAS and ERA5 reanalysis datasets using the R and RMSE metrics. The spatial distributions of the differences in correlation coefficients (against the modeling datasets) between the merged (SMAP+ASCAT and AMSR2+ASCAT) and parent (SMAP, AMSR2, and ASCAT) SM datasets are displayed in Figures 9 and 10.

The SMAP+ASCAT and AMSR2+ASCAT are achieved improvement in temporal correlations coefficients with GLDAS and ERA5 outperforming the parents datasets overall the world. These correlations improvements spatially distributed over different land cover types of all most areas of the world according to the spatial weights patterns of parents' datasets. Over more dense vegetation regions (forests, savannas, and crops) the SMAP and AMSR2 are displayed high R differences with their merged datasets. Over these areas, the SMAP and AMSR2 were given the lower relative weights than ASCAT products for merged products. Conversely, over moderate and less vegetation regions (grassland, shrub, and desert) the ASCAT showed high R differences than their merged products. Over these regions the ASCAT was given the lowest relative weights than SMAP and AMSR2 products in their merged products. Although, the GLDAS is used as one of three datasets of TCM to calculate fMSE however the SMAP+ASCAT and AMSR2+ASCAT SM products showed better performance with ERA5 exceeds the performance with GLDAS. Box plots of correlation coefficients of merged and parents SM datasets against the modeling SM products are presented in Figures 11 and 12. Remote Sens. 2020, 12, x FOR PEER REVIEW 14 of 23 The SMAP+ASCAT and AMSR2+ASCAT are achieved improvement in temporal correlations coefficients with GLDAS and ERA5 outperforming the parents datasets overall the world. These correlations improvements spatially distributed over different land cover types of all most areas of the world according to the spatial weights patterns of parents' datasets. Over more dense vegetation regions (forests, savannas, and crops) the SMAP and AMSR2 are displayed high R differences with their merged datasets. Over these areas, the SMAP and AMSR2 were given the lower relative weights than ASCAT products for merged products. 
The box plots of correlation coefficients for the SMAP+ASCAT, AMSR2+ASCAT, SMAP, ASCAT, and AMSR2 products against the GLDAS and ERA5 reference datasets (see the median, the third quartile Q3, and the first quartile Q1) show that the merged datasets outperformed their parent datasets across the world. The overall averages of the temporal correlation coefficients for the two merged datasets and the parent SM datasets against the modeling SM products are presented in the bar graphs of Figure 13.
In terms of overall average correlation coefficients, the SMAP+ASCAT and AMSR2+ASCAT merged SM datasets showed higher correlations with GLDAS and ERA5 than the parent SM datasets (SMAP, AMSR2, and ASCAT).
The second statistical metric used in this study to assess the performance of the merged products is RMSE. Box plots of the RMSE of the merged (SMAP+ASCAT and AMSR2+ASCAT) and parent (SMAP, AMSR2, and ASCAT) SM datasets against the modeling (GLDAS and ERA5) SM products are presented in Figure 14. In terms of overall average RMSE, the merged SM datasets showed the lowest values against both modeling datasets, outperforming the parent SM products. The box plots of RMSE against the GLDAS and ERA5 datasets likewise show that the merged datasets outperformed the parent datasets across the world (see the median, the third quartile Q3, and the first quartile Q1).
In the regions where triple collocation analysis is unreliable, as mentioned, we proposed four scenarios based on the correlations between all three SM products used in TCM and their significance levels (i.e., p-values < 0.05). These scenarios are: using a passive SM dataset only, using an active SM dataset only, using an unweighted average, or excluding the pixels without any retrieval skill. The independent ERA5 reference product was used to verify the reliability of the scenario selected according to the suggested criteria. The SMAP+ASCAT and AMSR2+ASCAT merged products produced with the single-sensor scenarios (i.e., either active only or passive only) displayed a significant correlation with ERA5. The SMAP+ASCAT and AMSR2+ASCAT datasets produced with the unweighted average scenario, in the case where both parent SM products correlate significantly with GLDAS but not with each other, also showed a high overall correlation with ERA5. At high latitudes, the two merged products produced with the unweighted average scenario, in the case where the active and passive products correlate insignificantly with GLDAS but significantly with each other, showed a much lower overall correlation with ERA5. The pixels at which the correlations between all three SM datasets of TCM are mutually insignificant were excluded from the SMAP+ASCAT and AMSR2+ASCAT products; these pixels also showed an insignificant correlation with the ERA5 dataset under both the single-sensor and the unweighted mean scenarios.
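The per-pixel choice among these scenarios can be sketched as a small decision function over the three pairwise p-values. This is an illustrative reading of the criteria described above, not the authors' implementation. The function and argument names are hypothetical. In particular, the single-sensor condition used here (only one satellite product correlates significantly with GLDAS) is an assumption, because this section does not state that criterion explicitly.

```python
ALPHA = 0.05  # significance level quoted in the text

def select_scenario(p_active_passive, p_active_model, p_passive_model, alpha=ALPHA):
    """Pick a merging scenario for one pixel from pairwise correlation p-values.

    p_active_passive: p-value of the active vs. passive correlation
    p_active_model:   p-value of the active vs. GLDAS correlation
    p_passive_model:  p-value of the passive vs. GLDAS correlation
    """
    ap = p_active_passive < alpha
    am = p_active_model < alpha
    pm = p_passive_model < alpha

    if ap and am and pm:
        return "weighted_average"    # TCM considered reliable
    if am and pm:
        return "unweighted_average"  # both agree with GLDAS, not with each other
    if am:
        return "active_only"         # ASSUMPTION: only active agrees with GLDAS
    if pm:
        return "passive_only"        # ASSUMPTION: only passive agrees with GLDAS
    if ap:
        return "unweighted_average"  # sensors agree with each other only
    return "exclude"                 # no significant correlation at all

# Example: sensors correlate with each other but not with the model.
scenario = select_scenario(0.01, 0.30, 0.40)
```

Ordering the checks from the most to the least restrictive condition keeps each pixel in exactly one scenario.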

Discussion
In this research, we introduce a new scheme that takes advantage of both active and passive satellite SM products to produce an improved SM product. This scheme is based on maximizing signal-to-noise ratios using least-squares theory. Unlike previous studies, which calculated the relative weights of the parent satellite SM datasets by increasing the correlation or decreasing the error with respect to a specific land surface model [13,17,18,75], the proposed scheme does not require a specific land surface model in the combination process. TCM enables us to estimate the sensitivity of the datasets to SM changes in terms of either SNR or fMSE. The weights in the proposed scheme are calculated from fMSE, which, unlike SNR, has a bounded range of values (i.e., from 0 to 1) [20,68]. The proposed scheme was applied to SM retrievals from the SMAP, AMSR2, and ASCAT datasets to produce the SMAP+ASCAT and AMSR2+ASCAT SM datasets.
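To make the role of fMSE concrete, the sketch below estimates fMSE for a triplet with the covariance form of triple collocation, and then turns the two satellite fMSE values into relative weights. The fMSE estimate follows the standard TCM relation fMSE = 1 - (signal variance / total variance). The weight formula is an assumption for illustration: it is the two-product least-squares form for independent errors with fMSE standing in for the error variance, and the exact expression used in the paper is not given in this section. All names and the synthetic data are hypothetical.

```python
import numpy as np

def tcm_fmse(x, y, z):
    """Estimate fMSE of three collocated series via covariance-based TCM.

    Assumes mutually independent, zero-mean errors that are also
    independent of the true signal (the usual TCM assumptions).
    """
    cov = np.cov(np.vstack([x, y, z]))
    # Signal variance of each dataset in its own data space.
    signal_var = np.array([
        cov[0, 1] * cov[0, 2] / cov[1, 2],
        cov[0, 1] * cov[1, 2] / cov[0, 2],
        cov[0, 2] * cov[1, 2] / cov[0, 1],
    ])
    return 1.0 - signal_var / np.diag(cov)

def fmse_weights(fmse_active, fmse_passive):
    """ASSUMED weight form: each product is weighted by the other's fMSE."""
    total = fmse_active + fmse_passive
    return fmse_passive / total, fmse_active / total

# Synthetic triplet with known noise levels (invented for this example).
rng = np.random.default_rng(0)
truth = rng.normal(size=50_000)
active = 1.0 * truth + 0.5 * rng.normal(size=truth.size)   # true fMSE = 0.25 / 1.25 = 0.2
passive = 0.8 * truth + 1.0 * rng.normal(size=truth.size)  # true fMSE = 1 / 1.64, about 0.61
model = 1.2 * truth + 0.3 * rng.normal(size=truth.size)    # third member of the triplet

fmse_a, fmse_p, fmse_m = tcm_fmse(active, passive, model)
w_a, w_p = fmse_weights(fmse_a, fmse_p)
```

In this toy case the less noisy "active" series receives the larger weight, which is the behavior the weighting is meant to produce.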
The proposed scheme respects the TCM assumptions rather than violating them. These assumptions include error independence, the availability of datasets over a long temporal period, and the stationarity of the signal and error estimates [20,64]. To satisfy the assumption of independent errors, we selected triplets whose members are derived differently, because similarly derived datasets may have partially correlated errors [13,14]. This was achieved by calculating TCM twice: once with ASCAT, SMAP, and GLDAS, and once with ASCAT, AMSR2, and GLDAS. The assumption of long time series is necessary to reduce the sampling errors of the TCM estimates; better TCM estimates can be obtained with long time series [75]. This was addressed in our study by applying the proposed scheme to datasets covering the period from June 2015 to December 2017 (two and a half years). In this study, we calculated the weights assuming stationarity of the signal over the study period, and the merged datasets showed better results against the reference datasets, as demonstrated above. However, further studies are recommended, as computing the weights for different seasons may give better results owing to seasonal variations (i.e., rainfall and temperature changes over most regions of the world).
In the proposed scheme, only pixels with high SM retrieval quality are used from the SMAP, AMSR2, and ASCAT SM products. This was achieved by masking these datasets according to their retrieval quality flags, as mentioned in Section 2, before merging them. In addition, the threshold at fMSE equal to 0.5 increased the performance of the merged products, because it helps to select high-quality pixels before the merging process.
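The meaning of the 0.5 threshold can be read from the relationship between fMSE and SNR. As a brief sketch, assuming the standard TCM definitions, with \sigma_s^2 denoting the signal variance of a dataset and \sigma_\varepsilon^2 its error variance (symbols introduced here for illustration):

```latex
\mathrm{SNR} = \frac{\sigma_s^2}{\sigma_\varepsilon^2}, \qquad
\mathrm{fMSE} = \frac{\sigma_\varepsilon^2}{\sigma_s^2 + \sigma_\varepsilon^2}
             = \frac{1}{1 + \mathrm{SNR}}, \qquad
\mathrm{fMSE} = 0.5 \iff \mathrm{SNR} = 1 \ (0\ \mathrm{dB}).
```

Under these definitions, a pixel with fMSE below 0.5 has more signal variance than noise variance, which is why 0.5 serves as the boundary between signal and noise.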
The proposed scheme preserves the variability and seasonal dynamics of SM seen in the parent and reanalysis reference SM products: the SMAP+ASCAT and AMSR2+ASCAT merged products showed the same spatiotemporal variability as the parent and reference datasets.
In the SMAP+ASCAT and AMSR2+ASCAT merged products, ASCAT was given a higher relative weight than SMAP and AMSR2 over more densely vegetated areas (forests, savannas, and crops), which are mostly distributed in the high-temperature regions below and above the equator. This is because ASCAT has higher signal-to-noise ratios than SMAP and AMSR2 over these regions. These results agree with previous studies that found that the performance of C-band active SM retrievals exceeds that of passive SM retrievals in areas with dense vegetation. Moreover, in very high-temperature areas, active sensors are less susceptible to surface temperature than passive sensors. Therefore, over more densely vegetated regions, the ASCAT retrievals showed less sensitivity to surface temperature change [13,14,23,63,76]. On the other hand, SMAP and AMSR2 were given higher relative weights than ASCAT in the SMAP+ASCAT and AMSR2+ASCAT merged datasets over areas of moderate and sparse vegetation (grassland, shrub, and desert). This is because SMAP and AMSR2 showed higher signal-to-noise ratios than ASCAT over these areas. This is consistent with previous studies, which found that passive SM datasets perform better than active sensors over sparse and moderate vegetation, with higher signal-to-noise ratios and lower error variances [13,14]. SMAP was given a higher relative weight than AMSR2 in their respective merged datasets, because SMAP performs better than AMSR2, with higher signal-to-noise ratios. These results agree with previous studies that found that SMAP SM retrievals generally outperform AMSR2 SM data [2,13,15].
The SMAP+ASCAT and AMSR2+ASCAT products showed significant improvements over the parent products, achieving the highest R and lowest RMSE values against ground SM measurements over different ISMN networks. These improvements provide evidence of the reliability of the proposed scheme. Ground measurements have limited spatial and temporal coverage worldwide; therefore, the best alternative for a full global evaluation of SM datasets is comparison with reanalysis datasets [43]. The GLDAS and ERA5 datasets were used for this purpose. The SMAP+ASCAT and AMSR2+ASCAT datasets achieved a higher overall average temporal R and a lower overall average RMSE against the GLDAS and ERA5 reference datasets than the parent products over different land cover types in most regions of the world, demonstrating that the proposed scheme improves performance. The correlation improvements are spatially distributed over different land cover types in most areas of the world, following the spatial weight patterns of the parent datasets. Over more densely vegetated regions, SMAP and AMSR2 were given lower relative weights than ASCAT in their merged products and therefore displayed large R differences from their merged datasets. Conversely, over moderately and sparsely vegetated regions, ASCAT was given lower relative weights than SMAP and AMSR2 in the merged products and therefore showed large R differences from its merged products. These improvements vary with the weight differences between the parent datasets, which provides evidence for the accuracy of the estimated weights.
Although GLDAS is used as one of the three datasets in TCM to calculate fMSE, the SMAP+ASCAT and AMSR2+ASCAT products performed better against ERA5 than against GLDAS, which provides evidence of the independence of the merged datasets. In recent studies [48,49,57], different generations of commonly used global model products were evaluated against observations at global and regional scales. The results showed that the ERA5 atmospheric reanalysis outperforms other models, achieving higher correlation and lower RMSE with ground measurements. Therefore, the high correlation of the merged SM datasets with ERA5 supports the proposed scheme.
After applying the weighted-average method in the regions with significant correlations between all three SM datasets of TCM, we proposed four scenarios for the regions where triple collocation analysis is unreliable, based on the correlations between all three SM products of TCM and their significance levels (p-values < 0.05). These scenarios are: using a passive SM dataset only, using an active SM dataset only, using an unweighted average, or excluding the pixels without any potential skill. Because the errors of the reference product are not correlated with those of the satellite retrievals, the quality of the reanalysis dataset does not affect the relative calculated correlations [63,77]. Therefore, the independent ERA5 reference product was used to verify the reliability of the scenario selected according to the suggested criteria. The two merged products produced with the single-sensor scenarios (i.e., either active only or passive only), or with the unweighted average scenario in the case where both parent SM products correlate significantly with GLDAS but not with each other, displayed a significant correlation with ERA5, which supports the proposed scenarios. The two merged products produced with the unweighted average scenario in the case where the active and passive products correlate insignificantly with GLDAS but significantly with each other showed a much lower overall correlation with ERA5 at high latitudes. This may be due to the unreliability of GLDAS and ERA5 in these regions; land surface models are known to be of poor quality at very high latitudes [78].
For these reasons, we kept the pixels in these regions and considered the significant correlation between the active and passive products as an indication that the parent products may still contain valuable SM retrieval information despite the very low correlation with the ERA5 reference dataset. Therefore, we applied the unweighted average scenario in these regions. To obtain high-quality merged SM products, we excluded the pixels that do not have any potential skill, that is, the pixels at which the correlations between all three SM datasets of TCM are mutually insignificant. In addition, these pixels displayed an insignificant correlation with the ERA5 product under both the single-sensor and the unweighted mean scenarios.
As discussed in this section, the proposed scheme improved the performance of satellite SM retrievals over different land cover types in most regions of the world.

Conclusions
In this study, we developed and evaluated a new merging scheme, based on maximizing signal-to-noise ratios, that takes advantage of both active and passive satellite soil moisture (SM) products to produce an improved SM product. The fractional mean-squared-error (fMSE) derived from the triple collocation method (TCM) is used to estimate the weights, based on least-squares theory. The proposed scheme was applied by using a threshold between signal and noise at fMSE equal to 0.5 to retain the high-quality SM observations. In the regions where TCM is unreliable, we proposed four scenarios based on the correlations between all three SM products of TCM and their significance levels (i.e., p-values < 0.05). The proposed scheme was applied to SM retrievals from ASCAT, SMAP, and AMSR2 at a global scale for the period from June 2015 to December 2017 to produce the SMAP+ASCAT and AMSR2+ASCAT SM datasets. The performance of the merged SM datasets was assessed against independent reference datasets, including in-situ measurements, GLDAS, and ERA5. The proposed scheme preserves the variability and seasonal dynamics of SM seen in the parent and reanalysis reference SM products. The merged SM datasets performed better than the parent products, with a higher average temporal R against in-situ measurements over different ISMN networks. Moreover, these datasets showed significant improvements, achieving a higher average temporal R and a lower RMSE against the GLDAS and ERA5 datasets than the parent datasets over different land cover types in most regions of the world. In the regions where TCM is unreliable, the four suggested scenarios improved the performance of SM retrievals. The proposed scheme has the potential to be applied to existing microwave satellites as well as to new missions.
Further studies are recommended to calculate the relative weights of the parent SM products, based on maximized signal-to-noise ratios, at different seasonal scales, which may give better results owing to seasonal variations. Studies are also needed on the possibility of merging SM retrievals from more than two satellites. Finally, TCM has been applied successfully in different fields, including oceanography, hydrometeorology, and ecology [79][80][81][82]. Therefore, the scheme demonstrated here for SM is flexible and can be applied to other biogeophysical variables.