Soil Moisture Estimation Synergy Using GNSS-R and L-Band Microwave Radiometry Data from FSSCat/FMPL-2

The Federated Satellite System mission (FSSCat) was the winner of the 2017 Copernicus Masters Competition and the first Copernicus third-party mission based on CubeSats. One of FSSCat’s objectives is to provide coarse Soil Moisture (SM) estimations by means of passive microwave measurements collected by Flexible Microwave Payload-2 (FMPL-2). This payload is a novel CubeSat-based instrument combining an L1/E1 Global Navigation Satellite Systems-Reflectometer (GNSS-R) and an L-band Microwave Radiometer (MWR) using software-defined radio. This work presents the first results over land of the first two months of operations after the commissioning phase, from 1 October to 4 December 2020. Four neural network algorithms are implemented and analyzed in terms of different sets of input features to yield maps of SM content over the Northern Hemisphere (latitudes above 45° N). The first algorithm uses the surface skin temperature from the European Centre for Medium-Range Weather Forecasts (ECMWF) in conjunction with the 16 day averaged Normalized Difference Vegetation Index (NDVI) from the Moderate Resolution Imaging Spectroradiometer (MODIS) to estimate SM and to use it as a comparison dataset for evaluating the additional models. A second approach is implemented to retrieve SM, which complements the first model with FMPL-2 L-band MWR antenna temperature measurements and performs better than the first case. The error standard deviation of this model with respect to the Soil Moisture and Ocean Salinity (SMOS) SM product gridded at 36 km is 0.074 m³/m³. The third algorithm proposes a new approach to retrieve SM using FMPL-2 GNSS-R data. The mean and standard deviation of the GNSS-R reflectivity are obtained by averaging consecutive observations based on a sliding window and are further included as additional input features to the network. The model output shows an accurate SM estimation compared to a 9 km SMOS SM product, with an error of 0.087 m³/m³. Finally, a fourth model combines MWR and GNSS-R data and outperforms the previous approaches, with an error of just 0.063 m³/m³. These results demonstrate the capabilities of FMPL-2 to provide SM estimates over land in good agreement with SMOS SM.


Introduction
Soil Moisture (SM) is one of the Essential Climate Variables (ECVs) [1] needed to better understand the water cycle. The soil is a natural water reservoir, and it is used by plants as a nutrient carrier. The amount of SM has great relevance in dividing incoming radiation over land into latent and sensible heat fluxes through evaporation and transpiration and in partitioning precipitation into runoff, subsurface flow, and infiltration. The risk of flooding [2] [...] CubeSats contributing to the Copernicus system [36]. The mission is composed of two 6-unit CubeSats, ³Cat-5/A and ³Cat-5/B. Flexible Microwave Payload-2 (FMPL-2) is the main payload of the ³Cat-5/A satellite. It is a dual passive microwave remote sensing instrument, consisting of a GNSS-R receiver at L1/E1 and an L-band Microwave Radiometer (MWR) working as a total power radiometer with frequent internal calibration, with an antenna footprint of ∼350 × 500 km². The instrument design and characteristics were described in detail in Munoz-Martin et al. [37], and the in-orbit validation was presented in Munoz-Martin et al. [38]. FMPL-2 simultaneously retrieves GNSS-R and L-band MWR measurements at a rate of 2 Hz. The instrument uses a dual-band antenna to receive the GNSS-R L1 reflected signals and the L-band antenna temperature. Regarding the GNSS-R part, as compared to the CyGNSS or TDS-1 instruments, FMPL-2 uses a very short integration time of 40 ms to maximize the spatial resolution and minimize the blurring (40 ms corresponds roughly to the size of the first Fresnel zone divided by the sub-satellite point speed, and it is also a multiple of 1 ms and 4 ms, the lengths of the GPS L1 C/A and Galileo E1 OS codes). Besides, it also allows for frequent internal noise calibration of the MWR using an internal matched load and an active cold load.
This work presents the first SM retrievals using FSSCat/FMPL-2 data. The performance of four ANN implementations using different combinations of FMPL-2 data is addressed. Since FMPL-2 is a dual sensor, the goal of this research is to investigate how the combination of GNSS-R and MWR data improves the SM estimation accuracy compared to the selected ground truth SM, which is used for model training and validation. This paper is organized as follows: Section 2 describes the different input features, including the data collected by FMPL-2, other ancillary products, and the "ground truth" selected to implement the data-driven algorithms. Section 3 presents the four ANN topologies designed to retrieve SM from NDVI/skin temperature data; from the combination of these two together with the FMPL-2 L-band MWR data; using the FMPL-2 GNSS-R data; finally from the combination of the FMPL-2 MWR and GNSS-R data sets. Section 4 discusses the way forward to provide an improved-quality SM product using GNSS-R data. Finally, the main conclusions that arose from this study are summarized in Section 5.

Data Description
FMPL-2 was commissioned in less than three weeks after launch. As part of its primary scientific objective (sea-ice monitoring), the FMPL-2 instrument was collecting data over the poles on a weekly basis, as described in Munoz-Martin et al. [38]. Nevertheless, the mission also aims to provide SM estimates over land as a secondary scientific objective. For this reason, FMPL-2 performs acquisitions over the North Pole (down to 45° N). The data used in this study correspond to the period from 1 October 2020 to 4 December 2020.

Ancillary Data
A global SM dataset was needed as a reference for training the ANNs. In this regard, two SMOS products were selected. The nominal accuracy of SMOS SM is 0.04 m³/m³. The error standard deviation (std) obtained when validating the ESA SMOS Level 2 SM against in situ SM observations ranged between 0.04 and 0.07 m³/m³ [39]. In this study, the Barcelona Expert Center on Remote Sensing (BEC) [40] provided two dedicated daily SMOS SM data products at two different spatial resolutions: 36 km and 9 km [41]. The accuracy of the products was assessed in previous studies over areas with different environmental characteristics. It is important to remark that the product used in this work shows a dry bias with respect to in situ SM data at many sites and an unbiased root mean squared error of ∼0.04-0.08 m³/m³ [42][43][44]. Note that the requirement of 0.04 m³/m³ for the SMOS SM accuracy is defined over non-forested areas with low to medium topography and without snow or frozen soil. In general, validations of SMOS SM were carried out in non-forested regions. Only a few of them were conducted over forests. For instance, the ESA SMOS L2 SM was assessed over boreal forests in Saskatchewan, Canada, obtaining an unbiased root-mean-square error (ubRMSE) of 0.15-0.18 m³/m³ [45], and over temperate forests in the Continental U.S., showing a ubRMSE of 0.03-0.11 m³/m³ [46]. However, the most recent SMOS SM processor versions (V650, released in 2017) already include validation stations in boreal forest areas that have been used to enhance the SM estimations [47].
In this paper, the SMOS SM at 36 km is used to train and validate a data-driven algorithm implemented using FMPL-2 radiometry measurements, and the one at 9 km is used for the FMPL-2 GNSS-R measurements. The 9 km product is also used to validate the combined MWR/GNSS-R SM product. The ground truth product is shown in Figure 1; its dynamic change can be seen in Video S1. Note that the projection of all maps presented in this manuscript is the global Equal-Area Scalable Earth (EASE) Grid 2.0 [48]. As the FMPL-2 pixel is ∼10 times larger than the SMOS one, other data sources were used to down-scale the original FMPL-2 pixel to 36 km. The ancillary data used to down-scale those measurements were a re-gridded version of the 16 day average Normalized Difference Vegetation Index (NDVI) from MODIS [49] and the daily skin temperature from ECMWF [50], also gridded at 36 km. In addition, a land cover mask derived from MODIS, gridded at 36 km, was also used. The two products are shown in Figure 2.

FMPL-2 Data
The calibration procedures of both the MWR and GNSS-R sensors were explained in Munoz-Martin et al. [37], and the FMPL-2 performance was analyzed in Munoz-Martin et al. [38]. The FMPL-2 data are sampled at 2 Hz, together with the spacecraft telemetry. Once downloaded, the measurements were geo-located using the spacecraft position and pointing. This process was based on a Nearest Neighbor Interpolation (NNI), which was described in detail in Munoz-Martin et al. [38], but using a global EASE Grid 2.0 projection at 36 km, instead of a polar EASE2 grid. The algorithm geo-locates each MWR antenna temperature (T_A) measurement into a single 36 km cell. Then, all T_A measurements within a time-span of five days are averaged into a single map, followed by a triangulation-based natural neighbor interpolation. Finally, an antenna pattern compensation was performed based on 2D moving window filtering, using a circular window covering the same area as the antenna footprint. In this way, an entire map out of five days of measurements was generated. Thanks to this technique, there were no gaps in the antenna temperature data, which later helped collocate GNSS-R and MWR measurements to provide the combined MWR/GNSS-R SM estimation. Figure 3 illustrates the results of the NNI algorithm. This method allows detecting the coastline borders with a finer resolution than the one given by the antenna footprint (350 × 500 km²), as can be seen over Greenland (see Figure 3). As a counter-example, the ocean area at the top of Canada, where it seems there is no land-water transition, is completely covered by sea-ice; thus, there is no "jump" in the measured antenna temperature values, as sea-ice has a larger apparent brightness temperature than the ocean [51].
Finally, Figure 4 shows (a) the reflectivity of the collected GNSS reflections over land, (b) the associated incidence angles, and (c) the estimated SM values interpolated from the SMOS SM product. The reflectivity is calculated as shown in [37], and as explained in [38], all the GNSS-R tracks were pre-processed to filter out reflections with poor SNR or with inconsistent absolute Doppler frequencies. Moreover, only reflection points that had a time-collocated SMOS SM measurement were used in this study. After filtering, the resulting number of specular reflections was 12,585. The collocation of the SMOS SM product was performed by interpolating the SMOS SM values to the reflection points' location using a nearest neighbor 2D linear interpolation, producing scattered SM values over the specular points.
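The cell-averaging step of the mapping described above can be sketched in a few lines. This is a minimal illustration only: it assumes each T_A sample has already been assigned a row and column index on the 36 km grid (the EASE Grid 2.0 projection, the natural neighbor interpolation, and the antenna pattern compensation are not reproduced), and the function name `grid_mean` is hypothetical.

```python
import numpy as np

def grid_mean(rows, cols, values, shape):
    """Average all samples falling in the same grid cell; empty cells become NaN."""
    total = np.zeros(shape)
    count = np.zeros(shape)
    # np.add.at accumulates correctly even when an index pair repeats
    np.add.at(total, (rows, cols), values)
    np.add.at(count, (rows, cols), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)

# Toy example: three antenna temperature samples (K) collected over five days
rows = np.array([0, 0, 1])
cols = np.array([0, 0, 2])
ta = np.array([250.0, 260.0, 120.0])
ta_map = grid_mean(rows, cols, ta, shape=(2, 3))
```

Cells left as NaN are the gaps that a subsequent interpolation step would need to fill.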
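As an illustration of the collocation idea, the sketch below assigns to each specular point the SM value of the closest grid-cell center. It is a simplified stand-in under stated assumptions: a plain nearest-neighbor lookup with a small-angle distance approximation is used, the linear interpolation step is not reproduced, and `collocate_nearest` and its arguments are hypothetical names.

```python
import numpy as np

def collocate_nearest(grid_lat, grid_lon, grid_sm, pt_lat, pt_lon):
    """Return, for every point, the SM value of the nearest grid-cell center.

    Distances use a local flat-Earth approximation (longitude differences
    scaled by cos(latitude)), which is an assumption of this sketch.
    """
    glat = np.radians(np.asarray(grid_lat, dtype=float))[None, :]
    glon = np.radians(np.asarray(grid_lon, dtype=float))[None, :]
    plat = np.radians(np.asarray(pt_lat, dtype=float))[:, None]
    plon = np.radians(np.asarray(pt_lon, dtype=float))[:, None]
    # Wrap longitude differences into [-pi, pi)
    dlon = (glon - plon + np.pi) % (2.0 * np.pi) - np.pi
    dist2 = (glat - plat) ** 2 + (np.cos(plat) * dlon) ** 2
    return np.asarray(grid_sm, dtype=float)[np.argmin(dist2, axis=1)]

# Two grid cells and two specular points (toy values)
sm_at_points = collocate_nearest(
    grid_lat=[50.0, 60.0], grid_lon=[10.0, 20.0], grid_sm=[0.10, 0.30],
    pt_lat=[59.0, 51.0], pt_lon=[19.0, 11.0],
)
```

The brute-force distance matrix is adequate for a data set of the size reported here (about 12,585 points) against a regional grid, but a spatial index would be preferable for global grids.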

Soil Moisture Retrieval Using ANN
As ANNs show a great potential to solve non-linear problems, these algorithms have also been applied to soil moisture retrieval. The ECMWF is assimilating real-time Level 2 SM derived from SMOS using ANN [52], and several other SM retrieval algorithms using CyGNSS GNSS-R data are also using ANNs [29,53]. However, ANNs have several challenges: to select the correct ground truth (known as the algorithm "target") to train the algorithm, to correctly pre-filter and collocate the input data set and to avoid network over-fitting [54].
All previous ANN algorithms using GNSS-R to retrieve SM used either SMAP or in situ probes as the ground truth. However, there is a large collocation error when the CyGNSS observable, with a native resolution of ∼0.5 × 6.5 km² [28], is collocated with an SMAP cell of 36 km. In all previous cases, multiple averages were used to finally provide a good agreement between the resulting network and the target values (see Table 1 and Figure 13 from [29]). Moreover, algorithms using in situ probes as the ground truth (e.g., [30,55]) compare the CyGNSS reflectivity with single point measurements. Thus, it is necessary to apply either an averaging of different in situ probes or a statistical estimation of the SM of the equivalent GNSS-R pixel under study. In this case, using erroneous targets may introduce a bias in the algorithm and network over-fitting (i.e., the ANN does not perform well out of the training set).
The aim of this work is to investigate different ANN algorithms to retrieve SM using FMPL-2 MWR and GNSS-R data. For that reason, different ANNs are proposed to validate the performance of the MWR and the GNSS-R data separately. Each of the ANNs used in this study was dimensioned to avoid network over-fitting [54]. The amount of neurons in each of the layers was always less than two times the number of inputs. Each neuron used the sigmoid function as the transfer function, and the training process was stopped at an early stage to prevent noise over-fitting, as suggested in [54]. Finally, the trained networks were always pruned after training [56]. Table 1 details the different data-driven algorithms studied in this section, indicating the input data for each algorithm and the "ground truth" used for training each ANN.
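The dimensioning rule and the sigmoid transfer function described above can be illustrated with a small forward-pass sketch. It is not the implementation used in this work: the weights are random rather than trained, early stopping and pruning are omitted, the linear output neuron is an assumption, and `init_mlp` and `forward` are hypothetical names. The example uses six inputs and two hidden layers of six neurons.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlp(n_inputs, hidden, rng):
    """Create random weights, enforcing 'neurons per layer < 2 x number of inputs'."""
    for h in hidden:
        if h >= 2 * n_inputs:
            raise ValueError("hidden layer too large for the number of inputs")
    sizes = [n_inputs, *hidden, 1]
    return [(rng.normal(0.0, 0.5, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Sigmoid hidden layers followed by a linear output (assumed here)."""
    for w, b in params[:-1]:
        x = sigmoid(x @ w + b)
    w, b = params[-1]
    return (x @ w + b).ravel()

rng = np.random.default_rng(0)
params = init_mlp(n_inputs=6, hidden=(6, 6), rng=rng)
features = rng.normal(size=(4, 6))   # 4 samples, 6 input features (toy data)
sm_hat = forward(params, features)   # one untrained SM estimate per sample
```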

Using Optical Data
L-band MWR data were used to recover SM, but several algorithms have also been proposed using optical data only, as NDVI and Land Surface Temperature (LST) measurements (e.g., [10]). In [11] (pp. 47-72), other methods based on optical data were summarized. Although it is possible to retrieve SM from optical data, errors are larger than if L-band MWR is built into the model. The two following main objectives are pursued in this section. First is to show the capabilities of an ANN based algorithm to provide SM estimates just using optical data, and second is to serve as a validation base of the down-scaling algorithm from FMPL-2 MWR data provided in Section 3.2.
The first approach is based on the aforementioned skin temperature and NDVI datasets. The ANN used here was a three hidden layer network with six neurons per layer. The data were divided into 15% for training, 5% for validation, and 80% for testing, and the network "ground truth" was the SMOS-derived SM product gridded at 36 km. The input data were the following:
• NDVI from MODIS [49],
• Skin temperature from ECMWF [50], and
• Land cover mask from MODIS [49].
Results are shown in Figure 5. As can be seen, the Pearson correlation coefficient R was 0.56, and the Standard Deviation of the Error (STD(Err)) was 0.084 m³/m³. Note that the scatter-density plot shows many pixels with an estimated SM value of ∼0.2 m³/m³, as the network tends toward this value in order to minimize the error. The shortcoming of this method becomes clear, as the network is missing the valuable information coming from the L-band MWR.

Figure 5. (a) Scatter-density plot between the ANN output using NDVI and skin temperature as input data and the SMOS SM product as the network reference value; and (b) the error histogram between the ANN output and the SMOS reference.
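The 15%/5%/80% partition stated above can be sketched as a random index split. This is only an illustration: a random permutation is assumed for this network, and `split_indices` is a hypothetical helper.

```python
import numpy as np

def split_indices(n_samples, fractions, rng):
    """Randomly split sample indices into train / validation / test sets.

    `fractions` holds the train and validation shares; the rest is the test set.
    """
    perm = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

rng = np.random.default_rng(42)
train_idx, val_idx, test_idx = split_indices(1000, (0.15, 0.05), rng)
```

The same helper covers other partitions by changing `fractions`, for example `(0.60, 0.10)` for a 60%/10%/30% split.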
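The two figures of merit quoted here, together with the bias, can be computed as in the following sketch. The error is taken as the network output minus the reference; the use of the population standard deviation (`ddof=0`) is an assumption, and `retrieval_metrics` is a hypothetical name.

```python
import numpy as np

def retrieval_metrics(estimate, reference):
    """Pearson R, bias, and standard deviation of the error (estimate - reference)."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    err = estimate - reference
    return {
        "R": np.corrcoef(estimate, reference)[0, 1],
        "bias": err.mean(),
        "std_err": err.std(),  # population std (ddof=0), an assumption
    }

# Toy check: a constant offset gives perfect correlation, non-zero bias, zero STD(Err)
reference = np.array([0.10, 0.20, 0.30, 0.40])
estimate = reference + 0.05
metrics = retrieval_metrics(estimate, reference)
```

The toy case shows why the bias and STD(Err) are reported separately: a constant offset leaves R and STD(Err) untouched.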

Using L-Band Microwave Radiometry Data
In this second approach, the FMPL-2 MWR data were added to the skin temperature and to the NDVI data sets to improve the SM estimation. The five day averaged FMPL-2 MWR data presented in Figure 3 were combined with the daily skin temperature and the 16 day averaged NDVI data. This produced a daily product that can be compared to the daily SMOS SM product gridded at 36 km.
The same network topology using three hidden layers was used here, but with 10 neurons per layer. The number of neurons was increased because more inputs were used in the model. The data were again divided into 15% for training, 5% for validation, and 80% for testing. In this case, additional low resolution data inputs were used (see Figure 6) in addition to the FMPL-2 MWR data. The inputs of the network included:
• Land cover mask from MODIS [49], and
• Low resolution land cover mask from MODIS [49].
The objective of this second approach was to down-scale the coarse footprint measurement of the FMPL-2 radiometer to a "higher resolution" pixel by using the NDVI and the skin temperature gridded at 36 km. Pixel down-scaling with L-band MWR data is a known technique [41,57], where optical datasets are usually complemented with coarse-scale radiometry observations to achieve higher resolved images. However, to the authors' knowledge, this is the first implementation of a down-scaling algorithm using ANNs. The idea behind the network architecture is similar to the one conceived in [41], but here, the ANN algorithm takes the NDVI, the skin temperature, the land cover mask, and the antenna temperature at a coarse resolution, and then, with the help of higher resolution NDVI, skin temperature, and land cover maps, it down-scales the original antenna temperature to provide a down-scaled SM product. An example of the skin temperature after "blurring" is provided in Figure 6a for illustration purposes. Moreover, the standard deviation of the antenna temperature measured in the along-track direction was also used as a proxy to correct the edge transitions (i.e., land-water transitions, as highlighted in Figure 6b).
Figure 7 shows the scatter-density plot and the error histogram of the ANN used to provide the down-scaled FMPL-2 soil moisture at 36 km. As compared to the previous case, adding the FMPL-2 MWR data resulted in more accurate SM estimations. The R increased from 0.56 to 0.69, and the STD(Err) decreased from 0.084 to 0.074 m³/m³, showing a negligible bias in both cases. This result confirms that the five day averaged FMPL-2 antenna temperature data enhance the SM retrieval algorithm. Therefore, even at a coarse resolution, the FMPL-2 MWR data can be used to estimate SM.
It is important to remark that studies comparing different remote sensing SM products (i.e., from the Advanced Microwave Scanning Radiometer, SMAP, or SMOS) to ground truth sensor networks [58] showed similar errors to the ones presented in this section (i.e., root mean squared differences of ∼0.076 m³/m³ and unbiased root mean squared difference of ∼0.056 m³/m³).
Figure 8 shows the estimated SM maps using the FMPL-2 MWR data and the errors with respect to the SMOS SM estimations. In this case, high resolution ancillary data contain the relevant information to obtain SM maps at full coverage. The first remarkable aspect is that the map corresponding to the latest period (Figure 8d) includes lower latitudes due to a change in the spacecraft execution scheduling, which started acquiring data above 45° N, as explained in Section 2. The error maps present many areas with errors lower than 0.1 m³/m³ (i.e., color scale around yellow, green, and light-blue). Note that, as the FMPL-2 antenna temperature measurements used in this algorithm were built from the average of five days in order to provide a full-coverage map, SM estimations on areas with strong rain events occurring after FMPL-2 measurements will exhibit larger errors. To solve this problem in future missions, additional data sets could be used in the algorithm: for example, a constellation of multiple satellites including FMPL-2 sensors would provide a finer temporal resolution. Thus, if 5-8 satellites are used in different orbital planes, the required averaging time would decrease from five days to just one day, making it possible to provide a daily full coverage SM map above 45° N.

Using GNSS-R Data
Since the launch of CyGNSS, several data-driven algorithms have shown their potential to estimate SM from their measurements [59]. However, the GNSS-R data retrieved by the SGR-ReSi receiver [60] used in the TDS-1 and CyGNSS missions are configured with long integration times (i.e., 0.5 or 1 s incoherent integration time), as the main objective of these missions is to study the ocean.
In FMPL-2, the integration time of the GNSS-R processor is set to 40 ms in order to enhance the spatial resolution and to avoid waveform blurring. As seen in [61], the ground-track displacement in 40 ms from a spacecraft orbiting at ∼550 km is equivalent to the size of the first Fresnel zone (i.e., 7.5 km/s · 40 ms = 300 m) of the GNSS signal reflection.
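The arithmetic behind the 40 ms choice can be checked with a short worked example. It uses only the numbers quoted in the text (7.5 km/s, 40 ms, and the 1 ms and 4 ms code lengths); the variable names are illustrative.

```python
# Displacement during one coherent integration: km/s multiplied by ms gives metres
speed_km_s = 7.5        # speed quoted in the text
integration_ms = 40     # FMPL-2 GNSS-R integration time
displacement_m = speed_km_s * integration_ms   # 300 m, the first Fresnel zone size

# 40 ms holds an integer number of GPS L1 C/A (1 ms) and Galileo E1 OS (4 ms) codes
gps_codes = integration_ms // 1
galileo_codes = integration_ms // 4
is_multiple = integration_ms % 1 == 0 and integration_ms % 4 == 0
```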
In this section, an ANN approach based on [62] is proposed to retrieve SM from GNSS-R measurements. The method uses the moving average (movmean) and the moving standard deviation (movstd) of the reflectivity (computed in linear units and then converted to dB) to infer the SM values. As discussed in Munoz-Martin et al. [62], the movmean and the movstd parameters are used by the ANN as a proxy to correct for the surface roughness effects.
In this case, a very simple network is proposed, using just two hidden layers, with six neurons per layer. The ANN target is the SMOS 9 km SM product. However, as introduced in Section 2.2, instead of gridding the GNSS-R data into the 9 km SMOS grid, the SMOS SM product is interpolated over the specular point location using a 2D linear interpolation algorithm. As the data set (see Figure 4) contains fewer points than in the MWR case, enough data should be used to train the algorithm to prevent over-fitting. Thus, the data were randomly divided into 60% for training, 10% for validation, and 30% for testing. The input data of this network were the following:
• Moving average of the reflectivity (movmean(Γ)) over N samples,
• Moving standard deviation of the reflectivity (movstd(Γ)) over N samples, as a proxy to correct the surface roughness and speckle noise effects, and
• Moving average of the SNR (movmean(SNR)) over N samples,
where N is the number of samples used in the movmean and movstd operations. Four different N values are proposed: 3, 5, 10, and 20. By instrument design, a GNSS track produced by FMPL-2 contains up to 90 consecutive samples (i.e., 45 s of data sampled at 2 Hz).
Figure 9 presents the density-scatter plot and the error histogram of the ANN output developed to estimate SM using GNSS-R data from FMPL-2 with respect to the collocated SMOS SM from BEC [40]. By looking at the shape of the density-scatter plot and the error histograms, in general, the ANN algorithm does not provide accurate results for low SM values (<0.15 m³/m³), showing a large dispersion in this range. Similar results were discussed in [63]: due to the very low reflectivity of dry soils, GNSS-R loses sensitivity. Thus, the ANN algorithm over-estimates low SM values and under-estimates larger SM values.
The Pearson correlation coefficient (R) and the STD(Err) were computed for each averaging length (N) to evaluate the algorithm performance. As can be seen, as the averaging time increases, the correlation coefficient increases, and the STD(Err) decreases. For the shortest averaging (N = 3), the correlation coefficient was R = 0.52, and the STD(Err) was 0.094 m³/m³. For N = 5, the correlation coefficient rose to R = 0.56, and the STD(Err) decreased to 0.091 m³/m³. As more samples were used in the "moving" operation, the R increased even more, up to R = 0.61 and R = 0.62, for N = 10 and N = 20, respectively. The STD(Err) was also reduced to 0.087 m³/m³ for both cases. It is important to remark that the bias was negligible (<0.001 m³/m³) in all presented cases. This result is in agreement with the error graph presented in Figure 13 from [29], where the larger error was linked to areas where fewer samples were used to average the GNSS-R reflectivities. As discussed in [62], the effect of these "moving" operations is to smooth the surface roughness and to reduce speckle noise effects. When the averaging is large enough (i.e., N = 20), the correlation coefficient is maximized, and the error with respect to the SMOS SM product is minimized.
Increasing the averaging (i.e., larger N) degrades the spatial resolution, as more samples are required to derive a single measurement. In this case, taking into account the sampling rate (2 Hz) and the ground-track velocity (v_track ∼6.5 km/s), N = 3 means that the spacecraft has moved ∼9.75 km during the moving window computation; thus, the equivalent spatial resolution following this technique would be ∼9.75 km for N = 3, 16.25 km for N = 5, 32.5 km for N = 10, and 65 km for N = 20. This phenomenon is a clear example of the remote sensing trade-off, which implies that maximizing the spatial and the radiometric resolution at the same time is not possible, and one is usually sacrificed for the benefit of the other. These results are in agreement with other studies using CyGNSS data [29,53]. It is shown that a large number of averages is required to retrieve accurate SM values from GNSS-R, showing similar errors to other algorithms developed using CyGNSS data.
Finally, Figure 10 presents the geo-located specular points with the estimated SM values and their computed errors with respect to the SMOS SM estimates, for two time periods: 1-31 October 2020 and 1 November to 4 December 2020. The error maps are defined as the network output minus the ground truth. Thus, a negative error means that the ANN algorithm estimates a lower SM than the actual ground truth (under-estimation), and a positive error means that the algorithm estimates a larger SM value than the actual ground truth (over-estimation). By looking at the first data period, it can be seen that larger errors were found in the area at the top of Canada (65° N, 100° W). In this case, most errors were under-estimations of SM for areas with moderate-low moisture content, as also identified in the scatter plots in Figure 9. Furthermore, the SM retrieved over Russia showed, in general, a good accuracy, with errors around ±0.1 m³/m³ and a few SM points containing a slight over-estimation on low-moderate SM values (∼0.1-0.2 m³/m³).
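The moving-window inputs defined above can be sketched as follows. This is a minimal illustration, assuming that only complete windows of N consecutive samples are kept (edge handling is not specified in the text); `moving_features` is a hypothetical name. Both statistics are computed in linear units and then converted to dB.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def moving_features(gamma_linear, n):
    """movmean and movstd of the reflectivity over n samples, returned in dB.

    Only complete windows are used, so the output has len(gamma) - n + 1 values.
    """
    windows = sliding_window_view(np.asarray(gamma_linear, dtype=float), n)
    with np.errstate(divide="ignore"):
        # A perfectly constant window has zero std, which maps to -inf dB
        mean_db = 10.0 * np.log10(windows.mean(axis=1))
        std_db = 10.0 * np.log10(windows.std(axis=1))
    return mean_db, std_db

# Toy reflectivity track in linear units
gamma = np.array([0.01, 0.02, 0.03, 0.04])
mean_db, std_db = moving_features(gamma, n=3)
```

With tracks of at most 90 samples, the largest window considered (N = 20) still leaves up to 71 complete windows per track.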
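The equivalent spatial resolution for each window length follows directly from the sampling rate and the ground-track velocity quoted above. The short sketch below reproduces the arithmetic; the function name is illustrative.

```python
def equivalent_resolution_km(n_samples, v_track_km_s=6.5, sampling_hz=2.0):
    """Along-track distance covered while n_samples are collected."""
    return n_samples * v_track_km_s / sampling_hz

# Window lengths considered in this section
resolution_km = {n: equivalent_resolution_km(n) for n in (3, 5, 10, 20)}
# {3: 9.75, 5: 16.25, 10: 32.5, 20: 65.0}
```

Each sample adds 3.25 km of along-track displacement, so the resolution degrades linearly with N while the error statistics improve only slowly.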
For the second period, focusing on the area at the top of Canada, the algorithm now presented SM over-estimations. In this case, the area contains several points over very dry areas (see Figure 4), and some others that are very wet, at the southern side of Hudson Bay. As expected from the scatter-density plots, dry areas were over-estimated, and wet areas were under-estimated.
Retrieving soil moisture using single-pass GNSS-R is a complex task [26]. In many cases, the surface roughness, and its dependence on SM, make SM recovery difficult. In our case, the training process of the ANN algorithm became biased because of the large dispersion of the GNSS-R reflectivity values due to this effect. For this reason, additional data must be used to develop a more accurate model, even using movstd(Γ) as a proxy to correct the surface roughness effects.

Using Combined GNSS-R and Radiometry Data
In order to correct the fitting errors produced by lower SM values, which then causes this "bias" for larger SM values, the L-band MWR data from FMPL-2 were used. Brightness temperature data are the complementary magnitude of the reflectivity [64]. Thus, even at a very low resolution, FMPL-2 data may contain meaningful and valid information to enhance the SM recovery when combined with GNSS-R.
The algorithm proposed in this section combines both measurements to provide a more accurate SM estimation. As is seen in Equation (12) from [65], the reflectivity of a given area is linked to its brightness temperature. In our example, the coarse resolution MWR data were used as ancillary data to improve the SM retrieval algorithm using the high resolution GNSS-R data. The network proposed here was the same as in the GNSS-R example: two hidden layers with six neurons per layer, using the same train/validation/test division. The target output was the same as in the previous case, and the input set now included the MWR radiometry data as follows:
• FMPL-2 antenna temperature,
• FMPL-2 standard deviation of the antenna temperature in the along-track measurement,
• Incidence angle θ_inc,
• Moving average of the reflectivity (movmean(Γ)) over N samples,
• Moving standard deviation of the reflectivity (movstd(Γ)) over N samples, and
• Moving average of the SNR (movmean(SNR)) over N samples.
In this case, the same approach used to collocate the target output was used to collocate the FMPL-2 measurements. Therefore, a 2D linear interpolation was used to generate the scattered data from the 36 km FMPL-2 gridded data. The standard deviation of the antenna temperature was also used as a proxy to correct the edge transitions (i.e., land-water transitions). Figure 11 shows the scatter-density plot and error histogram of the new ANN algorithm to derive SM from the combined MWR and GNSS-R data, as compared to the SMOS SM. In this last case, the resulting network showed a significant improvement with respect to the GNSS-R-only case. For lower SM values, the algorithm was more accurate, presenting, in general, a lower dispersion. This is also reflected in the Pearson correlation coefficient and in STD(Err). The R was boosted from 0.52 to 0.79 for N = 3, from 0.56 to 0.80 for N = 5, from 0.61 to 0.81 for N = 10, and from 0.62 to 0.82 for N = 20. In the four cases, the bias was also negligible, and the STD(Err) also decreased from 0.094 m³/m³ to 0.067 m³/m³, from 0.091 m³/m³ to 0.066 m³/m³, from 0.087 m³/m³ to 0.064 m³/m³, and from 0.087 m³/m³ to 0.063 m³/m³, for the four N cases: 3, 5, 10, and 20, respectively. The use of the FMPL-2 MWR data contributed to decreasing the overall error with respect to SMOS SM at 9 km, even for small values of N. The error difference between the shortest and the longest averaging (N = 3 and N = 20) was 0.004 m³/m³.
Finally, Figure 12 presents the geo-located specular points as in the previous section, but with this new ANN algorithm that uses the MWR and GNSS-R data. Most of the errors previously analyzed in Figure 10 were now lower than 0.1 m³/m³. As compared to the previous case (using only GNSS-R data), the errors produced in the northern part of Canada were now drastically reduced. Drier areas were better detected thanks to the use of the MWR data, which is the complementary magnitude of the reflectivity. Thus, when the GNSS-R reflectivity is low, the antenna temperature retrieved by FMPL-2 MWR is high, and vice versa. This synergy resolves the ambiguity caused by surface roughness attenuation, which produces low reflectivity values for high SM values. Thanks to the MWR data, low reflectivity values that are collocated with a low antenna temperature can be attributed to wet soils whose reflection is affected by the surface roughness. Analogously, low reflectivity values collocated with a large antenna temperature correspond to drier soils.
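The improvement obtained by adding the MWR data can be tabulated from the values reported in this section and in Section 3.3. The sketch below only rearranges those reported numbers; the variable names are illustrative.

```python
# (N, R GNSS-R only, R combined, STD(Err) GNSS-R only, STD(Err) combined) as reported
reported = [
    (3, 0.52, 0.79, 0.094, 0.067),
    (5, 0.56, 0.80, 0.091, 0.066),
    (10, 0.61, 0.81, 0.087, 0.064),
    (20, 0.62, 0.82, 0.087, 0.063),
]

# Gain in R and reduction in STD(Err) (m3/m3) for each window length N
gains = {
    n: (round(r_comb - r_gnss, 3), round(std_gnss - std_comb, 3))
    for n, r_gnss, r_comb, std_gnss, std_comb in reported
}

# Spread of the combined-model STD(Err) between N = 3 and N = 20
spread_combined = round(reported[0][4] - reported[-1][4], 3)
```

The gain from adding the MWR data (about 0.2 in R and 0.02-0.03 m³/m³ in STD(Err)) is much larger than the gain from increasing N within the combined model.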
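The disambiguation logic described above can be written as a toy decision rule. This is purely illustrative: the ANN learns this relationship implicitly rather than through fixed thresholds, and both threshold values below, as well as the function name, are hypothetical and not taken from this work.

```python
def interpret_low_reflectivity(gamma_db, ta_kelvin,
                               gamma_low_db=-20.0, ta_high_k=250.0):
    """Toy rule combining GNSS-R reflectivity and MWR antenna temperature.

    Both thresholds are hypothetical placeholders chosen for illustration only.
    """
    if gamma_db > gamma_low_db:
        return "reflectivity not low"
    if ta_kelvin >= ta_high_k:
        return "dry soil"
    return "wet soil, reflection attenuated by roughness"

# Same low reflectivity, opposite interpretations depending on antenna temperature
case_dry = interpret_low_reflectivity(gamma_db=-25.0, ta_kelvin=265.0)
case_wet = interpret_low_reflectivity(gamma_db=-25.0, ta_kelvin=210.0)
```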

Discussion
Machine learning works as a data-driven approach, and data from different sources must be introduced as input data to deliver quality results. Different space-borne approaches have been presented in the last few years using ANN to estimate SM from GNSS-R measurements, as seen in Table 1 from [29]. As summarized in Table 2, four ANN based models were implemented to infer SM from a different combination of input features. The use of movstd on a set of consecutive samples proposed in [62] was used to correct the speckle noise and surface roughness effects, and it was validated for SM retrieval using spaceborne GNSS-R data. In Figure 13, the error of the airborne algorithm presented in [62] is compared to the error achieved using FMPL-2 GNSS-R data. As can be seen, both error curves follow the same trend, where a large number of measurements or Fresnel zones are required to provide a lower error. Note that this effect was also studied in Figure 13 of [29], where a large number of samples within a grid were required to achieve a lower error. Moreover, there is an almost constant "gap" between both curves of ∼0.03 m 3 /m 3 , which in this case might be due to the difference in directivity between the airborne GNSS-R instrument (i.e., ∼21 dB [66]) and FMPL-2 (i.e., ∼12 dB [37]). However, the best results were achieved when combining the GNSS-R and the L-band MWR data retrieved by FMPL-2. The intrinsic synergy of GNSS-R with L-band MWR allows for more accurate SM retrievals, with a finer spatial resolution over the specular reflection points.  [62]) and the spaceborne case (FMPL-2). Note that the X-axis is normalized by the number of Fresnel zones used to derive the SM estimation (i.e., the N parameter used in Section 3.3). A regression curve is fit to the Std(Err) of both error curves to ease the comparison.
In both the airborne case in [62] and the methodology described in this paper (Section 3.4), the use of combined data decreases the SM estimation error with respect to the SMOS SM target. This is not the case for other algorithms (i.e., Table 1 from [29]), where the efforts are focused on decreasing the amount of ancillary data used to retrieve SM from GNSS-R data, at the expense of using large spatial and temporal averages. However, this is not how data-driven (machine learning) algorithms work. Proper inputs shall be strategically selected to retrieve SM at higher spatial and radiometric resolutions. Thus, combining data from different sensors is key to retrieving a higher quality product.
CubeSats have great potential for Earth observation. As shown in this work, even a CubeSat-based instrument can be used to retrieve SM measurements with good accuracy, although not as good as with traditional larger satellites, such as SMOS or SMAP. FMPL-2 has shown its potential to infer an enhanced resolution SM by means of combined MWR and GNSS-R measurements, in addition to the down-scaling algorithm using optical multi-spectral data used to derive vegetation indices and LST. In addition, a CubeSat-based constellation with eight satellites embarking FMPL-2 as the payload would allow covering all latitudes above 45° N every single day, with enough footprint oversampling to retrieve daily SM products (the FMPL-2 MWR on board 3Cat-5/A requires 3-5 days, depending on the downlink scheduling, to cover all latitudes above 45° N and 5-8 days to cover all latitudes above 35° N).
The more sensors in orbit, the more data will be available to implement data-driven algorithms like those presented in this work. This is the idea behind the CubeSat philosophy. Therefore, this hypothetical constellation of CubeSats embarking FMPL-2 among other optical sensors would allow high-quality Earth observation products, with improved revisit times, radiometric accuracy, and spatial resolution, as compared to the products currently available. The way to beat the trade-off between spatial and radiometric resolutions is to add extra information and to benefit from data-driven and data fusion algorithms, such as ANN.

Conclusions
This work presents the first SM retrievals using FMPL-2 data, showing the potential of CubeSats to produce quality science products for land monitoring applications. Four different algorithms based on ANNs are presented. The first algorithm serves as validation of the initial ECMWF skin temperature and MODIS NDVI data to retrieve SM. The second algorithm presents an ANN-based pixel down-scaling technique, which takes the FMPL-2 MWR data along with MODIS NDVI data and ECMWF skin temperature to produce down-scaled SM measurements at 36 km resolution from an initial MWR pixel of 350 × 500 km2. The effect of adding FMPL-2 data to the ANN is validated, showing a performance improvement when the FMPL-2 antenna temperature is used.
The third algorithm presents an ANN using GNSS-R data from FMPL-2 to estimate SM content, without using any ancillary data. The proposed ANN takes the movmean and the movstd of N consecutive samples together with the incidence angle, and its output is compared to a down-scaled SMOS SM product at 9 km resolution. As expected from other SM studies using GNSS-R, the larger the number of samples N used to produce a measurement, the smaller the error. In this case, by utilizing only GNSS-R data, the algorithm shows a standard deviation of the error of 0.079 m3/m3 at N = 20 samples.
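The GNSS-R input features described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `moving_stats` and `build_features` are hypothetical, and the sketch assumes that only windows containing all N samples are kept (no edge handling), that the sample standard deviation (N − 1 normalization) is used, and that the incidence angle is averaged over the same window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_stats(reflectivity, n):
    """Moving mean and moving std over windows of n consecutive samples.

    Assumption: only full windows are returned, so the output has
    len(reflectivity) - n + 1 entries.
    """
    x = np.asarray(reflectivity, dtype=float)
    if n < 2 or n > x.size:
        raise ValueError("n must satisfy 2 <= n <= number of samples")
    windows = sliding_window_view(x, n)  # shape: (len(x) - n + 1, n)
    mov_mean = windows.mean(axis=1)
    mov_std = windows.std(axis=1, ddof=1)  # assumption: N - 1 normalization
    return mov_mean, mov_std


def build_features(reflectivity, incidence_deg, n):
    """Stack [movmean, movstd, mean incidence angle] as candidate ANN inputs."""
    mov_mean, mov_std = moving_stats(reflectivity, n)
    inc = np.asarray(incidence_deg, dtype=float)
    if inc.size != np.asarray(reflectivity).size:
        raise ValueError("reflectivity and incidence_deg must have equal length")
    # Assumption: the incidence angle is averaged over the same window.
    mov_inc = sliding_window_view(inc, n).mean(axis=1)
    return np.column_stack([mov_mean, mov_std, mov_inc])
```

Each row of the returned array corresponds to one window of N consecutive specular points; increasing `n` averages over more Fresnel zones, which is the mechanism by which the error decreases at the cost of a coarser along-track resolution.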
Finally, a fourth algorithm that combines FMPL-2 MWR and GNSS-R data is presented. It is shown that the combination of L-band antenna temperature measurements with GNSS-R reflectivity data significantly enhances the retrieval algorithm performance. In this case, the error is decreased by ∼27.5%, with a standard deviation of the error ranging from 0.067 to 0.063 m3/m3 for N = 3 to N = 20, respectively. In conclusion, the combination of collocated MWR and GNSS-R data from FMPL-2 produces a higher accuracy in the soil moisture estimation.
Data Availability Statement: Data used in this study will be publicly and freely available for everyone at the Copernicus system as part of the FSSCat mission.