Reconciling Flagging Strategies for Multi-Sensor Satellite Soil Moisture Climate Data Records

Abstract: Reliable soil moisture retrievals from passive microwave satellite sensors are limited during certain conditions, e.g., snow coverage, radio-frequency interference, and dense vegetation. In these cases, the retrievals can be masked using flagging algorithms. Currently available single- and multi-sensor soil moisture products utilize different flagging approaches. However, a clear overview and comparison of these approaches and their impact on soil moisture data are still lacking. For long-term climate records such as the soil moisture products of the European Space Agency (ESA) Climate Change Initiative (CCI), the effect of any flagging inconsistency resulting from combining multiple sensor datasets is not yet understood. Therefore, the first objective of this study is to review the data flagging system that is used within multi-sensor ESA CCI soil moisture products as well as the flagging systems of two other soil moisture datasets from sensors that are also used for the ESA CCI soil moisture products: the level 3 Soil Moisture and Ocean Salinity (SMOS) and the Soil Moisture Active/Passive (SMAP). The SMOS and SMAP soil moisture flagging systems differ substantially in number and type of conditions considered, critical flags, and data source dependencies. The impact of the different flagging systems on data availability was compared for the SMOS and SMAP soil moisture datasets. Major differences in data availability were observed globally, especially for northern high latitudes, mountainous regions, and equatorial latitudes (up to 37%, 33%, and 32%, respectively) with large seasonal variability. These results highlight the importance of a consistent and well-performing approach that is applicable to all individual products used in long-term soil moisture data records. Consequently, the second objective of the present study is to design a consistent and model-independent flagging strategy to improve soil moisture climate records such as the ESA CCI products. As snow cover, ice, and frozen conditions were demonstrated to have the biggest impact on data availability, a uniform satellite-driven flagging strategy was designed for these conditions and evaluated against two ground observation networks. The new flagging strategy proved to be a robust alternative to the individual flagging strategies adopted by the SMOS and SMAP soil moisture datasets, with a similar performance but with applicability to the entire ESA CCI time record without the use of modelled approximations.


Introduction
Recognized as an Essential Climate Variable in 2010, soil moisture is one of the primary drivers of water, energy, and carbon cycles [1,2]. Climate change-induced changes in these cycles may have a more profound effect on humans and nature than global warming itself [3]. For this reason, consistent, long-term soil moisture records are key to fully understanding the impact of climate change.
Currently, long-term data records (from 1978 onwards) of soil moisture, such as the soil moisture (SM) datasets of the Climate Change Initiative (CCI) of the European Space Agency (ESA), are provided on a global scale by remote sensing satellites using passive and/or active microwave sensors [3][4][5][6][7]. As the full global climate record is not covered by a single satellite sensor [8], multiple satellite sensor products have to be merged to produce a Climate Data Record (CDR). A CDR is a time series of measurements that is characterized by a sufficient length, consistency, and continuity to capture climate variability and change. However, differences in mission design, sensor system, and retrieval algorithms have led to inconsistencies between satellite SM products [9][10][11].
Over the past decade, several studies aimed to assess and improve the consistency between products [12] by conducting quality comparisons [4,13,14], sensor calibration [3,13,15], uniform SM retrievals [16,17], and the merging of datasets [3,5,7,16,18]. However, to the best of our knowledge, no research studies have been devoted to inconsistencies in flagging in the context of merged microwave-based soil moisture records. Although microwave observations, unlike optical-range observations, do not depend on daylight or cloud-free conditions, reliable retrievals are still limited by a range of conditions. These conditions can be related to weather events, like snow, frost, and heavy precipitation, or to the land cover in the case of densely vegetated, mountainous, or desert regions [5]. In these conditions, retrieved SM data should be accompanied by one or more flags.
A flag is commonly defined as a binary quality indicator, which aims to mark a condition in which the SM retrieval is negatively influenced. A flag can be classified either as critical or as advisory:

1. A critical flag is used by the dataset producer to decide that the retrieved SM value is not considered appropriate for dissemination and hence is replaced by Not a Number (NaN). Therefore, well-performing critical flags lead to a reduction of data availability and an improvement of the data quality;
2. An advisory flag indicates that the data value should be interpreted carefully and can be filtered out by the user.
Flagging inconsistencies in the observational conditions could easily result in the incorrect interpretation of SM values. Therefore, it is important to evaluate the differences in the current flagging strategies of key satellite SM products from dedicated SM missions that form an important input of long-term climate records such as the ESA CCI SM. In this study, we will focus on two passive SM sensors, the Soil Moisture and Ocean Salinity (SMOS) SM and the Soil Moisture Active/Passive (SMAP) SM datasets. The two single sensor datasets are chosen because of their key importance for long-term climate records (especially for low-frequency observations) [19][20][21]. In addition, they are the only two input products from which auxiliary model data are extracted to flag frozen surfaces or snow in ESA CCI SM. To give an example of the flagging differences between the two sensor datasets, SMOS SM has seven snow flags, four frozen soil flags, one frozen vegetation, and two ice flags, whereas SMAP SM has two snow/ice flags and two frozen soil flags. It would be cumbersome for the individual user to remove all flagging inconsistencies, as it would require much research to assess the data source dependencies and accuracy of all flags.
A flagging system for multiple satellite SM sensor datasets already exists for the ESA CCI data products, which filters the SM values after the merging of individual sensor products. To provide insights into the flagging strategies on both the single sensor and multi-sensor level, we will provide an overview of the flagging systems of SMOS, SMAP, and ESA CCI SM.
Differences in flagging systems have multiple consequences. One consequence is that the spatial and temporal data availability and quality may vary between datasets, resulting in differences in climatologies (and hence in anomalies). For areas characterized by a snow or a (heavy) rain season, SM data availability and quality at the beginning or end of the season are expected to differ considerably between datasets due to differences in flagging methods and threshold values, because in these periods the flagged quantities are relatively close to their thresholds. Figure 1 shows an example of insufficient flagging of snow and/or frozen soil conditions, in which the ESA CCI SM product exhibits unrealistic dry anomalies in April 2017 at the Northern Hemisphere permanent snow/ice boundaries. Another consequence is that model-based flags make the related satellite data product dependent on model data, which decreases its usefulness for independent model evaluation purposes [5,22].
The purpose of this paper is to present an overview of the current data flagging inconsistencies in the context of long-term ESA CCI SM datasets and to provide an alternative model-independent solution for the detected inconsistencies. The alternative flagging algorithm is designed based on the research of Grody [23], Zhao et al. [24], and Jin et al. [25] for snow, ice, and frozen soil conditions and can be applied to the individual sensor products for the entire ESA CCI SM record. Whereas the ESA CCI SM currently only uses a flag that is partially based on model data for frozen surfaces and snow, the new flagging algorithm is based on the physical relationship between the emissivity and frozen water molecules in snow, ice, and frozen soils. To investigate the capability of this algorithm to fully capture snow, ice, and frozen events, we have compared it to the more extensive flagging strategies for snow, frozen, and ice conditions within SMOS and SMAP SM. The comparison was two-fold: it included both an assessment of data availability impact and an evaluation of the performance against in situ measurements of 399 stations in North America.

Satellite Data
A series of satellites with passive microwave sensors, including the Advanced Microwave Scanning Radiometer 2 (AMSR2) [26], the SMOS mission [27], and the SMAP mission [28], have now been in orbit for multiple years. These sensors have been successfully monitoring SM conditions globally based on brightness temperature (Tb) measurements, which resulted in different publicly available SM products. Only SM products based on passive microwave observations were included in this study to reduce complexity. The analysis used SMOS version 300, SMAP version 6, and the ESA CCI version 4.5 SM products and the corresponding flagging systems. Of these three, the ESA CCI SM was not included in the comparison of the flagging impact, as disentangling the impacts of filtering at the sensor level and those at the multi-sensor level was not within the scope of the paper. For the comparison and evaluation of the new flagging strategy, the Tb data products of AMSR2 were used.

AMSR2 Tb Sensor Data Products
AMSR2, launched in May 2012 on the first-generation satellite of the Global Change Observation Mission - Water (GCOM-W or "SHIZUKU"), is a multi-frequency microwave radiometer system [29,30]. The instrument is the successor of JAXA's Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) on board NASA's Aqua satellite, the first passive microwave sensor (May 2002-October 2011) that was broadly used for SM retrieval from Tb [31][32][33][34]. AMSR2 measures horizontally (H) and vertically (V) polarized Tb, with descending/ascending equatorial overpasses at 1:30 A.M./P.M. local time. The resolution of the sensor footprint varies across the seven frequencies at which AMSR2 measures (6.9, 7.3, 10.7, 18.7, 23.8, 36.5, and 89.0 GHz), from approximately 5 km (89 GHz) to 60 km (6.9 GHz) [35,36]. Here the AMSR2 L1R dataset Version 2.220.220 for the period 01 January 2016 to 31 December 2019 was obtained from https://gportal.jaxa.jp/gpr/ and was used at a gridded spatial resolution of 0.25 degree.

The SMOS SM Sensor Data Products
The first satellite that was dedicated and designed for measuring surface SM and sea surface salinity was the SMOS satellite launched in 2009 by the European Space Agency (ESA). The satellite boards a polar-orbiting 2-D interferometric radiometer at L-band (1.4 GHz) [4,27,37]. This frequency is not only chosen because it is the most favorable frequency for retrieving SM [38] and sea surface salinity, it is also the only protected frequency below 5 GHz [39,40]. SMOS follows a sun-synchronous orbit with local time 6 A.M. (ascending)/P.M. (descending) with a revisit time between 1 and 3 days [41]. The swath width of SMOS is 1000 km and the nominal resolution is 25 km to 50 km depending on the position of the footprint within the field of view [42].
Level 3 (L3) ascending SM data from the Centre Aval de Traitement de Données SMOS (CATDS) provided in a 25 km Equal Area Scalable Earth (EASE) grid Version 2 were used for this study [41,43,44]. The global L3 SM is composed of multi-orbit retrievals. Each retrieval derives SM from the multi-angular (0-55 degree) L-band Tb measurements for two orthogonal polarizations [45].

The SMAP SM Sensor Data Products
The SMAP mission, launched by the National Aeronautics and Space Administration (NASA) in 2015 [26], was designed to globally monitor the SM of the Earth's surface. The SMAP satellite includes both a passive radiometer and an active radar at L-band that measure the status of the hydrosphere in a sun-synchronous orbit with local time 6 A.M. (descending)/P.M. (ascending), with a 1000 km swath, a resolution of about 40 km, and a revisit time of 2-3 days [46].
Due to an irrecoverable hardware failure of the radar, only the radiometer-derived SM product became operational, using the observed Tb [47]. In this study, the Global Daily 36 km EASE-grid passive radiometer Level 3 SM product (L3_SM_P v6, accessible via https://nsidc.org/data/SPL3SMP/versions/6) was used for the period 01 January 2016 to 31 December 2019.

The ESA CCI SM Multi-Sensor Data Products
Within ESA CCI, a set of active and passive Level 2 (i.e., in swath geometry) SM datasets are merged into a harmonized global record based on a thorough understanding of their error characteristics (http://www.esa-soilmoisture-cci.org) [3,5,7,18,48]. To obtain consistency in methodology across the different sensors, only active microwave products from ESA and H-SAF, derived with the TU Wien method [49] and passive microwave products retrieved with the Land Parameter Retrieval Model (LPRM) [16] are currently being used in ESA CCI SM products [5].
In the present study, only the flags of the passive microwave SM product (ESA CCI SM v04.5, http://dx.doi.org/10.5285/38b8e5e524e1449ab4b4994970752644) were considered. For more information on ESA CCI SM products please refer to Dorigo et al. [5] and Gruber et al. [48].

Ground Observations
For the evaluation of the new flagging method, ground measurements from the USDA NRCS Snow Telemetry (SNOTEL; data description and access via https://www.wcc.nrcs.usda.gov/snow/) and Soil Climate and Analysis Network (SCAN; data available at https://www.wcc.nrcs.usda.gov/scan/) networks were used, accessed via the data hosting facility of the International Soil Moisture Network (ISMN, https://ismn.geo.tuwien.ac.at/) [50,51]. At the time of writing (July 2020) the ISMN hosted 65 networks representing more than 2600 stations in 26 countries, with the most coverage in the United States and Eurasia.
The SCAN and SNOTEL networks are updated automatically in near real time, which means that the observations do not undergo quality control before submission to the ISMN. From both networks, the soil temperature (TS) of the top layer (the highest sensor within the first 8 cm), measured at 6 A.M., and the snow water equivalent (SWE) were analyzed in this study. The measurement precision is the same for both networks: 0.1 inches (2.54 mm) for SWE and 0.1 °C for TS [52]. All stations from the SNOTEL and SCAN networks that measured these two variables for the period from 2016 to 2019 were included, which amounts to 390 and nine stations, respectively; their locations are depicted in Figure 2.

Flagging Algorithms
A detailed overview of both the flagging strategy of the multi-sensor ESA CCI SM products and of the underlying sensor SM data products SMOS and SMAP is presented in Tables A1-A3 in Appendix A. In the case of a flag being either critical or advisory based on a threshold, it is classified as critical.

ESA CCI SM Flags
The ESA CCI SM dataset has a flagging system in place that can be applied on the Tb level of all input products. Directly based on LPRM, the flags mask SM products in the case of dense vegetation, snow cover, temperature below 0 °C, and failure of the SM retrieval [53]. In the case that multiple input products are not available at a given time and location (due to, for example, flagging), it is considered whether the remaining dataset(s) still have valid observations that do not exceed the maximum error variance threshold [5]. This flagging approach might result in fewer days being flagged for snow cover and frozen conditions at the beginning and at the end of a winter season. Figure 3 visualizes the "Snow_coverage_or_temperature_below_zero" flag of ESA CCI SM, which is solely based on a threshold for the land surface temperature. For all input products the 36.5 GHz land surface temperature estimates according to Holmes et al. [54] are used, except for SMOS and SMAP, for which auxiliary model-based Ts are used as input [53]. Additionally, dense vegetation is masked according to sensor-specific thresholds, and a land cover map identifying tropical forests masks out SM in the rainforest regions [5,53]. Only the critical flags (i.e., all except for the Open Water Body flag) are within the scope of this paper.

SMOS SM Flags
The flags in the SMOS SM products consist of four categories: Product science flags, event flags, S_Tree_1 flags, and retrieval flags. The event flags and the S_Tree_1 flags are provided to the users within the SM data products [45].
Event detection is for flagging events such as freezing, dew, snow, ice, or flood. Currently, only freezing events of soil and/or vegetation are implemented [44]. However, regions in which the snow layer is thick enough to affect L-band Tb are often excluded from SM retrieval based on masks for highly variable topography or permanent snow cover [44]. The detection of soil freezing is based on both the auxiliary IFS HRES ECMWF forecast data (i.e., the temperature of the first soil layer) and the actual status and temporal evolution of the retrieved parameters (e.g., SM, dielectric constant, and polarization ratio) [44].
Only the "S_Tree_1" flags influence whether SM retrieval takes place or not and are therefore classified as critical in this study. They flag snow, frost, and ice conditions, as well as open water, barren grounds, mountainous, and urban areas [45].

SMAP SM Flags
For SMAP's retrieval algorithm, a variety of global static information is required on conditions such as permanent land, water, forest, urban area, topography, and soil type, to accurately retrieve SM [52]. In addition, dynamic ancillary data of land cover, surface roughness, precipitation, vegetation parameters, and effective soil temperatures are used in the retrieval process of SMAP SM [55]. The SMAP SM flags indicate whether the ground is snow-covered, frozen, flooded, experiencing active precipitation (at the time of the satellite overpass), or has steeply sloped topography [55]. In addition, regions consisting of open water, vegetation with water content greater than 5 kg/m², or urban areas are indicated by flags.
Pixels are flagged based on their snow fraction: exceeding a specific threshold affects SM retrieval processing as described in Table 1, and this flag is therefore considered a critical flag in this study. The same procedure is used for permanent snow/ice fraction, which is indicated in the SMAP ancillary land cover map. Because the SMAP radar failed, it did not deliver information on frozen ground, so the frozen soil flag in the End-of-Prime-Mission data release is still based on modeled temperature information (see Table 1 for thresholds). In the case of a precipitation event, the grid cell only receives an advisory flag as SMAP SM (L2_SM_AP) retrievals still take place [55].

Table 1. Flagging actions per snow/frozen soil fraction for Soil Moisture Active/Passive (SMAP) SM.

Snow/Frozen Soil Fraction | Action
0.00-0.05 | flag for recommended quality and retrieve soil moisture
0.05-0.50 | flag for uncertain quality and attempt to retrieve soil moisture
0.50-1.00 | flag but do not retrieve soil moisture
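
The mapping in Table 1 can be sketched as a small lookup. This is only an illustration, not the SMAP processor's code: the function name and the returned labels are hypothetical, and because Table 1 does not say which bin owns the shared boundaries 0.05 and 0.50, the sketch assumes each bin includes its lower bound.

```python
def smap_fraction_action(fraction: float) -> str:
    """Map a snow or frozen-soil fraction (0..1) to the action listed in Table 1.

    Assumption: lower bounds are inclusive, so 0.05 falls in the
    "uncertain quality" bin and 0.50 in the "no retrieval" bin.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if fraction < 0.05:
        return "retrieve, recommended quality"
    if fraction < 0.50:
        return "retrieve, uncertain quality"
    return "no retrieval"


if __name__ == "__main__":
    for f in (0.01, 0.20, 0.75):
        print(f, "->", smap_fraction_action(f))
```

Only the last bin removes the SM value, which is why the snow and frozen soil flags count as critical in this study.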

A New Flagging Strategy for Snow and Ice
This paper introduces a global flagging strategy that is based on 18.7-36.5 GHz passive microwave satellite observations, which are consistently available from 1978 onwards. The proposed methodology consists of the decision tree depicted in Figure 4, which detects snow cover and frozen conditions. The proposed strategy is based on a refined version of Jin et al. (2015) [25]. As it can be applied in a uniform way to the Tb's of passive microwave satellites, the algorithm resolves flagging inconsistencies and supports model independence. Here the method was applied to AMSR2 Tb's, because these are measured at the three frequencies (18.7, 23.8, and 36.5 GHz) required and in the same period as SMOS and SMAP (2015-present). However, the algorithm can easily be applied to other passive microwave satellites, including the SSM/I(S) constellation, TRMM, and GMI, to extend temporal coverage.
The flagging strategy is based on the emissivity contrast between 36.5 GHz and 18.7 GHz that is present for snow/frozen conditions, which is the result of two physical effects. The first effect consists of the frequency-dependent relationship between the dielectric constant and the presence of unbound water molecules [39,54,56,57]. Whereas the real part of the relative dielectric constant of pure ice is ε ≈ 3.2 across the full microwave domain, for pure water it ranges from ε ≈ 90 at L-band to close to 0 at the highest microwave band (mm-band, 110-300 GHz) [54]. The decrease in dielectric constant (and corresponding increase in Tb) due to freezing is thus bigger for low frequencies compared to high frequencies. The second effect is that greater emission depths due to soil freezing lead to scatter darkening by heterogeneities within the frozen soil [58]. This is more efficient at higher frequencies as the heterogeneities are larger relative to wavelength, resulting in a larger decrease in Tb of high frequencies compared to low frequencies due to freezing. Both effects lead to a larger negative contrast for (Tb,36.5V − Tb,18.7V).
The first threshold value (i.e., <−2.5) was found by analyzing the dynamical patterns of SM and Tb for 40 locations spread out over Canada, Brazil, and Europe. A negative threshold was expected based on the relation between the emissivity contrast and freeze/thaw conditions, which was first described by Zuerndorfer et al. [59,60] in relation to the dielectric constant and by England et al. [58] in relation to volume scattering. Although the contrast in potential emissivity would be larger when comparing a lower (than 18.7 GHz) frequency to a higher (than 36.5 GHz) frequency, the frequencies in the decision tree were chosen for their persistent availability from 1978 onwards.
The use of Tb23.8V to distinguish between snow/frozen conditions and precipitation, after a first threshold that distinguishes on scattering magnitude, follows Grody [23], who used Tb22 for this purpose. The second threshold value (i.e., for 23.8 GHz) was determined by comparing the Tb values to precipitation measurements at the locations in Europe and Canada.
An anomalous (positive) Tb contrast signature was observed in this study for the outer and more southern parts of Greenland which is consistent with Jin [61] and Grody and Bassist [62]. This anomalous signature was explained by the latter through lower scattering at high frequencies compared to low frequencies owing to alternating layers of consolidated and granulated ice. Including a limit on the Tb18.7V for the detection of snow conditions was already implemented in Foster et al. [63] and expected to result in a more homogenous flagging for Greenland, as this frequency is less sensitive to variations in snow geophysical properties (i.e., density, depth, and grain size) [64,65]. Spatial analysis for all seasons showed the best results of a homogenous flagging over Greenland with the third threshold (i.e., <245).
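
The three thresholds described above can be combined in a short sketch, which should be read as an illustration rather than as the decision tree of Figure 4. Only the values −2.5 (contrast Tb36.5V − Tb18.7V) and 245 (Tb18.7V) come from the text. The 23.8 GHz threshold value is not stated here, so it is left as a required, hypothetical parameter `tb23_threshold`. The direction of the 23.8 GHz test (colder means snow/frozen rather than precipitation) and the way the branches are combined are assumptions as well.

```python
import numpy as np


def sfi_flag(tb18v, tb23v, tb36v, tb23_threshold,
             contrast_threshold=-2.5, tb18_threshold=245.0):
    """Sketch of a snow/frozen/ice (SFI) flag from V-polarized Tb in kelvin.

    Assumed logic (not confirmed by the text):
      flag = (Tb36.5V - Tb18.7V < contrast_threshold AND Tb23.8V < tb23_threshold)
             OR Tb18.7V < tb18_threshold
    tb23_threshold is a hypothetical value that the caller must supply.
    """
    tb18v = np.asarray(tb18v, dtype=float)
    tb23v = np.asarray(tb23v, dtype=float)
    tb36v = np.asarray(tb36v, dtype=float)

    # Branch 1: scattering signature (negative spectral contrast) ...
    scattering = (tb36v - tb18v) < contrast_threshold
    # ... kept as snow/frozen only if 23.8 GHz is cold (assumed direction).
    not_precipitation = tb23v < tb23_threshold
    # Branch 2: low 18.7 GHz Tb, intended for areas such as Greenland where
    # the contrast can be anomalously positive.
    cold_18 = tb18v < tb18_threshold

    valid = np.isfinite(tb18v) & np.isfinite(tb23v) & np.isfinite(tb36v)
    return ((scattering & not_precipitation) | cold_18) & valid


if __name__ == "__main__":
    # The threshold 260.0 below is purely illustrative.
    print(sfi_flag([260, 250, 240, 270], [250, 280, 270, 265],
                   [250, 240, 245, 272], tb23_threshold=260.0))
```

Pixels with a missing Tb at any of the three frequencies are returned as unflagged; a production implementation would more likely track them separately as "no decision".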
Here we assume that snow/frozen conditions do not vary much within the time difference between overpasses (on the order of ~1 day), so that an L-band-only sensor dataset can be flagged by applying the flag of another sensor dataset for that day. Note that small inconsistencies could still be present between different sensors and frequencies related to differences in footprint sizes, swath widths, and overpass times. However, we assume that most of these differences are negligible when considering night-time observations only. In addition, we assume that the spatial scale of the weather events related to snow and frozen effects is in most cases larger than the scale of differences in footprint size.

Flagging Differences
The influence of data flagging differences on data availability was analyzed globally for the period 2016-2019. This enabled the detection of where and when the flagging differences have the most impact on any climate study that is based on the SMOS, SMAP, or ESA CCI SM datasets. To represent the impact of flags on data availability, we defined the total flagging intensity as the average number of NaNs in the SM record. The flagging intensity of snow, ice, or frozen conditions was calculated by deriving a combined critical snow, frozen soil, and ice (SFI) flag for each of the two single sensor data products and averaging the number of times the combined flag operates as a mask for SM observations. The SFI flag of the new method was computed by counting the number of times the classification of snow or frozen flag was reached for AMSR2 following the decision tree in Figure 4. Any NaN that was already present before L3 (i.e., due to no valid overpass at the measurement time or filtering steps at L1 or L2 for, e.g., sensor-specific conditions such as radio-frequency interference or sun glint) was excluded. The latitudes with fewer than 50 land pixels were excluded in the analysis of the maximum latitudinal differences in flagging intensity.
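
A minimal sketch of the flagging intensity computation follows. It assumes that the intensity is expressed as a per-pixel fraction of flagged time steps, and that a separate boolean array marks time steps with a usable observation before L3 flagging. The function name and the array layout (time, lat, lon) are illustrative choices, not the processing chain used in this study.

```python
import numpy as np


def flagging_intensity(flagged, observable):
    """Fraction of observable time steps that are masked by a critical flag.

    flagged, observable: boolean arrays of shape (time, lat, lon).
    Time steps that were already NaN before L3 (observable == False)
    are left out of both numerator and denominator.
    Pixels that are never observable are returned as NaN.
    """
    flagged = np.asarray(flagged, dtype=bool)
    observable = np.asarray(observable, dtype=bool)

    n_obs = observable.sum(axis=0)
    n_flagged = (flagged & observable).sum(axis=0)

    intensity = np.full(n_obs.shape, np.nan)
    np.divide(n_flagged, n_obs, out=intensity, where=n_obs > 0)
    return intensity


if __name__ == "__main__":
    flagged = np.array([[[True, True]], [[True, False]],
                        [[False, False]], [[False, False]]])
    observable = np.array([[[True, False]], [[True, False]],
                           [[True, False]], [[True, False]]])
    print(flagging_intensity(flagged, observable))  # first pixel 0.5, second NaN
```

Multiplying the result by 100 gives percentages comparable to the flagging intensities reported for the maps.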

Performance Analysis of the Snow and Ice Flags
An ideally performing flag creates the optimal balance between data quality and data availability. For this reason we studied the performance of the SFI flags of SMAP SM, SMOS SM, and the AMSR2-based method in terms of false positives and false negatives for two types of in situ measured snow and/or frozen conditions. SWE and TS measurements from the SNOTEL and SCAN networks were used to derive the reference snow and/or frozen soil flags, for which we distinguished between two different threshold combinations. The first (t1) consists of SWE > 50 mm and/or TS < 0 °C, representing light snow and/or frozen soil conditions, and the second (t2) is based on SWE > 200 mm and/or TS < −1 °C, representing more pronounced snow and/or frozen soil conditions.
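
The two reference flags can be sketched as follows. The threshold values are those given for t1 and t2; the function and dictionary names are hypothetical, and "and/or" is read as a logical OR. Missing values (NaN) compare as False, so a time step with both variables missing is treated as not flagged in this sketch.

```python
import numpy as np

# Threshold combinations from the text: SWE in mm, TS in degrees Celsius.
T1 = {"swe_min": 50.0, "ts_max": 0.0}    # light snow and/or frozen soil
T2 = {"swe_min": 200.0, "ts_max": -1.0}  # more pronounced conditions


def reference_flag(swe_mm, ts_celsius, swe_min, ts_max):
    """Reference snow/frozen flag: True where SWE > swe_min or TS < ts_max."""
    swe = np.asarray(swe_mm, dtype=float)
    ts = np.asarray(ts_celsius, dtype=float)
    return (swe > swe_min) | (ts < ts_max)


if __name__ == "__main__":
    swe = [0.0, 60.0, 250.0, 10.0]
    ts = [5.0, 2.0, 3.0, -0.5]
    print(reference_flag(swe, ts, **T1))
    print(reference_flag(swe, ts, **T2))
```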
The first threshold was found by studying the time series of the stations, for which we found that the 50 mm threshold in SWE best represents the observed snow season (on average 41% of the time). The 200 mm threshold was chosen because it captures the highest SWE peaks even for the southern stations (on average 24% of the time). For frozen conditions we simply took the lightest threshold possible from a physical point of view (0 °C), and −1 °C as this captured the rare events (≤5% of the time). By considering the SFI flag values as the estimated values and one of the two threshold combinations of SWE and TS as the reference or true values, the number of false positives (FP) and false negatives (FN) can be quantified. The FP numbers indicate whether there was overflagging [66], whereas the FN numbers specify the degree of underflagging. The true positives (TP) and true negatives (TN) reflect the opposite situations, indicating correct representation of a condition that should be flagged or not flagged by the SFI flag value compared to the reference data. The false discovery rate (FDR), the false omission rate (FOR), and the accuracy (ACC) were calculated according to:

$$\mathrm{FDR} = \frac{FP}{FP + TP}, \qquad \mathrm{FOR} = \frac{FN}{FN + TN}, \qquad \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$

By considering the final performance results, we found that t1 and t2 are representative for the type of snow/frozen conditions captured by the flagging systems of all three datasets. The shift in FDR is opposite to the shift in FOR when comparing threshold t1 to threshold t2 for all three datasets, which shows that the optimal threshold lies somewhere between t1 and t2, at a position that differs per SFI flag.
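
As an illustration of the scoring step, the sketch below counts TP, TN, FP, and FN for a pair of boolean series and derives the three scores using their standard definitions, FDR = FP/(FP + TP), FOR = FN/(FN + TN), and ACC = (TP + TN)/(TP + TN + FP + FN). The function name is hypothetical, and scores with a zero denominator are returned as NaN by choice.

```python
import numpy as np


def flag_scores(estimated, reference):
    """Confusion counts and FDR, FOR, ACC for a flag against a reference flag."""
    est = np.asarray(estimated, dtype=bool)
    ref = np.asarray(reference, dtype=bool)

    tp = int(np.sum(est & ref))    # flagged and should be flagged
    tn = int(np.sum(~est & ~ref))  # not flagged and should not be
    fp = int(np.sum(est & ~ref))   # overflagging
    fn = int(np.sum(~est & ref))   # underflagging

    nan = float("nan")
    total = tp + tn + fp + fn
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "FDR": fp / (fp + tp) if (fp + tp) else nan,
        "FOR": fn / (fn + tn) if (fn + tn) else nan,
        "ACC": (tp + tn) / total if total else nan,
    }


if __name__ == "__main__":
    sfi = [1, 1, 0, 0, 1, 0]
    ref = [1, 0, 0, 1, 1, 0]
    print(flag_scores(sfi, ref))
```

A high FDR points to overflagging (data lost unnecessarily), while a high FOR points to underflagging (poor-quality data left in the record).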

Differences in Flagging Systems
The flagging systems at the single sensor level for SMOS and SMAP, and at the multi-sensor level for ESA CCI SM, were compared in terms of the number of flags and the types of data source used for these flags. Seven flagging categories were defined based on mutual occurrence: "Snow/frozen", "Precipitation", "Open water", "RFI", "Urban areas", "Vegetation", and "Topography". The "Other" category includes all residual flags, among which are coastal flags, wetland flags, and flags related to unrealistic SM results. For a summary overview of the flagging differences, please see Table 2 (for more details please see Tables A1-A3 in Appendix A).
Figure 5 demonstrates the diversity in the number of flags per category and in total. The single sensor SM datasets SMOS and SMAP both have a high complexity in flagging, with approximately seven to eight times as many flags as the ESA CCI SM. The category with the highest number of flags considering all datasets is the "Snow/frozen" category, with 11 out of the 31 critical flags (i.e., ~36%). SMOS SM has more flags in the "Snow/frozen", "Open water", "Urban areas", and "Topography" categories than SMAP and ESA CCI SM. In contrast, SMAP SM has the most flags in the "Other" category, dealing with many quality aspects. However, the number of critical flags is highest for SMOS SM (i.e., 15). Every flag within ESA CCI SM except one is critical, and it has multiple flags in none of the seven categories. Although not within the scope of this paper, it would be relevant to review the quality of the different flagging systems for all conditions. This would make it possible to assess the balance between more complex, but likely more accurate, flagging systems (such as those implemented for SMOS and SMAP SM) and simpler, but likely less accurate, flagging systems (such as the one for ESA CCI SM).
Figure 6 shows the differences in the data sources.
In line with the sensor-specific nature of RFI, only internal data sources are used for the critical flags related to this condition. On the contrary, for the categories "Topography", "Open water", "Precipitation", and "Urban areas" only flags based on external data sources are implemented. In addition, "Precipitation" is the only category with only one critical flag in all three flagging systems (i.e., in SMAP SM), which could be related to the level of difficulty of measuring or modeling this dynamic component. While SMOS and SMAP SM have similar data source types for "Topography", "Urban", and "Open water", the specific data sources are still different (see Tables A1-A3 in Appendix A). The "Snow/frozen" category is targeted with the highest number of data source types. It is also the only one within the ESA CCI SM flagging system that is partially model-dependent. Snow and frozen conditions are separately flagged by both SMOS SM and SMAP SM, so one could argue to use two flagging categories. However, the complexity in the number of snow/frozen/ice conditions varies considerably between the three datasets, ranging from 15 flags for SMOS SM to one for the ESA CCI SM. Besides, the flagging strategy proposed here masks both the influence of frozen soil water and of snow particles on the microwave signal. For these reasons we decided to take the snow and frozen conditions as one category. The frozen ground condition of SMAP SM is based on both model results and on its own official Freeze/Thaw State product [28]. The latter is based directly on the Tb's but is only advisory. SMOS SM delineates between mixed, dry, and wet snow and between frost and ice.

The Total Flagging Impact on Data Availability
Figure 7a demonstrates that flagging affects the data availability of both datasets the most at northern high latitudes above 55 • N (i.e., Greenland, North America, Scandinavia, and Russia), with mean flagging intensities of 80% and 68% for SMOS SM and SMAP SM, respectively, and in mountainous areas such as the Himalayas and the Rocky mountains. The high flagging intensities for the northern high latitudes are attributed to their permanent (i.e., 100% flagging intensity for Greenland) and seasonal snow cover, frozen conditions, and land ice.
However, the overall flagging impact clearly differs between the two datasets ( Figure 7a). For SMOS SM, more values are flagged in general and with a larger spatial variability. In addition, some spots in South America, in particular the Andes, the Congo basin in Africa, India, and Oceania are also marked by a high flagging percentage in the SMOS SM dataset. The likely explanation for these differences is the different flagging thresholds for topography, vegetation, and open water. Surprisingly, the "Topography" flagging condition cannot be identified in the total flagging impact on SMAP for the mountainous areas.
Comparing the global data availability under all critical flags with the availability under the critical snow/frozen flags alone shows that, for SMOS SM, ~64% of the data unavailability can be explained by the latter. For SMAP SM, the contribution of snow/frozen flags to the total data unavailability is almost 15 percentage points higher, namely ~78%. However, this percentage should be slightly lower, as an internal error was found in the critical snow flag, which will be fixed in a future version of the product. Nonetheless, the differences in snow and frozen flags contribute significantly to the flagging inconsistencies and therefore have to be considered carefully to understand differences in the climatologies (and hence anomalies) of the SM products. Although SMOS SM has critical flags for urban areas, their impact is not clearly visible in Figure 7a. The same holds for the critical flags of dense vegetation and precipitation in SMAP SM. Dense forests (i.e., the Amazon) and/or the intertropical convergence zone, which is characterized by intense rainfall, can be detected from the SMOS SM spatial map (Figure 7a). SMOS SM has a critical flag for strong RFI at the brightness temperature level, but no RFI patterns show up in the temporally aggregated flagging intensity maps. RFI flags that filter the brightness temperature products before L3 are not included in this analysis.
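The two quantities compared in this paragraph can be illustrated with a minimal sketch. It assumes that flagging intensity is the percentage of time steps at which a pixel is masked, and that the snow/frozen contribution is the number of observations masked by snow/frozen flags divided by the number masked by any critical flag. The function names and the boolean series are hypothetical and are not taken from the SMOS or SMAP products.

```python
def flagging_intensity(flag_series):
    """Percentage of time steps at which a pixel is masked."""
    flagged = sum(1 for f in flag_series if f)
    return 100.0 * flagged / len(flag_series)


def snow_frozen_share(all_critical, snow_frozen):
    """Share (%) of all masked observations that snow/frozen flags mask on their own."""
    n_all = sum(1 for f in all_critical if f)
    n_snow = sum(1 for f in snow_frozen if f)
    if n_all == 0:
        return 0.0  # nothing is masked, so there is nothing to attribute
    return 100.0 * n_snow / n_all


# Hypothetical 10-step series for one pixel (True = masked).
all_critical = [True, True, True, False, True, False, False, True, False, False]
snow_frozen = [True, True, False, False, True, False, False, False, False, False]

total_intensity = flagging_intensity(all_critical)
share = snow_frozen_share(all_critical, snow_frozen)
print(total_intensity, share)
```

In this toy series half of the observations are masked, and three of the five masked observations are attributable to snow/frozen flags.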
As is evident from Figure 7b, the level of data availability is seasonally dependent for the regions characterized by snow cover, frozen conditions, and land ice. As expected, the least amount of data is available for the Northern Hemisphere (NH) during the December-January-February (DJF) and March-April-May (MAM) seasons for both datasets and vice versa for the Southern Hemisphere (SH). For SMOS and SMAP SM, a local maximum can be detected for the northern mid-latitudes in June-July-August (JJA), MAM, and September-October-November (SON) around 36-37° N (36-45% and 5-17% for SMOS and SMAP, respectively), corresponding to the latitudinal orientation of the Himalayas. A steep south-to-north gradient in flagging intensity is also observed for both SMOS and SMAP in the NH, starting from a local minimum at 47-48° N in JJA, at 40° N in MAM and SON, and without local minima in DJF, consistent with seasonal latitudinal shifts of the snow/ice boundary.
The maximum difference in flagging impact (~37%) is found for JJA around 63° N. This might be explained by a difference in sensitivity towards wet snow and would imply that SMOS SM flags more pixels for wet snow conditions than SMAP SM. In MAM, the flagging intensity of the two datasets differs the most (~33%) around the latitude where they both have the local maximum (i.e., 37° N). In SON and DJF, the maximum difference in flagging intensity (~32%) is detected around the Equator (~1° S), which corresponds to the location of the ITCZ/tropical region that is characterized by heavy precipitation and dense vegetation. This difference is explained by relatively high data unavailability for SMOS SM and high data availability for SMAP SM. This result is noteworthy as SMOS SM does not even have critical flags for precipitation or dense vegetation, while SMAP SM's documentation states that it has critical flags for precipitation and dense vegetation (Table 2). Regarding SMOS SM, the SM retrieval does not always converge for these regions.
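The latitudinal comparison described above can be sketched as follows. This is only an illustration under stated assumptions: flagging intensity maps are taken to be small latitude-by-longitude grids in percent, with `None` marking cells without land data, and the grids below contain made-up values rather than SMOS or SMAP results. The helper names are hypothetical.

```python
def zonal_mean(grid):
    """Average each latitude row of a 2-D grid, skipping None cells."""
    means = []
    for row in grid:
        valid = [v for v in row if v is not None]
        means.append(sum(valid) / len(valid) if valid else None)
    return means


def max_zonal_difference(lats, zonal_a, zonal_b):
    """Return (latitude, absolute difference) where two zonal profiles differ most."""
    best = None
    for lat, a, b in zip(lats, zonal_a, zonal_b):
        if a is None or b is None:
            continue  # no land data at this latitude in one of the datasets
        diff = abs(a - b)
        if best is None or diff > best[1]:
            best = (lat, diff)
    return best


# Hypothetical seasonal flagging intensity maps (%), rows ordered north to south.
lats = [60.0, 30.0, 0.0]
grid_a = [[80.0, 90.0, None], [40.0, 20.0, 30.0], [50.0, None, 30.0]]
grid_b = [[70.0, 60.0, None], [10.0, 20.0, 30.0], [5.0, None, 15.0]]

profile_a = zonal_mean(grid_a)
profile_b = zonal_mean(grid_b)
print(max_zonal_difference(lats, profile_a, profile_b))
```

Repeating this per season would give one maximum difference and one latitude for each of DJF, MAM, JJA, and SON.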

The New AMSR2-Based Flagging Method in Analyzing the Flagging Intensity
The data availability based on the AMSR2-based flag (AMSR2flag) is compared to the cumulative critical snow/frozen flags of SMOS SM and SMAP SM (see Figure 8), which consist of seven and two flags, respectively. Similar spatial patterns in snow/frozen flagging intensity (SFFI) are visible for all three datasets, and all three have snow/frozen flagging systems that flag the northern high latitudes and the Himalaya mountain area (Figure 8a). Nonetheless, there are some considerable differences between the three datasets in the regions depicted by specific spatial patterns and in the magnitude of SFFI impact. The Andes and the Caucasus ranges are clearly masked for SMOS SM and AMSR2flag, which is not the case for SMAP SM. Nevertheless, SMOS SM and AMSR2flag differ in the spatial pattern of masked snow/ice in the Andes. For SMOS SM, the intensity gradient is oriented southward, while for AMSR2flag it is oriented northward. The flagging intensity of the AMSR2flag is evidently low for open water bodies, which is not the case for SMOS SM and SMAP SM, for which the combined snow/frozen flags depend on modeled temperature. In general, the magnitude of SFFI impact is highest for SMOS SM and lowest for AMSR2flag. Please note that the relatively low AMSR2flag SFFI is likely related to its simplicity and to the fact that it mainly depends on the contrast in frozen/unfrozen states. Figure 8b demonstrates that the highest impact of flagging inconsistency in snow/frozen flagging intensity is found for the northern high latitudes, which is in line with our previous findings. Within each dataset and between the three datasets, large variability is visible in the SFFI magnitude for this region. In MAM, JJA, and SON there are local maxima in the SFFI of the three datasets at 36° N, where SMOS flags the most intensely and SMAP the least. This is consistent with the earlier finding of local maxima in the total flagging intensity of SMOS and SMAP, from which we can now conclude that this difference is (partially) the result of differences in snow/frozen flagging.
The maximum seasonal variability of SFFI is 76% at ~68° N latitude for SMOS SM, 93% at 58° N latitude for SMAP SM, and 93% at ~67° N latitude for AMSR2flag. The latitudes of maximum seasonal variability in SFFI impact of SMOS SM and AMSR2flag are in line with the average latitude of the NH snow/ice boundary, whereas it is situated ~10 degrees further south for SMAP SM.
The maximum difference of SFFI impact on data availability between any two out of the three datasets is 55% in JJA at ~75° N, which is mainly related to the relatively low flagging intensity of AMSR2flag at high northern latitudes in summer. This could confirm the previous suggestion that AMSR2flag is more sensitive to contrast in frozen/unfrozen states and therefore does not mask pixels that are covered by wet snow and have a partially unfrozen top layer in JJA. This is not the case for SMOS SM and SMAP SM, as their individual snow/frozen flags do not depend on the emissivity but on the land cover (including permanent snow/ice) or modeled temperature. An extension of the physically based algorithm or a combination with land-cover data of permanent snow cover could extend the AMSR2flag to cover all snow and frozen conditions. Nonetheless, the quality of SM under these circumstances and/or the true snow/frozen conditions should first be investigated to decide whether these conditions need to be masked or not.
In DJF the maximum difference is 26% at 58° N and can be mainly attributed to relatively low SFFI impact for SMOS SM (between 51-58° N) compared to SMAP SM and AMSR2flag. The maximum difference is 18% in MAM and situated at 42° N, which reflects the lack of a local maximum for SMAP at this latitude, whereas one is visible in both SMOS SM and AMSR2flag. This difference is also observed for the JJA and SON seasons. The smallest maximum difference (16%) is found for SON at 73° N and is related to slightly lower SFFI flagging of AMSR2 compared to SMOS and SMAP SM.
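The two summary statistics used in the preceding paragraphs can be sketched in a few lines. The sketch assumes that "seasonal variability" means the spread (maximum minus minimum) of the zonal SFFI across the four seasons at one latitude, and that the "maximum difference between any two datasets" is the spread across datasets at one latitude within one season. These definitions, the helper names, the dataset labels, and all numbers are illustrative assumptions, not values from Figure 8.

```python
def spread(profiles):
    """Per-latitude max minus min over the profiles stored in a dict."""
    return [max(vals) - min(vals) for vals in zip(*profiles.values())]


def largest(lats, values):
    """Return (latitude, value) for the largest entry of a latitudinal profile."""
    idx = max(range(len(values)), key=lambda i: values[i])
    return lats[idx], values[idx]


lats = [75.0, 65.0, 55.0]

# Hypothetical zonal SFFI (%) of one dataset, one profile per season.
sffi_by_season = {
    "DJF": [100.0, 95.0, 80.0],
    "MAM": [100.0, 85.0, 40.0],
    "JJA": [60.0, 10.0, 0.0],
    "SON": [95.0, 60.0, 15.0],
}
seasonal_variability = spread(sffi_by_season)

# Hypothetical zonal SFFI (%) in one season, one profile per dataset.
jja_by_dataset = {
    "dataset_a": [60.0, 10.0, 0.0],
    "dataset_b": [55.0, 20.0, 5.0],
    "dataset_c": [10.0, 5.0, 0.0],
}
dataset_difference = spread(jja_by_dataset)

print(largest(lats, seasonal_variability))
print(largest(lats, dataset_difference))
```

Taking the spread across datasets is equivalent to taking the largest pairwise difference, because the largest difference between any two values in a set is always the maximum minus the minimum.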

Performance Evaluation with Ground Observations
In order to place the previous results in perspective, the performance of the new snow/frozen flagging method applied on AMSR2 must be evaluated in comparison to the SFI flags of SMOS SM and SMAP SM. An exact match is not expected because of the spatial representativeness error of comparing SFI flags derived from large satellite footprints with single-site sensors. Nonetheless, this analysis provides a good overview of the quality characteristics of the different datasets. Figure 9 shows that the highest accuracy levels correspond to t1 for all SFI flag datasets, suggesting that the SFI flags also represent the light snow/frozen conditions. AMSR2flag performs above average under both threshold conditions (median accuracy of ~78% and ~74%, respectively). The SMAP SFI flag has a slightly higher accuracy for both t1 and t2 (~83% and ~76%, respectively), whereas the SMOS SM SFI flag scores lower (~65% and ~51%, respectively). On the one hand, the lower flagging intensity of AMSR2flag observed in Figure 8 is in line with its higher FOR compared to the SMOS SM and SMAP SM SFI flags. On the other hand, it appears that the high flagging intensity of SMOS SM is relatively more problematic, considering that its FDR is significantly higher than those of SMAP SM and AMSR2flag.
For all three SFI flags, more outliers are detected in the FDR than in the FOR. This indicates that there are more stations with an exceptionally high number of FP than there are stations with an exceptionally high number of FN. As expected, there are more outliers in stations with lower accuracy values than in stations with higher accuracy values.
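As an illustration of the three performance metrics, the sketch below compares a satellite flag series with an in situ reference at one station. It assumes the standard definitions: accuracy = (TP + TN) / N, false discovery rate FDR = FP / (FP + TP), and false omission rate FOR = FN / (FN + TN). The reference criterion follows the wording used in the Conclusions (SWE > 50 mm and/or LT < 0); treating LT as a temperature in degrees Celsius, the function names, and all station values are assumptions made for this example.

```python
def reference_snow_frozen(swe_mm, temp_c, swe_threshold_mm=50.0):
    """In situ reference: snow/frozen if SWE exceeds the threshold or temperature is below 0."""
    return [s > swe_threshold_mm or t < 0.0 for s, t in zip(swe_mm, temp_c)]


def flag_metrics(flagged, observed):
    """Accuracy, FDR and FOR of a binary flag against a binary reference."""
    tp = sum(1 for f, o in zip(flagged, observed) if f and o)
    fp = sum(1 for f, o in zip(flagged, observed) if f and not o)
    fn = sum(1 for f, o in zip(flagged, observed) if not f and o)
    tn = sum(1 for f, o in zip(flagged, observed) if not f and not o)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # FDR: fraction of flagged observations that were not snow/frozen (overflagging).
        "fdr": fp / (fp + tp) if (fp + tp) else None,
        # FOR: fraction of unflagged observations that were snow/frozen (underflagging).
        "for": fn / (fn + tn) if (fn + tn) else None,
    }


# Hypothetical station record with eight overpasses.
swe_mm = [80.0, 60.0, 0.0, 0.0, 5.0, 0.0, 0.0, 55.0]
temp_c = [-3.0, 1.0, 2.0, 4.0, 6.0, -1.0, 3.0, 2.0]
flagged = [True, True, True, False, False, False, False, True]

observed = reference_snow_frozen(swe_mm, temp_c)
metrics = flag_metrics(flagged, observed)
print(metrics)
```

A station becomes an FDR outlier when many of its flagged observations are false positives, and a FOR outlier when many of its unflagged observations are false negatives.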
The maximum difference in SFFI impact between two of the datasets (Figure 8) was found earlier to be largest (93%) in MAM at ~78° N latitude. Although no stations were found for this study at such a high latitude, the most northerly one is situated within 10 degrees, namely at ~70° N in Alaska. Here we expect that some underflagging occurs for the AMSR2flag, while overflagging is likely present in the SMOS SM SFI flag. This suggestion is in line with the significant differences found between the SFI flags of SMOS SM and SMAP SM. In order to investigate whether AMSR2flag is underperforming for higher latitudes relative to lower latitudes and to provide a thorough overview of the spatial pattern in performance, we compared the three performance metrics in space for the threshold of highest performance, t1 (Figure 10). As can be seen from Figure 10, no south-to-north degradation pattern in accuracy is present for AMSR2flag. On the contrary, the performance of AMSR2flag is highest for stations located at the highest northern latitudes (i.e., in Alaska). A northwest-to-southeast line of high FDR numbers in the western to central U.S. is visible for the SMOS SM SFI flag and slightly less visible for AMSR2flag. This could be associated with the average position of the snow boundary.
Comparing the different subfigures of SMAP SM SFI flag and AMSR2flag illustrates very similar spatial patterns in the performance metrics. Coherence between SMAP SM SFI flag and AMSR2flag is even evident from the outliers in FDR (the stations in the eastern U.S. and two in the northwestern U.S.) and in FOR (one at the Colorado-New Mexico border, one in southeast Alaska, and the stations at the Washington-Canada border). The stations in the east of the U.S. are also outliers in FDR for the SMOS SM SFI flag. However, in most cases the outliers in the SMOS SM SFI flag do not resemble the outliers displayed for SMAP SM SFI flag and AMSR2flag; in particular, the stations in the eastern U.S. and some in southern Alaska are marked by lower accuracy.
The percentages of FP and FN averaged over time, directly indicating data gaps and/or quality issues, were also investigated in space for t1 (Figure 11), which strikingly illustrates the different qualities of the three SFI flags. Relatively uniform low numbers of FN in time (<25%) were observed for SMOS SM SFI flag compared to substantial variation of FN in time (between 0 and 50%) for SMAP SM and AMSR2flag. However, Figure 11 displays relatively uniform low numbers of FP in time (<25%, except for two outlying stations in SMAP SM SFI flag) for SMAP SM SFI flag and AMSR2flag, compared to SMOS SM SFI flag (~20-50%, except for high-performing north Alaskan stations). From these two findings it could be inferred that more observations are available in the SM time series of SMAP and AMSR2flag due to less overflagging, while the quality of more of these observations is disputable as a result of more underflagging, in comparison to the SMOS SM datasets. Please note that the differences in overflagging are bigger than the differences in underflagging, which is related to the high percentage of FP in time for the SMOS SM SFI flag dataset for the south Alaskan sites and almost all U.S. sites, especially those in the east of the U.S.
Despite the earlier findings of relatively high FDR numbers for SMAP SM SFI flag and AMSR2flag for the same locations, the percentages of FP in time are less exceptional than for SMOS SM SFI flag, which is in line with the relatively higher accuracy of SMAP SM SFI flag and AMSR2flag compared to SMOS SM SFI flag as demonstrated in Figures 9 and 10. In particular AMSR2flag performed better for these statistics, although the SMAP SM SFI flag outperformed AMSR2flag considering the percentages of FN in time for the stations in east U.S. Thus, this indicates less underflagging in the SMAP SM SFI flag dataset and less overflagging with the AMSR2flag. Considering Alaska, the AMSR2flag has fewer outliers of FP and FN in time than SMAP SM and SMOS SM SFI flags.
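The distinction drawn here between a high FDR and an unexceptional percentage of FP in time can be made concrete with a small sketch. It assumes that "FP in time" and "FN in time" are the percentages of all time steps that are false positives or false negatives, whereas the FDR is normalized by the number of flagged time steps only. The function names and the station series are hypothetical.

```python
def fp_fn_share_in_time(flagged, observed):
    """Percentages of all time steps that are false positives and false negatives."""
    n = len(flagged)
    fp = sum(1 for f, o in zip(flagged, observed) if f and not o)
    fn = sum(1 for f, o in zip(flagged, observed) if o and not f)
    return 100.0 * fp / n, 100.0 * fn / n


def false_discovery_rate(flagged, observed):
    """Percentage of flagged time steps that are false positives."""
    n_flagged = sum(1 for f in flagged if f)
    fp = sum(1 for f, o in zip(flagged, observed) if f and not o)
    return 100.0 * fp / n_flagged if n_flagged else None


# Hypothetical station that is rarely flagged: 2 flags in 20 time steps, 1 of them wrong.
flagged = [True, True] + [False] * 18
observed = [True, False] + [False] * 18

fp_share, fn_share = fp_fn_share_in_time(flagged, observed)
fdr = false_discovery_rate(flagged, observed)
print(fp_share, fn_share, fdr)
```

In this example half of the flags are wrong, so the FDR is high, yet only one in twenty observations is lost to overflagging, which is why a station can be an FDR outlier without showing an exceptional percentage of FP in time.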

Conclusions
This study reviewed and evaluated the flagging systems of three passive microwave SM datasets. Although the multi-sensor and single-sensor flagging systems could not be compared directly, the many flagging differences demonstrated that the flagging of multiple conditions was quite complex. It can be difficult for a data user to find the right information on how the flags are used within each processing chain. Moreover, the quantification of the exact flagging impact on the data availability could not be directly derived from the metadata of the sensor datasets as it currently stands.
The flagging systems differ in the number of flags (from seven to 15 critical flags for ESA CCI SM and SMOS SM, respectively) and underlying data source types. The flagging category with the most critical flags combined across all products (11) was the snow/frozen category. On the one hand, the flagging system of ESA CCI SM could be applied over multiple sensors, resolving inconsistencies. On the other hand, it flags only certain conditions and is more simplified for the snow/frozen conditions compared to the flagging systems of SMOS and SMAP SM. Whereas the current paper showed the potential impact of the consistent ESA CCI SM flagging on the data availability, future studies are recommended to assess the impact of the flagging simplifications on the SM data quality.
Although the flags within the SMOS and SMAP SM products showed the same general spatial pattern with increasing flagging intensity with latitude, the study revealed that SMOS SM flags more extensively and with more seasonal variation than SMAP SM across the globe. Depending on the season, differences were highest for the northern high latitudes, mountainous regions, and equatorial latitudes (up to 37%, 33%, and 32% respectively). This study confirmed that the combined snow/frozen flagging category was the most important, as it was responsible for approximately 64% to 78% of the total data unavailability for SMOS and SMAP SM respectively.
The findings provided compelling evidence that current flagging differences may have significant influence on both seasonal and annual climate studies that are based on long-term SM records. Therefore, such future climate studies should carefully consider the impact of these differences in space and time. From the perspective of an independent long-term record, it would be preferable to replace the multiple model-dependent flags in SMOS (for snow/frozen conditions) and SMAP SM (for snow/frozen conditions and precipitation) with measurement-based flags.
In conclusion, the considerable research effort required of the data user to quantify flagging impact, together with the introduced model dependency and the vast impact of the flagging differences, stresses the importance of a consistent and satellite-based flagging solution. Our proposed strategy for such a solution targets the snow/frozen flagging because it impacts the data availability most. To the knowledge of the authors, this attempt to develop a consistent flagging strategy is the first of its kind. The satellite-based algorithm used in this study is applicable from 1978 onwards and preserves the model-independence of the derived climate records. A limitation of this flagging approach is that the vertically polarized 36.5, 23.8, and 18.7 GHz frequencies must be available for any pixel in time.
The AMSR2-based flag, AMSR2flag, shows spatial patterns similar to those of the flagging systems of SMOS SM and SMAP SM, but notably lower flagging intensities (with maximum differences of up to 55%) were detected for latitudes higher than 60° N in summer and for open water bodies. It is likely that AMSR2flag was less able to mask wet snow layers because of a reduced contrast in potential emissivity. Therefore, extension of the flagging to mask open water bodies and regions that are covered by a wet snow layer is recommended. Thorough evaluation is necessary to assess the reliability of the new flagging strategy for high latitudes.
Based on the comparisons to 399 in situ stations in Alaska and the U.S., we found compelling evidence of reasonable performance (accuracy of ~78% for SWE > 50 mm and/or LT < 0 without correction for spatial representativeness errors) of AMSR2flag and no visible degradation of quality with latitude. For most stations the performance of AMSR2flag ranked in between the performances of the SMOS SM and SMAP SM SFI flags, with most resemblance to SMAP SM in accuracy and in the stations with outliers. Whereas SMAP SM SFI flag and AMSR2flag had higher accuracies and less overflagging (on average ~27% for t1, i.e., more observations available), SMOS SM SFI flag had less underflagging (on average ~13%, i.e., less impact of snow/frozen conditions on the SM quality is expected). Although SMAP SM SFI flag performed slightly better (~2-5%) in the evaluation analysis, AMSR2flag is a simple and physically based algorithm that can be applied directly to the raw satellite signal.
Therefore, AMSR2flag was shown to be a good candidate to reduce model-dependence and flagging inconsistencies for long-term SM records such as ESA CCI SM, as it could flag snow and frozen conditions at a similar level of performance to the extensive set of snow/frozen flags within SMOS and SMAP SM, which rely on external satellite and model data. The proposed flagging strategy could improve climate studies (e.g., of anomalies and trends) based on long-term merged SM records, as artificial breaks and trends in data availability or quality due to flagging inconsistencies will be removed. The reduction in model dependency would also increase the usefulness of these records for benchmarking or model evaluation purposes. Since not all the needed frequency bands are available for all satellite sensors within ESA CCI SM, more research is needed to understand the effects of different footprint sizes/shapes and different acquisition times.
The AMSR2flag could also be used as a complement to the current land surface-based flag for snow and frozen surfaces in ESA CCI SM. Although outside the current scope, to assess the actual potential of the AMSR2flag for ESA CCI SM, it is recommended to compare the data availability impact and performance of AMSR2flag to the current ESA CCI SM flag for snow and frozen ground.
Extending the measurement-based flagging method to flag more conditions is recommended. Future evaluation studies should include the exploration of all active and other passive SM sensors, ground networks on the SH, and focus on seasonal variations in performance.
Author Contributions: M.v.d.V., R.v.d.S. and R.d.J. designed this study, prepared the AMSR2-based datasets, conducted the main analysis and coordinated the writing of the paper. N.R.-F. prepared the SMOS L3 SM dataset and contributed to the writing and revising of the manuscript. A.C. prepared the SMAP SM snow fraction dataset and contributed to the writing and revising of the manuscript. W.P., T.S. and W.D. prepared the soil moisture anomalies of the European Space Agency (ESA) Climate Change Initiative (CCI) soil moisture (SM) COMBINED v04.7 (as input for Figure 1) and contributed to the writing and revising of the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This study has been funded through ESA's Climate Change Initiative (CCI) for soil moisture (grant no. 4000104814/11/I-NB).

Acknowledgments:
The SMOS L3SM products were obtained from the Centre Aval de Traitement des Données SMOS (CATDS), operated for the "Centre National d'Etudes Spatiales" (CNES, France) by IFREMER (Brest, France). A partial contribution to this work was made at the Jet Propulsion Laboratory, California Institute of Technology under a contract with the National Aeronautics and Space Administration. The authors would also like to thank JAXA and ISMN for making Tb, TS, SWE data available online. The authors also wish to thank the guest editor, Lionel Jarlan, and the three anonymous reviewers. Their comments and suggestions have been very helpful for improving the quality of this paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Table A1. Overview of the flagging system of the sensor dataset SMOS SM L3.

Internal-only Datasets
Snow/frozen 14 7 All wet snow, All mixed snow, Wet snow pollution, Mixed snow pollution, All frost, Frost pollution, All ice
* There are three non-binary flags included (i.e., radio-frequency interference, fraction forest, and fraction nominal) for SMOS SM. Regarding radio-frequency interference (RFI), SMOS SM uses up to ~250 Tb measurements at different angles and polarizations for each SM retrieval, so that RFI is only critical if all but a few Tb's are masked due to RFI flags.
* For the flags applied on the ESA CCI SM products, internal-only data sources means that they are directly based on the Tb's of the individual input products via LPRM.