Optimal estimation of sea surface temperature from AMSR-E

: The Optimal Estimation (OE) technique is developed within the European Space Agency Climate Change Initiative (ESA-CCI) to retrieve subskin Sea Surface Temperature (SST) from AQUA’s Advanced Microwave Scanning Radiometer—Earth Observing System (AMSR-E). A comprehensive matchup database with drifting buoy observations is used to develop and test the OE setup. It is shown that it is essential to update the ﬁrst guess atmospheric and oceanic state variables and to perform several iterations to reach an optimal retrieval. The optimal number of iterations is typically three to four in the current setup. In addition, updating the forward model, using a multivariate regression model is shown to improve the capability of the forward model to reproduce the observations. The average sensitivity of the OE retrieval is 0.5 and shows a latitudinal dependency with smaller sensitivity for cold waters and larger sensitivity for warmer waters. The OE SSTs are evaluated against drifting buoy measurements during 2010. The results show an average difference of 0.02 K with a standard deviation of 0.47 K when considering the 64% matchups, where the simulated and observed brightness temperatures are most consistent. The corresponding mean uncertainty is estimated to 0.48 K including the in situ and sampling uncertainties. An independent validation against Argo observations from 2009 to 2011 shows an average difference of 0.01 K, a standard deviation of 0.50 K and a mean uncertainty of 0.47 K, when considering the best 62% of retrievals. The satellite versus in situ discrepancies are highest in the dynamic oceanic regions due to the large satellite footprint size and the associated sampling effects. Uncertainty estimates are available for all retrievals and have been validated to be accurate. They can thus be used to obtain very good retrieval results. In general, the results from the OE retrieval are very encouraging and demonstrate that passive microwave observations provide a valuable alternative to infrared satellite observations for retrieving SST.


Introduction
Sea surface temperature (SST) is an essential climate variable that is fundamental for climate monitoring, understanding of air-sea interactions, and numerical weather prediction. It has been observed from thermal infrared (IR) satellite instruments since the early 1980s, but these observations are limited by their inability to retrieve SST under clouds and biasing from aerosols [1][2][3]. SST observations retrievals. The objective of this study is to develop a retrieval algorithm that provides an optimal and physically consistent retrieval with a specific focus on retrieving SST to be used for generation of a climate data record.
The paper is structured such that Section 2 describes the satellite and in situ data, the matchup database used for developing the OE retrieval and the OE processor. The results are presented in Section 3, and discussion and conclusions are in Sections 4 and 5, respectively.

AMSR-E Brightness Temperatures
JAXA's AMSR-E instrument was launched in May 2002 on NASA's Aqua satellite. The AMSR-E instrument is a conical scanning microwave imaging radiometer that measures both vertical and horizontal linear polarizations at 6.9, 10.7, 18.7, 23.8, 36.5, and 89.0 GHz channels using an antenna diameter of 1.6 m. This suite of channels was chosen to support accurate retrievals of ocean, ice and atmospheric parameters. AMSR-E data are available from June 2002 through October 2011 when the antenna rotating mechanism on the instrument failed. This study uses the spatially resampled L2A swath data product AMSR-E V12 [33], produced by Remote Sensing Systems (RSS) and distributed by NASA's National Snow and Ice Data Center (NSIDC; https://nsidc.org/data/ae_l2a). The spatial resampling is generated by applying the Backus-Gilbert method to the L1A data. The RSS L2A product includes brightness temperatures for all AMSR-E channels that have been calibrated to the RSS version 7 standard, which includes inter-calibration with other satellite radiometers, and a correction to the AMSR-E hot load used during the calibration [34]. Brightness temperatures are re-sampled to the resolution of other channels and the location where the reflection vector intersects the geostationary sphere, used for development of RFI flagging, is included in the dataset. Sun glint angles are also calculated as a part of the RSS L2A AMSR-E V12 files. For this analysis, we use the re-sampling to 6.9 GHz resolution (75 × 43 km) for the five lowest frequencies.

In Situ Observations
The in situ dataset used for algorithm testing and validation is composed of quality-controlled measurements taken from the International Comprehensive Ocean-Atmosphere Dataset (ICOADS) version 2.5.1 [35], and the Met Office Hadley Centre (MOHC) Ensembles dataset version 4.2.0 (EN4) [36]. Observations from drifting buoys constitute the main source of observations. The temperature sensor on a drifting buoy is placed at around 20 cm depth in calm water, although the depth in perturbed conditions is poorly known. Temperature measurements are typically made hourly with an uncertainty from sensor calibration inferred to be about 0.2 • C [37]. MOHC quality control (QC) flags and track flags are provided with the data. See Atkinson et al. [38] for more information on the quality control.
In addition, temperature observations are used from the Argo profiling floats (see e.g., [39]). The data and quality control of these observations are described in Good et al. [36]. In this study, the uppermost temperature observations from the Argo observations have been used, which have a typical depth of 5 m [40] and a very high accuracy, with uncertainties of 0.002 • C [41,42]. Both Argo and drifting buoy observations have previously been used for algorithm development and validation studies [43][44][45][46].

Ancillary Data Fields
The OE method utilizes a priori information about the state of the ocean and atmosphere as first guess. For this study, we used Numerical Weather Prediction (NWP) information from the ERA-Interim NWP data as first guess on the atmospheric and oceanic state [47]. The ERA-Interim SST fields are from the Operational Sea Surface Temperature and Ice Analysis (OSTIA) level 4 SST analysis, Remote Sens. 2018, 10, 229 4 of 24 which is generated from IR and PMW satellite observations blended with in situ data from drifting buoys [19,21]. The larger number of IR satellites and the higher spatial resolution compared to the PMW satellite observations means that the OSTIA analysis is dominated by the IR satellite observations. An independent validation showed a global mean difference (OSTIA-Argo floats) of −0.05 K and standard deviation of 0.55 K (see Section 3.3), which was found when the OSTIA SST analysis fields were compared against observations from Argo floats for 2009-2011 (using the data filtering methods described in Section 2.2.2). For Sea Surface Salinity (SSS), we used the monthly, 0.25 degrees spatial resolution, objectively analyzed mean fields, SSS from the World Ocean Atlas (WOA) 2013 version 2 [48,49]. This climatology was determined from historical salinity data from a wide variety of sources that were carefully quality controlled and objectively analyzed into monthly globally complete maps of SSS. These data were linearly interpolated in time and space to the matchup location.

ESA-CCI Multi-Sensor Matchup Dataset
The basis for the retrieval algorithmic development and tuning is a Multi-sensor Matchup Dataset (MMD) pioneered by the ESA-CCI SST project [50]. The MMD has been constructed as a general dataset for algorithm development and not specifically for OE development. It includes AMSR-E orbital data matched to in situ measurements (drifting buoys and Argo floats) constrained by a maximally allowed geodesic distance and a maximal time difference.
The MMD was created using a Multi-sensor Matchup System (MMS) software that reads in all the in situ observations and finds the corresponding matching satellite observations throughout the full dataset. Matches were only included within a maximal geodesic distance of 20 km and a time difference of maximally 4 h. The spatial distance ensures that the in situ measurement is located within an AMSR-E footprint. The temporal distance balances the need for accurate collocated data with the need for a large number of useable matches.
The collocated AMSR-E data include a 21 by 21 pixel window with the matchup location in the center as well as all variables of the corresponding in situ measurement. The ERA-Interim NWP data were referenced to each AMSR-E pixel and each in situ measurement and spatially interpolated to the data raster using Climate Data Operators (CDO) [51]. This ancillary information includes a subset of the available ERA-Interim variables, covering a time range of −60 h to +36 h around the matchup time. Processing of the matchup dataset has been performed on the Climate and Environmental Monitoring from Space Facility (CEMS) computing facility at the Centre for Environmental Data Analysis (CEDA).

Data Filtering Methods
Developing an accurate retrieval algorithm relies on the quality of the satellite observations and auxiliary fields used for the retrieval and validation. It is therefore essential to flag erroneous matchups as accurately as possible and produce a clean dataset for the development.
Data have been flagged if any of the following quality flags were set to fail: AMSR-E pixel data quality, AMSR-E scan data quality, MOHC QC flag and MOHC track flag. If the brightness temperature was outside the normal range (0-320 K) for any of the channels, the data were flagged. To discard cases where the atmospheric contribution (largest for the 18-36 GHz channels) exceeds the information from the surface, data were flagged if the difference between the H and V brightness temperatures for the 18-36 GHz channels <0 K (for valid oceanic retrievals V should always be larger than H). Data were also flagged based on the spatial standard deviation of the 23V, 23H, 36V and 36H GHz brightness temperatures in the 21 × 21 pixel extracts surrounding each matchup. If the standard deviation of these channels were higher than 55, 35, 25 and 25 K, respectively, the data were flagged to remove obviously bad observations. Additionally, matchups were excluded if the ancillary data seemed to be erroneous. If in situ or NWP SSTs were less than −2 • C or greater than 40 • C or if NWP wind speeds Remote Sens. 2018, 10, 229 5 of 24 were greater than 20 m·s −1 then the matchup was flagged. All these flags have been combined into a gross error flag, which in total removes 13.1% of the drifter matchups.
Additional filtering was added to account for various situations where the SST retrieval could be compromised, namely during: land and ice contamination (due to antenna side-lobe contamination), sun glitter contamination, geostationary satellite and ground-based RFI, diurnal warming, and rain contamination. To avoid contamination due to land and ice, the AMSR-E land/ocean flag and NWP sea ice fraction were used to flag data. Applying this filter alone resulted in a flagging of 13.1% of the drifter matchups, which had already passed the gross error check. To avoid sun glitter contamination, data with sun glint angle <25 • were flagged (9.6%). Potential contamination due to RFI was detected using the observation location (for ground based RFI) and reflection vector (for geo-stationary RFI) using Table 2 presented in Gentemann et al. [52] (6.5%). To avoid diurnal warming effects, daytime data with NWP wind speeds <4 m·s −1 were flagged (8.0%). Rain contamination was accounted for by flagging data if the brightness temperature of the 18V channel >240 K (0.4%). Applying all these checks at once leads to an elimination of 41.1% of the total drifter matchups.
Finally, to obtain a more equal latitudinal distribution of the drifter matchups, a limit of 40,000 matchups per degree of latitude was imposed, removing an additional 12.2% of the drifter matchups. The summary statistics for different steps in the flagging process are shown in Table 1. The outcome is a high-quality, globally representative, final drifter matchup database. The focus for this study is the year 2010, which consists of 3,764,798 filtered drifter matchups that are used in the following. The final number of matchups per month during 2010 is shown in Figure 1. Figure 2a shows the geographical distribution of the matchups per square kilometer after all checks have been applied. Figure 2b shows the latitudinal distribution of matchups before and after the even-out-by-latitude filter has been applied, where the red line denotes the maximum allowed matchups per latitude.
Several subsets of the filtered drifter matchups have been applied throughout this study. The subsets were randomly selected from the filtered drifter matchups. This approach was chosen to reduce computational efforts and very small effects were seen on the final results, when the size of the subset was increased or reduced.  Several subsets of the filtered drifter matchups have been applied throughout this study. The subsets were randomly selected from the filtered drifter matchups. This approach was chosen to reduce computational efforts and very small effects were seen on the final results, when the size of the subset was increased or reduced.
The matchups with Argo floats are limited to 95,240, prior to quality filtering. The gross error check removes 18.9% of the Argo matchups. Removing matchups contaminated by land/ice, RFI, rain, and diurnal warming effects sums up to a total removal of 39.3%, leaving 57,810 Argo matchups to be used in the following. Due to a more equal latitudinal matchup distribution and the limited number of matchups, the even-out-by-latitude filter has not been applied here. Figure 3a shows the geographical distribution of the Argo matchups per square kilometer and Figure 3b shows the latitudinal distribution of the Argo matchups after all checks have been applied.  Several subsets of the filtered drifter matchups have been applied throughout this study. The subsets were randomly selected from the filtered drifter matchups. This approach was chosen to reduce computational efforts and very small effects were seen on the final results, when the size of the subset was increased or reduced.
The matchups with Argo floats are limited to 95,240, prior to quality filtering. The gross error check removes 18.9% of the Argo matchups. Removing matchups contaminated by land/ice, RFI, rain, and diurnal warming effects sums up to a total removal of 39.3%, leaving 57,810 Argo matchups to be used in the following. Due to a more equal latitudinal matchup distribution and the limited number of matchups, the even-out-by-latitude filter has not been applied here. Figure 3a shows the geographical distribution of the Argo matchups per square kilometer and Figure 3b shows the latitudinal distribution of the Argo matchups after all checks have been applied. The matchups with Argo floats are limited to 95,240, prior to quality filtering. The gross error check removes 18.9% of the Argo matchups. Removing matchups contaminated by land/ice, RFI, rain, and diurnal warming effects sums up to a total removal of 39.3%, leaving 57,810 Argo matchups to be used in the following. Due to a more equal latitudinal matchup distribution and the limited number of matchups, the even-out-by-latitude filter has not been applied here. Figure 3a shows the geographical distribution of the Argo matchups per square kilometer and Figure 3b shows the latitudinal distribution of the Argo matchups after all checks have been applied.

Optimal Estimation Development
The OE method can be used to retrieve geophysical parameters (e.g., SST) from PMW observations [29]. The relationship between the geophysical parameters and the measured brightness temperatures can be generalized to the following expression [53]: where y is the measurement vector (observed microwave brightness temperatures); F(x) is the nonlinear forward model approximating the physics of the measurement, including the surface emissivity and the radiative transfer through the atmosphere [54]; x is the state vector containing the relevant geophysical properties of the ocean and atmosphere; and e is a residual uncertainty term containing uncertainties due to the measurement noise and uncertainties in the forward model. The forward model predicts the top-of-atmosphere microwave brightness temperatures that should be measured by the individual channels of a radiometer given knowledge of the relevant geophysical parameters (x) of the ocean and atmosphere. The forward model used in this study is based on the physical surface emissivity and Radiative Transfer Model (RTM) described in Wentz et al. [54]. The RTM consists of an atmospheric absorption model for oxygen, water vapor and cloud liquid water and a sea surface emissivity model that determines the emissivity as a function of SST, SSS, sea surface wind speed and direction. Some components have been adjusted with respect to Wentz et al. [54]. These include the wind directional signal of sea surface emissivity, which has been suppressed as it did not improve the retrievals; and the fact that we only use the V-and Hpolarizations for the 5 lower frequencies: 6.9, 10.7, 18.7, 23.8, 36.5 GHz.
The aim of this study is to invert Equation (1) to retrieve the most likely state vector x that can reproduce the observed microwave brightness temperatures, y. In this study, the inversion approach follows the OE technique by Rodgers [53] and we broadly follow his conventions.
In OE, a priori information about the expected mean and covariance of the geophysical parameters can be used to put restrictions on the variances of the estimated geophysical parameters and thereby improve the retrieval. In this case, the prior information is NWP fields as described in Section 2.1.3.
Assuming the forward model is a general function of the state, the measurement + model error has a Gaussian distribution, and there is a prior estimate with a Gaussian uncertainty distribution, the maximum probability state x can be found by minimizing the cost function, J:

Optimal Estimation Development
The OE method can be used to retrieve geophysical parameters (e.g., SST) from PMW observations [29]. The relationship between the geophysical parameters and the measured brightness temperatures can be generalized to the following expression [53]: where y is the measurement vector (observed microwave brightness temperatures); F(x) is the non-linear forward model approximating the physics of the measurement, including the surface emissivity and the radiative transfer through the atmosphere [54]; x is the state vector containing the relevant geophysical properties of the ocean and atmosphere; and e is a residual uncertainty term containing uncertainties due to the measurement noise and uncertainties in the forward model. The forward model predicts the top-of-atmosphere microwave brightness temperatures that should be measured by the individual channels of a radiometer given knowledge of the relevant geophysical parameters (x) of the ocean and atmosphere. The forward model used in this study is based on the physical surface emissivity and Radiative Transfer Model (RTM) described in Wentz et al. [54]. The RTM consists of an atmospheric absorption model for oxygen, water vapor and cloud liquid water and a sea surface emissivity model that determines the emissivity as a function of SST, SSS, sea surface wind speed and direction. Some components have been adjusted with respect to Wentz et al. [54]. These include the wind directional signal of sea surface emissivity, which has been suppressed as it did not improve the retrievals; and the fact that we only use the V-and H-polarizations for the 5 lower frequencies: 6.9, 10.7, 18.7, 23.8, 36.5 GHz.
The aim of this study is to invert Equation (1) to retrieve the most likely state vector x that can reproduce the observed microwave brightness temperatures, y. In this study, the inversion approach follows the OE technique by Rodgers [53] and we broadly follow his conventions.
In OE, a priori information about the expected mean and covariance of the geophysical parameters can be used to put restrictions on the variances of the estimated geophysical parameters and thereby improve the retrieval. In this case, the prior information is NWP fields as described in Section 2.1.3.
Assuming the forward model is a general function of the state, the measurement + model error has a Gaussian distribution, and there is a prior estimate with a Gaussian uncertainty distribution, the maximum probability state x can be found by minimizing the cost function, J: where S is a covariance matrix for the measurement and forward model uncertainties, S a is the covariances of the a priori state x a (the a priori guess of the ocean and atmospheric state x). The cost function is a measure of the goodness of the fit to both the measurements (first term on the right) and the a priori state (second term on the right) balanced by the inverse of their relative uncertainties (S and S a ). In this nonlinear case, Newtonian iteration is a straightforward numerical method for finding the zero gradient of the cost function, J. Using Newtonian iteration, the state x that minimizes the cost function can be found by: where S x is the error covariance matrix of the retrieved parameters: The matrix K expresses the sensitivity of the forward model to a perturbation in the retrieved parameters, i.e., it is a matrix consisting of the partial derivatives of the brightness temperatures in a particular channel with respect to each parameter of the state vector. Due to non-linearity, these partial derivatives need to be computed at each iteration (state).

Initial OE Setup
The measurement vector, y, used in our forward model consists of dual polarization observations (v-pol and h-pol) at the 5 lower frequencies: 6.9, 10.7, 18.7, 23.8, 36.5 GHz. Four geophysical parameters are considered to be the leading terms controlling the observed microwave brightness temperatures in the measurement situation (considering open-ocean only): where WS is the wind speed, TCWV is the integrated columnar atmospheric water vapor content, TCLW is the integrated (columnar) cloud liquid water content, and SST is the sea surface temperature. The variations of the retrieved geophysical parameters are restricted by the use of a priori information from NWP about the mean (a priori state) and covariances of the parameters. OE can be considered to be an adjustment of the a priori state vector based on the difference between simulated and observed brightness temperatures. The method takes appropriate account of errors by combining the a priori state vector and the information content in the observed brightness temperatures. The covariance matrix of the geophysical parameters related to x is fixed to: where e WS = 2 m·s −1 , e TCWV = 0.9 mm, e TCLW = 1 mm and e SST = 0.50 K. The uncertainties on the WS, TCVW and TCLW are best estimates based upon published validation results (see e.g., [47,[55][56][57][58]). The SST uncertainty is derived from a comparison against Argo drifting buoys, using the MMD (see Section 3.3). The measurement covariance matrix, S is initially set to a diagonal matrix with all diagonal elements equal to 0.1 K [54]. The retrieved state vector is obtained by performing the Newtonian iteration, as described in Equation (3). For each iteration, the quality can be assessed by comparing the simulated and observed brightness temperatures and requiring these to be consistent within a certain uncertainty. This idea is quantified by the root-mean-square error (RMSE TB ): which is a confidence indicator of how well the simulated brightness temperatures, TB calc fit the observed ones, TB obs . The RMSE TB criteria is chosen here over e.g., χ 2 as it provides an almost linear relationship with the performance of the OE, which will be shown in Section 3. Figure 4a illustrates the mean RMSE TB difference for each iteration using a subset of the drifter MMD. The uncertainty bars mark one standard deviation. A strong reduction in RMSE TB and standard deviation is found by performing the first iteration. The second iteration similarly leads to a decrease in RMSE TB and standard deviation, while the following iterations show no significant improvement on the mean RMSE TB . The usefulness of RMSE TB as a confidence indicator will be further illustrated in Section 3.
To reduce computational cost, a convergence analysis is performed to decide whether a retrieval process has converged to sufficient precision or if further iterations are required. According to Rodgers [53] the most straightforward convergence test is to ensure that the cost function (Equation (2)) is actually being minimized. The change in the cost function between two subsequent iterations will always be small near a cost minimum. Noting that the expected value of the cost function at the minimum is equal to m degrees of freedom (m = 10) an appropriate test would be to require the change between iterations of In addition, ∆J is required to be positive at the final solution.
A maximum of 10 iterations are allowed and a failure to meet the above convergence criterion within 10 iterations leads to an exclusion of the data (<0.1%). Figure 4b shows the number of iterations performed for all drifter matchups during 2010 by applying the convergence criterion. Usually, convergence is reached by iteration 3-4. This setup will be referred to as the initial optimal estimator. Remote Sens. 2018, 10, x FOR PEER REVIEW 9 of 24 which is a confidence indicator of how well the simulated brightness temperatures, TBcalc fit the observed ones, TBobs. The RMSETB criteria is chosen here over e.g., χ 2 as it provides an almost linear relationship with the performance of the OE, which will be shown in Section 3. Figure 4a illustrates the mean RMSETB difference for each iteration using a subset of the drifter MMD. The uncertainty bars mark one standard deviation. A strong reduction in RMSETB and standard deviation is found by performing the first iteration. The second iteration similarly leads to a decrease in RMSETB and standard deviation, while the following iterations show no significant improvement on the mean RMSETB. The usefulness of RMSETB as a confidence indicator will be further illustrated in Section 3.
To reduce computational cost, a convergence analysis is performed to decide whether a retrieval process has converged to sufficient precision or if further iterations are required. According to Rodgers [53] the most straightforward convergence test is to ensure that the cost function (Equation (2)) is actually being minimized. The change in the cost function between two subsequent iterations will always be small near a cost minimum. Noting that the expected value of the cost function at the minimum is equal to m degrees of freedom (m = 10) an appropriate test would be to require the change between iterations of ∆ = − ≪ or ∆ = − < 0.1. In addition, ∆ is required to be positive at the final solution.
A maximum of 10 iterations are allowed and a failure to meet the above convergence criterion within 10 iterations leads to an exclusion of the data (<0.1%). Figure 4b shows the number of iterations performed for all drifter matchups during 2010 by applying the convergence criterion. Usually, convergence is reached by iteration 3-4. This setup will be referred to as the initial optimal estimator.

Improving the Forward Model
The PMW observations in the 6-10 GHz observe the subskin SSTs at ~1 mm depth, which is different from the IR skin SST observations where a cool skin effect is included [59]. For conditions free of diurnal variability signals, which is the case for the filtered MMD used here, we can assume that the PMW observed subskin SSTs are similar to the in situ drifting buoy observations at the nominal depth of ~20 cm [23]. We can thus use the in situ observations from the drifting buoys to validate the algorithm and for improving the forward model. Similarly, we can assume that the OSTIA SST fields, which are foundation temperatures, are similar to the subskin SSTs and the SSTs at 20 cm.

Improving the Forward Model
The PMW observations in the 6-10 GHz observe the subskin SSTs at~1 mm depth, which is different from the IR skin SST observations where a cool skin effect is included [59]. For conditions free of diurnal variability signals, which is the case for the filtered MMD used here, we can assume that the PMW observed subskin SSTs are similar to the in situ drifting buoy observations at the nominal depth of~20 cm [23]. We can thus use the in situ observations from the drifting buoys to validate the algorithm and for improving the forward model. Similarly, we can assume that the OSTIA SST fields, which are foundation temperatures, are similar to the subskin SSTs and the SSTs at 20 cm.
The OE method assumes an unbiased prior and forward model, which is not necessarily the case. The comparison against Argo SSTs has already addressed the bias in the prior NWP SST field with a mean difference of −0.05 K, which has been adjusted for. In order to get a measure of the forward model deficiency, we compare TB calc with TB obs . If we assume optimal forward model input variables and unbiased TB obs , we can regard the TB obs -TB calc differences as inefficiencies in the forward model that we would like to correct for. In other words, we want to use the best available input variables. Therefore, retrieved WS, TCWV and TCLW and in situ SST values are used in the forward model calculations, to bring us as close to true oceanic and atmospheric conditions as possible. The retrieved variables are obtained by running the initial optimal estimator for a subset of 37,242 matchups. These input variables are used to run the forward model once and the difference TB obs -TB calc is calculated for each channel. Part of the observed channel biases may be a result of the difference between the RTM used here and the one used in calibration of the RSS L2A product [60]. In addition, the RTM used here, does not include wind directional effects. Following Merchant et al. [25] cells with TB obs -TB calc differences falling outside the range given by the median ±3 robust standard deviations (RSD) in any of our 10 channels are discarded. Furthermore, only matchups that have passed the convergence test (Section 2.3.2) are included. The derived average TB obs -TB calc differences of the 10 channels range from −0.75 K on 10 GHz H to 0.62 K on 18.7 GHz V and are subsequently used as a constant bias correction of the forward model. In addition to the constant bias correction, an updated error covariance matrix S is calculated from the TB obs -TB calc subset. The updated S used in the following has an average of square root diagonals of 0.20 K, smallest for 10.7 GHz H and 36.5 GHz H (0.09 K) and largest for 6.9 GHz H (0.31 K).
Further steps are taken to improve the forward model used in the retrieval. The updated optimal estimator has been run for the 3,724,216 drifter matchups in 2010. The retrieved WS, TCWV and TCLW values have subsequently been used together with in situ SST to run the forward model once. Similar to the approach used for the constant bias correction, the simulated brightness temperatures are then compared with satellite observations for each channel. To improve the forward model, we use a correction scheme based upon a multivariate regression model. The regression model applies an empirical fit of TB calc -TB obs to analytic functions of in situ SST, retrieved WS and NWP wind direction relative to the azimuthal look, ϕ r . The fitting is done on averaged TB calc -TB obs values for binned data with respect to SST, WS and ϕ r and with binning intervals of: 1 • C, 2 m·s −1 and 15 • , respectively. Only average values from bins with more than 50 members are used when the regression coefficients are determined. Four sinusoidal terms were found to be the most optimal in representing the wind direction biases. The optimal regression model used for the forward model residuals is: with individual coefficients calculated for each channel. This correction is added to the simulated brightness temperatures individually every time the forward model is called using retrieved SST and WS from the latest iteration. Figure 5a-d show the average (solid lines) and standard deviation (dashed lines) of the difference TB calc -TB obs (final iteration) for all channels, and all matchups during 2010, before (black) and after (blue) the empirical bias correction scheme has been applied. Figure 5a indicates a positive bias at high latitudes, no bias at mid-latitudes and a slightly positive bias at the equatorial regions before the empirical bias correction has been applied. The black line of Figure 5b shows an almost linear trend in bias ranging from a positive bias of about 0.5 K in cold waters, no bias at temperatures~20-25 • C and a slightly positive bias for warmer waters, which is in good agreement with Figure 5a. Figure 5c also reveals a systematic bias in the TB calc -TB obs difference with the NWP WS before the empirical bias correction is applied. At low wind speeds little bias is present but with increasing wind speeds the bias rapidly becomes larger. Also, the binned ϕ r statistics reveal a dependency with a positive bias around ϕ r = 250 • that might be related to wind direction effects not included in the forward model. The bottom plots show the number of matchups in each bin (blue curve) and the cumulative percentage of matchups (red curve).
The blue lines of Figure 5a-d are the updated residuals after the empirical bias correction scheme has been added to TB calc for all channels, each time the forward model is called. The application of the empirical bias correction improves the behavior of the residuals against each of the four factors by flattening their bias curves and bringing them closer to zero. The standard deviation of the TB calc -TB obs difference also decreases with the application of the empirical bias correction scheme.
The retrieved parameters have been compared with and without including the empirical bias correction in the retrieval. The distributions are very similar and mean differences between the two retrievals are: −0.02 m·s −1 , −0.04 mm and 7.19 × 10 −4 mm for WS, TCWV and TCLW, respectively.
The empirical bias correction of the forward model completes the steps taken towards the final OE setup. The final OE configuration is briefly summarized in Table 2 and Figure 6 illustrates the different processes performed in the final Danish Meteorological Institute (DMI) OE algorithm. First, the algorithm reads in the predefined S , S a and e values, where e is the perturbation used to calculate the Jacobians. The observation loop is started for each satellite observation pixel or matchup by reading the observed brightness temperatures and the first guess values. Thereafter, the iteration process is initiated. For each iteration, the forward model is used to calculate the simulated brightness temperature from the state vector (in the first step: state vector = first guess). Moreover, the Jacobians (K), cost function (J), uncertainty (S x ) and sensitivity (A, Section 3.1) are calculated. The change in the cost function between two iterations is used to test for convergence and a maximum of 10 iterations are allowed. Until convergence is met, the state vector is updated for each iteration step and the iteration continues. When the iteration process is stopped the state vector is saved together with the uncertainties, corresponding averaging kernels and simulated brightness temperatures. The drifter matchups covering 2010 have been processed and the final OE SST is presented and validated in the following section.

Results
The DMI OE retrieval scheme has been run with the final configuration described in Section 2.3.3 for the total number of filtered drifter matchups: 3,764,798 for 2010 (Section 2.2.2). The summary statistics of the OE SSTs against drifters are given in Table 3. The retrievals that did not manage to fulfill the convergence criterion described in Section 2.3.2 within the 10 maximum allowed iterations have been eliminated (<0.1%). The OE retrievals that have passed the convergence criterion give SSTs with an initial bias of 0.02 K and a standard deviation of 0.57 K. We have not checked that the retrievals that did pass the convergence test have actually converged to the required solution. This can be done in several ways. One way is to apply a gross error check to throw away cases with unrealistic conditions. These include: (1) temperature conditions outside the accepted range between −2 • C and 35 • C; (2) retrieved wind speeds outside the range 0-30 m·s −1 ; and (3) retrieved cloud liquid water outside the range: 0-1.5 kg/kg (mass of condensate/mass of moist air). Applying this gross error check removes 9% of the retrievals and reduces the standard deviation to 0.54 K. Another approach is to check that the retrieval is consistent with the satellite observations by evaluating the RMSE TB value as described in Section 2.3.2. The practical usefulness of the quality indicator, RMSE TB , is shown in Figure 7. All retrievals that have passed the convergence test have been binned with respect to RMSE TB with a bin size of 0.1 K. The number of members in each bin is shown in the bottom plot (blue curve) together with the cumulative percentage (red curve). The middle plot displays the binned distribution of OE SST minus drifter SST (with bin size of 1 K) as a function of binned RMSE TB , where the color bar is the number of matchups in each bin. The top plot shows the mean (solid) and standard deviation (dashed) of OE SST minus drifter SST as a function of the binned RMSE TB statistic. We notice a large increase in scatter as RMSE TB increases. This makes the RMSE TB -value an efficient indicator of the quality of the OE SST retrieval. Limiting RMSE TB to 1 K removes only 8% of the converged retrievals and leaves the remaining 92% with a bias of 0.02 K and standard deviation of 0.51 K. These results reflect that the RMSE TB quality indicator provides a better discrimination of quality compared to the gross error check.
Around 64% of converged retrievals have a RMSE TB value below 0.5 K and a corresponding bias of 0.02 K and standard deviation of 0.47 K, while 42% have a RMSE TB -value less than 0.35 K and a corresponding bias of 0.02 K and standard deviation of 0.45 K. The validation results of the NWP SSTs are included here for reference, but note that drifting buoy observations and PMW observations have already been included in the generation of the NWP fields, as explained earlier. In the following we will only consider the 64% "good" retrievals, which have a corresponding RMSE TB < 0.5 K. Around 64% of converged retrievals have a RMSETB value below 0.5 K and a corresponding bias of 0.02 K and standard deviation of 0.47 K, while 42% have a RMSETB-value less than 0.35 K and a corresponding bias of 0.02 K and standard deviation of 0.45 K. The validation results of the NWP SSTs are included here for reference, but note that drifting buoy observations and PMW observations have already been included in the generation of the NWP fields, as explained earlier. In the following we will only consider the 64% "good" retrievals, which have a corresponding RMSETB < 0.5 K.  Figure 8a,b show global maps of the gridded (with grid size of 5 degrees) mean and standard deviation of OE SST minus drifter SST, respectively, for the 64% best retrievals with a corresponding RMSE TB < 0.5 K. The geographical distribution of the mean OE SST minus drifter SST reveals a dependency on latitude, with positive bias at mid-latitudes and negative bias in high latitudes and the equatorial region, likely linked to surface emissivity issues (dependent on wind speed and direction) and atmospheric effects. We notice areas with high standard deviations in e.g., the Gulf Stream Extension, the Kuroshio Current and the Aghulas Retroflection areas. These western boundary current regions are known to be very dynamical with high mesoscale activity and large SST gradients over smaller scales [61,62]. The mesoscale SST gradients will result in enhanced differences when the large (64 × 32 km native instantaneous field of view at 6.9 GHz) satellite footprints are compared with in situ observations. The elevated variability in these regions is therefore not related to the quality of the OE SST retrieval. Figure 9a,b show the OE SST performance, considering the retrievals with RMSE TB < 0.5 K, as a function of binned drifter SST and NWP WS, respectively. The OE SST displays a warm bias for drifter SSTs in the range of 15-25 • C and similar for the small fraction of very high (>28 • C) SSTs. Figure 9b shows that the OE SST has a bias dependency on the NWP WS with a warm bias for low (<6 m·s −1 ) wind speeds and a cold bias for higher wind speeds.
therefore not related to the quality of the OE SST retrieval. Figure 9a,b show the OE SST performance, considering the retrievals with RMSETB < 0.5 K, as a function of binned drifter SST and NWP WS, respectively. The OE SST displays a warm bias for drifter SSTs in the range of 15-25 °C and similar for the small fraction of very high (>28 °C) SSTs. Figure 9b shows that the OE SST has a bias dependency on the NWP WS with a warm bias for low (<6 m·s −1 ) wind speeds and a cold bias for higher wind speeds.

SST Sensitivity
One of the benefits of using the OE framework is that it naturally provides several diagnostics for assessing the quality and sensitivity of the retrieval. One of these diagnostics is the averaging kernel matrix, A, which contains the sensitivities of the retrieved parameters to the true state on its diagonal (and cross-sensitivities between parameters on the off-diagonals): where x t is the true state. If the averaging kernel was equal to the identity matrix the a priori state would have no influence on the retrieved state, which instead would be obtained purely from the information content of the measured brightness temperatures. The mean SST sensitivity for all drifter matchups during 2010 is found to be 0.50 with above OE setup and Figure 10a shows the geographical distribution of SST sensitivity. The SST sensitivity is lowest in high latitudes and increases towards the equatorial region, which is consistent with the fact that TB/ SST is smaller for cold waters (especially for X-band 10.65 GHz channels) [63]. The equatorial region reveals sensitivities of ~0.6

SST Sensitivity
One of the benefits of using the OE framework is that it naturally provides several diagnostics for assessing the quality and sensitivity of the retrieval. One of these diagnostics is the averaging kernel matrix, A, which contains the sensitivities of the retrieved parameters to the true state on its diagonal (and cross-sensitivities between parameters on the off-diagonals): where x t is the true state. If the averaging kernel was equal to the identity matrix the a priori state would have no influence on the retrieved state, which instead would be obtained purely from the information content of the measured brightness temperatures. The mean SST sensitivity for all drifter matchups during 2010 is found to be 0.50 with above OE setup and Figure 10a shows the geographical distribution of SST sensitivity. The SST sensitivity is lowest in high latitudes and increases towards the equatorial region, which is consistent with the fact that ∂ TB/ ∂ SST is smaller for cold waters (especially for X-band 10.65 GHz channels) [63]. The equatorial region reveals sensitivities of~0.6 while high latitudes have sensitivities around 0.4. Sensitivities from 0.39 to 0.65 were reported in Gentemann et al. [64] for 0 • C and 30 • C SST, respectively, for an AMSR-E regression type retrieval. In addition, Prigent et al. [63] used simulations to derive channel sensitivities ∂ TB/ ∂ SST of~0.3 to 0.6 for the 6 GHz V. These results are in good agreement with the sensitivities obtained here.

Retrieval Uncertainty
The OE technique offers several options to estimate an uncertainty for each individual retrieval. The OE methodology directly provides an estimate of the retrieval uncertainty, Sx, due to uncertainties in the measurements, forward model, and in the a priori state vector (see Equation (4)). Considering all converged drifter matchups during 2010, the global mean uncertainty is 0.35 K. From Figure 7, it is evident that the quality of the SST retrieval is closely connected to the RMSETB value from the retrieval. For that reason, we have set up an additional uncertainty indicator based on a scaled RMSETB value, using a scaling factor of 0.55. Figure 11 shows the validation results for the uncertainties of the converged matchups, where the actual SST retrieval differences against drifter observations are displayed versus the theoretical uncertainties obtained from the RMSETB values. The dashed line represents the ideal uncertainty under the assumptions that drifting buoys have a total uncertainty of 0.2 K and that the sampling uncertainty is 0.3 K. The point to satellite footprint sampling difference is estimated based on the results in Høyer et al. [44]. It is evident from the figure that there is a good agreement between the observed uncertainty and the modeled uncertainty estimates that are based on the RMSETB and an integrated part of every OE retrieval. The mean modeled uncertainty is estimated to 0.48 K including the in situ and sampling uncertainty. Figure  10b shows the geographical distribution of RMSETB considering the best 64% retrievals with a corresponding RMSETB < 0.5 K.

Retrieval Uncertainty
The OE technique offers several options to estimate an uncertainty for each individual retrieval. The OE methodology directly provides an estimate of the retrieval uncertainty, S x, due to uncertainties in the measurements, forward model, and in the a priori state vector (see Equation (4)). Considering all converged drifter matchups during 2010, the global mean uncertainty is 0.35 K. From Figure 7, it is evident that the quality of the SST retrieval is closely connected to the RMSE TB value from the retrieval. For that reason, we have set up an additional uncertainty indicator based on a scaled RMSE TB value, using a scaling factor of 0.55. Figure 11 shows the validation results for the uncertainties of the converged matchups, where the actual SST retrieval differences against drifter observations are displayed versus the theoretical uncertainties obtained from the RMSE TB values. The dashed line represents the ideal uncertainty under the assumptions that drifting buoys have a total uncertainty of 0.2 K and that the sampling uncertainty is 0.3 K. The point to satellite footprint sampling difference is estimated based on the results in Høyer et al. [44]. It is evident from the figure that there is a good agreement between the observed uncertainty and the modeled uncertainty estimates that are based on the RMSE TB and an integrated part of every OE retrieval. The mean modeled uncertainty is estimated to 0.48 K including the in situ and sampling uncertainty. Figure 10b shows the geographical distribution of RMSE TB considering the best 64% retrievals with a corresponding RMSE TB < 0.5 K.
sampling difference is estimated based on the results in Høyer et al. [44]. It is evident from the figure that there is a good agreement between the observed uncertainty and the modeled uncertainty estimates that are based on the RMSETB and an integrated part of every OE retrieval. The mean modeled uncertainty is estimated to 0.48 K including the in situ and sampling uncertainty. Figure  10b shows the geographical distribution of RMSETB considering the best 64% retrievals with a corresponding RMSETB < 0.5 K.

Validation against Independent Argo Floats
The validation against drifting buoys favors the NWP SST validation results as the drifting buoys are included in the OSTIA fields. For that reason, the OE retrieval algorithm has also been run for the 57,810 Argo matchups covering the period 2009-2011 (Section 2.2.2). The matchups with Argo floats are fewer than for drifting buoys; however, they are independent and can thus be used to compare the performance of the OE SST and NWP SST. Table 4 shows the validation of the OE SSTs and NWP SSTs against Argo floats for various subsets based on the listed filters. The Argo validation results resemble what was found for the drifting buoy observations with a very clear relation between the RMSE TB and the quality of the OE retrieval, and the highest quality OE retrievals performing better than the NWP SSTs. Note that the standard deviation of differences also includes the point to footprint sampling effects that are larger for the OE retrievals than for the NWP, which has an original spatial resolution of 0.05 degrees in latitude and longitude. Similar to drifters, two uncertainty estimates are given. The OE uncertainty S x has an average value of 0.35 K, while the modeled estimate has an uncertainty of 0.47 K. The modeled uncertainty is calculated using a scaled RMSE TB with a scaling factor of 0.65. The modeled uncertainty has been evaluated for the Argo matchups and the result is shown in Figure 12. The dashed lines represent the ideal uncertainty under the assumptions that Argo floats have an accuracy of 0.002 K [42] and that the sampling uncertainty is 0.3 K [44]. value of 0.35 K, while the modeled estimate has an uncertainty of 0.47 K. The modeled uncertainty is calculated using a scaled RMSETB with a scaling factor of 0.65. The modeled uncertainty has been evaluated for the Argo matchups and the result is shown in Figure 12. The dashed lines represent the ideal uncertainty under the assumptions that Argo floats have an accuracy of 0.002 K [42] and that the sampling uncertainty is 0.3 K [44].

OE vs. NWP Latitudinal Performance
Overall the OE SST performs similar to the NWP SST when compared to SST from drifters and Argo floats. However, there are regional differences in the performance of the two SST estimates. Figure 13 shows the latitudinal difference in standard deviations of OE and NWP SST compared against the same set of drifters and Argo floats, respectively. The figure shows that the OE SST performs better than NWP SST in both northern and southern mid-latitudes, while NWP SST performs better in the tropics. The latitudinal pattern in the relative performance is remarkably similar for both the drifting buoys and the Argo floats.

OE vs. NWP Latitudinal Performance
Overall the OE SST performs similar to the NWP SST when compared to SST from drifters and Argo floats. However, there are regional differences in the performance of the two SST estimates. Figure 13 shows the latitudinal difference in standard deviations of OE and NWP SST compared against the same set of drifters and Argo floats, respectively. The figure shows that the OE SST performs better than NWP SST in both northern and southern mid-latitudes, while NWP SST performs better in the tropics. The latitudinal pattern in the relative performance is remarkably similar for both the drifting buoys and the Argo floats.

Discussion
The OE algorithm developed here is an attempt to develop a dedicated SST OE algorithm based on multi-channel microwave radiometer satellite observations. The OE methodology with the integrated forward modeling based on physical properties that provide simulated brightness temperature estimates for every retrieval contains valuable information on pixel level about the quality of the satellite observations. In addition, the OE technique offers a possibility for identifying and discarding RFI information in a very efficient way that is not possible with statistical retrieval

Discussion
The OE algorithm developed here is an attempt to develop a dedicated SST OE algorithm based on multi-channel microwave radiometer satellite observations. The OE methodology with the integrated forward modeling based on physical properties that provide simulated brightness temperature estimates for every retrieval contains valuable information on pixel level about the quality of the satellite observations. In addition, the OE technique offers a possibility for identifying and discarding RFI information in a very efficient way that is not possible with statistical retrieval methods, such as regression models.
The basis for the OE setup, the development and improvements in the forward model, as well as the validation is the comprehensive MMD that contains all the required fields for a retrieval matched with in situ observations. The MMD is a very powerful tool for algorithm development and makes testing and dependency assessment straightforward. The results presented here demonstrate that updating the state variable through several iterations is crucial for the quality of the retrievals, with a clear reduction in RMSE TB for the first two iterations. This approach resembles what is used for OE sea ice concentration retrieval [30,31] but differs from the OE IR SST retrievals, where one inversion is typically performed [24,25]. The need for several iterations is probably a result of the non-linear behavior of the forward model.
The forward model is an essential part of the OE retrievals. The OE performance therefore depends on the performance of the forward model and any deficiencies in the model to simulate the observed brightness temperature will propagate into the retrieval. During development, it was evident that using the MMD to improve the forward model is an essential step. The main improvements in the forward models ability to simulate the observed brightness temperature were found for the wind speed and SST dependency. Using results from a regression model to correct the forward model led to significant improvements in both the bias and the standard deviation between the simulated and observed brightness temperatures.
In the microwave part of the spectrum, several geophysical factors contribute to changes observed by the satellite, such as the ocean state, water vapor and cloud liquid waters etc. [63]. This means that the sensitivity in e.g., the 6 GHz channel to the actual SST variations is not as high as reported for the thermal IR part of the spectrum, where TCLW and TCWV cases are discarded before assessing the sensitivity [65]. In Prigent et al. [63] a regression based retrieval model is used to derive maximum SST sensitivities in the order of 0.65 for the 6 GHz V channel and decreasing for lower SSTs. These results are in good agreement with what we find here, with an overall sensitivity of 0.50 to the true SST. In addition, the global pattern shows a higher sensitivity at lower latitudes with warmer waters, which is also in agreement with the modeled results.
The global validation results with a mean and standard deviation of OE SST minus drifter SST of 0.02 ± 0.47 K for the best 64% of the matchups are very encouraging. These results are comparable or better than the previous validation results for PMW SST retrievals [66,67], which showed a degradation in the SST performance in cold and moist conditions. Similarly, Gentemann [18] reported on a latitudinal and SST dependency in the standard deviations, when compared to drifting buoys. The OE SST performance shows an increase in standard deviations in regions with large mesoscale activity and strong SST gradients. The reason for this is probably the temporal and spatial sampling errors when satellite observations are compared to pointwise in situ observations. Despite an enhanced spatial sampling effect from the larger PMW satellite footprint, the OE SST retrievals perform better than NWP SST in regions with mesoscale activity at both northern and southern mid-latitudes when validated against drifting buoys and Argo floats. In the tropics NWP SST performs better. It is worth to notice the consistency between drifters and Argos.
The algorithm has been developed to retrieve subskin SSTs in conditions not affected by diurnal warming, as these matchups were filtered out. This allowed the use of drifter observations for improving the forward model and for a detailed validation against drifting buoys with a nominal depth of 20 cm and Argo observations at 5 m. SSTs can also be retrieved during daytime conditions with diurnal warming, but accurate validation of the performance in these conditions against in situ would require a model or method for the subskin to 20 cm or 5 m correction.
The OE retrieval provides optimal estimates of four state variables: WS, TCWV, TCLW and SST. The focus in this paper has been on the SST retrieval and on validating this state variable against accurate in situ observations. To validate all state variables against accurate in situ observations is not straightforward for the other variables. A detailed evaluation and validation of the three other parameters are outside the scope of this paper.
The main strength of OE is that it is able to select channels with the most information and provides an independent estimate for the individual retrievals on the uncertainty of the retrievals. The estimated uncertainties were validated against independent in situ observations. The accurate and reliable uncertainty estimates increase the applicability of the SST dataset. A need for very accurate SST retrievals will of course reduce the number of SST retrievals, but a realistic uncertainty estimate guides the users to select the combination of accuracy versus data coverage that is optimal for their application.

Conclusions
In this study, the optimal estimation (OE) method has been used to retrieve subskin sea surface temperature (SST) from passive microwave (PMW) satellite observations. The results indicate that the OE SST has an overall bias (OE SST-drifter SST) of 0.02 K and standard deviation of 0.47 K when considering the 64% matchups, where the simulated and observed brightness temperatures are most consistent. The corresponding mean modeled uncertainty is 0.48 K including the in situ and sampling uncertainty. An independent validation against Argo observations shows a mean difference of 0.01 K, standard deviation of 0.48 K and modeled uncertainty of 0.47 K considering the 62% best matchups. The modeled uncertainty estimates, available for each retrieval, have proven to be accurate and reliable, when compared to in situ observations. The main advantage of the OE technique is its capability to provide valuable information on pixel level about the quality of the satellite observations, which can be used directly to identify and discard erroneous retrievals (e.g., contamination from extreme wind, atmospheric attenuation and emission, sun-glint, land/ice, rain and RFI).
Future work on the OE methodology can include more focus on improving the forward model with regard to the other state variables and on the estimation of the S a and the S statistical parameters, as these are key parameters in the retrieval process. Alternative forward models could be tested in the retrieval process, but few accurate forward models exist at present that are suitable for use in an OE context. More work should therefore be put into improving the forward models with the aim of PMW OE SST retrievals. In addition, more work could be done on assessing the role of the first guess values and the impact of these observations on the retrieved SSTs. In the present work, we have disregarded observations in the vicinity of sea ice, but considering the results obtained within the ESA-CCI Sea Ice project, a future development could include the development of an integrated ocean and sea ice OE processor, that is able to estimate the sea ice concentration and SST at the same time and thus allowing for PMW SSTs closer to the marginal ice zone.
In the context of developing an SST climate data record, it is important to note that microwave radiometer provides an independent technique to conventional infrared (IR) radiometer satellite retrievals, potentially adding robustness to the resulting multi-mission satellite SST record. Furthermore, as some areas of the global ocean are quasi-permanently obscured by clouds, few IR measurements are available: microwave radiometry mitigates this negative situation meaning that multi-mission SST datasets provide a better representation of the SST with close to daily coverage with a single wide swath satellite instrument. We note that the GCOM-W AMSR-2 is now continuing the legacy of AMSR-E with an improved capability until the early 2020's. The rotating joint for the antenna scan mechanism of AMSR-E degraded within~9.5 years of launch. The design lifetime of AMSR-2 was 5 years so a replacement is urgently needed. However, there is currently no follow-on mission either planned or in development to provide continuity of the 6-7 GHz frequency imaging capability