Evaluation of the ERA5 Sea Surface Skin Temperature with Remotely-Sensed Shipborne Marine-Atmospheric Emitted Radiance Interferometer Data

Abstract: Sea surface temperature is essential for weather and ocean forecasting and for studying the ocean, atmosphere and climate system. Measuring the sea surface skin temperature (SSTskin) with infrared radiometers onboard earth observation satellites and ships is a mature subject spanning several decades. SSTskin from reanalysis model output, such as the newly released ERA5, is very widely used and has been applied to climate change monitoring, weather prediction research, and commercial applications. The ERA5 SSTskin data must be rigorously evaluated to meet the stringent accuracy requirements of climate research. This study aims to estimate the accuracy of the ERA5 SSTskin fields and provide an associated error estimate by using measurements from accurate shipboard infrared radiometers: the Marine-Atmosphere Emitted Radiance Interferometers (M-AERIs). Overall, the ERA5 SSTskin agrees well with the ship-based radiometric measurements, with an average difference of ~0.2 K and a Pearson correlation coefficient (R) of 0.993. Parts of the discrepancies are related to dust aerosols and variability in air-sea temperature differences. The downward radiative flux due to dust aerosols leads to significant SSTskin differences for ERA5. The SSTskin differences are greater for large, positive air-sea temperature differences. This study provides suggestions for the applicability of ERA5 SSTskin fields in a selection of research applications.


Introduction
Sea-surface temperature (SST) has been declared to be an Essential Climate Variable (ECV; [1]) by the Global Climate Observing System (GCOS). SST data are essential in many areas of research, such as climate change and weather forecasting [2–4].
SST observations are unevenly distributed in space and time. The retrieval of the sea surface skin temperature (SSTskin) by radiometers both on earth observation satellites [3,4] and on ships [5,6] has been developed over many years and is a mature subject. Climate change research usually needs consistent SST data, which may be acquired from long series of measurements. Weather and ocean forecasting, however, typically require best-estimate data, derived from as many observations as possible within a specific period of time and available within a short interval after the measurements are taken. Reanalysis datasets usually strike a balance between these two requirements, trying to generate long-term, consistent, high-quality data [7]. Over the past few decades, a number of reanalyses have drawn a lot of attention, such as the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalyses ERA-Interim [8] and ERA5 [9,10]; the National Centers for Environmental Prediction (NCEP)-National Center for Atmospheric Research Climate Forecast System Reanalysis [11]; the NASA Modern Era Retrospective-Analysis for Research and Applications (MERRA) [12] and MERRA-2 [13,14]; and the Japanese global atmospheric reanalysis JRA-55 [15]. These reanalysis products have created long-term global SST fields, from 1979 to present. This study focuses on evaluating the latest generation of high-resolution SSTskin from ERA5.
Several previous studies have evaluated the performance of ERA5 using observations from field campaigns and meteorological stations. Graham, et al. [16] used radiosondes that had not been assimilated into any reanalysis to validate the ERA5 wind speed, humidity and air temperature data in the Arctic Fram Strait relative to MERRA-2, JRA-55 and ERA-Interim; the newly released ERA5 has a higher correlation with the independent radiosonde data than the other reanalyses, and a smaller bias. Hirahara, et al. [7] validated the high-resolution SSTs used at ECMWF, specifically HadISST [17] and OSTIA [18], and described their optimal usage for ERA5 and their performance: the two products are in good agreement in the global SST fields, with a spread of the global mean SST of about 0.02 K, but with locally larger biases in eddy-active regions. Nogueira [19] presented a comprehensive inter-comparison of rainfall over the last 40 years between the Global Precipitation Climatology Project (GPCP) and the ERA5 reanalysis; the convective rainfall and moisture convergence patterns are better represented in ERA5 than in ERA-Interim, and the significant rainfall underestimation over the mid-latitude oceans in ERA-Interim is much reduced in ERA5. Mahto and Mishra [20] evaluated ERA5 data for hydrologic applications, such as precipitation, runoff, soil moisture and surface temperatures, against observations from the India Meteorological Department, revealing that ERA5 products perform better than other reanalysis data.
The performance of ERA5 SSTskin has not been evaluated. A key limitation is the paucity of surface-based SSTskin-related field campaigns or stations. In general, a popular SST validation source is the drifting buoy array, with thermometers mounted 10-20 cm below the sea surface, but the temperature differences between that depth and the surface [4,21,22] may introduce errors in the validation.
Independent SSTskin derived from the Marine-Atmosphere Emitted Radiance Interferometers (M-AERI; [6]) are used in this study to assess the ERA5 SSTskin and to evaluate the potential inaccuracies associated with dust aerosols and the sensitivity to air-sea temperature differences. Data from a series of NOAA Aerosols and Ocean Science Expeditions (AEROSE; [23]) and Royal Caribbean International (RCI) cruises are used. In addition, on many research cruises where radiometric SSTskin measurements were made, atmospheric temperature and humidity profiles were also measured. The datasets have not been submitted to any assimilation scheme, so the M-AERI data used here are independent of the ERA5 fields.
We organize this paper as follows: The M-AERI-retrieved SSTskin data, ERA5-derived SSTskin data, and other MERRA-2 inputs are introduced in Section 2. Details of the cruises are also introduced in Section 2. In Section 3, we present the overall statistics of the comparisons. The results of the error analysis are discussed in Section 4 with day/night differences, air-sea temperature difference effects, and dust aerosol effects.
The ERA5 SSTskin product is based on a model simulation with data from satellite-derived SSTs. The temperature of the depth where there is no diurnal signal is the foundation temperature; the foundation temperature for ERA5 is taken from the Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) analysis [18], which is a blended product from various satellite-retrieved SST and in situ data. As to near-surface effects, ocean temperature variability is represented by three physical processes: the thermal skin cool layer during both day and night, the diurnal heating warm layer during the day, and the salinity saturation effect near the surface [10].
The cool skin effect originates from the heat loss to the atmosphere. The temperature difference between the skin layer ($T_{sk}$) and at the foundation depth ($T_{-\delta}$) can be expressed as [24,25]:
$$T_{sk} - T_{-\delta} = \frac{\delta}{\rho_w c_w k_w}\left(Q + R_s f_s\right) \qquad (1)$$
where $R_s$ is the net solar radiation at the surface, $f_s$ is the fraction of the surface-absorbed solar radiation, $\rho_w$ is the water density, $c_w$ is the volumetric heat capacity, $k_w$ is the molecular thermal conductivity of water and $\delta$ is the skin layer thickness. $Q$ is the net heat flux in this cool layer:
$$Q = H + LE + LW \qquad (2)$$
where $H$, $LE$, and $LW$ denote the surface sensible heat flux, latent heat flux and net long wave radiation at the surface, respectively.
The $f_s$ can be given as:
$$f_s = 0.065 + 11\delta - \frac{6.6\times10^{-5}}{\delta}\left(1 - e^{-\delta/(8\times10^{-4})}\right) \qquad (3)$$
The diurnal warming layer [25,26] is due to the solar absorption during the daytime; the diurnal warming effect may be affected by surface wind, by cloud amount and type, by free convection, or by internal waves [27]. The ERA5 diurnal warming calculations are based on Takaya, et al. [28] and can be expressed as:
$$\frac{\partial\left(T_{-\delta} - T_{-d}\right)}{\partial t} = \frac{Q + R_s - R(-d)}{d\,\rho_w c_w\,\nu/(\nu+1)} - \frac{(\nu+1)\,k\,u_{*w}}{d\,\phi_t(d/L)}\left(T_{-\delta} - T_{-d}\right) \qquad (4)$$
where $T_{-\delta}$ is the temperature below the cool skin layer, $d$ is the depth of the diurnal warm layer, which is set as 3 m, $\nu$ is the profile shape, which is set as 0.3, $k$ is the von Kármán constant, $u_{*w}$ is the water friction velocity, $\phi_t(d/L)$ is the stability function and $L$ is the Obukhov length; $R(-d)$ is the solar radiation absorbed at depth $-d$. Expressions for $R(-d)$, the nondimensional shear stability function $\phi_t(d/L)$ and the Obukhov length $L$ are given by Takaya, et al. [28]. Equation (4) has been integrated in time to derive the warm layer effect; during daytime, the warm layer effect from Equation (4) and the cool layer effect ($T_{sk} - T_{-\delta}$) from Equation (1) have been added together to derive the SSTskin.
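As a rough numerical illustration of the cool-skin relation described above, the following sketch evaluates the skin temperature drop as delta (Q + R_s f_s) / (rho_w c_w k_w). The constants and the fixed 1 mm skin thickness are illustrative assumptions, not values taken from ERA5; in particular, k_w is treated here as a thermal diffusivity so that the product rho_w c_w k_w is about 0.6 W m^-1 K^-1. Fluxes are taken as positive downward, so a heat loss to the atmosphere is negative.

```python
import math

# Illustrative constants (assumptions, not values quoted in the text).
RHO_W = 1025.0   # sea water density, kg m^-3
C_W = 4190.0     # specific heat of sea water, J kg^-1 K^-1
K_W = 1.4e-7     # molecular thermal diffusivity of water, m^2 s^-1


def solar_fraction(delta):
    """Fraction of the net solar radiation absorbed in a skin layer of thickness delta (m)."""
    return 0.065 + 11.0 * delta - (6.6e-5 / delta) * (1.0 - math.exp(-delta / 8.0e-4))


def cool_skin_delta_t(q_net, r_solar, delta):
    """Skin minus sub-skin temperature (K); fluxes in W m^-2, positive downward."""
    return delta / (RHO_W * C_W * K_W) * (q_net + r_solar * solar_fraction(delta))


# Hypothetical conditions: 150 W m^-2 net heat loss, with and without sunlight.
night = cool_skin_delta_t(q_net=-150.0, r_solar=0.0, delta=1.0e-3)
day = cool_skin_delta_t(q_net=-150.0, r_solar=600.0, delta=1.0e-3)
print(f"night: {night:.3f} K, day: {day:.3f} K")
```

With these assumed inputs the skin is cooler than the water just below it in both cases, and the solar absorption in the skin layer slightly reduces the cooling during the day.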
Different reanalysis schemes use different choices of these parameter settings: for example, according to Akella, et al. [29] and Gentemann and Akella [21], the NASA MERRA-2 temperature profile uses 2 m and 0.2 for the diurnal warm layer depth, $d$, and the diurnal profile shape, $\nu$, respectively, but ERA5 uses 3 m and 0.3 [25]. It is essential to evaluate the newly updated ERA5 data.

M-AERI Data
Self-calibrating, ship-based radiometers provide SSTskin that is more directly comparable to the ERA5 SSTskin than the temperatures at the depth of the drifting buoy measurements. This study utilizes the M-AERI [6,30], a ship-based spectro-radiometer mounted a few meters above the sea surface on the ships, as shown in Figure 1, to validate the ERA5 SSTskin. The internal calibration of the M-AERIs is checked in the laboratory using an SI-traceable water-bath blackbody calibration target [31–33] (Figure 1). The M-AERI viewing geometry is shown in Figure 2; each M-AERI contains two internal blackbodies, one at ambient temperature and the other heated, that provide a two-point calibration before and after each measurement of the sea-surface and sky infrared emissions. The sky emission measurement is used to correct for the sky radiance reflected at the sea surface. After the interferometer sequentially measures the sea and sky emissions over a specified time interval, the scan mirror rotates to the apertures of the two blackbody cavities to provide a real-time two-point calibration of the measured emission spectra.
Figure 2. M-AERI viewing geometry ([30]). Rsky and Rsea are the spectral infrared radiances measured in the direction of the sky and the sea surface. Rsky provides a correction for the sky radiation that is reflected at the sea surface.
The SSTskin derived from M-AERI instruments can be expressed as:
$$SST_{skin} = B^{-1}\left\{\frac{R_{water}(\lambda,\theta) - \left(1-\varepsilon(\lambda,\theta)\right)R_{sky}(\lambda,\theta) - R_{h}(\lambda,\theta)}{\varepsilon(\lambda,\theta)}\right\} \qquad (8)$$
where $B$ is the Planck function and $B^{-1}$ its inverse; Rwater, Rsky, and Rh are the spectral radiances measured in the direction of the sea surface, emitted by the atmosphere above the instrument, and emitted below the instrument (both directly into the measured beam and reflected at the sea surface), respectively. λ is the wavelength of the radiance, θ is the angle of the measurement from the vertical, and ε is the surface emissivity at λ and θ. The detailed technical description, including the atmospheric correction, is given by Minnett, et al. [6]. The SSTskin derived from the M-AERI spectra has an uncertainty of ~0.04 K. M-AERI deployments are monitored from the laboratory via a satellite Internet link.
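The retrieval described above can be sketched as a round trip: the reflected sky radiance and the emission from the path below the instrument are removed from the sea-view radiance, the result is divided by the emissivity, and the Planck function is inverted. The single wavelength, emissivity and sky temperature below are illustrative assumptions for a synthetic test, not M-AERI processing values, and the function names are hypothetical.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m s^-1
K_B = 1.380649e-23   # Boltzmann constant, J K^-1


def planck(wavelength, temperature):
    """Spectral radiance (W m^-2 sr^-1 m^-1) at wavelength (m) and temperature (K)."""
    x = H * C / (wavelength * K_B * temperature)
    return 2.0 * H * C**2 / wavelength**5 / math.expm1(x)


def inverse_planck(wavelength, radiance):
    """Temperature (K) of a blackbody emitting the given spectral radiance."""
    return H * C / (wavelength * K_B * math.log1p(2.0 * H * C**2 / (wavelength**5 * radiance)))


def sst_skin(r_water, r_sky, r_h, emissivity, wavelength):
    """Remove reflected sky and path emission, divide by emissivity, invert Planck."""
    surface_emission = (r_water - (1.0 - emissivity) * r_sky - r_h) / emissivity
    return inverse_planck(wavelength, surface_emission)


# Synthetic round trip with assumed values (not instrument data).
wavelength = 7.7e-6                    # m
true_sst, emissivity = 300.0, 0.987    # K, dimensionless
r_sky = planck(wavelength, 240.0)      # sky emission modelled as a 240 K blackbody
r_h = 0.0                              # path emission below the instrument neglected
r_water = emissivity * planck(wavelength, true_sst) + (1.0 - emissivity) * r_sky + r_h
retrieved = sst_skin(r_water, r_sky, r_h, emissivity, wavelength)
print(f"retrieved SSTskin: {retrieved:.4f} K")
```

The sketch works at one wavelength and one view angle; it only shows the algebra of the correction, not the spectral averaging or atmospheric correction of the full retrieval.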
M-AERI spectral measurements are also used to derive a near-surface air temperature [34], which is more accurate than that measured by conventional contact thermometers. These M-AERI-derived air temperatures are used in this study to characterize the conditions in the lower atmosphere in the comparisons with ERA5 data.
Until the recent suspension of cruises in response to the Covid-19 pandemic, four M-AERIs were operational. Three were on ships of Royal Caribbean International (RCI): Celebrity Equinox (May 2014-March 2020), Allure of the Seas (June 2014-March 2020), and Adventure of the Seas (January 2018-March 2020). The fourth is usually deployed on research vessels, such as the NOAA Ship Ronald H. Brown (RHB) for a circumnavigation from March to October 2018. Figure 3 shows the tracks of deployments on several research vessels that have provided matchups in a wide range of environmental conditions. The RCI ships have provided a rich source of measurements in the western North Atlantic Ocean, Caribbean Sea, and the Mediterranean Sea. Data from nine campaigns from 2004 to 2019 were taken during the AEROSE project [23] on the NOAA Ship Ronald H. Brown and the R/V Alliance. AEROSE comprises Atlantic field campaigns to conduct in situ measurements of the effects of Saharan dust aerosol on the tropical and subtropical Atlantic Ocean. The dust effects on satellite-derived SSTskin have been quantified using AEROSE data [35,36]. The SSTskin provided by AEROSE is valuable for validating ERA5 SSTskin data under dust-polluted air layers. Table 1 summarizes the times and regions of the M-AERI deployments on RCI ships; Table 2 summarizes the same information for the AEROSE cruises.

MERRA-2
Dust effects on satellite-derived SSTskin have been discussed by Luo, et al. [35]; high concentrations of dust aerosol are also a problem for reanalyses, and dust appears to degrade the quality of MERRA-2 SSTskin [37]. MERRA-2 dust aerosol fields are used to quantify the effect of Saharan dust aerosols on the ERA5-derived SSTskin.
NASA's Goddard Earth Sciences MERRA-2 dataset provides atmospheric and surface fields [13,14], some of which are useful for this study. The data were downloaded from http://disc.sci.gsfc.nasa.gov/mdisc/. The MERRA-2 aerosol analysis system [14,38] provides the assimilated aerosol-related radiation output and dust scattering aerosol optical thickness (AOT) for this study.
The MERRA-2 AOT is taken from the collection labelled tavg1_2d_aer_Nx, which is a 1-hourly time-averaged aerosol diagnostic product. The surface net downward longwave flux due to aerosols is derived from the collection tavg1_2d_rad_Nx, which is a 1-hourly time-averaged radiation product and contains the surface-absorbed shortwave and longwave radiation, top of atmosphere incoming shortwave flux, cloud fraction, surface albedo, etc. The surface net downward longwave flux due to aerosol is calculated as:
$$LW\downarrow_{aerosol} = LW\downarrow_{with\_aerosol} - LW\downarrow_{clear} \qquad (9)$$
where LW↓with_aerosol is the MERRA-2 LWGNTCLR product, meaning surface net downward longwave flux assuming clear sky (cloud-free), and LW↓clear is the MERRA-2 LWGNTCLRCLN product, meaning surface net downward longwave flux assuming clear sky and no aerosol.
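The aerosol longwave term described above is a simple difference of two MERRA-2 fields. The sketch below assumes the LWGNTCLR and LWGNTCLRCLN values have already been read and co-located; the numbers are made up for illustration and the function name is hypothetical.

```python
def aerosol_longwave_effect(lwgntclr, lwgntclrcln):
    """LWGNTCLR (clear sky, with aerosol) minus LWGNTCLRCLN (clear sky, no aerosol), in W m^-2."""
    return [with_aerosol - clean for with_aerosol, clean in zip(lwgntclr, lwgntclrcln)]


# Made-up surface net downward longwave fluxes (W m^-2) at three matchup points.
lwgntclr = [-62.0, -55.5, -48.0]
lwgntclrcln = [-65.0, -63.0, -60.5]

effect = aerosol_longwave_effect(lwgntclr, lwgntclrcln)
strong = [value > 10.0 for value in effect]   # flag points above 10 W m^-2
print(effect, strong)
```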
The MERRA-2 dataset has a spatial resolution of 0.625° (longitude) and 0.5° (latitude), being different from ERA5 which has 0.25° × 0.25° resolution. Therefore, MERRA-2 aerosol and radiation data are bi-linearly interpolated to the ERA5 positions in this study.
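A minimal sketch of the bilinear regridding step, assuming a regular ascending longitude-latitude grid and a target point inside it. The small 3 × 3 grid and its AOT values are invented for illustration; a real workflow would more likely use a gridded-data library, which the source does not name.

```python
def bilinear(lons, lats, field, lon, lat):
    """Bilinearly interpolate field[j][i] (j: latitude index, i: longitude index) to (lon, lat)."""
    dlon = lons[1] - lons[0]
    dlat = lats[1] - lats[0]
    # Index of the grid cell containing the target point, clamped to the last cell.
    i = min(int((lon - lons[0]) / dlon), len(lons) - 2)
    j = min(int((lat - lats[0]) / dlat), len(lats) - 2)
    tx = (lon - lons[i]) / dlon
    ty = (lat - lats[j]) / dlat
    south = (1.0 - tx) * field[j][i] + tx * field[j][i + 1]
    north = (1.0 - tx) * field[j + 1][i] + tx * field[j + 1][i + 1]
    return (1.0 - ty) * south + ty * north


# Hypothetical MERRA-2-like grid (0.625 deg x 0.5 deg) with invented AOT values.
merra_lons = [-30.0, -29.375, -28.75]
merra_lats = [15.0, 15.5, 16.0]
aot = [[0.2, 0.3, 0.4],
       [0.4, 0.5, 0.6],
       [0.6, 0.7, 0.8]]

# Interpolate to a point on a 0.25 deg grid.
value = bilinear(merra_lons, merra_lats, aot, lon=-29.5, lat=15.25)
print(f"interpolated AOT: {value:.3f}")
```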

Results
In a skin-to-skin temperature comparison, SSTskin values from ERA5 are directly compared with M-AERI SSTskin. The comparison is made by populating a matchup database (MUDB). Each MUDB record includes the ERA5 SSTskin corresponding to the time and location of an M-AERI measurement. The record also contains the M-AERI near-surface air temperature, MERRA-2 AOT, MERRA-2 radiation fields and other instrumental variables. The ERA5 SSTskin were interpolated temporally and spatially (bi-linearly) to the ship positions and times. Moreover, because RCI cruises are often near coasts and ERA5 has a horizontal resolution of 31 km, we calculate the distance to land of each shipboard measurement and exclude the matchup points that are less than 32 km from land. In addition, some oceanic features, such as upwelling and freshwater input, are stronger near coasts; the corresponding SSTskin variations within 31 km cannot be resolved by ERA5 data. For these reasons, the filter has been used to avoid significant errors due to the ERA5 spatial resolution.
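The coastal filter described above could be sketched as follows, assuming the coastline is available as a list of latitude-longitude points and using great-circle (haversine) distances. The single coastal point and the two matchup positions are hypothetical, and the source does not state how the distance to land was actually computed.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def far_from_land(matchups, coast_points, min_km=32.0):
    """Keep matchups whose nearest coastline point is at least min_km away."""
    kept = []
    for record in matchups:
        nearest = min(haversine_km(record["lat"], record["lon"], clat, clon)
                      for clat, clon in coast_points)
        if nearest >= min_km:
            kept.append(record)
    return kept


# Hypothetical coastline sample and matchup positions.
coast_points = [(25.77, -80.13)]
matchups = [{"lat": 25.80, "lon": -80.00},   # roughly 13 km offshore: rejected
            {"lat": 26.50, "lon": -78.00}]   # more than 200 km away: kept
kept = far_from_land(matchups, coast_points)
print(len(kept), "of", len(matchups), "matchups kept")
```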

Statistics of SSTskin Comparisons
The scatter plot in Figure 4 shows that there are a few matchups with significant bias, but that there is good quantitative agreement between ERA5 and M-AERI data. The histogram of the differences of ERA5 SSTskin minus M-AERI SSTskin is shown in Figure 4 (right) and has a well-defined peak; most of the differences fall in the range of −1 K to 1 K. Table 3 shows the statistics of the ERA5 SSTskin minus M-AERI SSTskin differences during AEROSE cruises and Table 4 shows the same statistics for the RCI cruises. The mean differences are −0.190 K for AEROSE cruises and −0.220 K for RCI cruises. The overall standard deviations (STD) are 0.348 K and 0.358 K. Robust standard deviations (RSD) are less sensitive to outliers and are a better representation of the ERA5 SSTskin algorithm performance [39]. The RSDs, which lie between 0.239 K and 0.247 K, are similar for both sets of cruises and smaller than the STDs. Table 5 summarizes the statistics of the SSTskin differences for all of the cruises, comprising a total of 291,986 matchup pairs. ERA5 SSTskin values are generally in good agreement with the corresponding M-AERI data, with a median difference of −0.214 K and an STD of 0.356 K.
The map is representative of the whole data set.
A cool skin effect is present all of the time, and diurnal heating is present during the daytime when wind speeds are low. To compare the performance of the ERA5 SSTskin derivation during the daytime and nighttime, the SSTskin differences have been separated into daytime (7 AM-5 PM) and nighttime (7 PM-5 AM). The histograms of the results are presented in Figure 6. There are 88,955 matchups during the daytime, and 166,849 matchups during the nighttime.
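The summary statistics discussed above can be sketched as below. The text does not state which robust estimator is used, so the RSD here is an assumption: the common definition of 1.4826 times the median absolute deviation. The temperatures are invented, with one deliberate outlier to show why the RSD is smaller than the STD.

```python
import statistics


def difference_stats(era5, maeri):
    """Mean, median, STD and a MAD-based robust STD of ERA5 minus M-AERI differences."""
    diffs = [e - m for e, m in zip(era5, maeri)]
    median = statistics.median(diffs)
    mad = statistics.median(abs(d - median) for d in diffs)
    return {
        "n": len(diffs),
        "mean": statistics.fmean(diffs),
        "median": median,
        "std": statistics.stdev(diffs),
        "rsd": 1.4826 * mad,   # assumed definition of the robust standard deviation
    }


# Invented SSTskin values in K; the last pair is an outlier.
era5 = [299.8, 300.1, 299.5, 300.9, 298.0]
maeri = [300.0, 300.3, 299.8, 301.0, 300.0]

stats = difference_stats(era5, maeri)
print({key: round(value, 3) for key, value in stats.items()})
```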

SSTskin Bias Distribution
The comparison, based on 88,955 daytime matchup pairs, showed that ERA5 had an average daytime SSTskin difference of −0.172 K; the nighttime had an average SSTskin difference of −0.237 K, with an average STD of 0.347 K. A two-sample t-test rejects the null hypothesis of equal means, so the day and night mean differences should be considered dissimilar. The effect of diurnal heating in the upper ocean is expected to be small during the nighttime, and the SSTskin variation should be less than during the daytime. However, the nighttime SSTskin had larger discrepancies with the M-AERI than the daytime by an average of 0.065 K. One possible reason for the larger nighttime difference is the variation in the air-sea temperature difference, which is discussed in Section 4.2.
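The day/night separation and the two-sample test can be sketched as follows. Two points are assumptions: the hours are taken as local time, and the statistic is Welch's unequal-variance form, since the text does not say which variant of the t-test was applied. The records are invented.

```python
import math
import statistics


def split_day_night(records):
    """Split (local_hour, difference) pairs: 7 AM-5 PM is day, 7 PM-5 AM is night."""
    day = [diff for hour, diff in records if 7 <= hour < 17]
    night = [diff for hour, diff in records if hour >= 19 or hour < 5]
    return day, night   # hours between the two windows are discarded


def welch_t(a, b):
    """Welch two-sample t statistic for the difference of means of a and b."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(var_a / len(a) + var_b / len(b))


# Invented (local hour, ERA5 minus M-AERI difference in K) records.
records = [(8, -0.10), (12, -0.20), (15, -0.15), (6, -0.50),
           (20, -0.30), (23, -0.25), (2, -0.20), (18, -0.40)]

day, night = split_day_night(records)
t_stat = welch_t(day, night)
print(f"day n={len(day)}, night n={len(night)}, t={t_stat:.3f}")
```

A large absolute value of the statistic relative to the appropriate critical value indicates that the day and night means differ; computing the p-value is left out of this sketch.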

Discussion
This study is intended to provide better knowledge of the characteristics of the errors. Discussion in this section about the accuracy of the ERA5 fields is split into two parts: air-sea temperature differences, and aerosol dust effects.

Air-Sea Difference Effect
Accurate air temperatures derived from M-AERI spectra [34] are part of the matchup records. Figure 7 shows the M-AERI air temperature minus M-AERI SSTskin along the cruise tracks between 60° W and 90° W. Advection of air over strong SST gradients, such as in the Gulf Stream area, could lead to anomalous air-sea temperature differences, where anomalous means different from the usual open-ocean distribution. To investigate the possible consequences of air-sea temperature differences, we focus the analysis on the Atlantic region from 0° N to 50° N and 50° W to 100° W. The corresponding ERA5 minus M-AERI SSTskin differences are displayed in Figure 8.
[...] Figure 7. The SSTskin differences are large when the air-sea temperature differences are large. Temperature differences are in K. Averaging is over 0.5 K bins.
The ERA5 minus M-AERI SSTskin difference is related to the air temperature minus SSTskin. Renfrew, et al. [40] compared the R/V Knorr surface meteorological measurements with ECMWF and NCEP reanalyses over the Labrador Sea during February to March of 1997. The sensible heat flux is directly related to the air-sea temperature difference: when the air-sea temperature difference is large, the sensible heat flux is high. Smith, et al. [41] also highlighted the shortcomings of the surface heat flux parameterization, finding that the latent heat fluxes contain significant systematic errors dependent on dry stability (SST minus air temperature). Figure 8, using the data shown in Figure 6, compares the ERA5 minus M-AERI SSTskin differences during the daytime and the nighttime. The air temperature is usually warmer during the daytime, and the daytime SSTskin difference shown in the histograms of Figure 6a is less negative than the nighttime difference. According to Equations (1) and (2), the cool skin effect is strongly dependent on the heat flux parameterizations employed in the ERA5 SSTskin scheme.
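The averaging over 0.5 K bins of air-sea temperature difference mentioned in this section can be sketched as below. The function name and the sample values are hypothetical; each bin is labelled by its centre.

```python
import math
from collections import defaultdict


def bin_average(air_sea_diff, sst_diff, width=0.5):
    """Mean SSTskin difference in bins of air-sea temperature difference (bin centres as keys)."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y in zip(air_sea_diff, sst_diff):
        index = math.floor(x / width)
        sums[index][0] += y
        sums[index][1] += 1
    return {(index + 0.5) * width: total / count
            for index, (total, count) in sorted(sums.items())}


# Invented air minus skin temperature differences and ERA5 minus M-AERI differences (K).
air_sea = [-1.2, -1.1, 0.1, 0.4, 2.3]
sst = [-0.1, -0.3, -0.2, -0.2, -0.9]

binned = bin_average(air_sea, sst)
for centre, mean_diff in binned.items():
    print(f"{centre:+.2f} K: {mean_diff:+.3f} K")
```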

Dust Aerosol Effects
The Saharan Air Layer (SAL) and the associated dust outflow can extend over the Atlantic Ocean [42]. The radiative impact of mineral dust is one of the major contributors to the satellite-retrieved SSTskin inaccuracies in this region [35]. The Saharan dust layer has also been a problem for the reanalysis of SSTskin fields [37] and for numerical weather prediction [43]. The dust aerosols, transported across the Atlantic Ocean within the SAL, contribute to the formation of shallow stratocumulus clouds under the base of the SAL [44,45]; satellite measurements have frequently shown dust within the SAL between 1 km and 5 km altitude, and narrow stratocumulus clouds below the dust layer [46].
The SSTskin data collected during the cruises provide an opportunity to investigate the accuracy of the ERA5 SSTskin values in regions susceptible to strong Saharan dust outbreaks in the tropical and subtropical Atlantic Ocean. Figure 9 shows the ERA5 minus M-AERI SSTskin differences along cruise tracks from 2004 to 2019, indicating that there are strong negative SSTskin biases near the Saharan dust region. Plots of the corresponding MERRA-2 AOT data are given in Figure 10. The ERA5 minus M-AERI SSTskin differences increase with strong aerosol dust outflow.
The cloud influence on errors in ERA5 downwelling longwave radiation at the surface has been discussed by Silber, et al. [47]; however, the dust aerosol influence on the surface downwelling longwave radiation has not been studied. Numerical weather prediction models are usually subject to errors in the longwave radiation and to other model errors related to aerosol indirect effects [47]. The Saharan dust layer induces a vertical thermal dipole [43,48], with warming within the dust layer and cooling of the surface below. This thermal dipole can lead to increased atmospheric stability during the daytime and decreased stability during the nighttime, affecting the diurnal cycle of precipitation and wind speed [49]. The dust layer radiative effect has been included in the NASA MERRA-2 reanalysis product. To derive the surface net downward longwave flux due to aerosols along the cruise tracks, we matched the MERRA-2 radiation data to the times and locations of the M-AERI measurements, then computed the surface net downward longwave flux due to aerosol according to Equation (9). Figure 11 shows the aerosol downwelling longwave radiation at the sea surface. Figure 12 shows the M-AERI and ERA5 SSTskin scatterplot with the surface net downward longwave flux due to dust aerosols, and Figure 13 gives the relation with the ERA5 SSTskin bias.
It can be seen that intense downward longwave flux leads to significant SSTskin differences for ERA5; the averaged SSTskin difference can be as large as 1 K when the aerosol radiative flux is above 10 W/m².
The change in atmospheric thermal structure due to the aerosol radiative effect will introduce changes in reanalysis models. Interactive aerosol, a feature implemented in the NASA GEOS-5 Global Forecasting System, was studied by Reale, et al. [48]; accounting for the interactive aerosol radiative effects can increase the accuracy of the African easterly jet representation. Similarly, improvements in the accuracy of the ERA5 SSTskin scheme would be expected if these aerosol effects were taken into account.

Conclusions
SST is an important parameter in the global climate system. In recent years, it has become increasingly apparent that those involved in the fields of climate change studies and weather prediction require highly accurate estimates of the errors and uncertainties of the reanalysis data. By assessing the accuracy of the ERA5-derived SSTskin, this study was aimed at improving the understanding of the strengths and weaknesses of ERA5 data. The use of high-accuracy shipboard radiometers with calibration traceability to SI-standards permitted the determination of the accuracies of ERA5 SSTskin.
The independent SSTskin observations from research vessels and RCI cruise ships provide a valuable way to validate ERA5 SSTskin values, including in areas influenced by Saharan dust aerosol. This study developed a matchup technique using the subset of ERA5 data that coincides with measurements from the shipboard M-AERIs deployed for the validation of satellite-derived SSTskin [50,51]. The statistics in this study are skin-to-skin temperature comparisons, which avoid the subsurface temperature variability inherent in comparisons with in situ sea temperature measurements. The results indicate good performance of the ERA5 SSTskin algorithm, with an average bias of −0.213 K, an RSD of 0.243 K and an STD of 0.356 K. The accuracy of the ERA5 SSTskin during the daytime is generally better than during the nighttime. The overall Pearson correlation coefficient (R) is 0.993 and the Nash-Sutcliffe efficiency coefficient (E) is 0.980; ERA5 and M-AERI are very strongly correlated. The atmospheric temperature effects deserve attention, as the ERA5 SSTskin bias appears to be directly related to the air-sea temperature differences. The ERA5 SSTskin difference with respect to the M-AERI measurements in the Saharan dust outflow regions, with aerosol distributions taken from the MERRA-2 AOT, indicates that the SSTskin derived by ERA5 is affected by the downward aerosol longwave flux. The averaged difference can be as large as 1 K when the aerosol downward longwave flux is above 10 W/m². However, more work is needed to evaluate the ERA5 SSTskin dependence on other factors, such as wind speed, water vapor, smoke, sea salt aerosol, and clouds. It is difficult to draw firm conclusions concerning the accuracy of ERA5 SSTskin at the global level because of the limited geographical coverage of this research. We anticipate that further comparison studies will be extended to wider geographic areas in the future.
Moreover, further research will include the important dust effect on SST.
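For reference, the two agreement measures quoted in the conclusions, the Pearson correlation coefficient (R) and the Nash-Sutcliffe efficiency coefficient (E), can be computed as in the sketch below. Treating the M-AERI values as the observations and the ERA5 values as the simulation is an assumption about how the coefficients were applied, and the temperatures are invented.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance of the observations."""
    mean_obs = statistics.fmean(observed)
    squared_error = sum((s - o) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / spread


# Invented SSTskin values (K): the simulation has a constant cold bias of 0.2 K.
maeri = [290.0, 295.0, 300.0, 305.0]
era5 = [289.8, 294.8, 299.8, 304.8]

r = pearson_r(maeri, era5)
e = nash_sutcliffe(maeri, era5)
print(f"R = {r:.4f}, E = {e:.5f}")
```

The example shows why both are reported: a constant bias leaves R at 1 but lowers E, so E is sensitive to systematic offsets that the correlation ignores.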