Comparison of Above-Water Seabird and TriOS Radiometers along an Atlantic Meridional Transect

The Fiducial Reference Measurements for Satellite Ocean Color (FRM4SOC) project has carried out a range of activities to evaluate and improve the state of the art in ocean color radiometry. This paper describes the results of a ship-based intercomparison conducted on Atlantic Meridional Transect 27 from 23rd September to 5th November 2017. Two radiometric systems, TriOS-Radiation Measurement Sensor with Enhanced Spectral resolution (RAMSES) and Seabird-Hyperspectral Surface Acquisition System (HyperSAS), were operated side by side and compared over a wide range of Atlantic provinces and environmental conditions. Both systems were calibrated for traceability to SI (Système international) units at the same optical laboratory under uniform conditions before and after the field campaign. The in situ results and their accompanying uncertainties were evaluated using the same data handling protocols. The field data revealed variability in the responsivity between the TriOS and Seabird sensors, which depended on the ambient environmental and illumination conditions. The straylight effects for individual sensors were mostly within ±3%. A near infra-red (NIR) similarity correction changed the water-leaving reflectance (ρw) and water-leaving radiance (Lw) spectra significantly and also brought the outliers into closer agreement. To improve the estimates of in situ uncertainty, we recommend additional characterization of the radiometers and additional ancillary environmental measurements. In general, the comparison of the radiometric systems showed agreement within the evaluated uncertainty limits. The consistency of the in situ results with the available Sentinel-3A Ocean and Land Color Instrument (OLCI) data in the range (400…560) nm was also satisfactory (−8% < Mean Percentage Difference (MPD) < 15%), with good agreement in both spectral shape and absolute values.


Introduction
The European Commission provides daily global ocean color data from the Ocean and Land Color Instrument (OLCI) on board the Sentinel-3 (S-3) satellites as part of the Copernicus program. The first Sentinel-3 satellite was launched in 2016, and the mission will continue for at least two decades through the sequential launch of a series of satellites. These will provide data to Europe's Copernicus environmental program to support monitoring, services, decision- and policymaking, and climate change studies.
Based on the requirements of the Global Climate Observing System (GCOS), water-leaving radiance (Lw) data contributing to climate studies are expected to have an uncertainty below 5%. To reduce the uncertainties in satellite products, the System Vicarious Calibration (SVC) approach uses field data to calibrate the combined system of the satellite instrument and the processing algorithm [1,2]. SVC has been used operationally for previous missions, e.g., the MEdium Resolution Imaging Spectrometer (MERIS) on board the Environmental Satellite (ENVISAT), and for ongoing missions to meet ocean color mission requirements in open waters. For Sentinel-3, SVC gains have now been applied to OLCI data from Sentinel-3A but not yet to Sentinel-3B. For S-3 OLCI radiometric products, the Sentinel-3 mission requirements [3] specify 5% uncertainty for bands (490, 510, 560) nm and 5%-10% uncertainty for bands (400, 412, 442) nm, depending on water type. To qualify as Fiducial Reference Measurements (FRM), quantification of the uncertainties in the Earth observation data is required. This can only be done by quantifying the uncertainties in the field data used for validation and through rigorous intercomparison exercises that assess the differences between the radiometer systems used for such validation.
The Fiducial Reference Measurements for Satellite Ocean Color (FRM4SOC) project aimed to evaluate and improve the state of the art in ocean color radiometry through a review of commonly used radiometers [4], SI (Système international) traceable calibrations [5], protocols for downwelling irradiance [6] and water-leaving radiance [7], uncertainty evaluation at different stages of the traceability chain [5,8], and a series of radiometric comparisons [9][10][11][12]. These included:
• a comparison of radiance and irradiance sources used for the calibration of radiometers at the National Physical Laboratory (NPL, UK) [9];
• an indoor comparison of uniformly calibrated radiometers measuring stable radiance and irradiance sources, where the illumination conditions and measurement geometry were strictly controlled and close to ideal [10];
• an outdoor comparison over a Case 2 water body with the radiometers installed on a fixed platform (Lake Kääriku, Estonia); the illumination conditions during this experiment varied with the weather, while the measurement geometry resembled realistic field conditions as closely as possible [11];
• a further outdoor comparison with the same instruments a year later on a fixed platform (the Aqua Alta Oceanographic Tower, AAOT) under near-ideal environmental conditions [12];
• a shipborne campaign on the Atlantic Meridional Transect 27 (AMT27) (the current study).
The indoor exercise, in which all instruments were calibrated at the same laboratory, demonstrated satisfactory consistency between sensors, with a standard deviation within ±1% [10]. The field comparisons showed substantially larger variability between the same sensors, implying a corresponding increase in the uncertainty of the field results. At Lake Kääriku, Estonia, there was large variability between recently calibrated sensors due to high spectral and spatial variability in the targets and environmental conditions.
At the AAOT in the Adriatic Sea off Venice, the normalized water-leaving radiance from both the TriOS-Radiation Measurement Sensor with Enhanced Spectral resolution (RAMSES) and the Seabird-Hyperspectral Surface Acquisition System (HyperSAS) sensors differed by <5% from that of the Aerosol Robotic Network for Ocean Color (AERONET-OC) SeaWiFS Photometer Revision for Incident Surface Measurements (SeaPRiSM) [12]. The increase from 1% to 5% is likely due to (i) differences between the measurement targets in the field and during calibration, both spectrally and spatially, and (ii) less stable ambient temperatures during the field campaigns, which can differ from the calibration temperature by more than ±15 °C.
In this study, we analyzed the differences between the TriOS-RAMSES and Seabird-HyperSAS systems that had been used in the indoor laboratory intercomparison and the two field intercomparisons [10][11][12], this time on the 27th Atlantic Meridional Transect (AMT27) cruise, which crossed a range of ocean provinces and environmental conditions. The in situ data were also compared with S-3A OLCI radiometric products.
The objectives of this work were: (1) to analyze above-water in situ radiometric data measured under variable environmental conditions using two different systems, each consisting of three radiometers, in the context of the previous intercomparisons [10][11][12]; (2) to specify the uncertainties of both systems and evaluate the consistency of the measured in situ data; and (3) to evaluate the consistency of the satellite data with the in situ results, accounting for the estimated in situ uncertainties.

Study Site
The AMT27 field campaign took place from 23rd September to 5th November 2017, from Southampton, UK, to South Georgia and the Falkland Islands, on the UK Natural Environment Research Council (NERC) Royal Research Ship (RRS) Discovery. The AMT is a multidisciplinary research program that undertakes biological, chemical, and physical oceanographic research during an annual voyage between the UK and destinations in the South Atlantic. The program was established in 1995, in collaboration with the National Aeronautics and Space Administration (NASA), as an independent platform to validate Sea-Viewing Wide Field-of-View Sensor (SeaWiFS) ocean color data, and has been running for more than 20 years. The transect covered several ocean provinces, where key physical and biogeochemical variables such as chlorophyll, primary production, nutrients, temperature, salinity, and oxygen were measured. The stations sampled were principally in Case 1 waters [13,14] in the North and South Atlantic Gyres, but the productive waters of the Celtic Sea, Patagonian Shelf, and Equatorial upwelling zone were also visited, which offered a wide range of variability in which to conduct field intercomparisons. The measurement stations are listed in Table 1; quality-controlled Sentinel-3A OLCI match-ups were available for stations 22, 32, 46, 48, and 56.

In situ Above-Water Radiometric Data
Stations were sampled daily at 12:00 local time to ensure that the in situ measurements were within 1 h of the S-3 overpass. Radiometric measurements were performed with two sets of above-water hyperspectral radiometers, each consisting of three separate sensors measuring the radiance from the water surface Lu(λ), the radiance from the sky Ld(λ), and the downwelling solar irradiance Ed(λ). Plymouth Marine Laboratory (PML) used three Seabird (formerly Satlantic) HyperOCR sensors, while the University of Tartu (TO) used three TriOS RAMSES sensors. All radiance and irradiance sensors of both systems were SI-traceably calibrated at Tartu Observatory following the procedures described in detail in [10]. To comply with FRM, the sensors were calibrated frequently, in this case three times: in April 2017 before the second SI-traceable Laboratory inter-comparison experiment (LCE-2) campaign, just after the AMT27 campaign in January 2018, and in June 2018 before the AAOT campaign. All of these sensors also took part in the LCE-2 intercomparison, and during the indoor measurements [10] they agreed to within ±1% for both radiance and irradiance. However, during the outdoor exercise under cloudy conditions, the PML irradiance sensors showed up to 6% higher Ed values at 400 nm, and the PML radiance sensors showed up to about 10% higher values in the red and IR parts of the spectrum than the respective TO sensors [11]. Technical parameters [15,16] of the radiometers are given in Table 2. The sensors are based on the Carl Zeiss Monolithic Miniature Spectrometer (MMS), incorporating a 256-channel silicon photodiode array. The SI-traceable radiometric calibration covered the wavelength range (350…900) nm. During the field measurements, the integration time was adjusted automatically to the measured light intensity. The data acquisition system consisted of power supplies, RS232 multiplexers, and logging computers.
The instruments were mounted on a common steel frame, constructed so that measurements were made under identical zenith and viewing angles (Figure 1). The radiance sensors Ld(λ) and Lu(λ) were positioned at the very front of the ship (Figure 1A), with an unobscured view of the ocean and sky (Figure 1C), side by side at the same height, at 40° and 120° from zenith (Figure 1B), respectively. The colinearity of the radiance sensors in the frame was set by visual observation from the side of the frame and was estimated to be within ±1°. The downwelling irradiance sensors were positioned on the same steel frame, higher than the other sensors to avoid any ship shadows. A fixed mounting frame for the irradiance sensors ensured equal height and leveling of the cosine collectors.

In situ Data Processing
For both systems, the raw radiometric data were logged on a laptop set up in the meteorological laboratory, some 50 m from the radiometers on the meteorological platform at the bow of the ship (Figure 1). The HyperOCR instruments produced proprietary binary data files as standard output from the manufacturer's software. The TriOS RAMSES spectrometers were operated by software developed at TO, and the spectra were stored in American Standard Code for Information Interchange (ASCII) data files. The three HyperOCR spectrometers each measured individually in burst mode (i.e., continuously), while the three RAMSES devices performed a synchronized measurement every 10 seconds. Altogether, five million spectra were collected by PML and 200,000 by TO. Additionally, ancillary meteorological and positional data were provided by the AMT crew as Network Common Data Form (NetCDF) files. In particular, latitude and longitude, ship speed and direction, wind speed, and air temperature were used in this study.
A number of computer programs were written specifically to process this dataset. The algorithms were implemented directly, without external libraries; third-party software was used to visualize the results and to extract data from the NetCDF files. Because of the large number of spectra, a database was first created for all instruments, containing the file names and positions within the data files, ordered by timestamp. This database was then used to extract individual spectra dynamically from the raw data files without creating unnecessary copies of the large HyperOCR dataset. For the HyperOCR spectra, the closest shutter (dark) measurement was subtracted from each field spectrum; for the TriOS-RAMSES instruments, the average signal over the opaque pixels was used as the dark reference. In the case of HyperOCR, one spectrum was derived first, and then the closest (within ±3 s) spectra from the other two sensors were selected to form a consistent data triplet. The spectra were extracted according to the cast start and stop timestamps, calibration and all necessary corrections and filtering were applied, the output quantities were calculated, and the corresponding uncertainties were evaluated. The results were stored as ASCII data files, which are convenient for post-processing or for use in spreadsheet programs. The hyperspectral data were convolved into 19 OLCI channels, from 400 nm to 885 nm, based on the channel definitions in [17].
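The dark correction and triplet matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing code: the `Spectrum` container, all function names, the choice of Lu as the anchor spectrum of each triplet, and the opaque-pixel indices are hypothetical. Only the nearest-shutter subtraction (HyperOCR), the opaque-pixel dark reference (RAMSES), and the ±3 s matching window come from the text.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Spectrum:
    t: float             # timestamp in seconds (hypothetical representation)
    counts: list[float]  # raw digital counts, one value per pixel


def nearest(series: list[Spectrum], t: float) -> Spectrum:
    """Return the spectrum closest in time to t from a non-empty, time-sorted series."""
    times = [s.t for s in series]
    i = bisect_left(times, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = series[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda s: abs(s.t - t))


def dark_correct_shutter(light: Spectrum, darks: list[Spectrum]) -> list[float]:
    """HyperOCR-style: subtract the closest shutter-closed (dark) measurement."""
    dark = nearest(darks, light.t)
    return [c - d for c, d in zip(light.counts, dark.counts)]


def dark_correct_opaque(light: Spectrum, opaque: range) -> list[float]:
    """RAMSES-style: subtract the mean signal of the optically masked (opaque) pixels."""
    ref = sum(light.counts[i] for i in opaque) / len(opaque)
    return [c - ref for c in light.counts]


def form_triplets(lu, ld, ed, max_dt: float = 3.0):
    """Pair each Lu spectrum (assumed anchor) with the nearest Ld and Ed within +/- max_dt s."""
    triplets = []
    for s in lu:
        sky = nearest(ld, s.t)
        irr = nearest(ed, s.t)
        if abs(sky.t - s.t) <= max_dt and abs(irr.t - s.t) <= max_dt:
            triplets.append((s, sky, irr))
    return triplets
```

A production version would build each timestamp index once, as the database described above does, instead of on every lookup.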
The corrections and filtering criteria were applied sequentially during data processing, selected via command-line parameters. The processing steps were: straylight correction, NIR similarity correction, and clear-sky and overcast screening. The straylight removal algorithm was based on [18], using Line Spread Functions (LSF) previously measured at TO. The straylight algorithm was applied separately to the raw calibration signals and to all raw field spectra, individually for each of the six participating radiometers. The NIR similarity correction was based on Formulas (3) and (4). The clear/overcast classification was based solely on an irradiance threshold and was used only to assess the cosine error of the Ed sensors (Section 3.2).
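The LSF-based straylight correction can be illustrated with a matrix method in the style of Zong et al. (2006). We assume, without confirmation from the text, that the algorithm in [18] follows this scheme. The function names, the fixed in-band half-width, and the synthetic Gaussian LSFs below are illustrative only and do not represent the TO characterization data.

```python
import numpy as np


def stray_light_matrix(lsfs: np.ndarray, in_band_halfwidth: int) -> np.ndarray:
    """Build the stray-light distribution matrix D from measured LSFs.

    lsfs[:, j] is the detector response to monochromatic excitation centred on
    pixel j. Each column is normalised to its in-band sum, and the in-band part
    is then zeroed so that D holds only the out-of-band (stray) signal.
    """
    n = lsfs.shape[0]
    D = np.zeros((n, n))
    for j in range(n):
        col = lsfs[:, j].astype(float)
        lo, hi = max(j - in_band_halfwidth, 0), min(j + in_band_halfwidth + 1, n)
        col = col / col[lo:hi].sum()
        col[lo:hi] = 0.0
        D[:, j] = col
    return D


def correct_stray_light(y_meas: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Solve (I + D) y_true = y_meas for the stray-light-free signal."""
    return np.linalg.solve(np.eye(D.shape[0]) + D, y_meas)


# Synthetic example: Gaussian LSFs with a small flat stray-light floor.
n = 50
pix = np.arange(n)
lsfs = np.exp(-0.5 * ((pix[:, None] - pix[None, :]) / 1.5) ** 2) + 1e-3
D = stray_light_matrix(lsfs, in_band_halfwidth=5)
y_true = 1.0 + 0.5 * np.sin(pix / 8.0)
y_meas = (np.eye(n) + D) @ y_true   # simulate stray-light contamination
y_corr = correct_stray_light(y_meas, D)
```

In the processing described above, the same per-radiometer correction would be applied both to the raw calibration signals and to each raw field spectrum.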
The uncertainty of the results was evaluated according to the Guide to the Expression of Uncertainty in Measurement (GUM) [19]. For each input quantity, a relative standard uncertainty was estimated. The relative combined standard uncertainty of each output quantity was calculated by combining the relative standard uncertainties of all input estimates using formula (12) of the Joint Committee for Guides in Metrology (JCGM) [19]. The radiometric calibration of the irradiance and radiance sensors and the respective uncertainty budgets are described in [5]. The properties of the measured and derived quantities (water-leaving reflectance ρw, downwelling irradiance Ed, and water-leaving radiance Lw) and the evaluation of the related uncertainties can be found in [11,20]. The measurement models for evaluating the uncertainty of ρw and Lw are defined by formulas (1), (2), and (5). The Type B uncertainty of the results measured by the Ed sensor included the calibration uncertainty and contributions from aging, non-cosine response, and temperature effects. The calibration uncertainty included the following components: alignment, repeatability, temporal stability of the calibration source, repeatability of the calibration and dark signals, thermal effects in the laboratory, and polarization sensitivity. The absolute calibration of the source was excluded from the uncertainty budget because all radiometers were calibrated against the same lamp. The Type A uncertainty for ρw, Ed, and Lw was calculated as the standard deviation of the station average, taking into account the effective degrees of freedom based on the lag-1 autocorrelation of the time series [21]. The autocorrelation coefficient varied from 0 to 1, depending on the station. Expanded uncertainties (k = 2) are shown in the figures. En numbers [22] were used to assess the agreement between the results from the two radiometric systems.
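The two uncertainty ingredients named above can be sketched as follows: the root-sum-square combination of relative standard uncertainties (the special case of JCGM formula (12) for a product/quotient model with unit exponents and uncorrelated inputs), and a Type A uncertainty of the station mean using an effective sample size. The expression n_eff = n(1 − r1)/(1 + r1) is a common approximation that we assume here; [21] may use a different one. All function names are hypothetical.

```python
import math
import statistics


def combined_relative_uncertainty(rel_uncertainties):
    """Root-sum-square of relative standard uncertainties (product model, unit exponents)."""
    return math.sqrt(sum(u * u for u in rel_uncertainties))


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation coefficient of a time series."""
    m = statistics.fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den if den > 0 else 0.0


def type_a_uncertainty_of_mean(x):
    """Standard uncertainty of the station mean with an autocorrelation-reduced sample size."""
    n = len(x)
    r1 = max(lag1_autocorrelation(x), 0.0)          # negative r1 treated as 0 (assumption)
    n_eff = max(n * (1.0 - r1) / (1.0 + r1), 1.0)   # effective number of independent samples
    return statistics.stdev(x) / math.sqrt(n_eff)


def expanded(u, k=2.0):
    """Expanded uncertainty with coverage factor k (k = 2 in the figures)."""
    return k * u
```

For a strongly autocorrelated series (r1 near 1), n_eff shrinks towards 1, so the Type A uncertainty approaches the plain standard deviation instead of shrinking as 1/√n.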
The water-leaving reflectance spectra were calculated from the synchronized triplets measured with the HyperSAS and TriOS-RAMSES hyperspectral radiometers following the MERIS Regional Validation of MERIS Chlorophyll Products in North Sea Coastal Waters (REVAMP) and National Aeronautics and Space Administration (NASA) protocols [23,24]. The water-leaving reflectance ρw(λ) was calculated as

$$\rho_w(\lambda) = \pi R_{rs}(\lambda) = \pi\,\frac{L_u(\lambda) - \rho_s(W)\,L_d(\lambda)}{E_d(\lambda)} \qquad (1)$$

where Rrs(λ) is the remote sensing reflectance, Lu(λ) is the upwelling radiance from the sea, Ld(λ) is the downwelling radiance from the sky, Ed(λ) is the downwelling irradiance, and ρs(W) is the sea surface reflectance as a function of wind speed (W, m·s⁻¹), calculated following [23] (Equation (2)). The NIR similarity correction of the water-leaving reflectance spectra was based on [25,26]. First, the additive correction term ε was found for every individual spectrum as

$$\varepsilon = \frac{\alpha_{\lambda_1,\lambda_2}\,\rho_w(\lambda_2) - \rho_w(\lambda_1)}{\alpha_{\lambda_1,\lambda_2} - 1} \qquad (3)$$

The constant parameter α of the NIR similarity correction [25] is determined in [26] and depends on the choice of wavelengths λ1 and λ2; α = 2.35 for λ1 = 720 nm and λ2 = 780 nm. The NIR similarity-corrected water-leaving reflectance, ρw,NIR(λ), was then calculated as

$$\rho_{w,\mathrm{NIR}}(\lambda) = \rho_w(\lambda) - \varepsilon \qquad (4)$$

The NIR-corrected water-leaving radiance was calculated from the corrected ρw,NIR(λ) as

$$L_w(\lambda) = \frac{E_d(\lambda)\,\rho_{w,\mathrm{NIR}}(\lambda)}{\pi} \qquad (5)$$

For the output quantities (Ed, ρw, and Lw), the median over the station results was calculated, and only spectra within ±10% of the median were included in the final averaging.
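Equations (1) and (3) to (5) and the ±10% median screening can be sketched as below. The value α = 2.35 for 720/780 nm is from the text; we assume α is the similarity ratio ρw(720)/ρw(780). Equation (2) is not reproduced in this section, so the wind-speed dependence of ρs used here, the clear-sky parameterization ρs = 0.0256 + 0.00039W + 0.000034W² of Ruddick et al. (2006), is an assumption. Applying the ±10% screen separately at each wavelength, rather than rejecting whole spectra, is also an assumption, and all function names are hypothetical.

```python
import numpy as np

ALPHA_720_780 = 2.35  # similarity ratio for 720/780 nm (value from the text)


def sea_surface_reflectance(wind_speed):
    """ASSUMPTION: clear-sky parameterisation of Ruddick et al. (2006), not Eq. (2) itself."""
    return 0.0256 + 0.00039 * wind_speed + 0.000034 * wind_speed ** 2


def water_leaving_reflectance(lu, ld, ed, wind_speed):
    """Eq. (1): rho_w = pi * (Lu - rho_s * Ld) / Ed."""
    rho_s = sea_surface_reflectance(wind_speed)
    return np.pi * (lu - rho_s * ld) / ed


def nir_similarity_correction(wl, rho_w, lam1=720.0, lam2=780.0, alpha=ALPHA_720_780):
    """Eqs. (3)-(4): estimate and remove a spectrally flat offset epsilon."""
    r1 = np.interp(lam1, wl, rho_w)
    r2 = np.interp(lam2, wl, rho_w)
    eps = (alpha * r2 - r1) / (alpha - 1.0)
    return rho_w - eps, eps


def water_leaving_radiance(rho_w_nir, ed):
    """Eq. (5): Lw = Ed * rho_w,NIR / pi."""
    return ed * rho_w_nir / np.pi


def median_filtered_mean(spectra, tol=0.10):
    """Average only values within +/- tol of the station median (per wavelength; assumption)."""
    s = np.asarray(spectra, dtype=float)
    med = np.median(s, axis=0)
    keep = np.abs(s - med) <= tol * np.abs(med)
    return np.nanmean(np.where(keep, s, np.nan), axis=0)
```

Because ε is spectrally flat, Equation (4) shifts the whole spectrum by the same amount; a spectrum that obeys the similarity ratio at 720/780 nm plus any constant offset is restored exactly.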
To evaluate the agreement between the results from the two radiometric systems, En numbers were calculated following [22] as

$$E_n = \frac{x_1 - x_2}{\sqrt{U_1^2 + U_2^2}}$$

where x1 and x2 are the independent results being compared, and U1 and U2 are the expanded uncertainties (k = 2) of these results, respectively. The agreement between the compared values was considered satisfactory if |En| ≤ 1 and non-satisfactory if |En| > 1.
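The En criterion can be evaluated per band and station as in this minimal sketch; the symbols follow the equation above, and `median_en` is a hypothetical helper mirroring the per-band median En reported in the results.

```python
import statistics


def en_number(x1, x2, U1, U2):
    """E_n = (x1 - x2) / sqrt(U1^2 + U2^2), with U1, U2 expanded (k = 2) uncertainties."""
    return (x1 - x2) / (U1 ** 2 + U2 ** 2) ** 0.5


def agrees(en):
    """Agreement is satisfactory when |E_n| <= 1."""
    return abs(en) <= 1.0


def median_en(results_a, results_b, U_a, U_b):
    """Median E_n over stations for one band (hypothetical helper)."""
    return statistics.median(
        en_number(a, b, ua, ub) for a, b, ua, ub in zip(results_a, results_b, U_a, U_b)
    )
```

For example, results of 10.2 and 10.0 with expanded uncertainties of 0.3 and 0.4 give En ≈ 0.4 (satisfactory), whereas 10.6 and 10.0 with the same uncertainties give En ≈ 1.2 (non-satisfactory).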

Sentinel-3A OLCI Data
Sentinel-3A OLCI full-resolution Level-2 data acquired on the same day as the in situ measurements were downloaded from EUMETSAT (https://eoportal.eumetsat.int/). OLCI values from a 3 × 3 pixel Region of Interest (ROI) centered on the coordinates of each in situ station were extracted for further analysis.
The recommended set of flags (CLOUD, CLOUD_AMBIGUOUS, CLOUD_MARGIN, INVALID, COSMETIC, SATURATED, SUSPECT, HISOLZEN, HIGHGLINT, SNOW_ICE, WHITECAPS, ANNOT_ABSO_D, ANNOT_MIXR1, ANNOT_TAU06, RWNEG_O2, RWNEG_O3, RWNEG_O4, RWNEG_O5, RWNEG_O6, RWNEG_O7, RWNEG_O8) was applied to the data to eliminate possibly invalid values. Additionally, quality control was performed by requiring an OLCI viewing zenith angle < 60° and a solar zenith angle < 70°. The mean (µ) and standard deviation (σ) were then calculated within the ROI. For the match-ups, the Mean Absolute Percentage Difference (MAPD), to investigate dispersion, and the MPD, to investigate bias, were calculated between the OLCI and in situ data:

$$\mathrm{MAPD}(\lambda) = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{x(\lambda)_{i,\mathrm{OLCI}} - x(\lambda)_{i,\mathrm{in\,situ}}}{x(\lambda)_{i,\mathrm{in\,situ}}}\right|$$

$$\mathrm{MPD}(\lambda) = \frac{100\%}{N}\sum_{i=1}^{N}\frac{x(\lambda)_{i,\mathrm{OLCI}} - x(\lambda)_{i,\mathrm{in\,situ}}}{x(\lambda)_{i,\mathrm{in\,situ}}}$$

where x(λ)i,in situ and x(λ)i,OLCI are, respectively, the in situ and OLCI-derived values for band λ and match-up i, and N is the number of match-ups. The filtering of satellite and in situ data followed the EUMETSAT "Recommendations for Sentinel-3 OLCI ocean color product validations in comparison with in situ measurements -Match-up Protocols" [27].
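The ROI extraction, geometry check, and match-up statistics can be sketched as follows. The boolean `valid` mask is assumed to be precomputed from the recommended OLCI flags (no flag bit values are reproduced here), the 3 × 3 window is assumed not to touch the image edge, and the population standard deviation is an arbitrary choice; all function names are hypothetical.

```python
import numpy as np


def roi_stats(band, valid, row, col, half=1):
    """Mean and standard deviation over a (2*half+1)^2 window, using only valid pixels.

    Assumes the window lies fully inside the image (no edge handling).
    """
    win = band[row - half:row + half + 1, col - half:col + half + 1]
    ok = valid[row - half:row + half + 1, col - half:col + half + 1]
    vals = win[ok]
    if vals.size == 0:
        return np.nan, np.nan
    return vals.mean(), vals.std()


def geometry_ok(view_zenith_deg, sun_zenith_deg):
    """Quality control from the text: OLCI viewing zenith < 60 deg, solar zenith < 70 deg."""
    return view_zenith_deg < 60.0 and sun_zenith_deg < 70.0


def mpd_mapd(olci, in_situ):
    """Return (MPD, MAPD) in percent for one band across match-ups."""
    olci = np.asarray(olci, dtype=float)
    in_situ = np.asarray(in_situ, dtype=float)
    rel = (olci - in_situ) / in_situ
    return 100.0 * rel.mean(), 100.0 * np.abs(rel).mean()
```

MPD lets positive and negative differences cancel and therefore measures bias, while MAPD does not and therefore measures dispersion; two match-ups that are +10% and −10% off give MPD = 0% but MAPD = 10%.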

Measurement of Chlorophyll-a
Surface water samples were collected using 20 L Niskin bottles, and between 1 and 6 L of seawater were filtered onto 0.7 µm GF/F filters. Discrete water samples were collected along the transect from an underway flow-through optical system and from Niskin bottle rosettes deployed with a Seabird Conductivity-Temperature-Depth sampling device. The water samples were filtered onto Whatman GF/F filters (nominal pore size 0.7 µm), transferred to cryovials, and stored immediately in liquid nitrogen. High Performance Liquid Chromatography (HPLC) was then used to determine total Chl a following the methods given in [28]. A WET Labs hyperspectral absorption-attenuation instrument (AC-S) was coupled to the ship's clean flow-through system, which continually pumps seawater from a nominal depth of 5 to 7 m beneath the ship's hull. The underway spectrophotometric method of determining Chl a is described in detail in [29]. Chl a concentrations were estimated from the absorption-attenuation measurements using the particulate absorption coefficient (ap(λ)) at 650, 676, and 715 nm [29].
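Estimating Chl a from ap(λ) at 650, 676, and 715 nm is commonly done with an absorption line height at 676 nm above a linear baseline between 650 and 715 nm, followed by a power-law conversion. We assume that [29] follows this approach; the conversion coefficients A and B below are hypothetical placeholders, not the values used in this study.

```python
def line_height_676(ap650, ap676, ap715):
    """Absorption line height at 676 nm above a linear 650-715 nm baseline."""
    baseline = ap650 + (ap715 - ap650) * (676.0 - 650.0) / (715.0 - 650.0)
    return ap676 - baseline


def chl_from_line_height(a_lh, A, B):
    """Power-law inversion Chl = (a_LH / A) ** (1 / B).

    A and B are hypothetical placeholders; the coefficients of [29] are not
    reproduced here.
    """
    return (a_lh / A) ** (1.0 / B)
```

The baseline removes the broadband contribution of non-algal particles, so the line height isolates the red chlorophyll absorption peak.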

Results
Radiometric measurements were conducted at 32 stations (Figure 2), covering a Solar Zenith Angle (SZA) range of (6…60)° and an ambient temperature range of (1…30) °C. Chlorophyll-a (Chl a) concentrations varied from (0.05…1.0) mg·m⁻³. They were highest on the UK shelf at 48°N, where they reached 0.8 mg·m⁻³, and in the South Atlantic from 33°S to 49°S, where they were between 0.7 mg·m⁻³ and >0.9 mg·m⁻³ (Figure 3). For the results given in Figure 4, data from all stations were included without screening. In the following text and figures, the Seabird HyperOCR data from Plymouth Marine Laboratory and the TriOS RAMSES data from Tartu Observatory are denoted "PML" and "TO", respectively.
The uncorrected Ed and ρw spectra for all 32 stations are shown in Figure 4, together with the expanded uncertainties. The scatter in Ed at noon was caused by high Solar Zenith Angles and by sky conditions, which varied from overcast to clear. The water type was Case 1 throughout the campaign, with slightly modified short-wavelength reflectance near the coast at the beginning and end of the voyage, as characterized by the reflectance spectra (Figure 4, right).
The initial ρw spectra (Figure 4) showed two potential outliers, from stations 67 and 50, which are respectively the lowest and highest spectra in Figure 4. While ρw from station 67 could be explained by poor measurement conditions (Table 1), the measurement conditions at station 50 were optimal. Moreover, because both radiometric systems yielded closely similar spectral shapes and absolute values of ρw at these stations, these spectra were retained in further analyses to illustrate the effects of the processing steps.
The lower limits of uncertainty (3% for Ed and 5% for ρw, k = 2) were determined by the Type B instrumental components and could be reached under near-ideal measurement conditions.
Figure 4. Ed (left) and ρw (right) spectra, recalculated for OLCI (Ocean and Land Color Instrument) channels. Lower panel: expanded uncertainties of downwelling solar irradiance (Ed) and water-leaving reflectance (ρw). The color denotes the station ID, as listed in Table 1.

Environmental Effects
The difference between the PML and TO downwelling irradiance results, as a function of ambient temperature and of Solar Zenith Angle, is shown in Figure 5. The Ed(PML)/Ed(TO) ratios were averaged over all OLCI bands and included all measurement stations.

Comparison Between the in situ Radiometric Systems
A comparison between the PML and TO radiometric systems is shown in Figure 6. Results are given as PML/TO ratios for Ed(λ), ρw(λ), and Lw(λ), with and without the NIR similarity correction (not applicable to Ed). The outliers in the ρw(λ) and Lw(λ) ratios without any correction (left panel of Figure 6) and with straylight correction (middle panel of Figure 6) originated from station 67. After the NIR similarity correction, the consistency between the TO and PML data was poorer, although there were no distinct outliers (Figure 6). The straylight correction was applied to all data. An example of the effects of the straylight and NIR similarity corrections on ρw is shown in Figure 7. The NIR correction could substantially decrease the ρw values (Figure 7) while retaining the spectral shape (Figure A1).
To compare Ed measured by the RAMSES and HyperOCR instruments, only clear-sky conditions were used (Figure 8, Figure A2). First, individual spectra were screened using an irradiance threshold of 1200 mW/m² at 560 nm. The resulting dataset was further reduced by rejecting all stations with an expanded uncertainty of more than 5%. As a result, 10 stations from the whole dataset (Figure A3) were included in the Ed comparison (Figure 8). The average ratio between PML and TO over all 32 stations, together with the expanded uncertainty, is shown in Figure 9. The median of the En numbers of the individual station ratios is also shown in Figure 10.
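The two-step clear-sky screening can be sketched as below. The 1200 threshold at 560 nm and the 5% expanded-uncertainty limit are from the text. Representing each station as a list of Ed(560) values, treating spectra at or above the threshold as clear, and estimating the station's relative expanded uncertainty from Type A scatter alone (no Type B terms, no autocorrelation adjustment) are simplifying assumptions, and `screen_clear_sky` is a hypothetical name.

```python
import math
import statistics


def screen_clear_sky(ed560_by_station, threshold=1200.0, max_rel_expanded=0.05, k=2.0):
    """Return {station_id: mean clear-sky Ed(560)} for stations passing both screens."""
    kept = {}
    for sid, values in ed560_by_station.items():
        # Step 1: keep only individual spectra at or above the irradiance threshold.
        clear = [v for v in values if v >= threshold]
        if len(clear) < 2:
            continue  # too few spectra to estimate a scatter
        mean = statistics.fmean(clear)
        # Step 2: simplified relative expanded uncertainty of the station mean.
        u_rel = k * statistics.stdev(clear) / (mean * math.sqrt(len(clear)))
        if u_rel <= max_rel_expanded:
            kept[sid] = mean
    return kept
```

For instance, a station whose clear spectra read 1300, 1310, and 1290 passes (relative expanded uncertainty below 1%), while a station with clear spectra of 1250 and 2000 is rejected.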

Figure 8. Comparison between TO and PML Ed (mW·m⁻²·µm⁻¹) and the respective uncertainties over all OLCI bands. The comparison in each band separately can be found in Figure A2 for clear conditions and in Figure A3 for all conditions.
The agreement between the two radiometric systems is shown in Figure 10. The yellow line is the mean PML/TO ratio, together with uncertainty bars, and the purple line is the respective median En number. A comparison of ρw(λ) values measured by PML and TO, after correcting for straylight only or after applying both straylight and NIR similarity corrections, is shown in Figure 11. With straylight correction only (Figure A4), the agreement between PML and TO was good over all wavelengths, whereas with straylight and NIR similarity corrections (Figure A5), the agreement became weaker from 510 nm onward.
Figure 11. Comparison between TO and PML ρw after straylight correction (left) and after straylight + NIR similarity correction (right) over all OLCI bands. The comparison at each band can be found in Figure A4 for straylight correction and in Figure A5 for straylight + NIR similarity correction.
Agreement between Lw values measured by PML and TO, after correcting for straylight only or after applying both straylight and NIR similarity corrections, is shown in Figure 12; the band-by-band comparison for straylight correction is given in Figure A6.

Consistency Between in situ and OLCI Radiometric Data
After applying the filtering criteria to the OLCI data as described in Section 2.3, five match-ups were obtained. In situ and OLCI-derived reflectances for all five match-ups are compared in Figure 13 for the different corrections of the in situ results. The corresponding scatterplots are shown in Figure 14 for all wavelengths and in Figure A7 for wavelengths up to 560 nm. The statistics describing the agreement between in situ and satellite measurements are summarized in Table 3. The OLCI to in situ ρw ratios, together with uncertainty limits, are shown in Figure 15.
Figure 13. Comparison of OLCI ρw against TO in situ ρw after straylight correction and after straylight + NIR similarity correction at five match-up stations. The error bars denote the respective uncertainties for in situ data and the standard deviation inside the Region of Interest (ROI) for OLCI. General information for each station can be found in Table 1.

Figure 14. Correlation between OLCI-derived and in situ-measured ρw, processed with straylight and NIR similarity corrections, over five match-ups at all wavelengths. The error bars denote the respective uncertainties for in situ data and the standard deviation inside the ROI for OLCI.
Table 3. The Mean Absolute Percentage Difference (MAPD) and Mean Percentage Difference (MPD) calculated between Ocean Land Colour Instrument (OLCI) and in situ ρw data measured by TO or PML over five match-up stations. Straylight and NIR similarity corrections were applied to the in situ data. The mean in situ uncertainty for both radiometers is shown in the right column.

Figure 15. The ratio between mean OLCI ρw and in situ ρw (TO above, PML below) with straylight and NIR similarity corrections over four match-up stations (22, 32, 48, 56). The error bars denote the uncertainty of the ratio calculated from the in situ and OLCI data.

Data Filtering Procedure
The aim of the paper was to compare the performance of two different radiometric systems over different environmental conditions and water types. The variable environmental conditions (e.g., -53.65 < latitude (°) < 48.93; -38.05 < longitude (°) < -7.62; 5.84 < sun zenith angle (°) < 60.54; 1.48 < wind speed (m·s⁻¹) < 19.71; 0.9 < temperature (°C) < 28.3) allowed the radiometric data to be compared slightly outside the strict rules applied when producing satellite validation datasets. This is important for demonstrating the reliability of the in situ measurements. The recommended optimal conditions in satellite validation protocols are likely associated with the measurement limits of the radiometers, and these do not always correspond to the expectations of users or to realistic measurement conditions. Studying the behavior of existing radiometers close to (or even beyond) their specification limits is therefore important for planning next-generation systems.
Outliers in the datasets are typically caused by: 1) instrumental errors; 2) unsuitable measurement conditions or natural variations outside the acceptable limits; 3) methodical errors in sampling and statistical treatment of results [11]. In our dataset, two potential outlier stations were present (Figure 4, Figure 6, Figure 7). Because the two different measurement systems produced nearly identical results at these stations, instrumental and sampling errors could be excluded. The measurement conditions fell within the recommended limits in terms of SZA, and the measurement procedures, including most aspects of sampling, were the same for both systems at all stations. Therefore, there was no clear reason for removing these stations from the dataset; instead, the behavior of the data was analyzed during processing. Nevertheless, to avoid biased conclusions, we acknowledge the need to study potential outliers and remove them if justified. Uncertainty also needs to be considered as a criterion for removing outliers, and further research and analysis are required on this. Moreover, during the previous phases of the FRM4SOC project, the investigation of outliers (not caused by instrumental errors) helped to reveal errors in the recommended measurement and processing procedures and in the instrument characterization [9,11]. This would not have been possible if the outliers had been removed from the datasets without identifying their causes. As a result, no data were screened out from the AMT27 dataset, except for the clear-sky Ed comparison (Figure 8). The in situ data of the five match-ups used for S-3 OLCI validation agreed well with the main group of stations and had a mean uncertainty of (6…7)% for (400…510) nm, which increased towards longer wavelengths (Table 3).

Comparison of Radiometric Measurement Systems on a Moving Vessel
The intercomparison of simultaneous radiometric measurements during the AMT27 field campaign allowed us to assess the consistency of the data and investigate the uncertainties on a moving vessel. An intercomparison of optical systems on the stable AAOT platform under near-ideal conditions showed spectrally averaged relative differences between −1% and +6%, and spectrally averaged absolute differences of around 6% for above-water systems and 9% for buoy-based systems [30]. As expected for a moving vessel, we found the differences to be slightly higher, depending both on the corrections applied and on the varying environmental conditions. A comparison of in-water and above-water radiometric measurements [31] gave the best agreement at the 490 nm band, especially in Case 1 waters under cloud-free conditions, again highlighting that agreement is better under ideal measurement conditions with minimal impact from environmental parameters. Owing to the limited responsivity of the radiometers and the influence of environmental conditions on the radiance signal, the reflectance at wavelengths > 650 nm was weak and noisy, which is characteristic of these Case 1 oligotrophic waters [32]. The results above 650 nm were included to show the effects of the straylight and NIR similarity corrections on the in situ radiometric data, especially in the context of comparisons with S-3A OLCI data. The signal at 760 nm was further affected by the oxygen absorption band, which is considerably narrower than the spectral resolution of the TriOS and Seabird radiometers. The behavior of the corrected signal around this narrow spectral band improved with straylight correction and also revealed a possible shift in the wavelength scales of the radiometers (the wavelength scales were not individually tested during this study).
A large increase in the uncertainties above 650 nm was expected; this does not limit the use of these radiometers in Case 2 waters, where reflectance in the red and NIR is substantially higher [11]. Nevertheless, even with the weak signal in Case 1 waters, the comparison showed agreement within the evaluated uncertainty limits.
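"Agreement within the evaluated uncertainty limits" can be tested band by band with a normalized error. The sketch below is an assumption about how such a check might look, not the criterion used in this study: it combines the two systems' standard uncertainties in quadrature (assuming they are uncorrelated) and applies a coverage factor k = 2. The reflectance values and uncertainties are hypothetical.

```python
import numpy as np


def agree_within_uncertainty(x_a, u_a, x_b, u_b, k=2.0):
    """Per-band normalized error E_n = |x_a - x_b| / (k * sqrt(u_a**2 + u_b**2)).

    u_a and u_b are standard (k = 1) uncertainties in the same units as x_a and
    x_b, assumed uncorrelated. E_n <= 1 means the two systems agree within the
    expanded uncertainty.
    """
    x_a, u_a, x_b, u_b = (np.asarray(v, dtype=float) for v in (x_a, u_a, x_b, u_b))
    u_c = np.sqrt(u_a**2 + u_b**2)
    e_n = np.abs(x_a - x_b) / (k * u_c)
    return e_n, e_n <= 1.0


# Hypothetical rho_w values and standard uncertainties for two bands.
e_n, ok = agree_within_uncertainty(
    x_a=[0.0100, 0.0050], u_a=[0.0004, 0.0004],
    x_b=[0.0095, 0.0062], u_b=[0.0003, 0.0003],
)
print(e_n, ok)  # first band agrees (E_n ≈ 0.5), second does not (E_n ≈ 1.2)
```

If the uncertainties of the two systems share correlated components (for example a common calibration source), the quadrature sum overstates the combined uncertainty and a covariance term would be needed.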
Simulations in [26] showed that the NIR similarity spectrum correction applied in this study is valid for waters with ρw(780) > 0.0001. All ρw(780) values, including their type-A uncertainty, were above this threshold except for one data point measured at a very high wind speed (19.71 m·s⁻¹, station id 67 in Table 1). The mean ρw(780) values for both the TO and PML systems were > 0.004 before the NIR correction. Without the NIR similarity correction, the ρw values were too high due to sunglint. Therefore, the NIR correction was included in the standard data processing scheme, and its effects were evaluated. The results showed that for clear waters, with a very low radiometric signal and high noise in the NIR, the NIR similarity correction removed this contribution and gave reasonable results. Although various corrections exist for Case 1 waters (e.g., [33,34]), the low signal and high noise in the NIR make it difficult to choose a specific correction, because the noise exceeds the signal and these parameters are evaluated separately in the processing steps. Choosing one would require a reference value to eliminate the effect of wind and then to estimate the most accurate correction factor for each above-water measurement. Historically, the scientific community has been encouraged to develop improved and universal correction methods for waters with very low signal in the NIR. Doing this properly, however, requires the development of new instruments with improved signal-to-noise performance in the NIR spectral range, and of independent reference methods that do not depend on the wind, which causes air-water surface effects and artifacts.
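The idea behind the NIR similarity correction can be sketched as removing a spectrally flat offset (for example residual sunglint) so that the reflectance ratio between two NIR bands matches the near-invariant similarity spectrum. This is one common formulation, assuming the 720/780 nm band pair and a ratio α ≈ 2.35; both are assumptions here, so consult [25,26] for the exact variant and coefficients used in this study. The validity check against the 0.0001 threshold on ρw(780) cited from [26] is applied to the corrected spectrum, which is also an assumption. The spectrum is synthetic.

```python
import numpy as np

# Assumed similarity-spectrum ratio rho_w(720)/rho_w(780); check [25,26] for the
# value and band pair actually used.
ALPHA_720_780 = 2.35
# Validity threshold on rho_w(780) quoted from the simulations in [26].
RHO780_MIN = 1e-4


def nir_similarity_correction(wavelengths, rho_w, alpha=ALPHA_720_780):
    """Subtract a spectrally flat offset so rho_w(720)/rho_w(780) equals alpha.

    Measured rho_w(lambda) = true rho_w(lambda) + eps, with the true spectrum
    obeying rho_true(720) = alpha * rho_true(780). Solving for eps gives
        eps = (alpha * rho_w(780) - rho_w(720)) / (alpha - 1).
    Returns the corrected spectrum, eps, and a validity flag (assumed here to be
    tested on the corrected rho_w(780)).
    """
    wl = np.asarray(wavelengths, dtype=float)
    rho = np.asarray(rho_w, dtype=float)
    rho720 = np.interp(720.0, wl, rho)  # wavelengths must be increasing
    rho780 = np.interp(780.0, wl, rho)
    eps = (alpha * rho780 - rho720) / (alpha - 1.0)
    corrected = rho - eps
    valid = np.interp(780.0, wl, corrected) > RHO780_MIN
    return corrected, eps, bool(valid)


# Synthetic spectrum obeying the similarity ratio, plus a flat sunglint offset.
wl = [400.0, 560.0, 720.0, 780.0]
true_rho = np.array([0.020, 0.005, 0.00235, 0.001])
measured = true_rho + 0.003
corrected, eps, valid = nir_similarity_correction(wl, measured)
print(round(eps, 6), valid)  # recovers the 0.003 offset
```

Because the offset is assumed flat, the correction shifts the whole spectrum; where the NIR signal is dominated by noise, as described above for clear Case 1 waters, ε inherits that noise.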

Environmental Effects
The field data revealed variability in the responsivity between the TriOS and Seabird sensors calibrated at the same laboratory, which depended on the ambient and illumination conditions. The variability in the responsivity of each sensor was likely much larger than the change in the signal ratio, but it was masked by the similar behavior of the two sensors. The differences in Figure 2 varied from approximately −5% to +5% over the temperature range from 1 °C to 30 °C, a substantially smaller change than the temperature effect determined for other TriOS sensors [35]. A slight dependence on the straylight correction was also evident. The irradiance sensors agreed best at 21 °C, which corresponded to the calibration temperature. Characterizing the thermal effects over the full temperature range of the field measurements would therefore considerably improve the traceability of the results to SI. In this study, because characterization data were lacking, no temperature correction was applied.
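If thermal characterization data were available, a correction could take a form like the sketch below. The linear responsivity model and the 0.1 %/°C coefficient are hypothetical placeholders, not characterization results for the TriOS or Seabird sensors (none were applied in this study); only the 21 °C reference temperature comes from the text. In practice, the coefficient would usually be wavelength dependent, which the function allows by accepting an array.

```python
import numpy as np

T_REF_C = 21.0  # calibration temperature quoted in the text


def temperature_correct(signal, temp_c, coeff_per_c, t_ref_c=T_REF_C):
    """Correct a radiometric signal using an assumed linear thermal model.

    Assumed model: responsivity(T) = responsivity(T_ref) * (1 + c * (T - T_ref)),
    so the measured signal is divided by that factor. `coeff_per_c` may be a
    scalar or a per-wavelength array (fractional change per degree C).
    """
    signal = np.asarray(signal, dtype=float)
    coeff = np.asarray(coeff_per_c, dtype=float)
    return signal / (1.0 + coeff * (temp_c - t_ref_c))


# Hypothetical coefficient of 0.1 % per degree C, applied to a sensor at 31 degC.
print(temperature_correct([1.0, 2.0], temp_c=31.0, coeff_per_c=0.001))
```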
Differences in downwelling irradiance (Ed) between PML and TO as a function of SZA were consistent with the known or expected errors of the sensors' cosine collectors, which are within ±3% [11]. The ratio of -s showed no correlation with SZA or temperature, while the ratio of ( ) showed the opposite pattern to the ratio of -s, as expected. Compared with the temperature and angle effects, straylight effects were negligible; they are shown in Figure 5 for reference only.
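The link between cosine-collector errors and the SZA dependence of the Ed ratio can be illustrated with the usual definition of relative cosine error, f(θ) = R(θ)/(R(0)·cos θ) − 1. In this minimal sketch, the angular responses are hypothetical, not measured characteristics of either collector, and the ratio estimate considers the direct solar beam only (an assumption, since real Ed also contains a diffuse sky component that averages the error over angles).

```python
import numpy as np


def cosine_error(angles_deg, response):
    """Relative cosine error f(theta) = R(theta) / (R(0) * cos(theta)) - 1.

    `response` is the collector's measured response to a collimated beam at each
    incidence angle; `angles_deg` must include 0 degrees.
    """
    theta = np.radians(np.asarray(angles_deg, dtype=float))
    r = np.asarray(response, dtype=float)
    r0 = r[np.argmin(np.abs(theta))]  # response at normal incidence
    return r / (r0 * np.cos(theta)) - 1.0


# Hypothetical angular responses of two collectors (not measured data).
angles = [0.0, 30.0, 60.0]
f_a = cosine_error(angles, [1.000, 0.870, 0.515])
f_b = cosine_error(angles, [1.000, 0.862, 0.505])
# Direct beam only: the Ed ratio deviates from 1 by (1 + f_a) / (1 + f_b) - 1.
print(np.round((1 + f_a) / (1 + f_b) - 1, 3))
```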

Comparison of Water-Leaving Reflectance and Radiance Spectra
After correction for straylight and NIR effects, the values measured by TO and PML agreed well over all stations and wavelengths, although the TO values were generally slightly higher than the PML data. The best agreement was for bands (400…442.5) nm (R² = 0.99, Figure A5); it was slightly lower for bands (490…560) nm (R² decreasing from 0.96 to 0.82, respectively, Figure A5), and it decreased rapidly towards longer wavelengths, where the signal was small and negligible. The agreement between TO and PML for after applying both corrections was good at all wavelengths (Figure A6), although the TO values were slightly higher than the PML values (Figure 12).
The NIR similarity correction increased both the bias and the scatter between the and ratios, but this was to be expected, because after this correction the signal levels in Case 1 waters above 550 nm are normally extremely low (Figure 11, and after straylight and NIR similarity correction).
The straylight effects for individual sensors, as well as for the derived quantities, were mostly within ±3%, consistent with previous results [10,11,36]. The effect of the straylight correction on the is shown in the upper panel of Figure 7. The straylight correction was applied to all of the data, regardless of the other corrections. The NIR similarity correction [25,26], on the other hand, changed the ρw and Lw spectra significantly and brought the outliers closer to 1 (Figure 6, Figure 7, lower panel). The NIR similarity correction removed the shift in the ρw (and Lw) spectra, which was probably caused by high tilt, waves, and skyglint. Although the agreement between the two in situ radiometric systems could be satisfactory or even better without the NIR similarity correction (Figure 6, Figure 7), distortion of the spectra was evident without it (Figure 13), and the comparison with satellite data then showed a significant bias (Figure 13).
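The straylight correction itself is not specified in this section. One widely used approach for array spectroradiometers is the matrix method of Zong et al. (2006), sketched below under that assumption: the measured signal is modelled as (I + D)·y_true, where D holds the straylight fractions between pixels, and the true signal is recovered by solving the linear system. The 3-pixel matrix with 0.1 % leakage is a toy example, not a characterization of the TriOS or Seabird instruments.

```python
import numpy as np


def straylight_correct(measured, sdf):
    """Matrix straylight correction in the style of Zong et al. (2006).

    sdf[i, j] is the fraction of the in-band signal at pixel j that appears as
    straylight at pixel i, with in-band elements set to zero. The measured
    signal is modelled as (I + sdf) @ true, so the true signal is recovered by
    solving that linear system.
    """
    y = np.asarray(measured, dtype=float)
    d = np.asarray(sdf, dtype=float)
    return np.linalg.solve(np.eye(d.shape[0]) + d, y)


# Toy 3-pixel example: each pixel leaks 0.1 % of its signal into every other pixel.
n = 3
sdf = np.full((n, n), 1e-3)
np.fill_diagonal(sdf, 0.0)
true_signal = np.array([1.0, 0.5, 0.1])
measured = (np.eye(n) + sdf) @ true_signal
corrected = straylight_correct(measured, sdf)
print(np.round(100 * (measured - true_signal) / true_signal, 2))  # straylight in %
```

The toy case shows why straylight matters most where the signal is weak: the same leakage that adds well under 1 % at the brightest pixel adds 1.5 % at the dimmest one, which mirrors the larger relative effect in the red and NIR of Case 1 waters.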

Comparison of in situ and OLCI Radiometric Data
After both the straylight and NIR similarity corrections were applied, the agreement between the OLCI and in situ values was very good in terms of both the shape of the spectra and the absolute values (Figure 13). Although a similar spectral shape was obtained with the straylight correction only, the in situ values were then significantly higher than the OLCI-derived spectrum, especially at longer wavelengths (Figure 13).
Based on the five available match-ups, there was good agreement between OLCI and in situ values for bands (400…510) nm, with respective uncertainties of less than 7%. From 620 nm onwards, the signal was very weak and the relative uncertainties of the in situ data increased (> 30%); in addition, the signal-to-noise ratio of OLCI's five cameras is lower in these bands than in the blue bands [37], which made validating satellite radiometric products over Case 1 waters in this range challenging. The comparison between OLCI and in situ data for both PML and TO from 400 nm to 560 nm showed comparable results (Figure 14, Table 3), although at 412.5 nm the dispersion was up to 16% and the bias was 15%. This was mainly caused by the OLCI spectra at stations 22 (Figure A7) and 56 (Figure 13), which showed a steep decrease from 400 nm to 412.5 nm that was not seen in the field measurements. This may be due to poor characterization of the aerosol load and type in the S-3A OLCI atmospheric correction [38,39]. Five match-ups, however, are too few to draw robust conclusions on the accuracy of S-3A OLCI. In this study, they were used as a preliminary analysis of the potential difference between using two different above-water systems for the validation of S-3A OLCI. A comprehensive accuracy assessment requires more match-up data.
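The bias and dispersion statistics used in match-up analyses can be computed per band as below. The definitions are assumptions: MPD is taken as the signed mean of (OLCI − in situ)/in situ in percent, and the mean absolute percentage difference (MAPD) is used as a dispersion measure; the exact formulas behind Table 3 are not given in this section. The five match-up values are hypothetical.

```python
import numpy as np


def percentage_differences(satellite, in_situ):
    """Return (MPD, MAPD) in % between satellite and in situ values at one band.

    Assumed definitions:
        MPD  = 100 / N * sum((sat - insitu) / insitu)    (signed, a bias measure)
        MAPD = 100 / N * sum(|sat - insitu| / insitu)    (absolute, a dispersion measure)
    """
    sat = np.asarray(satellite, dtype=float)
    ref = np.asarray(in_situ, dtype=float)
    rel = (sat - ref) / ref
    return 100.0 * rel.mean(), 100.0 * np.abs(rel).mean()


# Hypothetical values for five match-ups at one band.
olci = [0.0110, 0.0095, 0.0102, 0.0120, 0.0098]
insitu = [0.0100, 0.0100, 0.0100, 0.0100, 0.0100]
mpd, mapd = percentage_differences(olci, insitu)
print(round(mpd, 2), round(mapd, 2))
```

With only five match-ups, a single station with a spectral feature like the steep 400 to 412.5 nm drop described above can shift both statistics by several percent, which is why the text treats these results as preliminary.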
The estimated uncertainty of the ratio of OLCI to in situ data reached up to 40% for bands up to 560 nm (Figure 15). The combined uncertainty for the ratio of (OLCI) and (in situ) was within (10…16)% for (400…510) nm, 39% for 560 nm, and above 100% for longer wavelengths, where the signal was very small (Figure 15). The contribution from the in situ uncertainty was (6…7)% for bands (400…510) nm, about 11% for 560 nm, and 30% for longer wavelengths. The difference in mean uncertainty between TO and PML was within 1% for bands up to 510 nm and increased towards longer wavelengths.
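For a ratio of two quantities with uncorrelated uncertainties, first-order propagation (as in the GUM) adds the relative uncertainties in quadrature. The sketch below assumes that this holds for the OLCI/in situ ratio; this section does not state how the values in Figure 15 were combined. Under that assumption, the satellite contribution implied by a given combined value and in situ share can also be backed out, as in the usage line.

```python
import math


def ratio_relative_uncertainty(u_rel_olci, u_rel_insitu):
    """Relative standard uncertainty of r = x_OLCI / x_insitu.

    For uncorrelated inputs, first-order (GUM) propagation gives
        (u_r / r)**2 = (u_OLCI / x_OLCI)**2 + (u_insitu / x_insitu)**2.
    """
    return math.hypot(u_rel_olci, u_rel_insitu)


def implied_component(u_rel_combined, u_rel_known):
    """Back out the unknown contribution from a combined relative uncertainty."""
    if u_rel_combined < u_rel_known:
        raise ValueError("combined uncertainty cannot be smaller than a component")
    return math.sqrt(u_rel_combined**2 - u_rel_known**2)


# With a 6 % in situ contribution, a 10 % combined value implies about 8 % from OLCI.
print(round(implied_component(0.10, 0.06), 3))
```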
The S-3 mission requirements [3] specify a 5% uncertainty for bands (490, 510, 560) nm in Case 1 waters and a (5…10)% uncertainty for bands (400, 412, 442) nm, depending on the water type. Because the per-pixel uncertainty estimates for OLCI Level-2 data do not include the uncertainty contribution from the Level-1B products [40], it is currently not possible to validate the OLCI products against in situ data using the OLCI uncertainty estimates. The uncertainties for the match-ups were ~6% for bands (400…510) nm and, based on the limited number of match-ups available, the difference between in situ and OLCI data was up to 16%.

Conclusions
The results of the AMT27 field intercomparison are a major international step in assessing differences in commonly used above-water radiometers under different environmental conditions on a moving ship. The AMT campaign was also a significant step, in the framework of both the FRM4SOC project and the international community, in enhancing the confidence in different sources of radiometric data used for satellite ocean color validation.
In general, the agreement between the two in situ systems during the field campaign was satisfactory, with differences of up to 5% over visible wavelengths before corrections were applied to . Over the wavelength range (400…510) nm, the relative mean uncertainty of the in situ data was close to the S-3 mission requirement of 5%, but beyond 500 nm the relative uncertainty increased with wavelength, mainly due to unstable targets, highly variable environmental conditions, and the low signal in the red bands in oligotrophic waters.
The consistency between the satellite and in situ data was similar for both sets of radiometers. From 412 nm to 560 nm, the difference relative to OLCI data was −8% < MPD < 12% for the Seabird system and 2% < MPD < 15% for the TriOS system. The consistency between the in situ and OLCI radiometric data was good, and the respective uncertainties were less than 7% for bands (400…510) nm.
SI-traceable calibration of radiometers before and after field campaigns is essential to ensure traceability. Calibration alone, however, is not sufficient: to trace where the variability between radiometric systems comes from, it is necessary to characterize a number of other parameters, including temperature dependence, nonlinearity and spectral straylight. Alongside corrections for instrument bias, and to improve measurement uncertainties, a detailed characterization of the environmental conditions during deployment is required. The effects of the different processing corrections applied to different radiometric sensors, including the NIR similarity, straylight and non-linearity corrections, also need further testing using a simultaneous independent reference instrument that is less affected by these systematic effects.

( ) spectra at each measurement station corrected for straylight (green circles) and after the straylight and NIR similarity correction (brown triangles). More data from each station can be found in Table 1.

Figure A7. Scatterplots for the validation of OLCI radiometric data for bands 400 to 560 nm using PML's and TO's in situ data for five match-ups.