Mapping the Bathymetry of Melt Ponds on Arctic Sea Ice Using Hyperspectral Imagery

: Hyperspectral remote-sensing instruments on unmanned aerial vehicles (UAVs), aircraft and satellites o ﬀ er new opportunities for sea ice observations. We present the ﬁrst study using airborne hyperspectral imagery of Arctic sea ice and evaluate two atmospheric correction approaches (ATCOR-4 (Atmospheric and Topographic Correction version 4; v7.0.0) and empirical line calibration). We apply an existing, ﬁeld data-based model to derive the depth of melt ponds, to airborne hyperspectral AisaEAGLE imagery and validate results with in situ measurements. ATCOR-4 results roughly match the shape of ﬁeld spectra but overestimate reﬂectance resulting in high root-mean-square error ( RMSE ) (between 0.08 and 0.16). Noisy reﬂectance spectra may be attributed to the low ﬂight altitude of 200 ft and Arctic atmospheric conditions. Empirical line calibration resulted in smooth, accurate spectra ( RMSE < 0.05) that enabled the assessment of melt pond bathymetry. Measured and modeled pond bathymetry are highly correlated ( r = 0.86) and accurate ( RMSE = 4.04 cm), and the model explains a large portion of the variability ( R 2 = 0.74). We conclude that an accurate assessment of melt pond bathymetry using airborne hyperspectral data is possible subject to accurate atmospheric correction. Furthermore, we see the necessity to improve existing approaches with Arctic-speciﬁc atmospheric proﬁles and aerosol models and / or by using multiple reference targets on the ground.


Introduction
From May to September, evolution and refreezing of melt ponds at the sea ice surface dramatically changes the energy and momentum exchange between atmosphere and ocean in the Arctic. Melt ponds have a substantially lower albedo than bare ice or snow and significantly affect the heat and mass balance of sea ice e.g., [1]; therefore, they are considered the main driver of the ice albedo feedback mechanism [2]. Moreover, the amount of radiative energy stored in ponds and open water governs the onset of freezing in autumn [3]. Likewise, melt ponds increase the amount of photosynthetically active radiation transmitted through the ice into the upper ocean up to a factor of about four [4][5][6][7] fostering primary production [4,8], warming of the upper ocean layer [9], and bottom ice melt [10].
The temporal evolution of melt ponds is characterized by a strong seasonality and generally follows the surface metamorphism and change of phases from dry to melting snow, pond formation, pond drainage, pond evolution, and fall freeze-up [10,11].
Melt pond coverage, i.e., areal fraction of melt ponds on sea ice, depends significantly on sea ice surface topography which is in turn strongly influenced by deformation and aging processes, snow cover and ice permeability [12]. Undeformed first-year ice is characterized by a homogenous snow cover enabling lateral spreading of shallow ponds [13,14], with pond depth mostly < 50 cm [15,16], and cover up to 80% of the ice surface [6,17]. Multi-year ice shows a preexistent hummocky topography

Materials and Methods
We applied the model by [72] to atmospherically corrected, airborne hyperspectral imagery acquired with an AisaEAGLE sensor covering the wavelength region between 400 nm and 970 nm (see Section 2.2 for details), and validated results using in situ pond depth measurements. We compare two atmospheric correction approaches and evaluate their performances using field data. In order to evaluate the performance of the pond depth model, we validate calculated pond depths using field measurements and inspect bathymetry maps for plausibility.

Field Data
For this study, we gathered the data during RV Polarstern cruise PS106 as part of an 11-days ice station north of Svalbard [77] (see also Figure 1). Melting had just begun and most of the floe was still snow-covered but some ponds had already formed in ridged terrain. We sampled four ponds on 10 June 2017. Figure 1 also illustrates that the ponds differed in color: Ponds 2 and 3 were bright blueish, Pond 4 was dark grey-blue, while Pond 1 was split into a dark and a bright half. The ice floe consisted mostly of first-year and, to a lesser degree, of second-year ice. Occasional ice thickness measurements from 14 June 2017 show that the bottom of the bright ponds was thicker (≥0.9 m) than the dark pond's bottom (≤0.4 m), indicating that ponds 2 and 3 may have been located on older ice while the ice below pond 4 may have been younger. However, no ice cores were sampled at the pond site, prohibiting a reliable statement about ice type.

Pond Measurements
We performed measurements of pond depth (n = 60) either from the edge of a pond (Ponds 1, 2 and 4) or while wading through (Pond 3) using a folding ruler. Measured depth ranged between 1.3 cm and 25.0 cm. As described in [72], the measurement of pond depth is not trivial as the bottom of a pond may be highly variable, e.g., solid, slushy or riddled with holes and cracks, and may change on centimeter scales.
At every point, we also measured remote-sensing reflectance over the water surface [78]. The measurement setup consisted of two Ocean Optics STS-VIS spectrometers (Ocean Optics Inc., Douglas Avenue Dunedin, FL, USA). One downwards pointing spectrometer equipped with a 1 • fore optic that we referenced with a Labsphere Spectralon 99% diffuse reflectance standard (Labsphere Inc., North Sutton, NH, USA) to acquire remote sensing reflectance; and a second, upwards pointing spectrometer equipped with a cosine collector to track changes in downwelling irradiance. We used the latter to correct the data acquired with the first spectrometer. Both spectrometers cover the wavelength region from~340 nm to~820 nm with a sampling interval < 1 nm and a spectral resolution of 3 nm [79]. We

Pond Measurements
We performed measurements of pond depth (n = 60) either from the edge of a pond (Ponds 1, 2 and 4) or while wading through (Pond 3) using a folding ruler. Measured depth ranged between 1.3 cm and 25.0 cm. As described in [72], the measurement of pond depth is not trivial as the bottom of a pond may be highly variable, e.g., solid, slushy or riddled with holes and cracks, and may change on centimeter scales.
At every point, we also measured remote-sensing reflectance over the water surface [78]. The measurement setup consisted of two Ocean Optics STS-VIS spectrometers (Ocean Optics Inc., Douglas Avenue Dunedin, FL, USA). One downwards pointing spectrometer equipped with a 1° fore optic that we referenced with a Labsphere Spectralon 99% diffuse reflectance standard (Labsphere Inc., North Sutton, NH, USA) to acquire remote sensing reflectance; and a second, upwards pointing spectrometer equipped with a cosine collector to track changes in downwelling irradiance. We used the latter to correct the data acquired with the first spectrometer. Both spectrometers cover the wavelength region from ~340 nm to ~820 nm with a sampling interval < 1 nm and a spectral resolution of 3 nm [79]. We

Measurement Localization
On a drifting ice floe, the accurate localization of measurements taken at different times inside a high-spatial resolution remote-sensing image is a major challenge and small errors in localization may result in large errors when comparing derived and measured water depth. We used handheld Global Positioning System (GPS) devices to track our position on the floe and intended to correct the ice drift by using a triangulation technique with three stationary GPS devices as described in [29]. Spatial accuracy of the corrected locations, however, was insufficient. Therefore, we localized measurement positions in the AisaEAGLE images manually by thoroughly interpreting measurement logs and field photographs.
In Pond 3, we installed a cord system that facilitated reconstruction of measurement locations. We measured pond depth at equal distances of 100 cm along the cords that were tied to wooden sticks, which could be identified in an airborne image. We registered the image to the georeferenced hyperspectral data manually using prominent image features as control points. Knowing the distance of each measurement location to the ends of the respective cord, we were able to locate measurements in the hyperspectral image accurately.
We used a flagging system to provide information about the spatial localization uncertainty. Localization is very certain for measurements along the cord system, quite certain for measurements close to prominent sea ice features that are apparent on field photographs and in remote-sensing imagery, and less certain otherwise. To address the uncertainty, we buffered each measurement position with a radius of 20 cm for the very certain, 30 cm for the quite certain, and 40 cm for the less certain measurements. For each spatial buffer we computed the mean and standard deviation of all pixels whose center is located within. Figure 2 presents buffered measurement locations and measured pond depths. Pond color and measured depth in Ponds 3 and 4 further show that visual pond color is primarily influenced by the optical properties of the pond bottom and not by pond depth.
We measured pond depth at equal distances of 100 cm along the cords that were tied to wooden sticks, which could be identified in an airborne image. We registered the image to the georeferenced hyperspectral data manually using prominent image features as control points. Knowing the distance of each measurement location to the ends of the respective cord, we were able to locate measurements in the hyperspectral image accurately.
We used a flagging system to provide information about the spatial localization uncertainty. Localization is very certain for measurements along the cord system, quite certain for measurements close to prominent sea ice features that are apparent on field photographs and in remote-sensing imagery, and less certain otherwise. To address the uncertainty, we buffered each measurement position with a radius of 20 cm for the very certain, 30 cm for the quite certain, and 40 cm for the less certain measurements. For each spatial buffer we computed the mean and standard deviation of all pixels whose center is located within. Figure 2 presents buffered measurement locations and measured pond depths. Pond color and measured depth in Ponds 3 and 4 further show that visual pond color is primarily influenced by the optical properties of the pond bottom and not by pond depth.

Spectral Reference Measurements
To assess the quality of the atmospheric correction, we placed a 2 × 2 m 2 reference target made of black awning fabric on the ice. The target's reflectance has been determined in the laboratory with an ASD LabSpec5000 spectrometer (Analytical Spectral Devices Inc., Boulder, CO, USA) covering the wavelength region from 350 nm to 2500 nm with a 1 nm spectral sampling rate and equipped with contact probe. At 14:35 UTC, we measured the nadir reflectance of an ice surface covered by the AisaEAGLE data using the same spectrometer and a bare fiber with a 23 • field of view (FOV) mounted to a pistol grip. For both measurements we used a 95 % diffuse Labsphere Spectralon reflectance standard (Labsphere Inc., United States) as white reference. We resampled both spectra to AisaEAGLE bands using the sensor's spectral response functions and central wavelengths. Figure 3 illustrates the location of both spectral targets and the corresponding field reflectance spectra. The polygons of the black and white target contain 362 and 73 pixels, respectively.
Remote Sens. 2020, 12, 2623 6 of 23 grip. For both measurements we used a 95 % diffuse Labsphere Spectralon reflectance standard (Labsphere Inc., United States) as white reference. We resampled both spectra to AisaEAGLE bands using the sensor's spectral response functions and central wavelengths. Figure 3 illustrates the location of both spectral targets and the corresponding field reflectance spectra. The polygons of the black and white target contain 362 and 73 pixels, respectively.  [80]) and provide aerosol optical thickness at five wavelengths (440 nm, 500 nm,  [80]) and provide aerosol optical thickness at five wavelengths (440 nm, 500 nm, 675 nm, 870 nm and 1020 nm), Angstrom Exponent between 440 nm and 870 nm as well was water vapor column height. From the sun photometer measurements, we computed the aerosol optical thickness (AOT) at 550 nm (β) as (Equation (1)): Remote Sens. 2020, 12, 2623 where AOT λ is the mean AOT at λ (500 nm) and α is the mean Angstrom exponent between 440 nm and 870 nm. Table 1 summarizes the atmospheric parameters.

Remote-Sensing Data
To our knowledge, this is the first study presenting airborne hyperspectral imagery from Arctic sea ice. We used hyperspectral data acquired with an AisaEAGLE sensor (Specim, Spectral Imaging Ltd., Oulu, Finland) owned by Alfred Wegener Institute, Helmholtz Center for Marine and Polar Research. The sensor was mounted on a platform especially designed for BK117-C1 helicopters, which are operated on RV Polarstern by HeliService International GmbH.
The sensor is a progressive scanner based on a CCD (charge-coupled device) array with 1024 spatial pixels and covers the spectral region from 400 nm to 970 nm with a FOV of 37.42 • [81]. A fourfold spectral binning resulted in 130 spectral bands with an average full width half-maximum (FHWM) of 4.6 nm. On 10 June 2017, a helicopter flight covering the pond area was performed under clear-sky conditions. Hyperspectral data was recorded between 11:39 UTC and 11:53 UTC at a flight height of 200 ft and at a ground speed of 10 kn, the corresponding sun zenith angle was 58.9 • . A twofold spatial binning resulted in a ground sampling distance of 8.5 cm.
The Aisa system was equipped with an RT3100 inertial and GPS measurement system (Oxford Technical Solution Ltd., Bicester, UK). It is a combination of a Global Navigation Satellite System (GNSS) to provide GPS position and altitude while a 3-axial inertial measurement unit (IMU) is used to provide the orientation in 3-D space with a roll/pitch accuracy of 0.05 • and a heading accuracy of 0.1 • .

Radiometric and Geometric Preprocessing
12-bit raw data [-] were converted to at-sensor spectral radiances [W/ m 2 m sr ] by applying the sensor's radiometric calibration coefficients. Since the data showed noticeable striping, we applied a vertical stripe removal from the Spectral Processing Exploitation and Analysis Resource (SPEAR) toolbox in ENVI (Exelis Visual Information Solutions Inc., United States; v5.4.1).
Usually, the AisaEAGLE is equipped with a Fiber Optic Downwelling Irradiance Sensor (FODIS) to track changes in downwelling irradiance [W/ m 2 m ] and to compute at-sensor remote-sensing reflectance [sr −1 ] (R rs ). Measurements of downwelling irradiance, however, were influenced by rotor blade interference and partial coverage of the sensor's hemisphere by the helicopter, which prevented the use of the FODIS data.
A geographic lookup table (GLT) containing the pixel's location in the WGS84 UTM zone 31N projected coordinate system was created with CaliGeo (4.9.5) software (Specim, Spectral Imaging Ltd., Finland) using the IMU and GPS measurements.

Atmospheric Correction
The accurate removal of effects due to atmospheric absorption and scattering is required to produce measures of surface reflectance. In general, complex radiative transfer models are used to simulate the incoming solar irradiance, subsequent atmospheric effects and the final at-sensor radiance; atmospheric effects can then be eliminated and surface reflectance spectra derived. Atmospheric correction is a common preprocessing step and use of an appropriate, thorough correction is of great significance for interpretation of hyperspectral imagery and any subsequent processing [82].
On 10 June 2017, high pressure areas over Barents Sea and Svalbard caused the predominant subsidence temperature inversion reaching the ground. Prevailing low stratus clouds vanished and temperatures reached +2 • C [77]. The operator considered the atmospheric conditions during the flight as clear and calm, with blue sky conditions. Sun photometer measurements confirmed a low aerosol particle concentration as common over the inner Arctic sea ice in summer [83] where absolute humidity is usually low [84]. The low flight altitude of 200 ft reduces the influence of atmospheric absorption and scattering. However, adjacency effects may be strong in the contrast-rich sea ice environment [85]. To provide remote-sensing data in units of surface R rs , we applied two different atmospheric correction techniques, i.e. an approach based on a radiative transfer model and an empirical line based method.

Atmospheric and Topographic Correction Version 4 (ATCOR-4)
We used ATCOR-4 (Atmospheric and Topographic Correction version 4; v7.0.0), which is an established standard for atmospheric correction of wide FOV optical and thermal airborne remote-sensing imagery [86]. Atmospheric correction is performed by inversion of a look-up-table (LUT) pre-compiled with the Moderate-Resolution Atmospheric Radiance and Transmittance Model-version 5 (MODTRAN-5) [87] radiative transfer model. ATCOR-4 enables the correction of adjacency effects and has the advantage that aerosol type and maps of aerosol optical thickness and water vapor may be derived from the imagery if a sensor offers appropriate spectral bands and the image contains the required surface types [88].
In the absence of a digital ice surface model, we used ATCOR-4 for flat terrain to perform atmospheric correction on the radiometrically corrected image. We resampled the LUT to the AisaEAGLE bands using the sensor's spectral response functions and turned off the band interpolation in the wavelength 725/825 nm water vapor region. We increased the thresholds to compute the cloud mask in the blue-green wavelength region and the water mask in the near infrared to the respective maximum values of 80 % and 12 % and used the land-average water vapor (WV) for water pixels. Because an image-based retrieval of aerosol type is not possible for flight altitudes below 1 km, we manually defined a maritime aerosol type. We selected an atmospheric file with a flight altitude of 0 km above sea level that was later interpolated to a flight altitude of 0.1 km. Based on flight altitude we defined an adjacency range of 0.006 km according to [88]. The small amount of dark pixels in the image prohibited the computation of a variable visibility map. The image based visibility estimation resulted in a visibility of 100 km. We used the 820 nm region with band regression to compute a WV map before performing image processing. Eventually, we used the GLT to georeference the atmospherically corrected image. The output image is in units of surface reflectance [%*100] per band [µm]. For the follow-up processing, we converted the data into surface reflectance [-] per band [nm].

Empirical Line Calibration
In general, empirical line calibration produces very accurate results if ground-truth information is available. We first used the GLT to georeference the radiometrically corrected image and then used the empirical line calibration in the Tactical Hyperspectral Operations Resource (THOR) Atmospheric Correction Workflow in ENVI that requires one bright and one dark target with known reflectance located in the image. Here, we used the laboratory measurement of the black target, the field measurement of the ice surface and the reflectance measurements of the regions of interest displayed in Figure 3. The output image is in units of surface reflectance [-] per band.

Retrieval of Melt Pond Depth
To develop a model that is widely independent of the bottom albedo and robust against bias induced through the sample characteristics, [72] computed a library of melt pond spectra for five different bottom albedos and pond depths between 0 and 100 cm using the Water Color Simulator WASI (v4.1) [89][90][91]. They log scaled the spectra to account for the exponential absorption of light in water and computed the slope at each band by applying a Savitzky-Golay filter. The final model uses the 710 nm band and accounts for changing slope and y-intercept of the linear equation resulting from changing sun zenith angles. They validated the model using in situ R rs data generating a high accuracy (root-mean-square error (RMSE) = 2.81 cm). Simulated and measured pond depths were highly correlated (r = 0.89), and the model explained a large amount of the depth variations (R 2 = 0.74) [72].
To convert the remote-sensing data from surface reflectance into R rs , we divided the data by π. Then we linearly interpolated bands in 1 nm steps and applied a rolling mean filter with a window size of 5 nm, log-scaled each spectrum and computed the slope at each band by applying a Savitzky-Golay filter using a second order polynomial fit with a window length of 27 nm. We then applied the model of [72] to the data (Equation (2)): with z being the predicted pond depth and θ sun being the sun zenith angle. Offset (a) and slope (b) are computed as (Equations (3) and (4)): a(θ sun ) = −20.6 + 0.79 [cm]. (4)

Evaluation of Atmospheric Correction
To evaluate the atmospheric correction techniques, we compared results with field spectra. We compared the field spectra of the black target and the ice surface with the average surface reflectance within the respective regions of interest (ROIs) illustrated in Figure 3. Since we used the same information for the empirical line calibration, we further compared the average reflectance spectra from four melt pond buffers with the respective in situ spectra.
To find suitable melt pond buffers for comparison, we used the radiance data and computed the mean and standard deviation for each spectral band within each buffer. For each buffer we then normalized the average standard deviation with the average mean reflectance assuming that a small normalized average standard deviation refers to a small heterogeneity within the buffer, i.e., the buffer mean reflectance is a good candidate for comparison. Eventually, we chose the four buffers with the smallest normalized average standard deviations.
We resampled the field data to AisaEAGLE bands using the sensor's spectral response functions and extracted the mean surface reflectance from the corresponding ROIs to compute the RMSE according to [92,93] as: where y i is the in situ reflectance andŷ i is the buffer mean reflectance at the i-th band and n is the number of bands. We computed the RMSE for all bands in the wavelength region where the respective sensors overlap and, due to the relevance for the depth retrieval, separately for the wavelength region between 670 nm and 750 nm.

Evaluation of Bathymetry Retrieval
To quantify the quality of the depth retrieval we computed Pearson's correlation coefficient (r) according to [94] as (Equation (6)): where x i is measured depth at the i-th sample and y i is the mean simulated depths inside the i-th buffer area;x is the average measured depth andý is the average buffer mean and n is the number of samples. We further computed the RMSE according to Formula 4 with y i being the measured depth andŷ i being the mean predicted value of the buffer of the i-th sample and n being the number of samples. As a third measure we computed the coefficient of determination (R 2 ) according to [95] (Equation (7)): where (Equation (8) Note that according to this definition R 2 may possibly be negative indicating a complete lack of fit. Figure 4 illustrates that ATCOR-4 generally overestimates reflectance but retrieved spectra roughly match the shape of the ground truth measurements. RMSEs are 0.12 and 0.1 for the bright and dark targets and corresponding wavelength region from 400 nm to 973 nm, and between 0.08 and 0.11 for the pond spectra and corresponding wavelength region ranging between 400 nm and 823 nm. Melt pond spectra are spiky in the blue-green wavelength region and the bright ice surface spectrum shows spikes and peaks over the entire spectrum. Melt pond and ice spectra are characterized by a strong increase in reflectance between 400 nm and 430 nm, while the dark target spectrum is characterized by decreasing reflectance between 400 nm and 450 nm. All spectra show a distinct increase in the wavelength region above~900 nm (note that the linear behavior results from interpolation in the 940 nm water vapor region). The large standard deviation of the dark and bright target indicates that both targets are characterized by a relatively high spectral variability, while the pond spectra inside the buffer areas are less variable.

ATCOR-4
In the wavelength region between 670 nm and 750 nm, RMSE for the dark and bright spectra are 0.07 and 0.1, respectively. The RMSE for the pond spectra is between 0.05 and 0.07. Figure 5 illustrates that a peak at~689 nm characterizes all spectra.
Measured and modeled depth values are highly correlated (r = 0.85) but modeled depths remain underestimated ( Figure 6). This finds expression in the line of best fit's negative offset (−8.25) and results in a negative R 2 (−2.24) and high RMSE (11.57 cm). The slope of 0.83 indicates an increasing underestimation towards larger depths.

Figure 4. Comparison between Atmospheric and Topographic Correction (ATCOR) and field spectra for the dark (A) and bright (D) targets and four melt pond buffers (B,C,E,F). Root-mean-square error (
) is computed for the overlapping wavelength regions.
In the wavelength region between 670 nm and 750 nm, RMSE for the dark and bright spectra are 0.07 and 0.1, respectively. The RMSE for the pond spectra is between 0.05 and 0.07. Figure 5 illustrates that a peak at ~ 689 nm characterizes all spectra. is computed for bands between 670 and 750 nm.  In the wavelength region between 670 nm and 750 nm, RMSE for the dark and bright spectra are 0.07 and 0.1, respectively. The RMSE for the pond spectra is between 0.05 and 0.07. Figure 5 illustrates that a peak at ~ 689 nm characterizes all spectra. is computed for bands between 670 and 750 nm. Measured and modeled depth values are highly correlated ( = 0.85) but modeled depths remain underestimated ( Figure 6). This finds expression in the line of best fit's negative offset (−8.25) and results in a negative ² (−2.24) and high (11.57 cm). The slope of 0.83 indicates an increasing underestimation towards larger depths.  Figure 7 illustrates that the empirical line calibration increases the accuracy of the retrieved spectra. Figure 7A and D show that the mean radiance spectra perfectly fit with the field spectra ( = 0.0). In comparison to the field spectra, retrieved pond spectra overestimate reflectances in the wavelength region between 400 nm and ~700 nm, whereby overestimation decreases towards longer wavelengths. At wavelengths beyond 700 nm, the empirical line method results in an underestimation, with values dropping below zero reflectance and an increasing standard deviation beyond 950 nm.  Figure 7 illustrates that the empirical line calibration increases the accuracy of the retrieved spectra. Figure 7A and D show that the mean radiance spectra perfectly fit with the field spectra (RMSE = 0.0). In comparison to the field spectra, retrieved pond spectra overestimate reflectances in the wavelength region between 400 nm and~700 nm, whereby overestimation decreases towards longer wavelengths. At wavelengths beyond 700 nm, the empirical line method results in an underestimation, with values dropping below zero reflectance and an increasing standard deviation beyond 950 nm.   for the pond spectra is small (0.01-0.02) indicating a good fit. The slope of the curves, however, slightly differs. Generally, the slope of the retrieved curve is slightly steeper than the slope of the field measurements.  Figure 8 illustrates that the empirical line calibration results in a perfect fit of the retrieved spectra with field spectra of both targets (RMSE = 0.0). RMSE for the pond spectra is small (0.01-0.02) indicating a good fit. The slope of the curves, however, slightly differs. Generally, the slope of the retrieved curve is slightly steeper than the slope of the field measurements. is computed for the overlapping wavelength regions, respectively. Figure 8 illustrates that the empirical line calibration results in a perfect fit of the retrieved spectra with field spectra of both targets ( = 0.0). for the pond spectra is small (0.01-0.02) indicating a good fit. The slope of the curves, however, slightly differs. Generally, the slope of the retrieved curve is slightly steeper than the slope of the field measurements. Due to the good performance of the empirical line calibration, the pond depth retrieval shows higher accuracies compared to the ATCOR-4 results. We excluded pixels with depths ≤0 from the analysis to avoid an influence of adjacent ice pixels incorporated in the buffered areas on the statistics. Due to the good performance of the empirical line calibration, the pond depth retrieval shows higher accuracies compared to the ATCOR-4 results. We excluded pixels with depths ≤0 from the analysis to avoid an influence of adjacent ice pixels incorporated in the buffered areas on the statistics.   Correcting the offset ( Figure 10) improves RMSE (4.05 cm) and R 2 (0.74). Pond 1 in Figure 11 demonstrates that the model is insensitive to pond bottom color since the bright and dark areas do not correspond to the derived bathymetry.

Empirical Line Calibration
Correcting the offset ( Figure 10) improves (4.05 cm) and (0.74). Pond 1 in Figure 11 demonstrates that the model is insensitive to pond bottom color since the bright and dark areas do not correspond to the derived bathymetry.

Discussion
To the best of our knowledge, this study presents a first analysis of airborne hyperspectral image processing and analysis of Arctic sea ice. The setup consisting of spectral targets, field and atmospheric measurements is typical for airborne hyperspectral campaigns.

ATCOR-4
Most studies lacking atmospheric measurements use standard atmospheric profiles and aerosol models based on a qualitative assessment of environmental conditions. The MODTRAN-5 LUTs integrated in ATCOR-4 cover a wide range of environmental conditions but have not yet been tested in the Arctic sea ice environment.
ATCOR-4 performs a basic scene classification to compute a haze/cloud/water map with 19 classes and applies thresholds to certain bands to compute water and cloud masks [88]. Even though we raised the thresholds for clouds and water to their respective maxima, they were inappropriate for the sea ice environment, e.g., the reflectance of snow, ice and even melt ponds can reach values

Discussion
To the best of our knowledge, this study presents a first analysis of airborne hyperspectral image processing and analysis of Arctic sea ice. The setup consisting of spectral targets, field and atmospheric measurements is typical for airborne hyperspectral campaigns.

ATCOR-4
Most studies lacking atmospheric measurements use standard atmospheric profiles and aerosol models based on a qualitative assessment of environmental conditions. The MODTRAN-5 LUTs integrated in ATCOR-4 cover a wide range of environmental conditions but have not yet been tested in the Arctic sea ice environment.
ATCOR-4 performs a basic scene classification to compute a haze/cloud/water map with 19 classes and applies thresholds to certain bands to compute water and cloud masks [88]. Even though we raised the thresholds for clouds and water to their respective maxima, they were inappropriate for the sea ice environment, e.g., the reflectance of snow, ice and even melt ponds can reach values beyond 80% in the Blue/Green. Consequently, the resulting image classification is inappropriate; 52.3% were misclassified as clouds, 34.2 % were misclassified as land and only 13.5 % were classified as water while snow was overlooked, most probably due to the absence of a 1.6 µm channel. An inspection of the WV map shows that retrieved values range between 1.2 and 1.5 cm over water pixels and between 0.75 and 0.9 for the rest. Both values, however, do not correspond well with the sun photometer measurements (1.1415 (± 0.0226) cm).
The low flight altitude impeded the image-based retrieval of an aerosol type. Due to the location of the study area in the Arctic Ocean, however, we chose a maritime aerosol model. The variable visibility option was unavailable because the image did not contain a sufficient number of dark pixels. ATCOR-4 uses the dense dark vegetation approach to find the most suitable visibility for an image. Given the absence of vegetation in the Arctic sea ice, this method proved unsuitable. Nevertheless, retrieval of the visibility was possible resulting in a value of 100 km. The corresponding AOT (~0.2), however, is ten times overestimated compared to the sun photometer measurements (~0.02). Yet, even the maximum visibility of 120 km would result in an enormous overestimation of AOT, indicating that the chosen LUTs do not perform well in the Arctic. At the same time, flight altitude and corresponding atmospheric thickness is overestimated due to the limitation to 0.1 km minimum altitude. These issues may account for the spikes and peaks in the ATCOR-4 results (Figure 4).
Sensor calibration may also influence the results. The sensor was calibrated with the helicopter's acrylic window installed. Temperature differences between calibration, storage and operation may have caused slight defects or staining of the window, which may explain some of the noise visible in Figure 11. Another cause may be spectral smile effects or uncertainty in the applied solar irradiance spectrum, especially in the 380-600 nm spectral region as described by [96].
ATCOR reflectance spectra show a strong increase in the wavelength region >900 nm. This increase is also visible in the radiance data and may result from second order light contribution from the blue end of the spectrum as observed for the CASI sensor in Antarctica by [97].

Empirical Line Calibration
As expected, results of the empirical line calibration accurately match field spectra indicated by the small RMSEs (<0.05). Compared to the spectra retrieved with ATCOR-4, the spectra with the empirical line calibration are smoother, indicating that atmospheric conditions were stable during acquisition.
Although flight altitude was~200 ft on average, it varied between 157 ft and 263 ft over the course of the flight. The corresponding change in atmospheric thickness has likely influenced the empirical line calibration. Over the ROIs and pond measurements used for the quality assessment, altitude varied only between 229 ft and 243 ft. The small RMSEs indicate only marginal influence but in other parts of the flight stripe where flight altitude differed considerably, influence may be larger.
The overestimation in the blue-green wavelength region (Figure 7) may result from skyglint at the water surface or from the temporal difference between field and airborne measurements. We used spectral targets for the empirical line calibration. However, illumination conditions and viewing angle between ground truth spectroscopy and remote-sensing data acquisition differed and we did not consider the targets' bidirectional reflectance distribution function (BRDF). For future studies, particular attention is, therefore, required for measurements simultaneously to the image acquisition and/or to the use of targets with known BRDF. In addition, the utilization of additional white and grey targets as proposed in [97] may improve atmospheric correction of airborne sensors in the future. The size of the targets depends on the sensor's FOV, flight altitude, and desired spatial resolution of the imagery and should be increased accordingly.
The empirical line calibration requires field information and, therefore, is logistically more challenging than LUT-based approaches. The identification of pseudo-invariant features in the Antarctic to facilitate empirical line calibration is recommended by [97]. However, in the constantly changing central Arctic Ocean, such surfaces generally do not exist.

Pond Depth Retrieval
Results illustrate that the accurate retrieval of pond depth from airborne hyperspectral imagery depends on the quality of the atmospheric correction and the accurate retrieval of surface reflectance. Even though ATCOR-4 overestimated surface reflectance and partially resulted in spikes and peaks, the quality obtained may be sufficient for robust applications such as the computation of spectral indices. For applications that rely on precise spectral information, however, the results are inadequate.
The empirical line calibration reduced the differences in reflectance in the wavelength region around 710 nm, but the difference in slope resulted in an overestimation of pond depths. Correcting for the offset of~4.5 cm improved RMSE and R 2 and performance is similar to the values presented in [72] (r = 0.89; R 2 = 0.74; RMSE = 2.81 cm). Figure 12 exemplifies that the model produces accurate results. The pond mask accurately extracts the two small ice islands and the L-shaped draw around the bright, elevated ice is also described by the model. Figure 12 further indicates that the model is possibly sensitive to shadows. Pond depth appears to be overestimated in the shadow of the ice block on the right side of Figure 12B. Results for Pond 1 ( Figure 11A) illustrate that spectral information in the VIS is unsuitable for the derivation of pond depths. Bathymetry patterns in Figure 11 do not correspond to bottom color in Figure 1 suggesting the applicability of the model in dark and bright ponds.
Frequent cloud cover in the Arctic Ocean hampers observations with optical satellite sensors but UAVs and helicopters may operate below clouds. In its current state, the model presented in [72], however, is only valid for clear sky conditions. More research and development is necessary to enable pond depth retrieval using airborne hyperspectral data acquired under overcast conditions.
As discussed previously in [72], more in situ data covering a greater variety of pond characteristics is necessary to further test the model's applicability. The results presented in this study extend the model's testing and indicate its applicability in shallow ponds but tests in deep ponds on multi-year ice are still pending. However, field data of increased spatial resolution is desirable in order to investigate the spatial variability of pond depths. This may be achieved by means of remotely operated sonar equipped boats e.g., [66,68]. Yet, the shallowness of melt ponds on sea ice, especially at melt onset, might be an issue for most sonars.

Spatial Uncertainties
Given the high spatial resolution of 8.5 cm and considering the highly variable pond bottom bathymetry, the precise localization of field measurements is important for the validation process. At this scale, however, GPS measurements were insufficiently accurate to localize measurements within the ponds. For our study, we benefit from the large amount of field photographs and detailed protocols that permitted us to reconstruct measurement locations accurately. However, precision of the localization remains indefinable and its influence on the evaluation remains uncertain.
We addressed the vague measurement localization by buffering the points with three different radii according to the estimated spatial uncertainty and, therefore, computed the mean of all pixels whose center is located within the buffer area. However, each mean is influenced by the frequency and range of values within a buffer, which is in turn influenced by the buffer's center location and radius, and the respective heterogeneity of the pond bottom included in the buffer. The standard deviation plotted in Figure 10 illustrates that the variability in pond depth within a buffer may be large, reaching a maximum range of 19.31 cm. Choosing the buffers' mean values for the computation of the performance metrics influences the results, e.g., choosing the pixel value from each buffer that fits the in situ measurements best would result in very high performance (r = 0.99; p = 0.00; RMSE = 0.96 cm; R 2 = 0.98).
In order to reduce the influence of spatial uncertainties on the interpretation, an improved localization of field measurements is therefore desirable for future studies. One option may be the establishment of a local reference system based on other techniques such as ultra-wideband positioning [98,99].
We used the IMU/GPS data recorded during image acquisition for the geometric preprocessing of the AisaEAGLE imagery. While the progressive scanner is moving, the ice is drifting independently at the same time. We did not correct for these effects in the geometric preprocessing but comparisons with photographs taken at nadir from helicopter indicate that geometric effects due to ice drift are negligible in this case. Nevertheless, the influence of sea ice drift on progressive scanner image geometries may be worth investigating in future studies but requires accurate information about sea ice kinematics.

Conclusions
In this paper, we presented a novel study dealing with the acquisition, processing and atmospheric correction of airborne hyperspectral imagery in Arctic sea ice. We tested two atmospheric correction approaches (ATCOR-4 and empirical line calibration) on hyperspectral AisaEAGLE imagery acquired during RV Polarstern cruise PS106 in summer 2017 and validated results with field measurements. Based on this data, we tested if an existing model for pond depth assessment based on hyperspectral data described in [72] can be applied to airborne AisaEAGLE imagery.
We found that ATCOR-4 generally overestimates measured reflectances but retrieved spectra roughly match the shape of field measurements enabling robust applications such as computation of spectral indices. Retrieved spectra, however, are noisy and resulted in strongly underestimated pond depths. Given a maximum pond depth of about 100 cm, resulting accuracies were too low for pond depth assessment (R 2 = −2.24, RMSE = 11.57 cm). Therefore, we advise not using ATCOR-4 for pond depth retrieval at very low flight altitudes (<100 m). Empirical line calibration was more accurate and resulted in smooth spectra and reduced differences between retrieved and in situ reflectance, especially in the wavelength region around 710 nm, which is relevant for the pond depth retrieval. Final results were highly correlated (r = 0.86), accurate (RMSE = 4.04 cm) and the model explained a large portion of the variability in pond depth (R 2 = 0.74).
Although the ponds studied here cover a wide range of bottom type characteristics, more data from different sea ice regimes are desirable to further investigate the model's applicability. Maximum pond depth in this study was~35 cm. While the model is valid for pond depths up to 100 cm, tests in deep ponds as found on multi-year ice are still pending. We are confident that the Multidisciplinary drifting Observatory for the Study of Arctic Climate (MOSAiC) [100] will increase the amount and variability of ground truth data collected in conjunction with hyperspectral remote sensing data.
Our results are in good agreement with the findings of [72] and show the potential of melt pond bathymetry assessment by means of airborne hyperspectral remote sensing. Results, however, likewise indicate that the model is sensitive to differences in spectral slope. An application of the model for observations of the temporal evolution of melt pond depth, therefore, requires accurate radiometric and spectral calibration of the sensor as well as a precise atmospheric correction.
For future campaigns in Arctic sea ice, we recommend that the instrument should be calibrated under comparable environmental conditions. We further recommend upgrading the setup with a FODIS to track changes in downwelling irradiance and deploying of spectral reference targets for calibration and validation.
The technical development in the field of hyperspectral imaging sensors resulted in small-sized lightweight sensors that may be operated on UAVs [101]; and hyperspectral satellite missions such as EnMAP [102] may enable pond depth retrieval on new spatial and temporal scales. Operationalization of the method and scaling to satellite measurements, however, requires an atmospheric correction, where field information is not available. Therefore, we underline the need for improved atmospheric correction processors that consider environmental and atmospheric conditions in the Arctic Ocean. Furthermore, we hypothesize that the implementation of LUTs specifically designed for the Arctic sea ice environment may improve existing algorithms for atmospheric correction.
Besides an appropriate atmospheric correction, applicability of the pond depth retrieval in satellite remote sensing is limited to ponds of certain area and geometry depending on a sensor's spatial resolution. The model is only valid for clear melt pond pixels and spectral mixing with adjacent sea ice surfaces may result in erroneous bathymetry. Further research is necessary to investigate how these issues may be addressed, e.g., by identification and exclusion of mixed pixels or spectral unmixing.