Retrieval of Melt Pond Fraction over Arctic Sea Ice during 2000–2019 Using an Ensemble-Based Deep Neural Network

Accurate knowledge of melt pond variations is important for understanding the Arctic energy budget because of the albedo–transmittance–melt feedback. In this study, we develop and validate a new method for retrieving melt pond fraction (MPF) over Arctic sea ice using all seven spectral bands of MODIS surface reflectance. We construct a robust ensemble-based deep neural network (DNN) and use in-situ MPF observations collected from multiple sources as the target data to train the network. We examine the potential influence of using sea ice concentration (SIC) from different sources as additional target data (besides MPF) on the MPF retrieval. The results suggest that the inclusion of SIC has a minor impact on the MPF retrieval. Based on this, we create a new MPF dataset spanning 2000 to 2019, the longest such record to our knowledge. The validation shows that our new MPF data are in good agreement with the observations. We further compare the new MPF dataset with previously published MPF datasets. The evolution of the new MPF is similar to that of previous MPF data throughout the melting season, but the new MPF data agree relatively better with the observations in terms of correlation and root mean squared error (RMSE), and also have the smallest values in the first half of the melting season.


Introduction
The onset of the melting season, typically in late spring, is driven by the increase in incoming solar radiation and by weather systems originating in the mid-latitudes. As snow and ice melt, the meltwater accumulates on the surface in valleys of the sea ice topography. The area and depth of the ponds depend on the sea ice topography and internal structure. Field experiments showed that melt ponds are smaller and deeper on multiyear ice (MYI), but wider and shallower on seasonal sea ice (e.g., [1][2][3][4][5]). Melt ponds have lower albedo than snow and bare ice (e.g., [4,[6][7][8]). This increases the absorption of solar radiation and further melts the snow and ice, leading to an important positive albedo feedback [8][9][10][11]. There are several stages in the evolution of sea ice albedo with melt ponds from May to September [4,12,13].

MODIS Surface Reflectance Data
In this study, the MODIS/Terra Surface Reflectance 8-Day L3 version 6 product (MOD09A1, [52]) was used to train the network and derive the MPF dataset over Arctic sea ice. MOD09A1 has a spatial resolution of 500 m in the Sinusoidal projection and is available at an 8-day interval. In MOD09A1, each pixel contains the best possible L2G (Level 2 gridded) observation during an 8-day period. Each orbit observation is assigned a score based on whether it is flagged for cloud, cloud shadow, high or low aerosol, or contains a high view angle or low solar zenith angle. The lowest score, 0, is assigned to observations with fill values. The remaining scores are set from 1 (BAD) to 10 (GOOD) based on the aforementioned factors (see the MODIS Surface Reflectance User's Guide Collection 6 at https://lpdaac.usgs.gov/documents/306/MOD09_User_Guide_V6.pdf for details). Here we evaluated the scores of the MOD09A1 data used in the network. The results show that 98.9% of the MOD09A1 data are of the highest quality (score = 10) and the rest are of good quality (score = 9 or 8). We use MOD09A1 to derive the MPF for two reasons: on one hand, to minimize the effect of clouds on the spectral reflectance (MOD09A1 selects the observation with the highest score and the lowest view angle), and on the other hand, to make our MPF dataset comparable to the previous MPF dataset [35] in terms of time interval.
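The per-pixel "best observation" selection described above can be illustrated with a small sketch. This is our own toy implementation of the scoring logic (not the official MOD09A1 compositing code): for each pixel, keep the orbit observation with the highest quality score over the 8-day window, treating score 0 as fill.

```python
import numpy as np

def best_observation(reflectance, scores):
    """Illustrative MOD09A1-style compositing sketch.
    reflectance, scores: arrays of shape (n_obs, ny, nx); scores are
    0 (fill) to 10 (good). Returns the reflectance of the highest-scoring
    observation per pixel, with fill-only pixels set to NaN."""
    best = np.argmax(scores, axis=0)                  # index of best obs per pixel
    ny, nx = best.shape
    iy, ix = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    composite = reflectance[best, iy, ix]
    composite[scores.max(axis=0) == 0] = np.nan       # score 0 means fill value
    return composite
```

The real product additionally breaks score ties using the lowest view angle; that refinement is omitted here for brevity.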

MPF Measurements for Network Training
A number of melt pond surveys have been conducted in the Arctic. Figure 1 shows the geographical distribution of all the MPF measurements used in this study. The observations can be classified into two groups: in the first group, the resolution of the observations is below 1 × 1 km; in the second group, it is above 1 × 1 km. Only the observations in the first group were used as training data in the DNN, since their resolution is relatively comparable to that of the MODIS data. The MPF in the first group were collected from satellite-based, airborne, ship-based, and in-situ observations from the following eight sources (Table 1).

The first source is the NSIDC melt pond dataset [53]. Development of this dataset was based on experience gained using reconnaissance imagery during the Surface Heat Budget of the Arctic Ocean (SHEBA) and earlier summer ice monitoring experiments [54,55]. Visible-band imagery was analyzed using supervised maximum likelihood classification to derive either two (water and ice) or three (pond, open water, and ice) surface classes. Pond coverage was estimated within 500 m square cells of 10 km square images (image resolution is 1 m). The NSIDC MPF used in the network training was measured at 72.8-

Figure 1. The geographical distribution of the observed MPF from multiple sources. The base map was developed by Esri using HERE data, Garmin basemap layers, OpenStreetMap contributors, Esri basemap data, and data from the GIS user community.
Another source is the ship-based camera observations from JOIS [44]. The cameras were mounted with a view of the horizon and the ice pack in front of the ship. The images were classified into five types (water only; ice only; water and ice; pond and ice; water, pond, and ice). Owing to camera malfunctions and some bad ice conditions, information is missing in some years [44]. The MPF used in the network training was measured at 68.9–88.2° N from July to September 2011. The area covered per image ranges from 1453 to 2397 m². The measurement obtained from JOIS is the MPF relative to the grid (image area). The data can be obtained by contacting the first author of Tanaka et al. [44].
Figures S1–S8 also show map-based cases of the above MPF observations overlaid on the NASA Team SIC [62]. Most of the MPF observations fall in grid cells with SIC above 40%.

MPF Measurements for Independent Validation
We also used two additional satellite-based observations as independent validation data in this study; their geographical distribution is shown in Figure 1. Note that the observations from IceWatch and TransArc (visibility of ≈10 km) were also used as independent validation, since these observations are not used in the network training.

The first is the MEDEA dataset [63]. The second is derived from high-resolution (0.46 and 1.84 m) WorldView optical satellite imagery, which is able to capture the evolution of the ice and ocean surface state directly [45]. This dataset contains the results of processing the WorldView imagery of sea ice using an Open Source Sea-Ice Processing algorithm [45]. The method classifies surface coverage into three main categories: snow and bare ice; melt ponds and submerged ice; and open water. Tests show the classification accuracy of this method typically exceeds 96% [45]. Figure S9 shows the SIC conditions for the above two MPF datasets using scatter plots. The results suggest that the MPF from MEDEA was mainly measured at SIC above 60%, and MPF above 10% from WorldView was mainly measured at SIC above 40%.

Ensemble-Based Deep Neural Network
In this study, we constructed an ensemble-based DNN to retrieve the MPF. First, we processed the data to co-locate MOD09A1 and the MPF observations before the DNN training. For each 8-day composite of MOD09A1, there are 40 tiles covering the entire Arctic. We processed all the tiles into a mosaic and reprojected them to GeoTIFF on the 500 m polar stereographic grid, with each spectral band (bands 1–7) stored as a separate GeoTIFF file. The surface reflectance of MOD09A1 was obtained from the file that covers the date of the MPF observations; for example, for the NSIDC MPF on 4 July 2000, the surface reflectance was obtained from the MOD09A1 composite spanning 3 to 10 July 2000, which covers the observation date. Before training, the MPF relative to the ice-covered area was transformed to the MPF relative to the grid using the observed SIC. We located the MODIS grid cell that each observation falls in based on the recorded latitude, longitude, and date of observation. We then calculated the average of the observed MPFs in each MODIS grid cell (note: a MODIS grid cell at the same location but on a different date is regarded as a different cell). These per-cell averages were used in the DNN training. We also evaluated the variance of the observed MPFs on the 500 m grid. The observed MPFs in the central Arctic have small variance (most cells have standard deviations within 5%), while the observed MPFs at the ice edges have relatively larger variance (some cells have standard deviations above 8%). The average of these grid-based standard deviations is 7.71 ± 6.77%.
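The co-location step above can be sketched as follows. This is a minimal illustration under our own naming (the paper does not publish its code): the observed MPF relative to the ice-covered area is converted to MPF relative to the grid via the observed SIC, and all observations falling in the same 500 m cell on the same date are averaged, with the per-cell standard deviation as a spread estimate.

```python
import numpy as np
from collections import defaultdict

def colocate(obs):
    """Sketch of the co-location/averaging step.
    obs: list of dicts with keys row, col, date, mpf_ice (0-1), sic (0-1).
    Returns {(row, col, date): (mean MPF relative to grid, std)}."""
    cells = defaultdict(list)
    for o in obs:
        mpf_grid = o["mpf_ice"] * o["sic"]        # MPF relative to the grid cell
        cells[(o["row"], o["col"], o["date"])].append(mpf_grid)
    return {k: (float(np.mean(v)), float(np.std(v))) for k, v in cells.items()}
```

Note that a cell at the same location but on a different date forms a separate key, matching the treatment described above.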
Second, we built the DNN using the above processed data. The structure of the DNN is shown in Figure 2. The input in the DNN training is the processed spectral reflectance of the seven bands (bands 1–7) from MOD09A1 on the 500 m polar stereographic grid. The targets (output) are the observations from the eight sources (NSIDC, PRIC-Lei, NPI, TransArc, HOTRAX, DLUT, IceWatch, JOIS) mentioned in Section 2.2. The first target is the processed observed MPF relative to the grid; the second target is the processed SIC. In the DNN training, we only considered grid cells that meet the following conditions: (i) the surface reflectance in all seven bands is within the valid range; (ii) the observed MPF is above 0 and below 100%; (iii) the observed MPF relative to the grid is smaller than the observed SIC (with MPF considered). In the DNN training, 25, 35, and 45 neurons were used in the first, second, and third hidden layers, respectively. The DNN randomly selects 80%, 10%, and 10% of the data as the training, validation, and test datasets; the three datasets have no overlap. The training dataset was used to build the relationship between the input and the target. The validation dataset was used to tune the parameters to avoid overfitting and to check whether the network has learned a meaningful relationship from the training dataset. The test dataset is completely independent data used to evaluate the performance of the network. To evaluate the impacts of different SIC target data on the MPF retrieval, the DNN was built with four combinations (Table 2); the training input (spectral reflectance from the seven bands of MOD09A1) remained unchanged.
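The network geometry and data split described above can be sketched in a few lines of numpy. The paper specifies only the layer sizes (7 inputs, hidden layers of 25/35/45 neurons, one or two outputs) and the MSE cost; the tanh activation and initialization here are our own illustrative assumptions.

```python
import numpy as np

def init_net(n_out=2, rng=np.random.default_rng(0)):
    """Weights/biases for a 7 -> 25 -> 35 -> 45 -> n_out network
    (n_out = 1 for MPF only, 2 for MPF + SIC)."""
    sizes = [7, 25, 35, 45, n_out]
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(net, x):
    """Forward pass; tanh hidden activations are an assumed choice."""
    for i, (W, b) in enumerate(net):
        x = x @ W + b
        if i < len(net) - 1:
            x = np.tanh(x)
    return x

def split_indices(n, rng=np.random.default_rng(0)):
    """Random, non-overlapping 80/10/10 training/validation/test split."""
    idx = rng.permutation(n)
    a, b = int(0.8 * n), int(0.9 * n)
    return idx[:a], idx[a:b], idx[b:]
```

Training itself (backpropagation on the MSE cost) is omitted; the sketch fixes only the shapes and the split logic.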
Here we used the same training, validation, and test datasets for the four network combinations. The cost functions used in the four networks are all mean squared errors. The only difference among the four networks is the SIC target (output) data.
• DNN_MPF (no SIC) is the DNN trained using only the processed MPF from the observations in Section 2.2 as the target data; it does not include SIC as a target.
• DNN_MPF+ObsSIC is the DNN trained by adding the processed SIC from the observations in Section 2.2 as the second target. The observed MPF and SIC are from the same source.
• DNN_MPF+AMSRSIC is the DNN trained by adding the SIC derived from the Advanced Microwave Scanning Radiometer-Earth Observing System and Advanced Microwave Scanning Radiometer 2 (referred to as AMSR SIC [64]) as the second target. The AMSR SIC is developed by the University of Bremen using the ARTIST Sea Ice algorithm. In the DNN training, the AMSR SIC was resampled from 6.25 km to the 500 m polar stereographic grid and then extracted based on the latitude and longitude of the corresponding MPF observation. The SIC data can be found at https://seaice.uni-bremen.de/sea-ice-concentration.
• DNN_MPF+NASASIC is the DNN trained by adding the SIC derived from the Nimbus-7 SMMR and DMSP SSM/I-SSMIS Passive Microwave Data based on the NASA Team algorithm (referred to as NASA Team SIC [62]) as the second target. In the DNN training, the NASA Team SIC was resampled from 25 km to the 500 m polar stereographic grid and then extracted based on the latitude and longitude of the corresponding MPF observation. The SIC data can be found at https://nsidc.org/data/nsidc-0051.

Third, we ran the DNN 100 times with random weights and biases initially assigned to the neurons in each of the four network combinations, which generated four groups of 100 networks. Table 3 shows the mean correlation coefficients and RMSE with standard deviations (estimated from each group of 100 networks) against the observations in the training, validation, test, and total datasets for the four network combinations. The results show that the MPF retrievals from the different datasets have comparable R (0.60 to 0.67) and RMSE (0.082 to 0.089).
This suggests the network is not overfitted. The MPF derived by DNN_MPF+ObsSIC shows relatively stronger correlations and lower RMSE. Figure 3 shows the performance of the 100 networks on the total dataset using DNN_MPF+ObsSIC. The correlation coefficients between the observations and the MPF retrieved from the networks range from 0.57 to 0.67, and the RMSE ranges from 0.082 to 0.091. This suggests that the relationship identified by the network is sensitive to the initially assigned weights and biases. To obtain a robust network, we removed the 20 networks with the 10 highest and 10 lowest correlation coefficients, keeping the 80 networks whose correlation coefficients lie within the 10th–90th percentiles of the 100 networks. We then applied ensemble averaging to the selected networks (the ensemble-based DNN): the ensemble average is the mean of the MPFs derived from the selected 80 networks, and we used it as the final output. As shown in Figure 3, the retrieved MPF from the ensemble average is in good agreement with the MPF measurements. The correlation coefficient is 0.65 and the RMSE is 0.083, both better than most of the 100 individual networks. Ensemble averaging smooths the retrievals and reduces the effects of singular values; it is widely used in numerical model studies and has been demonstrated to perform better than a single simulation.
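The trimming-and-averaging step above can be sketched directly. This is our own implementation of the described procedure: sort the 100 networks by their correlation against the observations, drop the 10 lowest and 10 highest, and average the MPF maps of the remaining 80.

```python
import numpy as np

def ensemble_average(retrievals, correlations, trim=10):
    """Trimmed ensemble mean, as described in the text.
    retrievals: (n_networks, ...) array of MPF maps;
    correlations: (n_networks,) correlation of each network vs. observations."""
    order = np.argsort(correlations)
    keep = order[trim:-trim]              # networks within the 10th-90th percentiles
    return retrievals[keep].mean(axis=0)  # ensemble average used as final output
```

Trimming before averaging is what makes the ensemble robust to networks whose random initialization landed in a poor local minimum.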
For the final MPF dataset, we aimed to develop the dataset on the 12.5 km grid to match the resolution of the previous MPF datasets [35,36] and to make it comparable with most SIC data and with the independent validation data. We resampled the aforementioned MOD09A1 GeoTIFF files of the seven bands from the 500 m to the 12.5 km polar stereographic grid using the mean value of a 25 × 25 window. We then applied the ensemble-based DNN to derive the final MPF dataset from 2000 to 2019. The input is the surface reflectance of the seven bands (bands 1–7) from MOD09A1 on the 12.5 km polar stereographic grid; the output is the MPF relative to the grid on the same grid. This is also the case for SIC for DNN_MPF+ObsSIC, DNN_MPF+AMSRSIC, and DNN_MPF+NASASIC (Table 2).
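The 500 m to 12.5 km resampling described above is a block mean: each non-overlapping 25 × 25 window of the 500 m grid becomes one 12.5 km cell. A minimal sketch:

```python
import numpy as np

def block_mean(grid, k=25):
    """Resample a 2-D grid by averaging non-overlapping k x k windows
    (25 x 25 takes the 500 m grid to 12.5 km)."""
    ny, nx = grid.shape
    assert ny % k == 0 and nx % k == 0, "grid must tile evenly into k x k blocks"
    return grid.reshape(ny // k, k, nx // k, k).mean(axis=(1, 3))
```

The reshape trick groups each k × k block along two axes, so a single `mean` over those axes produces the coarse grid without explicit loops.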

Evaluation
In this study, we used all seven spectral bands from MOD09A1 to retrieve the MPF, in contrast to Rösel et al. [33,35], who used only three MOD09A1 bands. To evaluate the importance of each MODIS band, we calculated the relative contribution of each band to the MPF retrieval using the "connection weights" method proposed by Olden and Jackson [65], which computes the product of the raw input-hidden and hidden-output connection weights between each input and output neuron and sums the products across all hidden neurons. The results show the largest contribution is from MOD09A1 band 4 (21%). Although the contributions from bands 1–4 are higher than those from bands 5–7, each MOD09A1 band contributes considerably (above 10%) to the MPF retrieval (Figure 4). Additionally, we used the "increased RMSE" method proposed by Gevrey et al. [66] to further evaluate variable importance; the importance is based on the change in RMSE when input variables are sequentially set to their mean value. The results show importance similar to that calculated by "connection weights". These suggest that using all seven spectral bands benefits the MPF retrieval.
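The "connection weights" measure can be sketched as follows. This is our own reading of Olden and Jackson's method: summing the products of input-hidden and hidden-output weights over all hidden paths is equivalent to multiplying the weight matrices through the network, after which each input's relative contribution is its normalized absolute importance.

```python
import numpy as np

def connection_weights(weight_matrices):
    """Olden-Jackson style 'connection weights' variable importance.
    weight_matrices: list of layer weight arrays, e.g.
    [(7,25), (25,35), (35,45), (45,1)] for the network in this paper."""
    W = weight_matrices[0]
    for Wn in weight_matrices[1:]:
        W = W @ Wn                           # sum of products over all hidden paths
    importance = np.abs(W).sum(axis=1)       # one raw value per input neuron
    return importance / importance.sum()     # relative contribution (sums to 1)
```

A permutation-free form of the "increased RMSE" check is simply to re-evaluate the trained network with one input held at its mean and record the RMSE change; that variant needs the trained model and data, so it is not repeated here.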

To evaluate the impact of adding SIC as target data on the MPF retrieval, we calculated the spatial correlation coefficient and RMSE of the MPFs from the three networks containing SIC against the MPF from DNN_MPF (no SIC) in each year during 2000–2019. The average spatial correlation coefficient is ≈0.99 and the RMSE is ≈0.01, suggesting that the MPF from the networks containing SIC as target data is generally consistent with that from DNN_MPF (no SIC). We further examined the percentage of grid cells with MPF greater than the NASA Team SIC (considered bad retrievals) for the four networks (Table 4). The results suggest a decreasing trend in the MPF bad retrievals during 2000–2019. This is related to the reduced number of total grid cells over the study period: the bad retrievals mostly appear at the ice edges, and as the ice edges gradually retreated, the bad retrievals also decreased. Additionally, the MPF in Table 4 is compared with the NASA Team SIC, which is underestimated in summer due to ponds and may affect the estimation of bad retrievals; if we instead compare the MPF with the SIC retrieved by DNN_MPF+ObsSIC, these bad retrievals are greatly reduced. As shown in Table 4, for DNN_MPF (no SIC), 0.09% of the grid cells have bad MPF retrievals when considering grids with SIC > 30%. The percentage of grids with MPF larger than the NASA Team SIC does not change much for the other three networks that contain SIC. This suggests that adding SIC as a target has small effects on the MPF retrieval in our method, but adding observed SIC (DNN_MPF+ObsSIC) helps reduce the number of bad retrievals (0.05%). In this study, we used the retrieval from DNN_MPF+ObsSIC; the bad retrievals were removed in the analyses.
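The bad-retrieval bookkeeping above amounts to counting the fraction of valid cells whose retrieved MPF exceeds the reference SIC. A minimal sketch (our own implementation of the described check, with MPF and SIC as fractions):

```python
import numpy as np

def bad_retrieval_pct(mpf, sic, sic_min=0.3):
    """Percentage of valid grid cells where retrieved MPF exceeds the
    reference SIC (a physically inconsistent 'bad retrieval'), restricted
    to cells with SIC above sic_min, as in the SIC > 30% case in Table 4."""
    valid = ~np.isnan(mpf) & ~np.isnan(sic) & (sic > sic_min)
    if valid.sum() == 0:
        return 0.0
    return 100.0 * np.mean(mpf[valid] > sic[valid])
```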
In this study, we used the MPF on both the 500 m and 12.5 km polar stereographic grids to validate our retrieval. Figure 5 shows the validation against the observations on the 500 m polar stereographic grid. The validation data used here are the total dataset (the combination of training, validation, and test data). In general, the retrieved MPF performs best against the NSIDC observations (r = 0.70, RMSE = 0.066 in Figure 5a). This is due not only to the relatively larger number of NSIDC observations used in the DNN training, but also to the NSIDC observations having the same resolution as the MODIS data. For the validations against the other observations, the RMSEs, except for IceWatch and JOIS, are generally within 0.1, a threshold proposed as an important criterion for evaluating data accuracy by Wright and Polashenski [34]. We further evaluated the correlations and RMSE on the training, validation, and test datasets for each observation source (Table S1). The R and RMSE in the different datasets are comparable, suggesting that the network has been trained but not overfitted. In addition, the root mean square (RMS) of the observed MPF in the sub-figures is in each case larger than the RMSE (Table S2).

Figure 6 provides map-based cases of the observed MPF from multiple sources overlaid on our retrieved MPF. The retrieved MPF shows reasonable agreement with the observations. We note that large differences are usually found for observed MPF above 35%. This is partly because the observations cover small areas (within 1 × 1 km), over which MPF above 35% is relatively likely, whereas over 12.5 × 12.5 km areas such high MPF is less likely.
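The three validation statistics used throughout this section (r, RMSE, and the RMS of the observations, with RMS > RMSE indicating retrieval skill) can be written compactly:

```python
import numpy as np

def validation_metrics(retrieved, observed):
    """Pearson correlation, RMSE of retrieval vs. observation, and RMS of
    the observations themselves (compared against the RMSE as in Table S2)."""
    retrieved = np.asarray(retrieved, dtype=float)
    observed = np.asarray(observed, dtype=float)
    r = np.corrcoef(retrieved, observed)[0, 1]
    rmse = np.sqrt(np.mean((retrieved - observed) ** 2))
    rms_obs = np.sqrt(np.mean(observed ** 2))
    return r, rmse, rms_obs
```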
For detailed validation, we examined our retrieved MPF (DNN_MPF) and the MPF version 2 (UH_MPFv2) developed by Rösel et al. [35] against the observations on the 12.5 km polar stereographic grid (Figure 7). We only used the observations for which DNN_MPF and UH_MPFv2 are both within the valid ranges. The two MPFs were both retrieved from MOD09A1 on the 12.5 km polar stereographic grid, and the validation data used here are the total dataset. Some validation of UH_MPFv2 is missing due to invalid UH_MPFv2 values or the data gap after 2011. The validations suggest that DNN_MPF shows better agreement with the observations than UH_MPFv2 (Figure 7); the correlation coefficients of DNN_MPF are all higher than those of UH_MPFv2, e.g., in the validation against PRIC-Lei (r = 0.54 vs. r = 0.31 in Figure 7b). DNN_MPF performs relatively better against the observations from NSIDC (r = 0.80), DLUT, and JOIS. However, the validations against the TransArc observations are poor for both MPF datasets. For the 12.5 km retrieval, the RMSE of DNN_MPF is generally within 0.1, except for IceWatch. The RMS of each observation set is also larger than the RMSE (Table S2), as on the 500 m grid. The results suggest that the validation on the 12.5 km grid is even better than that on the 500 m grid. This can be partly explained by the large variability of MPF over small areas (within 1 × 1 km), which is smoothed over large areas (12.5 × 12.5 km).
We note that the above observations have been used as samples in the DNN training. To further evaluate our data quality, we used the observations from MEDEA and WorldView as independent validation (Figure 8c,d). The observations from IceWatch and TransArc with visibility of 10 km or above were also used as independent validation (Figure 8a,b). None of these observations were used in the DNN training.
The results suggest that the retrieved MPFs agree better with the satellite-based observations (MEDEA and WorldView) than with the manual observations (TransArc and IceWatch), which have larger uncertainties in MPF measurements. The original satellite images with meter- to decimeter-scale resolution used in the MEDEA and WorldView observations capture detailed ice surface features, which helps improve the classification accuracy of surface types [8,45]. Previous studies showed the overall classifier accuracy of the MEDEA observations was as high as 96–99% during May to August [8], and the accuracy of type determination in the WorldView observations is approximately 96% [45]. The MEDEA and WorldView observations have been used as validation data in many previous studies [34,50,67–70]. These suggest that the satellite-based observations from MEDEA and WorldView are sufficiently accurate for independent validation. DNN_MPF and UH_MPFv2 both show good performance (r = 0.80 vs. r = 0.75) when validated against the MEDEA observations (Figure 8c). The correlation coefficient of DNN_MPF against the WorldView observations is 0.61 and the RMSE is 0.08 (Figure 8d); this RMSE is much smaller than that of UH_MPFv2, which is 0.18 as calculated by Wright and Polashenski [34]. In this study, we did not use fixed spectral reflectances of each surface type (i.e., bare ice, snow-covered ice, melt pond, and open water) to build the relationship.
Instead, we used the observed MPF from multiple sources to directly train the DNN. The advantage of our method is that it avoids the large uncertainties associated with the spatially and temporally varying reflectances of the surface types, which can introduce large uncertainties into the retrieval. The "spectral unmixing" with constant reflectances used in Rösel et al. [33,35] has been identified as a problem in MPF retrieval by Wright and Polashenski [34]; our retrieval, based directly on the DNN, might help to resolve this issue.
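For contrast, here is a minimal sketch of the fixed-endmember "spectral unmixing" that this paragraph argues against. The endmember reflectances below are invented illustration values, not those of Rösel et al.: each pixel's spectrum is modeled as a linear mix of ice, pond, and water endmembers, and the fractions are solved by least squares with a sum-to-one constraint.

```python
import numpy as np

def unmix(spectrum, endmembers):
    """Linear spectral unmixing with fixed endmember reflectances.
    spectrum: (n_bands,) observed reflectance;
    endmembers: (n_bands, 3) columns = ice, pond, water reflectances.
    Returns the three surface fractions, clipped to [0, 1]."""
    A = np.vstack([endmembers, np.ones((1, endmembers.shape[1]))])
    b = np.append(spectrum, 1.0)              # append the sum-to-one constraint
    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.clip(f, 0.0, 1.0)
```

The fragility the text refers to is visible here: the result depends entirely on the assumed endmember spectra, which in reality vary in space and time, whereas the DNN learns the reflectance-to-MPF mapping from observations.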


Uncertainties
Here we examine the accuracy of the DNN_MPF in each month by calculating the probability density function (PDF) of the bias (DNN_MPF minus observed MPF) on the 500 m grid. As shown in Figure 9, the PDF follows a normal distribution and most of the biases fall within the range of −0.1 to 0.1. For each individual month (Figure 10), the DNN_MPF in June shows the best performance. The retrievals in July and August have similar distributions, although the DNN_MPF in July is shifted slightly toward the negative side. The retrieval in September shows a similar PDF but with a relatively higher probability of large biases. This is due to the fewer observations (172) available in September for the network training. The uncertainties of the retrievals listed in Table 5 were estimated as the average of the grid-based standard deviation (across the ensemble members) on each date during 2000-2019.
The results show that the magnitude of the uncertainty of our MPF retrieval varies only slightly from May to August, with an average of 1.9%. The uncertainty of the SIC is slightly larger than that of the MPF, within 5-6% in May, and then decreases from June to September. The small uncertainties suggest that our retrieval using the ensemble-based networks is robust. Figure 11 shows the spatial distribution of the grid-based standard deviations of the DNN_MPF averaged over May to September during 2000-2019. The uncertainties of the MPF are generally within 5% over much of the Arctic Ocean during the entire melting season. In May, the MPF uncertainties in the central Arctic are about 1-3% larger than in other months, especially in the first 8-day period. In June and July, the uncertainties decrease and are mostly within 3%. In August, the uncertainties increase by about 2% in most of the Arctic. The DNN_MPF shows generally larger uncertainties in the ice edge zone from June to August and in the Canadian Archipelago from May to June. We further checked the spatial distribution of the uncertainties of the DNN_MPF in 2001 (the year with the largest September sea ice extent) and 2012 (the year with the smallest September sea ice extent). Consistent with Figure 11, the uncertainties of the MPF in these two years are within 5% in most areas. The uncertainties of the DNN_MPF are about 2-3% larger during June to July in 2001 than in 2012.
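The uncertainty estimate described above, the grid-based standard deviation across the ensemble members averaged for each date, can be sketched as follows. This is a minimal illustration with synthetic arrays; the function name and array layout are our own, not taken from the paper.

```python
import numpy as np

def ensemble_uncertainty(mpf_members):
    """mpf_members: array (n_members, n_dates, ny, nx) holding the MPF
    retrieved by each network in the ensemble. Returns, for each date, the
    mean over the grid of the standard deviation across members."""
    std_per_grid = np.nanstd(mpf_members, axis=0)   # (n_dates, ny, nx)
    return np.nanmean(std_per_grid, axis=(1, 2))    # (n_dates,)

# Synthetic example: 10 ensemble members scattered around a common field
rng = np.random.default_rng(1)
base = rng.uniform(0.0, 0.4, (5, 10, 10))                     # one field per date
members = base[None] + rng.normal(0.0, 0.02, (10, 5, 10, 10))  # member spread ~0.02
unc = ensemble_uncertainty(members)
```

With a member-to-member spread of 0.02, the returned per-date uncertainties come out close to 0.02, mirroring how small grid-averaged standard deviations indicate a robust ensemble.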

Inter-Comparison
In this study, we compare our MPF dataset with the UH_MPFv2 [35] and the MPF developed by Istomina et al. [36] (referred to as UB_MPF). The UB_MPF consists of daily MPF retrieved from MERIS swath Level 1b data using the MPD (Melt Pond Detector) algorithm [37]. To compare with the DNN_MPF and UH_MPFv2, we calculated 8-day averages of the UB_MPF corresponding to the date ranges of the MOD09A1 8-day composites. All the MPFs are fractions relative to the 12.5 km polar stereographic grid. Figures 12-14 show the averaged DNN_MPF, UH_MPFv2, and UB_MPF for May to September from 2003 to 2011 (the overlapping period of the three datasets). The climatologies of the three MPFs do not exceed 40% in most areas of the Arctic, except for the Canadian Archipelago in the UB_MPF. In May, the DNN_MPF and UH_MPFv2 have similar spatial patterns over much of the Arctic Ocean, but the UH_MPFv2 is relatively larger in the sea ice edge zones of the Barents Sea, Greenland Sea, and Hudson Bay. The UB_MPF is generally larger than the other two datasets in the central Arctic. In June, the DNN_MPF and UH_MPFv2 have comparable MPF at the ice edges (around 25%), especially in Baffin Bay, the Kara Sea, and the East Siberian Sea. The DNN_MPF is slightly smaller than the UH_MPFv2. Both the DNN_MPF and UH_MPFv2 show an apparent increase at the end of June. In the Arctic Basin, the UB_MPF tends to evolve early in the western Arctic, which is dominated by multi-year ice, while the DNN_MPF and UH_MPFv2 seem to evolve synchronously in the western and eastern regions. The MPF of all three datasets increases quickly in the band of the Beaufort, Chukchi, and East Siberian Seas. The UB_MPF (above 35%) covers most areas of the Canadian Archipelago in June, which is larger than the other two MPF datasets. In July, higher fractions (above 20%) gradually extend to the central Arctic in all three datasets.
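Aggregating the daily UB_MPF onto the MOD09A1 composite periods, whose windows start on fixed days of year 1, 9, 17, and so on, can be sketched as below. This is an illustrative helper under our own naming, not code from the paper.

```python
import numpy as np

def modis_8day_average(doy, daily_values):
    """Average daily values into MOD09A1-style 8-day windows. The windows
    start on days of year 1, 9, 17, ... (the final window of the year is
    shorter). Returns (window_start_doys, window_means); windows with no
    data are NaN."""
    doy = np.asarray(doy)
    daily_values = np.asarray(daily_values, dtype=float)
    starts = np.arange(1, 366, 8)          # 46 composite windows per year
    idx = (doy - 1) // 8                   # window index for each day
    means = np.full(starts.size, np.nan)
    for k in range(starts.size):
        sel = idx == k
        if sel.any():
            means[k] = np.nanmean(daily_values[sel])
    return starts, means

# Days of year 121-128 (1-8 May in a non-leap year) fill the window starting on DOY 121
doys = np.arange(121, 129)
starts, means = modis_8day_average(doys, np.linspace(0.05, 0.12, 8))
```

Keying the windows to fixed start days (rather than rolling 8-day means) is what makes the UB_MPF averages line up with the MOD09A1 composite date ranges.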
The DNN_MPF and the UH_MPFv2 have similar spatial patterns, but the large fractions (above 25%) of the DNN_MPF cover smaller areas than those of the UH_MPFv2. The DNN_MPF and UH_MPFv2 have similar amounts of MPF in the eastern and western Arctic Basin, although the UH_MPFv2 in the central Arctic is relatively larger. Unlike the other two datasets, the UB_MPF has higher MPF in the western than in the eastern regions. The DNN_MPF in the Canadian Archipelago shows an apparently later increase than the other two MPFs and also has smaller fractions. In August, the MPF of the three datasets gradually decreases. The UH_MPFv2 is remarkably higher than the other two datasets. The UH_MPFv2 and UB_MPF have the slowest and fastest decrease rates, respectively, while the decrease rate of the DNN_MPF lies in between. During early to mid-August, the magnitude of the DNN_MPF varies only slightly in most areas of the Arctic. The same holds for the UH_MPFv2, but its fractions are relatively larger. The large fractions (above 30%) of the UH_MPFv2 cover the ice edges even in late August. Unlike the other two MPFs, which have similar fractions in the eastern and western central Arctic, the UB_MPF has larger fractions in the western regions. By the end of August, the DNN_MPF and UB_MPF are below 20% in the Arctic Basin, while the UH_MPFv2 still remains at a high fraction.
Figure 11. Uncertainties of the DNN_MPF from May to September during 2000-2019.

Figure 14. Same as Figure 12, except for the UB_MPF.

Figure 15 shows the time series of the three MPF datasets relative to the 12.5 km polar stereographic grid. Here we only consider the grids with SIC above 30% within the Arctic Circle (north of 66.5°N).
The three datasets show some amount of melt ponds in early May (7.6%, 8.3%, and 9.4% for DNN_MPF, UH_MPFv2, and UB_MPF, respectively), with our data having the smallest value. The relatively higher MPF of the three datasets in this period appears mainly at the ice edges. The DNN_MPF increases little in May, while the UH_MPFv2 and UB_MPF increase by up to 2%. In June, the UB_MPF increases faster than the other two datasets and reaches up to 20% in mid-to-late June, which is about 5% and 3% larger than the DNN_MPF and UH_MPFv2 at the same time. The DNN_MPF and UH_MPFv2 mainly evolve in late June. By the end of June, the fractions of the three datasets are 19% (DNN_MPF), 21% (UH_MPFv2), and 23% (UB_MPF). The DNN_MPF and UH_MPFv2 reach their largest fractions (≈22% and ≈25%) in late July, while the UB_MPF reaches its maximum (≈23%) in early July. This suggests the UB_MPF begins to decrease about two 8-day periods earlier than the other two datasets. From late July to the end of the melting season, the UH_MPFv2 always has higher MPF than the other two datasets. In August, the fractions of the DNN_MPF and UB_MPF are close, although that of the DNN_MPF is slightly larger. The UH_MPFv2 declines more slowly than the other two datasets and remains above 20% for a longer duration; its fraction is about 2-6% and 3-7% higher than the DNN_MPF and UB_MPF, respectively. The standard deviations of the three datasets are larger in June and August than in other months.
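The masking behind this time series, averaging only over Arctic Circle grids with SIC above 30%, can be sketched as follows. The function name and inputs are our own illustration, not from the paper.

```python
import numpy as np

def arctic_mean_mpf(mpf, sic, lat):
    """Mean MPF over grid cells north of 66.5N with SIC above 30%.
    mpf: melt pond fraction (0-1), sic: sea ice concentration (%),
    lat: latitude (degrees N); all arrays share one shape."""
    valid = (lat > 66.5) & (sic > 30.0) & np.isfinite(mpf)
    return np.nanmean(np.where(valid, mpf, np.nan))

lat = np.array([[60.0, 70.0], [75.0, 80.0]])
sic = np.array([[90.0, 20.0], [50.0, 80.0]])
mpf = np.array([[0.30, 0.20], [0.10, 0.30]])
mean_mpf = arctic_mean_mpf(mpf, sic, lat)   # only the two northern, icy cells count
```

In this tiny example, the cell south of the Arctic Circle and the low-SIC cell are both excluded, so the mean is taken over 0.10 and 0.30 only.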
We further compared the evolution of the three MPFs during the melting season with the observed MPFs (Figure 16). We used the average of the observations from the aforementioned multiple sources to represent the observed MPF for each date range, and extracted the retrieved MPFs in the grids corresponding to those observations and averaged them. The results show that the DNN_MPF and the UH_MPFv2 are closer to the observed MPFs than the UB_MPF during the early melting season. The RMSE of the DNN_MPF against the observations is ≈1.4% in May, smaller than that of the other two MPFs. The DNN_MPF also shows better agreement with the observed MPFs from early to late June, with an RMSE ranging from 1.2% to 8.4%, smaller than that of the UH_MPFv2 and UB_MPF. The three retrieved MPFs are closer to each other around mid-July than in other periods and show good agreement with the observations during 10-15 July. The RMSE of the DNN_MPF (10.6%) is smaller than that of the other two MPFs (14.5% for UH_MPFv2 and 12.7% for UB_MPF) in July. In the later melting season, the UH_MPFv2 and UB_MPF are, respectively, larger and smaller than the observed MPFs. The DNN_MPF in this period generally lies between the UH_MPFv2 and UB_MPF and is also closer to the observed MPFs.
The RMSEs of the three MPFs against the observations all increase in August, varying within 8.5-12.2% for the DNN_MPF, 7.0-16.9% for the UH_MPFv2, and 7.0-19.1% for the UB_MPF.
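The two validation statistics used throughout this comparison, Pearson correlation and RMSE between retrieved and observed MPF, can be computed as below. This is a generic sketch; the numbers in the example are synthetic, not values from the paper.

```python
import numpy as np

def validation_stats(retrieved, observed):
    """Pearson correlation coefficient and RMSE of retrieved vs. observed MPF."""
    retrieved = np.asarray(retrieved, dtype=float)
    observed = np.asarray(observed, dtype=float)
    r = np.corrcoef(retrieved, observed)[0, 1]
    rmse = np.sqrt(np.mean((retrieved - observed) ** 2))
    return r, rmse

# Synthetic example with four matched retrieval/observation pairs
r, rmse = validation_stats([0.10, 0.20, 0.30, 0.40], [0.12, 0.18, 0.33, 0.38])
```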


Discussion
DNN is a powerful tool for data analysis and is particularly suitable for modeling complex relationships between variables. However, it suffers from the limitation of providing meaningful interpretations that could enhance understanding of the underlying physical processes. This raises the question of whether the DNN retrieval is actually better than simpler approaches. We therefore conducted several experiments to evaluate the retrieval ability of the DNN. First, we used the average of the observed MPF with 10 groups of random noise added (the noise has a standard deviation similar to that of the observed MPF) to examine whether the MPF derived from this simple approach (regarded as the estimated MPF) is better than our retrieval. The results show that the estimated MPF has little correlation with the observed MPF (Table 6). We further calculated the correlation coefficients after subtracting the estimated MPF from the observed MPF and the DNN_MPF. The correlation coefficients did not decrease, which suggests the DNN has learned more than just the mean MPF, namely the variability of the MPF. Second, we used a linear regression on the observed SIC to predict the MPF, to evaluate whether the DNN merely computes a simple fraction of the SIC. The RMSE of the regressed MPF against the observations is larger than that of the DNN_MPF, suggesting the DNN has captured a more complicated relationship than a simple fraction of the SIC. Third, we used a multiple linear regression on the surface reflectance of the seven MODIS bands (the same input as the DNN) to predict the MPF. The results show that the DNN_MPF is always better than the regressed MPF on both the 500 m and 12.5 km grids, as shown in Figure 17 (note: TransArc only has eight observations for validation, and we consider the negative correlations uninformative).
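The multiple-linear-regression baseline in the third experiment can be sketched with ordinary least squares, as below. This is a minimal illustration on synthetic reflectances; the function names are ours and the fitted coefficients are not those of the paper.

```python
import numpy as np

def fit_mlr(reflectance, mpf):
    """Ordinary least squares: MPF ~ intercept + 7 MODIS band reflectances.
    reflectance: (n_samples, 7); mpf: (n_samples,).
    Returns the coefficient vector (first entry is the intercept)."""
    X = np.column_stack([np.ones(len(mpf)), reflectance])
    coef, *_ = np.linalg.lstsq(X, mpf, rcond=None)
    return coef

def predict_mlr(coef, reflectance):
    X = np.column_stack([np.ones(len(reflectance)), reflectance])
    return X @ coef

# Sanity check on synthetic data generated from a known linear relation
rng = np.random.default_rng(2)
refl = rng.uniform(0.0, 1.0, (200, 7))
true_mpf = 0.05 + refl @ np.array([0.3, -0.1, 0.0, 0.2, 0.0, -0.05, 0.1])
coef = fit_mlr(refl, true_mpf)
pred = predict_mlr(coef, refl)
```

On noise-free linear data the regression recovers the relation exactly; the point of the paper's experiment is that real MPF-reflectance data are not linear in this way, which is where the DNN gains its advantage.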
The stronger correlations suggest the DNN has learned a complicated relationship that cannot be obtained from a simple method such as regression. In addition, we provide a case of the spatially averaged regressed MPF on May 17 during 2003-2011 (Figure 18). It shows an overestimation compared with the other MPF data (DNN_MPF, UH_MPFv2, and UB_MPF), e.g., in Hudson Bay and the band of the Beaufort, Chukchi, and East Siberian Seas. The regressed MPF is above 30%, and even 40%, in some areas of the 12.5 km grid, which is implausible for the early melting season. These experiments show that the DNN identifies a more complex physical relationship between the MPF and surface reflectance than the simple approaches do.

The summer passive microwave SIC, i.e., the NASA Team SIC, is not accurate due to the effect of melt ponds.
According to NSIDC, the total concentration of the NASA Team SIC is within ±15% in the Arctic during summer due to melt ponds (https://nsidc.org/data/nsidc-0051). Kern et al. [71] also pointed out that the NASA Team SIC has a large underestimation in grids with MPF above 40%. However, we used the MPF retrieved from the network DNN_MPF+ObsSIC, which does not involve the passive microwave SIC. The SIC used in the DNN_MPF+ObsSIC training is the observed SIC from the same sources as the observed MPF; it is retrieved not from passive microwave data but from satellite visible band imagery, airborne, ship-based, and in-situ observations. In addition, we performed further experiments to compare the MPFs retrieved from two networks that use the NASA Team SIC and a revised NASA Team SIC, respectively, as the target.
Random noise ranging from −15% to +15% was added to the NASA Team SIC to produce the revised NASA Team SIC (note: we only consider SIC above 30%; if the revised SIC exceeds 100%, we set it to 100%). The results show that the correlation coefficient and RMSE between the MPFs from the two networks are 0.996 and 0.006, respectively. This suggests the two MPFs are very similar and that the inaccurate SIC has little impact on the MPF retrieval in our method.
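The noise perturbation of the NASA Team SIC can be sketched as follows. The function name is our own; the ±15% range, the 30% cut-off, and the capping at 100% follow the description above.

```python
import numpy as np

def revise_nasa_team_sic(sic, rng=None):
    """Add uniform noise in [-15, +15] percentage points to SIC (in %),
    cap at 100%, and keep only grids with SIC above 30% (others -> NaN)."""
    rng = rng or np.random.default_rng()
    sic = np.asarray(sic, dtype=float)
    revised = sic + rng.uniform(-15.0, 15.0, sic.shape)
    revised = np.minimum(revised, 100.0)    # a concentration cannot exceed 100%
    revised[sic <= 30.0] = np.nan           # only SIC above 30% is considered
    return revised

rng = np.random.default_rng(3)
rev = revise_nasa_team_sic(np.array([20.0, 50.0, 95.0, 100.0]), rng)
```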
According to the comparison of our MPF dataset with the previous UH_MPFv2 and UB_MPF datasets, all three MPF datasets in early May seem to be higher than the actual conditions (7.6% for DNN_MPF, 8.3% for UH_MPFv2, and 9.4% for UB_MPF during 2003-2011). To address this, we trained the network using only the surface reflectance under clear sky (DNN_MPF_ClearSky) and compared the results with the current MPF, which is based on both clear and cloudy conditions. As shown in Table 7, the MPF during 2003-2011 is reduced by 0.5-3.2% when using the DNN_MPF_ClearSky. This suggests that the higher MPF in May is partly due to the effect of clouds, which contaminates the surface reflectance input. We then examined the spatial difference between the two MPFs in May. The results suggest the current MPF is 0-4% larger than the DNN_MPF_ClearSky in some regions during 2000-2019.
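The clear-sky screening used for the DNN_MPF_ClearSky, keeping a 12.5 km cell only when more than 90% of its 500 m subcells are clear, can be sketched as below. We assume each 12.5 km cell corresponds to a 25 × 25 block of 500 m cells; names and array shapes are our own illustration.

```python
import numpy as np

def clear_sky_mask(clear_500m, block=25):
    """clear_500m: boolean array (ny, nx) on the 500 m grid, True where the
    sky is clear. Each 12.5 km cell covers a block x block window of 500 m
    cells; retain it only if more than 90% of those cells are clear."""
    ny, nx = clear_500m.shape
    ny_c, nx_c = ny // block, nx // block
    blocks = clear_500m[:ny_c * block, :nx_c * block].reshape(
        ny_c, block, nx_c, block)
    frac_clear = blocks.mean(axis=(1, 3))   # clear fraction per coarse cell
    return frac_clear > 0.9

# A 2 x 2 coarse grid; one coarse cell contains a 10 x 10 cloudy patch
flags = np.ones((50, 50), dtype=bool)
flags[:10, :10] = False
mask = clear_sky_mask(flags)
```

In the example the patched cell is only 84% clear and is rejected, which is exactly why the clear-sky-only retrieval leaves so many gaps.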
Although the amount of MPF in May can be reduced by using the DNN_MPF_ClearSky, it leaves large areas of missing data in the Arctic, as shown in Figure 19. For the DNN_MPF_ClearSky, we only retained the 12.5 km grid cells that contain more than 90% clear-sky 500 m cells, which greatly limits its applicability for MPF retrieval. Since we aimed to obtain MPF covering the entire Arctic, we did not use the clear-sky-only network to retrieve the MPF. Previous studies [44,72,73] suggested that, besides the surface reflectance, the emissivity can also be used to detect melt ponds. The passive microwave emissivity is influenced by melt ponds in summer [72]. Mathew et al. [73] showed that although the emissivity of multiyear ice remains similar for most months, it varies considerably in the summer months (June and July) as a result of melt pond formation. Tanaka et al. [44] also pointed out that the brightness temperatures at 6.9 and 18.7 GHz decrease with the MPF due to the lower emissivity of water than ice, while the brightness temperatures at 89 GHz increase with the MPF due to the higher emissivity of water than dry MYI. These findings suggest it is possible to use the emissivity to derive the MPF, but experiments are needed to determine the effective frequencies. Tanaka et al. [44] pointed out that the AMSR-E brightness temperatures at 6 GHz H and 89 GHz V are sensitive to the MPF. Studies also showed that other parameters, e.g., surface temperature, winds, and roughness, can affect the emissivity in MPF retrieval [74,75]. Frozen melt ponds in late summer might be classified as ice, due to the similar emissivity of the two types [44]. The MODIS emissivity products (MOD11 and MOD21) have large data gaps in the polar region due to constraints in the retrieval algorithm, including the solar zenith angle, the time difference between the day and night observations, and the brightness temperature thresholds.
As a result, we did not use the MODIS emissivity data to retrieve the MPF in this study. However, this could be explored in future experiments using other emissivity data.

Conclusions
In this study, we developed a new method for retrieving the MPF over sea ice at the Arctic-wide scale. We constructed an ensemble-based DNN that uses MODIS surface reflectance from all seven bands as the input data and in-situ observations of MPF from multiple sources as the target data. We showed that our ensemble-based DNN is robust, being insensitive to the initially assigned weights and biases. To assess the potential effect of adding SIC as a target on the MPF retrieval, we separately trained networks that include SIC from three independent sources as target data. The MPFs retrieved from these networks are similar, suggesting that the SIC target data has a minor effect on the MPF retrieval in our method. We further carried out a detailed validation of our MPF retrieval against in-situ observations, including independent observations. For the initial MPF retrieval on the 500 m polar stereographic grid, the RMSEs were below 0.1 against most of the observations; this has been proposed as an important criterion for evaluating MPF accuracy [34]. The final MPF data on the 12.5 km polar stereographic grid also showed that our retrieval is in good agreement with the observations, including the independent validation data. The correlation coefficients of the final MPF data against the observations were mostly higher than those of the previously published data [35]. Based on this, we created a new dataset of MPF from 2000 to 2019. We are currently attempting to produce the data in near-real time and make it available to the research community.
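The overall scheme, an ensemble of networks mapping the seven band reflectances to MPF, with the member mean as the retrieval and the member spread as its uncertainty, can be sketched as below. This is a deliberately tiny numpy-only illustration on synthetic data; the paper's actual architecture, training procedure, and hyperparameters are not reproduced here.

```python
import numpy as np

class TinyMLP:
    """One-hidden-layer regressor: 7 reflectances -> MPF (illustrative only)."""
    def __init__(self, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)   # different seeds -> different members
        self.W1 = rng.normal(0.0, 0.5, (7, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)
        return (self.h @ self.W2 + self.b2).ravel()

    def fit(self, X, y, lr=0.05, epochs=2000):
        """Full-batch gradient descent on MSE/2."""
        for _ in range(epochs):
            pred = self.forward(X)
            err = (pred - y)[:, None] / len(y)           # d(MSE/2)/d(pred)
            dW2 = self.h.T @ err
            db2 = err.sum()
            dpre = (err @ self.W2.T) * (1.0 - self.h ** 2)
            self.W1 -= lr * (X.T @ dpre)
            self.b1 -= lr * dpre.sum(axis=0)
            self.W2 -= lr * dW2
            self.b2 -= lr * db2
        return self

def ensemble_predict(members, X):
    """Ensemble mean as the retrieval; member spread as its uncertainty."""
    preds = np.stack([m.forward(X) for m in members])
    return preds.mean(axis=0), preds.std(axis=0)

# Synthetic demonstration: 5 members trained from different initializations
rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, (300, 7))
y = 0.1 + 0.3 * X[:, 0] - 0.2 * X[:, 3]      # hypothetical MPF-like target
members = [TinyMLP(seed=s).fit(X, y) for s in range(5)]
mean_pred, spread = ensemble_predict(members, X)
```

Averaging members trained from different random initializations is what makes the retrieval insensitive to the initially assigned weights and biases, and the per-grid spread directly yields the uncertainty maps discussed earlier.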
We compared our MPF data to the previously published MPF datasets (UH_MPFv2, UB_MPF) during 2003-2011. In general, the three MPFs show good agreement in the evolution of the MPF throughout the melting season. However, our MPF and the UH_MPFv2 increase more slowly than the UB_MPF in June. The UB_MPF reaches its maximum in early July, while our MPF and the UH_MPFv2 reach their peaks about two 8-day periods later. During the later melting season, the UH_MPFv2 is significantly higher than the other two datasets and also remains at a high fraction for a longer time; our MPF during this period lies between the UH_MPFv2 and UB_MPF. We further compared the evolution of the three retrieved MPFs with the observations. The results suggest our MPF is generally closer to the observations than the other two datasets throughout the melting season.
As mentioned previously, a few efforts have been made to generate Arctic-wide MPF from MODIS and MERIS data. However, those data are only available until 2011. Here we created a new MPF dataset spanning 2000 to 2019, the longest MPF record to our knowledge. Previous studies showed the MPF is a good predictor of the September Arctic sea ice extent [29,30]. However, the satellite-based MPF suggests a prediction time one month later than the model-based MPF for the sea ice extent. This dataset, with its longer temporal coverage, can be used to validate the prediction time and further confirm the predictive ability of the MPF for the sea ice extent in recent years. The new data can also be used to evaluate melt pond parameterizations in sea ice models. So far, the MPF has not been assimilated into sea ice models. The assimilation of melt ponds in dynamical models may help to improve the prediction of the Arctic sea ice extent [76].
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/17/2746/s1, Figure S1. Cases of the observed MPF from NSIDC overlaid on NASA Team SIC, Figure S2. Cases of the observed MPF from PRIC-Lei overlaid on NASA Team SIC, Figure S3. Cases of the observed MPF from NPI overlaid on NASA Team SIC, Figure S4. Cases of the observed MPF from TransArc overlaid on NASA Team SIC, Figure S5. Cases of the observed MPF from HOTRAX overlaid on NASA Team SIC, Figure S6. Cases of the observed MPF from DLUT overlaid on NASA Team SIC, Figure S7. Cases of the observed MPF from IceWatch overlaid on NASA Team SIC, Figure S8. Cases of the observed MPF from JOIS overlaid on NASA Team SIC, Figure S9. Observed MPF from (a) MEDEA and (b) WorldView against the Observed SIC, Table S1. Correlation coefficients and RMSE in total, training, validation, and test dataset for each observation in 500 m grid, Table S2. RMS of the observed MPF and RMSE of the DNN_MPF against the observed MPF.