Remote Sensing-Based Statistical Approach for Deﬁning Drained Lake Basins in a Continuous Permafrost Region, North Slope of Alaska

: Lake formation and drainage are pervasive phenomena in permafrost regions. Drained lake basins (DLBs) are often the most common landforms in lowland permafrost regions in the Arctic (50% to 75% of the landscape). However, detailed assessments of DLB distribution and abundance are limited. In this study, we present a novel and scalable remote sensing-based approach to identifying DLBs in lowland permafrost regions, using the North Slope of Alaska as a case study. We validated this ﬁrst North Slope-wide DLB data product against several previously published sub-regional scale datasets and manually classiﬁed points. The study area covered >71,000 km 2 , including a >39,000 km 2 area not previously covered in existing DLB datasets. Our approach used Landsat-8 multispectral imagery and ArcticDEM data to derive a pixel-by-pixel statistical assessment of likelihood of DLB occurrence in sub-regions with different permafrost and periglacial landscape conditions, as well as to quantify aerial coverage of DLBs on the North Slope of Alaska. The results were consistent with previously published regional DLB datasets (up to 87% agreement) and showed high agreement with manually classiﬁed random points (64.4–95.5% for DLB and 83.2–95.4% for non-DLB areas). Validation of the remote sensing-based statistical approach on the North Slope of Alaska indicated that it may be possible to extend this methodology to conduct a comprehensive assessment of DLBs in pan-Arctic lowland permafrost regions. Better resolution of the spatial distribution of DLBs in lowland permafrost regions is important for quantitative studies on landscape diversity, wildlife habitat, hydrology, geotechnical conditions, high-latitude carbon


Introduction
Permafrost-region lakes and drained lake basins (DLBs) are dominant landforms in Arctic lowland regions and play an important role in the geomorphological and ecological Expanding upon existing regional DLB datasets and working towards a pan-Arctic DLB dataset is a crucial step in furthering our understanding of Arctic landscapes on a circumpolar scale, as well as an important milestone for efforts seeking to upscale field measurements, such as carbon stocks and habitat change, and to better represent permafrost landscapes in Earth system models. To work towards pan-Arctic mapping of DLBs, a scalable mapping approach utilizing widely available data and accessible methodologies is necessary. In this study, we present a novel DLB mapping approach utilizing freely available data (Landsat imagery and the ArcticDEM), which can be readily transferred and scaled to other areas within the pan-Arctic permafrost region. This study provides the impetus and a framework for future large-scale mapping of DLB areas in lowland Arctic permafrost regions.

Study Area
In this study, we focus on the lake-rich region of the North Slope of Arctic Alaska, an area (>71,000 km 2 ) that is delineated by the administrative borders of the North Slope Borough (see Figure 1) and the Canadian border, as well as natural borders such as the Brooks Range to the south and the Arctic Ocean (Beaufort and Chukchi sea) coastline to the north and west [44]. The North Slope region is underlaid by thick, continuous permafrost (200-600 m) [45]. The surficial geology ( Figure 2) varies across the study region, with the most prevalent being aeolian silt, marine sand, and eolian sand [12]. High ground ice content occurs in regions underlaid by marine, fluvial, and eolian silt deposits with large ice wedges, ranging from 50 to 90% of the total volumetric ice content [9]. In these areas, development of thermokarst (surface subsidence caused by the melting of massive ground ice) is common [12,46]. In other areas with coarser sediment, such as sand and gravels, the permafrost tends to be less ice-rich, with less subsidence upon thawing. Lakes and DLBs in these regions are typically considered to be of non-thermokarst origin [9]. The active layer, being the ground surface layer that thaws during the summer and freezes during the winter, reaches from 20 cm to more than 100 cm thickness [45]. However, active layer thickness can vary greatly on a local level, being strongly influenced by local air and surface temperatures [47], the thermal conductivity of the soil, the thickness of the surficial organic mat, and snow thickness distribution. The North Slope region is characterized by numerous lakes, DLBs, and other thermokarst features [11]. Vegetation across the North Slope varies along a north to south gradient from the coast towards the Brooks Range [48]. The North Slope region is underlaid by thick, continuous permafrost (200-600 m) [45]. The surficial geology ( Figure 2) varies across the study region, with the most prevalent being aeolian silt, marine sand, and eolian sand [12]. High ground ice content occurs in regions underlaid by marine, fluvial, and eolian silt deposits with large ice wedges, ranging from 50 to 90% of the total volumetric ice content [9]. In these areas, development of thermokarst (surface subsidence caused by the melting of massive ground ice) is common [12,46]. In other areas with coarser sediment, such as sand and gravels, the permafrost tends to be less ice-rich, with less subsidence upon thawing. Lakes and DLBs in these regions are typically considered to be of non-thermokarst origin [9]. The active layer, being the ground surface layer that thaws during the summer and freezes during the winter, reaches from 20 cm to more than 100 cm thickness [45]. However, active layer thickness can vary greatly on a local level, being strongly influenced by local air and surface temperatures [47], the thermal conductivity of the soil, the thickness of the surficial organic mat, and snow thickness distribution. The North Slope region is characterized by numerous lakes, DLBs, and other thermokarst features [11]. Vegetation across the North Slope varies along a north to south gradient from the coast towards the Brooks Range [48]. Tundra vegetation is present across the region, with sedges, mosses and dwarf shrubs being dominant in lowland areas and tussock-forming sedges, dwarf shrubs, and mosses being dominant in uplands [48][49][50]. Vegetation varies between DLBs and the surrounding areas, as well as among DLBs of different drainage ages [42].

Data Selection and Pre-Processing 2.2.1. Landsat-8 Mosaic
In this study, we utilized Landsat-8 Tier-1 Top of Atmosphere (TOA) late summer imagery (August to mid-September) from 2014-2019 through Google Earth Engine (GEE) [51] to create a complete cloudless mosaic for the North Slope of Alaska. Landsat-8 provides multispectral data in multiple bands, including 3 bands in the visible spectrum, a near infrared band (NIR, 0.85-0.88 µm), and 2 short-wave infrared bands (SWIR, 1.57-1.65 µm; 2.11-2.29 µm) at a moderate spatial resolution (30 m). TOA imagery is necessary when working with Tassled Cap indices (as the coefficients are calibrated for TOA), which are described below. Mid-to late summer imagery was chosen to avoid early summer flooding caused by snow melt, which would lead to an overestimation of lake size and interfere with unambiguously defining DLBs. The Landsat-8 mosaic was filtered using a series of additional steps prior to analysis. First, to achieve the cloudless mosaic and to exclude pixels influenced by the presence of clouds, we took advantage of the Simple Cloud Score Algorithm, implemented in GEE. The algorithm computes a single cloud likelihood score (the likelihood of a pixel being cloud from 0 to 100), considering brightness, temperature, and the normalized difference snow index (NDSI) (https://developers.google.com/earth-engine/landsat; last accessed on 21 April 2021). The resulting cloud likelihood score is suitable for use as a simple cloud mask. Landsat-8 pixels with cloud likelihood scores below 20 were included in the mosaicking process. Other studies have limited their selection of pixels to those with cloud likelihood scores below 10 [52][53][54]; however, due to the scarcity of cloud-free Landsat acquisitions, we expanded the limit to include imagery with a cloud likelihood score of up to 20. Pixels fulfilling this criterion were used to create a median mosaic from all included imagery, taking the median value of each available pixel over the time series. Using the median value instead of the mean value served as a second step to exclude pixels that were outliers due to cloud influence or ephemeral flooding during a year with high precipitation. The resulting cloud-free Landsat-8 mosaic (Figure 1) was used for all further analyses in this study. To exclude water bodies, including lakes, rivers, and the ocean, we used the normalized difference water index (NDWI) with a threshold of 0.4 [55]. This threshold included both larger lakes as well as smaller surface water bodies, such as remnant lakes within partially drained basins. Finally, mountainous areas were masked from the Landsat-8 mosaic using a landform dataset presented in [56], which is further described in Section 2.2.3. Tundra vegetation is present across the region, with sedges, mosses and dwarf shrubs being dominant in lowland areas and tussock-forming sedges, dwarf shrubs, and mosses being dominant in uplands [48][49][50]. Vegetation varies between DLBs and the surrounding areas, as well as among DLBs of different drainage ages [42].

Landsat-8 Mosaic
In this study, we utilized Landsat-8 Tier-1 Top of Atmosphere (TOA) late summer imagery (August to mid-September) from 2014-2019 through Google Earth Engine (GEE) [51] to create a complete cloudless mosaic for the North Slope of Alaska. Landsat-8 provides multispectral data in multiple bands, including 3 bands in the visible spectrum, a near infrared band (NIR, 0.85-0.88 µm), and 2 short-wave infrared bands (SWIR, 1.57-1.65 µm; 2.11-2.29 µm) at a moderate spatial resolution (30 m). TOA imagery is necessary when working with Tassled Cap indices (as the coefficients are calibrated for TOA), which are described below. Mid-to late summer imagery was chosen to avoid early summer flooding caused by snow melt, which would lead to an overestimation of lake size and interfere with unambiguously defining DLBs. The Landsat-8 mosaic was filtered using a series of additional steps prior to analysis. First, to achieve the cloudless mosaic and to exclude pixels influenced by the presence of clouds, we took advantage of the Simple Cloud Score Algorithm, implemented in GEE. The algorithm computes a single cloud likelihood score (the likelihood of a pixel being cloud from 0 to 100), considering brightness, temperature, and the normalized difference snow index (NDSI) (https://developers.google.com/earth-engine/landsat; last accessed on 21 April 2021). The resulting cloud likelihood score is suitable for use as a simple cloud mask. Landsat-8 pixels with cloud likelihood scores below 20 were included in the mosaicking process. Other studies have limited their selection of pixels to those with cloud likelihood scores below 10 [52][53][54]; however, due to the scarcity of cloud-free Landsat acquisitions, we expanded the limit to include imagery with a cloud likelihood score of up to 20. Pixels fulfilling this criterion were used to create a median mosaic from all included imagery, taking the median value of each available pixel over the time series. Using the median value instead of the mean value served as a second step to exclude pixels that were outliers due to cloud influence or ephemeral flooding during a year with high precipitation. The resulting cloud-free Landsat-8 mosaic (Figure 1) was used for all further analyses in this study. To exclude water bodies, including lakes, rivers, and the ocean, we used the normalized difference water index (NDWI) with a threshold of 0.4 [55]. This threshold included both larger lakes as well as smaller surface water bodies, such as remnant lakes within partially drained basins. Finally, mountainous areas were masked from the Landsat-8 mosaic using a landform dataset presented in [56], which is further described in Section 2.2.3. Figure 2. General geology of the study area based on [56], with gray areas representing areas outside of the study area.
For the mapping of DLBs, the Tasseled Cap transformation (TCT) was derived from the cloud-free Landsat-8 mosaic. The TCT is a well-established multispectral transformation that provides information on the brightness, greenness, and wetness of a given pixel and has been successfully used to monitor Arctic landscapes [57,58], including the evolution of retrogressive thaw slumps [59], landcover changes [60], and lake drainage events [61]. We used the TCT coefficients published by Baig et al. [62] with the transformation coefficients as reported in Table 1.

Landform Classification
To complement the spectral information from the Landsat-8 mosaic, we used the Arctic-DEM [63] to develop a general landform classification. The ArcticDEM is a high-resolution circumpolar elevation dataset provided by the Polar Geospatial Center (PGC) [63]. Arctic-DEM data were derived from high-resolution WorldView-1, WorldView-2, WorldView-3, and GeoEye-1 panchromatic stereo imagery. The ArcticDEM Mosaic product's full resolution is 2 m, which we aggregated, using the median elevation value, to a 10 m digital elevation model (DEM) to minimize irregularities in the data, while still retaining a spatial resolution capable of resolving primary landform features. The ArcticDEM was also used because it covers the entire circumpolar region and allows for a future expansion of our workflow to other regions in the Arctic.
Based on an approach presented by Guisan et al. [64], Weiss [65] proposed a GIS-based landform classification approach based on the Topographic Position Index (TPI) or the difference from mean elevation (DIFF) as defined by Gallant and Wilson [66]. The TPI describes the relative topographic position of a point as the difference between its elevation and the mean elevation of a predetermined neighborhood with a set radius. The approach introduced by [65][66][67], and applied in a number of landscape-scale studies (e.g., [68][69][70][71]), allowed us to differentiate the study area into 10 distinct landform classes: (1) deeply incised streams, (2) mid-slope drainages and shallow valleys, (3) upland drainages and headwaters, (4) U-shaped valleys, (5) lowland plains, (6) open slopes, (7) upland plains, (8) local ridges/hills in valleys, (9) mid-slope ridges and small hills in plains, and (10) mountain tops and high ridges. In our study, we used this landform classification method through the ArcGIS extension published by [67] to create a TPI-based landform classification using the ArcticDEM for our study area (see Table 2). The 10 landform classes resulting from this approach were adapted to better describe periglacial landforms, as this terminology differed from the terminology more commonly used in these settings. Landforms included in the analysis are shown in Figure 3.

Ancillary Geospatial Datasets
Maps detailing the surficial geology of the North Slope of Alaska published in Jorgenson et al. [56] were used to exclude mountainous areas, large rivers, and active floodplains from the data set prior to any statistical analysis. Previously published local and regional DLB datasets were used for comparison with our remote sensing-based statistical approach. These include data published in Frohn et al. [6] based on Landsat ETM+ data for the western Arctic Coastal Plain of Alaska, in Jones and Arp [11] for an area north of Teshekpuk Lake based on Landsat ETM+ data and an IfSAR digital surface model, and in Jones et al. [72] for an area within the Anaktuvuk River fire [73] area based on TPI analysis of down-sampled LiDAR data.

Local Moran's I Statistic
We used the local Moran's I statistic to develop a per-pixel statistical likelihood DLB product based on the TCT from the Landsat-8 mosaic ( Figure 4). The Local Moran's I statistic of spatial association is based on the global Moran's I and it was first proposed by Anselin [74] as a local indicator of spatial association (LISA) statistic. It is commonly used to detect patterns of spatial clusters. While the global Moran's I provides a measure for the spatial autocorrelation of a given dataset, the local Moran's I relates each observation (in the case of satellite imagery, this means each pixel) to its neighbors and it gives an indication of spatial autocorrelation on a local scale. To achieve this, the local Moran's I estimates and captures the strength of the similarity in x between observations i and j within the neighborhood of i. This statistic is calculated as follows: where z i = x i − x and w ij is the matrix of weights. Equation (1) standardized the value x for observation i. It determined if this value was high or low relative to the regional mean value and subsequently standardized the values of x for j to determine if the neighborhood had high or low values relative to the mean.

Ancillary Geospatial Datasets
Maps detailing the surficial geology of the North Slope of Alaska published Jorgenson et al. [56] were used to exclude mountainous areas, large rivers, and act floodplains from the data set prior to any statistical analysis. Previously published lo and regional DLB datasets were used for comparison with our remote sensing-based s tistical approach. These include data published in Frohn et al. [6] based on Landsat ET , and the northern Teshekpuk Lake area (C)) discussed in the text. Overlaid is a DLB layer, as published by Frohn et al. [6].
In this study, we calculated the local Moran's I using the MoranLocal function included in the R raster package [75]. The MoranLocal function was applied to the greenness, brightness, and wetness bands of the TCT based on the cloud-free Landsat-8 mosaic. The statistic was calculated using a moving window with a size of 3 × 3 pixels applied to the different bands of the TCT. Different sizes for the moving window were tested with 3 × 3 pixels showing the best results following visual interpretation. High values for the local Moran's I statistic generally indicate that a given pixel is part of a local cluster,  Figure 4. Workflow of the DLB classification approach. Th workflow was separated into tasks accomplished using Google Earth Engine, R, and ArcGIS. Landsat imagery was processed by Google Earth Engine and transformed via the Tasseled Cap transformation into greenness, brightness, and wetness information (orange box). This information was then further processed using R and by applying local Moran's I statistics, and further combined into the pDLBraw and pDLB values.

Remote Sensing-Based Statistical Classification Approach
Our novel remote sensing-based statistical classification approach for mapping DLBs on the North Slope of Alaska combined the results of the local Moran's I statistic from the TCT data and a landform parameter based on the TPI analysis of the ArcticDEM. We assumed that DLBs had comparably large local Moran's I values within the basin, caused by DLBs differing from their surrounding areas in greenness, brightness, and wetness and therefore forming clusters with similarly high or low values. was is assumed that areas of negative local Moran's I values could be excluded from the subsequent analysis, as they represented areas of spatial outliers and were not part of clusters that could signify DLBs. A linear combination of the results from the local Moran's I statistics of greenness, brightness, and wetness was chosen to determine if the pixels were likely to be located in DLBs. To determine the optimal coefficients for the linear combination of the local Moran's I results, different coefficients were tested and their results compared with a manually classified point dataset (see Section 2.5). The results of this iterative approach are reported in Section 3.2, Table 3 and Figure 7.
The coefficients in Equation (2) were found to be the combination that resulted in the most accurate DLB classification and were used for the DLB dataset presented in this study.

pDLB
= LMI(brightness) + 2 × LMI(greenness) − LMI(wetness) Equation (2), including the parameters local Moran's I (LMI) value for brightness, greenness, and wetness, results in the pDLBraw value (the probability of a pixel being a DLB), depending solely on the statistics based on the TC values of the Landsat-8 mosaic. . Workflow of the DLB classification approach. Th workflow was separated into tasks accomplished using Google Earth Engine, R, and ArcGIS. Landsat imagery was processed by Google Earth Engine and transformed via the Tasseled Cap transformation into greenness, brightness, and wetness information (orange box). This information was then further processed using R and by applying local Moran's I statistics, and further combined into the pDLBraw and pDLB values.

Remote Sensing-Based Statistical Classification Approach
Our novel remote sensing-based statistical classification approach for mapping DLBs on the North Slope of Alaska combined the results of the local Moran's I statistic from the TCT data and a landform parameter based on the TPI analysis of the ArcticDEM. We assumed that DLBs had comparably large local Moran's I values within the basin, caused by DLBs differing from their surrounding areas in greenness, brightness, and wetness and therefore forming clusters with similarly high or low values. was is assumed that areas of negative local Moran's I values could be excluded from the subsequent analysis, as they represented areas of spatial outliers and were not part of clusters that could signify DLBs. A linear combination of the results from the local Moran's I statistics of greenness, brightness, and wetness was chosen to determine if the pixels were likely to be located in DLBs. To determine the optimal coefficients for the linear combination of the local Moran's I results, different coefficients were tested and their results compared with a manually classified point dataset (see Section 2.5). The results of this iterative approach are reported in Section 3.2, Table 3 and Figure 7. Table 3. Results of the t-test (mean value, p-value, t-value, degrees of freedom (DF)) comparing local Moran's I (LMI) values for Tasseled Cap (TC) brightness, greenness, and wetness for areas within and outside of the DLBs as classified by Frohn et al. [6]. The coefficients in Equation (2) were found to be the combination that resulted in the most accurate DLB classification and were used for the DLB dataset presented in this study.

LMI TCT
Equation (2), including the parameters local Moran's I (LMI) value for brightness, greenness, and wetness, results in the pDLB raw value (the probability of a pixel being a DLB), depending solely on the statistics based on the TC values of the Landsat-8 mosaic. Equation (3) combines pDLB raw with the landform parameter (LP, see Equation (3)) for each pixel based on the landform classification of the ArcticDEM.
This landform parameter (LP) was created to exploit the information contained in a DEM to increase the accuracy of the DLB mapping approach. The LP adjusts the pDLB raw value depending on the landform classification. As described in Equation (3), the LP can only lower the value of pDLB raw for certain landforms, not increase the value. Landform types that are assumed to rule out the possibility of DLBs (e.g., smaller mountain tops and ridges missed by the masking approach) were given LP values of 0. For these topographic high points, the pDLB value was set to 0. For each landform type, LP values were chosen by comparing the DEM-based landform classification with satellite imagery and visually assessing the likelihood of DLBs occurring within this landform. Landforms like U-shaped valleys and plains were given the LP value of 1, meaning the pDLB raw value would not be adjusted. Other landforms were assigned values between 0 and 1, depending on how frequently DLBs occur in these landform types.

Accuracy Assessment and Calibration
To assess the validity of the remote sensing-based statistical classification approach for classifying DLBs on the North Slope of Alaska, 1500 points were randomly generated using the ArcGIS Create Random Points tool and manually classified in ArcGIS into DLB vs. non-DLB. Of the randomly generated points, 147 points were in excluded or masked areas, resulting in 1353 points available for validation within the study area. These random points were compared to with DLB classification results and formed the basis of our primary accuracy assessment. For the comparison, the pDLB as well as DLB classes (DLB/non-DLB, described in Section 2.7) were extracted using the manually classified points. The manual classification was then compared with both the pDLB values and the DLB classes generated by the approach presented in this study. To complement the random point datasets, we also used several ancillary DLB datasets with localized coverage, published in the past, as a secondary accuracy assessment. The datasets used for this purpose are described in Section 2.2.3. To strengthen the results, a Welch 2-sample t-test, as implemented in R as the t.test function, was used to show the validity of the model selection as well as to corroborate the initial assumption of the suitability of the local Moran's I and Tasseled Cap approach.

Quantifying Area of Lakes and Drained Lake Basins
To quantify the area of lakes and DLBs in areas of different surficial geology [56], lakes were extracted from the water-mask dataset used to remove water bodies from the DLB mapping analysis. The water-mask included larger lakes as well as remnant lakes within partially drained basins. To separate lakes from other water bodies included in the mask dataset, the raster data were transformed to polygons, and features such as the ocean, coastal water, and rivers were excluded manually.

Classifying Drained Lake Basins
From the 1353 random points generated for the accuracy assessment, thresholds for DLB presence/absence classification were derived for subsets of the study area. The dataset was divided into subsets representing the areas of the different surficial geology classes based on [56] present in the study area (see Figure 2). Based on the agreement of the manually classified random points for different subsets, thresholds were determined to separate the pDLB values into DLB/non-DLB classes. Due to the data preparation prior to analysis, the non-DLB class did not include rivers, lakes, and mountainous areas. For each subset representing a unique surficial geology class, the lower threshold was defined as the average of the median value of random points classified as DLB and the median value of random points classified as non-DLB. Values under this lower threshold were classified as non-DLB. Values above the median of random points classified as DLB were assigned to the DLB class. To account for areas where pDLB values might not be clearly separable (see the boxplots in Figure 8), a third class for ambiguous values was created, encompassing values above the lower threshold and under the median of pDLB values classified as DLBs.

Local Moran's I Statistic Based on Tasseled Cap Transformation
The remote sensing-based statistical classification approach using the cloud-free Landsat-8 mosaic for the North Slope of Alaska resulted in a likelihood index for the presence of DLBs. The basis of the DLB likelihood index (pDLB raw ) presented in this study was formed by the statistical analysis of the TCT derived from the cloud-free Landsat-8 mosaic. The local Moran's I for greenness, brightness, and wetness showed distinct features, suggesting that a combination of all three variables could be suitable for distinguishing between areas likely to be DLBs and those that are unlikely to be DLBs ( Figure 5). Figure 5 shows local examples of the local Moran's I for greenness, brightness, and wetness, with likely areas of DLBs being especially distinct in the greenness and brightness layers. This observation is supported by the histograms shown in Figure 6, which compare the local Moran's I values for greenness, brightness, and wetness within and outside of DLB areas, based on data from a previously published DLB dataset [63]. For all three parameters (local Moran's I values for greenness, brightness, and wetness), the areas which were not previously classified as DLBs showed a high percentage of their area to be covered by low values, with >50%, >40%, and >30% of the non-DLB areas showing values between 0 and 0.2 for the local Moran's I of brightness, greenness, and wetness, respectively. The results of two-sample t-tests supported this observation, showing significant (p-value < 0.001) differences in mean values for the local Moran's I values for greenness, brightness, and wetness representing areas within and outside of the DLB areas as classified by Frohn et al. [6] (Table 3). Remote Sens. 2021, 13, x FOR PEER REVIEW 12 of 28   [6]. The x-axis shows data binned into 0.2 wide bins of LMI values.

Defining Drained Lake Basins
The DLB classification presented in this study was based on the equations reported in Section 2.4 (Equation (2) and Equation (3)). Figure 7 shows the comparison of the pDLBraw values to manually classified points. This comparison revealed the ability of the different coefficients to possibly distinguish DLBs. Version 7 was chosen, as it showed the largest differences between pDLBraw median values for points classified as DLB/non-DLB. The coefficients also exhibited the largest absolute t-value (−21.497, see Table 4), indicating that this combination of coefficients was the most suitable for distinguishing between DLB and non-DLB areas (Figure 8).   [6]. The x-axis shows data binned into 0.2 wide bins of LMI values.

Defining Drained Lake Basins
The DLB classification presented in this study was based on the equations reported in Section 2.4 (Equations (2) and (3)). Figure 7 shows the comparison of the pDLBraw values to manually classified points. This comparison revealed the ability of the different coefficients to possibly distinguish DLBs. Version 7 was chosen, as it showed the largest differences between pDLB raw median values for points classified as DLB/non-DLB. The coefficients also exhibited the largest absolute t-value (−21.497, see Table 4), indicating that this combination of coefficients was the most suitable for distinguishing between DLB and non-DLB areas (Figure 8).
Remote Sens. 2021, 13, x FOR PEER REVIEW 13 of 28 Figure 6. Histograms of local Moran's I values for Tasseled Cap brightness, greenness, and wetness for areas within and outside of the DLBs as classified by Frohn et al. [6]. The x-axis shows data binned into 0.2 wide bins of LMI values.

Defining Drained Lake Basins
The DLB classification presented in this study was based on the equations reported in Section 2.4 (Equation (2) and Equation (3)). Figure 7 shows the comparison of the pDLBraw values to manually classified points. This comparison revealed the ability of the different coefficients to possibly distinguish DLBs. Version 7 was chosen, as it showed the largest differences between pDLBraw median values for points classified as DLB/non-DLB. The coefficients also exhibited the largest absolute t-value (−21.497, see Table 4), indicating that this combination of coefficients was the most suitable for distinguishing between DLB and non-DLB areas (Figure 8).     The pDLBraw parameter, which describes the results of the DLB classification ap- The pDLB raw parameter, which describes the results of the DLB classification approach before the inclusion of landform information, showed visible delineation of DLBs (see Figure 9). The inclusion of DEM information into the classification approach led to a clearer delineation of DLB features (see Figure 9). Thresholds for separating the pDLB values into DLB/ambiguous/non-DLB classes were determined separately for areas of different surficial geology (see Table 5). Initial tests using one threshold for the entire study area did not lead to satisfactory results. Figure 8 shows the pDLB values for manually classified random points with distinct differences among the different surficial geology types. For some types of surficial geology (for example, eolian sand), points manually classified as DLB and non-DLB, respectively, had distinctly different pDLB values. For other surficial geology types, such as colluvium (upland), values for DLB and non-DLB areas had a significant overlap (see Figure 8). Based on the pDLB values and the manually classified points, the results of this study are the DLB maps presented in Figures 10-13. The DLB maps, based on the pDLB values and the thresholds found through the data presented in Figure 8, were created separately for the areas with different types of surficial geology that are included in Figure 8.

Accuracy Assessment
A comparison of the manually classified points with the DLB classification results (Figure 8) revealed that of the points manually classified as DLBs, 64.46% were classified as DLB areas by our approach, 31% were classified as ambiguous areas, and 4.54% were classified as non-DLBs by our approach (Table 6). Of the points which were manually classified as non-DLBs, 83.2% were classified as non-DLBs by our approach, 12.2% were classified as ambiguous, and 4.6% were classified as DLBs by our approach (Table 6). These percentages varied greatly among different surficial geology types (see Figures 10-12). as DLB areas by our approach, 31% were classified as ambiguous areas, and 4.54% were classified as non-DLBs by our approach (Table 6). Of the points which were manually classified as non-DLBs, 83.2% were classified as non-DLBs by our approach, 12.2% were classified as ambiguous, and 4.6% were classified as DLBs by our approach (Table 6). These percentages varied greatly among different surficial geology types (see .  [11]. Maps of the overall study area (A) and bar plots (B) showing a comparison of the previously published dataset [11] with the DLB dataset presented in this study. Figure 10. Comparison of the DLB dataset (including the classes non-DLB, ambiguous, and DLB) with the existing dataset, published in Jones et al. [11]. Maps of the overall study area (A) and bar plots (B) showing a comparison of the previously published dataset [11] with the DLB dataset presented in this study.
Remote Sens. 2021, 13, x FOR PEER REVIEW 17 of 28 Figure 11. Comparison of DLB dataset (including the classes non-DLB, ambiguous, and DLB) with existing data from Jones et al. [75]. Maps of the overall study area (A) and a zoomed-in example (B), and bar plots (C), showing a comparison of the previously published dataset [75] with the DLB dataset presented in this study.

Areas of Lakes and DLBs for Regions with Different Surficial Geology
The relative area occupied by DLBs is not equally distributed across the subregions of the study area. Areas of different surficial geology have different DLB area extents. Table 4 shows the estimated lake area, based on the water-mask used in this analysis, and the DLB areas for the different surficial geology types present in the study area. The highest areal coverage of both DLBs and lakes was found in areas of the coastal plain and areas of eolian sand. When we considered the ratio of DLB area to lake area, areas described as colluvium basin and the coastal plain had the highest values. Ranges of DLB areas included minimum values based on the extent of the DLB class and the extent of both the DLB and the ambiguous class combined.

Areas of Lakes and DLBs for Regions with Different Surficial Geology
The relative area occupied by DLBs is not equally distributed across the subregions of the study area. Areas of different surficial geology have different DLB area extents. Table 4 shows the estimated lake area, based on the water-mask used in this analysis, and the DLB areas for the different surficial geology types present in the study area. The highest areal coverage of both DLBs and lakes was found in areas of the coastal plain and areas of eolian sand. When we considered the ratio of DLB area to lake area, areas described as colluvium basin and the coastal plain had the highest values. Ranges of DLB areas included minimum values based on the extent of the DLB class and the extent of both the DLB and the ambiguous class combined.

DLB Data Product
Our approach resulted in a comprehensive DLB classification for the North Slope of Alaska covering >71,000 km 2 , including previously mapped areas [6,11,73], as well as 55% of newly mapped terrain ( Figure 13). The classification approach captured the diversity in DLBs prevalent in the study area, including permafrost areas with varying volumetric ground ice content, and in regions with differing tundra vegetation communities. This approach differs from previously published mapping methodologies that relied on DLBs' similarity to each other, assuming DLBs are similar to each other in spectral values and shape and match a substantial amount of the training data [6,7]. The training data of these previous approaches required significant manual interaction in the classification workflows. Our approach relied on the DLBs demonstrating different characteristics compared with the surrounding areas in greenness, wetness, and brightness and in landform characteristics, as derived from the ArcticDEM. Exploiting this difference in characteristics instead of relying on classification methods based on basins' similarity to each other in pixel values or shape parameters allowed this approach to be applied to a wide range of environments and DLB types, such as shallower basins, recently drained basins, and basins drained thousands of years ago. The basis of the classification approach presented in this paper was the results of the local Moran's I for the three bands of the Tasseled Cap Transformation (brightness, greenness, and wetness). This was shown to be a suitable approach for extracting DLB areas The basis of the classification approach presented in this paper was the results of the local Moran's I for the three bands of the Tasseled Cap Transformation (brightness, greenness, and wetness). This was shown to be a suitable approach for extracting DLB areas from Landsat imagery, with significant differences (p-value < 0.001) between DLB and non-DLB areas (as classified by Frohn et al. [6]). The results of the t-test strengthened the initial observations based on visual interpretation of the TCT (Figure 9), which suggested that the DLBs can be identified by using the TCT. The local Moran's I was calculated by using a set moving window size of 3 × 3 pixels, translating to an area of 90 m × 90 m (taking into account the 30 m Landsat-8 pixel size). Larger window sizes are likely to miss smaller DLBs, as well as drained areas of partially drained basins. While smaller than 90 m × 90 m window sizes might be suitable for DLB mapping utilizing higher resolution satellite imagery, with the given resolution of Landsat-8 (30 m), this would not be a valid approach.
Our approach relies in part on a landform classification [67] separating different relief types which can be associated with DLB features (e.g., valleys, slopes) or that can be assumed not to be DLBs (e.g., ridges, summits, streams, and drainages). In order to include the landform information into the DLB classification approach, each landform was assigned an LP value, adjusting the pDLB raw value. It is important to note that this adjustment was designed to lower the pDLB raw value for areas with landforms unlikely to be DLBs and had no possibility of increasing the pDLB raw value. It therefore could not be higher than the value resulting from Equation (3), based on the Tasseled Cap local Moran's I values. Specific LP values for the different landform classes were derived by visual comparison of the landform classification and previously published DLB data (specifically from Frohn et al. [6]; see Figure 3). Before possibly applying this method to a different or a larger study area, revisiting or adjusting this parameter to suit the specific study region is prudent. However, the area of interest in this study covers a wide range of Arctic tundra landscape types, ranging from coastal areas to the foothills of the Brooks Range. It can therefore be assumed that the LP values used in this study are somewhat robust. The TPI has been previously used in studies in combination with multispectral imagery [76][77][78]; however, the combination of a Landsat-8 based TCT with TPI inferred landform classes for landform mapping on a large scale is, to the best of our knowledge, a novel approach. While this landform classification approach was not originally designed for Arctic or cold-region environments, it proved to be a valuable tool for excluding certain areas from the DLB mapping approach, as well as showing the increased likelihood of other areas being DLBs. However, when DLBs that had low elevation differences from the surrounding landscape (e.g., low relief DLBs) or that had especially steep rims, low pDLB values were resolved for DLBs in some parts of the study area, resulting in misclassification and thus exclusion from the DLB dataset. Low-relief DLBs are often missed by the landform classification used in this approach [67,69] and are instead classified as part of plains and included in the ambiguous category. We argue that the use of DEMs with higher vertical resolution and accuracy, such as provided, for example, by airborne Light Detection and Ranging (LiDAR), could overcome the challenge of misclassifying very shallow DLBs. However, for a pan-Arctic approach, this type of data is not available. In addition, the TPI classification is sensitive to the resolution of the input DEM data. The ArcticDEM is available in 2-m resolution (as well as versions with reduced spatial resolutions of 10 m, 32 m, 100 m, 500 m, and 1 km). To reduce the influence of potential noise within the DEM data, as well as to cut down the computing time needed for applying this method to large study areas, the 10-m resolution version was chosen for our approach. It was assumed that this resolution would be sufficient to adequately represent DLB features. However, if this method would be applied, for example, to study spatial differences within basins, remnant lake features, or other features with small scale variability, a higher-resolution DEM might be more suitable.
In this study, pixels classified as DLBs were limited to areas which had completely drained, meaning that remnant lakes and other areas with open water within DLB features were not included in the results and were missing from the estimation of DLB area presented here. This should be taken into consideration when comparing the numbers presented here with already published or future estimates of DLB area on a pan-Arctic scale. While our study offers a comprehensive and novel approach to the classification of DLBs, the classification method cannot directly distinguish separate parts of overlapping or joined DLB features. Multi-generation separation of individual sub-basins from such coalesced DLBs would require manual post-processing of the presented dataset. Overlapping DLB features with often more than two connected or overlapping basins are common within the study area [6,10]. As such, in other regions, multiple overlapping basin generations are also common. For example, on the northern Seward Peninsula, up to seven overlapping generations of basins have been identified [10]. In the Lena Delta, the Yedoma uplands contain a large number of nested DLBs that form large complexes of deeply incised drained lake basins [16]. In several regions of northeastern Siberia, such as the Kolyma lowlands, overlapping DLBs dominate the landscape entirely, indicating massive degradation of permafrost by lake formation and drainage in the past [13]. Hence, the limitation in our approach to identify individual DLB features may result in an underestimate of the number of individual DLB features. However, our primary focus was on estimating the total areal coverage of DLBs, and a count of separate DLB features was not attempted here.
In addition, individual DLBs are not homogeneous features [7,42] and often show spatial heterogeneity within one basin, particularly as the basin ages [42]. These spatial differences include differences in surface moisture, vegetation patterns, and the presence or absence of remnant water bodies caused by partial drainage and complex drainage histories [11,23,25,27]. The spatial heterogeneity with DLBs led to differences in pDLB values within single basins and added to the difficulty of delineating basins based on the results presented in this study. Further separation of identified basin areas could rely on spectral characteristics associated with successional dynamics post-drainage [10,43] and potentially morphometric object-based characteristics [7].
The classification of our results into the DLB/ambiguous/non-DLB classes allowed for a better quantification of DLB areas compared with the pDLB results. Comparing the results with existing datasets as well as with manually classified random points showed the high potential of the approach for scaling to larger domains. Figure 8 shows the importance of treating areas with different surficial geology separately for classification. Manually classified random points did not allow for a clear separation of DLB and non-DLB areas for all areas with different surficial geology. The ambiguous class encompassed values that cannot be assigned to the other classes with statistical confidence. This allowed for a conservative estimate of DLB area (only including the DLB class) and a more encompassing estimate, including both the DLB and ambiguous classes, indicating the potential maximum range in DLB coverage of a landscape cover. Thresholds used for the classification were derived separately for the areas with different surficial geology (Table 5) and varied between a pDLB of 0.16 and 0.39. These large differences in thresholds further highlight the importance of treating areas with different surficial conditions separately in this process. For surficial geology classes where pDLB values were not clearly separated between DLB and non-DLB, the potential for misclassification of DLB as non-DLB areas (or the opposite) became more likely. For these areas, the ambiguous class naturally becomes of greater importance, as it will limit misclassified areas and, instead, clearly indicates the underlying uncertainty involved for these specific regions. Potential users of the dataset could decide on an application-specific basis if they wish to use the conservative estimate excluding the ambiguous class, or to utilize all classes, potentially including a larger area of non-DLBs which were misclassified.
Additional sources of uncertainty include the wide temporal range of the Landsat input data due to the requirement of creating a cloud-free mosaic, resulting in the potential for missing recent lake drainage events. A further potential source of error is the data gaps in the Arctic DEM, which may be relevant in some Arctic regions more affected by poor imaging conditions.

Comparison of DLB Classification with Previously Published Datasets
A comparison of the DLB classification results with existing DLB datasets for parts of the study area was made as a secondary accuracy assessment (Figures 10-12). A comparison of the DLB classification results to the DLB dataset presented in Frohn et al. [6] is shown in Figure 9 and indicates an agreement of 25% of the area classified as DLBs and >36% classified as non-DLB sin both datasets. Approximately 21% of the area was classified as ambiguous in our approach, with most of this area being classified as non-DLB area in the previously published dataset. For 15% of the area, our classification results and the Frohn dataset disagreed, with a majority of these areas being classified as DLBs in our approach and as non-DLBs by Frohn et al. [6]. A comparison of our results with the dataset presented by Jones et al. [11], shown in Figure 11, reveals good general agreement, with >53% of the total area being classified as DLBs in both datasets and 11% of the area being classified as non-DLBs by both datasets; however 7% of the area was classified as DLBs in the previously published dataset but as non-DLBs by our approach. A comparison of our results with the DLB data published in Jones et al. [73] (Figure 12) shows that a large portion of the area was classified as non-DLB in both datasets (39%) and 21% of the area was classified as DLBs in both datasets; however, 12% of the area was classified as non-DLBs in the previously published dataset but as DLBs by our approach.

Lakes and Drained Lake Basins on the North Slope of Alaska
Lakes and DLBs are common features in the Arctic landscape of the North Slope of Alaska. Previous studies have described the distribution of lakes and DLBs for different terrain types in the region, but a wall-to-wall map of the North Slope was lacking [6,22,[79][80][81]. By quantifying the proportion of lakes and DLBs in particular areas (i.e., by differing surficial geologies) and across the entirety of the North Slope, for the first time, this study provides critical information as to the rate of past landscape changes and a lens on the variability in landscape dynamics operating over the course of millennia. For example, general geology types that contain generally ice-rich permafrost deposits have a DLB:lake ratio that exceeds 0.7. In terrain with generally moderate to low permafrost ground ice content, the DLB:lake ratio ranges from 0.3 to 0.5, while in terrain with generally ice-poor permafrost deposits, the DLB:lake ratio is essentially zero. Thus, identifying the ratio of DLBs to lakes in regions with varying permafrost ground ice conditions provides a better understanding regarding the dynamics of surficial processes and the underlying landscape characteristics. Mapping the variability in permafrost ground ice conditions is inherently challenging (e.g., [9]); therefore, the observations provided by our remote-sensing based approach in this study could be used to better inform regional and pan-Arctic permafrost region mapping efforts.

Potential Applications and Upscaling of the Results
By using widely available input data, such as Landsat imagery, the ArcticDEM, and ancillary geospatial datasets that describe surface geology conditions, our classification approach appears to be useful for mapping DLBs in lowland permafrost regions across the Arctic. This makes the approach a valuable tool for a wide range of applications which require reliable DLB maps as inputs for their analysis. Possible applications for this dataset could be the upscaling of relevant field measurements in connection with the presence or absence of DLBs, upscaling basin ages based on radiocarbon dating results to achieve an estimate of basin age distribution across larger study regions [10,42,43], and habitat mapping for lowland permafrost regions in the Arctic. The dataset also has the potential to act as input data in future modeling approaches, including hydrological models, snow modeling, and permafrost models. The wide availability of Landsat-8 imagery and the generally good coverage over Arctic areas allows for potentially applying this approach on a circumpolar or pan-Arctic scale. While the ArcticDEM is available on a pan-Arctic level, it currently has gaps, especially over parts of Siberia. Regions with poor ArcticDEM coverage might require an alternate or supplemental DEM dataset, filling the gaps or extending the study area southward (as the ArcticDEM is limited to areas above 60 • N) to apply this classification approach. A pan-Arctic DLB product would substantially help with efforts concerning quantification of soil carbon, particularly permafrost soil carbon affected by prior lake thermokarst [81][82][83][84][85]. Further, understanding DLBs' distribution and ages will help in quantifying quaternary lake formation and persistence dynamics, which is directly relevant for determining lakes' history as well as lakes' methane dynamic [86]. A wide-spanning and especially a pan-Arctic DLB product will also help scaling high-latitude wetland methane emissions during the Holocene [86] and more generally help place current lake drainage dynamics in a wider spatial and temporal perspective [11,24,26,27].

Conclusions
The approach to mapping areas of DLBs in permafrost environments presented in this study was shown to be effective over areas with different Quaternary histories and over the large environmental and climatic gradient that is present on the North Slope of Alaska. The area included in this study covers >71,000 km 2 , including a >39,000 km 2 area not previously covered in existing DLB datasets. We showed the importance of treating areas with different surficial geology separately in DLB classification schemes due to DLBs in different areas having distinct characteristics and requiring different classification thresholds. The results were shown to be consistent with prior existing regional DLB datasets (up to 87% agreement, depending on the dataset) and show high agreement with manually classified random points (64.4-95.5% for DLB and 83.2-95.4% for non-DLB areas). This methodology has the potential to map DLBs on a pan-Arctic scale, providing a comprehensive workflow for DLB mapping where suitable Landsat imagery, ArcticDEM data, and information on variability in surficial geology exist. Comprehensive mapping of DLB areas across the circumpolar permafrost landscape would allow for future utilization of these data in pan-Arctic models and greatly further our understanding of DLBs in the context of permafrost landscapes. This approach relies on freely available multispectral and DEM datasets, highlighting the importance of freely available and accessible remote sensing datasets to the research community.

Data Availability Statement:
The DLB data produced in this study has been submitted for publication and will be openly available via the Arctic data center at doi.org/10.18739/A2FQ9Q63P, reference number [87]. In addition, two of the data sets used as comparisons in this study have also been submitted for publication and will be available under doi.org/10.18739/A2KH0F07C (reference number [88]) and doi.org/10.18739/A29Z90C99 (reference number [89]).