1. Introduction
The objective of this paper is to contribute to three challenges in different disciplines: (1) Earth observation and data analysis, (2) climatic and cryospheric change, and (3) machine learning (ML). In order to put these challenges and our approach into a broader context, this paper includes a review section centered around the three topics.
Challenge 1. Harnessing the data revolution in Earth observation from space. Observations of our rapidly changing Earth are largely carried out from space, and the collection of such Earth observation data from satellites has rapidly advanced, with increasingly large and detailed data sets becoming available for scientific investigations [1]. The data revolution has led to both new opportunities and challenges for science, as extraction of information on complex geophysical processes from large and high-resolution data sets is becoming increasingly difficult (a problem that has been summarized as “Harnessing the data revolution” by the U.S. National Science Foundation [2]). In turn, this phenomenon has created a cyberinfrastructure problem in the form of a disconnect between the revolutionary increase in satellite image data on the one hand and the development of numerical Earth system models on the other hand, which are employed to aid in the assessment of global climatic changes and their manifestations in warming and sea-level rise (SLR) [3,4,5,6,7,8,9,10,11,12]. A bottleneck is created, and it grows with the data revolution: the new wealth of information revealed by the new satellites is difficult to incorporate into physical-process models, because the improved spatio-temporal resolution of the data sheds light on subprocesses that are not easily represented in models.
In this paper, we will introduce an approach that integrates machine learning and physical knowledge into a physically-driven neural network, whose application will facilitate derivation of physical process understanding from high-resolution satellite data. Results include parameterized information in the form of thematic maps (time series of segmented satellite imagery) that can inform modeling as well as lend themselves to direct geophysical interpretation and discovery.
Challenge 2. Glacial acceleration and sea-level-rise assessment. We address a climatic and cryospheric change problem, the phenomenon of glacial acceleration, which the Intergovernmental Panel on Climate Change (IPCC) identified in its 2013 Assessment Report 5 as one of two main sources of uncertainty in SLR assessment (the other source is atmospheric) [13]. The most recent IPCC AR 6, published in 2021, does not present a solution but rather elevates the urgency of understanding glacial acceleration by declaring it a “deep uncertainty” in SLR assessment [3]. The different types of accelerating glaciers include surge-type glaciers, tidewater glaciers, fjord glaciers (isbræ) and ice streams [14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]. Acceleration frequency may be intrinsic to the glacier type, quasi-periodic, or single-time. Initiation of an acceleration may be due to internal dynamics of the glacier or externally forced, for instance, induced by warming ocean water at the front of the glacier, or controlled by a combination of several factors [31,32,33,34]. Spatial acceleration may be due to subglacial (bed) topography [33] or caused by a dynamic event. All types of acceleration typically lead to the formation of crevasse fields. Surging is the type of acceleration that has seen the least amount of research, and the complexity of ice flow during surging defies many classic data analysis methods, thus rendering most cyberinfrastructures incapable of modeling this geophysical process.
In this paper, we focus on an exemplary analysis of glacial acceleration during the surge of an Arctic glacier system, the Negribreen Glacier System (NGS), through classification of crevasse patterns as indicators of the drastic and rapid dynamic changes that occur during a surge. The surge led to mass transfer from the glacier to the ocean on the order of 0.5–1% of global annual SLR in just a few months during the height of the surge [35,36]. The fact that a surge causes sudden mass-transfer events from the cryosphere to the ocean leads to a catastrophic type of uncertainty in SLR estimation (with the term “catastrophic” defined as continuous changes leading to sudden effects). If we are to reconcile SLR assessment, we need to understand surge processes.
Challenge 3. Integration of physically constrained classification and modern “Deep Learning” approaches in satellite image classification. The surge is captured in a time series of high-resolution satellite image data, which motivates an ML-based classification. While deep convolutional neural network (CNN) architectures are considered to provide state-of-the-art performance on standard image classification benchmarks such as the ImageNet data set [37,38,39,40,41], two problems exist: First, deeper networks only lead to increased performance up to a point, after which increased network depth results in increasingly worse performance due to the vanishing gradient problem [42]. Second, and more challenging for applications in the cryospheric sciences, is the fact that no published labeled training data sets exist for the classification of ice-surface features, such as crevasses (see [43]). The role of crevasse types in the identification of deformation types, which are directly related to glacial acceleration, will be described in Section 3. A main task is thus the creation of the labeled data sets required for training a neural net (NN). For CNNs, the problem is exacerbated by the fact that very large numbers of training examples (on the order of 100,000s) are needed.
We have previously developed a physically constrained ML approach, the connectionist-geostatistical classification method [18,44,45]. The connectionist-geostatistical method uses a two-tiered approach, in which the first step is a physically informed spatial statistical analysis, carried out in a discrete mathematics framework. The output of the geostatistical step provides the input for the NN, activating the neurons of the input layer. To carry out the actual classification, a connectionist approach is selected, which can utilize a multi-layer perceptron with backpropagation of errors (MLP-BP), or simply, an MLP. The MLP has proven to provide a robust and functional architecture for this type of classification and already provided an efficient solution 20 years ago [44]. To train the connectionist-geostatistical classification, a small data set suffices, of a size that can reasonably be derived by an expert [44], on the order of several hundred labeled video scenes or small subimages of a satellite image. However, advances in Earth observation, increasing data resolution and data set size, as well as advances in computer hardware and processing speed, warrant investigation of modern “Deep Learning” architectures to facilitate fast and efficient processing.
The salient difference in the effectiveness of the two approaches lies less in the NN architecture (MLP versus CNN) than in the fact that the connectionist-geostatistical classification is a physically informed approach (where the physical knowledge informs our approach to geostatistics), whereas the CNN relies on its much larger number of degrees of freedom for the determination of classes. CNNs can be trained in a supervised or unsupervised fashion [46].
In this paper, we will investigate the trade-offs of a physically constrained NN and a CNN and introduce a first approach to leverage the advantages of both ML methods in an integrated image classification system. We propose a solution to natural science problems that takes an approach of combining and integrating physically constrained neural networks and modern ML methods. To this end, we will demonstrate that a physically constrained NN can be utilized to aid in creating a labeled training data set of sufficient size to train a CNN. We emphasize that physical knowledge needs to be leveraged in designing a ML approach that can be expected to provide solutions for the physical sciences and advance knowledge there.
This last objective of the paper, providing an outlook towards future directions of ML in the physical sciences, is grounded in the context of a review section on ML in general and in the physical sciences, specifically and more recently the geosciences (Section 2). Given that application of ML in the geosciences emerged several decades ago but is now rapidly gaining traction, a quick review of the state of the art may be helpful for many readers. The review section covers general references, classic papers and books, early applications of NNs in the geosciences, spectral versus spatial classification, computer science developments of ML methods for image processing and classification, such as CNNs, the identified needs for advancing remote-sensing-data classification using ML methods, especially in the geosciences, some recent geoscience applications of NNs, and lastly an approach aimed at integrating the physical sciences and ML.
In Section 3, we provide background on the glaciological problem, focusing on the importance of the surge phenomenon as a main source of uncertainty in sea-level-rise assessment, the surge of the Negribreen Glacier System (the glacier system studied in this paper), and the crevasse-centered approach that facilitates treatment of the surge problem using our machine learning approach, which is based on spatial data analysis.
4. Summary of the Approach
4.1. Objectives, Summary of Approach, Classification and Analysis Steps
The main objective of this paper is the exploration of the trade-offs between a physically constrained NN and a CNN (“Deep Learning”) for a specific, but generalizable, problem in the geosciences: the classification of crevasse types that form during the surge in an Arctic glacier system, the Negribreen Glacier System, Svalbard, to derive objective information about the evolution of the surge. To achieve this objective, we create a software system, termed GEOCLASS-image, that facilitates classification of surface features from high-resolution satellite imagery and other imagery, perform testing and quality assessment (Q/A) of the software system, and release it as the core of an associated cyberinfrastructure.
Based on the results of the two trade-off studies, we derive an example of an ML approach that combines the advantages of a physically constrained, classic NN with those of a CNN, thereby creating a physically constrained NN with a combined architecture, termed VarioCNN. The final VarioCNN is applied to a time series of WorldView images to derive information on the evolution of the surge in an Arctic glacier system, the Negribreen Glacier System.
The combined NN, VarioCNN, will be applied to a time series of WorldView-1 and WorldView-2 images, collected in 2016–2018 during the acceleration and mature stages of the surge in the NGS. Each image will be analyzed individually and provides an element in a time series of thematic maps of crevasse provinces. The goal is to derive geophysical information on the evolution of the surge during these core stages. Specifically, we aim to create a classification of crevasse patterns, as they relate to deformation types that occur as a result of ice-dynamic processes. Crevasses are manifestations of the local strain state of the ice. Occurrence of fresh crevassing indicates the expansion of the surge, and as the surge progresses, new types of crevasse patterns form. The time series of crevasse maps will be interpreted geophysically. Lastly, we provide a description of the GEOCLASS-image software system.
In summary, the work in this paper builds on the following three ideas:
- (1)
Employ geostatistical parameters as a mathematical formulation for physically informed extraction of complex information from imagery
- (2)
Utilize different NN types as connectionist association structures: MLPs and CNNs
- (3)
Compare and then combine the NNs into a three-tiered approach: geostatistical-connectionist with MLP and CNN
4.2. Approach Steps
Objectives of the work in this paper are the following:
- (1)
Create a software system that
- (1.1)
encompasses the main principles of the connectionist-geostatistical classification method,
- (1.2)
is sufficiently tested/robust/quality-assessed to form the center-piece of a community software for image classification in the geosciences and beyond,
- (1.3)
has a user-friendly GUI that supports the workflow from image manipulation and selection of training data through classification,
- (1.4)
facilitates training and classification of several crevasse types,
- (1.5)
allows analysis of different types of satellite imagery,
- (1.6)
includes utility tools for cartographic projections and other image manipulations,
- (1.7)
includes several neural network types, including multi-layer perceptrons, convolutional neural networks, and
- (1.8)
is open to generalization to more architecture types,
- (2)
Explore the trade-offs between a physically constrained NN and a CNN for a specific, but generalizable, problem in the geosciences: the classification of crevasse types that form during a glacier surge,
- (3)
Create an example of a ML approach that combines the advantages of a physically constrained, classic NN with those of a CNN, thereby creating a physically constrained NN with a combined architecture, and
- (4)
Apply the resultant NN to a time series of WorldView images, to derive information on the evolution of the surge in an Arctic Glacier System, the Negribreen Glacier System.
4.3. Terminology
- (1)
The connectionist-geostatistical classification method [44] is the original approach that combines a physically driven geostatistical analysis of an input data set and a neural network into an ML approach. As described in [45], the geostatistical analysis or characterization can take several different forms; in any case, the output of the geostatistical analysis is used as input for the neural network. Examples of geostatistical analysis include (a) the experimental variogram, a discrete function, and (b) results of geostatistical characterization parameters. The neural network type applied in most of our studies is generally a form of a multi-layer perceptron (MLP) with back-propagation of errors [44,45,61] (see Section 6).
- (2)
The acronym VarioMLP is used for the connectionist-geostatistical NN type that is applied in this paper; it employs a four-directional experimental vario function to activate the input neurons of an MLP with back-propagation of errors (see Section 6).
- (3)
The term convolutional neural network (CNN) stands for a specific class of neural networks that realize the concept of “deep learning” [46,58].
- (4)
ResNet-18 is the acronym for the specific CNN used in this paper [41,82] (see Section 8).
- (5)
The acronym VarioCNN will be used for the combined new method that integrates VarioMLP and ResNet-18 into a unique, physically constrained ML approach (see Section 9).
- (6)
Specific architectures of a NN are identified by adding information in square brackets; for example, VarioMLP[18, 4, (5,2)] identifies a VarioMLP, where 18 is the number of steps in the vario function (for each direction), 4 the number of directions of vario-function calculations, yielding 72 nodes in the input layer, and (5,2) the factors for the numbers of nodes of the hidden layers; here an MLP with two hidden layers is used, where the first layer includes 72 × 5 nodes and the second layer 72 × 2 nodes (see Section 7). More generally, VarioMLP[n, d, (f_1, …, f_k)] identifies a VarioMLP, where n is the number of steps in the vario function (for each of the d directions) and f_i is the factor in the number of nodes of hidden layer i; here an MLP with k hidden layers is used, where layer i has n · d · f_i nodes for i = 1, …, k (see Section 7).
- (7)
GEOCLASS-image is the software system utilized to create the neural networks and labeled data sets referred to in this paper and to carry out the classifications of crevasse types during the surge of the NGS, Svalbard [136].
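The bracket notation in item (6) can be illustrated with a small helper function. This is a hypothetical sketch (not part of GEOCLASS-image); the six output classes are assumed to be the four crevasse classes plus the undisturbed and “other” classes used in this paper:

```python
def variomlp_layer_sizes(n_steps, n_dirs, hidden_factors, n_classes=6):
    """Return the layer widths implied by VarioMLP[n_steps, n_dirs, hidden_factors].

    Input layer: one neuron per vario-function value, i.e. n_steps * n_dirs.
    Hidden layer i: input size multiplied by hidden_factors[i].
    Output layer: one neuron per class.
    """
    input_size = n_steps * n_dirs
    hidden = [input_size * f for f in hidden_factors]
    return [input_size] + hidden + [n_classes]

# The example from the text, VarioMLP[18, 4, (5,2)]:
print(variomlp_layer_sizes(18, 4, (5, 2)))  # [72, 360, 144, 6]
```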
7. Image Labeling and Training Approach (for VarioMLP and ResNet-18)
7.1. Training Approach
The training approach reflects the goal of creating a physically constrained NN by combining knowledge of glaciological processes and Earth observation technology with ML methods at every step. In the last section, we saw that the selection of the sizes of training images is controlled by a requirement of spatial homogeneity, constraints associated with the spatial resolution of the satellite imagery, and the spacing of crevasses on the glacier surface, which results from the glacial movement and acceleration that we aim to analyze. Training is carried out as a form of supervised training; training as such is an optimization problem over the model's internal parameters.
7.2. Crevasse Classes
Crevasse classes are selected by an expert, based on structural glaciology (Section 3.3). Because a main objective of this paper is the integration of a physically constrained NN and a CNN, we utilize (only) four basic crevasse classes: (a) one-directional crevasses, (b) multi-directional crevasses, (c) shear crevasses, and (d) chaos crevasses, or shear–chaos crevasses. The crevasse types associated with these classes are illustrated in Figure 3. Crevasse types (a), (b) and (c) are associated with basic deformation matrices [124]: The one-directional crevasse type results from an extension in one direction (Figure 3a). The multi-directional, including two-directional, crevasse type results from a deformation with more than one stress axis (Figure 3b); it can also result from two deformation processes that affect the material ice in sequence. The shear crevasse type results from shear, a deformation type that typically occurs when fast-moving ice borders slow-moving ice. In the case of a surge, the ice of one glacier (Negribreen) accelerates, while the ice of an adjacent glacier (e.g., Ordonnansbreen) continues to flow at normal, much slower speeds (Figure 3c). Depending on the spatial and temporal velocity gradient, shear crevasses can have different appearances (Figure 3c,d). Transportation, weathering and interaction of several deformation processes can lead to complex ice-surface and near-surface structures in which the signatures of individual processes can no longer be distinguished; these are summarized as a “chaos” crevasse class (Figure 3d). In some areas, the signature of shear deformation is still evident in the chaos crevasse fields (Figure 3f), but separation in an image classification process may be too difficult; thus, the class is summarized as chaos/shear–chaos. Two additional classes need to be added to each classification: one for undisturbed snow/ice and a rest class for “other” surfaces, which can include moraines, rock avalanches, subimages that include both snow/ice and rock surfaces, and indiscernible images, to limit misclassification of the four better-defined crevasse classes. A rendering of representative examples of split images, subselected from WorldView satellite imagery, is seen in Figure 4. The images have a size of 201 (= 3 × 67) pixels by 268 (= 4 × 67) pixels, i.e., they follow the (3-4-5) size rule.
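The (3-4-5) size rule can be illustrated with a minimal tiling sketch (a hypothetical helper, not the GEOCLASS-image implementation): the 201 × 268 pixel split images have a diagonal of 335 = 5 × 67 pixels, so horizontal, vertical, and diagonal lags all span integer multiples of the same 67-pixel unit.

```python
import numpy as np

def split_image(img, tile_h=201, tile_w=268):
    """Cut a satellite image array into non-overlapping split images of
    201 x 268 pixels, following the (3-4-5) size rule:
    201 = 3 * 67, 268 = 4 * 67, diagonal 335 = 5 * 67."""
    tiles = []
    rows, cols = img.shape[:2]
    for r in range(0, rows - tile_h + 1, tile_h):
        for c in range(0, cols - tile_w + 1, tile_w):
            tiles.append(img[r:r + tile_h, c:c + tile_w])
    return tiles

# A 402 x 536 pixel scene yields a 2 x 2 grid of split images.
tiles = split_image(np.zeros((402, 536)))
```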
7.3. Image Labeling
A second main objective of this paper is the derivation of a labeled training data set for the problem of crevasse classification from satellite imagery. With this objective, we address the problem that the application of ML in the geosciences, and specifically the cryospheric sciences, has been hampered by the lack of labeled training data sets, as identified by authors working in the field (e.g., [43,89,90]) and described in more detail in Section 2.
To initiate training, sets of split-images for each class are identified and selected by the structural glaciologist. In our experiments, we found that several tens of example images per class are sufficient for an initial training run of VarioMLP.
Technically, image labeling is carried out using the Split Image Explorer Tool, visualized in Figure 5 and described in more detail in [136]. Individual split images can be selected from the WorldView image (optionally with a polygonal area of interest outlined that contains the glacier area), viewed enlarged at the top left, and associated with a class. The association can be (1) performed initially by the glaciologist, (2) displayed as the result of the NN classification, or (3) overwritten (accepted or rejected) in a control pass in the training loop (see Section 7.6). A sliding bar in the left middle of the explorer tool allows application of confidence as a filter for visualization (only images classified with a confidence level exceeding the user-selected confidence threshold are displayed in color).
7.4. Data Handling and Feature Engineering
Feature engineering is the design of the input for the neural network. Of importance for robustness of the results is that identification of a crevasse type is independent of orientation and view angle of the satellite, relative to features on the ground. Directional bias is removed by calculating vario functions in several different directions for each split-image.
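The directional vario-function calculation can be sketched in NumPy as follows. This is a minimal illustration of the classic experimental variogram applied in four directions, under the assumption of the standard estimator v(h) = ½ · mean[(z(x+h) − z(x))²]; the actual GEOCLASS-image implementation may differ in detail:

```python
import numpy as np

# Lag directions as pixel offsets: horizontal, vertical, both diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def directional_vario(img, n_lags, direction):
    """Experimental vario function for one direction:
    v(h) = 0.5 * mean of squared differences of pixel pairs at lag h."""
    dy, dx = direction
    rows, cols = img.shape
    v = []
    for h in range(1, n_lags + 1):
        oy, ox = h * dy, h * dx
        # Align z(x) and z(x + h) by slicing, so no pixel pair wraps around.
        a = img[max(0, -oy):rows - max(0, oy), max(0, -ox):cols - max(0, ox)]
        b = img[max(0, oy):rows - max(0, -oy), max(0, ox):cols - max(0, -ox)]
        v.append(0.5 * np.mean((a - b) ** 2))
    return np.array(v)

def vario_features(img, n_lags=18):
    """Concatenate the four directional vario functions into one input
    vector for the MLP (4 * 18 = 72 values for the default settings)."""
    return np.concatenate([directional_vario(img, n_lags, d) for d in DIRECTIONS])
```

For a one-directional crevasse pattern, the vario function computed near-parallel to the crevasses stays low, while the other directions rise toward a sill; this anisotropy, expressed in the 72-value feature vector, is what removes the directional bias described above.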
Prior to extraction of split images, the satellite image needs to be oriented in a geographic or rectangular projection framework that facilitates output of the final classification in the form of a thematic map of crevasse provinces. Raw satellite imagery is typically collected along orbits and constrained by the view angle of the observatory, which is fixed for some imagers, but adjustable or sweeping for most (including WorldView). To accomplish mapping of larger areas from a single satellite image or multiple satellite images, utility functions for image projection and mosaicking are implemented as part of the GEOCLASS-image system. To visualize this, the reader may compare the different sizes and orientations of the input imagery shown in Figure 6.
Data from the panchromatic channel of WorldView-1 and WorldView-2 are utilized, because the classification principle is a spatial classification (in the more common form of multivariate statistical classification, data from several spectral channels are used). Our study combined imagery from two different imaging systems, WorldView-1 and WorldView-2, which resulted in imagery of somewhat different pixel sizes and resolutions (0.45 m for WorldView-1 and 0.42 m for WorldView-2; see Section 5.2). A utility function in GEOCLASS-image facilitated simultaneous analysis and classification of imagery from both satellite types.
Application of the vario function to a typical image from each of the classes of (1) undisturbed snow and ice surfaces and (2) one-directional crevasse types, seen in Figure 2, illustrates how the NN can separate these crevasse types based on the vario function values for different directions and distances. First, the maximum of the resultant vario function values is much lower for undisturbed surfaces than for crevassed surfaces (compare the vertical axes in Figure 2c,d). Second, an anisotropic behavior of the set of directional vario functions is typical for one-directional crevasses (Figure 2b,d), where the direction that is near-parallel to the crevasse direction does not reach the sill of the vario function (green in Figure 2d), whereas the other three directional vario functions exhibit a typical wavy pattern resulting from washed-out cross-correlation, with spacing dependent on the relative angle of the crevasse orientation to the directional calculations.
7.5. Criteria for Evaluation of Training Success
We use the terminology of intrinsic criteria for quantitative, computational criteria (cross-entropy measure of training loss, confidence of the classification result, co-occurrence matrix) and extrinsic criteria for glaciological criteria that are typically based on airborne field observations of the glacier system during the surge and on additional expert knowledge on the evolution of crevasse types during a surge [16,18,20,125,126]. The application of extrinsic criteria is best explained in an applied example of image labeling and in the geophysical interpretation (see Section 7.6 and Section 11).
7.5.1. Softmax Function
A softmax function is used to convert the NN output layer to a probability distribution over the possible classes. Each output node is assigned a value p_i between 0 and 1 (0 ≤ p_i ≤ 1), with all outputs summing to 1, so that they can be interpreted as probabilities. The class with the largest probability is selected as the NN's final classification of a given input, and the confidence of the classification result is equal to that probability, i.e., the maximum of the softmax function. The loss function associated with the softmax function is the cross-entropy loss, which is used for training purposes (see Section 7.5.2). The softmax function is commonly used in many CNNs [40,41], due to its simplicity and probabilistic interpretation.
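The conversion from output-layer values to class probabilities and a confidence value can be sketched as follows (a generic softmax; the logit values are invented for illustration, and the six classes are assumed to be the four crevasse classes plus undisturbed and “other”):

```python
import numpy as np

def softmax(logits):
    """Map raw output-layer values to a probability distribution."""
    z = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return z / z.sum()

# Hypothetical output-layer values for the six classes.
logits = np.array([2.1, 0.3, -1.0, 0.5, 0.2, -0.4])
probs = softmax(logits)
predicted_class = int(np.argmax(probs))  # class with the largest probability
confidence = float(probs.max())          # maximum of the softmax function
```

This confidence value is the quantity later compared against the acceptance threshold (Section 7.5.3).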
7.5.2. Cross-Entropy
Training an MLP is an optimization of the model's internal parameters, carried out iteratively. At each iteration, VarioMLP predicts the class of each training example and uses the cross-entropy loss function as a quantification of the difference between predicted values and training data. Entropy was first introduced in [156] to quantify the level of uncertainty of a random variable X with possible outcomes x_1, …, x_n according to H(X) = −∑_{i=1}^{n} p(x_i) log p(x_i), where p(x_i) is the probability of outcome x_i and n is the number of classes. For VarioMLP, the outcomes are the crevasse classes and the probabilities are those which the model assigns to each output neuron. VarioMLP uses the cross-entropy loss as its loss criterion, calculated as L = −∑_{i=1}^{n} y_i log(p_i), where n is the number of classes, y_i is the truth label for class i, and p_i is the model-predicted probability for class i. The optimization problem is then for the model to learn an internal parameter set that minimizes this loss function; to accomplish this, VarioMLP employs stochastic gradient descent (SGD) via the Adam algorithm for first-order gradient-based optimization, introduced in [157]. During training, backpropagation, as defined in [158], involves computing the gradient of the loss layer by layer, starting from the output and moving backward towards the input layer. The Adam algorithm only computes first-order gradients, employs adaptive learning rates for the parameters based on estimates of the first- and second-order moments, and updates the parameters proportionally to the learning-rate hyperparameter in the direction of steepest descent [157]. Application of cross-entropy loss for training of deep NNs is described in [159].
Cross-entropy loss is utilized to identify functional training runs and reject training mistakes. For example, overfitting in a test run of the model is illustrated in Figure 7.
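The cross-entropy loss described above can be illustrated numerically (a generic sketch; the truth labels and predicted probabilities are invented for illustration):

```python
import numpy as np

def cross_entropy_loss(y_true, p_pred):
    """L = -sum_i y_i * log(p_i), with one-hot truth labels y_true
    and predicted class probabilities p_pred."""
    return float(-np.sum(y_true * np.log(p_pred)))

y = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])          # truth: second of six classes
confident_right = np.array([0.02, 0.90, 0.02, 0.02, 0.02, 0.02])
confident_wrong = np.array([0.90, 0.02, 0.02, 0.02, 0.02, 0.02])

low_loss = cross_entropy_loss(y, confident_right)      # -log(0.90), small
high_loss = cross_entropy_loss(y, confident_wrong)     # -log(0.02), large
```

A confident, correct prediction yields a small loss, while a confident, wrong prediction is penalized heavily, which is what drives the gradient updates during training.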
7.5.3. Confidence
Classification confidence is a measure of the probability that the association of an input image with a class is correct. Confidence approaches have been discussed in [160]. We utilize confidence to accept or reject classified crevasse images into the training data set, applying a threshold of 90% confidence. The Split-Image Explorer Tool allows user-selected confidence values.
7.5.4. Other Training Hyperparameters
Overall, the training and feedback-loop experiments are repeated several times with different parameters and variations of the classification models. The split of the training data into actual training images and validation images is held constant at 80% (training) and 20% (validation) for all experiments training VarioMLP and ResNet-18. This means that, however many labeled training images exist for a given run, 80% are randomly selected at runtime for the actual training process, and 20% are reserved to evaluate the model performance after each epoch. It is important to separate the training and evaluation data sets: if a sufficiently complex model is evaluated only on images it saw during training, it can simply memorize the training data set. Each training run is carried out with a maximum of 50 epochs. For each epoch that results in a new best validation loss, a checkpoint of the classification model is saved for further evaluation. For all training experiments, cross-entropy loss is used with the Adam optimizer as the method for gradient-descent calculation (Section 7.5.2).
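The 80/20 split and best-validation-loss checkpointing can be sketched as follows. This is a minimal illustration; `train_one_epoch`, `validate`, and `save_checkpoint` are placeholders for the actual GEOCLASS-image routines:

```python
import random

def train_val_split(labeled_images, train_frac=0.8, seed=None):
    """Randomly split the labeled split images into training (80%)
    and validation (20%) subsets at runtime."""
    items = list(labeled_images)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

def training_loop(model, labeled_images, train_one_epoch, validate,
                  save_checkpoint, max_epochs=50):
    """Run up to 50 epochs; save a checkpoint whenever a new best
    validation loss is reached."""
    train_set, val_set = train_val_split(labeled_images)
    best_val_loss = float("inf")
    for epoch in range(max_epochs):
        train_one_epoch(model, train_set)    # Adam + cross-entropy loss
        val_loss = validate(model, val_set)  # performance on held-out images
        if val_loss < best_val_loss:         # new best: checkpoint the model
            best_val_loss = val_loss
            save_checkpoint(model, epoch)
    return best_val_loss
```

Keeping the validation images out of the gradient updates is what makes the saved checkpoints meaningful: the best-validation model is the one that generalizes, not the one that memorizes.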
7.6. Interleave of Split Image Labeling with the Training Process: The Feedback Loop
Following creation of an initial set of expert-labeled training data, a VarioMLP is trained. The resultant network architecture can be applied to simply classify an entire satellite image. However, in order to derive a large data set of labeled training images, an iterative approach to split-image labeling and VarioMLP training is taken. The goal is the creation of a data set that is large enough to train a CNN, which in turn can be expected to facilitate rapid classification of many satellite images for similar problems, i.e., a higher level of generalization of the task of crevasse classification.
The iterative approach is implemented as a feedback loop in VarioMLP, executed as a mix of computational criteria and expert interaction, interleaved in the training process of VarioMLP as follows (see Figure 8): The initial data set is considered the first-order data set, used to train the NN. Validation loss and training loss are evaluated as quantified by the cross-entropy measure (see Section 7.5.2). A trained VarioMLP architecture results.
VarioMLP, with first-approximation final structure, is then applied to classify the entire set of all split images from a given satellite image (all split images inside the polygon that outlines the NGS). Each split image is associated with a class and written out into a directory of that class. Next, only split images with a classification confidence of at least 0.9 are retained in the crevasse class directories. Then, the glaciology expert quickly views all new images in each class (i.e., any images that are not part of the original labeled data set) and rejects images that are misclassified. This process requires a fraction of the human expert time needed to label thousands of split images initially. VarioMLP is then rerun, using the larger set of labeled data as training data. By repeating the feedback loop, a labeled data set with 3933 images is obtained in a reasonable amount of time. The final labeled data set of 3933 split images includes between 522 and 953 images per crevasse class, with the distribution given in Table 4. This distribution is relatively even and not varied enough to be a significant potential source of inaccuracy for the model training.
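The feedback loop can be summarized in a compact Python sketch; `train` and `classify` stand in for the VarioMLP training and inference steps, and `expert_accepts` for the glaciologist's quick visual review (all three are hypothetical callables, not GEOCLASS-image functions):

```python
def feedback_loop(train, classify, expert_accepts, all_images, labeled,
                  n_iterations=3, threshold=0.9):
    """Iteratively grow the labeled data set: train on the current labels,
    classify all split images, keep only results with confidence >= 0.9
    that the expert does not reject, and retrain on the larger set."""
    for _ in range(n_iterations):
        model = train(labeled)
        for image in all_images:
            if image in labeled:          # already labeled or accepted earlier
                continue
            cls, confidence = classify(model, image)
            if confidence >= threshold and expert_accepts(image, cls):
                labeled[image] = cls
    return labeled
```

Each pass converts expert time from labeling (slow) into rejecting misclassifications among high-confidence candidates (fast), which is why the loop scales to thousands of labeled split images.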
In this exemplary application, the expert who selected the initial data set was a glaciologist experienced in structural glaciology, especially observation of glacier surfaces during surges (the lead author of the paper), whereas in later iterations, the sorting of images was performed by a computer science student, indicating that the sorting procedure becomes increasingly fast and simple as the training goes through several iteration steps. To simplify the process, only the set of four main crevasse classes (with chaos/shear–chaos as one class), plus an undisturbed class and a rest class, is used for this study.
On the other hand, to ascertain the general applicability of the labeled training data set to a range of previously unseen WorldView data sets from the NGS and other regions of surging glaciers, as well as for analysis and classification of data from WorldView-1 and WorldView-2, split images are sourced from 11 different WorldView data sets collected over the NGS in 2016, 2017, and 2018. This results in a total of 108,623 split images. The distribution of split images in the final 3933-image data set per WorldView source file is given in Table 2.
At this point, we have achieved two results: (1) the derivation of a labeled training data set, and (2) VarioMLP together with the feedback loop, usable either as a standalone NN or as a component in a physically constrained CNN, the VarioCNN.
In the next sections, we will describe ResNet-18, the CNN component selected for VarioCNN, its training, comparison to VarioMLP, and finally the design of the combined classification system, VarioCNN, and the classification software system, GEOCLASS-image. Experiments with VarioCNN, using GEOCLASS-image, are rounded off by geophysical application and interpretation of the evolution of crevasse provinces during the surge in the NGS.
7.7. Determination of VarioMLP Hyperparameters
The VarioMLP architecture includes hyperparameters that can be optimized to tune the model for testing performance and generalization. Both the directional variogram and multi-layer perceptron steps of the VarioMLP architecture have hyperparameters that affect the training and testing in different ways. Input image size and resolution have already been discussed in
Section 6.1.1, as this is constrained by the observation technology, the surface signatures, and the assumption of spatial homogeneity. To optimize the architecture of VarioMLP, experiments are carried out to determine the optimal number of lag steps in the vario function and the shape and number of internal layers. In both series of experiments, cross-entropy loss is used as the measure of training quality and network performance.
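For reference, the cross-entropy loss used in both experiment series can be written compactly; the following is a generic single-sample formulation, not code from GEOCLASS-image:

```python
import math

def cross_entropy(probs, true_class):
    """Cross-entropy loss for one sample: the negative log-probability
    that the model assigns to the correct class. Lower is better; a
    perfectly confident, correct prediction yields a loss of 0."""
    return -math.log(probs[true_class])
```

For example, a uniform prediction over four classes incurs a loss of ln 4 regardless of the true class, while confident correct predictions drive the loss toward zero.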
7.7.1. Number of Vario Function Steps
VarioMLP is trained on the output of the discrete, experimental vario function, calculated in four directions (horizontal, vertical, and along the two diagonals), exploiting the 3-4-5 proportions of the split-image size for efficient computation (201 = 3 × 67, 268 = 4 × 67, diagonal 335 = 5 × 67). The number of directions is kept fixed. If the number of lag steps is too small, the directional vario function may not provide sufficiently distinct characteristics for a given set of surface types to allow reliable classification. If the number of lag steps is too high, the characteristics provided by the directional vario function can be polluted by noise and small-scale features that are present in multiple surface types, burying the salient features of each surface type needed for classification. During training, the lag step parameter is tested for values of 10, 12, 14, 16, 18 and 20 (
Table 5). In this experiment, the hidden layer shape is fixed at [5, 2], and the final validation data set includes 786 images (20% of the final 3933-image labeled data set). An MLP model denoted as [5, 2] refers to a model with two fully-connected hidden layers containing 5 and 2 times as many nodes as the input layer, respectively; in our experiments, [5, 2] = [5 × k × 4, 2 × k × 4] for k lag steps and 4 directions. The best performance is achieved with a lag step value of 18, resulting in an internal layer structure of [5, 2] = [5 × 18 × 4, 2 × 18 × 4] (see,
Figure 8). It is interesting to note that performance is not monotonically related to the number of lag steps used in the vario function phase. Rather, the model performs relatively well with values of 12, 14 and 18, and relatively poorly with values of 10, 16 and 20.
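The discrete, experimental vario function described above can be sketched as follows. This is a generic illustration of a four-directional experimental variogram with unit lag spacing (the implementation in GEOCLASS-image may differ in lag spacing and normalization):

```python
import numpy as np

def directional_variogram(image, num_lags,
                          directions=((0, 1), (1, 0), (1, 1), (1, -1))):
    """Discrete experimental vario function of a 2-D image in four
    directions (horizontal, vertical, and the two diagonals).

    Returns a feature vector of length num_lags * len(directions),
    which would activate the input layer of the MLP.
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    features = []
    for dr, dc in directions:
        for lag in range(1, num_lags + 1):
            r, c = dr * lag, dc * lag
            # pair each pixel z(x) with its neighbor z(x + h) at offset (r, c)
            a = img[max(r, 0):rows + min(r, 0), max(c, 0):cols + min(c, 0)]
            b = img[max(-r, 0):rows + min(-r, 0), max(-c, 0):cols + min(-c, 0)]
            features.append(0.5 * np.mean((a - b) ** 2))  # semivariance at lag h
    return np.array(features)
```

On a unit-gradient test image, the horizontal semivariance grows quadratically with the lag, while the vertical semivariance is zero, illustrating how the directional vario function separates anisotropic patterns such as one-directional crevasse fields.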
7.7.2. Hidden Layer Structure in the MLP
The number of hidden layers in the MLP step of the VarioMLP architecture is a function of the size of the input layer, as well as the size of the training data set. If the number of hidden layers is too large relative to the input layer size, then the model becomes unnecessarily complex and thus more susceptible to overfitting. Too few hidden layers produce the opposite problem—the model lacks the complexity necessary to capture the full variance in the data set and suffers from underfitting. This is an example of what is commonly referred to in machine learning as the bias–variance trade-off [
161,
162,
163]. Choosing an ideal model size and depth becomes increasingly difficult for problems where there is no existing reference data set of labeled training examples, since as the size of the training data set increases, so does the optimal fully-connected model size. Because this relationship is nearly impossible to calculate analytically, trial-based estimation is necessary. To reduce the scope of this optimization during training, the shapes of the hidden layers of the MLP architecture are limited to exact multiples of the input layer size. An MLP model denoted as [5, 10, 2] refers to a model with three fully-connected hidden layers, which contain 5, 10 and 2 times as many nodes as the input layer, respectively. During training, model architectures of [2, 2], [5, 2], [5, 5, 2], [10, 5, 2], and [10, 10, 2] are tested (
Table 6). For each test run, the number of lag steps for the vario function is fixed at 18. The best performing hidden layer shape is [5, 2]. For networks both wider and deeper than this, performance decreases significantly, likely because, given the relatively small amount of information at the input layer (the concatenated output of the vario function stage), larger networks simply converge on memorizing the training data set. This is another example of the bias–variance trade-off at play: the network must not be overly complex for the scope of the input data.
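The bracket notation can be made explicit with a small helper that expands a hidden-layer specification such as [5, 2] into absolute layer widths (an illustrative helper written for this explanation, not part of the published code; the six-class output layer reflects the class setup of this study):

```python
def mlp_layer_sizes(multipliers, num_lags, num_directions=4, num_classes=6):
    """Expand the [m1, m2, ...] hidden-layer notation: each hidden layer
    holds m_i times as many nodes as the input layer, whose size is
    num_lags * num_directions. A final layer of num_classes output
    nodes is appended."""
    input_size = num_lags * num_directions
    return [input_size] + [m * input_size for m in multipliers] + [num_classes]
```

For the best-performing configuration, `mlp_layer_sizes([5, 2], 18)` yields layer widths of 72, 360, 144 and 6 nodes, matching [5 × 18 × 4, 2 × 18 × 4].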
10. Experiments with VarioCNN: Application to Classification of Crevasse Types from a Time Series of WorldView Satellite Imagery
Following training of VarioCNN using the set of 3933 labeled training images, a final architecture of VarioCNN is derived. The final, trained VarioCNN is then applied to a time series of 7 WorldView-1 and WorldView-2 data sets (
Table 2). From a large catalog of WorldView images, 11 images are found to be suitable with regard to cloud cover and area coverage; of those, 7 images are selected to represent the time interval between May 2016 and May 2018. A disadvantage of any analysis that utilizes WorldView imagery is the large delay between the time of data collection and the time when imagery is first made available to the glaciological research community. All useful images are WorldView-1 or WorldView-2 data.
As described in
Section 3, crevasse types are the results of ice-dynamic processes that occur during the surge. The spatial patterns recorded in the satellite image provide a snapshot of the local result of the dynamic state of the material ice, i.e., of the kinematic force/state associated with the deformation that results in the crevasse type.
At the beginning of the classification work for this paper, 22 crevasse classes (including ancillary classes) are created. To facilitate efficient implementation and application of the software, crevasse classes are combined into four larger classes: the current selection ((1) one-directional, (2) multi-directional, (3) shear and (4) shear/chaos) provides relatively simple descriptors of deformation kinematics but still captures the formation of the main crevasse provinces, as the following analysis will demonstrate.
The resultant time series of thematic maps of the six surface classes, which include four crevasse types, undisturbed surface and a rest class, is shown in
Figure 9. An important criterion for the consistency and geophysical interpretability of the results is that the areas of each crevasse class consist of one or several simply connected regions, without any post-processing such as smoothing. The region of crevassed ice expands upglacier as time passes and the surge progresses. Therefore, interpretation of our results from the physically constrained CNN, VarioCNN, is warranted and will be presented in the next section.
12. Experiments Using Small Data Sets to Train ResNet-18 Directly, without VarioMLP
The objective of this section is to answer the question of whether VarioMLP and its feedback loop are actually necessary for deriving a labeled training data set for VarioCNN, or whether, alternatively, it may be possible to train a ResNet-18 directly, utilizing the experience gained with crevasse classification from WorldView imagery. To this end, we carry out a series of experiments using data sets of several hundred labeled split images (see,
Figure 10 and
Figure 11).
For the first series of experiments, a training data set is created by selecting approximately 50 split-images for each class, for a total of 384 split-images, from the WorldView-2 image acquired 2016-06-25 (see,
Figure 6b and
Figure 9b). Split images for any given class are selected from regions where crevasses of the type of that class are identified in application of VarioCNN (i.e., in
Figure 9b). The size of this initial data set is similar to that of the first iteration of the data set used for VarioMLP (approximately 300 images, see
Section 9.2). We then train a ResNet-18 model using the resultant 2016_50 labeled data set with an 80%/20% split into training and validation data. Training and validation loss curves are shown in
Figure 11.
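The 80%/20% split into training and validation data can be sketched as follows. This is a generic, class-stratified split (function names are illustrative; the actual split in GEOCLASS-image may be implemented differently, e.g., without stratification):

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split image indices into training and validation sets,
    stratified per class so each crevasse class keeps roughly the
    same train/validation proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, val = [], []
    for lab, idxs in by_class.items():
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_fraction)
        val.extend(idxs[:n_val])
        train.extend(idxs[n_val:])
    return sorted(train), sorted(val)
```

Stratification matters most for small data sets such as the 384-image set here, where a purely random split could leave a class nearly absent from the validation data.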
The resultant ResNet-18 model (2016-ResNet for short) is applied to three images: (1) WorldView-2 acquired 2016-06-25 (see,
Figure 6b and
Figure 9b), (2) WorldView-1 acquired 2017-05-30 (see,
Figure 6e and
Figure 9e), and (3) WorldView-1 acquired 2018-05-26 (see,
Figure 6g and
Figure 9g).
Classification of crevasse types using 2016-ResNet, applied to image (1), i.e., the image from which the labeled training set is sourced, works reasonably well. The area of the surge is correctly identified, as is the area of undisturbed snow. The region of other surfaces is classified similarly to the result in
Figure 9b. The region of multi-directional crevasses approximately matches that in
Figure 9b, and the region of one-directional crevasses is mostly, but not entirely, identified as the upglacier part of the region affected by the surge. However, the more difficult-to-identify crevasse types of shear and shear/chaos are not correctly classified: parts of the shear margin of the surge region are misclassified as one-directional, chaos, or multi-directional. A look at the loss curves (
Figure 11a) shows that 2016-ResNet overfits the small training data set of 384 images and does not generalize well, an observation to be expected given the size of the training data set (for typical learning curves, see [
44]). More interesting, in the context of evaluating the contribution of VarioMLP to the creation of a training data set, is the observation that the CNN is not able to distinguish imagery based on the spatial patterns associated with the deformation characteristics of crevasse fields. This is attributable to the fact that the CNN lacks a spatial-statistical decision criterion of the kind explicitly implemented as part of VarioMLP. Application of 2016-ResNet to the 2017 WorldView data yields misclassification of the entire region, which is interpreted as a result of poor generalization. Comparison of the classification results after 47 epochs (near the end of the experiments) and 41 epochs (minimum distance between the validation and training loss curves) indicates that at 41 epochs an approximately along-flow orientation of provinces emerges; however, all crevasse locations are misclassified as “undisturbed snow/ice” or “other”. In comparison, the result of applying 2016-ResNet to the 2018 WorldView image is somewhat better in that crevassed areas are called out approximately where they exist (see
Figure 9g), but the crevasse types are classified incorrectly for most locations on the glacier: the region of “chaos” is far too large, the shear margin is missing and misclassified as “other”, and one-directional crevasses do not lead the expansion of the surge upglacier.
In the second series of experiments, approximately 100 split-images are selected for each class from the 2017 WorldView-1 image (2), resulting in a total of 634 training images. The loss curves (
Figure 11b) for the resultant ResNet-18 model, 2017-ResNet for short, indicate that the training process is not stable; hence, we cannot expect 2017-ResNet to yield correct crevasse classification results. Resultant crevasse classification maps are plotted for an epoch with a small difference between validation and training loss (epoch 5; see
Figure 11b). Similar to the previous experiments from 2016-ResNet, the classification works to some extent in application to the data set from which the 634 labeled training images are sourced, simply because the training data are a significant part of the data to be analyzed. Comparing
Figure 10d to
Figure 9e, the chaos class is approximately correctly mapped, with a province of multi-directional crevasses splitting it near the terminus. Distinguishing multi-directional and shear crevasse fields appears most challenging for this network. Results of applying 2017-ResNet to the 2016 and 2018 WorldView data suffer from the poor generalization capability of the CNN. Approximately doubling the number of training images without changing the labeling strategy does not improve the classification capability significantly. For the 2016 data set, large areas are misclassified as shear. For the 2018 data set, large areas are misclassified as shear but are actually fields of one-directional crevasses in
Figure 9g (and also
Figure 9f) or multi-directional crevasses. These comparisons affirm the conclusion from the 2016-ResNet experiments that the capability to associate those spatial characteristics which relate ice deformation to the resultant surface patterns (crevasse types) requires VarioMLP and the connectionist-geostatistical approach.
13. Summary, Discussion and Conclusions
The work in this paper has addressed three challenges, posed in the introduction: Challenge 1. Harnessing the data revolution in Earth observation from space; Challenge 2. Glacial acceleration and Sea-Level-Rise assessment; and Challenge 3. Integration of physically-constrained classification and modern “Deep Learning” approaches in satellite image classification.
Challenge 1. Harnessing the data revolution in Earth observation from space. Through the integration of physical knowledge and two different ML approaches into a physically-driven NN, the VarioCNN, we have provided a means for rapid and efficient extraction of complex information from submeter resolution satellite imagery (and other imagery). The new NN, VarioCNN, combines the advantages of a physically-driven, relatively easily trainable MLP, with those of an efficient CNN, and thus directly provides an answer to Challenge 3. Integration of physically-constrained classification and modern “Deep Learning” approaches in satellite image classification.
There are several key concepts that are instrumental in the mathematical and computational formulation of a connection between physical understanding and physically constrained classification: (1) ice dynamics of glacial acceleration, especially surging, (2) deformation of the material ice during rapid acceleration, (3) the resultant surface signatures: crevasse patterns, and their formation, transport and overprinting, (4) recording of ice-surface structures in optical satellite imagery (and other imagery) and (5) mathematical representation of crevasse patterns in multi-directional vario functions—these components comprise the physical constraints of VarioMLP. VarioMLP utilizes the connectionist-geostatistical classification method [
18,
44,
45] to first process satellite imagery by calculation of directional vario functions, which are then used to activate the neurons of the input layer of an MLP.
While there has been increasing acceptance of deep learning methods in the geosciences, the lack of adequate, problem-specific labeled training data has hampered derivation of new knowledge using these deep learning approaches, because CNNs require training data sets on the order of hundreds of thousands to millions of labeled examples. Science applications of CNNs have been limited to areas where more training data exist, including (a) biology and medicine, (b) atmospheric sciences and weather forecasting, and (c) sea surface temperature (ocean remote sensing) [
92].
In a comparison of VarioMLP and ResNet-18, the shallowest “deep” NN that is commonly used [
41,
82], we find that the primary advantage of VarioMLP over the CNN is that VarioMLP can be trained with a relatively small set of labeled training data, i.e., a number of input images that can feasibly be labeled by an expert in the field. Starting from a set of several hundred labeled images of crevassed surfaces, assigned to six classes by a structural glaciologist, a feedback loop of retraining and reinforcement, with a fast rejection/acceptance feature supported by a GUI in combination with a confidence measure and expert-controlled decisions, leads to the creation of a labeled crevasse class data set of approximately 4000 images.
We proceed to create a combined three-tiered network, termed VarioCNN, which consists of VarioMLP, the feedback loop, and a backend of a CNN (ResNet-18); this NN can be trained with the 4000-image labeled data set and has better training properties than VarioMLP alone. A flexible and versatile open-source software system, GEOCLASS-image [
136], has been designed and built for image classification. It performs all the tasks in this analysis and more, and it is easily generalizable to other network structures and applications because of its modular design. GEOCLASS-image is user friendly, and it includes a functional GUI that appeals to experts and non-experts in glaciology or computer science alike (i.e., it does not require extensive knowledge of ML, although it is built on a PyTorch framework).
While ResNet-18 is classically trained using square input images of 224 by 224 pixels [
41,
82], especially for benchmarking experiments, this is not a requirement. In GEOCLASS-image, all currently utilized NN architectures and approaches can be trained with rectangular split-images of any size [
136]. Using the same split-image size for VarioMLP and ResNet-18 in the combined VarioCNN architecture yields the most consistent results (here, 201 by 268 pixels).
With GEOCLASS-image and VarioCNN, we have created an infrastructure that facilitates rapid analysis of submeter resolution commercial satellite image data, such as Maxar WorldView data, thus answering Challenge 1. Furthermore, the work in this paper presents an approach for a path forward in harnessing the data revolution towards obtaining an advanced understanding of complex geophysical phenomena (here, glacial acceleration) in a climate-change science framework.
Challenge 2. Glacial acceleration and sea-level-rise assessment. Our research in this paper presents an advance in the complexity of physics that can be extracted from satellite imagery (crevasse classification, deformation), in an area where such research has not been conducted yet. In the introduction, we have summarized the relationship between glacial acceleration and sea-level rise. In summary, glacial acceleration constitutes a deep uncertainty in SLR assessment, a term coined by the 6th Assessment Report of the IPCC [
3]. Surges are the least understood form of glacial acceleration. The work in this paper culminates in an application of VarioCNN to study the evolution of crevasse provinces during the current (2016–2024) surge of an Arctic glacier system, the Negribreen Glacier system, Svalbard, based on the classification of crevasse types in a time series of WorldView images from 2016–2018. This constitutes a novel approach, yielding new glaciological results; the classification is the first of its kind carried out for an entire Arctic glacier system and for WorldView data. Negribreen last surged in 1935/36 [
35,
113].
Using four principal crevasse types (one-directional, multi-directional, shear and chaos), plus a class for undisturbed snow/ice surfaces and a rest class, we have derived segmentations of a surging glacier into crevasse provinces that allow geophysical interpretation of the surge evolution in 2016–2018, which includes most of the acceleration phase of the surge. Some results are: More crevasses form as the surge expands. Fields of one-directional crevasses always form on the upglacier, leading edge of the surge expansion. Fields of the shear crevasse type form between areas of accelerating and fast-moving ice and areas of slow-moving ice that is not, or not yet, affected by the surge. Multi-generational, multi-directional crevasse types form as a new wave of surge forces affects regions with pre-existing crevasses. Lastly, continued deformation can render the crevassed area a region of the “chaos class”, where individual crevasses can no longer be traced back to individual deformation events. Over time, the surge expands upglacier and into marginal areas. Links to modeling are outlined.
A limitation of the current analysis is the small number of crevasse classes, chosen to more easily derive the first combined network that integrates the connectionist-geostatistical approach and a CNN. A classification that distinguishes up to 13 crevasse classes is in preparation by the authors’ group.
More generally, the specific glaciological results obtained in this paper demonstrate that geoscience and computer science are equally important disciplines in the development of physically constrained NNs (i.e., glaciology is not merely “domain knowledge”), in light of the goal to utilize modern observation technology to advance geophysical understanding.