Article

Entity Embeddings in Remote Sensing: Application to Deformation Monitoring for Infrastructure

1 Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, UK
2 Satellite Applications Catapult, Didcot OX11 0QR, UK
3 School of Geography and the Environment, University of Oxford, Oxford OX1 3QY, UK
4 Department of Computer Science, University of Oxford, Oxford OX1 3QG, UK
5 Department of Engineering, University of Cambridge, Cambridge CB3 0FA, UK
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(20), 4910; https://doi.org/10.3390/rs15204910
Submission received: 13 July 2023 / Revised: 26 September 2023 / Accepted: 9 October 2023 / Published: 11 October 2023

Abstract

There is a critical need for a global monitoring capability for Tailings Storage Facilities (TSFs) to help protect surrounding communities and the environment. Satellite Synthetic Aperture Radar Interferometry (InSAR) shows much promise towards this ambition. However, extracting meaningful information from InSAR data and interpreting its deformation patterns can be challenging. One approach to address this challenge is the use of data science techniques. In this study, the representation of InSAR metadata as Entity Embeddings within a Deep Learning framework (EE-DL) is investigated for modelling the spatio-temporal deformation response. Entity embeddings are commonly used in natural language processing tasks, where they represent discrete objects, such as words, as continuous, low-dimensional vectors that can be manipulated mathematically. We demonstrate that EE-DL can be used to predict anomalous patterns in InSAR time series. To evaluate the performance of the EE-DL approach for SAR interferometry, we conducted experiments over a mining test site (Cadia, Australia) that experienced a TSF failure. This study demonstrated that EE-DL can detect and predict the fine spatial movement patterns that eventually resulted in the failure. We also compared the results with deformation predictions from two common baseline models: Random Forest and Gaussian Process Regression (GPR). Both EE-DL and GPR greatly outperform Random Forest. While GPR is also able to predict displacement patterns with millimetric accuracy, it detects significantly fewer anomalies than EE-DL. Overall, our study showed that EE-DL is a promising approach for building InSAR-based early-warning systems that predict ground deformation at critical infrastructure.

1. Introduction

Mining is at the core of our global energy transition. An estimated three billion tons of metals will be required to produce renewable energy technologies [1,2]. Yet, in terms of the challenge posed to the extractive industries, this figure is small compared with the total amount of material that will need to be taken out of the ground: in copper mining, for example, over 98% of the extracted material is waste. The often-forgotten consequence of our metal-hungry future is therefore the handling of the enormous amount of resulting mine waste.
Mine waste is usually stored behind large earthen dams, known as “Tailings Storage Facilities” (TSFs). Because mining is one of the oldest industries, over 30,000 TSFs are estimated to exist worldwide, a quarter of which are abandoned and not monitored at all [3]. Even relatively well-monitored TSFs can fail, resulting in socio-environmental tragedies such as the Brumadinho failure in 2019 [4] and, most recently, the Jagersfontein failure in 2022 [5]. Unfortunately, the number of catastrophic TSF failures is increasing [6]. There is therefore a critical need for a global monitoring capability for the world’s TSFs.
A technology offering much promise towards this ambition is satellite Interferometric Synthetic Aperture Radar (InSAR). InSAR analysis enables the detection of millimetre-scale ground movements from satellites orbiting hundreds of kilometres above the Earth. The prospect of global monitoring with InSAR is imminent, as illustrated by a recent, bold initiative, the European Ground Motion Service [7,8], which has extended ambitious nationwide InSAR monitoring to the continental scale. Moreover, the applicability of advanced multi-temporal InSAR techniques [9] has been demonstrated for monitoring critical infrastructure such as bridges (e.g., [10,11]), dams (e.g., [12]), underground construction activities (e.g., [13]), and ground de-watering (e.g., [14]).
Although still relatively sparse, several studies have explored the applicability of InSAR for monitoring TSFs. A key research question has been whether precursory indications of structural failure are detectable from InSAR before TSF failures, such as those at Brumadinho (e.g., [15]) and Cadia [16,17,18,19]. Most studies have approached this question by analysing the robustness and limitations of particular custom post-processing chains that convert raw SAR data into InSAR products. For example, Grebby et al. (2021) [15] were able to detect deformation trends that were inconsistent with the expected consolidation settlement of the Brumadinho TSF. Although potentially anomalous measurements were also detected by [20], those authors concluded that the statistical significance of these anomalies was insufficient (due to the low signal-to-noise ratio) to reliably trigger an early-warning alert.
Nevertheless, there is a general consensus on the feasibility of detecting anomalous deformation from InSAR preceding the Cadia TSF failure [16,17,18]. Hudson et al. (2021) [18] demonstrated that the signal is discernible regardless of the raw SAR data type and resolution, from both medium-resolution (Sentinel-1) and high-resolution commercial (Radarsat-2) data. Carlà et al. (2019) [17] proposed an “inverse velocity method” for estimating the failure date from InSAR for a variety of slope instabilities, including the Cadia TSF failure. Their technique has been adopted by a variety of InSAR TSF-monitoring studies for both Cadia [16] and Brumadinho [15]. In those studies, anomalous measurements were identified qualitatively and visually, i.e., by checking whether any obvious clusters of high subsidence measurements occurred near the failure area. However, the location of the failure surface is not known without the benefit of hindsight. There is therefore a need for a robust approach to identifying anomalous InSAR deformation behaviour as it develops over time. This motivated the focus of this study: the detection of anomalous deformation measurements using a data-driven approach.
Recently, there has been significant interest in applying machine learning, especially deep learning (DL), to InSAR monitoring. DL models have been tasked with identifying the presence of deformation over volcanoes [21], tunnels or gas/water extraction sites [22], airports [23,24,25], and geological faults [26,27]. Both sequence-based Recurrent Neural Network (RNN) and image-based Convolutional Neural Network (CNN) approaches have been considered in the literature. Zhao et al. (2021) [25] treated InSAR deformation through time as purely sequential data within RNNs: each InSAR deformation measurement was handled separately, and the prediction of its trend and magnitude was based purely on its 2D temporal history. However, such approaches completely neglect the inherent spatial relationship between deformation measurements and require the generation of synthetic data. Synthetic data were also used to train the image-based CNNs in [22,24]. In Anantrasirichai et al. (2020) [22], synthetic data were used to classify deformation patterns arising from point- or line-based geometries, representing underground gas/water extraction or tunnelling, respectively. In Chen et al. (2020) [24], synthetic data were used to learn the non-linear relationship between clean and noisy interferograms to improve deformation data quality.
Moreover, various modifications of traditional CNN approaches have explored combining the spatial and temporal components. For example, in [23], a modified U-Net architecture ingested a sequence of 12 temporal InSAR deformation maps to predict the following 12 time steps. Similarly, in [27], auto-encoders trained on synthetic data extracted deformation signals directly from a sequence of nine interferograms to predict the next nine. Ma et al. (2020) [23] were able to predict both linear settlement and non-linear seasonal deformation from InSAR data over an airport. However, InSAR data are not spatially continuous, and measurement gaps are common. The interpolation needed to fill these gaps likely affects the performance of CNN-type approaches, which are known to struggle with data gaps [28].
In this paper, a novel methodology based on Entity Embeddings within a Deep Learning framework (EE-DL) is proposed for detecting anomalous InSAR deformation. Embeddings are similar to the “word vectors” used in natural language processing, providing a rich numerical representation of categorical variables. A variety of EE-DL applications have recently emerged, from predicting the final destinations of taxi rides using embeddings [29], to forecasting sales using store embeddings [30,31], to fusing data from different modalities based on shared embeddings [32]. In this study, embeddings were created to represent the categorical variables of the InSAR metadata. Entity embeddings capture both the spatial and temporal components of the InSAR measurements without requiring the generation of image data. This makes the model more flexible and avoids the typical DL difficulty of handling the spatial data gaps inherent in InSAR.
Because of the time-dependent settlement processes (i.e., consolidation) inherent to tailings, a certain level of deformation is expected in TSFs. The EE-DL model is therefore tasked with identifying whether that deformation is as expected, i.e., “normal”. The EE-DL model was benchmarked against a common probabilistic forecasting approach, Gaussian Process Regression (GPR), and a simple baseline Random Forest (RF) model. These models were chosen for their highly interpretable nature and because they span a spectrum of model complexity. RF is a widely adopted benchmark and a highly flexible shallow machine learning algorithm, and it provides feature importance scores that help in understanding which features contribute most to the deformation prediction. GPR, on the other hand, offers several advantages not captured by the EE-DL and RF approaches. These include the ability to encode prior knowledge and, owing to its probabilistic nature, an uncertainty metric for each prediction, which helps define anomalies (i.e., measurements outside the uncertainty envelope).
Finally, the performance of the EE-DL algorithm for identifying anomalous deformation behaviour is illustrated using the failed Cadia TSF as a case study. An in-depth analysis of a single case study was preferred over superficial evaluations of multiple TSFs. The depth and rigour of previous work [19], which provides an in-depth understanding of the failure mechanism and a geotechnical validation of the InSAR measurements, served as the foundation and validation for the input data in the present study.

2. Data and Software

SAR is an active sensing technique: the signal emitted by the satellite reflects off the ground surface and is then detected again by the same satellite. SAR satellites operate at wavelengths within the microwave part of the electromagnetic spectrum, and their data consist of the amplitude and phase components of the returned signal. Several processing schemes are possible; interferometry relies on generating interferograms, which use the phase difference between acquisitions taken at different times to estimate the movement of the ground surface [33]. Because the phase difference is used, the movement can be detected with high precision, as a fraction of the signal wavelength.
The available InSAR approaches can be broadly categorised as follows: (a) Persistent Scatterer (PS) interferometry [34], (b) Small Baseline Subset (SBAS) techniques and their variations [35], and (c) hybrid approaches combining PS and SBAS elements, such as StaMPS [36] and SqueeSAR [37]. PS-InSAR processing aims to detect strong, stable signals from dominant reflectors such as man-made structures and sharp-edged objects. SBAS techniques, on the other hand, are designed to detect the cumulative contribution of smaller, distributed scatterers in non-urban areas, such as agricultural fields, forests, and other diverse land covers. For more comprehensive reviews of these techniques, readers can refer to, e.g., [38,39].
Sentinel-1 is a SAR satellite mission of the European Space Agency’s Copernicus programme. It operates in the C-band, at a wavelength of approximately 5.5 cm. Sentinel-1 data have an intermediate spatial resolution of 5 m by 20 m and a temporal resolution (revisit time) of 12 days, depending on the location on Earth [40].
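To illustrate why phase differences give millimetric precision, the sketch below converts an unwrapped phase difference into a line-of-sight displacement using the standard repeat-pass relation d = λΔφ/(4π), where the factor 4π (rather than 2π) arises because the signal travels the path twice. The wavelength is derived from the Sentinel-1 C-band centre frequency of 5.405 GHz. The function name is hypothetical, and the sign convention (motion towards or away from the satellite), which differs between processors, is deliberately left out.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
SENTINEL1_FREQ_HZ = 5.405e9  # Sentinel-1 C-band centre frequency
WAVELENGTH_M = SPEED_OF_LIGHT_M_S / SENTINEL1_FREQ_HZ  # roughly 0.0555 m


def phase_to_los_mm(delta_phase_rad: float, wavelength_m: float = WAVELENGTH_M) -> float:
    """Convert an unwrapped phase difference (radians) into LOS displacement (mm).

    Two-way travel: a displacement d changes the path length by 2d, so
    delta_phase = 4 * pi * d / wavelength. No sign convention is applied here.
    """
    return delta_phase_rad * wavelength_m / (4.0 * math.pi) * 1000.0


# One full phase cycle (one fringe) corresponds to half a wavelength.
print(f"{phase_to_los_mm(2 * math.pi):.1f} mm per fringe")
# A phase change of only 0.1 rad already corresponds to sub-millimetre motion.
print(f"{phase_to_los_mm(0.1):.2f} mm for 0.1 rad")
```

The small displacement per radian is what makes millimetre-level sensitivity possible, provided the phase can be unwrapped and separated from atmospheric and other noise contributions.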
In this study, intermittent SBAS data derived from a 2.5-year Sentinel-1 descending stack were used [41]. Deformations are measured along the satellite Line-Of-Sight (LOS). Decomposing LOS deformation into horizontal and vertical components of motion requires data from two acquisition geometries. This was not possible for Cadia, where only one orbit direction (descending) was available from Sentinel-1 at the time of failure. Figure 1a plots the spatial distribution of the average LOS velocity measurements obtained from this stack over the Cadia TSFs. Clear subsidence (red) is visible along the inner rim of both the failed north TSF and the stable south TSF. Figure 1b is a zoom-in of the north TSF failure zone.
The InSAR data span a total of 68 dates, from 2015-12-02 to 2018-02-25 (YYYY-MM-DD). The last InSAR acquisition was 12 days before the north TSF failure on 9 March 2018. The across-dam temporal behaviour of the Cadia TSFs is illustrated in Figure 2a,b for the failed north TSF and the stable south TSF, respectively, where “downstream” indicates the direction towards the toe of the dam. Deformations were significantly higher in the upstream parts of the TSFs than in the downstream parts, owing to the thicker underlying tailings [19].
Depending on the type of InSAR processing, a variety of attributes may describe the ground deformation measurements. Here, the dependent variable, deformation, is predicted from the other attributes: the date, the geographical coordinates (latitude and longitude), and the coherence of the measurements. Coherence is an indicator of InSAR data quality, with higher coherence corresponding to more accurate deformation signals. Additional attributes, such as velocity or acceleration, can also be extracted from InSAR data. However, because this study focused on identifying anomalous deformations leading up to TSF failure, the emphasis was placed on the actual deformation observed at each time instance (i.e., LOS deformation) rather than on the average deformation rate. While attributes such as the deformation rate may be valuable in other scenarios, LOS deformation is likely a more direct and sensitive metric for the specific goal of this research. Deep learning training was implemented using the abstraction library FastAI [42] and the PyTorch framework [43], and Gaussian Process Regression was implemented in GPyTorch [44].

3. Methods

A common feature-engineering approach [42] was applied to the date variable of the InSAR metadata before it was ingested by the EE-DL and RF approaches. Instead of representing the date as a string such as “2023-05-22”, it was expanded into 13 additional columns, such as the day, year, day of the week, and number of days elapsed since 1970, to enrich the input and assist the machine learning model. These columns were calculated for each InSAR measurement date and used as inputs to the model.
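A minimal sketch of this date expansion is given below, assuming the 13 columns match those produced by fastai’s `add_datepart` helper (Year, Month, Week, Day, Dayofweek, Dayofyear, six start/end flags, and Elapsed). It uses only the standard library; the function name `expand_date` is hypothetical, and `Elapsed` is expressed in days since 1970 as described in the text (fastai’s own helper stores it in seconds).

```python
from datetime import date, timedelta


def expand_date(d: date) -> dict:
    """Expand one acquisition date into 13 model-friendly features (hypothetical helper)."""
    is_month_end = (d + timedelta(days=1)).month != d.month
    return {
        "Year": d.year,
        "Month": d.month,
        "Week": d.isocalendar()[1],          # ISO week number
        "Day": d.day,
        "Dayofweek": d.weekday(),            # Monday = 0, Sunday = 6
        "Dayofyear": d.timetuple().tm_yday,  # 1-based
        "Is_month_end": is_month_end,
        "Is_month_start": d.day == 1,
        "Is_quarter_end": is_month_end and d.month in (3, 6, 9, 12),
        "Is_quarter_start": d.day == 1 and d.month in (1, 4, 7, 10),
        "Is_year_end": d.month == 12 and d.day == 31,
        "Is_year_start": d.month == 1 and d.day == 1,
        "Elapsed": (d - date(1970, 1, 1)).days,  # days since 1970-01-01
    }


# The last Cadia acquisition used in the study.
print(expand_date(date.fromisoformat("2018-02-25")))
```

Expanding the date this way lets a model pick up, for example, seasonal effects through `Month` or `Dayofyear` while `Elapsed` preserves the overall time ordering.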
The redundancy and importance of each InSAR metadata feature were evaluated using a simple RF model. Figure 3 plots the relative importance of each feature for predicting deformation, using all data points over the full temporal domain. The three most important variables were the location of the measurement, described by the latitude (“x_lat”) and longitude (“x_lon”), and the “elapsed” variable, which summarises each date as the number of days elapsed since 1970. The RF hyper-parameters are specified in Table 1. The most important features were then used to guide the choice of variables taken forward into the EE-DL approach: latitude, longitude, elapsed, coherence, year, month, and day. The performance of the various approaches was evaluated using the Root-Mean-Squared Error (RMSE), given by Equation (1), where $x_i$ and $\hat{x}_i$ are the measured and predicted deformations, respectively, for measurement $i$, and $N$ is the number of measurements.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2} \tag{1}$$
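Equation (1) can be sketched in plain Python as follows; this is only an illustration of the metric, not the evaluation code used in the study, and the deformation values are hypothetical.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-squared error between measured and predicted deformations (Equation (1))."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(x - x_hat) ** 2 for x, x_hat in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical LOS deformations (mm) at four measurement points.
print(rmse([-1.0, -2.0, -3.0, -4.0], [-1.0, -2.0, -3.0, -8.0]))  # 2.0
```

Because the errors are squared before averaging, a single large miss (here 4 mm) dominates the result, which makes the RMSE sensitive to localised anomalous deformation.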

3.1. Entity Embedding

Entity embeddings capture both the spatial and temporal components of the InSAR measurements. Each category of the InSAR metadata was assigned a sequence of random numbers, known as latent factors. These randomly initialised latent factors were then modified by the DL workflow during training to predict deformation, so that they come to capture the relationships between different categories. The word “embedding” refers to the computational shortcut that links a particular entity in the InSAR metadata to its corresponding sequence of numbers.
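The lookup-and-update idea can be sketched without any deep learning library. In PyTorch, the same role is played by `torch.nn.Embedding`, whose rows are updated by backpropagation; the class below is a hypothetical, dependency-free illustration, not the FastAI implementation used in the study. The initialisation scale (0.01) and the example gradient are assumptions.

```python
import random
from typing import Dict, Hashable, Iterable, List


class EmbeddingTable:
    """Hypothetical minimal entity-embedding table: one trainable vector per category."""

    def __init__(self, categories: Iterable[Hashable], size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Map each distinct category (e.g. a month value) to a row index.
        self.index: Dict[Hashable, int] = {}
        for category in categories:
            if category not in self.index:
                self.index[category] = len(self.index)
        # Randomly initialised latent factors, one row per category.
        self.weights: List[List[float]] = [
            [rng.gauss(0.0, 0.01) for _ in range(size)] for _ in range(len(self.index))
        ]

    def __call__(self, category: Hashable) -> List[float]:
        """The 'embedding' lookup: category -> its current vector of latent factors."""
        return self.weights[self.index[category]]

    def sgd_update(self, category: Hashable, grad: List[float], lr: float) -> None:
        """Apply one gradient step to a single category's row, as training would."""
        row = self.weights[self.index[category]]
        for k, g in enumerate(grad):
            row[k] -= lr * g


months = EmbeddingTable(range(1, 13), size=6)
print(months(3))                              # random latent factors for March
months.sgd_update(3, [1.0] * 6, lr=0.1)       # training nudges only March's vector
print(months(3))
```

Only the rows for categories that appear in a training batch receive gradient updates, which is why each category ends up with its own learned representation.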
Based on preliminary experiments, the best performance was obtained by treating all the InSAR metadata as categorical, except for one continuous variable, “elapsed”. Accordingly, no embedding was created for “elapsed”. The entities chosen as categorical variables from the InSAR metadata are listed in Table 2. The cardinality is the number of unique categories of each entity; for example, the cardinality of “Month” is 12. The embedding size is the number of values in the vector representing each category, so that entities with higher cardinality receive larger embeddings, which helps capture subtle non-linearities. Deep learning frameworks typically set the embedding size heuristically as a power of the cardinality. For example, Google/TensorFlow have adopted an embedding size of $\gamma^{0.25}$, where $\gamma$ is the cardinality [45]. In this study, the embedding size proposed in FastAI [31] was adopted, namely $1.6\gamma^{0.52}$.
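The two sizing rules can be compared with a short sketch. The coefficient and exponent follow the text (1.6 and 0.52); note that fastai’s library function `emb_sz_rule` uses an exponent of 0.56 and caps the size at 600, so the exponent is left as a parameter here. The cardinality of 31 for “Day” is an assumption for illustration.

```python
def emb_size(cardinality: int, coeff: float = 1.6, exponent: float = 0.52) -> int:
    """Embedding size following the rule quoted in the text: 1.6 * cardinality ** 0.52."""
    return max(1, round(coeff * cardinality ** exponent))


def emb_size_fourth_root(cardinality: int) -> int:
    """The fourth-root heuristic attributed to Google/TensorFlow: cardinality ** 0.25."""
    return max(1, round(cardinality ** 0.25))


for name, cardinality in [("Month", 12), ("Day", 31)]:
    print(name, cardinality, emb_size(cardinality), emb_size_fourth_root(cardinality))
```

The power-law rule grows faster than the fourth-root rule, giving high-cardinality entities such as latitude and longitude noticeably larger vectors.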

3.2. Fully Connected DL Architecture

A simple, fully connected neural network performed best with entity embeddings in the Kaggle-winning model [29], outperforming more complex approaches, including recurrent neural networks and their variants. Therefore, for simplicity, a fully connected architecture was adopted for training the InSAR embeddings. Figure 4 illustrates the fully connected architecture adopted for this study. This model has ten hidden layers, each made up of Linear–ReLU–Batch Norm–Dropout layers. The first hidden layer has 1100 neurons, and the number of neurons decreases by 100 in each subsequent hidden layer, down to 200 in the tenth. The hidden-layer structure and neuron-size ratio are similar to the fully connected architectures adopted for training embeddings in [30,31]. The output layer is a linear layer with a sigmoid activation function.
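The layer widths and dropout rates described here and in the next paragraph can be laid out programmatically, as in the sketch below. The dropout values come from the text; the input width (concatenated embedding sizes plus the one continuous variable) is a hypothetical placeholder. The output head is sketched as a sigmoid rescaled to an assumed deformation range, similar in spirit to fastai’s `SigmoidRange`; whether the study rescaled the sigmoid in exactly this way is an assumption.

```python
import math
from typing import List, Tuple


def hidden_sizes(n_layers: int = 10, first: int = 1100, step: int = 100) -> List[int]:
    """Hidden-layer widths: start at `first` neurons and shrink by `step` per layer."""
    return [first - i * step for i in range(n_layers)]


def layer_plan(n_inputs: int, sizes: List[int], first_dropout: float = 0.001,
               dropout: float = 0.01) -> List[Tuple[int, int, float]]:
    """(in_features, out_features, dropout) for each Linear-ReLU-BatchNorm-Dropout block."""
    plan, width = [], n_inputs
    for i, size in enumerate(sizes):
        plan.append((width, size, first_dropout if i == 0 else dropout))
        width = size
    return plan


def scaled_sigmoid(z: float, low: float, high: float) -> float:
    """Squash the final linear output into [low, high] with a numerically stable sigmoid."""
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    return low + (high - low) * s


sizes = hidden_sizes()
print(sizes)  # [1100, 1000, 900, 800, 700, 600, 500, 400, 300, 200]
# Hypothetical input width: 50 embedding values + 1 continuous variable ("elapsed").
for block in layer_plan(51, sizes)[:3]:
    print(block)
```

Bounding the output with a sigmoid keeps predictions within a plausible deformation range, at the cost of having to choose that range in advance.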
Regularisation techniques such as Batch Normalisation (Batch Norm), weight decay, and Dropout have been shown to be vital for deep learning model generalisation and training performance [47]. Batch Norm consists of a multiplicative term and an additive, bias-like term [48], and Santurkar et al. (2018) [49] demonstrated its substantial regularising effect. Therefore, a Batch Norm layer is applied to the continuous variables and is included in the hidden layers, as illustrated in Figure 4. The embedding Dropout was set to 0.04, the Dropout in the first hidden layer to 0.001, and the Dropout in the remaining hidden layers to 0.01. Stochastic gradient descent with momentum was adopted to minimise the loss function, a simple mean-squared error between predictions and measurements. A one-cycle learning-rate policy with cosine annealing was used, with a maximum learning rate of 0.01.
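A sketch of a one-cycle learning-rate schedule with cosine warm-up and cosine annealing, peaking at the stated maximum of 0.01, is shown below. The warm-up fraction (`pct_start=0.25`) and the start and end divisors (`div=25`, `div_final=1e5`) are assumed to match fastai’s `fit_one_cycle` defaults, which the text does not state; fastai also cycles momentum in the opposite direction, which is omitted here.

```python
import math


def cos_anneal(start: float, end: float, pos: float) -> float:
    """Cosine interpolation from `start` (pos = 0) to `end` (pos = 1)."""
    return start + (1.0 + math.cos(math.pi * (1.0 - pos))) * (end - start) / 2.0


def one_cycle_lr(step: int, total_steps: int, lr_max: float = 0.01,
                 pct_start: float = 0.25, div: float = 25.0,
                 div_final: float = 1e5) -> float:
    """Learning rate at `step` of a one-cycle schedule (assumed fastai-style defaults)."""
    pos = step / max(total_steps - 1, 1)
    if pos < pct_start:
        # Warm-up phase: rise from lr_max / div to lr_max.
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    # Annealing phase: fall from lr_max to lr_max / div_final.
    return cos_anneal(lr_max, lr_max / div_final, (pos - pct_start) / (1.0 - pct_start))


schedule = [one_cycle_lr(s, 101) for s in range(101)]
print(f"start={schedule[0]:.2e}, peak={max(schedule):.2e}, end={schedule[-1]:.2e}")
```

The short warm-up avoids large, destabilising updates early in training, while the long cosine decay lets the network settle into a good minimum at a very small final learning rate.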

3.3. Gaussian Process Regression

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution; it is completely described by its mean and covariance (i.e., kernel) functions [50]. Gaussian Process Regression estimates the distribution of functions $f(x)$ that are most likely given the training data. The mean $m(x)$ of those functions gives the most likely prediction, while their spread is described by the covariance $k(x_i, x_j)$ between the $i$-th and $j$-th random variables. The covariance function therefore provides the uncertainty envelope, a confidence interval around $m(x)$. As a result, GPR provides not only a prediction of InSAR deformation at a particular location but also the uncertainty associated with that prediction.
GPR handles noise by introducing an extra variable representing the noise variance in the model [51]. An exact mathematical treatment of Gaussian noise within GPR carries a high computational load, and it has been shown to compare well with computational shortcuts. In short, the noise variance is inferred from the data and represented as an extra hyper-parameter, which is trained alongside the other hyper-parameters.
Critically, prior knowledge can be encoded through the choice of kernel. The commonly used Radial Basis Function (RBF) kernel was adopted here and is described by Equation (2). Its length-scale parameter $l$ describes the smoothness of the function: it controls over what distances measurements are treated as similar, so that a large $l$ results in a smoother function. Figure 5 shows how well the RBF kernel captures the InSAR deformation. Figure 5a shows that the RBF kernel on its own tends to revert to zero and therefore does not adequately capture the expected behaviour of the InSAR deformation. This tendency was overcome by combining the RBF kernel with a linear kernel, as shown in Figure 5b. The hyper-parameters of GPR models influence their predictive performance; they were therefore optimised by maximising the likelihood function using gradient descent methods.
$$K_{\mathrm{RBF}}(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^{2}}{2l^{2}}\right) \tag{2}$$
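To make the kernel combination concrete, the NumPy sketch below performs exact GPR prediction with a zero mean and an RBF-plus-linear kernel, using Equation (2) for the RBF term, and returns a 95% interval (mean ± 1.96 standard deviations) that includes the noise variance. Unlike the study, which optimised the hyper-parameters in GPyTorch, the length scale, kernel variances, and noise variance here are fixed, hypothetical values, and time is expressed in years since the first acquisition so that the linear term starts from zero cumulative deformation. In GPyTorch, a comparable kernel could be composed as `ScaleKernel(RBFKernel()) + LinearKernel()`.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, lengthscale: float, variance: float) -> np.ndarray:
    """Equation (2) scaled by a signal variance: variance * exp(-(a - b)^2 / (2 l^2))."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-sq_dist / (2.0 * lengthscale ** 2))


def linear_kernel(a: np.ndarray, b: np.ndarray, variance: float) -> np.ndarray:
    """Linear kernel variance * a * b, which lets the prior carry a linear trend."""
    return variance * a[:, None] * b[None, :]


def combined_kernel(a, b, lengthscale=0.3, rbf_var=4.0, lin_var=100.0):
    """RBF + linear kernel with hypothetical, fixed hyper-parameters."""
    return rbf_kernel(a, b, lengthscale, rbf_var) + linear_kernel(a, b, lin_var)


def gp_predict(t_train, y_train, t_test, noise_var=1.0):
    """Exact GP posterior: mean and 95% interval for new measurements at t_test."""
    k_train = combined_kernel(t_train, t_train) + noise_var * np.eye(len(t_train))
    k_cross = combined_kernel(t_train, t_test)
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross.T @ alpha
    v = np.linalg.solve(chol, k_cross)
    var = np.diag(combined_kernel(t_test, t_test)) - np.sum(v ** 2, axis=0) + noise_var
    std = np.sqrt(np.maximum(var, 0.0))
    return mean, mean - 1.96 * std, mean + 1.96 * std


# Hypothetical series: 24 acquisitions 12 days apart with steady -10 mm/year subsidence.
t = np.arange(24) * 12.0 / 365.25   # years since the first acquisition
y = -10.0 * t                        # cumulative LOS deformation (mm)
mean, lower, upper = gp_predict(t[:20], y[:20], t[20:])  # last four dates held out
normal = (y[20:] >= lower) & (y[20:] <= upper)
shifted = y[20:] - 15.0              # the same points with a hypothetical 15 mm extra drop
anomalous = (shifted < lower) | (shifted > upper)
print(normal, anomalous)
```

Because the linear kernel carries the trend, the prediction keeps extrapolating the subsidence instead of drifting back to zero, which is the behaviour contrasted in Figure 5a,b.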

3.4. Defining Anomalous Deformation Behaviour

A deformation behaviour may be described as anomalous if a measurement falls outside the “normal” variation of the data. For GPR, the boundary between normal and anomalous behaviour is defined by the 95% confidence interval. The EE-DL and RF approaches do not provide prediction uncertainty; therefore, for the purpose of comparing the models’ ability to detect anomalous behaviour, a measurement is flagged as an anomaly if it deviates from the prediction by more than the background noise level of the data. This background noise level is estimated using the RMSE between the measurements and the predictions of the EE-DL approach.
The anomalies are then clustered spatially, so that anomalies adjacent to other anomalies are given more importance than isolated ones. A minimum of four anomalies, an arbitrarily chosen number, was required to form an anomalous cluster. For simplicity, anomalous clusters inside the failure area were classed as true positives, and any outside the immediate failure area were treated as false positives. This simple definition of anomalous behaviour was adopted intentionally to showcase the potential of the algorithms in a direct and concise manner.
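The anomaly rule for EE-DL and RF and the minimum cluster size of four can be sketched as follows. The residual threshold uses the ±4 mm half-width of the 8 mm envelope adopted in Section 4.3. The text does not specify how “adjacent” is defined, so neighbours are assumed here to be flagged points within a hypothetical `radius` (in metres, in a projected coordinate system), grouped by breadth-first search into connected clusters; the points and residuals are made up.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # projected easting/northing in metres (assumed)


def flag_anomalies(residuals_mm: Sequence[float], half_width_mm: float = 4.0) -> List[bool]:
    """Flag measurements whose |measured - predicted| exceeds the noise half-width."""
    return [abs(r) > half_width_mm for r in residuals_mm]


def anomalous_clusters(points: Sequence[Point], flags: Sequence[bool],
                       radius: float, min_size: int = 4) -> List[List[int]]:
    """Group flagged points chained together within `radius`; keep clusters of >= min_size."""
    flagged = [i for i, is_anomaly in enumerate(flags) if is_anomaly]
    seen = set()
    clusters = []
    for start in flagged:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            i = queue.popleft()
            members.append(i)
            for j in flagged:
                if j not in seen and math.dist(points[i], points[j]) <= radius:
                    seen.add(j)
                    queue.append(j)
        if len(members) >= min_size:
            clusters.append(sorted(members))
    return clusters


points = [(0, 0), (10, 0), (20, 0), (30, 0), (500, 500), (40, 0), (1000, 0)]
residuals = [5.0, -6.0, 7.0, 4.5, 9.0, 1.0, -8.0]
print(anomalous_clusters(points, flag_anomalies(residuals), radius=15.0))  # [[0, 1, 2, 3]]
```

The two isolated anomalies (indices 4 and 6) are discarded, which is the intended effect of requiring spatial support before an anomaly is reported.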

4. Results and Discussion

4.1. Prediction Performance of EE-DL Model

Deep learning experiments were successfully undertaken on the embeddings of the InSAR deformation metadata. The temporal behaviour of the EE-DL predictions is shown in Figure 6, which presents the error between the InSAR measurements and the predicted deformations for a variety of experiments. The input timescale and prediction horizon (i.e., the number of forecast steps into the future) of each experiment are detailed in Table 3. In these experiments, a step corresponds to one prediction date and equals the temporal resolution of the SAR data, which is 12 days over Cadia. For one-step-ahead prediction, inference starts at $n + 1$, where $n$ is the last date used in training. Similarly, for four-step-ahead prediction (Run-C), once the four-step-ahead prediction was complete, the data up to $n + 4$ were added back into the training dataset and the model was re-trained. The model was therefore re-trained after each new set of predictions.
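The re-training scheme can be sketched as a walk-forward loop, assuming that the measurements for all predicted dates are added back before re-training. The `fit_predict` callable stands in for training the EE-DL model and forecasting `horizon` dates ahead; the persistence model below is only a hypothetical placeholder that keeps the sketch runnable. Indices are zero-based: training uses dates 0 to n − 1, and prediction starts at date n (the text’s n + 1).

```python
from typing import Callable, List, Sequence, Tuple

FitPredict = Callable[[Sequence[float], int], List[float]]


def walk_forward(series: Sequence[float], first_train_end: int, horizon: int,
                 fit_predict: FitPredict) -> List[Tuple[int, List[float]]]:
    """Train on dates [0, n), predict the next `horizon` dates, then add them and re-train."""
    results = []
    n = first_train_end
    while n < len(series):
        steps = min(horizon, len(series) - n)
        predictions = fit_predict(series[:n], steps)  # re-trained on all data so far
        results.append((n, predictions))
        n += steps  # measurements for the predicted dates join the training set
    return results


def persistence(train: Sequence[float], steps: int) -> List[float]:
    """Placeholder model (not EE-DL): repeat the last training value."""
    return [train[-1]] * steps


series = [0.0, -1.0, -2.0, -3.0, -4.0, -5.0, -6.0, -7.0, -8.0, -9.0, -10.0, -11.0]
print(walk_forward(series, first_train_end=5, horizon=4, fit_predict=persistence))
# [(5, [-4.0, -4.0, -4.0, -4.0]), (9, [-8.0, -8.0, -8.0])]
```

With `horizon=1` the same loop reproduces the one-step-ahead setting of Run-B, re-training after every new acquisition.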
For a typical stable location not experiencing deformation (Figure 6a), the absolute error between predictions and measurements was within 6 mm for all experiments, independent of the time frame and prediction horizon. However, the error behaved very differently for a typical measurement from the failure area, plotted in Figure 6b. There, the error remained within an 8 mm envelope (i.e., ±4 mm) before almost doubling over the last three dates (2018-01-20 onwards), up to the last acquisition before failure (2018-02-25). This is a clear sign of anomalous deformation behaviour arising from the TSF foundation issues to which the failure was attributed [19].
The minimum number of temporal data points required as training data for acceptable EE-DL performance was also explored. Figure 7 plots the global RMSE between the InSAR measurements and the predictions for each inference date. The smallest number of input dates used for training was in Run-A, which had an input of 21 dates (2015-12-02 to 2016-08-10). The RMSE for Run-A was noisy for the first 13 predictions, until 2017-02-06, after which its variability decreased significantly, plateauing at approximately 3 mm. This implies that a minimum of 35 dates, from 2015-12-02 to 2017-01-25, is required to obtain stable deformation predictions. The data up to 2017-02-06 were used for training Run-B and Run-C. The longest input, 56 dates from 2015-12-02 to 2017-10-04, was used for Run-D.
As expected, the RMSE of all runs gradually rose above the 3 mm plateau approaching the failure date in March 2018, as illustrated in Figure 7. This supports the hypothesis that an increasing amount of anomalous behaviour was present before failure. There was a sudden jump in the RMSE on 2017-10-16 for all runs, likely due to inaccuracies in the InSAR processing, such as in the separation of the atmospheric phase component. Run-B operated at a one-step-ahead ($n + 1$) and Run-C at a four-step-ahead ($n + 4$) prediction horizon. The variability in the RMSE of the four-step-ahead horizon was larger at the 3 mm plateau, but this difference disappeared as the RMSE gradually increased for all experiments from 2017-10-28 to 2018-02-25. Overall, these results show that the EE-DL model performed well for both one-step- and four-step-ahead predictions. The four-step-ahead prediction horizon was therefore adopted for comparing the performance of EE-DL with the GPR and RF models, and the predictions for the last four temporal data points were used to compare the models’ ability to detect anomalous deformation behaviour.

4.2. Prediction Performance of GPR

The temporal histories of example InSAR measurements and GPR predictions are plotted in Figure 8. In these plots, the InSAR measurements from the last four dates were excluded from training and used as test data. Figure 8a displays a linear, relatively simple deformation behaviour. The test measurements (blue) may be described as displaying “normal” behaviour, because they fall within the uncertainty envelope. A more complex, non-linear deformation behaviour is shown in Figure 8b, where all test measurements fall significantly outside the uncertainty envelope, indicating anomalous deformation behaviour.
Measurements displaying such anomalous behaviour in the last four temporal points were then clustered, and the number of anomalous clusters detected at each date is summarised in Figure 9. As expected, a significant number of anomalous clusters was detected in the failure area of the TSF for all dates; these are the true positives in the “T” column. However, this came at the expense of a much larger number of false positives (“F” column), which limited the usefulness of the GPR uncertainty envelope for localising the problematic part of the TSF, i.e., the failure area.

4.3. Comparison of EE-DL, GPR, and RF

The results of the EE-DL and GPR approaches were compared with the baseline RF model. The global error between the measured and predicted deformation at test time is plotted in Figure 10. As expected, the simplest approach, RF, performed the poorest, with a consistently higher RMSE than the other models for all test dates (Figure 10a). The best-performing model was GPR, which had the lowest RMSE at all time steps, and the EE-DL performance was intermediate between RF and GPR for all dates. The RMSEs reached up to approximately 4 mm, similar to the ±4 mm background prediction noise level seen in Figure 6. A level of ±4 mm is consistent with the background noise of InSAR measurements reported in previous studies, e.g., [52,53]. Therefore, an error envelope of 8 mm (±4 mm) was adopted as the threshold separating normal from anomalous deformation behaviour. As with the RMSE, the Maximum Error (MAXE) was highest for RF, which had the highest MAXE for all dates (Figure 10b). Interestingly, the shape of the MAXE curve over the last three dates was consistent across all models, with the highest MAXE on 2018-02-13 and the lowest on 2018-02-25, the last acquisition before the failure.
A spatial visualisation of the anomalous clusters captured by the various models is given in Figure 11a–c. All models successfully captured anomalous clusters in the failure area of the northern TSF. However, the anomalous clusters for (a) EE-DL and (b) GPR are much more focused around the slump area than those for (c) RF. The RF map (Figure 11c) contains many anomalous clusters outside the immediate slump area, spread across all parts of the northern TSF, which does not help narrow down the area of potential problems. The RF model also detected a small number of anomalous clusters in the stable south TSF. Anomalous cluster 1 in Figure 11c was detected only by RF and not by the other two approaches, which potentially limits its reliability, whereas cluster 2 was detected by all models, giving more confidence in its anomalous nature.
All the algorithms detected some false positives. It is important to note, however, that not every instance flagged as a false positive in Figure 11 and Figure 12 is necessarily incorrect in identifying anomalous behaviour. Although these measurements lie outside the slump area, they may still exhibit genuinely anomalous behaviour unrelated to the failure, for instance, behaviour stemming from other construction processes. Interpreting the location and timing of anomalies outside the slump area is beyond the scope of this study; further clarification may be gained by combining InSAR with other data sources. The definition of true and false positives was intentionally simplified to enable a clear and concise comparison across the wide spectrum of algorithms employed in this study.
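This section does not detail how the clusters behind Figures 9, 11 and 12 are formed. The sketch below is one plausible reading, not the authors' procedure, and several of its choices are assumptions:

- the anomaly flags for a single date lie on a regular grid;
- flagged cells are grouped by 8-connectivity;
- a cluster is kept only if it has at least four measurements, the minimum given in the Figure 9 caption;
- a cluster counts as a true positive if any of its cells falls inside the failure area.

All function names are hypothetical.

```python
from collections import deque

MIN_CLUSTER_SIZE = 4  # Figure 9 caption: at least four anomalous measurements

def find_clusters(anomalous, min_size=MIN_CLUSTER_SIZE):
    """Group flagged grid cells into 8-connected clusters (assumed grid layout).

    anomalous: 2-D list of booleans (rows x cols).
    Returns a list of clusters, each a list of (row, col) tuples.
    """
    rows, cols = len(anomalous), len(anomalous[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if not anomalous[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the 8 neighbours of each flagged cell.
            queue, members = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                members.append((i, j))
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and anomalous[ni][nj] and not seen[ni][nj]):
                            seen[ni][nj] = True
                            queue.append((ni, nj))
            if len(members) >= min_size:  # discard small, isolated groups
                clusters.append(members)
    return clusters

def count_true_false(clusters, failure_mask):
    """Count clusters touching the failure area (T) versus elsewhere (F)."""
    true_pos = sum(any(failure_mask[i][j] for i, j in cl) for cl in clusters)
    return true_pos, len(clusters) - true_pos

# Toy date: "X" marks an anomalous measurement; the failure area is the top-left block.
grid = ["XX....", "XX...X", ".....X", ".....X", "X....X"]
anomalous = [[ch == "X" for ch in row] for row in grid]
failure = [[i <= 1 and j <= 2 for j in range(6)] for i in range(5)]
clusters = find_clusters(anomalous)
tp, fp = count_true_false(clusters, failure)
```

The toy grid yields two clusters of four cells, one true and one false positive. The lone flagged cell in the bottom-left corner is discarded.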
The numbers of true and false anomalous clusters detected by each model are summarised in Figure 12. EE-DL detected the most true anomalous clusters, with a total of 69, compared with 59 for RF and 26 for GPR. RF consistently detected the most false positives, a total of 177, compared with 59 for EE-DL and 30 for GPR. Moreover, RF had more false positives than true positives on every date, potentially limiting its ability to locate the failure area. Although GPR detected the fewest false positives, this came at the expense of the total number of anomalies detected. EE-DL therefore detected the most true positives of all the approaches, and, unlike RF, it did not do so at the expense of a large number of false positives.
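As a worked check on the Figure 12 totals, the share of each model's clusters that fall inside the slump area can be computed directly from the reported counts. This is a precision-like ratio under the simplified true/false definition above. The ratios are derived here for illustration and are not reported in the study.

```python
# True/false anomalous-cluster totals as reported in the text (Figure 12).
counts = {
    "EE-DL": {"true": 69, "false": 59},
    "RF": {"true": 59, "false": 177},
    "GPR mean": {"true": 26, "false": 30},
}

shares = {}
for model, c in counts.items():
    total = c["true"] + c["false"]
    shares[model] = c["true"] / total  # fraction of clusters inside the slump area
    print(f"{model}: {c['true']}/{total} = {shares[model]:.2f}")
# EE-DL: 69/128 = 0.54
# RF: 59/236 = 0.25
# GPR mean: 26/56 = 0.46
```

On this ratio, roughly 54% of EE-DL clusters, 46% of GPR-mean clusters and 25% of RF clusters lie inside the slump area. EE-DL combines this share with the largest absolute number of true positives.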

5. Conclusions

A novel methodology based on Entity Embeddings and Deep Learning (EE-DL) was proposed for detecting anomalous InSAR deformation. Embeddings of InSAR metadata were used to train a fully connected neural network. The entity embeddings captured both the spatial and temporal components of the InSAR measurements while avoiding the typical challenges of applying DL to InSAR data. The feasibility of the EE-DL approach was demonstrated on a failed TSF case study, in which it detected anomalous deformation behaviour preceding failure. The performance of the EE-DL model was compared with that of a probabilistic approach, GPR, and a simple baseline model, RF.
The comparison showed that the EE-DL approach performed better than both GPR and RF at detecting anomalous ground deformation from interferometric time series. The novel use of entity embeddings provided accurate geolocation of anomalous areas that could be subject to failure. The GPR uncertainty envelope detected the most true anomalous clusters; however, these came at the expense of a large number of false positives, limiting its ability to geolocate the failure area. The EE-DL and RF approaches do not have an associated probabilistic uncertainty envelope. The background noise level of the predictions, an RMSE of ±4 mm, was therefore used to define an anomaly threshold. When this definition of anomaly was applied to the GPR mean predictions, GPR performed better than it had with its uncertainty envelope. EE-DL nonetheless performed better than GPR, because the improvement in GPR performance came at the expense of a significantly lower detection of total anomalies, both true and false positives. EE-DL also performed better than RF, as RF consistently detected more false positives than true positives.
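A small exact GP can illustrate the two anomaly definitions compared above. The study used GPyTorch [44] with a combined RBF and linear kernel (Figure 5). The sketch below swaps in a plain NumPy implementation with several simplifications:

- fixed, hand-picked hyperparameters, with no training;
- a synthetic, noise-free linear series;
- an assumed 1.96σ (about 95%) predictive interval, because the interval width is not restated in this section.

It is a toy, not a reproduction of the study's GPR.

```python
import numpy as np

def rbf_plus_linear(a, b, rbf_var=1.0, length=3.0, lin_var=1.0):
    """Sum of an RBF kernel and a linear (dot-product) kernel on 1-D time inputs."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    rbf = rbf_var * np.exp(-0.5 * (a - b) ** 2 / length ** 2)
    return rbf + lin_var * a * b  # linear term lets predictions follow a trend

def gp_predict(t_train, y_train, t_test, noise_var=0.25):
    """Exact GP posterior mean and predictive std (observation noise included)."""
    K = rbf_plus_linear(t_train, t_train) + noise_var * np.eye(len(t_train))
    K_s = rbf_plus_linear(t_train, t_test)   # shape (n_train, n_test)
    K_ss = rbf_plus_linear(t_test, t_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^-1 y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0) + noise_var
    return mean, np.sqrt(var)

# Synthetic linear subsidence of -0.5 mm per 12-day step (illustrative only).
t_train = np.arange(20.0)
y_train = -0.5 * t_train
t_test = np.array([20.0, 20.0, 20.0])
y_test = np.array([-10.0, -13.2, -25.0])  # on trend, 3.2 mm off, 15 mm off

mean, std = gp_predict(t_train, y_train, t_test)
residual = np.abs(y_test - mean)
envelope_flags = residual > 1.96 * std   # outside the assumed ~95% interval
threshold_flags = residual > 4.0         # outside the +/-4 mm noise level
```

Here the 3.2 mm residual falls outside the narrow predictive envelope but inside the ±4 mm band. This is a toy picture of why the fixed threshold reduced the number of GPR detections.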
The conclusions from this study set a promising direction for the establishment of early-warning systems for infrastructure failure with remote-sensing-derived data. These systems, alongside ground-based instrumentation, geotechnical simulation, and inspection, can create a monitoring ecosystem aimed at mitigating and, ideally, preventing the catastrophic consequences of TSF failures.

Author Contributions

Conceptualisation, M.B., C.R. and B.S.; methodology, M.B.; software, M.B.; validation, M.B.; formal analysis, M.B.; investigation, M.B.; resources, M.B.; data curation, M.B.; writing—original draft preparation, M.B.; writing—review and editing, supervision, C.R., F.K. and B.S.; project administration, M.B.; funding acquisition, M.B., C.R. and B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Royal Commission for the Exhibition of 1851 and the Royal Academy of Engineering under the Industrial Fellowship and Research Fellowship schemes, respectively.

Data Availability Statement

Third-party data restrictions apply to the availability of these data. The data were obtained from Terra Motion Limited and are available from the authors with the permission of Terra Motion Limited.

Acknowledgments

The authors gratefully acknowledge the financial support of the Royal Commission for the Exhibition of 1851 and the Royal Academy of Engineering. The authors also thank Terra Motion Limited for providing ISBAS data over the test site.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hund, K.; La Porta, D.; Fabregas, T.; Laing, T.; Drexhage, J. Minerals for Climate Action: The Mineral Intensity of the Clean Energy Transition. Climate-Smart Mining Facility; World Bank: Washington, DC, USA, 2020.
  2. Rossi, C.; Bateson, L.; Bayaraa, M.; Butcher, A.; Ford, J.; Hughes, A. Framework for remote sensing and modelling of lithium-brine deposit formation. Remote Sens. 2022, 14, 1383.
  3. WMTF. 2020. Available online: https://worldminetailingsfailures.org/estimate-of-world-tailings-portfolio-2020/ (accessed on 13 July 2023).
  4. Robertson, P.; de Melo, L.; Williams, D.J.; Wilson, G.W. Report of the Expert Panel on the Technical Causes of the Failure of Feijão Dam I. 2020. Available online: http://www.b1technicalinvestigation.com/ (accessed on 13 July 2023).
  5. Torres-Cruz, L.A.; O’Donovan, C. Public remotely sensed data raise concerns about history of failed Jagersfontein dam. Sci. Rep. 2023, 13, 4953.
  6. Bowker, L.N.; Chambers, D.M. The risk, public liability, & economics of tailings storage facility failures. Earthwork Act 2015, 24, 1–56.
  7. Crosetto, M.; Solari, L.; Mróz, M.; Balasis-Levinsen, J.; Casagli, N.; Frei, M.; Oyen, A.; Moldestad, D.A.; Bateson, L.; Guerrieri, L.; et al. The evolution of wide-area DInSAR: From regional and national services to the European Ground Motion Service. Remote Sens. 2020, 12, 2043.
  8. Siegmund, R.; Brcic, R.; Kotzerke, P.; Eineder, M. The European Ground Motion Service EGMS—Processing Central Europe with First Results on Quality and Point Densities. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 5105–5108.
  9. Even, M.; Schulz, K. InSAR deformation analysis with distributed scatterers: A review complemented by new advances. Remote Sens. 2018, 10, 744.
  10. Selvakumaran, S.; Plank, S.; Geiß, C.; Rossi, C.; Middleton, C. Remote monitoring to predict bridge scour failure using Interferometric Synthetic Aperture Radar (InSAR) stacking techniques. Int. J. Appl. Earth Obs. Geoinf. 2018, 73, 463–470.
  11. Selvakumaran, S.; Rossi, C.; Marinoni, A.; Webb, G.; Bennetts, J.; Barton, E.; Plank, S.; Middleton, C. Combined InSAR and Terrestrial Structural Monitoring of Bridges. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7141–7153.
  12. Milillo, P.; Bürgmann, R.; Lundgren, P.; Salzer, J.; Perissin, D.; Fielding, E.; Biondi, F.; Milillo, G. Space geodetic monitoring of engineered structures: The ongoing destabilization of the Mosul dam, Iraq. Sci. Rep. 2016, 6, 37408.
  13. Macchiarulo, V.; Milillo, P.; DeJong, M.J.; Gonzalez Marti, J.; Sanchez, J.; Giardina, G. Integrated InSAR monitoring and structural assessment of tunnelling-induced building deformations. Struct. Control Health Monit. 2021, 28, e2781.
  14. Cigna, F.; Tapete, D. Satellite InSAR survey of structurally-controlled land subsidence due to groundwater exploitation in the Aguascalientes Valley, Mexico. Remote Sens. Environ. 2021, 254, 112254.
  15. Grebby, S.; Sowter, A.; Gluyas, J.; Toll, D.; Gee, D.; Athab, A.; Girindran, R. Advanced analysis of satellite data reveals ground deformation precursors to the Brumadinho Tailings Dam collapse. Commun. Earth Environ. 2021, 2, 2.
  16. Thomas, A.; Edwards, S.; Engels, J.; McCormack, H.; Hopkins, V.; Holley, R. Earth observation data and satellite InSAR for the remote monitoring of tailings storage facilities: A case study of Cadia Mine, Australia. In Proceedings of the 22nd International Conference on Paste, Thickened and Filtered Tailings, Cape Town, South Africa, 8–10 May 2019; Australian Centre for Geomechanics: Perth, Australia, 2019; pp. 183–195.
  17. Carlà, T.; Intrieri, E.; Raspini, F.; Bardi, F.; Farina, P.; Ferretti, A.; Colombo, D.; Novali, F.; Casagli, N. Perspectives on the prediction of catastrophic slope failures from satellite InSAR. Sci. Rep. 2019, 9, 14137.
  18. Hudson, R.; Sato, S.; Morin, R.; McParland, M.A. Comparison of Sentinel-1 and RADARSAT-2 Data for Monitoring of Tailings Storage Facilities. In Proceedings of the EUSAR 2021—13th European Conference on Synthetic Aperture Radar, Online, 29 March–1 April 2021; pp. 1–6.
  19. Bayaraa, M.; Sheil, B.; Rossi, C. InSAR and numerical modelling for tailings dam monitoring—The Cadia failure case study. Géotechnique, 2022; ahead of print.
  20. Holden, D.; Donegan, S.; Pon, A. Brumadinho Dam InSAR study: Analysis of TerraSAR-X, COSMO-SkyMed and Sentinel-1 images preceding the collapse. In Proceedings of the 2020 International Symposium on Slope Stability in Open Pit Mining and Civil Engineering, Perth, Western Australia, 12–14 May 2020; Australian Centre for Geomechanics: Perth, Australia, 2020; pp. 293–306.
  21. Anantrasirichai, N.; Biggs, J.; Albino, F.; Bull, D. A deep learning approach to detecting volcano deformation from satellite imagery using synthetic datasets. Remote Sens. Environ. 2019, 230, 111179.
  22. Anantrasirichai, N.; Biggs, J.; Kelevitz, K.; Sadeghi, Z.; Wright, T.; Thompson, J.; Achim, A.M.; Bull, D. Detecting Ground Deformation in the Built Environment using Sparse Satellite InSAR data with a Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2020, 59, 2940–2950.
  23. Ma, P.; Zhang, F.; Lin, H. Prediction of InSAR time-series deformation using deep convolutional neural networks. Remote Sens. Lett. 2020, 11, 137–145.
  24. Chen, Y.; Bruzzone, L.; Jiang, L.; Sun, Q. ARU-Net: Reduction of Atmospheric Phase Screen in SAR Interferometry Using Attention-Based Deep Residual U-Net. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5780–5793.
  25. Zhao, Z.; Wu, Z.; Zheng, Y.; Ma, P. Recurrent neural networks for atmospheric noise removal from InSAR time series with missing values. ISPRS J. Photogramm. Remote Sens. 2021, 180, 227–237.
  26. Brengman, C.M.; Barnhart, W.D. Identification of Surface Deformation in InSAR Using Machine Learning. Geochem. Geophys. Geosyst. 2021, 22, e2020GC009204.
  27. Rouet-Leduc, B.; Jolivet, R.; Dalaison, M.; Johnson, P.A.; Hulbert, C. Autonomous extraction of millimeter-scale deformation in InSAR time series using deep learning. Nat. Commun. 2021, 12, 6480.
  28. Anantrasirichai, N.; Biggs, J.; Albino, F.; Hill, P.; Bull, D. Application of Machine Learning to Classification of Volcanic Deformation in Routinely Generated InSAR Data. J. Geophys. Res. Solid Earth 2018, 123, 6592–6606.
  29. De Brébisson, A.; Simon, É.; Auvolat, A.; Vincent, P.; Bengio, Y. Artificial neural networks applied to taxi destination prediction. arXiv 2015, arXiv:1508.00021.
  30. Guo, C.; Berkhahn, F. Entity embeddings of categorical variables. arXiv 2016, arXiv:1604.06737.
  31. Howard, J.; Gugger, S. Deep Learning for Coders with Fastai and PyTorch; O’Reilly Media: Sebastopol, CA, USA, 2020.
  32. Girdhar, R.; El-Nouby, A.; Liu, Z.; Singh, M.; Alwala, K.V.; Joulin, A.; Misra, I. ImageBind: One embedding space to bind them all. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 15180–15190.
  33. Bamler, R.; Hartl, P. Synthetic aperture radar interferometry. Inverse Probl. 1998, 14, R1.
  34. Ferretti, A.; Prati, C.; Rocca, F. Permanent scatterers in SAR interferometry. IEEE Trans. Geosci. Remote Sens. 2001, 39, 8–20.
  35. Berardino, P.; Fornaro, G.; Lanari, R.; Sansosti, E. A new algorithm for surface deformation monitoring based on small baseline differential SAR interferograms. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2375–2383.
  36. Hooper, A. A multi-temporal InSAR method incorporating both persistent scatterer and small baseline approaches. Geophys. Res. Lett. 2008, 35.
  37. Ferretti, A.; Fumagalli, A.; Novali, F.; Prati, C.; Rocca, F.; Rucci, A. A new algorithm for processing interferometric data-stacks: SqueeSAR. IEEE Trans. Geosci. Remote Sens. 2011, 49, 3460–3470.
  38. Crosetto, M.; Monserrat, O.; Cuevas-González, M.; Devanthéry, N.; Crippa, B. Persistent Scatterer Interferometry: A review. ISPRS J. Photogramm. Remote Sens. 2015, 115, 78–89.
  39. Lanari, R.; Casu, F.; Manzo, M.; Zeni, G.; Berardino, P.; Manunta, M.; Pepe, A. An overview of the small baseline subset algorithm: A DInSAR technique for surface deformation analysis. In Deformation and Gravity Change: Indicators of Isostasy, Tectonics, Volcanism, and Climate Change; Birkhäuser: Basel, Switzerland, 2007; pp. 637–661.
  40. Torres, R.; Snoeij, P.; Geudtner, D.; Bibby, D.; Davidson, M.; Attema, E.; Potin, P.; Rommen, B.; Floury, N.; Brown, M.; et al. GMES Sentinel-1 mission. Remote Sens. Environ. 2012, 120, 9–24.
  41. Sowter, A.; Bateson, L.; Strange, P.; Ambrose, K.; Syafiudin, M.F. DInSAR estimation of land motion using intermittent coherence with application to the South Derbyshire and Leicestershire coalfields. Remote Sens. Lett. 2013, 4, 979–987.
  42. Howard, J.; Gugger, S. Fastai: A layered API for deep learning. Information 2020, 11, 108.
  43. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
  44. Gardner, J.; Pleiss, G.; Weinberger, K.Q.; Bindel, D.; Wilson, A.G. GPyTorch: Blackbox matrix-matrix Gaussian process inference with GPU acceleration. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Volume 31.
  45. Introducing TensorFlow Feature Columns. 2017. Available online: https://developers.googleblog.com/2017/11/introducing-tensorflow-feature-columns.html (accessed on 13 July 2023).
  46. NN-SVG Tool. Available online: http://alexlenail.me/NN-SVG/index.html (accessed on 13 July 2023).
  47. He, T.; Zhang, Z.; Zhang, H.; Zhang, Z.; Xie, J.; Li, M. Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 558–567.
  48. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
  49. Santurkar, S.; Tsipras, D.; Ilyas, A.; Madry, A. How does batch normalization help optimization? arXiv 2018, arXiv:1805.11604.
  50. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2.
  51. McHutchon, A.; Rasmussen, C. Gaussian process training with input noise. In Proceedings of the 24th International Conference on Neural Information Processing Systems, Granada, Spain, 12–15 December 2011; Volume 24.
  52. Bonì, R.; Herrera, G.; Meisina, C.; Notti, D.; Béjar-Pizarro, M.; Zucca, F.; González, P.J.; Palano, M.; Tomás, R.; Fernández, J.; et al. Application of multi-sensor advanced DInSAR analysis to severe land subsidence recognition: Alto Guadalentín Basin (Spain). Proc. Int. Assoc. Hydrol. Sci. 2015, 372, 45–48.
  53. Cigna, F.; Osmanoğlu, B.; Cabral-Cano, E.; Dixon, T.H.; Ávila-Olivera, J.A.; Garduño-Monroy, V.H.; DeMets, C.; Wdowinski, S. Monitoring land subsidence and its induced geological hazard with Synthetic Aperture Radar Interferometry: A case study in Morelia, Mexico. Remote Sens. Environ. 2012, 117, 146–161.
Figure 1. Spatial view of the InSAR data over Cadia TSFs. (a) Illustration of InSAR velocity measurements averaged over the whole SAR data stack (from 2015-12-02 to 2018-02-25). Basemap image copyright © ESA Sentinel-2 data. ISBAS InSAR data © Terra Motion. (b) Zoom-in of the failed area. Basemap image copyright © 2021 Maxar Technologies © CNES/Airbus © Google Earth.
Figure 2. Indicative across-TSF temporal plot of InSAR Line-Of-Sight (LOS) deformation: (a) NTSF slump area and (b) stable south TSF. Downstream indicates the location of the dam toe, and upstream indicates the direction towards the deposited tailings.
Figure 3. The importance of each InSAR metadata feature in predicting deformation, as determined by the Random Forest model.
Figure 4. The fully connected architecture used in this study. The first hidden layer has 1100 neurons, and the number of neurons decreases by 100 at each subsequent hidden layer. The visualisation was generated with NN-SVG [46].
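To make the entity-embedding architecture concrete, the following is a minimal NumPy forward pass. The study cites fastai and PyTorch [42,43]; this sketch is not the authors' model. The embedding cardinalities and widths come from Table 2, and the hidden widths follow the Figure 4 pattern. Several details are assumptions:

- the number of hidden layers (three here);
- the omission of batch normalisation and dropout;
- the random, untrained weights.

All names are hypothetical.

```python
import numpy as np

# (cardinality, embedding size) per entity, taken from Table 2.
EMBEDDINGS = {"coherence": (1016, 77), "longitude": (248, 35), "latitude": (205, 32),
              "day": (30, 11), "month": (12, 6), "year": (4, 3)}

def hidden_widths(n_layers, first=1100, step=100):
    """Figure 4 pattern: 1100 neurons, then 100 fewer at each hidden layer."""
    return [first - step * i for i in range(n_layers)]

rng = np.random.default_rng(0)
# One lookup table per entity: row k is the learned vector for category k.
tables = {name: rng.normal(0, 0.01, size=(card, dim))
          for name, (card, dim) in EMBEDDINGS.items()}

def forward(batch, weights):
    """batch: dict of integer category codes, each of shape (n,). Returns (n,) outputs."""
    # Look up and concatenate each entity's embedding vector.
    x = np.concatenate([tables[name][batch[name]] for name in EMBEDDINGS], axis=1)
    for W, b in weights[:-1]:
        x = np.maximum(x @ W + b, 0.0)   # fully connected layer + ReLU
    W, b = weights[-1]
    return (x @ W + b)[:, 0]            # linear output: predicted deformation

input_dim = sum(dim for _, dim in EMBEDDINGS.values())
widths = [input_dim] + hidden_widths(3) + [1]   # three hidden layers: hypothetical
weights = [(rng.normal(0, np.sqrt(2 / m), size=(m, n)), np.zeros(n))
           for m, n in zip(widths[:-1], widths[1:])]

# A random batch of five measurements, each described by six categorical codes.
batch = {name: rng.integers(0, card, size=5) for name, (card, _) in EMBEDDINGS.items()}
pred = forward(batch, weights)
```

The concatenated embeddings give a 164-dimensional input (77 + 35 + 32 + 11 + 6 + 3). In training, the lookup tables would be updated by backpropagation together with the dense layers.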
Figure 5. Comparison of GPR kernels in capturing the InSAR deformation behaviour: (a) Radial Basis Function (RBF) kernel and (b) combined RBF and linear kernels. Adding a linear kernel removes the tendency of the RBF kernel to revert to zero.
Figure 6. Absolute error (grey) between the EE-DL predictions and the InSAR deformation measurements. The dashed lines mark the absolute error associated with the background noise level in the data. (a) An example of an InSAR measurement not experiencing deformation. (b) A measurement displaying 40 mm of deformation and clear anomalous behaviour immediately preceding failure. Note the sudden jump in absolute error over the last four dates.
Figure 7. The global Root-Mean-Squared Error (RMSE) for different input timescales and prediction horizons. All experiments except Run-C used a one-step-ahead prediction horizon; Run-C used a four-step-ahead horizon.
Figure 8. Temporal plots of example InSAR measurements and the GPR predictions. (a) A typical linear deformation with all test measurements within the uncertainty envelope, therefore displaying a “normal” behaviour. (b) An example of anomalous deformation behaviour with the test measurements falling significantly outside the uncertainty envelope.
Figure 9. Measurements falling outside the GPR confidence interval are clustered and counted. A cluster consists of at least four anomalous measurements. Anomalous clusters inside the failure area are counted as true positives, “T”; all others are counted as false positives, “F”.
Figure 10. Global error metrics for the various models. (a) RMSE and (b) Maximum Error (MAXE).
Figure 11. Spatial distribution of anomalous clusters detected on the last four test dates for (a) EE-DL, (b) Gaussian Process Regression mean, and (c) Random Forest baseline model. The failure area of the northern TSF is outlined. Slumping occurred in two stages; the black outline precedes white. Basemap image copyright © 1995–2020 Esri.
Figure 12. Summary count of the number of anomalous clusters detected by (a) EE-DL, (b) GPR mean, and (c) RF.
Table 1. Random Forest hyper-parameter setup.
Random Forest hyper-parameter    Value
n_estimators                     40
max_samples                      200,000
min_samples_leaf                 5
max_features                     0.5
Table 2. The embedding size is calculated from the cardinality of each entity. The cardinality is the number of unique categories representing each entity.
Entity       Cardinality    Embedding size
Coherence    1016           77
Longitude    248            35
Latitude     205            32
Day          30             11
Month        12             6
Year         4              3
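Table 2 does not state the rule used to size the embeddings. However, every size listed is reproduced by the default heuristic in the fastai library cited in this study, min(600, round(1.6 · n^0.56)) for cardinality n. Whether the authors relied on exactly this function is an assumption. The standalone version below mirrors it without importing fastai.

```python
def emb_sz_rule(n_cat: int) -> int:
    """Rule-of-thumb embedding width for a categorical variable with n_cat levels."""
    return min(600, round(1.6 * n_cat ** 0.56))

# Cardinalities from Table 2.
cardinalities = {"Coherence": 1016, "Longitude": 248, "Latitude": 205,
                 "Day": 30, "Month": 12, "Year": 4}
sizes = {name: emb_sz_rule(n) for name, n in cardinalities.items()}
print(sizes)
# {'Coherence': 77, 'Longitude': 35, 'Latitude': 32, 'Day': 11, 'Month': 6, 'Year': 3}
```

The sub-linear exponent keeps high-cardinality entities such as coherence compact, and the cap of 600 only applies to far larger vocabularies than those used here.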
Table 3. EE-DL input training data with different lengths of temporal data, as described by the start and end dates. The total number of dates used in training and the prediction horizon (Steps, i.e., forecast steps into the future) are also specified. Consecutive dates are 12 days apart.
Run      Start        End          Total dates    Steps
Run-A    2015-12-02   2016-08-10   21             1
Run-B    2015-12-02   2017-01-25   35             1
Run-C    2015-12-02   2017-01-25   35             4
Run-D    2015-12-02   2017-10-04   56             1
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bayaraa, M.; Rossi, C.; Kalaitzis, F.; Sheil, B. Entity Embeddings in Remote Sensing: Application to Deformation Monitoring for Infrastructure. Remote Sens. 2023, 15, 4910. https://doi.org/10.3390/rs15204910

AMA Style

Bayaraa M, Rossi C, Kalaitzis F, Sheil B. Entity Embeddings in Remote Sensing: Application to Deformation Monitoring for Infrastructure. Remote Sensing. 2023; 15(20):4910. https://doi.org/10.3390/rs15204910

Chicago/Turabian Style

Bayaraa, Maral, Cristian Rossi, Freddie Kalaitzis, and Brian Sheil. 2023. "Entity Embeddings in Remote Sensing: Application to Deformation Monitoring for Infrastructure" Remote Sensing 15, no. 20: 4910. https://doi.org/10.3390/rs15204910

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
