Automatic Classification of Active Deformation Areas Based on Synthetic Aperture Radar Data and Environmental Covariates Using Machine Learning—Application in SE Spain †

: Deformation processes, both natural (e


Introduction
The detection and classification of active deformation areas is a novel approach that allows non-expert users of InSAR to integrate SAR-based products into risk management. Bonì et al. [1] and Barra et al. [2] established the initial methodologies for the automatic detection of Active Deformation Areas (ADAs) using GIS tools. Bonì et al. [3] implemented their methodology using ArcGIS, while Navarro et al. [4] implemented Barra's methodology in C++ as ADAfinder, a software package with a graphical user interface (V2.0.9 is the latest version and is available free of charge on request). ADAfinder determines active Deformational Time Series (DTS) through standard deviation thresholds, isolation distance, and average velocity. It then groups them into polygonal clusters (ADAs), whose dimensions depend on parameters such as the defined influence radius and the minimum number of DTS required to form an ADA. Additionally, ADAfinder calculates a quality index for each ADA.
Tomás et al. [5] developed ADAclassifier (V2.0.9 is the latest version and is available free of charge on request), a software package that determines whether the deformation of an ADA is related, potentially related, or unrelated to a landslide, sinkhole, subsidence, or settlement process. The classification is made using a heuristic decision tree based on intersection thresholds with inventories of processes (landslides, subsidence, and sinkholes), infrastructure, and geological variables (Quaternary deposits and saline-carbonate soils/rocks), as well as thresholds for the horizontal velocity, the slope, and the correlation coefficient of the fit of the DTS to a negative exponential function.
Recently, Festa et al. [6] proposed a machine learning-based methodology to classify DTS (instead of ADAs) into three processes: subsidence, landslide, and deformation related to mining. In this methodology, a random forest is trained with morphometric variables (slope, aspect, elevation, Topographic Wetness Index (TWI), profile curvature, general curvature, and plan curvature), variables related to inventories (distance to landslides and mining sites), a geological variable (lithology), and a variable called KVH, which describes the ratio between horizontal (E-W) and vertical velocity and is useful for distinguishing landslides from subsidence.
In this study, we combined the inherent advantages of each approach to achieve the automatic classification of deformation processes using machine learning in a large area of approximately 17,500 km² in southeastern Spain. This region encompasses a significant part of the Region of Murcia, as well as the provinces of Alicante and Almería (Figure 1). The study area exhibits a wide range of geological materials, including predominantly metamorphic hard rocks (HR) and unconsolidated sedimentary deposits (USD) (Figure 2).

Methodology
Figure 3 provides an overview of the methodology employed in this research. We used ground deformation measurements obtained by processing descending Sentinel-1 SAR data for the Region of Murcia and its surroundings, covering the period from 2015 to 2021. To select and label each measurement point or persistent scatterer (PS hereafter), each of which corresponds to a DTS related to deformation processes, we intersected the DTS with national process inventories/catalogs and with polygons resulting from previous SAR-based analysis and interpretation. For the labeled DTS of each deformation process, we applied the elbow method to determine the optimal number of clusters (k) for both the K-means and K-shape algorithms. The Soft_DTW algorithm served as the distance metric in both cluster analyses. We then used thresholds to identify and eliminate noisy and stable clusters that were not associated with deformation processes.
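The elbow criterion used to choose k can be illustrated with a minimal sketch. It assumes that the clustering has already been run for a range of k values and that the within-cluster inertia of each run is available. The knee is taken as the point farthest from the straight line joining the first and last points of the curve, which is one common heuristic and not necessarily the rule applied in this study. The function name `elbow_k` and the inertia values are hypothetical.

```python
import math


def elbow_k(ks, inertias):
    """Return the k at the 'elbow' of an inertia curve.

    Heuristic (an assumption, not the authors' stated rule): after scaling
    both axes to [0, 1], pick the point with the largest perpendicular
    distance to the chord joining the first and last points.
    """
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (k, inertia) pairs")

    k_min, k_max = ks[0], ks[-1]
    i_min, i_max = min(inertias), max(inertias)
    if k_max == k_min or i_max == i_min:
        raise ValueError("curve is degenerate")

    # Normalise so that k and inertia contribute on comparable scales.
    xs = [(k - k_min) / (k_max - k_min) for k in ks]
    ys = [(v - i_min) / (i_max - i_min) for v in inertias]

    # Chord from the first to the last point of the normalised curve.
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    chord = math.hypot(x1 - x0, y1 - y0)

    def distance(x, y):
        # Perpendicular distance from (x, y) to the chord.
        return abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / chord

    best = max(range(len(ks)), key=lambda i: distance(xs[i], ys[i]))
    return ks[best]


# Hypothetical inertia values for k = 2..9 (illustrative only, not study data).
candidate_ks = list(range(2, 10))
example_inertias = [1000.0, 720.0, 520.0, 360.0, 200.0, 180.0, 165.0, 155.0]
chosen_k = elbow_k(candidate_ks, example_inertias)
```

In practice the inertia values would come from the time-series clustering itself; the sketch only covers the selection of k from that curve.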
Subsequently, we constructed a database by associating the values of each of the 26 variables (Table 1) with their respective DTS. We combined the temporal information from the displacement and hydrological time series into a single aggregated variable using statistical techniques. The thematic and continuous maps were included in the database as either categorical or numerical variables. To handle the large number of lithological classes and prevent redundancy, we reclassified the GEODE into eight classes (Figure 2) based on their geotechnical characteristics. Finally, we trained decision tree-based ML algorithms to generate an optimal model capable of classifying DTS according to their deformation process.

Results
We identified a total of 58 deformation processes: 39 mining slides (L_M), 12 landslides (L), 5 cases of dump subsidence (Su_Du), and 2 cases of groundwater subsidence (Su_Dw) (Figure 1a). By intersecting the descending PS data with the deformation processes, we extracted and labeled 20,499 DTS. The vast majority of these series (97%) corresponded to subsidence caused by groundwater extraction (Su_Dw). We identified noisy and stable time series for each deformation process by clustering the time series. Figure 4a displays the clustering results obtained for Su_Dw. Using the elbow technique, we identified six clusters. Applying thresholds on the mean absolute deviation and mean velocity, we determined that cluster ID 3 was the only one related to the deformation process. Therefore, we eliminated the 5456 time series of the other clusters, located at the valley edges (Figure 4b).
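The threshold-based filtering of clusters can be sketched as follows. This is a minimal illustration under stated assumptions: the mean velocity is estimated as the least-squares slope of each cluster centroid, a cluster is kept only if it exceeds both thresholds, and the threshold values, function names, and example centroids are hypothetical rather than taken from the study.

```python
from statistics import mean


def mean_velocity(times, displacements):
    """Least-squares slope of displacement against time (e.g. mm/year)."""
    t_bar, d_bar = mean(times), mean(displacements)
    num = sum((t - t_bar) * (d - d_bar) for t, d in zip(times, displacements))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    centre = mean(values)
    return mean([abs(v - centre) for v in values])


def active_clusters(times, centroids, min_abs_velocity, min_mad):
    """Return the IDs of clusters whose centroid exceeds both thresholds.

    Assumption: both statistics must exceed their threshold; how the two
    criteria are combined in the study is not specified here.
    """
    kept = []
    for cluster_id, series in centroids.items():
        velocity = mean_velocity(times, series)
        mad = mean_absolute_deviation(series)
        if abs(velocity) >= min_abs_velocity and mad >= min_mad:
            kept.append(cluster_id)
    return kept


# Hypothetical centroids sampled once per year (displacement in mm).
years = [0, 1, 2, 3, 4, 5]
example_centroids = {
    1: [0.0, 0.5, -0.5, 0.2, -0.2, 0.1],        # stable
    2: [0.0, 8.0, -7.0, 9.0, -8.0, 1.0],        # noisy, no clear trend
    3: [0.0, -10.0, -20.0, -30.0, -40.0, -50.0],  # steady subsidence
}
kept_ids = active_clusters(years, example_centroids,
                           min_abs_velocity=5.0, min_mad=5.0)
```

Because the sketch tests the absolute velocity, it would not reject series with inverse trends on its own; a signed velocity test would be needed for that case.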
After filtering, we obtained 15,043 DTS related to deformation processes, which formed the database. Because most of them belonged to the Su_Dw class, we applied the synthetic minority over-sampling technique (SMOTE) to generate samples from the minority classes and balance the data. We used the random forest algorithm for classification. The model achieved a perfect classification, with an AUC of 1.0 in the test set, as observed in the confusion matrix of Figure 5a. Hydrological, geological, morphometric, and geotechnical variables proved to be the most relevant for the classification model (Figure 5b). Specifically, the presence of groundwater masses, distance to faults, slope, percentage of sand and clay, lithology, soil bulk density, Vs30, and geological age were the most determining variables, while variables related to displacement, hazards, and land cover were less important.
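The idea behind SMOTE can be illustrated with a minimal sketch that interpolates between a minority sample and one of its nearest minority neighbours. It is a simplified version for numeric features only, written for illustration; the function name `smote`, its parameters, and the example points are hypothetical, and the study's actual implementation and settings are not specified here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples by interpolating minority points.

    Each synthetic sample lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of the base point, excluding the point itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples with two numeric covariates.
minority_points = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 3.0]]
new_points = smote(minority_points, n_new=6, k=2)
```

Since every synthetic point is a convex combination of two existing minority points, it stays within the range of the observed minority data. Categorical covariates such as lithology cannot be interpolated this way and would need a variant designed for mixed data.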

Figure 2. Geological context of the study area.

Figure 3. Methodology flowchart. (1) Extraction and labeling of DTS. (2) Clustering and filtering of DTS. (3) Creation of database. (4) Generation of classification model of deformation process using ML. Nc: Noise cluster; Sc: Stable cluster; Oc: Other clusters not related to the main process.

Figure 4. Results of DTS filtering. Groundwater subsidence in Lorca, SE Spain. (a) Centroids of the clusters generated with the K-shape algorithm and the statistics associated with the filtering thresholds (mean absolute deviation and mean velocity). ID 3 is the only cluster that exceeds the filtering thresholds. (b) Spatial representation of the clustering. Comparing the pink geometries in the corner maps allows us to identify the DTS to be removed (green points corresponding to clusters other than ID 3): noisy DTS in red (circle), stable DTS in yellow (rectangle), and DTS with inverse trends in blue (rectangle).

Table 1. Covariates of the proposed national database, classified according to their research domain.
Each covariate is associated with a superscript number that indicates features such as resolution and variable type in the table. N: nominal categorical; O: ordinal categorical; C: continuous numerical; D: discrete numerical.