Machine Learning for Deﬁning the Probability of Sentinel-1 Based Deformation Trend Changes Occurrence

: The continuous monitoring of displacements occurring on the Earth surface by exploiting MTInSAR (Multi Temporal Interferometry SAR) Sentinel-1 data is a solid reality, as testified by the ongoing operational ground motion service in the Tuscany region (Central Italy). In this framework, anomalies of movement, i.e., accelerations or deceleration as seen by the time series of displacement of radar targets, are identified. In this work, a Machine Learning algorithm such as the Random Forest has been used to assess the probability of occurrence of the anomalies induced by slope instability and subsidence. About 20,000 anomalies (about 7000 and 13,000 for the slope instability and the subsidence, respectively) were collected between 2018 and 2020 and were used as input, while ten different variables were selected, five related to the morphological and geological setting of the study area and five to the radar characteristics of the data. The resulting maps may provide useful indications of where a sudden change of displacement trend may occur, analyzing the contribution of each factor. The cross-validation with the anomalies collected in a following timespan (2020–2021) and with official landslide and subsidence inventories provided by the regional authority has confirmed the reliability of the final maps. The adoption of a map for assessing the probability of the occurrence of MTInSAR anomalies may serve as an enhanced geohazard prevention measurement, to be periodically updated and refined in order to have the most precise knowledge possible of the territory.


Introduction
The detection and mapping of geological hazards are paramount activities for land management and risk reduction policies. Indeed, geohazards can result in the loss of human lives, extensive physical damage to buildings and critical infrastructure. According to the Emergency Events Database (https://public.emdat.be/, accessed on 18 February 2022), in 2021 about 250 geohazards have occurred worldwide, claiming the lives of more than 150 people and affecting almost 20 million people in total, while Franceschini, et al. [1] found that over 13,000 landslide events were reported in Italy from 2010 to 2019, with more than 300 of these in the Tuscany region. Ground motion events, such as landslides and subsidence, can be easily monitored by a wide spectrum of techniques, depending on the scale of analysis and the intensity of the deformation. Earth observation spaceborne practices, and especially InSAR and MTInSAR methods (Interferometry Synthetic Aperture Radar and Multi-Temporal InSAR), have demonstrated their suitability for a wide range of slow-moving ground motion phenomena and the variety of scale of analysis, i.e., continental [2,3], national [4][5][6][7], regional [8][9][10][11][12][13] and very local [14][15][16][17][18], performing very well either for the monitoring and analysis of subsidence [19,20] or landslides [21][22][23]. The present abundance of SAR imagery, the simultaneous activity of many SAR systems and the shortening of the revisit time, in particular with the launch of the Sentinel-1 (S-1)

Input Data
The data used for the application of the Random Forest (RF) model can be split into two groups: the anomalies inventory; and the environmental and radar variables used as predisposing factors (PF). The flowchart in Figure 2 summarizes the use of the input data within the implemented model and provides a conceptual model of the work carried out.

Input Data
The data used for the application of the Random Forest (RF) model can be split into two groups: the anomalies inventory; and the environmental and radar variables used as predisposing factors (PF). The flowchart in Figure 2 summarizes the use of the input data within the implemented model and provides a conceptual model of the work carried out.
Remote Sens. 2022, 14, x FOR PEER REVIEW 5 of 22 Figure 2. Flowchart of the conceptual framework.

Anomalies Inventory
The regional continuous monitoring services benefit from Sentinel-1 (both A and B) data. The initial analysis was conducted by acquiring and processing, through the usage of SqueeSAR technique [24], a second-generation PSInSAR algorithm capable of processing long temporal stacks of SAR images, and, acquired over the same area, the entire ESA images archive of Sentinel-1 imagery. Unlike the PSInSAR approach, however, the SqueeSAR technique allows one to measure displacements by exploiting both point-wise coherent scatterers (i.e., the Persistent Scatterers, PS) and partially coherent Distributed Scatterers (DS). The basic idea of SqueeSAR is to identify sets of pixels that share the same kind of radar backscattered signal, i.e., statistically homogeneous pixels identified through statistical tests. SqueeSAR analysis is designed to identify a sparse grid of measurement points (MP), for which it is possible to estimate the following parameters: displacement time series (TS) along the satellite line of sight (LOS), displacement rate (in mm/year) and a set of quality parameters. The time series (TS) of displacement is generated, and abrupt changes in the trend (as shown further), i.e., the anomalies, are identified and evaluated. Datasets used in this service include Sentinel-1 C-band images (central frequency 5.4 GHz and wavelength 5.6 cm) acquired in both ascending and descending geometry, with the right-side looking configuration of the SAR system. The TS screening activity, necessary to identify the anomalies, is based on the addition of two consecutive S-1 images, thus every 12 days. After the latest S-1 acquisition, the TS of each measurement point is calculated and any trend changes are recognized if there is a difference due to a certain velocity variation (Δv), higher than a selected threshold in a given temporal window (Δt). Four different trend changes are detected: accelerating and decelerating trends with movements towards or away from the satellite view. The Δt was set in 150-days and the velocity thresholds Δv at 10 mm/year after a tuning period. The anomaly generation starts from a SqueeSAR dataset of more than 900,000 MP, acquired along the LOS direction for each geometry of acquisition ( Figure 3). Further details about the processing and the anomalies generation can be found in Raspini, et al. [26], where the algorithm is thoroughly described, and in Confuorto, et al. [28], where a summary of one year of operational service in Tuscany is reported. The final stage of the operational service is the interpretation of the anomaly, according to the main triggers. For this work, anomalies col-

Anomalies Inventory
The regional continuous monitoring services benefit from Sentinel-1 (both A and B) data. The initial analysis was conducted by acquiring and processing, through the usage of SqueeSAR technique [24], a second-generation PSInSAR algorithm capable of processing long temporal stacks of SAR images, and, acquired over the same area, the entire ESA images archive of Sentinel-1 imagery. Unlike the PSInSAR approach, however, the SqueeSAR technique allows one to measure displacements by exploiting both point-wise coherent scatterers (i.e., the Persistent Scatterers, PS) and partially coherent Distributed Scatterers (DS). The basic idea of SqueeSAR is to identify sets of pixels that share the same kind of radar backscattered signal, i.e., statistically homogeneous pixels identified through statistical tests. SqueeSAR analysis is designed to identify a sparse grid of measurement points (MP), for which it is possible to estimate the following parameters: displacement time series (TS) along the satellite line of sight (LOS), displacement rate (in mm/year) and a set of quality parameters. The time series (TS) of displacement is generated, and abrupt changes in the trend (as shown further), i.e., the anomalies, are identified and evaluated. Datasets used in this service include Sentinel-1 C-band images (central frequency 5.4 GHz and wavelength 5.6 cm) acquired in both ascending and descending geometry, with the right-side looking configuration of the SAR system. The TS screening activity, necessary to identify the anomalies, is based on the addition of two consecutive S-1 images, thus every 12 days. After the latest S-1 acquisition, the TS of each measurement point is calculated and any trend changes are recognized if there is a difference due to a certain velocity variation (∆v), higher than a selected threshold in a given temporal window (∆t). Four different trend changes are detected: accelerating and decelerating trends with movements towards or away from the satellite view. The ∆t was set in 150-days and the velocity thresholds ∆v at 10 mm/year after a tuning period. The anomaly generation starts from a SqueeSAR dataset of more than 900,000 MP, acquired along the LOS direction for each geometry of acquisition ( Figure 3). Further details about the processing and the anomalies generation can be found in Raspini, et al. [26], where the algorithm is thoroughly described, and in Confuorto, Remote Sens. 2022, 14, 1748 5 of 21 et al. [28], where a summary of one year of operational service in Tuscany is reported. The final stage of the operational service is the interpretation of the anomaly, according to the main triggers. For this work, anomalies collected over the whole Tuscan territory between 1 January 2018 and April 2021 have been used. As previously mentioned, SI and S anomalies have been taken into consideration (excluding Mining Activity, MA, Geothermal Activity, GA, Uplift, U, Dump Site, D, Not Determined, ND, and Noisy, R, anomalies, see Confuorto, et al. [28] for further details about the classification), for a total of 7175 points for the first category (2962 along the ascending and 4213 along the descending orbit, Figure 4a,b) and 13,971 for the second (3758 ascending and 10,213 descending, Figure 4c,d).
In order to validate the probability of occurrence maps with a statistically independent sub-dataset, part of the databases have been extrapolated. In detail, SI anomalies collected after January 2020, for a total of 1087 (727 + 360, Figure 4a,b), and S anomalies between May 2020 and April 2021, counting 327 points (121 + 206, Figure 4c,d), were used for statistically assessing the consistency of each final map of this study. All the anomalies not belonging to the category of the assigned input data, (e.g., for SI anomalies used as input, all the remnant anomalies classified differently, including thus S, MA, GA, U, D, ND and R anomalies) were used as absence data in the RF algorithm, counting for about 29,000 points in the case of subsidence and about 26,000 in the case of landslides. In Table 1  used. As previously mentioned, SI and S anomalies have been taken into consideration (excluding Mining Activity, MA, Geothermal Activity, GA, Uplift, U, Dump Site, D, Not Determined, ND, and Noisy, R, anomalies, see Confuorto, et al. [28] for further details about the classification), for a total of 7175 points for the first category (2962 along the ascending and 4213 along the descending orbit, Figure 4a,b) and 13,971 for the second (3758 ascending and 10,213 descending, Figure 4c,d). In order to validate the probability of occurrence maps with a statistically independent sub-dataset, part of the databases have been extrapolated. In detail, SI anomalies collected after January 2020, for a total of 1087 (727 + 360, Figure 4a,b), and S anomalies between May 2020 and April 2021, counting 327 points (121 + 206, Figure 4c,d), were used for statistically assessing the consistency of each final map of this study. All the anomalies not belonging to the category of the assigned input data, (e.g., for SI anomalies used as input, all the remnant anomalies classified differently, including thus S, MA, GA, U, D, ND and R anomalies) were used as absence data in the RF algorithm, counting for about 29,000 points in the case of subsidence and about 26,000 in the case of landslides. In Table 1 a summary of all the above-mentioned figures is reported.

Predisposing Factors (PF)
According to the morphological setting of the Tuscan territory and the main characteristic of the SAR system used to monitor the deformational scenario, ten PFs have been selected; five related to the first category and five to the second one. Namely, they are: Slope Aspect, Slope gradient, Topographic Position Index, Land Use and Lithology for the first class; and C-index, R-Index, Horizontal EW Velocity of displacement, Vertical velocity of displacement and Standard Deviation of the velocity of displacement for the second class ( Figure 5).
The slope aspect (Figure 5a) is defined as the orientation of the Earth's surface with respect to the sun. It is measured clockwise in degrees from 0 to 360, where 0 is northfacing, 90 is east-facing, 180 is south-facing, and 270 is west-facing. Flat areas have a value of −1. The slope aspect can play an important role in many factors controlling slope stability. Here, the slope aspect is also used to discriminate between flat and hilly areas, differentiating between subsidence and slope deformations.
The slope angle expresses the degree of incline of the topographic surface and may be one of the most significant factors for slope instabilities (Figure 5b).

Predisposing Factors (PF)
According to the morphological setting of the Tuscan territory and the main characteristic of the SAR system used to monitor the deformational scenario, ten PFs have been selected; five related to the first category and five to the second one. Namely, they are: Slope Aspect, Slope gradient, Topographic Position Index, Land Use and Lithology for the first class; and C-index, R-Index, Horizontal EW Velocity of displacement, Vertical velocity of displacement and Standard Deviation of the velocity of displacement for the second class ( Figure 5).
The slope aspect ( Figure 5a) is defined as the orientation of the Earth's surface with respect to the sun. It is measured clockwise in degrees from 0 to 360, where 0 is north-facing, 90 is east-facing, 180 is south-facing, and 270 is west-facing. Flat areas have a value of −1. The slope aspect can play an important role in many factors controlling slope stability. Here, the slope aspect is also used to discriminate between flat and hilly areas, differentiating between subsidence and slope deformations.
The slope angle expresses the degree of incline of the topographic surface and may be one of the most significant factors for slope instabilities (Figure 5b). The Topographic Position Index (TPI) is defined as the difference in elevation between a central pixel and the mean of its surrounding cells [58] (Figure 5c).
where Vasc, Vdesc is the velocity from the ascending and descending geometries. Vv, Vh are the vertical and east-west values, θasc, θdesc are the incidence angles for the ascending and descending orbits.
The standard deviation of the LOS velocity expresses the dispersion of the velocity from the mean value of the whole dataset ( Figure 5j).
All the above-mentioned maps were rasterized with a 50 m cell size, which can be considered a good and balanced cell size for regional-scale studies and has already been adopted in the Tuscany region for landslide susceptibility [32]. Land use-land cover corresponds to the ground surface cover (with natural or humanmodified elements), which is connected with natural dynamics and human activities acting on the territory. The CORINE Land Cover (CLC) map of 2018 (the most updated available version) has been adopted for this work (Figure 5d). The third level products (the most detailed) have been used to be as precise as possible. To make the figure visible, the legend of Figure 5d is referred to in the first level. Class 1 indicates the artificial surfaces, class 2 the agricultural areas, class 3 the forest and semi-natural area, while classes 4 and 5 indicate the wetlands and the water bodies, respectively.
The lithological map has been taken from the regional cartographic service, available on the webGIS of the Tuscany region ( Figure 5e).
The C-Index (Figure 5f), defined also as a visibility map [59], is a parameter expressing the percentage of the detectable movement by the SAR spaceborne systems. It can be calculated by applying the following formula [60], based on the geometric features of the slopes and the satellites (Equation (1)): where S is the slope angle while A is the slope aspect, and N, E and H are the LOS (Line Of Sight) directional cosines and can be calculated using the incidence angle and the LOS azimuth through the following formulas (Equation (2)): where θ is the incidence angle and α is the satellite ground track angle. The four classes, as shown in Figure 5f, express the percentage of visibility (Class 1, with C values below 25%; Class 2, C values between 25 and 50%; Class 3 with C values between 50 and 75% and finally Class 4 with values > 75%).
The R-Index ( Figure 5g) provides an estimation of the radar visibility of an area, by taking into account the radar geometry, the slope angle and slope orientation [61,62]. It is calculated as the ratio between the slant range and the ground range, starting from the geometry of the sensor and the geometry of the surface under analysis. The following equation (Equation (3)) defines the R-Index: where S is the slope angle, A is the aspect and θ is the incident angle of the LOS. Class 1 of the R-index expresses those areas affected by a very low visibility, with distortion effects such as layover, foreshortening and shadow. Class 2 indicates R-Index < 30%, with a low amount of PSs (due to topographic effects); Class 3 and Class 4 are related to values of the R-Index between 30-50%, and over 50% (average and high amount of PSs), respectively. The horizontal component (Figure 5h), along the EW direction, of the velocity of displacement (V h ) can be obtained through the combination of ascending and descending data, through the following equation (Equation (4)): Furthermore, the vertical component ( Figure 5i) can be obtained by combining the ascending and descending velocities of displacement, through the following equation (Equation (5)): where V asc , V desc is the velocity from the ascending and descending geometries. V v , V h are the vertical and east-west values, θ asc , θ desc are the incidence angles for the ascending and descending orbits. The standard deviation of the LOS velocity expresses the dispersion of the velocity from the mean value of the whole dataset (Figure 5j). All the above-mentioned maps were rasterized with a 50 m cell size, which can be considered a good and balanced cell size for regional-scale studies and has already been adopted in the Tuscany region for landslide susceptibility [32].

Random Forest
RF is a nonparametric multivariate technique based on an ML algorithm [44]. It is a powerful method implemented in various scientific segments, such as economics, medicine, psychology, and in recent years has also been utilized in environmental modeling, for instance in the modeling of species distribution [63], and landslide susceptibility modeling [32,64]. The algorithm is based on the concepts of bootstrap aggregation (bagging) and classification or regression trees. Bootstrap aggregation takes uniform samples from an original dataset of predictors to create a subset of data that is allowed to have duplicated samples. From each sample, a tree is generated, thus splitting the data into subsets. However, in the RF algorithm, only a certain number of randomly selected points in the set are used to be split into subsets. In this way, the RF technique takes several bagged subsets from an original dataset and creates a certain amount of trees that are grown by randomly sampling points at each node. Once all the trees are grown, and a new value needs to be predicted, the values calculated by all the regression trees are averaged, assembling the final result. Indeed, each tree is developed to minimize classification errors, however the random selection influences the results, thus making a single-tree classification very unstable. For this reason, the RF-type methods make use of an ensemble of trees (the socalled "forest"), thereby ensuring model stability [32]. RF is an established and widely used technique in landslide studies as it outperforms traditional statistical methods, and it can be as effective as other ML methods [65][66][67]. Moreover, the RF technique is acknowledged for several advantages such as the joint use of categorical and numerical variables, the unnecessity of assumptions on the statistical distribution of the data and the capability of providing the statistical weight of each variable implemented. In this work, the RF algorithm implemented on the SDM (Spatial Distribution Modeling) package [68] on the R platform was adopted.

Modeling Validation
In order to validate the RF final products, the predictive accuracy of the model was assessed numerically by using the TSS (True Skill Statistic); TSS is a test that takes into account errors of omission and commission and ranges from −1 to +1, where the latter indicates a perfect performance [69]. The TSS method combines sensitivity and specificity so that both omission and commission errors are accounted for. This statistic compares the number of correct forecasts, minus those attributable to random guessing, to that of a hypothetical set of perfect forecasts [58]. For a 2 × 2 confusion matrix, TSS is defined as (Equation (6)): It can also be seen that TSS is not affected by the size of the validation set, and that two methods of equal performance have equal TSS scores. A further validation of the final modeled maps was performed by using a subset of the anomalies collected after 2020 (as described in Section 2.1). In this case, the anomalies were used and compared with the PO classes as determined by the RF model (see Section 3.4 for further information).

Results
RF modeling has been implemented in four different runs, by using the four datasets of anomalies, ascending and descending SI anomalies and ascending and descending S anomalies, while other anomalies are used as absence data. For the generation of the model, 10 environmental variables were adopted as PFs, as mentioned in Section 2.2. In addition, an eleventh parameter defined through the generation of random values all over the Tuscan territory, was created as a benchmark control variable to assess the significance of each PF. If the statistical weight of the random variable is higher than any other PF, the latter is discarded. In no cases, among the different runs, was the random map importance higher than the other PFs employed, thus confirming the whole selection. True Skill Statistic (TSS) has been considered to measure the accuracy of the RF model showing a good performance; in particular, an average value of 0.9 for landslides and 0.96 for subsidence has been obtained combining the results of the values of the ascending and descending orbits.

Slope Instability Anomalies Probability of Occurrence Map
As previously mentioned, two different SI anomalies Probability of Occurrence maps were generated, one using ascending anomalies and one for the descending, combined with the environmental and radar PFs. Both maps have values of PO between zero and one (Figure 6a,b), as computed by the Random Forest algorithm. These values have been divided into four classes: low probability (between 0 and 0.25); average (>0.25 and <0.5); high (>0.5 and <0.75); and very high (>0.75) for better visualization purposes and to compare with the control anomalies subset. The highest values of probability of occurrence of SI anomalies can be found over the main reliefs of the Apennine sector (especially over the Firenze and Pistoia side, with some spots over the Arezzo side), the Siena hilly area and the Mt. Amiata (Grosseto province) slopes. Especially in the descending map, some areas with higher values of probability can be found over the Apuan Alps (Massa-Carrara province). The RF algorithm is also capable of providing an assessment of the statistical weight of each PF within the model ( Table 2). As regards the ascending dataset map, the "heaviest" parameters are the slope gradient, followed by the horizontal velocity and the standard deviation of the velocity, while the R-Index, Slope Aspect and TPI show the lowest value. In the descending map, the horizontal velocity, lithology and slope gradient show the highest weight, and conversely, the R-Index, TPI and C-index show the lowest. A final SI anomalies probability of occurrence map has been produced by combining the ascendingand descending-related maps (Figure 6c). For the combination, the highest value of each cell has been taken and reported in the new map, so as to be more conservative and cautious. The spatial pattern of the different values of PO is extremely similar to that reported for the "single" maps.

Subsidence Anomalies Probability of Occurrence Map
For the subsidence anomalies, PO two maps were redacted as well, according to the ascending and descending anomalies and radar-based PFs (Figure 7a,b). Furthermore, in this case, values range between zero and one and have been subdivided into four classes. The distribution of the different values of PO is more or less similar between the ascending and the descending map, as expected for subsidence phenomena, with extensive higher values over the main plains, in particular over the Firenze-Prato-Pistoia plain and the Pisa-Livorno and Grosseto coastal sectors. As for the parameter weight (Table 2), as assessed by the RF model, in the ascending map the role of Vertical Velocity, Land Cover and slope gradient is higher than the rest of the PFs, while the R-Index and aspect do not show any relative importance; in the descending map, the main role is given by the vertical velocity, followed by the slope and land cover, while no significance is given by the R-Index and TPI. The two-orbit maps were combined to get a final map, which has the same spatial class distribution pattern (Figure 7c).

Subsidence Anomalies Probability of Occurrence Map
For the subsidence anomalies, PO two maps were redacted as well, according to the ascending and descending anomalies and radar-based PFs (Figure 7a,b). Furthermore, in this case, values range between zero and one and have been subdivided into four classes. The distribution of the different values of PO is more or less similar between the ascending and the descending map, as expected for subsidence phenomena, with extensive higher values over the main plains, in particular over the Firenze-Prato-Pistoia plain and the Pisa-Livorno and Grosseto coastal sectors. As for the parameter weight (Table 2), as assessed by the RF model, in the ascending map the role of Vertical Velocity, Land Cover and slope gradient is higher than the rest of the PFs, while the R-Index and aspect do not show any relative importance; in the descending map, the main role is given by the vertical velocity, followed by the slope and land cover, while no significance is given by the R-Index and TPI. The two-orbit maps were combined to get a final map, which has the same spatial class distribution pattern (Figure 7c

Response Curves
One of the abilities of the RF application is to represent the response curves of each model. A response curve is used to quantify the behavior of variables and to recognize the relationships between each conditioning factor and the modeling [70]. In the case of the SI probability of occurrence map, the response curves of the most valuable PFs, i.e., horizontal velocity, slope gradient, lithology and velocity standard deviation, show a higher response of some values with respect to others ( Figure 8). For instance, in the descending map, the horizontal velocities with a higher response are those > +30 mm/year, while in the ascending map a higher response is given by the velocities < −25 mm/year. The slope gradient response curves show higher values between 10 and 20° in the case of the ascending map, while higher than 30° for the descending one. Instead, the standard deviation of the ascending map has a peak with the value of 0.4; the lithologies with a higher response in the descending PO map are the mudstone and marly arenaceous flysch formations. For the subsidence PO map, the main role, as evidenced in Section 3.2, is provided by three PFs, the vertical velocity, the slope gradient and the land cover. The response curves of these three parameters are very similar between the ascending and the descending map (Figure 8): the vertical velocity curve shows a higher response with values < −10 mm/year, as well as generally low slope values (below 5°) that give a better response. Finally, the classes of land cover showing a high response are those related to the artificial surfaces (class 1 of the Corine Land Cover classification) in both maps.

Response Curves
One of the abilities of the RF application is to represent the response curves of each model. A response curve is used to quantify the behavior of variables and to recognize the relationships between each conditioning factor and the modeling [70]. In the case of the SI probability of occurrence map, the response curves of the most valuable PFs, i.e., horizontal velocity, slope gradient, lithology and velocity standard deviation, show a higher response of some values with respect to others ( Figure 8). For instance, in the descending map, the horizontal velocities with a higher response are those >+30 mm/year, while in the ascending map a higher response is given by the velocities <−25 mm/year. The slope gradient response curves show higher values between 10 and 20 • in the case of the ascending map, while higher than 30 • for the descending one. Instead, the standard deviation of the ascending map has a peak with the value of 0.4; the lithologies with a higher response in the descending PO map are the mudstone and marly arenaceous flysch formations. For the subsidence PO map, the main role, as evidenced in Section 3.2, is provided by three PFs, the vertical velocity, the slope gradient and the land cover. The response curves of these three parameters are very similar between the ascending and the descending map (Figure 8): the vertical velocity curve shows a higher response with values <−10 mm/year, as well as generally low slope values (below 5 • ) that give a better response. Finally, the classes of land cover showing a high response are those related to the artificial surfaces (class 1 of the Corine Land Cover classification) in both maps.

Cross-Validation of PO Maps
As described in Section 2.2.1, 1087 SI anomalies collected between January 2020 and April 2021 have been used to validate the final PO maps (Figure 9). An increasing trend can be observed in the numerical distribution. The highest frequency is seen in the very high PO class (377 out of 1087), followed by high PO, with 27% (292), average PO, with about 25% (267) and, lastly, low PO with 14% (151). On the other hand, 327 S anomalies between May 2020 and April 2021 were used to validate the merged final maps. In detail, 138 anomalies fall within the very high PO class, representing about 42% of the total, 93 in the class high PO (28%), while the average and low PO classes count 40 and 56 anomalies, respectively (12 and 17% of the total).

Cross-Validation of PO Maps
As described in Section 2.2.1, 1087 SI anomalies collected between January 2020 and April 2021 have been used to validate the final PO maps (Figure 9). An increasing trend can be observed in the numerical distribution. The highest frequency is seen in the very high PO class (377 out of 1087), followed by high PO, with 27% (292), average PO, with about 25% (267) and, lastly, low PO with 14% (151). On the other hand, 327 S anomalies between May 2020 and April 2021 were used to validate the merged final maps. In detail, 138 anomalies fall within the very high PO class, representing about 42% of the total, 93 in the class high PO (28%), while the average and low PO classes count 40 and 56 anomalies, respectively (12 and 17% of the total).

Cross-Validation of PO Maps
As described in Section 2.2.1, 1087 SI anomalies collected between January 2020 and April 2021 have been used to validate the final PO maps (Figure 9). An increasing trend can be observed in the numerical distribution. The highest frequency is seen in the very high PO class (377 out of 1087), followed by high PO, with 27% (292), average PO, with about 25% (267) and, lastly, low PO with 14% (151). On the other hand, 327 S anomalies between May 2020 and April 2021 were used to validate the merged final maps. In detail, 138 anomalies fall within the very high PO class, representing about 42% of the total, 93 in the class high PO (28%), while the average and low PO classes count 40 and 56 anomalies, respectively (12 and 17% of the total).  To evaluate the spatial distribution of the various PO classes, a cross-comparison with official landslide and subsidence inventories was performed. Indeed, IFFI landslide (Inventario dei Fenomeni Franosi in Italia, Landslide Inventory in Italy) [71,72] and DIANA subsidence (Dati Interferometrici per l'ANalisi Ambientale-Interferometric data for environmental analysis [54,73]) inventories were intersected with the pixels of the SI and S PO maps, respectively. In the first case (SI anomalies PO map), the number of pixels within and outside the inventoried landslide was evaluated, analyzing the number of cells for each class of PO ( Figure 10). The total number of PO pixels within landslide polygons is 982,906, of which 12,413 (1.26% of the total number) have a value of PO between 0.75-1 (very high), 47,801 are high (4.86%), 440,518 in the average class (44.82%) and 482,174 (49.06%) in the low class. The same analysis was performed on the non-landslide cells, and a very low percentage of cells falls within the higher PO classes (0.24% and 2.15% for classes very high and high, respectively), while much higher concentrations can be found over the lower classes (30.47 and 67.13% for class average and low, respectively). The number of cells for each PO class was also analyzed according to the total number of cells for the whole Tuscany territory (i.e., dividing the number of pixels with a certain PO class within the landslide and the total number of pixels with a certain PO value). Such analysis has revealed that, in the case of the SI PO cells overlapped on landslide polygons, the higher "influence" is related to the very high class (46.24% of the total), thus these values decrease proportionally with the class values. Conversely, the cells over the non-landslide areas show a balance among the different classes of probability of occurrence, with a percentage lower than 30% in all four classes (19 and 25% over the very high and high classes).
within and outside the inventoried landslide was evaluated, analyzing the number of cells for each class of PO ( Figure 10). The total number of PO pixels within landslide polygons is 982,906, of which 12,413 (1.26% of the total number) have a value of PO between 0.75-1 (very high), 47,801 are high (4.86%), 440,518 in the average class (44.82%) and 482,174 (49.06%) in the low class. The same analysis was performed on the non-landslide cells, and a very low percentage of cells falls within the higher PO classes (0.24% and 2.15% for classes very high and high, respectively), while much higher concentrations can be found over the lower classes (30.47 and 67.13% for class average and low, respectively). The number of cells for each PO class was also analyzed according to the total number of cells for the whole Tuscany territory (i.e., dividing the number of pixels with a certain PO class within the landslide and the total number of pixels with a certain PO value). Such analysis has revealed that, in the case of the SI PO cells overlapped on landslide polygons, the higher "influence" is related to the very high class (46.24% of the total), thus these values decrease proportionally with the class values. Conversely, the cells over the non-landslide areas show a balance among the different classes of probability of occurrence, with a percentage lower than 30% in all four classes (19 and 25% over the very high and high classes).
The same analysis was performed on the S PO cells over the subsidence areas as defined by the DIANA inventory ( Figure 11). As regards the pixels over the subsidence areas, the highest percentage is given by the PO low class (55.66%) and the lowest by PO very high class (10.61%), however, normalizing this value with the total number of cells in Tuscany, the trend becomes the opposite, with the highest weight given by very high class (57.36%) and the lowest by the low class (1.57%). A similar trend can be observed in the overlap between S PO pixels and non-subsiding areas: analyzing the concentration of pixels in non-subsiding areas, 95% fall within the lowest class (values of PO between 0-0.25), however, normalizing the number of these pixels with the total number of pixels, the percentages are balanced between the different classes of PO. A summary of the comparison between PO cells and landslide and subsidence inventories is reported in Table 3.  The same analysis was performed on the S PO cells over the subsidence areas as defined by the DIANA inventory ( Figure 11). As regards the pixels over the subsidence areas, the highest percentage is given by the PO low class (55.66%) and the lowest by PO very high class (10.61%), however, normalizing this value with the total number of cells in Tuscany, the trend becomes the opposite, with the highest weight given by very high class (57.36%) and the lowest by the low class (1.57%). A similar trend can be observed in the overlap between S PO pixels and non-subsiding areas: analyzing the concentration of pixels in non-subsiding areas, 95% fall within the lowest class (values of PO between 0-0.25), however, normalizing the number of these pixels with the total number of pixels, the percentages are balanced between the different classes of PO. A summary of the comparison between PO cells and landslide and subsidence inventories is reported in Table 3.

Discussion
The joint use of advanced Earth Observation products, fostered by the launch of groundbreaking spaceborne missions such as the Sentinel-1, and Machine Learning algorithms, is gaining much more consideration in studies dealing with the analysis of geohazards. The use of remote sensing SAR data, in particular of continuous information such as in the case of the Tuscany region, is a valid support for the constant and up-todate near-real-time monitoring of wide areas. On the other hand, the implementation of statistical-probabilistic methodologies, such as Random Forest, is a considerable development for the preparation of highly accurate provisional models. In recent years, the combination of these two techniques has provided significant results for landslide susceptibility mapping, as represented by the works of [41], which used the SBAS data to refine a susceptibility model of landslides along the Karakorum highway (Chinese Himalaya), or Ilia, et al. [74,75], which investigated the relationship between land subsidence and the spatiotemporal pattern of groundwater in western Thessaly (Greece).
In this work, RF and InSAR data have been jointly used for the assessment of the probability of occurrence of anomalies of movement, i.e., acceleration of deceleration of displacement as detected by SAR satellites. Since the launch of the continuous monitoring system in October 2016, based on the exclusive use of Sentinel-1 data for the whole territory of Tuscany, more than 50,000 anomalies were collected and interpreted. In Confuorto, et al. [28], an analysis was conducted on the distribution of one year (July 2019-July 2020) of anomalies in three Italian regions (including Tuscany) where the operational service was active. The results showed a substantial connection between the spatial disposition of the anomalies and the physiographic, environmental, and geological settings of each region. Following these outcomes, the objective of this work has been to set up a methodology, which allows us to identify the main factors and the main processes that lead to the changes in the displacement trend and to estimate the occurrence probability of anomalies of movement, and thus, identify the most prone areas.
The RF model was selected for this aim since it is capable of producing good predictions, handling large datasets efficiently and with a high level of accuracy, as testified by the different validation approaches tested in this work. Indeed, TSS test values showed a good performance of the RF model (TSS = 0.9 for landslides and TSS = 0.96 for subsidence); additionally, the anomalies collected in a second phase (between May 2020 and April 2021 for S and between January 2020 and April 2021 for SI) were used to validate the PO models. The cross-validation revealed that generally about 70% of these anomalies fall within the higher classes of PO, thus demonstrating the good predictive performance of the maps and that the anomalies are very often recurrent and persistent in space and over time.
Another key aspect of RF is the capability of estimating the variable importance, providing valuable insights that help us to interpret the models. In this work, classical environmental and geomorphological features, the expression of the territory under analysis, were coupled with radar features, typical of the SAR data from which the anomalies were generated. These parameters can be considered a synthesis of the main aspects leading to the anomaly's generation, the latter being dependent on either the territory physiography, or geology and land use, as well as the radar visibility and characteristics. In the case of SI anomalies, the most significant factors are the slope gradient, which is undoubtedly a diffuse conditioning element for landsliding, and the horizontal projection of the LOS velocity, confirming that the major component of the movement is along the horizontal axis (even if with SAR, only the E-W component can be obtained). The lithology also assumes a certain role, thus confirming the geological control over the landslide displacements and their acceleration, as well as the standard deviation of the velocity, which implies that the higher this value is, the more likely it is that an abrupt acceleration can be connected with a sudden reactivation of a mass movement along a hillslope or mountainside. With regard to the S anomalies PO map, three factors are the most important: the vertical velocity, the slope gradient, and the land use. In both cases (ascending and descending map), the first is the factor with the highest weight, which is used by the RF algorithm to identify subsidence phenomena as they are essentially a vertical displacement. The significance of the slope gradient is given by the flat morphology (thus very low values of slope), where generally subsidence occurs. Finally, the land use assumes a notable function, since most of the anomalies are verified in an urban setting, where most of the accelerations have a seasonal character and are mostly due to overexploitation of underground resources (such as in the cases of Montemurlo, Prato province, as highlighted by Del Soldato, et al. [27]) or where the building load induces severe settlements (e.g., in Pisa, [76], or on the freight terminal of Guasticce, Livorno province [77]).
The role of each PF and the value of each variable can also be estimated through the response curves, as seen in Section 3.3. Their implementation provides a valuable support for the understanding of the model and of the variables. In the SI PO map, the response curves of the most significant PF, highlight that SI anomalies generally occur in areas with high horizontal velocities of displacement (higher than 20 mm/year), and with a variable slope gradient (between 10 and 30 • ); as regards the lithology, the mudstone and the flysch formations give a higher response, since landslides occur very often in Tuscany and are more sensitive to acceleration in such terrains, as in the case of Carpineta (Pistoia province, Raspini, et al. [77]) and as observed by Rosi, et al. [53] across the entire Tuscan regional territory. For the S PO map, the response curves highlighted that generally the vertical velocity values should not be necessarily very high, and furthermore, S anomalies are likely to occur over flat areas (as testified by the higher response of very low slope gradient values). As previously mentioned, as concerns the land use, generally SI anomalies are more likely to occur over urban areas, as is evident from the response curve of the CLC map.
Finally, the comparison of the anomalies PO map values with the existing landslide and subsidence inventories demonstrated that SI anomalies may not necessarily only occur in already inventoried landslides, but they may also be generated in unknown slopes (almost 45% of the non-landslide pixels have high and very high PO values).The S anomalies show a great percentage of the pixel within subsidence inventoried areas having high and very high PO values (about 90%), while a minor percentage of non-subsidence pixels (about 38%) are found. This result is also significant, considering that the subsidence inventory is dated back to 2013, and thus has not been updated. This confirms that accelerations due to SI may take place in many slope sectors and that inventories should be updated frequently, while subsidence accelerations very often occur in well-known areas, and they are induced by seasonal fluctuations or by settlements due to the urban sprawl.

Conclusions
The availability of an unprecedented amount of spaceborne information, as represented by the Sentinel-1 imagery, is a turning point for the monitoring of geohazards, and along with a reduced revisit time of up to six days, made possible the beginning of operational continuous monitoring services based exclusively on SAR (Synthetic Aperture Radar) data. Tuscany is at the forefront in this sense as it was the first region to rely on this kind of service, which is based on the identification of the so-called anomalies of movement, i.e., measurement points, derived from MTInSAR (Multi Temporal Interferometry SAR) techniques, by means of a changing deformation trend through a data-mining algorithm. In this work, the probability of occurrence of landslide-and subsidence-related anomalies was assessed through the adoption of Machine Learning Random Forest algorithms. Two maps were generated, one estimating the probability of occurrence for Tuscany to have Slope Instability (SI) anomalies, and one Subsidence (S) anomalies. A total of ten environmental factors were selected and included in the modeling; five connected to the radar data and five to the morphology and geology of the study area. The final maps were validated through the overlap with further and subsequent anomalies and compared with the landslide and subsidence inventories available. In the first case, more than 70% of the control anomalies fall within the higher classes of probability of occurrence, both for SI and for S. As for the second case, the comparison of SI probability of occurrence values with inventoried landslide showed that anomalies can be generated even in non-inventoried areas (as testified by almost 45% of the non-landslide pixels having high and very high probability of occurrence values); in the case of S anomalies, a minor percentage of non-subsidence pixels (about 38%) were found, showing that most of the accelerations were within well-known subsiding areas. The adoption of a ML method also enabled us to comprehend the main driving forces that lead to the changes in the deformation trends, highlighting the major role of predisposing factors such as the slope gradient, the horizontal or vertical projection of the velocity of displacement, the land cover and the lithology. Further improvements can be made regarding the choice of the parameters to consider, whose selection could be extended to other environmental factors, as well as the replication of the analysis, even with other ML methods, over other Italian regions. The adoption of a map for assessing the probability of occurrence of MTInSAR anomalies may represent a step forward for the understanding of deformational phenomena and may serve as an enhanced geohazard prevention measurement, to be periodically updated and refined in order to have the most precise as possible knowledge of the territory.
Author Contributions: Conceptualization, methodology, investigation, writing-original draft preparation, P.C.; software, formal analysis, data curation, investigation, visualization, C.M.; validation, supervision, writing-review and editing, S.B.; software, validation, writing-review and editing, M.D.S.; validation, data curation, writing-review and editing, A.R.; data curation, writing-review and editing, S.S.; supervision, project administration, N.C. All authors have read and agreed to the published version of the manuscript.