Monitoring the Recovery after 2016 Hurricane Matthew in Haiti via Markovian Multitemporal Region-Based Modeling

: The aim of this paper is to address the monitoring of the recovery phase in the aftermath of Hurricane Matthew (28 September–10 October 2016) in the town of J é r é mie, southwestern Haiti. This is accomplished via a novel change detection method that has been formulated, in a data fusion perspective, in terms of multitemporal supervised classiﬁcation. The availability of very high resolution images provided by last-generation satellite synthetic aperture radar (SAR) and optical sensors makes this analysis promising from an application perspective and simultaneously challenging from a processing viewpoint. Indeed, pursuing such a goal requires the development of novel methodologies able to exploit the large amount of detailed information provided by this type of data. To take advantage of the temporal and spatial information associated with such images, the proposed method integrates multisensor, multisource, and contextual information. Markov random ﬁeld modeling is adopted here to integrate the spatial context and the temporal correlation associated with images acquired at different dates. Moreover, the adoption of a region-based approach allows for the characterization of the geometrical structures in the images through multiple segmentation maps at different scales and times. The performances of the proposed approach are evaluated on multisensor pairs of COSMO-SkyMed SAR and Pl é iades optical images acquired over J é r é mie, in the aftermath of and during the three years after Hurricane Matthew. The effectiveness of the change detection results is analyzed both quantitatively, through the computation of accuracy measures on a test set, and qualitatively, by visual inspection of the classiﬁcation maps. The robustness of the proposed method with respect to different algorithmic choices is also assessed, and the detected changes are discussed in relation to the recovery endeavors in the area and ground-truth data collected in the ﬁeld in April 2019.


Introduction
In the last decades, advances in the design and development of optical and synthetic aperture radar (SAR) satellite sensors have favored the deployment of new technological solutions able to acquire imagery at very high spatial resolution (VHR) with short revisit time. Moreover, the combined use of such systems commonly leads to analysis of data characterized by different acquisition modes and geometries and different spatial and sensor input data. Computationally, through MRF modeling, the Hammersley-Clifford theorem [5] allows formulating the maximum a posteriori (MAP) decision rule in terms of the minimization of a suitable energy function.
The energy function [5,6,9] of the proposed Markovian fusion framework comprises different contributions related to (i) the spatial relationships between neighboring pixels of the imaged scene, (ii) the temporal relationships between the same location at different acquisition times, and (iii) the multiscale information conveyed by a region-based analysis of the input imagery. Concerning the spatial and temporal relationships, an ad hoc graph is constructed on top of the multitemporal input data. Such a graph is used, within the Markovian framework, to integrate prior information in the fusion process. The resulting change map is then characterized by spatial and temporal regularization that ensures consistency between the classification maps, and thus in the change detection product. This allows taking into account both spatial and temporal information in the monitoring of the recovery process across each pair of considered observation times.
Concerning multiscale information, the VHR nature of the input image requires characterization of both large geometrical structures and smaller spatial details. This is accomplished here by exploiting both fine and coarse scales thanks to the use of multiple segmentation maps. Indeed, the third component of the developed Markov model is related to the results of a segmentation method applied with the goal of extracting details at different scales. The opportunity of using region-based concepts is consistent with the broad framework of geospatial object-based image analysis (OBIA) [10]. In the context of remote sensing image classification, the use of data at different resolutions provides at the same time benefits and drawbacks depending on the final goal of the classification task. Indeed, on one hand, high-resolution images allow for the detection of the spatialgeometrical configuration and the generation of land cover maps with detailed thematic classes. On the other hand, the high spatial heterogeneity conveyed by high-resolution observations may not lead to the identification of the main thematic classes, such as urban or built-up land covers. The use of segmentation maps at different scales as an input of the classification process makes it possible to take advantage of both fine-scale and coarse-scale data representations.
It is worth mentioning that a first preliminary formulation of the work presented in this paper has been published as a conference paper [11]. The goal of that publication was to give a preliminary idea of the method and briefly show the first results. In the present article, we extend the conference paper in terms of both a more detailed methodological description and of an expanded experimental analysis over a longer timeframe after the hurricane. Furthermore, validation of the classification maps is made by means of groundtruth data collected in the field in April 2019. The outcomes are discussed in relation to land management and anthropogenic processes occurring in Jérémie.
The paper is organized as follows: Section 2 summarizes the state-of-the-art methods for the identification of land cover changes and transitions from remote sensing imagery. Then, Section 3 describes materials and methods, presenting the case study and the available dataset (Section 3.1) and the proposed methodological approach (Sections 3.2-3.4). Section 4 presents the results achieved by the developed method with details on the produced change maps and confusion matrices for a quantitative evaluation of the performances. Section 5 provides the discussion of the results with respect to the problem at hand, i.e., the monitoring of the recovery phase in the aftermath of Hurricane Matthew. Finally, Section 6 summarizes the main conclusions.

Previous Work on Land Cover Change Detection
In the last 30 years, the increased availability of satellite data with shorter revisit period and finer spatial resolution, together with the related time series of images acquired on a regular basis, have attracted the attention of the remote sensing community. In the literature, several methodological approaches have been developed in the context of multitemporal remote sensing analysis to characterize the changes in land cover [12][13][14][15]. Various review papers adopt different categorization principles to classify these change detection approaches into separate categories.
A first scheme was presented in [12] and uses a categorization based on the order by which the input images are processed. In [16,17], classification maps are produced independently for each acquisition date, and then a comparative study of the obtained classification maps is performed to highlight the changes that occurred. On the contrary, similarly to the work proposed in this paper, in [18][19][20], different images are analyzed simultaneously by taking into account their multitemporal relationships from the beginning. A second classification scheme, proposed in [21], focuses the attention on the object of analysis, distinguishing between classification approaches [19,22] and change measurements [23,24]. Moreover, a different scheme is proposed in [25] and identifies three categories depending on the role of the time dimension. Finally, the review in [3] discriminates between methods performing the fusion of the multitemporal data at the feature level or at the decision level. In the former case, multitemporal information is extracted through the generation of new features able to emphasize changes in the dataset. Conversely, in the latter case, higher-level analysis is conducted by applying suitable decision rules to identify land cover classes and highlight the changes.
Concerning the multitemporal fusion at the feature level, a first type of approach is represented by techniques based on image comparison. Such techniques are usually sensorspecific and are designed by considering the noise model for the data at hand. The image ratioing approach, sometimes expressed on the logarithmic scale, is a common example [26,27] in the case of SAR data affected by the multiplicative speckle component. Conversely, techniques such as the univariate image differencing [28,29] or the change vector analysis (CVA) [30,31] are usually applied in the case of data collected by passive optical sensors, whose noise is usually modeled as additive Gaussian. Other relevant families of approaches are the transformation techniques (e.g., the principal component analysis (PCA) [32]); the regression approaches [33]; the methods based on distance functions and similarity measures (e.g., the Kullback-Leibler divergence [34]); and additional methods based, for instance, on statistical mixture models, backscattering coefficient, H-α decompositions, and polarimetric signatures.
Another alternative solution is represented by the formalization of the change detection problem in an unsupervised Bayesian framework. In this context, a predefined statistical distribution for the classes associated with changed and unchanged pixels, according to the specific type of data, is assumed. After the application of implicit or explicit automatic parameter estimation processes, the change detection task is performed through pattern recognition approaches [31,35].
This discussion points out that multitemporal fusion at the feature level is mainly related to unsupervised detection approaches whose final product is often limited to the identification of changed and unchanged areas; i.e., it is usually a binary output map. These approaches are especially appealing for the application in emergency scenarios where the main goal is to identify changed areas accurately and with short computation times or in event-based risk assessment through data mining in long image archives (e.g., [36,37]).
Concerning the methods belonging to the category of multitemporal fusion at the decision level, they are explicitly designed to identify land cover classes and land cover transitions, providing a better understanding of the observed data and characterizing the typology of the changes that occurred. Most of the approaches in this category involve supervised and semisupervised classification techniques, which require the availability of a labeled training set. In contrast to the methods described above, this is consistent with the application to a detailed monitoring study, but it would poorly match the requirements of the application to an emergency. These techniques allow simultaneously performing change detection and change classification on multitemporal time series of images. Three different subgroups can be identified: (i) postclassification comparison (PCC), (ii) direct multidate classification, and (iii) joint multitemporal classification. PCC methods accomplish the change detection task by classifying separately each image of the multitemporal dataset and performing a comparison between the obtained thematic maps. On one hand, this allows minimizing the impacts of the sensor and atmospheric differences resulting from the asynchronous acquisition of the images in time series. On the other hand, the simple formulation of PCC methods is not able to consider the temporal correlation. Thus, the overall performances are strictly connected to the results generated on each image by the adopted classifier. In the literature, there exist examples of PCC methods using pixel-by-pixel classification [38,39] and region-based approaches [40,41].
In order to take into account temporal correlations, direct multidate and joint multitemporal methods carry out a single analysis on a combined dataset containing two or more images acquired on different dates. In the case of direct multidate classification, the feature vector associated with each image pixel is composed of the feature values of all the images stacked together. Assuming that training pixels related to the same area on the ground are available at different dates, a classifier is trained separately for the identification of each land-cover transition [42]. An intrinsic limitation of such approaches is that they do not model the temporal structure of the data. Indeed, the joint statistics of the features on either date and the joint statistics across the observation dates are substantially merged, without attempting to model their different semantics. Conversely, joint multitemporal classification allows the integration of a dataset of two or more dates into a single analysis process. In a Bayesian framework, two different strategies can be adopted in order to tackle this task: the cascade approach and the mutual approach. In the former case, each image is classified on the basis of itself and of previous images by removing the coupling between the spectral and temporal dimensions [43]. On the contrary, the latter classifies each image on the basis of the previous and the subsequent images of the dataset [22]. From an application-oriented viewpoint, the cascade approach is typical of situations in which a pre-existing land cover map should be updated according to newly available satellite imagery. The mutual approach meets the requirements of the applications in which a full time series of images, all collected in the past, should be used to study the related temporal phenomena.
The above analysis recalls how classification methods have proven effective in the application to land cover change detection. The main reason lies in their capability to intrinsically exploit multitemporal and possibly multisensor, multiscale, and multiresolution information. Indeed, they make it possible to produce land cover transition maps by taking advantage of the large amount of information related to images acquired on the same ground area on different dates and by sensors of different types (e.g., optical multispectral and SAR images). Nevertheless, the integration of such a large amount of data in the case of last-generation VHR imagery may represent a challenge. On one hand, VHR images are characterized by strong correlation between neighboring pixels. On the other hand, the spatial behavior of the pixel intensity is heterogeneous and allows distinguishing among spectral responses produced by different ground objects and materials. Four main methodological approaches exist that are able to incorporate spatial contextual information in image analysis: (i) spatial feature extraction, (ii) region-based or object-based methods, (iii) probabilistic graphical models, and (iv) deep learning methods based on convolutional neural networks (CNNs).
The first approach relies on the inclusion in the classification process of further features able to capture the spatial relationship among the intensities of neighboring pixels. Classical examples include Haralick's features and the grey level co-occurrence matrix [44], mathematical morphology [45], and wavelet transforms [46].
Concerning the second family of approaches, in the classification process, regionbased and object-based methods usually include the results obtained by the application of a segmentation algorithm able to capture the geometrical structure associated with the image. Examples include region-growing algorithms [47], fuzzy-connectedness techniques [48], watershed methods [49], and hierarchical algorithms [50]. The use of a region-based Remote Sens. 2021, 13, 3509 6 of 29 approach allows dealing with data at very high resolution. Indeed, the use of segmentation maps at different scales as an input of the classification process makes it possible to take advantage of both high-resolution and coarse-resolution observations. The third approach represents not only an additional methodology for the integration of spatial contextual information, but also a powerful general data fusion framework, which can play an important role in the change detection task as well. Probabilistic graphical models are a general family of models for the dependence across a collection of random variables. They are formalized in terms of a probabilistic formulation on top of a graph topology, typically associated with Markovianity properties. In the application to remote sensing image processing, MRF and conditional random field (CRF) models on undirected graphs are especially prominent [51][52][53][54]. Differently from MRFs, which are designed to model the prior distribution of the desired output map, CRFs [55] have been introduced to formalize the Markovianity property with regard to the posterior distribution directly, often further enhancing modeling flexibility. The remarkable results obtained by this Markovian approach to multisource fusion in the remote sensing field can be seen in various examples involving multisensor [56], multitemporal [22], multiresolution [57], or multichannel imagery [58], or combinations of the above [59,60].
On one hand, in the case of methods based on spatial features, one often speaks of "feature engineering" to emphasize that the rationale of the feature extraction stage is determined by the specific interpretation of the feature semantic, as meant and defined by a human operator. On the other hand, effective approaches have been recently developed using deep learning concepts [61]. In particular, architectures based on CNNs intrinsically extract spatial contextual features through convolutional filtering and pooling operators. These features are meant to provide data representations at progressively more abstract semantic levels and are determined by the learning of the network based on the training set and not engineered by the operator. An operational limitation is generally due to the need for large annotated ground truth to be used for training purposes. Examples of state-of-the-art solutions based on CNNs are [62][63][64][65].

Case Study
The proposed method to address land cover change was tested in support of the monitoring of the recovery phase after Hurricane Matthew, which struck Haiti on 4 October 2016. To this end, the study area encompassed the coastal zone around the town of Jérémie, capital of the Grande Anse department in the southwest of Haiti, i.e., the main urban settlement that was dramatically affected by the hurricane. Jérémie was selected among the priority areas of interest by the Haitian authorities, the technical institutions, and the space agencies involved in the CEOS Haiti Recovery Observatory (RO) project [2]. The Haiti RO was a 4-year-long initiative carried out in the framework of the CEOS Working Group on Disasters (WGDisasters), chaired by the Italian Space Agency (ASI) during the development of the present work, and coordinated by the French National Centre for Space Studies (CNES) and the National Geospatial Information Center (CNIGS) of Haiti. Among its goals, the project aimed to demonstrate in a high-profile context the value of using satellite Earth observation to support recovery from a major disaster, in the near term (e.g., support for postdisaster needs assessment (PDNA) process [66]) and long term (e.g., major recovery planning and monitoring, estimated to be from 1 to 3 years) [2]. In this framework, the CEOS-DRM project, which is the baseline for the work that is proposed in this paper, contributed to the generation of the thematic product "watershed/flood" of the Haiti RO product portfolio [1].
Following the Recovery Observatory Operations Plan [2], CNES and ASI activated dedicated satellite data acquisition campaigns with the Pléiades and COSMO-SkyMed constellations, respectively, the latter in addition to a regional-scale coverage with TerraSAR-X by the German Aerospace Center (DLR) and Sentinel-1 by the European Space Agency (ESA) [67]. It is in this framework that the present study exploited the bespoke dataset of Besides the native resolutions of the input images, in accordance with the requirements of the CEOS-DRM initiative, the change detection products have been generated at the resolution of 10 m. The original images have been downsampled to this target resolution as a preprocessing step, and then the proposed method has been applied to the resulting data, i.e., downsampling has been applied in the domain of the input images and not of the output maps. Specifically, the adopted downsampling strategy has been linear and a low-pass filter has been applied before downsampling to prevent aliasing.
The target resolution of 10 m has been identified as a tradeoff between a rather fine spatial detail and a synoptic view of the overall recovery process. The focus has been on the recovery after Hurricane Matthew, whose spatial scale was quite broad. Therefore, the 10 m resolution was expected to make it possible to appreciate the damages caused by the hurricane and the subsequent recovery efforts. Indeed, such a design choice within the CEOS-DRM initiative has been confirmed precisely by the validation results that are reported in Section 5, and that allowed identifying land cover transitions associated with meaningful phenomena and activities related to after-event recovery. Furthermore, focusing on the resolution of 10 m allows processing a rather large area without a heavy computational burden.
For the sake of completeness, some mapping results have also been generated at the finer resolution of 2 m in order to assess the capability of the proposed approach to work with high-resolution imagery. Indeed, such a dedicated experiment was again consistent with the goals of the CEOS-DRM initiative but focused specifically on an urban scenario, in which the finer spatial resolution allows better appreciating the heterogeneity of its land cover. As described in Sections 1 and 3.2, the capability to adapt to multiple resolutions is granted by the integration of multiple segmentation maps, each associated with a different level of spatial scale, within the Markovian fusion framework, and thus within the final mapping product.
The datasets available in the considered years were as follows: It is worth anticipating that the proposed method jointly generates a change map and two classification maps starting from data collected at two different dates. In the following, when more than one image (e.g., optical and SAR acquisitions) is available at a given date, they are listed as separated by a semicolumn. RGB compositions of the multispectral images and grey-scale displays of the SAR amplitude images are shown in Figures 1-4.

Overview of the Proposed Method
The proposed method generates a change map and two classification maps by modeling the spatial and temporal relationships within the input multitemporal and multisensor dataset via the regularization and data fusion capabilities of MRF modeling. The flowchart of the method is depicted in Figure 5. At both dates 0 and 1 , by using an available training set, a preliminary classification map is computed by means of kernelbased methods, ensemble methods, or other classifiers. As is shown later, the proposed framework is not sensitive to the choice of this initial classifier. Moreover, segmentation maps corresponding to multiple levels of detail are extracted from the input images. Then, the Markovian framework takes advantage of such a collection of data to jointly compute the two classification maps by modeling their spatiotemporal relationship. The change map, i.e., the map of class transitions from 0 to 1 , is finally computed based on the two classification results. Due to the complexity of the input dataset, the straightforward application of the MAP decision rule for the joint computation of the two classification maps, and hence the change information, would be computationally intractable. Nevertheless, leveraging on

Overview of the Proposed Method
The proposed method generates a change map and two classification maps by modeling the spatial and temporal relationships within the input multitemporal and multisensor dataset via the regularization and data fusion capabilities of MRF modeling. The flowchart of the method is depicted in Figure 5. At both dates t 0 and t 1 , by using an available training set, a preliminary classification map is computed by means of kernel-based methods, ensemble methods, or other classifiers. As is shown later, the proposed framework is not sensitive to the choice of this initial classifier. Moreover, segmentation maps corresponding to multiple levels of detail are extracted from the input images. Then, the Markovian framework takes advantage of such a collection of data to jointly compute the two classification maps by modeling their spatiotemporal relationship. The change map, i.e., the map of class transitions from t 0 to t 1 , is finally computed based on the two classification results.

Overview of the Proposed Method
The proposed method generates a change map and two classification maps by modeling the spatial and temporal relationships within the input multitemporal and multisensor dataset via the regularization and data fusion capabilities of MRF modeling. The flowchart of the method is depicted in Figure 5. At both dates 0 and 1 , by using an available training set, a preliminary classification map is computed by means of kernelbased methods, ensemble methods, or other classifiers. As is shown later, the proposed framework is not sensitive to the choice of this initial classifier. Moreover, segmentation maps corresponding to multiple levels of detail are extracted from the input images. Then, the Markovian framework takes advantage of such a collection of data to jointly compute the two classification maps by modeling their spatiotemporal relationship. The change map, i.e., the map of class transitions from 0 to 1 , is finally computed based on the two classification results. Due to the complexity of the input dataset, the straightforward application of the MAP decision rule for the joint computation of the two classification maps, and hence the change information, would be computationally intractable. Nevertheless, leveraging on Due to the complexity of the input dataset, the straightforward application of the MAP decision rule for the joint computation of the two classification maps, and hence the change information, would be computationally intractable. Nevertheless, leveraging on the Hammersley-Clifford theorem and on the Markovian property of the prior probabilities, an energy minimization problem can be formulated as a computationally affordable solution to find the MAP estimate [5].
In particular, collecting all the image data in a matrix X and all the labels related to the thematic classes in a vector L, the joint posterior distribution P(L|X ) is a Gibbs distribution and is proportional to the quantity exp[−U(L|X )], where the energy U is defined locally according to the aforementioned neighborhood system. The Bayesian MAP rule is equivalent to the minimization of the energy U(L|X ) with respect to L, given the input image X . Multiple information sources can also be fused in this Markovian framework by defining appropriate energy functions as linear combinations of contributions associated with the individual sources [43,59,60].

Energy Function of the Proposed Markov Model
The energy of the proposed MRF model takes into consideration three terms related to the spatial relationships between neighboring pixels, the temporal relationship between the acquisitions at different times, and the multiscale information conveyed by a region-based analysis of the input imagery.
Let us consider a multitemporal dataset defined over a pixel lattice I and composed of two images X 0 and X 1 acquired at times t 0 and t 1 , respectively, where t 0 < t 1 . Let us also suppose that the two images are well registered so that it is possible to process them coherently on the same reference frame. In the literature, many image registration methods exist [68][69][70], also capable of addressing multisensor registration problems [71][72][73].
Focusing on the image X r (r = 0, 1), the ith pixel of the pixel lattice I is represented by a feature vector x ir ∈ R d r of d r components. Moreover, denoting with Ω r = {ω kr : k = 1, 2, . . . , K r } the set of thematic classes provided with training samples at time t r , the class label of the ith pixel is denoted as ir ∈ Ω r . Indeed, by assigning a class label to each pixel in the lattice I, it is possible to construct the label map L r = { ir } i∈I .
The pair of label maps L 0 and L 1 indicates the possible class transitions occurring between times t 0 and t 1 . Indeed, the joint classification of the two images provides a solution to the change detection problem per se. Moreover, the changes are detected not only as a Boolean indication of changed and unchanged pixels, but also in terms of multiple transitions from/to different land covers at different dates. Examples may include the identification of the transitions among thematic classes such as urban areas, agricultural fields, grasslands, and forests.
A set S r = S 1r , S 2r , . . . , S Qr of segmentation maps related to Q different scale levels is generated using the well-known Felzenszwalb and Huttenlocher's segmentation method in [74,75]. It is a computationally efficient region-merging algorithm based on a graph representation of the input image.
Consistently with the Markovian approach to data fusion, the contextual spatial information related to each image, the temporal correlation between X 0 and X 1 , and the multiscale information provided by S r are fused together as a linear combination of different contributions to the energy function of the proposed MRF model: where the spatial and temporal neighborhood relations are indicated by i ∼ j and i j, respectively; s iqr represents the label of the ith pixel in the qth segmentation map S qr (q = 1, 2, . . . , Q); and the parameters α qr , β r , and γ r represent the coefficients of the linear combination and weigh the various contributions to the MRF energy function.
In more detail, i ∼ j indicates that pixels i and j at the same date are neighbors with respect to a 4-or 8-connected spatial neighborhood. The notation i j means that i is a pixel in the image acquired at time t 0 , j is a pixel in the image collected at time t 1 (or vice versa), and either i and j correspond to the same spatial location or j belongs, in its own image, to the 4-or 8-connected neighborhood centered on the same location as i (see Figure 6). This relation defines a local neighborhood across the temporal pair of images.
In other words, the relations ∼ and define a spatiotemporal undirected graph across the pixel lattices of the two acquisitions at times t 0 and t 1 : each pixel at either t 0 or t 1 is a node of the graph, and there is an undirected edge between each pair of pixels i and j such that i ∼ j or i j. This graph, together with the Markovian formulation expressed by the proposed energy function, determines a multitemporal probabilistic graphical model for the relation among pixels at the same or at different times.
Remote Sens. 2021, 13, 3509 11 of 29 ( = 1,2, … , ); and the parameters , , and represent the coefficients of the linear combination and weigh the various contributions to the MRF energy function.
In more detail, ~ indicates that pixels and at the same date are neighbors with respect to a 4-or 8-connected spatial neighborhood. The notation ⋈ means that is a pixel in the image acquired at time 0 , is a pixel in the image collected at time 1 (or vice versa), and either and correspond to the same spatial location or belongs, in its own image, to the 4-or 8-connected neighborhood centered on the same location as (see Figure 6). This relation defines a local neighborhood across the temporal pair of images. In other words, the relations ~ and ⋈ define a spatiotemporal undirected graph across the pixel lattices of the two acquisitions at times 0 and 1 : each pixel at either 0 or 1 is a node of the graph, and there is an undirected edge between each pair of pixels and such that ∼ or ⋈ . This graph, together with the Markovian formulation expressed by the proposed energy function, determines a multitemporal probabilistic graphical model for the relation among pixels at the same or at different times.
(a) (b) Figure 6. Graphical representation of the two different local neighborhoods used within the proposed energy function: (a) local neighborhood corresponding to the notation ∼ ; (b) local neighborhood corresponding to the notation ⋈ .
The energy function (⋅) is composed of unary and pairwise contributions. The former relates to the pixelwise terms, while the latter model the relations between pairs of neighboring pixels. The first term of the energy is a unary term defined according to the probability mass function (PMF) ( ) = { = |ℓ = } ( = 1,2, … , ; = 1,2, … , ; = 0,1) of the considered segment labels conditioned to the thematic classes. It represents the energy contribution associated with each segmentation map at time . The use of class-conditional PMFs in this role and their inclusion in the energy through negative logarithms are inspired by the analogous role of the class-conditional probability density functions of the feature vectors in classical MRF models for image classification and segmentation [5,9].
The PMF ( ) can be estimated as a relative frequency, i.e., as the percentage of the pixels belonging to segment in the segmentation map and assigned to class over the total number of pixels assigned to ( = 1,2, … , ; = 1,2, … , ; = 0,1). However, to perform this estimation, not only the stack of multiscale segmentation maps but also preliminary classification maps are necessary inputs at time ( = 0,1). Accordingly, as a preliminary phase, each image is classified separately in a supervised manner using the training samples available at time ( = 0,1). In general, this preliminary stage can be addressed using an arbitrary supervised classifier. In the proposed technique, a variety of methods has been considered:


The contextual classification method proposed in [76], which consists in a support vector machine (SVM) whose kernel function is based on a region-based approach and incorporates spatial information associated with an input segmentation map. The segmentation map associated with the finest scale among the aforementioned ones is used. (a) local neighborhood corresponding to the notation i ∼ j; (b) local neighborhood corresponding to the notation i j.
The energy function U(·) is composed of unary and pairwise contributions. The former relates to the pixelwise terms, while the latter model the relations between pairs of neighboring pixels. The first term of the energy is a unary term defined according to the probability mass function (PMF) P kqr (s) = P s iqr = s ir = ω kr (k = 1, 2, . . . , K r ; q = 1, 2, . . . , Q; r = 0, 1) of the considered segment labels conditioned to the thematic classes. It represents the energy contribution associated with each segmentation map S qr at time t r . The use of classconditional PMFs in this role and their inclusion in the energy through negative logarithms are inspired by the analogous role of the class-conditional probability density functions of the feature vectors in classical MRF models for image classification and segmentation [5,9].
The PMF P kqr (s) can be estimated as a relative frequency, i.e., as the percentage of the pixels belonging to segment s in the segmentation map S qr and assigned to class ω kr over the total number of pixels assigned to ω kr (k = 1, 2, . . . , K r ; q = 1, 2, . . . , Q; r = 0, 1). However, to perform this estimation, not only the stack of multiscale segmentation maps but also preliminary classification maps are necessary inputs at time t r (r = 0, 1). Accordingly, as a preliminary phase, each image X r is classified separately in a supervised manner using the training samples available at time t r (r = 0, 1). In general, this preliminary stage can be addressed using an arbitrary supervised classifier. In the proposed technique, a variety of methods has been considered:

•
The contextual classification method proposed in [76], which consists in a support vector machine (SVM) whose kernel function is based on a region-based approach and incorporates spatial information associated with an input segmentation map. The segmentation map associated with the finest scale among the aforementioned ones is used.

•
The framework proposed in [77] and extended in [78] that provides a rigorous methodological integration of the SVM and MRF approaches. It is based on a Hilbert space formulation, and its kernel combines the rationale of SVMs and a predefined spatial MRF model. The well-known Potts model is used in this role. The extensions in [78] also integrate global or near-global energy minimization algorithms based on graph cuts or belief propagation.

•
The random forest (RF) classifier, rooted in the ensemble learning theory. RF combines multiple individual decision trees, each trained on a random resampling of the training data (bagging) and using, at each decision node, a random subset of the full set of features.
The choice of these methods has been such that the set of classifiers considered for the preliminary phase of the proposed approach range from noncontextual (RF) to contextual methods based on both Markovian ( [77,78]) and region-based formulations ( [76]).
The second term of the linear combination in Equation (1) is the first pairwise contribution and models the temporal relationships. It is expressed in terms of transition probabilities P ir = ω kr j,1−r = ω h,1−r from each thematic class ω h,1−r at time t 1−r to each class ω kr at time t r (r = 0, 1). The transition probability obtained for the pair (h, k) represents the (h, k)-element of the K 1−r × K r transition probability matrix (TPM). Following the approach in [8], the expectation maximization (EM) method [79] is used to automatically estimate the TPM from the input pair of multitemporal observations, thus catching the temporal correlation between the two images. Details of this estimation process can be found in [8].
The last contribution in the MRF energy is again a pairwise contribution and models the spatial contextual information within the image collected at each time. It acts as a spatial regularizer by enforcing a smooth prior in the Bayesian formulation. It is integrated in the energy function through the Potts model [5].

Optimization of the Parameters and Energy Minimization
The parameters α qr , β r , and γ r in Equation (1) represent the weights of the energy contributions. Thanks to the linear model adopted in the proposed Markovian energy formulation, the estimation of such parameters can be performed using the method presented in [80]. This technique combines a minimum mean square error (MMSE) formulation with Platt's sequential minimal optimization (SMO) algorithm. The technique in [80] formalizes the problem of the estimation of the weight parameters of the MRF model as a constrained quadratic optimization problem that is based on the correctness of the classification on the training set. SMO is well-known in the literature of quadratic programming for kernel machines (e.g., for SVM classification and regression) and is used here to numerically solve the aforementioned constrained quadratic problem. Algorithmic details can be found in [80].
Finally, the minimization of the proposed energy function is accomplished using the graph cut α-β swap algorithm [76]. The graph cut approach has been chosen due to its capability to obtain strong local energy minima in a computationally efficient manner [81,82]. Leveraging on the Ford-Fulkerson theorem, the graph cut method for binary classification is based on the reformulation of the energy minimization problem as a max-flow/min-cut problem over a suitable graph [83]. In this binary case, graph cuts allow reaching a global minimum in polynomial time. In the case of a multiclass classification setup, graph cut methods, such as the α-β swap algorithm, converge to a local minimum that is characterized by "strong" analytical properties with respect to suitable optimality criteria [81,82]. Intuitively, and according to [82], such a strong local minimum may be thought of as a local minimum in a "wide valley". As compared to deterministic methods such as iterated conditional mode [84], which converges to a generic local minimum in a short time [5], or to stochastic methods such as simulated annealing, which converges to a global minimum but takes a very long time [6], the graph cut approach represents an effective solution from both viewpoints of accuracy and computational burden.

Results
The proposed approach has been applied to the case study described in Section 3.1 and associated with optical (Pléiades) and SAR (COSMO-SkyMed) data collected between 2016 and 2019 in the area of Jérémie, southwestern Haiti, in relation to the recovery after Hurricane Matthew.
Given the target spatial resolution of the output maps (10 m for all results and 2 m in specific cases; see Section 3.1), the input images have been subsampled onto the corresponding pixel grid, because the method produces classification results at the same resolution as the input data. Proper antialiasing has been applied within this subsampling preprocessing step. The proposed method has been applied by using five segmentation maps (i.e., by considering five spatial scales), generated from each input image, and by using the method in [76] as the baseline supervised classifier to be used in the estimation of the class-conditional PMFs. The results shown in the present section refer to this setup. We refer to Section 5 for a discussion of the sensitivity to the number of scales and to the choice of the baseline preliminary classifier.
First, Figure 7 shows the outputs of the proposed method when applied at 10 m resolution to the pair of images "Jérémie 2016" and "Jérémie 2017". The land cover classes include "urban/anthropogenic", "tall vegetation", "low vegetation", and "water" in both imaged scenes and "muddy water" and "bare soil" in the cases of "Jérémie 2016" and "Jérémie 2017", respectively. Intentionally, the labeling "urban/anthropogenic" is used to encompass the variety of changes occurring in Jérémie during the recovery phase due to urbanization-meant as the construction not only of new buildings but also of new infrastructures (e.g., roads)-and anthropogenic activities that alter the land cover (and thus the spectral reflectance and/or the radar backscatter) such as wastelands.
effective solution from both viewpoints of accuracy and computational burden.

Results
The proposed approach has been applied to the case study described in Section 3.1 and associated with optical (Pléiades) and SAR (COSMO-SkyMed) data collected between 2016 and 2019 in the area of Jérémie, southwestern Haiti, in relation to the recovery after Hurricane Matthew.
Given the target spatial resolution of the output maps (10 m for all results and 2 m in specific cases; see Section 3.1), the input images have been subsampled onto the corresponding pixel grid, because the method produces classification results at the same resolution as the input data. Proper antialiasing has been applied within this subsampling preprocessing step. The proposed method has been applied by using five segmentation maps (i.e., by considering five spatial scales), generated from each input image, and by using the method in [76] as the baseline supervised classifier to be used in the estimation of the class-conditional PMFs. The results shown in the present section refer to this setup. We refer to Section 5 for a discussion of the sensitivity to the number of scales and to the choice of the baseline preliminary classifier.
First, Figure 7 shows the outputs of the proposed method when applied at 10 m resolution to the pair of images "Jérémie 2016" and "Jérémie 2017". The land cover classes include "urban/anthropogenic", "tall vegetation", "low vegetation", and "water" in both imaged scenes and "muddy water" and "bare soil" in the cases of "Jérémie 2016" and "Jérémie 2017", respectively. Intentionally, the labeling "urban/anthropogenic" is used to encompass the variety of changes occurring in Jérémie during the recovery phase due to urbanization-meant as the construction not only of new buildings but also of new infrastructures (e.g., roads)-and anthropogenic activities that alter the land cover (and thus the spectral reflectance and/or the radar backscatter) such as wastelands.   Table 1. Table 1. Accuracy scores for the two classification maps obtained with the datasets "Jérémie 2016" and "Jérémie 2017". The corresponding maps are shown in Figure 7a,b. Legend: OA = overall accuracy, AA = average accuracy, PA = producer accuracy, UA = user accuracy, kappa = Cohen's κ statistic.  Figure 7 includes the classification map for 2016 (Figure 7a), the classification map for 2017 (Figure 7b), the change map highlighting the changes that occurred in the considered time window (Figure 7c), and the legend of the change map (Figure 7d). In this legend, colors indicate changes and shades of grey denote unchanged pixels belonging to the various classes. Table 1 shows the accuracy of the test samples for these classification maps. The table also specifies the legend used in the classification maps of Figure 7a,b. Discussion on the classes and class transitions that the method was able to identify is presented in Section 5.
The same set of experiments has also been carried out considering all pairs of datasets acquired in consecutive pairs of years between 2017 and 2019. In this case as well, the optical and SAR images have been downsampled to the resolution of 10 m. The resulting set of multitemporal classification maps contributes to the monitoring of the recovery phase after the hurricane. Figure 8 shows the classification maps obtained with pairs of datasets collected in 2017, 2018, and 2019. The classes are the same as in the case of 2017 (Figure 7). The proposed method jointly computes a couple of classification maps based on data collected at two different dates. Therefore, the same classification map is obtained more than once, from distinct temporal pairs. An example is the map of 2018, which is computed considering either the data of 2017 and 2018 or the data collected in 2018 and 2019. The same comment holds in the case of the map of 2017. However, the differences between the two maps generated for the same year were minor in all the cases, so, for the sake of brevity, we report only a single case, i.e., the map coming from the pair ("Jérémie 2016", "Jérémie 2017") for 2017 and the map from the pair ("Jérémie 2017", "Jérémie 2018") for 2018. For the legend of the thematic maps, we refer to Table 2. Black pixels indicate unclassified areas where no data were available (e.g., cloud masking or out of the satellite footprint). For the legend of the thematic maps, we refer to Table 2. Black pixels indicate unclassified areas where no data were available (e.g., cloud masking or out of the satellite footprint). With regard to these classification results, Figure 9 shows the corresponding change maps, while Table 2 summarizes the accuracy scores. In this case, the thematic and change maps show unclassified areas, which correspond to parts of the input image that were covered by clouds in the Pléiades data and were therefore masked out. Such a masking operation has been chosen as a simple preprocessing step enabling the straightforward applicability of the proposed method to the application of recovery monitoring. Other solutions are indeed possible, such as missing data reconstruction techniques based on Bayesian or neural approaches [85,86]. The legends are shown in Figure 9 and Table 2. For the legend of the thematic maps, we refer to Table 2. Black pixels indicate unclassified areas where no data were available (e.g., cloud masking or out of the satellite footprint).
(a) (b) (c)  The results shown so far have been generated at the target spatial resolution of 10 m. To prove the capability of the proposed method to generate results at a finer resolution, a zoomed detail of the same area has also been classified at the resolution of 2 m. The images used for this dedicated experiment are those collected in 2016 and 2017.
The zoomed area has been selected in the urban region of Jérémie, west of the main town cemetery. This choice has been driven by the possibility of using the high spatial details of the urban areas to assess the capability of the proposed method to generate effective products also at finer resolutions. The classes that can be appreciated in this The results shown so far have been generated at the target spatial resolution of 10 m. To prove the capability of the proposed method to generate results at a finer resolution, a zoomed detail of the same area has also been classified at the resolution of 2 m. The images used for this dedicated experiment are those collected in 2016 and 2017.
The zoomed area has been selected in the urban region of Jérémie, west of the main town cemetery. This choice has been driven by the possibility of using the high spatial details of the urban areas to assess the capability of the proposed method to generate effective products also at finer resolutions. The classes that can be appreciated in this zoomed area at 2 m resolution are "urban/anthropogenic" (in this case most exclusively due to the urban footprint and residential buildings), "grass", and "shrubs and bush", consistently with the finer level of spatial detail than in the aforementioned 10 m imagery. Indeed, the qualitative analysis of the maps in Figure 10, together with the quantitative evaluation of the scores summarized in Table 3, confirm the absence of oversmoothing effects and artifacts due to the Markovian modeling. Conversely, the maps show a remarkable homogeneity in the areas of the same land cover class. The discussions on the classes and class transitions in this case of higher-resolution input data are also provided in Section 5.
Indeed, the qualitative analysis of the maps in Figure 10, together with the quantitative evaluation of the scores summarized in Table 3, confirm the absence of oversmoothing effects and artifacts due to the Markovian modeling. Conversely, the maps show a remarkable homogeneity in the areas of the same land cover class. The discussions on the classes and class transitions in this case of higher-resolution input data are also provided in Section 5.  Figure 11, while the legend of the classification maps is provided in Table 3.   Figure 11, while the legend of the classification maps is provided in Table 3.
Indeed, the qualitative analysis of the maps in Figure 10, together with the quantitative evaluation of the scores summarized in Table 3, confirm the absence of oversmoothing effects and artifacts due to the Markovian modeling. Conversely, the maps show a remarkable homogeneity in the areas of the same land cover class. The discussions on the classes and class transitions in this case of higher-resolution input data are also provided in Section 5.  Figure 11, while the legend of the classification maps is provided in Table 3. Figure 11. Legend representing the class transitions in the change map of Figure 10f. Figure 11. Legend representing the class transitions in the change map of Figure 10f. Table 3. Accuracy scores on the test samples for the two classification maps obtained with the datasets "Jérémie 2016" and "Jérémie 2017" at the resolution of 2 m and using 4 segmentation maps.
The corresponding maps are shown in Figure 10d,e.

Discussion
In Section 4, we have presented the experimental results achieved by the proposed method on the datasets spanning the time window that ranges from 2016 to 2019. Here, the focus is on the discussion of such results and of the behavior of the proposed method with respect to the related model selection issues. Since the weight parameters in the energy function are automatically optimized, these issues regard the number Q of segmentation levels and the choice of the baseline classifier used for the generation of the preliminary thematic maps. Another point that is addressed is related to the multisensor fusion capabilities. The method fuses multisensor optical-SAR acquisitions at two different dates for classification purposes. Hence, an ablation study, in which the multitemporal classification is performed using both optical and SAR data as compared to optical data only, is presented to assess the importance of the radar component with respect to the classification capabilities. Multispectral data are well known to be usually a very informative source of information for the classification of heterogeneous land cover types. Nevertheless, SAR data can be very useful for the discrimination of a subset of such classes, such as urban areas and water bodies. The ablation study is aimed precisely at appreciating the added value of the SAR component in the output results of the proposed approach. Finally, the change maps are analyzed by assessing their capability of identifying land cover transitions that are typical of the recovery phases in the aftermath of natural disasters.
Starting from the results obtained on the 10 m resolution dataset of 2016 and 2017 (see Figure 7), the most relevant transition is the one highlighting the "muddy water" turning into clean "water" in the area close to the mouth of the Grande Anse River. This is in line with the recovery of the natural riverine and coastal environments from the dramatic situation imaged in 2016, just a few days after the hurricane.
A second type of change indicates the rise of a "bare soil" area at the mouth of the river in 2017 ( Figure 12) and still existing in 2019 (Figure 13). Such a formation was not present in the period immediately after the disaster. The change detection method therefore was able to document and spatially locate another typical natural process that, in the case of Jérémie, was found to have interesting implications for the local community economics and subsistence. At the time of the ground-truth validation in April 2019, the new river mouth bar and the rejuvenated coastline ( Figure 13c) were exploited as a natural commodity by local inhabitants to source sand and gravel to be used as building construction materials (Figure 13d). This adds to the evidence of quarrying activities that developed later in 2018-2019 in the nearby hills, a few hundred meters to the south, as highlighted by the 2019 classification map in the form of a pixel cluster of land cover transition due to "anthropogenic" activity and not "urbanization" in a strict sense (see black circle in Figure 13b).
Moreover, in the coastal area, some "urban/anthropogenic" and "water" areas have turned into "bare soil" areas (i.e., yellow and purple pixels in the change map). While the ground-truth data corroborate the accuracy of this change detection classification, they also confirm that the sand and gravel are interspersed with widespread rubbish and waste (Figure 13c), which contributed to a spectral heterogeneity of the "bare soil" class.
Finally, another type of transition that has been identified involves the mutual change between "urban/anthropogenic" and "vegetated" areas (i.e., orange and blue pixels in the change map). As mentioned in Section 4, at the considered resolution, the class "urban/anthropogenic" is spatially heterogeneous and composed of a set of subclasses (including some types of "wasteland" areas), which makes the related classification process a challenging task.   Table 1.  Considering the experiments with data collected in the subsequent years (i.e., the "Jérémie 2017", "Jérémie 2018", and "Jérémie 2019" datasets), the changes that have been identified involve primarily "vegetation" and "urban/anthropogenic" areas. Figure 14 zooms onto the southwestern sector of Jérémie and the related details of the classification and change maps resulting from the 2017-2018 temporal pair. The vast majority of transitions involve "tall" and "low vegetation". On one hand, this is related to the fact that the two acquisitions are temporally located in two different seasons (i.e., autumn and spring), thus leading to a change in the vegetation present in the scene. On the other hand, at the resolution of 10 m, the textural details of the remotely sensed images do not well characterize the typical features of the vegetation, thus hampering correct discrimination of different vegetation types. also confirm that the sand and gravel are interspersed with widespread rubbish and waste (Figure 13c), which contributed to a spectral heterogeneity of the "bare soil" class.
Finally, another type of transition that has been identified involves the mutual change between "urban/anthropogenic" and "vegetated" areas (i.e., orange and blue pixels in the change map). As mentioned in Section 4, at the considered resolution, the class "urban/anthropogenic" is spatially heterogeneous and composed of a set of subclasses (including some types of "wasteland" areas), which makes the related classification process a challenging task.
Considering the experiments with data collected in the subsequent years (i.e., the "Jérémie 2017", "Jérémie 2018", and "Jérémie 2019" datasets), the changes that have been identified involve primarily "vegetation" and "urban/anthropogenic" areas. Figure 14 zooms onto the southwestern sector of Jérémie and the related details of the classification and change maps resulting from the 2017-2018 temporal pair. The vast majority of transitions involve "tall" and "low vegetation". On one hand, this is related to the fact that the two acquisitions are temporally located in two different seasons (i.e., autumn and spring), thus leading to a change in the vegetation present in the scene. On the other hand, at the resolution of 10 m, the textural details of the remotely sensed images do not well characterize the typical features of the vegetation, thus hampering correct discrimination of different vegetation types.  Figure 9, while the legend of the classification maps is provided in Table 1.
Conversely, the transitions from "vegetated" areas to "urban/anthropogenic" areas that are identified are correlated with the reconstruction efforts of the recovery phase, as was ascertained during the validation survey in April 2019. Such increase in residential buildings mostly occurred across a hilly and quite steep area of the town, where a parallel  Figure 9, while the legend of the classification maps is provided in Table 1.
Conversely, the transitions from "vegetated" areas to "urban/anthropogenic" areas that are identified are correlated with the reconstruction efforts of the recovery phase, as was ascertained during the validation survey in April 2019. Such increase in residential buildings mostly occurred across a hilly and quite steep area of the town, where a parallel investigation with multitemporal interferometric SAR (InSAR) techniques [67] relying on Sentinel-1 scenes in the period 2017-2018 highlighted the presence of ground motions in the direction away from the satellite sensor (velocity of up to −2.4 cm/year). As emerged during the two user workshops organized by CNES and CNIGS with the Haitian partners and stakeholders in Jérémie and Port-au-Prince in April-May 2019, such a combination of geospatial information represents relevant information for local and national authorities in their efforts to monitor the recovery phase, prevent new exposure to hazards, and implement resilient land use policies [87].
All the comments above refer to the results obtained from input data downsampled to 10 m. Regarding the results at 2 m resolution, most of the transitions that have been identified involve the "urban/anthropogenic" and "vegetation" classes. In the change map in Figure 10f, the blue transitions (from "vegetation" to "urban/anthropogenic") suggest rebuilding operations that have been conducted after the disaster. It is common that some pixels showing highly damaged buildings were classified as "bare soil" or "vegetation" in 2016. Conversely, the transitions from "urban/anthropogenic" to "vegetation" (orange in Figure 10f) may indicate areas that were so damaged that the old buildings have been dismissed and replaced by green areas.
Concerning the classification of urban areas, it is well known from the literature [88] that SAR data are usually informative for the discrimination of this class. Moreover, the case reported above with VHR data is particularly challenging in this perspective due to the spatial heterogeneity and the presence of multiple subclasses in the urban scene. Nevertheless, the proposed method effectively discriminates the built-up areas in the scene by taking advantage of both multispectral optical and SAR data. To assess the importance of the COSMO-SkyMed data, Figure 15 shows a comparison of the results obtained with and without input COSMO-SkyMed imagery. The figure focuses on an example of detail of the 2016-2017 temporal pair. Figure 15c shows the change map generated using the multispectral optical dataset only, while Figure 15d shows the result obtained by integrating both optical and radar data. It is straightforward to qualitatively appreciate the visual impact of the second source of information on the effective discrimination of the changes, and thus the correct classification of the challenging "urban" pixels. In this case as well, the legend is the one reported in Figure 10c.
during the two user workshops organized by CNES and CNIGS with the Haitian partners and stakeholders in Jérémie and Port-au-Prince in April-May 2019, such a combination of geospatial information represents relevant information for local and national authorities in their efforts to monitor the recovery phase, prevent new exposure to hazards, and implement resilient land use policies [87].
All the comments above refer to the results obtained from input data downsampled to 10 m. Regarding the results at 2 m resolution, most of the transitions that have been identified involve the "urban/anthropogenic" and "vegetation" classes. In the change map in Figure 10f, the blue transitions (from "vegetation" to "urban/anthropogenic") suggest rebuilding operations that have been conducted after the disaster. It is common that some pixels showing highly damaged buildings were classified as "bare soil" or "vegetation" in 2016. Conversely, the transitions from "urban/anthropogenic" to "vegetation" (orange in Figure 10f) may indicate areas that were so damaged that the old buildings have been dismissed and replaced by green areas.
Concerning the classification of urban areas, it is well known from the literature [88] that SAR data are usually informative for the discrimination of this class. Moreover, the case reported above with VHR data is particularly challenging in this perspective due to the spatial heterogeneity and the presence of multiple subclasses in the urban scene. Nevertheless, the proposed method effectively discriminates the built-up areas in the scene by taking advantage of both multispectral optical and SAR data. To assess the importance of the COSMO-SkyMed data, Figure 15 shows a comparison of the results obtained with and without input COSMO-SkyMed imagery. The figure focuses on an example of detail of the 2016-2017 temporal pair. Figure 15c shows the change map generated using the multispectral optical dataset only, while Figure 15d shows the result obtained by integrating both optical and radar data. It is straightforward to qualitatively appreciate the visual impact of the second source of information on the effective discrimination of the changes, and thus the correct classification of the challenging "urban" pixels. In this case as well, the legend is the one reported in Figure 10c. With regard to "urban/anthropogenic", SAR also helps to capture other land cover transitions that are associated with anthropogenic impact (e.g., wasteland, dump sites) that may be confused due to spectral heterogeneity if multispectral optical data were used alone. However, the interpretation of the cause of such a land cover transition could not be so straightforward without some knowledge about the condition on the ground. This was evident during the validation of "urban/anthropogenic" changes found in the classification map corresponding to 2017 along the right bank of the Grande Anse River, south of Jérémie ( Figure 16). These changes were not due to building construction as the "urban" classification would suggest, but rather wasteland as unregulated and uncontrolled open-air dump sites (Figure 16e,f), where garbage, plastic, and different types of solid waste are accumulated and, sometimes, even burnt (Figure 16g). obtained without using the COSMO-SkyMed data and corresponding to the timeframe 2016-2017; (d) change map obtained using the COSMO-SkyMed data and corresponding to the timeframe 2016-2017. The legend is the one provided in Figure 10c.
With regard to "urban/anthropogenic", SAR also helps to capture other land cover transitions that are associated with anthropogenic impact (e.g., wasteland, dump sites) that may be confused due to spectral heterogeneity if multispectral optical data were used alone. However, the interpretation of the cause of such a land cover transition could not be so straightforward without some knowledge about the condition on the ground. This was evident during the validation of "urban/anthropogenic" changes found in the classification map corresponding to 2017 along the right bank of the Grande Anse River, south of Jérémie ( Figure 16). These changes were not due to building construction as the "urban" classification would suggest, but rather wasteland as unregulated and uncontrolled openair dump sites (Figure 16e,f), where garbage, plastic, and different types of solid waste are accumulated and, sometimes, even burnt (Figure 16g).  The issue of piles of solid wastes by roadsides, rivers, and other open spaces associated with the expansion of urban areas and lack of city planning in Haiti (particularly Portau-Prince), thus causing significant health and environmental problems, was already reported in the literature [89] and URD's Observatory in Haiti [90]. The developed change detection method was able to detect that such change already occurred in the first year after the hurricane. On the contrary, the classification map corresponding to 2019 seemed to highlight that most of the dump area transitioned to "low vegetation" (Figure 16c), whereas in situ inspection confirmed that this area was still present (Figure 16e,g). The waste burning and the growth of vegetation have likely led the classifier to confuse the local spectral response and thus output such a classification result.
For sure, this example reassures about the importance of undertaking multitemporal change detection through regularly spanned satellite observations across the whole duration of the recovery phase (i.e., from near to long term), the wise choice to adopt a wide concept of the "urban/anthropogenic" class given its heterogeneity, and the opportunity to validate the classification results vs. the current context on the ground. A correct interpretation of land cover transitions also has a direct impact on the possible use of these classification maps in risk assessment and reduction. The assumption that the changes found along the river were due to urban constructions may have led to the wrong assessment about an increase in the elements at risk, for instance, of flooding events or hurricanes. On the contrary, the detected land cover changes warn about an anthropogenic process associated with the recovery process, which causes an increase in local hazards to both the environment and the health of the local community.
Concerning the behavior of the method as a function of the number Q of segmentation levels, Figure 17 displays the Cohen's κ of different classification maps as a function of Q. Here, the maps that are being considered are the ones corresponding to 2016 at 10 m and those of 2016 and 2018 at 2 m. The performances do not vary much as the number of scales is increased from 2 to 5. On one hand, this result confirms the stability of the output products with respect to the parameter Q, thus suggesting that the choice of its value is not critical. On the other hand, according to common practice in remote sensing, the test pixels are located inside homogeneous areas to minimize the impact of mixed pixels on accuracy assessment. Nevertheless, all the maps shown in Section 4 suggest a regular behavior at the interface between adjacent classes without artifacts or oversmoothing effects. ated with the expansion of urban areas and lack of city planning in Haiti (particularly Port-au-Prince), thus causing significant health and environmental problems, was already reported in the literature [89] and URD's Observatory in Haiti [90]. The developed change detection method was able to detect that such change already occurred in the first year after the hurricane. On the contrary, the classification map corresponding to 2019 seemed to highlight that most of the dump area transitioned to "low vegetation" (Figure 16c), whereas in situ inspection confirmed that this area was still present (Figure 16e,g). The waste burning and the growth of vegetation have likely led the classifier to confuse the local spectral response and thus output such a classification result.
For sure, this example reassures about the importance of undertaking multitemporal change detection through regularly spanned satellite observations across the whole duration of the recovery phase (i.e., from near to long term), the wise choice to adopt a wide concept of the "urban/anthropogenic" class given its heterogeneity, and the opportunity to validate the classification results vs. the current context on the ground. A correct interpretation of land cover transitions also has a direct impact on the possible use of these classification maps in risk assessment and reduction. The assumption that the changes found along the river were due to urban constructions may have led to the wrong assessment about an increase in the elements at risk, for instance, of flooding events or hurricanes. On the contrary, the detected land cover changes warn about an anthropogenic process associated with the recovery process, which causes an increase in local hazards to both the environment and the health of the local community.
Concerning the behavior of the method as a function of the number of segmentation levels, Figure 17 displays the Cohen's of different classification maps as a function of . Here, the maps that are being considered are the ones corresponding to 2016 at 10 m and those of 2016 and 2018 at 2 m. The performances do not vary much as the number of scales is increased from 2 to 5. On one hand, this result confirms the stability of the output products with respect to the parameter , thus suggesting that the choice of its value is not critical. On the other hand, according to common practice in remote sensing, the test pixels are located inside homogeneous areas to minimize the impact of mixed pixels on accuracy assessment. Nevertheless, all the maps shown in Section 4 suggest a regular behavior at the interface between adjacent classes without artifacts or oversmoothing effects.  Concerning the behavior of the proposed method as a function of the preliminary supervised classifier used in the estimation of the class-conditional PMFs, Figure 18 displays examples both of preliminary classification maps obtained via different baseline classifiers and of the output maps of the proposed technique. In addition to the aforementioned results, which were based on the use of the method in [76] for the preliminary classification stage, RF and the method in [78] are addressed in this example. The higher accuracy granted in the preliminary stage by the contextual algorithm in [78] than by the noncontextual RF can be noted by comparing Figure 18a,b. Nevertheless, it can be immediately seen that the final maps of the proposed Markovian method are mostly identical to each other and to the map in Figure 7a (which was obtained by starting from the preliminary map generated by the technique in [76]). This suggests the very limited sensitivity of the Markovian fusion framework to the baseline classifier considered for the initialization. curacy granted in the preliminary stage by the contextual algorithm in [78] than by the noncontextual RF can be noted by comparing Figure 18a,b. Nevertheless, it can be immediately seen that the final maps of the proposed Markovian method are mostly identical to each other and to the map in Figure 7a (which was obtained by starting from the preliminary map generated by the technique in [76]). This suggests the very limited sensitivity of the Markovian fusion framework to the baseline classifier considered for the initialization.  Table 1.
This robustness can be interpreted in terms of the role of the preliminary classification maps within the overall proposed method. Indeed, the preliminary maps are used to estimate the class-conditional distributions ( |ℓ ) of the segment labels at all considered scales (see Equation (1)). So, first, they affect only the multiscale terms of the energy function in Equation (1), whereas the temporal and spatial terms are unaffected. Then, the multiscale energy terms are also associated with a stack of segmentation maps, which, in turn, are generated independently from the preliminary classification maps. The use of the segmentation maps implies a remarkable spatial regularization in the estimates of the class-conditional distributions ( |ℓ ). This spatial regularization significantly mitigates the possible impact of the specific choice of the preliminary classification maps on the energy function of the proposed MRF model-and in turn on the resulting energy  Table 1.
This robustness can be interpreted in terms of the role of the preliminary classification maps within the overall proposed method. Indeed, the preliminary maps are used to estimate the class-conditional distributions P(s iqr iqr ) of the segment labels at all considered scales (see Equation (1)). So, first, they affect only the multiscale terms of the energy function in Equation (1), whereas the temporal and spatial terms are unaffected. Then, the multiscale energy terms are also associated with a stack of segmentation maps, which, in turn, are generated independently from the preliminary classification maps. The use of the segmentation maps implies a remarkable spatial regularization in the estimates of the class-conditional distributions P(s iqr iqr ) . This spatial regularization significantly mitigates the possible impact of the specific choice of the preliminary classification maps on the energy function of the proposed MRF model-and in turn on the resulting energy minimum identified using graph cuts. Nevertheless, it is worth noting that the initial classification maps in Figure 18a,b are overall consistent. They differ in the discrimination of low/tall vegetation and of the urban/anthropogenic areas and especially in the shape of the detected region of muddy water region (see Figure 18). However, as described above, such differences do not significantly affect the output of the Markovian modeling.
In particular, as pointed out again by Figure 18, the proposed method is able to generate more accurate classification maps than the pixelwise RF classifier and the contextual method in [78] combining SVM and MRF. The classification errors made by the method in [78] and by RF are evident in Figure 18a,b, respectively, as compared to the maps obtained by the proposed approach in panels (c) and (d). For instance, the map of RF in panel (b), which corresponds to an overall accuracy of around 79%, looks quite noisy, which is in accordance with the fully noncontextual formulation of this classifier. However, the proposed method generally demands more computational resources than the considered previous techniques. Nevertheless, such an increased computational burden is not dramatic, especially in relation to the aforementioned gain in accuracy and to the addressed goal. For example, on the 2016-2017 dataset, RF required around 3 min to generate the two classification maps, while the contextual method in [78] and the SVM with region-based kernel in [76] required around 8 min. Conversely, the proposed method, given the input preliminary maps generated with one of the aforementioned algorithms, took a time ranging from 9 to 15 min depending on the number of scales (ranging from 2 to 5). The machine used for the experiments was a laptop equipped with an i73632QM processor working at a maximum frequency of 3.6 GHz and 8 GB of DDR3 RAM. The times taken in the cases of the other image pairs were similar. In particular, the goal of the considered approach is to map land-cover changes to monitor the recovery effort after Hurricane Matthew. It is worth noting that, compared to the timescale of a recovery process, the computation time of the proposed approach is very short; i.e., there are no computational criticalities in relation to the addressed application.

Conclusions
A novel method has been proposed in this paper to address the problem of supervised multitemporal joint classification of multisensor optical-SAR images in the framework of the application to the monitoring of the recovery phase after Hurricane Matthew in Haiti. The method makes use of a probabilistic graphical approach formulated through a novel MRF model. This model integrates spatial and temporal relations as well as multiscale information associated with segmentation maps considering different levels of detail.
The method has been applied to and validated in the CEOS Haiti RO test area of Jérémie. The experimentation involved Pléiades and COSMO-SkyMed data collected in the period ranging from 2016 to 2019. From an application viewpoint, the proposed method proved to be accurate in the discrimination of the land cover classes in the considered scene. In particular, the land cover transitions that have been detected are consistent with the presence of rebuilding activity, the removal of damaged buildings in favor of green areas, and seasonal changes in the vegetation of the area. The output classification maps also highlighted areas of land cover transition to "bare soil", at the mouth of the Grande Anse River and along the coast, as well as to "urban/anthropogenic" along the right riverbank. These areas are related to anthropogenic activities of mining, quarrying, and waste disposal that represent some of the social, cultural, and economic facets of the recovery process in Jérémie. The obtained results confirm the potential of MRF-based approaches in applications to remote sensing analysis tasks involved in the management of natural disasters and risk reduction [91,92].
Consistently with the requirements of the general context of this study, which was framed within the CEOS Haiti RO initiative, the mapping was mainly focused on 10 m spatial resolution. However, the challenging scenario of VHR data at 2 m resolution has also been analyzed, pointing out the effectiveness of the proposed approach in this case as well and the usefulness of the input SAR data in the discrimination of the urban area.
From a methodological viewpoint, the experimental validation also addressed the sensitivity of the proposed technique with respect to the related model selection issues. In particular, the performances of the method exhibited remarkable stability as the number of input segmentation maps was varied and as a function of the preliminary classification maps involved in the calculation of the unary energy contributions. This suggests that the configuration of the method is not a critical phase and confirms its applicability in the addressed recovery monitoring framework. In particular, among the baseline classifiers that have been considered, both a well-known noncontextual algorithm (random forest) and three contextual techniques of various complexities have been experimentally evaluated. The very low impact of the choice of this preliminary classification stage on the performances of the proposed approach suggests using the most time-efficient baseline method among the considered ones, i.e., random forest. Indeed, this limited sensitivity is consistent with the fact that the preliminary maps are used only to determine one of the components of the energy of the proposed MRF, whereas the output land-cover change result is determined by fusing this information with spatial, temporal, and multiscale contributions.
From this perspective, an interesting future extension of this work will consist in further extending the spatial modeling capabilities of the proposed approach by also combining it with CNN architectures [61]. On one hand, this is expected to favor a further improvement in classification performances. On the other hand, it will generally imply stricter requirements in terms of ground truth. In this respect, similar to [93,94], the combination of MRF/CRF and CNN models could be specifically aimed at mitigating the need for especially large training sets, typical of deep learning techniques. Furthermore, in the conducted case studies, cloud-covered portions of the imaged scene were masked out directly. As a further generalization, missing data reconstruction methods, based for instance on Bayesian or deep neural approaches [85,86], could be integrated in the developed approach to benefit from the available satellite image time series and fill the gaps due to the presence of clouds.
From an application viewpoint, the effectiveness demonstrated by the proposed approach in the case study on Jérémie, Haiti, suggests its extension to other multitemporal processing tasks associated with disaster risk management, including, for instance, the update of land cover maps in the assessment of the vulnerability of a given area or the detailed mapping of the ground changes after an event occurs.