Hierarchical Terrain Classification Based on Multilayer Bayesian Network and Conditional Random Field

This paper presents a hierarchical classification approach for Synthetic Aperture Radar (SAR) images. The Conditional Random Field (CRF) and Bayesian Network (BN) are employed to incorporate prior knowledge into this approach to facilitate SAR image classification. (1) A multilayer region pyramid is constructed based on multiscale oversegmentation, and CRF is then used to model the spatial relationships among the extracted regions within each layer of the region pyramid; boundary prior knowledge is exploited and integrated into the CRF model as an additional constraint to improve classification performance near boundaries. (2) A multilayer BN is applied to establish causal connections between adjacent layers of the region pyramid, where the classification probabilities of the sub-regions in the lower layer, conditioned on their parent regions in the upper layer, serve as the links between layers. This framework takes more contextual information into account, which benefits classification performance. Experiments on real ESAR and TerraSAR-X data show that the proposed method achieves better classification accuracy than the compared methods.


SAR Image Classification
Synthetic Aperture Radar (SAR) provides two-dimensional images independently of weather, daylight and cloud cover, and has various applications, such as mapping, urban planning and disaster prevention [1]. Among these applications, terrain classification is one of the most active research topics. SAR image classification is the task of recognizing objects by measuring the similarity and discrimination between them based on extracted features. An increasing number of papers on this topic have appeared over the last three decades; the proposed methods can be roughly cast into three categories: polarimetric target decomposition, feature extraction and model construction.
The polarimetric target decomposition method has been widely used in SAR image classification; the general idea behind it is to represent the average backscattering as the sum of independent components. Two families can be distinguished, namely coherent and incoherent decomposition. Coherent decomposition aims to describe the scattering matrix as a combination of elementary scattering responses. Since the backscattered wave is usually only partially polarized, incoherent decomposition is used instead to express the averaged target matrix as a sum of single scattering matrices; this approach provides a simplified way to extract geographical features. Over the years, a large number of related methods have been proposed. The concept of target decomposition was first put forward by Huynen [2] in 1970 for the analysis of scattering distributions. Pauli decomposition [3] was introduced by Cloude to decompose the polarization scattering matrix into four components. In 1990, Krogager [4] decomposed the complex symmetric scattering matrix into three components. In 1997, Cloude et al. proposed the H/α decomposition. After that, model-based target decompositions were put forward, e.g., the three-component scattering model proposed by Freeman et al. [5]. In 2005, Yamaguchi [6] presented the four-component scattering model based on single reflection, secondary reflection and skew scattering. In 2007, Touzi [7] put forward a rotation-invariant incoherent decomposition based on the coherent scattering model. In recent years, several new methods have been developed, such as model-based three-component decomposition [8] and the four-component decomposition model with extended scatterers [9].
Feature extraction has also received much attention. The most straightforward approach is to regard the scattering coefficients or the coherency/covariance matrix as the underlying image features. Besides polarimetric features, texture has been proven to be an effective feature for image classification [10], e.g., the gray-level co-occurrence matrix [11], wavelets with statistical textures [12], the discrete wavelet transform [13] and the semivariogram [14]. In 2011, Dai [15] put forward a multilevel local histogram descriptor that is robust to speckle noise; it captures both local and global information and has been shown to outperform the gray-level co-occurrence matrix and Gabor wavelets. In recent years, colorization of SAR images has also been developed. For instance, Deng et al. [16] proposed a method to visualize SAR images based on the scattering mechanism. Tumer et al. [17] used elliptical icons to represent the data and to integrate polarization characteristics into contextual information. Uhlmann et al. [18] employed various visible color descriptors to represent SAR images and then performed supervised classification. In [19,20], Golparvar and Balali presented texton-based and nonparametric feature extraction methods for image segmentation, which are innovative and could also be considered for SAR images.
Though much progress has been made in SAR image classification, some problems remain to be resolved, especially for images with complex structure. For example, the extracted pixel-level features are often sensitive to clutter. In this case, the information provided by the image itself is insufficient for robust and effective classification. To overcome this limitation, several models have been proposed to incorporate additional prior knowledge. For example, the Markov Random Field (MRF) [21], the Conditional Random Field (CRF) [22] and the Bayesian Network (BN) [23] are used to model prior knowledge for performance enhancement. MRF is widely used in SAR image interpretation; it captures the spatial interactions among neighborhoods as prior knowledge to guide classification. For instance, Elia [24] proposed an MRF model based on regional and statistical texture; Voisin [25] put forward a multilayer MRF model based on texture information. However, MRF considers only local potential relationships and ignores the dependencies among the observation data as a whole. As an extension of MRF, CRF is a discriminative model that directly models the posterior probability of the sample labels. For example, Su et al. [26] constructed a region-connected model to classify SAR images. Zhang et al. [27] integrated class margin constraints into the CRF model to classify spectral images. Ding [28] designed unary, regional context and pairwise potentials to improve classification accuracy. The BN, in turn, is a directed graphical model that captures the causal relations among random variables. Owing to the conditional independencies it encodes, the joint probability can be factorized into a product of local conditional probabilities, which simplifies the model.

Motivation and Contributions
Although multilevel CRF and BN models are adopted by the methods described above, they mainly focus on modeling either the spatial relationships or the causal dependencies alone.
The hierarchical connections, namely the causal links between adjacent layers and the spatial relations within a layer, are not combined. Yet the combination of prior knowledge and the image data itself plays an important role in robust and effective SAR image classification. To exploit this contextual information to improve classification accuracy, this paper presents a hierarchical classification framework based on the multilayer Bayesian network [29] and the conditional random field.
The main contribution of this paper is that the causal dependencies between adjacent layers are incorporated into the proposed approach, so that additional prior knowledge is exploited to improve classification accuracy. In SAR image classification, for example, farmland is often far from urban areas; therefore, the probability that a sub-region is labeled as farmland, given that its parent region is labeled as urban area, is usually small. Consequently, the dependencies between adjacent layers can be treated as prior information, and the proposed method exploits this prior knowledge to improve classification accuracy. Figure 1 compares the proposed and the traditional methods. Compared with the traditional approach, the causal relations between adjacent layers and the spatial relationships within each layer are modeled by multilayer BN and CRF, respectively. The rest of this paper is organized as follows. Section 2 presents an overview of the proposed approach. Section 3 introduces the construction of a region pyramid by means of multiscale oversegmentation. Section 4 details how to build a semantic region pyramid with CRF and BN. Section 5 provides experimental results and analysis. Section 6 discusses open issues, and the conclusions are given in Section 7.

Overview of the Framework
The framework of the proposed classification method is illustrated in Figure 2. It consists of three main parts: multiscale segmentation, CRF modeling and BN modeling.
Multiscale segmentation: Given an input image, multiscale segmentation is performed by adjusting an adaptive threshold, which partitions the input image into coarser and finer regions, namely superpixels. A region pyramid is then constructed by placing the coarser oversegmentation in the top layer and the finer one in the bottom layer. In this paper, a region pyramid with three layers is formed.
CRF modeling with boundary prior knowledge: Using the constructed region pyramid, the CRF captures the spatial relationships among regions within each layer. Moreover, boundary prior knowledge is exploited and incorporated into the CRF model; this additional prior acts as a constraint that improves classification accuracy near boundaries.
Semantic pyramid construction by BN: Though the spatial relations among the regions within each layer can be effectively captured by the CRF model, the connections between adjacent layers are neither explored nor utilized. To establish these connections, BN is used to model the contextual information between adjacent layers of the region pyramid, thereby forming a semantic pyramid. Here, the BN models the causal relationships between the regions in the upper layer and their sub-regions in the lower layer. These causal relations are defined as the classification probabilities of the sub-regions conditioned on their parent regions, and they are estimated from the statistics of the CRF labeling. In addition, the naturally existing relationships among regions, edges and vertices are also modeled in the BN, where an edge is formed by two regions with different classes and a vertex corresponds to an intersection of edges. In other words, more prior knowledge is integrated into the proposed framework to enhance classification performance.

Multiscale Segmentation
In this paper, we construct a region pyramid with three layers, where each layer contains the oversegmented regions of the input image.To construct such a region pyramid, edge detection is first performed, and then, those extracted edges are combined to form several closed regions.The last step is multiscale segmentation; the finer and coarser regions can be obtained by adjusting a threshold.

Edge Detection
Given an image I, in order to predict an edge with orientation θ at location (x, y), the image is first transformed into three feature channels, namely intensity, polarization and texture, and the gradients of these channels are then calculated. Finally, an improved detector gDet(x, y, θ) is designed by combining these gradients for edge detection.
Intensity, polarization and texture gradients: An edge corresponds to a change between neighboring areas in an image, and the gradient describes this characteristic. To extract the gradient, a circular disc with radius σ is placed at location (x, y); the disc is split into two half-discs g and h by a diameter at angle θ. The magnitude of the gradient G(x, y, θ) at (x, y) is defined as the chi-square distance between the histograms of the two half-discs, which is given by:
$$G(x, y, \theta) = \chi^2(g, h) = \frac{1}{2} \sum_{i} \frac{\left(g(i) - h(i)\right)^2}{g(i) + h(i)} \qquad (1)$$
where g(i) and h(i) denote the i-th element in the histograms of the half-discs g and h, respectively.
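A minimal single-channel sketch of this oriented gradient is given below. It assumes the channel is a 2-D list of values in [0, vmax] quantized into n_bins histogram bins; the bin count, the value range and the helper names are illustrative assumptions rather than settings from the paper, and pixels lying exactly on the splitting diameter are ignored.

```python
import math


def chi_square(g, h):
    """Chi-square histogram distance 0.5 * sum (g(i) - h(i))^2 / (g(i) + h(i)); empty bins skipped."""
    return 0.5 * sum((gi - hi) ** 2 / (gi + hi) for gi, hi in zip(g, h) if gi + hi > 0)


def half_disc_histograms(image, x, y, radius, theta, n_bins, vmax):
    """Normalized histograms of the two half-discs split by a diameter at angle theta.

    image is a 2-D list (rows of values in [0, vmax]); (x, y) = (column, row).
    """
    g, h = [0.0] * n_bins, [0.0] * n_bins
    nx, ny = -math.sin(theta), math.cos(theta)  # unit normal of the diameter
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r, c = y + dy, x + dx
            if dx * dx + dy * dy > radius * radius:
                continue  # outside the disc
            if not (0 <= r < len(image) and 0 <= c < len(image[0])):
                continue  # outside the image
            side = dx * nx + dy * ny  # signed offset from the diameter
            if abs(side) < 1e-9:
                continue  # pixel lies on the diameter itself
            b = min(int(image[r][c] / vmax * n_bins), n_bins - 1)
            (g if side > 0 else h)[b] += 1.0
    sg, sh = sum(g) or 1.0, sum(h) or 1.0
    return [v / sg for v in g], [v / sh for v in h]


def oriented_gradient(image, x, y, radius, theta, n_bins=8, vmax=1.0):
    """G(x, y, theta) for one feature channel."""
    g, h = half_disc_histograms(image, x, y, radius, theta, n_bins, vmax)
    return chi_square(g, h)


# Toy image: 5 dark rows above 6 bright rows, i.e. a horizontal edge.
img = [[0.0] * 11 for _ in range(5)] + [[1.0] * 11 for _ in range(6)]
print(oriented_gradient(img, 5, 5, 3, 0.0))          # → 1.0
print(oriented_gradient(img, 5, 5, 3, math.pi / 2))  # → 0.0
```

A horizontal diameter (θ = 0) separates the dark and bright halves and gives the maximal response, while a vertical diameter splits the disc into two identical halves and gives none.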
Here, three types of features are extracted, i.e., multichannel intensity, polarization and texture, and the gradient is calculated in each feature channel independently.
Multiscale linear combination: To detect both fine and coarse structures, the gradient of each feature channel is computed at three scales, [σ/2, σ, 2σ]. For intensity, σ = 5 pixels; for polarization and texture, σ = 10 pixels. These gradients are then linearly combined to form mDet(x, y, θ), which is defined as:
$$\mathrm{mDet}(x, y, \theta) = \sum_{s} \sum_{i} a_{i,s}\, G_{i,\sigma(i,s)}(x, y, \theta) \qquad (2)$$
where s and i index the scale and the feature channel; $G_{i,\sigma(i,s)}(x, y, \theta)$ is the gradient at location (x, y) in channel i with radius σ(i, s) and orientation θ; and $a_{i,s}$ is the weight of each gradient. In this paper, eight equally spaced orientations are sampled in the interval [0, π), and the maximum of mDet(x, y, θ) over orientations is taken as the boundary response at location (x, y):
$$\mathrm{mDet}(x, y) = \max_{\theta}\, \mathrm{mDet}(x, y, \theta) \qquad (3)$$
Curve and edge detection: Spectral clustering [30] is applied to image segmentation. A sparse symmetric affinity matrix W is constructed from the maximal responses of mDet; the elements of W measure the similarity between pixels. The normalized cuts algorithm is used to solve for the generalized eigenvectors of W, so that an n-dimensional descriptor is obtained for each pixel, and a clustering algorithm, such as k-means, could then be employed to segment the image. Because such clustering tends to split large uniform regions, a convolution step is included instead, in which Gaussian oriented derivative filters are applied to the eigenvector images at multiple orientations. The spectral boundary detector sDet(x, y, θ) is therefore defined as:
$$\mathrm{sDet}(x, y, \theta) = \sum_{k=1}^{n} \frac{1}{\sqrt{\lambda_k}}\, \nabla_{\theta} v_k(x, y) \qquad (4)$$
where $\nabla_{\theta} v_k(x, y)$ denotes the oriented derivative of the k-th eigenvector $v_k$ and $\lambda_k$ is the k-th eigenvalue. Note that mDet carries edge information, while sDet conveys curve information. To extract both, a linear combination of mDet and sDet is constructed, which gives the improved detector gDet:
$$\mathrm{gDet}(x, y, \theta) = \sum_{s} \sum_{i} \beta_{i,s}\, G_{i,\sigma(i,s)}(x, y, \theta) + \omega\, \mathrm{sDet}(x, y, \theta) \qquad (5)$$
where $\beta_{i,s}$ and ω are learned during training. For more details, please refer to the original paper [30].
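The multiscale combination, the maximum over the eight orientations and the final gDet sum can be sketched as below. The per-channel gradient is passed in as a callable (for instance, a half-disc chi-square gradient), and the spectral term sDet is likewise a precomputed callable rather than derived from normalized cuts; the callable signatures, the dictionary layout of the weights and the helper names are assumptions for illustration, and the learned weights $a_{i,s}$, $\beta_{i,s}$ and ω are plain inputs.

```python
import math

SCALE_FACTORS = (0.5, 1.0, 2.0)  # radii sigma/2, sigma, 2*sigma
BASE_SIGMA = {"intensity": 5, "polarization": 10, "texture": 10}  # pixels
N_ORIENTATIONS = 8  # equally spaced in [0, pi)


def orientations(n=N_ORIENTATIONS):
    """Sampled angles 0, pi/n, ..., (n-1)pi/n."""
    return [k * math.pi / n for k in range(n)]


def m_det(gradient, weights, x, y, theta, channels=BASE_SIGMA):
    """Weighted sum over channels i and scales s of a[i, s] * G_{i, sigma(i, s)}(x, y, theta).

    gradient(channel, radius, x, y, theta) -> float is supplied by the caller;
    weights maps (channel, scale_factor) to the weight a_{i,s}.
    """
    return sum(
        weights[(ch, s)] * gradient(ch, sigma * s, x, y, theta)
        for ch, sigma in channels.items()
        for s in SCALE_FACTORS
    )


def boundary_response(gradient, weights, x, y, channels=BASE_SIGMA):
    """Maximum of mDet over the sampled orientations, returned with its angle."""
    return max((m_det(gradient, weights, x, y, t, channels), t) for t in orientations())


def g_det(gradient, betas, s_det, omega, x, y, theta, channels=BASE_SIGMA):
    """Same weighted gradient sum (weights beta) plus omega * sDet(x, y, theta)."""
    return m_det(gradient, betas, x, y, theta, channels) + omega * s_det(x, y, theta)


# Toy gradient that peaks for theta = 0 in every channel and scale.
toy_gradient = lambda ch, r, x, y, t: math.cos(t) ** 2
equal_weights = {(ch, s): 1.0 / 9 for ch in BASE_SIGMA for s in SCALE_FACTORS}
value, theta = boundary_response(toy_gradient, equal_weights, 0, 0)
print(round(value, 6), theta)  # → 1.0 0.0
```

For single-polarization data such as the TerraSAR-X image, the channels argument can simply omit the polarization entry.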

Region Pyramid Construction
Here, the region pyramid contains three layers; from top to bottom, these layers correspond to progressively finer segmentations of the input image, from undersegmentation to oversegmentation. To construct such a pyramid, a multiscale segmentation process is required: the input image is partitioned recursively by adjusting a threshold, producing a set of coarse and fine segmentations (regions) that form the region pyramid.
The Oriented Watershed Transform (OWT) [30] is first used to construct an initial segmentation; the output of OWT is the finest partition in the region pyramid. Then, an Ultrametric Contour Map (UCM) [31] is computed, whose value on each boundary describes its strength, namely the probability of it being a true contour. Thresholding the UCM at different levels yields a sequence of nested oversegmentations and undersegmentations; Figure 3 shows an example of multiscale segmentation. Finally, the region pyramid is constructed by placing the coarsest segmentation in the top layer and the finest one in the bottom layer.
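One way to turn a boundary-strength map into the nested layers of the region pyramid is sketched below, under a simplifying assumption: the UCM is stored per pixel (the original UCM lives on a grid of boundary elements), regions at threshold t are 4-connected components of pixels whose strength is below t, and pixels at or above t are boundary pixels labeled 0. Raising t only removes boundaries, so every region at a low threshold lies inside one region at a higher threshold; the hypothetical parent_map helper uses this to link adjacent layers.

```python
from collections import deque


def segment_at(ucm, t):
    """Label 4-connected components of pixels whose boundary strength is < t.

    Pixels with ucm >= t stay boundary pixels (label 0); regions are numbered 1, 2, ...
    Returns (labels, number_of_regions).
    """
    rows, cols = len(ucm), len(ucm[0])
    labels = [[0] * cols for _ in range(rows)]
    n = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if ucm[r0][c0] >= t or labels[r0][c0]:
                continue
            n += 1
            labels[r0][c0] = n
            queue = deque([(r0, c0)])
            while queue:  # breadth-first flood fill of one region
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not labels[rr][cc] and ucm[rr][cc] < t):
                        labels[rr][cc] = n
                        queue.append((rr, cc))
    return labels, n


def region_pyramid(ucm, thresholds):
    """Layers from top (highest threshold, coarsest) to bottom (lowest, finest)."""
    return [segment_at(ucm, t) for t in sorted(thresholds, reverse=True)]


def parent_map(fine, coarse):
    """Map each fine region label to the label of the coarse region containing it."""
    parents = {}
    for fine_row, coarse_row in zip(fine, coarse):
        for f, c in zip(fine_row, coarse_row):
            if f:
                parents.setdefault(f, c)
    return parents


# Toy UCM: a strong vertical boundary (0.9) and a weak horizontal one (0.3).
ucm = [
    [0, 0, 0.9, 0, 0],
    [0, 0, 0.9, 0, 0],
    [0.3, 0.3, 0.9, 0.3, 0.3],
    [0, 0, 0.9, 0, 0],
    [0, 0, 0.9, 0, 0],
]
pyramid = region_pyramid(ucm, [0.2, 0.5])
print([n for _, n in pyramid])                   # → [2, 4]
print(parent_map(pyramid[1][0], pyramid[0][0]))  # → {1: 1, 2: 2, 3: 1, 4: 2}
```

At threshold 0.5 only the strong boundary survives (two regions); at 0.2 the weak boundary also splits each side (four regions), and each fine region maps to exactly one coarse parent.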

Semantic Pyramid Construction by CRF and BN
With the region pyramid constructed above, prior knowledge is exploited to form a semantic pyramid. The prior knowledge consists of the spatial relationships among the regions within each layer and the causal connections between adjacent layers, which are modeled by CRF and BN, respectively.

CRF with Boundary Prior Knowledge
CRF is a discriminative model that can effectively capture the spatial relationships among random variables. In this paper, boundary prior knowledge is also integrated into the CRF model to improve the classification accuracy of regions near boundaries.
Standard CRF: As shown in Figure 4, the image is oversegmented into several regions, and CRF is used to model the spatial relationships among those regions; the regions correspond to the nodes of the CRF. Given an image I, let x denote the observation data and y the labels to be assigned. Let $I = \{x_1, x_2, \dots, x_M\}$ represent the image segmentation, where $x_i$ denotes the i-th oversegmented region, namely a superpixel (a collection of pixels), $i \in S = \{1, 2, \dots, M\}$. The posterior probability $P(y \mid \phi(x))$ is given by:
$$P(y \mid \phi(x)) = \frac{1}{Z(x)} \exp\Big( \sum_{i \in S} A_i\big(y_i, \phi_i(x)\big) + \sum_{i \in S} \sum_{j \in N_i} I_{ij}\big(y_i, y_j, \mu(\phi_i(x), \phi_j(x))\big) \Big) \qquad (6)$$
where j is a superpixel in the spatial neighborhood $N_i$ of i, and $A_i$ and $I_{ij}$ denote the unary and pairwise potentials, respectively. The feature function $\phi_i(x)$ maps the data $x_i$ of the i-th superpixel to the feature space, and $\mu(\phi_i(x), \phi_j(x))$ is the feature vector of the pairwise block (i, j). $Z(x)$ is the partition function, defined as:
$$Z(x) = \sum_{y} \exp\Big( \sum_{i \in S} A_i\big(y_i, \phi_i(x)\big) + \sum_{i \in S} \sum_{j \in N_i} I_{ij}\big(y_i, y_j, \mu(\phi_i(x), \phi_j(x))\big) \Big) \qquad (7)$$
CRF with boundary prior knowledge: During CRF inference, all pixels within a region are assigned the same label, since each region is treated as a homogeneous block, and adjacent regions are often labeled as different categories. However, adjacent regions may in fact belong to the same class. To overcome this limitation, boundary prior knowledge is incorporated into the CRF model as a constraint to improve the classification accuracy near boundaries.
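To make the standard CRF posterior $P(y \mid \phi(x))$ and its partition function $Z(x)$ concrete, the sketch below evaluates them exactly by enumerating every labeling of a tiny region graph. It assumes the unary and pairwise potentials are already available as log-potential values (a hypothetical lookup table and a Potts-style function of the two labels, with the pairwise features µ omitted), and it follows the double sum over i and j ∈ N_i literally, so each undirected pair is counted twice. Exhaustive enumeration is only feasible for a handful of superpixels; the experiments instead rely on approximate (min-sum) inference.

```python
import itertools
import math


def energy(labels, unary, neighbors, pairwise):
    """Exponent of the posterior: sum of unary and pairwise log-potentials."""
    e = sum(unary[i][y] for i, y in enumerate(labels))
    for i, nbrs in neighbors.items():
        for j in nbrs:  # every j in N_i, so each pair appears twice
            e += pairwise(labels[i], labels[j])
    return e


def crf_posterior(unary, neighbors, pairwise, n_labels):
    """Exact P(y | x) for every labeling y, normalized by the partition function."""
    configs = list(itertools.product(range(n_labels), repeat=len(unary)))
    scores = [math.exp(energy(y, unary, neighbors, pairwise)) for y in configs]
    z = sum(scores)  # partition function Z(x)
    return {y: s / z for y, s in zip(configs, scores)}


# Three superpixels in a chain, two classes (toy potentials).
unary = [[1.0, 0.0], [0.2, 0.0], [0.0, 1.5]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
potts = lambda a, b: 0.5 if a == b else 0.0
posterior = crf_posterior(unary, neighbors, potts, 2)
print(max(posterior, key=posterior.get))  # → (0, 0, 1)
```

The smoothing term pulls the weakly determined middle superpixel toward its left neighbor, while the strong unary evidence of the last superpixel keeps it in the other class.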
The boundary prior knowledge is based on the distance between a pixel (or superpixel) and the nearest boundary. If the label of the current pixel is the same as those of its neighbors, the greater the distance between the pixel and the boundary, the weaker the relationship between them; otherwise, the boundary prior can be ignored. Here, boundaries are the strong responses of the watershed transform. Given a curve Ω, the distance between pixel p and Ω is defined as the minimum Euclidean distance between p and the points on Ω:
$$\mathrm{dis}(p, \Omega) = \sqrt{(x_p - x_{\bar{p}})^2 + (y_p - y_{\bar{p}})^2}, \qquad \bar{p} = \arg\min_{q \in \Omega} \lVert p - q \rVert \qquad (8)$$
where $\bar{p}$ is the point on Ω nearest to p, and $(x_p, y_p)$ and $(x_{\bar{p}}, y_{\bar{p}})$ are the corresponding coordinates. Based on this definition, the boundary prior knowledge can be obtained from the labeled image, where the labeling is produced by the CRF. To define the conditional probability with boundary prior knowledge, let $\Omega = \{\Omega_1, \dots, \Omega_H\}$ denote the H lines nearest to pixel $p_i$, and let $R_h$ be the h-th region, which contains the line $\Omega_h$, $h = 1, 2, \dots, H$. The conditional probability, with boundary prior knowledge, of $p_i$ being labeled as c is then given by: where γ is the boundary prior parameter, learned during training, c′ denotes the original label assigned by the CRF, and $t(c', p_i, R_j)$ and $F(\mathrm{dis}(p_i, \Omega_j))$ are defined as: where $F(\mathrm{dis}(p_i, \Omega_j))$ is the potential function, the threshold λ normalizes the distance $\mathrm{dis}(p_i, \Omega_j)$ into the interval (0, 1), and $t(c', p_i, R_j)$ ensures that $p_i$ is labeled the same as its neighbors.
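The point-to-curve distance of Equation (8) and the selection of the H nearest lines can be sketched as follows, with each curve given as a list of (x, y) points. The paper does not spell out how the threshold λ maps a distance into (0, 1); the clipped ratio in normalized_distance is only an assumed stand-in, and all function names are illustrative.

```python
import math


def dist_to_curve(p, curve):
    """Minimum Euclidean distance from pixel p = (x, y) to the points of a curve."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in curve)


def nearest_curves(p, curves, H):
    """Indices of the H curves closest to p, i.e. the set {Omega_1, ..., Omega_H}."""
    return sorted(range(len(curves)), key=lambda k: dist_to_curve(p, curves[k]))[:H]


def normalized_distance(d, lam):
    """Assumed normalization: distance divided by threshold lam, clipped at 1."""
    return min(d / lam, 1.0)


curves = [[(0, 0), (0, 1)], [(10, 10)], [(2, 1)]]
p = (3, 1)
print(dist_to_curve(p, curves[0]))  # → 3.0
print(nearest_curves(p, curves, 2))  # → [2, 0]
print(normalized_distance(3.0, 6.0))  # → 0.5
```

A brute-force minimum over curve points is enough for illustration; for full images, a distance transform of the boundary map would be the usual way to get all pixel-to-boundary distances at once.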
With the above definitions, Equation (6) can be rewritten as: where $P_{Pi}$ and $P_{Li}$ denote the prior and likelihood terms in Equation (6), respectively, and $P_{BP}(y_i \mid \phi(x_i), \Omega, c')$ is given by: where $x_i$ denotes a superpixel and $p_i$ is a pixel within $x_i$.

Multi-Layer Bayesian Network
The causal connections between adjacent layers are modeled by BN; these connections are defined as the classification probabilities of the sub-regions in the lower layer conditioned on their parent regions in the upper layer. In addition, the causal dependencies among regions, edges and vertices are taken into account and modeled by BN.
Causal connections between adjacent layers: As described in Section 2, each region in an upper layer is divided into several sub-regions in the layer below by multiscale oversegmentation. The labels assigned to these sub-regions depend on the labels of their parent regions. In terrain classification from SAR images, for example, roads often run through built-up areas and are far from forests, farmland is often far from urban areas, and so on. These dependencies can be treated as additional contextual knowledge linking adjacent layers and can be exploited to improve classification accuracy.
The probability of a sub-region's label conditioned on the label of its parent region is defined as:
$$P(y_d = c_i \mid y_u = c_j) = \varphi_{ij} \qquad (13)$$
where $y_d$ and $y_u$ denote the labels of the sub-region in the lower layer and of its parent region in the upper layer, $c_i$ and $c_j$ are the corresponding assigned classes, and $\varphi_{ij}$ is the conditional probability. This conditional probability can be estimated from the statistics of the CRF labeling.
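A possible way to estimate these conditional probabilities from the CRF labeling is simple counting over parent-child pairs, sketched below. It assumes each sub-region contributes one (parent class, child class) pair and stores the table with rows indexed by the parent class, matching the row/column reading given for Table 2 in Experiment 3. The optional smoothing count alpha and the uniform fallback for unseen parent classes are assumptions, not part of the described method.

```python
from collections import Counter


def estimate_cpt(pairs, n_classes, alpha=0.0):
    """Estimate the N x N table P(child class | parent class) from label pairs.

    pairs: iterable of (parent_class, child_class) tuples, one per sub-region,
    taken from the CRF labelings of the upper and lower layers.
    Rows are indexed by the parent class and each row sums to 1.
    alpha: optional additive-smoothing count (an assumption).
    """
    counts = Counter(pairs)
    cpt = []
    for parent in range(n_classes):
        row = [counts[(parent, child)] + alpha for child in range(n_classes)]
        total = sum(row)
        # a parent class that was never observed falls back to a uniform row
        cpt.append([v / total if total else 1.0 / n_classes for v in row])
    return cpt


# Toy pairs: parent class 0 has children 0, 0, 1; parent class 1 has child 1.
cpt = estimate_cpt([(0, 0), (0, 0), (0, 1), (1, 1)], 3)
print(cpt[1])  # → [0.0, 1.0, 0.0]
```

With five terrain classes this produces the 5 × 5 CPT described below.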
For SAR image classification with N classes, a Conditional Probability Table (CPT) of size N × N can be used to describe these causal relations. Figure 5 illustrates the causal connections between adjacent layers, which are given by the classification probabilities of the sub-regions in the lower layer conditioned on their parent regions in the upper layer.
Relationships among regions, edges and vertices: The relationships among regions, edges and vertices are a form of contextual knowledge that humans exploit extensively. An edge is formed where regions with different labels meet, and a vertex is formed where edges intersect. Figure 6 illustrates the causal relations among regions, edges and vertices; edge $e_2$, for example, is formed by regions $y_3$ and $y_4$, so $y_3$ and $y_4$ are the parent nodes of $e_2$. These contexts can be modeled by conditional probabilities in the BN. Let m and l denote the numbers of edges and vertices, respectively. Edges and vertices each take one of two states, true or false, represented by 1 and 0, respectively. The BN models the causal relationships among the oversegmented regions $\{y_i\}_{i=1}^{n}$, the edges $\{e_j\}_{j=1}^{m}$ and the vertices $\{v_t\}_{t=1}^{l}$. An edge is formed by the intersection of two regions, which are defined as its parent nodes. Consequently, if the labels of the parent nodes are different, a true boundary is likely to exist between the two regions, namely $e_j = 1$. The conditional probability is defined as:
$$P\big(e_j = 1 \mid pa(e_j)\big) = \begin{cases} 0.8, & \text{if the labels of the parent nodes are different} \\ 0.2, & \text{otherwise} \end{cases} \qquad (14)$$
where $pa(e_j)$ denotes the parent nodes of edge $e_j$. For a vertex, the parent nodes are the intersecting edges; in this paper, a vertex is formed by at least three edges. The conditional probability between a vertex and its parent nodes is therefore defined as:
$$P\big(v_t = 1 \mid pa(v_t)\big) = \begin{cases} 0.7, & \text{if more than two parent edge nodes are true} \\ 0.3, & \text{otherwise} \end{cases} \qquad (15)$$
where $pa(v_t)$ denotes the parent nodes of vertex $v_t$.
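Equations (14) and (15) translate directly into two small conditional-probability functions, sketched below; the probability of the false state is the complement. The function names are illustrative.

```python
def p_edge_true(parent_labels):
    """Equation (14): P(e_j = 1 | pa(e_j)); the parents are the two regions forming the edge."""
    a, b = parent_labels
    return 0.8 if a != b else 0.2


def p_vertex_true(parent_edges):
    """Equation (15): P(v_t = 1 | pa(v_t)); parents are the (at least three) intersecting edges."""
    return 0.7 if sum(parent_edges) > 2 else 0.3


def conditional(p_true, state):
    """Probability of a binary state (1 = true, 0 = false) given P(state = 1)."""
    return p_true if state == 1 else 1.0 - p_true


print(p_edge_true(("forest", "road")))  # → 0.8
print(p_vertex_true([1, 1, 0]))         # → 0.3
```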
Let y, e and v represent all of the regions $\{y_i\}_{i=1}^{n}$, edges $\{e_j\}_{j=1}^{m}$ and vertices $\{v_t\}_{t=1}^{l}$, respectively. Image classification can then be performed by inferring the optimal states $y^*$, $e^*$ and $v^*$:
$$(y^*, e^*, v^*) = \arg\max_{y, e, v} P(y, e, v) \qquad (16)$$
where P(y, e, v) is the joint probability. Under the conditional independence assumptions of the BN, the joint probability can be expressed as the product of local conditional probabilities:
$$P(y, e, v) = \prod_{i=1}^{n} P(y_i) \prod_{j=1}^{m} P\big(e_j \mid pa(e_j)\big) \prod_{t=1}^{l} P\big(v_t \mid pa(v_t)\big) \qquad (17)$$
where $P(y_i)$ is the prior probability of the regions. Because training samples are often insufficient in practice, a uniform distribution is used to model this prior. With the causal relationships between adjacent layers modeled by the multilayer BN, a semantic pyramid is built from the region pyramid constructed in Section 3. The joint probability of the multilayer BN is given by:
$$P(y, e, v, x) = \prod_{\langle u, d \rangle \in \Upsilon} \cdots \qquad (18)$$
where $\langle u, d \rangle \in \Upsilon$ denotes a pair of adjacent layers, and the numbers of regions, edges and vertices in the second layer of the BN are denoted by n′, m′ and l′, respectively.

Unified Inference Model for CRF and BN
Since both CRF and BN are used to model prior knowledge, a Factor Graph (FG) is used to represent the CRF and the BN in a unified framework so that global inference can be performed. In an FG, the global joint probability is expressed as a product of factors; each random variable is represented by a circle and each factor by a solid square, and a variable node is connected to a factor node if and only if the variable is an argument of that factor.
Under the global Markov property of the FG, the joint probability P(y, e, v, x) of each layer of the semantic pyramid can be factorized as:
$$P(y, e, v, x) = P(e, v \mid y, x)\, P(y, x) = P(v \mid e)\, P(e \mid y)\, P(y \mid x)\, P(x) \qquad (19)$$
where P(v|e) and P(e|y) denote the causal relations among vertices, edges and region labels in the BN, namely Equations (15) and (14), respectively; P(y|x) corresponds to Equation (6); and P(x) is constant because x is observed.
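As an illustration of the per-layer factorization in Equation (19), the sketch below scores a complete assignment of regions, edges and vertices and finds the most probable one by exhaustive search on a toy layer with three regions meeting at one vertex. Two simplifications are assumed: P(y|x) is replaced by a product of per-region class probabilities (a stand-in for the CRF posterior), and brute force is used only because the example is tiny. The function names and toy numbers are hypothetical.

```python
import itertools


def p_edge(state, label_a, label_b):
    """Equation (14) for one edge, evaluated at the given binary state."""
    p_true = 0.8 if label_a != label_b else 0.2
    return p_true if state == 1 else 1.0 - p_true


def p_vertex(state, edge_states):
    """Equation (15) for one vertex whose parents are the intersecting edges."""
    p_true = 0.7 if sum(edge_states) > 2 else 0.3
    return p_true if state == 1 else 1.0 - p_true


def layer_joint(y, e, v, region_prob, edges, vertices):
    """P(v | e) P(e | y) P(y | x) for one layer; the constant P(x) is dropped.

    region_prob[i][c] stands in for P(y_i = c | x) (assumed factorized per region);
    edges[j] = (a, b) lists the two parent regions of edge j;
    vertices[t] lists the parent edges of vertex t.
    """
    p = 1.0
    for i, c in enumerate(y):
        p *= region_prob[i][c]
    for j, (a, b) in enumerate(edges):
        p *= p_edge(e[j], y[a], y[b])
    for t, parents in enumerate(vertices):
        p *= p_vertex(v[t], [e[j] for j in parents])
    return p


def mpe_bruteforce(region_prob, edges, vertices):
    """Most probable (y, e, v) by exhaustive enumeration; tiny layers only."""
    n_classes = len(region_prob[0])
    best, best_p = None, -1.0
    for y in itertools.product(range(n_classes), repeat=len(region_prob)):
        for e in itertools.product((0, 1), repeat=len(edges)):
            for v in itertools.product((0, 1), repeat=len(vertices)):
                p = layer_joint(y, e, v, region_prob, edges, vertices)
                if p > best_p:
                    best, best_p = (y, e, v), p
    return best, best_p


# Three regions meeting at one vertex, two classes (toy numbers).
edges = [(0, 1), (1, 2), (0, 2)]
vertices = [[0, 1, 2]]
region_prob = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
print(mpe_bruteforce(region_prob, edges, vertices)[0])  # → ((0, 0, 1), (0, 1, 1), (0,))
```

In the result, the edge between the two same-class regions is switched off, the two edges touching the third region are switched on, and the vertex stays false because only two of its three parent edges are true.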
Note that the constructed semantic pyramid consists of multiple layers, and the causal connections between adjacent layers are modeled by the BN; therefore, the global joint probability P(y, e, v, x) can be expressed as:
$$P(y, e, v, x) = \prod_{\langle u, d \rangle \in \Upsilon} P_u(y, e, v, x)\, P_d(y, e, v, x)\, P(y_d = c_i \mid y_u = c_j) \qquad (20)$$
where $P_u(y, e, v, x)$ and $P_d(y, e, v, x)$ denote the probabilities of the upper and lower layers, respectively, each of which can be factorized as in Equation (19). Given this joint probability, several methods can be used for maximum probability inference. In this paper, Stochastic Local Search (SLS) [32] is employed to perform Most Probable Explanation (MPE) reasoning, and the result is given by:
$$(y^*, e^*, v^*) = \arg\max_{y, e, v} P(y, e, v, x) \qquad (21)$$
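A generic stochastic local search for the MPE problem is sketched below: greedy single-variable changes mixed with random-walk moves and random restarts. It is a common SLS scheme offered for illustration, not necessarily the exact algorithm of [32], and the parameter names and defaults are assumptions. In the framework, the variables would be all region labels, edge states and vertex states across layers, and log_score would be the logarithm of the global joint probability.

```python
import math
import random


def sls_mpe(domains, log_score, n_restarts=20, n_steps=200, p_noise=0.2, seed=0):
    """Stochastic local search for an MPE assignment (generic sketch).

    domains[i] is the number of states of variable i; log_score(assignment)
    returns the log of the unnormalized joint probability. Each step either
    re-samples one random variable (probability p_noise) or makes the
    single-variable change that most increases log_score.
    """
    rng = random.Random(seed)
    best, best_score = None, -math.inf
    for _ in range(n_restarts):
        x = [rng.randrange(k) for k in domains]  # random restart
        score = log_score(x)
        if score > best_score:
            best, best_score = list(x), score
        for _ in range(n_steps):
            if rng.random() < p_noise:  # random-walk move
                i = rng.randrange(len(x))
                x[i] = rng.randrange(domains[i])
                score = log_score(x)
            else:  # greedy move: best single-variable change
                move, move_score = None, score
                for i, k in enumerate(domains):
                    old = x[i]
                    for value in range(k):
                        if value != old:
                            x[i] = value
                            candidate = log_score(x)
                            if candidate > move_score:
                                move, move_score = (i, value), candidate
                    x[i] = old
                if move is None:
                    continue  # local optimum: wait for a random-walk move
                x[move[0]] = move[1]
                score = move_score
            if score > best_score:
                best, best_score = list(x), score
    return best, best_score


# Toy log-score with a local optimum at [0, 0, 1] and the global one at [1, 1, 1].
def toy_log_score(x):
    w = [[0.0, 1.0], [0.5, 0.0], [0.0, 2.0]]
    return sum(w[i][v] for i, v in enumerate(x)) + (1.5 if x[0] == x[1] else 0.0)


print(*sls_mpe([2, 2, 2], toy_log_score))  # → [1, 1, 1] 4.5
```

Pure greedy search started at [0, 0, 0] would stop at [0, 0, 1]; the random-walk moves and restarts are what let the search reach the global optimum.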

Experiment Data
To evaluate the performance of the proposed classification method, experiments are conducted on ESAR and TerraSAR-X images. (1) The ESAR image, shown in Figure 7a, was acquired in Germany; it has a size of 1300 × 1200 pixels and a spatial resolution of 3 m × 2.2 m. (2) The second dataset, shown in Figure 7b, was acquired by TerraSAR-X over Wuhan, Hubei Province, China; its size and spatial resolution are 1500 × 1500 pixels and 1.25 m × 1.25 m, respectively. ArcGIS is used to label the experimental images into five categories, i.e., building, forest, farmland, road and others.

Experiment Settings
As shown in Table 1, three types of features are extracted: intensity, polarization and texture features. The intensity features include Haar and gray-level histogram features; the polarization features consist of the Pauli, SDH and Huynen decompositions; and for texture, a filter bank of 17 Gaussian filters is used. Note that the TerraSAR-X image used in our experiments is single-polarized, so its features contain only intensity and texture.

Five experiments are designed for performance comparison; their settings are as follows.
Experiment 1: classification based on the CRF model alone: First, three different thresholds are selected to construct a region pyramid. For the ESAR image, the thresholds are 0.2, 0.08 and 0.05, giving 1468, 3016 and 6134 regions, respectively. For the TerraSAR-X image, the thresholds are 0.11, 0.08 and 0.06, giving 1898, 3629 and 7034 regions, respectively. Second, half of the regions are used as training data and the rest for testing. Third, the spatial relationships among the regions within each layer of the region pyramid are modeled by the CRF. L-BFGS [33] is applied to estimate the CRF parameters during training, and the min-sum algorithm is used for inference.
Experiment 2: classification based on CRF and boundary prior knowledge: The contour lines are detected by the gDet-OWT-UCM method described in Section 3. To obtain the boundary prior knowledge, only the strong responses in the UCM are retained, as shown in Figure 8. The boundary prior knowledge is calculated by Equation (8), and the corresponding linear weighting parameters are learned during training.
Experiment 3: classification combining CRF with the BN model: The CPT is shown in Table 2, where Classes 1 to 5 correspond to buildings, forests, farmland, roads and others. These conditional probabilities are set according to the results of Experiments 1 and 2. For the ESAR data, for example, the probability 0.15 in Row 2, Column 3 is the classification probability of farmland in the lower layer conditioned on forest in the upper layer.
Experiment 4: classification based on SVM: SVM [34] is used to classify the oversegmented regions; the feature extraction settings are the same as in Experiment 1. Half of the obtained regions are randomly selected as training data and the rest as testing data.
Experiment 5: CRF model with mean shift and Potts prior: To verify the effectiveness of the gDet-OWT-UCM segmentation, a CRF model is built on mean shift segmentation with a Potts prior [35]. The minimum block size is 200 × 200, and 3370 and 3955 regions are obtained for the ESAR and TerraSAR-X data, respectively. The other CRF settings are the same as in Experiment 1.

Results and Analysis
Figures 9 and 10 qualitatively illustrate the classification results of each experiment on ESAR and TerraSAR images.To further quantitatively compare the performance, Tables 3 and 4 list the confusion matrices of the classification results.
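The confusion matrices can be summarized as sketched below. Because the text does not state whether "average accuracy" means overall accuracy or the mean of per-class accuracies, the sketch computes both; it assumes rows are reference classes and columns are predictions, and the 2 × 2 matrix in the example is made-up toy data, not taken from Tables 3 or 4.

```python
def accuracy_report(cm, class_names):
    """Accuracy figures from a confusion matrix.

    cm[i][j] counts samples whose reference class is i and predicted class is j
    (rows = reference data, an assumed orientation).
    Returns overall accuracy, per-class accuracy and mean per-class accuracy.
    """
    total = sum(sum(row) for row in cm)
    overall = sum(cm[i][i] for i in range(len(cm))) / total
    per_class = {
        name: cm[i][i] / sum(cm[i])
        for i, name in enumerate(class_names)
        if sum(cm[i]) > 0  # skip classes absent from the reference data
    }
    mean_class = sum(per_class.values()) / len(per_class)
    return overall, per_class, mean_class


overall, per_class, mean_class = accuracy_report([[8, 2], [3, 27]], ["forest", "road"])
print(overall, per_class, round(mean_class, 3))  # → 0.875 {'forest': 0.8, 'road': 0.9} 0.85
```

The two averages differ whenever class sizes are unbalanced, so it matters which one a reported "average accuracy" refers to.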
(1) For the ESAR data, the average classification accuracy of the CRF model alone [36] is about 70.7%, as shown in Figure 9b. When boundary prior knowledge is incorporated into the CRF model, the average accuracy rises to 72.8%, an improvement of about 2 percentage points, mainly in areas around boundaries, as shown in Figure 9c. The proposed method performs better still, as shown in Figure 9d: compared with the CRF model alone and CRF with boundary prior, its accuracy is improved by about 6 and 3.8 percentage points, respectively. This is attributed to the additional contextual knowledge, namely the causal dependencies between adjacent layers, integrated into the proposed framework. Figure 9e,f presents the results of SVM [34] and of the CRF model based on mean shift [35]; the comparison supports the effectiveness of the segmentation algorithm used in this paper.
(2) Figure 10 displays the results on the TerraSAR-X image. The average classification accuracy of the proposed method is about 81.2%, which is better than that of the CRF model alone (73.0%) and of CRF with boundary prior (76.1%). These results also indicate that incorporating additional prior knowledge, namely the causal connections modeled by BN, enhances classification performance. Moreover, the accuracy of CRF with boundary prior knowledge is about 3 percentage points higher than that of the CRF model alone, and the recognition of sub-regions such as forest and buildings is effectively improved, which supports the effectiveness of incorporating boundary prior knowledge.
(3) Since the causal relationships between adjacent layers and the boundary prior knowledge are both integrated into the proposed method, its computational cost is higher than that of the methods used for comparison.

Discussion
To achieve finer classification results, we incorporate more prior knowledge into the classification framework by constructing a multilayer region pyramid, where CRF and BN model the spatial relationships within layers and the causal connections between adjacent layers, respectively. To further improve accuracy and to apply the proposed method in practice, the following aspects should be considered.
(1) The number of layers for constructing a region pyramid plays an important role in performance enhancement, and how to select the optimal number of layers is an issue that should be further studied.In theory, the more layers we select, the higher the accuracy that will be achieved.However, an optimum selection is intractable because we should ensure the existence of the classification probabilities of those sub-regions conditioned on their parents' regions in the upper layer.Therefore, this issue should be further studied for the enhancement of the performance.(2) There are several hyperparameters in the process of oversegmentation, e.g., the scale for the extraction of local edges, etc.These hyperparameters, which like the concept of receptive field in the deep learning community, have some impact on the classification performance.However, how to select an optimum setting still needs to be addressed.(3) Since there are several parameters that should be learned in our proposed method, a higher computational cost should be paid.Consequently, we should make a tradeoff between the computational cost and classification accuracy, especially for the selection of the number of layers in forming a region pyramid.(4) Combining more prior knowledge with the image data itself is a benefit to the accuracy improvement.Therefore, more prior knowledge is encouraged to be incorporated into this classification framework to further performance enhancement.(5) The overfitting problem should also be considered in the case of insufficient training samples.
In this paper, a uniform distribution is used to model the prior probability. To improve generalization further, other strategies, such as the solution in [37], could be considered.
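Points (1) and (5) both depend on estimating the probability of a sub-region label given its parent label from limited training data. The Python sketch below shows one standard way to do this: counting (parent, child) label pairs with additive (Laplace) smoothing. This smoothing is a technique we substitute for illustration; it is neither the strategy of [37] nor confirmed to be the paper's estimator, and the class names and training pairs are hypothetical.

```python
from collections import Counter


def estimate_cpt(pairs, parent_labels, child_labels, alpha=1.0):
    """Estimate P(child label | parent label) from observed (parent, child) pairs.

    alpha is a symmetric Dirichlet pseudo-count (Laplace smoothing when 1.0).
    With alpha = 0 this is the maximum-likelihood estimate, and a parent label
    never seen in training has no defined row (returned as None).
    """
    counts = Counter(pairs)
    cpt = {}
    for p in parent_labels:
        row = [counts[(p, c)] + alpha for c in child_labels]
        total = sum(row)
        cpt[p] = None if total == 0 else [v / total for v in row]
    return cpt


# Hypothetical training pairs: (upper-layer label, lower-layer label).
PAIRS = [
    ("vegetation", "forest"),
    ("vegetation", "forest"),
    ("vegetation", "farmland"),
    ("urban", "buildings"),
]
PARENTS = ["vegetation", "urban", "water"]
CHILDREN = ["forest", "farmland", "buildings"]

ml = estimate_cpt(PAIRS, PARENTS, CHILDREN, alpha=0.0)
smoothed = estimate_cpt(PAIRS, PARENTS, CHILDREN, alpha=1.0)
```

Without smoothing, the unseen parent class "water" has no conditional distribution at all, which is the existence problem noted in point (1); with alpha = 1 every row is defined. Setting alpha = 1 corresponds to the posterior mean under a uniform Dirichlet prior on each row; whether this matches the uniform prior used in the paper is not stated here.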

Conclusions
This paper has presented a hierarchical classification method for SAR images. In contrast to existing multiscale approaches, which concentrate on exploiting contextual information within each layer, the proposed classification framework explores the causal connections between adjacent layers as prior knowledge to achieve robust and effective classification. To integrate these causalities, a multilayer region pyramid is first constructed by multiscale oversegmentation; then, both the Bayesian Network (BN) and the Conditional Random Field (CRF) are used to model prior knowledge, where CRF models the spatial relationships among regions within each layer and BN captures the causal connections between adjacent layers. Experiments conducted on real SAR images demonstrate the superiority of the proposed method, with the accuracy improved by about 3%.

Figure 1. Comparison between the traditional and the proposed approaches. (a) Traditional single-layer model; (b) traditional multilayer model; CRF models the spatial relationships among the regions within each layer; (c) the proposed method, which combines CRF with BN in a unified framework, where BN captures the causal dependencies between adjacent layers so that more prior knowledge is exploited to guide classification.

Figure 2. Framework of the proposed method. (a) Input SAR image; (b) feature extraction, including intensity, polarization and texture features; (c) edge detection; (d) region pyramid construction based on multiscale segmentation; (e) conditional random field with boundary prior knowledge; (f) the region pyramid, with spatial relationships modeled by CRF; (g) causal connections between adjacent layers, captured by the Bayesian network; (h) semantic pyramid based on the CRF and BN model; (i) classification output.

Figure 3. Multiscale segmentation under three different thresholds. (a) The input SAR image; (b-d) segmentation results obtained by adjusting the threshold k. The lower the threshold, the more sub-regions are produced.
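The caption's observation that a lower threshold yields more sub-regions can be illustrated with a graph-based merging criterion. The Python sketch below assumes a Felzenszwalb-Huttenlocher-style rule (merge two components only if the connecting edge weight does not exceed either component's internal difference plus k divided by its size), applied to a 1-D intensity profile instead of a 2-D SAR image. The paper's segmentation algorithm is not specified in this section, so k here is an analogue of its threshold, not necessarily the same parameter.

```python
def segment_1d(signal, k):
    """Graph-based merging on a 1-D chain using a Felzenszwalb-Huttenlocher-style criterion."""
    n = len(signal)
    parent = list(range(n))
    size = [1] * n
    internal = [0.0] * n  # Int(C): largest edge weight merged into component C

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Edges between neighbouring samples, weighted by absolute intensity difference.
    edges = sorted((abs(signal[i] - signal[i + 1]), i, i + 1) for i in range(n - 1))
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge only if the edge is no larger than either component's
        # internal difference plus the size-dependent tolerance k / |C|.
        if w <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = w  # edges arrive in ascending order, so w is the new maximum
    return [find(i) for i in range(n)]


def count_regions(labels):
    return len(set(labels))


if __name__ == "__main__":
    profile = [0, 0, 1, 1, 10, 10, 11, 11, 30, 30]
    for k in (0.5, 5.0, 100.0):
        print(k, count_regions(segment_1d(profile, k)))
```

On this toy profile the thresholds k = 0.5, 5 and 100 give 5, 3 and 1 regions, respectively: each increase in k lets larger intensity jumps be absorbed, which is the qualitative trend shown in panels (b-d).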

Figure 4. The spatial relationships are modeled by the conditional random field. The image is first partitioned into several regions; then the CRF is used to model the spatial relationships among these regions, where each node y represents a region and x denotes the whole image.
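To make the region-level CRF concrete, the Python sketch below writes one common energy for such a model, a unary cost per region plus a Potts penalty for adjacent regions with different labels, and minimizes it with iterated conditional modes (ICM). The Potts form, the weights and the use of ICM are our assumptions for illustration; the paper's potentials and inference method are not given in this section.

```python
def crf_energy(labels, unary, edges):
    """E(y) = sum_i U_i(y_i) + sum_{(i,j)} w_ij * [y_i != y_j] (Potts pairwise term)."""
    energy = sum(unary[i][y] for i, y in enumerate(labels))
    energy += sum(w for i, j, w in edges if labels[i] != labels[j])
    return energy


def icm(unary, edges, max_iter=20):
    """Iterated conditional modes: update one region at a time to its best label."""
    n = len(unary)
    neighbours = [[] for _ in range(n)]
    for i, j, w in edges:
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))

    def local_cost(i, c, labels):
        # Unary cost of label c plus penalties from neighbours that disagree with it.
        return unary[i][c] + sum(w for j, w in neighbours[i] if labels[j] != c)

    # Start from the unary-only (per-region) decision.
    labels = [min(range(len(u)), key=u.__getitem__) for u in unary]
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            best = min(range(len(unary[i])), key=lambda c: local_cost(i, c, labels))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical example: 4 regions in a chain, 2 classes, unit edge weights.
UNARY = [[0.2, 1.5], [0.9, 0.8], [0.3, 1.2], [1.6, 0.1]]
EDGES = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
```

Starting from the unary-only labels [0, 1, 0, 1] (energy 4.4), ICM switches region 1 to class 0, because agreeing with both neighbours outweighs its slight unary preference, and stops at [0, 0, 0, 1] (energy 2.5). A boundary prior could plausibly be injected by changing w_ij for region pairs that straddle a prior contour, but that form is likewise an assumption.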

Figure 5. The causal connections between adjacent layers are modeled by the Bayesian network. The input image x is first coarsely classified into labels $y^{u}_{1}, y^{u}_{2}, y^{u}_{3}$ in the upper layer; then the regions in the upper layer are finely labeled as $y^{d}_{1}, y^{d}_{2}, y^{d}_{3}, y^{d}_{4}, y^{d}_{5}$ in the lower layer, where the classification probabilities of the sub-regions in the lower layer, conditioned on their parents' regions in the upper layer, are denoted by black squares.
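The caption describes lower-layer labels conditioned on their parent regions. The Python sketch below shows one way such conditional probabilities could be used: marginalize the parent's label distribution through a conditional probability table to obtain a prior for the child, then fuse that prior with the child region's own (e.g., CRF) class probabilities. The product-and-renormalize fusion and all numbers are hypothetical; this section does not specify how the paper combines the two.

```python
def child_prior(parent_post, cpt):
    """P(y_d = c) = sum_p P(y_d = c | y_u = p) * P(y_u = p)."""
    n_child = len(cpt[0])
    return [sum(pp * cpt[p][c] for p, pp in enumerate(parent_post)) for c in range(n_child)]


def fuse(child_evidence, prior):
    """Multiply the child's own class probabilities by the parent-derived prior and renormalise."""
    scores = [e * q for e, q in zip(child_evidence, prior)]
    z = sum(scores)
    return [s / z for s in scores]


# Hypothetical numbers: 2 upper-layer classes, 3 lower-layer classes.
PARENT_POST = [0.7, 0.3]                 # P(y_u) for the parent region
CPT = [[0.6, 0.4, 0.0],                  # P(y_d | y_u = 0)
       [0.0, 0.1, 0.9]]                  # P(y_d | y_u = 1)
CRF_PROBS = [0.3, 0.4, 0.3]              # child region's own class probabilities

prior = child_prior(PARENT_POST, CPT)    # approx. [0.42, 0.31, 0.27]
posterior = fuse(CRF_PROBS, prior)       # approx. [0.381, 0.375, 0.245]
```

In this toy case the child's own probabilities favour class 1, but the parent-derived prior shifts the fused decision to class 0, which is the kind of cross-layer influence the BN is meant to provide.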

Figure 6. The causalities among regions, edges and vertices. An edge is formed by the intersection of regions with different labels, and these regions are the parent nodes of the edge. A vertex is formed by edges, and these associated edges are the parent nodes of the vertex. The BN model constructed here is based on the oversegmentation edge graph, which consists of edges $\{e_j\}_{j=1}^{m}$ and vertices $\{v_t\}_{t=1}^{l}$, where m and l denote the numbers of edges and vertices, respectively. Each edge and vertex takes one of two states, true or false, represented by one and zero, respectively. BN models the causal relationships among the oversegmentation regions $\{y_i\}_{i=1}^{n}$, the edges $\{e_j\}_{j=1}^{m}$ and the vertices $\{v_t\}_{t=1}^{l}$.
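In the BN, edge and vertex states are random variables; the Python sketch below only shows the deterministic configuration they would take for a given region labeling. The edge rule follows the caption (an edge is true when it separates regions with different labels). The vertex rule, true when at least two incident edges are true so that a contour passes through it, is our assumption, as is the toy 2x2 layout.

```python
def edge_states(region_labels, edges):
    """e_j = 1 (true) iff the two regions that meet at edge j carry different labels."""
    return [int(region_labels[a] != region_labels[b]) for a, b in edges]


def vertex_states(e_states, vertex_edges, min_true=2):
    """Assumed rule: v_t = 1 iff at least `min_true` of its incident edges are true."""
    return [int(sum(e_states[j] for j in incident) >= min_true) for incident in vertex_edges]


# Hypothetical 2x2 layout: regions 0 (top-left), 1 (top-right),
# 2 (bottom-right), 3 (bottom-left); four edges meet at one central vertex.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
VERTEX_EDGES = [[0, 1, 2, 3]]

labels = [0, 0, 1, 1]                    # top row class 0, bottom row class 1
e = edge_states(labels, EDGES)           # [0, 1, 0, 1]
v = vertex_states(e, VERTEX_EDGES)       # [1]
```

If all four regions share one label, every edge and the vertex are false, i.e., no boundary passes through this part of the edge graph.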

Figure 8. Contour lines for boundary prior knowledge.

Figure 9 .Figure 10 .
Figure 9. Classification results on the ESAR image. (a) The ground truth; (b-d) results of the CRF model alone, CRF with boundary prior knowledge, and the combined CRF and BN model, respectively; (e,f) results of SVM and of CRF with mean shift.

Table 1. Three types of features extracted for the experiments.

Table 3. Confusion matrix for the classification results on the ESAR data.

Table 4. Confusion matrix for the classification results on the TerraSAR data.
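Tables 3 and 4 report confusion matrices, and the text quotes an "average classification accuracy". This section does not say whether that figure is the mean of per-class accuracies or the overall fraction of correctly classified samples, so the Python sketch below computes both from a confusion matrix. The 2x2 counts are hypothetical and are not taken from Tables 3 or 4.

```python
def accuracies(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.

    Returns (per-class accuracy list, mean per-class accuracy, overall accuracy).
    """
    per_class = [row[i] / sum(row) for i, row in enumerate(cm)]
    mean_per_class = sum(per_class) / len(per_class)
    overall = sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)
    return per_class, mean_per_class, overall


# Hypothetical counts for two classes with unequal sample sizes.
CM = [[80, 20],
      [5, 45]]
per_class, mean_acc, overall_acc = accuracies(CM)  # [0.8, 0.9], 0.85, ~0.833
```

The two measures coincide when every class has the same number of samples and can diverge otherwise, as in this example, so it matters which one a reported average refers to.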