Exploiting SAR Tomography for Supervised Land-Cover Classification

In this paper, we provide the first in-depth evaluation of exploiting Tomographic Synthetic Aperture Radar (TomoSAR) for the task of supervised land-cover classification. Our main contribution is the design of specific TomoSAR features to reach this objective. In particular, we show that classification based on TomoSAR significantly outperforms PolSAR data provided relevant features are extracted from the tomograms. We also provide a comparison of classification results obtained from covariance matrices versus tomogram features as well as obtained by different reference methods, i.e., the traditional Wishart classifier and the more sophisticated Random Forest. Extensive qualitative and quantitative results are shown on a fully polarimetric and multi-baseline dataset from the E-SAR sensor from the German Aerospace Center (DLR).


Introduction
Since a single-channel SAR image does not contain a sufficient amount of information to enable accurate land-cover classification [1], most work focuses on multi-channel SAR consisting of images of multiple polarizations, frequencies, or taken at different times. Those classification approaches can be coarsely divided into two groups depending on whether reference data is used (i.e., supervised) or not (i.e., unsupervised). Unsupervised methods aim to link the measurement within the image pixels to the underlying backscattering process, which is particularly successful for polarimetric SAR (PolSAR) through widely used polarimetric decomposition theorems [2,3]. In many applications, however, user-defined semantic classes are of interest which are (potentially) related but not identical to the physical phenomenology of the electromagnetic backscattering according to which those theorems categorize pixels. This can only be achieved by supervised approaches that map pixel values to semantic labels (such as city, road, forest) instead of physical categories (such as surface, double-bounce, or volume scattering). Such machine-learning-based and data-driven classification methods have shown tremendous progress for the task of land-cover classification. The Wishart classifier [4] is a simple and classical example which can be used in supervised as well as unsupervised classification scenarios [5]. While early approaches are merely based on physical characteristics, later works (e.g., [6][7][8]) showed that the usage of other features leads to improved classification results. Since it is often unknown which features are most informative for a given task, recent works either aim to extract a quasi-exhaustive set as input to a classifier with built-in feature selection (e.g., Random Forest (RF) [9]), or incorporate the feature extraction into the optimization problem of the classifier itself (e.g., by RFs [10] or Convolutional Networks [11,12]). Other examples of multi-dimensional SAR data that have been used for classification purposes are multi-frequency SAR (e.g., in [13,14]), multi-temporal SAR (e.g., [15][16][17]), and the combination of polarimetric and interferometric information (e.g., in [18][19][20]).
Tomographic SAR (TomoSAR, see Section 2) aims at providing a 3-D image of the reflectivity of a scene by creating a synthetic aperture in elevation from a stack of SAR images acquired with slightly different angles to retrieve the backscattering profile within each image pixel [21]. It has been successfully used for the reconstruction of the vertical structure of forest [22,23], ice subsurface imaging [24], and the reconstruction of urban objects (e.g., in [25,26]). While most works focus either on the generation of TomoSAR data or on its geometric processing, e.g., regularization [27], geometric primitive extraction [28,29], or object modelling [30], only a few works address semantic analysis. Recently, TomoSAR descriptors have been proposed in [31] for the classification of forest development stages.
More traditional approaches for land-cover classification focus on polarimetric properties of the measured echo, including features describing its local variation. In particular, targets that have a characteristic backscatter intensity or polarimetric behavior can be successfully distinguished by corresponding methods. However, polarimetric features often depend on the relative orientation between object and sensor (e.g., double-bounce) and can lead to similar signatures of targets belonging to different semantic classes (e.g., volume scattering may be caused by oriented buildings, making them indistinguishable from forest). Tomographic SAR data, on the other hand, provides a rather geometric interpretation of the scene by providing information about the underlying 3-D structure, i.e., number and location of scatterers in the elevation direction. The average height of a region is a highly informative feature to distinguish between classes with very different height levels, e.g., between grassland, fields, and roads on the one hand and forest and urban areas on the other hand. Besides the average height, local fluctuations of the vertical reflectivity can be a helpful feature, even in the case where absolute height differences lie below the vertical resolution of the TomoSAR data.
First studies on the potential of TomoSAR for general land-cover classification have been presented in the conference works [32,33]. This paper builds upon these previous works by proposing a more in-depth evaluation of the results as well as a more detailed presentation of the methodology. We propose features which are adapted to the description of TomoSAR data. They are either based on the tomographic covariance matrix or on the tomograms obtained after tomographic focusing. The descriptive power of those features is evaluated on a land-cover classification task.
Rather than proposing a novel classification pipeline based on TomoSAR data, we aim to investigate whether TomoSAR data have the potential to be used for land-cover classification tasks at all. Consequently, we do not limit the set of features based on prior assumptions but rather aim to analyze a large and diverse feature set. Moreover, we want to study the possibility of using TomoSAR for this task without any polarimetric information, i.e., when no quad- or dual-pol dataset is available. To put the obtained results into context, we compare them to those obtained by either other classification techniques or other data, i.e., polarimetric SAR images.
The paper is structured as follows: Section 2 introduces the basic principles of TomoSAR focusing. Section 3 describes the methodology. Section 4 shows a quantitative evaluation of the proposed features on experimental E-SAR (DLR) data and compares them with polarimetric features. Section 5 provides a more in-depth interpretation and discussion of the results. Finally, Section 6 concludes the paper.

Signal Model
SAR imaging performs a 2-D projection of objects contained in a 3-D space. Thus, the contributions of scatterers located at different elevations in the resolution cell are superposed. Tomographic SAR (TomoSAR) extends traditional interferometric techniques to multiple baselines, which allows the backscattering profile along the elevation direction to be reconstructed. To this aim, several images acquired at different view angles are combined. The principle of TomoSAR is illustrated in Figure 1. Here we consider a simplified signal model describing the superposition of targets along the elevation coordinate $s$ in a complex multi-dimensional pixel $\mathbf{u}$. The $k$-th component $u_k$ is expressed as
$$u_k = \int \gamma(s)\, e^{j \phi_k(s)}\, ds, \tag{1}$$
where $\gamma(s)$ is the complex reflectivity along the elevation, $\phi_k(s)$ is the phase corresponding to the sensor-to-target distance, and $k$ varies from 1 to $K$, where $K$ is the number of images.

3-D Imaging by Tomographic Inversion
In the case of most natural targets such as tree canopy and rough surfaces, it is common to model the reflectivity as the result of many elementary scatterers which are randomly distributed inside the resolution cell. The interference of these many scatterers results in the so-called speckle phenomenon, and the data has to be modelled in a statistical framework. The goal of TomoSAR is to estimate the backscattered power
$$P(s) = E\left[|\gamma(s)|^2\right], \tag{2}$$
where $E$ denotes the expectation operator and $\gamma(s)$ was defined in Equation (1). For this purpose, the Covariance Matrix (CM)
$$\mathbf{C} = E\left[\mathbf{u}\mathbf{u}^\dagger\right], \tag{3}$$
where $\dagger$ represents the Hermitian transpose, has to be computed to perform the tomographic inversion.
In practice $\mathbf{C}$ is unknown and has to be estimated from the image stack by spatial averaging, assuming the ergodicity of the signal in a spatial neighborhood. A theoretical minimal number of looks is required to avoid rank deficiency of the covariance matrix; beyond that, it is common practice to pre-filter the data with a boxcar to reduce the variance of the estimate, at the cost of spatial resolution. To avoid degrading the resolution, specific adaptive filtering methods have been developed. In this paper, we used the boxcar filter for the sake of simplicity and refer the interested reader to [34] for a study on covariance filtering for TomoSAR. Then, from the estimated sample covariance matrix $\hat{\mathbf{C}}$, the power distribution $P(s)$ may be estimated by common array processing techniques such as Fourier beamforming:
$$\hat{P}_B(s) = \frac{\mathbf{a}(s)^\dagger\, \hat{\mathbf{C}}\, \mathbf{a}(s)}{K^2}, \tag{4}$$
where $\mathbf{a}(s)$ is a synthesized steering vector of phasors corresponding to the sensor-target distances. Fourier beamforming is well known to perform poorly with unevenly sampled baselines, and the Capon estimator is often preferred due to its side-lobe cancelling properties:
$$\hat{P}_C(s) = \frac{1}{\mathbf{a}(s)^\dagger\, \hat{\mathbf{C}}^{-1}\, \mathbf{a}(s)}. \tag{5}$$
The interested reader may refer to [35] for a more thorough description of these standard array processing techniques.
In this paper, we have used the Capon estimator to estimate the reflectivity profiles called tomograms. Figure 2 shows a visualization of the obtained results over a whole scene. Although more sophisticated methods such as Compressive Sensing (CS) [36] allow better performance, our goal is not to achieve the best possible reconstruction of P(s) but rather to obtain tomograms which carry a sufficient amount of 3-D information to be able to discriminate the different classes we want to retrieve.
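The estimation chain described in this section (sample covariance, then beamforming or Capon focusing) can be sketched as follows. This is a minimal illustration rather than the processing used in the paper: the linear phase model `exp(j * kz_k * s)` for the steering vector, the vertical wavenumbers `kz`, and the small diagonal loading added before the matrix inversion are all assumptions introduced here.

```python
import numpy as np

def sample_covariance(looks):
    """Sample covariance C = mean(u u^dagger) from looks of shape (L, K)."""
    looks = np.asarray(looks)
    return looks.T @ looks.conj() / looks.shape[0]

def steering_vectors(kz, heights):
    """Phasors a_k(s) = exp(j * kz_k * s); result has shape (len(heights), K).

    Assumes a linear phase model, which is only an approximation of the
    true sensor-to-target phase after deramping.
    """
    return np.exp(1j * np.outer(heights, kz))

def beamforming(cov, A):
    """Fourier beamforming spectrum a^dagger C a / K^2 for every height."""
    K = A.shape[1]
    return np.real(np.einsum("hk,kl,hl->h", A.conj(), cov, A)) / K**2

def capon(cov, A, loading=1e-6):
    """Capon spectrum 1 / (a^dagger C^-1 a) for every height.

    The diagonal loading is an added numerical safeguard, not part of the
    textbook estimator.
    """
    K = cov.shape[0]
    reg = cov + loading * np.trace(cov).real / K * np.eye(K)
    inv = np.linalg.inv(reg)
    return 1.0 / np.real(np.einsum("hk,kl,hl->h", A.conj(), inv, A))
```

In a full pipeline the looks of one pixel would be the samples inside its boxcar window, and the resulting spectrum over the height grid is the tomogram of that pixel.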

Pixel-Based Features
Firstly, we consider features that are calculated from the value of a single pixel. As mentioned previously, the data can be represented either as covariance matrices (CMs) or as the tomograms resulting from tomographic focusing, which contain the estimated backscattered power as a function of the elevation.

Covariance-Based Features
The most straightforward way to obtain a feature vector from a CM is to stack its values into a vector. We use an RF implementation in this work that takes real-valued vectors as input. Therefore, accounting for the Hermitian structure of the CM, the first "raw" feature set is obtained by stacking the real diagonal elements and the real and imaginary parts of the complex elements above the main diagonal.
The elements of the CM can be interpreted in a more physical way: The diagonal elements $C_{ii}$ correspond to the backscattered power in each image, whereas the off-diagonal elements can be decomposed as
$$C_{ik} = \rho_{ik} \sqrt{C_{ii}\, C_{kk}}\; e^{j \Phi_{ik}}, \tag{6}$$
where $\rho_{ik}$ is often called coherence and $\Phi_{ik}$ is the relative phase between image $i$ and image $k$.
The coherence $\rho_{ik}$ is linked to the amount of noise on the phase and, for TomoSAR, depends on the geometric and temporal baselines [38] as well as the intrinsic properties of the imaged scatterers [39]. In the case of TomoSAR data, the relative phase depends on the distances between the targets and the sensor. Its interpretation is not straightforward since the responses of several targets may be combined in a single pixel. However, coherence and phase may be useful to discriminate targets in a classification context. Therefore, we propose to use a vector formed of intensities (i.e., diagonal elements), coherences, and phases for the description of the CM. For comparison, the same features will be extracted from the PolSAR CM (see Section 4.2) and evaluated separately.
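A minimal sketch of the two covariance-based descriptions discussed above: the "raw" real-valued stacking and the physical decomposition into intensities, coherences, and phases. The function names and the small constant `eps` guarding against division by zero are illustrative assumptions.

```python
import numpy as np

def raw_cm_features(C):
    """Real diagonal, then real and imaginary parts of the upper triangle."""
    C = np.asarray(C)
    iu = np.triu_indices(C.shape[0], k=1)
    return np.concatenate([np.real(np.diag(C)), C[iu].real, C[iu].imag])

def physical_cm_features(C, eps=1e-12):
    """Intensities C_ii, coherences rho_ik, and phases Phi_ik for i < k."""
    C = np.asarray(C)
    power = np.real(np.diag(C))
    i, k = np.triu_indices(C.shape[0], k=1)
    # eps is an added safeguard for pixels with zero power.
    coherence = np.abs(C[i, k]) / (np.sqrt(power[i] * power[k]) + eps)
    phase = np.angle(C[i, k])
    return power, coherence, phase
```

For a K x K Hermitian matrix the raw vector has K + K(K - 1) = K² real entries, e.g., 100 values for a stack of 10 images.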

Tomogram-Based Features
Tomograms contain the power distribution along elevation, which is easier for a human to interpret than features extracted from the CM. In this work, we want to investigate the potential of using this information as an input for land-cover classification. As for CMs, we first define a "raw" feature vector simply composed of all the values of the tomogram for the considered pixel. This simple approach may present several flaws: Firstly, the dimension of the vector is large while only a few values may be relevant. This is typically the case for bare soil, where only a few coefficients are nonzero. Secondly, due to the presence of decorrelation (i.e., low coherences), noise may corrupt the tomogram. Consequently, it may be useful to define statistical descriptors which summarize the structure of the tomogram, resulting in a lower dimensional vector. We therefore define a pixel-based feature set containing the following features: Minimum, maximum, and median values of the tomogram reflectivity, centered statistical moments up to order 10, as well as the value and position of the 10 highest peaks (i.e., local maxima) of the tomogram. Additionally, we complement this set with features carrying more complex information: The number of "nonzero" values, i.e., the number of coefficients of the tomogram above an experimentally determined threshold corresponding to the background noise, is extracted. This gives a description of how "dense" the tomogram is. Finally, we extract the Shannon entropy of the normalized tomogram as well as the coefficient of variation $CV = \sigma_t / \mu_t$, where $\sigma_t$ and $\mu_t$ are the standard deviation and mean of the tomogram, respectively.
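The pixel-based tomogram descriptors listed above could be computed along the following lines. This is a hedged sketch: the moment orders (2 to 10), the zero padding when fewer than ten peaks exist, the natural logarithm in the entropy, and the `noise_floor` parameter are assumptions, since the text does not fix these details.

```python
import numpy as np

def tomogram_features(p, heights, noise_floor, n_moments=10, n_peaks=10):
    """Summary descriptors of one tomogram p (positive power per height bin)."""
    p = np.asarray(p, dtype=float)
    heights = np.asarray(heights, dtype=float)
    mu, sigma = p.mean(), p.std()
    feats = {"min": p.min(), "max": p.max(), "median": np.median(p)}
    # Centered moments; order 1 is zero by construction, so start at 2.
    for order in range(2, n_moments + 1):
        feats[f"moment_{order}"] = np.mean((p - mu) ** order)
    # Local maxima: bins strictly larger than both neighbours.
    idx = np.flatnonzero((p[1:-1] > p[:-2]) & (p[1:-1] > p[2:])) + 1
    idx = idx[np.argsort(p[idx])[::-1]][:n_peaks]
    for i in range(n_peaks):
        found = i < idx.size
        feats[f"peak_value_{i}"] = p[idx[i]] if found else 0.0
        feats[f"peak_height_{i}"] = heights[idx[i]] if found else 0.0
    # "Density" of the tomogram: number of bins above the noise level.
    feats["n_above_noise"] = int(np.count_nonzero(p > noise_floor))
    # Shannon entropy of the tomogram normalized to unit sum.
    q = p / p.sum()
    q = q[q > 0]
    feats["entropy"] = float(-np.sum(q * np.log(q)))
    feats["cv"] = sigma / mu
    return feats
```

With these defaults the descriptor has 35 entries per pixel, far fewer than the number of height bins of a raw tomogram.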

Spatial Features
To introduce spatial context into the features, we introduce three types of spatial features: Region descriptors, which describe the properties of 2-D homogeneous areas; textural descriptors, which describe the 2-D texture of the intensity; and 3-D features, which are tailored to describe the local volumetric structure of the tomograms.

Region Features
Due to the random nature of physical media, most of the intensity of a SAR image is affected by speckle. Moreover, several types of decorrelation cause noisy phases, which in turn affect the quality of the tomograms. This results in noisy classification maps. To gain robustness against this effect while preserving the spatial details of the scene, we define a first type of spatial features, where the properties of pixels are averaged inside small 2-D segments. These segments are obtained by applying a superpixel segmentation algorithm to the intensity of the image. Here, we use the SLIC (Simple Linear Iterative Clustering) algorithm [40], which was originally developed for natural grayscale and color images. This algorithm performs a local clustering of the pixel values and spatial coordinates in the neighborhood of "seed" pixels which are sampled on a grid. To use this algorithm with SAR intensities, we first apply to the CM image a multi-dimensional speckle filter called NDSAR-BLF [34], based on the non-local filtering principle. From these filtered CMs, the intensity is computed, and this image is segmented into superpixels. This process is illustrated in Figure 3.
In the following, we take advantage of the superpixel segmentation to define features on image intensity as well as statistics on the tomograms. We have tried several statistics of regions, including ones describing the shape of regions, such as geometric moments, eccentricity, etc. However, these shape features were not as informative as simpler descriptors such as minimum, maximum, average, and standard deviation of the intensity image.
For tomograms, the features are the 2-D average over each region of the values and positions of the four highest peaks of the reflectivity. We also add the maximum, minimum, and median value of the tomogram inside the region.
Although superpixels may miss some edges in the image, the potential losses of spatial resolution are compensated by the combination with the above introduced pixel-based features.
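Given a superpixel label map, the intensity-based region statistics named above (minimum, maximum, average, and standard deviation) can be computed per segment and broadcast back to the pixels. The sketch below assumes that `labels` holds non-negative integer segment indices, e.g., as produced by any SLIC implementation; the segmentation and the speckle filtering themselves are not shown.

```python
import numpy as np

def region_intensity_features(intensity, labels):
    """Per-pixel [min, max, mean, std] of the intensity inside its segment."""
    flat = np.asarray(labels).ravel()
    v = np.asarray(intensity).ravel().astype(float)
    n = int(flat.max()) + 1
    # Guard against unused label indices to avoid division by zero.
    counts = np.maximum(np.bincount(flat, minlength=n), 1)
    mean = np.bincount(flat, weights=v, minlength=n) / counts
    mean_sq = np.bincount(flat, weights=v**2, minlength=n) / counts
    std = np.sqrt(np.maximum(mean_sq - mean**2, 0.0))
    vmin = np.full(n, np.inf)
    vmax = np.full(n, -np.inf)
    np.minimum.at(vmin, flat, v)   # unbuffered per-segment minimum
    np.maximum.at(vmax, flat, v)   # unbuffered per-segment maximum
    per_region = np.stack([vmin, vmax, mean, std], axis=-1)
    return per_region[np.asarray(labels)]
```

The same averaging pattern applies to tomogram quantities such as peak values and positions: compute them per pixel first, then replace `intensity` by each of these maps.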

Texture Features
Most SAR images exhibit spatial patterns which characterize the different classes. This is especially true for heterogeneous environments such as urban areas or forest. To capture these textural variations, we propose to add feature vectors made of intensity values contained in an $N_p \times N_p$ square patch, where $N_p$ represents the azimuth and range dimensions of the patch in pixels. Although classical texture features such as grey-level co-occurrence matrices [41] and Gabor filters [42] are common in classification, we found in our experiments that these simple patches led to better performance in terms of accuracy.
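Extracting such patch features amounts to flattening the neighborhood of every pixel into a vector. The following sketch assumes an odd patch size and reflective padding at the image borders; the border handling is an assumption, as the text does not describe it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def patch_features(intensity, n_p=11):
    """Flattened n_p x n_p intensity patch centred on every pixel.

    Returns an array of shape (rows, cols, n_p * n_p). n_p is assumed odd
    and smaller than twice the image size so that reflection is defined.
    """
    intensity = np.asarray(intensity)
    pad = n_p // 2
    padded = np.pad(intensity, pad, mode="reflect")
    windows = sliding_window_view(padded, (n_p, n_p))
    rows, cols = intensity.shape
    return windows.reshape(rows, cols, n_p * n_p)
```

An 11 x 11 patch yields 121 features per pixel, so the reshaped array can become large for full scenes; processing the image in tiles is one way to bound memory.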

3-D Descriptors
As illustrated in Figure 2, the 3-D reflectivity image produced by all the tomograms contains rich information about the spatial structure of the different classes. Exploiting this information is more difficult than for 2-D images due to the higher dimensionality of the data and the sparsity of the tomograms. In fact, for some classes such as fields and streets, only a few coefficients of the tomograms are relevant. Therefore, considering all the possible cubic patches around each voxel in the 3-D image as a feature would lead to an inefficient description and would require too much memory. Thus, it is necessary to find descriptors that summarize the local neighborhood information for each pixel. We first tried to undersample patches on a grid to reduce the dimension of the descriptors and to average the values along the elevation, leading to an "average texture" descriptor. However, this did not lead to an improvement of the classification performance. Similarly, gradient-based descriptors such as the 3-D gradient structure tensor [43] did not increase the performance. Therefore, we have opted for geometric moments as a descriptor:
$$M_{pqr} = \sum_{x} \sum_{y} \sum_{z} x^p\, y^q\, z^r\, P(x, y, z), \tag{7}$$
where $x$, $y$, $z$ represent the azimuth, range, and elevation coordinates of a voxel, $P(x, y, z)$ is the tomogram reflectivity at that voxel, and the summation is performed in an $N_m \times N_m \times H$ sliding window, where $N_m$ represents the azimuth and range dimensions of the patch in pixels and $H$ is the dimension in voxels along the elevation direction of the tomogram.
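As an illustration of geometric moments on a local tomographic volume, the sketch below weights the reflectivity of each voxel by powers of its local coordinates and sums over the window. The coordinate normalization to [-1, 1], the maximum total order `max_order`, and the restriction to interior pixels are assumptions made for this example and are not specified in the text.

```python
import numpy as np

def geometric_moments(volume, row, col, n_m=11, max_order=2):
    """Moments sum(x^p y^q z^r P) over an n_m x n_m x H window.

    volume has shape (rows, cols, H); (row, col) must lie at least
    n_m // 2 pixels away from the image border.
    """
    half = n_m // 2
    rows, cols, _ = volume.shape
    if not (half <= row < rows - half and half <= col < cols - half):
        raise ValueError("window exceeds the image border")
    win = volume[row - half:row + half + 1, col - half:col + half + 1, :]
    # Local coordinates scaled to [-1, 1] to keep high orders bounded.
    x = np.linspace(-1.0, 1.0, win.shape[0])
    y = np.linspace(-1.0, 1.0, win.shape[1])
    z = np.linspace(-1.0, 1.0, win.shape[2])
    moments = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            for r in range(max_order + 1 - p - q):
                moments[(p, q, r)] = float(
                    np.einsum("x,y,z,xyz->", x**p, y**q, z**r, win)
                )
    return moments
```

With `max_order=2` this gives ten values per pixel, which is a compact summary compared with storing the full window.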

Random Forest Classifier
In this work we consider the widely successful Random Forest (RF) classifier for our task. The principle of RF is to combine the output of a large number of simple decision trees to increase the robustness and the generalization power of these classifiers. Each tree is created by performing a certain number of data splits in the feature space. Each split is performed by first sampling a subset of features at random and selecting the dimension leading to the best split in terms of purity of the partition. The original implementation of RF considers the minimization of the Gini impurity measure [44]. The trees stop growing when a user-defined minimum number of samples per leaf is reached. The final classification is obtained by averaging the posterior probabilities for each class over all trees and selecting the class leading to the maximum posterior for each data point.
This method has a certain number of advantages compared to other classifiers: Increasing the number of trees does not lead to overfitting the data. Since the splits are done according to a single feature, they are relatively fast to compute. The number of parameters to tune is low, and the method does not require a complex optimization procedure which may lead to numerical instability. Since dimensions are considered independently, the method is insensitive to feature rescaling. Finally, RFs intrinsically perform feature selection, thus discarding irrelevant features.
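The node-splitting step described above can be sketched in a few lines: a random subset of features is drawn, and the threshold minimizing the weighted Gini impurity of the two children is kept. This is a simplified illustration of a single split, not a full RF implementation; candidate thresholds at midpoints between sorted unique values are an assumption.

```python
import numpy as np

def gini(labels, n_classes):
    """Gini impurity 1 - sum_c p_c^2 of an integer label array."""
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels, minlength=n_classes) / labels.size
    return float(1.0 - np.sum(p**2))

def best_split(X, y, n_classes, n_candidates, rng):
    """Best (feature, threshold, impurity) among randomly drawn features."""
    n, d = X.shape
    best = (None, None, np.inf)
    features = rng.choice(d, size=min(n_candidates, d), replace=False)
    for f in features:
        values = np.unique(X[:, f])
        # Candidate thresholds: midpoints between consecutive values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, f] <= t
            score = (left.sum() * gini(y[left], n_classes)
                     + (~left).sum() * gini(y[~left], n_classes)) / n
            if score < best[2]:
                best = (int(f), float(t), float(score))
    return best
```

Because each candidate split compares one feature against one threshold, rescaling a feature only moves the threshold, which is why the classifier is insensitive to feature scaling.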

Data
Our experiments have been performed over the L-band POLTOM dataset from the E-SAR sensor (DLR), acquired over the site of Oberpfaffenhofen in 1998. This dataset has 14 fully polarimetric images. The resolution in range and azimuth is 2.3 m and 1 m, respectively. The pixel spacing is approximately 1.5 m in range and 0.5 m in azimuth.
For our tomographic experiment, we used only 10 images. This limits the dimension of the covariance matrix and the initial number of looks required to avoid numerical instability in tomographic focusing. We have chosen to use the VV channel for our tomographic study. Although this study could be repeated on other individual channels, visual inspection of the tomograms showed that VV was a good tradeoff since the scattering intensities of the volume components such as tree canopy and the surface components of buildings and ground were both well represented. A presumming of 3 lines in azimuth has been applied to obtain approximately square pixels. Furthermore, a 5 × 5 boxcar filter has been applied to compute the sample covariance matrix, in order to get full-rank matrices and reduce the level of noise in the tomograms at the cost of a loss of spatial resolution. For comparison, PolSAR CMs have been computed with the same amount of multilooking for a single polarimetric image.
According to [21,45], the theoretical vertical resolution of the full dataset should be around 3 m, considering a range distance of about 4500 m with an effective baseline of 185 m. Since we have used only the first 10 tracks, this baseline was reduced to approximately 126 m and the corresponding vertical resolution is approximately 4 m. The horizontal separation of the tracks was approximately 20 m. However, this separation varies along the azimuth dimension since perfectly parallel tracks could not be achieved in practice. The data has been deramped [21] to remove topographic phases. The Capon method (Equation (5)) has been applied over the estimated CM using the VV polarization channel.
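The quoted resolutions can be checked with the usual Rayleigh expression for a synthetic aperture in elevation, resolution = wavelength × range / (2 × baseline span). The sketch below assumes an L-band center frequency of 1.3 GHz, which is not stated in the text; the range and baseline values are those given above.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def elevation_resolution(wavelength, slant_range, baseline_span):
    """Rayleigh resolution of a synthetic aperture in elevation (metres)."""
    return wavelength * slant_range / (2.0 * baseline_span)

# Assumed L-band centre frequency of 1.3 GHz (not given in the text).
wavelength = SPEED_OF_LIGHT / 1.3e9

full = elevation_resolution(wavelength, 4500.0, 185.0)     # all tracks
reduced = elevation_resolution(wavelength, 4500.0, 126.0)  # first 10 tracks
print(f"full stack: {full:.1f} m, 10 tracks: {reduced:.1f} m")
```

Under this assumption the two values come out at roughly 2.8 m and 4.1 m, in line with the approximate figures of 3 m and 4 m.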
Reference data has been obtained by manual labelling. Five classes are considered: City, Field, Forest, Shrubland, and Street. A subset representing approximately 26% of the reference data has been used to train the classifier. This reference data is shown along with an optical image of the area and the corresponding SAR intensity in Figure 4. For each subset of features, the number of trees for RFs has been set to 80, which was the maximum allowed on our desktop PC with 24 GB of RAM.

Description of the Feature Subsets
To evaluate the influence of the different features on the classification, we have defined several subsets for each type of data. The subsets TCM1 and PCM1 are composed of only the diagonal elements of the tomographic and polarimetric covariance matrix, respectively. Subset TGR1 is composed of the pixel-based features described in Section 3.1.2. Subsets TCM2 and PCM2 add to TCM1 and PCM1 the normalized cross-correlation coefficients of the TomoSAR and PolSAR CM, respectively. PCM3 adds the phases of the off-diagonal elements for the PolSAR covariance. In the case of TomoSAR data, subsets containing phases are considered separately, for reasons that will be discussed in the subsequent section. TGR2, TCM3, and PCM4 combine the pixel-based features of TGR1, TCM2, and PCM3 with the region-based features of intensity described in Section 3.2.1. These last subsets are combined with the patch features described in Section 3.2.2 to form TGR3, TCM4, and PCM5. The subset TGR3 is combined with region features of tomograms (see Section 3.2.1) to form subset TGR4. Finally, the 3-D features of Section 3.2.3 are added to TGR4 to form TGR5. The spatial sizes of patches for texture and 3-D moments are set to 11 × 11 pixels in azimuth and range. All features are summarized in Table 1.
Table 1. Summary of the different feature groups based on polarimetric and tomographic covariance matrices, i.e., PCMx and TCMx, respectively, as well as on tomograms, i.e., TGRx.

Tomographic CM
raw     Real and imaginary part of all elements of the TomoSAR CM.
TCM1    Diagonal elements of the TomoSAR CM (Section 3.1.1).
TCM2    TCM1 + Coherences extracted from the TomoSAR CM (Section 3.1.1).
TCM3    TCM2 + Region-based features of intensity (Section 3.2.1).
TCM4    TCM3 + Patch features (Section 3.2.2).

Results
Tables 2-4 summarize the quantitative results obtained with the previously described feature sets. For each feature subset, we display the per-class accuracies, the overall accuracy (OA), which is the percentage of correctly classified pixels, and the balanced accuracy (BA), which is the average of the per-class accuracies. BA tends to be lower than OA because it weights classes equally, independently of their number of samples. In our case, classes such as Field and Forest, which are relatively easy to classify, tend to dominate OA because of their large extent. BA better reflects the influence of challenging classes such as City and Shrubland. When only intensity (i.e., diagonal elements) or raw data are used, PolSAR classification leads to better results than TomoSAR for both CM and tomogram features. However, with the help of off-diagonal information, TomoSAR CM features outperform PolSAR. Table 5 shows the influence of adding the interferometric phases to the different feature sets, resulting in the alternate subsets TCM2b, TCM3b, and TCM4b. It is worth noting that adding phases to TomoSAR CM features has a slightly negative impact on classification, whereas the contrary holds for PolSAR data (compare Tables 2, 3, and 5). Using only pixel-based features for tomograms leads to poor accuracies. Nevertheless, when combined with spatial features, tomogram classification leads to the best OA and BA. All data types, i.e., PolSAR CM, TomoSAR CM, and tomogram, largely profit from introducing spatial features. Adding only four intensity-based region features to PolSAR CM, TomoSAR CM, and tomograms leads to jumps of roughly 8% (PCM3 to PCM4), 7% (TCM2 to TCM3), and 12% (TGR1 to TGR2) of BA. 2-D patch features have a higher impact on tomogram features. Furthermore, adding tomogram region features and 3-D tomogram descriptors leads to a BA of 90.6%, which is the best score of all feature subsets. In particular, challenging classes such as City, Street, and Shrubland, which were not well handled by single-pixel features, reach their highest accuracy when combining tomograms and spatial features. It is also noteworthy that designing hand-crafted features is critical for TomoSAR CM and tomogram-based classification. In the case of tomograms, the gain in performance between TGR5 and raw data is more than 17% in BA, whereas the number of features drops from 200 to 186.
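The two summary scores used in this section follow directly from a confusion matrix. The sketch below assumes that rows index the reference class and columns the predicted class; the example numbers are made up purely to show why BA drops below OA when a small class is poorly classified.

```python
import numpy as np

def accuracies(confusion):
    """Per-class accuracy, overall accuracy (OA), and balanced accuracy (BA).

    confusion[i, j] counts reference-class-i pixels predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    per_class = np.diag(confusion) / confusion.sum(axis=1)
    oa = np.trace(confusion) / confusion.sum()
    ba = per_class.mean()
    return per_class, oa, ba

# Illustrative, made-up counts: a large easy class and a small hard class.
per_class, oa, ba = accuracies([[90, 10], [5, 5]])
print(per_class, round(oa, 3), round(ba, 3))
```

In this toy case the large class dominates OA, while BA averages the two per-class accuracies with equal weight.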
Table 6 shows the results obtained with the supervised Wishart classifier [46], which is a traditional method adapted to the statistics of the CM. As expected, the Wishart classifier underperforms RF for both PolSAR and TomoSAR CM. This is due to the fact that this method assumes that each class is unimodal, i.e., it is generated by a model assuming a single mean CM. Thus, Wishart classification is unable to describe the complexity of classes. On the contrary, RFs are non-parametric and allow a much more flexible modelling of the data. Figures 5-7 show classification maps obtained for all data types for raw data, pixel-based, and best feature sets. For comparison, Figure 8 shows the results of applying supervised Wishart classification. Wishart performs poorly on heterogeneous classes such as Forest and City for PolSAR data, due to its inability to describe multi-modal distributions. In the case of TomoSAR, the Field class is poorly classified. This is probably due to the effects of baseline decorrelation, which is not uniform across the image and thus substantially changes the statistics of the CM. In the case of the raw data, PolSAR classification seems visually more correct than the TomoSAR CM and tomogram ones, especially on the Street class. However, it can be observed that the TomoSAR CM seems to lead to a more homogeneous classification over the City. The same observation can be made for the pixel-based features. TomoSAR CM pixel-based features lead to a better classification of large isolated buildings. It is well known that PolSAR does not allow a good characterization of buildings whose alignment strongly deviates from the azimuth direction. The misclassification of the Street class by TomoSAR is also probably due to the strong dependency of coherences on the range coordinate. This effect is strongly mitigated by the introduction of spatial features, especially in the case of tomograms, which profit from both 2-D intensity and 3-D tomographic descriptors. Finally, Figure 10 shows the importance of features, i.e., the added impurity decrease during training for each individual feature. This reflects how frequently each feature was used by the RF classifier to split the data space. It can be observed that many texture features are extremely frequently used by RFs. For PolSAR CM, the intensity values have higher importance than off-diagonal elements. For TomoSAR CM, coherences have a higher relative importance compared to diagonal elements. For tomograms, some simple single-pixel features such as median, mean, and minimum value have an extremely high importance. Interestingly, the values of peaks seem to be more informative than their positions.
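To make the unimodality assumption of the supervised Wishart classifier concrete, the sketch below uses the commonly stated Wishart distance d_m(C) = ln det(Σ_m) + tr(Σ_m⁻¹ C), where Σ_m is the mean CM of the training samples of class m. This is a generic textbook form given under that assumption, not necessarily the exact variant of [46].

```python
import numpy as np

def class_centers(cms, labels, n_classes):
    """Mean covariance matrix per class; cms has shape (N, K, K)."""
    return np.stack([cms[labels == m].mean(axis=0) for m in range(n_classes)])

def wishart_classify(cms, centers):
    """Assign each CM to the class with the smallest Wishart distance."""
    # log-determinant of every class centre, shape (M,)
    _, logdet = np.linalg.slogdet(centers)
    inv = np.linalg.inv(centers)
    # tr(inv_m @ C_n) for every sample n and class m, shape (N, M)
    trace = np.real(np.einsum("mij,nji->nm", inv, cms))
    return np.argmin(np.real(logdet)[None, :] + trace, axis=1)
```

Each class is summarized by a single matrix, so a class whose statistics change across the image (for example with range-dependent coherence) cannot be represented well, which is consistent with the behavior discussed above.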

Discussion
The overall observation that can be made regarding these experimental results is that it is possible to achieve land-cover classification with good performance from TomoSAR data without the help of any polarimetric information. Using multi-baseline data with a single polarization is therefore an interesting alternative in situations where polarimetric data is not available. Our study also shows that performance comparable to classification from PolSAR may be directly obtained from the multi-baseline CM. However, better results were obtained from the tomograms, at the cost of a careful step of feature design, to be able to capture the intensity variations present along the elevation direction.
Considering CM-based features, the results have shown the importance of interferometric coherences as a feature. In fact, the intensities of the TomoSAR CM are not expected to carry as much information as the PolSAR ones because they all arise from the same polarimetric channel, although from a slightly different view angle. The relevance of coherences as features tends to confirm the findings of previous works such as [39,47,48], which exploit the dependency of interferometric coherence on the type of target. However, simple methods assuming a unimodal data distribution, such as Wishart, fail to capture the possible variations of class statistics across the image, which are more likely to occur on multi-baseline datasets due to a loss of correlation in the near range area and variation of the vertical resolution depending on slant range. To illustrate the previous observations, Figure 11 shows examples of coherences for two different baselines. Nevertheless, the RF classifier achieved better performance than Wishart on both PolSAR and TomoSAR CM, due to its ability to handle multi-modal distributions. When using RF in combination with coherence features, TomoSAR CM classification consistently outperformed PolSAR. Therefore, the issue of range-dependent decorrelation related to multi-baseline images can be overcome by using a more sophisticated classifier that is able to model multi-modal distributions.
For the tomogram-based features, which represent the main contribution of this work, our first experiments showed that simply using the raw tomograms did not lead to the best classification accuracies. Therefore, we focused on hand-crafting features that summarize the intensity information along elevation carried by the tomogram. For SAR imaging, statistical descriptors are a natural choice because they can model the possibly complex distribution of the data [49]. Statistical features are also robust to noise, which in TomoSAR is mainly due to decorrelation effects and baseline errors. As shown by recent works, non-physically motivated features can achieve excellent results in supervised approaches [10]. We also designed features with a more straightforward physical interpretation, such as the position of the peaks, which represents the location of the dominant scatterers in height. We found that hand-crafting features drastically improved the performance of the tomogram-based classification. However, it is difficult to provide a physical interpretation of results obtained from a high-dimensional feature space. For this reason, we studied the feature importance provided by the RF classifier trained with the full feature set. This showed that features from both categories (physically motivated and statistical) were equally important. For example, the intensity values of the tomogram peaks contributed roughly as much as the region and 3-D descriptors. Overall, the results showed that the geometric and backscattering information contained in the tomograms improved the performance over CM classification.

One important aspect to consider is the vertical resolution, which mainly depends on the range distance and the effective baseline [21]. In our study, we chose to use only 10 images, limiting the resolution to 4 m. This was sufficient to obtain tomographic imaging of the buildings and forests present in the scene. With such a resolution, the classes expected to benefit from the vertical information contained in tomograms were City, Forest and Shrubland. This was confirmed by our experiments, which showed a significant accuracy increase compared to PolSAR and TomoSAR CM-based classification. With this resolution, it was likely that the subtle differences in the vertical structure of fields and roads could not be resolved. Indeed, the accuracies for these classes with tomogram features were very similar to the ones obtained with PolSAR features. It is therefore reasonable to conclude that TomoSAR is suitable for distinguishing classes with strong variation in the vertical reflectivity, but we would not recommend it for crop classification, or at least not with this vertical resolution. However, it may be noted that, thanks to the reflectivity information of the tomogram, tomographic features were sufficient to obtain reasonable accuracies even for classes whose variation in vertical structure cannot be resolved.
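
The idea of summarizing the intensity along elevation can be illustrated with a minimal sketch. It computes a few statistical descriptors and the dominant peak of a single elevation profile. The function name `profile_features`, the choice of moments and the simple local-maximum rule are illustrative assumptions, not the feature set evaluated in this paper. Plain Python is used to keep the example dependency-free; in practice the computation would be vectorized over all pixels.

```python
import math


def profile_features(intensity, heights):
    """Summarize one tomogram elevation profile (hypothetical helper).

    intensity: non-negative reflectivity samples along elevation.
    heights:   elevation (in metres) of each sample, same length.
    """
    n = len(intensity)
    if n != len(heights) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")

    # Statistical descriptors of the intensity distribution.
    mean = sum(intensity) / n
    var = sum((v - mean) ** 2 for v in intensity) / n
    std = math.sqrt(var)
    if std > 0:
        skew = sum((v - mean) ** 3 for v in intensity) / (n * std ** 3)
        kurt = sum((v - mean) ** 4 for v in intensity) / (n * std ** 4)
    else:
        skew = kurt = 0.0  # flat profile: higher moments are undefined

    # Physically motivated descriptors: interior local maxima only,
    # so a maximum at the first or last sample is not counted as a peak.
    peaks = [
        i for i in range(1, n - 1)
        if intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]
    ]
    # Fall back to the global maximum when no interior peak exists.
    candidates = peaks if peaks else range(n)
    main = max(candidates, key=lambda i: intensity[i])

    return {
        "mean": mean,
        "std": std,
        "skewness": skew,
        "kurtosis": kurt,
        "n_peaks": len(peaks),
        "peak_height": heights[main],
        "peak_intensity": intensity[main],
    }


# Synthetic profile with two scatterers, sampled every 4 m (made-up values).
example = profile_features(
    [0.1, 0.2, 1.0, 0.3, 0.1, 0.6, 0.2],
    [-12.0, -8.0, -4.0, 0.0, 4.0, 8.0, 12.0],
)
```

The peak height and peak intensity correspond to the location and strength of the dominant scatterer, while the moments describe the overall shape of the profile without requiring a physical model.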

Conclusions
In this paper, we have explored the potential of using SAR tomography for supervised land-cover classification. Our experiments have shown that, provided carefully hand-crafted features are designed, TomoSAR-based classification leads to excellent performance and outperforms PolSAR-based classification. We have studied the use of both CM and tomograms for classification and introduced several hand-crafted features adapted to each type of data. Our experiments showed that TomoSAR-based classification reaches much better accuracy when spatial features are introduced. This was especially true for tomograms: introducing region-based and 3-D specific descriptors dramatically increased both overall and balanced accuracies.
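
The two summary scores mentioned above can be illustrated with a short sketch. It assumes that balanced accuracy is the mean of the per-class recalls, which is the usual definition but is an assumption here; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def overall_and_balanced_accuracy(y_true, y_pred):
    """Return (overall accuracy, balanced accuracy) for two label sequences.

    Balanced accuracy is taken as the unweighted mean of per-class recall,
    computed over the classes that occur in y_true (assumed definition).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    total = Counter(y_true)   # number of reference pixels per class
    correct = Counter()       # correctly classified pixels per class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct[truth] += 1

    overall = sum(correct.values()) / len(y_true)
    balanced = sum(correct[c] / total[c] for c in total) / len(total)
    return overall, balanced


# Toy example with a dominant class (made-up labels).
reference = ["Forest"] * 8 + ["City"] * 2
predicted = ["Forest"] * 8 + ["Forest", "City"]
overall, balanced = overall_and_balanced_accuracy(reference, predicted)
```

With imbalanced classes the overall accuracy is dominated by the largest class, whereas the balanced accuracy weights every class equally, which is why both are worth reporting.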
The results confirm the initial hypothesis that tomographic information is particularly helpful for classes that have a distinct 3-D structure, such as City, Forest, and Shrubland. Classes that can be well approximated by a 2-D geometry, such as Street and Field, do not benefit from the corresponding 3-D features. Interestingly, though, their classification accuracy is still on par with results obtained using polarimetric information if proper TomoSAR features are extracted.
Future work may consider the joint use of polarimetric and tomographic data. As feature hand-crafting requires a lot of trial and error, we would also like to consider methods that automatically extract features from the data. Other types of classifiers, such as artificial neural networks, should also be considered. Such a study could be performed on spaceborne images once data from future spaceborne missions such as Tandem-L (DLR) are available. We would also like to study the influence of the focusing method, the number of tracks, the frequency band, the types of classes and the accuracy of the DEM on classification performance.

Figure 1. Illustration of the principle of TomoSAR imaging. Here, r and s denote the slant range and elevation coordinates in the radar reference system. All the scatterer contributions contained in a pixel centered around a particular azimuth-range position are superposed. TomoSAR uses several acquisitions at slightly different incidence angles to reconstruct the reflectivity profile along the elevation direction s.

Figure 2. Result of tomographic inversion for multiple pixels. (a) Average intensity of an L-band E-SAR (DLR) image stack for a manually cropped area of size 918 × 929 pixels; (b) rendering [37] of the 3-D reflectivity image obtained by Capon focusing; (c) color scale of this rendered image, where alpha is the opacity. Each color and opacity represents a range of intensities. This figure is best seen in the digital version of the paper.

Figure 4. Visualization of the optical image of the area, the SAR intensity image, the manually labelled reference data and the subset used to train the RF classifier. (a) Optical image; (b) SAR intensity; (c) Reference labels; (d) Training subset.

Figure 9.
Figure 9 shows the influence of the different feature subsets on tomogram-based classification. While the City classification is very noisy when using pixel-based features, it becomes more homogeneous when introducing region features over intensity. Introducing 2-D patches has a strong effect on Street classification, while introducing tomogram-based region features increases the homogeneity of isolated buildings.

Figure 11. Example of interferometric coherence for (a) a small and (b) a large baseline, helping the interpretation of the Wishart classification results for TomoSAR CM. On the one hand, the coherence is an informative feature for TomoSAR CM classification. On the other hand, baseline decorrelation causes a strong loss of coherence, especially in the near range. The Wishart classifier is sensitive to this effect due to its unimodal assumption, whereas RF can handle this variation.

Table 2. Summary of the classification results (in percent) for PolSAR CM features. Best accuracies are indicated in bold font.

Table 3. Summary of the classification results (in percent) for TomoSAR CM features without phase. Best accuracies are indicated in bold font.

Table 4. Summary of the classification results (in percent) for tomogram features. Best accuracies are indicated in bold font.

Table 5. Summary of the classification results (in percent) with TomoSAR CM for feature sets including phase. Best accuracies are indicated in bold font.

Table 6. Summary of the classification results for supervised Wishart classification. Best accuracies are indicated in bold font.