1. Introduction
Since a single-channel SAR image does not contain sufficient information to enable accurate land-cover classification [1], most work focuses on multi-channel SAR consisting of images of multiple polarizations, frequencies, or acquisition times. These classification approaches can be coarsely divided into two groups depending on whether reference data is used (i.e., supervised) or not (i.e., unsupervised). Unsupervised methods aim to link the measurement within the image pixels to the underlying backscattering process, which is particularly successful for polarimetric SAR (PolSAR) through widely used polarimetric decomposition theorems [2,3]. In many applications, however, user-defined semantic classes are of interest, which are (potentially) related but not identical to the physical phenomenology of the electromagnetic backscattering according to which those theorems categorize pixels. This can only be achieved by supervised approaches that map pixel values to semantic labels (such as city, road, forest) instead of physical categories (such as surface, double-bounce, or volume scattering). Such machine-learning-based, data-driven classification methods have shown tremendous progress for the task of land-cover classification. The Wishart classifier [4] is a simple and classical example which can be used in supervised as well as unsupervised classification scenarios [5]. While early approaches are based solely on physical characteristics, later works (e.g., [6,7,8]) showed that the use of other features leads to improved classification results. Since it is often unknown which features are most informative for a given task, recent works either aim to extract a quasi-exhaustive feature set as input to a classifier with built-in feature selection (e.g., Random Forest (RF) [9]), or incorporate the feature extraction into the optimization problem of the classifier itself (e.g., by RFs [10] or Convolutional Networks [11,12]). Other examples of multi-dimensional SAR data that have been used for classification purposes are multi-frequency SAR (e.g., in [13,14]), multi-temporal SAR (e.g., [15,16,17]), and the combination of polarimetric and interferometric information (e.g., in [18,19,20]).
Tomographic SAR (TomoSAR, see Section 2) aims at providing a 3-D image of the reflectivity of a scene by creating a synthetic aperture in elevation from a stack of SAR images acquired with slightly different angles to retrieve the backscattering profile within each image pixel [21]. It has been successfully used for the reconstruction of the vertical structure of forest [22,23], ice subsurface imaging [24], and the reconstruction of urban objects (e.g., in [25,26]). While most works focus either on the generation of TomoSAR data or on its geometric processing, e.g., regularization [27], geometric primitive extraction [28,29], or object modelling [30], only a few works address semantic analysis. Recently, TomoSAR descriptors have been proposed in [31] for the classification of forest development stages.
More traditional approaches for land-cover classification focus on polarimetric properties of the measured echo, including features describing its local variation. In particular, targets that have a characteristic backscatter intensity or polarimetric behavior can be successfully distinguished by corresponding methods. However, polarimetric features often depend on the relative orientation between object and sensor (e.g., double-bounce) and can lead to similar signatures for targets belonging to different semantic classes (e.g., volume scattering may be caused by oriented buildings, making them indistinguishable from forest). Tomographic SAR data, on the other hand, offers a more geometric interpretation of the scene by providing information about the underlying 3-D structure, i.e., the number and location of scatterers in the elevation direction. The average height of a region is clearly a highly informative feature for distinguishing between classes with very different height levels, e.g., between grassland, fields, and roads on the one hand and forest and urban areas on the other. Besides the average height, local fluctuations of the vertical reflectivity can be a helpful feature, even when absolute height differences lie below the vertical resolution of the TomoSAR data.
First studies on the potential of TomoSAR for general land-cover classification were presented in the conference works [32,33]. This paper builds upon these previous works by providing a more in-depth evaluation of the results as well as a more detailed presentation of the methodology. We propose features that are adapted to the description of TomoSAR data. They are based either on the tomographic covariance matrix or on the tomograms obtained after tomographic focusing. The descriptive power of those features is evaluated on a land-cover classification task.
Rather than proposing a novel classification pipeline based on TomoSAR data, we aim to investigate whether TomoSAR data have the potential to be used for land-cover classification tasks at all. Consequently, we do not limit the set of features based on prior assumptions but rather analyze a large and diverse feature set. Moreover, we want to study the possibility of using TomoSAR for this task without any polarimetric information, i.e., when no quad- or dual-pol dataset is available. To put the obtained results into context, we compare them to those obtained with other classification techniques or other data, i.e., polarimetric SAR images.
The paper is structured as follows: Section 2 introduces the basic principles of TomoSAR focusing. Section 3 describes the methodology used. Section 4 shows a quantitative evaluation of the proposed features on experimental E-SAR (DLR) data and compares them with polarimetric features. Section 5 provides a more in-depth interpretation and discussion of the results. Finally, Section 6 concludes the paper.
5. Discussion
The overall observation from these experimental results is that land-cover classification with good performance can be achieved from TomoSAR data without the help of any polarimetric information. Multi-baseline data with a single polarization is therefore an interesting alternative in situations where polarimetric data is not available. Our study also shows that performance comparable to classification from PolSAR may be obtained directly from the multi-baseline covariance matrix (CM). However, better results were obtained from the tomograms, at the cost of a careful feature-design step needed to capture the intensity variations along the elevation direction.
Considering CM-based features, our experiments showed the importance of interferometric coherences as features. In fact, the intensities of the TomoSAR CM are not expected to carry as much information as the PolSAR ones because they all arise from the same polarimetric channel, although from slightly different view angles. This tends to confirm the findings of previous works such as [39,47,48], which exploit the dependency of interferometric coherence on the type of target. However, simple methods that assume a unimodal data distribution, such as Wishart, fail to capture the possible variations of class statistics across the image, which are more likely to occur in multi-baseline datasets due to a loss of correlation in the near-range area and the variation of the vertical resolution with slant range. To illustrate these observations, Figure 11 shows examples of coherences for two different baselines. Nevertheless, the RF classifier achieved better performance than Wishart on both PolSAR and TomoSAR CM, due to its ability to handle multi-modal distributions. When using RF in combination with coherence features, TomoSAR CM classification consistently outperformed PolSAR. Therefore, the issue of range-dependent decorrelation related to multi-baseline images can be overcome by using a more sophisticated classifier that is able to model multi-modal distributions.
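As a rough illustration of the coherence features discussed above, the following sketch estimates the magnitude of the interferometric coherence between two co-registered single-look complex images using a sliding boxcar window. The function names (boxcar, coherence), the window size, and the edge padding are assumptions made for this example and are not taken from the processing chain used in the experiments.

```python
import numpy as np

def boxcar(x, win):
    """Mean of x over a win x win sliding window (win odd, edge-padded)."""
    pad = win // 2
    xp = np.pad(x, pad, mode="edge")
    rows, cols = x.shape
    out = np.zeros(x.shape, dtype=xp.dtype)
    # Accumulate all shifted copies of the padded image.
    for dr in range(win):
        for dc in range(win):
            out += xp[dr:dr + rows, dc:dc + cols]
    return out / (win * win)

def coherence(s1, s2, win=5):
    """Sample coherence magnitude |<s1 s2*>| / sqrt(<|s1|^2> <|s2|^2>)."""
    cross = boxcar(s1 * np.conj(s2), win)   # averaged interferogram
    p1 = boxcar(np.abs(s1) ** 2, win)       # averaged intensity, image 1
    p2 = boxcar(np.abs(s2) ** 2, win)       # averaged intensity, image 2
    # Small constant avoids division by zero in empty areas (assumption).
    return np.abs(cross) / (np.sqrt(p1 * p2) + 1e-12)
```

Calling coherence(s1, s2) for each pair of tracks in the stack would give one coherence map per baseline, which could then be stacked as per-pixel features. Note that the sample coherence is biased towards higher values for small windows.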
Considering the tomogram-based features, which represent the main contribution of this work, our first experiments showed that simply using the raw tomograms did not lead to the best classification accuracies. Therefore, we focused on hand-crafting features that summarize the intensity information along elevation carried by the tomogram. For SAR imaging, it is a natural choice to consider statistical descriptors, which allow the possibly complex distribution of the data to be modelled [49]. A property of statistical features is that they are robust to noise. In TomoSAR, this noise is mainly due to decorrelation effects and baseline errors. As shown by recent works, non-physically motivated features can achieve excellent results in supervised approaches [10]. We also designed features that have a more straightforward physical interpretation, such as the position of the peaks, which represent the location of the dominant scatterers in height. We found that hand-crafting features drastically improved the performance of the tomogram-based classification. However, it is difficult to provide a physical interpretation of results obtained from a high-dimensional feature space. For this reason, we studied the feature importance provided by the RF classifier with the full feature set. This showed that features belonging to both categories (physically motivated and statistical) were of equally high importance. For example, the intensity values of the tomogram peaks contributed roughly as much as region and 3-D descriptors. Overall, the results showed that the geometric and backscattering information contained in the tomograms improved the performance over CM classification.
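To make the two kinds of tomogram features more concrete, the sketch below summarizes a single vertical intensity profile by a few statistical descriptors and by the position and intensity of its dominant peak. The function name profile_features and the particular set of descriptors are hypothetical choices for illustration and do not reproduce the exact feature set evaluated in this work.

```python
import numpy as np

def profile_features(profile, heights):
    """Summarize one vertical intensity profile of a tomogram.

    profile : non-negative intensities along elevation (not all zero)
    heights : height of each sample in metres (same length)
    """
    p = np.asarray(profile, dtype=float)
    z = np.asarray(heights, dtype=float)
    k = int(np.argmax(p))            # index of the dominant scatterer
    w = p / p.sum()                  # profile normalized to unit sum
    centroid = float(np.sum(w * z))  # intensity-weighted mean height
    spread = float(np.sqrt(np.sum(w * (z - centroid) ** 2)))
    return {
        # statistical descriptors of the intensity values
        "mean_intensity": float(p.mean()),
        "std_intensity": float(p.std()),
        # physically motivated descriptors
        "peak_height": float(z[k]),
        "peak_intensity": float(p[k]),
        "centroid_height": centroid,
        "height_spread": spread,
    }
```

Applied to every pixel of a tomogram, such a function would yield one feature vector per pixel. Region-based variants could be obtained by averaging the profiles over a local neighbourhood before calling it.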
One important aspect to consider is the vertical resolution, which mainly depends on the range distance and the effective baseline [21]. In our study, we chose to use only 10 images, limiting the resolution to 4 m. This was sufficient to obtain tomographic imaging of buildings and forests present in the scene. With such a resolution, the classes expected to benefit from the vertical information contained in tomograms were City, Forest and Shrubland. This was confirmed by our experiments, which showed a significant accuracy increase compared to PolSAR and TomoSAR CM-based classification. With this resolution, it was likely that the subtle differences in the vertical structure of fields and roads could not be resolved. Indeed, the accuracies of these classes with tomogram features were very similar to the ones obtained with PolSAR features. It is therefore reasonable to conclude that TomoSAR is suitable for distinguishing classes with strong variation in the vertical reflectivity, but we would not recommend it for crop classification, or at least not with this vertical resolution. However, thanks to the reflectivity information of the tomogram, tomographic features were sufficient to obtain reasonable accuracies even for classes whose variation in vertical structure cannot be resolved.
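The dependence of the resolution on range and baseline can be illustrated with the commonly used Rayleigh approximation for repeat-pass acquisitions, in which the elevation resolution is the wavelength times the slant range divided by twice the total perpendicular baseline span. The sketch below assumes this approximation, and the numerical values in the usage note are purely illustrative and are not the acquisition parameters of the dataset used here.

```python
import math

def elevation_resolution(wavelength, slant_range, baseline_span):
    """Rayleigh resolution (m) perpendicular to the line of sight.

    Assumes repeat-pass geometry: lambda * r / (2 * baseline_span),
    where baseline_span is the total perpendicular baseline extent.
    """
    return wavelength * slant_range / (2.0 * baseline_span)

def height_resolution(wavelength, slant_range, baseline_span, incidence_deg):
    """Elevation resolution projected onto the vertical axis (m)."""
    rho_s = elevation_resolution(wavelength, slant_range, baseline_span)
    return rho_s * math.sin(math.radians(incidence_deg))
```

For instance, with a hypothetical wavelength of 0.2 m, a slant range of 6000 m and a baseline span of 120 m, elevation_resolution gives about 5 m. Doubling the slant range doubles this value, which is consistent with the variation of the vertical resolution with slant range mentioned earlier.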
6. Conclusions
In this paper, we have explored the potential of using SAR tomography for supervised land-cover classification. Our experiments have shown that, provided that features are carefully hand-crafted, TomoSAR-based classification leads to excellent performance and outperforms PolSAR-based classification. We have studied the use of both CM and tomograms for classification and introduced several hand-crafted features adapted to each type of data. Our experiments showed that TomoSAR-based classification leads to much better accuracy when spatial features are introduced. This was especially true in the case of tomograms. Introducing region-based and 3-D specific descriptors for tomograms dramatically increased both overall and balanced accuracies.
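For readers less familiar with the two accuracy measures, the following sketch computes them from a confusion matrix. It assumes the usual definitions (overall accuracy as the fraction of correctly labelled samples, balanced accuracy as the mean of the per-class recalls) and a matrix whose rows are reference classes and whose columns are predicted classes. The function names are chosen for this example only.

```python
import numpy as np

def overall_accuracy(cm):
    """Fraction of all samples that were assigned the correct class."""
    cm = np.asarray(cm, dtype=float)
    return float(np.trace(cm) / cm.sum())

def balanced_accuracy(cm):
    """Mean per-class recall; rows = reference, columns = prediction.

    Assumes every reference class has at least one sample.
    """
    cm = np.asarray(cm, dtype=float)
    recall = np.diag(cm) / cm.sum(axis=1)
    return float(recall.mean())
```

With a hypothetical two-class matrix [[90, 10], [5, 5]], the overall accuracy is 95/110 (about 0.86), whereas the balanced accuracy is only 0.70, because the small second class is recognized half of the time. This is why both measures are worth reporting when class sizes differ strongly.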
The results confirm the initial hypothesis that tomographic information is particularly helpful for classes that have a distinct 3-D structure, such as City, Forest, and Shrubland. Classes that can be well approximated by a 2-D geometry, such as Street and Field, do not benefit from the corresponding 3-D features. Interestingly, though, their classification accuracy is still on par with results obtained by using polarimetric information if proper TomoSAR features are extracted.
Future work may consider the joint use of polarimetric and tomographic data. As feature hand-crafting requires a lot of trial and error, we would also like to consider methods that automatically extract features from the data. Other types of classifiers, such as artificial neural networks, should also be considered. Such a study could be performed on spaceborne images when data from future spaceborne missions such as Tandem-L (DLR) becomes available. We would also like to study the influence of the focusing method as well as the number of tracks, the frequency band, the types of classes, and the accuracy of the DEM on classification performance.