As an important means of long-term Earth surface monitoring and large-scale land cover information acquisition, synthetic aperture radar (SAR) has become increasingly important in land cover classification [1], natural disaster prevention [3], target recognition [4], and urban observation [6]. A growing number of SAR systems, e.g., RadarSAT-2, TerraSAR-X, and Gaofen-3 [2], can acquire polarimetric SAR (PolSAR) images and provide more information about terrain targets and land cover [9] by emitting and receiving fully polarized radar waves. Classification methods that identify land cover information from PolSAR data have therefore been widely studied. Generally, PolSAR land cover classification algorithms can be divided into three categories: statistical model-based algorithms [10], scattering mechanism analysis-based algorithms [7], and algorithms that combine machine-learning classifiers with polarimetric features [14] or PolSAR data [16]. Many works focus on the third category, and increasingly sophisticated and efficient supervised learning [14] and deep learning algorithms [17] have been proposed to interpret PolSAR land cover information.
Although some traditional supervised learning and deep learning algorithms for PolSAR classification achieve promising results, they fail to recognize unseen classes that are not included in the training data. That is, if a land cover class that never appeared in the training stage occurs in the test stage, these algorithms cannot learn or infer it, which may limit their practical application [20]. On the other hand, remote sensing has traditionally enjoyed an abundance of data, but obtaining label information has always been an important bottleneck in classification studies [21]. For PolSAR interpretation in particular, correct and sufficient labeled samples are usually rare, and labeling them requires professional knowledge. In short, labeling samples in PolSAR imagery is difficult, expensive, and time-consuming [23]. Given the large volume of PolSAR data, the urgent need for PolSAR interpretation, and the fact that labeled data are not always available, it is especially necessary to develop PolSAR interpretation methods that can identify unseen targets or land cover.
Zero-shot learning (ZSL) aims to recognize objects whose instances have not been seen during training [24] by leveraging intermediate semantic information, which could significantly extend the ability of machine learning to handle practical problems [27]. With its capability of transferring semantic knowledge, ZSL can be regarded as a good complement to conventional supervised learning [28]. ZSL has therefore received much attention recently in computer vision research [27] and has obtained promising results for identifying unseen samples in several standard datasets [24], including the Animals with Attributes dataset (AwA) [28], the CUB-200-2011 Birds dataset (CUB) [27], the aPascal and aYahoo datasets [26], and ImageNet [24]. Typical intermediate semantic information, such as visual attributes [27] or Word2Vec vector representations [22], is shared by both seen and unseen classes [28]. Thus, ZSL can learn to recognize new unseen classes that have no training samples by relating them to seen classes that were previously learned.
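As a minimal illustration of how shared semantic vectors can relate an unseen class to seen classes, the following sketch computes pairwise cosine similarity between class vectors. The four-dimensional vectors, the class names, and the helper `cosine_similarity_matrix` are hypothetical and chosen only for illustration; real Word2Vec vectors have far more dimensions, and this is not necessarily the similarity measure used in the works cited above.

```python
import numpy as np

def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    return unit @ unit.T

# Hypothetical 4-d class embeddings (assumption: toy values, not real Word2Vec output).
class_names = ["urban", "water", "forest", "rural"]
class_vectors = np.array([
    [0.9, 0.1, 0.0, 0.2],  # urban  (seen)
    [0.0, 0.9, 0.1, 0.0],  # water  (seen)
    [0.1, 0.1, 0.9, 0.1],  # forest (seen)
    [0.7, 0.0, 0.3, 0.4],  # rural  (unseen)
])
seen_idx = [0, 1, 2]
unseen_idx = [3]

sim = cosine_similarity_matrix(class_vectors)

# For each unseen class, find the most semantically similar seen class.
nearest_seen = {
    class_names[u]: class_names[seen_idx[int(np.argmax(sim[u, seen_idx]))]]
    for u in unseen_idx
}
print(nearest_seen)
```

With these toy vectors, the unseen class is related most strongly to the seen class whose vector points in a similar direction, which is the kind of link ZSL exploits.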
However, although ZSL has achieved promising results on several standard datasets, few articles have reported its application to remote sensing interpretation. Li et al. [22] applied ZSL to the classification of high spatial resolution remote sensing images and obtained an average accuracy of about 58% on the UC Merced land use dataset with an unseen/seen ratio of 5/16 (16 classes for training and the other five classes for testing). Song et al. [32] applied ZSL to SAR target recognition on the MSTAR dataset, where samples of seven targets were used for training and the eighth target type was used for testing. Sumbul et al. [21] studied the ZSL problem for fine-grained street tree recognition in aerial data and achieved a recognition accuracy of 14.3% with an unseen/seen ratio of 16/24. Moreover, ZSL methods have been studied in an unrealistic setting in which the test data are assumed to come from unseen classes only [33]. Because seen classes are generally more common than unseen ones, it is unrealistic to assume that seen classes will never be encountered during the test stage. In a real scenario, although the types of labeled training classes are limited, the test samples usually contain both seen and unseen samples. This problem is known as generalized zero-shot learning (GZSL) [24] and is considered a more challenging problem setting [33]. GZSL relaxes the unrealistic assumption in conventional ZSL that test data belong only to unseen novel classes. In GZSL, test data may also come from seen classes, and the label space is the union of the unseen and seen classes [33].
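The difference between the two label spaces can be shown with a small sketch. It assumes that some model has already produced a compatibility score for every class; the score values and the helper `predict` are hypothetical. ZSL restricts the argmax to the unseen classes, whereas GZSL searches the union of seen and unseen classes.

```python
import numpy as np

def predict(scores: np.ndarray, candidate_classes) -> np.ndarray:
    """Return, for each sample, the candidate class with the highest score."""
    candidate_classes = np.asarray(candidate_classes)
    best = np.argmax(scores[:, candidate_classes], axis=1)
    return candidate_classes[best]

seen = [0, 1, 2]   # classes with training samples
unseen = [3]       # class without training samples

# Hypothetical scores: rows are test samples, columns are classes 0..3.
scores = np.array([
    [0.8, 0.1, 0.2, 0.5],  # sample that truly belongs to seen class 0
    [0.2, 0.1, 0.3, 0.6],  # sample that truly belongs to unseen class 3
])

zsl_pred = predict(scores, unseen)          # search space: unseen classes only
gzsl_pred = predict(scores, seen + unseen)  # search space: seen and unseen classes
print(zsl_pred, gzsl_pred)
```

Under the ZSL assumption the first sample is forced into the unseen class, while the GZSL search space allows it to be assigned to its seen class.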
As mentioned above, obtaining correct and sufficient labeled samples in PolSAR imagery is difficult, expensive, and time-consuming. For some obvious land cover classes, such as urban areas, forest lands, and water areas, labeled samples are easy to obtain, but labeled samples of rural areas, wetland, and grasslands are relatively difficult to obtain. Furthermore, a common situation in PolSAR land cover classification is that only samples of urban areas and water areas are available, whereas the actual test data contain more land cover categories and require more detailed category information. On the other hand, PolSAR imagery contains rich scene information that can be described by semantic relationships; for example, both forest lands and grasslands exhibit certain surface scattering characteristics. Therefore, GZSL can provide a more practical solution for PolSAR interpretation by recognizing new land cover categories without labeled samples, whereas conventional supervised learning and common deep learning algorithms fail to categorize unseen instances. This paper focuses on GZSL for PolSAR land cover classification because we consider GZSL more practical than ZSL for this task.
The main challenges for GZSL in PolSAR land cover classification are the intermediate semantic representation of PolSAR land cover classes and the inference of whether a test sample belongs to a seen or an unseen class. Attributes often refer to well-known common semantic characteristics of objects and can be acquired by human annotation [21] or from neural language models such as Word2Vec and GloVe [35]. The attributes commonly used in traditional ZSL are mainly designed for natural images or ground-level images. Whether the related annotated attributes and neural language models are suitable for the semantic representation of PolSAR land cover classes needs to be further verified. The inference process of GZSL is often more complicated than that of ZSL, since the search space in ZSL inference is limited to the unseen classes, whereas in GZSL inference it is the union of the seen and unseen classes. Recent work [24] has shown that the accuracies of most ZSL approaches drop significantly in the GZSL setting, in which the test samples contain both seen and unseen samples. To address these two challenges, the following steps have been designed and conducted. Firstly, semantic attributes from the Word2Vec semantic vectors [35], the SUN attributes [36], and the selected SUN attributes are collected and evaluated to describe the characteristics of typical PolSAR classes, and the semantic relevance between attributes is obtained to relate unseen and seen classes. Then, the projection relationship between the mid-level representation of PolSAR data samples and the class attributes is established by a latent embedding model [37] during the training stage. Finally, the label of every test instance is predicted through the GZSL model constructed from the polarization feature representation, the projection relationship, and the semantic relevance. Semantic relevance is used to constrain and amend the scores between the unseen and seen classes in this prediction process. Even though some test instances do not have training samples backing them, the inference process in the proposed framework is determined by the polarization feature, the projection relationship, and the semantic relevance. The proposed method has the following contributions and advantages:
The suitability of the available semantic attributes for describing PolSAR land cover classes has been evaluated, including the Word2Vec semantic vectors, the SUN attributes, and the selected SUN attributes.
By utilizing the rich polarization features and semantic information in PolSAR imagery, the proposed GZSL framework provides a more practical solution for PolSAR interpretation: it can classify new land cover categories without labeled samples, which reduces the requirement for sample labeling and gives the framework the ability to identify new types in PolSAR land cover classification.
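The idea of learning a projection from features to class attributes on seen classes and then predicting over seen and unseen classes can be sketched as follows. This sketch deliberately swaps in plain ridge regression with nearest-attribute assignment as a simple stand-in; it is not the latent embedding model of [37], and it omits the semantic-relevance score correction. The attribute vectors, the synthetic feature generator `sample`, the matrix `M`, and the regularization value are all hypothetical.

```python
import numpy as np

def fit_projection(X: np.ndarray, A: np.ndarray, lam: float = 0.1) -> np.ndarray:
    """Ridge regression: W minimizing ||X W - A||^2 + lam * ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ A)

def predict_classes(X: np.ndarray, W: np.ndarray, class_attrs: np.ndarray) -> np.ndarray:
    """Assign each sample to the class whose attribute vector is nearest to its projection."""
    projected = X @ W  # shape (n_samples, n_attributes)
    dists = np.linalg.norm(projected[:, None, :] - class_attrs[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

rng = np.random.default_rng(0)

# Hypothetical attribute vectors: classes 0-2 are seen, class 3 is unseen.
class_attrs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.6, 0.6, 0.0],
])
# Hypothetical linear link between attributes and 5-d features (synthetic data only).
M = np.array([
    [1.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
])

def sample(labels) -> np.ndarray:
    """Generate synthetic features for the given class labels with small noise."""
    labels = np.asarray(labels)
    noise = 0.01 * rng.standard_normal((labels.size, M.shape[1]))
    return class_attrs[labels] @ M + noise

# Train on seen classes only.
train_labels = np.repeat([0, 1, 2], 20)
W = fit_projection(sample(train_labels), class_attrs[train_labels])

# Test on seen and unseen classes; the search space is all four classes.
test_labels = np.repeat([0, 1, 2, 3], 10)
pred = predict_classes(sample(test_labels), W, class_attrs)
accuracy = float(np.mean(pred == test_labels))
print(accuracy)
```

The synthetic data are constructed so that features depend linearly on attributes, which is why the unseen class is recoverable here; real PolSAR features would not satisfy this assumption exactly.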
To the best of our knowledge, a GZSL framework for PolSAR land cover classification has not been studied in the remote sensing literature, even though it is a highly probable scenario that new land cover categories are introduced after the training stage or that no training examples exist for several rare classes that are still of interest. The remainder of the paper is structured as follows. Section 2 briefly introduces related work on the ZSL and GZSL frameworks and on intermediate semantic information. Section 3 presents the workflow and implementation of the proposed GZSL method for PolSAR land cover classification in detail. Section 4 presents and evaluates the results obtained by applying the proposed framework to PolSAR land cover classification. The capabilities and limitations are discussed in Section 5. Finally, the conclusions are presented in Section 6.
Since the Flevoland data and Wuhan Data1 contain few land cover classes, only one class is unseen in the above experiments; that is, the ratio of training classes to testing classes is 3/4. To further illustrate the effectiveness of the proposed GZSL method, we conducted the following experiment on Wuhan Data2, shown in Figure 8a, which contains richer land cover classes. The land cover in this dataset mainly consists of six types: urban areas, water, forest lands, rural areas, wetland, and agricultural land.
For the RadarSAT-2 Wuhan Data2, the training samples include urban areas, water, and forest lands, and the testing samples include all six types mentioned above; that is, the unseen classes are rural areas, wetland, and agricultural land. The PolSAR data are first divided into 50 × 50 pixel patches with a 50% overlap rate, which yields 30,441 test samples in Wuhan Data2. The number of samples for each training class is 100, and the labeled samples of all three classes are selected randomly. The samples of these three classes can be the same as those used for Wuhan Data1. The results for the six test classes obtained by the proposed GZSL method are shown in Figure 8b. The semantic attributes adopted here mainly come from the Word2Vec model, because the SUN attributes database has no exact category for ‘wetland’ or ‘agricultural land’, and its attributes for ‘rural areas’ do not correspond exactly to this class.
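The patch division described above (50 × 50 pixel patches with 50% overlap, i.e., a stride of 25 pixels) can be sketched as follows. The function name `extract_patches`, the image size, and the nine-channel layout are hypothetical, and dropping incomplete patches at the image border is an assumption, since the text does not state how borders are handled.

```python
import numpy as np

def extract_patches(image: np.ndarray, size: int = 50, overlap: float = 0.5) -> np.ndarray:
    """Slide a size x size window over the image with the given overlap rate.

    Assumption: windows that would extend past the image border are dropped.
    """
    stride = max(1, int(size * (1 - overlap)))  # 50% overlap of 50 px -> stride 25
    rows = range(0, image.shape[0] - size + 1, stride)
    cols = range(0, image.shape[1] - size + 1, stride)
    return np.stack([image[r:r + size, c:c + size] for r in rows for c in cols])

# Hypothetical PolSAR feature image: 100 x 150 pixels, 9 real-valued channels.
image = np.zeros((100, 150, 9), dtype=np.float32)
patches = extract_patches(image)
print(patches.shape)
```

With a stride of 25, neighboring patches share half of their pixels, so the number of patches is roughly four times that of a non-overlapping tiling.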
From Figure 8b, it can be seen that the rural areas and agricultural land were basically recognized simultaneously, although neither of these classes has training samples. ROI_1 and ROI_2 correspond to the rural areas and agricultural land (most of which contain some low vegetation). The corresponding optical images of the ROIs from Google Earth are shown in Figure 9a,c, and typical field investigation images in the selected ROIs are shown in Figure 9b,d. Compared with the reference images in Figure 9, the classification results for ROI_1 and ROI_2 in Figure 8b are effective to a certain degree. Moreover, the ratio of training classes to testing classes is 3/6, which also shows that the proposed method is effective and has a certain extensibility. In the above experiments, there is a certain semantic relevance between the unseen classes and the seen classes, such as between rural areas and urban areas and between wetland and water. Thus, GZSL can classify some new land cover categories without labeled samples, even though the ratio of training classes to testing classes is 3/6.
Experiments on the above three RadarSAT-2 PolSAR datasets show that the proposed method can classify 4–6 testing classes with only three training classes; that is, the unseen/seen class ratio ranges from 1/3 to 3/3. To the best of our knowledge, there are few works on ZSL in remote sensing interpretation, and GZSL for PolSAR land cover classification has not been studied in the remote sensing literature. We therefore compare against the unseen/seen class ratio settings of related works that apply ZSL to remote sensing interpretation. For the ZSL target recognition problem in SAR images in [32], the unseen/seen ratio is 1/7, that is, seven classes for training and only one class for testing. For the ZSL classification problem in high spatial resolution images in [22], the average accuracy was about 58% with an unseen/seen ratio of 5/16. Another ZSL problem, in [21], is fine-grained street tree recognition in aerial data; the recognition accuracy was about 14.3% with an unseen/seen ratio of 16/24. In our work, following the basic practical requirements of PolSAR interpretation, the unseen/seen ratio was 1/3–3/3, that is, the training/testing class ratio was 3/4–3/6. Because only a small number of training samples were selected in the experiments (less than 1.9% of the samples, i.e., 300/15,996, 300/20,805, and 300/30,441, were randomly selected for training), the accuracy of our experimental results is not yet very high. However, we consider our method feasible and effective for classifying new land cover categories without labeled samples in PolSAR imagery with about 8-m resolution. The GZSL framework can reduce the requirement for sample labeling and give the framework the ability to identify new types in PolSAR land cover classification.
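The ratios quoted above can be checked with a few lines of arithmetic. The sample counts and class counts are taken from the text; the variable names are only illustrative.

```python
# 3 seen classes x 100 labeled samples per class
train_samples = 3 * 100

# Sample counts of the three datasets as quoted in the text
sample_counts = [15996, 20805, 30441]

# Share of samples used for training, in percent
train_fraction_pct = [100 * train_samples / n for n in sample_counts]

# (unseen, seen) class counts for the smallest and largest settings
unseen_seen = [(1, 3), (3, 3)]

# Corresponding (training classes, testing classes) pairs in the GZSL setting,
# where the test set contains both seen and unseen classes
train_test_classes = [(seen, seen + unseen) for unseen, seen in unseen_seen]

for n, pct in zip(sample_counts, train_fraction_pct):
    print(f"{train_samples}/{n} = {pct:.2f}%")
print(train_test_classes)
```

The largest training share is the one for the smallest dataset, which is the value behind the "less than 1.9%" statement.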
In subsequent research, several topics need to be further studied. Firstly, a more professional semantic description of the land cover classes or targets in PolSAR imagery should be developed, including the scattering characteristics, resolution, polarization mode, incident angle, season, and other information. Secondly, semantic modeling methods and tools for this semantic description need to be further developed. A promising semantic modeling method is the ontological semantic model [4], which will be the focus of our next research step. Thirdly, although ZSL and GZSL are important topics in computer vision research, their evaluation has not yet been standardized; that is, there is no agreed-upon ZSL or GZSL benchmark [24]. Thus, the potential application of ZSL and GZSL to SAR image interpretation can be further explored.
For PolSAR land cover classification, it is a highly probable scenario that new land cover categories are introduced after the training stage, or that no training examples are available for several rare classes of interest. Inspired by generalized zero-shot learning (GZSL), which can categorize instances from both seen and previously unseen classes, this paper studies the problem of classifying both unseen and seen land cover classes in PolSAR images under a semantically expressed GZSL framework. By leveraging the rich semantic relevance between land cover attributes in PolSAR imagery, the semantic relevance between attributes is first obtained to relate unseen and seen classes. Then, the projection relationship between the effective mid-level polarization features and the class attributes is established by latent embedding during training. Finally, the label of every test instance is predicted through the GZSL model constructed from the mid-level polarization feature, the projection relationship, and the semantic relevance, even though some test instances do not have training samples backing them. Quantitative and qualitative evaluations of experiments on the three RadarSAT-2 datasets show that the classification accuracy of an unseen class is about 73% if there are some semantically related seen classes in the training stage. Additionally, the proposed method can classify 4–6 testing classes with only three training classes. This GZSL framework can reduce the requirement for sample labeling and give the framework the ability to identify new types in PolSAR land cover classification. Moreover, three kinds of land cover class attributes, namely the Word2Vec semantic vectors, the SUN attributes, and the selected SUN attributes, have been applied and evaluated.
Since the currently employed semantic attributes lack some polarimetric semantic expressions, such as resolution, scattering characteristics, incident angle, and season information, the classification ability of the proposed GZSL method is still relatively limited. In the future, we expect to continue improving and developing our research on semantic modeling of land cover and targets in SAR images.