A Generalized Zero-Shot Learning Framework for PolSAR Land Cover Classification

Most supervised classification methods for polarimetric synthetic aperture radar (PolSAR) data rely on abundant labeled samples and cannot categorize or infer land cover classes for which no training samples are available. To categorize instances from both seen and unseen classes simultaneously, we propose a generalized zero-shot learning (GZSL)-based PolSAR land cover classification framework. Semantic attributes are first collected to describe the characteristics of typical land cover types in PolSAR images, and semantic relevance between attributes is established to relate unseen and seen classes. Via latent embedding, the projection between mid-level polarimetric features and the semantic attributes of each land cover class is obtained during the training stage. The GZSL model for PolSAR data is then constructed from the mid-level polarimetric features, the projection relationship, and the semantic relevance. Finally, the labels of the test instances can be predicted, even for unseen classes. Experiments on three real RadarSAT-2 PolSAR datasets show that the proposed framework can classify both seen and unseen land cover classes with a limited number of training classes, which reduces the requirement for labeled samples. The classification accuracy of the unseen land cover class reaches about 73% if semantic relevance exists during the training stage.


• The suitability of the available semantic attributes for describing PolSAR land cover classes has been evaluated, including Word2Vec semantic vectors, SUN attributes, and selected SUN attributes.
• By utilizing the rich polarization features and semantic information in PolSAR imagery, the proposed GZSL framework provides a more practical solution for PolSAR interpretation: it can classify new land cover categories without labeled samples, which reduces the requirement for sample labeling and enables the framework to identify new types in PolSAR land cover classification.
To the best of our knowledge, GZSL for PolSAR land cover classification has not been studied in the remote sensing literature, even though the scenario it addresses is highly probable in practice: new land cover categories may be introduced after the training stage, or no training examples may exist for some rare classes that are still of interest. The remainder of the paper is structured as follows. Section 2 briefly introduces related work on the ZSL and GZSL frameworks and on intermediate semantic information. Section 3 presents the workflow and implementation of the proposed GZSL method for PolSAR land cover classification in detail. Section 4 presents and evaluates the results obtained by applying the proposed framework to PolSAR land cover classification. Section 5 discusses its capabilities and limitations, and Section 6 presents the conclusions.

Related Work
In this section, a brief overview of the previous studies related to the ZSL and GZSL framework is provided, followed by a short introduction to intermediate semantic information.

From Zero-Shot Learning to Generalized Zero-Shot Learning
Zero-shot learning is an attractive new task that has recently attracted increasing attention [22,30,35]. By leveraging semantic information, ZSL makes it possible to recognize new categories without acquiring training examples for them beforehand. A conventional ZSL framework typically has three main parts: image feature extraction, intermediate semantic information of the training and test classes, and a semantic embedding W [22,31,38], as illustrated in Figure 1a. We denote the label space of the seen classes in the training stage as $C_S = \{1, 2, \ldots, p\}$ and the label space of the unseen classes in the test stage as $C_U = \{p+1, p+2, \ldots, p+q\}$; in the ZSL framework, $C_S \cap C_U = \emptyset$. The ZSL framework proposed by Lampert et al. [39] first linked attribute annotations, which encode human prior knowledge, with low-level features extracted from images. ZSL approaches can typically be divided into approaches based on independent semantics and approaches based on semantic embedding [29,39]. Approaches based on independent semantics learn an independent classifier per semantic attribute [29,31]. Owing to their simplicity, they became widely popular; examples include direct attribute prediction [39], semantic auto-encoder [26], and latent embedding [37]. Approaches based on semantic embedding instead use a label embedding function $\phi$ to map each class $C_i$ into a vector $\phi(C_i)$ in the attribute space. Semantic embedding based methods such as indirect attribute prediction [39], label embedding [38], and semantic similarity embedding [40] learn a single multi-class classifier that optimally discriminates between all seen classes; the predicted probabilities and the attribute annotations of the seen classes are then used to estimate object attribute values [27,29].
As illustrated in Figure 1a, the label inference process of ZSL usually employs the 1-nearest neighbor (1-NN) strategy [25,26,31].
Although the aforementioned ZSL models have shown considerable promise on some standard datasets, a key limitation of most of them is that, at the test stage, they are highly biased towards predicting the seen classes once seen classes are also among the candidate labels [24,33,34]. This is because ZSL models learn only from the seen classes in the training stage and are evaluated under the assumption that test examples come exclusively from the unseen classes, so the search space is restricted to the unseen classes. The more realistic setting in which the training and test classes are not disjoint is known as generalized zero-shot learning (GZSL) and is considered a considerably more challenging problem [33,34].
The biggest difference between GZSL and ZSL is the setting of the test classes, as illustrated in Figure 1a,b. In the ZSL framework, the training and test classes are assumed to be strictly disjoint, whereas in GZSL the test data come from both seen and unseen classes. Recent work [33] has shown that the accuracies of most ZSL approaches drop significantly in GZSL settings; the inference process in GZSL is therefore more complicated than in ZSL. Because GZSL can identify both unseen and seen classes by leveraging semantic information, it offers a new way to perform PolSAR land cover classification that exploits the rich polarization features and semantic information in PolSAR imagery. To a certain extent, GZSL can reduce the requirement for sample labeling and enable the framework to identify new types in PolSAR land cover classification.
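To make the difference in search spaces concrete, the following Python sketch applies the same 1-NN rule twice to one projected test instance: once over the unseen label space $C_U$ only (ZSL) and once over $C_S \cup C_U$ (GZSL). The three-dimensional class prototypes, the test projection, and the use of cosine similarity as the 1-NN metric are hypothetical choices for illustration, not values or settings from this work.

```python
import math

# Hypothetical semantic prototypes: classes 1-2 are seen (C_S), classes 3-4 are unseen (C_U).
PROTOTYPES = {
    1: [1.0, 0.0, 0.0],
    2: [0.0, 1.0, 0.0],
    3: [0.6, 0.0, 0.8],
    4: [0.0, 0.6, 0.8],
}
SEEN = [1, 2]
UNSEEN = [3, 4]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_class(projected, candidates):
    """1-NN rule: return the candidate class whose prototype is most similar to the projection."""
    return max(candidates, key=lambda c: cosine(projected, PROTOTYPES[c]))


# Hypothetical semantic-space projection of a test instance from unseen class 3
# that happens to lie close to the prototype of seen class 1.
x = [0.9, 0.1, 0.4]

zsl_label = nearest_class(x, UNSEEN)          # ZSL: search space is C_U only
gzsl_label = nearest_class(x, SEEN + UNSEEN)  # GZSL: search space is the union of C_S and C_U
print(zsl_label, gzsl_label)  # 3 1
```

Because the projection lies closer to a seen prototype, the joint search assigns it to seen class 1 while the unseen-only search returns class 3, which is the kind of bias toward seen classes that makes inference in GZSL harder than in ZSL.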

Intermediate Semantic Information
Since no training instances are available for some unseen test categories in GZSL, image features alone are not sufficient to associate the unseen classes with the seen classes. Intermediate semantic information (also called auxiliary information [21] or side information [20]) can therefore act as an intermediate layer that builds this association between the unseen and seen classes. As typical intermediate semantic information, attributes acquired by human annotation [20,21] have been successfully used in ZSL tasks in computer vision, for example to identify different animal, bird, or dog species or indoor and outdoor scene categories. Another main type of intermediate semantic information is word vectors learned from a linguistic corpus with neural language models such as the Word2Vec model [41]. Word vectors are representations of words learned from a large-scale corpus such that words frequently appearing in similar contexts lie closer together in the vector space [28]; they can be used to map the names of scene classes (both seen and unseen) to semantic vectors [22,26,37,40].
Remote sensing images, especially PolSAR images, contain rich scene information, including land cover surface properties and spatial properties [42]. The SUN attribute database [43,44] is the first large-scale scene attribute database; it is built on top of the diverse SUN categorical database, which spans more than 700 categories and 14,000 images. The SUN attributes relate to materials, surface properties, functions, and spatial envelope properties. The suitability of Word2Vec semantic vectors, SUN attributes, and selected SUN attributes for describing PolSAR land cover classes is evaluated in Section 4.

Methodology
The applied GZSL workflow for PolSAR land cover classification (Figure 2) consists of three parts: polarization feature representation, semantic representation of PolSAR land cover classes, and generalized zero-shot learning with semantic relevance. The test instances come from both seen and unseen classes, and they must be classified into the joint label space of the seen and unseen classes. Each part is described in detail below.

Polarization Feature Representation
PolSAR data can be represented by a scattering matrix S, a covariance matrix C, or a coherency matrix T, which can provide rich scattering information for land cover classification [45]. The Yamaguchi decomposition [13,42,46] extended the three-component scattering model [46] by adding the helix scattering mechanism as a fourth component, to account for observed scattering that does not satisfy the reflection symmetry assumption. The model can be expressed as

$$ T = f_s T_S + f_d T_D + f_v T_V + f_c T_H \tag{1} $$

where T is the measured polarimetric coherency matrix; $T_S$, $T_D$, $T_V$, and $T_H$ are the coherency matrices for surface scattering, double-bounce scattering, volume scattering, and helix scattering, respectively; and $f_s$, $f_d$, $f_v$, and $f_c$ are the corresponding coefficients. Furthermore, the orientation of building areas with respect to the radar illumination also affects their polarimetric properties [7], which can cause confusion between such building areas and vegetation. A rotation of the coherency matrix (namely, deorientation) [13] can be adopted for a more accurate decomposition.
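The deorientation step can be sketched in Python for a single pixel's 3 × 3 coherency matrix. This is a minimal sketch under stated assumptions: the rotation is taken as $R(\theta)$ with rows $[1, 0, 0]$, $[0, \cos 2\theta, \sin 2\theta]$, $[0, -\sin 2\theta, \cos 2\theta]$, applied as $R T R^{T}$; the angle is the one that nulls $\mathrm{Re}(T_{23})$; and the helix power is computed as $2|\mathrm{Im}(T_{23})|$. It omits the remaining steps of the four-component decomposition (volume-model selection and the branch conditions that yield $f_s$, $f_d$, and $f_v$), and the example matrix is hypothetical.

```python
import math


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists (complex entries allowed)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def deorientation_angle(t):
    """Angle theta that nulls Re(T23) after rotation.

    Solves tan(4*theta) = 2*Re(T23) / (T22 - T33). atan2 is used (an assumption of
    this sketch) so that the root that minimizes the rotated T33 is always chosen.
    """
    return 0.25 * math.atan2(2.0 * t[1][2].real, (t[1][1] - t[2][2]).real)


def rotate_coherency(t, theta):
    """Return R(theta) T R(theta)^T for the assumed real rotation matrix R(theta)."""
    c, s = math.cos(2.0 * theta), math.sin(2.0 * theta)
    r = [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]
    r_t = [[r[j][i] for j in range(3)] for i in range(3)]  # R is real, so R^H = R^T
    return matmul3(matmul3(r, t), r_t)


def helix_power(t):
    """Helix scattering power 2|Im(T23)|, which this rotation leaves unchanged."""
    return 2.0 * abs(t[1][2].imag)


# Hypothetical Hermitian coherency matrix of one pixel.
T = [
    [2.0 + 0j, 0.3 + 0.1j, 0.1 - 0.05j],
    [0.3 - 0.1j, 1.0 + 0j, 0.2 + 0.15j],
    [0.1 + 0.05j, 0.2 - 0.15j, 0.4 + 0j],
]
theta = deorientation_angle(T)
T_rot = rotate_coherency(T, theta)
```

After rotation, $\mathrm{Re}(T_{23})$ vanishes and $T_{33}$ decreases, while the trace (total power) and $\mathrm{Im}(T_{23})$ are preserved; this is why the helix power can be read from either the original or the deoriented matrix.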

Based on the scattering powers obtained from the Yamaguchi four-component decomposition with deorientation, we employ our previous work [5,46,47] to obtain mid-level components, called intermediates, which are unsupervised statistical patterns learned from PolSAR images. This mid-level polarization feature representation has been shown to discriminate well between building areas and vegetation [5,46] and between different building densities [47] in PolSAR data. The flowchart of the applied mid-level polarization feature representation algorithm (a scattering mechanism based statistical feature) based on four-component decomposition with deorientation is given in Figure 3. The PolSAR images analyzed in the proposed framework are mainly fully polarimetric SAR images at 8-m resolution; thus, a sample size of 50 × 50 pixels can contain the composition of different land cover types and capture sufficient context information. The dimension of the applied mid-level polarization feature is 80. For vegetation areas, water areas, and some low- and medium-density building areas with particular orientations, the volume and surface scattering powers are typically much larger than the double-bounce and helix scattering powers [7,46,47]. To retain enough of the volume and surface scattering characteristics while avoiding too many zero values in the double-bounce and helix scattering characteristics after merging, a different merging number is used for each of the four scattering components.
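Sample extraction can be illustrated with a short Python sketch that cuts each scattering-power image into 50 × 50 pixel samples. The non-overlapping stride, the discarding of incomplete border patches, and the channel names 'Ps', 'Pd', 'Pv', and 'Pc' are hypothetical; the sketch does not reproduce the intermediate learning or the component merging that produce the 80-dimensional feature.

```python
def tile_patches(image, size=50):
    """Split a 2-D image (list of equal-length rows) into non-overlapping size x size samples.

    Rows and columns at the bottom and right edges that cannot fill a whole patch are
    discarded; this stride and border policy is an assumption of this sketch.
    """
    n_rows, n_cols = len(image), len(image[0])
    patches = []
    for r0 in range(0, n_rows - size + 1, size):
        for c0 in range(0, n_cols - size + 1, size):
            patches.append([row[c0:c0 + size] for row in image[r0:r0 + size]])
    return patches


def tile_power_images(powers, size=50):
    """Tile every scattering-power image (hypothetical keys such as 'Ps', 'Pv') on the same grid."""
    return {name: tile_patches(img, size) for name, img in powers.items()}


# Hypothetical 120 x 130 power image: two rows and two columns of full 50 x 50 samples fit.
demo = [[float(r * 130 + c) for c in range(130)] for r in range(120)]
print(len(tile_patches(demo)))  # 4
```

Tiling all four power images on the same grid keeps the surface, double-bounce, volume, and helix values of a sample spatially aligned before any per-sample statistics are computed.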

Semantic Representation Of PolSAR Land Cover Classes
The typical land cover classes in 8-m resolution PolSAR images include urban areas (c1), rural areas (c2), water (c3), forest lands (c4), croplands (c5), wetland (c6), and agricultural land (c7). These land cover classes are all regional objects in 8-m resolution PolSAR images, and they possess certain surface and spatial properties. Using the Word2Vec model [41] with the Skip-Gram architecture and 400-dimensional vectors, all of these land cover classes can be mapped to semantic word vectors. Examples of the typical land cover classes and their corresponding word vectors are given in Table 1.
Remote Sens. 2018, 10, 1307 8 of 21
We also employ the SUN attributes [43,44], whose 102-dimensional attribute vectors include typical surface and spatial properties that may correspond to the rich scene information in PolSAR images. Each land cover class is therefore represented by the attributes of a corresponding SUN scene category: the 'city' scene category represents the 'urban areas' class, 'lake natural' represents 'water', 'woodland' represents 'forest lands', 'cornfield' represents 'croplands', 'village' and 'factory outdoor' represent 'rural areas', and so on. Because a scene category in the SUN attributes has more than one vector, we use the mean of all vectors of that scene as its attribute vector. Some of the 102 attributes, such as 'natural', 'man-made', 'glossy', 'matte', and 'sterile', are well suited to describing land cover classes, whereas others, such as 'fire', 'sunny', 'rusty', 'warm', and 'scary', bear little relation to the actual properties of land cover classes. We therefore subjectively selected, from the 102 original SUN attributes, 58 attributes that are more suitable for describing the actual properties of land cover classes, forming 58-dimensional attribute vectors. Examples of the typical land cover classes and their corresponding SUN and selected SUN attribute vectors are given in Table 2. In Tables 1 and 2, the semantic information represented by word vectors and by SUN attribute vectors is quantified, giving typical word-based and attribute-based semantic vectors, respectively.
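The construction of class-level SUN attribute vectors can be sketched in Python as follows. Averaging all vectors of a scene category follows the text; averaging again across several scene categories mapped to one class (as for 'rural areas') is an assumption, and the 4-dimensional vectors and kept indices are hypothetical stand-ins for the 102 SUN attributes and the 58 selected ones.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def class_attribute_vector(scene_vectors, scenes):
    """Attribute vector of one land cover class.

    Each SUN scene category is represented by the mean of all its vectors. When several
    scenes map to one class, their means are averaged again (an assumption of this sketch).
    """
    return mean_vector([mean_vector(scene_vectors[s]) for s in scenes])


def select_attributes(vector, keep_indices):
    """Keep only the attribute dimensions listed in keep_indices (e.g. 58 of 102)."""
    return [vector[i] for i in keep_indices]


# Hypothetical 4-dimensional attribute vectors (real SUN attribute vectors have 102 dimensions).
scene_vectors = {
    "village": [[1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    "factory outdoor": [[1.0, 1.0, 0.0, 0.0]],
}
rural = class_attribute_vector(scene_vectors, ["village", "factory outdoor"])
print(rural)                             # [0.75, 0.5, 0.5, 0.0]
print(select_attributes(rural, [0, 2]))  # [0.75, 0.5]
```

Stacking one such vector per class gives the class-by-attribute matrices, before and after attribute selection.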
Therefore, we obtain the word vector matrix $S_{wv}$, the SUN attribute vector matrix $S_{sa}$, and the selected SUN attribute vector matrix $S_{ssa}$ for the typical land cover classes in the 8-m resolution PolSAR images. The size of $S_{wv}$ is $n \times 400$, where $n$ is the number of classes in the training and testing PolSAR data samples, including both the seen and unseen classes; the size of $S_{sa}$ is $n \times 102$, and the size of $S_{ssa}$ is $n \times 58$. We then construct the semantic relevance $G$ between the attributes: $G_{wv}$ for $S_{wv}$, $G_{sa}$ for $S_{sa}$, and $G_{ssa}$ for $S_{ssa}$. For example:

$$G_N = S_N^{\mathrm{T}} S_N, \qquad G_N = [g_1; g_2; \cdots; g_i; \cdots; g_n] \tag{4}$$

$$\bar{g}_i = \frac{g_i}{\lVert g_i \rVert_2}, \qquad G = [\bar{g}_1; \bar{g}_2; \cdots; \bar{g}_i; \cdots; \bar{g}_n] \tag{5}$$

In Equation (2), $s_i$ is the attribute vector of a land cover class. From Equations (2)-(5), $G_{wv}$, $G_{sa}$, and $G_{ssa}$ can be obtained. The size of $G$ is $n \times n$, and it represents the semantic relationships between the land cover classes. The semantic relevance $G$ is then used to help determine whether a test sample belongs to a seen or an unseen class.
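Equations (4) and (5) can be sketched in NumPy as follows. This is a minimal sketch under two assumptions that are not stated in this section: the rows of the input matrix are the class attribute vectors ($n \times d$, matching the stated sizes of $S_{wv}$, $S_{sa}$, and $S_{ssa}$), and the normalization of Equations (2) and (3), which are not reproduced here, is an L2 normalization of each class vector. With row-wise class vectors, the $n \times n$ product of Equation (4) is written `S_N @ S_N.T`. The function name `semantic_relevance` is hypothetical.

```python
import numpy as np


def semantic_relevance(S: np.ndarray) -> np.ndarray:
    """Sketch of Eqs. (4)-(5): S is (n, d), one attribute vector per land cover class.

    Assumptions (not from the paper): each class vector is L2-normalized first,
    and no class vector or row of G_N is all zeros.
    """
    # Assumed reading of Eqs. (2)-(3): L2-normalize each class attribute vector s_i.
    S_N = S / np.linalg.norm(S, axis=1, keepdims=True)
    # Eq. (4): class-by-class inner products, giving an n x n matrix G_N.
    G_N = S_N @ S_N.T
    # Eq. (5): L2-normalize each row g_i of G_N to obtain G.
    return G_N / np.linalg.norm(G_N, axis=1, keepdims=True)


# Toy usage: 4 classes described by 58 nonnegative (selected-SUN-sized) attributes.
rng = np.random.default_rng(0)
S_ssa_toy = rng.random((4, 58))
G_ssa_toy = semantic_relevance(S_ssa_toy)
print(G_ssa_toy.shape)  # (4, 4)
```

The toy attribute values are random placeholders, not the SUN or Word2Vec vectors used in the paper.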

Generalized Zero-Shot Learning with Semantic Relevance
As described for the ZSL and GZSL framework in Section 2.1, how to represent the projection relationship $W$ between attributes and image features is key to ZSL and GZSL. Here, we employ the latent embedding model [37,48,49] to obtain $W$. Latent embedding is a non-trivial extension of structured joint embedding (SJE) [50], whose objective for learning $W$ is similar to that used for structured SVM parameter learning. Instead of learning a single mapping as in SJE, latent embedding learns a piecewise linear compatibility function of $K$ parameter matrices $W_i$ ($i = 1, \ldots, K$; $K \ge 2$). Latent embedding ZSL applies a ranking-based objective and learns the model with an efficient and scalable stochastic gradient descent (SGD) solver [37]. For a typical ZSL task, the training data are $D = \{(x_n, y_n)\}_{n=1}^{N}$, where $x_n$ represents the image features of a sample and the labels $y_n$ come from the label space of the seen classes $C_S = \{1, 2, \ldots, p\}$. We denote the label space of the unseen classes by $C_U = \{p+1, p+2, \ldots, p+q\}$, with $C_S \cap C_U = \varnothing$. The main goal of ZSL is to classify the test data into the unseen classes, assuming that the seen classes are absent at test time; in other words, each test instance is assumed to come from, and will be assigned to, one of the labels in $C_U$. Given a test instance $x$ (image features), conventional latent embedding ZSL assigns it to the class whose semantic representation maximizes Equations (6) and (7). Inference in ZSL is usually based on a nearest-neighbor strategy. Latent embedding has been reported to be state-of-the-art in ZSL on benchmark computer vision datasets [37,48,49].
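The piecewise linear compatibility and the ranking-based SGD update can be sketched as follows. This is a minimal NumPy sketch assuming the usual latent embedding (LatEm) formulation associated with [37], $F(x, c) = \max_i x^{\mathrm{T}} W_i s_c$, with a margin of 1 and one randomly sampled wrong class per step; the learning rate, initialization, and function names are illustrative assumptions rather than the authors' settings.

```python
import numpy as np


def latem_scores(x, W, S):
    """Return F(x, c) = max_i x^T W_i s_c for every class c, plus the maximizing i.

    x: (d,) image features; W: (K, d, m) parameter matrices; S: (C, m) class vectors.
    """
    per_matrix = np.einsum("d,kdm,cm->kc", x, W, S)  # (K, C) values x^T W_i s_c
    return per_matrix.max(axis=0), per_matrix.argmax(axis=0)


def latem_sgd_step(x, y, W, S_seen, rng, lr=0.01, margin=1.0):
    """One ranking-based SGD step on a training pair (x, y); updates W in place."""
    scores, best_i = latem_scores(x, W, S_seen)
    # Sample one wrong seen class j != y.
    j = int(rng.choice([c for c in range(len(S_seen)) if c != y]))
    # Ranking hinge: update only if the wrong class is not beaten by the margin.
    if margin + scores[j] - scores[y] > 0:
        W[best_i[y]] += lr * np.outer(x, S_seen[y])  # raise the true class score
        W[best_i[j]] -= lr * np.outer(x, S_seen[j])  # lower the wrong class score
    return W
```

Only the matrix $W_i$ that currently wins for each class is updated, which is what makes the learned compatibility piecewise linear rather than a single bilinear map as in SJE.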
In Equation (6), $S_{Te}$ is the semantic vector matrix of the test classes, which in ZSL contains only the unseen classes; $F_c(x)$ is the discriminant scoring function, and $\hat{y}$ is the predicted label. However, it has been shown [33] that if the ZSL prediction process in Equations (6) and (7) is applied directly in the GZSL setting, the accuracies of most ZSL approaches drop significantly: nearly all test data from the unseen classes are misclassified into the seen classes when the test samples contain both seen and unseen classes. Thus, to realize GZSL in the proposed framework, the label prediction process must be improved.
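Assuming Equations (6) and (7) take the usual latent embedding form, $F_c(x) = \max_i x^{\mathrm{T}} W_i s_c$ over the rows $s_c$ of $S_{Te}$ and $\hat{y} = \arg\max_c F_c(x)$, ZSL inference can be sketched as below. The function name and toy values are hypothetical. Passing a seen-plus-unseen $S_{Te}$ to this same rule is the naive GZSL setting described above, because nothing offsets the bias of the learned $W_i$ toward the seen classes.

```python
import numpy as np


def zsl_predict(x, W, S_te, labels_te):
    """Assign x to the test class whose semantic vector maximizes F_c(x).

    W: (K, d, m) learned matrices; S_te: (C, m) semantic vectors of the test classes
    (unseen classes only in ZSL); labels_te: (C,) labels matching the rows of S_te.
    """
    scores = np.einsum("d,kdm,cm->kc", x, W, S_te).max(axis=0)  # F_c(x) per class
    return labels_te[int(np.argmax(scores))], scores


# Toy usage: K = 1 identity matrix, so F_c(x) reduces to the dot product x . s_c.
W = np.eye(3)[None, :, :]
S_te = np.eye(3)
labels_te = np.array([4, 5, 6])  # e.g. unseen labels p+1..p+q with p = 3
y_hat, scores = zsl_predict(np.array([0.1, 0.9, 0.0]), W, S_te, labels_te)
print(y_hat)  # 5
```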
There has been very little work on generalized zero-shot learning (GZSL) [24,33,34], but GZSL is more attractive and practical for PolSAR interpretation. By exploiting the rich polarization features and semantic information in PolSAR imagery, GZSL offers a new way to classify PolSAR land cover that identifies both unseen and seen classes. The main idea of the existing GZSL model proposed in [33] is to introduce a calibration factor that calibrates the classifiers for both seen and unseen classes; that model was tested on benchmark datasets including AwA and CUB. This work indicates that the inference stage is critical for GZSL.
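For reference, the calibration idea of [33] can be sketched with the calibrated stacking rule, which we take to be the calibration factor described there: a single constant $\gamma$ is subtracted from the scores of all seen classes before the argmax. This is a minimal sketch, not the GZSL rule of the proposed framework; the score values are toy numbers, and $\gamma$ would normally be tuned on validation data.

```python
import numpy as np


def calibrated_stacking(scores, seen_mask, gamma):
    """Predict argmax_c [F_c(x) - gamma * 1(c is seen)] over all p + q classes.

    gamma = 0 recovers the uncalibrated rule; larger gamma favors unseen classes.
    """
    adjusted = np.asarray(scores, dtype=float) - gamma * np.asarray(seen_mask, dtype=float)
    return int(np.argmax(adjusted))


scores = np.array([0.80, 0.70, 0.75, 0.60])  # toy scores: 3 seen classes, 1 unseen
seen_mask = np.array([True, True, True, False])
print(calibrated_stacking(scores, seen_mask, gamma=0.0))   # 0 (a seen class wins)
print(calibrated_stacking(scores, seen_mask, gamma=0.25))  # 3 (the unseen class wins)
```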
Here, the semantic relevance obtained in Section 3.2 is employed in the inference process of the proposed GZSL framework, after the projection relationship $W$ between the semantic attributes and the polarimetric features has been established by latent embedding in the training stage. GZSL test samples include not only the unseen classes but also the seen classes, so in the proposed GZSL framework for PolSAR land cover classification, $S_{Te}$ contains the semantic vectors of both the seen and unseen classes. The semantic relevance $G$ is introduced to constrain and amend the GZSL classification prediction process. The union of the seen classes $C_S$ and unseen classes $C_U$ is denoted by $SU = C_S \cup C_U$.
Given a test instance $x$, which represents the mid-level polarization feature in the proposed framework, a preliminary discriminant score is obtained from Equation (6). For every test instance, the preliminary discriminant score vector has $p + q$ elements, as shown in Equation (8). The final label of the test sample is then obtained from the classification rule in Equations (9) and (10), in which the corresponding semantic relevance $g_{ij}$ acts as an amendment that reduces the scores of the seen classes.
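Equations (8)-(10) are not reproduced in this section, so the exact amendment rule cannot be restated here. The sketch below is a hypothetical illustration only: it stacks the $p + q$ preliminary scores as in Equation (8), then lowers each seen-class score by $\lambda$ times that class's largest relevance $g_{ij}$ to any unseen class. The penalty form, $\lambda$, the function name, and the toy numbers are all assumptions made for illustration, not the authors' Equations (9) and (10).

```python
import numpy as np


def gzsl_predict_with_relevance(scores, G, seen, unseen, lam=0.5):
    """HYPOTHETICAL relevance-based amendment (not the paper's Eqs. (9)-(10)).

    scores: (p+q,) preliminary scores F_c(x) ordered over SU = C_S ∪ C_U (cf. Eq. (8)).
    G: (p+q, p+q) semantic relevance matrix; seen, unseen: lists of class indices.
    """
    amended = np.asarray(scores, dtype=float).copy()
    # Assumed penalty: a seen class that is semantically close to some unseen class is
    # lowered the most, so instances of that unseen class are less likely to be absorbed.
    penalty = lam * G[np.ix_(seen, unseen)].max(axis=1)
    amended[seen] -= penalty
    return int(np.argmax(amended)), amended


# Toy example: seen = urban areas, water, forest lands; unseen = croplands.
G_toy = np.full((4, 4), 0.1)
np.fill_diagonal(G_toy, 1.0)
G_toy[2, 3] = G_toy[3, 2] = 0.9          # forest lands <-> croplands strongly related
scores = np.array([0.2, 0.1, 0.8, 0.6])  # forest lands would win without amendment
print(gzsl_predict_with_relevance(scores, G_toy, [0, 1, 2], [3], lam=0.0)[0])  # 2
print(gzsl_predict_with_relevance(scores, G_toy, [0, 1, 2], [3], lam=0.5)[0])  # 3
```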

GZSL for PolSAR Land Cover Classification
This paper aims to recognize both seen and unseen instances in PolSAR images by applying semantic information, as illustrated in Figure 2. The GZSL model is constructed from semantic attribute descriptions of the land cover classes, the relationships between semantic attributes, and the projections between the image feature layer, the intermediate semantic information layer, and the class label layer. In Section 3.1, the effective mid-level polarization feature is first extracted. Then, in Section 3.2, the semantic attributes from the Word2Vec semantic vectors, SUN attributes, and selected SUN attributes are collected and analyzed to describe the characteristics of typical PolSAR classes, and the semantic relevance $G$ between the attributes is obtained. Next, the projection relationship $W$ between the mid-level representation of the PolSAR data samples and the class attributes is established by latent embedding during the training stage. Finally, for every test instance, the GZSL model constructed from the polarization feature representation, the projection relationship $W$, and the semantic relevance $G$ predicts its label, even though some test classes have no training samples. By utilizing the rich polarization features and semantic information in PolSAR imagery, the proposed GZSL framework offers a more practical solution for PolSAR interpretation that can recognize some new land cover categories without labeled samples, whereas conventional supervised approaches cannot categorize unseen instances. To a certain extent, GZSL reduces the requirement for sample labeling and gives the framework the ability to identify new types in PolSAR land cover classification.
It should be noted that each selected unseen class should have semantically related classes among the seen samples; e.g., when the unseen class is croplands, the seen samples should include some vegetation classes, and when the unseen class is building areas, the seen samples should include some man-made classes. This semantic relation is the basis of the ZSL and GZSL ability, and it can mitigate the projection domain shift problem [26,51] to some degree. This will be further demonstrated by the experiments in Sections 4 and 5.
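The requirement that every unseen class have a semantically related seen class can be checked directly on a relevance matrix $G$. The sketch below reports, for each unseen class, the seen class with the largest $g_{ij}$; the `threshold` for "related" and the toy matrix are hypothetical values chosen for illustration, not values from the paper's tables.

```python
import numpy as np


def related_seen_classes(G, names, seen, unseen, threshold=0.5):
    """For each unseen class u, report the seen class j maximizing g_uj.

    Returns {unseen name: (seen name, g_uj, g_uj >= threshold)}; the threshold is a
    hypothetical cut-off for "semantically related", not a value from the paper.
    """
    report = {}
    for u in unseen:
        j = seen[int(np.argmax(G[u, seen]))]
        report[names[u]] = (names[j], float(G[u, j]), bool(G[u, j] >= threshold))
    return report


names = ["urban areas", "water", "forest lands", "croplands"]
G_toy = np.array([
    [1.0, 0.2, 0.3, 0.3],
    [0.2, 1.0, 0.2, 0.2],
    [0.3, 0.2, 1.0, 0.8],
    [0.3, 0.2, 0.8, 1.0],
])
print(related_seen_classes(G_toy, names, seen=[0, 1, 2], unseen=[3]))
# {'croplands': ('forest lands', 0.8, True)}
```

With these toy values, making urban areas the unseen class instead would leave it without any seen class above the threshold, which is the situation in which the domain shift problem is expected.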

Experimental Results
In this section, we first introduce the experimental data and then present the classification results obtained by applying the proposed method to RadarSAT-2 fully polarimetric SAR imagery, followed by an evaluation of these results.

Experimental Data and Settings
The effectiveness of the proposed method has been tested on three RadarSAT-2 PolSAR datasets (C-band, fine quad-pol mode, 8-m resolution). The selected data contain diverse land cover classes and buildings with complex orientations, which places higher demands on the classification algorithm. In total, the experimental data contain seven kinds of typical land cover classes. The basic information and the GZSL unseen/seen ratio of each dataset are listed in Table 3. All samples are 50 × 50 pixel patches, and the training samples are selected randomly.
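Sliding-window tiling into 50 × 50 patches with a given overlap rate (80% for the Flevoland data and 50% for the Wuhan data in the following subsections), together with random per-class selection of training patches, can be sketched as follows. This is a minimal sketch under assumptions not stated in the paper: the stride is size × (1 − overlap), patches that would cross the image border are dropped, each pixel carries F polarimetric feature channels, and training patches are drawn uniformly per class. Because border handling is assumed, it is not expected to reproduce the exact test-sample counts reported later. All function names are hypothetical.

```python
import numpy as np


def patch_origins(height, width, size=50, overlap=0.5):
    """Top-left corners of size x size patches; neighbors share `overlap` of their width."""
    stride = max(1, int(round(size * (1.0 - overlap))))
    return [(r, c)
            for r in range(0, height - size + 1, stride)
            for c in range(0, width - size + 1, stride)]


def extract_patches(image, size=50, overlap=0.5):
    """image: (H, W, F) per-pixel features -> ((N, size, size, F) patches, origins)."""
    origins = patch_origins(image.shape[0], image.shape[1], size, overlap)
    patches = np.stack([image[r:r + size, c:c + size] for r, c in origins])
    return patches, origins


def sample_training(labels, per_class=100, rng=None):
    """Randomly pick `per_class` patch indices (without replacement) for each class."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    return {c: rng.choice(np.flatnonzero(labels == c), size=per_class, replace=False)
            for c in np.unique(labels)}


# 80% overlap gives a 10-pixel stride; 50% overlap gives a 25-pixel stride.
print(len(patch_origins(100, 100, 50, 0.8)))  # 36
print(len(patch_origins(100, 100, 50, 0.5)))  # 9
```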

Results and Evaluation of the Flevoland Data
For the RadarSAT-2 Flevoland data, as shown in Figure 4, the training samples include urban areas, water, and forest lands, and the testing samples include urban areas, water, forest lands, and croplands; that is, the unseen class is croplands. The PolSAR data are first divided into 50 × 50 pixel patches with an 80% overlap rate, yielding 15,776 test samples for the Flevoland data. The number of training samples for each training class is 100, and the labeled samples of all three classes are selected randomly. The results for the four test classes obtained by the proposed GZSL method are shown in Figure 4, and their accuracy is evaluated in Table 4, where the adapted semantic attributes (Word2Vec semantic vectors, SUN attributes, and selected SUN attributes) are also compared. In Figure 4, with urban areas, water, and forest lands as the training classes and urban areas, water, forest lands, and croplands as the test classes, the GZSL results obtained with the Word2Vec attributes, SUN attributes, and selected SUN attributes are shown in Figure 4c-e, respectively. Without any training cropland samples, the proposed method recognizes the unseen class with accuracies of about 60.76%, 68.72%, and 73.93% using the Word2Vec attributes, SUN attributes, and selected SUN attributes, respectively. Furthermore, the overall accuracies of the proposed GZSL framework on the Flevoland data are all above 74.5% when the unseen/seen ratio is 1/3, that is, when the training/testing class ratio is 3/4. Among the employed attributes, the results in Figure 4 and Table 4 show that the selected SUN attributes perform best, with an overall accuracy of about 78%. We repeated the experiment in Figure 4e ten times: the mean and standard deviation of the unseen-class accuracy were 73.07% and 0.015, and those of the overall accuracy were 77.56% and 0.0075.
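The overall accuracy, the per-class (e.g., unseen-class) accuracy, and the mean and standard deviation over repeated runs can be computed as in the sketch below. It assumes that per-class accuracy means the fraction of a class's reference samples that are labeled correctly and uses the population standard deviation; the paper does not state either convention, and the labels here are toy values.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are reference classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def overall_and_class_accuracy(y_true, y_pred, n_classes):
    """Overall accuracy and per-class accuracy (correct / reference count per class).

    Assumes every class has at least one reference sample (else division by zero).
    """
    cm = confusion_matrix(y_true, y_pred, n_classes)
    overall = np.trace(cm) / cm.sum()
    per_class = np.diag(cm) / cm.sum(axis=1)
    return overall, per_class


def summarize_runs(values):
    """Mean and population standard deviation over repeated runs."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std()


y_true = [0, 0, 1, 1, 2, 3, 3, 3]  # class 3 plays the role of the unseen class
y_pred = [0, 1, 1, 1, 2, 3, 3, 2]
oa, pc = overall_and_class_accuracy(y_true, y_pred, 4)
print(oa)                    # 0.75
print(round(float(pc[3]), 3))  # 0.667
```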
Remote Sens. 2018, 10, 1307 13 of 21
For the experimental settings in Figure 4, the unseen class (croplands) is semantically very similar to the forest lands among the seen classes. To further verify the semantic relevance and the domain shift problem described in Section 3, the following experiments set urban areas, water, and forest lands as the unseen class, one at a time, and used the samples of the other three classes for training to classify all four land cover classes and make the corresponding quantitative evaluations. All the GZSL results in Figure 5 use the selected SUN attributes.
Figure 5a-d show the classification results obtained with urban areas, water, forest lands, and croplands as the unseen class, respectively. For example, in Figure 5a, the training classes are water, forest lands, and croplands, and the test classes are urban areas, water, forest lands, and croplands. The accuracy is evaluated in Table 5. As shown in Figure 5 and Table 5, when the unseen class is urban areas or water, the overall classification accuracies are low: 68.53% and 72.65%, respectively. When the unseen class is forest lands or croplands, the overall classification accuracies are higher: 77.42% and 78.04%, respectively. This is due to the high semantic relevance between forest lands and croplands, which can be distinguished based on the distance relationships described in Section 3.4, whereas the semantic relevance between urban areas and the other land cover classes in the Flevoland data is poor, as is that between water and the other classes.
Since ZSL and GZSL depend on semantic relevance to identify unseen classes, if the semantic relevance between the seen and unseen classes is poor, the above-mentioned domain shift problem [26,51] tends to appear, making it difficult to identify such unseen classes accurately.

Results of the Wuhan Data1
For the RadarSAT-2 Wuhan Data1, as shown in Figure 6a, the training samples include urban areas, water, and forest lands, and the testing samples include urban areas, water, forest lands, and rural areas; that is, the unseen class is rural areas. The PolSAR data are first divided into 50 × 50 pixel patches with a 50% overlap rate, yielding 20,805 test samples for Wuhan Data1. The number of training samples for each training class is 100, and the labeled samples of all three classes are selected randomly. The results obtained by the proposed GZSL method with the Word2Vec semantic vectors, SUN attributes, and selected SUN attributes are shown in Figure 6b-d, respectively.
For the experiment with Wuhan Data1, we set the unseen class as rural areas; that is, there are no training samples of rural areas. Nevertheless, the GZSL results in Figure 6b-d show that rural areas can also be classified. In particular, the rural areas in Figure 6c,d are larger than those in Figure 6b. This is mainly due to the semantic difference: the SUN attribute database does not have a category that strictly corresponds to rural areas, so we use a combination of the 'townhouse' and 'village' attributes to represent the semantic information of rural areas.
To verify the classification of rural areas in Figure 6, we selected a region of interest (ROI) corresponding to Canglongdao in the Jiangxia district, a suburb of Wuhan. The corresponding Google Earth optical image of the ROI is shown in Figure 7b. This ROI contains many small, low-rise buildings that are sparsely distributed, unlike those in urban areas. The results in Figure 7c-e reflect the semantic relevance between urban areas and rural areas in the Word2Vec attribute space, the SUN attribute space, and the selected SUN attribute space, respectively.
The results in Sections 4.2 and 4.3 show that the proposed GZSL framework can classify some new land cover categories without labeled samples by using the semantic information between land cover categories together with effective polarization features. The experimental results also illustrate the potential of semantic information for PolSAR land cover classification and show that the GZSL mechanism can reduce the requirement for sample labeling to a certain extent. Moreover, the effectiveness of the adapted semantic attributes (Word2Vec semantic vectors, SUN attributes, and selected SUN attributes) has been evaluated and compared. According to the quantitative evaluation of the Flevoland data in Table 4, the SUN attributes perform slightly better than the Word2Vec semantic vectors under the same experimental conditions. This may be because the SUN attributes contain more spatial and surface properties, which is consistent with the characteristics of remote sensing images. The results in Table 5 indicate that semantic relations are the basis of the GZSL ability.

Discussion
Because the Flevoland data and Wuhan Data1 contain relatively few land cover classes, only one unseen class was used in the above experiments; that is, the ratio of training classes to testing classes was 3/4. To further illustrate the effectiveness of the proposed GZSL method, we conducted the following experiment on Wuhan Data2, shown in Figure 8a, which contains a richer set of land cover classes. The land cover classes in this dataset mainly consist of six types: urban areas, water, forest lands, rural areas, wetland, and agricultural land.
For the RadarSAT-2 Wuhan Data2, the training samples include urban areas, water, and forest lands, and the testing samples include all six types mentioned above; that is, the unseen classes are rural areas, wetland, and agricultural land. The PolSAR data are first divided into 50 × 50 pixel patches with a 50% overlap rate, yielding 30,441 test samples for Wuhan Data2. The number of training samples for each training class is 100, and the labeled samples of all three classes are selected randomly; these training samples can be the same as those used for Wuhan Data1. The results for the six test classes obtained by the proposed GZSL method are shown in Figure 8b. Here, the semantic attributes come mainly from the Word2Vec model, because the SUN attribute database has no category that exactly corresponds to 'wetland' or 'agricultural land', and its attributes for 'rural areas' are not an exact match either. Figure 8b shows that rural areas and agricultural land were both largely recognized, although neither class has training samples. ROI_1 and ROI_2 correspond to rural areas and agricultural land (most of which contains some low vegetation), respectively. The corresponding Google Earth optical images of the ROIs are shown in Figure 9a,c, and typical field investigation photographs of the selected ROIs are shown in Figure 9b,d. Compared with these reference images, the ROI_1 and ROI_2 classification results in Figure 8b are correct to a certain degree. Moreover, the ratio of training classes to testing classes here is 3/6, which also shows that the proposed method is effective and, to some extent, extensible.
In the above experiments, there is a certain degree of semantic relevance between the unseen and seen classes, for example between rural areas and urban areas, and between wetland and water. Thus, GZSL can classify some new land cover categories without labeled samples, even when the ratio of training classes to test classes is 3:6.
Experiments on the above three RadarSAT-2 PolSAR datasets show that the proposed method can classify 4-6 testing classes with only three training classes; that is, the unseen/seen class ratio can range from 1/3 to 3/3. There are few works applying ZSL to remote sensing interpretation, and, to the best of our knowledge, GZSL for PolSAR land cover classification has not been studied in the remote sensing literature. For comparison, we summarize the unseen/seen class ratios used in related works that apply ZSL to remote sensing interpretation. For the ZSL target recognition problem in SAR images in [32], the unseen/seen class ratio is 1/7, that is, seven classes for training and only one class for testing. For the ZSL classification problem in high spatial resolution images in [22], the average accuracy was about 58% with an unseen/seen ratio of 5/16. Another ZSL problem, fine-grained street tree recognition in aerial data [21], achieved a recognition accuracy of about 14.3% with an unseen/seen ratio of 16/24. In our work, considering the practical requirements of PolSAR interpretation, the unseen/seen ratio was 1/3-3/3, that is, a training/testing class ratio of 3/4-3/6. Because only a small number of training samples were used (fewer than 1.9% of the samples, i.e., 300/15,996, 300/20,805, and 300/30,441, were randomly selected for training), the accuracy of our experimental results is currently not very high.
Nevertheless, we consider our method feasible and effective for classifying some new land cover categories without labeled samples in PolSAR imagery with a resolution of about 8 m. The GZSL framework reduces the requirement for sample labeling and makes it possible to identify new types in PolSAR land cover classification.
In subsequent research work, several topics need to be further studied. Firstly, a more professional semantic description about the land cover classes or targets in PolSAR imagery should be analyzed, including the scattering characteristic, resolution, polarization mode, incident angle, seasonal, and other information. Secondly, the semantic modeling methods and tools for the Experiments on the above three RadarSAT-2 PolSAR datasets show that the proposed method can classify 4-6 testing classes with only three training classes. That is, the ratio of unseen/seen classes can be 1/3-3/3. There are few works on ZSL application in the remote sensing interpretation, and GZSL for PolSAR land cover classification has not been studied in the remote sensing literature, to the best of our knowledge. We introduce the unseen/seen class ratio settings in other related works (ZSL applied in remote sensing interpretation) to make a comparative analysis. For the ZSL target recognition problem in the SAR image in [32], the ratio of unseen/seen classes is 1/7, that is, seven classes for training and only one class for testing. For the ZSL classification problem in the high spatial resolution image in [22], the average accuracy was about 58% with an unseen/seen ratio of 5/16. Another ZSL problem in [21] is the fine-grained street tree recognition in the aerial data; the recognition accuracy was about 14.3% with an unseen/seen ratio of 16/24. In our work, combining the actual basic requirements of the PolSAR interpretation, the unseen/seen ratio was 1/3-3/3, that is, the training/testing classes' ratio is 3/4-3/6. Due to the small amount of training samples selected in the experiments, less than 1.9% (300/15,996, 300/20,805, 300/30,441) of the samples were randomly selected as training samples, the accuracy of our experimental results is not very high at present. 
Nevertheless, we consider our method feasible and effective for classifying new land cover categories without labeled samples in PolSAR imagery with a resolution of about 8 m. The GZSL framework can reduce the requirement for sample labeling and enables the identification of new types in PolSAR land cover classification.
In subsequent research work, several topics need further study. First, a more specialized semantic description of the land cover classes or targets in PolSAR imagery should be analyzed, including scattering characteristics, resolution, polarization mode, incident angle, season, and other information. Second, semantic modeling methods and tools for this semantic description of PolSAR land cover classes or targets need to be further developed. A promising semantic modeling method is the ontological semantic model [4,27], which will be the focus of our next research step. Third, although ZSL and GZSL are important topics in computer vision research, the evaluation of their ability has not yet been standardized; that is, there is no agreed-upon ZSL or GZSL benchmark [24,33]. Thus, the potential application of ZSL and GZSL to SAR image interpretation can be further explored.

Conclusions
For PolSAR land cover classification, it is highly probable that new land cover categories will be introduced after the training stage, or that no training examples will be available for several rare but interesting classes. Inspired by generalized zero-shot learning (GZSL), which can categorize instances from both seen and previously unseen classes, this paper studies the classification of both unseen and seen land cover classes in PolSAR images under a semantically expressed GZSL framework. First, the semantic relevance between land cover attributes is obtained to relate unseen and seen classes. Then, the projection relationship between the effective mid-level polarimetric features and the class attributes is established by latent embedding during training. Finally, the GZSL model constructed from the mid-level polarimetric features, the projection relationship, and the semantic relevance predicts the label of every test instance, even for instances whose classes have no training samples. Quantitative and qualitative evaluations on the three RadarSAT-2 datasets show that the classification accuracy of an unseen class is about 73% if some semantically related seen classes are present in the training stage. Additionally, the proposed method can classify 4-6 testing classes with only three training classes. This GZSL framework can therefore reduce the requirement for sample labeling and enable the identification of new types in PolSAR land cover classification. Moreover, three kinds of land cover class attributes, namely the Word2Vec semantic vectors, the SUN attributes, and the selected SUN attributes, have been applied and evaluated.
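The three-step pipeline summarized above (semantic relevance, a latent-embedding projection, and prediction over both seen and unseen classes) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cosine similarity as the relevance measure, a LatEm-style piecewise-bilinear compatibility F(x, c) = max_k xᵀ W_k a_c as the latent embedding, and already-learned matrices W_k. The class names, attribute vectors, and W_k values are hypothetical toy numbers, and the training step is omitted.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]  # feature_dim rows x attribute_dim columns


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, used here as an assumed semantic relevance measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def semantic_relevance(attrs: Dict[str, Vector]) -> Dict[str, Dict[str, float]]:
    """Pairwise relevance between the attribute vectors of different classes."""
    return {c: {d: cosine(attrs[c], attrs[d]) for d in attrs if d != c} for c in attrs}


def bilinear(x: Vector, W: Matrix, a: Vector) -> float:
    """Compute x^T W a for a feature vector x and a class attribute vector a."""
    return sum(x[i] * sum(W[i][j] * a[j] for j in range(len(a))) for i in range(len(x)))


def compatibility(x: Vector, Ws: List[Matrix], a: Vector) -> float:
    """Latent-embedding score: the best of K bilinear maps (LatEm-style assumption)."""
    return max(bilinear(x, W, a) for W in Ws)


def predict_gzsl(x: Vector, Ws: List[Matrix], attrs: Dict[str, Vector]) -> str:
    """GZSL prediction: the label space contains both seen and unseen classes."""
    return max(attrs, key=lambda c: compatibility(x, Ws, attrs[c]))


# Hypothetical toy data: 2-D mid-level features, 3-D class attributes.
ATTRS = {
    "seen_1": [1.0, 0.0, 0.0],
    "seen_2": [0.0, 1.0, 0.0],
    "unseen_1": [0.0, 0.6, 0.8],  # no training samples; related to seen_2
}
# Stand-ins for the K = 2 projection matrices that training would produce.
WS = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
]

if __name__ == "__main__":
    rel = semantic_relevance(ATTRS)
    print("relevance(unseen_1, seen_2) =", round(rel["unseen_1"]["seen_2"], 3))
    print(predict_gzsl([1.0, 0.0], WS, ATTRS))  # seen_1
    print(predict_gzsl([0.0, 1.0], WS, ATTRS))  # unseen_1
```

With these toy values, the feature vector [0, 1] scores highest for `unseen_1`, a class with no training samples, which shows how a GZSL classifier can still assign an unseen label once attributes link it to related seen classes.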
Because the semantic attributes currently employed lack polarimetric semantic expressions such as resolution, scattering characteristics, incident angle, and season information, the classification ability of the proposed GZSL method is still relatively limited. In future work, we will continue to improve and develop our research on the semantic modeling of land cover and targets in SAR images.
Author Contributions: All authors contributed to forming the general idea of the paper and helped conceive and design the experiments. R.G. created the research design, performed the experiments, analyzed the data, and wrote the draft; X.X. coordinated the research activities and provided critical comments to improve the paper; L.W. helped edit the draft and contributed to developing the PolSAR feature representation algorithm; R.Y. contributed to the accuracy assessment and manuscript writing; and F.P. helped propose and develop the semantic information model.