A Data Transfer Fusion Method for Discriminating Similar Spectral Classes

Hyperspectral data provide new capabilities for discriminating spectrally similar classes, but such class signatures are sometimes difficult to analyze. Incorporating reliable auxiliary information can help, but it may also increase the dimensionality of the feature vector, making the hyperspectral data even larger than expected. It is challenging to apply discriminative information from training data to test data that are not in the same feature space and do not share the same distribution. A data fusion method based on transfer learning is proposed, in which transfer learning is introduced into a boosting algorithm and out-of-date data are used to instruct hyperspectral image classification. To validate the method, experiments are conducted on EO-1 Hyperion and ROSIS hyperspectral data. Significant improvements in accuracy have been achieved compared with the results generated by conventional classification approaches.


Introduction
With advanced sensors and space technology, it is now possible to access remote sensing (RS) image data that potentially provide more information, both individually and as a time series. RS has become an indispensable tool in many scientific disciplines and is one of the major means of monitoring the Earth's environment in a cost-effective way. Hyperspectral sensors simultaneously capture hundreds of narrow, contiguous spectral bands across a wide range of the electromagnetic spectrum. Owing to their ability to precisely characterize the spectral signatures of different materials, hyperspectral images have been used extensively in remote sensing applications over recent decades. In this context, hyperspectral images are informative sources for detailed mapping, environmental monitoring, modeling, and biophysical characterization of agricultural crops [1][2][3][4].
Apart from deploying improved hardware [5], efficient use of this advanced capability requires sophisticated hyperspectral image processing and analysis methods. While many multispectral classification methods can be directly extended to hyperspectral images, additional challenges arise from the large training data requirements, the computational cost, and constraints on exploiting the information content. Classification of hyperspectral imagery is therefore usually performed in a reduced feature space whose dimensionality is significantly lower than the number of original spectral bands [6].
These issues limit the direct application of multispectral image classification methods to hyperspectral images. Consequently, several pre-processing techniques are now available for hyperspectral dimensionality reduction prior to applying multispectral classification methods. Many other methods have also been introduced, such as Spectral Angle Mapper (SAM), Spectral

Transfer Learning Based Fusion Method
In this paper, we mainly use the instance-transfer idea and propose a method that combines transfer learning with data fusion. In many machine learning applications, the training or labeled data are too sparse to train a classification model with good performance. In this case, traditional learning methods require users to collect more labeled data, which is expensive in time and cost. However, there are often large amounts of existing out-of-date data related to the training data. Part of those data can be treated as the source domain and reused to instruct the problems in the target domain. Transfer learning can be applied across different learning domains. In transfer learning, we are particularly interested in transferring knowledge from a source task to a target task rather than learning all source and target tasks simultaneously; however, not all source instances are useful to the target task. If the source training set is large, the algorithm becomes inefficient because of excessive selection. To improve the efficiency of transfer learning, a framework for selecting source instances based on AdaBoost has been developed, as shown in Figure 1. The scheme has two main parts: selecting instances and removing misleading instances.

Selecting Source Domain Instances to Augment the Labeled Target Domain Instances
In transfer learning, the first problem to solve is domain adaptation, one approach to which is instance transfer [16], the approach we mainly use in this paper. However, not all source instances are useful to the target task, and if the source data are large in scale, selecting among them degrades the algorithm's efficiency. Thus, we first select a subset of the source data to implement a preliminary domain adaptation. Instance-transfer methods are generally driven by instance weighting, and in our work we use the AdaBoost algorithm to select the source instances for transfer learning.
The distribution of the source domain differs from that of the target domain, but both domains lie in the same feature space. Assume the source domain instance set is given by Equation (1) and the target domain instance set by Equation (2). If the source instances instruct only one type or some types, the labeled set is given by Equation (3), while the residual target labeled instance set is given by Equation (4). The preliminary selection scheme for the source data is as follows. Since the source domain knowledge is expected to instruct the target domain learning, the AdaBoost training procedure is adopted to select instances according to their weights: during AdaBoost training, the weights of wrongly classified instances are increased. The training dataset is composed of source instances and labeled target instances, and after training, the source instances with higher weights are those similar to the target ones. Let the source instance set be X_S and the target instance set be X_T1, where n_1 < m. Assign label 1 to the instances of X_S and label −1 to those of X_T1. To balance the two classes, X_S is divided into about [m/n_1] portions, as in Equation (5), where [·] is a rounding operator. Each portion forms a training dataset, Equation (6), on which a classifier is trained with AdaBoost. The instance weights are updated during training, and those of wrongly classified instances increase. After a few rounds of training, the source instances whose weights exceed a threshold W are considered to have properties similar to the target domain and are selected to form the instance set X_S_sub.
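To make the selection step concrete, the following is a minimal Python sketch. The names X_S, X_T1, and the threshold W follow the notation above; the decision-stump base learner, the number of boosting rounds, and the median fallback threshold are our assumptions, since the text does not fix them.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def select_source_instances(X_S, X_T1, rounds=10, W=None):
    """Select source instances whose boosting weights end up high, i.e.
    instances hard to separate from the target domain and therefore
    similar to it. X_S, X_T1: (n_samples, n_features) arrays."""
    m, n1 = len(X_S), len(X_T1)
    portions = max(1, round(m / n1))            # [m/n_1] portions, as in the text
    selected = []
    for X_part in np.array_split(X_S, portions):
        X = np.vstack([X_part, X_T1])
        y = np.hstack([np.ones(len(X_part)), -np.ones(n1)])  # source = 1, target = -1
        w = np.full(len(X), 1.0 / len(X))       # uniform initial weights
        for _ in range(rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)
            w *= np.exp(-alpha * y * pred)      # misclassified weights increase
            w /= w.sum()
        w_src = w[:len(X_part)]
        thr = W if W is not None else np.median(w_src)  # threshold W (assumed fallback)
        selected.append(X_part[w_src > thr])
    return np.vstack(selected)                  # the set X_S_sub
```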

Removing "Misleading" Source Domain Instances
The above method, however, only selects source instances for a single type in the target domain. The source instances used for transfer learning should also be easily separated from instances of the other types. To avoid negative transfer, AdaBoost is used to remove "misleading" source domain instances, as shown in Figure 2. The training dataset is composed of X_S_sub and the target labeled instance set of the other types, X_T2. Unlike the previous step, we select from X_S_sub the source instances that are classified correctly; the result, X_S_sub1, is the final transfer instance set used to instruct the target learning.
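A corresponding sketch of this filtering step, with an off-the-shelf AdaBoost classifier standing in for the boosted learner described above; the label arrays y_S_sub and y_T2 are hypothetical names for the type labels of X_S_sub and X_T2.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def remove_misleading(X_S_sub, y_S_sub, X_T2, y_T2, rounds=10):
    """Keep only the source instances that a boosted classifier, trained on
    the selected source set plus the other-type target instances, classifies
    correctly; the misclassified ones are treated as 'misleading'."""
    X = np.vstack([X_S_sub, X_T2])
    y = np.hstack([y_S_sub, y_T2])
    clf = AdaBoostClassifier(n_estimators=rounds).fit(X, y)
    keep = clf.predict(X_S_sub) == y_S_sub      # correctly classified source instances
    return X_S_sub[keep], y_S_sub[keep]         # X_S_sub1, the final transfer set
```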

Removing "Misleading" Source Domain Instances
However, the above method is only applied to select source instances for only one type of target domain. The source instances used to transfer learning should be easily classified from instances of other types. To avoid the negative transfer, adaboost is used to remove "misleading" source domain instances as shown in Figure 2. The training dataset is composed of and the other types target labeled instance-set X T2 . Unlike previous steps, we select the source instances classified correctly from . is the final transfer instance-set to instruct the target learning. The source domain instances selected out are as the training data with the labeled ones of target domain. In traditional classification model, the training instances are considered to have the same distributions as test instances. If the distributions varies, the traditional model is not acceptable and supposed to be modified.
The selected source domain instances serve as training data together with the labeled instances of the target domain. In a traditional classification model, the training and test instances are assumed to follow the same distribution; when the distributions differ, the traditional model is no longer suitable and must be modified.
In the target domain, the instance set is X_T. The unlabeled instances form the test dataset TS = {x_i^T}, where x_i^T ∈ X_T (i = 1, 2, . . . , k). The labeled instance set whose distribution is similar to TS is denoted TR_S = {x_i^T, c(x_i^T)}, where x_i^T ∈ X_T (i = 1, 2, . . . , n), c(x_i^T) is the label of x_i^T, and TR_S = X_T1 ∪ X_T2, so the size of X_T is k + n. The set X_S_sub1, which has a different distribution from TS, is redefined as TR_D = {x_i^D, c(x_i^D)}. The full training dataset is TR = TR_S ∪ TR_D; TR_S and TR_D are called the same-distribution dataset and the diff-distribution dataset, respectively. The scheme is as follows:
Input: directed network N; number of nodes K; number of training rounds T; sampling parameter ρ; the target labeled instance set TR_S(k) and the source instance set TR_D(k) on the k-th node, k = 1, 2, . . . , K, where l_S(k) and l_D(k) are the sizes of TR_S(k) and TR_D(k), respectively.
Initialize: for every node k (k = 1, 2, . . . , K), set the weight of each instance x_i.
Do for t = 1, . . . , T:
Step 1. Generate a replicate training set T_{k,t} of size ρ l_S(k) + ρ l_D(k) by weighted sub-sampling with replacement from the training sets TR_S(k) and TR_D(k), k = 1, 2, . . . , K, respectively.
Step 2. Train the classifier (node) C_{k,t} in the classifier network on the weighted training set T_{k,t} and obtain the multi-class hypothesis h_{k,t} : x → Y, k = 1, 2, . . . , K, where Y is the label set.
Step 3. Calculate the weighted error rate of the instances in TR_S(k).
Step 4. Compute the weight of classifier C_{k,t} from this error rate.
Step 5. Set the weight-update parameters β_{k,t} and γ_k.
Step 6. Update the weight of instance i on node k, where node n is a neighbor of node k.
Output: the final hypothesis. As the algorithm shows, in each round the training subset is sampled from TR_S and TR_D with the same sampling rate ρ. The hypothesis contains the classifier weight α_{k,t}, which represents the importance of the classifier in the final hypothesis; it stabilizes the final result and is tuned to attain a stable classifier network. The weight-update rules differ between same-distribution and diff-distribution instances: the update parameter for same-distribution instances is β_{k,t}, while that for diff-distribution instances is γ_k, which depends on the number of diff-distribution instances on the current node. In each round, if a diff-distribution training instance is mispredicted, it likely conflicts with the same-distribution training data, so its training weight is reduced by multiplying it by γ_k^{−λ_{k,t}(i)}, the opposite of what happens to the same-distribution training data. After several rounds, the diff-distribution training instances that better fit the same distribution have larger training weights, while the dissimilar ones have smaller weights.
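Since the exact β and γ formulas are not reproduced above, the following single-node sketch fills them in with the standard TrAdaBoost choices of Dai et al. (β_t = ε_t/(1 − ε_t) for same-distribution instances and a fixed γ = 1/(1 + √(2 ln n_d / T)) for diff-distribution instances). Treat it as an illustration of the weighting logic under those assumptions, not as the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

def tradaboost(X_same, y_same, X_diff, y_diff, T=10, rho=0.6):
    """Single-node sketch: same-distribution instances get AdaBoost-style
    up-weighting when wrong; diff-distribution instances are down-weighted
    when wrong (the gamma-style update)."""
    X = np.vstack([X_diff, X_same])
    y = np.hstack([y_diff, y_same])
    n_d = len(X_diff)
    w = np.ones(len(X)) / len(X)
    gamma = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_d) / T))  # diff-domain factor (assumed)
    learners, alphas = [], []
    for t in range(T):
        # Step 1: weighted sub-sample of size rho * (l_S + l_D), with replacement
        idx = np.random.choice(len(X), int(rho * len(X)), p=w / w.sum())
        clf = SVC(kernel="rbf").fit(X[idx], y[idx])
        pred = clf.predict(X)
        miss = pred != y
        # Step 3: weighted error on the same-distribution set only
        eps = np.clip(w[n_d:][miss[n_d:]].sum() / w[n_d:].sum(), 1e-10, 0.499)
        beta = eps / (1.0 - eps)
        alpha = 0.5 * np.log(1.0 / beta)        # Step 4: classifier weight
        # Steps 5-6: raise misclassified same-distribution weights,
        # lower misclassified diff-distribution weights
        w[n_d:][miss[n_d:]] /= beta
        w[:n_d][miss[:n_d]] *= gamma
        w /= w.sum()
        learners.append(clf)
        alphas.append(alpha)
    # Final hypothesis: weighted vote of `learners` with weights `alphas`
    return learners, alphas
```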

Experiments and Discussion
In this section, we provide empirical evidence that incorporating the boosting algorithm into the knowledge transfer framework improves classification rates. We present results showing that the proposed method achieves better classification rates than updating existing classifiers with data points selected either at random or via an existing, related general method. We also show empirically that the proposed method offers a significant advantage over more traditional semi-supervised methods by requiring far fewer data points to obtain better classification accuracies.
The proposed fusion classification method is tested on hyperspectral data sets from two sites: the Okavango Delta, Botswana (NASA EO-1 Hyperion) [26], and the University of Pavia and Pavia center (ROSIS) [27]. A Support Vector Machine (SVM) is selected as the basic learner (see Table 1). The flight over the city of Pavia, Italy, was operated by the Deutsches Zentrum für Luft- und Raumfahrt (DLR, the German Aerospace Agency) within the HySens project, managed and sponsored by the European Union. According to its specifications, the ROSIS-3 sensor has 115 bands with a spectral coverage ranging from 0.43 to 0.86 µm. The data have been atmospherically corrected but not geometrically corrected. The spatial resolution is 1.3 m per pixel. Two data sets were used in the experiment.

University Area
The first test set covers the area around the Engineering School of the University of Pavia. The image is 610 × 340 pixels, with a spatial resolution of 1.3 m. The ROSIS sensor has 115 spectral channels covering the range 430-860 nm; the 12 noisiest channels were removed, and the remaining 103 spectral bands were used in this experiment. The reference data contain nine ground-cover classes: asphalt, meadows, gravel, trees, metal sheets, bare soil, bitumen, bricks, and shadows. This is a challenging classification scenario, as the image is dominated by complex urban classes and spatially nested regions. A true-color composite and the related ground reference maps are shown in Figure 3, and the number of labeled samples per class is given in Table 2.

Pavia Center
The second test set is the center of Pavia. The Pavia center image was originally 1096 × 490 pixels. Thirteen channels have been removed due to noise, and the remaining 102 spectral dimensions are processed. Nine classes of interest are considered: water, trees, meadows, bricks, soil, asphalt, bitumen, tiles, and shadows. The available training and testing sets for each data set are given in Table 2, and Figure 3 shows false-color images for both data sets.

Experiments
Transfer learning assumes that two data sets are different but related; it exploits the relationship between them to extend a statistical model fitted on one data set to the other. A popular class of transfer learning methods involves an updating strategy that originates in semi-supervised learning: model parameters are updated by incorporating samples from the new data set, so that the modified model generalizes to it.
In this section, we provide empirical evidence that incorporating AdaBoost into the knowledge transfer framework yields better accuracies. We present results showing that the proposed method exhibits better learning rates than traditional classifiers trained on data points combined by simple stacking. We also show empirically that the method has a significant advantage when training samples are few, requiring fewer data points than more traditional methods to obtain better classification accuracies.
In these experiments, SVM was used as the basic learner in transfer AdaBoost. SVM classification (using the LIBSVM library [28]) was performed with a Gaussian RBF kernel. The SVM hyperparameters were optimized every ten iterations of the process by fivefold cross-validation, with the C and γ parameters selected in the ranges [2^−5, 2^15] and [2^−15, 2^3], respectively. The network structure is a regular network with 20 nodes of degree 10, the sampling parameter is ρ = 0.6, and the number of training rounds is T = 10. Furthermore, constraints were added to the basic learners to avoid unbalanced training weights, so that during training the overall training weights of positive and negative examples remain balanced.
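The hyperparameter search can be reproduced along the following lines (scikit-learn's SVC wraps libsvm; the grid spacing of 2^2 is an assumption, since only the ranges are given above):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Fivefold cross-validated grid search over the stated ranges:
# C in [2^-5, 2^15], gamma in [2^-15, 2^3].
param_grid = {
    "C": 2.0 ** np.arange(-5, 16, 2),
    "gamma": 2.0 ** np.arange(-15, 4, 2),
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train)    # X_train, y_train: labeled training spectra
# best_svm = search.best_estimator_
```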
Three benchmark methods are implemented using SVM, as shown in Table 3. In the following, SVM, SVMt, and TSVM denote the different classifier implementations: SVMt means the training data are simply stacked and fed to an SVM classifier, while TSVM is the key method proposed in this paper. The Botswana Hyperion data and the ROSIS University of Pavia data set are each split into two sets: a training set X_S and a test set S. We adopted the KPCA algorithm to extract 30-dimensional image features [29]. A comparison experiment based on TrAdaBoost is performed with the same settings as above (a regular network with 20 nodes of degree 10, ρ = 0.6, T = 10). Table 4 presents the results of SVM, SVMt, and TrAdaBoost (TSVM) when the ratio between training and testing data is 2% and 5%; the reported classification accuracy is the average over 10 random repeats. Finally, the classification maps obtained by the different methods are shown in Figure 4 for the Botswana data and in Figure 5 for the ROSIS data.
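The feature-extraction step can be sketched as follows; the text cites a KPCA algorithm [29] but does not name the kernel, so the RBF kernel here is an assumption:

```python
from sklearn.decomposition import KernelPCA

# Reduce each pixel's spectral vector to 30 features with kernel PCA,
# matching the 30-dimensional features used in the experiments.
kpca = KernelPCA(n_components=30, kernel="rbf")
# X_features = kpca.fit_transform(X_pixels)   # X_pixels: (n_pixels, n_bands)
```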

From Table 4, the accuracies given by TSVM are clearly higher than those given by SVM and SVMt. Intuitively, this is expected, since SVM is not a learning technique designed for transfer classification, while the boosting-based transfer method is. However, as several researchers have noted, transfer learning cannot always improve generalization accuracy and sometimes even lowers performance on the test set; this phenomenon, in which transfer learning degrades the original performance, is known as negative transfer. Although in our experiments the boosted transfer method consistently performed better than or comparably to the baselines, there is no guarantee that TrAdaBoost improves the basic learner.
Figure 6 reports an experiment in which the University of Pavia data set was deliberately used. The ratio between training and diff-distribution testing examples was gradually increased from 0.01 to 0.1, and classifications were performed 10 times for each sampling rate. The average overall accuracies and standard deviations of the two baseline methods and the proposed method are shown in Figure 6. TrAdaBoost (SVM) consistently improves on SVMt. It also outperforms SVM when the ratio is lower than 0.05; when the ratio exceeds 0.05, TrAdaBoost (SVM) performs slightly worse than SVM, but remains comparable. In general, out-of-date training data contain both useful knowledge and noise. When too few training samples from the original image are available to train a good classifier, the useful knowledge in the out-of-date training data benefits the learner, while the noisy part has no significant negative effect.
In the following discussion, the ROSIS data are used as an illustrative example; this data-set combination is representative of the remaining data sets because of its similarity across some classes. We first apply the SVM classifier to the University of Pavia image data set.
The resulting graph presents a misleading clustering condition and consequently leads to an unfaithful joint manifold. As seen in the example in Table 5, some samples are misclassified, e.g., between classes 2 and 4: Meadow (Class 2) and Bare_soil (Class 4) exhibit significant confusion. Samples of Class 2 from the source image and samples of Class 4 from the target image are difficult to discriminate because their features are very similar, which is also confirmed by the confusion matrix in Table 4. The separation of these two classes is clearer in the latent space provided by the proposed method. The same trend is observed for Classes 3 and 4, as well as for the Class 3/Class 2 pair. The Asphalt (Class 1) and Brick (Class 6) land-cover types also show some confusion. In addition to improving the classification accuracy, the proposed method also selects the most informative data points from these classes. Compared with the two baselines, TSVM provides higher overall accuracy. Among the common classes in the UOP data pair, classes 1, 2, 3, 4, and 6 are difficult to discriminate within a single image because they are comprised of mixtures. Spectral changes and mixed spectral signatures make domain adaptation in these data pairs even more difficult. The Class 2/Class 4 pair exhibits the most confusion. As shown in Figure 3, classes 2 and 4 from the source image (UOP) are very similar, and the spectral drift of Class 2 is evident; thus, many samples of Class 2 from the target image (COP) are misclassified as Class 4 when the training samples come only from the source image. The proposed method provides a significant improvement in the classification accuracy of Class 2. Table 6 shows the confusion matrix obtained with the TSVM algorithm: the method eliminates the confusion among some classes and also achieves better accuracy for tree (Class 3), bare_soil (Class 4), bitumen (Class 5), and shadow (Class 7).
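The per-class analysis above can be reproduced with a small helper; this is a hypothetical utility of ours (the class ordering must match the label encoding used), not part of the published method:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

def per_class_report(y_true, y_pred, class_names):
    """Confusion matrix and per-class accuracy (recall), used to inspect
    class pairs such as Meadow/Bare_soil that the text identifies as confused."""
    cm = confusion_matrix(y_true, y_pred)
    per_class = cm.diagonal() / cm.sum(axis=1)   # correct / total per true class
    for name, acc in zip(class_names, per_class):
        print(f"{name:12s} {acc:.3f}")
    print("overall accuracy:", accuracy_score(y_true, y_pred))
    return cm
```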

Conclusions
In this paper, we have proposed a novel framework for knowledge transfer fusion by boosting a basic learner. The algorithm achieves high efficiency and accuracy and is especially suitable for small-sample problems and the discrimination of similar classes. The basic idea is to select the most useful instances as additional training data for predicting labels: the method first characterizes the distribution of the original image training data and then selects the most helpful out-of-date image training samples as additional training data. SVMt and TSVM both show excellent performance compared with SVM on the two data sets. Our experiments on two hyperspectral data sets also demonstrate that the method has better transfer ability for discriminating similar classes than traditional learning techniques. The overall accuracy has improved, and, importantly, the accuracies of most classes have also improved. In addition to concept-level guidance, the results show notable improvements especially for critical classes without sacrificing much overall performance. TSVM further incorporates the informative analysis and thus performs best.
Moreover, in the small-sample case, the method outperforms the benchmark methods. This study could be expanded when more hyperspectral data become available, especially to determine the effectiveness of the active-learning-based knowledge transfer framework as the spatial/temporal separation between the data sets is increased systematically.