An Impartial Semi-Supervised Learning Strategy for Imbalanced Classification on VHR Images

Imbalanced learning is a common problem in remote sensing imagery-based land-use and land-cover classifications. Imbalanced learning can lead to a reduction in classification accuracy and even the omission of the minority class. In this paper, an impartial semi-supervised learning strategy based on extreme gradient boosting (ISS-XGB) is proposed to classify very high resolution (VHR) images with imbalanced data. ISS-XGB solves multi-class classification by using several semi-supervised classifiers. It first employs multi-group unlabeled data to eliminate the imbalance of training samples and then utilizes gradient boosting-based regression to simulate the target classes with positive and unlabeled samples. In this study, experiments were conducted on eight study areas with different imbalanced situations. The results showed that ISS-XGB provided a comparable but more stable performance than most commonly used classification approaches (i.e., random forest (RF), XGB, multilayer perceptron (MLP), and support vector machine (SVM)), positive and unlabeled learning (PU-Learning) methods (PU-BP and PU-SVM), and typical synthetic sample-based imbalanced learning methods. Especially under extremely imbalanced situations, ISS-XGB can provide high accuracy for the minority class without losing overall performance (the average overall accuracy achieves 85.92%). The proposed strategy has great potential in solving the imbalanced classification problems in remote sensing.


Introduction
Classification with an imbalanced sample set is very common in real-world scenarios [1]. Many thematic classification maps can only be trained, calibrated, and validated with imbalanced samples due to the high cost of obtaining labels and the lack of ancillary information for sampling [2]. However, these imbalanced samples often pose difficulties for learning algorithms. Because classifiers are biased towards the majority class [3], the minority class, which may be the category of greatest concern to researchers, is omitted in this situation [4], and the inaccurate results introduce potential risks. Data and class decomposition can break the data distribution of a problem up into multiple minority or majority subsets.
Semi-supervised learning methods, which explore the hidden distribution information from unlabeled data for learning, may be effective methods for solving this difficulty [48]. Positive and unlabeled learning (PU learning), proposed by Elkan and Noto [49], is one of the best semi-supervised methods for remote sensing data [50]. This semi-supervised method handles the classes one by one (OBO), which is different from the traditional all-in-one (AIO) framework (i.e., where the samples of all classes are combined to train the classifier at the same time) [50]. The unlabeled data in PU learning can provide the distribution information of the covariates for model training [51] and act as a regularizer to prevent overfitting [48]. However, in the remote sensing field, this method has been reformed for class-incomplete issues [50,52] or binary classification applications [51,53,54] but not for sample imbalance problems.
In this paper, we propose an impartial semi-supervised learning strategy based on PU learning for sample imbalance issues in remote sensing. This strategy utilizes unlabeled data to convert a positive-positive classification scheme (traditional all-in-one methods with all-positive samples) into a positive-unlabeled scheme and compensates for the learning deficiency caused by the extreme rarity of samples through multiple positive-unlabeled trainings. In this strategy, a simulator is needed for the positive-unlabeled training, such as a back-propagation neural network (BP) [50,52,53] or support vector machine (SVM) [51,55]. However, BP and SVM are known to be sensitive to extremely rare samples, which may lead to simulation bias [1,3,25]. Extreme gradient boosting (XGB) has proven to be more predictive than non-boosting methods and uses iterative gradient boosters for regression simulation with limited samples [56,57]. Thus, the proposed impartial semi-supervised learning strategy employs XGB as the simulator (hence ISS-XGB: an impartial semi-supervised learning strategy based on extreme gradient boosting) to improve the simulations and predictions for the positive class.
The rest of this paper is organized as follows. Section 2 describes the principle of ISS-XGB for imbalanced learning. Section 3 outlines experiments conducted on two VHR remote sensing data resources across eight study areas with different complexities. Section 4 demonstrates the performance of ISS-XGB with imbalanced samples. Section 5 further evaluates the effectiveness of the proposed strategy via two contrastive analyses: (1) a comparison with previous semi-supervised models in remote sensing and (2) a comparison with synthetic resampling technique-based methods. The influence of unlabeled data on ISS-XGB is also discussed in this section.

The Principle of ISS-XGB: Impartial Semi-Supervised Learning Strategy for Imbalanced Learning
ISS-XGB is an impartial semi-supervised learning strategy based on extreme gradient boosting. It is built on the PU learning framework: through the OBO strategy, the models are trained with positive samples of each class together with randomly selected unlabeled data.
Suppose that x represents the covariates associated with an instance and y represents the property of the instance, where y = 1 denotes positive data and y = 0 denotes negative data. The target of classification can be defined as a function of the probability that a pixel is positive given its characteristics (or covariates), denoted as f(x) = p(y = 1 | x).
However, in ISS-XGB, the input datasets for each training procedure are only the positive samples of a certain class and unlabeled data (s = 1 represents labeled, and s = 0 represents unlabeled). Thus, for a certain class i, the learner is a positive-unlabeled simulator g(x) = p(s = 1 | x), which simulates the probability that a pixel is labeled. For n-class classification problems, n positive-unlabeled models are generated (Figure 1). The imbalanced learning problem is thereby converted into multiple positive-unlabeled problems. Such a decomposition strategy avoids inputting positive samples of minority classes and majority classes into the simulator at the same time.

Since only positive data can be labeled, the probability that a negative datum is labeled is zero:

p(s = 1 | x, y = 0) = 0. (1)

The labeled samples are chosen randomly from all positive pixels; thus, the probability that a positive pixel is labeled is a constant c regardless of x:

p(s = 1 | x, y = 1) = p(s = 1 | y = 1) = c. (2)

With Equations (1) and (2), we have

g(x) = p(s = 1 | x) = p(y = 1 | x) · p(s = 1 | y = 1) = f(x) · c. (3)

Therefore, there is a clear relationship between f(x) and g(x):

f(x) = g(x)/c. (4)

Li [52] provided an approach to obtain c by averaging the predicted probabilities of multiple positive pixels in a validation set. More detailed proofs can be found in [49].
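The relationship f(x) = g(x)/c can be sketched in a few lines. This is a minimal illustration only: the simulator g below is a stand-in stub, not the XGB model used in the paper, and the helper names are our own.

```python
import numpy as np

def estimate_c(g, X_val_pos):
    """Estimate c = p(s=1 | y=1) as the mean predicted labeling
    probability over held-out positive pixels (Elkan and Noto's approach)."""
    return float(np.mean(g(X_val_pos)))

def pu_to_class_prob(g, c, X):
    """Convert the PU simulator g(x) = p(s=1|x) into the class
    probability f(x) = p(y=1|x) via Equation (4): f(x) = g(x)/c."""
    return np.clip(np.asarray(g(X)) / c, 0.0, 1.0)

# Toy usage with a hypothetical simulator that predicts 0.4 everywhere:
g = lambda X: 0.4 * np.ones(len(X))
X_val_pos = np.zeros((5, 3))          # 5 held-out positive pixels
c = estimate_c(g, X_val_pos)          # c = 0.4
f = pu_to_class_prob(g, c, np.zeros((2, 3)))  # f = 0.4 / 0.4 = 1.0 for both pixels
```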
The PU learning framework takes advantage of randomly selected unlabeled data by employing auxiliary information on the positive sample distribution to simulate the target. Research based on this framework often tends to use a large amount of unlabeled data (5000 in [51,52]) to provide sufficient information. However, an inconsistent number of positive and unlabeled samples can still lead to imbalanced learning issues, especially when the positive samples are extremely rare. Thus, in ISS-XGB, equal quantities of positive samples and unlabeled data are used to train a balanced one-class classifier (Figure 1). Moreover, to obtain enough information on the whole distribution and ensure the stability of the simulation, an ensemble of multiple unlabeled datasets (10 in this paper) is created for each class. In this way, the target f(x) is identified as f_j(x), where j = 1, …, 10. The ten simulators g_j(x) and constants c_j are integrated into a sub simulation system f_class-i(x). The final posterior probability is calculated as the mean of the probability outputs, i.e., f_class-i(x) = (1/10) Σ_j f_j(x), representing the probability that an instance belongs to class i. The final label is assigned based on the maximum posterior probability across classes (Figure 1).
The simulator g(x) is trained with a binary classifier, XGB, a scalable regularized gradient boosting technique with strong predictive performance. As an ensemble tree model, XGB uses iterative gradient boosters to construct a strong classification model and has demonstrated predictive ability in binary classification problems [57,58] and land-cover classification problems [39,47,59,60]. The gradient correction of XGB aids classifier learning and constant estimation from the imperfect representation given by limited samples.
In summary, ISS-XGB has two main optimizations aimed at imbalance issues. First, equal amounts of the training data (positive and unlabeled) prevent the inner imbalance of the sample sets. Meanwhile, multiple unlabeled datasets are used to offer substantial information on the distribution. Second, XGB is employed as the simulator to improve the certainty of simulation via iterative gradient boosters. It is worth noting that the unlabeled data are randomly selected from the images. ISS-XGB can handle the imbalanced land-cover classification problem without extra labeling costs.
For n-class classification problems, the core idea of ISS-XGB is to use the positive-unlabeled framework to transform the task into n binary learning components. Each positive-unlabeled sub-component consists of 10 PU training routines. To compensate for the learning deficiency caused by sample imbalance, every routine is based on the same number of positive samples and unlabeled data. To establish the relationship between f(x) and g(x) (i.e., to estimate c), the training data in every routine are split: 75% for training and 25% for validation. Random splitting and multiple routines with different unlabeled data also help control overfitting. According to Equation (4), the conditional probability of one routine is estimated by XGB. The average of the 10 routines then provides the estimate for the current class i. All n f_class-i(x) constitute the ISS-XGB ensemble model. Moreover, ISS-XGB adopts the maximum-probability rule for label prediction (Figure 1).
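The routine described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: scikit-learn's GradientBoostingClassifier stands in for XGB, and the helper names (fit_class_simulators, class_probability) are our own.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def fit_class_simulators(X_pos, X_unlabeled_pool, n_routines=10, seed=0):
    """Train n_routines PU simulators for one class: each routine pairs
    all positive samples with an equally sized random unlabeled draw,
    then estimates c from the validation positives."""
    rng = np.random.default_rng(seed)
    routines = []
    for _ in range(n_routines):
        idx = rng.choice(len(X_unlabeled_pool), len(X_pos), replace=False)
        X = np.vstack([X_pos, X_unlabeled_pool[idx]])
        s = np.r_[np.ones(len(X_pos)), np.zeros(len(X_pos))]   # s=1 labeled, s=0 unlabeled
        X_tr, X_va, s_tr, s_va = train_test_split(
            X, s, test_size=0.25, random_state=0, stratify=s)   # 75/25 split per routine
        g = GradientBoostingClassifier(random_state=0).fit(X_tr, s_tr)
        c = g.predict_proba(X_va[s_va == 1])[:, 1].mean()       # c from validation positives
        routines.append((g, max(c, 1e-6)))
    return routines

def class_probability(routines, X):
    """f_class-i(x): average of g_j(x)/c_j over the routines (Equation (4))."""
    return np.mean([np.clip(g.predict_proba(X)[:, 1] / c, 0, 1)
                    for g, c in routines], axis=0)
```

The final label would then be the argmax of class_probability over the n per-class simulation systems.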

Study Areas and Data
Eight study areas with different complexities were selected. Areas 1-7 (aerial images acquired by ADS40 in 2014 with a 0.2 m spatial resolution) are located in Beihai, Guangxi, a typical suburban area in China. Area 8 (GeoEye-1 data from the Environment for Visualizing Images (ENVI) tutorial data, an open-access data source acquired in 2009 with a 0.5 m spatial resolution) is located in northwest Hobart (Tasmania, Australia). The landscape distributions of all 8 areas exhibit different class imbalances without uncertain shadows caused by high buildings (Figure 2). All datasets are VHR data, which is beneficial for random sampling and for evaluating the performance of different methods [50]. VHR imagery allows manual image interpretation to be carried out for labeling land-cover types with higher confidence than that achieved with coarser resolution images. For accuracy verification, manual interpretation was used to obtain the pixel labels. Based on China's National Standard "Current Land-use Classification, GB/T 21010-2017" and ancillary data (the products of the Second National Land Survey for Beihai), areas 1-7 cover several typical land classes: house, road, tree, grass, soil, water, farmland, and others. Meanwhile, the farmland class was further divided into farmland with and without crops by manual interpretation, so as to be consistent with the surface cover (Table 1). Referring to the same standards and labeling procedures, area 8 was finally divided into 11 classes (Table 1), including building-1 with red roofs, building-2 with grey roofs, and building-3 with light green roofs in pseudo-color mode (R/G/B: 4/3/2). Additionally, bright objects, such as ground vehicles, surface vessels, etc., were classified as one type. In every study area, one land-cover class with a low proportion was specified as the minority class, and the rest were all majority classes (Table 1).
Eight second-order texture metrics (mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment, and correlation) [50,52] for each spectral band were also used for training. All features were extracted in ENVI 5.1 with a 3 × 3 pixel template along the horizontal direction. Each area is limited to 1000 × 1000 pixels. Therefore, the numbers of positive samples are limited to within 1.1% of the whole population (0.3%, 0.5%, 0.6%, 0.5%, 0.6%, 0.6%, 0.7%, and 1.1%, respectively).
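For illustration, the eight second-order (co-occurrence) texture metrics can be computed for a small window roughly as below. This is a from-scratch sketch under stated assumptions (quantization to 8 grey levels, horizontal pixel pairs at distance 1), not the ENVI implementation used in the paper.

```python
import numpy as np

def glcm_metrics(window, levels=8):
    """Second-order texture metrics from a horizontal (distance-1)
    grey-level co-occurrence matrix of a small image window."""
    q = np.floor(window / (window.max() + 1e-9) * (levels - 1)).astype(int)  # quantize
    P = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):  # horizontal neighbor pairs
        P[i, j] += 1
    P /= P.sum()                                           # normalize to probabilities
    idx = np.arange(levels)
    I, J = np.meshgrid(idx, idx, indexing="ij")
    mean_i = (I * P).sum()
    mean_j = (J * P).sum()
    var_i = ((I - mean_i) ** 2 * P).sum()
    var_j = ((J - mean_j) ** 2 * P).sum()
    eps = 1e-12
    return {
        "mean": mean_i,
        "variance": var_i,
        "homogeneity": (P / (1 + (I - J) ** 2)).sum(),
        "contrast": ((I - J) ** 2 * P).sum(),
        "dissimilarity": (np.abs(I - J) * P).sum(),
        "entropy": -(P * np.log(P + eps)).sum(),
        "second_moment": (P ** 2).sum(),
        "correlation": ((I - mean_i) * (J - mean_j) * P).sum() / np.sqrt(var_i * var_j + eps),
    }
```

A perfectly uniform window, for instance, yields zero contrast and maximal homogeneity, while any grey-level gradient raises the contrast.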

To quantify the complexity and diversity of the study areas, the Shannon diversity index (SHDI), which considers both species richness and abundance, was used [61]. This index was calculated using the entropy equation with the proportions of classes existing in each area. The SHDI of the 8 study areas ranged from 0.83 to 2.22 (acquired by Fragstats 4.2 [62]), meaning that the diversity of the areas ranged from relatively simple to complex.
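The SHDI is simply the Shannon entropy of the class area proportions; a minimal sketch (illustrative, not the Fragstats implementation):

```python
import math

def shannon_diversity(proportions):
    """SHDI = -sum(p_i * ln p_i) over the area proportions of the
    classes present (zero-proportion classes contribute nothing)."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

# A perfectly even 6-class area gives the maximum SHDI of ln(6) ≈ 1.79;
# more skewed landscapes give lower values.
shdi = shannon_diversity([1 / 6] * 6)
```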

Experimental Set-up and Accuracy Assessment
To evaluate the performance of ISS-XGB under different sample imbalances, 50 sample sets with different class proportions were randomly selected from each study area. Each sample set consisted of 1000 × n samples (where n represents the number of classes), with a class proportion of a:b (a represents the minority class, and b represents the majority class). To generate different imbalanced sample sets, a was increased from 1 to 50, while b was altered in the reverse manner (from 99 to 50) [63], forming distributions ranging from extremely skewed to completely balanced. Here, we assumed that, owing to the large spatial extent of the majority classes, it was easy to collect abundant positive samples for those classes. Simultaneously, to prevent changes in the total number of samples from confounding the analysis of imbalanced classification patterns, the majority classes used sample sets of the same size in all experiments. Repeated trials were conducted with datasets of different class imbalances to ensure statistically reliable accuracy; the final reported result for each accuracy measure represents the average of 100 trials. Additionally, test sets were randomly selected from the whole images before the selection of training samples.
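One plausible reading of this sampling scheme (minority class at a% of the per-class budget, each majority class at b%) can be sketched as follows; draw_sample_set and its arguments are illustrative names, not from the paper:

```python
import numpy as np

def draw_sample_set(pools, minority, a, n_per_class=1000, seed=0):
    """Draw one training set with minority:majority proportion a:(100-a).
    `pools` maps class name -> array of candidate sample indices; every
    majority class receives the same amount, as in the experiments."""
    rng = np.random.default_rng(seed)
    b = 100 - a
    sizes = {c: (n_per_class * a // 100 if c == minority else n_per_class * b // 100)
             for c in pools}
    return {c: rng.choice(pools[c], sizes[c], replace=False) for c in pools}
```

For example, with a = 2 the minority class contributes 20 samples while each majority class contributes 980, reproducing the extremely skewed 2:98 case.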
The performance of ISS-XGB with different imbalanced samples was compared with that of approaches commonly used for imbalanced learning in the remote sensing and machine learning fields, such as MLP [64], RF [37], and XGB [58,65]. It was also compared with SVM [38], which used a binary training mode (one-vs.-rest strategy). SVM and ISS-XGB are quite similar in their learning frameworks; the most significant difference is that SVM uses negative samples, whereas ISS-XGB uses unlabeled data. Moreover, typical semi-supervised learning methods in remote sensing, including PUL [52] (PU-BP in this paper) and PUL-SVM [55] (PU-SVM in this paper), were compared with the proposed strategy to develop an insightful analysis of the optimizations in ISS-XGB. In addition, SMOTE [16], which uses nearby positive neighbors to synthesize extra samples for the minority class, is a popular and direct approach to imbalanced learning [25]. Thus, we also compared the performance of ISS-XGB with that of SMOTE-based methods.
To obtain complete insight into performance in this multi-class imbalanced learning scenario, not only the overall performance but also the accuracy of the minority class was quantified. The overall accuracy (OA), Cohen's Kappa coefficient (κ), and user and producer accuracy are the most frequently used measures in the remote sensing field [66]. However, Pontius (2011) indicated that κ only focuses on agreement and evaluates accuracy using randomness as a baseline, which is unhelpful for explaining error. Thus, quantity disagreement (QD) and allocation disagreement (AD) were recommended to assess disagreement [67]. Moreover, the recommended disagreement measures [67] can convert the observed sample matrix into an estimated unbiased population matrix to assess performance over the population (an aspect ignored in many applications). Thus, this paper used QD and AD to quantify the disagreement of the label prediction. The F1 score [25] represents the harmonic mean of user and producer accuracy, reaching its best value at 1 and its worst at 0; we used this indicator to evaluate the agreement of the minority class. To this end, we followed the recommended routine and calculated OA, QD, and AD (based on the population matrix [67]) for the overall performance, and the F1 score, QD', and AD' (based on the confusion matrix [67]) for the minority class. These measures fall into two categories: agreement measures (OA, F1 score) and disagreement measures (QD, AD).
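These measures can be computed from a confusion matrix as sketched below, following Pontius and Millones' definitions of QD and AD; for brevity, the unbiased population-matrix correction described in [67] is omitted, so this is an illustration rather than the paper's exact procedure.

```python
import numpy as np

def agreement_disagreement(conf):
    """OA, quantity disagreement (QD), and allocation disagreement (AD)
    from a confusion matrix (rows = reference, cols = prediction).
    Note the identity QD + AD = 1 - OA."""
    p = conf / conf.sum()                 # counts -> proportions
    diag = np.diag(p)
    row, col = p.sum(axis=1), p.sum(axis=0)
    oa = diag.sum()
    qd = np.abs(row - col).sum() / 2      # mismatch in class quantities
    ad = (2 * np.minimum(row - diag, col - diag)).sum() / 2  # spatial mis-allocation
    return oa, qd, ad

def f1_score_class(conf, k):
    """F1 of class k: harmonic mean of user and producer accuracy."""
    return 2 * conf[k, k] / (conf[k, :].sum() + conf[:, k].sum())
```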

Parameter Optimization
The parameters of all methods were optimized in an empirical parameter space. For ISS-XGB and XGB, five main parameters were optimized: the number of estimators (1~1000), the learning rate (0.1~1), the maximum tree depth (2~23), the minimum number of leaf instances (1~29), and the subsample rate (0.1~1). Similar to XGB, RF combines many simple decision trees into a strong model. To ensure the same complexity in the RF- and XGB-based methods, the number of trees in RF was set equal to the optimized number of estimators in XGB, and the max_features parameter was set to the square root of the number of predictor variables [37,68]. For MLP and PU-BP, we used the same net structure [50] and mainly optimized the penalty parameter. This paper utilized the RBF kernel-based SVM, which is widely used in remote sensing classification [38]. The cost parameter was optimized on a validation set drawn from balanced samples for every study area, with a search range of 1 to 200. For the multi-class classification scenario in this paper, the "one against the rest" mode was employed for SVM training. In addition, a dataset with balanced positive and unlabeled samples is recommended for PU-SVM [55]; therefore, the amount of unlabeled data in PU-SVM was set equal to the number of positive samples of the current class, whereas 5000 unlabeled data points were used in PU-BP, consistent with [50]. For SMOTE, the parameter k (the number of neighbors participating in synthesis) was set to 5, and the synthesis magnification was 100 (that is, equalization synthesis). All approaches were implemented using the scikit-learn package for Python 3.6 [69] (Windows 10, 3.40 GHz, 8.0 GB memory, Intel Xeon E3 CPU).
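A search over the ranges above might look roughly like this; it is a sketch, with scikit-learn's GradientBoostingClassifier standing in for XGB (min_samples_leaf playing the role of the minimum leaf instances) and synthetic data in place of the imagery features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Search space mirroring the empirical ranges stated in the text.
param_space = {
    "n_estimators": range(1, 1001),
    "learning_rate": [i / 10 for i in range(1, 11)],
    "max_depth": range(2, 24),
    "min_samples_leaf": range(1, 30),
    "subsample": [i / 10 for i in range(1, 11)],
}

# Synthetic placeholder data; in practice this would be the spectral
# and texture features of one positive-unlabeled routine.
X, y = make_classification(n_samples=200, random_state=0)
search = RandomizedSearchCV(GradientBoostingClassifier(random_state=0),
                            param_space, n_iter=3, cv=3, random_state=0).fit(X, y)
```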

Results
Identification of the minority class is a difficult task in imbalanced classification. The proposed method (ISS-XGB) aims to improve the accuracy of the minority class without sacrificing the accuracy of the majority classes or the overall performance. Thus, we analyzed the performance of ISS-XGB on both the minority class and all classes. In addition, we further investigated the effectiveness of ISS-XGB under different data complexities.

Performance on the Minority Class
Area 8, which contained 11 land-cover classes (Table 1) and possessed the highest complexity and diversity (SHDI = 2.22), was chosen as an example. We analyzed the performance of ISS-XGB, MLP, RF, XGB, and SVM on the minority class under two class proportions. The proportion 2:98 represented an extremely imbalanced situation, while the proportion 50:50 was the balanced situation. The probability maps of the minority class (i.e., tree) and the quantity measures are shown in Figure 3 and Table 2. Under the extremely imbalanced situation, the probability maps were significantly different (first row in Figure 3). The highest probabilities of MLP, SVM, and RF were only 0.012, 0.234, and 0.062, respectively, which may have led to failure in identifying the minority class. Indeed, these methods omitted all tree pixels in Area 8, with a QD' of 100% (Table 2). Both XGB and ISS-XGB provided higher probabilities (approximately 0.67 in Figure 3). However, XGB omitted most tree pixels, with a QD' of 99.64% and an AD' of 0% (Table 2). The F1 score of the minority class for XGB was only 0.01, while that of ISS-XGB was much higher, reaching 0.77 (Table 2). ISS-XGB successfully identified the minority class when the data distribution was extremely skewed. Table 2 reports the quantity accuracy of the approaches with sample sets of different imbalances for MLP (multilayer perceptron), SVM (support vector machine), RF (random forest), XGB (eXtreme gradient boosting), and ISS-XGB (impartial semi-supervised learning strategy based on extreme gradient boosting).
Under the balanced situation, ISS-XGB achieved the same performance as the commonly used methods. The probability maps of the minority class are visually similar (second row in Figure 3). The F1 scores of the different methods are also very close, ranging from 0.82 to 0.86 (Table 2). However, the highest probability provided by ISS-XGB is more stable across the different situations than those of the other methods (Figure 3e,j).

This phenomenon is related to the role of unlabeled data in training. In different scenarios, unlabeled data contributes differently to the description of class heterogeneity and the general distribution information, and it exerts different suppression effects on the probability decision function of ISS-XGB. Therefore, with more balanced sample data, the probability estimates of ISS-XGB for the minority class do not increase as much as those of the other methods (details in Section 5.1).

Overall Performance
As the contribution of the minority class to overall performance is limited in complex imbalanced datasets, a relatively simple area (i.e., area 6) was used as an example. This area, exhibiting an SHDI value of 1.43, contained six land-cover classes: house, tree, soil, road, grass, and others. The minority class was the house. The visualization maps and accuracy results are shown in Figures 4 and 5 and Table 3. A detailed confusion matrix of one trial can be found in Table 4.
Under an extremely imbalanced situation, the difference in classification results was obvious (first row in Figure 4). ISS-XGB demonstrated better performance than the other approaches, providing an average OA of 85.92% (Figure 5a and Table 3). The mean F1 score of the minority class for ISS-XGB was 0.83, much higher than those of RF (0.04), XGB (0.09), MLP, and SVM (almost 0). The house pixels in parts A and B were successfully identified by ISS-XGB (Figure 4e), while the other four methods misclassified them as soil or roads (Figure 4a-d).
Compared with the imbalanced situation, performance with the balanced samples improved (Figure 4). Notably, the F1 scores of MLP, SVM, RF, and XGB increased by over 0.8 (Figure 5). The growth in the average OA and F1 score of ISS-XGB was small, at less than 1.8% and 0.06, respectively (Figure 5). ISS-XGB thus offered comparable performance in both the imbalanced and balanced cases.
* indicates that the accuracy of an approach differs at the 95 percent confidence level between the different imbalanced samples. QD and AD in Table 4 are the overall quantity and allocation disagreement incorporating all 6 classes, computed using the population matrix [67]. Because SVM and MLP fail at predicting the house class with imbalanced samples (2:98), the conversion of the observed sample matrix into an estimated unbiased population matrix is invalid; thus, the overall QD and AD are invalid for these methods.
To analyze the statistical reliability of the 100 trials, Table 3 shows the averages and standard deviations (STD) of the OA and the minority-class F1 scores under the two imbalance scenarios. Under the extremely imbalanced situation (minority:majority = 2:98), ISS-XGB showed higher accuracy and better stability in terms of both OA and minority F1 score. In particular, the STD of the minority F1 score for ISS-XGB was only 30% of that for RF and 21% of that for XGB, implying that ISS-XGB was more stable in this case than the other methods. Under the balanced situation, however, ISS-XGB did not show obvious advantages (Table 3).
The QD and AD of ISS-XGB with imbalanced samples were similar to those with balanced samples ( Table 4). The slight increase with balanced samples was mainly due to a decrease of commission errors in the minority class (e.g., Parts C and D in Figure 4e,g). Balanced samples provided more helpful information for the minority class identification.
In general, ISS-XGB achieved a stable performance. Especially under extremely imbalanced situations, ISS-XGB provided high accuracy for the minority class without losing overall performance.


The Performance under Different Levels of Data Complexity
To analyze the performance of ISS-XGB under different levels of data complexity, experiments were conducted in the 8 areas across 50 different sample distributions (i.e., class proportions of 1:99, 2:98, 3:97, …, 50:50). The curves of the OA and F1 scores are shown in Figures 6 and 7. When the samples were extremely imbalanced (i.e., class proportions from 2:98 to 10:90), the accuracy of ISS-XGB was higher than that of the other methods. For example, in area 2, the OA of ISS-XGB (Figure 6b) converged to 87.63% when the class proportion was 10:90, while the other methods were close to or far below this value. MLP and SVM converged to approximately 82% when the sample distribution was balanced (50:50). RF performed better (87.31% at the class proportion of 50:50) than MLP and SVM but much worse than XGB and ISS-XGB. Although the maximum OA of XGB was slightly higher (90.19%), its demand for samples was also greater. In every case, the OA of ISS-XGB converged within a short interval (i.e., before the class proportion increased to 10:90). This advantage of ISS-XGB was even more obvious in the F1 scores for the minority class (Figure 7); i.e., ISS-XGB converged much more quickly than the other approaches, which only ultimately achieved comparable performance.
The proposed method performed well on extremely imbalanced remote sensing datasets regardless of the data complexity.

Discussion
To explore the characteristics and advantages of the proposed ISS-XGB, this section discusses the influence of unlabeled data (Section 5.1), examines the utility of the semi-supervised learning strategy (Section 5.2), and compares ISS-XGB with data-level methods for imbalanced learning.

The Influence of Unlabeled Data on ISS-XGB
Unlabeled samples can provide supplementary information about the non-target class, and their influence differs under different imbalanced situations.
When the samples were extremely imbalanced, ISS-XGB had a significant advantage in identifying the minority class. The methods that adopted AIO strategies (RF, XGB, and MLP) were always partial to the majority class, generating biased results. ISS-XGB broke the multi-class problem into several binary classification tasks, ensuring that the minority class received sufficient consideration. Thus, the accuracy of ISS-XGB quickly converged to a higher value. Although this strategy is quite similar to that of SVM, the unlabeled samples used in ISS-XGB offered extra distribution information, while the negative samples in SVM did not. In particular, when the positive samples were extremely rare, SVM failed to fit the hyperplane and made almost entirely incorrect predictions (Figures 3 and 4b, Tables 2 and 3). Additionally, the unlabeled data in the training process of ISS-XGB provided general information about the data distribution to remedy the incomplete description of the target class by the rare positive samples. Thus, ISS-XGB provided higher probability predictions for the minority class than the other methods (Figure 3a-e).
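The binary decomposition described above can be illustrated with a minimal sketch, not the authors' implementation: each class is simulated by a regressor trained on its positive samples against equally sized groups of unlabeled samples, and the class with the highest group-averaged score wins. `GradientBoostingRegressor` is an assumed stand-in for the XGBoost-based regressor of ISS-XGB, and all data here are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

def pu_one_vs_rest(X, y, X_unlabeled, X_query, n_groups=3):
    """One positive-unlabeled regressor per class; each positive set is
    paired with equally sized unlabeled groups so every task is balanced."""
    classes = np.unique(y)
    scores = np.zeros((len(X_query), len(classes)))
    for k, c in enumerate(classes):
        X_pos = X[y == c]
        group_scores = []
        for _ in range(n_groups):              # multiple unlabeled groups
            idx = rng.choice(len(X_unlabeled), len(X_pos), replace=True)
            X_tr = np.vstack([X_pos, X_unlabeled[idx]])
            t_tr = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_pos))])
            reg = GradientBoostingRegressor(n_estimators=50).fit(X_tr, t_tr)
            group_scores.append(reg.predict(X_query))
        scores[:, k] = np.mean(group_scores, axis=0)   # average over groups
    return classes[np.argmax(scores, axis=1)]          # highest simulated score wins

# Three well-separated synthetic classes; class 0 is the extreme minority.
X = np.vstack([rng.normal(m, 0.3, (n, 2)) for m, n in [(0, 5), (3, 100), (6, 100)]])
y = np.concatenate([np.zeros(5), np.ones(100), np.full(100, 2.0)])
X_unl = rng.uniform(-1, 7, (300, 2))                   # cost-free unlabeled pool
pred = pu_one_vs_rest(X, y, X_unl, X)
```

Because each binary task is balanced by construction, the 5-sample minority class receives the same training weight as the 100-sample classes, which is the intuition behind the impartiality of the strategy.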
When the data were relatively balanced, the advantages of ISS-XGB were not significant. For further explanation, the 2D spectral spaces of each class in Area 6 were drawn (Figure 8). As the data tended to be balanced, the descriptive ability of the target samples improved (Figure 8b-g). The unlabeled samples then had a weaker or even negative influence because the unlabeled data represented a mixture of classes; their features inevitably reflected the target class (Figure 8h), which blurred the distinction between the unlabeled data and the target samples. Thus, when the positive samples were sufficient, the unlabeled samples interfered with the heterogeneity between the positive and unlabeled sets and suppressed the class description. Under this situation, the discrimination of the positive-negative scheme was better, resulting in more accurate performance (e.g., XGB and RF in Figures 6 and 7) and higher probability predictions (Figure 3f-j).

Comparison with PU-BP and PU-SVM
ISS-XGB, PU-BP, and PU-SVM are all semi-supervised methods based on the PU learning framework. The main differences between ISS-XGB and the other two are the strategy patterns for unlabeled data and the positive-unlabeled simulator. For the contrastive analysis, the same training samples from Areas 6 and 8 were employed. Details of PU-BP and PU-SVM can be found in [50,55]. Figure 9 shows the OA and F1 scores for the minority classes of the three methods under different imbalanced situations (i.e., minority-to-majority ratios from 1:99 to 50:50). ISS-XGB (the red line) had higher and more stable accuracy than PU-BP and PU-SVM. Notably, under extremely imbalanced situations (i.e., ratios from 1:99 to 10:90), the F1 score of ISS-XGB for the minority class was on average 0.14 and 0.10 greater than those of PU-BP and PU-SVM, respectively.
Unlabeled data were used in all three methods, but the different usages resulted in significant differences in accuracy. In PU-BP, 5000 unlabeled samples were employed, more than five times the number of positive samples for the minority class. In this case, the imbalance persisted, and the simulator of PU-BP (i.e., BP) was sensitive to the skewed sample distribution. Thus, the curves of PU-BP (the blue line) fluctuated strongly in both OA and F1 scores. In PU-SVM, although equal amounts of unlabeled data and positive samples were used, this amount was not sufficient to construct the hyperplane. Therefore, the accuracy under PU-SVM was much lower than that under ISS-XGB (nearly 20% for OA, and approximately 15% for the F1 score of the minority class). In ISS-XGB, impartial and multiple positive-unlabeled datasets were constructed. Optimization of the strategy pattern and simulator of ISS-XGB helped the characteristics of the minority class be learned fairly and comprehensively.

Comparison with SMOTE Sampling-Based Methods
SMOTE is a popular method in imbalanced learning [25]. This method uses the k nearest neighbors of minority-class samples to generate synthetic samples in the feature space [16], and it can be implemented directly at the data level without any modification to the algorithms.
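The SMOTE interpolation step can be illustrated with a short from-scratch sketch (not the implementation used in the experiments): each synthetic sample lies on the line segment between a minority sample and one of its k nearest minority-class neighbors.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)

def smote(X_min, n_new, k=5):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen sample and one of its k nearest minority-class neighbors."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: each point is its own neighbor
    _, idx = nn.kneighbors(X_min)
    synth = np.empty((n_new, X_min.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(X_min))           # a random minority sample
        j = idx[i, rng.integers(1, k + 1)]     # one of its k nearest neighbors
        gap = rng.uniform()                    # interpolation factor in [0, 1)
        synth[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synth

X_min = rng.normal(0.0, 1.0, (20, 4))   # 20 minority samples, 4 spectral bands
X_new = smote(X_min, n_new=80)          # oversample to match a 100-sample majority
X_balanced_min = np.vstack([X_min, X_new])
```

Note that every synthetic point is a convex combination of two real minority samples, which is also why, as discussed below, it may not correspond to any real ground object.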
A comparison was conducted between ISS-XGB and SMOTE-based XGB (SXGB), RF (SRF), MLP (SMLP), and SVM (SSVM). Five neighbors and a sampling ratio of 100% were set to create new, balanced sample sets via SMOTE. The classifiers were then trained on the generated balanced sets.
Areas 6 and 8 were chosen as examples for discussion (Figure 10). In general, ISS-XGB (the red line) provided performance comparable to, or even higher than, that of the SMOTE-based methods. As shown in Figure 10c, the highest OA of ISS-XGB (91.25%) is similar to that of SXGB (the yellow line, 91.07%) and is 5.23% and 12.06% higher than that of SRF (the green line) and SSVM (the blue line), respectively. Moreover, the OA and F1 scores of ISS-XGB were more stable than those of the SMOTE-based methods. For instance, in Area 6, the OA of SMLP (the brown line) fluctuated dramatically between 0.76 and 0.87, while that of ISS-XGB quickly converged to approximately 0.88 (Figure 10a). Under extremely imbalanced situations (i.e., class proportions ranging from 1:99 to 10:90), the advantages of ISS-XGB were more obvious: its F1 score exceeded those of SXGB and SSVM by at least 0.10 (Figure 10d).

SMOTE synthesized extra samples for the minority class but was not adopted in ISS-XGB because, in remote sensing datasets, each sample corresponds to a certain object, while SMOTE-synthesized samples may not be related to any objects. The synthesized samples may produce uncertainty in the feature space, which could greatly impact some classifiers (e.g., SMLP). Thus, in ISS-XGB, cost-free unlabeled data were introduced to provide extra information on the overall distribution, rather than synthesized positive data for the minority class. In addition, since the positive samples of the minority class are extremely rare in many scenarios, multiple unlabeled data sets (10 in this paper) were used for impartial semi-supervised training.

Conclusions
This study proposed an impartial semi-supervised learning strategy (i.e., ISS-XGB) for land-cover classification with imbalanced remote sensing data. The results generated from the eight study areas demonstrated that ISS-XGB can effectively identify minority classes, especially under extremely imbalanced situations. This method provided high accuracy for the minority class without degrading overall performance and was robust to data complexity. The cost-free unlabeled data utilized in ISS-XGB could provide extra information about the non-target class. Moreover, compared with other semi-supervised methods based on the PU learning framework, optimizing the strategy patterns and simulators of ISS-XGB helps the characteristics of the minority class be learned more fairly and comprehensively. This work provides a new strategy for solving imbalanced classification problems in remote sensing. In the future, combinations of multiple strategies with ISS-XGB and computational efficiency will be further investigated.
