A Deep Neural Networks Approach for Augmenting Samples of Land Cover Classification

Land cover is one of the key indicators for modeling ecological, environmental, and climatic processes, and it changes frequently due to natural factors and anthropogenic activities. These changes demand various samples for updating land cover maps, although in reality the number of samples is always insufficient. Sample augmentation methods can fill this gap, but they still face difficulties, especially for high-resolution remote sensing data. The difficulties include the following: (1) excessive human involvement, mostly caused by human interpretation, even in active learning-based methods; (2) large variations of segmented land cover objects, which affect generalization to unseen areas, especially for methods that are validated in small study areas. To solve these problems, we propose a sample augmentation method that incorporates deep neural networks, using a Gaofen-2 image. To avoid error accumulation, the neural network-based sample augment (NNSA) framework employs a non-iterative procedure and augments 184 labeled image objects to 75,112 samples. The overall accuracy (OA) of NNSA is about 20 percentage points higher than that of label propagation (LP) with reference to expert-interpreted results; LP has an OA of 61.16%. The accuracy decreases by approximately 10% in the coastal validation area, whose characteristics differ from those of the inland samples. We also compared the iterative and non-iterative strategies without adding external information. The results for the validation area containing the original samples show that non-iterative methods have a higher OA and a lower sample imbalance. By augmenting sample size with higher accuracy, the NNSA method can benefit the updating of land cover information.


Introduction
Land cover is the physical material (e.g., grass, trees, bare ground, and water) at the surface of the earth, and it is an essential parameter for global change, crop production estimation, and the terrestrial water cycle [1][2][3][4]. Land cover changes as a result of both natural factors and human activities, which increases the difficulties and uncertainties in updating land cover maps [5]. Fortunately, remote sensing technology provides images that model the reality of land cover and serve as an indispensable data source for updating land cover maps [6]. A sample usually contains the label information of a location in an image, and it is used to label other locations in the image. The sample is crucial for updating land cover maps by remote sensing classification, as it affects the accuracy and quality of the end product.
The sample size (i.e., the number of samples) affects the accuracy of remote sensing classification, and reducing the number of samples generally lowers classification accuracy [7][8][9][10][11], especially for object-based image analysis (OBIA). For a remote sensing image, classification can be conducted on each pixel or on a group of neighboring pixels (i.e., an image object) [12,13]. Compared with pixels, which provide spectral information, image objects contain additional information on spectra, geometry, context, and texture. Thus, OBIA leads to sample sparsity in a high-dimensional data space, which increases the need for a larger sample size [14]. Although some machine learning algorithms that are popular in supervised remote sensing classification are tolerant of insufficient sample size in high dimensions, studies show that the sample size leads to larger variations in accuracy than the algorithms themselves [10,15]. A large sample size demands a lot of manpower and financial resources, which may be unrealistic for updating land cover maps when the land-surface elements are continuously changing both temporally and spatially. A small sample set is more efficient in manpower and financial resources, and one that is interpreted by experts is more representative than random sampling results, since an expert can successfully generalize the characteristics of land-surface elements from a few examples or from earlier experience [16]. However, a small sample size may greatly reduce the land cover classification accuracy of current computer-based classification algorithms. Thus, it is necessary to augment the sample size from small to large to ensure the accuracy of land cover maps.
To augment the sample size, a number of techniques that utilize an existing small sample set have been developed [17]. These can be categorized into two basic types [11]: (1) active learning and (2) semisupervised learning. Classification methods with prediction probability can also be used for sample augmentation [18][19][20]. Active learning queries unlabeled samples in the training data set for their labels, and thus requires fewer samples to classify land covers. Much research has been conducted on active learning-based sample size augmentation for remote sensing classification [21][22][23]. Although active learning can reduce the number of samples required, the process of labeling new samples still demands a large amount of manpower, especially in a complicated study area. Meanwhile, the samples queried by active learning may be uninformative or indistinguishable to humans. Semisupervised learning uses unlabeled samples with the help of labeled samples. The label propagation (LP) algorithm is one of the popular semisupervised methods; it exploits labeled and unlabeled samples to construct a graph model that predicts the labels of unlabeled samples from the similarity between pairs of samples [24][25][26]. For example, Shi et al. [27] used LP to predict the unlabeled samples in remote sensing image classification, and Wang et al. [25] propagated labels to unlabeled samples using LP with the help of a spatial-spectral graph. However, the feature vector of segmented land cover image objects varies with the segmentation parameters, which makes it difficult to select a representative sample set. The variation in land cover, inherent in the dynamics and complexity of land covers [28], also affects the generalization of the LP-derived graph under a small sample size, which is a big challenge for sample augmentation.
One possible way to alleviate these effects is to utilize the generalization power of deep neural networks (DNN) [29], which have achieved remarkable practical success in various application domains [30][31][32][33]. Although there are sample augmentation methods that incorporate recent developments in neural networks for hyperspectral images [34][35][36][37], methods that work on features derived from multispectral images are still lacking.
To alleviate the effects of variations and to improve the size and accuracy of the augmented samples, we developed a sample augmentation framework that incorporates a DNN. The proposed neural network-based sample augment (NNSA) framework can be described in four steps: (1) select optimal features for identifying each land cover category; (2) measure similarities between image objects and samples belonging to a certain land cover category; (3) feed the DNN with the similarity measurement results; and (4) cluster to refine the sample augmentation results produced by the DNN. To quantitatively evaluate the proposed NNSA, we compared its results with those of LP and DNN with reference to expert interpretation results. We compared the generalization capacities of the three methods on another, unseen coastal validation area, which differs from the samples, as these contain only inland land cover characteristics. Furthermore, we compared the iterative and non-iterative strategies for sample augmentation in the inland validation area.

Data and Study Area
The Gaofen-2 image data acquired on 26 May 2015 were used as the major data source. The Gaofen-2 satellite, which belongs to a series of civilian high-resolution optical satellites of the China National Space Administration, scans a swath of ~45 km and provides 1-m panchromatic images and 4-m multispectral images with 4 bands. Both the multispectral and panchromatic images were orthorectified and corrected to surface reflectance with the FLAASH (Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes) algorithm [38] and then fused together using the Gram-Schmidt pan-sharpening method [39].
The study area covers approximately 518 km² on the coast of the Bohai Sea and includes more than 90 villages and 3 towns (Figure 1). This area belongs to the Beijing-Tianjin-Hebei region, where unprecedented coastal development has led to fragmented, complicated, and fast-changing land covers [40]. The land cover of the study area consists of six classes (water, forestland, grass land, crop land, bare land, and residential and built-up land), which span the entire National Land Resource Classification System of China. Furthermore, high-resolution satellites provide more detailed information on land covers and greater separability between subcategories, which leads to increased intraclass variations. The residential and built-up land mainly includes residential districts, industrial areas, roads, and vegetable greenhouses. The forestland comprises sparse forest alongside roads, dense forest along the river and close to the sea, and juvenile woodland. Crop land has different colors, such as brown, light brown, dark brown, and green, depending on the crops and soil moisture. The large intraclass variability poses challenges for sample augmentation. For example, green crop land is easily confused with grass land during sample augmentation.

The Gaofen-2 image of our study area measures 26,631 rows by 27,407 columns. It was segmented into 154,667 image objects using eCognition Developer 9.0 software (Trimble Inc., Munich, Germany) and an automated parameterization algorithm [41]. A total of 28 attributes were calculated for these image objects (Table 1). Then, a total of 184 instances covering all six land cover categories were selected as initial samples for augmentation (Figure 1c). To verify the sample augmentation method, we collected expert-interpreted land cover results (Table 2), which consist of two parts (Figure 1c,d) with a total of 23,484 image objects.

Feature Selection from Small-Size Samples
Feature selection can improve the performance of object-based image classification [11]. To select features under low sample size and high dimensionality, we adapted a revised version of our previous work [14], which is designed to provide a solution to this problem. The previous work, namely the group-corrected partial least squares generalized linear regression (PLSGLR) method, can be described in three steps: (1) Group features based on Pearson's correlation coefficient; (2) rank features by PLSGLR and remove insignificant features; (3) reconstruct categories when the features are added one-by-one to calculate the Bayesian information criterion.
Compared to our previous work, we improved the stability of feature selection against random sampling uncertainty by incorporating a co-occurrence matrix and a voting strategy. Given the feature grouping result G obtained from each of the N sampling runs, the co-occurrence matrix P records, for the i-th and j-th features, how often the two features are grouped together across the runs. When the co-occurrence value of a pair of features is greater than a threshold, th, the grouping of the two features is retained; otherwise it is discarded. With regard to the ranking matrix, R, with m rows and N columns, where m is the number of features, the feature at the i-th position, f_i, is determined by voting over the N sampling results. For the given ranking vector {R_1, R_2, ..., R_n} and the BIC matrix B with n rows and N columns, where n is the number of ranked features (n ≤ m), the final feature number, n_f, is determined from b_k, the number of optimal features at the k-th sampling result.
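The stabilization step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that co-occurrence is measured as the fraction of sampling runs in which two features share a group, and that voting means a simple majority vote per ranking position and over the per-run optimal feature counts. The function names and the toy inputs are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(groupings, n_features):
    """Fraction of sampling runs in which features i and j share a group.

    groupings: one entry per sampling run; each run is a list of groups,
    and each group is a list of feature indices.
    """
    counts = [[0] * n_features for _ in range(n_features)]
    for groups in groupings:
        for group in groups:
            for i, j in combinations(sorted(group), 2):
                counts[i][j] += 1
                counts[j][i] += 1
    n_runs = len(groupings)
    return [[c / n_runs for c in row] for row in counts]


def stable_pairs(P, th):
    """Keep only the feature pairs whose co-occurrence exceeds th."""
    n = len(P)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if P[i][j] > th]


def vote(values):
    """Majority vote; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]


# Three hypothetical sampling runs over four features (indices 0..3).
runs = [
    [[0, 1], [2, 3]],
    [[0, 1], [2], [3]],
    [[0, 1, 2], [3]],
]
P = cooccurrence(runs, 4)
pairs = stable_pairs(P, th=0.5)

# Hypothetical per-run rankings (most important feature first) and
# per-run optimal feature counts b_k chosen by BIC.
rankings = [[2, 0, 1], [2, 1, 0], [2, 0, 1]]
ranked = [vote(position) for position in zip(*rankings)]
n_f = vote([3, 2, 3])
```

In this toy case only features 0 and 1 are grouped together in every run, so theirs is the only pair that survives a threshold of 0.5. Position-wise voting is a simplification: in general it can select the same feature for two positions, which a fuller implementation would need to resolve.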

Similarity Measurement of Image Objects
Nonparametric methods make fewer preliminary hypotheses and are more powerful for describing nonlinear and complex relationships [42]. To model the relationship between unlabeled image objects and samples that are marked as belonging to a certain category, we use a nonparametric method, kernel density estimation (KDE), which has been applied to land cover classification [18]. For the samples of a category, we use KDE to extract the curve on one of the selected features without assuming the relationship in advance. The curve f(x) can be described as

$$f(x) = \frac{1}{Mh} \sum_{i=1}^{M} K\left(\frac{x - x_i}{h}\right)$$

where M is the number of samples of a category, x is the value to be estimated, x_i is the value of the i-th sample, K is the kernel function, and h is the bandwidth that controls the smoothness of the estimated curve. In this study, we used the Gaussian kernel and determined the bandwidth with the normal reference rule [43].
To improve the performance and stability, we employed a repeated sampling-with-replacement strategy and calculated the normalized relationship each time. The curve F(x) can be represented as the average of the set of curves that result from the repeated sampling process.
After the curve F(x) of a feature on a category is determined, the similarity of the selected features of a certain category is calculated. For an unlabeled image object, the similarities corresponding to the values of the selected features are calculated by interpolation, and the similarities of different features are weighted to generate the final similarity. The similarity S is thus a weighted combination of the per-feature similarities, where j is the j-th selected feature and ω is a user-defined weight vector. As the results of feature selection are sorted in descending order of importance, a weight vector (0, 0, ..., 1) means the limiting factor principle; a weight vector (1, 0, ..., 0) represents the dominant factor principle; and a weight vector (1, 1, ..., 1) indicates the average principle.
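The similarity measurement can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the normal reference rule is taken in its common form h = 1.06 σ M^(-1/5), each resampled curve is normalized by its peak value, and the weighted similarity is normalized by the sum of the weights. The sample values and function names are hypothetical.

```python
import math
import random
import statistics


def normal_reference_bandwidth(samples):
    """Normal reference rule (assumed form): h = 1.06 * sigma * M**(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


def gaussian_kde(samples, h):
    """Return f(x) = 1/(M*h) * sum_i K((x - x_i)/h) with a Gaussian kernel K."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(
        math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples
    )


def averaged_curve(samples, grid, n_boot=50, seed=0):
    """F(x): mean of peak-normalised KDE curves over bootstrap resamples."""
    rng = random.Random(seed)
    total = [0.0] * len(grid)
    for _ in range(n_boot):
        boot = [rng.choice(samples) for _ in samples]
        if len(set(boot)) < 2:  # zero spread would give a zero bandwidth
            boot = list(samples)
        f = gaussian_kde(boot, normal_reference_bandwidth(boot))
        values = [f(x) for x in grid]
        peak = max(values)
        for k, v in enumerate(values):
            total[k] += v / peak
    return [t / n_boot for t in total]


def interpolate(grid, curve, x):
    """Linear interpolation of the curve at x (0 outside the grid)."""
    if x < grid[0] or x > grid[-1]:
        return 0.0
    for k in range(len(grid) - 1):
        if grid[k] <= x <= grid[k + 1]:
            t = (x - grid[k]) / (grid[k + 1] - grid[k])
            return (1 - t) * curve[k] + t * curve[k + 1]
    return curve[-1]


def weighted_similarity(per_feature, weights):
    """Weighted mean of per-feature similarities (normalisation assumed)."""
    return sum(w * s for w, s in zip(weights, per_feature)) / sum(weights)


# Hypothetical feature values of six samples of one category.
samples = [-0.30, -0.25, -0.28, -0.22, -0.35, -0.27]
grid = [i / 100 for i in range(-100, 101)]
curve = averaged_curve(samples, grid)

s_near = interpolate(grid, curve, -0.27)  # value typical of the category
s_far = interpolate(grid, curve, 0.60)    # value far from every sample
dominant = weighted_similarity([s_near, s_far], [1, 0])
```

An object whose feature value lies among the samples receives a similarity well above that of an object far from all of them, and the weight vector (1, 0) reproduces the dominant factor principle by returning the similarity of the first feature alone.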

Sample Augmentation by Neural Networks
After feature selection and similarity measurement, the image objects that have a similarity over 0.5 for a certain land cover category are fed to a DNN. The structure of the DNN is displayed in Figure 2. Given an image object X with a label y, the feature vector (f_1, f_2, ..., f_m) is fed to the input layer of the DNN. Each hidden neuron h_j is activated with a rectified linear unit (ReLU). For the output neuron o_k, which incorporates the results of multiple hidden layers, S_k is computed over the hidden nodes, where j is the j-th hidden node and k is the k-th output node. The parameters of the DNN are optimized to minimize the binary cross-entropy error between y_k, the label of an image object, and o_k, the prediction result. To prevent the DNN from overfitting, dropout is employed [44]: we used a dropout layer with a fixed drop rate of 0.25 between each pair of hidden layers. We used the RMSprop optimizer with an adaptive learning rate to train the model, and cross entropy to evaluate the predicted classes. The implementation was based on Keras in a Python environment [45].
The data set is then randomized and split into a training set accounting for 80% and a testing set accounting for 20%. A DNN with the same structure is trained using the training set. The DNN used in this study has four hidden layers. In addition, the number of neurons of each layer in the DNN is determined by 80% decreasing in number, empirically.
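For reference, the textbook forms of the quantities named in this section can be written as below. This is a sketch, not a recovery of the original equations: the weights w and biases b are notation assumed here, and the sigmoid output is an assumption suggested by the binary cross-entropy loss rather than something stated in the text.

```latex
h_j = \max\Bigl(0,\ \sum_{i=1}^{m} w_{ij} f_i + b_j\Bigr), \qquad
S_k = \sum_{j} w_{jk} h_j + b_k, \qquad
o_k = \frac{1}{1 + e^{-S_k}},
\qquad
E = -\sum_{k} \bigl[\, y_k \log o_k + (1 - y_k) \log (1 - o_k) \,\bigr]
```

With several hidden layers, the first expression is applied layer by layer, each layer taking the activations of the previous one as its inputs.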

Postprocessing by Clustering
The similarity measurement can introduce errors into the sample augmentation process, and these errors may be further amplified by supervised learning. To eliminate the erroneous samples, a clustering method is used to discover the potential structure of the samples. It is feasible in practice to manually specify the cluster centers or to automatically determine the number of clusters with some criterion. In this study, we employed the spectral clustering method [46,47] for postprocessing. Given a set of augmented samples, X, with the calculated image object features, the similarity ω_ij between two samples (image objects), x_i and x_j, is defined for neighboring samples, where x_i ∈ Ω_{x_j} indicates that x_i is among the k-nearest neighbors of x_j, and δ controls the width of the neighborhoods. Then, the normalized Laplacian matrix L_rm of the similarity matrix W is constructed from W, an identity matrix E, and a diagonal matrix D whose diagonal elements are derived from W. The eigenvectors of the Laplacian matrix are sorted according to the eigenvalues, and the top k eigenvectors (k being the number of clusters) are delivered to the k-means algorithm. The cluster number k is determined by Bartlett's test [48]. In summary, Figure 2 shows the detailed workflow of sample augmentation with the proposed framework.
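As a point of reference, a commonly used formulation of spectral clustering that matches the symbols defined above is given below. It is an assumed standard form, not a recovery of the original equations: the Gaussian similarity restricted to k-nearest neighbors and the random-walk normalization of the Laplacian are both assumptions, and the symmetric normalization E - D^{-1/2} W D^{-1/2} is an equally common alternative.

```latex
\omega_{ij} =
\begin{cases}
\exp\!\left(-\dfrac{\lVert x_i - x_j \rVert^{2}}{2\delta^{2}}\right), & x_i \in \Omega_{x_j},\\[1ex]
0, & \text{otherwise},
\end{cases}
\qquad
D_{ii} = \sum_{j} \omega_{ij},
\qquad
L = E - D^{-1} W
```

Under this form, the eigenvectors belonging to the k smallest eigenvalues of L carry the cluster structure and are the ones passed to k-means.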

Validation
To thoroughly test the sample augmentation results, we compared the results of NNSA with those of LP and DNN. As the core part of NNSA, DNN has the potential to augment samples directly without the help of the proposed framework. Two regions with expert-interpreted land covers are employed to compare the augmentation accuracy. One region (Figure 1c) contains all 184 training and testing samples, while the other region (Figure 1d) does not contain any samples used in this study. The two regions show different land cover characteristics: one is covered with typical inland land covers and has large areas of crop land, and the other, in a coastal area, has marine farms and more forest land. We compared the overall accuracy (OA) and sample imbalance of these methods in each region. The sample imbalance is described by the maximum value of the sample size ratio between different land cover categories.
We also compared the iterative and non-iterative strategies in sample augmentation. In the iterative scenario, we iteratively increased the number of samples for LP and the support vector machine (SVM). The radial basis kernel function was used in SVM, with a gamma parameter of 5 and a cost parameter of 10 after tuning. For each scenario, we iterated 300 times over the region that contains the original samples, adding 10 samples to the training data set at each iteration in the iterative scenario, and assessed the results by OA and sample imbalance. In the non-iterative scenario, we trained only on the original samples and selected an equal sample size by classification probabilities.
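The two evaluation measures can be computed as in the following minimal Python sketch. The function names and the toy labels are hypothetical, and sample imbalance is read here as the largest class count divided by the smallest, which is the maximum pairwise ratio of sample sizes.

```python
from collections import Counter


def overall_accuracy(predicted, reference):
    """Share of image objects whose augmented label matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def sample_imbalance(labels):
    """Maximum ratio of sample sizes between any two categories."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy example with six image objects.
reference = ["water", "water", "crop", "crop", "forest", "crop"]
predicted = ["water", "crop", "crop", "crop", "forest", "crop"]

oa = overall_accuracy(predicted, reference)
imbalance = sample_imbalance(predicted)
```

Here five of the six labels agree with the reference, and the augmented set holds four crop objects against one each of water and forest, so the imbalance is 4.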

Sample Augmentation Results
The accuracy and loss curves for training the DNN in NNSA are shown in Figure 3. The two accuracy values converged at about 96% and 98% from epoch 1500 onward. Training stopped at epoch 3442, after the model had failed to improve for 500 epochs. The sample augmentation results of NNSA are shown in Figure 4. NNSA assigned 48.56% of the image objects to a certain category. Among these augmented samples, 43.76% of the image objects, which were distributed in the oceans, rivers, and fishery farms, were identified as water. The 22.47% of image objects identified as crop land were mainly scattered around the residential land in the eastern part of the study area. A further 16.95% of the image objects, concentrated in residential areas, roads, and fishery farms, were identified as residential and built-up land. Another 9.57% of the image objects were labeled as forestland, which was mainly distributed near the sea and in the vicinity of the rivers. The 2.01% of image objects assigned to grass land were located around the forestland and water. The remaining 5.23% of image objects, recognized as bare land, were located by the sea and rivers. The augmented samples are consistent with the distribution of the different land cover categories, although there are inevitably some errors, such as some green crop land being labeled as grass land.

Performance in the Similar Land Cover Region
The validation area containing the original 184 training samples has typical inland land cover characteristics. In this validation area, the OAs of NNSA, LP, and DNN are 83.85%, 61.16%, and 88.52%, respectively. Figure 5 presents a further accuracy assessment in terms of user's accuracy (UA) and producer's accuracy (PA). Grass land and bare land are absent from the DNN results because their prediction probabilities are lower than 0.9 (we adopted this threshold to acquire a similar sample size in the validation regions). In these two land cover categories, NNSA and LP both suffer from low accuracy. For the remaining land cover categories, the DNN has higher accuracy than LP, which demonstrates that introducing DNN to sample augmentation is reasonable.

Figure 6 shows the spatial distribution of the augmented samples. All three methods (i.e., NNSA, LP, and DNN) perform well in crop land, with a PA of over 90%. For water and forest land, DNN has accuracy similar to that of NNSA, while LP labels some wet crop land as water and some buildings as forest land. For grass land and bare land, DNN has a lower probability of identifying these image objects. NNSA and LP both suffer from low accuracy in these two land cover categories, especially in grass land. Grass land is very similar to green crop land, since the segmentation scale invalidates the shape difference between crop land and grass land. The clustering process of NNSA alleviates the problem to some extent. NNSA and DNN achieved better results than LP by both numeric accuracy assessment and visual inspection in this validation area.

Performance in Dissimilar Land Cover Region
In this validation area, the OAs of NNSA, LP, and DNN are 75.80%, 50.53%, and 72.45%, respectively. The OA values are about 10% lower than in the previous validation region. Figure 7 shows the UAs and PAs of these three methods. The DNN has high accuracy in forest land and crop land, except that it mislabeled some wet crop land as water. For grass land, LP performs better in this validation area since there is more grass land. LP suffered from low accuracy in crop land and water, as it also identified wet crop land as water. NNSA performs better in these land cover categories except grass land. The spatial distribution of the augmented samples is shown in Figure 8. The consistency between the results of these methods and the expert interpretation decreased compared to the previous validation area. The reduction in consistency may result from the complex land covers in the coastal area. This area has both inland and marine land covers, whereas the original samples contain only the characteristics of inland land covers. Thus, the reduction in accuracy is inevitable. The ditches scattered in crop land increase the soil moisture, which confuses the LP method. NNSA achieved better results by visual inspection in this validation area.

Effects of Segmented Land Cover Image Objects and Intraclass Variability
The results of this paper show that all three methods (i.e., NNSA, LP, and DNN) lose accuracy in the validation area without samples, even though NNSA incorporates the generalization capacity of a DNN with up to four hidden layers. We interpret the mechanism as follows: as illustrated in Figure 9, crop land has at least four manifestations in the study area, and the samples collected from a small part of the area do not account for the variations within each land cover category. Also, the segmentation scheme that converts images from pixels to objects affects the results. Inevitably, the objects corresponding to ground entities and patches of surface cover depend on the segmentation parameters [13,49]. The parameters in this study area were selected by optimizing the mean of local variance in a global search scheme [41]. The segmented objects therefore are not guaranteed to fit each land cover category. For example, the reflectivity of crop land is affected in reality by many factors, such as humidity, crop variety, growth cycle, and temperature, so experts identify crop land by its regular boundaries. When the segmentation results on crop land are fragmented, the geometric information describing the boundary is invalidated. Segmentation based on certain subcategory land cover objects may relieve this problem, which needs further study.

In addition to the fragmented segmentation results, the intraclass variability of a land cover category also affects the augmentation results. In this study, we used a classification system derived from the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences [50]. Take crop land as an example: the category consists of paddy field and dry farming field, which are defined by the way the land is used. Since it is unreasonable to define subcategories for land covers in detail, visual inspection as in [14] is inevitable. As a complement, an unsupervised clustering method as in Section 2.5 is also recommended.
With a better segmentation result and a refined classification system, the efficiency of NNSA will be improved. The refined classification system, which decreases the intraclass variability by subcategories, needs further exploration.

Comparisons with Other Sample Augment Methods
NNSA extends previous studies that augment small sample sets to large ones. NNSA utilizes classification methods with prediction probability for sample augmentation and has a higher OA than LP with reference to expert-interpreted results. NNSA uses a non-iterative sample augmentation approach, while LP employs an iterative procedure. We compared the iterative and non-iterative sample augmentation procedures without adding external information, even though active learning and semisupervised learning tend to increase the number of samples iteratively. Figure 10 shows the accuracy and sample imbalance of LP and SVM in the two scenarios. In our study area, the non-iterative LP had higher accuracy and lower sample imbalance. The accuracy of the iterative version peaked at the 36th iteration (OA = 89.45%) and then gradually decreased to around 70%. As for SVM, the non-iterative version had slightly lower accuracy than the iterative version. Considering the severe imbalance of the iterative SVM results (Table 3), it is reasonable to conclude that the non-iterative SVM is more suitable for sample augmentation.

Comparisons with Other Sample Augment Methods
The NNSA extends previous studies that augment small sample sets into large ones. NNSA uses classification methods with prediction probabilities for sample augmentation and has a higher OA than LP in reference to expert-interpreted results. The NNSA uses a non-iterative sample augmentation approach, while LP employs an iterative procedure. We compared the iterative and non-iterative sample augmentation procedures without external information added, even though active learning and semisupervised learning tend to increase the number of samples iteratively. Figure 10 shows the accuracy and sample imbalance of LP and SVM in the two scenarios. In our study area, the non-iterative LP had higher accuracy and lower sample imbalance. The accuracy peaked at the 36th iteration (OA = 89.45%) and then gradually decreased to around 70%. For SVM, the non-iterative version had slightly lower accuracy than the iterative version. Considering the severe imbalance of the iterative SVM results (Table 3), it is reasonable to conclude that the non-iterative SVM is more suitable for sample augmentation.
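The two quantities compared above can be illustrated with a minimal sketch. The overall accuracy is the usual fraction of correctly labeled samples. The imbalance measure shown here (largest class count divided by smallest class count) is only an assumed stand-in, since this section does not state how sample imbalance was computed. The function names are hypothetical.

```python
from collections import Counter

def overall_accuracy(predicted, reference):
    """Fraction of samples whose predicted label matches the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label lists must be non-empty and equal in length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)

def imbalance_ratio(labels):
    """Assumed imbalance measure: largest class count / smallest class count.

    A value of 1.0 means perfectly balanced classes; larger values mean
    that one class dominates the augmented sample set.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())
```

Tracking both values after each round of an iterative procedure would make a drift in accuracy or class balance visible.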
Non-iterative procedures avoid the accumulation of erroneous labels, which is critical for object-based image analysis. For multispectral images, there are often dozens of features after segmentation; for hyperspectral data, there may be hundreds. Only some of these features describe the characteristics of an image object, since the segmentation process does not guarantee the validity of these features. The iterative procedure may suffer from accumulation and amplification of errors during the rolling procedure.
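The structural difference between the two strategies can be sketched as follows. This is an illustrative toy, not the NNSA implementation: a nearest-centroid classifier with a softmax-style confidence stands in for the DNN, LP, or SVM, and the threshold, round count, and function names are all assumptions. The point is that the non-iterative version fits only on the original samples, whereas the iterative version feeds its own predictions back into the training set, so an early mislabel can shift later decisions.

```python
import math

def centroids(samples):
    """Mean feature vector per label from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(cents, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {y: math.exp(-math.dist(c, x)) for y, c in cents.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())

def augment_once(seed, unlabeled, threshold=0.8):
    """Non-iterative: fit on the seed samples only and label the pool in one pass."""
    cents = centroids(seed)
    out = []
    for x in unlabeled:
        label, conf = predict(cents, x)
        if conf >= threshold:
            out.append((x, label))
    return out

def augment_iteratively(seed, unlabeled, threshold=0.8, rounds=10):
    """Iterative: samples accepted in one round become training data for the next."""
    train, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        cents = centroids(train)
        accepted, rest = [], []
        for x in pool:
            label, conf = predict(cents, x)
            (accepted if conf >= threshold else rest).append((x, label))
        if not accepted:
            break  # nothing confident enough is left to add
        train.extend(accepted)  # predicted labels are now treated as truth
        pool = [x for x, _ in rest]
    return train[len(seed):]
```

In `augment_iteratively`, the line that extends `train` is where errors can accumulate: any wrong label accepted there moves the class centroids used in every later round.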
Sample augmentation is highly needed for land cover classification over very large areas [28], as there are many unseen areas that lack samples in this scenario. To cope with unseen areas, existing methods employ different strategies. Active learning queries humans for the true labels of a series of samples in order to migrate to unseen areas [21,22,51], thus taking the variations of unseen areas into consideration. Compared to active learning, NNSA avoids feeding the method with sequences of samples, since selecting and interpreting samples is not error-free and can cause biased classification results [52,53]. The iterative methods fuse samples with high similarities into the original sample set. As mentioned above, the segmented objects may not describe reality well, and this fusion introduces high uncertainty in each rolling procedure. The NNSA also incorporates similar samples but refines them with robust neural networks and a cluster scheme without the rolling procedure, which reduces the uncertainty and avoids error accumulation.
There is still work to be done in the development of NNSA. One part comes from the DNN itself, where issues such as hyperparameter selection and network structure determination remain unsolved. The other part involves the description of segmented image objects. Image objects can be described as feature vectors or as two-dimensional images. The proposed NNSA uses feature vectors as inputs. Recent advances in deep convolutional neural networks hold the promise of describing image objects directly from the images themselves [54], thus reducing the uncertainty of the NNSA framework.

Sample Augmentation for Remote Sensing with an Insufficient Sample Size
The recent development of remote sensing sensors supplies a large volume of fine-scale spatiotemporal data [55]. Most applications of big data methods in geography focus on social behaviors because mobile phone data, microblog data, and traffic data are abundant [56]. In land cover classification, however, big data methods are always limited by insufficient samples [57]. Field surveys require considerable investment; thus, a large sample size is always inaccessible. The NNSA framework, designed for jumping from a small sample set to a huge sample set, can alleviate the problem to some extent.
Both the sample size and sample quality are crucial [58,59], though big data methods concentrate more on sample size. The balance between sample size and sample quality such as representativeness is important for sample augmentation. The proposed NNSA currently focuses on the sample size, and the representativeness is implicit in the cluster procedure. To reduce computational overhead over large areas, representative samples can be selected from each cluster.
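One possible way to select representative samples from each cluster is sketched below. It assumes that cluster assignments are already available and treats the members closest to the cluster mean as representative; both the selection criterion and the function name are illustrative assumptions rather than part of the NNSA framework.

```python
import math
from collections import defaultdict

def representatives(features, cluster_ids, per_cluster=1):
    """Return, for each cluster, the indices of the members nearest its mean.

    features:    list of equal-length numeric feature vectors
    cluster_ids: cluster label for each feature vector (same order)
    """
    groups = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        groups[cid].append(idx)

    chosen = {}
    for cid, members in groups.items():
        dim = len(features[members[0]])
        mean = [sum(features[i][d] for i in members) / len(members)
                for d in range(dim)]
        # Rank members by Euclidean distance to the cluster mean.
        ranked = sorted(members, key=lambda i: math.dist(features[i], mean))
        chosen[cid] = ranked[:per_cluster]
    return chosen
```

Keeping only a few samples per cluster reduces the training set size while still covering each cluster, at the cost of discarding within-cluster variation.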

Conclusions
In this paper, we presented a sample augmentation framework that incorporates DNN for object-based image analysis to augment samples with high accuracy. The proposed framework was applied to a Gaofen-2 image scene to augment 184 samples into a large sample set, and it improved the accuracy in comparison with LP and DNN. NNSA achieved an overall accuracy about 20% higher than that of LP in both validation areas. In detail, the NNSA method achieved an overall accuracy of 83.85% in the validation area with original samples and an overall accuracy of 75.80% in the validation area without original samples. To demonstrate the advantages of the non-iterative strategy used in NNSA, we compared the iterative and non-iterative sample augmentation procedures in the validation area when no external information was added, and found that the non-iterative procedure has lower sample imbalance and higher overall accuracy. The non-iterative procedure avoids error accumulation, since the segmented image objects may deviate from the ground entities in global segmentation parameter optimization procedures. The NNSA incorporates similar samples but refines them with robust neural networks and a cluster scheme without the rolling procedure, which reduces the uncertainty and avoids error accumulation.
The proposed NNSA framework can be used to generate a big sample set from a relatively small one for object-based image analysis in land cover classification. The NNSA uses only the original 184 samples and avoids error accumulation through a non-iterative procedure, so it can be applied to sample augmentation over large areas. As there is a gap between insufficient sample sizes and big data for remote sensing, the proposed framework may be helpful for land cover classification with small sample sizes. Future work will focus on the description of image objects.