A Multi-Level Non-Uniform Spatial Sampling Method for Accuracy Assessment of Remote Sensing Image Classification Results

Accuracy assessment of classification results has important significance for the application of remote sensing images, which can be achieved by sampling methods. However, the existing sampling methods either ignore spatial correlation or do not consider spatial heterogeneity. Here, we proposed a multi-level non-uniform spatial sampling method (MNSS) for the accuracy assessment of classification results. Taking the remote sensing image of Kobo Askov, Texas, USA, as an example, the classification result of this image was obtained by Support Vector Machine (SVM) classifier. In the proposed MNSS, the studied spatial region was zoned from high to low resolution based on the features of spatial correlation. Then, the sampling rate of each zone was deduced from the low to high resolution based on the spatial heterogeneity. Finally, the positions of sample points were allocated in each zone, and the classification results of the sample points were obtained. We also used other sampling methods, including a random sampling method (SRS), stratified sampling method (SS), and spatial sampling of the gray level co-occurrence matrix method (GLCM), to obtain the classification results of the sample points (2-m resolution). Five categories of ground objects in the same region were used as the ground truth data. We than calculated the overall accuracy, Kappa coefficient, producer accuracy, and user accuracy to estimate the accuracy of the classification results. The results showed that MNSS was the strictest inspection method as shown by the minimum value of accuracy. Moreover, MNSS overcame the shortcoming of SRS, which did not consider the spatial correlation of sample points, and overcame the shortcomings of SS and GLCM, which had redundant information between sample points. This paper proposes a novel sampling method for the accuracy assessment of classification results of remote sensing images.


Introduction
Remote sensing images have been widely used in many aspects of earth observation. Most notably, the classification of remote sensing image is a key step for the application of remote sensing images [1]. Accuracy assessment is a fundamental component of classification analysis [2]. Accuracy assessment is required to be conducted before the application of classification results of remote sensing for scientific investigation and policy decision [3].
The sampling method is an effective tool for performing accuracy assessment [4,5]. Traditional sampling methods, including simple random sampling method [6], stratified sampling method [7], systematic sampling method [8], and cluster sampling method [9], have been used for the accuracy assessment of classification results of remote sensing images. Koyuncu (1) where d(x, z) is the Euclidean distance from point x to the center of kernel function z. σ is the width parameter, which controls the radial action range. Gamma is a parameter of the RBF function after it is selected as a kernel. The larger the gamma, and the less the support vector, while the smaller the gamma value, the more support vectors. C is the penalty coefficient, namely, the tolerance of error.
The higher the C value, the greater the amount of error that cannot be tolerated, and overfitting is easy. Figure 1 shows the flow chart of the accuracy assessment of the classification results of remote sensing images.
Appl. Sci. 2020, 10, x 2 of 14 calibration estimator to estimate the population mean in stratified random sampling [10]. Using stratified sampling method, Wickham et al. reported agreement statistics between map and reference labels for national land cover database (NLCD) 2011, which includes land cover for ca. 2001, ca. 2006, and ca. 2011 [4]. Di et al. developed a reasonable spatial sampling plan for crop area estimation based on the SPOT-5 image, combined with the selection of stratified standards and the size of sampling units [11]. However, these sampling methods are more suitable for the accuracy assessment of products with independent items. By contrast, remote sensing images have dependent feature information and are characterized by spatial correlation and spatial heterogeneity. Recently, several sampling methods were proposed for the accuracy assessment of spatial images. Meng et al. used the spatial stratified sampling method for the accuracy assessment of regional surface coverage [12]. Wang et al. proposed a "sandwich" sampling model for the accuracy assessment of small-cultivated land areas [13]. Stehman proposed a unified framework for crop area estimation based on a confusion matrix [14]. Mayaux et al. calculated the species richness and uniformity of each sample grid using a diversity index (Shannon-Weaver (SW)) [15]. These spatial sampling methods only consider the feature of spatial correlation and ignore spatial heterogeneity. Spatial heterogeneity is also an important determinant in accuracy assessment. Spatial heterogeneity within pixels biases the nonlinear estimation of classification variables from remote sensing images.
In this study, a multi-level non-uniform spatial sampling method (MNSS) was proposed for the accuracy assessment of classification results of remote sensing images, which was deduced by calculating spatial correlation and heterogeneity.

Methodology
Here, the remote sensing images were classified using the SVM classifier. The kernel function of SVM is the radial basis function (RBF), defined by [16] exp exp where ( ) , d x z is the Euclidean distance from point x to the center of kernel function z . σ is the width parameter, which controls the radial action range. Gamma is a parameter of the RBF function after it is selected as a kernel. The larger the gamma, and the less the support vector, while the smaller the gamma value, the more support vectors. C is the penalty coefficient, namely, the tolerance of error. The higher the C value, the greater the amount of error that cannot be tolerated, and overfitting is easy. Figure 1 shows the flow chart of the accuracy assessment of the classification results of remote sensing images. The proposed MNSS consists of three steps. In step 1, the fragmentation indexes of the classification results are calculated based on spatial autocorrelation, and then the studied region is zoned from high to low resolution based on the fragmentation indexes. In step 2, the sampling rate for each zone is deduced from the low to high resolution based on spatial heterogeneity. In step 3, the position of each sample point is selected, and the accuracy of the classification results is accessed by comparing each sample point with the ground truth data.

Step 1: The Studied Region was Zoned from High to Low Resolution
Taking the classification results of remote sensing images as the input data, the studied region was zoned by calculating the fragmentation index (F), as shown in Figure 2. The proposed MNSS consists of three steps. In step 1, the fragmentation indexes of the classification results are calculated based on spatial autocorrelation, and then the studied region is zoned from high to low resolution based on the fragmentation indexes. In step 2, the sampling rate for each zone is deduced from the low to high resolution based on spatial heterogeneity. In step 3, the position of each sample point is selected, and the accuracy of the classification results is accessed by comparing each sample point with the ground truth data.

Step 1: The Studied Region was Zoned from High to Low Resolution
Taking the classification results of remote sensing images as the input data, the studied region was zoned by calculating the fragmentation index (F), as shown in Figure 2. Step-by-step zoning of remote sensing images based on the fragmentation index.
The fragmentation index (F) indicates the fragmentation degree of the study region, reflecting the spatial complexity of the region [17].
F is calculated as Step-by-step zoning of remote sensing images based on the fragmentation index.
The fragmentation index (F) indicates the fragmentation degree of the study region, reflecting the spatial complexity of the region [17].
F is calculated as Appl. Sci. 2020, 10, 5568 where F i is the fragmentation index of the i-th window. Ai is the number of different types of land cover in the i-th window. N is the number of pixels in the i-th window. Here, F was calculated by a 3 × 3 pixel window. According to F, the regions were zoned from high to low resolution.

2.2.
Step 2: The Sampling Rate was Deduced from Low to High Resolution The sampling rate of each zone was deduced from low to high resolution by Equation (3).
where i (i = 0, 1, 2, ...) indicates the different spatial scale of the classification results. i = 0 represents the original spatial resolution. F it is the fragmentation index of the t-zone at the i-th spatial scale, t = 1, 2, ... m. m is the zone number at the i-th spatial scale. w it is the sampling weight coefficient of the t -zone at the i-th spatial scale, i.e., the sampling rate of the t -zone at the i-th spatial scale. n it is the sample size of the t -zone at the i-th spatial scale.

Step 3: Accuracy Assessment of the Classification Results
The accuracy of classification results was accessed by comparing each sample point with the ground truth data. In this study, the sample unit of accuracy assessment was the pixel of remote sensing images. The sample size was calculated following [18,19] where N is the lot size of the pixels of remote sensing images. p is the proportion of nonconforming items, and it denotes the average pixel classification error rate of the remote sensing classifications. p 0 is the proportion of the nonconforming estimator, which is the maximum limit of the proportion of nonconforming items allowed in the application of remote sensing classification. r is the maximum relative error between the percent non-conforming estimator p 0 and the true value p. µ 1−α/2 is the critical value of the standard normal distribution at the confidence level 1 − α/2. Taking five categories of ground objects in the same region (2-m resolution) as the ground truth data, the accuracy of the classification results was accessed by comparing each sample point with the ground truth data. The sample points were selected in the same regions as the ground truth data (2-m resolution). If the type of land-cover from the two different images was consistent, the variable was assigned a value of 1; otherwise, the variable was assigned a value of 0 [20].

Experimental Data
The study region is located in Kobo Askov, Texas, USA. The experiment dataset is the fusion image of multispectral and panchromatic images based on HYDICE (Hyperspectral Digital Imagery Collection Experiment) obtained on April 16, 2015, with 2-m spatial resolution (Figure 3a). The ground truth data were obtained by a structured sparse regularization non-negative matrix decomposition (SS-NMF) method [21], which includes five categories of ground objects in the same region. Both the experimental data and the ground truth data are available at http://www.escience.cn. All datasets follow the same coordinate system, WGS_1984_UTM_zone_50N.
Appl. Sci. 2020, 10, x 5 of 14 ground truth data were obtained by a structured sparse regularization non-negative matrix decomposition (SS-NMF) method [21], which includes five categories of ground objects in the same region. Both the experimental data and the ground truth data are available at http://www.escience.cn. All datasets follow the same coordinate system, WGS_1984_UTM_zone_50N. Five different types of land cover were classified, i.e., road, grass, forest, roof, and others based on the SVM in Environment for Visualizing Images (ENVI) 5.1 software (ENVI, a complete remote sensing image processing platform, is the flagship product of Exelis Visual Information Solutions, Boulder, CO, USA.).The classification results are shown in Figure 4. Five different types of land cover were classified, i.e., road, grass, forest, roof, and others based on the SVM in Environment for Visualizing Images (ENVI) 5.1 software (ENVI, a complete remote sensing image processing platform, is the flagship product of Exelis Visual Information Solutions, Boulder, CO, USA.).The classification results are shown in Figure 4. decomposition (SS-NMF) method [21], which includes five categories of ground objects in the same region. Both the experimental data and the ground truth data are available at http://www.escience.cn. All datasets follow the same coordinate system, WGS_1984_UTM_zone_50N.  Here, the lot size of the pixels of the remote sensing image was 94249. Given the accuracy requirement, p 0 was 85%, µ 1−α/2 was 0.80, and r was 1%, according to Equation (4), and the sample size was 1800.
Appl. Sci. 2020, 10, 5568 6 of 13 Figure 5 shows the zone results of remote sensing classification at five different spatial scales, from 2-m resolution, 6-m resolution, 18-m resolution, 54-m resolution, to 162-m resolution. According to the zone results, the sampling rate of each zone was deduced from low to high resolution, which was from 162-m resolution, 54-m resolution, 18-m resolution, 6-m resolution, to 2-m resolution. At the fifth level of the spatial resolution, the sampling rate of each zone was determined by the area of each zone to the total studied region area. According to the sampling rate at the fifth level spatial resolution, the sampling rate of each zone was calculated by Equation (2) from low to high resolution. Tables 1-5 show the sampling rate of each zone at different spatial scales. Here, the lot size of the pixels of the remote sensing image was 94249. Given the accuracy requirement, p0 was 85%, 1 2 α μ − was 0.80, and r was 1%, according to Equation (4), and the sample size was 1800. Figure 5 shows the zone results of remote sensing classification at five different spatial scales, from 2-m resolution, 6-m resolution, 18-m resolution, 54-m resolution, to 162-m resolution. According to the zone results, the sampling rate of each zone was deduced from low to high resolution, which was from 162-m resolution, 54-m resolution, 18-m resolution, 6-m resolution, to 2-m resolution. At the fifth level of the spatial resolution, the sampling rate of each zone was determined by the area of each zone to the total studied region area. According to the sampling rate at the fifth level spatial resolution, the sampling rate of each zone was calculated by Equation (2) Figure 5. Zone results of remote sensing classification at five different spatial scales. Table 1. Sampling rate of each zone at the fifth-level spatial resolution (162-m).  Table 3. Sampling rate of each zone at the third-level spatial resolution (18-m).

Zones at the third-level spatial resolution
Sampling Rate (%)  The sample pixel number of the studied remote sensing image (n) was 1800. According to the sampling rate as shown in Table 5, the sampling size of each zone at the original spatial resolution was determined. And form 1-zone to 5-zone, the sampling size was and it was 355, 459, 378, 456, and 152, respectively.

Comparison of Accuracy Assessment Efficiency between Different Sampling Methods
In this section, we used four different sampling methods to a conduct accuracy assessment of the classification results, i.e., the simple random sampling method (SRS), stratified sampling method (SS), spatial sampling of the gray level co-occurrence matrix method (GLCM) [2], and our proposed method (MNSS). In order to verify the efficiency of our proposed MNSS, a full inspection was performed. Four indexes were used to evaluate the efficiency of different sampling methods: overall classification accuracy (OA), Kappa coefficient (Kappa), producer accuracy (PA), and user accuracy (UA).
Overall classification accuracy (OA) [22] is the proportion of the correct classification of all categories relative to the total number of samples, reflecting the overall correctness of the classification results [23,24]. The formula is: The Kappa coefficient is different from the overall classification accuracy which only uses the diagonal elements of the confusion matrix; it utilizes the entire confusion matrix information and is a comprehensive measure of the classification accuracy [25]. The formula is: Producer accuracy (PA) is the ratio of the classifier correctly dividing the cells of the entire image into the number of cells in the class A (diagonal value) and the total number of real references in class A (the sum of the class A columns in the confusion matrix). The formula is: User accuracy (UA) is the ratio of the total number of cells (diagonal value) correctly classified into class A and the classifier dividing the cells of the entire image into the total number of cells of class A (the sum of class A rows in the confusion matrix). The formula is: where m i+ and m +i represent the sum of the rows and columns of the confusion matrix and N is the sum of all items in the confusion matrix. Figure 6 shows the sample points allocated in the studied region by different sampling methods. Table 6 shows the accuracy inspection confusion matrix of the full inspection based on SVM [26]. Figure 7 shows the accuracy assessment by different sampling methods. Figure 8 shows the comparison of producer accuracy (PA) and user accuracy (UA) for different sampling methods.      As shown in Figure 6, compared with random sampling method (SRS), the proposed method overcame the shortcomings that the sample points were randomly distributed in the studied region. Compared with stratified sampling method (SS) and spatial sampling of the gray level co-occurrence matrix method (GLCM), the proposed MNSS reduced the redundant information between different sample points.
From Figures 7 and 8, we can conclude that the accuracy results of our proposed MNSS were lower than those of other methods. By the proposed method, the sample points were located in the region, which had more mixed pixels and was the change boundary of land covers, and also had a higher probability of classification error. The result showed that our proposed MNSS was stricter than other methods and was more suitable for higher accuracy requirements of remote sensing image applications. As shown in Figure 6, compared with random sampling method (SRS), the proposed method overcame the shortcomings that the sample points were randomly distributed in the studied region. Compared with stratified sampling method (SS) and spatial sampling of the gray level co-occurrence matrix method (GLCM), the proposed MNSS reduced the redundant information between different sample points.
From Figures 7 and 8, we can conclude that the accuracy results of our proposed MNSS were lower than those of other methods. By the proposed method, the sample points were located in the region, which had more mixed pixels and was the change boundary of land covers, and also had a higher probability of classification error. The result showed that our proposed MNSS was stricter than other methods and was more suitable for higher accuracy requirements of remote sensing image applications. As shown in Figure 6, compared with random sampling method (SRS), the proposed method overcame the shortcomings that the sample points were randomly distributed in the studied region. Compared with stratified sampling method (SS) and spatial sampling of the gray level co-occurrence matrix method (GLCM), the proposed MNSS reduced the redundant information between different sample points.
From Figures 7 and 8, we can conclude that the accuracy results of our proposed MNSS were lower than those of other methods. By the proposed method, the sample points were located in the region, which had more mixed pixels and was the change boundary of land covers, and also had a higher probability of classification error. The result showed that our proposed MNSS was stricter than other methods and was more suitable for higher accuracy requirements of remote sensing image applications.

Discussion
Accuracy assessment of the classification results of remote sensing images is a key step before application for scientific investigation and policy decision. According to the sampling procedures for accuracy assessment by attributes, such as ISO 2859, ISO 19113, or ISO19114, there are three different classes of the accuracy inspection procedures, i.e., reduced inspection, standard inspection, and tightened inspection [27][28][29]. In this study, we designed an accuracy assessment method for the tightened inspection of accuracy assessment of remote sensing image classification results. The tightened inspection of accuracy assessment presented a lower error tolerance rate for the classification results of remote sensing images and was more suitable for higher accuracy requirements of remote sensing image applications.
Our proposed method considered both spatial autocorrelation and spatial heterogeneity during the accuracy assessment. Compared with the existing sampling methods, the proposed method obtained more sample points in the studied region with higher land use fragmentation. The proposed method overcame the shortcomings of the random sampling method (SRS) to avoid sample points randomly allocated in the studied region. Compared with stratified sampling method (SS) and spatial sampling of the gray level co-occurrence matrix method (GLCM), the proposed method significantly reduced the redundant information between different sample points. Thus, our proposed method is more suitable for the accuracy assessment of classification results of remote sensing images.
However, there are some limitations of our proposed method. Our method only considers how to distribute sample points and ignores how to determine the required sample size. In addition, our proposed method only selects a 3 pixel by 3 pixel window for the studied zone. However, we did consider the potential impact of window size on the accuracy assessment of the classification results of remote sensing images.

Conclusions
In this study, we proposed a multi-level non-uniform spatial sampling method for the classification results of remote sensing. This method zoned the studied region from high to low spatial resolution, deduced the sampling rate of each zone from low to high resolution, and allocated the sample points to the original spatial resolution. The proposed method ensured that the higher the fragment index of the zoned region, the higher the probability of the sample points allocated in the studied region. Thus, our proposed method is a suitable method for the accuracy assessment of the classification results of remote sensing images.