Improvement of Moderate Resolution Land Use and Land Cover Classification by Introducing Adjacent Region Features

Landsat-like moderate resolution remote sensing images are widely used in land use and land cover (LULC) classification. Limited by coarser resolutions, most traditional LULC classifications based on moderate resolution remote sensing images focus on the spectral features of a single pixel. Inspired by the spatial evaluation methods in landscape ecology, this study proposed a new method to extract neighborhood characteristics around a pixel for moderate resolution images. Three landscape-metric-like indexes, i.e., the mean index, standard deviation index, and distance weighted value index, were defined as adjacent region features to include the surrounding environmental characteristics. The effects of the adjacent region features and the different feature set configurations on improving the LULC classification were evaluated by a series of well-controlled LULC classification experiments using K nearest neighbor (KNN) and support vector machine (SVM) classifiers on a Landsat 8 Operational Land Imager (OLI) image. When the adjacent region features were added, the overall accuracies of both classifiers were higher than when only spectral features were used. For the KNN and SVM classifiers that used only spectral features, the overall accuracies of the LULC classification were 85.45% and 88.87%, respectively; with the adjacent region features added, they improved to 94.52% and 96.97%. The classification accuracies of all the LULC types improved, and highly heterogeneous LULC types that are easily misclassified achieved greater improvements. For comparison, the grey-level co-occurrence matrix (GLCM) and convolutional neural network (CNN) approaches were also implemented on the same dataset. The results revealed that the new method outperformed the GLCM and CNN approaches and can significantly improve classification performance based on moderate resolution data.


Introduction
Land use and land cover (LULC) information is one of the most essential inputs for environmental monitoring tasks and numerous interdisciplinary studies, including research on climate change and nature conservation, since LULC information is crucial to understanding the complex underlying patterns and mechanisms among natural processes and human activities [1–5]. Remote sensing is capable of providing large scale and long time series information on the earth's surface. LULC classification based on remote sensing data, which is a basic issue in geographical information system (GIS) fields, is playing an increasingly important role at present [1,4,6,7]. Landsat data are freely available on the United States Geological Survey (USGS) website for downloading and analysis. Landsat-like moderate spatial resolution images are capable of providing global-scale information on the earth's surface and have been the major data source for LULC classification, especially at large scales [4,7–9]. Furthermore, Landsat data have a remarkable temporal range of over 40 years and have great potential for LULC classification, change detection, and related analysis [7,10,11].
Numerous effective methods and advanced classifiers have been applied to improve the performance of LULC classification based on moderate resolution data, and most of these methods have been implemented in the feature generation step of LULC classification. Neighborhood characteristics around a pixel provide spatial information that can potentially discriminate LULC types with similar spectral characteristics. For high or very high resolution LULC classification, neighborhood characteristics in different forms have been proven to be beneficial for improving the classification performance. The most popular method is to use a grey-level co-occurrence matrix (GLCM) to obtain texture features such as the mean, contrast, homogeneity, and angular second moment [12–16]. In recent years, numerous studies on high-resolution LULC classification have used state-of-the-art deep learning techniques, such as the convolutional neural network (CNN), for object-level classification in a limited area [17–20]. The CNN approach uses convolutional windows and local connections to effectively extract spatial information. For LULC classification based on moderate or coarser resolution images, such as Landsat and MODIS, the CNN approach and texture features may still be advantageous in some special cases, such as cropland classification or the extraction of certain LULC types in particular regions [21–23]. However, for moderate resolution LULC classification with high thematic resolution, very few studies have incorporated neighborhood characteristics using methods like GLCM and CNN. The improvement in classification accuracy appears to depend on the resolution level applied. Chen, Stow, and Gong [24] found that textural features were more effective in improving the classification accuracy of land use classes at finer resolution levels, and when the spatial resolution exceeded a certain level, adding texture did not lead to higher classification accuracy. Thus these methods, which have proved very effective in high or very high resolution LULC classification, may face challenges at a much coarser resolution. In this situation, most efforts have concentrated on using multitemporal or multiseasonal data, i.e., more than one remote sensing image is selected to obtain multitemporal features, or phenological characteristics are extracted from a series of fusion data generated by models [10,25–32]. However, multitemporal methods are still based on the spectral information of a single pixel, i.e., the environment surrounding the pixel, which may be crucial for LULC classification, is not evaluated.
Landscape ecology, which has been widely recognized as a highly interdisciplinary science of spatial heterogeneity, is a relatively new and highly active discipline that involves many novel methods and concepts that differ from those of traditional ecology [33–36]. The core concepts of landscape ecology are spatial pattern, heterogeneity, and scale. Unlike traditional ecology, landscape ecology focuses on the global spatial patterns or structural features of a region, instead of those of a single site, to evaluate conditions or ecological functions. Spatial heterogeneity and the relationships between patterns and processes are popular research topics in landscape ecology [35,37,38]. In landscape ecology, landscape metrics are defined as quantitative indexes that describe the spatial structures and patterns of a given region at a specific scale. Many landscape metrics in different forms, such as shape metrics, diversity metrics, and area metrics, have been proposed by ecologists to measure the relevant features of a region, and related research has demonstrated that changes in scale influence the evaluation of landscape metrics [39–46]. The landscape metrics of a certain region are usually calculated from categorical maps and serve related ecological analysis [39,42,44,45]. Landscape-metric-like indexes may have the potential to capture useful neighborhood characteristics for improving the performance of moderate resolution LULC classification.
The objective of this study is to include the neighborhood characteristics around a pixel from a landscape ecology perspective to help improve the performance of LULC classification based on moderate resolution remote sensing images, such as Landsat. We used basic landscape-metric-like indexes, i.e., the mean index, standard deviation index, and distance weighted value index, to evaluate the surrounding environment. These indexes take the same or similar forms as the mean and variation filters, which have been used for decades and proved to be effective for different purposes in digital image processing [47–50]. In this study, we redefined and interpreted them from a novel landscape ecology perspective for neighborhood characteristic extraction. For comparison, the grey-level co-occurrence matrix (GLCM) and convolutional neural network (CNN) approaches were also implemented on the same dataset.

Study Area
An entire Landsat 8 OLI image (path/row: 123/32, date: 10 July 2017) was selected from the USGS website, and the analysts had prior knowledge of the LULC in the coverage area of the image. The image was used as the data source for the controlled experiments over different feature set configurations. The selected Landsat OLI image was the latest Landsat image with a zero cloud cover rate, i.e., the image is essentially unaffected by clouds. One single image consists of more than 6 × 10^6 pixels, which is already an extensive amount of data for machine learning methods in LULC classification. Besides simplifying the workflow, using a single image, in which all pixels were acquired at the same time under the same atmospheric conditions, removes the need for further processing, such as atmospheric correction, that would otherwise be required to make digital number (DN) values comparable among different remote sensing images [51]. The DN values can therefore be used directly as the feature generation source, since they are a specific transformation of the real surface reflectance for all of the pixels in the image, and this avoids any possible disturbance to the controlled experiments. As shown in Figure 1, the image covers the majority of Beijing city, part of Hebei Province, and part of Tianjin city. The typical LULC types in a temperate climate region, including forest, shrub, grassland, waterbody, cropland, bare land, and urban areas, make the coverage ideal for the LULC classification experiments.


Dataset
The classification categories were built based on previous knowledge of the distribution of LULC types in the study area, and include forest, grass, shrub, water, cropland, bare land, and impervious. Sample polygons of each type were built according to the classification categories and their visual characteristics in the remote sensing images, with the help of the researchers' experiential knowledge, high-quality Google Earth images, and the ArcGIS environment. Google Earth, which is a contemporary high-resolution archive, represents a significant, rapidly expanding, cost-free, and largely unexploited resource for scientific inquiry. High-resolution Google Earth imagery has been widely used for assessing moderate resolution remote sensing products [52,53]. All the sample polygons of each type are shown in Figure 1.
Pixels from the same sample polygon tend to have similar large-scale features in adjacent regions, not only because they are of the same classification category, but also because they are at the same site with the same large-scale surroundings. The classification accuracy may therefore be artificially high if a sample polygon provides both training data and validation data. Thus, for each LULC type, we divided the sample polygons into two groups, half for producing training data and half for validation data. This process imposes a stricter validation standard on the generalization capacities of the classifiers.
The cover percentages of the land cover types vary greatly. Some types, such as forest, water, cropland, and impervious surfaces, are widely distributed in relatively large patches, while other types, such as grass and bare land, are rare and occur in smaller patches. Thus, the total numbers of pixels of the different LULC types vary considerably, as shown in Figure 1. To balance the amounts of training and validation data among the LULC types, 3000 pixels of each LULC type were randomly selected as training data for the supervised classifiers, and another 3000 pixels were selected as validation data. There are seven LULC types, so a total of 42,000 pixels were drawn from the sample polygons: 21,000 were used as training data, and the other 21,000 were used for the accuracy assessment. The training data and validation data for each LULC type come from different polygons, as mentioned above. We used the same training data and validation data for all of the classifiers and methods to keep the comparison strictly controlled.
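The polygon-level split and balanced sampling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_and_sample`, the dictionary layout, and the random half/half assignment of polygons are all assumptions made for the example.

```python
import random


def split_and_sample(polygons, per_class=3000, seed=0):
    """Split sample polygons into two groups per class, then draw a balanced sample.

    polygons: dict mapping class name -> list of polygons, where each polygon
              is a list of pixel identifiers (hypothetical layout).
    Returns (train, valid): dicts mapping class name -> list of sampled pixels.
    """
    rng = random.Random(seed)
    train, valid = {}, {}
    for cls, polys in polygons.items():
        polys = list(polys)          # copy so the caller's list is not reordered
        rng.shuffle(polys)
        half = len(polys) // 2
        # All pixels of one polygon end up in exactly one of the two pools,
        # so training and validation pixels never share a polygon.
        train_pool = [px for poly in polys[:half] for px in poly]
        valid_pool = [px for poly in polys[half:] for px in poly]
        train[cls] = rng.sample(train_pool, min(per_class, len(train_pool)))
        valid[cls] = rng.sample(valid_pool, min(per_class, len(valid_pool)))
    return train, valid
```

Splitting at the polygon level rather than the pixel level is what prevents neighbouring pixels of one polygon, which share nearly identical adjacent regions, from appearing on both sides of the evaluation.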
All the above processes, including sampling from different groups of polygons for training and validation data, balancing the dataset, and using the same data for different classifiers, are conducive to evaluating the performance of the classifiers with different feature set configurations. Thus, the contributions of the newly included feature sets can be tested more reasonably.

Adjacent Region Feature Extraction
From a landscape ecology perspective, the attributes of a pixel are strongly related to the surrounding environment. Thus, information on the spatial patterns and heterogeneity of adjacent regions at different scales may be beneficial to the LULC classification of the central pixel. Inspired by the use of landscape metrics/indexes to evaluate the attributes of spatial patterns or spatial heterogeneity in landscape ecology, we defined three basic adjacent region indexes (mean index, standard deviation index, and distance weighted value index) to evaluate the environment surrounding a focus pixel from a landscape perspective. Moving windows were used to extract the features of adjacent regions, and the scale must be taken into consideration when evaluating a region. Related research has found that scale may affect the evaluation of a region, i.e., moving windows of different sizes may capture different attributes or potential spatial patterns, which may be useful for distinguishing different LULC types [43,45,46]. As an example, Figure 2 shows four feature extraction windows enclosing adjacent regions of increasing scale centred on the red focus pixel. Several definitions are given below for clarity.


Definition 2. (Scale): The scale S of the adjacent region is the side length of the square feature extraction window, measured as the number of pixels on one side. S has to be an odd number to ensure that the focus pixel is the geometric centre of the extraction window. The value of S was therefore taken from the sequence of odd numbers (1, 3, 5, ..., 2n − 1). Thus, when S = 1, the adjacent region is limited to the central pixel itself.
For instance, as shown in Figure 2, we demonstrate four adjacent regions of the central pixel at the scales of 1, 5, 9, and 13.

Definition 3. (Mean index): The mean index (MI) of an adjacent region z measures the mean reflectance level in the square feature extraction window with the scale s, where p_i is a pixel in the adjacent region z and v_{p_i} is the DN value of pixel p_i.
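Written as a formula, the definition above corresponds to an arithmetic mean over the window. The symbol n, the number of pixels in the adjacent region z (n = s² for a full window), is an assumption introduced here for illustration:

```latex
\mathrm{MI}_s(z) = \frac{1}{n} \sum_{i=1}^{n} v_{p_i}, \qquad p_i \in z
```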

Definition 4. (Standard deviation index): The standard deviation index measures the variability of the DN values of all pixels in a square feature extraction window. It is defined to evaluate the degree of variation among the reflectance levels of the adjacent region of a focus pixel with a scale of s.
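One form consistent with this definition is the population standard deviation of the DN values in the window. Whether the population (1/n) or the sample (1/(n − 1)) normalisation is intended is not stated in the text, so the choice below is an assumption, as is the symbol n for the number of pixels in the adjacent region z:

```latex
\mathrm{SDI}_s(z) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( v_{p_i} - \bar{v}_z \right)^2},
\qquad \bar{v}_z = \frac{1}{n} \sum_{i=1}^{n} v_{p_i}
```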
Definition 5. (Distance weighted value index): The distance weighted value index (DWVI) of an extracted adjacent region is defined based on the principle that the reference value of a pixel in an adjacent region to the central pixel declines as the distance between the two pixels increases. DWVI is defined as follows:
Similar to landscape metrics, the three basic adjacent region indexes are defined to capture the component characteristics and spatial information of a region at a certain scale. Landscape metrics are calculated from categorical maps of a certain region, while the three basic adjacent region features can be extracted from the original remote sensing images of the adjacent region.
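The three indexes can be sketched for a single band and a single focus pixel as below. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, pixels outside the image are simply skipped at the edges, the standard deviation uses the population form, and the DWVI weight 1 / (1 + d) with Euclidean distance d in pixels is only one plausible choice that satisfies the stated principle of declining influence with distance.

```python
import math


def window(band, row, col, s):
    """Return (value, distance) pairs for the s x s window centred on (row, col).

    band is a 2D list of DN values. Pixels that fall outside the image are
    skipped (an edge-handling assumption).
    """
    if s < 1 or s % 2 == 0:
        raise ValueError("scale s must be a positive odd number")
    r = s // 2
    pairs = []
    for i in range(row - r, row + r + 1):
        for j in range(col - r, col + r + 1):
            if 0 <= i < len(band) and 0 <= j < len(band[0]):
                pairs.append((band[i][j], math.hypot(i - row, j - col)))
    return pairs


def mean_index(band, row, col, s):
    """Mean DN value of the adjacent region."""
    values = [v for v, _ in window(band, row, col, s)]
    return sum(values) / len(values)


def std_index(band, row, col, s):
    """Population standard deviation of DN values (normalisation is assumed)."""
    values = [v for v, _ in window(band, row, col, s)]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def dwvi(band, row, col, s):
    """Distance weighted value; the weight 1 / (1 + d) is an ASSUMPTION."""
    pairs = window(band, row, col, s)
    weights = [1.0 / (1.0 + d) for _, d in pairs]
    return sum(w * v for w, (v, _) in zip(weights, pairs)) / sum(weights)
```

With s = 1 the window contains only the focus pixel, so the mean index and the DWVI both reduce to the pixel's own DN value and the standard deviation index is zero, which matches the statement that the adjacent region is then limited to the central pixel itself.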

Feature Set Configuration and LULC Classification
The selection of the scale, or moving window size, is an issue worth considering [24,43,46,54]. Higher scales may lead to higher accuracies as well as higher computation costs, so the selection of scale depends on the tradeoffs in different situations. In this study, the adjacent region features were extracted from each selected Landsat OLI band for every pixel in the image. Window sizes from 3 pixels × 3 pixels to 23 pixels × 23 pixels were used. With a resolution of 30 m, the side length of the largest feature extraction window is 690 m. The largest window covers a 47.61 ha region around the central pixel and is large enough to capture the features of a region adjacent to a site. Moreover, the results presented in the next section show that the performance of the classifiers changes only slightly once such a high scale is reached.
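The ground extent of a window follows directly from the scale and the 30 m pixel size. The short sketch below reproduces the 690 m and 47.61 ha figures quoted above; the helper name `window_extent` is introduced here only for illustration.

```python
PIXEL_SIZE_M = 30  # resolution of the Landsat OLI bands used, as stated in the text


def window_extent(s, pixel_size_m=PIXEL_SIZE_M):
    """Return (side length in metres, area in hectares) of an s x s pixel window."""
    side_m = s * pixel_size_m
    area_ha = side_m ** 2 / 10_000  # 1 ha = 10,000 square metres
    return side_m, area_ha


for s in (3, 13, 23):
    side, area = window_extent(s)
    print(f"S = {s:2d}: side {side} m, area {area:.2f} ha")
```

For the largest scale, 23 × 30 m = 690 m and 690² m² = 476,100 m² = 47.61 ha.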
Among the Landsat OLI data bands, some are designed for special uses. The coastal aerosol band is mostly used for coastal water monitoring, and the cirrus band is mainly used for cloud detection. These bands are less useful for LULC classification tasks and can be removed, since they contain limited land surface information. The thermal infrared bands have a much lower resolution of 100 m. Consequently, bands 2, 3, 4, 5, 6, and 7 of the Landsat OLI data were selected as the data sources. All processes were repeated on each selected band for every pixel.
Six raw spectral features from the Landsat OLI bands were first selected as the feature set for the LULC classification, and the classification performance was then compared with that of classifiers that also used features from adjacent regions. As shown in Table 1, to evaluate the effect of different types of adjacent region features at different scales, four experimental sequences with different feature set configurations were designed. For the first sequence, each type of adjacent region feature (MI, SI, or DWVI) at a certain scale s (1, 3, 5, ..., 23) was successively added to the feature set. For the second experimental sequence, all of the adjacent region features of a certain type with scales no larger than a given maximum were added to the feature set as a whole. For the third experimental sequence, the three adjacent region features (MI, SI, and DWVI) at a certain scale s were successively added to the feature set. Finally, for the fourth experimental sequence, the three adjacent region features (MI, SI, and DWVI) with scales no larger than a given maximum were added to the feature set as a whole. It is worth noting that all of the feature set configurations contain the six basic spectral features, since when S = 1 the feature set consists of the six spectral features only. Therefore, in any configuration, the classification performance with S = 1 serves as the baseline for comparison.
A KNN classifier based on the K-nearest neighbour algorithm and a support vector machine (SVM) classifier were selected for LULC classification. The KNN classifier is one of the most fundamental but effective non-parametric classification methods [55]. SVM aims to find an optimal hyperplane between classes and can handle data whether or not they are linearly separable. An SVM classifier usually performs well and is one of the most widely used advanced discrimination models, having been applied in numerous classification tasks, including LULC classification [56,57].
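The four experimental sequences can be made concrete by enumerating the feature names each configuration would contain. The sketch below is an interpretation of the description above, not a reproduction of Table 1: the feature naming scheme (for example `MI_S5_B4`), the function names, and the treatment of S = 1 as "spectral features only" are assumptions made for illustration.

```python
BANDS = (2, 3, 4, 5, 6, 7)       # Landsat OLI bands selected in the text
TYPES = ("MI", "SI", "DWVI")     # adjacent region feature types
MAX_SCALE = 23


def spectral_features():
    return [f"B{b}" for b in BANDS]


def adjacent_features(types, scales):
    # S = 1 is the focus pixel itself, so it adds nothing beyond the spectral features.
    return [f"{t}_S{s}_B{b}" for t in types for s in scales if s > 1 for b in BANDS]


def configuration(sequence, s, ftype=None):
    """Feature names for one configuration.

    sequence 1: one feature type at the single scale s
    sequence 2: one feature type at all odd scales up to s
    sequence 3: all three feature types at the single scale s
    sequence 4: all three feature types at all odd scales up to s
    """
    if s % 2 == 0 or not 1 <= s <= MAX_SCALE:
        raise ValueError("s must be an odd number between 1 and 23")
    if sequence in (1, 2):
        if ftype not in TYPES:
            raise ValueError("sequences 1 and 2 need ftype in TYPES")
        types = (ftype,)
    elif sequence in (3, 4):
        types = TYPES
    else:
        raise ValueError("sequence must be 1, 2, 3, or 4")
    scales = range(1, s + 1, 2) if sequence in (2, 4) else (s,)
    return spectral_features() + adjacent_features(types, scales)


print(len(configuration(1, 5, "MI")))   # 6 spectral + 6 MI features
print(len(configuration(4, 23)))        # 6 spectral + 3 types x 11 scales x 6 bands
```

Under these assumptions the largest configuration (sequence 4, maximum scale 23) contains 204 features, which illustrates why the computation cost grows with the scale.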

Comparable Methods
To compare the new feature extraction method with other methods, GLCM and CNN approaches were also implemented using the same data.
Specifically, for the GLCM approach, the mean, contrast, homogeneity, and angular second moment were extracted from the GLCM. These texture features have proved to be effective in classification. Additionally, the mean, homogeneity, and angular second moment are the least correlated with each other [58]. GLCM texture features with different configurations at different scales were evaluated in a way similar to that in Table 1 to find the best performing configuration.
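For reference, the four texture measures named above can be computed from a co-occurrence matrix as sketched below. This is a minimal illustration using the commonly used textbook definitions; the single pixel offset, the symmetric counting, and the requirement that the patch is already quantised to integer grey levels in [0, levels) are assumptions, and the configuration actually used in the experiments is not specified in the text.

```python
def glcm(patch, dx=1, dy=0, levels=4):
    """Symmetric, normalised grey-level co-occurrence matrix for one offset.

    patch is a 2D list of integer grey levels in [0, levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(patch), len(patch[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = patch[r][c], patch[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1   # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("patch is too small for the chosen offset")
    return [[n / total for n in row] for row in counts]


def texture_features(p):
    """Mean, contrast, homogeneity, and angular second moment of a GLCM p."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    return {
        "mean": sum(i * v for i, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),
    }
```

A uniform patch gives zero contrast and an angular second moment of 1, while a checkerboard patch gives high contrast and low homogeneity, which is the kind of local variation these measures are meant to capture.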
For the CNN approach, we referenced the GoogLeNet Inception architecture, which set a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14) [59]. The "Inception modules" combine multiscale convolution kernels with a parallel pooling path in each stage, which has an additional beneficial effect. It is worth noting that there are two different types of moving window. One type is the same as the GLCM texture window and the adjacent region defined above in this study. The other is the convolution window used in the CNN architecture. We adopted a strategy similar to that of GoogLeNet Inception version 1. Two "Inception modules" were stacked in the final CNN architecture, which we designed for pixel-based LULC classification, as shown in Figure 3. As the largest convolution window is 5 × 5, the smallest adjacent region scale used in the CNN approach is also 5 × 5. Thus, to keep the comparison consistent, the scale range from 5 to 23 was evaluated with the CNN approach.
Remote Sens. 2018, 10, x FOR PEER REVIEW

Accuracy Assessment
The accuracies of the LULC classification were evaluated using a confusion matrix to calculate the overall accuracy, kappa coefficient, producer's accuracy, and user's accuracy [60]. All of the classifiers with different feature set configurations used the same training and testing datasets. Hence, we implemented a series of standard controlled experiments to evaluate the effect of adjacent region features. Furthermore, to assess statistical differences between the accuracy measurements of classification results obtained with different approaches, a Z-test was performed to determine whether they were significantly different [61].
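The four accuracy measures can be derived from a confusion matrix as in the sketch below. It uses the standard definitions; the orientation of the matrix (rows as predicted classes, columns as reference classes) and the function name are assumptions chosen for this example, so the row and column roles must be swapped if a matrix is stored the other way round.

```python
def accuracy_metrics(cm):
    """Accuracy measures from a square confusion matrix.

    cm[i][j] is the number of validation pixels of reference class j that
    were predicted as class i (assumed orientation).
    """
    k = len(cm)
    n = sum(map(sum, cm))
    diag = [cm[i][i] for i in range(k)]
    row_tot = [sum(cm[i]) for i in range(k)]                         # predicted totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]    # reference totals
    po = sum(diag) / n                                               # observed agreement
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2       # chance agreement
    return {
        "overall_accuracy": po,
        "kappa": (po - pe) / (1 - pe),
        # user's accuracy: correct pixels / pixels mapped to the class
        "users_accuracy": [d / r if r else float("nan") for d, r in zip(diag, row_tot)],
        # producer's accuracy: correct pixels / reference pixels of the class
        "producers_accuracy": [d / c if c else float("nan") for d, c in zip(diag, col_tot)],
    }
```

For example, `accuracy_metrics([[45, 5], [5, 45]])` gives an overall accuracy of 0.9 and a kappa coefficient of 0.8.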

Results
Figure 4 shows how the overall accuracy changed with different feature set configurations. The results show that, in all cases, both classifiers (KNN and SVM) achieved considerable performance improvements when adjacent region features were added. In addition, as adjacent region features of higher scales were added, the improvement continued until it levelled off once the accuracy of the LULC classification was already very high. For the KNN classifier, the overall accuracy of the LULC classification with only spectral features was 85.45%, and the highest overall accuracy was 94.52%, which was achieved when all of the adjacent region features were introduced cumulatively up to a largest scale of 19. At the same time, the kappa coefficient improved from 0.83 to 0.94. For the SVM classifier, the overall accuracy improved from 88.87% to a highest value of 96.97% when the configuration was the combination of all three adjacent region features at a scale of 17. The kappa coefficient improved from 0.87 to 0.96. Figure 4 also indicates that, in general, when the three adjacent region metrics were evaluated, the DWVI and MI outperformed the SDI.
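The reported kappa coefficients can be cross-checked against the overall accuracies. Because the validation set is balanced (3000 pixels for each of the seven classes), the expected chance agreement equals 1/7 whatever the distribution of predictions, so kappa follows directly from the overall accuracy. The sketch below assumes every validation pixel receives exactly one predicted label; the function name is introduced here for illustration.

```python
def kappa_balanced(overall_accuracy, n_classes=7):
    """Kappa for a validation set with equal reference counts per class."""
    pe = 1.0 / n_classes  # chance agreement when reference classes are balanced
    return (overall_accuracy - pe) / (1.0 - pe)


reported = [
    ("KNN, spectral only", 0.8545),
    ("KNN, best configuration", 0.9452),
    ("SVM, spectral only", 0.8887),
    ("SVM, best configuration", 0.9697),
]
for name, oa in reported:
    print(f"{name}: kappa = {kappa_balanced(oa):.2f}")
```

The four values round to 0.83, 0.94, 0.87, and 0.96, in agreement with the kappa coefficients stated above.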
To show more detail, the confusion matrices of the classification results using only spectral features and using the best performing feature set configurations with both the KNN and SVM classifiers are shown in Tables 2 and 3. The large improvements indicate the effectiveness of adjacent region features, especially for the LULC types that had lower accuracy when only spectral features were used, such as crop land, forest, and shrub. For instance, with the KNN and SVM classifiers, the user's accuracy of crop land improved from 76.7% to 92.5% and from 72.4% to 84.6%, respectively; the producer's accuracy improved from 79.7% to 91.4% and from 77.6% to 92.6%, respectively. The results indicate that these types are quite challenging even for an advanced classifier using only spectral features. LULC types such as crop land are highly heterogeneous, since numerous crop species show quite different optical characteristics. As a whole, however, an LULC type has global spatial pattern features that are not captured when only one pixel is evaluated. As the results show, these features may be crucial for improving the performance of LULC classification based on Landsat-like moderate resolution data.
Regarding the performance of the different classifiers applied in this study, Figure 5 compares the overall accuracies of the LULC classifications using the KNN and SVM classifiers. With different feature set configurations, the advanced SVM classifier outperformed the KNN classifier in all cases. Generally, the overall improvements of the different feature set configurations were similar as the scale increased.
The LULC classification results of both the KNN and SVM classifiers using only spectral features and using the best performing feature set configurations containing adjacent region features are shown in Figure 6. All of the results reflect the distribution of different LULC types in the Landsat image coverage area based on experts' knowledge. In the northwest mountainous areas, forests, shrubs, and grasses account for a large proportion. Cropland and impervious surfaces are primarily distributed in the southeast rural and urban areas.
To demonstrate the effect of adjacent region features more visibly, the classification results of a representative region are shown in Figure 7 as a typical example. The maps show that when adjacent region features were included, the classifier tended to yield more aggregated class objects, and fragmented patches were reduced. The reason may be that the classifier captured neighborhood characteristics and became more tolerant of the variance within classes. Thus, the classifier was less likely to misclassify and split a single-class region, especially for LULC classes with a high degree of heterogeneity, such as shrub and crop land. The overall accuracy of the LULC classification improved substantially and, more importantly, the spatial distribution and pattern of the classification are more similar to the sample polygons that are based on expert knowledge, which is the more reasonable result desired in the LULC classification task.
To demonstrate the effect of adjacent region features in a more visible way, the classification results of a representative region are shown in Figure 7 as a typical example.The maps show that when adjacent region features are included, the classifier tended to yield more aggregated class objects, and fragmentized patches were reduced.The reason may be that the classifier captured neighborhood characteristics and became more tolerant to the variance within classes.Thus, the classifier was less likely to misclassify and split a single-class region, especially for those LULC classes with high degree of heterogeneity, such as shrub and crop land.The overall accuracy of LULC classification obtained substantial improvements, and more importantly, the spatial distribution and the pattern of the classification results are more similar to the sample polygons that are based on expert knowledge, which is a more reasonable result that is desired by the LULC classification task.Figure 8 shows the overall accuracy curves with the best performed configurations using different feature extraction methods (GLCM, CNN, and adjacent region features).For the GLCM approach, the highest overall accuracy was 92.03%, which was achieved on SVM classifier when angular second moment features were introduced at a scale of 17.For the CNN approach, the highest overall accuracy was 94.58%, which was achieved at a scale of 19.As previously stated, for the newly proposed method using adjacent region features, the highest overall accuracy was 96.97%, which was achieved on SVM classifier when the configuration was the combination of all three adjacent region features at a scale of 17.The result indicated that when it comes to moderate resolution LULC classification, the newly proposed method outperformed GLCM and CNN approaches in all cases that we have evaluated.Figure 8 shows the overall accuracy curves with the best performed configurations using different feature extraction methods (GLCM, CNN, 
and adjacent region features).For the GLCM approach, the highest overall accuracy was 92.03%, which was achieved on SVM classifier when angular second moment features were introduced at a scale of 17.For the CNN approach, the highest overall accuracy was 94.58%, which was achieved at a scale of 19.As previously stated, for the newly proposed method using adjacent region features, the highest overall accuracy was 96.97%, which was achieved on SVM classifier when the configuration was the combination of all three adjacent region features at a scale of 17.The result indicated that when it comes to moderate resolution LULC classification, the newly proposed method outperformed GLCM and CNN approaches in all cases that we have evaluated.To determine whether the classification accuracies of different approaches were significantly different, the Z-test was used to compare the confusion matrixes.Z > 1.96 or Z < 1.96 would indicate the difference of the two confusion matrixes being significant at the 5% significance level [61].As shown in Table 4, the Z-test value for comparison between the confusion matrices of classification result using the best performed configurations using different feature extraction methods (GLCM, CNN and adjacent region features) are all larger than 1.96.The results indicate that GLCM, CNN, and the proposed method could significantly improve the land cover classification accuracy of using spectral features only.These findings also indicate that the classification accuracy improvement achieved by adding adjacent region features is greater than that of GLCM and CNN, and the performances were significantly different.
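The accuracy measures and the significance test reported above can be reproduced from any confusion matrix. The following is a minimal sketch, assuming that the Z-test is the common kappa-based pairwise comparison with the delta-method variance of kappa; the exact statistic of [61] is not restated in this section, so this choice is an assumption. The function names and the two small matrices are hypothetical illustrations and are not the data of Tables 2 to 4.

```python
import math


def accuracy_metrics(cm):
    """Overall accuracy, kappa, and per-class user's/producer's accuracy.

    Assumes cm[i][j] counts samples classified as class i (rows) whose
    reference label is class j (columns), and that every class occurs at
    least once in both the classified and the reference data.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    overall = sum(cm[i][i] for i in range(k)) / n
    chance = sum(row_tot[i] * col_tot[i] for i in range(k)) / n**2
    kappa = (overall - chance) / (1 - chance)
    users = [cm[i][i] / row_tot[i] for i in range(k)]
    producers = [cm[i][i] / col_tot[i] for i in range(k)]
    return overall, kappa, users, producers


def kappa_variance(cm):
    """Large-sample (delta-method) variance of the kappa estimate."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    t1 = sum(cm[i][i] for i in range(k)) / n
    t2 = sum(row_tot[i] * col_tot[i] for i in range(k)) / n**2
    t3 = sum(cm[i][i] * (row_tot[i] + col_tot[i]) for i in range(k)) / n**2
    t4 = sum(cm[i][j] * (row_tot[j] + col_tot[i]) ** 2
             for i in range(k) for j in range(k)) / n**3
    return (t1 * (1 - t1) / (1 - t2) ** 2
            + 2 * (1 - t1) * (2 * t1 * t2 - t3) / (1 - t2) ** 3
            + (1 - t1) ** 2 * (t4 - 4 * t2**2) / (1 - t2) ** 4) / n


def kappa_z(cm_a, cm_b):
    """Z statistic for the difference between two independent kappas."""
    kappa_a = accuracy_metrics(cm_a)[1]
    kappa_b = accuracy_metrics(cm_b)[1]
    return (kappa_a - kappa_b) / math.sqrt(
        kappa_variance(cm_a) + kappa_variance(cm_b))


# Hypothetical two-class matrices, for illustration only.
spectral_only = [[45, 5], [10, 40]]
with_adjacent = [[48, 2], [3, 47]]

z = kappa_z(with_adjacent, spectral_only)
print("significant at the 5% level:", abs(z) > 1.96)
```

Under this formulation, the sign of Z only reflects which matrix is passed first, so the decision rule uses the absolute value.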

Discussion
Neighbourhood characteristics around a pixel are important for LULC classification using remote sensing data [12,14–18,20,22–24]. However, neighbourhood characteristics extraction methods, which have proved very effective in high resolution classification, have not been commonly used in moderate resolution LULC classification. This study proposed a novel method, inspired by landscape metrics, to evaluate the adjacent region around a pixel.
In the extraction of neighbourhood characteristics, the moving window size, which is defined as the scale in this study, is an important factor influencing classification accuracy. How to select the optimal window size or scale is a notable issue. A number of studies have discussed this problem in both image classification and landscape ecology [24,43,46,54]. A bigger window may include more useful features for classification, but it may also include more redundant information that is not beneficial or even has a negative effect. As shown in Figure 8, the performances of all the approaches used in this study were affected by changing the window size. For GLCM, CNN, and the newly proposed method, the window sizes or scales at which the highest accuracies were achieved were 17, 9 and 17, respectively.
The results obtained using the different classifiers applied in this study reveal that the advanced non-parametric SVM classifier achieved a more satisfactory classification result than the basic KNN classifier. Regarding the different approaches applied for neighbourhood characteristic extraction, the adjacent region features performed better than the GLCM and CNN approaches. How can these basic statistical indices outperform more complicated texture feature methods and powerful state-of-the-art CNN techniques? The authors' view is that at moderate or coarser resolution, the spectral features of a much bigger pixel may be a mixture of different reflection levels. The object-level texture features, which can be effectively captured by GLCM and CNN, tend to be smoothed, and the higher scale texture features often reflect topographical change, which has little reference value for the classification of a single pixel. However, neighbourhood characteristics in other forms may still be beneficial. In landscape ecology, patch level and regional level statistical features are crucial for describing a landscape type.
In general, the DWVI and MI outperformed the SDI when the three adjacent region indexes were evaluated. The MI and DWVI act as component indexes that measure the composition of an adjacent region or the surrounding environment, whereas the SDI acts more like a texture feature that evaluates the variation pattern. The results also indicate that at a much coarser resolution, texture-like features may not lead to satisfactory improvements in LULC classification accuracy.
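To make the contrast between the component-like indexes (MI, DWVI) and the texture-like index (SDI) concrete, the following is a minimal sketch of the three indexes for one pixel of one band. It rests on several assumptions that are not stated in this section: the MI and SDI are taken as the mean and population standard deviation of all cells in the window, the DWVI is taken as an inverse-distance weighted mean that excludes the centre pixel, and cells outside the image are skipped. The function name `adjacent_region_features` and the toy band are hypothetical.

```python
import math


def adjacent_region_features(band, row, col, scale):
    """Return (MI, SDI, DWVI) for the pixel at (row, col) of one band.

    band is a 2-D list of pixel values and scale is the odd side length
    of the square window centred on the pixel. Window cells outside the
    image are skipped (an assumed edge treatment), and the image is
    assumed to contain more than one pixel.
    """
    if scale < 3 or scale % 2 == 0:
        raise ValueError("scale must be an odd integer >= 3")
    half = scale // 2
    values, weights = [], []
    for r in range(max(0, row - half), min(len(band), row + half + 1)):
        for c in range(max(0, col - half), min(len(band[0]), col + half + 1)):
            values.append(band[r][c])
            dist = math.hypot(r - row, c - col)
            # Assumed weighting: inverse distance, centre pixel excluded.
            weights.append(1.0 / dist if dist > 0 else 0.0)

    mean_index = sum(values) / len(values)
    variance = sum((v - mean_index) ** 2 for v in values) / len(values)
    std_index = math.sqrt(variance)
    dwvi = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return mean_index, std_index, dwvi


# Hypothetical 3 x 3 band, evaluated at its centre with a scale of 3.
toy_band = [[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]]
mi, sdi, dwvi = adjacent_region_features(toy_band, 1, 1, 3)
print(mi, sdi, dwvi)
```

In this sketch the MI and DWVI both summarise the level of the surrounding values, while the SDI responds only to their spread, which matches the distinction drawn above. Repeating the call for each band and each scale would give one feature vector per pixel.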
This study was designed to investigate the effects of adjacent region features, which are extracted from multiscale regions around a pixel from a landscape ecology perspective, on improving LULC classification using moderate resolution remote sensing data such as Landsat. Three newly defined basic adjacent region features were evaluated and compared with the GLCM and CNN approaches. However, as there are numerous landscape metrics, more forms of adjacent region features will be evaluated with other advanced classifiers in future research.

Conclusions
At a much coarser resolution, LULC classification based on moderate resolution remote sensing data such as Landsat images is limited in its use of neighborhood characteristics for accuracy improvement when compared with methods that were designed for high or very high resolution remote sensing images, such as the GLCM and deep CNN approaches. Inspired by the concepts and methods of landscape ecology, three landscape-metric-like indexes were defined to include adjacent region features in moderate resolution LULC classification. This study implemented a series of well-controlled LULC classification experiments with different configurations to investigate the efficacy of adjacent region features in improving the LULC classification performance of Landsat-like moderate resolution remote sensing data. The study indicated that adjacent region features, which contain environmental information around pixels, greatly improved the LULC classification accuracy with different classifiers. For the KNN and SVM classifiers, the best overall classification accuracies achieved were 94.52% and 96.97%, improvements of over 9 and 8 percentage points, respectively. Moreover, when compared with the GLCM and CNN approaches, the proposed method achieved greater improvements. The results reveal that adjacent region features are beneficial for moderate resolution LULC classification and can significantly improve the classification performance. The proposed method has great potential for improving LULC classification based on Landsat-like moderate resolution remote sensing data, especially for analyses at large temporal and spatial scales.

Figure 2. An example of moving windows for the extraction of adjacent region features at different scales around the focus pixel. (Note that this is just a schematic, and not all the feature extraction windows are shown here.)

Definition 1. (Adjacent region): An adjacent region Z of a pixel is the area within a moving square feature extraction window of a given size. Note that the given pixel is the central pixel of the moving window.

Figure 3. Convolutional neural network (CNN) architecture used in this study.

Figure 4. Overall accuracies of the land use and land cover (LULC) classifications with different feature set configurations.

Figure 5. Overall accuracies of the LULC classification with different feature set configurations in the SVM and KNN classifiers.

Figure 7. LULC classification of a representative region: (a) Landsat image in true color; (b) sample polygons based on expert knowledge; (c) SVM with spectral features only; (d) SVM + spectral features + all adjacent region features (S = 17).

Figure 8. Performances of grey-level co-occurrence matrix (GLCM), CNN, and adjacent region features with the feature set configurations that achieved the highest overall accuracies.

Table 1. Different feature set configurations used in this study.

Table 3. Confusion matrices of LULC classification using the support vector machine (SVM) classifier with only spectral information and with the configuration that achieved the highest overall accuracy (96.97%).

Table 4. Z-test values for comparison between confusion matrices of LULC classification using different approaches with the best performing configurations.