Mapping Land Use from High Resolution Satellite Images by Exploiting the Spatial Arrangement of Land Cover Objects

: Spatial information regarding the arrangement of land cover objects plays an important role in distinguishing the land use types at land parcel or local neighborhood levels. This study investigates the use of graph convolutional networks (GCNs) in order to characterize spatial arrangement features for land use classification from high resolution remote sensing images, with particular interest in comparing land use classifications between different graph-based methods and between different remote sensing images. We examine three kinds of graph-based methods, i.e., feature engineering, graph kernels, and GCNs. Based upon the extracted arrangement features and features regarding the spatial composition of land cover objects, we formulated ten land use classifications. We tested those on two different remote sensing images, which were acquired from GaoFen-2 (with a spatial resolution of 0.8 m) and ZiYuan-3 (of 2.5 m) satellites in 2020 on Fuzhou City, China. Our results showed that land use classifications that are based on the arrangement features derived from GCNs achieved the highest classification accuracy than using graph kernels and handcrafted graph features for both images. We also found that the contribution to separating land use types by arrangement features varies between GaoFen-2 and ZiYuan-3 images, due to the difference in the spatial resolution. This study offers a set of approaches for effectively mapping land use types from (very) high resolution satellite images. GCNs graph kernels used as state-of-the-art methods for dealing with graph-structural data. compared ten different land use classiﬁcations that are based upon the extracted structural features, and experimented on two remote sensing images, i.e., a very high resolution (VHR) image from the GF2 satellite with a spatial resolution of 0.8 m and a high resolution (HR) image from the ZY3 satellite of 2.5m. Our results reveal that the structural features that are derived from GCNs and graph kernels provide better classiﬁcation performance than that using handcrafted methods, and they have a high potential for different applications. In general, the classiﬁcation accuracies that were obtained by the are than that of the ZY3 image, due higher spatial resolution. Nonetheless, the ZY3 has a larger swath, and can be used for collecting images with broader coverage, facilitating to land use mapping over large areas. Previous research has highlighted the importance of the spatial arrangement of land cover objects for land use mapping from VHR images and shown that model spatial arrangement the best of our knowledge, little research has been conducted on the comparison of land use classiﬁcations between popular graph-based methods while using VHR images, particularly less using HR images. This study contributes to adding such gap information.


Introduction
Mapping land use using high resolution (HR) satellite images is important for urban planning, socio-economic analysis, and geo-information database updating [1][2][3][4]. Particularly, the rapid urbanization and population growth pose big challenges for sustainable urban development, requiring updated and fine-detailed land use information for decision making [5,6]. In the past decade, remote sensing technologies have developed rapidly, and many satellites have been launched. A vast amount of remote sensing images that were acquired from various platforms and sensors can be accessed now. Among them, (very) high spatial resolution remote sensing images, usually with a resolution finer than 5 m acquired from Worldview, GeoEye, GaoFen-2 (GF2), and ZiYuan-3 (ZY3) satellites, provide detailed images of the earth. These images have a high potential for land use mapping at the land parcel or local neighborhood levels [7][8][9][10][11].
variant of graph neural networks refers to graph convolutional network (GCN) [36]. Recently, Li [11] applied GCNs to urban land use classification from very high resolution (VHR) satellite images with 0.5 m spatial resolution, and obtained promising results.
This study is an extension of [11], with the aim to exploit the spatial arrangement of land cover objects by graph-based methods for land use mapping from high resolution satellite images. We focus on extracting high-level structural features of land use while using GCNs, and comparing with methods using feature engineering and graph kernels. Previous studies investigated land use classification by GCNs while using VHR images with a spatial resolution of less than 1m. In this study, we also analyze the applicability of these graph-based methods to land use classifications using high resolution remote sensing images with a spatial resolution of 2.5 m.
The remaining of this paper is organized, as follows. Section 2 introduces the study areas and data, Section 3 illustrates the graph-based methods used for graph feature learning and land use classification, Section 4 gives the experimental results and corresponding analysis. The discussion and conclusions are provided in Sections 5 and 6, respectively.

Study Areas and Data
The study area is located in the core region of Fuzhou City, the capital city of Fujian province, China ( Figure 1). It covers 210 km 2 , and it contains a large variety of land cover and land use types. For the Fuzhou study area, we acquired two remote sensing images, from the GF2 satellite on 18 February 2020 and the ZY3 satellite on 10 April 2020. The GF2 image was captured by a PMS1 sensor having four 2.5 m resolution multi-spectral bands and one 0.78 m resolution panchromatic band. The ZY3 image has four multi-spectral bands with 6.78 m and one panchromatic band with 2.38 m resolution. We executed pansharpening to fuse multi-spectral and panchromatic bands for both images [37], and resampled the pansharpened images to 0.8 m and 2.5 m for experiment convenience. The processed GF2 image has a size of 18,678 × 17,703 pixels for the Fuzhou study area, and the ZY3 image has 5978 × 5666 pixels.
We collected ground truth land use data over the study area from the local surveying department to delineate homogeneous land use units. The land use samples that were used to train and test land use classification algorithms were collected based on land use units derived from the ground truth data. These data were derived from the 3rd National Land Survey Project, and produced at the end of 2019. The classification system of the ground-truth land use data follows the (Chinese) National Land Use Classification Standard (GB/T 21010-2017 ), with 12 first-level land use classes and 73 second-level classes. These data are the most detailed and accurate land use data available and, thus, are suitable for this study. Moreover, we re-organized the official land use system at the first level into seven land use classes, see Table 1. We distinguish six main land use classes of high-density residential (RH), low-density residential (RL), commercial (CM), industrial and warehouses (IW), green space and entertainment land (GE), and undeveloped land (UN) from high resolution remote sensing images. We group the rest of land use classes into others (OT), including public management and services, transportation, water body, and land for special use or with multiple categories. We map water body separately because water body can be easily identified based upon the ratio of water coverage.   Figure 2 illustrates the workflow of land use mapping while using graph-based methods. Specifically, we model the topological relationships between neighboring land cover objects preliminarily obtained from remote sensing images into a planar graph. Three feature extraction methods based upon GCNs, feature engineering, and graph kernels are used to learn structural features from graph-structured data. The extracted structural features that are integrated with the features regarding the spatial composition of land use are then used for land use classification.

Land Cover Classification
Land cover classification is first conducted in order to obtain key types of land cover objects. In this section, we apply a deep learning method based upon UNet [38] to obtain the land cover map from a HR image. UNet is a popular network for semantic segmentation in computer vision, and has been successfully used for land cover classification from remote sensing images. When compared with traditional image classification methods that are based on machine learning, UNet can automatically derive high-level semantic features of images, and perform semantic segmentation in an end-to-end way. In this study, the adopted UNet uses four multispectral bands of HR images for classifying seven land cover types [39], i.e., trees, grass, shadow, water, bare soil, buildings, and others.
The main parameters of the UNet classification were set, as follows: stochastic gradient descent with momentum optimizer was used with the momentum value of 0.9, and the initial learning rate was set to 0.05 with the L2 regularization. The UNet used four channels of a HR image. The training dataset of the UNet classification was collected by digitizing the HR images into ground truth land cover areas. Because the two images (GF2 and ZY3) were acquired in the same year, we collected the training dataset (i.e., reference land cover areas) that was mainly based on the GF2 image, and then re-used the dataset with a slight correction to classify land cover on the ZY3 image.

Characterizing Urban Structures by Graphs
The arrangement of land cover objects and their structures plays an important role in separating the land use types from a HR image. We use graph theory to model the topological relations between neighboring land cover objects. The pairwise relations between land cover objects are represented in a graph. Let G = (V, E ) be a graph with a set of nodes V = {v 1 , · · · , v n } and a set of edges E = {e 1 , · · · , e m }, where n and m are the number of nodes and edges, respectively. A node v refers to either a land cover object or a land use unit, which corresponds to v LC or v LU , respectively, and an edge e refers to the linkage of two nodes. Besides, features x can be assigned to each node, leading to a feature matrix for all nodes X = [x 1 , · · · , x n ] T . High-level structural features are extracted from graph-structured data, and then used for land use classification.
To create a graph G, we first need to determine graph nodes V, i.e., the basic spatial objects for analysis, and obtain their edges E to model their pairwise relations. In this study, we used image objects created from image segmentation as nodes, while graph edges describe the spatial adjacency of images objects in a land use unit. More specifically, the image segmentation was conducted by multi-resolution segmentation implemented in eCognition software. For the GF2 image, we set the scale parameter to 60, while 30 for the ZY3 image because of a lower spatial resolution. The obtained image objects were over-segmented to maintain fine spatial details [11]. The adjacency relationships between neighboring image objects was specified by an adjacency matrix, which was computed from a Delaunay triangulation that was built based upon the centroids of image segments ( Figure 2). The use of Delaunay triangulation in order to determine the adjacency of neighboring spatial objects was previously investigated in [40].

Graph Features Derived via Graph Convolutional Networks
For a graph G, GCN learns hidden representations h v for each node v by aggregating its neighborhood information. Let h (l) i be the feature vector of node v i at the lth GCN layer, and H (l) be the feature matrix of all nodes V. A lth layer GCN aggregates information via the following layer-wise propagation rule [36], whereÃ = A + I is the adjacency matrix of graph G with added self-connections, I is the identity matrix, D is a degree matrix ofÃ withD ii = ∑ jÃij , W (l−1) is a learned weight matrix, H (0) equals X, and σ represents an activation function, such as the ReLU function. The label of nodes V can be predicted asŶ at the last layer of the GCN while using the so f tmax function, where so f tmax( In this section, we use a two-layer GCN for land use classification [11]. More specifically, we create a subgraph G i for each land use unit based upon two types of nodes, which correspond to a number of land cover nodes v LC and a land use node v LU . A land cover node refers to a land cover object within the land use unit, and the its edges model the linkage between the adjacent land cover objects. For the land use node, its edges model the linkage between land cover objects and the land use unit. In urban areas, the number of land cover objects vary between different land use units, leading to big variation among subgraphs G i representing land use units. Li [11] proposed compressing a graph that models the relations between land cover objects into the graph that models the relations between land cover types by creating a AUM. We use the same strategy in order to compress the size of graphs with respect to land use units.
Moreover, we use Bayesian methods in order to integrate information regarding spatial arrangement and spatial composition. Let C be the class variable of land use types with the number of K LU classes, F SA be the attribute variable of spatial arrangement, and F SC be the attribute variable of spatial composition. We assumed that the variable F SA is independent of F SC . Based upon Bayes' theorem [11], the label of unclassified land use units can be assigned by Here, the prior probability p(C i ) was set equally for all land use classes, the conditional probability p(F SA |C i ) was approximated by the probabilistic output ofŶ by GCN. Regarding parameter settings associated with the used GCN model, we set the number of hidden units to 24, the maximum number of epochs to 400, the initial learning rate to 0.01 using a L2 loss, and the drop rate to 0.5 [11]. The conditional probability of p(F SC |C i ) was approximated by the probabilistic output of a SVM classification while using histogram intersection kernel [41], based on spatial composition features. More specifically, the spatial composition features refer to the coverage ratio and density of a specific land cover class within a land use unit [9].

Graph Features Derived via Graph Kernels
In machine learning and pattern recognition, kernel methods have also been popularly used for handling graph-structured data, referring to graph kernels, particularly with the SVM classifier. We compare land use classifications between the structural features that were extracted by GCNs and graph kernels.
Given data points x, x ∈ R, a kernel k is a function k(x, x ) = φ(x), φ(x ) , where φ denotes a mapping from R to a feature space H, e.g., a Hilbert space [42]. We are interested in constructing kernels for graph-structured data analysis, corresponding to graph kernels, i.e., k(G, G ) where G and G refer to graphs. Recently, Kriege [33] conducted a comprehensive review on graph kernels by analyzing their expressivity, non-linear decision boundaries, accuracy, and agreement for benchmark graph-structured datasets. Among them, the Weisfeiler-Lehman subtree kernel [43] achieved the best overall performance and the state-of-the-art in graph classification, motivating our choice of graph kernel in this study. This kernel is a successful instance of the Weisfeiler-Lehman kernel framework [43], which is built upon the Weisfeiler-Lehman test of isomorphism [44]. Given two graphs G and G , the two graphs are considered to be isomorphic if and only if a pair of nodes in G is connected by an edge in the same way as the corresponding pair of nodes in G . See [43] for a detailed description on the Weisfeiler-Lehman subtree kernel. For the subtree kernel, we set its main parameter, i.e., the number of iterations, to 6.

Graph Features Derived via Feature Engineering
Extracting graph features via feature engineering, referred to as handcrafted graph features, has also been studied in the past [24,45,46]. For example, Walde [24] investigated a number of graph features based on graph centrality, adjacency unit matrix (AUM), graph connectivity, and geometry in order to classify urban structure types. These features are also used in our study for land use classification ( Table 2). Based upon handcrafted graph features, we use a random forest to classify land use units into different land use classes. For the random forest algorithm, we set its main parameter, i.e., the number of trees, to 500. Table 2. Handcrafted graph features [24]. LC refers to land cover.

Centrality measures
Degree centrality, i.e., LC type of the node with the highest node degree [45].
Betweenness centrality, i.e., LC type of the node with the highest node betweenness centrality [45]. Mean node degree of buildings, i.e., Mean of the node degree of buildings.
AUM-related measures Normalized number of edges to trees, grass, and others. Percentage of buildings with at least 1 edge to trees, grass, and others. Diversity around buildings. Percentage of buildings enclosed by trees, grass, and others.

Connectivity measures
Beta index β = m/n, measures the level of connectivity in a graph, where n and m refer to the number of edges e and number of nodes v [47]. Two distance ranges of the beta index: 0-60 m and 40-150 m were computed [24].

Additional measures
Shape index of buildings γ = A/P 2 , where A and P refer to the area and perimeter of a building object [48]. Building coverage ratio.

Performance Evaluation and Accuracy Assessment
We extracted high-level structural features by GCNs, and applied a Bayesian classifier (as mentioned in Section 3.2.1) to land use classification on both GF2 and ZY3 images. We labeled this classification as BN(SC, SA GCN ) for convenience (No. 1 in Table 3). In order to evaluate the performance of this method, we compared with nine different methods using spatial arrangement and composition features, leading to three categories ( Table 3). The choice of these methods was motivated, as follows.
• SV M(SC), SVM classification only uses features of spatial composition. It is a traditional method, and it can serve as a baseline to compare with classifications using features of spatial arrangement. • SV M(SA LS ), SVM classification only uses landscape metrics: fractal dimension, landscape shape index, and Shannon's diversity index. These metrics were investigated by existing studies [19], and they demonstrated their effectiveness in characterizing spatial structures. • SV M(SC, SA LS ), SVM classification uses both spatial composition features and landscape metrics [9]. The above-mentioned two methods only characterizes the spatial properties of land use units from one aspect, i.e., either spatial composition or spatial structures. We believe that the integration of the two aspects can improve classification performance. • GCN(SA GCN ), GCN classification that automatically derives high-level structural features while using a GCN. This method has been recently applied to land use classification from VHR remote sensing images [11]. It is also of great interest to use this method on HR images. • SV M(SA Ker ), SVM classification uses the Weisfeiler-Lehman subtree kernel. It is an automatic method for graph feature learning. [43] stated that the subtree kernel achieved the best overall performance among many others. Therefore, the SV M(SA Ker ) can be seen as a state-of-the-art method while using graph kernels. • RF(SA HD ), random forest classification uses handcrafted graph features [24]. It is a common practice to manually derive features for graph-structured data analysis. The adopted method achieved successful results in classifying urban structure types [24]. Therefore, we consider it to be a suitable benchmark method for land use classification. We further divided classified buildings into dark roof, gray roof, brick-color roof, blue roof, and bright roof based on spectral features while using a SVM classifier [9]. Table 3. Different methods for land use classifications. SA and SC refer to spatial arrangement and spatial composition.  We compute a confusion matrix [49] based on sample points collected by visual interpretation while using a stratified sampling strategy in order to evaluate the accuracy of the land cover map derived from a high resolution remote sensing image. The confusion matrix is also used to assess the accuracy of land use classifications. Figure 3 shows the classified land cover maps from the GF2 and ZY3 images. We used the training dataset (i.e., ground truth land cover areas) that was collected from two subset areas (red boxes in Figure 3) to train the UNet model. By visual inspection, the two classified maps have similar distributions of land cover types, although the map of the GF2 image has larger shadow areas than that from the ZY3 image due to different azimuth angles. The two land cover maps were assessed based upon the confusion matrix whlie using testing samples (i.e., in pixels) that were generated by stratified random sampling. More specifically, 60 testing samples were randomly generated for each land cover type, and interpreted by visual inspection. The areas of testing samples are shown in the black boxes of Figure 3. Tables 4  and 5 give the confusion matrices of the two classified land cover maps on the GF2 and ZY3 datasets. In general, both of the datasets were classified with relatively higher overall accuracy (OA) and kappa coefficient κ, corresponding to 0.91 and 0.90 for the OA of the GF2 and ZY3 datasets, and 0.8972 and 0.8806 for their κ. More specifically, the land cover map on the GF2 image has a slightly higher accuracy than that of the ZY3 image. Particularly, the increase of spatial resolution helps in distinguishing grass from trees, e.g., the user accuracy (UA) of the grass obtained from the GF2 image is higher than that of the ZY3 image, corresponding to 0.93 vs. 0.63. However, we also observed that the UA of the water class on the ZY3 dataset is higher than that of the GF2 dataset, for which more shadow pixels were misclassified into water class. This is because water and shadow have a similar spectral response on HR images. Table 4. Confusion matrix of the classified land cover map using the GF2 image. UA, PA, and OA refer to user accuracy, producer accuracy, and overall accuracy.

Land Use Classification and Accuracy Assessment
For land use classification, we selected 80 land use units as samples for each land use class based upon the ground truth land use map and visual inspection of HR images (Figure 4). The samples were randomly partitioned into training and testing datasets with equal size, i.e., 280 sample land use units for each dataset. We then performed 10 different land use classifications while using the GF2 and ZY3 images. Wwe first conducted a 10-fold cross validation on the training dataset for each classification method in order to evaluate the performance between different land use classifications. Figure 5 plots the distribution of the overall accuracy of the 10-fold cross validation accuracy. This figure shows that the classifications using structural features derived from graph kernels and GCNs, i.e., SV M(SA Ker ) and GCN(SA GCN ) had higher accuracies than using handcrafted graph features, i.e., RF(SA HD ), on both study datasets. The lowest classification accuracy was produced by SV M(SA LS ) while using three landscape metrics alone. In general, the accuracy of the land use classification using the GF2 image was higher than the corresponding classification using the ZY3 image due to a higher spatial resolution. Besides, the integration of structural features with spatial composition features further improved the classification accuracies. The highest classification accuracy was achieved by BN(SC, SA GCN ) * taking building roof's information into account. More specially, the classified buildings were further divided into dark roof, gray roof, brick-color roof, blue roof, and bright roof. We also evaluated the 10 land use classifications while using the testing dataset. Table 6 lists the OA and κ coefficient of these classifications. We can see that the results of land use classifications BN(SC, SA GCN ) * achieved an OA of 0.8750 and 0.8500, and κ of 0.8542 and 0.8250 for the GF2 and ZY3 images.  Table 6. Evaluation of the 10 land use classifications of the GF2 and ZY3 images on the testing datasets. SA and SC refer to spatial arrangement and spatial composition.  Tables 7 and 8 show the confusion matrices of the land use classifications while using the BN(SC, SA GNN ) * method on the GF2 and ZY3 images. We can see that the user accuracies (UAs) of all land use classes, except the class of others, are larger than 0.80 for both images. More specifically, for the GF2 dataset, the highest UA was obtained by the high-density residential with a UA of 1.00, followed by green space and entertainment land and low-density residential due to clear spatial patterns, which can be seen from their adjacency unit matrices (AUMs) (Figure 6). The lowest UA was given by the land use class of others. It is reasonable, because this class is usually composed of multiple functions, and used for multiple purposes. For the ZY3 dataset, we found that the highest UA was from the industrial and warehouses with a UA of 1.00, followed by green space and entertainment land, while the high-density residential was classified with a UA of 0.83. This may be explained by the fact that (1) the ZY3 image has a spatial resolution of 2.5 m, which is hard to identify individual land cover objects, particularly the buildings in highly populated areas ( Figure 6); (2) on a ZY3 image, the buildings that are used for industrial and warehouses can be more easily identified than buildings used for other land use types, because of more homogenous spectral reflectance and more regular shapes. Table 7. Confusion matrix of the classified land use map using the BN(SC, SA GNN ) * method based on the GF2 image. UA, PA and OA refer to user accuracy, producer accuracy and overall accuracy. RL, RL, CM, IW, GE, UN, and OT refer to land use classes of high-density residential, low-density residential, commercial, industrial and warehouses, green space and entertainment land, undeveloped land, and others, respectively.  Table 8. Confusion matrix of the classified land use map using the BN(SC, SA GNN ) * method based on the ZY3 image. UA, PA and OA refer to user accuracy, producer accuracy and overall accuracy. RH, RL, CM, IW, GE, UN, and OT refer to land use classes of high-density residential, low-density residential, commercial, industrial and warehouses, green space and entertainment land, undeveloped land, and others, respectively.  For the study area, two big rivers (i.e., Min River and Wulong River) cross the city, which results in some land use units filled with water. These land use units can be easily identified based upon the coverage ratio of water. We created an additional land use class, i.e., water body, for those land use units with a large portion of water. Here, we set the threshold for the coverage ratio of water to 0.85. Figure 7 shows the derived land use maps using the BN(SC, SA GNN ) * method from both the GF2 and ZY3 images.

Discussion
This study focused on the comparison of land use classifications between different graph-based methods, and between different high resolution remote sensing images. We investigated three kinds of graph-based methods, i.e., by the handcrafted method (i.e., feature engineering), graph kernels, and graph convolutional networks (GCNs), in order to extract structural features for the classification. The GCNs and graph kernels methods have been recently used as state-of-the-art methods for dealing with graph-structural data. We compared ten different land use classifications that are based upon the extracted structural features, and experimented on two remote sensing images, i.e., a very high resolution (VHR) image from the GF2 satellite with a spatial resolution of 0.8 m and a high resolution (HR) image from the ZY3 satellite of 2.5m. Our results reveal that the structural features that are derived from GCNs and graph kernels provide better classification performance than that using handcrafted methods, and they have a high potential for different applications. In general, the classification accuracies that were obtained by the GF2 image are higher than that of the ZY3 image, due to a higher spatial resolution. Nonetheless, the ZY3 satellite has a larger swath, and can be used for collecting images with broader coverage, facilitating to land use mapping over large areas. Previous research has highlighted the importance of the spatial arrangement of land cover objects for land use mapping from VHR images [9], and shown that graph-based methods, like GCNs [11], can effectively model the spatial arrangement information. However, to the best of our knowledge, little research has been conducted on the comparison of land use classifications between popular graph-based methods while using VHR images, particularly less using HR images. This study contributes to adding such gap information.
We model the pair-wise relations between neighboring land cover objects into a graph. Thus, it is essential to identify key types of land cover objects preliminarily. This study uses a deep learning method that is based upon the UNet model in order to classify land cover from the GF2 and ZY3 images. The classified land cover maps from the two images show similar attribute accuracies in terms of confusion matrix when using a point-by-point evaluation. Regarding the geometrical accuracy, the classified land cover map of the GF2 image has a better delineation of the boundaries of individual land cover objects than the ZY3 image due to higher spatial resolution ( Figure 6). For example, on a ZY3 image of 2.5 m resolution, small buildings and buildings in densely populated areas are difficult to be delineated. This difficulty may affect the modelling of urban structures based on the pair-wise relations of land cover objects. This could also explain that the highest user accuracy of the classified land use map of the ZY3 image was given by industrial and warehouses class, where the buildings usually have relatively homogeneous spectral reflectance and regular shapes. Besides, a previous study showed that the classification accuracy of land use is positively correlated with the accuracy of classified land cover [9].
A number of methods using the structural features that were extracted from graph-structured data are applied to land use classifications. Among them, we highlighted the use of graph kernels and GCNs for automatically learning high-level structural features. Recent studies investigated the effectiveness of using GCNs for feature learning from graph-structured data, and provided a technique for data compression by adjacency unit matrix (AUM) [11]. As follow-up research of [11], this study added the extraction of high-level structural features that are based on a state-of-the-art graph kernel, i.e., the Weisfeiler-Lehman subtree kernel [43]. Our results showed that both GCNs and graph kernels performed better than the handcrafted method in terms of classification accuracy. When comparing with graph kernels, the GCNs achieved the best classification performance in terms of classification accuracy. On the other hand, graph kernels can be naturally integrated with support vector machine, showing merits from the perspective of the applicability of methods. In this study, the parameters that are associated with feature learning and classification methods were set according to the literature. Further work can be conducted in order to analyze the effect of key parameters on classification performance. It is also interesting to compare the land use classifications while using high-level structural features that were learnt from graph-structured data with those using deep image features learnt from grid-structured data [50][51][52], leading to our future study. Moreover, from the statistical point of view, further improvement can be conducted in order to increase the number of training and testing land use samples using a more sophisticated sampling strategy for accuracy assessment and performance evaluation.
We tested the effectiveness of the proposed method for land use mapping on one study area in Fuzhou, China. One may be interested in the applicability of the method to other areas of the world. This method follows a hierarchical land use classification framework that was proposed in [9]. The framework starts from the classification of land cover, proceeds to the characterization of spatial arrangement and composition of land use, and ends at the classification of land use. It also highlights the importance of characterizing spatial arrangement effectively. This study used a data-driven method that was based on GCN to automatically learn high-level structural features in order to characterize the spatial arrangement. Hence, we expect this method can also be used in other different areas. Although the proposed land use classification achieved satisfactory results in this study by giving 40 land use samples per class, it is helpful to conduct a sensitivity analysis of the effect of the number of samples on the learning power of spatial arrangement features by GCN, and on the subsequent land use classification, leading to future study.
The derivation of homogenous land use units is essential in classifying land use with high accuracy, because the errors that are involved in the delineation of land use boundaries affect classification accuracy [53]. We used official data regarding land use boundaries from the local surveying department to obtain land use units. By doing so, the produced land use maps maintain high geometric accuracy. On the other hand, such practice may constrain the use of the classification method for cases without existing data of land use boundaries. For such cases, it is important to investigate an automatic method in order to directly obtain land use units from remote sensing images, which is out of the scope of this study. However, such an investigation leads to a challenging topic and it is insufficiently addressed in the literature.
Last but not the least, in this study, we distinguished six main land use classes, i.e., high-density residential, low-density residential, commercial, industrial and warehouses, green space and entertainment land, and undeveloped land from high resolution remote sensing images, while grouping the rest into others. Within the class of others, land use may involve different subclasses, such as land for special use or with multiple categories, showing a large variety of characteristics. Future efforts can be conducted to refine the definition of land use classes, or look for more effective strategy to deal with mixed land use type.

Conclusions
This study investigated the use of graph convolutional networks (GCNs) in order to extract high-level structural features for land use mapping from high resolution satellite images. We compared it with two other popular graph-based methods: feature engineering and graph kernels, and made a comparison with two remote sensing images of 0.8 m and 2.5 m spatial resolution, respectively. Our results showed that the structural features that are derived from GCNs and graph kernels are more effective than handcrafted features for both images. When comparing it with graph kernels, the GCNs achieved the best classification performance in terms of classification accuracy. Combining spatial composition features with structural features further improved the accuracy. Moreover, the improvement by the structural features varies between different land use classes for the two images due to the different spatial resolution. By comparing ten different land use classification methods, we conclude that integrating the structural features that are derived from GCNs and graph kernels with spatial composition serves as an effective means for land use mapping from high resolution remote sensing images, and has a high potential for different applications due to the automation of feature extraction.