Development of Land Cover Classification Model Using AI Based FusionNet Network

Abstract: Prompt updates of land cover maps are important, as the spatial information of land cover is widely used in many areas. However, current manual digitizing methods are time consuming and labor intensive, hindering rapid updates of land cover maps. The objective of this study was to develop an artificial intelligence (AI) based land cover classification model that allows for rapid land cover classification from high-resolution remote sensing (HRRS) images. The model comprises three modules: a pre-processing module, a land cover classification module, and a post-processing module. The pre-processing module separates the HRRS image into multiple perspectives with 75% overlap using the sliding window algorithm. The land cover classification module was developed using the convolutional neural network (CNN) concept, based on the FusionNet network, and is used to assign a land cover type to the separated HRRS images. The post-processing module determines the ultimate land cover types by aggregating the separated land cover results from the land cover classification module. Model training and validation were conducted to evaluate the performance of the developed model. Land cover maps and orthographic images covering 547.29 km² of the Jeonnam province in Korea were used to train the model. For model validation, two spatially and temporally different sites, one from Subuk-myeon of Jeonnam province in 2018 and the other from Daeso-myeon of Chungbuk province in 2016, were randomly chosen. The model performed reasonably well, demonstrating overall accuracies of 0.81 and 0.71, and kappa coefficients of 0.75 and 0.64, for the respective validation sites. The model performance was better when considering only the agricultural area, showing an overall accuracy of 0.83 and a kappa coefficient of 0.73. It was concluded that the developed model may assist rapid land cover map updates, especially for agricultural areas, and incorporating field boundary delineation is suggested as a future study to further improve the model accuracy.


Introduction
Land cover changes continuously as urbanization progresses, and these land use changes affect various aspects including human behavior, ecosystems, and climate [1][2][3]. Land cover maps that present land use status provide essential data for research that requires spatial information. In particular, real-time agricultural land cover data are essential for research on floodgates, soil erosion, etc., in agricultural areas [4][5][6][7][8], and for developing future plans based on research findings [9]. Recent

CNN Based Land Cover Classification Model Framework
The land cover classification model converts HRRS images into land cover classification maps, and the classification process goes through three modular phases (Figure 1). First, the HRRS images are input to the model, overlapped by 75%, and then separated into multiple images that reflect different perspectives through the pre-processing module (Figure 1a). The separated images are then passed through the CNN-based land cover classification module, which converts the data into a land cover map that classifies land uses with the trained colors (Figure 1b). Lastly, the land cover maps classified from the different perspectives of a single region are aggregated in the post-processing module, and the resulting image is output as the final land use classification map (Figure 1c).

Pre-Processing Module
The land use classification through the CNN model does not simply classify land uses by the color values of pixels; it considers the spatial distribution of the color values of the surrounding pixels in the process. Using only a single image of the land for land use classification could increase errors in the classification process. The pre-processing module was applied to increase the accuracy of the land use classification, as it provides images of different perspectives.
For the pre-processing module, the size of the land use classification images to be separated from a random-size orthographic image was set to 256 × 256 pixels, considering the computer performance. Each image was overlapped by 75% (shifting 64 pixels) in both the horizontal and vertical directions and then separated for diversification of perspectives (Figure 2). Finally, images of 16 different perspectives were used to classify land uses in a single space (Figure 2, red hatched area).
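The tiling arithmetic above can be sketched in a few lines of plain Python (an illustrative sketch, not the authors' code; the function names are hypothetical). With 256-pixel tiles and a 64-pixel shift, consecutive tiles overlap by (256 − 64)/256 = 75%, and each interior pixel is covered by 4 window positions per axis, i.e., 16 overlapping views:

```python
def tile_origins(length, tile=256, stride=64):
    """Top-left coordinates of sliding windows along one image axis
    (64-px stride gives 75% overlap between 256-px tiles)."""
    return list(range(0, length - tile + 1, stride))

def covering_tiles(p, length, tile=256, stride=64):
    """Window origins along one axis that contain pixel coordinate p."""
    return [o for o in tile_origins(length, tile, stride) if o <= p < o + tile]

# An interior pixel of a 1024-px-wide image falls inside 4 windows per
# axis, so 4 x 4 = 16 overlapping views in 2-D.
hits = covering_tiles(500, 1024)
print(len(hits) ** 2)  # 16
```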


Architecture of CNN Based Land Cover Classification Module
The land cover classification module converts the input image into a land cover map, in which the land use status is semantically segmented by the respective land cover color codes (see Table 1 in Section 3.3.2). A CNN was used for effective image segmentation of the land cover, and FusionNet [36], a CNN that separates cells from electron microscopy (EM) images [37,38], was used as the structure of the neural network, because that task is conceptually similar to creating land cover maps that identify land use boundaries in HRRS images.

The number of spectral channels used for the land cover classification was set to three by adopting the red (R), green (G), and blue (B) channels of the HRRS image. The land cover classification result is expressed in the values that are closest to the learned classification color for each land cover (Table 1).
The entire land cover classification module largely consists of two processes: encoding (Figure 3a), which extracts land use features from the image data, and decoding (Figure 3b), which outputs the land use map classified by different colors according to the extracted land use features. The module is configured with a combination of four basic components: the convolutional layer, the residual layer, down-sampling and up-sampling, and the summation-based skip connection.
As a layer that is widely used in the deep learning field, the convolutional layer converts the input data into compressed data containing spatial information through the processes of convolution, activation function, and batch normalization [39]. The parameters used in the configuration of the convolutional layers in this module are a kernel size of 3, a stride of 1, and a padding of 1. In general, the rectified linear unit (ReLU) is used as the activation function (Equation (1)). However, the gradient of ReLU is always zero when the input value is negative, so data compression using ReLU in the encoding process could result in data losses. Leaky ReLU, which provides a small gradient when the input value is negative, can complement ReLU (Equation (2)). Therefore, the module was configured with Leaky ReLU as the activation function in the encoding process and ReLU as the activation function in the decoding process.
ReLU: f(x) = max(0, x), (1)

Leaky ReLU: f(x) = max(0.01x, x), (2)

The residual layer is configured with three convolutional layers and a single summation-based skip connection. The neural network deepens, and the data features become noticeable through the three convolutional layers included in the residual layer. However, as the neural network deepens, a gradient vanishing problem occurs. To solve this issue, we used the summation-based skip connection in the residual layer, which integrates the past data with the currently processing data in the module, and configured the layer to enable a more precise classification [36].
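Equations (1) and (2) can be written directly as Python functions (an illustrative sketch; the names `relu` and `leaky_relu` are ours, not from the paper):

```python
def relu(x):
    """Equation (1): ReLU, zero for negative inputs."""
    return max(0.0, x)

def leaky_relu(x):
    """Equation (2): Leaky ReLU, slope 0.01 for negative inputs."""
    return max(0.01 * x, x)

print(relu(-2.0), leaky_relu(-2.0))  # 0.0 -0.02
```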
Down-sampling was used to reduce the computation volume and prevent overfitting [40]. A max-pooling function, which takes the largest value within each stride, was used. In the down-sampling process, a kernel size of 2, a stride of 2, and zero padding were used. In contrast to the down-sampling, the up-sampling increases the layer size to acquire images segmented by colors from the land use classification results. The deconvolution function was used for up-sampling.
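The 2 × 2, stride-2 max-pooling described above can be sketched in pure Python (illustrative only; real implementations operate on multi-channel tensors):

```python
def max_pool_2x2(grid):
    """Down-sampling by max-pooling: kernel size 2, stride 2, no padding.
    Each output cell keeps the largest value of a 2x2 block."""
    return [[max(grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1])
             for j in range(0, len(grid[0]) - 1, 2)]
            for i in range(0, len(grid) - 1, 2)]

feature_map = [[1, 3, 2, 4],
               [5, 7, 6, 8],
               [9, 2, 1, 3],
               [4, 6, 5, 7]]
print(max_pool_2x2(feature_map))  # [[7, 8], [9, 7]]
```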
Summation-based skip connections, which solve the gradient vanishing issue and help the information transfer, were used in the residual layer and in the long skip connections. Thus, the convolutional layers could be made much deeper, including more parameters, without losing training efficiency. Skip connections merge a previous layer and the current layer by summing the matrices of the two layers. The short skip connections in the residual layer sum the first convolutional layer with the third convolutional layer. Four long skip connections connect each layer of the encoding path to the layer of the same size in the decoding path.
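The summation-based merge can be illustrated as follows (a sketch on plain 2-D lists, not the module's tensor code):

```python
def skip_connect(earlier, current):
    """Summation-based skip connection: element-wise sum of two
    equal-sized feature maps, merging past and current data."""
    return [[a + b for a, b in zip(row_e, row_c)]
            for row_e, row_c in zip(earlier, current)]

print(skip_connect([[1, 2], [3, 4]], [[10, 20], [30, 40]]))
# [[11, 22], [33, 44]]
```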

Post-Processing Module
The primary function of the post-processing module is to analyze and integrate the classified images of 16 different perspectives, produced by the pre-processing and land cover classification modules, into a final land cover map using numerical and statistical methods. The post-processing module consists of two parts: land cover assignment at the pixel level, followed by land cover integration to determine the final land cover of a given pixel.
The land cover assignment part is the process of determining the land cover of a pixel based on the classified output. The R, G, B values of the classified image pixels are output by the classification model, which was trained with the land cover map colors (labels). The model output values do not exactly match the reference values of the land cover map, since the trained CNN works only to minimize errors. The Gaussian distance was used to find the nearest land cover for a given pixel by calculating the distance between the classified color values (result) and the reference land cover color code (label). The formula used to calculate the Gaussian distance is given in Equation (3). The label land cover with the lowest Gaussian distance was assigned to the given pixel.

Gaussian Distance = √((R_result − R_label)² + (G_result − G_label)² + (B_result − B_label)²), (3)
where R_result, G_result, and B_result represent the red, green, and blue output values produced by the land cover classification module, and R_label, G_label, and B_label indicate the red, green, and blue reference values specific to each sub-category of the land cover maps given in Table 1. The reference color code is standardized and provided by the Korean government.

The land cover integration is the process of aggregating the land cover classification results, classified from the images of 16 different perspectives of one region, to determine the final land cover classification result. The 16 different classification results per pixel that were obtained from the pre-processing module, the land cover classification module, and the pixel land cover assignment of the post-processing module were aggregated to produce the final result. The item with the greatest number of occurrences among the aggregated land cover classification results was determined as the final land use classification result (Equation (4)). When the maximum counts are the same for different land covers, one of them is chosen arbitrarily. The final land use classification result per pixel is output as a final land cover classification map, in which the classification of the aggregated orthographic image input is complete.
Classification Result = max(count_paddy, count_forest, count_water, count_item, etc.), (4)

where count_item is the number of the classified item out of the total classification results.
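The two post-processing steps, nearest-color assignment (Equation (3)) and majority voting (Equation (4)), can be sketched in plain Python. This is an illustrative sketch: the color codes below are hypothetical stand-ins for the Table 1 values, and the Gaussian distance is computed as the Euclidean RGB distance implied by the definitions following Equation (3):

```python
from collections import Counter
from math import sqrt

# Hypothetical subset of the Table 1 reference color codes (illustration only).
LABEL_COLORS = {"paddy": (250, 230, 120),
                "forest": (30, 110, 40),
                "water": (30, 60, 200)}

def assign_land_cover(pixel_rgb):
    """Equation (3): pick the label whose reference color has the lowest
    Gaussian (Euclidean) distance to the model's output color."""
    def dist(label):
        return sqrt(sum((p - c) ** 2
                        for p, c in zip(pixel_rgb, LABEL_COLORS[label])))
    return min(LABEL_COLORS, key=dist)

def integrate(assignments):
    """Equation (4): majority vote over the 16 per-pixel classifications;
    ties are broken arbitrarily, as in the paper."""
    return Counter(assignments).most_common(1)[0][0]

votes = [assign_land_cover(c) for c in
         [(248, 228, 118), (40, 100, 50), (251, 229, 119), (249, 231, 121)]]
print(integrate(votes))  # paddy
```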


Study Procedure
The schematic of the developed model training and verification is shown in Figure 4.

Training Area
To train the land cover classification model, a training area of 547.29 km² was selected, for which the latest source data (2018) could be acquired and which contains the largest cultivated area in South Korea. The selected area includes a cultivated acreage of 22,495 ha in Yeongam-gun and 20,279 ha in Muan-gun, and is a useful area for training the model for agricultural land cover [41]. In addition, it could also train the model for urban land cover, as it includes main urban areas containing some of the major buildings in the urban center, such as the Gun offices of Jeonnam province and Mokpo City Hall. The areas selected for the study are shown in Figure 5.

Verification Area
The artificial intelligence (AI) based land cover classification is likely to be most accurate for data that have spatial and temporal dimensions similar to those of the training data. To increase the rigor of the model verification, this study selected two separate areas for the verification process: one area had spatial and temporal dimensions similar to those of the training area, and the other was located far from the training area and covered a different time period.
An area of 62.37 km² in Subuk-myeon, Jeonnam province, in 2018 was selected as the first verification area. For the Subuk-myeon area, orthographic image source data obtained for the same period as the study data (2018) exist, and spatially, it is located in the same region as the study data (Jeonnam province).
As the second verification area, an area of 79.61 km² in Daeso-myeon, Chungbuk province, in 2016 was selected. Daeso-myeon was selected because orthographic image source data recorded at a different time period (2016) and located far from the study data were available. These two verification areas are shown in Figure 5.


Orthographic Image
The orthographic images of the study area (2018) and the verification areas (2018, 2016) were obtained from the National Geographic Information Institute (NGII). The orthographic images are produced through geometric correction and orthometric correction of aerial photographs recorded in each period. The spectral channels of HRRS image include the red, green, and blue bands. The resolution of the orthographic images of the established study area (2018) and verification areas (2018, 2016) is 51 cm/pixel.

Land Cover Map
The land cover maps of the selected site for model training (2018) and the two verification sites (2018, 2016) were established using the Korean Environmental Geographic Information System (EGIS). The land cover maps of EGIS are the results of on-screen digitizing of the orthographic images from the NGII, assigning various colors to the respective land use types as shown in Table 1. It should be noted that the color codes were assigned such that the sub-category classes under a main category have similar color hues, i.e., red for urbanized areas, yellow for agricultural land, dark green for forest, light green for grassland, violet for wetland, light blue for barren land, and dark blue for water. The land cover map was used as the ground truth serving as the target for the supervised model training. As previously described in Equations (3) and (4), the color codes in Table 1 were also used for calculating the Gaussian distance for the land cover assignment at the pixel level. The land cover map is presented in three different categories based on the level of detail: seven items in the main category, 22 items in the parent subcategory, and 41 items in the child subcategory.

Training Land Cover Classification Model
The locations of the established orthographic images and land cover maps of the study area were aligned using ArcMap (ESRI, Ver. 10.1). To train the land cover classification model, the acquired data were split vertically and horizontally into 256 × 256 pixel (130 m × 130 m) tiles, an appropriate size for confirming land uses. The land cover map classified by the colors of the child subcategory (41 items) was used for the model training. A total of 32,384 orthographic images and 32,384 land cover maps of the same areas were created.
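The tile splitting can be sketched as a sliding-window crop. The function below is an illustrative stand-in, not the authors' code; with a stride equal to the tile size it reproduces the non-overlapping training split, and with a stride of 64 pixels it reproduces the 75% overlap used by the pre-processing module:

```python
import numpy as np

def tile_image(image, tile=256, stride=64):
    """Split an H x W x C image into tile x tile patches.
    stride=64 gives the 75% overlap of the pre-processing module;
    stride=tile gives the non-overlapping split used for training data."""
    h, w = image.shape[:2]
    tiles = []
    for top in range(0, h - tile + 1, stride):
        for left in range(0, w - tile + 1, stride):
            tiles.append(image[top:top + tile, left:left + tile])
    return tiles

img = np.zeros((512, 512, 3), dtype=np.uint8)
print(len(tile_image(img, stride=256)))  # 4 non-overlapping tiles
print(len(tile_image(img, stride=64)))   # 25 overlapping tiles (5 x 5 positions)
```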
For effective model training, an image augmentation process that increases the amount of study data was performed. The diversity of the study data was increased by rotating the generated orthographic images and land cover maps by angles of 0°, 90°, 180°, and 270°. Finally, 129,536 sheets each of orthographic images and land cover maps were used for training the model.
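The rotation augmentation can be sketched with NumPy's quarter-turn rotation; this is an illustrative stand-in for the authors' pipeline, rotating image and label together so they stay aligned:

```python
import numpy as np

def augment_rotations(image, label):
    """Rotate each image/label pair by 0, 90, 180, and 270 degrees,
    quadrupling the training data (32,384 pairs -> 129,536 pairs)."""
    pairs = []
    for k in range(4):  # k quarter-turns counter-clockwise
        pairs.append((np.rot90(image, k), np.rot90(label, k)))
    return pairs

img = np.arange(12).reshape(3, 4)
lab = np.ones((3, 4))
pairs = augment_rotations(img, lab)
print(len(pairs))         # 4
print(pairs[1][0].shape)  # (4, 3) after a 90-degree rotation
```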
The model training was conducted using an Intel i7-9700K CPU and NVIDIA GeForce GTX 1060 6 GB hardware, with Python (Ver. 3.6.8) and PyTorch (Ver. 1.0.1). Mean squared error loss (MSELoss) was used for estimating errors in the model, and the training was terminated when the error value fell below 0.01. The training was completed with an MSELoss value of 0.0077 after 84 passes over the 129,536 sheets of the established study data.
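A minimal sketch of this training setup, assuming a tiny stand-in model in place of the full FusionNet (the actual architecture and data loading are not reproduced here, and the dummy data below is random):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the FusionNet classifier: a single conv layer mapping RGB tiles
# to a 3-channel color-coded output (the real network is far deeper).
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)
criterion = nn.MSELoss()  # the loss used in the paper
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

x = torch.rand(4, 3, 64, 64)  # dummy orthographic tiles
y = x.clone()                 # dummy color-coded land cover targets

loss = None
for epoch in range(200):      # the paper stopped once MSELoss < 0.01
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    if loss.item() < 0.01:
        break

print(f"final loss: {loss.item():.4f}")
```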

Verification Method for Land Cover Classification Model
The accuracy of the land cover classification was evaluated by creating accuracy metrics that indicate the consistency of items between the results of the two evaluators, and by estimating the quantitative index values of overall accuracy and kappa coefficient based on these metrics.
The accuracy metrics were created by comparing the land cover map created by the on-screen digitizing method (reference land cover) with that classified by the developed model (classified land cover). Consistent land cover classification results between the two evaluators (reference land cover, classified land cover) appear on the diagonal of the accuracy metrics.
Producer's accuracy and user's accuracy show the ratio of correctly matched area for each land use in the reference data and classified data, respectively (Equations (5) and (6)). The overall accuracy was estimated using Equation (7), which indicates the ratio of the correctly classified land cover area to the entire area. All three accuracy indicators range from 0 to 1, and the classification is more accurate as the value approaches 1. Although the overall accuracy enables an intuitive accuracy evaluation of the land cover classification, it has the limitation of not accounting for the possibility of chance agreement between the two classifications.
Thus, the kappa coefficient, which excludes the probability of chance agreement from the overall accuracy, was also used in the accuracy evaluation of the land cover classification. The kappa coefficient measures agreement relative to randomly arranged accuracy metrics (Equation (8)). It ranges from −1 to 1, and the classification attains higher accuracy as the value approaches 1. Model performance was evaluated by the strength criteria provided by Landis and Koch [42].
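Both indices can be computed directly from the accuracy metrics; a minimal sketch consistent with the descriptions of Equations (7) and (8), using a hypothetical two-class matrix:

```python
import numpy as np

def overall_accuracy(cm):
    """Equation (7) as described: correctly classified area / total area."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

def kappa_coefficient(cm):
    """Equation (8) as described: agreement corrected for chance agreement."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                       # observed agreement
    pe = (cm.sum(0) * cm.sum(1)).sum() / n**2   # expected chance agreement
    return (po - pe) / (1 - pe)

cm = np.array([[40, 5], [10, 45]])  # hypothetical 2-class accuracy matrix
print(round(overall_accuracy(cm), 3))   # 0.85
print(round(kappa_coefficient(cm), 3))  # 0.7
```

On the Landis and Koch scale, a kappa of 0.7 would fall in the "substantial" band (0.6 to 0.8) used later in this paper.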

Performance of Land Cover Classification at the Child Subcategory
Model performance in land cover classification for Subuk-myeon (Jeonnam province, 2018) and Daeso-myeon (Chungbuk province, 2016) is presented in terms of the number of classified land covers that match the reference land cover labels, as shown in Figure 6. A total of 41 land covers at the child subcategory level were classified with the developed model. As can be seen from the darker gray along the diagonal, the developed model performed reasonably well in land cover classification. Overall, the model demonstrated better accuracy for the urban, agriculture, forest, and water areas, while performing relatively poorly for grass, wetlands, and barren lands, which show a wider spectrum of classified land covers.


Figure 6. Model performance matrices for (a) the Subuk-myeon and (b) Daeso-myeon areas. The accuracy is presented in gray scale based on the percentage of classified pixels that match the reference land cover label relative to the total number of pixels; darker gray indicates that more pixels were classified to the given land cover. Refer to the color codes for the detailed land covers given in Table 1.
Most misclassifications at the child subcategory (41 classes) level occurred within the main categories (seven classes) for the urbanized and agriculture areas. Some sporadic points deviate from the diagonal line but still remain within the same square of the respective main land cover class in Figure 6. Several urban land covers in the 41-class subcategory are buildings of similar appearance, making it difficult for the developed model to differentiate their usage, for example among apartment, industrial, and commercial buildings, as shown in Figure 6(a-1,a-2,b-1,b-2). Agricultural areas also include land cover subclasses with similar outward shapes depending on consolidation, which resulted in the misclassifications within the main category (Figure 6(a-3,b-3)).
However, grass and barren lands showed wider misclassifications across the main land cover categories. Many grasslands were misclassified as forest, wetlands, and barren lands, whose appearances as natural landscapes are similar (Figure 6(a-4,a-5,b-4,b-5)).
The statistics of the 16 different perspective classifications are presented in Figure 7. As aforementioned, the final land cover classification was assigned statistically based on the maximum count of a given land cover class. The greater the maximum count, the more consistently the developed model performed for a given class. The average counts for urban, agriculture, forest, and water land covers were greater than 14, which indicates approximately 87% consistency (14 out of 16 classifications). This also implies that considering multiple viewpoints could improve classification consistency, reducing potential errors by 13% compared to a single-pixel-based classification. Consistent with Figure 6, the maximum counts for wetland and barren lands were smaller than nine, which means nearly half of these categories could be misclassified as other land covers if the 16 different perspective classifications were not implemented. Thus, the 16-perspective application can increase model accuracy by improving the classification consistency.
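The statistical assignment described above can be sketched as a per-pixel majority vote over the 16 perspective classifications (a simplified stand-in for the model's post-processing module):

```python
import numpy as np

def vote_land_cover(per_view_classes):
    """Combine the 16 per-perspective classifications of one pixel by
    majority vote; the maximum count also measures consistency."""
    values, counts = np.unique(np.asarray(per_view_classes), return_counts=True)
    best = counts.argmax()
    return values[best], int(counts[best])

# 14 of 16 views agree on class 3 -> high consistency (~87%)
views = [3] * 14 + [7, 9]
label, count = vote_land_cover(views)
print(label, count)  # 3 14
```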

Classification Accuracy of the Aggregated Land Cover to Main Category
The overall accuracy of the land cover classification was evaluated by aggregating the classified land covers from the child subcategory level to the main category level. A land cover map (Figure 8c) of an orthographic image (Figure 8a) of Subuk-myeon, which is spatially and temporally similar to the study area (Jeonnam province, 2018), was produced by the land cover classification model developed in this study. This result was compared with the land cover map (Figure 8b) obtained from EGIS, and the accuracy metrics are presented in Table 2.

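Aggregating the classified results from the child subcategory to the main category amounts to relabeling classes through a lookup table before rebuilding the accuracy metrics. A minimal sketch, using hypothetical class names (the real EGIS hierarchy has 41 child classes under seven main categories):

```python
# Hypothetical subset of the EGIS hierarchy.
CHILD_TO_MAIN = {
    "paddy_field": "agriculture",
    "upland": "agriculture",
    "greenhouse": "agriculture",
    "deciduous_forest": "forest",
    "coniferous_forest": "forest",
    "mixed_forest": "forest",
}

def to_main_category(child_labels):
    """Relabel child subcategory predictions to their main category."""
    return [CHILD_TO_MAIN[c] for c in child_labels]

preds = ["paddy_field", "mixed_forest", "greenhouse"]
print(to_main_category(preds))  # ['agriculture', 'forest', 'agriculture']
```

This relabeling is why misclassifications confined within a main category (e.g., apartment vs. commercial building) disappear at the aggregated level.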
Visual comparison of the land cover map of EGIS and the land cover map produced by the model shows an overall match between the two maps. However, the result showed large differences within the green areas of deciduous, coniferous, and mixed forests, which were all classified as forest area. The classification of the overall forest area showed a user's accuracy of 95.02% and a producer's accuracy of 74.37%. This high accuracy was achieved because the developed model is effective in separating forest from other areas; however, it is less effective in identifying detailed forest types within the forest area. As the forest area has large differences in elevation, shades develop based on the terrain, which affects the colors of the orthographic images. Such differences in color made it difficult to classify the forests in detail.
The qualitative indices of overall accuracy and kappa coefficient were 0.81 and 0.71, respectively. This indicates a substantial degree of accuracy, with the kappa coefficient ≥0.6 and <0.8. This is an improvement over the overall accuracy of 73.7% and kappa coefficient of 0.69 reported for land cover classification using an object-based algorithm in an agricultural region [43].
Agricultural area, forest area, and water showed high classification accuracy. The user's and producer's accuracies from the accuracy metrics were 92.58% and 90.68% for agricultural area, 95.02% and 74.37% for forest area, and 90.27% and 82.41% for water, respectively. However, the classification accuracies of grassland (51.88%, 71.04%), wetlands (54.20%, 21.38%), and barren lands (9.70%, 11.10%) were low. This was caused by the misclassification of grassland as forest, wetland as grassland and barren lands, and barren lands as wetland and agricultural area.
This might result from the ambiguity of the land cover, which causes difficulties in classification where the forms of land uses are not clear (Figures 9 and 10). In the space surrounded by the central river of the orthographic image (Figure 10a) near the river in Subuk-myeon, it is difficult to clearly classify the wetlands, barren lands, and grasslands by visual reading. In the land cover map of EGIS (Figure 10b), this space was classified as a single land use (wetland) based on the land boundary, whereas the land cover classification model assigned an optimal land use per pixel (Figure 10c). The differences in classification results caused by this ambiguity largely affected the classification accuracy of wetlands and barren lands.
Wetlands are generally located in lower land areas and thus pool water during the rainy season, while grasslands and barren lands are dry lands with and without vegetation, respectively. Thus, wetlands, grasslands, and barren lands can share similar landscapes, and their land cover changes depending on the presence of water and vegetation. This may have caused some ambiguity among those land covers, resulting in relatively poor performance. The ambiguity could be alleviated if additional indicators such as the NDWI (normalized difference water index) and NDVI (normalized difference vegetation index) were used along with the RGB values.
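As a sketch of the suggested indices: both are simple band ratios, though note that they require a near-infrared (NIR) band, which the RGB orthographic images used here do not provide. The reflectance values below are illustrative only.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index: high over vegetation."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (nir - red) / (nir + red + 1e-9)

def ndwi(green, nir):
    """Normalized difference water index (McFeeters): high over open water."""
    green, nir = np.asarray(green, float), np.asarray(nir, float)
    return (green - nir) / (green + nir + 1e-9)

# Hypothetical reflectances: vegetation reflects strongly in NIR, water absorbs it.
print(round(float(ndvi(0.5, 0.1)), 2))   # 0.67 -> vegetated
print(round(float(ndwi(0.3, 0.05)), 2))  # 0.71 -> water
```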
Figure 10. Example of ambiguity of land cover in wetland and barren lands (purple is wetland, gray is barren lands, green is grass land, blue is river, red is road). (a) Orthographic image, (b) land cover map, (c) classification result.
Additionally, the overall portions of wetland and barren areas are relatively small compared to other land covers, so the model might have been under-trained for them. The poor performance for wetland and barren areas could be improved by training the model with more data specific to these land covers. Figure 11 shows the orthographic image of Daeso-myeon (Figure 11a), which is spatially (Chungbuk province) and temporally (2016) different from the study area (Jeonnam province, 2018), and the land cover map (Figure 11c) produced by the land cover classification model. The accuracy metrics obtained by comparison with the land cover map of EGIS (Figure 11b) are presented in Table 3.
Figure 11. (a) Orthographic image, (b) land cover map, and (c) classification results of the Daeso-myeon area.
Although the verification area was both spatially and temporally different from the study area, visual comparison of the land cover map of EGIS with the map produced by the model confirmed that the classification was conducted well.
In particular, the qualitative indices of overall accuracy and kappa coefficient were 0.75 and 0.64, respectively. Compared with the Subuk-myeon area (overall accuracy of 0.81, kappa coefficient of 0.71), which was similar to the study area, the accuracy was approximately 10% lower. However, it still showed a substantial degree of accuracy, with a kappa coefficient ≥0.6 and <0.8. This result confirms the possibility of applying the land cover classification to general orthographic image data regardless of acquisition time or location. As for the classification accuracy of each item, the results for agricultural area (user's accuracy of 87.16%, producer's accuracy of 89.16%) were accurate, whereas the results for wetlands (user's accuracy of 20.47%, producer's accuracy of 8.92%) and barren lands (user's accuracy of 42.29%, producer's accuracy of 18.09%) showed a lower level of accuracy. Thus, the results are similar to those from the aforementioned verification of Subuk-myeon.

Land Cover Classification of the Agricultural Fields
Agricultural land cover data are widely used for various purposes, from rural hydrological analysis to soil conservation and irrigation planning. To this end, further investigation of agricultural lands was conducted by analyzing the parent-level land cover. The model simulation results for Daeso-myeon, one of the verification areas, were evaluated by estimating the classification accuracy at the subcategory level, including paddy field, upland, and greenhouses.
Agricultural lands (equivalent to the second row and column in Table 3) were extracted from the global simulation results and rearranged based on the subcategories, as presented in Table 4. All land cover classes other than agricultural lands were lumped into a single "other" land cover class. The model performance for agricultural lands was then evaluated, and the overall accuracy and kappa coefficient were also estimated. The land cover classification in the agricultural fields showed an overall accuracy of 0.83 and a kappa coefficient of 0.73, corresponding to substantial agreement. In evaluating the accuracy per item, the classification of paddy (user's accuracy of 88.56%, producer's accuracy of 83.70%) and upland (user's accuracy of 71.11%, producer's accuracy of 78.55%) showed a higher level of accuracy. In both items, some paddy fields were misclassified as upland and some uplands as paddy fields. This arises from an ambiguity in boundaries that occurs when classifying the land cover per pixel, which is a characteristic of the model (Figure 12).
In reality, the land uses of paddy field and upland take the form of parcels with boundaries. However, the model produces an optimal land cover classification per pixel, not per parcel. As indicated by the red dotted lines in Figure 12, most of a parcel of paddy field is classified as paddy field, whereas some parts are classified as upland. This boundary ambiguity could be improved by an additional process of applying land parcel boundaries to the pixel-based land cover. The classification of greenhouses had a user's accuracy of 74.86% and a producer's accuracy of 59.27%, with some areas classified as upland. After the structure of a greenhouse is demolished, the land shows characteristics similar to upland in the orthographic image, although it remains classified as greenhouse in the land cover map of EGIS.
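The suggested post-processing of applying parcel boundaries can be sketched as a per-parcel majority vote over the pixel-level classes; the parcel-ID raster here is a hypothetical input, as the paper does not specify a boundary data source:

```python
import numpy as np

def relabel_by_parcel(pixel_classes, parcel_ids):
    """Replace each pixel's class with the majority class of its parcel,
    removing the speckle along parcel boundaries."""
    out = pixel_classes.copy()
    for pid in np.unique(parcel_ids):
        mask = parcel_ids == pid
        values, counts = np.unique(pixel_classes[mask], return_counts=True)
        out[mask] = values[counts.argmax()]
    return out

# One parcel (id 1) mostly paddy (class 0) with a few upland (class 1) pixels.
classes = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]])
parcels = np.ones((3, 3), dtype=int)
print(relabel_by_parcel(classes, parcels))  # entire parcel becomes class 0
```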
The classification accuracy of orchards and other cultivation plots was evaluated to be lower than that of the other items. The land cover map of EGIS classifies areas cultivating garden trees, street trees, etc. (other than fruit trees), as well as livestock production facilities, as other cultivation lands. The resulting ambiguity, between orchards and other cultivation lands cultivating garden and street trees, and between greenhouses and other cultivation lands used as livestock production facilities, became a limiting factor.

Conclusions
As an effort to improve the updating of land cover maps, especially for agriculture, this study developed a land cover classification model using the CNN-based FusionNet network structure and verified its applicability for two areas with different spatial and temporal characteristics.
The land cover classification model was designed to perform accurate land cover classification by reflecting adjacent land uses based on the CNN. In addition, by reading 16 images of different perspectives for the land cover classification of a single area, the classification consistency was increased.
Performance of the developed model was reasonably good, demonstrating overall accuracies of 0.81 and 0.75 and kappa coefficients of 0.71 and 0.64 for the two verification areas, respectively. However, the accuracies for wetlands and barren lands were 26.24% and 20.30%, as these classes were substantially misclassified as grassland and wetland, respectively. These areas were relatively small and, thus, further training with data specific to these land covers may improve the accuracy.
When considering only the agricultural areas, the model performance was better, showing an overall accuracy of 0.83 and a kappa coefficient of 0.73. Moreover, the classification accuracies of paddy field and upland averaged 80.48%.
It was concluded that the developed model can enhance the efficiency of the current slow process of land cover classification and thus assist the rapid updating of land cover maps.
The developed CNN-based land cover classification model classifies land cover on a pixel unit, which can be different from the actual land use of parcel unit. So, considering land parcel boundaries