Classification of Landscape Affected by Deforestation Using High-Resolution Remote Sensing Data and Deep-Learning Techniques

Abstract: Human-induced deforestation has a major impact on forest ecosystems, and methods for its detection and analysis should therefore be improved. This study efficiently classified landscape affected by human-induced deforestation using high-resolution remote sensing and deep-learning. The SegNet and U-Net algorithms were selected for application to high-resolution remote sensing data obtained by the Kompsat-3 satellite. Land and forest cover maps were used as base data to construct accurate deep-learning datasets of deforested areas at high spatial resolution, and digital maps and a softwood database were used as reference data. Sites were classified into forest and non-forest areas, and a total of 13 classes (2 forest and 11 non-forest) were selected for analysis. Overall, U-Net was about 11.5 percentage points more accurate than SegNet (74.8% vs. 63.3%), although SegNet performed better for the hardwood and bare land classes. The SegNet algorithm misclassified many forest areas, but no non-forest area. The accuracy of the U-Net algorithm was reduced by misclassification among sub-items, but U-Net performed very well at the forest/non-forest classification level, with 98.4% accuracy for forest areas and 88.5% for non-forest areas. Thus, deep-learning modeling has great potential for estimating human-induced deforestation in mountain areas. The findings of this study will contribute to more efficient monitoring of damaged mountain forests and the determination of policy priorities for mountain area restoration.


Introduction
Forests perform many important roles that influence the lives of humans [1]. They store large amounts of carbon in vegetation and soil, and exchange it for oxygen (i.e., contribute oxygen to the atmosphere) [2]. Forests also affect the urban thermal environment via sheltering and evaporative cooling. Their social impacts cannot be ignored either [3-5]. Forests produce wood for lumber, help protect riverine ecosystems, prevent soil erosion, and mitigate climate change through carbon exchange [1,2]. They also have indirect economic value; they promote water security through the collection of approximately 139,000 gallons of rainwater annually, and they can reduce air conditioning costs by up to 56% [3]. One study reported that investment in small urban forests improved net profit by at least $232,000 through reducing air conditioning and water storage fees [6,7]. The economic value of Mediterranean forests, including the value of wood products such as cork [...]
Remote Sens. 2020, 12, x FOR PEER REVIEW 3 of 16
[...] Next, study areas showing recent deforestation were selected using high-resolution remote sensing data acquired by the Kompsat-3 satellite, and datasets for deep-learning using segmentation data were constructed to incorporate spatial characteristics. Classification items were selected based on different types of reference data. Third, the datasets were divided into learning and testing sets, and the learning data were applied to the selected deep-learning algorithms. Hyperparameters were tuned in the algorithm learning process, and the accuracy of the best-performing configuration was evaluated on the test data. Finally, the results were analyzed to determine the applicability of the algorithm and data for the assessment of landscape affected by deforestation (Figure 1).

Method
Deep-learning algorithms are currently used in various remote sensing and spatial information studies. The convolutional neural network (CNN) is the most widely used deep-learning algorithm in computer vision. CNNs reinforce the pixel characteristics of input images through convolution, iteratively condense the reinforced characteristics into feature maps through pooling, and ultimately pass them to fully connected layers for classification. The CNN has evolved into various architectures, including LeNet-5, AlexNet, and GoogLeNet. However, the spatial characteristics of input images and the locational characteristics of objects are not preserved in these processes.
Fully convolutional network (FCN)-based algorithms have been used recently to solve this problem. In FCNs, fully connected layers are replaced with convolution layers to overcome the limitations of the CNN model, such as the loss of image location information and the fixed size of input images. Due to these characteristics, FCNs can not only classify objects, but also segment them semantically [37].
Various semantic segmentation models have emerged recently to supplement FCN methods [38]. Among them, SegNet [39] is effective in terms of learning speed and accuracy [40]. The architecture of SegNet consists of an encoder and a decoder. The encoder compresses the image and extracts features, using the rectified linear unit (ReLU) as the activation function. Upon its completion, the decoder restores the image. Image spatial information is maintained during decoding because upsampling reuses the pooling indices recorded by the corresponding pooling layer of the encoder. This feature differentiates SegNet from FCNs. When image reconstruction is complete, each pixel is classified using a softmax function (Figure 2) [39,41].
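To make the link between encoder pooling and decoder restoration concrete, the following minimal NumPy sketch shows max pooling that records the position of each maximum, and an unpooling step that writes the pooled values back to those positions. It is an illustration of the general idea only, not the network used in this study; the function names and the 2 × 2 window are assumptions made for the example.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """k x k max pooling on a 2-D array.

    Returns the pooled array and, for each window, the flat position
    (0 .. k*k-1) of its maximum. Hypothetical helper for illustration.
    """
    h, w = x.shape
    assert h % k == 0 and w % k == 0, "height and width must be multiples of k"
    # windows[i, j] holds the flattened k*k window at pooled position (i, j)
    windows = (x.reshape(h // k, k, w // k, k)
                .transpose(0, 2, 1, 3)
                .reshape(h // k, w // k, k * k))
    idx = windows.argmax(axis=-1)
    pooled = np.take_along_axis(windows, idx[..., None], axis=-1)[..., 0]
    return pooled, idx

def max_unpool(pooled, idx, k=2):
    """Place each pooled value back at its recorded position; all other cells are zero."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, k * k), dtype=pooled.dtype)
    np.put_along_axis(windows, idx[..., None], pooled[..., None], axis=-1)
    return (windows.reshape(ph, pw, k, k)
                   .transpose(0, 2, 1, 3)
                   .reshape(ph * k, pw * k))
```

Because only the positions are stored, the decoder recovers where each strong response was located without keeping the full encoder feature map.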
U-Net [42] was developed based on the FCN and is applied mainly to the segmentation of medical images, for which only small numbers of images are available. The U-Net model architecture resembles the letter U [43], with a contracting path on the left and an expansive path on the right (Figure 3). The contracting path takes an image patch of N × N × C (C channels) as its input layer. Along the contracting path, sub-sampling is performed using convolution layers, ReLU activation functions, and max pooling. In the expansive path, U-Net has two defining characteristics: the copy-and-crop step, which brings source information from the contracting path through a skip connection [44], and the use of convolution layers, rather than fully connected layers, in the image restoration stage. Input images are mirrored at their borders to predict values near the patch boundary [45]. U-Net processes input data in patch units instead of with a sliding window, thereby improving its speed over that of previous networks. The algorithm accurately captures the context of the image through concatenation in the copy-and-crop step while solving the FCN issue of localization [42].
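The copy-and-crop step can be sketched in a few lines of NumPy. The example assumes feature maps stored as (height, width, channels) arrays and a central crop; the function names are hypothetical and the sketch is not taken from the implementation used in this study.

```python
import numpy as np

def center_crop(feat, target_hw):
    """Centrally crop a (H, W, C) feature map to the target (height, width)."""
    h, w = feat.shape[:2]
    th, tw = target_hw
    top, left = (h - th) // 2, (w - tw) // 2
    return feat[top:top + th, left:left + tw, :]

def copy_and_crop(encoder_feat, decoder_feat):
    """Crop the encoder feature map to the decoder's spatial size and
    concatenate the two along the channel axis (the skip connection)."""
    cropped = center_crop(encoder_feat, decoder_feat.shape[:2])
    return np.concatenate([cropped, decoder_feat], axis=-1)
```

Concatenation, rather than addition, keeps the fine-grained encoder features and the upsampled decoder features as separate channels, so the following convolution layers can weigh them freely.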

Satellite Image Preprocessing
High-resolution Kompsat-3 images were used to construct the datasets. Kompsat-3 data comprise five bands: panchromatic, blue, green, red, and near-infrared (NIR). Systematic errors in the data were eliminated using rational polynomial coefficient sensor modeling data, followed by orthorectification through digital differential rectification based on observed ground control points. Atmospheric correction was performed using the COST model, which approximates the transmittance from the sun to the earth by a cosine term [46]. The resolution of the Kompsat-3 panchromatic image was 0.7 m; the resolution of the multispectral bands was brought to the same level using pan-sharpening, which merges high-resolution panchromatic images with relatively low-resolution multispectral images [47]. The pan-sharpened files used in this study to generate 0.7-m-resolution multispectral images were obtained from the Korea Aerospace Research Institute (KARI).
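As an illustration of what pan-sharpening does, the sketch below applies the Brovey transform, one common ratio-based method. This is an assumption made purely for the example: the pan-sharpened files in this study were supplied by KARI, and the text does not state which method was used. The sketch also assumes that the multispectral bands have already been resampled to the panchromatic grid.

```python
import numpy as np

def brovey_pansharpen(ms, pan, eps=1e-6):
    """Brovey-style pan-sharpening (illustrative, not the KARI procedure).

    ms  : (H, W, B) multispectral array, already resampled to the pan grid
    pan : (H, W) panchromatic array
    Each band is scaled by pan / mean(ms bands), so the band ratios are
    preserved while the spatial detail follows the panchromatic image.
    """
    ms = ms.astype(np.float64)
    intensity = ms.mean(axis=-1)
    ratio = pan.astype(np.float64) / (intensity + eps)  # eps avoids division by zero
    return ms * ratio[..., None]
```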

AI Learning Dataset Construction
Precise training data are needed to apply deep learning to high-resolution satellite images and to ensure accurate classification of regional attributes. Thus, learning and testing datasets were constructed after satellite image preprocessing. The study area was selected in consultation with Ministry of Environment land cover maps (1:5000), Korea Forest Service digital forest cover maps (1:5000), and a mountain forest database (Table 1). The selected study area lies near Bonghwa-gun, Gyeongsangbuk-do, which has 19,031 km² of mountainous area (Figure 4). In addition, according to Korea Forest Service data, the number of deforestation cases there is the highest in the country [14]. Furthermore, more than half of the forest area is made up of softwood and hardwood, more than 70% of which is softwood.
A training dataset was constructed using the study area data. The data were projected onto the WGS 1984 UTM Zone 52N coordinate system, and a learning data plan was established based on a 1:5000 scale. The software used for this process was ArcGIS ver. 10.3, ENVI ver. 5.1, and ERDAS Imagine 2015 (or later versions). The working procedure for data labeling comprised five stages: data collection, data preparation, land cover classification, quality inspection, and creation of learning data.
For this study, satellite images and land and forest cover maps were collected as base data, and a continuous digital map (1:5000), mountain forest area database, and softwood database were collected as reference data (Figure 5). The study area was selected to include at least 50% forest. Land cover classification was performed for items that were visually distinct on the satellite images. Primary classification consisted of dividing the data into forest and non-forest areas, after which forest areas were subdivided into softwood and hardwood and non-forest areas were subdivided into classes such as buildings, farmland, and bare land, resulting in a total of 13 cover types (Table 2). Detailed revision of the data included corrections such as the elimination of roads blocked by trees in the satellite images (Figure 6).
Next, a quality inspection of the classified items was conducted to improve the efficiency of learning dataset creation. This inspection consisted of a visual check of the classification items at each step. Labels with an accuracy of ≥95% were finalized after precision revision, and those with lower accuracy were relabeled. Finally, learning and test datasets were constructed according to the relevant image data. The extent and resolution of the labeling data were equivalent to those of the image data. Image and labeling data were converted to GeoTIFF (unsigned 8-bit integer), a standard format for information compatibility (Figure 7). Prior to the inclusion of satellite image and labeling data in the datasets, 63 pairs of images were created. Each image and its ground truth represent the same area, and every pair covers an area that is at least 50% forest. The satellite image data corresponding to the labeling datasets included the red, green, blue, and NIR bands. The 63 datasets were subdivided into 50 sets for learning and 13 for testing. The precisely constructed dataset was applied to the SegNet and U-Net algorithms.
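The two dataset rules stated above (every pair is at least 50% forest, and the 63 pairs are divided 50/13) can be sketched as follows. The text does not say how pairs were assigned to the learning and test sets, so the seeded random shuffle is an assumption, as are the function names and the example class ids.

```python
import random
import numpy as np

def forest_fraction(label, forest_ids):
    """Fraction of pixels in a label raster whose class id is a forest class.

    label      : 2-D array of uint8 class ids (hypothetical encoding)
    forest_ids : ids of the forest classes, e.g. softwood and hardwood
    """
    return float(np.isin(label, list(forest_ids)).mean())

def split_pairs(pair_ids, n_train=50, seed=0):
    """Split image/label pair ids into learning and test sets.

    A seeded shuffle is assumed here; the study only reports the 50/13 sizes.
    """
    ids = list(pair_ids)
    random.Random(seed).shuffle(ids)
    return sorted(ids[:n_train]), sorted(ids[n_train:])
```

A pair would be kept only if `forest_fraction(label, forest_ids) >= 0.5`, and `split_pairs(range(63))` then yields 50 learning ids and 13 test ids.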

Hyperparameter Tuning
Model learning was performed first, followed by estimation of the optimum hyperparameters. The hyperparameters were estimated with consideration of the number of iterations, batch size, patch size, and learning rate. For every hyperparameter, changes in the training loss value and test accuracy were monitored to avoid overfitting. No underfitting was observed; to minimize overfitting, a high-quality dataset was sought and the loss value and final accuracy during the learning process were considered. Iteration refers to the number of learning trials; we began with 100,000 iterations and then increased this in steps of 100,000 while considering the training loss value and test accuracy. SegNet was most accurate and had its lowest loss value after 100,000 iterations, whereas U-Net was most accurate after 300,000 iterations. The training loss value was below 0.4 at these settings (300,000 iterations for U-Net and 100,000 for SegNet). Batch size is the number of input samples processed per iteration; in this study, it was fixed at 100 for SegNet and 1 for U-Net. A patch size of 256 × 256 was used. The learning rate determines how strongly the model weights are updated in each iteration; in this study, it was set to 1 × 10⁻⁵ for both the SegNet and U-Net algorithms (Table 3).
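For reference, the reported settings can be written as a small configuration object. The sketch below is a hypothetical way to record them, not code from the study. The `patches_seen` property assumes that one iteration processes one batch, and `pick_iterations` is an assumed selection rule (highest test accuracy, ties broken by lower training loss) that mirrors the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as reported in the text (Table 3)."""
    model: str
    iterations: int
    batch_size: int
    patch_size: int = 256        # patches are 256 x 256 pixels
    learning_rate: float = 1e-5  # same for both models

    @property
    def patches_seen(self) -> int:
        # Assumes one batch per iteration
        return self.iterations * self.batch_size

SEGNET = TrainConfig(model="SegNet", iterations=100_000, batch_size=100)
UNET = TrainConfig(model="U-Net", iterations=300_000, batch_size=1)

def pick_iterations(history):
    """Choose an iteration count from (iterations, train_loss, test_accuracy) rows.

    Assumed rule: highest test accuracy, with lower training loss as tie-breaker.
    """
    best = max(history, key=lambda row: (row[2], -row[1]))
    return best[0]
```

Under the one-batch-per-iteration assumption, the two models process very different numbers of patches in total, which is worth keeping in mind when comparing their iteration counts.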

Classification of Deforested Land
After training on the constructed dataset and hyperparameter tuning, the final classification of the mountain deforestation area was performed. These results were obtained from images in which intact and damaged forests were distributed evenly among the 13 test datasets. The softmax function was used to obtain the results for both the SegNet and U-Net models. Thirteen classification items were obtained, and the total accuracy and item-level accuracy were calculated for all items present in the test datasets. Three accuracy metrics were calculated (Table 4). During the learning process, hyperparameter tuning was performed to enhance overall accuracy. Mean intersection over union (MIoU) and frequency-weighted intersection over union (FIoU) values were then derived using the tuned hyperparameters. MIoU, the mean of the IoU values of all classes, is used widely in computer vision-based object detection [48]. In this study, the MIoU values for SegNet and U-Net were 14.0% and 25.4%, respectively; classes covering relatively small areas appear to have shown low accuracy. The FIoU is calculated with greater weight on large-area items; it has been applied to various datasets, such as the COCO dataset [49]. The FIoU values for SegNet and U-Net were 41.4% and 61.6%, respectively, and the total accuracy values were 63.3% and 74.8%, respectively (Figure 8). The U-Net FIoU was therefore about 13 percentage points lower than its total accuracy, within the 10-20% range.
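The three metrics can all be computed from a single confusion matrix. The sketch below uses the standard definitions, which are assumed here because the text does not give explicit formulas: total accuracy is the share of correctly classified pixels, the IoU of a class is TP / (TP + FP + FN), MIoU is the unweighted mean of the class IoUs, and FIoU weights each class IoU by that class's share of the reference pixels. Classes that occur in neither the reference nor the prediction are skipped.

```python
import numpy as np

def segmentation_metrics(conf):
    """Total accuracy, MIoU and FIoU from a confusion matrix.

    conf[i, j] = number of pixels whose reference class is i and predicted class is j.
    """
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)
    truth = conf.sum(axis=1)   # reference pixels per class
    pred = conf.sum(axis=0)    # predicted pixels per class
    union = truth + pred - tp
    # Classes with an empty union get NaN and are ignored below
    iou = np.divide(tp, union, out=np.full_like(tp, np.nan), where=union > 0)
    total = conf.sum()
    return {
        "total_accuracy": tp.sum() / total,
        "miou": np.nanmean(iou),
        "fiou": np.nansum(truth / total * iou),
        "iou_per_class": iou,
    }
```

Because MIoU gives every class the same weight, a few small classes with an IoU near zero pull it far below the total accuracy, whereas FIoU stays closer to the total accuracy when large classes are classified well.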
Item-level classification showed that softwood represented the largest forest area, with a high classification accuracy of 92.6% for U-Net and 84.4% for SegNet (Table 5). Hardwood forest, which covered a smaller area than the softwood forest, was identified with an accuracy of 61.0% by U-Net and 73.3% by SegNet. SegNet showed 54.6% higher accuracy than U-Net in classifying bare land. However, U-Net showed better classification accuracy than SegNet for other non-forest areas, perhaps because SegNet misclassified most non-forest areas as bare land. Fields represented the second largest land cover area among non-forest areas; U-Net showed 84.2% accuracy and SegNet showed 64.1% accuracy for their classification. Among non-forest land classes in the U-Net and SegNet algorithms, fields had the highest classification accuracy. U-Net classified "facility cultivation" with 50.7% accuracy, while SegNet did not classify it at all. In most cases, facility cultivation was misclassified as bare land or paddy field.
U-Net and SegNet showed 21.2% and 11.5% classification accuracies, respectively, for agricultural land including paddy fields; the most frequent classification error was the misclassification of grassland as agricultural land, representing a 21.3% difference in accuracy. For cemeteries, which occupied the smallest area among all land use types, the classification accuracy of U-Net was 14.2%, with errors assigned to forest land, whereas SegNet misclassified all cemetery pixels. Both U-Net and SegNet misclassified cemeteries as hardwood forest. In the U-Net results, the classification accuracy for buildings and roads was less than 30%; these classes were typically misclassified as field. In the SegNet results, buildings and roads were misclassified as bare land and facility cultivation, resulting in low overall accuracy; unlike U-Net, SegNet could not classify buildings and roads. The five land use types with a 0% value in the SegNet model were misclassified as bare land or fields. Land use types that could not be classified by SegNet had significantly fewer pixels than the classified types. The U-Net algorithm also showed low accuracy for land cover types with a small number of pixels.

Discussion
The SegNet algorithm outperformed U-Net in the classification of hardwood forests and bare land; for all other land use types, U-Net was more accurate. U-Net classified forests and non-forests, which was the basis of this study, with 98.4% and 88.5% accuracy, respectively. Previous studies of deforested area based on remote sensing data and deep-learning have also shown a high rate of accuracy (~90%), likely because land cover is divided into only two or three classes [35,50-52].
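Forest/non-forest accuracy of this kind can be obtained by collapsing the multi-class confusion matrix into two groups, so that confusion between sub-classes of the same group no longer counts as an error. The sketch below assumes that accuracy is measured per reference group (correct pixels divided by reference pixels in that group); the function names and the class-to-group mapping are hypothetical.

```python
import numpy as np

def collapse_confusion(conf, groups):
    """Merge classes of a confusion matrix into groups.

    conf   : (K, K) matrix, rows = reference class, columns = predicted class
    groups : length-K sequence, groups[i] = group index of class i
             (e.g. 0 = forest, 1 = non-forest)
    """
    conf = np.asarray(conf)
    groups = np.asarray(groups)
    n_groups = int(groups.max()) + 1
    onehot = np.eye(n_groups, dtype=conf.dtype)[groups]  # (K, n_groups)
    return onehot.T @ conf @ onehot

def per_group_accuracy(conf_grouped):
    """Correct pixels divided by reference pixels, for each group.

    Assumes every group has at least one reference pixel.
    """
    conf_grouped = np.asarray(conf_grouped, dtype=np.float64)
    return np.diag(conf_grouped) / conf_grouped.sum(axis=1)
```

With this grouping, a softwood pixel labelled as hardwood still counts as correct at the forest level, which is why group-level accuracy can be much higher than the 13-class accuracy.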
The overall accuracy of the U-Net model was 74.8%, about 11.5 percentage points higher than that of SegNet (63.3%). These results are consistent with those of previous studies, which have revealed greater accuracy of the U-Net model in single- and multi-class classification [53-56]. SegNet was not able to classify facility cultivation, grassland, cemetery, building, and road. These classes contain small numbers of pixels and showed low accuracy even with the U-Net model, so SegNet appears to perform worse than U-Net on classes with few pixels. The gap between overall accuracy and FIoU exceeds 20 percentage points for SegNet (63.3% vs. 41.4%), which also indicates that SegNet is less accurate than U-Net for items with a small number of pixels. Other studies have shown relatively low classification accuracy for farmland and roads [57], and SegNet has shown lower performance than U-Net in classifying buildings, depending on the dataset [58]. This may be explained by the difference in the decoder-stage upsampling process between U-Net and SegNet.
Forest and non-forest areas were classified with high accuracy (98.4% and 88.5%, respectively) in this study. However, the accuracy of non-forest land cover (grasslands, fields, buildings, and roads) classification was low. The distinction between bare land and fields was ambiguous because the Kompsat-3 images used as learning data in this study were obtained in the spring, when the aboveground plant biomass was low. Roads around disturbed mountain forests are typically made of concrete, rather than asphalt, which may explain the models' low accuracy in distinguishing roads from buildings. The training data for the facility cultivation, bare land, and cemetery land use types were insufficient. The performance of the SegNet and U-Net algorithms differed by item, so it is difficult to achieve high performance for all 13 items with a single algorithm. For example, the U-Net algorithm showed high performance for forest/non-forest classification, whereas SegNet performed better for the hardwood and bare land items. In the future, it will therefore be necessary to combine the algorithms, using for each cover type the one that classifies it best; the hardware performance limitations that arise from running multiple algorithms remain a task to be solved.

Conclusions
In this study, landscape affected by human-induced deforestation from high-resolution Kompsat-3 satellite images was classified using the FCN-based U-Net and SegNet deep-learning algorithms. To ensure efficiency, precise training datasets were constructed from reference data, such as forest and land cover maps. To this end, satellite images were preprocessed, and labeling data were created from the reference data. In total, 13 classes were formed by the subdivision of forest and non-forest areas. Training and verification datasets were applied to the SegNet and U-Net models to estimate hyperparameters, considering batch size, patch size, number of iterations, and learning rate. For SegNet, the hyperparameter estimation was optimal at a batch size of 100 and patch size of 256 × 256, with 100,000 iterations; for U-Net, performance was optimal at a batch size of 1, patch size of 256 × 256, and 300,000 iterations. The final performance of the U-Net and SegNet algorithms was evaluated based on the FIoU and MIoU values.
The overall accuracy of the U-Net model was 74.8%, which was 11.5 percentage points higher than that of SegNet (63.3%). However, for land use types with a small number of pixels, both models showed low accuracy. Overall, the accuracy of U-Net was high; when the U-Net results were grouped into forest and non-forest land, misclassified forest pixels were mostly assigned to another forest class. The sub-class accuracy for non-forest land was lower, but there too misclassification occurred mostly within non-forest land. SegNet outperformed U-Net in the classification of hardwood forest and bare land.
Training datasets can be constructed, and deep-learning algorithms like U-Net applied, for the interpretation of high-resolution satellite images in various ways. Moreover, the U-Net hyperparameters used in this study could be applied to other regions, which may facilitate quantitative analysis of larger areas disturbed by deforestation.
This study has several limitations. First, more advanced deep-learning algorithms have been developed since this study was conducted, and these may improve upon the accuracy of our classification results. Second, larger datasets are needed to train the AI algorithms accurately and to distinguish more clearly the various sub-classes used in this study, especially in non-forest areas. We anticipate that further research using improved algorithms and larger datasets will result in better estimation of the causes of deforestation in mountain areas. Furthermore, the implementation of a stable system for upgrading deep-learning algorithms will facilitate future monitoring and management of damaged mountain forests.