Mapping Forest Vertical Structure in Gong-ju, Korea Using Sentinel-2 Satellite Images and Artificial Neural Networks

As global warming has accelerated in recent years, droughts have become more frequent, and national-level water management has become increasingly important. In particular, accurate understanding and management of forests is essential because their water storage capacity is determined by forest structure. Data on forest vertical structure have typically been constructed from field surveys, which are both costly and time-consuming. Machine learning techniques could instead be applied to analyze, classify, and predict the uncertain internal structure of forests. Therefore, this study aims to map the forest vertical structure, as a basis for estimating forest water storage capacity, from multi-seasonal optical satellite images and topographic data using an artificial neural network (ANN) in Gongju-si, South Korea. For this purpose, 14 input neurons were generated: the normalized difference vegetation index (NDVI), two types of normalized difference water index (NDWI), two types of normalized difference red edge index (NDre), and a principal component analysis (PCA) texture map from Sentinel-2 optical images acquired in spring and fall, together with average and standard deviation canopy height maps derived from topographic height data, namely a digital terrain model (DTM) and a digital surface model (DSM). The training/validation and test datasets for the ANN model were derived from field-survey-based forest vertical structure data. Finally, the forest vertical structure classification map produced by the ANN was evaluated with an error matrix against the field survey results. The overall test accuracy was ~65.7% based on the number of pixels. The results show that the forest vertical structure in Gong-ju, Korea can be efficiently classified using multi-seasonal Sentinel-2 satellite images and the ANN approach.


Introduction
Recently, as droughts associated with global warming have raised the importance of water resource management, interest has grown in forests, which play a large role in the global water cycle. Forests with high water storage capacity contribute to both drought mitigation and flood control [1]. An accurate understanding of forest water storage capacity, as part of the water circulation function of forests, is crucial for preventing water-related disasters and for sustainable national water management [2,3]. However, the forest vertical structure information needed to estimate forest water storage capacity accurately has not been quantified in most forest regions because forests are complex and continuously changing. Forest vertical structure information is therefore essential both for understanding the water storage function of forests and for quantifying their water storage capacity.
Additionally, machine learning has been increasingly used in recent years owing to the abundance of training data, including satellite images. Machine learning techniques can be applied to large amounts of remote sensing data [26] to produce information about continuous internal structures in forest areas with high uncertainty [27,28,29]. When investigating forests over large, highly uncertain areas, applying artificial neural networks (ANN) to satellite remote sensing data could be more effective than time- and cost-consuming aerial photography or field surveys. Support vector machines (SVM) [30,31] and decision tree-based models such as random forests [32] and boosted trees [33,34] are also being applied in various fields.
In this study, forest vertical structure was analyzed from optical satellite images using an ANN, a machine learning method, in order to estimate forest water storage capacity. To extract the input layers, index maps and texture maps were prepared from two seasonal Sentinel-2 optical satellite images after preprocessing. In addition, two types of canopy height maps were produced from the difference between two topographic datasets, the DTM and the DSM. The 14 input layers were designed to capture canopy vitality, image texture, and differences in tree height. These input layers were applied to the ANN algorithm, together with training/validation and test datasets of forest vertical structure constructed from field surveys, to generate a classification map of forest vertical structure. The accuracy of the resulting map was evaluated with an error matrix. The proposed method could support forest water management planning by enabling more accurate estimation of forest water storage capacity than previous methods.

Study Area and Data
The study area is part of Gongju-si, South Korea, where the Charyeong Mountains lie to the north, Gyeryong Mountain lies to the southeast, and major tributaries join the Geum River. The study area covers 864.29 km² at an elevation of ~400 m above sea level. Climatically, it lies in the mid-latitudes within the humid continental climate zone. The Charyeong Mountains block the cold northerly wind, so the area is much warmer than regions to the north; however, because it is an inland basin influenced by the Geum River, the temperature difference between summer and winter is large. Figure 1 shows the study area on the Korean Peninsula with the Sentinel-2 orthoimage. According to the field survey data, the study area contains various deciduous and coniferous trees in both natural and artificial forests. Figure 2 shows (a) the three-dimensional structure, (b) major species, (c) artificial and natural forests, and (d) deciduous and coniferous forests from the existing vegetation data of the study area. Most single-layered forests in this area consist of artificial chestnut forests, whereas the main double-layered forests are natural oyster oak forests, mixed oak hardwood forests, and artificial Pinus rigida forests. The triple-layered forests comprise natural oak forests, artificial Pinus rigida forests, and natural oyster oak forests. The vegetation map of the Gongju area used in this study comes from the 3rd National Natural Environment Survey conducted by the National Institute of Ecology (NIE) in 2009 [35]. The forest vertical structure data from this vegetation map (Figure 2a) were used as the training/validation and test datasets. Of the three layers (canopy, understory, and shrub), a forest containing only one layer was defined as a single-layer structure, a forest containing two layers as a double-layer structure, and a forest containing all three layers as a triple-layer structure.

Methodology
To map the forest vertical structure, probability maps of forest vertical structure were generated by applying an artificial neural network to input layers derived from (1) Sentinel-2 optical images and (2) the DTM (NGII DEM) and DSM (WorldDEM), using a training/validation dataset based on the field-surveyed forest vertical structure map. Considering the characteristics of the forest structures in the study area, the probability maps were classified into single-, double-, and triple-layer structures. Finally, the classification was compared with the field survey test data, and its accuracy was determined with an error matrix. Figure 3 shows the overall flow of this study.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 5 of 18

Preprocessed Layers of Sentinel-2 Images
In this study, two Sentinel-2 images acquired in different seasons, spring and fall, were used (European Space Agency (ESA), Paris, France; Table 1). The Sentinel-2 spectral bands used in this study are defined in Table 2. Because the reflectance of vegetation, especially hardwoods, can differ by season, images from both seasons were used to account for seasonal characteristics. The NDVI, NDWI, and NDre maps and the PCA texture maps were generated from the Sentinel-2 orthoimages of both seasons using Matlab (Mathworks, Natick, MA, USA).

The Sentinel-2 image data were preprocessed with atmospheric and topographic corrections. Radiant energy reaching the sensor is affected by the atmosphere in various ways; atmospheric correction removes these effects, in which the atmosphere adds unwanted energy to the signal reaching the sensor and disturbs the image [36]. In this study, the Sentinel-2 images were therefore converted to Bottom-Of-Atmosphere (BOA) reflectance, which excludes the atmospheric effects present in the original image [37]. Atmospheric correction was performed with ESA's Sen2Cor processor in the Sentinel Application Platform (SNAP).

Table 2. Definition of Sentinel-2 spectral bands used in this study [36].
In addition, in mountainous areas with sloping terrain, reflectance in satellite images varies with the solar incidence angle, producing shaded and sunlit slopes. Topographic correction is therefore necessary in mountainous areas to minimize effects other than those of the forest features. In this study, the statistical-empirical correction [38], a commonly used terrain correction model, was applied to the Sentinel-2 optical images as follows:
$$\rho_h = \rho - (a\cos i + b) + \bar{\rho} \tag{1}$$
where $\rho$ and $\rho_h$ are the pixel values of the original and topographically corrected images, respectively, $\bar{\rho}$ is the mean value of the original image, and $i$ is the solar incidence angle. $a$ and $b$ are the parameters of the statistical-empirical model, i.e., the slope and intercept of the linear regression of $\rho$ against $\cos i$. After the topographic correction, the NDVI, NDWI, and NDre maps were generated from the topographically corrected Sentinel-2 images, which were used for forest vertical structure mapping in this study instead of the original images. NDVI is an index based on the reflectance difference between the red and near-infrared (NIR) bands, and it increases as vegetation becomes more active. It combines the red band (Band 4) and the NIR band (Band 8) as follows [13]:
$$\mathrm{NDVI} = \frac{B_8 - B_4}{B_8 + B_4} \tag{2}$$
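The statistical-empirical topographic correction and the NDVI described above can be sketched with NumPy as below. This is a minimal illustration, not the authors' Matlab code: the cosine of the incidence angle is computed with the standard illumination model from slope, aspect, and solar angles (inputs not given in the text), and fitting the two parameters by least squares over a chosen pixel mask is an assumption about how they were estimated.

```python
import numpy as np


def cos_incidence(slope, aspect, sun_zenith, sun_azimuth):
    """Cosine of the local solar incidence angle i (all angles in radians).

    Standard illumination model for a tilted surface:
    cos i = cos(z) cos(s) + sin(z) sin(s) cos(phi_sun - aspect)
    """
    return (np.cos(sun_zenith) * np.cos(slope)
            + np.sin(sun_zenith) * np.sin(slope) * np.cos(sun_azimuth - aspect))


def statistical_empirical_correction(band, cos_i, mask=None):
    """rho_h = rho - (a * cos_i + b) + mean(rho).

    a (slope) and b (intercept) come from a least-squares fit of rho against
    cos_i over the pixels selected by `mask` (assumed choice, e.g. forest pixels).
    """
    band = np.asarray(band, dtype=float)
    if mask is None:
        mask = np.isfinite(band) & np.isfinite(cos_i)
    a, b = np.polyfit(cos_i[mask], band[mask], deg=1)  # returns [slope, intercept]
    return band - (a * cos_i + b) + band[mask].mean()


def ndvi(red, nir):
    """NDVI = (B8 - B4) / (B8 + B4)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red)


# Synthetic bands whose brightness depends only on illumination
rng = np.random.default_rng(0)
cos_i = rng.uniform(0.3, 1.0, size=(50, 50))
red = 0.03 + 0.05 * cos_i
nir = 0.20 + 0.25 * cos_i
red_h = statistical_empirical_correction(red, cos_i)
nir_h = statistical_empirical_correction(nir, cos_i)
ndvi_h = ndvi(red_h, nir_h)  # NDVI computed from the corrected bands
```

In this toy case the illumination-driven variation is removed entirely, so each corrected band collapses to its mean; real forest pixels keep the residual variation that is not explained by $\cos i$.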
NDWI is widely used for vegetation analysis because it exploits differences in spectral characteristics that depend on vegetation moisture content [16]. NDWIs are commonly computed from either the difference between NIR and SWIR or the difference between green and SWIR [37]. In this study, two types of NDWI were generated from the difference between green (Band 3) and SWIR (Band 11 and Band 12) as follows [39]:
$$\mathrm{NDWI}_1 = \frac{B_3 - B_{11}}{B_3 + B_{11}}, \qquad \mathrm{NDWI}_2 = \frac{B_3 - B_{12}}{B_3 + B_{12}} \tag{3}$$
NDre is an index based on the vegetation red edge bands; higher NDre values indicate more active vegetation. In this study, two types of NDre were created from the three red edge bands (Bands 5-7) provided by Sentinel-2 [40,41].
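The per-season spectral index layers can be computed as in the sketch below. The text does not give the NDre band combinations; the pairs used here, (B6 - B5)/(B6 + B5) and (B7 - B5)/(B7 + B5), are an assumption based on common Sentinel-2 red-edge index definitions. The sketch also assumes all bands were already resampled to a common 10 m grid (Bands 5-7, 11, and 12 are natively 20 m) and uses a hypothetical dictionary keyed by band name.

```python
import numpy as np


def normalized_difference(x, y):
    """(x - y) / (x + y); pixels where x + y == 0 are set to NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    total = x + y
    safe = np.where(total == 0, np.nan, total)
    return (x - y) / safe


def seasonal_indices(bands):
    """Five spectral index layers for one season.

    `bands` is a hypothetical dict mapping Sentinel-2 band names ("B3", "B4", ...)
    to co-registered 10 m reflectance arrays.
    """
    return {
        "NDVI": normalized_difference(bands["B8"], bands["B4"]),
        "NDWI1": normalized_difference(bands["B3"], bands["B11"]),
        "NDWI2": normalized_difference(bands["B3"], bands["B12"]),
        # Assumed red-edge pairs (not specified in the text)
        "NDre1": normalized_difference(bands["B6"], bands["B5"]),
        "NDre2": normalized_difference(bands["B7"], bands["B5"]),
    }


# Spring and fall indices together give 10 of the 14 input layers
spring = {b: np.full((2, 2), v) for b, v in
          [("B3", 0.1), ("B4", 0.05), ("B5", 0.1), ("B6", 0.3),
           ("B7", 0.35), ("B8", 0.4), ("B11", 0.3), ("B12", 0.2)]}
layers = seasonal_indices(spring)
```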
In addition, texture data were used to account for visible differences in forest texture and in the reflectance of forest communities depending on the dominant species. In an RGB image, texture is determined by the distribution of shadows. Single-layer artificial forests, composed of trees of the same species and age, have a smooth texture because tree heights are nearly constant, whereas multi-layered natural forests of varying ages and species have a rough texture [18]. The arrangement of canopies is also considered uniform in single-layer forests and uneven in multi-layer forests [18,22]. To take this into account, principal component analysis (PCA), a representative dimensionality-reduction and feature-extraction technique for multiband image processing, was used [42]. A PCA-transformed image has the advantage of revealing surface characteristics and spectral information that are difficult to discern in the original image [43].
The PCA texture maps were calculated in three steps: (1) generating a base map by applying a 5 × 5 median filter to the PCA image with a moving window; (2) subtracting the base map from the PCA image and applying the root mean square deviation equation to obtain a texture image; and (3) applying a 3 × 3 median filter to the texture image to reduce unwanted noise. The final PCA texture map was produced by converting the calculated values into decibel (dB) units [37,44] as follows.
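The three-step PCA texture procedure can be sketched in NumPy as follows. Because the exact equations are not given in the text above, several details are assumptions: the first principal component is used as the PCA image, the root mean square deviation is taken over a 5 × 5 window, border pixels are handled by reflect padding, and the dB conversion is 10·log10. The helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(img, size):
    """All size x size neighbourhoods of img (reflect-padded): shape (rows, cols, size, size)."""
    pad = size // 2
    return sliding_window_view(np.pad(img, pad, mode="reflect"), (size, size))


def median_filter(img, size):
    return np.median(_windows(img, size), axis=(-2, -1))


def first_principal_component(bands):
    """bands: (n_bands, rows, cols) -> PC1 score image (rows, cols)."""
    n, rows, cols = bands.shape
    X = bands.reshape(n, -1).T.astype(float)
    X -= X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return (X @ vt[0]).reshape(rows, cols)


def pca_texture_db(bands, base_size=5, rms_size=5, smooth_size=3, eps=1e-6):
    pc1 = first_principal_component(bands)
    base = median_filter(pc1, base_size)                  # step 1: 5 x 5 median base map
    resid2 = (pc1 - base) ** 2                            # step 2: PCA image minus base map
    rms = np.sqrt(np.mean(_windows(resid2, rms_size), axis=(-2, -1)))  # RMS deviation
    rms = median_filter(rms, smooth_size)                 # step 3: 3 x 3 median denoising
    return 10.0 * np.log10(rms + eps)                     # dB conversion (assumed 10*log10)


rng = np.random.default_rng(1)
ramp = np.tile(np.linspace(0.0, 1.0, 30), (30, 1))
smooth = np.stack([ramp, 2 * ramp, 0.5 * ramp])           # gently varying canopy
rough = smooth + rng.normal(0.0, 0.2, size=smooth.shape)  # irregular canopy
texture_smooth = pca_texture_db(smooth)
texture_rough = pca_texture_db(rough)
```

The sign of a principal component is arbitrary, but it does not affect the result here because the texture depends only on squared deviations.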

Canopy Height Maps from DSM and DTM
Finally, DTM and DSM data were collected to measure forest height. The DTM used in this study is a 5 m DEM produced by the National Geographic Information Institute (NGII) (Figure 5a). For the DSM, 12 m resolution WorldDEM data, created using TerraSAR-X X-band radar interferometry, were obtained from the German Aerospace Center (Deutsches Zentrum für Luft- und Raumfahrt e.V., DLR) (Figure 5b). Both datasets were resampled to 10 m to match the ground sample distance (GSD) of the Sentinel-2 images, and the canopy height map was generated by subtracting the DTM, which represents the bare-ground terrain height, from the DSM, which includes the height of ground objects such as canopies and buildings. The DSM data were also used for the topographic correction of the Sentinel-2 images.
The canopy height maps were created because forest vertical structure is closely related to canopy height. A canopy height map, calculated as the difference between the DSM and DTM, is effective for measuring tree height over large areas [23]. In this study, the 5 m DTM (NGII DEM), generated from 1:5000 digital topographic maps (NGII, Suwon, South Korea), and the 12 m DSM (WorldDEM), generated by TerraSAR-X SAR interferometry (InSAR), were resampled to the 10 m Sentinel-2 grid and used to extract canopy height. Average and standard deviation canopy height maps were then generated by calculating the mean and standard deviation within a 5 × 5 moving window.
DTM data could be less accurate on slopes, and the DSM could underestimate surface height because radar signals partially penetrate forest canopies. Thus, the average canopy height map estimated from the InSAR-based DSM may appear somewhat lower than the actual canopy. The standard deviation canopy height map shows higher values in areas with large differences in canopy height and lower values in smooth areas. In this way, forest height, which reflects the distribution characteristics of the forests in the image, could be calculated by subtracting the DTM from the DSM.
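The canopy height model and its two 5 × 5 window statistics can be sketched as below, assuming the DSM and DTM are already co-registered on the 10 m Sentinel-2 grid. Clipping negative heights to zero and edge padding at the image border are assumptions, not steps stated in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def canopy_height_maps(dsm, dtm, size=5):
    """Return (chm, mean_chm, std_chm) from co-registered DSM and DTM arrays in metres."""
    # Canopy height = DSM - DTM; negative values clipped to 0 (assumption)
    chm = np.clip(np.asarray(dsm, float) - np.asarray(dtm, float), 0.0, None)
    pad = size // 2
    win = sliding_window_view(np.pad(chm, pad, mode="edge"), (size, size))
    return chm, win.mean(axis=(-2, -1)), win.std(axis=(-2, -1))


dtm = np.linspace(100.0, 140.0, 400).reshape(20, 20)   # sloping terrain
dsm = dtm + 12.0                                        # uniform 12 m canopy
dsm[10:, 10:] += np.tile([0.0, 8.0], (10, 5))           # block with alternating tree heights
chm, chm_mean, chm_std = canopy_height_maps(dsm, dtm)
```

In the uniform part of the toy canopy the standard deviation map is zero, while the block with alternating tree heights shows a clearly positive standard deviation, mirroring the interpretation of the standard deviation canopy height map above.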

Application of Artificial Neural Network
An ANN with the multilayer perceptron (MLP) algorithm was used as the machine learning model in this study. The MLP adds hidden layers to overcome the limitation of the conventional perceptron, which can only perform linear classification [45]. The MLP consists of three groups of layers: an input layer, hidden layers, and an output layer. It performs prediction and estimation by repeatedly adjusting the connection strengths (weights) between layers with error backpropagation. In the backpropagation algorithm, an input signal is transmitted through the hidden layers by a feedforward network to generate the final output; the output is then compared with the true value, and the error is propagated backward to correct the weights in the direction that reduces the error. Each neuron passes the weighted sum of its inputs through an activation function. In this study, the logistic function, a commonly used unipolar sigmoid function, was used:
$$f(x) = \frac{1}{1 + e^{-x}} \tag{6}$$
This function maps any input to a value between 0 and 1, so the output can be interpreted as a probability [46].
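The backpropagation scheme with logistic activations can be illustrated with the minimal NumPy multilayer perceptron below. It is not WEKA's MultilayerPerceptron implementation used in the study; the single hidden layer, squared-error loss, learning rate, weight initialization, and synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def logistic(x):
    """Unipolar sigmoid; output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


class TinyMLP:
    """One-hidden-layer perceptron with logistic units trained by error backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.h = logistic(X @ self.W1 + self.b1)      # feedforward to hidden layer
        return logistic(self.h @ self.W2 + self.b2)   # one output per class, each in (0, 1)

    def train_step(self, X, Y):
        """One full-batch gradient step on 0.5 * mean squared error; returns the loss."""
        out = self.forward(X)
        err = out - Y                                  # compare output with true value
        d_out = err * out * (1.0 - out)                # logistic'(z) = f(z) * (1 - f(z))
        d_hid = (d_out @ self.W2.T) * self.h * (1.0 - self.h)  # propagate error backward
        n = X.shape[0]
        self.W2 -= self.lr * self.h.T @ d_out / n
        self.b2 -= self.lr * d_out.mean(axis=0)
        self.W1 -= self.lr * X.T @ d_hid / n
        self.b1 -= self.lr * d_hid.mean(axis=0)
        return 0.5 * np.mean(np.sum(err ** 2, axis=1))


# Synthetic 14-feature pixels with 3 classes (stand-ins for single/double/triple layers)
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 14))
y = np.argmax(X[:, :3], axis=1)
Y = np.eye(3)[y]
net = TinyMLP(14, 8, 3)
losses = [net.train_step(X, Y) for _ in range(2000)]
probs = net.forward(X)                  # per-class scores, analogous to probability maps
pred = np.argmax(probs, axis=1)
```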
The input layers, namely NDVI, two types of NDWI, two types of NDre, and PCA texture for each season, plus the average and standard deviation canopy height maps produced in the above process, are assumed to be correlated with forest vertical structure. To estimate the forest vertical structure, the Waikato Environment for Knowledge Analysis (WEKA), a widely used data mining tool, was used [47]. Training (70%) and test (30%) data were randomly selected for each layer class. Training (including validation) used 70% of the data (1,608,912 pixels, including 297,365, 186,573, and 1,124,975 pixels for the single-, double-, and triple-layer classes, respectively), and testing used the remaining data. The ANN was trained and validated with 10-fold cross-validation, which repeats training and validation over 10 partitions of the training/validation set and tunes the hyperparameters to the optimal condition based on the average error across the folds. For this purpose, the training/validation dataset was divided into 10 groups; in each fold, nine groups were used for training and one for validation, so that each group was used once for validation and nine times for training. Finally, forest vertical structure classification was conducted by feeding the preprocessed data above into the ANN as input layers, and the test accuracy was calculated with the test dataset.
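The 70/30 random split and the 10-fold cross-validation loop can be sketched as below. This is not WEKA's internal procedure: reading "for each layer" as a split within each structure class, the fixed random seeds, and the hypothetical `fit` and `error` callables used for hyperparameter selection are assumptions.

```python
import numpy as np


def split_per_class(labels, train_frac=0.7, seed=0):
    """Randomly assign train_frac of each class's pixels to training, the rest to test."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(train_frac * idx.size))
        train.append(idx[:n_train])
        test.append(idx[n_train:])
    return np.concatenate(train), np.concatenate(test)


def kfold(n, k=10, seed=0):
    """Yield (train_idx, val_idx); each of the k groups is used once for validation."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(n), k)
    for i in range(k):
        yield np.concatenate([g for j, g in enumerate(groups) if j != i]), groups[i]


def select_hyperparameter(X, y, candidates, fit, error, k=10):
    """Pick the candidate with the lowest mean validation error over k folds.

    `fit(X, y, param)` returns a model and `error(model, X, y)` returns a float;
    both are hypothetical callables supplied by the user.
    """
    mean_err = [np.mean([error(fit(X[tr], y[tr], p), X[va], y[va])
                         for tr, va in kfold(len(y), k)]) for p in candidates]
    return candidates[int(np.argmin(mean_err))]


labels = np.repeat([1, 2, 3], [100, 50, 200])   # toy single/double/triple labels
train_idx, test_idx = split_per_class(labels)
```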

Results
In this study, the MLP-ANN method was applied to map the forest vertical structure in Gongju-si, located near the center of South Korea. Fourteen input neurons generated from Sentinel-2 optical satellite images and existing DTM and DSM data were used as input layers for the MLP-ANN approach.

Result Maps from Preprocessing
A total of 12 input neurons were prepared from the spring and fall Sentinel-2 optical satellite images, with six inputs generated for each season: NDVI, NDWI1, NDWI2, NDre1, NDre2, and PCA texture. The PCA texture maps were generated from each seasonal Sentinel-2 image using Equations (4) and (5); as shown in Figure 9a,b, the smoother the surface, the higher the texture value.
The other two input neurons were generated by differencing the DTM obtained from NGII and the DSM obtained from WorldDEM, as described above. From the resulting canopy height map, two inputs were generated with a 5 × 5 moving window. The first is the average canopy height map, which can be used to estimate the average canopy height in the forest (Figure 10a). Most average values range from 7 m to 15 m, reaching up to 25 m in some regions. The second is the standard deviation canopy height map, which reflects the variation in height among adjacent trees (Figure 10b). Because variation in forest vertical structure implies differences in canopy height, the standard deviation of the height data was used to capture height differences between adjacent trees. In other words, a higher standard deviation in a forest region indicates larger differences in tree height and hence a more complex forest vertical structure.

Results from Artificial Neural Network
The ANN produced a probability map for each vertical structure, as shown in Figure 11: (a) single-layer structure, (b) double-layer structure, and (c) triple-layer structure. As mentioned above, the single-, double-, and triple-layer structures in this study are defined by how many of the canopy, understory, and shrub layers they contain. Most pixels of the single-layer probability map have relatively low values of less than 10%. Some pixels of the double-layer probability map have low values of less than 20%, whereas others have values between 40% and 60%. In the triple-layer probability map, some pixels have probability values higher than 70%, whereas pixels at the outer edges of the forest appear to have values lower than 20%.

The high proportion of low values in the single- and double-layer probability maps indicates that single- and double-layer forests make up a relatively small proportion of the study area. In the triple-layer probability map, values greater than 70% indicate a high probability of triple-layer forest, whereas values lower than 20% indicate little or no triple-layer structure, which means that the triple-layer structure could be clearly distinguished.
In addition, high triple-layer probabilities cover most of the study area, and the probability distribution of the triple-layer map is closer to a Gaussian distribution than those of the single- and double-layer maps.
The final classification map was created by assigning each pixel the structure class with the highest value among the three probability maps (Figure 12). Because the single- and double-layer probabilities were relatively low, single-layer (20%) and double-layer (15%) structures occupy small portions of the study area, whereas triple-layer forests dominate at 65% in the classification map. As the study area is mostly composed of multi-layered natural forests, the classification results are in line with expectations.
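The maximum-probability rule for the final map, together with the area fraction of each class, can be sketched as below. The array layout (three stacked probability maps), the use of 0 for non-forest pixels, and the random toy probabilities are assumptions for illustration only.

```python
import numpy as np


def classify_max_probability(prob_maps, forest_mask):
    """prob_maps: (3, rows, cols) probabilities for single/double/triple layers.

    Returns a label map (1 = single, 2 = double, 3 = triple, 0 = non-forest)
    and the fraction of forest pixels assigned to each class.
    """
    labels = np.argmax(prob_maps, axis=0) + 1      # class with the highest probability
    labels = np.where(forest_mask, labels, 0)      # mask out non-forest pixels
    counts = np.array([np.count_nonzero(labels == k) for k in (1, 2, 3)])
    return labels, counts / counts.sum()


rng = np.random.default_rng(3)
probs = rng.dirichlet([1.0, 1.0, 3.0], size=(40, 40)).transpose(2, 0, 1)  # (3, 40, 40)
mask = np.ones((40, 40), dtype=bool)
mask[:5, :] = False                                  # toy non-forest strip
label_map, fractions = classify_max_probability(probs, mask)
```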
The accuracy of the classification map was evaluated with an error matrix (Table 3) against the field-survey-based forest vertical structure data shown in Figure 2a. The test dataset contains 689,534 pixels, excluding non-forest areas. The overall test accuracy estimated from the error matrix was ~65.06%, which is not very high. Applying the ANN approach to full-waveform Lidar data could be expected to yield higher accuracy. However, full-waveform Lidar data are difficult to acquire over large areas at a single time and are too costly for use in most areas. Optical satellite images and topographic data, including DSMs and DTMs, are available for most parts of the world and could be used for cost- and time-efficient production of forest vertical structure maps.
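The overall, user's, and producer's accuracies discussed in this section follow from the error matrix as in the sketch below. The orientation (rows = classified map, columns = field-survey reference) and zero-based class indices are assumptions, and the toy labels are illustrative rather than the study's data.

```python
import numpy as np


def error_matrix(reference, predicted, n_classes=3):
    """Rows: map (predicted) class; columns: reference (field survey) class."""
    m = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(m, (np.asarray(predicted), np.asarray(reference)), 1)
    return m


def accuracies(m):
    overall = np.trace(m) / m.sum()
    users = np.diag(m) / m.sum(axis=1)      # correct / all pixels mapped as the class
    producers = np.diag(m) / m.sum(axis=0)  # correct / all reference pixels of the class
    return overall, users, producers


reference = np.array([0, 0, 1, 1, 2, 2])    # 0 = single, 1 = double, 2 = triple
predicted = np.array([0, 1, 1, 2, 2, 2])
m = error_matrix(reference, predicted)
overall, users, producers = accuracies(m)
# overall -> 0.667, users -> [1.0, 0.5, 0.667], producers -> [0.5, 0.5, 1.0]
```

With this orientation, the share of reference class `c` mapped as class `k` is `m[k, c] / m[:, c].sum()`, which is how figures such as the proportion of double-layer pixels classified as triple-layer would be read off the matrix.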
As a result, the forest vertical structure can be observed over a large area using optical images and topographic data with ~65% accuracy. The user and producer accuracies for the single-layer structure were approximately 51.04% and 56.46%, respectively, both slightly above 50%. Double-layer forests had user and producer accuracies of only 19.73% and 25.23%, respectively, indicating that they were identified poorly. This is because very few double-layer samples were available for training and testing the MLP-ANN model, and ~62.3% of the double-layer structure was misclassified as triple-layer structure owing to the similar canopy characteristics of double- and triple-layer forests. In other words, 62.3% of double-layer forests show patterns very similar to those of triple-layer forests, so in this study double-layer areas could not be completely separated from triple-layer forests. The user and producer accuracies for triple-layer structure forests were about 79.85% and 73.92%, respectively, higher than those of the single- and double-layer structures. This can be attributed to the MLP-ANN model parameters being well trained for the triple-layer structure. Nevertheless, about 12.3% and 13.8% of the triple-layer structure was misclassified as single- and double-layer forests, respectively.
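Overall, user, and producer accuracy all follow directly from the pixel-count error matrix. The minimal sketch below assumes that rows hold the field-survey reference classes and columns the mapped classes (the orientation of Table 3 is not reproduced here); the 3x3 matrix in the example is hypothetical and is not Table 3.

```python
import numpy as np


def accuracy_metrics(error_matrix):
    """Overall, user, and producer accuracy from a pixel-count error matrix.

    Assumed layout: rows = reference (field-survey) classes,
    columns = classes predicted by the classifier.
    """
    m = np.asarray(error_matrix, dtype=float)
    correct = np.diag(m)
    overall = correct.sum() / m.sum()
    producer = correct / m.sum(axis=1)  # correct / reference (row) totals
    user = correct / m.sum(axis=0)      # correct / mapped (column) totals
    return overall, user, producer


def row_confusion_fractions(error_matrix):
    """Share of each reference class assigned to every mapped class (rows sum to 1)."""
    m = np.asarray(error_matrix, dtype=float)
    return m / m.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Hypothetical counts for single, double, triple layers.
    em = [[8, 1, 1],
          [1, 2, 7],
          [1, 1, 18]]
    overall, user, producer = accuracy_metrics(em)
    print(f"overall={overall:.3f}", "user=", user.round(3), "producer=", producer.round(3))
    print(row_confusion_fractions(em).round(3))
```

Under this layout, a statement such as "62.3% of the double-layer structure was misclassified as triple-layer" corresponds to one entry in the double-layer row of `row_confusion_fractions`.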
To analyze the cause of the misclassification, the forest map and the original Sentinel-2 RGB image were compared by visual interpretation. This confirmed that trees had been artificially removed or had newly grown in areas of single-, double-, and triple-layer structure, as shown in Figure 13. This type of misclassification arises from the time difference between satellite image acquisition and the field survey, which can affect both the training and the output of the ANN algorithm.

Conclusions
The purpose of this study was to analyze forest vertical structure in Gongju-si to enhance the function of forests related to water resource management. An MLP-ANN machine learning model was applied to Sentinel-2 optical satellite images and previously constructed DTM and DSM data. Machine learning techniques can be applied to large amounts of remote sensing data to produce information about continuous internal structures in areas with high uncertainty. Accordingly, NDVI, NDWI1, NDWI2, NDre1, NDre2, and PCA texture maps were generated from Sentinel-2 optical satellite images as input data. In addition, average and standard deviation canopy height maps were generated from the DTM and DSM. The ANN classification result shows that the triple-layer forest, which has the highest water reserves, was classified relatively accurately, with a producer accuracy of 73.92%.
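The spectral inputs listed above are all normalized differences of two bands, and the canopy height inputs derive from the difference between DSM and DTM. The sketch below assumes NDVI uses Sentinel-2 B8 (NIR) and B4 (red); the band pairs for NDWI1/2 and NDre1/2, the 3-pixel moving window, and clipping negative heights to zero are hypothetical choices, since this section does not specify them. It needs NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def normalized_difference(a, b):
    """Normalized difference (a - b) / (a + b); NaN where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def canopy_height_stats(dsm, dtm, window=3):
    """Moving-window mean and standard deviation of canopy height (DSM - DTM).

    window: odd window size in pixels (hypothetical default); image edges
    are handled by repeating the border pixels.
    """
    if window % 2 != 1:
        raise ValueError("window must be odd")
    chm = np.asarray(dsm, dtype=float) - np.asarray(dtm, dtype=float)
    chm = np.clip(chm, 0.0, None)  # assumption: negative heights treated as ground
    pad = window // 2
    padded = np.pad(chm, pad, mode="edge")
    windows = sliding_window_view(padded, (window, window))  # (rows, cols, w, w)
    return windows.mean(axis=(-2, -1)), windows.std(axis=(-2, -1))


if __name__ == "__main__":
    # Toy reflectances and heights, not values from the study.
    b8 = np.array([[0.40, 0.35], [0.30, 0.00]])  # NIR
    b4 = np.array([[0.05, 0.10], [0.10, 0.00]])  # red
    ndvi = normalized_difference(b8, b4)  # last pixel is NaN (zero denominator)
    mean_h, std_h = canopy_height_stats([[10, 10], [10, 14]], [[0, 0], [0, 0]])
    print(ndvi)
    print(mean_h, std_h)
```

Computing these layers separately for the spring and fall scenes and stacking them per pixel would give a feature vector of the kind fed to the ANN; the exact composition of the study's 14 inputs is not reconstructed here.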
Understanding the vertical structure of forests is essential for estimating forest water storage capacity for integrated water management. The results of this study show that satellite data, including optical satellite images and DEM data, could be used to estimate forest water storage more accurately by constructing forest vertical structure data. Accordingly, forest water storage could be estimated more accurately by combining the vertical structure derived from satellite images with the forest type data provided by the forest map.
A diversity of vertical structures across forests of various ages could improve rainwater infiltration into the soil and increase the amount of water the forest can store. However, estimating forest structure over large areas is difficult because forests are hard to access and vary widely in form; periodic satellite images are therefore needed to estimate vertical structures in national-scale forests and to support forest management for water resources. The combination of remote sensing, which is essential for exploring large areas, and machine learning, which is effective for classifying and analyzing large amounts of data, could also be very useful, as in this study. In particular, because information on the internal structure of forests is difficult to obtain, the vertical structure of a forest can be estimated through the methodology applied in this study, using remote sensing satellite imagery and machine learning. This approach is also expected to reduce research costs, such as the time and budget required for field surveys.
In this study, the time difference between image acquisition and the survey of the reference data degraded the training and classification results. In future studies, images acquired close to the survey time should be collected to account for this difference, and improved results could be expected by obtaining reference data from multiple regions with different forest types. Additionally, recently developed deep learning techniques could capture more detailed image characteristics. Based on the results of this study, forest vertical structure data derived through remote sensing and machine learning could be used to estimate forest water storage and to establish forest administration measures for integrated water management.

Conflicts of Interest:
The authors declare no conflict of interest.