Mapping Forest Vertical Structure in Jeju Island from Optical and Radar Satellite Images Using Artificial Neural Network

Recently, due to the acceleration of global warming, an accurate understanding and management of forest carbon stocks, such as forest aboveground biomass, has become very important. The vertical structure of the forest, which is the internal structure of the forest, was mainly investigated by field surveys that are labor intensive. Recently, remote sensing techniques have been actively used to explore large and inaccessible areas. In addition, machine learning techniques that could classify and analyze large amounts of data are being used in various fields. Thus, this study aims to analyze the forest vertical structure (number of tree layers) to estimate forest aboveground biomass in Jeju Island from optical and radar satellite images using artificial neural networks (ANN). For this purpose, the eight input neurons of the forest related layers, based on remote sensing data, were prepared: normalized difference vegetation index (NDVI), normalized difference water index (NDWI), NDVI texture, NDWI texture, average canopy height, standard deviation canopy height and two types of coherence maps were created using the Kompsat-3 optical image, L-band ALOS PALSAR-1 radar images, digital surface model (DSM), and digital terrain model (DTM). The forest vertical structure data, based on field surveys, was divided into the training/validation and test data and the hyper-parameters of ANN were trained using the training/validation data. The forest vertical classification result from ANN was evaluated by comparison to the test data. It showed about a 65.7% overall accuracy based on the error matrix. This result shows that the forest vertical structure map can be effectively generated from optical and radar satellite images and existing DEM and DTM using the ANN approach, especially for national scale mapping.


Introduction
Recently, as global climate change has shown observable effects on the environment such as global warming, interest in the forests that make a significant contribution to the global carbon cycle is increasing [1]. Therefore, an accurate understanding and management of forest carbon stocks, such as forest biomass, is very important to preserve and protect forest ecosystems and to understand the impact of climate change [2,3]. However, due to the complex structure of forests, spatial distributions such as the aboveground biomass stocks and the location of the carbon uptake are still not quantified in many forests [1]. The forest aboveground biomass could be estimated from the vertical forest high uncertainty is considerably more effective than costly aerial photographs and human surveying. Therefore, this study applied a method of estimating and classifying forest vertical structures using remote sensing technique and an artificial neural network (ANN).
This study aims to analyze the forest vertical structure to estimate forest aboveground biomass in Jeju Island from satellite images. In this study, the vertical structure of forests was quantified by a number of tree layers; according to the number of layers included in the canopy layer, the understory layer, and the shrub layer-layer structure was classified into single, double, or triple-layer structure. For this purpose, an input layer for forest vertical structure analysis was derived based on satellite images and applied to the ANN method. In detail, first, NDVI, NDWI, NDVI texture, and NDWI texture were prepared from KOMPSAT-3 optical images acquired from a part of Seogwipo-si in Jeju Island, South Korea. In order to estimate the height of the trees, two kinds of canopy height maps were produced from the canopy height map by subtracting the DTM from DSM. Two kinds of coherence images were produced from multi-temporal ALOS PALSAR-1 radar images. Finally, the eight input layers and the training data of the forest vertical structure data produced by the field survey were applied to ANN. The classification result accuracy of the forest vertical structure was then validated through forest vertical reference data and error matrix.

Study Area and Dataset
The study area is a part of Seogwipo-si, Jeju-do, South Korea where Mt. Halla is located in the North. The study site mainly consists of abies koreana and pittosporum forests ( Figure 1). The forest vertical structure classification map was created by the field survey in 29 November 2018 ( Figure 2). On-site survey of the forest vertical structure was conducted by human surveys in a group of two people, and the height measurement was carried out based on the height of the 2 m survey pole or recorder. The grass layer was measured on the basis of examiner's knee and waist height. Finally, the forest layer structures from the survey contents were drawn as a reference map (see Figure 2) [8]. The training/validation dataset for learning as well as the test dataset for the accuracy evaluation after the learning was selected considered the spatial distribution of three forest vertical structure layers from the reference map by human surveying. human surveying. Therefore, this study applied a method of estimating and classifying forest vertical structures using remote sensing technique and an artificial neural network (ANN).
This study aims to analyze the forest vertical structure to estimate forest aboveground biomass in Jeju Island from satellite images. In this study, the vertical structure of forests was quantified by a number of tree layers; according to the number of layers included in the canopy layer, the understory layer, and the shrub layer-layer structure was classified into single, double, or triple-layer structure. For this purpose, an input layer for forest vertical structure analysis was derived based on satellite images and applied to the ANN method. In detail, first, NDVI, NDWI, NDVI texture, and NDWI texture were prepared from KOMPSAT-3 optical images acquired from a part of Seogwipo-si in Jeju Island, South Korea. In order to estimate the height of the trees, two kinds of canopy height maps were produced from the canopy height map by subtracting the DTM from DSM. Two kinds of coherence images were produced from multi-temporal ALOS PALSAR-1 radar images. Finally, the eight input layers and the training data of the forest vertical structure data produced by the field survey were applied to ANN. The classification result accuracy of the forest vertical structure was then validated through forest vertical reference data and error matrix.

Study Area and Dataset
The study area is a part of Seogwipo-si, Jeju-do, South Korea where Mt. Halla is located in the North. The study site mainly consists of abies koreana and pittosporum forests ( Figure 1). The forest vertical structure classification map was created by the field survey in 29 November 2018 ( Figure 2). On-site survey of the forest vertical structure was conducted by human surveys in a group of two people, and the height measurement was carried out based on the height of the 2 m survey pole or recorder. The grass layer was measured on the basis of examiner's knee and waist height. Finally, the forest layer structures from the survey contents were drawn as a reference map (see Figure 2) [8]. The training/validation dataset for learning as well as the test dataset for the accuracy evaluation after the learning was selected considered the spatial distribution of three forest vertical structure layers from the reference map by human surveying.
The input layer for learning was composed of eight neurons extracted based on satellite images in this study. For the input layer, the Kompsat-3 orthoimage was acquired in the study area and the acquisition parameters of the Kompsat-3 image were listed in Table 1. The acquisition time of the image was 29 April 2017, which is in the spring season in study area. The solar altitude angle and the azimuth angle of the image is approximately 65.64° and 127.23°, respectively. The NDVI and NDWI maps and the corresponding texture maps for NDVI and NDWI were generated from the Kompsat-3 orthoimage. All input data were resampled onto a 2.8 m grid based on Kompsat-3 image's ground sample distance (GSD) by using bilinear interpolation.  The input layer for learning was composed of eight neurons extracted based on satellite images in this study. For the input layer, the Kompsat-3 orthoimage was acquired in the study area and the acquisition parameters of the Kompsat-3 image were listed in Table 1   In addition, the DTM and DSM were collected for the canopy height measurement. The DTM, with a spatial resolution of 5 m, was collected from the National Geographic Information Institute (NGII DEM); NGII DEM was generated at 1 m resolution by using national reference points, 1:5000 digital topographic maps, and LiDAR data collected by local governments, and then released after being resampled into 5 m. (Figure 3a). WorldDEM, which can be considered as DSM because the WorldDEM was created by using TerraSAR-X X-band radar interferometry, with the resolution of 12 m, was obtained from German Aerospace Center (Deutsches Zentrum für Luft-und Raumfahrt e.V., DLR) (Figure 3b). The DSM was used for the ortho-rectification and topographic correction of Kompsat-3 image, and both the DTM and DSM data were used to create canopy height maps. The DSM and DTM data were also preprocessed to 2.8 m resolution. The DTM data in Figure 3a is much smoother than the DSM data in Figure 3b, even though the resolution of DTM is more than two times higher than DSM. This is because DTM is the height of the ground terrain, while DSM is the aboveground height including the height of ground objects such as the canopies and buildings.
Finally, a total number of eight ALOS PALSAR-1 images were acquired in the descending orbit. Figure 4 shows the ortho-rectified ALOS PALSAR-1 power image. The PALSAR-1 raw data were preprocessed to create PALSAR-1 single look complex (SLC) images. Then, we generated 12 interferograms from the eight PALSAR-1 SLC images, as listed in Table 2. The incidence angle of the PALSAR-1 images was 38.7 deg, and the pixel spacing were about 3.15 and 4.68 m in the azimuth and range directions, respectively. The temporal and perpendicular baselines of the interferometric pairs are summarized in Table 2. An L-band ALOS PALSAR-1 image has a higher transmittance to objects due to a longer wavelength, so that it is able to preserve a higher coherence in forest areas than the X and C-band images [32]. Therefore, the L-band image is more suitable to classify the forest vertical structure rather than the X and C-band images.  In addition, the DTM and DSM were collected for the canopy height measurement. The DTM, with a spatial resolution of 5 m, was collected from the National Geographic Information Institute (NGII DEM); NGII DEM was generated at 1 m resolution by using national reference points, 1:5000 digital topographic maps, and LiDAR data collected by local governments, and then released after being resampled into 5 m. (Figure 3a). WorldDEM, which can be considered as DSM because the WorldDEM was created by using TerraSAR-X X-band radar interferometry, with the resolution of 12 m, was obtained from German Aerospace Center (Deutsches Zentrum für Luft-und Raumfahrt e.V., DLR) ( Figure 3b). The DSM was used for the ortho-rectification and topographic correction of Kompsat-3 image, and both the DTM and DSM data were used to create canopy height maps. The DSM and DTM data were also preprocessed to 2.8 m resolution. The DTM data in Figure 3a is much smoother than the DSM data in Figure 3b, even though the resolution of DTM is more than two times higher than DSM. This is because DTM is the height of the ground terrain, while DSM is the above-ground height including the height of ground objects such as the canopies and buildings.
Finally, a total number of eight ALOS PALSAR-1 images were acquired in the descending orbit. Figure 4 shows the ortho-rectified ALOS PALSAR-1 power image. The PALSAR-1 raw data were preprocessed to create PALSAR-1 single look complex (SLC) images. Then, we generated 12 interferograms from the eight PALSAR-1 SLC images, as listed in Table 2. The incidence angle of the PALSAR-1 images was 38.7 deg, and the pixel spacing were about 3.15 and 4.68 m in the azimuth and range directions, respectively. The temporal and perpendicular baselines of the interferometric pairs Remote Sens. 2020, 12, 797 5 of 17 are summarized in Table 2. An L-band ALOS PALSAR-1 image has a higher transmittance to objects due to a longer wavelength, so that it is able to preserve a higher coherence in forest areas than the X and C-band images [32]. Therefore, the L-band image is more suitable to classify the forest vertical structure rather than the X and C-band images.

Methods
In order to map the forest vertical structure, an input layer of the ANN approach was created by using (i) the satellite images including the Kompsat-3 optical image and PALSAR-1 radar images and (ii) existing DEM including NGII DEM (which is a DTM), WorldDEM (which is a DSM). Then, a probability map of forest vertical structure was produced by using an artificial neural network. Finally, the probability map was classified into single, double and triple vertical structure layers. The accuracy of the classified map was evaluated by comparing the field survey data through the error matrix. The overall process flow of this study is shown in Figure 5.

Methods
In order to map the forest vertical structure, an input layer of the ANN approach was created by using (i) the satellite images including the Kompsat-3 optical image and PALSAR-1 radar images and (ii) existing DEM including NGII DEM (which is a DTM), WorldDEM (which is a DSM). Then, a probability map of forest vertical structure was produced by using an artificial neural network. Finally, the probability map was classified into single, double and triple vertical structure layers. The accuracy of the classified map was evaluated by comparing the field survey data through the error matrix. The overall process flow of this study is shown in Figure 5.

NDVI and NDWI Maps
High mountain regions have sunlit and sunshade slopes depending on the solar incidence angle. The sunlit slope areas have a higher reflectivity, while the sunshade areas have a lower reflectivity in the optical imagery. To classify the forest type or forest structure efficiently, topographic normalization is required in high mountain areas. In this study, the Statistical-empirical model, among other topographic normalization models, was selected and applied to the Kompsat-3 optical image, as given by [33]: where and ρ , respectively, indicate a pixel value of the original and topographic-corrected images, ̅ is the mean value of the original image, a and b are the Statistical-empirical model parameters, and i is the incidence angle [34,35]. The Statistical-empirical model has been widely and

NDVI and NDWI Maps
High mountain regions have sunlit and sunshade slopes depending on the solar incidence angle. The sunlit slope areas have a higher reflectivity, while the sunshade areas have a lower reflectivity in the optical imagery. To classify the forest type or forest structure efficiently, topographic normalization is required in high mountain areas. In this study, the Statistical-empirical model, among other topographic normalization models, was selected and applied to the Kompsat-3 optical image, as given by [33]: where ρ and ρ h , respectively, indicate a pixel value of the original and topographic-corrected images, ρ is the mean value of the original image, a and b are the Statistical-empirical model parameters, and i is the incidence angle [34,35]. The Statistical-empirical model has been widely and successfully used to normalize the topographic effect of vegetation covers in high mountain areas. After the topographic normalization was applied to the optical image, the NDVI and NDWI index maps were created, as given by: where Green, Red and NIR denote the green, red and near-infrared (NIR) bands of the Kompsat-3 image, respectively. NDVI is an index that estimates the presence or absence of vegetation by using the difference between red and near-infrared reflectance [34]. NDWI is an index that uses the difference in spectral characteristics according to water content of tree canopies, and it varies according to the vegetation type and vitality [25]. NDVI is more sensitive to chlorophyll content, whereas NDWI is more sensitive to water content of the leaves [36,37]. In this study, the NDVI and NDWI maps were used for the forest vertical structure mapping rather than the original Kompsat-3 image. This was because the index maps had a lower topographic effect than the original image, although the topographic normalization was successfully applied. Thus, two input neurons were created by the Kompsat-3 optical satellite image. The Kompsat-3 image was ortho-rectified from the DSM data using ground control points (GCP) and rational polynomial coefficients (RPC) parameters and topography-normalized from the DTM data using Equation (1). Then, the NDVI and NDWI maps were created by using Equations (2) and (3). Figure 6a,b shows the NDVI and NDWI maps from Kompsat-3. Since the topographic effect in the NDVI and NDWI maps was almost mitigated by the Statistical-empirical model, we could not find this effect, even in the high mountain areas (see Figure 6a,b).
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 17 successfully used to normalize the topographic effect of vegetation covers in high mountain areas. After the topographic normalization was applied to the optical image, the NDVI and NDWI index maps were created, as given by: (2).
where Green, Red and NIR denote the green, red and near-infrared (NIR) bands of the Kompsat-3 image, respectively. NDVI is an index that estimates the presence or absence of vegetation by using the difference between red and near-infrared reflectance [34]. NDWI is an index that uses the difference in spectral characteristics according to water content of tree canopies, and it varies according to the vegetation type and vitality [25]. NDVI is more sensitive to chlorophyll content, whereas NDWI is more sensitive to water content of the leaves [36,37]. In this study, the NDVI and NDWI maps were used for the forest vertical structure mapping rather than the original Kompsat-3 image. This was because the index maps had a lower topographic effect than the original image, although the topographic normalization was successfully applied. Thus, two input neurons were created by the Kompsat-3 optical satellite image. The Kompsat-3 image was ortho-rectified from the DSM data using ground control points (GCP) and rational polynomial coefficients (RPC) parameters and topography-normalized from the DTM data using Equation (1). Then, the NDVI and NDWI maps were created by using Equations (2) and (3). Figure  6a,b shows the NDVI and NDWI maps from Kompsat-3. Since the topographic effect in the NDVI and NDWI maps was almost mitigated by the Statistical-empirical model, we could not find this effect, even in the high mountain areas (see Figure 6a,b).

NDVI and NDWI Texture Maps
In addition, texture maps were, respectively, calculated from NDVI and NDWI maps. The texture map was mainly determined by the forest type and structure, forest shadow area, etc. For example, in the case of artificial forests with single-layered forest with uniform tree age and species, the texture values could be relatively constant. However, multi-layered vertical natural forests with varying ages and species had rough textures [27]. The calculation of the NDVI and NDWI normalized texture maps was generated by three main steps: (i) filtering of NDVI and NDWI maps using the Gaussian averaging filter, (ii) calculating the root mean square difference (RMSD) between the

NDVI and NDWI Texture Maps
In addition, texture maps were, respectively, calculated from NDVI and NDWI maps. The texture map was mainly determined by the forest type and structure, forest shadow area, etc. For example, in the case of artificial forests with single-layered forest with uniform tree age and species, the texture values could be relatively constant. However, multi-layered vertical natural forests with varying ages Remote Sens. 2020, 12, 797 8 of 17 and species had rough textures [27]. The calculation of the NDVI and NDWI normalized texture maps was generated by three main steps: (i) filtering of NDVI and NDWI maps using the Gaussian averaging filter, (ii) calculating the root mean square difference (RMSD) between the original and filtered maps in a given window kernel, and (iii) dividing the RMSD map into the filtered map on the basis of pixel-by-pixel calculation. The texture map (T) was defined, as given by [38]: where i and j are lines and pixels, respectively, N i and N j are, respectively, the window kernel size in the line and pixel directions, and I and I f are, respectively, the original and filtered intensity images. The Gaussian averaging filter was applied to the NDVI and NDWI maps with the purpose of noise reduction and base plane estimation [38]. Then, the median filter of size 3 × 3 was applied to the texture maps to reduce the noise, and the unit of the texture maps was converted to decibel unit, as given by: where T db denotes the texture map in the decibel unit. In the normalized texture map, texture values for smooth areas are high while texture values for rough areas are low. The NDVI and NDWI texture maps were, respectively, generated from the NDVI and NDWI maps using Equations (4) and (5). As you can see from Figure 7a,b, the smoother the index maps are, the higher the texture values are.
Remote Sens. 2020, 12, x FOR PEER REVIEW 8 of 17 original and filtered maps in a given window kernel, and (iii) dividing the RMSD map into the filtered map on the basis of pixel-by-pixel calculation. The texture map (T) was defined, as given by [38]: , where i and j are lines and pixels, respectively, Ni and Nj are, respectively, the window kernel size in the line and pixel directions, and I and If are, respectively, the original and filtered intensity images. The Gaussian averaging filter was applied to the NDVI and NDWI maps with the purpose of noise reduction and base plane estimation [38]. Then, the median filter of size 3 × 3 was applied to the texture maps to reduce the noise, and the unit of the texture maps was converted to decibel unit, as given by: where Tdb denotes the texture map in the decibel unit. In the normalized texture map, texture values for smooth areas are high while texture values for rough areas are low. The NDVI and NDWI texture maps were, respectively, generated from the NDVI and NDWI maps using Equations (4) and (5). As you can see from Figure 7a,b, the smoother the index maps are, the higher the texture values are.

Average and Standard Deviation Canopy Height Maps
The canopy height map was calculated by considering that the forest vertical structure is closely related to the tree height. Canopy height maps created by using the difference between DSM and DTM have a good performance in measuring tree height [23]. In this study, DTM and DSM, with the spatial resolution of 5 and 12 m, were used to create the canopy height map. The DTM was generated based on the national base map (digital topographic map on scale of 1:5000), and the DSM was produced by using TerraSAR-X SAR interferometry (InSAR). Thus, the DTM data had a lower accuracy in forest areas, especially for mountainous areas, and the canopy height in the DSM, created by the interferometric technology, could be underestimated because the radar signal could have penetrated leaves, stems, etc.
Two input neurons of MLP-ANN were created from existing DTM and DSM. The DSM and DTM were resampled into 2.8 m GSD, and then used to generate a canopy height map from the difference between DTM and DSM. In this study, the average canopy height (hereafter 'avgCH') and standard deviation canopy height (hereafter 'stdCH') maps were, respectively, generated by using a

Average and Standard Deviation Canopy Height Maps
The canopy height map was calculated by considering that the forest vertical structure is closely related to the tree height. Canopy height maps created by using the difference between DSM and DTM have a good performance in measuring tree height [23]. In this study, DTM and DSM, with the spatial resolution of 5 and 12 m, were used to create the canopy height map. The DTM was generated based on the national base map (digital topographic map on scale of 1:5000), and the DSM was produced by using TerraSAR-X SAR interferometry (InSAR). Thus, the DTM data had a lower accuracy in forest areas, especially for mountainous areas, and the canopy height in the DSM, created by the interferometric technology, could be underestimated because the radar signal could have penetrated leaves, stems, etc.
Remote Sens. 2020, 12, 797 9 of 17 Two input neurons of MLP-ANN were created from existing DTM and DSM. The DSM and DTM were resampled into 2.8 m GSD, and then used to generate a canopy height map from the difference between DTM and DSM. In this study, the average canopy height (hereafter 'avgCH') and standard deviation canopy height (hereafter 'stdCH') maps were, respectively, generated by using a moving window of 11 × 11 to estimate the average and standard deviation of the height of the tree communities. A size of the moving window was empirically determined based on the spatial resolution of DTM, DSM data and size of the tree communities. Figure 8 shows the avgCH and stdCH maps. The avgCH map estimated from the InSAR-derived DSM can be reduced in proportion to the real canopy height. For the most part, the avgCH values ranged from 10 to 20 m. In some areas, the avgCH values were up to 30 m. The stdCH map can be used to recognize the variation of the height of adjacent trees (Figure 8b). That is, the stdCH values were allowed for the visual identification of the difference in the forest vertical structure. These values were not higher when the avgCH values had a higher value, as seen in Figure 8b. The stdCH map had a smaller value in the regions where the forest canopy height was almost constant, while it had a higher value in the regions where the canopy height variation was severe. This is because the higher the stdCH values, the greater the difference in the height of adjacent trees in the community. In addition, higher stdCH values indicate a higher forest vertical structure [38].
Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 17 moving window of 11 × 11 to estimate the average and standard deviation of the height of the tree communities. A size of the moving window was empirically determined based on the spatial resolution of DTM, DSM data and size of the tree communities. Figure 8 shows the avgCH and stdCH maps. The avgCH map estimated from the InSAR-derived DSM can be reduced in proportion to the real canopy height. For the most part, the avgCH values ranged from 10 to 20 m. In some areas, the avgCH values were up to 30 m. The stdCH map can be used to recognize the variation of the height of adjacent trees (Figure 8b). That is, the stdCH values were allowed for the visual identification of the difference in the forest vertical structure. These values were not higher when the avgCH values had a higher value, as seen in Figure 8b. The stdCH map had a smaller value in the regions where the forest canopy height was almost constant, while it had a higher value in the regions where the canopy height variation was severe. This is because the higher the stdCH values, the greater the difference in the height of adjacent trees in the community. In addition, higher stdCH values indicate a higher forest vertical structure [38].

Mean Coherence Maps
Two types of coherence maps were used for the forest vertical structure mapping-created from SAR interferometric pairs. Speckle noise was an inevitable component in radar satellite images, which caused bias in coherence [39][40][41]. Thus, it was necessary to consider the bias in the SAR images when the coherence was calculated. The coherence bias could have been reduced by multilooking or image filtering, but the multi-look and filtering process had limitations that reduced the spatial resolution [39]. In this study, to consider the trade-off relationship, two coherence maps were used to map the forest vertical structure and the azimuth common-band filtering and the topographic and ionospheric corrections were performed to reduce the bias of the coherence values. One coherence map was calculated with filtering, while the other map was estimated without filtering. The coherence maps were calculated from multi-temporal differential interferograms, which the topographic and ionospheric distortions were mitigated in.
The detailed process was as follows: (i) the differential interferograms were generated in all interferometric pairs; 12 differential interferograms were produced by the InSAR processing using GAMMA software (GAMMA, Aktiengesellschaft-AG, Bern, Switzerland). For the processing, the azimuth common-band filtering, flat-Earth correction, topographic correction and ionospheric correction were applied, (ii) two types of coherence maps were created from the differential interferograms. One coherence map was generated by the coherence calculation from the non-filtered differential interferograms, while the other maps were generated by the coherence calculation from the filtered differential interferograms, and (iii) the mean coherence map (hereafter 'CC1') was

Mean Coherence Maps
Two types of coherence maps were used for the forest vertical structure mapping-created from SAR interferometric pairs. Speckle noise was an inevitable component in radar satellite images, which caused bias in coherence [39][40][41]. Thus, it was necessary to consider the bias in the SAR images when the coherence was calculated. The coherence bias could have been reduced by multilooking or image filtering, but the multi-look and filtering process had limitations that reduced the spatial resolution [39]. In this study, to consider the trade-off relationship, two coherence maps were used to map the forest vertical structure and the azimuth common-band filtering and the topographic and ionospheric corrections were performed to reduce the bias of the coherence values. One coherence map was calculated with filtering, while the other map was estimated without filtering. The coherence maps were calculated from multi-temporal differential interferograms, which the topographic and ionospheric distortions were mitigated in.
The detailed process was as follows: (i) the differential interferograms were generated in all interferometric pairs; 12 differential interferograms were produced by the InSAR processing using GAMMA software (GAMMA, Aktiengesellschaft-AG, Bern, Switzerland). For the processing, the azimuth common-band filtering, flat-Earth correction, topographic correction and ionospheric correction were applied, (ii) two types of coherence maps were created from the differential interferograms. One coherence map was generated by the coherence calculation from the non-filtered differential interferograms, while the other maps were generated by the coherence calculation from the filtered differential interferograms, and (iii) the mean coherence map (hereafter 'CC1') was created by averaging the non-filtered coherence maps, while the mean filtered coherence map (hereafter 'CC2') was generated by averaging the filtered coherence maps.
Specifically, to generate the CC1 map, 12 coherence maps were calculated through 3 × 3 window block calculation from the 12 differential interferograms, and then averaged on the basis of the pixel-by-pixel operation. The CC2 map was created by (i) filtering the differential interferograms using the Goldstein filter with the kernel size of 32 and the filter parameter (alpha), (ii) creating the coherence maps from the filtered interferograms through 3 × 3 window block calculation, and (iii) averaging the 12 filtered coherence maps. The CC1 map can have a bias in maintaining the spatial resolution, while the CC2 map has a loss in the spatial resolution due to the filtering effect without the bias. Figure 9 shows the CC1 and CC2 maps in the study area, which were generated and used for the two input neurons to reduce the trade-off effect between the spatial resolution and the bias of the coherence values. Since the CC1 map was not applied with SAR filtering, it may have a bias in preserving the spatial resolution. However, the bias effect of the CC2 map could be largely reduced, but the spatial resolution of the CC2 map must have been lower due to the filtering effect. In this study, the CC1 and CC2 maps were used as the ANN input neurons to account for the SAR imaging characteristics.
Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 17 created by averaging the non-filtered coherence maps, while the mean filtered coherence map (hereafter 'CC2') was generated by averaging the filtered coherence maps. Specifically, to generate the CC1 map, 12 coherence maps were calculated through 3 × 3 window block calculation from the 12 differential interferograms, and then averaged on the basis of the pixelby-pixel operation. The CC2 map was created by (i) filtering the differential interferograms using the Goldstein filter with the kernel size of 32 and the filter parameter (alpha), (ii) creating the coherence maps from the filtered interferograms through 3 × 3 window block calculation, and (iii) averaging the 12 filtered coherence maps. The CC1 map can have a bias in maintaining the spatial resolution, while the CC2 map has a loss in the spatial resolution due to the filtering effect without the bias. Figure 9 shows the CC1 and CC2 maps in the study area, which were generated and used for the two input neurons to reduce the trade-off effect between the spatial resolution and the bias of the coherence values. Since the CC1 map was not applied with SAR filtering, it may have a bias in preserving the spatial resolution. However, the bias effect of the CC2 map could be largely reduced, but the spatial resolution of the CC2 map must have been lower due to the filtering effect. In this study, the CC1 and CC2 maps were used as the ANN input neurons to account for the SAR imaging characteristics. Figure 9. Coherence maps calculated from the (a) non-filtered and (b) filtered differential interferograms.

Artificial Neural Network
In this study, a machine learning method of ANN was applied to estimate the forest vertical structure, and specifically, the multi-layer perceptron (MLP) algorithm was used for learning. The MLP algorithm drew multiple discrimination lines by adding hidden layers to solve the limitation of the linear classification of the conventional perceptron [28,42]. The MLP algorithm consisted of three layers, including input layer, hidden layer and output layer, and performed prediction and estimation by adjusting the connectivity between these layers. The MLP algorithm used an error back propagation algorithm, which meant that the final result was output by repeating the way the signal was transmitted to the hidden layer. In other words, the initial output value was compared to the true value through the initial value, and the obtained weight was corrected in the direction of reducing the error by the back propagation algorithm. In addition, the activation function of a commonly used logistic function were used to adjust the connectivity, as given by: The sigmoid function, which was selected as the activation function, adjusted the predicted value to a value between zero and one, and expressed the result as a probability of zero or one [25,34].

Artificial Neural Network
In this study, a machine learning method of ANN was applied to estimate the forest vertical structure, and specifically, the multi-layer perceptron (MLP) algorithm was used for learning. The MLP algorithm drew multiple discrimination lines by adding hidden layers to solve the limitation of the linear classification of the conventional perceptron [28,42]. The MLP algorithm consisted of three layers, including input layer, hidden layer and output layer, and performed prediction and estimation by adjusting the connectivity between these layers. The MLP algorithm used an error back propagation algorithm, which meant that the final result was output by repeating the way the signal was transmitted to the hidden layer. In other words, the initial output value was compared to the true value through the initial value, and the obtained weight was corrected in the direction of reducing the error by the back propagation algorithm. In addition, the activation function of a commonly used logistic function were used to adjust the connectivity, as given by: The sigmoid function, which was selected as the activation function, adjusted the predicted value to a value between zero and one, and expressed the result as a probability of zero or one [25,34].
In this study, the MLP-ANN was implemented by using Matlab software. The neural network was iterated for 1000 cycles per epoch and a total 500 epochs were processed with a learning rate of 0.001. The hidden layer consisted of 16 neurons and one linear output layer was created. Finally, to map the forest vertical structure in the study area, the MLP-ANN approach was applied by using the input layers, which was composed of the NDVI, NDWI, NDVI texture, NDWI texture, avgCH, stdCH, CC1, and CC2 maps.
The neural network training was performed by using the training and validation sample data shown in Figure 10. The training and validation data sample was extracted randomly based on forest vertical structure classification map produced by the field survey. The tenfold cross-validation approach was used for training and validation of the MLP-ANN method. For this, the training/validation data was divided into ten groups, and the groups were randomly divided into training data of three groups and validation data of seven groups. The training/validation approach was applied 20 times repeatedly, and the 20 hyperparameters of MLP-ANN were estimated. The hyperparameter, showing the best performance out of 20 results, was selected, and then it was used as an initial parameter value. Finally, the hyperparameter was determined by using all the training/validation data [43].
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 17 In this study, the MLP-ANN was implemented by using Matlab software. The neural network was iterated for 1000 cycles per epoch and a total 500 epochs were processed with a learning rate of 0.001. The hidden layer consisted of 16 neurons and one linear output layer was created. Finally, to map the forest vertical structure in the study area, the MLP-ANN approach was applied by using the input layers, which was composed of the NDVI, NDWI, NDVI texture, NDWI texture, avgCH, stdCH, CC1, and CC2 maps.
The neural network training was performed by using the training and validation sample data shown in Figure 10. The training and validation data sample was extracted randomly based on forest vertical structure classification map produced by the field survey. The tenfold cross-validation approach was used for training and validation of the MLP-ANN method. For this, the training/validation data was divided into ten groups, and the groups were randomly divided into training data of three groups and validation data of seven groups. The training/validation approach was applied 20 times repeatedly, and the 20 hyperparameters of MLP-ANN were estimated. The hyperparameter, showing the best performance out of 20 results, was selected, and then it was used as an initial parameter value. Finally, the hyperparameter was determined by using all the training/validation data [43].

Results and Discussion
In this study, the MLP-ANN method was applied to map the forest vertical structure in Seogwipo-si, Jeju-do, South Korea where Mt. Halla is located in the North. The input layer of the MLP-ANN approach was composed of eight input neurons created by using the Kompsat-3 optical satellite image, existing DTM and DSM data, and ALOS-1 PALSAR-1 radar satellite image.
The forest vertical structure was classified into three layer-structures. The layer structure included all the canopy layers, the understory layer, and the shrub layer was defined by the triplelayer structure. The forest vertical structure, including two among the canopy, understory and shrub layers was defined as the double-layer structure, and the structure including only one among the layers was defined as the single-layer structure. Figure 11 shows the probability maps using the MLP-ANN approach from the training/validation data of single-layer, double-layer and triple-layer structures shown in Figure 10. Figure 11a shows the probability map of the single-layer structure, Figure 11b presents the probability map of double-layer structure, and Figure 11c shows the triplelayer structure. From the probability maps, we can recognize that (i) most pixels in the single-layer probability map have relatively low values of less than 30%, (ii) most pixels in the double-layer probability map have values between 40% and 60%, and (iii) some pixels in the triple-layer

Results and Discussion
In this study, the MLP-ANN method was applied to map the forest vertical structure in Seogwipo-si, Jeju-do, South Korea where Mt. Halla is located in the North. The input layer of the MLP-ANN approach was composed of eight input neurons created by using the Kompsat-3 optical satellite image, existing DTM and DSM data, and ALOS-1 PALSAR-1 radar satellite image.
The forest vertical structure was classified into three layer-structures. The layer structure included all the canopy layers, the understory layer, and the shrub layer was defined by the triple-layer structure. The forest vertical structure, including two among the canopy, understory and shrub layers was defined as the double-layer structure, and the structure including only one among the layers was defined as the single-layer structure. Figure 11 shows the probability maps using the MLP-ANN approach from the training/validation data of single-layer, double-layer and triple-layer structures shown in Figure 10. Figure 11a shows the probability map of the single-layer structure, Figure 11b presents the probability map of double-layer structure, and Figure 11c shows the triple-layer structure. From the probability maps, we can recognize that (i) most pixels in the single-layer probability map have relatively low values of less than 30%, (ii) most pixels in the double-layer probability map have values between 40% and 60%, and (iii) some pixels in the triple-layer probability map have probability values higher than 80%, while some pixels have probability values lower than 20%. The low values in the single-layer probability maps illustrated that the single-layer forest almost did not exist in the study area. In the triple-layer probability map, the probability values greater than 80% indicate that the triple-layer forest is likely to be in the higher value area, while the probability values higher than 20% mean that there is almost no probability that the triple-layer forest is likely to be in the lower value area. This may mean that the possibility of triple-layer forest identification could be clearly distinguished. The probability of the double-layer forest ranged between 40% and 60%. The probability density function (pdf) of the double-layer forest was closer to the Gaussian distribution than the single and triple-layer forests. This may show that the double-layer forest can be dominant in the study area. Additionally, this may also indicate that the double-layer forest is difficult to distinguish clearly from the single and triple-layer forests.
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 17 probability map have probability values higher than 80%, while some pixels have probability values lower than 20%. The low values in the single-layer probability maps illustrated that the single-layer forest almost did not exist in the study area. In the triple-layer probability map, the probability values greater than 80% indicate that the triple-layer forest is likely to be in the higher value area, while the probability values higher than 20% mean that there is almost no probability that the triple-layer forest is likely to be in the lower value area. This may mean that the possibility of triple-layer forest identification could be clearly distinguished. The probability of the double-layer forest ranged between 40% and 60%. The probability density function (pdf) of the double-layer forest was closer to the Gaussian distribution than the single and triple-layer forests. This may show that the doublelayer forest can be dominant in the study area. Additionally, this may also indicate that the doublelayer forest is difficult to distinguish clearly from the single and triple-layer forests.  Figure 12 shows the vertical structure map classified from the single, double and triple-layer forest maps of Figure 11 using the maximum operation. Very few single-layer forests existed in the vertical structure map because the single-layer forest map has lower possibility values. Double-layer forests were dominant in the study area, and triple-layer forests were found in the areas with high possibility values in the triple-layer forest map. The classification map shows that the percentage of single-layer forests were about 0.5%, the double-layer forests were about 61.1%, and the triple layer forest were about 38.4%, respectively. Due to the fact that natural forests with various vertical structures were mostly present in the study area, it would be expected that the double and triple-layer forests were dominant, while the single-layer forests were rare.  Figure 12 shows the vertical structure map classified from the single, double and triple-layer forest maps of Figure 11 using the maximum operation. Very few single-layer forests existed in the vertical structure map because the single-layer forest map has lower possibility values. Double-layer forests were dominant in the study area, and triple-layer forests were found in the areas with high possibility values in the triple-layer forest map. The classification map shows that the percentage of single-layer forests were about 0.5%, the double-layer forests were about 61.1%, and the triple layer forest were about 38.4%, respectively. Due to the fact that natural forests with various vertical structures were mostly present in the study area, it would be expected that the double and triple-layer forests were dominant, while the single-layer forests were rare.
To evaluate the classification result, the test dataset was extracted from the field survey based forest vertical structure map of Figure 2, excluding the training/validation dataset. The error matrix was used to evaluate the classification accuracy from the test data. The overall accuracy estimated from the error matrix was about 65.7% (Table 3). The overall accuracy was not very high. Additionally, the fact that this test was performed for a limited mountain region also needs to be considered, as shown in Figure 3. Nevertheless, this may indicate the limitation of the forest vertical structure mapping using satellite images and topographic data. If full waveform Lidar data can be used for the classification using the ANN approach, the accuracy would be higher. However, the full waveform Lidar data were not available in most cases, because it is not cost effective. Since the optical and radar satellite images and topographic data, including DSM and DTM, are available in most cases, mapping only by using the data enables us to create forest vertical structure maps time-and cost-effectively. Thus, from the result, it can be concluded that the forest vertical structure map can be created by using the satellite and topographic data with an overall accuracy of about 65.7% time and cost-effectively. forest maps of Figure 11 using the maximum operation. Very few single-layer forests existed in the vertical structure map because the single-layer forest map has lower possibility values. Double-layer forests were dominant in the study area, and triple-layer forests were found in the areas with high possibility values in the triple-layer forest map. The classification map shows that the percentage of single-layer forests were about 0.5%, the double-layer forests were about 61.1%, and the triple layer forest were about 38.4%, respectively. Due to the fact that natural forests with various vertical structures were mostly present in the study area, it would be expected that the double and triple-layer forests were dominant, while the single-layer forests were rare.  The user and producer accuracies of the single-layer forests were about 4.3% and 3.7%, respectively. The accuracies were very low. Thus, we can say that identifying the single-layer forests was not successful. This is because the number of the training and test data in the single-layer forests were not high enough to train the MLP-ANN model (see Figure 10). The single-layer forests were classified into double-layer (79.6%) and triple-layer (16.7%) forests. It means that the single and double-layer forests could not be separated in this study. The user and producer accuracies of the double-layer forests were, respectively, about 78.0% and 70.9%. The double-layer accuracy was higher than the single and triple-layer accuracy. This is because the MLP-ANN model parameters of the double-layer forests were well trained. Nevertheless, about 24.0% of the double-layer forests were mis-classified as a triple-layer forest. Finally, the user and producer accuracies of the triple-layer forests were, respectively, about 45.2% and 55.0%. The accuracy in the triple-layer forest was much higher than in the single-layer forest, but lower than in the triple-layer forest. About 30.5% of the triple-layer forest was mis-classified as a double-layer forest. Thus, it means that 30.5% of the triple-layer forest was very similar to the pattern of the double-layer forest.
The results clearly show that the stratification structure of forests can be determined using satellite images. As mentioned above, information on the vertical structure, along with the horizontal structure of the forest, is essential to estimate the exact forest aboveground biomass [6]. The result map of forest structure in this study shows relatively high accuracy in the double and triple-layer, which have a high ratio in the study area. Therefore, the forest vertical structure map can be used to estimate the forest aboveground biomass more accurately by reflecting the biomass difference of each layer structure. Furthermore, the map enables us to identify forest vitality and environmental impacts.
This study presents limitations in the timing of data collection discrepancies and a lack of training data. In addition, many previous studies that have conducted an analysis of forest vertical structure are mostly based on LiDAR data [29,30], which are subjected to regional, time and cost limitations. In the future, more accurate results could be expected if satellite images from the same time period and additional forest vertical reference data are used for research. Comparison analysis with classification results, using the recently used deep learning technique, may increase the applicability of machine learning in forest research. By complementing the limitations of this study, it is expected that better results will be obtained in the future analysis of forest vertical structure and estimation of forest carbon cycle through remote sensing and machine learning. The proposed forest vertical structure can be used as basic data to establish a plan for coping with global warming by enabling more accurate carbon fixation and forest aboveground biomass estimation than conventional methods.

Conclusions
The purpose of this study is forest vertical structure analysis for forest aboveground biomass estimation in Jeju Island. Machine learning technique of the MLP-ANN model was applied based on remote sensing data of by using the Kompsat-3 optical satellite image, existing DTM and DSM data, and ALOS-1 PALSAR-1 radar satellite image. When using machine learning techniques, building accurate input data may result in a forest vertical structure for accurate carbon uptake estimates. Therefore, the input data was constructed by producing NDVI, NDWI, NDVI texture, NDWI texture, avgCH, stdCH, CC1, and CC2 maps. The input layer was produced in consideration of the texture of the canopy, the difference in tree height, and the image texture due to the difference in tree height. As a result of classifying the input layers to ANN, the double-layer structure, which occupies the most area and the second highest carbon absorption rate, showed a relatively accurate 70.9% classification in producer accuracy, while the single and triple-layer structures showed approximately 3.7% and 55.0% of producer accuracy.
Understanding horizontal and vertical structures of forest is an important factor in estimating forest aboveground biomass. The results of this study show the possibility of more accurate estimation of forest aboveground biomass using various types of satellite images through the construction of forest vertical structure inventory. Specifically, in order to estimate a forest carbon absorption by detecting the change of vertical structure in a national-scale forest, it is essential to use satellite imagery with periodical acquisition. Accordingly, it is possible to estimate forest aboveground biomass more precisely by using a vertical structure constructed by using satellite images along with the species and horizontal structure data of forests.
Forest structure, which is a key factor in forest aboveground biomass estimation, is highly uncertain due to the nature of forests in various forms over a wide area. In particular, the use of remote sensing images is essential for forests distributed in inaccessible regions, since there is a limitation in constructing forest inventory using field investigation. Additionally, machine learning techniques could be used to estimate the structure of forests, as applied in this study. Thus, the combination of remote sensing and machine learning techniques to classify and analyze large amounts of data is very effective in forestry, especially for a national scale analysis.
Remote sensing data can be used to generate forest vertical structure maps for large areas on the national scale. NDVI, NDWI, NDVI texture and NDWI texture variables could be generated from optical remote sensing data; canopy height maps and coherence maps could be generated from SAR imagery and existing DTM and DSM data. Future research should include the mapping of forest vertical structures by each layer. Various types of remote sensing data acquired at the same time could be used to create forest vertical information more accurately. The study results represented that continuing research in these areas, by using the capability of remote sensing data to estimate forest vertical structures in single and multi-layers, could lead to a considerable improvement in forestry.

Conflicts of Interest:
The authors declare no conflict of interest.