Machine Learning for Tree Species Classification Using Sentinel-2 Spectral Information, Crown Texture, and Environmental Variables

The most recent forest-type map covering the entire Korean Peninsula was produced in 1910. Forest-type maps of South Korea alone have been produced since 1972. However, forest-type information for North Korea, an inaccessible region, has been unavailable since the division of the peninsula after the Korean War. In this study, we developed a model to classify the five dominant tree species in North Korea (Korean red pine, Korean pine, Japanese larch, needle fir, and oak) using satellite data and machine-learning techniques. To evaluate the model’s applicability to North Korea, it was applied to three areas: the Gwangneung Forest area in South Korea; the Mt. Baekdu area of China, which borders North Korea; and Goseong-gun, which straddles the border between South Korea and North Korea. The model achieved 83% accuracy in the Gwangneung Forest area. It achieved even higher accuracies of 91% in the Mt. Baekdu area and 90% in Goseong-gun. These results confirm the model’s regional applicability. To extend the model to North Korea, an integrated model was developed by combining the training data from the three study areas. The integrated model classified forest types in Goseong-gun (South Korea) with relatively high accuracy (80%). It was therefore used to produce a map of the predicted dominant tree species in Goseong-gun (North Korea).


Introduction
Forest-type maps show the distribution of features within forested areas. They contain information on various forest attributes, such as forest type, physiognomy, plant species, diameter at breast height (DBH) class, age class, crown density, and stand height. They also serve as a data source for national-scale topographic, soil, and geological maps [1].
Forest-type maps of South Korea were produced at a scale of 1:25,000 from 1972 to 2012. Since 2012, they have been produced at a scale of 1:5000. This scale is as detailed as, or more detailed than, the forest-type maps of other countries, such as Japan (1:5000, national forests only), Canada (1:20,000, managed forests only), France (1:25,000), and Sweden (1:10,000). However, no forest map or forest-type map has covered the entire Korean Peninsula since the first Korean forest-distribution map was produced in 1910 [2].
Information on the composition and distribution of species in forests is essential for the sustainable management of large forest areas [3,4]. Additionally, this information promotes the accurate assessment of forest resources, such as growing stock and biomass; the establishment of a national forest

Data
The data used in this study were drawn from satellite images and geographic information system (GIS) data. Two types of satellite imagery were employed: Sentinel-2 images, and very high-resolution satellite (VHRS) images from GeoEye-1 and WorldView-3. Sentinel-2 images were used to acquire spectral information on the target species. VHRS images were used to acquire crown characteristics and textural information (Table 1). For the GIS data, the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) was used to acquire topographic information on the growth environment of the target species. In addition, 1:5000-scale forest-type maps produced by the Korea Forest Service were used to generate training and validation data for species classification. We also generated training and validation data for the Mt. Baekdu area and used them to validate tree-species classification accuracy in that area.

Methodology
In this study, the tree-species classification model used not only leaf reflectance spectra but also crown texture and the tree growth environment. Although each species showed different spectral reflectance values, the differences were numerically small [10]. Therefore, growth-environment and textural information were used to supplement the reflectance spectra and improve the classification model.
The Gwangneung Forest was investigated to establish the basic structure of the species classification model. Sentinel-2 images were preprocessed to assess the spectral characteristics of each species. Sentinel-2 Level-1C images provide top-of-atmosphere (TOA) reflectance. Atmospheric correction was therefore performed with Sen2Cor [12] to obtain the surface reflectance of each species; Sen2Cor is the Sentinel-2 atmospheric correction processor provided by the European Space Agency (ESA). Sentinel-2 images were available for multiple dates. To select the optimal acquisition time for species classification, the spectral separability between tree species on each date was evaluated using the Jeffries-Matusita (JM) distance. JM distances range from 0 (no separation) to 1.414 (complete separation). The JM distance is commonly used to quantify the degree of separation between classes [13,14]. Next, elevation, slope, and aspect maps were derived from the DEM and used to analyze the typical growth environment of each species.
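The JM separability check can be illustrated with a short Python sketch. It assumes that each species' reflectance is roughly multivariate Gaussian, which is the usual basis of the JM distance. It also ranks acquisition dates by the mean JM distance over all species pairs; this is an assumption, because the study does not state how pairwise distances were aggregated. The functions `jm_distance` and `best_acquisition` and the synthetic data are hypothetical illustrations, not the authors' code.

```python
import itertools
import numpy as np

def jm_distance(a, b):
    """Jeffries-Matusita distance between two classes, assuming Gaussian class distributions.

    a, b: arrays of shape (n_samples, n_bands), e.g. surface reflectance at sample points.
    Returns a value between 0 (no separation) and sqrt(2) ~ 1.414 (complete separation).
    """
    d = a.mean(axis=0) - b.mean(axis=0)
    s1 = np.atleast_2d(np.cov(a, rowvar=False))
    s2 = np.atleast_2d(np.cov(b, rowvar=False))
    s = (s1 + s2) / 2
    # Bhattacharyya distance; log-determinants avoid underflow for small reflectance variances
    _, logdet_s = np.linalg.slogdet(s)
    _, logdet_s1 = np.linalg.slogdet(s1)
    _, logdet_s2 = np.linalg.slogdet(s2)
    bhat = d @ np.linalg.solve(s, d) / 8 + 0.5 * (logdet_s - 0.5 * (logdet_s1 + logdet_s2))
    return float(np.sqrt(max(0.0, 2 * (1 - np.exp(-bhat)))))

def best_acquisition(samples):
    """samples: {date_label: {species: array (n_samples, n_bands)}}.

    Returns the date with the highest mean pairwise JM distance, plus the per-date scores.
    """
    scores = {}
    for date, by_species in samples.items():
        pairs = itertools.combinations(by_species.values(), 2)
        scores[date] = float(np.mean([jm_distance(a, b) for a, b in pairs]))
    return max(scores, key=scores.get), scores

# Synthetic example (hypothetical values): three bands, species well separated
# in "May" but barely separated in "October"
rng = np.random.default_rng(0)

def draw(mean):
    return rng.normal(mean, 0.02, size=(200, 3))

samples = {
    "May": {"pine": draw(0.10), "larch": draw(0.20), "oak": draw(0.30)},
    "October": {"pine": draw(0.10), "larch": draw(0.11), "oak": draw(0.12)},
}
best, scores = best_acquisition(samples)
print(best, scores)
```

Averaging over species pairs is only one option. Taking the minimum pairwise JM distance instead would favour the date on which the hardest-to-separate pair is best separated.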
Textural information was generated from Sentinel-2 and VHRS images using the gray-level co-occurrence matrix (GLCM) technique. This captured the crown textural characteristics of different species (e.g., coniferous versus broad-leaved trees). To select the optimal window size for the GLCM, 15 window sizes were tested, ranging from 3 × 3 (30 m × 30 m) to 59 × 59 (590 m × 590 m). The variance of the resulting textural information was compared, and the optimal window size was selected. The textural inputs were further refined by comparing the classification accuracies obtained with Sentinel-2-based and VHRS-based textural information. The model's applicability to North Korea was evaluated by applying it to the Mt. Baekdu area and Goseong-gun. An integrated model was also developed using the combined training data from the Gwangneung Forest, the Mt. Baekdu area, and Goseong-gun. This integrated model was then used to produce a map of predicted tree species in Goseong-gun (North Korea) (Figure 2).

Analysis of Spectral Characteristics and Separability
To evaluate the spectral characteristics of the target species, spectral information from April (leaf emergence) to October (leaf fall) was analyzed. Random points were generated based on the forest-type map. Surface reflectance spectra were then extracted at these points from the atmospherically corrected Sentinel-2 images. The mean values and standard deviations for each species were calculated and compared. The spectral separation between species was evaluated with the JM distance to select the best time for species classification [13].
Remote Sens. 2020, 12, 2049

Analysis of Crown Textural Information
Currently, forest-type maps of South Korea are produced by interpreting digital aerial photographs and confirming the distribution and attributes of forest types through fieldwork. This method relies heavily on the crown characteristics of each species, such as texture, color tone, and pattern, to differentiate species. The crown of the Korean red pine is umbrella-shaped and has a star shape, with branches spreading in all directions and a low density of leaves attached to each branch. The Korean pine's crown is umbrella-shaped, similar to that of the Korean red pine, but it has a white, droplet-like ridge at the end of each branch, which resembles snowflakes. It also has a less saturated silvery teal color than most other coniferous species. The crown shape of the Japanese larch is conical, with branches converging toward the center like a rose and extending skyward. The higher the tree's age class, the more widely its branches will spread on all sides.
The crowns of the Korean fir (Abies koreana Wilson) and the Khingan fir (Abies nephrolepis (Trautv. ex Maxim.) Maxim.) are conical; both species are classified as needle firs in the current forest-type map. In the Korean fir, the top of the tree is centered, the apex is white, and the branches are densely layered like a pinwheel. In the Khingan fir, the branches gather around the top of the tree and are mostly dense, forming a stable horn shape [15]. The crown shape of the oak is irregular, and its texture is uniform. Oak crowns are lighter in color than those of conifers but somewhat darker than those of other broad-leaved trees. They are also dense and show the same color over large areas [16].
Textural information can help improve the accuracy of species classification [10,17]. In this study, the GLCM technique was used to quantify the crown-shape characteristics that are described qualitatively in the stereoscopic interpretation of aerial photographs [15]. The GLCM is widely used in remote sensing to produce tree-species distribution maps [10,18–26]. The GLCM tabulates how often pairs of gray levels occur at a specified distance and direction from each other within an image region. Texture is quantified from the frequency with which pairs of pixel brightness values co-occur within a user-defined moving window (kernel). In this study, seven measures were considered: angular second moment (ASM), contrast (CON), dissimilarity (DIS), entropy (ENT), homogeneity (HOM), mean (MEAN), and variance (VARIANCE). These GLCM texture parameters were calculated according to the following equations [14]:
$$
\begin{aligned}
\mathrm{ASM} &= \sum_{i=1}^{\mathit{quant}_k}\sum_{j=1}^{\mathit{quant}_k} h_c(i,j)^2\\
\mathrm{CON} &= \sum_{i=1}^{\mathit{quant}_k}\sum_{j=1}^{\mathit{quant}_k} (i-j)^2\, h_c(i,j)\\
\mathrm{DIS} &= \sum_{i=1}^{\mathit{quant}_k}\sum_{j=1}^{\mathit{quant}_k} |i-j|\, h_c(i,j)\\
\mathrm{ENT} &= -\sum_{i=1}^{\mathit{quant}_k}\sum_{j=1}^{\mathit{quant}_k} h_c(i,j)\,\ln h_c(i,j)\\
\mathrm{HOM} &= \sum_{i=1}^{\mathit{quant}_k}\sum_{j=1}^{\mathit{quant}_k} \frac{h_c(i,j)}{1+(i-j)^2}\\
\mathrm{MEAN} &= \mu = \sum_{i=1}^{\mathit{quant}_k}\sum_{j=1}^{\mathit{quant}_k} i\, h_c(i,j)\\
\mathrm{VARIANCE} &= \sum_{i=1}^{\mathit{quant}_k}\sum_{j=1}^{\mathit{quant}_k} (i-\mu)^2\, h_c(i,j)
\end{aligned}
\tag{1}
$$
where quant_k is the quantization level of band k (e.g., 2^8 levels, i.e., 0 to 255), and h_c(i,j) is the (i,j)th entry in the normalized spatial-dependency (co-occurrence) matrix. In the entropy term, 0 ln 0 is taken as 0. Textural feature analysis was performed using the 'glcm' package in R v. 3.6.1 [27]. Texture features were generated for the growth period (May) and the non-growth period (October).
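To make the texture measures concrete, the following NumPy sketch builds a symmetric, normalized GLCM for one window and evaluates the seven measures. A moving-window helper then turns a quantized band into a texture band. This is not the R 'glcm' package used in the study. The following are all assumptions for illustration:

- the 32 grey levels;
- the single (0, 1) offset;
- reflect padding at image edges;
- the function names.

```python
import numpy as np

def quantize(img, levels=32):
    """Linearly rescale an image to integer grey levels 0..levels-1 (32 levels is an assumed value)."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        return np.zeros(img.shape, dtype=int)
    q = np.floor((img - lo) / (hi - lo) * levels)
    return np.clip(q, 0, levels - 1).astype(int)

def glcm(window, levels, offset=(0, 1)):
    """Symmetric, normalised co-occurrence matrix h_c(i, j) for one window.

    offset = (rows down, cols right), both >= 0; (0, 1) pairs each pixel with its right neighbour.
    """
    dr, dc = offset
    rows, cols = window.shape
    ref = window[:rows - dr, :cols - dc]
    nbr = window[dr:, dc:]
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)   # count each (reference, neighbour) grey-level pair
    m = m + m.T                                  # count each pair in both directions
    return m / m.sum()

def texture_measures(h):
    """ASM, CON, DIS, ENT, HOM, MEAN and VARIANCE of a normalised GLCM.

    Grey levels are indexed from 0 here, so MEAN is one less than with 1-based levels;
    the other six measures are unaffected by this shift.
    """
    i, j = np.indices(h.shape)
    mu = np.sum(i * h)
    nz = h > 0                                   # treat 0 * log(0) as 0
    return {
        "ASM": np.sum(h ** 2),
        "CON": np.sum((i - j) ** 2 * h),
        "DIS": np.sum(np.abs(i - j) * h),
        "ENT": -np.sum(h[nz] * np.log(h[nz])),
        "HOM": np.sum(h / (1 + (i - j) ** 2)),
        "MEAN": mu,
        "VARIANCE": np.sum((i - mu) ** 2 * h),
    }

def texture_image(q, size, name, levels, offset=(0, 1)):
    """Slide a size x size window over a quantised band and return one texture measure per pixel."""
    half = size // 2
    padded = np.pad(q, half, mode="reflect")
    out = np.empty(q.shape)
    for r in range(q.shape[0]):
        for c in range(q.shape[1]):
            window = padded[r:r + size, c:c + size]
            out[r, c] = texture_measures(glcm(window, levels, offset))[name]
    return out
```

For example, `texture_image(quantize(band), 31, "VARIANCE", levels=32)` would produce a 31 × 31 variance texture band. The nested Python loop is slow for full scenes; it is written for readability rather than speed.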
Classification accuracy can vary with the window size used to generate textural information [28–30]. Thus, the optimal window size was selected as described in Section 3.2.

Analysis of Growth Environment
Preferred growth environments differ between species, and each growth environment is shaped by various factors. This study first investigated the elevation, slope, and aspect of each species' growth environment. Korean red pine usually grows below 1300 m in mountainous regions, except in the northern highlands and on high mountain peaks [31]. Korean pine grows mainly on the ridges of high mountains north of Mt. Jiri, at elevations above 1000 m [31]. Needle fir grows naturally on high mountain ridges or in valleys in the north-central part of the study area, above 1000 m [31]. Oak is distributed nationwide below 1800 m [31]. However, pure forests of Mongolian oak (Quercus mongolica Fisch. ex Ledeb.) occur at high elevations in the mountains [31–33]. The locations of the training data were extracted from the 1:5000-scale forest-type map. At these points, elevation, slope, and aspect values were extracted from the DEM. The mean and standard deviation for each species were then calculated and compared.
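Deriving slope and aspect from the DEM and summarizing them at training points might look like the following sketch. It assumes the SRTM DEM has been projected to a north-up grid with square cells in metres. It uses NumPy's central differences rather than the 3 × 3 (Horn) kernel used by many GIS tools, so values can differ slightly near breaks in slope. The sample standard deviation (`ddof=1`) is an assumed choice. `slope_aspect` and `summarize_by_species` are hypothetical helper names.

```python
import numpy as np

def slope_aspect(dem, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north; -1 for flat cells).

    Assumes a north-up raster: row index increases southward, column index eastward.
    """
    dz_drow, dz_dcol = np.gradient(dem.astype(float), cell_size)
    dz_dx = dz_dcol          # elevation change per metre toward east
    dz_dy = -dz_drow         # elevation change per metre toward north
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the downslope direction, i.e. opposite to the gradient vector
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360
    aspect = np.where((dz_dx == 0) & (dz_dy == 0), -1.0, aspect)
    return slope, aspect

def summarize_by_species(layer, rows, cols, species):
    """Mean and sample standard deviation of a raster layer at training points, per species."""
    values = layer[rows, cols]
    return {s: (values[species == s].mean(), values[species == s].std(ddof=1))
            for s in np.unique(species)}

# Example: a plane rising 10 m per 30 m cell toward the east faces west (aspect 270)
dem = np.tile(np.arange(5) * 10.0, (4, 1))
slope, aspect = slope_aspect(dem, 30.0)
print(slope[0, 0], aspect[0, 0])
```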

Development of Species Classification Algorithm
A random forest (RF) model was used as the tree-species classification algorithm. The RF model was developed by Breiman and Cutler [34]. We confirmed its suitability for species classification in a previous study, in which Korean pine and Japanese larch were classified with an accuracy of 98% [10]. To assign a class to a pixel, an RF builds multiple decision trees from the attribute values associated with that pixel, such as spectral information, elevation, slope, and aspect. Each decision tree assigns the pixel to a class, and the pixel is finally classified by majority vote across the trees [14,34–37]. Each tree is grown on a bootstrap sample of the training data, which contains roughly two-thirds of the observations. The remaining roughly one-third are treated as "out-of-bag" (OOB) data and used to evaluate the model. The OOB error is the number of misclassified OOB observations divided by the total number of OOB observations. It can be used to select the optimal model [38–40]. Random forests can also identify important variables and quantify their importance based on the decrease in the Gini index [35,41]. The Sentinel-2 spectral bands, elevation, aspect, slope, and GLCM texture features were used as input data for the RF.
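The study does not name its RF software, so the following is only a toy NumPy sketch of the mechanics described above:

- bootstrap sampling;
- random feature subsets;
- majority voting;
- OOB error;
- Gini-based importance.

To stay short, it uses depth-one trees (stumps) instead of fully grown trees. It is therefore not the authors' model, and its accuracy says nothing about the real classifier. The names `fit_stump`, `random_forest`, and `predict`, and the synthetic data, are hypothetical.

```python
import numpy as np

def gini(y, n_classes):
    """Gini impurity of an integer label vector."""
    p = np.bincount(y, minlength=n_classes) / len(y)
    return 1.0 - np.sum(p ** 2)

def fit_stump(X, y, features, n_classes):
    """Best single Gini split among the candidate features (a depth-one tree)."""
    parent = gini(y, n_classes)
    majority = np.bincount(y, minlength=n_classes).argmax()
    best = (features[0], np.inf, majority, majority, 0.0)   # fallback: no useful split
    for f in features:
        values = np.unique(X[:, f])
        for t in (values[:-1] + values[1:]) / 2:            # midpoints between sorted unique values
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            child = (len(left) * gini(left, n_classes) + len(right) * gini(right, n_classes)) / len(y)
            if parent - child > best[4]:
                best = (f, t, np.bincount(left, minlength=n_classes).argmax(),
                        np.bincount(right, minlength=n_classes).argmax(), parent - child)
    return best

def random_forest(X, y, n_trees=100, mtry=None, seed=0):
    """Bagged stumps on random feature subsets; returns stumps, OOB error and normalised Gini importance."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_classes = int(y.max()) + 1
    mtry = mtry or max(1, int(np.sqrt(p)))
    oob_votes = np.zeros((n, n_classes))
    importance = np.zeros(p)
    stumps = []
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)           # bootstrap sample, drawn with replacement
        oob = np.setdiff1d(np.arange(n), boot)      # observations this tree never saw (~1/3)
        feats = rng.choice(p, size=mtry, replace=False)
        f, t, left_cls, right_cls, gain = fit_stump(X[boot], y[boot], feats, n_classes)
        stumps.append((f, t, left_cls, right_cls))
        importance[f] += gain                       # Gini decrease, summed over trees
        oob_votes[oob, np.where(X[oob, f] <= t, left_cls, right_cls)] += 1
    voted = oob_votes.sum(axis=1) > 0
    # OOB error: misclassified OOB observations / all observations that received OOB votes
    oob_error = float(np.mean(oob_votes[voted].argmax(axis=1) != y[voted]))
    total = importance.sum()
    return stumps, oob_error, (importance / total if total > 0 else importance)

def predict(stumps, X, n_classes):
    """Majority vote of all stumps for each row of X."""
    votes = np.zeros((len(X), n_classes))
    for f, t, left_cls, right_cls in stumps:
        votes[np.arange(len(X)), np.where(X[:, f] <= t, left_cls, right_cls)] += 1
    return votes.argmax(axis=1)

# Synthetic check: only feature 0 carries class information
demo_rng = np.random.default_rng(1)
X = demo_rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)
stumps, oob_error, importance = random_forest(X, y, n_trees=100, mtry=2)
print(f"OOB error: {oob_error:.3f}; importance: {np.round(importance, 3)}")
```

In a production setting, a library implementation with fully grown trees would replace this sketch; the OOB bookkeeping and the impurity-based importance work the same way.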

Spectral Characteristics and Separability
Figure 3 illustrates the mean spectral characteristics of the target species within the April-October period, together with their standard deviations. Overall, spectral separation among the target species was low throughout the April-October period, although it was relatively high in May. May images were therefore used for the classification. Even in May, however, the spectral separation between species was not large. This indicated the need to include growth-environment and textural information in addition to spectral information (Figure 4).

Crown Texture
To select texture factors with low redundancy, separability was evaluated using the JM distance. All of the factors were highly separable, so all were used for the classification. The three texture measures that contributed most to classification accuracy (mean, variance, and homogeneity) were compared. The mean was high when many pixel pairs showed large differences in contrast. The variance was high when the brightness differences between paired pixels were large. Finally, homogeneity reflected both regional uniformity and the uniformity of individual pixels in the matrix. It was higher when more of the GLCM entries were concentrated along the main diagonal [42].
To select the optimal window size, the variance of the textural information was compared across the 15 window sizes. As the window size increased, the spread of the mean, variance, and homogeneity values within each species gradually decreased (Figure 5). In other words, intra-species textural characteristics became more homogeneous. However, this did not mean that the largest window size, 59 × 59, was optimal. Window sizes smaller than 51 × 51 are generally used in remote sensing applications [26]. If the window size approaches the area of the target forest, the textural information of that forest cannot be adequately extracted [26]. The smallest stand of a target species in the Gwangneung Forest covered 23.5 ha. This is smaller than a 50 × 50 window, which covers 25 ha at a 10 m pixel size. A window smaller than this area was therefore selected. Further, the non-growth-period variance of the five species was homogeneous, indicating regional uniformity in the GLCM, and it converged at a window size of 31 × 31 (Figure 5). Overall, the textural information of the five species displayed relatively low variance, and textural information of similar quality was generated for all species.
The standard deviation of the variance among species was lowest at a window size of 31 × 31 (Figure 6). The JM-distance evaluation of GLCM separability among species showed that overall separability decreased as the window size increased (Figure 7). The exception was the separability between Korean pine and oak, which increased up to a window size of 31 × 31. Therefore, a 31 × 31 window was selected for the analysis. This size produced textural information of uniform quality across species while keeping the separability between species relatively high.
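The window-size reasoning reduces to simple arithmetic, sketched below in Python. The sketch assumes a 10 m pixel, which is consistent with the "3 × 3 (30 m × 30 m)" window given in the Methodology. It also assumes the 15 tested sizes were evenly spaced odd sizes (3, 7, ..., 59). The text does not list them, but this spacing does include the 3 × 3, 31 × 31, and 59 × 59 windows it names. Each window is flagged according to whether it covers less ground than the 23.5 ha smallest stand.

```python
PIXEL_M = 10              # assumed Sentinel-2 pixel size (3 x 3 window = 30 m in the text)
SMALLEST_STAND_HA = 23.5  # smallest target-species stand in the Gwangneung Forest

def window_area_ha(size, pixel_m=PIXEL_M):
    """Ground area covered by a size x size moving window, in hectares."""
    side_m = size * pixel_m
    return side_m * side_m / 10_000        # 1 ha = 10,000 m^2

sizes = list(range(3, 60, 4))            # assumed even spacing: 3, 7, ..., 59 (15 sizes)
candidates = [s for s in sizes if window_area_ha(s) < SMALLEST_STAND_HA]

for s in sizes:
    flag = "smaller than stand" if s in candidates else "too large"
    print(f"{s:>2} x {s:<2} {s * PIXEL_M:>4} m  {window_area_ha(s):6.2f} ha  {flag}")
```

Under these assumptions, a 50 × 50 window covers 25 ha. The largest qualifying tested size is 47 × 47 (about 22.1 ha), and the selected 31 × 31 window covers 9.61 ha.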
Figure 8. Comparison of textural information between growing and non-growing periods. GLCM mean G and GLCM mean NG show the crown-texture mean in the growing and non-growing seasons, respectively. GLCM variance G and GLCM variance NG show the corresponding crown-texture variance. GLCM homogeneity G and GLCM homogeneity NG show the corresponding crown-texture homogeneity. Abbreviations: G, growth; NG, non-growth.

Comparison of Growth Environments
The mean elevations of the species distributions in the study area were as follows:

- Korean red pine: 185 ± 71 m;
- Korean pine: 192 ± 76 m;
- Japanese larch: 196 ± 69 m;
- needle fir: 145 ± 58 m;
- oak: 211 ± 82 m.

Needle firs were distributed at the lowest elevations and oak at the highest (Table 2, Figure 9). Previous studies indicated that Korean pines and needle firs occur in high mountains, but in the study area they were distributed below 270 m. According to the attributes in the forest-type map, both the Korean pine and needle fir stands at the study site are artificial forests.
According to the literature, Korean red pine is relatively selective in its environmental requirements. It is mainly distributed on south-facing slopes, which are relatively dry and barren compared with north-facing slopes [43]. Korean pine appears most commonly on south-westerly aspects but is considered to grow well on all aspects. It is well adapted to non-eroded mountainous areas with good drainage and tends to avoid dry, windy ridges [44]. Japanese larch is the most selective of these species. It grows most vigorously in sunny areas, so north-facing slopes should be avoided when it is planted [45]. Khingan and Korean firs (needle firs) occupy a larger proportion of north-facing slopes, where moisture conditions are more suitable than on south-facing slopes [46]. Oak is generally shade-intolerant and is thought to prefer moderately to slightly humid sites. South-facing slopes offer favorable light conditions, but their relatively dry soil hinders oak growth. In contrast, north-facing slopes provide favorable humidity and nutrient conditions, although light conditions are less advantageous [47].
Figure 9. Comparison of species growth environments (elevation, aspect, slope).
In the study area, Korean red pine was mainly distributed on southeast-facing slopes (25%), consistent with the pattern described in the literature. Korean pine was distributed evenly across all aspects. It occurred in the highest proportion on east-facing slopes (21%) because of their relatively favorable moisture conditions (Figure 9). Japanese larch was mainly distributed on east-facing slopes (19%), likely owing to the high availability of sunlight. Needle firs were mainly distributed on south-facing slopes (21%), unlike the distribution described in the literature. This discrepancy may be attributable either to inappropriate site selection during afforestation or to competition from other species after afforestation. Oak was distributed on all aspects but was most abundant on west-facing slopes (17%). The mean slopes of the species distributions were as follows:

- Korean red pine: 14.8° ± 5.9°;
- Korean pine: 15.1° ± 6.3°;
- Japanese larch: 15.1° ± 5.9°;
- needle fir: 12.0° ± 6.0°;
- oak: 16.6° ± 6.8° (Table 2, Figure 9).

Needle firs thus occupied the gentlest slopes and oak the steepest. We assume that Korean pine, Japanese larch, and needle fir were planted at relatively low elevations and on gentle slopes.
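Percentages like those above come from binning aspect values into directional classes and counting training points per species. A minimal sketch follows. It assumes eight 45° classes centred on the cardinal and intercardinal directions; the study does not state its class boundaries. It also assumes that aspect is in degrees clockwise from north, with flat cells flagged as -1 and excluded. `aspect_class` and `aspect_shares` are hypothetical names.

```python
import numpy as np

CLASSES = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def aspect_class(aspect_deg):
    """Map aspect (degrees clockwise from north) to eight 45-degree classes centred on N, NE, ..."""
    idx = ((np.asarray(aspect_deg, dtype=float) + 22.5) % 360 // 45).astype(int)
    return np.array(CLASSES)[idx]

def aspect_shares(aspect_deg, species):
    """Percentage of each species' sample points in each aspect class; flat cells (aspect < 0) are dropped."""
    aspect_deg = np.asarray(aspect_deg, dtype=float)
    species = np.asarray(species)
    valid = aspect_deg >= 0
    classes = aspect_class(aspect_deg[valid])
    species = species[valid]
    return {
        s: {k: 100.0 * np.mean(classes[species == s] == k) for k in CLASSES}
        for s in np.unique(species)
    }

# Hypothetical sample points
shares = aspect_shares([0, 90, 135, 140, 180, 270, -1],
                       ["pine", "pine", "pine", "pine", "oak", "oak", "oak"])
print(shares["pine"]["SE"], shares["oak"]["S"])
```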

Gwangneung Forest Area
Using the proposed model, the overall accuracy of species classification was 83%, and the kappa index (0.83) also indicated relatively high accuracy (Table 3). For Korean red pine, the producer's accuracy was 81% and the user's accuracy was 86%. Because its producer's accuracy was lower than its user's accuracy, the model underestimated the abundance of Korean red pine. Korean pine was overestimated, with 79% producer's accuracy and 67% user's accuracy. Japanese larch was underestimated, with 78% producer's accuracy and 81% user's accuracy. The highest accuracy was achieved for needle fir, with 92% producer's accuracy and 94% user's accuracy. Oak was underestimated, with 84% producer's accuracy and 85% user's accuracy.
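The overall accuracy, kappa index, and producer's and user's accuracies reported here all derive from an error (confusion) matrix. The sketch below assumes that rows hold the classified (map) labels and columns hold the reference labels. Under that layout, a class is underestimated when its producer's accuracy is below its user's accuracy, because fewer points were mapped to it than exist in the reference. `accuracy_report` is a hypothetical helper, and the matrix is a made-up example, not Table 3.

```python
import numpy as np

def accuracy_report(cm, labels):
    """cm[i][j] = number of validation points mapped as class i whose reference class is j."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    mapped = cm.sum(axis=1)        # row totals: points the model assigned to each class
    reference = cm.sum(axis=0)     # column totals: points of each class in the reference data
    overall = diag.sum() / n
    expected = np.sum(mapped * reference) / n ** 2   # agreement expected by chance
    kappa = (overall - expected) / (1 - expected)
    per_class = {}
    for k, label in enumerate(labels):
        pa = diag[k] / reference[k]                  # producer's accuracy (1 - omission error)
        ua = diag[k] / mapped[k]                     # user's accuracy (1 - commission error)
        bias = "underestimated" if pa < ua else "overestimated" if pa > ua else "balanced"
        per_class[label] = (pa, ua, bias)
    return overall, kappa, per_class

# Hypothetical two-class matrix (rows: mapped A, B; columns: reference A, B)
cm = [[40, 10],
      [5, 45]]
overall, kappa, per_class = accuracy_report(cm, ["A", "B"])
print(f"OA = {overall:.2f}, kappa = {kappa:.2f}")
for label, (pa, ua, bias) in per_class.items():
    print(f"{label}: PA = {pa:.2f}, UA = {ua:.2f} ({bias})")
```

For this hypothetical matrix, the overall accuracy is 0.85 and kappa is 0.70; class A is overestimated and class B underestimated.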
Overall, the model's classifications were similar to those of the forest-type map (Figure 10). However, the error was relatively high for the small oak stand in the lower left of the figure, and an undetected area was also identified along the upper-right boundary. Using textural information extracted from VHRS images, the species classification achieved an accuracy of 76%. Using textural information extracted from Sentinel-2 images, the accuracy was 83%. This difference may reflect the high level of noise in VHRS images, which can reduce classification accuracy [48]. Based on these results, we determined that the differing spatial resolutions of these images did not strongly affect the model's sensitivity to seasonal differences in the target species. Sentinel-2 data are also simpler to handle and cheaper to analyze, because of their wide swath and free availability. Therefore, Sentinel-2 images alone were used to classify species in the Mt. Baekdu area and Goseong-gun, keeping the input data consistent.

Mt. Baekdu Area
Needle firs were not present at the study site in the Mt. Baekdu area; therefore, only Korean red pine, Korean pine, Japanese larch, and oak were classified. The overall accuracy of species classification was 91%, and the kappa index (0.91) also indicated a relatively high accuracy (Table 4). For Korean red pine, the producer accuracy was 93%, and the user accuracy was 96%; thus, the model underestimated the abundance of Korean red pine. Korean pine was also underestimated, with a 94% producer accuracy and 97% user accuracy, but the highest overall accuracy was achieved for this species. Japanese larch was overestimated, with an 89% producer accuracy and 83% user accuracy. Oak showed an 89% producer accuracy and 89% user accuracy. Overall, the model's classifications were found to be similar to those of the forest-type map (Figure 11).

North and South Goseong-gun
No needle firs were present in Goseong-gun, South Korea; thus, only Korean red pine, Korean pine, Japanese larch, and oak were classified. Species classification in Goseong-gun was relatively accurate, with an overall accuracy of 90% and a kappa index of 0.90 (Table 5). The model overestimated the abundance of Korean red pine, with a producer accuracy of 88% and a user accuracy of 86%. Korean pine was underestimated, with a 90% producer accuracy and 95% user accuracy. Japanese larch had a 95% producer accuracy and a 96% user accuracy, the highest accuracy among the four species. Oak showed a uniform level of accuracy, with an 88% producer accuracy and 88% user accuracy. Overall, the model's classifications were similar to those of the forest-type map (Figure 12). However, part of a stand in the lower right area of Figure 12 was not classified, and the abundance of Korean pine was overestimated near the center of the image.
Using the developed model, a map of predicted species classifications was produced for Goseong-gun, North Korea (i.e., North Goseong-gun). Because other species (black locust, black pine, East Asian alder, East Asian ash, East Asian white birch, Korean castanea, mono maple, oriental flowering cherry, pitch pine, poplar, sawleaf zelkova, and walnut) occurred in addition to the target species, a category labelled "other species" was included in the forest-type map. The model was then evaluated against the forest-type map of Goseong-gun, South Korea. The results show an overall accuracy of 77% and a kappa index of 0.77 (Table 6). This exceeds the ≥70% land-cover classification accuracy standard for North Korea set by the Ministry of Environment's Land Cover Classification Guidelines (Ministry of Environment Order No. 1317) [49]. By this criterion, the developed model is relatively accurate for classifying tree species in North Korea, the objective of this study. For Korean red pine, the producer accuracy was 77% and the user accuracy was 79%; thus, the model underestimated the abundance of Korean red pine. Korean pine was also underestimated, with an 86% producer accuracy and a 93% user accuracy. Japanese larch showed a 92% producer accuracy and 95% user accuracy, the highest classification accuracy among the four species. Oak was underestimated, with a 66% producer accuracy and 69% user accuracy. In contrast, other species were overestimated, with a 61% producer accuracy and 49% user accuracy.
This result suggests that the presence of other tree species increased classification error for all target species. Other species caused classification errors for Korean red pine, Korean pine, and Japanese larch, and other broadleaf trees caused classification errors for oak.
In a previous study, we confirmed that a model integrating training data from the Gwangneung Forest and Mt. Baekdu areas achieved adequate accuracy in both areas [10]. Therefore, in this study, a new model was constructed by integrating training data from Gwangneung Forest, the Mt. Baekdu area, and South Goseong-gun. When applied to North Goseong-gun, the model trained with the integrated data showed an overall classification accuracy of 80% and a kappa index of 0.80 (Table 7). This is 3 percentage points higher than the non-integrated model (77%). For Korean red pine, producer accuracy increased from 77% to 81%, and user accuracy increased from 79% to 83%. For Korean pine, producer accuracy increased from 86% to 88%, and user accuracy remained at 93%. However, for Japanese larch, producer accuracy decreased from 92% to 90% and user accuracy decreased from 95% to 91%. For oak, producer and user accuracies rose from 66% and 69% to 72% and 75%, respectively. For other species, producer and user accuracies changed little (from 61% and 49% to 62% and 49%, respectively). These results indicate that a model trained with data for as many species as possible from areas along the border between South and North Korea can classify species in North Korea with reasonable accuracy.
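At its simplest, integrating training data means merging the labelled samples from each area into one training set before a single classifier is fitted. This section does not specify which classifier the study used. The sketch below therefore uses a nearest-centroid classifier purely as a stand-in to show the pooling step. The region keys, the two-feature toy samples, and the helpers `pool_training_data`, `fit_centroids`, and `predict` are all hypothetical.

```python
import math
from collections import defaultdict


def pool_training_data(regions):
    """Merge per-region (features, label) samples into one training set.

    The region name is kept alongside each sample so that accuracy can still
    be checked per region after training.
    """
    pooled = []
    for region, samples in regions.items():
        for features, label in samples:
            pooled.append((region, list(features), label))
    return pooled


def fit_centroids(pooled):
    """Mean feature vector per species, computed over all regions together."""
    sums = {}
    counts = defaultdict(int)
    for _region, features, label in pooled:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Assign the species whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical two-feature samples (e.g. one spectral and one texture value).
example_regions = {
    "Gwangneung": [([0.30, 0.10], "Korean pine"), ([0.45, 0.30], "oak")],
    "Mt. Baekdu": [([0.32, 0.12], "Korean pine"), ([0.47, 0.28], "Japanese larch")],
    "South Goseong": [([0.44, 0.32], "oak")],
}
centroids = fit_centroids(pool_training_data(example_regions))
```

Because the centroids are averaged over all regions, a species seen in only one area (here, Japanese larch) still contributes a class. Species sampled in several areas are represented by a more regionally balanced mean. The same pooling step would apply unchanged to any other classifier.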

Discussion
The analysis of the spectral characteristics of the target species showed that, from April to October, the reflectance of oak in the near-infrared (NIR) region was higher than that of the other species (Figure 3). This trend is consistent with previous studies. Lee and Lee [50] acquired spectral reflectance data on Galcham oak (Quercus aliena Blume) and pitch pine (Pinus rigida Mill.) at Gyeyangsan, Incheon, using an ASD FieldSpec spectroradiometer (Malvern Panalytical, UK). Their data showed that the NIR reflectance of oak trees was higher than that of pine trees. Similarly, in Sweden, Persson, Lindberg and Reese [4] used Sentinel-2 images to extract spectral reflectance values for Norway spruce (Picea abies), Scots pine (Pinus sylvestris), hybrid larch (Larix × marschlinsii), birch (Betula sp.), and pedunculate oak (Quercus robur) over the April to October period. Their data also showed that the NIR reflectance of oak was higher than that of the other species.
In April, oak had the highest NIR reflectance, followed by Korean red pine, Japanese larch, Korean pine, and needle fir. From May to July, the NIR reflectance of Japanese larch exceeded that of Korean red pine. From the beginning of August, the reflectance of Japanese larch began to decrease, and in September and October it was lower than that of Korean red pine and Korean pine. Although Japanese larch is a conifer, it is deciduous and showed reflectance characteristics similar to those of broadleaf trees, in agreement with previously reported results [4,10,51].
From August to September, the spectral reflectance curves of all species decreased relatively sharply. Similar decreases have been reported previously [4,52] and attributed to a sharp decline in the solar elevation angle [52]. Korean red pine and Korean pine, both members of the genus Pinus, showed only minor spectral differences. Needle fir had the lowest reflectance in all periods, consistent with previous studies [4,51].
In the crown texture analysis, the non-growing-season mean and variance describe the average and spread of the pixel values, and they scale with pixel brightness. In this study, textural information was calculated from the NIR band. In the non-growing-season images, the leaf-opening rates of Japanese larch and oak were lower than those of Korean red pine, Korean pine, and needle fir. As a result, their means and variances were relatively low, reflecting their leafless condition during the non-growing season (Figure 8).
The homogeneity values of Japanese larch and oak were similar to each other, as were those of Korean red pine, Korean pine, and needle fir. The former have crowns of leafless branches during the non-growing season. The latter retain foliage on their crowns and therefore exhibit similar textures. Thus, the homogeneity results can also be attributed to non-growing-season crown characteristics.
Unlike in the non-growing season, all five species showed similar reflectance values during the growing season, likely because all five had full foliage on their crowns. These results indicate that the crowns of Japanese larch and oak had higher contrast, saturation, and uniformity than those of Korean red pine, Korean pine, and needle fir, because larch and oak had not yet leafed out during the non-growing season. During the growing season, when all species had full foliage, some differences in contrast and saturation were noted among the crowns. However, these differences were not large enough to affect uniformity.
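The texture measures discussed above (mean, variance, homogeneity, contrast, and uniformity) match the standard grey-level co-occurrence matrix (GLCM) statistics. This section does not state the window size, pixel offset, or number of grey levels used in the study. The NumPy sketch below therefore assumes a single horizontal offset, 16 grey levels, and a fixed reflectance range for quantization. It also treats "uniformity" as the angular second moment (ASM). All function names and the toy window are hypothetical.

```python
import numpy as np


def quantize(nir, lo, hi, levels=16):
    """Map NIR reflectance to integer grey levels 0..levels-1.

    A fixed range (lo, hi) is used for every window so that brighter crowns
    keep higher grey levels, i.e. the GLCM mean still tracks brightness.
    """
    scaled = (np.asarray(nir, dtype=float) - lo) / (hi - lo) * levels
    return np.clip(scaled, 0, levels - 1).astype(int)


def glcm(window, levels, dx=1, dy=0):
    """Symmetric, normalized co-occurrence matrix for offset (dx, dy), dx, dy >= 0."""
    window = np.asarray(window, dtype=int)
    rows, cols = window.shape
    ref = window[: rows - dy, : cols - dx]  # reference pixels
    nbr = window[dy:, dx:]                   # neighbours at the given offset
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    counts = counts + counts.T               # count each pair in both directions
    return counts / counts.sum()             # joint probabilities P(i, j)


def texture_features(p):
    """Haralick-style statistics of a normalized, symmetric GLCM."""
    i, j = np.indices(p.shape)
    mean = (i * p).sum()
    return {
        "mean": mean,
        "variance": (((i - mean) ** 2) * p).sum(),
        "homogeneity": (p / (1.0 + (i - j) ** 2)).sum(),
        "contrast": (((i - j) ** 2) * p).sum(),
        # Angular second moment; some software reports it as "uniformity".
        "asm": (p ** 2).sum(),
    }


# Hypothetical 3 x 3 window of NIR reflectance values.
window = quantize([[0.30, 0.32, 0.31], [0.28, 0.35, 0.33], [0.29, 0.30, 0.34]], lo=0.0, hi=0.5)
features = texture_features(glcm(window, levels=16))
```

In a perfectly uniform window, all co-occurrences fall on one diagonal cell. Homogeneity and ASM are then 1, and contrast is 0. Alternating grey levels push contrast up and homogeneity down. This is the direction of change expected when bare branches and gaps break up a crown's texture.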
According to the map of predicted species, the most widely distributed class in Goseong-gun, North Korea, was the "other species" group (49,786 ha), followed by oak (42,143 ha), Korean red pine (19,063 ha), Korean pine (4544 ha), and Japanese larch (3019 ha) (Figure 13). These results suggest that additional training data must be collected to better classify species in the "other" category. A high-quality library of training data is therefore needed for each target species, as species classification results depend strongly on the model training data. Moreover, because regional classification models reflect specific regional characteristics, it is essential to build a high-quality training data library that reflects biogeographic patterns on the Korean Peninsula.
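Class areas on a prediction map are typically obtained by counting the pixels assigned to each class and multiplying by the pixel area. This section does not state the resolution of the predicted map. The sketch below assumes 10 m pixels (0.01 ha each), matching Sentinel-2's finest bands. It then expresses the areas reported for North Goseong-gun as shares of the total mapped area. The helper names are hypothetical.

```python
from collections import Counter


def class_areas_ha(class_map, pixel_size_m=10.0):
    """Area (ha) of each class in a classified map given as a list of rows."""
    pixel_area_ha = pixel_size_m ** 2 / 10_000  # 1 ha = 10,000 m^2
    counts = Counter(label for row in class_map for label in row)
    return {label: n * pixel_area_ha for label, n in counts.items()}


def area_shares(areas_ha):
    """Fraction of the total mapped area occupied by each class."""
    total = sum(areas_ha.values())
    return {label: area / total for label, area in areas_ha.items()}


# Areas reported for North Goseong-gun (Figure 13).
reported_areas_ha = {
    "other species": 49786,
    "oak": 42143,
    "Korean red pine": 19063,
    "Korean pine": 4544,
    "Japanese larch": 3019,
}
shares = area_shares(reported_areas_ha)
```

The five classes sum to 118,555 ha. Of this total, the "other species" group accounts for about 42% and oak for about 35.5%, which underlines why better training data for the "other" category matter.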
South Korea is currently conducting a national forest inventory and implementing forest-type mapping based on > 4000 ground observations over five years. This will provide a great deal of field data, but the quality of these data is not guaranteed, as it depends on the skills of each surveyor. This approach is also disadvantaged by its high cost and low efficiency. A better alternative may be to select standard points that can be surveyed within a year to reflect in situ biogeographic patterns. This would allow the production of a polygonal electronic field book employing drone images captured during field surveys.
Finally, model training materials must be developed based on the border areas between North Korea and China through cooperative research efforts between the two countries. Since it is presently impossible to collect training data within North Korea, surveying should be conducted at points that are biogeographically similar to North Korea to generate training data.

Conclusions
In this study, we developed a model to classify the five dominant tree species in North Korea (Korean red pine, Korean pine, Japanese larch, needle fir, and oak) as a follow-up to our previous study, in which we developed a classification model for Korean pine and Japanese larch. In the Gwangneung Forest area, the proposed model classified species relatively accurately, with an overall accuracy of 83% and a kappa index of 0.83. However, because spectral differences between species of the same genus are minimal, spectra-based methods are subject to a certain degree of error. Factors that more clearly reflect differences between species of the same genus should therefore be considered. Here, only topographic factors associated with species growth areas were considered. In future studies, the spatial range of the target area should be expanded to adequately address the growth environments of natural forests and plantations. Additionally, climate data related to species growth characteristics could help to distinguish species of the same genus more accurately.
The final objective of this study was to develop a species classification model applicable to North Korea. The model's applicability to North Korea was therefore evaluated in two border regions. The first was the Mt. Baekdu area (Ando County, China), on the border between North Korea and China. The second was Goseong-gun, on the border between South Korea and North Korea. Needle firs were absent from both areas; therefore, only Korean red pine, Korean pine, Japanese larch, and oak were classified. The results show high classification accuracies of 91% and 90%, respectively, supporting the model's potential applicability throughout the Korean Peninsula and the broader region.
To predict the distribution of tree species in North Korea, the target species (Korean red pine, Korean pine, Japanese larch, and oak) and other species were considered. The model achieved a moderate classification accuracy of 77% in North Goseong-gun. However, as shown in our previous study, a model can be applied successfully only to areas where its training data were collected. Therefore, to broaden its applicability, an integrated model was developed by combining training data from the Gwangneung Forest, Mt. Baekdu, and South Goseong-gun areas. This integrated model improved accuracy when applied to North Goseong-gun (80%). It was therefore used to produce a map of predicted species distributions in Goseong-gun, North Korea.
In the future, the model could be further improved by generating more training data in areas bordering North Korea, potentially allowing it to be applied throughout the entire Korean Peninsula. Additionally, we intend to build a standard library of spectral characteristics for each target species that connects laboratory, in situ, and satellite observations. Such a library would support research on the relationships between the structural and chemical characteristics of each species and their spectral features.