Forests are an important natural resource that provides numerous economic, social, and environmental benefits. They not only serve as a source of timber and an area for recreational activities, but also play a significant role in maintaining ecological balance, for example by converting carbon dioxide into oxygen and biomass, mitigating natural hazards, and regulating climate and water [1]. The global forest area reportedly continues to decrease because of population growth and the conversion of forest land to agricultural or other uses, although the rate of net global deforestation has slowed over the past 25 years [3]. Specifically, deforestation in the Kyrgyzstan Republic might have a significant influence on both regional hydrology [4] and the livelihoods of rural communities, which account for two-thirds of the total population. In addition, the Kyrgyzstan Republic is transitioning from a planned economy to a market economy, which makes it important to understand the economic value and potential of its forest resources. In this context, the country urgently needs a forest cover map to assess its existing forest resources and to support national policies on improving rural livelihoods and managing forest resources productively and sustainably. However, forest statistics in the Kyrgyzstan Republic have been outdated or missing since the collapse of the Soviet Union, and only 60% of government-owned forests were inventoried in a recent national survey in 2005. Furthermore, the Food and Agriculture Organization of the United Nations (FAO) might have overestimated the forest cover percentage at 5% in 2010 according to its country report [4]. Hence, a complete picture of forest cover remains unclear [2].
Recent developments in remote sensing technologies can provide seamless and periodic observations for countries or regions with inconsistent forest statistics [5]. Remote sensing data have become a primary data source for deriving the spatial distribution of forest cover [7], which is of fundamental importance for understanding the current status of forests and thereafter achieving sustainable forest management [9]. However, the consistency of spectral response across space might be compromised by the complexity of both the environment and forest characteristics [10], so deriving a forest cover map with high accuracy is not a trivial task. On the one hand, researchers have argued that specific-class mapping outperforms multi-class supervised classification; the strategy is to decompose the multi-class task into a series of binary classification tasks to achieve a higher overall accuracy [11]. For instance, Silva et al. [14] adopted a weighted support vector machine classifier to map two types of mangrove forest using Landsat Thematic Mapper (TM) data. On the other hand, it is widely recognized that a group of classification methods can be fused to achieve higher accuracy than a single classification method [10], and the fusion strategy can be “bagging” with a simple combination rule or “stacking” with a meta-classification model. For instance, Clinton et al. [15] proposed geographic stacking methods that enhanced the accuracy of a land cover map by up to 6.6% in California.
These approaches can improve mapping accuracy, but deriving an accurate map nationally or globally is still time-consuming and labor-intensive, because a huge number of training samples must be manually selected and labeled to account for the uneven distribution of forest across space. Recently, this task has been facilitated by the free availability of regional and global land cover products, which were developed using different datasets and methodologies [16]. Regional land cover maps, such as the North America Forest Disturbance product [18] or the Russia forest cover map [19], can provide sufficient spatial and thematic detail, but very few of them cover the countries of Central Asia. In addition, global land cover products vary in spatial resolution. Coarse resolution products can provide forest cover data, such as GLC2000 at 1000 m with a forest cover percentage of 13.5% [20] or MODIS at 500 m with a percentage of 0.54% [21], but they cannot meet the requirements of forest planning and ecological governance at a national scale [22]. On the other hand, fine resolution maps at 30 m, such as the USGS TreeCover2010 [23] and the GlobeLand30 [24], are potential data sources. Nonetheless, they are reported to have different accuracies for forest in the same region [24]. For instance, the user’s accuracies of forest in the GlobeLand30 and TreeCover2010 are 0.84 and 0.94, respectively, for the Loess Plateau, China, in 2010 [25]. Thus, combining the two fine resolution products not only offers a reliable and cost-effective way to select massive numbers of training samples, but also indicates a potential way to derive a forest cover map with improved accuracy.
The main objective of this study is to develop a forest cover map with improved accuracy at a resolution of 30 m for the Kyrgyzstan Republic. To achieve this objective, a hybrid fusion strategy is adopted, which includes land cover product fusion, geographical feature fusion, and classifier fusion. First, the GlobeLand30 product and the USGS TreeCover2010 are fused to integrate the advantages of their forest definitions. Second, multiple geographical datasets are used to supply auxiliary geographic information. Third, a two-layer classification system is proposed that fuses multiple classifiers hierarchically. Importantly, previous studies simply extracted a forest cover map from the USGS TreeCover2010 by specifying a certain tree cover percentage value [26], whereas we explore the influence of different tree cover threshold values on the resulting forest cover map. The contributions of this study are therefore twofold: (1) we propose a cost-effective and reliable framework for mapping forest cover with improved accuracy using a hybrid fusion strategy, which can also be applied to other countries or other land cover types; and (2) we explore the influence of different forest definitions on the accuracy of forest cover, which could help to improve the accuracy of the forest cover map for the Kyrgyzstan Republic.
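The product fusion step can be illustrated with a minimal sketch. It assumes that both products are co-registered on the same 30 m grid, that TreeCover2010 stores percent tree cover per pixel, and that GlobeLand30 has been reduced to a binary forest mask. The function name `overlay_products`, the `>=` comparison, and the toy grids are hypothetical and are not taken from the study.

```python
def overlay_products(tree_cover, globeland_forest, threshold):
    """Label each pixel by the agreement between the two products.

    tree_cover: rows of percent tree cover (0-100), TreeCover2010-style.
    globeland_forest: rows of booleans, True where GlobeLand30 maps forest.
    threshold: tree cover percentage at or above which a pixel counts as
               forest (the use of >= rather than > is an assumption).
    Returns rows of labels: "forest", "non-forest", or "conflict".
    """
    labels = []
    for cover_row, forest_row in zip(tree_cover, globeland_forest):
        row = []
        for cover, is_forest in zip(cover_row, forest_row):
            tc_forest = cover >= threshold
            if tc_forest and is_forest:
                row.append("forest")        # both products agree on forest
            elif not tc_forest and not is_forest:
                row.append("non-forest")    # both products agree on non-forest
            else:
                row.append("conflict")      # the products disagree
        labels.append(row)
    return labels


# Toy 2 x 3 grids, purely illustrative.
tree_cover = [[5, 45, 80], [30, 10, 60]]
globeland = [[False, True, True], [True, False, False]]

for alpha in (10, 40, 60):
    labels = overlay_products(tree_cover, globeland, alpha)
    n_conflict = sum(row.count("conflict") for row in labels)
    print(f"threshold {alpha}%: {n_conflict} conflicting pixels")
```

Pixels labeled "forest" or "non-forest" correspond to areas where the two products agree and could supply training samples, while "conflict" pixels are the ones a classifier would still have to decide.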
Using our classification system, we derived forest cover products at tree cover threshold values of 10%, 20%, 30%, 40%, 50%, and 60%. As shown in Figure 3a, combining the USGS TreeCover2010 at a tree cover threshold value of 40% with the GlobeLand30 produced a forest cover map with a kappa value of 0.89, a G value of 0.86, and an F1 score of 0.91, making it the most accurate of the products across the tested threshold values. For instance, the kappa value of the forest cover product at a threshold of 40% was approximately 5.4%, 2.3%, 1.5%, 6.9%, and 14.4% higher than those of the products at thresholds of 10%, 20%, 30%, 50%, and 60%, respectively. As shown in Figure 3b,c, similar findings hold for the G value and F1 score.
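For readers less familiar with the three accuracy measures, the following sketch computes them from a binary (forest versus non-forest) confusion matrix. The G value is not defined in this section; the sketch assumes it is the geometric mean of sensitivity and specificity, which is one common definition. The function name `binary_metrics` and the counts are illustrative only.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return (kappa, G, F1) for a 2 x 2 confusion matrix.

    tp: forest correctly mapped as forest
    fp: non-forest mapped as forest
    fn: forest mapped as non-forest
    tn: non-forest correctly mapped as non-forest
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1 - expected)

    recall = tp / (tp + fn)          # sensitivity, i.e. producer's accuracy
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)       # i.e. user's accuracy
    g_mean = math.sqrt(recall * specificity)  # assumed definition of G
    f1 = 2 * precision * recall / (precision + recall)
    return kappa, g_mean, f1


# Made-up counts, not the study's validation data.
kappa, g_value, f1 = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(f"kappa={kappa:.2f}, G={g_value:.2f}, F1={f1:.2f}")
```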
In addition, our forest cover products at the different tree cover threshold values were compared with the GlobeLand30 and the USGS TreeCover2010. As shown in Figure 3a, the kappa values of our forest cover products at tree cover threshold values of 10%, 20%, 30%, 40%, 50%, and 60% were roughly 1.7%, 5.6%, 7.8%, 11.4%, 23.4%, and 39.2% higher than those of the USGS TreeCover2010, and about 31.7%, 35.7%, 36.8%, 38.8%, 29.9%, and 21.4% higher than that of the GlobeLand30. This finding was confirmed in Figure 3b,c, where the G values and F1 scores of our forest cover products were always higher than those of the GlobeLand30 and the USGS TreeCover2010. Importantly, these findings further suggest that our forest cover product at a tree cover threshold value of 40% had the highest accuracy in terms of kappa, G, and F1 score; it was therefore chosen as the final product in this study.
As shown in Table 2, Table 3, and Table 4, the producer’s accuracy (PA) of our product for forest was 86.51%, which was higher than the 75.71% of the USGS TreeCover2010 and the 79.5% of the GlobeLand30. This means that our product captured around 86.51% of the total forest. In addition, the user’s accuracy (UA) of our product for forest was as high as 97.74%, which was greater than the 94.42% of the USGS TreeCover2010 and the 66.07% of the GlobeLand30. This indicates that around 97.74% of the forest in our product was real forest, whereas about 33.93% of the forest in the GlobeLand30 might have been confused with other land cover types. In other words, forest in the GlobeLand30 might be overestimated. The superiority of our product for forest is also reflected in its high F1 score, which was around 9.21% and 27.17% greater than those of the USGS TreeCover2010 and the GlobeLand30, respectively.
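The F1 comparison can be checked from the reported accuracies. The sketch below treats F1 as the harmonic mean of PA (recall) and UA (precision) and interprets the reported 9.21% and 27.17% as relative differences. That interpretation is an inference from the numbers rather than something stated in the text, and small deviations are expected because the inputs are rounded.

```python
def f1_from_accuracies(producers_accuracy, users_accuracy):
    """Harmonic mean of producer's and user's accuracy (both in percent)."""
    pa = producers_accuracy / 100.0
    ua = users_accuracy / 100.0
    return 2 * pa * ua / (pa + ua)


# (PA, UA) for the forest class, as reported in the text.
reported = {
    "ours": (86.51, 97.74),
    "TreeCover2010": (75.71, 94.42),
    "GlobeLand30": (79.5, 66.07),
}
f1 = {name: f1_from_accuracies(pa, ua) for name, (pa, ua) in reported.items()}

# Relative gain of our product over each reference product, in percent.
gain_treecover = (f1["ours"] / f1["TreeCover2010"] - 1) * 100
gain_globeland = (f1["ours"] / f1["GlobeLand30"] - 1) * 100
print(f"relative F1 gain vs TreeCover2010: {gain_treecover:.1f}%")
print(f"relative F1 gain vs GlobeLand30: {gain_globeland:.1f}%")
```

Under these assumptions the gains come out at roughly 9.2% and 27.2%, in line with the values quoted above.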
Figure 4 presents the forest cover map produced with our method. The extracted forest cover was unevenly distributed in space and was composed of different forest types. For instance, spruce forests are mainly located in the eastern region, with a few scattered in the central region (locations d, f, g, and h); walnut forests are mainly distributed in the western region along the valley slopes (location b), and they are the largest remaining forests of this type in the world, with a significant role in biodiversity conservation; juniper forests mainly grow in the southern region (locations a and c), with a few dispersed across the country; and many other forest types grow along rivers, such as at location e. Statistically, the forest cover area was estimated at around 472,369 ha, which constitutes about 2.4% of the territory of the entire country. This percentage is smaller than the 3.3% of the Central Asia Forest Cover (CAFC) [4], the 3.4% of the Global Forest Change (GFC) [23], and the 5% of the FAO in 2010 [28]. It should be noted that our product gave a relatively low estimate of forest resources in 2010, which can be mainly attributed to the rigorous definition of forest implied by a tree cover threshold value of 40% in the USGS TreeCover2010. Nonetheless, selecting a tree cover threshold value of 40% ensures a forest cover map with high accuracy, which can be compared with forest cover maps from previous years to explore forest dynamics in terms of deforestation and restoration. Therefore, our product can be valuable for understanding the current forest distribution, evaluating the current forest resources, amending the current national forest policies, and developing sustainable forest management strategies.
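The area figures can be cross-checked with simple arithmetic. The sketch below converts between 30 m pixels and hectares and recomputes the forest share. The national territory of 199,951 km² is an assumption (a commonly cited figure) and is not given in the text; the helper names are hypothetical.

```python
PIXEL_SIZE_M = 30.0          # map resolution stated in the text
HECTARE_M2 = 10_000.0        # square metres per hectare
TERRITORY_KM2 = 199_951.0    # assumption: commonly cited area, not from the text


def pixels_to_hectares(n_pixels, pixel_size_m=PIXEL_SIZE_M):
    """Area in hectares covered by n_pixels square pixels."""
    return n_pixels * pixel_size_m ** 2 / HECTARE_M2


def forest_share_percent(forest_ha, territory_km2=TERRITORY_KM2):
    """Forest area as a percentage of the national territory."""
    territory_ha = territory_km2 * 100.0   # 1 km^2 = 100 ha
    return 100.0 * forest_ha / territory_ha


forest_ha = 472_369.0                      # estimate reported in the text
share = forest_share_percent(forest_ha)
n_pixels = forest_ha * HECTARE_M2 / PIXEL_SIZE_M ** 2
print(f"{share:.1f}% of the territory, about {n_pixels:,.0f} forest pixels")
```

With the assumed territory, the share rounds to the 2.4% reported above.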
To further illustrate the superiority of our forest cover map, eight test sites were selected uniformly from the entire study region (labeled a to h in Figure 4). High resolution Google Earth images from different dates in 2010 served as the ground truth for a visual comparison of our product with the other two products. We tried to choose Google Earth images from the growing season as far as possible, but in some cases the image was absent or obscured by cloud; in these cases, we chose a clear Google Earth image from another date in 2010. As shown in Figure 5, forest cover extracted from the USGS TreeCover2010 at a tree cover threshold value of 40% generally tended to underestimate the forest extent, with real forests wrongly identified as other land cover types. In contrast, forest cover extracted from the GlobeLand30 was more likely to be overestimated, with other land cover types misclassified as forest. Importantly, forest cover extracted with our method was of high quality and reflected the real situation well. For instance, as seen in Figure 5a,c, forest cover was clearly identified in our product, but it was underestimated in the USGS TreeCover2010 and overestimated in the GlobeLand30; the same situation can be seen in Figure 5b,d,f–h. In Figure 5e, grasslands were misclassified as forests by TreeCover2010 and many forests were not identified by GlobeLand30, whereas forest cover in our product was relatively complete.
5. Discussions and Limitations
5.1. Influence of Auxiliary Geographical Information on Model Accuracy
It has been argued that fusing auxiliary geographic information into the feature vector can improve classification accuracy [15]. We therefore examined this argument by gradually adding auxiliary information to the feature vector. The basic feature vector has six dimensions, consisting of three spectral features and three texture features; it is expanded to eight dimensions by including NDVI and its texture, and further to twelve dimensions by appending height, slope, aspect, and temperature.
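The three feature configurations can be made concrete with a small sketch. All feature names below are hypothetical placeholders, since this section does not name the individual spectral bands or texture measures; only the grouping into 6, 8, and 12 dimensions follows the text.

```python
# Hypothetical placeholder names; the grouping follows the text.
BASE_FEATURES = ["spectral_1", "spectral_2", "spectral_3",
                 "texture_1", "texture_2", "texture_3"]
NDVI_FEATURES = ["ndvi", "ndvi_texture"]
AUXILIARY_FEATURES = ["height", "slope", "aspect", "temperature"]


def feature_names(dimensions):
    """Return the feature names for the 6-, 8-, or 12-dimensional vector."""
    if dimensions == 6:
        return BASE_FEATURES
    if dimensions == 8:
        return BASE_FEATURES + NDVI_FEATURES
    if dimensions == 12:
        return BASE_FEATURES + NDVI_FEATURES + AUXILIARY_FEATURES
    raise ValueError("dimensions must be 6, 8, or 12")


def feature_vector(pixel, dimensions):
    """Extract one pixel's feature vector from a dict of per-pixel values."""
    return [pixel[name] for name in feature_names(dimensions)]


# Dummy pixel whose values are just the feature indices 0.0 .. 11.0.
pixel = {name: float(i) for i, name in enumerate(feature_names(12))}
for d in (6, 8, 12):
    print(d, feature_vector(pixel, d))
```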
As shown in Figure 6, fusing auxiliary geographic information significantly improves the accuracy of the classifiers in terms of kappa, G, and F1 score. For the five base classifiers, kappa, G, and F1 score are improved on average by 34%, 16%, and 33%, respectively (Figure 6a–c). However, this result is based on a forest sample size of 100,000 and a tree cover threshold value of 50%, so we examined whether the improvement is affected by changes in forest sample size or tree cover threshold value. As shown in Figure 6d–f, changing the forest sample size from 100,000 to 20,000 has little influence on the improvement in model accuracy. Similarly, as shown in Figure 6g–i, changing the tree cover threshold value from 50% to 30% also has only a trivial influence. For the two meta-classifiers, kappa, G, and F1 score are improved on average by 36%, 18%, and 35%, respectively, which is slightly higher than for the base classifiers and remains stable when the sample size (n) or tree cover threshold value (α) changes. Additionally, GBM clearly has a larger kappa, G, and F1 score than LR irrespective of feature dimension, sample size, or tree cover threshold value. Hence, in this study, the twelve-dimensional feature vector including auxiliary geographic information is adopted and GBM is used as the meta-classifier. Nonetheless, sample size (n) has a slight influence on classification accuracy; for instance, the kappa value of DTB decreases from 0.73 to 0.71 when the sample size (n) changes from 100,000 to 20,000.
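A two-layer stacking system of this kind can be sketched with scikit-learn's `StackingClassifier`. This is an illustration under stated assumptions and not the authors' implementation. The five base classifiers chosen here are placeholders, because this section names only DTB among the base models, and the data are synthetic 12-feature samples rather than forest samples. Only the use of a gradient boosting model (GBM) as the meta-classifier follows the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data with 12 features, standing in for forest/non-forest samples.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# First layer: five base classifiers (placeholder choices, see lead-in).
base_classifiers = [
    ("bagged_trees", BaggingClassifier(n_estimators=20, random_state=0)),
    ("random_forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("decision_tree", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("naive_bayes", GaussianNB()),
]

# Second layer: a gradient boosting meta-classifier trained on the
# cross-validated predictions of the base classifiers.
stack = StackingClassifier(
    estimators=base_classifiers,
    final_estimator=GradientBoostingClassifier(random_state=0),
    cv=3,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy of the stacked model: {accuracy:.2f}")
```

Training the meta-classifier on cross-validated predictions, rather than on predictions for the same samples the base models were fitted to, keeps the second layer from simply learning the base models' overfitting.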
5.2. Influence of Sample Size on Model Accuracy
As mentioned above, sample size can have a non-trivial influence on the accuracy of our model. We therefore examined this issue by gradually increasing the sample size to determine whether an optimal sample size yields improved accuracy. As shown in Figure 7, the accuracy of our model in terms of kappa, G, and F1 score first increases and then decreases as the sample size grows. Specifically, second-degree polynomial models can be obtained by fitting the relationship between forest sample size and model accuracy at each tree cover threshold value. From the polynomial curves, optimal forest sample sizes of 70,000, 80,000, 110,000, 100,000, 100,000, and 110,000 can be determined visually for our models at tree cover threshold values of 10%, 20%, 30%, 40%, 50%, and 60%, respectively. For instance, using the optimal sample size, the kappa value is calculated as 0.68, 0.71, 0.72, 0.74, 0.74, and 0.78 at tree cover threshold values of 10%, 20%, 30%, 40%, 50%, and 60%, respectively.
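The curve-fitting step can be sketched as follows. The text reads the optimum off the fitted curve visually; the sketch instead takes the vertex of the fitted parabola, which is a substituted but equivalent way to locate the maximum of a degree-2 polynomial. The data points are synthetic and chosen only to peak at 100,000 samples; they are not the study's measurements.

```python
import numpy as np

# Synthetic accuracy curve (NOT the study's data): kappa peaks at 100,000 samples.
sizes = np.array([2, 4, 6, 8, 10, 12, 14], dtype=float)  # units of 10,000 samples
kappa = 0.74 - 0.001 * (sizes - 10.0) ** 2

# Fit kappa = a * size^2 + b * size + c (coefficients come highest degree first).
a, b, c = np.polyfit(sizes, kappa, deg=2)

# A downward-opening parabola (a < 0) has its maximum at the vertex.
optimal_size = -b / (2.0 * a)
peak_kappa = np.polyval([a, b, c], optimal_size)
print(f"optimal sample size: {optimal_size * 10_000:,.0f}, "
      f"fitted kappa: {peak_kappa:.2f}")
```

Expressing sample size in units of 10,000 keeps the fit numerically well conditioned; fitting raw counts squared would involve values around 10^10.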
5.3. Comparison with Other Forest Cover Products
A hybrid fusion strategy, including forest cover product fusion, geographical feature fusion, and classifier fusion, is adopted to extract the forest cover. As shown in Table 5, our classification system outperformed every individual classifier in its first layer. Thus, the forest cover derived with our method differs from any single forest cover product in the literature. On the one hand, forest cover estimates from government statistical reports tend to be overestimated; for instance, around 5% of the entire territory is estimated as forest cover by the FAO in 2010 [28], and about 4.3% is reported by the International Union of Forest Research Organizations (IUFRO) [35]. This can be partly attributed to the inclusion of sparse woodland or grassland as forest [4]. On the other hand, forest cover estimates from different remote sensing based methods differ in the literature, possibly because of phenological variation, spectral reflectance, forest definitions, and different models. For instance, forest cover constitutes only 0.54% in the MODIS VCF product [21], but 7.8% in the GlobeLand30 product [24], 13.5% in the GLC2000 product [20], 3.3% in the CAFC product [4], and 3.4% in the GFC dataset [6]. In particular, forest cover in the USGS TreeCover2010 might be overestimated when a very small tree cover threshold value, such as 10% or even 5%, is used.
5.4. Limitations of This Study
Firstly, it is well known that multi-source image classification can outperform traditional statistically based classification, and it has also been argued that classifiers might perform differently at different locations because image information varies geographically [37]. We therefore assumed that including auxiliary geographic information could improve the accuracy of land cover classification. A few studies have attempted to use geographic information in the classification procedure. For instance, geographical coordinates (latitude and longitude) were integrated into the feature vector to improve classification accuracy [15], and the results of spatial heterogeneity analysis were adopted to improve the accuracy of classification [38]. In this study, we used information relevant to forest, including NDVI, elevation, and temperature. Although the classification accuracy improved significantly, how to choose effective and uncorrelated auxiliary geographical variables remains an open question, which requires deep domain knowledge and trial-and-error experiments.
Secondly, the samples used to train the classification model should be randomly selected to represent the characteristics of the entire population, so the sampling strategy can affect the classification accuracy. However, obtaining representative samples in land cover classification is not a trivial task because of the spatial dependency and unbalanced distribution of geographic entities. The severity of the imbalance depends on the context. It would not be a problem when classifying oasis from desert, which are homogeneous and uniformly distributed, but it is a severe problem in our study given the heterogeneous distribution of non-forest (including grassland, scrubland, farmland, and so on). Therefore, stratified spatial sampling was adopted in our study according to the variability of forest and non-forest. In addition, to improve the classification accuracy, we explored the influence of sample size. Although stratified sampling can help to reduce the variance of the sample information, it does not consider the spatial dependency among neighboring samples: nearby samples are more similar to each other and might provide redundant information. Improving the sampling strategy is therefore left for future study.
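The idea of stratified sampling can be sketched in a few lines. The sketch assumes two strata (forest and non-forest) and a fixed number of samples per stratum. The function name `stratified_sample`, the pixel identifiers, and the sample counts are hypothetical, and the sketch leaves out the spatial aspects of the study's sampling scheme, which this section does not describe in detail.

```python
import random


def stratified_sample(pixels, n_per_stratum, seed=0):
    """Draw a random sample separately within each stratum.

    pixels: list of (pixel_id, stratum) tuples.
    n_per_stratum: dict mapping stratum name to the number of samples wanted.
    Returns a dict mapping stratum name to the sampled pixel ids.
    """
    rng = random.Random(seed)            # seeded for reproducibility
    by_stratum = {}
    for pixel_id, stratum in pixels:
        by_stratum.setdefault(stratum, []).append(pixel_id)

    sample = {}
    for stratum, ids in by_stratum.items():
        k = min(n_per_stratum[stratum], len(ids))   # never ask for more than exist
        sample[stratum] = rng.sample(ids, k)        # sampling without replacement
    return sample


# Toy population: 20 forest pixels and 80 non-forest pixels.
pixels = [(i, "forest") for i in range(20)]
pixels += [(i, "non-forest") for i in range(20, 100)]

sample = stratified_sample(pixels, {"forest": 5, "non-forest": 10}, seed=1)
print({stratum: len(ids) for stratum, ids in sample.items()})
```

Sampling each stratum separately guarantees that the rarer class is represented in the training set, which simple random sampling over an unbalanced map would not.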
Thirdly, more than 800 different forest definitions are in use around the world, and some countries have adopted several definitions at the same time, which can make forest cover extraction difficult. Most studies have extracted forest cover from the USGS TreeCover2010 using a single tree cover threshold value, and very few have considered the influence of different threshold values on the accuracy of forest cover. This study took a step toward exploring the forest cover extracted from the USGS TreeCover2010 at different tree cover threshold values and thereby contributed a forest cover map with improved accuracy. Exploiting the agreement among forest definitions from multiple products is thus a potential way to improve the accuracy of a forest cover map, but how to use different forest definitions effectively remains an open question, for instance, how many forest cover products are necessary and optimal. The answer is likely to depend on the context and deserves further study.
This study used a hybrid fusion strategy to derive a forest cover map with improved accuracy for the Kyrgyzstan Republic. Specifically, the forest cover in the USGS TreeCover2010 was overlaid with the forest cover in the GlobeLand30, which produced a consistent area and a conflicting area. Then, a two-layer stacking classification system, composed of five base classifiers and one meta-classifier, was proposed and trained using random samples from the consistent area. We found that the accuracy of our model could be improved significantly by including auxiliary geographic information. Additionally, by gradually changing the sample size, a clear polynomial relationship between sample size and model accuracy was derived, which suggests that an optimal sample size can be identified to further improve the accuracy of our model. Lastly, our model was used to estimate forest in the conflicting area, which was then merged with forest in the consistent area to create the final forest cover map.
In this study, we derived several forest cover maps for tree cover threshold values of 10%, 20%, 30%, 40%, 50%, and 60% in the USGS TreeCover2010. The forest cover map at a tree cover threshold value of 40% was verified as the most accurate in terms of kappa, G, and F1 score, with values of 0.89, 0.86, and 0.91, respectively. For instance, the F1 score of our product for forest is 9.21% and 27.17% higher than those of the USGS TreeCover2010 and the GlobeLand30, respectively. Moreover, the forest extent in our product is estimated at 472,369 ha, around 2.4% of the entire territory, which is smaller than most of the estimates reported in the literature. Our product thus gives a low estimate of forest resources based on a rigorous definition, which can be valuable for reviewing and amending the current national forest policies. Importantly, this study contributed a hybrid fusion strategy for deriving a high accuracy forest cover map that is economical and time-saving. It can be used to develop high accuracy forest cover maps in other countries or regions and can also be applied to other land cover types.