1. Introduction
Forests are one of the most important terrestrial ecosystems, providing a variety of services to humans [
1]. Forests play multiple important roles in meeting the habitat needs of different organisms [
2]. Biodiversity is a structural feature of dynamic and complex forest ecosystems [
3]. One of the most challenging issues in modeling forest ecosystems is understanding the relationship between biodiversity and environmental factors. Although this has been the focus of many investigations in recent years and extensive research has been done in this field [
4,
5], much work remains to be done. Analyzing forest biodiversity along gradients of environmental factors can provide meaningful information about the structure of the forest community. For example, an analysis along an altitudinal gradient in the Baihua Mountains concluded that functional divergence follows a single-peaked pattern with increasing elevation [
6]. Species diversity generally decreases with increasing latitude [
5]. The factors affecting species diversity are not only abiotic; they also include gradients created by biotic factors such as competition [
7]. Biotic conditions affect species composition and create differences in the diversity of the community [
8]. Biotic factors related to the structure of a community have a significant effect on access to niche in coexisting species [
9]. Composition of forest ecosystems is the result of biotic interactions coupled with environmental influences [
5].
Many studies have analyzed abiotic forest factors with numerical methods and related them to tree growth measured in the field [
10,
11,
12,
13,
14]. Furthermore, various studies have considered only biotic factors and their effect on forest biodiversity [
15,
16]. In these cases, abiotic environmental effects on species distributions and their ability to sustain viable populations in specific environmental configurations are not evaluated [
17]. However, the species diversity at a given site is also influenced by other species through trophic, non-trophic and competitive interactions [
18,
19].
Few studies have considered the simultaneous impact of changing biotic and abiotic factors on biodiversity [
20,
21]. Some studies indicate that biodiversity of a forest is affected by many factors, such as local climatic conditions, soil characteristics, biodiversity, and even the type of management practices employed [
6,
22]. Environmental factors are key variables that can help determine the diversity and distribution of plant species. For example, an analysis of diversity patterns in forests under different climate conditions in China, which used PCA to build the compound habitat gradient and the biotic and abiotic factors, concluded that both biotic and abiotic factors influence the diversity of the forest community across different climatic zones [
23].
Studies that have considered the influence of both biotic and abiotic factors on biodiversity are limited, and addressing both is one of the distinguishing aspects of this study.
In recent years, machine learning (ML) methods have been widely used to investigate the effect of changes in environmental conditions on biodiversity. ML is a major branch of artificial intelligence and is widely used for analyzing forest data because of its advantages in some circumstances. ML processes often include the following: (i) data selection and preprocessing, (ii) algorithm selection, and (iii) assessment of solutions [
24,
25]. Machine learning has been applied to improve local, regional, and global estimates in complex ecosystems such as forests [
26]. ML methods have also been used in forest management and hazard assessment [
27,
28].
ML models can solve complex problems and represent the nonlinear behavior of systems with good accuracy, but these methods require rigorous training and test data [
29]. However, long-term, accurate monitoring can be costly, and data collection, storage, and updating can be disrupted. Therefore, researchers have proposed different ML methods as well as a combination of ML methods and traditional methods to better understand the different environmental mechanisms in forests as well as to solve different problems [
27,
30]. It is important to note that no single ML method is suitable for all studies; the choice of the most appropriate method, or combination of methods, depends on the users and their objectives [
31]. For example, tree survival and tree height have each been predicted and modeled with ML to determine the factors affecting them in natural forests [
32,
33].
In complex ecosystems such as forests, ML methods have been used extensively to improve global, regional, and local forecasts. For example, machine learning and geostatistical methods were used to predict aboveground biomass (AGB) in Chinese forests [
34]. The authors concluded that the proposed random forest approach provided a reliable and accurate method for AGB mapping in subtropical forest regions with complex topography.
ML models have many advantages that justify their choice for modeling forest features such as biodiversity. For example, the random forest approach is less sensitive to parameter adjustments than other ML models, and it provides an assessment of the importance of variables. It is more powerful in reducing data than other models and more accurate than single decision trees; it also generates an internal unbiased estimate of the generalization error as model building progresses [
35,
36].
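The "internal" error estimate mentioned above is usually understood as the out-of-bag (OOB) estimate: each tree is trained on a bootstrap sample, and the rows it never saw can serve as its test set. The following is a minimal sketch of that sampling step only, not of the model fitted in this study; the function name and the sample size of 1000 are illustrative assumptions.

```python
import random


def bootstrap_oob_split(n_samples, rng):
    """Draw one bootstrap sample of row indices.

    Returns (in_bag, out_of_bag): in_bag has n_samples indices drawn with
    replacement; out_of_bag lists the indices that were never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


# Hypothetical data set size; a fixed seed keeps the sketch reproducible.
rng = random.Random(0)
in_bag, out_of_bag = bootstrap_oob_split(1000, rng)
oob_fraction = len(out_of_bag) / 1000

# For large samples, roughly a third of the rows are left out of each tree.
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

Because every tree has its own out-of-bag rows, averaging the prediction errors on those rows gives an error estimate without a separate validation set.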
The SVM model is another ML model that was used in this research. SVM models can be applied to nonlinear classification, regression and density estimation problems, and they are very useful in forest modeling. In addition, they use kernel functions to project the data into a higher-dimensional space and then find the best separating hyperplane [
37].
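As an illustration of what a kernel function computes, the sketch below implements the radial basis function (RBF) kernel, a common choice for SVMs. The kernel actually used in this study is not stated in this section, so the choice of RBF and the `gamma` value are assumptions.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Similarity of two feature vectors: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Hypothetical, already-scaled feature vectors for two plots.
plot_a = (0.2, 0.5, 0.1)
plot_b = (0.3, 0.4, 0.1)
similarity = rbf_kernel(plot_a, plot_b, gamma=2.0)
print(f"kernel value: {similarity:.4f}")
```

The kernel equals 1 for identical vectors and decays toward 0 as they move apart, which lets the SVM work in the higher-dimensional space implicitly.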
The K-nearest neighbors (KNN) model was also applied in this study. KNN is one of the simplest ML models, which is one of its advantages. It requires no assumption about the distribution of the predictor variables, and it can be applied to both univariate and multivariate predictions [
38].
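The simplicity of KNN can be seen in a minimal regression sketch: the prediction for a new plot is the mean response of its k closest training plots. The Euclidean distance, the value of k and all data values below are illustrative assumptions, not the settings used in this study.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a response as the mean of the k nearest training responses."""
    # Pair each training response with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical scaled predictors (two per plot) and diversity values.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
train_y = [0.2, 0.4, 0.6, 1.0]
prediction = knn_predict(train_X, train_y, (0.0, 0.0), k=3)
print(f"predicted diversity: {prediction:.3f}")
```

In practice the predictors would need to be scaled first, because distances are otherwise dominated by the variable with the largest numeric range.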
Another method, the GAM model, was also considered because it can limit the error in predicting a dependent variable Y by estimating unspecified functions of the predictors, which are connected to the dependent variable by means of a link function. By defining the model in terms of smooth functions, the GAM provides a flexible specification of the response [
39].
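For reference, the general textbook form of a GAM can be written as follows. The notation is an assumption introduced here, not taken from the source: g is the link function, β₀ an intercept, and each f_j an unspecified smooth function of one predictor x_j.

```latex
g\bigl(\mathbb{E}[Y \mid x_1, \dots, x_p]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
```

With an identity link and linear f_j this reduces to ordinary linear regression, which is why the smooth functions are what give the model its flexibility.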
The key to revealing mechanisms for conserving species diversity and to formulating forest management strategies is to study the response of species diversity to changes in environmental conditions and its relationship with ecosystem performance [
40].
Hyrcanian forests form a green belt of mostly temperate forest. These forests are located along the southern borders of the Caspian Sea and over the northern slopes of the Alborz Mountains. They extend from Astara in northwestern Iran to Golidaghi in Gorgan in northeastern Iran and are located in the three provinces of Gilan, Mazandaran and Golestan. According to the latest statistics of the Forests and Rangelands Organization of Iran (FRWO), these forests cover an area of 1.85 million hectares, which constitutes 1.1% of the country and 15% of the total forests of Iran. Hyrcanian forests, with more than 3200 plant species, have significant biodiversity on a global level. This region is of special importance because it holds about 44% of the vascular plants of Iran. About 500 plant species native to Iran occur in these forests. All these features emphasize the need to protect the biodiversity of this region [
41]. The main threat to biodiversity of the Hyrcanian forest arises from habitat loss and fragmentation resulting from conversion of forests to urban development, agricultural land, dam construction, private dwellings, and other non-forest uses.
The aim of this study was to investigate the effect of biotic and abiotic factors on the tree diversity of Hyrcanian forests in northern Iran. For this purpose, a combination of ML models was used: GAM, SVM, RF and KNN. The models include the most important factors, namely average DBH in the plot, BAL, BA, number of trees per ha, tree species, slope, aspect and elevation. This study is unique in that the selected forest sites are comprehensive and cover the Hyrcanian forest from west to east. Past biodiversity studies have covered only a limited part of these forests; this study is the first to cover them completely, by selecting 8 forest sites from east to west. In addition to diversity modeling, we analyzed differences in tree diversity along the east-west gradient of Hyrcanian forests.
4. Discussion
4.1. Trend of Tree Diversity in the Hyrcanian Forest
The diversity of tree species is the basis of biodiversity of the whole forest, because trees provide resources and habitats for other forest species. Species diversity in the forest changes under the influence of various factors [
42]. Biodiversity ensures flexibility and adaptability of forest ecosystems, which protects the environment and leads to sustainable forest management [
68]. Biodiversity is now considered in forest management along with other accepted economic and environmental criteria, and it is believed that, to achieve the goals of sustainable forest management, forestry activities must be in line with environmental concerns, especially plant biodiversity [
69]. In addition, biodiversity indicators, which have been developed as ecological indicators for evaluating different functions, allow the study of environmental characteristics, forest management [
70,
71], and conservation [
72] as applied to ecosystems. Because plants reflect the environmental characteristics of each region, they fully mirror the habitat characteristics of that region [
73]. Therefore, the study of plant composition and plant biodiversity can be used as an appropriate guide in ecological evaluations and in the study of biodiversity in each region. According to the results of the present study, biodiversity in the Hyrcanian forest generally increases from west (Gilan Province) to east (Golestan Province). Temperature increases from west to east, as does biodiversity [
23], but annual precipitation decreases from 1345 mm at the Asalem forest in Gilan Province, the westernmost site, to 524 mm at the Loveh forest in Golestan Province, the easternmost site. Other factors also increase or decrease biodiversity at a forest site, the most important of which is elevation: species diversity decreases with increasing elevation. There are exceptions to this trend in this study, however. Loveh forest, despite being the easternmost site in the study, had the highest Shannon index (1.142) because of the predominance of oak species in this forest. Because oak is shade-intolerant (unlike beech, which is shade-tolerant), oak-dominated forests have more species diversity than forests in which beech is predominant.
The Shannon index in the Nav forest was 0.439, the lowest value after the Haftkhal forest. Although this site lies in the west of Gilan Province and a high diversity index was expected there, in light of the general west-to-east trend of biodiversity in the Hyrcanian forest, this low value can be attributed to intense exploitation of these forests.
The Shannon index of 0.784 at the Chafroud site, located in the west of the Hyrcanian forests and in the middle highlands, was relatively high.
In Mazandaran Province, the Shannon index for the Ramsar, Sardabroud, Kheyroud and Haftkhal forest sites, listed from west to east, was 0.846, 0.681, 0.680 and 0.147, respectively. The trend of diversity from west to east was consistent, with the exception of the Sardabroud site, which, being at mid elevation, showed a high biodiversity index. On the other hand, the Haftkhal forest site in Neka city had the lowest Shannon index, which is apparently related to both natural causes and management factors.
Extreme degradation and exploitation in the Neka region has led to high dominance for species such as beech, resulting in high dominance index and very low uniformity. In most of the sample plots, one or two dominant species with high frequency (beech species) were present and the presence of other species was low or not observed at all, which caused a decrease in uniformity and increased dominance in this area. In addition, intense management and exploitation play an important role in relation to the low diversity index in the Neka forest area, and another important factor in this regard is environmental influences, including elevation. This site is located at a relatively high elevation as compared to the other two areas. This, as well as the dominance of the northern direction in this region, can be another factor for the dominance of beech species and overcoming competition with other species.
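The link between single-species dominance and a low diversity index can be made concrete with a short calculation. The sketch below assumes the usual definitions of the Shannon index (H' = -Σ p_i ln p_i), Simpson dominance (Σ p_i²) and Pielou evenness (H' / ln S); the exact formulas used in this study are given in its methods and may differ. The stem counts are hypothetical.

```python
import math


def diversity_indices(counts):
    """Return (shannon, dominance, evenness) from per-species stem counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one species must have a positive count")
    proportions = [c / total for c in counts]
    shannon = -sum(p * math.log(p) for p in proportions)
    dominance = sum(p * p for p in proportions)
    richness = len(proportions)
    # Evenness is undefined for a single species; report 0.0 in that case.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return shannon, dominance, evenness


# Hypothetical plots: one dominated by a single species, one more mixed.
dominated_plot = diversity_indices([45, 3, 2])
mixed_plot = diversity_indices([20, 15, 15])
print("dominated plot:", [round(v, 3) for v in dominated_plot])
print("mixed plot:    ", [round(v, 3) for v in mixed_plot])
```

Both plots hold three species and 50 stems, yet the dominated plot has the lower Shannon index and evenness and the higher dominance, which is the pattern described for the beech-dominated sites.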
4.2. Machine Learning Approach to Modeling Diversity
In this research, four ML models were implemented: the generalized additive model (GAM), support vector machine (SVM), random forest (RF) and the K-nearest neighbor algorithm (KNN). The RF model, with R² = 0.59 and RMSE = 0.28, was the best model, followed by the SVM, KNN and GAM models with R² = 0.41, 0.30 and 0.17, respectively. Thus, the RF model was the best of the ML models, in line with the results of many studies in this field [
25,
74]. The reason for the superiority of the RF model over other models can be that this model is more powerful than other machine learning methods in reducing data and that it is also less sensitive to parameter adjustment and is able to evaluate the importance of variables. In addition, it generates an internal unbiased estimate of the generalization error as the model building progresses [
35,
36]. In a similar study, support vector regression (SVR), modified regression trees (RT) and random forest (RF) were used to determine forest stand height from plot-based observations and airborne LiDAR data [
75]. As in the present study, which considers the efficiency of SVM and RF models in modeling forest biodiversity, the authors concluded that there was no statistically significant difference in plot height estimation between these models and that all of them are acceptable. A study in Western Himalaya [
76] that used machine learning methods including classification and regression tree (CART), random forest (RF), and support vector machine (SVM) algorithms also showed results similar to the present study; the authors concluded that the RF model has a higher accuracy for forest fire burn area. A study in Tasmania, Australia [
77] used support vector regression (SVR), artificial neural networks (ANN), random forest (RF), and gradient boosted regression trees (GBRT) for mapping forest cover and exploring influential factors, and their findings were in line with the results of our study. In terms of projection accuracy and computational cost, RF far outperformed the other three models [
77].
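The two criteria used to rank the models can be computed as in the sketch below. It assumes the common definitions RMSE = sqrt(mean squared error) and R² = 1 - SS_res / SS_tot; the study may have computed R² differently (for example, as a squared correlation). The observed and predicted values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observations are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted diversity values for four test plots.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
model_rmse = rmse(observed, predicted)
model_r2 = r_squared(observed, predicted)
print(f"RMSE = {model_rmse:.3f}, R2 = {model_r2:.3f}")
```

A model that is better on both criteria has a higher R² and a lower RMSE, which is the sense in which RF is ranked first here.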
4.3. Effect of Biotic and Abiotic Factors on Biodiversity Index
The distinguishing feature of this study is the simultaneous consideration of the most important biotic and abiotic factors affecting biodiversity, which makes it one of the few studies to do so. Physiography affects the biodiversity of the forest. According to the results obtained from the models, especially the random forest model, elevation was one of the most important factors affecting the biodiversity of forests in northern Iran. With changes in elevation, the microclimatic, ecological and environmental conditions of the forest habitat and the structural condition of the area change in proportion to the local conditions [
42]. As the conditions governing the habitat change, tree diversity changes: it increases under favorable ecological and structural conditions and decreases under unfavorable conditions. The Haftkhal site, being located at a higher altitude, has less diversity. In addition, studies in the USA (including all states of the USA east of North Dakota, South Dakota, Nebraska, Kansas, Oklahoma, and Texas) [
78] and in 11 remnant grasslands within the Aspen Parkland Ecoregion of central Alberta, in western Canada [
79], have stated that elevation is one of the most important factors affecting biodiversity of the region, which is in line with the results of this study. Next, biometric factors such as basal area and basal area of the thickest trees, physiographic factors such as slope and aspect, and finally the type of species were identified as factors affecting biodiversity in the region. Species such as
Fagus orientalis and
Carpinus betulus are important in the region, while other species also had a positive effect in the modeling process [
80]. Similarly, patterns of diversity across different climate zones [
23] in four climate regions in China, including tropical (three sites in Yunnan Province), subtropical (two sites in Hubei Province), warm temperate (one site in Gansu Province) and temperate (one site in the Xinjiang Uygur Autonomous Region) showed that both biotic and abiotic factors change the diversity of the forest community between different climatic zones. Other studies have also considered both biotic and abiotic factors. An assessment of the relative importance of mean air temperature, nitrogen availability and direct plant interactions in determining the millennial-scale population dynamics of four temperate tree taxa in the Scottish Highlands concluded that all of these factors are important [
20,
21]. Also, as the sensitivity analysis in
Figure 8 shows, elevation was the most important factor affecting biodiversity in the selected model (RF). The next factors were BA and BAL, respectively. Tree species was also an important factor for the random forest model. Physiographic factors have a significant impact on the biodiversity index in forests, especially the elevation factor. Similar studies have shown that physiographic factors play an important role as indicators of richness and diversity of species. One study [
24] identified the biotic and abiotic factors involved; wind, topographic wetness index (TWI) and elevation were the most important factors affecting variation in tree species richness. As seen in the results of this study, elevation was the main influential factor in the RF model. This variable is a significant predictor of species diversity, as observed in previous studies [
81,
82]. Similarly, aspect, slope and altitudinal variation in Ethiopian landscapes have influenced the existence of varied vegetation types and diversity [
83]. In contrast to our modeling of the relationship between environmental factors and the biodiversity index, those authors sampled quadrats and recorded data on species identity, abundance, elevation, slope and aspect, and they used different diversity indices and ordination techniques to analyze the data. Furthermore, elevation was one of the most important factors influencing community distribution and species diversity in the Baihua Mountains Reserve of Beijing, China [
6]; that study examined functional diversity along the elevation gradient, whereas we modeled diversity using ML.
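One common way to rank predictors for a fitted model is permutation importance: shuffle one predictor at a time and measure how much the prediction error grows. The sketch below illustrates the idea only. It is not necessarily the sensitivity analysis behind Figure 8, and `toy_predict`, the feature layout and all data values are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            shuffled = [
                row[:j] + (value,) + row[j + 1:]
                for row, value in zip(X, column)
            ]
            error = mse(y, [predict(row) for row in shuffled])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Stand-in for a fitted model: depends on feature 0 only."""
    return 1.0 - 0.5 * row[0]


# Hypothetical rows: (scaled elevation, an unrelated second predictor).
X = [(0.1, 5.0), (0.3, 1.0), (0.5, 4.0), (0.7, 2.0), (0.9, 3.0), (0.2, 6.0)]
y = [toy_predict(row) for row in X]
importances = permutation_importance(toy_predict, X, y)
print("importance per feature:", importances)
```

In this toy case the first feature receives a positive importance and the second receives none, because the stand-in model ignores it.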
The results of various studies show that middle elevations generally have the highest richness and species diversity indices, which may explain the high values of these indices at the Sardabroud forest site in the present study.
At the Haftkhal forest site in Neka, elevation is relatively higher than in other areas; this, together with the predominance of northern aspects in this region, can be another reason for the dominance of beech and its success in competition with other species. In fact, the lower uniformity at high elevations is due to the abundance and dominance of beech.
In general, the results of similar studies show that lower temperatures and slower melting of ice in these areas, especially at high elevations, lead to lower diversity. These slopes are more humid and colder, which favors the dominance of beech and, as a result, reduces the uniformity index. It is worth mentioning, however, that in different regions, owing to their climatic and geological characteristics and geographical location, different results are obtained for aspect, but diversity is usually greater on aspects with higher humidity and temperature [
84].
Another important factor that affects the diversity index, along with natural factors, is management and conservation practice in the forests. In fact, the large difference in biodiversity index between forest sites with relatively similar elevation and climatic, physiographic and biological conditions can be attributed to the management style in these forests.
Therefore, in general, the Shannon diversity index in the managed and protected forest area is significantly larger than in other areas, which indicates that tree felling and human pressure in the area have resulted in more heterogeneity in the number and diversity of reproduction of different species.
It should be noted that if other environmental factors such as humidity, precipitation, temperature and soil were included for the studied forest sites, the coefficient of determination would be expected to increase. Thus, a model that includes all of these additional abiotic factors should achieve a considerably higher R². In our research, the results show the capability of some machine learning techniques to produce accurate estimates of the biodiversity index in forest sites and to identify important variables (e.g., elevation, BAL). Although it cannot be said that RF will always be better than other machine learning methods, our results showed a higher coefficient of determination and lower RMSE for RF than for the other ML methods evaluated. Similar results were achieved using ML and geostatistical methods [
34] to predict aboveground biomass in Chinese forests; that study concluded that the random forest approach provided a reliable and accurate method for AGB mapping in subtropical forest regions with complex topography.
5. Conclusions
The main goal of natural resource management is to preserve biodiversity in natural ecosystems, on the assumption that habitats with more biodiversity have more ecological stability and fertility than areas with less biodiversity, and that more stable ecosystems will be more dynamic. In this study, by combining biotic and abiotic factors in ML models, we analyzed their relationship to biodiversity indices across eight forest sites in the Hyrcanian forest in northern Iran. Four machine learning algorithms (GAM, SVM, RF, and KNN) were used to model tree diversity. The results showed that the random forest and support vector machine models were more accurate than the other methods. Based on the results of the RF model, elevation, BA, and BAL were identified as the most influential factors defining variation in tree diversity in the Hyrcanian forests. In addition, we simultaneously examined the important biotic and abiotic factors in relation to the biodiversity index, which distinguishes this study from similar studies.
Machine learning techniques can often be superior to traditional methods when the assumptions required for applying parametric procedures are not met. Flexibility, accuracy, and the ability to model complex and nonlinear relationships are further features of ML methods.