1. Introduction
Forests are an important natural and strategic resource in China. Forestland resources in Zhejiang Province are relatively well developed, but their quality has long been neglected during national economic development. Protecting forestland and improving its quality are therefore important measures for balancing rapid economic growth with ecosystem protection. Accurate evaluation of forest site quality is essential for matching tree species to sites and for designing plantation management measures on a scientific basis, and it is a prerequisite for the scientific, rational, and efficient use of forestland. A scientific and accurate evaluation system for forest site quality would therefore greatly benefit the productivity and sustainable development of plantation forests [1].
Numerous studies have addressed the site quality evaluation of plantation forests. Site quality evaluation classifies the suitability or potential productivity of forestland by collecting relevant data and analyzing them with generalized mathematical methods, so that forest management measures can be tailored to each resulting class [2]. Selecting site factors and choosing an evaluation method are the two key problems in evaluating the site quality of plantations.
(1) Site factors
The site quality of plantations is reflected in a number of site factors [3]. Because measuring all of them is impossible, identifying which factors should form the basis of site quality evaluation has long been a research focus. Existing research shows that the commonly used site factors fall into four groups: topography, soil, climate, and vegetation. Traditional forest site quality evaluation obtains site factors through ground measurement and then divides site types according to combinations of these factors. Studies of dominant tree species have been conducted mainly in regions with relatively uniform climatic conditions and have focused on the relationship between local soil or topographic factors and tree height or site index [4,5,6]. Some scholars have also pointed out that ground survey data are mostly discrete, non-numerical data, which reduces the convergence and stability of site quality models. The focus of site factor acquisition has therefore shifted to climatic and vegetation factors. Among climatic factors, temperature, humidity, precipitation, and aridity are the most frequently selected in site quality evaluations of Pinus koraiensis, Picea asperata, Fagus longipetiolata, Quercus mongolica, and other tree species [7,8,9,10]. Among vegetation factors, the height and coverage of understory vegetation are often used to construct site index models [11].
Topographic, soil, and understory vegetation factors can be obtained from the Forest Management Inventory System of China or through field investigation. Studies have shown that soil chemical properties and nutrients, such as soil pH, nitrogen, phosphorus, and potassium, directly affect tree growth [12]; however, measuring these factors is time-consuming and costly. Within local stands, climatic variation has no significant effect on tree growth [13]. Climate therefore tends to matter at landscape and regional scales, whereas topography and soil are more important at the local scale. When selecting site factors for local stands, it is thus preferable to choose factors that affect stand growth and are easy to measure.
At present, site factors for site quality models are often selected subjectively. ANKIWAN [14] estimated an average height growth model for the Jeju Special Self-Governing Province and the southern area using 32 site environmental factors, such as topography, gradient, and effective soil depth, together with the average height of five dominant trees. Chen et al. [15] selected eight indices (geomorphology, slope aspect, slope position, slope gradient, altitude, soil type, soil parent material, and soil thickness) to derive site quality classification rules for Chinese fir and Masson’s pine with a decision tree algorithm. Other scholars have used mathematical methods to reduce the dimensionality of the many available site factors. Guo et al. used principal component analysis (PCA) to select eight main factors affecting tree growth (slope, slope position, aspect, soil type, humus thickness, soil thickness, landform, and altitude) from 16 original site factors, and classified site quality grades with the fuzzy comprehensive evaluation method [16]. Lv et al. selected nine indicators (e.g., soil thickness, soil type, aspect, and slope position) and reconstructed a stand index model through expert scoring and weighting by the Delphi method [17]. Quichimbo et al. used the CART method to reduce the dimensionality of subjective soil factors and to analyze the relationship between dominant height and soil factors [4]. Site factor selection is a multi-attribute fuzzy decision-making problem, and the relationships among factors are often complex, which makes identifying the key factors that affect stand growth difficult. Existing selection methods typically rely on prior knowledge, so their results are subjective. To address this problem, this study used rough set theory to reduce the dimensionality of the site factors. Rough set theory, proposed by Professor Pawlak in 1982, is a mathematical tool for quantitatively handling inaccurate, inconsistent, and incomplete information [18]. Working directly from the data and without prior knowledge, it identifies the condition attributes closely related to the decision attribute and removes redundant information while preserving the original classification ability.
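The core of rough set attribute reduction can be sketched with the greedy QuickReduct heuristic, which repeatedly adds the condition attribute that most increases the dependency degree γ_B(D) = |POS_B(D)| / |U| until it matches the dependency of the full attribute set. This is a minimal sketch under stated assumptions: the attribute names, the toy decision table, and the greedy search strategy are illustrative and are not necessarily the reduction algorithm or data used in this study. QuickReduct preserves the dependency degree but is not guaranteed to return a minimal reduct.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

def dependency(rows, attrs, decision):
    """Dependency degree gamma_attrs(decision) = |POS_attrs(decision)| / |U|."""
    positive = 0
    for block in partition(rows, attrs):
        # A block is in the positive region if all its rows share one decision value.
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def quick_reduct(rows, conditions, decision):
    """Greedily add the attribute with the largest dependency gain (QuickReduct)."""
    target = dependency(rows, conditions, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max((a for a in conditions if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

# Hypothetical decision table: three condition attributes and a site grade.
rows = [
    {"slope_pos": "upper",  "aspect": "shady", "soil": "thick", "grade": "II"},
    {"slope_pos": "upper",  "aspect": "sunny", "soil": "thick", "grade": "III"},
    {"slope_pos": "middle", "aspect": "shady", "soil": "thick", "grade": "I"},
    {"slope_pos": "middle", "aspect": "sunny", "soil": "thin",  "grade": "II"},
    {"slope_pos": "lower",  "aspect": "shady", "soil": "thin",  "grade": "I"},
    {"slope_pos": "lower",  "aspect": "sunny", "soil": "thick", "grade": "II"},
]
print(quick_reduct(rows, ["slope_pos", "aspect", "soil"], "grade"))  # ['slope_pos', 'aspect']
```

In this toy table, `soil` is removed as redundant because slope position and aspect together already determine the grade of every row.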
(2) Site quality evaluation method
Traditional approaches to evaluating stand site quality fall into direct and indirect methods [3]. The site class (SC) and site index (SI) methods are direct methods that are often used to evaluate the site quality of plantations [19]; they evaluate site quality from the average height and the dominant height of stands, respectively. Mathematical methods such as the guide curve model [20], random effect model [21], algebraic difference approach (ADA) [22], generalized ADA (GADA) [23,24], parameterization model [25], and mixed effect model [26] are commonly used to establish SC and SI models. Indirect methods mainly fit a multiple regression equation between tree height and site factors to evaluate the growth potential of trees. Quantity theory I is a typical indirect method and is mostly used to evaluate the site quality of non-forested land [27]. Its basic principle is to code each plot's data as 0–1 indicators according to the categories of each site factor (e.g., slope position is divided into three categories: upper, middle, and lower) and then to fit a regression equation between these categorical site factors and dominant height. In addition, the functional relationship between the site indices of different tree species can be used to evaluate site quality, but the accuracy of the results depends on how similar the growth patterns of the species are [28].
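Quantity theory I amounts to least-squares regression of dominant height on 0–1 category indicators. The minimal sketch below assumes hypothetical plots with two categorical factors (slope position and soil thickness); it drops one reference category per factor so that the design matrix has full rank, and solves the normal equations by plain Gaussian elimination. The data and the reference-category coding are illustrative assumptions, not the study's.

```python
def dummy_code(plots, factors):
    """Build a 0-1 design matrix; the first (sorted) category of each factor is the reference."""
    levels = {f: sorted({p[f] for p in plots}) for f in factors}
    columns = [(f, lv) for f in factors for lv in levels[f][1:]]
    X = [[1.0] + [1.0 if p[f] == lv else 0.0 for f, lv in columns] for p in plots]
    return X, ["intercept"] + [f"{f}={lv}" for f, lv in columns]

def solve(A, b):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# Hypothetical plots and dominant heights (m).
plots = [
    {"slope_pos": "lower",  "soil": "thick"},
    {"slope_pos": "lower",  "soil": "thin"},
    {"slope_pos": "middle", "soil": "thick"},
    {"slope_pos": "middle", "soil": "thin"},
    {"slope_pos": "upper",  "soil": "thick"},
    {"slope_pos": "upper",  "soil": "thin"},
]
heights = [14.0, 12.0, 13.0, 11.0, 11.0, 9.0]
X, names = dummy_code(plots, ["slope_pos", "soil"])
for name, coef in zip(names, ols(X, heights)):
    print(f"{name}: {coef:.2f}")
```

Each fitted coefficient is the score of a category relative to its reference category (here lower slope and thick soil), which is how quantity theory I ranks the sub-classes of each site factor.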
These site quality evaluation methods rely on traditional linear or nonlinear models, which require statistical assumptions such as independence, normality, and homogeneity of variance. However, the relationship between tree growth and site factors is typically complex and nonlinear, and most forest growth data do not satisfy these assumptions, which makes accurate prediction difficult [29]. For example, traditional regression analysis easily produces biased estimates or invalid predictions [30]. Screening site factors with PCA can simplify the data structure, but the cumulative contribution rate of the first few principal components is often low, and the key factors are not easily identified. Quantity theory I can effectively handle discrete attribute factors, but it depends on long-term observation data.
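For reference, the cumulative contribution rate of the first k principal components is conventionally defined from the eigenvalues of the correlation (or covariance) matrix of the p site factors; the notation below is ours, not the source's.

```latex
\mathrm{CCR}_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{p} \lambda_j},
\qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0
```

In the extreme case of uncorrelated standardized factors, the correlation matrix is the identity, every \lambda_i = 1, and CCR_k = k/p, so no small subset of components summarizes the data. Weakly correlated site factors push PCA toward this situation.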
In recent years, machine learning, a new artificial intelligence technology, has gradually entered forestry research to meet the needs of forestry production [31]. Compared with traditional statistical models, machine learning methods make no assumptions about the distribution of the data, can handle high-dimensional data with complex nonlinear interactions, and can extract valuable information in depth [32]. Furthermore, machine learning models based on recursion, resampling, averaging, and randomization can reveal hidden structure in stand data, yield accurate site quality predictions, and uncover new relationships [33]. Among machine learning techniques, random forest can effectively handle nonlinearity, interactions, collinearity, and similar problems while reducing the risk of overfitting [34]. Random forest can also be used for regression, classification, and prediction, and it can measure the importance of variables.
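The ingredients named above (bootstrap resampling, random feature subsets at each split, majority voting, and variable importance) can be seen in a compact, dependency-free sketch. Assumptions: features are categorical and split multiway with Gini impurity; importance is measured by permutation (the drop in accuracy after shuffling one feature), which is one standard option and not necessarily the importance measure used in this study; the data come from a hypothetical rule with one irrelevant `noise` feature. A production analysis would normally use a library implementation instead.

```python
import math
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(rows, labels, features, rng, depth=0, max_depth=4):
    """Grow one tree; returns a class label (leaf) or a dict node."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    k = max(1, int(math.sqrt(len(features))))  # features examined per node
    best, best_score = None, gini(labels)
    for i, f in enumerate(rng.sample(features, len(features))):
        if i >= k and best is not None:
            break  # k random features examined and a useful split found
        groups = {}
        for r, y in zip(rows, labels):
            groups.setdefault(r[f], []).append(y)
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if score < best_score - 1e-12:
            best, best_score = f, score
    if best is None:
        return majority
    children = {}
    for value in {r[best] for r in rows}:
        idx = [j for j, r in enumerate(rows) if r[best] == value]
        children[value] = grow([rows[j] for j in idx], [labels[j] for j in idx],
                               features, rng, depth + 1, max_depth)
    return {"feature": best, "children": children, "majority": majority}

def predict_tree(node, row):
    while isinstance(node, dict):
        child = node["children"].get(row[node["feature"]])
        if child is None:  # category unseen at this node
            return node["majority"]
        node = child
    return node

def fit_forest(rows, labels, features, n_trees=50, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(grow([rows[i] for i in idx], [labels[i] for i in idx], features, rng))
    return forest

def predict(forest, row):
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def accuracy(forest, rows, labels):
    return sum(predict(forest, r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(forest, rows, labels, features, seed=0):
    """Accuracy drop when one feature's values are shuffled across rows."""
    rng = random.Random(seed)
    base = accuracy(forest, rows, labels)
    scores = {}
    for f in features:
        shuffled = [r[f] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, f: v} for r, v in zip(rows, shuffled)]
        scores[f] = base - accuracy(forest, permuted, labels)
    return scores

def site_grade(aspect, slope):
    """Hypothetical rule used only to generate toy labels."""
    if aspect == "shady" and slope == "gentle":
        return "I"
    if aspect == "sunny" and slope == "steep":
        return "III"
    return "II"

gen = random.Random(1)
rows = [{"aspect": gen.choice(["shady", "sunny"]), "slope": gen.choice(["gentle", "steep"]),
         "noise": gen.choice(["a", "b", "c"])} for _ in range(120)]
labels = [site_grade(r["aspect"], r["slope"]) for r in rows]
forest = fit_forest(rows, labels, ["aspect", "slope", "noise"])
imp = permutation_importance(forest, rows, labels, ["aspect", "slope", "noise"])
print(imp)
```

Shuffling `aspect` or `slope` breaks the rule that generated the grades and causes a large accuracy drop, whereas shuffling `noise` leaves accuracy essentially unchanged, which is the ranking behaviour that variable importance is meant to expose.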
Chinese fir is one of the major tree species in southern China, particularly in Zhejiang Province, and is characterized by fast growth, high yield, good timber quality, and significant economic value [35]. Research on its trees and stands is therefore essential for managing Chinese fir plantations in the region. The present study aims to explore the role of rough sets in reducing the dimensionality of site factors, to develop site quality classification models for Chinese fir in Lin’an District, Zhejiang Province, using rough set theory and the random forest algorithm, and to compare the accuracy of classification models built on different sets of site factors. On this basis, the influence of key site factors on Chinese fir site quality is explored, and a comprehensive evaluation of the forest site quality grade is achieved.
4. Discussion
The rough set analysis showed that the main factors affecting the growth of Chinese fir were naturalness, stand origin, plant community structure, forest class, soil layer thickness, humus layer thickness, undergrowth vegetation coverage, undergrowth vegetation height, undergrowth vegetation species, slope position, slope gradient, slope direction, and canopy closure. These factors played a key role in the site quality classification of Chinese fir sub-compartments. Pawlak's rough set attribute reduction is designed for discrete variables [18]. However, some forestry data are continuous, and applying the Pawlak model to them requires converting continuous values into discrete ones, which inevitably causes information loss [48]. To address this problem, fuzzy rough set, similarity relation rough set, and neighborhood rough set models can be introduced in subsequent research on the attribute reduction of site factors.
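The information loss caused by discretization, and the neighborhood relation that avoids fixed cut points, can be illustrated with a minimal sketch. Equal-width binning is only one of several discretization schemes, and the humus thickness values and radius δ below are hypothetical. The sketch computes only a δ-neighborhood granule, the building block of neighborhood rough sets, not a full neighborhood-based reduction.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to one of n_bins equal-width intervals (0..n_bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def neighborhood(values, i, delta):
    """Indices of samples whose value lies within delta of sample i."""
    return [j for j, v in enumerate(values) if abs(v - values[i]) <= delta]

humus_cm = [2.0, 9.9, 10.1, 18.0]  # hypothetical humus layer thicknesses
print(equal_width_bins(humus_cm, 2))         # [0, 0, 1, 1]
print(neighborhood(humus_cm, 1, delta=1.0))  # [1, 2]
```

With two bins, 9.9 cm and 10.1 cm fall into different bins although they differ by only 0.2 cm, while 2.0 cm and 9.9 cm share a bin; the δ-neighborhood of 9.9 cm instead groups it with 10.1 cm.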
The site quality classification results showed that the random forest model built on the reduced attributes was simpler and trained more efficiently. Compared with the model without attribute reduction, its accuracy, precision, and recall improved on both the training and testing sets. The random forest model is an extension of the decision tree model. Chen et al. previously used a decision tree to construct a site quality classification model for Chinese fir, and the classification accuracy of that model was lower than that of the random forest model [15]. Some scholars have already used the random forest algorithm to evaluate site quality, but almost all of them selected site factors subjectively based on experience [49]; the rough set approach in this study resolves this subjectivity in site factor selection. The effects of the 13 site factors on the growth of Chinese fir were also analyzed using the variable importance assessment of the random forest model. In the study area, the most influential site factors were slope direction, canopy closure, and slope gradient, whereas the least influential were humus layer thickness, soil layer thickness, naturalness, and stand origin. Changes in slope direction and slope gradient affect solar radiation, soil fertility, and air temperature; hence, these two factors strongly influence the growth of Chinese fir.
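Classification metrics such as accuracy, precision, and recall for a multi-class site grade can be computed as in the following minimal sketch. The text does not state which averaging scheme was used, so macro averaging (the unweighted mean over grades) is an assumption here, and the labels are toy values.

```python
def classification_metrics(y_true, y_pred):
    """Return overall accuracy plus macro-averaged precision and recall."""
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)  # rows assigned grade c
        actual = sum(t == c for t in y_true)     # rows truly of grade c
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return accuracy, sum(precisions) / len(classes), sum(recalls) / len(classes)

y_true = ["I", "I", "II", "II", "III", "III"]
y_pred = ["I", "II", "II", "II", "III", "I"]
print(classification_metrics(y_true, y_pred))
```

Here accuracy is 4/6; grade II has perfect recall but a precision of 2/3 because one grade I plot was predicted as II, which is the kind of trade-off that reporting precision and recall alongside accuracy reveals.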
Related studies in the same region have shown that the steeper the slope, the poorer the stand quality, because slope affects the microclimate of the stand: very steep slopes are often windward sites with thinner soil layers, which are unfavorable for the growth of Chinese fir [50]. Other studies have shown that Chinese fir on northeast and northwest slopes has better site quality than on south slopes, indicating that Chinese fir is better suited to shady or semi-shady slopes [5]. Site factors also affect Chinese fir differently at different stand development stages: slope position is the main factor in young and middle-aged stands, whereas humus thickness is the most critical factor in near-mature and over-mature stands [16].
Canopy closure reflects stand density. Changes in canopy closure indirectly affect solar radiation, air humidity within the stand, the growth environment of undergrowth vegetation, soil physical and chemical conditions, and the types and activity of soil microorganisms. Some studies have also shown that stand density indirectly affects the site conditions of vegetation [51]. Canopy closure is therefore an important factor affecting the growth of Chinese fir. Although many studies have indicated that soil layer thickness and humus layer thickness have relatively important effects on soil quality [52,53,54], the Chinese fir planting area in Lin’an District has few soil types, mostly yellow and red soils, and most of the soil is thick; consequently, the influence of soil layer thickness and humus layer thickness on the growth of Chinese fir is not evident. At present, no study has analyzed the impact of naturalness and stand origin on site quality. In this study, stands with naturalness Class III accounted for 94.5% of the data, while stands with naturalness Classes II and I accounted for only 5.5%; likewise, plantations accounted for 96.0% of the stands and natural forests for only 4.0%. This imbalance in the experimental data was thus the main reason why these factors did not clearly distinguish the site quality of Chinese fir. Furthermore, because plantations are probably established in specific (pre-selected) locations, the site factors found to affect Chinese fir growth here differ from those reported elsewhere [55,56]. The results of this study therefore apply only to the study area, and their applicability to other regions still needs to be verified.
The results of this study showed that combining rough sets with random forest for forest site quality evaluation handles the nonlinear relationship between forest site quality and site factors well and overcomes the limitations and subjectivity of manually selecting site factors. The random forest model improves classification and prediction accuracy without significantly increasing computation, requires tuning only a few parameters, and can also evaluate feature importance. Overall, the model offers many advantages for classification: it can predict the site quality of Chinese fir from site factors and can also determine whether a given forestland is suitable for growing Chinese fir. Moreover, the algorithm was implemented in Python, which offers strong universality and compatibility, so the programs developed in this study can run on different software platforms, providing a new approach for applying big data in Chinese forestry.
The main innovation of this study was applying rough set theory and the random forest model to the problem of “matching tree species with sites,” with satisfactory results. Future research should analyze in depth the impact of each site factor on stand site quality, as well as the interactions between site factors under different climates, environments, stand ages, and stand densities, so that the growth environment of Chinese fir can be optimized for maximum productivity. In addition, the random forest model has wide potential application: the proposed model covers only Chinese fir, and random forest models for other species can be established in the future. Such models can provide a scientific basis for further study of the spatial distribution of forest site quality grades and for forestland use planning, and they can provide technical support for improving forestry information management.