1. Introduction
Pinus radiata D. Don (radiata pine) is the predominant plantation species within Chile and there is considerable interest within the forest sector in the accurate prediction of Site Index for this species [
1].
P. radiata is the most widely established plantation species within the Southern Hemisphere, and this species constitutes a large proportion of plantations in Chile, New Zealand, and Australia [
2]. This species is very responsive to environment and, as a consequence, productivity has been found to range widely across the environments over which it is grown [
3,
4]. A number of process-based models, such as 3PG [
5], CenW [
6], and CABALA [
7], have been developed to describe how the environment influences growth of plantation species, such as
P. radiata (e.g., Kirschbaum and Watt [
8]). However, empirical or hybrid models are still the most widely used for predictions of plantation productivity, as these models are simpler to parameterise and can provide more precise estimates of growth than process-based approaches [
9].
Empirical models describe stand productivity as a function of stand age using non-linear functional forms. Variation in productivity between stands is accounted for by standardised measurements of productivity at a given age, which are used to adjust both the trajectory and the asymptote of predictions of productivity over time [
10,
11,
12]. Site Index, which expresses the height of dominant or co-dominant trees at a reference age [
13], has been most widely used to account for this inter-stand variation, as this metric is correlated with productivity [
14,
15] and the height of dominant trees is relatively invariant to stand density [
16,
17,
18].
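As a minimal sketch of how Site Index anchors an empirical height-age curve, the example below scales an anamorphic Chapman-Richards curve so that it passes through the Site Index at the reference age. The functional form, the `rate` and `shape` values, and the function names are illustrative assumptions and are not the growth model or parameters used in this study.

```python
import math

REFERENCE_AGE = 20.0  # index age in years, matching the index age used in this study


def height_at_age(site_index, age, rate=0.08, shape=1.5, reference_age=REFERENCE_AGE):
    """Dominant height (m) at `age` for a stand with the given Site Index.

    Assumed anamorphic Chapman-Richards form; `rate` and `shape` are
    hypothetical values chosen for illustration, not fitted parameters.
    """
    growth = (1.0 - math.exp(-rate * age)) / (1.0 - math.exp(-rate * reference_age))
    return site_index * growth ** shape


def site_index_from_height(height, age, rate=0.08, shape=1.5, reference_age=REFERENCE_AGE):
    """Invert the curve: estimate Site Index from a height measured at any age."""
    growth = (1.0 - math.exp(-rate * age)) / (1.0 - math.exp(-rate * reference_age))
    return height / growth ** shape


if __name__ == "__main__":
    for si in (24.0, 30.0):
        heights = [round(height_at_age(si, a), 1) for a in (5, 10, 15, 20)]
        print(f"Site Index {si} m -> heights at ages 5, 10, 15, 20: {heights}")
```

Because the whole curve is multiplied by Site Index, a higher Site Index raises both the trajectory and the asymptote, which is the behaviour described above.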
Environmental surfaces have been widely used through a range of modelling approaches to develop maps of Site Index for
P. radiata [
3,
19] and many other coniferous tree species [
20,
21,
22,
23,
24,
25]. Compared with direct measurements of Site Index made using plot data, which are typically averaged to the stand level, predictions of Site Index from environmental surfaces open up a range of applications that are not available from traditional inventory. The resulting spatial description of Site Index provides insight into the key environmental drivers of productivity and allows managers to understand how productivity is likely to vary across the landscape, and where optimal productivity will occur, at a range of resolutions from the intra-stand to the regional level [
3,
19]. In contrast to spatial predictions of Site Index from remotely sensed data, such as LiDAR [
26], surfaces of productivity that are created from environmental surfaces can also be used to estimate productivity for unplanted areas, providing managers with insight into the potential value of land that they intend to purchase [
27].
The use of Site Index surfaces to parameterise empirical growth models incorporates elements of process-based modelling, as Site Index integrates the most important determinants of tree growth, including topography, soil characteristics, and climate [
28]. Consequently, spatial predictions of Site Index provide a means of generating stand growth curves that are sensitive to fine and coarser scale landscape level changes in climatic and edaphic conditions [
29]. These estimates of stand development allow managers to spatially optimise the timing of a range of silvicultural operations, including thinning and pruning, across their estate [
30,
31]. The Site Index surfaces can also be used as input to models that are used for key management decisions, such as the optimisation of final crop stand density (S) and the development of surfaces showing spatial variation in S [
32].
A large number of modelling methods with varying levels of complexity have been used to predict Site Index for a wide range of forest species growing in Europe, North America, and New Zealand. These methods range from relatively simple approaches, such as multiple linear regression [
4,
21,
22,
23,
24,
25,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43,
44], to more complex parametric methods, such as Partial Least Squares, Lasso, Elastic Net, Least Angle Regression, and Infinitesimal Forward Stagewise Regression [
45]. A wide range of non-parametric methodologies has also been used to model Site Index, including Random Forests [
46,
47], Boosted Trees [
33,
34], Classification and Regression Trees [
33,
34], Neural Networks [
34], Generalised Additive Models [
33,
34,
48], and Multivariate Adaptive Regression Splines [
45].
Parametric methods that utilise the spatial correlation between the underlying plot data describing Site Index have been less frequently used to develop models and surfaces of Site Index. Amongst these geostatistical methods, ordinary kriging and regression kriging are the most commonly used techniques [
3,
19]. Because predictions are made by ordinary kriging through interpolating values between measured plots, this method is most precise when plots are located in relatively close proximity [
3]. Regression kriging is less reliant on a dense plot network than ordinary kriging, as this method fits an underlying regression model and then geospatially refines these estimates through kriging the model residual variation across the area of interest [
3].
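The two-step logic of regression kriging described above can be sketched as follows: fit a regression on an environmental covariate, krige the residuals, and add the two parts together. This is a simplified illustration under stated assumptions, namely a single covariate, a simple linear trend, and an exponential semivariogram with fixed, hypothetical `sill` and `rng` values rather than a fitted variogram. All function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def semivariogram(h, sill=1.0, rng=2.0):
    """Exponential semivariogram without nugget (assumed, not fitted)."""
    return sill * (1.0 - math.exp(-h / rng))


def ordinary_kriging(points, values, target):
    """Interpolate `values` observed at `points` to the `target` location."""
    n = len(points)
    # Kriging system: semivariances between plots plus the unbiasedness constraint.
    a = [[semivariogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [semivariogram(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values))


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope


def regression_kriging(points, covariate, values, target, target_covariate):
    """Regression trend at the target plus the kriged regression residual."""
    intercept, slope = fit_line(covariate, values)
    residuals = [z - (intercept + slope * c) for z, c in zip(values, covariate)]
    trend = intercept + slope * target_covariate
    return trend + ordinary_kriging(points, residuals, target)
```

Ordinary kriging relies only on the neighbouring plot values, whereas regression kriging can still fall back on the regression trend where plots are far apart, which is why it is less reliant on a dense plot network.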
The recent emergence of advanced machine learning methods allows for greater utilisation of the increasing amount of information in geospatial surfaces, as these models can often accommodate collinearity between closely correlated environmental variables [
49,
50]. Despite this advantage, few studies have compared the predictive precision of these methods with more traditional approaches. For forest species located in Belgium and Turkey, Site Index was predicted more precisely by non-parametric methods than by multiple linear regression and, amongst non-parametric methods, artificial neural networks had the highest predictive performance [
33]. Comparative studies of model performance undertaken in
P. radiata plantations have highlighted the precision of regression kriging and more advanced non-parametric models, but, as with other forest species, have not included a comprehensive comparison of the models. Within New Zealand plantations, regression kriging was found to be marginally more precise than ordinary kriging, which, in turn, was more precise than Partial Least Squares [
3,
19]. A comparison of seven modelling methods using data collected from northwest Spain found that the non-parametric Multivariate Adaptive Regression Splines (MARS) method predicted Site Index most precisely, closely followed by the parametric methods of stepwise regression and Partial Least Squares (PLS) [
45].
Because each modelling method has its own limitations and advantages [
51], an alternative approach for improving the overall model precision is to combine predictions from each model [
52,
53]. This method, known as Ensemble Modelling, is a well-known methodology that can improve prediction by integrating knowledge from many sources [
53]. Although this technique has been used for the prediction of many soil attributes [
53,
54] and class prediction studies [
52], we are unaware of any studies that use spatial Ensemble Models for the prediction of Site Index.
In Chile, different Site Index curves have been developed for each region and geographic area, although these local predictions are relatively inaccurate and there is little understanding of how Site Index responds to topographic, climatic, and edaphic conditions [
55]. Given the wide diversity of environmental conditions within the region over which plantations are grown, we assumed that more than one modelling method would be required to best predict Site Index across south-central Chile. Consequently, the objectives of this study were to compare the precision of a wide range of modelling algorithms and to determine whether the combination of multiple algorithms (e.g., by means of spatial ensemble learning) could predict Site Index more precisely than the best-performing single modelling algorithm.
4. Discussion
This study clearly demonstrated the utility of geostatistical methods for predicting the Site Index of
P. radiata. Using a novel spatial ensemble approach, the predictions of the most precise model for each pixel were combined to produce an overall model with more precise predictions than the constituent models. The final model had an RMSE of 1.88 m, which compared favourably with previous predictions for both
P. radiata and other coniferous species [
3,
19,
33,
34]. The variable reduction process highlighted the sensitivity of Site Index to climatic and edaphic factors.
The geostatistical models of ordinary kriging or regression kriging provided the most precise predictions of Site Index among the compared methods. Previous studies have primarily focused on comparing the precision of Site Index models that do not include a geostatistical component, demonstrating that non-parametric models generally outperform parametric models [
33,
45,
46]. Our results extend this research by demonstrating that adding a spatial component to both of these model types yields more precise predictions than models without this component. Gains through regression kriging over the base model were particularly marked for the most precise model, which utilised PLS (r = 0.807 vs 0.705), demonstrating the utility of this approach. Although regression kriging has been widely used for prediction in other domains [
67,
77,
78], with few exceptions very little research has used regression kriging for prediction of productivity indices. As noted by Samuel-Rosa et al. [
79], there was generally a consistent but small reduction in precision when the number of variables in the models was reduced from between 18–20 to five.
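The precision statistics quoted above can be computed from paired observed and predicted Site Index values as in the sketch below. It assumes that r denotes the Pearson correlation between observed and predicted values, which the text does not state explicitly. The function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations (m for Site Index)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

RMSE summarises the typical size of the prediction error, while r summarises how well predictions track the observed ranking of sites, so the two statistics can move independently.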
Regression kriging and ordinary kriging (OK) have been found to be most precise when applied to high-density datasets. The accuracy of OK is most influenced by the spatial point pattern (random, aggregated, or regular), sampling density (high or low), autocorrelation, data distribution (normal or skewed), and heterogeneity of the data [
80]. The high density of observations in this study favoured the use of ordinary kriging and regression kriging, which is consistent with a model of
P. radiata Site Index developed in New Zealand [
19]. Regression kriging is less sensitive to the spatial distribution of the sample plots than ordinary kriging, as this method also includes an underlying model that is based on environmental variables. As a result, regression kriging can outperform OK when datasets include a range of plot densities across the area of interest (e.g., [
19]).
A novel ensemble approach was used here to spatially combine predictions from the five most precise prediction models. Although ensemble methods have been widely used in other disciplines [
52,
54,
81,
82], this method has not previously been used in order to predict Site Index. Most ensemble approaches combine predictions from all models across the entire study area using a range of approaches to weight the individual model predictions [
81,
83]. Our approach differs in that a single model was used in order to predict Site Index within each pixel, which improved the overall predictive precision over any of the five constituent models.
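The difference between the two ensemble strategies described above can be sketched as follows. The example assumes that a per-pixel error estimate is available for every model; how such an error surface is derived is not covered here, and the inverse-error weighting is only one of many possible weighting schemes. The function names and the numbers in the usage example are hypothetical.

```python
def select_best_model(predictions, errors):
    """Per-pixel selection: use the prediction of the model with the lowest error.

    `predictions` and `errors` map a model name to a list with one value per pixel.
    Returns the chosen model name and the ensemble prediction for each pixel.
    """
    names = list(predictions)
    n_pixels = len(predictions[names[0]])
    chosen, combined = [], []
    for i in range(n_pixels):
        best = min(names, key=lambda name: errors[name][i])
        chosen.append(best)
        combined.append(predictions[best][i])
    return chosen, combined


def weighted_average(predictions, errors):
    """Conventional alternative: inverse-error weighted mean of all models per pixel.

    Assumes every error estimate is strictly positive.
    """
    names = list(predictions)
    n_pixels = len(predictions[names[0]])
    combined = []
    for i in range(n_pixels):
        weights = {name: 1.0 / errors[name][i] for name in names}
        total = sum(weights.values())
        combined.append(sum(weights[name] * predictions[name][i] for name in names) / total)
    return combined


if __name__ == "__main__":
    # Illustrative values only, not results from the study.
    preds = {"OK": [28.0, 30.0, 25.0], "RF-Kriging": [27.0, 31.0, 26.0],
             "PLS-Kriging": [29.0, 29.0, 24.0]}
    errs = {"OK": [1.0, 2.5, 2.0], "RF-Kriging": [2.0, 1.5, 3.0],
            "PLS-Kriging": [1.5, 2.0, 1.0]}
    print(select_best_model(preds, errs))
    print(weighted_average(preds, errs))
```

A side effect of per-pixel selection is the `chosen` surface itself, which maps where each constituent model is preferred.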
One of the advantages of our ensemble approach is that it highlighted the regions in which each model performed best, which may be useful if predictions need to be made within a sub-set of the study area. The RF–Kriging method was found to be well suited to northern regions with sparse observations and southern parts of the study area without observations, which highlights the utility of this approach where observation density is low. OK had less error in regions where there were both sparse and denser observations. In high altitude eastern areas where the OK model was not selected, predictions from this model are likely to be higher than actual values, as there were no points to interpolate within this region. In the central part of the study area, as well as southern parts of the Andes, PLS–Kriging had the lowest prediction error and this region generally included a dense concentration of observations.
The five environmental variables that were the most important determinants of Site Index were all climatic variables for the RF-Kriging model and almost all edaphic variables for the PLS-Kriging model. Growing degree days, accumulated rainfall during years 1 and 4, and variables describing the rainfall and temperature seasonality were the most important climatic determinants of Site Index. Growing degree days has a sound physiological basis as a predictive variable, as
P. radiata height extension is strongly regulated by air temperature [
84,
85] and, consequently, this variable controls the length of the growing season. The sensitivity of Site Index to rainfall has been well established [
4,
8] and our study clearly demonstrates the importance of adequate rainfall during the years immediately after establishment. The seasonality of air temperature and rainfall were also important regulators of Site Index within Chile, where both of the variables exhibit a marked seasonal variance [
86].
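As an illustration of how climatic variables of this kind are typically derived, the sketch below computes growing degree days from daily mean temperatures and a rainfall seasonality index from monthly totals. The base temperature of 5 °C and the use of the coefficient of variation as the seasonality measure are assumptions made for the example; the definitions used for the study's climate surfaces may differ.

```python
import statistics


def growing_degree_days(daily_mean_temps, base_temp=5.0):
    """Sum of daily mean temperature above a base temperature (degree days).

    `base_temp` is an assumed threshold below which no growth is credited.
    """
    return sum(max(0.0, t - base_temp) for t in daily_mean_temps)


def rainfall_seasonality(monthly_rainfall):
    """Coefficient of variation (%) of monthly rainfall totals.

    Assumed definition of seasonality: 0 means rain is spread evenly through
    the year, larger values mean rain is concentrated in fewer months.
    """
    mean = statistics.fmean(monthly_rainfall)
    return 100.0 * statistics.pstdev(monthly_rainfall) / mean


if __name__ == "__main__":
    print(growing_degree_days([3.0, 5.0, 10.0, 12.0]))
    winter_wet = [10, 15, 30, 80, 180, 220, 200, 150, 90, 50, 25, 15]
    print(round(rainfall_seasonality(winter_wet), 1))
```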
Important soil properties included C:N ratio, soil hydraulic conductivity, available soil water, and soil depth. Soil C:N ratio has been found to be a key determinant of conifer productivity [
87] and it is a more precise proxy of soil nutrient availability than N, as C:N ratio accounts for the positive relationship between carbon content and nitrogen immobilisation [
88,
89,
90,
91,
92]. Both available soil water and soil depth control the amount of water available to trees, which makes them key attributes within the study area, where rainfall is often sparse and highly seasonal [
86]. Similarly, soil hydraulic conductivity is also related to water availability and it reflects the soil’s ability to transmit water when subjected to a hydraulic gradient, therefore controlling the partitioning of precipitation between surface runoff and groundwater recharge [
93].
Although enhanced vegetation index was not included in the top five variables, variables describing the mean and dispersion of EVI consistently featured among the optimum variables for both types of model. Previous research has found EVI to be a useful predictor of forest canopy structure [
94]. There is a strong physiological link between this variable and growth rate, as EVI has also been found to be strongly related to leaf area index (LAI) [
95,
96] and EVI can also be used in order to identify the start of the growing season [
97].
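For readers unfamiliar with the index, the sketch below shows the standard EVI formulation with the coefficients commonly used for MODIS products, together with a mean and a dispersion statistic over a time series. The use of the population standard deviation as the dispersion measure is an assumption, as the specific statistic is not stated above, and the function names are illustrative.

```python
import statistics


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy=1.0):
    """Enhanced vegetation index from surface reflectances (each between 0 and 1)."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy)


def evi_summary(reflectance_series):
    """Mean and standard deviation of EVI over a series of (nir, red, blue) tuples."""
    values = [evi(nir, red, blue) for nir, red, blue in reflectance_series]
    return statistics.fmean(values), statistics.pstdev(values)
```

A higher mean EVI indicates a denser canopy on average, while the dispersion captures how strongly canopy greenness varies through time.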
Topographic variables that were well represented for prediction of Site Index in both types of models included terrain surface convexity (TSC) and valley depth. These covariates influence local microclimate and soil-forming processes, and they are consequently associated with the soil type [
43]. Both TSC and valley depth are also associated with multiple environmental variables, such as water drainage and water availability, as well as the accumulation of clay and other soil particles. Valley depth is also likely to be a proxy for local exposure to the wind. As higher windspeeds result in reduced tree height and increased diameter [
98,
99,
100,
101],
P. radiata located in deep valleys with little wind exposure has been found to have significantly greater height than trees that are located on ridges or more exposed areas [
102,
103].
The direct estimation of Site Index from height data that were collected at the index age of 20 years reduced the error from extrapolation. According to Burkhart and Tomé [
104], estimates of Site Index using measurements that coincide with 20 years are rare. As a result, most Site Index studies use equations to extrapolate height to the required index age, which is an approach that is potentially biased [
105,
106]. More recent methods, such as the generalised algebraic difference approach (GADA), which include polymorphic models, provide a more accurate estimation method, but still include uncertainties in the final prediction [
105]. An additional advantage of using an older dataset was that site specific climatic conditions could be estimated over a uniform period of the rotation length from establishment to 20 years of age.
5. Conclusions
In conclusion, we found geostatistical models of Site Index to outperform a range of parametric and non-parametric models without a geo-spatial component. These five geostatistical models were successfully combined into an ensemble model that was more precise than the constituent models. Climatic and edaphic variables were most strongly related to Site Index, although EVI and many topographic variables were also widely used within the five most precise models. Variables that are related to soil water balance, such as rainfall, soil depth, and water holding capacity, were well represented in the top five models, reflecting the importance of water limitations in regulating growth across the study area.
This research highlights the potential improvements that can be gained through the application of sophisticated modelling methods to site productivity modelling, which are likely to be transferable to other species and countries. Although geostatistical models, such as Ordinary Kriging, are not likely to be as applicable in situations with sparser datasets, the regression models used here would be applicable under these circumstances. Future research should more fully utilise LiDAR from existing plantations, as these data can be used to estimate height and Site Index very precisely. In the context of this study, LiDAR can also be used as a supplemental form of plot data, which could prove to be useful when existing plot data are sparse or do not completely cover all environmental conditions. These estimates can be used as input to machine learning and geostatistical models to generate surfaces and predictions of Site Index for unplanted sites.