1. Introduction
The use of various technologies to gather large amounts of high-dimensional data has grown and become increasingly important for better understanding a variety of activities, including strategic and managerial efforts towards successful environmental sustainability. Remote sensing technology has long been used to collect data for mapping, categorising, and monitoring the landscape (ecological and man-made infrastructure), as well as to support effective planning. Some of these endeavours have often relied on high-resolution spectral and temporal data collected using remote sensing and/or field or laboratory measurements. Rapid technological innovations in the acquisition of spectral data have improved the description of landscape characteristics, particularly when employing high-resolution satellite imagery [1,2]. Among these applications are the detailed monitoring of ecosystems that requires differentiation of vegetation types or communities, the detection of plant stress, and the mapping of the biochemical composition of vegetative material. A wide range of field and/or laboratory instruments, including hyperspectral field spectrometers such as Analytical Spectral Devices (ASD), measure reflectance across the major parts of the electromagnetic spectrum (EMS), including the visible (VIS), near-infrared (NIR), and short-wave infrared (SWIR) regions. Sparsity is widespread in high-dimensional variable domains because it is impractical to acquire adequate sample sizes. In this work, the use of ensemble classification algorithms based on random forests and gradient-boosting machines is investigated to discriminate between tree species based on temporal hyperspectral data. Hyperspectral measurements suffer from the curse of dimensionality, since they collect reflectance over hundreds of narrow bands to form a continuous spectrum. Reflectance is thus recorded in hundreds of bands (variables) for each leaf measurement, making it extremely difficult and costly to collect a sample size large enough to compensate for such dimensionality. When the number of variables p exceeds the number of observations n (as is the case here), the available data become sparse, and most classifiers, particularly those in the classical statistics domain, become inefficient and may fail to perform the mathematical calculations needed to explore the large search space of the high-dimensional model [3]. There are various classifiers that can handle high-dimensional search spaces for classification and discriminant analysis, including kernel-based approaches such as the Support Vector Machine (SVM), ensemble methods such as random forests, boosting models, and neural networks. These techniques offer tools for modelling and analysing complex data sets and are largely based on supervised and unsupervised learning and prediction modelling [4].
We use ensemble classification algorithms to distinguish different tree species using high-dimensional hyperspectral data (with highly correlated bands at certain portions of the EMS) with a temporal dimension. Ensemble learning approaches employ a variety of classification techniques, including (1) fundamental learning methods such as decision trees; (2) bagging, which involves averaging over an ensemble of decision trees; (3) randomisation, which includes bootstrap resampling of observations and variables; and (4) sequential development of decision trees, also known as boosting [5]. These techniques are notable for their ability to discover relevant features even in the presence of noise and are useful when dealing with high-dimensional spaces. They are also effective in situations involving small sample sizes, nonlinear relationships between features and responses, and complex interactions among features [6]. As a result, they have been applied in fields as diverse as bioinformatics, cheminformatics and ecology [7]. Furthermore, these methods, particularly random forests, have been applied to regression and classification problems involving large amounts of data in fields such as medicine, agriculture, remote sensing [8], astronomy, finance, online learning and text mining [9,10,11,12]. Ensemble approaches, such as random forest and gradient boosting, combine several techniques within statistical and machine learning frameworks to increase the performance of regression or classification models.
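As a rough illustration of this set-up (not the study's actual pipeline), the sketch below fits a random forest and a gradient-boosting machine to a wide matrix in which the number of spectral bands far exceeds the number of leaf samples; scikit-learn's RandomForestClassifier and GradientBoostingClassifier stand in for the RF and GBM implementations, and all data shapes and species labels are hypothetical.

```python
# A minimal sketch of the ensemble set-up: fitting a random forest and a
# gradient-boosting machine to a wide (p >> n) spectral matrix.
# The data shapes and species labels below are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_leaves, n_bands = 120, 500            # far more bands (variables) than leaf samples
X = rng.random((n_leaves, n_bands))     # reflectance per band
y = rng.integers(0, 8, size=n_leaves)   # labels for eight tree species

rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0).fit(X, y)
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)

print("RF out-of-bag accuracy:", round(rf.oob_score_, 3))
print("GBM cross-validated accuracy:", round(cross_val_score(gbm, X, y, cv=3).mean(), 3))
```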
It can be difficult to distinguish or categorise spectrally similar objects measured at a single point in time, especially if these objects (e.g., plants) exhibit changes over time. The dynamic spectral characteristics of objects such as trees may therefore be useful in enhancing the separability of individual trees, depending on environmental influences such as seasonal fluctuations in weather or climatic conditions. To improve tree species separation, we used time-induced variations in tree leaf spectra. We are particularly interested in the use of ensemble learning algorithms to characterise tree leaves measured using hyperspectral sensors (which record reflectance over hundreds of variables) at different times (covering different growing seasons). Although we assume that the detailed information in the temporal-spectral measurements will be useful in detecting any small variations that could be used to differentiate between tree species, the high-dimensional search space presented by these measurements poses challenges that should be explored, particularly from the perspective of statistical learning. As a result, a number of research problems arise in this work, from both the statistical and the ecological and remote sensing perspectives. Statistical issues include whether temporal spectral signatures influence the separation of the relevant tree species (i.e., how prediction and validation accuracies vary and whether such variations are statistically significant) and which spectral signatures (or parts of the electromagnetic spectrum) influence that separability. From a technological (sensor) application perspective, it is important to determine which of the measurement times provides the best discrimination.
In summary, the purpose of this study is to (1) identify the optimal period for distinguishing tree species, (2) improve species separation by leveraging the effect of measuring trees at different periods, and (3) determine the major drivers (parts of the EMS or spectral wavebands) influencing changes and discriminability of the relevant trees. As a result, we hypothesise that incorporating time-related changes may enhance the discrimination between similar objects, and we anticipate variation in classification errors over time periods.
4. Results
The results of the classification derived from the RF and GBM models, including prediction errors and important variables for species discrimination at each measurement time, are presented in this section. The average classification errors generated from these models across the measurement periods (Time 1 through to Time 21, covering different seasons) are shown in Figure 2, with 95% confidence bands as measures of the uncertainty associated with these errors. In terms of the findings from RF, this model provided an average out-of-bag error of roughly 13.5%, with time-specific average errors ranging from 1.4% to 32%, indicating changing patterns in leaf properties during the interannual growing season of the relevant tree species.
Further detail provided in the right-hand panel of Figure 2 generally illustrates an increasing pattern of classification inaccuracies from winter through to autumn, with some fluctuations within and between seasons. It is also noticeable from this figure that the measurements gathered during the winter months (June to August) produced relatively lower classification errors (ranging from approximately 1.4% to 9%) compared to other seasons. The highest classification inaccuracies (exceeding a 30% error rate) were obtained from measurements collected during one of the spring months (specifically Time 8, 28 September), with the highest possible error approaching 50% according to the upper confidence limit.
Regarding the results of GBM, the classification inaccuracies were generally lower (an average classification error of 5.6%) than those of the random forest model. At a detailed level, the average classification error at each measurement time ranged from a minimum of about 0.8%, observed at the beginning of the spring season, to a maximum of about 10.5%, observed for measurements gathered in autumn. However, the temporal pattern of the GBM classification errors was not as variable as that of the RF errors, since larger error fluctuations were more pronounced in the random forest model.
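The per-time average errors and 95% confidence bands in Figure 2 can be summarised along the following lines; the error values below are invented, and a normal-approximation interval is assumed, which may differ from the exact procedure used to draw the bands.

```python
import numpy as np

# Hypothetical classification errors (%) for one measurement time,
# e.g. from repeated model fits or cross-validation repeats.
errors = np.array([12.1, 14.3, 13.0, 15.2, 12.8, 13.9])

mean_err = errors.mean()
# Normal-approximation 95% confidence band around the mean error.
half_width = 1.96 * errors.std(ddof=1) / np.sqrt(len(errors))
print(f"mean error: {mean_err:.1f}%, 95% CI: "
      f"[{mean_err - half_width:.1f}%, {mean_err + half_width:.1f}%]")
```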
Having observed temporal variability in the pattern of classification errors resulting from both models, it was important to examine whether there is enough statistical evidence to suggest that the average difference in classification errors is due to differences in the times at which these hyperspectral leaf measurements were gathered. In addition, we quantified the amount of variation that time accounts for in explaining the observed variations.
A statistical analysis was conducted in which a generalised linear model was used to determine whether the average classification error varied significantly with measurement time. To test whether there were significant changes in the mean error over time, we used the least significant difference (LSD) procedure at a chosen alpha (α) level of significance. The model suggested that at least one of the time points is statistically different with respect to the classification errors obtained from the random forest model, and that measurement time accounts for approximately 46% of the variation in these errors. A multiple comparison of the effect of time on classification errors is shown in Figure 3, where statistically distinct pairs of times are distinguished from non-distinct pairs by the colour of the diagonal lines. In this figure, the mean classification error for one time is shown along the x-axis, with the mean error of the other time along the y-axis and a dot at their intersection. The identity line represents the equality of means, so that if a vector does not cross this line, we can conclude that the mean errors are significantly different between the relevant time periods; otherwise, if the line crosses the identity line, the means are similar. Red vectors are used to identify significant differences between time periods, whereas blue lines indicate pairs of times with similar average classification errors. It is most noticeable in
Figure 3 that most error differences occurred between
Time 8 (with the largest mean error of about 32%, representing a measurement period in September) and the rest of the measurement periods. Other pairs with distinct differences mostly involve times that are farther apart. For example, the earlier time periods, which included measurements gathered between June and July, consistently differ (in terms of average classification inaccuracies) from the later measurements taken in January and May of the following year. Regarding the variation of errors from the boosted model, the analysis indicates that measurement time has an effect on the variation of classification errors and accounts for nearly 21% of these differences. Generally, differences in the temporal pattern of these errors are not as pronounced as those obtained from the random forest model.
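To make the testing procedure concrete, the sketch below reproduces this kind of analysis on hypothetical data: a one-way model of classification error on measurement time, an overall F-test, the proportion of variation explained by time (eta-squared), and unadjusted pairwise comparisons in the spirit of Fisher's LSD. The table layout, column names, and error values are assumptions, and the exact generalised linear model used in the study may differ.

```python
# Hypothetical reconstruction of the error-vs-time analysis: one-way model of
# classification error on measurement time, overall F-test, proportion of
# variation explained by time (eta-squared), and unadjusted pairwise tests.
from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
times = [f"T{t}" for t in range(1, 22)]
df = pd.DataFrame({
    "time": np.repeat(times, 10),
    "error": rng.normal(loc=np.repeat(np.linspace(2, 30, 21), 10), scale=3),
})

model = ols("error ~ C(time)", data=df).fit()
anova = sm.stats.anova_lm(model, typ=2)
eta_sq = anova.loc["C(time)", "sum_sq"] / anova["sum_sq"].sum()
print(anova)
print(f"proportion of error variation explained by time: {eta_sq:.2f}")

# Unadjusted pairwise t-tests (read after a significant overall F-test).
sig_pairs = [
    (a, b) for a, b in combinations(times, 2)
    if stats.ttest_ind(df.loc[df.time == a, "error"],
                       df.loc[df.time == b, "error"]).pvalue < 0.05
]
print(f"{len(sig_pairs)} time pairs differ significantly at the chosen alpha level")
```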
Figure 4 shows pairs of time periods that are statistically different with respect to their average prediction errors. From this model, the mean error from
Time 21 (from end-of-May measurements) significantly differs from the earlier measurements (taken from the start of data collection in June to the end of September, as well as in February and March).
Time 12 errors, representing classification errors from measurements gathered towards the end of November, are also distinct from the average errors of measurements gathered from June through to the end of September. The variation of errors over time is therefore more evident in the random forest model than in the GBM model.
Figure 5 provides a graphical view of the classification inaccuracy patterns from the random forest model at the various measurement time periods for each of the eight species. The plotted points mark the measurement times at which the spectral reflectance of each species was gathered, placed along the distribution of class prediction errors to highlight the time periods that influence classification inaccuracies. These results generally show that discrimination between species is not constant in time and that some species appear to be more accurately distinguishable at certain times than others.
Tree species that appeared to be easily separated by the random forest model include Celtis africana, Englerophytum magalis, Brachylaena rotundata, and Strychnos pungens, as their classification inaccuracies were largely below 30%. Meanwhile, species such as Combretum molle and Lannea discolor had at least one time point at which their classification was no better than random allocation, with classification errors exceeding 50% and reaching 57% for CM and LD at Time 8 (30 September) and Time 21 (25 May), respectively.
The smallest classification inaccuracies occurred in the June to August measurements for all tree species. The pattern of inaccuracies observed at a species level corresponds to the generic patterns discussed above. However, additional information suggested that even though the highest average classification error came from the random forest model, species-level classification errors were subject to larger variations in classification rates. These larger errors, with seemingly outlier properties, influenced the average errors in
Figure 2, particularly for
Time 8, which stands out as having the largest inaccuracies.
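As an aside on how such species-level (class-wise) error rates are typically obtained, the sketch below derives them from a confusion matrix: the per-species error is the share of that species' samples assigned to any other class. The matrix, species codes, and counts are purely illustrative and are not taken from the study.

```python
import numpy as np

# Illustrative 3-species confusion matrix (rows = true species, columns = predicted);
# the species codes and counts are hypothetical, not taken from the study.
labels = ["CA", "CM", "LD"]
cm = np.array([[18, 1, 1],
               [3, 12, 5],
               [2, 6, 12]])

# Species-level classification error: share of a species' samples assigned elsewhere.
per_class_error = 1 - np.diag(cm) / cm.sum(axis=1)
for species, err in zip(labels, per_class_error):
    print(f"{species}: {err:.1%} classification error")
```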
Species-level classification inaccuracies from the GBM model are shown in
Figure 6, and the variation of these errors is not very different from that of the random forest model, even though there is a slight reduction in the magnitude of the errors obtained from GBM. The species highlighted by the random forest model as having the largest classification errors at certain time periods maintained that pattern in the GBM results as well. The major difference, however, is that GBM produced its larger errors (those greater than 20%) at different measurement periods. This is with the exception of
Time 21, which appears in both models as one of the highest error time periods where species such as LD and RC were not easily distinguishable. Another difference, for instance, is that larger errors from GBM were associated with clusters of times including periods in May (
times 20 and 21), April (
Time 18) and November (
Time 11 and
12). Another important element of classification involves performing a diagnostic assessment of the model performance by summarising the resulting confusion matrix (matrices) rather than observing the accuracy or error rate on a generic scale.
Figure 7 compares the classifiers’ performance based on the micro AUC, demonstrating that both models provided an adequate categorisation of the tree species, with GBM offering slightly superior discriminatory ability to RF for the majority of the periods in question. The pattern of variability in accuracy through time appears to be consistent with the insights revealed in the preceding results.
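For reference, a micro-averaged AUC for a multiclass problem can be computed along the following lines with scikit-learn, by binarising the labels (one-vs-rest) and pooling all decisions; whether the study's micro AUC was obtained exactly this way is an assumption, and the labels and probabilities below are placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Placeholder true labels and predicted class probabilities for eight species.
rng = np.random.default_rng(2)
classes = np.arange(8)
y_true = rng.integers(0, 8, size=40)
y_proba = rng.dirichlet(np.ones(8), size=40)   # each row sums to 1

# Micro-averaged AUC: binarise the labels (one-vs-rest) and pool all decisions.
micro_auc = roc_auc_score(label_binarize(y_true, classes=classes), y_proba, average="micro")
print(f"micro AUC: {micro_auc:.3f}")
```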
Table 2 provides a summary of the model accuracy statistics, including the accuracies and their confidence intervals (CI) as well as the Kappa coefficient. The Kappa coefficient is sometimes viewed as a more robust measure since it also accounts for the likelihood of agreement between the true and predicted classifications occurring by mere chance. As can be observed from Table 2, the kappa values are slightly lower than the corresponding accuracies because kappa adjusts for chance agreement, whereas accuracy is a simple percentage measure. Since there are no universally agreed-upon thresholds for levels of agreement, different areas of research assign various thresholds to indicate poor, good, or exceptional discrimination between objects. The findings indicate that the accuracy in categorising tree species varies over time, demonstrating that time could influence the level of discrimination between trees because of changes in spectral properties through time.
Time periods with relatively lower classification accuracies and kappa values are marked in yellow and blue in Table 2 to highlight those times when the models achieved only moderate classification performance. The confidence intervals for the overall accuracies also show the degree of uncertainty surrounding the classification, with certain time periods having wider confidence limits, suggesting more fluctuation around the reported values.
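The kappa statistic reported in Table 2 follows the usual chance-corrected definition, kappa = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. A minimal sketch, using an illustrative confusion matrix rather than the study's, shows why kappa sits below raw accuracy.

```python
import numpy as np

def cohen_kappa(cm: np.ndarray) -> float:
    """Cohen's kappa from a confusion matrix: chance-corrected agreement."""
    n = cm.sum()
    p_observed = np.trace(cm) / n
    # Agreement expected by chance: product of row and column marginals, summed.
    p_expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    return (p_observed - p_expected) / (1 - p_expected)

# Illustrative two-class confusion matrix (not from Table 2).
cm = np.array([[40, 3],
               [4, 33]])
print(f"overall accuracy: {np.trace(cm) / cm.sum():.3f}")
print(f"kappa coefficient: {cohen_kappa(cm):.3f}")
```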
Figure 8 presents the important variables in the prediction of the eight species for the initial measurement periods (winter months), with the top 25 variables (wavelength bands denoted by the prefix ’B’) identified using an unbiased feature selection for the random forest. Time 1, which corresponds to the time period (10 June) when measurements of the relevant species’ leaf reflectance properties were first collected, reveals that the most relevant spectral signatures for species discrimination were largely from the NIR part of the electromagnetic spectrum, together with a few red-edge position bands located between 670 nm and 780 nm, which are closely associated with the pigment status and the physical and chemical properties of vegetation [27].
Time 2 consists of bands from regions similar to those of Time 1 but also contains a few more signatures from the VIS region, at 401 nm and 669 nm. The VIS bands are known to exhibit strong chlorophyll absorption and are sensitive to photosynthetic pigments and characteristics, including biochemicals such as carotenoids (responsible for the orange pigment), chlorophyll (green pigment) and xanthophyll (yellow pigment).
The classification of species in
Time 3 appears to be largely driven by bands from the SWIR part of the spectrum, and these bands typically provide information about leaf structure, proteins, and nutrients.
Time 4, meanwhile, shows a slightly different profile of signatures that played a significant role in species discrimination. Additional graphs are provided in
Appendix A and
Appendix B, demonstrating the changing spectral attributes which may be useful in the characterisation of tree species based on leaf properties or temporal condition.
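As a rough illustration of how such ranked waveband lists can be produced and grouped by spectral region, the sketch below uses permutation importance on a random forest; this is only one less impurity-biased way to rank bands, and it is an assumption that it mirrors the unbiased feature selection used here. The band names (prefix 'B'), wavelength grid, and region boundaries are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Hypothetical reflectance matrix whose columns are wavebands B400, B410, ...
rng = np.random.default_rng(3)
wavelengths = np.arange(400, 2500, 10)
X = rng.random((150, wavelengths.size))
y = rng.integers(0, 8, size=150)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Permutation importance: one less impurity-biased way to rank wavebands.
imp = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
top25 = np.argsort(imp.importances_mean)[::-1][:25]

def region(nm: int) -> str:
    # Coarse EMS regions, assumed here purely for labelling the output.
    if nm < 700:
        return "VIS"
    if nm < 1300:
        return "NIR"
    return "SWIR"

for idx in top25:
    print(f"B{wavelengths[idx]} ({region(wavelengths[idx])})")
```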
Figure 9 depicts the bands identified as the most important by the GBM model in relevant winter weeks. Regarding
Time 1, GBM mostly selected the signatures from the NIR range of the EMS, especially those in the red-edge region, and very few SWIR signatures, as the most important bands in discriminating between species. For
Time 2, the strongest signatures identified by GBM were predominantly those from the VIS (with mainly blue, a few red, and green bands), NIR and the SWIR regions.
Time 3 had a number of SWIR bands, a mix of red, blue, and green bands from VIS, and only one NIR signature, just as in the previous two time points. The important bands in separating species from
Time 4 include those from SWIR, VIS (with no yellow bands included), and a few NIR bands.
It is important to note from the analysis of important variables that, while the random forest and GBM produced somewhat different lists of variables at the various measurement times, these models identified the variables largely from the same regions of the EMS. In general, this analysis shows that wavelength bands from different regions of the electromagnetic spectrum contribute differently to the discrimination of the species, depending on the time at which the measurements were gathered.
5. Discussion of the Results
This investigation is part of a larger project that aims to improve separability between similar tree species using hyperspectral measurements by incorporating the variability in leaf characteristics that occurs over time due to seasonal changes during the annual growing cycle of plants. First, because of the high spectral dimensionality, ensemble learning techniques involving random forest and GBM were used to distinguish between tree species at different times. Second, it was important, from an ecological perspective, to identify the time period at which the separability between the relevant tree species (from a leaf-level perspective) is most favourable. The discussion of the results is anchored around these two main aspects.
5.1. Comparative Assessment of Class Prediction Accuracy between Random Forest and GBM
Previous studies that have compared the prediction accuracy of random forest and gradient boosting methods, particularly in remote sensing applications, have not reached the same conclusions regarding their performance. In a recent review of the use of random forests in remote sensing [8], some of the included studies made a comparative assessment of the classification accuracy of random forests and boosting ensemble techniques such as adaptive boosting and concluded that random forests provided better classification results than the boosting ensembles. Meanwhile, specific studies in the same review [8] found that these two sets of methods provided similar classification results, with RF gaining favour due to its stability and less computationally intensive requirements [28,29]. In an investigation by [29], however, slightly improved classification results were obtained from specific boosting techniques (AdaBoost tree and AdaBoost random) compared to random forest and bagging tree methods. Another recent study [30], which applied extreme gradient boosting (XGBoost), random forest, and SVM for object-based classification of relevant Land Use-Land Cover (LULC) types, found that XGBoost outperformed random forest and SVM.
From the results obtained in our study, the stochastic gradient boosting technique outperformed the random forest with respect to classification accuracy across the time intervals. It is important to note, however, that the random forest accounted for larger differences between species at the various measurement periods. This could be explained by a known phenomenon whereby random forests are sensitive to imbalanced training samples and thus favour the most represented classes. In our case, class imbalances occurred particularly because deciduous trees had fewer samples during leaf-shedding times. Moreover, the random forest appears to have been more sensitive to intra-species variability. For example, the random forest achieved its largest average classification error at Time 8 (consisting of measurements collected at the end of September), where larger errors were generated from Combretum molle reflectance measurements with only a few newly emerging leaves. Generally, species with fewer measurements tended to have the highest average classification inaccuracies; these were mostly deciduous tree species with fewer measurements at the relevant time periods. Therefore, the random forest maximised the degree of difference between measurements collected at different time points, accounting for double the amount of temporal variability compared to the boosted ensemble.
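To make the imbalance point concrete, the sketch below shows one common way of reducing a random forest's bias towards majority classes: re-weighting classes within each bootstrap sample. This is a hypothetical mitigation offered for illustration, not a step taken in the study, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic imbalanced example: the third "species" has far fewer samples,
# mimicking deciduous trees with few leaves at certain measurement times.
rng = np.random.default_rng(4)
X = rng.random((110, 300))
y = np.array([0] * 50 + [1] * 50 + [2] * 10)

# One possible mitigation (not a step taken in the study): re-weight classes
# inversely to their frequency within each bootstrap sample.
rf_balanced = RandomForestClassifier(
    n_estimators=300, class_weight="balanced_subsample", oob_score=True, random_state=0
).fit(X, y)
print("OOB accuracy with balanced class weights:", round(rf_balanced.oob_score_, 3))
```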
5.2. Important Variables
Although opportunities exist to reduce the high dimensionality of the wavebands from hyperspectral measurements without losing much useful information, high correlations between adjacent bands make exact band selection challenging. The intention in identifying bands with high discriminatory potential was therefore not to pinpoint exact bands, but rather to identify prominent regions of the EMS and to assess their contribution based on known reflectance properties.
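The instability of exact band selection follows directly from how strongly neighbouring bands co-vary; the short sketch below illustrates this by simulating smooth spectra and measuring the correlation between adjacent bands (the data are simulated, not the study's measurements).

```python
import numpy as np

# Simulated smooth leaf spectra: cumulative sums of small increments give the
# strong correlation between neighbouring bands typical of hyperspectral data.
rng = np.random.default_rng(5)
spectra = np.cumsum(rng.normal(size=(100, 500)), axis=1)

adjacent_corr = np.array([
    np.corrcoef(spectra[:, j], spectra[:, j + 1])[0, 1] for j in range(499)
])
print("median correlation between adjacent bands:", round(np.median(adjacent_corr), 3))
```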
Other phenology-based applications have used phenological analysis to understand periodic patterns of change in vegetation characteristics and the extent to which these are altered by seasonal or climatic variations, mostly relying on remote sensing, in situ, and laboratory data to examine such changes. A study by [
16] is among a few studies in which phenological events were studied to establish the potential to improve the classification between tree species. Specifically, this study used laboratory measurements gathered from two simulated stages, including flowering and nonflowering stages, and established that the classification between species was enhanced during the flowering stage (measurements gathered in July), with prominent differences from the VIS part of the electromagnetic spectrum. In a study by [
31], where leaf properties were examined based on laboratory measurements, the authors discovered that signatures in the visible range explained variations in the relevant properties. Everitt et al. [
32] studied the impact of flowering on VIS and NIR spectra of
Drummond goldenweed species and found that this species was only distinguishable from others on the basis of VIS bands during the flowering stage. During the non-flowering stage, however, the goldenweed species was separable from the other relevant species based on its NIR bands. Hence, from these studies we can conclude that even though the VIS region has a greater influence on separation at certain times, other regions make a significant contribution depending on the measurement time and the prevailing characteristics.
Generally, our study established that leaf phenology variations and the potential to spectrally distinguish the target tree species were driven by different spectral characteristics at different time periods. Different sets of VIS bands were consistently identified as being among the important bands for discriminating between tree species at different times. There was, however, a strong combination of NIR and/or SWIR bands along with the VIS wavebands, which provided better discriminatory ability. This indicates that photosynthetic characteristics predominantly drove the prediction of species, while SWIR and NIR, which characterise leaf structure, proteins and starches, age, leaf health and nutrients, also played a significant discriminatory role. Since two different models were applied in this study, it is important to note that they did not always select similar spectral characteristics among the top 25 important variables. It is, however, worth noting that similar sets of bands, especially those in the VIS range, were identified by both models, while most inconsistencies concerned the selection of NIR and SWIR wavebands. In view of the changing spectral properties over time, this study suggests that it may be limiting to use the same set of bands (drivers of separability) for prediction at other times, especially when using band-level information.
5.3. Best Time to Distinguish between Species
Since it can be relatively costly to acquire high-resolution satellite imagery, the optimal time for acquiring images has been studied for monitoring and managing ecological or agricultural sites with various imaging technologies. Some of these investigations have been conducted on multiple temporal images to capture the variability across the growing season and to determine the best time for observing and identifying specific characteristics, as well as for classifying crop types, trees and grass species, as these have changing characteristics over time. Using aerial images, Lisein et al. (2015) [
33] were able to determine that spring and fall (end of leaf flushing) were the best times for species separation. Hill et al. (2010) [
34] combined temporal images to find that using a combination of 17 March, 16 July, and 27 October had the greatest overall classification accuracy, at 84 percent (green-up and full-leaf phases were optimum). In this study, which used
in situ temporal hyperspectral leaf measurements, we discovered that the best time for differentiating tree species was during the winter and spring seasons.