Combining Airborne Laser Scanning and Aerial Imagery Enhances Echo Classification for Invasive Conifer Detection

The spread of exotic conifers from commercial plantation forests has significant economic and ecological implications. Accurate methods for invasive conifer detection are required to enable monitoring and guide control. In this research, we combined spectral information from aerial imagery with data from airborne laser scanning (ALS) to develop methods to identify invasive conifers using remotely-sensed data. We examined the effect of ALS pulse density and the height threshold of the training dataset on classification accuracy. The results showed that adding spectral values to the ALS metrics/variables in the training dataset led to significant increases in classification accuracy. The most accurate models (kappa range of 0.773–0.837) had either four or five explanatory variables, including ALS elevation, the near-infrared band and different combinations of ALS intensity and red and green bands. The best models were found to be relatively invariant to changes in pulse density (1–21 pls/m²) or the height threshold (0–2 m) used for the inclusion of data in the training dataset. This research has extended and improved the methods for scattered single tree detection and offered valuable insight into campaign settings for the monitoring of invasive conifers (tree weeds) using remote sensing approaches.


Introduction
Exotic conifers are the foundation of the plantation forest industry in many Southern Hemisphere countries, providing significant economic and social benefits. However, a number of exotic conifer species have become invasive and, under certain environmental settings, propagate beyond the plantation boundary and are now serious tree weeds [1–3]. In New Zealand, invasive conifers, often referred to as 'wilding conifers', are predominantly invading indigenous and semi-native grass and shrublands across large areas of the South and North Island [3,4]. Recent estimates place the total area affected in New Zealand at 1.7 M ha, with the rate of spread estimated at 5%–6% per annum [5]. Invasive conifers are estimated to occupy, with highly variable densities (from less than one tree per hectare to full cover), an area equivalent to the national plantation forest estate [5]. The costs from lost pasture land alone are estimated at between $88 and $221 million [6]. A range of chemical and physical control methods can be deployed to control invasive conifers [7], but to be deployed cost-effectively these depend on successful detection of individuals and characterisation of the infestation level. Extensive closed canopy stands of mature invasive conifers are relatively easy to detect across the landscape, although smaller groups and isolated individuals of various sizes, particularly juvenile and stunted trees, are more problematic. The control of early invasion stages, characterised by smaller, juvenile and more scattered trees, is the most cost-effective way to prevent further expansion of conifer infestations [4,8]. The detection of such trees is critical for invasive conifer management, even though the detection of individuals at such an early stage of invasion is a complex and laborious task [9,10]. Importantly, effective control in these early stages of invasion, supported by a high detection rate of mature trees, can reduce the cost and intensity of later management efforts on a site as widely-scattered seed producers are eliminated [8,11]. Current methods of surveillance and monitoring rely on the detection of invasive conifers across often wide areas, carried out as helicopter-based surveys by skilled observers [12], ground surveys in smaller areas or a combination of both [13], often with varying levels of detection success. These approaches are expensive, even if combined with control operations at the same time (search and destroy), and rely heavily on the observers' ability to correctly and rapidly identify individual invasive conifers of variable size occurring at various densities, often across multiple vegetation types.
Remote sensing has been widely used to detect and monitor invasive species across natural environments ranging from arid areas [14] and estuaries [15] to dense tropical rain forests [16] and coastal scrub communities [17]. The success of remote sensing techniques has been noted to vary according to the structural and phenological traits of the invasive species in relation to the invaded habitat [18]. Spectral and other properties from imagery alone can be used to successfully identify invasive species [18].
Airborne laser scanning (ALS) offers the appealing advantage of providing precise elevation and structural data from vegetation returns, which makes it well suited to the detection of taller isolated trees in otherwise short-stature vegetation types [19,20], and the combination of height and structural information derived from LiDAR data with imagery has been shown to improve the identification of invasive species [21,22]. The literature on identifying invasive conifers in New Zealand's unique environment using remote sensing techniques is not well developed, and we are unaware of efforts to use ALS for invasive conifer detection in this context. However, relevant examples of this approach may be found in efforts to detect and monitor shifts in the tree line in boreal ecotones affected by climate change [23,24]. Height information from high-density ALS (7.7 pls/m²) has been successfully used to identify isolated small-stature trees in tundra environments, with nearly 91% of trees >1 m successfully detected according to elevation of echoes above a digital terrain model (DTM) generated from the same data [19]. Detection of smaller trees was more difficult, with trees <1 m returning discernible positive elevation values 5%–73% of the time depending on species and DTM properties. The decrease in accuracy could be partly attributed to the inherent vertical error in ALS elevation values, the fact that smaller individuals are less likely to be sampled or generate sufficient return energy and an increase in terrain features, such as rocks with a height close to that of the target trees [19]. The use of additional ALS-derived variables, such as intensity and topographic features, has been proposed as a means of improving the classification of trees and returns from other sources, especially at lower height thresholds [24,25]. In one study, the inclusion of intensity and slope with echo elevation allowed tree and non-tree echoes to be distinguished with an accuracy of 93%, and detection was high for nearly all tested models that included elevation and intensity [25]. However, other studies have found intensity data to be of little use in identifying ground vegetation [26]. This was at least in part due to the uncalibrated and sensor-dependent nature of intensity information from ALS data, which can reduce the applicability and utility of these data across surveys [26]. Echo-based classification requires substantial computational power to evaluate datasets over large areas that invariably contain millions of records. Stumberg et al. [25] explored the use of an unsupervised classification approach using a raster-based algorithm to identify pixels containing trees. Detection of trees using this approach was generally good, with detection rates for larger trees (>1 m) ranging from 36% to 73%. Overall, larger trees had an increased probability of detection and resulted in reduced errors of commission.
Research on the effect of ALS pulse density would ideally use data acquired from campaigns with varying altitude, speed or pulse frequency. Repeated campaigns would capture both changes in final pulse density and related impacts, such as changes arising from beam divergence or altered energy per pulse [27–30]. Unfortunately, this is prohibitively expensive in most cases, and so researchers have developed methods to thin ALS data and simulate adjusted flight patterns. Numerous studies have investigated the effect of pulse density on forest properties estimated from ALS data [30–33]. These studies have encompassed a wide range of forest types and data thinning methods [29,34]. However, the impact of pulse density on the detection of small, pioneer trees has not been well studied through simulation or campaign selection. Naesset and Nelson [19] achieved good detection of isolated trees with ALS data containing 7.7 pls/m². At this density, nearly every tree over 1 m was sampled by at least one pulse. Indeed, several other studies have used similar pulse densities in the range of 6–8.5 pls/m² [20,25,35] to achieve good rates of detection for similarly-sized trees (>1 m). Detection of pioneer trees at the boundary of the Arctic-boreal regions has been achieved from surveys with densities as low as 0.25 pls/m²; however, in this case, the trees had a modal height of 6.6 m [23]. Naesset and Nelson [19] used data on detection success and tree crown diameter to estimate the size of crowns required for detection at different pulse densities. Their estimate suggested that at 1 pls/m², the crown diameter would need to be 2.8–3.3 m to ensure detection. Overall, work in the boreal ecotone has demonstrated that individual pioneer trees in relatively open areas can be accurately detected using LiDAR, potentially over very large areas. In general, trees smaller than 1 m are problematic to detect without increasing the rate of false detections and other sources of error [19,20,25].
Remote sensing-based approaches for detecting invasive trees that rely principally on the height of ALS echoes are likely to be well suited to many low-stature vegetation types that are susceptible to conifer infestations in New Zealand. Particularly in short-stature grasslands, problematic invasive conifers are able to establish more successfully than most native tree species [8], simplifying the task of detection and avoiding a high commission error. Other vegetation types highly susceptible to conifer invasions in New Zealand, such as indigenous shrublands or tall tussock grasslands [36] mixed with complex terrain, pose a much greater challenge to detection efforts.
Our study area contained numerous vegetation types and, in some areas, the dominant indigenous vegetation formed closed canopy stands up to 4 m in height. A successful method for classifying invasive conifers requires the capacity to differentiate between echoes from invasive conifers and other vegetation. The ALS data alone may not contain sufficient information for this purpose. Fortunately, the spectral properties of invasive conifers are quite different to those of the other surrounding tree vegetation in this environment. This means that spectral information from aerial imagery may provide a practical means of separating invasive conifers from other vegetation types. Previous studies, working at a coarser resolution, have found that fusing structural information from ALS data with spectral data from satellite imagery provides a useful method for vegetation classification [37,38].
Controlling the spread of invasive conifers is critical to the protection of New Zealand's natural heritage, threatened ecosystems and ecosystem services and to maintaining the licence to operate for plantation forest managers in the face of increasing environmental scrutiny. There is a need to develop feasible and effective detection methods to understand and monitor the spread of invasive conifers and to guide management and control efforts in infested areas. In this study, we attempt to develop a method for invasive conifer detection using an extensive dataset that included ALS and spectral data collected from an area dominated by indigenous and semi-native grass and shrublands in New Zealand and a field dataset that sampled 825 solitary conifers of various sizes.
Using this dataset, the objectives of this research were to (i) compare the accuracy of detection models developed using various combinations of ALS data (elevation, intensity) and aerially-acquired spectral data and (ii) determine the sensitivity of classification accuracy in these models to the height threshold used for inclusion and the density of the ALS data.

Study Site
The study site was located in the vicinity of Geraldine forest in the Canterbury region in the South Island of New Zealand (Figure 1). Geraldine forest and the adjacent study site are positioned in the foothills of the Southern Alps. The topography is characterised by steep and broken terrain with elevations ranging from 203 to 780 m above sea level. Silty-loam soils dominate, and the climate is temperate with a mean annual temperature of 8.6 °C and an annual rainfall of 864 mm. The dominant production tree species planted at Geraldine forest are Pinus radiata D. Don (P. radiata) and Pseudotsuga menziesii (Mirb.) Franco (Ps. menz.). The study site is enclosed by plantation forest and therefore prone to high seed-rain from the adjacent plantations, resulting in a high presence of self-established conifers of various ages and sizes. The dominant land covers are short-tussock grasslands dominated by Festuca spp. and Poa spp. and patches of indigenous shrub dominated by manuka (Leptospermum scoparium) or ferns dominated by bracken (Pteridium esculentum (G. Forst.) Cockayne). One distinctive area of the study site was dominated by the invasive shrub gorse (Ulex europaeus L.).

Field Data
A field survey was carried out between 16 May and 29 June 2016 to assess the severity of the conifer infestation across the study area. In the first instance, a grid with a randomised start point and orientation was used to locate 46 field plots throughout the study area, providing a sample with good spatial coverage of the study area. Five of these plots were abandoned because the terrain was too steep for them to be safely measured, leaving a total of 41 systematically established plots. An additional 27 plots were selectively placed by the field crews in locations representing areas with light, moderate and dense cover of invasive conifers. This provided a dataset across all major vegetation types, enabling us to characterise their structural and spectral properties using the remotely-sensed data.
The sampling unit was a slope-corrected 0.04 ha circular bounded field plot with the plot centres fixed using a Trimble Geo7X GNSS (Trimble Navigation Ltd., Sunnyvale, CA, USA). The accuracy of the recorded plot centre positions was increased by differential correction using a local base station network maintained by Land Information New Zealand (LINZ). For all invasive conifers found within the field plots, the species, total height and diameter were recorded, with diameter measured at breast height (1.4 m) for trees and at ground level for saplings. Trees were defined as those individuals with a measurable diameter at breast height. The distance and bearing of each tree from the plot centre were also recorded.
In total, 825 invasive conifers were identified, located and measured within the 68 field plots. Ps. menz. was the dominant species, constituting 98.5% (813) of all invasive conifers. The mean height of this species was 1.72 m, and heights ranged from 0.05 to 12.90 m. A small number of P. muricata and P. radiata were found in the plots, with heights averaging 1.99 and 2.00 m, respectively (Table 1).

ALS Data
An ALS survey was completed over the study area on 13 and 14 June 2016 using a Riegl Q1560 two-channel scanner system with the settings shown in Table 2. A laser pulse rate of 330 kHz and a maximum scan angle of 14° off nadir were used. Flight planning ensured substantial overlap across the entire area of interest to remove the possibility of data voids. During field work, 99 ground control test points were obtained and compared to interpolated elevation values from the ALS data. The results indicated a mean difference in elevation of −0.004 m (SD 0.017 m) and RMS of 0.017 m. Initial ALS data processing, including tiling and classification, was carried out by the supplier using TerraScan (TerraSolid, Helsinki, Finland). Intensity values in the ALS data were delivered uncalibrated and ranged between 0 and 65,535 (median = 41,600).

ALS Data Thinning
Several methods for thinning ALS data are commonly employed depending on the objectives of the operation. Where the objective is to reduce data size, these methods focus on iterative removal of points while minimising the loss of accuracy on the target output, such as elevation [39]. Other methods randomly remove every n-th echo until a target density is achieved. However, this approach does not simulate a reduction in pulses per unit area as might occur with a change in acquisition settings because the regular scan pattern will not be replicated [40,41]. A custom algorithm was developed using Scientific Python [42] to better simulate changes in pulse density that may be expected from increasing flight altitude or reducing overlap in order to achieve higher spatial coverage at the cost of lower final pulse density. The algorithm removed all echoes originating from a pulse marked for removal, with the target density determining the regularity of pulse removal. The thinned datasets retained some of the inevitable variation in pulse density contained in the original survey while approximating the target mean pulse density.
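The pulse-level thinning idea can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function name `thin_by_pulse`, the per-echo pulse identifiers and the assumption that identifiers increase in acquisition order are all hypothetical choices made for illustration. The sketch keeps pulses at a regular interval set by the ratio of target to original density and drops every echo belonging to a removed pulse.

```python
from math import floor


def thin_by_pulse(pulse_ids, original_density, target_density):
    """Return one boolean per echo (True = keep).

    All echoes that share a pulse identifier are kept or dropped together.
    Assumes (hypothetically) that pulse identifiers sort in acquisition order.
    """
    if not 0 < target_density <= original_density:
        raise ValueError("target density must be positive and not exceed the original")
    keep_fraction = target_density / original_density

    # Rank each distinct pulse by its identifier.
    rank = {pid: i for i, pid in enumerate(sorted(set(pulse_ids)))}

    def keep(i):
        # A pulse is kept whenever the running quota floor(i * fraction)
        # steps up, which spaces the retained pulses regularly.
        return floor((i + 1) * keep_fraction) > floor(i * keep_fraction)

    return [keep(rank[pid]) for pid in pulse_ids]
```

Because removal operates on pulses rather than echoes, multi-return pulses stay intact in the thinned data, and local variation in the original pulse spacing carries through to the thinned dataset.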
The selected target pulse densities of 10, 5, 2 and 1 pls/m² represented a compromise between achieving regularly-spaced intervals and incorporating common minimum pulse densities specified for ALS surveys in New Zealand. The data thinning algorithm successfully produced datasets with average realised properties that were consistent with the target pulse densities (Table 3).

Aerial Imagery
Aerial photography was captured over Geraldine forest and surrounding areas on 17 March 2016 using a Vexcel digital UltraCamEagle (UCE) camera with specifications shown in Table 4. Imagery was captured with a ground sample distance (GSD) of 0.30 m from a flying height of 5770 m. All imagery was free from cloud and cloud shadow and had a minimum sun angle of +35°. Image processing was carried out by the supplier and included ortho-rectification, map projection, mosaicking and the removal of atmospheric and topographical effects. Imagery was processed to Level 3 and manually checked for colour correctness and even tonal balance across the project area.

Echo Classification
Echo elevation values were converted to local normalised elevation values using a DTM triangulated from the ground classified echoes. These data were used to produce a pit-free canopy height model (CHM) [34] for the area of interest with a 0.3 m resolution. This resolution was selected because it approximates the footprint size of the laser beam in this study. Echo classification was evaluated as a point cloud processing method to investigate the detection of invasive conifers in the study area. To form a training dataset for echo classification, the location of each invasive conifer within the field plots was used to estimate the two-dimensional canopy area associated with each tree. Tree locations and the high-resolution CHM were used to train an algorithm based on the rLiDAR package [43]. The CHM was loaded into R and used to delineate the approximate canopy areas using the field measured tree locations and the ForestCAS function of rLiDAR. ForestCAS includes a user-defined parameter that sets the percentage threshold of subject tree height at which pixels are excluded from a tree canopy. Initially, the default (0.3) for this parameter was used, and this was adjusted, where required, in an exploratory manner following visual inspection of the resultant canopy polygons, CHM and imagery until the results were deemed to be accurate. Echoes within the final canopy polygons were classified as invasive conifer echoes; those outside the invasive conifer canopy area, but within the field plots, were classified as non-invasive conifer echoes.
Using echo classification, we sought to develop a method for classifying individual returns from the ALS point cloud into those that were backscattered from an invasive conifer and those that were not. Developing a classification of this type may offer an efficient means of mapping invasive conifers across the landscape using ALS. The elevation of the return (Z) above the ground logically provides a useful variable for the classification of returns from trees and shrubby vegetation, as these have a value considerably greater than zero. However, the height of the return contains no useful information on the other properties of the target object. The intensity of the echo has previously been used to improve the classification of vegetation type and is more useful following range calibration [26]. Previous research has indicated that backscatter intensity is useful for differentiating between echoes originating in shrubby vegetation and those from other sources [44]. This suggests that, at the least, intensity can be useful for differentiating between vegetation and non-vegetation objects. Consequently, investigating the potential of backscatter intensity for classification of invasive conifer echoes is worthwhile. In this research, we examined the utility of uncalibrated ALS intensity values for improving echo classification.
We extracted spectral data from the high-resolution aerial imagery of the study area and combined these data with the ALS echoes. The data processing chain for the ALS data could not easily accommodate a fourth spectral band, and manual inclusion resulted in unwieldy computation times; therefore, we chose to drop the blue band. This was motivated by the fact that this band is sensitive to interference [45] and appeared to be less influential in previous work on spectral-based invasive species detection [18]. Each echo received the near-infrared, red and green values of the spatially coincident pixel from the orthophotographs. Our method relied only on the simple assignment of spectral data to ALS points. This approach lacks the ability to account for the geometric effects that prevent image pixels from being reliably tied to spatially coincident ALS returns [46,47]. However, the imagery available was captured for the purpose of creating orthomosaics, and the high degree of overlap and knowledge of sensor geometry required by more sophisticated approaches were not available to us [46]. Nonetheless, as much of the vegetation was fairly low and the accuracy of assignment would also be limited by pixel size, we judged the loss of precision to be acceptable.
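The simple assignment of pixel values to echoes can be sketched in Python as follows. This is an illustrative sketch only, not the processing chain used in the study: the function name `sample_pixel`, the north-up raster layout with an upper-left origin and the nested-list band representation are assumptions made here for clarity.

```python
from math import floor


def sample_pixel(x, y, x_min, y_max, pixel_size, bands):
    """Return the band values of the pixel containing the point (x, y).

    Assumes a north-up orthophoto whose upper-left corner is (x_min, y_max).
    `bands` is a sequence of 2-D grids (rows of values), for example
    (near-infrared, red, green). Returns None if the point lies outside
    the raster extent.
    """
    col = floor((x - x_min) / pixel_size)
    row = floor((y_max - y) / pixel_size)  # rows count downwards from the top
    n_rows = len(bands[0])
    n_cols = len(bands[0][0])
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        return None
    return tuple(band[row][col] for band in bands)
```

A lookup of this kind treats each echo as if it were viewed from directly above, which is why it cannot correct for relief displacement of tall objects in the imagery.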
The coloured ALS point cloud was also used to summarise the spectral and structural properties of invasive conifers and the other major vegetation types in the study area (Figure 1). Echoes originating from invasive conifers were used to characterise their properties. Each study plot was classified according to its dominant vegetation type as either grassland, shrubs, ferns or manuka. Echoes originating from invasive conifers were excluded, and the remaining echoes within each plot boundary were used to characterise the properties of the dominant vegetation types in that plot.

Random Forest
Random forest (RF) is an ensemble decision tree classifier that uses bootstrap aggregated sampling (bagging) to construct many individual decision trees, from which a final class assignment is determined [48]. RF is increasingly being applied to natural resource problems [49] and has previously been used to successfully model several plantation forest variables using remotely-sensed data [50–53]. The RF algorithm constructs decision trees using a bootstrap sample from the available training data, with the remaining data assigned as out-of-bag (OOB) samples. At each node, a random subset of predictor variables is tested to partition the observation data into increasingly homogeneous subsets. The node-splitting variable selected from the variable subset is that which results in the greatest increase in data purity (variance or Gini) between the data before and after the tree node split [54]. This process ceases when there are no further gains in purity. Predictions for continuous response variables are calculated by averaging across all decision trees, and predictions for categorical response variables are derived via a vote amongst all decision trees. The computational load of the algorithm is reduced, as only a subset of variables is used at each node split. This process also reduces the correlation between trees, improving both predictive power and classification accuracy. The OOB sample data are used to compute accuracies and error rates, averaged over all predictions, and to estimate variable importance [49,54]. RF provides two methods to estimate the importance of each predictor variable in the model. The mean decrease in accuracy (MDA) importance measure is calculated as the normalised difference between the OOB accuracy obtained with the original observations and that obtained with randomly-permuted variables [49,54]. An alternative variable importance measure is calculated by summing all of the decreases in Gini impurity at each tree node split, normalised by the number of trees [49,55]. RF is a well-regarded machine learning tool that has the capacity to identify complex and non-linear relationships in the fitting dataset and offers high classification accuracy [54,55].
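To make the Gini-based purity criterion concrete, the following Python sketch computes the Gini impurity of a set of class labels and the decrease in impurity produced by a single node split. It illustrates the standard definition only; the function names are hypothetical, and it is not the internal code of any RF implementation.

```python
def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children
```

For a node holding two conifer and two non-conifer echoes, the impurity is 0.5, and a split that separates the classes perfectly yields a decrease of 0.5. Summing such decreases over every split made on a given variable, and normalising by the number of trees, gives the Gini importance described above.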
RF categorical classification models were developed using the implementation of the RF algorithm available through the ranger package [56] in R. This approach was chosen as it offered high performance classification and straightforward parallelisation. Computing performance was important due to the large size of the training datasets. The classification training dataset included the invasive conifer key, computed from the field data, as the response variable and LiDAR elevation, intensity, near-infrared, green and red DN values as candidate variables.
RF models were initially fitted for 15 combinations of the predictor variables (Table 5) at an unthinned pulse density and with a height threshold of 0 m to examine the relative importance of ALS and spectral data for classifying invasive conifer echoes. These models ranged from a single variable model including only ALS elevation to a five variable model including all available spectral and ALS metrics. The effect of pulse density on classification accuracy was examined by fitting all 15 RF models with each of the four thinned datasets (Table 3), including all echoes regardless of their elevation. The effect of height threshold on classification accuracy was examined by varying the threshold below which the points were excluded from the training dataset between 0 m and 2 m, at intervals of 0.5 m. Using the unthinned ALS dataset, the 15 RF models were refitted at each height threshold.
Table 5. Classification accuracy expressed through Cohen's Kappa and associated 95% confidence interval (KappaCI), area under curve (AUC) and the associated 95% confidence interval (AUCCI) from receiver operator characteristic (ROC) curves for all 15 models examined. Predictor variables denoted with an * were included in the model.

Accuracy Assessment
Classification performance for each model was assessed using Cohen's Kappa [57] (kappa) coefficient based on both leave-one-out cross-validation (LOOCV) from the RF classification models and leave-one-plot-out cross-validation (LOPOCV). LOPOCV was implemented using a custom R function that sequentially excluded all echoes associated with a single reference plot and used all remaining echoes to train an RF model to predict classification values for the excluded plot. This approach provided a completely independent validation dataset for assessing predictive accuracy. The LOPOCV provides a much more conservative estimate of predictive accuracy than the other statistics calculated but is more indicative of model performance when applied to an independent dataset. Due to the high computational cost of calculating LOPOCV, this statistic was only calculated for the best performing model. In our approach, LOOCV was used to compare the relative accuracy of the models developed, and LOPOCV provides a measure of model accuracy and transferability to independent data that is more reflective of an operational deployment of this technique.
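The leave-one-plot-out procedure can be sketched in Python as below. The study implemented LOPOCV as a custom R function that is not reproduced here; this sketch only mirrors the description above. The names `leave_one_plot_out` and `lopocv_predictions`, and the `fit` and `predict` callables standing in for model training and prediction, are hypothetical.

```python
def leave_one_plot_out(plot_ids):
    """Yield (held_out_plot, train_indices, test_indices) for each field plot."""
    for held_out in sorted(set(plot_ids)):
        train = [i for i, p in enumerate(plot_ids) if p != held_out]
        test = [i for i, p in enumerate(plot_ids) if p == held_out]
        yield held_out, train, test


def lopocv_predictions(plot_ids, features, labels, fit, predict):
    """Predict every echo using a model trained without any echo from its own plot.

    `fit(train_features, train_labels)` returns a model and
    `predict(model, test_features)` returns one prediction per test echo;
    both are placeholders for whichever classifier is being assessed.
    """
    predictions = [None] * len(labels)
    for _, train, test in leave_one_plot_out(plot_ids):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        held_out_predictions = predict(model, [features[i] for i in test])
        for i, value in zip(test, held_out_predictions):
            predictions[i] = value
    return predictions
```

One model is trained per plot, which is why the computational cost grows with the number of plots and why the statistic was only calculated for the best performing model.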
Kappa is a widely-used metric for assessing the agreement between two sets of observations. Kappa was calculated using the 'psych' R package [58], and unweighted kappa values were reported. The kappa statistic is generally deemed to be robust because it accounts for agreements occurring through chance alone. Several authors propose that the agreement expressed through kappa, which varies between 0 and 1 when agreement is no worse than chance, can be broadly classified as slight (0–0.20), fair (0.21–0.40), moderate (0.41–0.60) and substantial (0.61–1) [38,59]. Confidence intervals for kappa values were calculated using the methods proposed by Fleiss et al. [60] available through the 'psych' R package. Receiver operator characteristic (ROC) curves were also used to examine the accuracy of the classification. ROC curves are graphical representations of the accuracy of binary classifiers. The true positive rate (sensitivity) is plotted on the y-axis, and the false positive rate forms the x-axis. The ROC curve is plotted by calculating the cumulative distribution function on both of these axes with a diagonal reference line plotted to indicate where classification is no better than chance. The area under the curve (AUC) can be calculated from ROC curves and is used to quantify classification quality. AUC values for ROC curves vary from 0.5, indicating classification no better than chance, to 1, indicating a perfect binary classification. ROC curves were plotted, and AUC was calculated, using the pROC R package [61].
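As a worked illustration of the two accuracy statistics, the Python sketch below computes unweighted Cohen's kappa from paired class labels and AUC from classifier scores using the pairwise-ranking formulation. It follows the standard definitions rather than the 'psych' or pROC implementations used in the study, and the function names are hypothetical.

```python
def cohens_kappa(observed, predicted):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(observed)
    classes = set(observed) | set(predicted)
    # Observed agreement: proportion of matching labels.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: product of the marginal class proportions, summed over classes.
    p_e = sum((observed.count(c) / n) * (predicted.count(c) / n) for c in classes)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one class occurs
    return (p_o - p_e) / (1.0 - p_e)


def auc_from_scores(positive_scores, negative_scores):
    """AUC as the probability that a positive case outscores a negative case (ties count half)."""
    wins = sum((p > q) + 0.5 * (p == q)
               for p in positive_scores for q in negative_scores)
    return wins / (len(positive_scores) * len(negative_scores))
```

For example, with observed labels [1, 1, 0, 0] and predictions [1, 0, 0, 0], the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5, which falls in the moderate band of the classification above.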
In addition to the misclassification error, there will also be invasive conifers that were not sampled by the ALS campaign. In this case, there is no chance that these trees will be correctly classified, as they will not be included in the sample population. The number of invasive conifer polygons containing no returns was used as a measure of the number of trees that would be omitted through this 'out-of-sample error' for each pulse density.
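The out-of-sample count can be illustrated with a short Python sketch. As a simplifying assumption made only for this example, each crown is approximated by a circle (centre and radius) rather than by the delineated canopy polygons used in the study, and the function name `count_unsampled_trees` is hypothetical.

```python
def count_unsampled_trees(crowns, echoes):
    """Count crowns that contain no echo.

    `crowns` is a list of (x, y, radius) tuples, a circular stand-in for the
    canopy polygons; `echoes` is a list of (x, y) echo positions.
    """
    unsampled = 0
    for cx, cy, radius in crowns:
        sampled = any((ex - cx) ** 2 + (ey - cy) ** 2 <= radius ** 2
                      for ex, ey in echoes)
        if not sampled:
            unsampled += 1
    return unsampled
```

Applying such a count to each thinned dataset gives the number of trees that no classifier could detect at that pulse density, independently of model accuracy.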

Spectral and Structural Properties
The height profiles of echoes from invasive conifers and from plots containing manuka were superficially similar (Figure 2a), as both vegetation types formed continuous tree cover in some plots. Plots dominated by ferns, grassland and other shrubs had considerably different structural properties from invasive conifers. In the near-infrared band, echoes originating from the different vegetation types were quite different. This is particularly evident for plots dominated by manuka (Figure 2c). The differences in the green and red bands (Figure 2b,d) were marginally less distinct between echoes originating in invasive conifers and those originating in plots dominated by other vegetation types. The data summarised in Figure 2 suggest that combining elevation data from ALS with spectral data should provide a means of accurately classifying echoes originating in invasive conifers. The intensity values of echoes originating in invasive conifers spanned the entire range of intensity values (range = 0–65,535) in the study area and had a median intensity value (32,055) close to the study area median (41,600). This suggests that the uncalibrated intensity values used in this study would likely have little value in classification models for invasive conifers, as they would overlap the values from all other vegetation types.

Classification Accuracy
The most accurate classification model (Model 1) included covariates from ALS data and data from all spectral bands (Table 5).Model 1 displayed substantial agreement between predicted and actual echo classification (kappa = 0.837, AUC = 0.885).A comparison of the four variable models showed that Model 1 was fairly insensitive to removal of the green band (Model 2: Kappa = 0.785, AUC = 0.856), red band (Model 4: kappa = 0.781, AUC = 0.854) or ALS intensity data (Model 5: kappa = 0.773, AUC = 0.849), but was sensitive to removal of the near-infra red band (Model 3: kappa = 0.744, AUC = 0.828).Compared to four variable models, there was a marked decline in model accuracy for models with three variables (Table 5).Three variable models that included ALS elevation, near-infrared and either intensity (Model 6) or another spectral band (Models 7 and 8) were far more accurate than three variable models with only spectral bands or a combination of ALS elevation and spectral bands other than near-infrared (Models 9, 10 and 13).Model 10 contained spectral information only and was substantially less accurate than models that contained ALS elevation and two spectral bands (Model 7, 8 and 9), but performed better than models that included only elevation and a single spectral band (Model 14 and 13), unless the single band was near-infrared (Model 12).Of the two variable models with ALS elevation data and a single spectral band, Model 12 including the near-infrared band was most accurate (kappa = 0.355, AUC = 0.597).By comparison, models fitted using ALS elevation data and either intensity (Model 11: kappa = 0.292, AUC = 0.597), the red band (Model 13: kappa = 0.224, AUC = 0.571) or the green band (Model 14: kappa = 0.221, AUC = 0.569) were considerably less accurate.Models developed using ALS elevation data alone displayed minimal classification accuracy (Model 15: kappa = 0.101, AUC = 0.529).
The receiver operator characteristic (ROC) curves for all models were plotted and examined to determine the performance of the binary classification. This analysis highlighted a substantial discrepancy between the best and worst performing models (Figure 3). The most accurate models displayed considerably more area under the ROC curve, indicating a far greater true positive and lower false positive classification rate. The worst performing models (e.g., Model 15 in Figure 3) were considerably closer to the diagonal line, indicating that their classification accuracy was closer to that expected through chance. The variable importance scores for the best random forest model (Model 1) were calculated through permutation. This analysis indicated that the near-infrared band (importance = 0.147) was the most important predictive variable, followed by data from the green band (importance = 0.143). ALS elevation values and data from the red band were slightly less important, and ALS intensity data were considerably less useful (importance = 0.06).
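The area under the ROC curve can be read as the probability that a randomly chosen conifer echo receives a higher classifier score than a randomly chosen non-conifer echo, so a value of 0.5 corresponds to the diagonal line. The sketch below computes the AUC in this pairwise form. The function name and the scores are hypothetical, and this is not necessarily how the AUC values in this study were calculated.

```python
def pairwise_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores_pos: classifier scores for echoes that are truly conifer.
    scores_neg: classifier scores for echoes that are truly not conifer.
    Ties count as half a correct ranking.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical scores: three of the four pairs are ranked correctly.
example_auc = pairwise_auc([0.9, 0.4], [0.5, 0.1])
```

The double loop is quadratic in the number of echoes, so it is only suitable for small illustrations; rank-based formulations give the same value more efficiently.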

Pulse Density and Height Threshold
Overall, pulse density did not have a significantly detrimental impact on classification accuracy for the best models (Figure 4). The more accurate four- and five-variable models (Models 1-5) showed little change in kappa with pulse density. The most accurate model (Model 1) was the only model where kappa increased slightly as the pulse density increased. All of the remaining models (Models 6-15) showed a decrease in kappa as the pulse density increased to 5 pls/m2, which then stabilised and plateaued, or decreased marginally, at higher pulse densities.
Classification accuracy increased with the height threshold for all models (Figure 5). On average, the classification accuracy across all models increased from a mean kappa value of 0.485 at a 0 m threshold to 0.713 at a 2 m threshold. The increase in performance was particularly marked for the models that offered moderate classification accuracy, with fewer than four variables. Model 10 showed the greatest gain in kappa, increasing from 0.308 at a 0 m threshold to 0.887 at 2 m. Although Model 10 did not include any ALS predictors, it was among the best performing models at the 2 m height threshold. Model 5, a model that excluded ALS intensity values, became the most accurate classifier once the height threshold was increased above zero. The most accurate classifier (Model 1) displayed modest improvement as the height threshold was increased. At a 0 m threshold, the classification accuracy for Model 1 was already very high (kappa = 0.837), and this increased steadily to provide an exceptionally accurate classifier at a threshold of 2 m (kappa = 0.914). It is noteworthy that the shapes of the curves in Figures 4 and 5 are influenced by the number of objects sampled by the ALS data under each scenario. The influence of ALS pulse density and height threshold on classification accuracy must be interpreted with reference to the changes in out-of-sample errors.

[Figures 4 and 5: kappa for each model plotted against pulse density (pls/m2) and height threshold (m).]

Leave One Plot Out Cross-Validation
A LOPOCV was implemented to test the applicability of the best performing model (Model 1) to independent data. Across all independent observations, the kappa value for the LOPOCV was 0.284 (confidence interval = 0.279-0.289). Using the categories previously proposed [59], this suggests a fair agreement between observed and predicted values. The overall AUC value for the LOPOCV analysis was 0.605 (confidence interval = 0.601-0.607), considerably lower than the AUC values acquired using LOOCV. The outputs of the best performing model (Model 1) confirm that model performance varied across the range of vegetation types in the study (Figure 6). Invasive conifers are accurately classified in many instances (e.g., Figure 6a), although some misclassified echoes are also present. Areas with alternative vegetation composition contain some misclassified echoes (e.g., Figure 6b (ferns)), but in other examples, classification is considerably more accurate (Figure 6c,d).
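The leave-one-plot-out scheme can be sketched as follows, assuming each echo carries the identifier of the field plot it falls in. The function and variable names are hypothetical. The point of the illustration is that every echo from the held-out plot is excluded from training, which is what makes the test data independent at the plot level.

```python
def leave_one_plot_out(plot_ids):
    """Yield (held_out_plot, train_indices, test_indices) for each plot.

    plot_ids: one plot identifier per echo, in the same order as the
    rows of the training table (an assumed data layout).
    """
    for held_out in sorted(set(plot_ids)):
        train = [i for i, p in enumerate(plot_ids) if p != held_out]
        test = [i for i, p in enumerate(plot_ids) if p == held_out]
        yield held_out, train, test


# Four echoes from three hypothetical plots give three folds.
folds = list(leave_one_plot_out(["a", "a", "b", "c"]))
```

Because echoes within a plot share terrain, illumination and vegetation context, holding out whole plots would be expected to give lower accuracy than holding out single echoes, which is consistent with the gap between the LOPOCV and LOOCV results reported above.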

Out-Of-Sample Errors
A linear model was used to examine the relationship between the proportion of missed trees per plot, ALS pulse density and the estimated canopy area of the missed individuals. A logit transformation of the proportion of missed trees was used as the response variable. This model (R2 = 0.37, AIC = 971) indicated that ALS pulse density, canopy area and the interaction between these two explanatory variables all had a significant effect on the proportion of individuals missed by the ALS sampling at the 0.05 significance level. The number of invasive conifers that were missed decreased as the pulse density increased, from an average of 6.2 trees per plot at 1 pls/m2 to 0.6 trees per plot at 21 pls/m2. The proportion of trees accounted for increased from 76% at 1 pls/m2 to 96.4% at 21 pls/m2. This result has important implications for guiding suitable campaign settings for invasive conifer detection. Below 5 pls/m2, considerable numbers of invasive conifers start to be omitted from the ALS sampled area (Figure 7). It is also clear that at lower pulse densities a greater number of larger trees are omitted from the ALS sample than at higher pulse densities (Figure 7).
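The structure of the model described above, a linear predictor on the logit scale with pulse density, canopy area and their interaction, can be sketched as follows. The coefficient arguments are placeholders and no fitted values from the study are implied. The clamping constant eps is also an assumption, since the text does not state how proportions of exactly 0 or 1 were handled.

```python
import math


def logit(p, eps=1e-3):
    """Logit of a proportion, clamped away from 0 and 1 (eps is assumed)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map a value on the logit scale back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_missed_proportion(density, area, b0, b_density, b_area, b_inter):
    """Back-transformed prediction from a linear model with an interaction.

    density: pulse density (pls/m2); area: mean canopy area of missed trees.
    All coefficients are hypothetical inputs supplied by the caller.
    """
    z = b0 + b_density * density + b_area * area + b_inter * density * area
    return inverse_logit(z)
```

Fitting on the logit scale keeps back-transformed predictions between 0 and 1, which an untransformed linear model of a proportion would not guarantee.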

Discussion
The principal finding of this study is that using data from ALS in combination with spectral data from aerial imagery provides a fairly accurate means of classifying LiDAR echoes from invasive conifers in an invasion-prone vegetation type. The study results indicate that this approach may offer a promising method for detecting invasive conifers that invade relatively complex terrain with a vegetation structure composed of short tussock grassland intermixed with shrub species.
The ALS elevation and intensity data and the spectral data from aerial imagery were of limited value for detecting returns from invasive conifers in their own right. Fusion of the two data types improved echo classification. Related approaches relying on ALS elevation data are often concerned only with identifying all tree species in the landscape [19,20,26]. Numerous studies have fused ALS data with spectral data to improve classification [38,62]. However, these approaches often rely on pixel-based methods to classify different species [63,64] or the aggregation of remote sensing data sources to larger scales, e.g., individual tree crowns [62]. To the best of our knowledge, this is the first time that spectral data have instead been added to individual ALS echo data to assist with the identification of invasive conifers. This method appears to improve classification accuracy. Although we did not test alternative methods, the removal of the aggregation step may offer some advantages over approaches relying on the aggregation of LiDAR data into surfaces. Of the spectral bands available, the near-infrared band was the most important for detecting invasive conifers in combination with ALS data.
The effect of pulse density on classification accuracy was examined and found to be negligible for the five most accurate models, but the number of missed invasive conifers was highly dependent on the pulse density, with lower densities increasing the probability of omission. Raising the height threshold for the inclusion of echoes in the training dataset was found to have little impact on four- and five-variable models, but did markedly improve the classification accuracy of the remaining models. Our approach is reliant on very accurate alignment between both datasets used and will be limited by the coarsest resolution data available. In our study, the coarsest remotely-sensed data used were the aerial imagery (0.3 m GSD), and it is possible that finer resolution imagery would further improve classification accuracy. As pixel values represent an aggregation of spectral values, the attribution to a return could be erroneous (e.g., ground echoes could be assigned canopy spectral values). This could well be the causal mechanism for improved classification accuracy at higher height thresholds. However, the resolution of this dataset is relatively fine, and lower cost alternatives, such as satellite imagery, would be unlikely to provide a useful classification for anything but the most severe conifer infestations or larger scattered conifers. Satellite imagery of very high spatial and spectral resolution could likely provide comparable results to the aerial imagery employed in this research. Further research should be able to provide insight into the optimal spectral and spatial resolutions for invasive conifer detection across a wider range of vegetation types. Emerging sensor technologies will inevitably offer improvements in this area. Photogrammetric point clouds derived using structure from motion could provide a functional, and potentially less expensive, alternative to the combination used in this research. However, our imagery contained insufficient overlap to test this thoroughly.
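The attribution of pixel values to individual echoes discussed above can be illustrated with a minimal sketch that looks up the pixel containing an echo's horizontal position. It assumes a north-up raster stored as a list of rows with its origin at the top-left corner; the function name and the toy raster are hypothetical. Every echo falling within the same 0.3 m pixel receives the same spectral value regardless of its height, which is the source of the ground/canopy mix-up described above.

```python
import math


def sample_pixel(raster, x_origin, y_origin, gsd, x, y):
    """Return the raster value under the point (x, y), or None if outside.

    raster: list of rows, row 0 being the northernmost (assumed layout).
    x_origin, y_origin: map coordinates of the top-left raster corner.
    gsd: ground sample distance (pixel size) in map units.
    """
    col = math.floor((x - x_origin) / gsd)
    row = math.floor((y_origin - y) / gsd)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None


# Toy 2 x 2 band with 0.3 m pixels; values are arbitrary.
band = [[1, 2],
        [3, 4]]
value_at_echo = sample_pixel(band, 0.0, 0.6, 0.3, x=0.4, y=0.1)
```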
In this study, field data, comprising a characterisation of the present vegetation, as well as the presence and tree metrics of invasive conifers, were collected across the study area, covering the dominant vegetation types present. The objective was to ensure that remotely-sensed data were available for all of the common vegetation types in an area vulnerable to conifer infestation. Consequently, classifiers developed from this field dataset should be more robust to false positive classification of echoes from other trees and shrubs in the study area than if the sampling design had focussed solely on the invasive conifer infestation and had not encompassed such a wide range of vegetation types. The size of the ground sample of invasive conifers in this study (n = 825) is equivalent to or larger than that of other remote sensing studies aimed at detecting pioneer trees in the boreal-alpine transition zone [19,20,26]. Unlike previous research, we did not record crown width measurements in the field survey plots, which saved significant data collection effort. Instead, a novel technique was applied where the crown area of invasive conifers was extracted from the CHM using a semi-automated technique, and this was used to provide a binary classification key for the training dataset. Following some manual corrections, this approach appeared to offer a successful alternative to field measurement. However, we were not able to empirically determine the quality of crown area estimates with this dataset; this would be a valuable topic for further research. The apparent success of the canopy delineation approach was probably due, in part, to the exceptionally high density of the original ALS dataset, which provided a high quality, fine resolution CHM with minimal distance between subsequent returns.
The ALS data thinning algorithm employed was based on systematically removing all echoes associated with a pulse in a manner that simulated increased pulse spacing on the ground. We believe this technique provided a reasonable approximation of the effect of reduced pulse density in the ALS data. However, recently-proposed, sophisticated methods for simulation of LiDAR campaign effects on pulse density may offer more realistic results [29]. The complexity of these methods and the computation time associated with such a high-density base dataset motivated our selection of a simpler pulse-based thinning algorithm. Regardless, no data thinning technique currently available can fully simulate the increased laser footprint size (further exaggerated on steep slopes), the effect of the increased thickness of the atmosphere experienced when flying at a greater altitude [65] or more oblique scanning angles that may be associated with different campaign settings [29,66]. Sophisticated simulations based on ray tracing [67] offer the best opportunity for those seeking a more complete understanding of the influence of campaign settings on the data generated.
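The idea of pulse-based thinning can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm used in this study: each echo is assumed to carry a pulse identifier, pulses are assumed to be ordered along the scan, and thinning is done by keeping every k-th pulse. The function and field names are hypothetical. What the sketch shares with the approach described above is that all echoes of a removed pulse are dropped together, so no pulse is left with only some of its returns.

```python
def thin_by_pulse(echoes, keep_every):
    """Keep every `keep_every`-th pulse and all echoes belonging to it.

    echoes: list of dicts, each with a "pulse_id" key (assumed layout).
    keep_every: 1 keeps all pulses, 2 keeps every second pulse, and so on.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    pulse_ids = sorted({e["pulse_id"] for e in echoes})
    kept = set(pulse_ids[::keep_every])
    # Echoes are retained or dropped as a group, pulse by pulse.
    return [e for e in echoes if e["pulse_id"] in kept]


# Six hypothetical echoes from four pulses; pulses 0 and 2 have two returns.
cloud = [
    {"pulse_id": 0, "z": 1.8}, {"pulse_id": 0, "z": 0.1},
    {"pulse_id": 1, "z": 0.2},
    {"pulse_id": 2, "z": 2.4}, {"pulse_id": 2, "z": 0.0},
    {"pulse_id": 3, "z": 0.3},
]
thinned = thin_by_pulse(cloud, keep_every=2)
```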
The most accurate RF classification models developed in this study provided highly accurate echo classification that could be useful for invasive conifer detection. Using the entire dataset, the most accurate classification model included information from ALS data and from all available spectral bands. The kappa value for this model (0.837) was substantially higher than the kappa value (0.594) of the best reported model from a similar echo classification study in Norway [20]. It is reasonable to speculate that this improved performance may be due to the inclusion of high-resolution spectral data, although the properties of the ALS dataset and intensity of the field survey may also be contributing factors.
Removal of the ALS intensity value from the classification model led to only a negligible decrease in model performance, and the RF importance scores indicated that these data contributed the least to successful classification. This is consistent with other recent research that suggested that the utility of LiDAR intensity data for classifications of trees and non-trees is 'far from significant' [26]. However, it is possible that intensity values may be useful in situations where there is extensive inorganic material (e.g., rocky terrain) with relatively distinct backscatter characteristics. In this study, we found that a successful classifier was characterised by the ability to differentiate between vegetation and non-vegetation echoes, as well as between vegetation types. For this reason, models based only on ALS elevation data or spectral data were of little use. Models that combined ALS data with all available spectral bands were the most accurate classifiers. However, of the spectral bands tested, the near-infrared was the most valuable for classifying invasive conifers. It is well established that coniferous species show distinctive optical characteristics [68]. In comparison to other species groups, conifers are characterised by higher levels of absorption across the visible wavelengths [68], with especially high levels of absorption observed in the near-infrared portion of the spectrum [69,70]. It is noteworthy that the differences in the visible spectrum are related to changes in leaf chemistry, while differences in the near-infrared portion of the spectrum are primarily related to differences in leaf structure [68,69]. We consider it likely that these properties can partly explain the importance of all bands, as well as the increased importance of the near-infrared data in discerning coniferous species from other vegetation types. This has important practical implications for data collection campaigns aimed at detecting the spread of invasive conifers. Future research should seek to further capitalise on the distinctive spectral properties of conifers in these landscapes by investigating other wavelengths and the possibility of using vegetation indices in conjunction with ALS elevation data.
ALS pulse density was investigated in this research and, as expected, was found to have very little impact on classification accuracy. However, this result should not be interpreted as an indication that low-density ALS data are as useful for invasive conifer detection as high-density data. In addition to the echo classification error, a further source of error is the omission of invasive conifers that are not scanned due to the campaign settings or occlusion from surrounding terrain or vegetation. Further research is required to investigate the effect of pulse density on the probability of detecting invasive conifers of various sizes by remotely-sensed datasets of varying resolution.
We varied the height threshold used for inclusion in the classification dataset and investigated the effect of height threshold on classification accuracy. This analysis showed that the most accurate models were relatively insensitive to increases in the height threshold. This suggests that these models provide an accurate classification right down to the ground level, and so, the echo classification approach has the potential to correctly identify smaller trees. Interestingly, the model that included all predictors except ALS intensity data (Table 5, Model 5) was more accurate than Model 1, which included ALS intensity data, at height thresholds exceeding 1 m. This could be because at heights below 1 m, the echo classifier needs to be able to differentiate between invasive conifers and inorganic material (e.g., rocks), and ALS intensity may provide useful information within this height range. However, at higher height thresholds, this is no longer an issue, and so, the ALS intensity data do not contribute to classification accuracy. Model 10 was fitted with spectral data only and was the most sensitive to the height threshold used. When the height threshold was 0 m, the accuracy of the model was only fair, but the accuracy increased rapidly with the height threshold, and by 2 m, this was amongst the best performing models. This is most likely because, above 2 m, all echoes will be from trees or large shrubby vegetation, and the spectral values have significant power to differentiate between invasive conifers and other vegetation types. This suggests that aerial imagery has utility for detecting larger invasive trees, but if detection prior to maturity is required, then ALS data, in combination with spectral values, have greater capacity for detection.
The characteristics of the ALS survey and the size of the target have a significant effect on the probability that an invasive conifer is missed by the survey and so contributes to the out-of-sample error. This result has important practical implications for guiding suitable campaign settings for invasive conifer detection. Below 5 pls/m2, considerably more invasive conifers were missed, and those missed individuals were also considerably larger. Detecting and eradicating invasive conifers before they start to produce cones is vital to controlling conifer spread and minimising their ecological impact. As a result, based on the evidence presented in this paper, it is suggested that a pulse density of 5 pls/m2 or greater should be used if invasive conifer detection is a major objective of the data acquisition. For large-scale management, it may be infeasible to collect ALS data at this density due to financial or time constraints. In these situations, successive, lower density surveys may offer the possibility to detect missed individuals as height and canopy volume increase while background objects that interfere with detection, such as rocks and mounds, remain static [19,20]. The high growth rates of invasive conifers may further assist detection from repeated surveys. Estimates from Ledgard and Paul [10] suggest that during the earlier stages of invasion, height growth for Pinus contorta Dougl. ex Loudon (one of the most problematic species) may be as high as 30 cm per annum. This is significantly higher than growth rates in the boreal ecotone, where research has shown that data from multi-temporal ALS campaigns contain valuable information for monitoring changes in the tree-line [71]. The time required before previously undetected invasive conifers would reach sufficient height and crown diameter to be detected in subsequent lower density surveys may be quite short, and the implementation of this approach may represent an effective and efficient means of monitoring and controlling the spread of invasive conifers over large areas of New Zealand.

Conclusions
The objectives of this study were (i) to compare the accuracy of detection models developed using various combinations of ALS data (elevation, intensity) and aerially acquired spectral data and (ii) to determine the sensitivity of classification accuracy in these models to the height threshold used for inclusion and the ALS pulse density used. Through this research, we found that combining spectral data with ALS data resulted in much greater classification accuracy than either ALS or spectral data alone. Uncalibrated ALS intensity data were the least useful candidate variable tested, and of the spectral bands examined, the near-infrared was the most valuable. The most accurate model contained ALS elevation and intensity data, as well as all three spectral bands examined. When this model was applied to a completely independent dataset through a LOPOCV, the classification accuracy was fair. Varying the height threshold for inclusion in the training dataset and the ALS pulse density had very little effect on the classification accuracy of this model. However, ALS pulse density had a significant effect on the size and number of invasive conifers that were not sampled by the ALS survey. We found that considerably more, and larger, invasive conifers were excluded from the sample when the ALS pulse density was reduced below 5 pls/m2. Both the accuracy of the models developed and the effect of pulse density on the probability of sampling invasive conifers are specific to this terrain and vegetation type. However, the findings of this research can be used to inform practitioners planning surveys that use remotely-sensed data to monitor the spread of these invasive species and to plan control efforts. This research has proposed a novel approach to classifying echoes from ALS data for the detection of invasive conifers in a grassland environment by incorporating spectral values from aerial imagery.

Figure 1. Overview and geographic location of study site and field plots (light green filled circles). Right hand panel shows examples of dominant vegetation types from the study area. These included: (a) invasive conifers; (b) areas dominated by bracken; (c) grassland areas; and (d) areas dominated by manuka trees up to 4 m.

Figure 2. Box and whisker plots showing the spectral and structural properties of invasive conifers (conifer) and the other major vegetation types in the study area derived from the ALS point cloud. Panels show (a) return elevation; (b) spectral properties in the green band; (c) spectral properties in the near-infrared band; (d) spectral properties in the red band.

Figure 3. The receiver operator characteristic (ROC) curve for the most accurate (Model 1) and the least accurate (Model 15) models. The ROC curves for all other models lie between these two.

Figure 4. Classification accuracy (kappa) for all random forest models (Table 5) at each of the tested pulse densities.

Figure 5. Classification accuracy (kappa) for all models at all height thresholds tested.

Figure 6. Performance of the RF classifier when applied to the plots shown in Figure 1 containing: (a) invasive conifers; (b) areas dominated by bracken; (c) grassland areas; and (d) areas dominated by manuka trees. Invasive conifer classified echoes are shown in light green, and the invasive conifer outlines used to train the model are shown in blue. Echoes not classified as invasive conifers are not shown.

Figure 7. Proportion of invasive conifers missed per plot at each pulse density. Points are jittered to avoid over-plotting, and the point size is proportional to the mean canopy area of the missed invasive conifers in the plot. The grey line shows a linear model.

Table 1. Field tree data summary with range shown in brackets. Diameters shown were measured at breast height (1.4 m) for larger trees and ground level for saplings.

Table 2. Campaign settings for ALS data acquisition.

Table 3. Summary of original and thinned ALS datasets.

Table 4. Summary of aerial imagery data.