Exploring Multispectral ALS Data for Tree Species Classification

Multispectral Airborne Laser Scanning (ALS) is a new technology and its output data have not been fully explored for tree species classification purposes. The objective of this study was to investigate what type of features from multispectral ALS data (wavelengths of 1550 nm, 1064 nm and 532 nm) are best suited for tree species classification. Remote sensing data were gathered over hemi-boreal forest in southern Sweden (58◦27′18.35′′N, 13◦39′08.03′′E) on 21 July 2016. The field data consisted of 179 solitary trees from nine genera and ten species. Two new methods for feature extraction were tested and compared to features of height and intensity distributions. The features that were most important for tree species classification were intensity distribution features. Features from the upper part of the upper and outer parts of the crown were better for classification purposes than others. The best classification model was created using distribution features of both intensity and height in multispectral data, with a leave-one-out cross-validated accuracy of 76.5%. As a comparison, only structural features resulted in an highest accuracy of 43.0%. Picea abies and Pinus sylvestris had high user’s and producer’s accuracies and were not confused with any deciduous species. Tilia cordata was the deciduous species with a large sample that was most frequently confused with many other deciduous species. The results, although based on a small and special data set, suggest that multispectral ALS is a technology with great potential for tree species classification.


Introduction
Multiple studies have examined how three-dimensional data from Airborne Laser Scanning (ALS) might benefit forestry [1].ALS has been used to produce nationwide estimations of forest variables, such as height, volume, stem diameter and basal area, with high accuracy [1,2].Estimations are usually made by using regression analysis of height distribution and density features of the ALS-derived point cloud.An efficient, automated and objective forest inventory of large areas can be performed with a relatively small set of field sample plots and remote sensing data, as an alternative to the common subjective, manual inventory used when making forest management plans.
However, the height distribution of the ALS data has not shown great potential for tree species classification at stand level.Several tree species have crowns with similar height distributions and cannot be separated based on the height distribution features from ALS data.Additionally, multi-layered forest stands have smaller trees below the top-most canopy layer, which makes it difficult to differentiate trees with tall crowns from those with short crowns and an understory beneath.
If ALS data with high return density are available, individual tree crowns can be delineated.The shape and proportions of tree crowns differ somewhat between species and have been used for tree species classification [3][4][5][6][7].Additionally, tree crowns can be separated from small trees below the top-most canopy to achieve better defined derived features [8,9].
The features used for tree species classification can be divided into geometric, radiometric and waveform features [10].Geometric features describe the geometry or shape of the object while radiometric features describe characteristics like the backscatter cross-section.Some laser scanner systems record the whole returned waveform, which makes it possible to extract echoes after data collection and use them for tree species classification [9,11,12].Following Koenig and Höfle [10], waveform features describe characteristics such as the distance between two waveform echoes and the skewness of the waveform peaks.
Many ALS systems lack consistent radiometric information, which has been addressed in some studies [13].Despite that, radiometric information has been used for tree species classification [14,15].To include spectral information, the geometric information derived from the laser point cloud has been combined with optical images for tree species classification [16][17][18][19][20].
Another method to obtain detailed 3D data is derivation of photogrammetric point clouds.Digital cameras and computer vision algorithms make it possible to automatically produce large-scale 3D models using matching of overlapping aerial images.The point clouds provide information about the height of the vegetation although the photogrammetric point clouds provide less information about the structure than ALS data [21].On the other hand, the point clouds contain spectral information from the optical bands of the images.From interpretation of aerial images, it is well known that different tree species have somewhat different reflectance spectra [22,23].This can be used to predict tree species proportions and classify dominant species [24].Photogrammetric point clouds produced from images collected with unmanned aerial vehicles (UAVs) [25] and even space-borne very high resolution stereo-imagery (VHRSI) [26] have also been used for this purpose.To make use of the differences in phenology between different tree species, time series of satellite images are being studied [27].
Multispectral data have previously only been available from passive optical sensors, such as cameras and electro-optical scanners, where an average brightness within each pixel is recorded [23].Automated methods for tree species classification using aerial photography have been developed [28], but these deal with data from the surface of the canopy, which is visible from the camera position.Reflectance data from passive optical sensors have several drawbacks.Surfaces that are shadowed do not contribute as much to a pixel value as those in direct sunlight.At the same time, light reflected from the ground contributes to the pixel value.Lighting, shadows and view angle may also affect parts of the image differently.The top layer is the one reflecting most of the sunlight, and a measurement using passive sensors provides more information about the surface than spectral characteristics further down in the canopy.ALS, on the other hand, is an active sensor where the signal originates from a well-defined position, without any interfering intensities from the ground or solar angles, with a well-defined wavelength and outgoing intensity.Most laser scanner systems for topographic mapping use only one wavelength, usually near-infrared or shortwave infrared.A new multispectral laser scanner system with three different wavelengths has been used since 2015 [29]: Optech Titan X (Teledyne Optech, Vaughan, ON, Canada).When using such a laser scanner, colors further down in the canopy are revealed [30], and this information might be very well suited for tree species estimations [31].Data from multispectral ALS have already been shown to provide a good basis for tree species classification [32][33][34].Sensors providing an inherent combination of structure and reflectance may be the future for remote sensing in forestry.
The most common species groups used when classifying tree species in Sweden are Picea abies, Pinus sylvestris, and other, where all deciduous species fall in the other-class [4,28].In general, deciduous species will reflect more light in the near infrared spectrum than coniferous, and spruce have different crown structure than pine [35].Some have tried to classify different deciduous species using ALS data, but with moderate success [3,36].In Sweden, half of the close to 2300 forest-dwelling and red-listed species are dependent on deciduous tree species and many of these rely on a deciduous tree habitat for survival [37].By using information on tree species composition and distribution in the landscape, it is possible to make predictions on where certain threatened species might occur.If species of trees in a stand can be classified, the possible habitats in that area may also be evaluated [38].

Goal and Objectives
The goal of this study was to investigate what type of features from multispectral ALS data that are best suited for tree species classification.Since the technology is so new, no consensus on how to use these data exists yet and an exploration of data will be needed to know how this technology could be used in the future.The conventional way of estimating forest variables from ALS data is by using features of the height distribution [1].This method uses information in one dimension only and does not utilize the three-dimensional nature of ALS-data.Using distributions may not necessarily be the best method when working with multispectral data.To achieve the goal, a number of questions were to be answered: What features are the most effective for tree species classification and what are their commonalities?2.
How do spectral and structural features contribute to tree species classification accuracy?3.
What are the differences in classification accuracy between the best feature combination, the best combination of only spectral features and the best combination of only structural features?

Data
The study area was Remningstorp as well as surrounding forested grazing lands, located in Västra Götaland, Sweden (58 • 27 18.35 N, 13 • 39 08.03 E) (Figure 1).The Remningstorp estate has an area of 1602 ha, covered mainly with spruce in stands managed for wood production.The scanned area included the nature reserve Eahagen-Öglunda ängar, which is dominated by different deciduous species, mainly oak (Quercus robur) and linden (Tilia cordata).The wood pastures inside and outside of the nature reserve are also mainly covered with deciduous trees but distributed more sparsely.

Laser Scanner Data
Multispectral laser data had been gathered on 21 July 2016 using the Optech Titan X ALS-system.Three channels are produced by the system: 1550 nm (channel 1; short wave infrared), 1064 nm (channel 2; near infrared) and 532 nm (channel 3; green) [39].The scanning was performed at around 400 m above ground with a maximum angle of 30 • and produced an average of 30 returns per square meter (10 returns per square meter and channel).The relatively low altitude meant that these data had a very high resolution and a return signal with high power, resulting in intensity values with high precision.To make a digital terrain model for height normalization, laser data from the Swedish land survey were used.The scanner recorded up to four returns for each emitted pulse

Field Inventory Data
During the autumn of 2016, field measurements were made on 179 individual mature trees of nine genera (classes): spruce (Picea abies), pine (Pinus sylvestris), birch (Betula pendula and Betula pubescens), oak (Quercus robur), ash (Fraxinus excelsior), linden (Tilia cordata), wild cherry (Prunus avium), maple (Acer platanoides), and alder (Alnus glutinosa).The only genera where more than one species were present was Betula; therefore, the term species has been used to describe the other species as well as the Betula genera.For each tree, a number of properties were recorded: species, height and live crown height.Height and crown height values were measured using a digital hypsometer.GPS-coordinates delineating the tree crown were also recorded, with a minimum of three coordinates.Trees standing close to each other compete and their crown shapes change.To accommodate for this difference in shape, if a tree stood close to other trees, additional points were recorded between them.This was done to improve the accuracy of the crown boundary and let a deformation of the crown shape, caused by a neighbouring tree, weigh in on the final crown position and radius, as specified in Section 2.2.1.The device used was an RTK-GPS (Trimble Inc., Sunnyvale, CA, USA) that use fixed base stations to improve positioning accuracy.Live crown height was defined by the height from ground to the point where the first living branch met the stem.Any additional information that might be of value, such as diseases or damages, was also noted.
Since only one spruce with a free standing crown was found during inventory, a number of circular field plots from another inventory-first made in 2014 with a follow-up inventory in 2016-were used to get more observations in that class.These field plots were visited to measure live crown height while other variable values of the plots were available from the 2016 inventory.Trees in the plots were manually delineated in a Quick Terrain modeler (v.8.0.7,Applied Imagery, Chevy Chase, MD, USA).Points for delineation were chosen in the same way as those collected with a GPS-device for other trees.

Preprocessing
Individual trees that had several stems had a height measurement for each stem.Some trees have multiple stems and since they form a single crown together, the highest value of height and lowest value of live crown height were chosen as values for the individual tree.
To classify species, features of the tree crown were to be used.The exact location of the stem was therefore not as relevant as cutting the whole crown from the point cloud.By using the horizontal extent of the crown, rather than a circle with a center where the stem meets the ground, more of the crown could be included.To produce a center coordinate for the horizontal extent of the crown, a center of mass of the delineating coordinates was used (i.e., the mean horizontal coordinate of the delineating coordinates).Then, the mean distance from the crown center to the delineating coordinates was used as the radius of a circle when cutting the point cloud.
Return intensity had been recorded by the ALS system.The return intensity follows the inverse square law (i.e., the intensity is inversely proportional to the distance squared) since the emitted pulse is fairly focused.The measured intensity was thus multiplied by the square of the distance so that it describes the target's reflectivity as a relative value.
A local digital elevation model (DEM) was created using the National Land Survey's laser data by creating a triangulated irregular network (TIN) from returns classified as ground returns.The reason for using the National Land Survey's laser data was that has been manually quality controlled [40].
The z-coordinates of the multispectral ALS point cloud were normalized as height above this DEM, instead of height above sea level.Some returns were removed based on height and live crown height values in the field data.All returns 10 m above the tree height were removed to avoid error sources such as birds.In some cases, returns below the live crown height were also removed when constructing certain features as specified in Section 2.2.2.

Features
First and subsequent returns of a pulse will have return intensities dependent on each other.A function of first return intensity can be used to predict the maximum possible intensity of the second return [41].As the first return contains information on subsequent returns, the comparisons of feature sets were made using a point cloud of first returns.
For each channel, three types of feature sets were calculated: ellipsoid layers, horizontal layers, and distributions.Each feature set contained two types of features: structural (geometric) and intensity (radiometric) features.The channel from which a given feature was derived is denoted as a superscript (C1, C2 or C3) to that feature.Intensity and structure subsets of the features in each feature set were made to determine the subsets' importance for classification.
As Table 1 shows, some species were clearly distinguishable from others due to size differences.To accomodate for this, the features used were calculated as relative values that were size independent.

Ellipsoid Layers
By defining ellipsoids within the ALS data, differences in the density and reflectance outside as well as inside tree crowns might be identified.Maple has a very dense crown with large flat leaves, while ash has a sparser crown with thinner leaves.These differences might be measurable if an ellipsoid is fitted to the crown and ellipsoidal layers are compared to each other.
Using the general shape of the point cloud, a spheroid type ellipsoid was fitted to the crown (see Figure 2).The center of the tree was defined as where x p , y p , and z p are x-, y-, and z-coordinates of the returns in the point cloud assigned to the tree.For the outermost ellipsoid, the horizontal (r h ) and vertical (r v ) radii were calculated using each return's distance from the center of the tree, according to Equations ( 4) and ( 5), where The value of r h being defined as two times the 95th percentile of the distances from the center to a return results in ellipsoids stretching outside of the point cloud.The reason to have a horizontal radius much larger than the data extent is that many fully grown deciduous trees have a flat top; if only the 95th percentile had been used, too many returns would fall outside of the outermost ellipsoid for the analysis to make any sense.
All subsequent ellipsoidal layers were created using the same values but with a constant of 0.5 m subtracted from each radius (i.e., layer thickness).The number of ellipsoid layers were determined by dividing the smaller value of r h and r v by the layer thickness and truncating the result.For each layer, the return density, in returns per cubic meter, was divided by the mean density of all returns inside of the outermost ellipsoid.In this way, the density within the layer was compared to the density within the crown, eliminating size as a factor for tree species determination.
Features were computed using a point cloud where returns below the field measured live crown height were removed.Three main feature types were computed: J Cx y , E Cx y , and R hv .The intensity features, J, are the mean return intensity within layers or the mean intensity outside the outermost ellipsoid.Structural features, E, are relative return densities within ellipsoidal layers or the percentage of all returns that fall outside of the outermost ellipsoid.R hv is also a structural feature, namely the ratio of r h and r v (Equations ( 4) and ( 5)), describing the ellipsoid shape (a value of 1 would be a perfect sphere).Superscript variable x is the channel number (1, 2, or 3) and subscript variable y is the ellipsoidal layer ranging from 0 (outermost) and up to the innermost.y may also be set to e, which denotes that this is a value taken from outside of the outermost ellipsoid.E with a subscript e stands for the percentage of all returns that fall outside of the outermost ellipsoid, and J with a subscript e stands for the mean intensity outside the outermost ellipsoid.

Horizontal Layers
The horizontal layer feature set was created in a similar way as the ellipsoid feature set.This feature set was also computed using a point cloud where returns below field measured live crown height were removed.The main features types were K Cx y and L Cx y , where x is a channel number and y is the layer numbered from top to bottom starting from zero.The intensity features, K, are the mean return intensities within layers, and structural features, L, are relative return density within layers.
With the same constant layer thickness (0.5 m) as when computing the ellipsoid feature set, the point cloud was sliced from the top and in each layer mean intensity and relative return density were computed.Relative return density was calculated using a cylinder, defined by the layer thickness and a radius set to the maximum horizontal distance from the center to any return; the density in the layer was then divided by the density in all cylinders.

Distributions
A common way for estimating forest variables from ALS-data is by using features describing the height distribution of the point cloud.One common feature variant is height percentiles [10] (i.e., the height below which a certain percentage of returns are found [42]).Percentiles describe a distribution, and can be applied to height values or intensity values.Points above two meters were used to compute distribution features.To compute the vegetation ratio (dns), all returns, including ground returns, were used.The distribution feature subsets consisted of the features presented in Table 2. Subscripts to percentile features indicates what percentile it is, where 1 stands for the 1st percentile and 95 for the 95th.To make the height percentiles independent of the tree height, they were all normalized by dividing them with the 99th height percentile.These features describe height and intensity distribution separately, but not any relation between them.

Classification Model and Feature Selection
Classification of tree species was made using linear discriminant analysis (LDA) models [43].The classification accuracy serves only as a means to compare feature sets.LDA is a parametric model and its simplicity makes it easy to understand and apply.Compared to non-parametric models, the method is less susceptible to overfitting, but if used with too many explanatory features in combination with a too small sample size overfitting may become a problem.To minimize overfitting by selecting a limited number of features, two methods for feature selection were used: feature ranking [44] and stepwise feature selection [45].When selecting features, the criterion to stop adding more features was that the feature combination had to have fewer features than there were classes (genera).Stepwise feature selection was also restricted so that additional features were not allowed to make the classification accuracy worse; if no additional feature was found, the algorithm would finish with less than the maximum allowed number of features [45].
Single feature performance was evaluated using the F-ratio (also known as Fisher's criterion or the F statistic, commonly used in ANOVA) [46], which describes how much variability there is between the classes compared to within classes.This ranking is optimal for LDA [44] and features could be ordered by their F-ratio.To compare which individual features contained the most information on tree species, the best performing feature in each feature set and subset were extracted and used in LDA.Results were recorded as a model where included features were ordered by their F-ratio and an accuracy for the model.Accuracy is a measurement of model accuracy, namely the number of accurately classified trees as a percentage of the total [47].
Two features that by themselves provide a good basis for classification may not be the best combination, if they tend to provide similar information [44].Leave-one-out cross-validation accuracy was used to evaluate combinations of features selected by a stepwise feature selection algorithm [45].This method does not necessarily find the best combination, due to its heuristic nature, but it does find a good combination.

Species Level Classification Accuracy
Accuracy can be evaluated in different ways.To examine classification accuracy and confusion between species, confusion matrices were used.Accuracies were measured by leave-one-out cross-validation, the user's, producer's and overall accuracy were calculated from the classification made in the cross validation.To investigate how structural and spectral features contributed to classification accuracy of different species, a number of confusion matrices were created.The most interesting model should be the one that performs best, so one confusion matrix was made using the feature combination that gave the highest accuracy.An LDA-model was created using those features and cross-validated classification was used in the matrix.Additional confusion matrices were created for models built on only spectral and structural subsets of the feature set from which the best performing model was acquired.

Features for Tree Species Classification
Features used for tree species classification chosen by their F-ratio are presented in Table 3, ordered by the feature with the best (highest) F-ratio in that feature set to the worst of the eight presented, while those with a lower F-ratio are not included.
The ellipsoid data had four ellipsoid layers, numbered from the outside and inwards, from zero to three, and data from outside of the outermost ellipsoid, with an e as subscript.In horizontal-layer data, seven layers had been constructed, numbered from top to bottom zero to six with no features outside of the top or bottom layer.As described in Section 2.2.2 Distributions, the distribution statistics had 11 features for the intensity distribution and height distribution each.Percentiles used in both intensity and height percentile features were 1, 5, 10, 25, 50, 75, 90, and 95.There was also a 99th intensity percentile.
When the eight features with the highest F-ratio were selected, the only structural feature among them were the R hv -ratio, i.e., the ellipsoid shape of the crown.Another notable property of these features is that only two are from the 532 nm (C3) data, namely intensity outside of the ellipsoid and intensity in the second layer from the top, the rest are from 1064 nm (C2) and 1550 nm (C1).
In general features with information from the upper part of the trees contained more information on tree species.The three features from the ellipsoid feature set with the highest F-ratio were those with intensity information from outside the outermost ellipsoid.Horizontal layers ranging in number from zero to five were among the top eight when ordered by F-ratio, but five of eight features were from the three top horizontal layers.From the distribution feature set, only features from the average or median height and up (depending on distribution skewness) were among the eight best features; none of the 1st, 5th, 10th of 25th percentile features were among the eight with the highest F-ratio.
Table 3. Features ranked by their F-ratio.Both spectral and structural features were ranked.Feature number one was the best in the feature set to use if only one were to be used for classification.J, K and Q are intensity variables, and R hv is a value describing the shape of the ellipsoid.Superscripts indicate the channel.Subscripts of J indicate ellipsoidal layer enumerated from the outermost (0) to the innermost and e pertains to what is outside the outermost ellipsoid.Subscripts of K indicate horizontal layer enumerated from the topmost (0) to the bottommost.Subscripts of Q is the intensity percentile number, and Q is the mean intensity.

Feature
F-ratio 25.7 24.4 20.9 20.9 19.9 17.7 17.2 16.5 F-ratio 75.3 72.2 71.2 65.9 61.1 60.6 59.4 57.9 Combinations of features that could be used to produce a model with a high accuracy are presented in Table 4.Each row presents the best combination of features from that feature set, which were found using stepwise feature selection.

Feature Set Features
When features were combined to create an accurate classification model using stepwise feature selection, features from all channels were selected and all combinations had both spectral and structural features present.The combination from the ellipsoid feature set was primarily composed of intensity metrics from the outer parts of the tree crown; the only structural feature was R hv .
The feature combination from the horizontal layer feature set used four intensity features and three structural features.Only one, an intensity feature, was from channel 2 (i.e., 1064 nm).In the combination selected from the distribution feature set, intensity percentiles from the 1st to the 95th were included.Only one height distribution was selected by the algorithm: P C1 90 .

Spectral and Structural Information
Regardless of what feature set was used, multispectral data always gave a higher cross-validation accuracy than monospectral data, as shown in Table 5.The model using multispectral data and all features from the distribution data set had the highest accuracy.The greatest difference between multispectral and monospectral data was found when all variables were available for selection.The smallest difference between monospectral and multispectral data in accuracy was in the models produced with structural features only.
Table 5. Accuracy as a percentage for models built using first returns only.Rows are subsets of either intensity, structure or all features from the feature sets that are the columns.Accuracies using multispectral and monospectral (1064 nm) data are presented.

Species Level Classification Accuracy
In the confusion matrix of Table 6, user's, producer's and overall classification accuracies of the best performing model are presented.The stepwise feature selection algorithm chose seven intensity and only one structural feature from the distribution feature set.These were Table 6.Confusion matrix for the best performing model (features used are: The best performing model had a cross-validation accuracy of 76.5%.Spruce and pine had very high accuracies, the former with a 100% user's accuracy and the latter with 100% producers accuracy.Only one spruce was erroneously classified as a pine.Classification accuracy of maple, birch, ash and oak was fairly high, both in user's and producer's accuracy.Alder and linden were the two species with the lowest accuracies.
The model built using only spectral features resulted in accuracies as shown in Table 7. Results are almost identical to those in Table 6 with the exception of cherry, where none were correctly classified.
When only structural features were used, user's and producer's accuracies were fairly good only for pine (Table 8).No alders, maples, birches or cherry trees were correctly identified and most trees were classified as oaks.

Discussion
Tree species classification using both structure and intensity features from multispectral ALS data resulted in the highest accuracy.Spruce, pine and deciduous classes were well discriminated from each other, as shown in the confusion matrix of Table 6.The species that were confused with each other were mainly deciduous.Alder and cherry trees were relatively few in this sample, seven and five respectively, which made errors in those classes more difficult to interpret.Linden, with 23 individual trees, seemed to be easily confused with all other deciduous species except cherry.
The best model had a cross-validated accuracy of 76.5%.When this accuracy is compared to that only monospectral (at best 67.6%) or structural data (at best 43.0%), the results suggest that a model built on multispectral features might be useful for both nature inventory and forestry planning.
The ellipsoid and horizontal layer features were constructed to find combinations of spectral and structural features within an ellipsoid or horizontal layer that could be used for tree species identification; intensity features were related to structural features by deriving them from the same ellipsoid or horizontal layer.Distribution features, on the other hand, did not relate height and intensity distribution to each other in such a way.
Ellipsoid features that were highly ranked or selected were mainly from the outer parts of the crown and highly ranked or selected horizontal features were from the upper part of the crown.
Together, these results suggest that the upper and outer part of the tree crown is the most interesting part for species classification.
The best model used only one structural feature and seven intensity features.When features were ranked by their F-ratio (Table 3), the best were intensity features.This indicates that spectral data is highly relevant for tree species identification, which is known to be the case when using infrared aerial photography [35].In Table 8, pine seems to be the only species that can be easily identified using structural features only, but it can also be very well classified using intensity metrics only (Table 7).Even when using only the 1064 nm channel, the intensity features gave a higher accuracy than the structural features (Table 5).
Of all the features in Table 3, only two of 24 came from the 532 nm data (C3).When features were selected by stepwise feature selection, it resulted in eight of 24 features from the 532 nm data.This indicates that, while 532 nm data might not be useful as a single source of information, it is more useful when combined with other channels as it provides different information.The exact cause of the relatively poor performance of 532 nm data by itself can not be identified without further studies.
There are nonetheless some ways in which channel 3 differs from the other two channels.Due to eye safety regulations, the divergence has to be greater for visible light than for near infrared and short-wave infrared channels.As a result, the footprint of the green channel is larger and has less power per area.
Many ALS systems have an adaptive output effect, meaning that if there is a drop in return intensity, the output effect will increase.The value of the output intensity is not always recorded and this limits intensity-based classification to data from those ALS-systems that have either constant output effect or record the output effect.In the Optech Titan X system, the output effect is held constant, allowing return intensity variations to become apparent.ALS-data were gathered from a fairly low flying altitude of 400 m.This is not practical as each swath will only cover a small area.As a comparison, the New National Height model of Sweden was created using data gathered from an altitude of 1700 to 2300 m [40].Some features may be more difficult than others to construct in a closed-canopy stand, without information about the live crown height.Ellipsoid features might have to be adapted for a real-world usage.One way to adapt them could be by using a segmentation algorithm and fit half an ellipsoid over the automatically segmented crown.
Both the field and ALS data were collected for very specific purposes.Due to the large number of classes, some classes contained very few trees (e.g., cherry, with only five trees).The data was also collected from a small geographic area that may have resulted in a higher accuracy than if a separate validation data set from a larger area would have been used.The classification accuracies presented should be taken as a measurement of the usefulness of the features and feature sets, rather than as related to the accuracy of the model.Only solitary trees with clearly distinguishable crowns were used, to make exploration of data easier.Solitary spruce trees were scarce and therefore individuals from a closed-canopy stand were used.Spruce crowns may have grown into each other and become difficult to separate, meaning that structural features no longer describe an individual tree, but rather a mixture of neighboring trees.
When choosing and calculating candidate features, a number of subjective choices were made.The ellipsoid and horizontal layer thicknesses were both set to 0.5 m.Distribution features were also chosen subjectively.A cutoff was made at two meters from the ground to generate the percentiles.There were more high percentiles than low because it was presumed that a high value might be more interesting than a lower ones.This assumption seems to be true, as none of the lower percentiles were included in Tables 3 or 4. The feature sets used have been treated as separate; there is, however, no restriction for combining different feature sets.Technically, there is an endless amount of different theoretical features that could be extracted from a point cloud, forcing a choice to be made.Objective alternatives to manual feature design are machine learning algorithms that creates features.The drawback with such methods would be that features could become very synthetic and hard to interpret.
Only first returns were used, discarding information in subsequent returns.Proportions of first and last returns have been used as a basis for tree species classification, with a high accuracy for classifying pine and spruce [4].No similar combination of intensities in different channels, such as an analogue to normalized difference vegetation index [23], was used as a feature.
Two previous studies have used the Optech Titan X scanner for tree species classification.Budei et al. [33] classified the species and tree genera of individual trees from multispectral ALS data.The error was 3-5% for classification of broadleaf and coniferous trees, 13% for four genera, 20% for seven genera and 24% for ten different species.Yu et al. [34] classified the species of individual trees of three different species (i.e., pine, spruce and birch) from multispectral ALS data.The overall accuracy was 90.5% for isolated and dominant trees but lower for groups of trees and trees close to or below larger trees.The classification accuracy is difficult to compare between different studies and data, especially if the studies are done in different forest types.This study has used a material that was collected with the purpose to explore remote sensing data, as opposed to creating an optimal model for classification, which makes it even harder to compare the results to other studies.In general, classification of species has lower accuracy than classification of groups or genera.
Tree species classification from ALS data often includes features describing the shape and proportions of the tree crowns such as the ratio of crown length and tree height, the ratio of crown width and crown length, and the crown volume [10].Other commonly used features are related to the number of returns per emitted pulse and radiometric information such as the amplitude, echo width and backscatter cross-section or product of echo amplitude and width for waveform ALS data [10].In this study, the shape and proportions of the tree crowns were included by using percentiles of the vertical distribution of laser returns.According to Koenig and Höfle [10], features related to the signal strength at the upper crown parts and the vertical distribution of scatterers and their backscatter characteristics are the most important.This is consistent with the findings in this study that most of the information about tree species is found in the top layer of the canopy.

Conclusions
This study, based on 179 mature solitary trees from nine genera, provides further evidence that multispectral ALS data provide useful information for the classification of tree species.The best model, which had a cross-validated accuracy of 76.5%, used one structural and seven spectral features.When structural features (43.0%maximum accuracy) is compared with spectral features (76.0%maximum accuracy), it seems as if spectral data provides a better basis for tree species classification.High percentile values of the intensity distribution provided the most information.When the tree crown was measured in fitted ellipsoids or in layers, the results suggest that the upper and outer parts of the crown are most interesting for tree species classification.Solitary trees differ from those in closed-canopy stands, but since most information comes from the upper parts, this should prove no problem for classification in closed-canopy stands.

Figure 1 .
Figure 1.Map of the study site and the area covered by the laser scanning.The backdrop is a raster created from average intensity in each channel, where the red color band represents 1550 nm, the green represents 1064 nm and the blue represents 532 nm.Holes (light yellow) are present due to filtering of returns below two meters above ground.

Figure 2 .
Figure 2. A visualization of the ellipsoidal layers.Points in each layer were used to calculate an ellipsoidal feature.

Table 4 .
Features from different feature sets selected by stepwise feature selection.Only first returns and complete feature sets were used.J, K and Q are intensity features.E, R hv , L and P are structural features.Superscripts indicate channel.Subscripts of J indicate ellipsoidal layer enumerated from the outermost (0) to the innermost and e pertains to what is outside the outermost ellipsoid.Subscripts of K and L indicate horizontal layer enumerated from the topmost (0) to the bottommost.Subscripts of P and Q are height and intensity percentile number respectively.Q is the mean intensity.

Table 1 .
Summary of field data by species, n is the number of individuals in the class.

Table 2 .
Features of the distribution feature set.Percentile is denoted as a subscript.

Table 7 .
Confusion matrix for the model created by using only the spectral feature subset from the feature set resulting in the best performing model (features used are: QC1 , Q

Table 8 .
Confusion matrix for the model created by using only the structural feature subset from the feature set resulting in the best performing model (features used are: P C1 10 , dns C1 , P C2 10 and P C3 25 ).