Airborne Dual-Wavelength LiDAR Data for Classifying Land Cover

This study demonstrates the potential of dual-wavelength airborne light detection and ranging (LiDAR) data for classifying land cover. The dual-wavelength data were acquired from two airborne LiDAR systems that emit laser pulses at near-infrared (NIR) and middle-infrared (MIR) wavelengths. The major features of the LiDAR data, namely surface height, echo width, and dual-wavelength amplitude, were used to characterize land cover. Based on these features, a support vector machine was used to classify six types of suburban land cover: road and gravel, bare soil, low vegetation, high vegetation, roofs, and water bodies. The results show that dual-wavelength LiDAR-derived information (e.g., amplitudes at NIR and MIR wavelengths) can compensate for the limitations of single-wavelength LiDAR information (i.e., poor discrimination of low vegetation) when classifying land cover.


Introduction
Airborne light detection and ranging (LiDAR), which measures distance by illuminating a target with a laser, is used for the rapid collection of geolocated elevation data from the surface of the earth. The positions of the targets are obtained using a positioning and orientation system. Increasing numbers of researchers have used airborne LiDAR data in landscape mapping [1,2]. LiDAR data typically contain 3D spatial point clouds and the intensity of returns (echoes), and the penetration capability of the laser makes LiDAR better suited than photogrammetry for identifying vegetation. Land cover can be classified automatically from the geometric properties of LiDAR data [1,3]. Moreover, multispectral images and LiDAR data together provide a large amount of spectral and geometric information for land cover classification. The combination of LiDAR data with either multispectral [4][5][6] or hyperspectral [7] imagery has been demonstrated to improve land cover classification.
Recently, LiDAR technology has developed into full-waveform LiDAR systems, which record the complete waveform of a backscattered signal echo [8]. A full-waveform LiDAR collects a continuous signal for each pulse, whereas a discrete-return LiDAR collects only four to five discrete points. Previous studies [8][9][10] have indicated that waveform LiDAR data record more physical characteristics than discrete-return LiDAR data. These physical characteristics affect the shape of the waveforms and potentially benefit land cover classification. For example, the waveform of an echo is wider over canopy or ploughed fields than over roads [8]. Each waveform is commonly represented by a Gaussian mixture model that is produced using a Gaussian decomposition process [11]. Each return echo is represented by a Gaussian function, and the Gaussian parameters can be used to characterize the physical features of the echoes. For example, the echo width (Gaussian standard deviation) obtained from full-waveform data after decomposition, which is unavailable from discrete-return LiDAR data, has proven useful for land cover classification [12][13][14]. The signal-processing step extracts various features from the waveforms, such as echo width [14,15], amplitude [15], intensity [15], rise/fall time [9] and Fourier coefficients [10,16], which are used to classify land cover and identify tree species. These features have established the value of waveform LiDAR data for land cover classification.
Although most commercial airborne LiDAR systems emit laser radiation at a single wavelength, multispectral LiDAR (MSL) systems that emit laser radiation at several wavelengths have recently been developed. Because MSL data combine the return laser intensities at several wavelengths, they can be used to derive products such as the normalized difference vegetation index (NDVI) [17] and tree structure segmentation [3], which cannot be obtained from single-wavelength LiDAR data [18,19]. Several potential applications of MSL systems have been demonstrated. Chlorophyll content retrieval with hyperspectral LiDAR was reported in [20], and NDVI with multispectral LiDAR was studied in [21,22]. Morsdorf et al. [23] simulated an MSL waveform system to demonstrate its ability to capture a vertical profile of leaf-level physiology. A dual-wavelength LiDAR can separate canopy returns from ground returns [24]. The dual-wavelength LiDAR system, a current form of MSL, has been used for specific applications, such as measuring coastal water depths using green and near-infrared (NIR) bathymetric LiDARs [25], measuring NDVI using red and NIR LiDARs [26] and measuring the moisture content of vegetation using NIR and middle-infrared (MIR) LiDARs [27]. However, most dual-wavelength or MSL systems are bench-mounted test instruments or are used in experimental terrestrial operations. MSL has not yet been used for land measurement from airborne platforms, as it is still at an experimental stage.
The classification of land cover over regional areas using remote sensing is essential. In this study, airborne dual-wavelength LiDAR data were obtained by combining two commercial airborne LiDAR systems that emit NIR and MIR laser pulses. The results demonstrate the potential of dual-wavelength airborne LiDAR data for investigating land cover types. The dual-wavelength amplitude information and waveform features were used to classify land cover. A progressive classification test was conducted to demonstrate that dual-wavelength LiDAR data yield more accurate land cover classification than single-wavelength LiDAR data.

Study Area and Remote Sensing Data
Figure 1a shows the study area, Namasha (Namaxia), which is located on a hillside in southern Taiwan. Namasha, a well-known source of precious wood, is a suburban district in the northeastern part of Kaohsiung City, upstream in the Kao-ping river watershed. This area was severely damaged by Typhoon Morakot in 2009. The study area covers 0.95 km², with an average elevation of approximately 722 m and an average slope of approximately 18°.
Table 1 shows the dual-wavelength data configuration of the two LiDAR systems. LiDAR data were acquired using the Optech ALTM Pegasus HD400 and the Riegl LMS-Q680i systems. The Optech system emits NIR laser pulses at a wavelength of 1,064 nm [28], whereas the Riegl system emits MIR laser pulses at a wavelength of 1,550 nm [29]. The dual-wavelength LiDAR data were obtained by integrating the two systems because no airborne dual-wavelength (e.g., NIR-MIR) LiDAR system was available at the time. Most land cover in the study area did not change during the experimental period. The radiometric correction for each LiDAR system has been determined [30]. Further correction of dual-wavelength LiDAR systems will be considered for advanced usage [31]. The accuracy of the collected LiDAR data can be verified by comparison with independently surveyed ground control points. Both systems yielded errors of less than 0.40 m horizontally and less than 0.10 m vertically.
An IGI DigiCAM operated with the Riegl LMS-Q680i system was used to produce an orthoimage. To develop a reference dataset for validating the classification results, we identified six classes of land cover on this orthoimage. The classes were selected based on the landscape of the test area: road and gravel (R&G), bare soil (SOIL), low vegetation (LV), high vegetation (HV), roofs (ROOF) and water bodies (WATER). R&G comprised the asphalt and gravel along the western side of the river and on the south side of the study area. LV comprised grass, low crops and other vegetation shorter than 2 m. HV comprised vegetation taller than 2 m, such as broadleaf evergreen forests. Water absorbs most of the incoming radiation [32], which can result in low-intensity LiDAR returns or few returns from water bodies. In this study, low-intensity points were returned from water bodies in the Optech system, whereas few return points from water bodies were observed in the Riegl system. Previous studies have used LiDAR data from water bodies to delineate river boundaries [33].
Figure 1b shows the locations of the reference samples used for training and testing. Different classes of land cover are often mixed within a small area. For example, when LV is not dense, SOIL and LV may mix and become difficult to separate. Thus, two rules were used to select the reference samples. First, the pixels of a reference sample must be clearly recognizable. Second, the reference samples must be pure, containing only one class of land cover. For example, an area containing a mixture of grass (LV) and trees (HV) would not be considered a reference sample.
The waveform recorded by each LiDAR system represents the interactions between the emitted laser and the illuminated objects along the laser path. Multi-return echoes are recorded in the laser waveform, and the waveform data can be decomposed into individual components to characterize the original waveform and its echoes [34]. In the Gaussian decomposition method, which has been widely applied [11,13,14,35], a Gaussian function represents each decomposed component; this method was used in this study to decompose each waveform into individual echo components. After decomposition, a Gaussian mixture representing a waveform with multiple distinct components was obtained. Each component was described using three Gaussian parameters, namely, mean, amplitude and standard deviation. The Gaussian mean of each component was combined with the attitude information of the system at the time the laser was fired to map the 3D coordinates of each object. The echo amplitude and standard deviation were then attached to each 3D point as attributes. The amplitude and standard deviation of the first LiDAR echo are termed "amplitude" and "echo width" hereafter.
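For reference, the Gaussian mixture fitted during decomposition can be written in the following generic form. The notation is assumed here for illustration and is not taken from the source: w(t) is the recorded waveform at time t, K is the number of echoes, and A_k, \mu_k and \sigma_k are the amplitude, mean and standard deviation (echo width) of the k-th echo. Any background noise term is omitted.

```latex
w(t) \approx \sum_{k=1}^{K} A_k \exp\!\left( -\frac{(t - \mu_k)^2}{2\sigma_k^2} \right)
```

In this form, the first echo (k = 1) supplies the "amplitude" A_1 and "echo width" \sigma_1 used as features, and \mu_1 gives the range from which the 3D position is computed.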

Data Integration and Feature Selection
Most land covers produce one major echo, except trees and building roofs. Only the first return (echo) extracted from each full waveform was used to analyze the land cover. To integrate the LiDAR data, the sample points from the two LiDAR systems were interpolated into gridded images at 1-m resolution using a moving average within a circle of 2-m radius, and the grids were integrated for subsequent processing. Based on the LiDAR data characteristics, the following features were derived from both the Riegl and Optech systems: (1) amplitude; (2) echo width; and (3) surface height. Surface height is the height of the land cover above the ground, that is, the difference between the digital surface model (DSM) and the ground elevation. The ground elevation was obtained from digital elevation models (DEMs) that were, in turn, produced by processing the point clouds using TerraScan (TerraSolid software) and manual procedures. First, TerraScan was applied to filter out non-ground points automatically. Manual inspection and editing were subsequently conducted to ensure the quality of the ground data points.
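The gridding and surface-height steps can be sketched as follows. This is a minimal, unoptimized illustration in Python, not the processing chain used in the study; the function names, the cell-centre convention and the use of `None` for empty cells are assumptions made for the example.

```python
import math


def grid_moving_average(points, x0, y0, ncols, nrows, cell=1.0, radius=2.0):
    """Interpolate scattered points onto a regular grid.

    Each cell value is the mean of all point values lying within `radius`
    of the cell centre (moving average in a circle). `points` is a list of
    (x, y, value) tuples; cells with no neighbouring points become None.
    """
    grid = []
    for r in range(nrows):
        cy = y0 + (r + 0.5) * cell  # cell-centre y coordinate
        row = []
        for c in range(ncols):
            cx = x0 + (c + 0.5) * cell  # cell-centre x coordinate
            vals = [v for x, y, v in points
                    if math.hypot(x - cx, y - cy) <= radius]
            row.append(sum(vals) / len(vals) if vals else None)
        grid.append(row)
    return grid


def surface_height(dsm, dem):
    """Surface height per cell: DSM minus ground elevation (DEM)."""
    return [[None if (s is None or g is None) else s - g
             for s, g in zip(dsm_row, dem_row)]
            for dsm_row, dem_row in zip(dsm, dem)]


# Hypothetical first-return elevations (x, y, z) in metres.
first_returns = [(0.5, 0.5, 10.0), (1.5, 0.5, 12.0), (10.5, 0.5, 50.0)]
dsm = grid_moving_average(first_returns, x0=0.0, y0=0.0, ncols=2, nrows=1)
```

The brute-force neighbour search is quadratic in the worst case; a spatial index would be needed for real point clouds. The same gridding function can be applied to amplitude and echo width by passing those values in place of elevation.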
Major features were selected using the Bhattacharyya distance (separability) [36], which is widely used in feature selection and extraction studies. The Bhattacharyya distance, B, was used as a class-separability measure between two land cover types under the assumption of multivariate normality, and is expressed as follows:

$$B = \frac{1}{8}(M_1 - M_2)^{T}\left[\frac{C_1 + C_2}{2}\right]^{-1}(M_1 - M_2) + \frac{1}{2}\ln\frac{\left|\frac{C_1 + C_2}{2}\right|}{\sqrt{|C_1|\,|C_2|}}$$

where $M_i$ and $C_i$ are the mean vector and covariance matrix of class i, respectively. Lower values of the Bhattacharyya distance indicate less separable classes and higher classification errors. Based on the relation between the Bhattacharyya distance and classification error shown in the graph in [36], a Bhattacharyya distance of 1 was adopted as the separability criterion, which corresponds to a classification error of less than 10%.
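For a single feature, the Bhattacharyya distance between two normally distributed classes reduces to a scalar expression in the class means and variances. The sketch below computes that univariate case with the Python standard library; it is an illustration only, the function names are hypothetical, and the study itself used multivariate feature sets.

```python
import math


def class_stats(samples):
    """Return the sample mean and unbiased sample variance of one class."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var


def bhattacharyya_1d(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate normal classes."""
    avg_var = 0.5 * (var1 + var2)
    # Term driven by the difference between class means.
    mean_term = 0.125 * (mean1 - mean2) ** 2 / avg_var
    # Term driven by the difference between class variances.
    var_term = 0.5 * math.log(avg_var / math.sqrt(var1 * var2))
    return mean_term + var_term


# Two hypothetical classes with equal variance and means four units apart.
b = bhattacharyya_1d(0.0, 1.0, 4.0, 1.0)
separable = b >= 1.0  # criterion of 1 adopted in the text
```

With equal variances the second term vanishes, so the distance depends only on the squared mean difference scaled by the variance.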

Classification
The support vector machine (SVM), a supervised classification algorithm, is an effective classification method. SVM can combine data from diverse sources, is robust to high dimensionality, and handles non-linear problems effectively in remote sensing applications [37]. The SVM kernel used in this study was the Gaussian radial basis function. The SVM algorithm was implemented using functions from MATLAB (R2012a). Six classes (R&G, SOIL, LV, HV, ROOF and WATER) were chosen as the land cover categories. Amplitude, surface height and echo width from the Riegl and Optech systems were used as the major features for classification. From the reference (sampling) data, 1% of the samples in each class were selected as training data for the SVM classifier. After the SVM classifier was trained, all reference data except the training data were treated as validation data. Various LiDAR feature sets were used in the progressive classification test, and a confusion matrix was calculated for each feature set to assess the classification results.
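The per-class 1% training split described above can be sketched as follows. The source does not state how the training samples were drawn, so random sampling with a fixed seed is an assumption, as are the function name and the minimum of one training sample per class. The SVM itself was run in MATLAB and is not reproduced here.

```python
import random
from collections import defaultdict


def split_training(labels, fraction=0.01, seed=0):
    """Pick `fraction` of the samples of each class for training.

    Returns (train_indices, validation_indices). Every class contributes
    at least one training sample; all remaining samples are validation.
    """
    rng = random.Random(seed)  # fixed seed for a repeatable split
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train = []
    for indices in by_class.values():
        k = max(1, round(fraction * len(indices)))
        train.extend(rng.sample(indices, k))

    train_set = set(train)
    valid = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), valid


# Hypothetical reference labels: 200 LV pixels and 300 HV pixels.
labels = ["LV"] * 200 + ["HV"] * 300
train_idx, valid_idx = split_training(labels)
```

Sampling within each class keeps small classes such as WATER represented in the training data, which a single global 1% draw would not guarantee.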

Analysis of Features
Figure 3 shows the distributions of the amplitude, surface height and echo width of the image pixels from the six classes in the reference data. The amplitude values from the Riegl system allowed three groups, namely, WATER, {R&G, LV, HV} and {SOIL, ROOF}, to be distinguished. The amplitude feature from the Optech system improved the separation of WATER, R&G and the remaining classes. The merits of using both amplitudes for classifying land cover are reflected in the accuracy of the preliminary classification. The surface height from the Riegl and Optech systems provided information for separating {R&G, SOIL, LV, WATER} from {HV, ROOF}. The echo width information from the Riegl system indicated two groups, namely, WATER and {R&G, SOIL, LV, HV, ROOF}.
In summary (Figure 3), WATER can be readily classified using most of the features. R&G can be classified using amplitude information from the Optech system. SOIL can be separated from other classes by combining data on amplitude and surface height from the Riegl system. HV can be classified by combining amplitude and surface height information from the Riegl system, and ROOF can be classified by combining all features.

Figure 3. Frequency distribution of (a) the amplitude from the Riegl system, (b) the amplitude from the Optech system, (c) the surface height from the Riegl system, (d) the surface height from the Optech system, (e) the echo width from the Riegl system and (f) the echo width from the Optech system.

Feature Selection Using Bhattacharyya Distance
Table 2 lists the Bhattacharyya distances among the classes for different feature sets. The surface height and echo width features performed consistently between the Riegl and Optech systems. The Riegl surface height and echo width were eventually selected as major features based on a comparison of the determinants of the Bhattacharyya distance matrices, which were larger for the Riegl features than for the Optech features. When the Riegl surface height information was considered, classes such as HV and ROOF could be separated from the other classes. When the Riegl echo width information was considered, the Bhattacharyya distances between HV and SOIL and between HV and R&G were 0.85 and 0.83, respectively.
The Riegl and Optech systems provided complementary amplitude information for land cover discrimination. With the Optech amplitude, the separability was 1.68 between LV and R&G but only 0.21 between LV and SOIL. With the Riegl amplitude, the separability was 0.44 between LV and R&G but 1.98 between LV and SOIL. The same complementarity occurred for HV: compared with the separability values obtained using the Riegl amplitude, those obtained using the Optech amplitude were higher for HV and R&G but lower for HV and SOIL. When both sets of amplitude information were considered, the separability between LV and R&G and between LV and SOIL increased, and all land cover classes became separable, except ROOF and SOIL and ROOF and LV.
A feature is more critical if it yields higher separability among all land cover types. Moreover, feature separability is highly related to classification accuracy. Amplitude is a dominant feature that varies with the radiometric and geometric properties of the targets [38]. The measured amplitudes are high for bare soil and grass and low for water and roads. However, the amplitude for high vegetation and building roofs varies depending on the materials and sensors. LiDAR-based features, such as laser intensity, amplitude, surface height, and topographic data, are primarily used to classify land cover [39]. The feature information of LiDAR data is critical for increasing the discriminability of the LV and HV classes because these classes have similar spectral signatures [40].
Numerous applications described in the introduction (e.g., chlorophyll or NDVI retrieval) are possible with dual-wavelength LiDAR data. Future studies should examine the potential of dual-wavelength LiDAR data for extracting details of vegetation species. When commercial MSL becomes available for airborne platforms, the instruments will contain many more wavelengths to improve separability. Key information about vegetation, such as chlorophyll content, NDVI and moisture content, can be derived from MSL data. Applications such as vegetation species recognition and forest ecosystem estimation would be expected to benefit from this information.

Classification Accuracies
Table 3 shows the confusion matrices of the classification results using various feature sets. Based on the feature set ϕ₁, which comprised the surface height and echo width, the overall accuracy of the classification reached 84.29%. However, the producer accuracy was extremely low for LV, and many LV pixels were misclassified as R&G and SOIL. Thus, the user accuracy was poor for R&G and SOIL. For the other classes (R&G, SOIL, HV, ROOF and WATER), ϕ₁ provided sufficient information for classification. With the feature set ϕ₂, which added the Optech amplitude, the overall accuracy reached 90.00%. With the feature set ϕ₃, which comprised the Riegl amplitude, surface height and echo width, the overall accuracy reached 91.63%. LV was misclassified as R&G more frequently using the Riegl amplitude than using the Optech amplitude. However, SOIL was misclassified less using the Riegl amplitude than using the Optech amplitude (Table 3). User accuracy for SOIL and ROOF was higher using the Riegl amplitude, whereas user accuracy for R&G and LV was higher using the Optech amplitude. When the feature set ϕ₄ (surface height, echo width and dual-wavelength amplitude) was used, the overall classification accuracy increased substantially compared with using a single system. The producer accuracy for LV increased to 88.3%, from 44.82% and 51.90% for the single systems, and all producer and user accuracies exceeded 90%, except the LV producer accuracy. The overall accuracy (97.4%) and Kappa (0.966) values were highest when features including the dual-wavelength amplitude were used. Without the echo width, in ϕ₅ (surface height and dual-wavelength amplitude), the overall accuracy decreased to 96.8% and the Kappa value decreased to 0.959. Thus, the echo width could be discarded because of its small effect on the classification.
Figure 4 shows the land cover classification results based on the various datasets. Most land covers were classified more accurately with the dual-wavelength features, which indicates the effectiveness of dual-wavelength airborne LiDAR data for classifying land cover. Because the reflectance of land cover objects varies with wavelength, some land cover pairs (e.g., LV and HV, SOIL and LV) cannot be readily distinguished when amplitude information at a single wavelength is used. The dual-wavelength features are primarily responsible for the improvement in land cover classification demonstrated in this study.
LiDAR data also offer effective geometric information for classifying land cover. First, LiDAR data provide 3D information; thus, the DSM, DEM, and surface height can be obtained directly. Second, LiDAR data can record multiple returns in forest areas. The canopy reflectance information in spectral images is considerably influenced by the objects under the canopy. Dual-wavelength LiDAR amplitude and geometric information for the canopy, understory vegetation, soil, and other land cover types represent the features of these covers precisely. By contrast, in a spectral image, the canopy signal cannot be readily separated from that of the understory vegetation and soil. Thus, LiDAR data are potentially useful for classifying tree species in 3D. Third, current LiDAR systems can record waveform data that allow physical features to be extracted, such as the echo width used in this study. These features cannot be obtained from discrete-return LiDAR. All these features, including the dual-wavelength amplitude features, facilitate land cover classification, as demonstrated by the current findings. Therefore, this study revealed the potential of dual-wavelength LiDAR applications, which can be developed when airborne dual-wavelength LiDAR systems become available. From a practical perspective, the combination of LiDAR and multispectral images will be useful for land cover classification.
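The accuracy measures reported in this section (overall accuracy, producer accuracy, user accuracy and Kappa) are all derived from a confusion matrix. The sketch below shows one common way to compute them. It assumes that rows hold the reference classes and columns hold the classified classes, which may differ from the layout of Table 3, and the two-class matrix is made-up data for illustration.

```python
def accuracy_metrics(matrix):
    """Compute accuracy measures from a square confusion matrix.

    matrix[i][j] is the number of reference samples of class i that were
    classified as class j (rows: reference, columns: classified).
    Returns (overall, producer, user, kappa).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(n)]
    row_tot = [sum(matrix[i]) for i in range(n)]
    col_tot = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = sum(diag) / total
    # Producer accuracy: correct samples / reference samples per class.
    producer = [diag[i] / row_tot[i] for i in range(n)]
    # User accuracy: correct samples / samples assigned to the class.
    user = [diag[j] / col_tot[j] for j in range(n)]
    # Kappa: agreement beyond what the marginal totals give by chance.
    expected = sum(row_tot[i] * col_tot[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, producer, user, kappa


# Made-up two-class example (not data from the study).
overall, producer, user, kappa = accuracy_metrics([[45, 5], [10, 40]])
```

A low producer accuracy for a class means that many of its reference pixels were missed, whereas a low user accuracy means that many pixels assigned to the class belong elsewhere; this is why LV pixels misclassified as R&G and SOIL lower the LV producer accuracy and the R&G and SOIL user accuracies at the same time.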

Conclusion
In this study, two airborne LiDAR systems were used to obtain dual-wavelength LiDAR data (i.e., amplitudes at NIR and MIR wavelengths) and classify land cover. The proposed process involved waveform data processing, data integration, feature selection, and land cover classification. The findings show that dual-wavelength airborne LiDAR data can substantially improve land cover classification over large areas compared with single-wavelength LiDAR data. The dual-wavelength amplitude features identified vegetation, particularly LV, more accurately than single-wavelength amplitude.
Based on the major features of LiDAR data, land cover was effectively classified in the absence of auxiliary remote sensing data, and the overall classification accuracy reached 97.4%. Additional applications can be designed for this method once airborne dual-wavelength LiDAR systems are developed.

Figure 1. (a) Location of the study area; (b) location of the reference data for classification.

Figure 2 shows the processes used in the classification model, namely, data processing, data integration, feature selection and classification. Both the Optech and Riegl LiDAR systems can provide waveform data, recording an intensity signal that represents the interactions between the emitted laser and the illuminated objects along the laser path.

Table 1. Configuration of dual-wavelength data in the two light detection and ranging (LiDAR) systems.

Table 2. Bhattacharyya distance between land cover classes with different feature combinations.

* The surface height (h) and echo width (σ) are from the Riegl system; the numbers in parentheses are the corresponding values from the Optech system.
** A_Optech, A_Riegl: the Optech and Riegl amplitude information.