Using Multispectral Airborne LiDAR Data for Land/Water Discrimination: A Case Study at Lake Ontario, Canada

Coastal areas are environmentally sensitive and are affected by natural events and human activities. The land/water interface in coastal areas changes over time and, therefore, requires accurate detection and frequent monitoring. Multispectral Light Detection and Ranging (LiDAR) systems, which operate at different wavelengths, have recently become available. This new technology can provide an effective and accurate solution for the determination of the land/water interface. In this context, we aim to investigate a set of point features based on elevation, intensity, and geometry for this application, followed by a presentation of an unsupervised land/water discrimination method based on a seeded region growing algorithm. A multispectral airborne LiDAR sensor, the Optech Titan, was used to acquire LiDAR data at three wavelengths (1550, 1064, and 532 nm) over a study area covering part of Lake Ontario in Scarborough, Canada, for testing the discrimination methods. The elevation- and geometry-based features achieved average overall accuracies of 75.1% and 74.2%, respectively, while the intensity-based features achieved 63.9% accuracy. The region growing method succeeded in discriminating water from land with more than 99% overall accuracy, and the land/water boundary was delineated with an average root mean square error of 0.51 m. The automation of this method depends on receiving double returns from water bodies at the 532 nm wavelength.


Introduction
Water bodies are irreplaceable natural resources that are subject to intense exploitation and require monitoring for their sustainable management. Land/water discrimination is a valuable topic in terms of the services it provides to decision makers, such as water area estimation [1,2], flood monitoring [3,4], flood disaster assessment [5][6][7], hydrological regulation and erosion control [8], and water resources management [9][10][11].

Related Work
Over the past decades, remotely sensed data have been employed for bathymetric applications. Airborne Light Detection and Ranging (LiDAR) Bathymetry (ALB) systems have been developed to measure water depth in coastal zones [12]. ALB systems typically operate at dual wavelengths, namely near infrared (1064 nm) and green (532 nm), in order to detect water surfaces and benthic layers. Allouis et al. [13] presented a combination of mathematical and heuristic methods to estimate the water depth in very shallow water areas. The LiDAR waveforms, recorded at the near infrared and green wavelengths, were first synchronized, and the peak of the waveform at the near infrared wavelength was amplified. The approximate positions of the water surface and benthic layer were then obtained by a simple maxima-detection algorithm. After that, a non-linear least-squares fitting of a sum of two Gaussian functions to the green signal was conducted. Finally, the water depth was calculated from the accurate estimates of the water surface and benthic layer positions; the minimum detectable depth was around 1 m. This approach is computationally expensive and requires many processing steps; moreover, the delineation of the land/water interface was not reported in that study.
When the green laser contacts water molecules, a small portion of energy is returned at a wavelength of 647 nm, which is called the red (Raman) channel [14]. Pe'eri and Philpot [15] analyzed the recorded waveform at the Raman channel, which represents the received intensity (digital number) relative to time (in nanoseconds), and divided it into 41 bins. Then, a normalized difference index (NDI) was created between bin 11 and bin 27 to discriminate water from land by applying a threshold value to the NDI. The LiDAR measurements were manually classified into land, water, and suspected as water. Despite the good results achieved (97%), there is a transition class, "suspected as water," that cannot be identified as land or water. In addition, manual processing of the waveform is required for each tested dataset.
Over the past years, land/water discrimination has been investigated using LiDAR data collected by monochromatic-wavelength airborne LiDAR systems. LiDAR point features such as point density [16][17][18], roughness [18,19], and intensity variation [19] were extracted and used in the discrimination process. However, data-driven threshold values were manually selected to make those point features fit the tested data, which means the existing methods are ad hoc. In addition, the existing methods require prior knowledge of the land/water interface to aid in the supervised discrimination process.
Antonarakis et al. [20] introduced a supervised object-oriented approach to separate water bodies from land objects in urban areas. The water surfaces were assumed to have low elevation and intensity values, so the LiDAR points within a specific height range and with average intensity values less than specific threshold values were classified as water points. The manual setting of such threshold values led to more than 95% accuracy for the water class. Smeeckaert et al. [17] used a set of features to train a Support Vector Machine (SVM) algorithm in order to classify water areas. The results were then refined by incorporating contextual knowledge to remove pixel-wise misclassification. Their proposed work achieved an overall accuracy of more than 95% in coastal areas. Crasto et al. [18] developed a decision tree to classify water areas based on the point density, elevation standard deviation, and intensity standard deviation of the LiDAR data. The results revealed that the water areas were classified with more than 95% accuracy compared to manually interpreted data.
Brzank et al. [16] classified the water surface in the Wadden Sea using a supervised fuzzy classifier based on height, intensity, and 2D point density. For each feature, a membership value (between 0 and 1) was assigned to each LiDAR point, where the values "0" and "1" indicate the mudflat and water class, respectively. Two study areas were tested, and the results yielded a completeness and correctness value up to 99.2% and 98.5%, respectively. Schmidt et al. [21] introduced a point-wise supervised classification method based on geometric and intensity features. Eight features were extracted and used to build a classifier based on conditional random fields to distinguish between water, mudflat, and mussel bed classes at the Wadden Sea. The water class was classified with a completeness of 82.4% and a correctness of 66.3%.
Höfle et al. [19] proposed a workflow for water surface classification and land/water boundary delineation based on a region growing algorithm. The seed points were first selected based on a minimum intensity density threshold value. The growing criteria were conducted by applying threshold values on the height difference, intensity density, and intensity variation in order to create segments. All segments were then classified into land and water segments by applying a threshold on minimum segment size and mean segment roughness. The algorithm demonstrated an overall classification accuracy of 97% with a root mean square error of 0.45 m for the land/water boundaries. Despite achieving a higher overall accuracy, this method requires a significant pre-processing step, which is the model of LiDAR data dropouts. This step cannot be conducted without the knowledge of the pulse repetition frequency, Global Positioning System (GPS) timestamps, and the scan direction.

In addition, many threshold values were manually selected to fit the tested data. Moreover, this method could not discriminate the water surface from asphalt because both exhibited low intensity and a smooth surface.
Hooshyar et al. [22] delineated wet channels using LiDAR intensity and elevation data. The vegetation canopy was first filtered out by applying a threshold value to the elevations, and a digital elevation model (DEM) was created. Potential wet channel extents were then extracted based on their geomorphologic characteristics. In parallel, an intensity image was created from ground returns, and a Gaussian mixture model was used to decompose the intensity image into three classes, namely wet, transition, and dry. The edges of wet channels were then extracted from the gradient of the intensity image. Finally, the networks extracted from the LiDAR DEM and the intensity image were overlaid. However, the final network was disconnected because of the mismatch between the two networks. This is mainly due to distortion of the DEM through the generation process or errors in the DEM itself. Another possible reason for the disconnection is the low quality of the data and missing intensity values.
Supervised machine learning techniques such as SVM [17], decision trees [18], or random forests depend on the amount of available field sample data. These samples are used as training data, are usually limited, and affect the accuracy achieved by those methods [23]. In addition, the output results of those methods are produced using data-driven threshold values, which cannot be generalized to any dataset under different conditions. In this research, we aim to present an unsupervised and fully automated land/water discrimination method using LiDAR data acquired at different wavelengths. In addition, we use a seeded region growing algorithm that considers the spatial coherence of the LiDAR data by using the neighboring points of each LiDAR point.

Multispectral LiDAR Systems
Recently, several attempts have been made to use multispectral LiDAR data collected from either laboratory-based multispectral LiDAR systems or terrestrial laser scanning (TLS) for vegetation applications. Woodhouse et al. [24] used laboratory-based measurements from a tunable laser operating at four wavelengths of 531, 550, 660, and 780 nm. The normalized difference vegetation index and photochemical reflectance index [25] were then used to measure plant structure and its interaction with the environment. Shi et al. [26] used a laboratory-based prototype of a multispectral LiDAR, designed by [27], to record the reflectance values at four different wavelengths (556, 670, 700, and 780 nm). The physiology of the canopy was then studied using three vegetation indexes, and the reported overall classification accuracy was above 90%.
Regarding the TLS platforms, Puttonen et al. [28] used a TLS developed by the Finnish Geodetic Institute, called the Hyperspectral LiDAR system. The system was tested in an outdoor experiment using seven wavelength bands ranging from 500 to 980 nm to discriminate man-made targets from vegetation based on their spectral response. Four vegetation indexes were used in the classification process and achieved an overall accuracy of up to 91%. Other studies used the TLS system for measuring the three-dimensional structure of forest canopies by two lasers with wavelengths of 1063 nm and 1545 nm [29] and discriminating leaves from woody material in forestry areas by using data collected at dual-wavelengths (i.e., 1064 and 1548 nm) [30]. A few attempts have been conducted to combine different flight missions from airborne LiDAR systems of the same study area [31,32]. However, it was difficult to use the intensity data from different missions because those flight missions were conducted at different times. Thus, the surface conditions were not identical and, hence, intensity values were affected.
Teledyne Optech has recently launched the first commercial multispectral airborne LiDAR sensor, the Optech Titan. Optech Titan's typical configuration includes three active laser beams operating simultaneously at three wavelengths, collecting data in three channels. The channels, ordered by wavelength, are as follows: C1 with a wavelength of 1550 nm at 3.5° forward looking, C2 with a wavelength of 1064 nm at 0° (nadir looking), and C3 with a wavelength of 532 nm at 7° forward looking. The beam divergence of C1 and C2 is 0.35 mrad, while the beam divergence of C3 is 0.7 mrad. The preferable flying height for topographic applications is between 300 and 2000 m above ground level (AGL), while the flying height for bathymetric applications is between 300 and 600 m AGL. The pulse repetition frequency of each laser beam varies from 50 to 300 kHz. The scan frequency can be programmed to take values between 0 and 210 Hz. The scan angle (field of view) is programmable in the range from 0° to 60° [33,34].
A few studies have explored multispectral LiDAR data collected by the Optech Titan for land cover classification. Raster images were created from the LiDAR intensity and height data, and image classification techniques were then applied [35][36][37][38]. The multispectral LiDAR data were also explored for 3D point classification in urban areas [39][40][41][42]. Fernandez-Diaz et al. [34] presented an overview of the use of the multispectral Titan data in different applications, such as land cover classification, measuring water depths in shallow water areas, and forestry mapping; however, nothing was reported on land/water discrimination.
In a previous work conducted by the authors on land/water discrimination, we defined three normalized difference feature indexes (NDFIs) between pairs of channels based on the intensity data acquired at the three wavelengths by the Optech Titan sensor [43], where I C1 , I C2 , and I C3 are the intensities from C1, C2, and C3, respectively. The results showed that the NDFIs could not be directly used for land/water discrimination because the recorded intensity values of vegetation and water are very close in each channel. Consequently, water points are misclassified as vegetation points [43].
The green wavelength has the ability to penetrate water bodies, allowing it to detect water surfaces and/or benthic layers [12][13][14][15], while the infrared wavelengths can detect the water surface. This adds a new capability to the LiDAR system, especially at the land/water interface. Therefore, the dual-wavelength or multispectral airborne LiDAR system, including the green wavelength, is expected to aid in the automation of the land/water discrimination process. Also, to the best of the authors' knowledge, there is a lack of studies using dual-wavelength or multispectral LiDAR data in discrete returns format for land/water discrimination. In this research, we aim to (1) evaluate a set of point features, derived from the LiDAR data based on geometric and radiometric attributes, for discriminating water body from land objects; and (2) present an unsupervised land/water discrimination method based on a seeded region growing algorithm using discrete return multispectral LiDAR data.

Study Area and Dataset
A study area located near Lake Ontario in Scarborough, Ontario, Canada was tested in this research. The study area covers a variety of land features, including bare soil, roads, grass, shrubs, trees, and buildings, as well as part of Lake Ontario (Figure 1), and it includes a land/water interface with a gentle slope. A flight mission was conducted on 3 September 2014 to acquire multispectral LiDAR data using the Optech Titan. The acquired data were received from the supplier as a time-tagged 3D point cloud file with multiple returns in LASer file format (LAS) for each channel. A 400 m by 200 m subset of the LiDAR data was clipped for the experimental testing. Table 1 summarizes the specifications of the data subset. The LiDAR data of the three channels are shown in Figure 2.
The difference in the number of points between channels is attributed to the interaction of each feature with different wavelengths (e.g., reflection from the water surface and/or water benthic layer, and the greenness of the vegetation). As shown in Figure 2, the highest elevation in the three channels is very close, while the lowest elevation in C3 is lower than that in C1 and C2 by about 5 m. This is attributed to the recorded points from the water benthic layer. The intermediate part of the water body has high intensity variation in all channels. In C2, the high vegetation (i.e., trees) has a relatively high intensity variation. The water points have single returns in C1 and C2, while most of them have double returns in C3 due to reflectance from both the water surface and the water benthic layer, as it is a shallow water area.

Point Features Extraction
A set of point features could be extracted for each LiDAR point. These point features are divided into three categories, namely elevation-based features including height variation (HV) and height standard deviation (HSD); intensity-based features including intensity coefficient of variation (ICOV) and intensity density (ID); and geometry-based features including point density (PD) and number of returns (NOR). Figure 3 shows the workflow of land/water discrimination using different point features.

The water points were labeled based on the following assumptions. First, the water points were assumed to have the lowest intensity in the scene [16,20]. Second, the intensity variation was assumed to be higher in water bodies than in land areas [18,19]. This is attributed to the laser pulse angle of incidence with respect to the water surface, which depends on the scan angles, water surface roughness (i.e., waves), and the aircraft's attitude. Third, C1 and C2 were used to acquire LiDAR data from water surfaces, while the green wavelength was used to acquire LiDAR data from both water surfaces and benthic layers due to its ability to penetrate water bodies. These characteristics increased the usefulness of the LiDAR data in land/water discrimination when C3 was combined with C1 or C2.
The LiDAR point features were extracted based on a local neighborhood of each point, where neighboring points were found using a fixed search radius in 2D space. A search radius of 1 m was used in order to define the land/water interface with high resolution and to ensure a sufficient number of points in the neighborhood [44]. The details of those point features are presented in the following subsections, assuming a LiDAR point p i and its neighboring points j, where j = 1, 2, 3, ..., n, and n is the total number of neighboring points of p i .
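The fixed-radius 2D neighborhood search can be sketched as follows. This is a minimal illustration with a hypothetical helper `neighbors_2d` and a brute-force distance computation standing in for whatever spatial index was actually used; the point coordinates are synthetic.

```python
import numpy as np

def neighbors_2d(points, i, radius=1.0):
    """Indices of points within `radius` of point i in the XY plane
    (the point itself is included, matching a fixed 2D search disk)."""
    d = np.hypot(points[:, 0] - points[i, 0], points[:, 1] - points[i, 1])
    return np.flatnonzero(d <= radius)

# Tiny synthetic cloud: columns are X, Y, Z (metres).
pts = np.array([[0.0, 0.0, 40.0],
                [0.5, 0.0, 40.2],
                [0.0, 0.9, 39.8],
                [3.0, 3.0, 45.0]])
idx = neighbors_2d(pts, 0, radius=1.0)   # the far point at (3, 3) is excluded
```

The per-point features in the following subsections all operate on the elevations or intensities of such a neighborhood.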

Height Variation (HV)
Height Variation is the difference between the maximum and the minimum elevations of neighboring points of a LiDAR point. The water points were assumed to have high HV C3 values when using the green wavelength in C3 due to the recorded returns from water surfaces and benthic layers. Also, the water points were assumed to have low HV C1 or C2 values when using the infrared wavelengths in C1 or C2 due to the recorded returns from water surfaces only.
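As a minimal sketch of HV under the stated assumptions (the neighborhood elevations below are illustrative values, not the study's data):

```python
import numpy as np

def height_variation(z_neigh):
    """HV: maximum minus minimum elevation within the neighborhood."""
    z = np.asarray(z_neigh, dtype=float)
    return z.max() - z.min()

# Infrared neighborhood (surface returns only) vs. green neighborhood
# (surface plus benthic returns about 5 m deeper).
hv_ir = height_variation([40.0, 40.1, 39.9])      # ~0.2 m (low)
hv_green = height_variation([40.0, 35.2, 39.9])   # ~4.8 m (high)
```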

Height Standard Deviation (HSD)
Height Standard Deviation is the standard deviation of a LiDAR point's elevation in relation to neighboring points. The water points were assumed to create a horizontal surface (i.e., low HSD C1 or C2 ) when using the infrared wavelengths in C1 or C2. The HSD C3 of water points should be increased when using the green wavelength due to the recorded returns from the water surface and benthic layer. The HSD can be calculated from Equation (4), where Z j is the point's elevation and Z is the mean elevation.
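A sketch of the HSD computation, assuming Equation (4) is the population standard deviation of the neighborhood elevations about their mean (a plausible reading of the prose description, not the paper's verbatim formula):

```python
import numpy as np

def height_std(z_neigh):
    """HSD: sqrt of the mean squared deviation of the neighborhood
    elevations Z_j from their mean Z-bar."""
    z = np.asarray(z_neigh, dtype=float)
    return np.sqrt(np.mean((z - z.mean()) ** 2))

hsd_flat = height_std([40.0, 40.0, 40.0])   # 0.0 for a horizontal water surface
```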

Intensity Coefficient of Variation (ICOV)
Intensity Coefficient of Variation is calculated by dividing the standard deviation by the mean of the intensity values. The water points were assumed to have ICOV C1 or C2 or C3 greater than those of land objects. The reason for this is the intensity variation in water bodies is relatively high as a result of different incidence angles. For each LiDAR point in any channel, the ICOV C1 or C2 or C3 can be calculated from Equation (5), where I j is the point's intensity value and I is the mean of the intensity values.
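The prose form of Equation (5) can be sketched as follows (illustrative intensity values):

```python
import numpy as np

def icov(intensities):
    """ICOV: standard deviation of the neighborhood intensities I_j
    divided by their mean I-bar."""
    i = np.asarray(intensities, dtype=float)
    return i.std() / i.mean()

icov_land = icov([100, 100, 100])   # uniform returns -> 0.0
```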

Intensity Density (ID)
Within the searching radius, the ID C1 or C2 or C3 is the percentage of LiDAR points that have intensity values below a threshold value (I thresh ) as shown in Equation (6). The I thresh was identified using the Jenks natural breaks optimization method as explained in detail in Section 3.2. The water points were assumed to have high ID C1 or C2 or C3 . This is due to the water points usually having the lowest intensity values in the scene.
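The prose form of Equation (6) can be sketched as follows; here `i_thresh` is passed in, standing for the value obtained from the Jenks optimization described in Section 3.2:

```python
import numpy as np

def intensity_density(intensities, i_thresh):
    """ID: percentage of neighborhood points whose intensity falls
    below I_thresh."""
    i = np.asarray(intensities, dtype=float)
    return 100.0 * np.count_nonzero(i < i_thresh) / i.size

id_value = intensity_density([1, 2, 10, 20], i_thresh=5)   # 2 of 4 points -> 50%
```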
Point Density (PD)

Point density refers to the number of LiDAR points per square meter. The water points were assumed to have low PD C1 or C2 when using the infrared wavelengths in C1 or C2, and high PD C3 when using the green wavelength in C3 due to the presence of additional points from benthic layers.
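With the fixed 1 m search radius, PD reduces to the neighborhood count divided by the disk area, as in this sketch:

```python
import numpy as np

def point_density(n_points, radius=1.0):
    """PD: points per square metre inside the circular 2D search area."""
    return n_points / (np.pi * radius ** 2)

pd_value = point_density(12)   # 12 returns inside a 1 m search disk
```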

Number of Returns (NOR)
The NOR is defined as the recorded number of returns for each laser pulse. Table 2 shows the number of possible returns that can be recorded using the infrared and green wavelengths. The water points should have a single return when using the infrared wavelengths in C1 or C2 due to the recorded returns from the water surface only, and double returns when using the green wavelength in C3 due to the recorded returns from both water surfaces and benthic layers. Each point feature was extracted using either a single channel or a combination of two channels. For instance, the ICOV and ID features were evaluated using the data of a single channel (C1, C2, or C3); thus, different results were obtained from the three channels for each point feature. The HV, HSD, PD, and NOR features were evaluated using the combined data from two channels (i.e., C3 with C1 or C3 with C2) and, hence, two different results were obtained.
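The return-count pattern expected for water can be expressed as a simple check; `looks_like_water` is a hypothetical helper summarizing the Table 2 reasoning, not the authors' implementation:

```python
def looks_like_water(nor_ir, nor_green):
    """Heuristic from the return-count pattern described above: a single
    return at the infrared wavelength (C1 or C2) paired with a double
    return (surface + benthic) at the green wavelength (C3)."""
    return nor_ir == 1 and nor_green == 2

looks_like_water(1, 2)   # water-like pattern
looks_like_water(1, 1)   # e.g., a building roof seen in both channels
```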

Land/Water Points Labeling Based on LiDAR Point Features
We used different methods and/or criteria to automatically define the threshold values, which were then applied to each LiDAR point feature in order to label water and land points, as follows. For NOR, double returns were recorded from the water body in C3, while single returns were recorded in C1 or C2; the water and land points were labeled based on these characteristics.
Previously, the water points were labeled from single-wavelength LiDAR data based on HSD [18,19] or PD [16][17][18] by applying manually selected threshold values. In this research, a combination of C3 with C1 or C2 was used to automatically label water points. The values of HV, HSD, and PD are higher for C3 than for C1 or C2, and a LiDAR point was labeled as land or water accordingly. The Jenks natural breaks optimization method was used to determine threshold values for ICOV and ID. This optimization method is designed to minimize the within-class variances and maximize the between-classes variance [45]. Let any of the aforementioned LiDAR point feature values range over [a, ..., b], and the threshold value t ∈ [a, ..., b]. The threshold t was identified in order to separate land from water by maximizing the between-classes sum of the squared mean differences, where M is the mean of the point feature values, and M 1 and M 2 are the mean values of the first and second classes, respectively. M was first calculated. Then, the points were divided into two classes with ranges [a, ..., t] and [t, ..., b], and the mean values M 1 and M 2 were calculated. Finally, the optimal threshold value t was obtained from Equation (10).
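The two-class break described above can be sketched as follows. This follows the prose description (maximize the sum of squared differences between each class mean and the global mean), as a sketch rather than Jenks' full goodness-of-variance-fit algorithm:

```python
import numpy as np

def two_class_break(values):
    """Choose the threshold t maximizing (M1 - M)^2 + (M2 - M)^2, where
    M is the global mean and M1, M2 are the means of the two classes
    [a..t] and (t..b]."""
    v = np.sort(np.asarray(values, dtype=float))
    m = v.mean()
    best_t, best_score = v[0], -np.inf
    for k in range(1, v.size):              # split between v[k-1] and v[k]
        m1, m2 = v[:k].mean(), v[k:].mean()
        score = (m1 - m) ** 2 + (m2 - m) ** 2
        if score > best_score:
            best_score, best_t = score, v[k - 1]
    return best_t

t = two_class_break([1, 2, 2, 9, 10, 11])   # separates the low cluster from the high one
```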
The tested data were assumed to consist of three main classes: water, vegetation, and built-up areas. Therefore, two threshold values were identified using the Jenks optimization based on intensity. As the water points were assumed to have the lowest intensity in the scene, the first threshold value (I thresh ) was used to label water points: if a point's intensity was lower than I thresh , the point was labeled as water; otherwise, it was labeled as land. The I thresh was also used to calculate the ID from Equation (6), and a percentage threshold ID thresh was applied. For ICOV, the LiDAR point clouds were separated into land and water using a threshold value (ICOV thresh ).

Seeded Region Growing for Labeling Water Points
Seeded region growing is a pixel-based image segmentation algorithm [19,[46][47][48]. In this research, the seeded region growing algorithm was applied to the 3D LiDAR points. The algorithm is divided into two main steps: selection of seed points and a criterion for region growing (Figure 3). Previous studies mainly relied on manually selecting seed points or applying data-driven threshold values to the LiDAR data. Also, in the region growing criteria, threshold values were manually selected to fit the tested data. We aimed to present a fully automated and unsupervised method for labeling water points based on seeded region growing using the automatically extracted point features NOR and HV.
In seed point selection, the point feature "NOR" was used to find possible seed points. As mentioned previously, the water points have single and double returns when the data are collected at the infrared and green wavelengths, respectively. First, the points that have single returns (points_single) at the infrared wavelengths (e.g., 1064 nm) were extracted; these included points from built-up areas (building roofs and road surfaces) and vegetation, as well as water surfaces. Second, the first points of the double returns (points_first_of_double) at the green wavelength (i.e., 532 nm) were extracted; these included points from vegetation, possibly built-up areas (from power lines or fences), and water surfaces. For each point in points_single, the nearest point within the footprint of the green wavelength from points_first_of_double was found. For simplicity's sake, the footprint was assumed to be a circle with radius "r G ", where r G = 0.5 × altitude × beam divergence (β). The footprint was used to ensure that the two points belonged to the same object. This step is essential for removing points with single returns that belong to built-up areas and vegetation. Thus, all points in points_single that have nearest points from points_first_of_double were considered possible seed points.
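The footprint-matching step can be sketched as follows. The flying height of 500 m is an illustrative assumption (within the 300-600 m bathymetric range quoted earlier); the 0.7 mrad beam divergence is the C3 value from the text, and `candidate_seeds` is a hypothetical helper:

```python
import numpy as np

def candidate_seeds(points_single, points_first_of_double,
                    altitude=500.0, beam_div=0.0007):
    """Keep each infrared single-return point whose nearest first-of-double
    green return lies within the green footprint radius r_G."""
    r_g = 0.5 * altitude * beam_div          # footprint radius in metres
    seeds = []
    for p in points_single:
        d = np.hypot(points_first_of_double[:, 0] - p[0],
                     points_first_of_double[:, 1] - p[1])
        if d.min() <= r_g:
            seeds.append(p)
    return np.array(seeds), r_g

# XY positions only: one matched water-like point, one unmatched land point.
singles = np.array([[0.0, 0.0], [10.0, 10.0]])
greens = np.array([[0.1, 0.0]])
seeds, r_g = candidate_seeds(singles, greens)
```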
It should be pointed out that vegetation could be recorded with single or multiple returns at various wavelengths due to the different interactions between vegetation and those wavelengths, as well as the change of the scan angles. Therefore, the point feature "HV" was used to refine the possible seed points.
The height difference within the searching radius of each point was checked against the variations in the points' elevations at the infrared wavelengths in C1 or C2. These variations are due to the presence of waves on the water surface and the fact that the Titan channels scan a given area at slightly different times. A threshold of 0.5 m was used in order to preserve horizontal surfaces and, hence, filter out non-water points. The remaining points were considered water seed points. In the region growing criterion, all points in the scene were considered as candidate neighbors of each water seed point. Points within the neighborhood of each seed point were then arranged according to their distance to the seed point. The HV was subsequently tested point by point against the variations in the points' elevations in order to label all water points in the scene.
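The growing step can be sketched as a breadth-first expansion from the seed points, admitting a neighbor when its elevation stays within the 0.5 m threshold. This is a sketch of the criterion described above, not the authors' exact implementation:

```python
import numpy as np
from collections import deque

def grow_water(points, seed_idx, hv_thresh=0.5, radius=1.0):
    """Grow the water label outward from seed points: a neighbour joins
    the region when its elevation differs from the current region point's
    by no more than hv_thresh (0.5 m in the text)."""
    water = set(seed_idx)
    queue = deque(seed_idx)
    while queue:
        s = queue.popleft()
        d = np.hypot(points[:, 0] - points[s, 0], points[:, 1] - points[s, 1])
        for j in np.flatnonzero(d <= radius):
            if j not in water and abs(points[j, 2] - points[s, 2]) <= hv_thresh:
                water.add(j)
                queue.append(j)
    return sorted(water)

# Synthetic shoreline: three near-level water points, one 5 m-high bank
# point, and one point too far away to be reached.
pts = np.array([[0.0, 0.0, 40.0],
                [0.8, 0.0, 40.1],
                [1.5, 0.0, 40.2],
                [1.6, 0.0, 45.0],
                [5.0, 0.0, 40.0]])
labels = grow_water(pts, [0])   # the bank point and the distant point stay land
```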

Evaluation
Since aerial images acquired during the LiDAR data acquisition were not available and the study area is coastal, elevation was used to label the LiDAR points as reference data. In coastal areas, the water part usually has the largest area with the lowest elevations in the scene. Therefore, an elevation histogram was first constructed with a bin size of 1 m, where the entire range of elevations was divided into a series of equal intervals (i.e., bins), as shown in Figure 4a-c. The highest peak was then detected and considered the average water surface elevation. The elevation of the first peak represents the average water level for C1 or C2. The second peak represents the average water level for C3, whereas the first peak corresponds to returns from the benthic layer. A threshold of 0.5 m (half the bin size) was added to the average water surface elevation, and all points with elevations less than or equal to that elevation (i.e., 39.5 m) were labeled as water points (Figure 4d-f).
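The reference-labeling procedure can be sketched as follows for an infrared channel (single water peak); the elevation values are synthetic, and `water_elevation_cutoff` is a hypothetical helper:

```python
import numpy as np

def water_elevation_cutoff(z, bin_size=1.0):
    """Build a 1 m elevation histogram, take the centre of the highest
    peak as the average water surface elevation, and add half a bin."""
    z = np.asarray(z, dtype=float)
    edges = np.arange(z.min(), z.max() + bin_size, bin_size)
    counts, edges = np.histogram(z, bins=edges)
    k = np.argmax(counts)
    peak_centre = 0.5 * (edges[k] + edges[k + 1])
    return peak_centre + 0.5 * bin_size

# 100 near-level water returns plus a few higher land returns.
z = np.concatenate([np.full(100, 39.2), np.array([42.0, 43.0, 45.0])])
cutoff = water_elevation_cutoff(z)
is_water = z <= cutoff          # only the 100 low points are labeled water
```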
as the change of the scan angles. Therefore, the point feature "HV" was used to refine the possible seed points. The height difference within a searching radius for each point was checked against the variations in the point's elevations at the infrared wavelengths in C1 or C2. These variations are because of the presence of waves on the water surface and the fact that the Titan channels scan a given area in slightly different time periods. A threshold of 0.5 m was used in order to preserve horizontal surfaces and, hence, filter out non-water points. Thus, the remaining points were considered as water seed points. In the region growing criterion, all points in the scene were considered as neighboring points of each water seed point. Then, points within the neighborhood of each seed point were arranged according to their distance to the seed point. The HV was subsequently tested point by point against the variations in the point's' elevations in order to label all water points in the scene.

Evaluation
Since aerial images during the LiDAR data acquisition were not available and the study area is a coastal area, the elevation was used to label the LiDAR points as reference data. In coastal areas, the water part usually has the largest area with the lowest elevations in the scene. Therefore, an elevation histogram was first constructed with a bin size of 1 m, where the entire range of elevations was divided into a series of equal intervals (i.e., bins), as shown in Figure 4a-c. Then, the highest peak was detected and considered the average water surface elevation. The elevation of the first peak represents the average water level for C1 or C2. The second peak represents the average water level for C3, whereas the first peak is for returns from the benthic layer. A threshold of 0.5 m (half the bin size) was added to the average water surface elevation, and all points with elevation less than or equal to that elevation (i.e., 39.5 m) were labeled as water points (Figure 4d-f). To evaluate the success of using the different features and the region growing algorithm in labeling water points, the confusion matrix was constructed as shown in Table 3. The completeness, correctness, and overall accuracy were then calculated to evaluate the agreement between the water points extracted using the different LiDAR point features or the seeded region growing algorithm and the water points labeled based on elevation.
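The histogram-based reference labeling described above can be sketched as follows. This is a minimal illustration; taking the peak bin's lower edge as the average water-surface elevation is our assumption about the procedure:

```python
import numpy as np

def label_water_by_elevation(z, bin_size=1.0, margin=0.5):
    """Label points as water from the elevation histogram: the highest
    peak is taken as the average water-surface elevation, and every
    point at or below that elevation plus `margin` (half the bin size)
    is labeled as water."""
    edges = np.arange(z.min(), z.max() + bin_size, bin_size)
    counts, edges = np.histogram(z, bins=edges)
    peak = np.argmax(counts)
    water_level = edges[peak]          # assumed: lower edge of the peak bin
    return z <= water_level + margin   # boolean mask: True = water
```

For a scene whose dominant elevation is a flat water surface, the mask reproduces the paper's "water level plus half bin" rule.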
The completeness (or recall) indicates how complete the extracted water points are, the correctness (or precision) indicates how correct the extracted water points are, and the overall accuracy indicates how successful the classification process is. The three quality measures are defined by [49] as follows:

Completeness (%) = TP / (TP + FN) × 100

Correctness (%) = TP / (TP + FP) × 100

Overall Accuracy (%) = (TP + TN) / (TP + FP + TN + FN) × 100

where TP: the point was labeled as water in both the classification process and the reference data; TN: the point was labeled as land in both the classification process and the reference data; FP: the point was labeled as water in the classification process and as land in the reference data; FN: the point was labeled as land in the classification process and as water in the reference data.

Since the land/water interface is very important to monitor, the land/water boundary was delineated and compared with the digitized boundary from the reference points. Ten transects were then drawn perpendicular to the land/water boundary. The 2D distances between the reference boundary and the delineated boundary were measured, and the root mean square error (RMSE) was calculated.
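The three quality measures translate directly into a small helper (a sketch, not the authors' code):

```python
def quality_measures(tp, fp, tn, fn):
    """Completeness (recall), correctness (precision), and overall
    accuracy, in percent, from the four confusion-matrix counts."""
    completeness = 100.0 * tp / (tp + fn)
    correctness = 100.0 * tp / (tp + fp)
    overall = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    return completeness, correctness, overall
```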

Results and Discussion
In this section, we present the extraction of each LiDAR point feature and its use in land/water discrimination, together with the evaluation of the results. As aforementioned, the point features were extracted from the neighboring points within a 2D searching radius of 1 m, and threshold values were applied to each feature to discriminate water from land. The results of the land/water discrimination using elevation-, intensity-, and geometry-based features are presented first. The results of using the seeded region growing algorithm and the delineation of land/water boundaries are presented at the end of this section.
The LAS data files were converted into ASCII files using LAStools (rapidlasso GmbH; Gilching, Germany) so that they could be processed. The LiDAR point feature extraction, the seeded region growing algorithm, and the point labeling were implemented using Matrix Laboratory (MATLAB) (MATLAB R2016a; MathWorks; Natick, MA, USA). The threshold values, which were identified using the Jenks break optimization method, were selected using the embedded function in ArcGIS (Esri; Redlands, CA, USA). The accuracy assessment and the LiDAR data visualization were both conducted using ArcGIS as well.

Land/Water Discrimination from Elevation-Based Features
The HV and HSD features were calculated, and Equations (7) and (8) were used to label water points. The HV and HSD of water points in C3 were higher than those of water points in C1 or C2. Therefore, the water points were labeled using a combination of C1 with C3 or C2 with C3. The labeled points based on HV and HSD are shown in Figure 5, and the completeness, correctness, and overall accuracy of the labeled points are provided in Table 4. The discrimination based on the HV and HSD features varies between the two combinations used. The reason is that objects (e.g., vegetation) that have double or more returns in C3, but single returns in C1 or C2, were misclassified as water points. On the contrary, water areas that have single returns in C3 were incorrectly classified as land. As a result, the correctness of the combination of C1 with C3 is lower than that of C2 with C3 because the misclassification of land points as water points is much higher in the first combination. In general, the HV and HSD features could enhance discrimination capacity wherever benthic returns are present, which reflects the potential of using multiple channels that have various characteristics in land and water regions.
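The HV and HSD features amount to the range and the standard deviation of neighbour elevations within the 2D search radius (Equations (7) and (8) are not reproduced in this section). A minimal sketch, assuming a brute-force radius search:

```python
import numpy as np

def hv_hsd(points, radius=1.0):
    """Per-point elevation features within a 2D search radius:
    HV  = max - min of neighbour elevations (height variation),
    HSD = standard deviation of neighbour elevations.
    `points` is an (N, 3) array of x, y, z coordinates."""
    hv = np.empty(len(points))
    hsd = np.empty(len(points))
    for i, (x, y, _) in enumerate(points):
        d = np.hypot(points[:, 0] - x, points[:, 1] - y)
        z = points[d <= radius, 2]  # neighbour elevations (incl. the point)
        hv[i] = z.max() - z.min()
        hsd[i] = z.std()
    return hv, hsd
```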

Land/Water Discrimination from Intensity-Based Features
The Jenks break optimization was used to identify Ithresh and ICOVthresh in order to label water points. Table 5 presents the threshold values of intensity (Ithresh) and ICOV (ICOVthresh). The IDthresh was selected as 0.7 to minimize the type I (FP) and type II (FN) errors. The labeled LiDAR points from different channels based on ICOV and ID are shown in Figure 6, while the accuracy measures of the labeled points are provided in Table 6.

Figure 6. Labeled LiDAR points based on ICOV and ID.

Generally, the accuracy measures using intensity-based features are relatively low. The intermediate part of the water body had high intensity variation due to different angles of incidence of the laser pulse with respect to the water surface. Therefore, this characteristic was used as the basis of ICOV, where higher ICOV values represented water points. However, some water points in C3 exhibited low ICOV and were misclassified as land. As a result, the completeness of the water class from C1 (93.5%) and C2 (93.6%) was higher than that from C3 (54.0%). On the other hand, the correctness of the water class from C1 (25.0%) and C2 (24.6%) was lower than that from C3 (58.0%). This is because some vegetation areas had high intensity variation in C1 and C2, so they exhibited high ICOV and were misclassified as water points.
For the ID feature, the water points were assumed to have the lowest intensity values in the scene. However, based on this assumption, the results revealed low discrimination rates. This is mainly because of the intensity variations of the water points, which result from the low altitude and the looking angles of C1 and C2: most returns from the water area were recorded at near-normal incidence angles and, hence, exhibited varying intensities. On the other hand, the discrimination results demonstrated the highest completeness (88.7%) and correctness (52.3%) of the water class in C3. This is due to the returns from the water surface and benthic layer, which exhibited lower intensity values. The intermediate part of the water body was misclassified as land due to high intensity variation. In the land area, points that had low intensity values (i.e., vegetation points) were incorrectly classified as water.
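The ICOV feature is the coefficient of variation (standard deviation over mean) of the neighbour intensities within the 2D search radius. A sketch under the assumption of a brute-force neighbour search (the paper's exact ICOV definition appears earlier in the article):

```python
import numpy as np

def icov(points_xy, intensity, radius=1.0):
    """Intensity coefficient of variation (ICOV) per point: std/mean of
    the intensities of all neighbours within `radius` (the point itself
    included). High ICOV is associated with water in C1/C2."""
    out = np.empty(len(points_xy))
    for i, (x, y) in enumerate(points_xy):
        d = np.hypot(points_xy[:, 0] - x, points_xy[:, 1] - y)
        nb = intensity[d <= radius]
        out[i] = nb.std() / nb.mean() if nb.mean() > 0 else 0.0
    return out
```

A uniform-intensity surface yields ICOV of zero everywhere, while alternating intensities produce high values, matching the intuition that rough water returns vary strongly.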

Land/Water Discrimination from Geometry-Based Features
As mentioned earlier, the labeling of water points using geometry-based features depends on the returns from the benthic layer. Consequently, the combination of C1/C2 and C3 was required. A point was labeled as water if the PD of a LiDAR point in C3 was higher than the PD in C1 or C2. For the NOR feature, a LiDAR point with double returns in C3 and a single return in either C1 or C2 was labeled as a water point. Figure 7 shows the labeled LiDAR points based on PD and NOR. The completeness, correctness, and overall accuracy of the labeled points are provided in Table 7.
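The two labeling rules above can be sketched as simple per-location tests. This assumes the points have already been matched across channels (co-located), which is our simplification of the paper's procedure:

```python
import numpy as np

def label_water_nor(nor_ir, nor_green):
    """NOR rule: a location is water when the green channel (C3) records
    a double return (surface + benthic) while the infrared channel
    (C1 or C2) records a single return. Inputs are per-location
    number-of-returns arrays."""
    return (nor_green == 2) & (nor_ir == 1)

def label_water_pd(pd_ir, pd_green):
    """PD rule: a location is water when the green-channel point density
    exceeds the infrared-channel point density at that location."""
    return pd_green > pd_ir
```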
Figure 7. Labeled LiDAR points based on PD and NOR.

The PD feature shows lower overall accuracy, and most land points were misclassified as water. This is mainly due to the land area reflecting more returns in C3 than in C1 and C2 because the land objects exhibited various characteristics at different wavelengths. Therefore, the condition that the PD in C3 is higher than the PD in C1 or C2 was not achieved. The NOR feature demonstrated promising results; however, this is attributed to the presence of benthic returns in C3. Despite the relatively high overall classification accuracies obtained using NOR, there are misclassified points in water bodies. This is mainly due to single returns from water surfaces or benthic layers in C3 and, hence, the necessary condition for labeling water points was not achieved.

Land/Water Discrimination from Seeded Region Growing
The point feature "NOR" was used to find the possible seed points by detecting the points that had single returns in C1 or C2 (points_single) and the first of the double returns in C3 (points_first_of_double), as shown in Figure 8a-c, respectively. The NOR was used rather than other features (e.g., HV or HSD) because it achieved a higher correctness of 95.2% (C1C3) and 96.5% (C2C3) for the water body discrimination. For each point in points_single, the nearest point within the circular footprint of points_first_of_double was found. The searching radius was set to 0.15 m, which was calculated by multiplying the flying height (430 m) by half of the beam divergence of C3 (0.35 mrad).
The extracted points_single and points_first_of_double are shown in Figure 8d,e. The selected points might still represent other land objects, such as vegetation, which could have the same characteristics. Therefore, the HV was checked within a 10 m searching radius to be less than 0.5 m in order to preserve horizontal surfaces (i.e., the water surface). Figure 8f,g shows the output that represents the water seed points. These points were used as an input for region growing, where the HV was checked point by point to be less than 0.5 m so that all water surface points could be labeled accurately. Figure 8h,i shows the labeled LiDAR points using the combination of C3 with either C1 or C2, while Table 8 provides the accuracy measures for the two cases. Using either C1 or C2 together with C3 achieved a high overall accuracy of more than 99% with the automatic seeded region growing algorithm. The completeness and correctness of the water class are also high at more than 94%. The completeness of the water class was affected by the misclassification of a small pond at the edge of the dataset. The completeness of the combination of C1 with C3 (94%) was smaller than that of C2 with C3 (95.7%) because the misclassified part at the water edges was larger in the first combination.
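The footprint-based matching of single infrared returns against first-of-double green returns can be sketched as follows (an illustrative Python sketch; the function name, brute-force search, and test data are our own assumptions):

```python
import numpy as np

def find_seed_candidates(single_ir, first_green, flying_height=430.0,
                         half_divergence=0.35e-3):
    """For each single-return IR point (C1 or C2), look for a
    first-of-double green (C3) return inside the C3 laser footprint.
    The search radius is flying height x half beam divergence
    (430 m x 0.35 mrad ~ 0.15 m). Returns the indices of IR points
    with a green partner, i.e. candidate water seed points."""
    radius = flying_height * half_divergence
    seeds = []
    for i, p in enumerate(single_ir):
        d = np.hypot(first_green[:, 0] - p[0], first_green[:, 1] - p[1])
        if d.min() <= radius:
            seeds.append(i)
    return seeds
```

The candidates returned here would then be refined with the HV check before seeding the region growing, as described above.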
Figure 8. LiDAR points based on the seeded region growing algorithm. (a-c) Single returns of C1, C2 and C3; (d,e) possible seed points using C1 with C3 and C2 with C3; (f,g) water seed points using C1 with C3 and C2 with C3; (h,i) seeded region growing using C1 with C3 and C2 with C3.
This high accuracy rate shows the benefit of using multiple channels in such applications. However, this approach requires recorded returns at the green wavelength from the surface and benthic layer of water bodies in order to fully automate the discrimination process. At water body edges, there are still a few misclassified points due to low point density or high variation in point elevation. Figure 9 summarizes the overall accuracies obtained from all point features and the seeded region growing algorithm.

Land/Water Boundary Delineation
Land/water discrimination based on the seeded region growing algorithm achieved the highest overall accuracy of more than 99% with clear visualized land/water boundaries. The land/water boundary was delineated from region growing results produced by combining C1 with C3 (RG_C1C3_bound) and C2 with C3 (RG_C2C3_bound). The labeled points were converted into raster images with a pixel size of 1 m, and contour lines were automatically drawn using the embedded function in ArcGIS "Contour list". The output contour lines were compared with the contour line produced from labeled point clouds of C3 based on elevation (H_C3_bound) as shown in Figure 10. The misclassification at the dataset edges caused disconnection and delineation errors of the land/water boundary.
The RMSE was calculated from nine transects distributed perpendicular to the land/water boundary as shown in Figure 10. The tenth transect was excluded from calculations as the land/water boundary was not delineated from RG_C2C3_bound. If nine transects are considered, the RMSE of RG_C1C3_bound and RG_C2C3_bound are 4.81 m and 0.87 m, respectively. The high values are due to the discrimination errors at the land/water interface at the edges. Therefore, we also excluded transects at the edges (the first, second, and ninth transects). If only six transects are considered, the RMSE of RG_C1C3_bound and RG_C2C3_bound improved to 0.42 m and 0.60 m, respectively.
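The transect-based accuracy figure is a plain root mean square of the measured 2D offsets, which can be computed as follows (a sketch; the distances below are illustrative, not the paper's measurements):

```python
import numpy as np

def boundary_rmse(distances):
    """Root mean square error of the 2D offsets measured along transects
    between the delineated and the reference land/water boundary."""
    d = np.asarray(distances, float)
    return float(np.sqrt(np.mean(d ** 2)))
```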



Limitations of Land/Water Discrimination
The water body might be a lake, a coastal zone with gradually sloped terrain, or a river in urban or mountain areas. In addition, the LiDAR data are collected from different missions. Thus, the data specifications depend on the flying height, aircraft's attitude, scan angles, and water surface roughness (i.e., waves). Therefore, we discuss in this section the limitations of the proposed methods for land/water discrimination considering different LiDAR data specifications.
The Titan sensor is designed to detect water surfaces in C1 (1550 nm) and C2 (1064 nm) and to detect water surfaces and/or benthic layers in C3 (532 nm). The arrangement of the looking angles, with C2 at nadir and C1 at 3.5° forward, increases the reflection from the water surface, while the looking angle of C3 at 7° forward increases the probability of water surface penetration. Although the 1550 nm wavelength is reflected less than the 1064 nm wavelength from water bodies [34], the results demonstrated that their performance is almost identical. This is due to the fact that the flying height is relatively low; a higher altitude might provide different results. Water surface penetration and benthic layer detection at the 532 nm wavelength depend on the water surface roughness, the incidence angles of the laser pulses, water clarity, and water depth.
As mentioned previously, the interaction between the 532 nm wavelength and the water molecules produces returns at a red wavelength of 647 nm [14]. Similarly, when laser pulses at the employed wavelengths contact the land/water interface, they could produce returns at a specific wavelength, which requires further exploration.
Water bodies in coastal areas usually occupy the lowest elevations in the scene, so that an elevation threshold might be enough for discriminating water from land. However, the elevation threshold would not help in the discrimination process for elevated water areas. The NOR feature is expected to achieve promising results for elevated water areas if and only if LiDAR returns from water surfaces and benthic layers are recorded at the green wavelength. However, this depends on the data specifications and environmental conditions. For instance, a higher altitude and/or water turbidity may cause non-returns from the benthic layer. Other LiDAR point features such as HV, HSD, and PD also require double returns from the water body at the green wavelength, but their results are additionally affected by the recorded returns from land objects at the different wavelengths.
The intensity-based features (ICOV and ID) rely on the recorded signal strength from the water body. As aforementioned, this depends on the aircraft's altitude, the laser pulse angle of incidence, and the water surface roughness. The results of the intensity-based features also change according to the interaction of land objects and water bodies with the different wavelengths. The performance of those features will therefore differ according to the tested environment and the data specifications. In addition, intensity correction might maximize the usefulness of the intensity data.
The automation of the seeded region growing method is also restricted by the presence of double returns from the water body at the green wavelength. Another point feature could then be used to automatically detect the water seed points. In general, more datasets representing different levels of complexity at the land/water interface will be investigated once reference data are available.

Conclusions
This research presented the capability of multispectral LiDAR data, in the format of discrete returns, for land/water discrimination in coastal areas. The multispectral data of a study area at Lake Ontario were collected using the Optech Titan sensor operating at three channels with wavelengths of 1550, 1064, and 532 nm. The interaction of water bodies with C1 and C2 is almost identical, where only the water surface is detected. The green wavelength can penetrate the water surface and detect both the water surface and the benthic layer. A set of LiDAR point features based on elevation, intensity, and geometry was extracted and evaluated for land/water discrimination. The elevation-based features (HV and HSD) revealed an average overall accuracy of 75.1%, while the PD feature revealed an average overall accuracy of 58.3%. Those features showed various overall accuracies due to the different interactions of the wavelengths with land objects and water bodies. The NOR feature achieved a relatively high overall accuracy of 90.2% due to the large number of points that have double returns from the water bodies in C3. The intensity-based features showed relatively low water detection rates with an average overall accuracy of 63.9%, because the water points have the same intensity range as other land features (e.g., vegetation).
A fully automated and unsupervised land/water discrimination method, based on the seeded region growing algorithm, was also presented. This method starts with the automatic selection of water seed points using two point features, NOR and HV. Subsequently, a region growing criterion based on HV was applied point by point in order to label all water points in the scene. It has been shown that our method achieves high land/water discrimination with an overall accuracy of more than 99% and an average RMSE of 0.51 m for the land/water boundary.
The use of multispectral LiDAR systems is required to fully automate the land/water discrimination process; however, dual-wavelength LiDAR systems are also suitable for this application if the green wavelength (532 nm) is included. The multispectral data could be more effective in land/water discrimination if more LiDAR point features are explored and combined. Consequently, feature selection and the assessment of the features' relevance to land/water discrimination will be required [50,51]. The land/water discrimination could be extended to consider shorelines with submerged vegetation and debris, as well as inland water areas such as rivers in urban or mountain areas.