Classification of Land-Water Continuum Habitats Using Exclusively Airborne Topobathymetric Lidar Green Waveforms and Infrared Intensity Point Clouds

Coastal areas host highly valuable ecosystems that are increasingly exposed to the threats of global and local changes. Monitoring their evolution at a high temporal and spatial scale is therefore crucial and mostly possible through remote sensing. This article demonstrates the relevance of topobathymetric lidar data for coastal and estuarine habitat mapping by classifying bispectral data to produce 3D maps of 21 land and sea covers at very high resolution. Green lidar full waveforms are processed to retrieve tailored features corresponding to the signature of those habitats. These features, along with infrared intensities and elevations, are used as predictors for random forest classifications, and their respective contribution to the accuracy of the results is assessed. We find that green waveform features, infrared intensities, and elevations are complimentary and yield the best classification results when used in combination. With this configuration, a classification accuracy of 90.5% is achieved for the segmentation of our dual-wavelength lidar dataset. Eventually, we produce an original mapping of a coastal site under the form of a point cloud, paving the way for 3D classification and management of land and sea covers.


Introduction
Monitoring coastal areas is essential to the preservation of the land-water continuum's habitats and the services they provide, particularly in a context of local and global changes [1]. Seagrasses, salt marshes, mangroves, macroalgae, sandy dunes, or beaches are examples of such habitats that continually interact with the tide levels. They can be found along the temperate shorelines and play key roles in the ecological equilibrium of these ecotones. Seagrasses ensure water quality and are a significant carbon sink, along with salt marshes and mangroves [1,2]. Coastal habitats also provide protection from marine hazards to coastal communities and infrastructures and supply many recreational activities such as snorkeling, fishing, swimming, and land sailing [1][2][3]. Finally, they support a wide range of endemic species by offering them nurseries, food, and oxygen [1,2]. However, coastal (including estuarine) habitats are exposed to a plethora of natural and anthropic threats that may be amplified by global changes. Thorough observation of coastal processes is necessary to identify the trends of evolution of these fragile environments. It requires regular data acquisition along the shoreline with spatial resolution and time spacing both adapted to the task. However, the surveying complexities inherent to land-water continuum areas hinder their monitoring at a time scale relevant to their fast evolution, and over large, representative extents. Remote sensing can adequately address this issue. namely for vegetation or building identification. Analysis of this geometrical context is the most frequently used method to produce maps of land and water covers [23][24][25]. Research works conducted on PCs processing mostly rely on the computation of geometrical features using spherical neighborhoods [23] and, more recently, on deep neural networks [26].
Another possibility with airborne lidar is to exploit the spectral details contained in the backscattered signals. These can be recorded under the form of time series of intensities received by the sensor: lidar waveforms. Each object of the surveyed environment illuminated by the sensor's laser reflects light in a specific way, generating a characteristic signature in the signal. Waveforms consequently provide additional information on the structure and physical attributes of the targets. The shape, width, and amplitude of their spectral signature-a peak-are information that can be used for land and water covers mapping [27][28][29]. Waveforms are therefore a useful indicator of the diversity of coastal areas. Though many methods have been proposed to process airborne topographic lidar full-waveforms, airborne bathymetric lidar full-waveforms are, to the best of our knowledge, much less explored. They are even less employed for classification tasks, and often only analyzed to retrieve bathymetry. There are currently three main approaches to waveform processing. The first consists in decomposing the waveforms to isolate each element of the train of echoes in the signal [30,31]. The second consists in reconstructing the signal by fitting statistical models to the waveforms [32]. Knowing how to approximate the sensed response allows to extract the position and the attributes of each component. The last approach is to analyze waveforms straightforwardly as any 1D signal to detect their peaks [27], using first derivative thresholding for example. Identifying waveform components is useful in order to localize the objects populating the scene, but also to extract features to describe them and prepare their automatic classification [27,33,34].
Classification of land or water covers using lidar data has been well explored recently. Even when using waveform data, most of the published research is based on 2D data classification [17,25,27,29,33] while fewer articles exploit PCs [24,34,35]. Many studies researching ways to classify lidar data used machine learning algorithms such as support vector machine (SVM), maximum likelihood (ML), or random forests. The maximum likelihood is mostly used for 2D lidar data, while SVM and random forests have been proofed on PCs. SVM and random forest seem to have similar classification performances on 3D lidar data [36]. However, with these algorithms, the spatial context around each point is not considered and does not impact the prediction [36]. Research papers show that conditional random field (CRF) and Markov random field (MRF) classifications produce better results in that way [36,37]. However, these require heavier computation and are more difficult to apply to large datasets. Currently, there is a consensus on the efficiency of random forest on that aspect [36]. Contrary to SVM, CRF, or MRF, it is easy to apply to large datasets. Random forest is, furthermore, robust to overfitting issues and offers the possibility to retrieve predictors contribution easily. In this article, we therefore wish to implement a hand-crafted features' random forest classification to map coastal habitats. Although machine learning classification of lidar waveform features has been explored previously, we have found no point-based studies dedicated to mapping a large number of habitats both marine and terrestrial. Previously cited studies such as [24,34,35] classified either only marine or only terrestrial habitats from PCs and [27] processed 2D data to produce their map of coastal habitats.
The present study aims at mapping an array of 21 habitats of the 3D land-water continuum seamlessly using exclusively airborne topobathymetric bispectral lidar. Our objective is to bridge the gap between marine and terrestrial surveys, demonstrate that efficient methods can be developed to automatically map the land-water interface, and show that an integrated vision of coastal zones is feasible and advised. Our contributions consist in (1) developing a point-based approach to exploit full-waveform data acquired during topobathymetric surveys for subtidal, intertidal, and supratidal habitats mapping, (2) quantifying the contribution of green waveform features, infrared (IR) intensities, and relief information to the classification accuracy based on a random forest machine learner. We improve an experimental method presented in a previous work [29] and test it on a wider area including both emerged and submerged domains to determine the suitability of full-waveform lidar data for coastal zone mapping. UAV data, inexpensive to implement, is used to estimate the accuracy of the resulting very high spatial resolution maps, which are produced under the form of PCs, paving the way for 3D classification of land and sea covers using solely topobathymetric lidar data.

Study Area
The study location was chosen along the northern coasts of Brittany, France, for its ecological diversity and due to the availability of full-waveform lidar data acquired by the French Hydrographic Office (Shom) as part of the Litto3D ® project [38]. The study area, presented in Figure 1, features typical coastal habitats such as fine sand or pebble beaches, a sandy dune, rocky areas provided with macroalgae, seagrass meadows, and salt marshes at the estuary of a local river. It also hosts a small resort town, Sables d'Or les Pins (48 • 38 27 N: 2 • 24 24 W). Buildings, tar or concrete-covered paths, boats in mooring, vehicles in parking lots, wooded areas populated with evergreen pine trees or deciduous species, and crop fields are also present in the selected zone. All these habitats are home to a rich variety of species: shellfish; dune plant vegetation; green, red, and brown seaweed; eelgrasses; evergreen and deciduous trees; crops; and salt marsh plants such as glasswort, common soda, sea purslane, or sea lavender.

Full-Waveform Airborne Topobathymetric Lidar
Airborne lidar is an active remote sensing technique that uses the backscatter of laser light in the environment to compute ranges to the ground cover and produce 3D maps of the environment, knowing the absolute position of the sensor. Topobathymetric lidar relies on two different lasers with distinct wavelengths: a green laser is added to the usual IR laser to detect the seabed or riverbed [21,22,39]. It exploits the physical properties of the green spectrum that penetrates the water surface, whereas the IR light does not. Lidar waveform is the recording of the full backscattered signal on the surveyed environment. A waveform consists in samples of recorded backscattered intensities over time. Since laser light is reflected by objects standing on its path, each element illuminated by the lidar laser backscatters a fraction of the emitted beam, which results in a peak in the waveform. Peaks are theoretically more or less intense depending on the object's albedo, its geometry and the laser incidence angle [40]. Due to the different layers of coverage in the environment (tree canopy, tree branches, bushes, soils, for example), there can be several peaks in the same waveform, all corresponding to a different layer in the reality. A typical topographic waveform (i.e., resulting from a laser beam hitting the land) has as many peaks as there are objects of different elevation in its way [28,39]. A bathymetric waveform usually has three main components: the first is the water surface return, the second is the water column return (originating from the backscatter of photons on particles suspended in the water such as sediments or nutrients), and the third is the return originating from the reflection on the seabed or riverbed [21,22,39]. Since all objects present in the cone illuminated by the laser reflect some light towards the sensor, they all contribute to the recorded waveform's shape [28]. Each element of the landscape/seascape is therefore characterized by the shape of its component in the waveform, which can be used for land or sea cover detection and classification [27,29,34,41]. In this work, backscattered intensities are converted into pseudo-reflectances by dividing them with the emitted pulse's intensity. Examples of a typical bathymetric waveform and a typical topographic waveform are presented in Figure 2.

Full-Waveform Airborne Topobathymetric Lidar
Airborne lidar is an active remote sensing technique that uses the backscatter of las light in the environment to compute ranges to the ground cover and produce 3D maps the environment, knowing the absolute position of the sensor. Topobathymetric lidar r lies on two different lasers with distinct wavelengths: a green laser is added to the usu IR laser to detect the seabed or riverbed [21,22,39]. It exploits the physical properties the green spectrum that penetrates the water surface, whereas the IR light does not. Lid waveform is the recording of the full backscattered signal on the surveyed environmen A waveform consists in samples of recorded backscattered intensities over time. Sin laser light is reflected by objects standing on its path, each element illuminated by th lidar laser backscatters a fraction of the emitted beam, which results in a peak in th waveform. Peaks are theoretically more or less intense depending on the object's albed its geometry and the laser incidence angle [40]. Due to the different layers of coverage the environment (tree canopy, tree branches, bushes, soils, for example), there can b several peaks in the same waveform, all corresponding to a different layer in the realit A typical topographic waveform (i.e., resulting from a laser beam hitting the land) has many peaks as there are objects of different elevation in its way [28,39]. A bathymetr waveform usually has three main components: the first is the water surface return, the second is the water column return (originating from the backscatter of photons on particles suspended in the water such as sediments or nutrients), and the third is the return originating from the reflection on the seabed or riverbed [21,22,39]. Since all objects present in the cone illuminated by the laser reflect some light towards the sensor, they all contribute to the recorded waveform's shape [28]. Each element of the landscape/seascape is therefore characterized by the shape of its component in the waveform, which can be used for land or sea cover detection and classification [27,29,34,41]. In this work, backscattered intensities are converted into pseudo-reflectances by dividing them with the emitted pulse's intensity. Examples of a typical bathymetric waveform and a typical topographic waveform are presented in Figure 2.
(a) (b) Figure 2. Examples of typical lidar waveforms (for a green laser with a wavelength of 515 nm). (a) Bathymetric waveform acquired in a coastal area; (b) topographic waveform acquired in a vegetated area. A sample corresponds to 556 picoseconds.

Datasets
The lidar data [42] used for this research were acquired over the coast of Sables d'Or les Pins in September 2019 by the Shom as part of the Litto3D ® project [38], using a Leica HawkEye 4X sensor. The HawkEye 4X produces laser pulses at wavelengths of 515 nm

Datasets
The lidar data [42] used for this research were acquired over the coast of Sables d'Or les Pins in September 2019 by the Shom as part of the Litto3D ® project [38], using a Leica HawkEye 4X sensor. The HawkEye 4X produces laser pulses at wavelengths of 515 nm and 1064 nm on three different channels. Depths under 10 m have a dedicated shallow green laser, while a more powerful laser, the deep channel, is used to detect deeper seabed. These two channels provide PCs with a density of at least five points per m 2 and one point per m 2 and they have a laser spot size diameter of 1.8 m and 3.4 m, respectively. The IR laser has a laser spot size of 0.2 m and a point density of at least 10 points per m 2 . For green channels only, backscattered intensities are recorded with a time frequency of 1.8 GHz, providing waveforms with a sample every 556 picoseconds. This information is not available for the IR laser.
The survey was conducted with a constant laser amplification. Due to the power needed to penetrate through several meters of water, the shallow laser' backscattered intensities tend to be saturated over highly reflective land surfaces, but they are still usable. The deep channel's returned intensities, however, are systematically saturated and do not provide usable information for land cover classification. In this study, only shallow full waveforms were used, considering the selected area's range of depths. The green waveforms used were available for every shallow green laser shot. Over the studied area, their average density was 3.75 waveforms per m 2 . Reanalyzed echo PCs for both shallow green and IR wavelengths were also used: the IR PC brought additional spectral information, while the green PC was used to accurately position the raw waveforms, since this PC underwent refraction correction before delivery. The effects of refraction were not corrected in the raw waveform files.
To provide knowledge on the environment on site, ground truth data (presented in Figure 3) were acquired in the form of photoquadrats and UAV imagery. They helped label the lidar data to perform habitat classification. UAV imagery was acquired over five smaller areas of interest, each representing typical coastal habitats, in March and April 2021 using an RGB DJI Phantom 4 Pro V2, and a Parrot Sequoia+ including a near IR nadiral sensor (770 nm to 810 nm) with a zenithal irradiance sensor. These flights were calibrated with a total of 55 ground control points. An array of 150 photoquadrats were captured with RGB cameras and georeferenced, to seize the ecological diversity of the study area. Over marine parts of the study site, a PowerVision unmanned surface vehicle (USV) was used to gather knowledge of the seabed covers. The underwater images were acquired in September 2021. An RGB orthoimage acquired in 2014 over the whole area was also used to give extra information on the habitats present on site four years prior to the data acquisition. captured with RGB cameras and georeferenced, to seize the ecological diversity of th study area. Over marine parts of the study site, a PowerVision unmanned surface vehic (USV) was used to gather knowledge of the seabed covers. The underwater images we acquired in September 2021. An RGB orthoimage acquired in 2014 over the whole are was also used to give extra information on the habitats present on site four years prior the data acquisition.

Methodology
The algorithm developed in this study was first introduced in [29] to identify marin habitats using green full-waveform spectral features. In [29], only seagrasses and sed ments (two classes) at few meters' depths were classified. We significantly improved th algorithm and adapted it to the classification of 21 habitats across the land-water inte face, to test its abilities in supratidal and intertidal environments.

Methodology
The algorithm developed in this study was first introduced in [29] to identify marine habitats using green full-waveform spectral features. In [29], only seagrasses and sediments (two classes) at few meters' depths were classified. We significantly improved this algorithm and adapted it to the classification of 21 habitats across the land-water interface, to test its abilities in supratidal and intertidal environments.
The enhanced version presented here was tailored to the identification of land and sea covers: the seabed or riverbed type was considered in the presence of water while we focused on the surface cover over terrestrial areas (i.e., if there were two layers of surface covers, such as a trees and grass beneath it, the land cover was labelled as tree). It used a supervised point-based classification algorithm trained on various sets of input features and evaluated on a test dataset. Classified PCs of the whole area were also produced to observe the ability of each predictor sets to produce a map of the habitats in the study area using this approach.

Classes of Land and Sea Covers Investigated
A total of 21 classes of land and sea covers has been designed based on on-site observations and photo-interpretation. These classes are illustrated in Table 1.

Data Pre-Processing
The classification algorithm developed for this research aimed at processing bispectral lidar data. Two PCs were therefore used, but full-waveforms were available only for the green wavelength. IR data were thus incorporated as reflected intensities only. Due to the sensor's configuration, IR and green lasers of the topobathymetric lidar used are not co-focal. They also do not have the same swath, density, and precision. The two resulting PCs are consequently different, and the points acquired do not have identical locations. A preprocessing step was therefore required to obtain the IR backscattered intensity at each green waveform's location. Intensities of the IR PC were matched with each point of the green waveforms PC, which was kept as the reference PC, using the median IR intensity of the 10 nearest neighbors of each green waveform's location in the IR PC, which was computed and assigned to the waveform as an attribute. To this end, we used the PC processing software "CloudCompare" [43], in which the neighborhood search is performed by computing the 3D Euclidean distance between each point of the reference PC and the points of the compared PC. The coordinates of the waveforms' cover component were used to locate each waveform and obtain a green waveform PC. Consequently, each waveform was synthesized as a point, and we obtained a green waveforms PC. The IR PC was cleaned manually beforehand, to ensure all noise points, significantly above the surface, were removed from the data.
The median intensity of the 10 closest neighbors in the IR PC was chosen for two reasons. First, the number of 10 neighbors was relevant considering the difference of the two lasers' spot sizes and the resulting density of the PCs. Second, the use of the median intensity was more suited to the task than the mean intensity to avoid outliers' artefacts in spheres located at the interface of two habitats.

Training and Testing Datasets
Two main datasets were designed to perform the data classifications and assess their quality, called the training dataset and test dataset. We collected 1000 training and 500 testing samples of each class. To do so, we first drew polygons over representative areas of the 21 land and water covers, based on the ground-truth data, and made sure none of the polygons intersected. We used these to segment the PC and extract points located within representative areas of each class. For each class, we then randomly selected 1000 of them for the training datasets and 500 others for testing, which resulted in training and test datasets of 21,000 and 10,500 items, respectively. This draw without replacement ensured that no training and testing points were the same and that they were all independent. We chose to use the same areas for random drawing of training and testing points in order to account for the widest diversity possible in the training and testing samples. The resulting training and test points have a mean distance of 1.6 m; their repartition is presented in Figure 4.
Remote Sens. 2022, 13, x FOR PEER REVIEW 10 of 32 within representative areas of each class. For each class, we then randomly selected 1000 of them for the training datasets and 500 others for testing, which resulted in training and test datasets of 21,000 and 10,500 items, respectively. This draw without replacement ensured that no training and testing points were the same and that they were all independent. We chose to use the same areas for random drawing of training and testing points in order to account for the widest diversity possible in the training and testing samples. The resulting training and test points have a mean distance of 1.6 m; their repartition is presented in Figure 4.  Each array of coordinates, IR intensities, and waveform features was associated to one integer label between 1 and 21, forming a dataset with the structure illustrated in Figure 5. The point-based classification method is described in the following paragraphs. It relied on the interpolated IR intensities and on spectral features extracted from the green waveforms. Each array of coordinates, IR intensities, and waveform features was associated to one integer label between 1 and 21, forming a dataset with the structure illustrated in Figure 5.
within representative areas of each class. For each class, we then randomly selected 1000 of them for the training datasets and 500 others for testing, which resulted in training and test datasets of 21,000 and 10,500 items, respectively. This draw without replacement ensured that no training and testing points were the same and that they were all independent. We chose to use the same areas for random drawing of training and testing points in order to account for the widest diversity possible in the training and testing samples. The resulting training and test points have a mean distance of 1.6 m; their repartition is presented in Figure 4.  Each array of coordinates, IR intensities, and waveform features was associated to one integer label between 1 and 21, forming a dataset with the structure illustrated in Figure 5. The point-based classification method is described in the following paragraphs. It relied on the interpolated IR intensities and on spectral features extracted from the green waveforms. The point-based classification method is described in the following paragraphs. It relied on the interpolated IR intensities and on spectral features extracted from the green waveforms.

Waveform Processing Method
The waveform processing steps are presented in Figure 6 and detailed in the following paragraphs. All these steps were performed using tailored scripts developed with Python 3.8.8.

Waveform Processing Method
The waveform processing steps are presented in Figure 6 and detailed in the following paragraphs. All these steps were performed using tailored scripts developed with Python 3.8.8. The base concept consists in extracting tailored features from the relevant part of the waveforms. Here, we considered this part to be any return detected after the water surface component in submerged domains, and the part of the waveform encompassing the backscattered signal in terrestrial zones. In these zones, the sole processing was to isolate the useful part of the green waveform by identifying where the backscatter begins and the noise ends. This was made by evaluating the mean level of noise observable at the beginning and at the end of the waveform and extracting the part of the wave where the intensity was above this noise level.
To distinguish marine and terrestrial waveforms, we relied on a flag attributed by the Leica survey software Leica LIDAR Survey Studio (LSS) to points in the PC that correspond to an interpolated water surface under which refraction correction is made.
Since waveform files integrate this information, it was possible to use it to differentiate submerged and emerged areas. This step is a pre-classification made by LSS before the data was delivered to us.
In submerged areas, further processing was required to detect the different peaks and isolate the parts of the signal that correspond to the water surface and the water column components. All green waveforms were first filtered using the Savitzky-Golay piecewise polynomial functions estimation method to remove the noise. As explained in [29], a threshold was then applied to the first derivative of the smoothed waveforms to bring out the peaks. This step was well suited to the detection of the most significant peaks; however, depending on local conditions affecting the reflection of light, some bottom returns may be less intense and hard to expose. We therefore included a second thresholding: when only one peak was identified with the first threshold, another deriv- The base concept consists in extracting tailored features from the relevant part of the waveforms. Here, we considered this part to be any return detected after the water surface component in submerged domains, and the part of the waveform encompassing the backscattered signal in terrestrial zones. In these zones, the sole processing was to isolate the useful part of the green waveform by identifying where the backscatter begins and the noise ends. This was made by evaluating the mean level of noise observable at the beginning and at the end of the waveform and extracting the part of the wave where the intensity was above this noise level.
To distinguish marine and terrestrial waveforms, we relied on a flag attributed by the Leica survey software Leica LIDAR Survey Studio (LSS) to points in the PC that correspond to an interpolated water surface under which refraction correction is made. Since waveform files integrate this information, it was possible to use it to differentiate submerged and emerged areas. This step is a pre-classification made by LSS before the data was delivered to us.
In submerged areas, further processing was required to detect the different peaks and isolate the parts of the signal that correspond to the water surface and the water column components. All green waveforms were first filtered using the Savitzky-Golay piecewise polynomial functions estimation method to remove the noise. As explained in [29], a threshold was then applied to the first derivative of the smoothed waveforms to bring out the peaks. This step was well suited to the detection of the most significant peaks; however, depending on local conditions affecting the reflection of light, some bottom returns may be less intense and hard to expose. We therefore included a second thresholding: when only one peak was identified with the first threshold, another derivative thresholding step was introduced to try to detect peaks after the water surface (i.e., the peak already detected). This second threshold had a lower value, which would exacerbate noise if it were used on the whole waveform, but it was adapted to the detection of more attenuated returns when used on the underwater part of the waveform. If no additional return was identified with this first derivative thresholding, we concluded that no seabed was retrieved and discarded the waveform since there was no return to compute features on.
The same correction of the signal's attenuation in water as the one in [29] was applied to bathymetric waveforms; it relied on the fitting of an exponential function on the water column component to compensate for the effects of water absorption and diffusion in water on the bottom return. This was based on the estimation of the diffuse attenuation coefficient [21,22] through the evaluation of the intensity gradient at the end of the surface return. However, since there were mathematical limitations to this approach in very shallow water areas, no correction was applied in depths under 0.125 m since the optimization was impossible on a water column component containing less than two waveform samples (one sample corresponds to 0.063 m in that scenario). In places where depths were smaller than 0.125 m and over land, the attenuation was fixed at 0.
The waveform processing is summarized and illustrated in Figure 7.
Remote Sens. 2022, 13, x FOR PEER REVIEW 12 of 32 ative thresholding step was introduced to try to detect peaks after the water surface (i.e., the peak already detected). This second threshold had a lower value, which would exacerbate noise if it were used on the whole waveform, but it was adapted to the detection of more attenuated returns when used on the underwater part of the waveform. If no additional return was identified with this first derivative thresholding, we concluded that no seabed was retrieved and discarded the waveform since there was no return to compute features on. The same correction of the signal's attenuation in water as the one in [29] was applied to bathymetric waveforms; it relied on the fitting of an exponential function on the water column component to compensate for the effects of water absorption and diffusion in water on the bottom return. This was based on the estimation of the diffuse attenuation coefficient [21,22] through the evaluation of the intensity gradient at the end of the surface return. However, since there were mathematical limitations to this approach in very shallow water areas, no correction was applied in depths under 0.125 m since the optimization was impossible on a water column component containing less than two waveform samples (one sample corresponds to 0.063 m in that scenario). In places where depths were smaller than 0.125 m and over land, the attenuation was fixed at 0.
The waveform processing is summarized and illustrated in Figure 7.

Waveform Features' Extraction
Once all waveform components corresponding to ground or seabed covers were isolated, these intensities time series were converted into pseudo-reflectance series by dividing them with the emitted laser pulse intensity. This allowed us to remove potential bias induced by slightly varying emitted laser intensity. Statistical parameters were then computed on these truncated and normalized waveforms. They were selected based on [27,29,44] and are described in Table 2. Table 2. Name and definition of the features extracted from the green waveforms during processing and used as input variables to the random forest model. Height Difference of altitude between the peak of the first layer of cover and the last peak.

Name Definition
Maximum before correction Maximum pseudo-reflectance of the peak (without attenuation correction) Position of the maximum in the peak Position of the maximum in the peak (in sample indices) The terrain's elevation value was also extracted: for topographic waveforms, it corresponds to the last return's altitude (computed using traditional time of flight range measurement, extracted from the PC). For bathymetric waveforms, it was computed using the depth of the last return identified by our algorithm and withdrew to the altitude of our detected surface return, positioned with the PC. The vertical reference was the IGN 1969 system.
The spectral features computed on the truncated green waveforms, the IR intensities associated with the points and the elevations were then used as predictors for random forest classifications of ground covers over the study area. The 21,000 items of the dedicated dataset were used for the algorithm's training, and the 10,500 items of the testing dataset were used to assess the quality of the predictions.

Random Forest Classification
Contrary to [29], the data were not rasterized but features were directly classified to produce a 3D habitat map so as to avoid information loss. We also relied on a different classifier to predict data labels. A random forest model with 150 trees was used for the classification step. Considering that we wished to apply it to a dataset containing 24.5 million items after fitting, we chose a high number of trees to populate the forest, knowing that more trees theoretically equal to better classification accuracy and that the number of trees needs to be adapted to the complexity of the dataset. We also based our choice on the observation made in [45] on several different datasets that state that past 128 trees in the forest, classification accuracy gains become negligible for each additional tree, compared to computational demands. The maximum tree depth was not fixed so that nodes expanded until the leaves were pure. We relied on impurity to determine whether a leaf has to be split or not using the Gini impurity criterion, which was calculated using the following formula: where p j is the probability of class j. This criterion is close to 0 when the split is optimal. We controlled the generalizability and over-fitting of the model by monitoring the generalization score obtained on random out-of-bag samples at each fitting step. The random forest implementation of the Python library scikit-learn was used.

Comparative Study
Random forest classifications were launched on several sets of predictors that were clustered based on their conceptual similarity. The performance metrics of each group of features were then retrieved. This allowed us to evaluate the contribution of each family of feature to the classification accuracy. The feature sets were the following:

•
Statistical features: mean, median, maximum, standard deviation, variance, amplitude, and total; • Peak shape features: complexity, skewness, kurtosis, area under curve, time range, and height; • Lidar return features: diffuse attenuation coefficient estimated value, maximum, maximum before attenuation correction, position of the maximum in the peak, and associated IR intensity; • Green spectral features: all features extracted from the green waveforms, except elevation (which is later referred to as Z).
We also performed importance variable contribution analysis by dropping green waveform features one by one and computing the classification accuracy difference between the reduced set of 15 predictors and the full 16 attributes.

Results' Quality Assessment
Classification performances were assessed by considering the values obtained on the test dataset by the random forest classifier for the following metrics: overall accuracy (OA, ratio of correct predictions, best when its value is 1), precision (fraction of correct predictions among each ground truth classes, best when its value is 1), recall (fraction of correct estimation for each predicted classes, best when its value is 1), and F-score (combination of precision and recall, best when its value is 1). The precision, recall, and F-score were computed for each label, and their unweighted mean was used to compare the results obtained. Confusion matrixes presenting the producer's accuracies (PA) were also created to analyze the performances of classification on each of the 21 classes.

Spatialization of the Random Forest Predictions
Although coordinates were not used during classification, the arrays of features were geolocated with the position of the waveform's last echo, as illustrated by Figure 5. To visualize our classification results as PCs, we associated the predictions' vector to these coordinates. This allowed us to obtain the result under the form of a classified PC. The fact that we did not rasterize our data had the advantage of preserving the spatial density and therefore the spatial resolution, while also avoiding the critical issue of mixed pixels [46].
Each waveform was localized with the planar coordinates of its last return using the PC coordinates. For bathymetric waveforms, this ensured that the effects of refraction of the laser beam on the air/water interface were considered, since the green PC was corrected before data delivery.

Random Forest Classifications' Performances on the Test Dataset
Ten random forest models were trained on the training dataset-one for each configuration of predictors defined in Section 3.7. Their performances were evaluated using four metrics computed on the predictions made on the test dataset. Four of them were then used to predict labels for the complete study area and produce habitat maps under the form of PCs.
Fitting the models to the 21,000 items training dataset took on average 0.4 s. Predictions on the test dataset (which contains 10,500 items) required a mean computing time of 0.2 s. The complete area, representing a total of more than 24.5 million items, was covered in 17 to 18 min. All computations were made on a machine equipped with an AMD Ryzen 9 5900X 12-Core CPU and an NVIDIA GeForce RTX 3080 GPU.
The classification metrics obtained on each set of features defined in Section 3.7. are presented in Table 3. The very close values observed between the four criteria for each feature set are due to the averaging of each metric's value obtained for each classification label. They are also explained by the fact that all classes are balanced. The scores obtained show that our method does not have a systematic tendency to over-or under-estimate some classes.
When using only subsets of green waveform features, the random forest predictions were more often false than they were correct. However, grouping the waveform variable families improved the accuracy by at least 4%. The best configuration-composed of peak shape features and lidar return features-provided a classification accuracy of 69%. Globally, coupling statistical features with peak shapes features tended to result in accuracy loss while using combinations where they were not mixed resulted in classification performance gains. Indeed, the complete set of features obtained through green waveform processing predicted the habitat type with an OA of 56%, while the output of the classification of statistical and lidar return features was correct in 67% of cases (see Table 3).
The addition of IR intensity values to the full set of green waveform parameters improved the OA of 13%, which is 5% above the best OA obtained on a subset of green waveform attributes. The contribution of the elevation to the classification accuracy was even higher: there was a 31.5% gain in OA between the green spectral features' classification and the classification of green spectral features and elevations. Gathering the three sources of information produced the most accurate result, with an OA of 90.5% (Table 3).  Figure 8 illustrates the contribution of each predictor to the accuracy of the classification of green waveform features. It sheds light on the value-added of each attribute computed on the truncated waveforms. This assessment reveals that 9 of the 16 features extracted on each waveform contributed positively to the classification performance. The seven others had a negative impact on the quality of the random forest predictions. They were mostly parameters of the "statistical features" set, although each type of waveform parameter-statistical, peak shape-related, or lidar return-related-was represented in the nine positively contributing attributes. When using only subsets of green waveform features, the random forest predictions were more often false than they were correct. However, grouping the waveform variable families improved the accuracy by at least 4%. The best configuration-composed of peak shape features and lidar return features-provided a classification accuracy of 69%. Globally, coupling statistical features with peak shapes features tended to result in accuracy loss while using combinations where they were not mixed resulted in classification performance gains. Indeed, the complete set of features obtained through green waveform processing predicted the habitat type with an OA of 56%, while the output of the classification of statistical and lidar return features was correct in 67% of cases (see Table 3).

Green Waveform Features' Contribution to the Classification Accuracy
The addition of IR intensity values to the full set of green waveform parameters improved the OA of 13%, which is 5% above the best OA obtained on a subset of green waveform attributes. The contribution of the elevation to the classification accuracy was even higher: there was a 31.5% gain in OA between the green spectral features' classification and the classification of green spectral features and elevations. Gathering the three sources of information produced the most accurate result, with an OA of 90.5% (Table 3). Figure 8 illustrates the contribution of each predictor to the accuracy of the classification of green waveform features. It sheds light on the value-added of each attribute computed on the truncated waveforms. This assessment reveals that 9 of the 16 features extracted on each waveform contributed positively to the classification performance. The seven others had a negative impact on the quality of the random forest predictions. They were mostly parameters of the "statistical features" set, although each type of waveform parameter-statistical, peak shape-related, or lidar return-related-was represented in the nine positively contributing attributes.  An in-depth analysis of the four last random forest experiments is provided in the following sections.

Land-Water Continuum Habitat 3D Modelling
By running the waveform processing algorithm and the classifiers on the whole study area, we obtained a 21-class semantic segmentation of our complete bathymetric lidar dataset.
As expected, when dealing with PCs and not rasters, the results were noisy, but some areas had lower speckle than others. The observable ratio between information and noise gradually improved with the addition of IR intensity and elevations. The best output was obtained by combining green waveform features, IR intensities, and elevations; it is presented in Figure 9. The other results are presented in Appendix A (Figures A1-A3).
Remote Sens. 2022, 13, x FOR PEER REVIEW 17 of 32 An in-depth analysis of the four last random forest experiments is provided in the following sections.

Land-Water Continuum Habitat 3D Modelling
By running the waveform processing algorithm and the classifiers on the whole study area, we obtained a 21-class semantic segmentation of our complete bathymetric lidar dataset.
As expected, when dealing with PCs and not rasters, the results were noisy, but some areas had lower speckle than others. The observable ratio between information and noise gradually improved with the addition of IR intensity and elevations. The best output was obtained by combining green waveform features, IR intensities, and elevations; it is presented in Figure 9. The other results are presented in Appendix A ( Figures  A1-A3).  The main strength of this result was the distinction between submerged and emerged domains: except for boat false-positives on the water surface, marine classes were rarely detected over land, and vice versa. Globally, this map showed better definition of the considered habitats than the others (see Figures A1-A3), and fewer classification errors, although improvements on some classes coexisted with poorer performance on other classes. Figures 10-13 gave a more detailed insight into the classification in specific areas. Figure 10 but showed cases of overdetection. In the urban area developed on the sandy dune (southeast of Figure 10), tar, rooves, trees, and shrubs were distinguished precisely, but sandy dune vegetation, shrubs, and trees were sometimes confused for rooves or tar.
A weaker aspect of this result was the classification of the sand beaches: patches of wet sand were well defined, but the borders between wet and dry sand featured many false detections of pebble. Confusion between pebble and sand was high, as shown in the wide area of sand beach that is classified as pebble in the northeast in Figure 10.  Figure 11 presented a closer look at the classification obtained in the salt marsh area. The three types of salt marsh distinguished-low, mid, and high marsh-appeared to be well identified. The classification of the salt marsh channels was less correct: instead of wet sand and submerged sand, the classifier predicted rock and submerged rock in various areas.

Seagrass Meadow Classification
In Figure 12, the focus was set on the seagrass meadow located in the north of the study site. Emerged and submerged parts of the rocky island were well mapped, even though submerged rock was detected in places where the seabed is sandy. The two types   Figure 12 also showed the precision of the classification of boats: boats mooring in the seagrass meadow were correctly labelled even though no training or test data were collected in that area for the boats class.

Confusion Matrix Obtained with Green Waveform Features, Infrared Intensities and Elevations on the Test Dataset
The confusion matrix obtained using green waveform features, IR intensities, and elevations on the test dataset is presented in Figure 13. It confirmed the observations made on the visual results. All classes were predicted with at least 70% of correctness. The most frequent confusions were between seagrasses and macroalgae, deciduous and evergreen trees, and submerged rock and submerged sand, which corroborated the observations made on the application of the model to the broader dataset.    Figure 12 also showed the precision of the classification of boats: boats mooring in the seagrass meadow were correctly labelled even though no training or test data were collected in that area for the boats class.

Confusion Matrix Obtained with Green Waveform Features, Infrared Intensities and Elevations on the Test Dataset
The confusion matrix obtained using green waveform features, IR intensities, and elevations on the test dataset is presented in Figure 13. It confirmed the observations made on the visual results. All classes were predicted with at least 70% of correctness. The most frequent confusions were between seagrasses and macroalgae, deciduous and evergreen trees, and submerged rock and submerged sand, which corroborated the observations made on the application of the model to the broader dataset.

Urban Area and Sand Beach Classification
As Figure 10 shows, the detection of planar, highly reflective surfaces such as tar and concrete was accurate, but their specific type was sometimes misidentified. Although the pier located north of Figure 10 was correctly mapped as concrete, there was confusion between soil and tar on the sandy dune (southeast of the same figure).
The classification of roof was satisfactory both west and east of the area presented in Figure 10 but showed cases of overdetection. In the urban area developed on the sandy dune (southeast of Figure 10), tar, rooves, trees, and shrubs were distinguished precisely, but sandy dune vegetation, shrubs, and trees were sometimes confused for rooves or tar. A weaker aspect of this result was the classification of the sand beaches: patches of wet sand were well defined, but the borders between wet and dry sand featured many false detections of pebble. Confusion between pebble and sand was high, as shown in the wide area of sand beach that is classified as pebble in the northeast in Figure 10. Figure 11 presented a closer look at the classification obtained in the salt marsh area. The three types of salt marsh distinguished-low, mid, and high marsh-appeared to be well identified.

Salt Marsh Classification
The classification of the salt marsh channels was less correct: instead of wet sand and submerged sand, the classifier predicted rock and submerged rock in various areas.

Seagrass Meadow Classification
In Figure 12, the focus was set on the seagrass meadow located in the north of the study site. Emerged and submerged parts of the rocky island were well mapped, even though submerged rock was detected in places where the seabed is sandy. The two types of underwater vegetation we attempted to map were very precisely defined: the patches of seagrass meadow and macroalgae were mapped with very low confusion with submerged sand or rock. However, the type of underwater vegetation (macroalgae or seagrass) was not correct in all places. There were classification errors in the seagrass meadow, where macroalgae was detected. Seagrasses were also found in macroalgae-covered areas. Figure 12 also showed the precision of the classification of boats: boats mooring in the seagrass meadow were correctly labelled even though no training or test data were collected in that area for the boats class.

Confusion Matrix Obtained with Green Waveform Features, Infrared Intensities and Elevations on the Test Dataset
The confusion matrix obtained using green waveform features, IR intensities, and elevations on the test dataset is presented in Figure 13. It confirmed the observations made on the visual results. All classes were predicted with at least 70% of correctness. The most frequent confusions were between seagrasses and macroalgae, deciduous and evergreen trees, and submerged rock and submerged sand, which corroborated the observations made on the application of the model to the broader dataset.
The other confusion matrixes can be found in Appendix B (Figures A4-A6).

Discussion
We improved an approach initially designed to distinguish two seabed covers-fine sediment and seagrass-to map all land and sea covers present in our study scene in 3D. The findings showed lidar waveforms can be used to classify and map habitats of the coastal fringe and bridge the gap between marine and terrestrial surveys. All 21 selected classes were classified with at least 70% of accuracy in the best configuration obtained, which had an OA of 90%. Here, we discuss the results obtained regarding the classification predictors and the methodology employed. We also provide potential explanations for the performances of the algorithm.

Green Waveform Features
Our research partially aimed at exploring whether green lidar waveforms can be relevant for coastal habitat mapping. We defined 16 features to extract from the portions of the waveforms that correspond to layers of ground or seabed covers. These were efficiently retrieved both on land and underwater. However, our approach did not handle extremely shallow waters, where the surface component and the bottom return overlap in the waveforms. In these cases, the peak detection employed did not distinguish the seabed from the water surface and no features were retrieved. There was consequently a 24 m wide band without data in the surf zone on the sand beaches in our processed lidar dataset.
We also noticed cases of confusion between seabed return and noise in the water column component of the waveform, which resulted in a mis-located detected seabed. These issues could be handled by improving the way the different waveform components are isolated: using waveform decomposition [31] or deconvolution [47,48] could produce better results on that aspect.
The features defined to describe the spectral signatures of coastal habitats seemed to be equally relevant for land and sea covers mapping. However, they did not provide a highly accurate classification (56% of OA). This can be explained by analyzing the green waveforms obtained with the HawkEye III on land. Since this sensor was particularly designed for bathymetry extraction, its green lasers are set to be powerful enough to reach the seabed up to several dozens of meters in coastal waters. Over land, the laser power is so high that most of the waveforms originating from highly reflective surfaces are saturated. The green wavelength alone might consequently not encompass a fine enough range of intensities over land to allow separation of similar environments such as plane habitats, different types of herbaceous vegetation, etc. The shapes of the saturated waveform returns are also affected: there is lacking information on the shape of the peak around its maximum. This can explain why there was a lot of confusion between topographic habitats when using green waveforms only.
Though green information alone may not be enough to distinguish the 21 habitats accurately, our findings suggested that a finer selection of the waveform attributes used for classification could enhance the green waveform feature predictions. The results presented in Table 3 and Figure 8 revealed negative interactions between some of the features chosen. Combining the full sets of statistical and peak shape features (defined in Section 3.7) resulted in lower accuracy than using them separately. Furthermore, the predictors' contribution assessment ( Figure 8) showed that out of the 16 predictors, only nine contributed positively to the classification accuracy. This might be due to information redundancy between features relying on similar concepts such as mean and median intensity, for example. It could be due to the correction of attenuation performed on bathymetric waveforms. This exponential correction produced extremely high values of backscattered intensities under water, which made little sense physically. On the other hand, topographic waveforms were not corrected: their typical intensity order of magnitude was several times smaller, which might have disrupted the classifier. Fixing the issue of attenuation correction and using only a selected set of waveform predictors based on an assessment of their contribution would certainly result in better results when using the green wavelength alone.
The three different types of waveform features appeared to be complimentary: the nine predictors with a positive influence on the OA (Figure 8) represented each feature family defined in Section 3.7. This was consistent: the shape of the waveform return is characteristic of its nature, and the complexity, length, shape, maximum, and position of its maximum sum up the essential information differentiating one waveform from another.

Infrared Data
The addition of the IR wavelength increased the OA by 13%. The classification results certainly benefited from IR light's interaction with water and chlorophyll pigments, which provided essential information for the labelling of vegetation and other topographic classes such as wet sand. Considering that the green wavelength was less adapted to land cover classification, the performance increase obtained by using both lasers was expected.
Our research showed the added value of topobathymetric lidar: on top of providing quasi-continuity between land and water, both wavelengths provided complementary information for land covers' classification. The IR PC alone could not provide a coastal habitat map since it did not reach the seabed and riverbed; the OA obtained using only this wavelength (24%, presented in Table 3) confirmed that. They also showed that green lidar features alone do not provide a sufficient basis for classification either, reaching an OA of only 56%. Coupling both wavelengths improved the overall result significantly, bringing together the strengths of IR data on land covers, and the ability of green lidar to penetrate the water surface. The matrixes presented in Figures A4 and A5 show that the addition of IR intensities to green waveform features resulted in an accuracy increase for all but two of the 21 classes. The gain ranged between 0% (submerged sand) and 39% (tar). The minimum accuracy observed over the 21 labels rose from 21% to 29%. Water covers classes such as seagrass and submerged rock also benefited from the addition of IR intensities, as their accuracy showed an improvement of 1% and 6%, respectively. As HawkEye III is tailored for bathymetry, its green laser's power was set to be high, which resulted in saturated intensities over land. The IR channel provided complimentary information in places when the green channel was weaker, which partially explains the algorithm's performances we observe. The classification accuracies obtained for land covers confirm that; for example, the classification of soil, wet sand, and lawn was significatively enhanced: +28%, +34%, and +23% of OA, respectively. Our future work will focus on exploiting both IR and green waveforms for habitat classification, to maximize the accuracy attainable with full-waveform topobathymetric lidar.

Ground and Seabed Elevation
Similarly to the IR data, elevations contributed greatly to the improvement of the OA of the habitat classification, but could not be separated from spectral predictors and used alone to provide accurate detection of land and sea covers. Indeed, Table 3 revealed that elevations produced a classification with an OA of 55%, which is less than what can be achieved with spectral predictors. This was expected since many different habitats coexist at similar altitudes and are mainly differentiated by their reflectance. However, for some others, mainly salt marsh types, elevation is an intrinsic quality and the base of their definition. This explains why those are the type of classes that benefited the most from the addition of elevation to a set of spectral predictors in terms of classification accuracy. Respectively, the classification accuracy of low salt marsh, mid salt marsh, and high salt marsh rose from 71%, 76%, and 21% by adding elevations to the green waveform features in the set of predictors (see Figures A4 and A6).
Even though elevation combined with spectral information already provided high classification accuracy (90%, see Table 3), the values extracted with our approach were not always consistent with those provided by the original PCs in marine areas, as explained above. To remove artifacts due to water quality, a post-processing step could be implemented, and the neighboring elevations could be used to regularize the processed PC obtained.

Classification Approach
Our results showed that topobathymetric lidar is fitted to the classification of coastal habitats. Although elevations, IR intensities, and green waveform features did not produce high accuracy classifications of the land-water continuum, they were complimentary and achieved high-precision results when combined. To the best of our knowledge, no similar papers proposing point-based land and water covers mapping from bispectral lidar data were published, so no direct comparisons of results are possible. However, our observations corroborate those made in [35], which successfully used random forest algorithms to classify full-waveform lidar data over urban areas and obtained an overall accuracy of over 94% when identifying four types of land covers. This paper only focuses on terrestrial areas but confirms the high accuracy we observe when using waveform features without rasterization for mapping purposes. Class-wise, our results seem more homogeneous for the land covers we have in common, although this means that our approach performs less accurately than theirs on some urban classes. Indeed, ref. [35] presents a PA of 94.8% for buildings, which is higher than we observe on our roof class (89%), but our vegetation classes (trees and shrubs) have an average PA of 82.6%, while theirs is 68.9%, and our natural ground classes (soil, lawn, salt marsh) reach an average PA of 91.6%, higher than the 32.7% presented in [35]. Our approach and the method introduced in [35] perform similarly on artificial ground (for us, tar and concrete), with PAs of 96% and 96.4%, respectively.
Although we found no other research performing point-based classification of subtidal, intertidal, and supra-tidal habitats, we can compare our findings to those in [34], where the authors also observed that the use of waveform data improves seabed maps and obtained an OA of 86% for their classification of seabed substrates and aquatic macrovegetation. Their approach provides a better mapping of low underwater vegetation on soft substrate (100% versus 85% of PA in our case) but a less accurate detection of hard seabed substrate (68% versus 90% of PA in our study). Again, although we have less accurate results for some classes, our method seems to provide more balanced and homogeneous performances among different classes.
Our results also confirm those from [27], where 19 land-water continuum habitats were classified with an OA of 90%, and in which the authors concluded that the best classification results were obtained when combining spectral information and elevation. However, in [27], the authors used digital models of waveform features that they obtained by rasterizing their data, and they relied on an ML classifier. Although our metrics are similar, our classification has the advantage of preserving the spatial density and repartition of the data.
Other studies such as [49][50][51] used 2D lidar-derived data and imagery along with machine learning classifiers to map similar coastal habitats as the ones we attempted to map. They obtained performance scores in the same range as ours, with OA between 84% and 92%. The authors did not use waveform data in these studies and observed low accuracy when classifying only digital elevation models obtained with lidar surveys, therefore requiring the additional processing of imagery. Our approach has the advantage of requiring only one source of data out of the two sources often used in existing literature, which facilitates both acquisition procedures and processing.
Globally, our results are in line with [27,35,52], which all state that bathymetric lidar waveforms are well suited for benthic habitats mapping and observe the same complementarity between spectral and elevation information for habitat mapping. Our method offers an OA similar to existing research in lidar data classification for habitat mapping, while extending the application to a wider range of habitats-both marine and terrestrial-and avoiding information loss through rasterization. Although the PAs obtained for some classes are lower than results previously presented in other studies [34,35], this method also has the advantage of offering homogeneous performances and low inter classes PA differences, contrary to other existing research results [34,35].
The random forest models trained showed low overfitting, as the extended application results illustrated. The classification of boats located outside of the training and test data collection area, for example, illustrated that the classifiers obtained could be applied to other datasets accurately. Natural, semi-natural and anthropic habitats were well distinguished, and vegetation was precisely isolated, which opened perspectives for ecological assessments of those coastal areas. The remaining errors often involved classes that were close semantically. For example, there was confusion between salt marsh and high natural grasses but low confusion between lawns and low marsh. A potential improvement could be to review the classes defined initially and distinguish vegetation by layers (herbaceous, arbustive, arborescent) and by their natural, semi-natural, or anthropic nature.
Besides the quality of the training and test datasets established, a source of explanation for remaining classification errors could be found in the technical specifications of the sensor. The diameter of the HawkEye III's green laser's footprint is 1.80 m, which means that the returned waveform condensates information in a 2.5 m 2 area. This parameter may have had an influence on the ability of a given array of features to describe pebble or sand, mostly at interfaces between habitats. This could partially explain the confusion between deciduous and evergreen trees: in a mixed-species forest, two different trees can coexist in a 2.5 m 2 patch.
Although the main issues identified visually reflected in the metrics computed on the test dataset, there was a gap between the estimated quality of the map and the statistics computed. For example, the classification of portions of sandy beaches as pebble was not as obvious in the confusion matrix as it is in the map. This showed the influence of the way the test dataset is built. Further work should try to incorporate validation maps in addition to test datasets to qualify the output on the complete study site. A finer tree species inventory could also be integrated to better assess the results obtained.
Nonetheless, our results highlighted a strong classification approach, leveraging the strengths of 3D bispectral data. Working at the PC scale and not in 2D opens perspectives for 3D classifications, identifying all layers of land and sea cover, mostly in vegetated areas, by using waveform segmentation instead of waveform PC segmentation, as experimented in [53]. It also shows possibilities for post-processing and neighbor-based result regularization, as well as the exploitation of spatial information through the addition of geometrical features such as roughness or local density. Lastly, the accurate classification of habitats through 3D data offers opportunities for structural ecology assessments and communication of these results to environmental managers through virtual reality or more relatable 3D visualizations, for the implementation of sustainable integrated management of coastal areas. Figure 14 provides a 3D view of the 3D habitat mapping achieved in the present study.

Conclusions
In this article, we proposed an approach to map coastal habitats exclusively using topobathymetric lidar, including both full waveforms and reanalyzed echoes. We produced results under the form of PCs and extended the application of our best classifier-which obtained 90% of OA on the test dataset-to a dataset of 24.5 million points covering a very diverse coastal area. A total of 21 classes of land and sea covers were defined and mapped in 3D. We found that green waveforms and IR intensities complement each other: while green data provided strong results in submerged areas, the IR wavelength improved the distinction of land covers. Elevations further increased the classification accuracy by perfecting the classification of plane classes and classes, such as low, mid, and high salt marsh, which were principally differentiated by their elevation. It is of special interest to note that green waveforms alone produced better results than IR intensities or elevations alone. However, the combination of the three sources of information yielded the best result, highlighting that they each bring a specific contribution to the result. Our research showed how fit topobathymetric lidar is to the classification of such numerous land-water interface habitats. We enhanced a waveform processing method to apply it to topo-bathymetric environments. The use of PCs instead of rasters and the addition of a second wavelength provided an original 3D map of 21 coastal habitats at very high spatial resolution. This provides encouraging perspectives for 3D

Conclusions
In this article, we proposed an approach to map coastal habitats exclusively using topobathymetric lidar, including both full waveforms and reanalyzed echoes. We produced results under the form of PCs and extended the application of our best classifier-which obtained 90% of OA on the test dataset-to a dataset of 24.5 million points covering a very diverse coastal area. A total of 21 classes of land and sea covers were defined and mapped in 3D. We found that green waveforms and IR intensities complement each other: while green data provided strong results in submerged areas, the IR wavelength improved the distinction of land covers. Elevations further increased the classification accuracy by perfecting the classification of plane classes and classes, such as low, mid, and high salt marsh, which were principally differentiated by their elevation. It is of special interest to note that green waveforms alone produced better results than IR intensities or elevations alone. However, the combination of the three sources of information yielded the best result, highlighting that they each bring a specific contribution to the result. Our research showed how fit topobathymetric lidar is to the classification of such numerous land-water interface habitats. We enhanced a waveform processing method to apply it to topo-bathymetric environments. The use of PCs instead of rasters and the addition of a second wavelength provided an original 3D map of 21 coastal habitats at very high spatial resolution. This provides encouraging perspectives for 3D mapping and ecological assessment of the landwater interface and paves the way to integrated management of coastal areas, bridging the gap between marine and terrestrial domains.

Data Availability Statement:
Data sharing is not applicable to this article.

Acknowledgments:
The authors are grateful to Airborne Hydrography AB (AHAB), an affiliate company of Leica Geosystems, for their technical support, their assistance with software and development issues, and their help in understanding the specificities of the sensor and its data. M.L. would like to thank the lab members who contributed to ground truth data acquisition-Hélène Gloria, Dorothée James, Alyson Le Quilleuc, Antoine Mury, and Léna Véle-and external volunteers. M.L. is also grateful to the Rennes 2 University for the field work equipment support. The authors deeply acknowledge the input of the anonymous reviewers, which improved the quality of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A
Appendix A contains the projected 3D maps of the habitats obtained for 3 random forest experiments: the classification of green waveform features only, the classification of green waveform features and IR intensities, and the classification of green waveform features and elevations.
authors deeply acknowledge the input of the anonymous reviewers, which improved the quality of the manuscript.

Conflicts of Interest:
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A
Appendix A contains the projected 3D maps of the habitats obtained for 3 random forest experiments: the classification of green waveform features only, the classification of green waveform features and IR intensities, and the classification of green waveform features and elevations.