Identification of Linear Vegetation Elements in a Rural Landscape Using LiDAR Point Clouds

Modernization of agricultural land use across Europe is responsible for a substantial decline of linear vegetation elements such as tree lines, hedgerows, riparian vegetation, and green lanes. These linear objects have an important function for biodiversity, e.g., as ecological corridors and local habitats for many animal and plant species. Knowledge of their spatial distribution is therefore essential to support conservation strategies and regional planning in rural landscapes, but detailed inventories of such linear objects are often lacking. Here, we propose a method to detect linear vegetation elements in agricultural landscapes using classification and segmentation of high-resolution Light Detection and Ranging (LiDAR) point data. To quantify the 3D structure of vegetation, we applied point cloud analysis to identify point-based and neighborhood-based features. As a preprocessing step, we removed planar surfaces such as grassland, bare soil, and water bodies from the point cloud using a feature that describes to what extent the points are scattered in the local neighborhood. We then applied a random forest classifier to separate the remaining points into vegetation and other. Subsequently, a rectangularity-based region growing algorithm allowed us to segment the vegetation points into 2D rectangular objects, which were then classified into linear objects based on their elongatedness. We evaluated the accuracy of the linear objects against a manually delineated validation set. The results showed high user's (0.80), producer's (0.85), and total accuracies (0.90). These findings are a promising step towards testing our method in other regions and for upscaling it to broad spatial extents. This would allow producing detailed inventories of linear vegetation elements at regional and continental scales in support of biodiversity conservation and regional planning in agricultural and other rural landscapes.


Introduction
The European landscape has dramatically changed during the Holocene as a result of human impact and climatic change [1,2]. Especially since the industrial revolution, landscapes have been deforested and reshaped into rural and agricultural landscapes. These are dominated by a mosaic of grasslands, forests, and urban areas, separated or connected by linear landscape elements such as roads, ditches, tree lines, vegetated lynchets, and hedgerows [3][4][5]. The distribution, abundance, and richness of species in these landscapes is related to the amount, height, length, and quality of linear vegetation elements.

We tested our method on linear vegetation elements in a rural landscape of the Netherlands composed of agricultural fields, grasslands, bare soil, roads, and buildings (Figure 1). Our method provides a promising first step for upscaling the detection of linear vegetation objects in agricultural landscapes to broad spatial extents.

Study Area
The study area is located in a rural landscape in the center of the Netherlands (Figure 1). The area is about 1.6 km from east to west and 1.2 km from north to south, spanning an area of almost 2 million square meters. The area contains numerous linear vegetation elements of varying geometry, ranging from completely straight to curved, and isolated or connected to other linear or nonlinear objects. Examples of vegetation and non-vegetation elements are planted forest patches, hedges, green lanes, isolated farms, ditches, a river, dykes, and a road network (Figure 1). This landscape heterogeneity within a small area ensured that both the classification of vegetation and the segmentation of linear objects could be efficiently trained and tested. The aerial photo [24] in Figure 1 shows various objects identified as agricultural fields, grasslands, bare soil, and infrastructure such as paved and unpaved roads and farmhouses. The numbered photos show a selection of the variety of linear landscape elements, such as (1) green lanes, (2) planted tall tree lines along ditches, (3) low and high shrubs/copse, (4) hedges, and (5) rows of fruit trees and willows.

LiDAR and Orthophoto Data
Raw LiDAR point cloud data were retrieved from "Publieke Dienstverlening op de Kaart" [24], an open geo-information service of the Dutch government. The data are part of the 'Actueel Hoogtebestand Nederland 3' (AHN3) dataset, which was collected between 2014 and 2019. The density of the LiDAR data is around 10 pulses/m², and the data include multiple discrete return values (which can result in effective point densities between 10 and 20 points/m²) as well as intensity data. The dataset is collected in the first quarter of each year, when deciduous vegetation is leafless [25]. Nevertheless, the return signal is sufficiently strong to retrieve a useful scan of the vegetation cover. Freely available very high resolution (VHR) true color orthophotos [24] with a resolution of 25 cm were consulted for location purposes.

Method
The workflow for the classification of linear vegetation objects (Figure 2) consisted of three main routines: 1. Feature extraction, 2. Vegetation classification, and 3. Linear object segmentation. In the first routine, we computed features and added these as attributes to each point in the point cloud. In the second routine, we trimmed unnecessary data points to improve computational efficiency and applied a supervised machine learning classification algorithm [26] to classify the vegetation points in the point cloud, using features based on echo, local geometric, and local eigenvalue information (Table 1). In the third routine, two preprocessing steps were applied, after which a rectangularity-based region growing algorithm was used to segment the classified vegetation points into rectangular objects, and an elongatedness criterion was applied to identify linear vegetation objects. The accuracy of the vegetation classification and linear object segmentation was tested against manually annotated datasets. All data were analyzed using free and open source software. The scripting was performed in Python (3.6.5) using the NumPy (1.14.2) [27], SciPy (1.1.0) [28], pandas (0.22.0) [29], scikit-learn (0.19.1) [30], and CGAL (4.12) [31] libraries. PDAL (1.7.2) [32] was used for preprocessing and downsampling data. CloudCompare (v2.10alpha) [33] was used for visualizing the point cloud and for the manual classification. The full code is available via GitHub: https://github.com/chrislcs/linearvegetation-elements/tree/master/Code.

Feature Extraction
The relevance of the various input features has been extensively studied, especially for separating urban objects from vegetation [34][35][36][37]. After reviewing the relevant literature, we selected and computed fourteen features (Table 1), which were used for vegetation classification in the second routine of the workflow. These represent both point-based and neighborhood-based features, reflecting information from the echo and from local neighborhoods (geometric and eigenvalue based), respectively. These features are considered efficient for discriminating vegetation objects in point clouds [34]. The features were computed for the total extent of the study area (Figure 1) and added as attributes to form a point cloud dataset with features.

Table 1. Overview of point-based and neighborhood-based features used in the vegetation classification. The point-based features are based only on return number information stored with each point (i.e., echo information), whereas neighborhood-based features are based on the local geometry and eigenvalue characteristics derived from the x, y and z coordinates of the point cloud.

Point-Based Features
The point-based features represent information from each single point (Table 1). The point cloud P is a set of points {p_1, p_2, ..., p_n} ⊂ ℝ³, where each point p_i has x, y and z coordinates. In addition, an intensity value (I), a return number (R), and the number of returns (R_t) of the returned signal are stored. We used R_t as well as the normalized return number R_n as echo-based features (Table 1). R_n highlights vegetation, since vegetation is characterized by multiple returns [35]. Since the LiDAR data were delivered without the information, such as flying height, plane trajectory data, and sensor-related parameters, required for a radiometric correction of the intensity data, we omitted this feature from the classification [40].
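The echo features can be sketched as follows; this is a minimal illustration assuming the common definition R_n = R / R_t, with hypothetical array inputs (the paper's exact implementation may differ):

```python
import numpy as np

def echo_features(return_number, num_returns):
    """Point-based echo features: the number of returns R_t and the
    normalized return number R_n = R / R_t (Table 1)."""
    r = np.asarray(return_number, dtype=float)
    r_t = np.asarray(num_returns, dtype=float)
    r_n = r / r_t  # 1.0 for last/only returns, < 1 for intermediate echoes
    return r_t, r_n

# Vegetation typically produces multiple returns per pulse, so R_t > 1
# and intermediate echoes have R_n < 1, while hard surfaces give R_t = 1.
r_t, r_n = echo_features([1, 2, 1], [3, 3, 1])
```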

Neighborhood-Based Features
We then computed local-neighborhood features for each point. A neighborhood set N_i of points {q_1, q_2, ..., q_k} was defined for each point p_i, where q_1 = p_i, using the k-nearest neighbors method with k = 10 points. We used a spherical neighborhood search with a fixed number of points instead of a fixed radius to calculate the features in Table 1, because of the homogeneous point density across our study area [23,39]. In this way, k = 10 results in a neighborhood of 10 points, one of which is the focal point itself. Based on these neighborhoods, we then computed four geometric features: height difference, height standard deviation, local radius, and local point density (Table 1).
In addition to the geometric features, we calculated eight eigenvalue-based features (Table 1) that describe the distribution of the points of a neighborhood in space [34,41]. We used the local structure tensor to estimate the surface normal and to define the surface variation [39]. The structure tensor describes the principal directions of the neighborhood of a point: it is obtained by determining the covariance matrix of the x, y and z coordinates of the set of neighborhood points and by computing and ranking the eigenvalues (λ1, λ2, λ3, where λ1 > λ2 > λ3) of this covariance matrix. The magnitude of each eigenvalue describes the spread of the points in the direction of the corresponding eigenvector. The eigenvector belonging to the third eigenvalue is equal to the normal vector N = (N_x, N_y, N_z) [39]. The points are linearly distributed if the eigenvalue of the first principal direction is significantly larger than the other two (λ1 >> λ2 ≈ λ3), and planarly distributed if the eigenvalues of the first two principal directions are about equal and significantly larger than the third (λ1 ≈ λ2 >> λ3). The points are scattered in all directions if all eigenvalues are about equal (λ1 ≈ λ2 ≈ λ3). These properties (linearity, planarity, and scatter), as well as some additional features (omnivariance, eigenentropy, sum of eigenvalues, and curvature), were quantified using the formulas in Table 1.
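As a sketch of this structure-tensor computation, the following illustrates four of the eigenvalue features using the standard formulations from the literature [34]; the function name and the choice of SciPy's cKDTree are ours, not necessarily the paper's implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def eigenvalue_features(points, k=10):
    """Per-point eigenvalue features from the local structure tensor of the
    k-nearest-neighbor neighborhood (the focal point is its own first neighbor)."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)          # idx[:, 0] is the point itself
    feats = np.empty((len(points), 4))
    for i, nbrs in enumerate(idx):
        nbhd = points[nbrs]
        cov = np.cov(nbhd.T)                  # 3x3 covariance of x, y, z
        lam = np.sort(np.linalg.eigvalsh(cov))[::-1]   # λ1 ≥ λ2 ≥ λ3
        l1, l2, l3 = lam
        feats[i] = (
            (l1 - l2) / l1,   # linearity
            (l2 - l3) / l1,   # planarity
            l3 / l1,          # scatter
            l3 / lam.sum(),   # curvature (surface variation)
        )
    return feats
```

For a tree line, the neighborhoods of canopy points yield high scatter, while points on a grass field yield high planarity and near-zero scatter.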

Vegetation Classification
In a first preprocessing step, we removed irrelevant points, i.e., those that do not belong to tall vegetation (shrubs and trees). Subsequently, we applied a supervised classification in which the two echo, four geometric, and eight eigenvalue features (Table 1) were used as input, after which we calculated metrics to assess the accuracy.

Preprocessing
Points that did not belong to tall vegetation were removed from the dataset to facilitate efficient processing. This was done by trimming data points that belonged to non-vegetation, bare soil, or low-stature vegetation (including grasslands and agricultural fields), which simplified the identification of tall linear vegetation elements composed of shrubs and trees. The removed points are characterized by a locally planar neighborhood and were selected on the basis of the scatter feature (Table 1): points with low scatter values (S_λ < 0.03) were removed. This threshold is very conservative, but it substantially reduced the size of the point data while still preserving all points that characterize tall vegetation.
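A minimal sketch of this trimming step, assuming the scatter feature S_λ has already been computed per point (the function name is hypothetical):

```python
import numpy as np

def trim_planar(points, scatter, threshold=0.03):
    """Conservative trimming on the scatter feature: locally planar points
    (grass, bare soil, water) have near-zero scatter, while shrub and tree
    canopies scatter returns in all directions. Returns the kept points and
    the boolean mask used."""
    keep = np.asarray(scatter) >= threshold
    return points[keep], keep
```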

Supervised Classification
For vegetation classification, we used a random forest classifier because it provides a good trade-off between classification accuracy and computational efficiency [23,42]. The random forest algorithm creates a collection of decision trees, where each tree is based on a random subset of the training data [43]. Random forest parameters such as the maximum number of features, minimal samples per leaf, minimal samples per split, and the ratio between minority and majority samples were optimized using a grid search [44].
As a result of the data trimming, the remaining point cloud contained many more vegetation points than other points. Since imbalanced training data can lead to undesirable classification results [45], we used a balanced random forest algorithm. In this algorithm, the subsets are created by taking a bootstrap sample from the minority class and a random sample from the majority class of a size similar to the minority class [46]. By employing enough trees, all majority class data are eventually used, while still maintaining a balance between the two classes. The decision trees were created using the Classification and Regression Tree (CART) algorithm [47].
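A hedged sketch of such a grid search with scikit-learn: the parameter grid is illustrative, and class_weight="balanced_subsample" only approximates the balanced random forest of [46] (the BalancedRandomForestClassifier in the imbalanced-learn package follows it more closely):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical parameter grid; the values searched in the paper may differ.
param_grid = {
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": [1, 5],
    "min_samples_split": [2, 10],
}

# class_weight="balanced_subsample" reweights classes within each bootstrap
# sample so the minority class is not swamped by the majority class.
rf = RandomForestClassifier(n_estimators=50,
                            class_weight="balanced_subsample",
                            random_state=0)
search = GridSearchCV(rf, param_grid, scoring="roc_auc", cv=3)
```

After calling search.fit(X, y), search.best_params_ holds the selected parameter combination and search.best_score_ its cross-validated AUC.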

Accuracy Assessment
For the accuracy assessment of the vegetation classification, the trimmed point cloud was manually annotated into vegetation (trees and shrubs) and other (buildings, ditches, railroad infrastructure) classes, using an interpretation of both the point cloud and the high-resolution aerial photos. This resulted in a ground truth dataset of 997,085 vegetation points and 56,170 other points. To take the dataset imbalance into account, we used the receiver operating characteristic (ROC) curve [48], the Matthews correlation coefficient (MCC) [49], and the geometric mean [50] as accuracy metrics. These metrics properly evaluate the performance of a classifier, even when dealing with an imbalanced dataset [51][52][53]. The area under a ROC curve (AUC) is a measure of the performance of the classifier [48]. To create a ROC curve, the true positive (TP) rate is plotted against the false positive (FP) rate at various decision thresholds. The MCC analyzes the correlation between the observed and the predicted data and is defined as:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)),

where TN are the true negatives and FN the false negatives. The geometric mean is defined as:

G-mean = √(TP/(TP + FN) × TN/(TN + FP)).

The MCC, AUC, and geometric mean were obtained using a 10-fold cross validation, i.e., by splitting the data into 10 random, mutually exclusive subsets and using each subset in turn as testing data for a classifier trained on the remaining data [51].
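These metrics can be computed as in the following minimal sketch, using scikit-learn's built-in MCC and AUC together with the geometric mean derived from the confusion matrix (the function name is ours):

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef, roc_auc_score, confusion_matrix

def imbalance_aware_metrics(y_true, y_pred, y_score):
    """MCC, AUC, and geometric mean for a binary vegetation/other classification.
    y_pred holds hard labels; y_score holds the classifier's probabilities
    for the positive class (needed for the ROC curve)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    # G-mean: square root of (TP rate * TN rate), robust to class imbalance
    g_mean = np.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))
    return matthews_corrcoef(y_true, y_pred), roc_auc_score(y_true, y_score), g_mean
```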

Linear Object Segmentation
For the linear object segmentation, we applied two preprocessing steps, clustered the points, applied a region growing algorithm, merged nearby and aligned objects, evaluated their elongatedness, and finally assessed the accuracy (Figure 2).

Preprocessing
Since linearity is used here as a purely two-dimensional property (Table 1), we first converted the point cloud to 2D by removing the z-coordinate of the vegetation points. In a second step, the data were spatially down-sampled to a 1-meter distance between vegetation points using Poisson sampling. This step preserved precision but substantially improved computational efficiency. After down-sampling, we clustered the remaining points using the density-based spatial clustering of applications with noise (DBSCAN) algorithm [54]. This algorithm generates clusters based on point density and removes outliers. The clustering reduced the number of possible neighboring points and therefore decreased the processing time of the subsequent region growing step.
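The clustering step can be sketched with scikit-learn's DBSCAN; the eps and min_samples values here are illustrative, not the paper's settings:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_points(xy, eps=2.0, min_samples=5):
    """Cluster 2D vegetation points (after dropping z and down-sampling).
    DBSCAN labels low-density points as -1; those outliers are discarded,
    and each remaining cluster is returned as its own point array."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(xy)
    return {lab: xy[labels == lab] for lab in set(labels) if lab != -1}
```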

Rectangularity-Based Region Growing
Region growing is an accepted way of decomposing point clouds [55,56] into homogeneous objects. Normally, regions are grown based on the similarity of point attributes. Here, we used an alternative approach and grew regions based on a rectangularity constraint (Figure 3). The rectangularity of an object is defined as the ratio between the area of the object and the area of its minimum oriented bounding box (MOBB) [57]. The MOBB is computed using rotating calipers [58]: first, a convex hull is constructed using the QuickHull algorithm [59]; the MOBB is then found by rotating the system by the angles between the edges of the convex hull and the x-axis and checking the bounding rectangle of each rotation, because the MOBB has a side collinear with one of the edges of the convex hull [60]. The area of the object is calculated by computing the concave hull of the set of points belonging to the object. This hull is constructed by computing an alpha shape of the points [61], which is created by computing a Delaunay triangulation [62] and removing the triangles with a circumradius larger than 1/α, where α is a parameter that influences the number of triangles removed from the triangulation.

For each cluster, the point with the minimum x-coordinate and its ten closest neighbors were used as the starting region. Subsequently, the eight nearest neighbors of each point were considered for growth (Figure 3). Points were added as long as the region's rectangularity did not drop below a threshold of 0.55. We determined this threshold value on a subset of the data, which showed that the algorithm performed best between 0.5 and 0.6, with only marginal differences within that range. After a region was grown, the growing procedure was repeated for the next region, until the entire cluster was segmented into rectangular regions.
Step a shows how a current region (in green) is grown when the ratio between the concave hull and the bounding box is above the threshold of 0.55.
Step b considers a next point (in red) for inclusion in the region, which is accepted.
Step c considers a next point that does not meet the rectangularity constraint of 0.55 and is therefore not added to the region.
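A simplified sketch of the rectangularity measure: for brevity it uses Shapely's rotating-calipers minimum rotated rectangle and substitutes the convex hull area for the alpha-shape (concave hull) area used in the paper, so it will overestimate rectangularity for strongly concave objects:

```python
from shapely.geometry import MultiPoint

def rectangularity(points_xy):
    """Ratio between the area of a 2D point set and the area of its minimum
    oriented bounding box. NOTE: the convex hull stands in here for the
    alpha-shape area of the paper, which differs for concave objects."""
    mp = MultiPoint([tuple(p) for p in points_xy])
    hull = mp.convex_hull                     # object area (approximation)
    mobb = mp.minimum_rotated_rectangle       # rotating-calipers MOBB
    return hull.area / mobb.area
```

During region growing, a candidate point is accepted only if this ratio, recomputed for the enlarged region, stays at or above the 0.55 threshold.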

Object Merging
After region growing, some objects may be isolated or very small, for example as a result of minor curves in the linear elements or small interruptions in the vegetation. These objects were merged if they were within 5 m of another object, faced a similar compass direction, and were aligned. The compass direction was determined by computing the angle between the long sides of the minimum bounding box and the x-axis. The alignment of two objects was checked by comparing the angle of the line between their two center points with the compass directions of the objects. Once merged, the lengths of the objects were combined, and the maximum of the two object widths was taken as the new width.

Elongatedness
All objects were assessed for linearity by evaluating their elongatedness, which is defined as the ratio between an object's length and width [63]. We used a minimum elongatedness of 1.5 and a maximum width of 60 meters, which were found to be realistic values in this rural landscape. These thresholds produced consistent linear vegetation elements while excluding large forest patches.
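The elongatedness test reduces to a simple predicate; a minimal sketch with the thresholds reported above (the function name is hypothetical):

```python
def is_linear(length, width, min_elongatedness=1.5, max_width=60.0):
    """Classify a rectangular object as linear based on its elongatedness
    (length/width) and a maximum-width cap that excludes forest patches."""
    return width <= max_width and (length / width) >= min_elongatedness
```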

Accuracy Assessment
The accuracy of the delineated linear objects was assessed by calculating the user's, producer's, and overall accuracy, as well as the harmonic mean of precision and recall (F1) and the MCC score [64]. We manually prepared a dataset of linear and nonlinear vegetation objects by means of a field visit, in combination with an interpretation of high-resolution aerial photos. Based on the differences between the automated and manually constructed data, a map and a confusion matrix detailing the accuracy were created.

Vegetation Classification
The vegetation classification resulted in a map with three classes (Figure 4): a class with points that were removed during the preprocessing (grasslands, agricultural fields, bare soil, water bodies), a class with points that were identified as infrastructure (e.g., building edges, ditches, and railroad infrastructure), and the relevant points that represent tall vegetation. The accuracy assessment of the vegetation and other classes showed a producer's accuracy of 0.98 for vegetation and of 0.85 for the other classes (Table 2). The AUC of 0.98 showed that the vegetation and other classes were well separated, which was also supported by an MCC value of 0.76 (indicating a positive correlation between the predicted and observed classes) and a geometric mean of 0.90.

Figure 4. Results of the supervised classification. The green class represents the tall vegetation. The two 'other' classes contain data points that were classified as grasslands, agricultural fields, bare soil, and water bodies (grey class), or as infrastructure and ditches (blue class).

Linear Object Segmentation
The tall vegetation (see Figure 4) was segmented using the rectangularity-based region growing algorithm. The resulting objects were assessed for linearity, and the results were compared with the manual segmentation (Figure 5). Areas that were correctly classified as linear vegetation elements (true positives) and areas that were correctly classified as nonlinear vegetation objects (true negatives) could be identified (Figure 5). Some nonlinear areas were misclassified as linear (false positives), and some linear regions were classified as nonlinear (false negatives). However, the confusion matrix (Table 3) shows a good overall accuracy of 0.90, an F1-score of 0.82, and an MCC of 0.76. Hence, the majority of linear vegetation elements was successfully segmented, which is supported by user's and producer's accuracies of 0.85 and 0.80, respectively (Table 3). Nonlinear objects were successfully separated, with user's and producer's accuracies of 0.92 and 0.94, respectively.

Discussion
We designed a method to delineate linear vegetation elements in a rural agricultural area from airborne laser scanning point clouds. Our intention was to apply already existing methods (which mainly stem from the classification of complex city scenes) in a new context, for the extraction of linear vegetation elements in rural landscapes. Our findings show that LiDAR features calculated from a pre-processed point cloud are useful for separating tall vegetation from low-stature vegetation and non-vegetation. Moreover, the subsequent analysis enabled the extraction of linear vegetation elements by applying a rectangularity-based region growing algorithm. This application is novel in this context and allows the mapping and monitoring of hedges, tree rows, and other linear vegetation elements with high resolution and unprecedented detail. We see great potential to apply this methodology to derive inventories of linear vegetation elements in rural and agricultural landscapes for subsequent use in ecological applications, conservation management, and regional planning.

Feature Extraction
To our knowledge, LiDAR-based features have not previously been applied in this way to identify linear vegetation objects. Most features have been used to quantify 3D forest structure or to separate vegetation and infrastructure in urban environments [65,66]. This is supported by similar studies [35,36] that emphasized the importance of feature selection for separating vegetation and infrastructure. We used the 14 features that were reported to be useful for filtering vegetation from other objects in the landscape [23,[34][35][36][37]. In the preprocessing phase, we used the planarity and scatter features to trim unnecessary non-vegetation data before the actual vegetation classification, since our goal was to classify only tall linear vegetation elements. Approximately 22% of the point cloud, corresponding mainly to smooth and planar areas such as bare soil, grassland, and water bodies, was removed, which made the processing of the remaining data much more efficient. Some features, such as the number of returns and the point density, have clear relations with the vegetation structure, while others, such as the omnivariance and the sum of the eigenvalues, are more difficult to interpret in terms of linear vegetation structure. Our results suggest that these LiDAR-based features can be successfully applied to separate (linear) vegetation from other classes.

Vegetation Classification
Trimming the data proved an efficient step to reduce the computation time needed to classify the vegetation. This preprocessing step left the dataset imbalanced, which we overcame by using a balanced random forest classification algorithm [46] and by selecting suitable accuracy metrics to evaluate performance [52,53]. When analyzing the classification statistics, it is important to take the effect of the trimming step into consideration. The removed points (Figure 4) clearly do not belong to the tall vegetation class. Consequently, the remaining other points share some similarities with the tall vegetation points and are therefore more difficult to classify correctly. If the trimmed points were included in the accuracy assessment, higher accuracy values would be reached, but fewer insights into the classification process would be gained. Nevertheless, the majority of points were correctly classified (Table 2).
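One practical way to handle the class imbalance described above is class-weighted tree bagging, which approximates a balanced random forest [46]. The sketch below uses scikit-learn's `class_weight="balanced_subsample"` option on synthetic data; the 14-feature setup and the class proportions are illustrative assumptions, not the study's actual data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic imbalanced sample: 95% "other", 5% "tall vegetation",
# with 14 features mimicking the feature set used in the paper.
rng = np.random.default_rng(42)
X_other = rng.normal(0.0, 1.0, (950, 14))
X_veg = rng.normal(2.5, 1.0, (50, 14))
X = np.vstack([X_other, X_veg])
y = np.array([0] * 950 + [1] * 50)

# "balanced_subsample" reweights classes within each bootstrap sample,
# so the minority vegetation class is not overwhelmed by the majority.
clf = RandomForestClassifier(n_estimators=100,
                             class_weight="balanced_subsample",
                             random_state=0)
clf.fit(X, y)
```

With balanced weighting, per-class metrics such as producer's and user's accuracy remain informative even though the raw class frequencies differ by an order of magnitude.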

Linear Object Segmentation
Our workflow to identify linear vegetation elements proved successful in a small test area typical of rural landscapes in the Netherlands. Most of the linear vegetation elements were successfully extracted (Figure 5). In Figure 5, near cross section 1, multiple parallel linear vegetation elements occur, which were incorrectly classified. Near cross section 5, very small errors are visible, caused by small interruptions within the linear vegetation elements. In cross section 6, linear vegetation elements are adjacent to forest patches, which also causes some small errors.
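The final step of the workflow classifies segmented 2D objects as linear based on their elongatedness. A minimal sketch of that criterion, using a PCA-aligned bounding box rather than the paper's rectangularity-based region growing, is shown below; the ratio threshold is an illustrative assumption, not the study's value.

```python
import numpy as np

def elongatedness(xy, ratio_threshold=3.0):
    """Classify a 2D segment footprint as 'linear' from the aspect ratio
    of its PCA-aligned bounding box (simplified sketch, hypothetical threshold)."""
    xy = xy - xy.mean(axis=0)
    # principal axes of the footprint
    _, _, vt = np.linalg.svd(xy, full_matrices=False)
    rot = xy @ vt.T                      # rotate into the principal frame
    extent = rot.max(axis=0) - rot.min(axis=0)
    length, width = max(extent), max(min(extent), 1e-9)
    ratio = length / width
    return ratio, ratio >= ratio_threshold
```

A hedgerow footprint tens of meters long but only a few meters wide yields a high ratio and is flagged as linear, whereas a compact forest patch does not, which is consistent with the errors observed where linear elements adjoin forest.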
These observations leave room for improving the workflow, and we suggest applying and testing the method in other rural areas with different types and spatial configurations of linear vegetation elements. For example, pruning activities, a common practice in the Netherlands, may influence classification accuracy because they are difficult to detect in a point cloud dataset; multi-temporal point cloud change detection could overcome such problems. Nevertheless, in the context of this rural landscape our method is already accurate, which opens up possibilities to further refine it and to apply it to other study areas or broader spatial extents.
For the extraction of linear vegetation elements at national or continental scales, our method would need to be upscaled, which requires optimizing its computational efficiency and speed. Several data requirements and methodological steps follow from this. LiDAR data need to be available at large spatial extents (e.g., across single or multiple countries) and, owing to variability in, e.g., point cloud density and collection methods across countries, preprocessing steps need to be carefully tested. For example, the minimum point density for the calculation of features should be tested, and further testing is needed to determine the optimal number of points in a neighborhood [23] when applying our methodology across different point densities. Training data from different environmental settings need to be collected to ensure sufficiently high classification accuracies.
Another issue is that the computation time needs to be reduced by speeding up all individual workflow steps. For example, the point cloud could be thinned without affecting the quality needed to extract the features required for classification. Another possibility would be to use parallel or cloud computing solutions, which have been introduced in ongoing LiDAR projects [18]. If such steps are taken, the availability of detailed locational information on linear vegetation elements at national or cross-national scales would advance the analysis, monitoring, and conservation of a wide range of species, such as insects and birds, that depend on linear vegetation elements for shelter, food, and survival in fragmented rural landscapes. Initiatives to develop software for upscaling LiDAR vegetation metrics to national and European scales using efficient software and cloud computing facilities are in progress [18,67].
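The thinning suggested above can be implemented, for instance, as voxel-grid downsampling, which keeps one point per 3D grid cell. The sketch below is a minimal NumPy version; the function name and voxel size are illustrative assumptions.

```python
import numpy as np

def voxel_thin(points, voxel=1.0):
    """Thin a point cloud by keeping the first point encountered in each
    voxel (illustrative sketch; voxel size in meters is an assumption)."""
    keys = np.floor(points / voxel).astype(np.int64)  # voxel index per point
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(idx)]
```

Because the eigenvalue features depend on local point density, the voxel size would have to be chosen conservatively and validated against the classification accuracy, as noted for the minimum point density above.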

Conclusions
At present, LiDAR datasets differ in quality, content, and accessibility within and across countries. Therefore, identifying robust and scalable LiDAR features for object identification should help to overcome these inconsistencies. The quality of the AHN3 dataset of the Netherlands is high and allows us to correctly identify linear vegetation objects with our method. In addition, multi-temporal LiDAR datasets could be analyzed to monitor changes in the spatial distribution and configuration of linear vegetation objects.
The ecological value of providing such large datasets of linear vegetation objects lies in their broad extent combined with fine-scale locational detail, a powerful quality for the (3D) characterization of ecosystem structure. Existing ecosystem and biodiversity assessment projects, such as MAES (Mapping and Assessment of Ecosystems and their Services) [68], SEBI (Streamlining European Biodiversity Indicators) [69], and the high nature value farmland assessment [70] at the European scale, as well as assessments of the Planbureau voor de Leefomgeving (PBL) at the national level [71], could benefit from this new level of detail.