1. Introduction
Dehesa ecosystems are among the most characteristic and ecologically valuable agroforestry systems of the Western Mediterranean region. They cover approximately 2.8 million hectares of the Spanish part of the Iberian Peninsula [1]. These humanized landscapes are the result of many centuries of traditional land management combining dispersed Quercus L. woodland with livestock and agricultural farming. They play critical ecosystem roles, including the preservation of biodiversity, carbon sequestration, water regulation and soil preservation [2,3]. Characterization of the tree structure in dehesas and other forest systems is essential to evaluate ecosystem services, plan sustainable management, monitor temporal changes in response to climate and management pressures and optimize the use of devices leading to the development of smart forest, agriculture and cattle applications [4,5,6].
Traditionally, tree structural characterization in dehesa formations has been performed through conventional forest inventory methods based on field measurements, which are highly time-consuming and costly in terms of human resources and logistics when applied to the vast areas that these systems typically cover [7,8,9]. The traditional methods employed to measure parameters such as crown diameter, total height, crown base height and crown volume require multiple workers, specialized equipment (laser rangefinders, high-precision GPS systems, clinometers, etc.) and direct access to each tree or vegetation unit, which significantly limits the ability to carry out landscape-scale inventories or the frequent updates needed for monitoring over time.
Airborne LiDAR (Light Detection and Ranging) technology has emerged as a transformative tool for characterizing forest structure, capable of capturing detailed 3D information on vegetation at operationally relevant spatial scales [10,11,12]. The Spanish National Aerial Orthophotography Program (PNOA) of the National Center of Geographical Information (CNIG) [13] provides low-density LiDAR coverage (1–2 points/m²) for the entire national territory with regular updates, making it an invaluable open-access resource for forest and environmental applications [14,15,16]. However, considerable uncertainty remains about the effectiveness and precision of low-density LiDAR for the detailed morphological characterization of tree vegetation in open canopy systems such as dehesa [17,18], particularly when using historical data that are several years old [19,20]. Nevertheless, this approach should not be discarded [21].
For the vertical structure characterization of individual trees, ref. [22] developed advanced methods for wood and leaf separation from terrestrial LiDAR point clouds, achieving classification accuracies of 89% through mode point evolution techniques. In the present study, the lower point density of airborne PNOA-LiDAR limits the ability to distinguish wood from leaf returns or to capture detailed lower crown structure.
A critical methodological challenge in the application of LiDAR to dehesas is the automatic segmentation of tree vegetation units. Unlike closed forests, where individual trees can be identified through crown detection algorithms based on local height maxima, dehesas have a complex and diversified structure, with groups ranging from isolated trees to sets of tens of individuals with interconnected canopies [7,23,24]. Previous studies in Spanish dehesa formations have employed the Gustafson-Kessel-Babuska algorithm on orthophotos [25] or the Random Forest supervised classification algorithm merging PNOA-NIR images with LiDAR [7]; however, these methods require either prior specification of the number of clusters or supervised learning, limitations that can be overcome by density-based spatial clustering algorithms.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups together points in spatial proximity without the need to specify the number of clusters in advance, and it can identify clusters of arbitrary shape as well as detect noise [26,27]. Recent applications have demonstrated the effectiveness of density-based clustering approaches for tree segmentation from LiDAR point clouds. DBSCAN and K-means clustering were successfully combined for individual tree segmentation in airborne LiDAR data, achieving superior performance compared to traditional methods [28]. Hierarchical DBSCAN (HDBSCAN) combined with machine learning techniques was employed in ref. [29] for tree stem segmentation from UAV-mounted LiDAR, achieving 82% detection rates with 98% precision without requiring site-specific parameters. However, the specific application of DBSCAN to the segmentation of tree vegetation from LiDAR point clouds in dehesa systems has never been systematically explored or validated, despite its theoretical potential for capturing the structural diversity that is inherent to these ecosystems.
Once segmentation has been performed, precise delineation of crown perimeters is an essential step for the estimation of morphological parameters. Convex hull methods, which are widely used due to their computational simplicity, tend to overestimate crown area and volume because they include the gaps between branches [30]. Concave hulls, which can capture the irregular geometries characteristic of Mediterranean leafy crowns, offer a more realistic option [31]. The Concaveman algorithm [32], based on Delaunay triangulation with adaptive pruning, is a promising candidate for this application; however, its validation with low-density LiDAR data in the context of Mediterranean dehesa ecosystems requires empirical assessment.
The temporal validity of LiDAR data is a critical practical consideration that has rarely been analyzed in the literature. Given that national LiDAR products such as PNOA are typically updated in 6- to 7-year cycles [33], understanding whether older LiDAR data are still useful to characterize current structures, considering tree growth and changes in management, has direct implications for the operational feasibility of these resources. In the case of Quercus ilex L., some studies have reported apical growth of the trunk of mature trees with maximum height increments of 15 mm per annum in dehesas of the southwest of Spain [34]. Other authors [35,36,37] have shown the low diameter growth of trunk and stems, suggesting that historical LiDAR data may remain valid with predictable and correctable biases. However, this hypothesis requires careful quantitative validation.
The purposes of this paper are: (1) to assess the applicability of the DBSCAN algorithm for the automatic segmentation of tree vegetation from low-density LiDAR data in Mediterranean dehesas, quantifying identification precision by comparison against the count from reference orthophotography; (2) to implement and validate Concaveman for the automatic delineation of crown perimeters, assessing visual fit and comparing against field GNSS measurements; (3) to quantify systematic biases between the morphological parameters derived from the LiDAR data (crown diameter, total height, crown base height, crown volume) and reference field measurements through a rigorous statistical analysis that accounts for the nested data structure; (4) to assess the temporal validity of 6-year-old PNOA-LiDAR data, separating the effects of tree growth from methodological biases; and (5) to develop empirical correction equations that convert LiDAR measurements to field equivalents for operational applications in similar dehesa contexts.
It is important to note that the correction equations developed in this study are derived from a single 116-ha municipal dehesa site and, although they are representative of many dehesas in the south-west of the Iberian Peninsula, should be considered site-specific. Crown architecture, grazing intensity, management practices (e.g., pruning regimes), topographic characteristics, and LiDAR acquisition conditions may all influence segmentation performance and error structure. Therefore, these equations should not be applied directly to other dehesa formations without site-specific recalibration and validation.
This study was carried out on the local dehesa of Santibáñez el Bajo (Caceres, Spain) on 116 hectares with 1254 identified vegetation units, with the results being validated through detailed field measurements on 35 carefully selected individual Q. ilex trees. The results provide a systematic assessment of the combination of the DBSCAN and the Concaveman algorithms for automatic processing of LiDAR data in dehesa systems, quantifying specific methodological biases of the PNOA-LiDAR technique for key morphological parameters, while also demonstrating the feasibility of historical data for structural characterization. This methodological approach shows potential applicability to large-scale forest inventories in dehesa formations across the Iberian Peninsula, though validation in diverse dehesa contexts would be needed before widespread operational deployment.
2. Materials and Methods
2.1. Study Area
The study area is located in the north of the Caceres province, within the Autonomous Region of Extremadura (Figure 1a), specifically in the Dehesa Municipal de Santibáñez el Bajo, No. 113-CC in the Catalogue of Public Utility Forests of Extremadura, around geographic coordinates −6.24425, 40.19039 (EPSG 25829: 734,572; 4,452,539). The maximum altitude is 476.3 m a.s.l. and the minimum is 369.6 m a.s.l. The average land slope is 7.8%, and 90% of the area has a slope below 15%. The predominant exposure is sun-facing.
It is a dehesa system used for livestock farming and populated by Q. ilex, Quercus suber L. and Quercus pyrenaica Willd. tree species, in order of numerical importance. In the areas with shrubs, the understory is dominated by Cistus ladanifer L., with heights between 0.1 and 1.5 m.
Aside from cork production, the main use of the area is livestock farming, with the permanent presence of mixed-breed cattle, and pigs raised under the Montanera (typical free-range breeding) system. The presence of cattle throughout the year shapes the morphology of the trees, since continuous animal browsing raises the minimum height of the crown base to approximately 1.7 m above the ground.
A specific area has been selected within this local dehesa to carry out the morphological analysis of its tree vegetation units using airborne low-density LiDAR data. The tree vegetation units consist of groups of one or several adult trees with overlapping crowns. The reference area is shown in Figure 1b, and its limits are defined by the coordinates listed in Table 1, enclosing an area of 1,166,357.07 m².
The area under study comprises 1254 vegetation units. The count was performed manually, by visually interpreting each unit on the PNOA orthophotograph captured during a 2018 aerial survey [13]. Only vegetation units lying entirely within the coordinates of Table 1 have been considered.
The area described was selected due to its high variability in tree density per vegetation unit and because its terrain topography is variable and representative of the overall area. Furthermore, the presence of isolated holm oak specimens facilitates ground-truth data acquisition. Field measurements are a major limiting factor in this type of research, and validating remote sensing methods on isolated trees makes the work feasible; these results are subsequently extrapolated to tree clusters.
In this area, the tree layout is diverse, with groups ranging from one to 50 individuals with overlapping crowns, which makes it hard to separate individual trees.
2.2. Workflow
The morphological characteristics of the tree crowns were determined, and their three-dimensional structure was recorded with the relevant geographical references. For this purpose, a workflow was developed (Figure 2) that included basic operations for merging the files containing aerial LiDAR data and clipping them with a polygonal mask stored in a vector file, in order to preserve only the LiDAR data for the chosen study area (Table 1). All returns from buildings and elements classified as noise were removed from the resulting dataset, as they had little relevance for the purposes of this study.
Once the data were filtered, a digital terrain model (DTM) was calculated using the returns classified as ground, which allowed us to normalize the heights of the remaining points of the LiDAR dataset. Such normalization refers all the points, and the objects they represent, to the same datum, so that their height above ground can be obtained. This way it was possible to remove all the LiDAR points lying below the crowns of the target trees, including those of the shrub layer, thus obtaining a dataset that represented only the returns of the LiDAR pulses received from the tree crowns.
As this constitutes a set of defined returns that isolate crowns or groups of crowns, these returns can be grouped together or delineated to create a three-dimensional representation of each vegetation unit.
2.3. LiDAR Dataset
The operations on the LiDAR data were carried out using the lidR v.4.2.1 package [38,39] of the R statistical software, version 4.4.3 [40].
The LiDAR dataset used is a set of files of aerial low-density LiDAR returns obtained by the National Center of Geographical Information for the Caceres province, provided as compressed LAS files (LAZ), each covering a 2000 m × 2000 m area (Table 2). The flights over the area under study were carried out on 12 and 29 December 2018, using an airplane equipped with a Riegl VQ-1560i LiDAR scanning system (RIEGL LMS GmbH, Munich, Germany) and able to take photographs simultaneously in the visible and infrared spectra using an iXM-RS100F camera (Phase One A/S, Frederiksberg, Denmark). The emitted pulse density was at least 2 p/m², with a maximum of 5 returns per emitted pulse. The point-cloud data were processed using TerraScan software by TERRASOLID Ltd. (Helsinki, Finland).
The three datasets were loaded into R, and a catalog was created and clipped to the area under study (Table 1) with a 100 m buffer, so that subsequent operations were not affected by edge artifacts produced by the interpolation algorithms used to model the terrain. Once the dataset was restricted to the area of interest, the heights of all the returns were normalized, and the dataset was clipped to the area under study without buffering.
The largest possible set of returns potentially attributable to tree vegetation was used to estimate tree crown volume. This includes returns classified as high vegetation (5), medium vegetation (4) and low vegetation (3), as well as returns classified as overlaps (12) and unclassified returns (1), within a height range from 1.7 m (identified as the minimum height relevant for livestock browsing) to 25 m. This range excludes returns that fall outside the scope of the study, as they clearly do not represent tree vegetation. A total of 1,297,730 returns were recorded.
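The selection by class and height range can be sketched in a few lines. This is only an illustration in plain Python, not the lidR workflow used in the study; the function name `is_tree_return`, the sample returns and the inclusive treatment of both height bounds are assumptions.

```python
# ASPRS class codes kept in the study: unclassified (1), low (3),
# medium (4) and high (5) vegetation, and overlap (12).
TREE_CLASSES = {1, 3, 4, 5, 12}
Z_MIN = 1.7   # m, lower limit of the height range (browsing line)
Z_MAX = 25.0  # m, upper limit of the height range


def is_tree_return(classification, z):
    """Return True when a height-normalized return may belong to a crown.

    Treating both height bounds as inclusive is an assumption.
    """
    return classification in TREE_CLASSES and Z_MIN <= z <= Z_MAX


# Hypothetical returns as (class code, normalized height in m).
returns = [(5, 6.2), (2, 0.0), (3, 0.9), (1, 1.7), (6, 4.0), (12, 30.0)]
crown_returns = [r for r in returns if is_tree_return(*r)]
```

In this toy input only the high-vegetation return at 6.2 m and the unclassified return at 1.7 m pass the filter; ground, building, shrub-height and out-of-range returns are dropped.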
2.4. Crown Measurements and Time Considerations
Crown parameter measurements were obtained following two approaches: (1) low-density LiDAR-derived measurements from the 2018 aerial surveys and (2) field measurements using common ground-based methods employed in forest inventories. The six-year interval between both data sources allowed us to assess the applicability of the LiDAR data for the acquisition of morphological parameters during the time between two consecutive updates of the CNIG LiDAR products.
In dehesa production areas, tree growth rates are typically very low [35,36,37], owing to ongoing human management [41], permanent livestock browsing, which affects tree regeneration [42], low forest productivity indexes [43] and a species composition that is naturally slow-growing.
With the purpose of estimating the height growth of Q. ilex, a quantitative synthesis was carried out of the published growth models for this species in Mediterranean ecosystems with soil and climate limitations. The dynamic diameter growth models developed by [36] for dehesa woodlands in the Iberian Peninsula were used as a basis, together with the height-diameter curves established by [43] for irregular stands of Q. ilex in the Eastern Mediterranean area, specifically for the low site index.
In order to estimate the growth in diameter, model E5 by [36] was used, based on the Korf function, derived through the Generalized Algebraic Difference Approach (GADA):
where X₀ is calculated from the known diameter and age (DBH₁, t₁).
In order to relate heights and diameters, the Mihajlov exponential equation can be used, as parametrized by [43]
with parameters b₀ = 0.559884275 and b₁ = 0.517723183 for low site indexes and b₀ = 0.605188232 and b₁ = 0.427293554 for high-quality site indexes.
The combination of both models predicts an expected height growth of between 20 and 30 cm over a 6-year period for mature trees (DBH > 30 cm), regardless of the site index.
2.5. Clustering
Once the preparation operations were completed, the clustering process was carried out to group together all the returns received from the same vegetation unit, which consisted of one or several trees with overlapping crowns. Clustering is an unsupervised machine learning technique used in data analysis to group data based on one or several characteristics they share.
Numerous clustering algorithms have been developed [44,45]. They can be classified according to the clustering strategy on which they are based, their computational or spatial complexity and their suitability for the characteristics of the data to which they are applied, such as dimensionality, size, distribution, and sensitivity to noise or to the order in which data are input into the algorithm [45]. A LiDAR dataset to be clustered is typically large, has low dimensionality (since only the planar coordinates are used) and has an arbitrary distribution shape. It may also contain noise.
The clustering algorithm most commonly cited in the literature is K-means [44]. Although it offers advantages over other algorithms in terms of time complexity, number of implementations and ease of use, it also has significant drawbacks for this study, such as the need to know the number of clusters in advance and its high sensitivity to noise. For the clustering of spatial data, the most suitable algorithms are DBSCAN, STING, CLARANS and Wavecluster [45].
STING is adequate for the generalization of the characteristics of interest of spatial data. It is not suitable in this case, because the characteristic of interest is “position” and the aim is to keep it precise. Wavecluster, which is also based on the creation of spatial grids, focuses its cluster search on the processing of the characteristics of the points and not of their geographical position. On the other hand, CLARANS presents quadratic computational complexity, and it is not a good option for large datasets, having the same drawbacks as K-means in terms of the need for the number of clusters to be previously known.
DENCLUE is a density-based clustering algorithm that would work very well to discriminate clusters based on geographical position, with higher computational efficiency, but it requires complex tuning of the input parameters for its results to improve on those of other algorithms such as DBSCAN or OPTICS [46].
DBSCAN [26] (see Figure 3) is also a density-based algorithm. For each point, DBSCAN counts how many points lie within a distance ε of it. If this number is equal to or above minPts, the point is marked as a core point. Points that have fewer than minPts points within distance ε but lie within distance ε of a core point are marked as border points. Both core points and border points make up the cluster. The remaining points, which are not part of any cluster, are considered outliers and are classified as noise. The selection of the ε and minPts parameters has a significant impact on the clusters identified.
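The roles of ε and minPts can be illustrated with a minimal brute-force sketch in Python. It is not the dbscan R package used in the study; the function names are hypothetical, and two conventions are assumed: the neighbour count of a point includes the point itself, and the ε bound is inclusive.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (point i included)."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Label each 2D point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # point i is a core point
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:     # reachable from a core point: border
                labels[j] = cluster
                continue
            if labels[j] is not None:  # already assigned to a cluster
                continue
            labels[j] = cluster
            reachable = region_query(points, j, eps)
            if len(reachable) >= min_pts:  # j is itself a core point
                queue.extend(reachable)
    return labels


# Hypothetical planar coordinates (m): two groups and one isolated return.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.7, min_pts=2)
```

With these toy coordinates the first three returns form one cluster, the next two form a second cluster and the isolated return is labelled as noise. The brute-force neighbour search is quadratic in the number of points, so a spatial index would be needed for a dataset of the size described here.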
A variation of the DBSCAN algorithm known as OPTICS [47] was also considered for this task. It uses the same parameters as DBSCAN, but instead of producing fixed clusters, it orders the points in a way that reflects the underlying density structure. This helps identify clusters of varying densities, which is not the situation in this study, given the regularity of LiDAR pulse emission and the homogeneity of return density for the same vegetation type. OPTICS does not require ε to be defined in advance, which makes it more flexible than DBSCAN. However, it can be much more computationally demanding.
The clustering was performed on the returns of classes {1, 3, 4, 5, 12} using the dbscan package [27] in R, setting ε to 1.7 m and minPts to 2 points.
The selection of the ε value should not increase the complexity of the DBSCAN algorithm significantly. Its ideal adjustment would require prior knowledge of the number of existing clusters in the area under study, which would forfeit the unsupervised nature of DBSCAN. It was decided that the ε value should be generally applicable to any area occupied by holm oak or cork oak trees, regardless of stand density, and should depend only on the number of returns obtained from the dehesa vegetation units in the LiDAR dataset, so that it would be generally applicable to the Caceres Province. To this end, the area under study was divided into 2 m × 2 m polygons, those without LiDAR returns were removed and the rest were split into two categories (Figure 4a): polygons fully surrounded by other polygons (core cells) and polygons not fully surrounded (border cells). The return density of the core cells is used to calculate the density of returns of the selected classes per square meter and, from it, the distance between returns.
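The split into core and border cells can be sketched as follows. This is a plain-Python illustration, not the script used in the study; the function name `classify_cells` is hypothetical, and "surrounded" is assumed here to mean that all eight neighbouring cells contain returns.

```python
from collections import Counter


def classify_cells(points, cell=2.0):
    """Bin planar returns into square cells and split occupied cells
    into core cells and border cells.

    Returns two dicts mapping the cell index (column, row) to the
    return density of the cell in returns per square meter.
    """
    counts = Counter((int(x // cell), int(y // cell)) for x, y in points)
    core, border = {}, {}
    for (cx, cy), n in counts.items():
        neighbours = [(cx + dx, cy + dy)
                      for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                      if (dx, dy) != (0, 0)]
        density = n / (cell * cell)
        # Assumption: a core cell has all 8 neighbours occupied.
        if all(nb in counts for nb in neighbours):
            core[(cx, cy)] = density
        else:
            border[(cx, cy)] = density
    return core, border


# Hypothetical 3 x 3 block of occupied cells with 5 returns each.
points = [(2 * i + 1, 2 * j + 1)
          for i in range(3) for j in range(3) for _ in range(5)]
core, border = classify_cells(points)
```

In this toy block only the central cell is a core cell, with a density of 5 / 4 = 1.25 returns per m²; the eight surrounding cells are border cells, whose densities are ignored because part of their area may fall outside the crown.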
Ninety-nine per cent of the core cells had a density greater than or equal to 1.25 returns per m². Once density is calculated, the distance between returns can be obtained as the inverse of the square root of density.
The computed distance between returns is 0.894 m, and therefore the minimum distance between returns from two different vegetation units should lie within [0.894, 0.894 × 2) m. An ε value of 1.7 m was selected, 5% below the upper limit of this range, which enables adequate clustering of returns from the same vegetation unit. In order to discard potential noise points from vegetation units, reachability was limited to a minimum of two returns for the same cluster. Consequently, a minPts value of 2 was selected.
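The arithmetic behind the ε value can be checked with a short Python sketch. The variable names are illustrative, and the mean spacing is taken as the inverse of the square root of the density, which is the relation that reproduces the 0.894 m figure.

```python
import math

density = 1.25                       # returns per m2 met by 99% of core cells
spacing = 1.0 / math.sqrt(density)   # mean distance between returns, m
upper = 2.0 * spacing                # upper limit of the admissible range, m
eps = round(0.95 * upper, 1)         # 5% below the upper limit, to 0.1 m

print(f"spacing = {spacing:.3f} m, range = [{spacing:.3f}, {upper:.3f}) m, "
      f"eps = {eps} m")
```

The rounding of ε to one decimal place is an assumption made here to match the reported value of 1.7 m.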
As there were holm oak trees on the edges of the area under study with part of their crowns outside it, the point clusters with any member at a distance of less than 1.7 m from the edge were removed. In addition, certain elements of non-tree vegetation remained within the dataset; since they had far fewer returns per cluster than the trees, all clusters with fewer than 100 returns were removed.
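The two clean-up rules can be sketched in Python as below. The sketch is hypothetical: the function name `filter_clusters` is invented, noise is assumed to carry a negative label, and the study area is simplified to an axis-aligned rectangle, whereas the actual boundary is the polygon defined in Table 1.

```python
from collections import defaultdict


def filter_clusters(points, labels, bounds, edge_dist=1.7, min_returns=100):
    """Keep clusters that are large enough and clear of the boundary.

    bounds is (xmin, ymin, xmax, ymax); a rectangular study area is a
    simplifying assumption of this sketch.
    """
    xmin, ymin, xmax, ymax = bounds
    members = defaultdict(list)
    for (x, y), label in zip(points, labels):
        if label >= 0:                      # negative labels are noise
            members[label].append((x, y))
    kept = {}
    for label, pts in members.items():
        near_edge = any(
            min(x - xmin, xmax - x, y - ymin, ymax - y) < edge_dist
            for x, y in pts
        )
        if not near_edge and len(pts) >= min_returns:
            kept[label] = pts
    return kept


# Hypothetical data: an interior tree, an edge tree, a shrub and one noise point.
interior = [(50 + 0.01 * k, 50.0) for k in range(100)]
at_edge = [(1 + 0.01 * k, 50.0) for k in range(100)]
shrub = [(20.0, 20.0)] * 10
points = interior + at_edge + shrub + [(70.0, 70.0)]
labels = [0] * 100 + [1] * 100 + [2] * 10 + [-1]
kept = filter_clusters(points, labels, bounds=(0.0, 0.0, 100.0, 100.0))
```

Only cluster 0 survives: cluster 1 has members closer than 1.7 m to the boundary and cluster 2 has fewer than 100 returns.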
2.6. Delineation of Vegetation Unit Boundaries
Once the clustering of the returns was completed, the area occupied by each differentiated element of vegetation was defined. The three available strategies were the concave hull, the convex hull and the grid-based strategy, which are schematically represented in Figure 5.
Grid-based crown delineation methods are easy to implement but tend to overestimate canopy extent, with the degree of overestimation increasing as grid resolution coarsens. Conversely, when grid resolution is excessively fine, it may introduce discontinuities in crown delineation. Concave or convex hulls are usually employed to define point clusters. The selection of hull type depends on the semantics of the point cloud. For delineation of vegetation units, the concave hull method was selected, as it yielded results that more accurately represented the crown perimeter when overlaid on reference imagery.
There are various methods to generate concave hulls. The most relevant ones are based on the k-nearest neighbors (KNN), on the construction of α-shapes, and on the Concaveman algorithm. Their most relevant characteristics are shown in Table 3.
The KNN method is simple and efficient for small datasets but is highly sensitive to the selection of the number of neighbors to explore. The α-shapes method is well suited to very irregular shapes, but it also requires the adjustment of the α parameter, which introduces a high degree of subjectivity or demands intensive prior exploration.
The Concaveman method computes a concave hull for a 2D point set through two primary stages. First, it performs a Delaunay triangulation of the points, optimally connecting them to form a non-overlapping triangle network. Then, it applies a pruning process based on a concavity parameter, which progressively removes triangles with sides exceeding a threshold length defined by the user. This step adjusts the shape of the hull, modulating its concavity, according to the intended level of detail. Finally, the remaining triangle edges form the concave hull.
For the purposes of this study, the Concaveman method was selected, implemented in R through the package of the same name [50]. Although it may not be as precise as the α-shapes method, it is well suited to the shape of the vegetation elements, its parameters are intuitive to select and it executes very quickly on large datasets. A concavity of 0.7 was selected, and it was visually checked that it produced smooth, well-fitting hulls.
2.7. Three-Dimensional Survey and Estimation of Crown Volume
Ref. [51] provides a review of crown volume estimation methods, covering both direct field estimation and remote estimation using terrestrial or aerial LiDAR techniques and aerial photography. This study focused on the use of LiDAR data and volume approximation by slicing for remote estimation (Figure 6a) [51], together with the tree silhouette volume method [52], in order to verify whether remote estimation provided sufficiently precise measurements of the volume and geometry of the tree crowns.
An R script was created to carry out the following operations for the purpose of estimating volume and geometry from LiDAR returns:
Selection of the LiDAR returns of classes 1, 2, 3, 4, 5 and 12 with heights greater than or equal to 0 m that are contained in or intersected by the polygons delineated during the previous stage, and storage of their normalized coordinates x, y, z in an array together with the number of the cluster (c) to which they belong.
Setting a slice height of hs meters.
For each cluster c, definition of as many height classes hi as there are multiples of hs.
Classification of each return by rounding its height down to the nearest multiple of hs (Figure 6a).
For each height class hi, calculation with the Concaveman algorithm of the concave hull containing all the points with heights greater than or equal to hi.
Identification of the last slice of the tree crown. On traversing downward in height, this slice is the one immediately preceding the slice that exhibits the greatest relative reduction in point density compared to the previous slice. Within this slice, the return with the minimum height (hₙ) was located, which defines the crown base height above ground (Figure 6b, Crown base).
The maximum tree height (ht) is that of the return with maximum height (Figure 6b, Tree height).
The bases of each slice of the crown, their height class hi, their cluster number and the number of returns for the height class are stored as georeferenced polygons for further use.
The volume of each slice is computed as the average area of the polygons defining the slice multiplied by the height of the slice. For the lowest slice, its thickness is taken to be the distance between the crown base and the base of the following slice. For the highest slice, its thickness is taken to be the distance between its base and the maximum height of the tree.
The total crown volume defined by the cluster for tree i (VLi) is the sum of the individual volumes of all slices.
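The crown base detection and the slice-volume sum listed above can be sketched in Python under several assumptions, none of which is confirmed by the text: point density per slice is approximated by the return count, the "average area" of a slice is read as the mean of the hull areas at its base and at the base of the slice above, and the topmost slice is closed with a zero-area apex. The function names are hypothetical and the hull areas are taken as given rather than computed with Concaveman.

```python
import math
from collections import Counter


def crown_base_height(heights, hs):
    """Lowest return of the slice lying just above the largest relative
    drop in return count, scanning the slices from the top downward."""
    counts = Counter(int(z // hs) for z in heights)
    top, bottom = max(counts), min(counts)
    best_drop, last_crown_slice = -math.inf, bottom
    for k in range(top - 1, bottom - 1, -1):
        above, here = counts.get(k + 1, 0), counts.get(k, 0)
        if above == 0:
            continue                       # no reference count to compare
        drop = (above - here) / above      # relative reduction in returns
        if drop > best_drop:
            best_drop, last_crown_slice = drop, k + 1
    return min(z for z in heights if int(z // hs) == last_crown_slice)


def crown_volume(slice_bases, crown_base, tree_height):
    """Sum of slice volumes.

    slice_bases: (base height, hull area) pairs sorted bottom to top,
    starting at the last crown slice.
    """
    total = 0.0
    for i, (base, area) in enumerate(slice_bases):
        is_top = i == len(slice_bases) - 1
        top_height = tree_height if is_top else slice_bases[i + 1][0]
        # Assumption: the topmost slice tapers to a zero-area apex.
        top_area = 0.0 if is_top else slice_bases[i + 1][1]
        bottom_height = crown_base if i == 0 else base
        total += 0.5 * (area + top_area) * (top_height - bottom_height)
    return total


# Hypothetical normalized heights (m) for one cluster, with hs = 1 m.
heights = ([6.1, 6.5, 6.7, 6.9]
           + [5.0 + 0.09 * k for k in range(10)]
           + [4.0 + 0.08 * k for k in range(12)]
           + [3.2 + 0.05 * k for k in range(11)]
           + [2.4, 1.5, 0.1, 0.2, 0.3])
base = crown_base_height(heights, hs=1.0)
volume = crown_volume([(3.0, 20.0), (4.0, 12.0), (5.0, 4.0)], base, 5.6)
```

In the toy profile the return count falls from 11 to 1 between the 3 m and 2 m slices, so the 3 m slice is taken as the last crown slice and the crown base is its lowest return.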
2.8. Field-Based Measurement of Estimated Morphological Parameters
The LiDAR-derived data were validated by comparing them against reference measurements. To this end, a sample of 35 holm oak individuals was analyzed in the field, enabling accurate assessment of crown geographical position and volume. The selection of these individual trees was based on several criteria: they had to be sufficiently isolated to allow complete crown photographs to be taken from the eight cardinal and intercardinal directions, and they had to be healthy trees with a full crown structure, that is, with no loss of main branches and no perceivable loss of foliage since the date of the LiDAR flights used for this study.
An initial visual screening stage identified 67 trees, based on the crown perimeters calculated during the crown delineation stage. An R script was created to remove those trees whose distance to the nearest trees in the cardinal and intercardinal directions was less than twice their crown diameter. Following this screening stage, the remaining field conditions were verified, and 35 trees were finally selected for the following operations (Table 4).
It is important to acknowledge that the selection criteria employed, isolated trees with accessible crowns for complete photographic coverage, introduce a selection bias in the validation sample. Trees in dense clumps or with overlapping crowns were necessarily excluded because the TSVM requires unobstructed views from multiple angles, making it practically impossible to obtain reliable ground-truth volume measurements for such configurations.
2.8.1. Tree Parameter Measurements
The diameter at breast height (DBH) of each selected tree was measured at 130 cm above the ground using a MEDID MF602 flexible metallic diameter tape (General de Medición SL, Santa Perpètua de Mogoda, Spain). Total tree height and crown base height were measured using a Haglöf Vertex Laser Geo 2 laser and ultrasonic rangefinder (Haglöf Sweden AB, Långsele, Sweden), which derives lengths from measured horizontal and vertical angles and the laser-measured distance to the target.
The geographical position of tree i (CGi with coordinates {xGi, yGi}) and the locations of the cardinal and intercardinal points along its crown perimeter were recorded using a Zenith06 dual-frequency GNSS rover receiver (GeoMax AG, Widnau, Switzerland). QField v3.x (OPENGIS.ch GmbH, Laax, Switzerland) [53], installed on an 8-inch Samsung Galaxy Tab Active 3 (Samsung Electronics Co. Ltd., Suwon, Republic of Korea), was used as the data capture software, with the geographical positions provided by the GNSS receiver.
The geographical position of the tree was taken at a point on the perimeter of the trunk’s base, and therefore it coincided neither with the centroid of its crown nor with the center of the trunk. Despite this imprecision, given the large size of the crowns and the distance between trees, this measuring method was suitable to identify the geographical location of an individual tree.
Taking the cardinal and intercardinal positions of the crown’s perimeter helped to determine the crown’s diameter in four directions, i.e., N–S, E–W, NE–SW and NW–SE and the averaged crown diameter (dGi).
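The averaging of the four directional diameters can be sketched in Python as follows. The function name `mean_crown_diameter` and the sample coordinates are hypothetical, and the perimeter points are assumed to be in a projected coordinate system in meters, so that planar distances apply.

```python
import math


def mean_crown_diameter(perimeter):
    """Average the four crown diameters measured between opposite
    perimeter points.

    perimeter maps 'N', 'NE', 'E', 'SE', 'S', 'SW', 'W', 'NW' to
    (x, y) coordinates in meters.
    """
    axes = [("N", "S"), ("E", "W"), ("NE", "SW"), ("NW", "SE")]
    diameters = [math.dist(perimeter[a], perimeter[b]) for a, b in axes]
    return sum(diameters) / len(diameters), diameters


# Hypothetical GNSS fixes relative to an arbitrary local origin (m).
perimeter = {
    "N": (0.0, 5.0), "S": (0.0, -5.0),
    "E": (4.0, 0.0), "W": (-4.0, 0.0),
    "NE": (3.0, 3.0), "SW": (-3.0, -3.0),
    "NW": (-3.0, 3.0), "SE": (3.0, -3.0),
}
d_mean, diameters = mean_crown_diameter(perimeter)
```

For these toy fixes the N–S and E–W diameters are 10 m and 8 m, both diagonal diameters are about 8.49 m, and the averaged crown diameter is about 8.74 m.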
A summary of the relevant characteristics of the equipment used is shown in Table 5. An optical square prism, mounted on the same surveying rod (fitted with a spirit level) that supported the GNSS receiver, was used to improve the precision with which the receiver was positioned at the edge of the tree crown, so that the operator could view the crown edge and the spirit level on the surveying rod simultaneously.
2.8.2. Crown Volume Measurement and Location
Crown volume was assessed using the Tree Silhouette Volume Method (TSVM) [52], which served as the field reference (ground truth) for this study. This method involves taking horizontal photographs around the tree, delineating the crown perimeter in each photograph and estimating the volume by revolution of the areas enclosed within the delineated perimeters.
An R script was developed to ensure geometric accuracy and minimize the fieldwork required to locate the positions from which each tree was to be photographed. For each selected and delineated tree, the script calculated positions in the cardinal and intercardinal directions from the tree centroid, at a distance sufficient to capture the entire crown (Figure 7b).
The method computes the bounding box of the delineated crown oriented along the cardinal directions (N, S, E, W) and the bounding box oriented along the intercardinal directions (NE, SE, SW, NW), and determines the maximum crown width Wmax across the directions from which the photographs will be taken (Figure 7a). The centroid of tree i (CLi, with coordinates {xLi, yLi}) is the average of the centroids of the two bounding boxes.
To determine the photographic distance (d) at which the holm oak crown (with maximum diameter Wmax) was fully included within the frame, the rear camera of a Samsung Galaxy Tab Active 3 (SM-T575) tablet was used, with a true focal length of 4 mm and a horizontal viewing angle of 63.5°, previously calibrated by photographing objects of known dimensions. The theoretical minimum distance was calculated as d = 0.81·Wmax (in meters) from the trigonometric relation between the object size, the focal length and the effective sensor width (4.95 mm).
This distance was increased by 3 m to provide a margin for irregularities in the shape of the crown or variations in its apparent diameter. This ensured that the entire crown was contained within the image even under non-ideal measuring conditions (e.g., angular deviations or Wmax estimation errors).
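The distance rule can be checked numerically with a short Python sketch. It uses only the values stated in the text (63.5° horizontal viewing angle, 4 mm focal length, 4.95 mm sensor width, 3 m margin); the function names are hypothetical, and the 12 m crown width is an arbitrary illustration rather than a measured value.

```python
import math


def min_photo_distance(w_max, hfov_deg=63.5):
    """Distance (m) at which an object of width w_max exactly fills the
    horizontal field of view: d = W / (2 * tan(HFOV / 2))."""
    return w_max / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))


def photo_distance(w_max, margin=3.0, hfov_deg=63.5):
    """Theoretical minimum distance plus the safety margin (m)."""
    return min_photo_distance(w_max, hfov_deg) + margin


# The factor derived from the viewing angle and the one derived from
# focal length / sensor width (4 mm / 4.95 mm) should both be close to 0.81.
factor_from_angle = min_photo_distance(1.0)
factor_from_sensor = 4.0 / 4.95
print(round(factor_from_angle, 3), round(factor_from_sensor, 3))

# Illustrative crown width of 12 m (hypothetical value).
print(round(photo_distance(12.0), 2))
```

Both routes give a factor of approximately 0.81, which supports the consistency of the calibrated viewing angle with the stated focal length and sensor width.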
The resulting distance was used to calculate the 8 positions around the centroid of each tree from which the photographs were taken (Figure 7b).
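The placement of the eight photographic stations can be sketched as follows. This is an illustrative Python version, not the authors' R script; it assumes projected coordinates in metres (x as easting, y as northing) and bearings measured clockwise from grid north, and all names are hypothetical.

```python
import math

# Compass bearings in degrees, clockwise from north.
BEARINGS = {
    "N": 0, "NE": 45, "E": 90, "SE": 135,
    "S": 180, "SW": 225, "W": 270, "NW": 315,
}


def camera_positions(cx, cy, distance):
    """Return label -> (x, y) of the stations located `distance` metres
    from the tree centroid (cx, cy) along each compass bearing."""
    positions = {}
    for label, bearing in BEARINGS.items():
        theta = math.radians(bearing)
        # With bearings clockwise from north, easting uses sin and northing uses cos.
        positions[label] = (
            cx + distance * math.sin(theta),
            cy + distance * math.cos(theta),
        )
    return positions


# Hypothetical centroid and distance.
stations = camera_positions(100.0, 200.0, 12.7)
for label, (x, y) in stations.items():
    print(label, round(x, 2), round(y, 2))
```

The resulting coordinates could then be loaded into a field application as waypoints, so that each station is staked out before photographing.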
Fiji v1.54 [54] was used to manually delineate each crown as an ordered sequence of points. Each polyline was then divided into two halves by a vertical axis passing through the point on the upper part of the crown that is closest to its geometric center, producing a left and a right semi-perimeter for each crown in each photograph orientation. Photographs were scaled using the field measurements of crown height and width taken with the rangefinder and the GNSS system, respectively.
The area and volume of the crowns were calculated from each semi-perimeter using the second Pappus–Guldinus theorem, with the right vertical edge of the left semi-perimeter and the left vertical edge of the right semi-perimeter used as rotation axes (Figure 7c,d). Finally, the crown volume of each tree i (VPi) is the mean of the 16 volumes computed from the semi-perimeters (8 photographs × 2 semi-perimeters).
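The volume computation for one semi-perimeter can be sketched in Python as below. By the second Pappus–Guldinus theorem, a planar region of area A whose centroid lies at distance r from an axis that it does not cross sweeps a volume V = 2π·r·A in a full revolution. The sketch assumes that each half silhouette is a closed, scaled polygon with horizontal coordinate x and vertical coordinate z in metres; the function names and the rectangular test shape are hypothetical.

```python
import math


def polygon_area_centroid_x(points):
    """Shoelace area and centroid x-coordinate of a simple closed polygon
    given as [(x, z), ...] (last vertex implicitly joined to the first)."""
    twice_area = 0.0  # twice the signed area
    cx_sum = 0.0      # accumulates 6 * signed area * centroid x
    n = len(points)
    for k in range(n):
        x0, z0 = points[k]
        x1, z1 = points[(k + 1) % n]
        cross = x0 * z1 - x1 * z0
        twice_area += cross
        cx_sum += (x0 + x1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon")
    return abs(twice_area) / 2.0, cx_sum / (3.0 * twice_area)


def revolution_volume(half_silhouette, axis_x):
    """Volume (m^3) swept by rotating the half silhouette a full turn about
    the vertical line x = axis_x (second Pappus-Guldinus theorem)."""
    area, cx = polygon_area_centroid_x(half_silhouette)
    return 2.0 * math.pi * abs(cx - axis_x) * area


def mean_crown_volume(half_silhouettes):
    """Mean of the volumes obtained from (polygon, axis_x) pairs,
    e.g. the 16 semi-perimeters of one tree."""
    volumes = [revolution_volume(poly, axis) for poly, axis in half_silhouettes]
    return sum(volumes) / len(volumes)


# Check with a 2 m x 3 m rectangle touching the axis: the solid is a cylinder
# of radius 2 m and height 3 m, so both printed values should coincide.
rect = [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0), (0.0, 3.0)]
print(revolution_volume(rect, axis_x=0.0), math.pi * 2.0**2 * 3.0)
```

Averaging the volumes from all semi-perimeters reduces the influence of crown asymmetry in any single viewing direction.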
2.9. Statistical Analysis
This study assesses the feasibility of using low-density aerial LiDAR data as an alternative to traditional, manual field measurements after a period of 6 years between the capture of data by each of the two measuring systems.
For crown diameters, 280 measurements (4 directions × 2 methods × 35 trees) were analyzed. Measurements within each tree were spatially independent but correlated, since they share the characteristics of the same tree.
For height measurements, the same 35 trees were examined for total height, crown base height, and crown height. The ratio between the observed mean difference and the maximum expected growth (20 cm to 30 cm for total height) was calculated to assess whether the identified differences exceeded the expected growth over the six-year study period. Differences that substantially surpassed the expected growth would indicate a systematic methodological bias rather than a temporal change.
The data were structured in a long format, with the parameter of interest serving as the response variable, the measuring method (GNSS vs. LiDAR) as a fixed effect, and tree identity as the random effect. This approach explicitly models the correlation structure among measurements within trees, while providing an unbiased estimation of the differences between methods. The statistical model employed was a linear mixed-effects model,

$$\text{parameter}_{ij} = \beta_0 + \beta_1\,\text{method}_j + u_i + \varepsilon_{ij}$$

where $\text{parameter}_{ij}$ represents the measurement of a given parameter for tree $i$ using method $j$, $\text{method}_j$ is an indicator equal to 0 for GNSS and 1 for LiDAR, $\beta_0$ is the intercept (the GNSS mean), $\beta_1$ is the method effect (the deviation of LiDAR relative to GNSS), $u_i$ is the random intercept for tree $i$, and $\varepsilon_{ij}$ is the residual error.
Prior to the analysis, the intraclass correlation coefficient (ICC) was computed to quantify within-tree correlation and justify the mixed-effects approach. Model assumptions were validated through residual diagnostics including tests for normality (Shapiro-Wilk and Anderson-Darling tests), assessments of homoscedasticity (Levene’s test) and detection of outliers using standardized residuals.
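As an illustration of what the ICC quantifies, the following Python sketch computes the one-way ANOVA estimator ICC(1) for a balanced design. This is a stand-in chosen for simplicity: the text does not state which ICC estimator was used, and an estimate derived from the variance components of the fitted mixed model may differ. The function name and example values are hypothetical.

```python
def icc_oneway(groups):
    """One-way random-effects ICC(1) for a balanced design.

    `groups` is a list with one list of measurements per tree; every tree
    must have the same number k >= 2 of measurements.
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 trees with the same number (>= 2) of measurements")
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    tree_means = [sum(g) / k for g in groups]
    # Between-tree and within-tree mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in tree_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, tree_means) for x in g
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical crown diameters (m): three trees, four measurements each.
example = [
    [10.1, 10.4, 9.8, 10.2],
    [14.0, 13.6, 14.3, 13.9],
    [7.9, 8.3, 8.1, 7.7],
]
print(round(icc_oneway(example), 3))
```

Values close to 1 indicate that most of the variance lies between trees, which is the situation in which ignoring the grouping would most strongly understate the uncertainty of the method effect.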
Statistical significance was assessed at an α = 0.05 level of significance, and the degrees of freedom were calculated according to the number of trees (df = 34). Confidence intervals for the differences between methods were constructed using the t-distribution with the relevant degrees of freedom.
For crown volume analysis, an approach was adopted that differed from those employed for diameters and heights due to the data structure. Given that each tree yielded a single volume estimate per method (VL for LiDAR and VP for TSVM), instead of the repeated measurements design applied for diameters, a Student’s paired t-test was applied to evaluate systematic differences between the two methods.
This approach was considered methodologically adequate because it constitutes a paired design, in which the LiDAR and TSVM measurements derive from the same sampling units (trees), thereby controlling inter-individual variability and maximizing statistical power.
Prior to the analysis, parametric assumptions were verified through: (1) a Shapiro-Wilk test for normality of the paired differences, (2) a Levene’s test for homoscedasticity between methods, and (3) identification of outliers using standardized residuals. Given that crown volume data can exhibit asymmetric distributions inherent to forest morphometric variables, the parametric results were to be confirmed with the non-parametric Wilcoxon signed-rank test in the event that any assumption violation arose.
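The paired test statistic itself is simple to compute. The Python sketch below (a stand-in for the R analysis) returns the t statistic, the degrees of freedom and the mean paired difference; the p-value is deliberately omitted because it requires the t-distribution from a statistics library. The function name and the volumes are hypothetical.

```python
import math
import statistics


def paired_t_statistic(method_a, method_b):
    """Paired Student's t statistic for two equal-length samples measured
    on the same units. Returns (t, degrees of freedom, mean difference)."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1, mean_diff


# Hypothetical crown volumes (m^3) for five trees.
v_tsvm = [620.0, 410.0, 880.0, 530.0, 700.0]
v_lidar = [500.0, 330.0, 690.0, 440.0, 560.0]
t, df, mean_diff = paired_t_statistic(v_tsvm, v_lidar)
print(round(t, 2), df, round(mean_diff, 1))
```

With 35 trees the statistic would be compared against the t-distribution with 34 degrees of freedom, as described in the text.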
Additionally, a linear regression analysis was performed to quantify the relationship between the two methods and to develop correction equations, assessing whether the slope significantly differed from one and whether the intercept differed from zero. Method agreement was assessed with a Bland-Altman analysis, calculating the mean bias, the 95% limits of agreement, and evaluating the presence of proportional bias by correlating the differences and the means of the methods. A significant proportional bias would reveal that the magnitude of the discrepancy varies systematically with volume size, thus requiring regression-based correction instead of a simple additive factor.
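The agreement statistics described above can be sketched in Python as follows. The sketch assumes the conventional 95% limits of agreement (mean bias ± 1.96 standard deviations of the differences) and uses the Pearson correlation between differences and pairwise means as the proportional-bias indicator; the function names and data are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(method_a, method_b):
    """Mean bias, 95% limits of agreement and proportional-bias correlation."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2.0 for a, b in zip(method_a, method_b)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return {
        "bias": bias,
        "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
        # A correlation clearly different from zero suggests that the
        # discrepancy changes with the size of the measured quantity.
        "r_proportional": pearson_r(diffs, means),
    }


# Hypothetical volumes (m^3) in which the discrepancy grows with tree size.
v_tsvm = [620.0, 410.0, 880.0, 530.0, 700.0]
v_lidar = [500.0, 330.0, 690.0, 440.0, 560.0]
result = bland_altman(v_tsvm, v_lidar)
print({k: v for k, v in result.items()})
```

In this constructed example the correlation is strongly positive, which is the pattern that would call for a regression-based correction rather than a constant offset.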
All the analyses were performed at an α = 0.05 level of significance with 34 degrees of freedom (df = n − 1 pairs). The sample size of 35 trees, exceeding the minimum of 17 calculated for 95% confidence intervals, ensures robustness of the estimates even with moderate deviations from normality according to the Central Limit Theorem [55].
All the analyses were performed with R version 4.4.3 using the lme4 package for mixed-effects models.
4. Discussion
4.1. Validation of Low-Density Aerial LiDAR for the Morphological Characterization of Woodlands in Dehesa Systems
This study demonstrates that the low-density airborne LiDAR dataset provided by the Spanish National Plan for Aerial Orthophotography (PNOA), with an average density of 2 pulses per m², is adequate for the morphological characterization of trees in Mediterranean dehesa formations, even when the LiDAR and field data are separated by a six-year period. The difference found in total tree height (−0.34 m) is consistent with the expected growth, calculated from the DBH growth relationship reported in ref. [36] and the height–DBH allometry in ref. [43]; thus, periodic LiDAR acquisitions remain operationally valuable, provided that their predictable, correctable biases are taken into account.
The authors of ref. [14] noted that Spanish national-coverage LiDAR is widely used but that its limitations for fine structure have not been systematically quantified. This study partially addresses this gap, demonstrating that a density of 2 pulses per m² suffices for total height and crown diameter but is insufficient for detailed vertical structure (crown base height overestimated by +0.77 m, 37%). Studies with terrestrial LiDAR [31,58] report greater accuracy in vertical structure, but their cost prevents their application at landscape scale, where PNOA-LiDAR is the only feasible alternative.
4.2. Clustering in Vegetation Unit Segregation
The application of DBSCAN for the automatic segregation of tree vegetation units in dehesa systems is the main methodological contribution of this study. DBSCAN, a density-based algorithm that groups points by their spatial proximity without requiring the previous specification of the number of clusters, proved ideal for the heterogeneous structure of dehesas, where vegetation units vary between 1 and 50 trees with crown overlap.
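To make the clustering principle concrete, the following is a minimal, brute-force Python sketch of DBSCAN on 2D points. It is not the implementation used in the study: neighbourhood queries here are O(n²) rather than index-accelerated, points are assumed to be planimetric (x, y) coordinates in metres, and the example coordinates are hypothetical. The neighbour count includes the point itself, as in the usual definition of minPts.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def neighbours(i):
        # Brute-force range query; includes point i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: expand the cluster
                queue.extend(reachable)
    return labels


# Hypothetical returns: two groups of nearby points and one isolated point.
pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (30, 0)]
print(dbscan(pts, eps=1.7, min_pts=2))
```

The number of clusters is an output of the procedure rather than an input, which is the property that makes the approach suitable for vegetation units of unknown number and size.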
The parameters used (ε = 1.7 m, minPts = 2) are based on the intrinsic characteristics of the LiDAR dataset and of the dehesa formations of interest. The algorithm identified 1230 vegetation units with 99.8% precision (only 2 false positives) and 97.3% recall (34 false negatives), producing an F-score of 98.5%.
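The detection scores follow directly from the counts of true positives, false positives and false negatives; the 99.8% figure corresponds to precision, i.e., the share of detected units that are correct. The Python sketch below assumes that the 1230 identified units include the 2 false positives, so that there are 1228 true positives; this split is an assumption, not a figure stated in the text.

```python
def detection_scores(tp, fp, fn):
    """Return (precision, recall, F-score) from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Assumption: 1230 detected units = 1228 true positives + 2 false positives.
precision, recall, f_score = detection_scores(tp=1228, fp=2, fn=34)
print(f"precision={precision:.4f} recall={recall:.4f} F={f_score:.4f}")
```

Under this assumption the precision and recall match the reported 99.8% and 97.3%, and the F-score agrees with the reported 98.5% to within about 0.1 percentage points, a difference attributable to rounding of the intermediate values.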
In ref. [3], a methodology was developed for the automatic identification of ecological units in dehesa systems using object-based image analysis on RGB orthophotos, reporting good results but requiring supervised classification. The DBSCAN-based approach applied here to LiDAR data is completely unsupervised, eliminating the need for training and generalizing automatically to different areas. Ojeda-Magaña et al. [25] identified tree crowns in Extremaduran dehesas using the GK-B algorithm, but this algorithm requires the prior specification of a parameter related to the number of clusters to be identified, a limitation that DBSCAN overcomes.
The time complexity of DBSCAN is O(n log n) when spatial indices are used, allowing the million LiDAR returns from the working area (116 ha) to be processed efficiently. Comparative studies [44,59] confirm that DBSCAN is superior to k-means, CLARANS and other partitioning algorithms for spatial data with arbitrarily shaped clusters, which is precisely the situation in Iberian dehesas, where vegetation units have irregular geometries determined by management history, competition and topography.
4.3. Concaveman for Crown Delineation
After segregation with DBSCAN, Concaveman was used to delineate crown perimeters. This algorithm, based on Delaunay triangulation with concavity-adjusted pruning, generates concave hulls that accurately capture the irregular geometry of Q. ilex crowns in dehesas, when visually compared against an orthophoto.
Concaveman, with an intuitive concavity parameter (0.7 in this work) and O(n log n) complexity, provides a practical solution. The DBSCAN + Concaveman combination automatically processes from raw point clouds to crown vector geometries without manual intervention, which is critical for large-scale operational applications.
4.4. Comparison of Results: Diameters, Heights, and Volume
4.4.1. Crown Diameters
The GNSS-LiDAR discrepancy of 0.96 m (~7.5%) is consistent with other values reported in the literature. For example, Moreno et al. [8], comparing terrestrial LiDAR with traditional methods in planted coniferous stands, reported R² = 0.95 and RMSE = 0.341 cm for crown diameter measurements, values similar to those obtained in this study. Although their study benefitted from a much higher point density (terrestrial versus airborne LiDAR), the similarity of the goodness-of-fit statistics underscores that the level of agreement achieved with the low-density airborne dataset is within the expected range for LiDAR-based forest measurements.
The absence of proportional bias (r = −0.171, p = 0.326, Bland-Altman) indicates that the difference is constant across the entire size range, facilitating correction by a simple additive factor. This contrasts with the results in ref. [60] for Pinus sylvestris L., where significant proportional bias was detected, attributed to differences in crown architecture between conifers and broadleaves.
4.4.2. Crown Heights
LiDAR underestimates total height by 0.34 m (4.1%), which is comparable to the 3%–6% reported in ref. [15], where PNOA LiDAR was integrated with the National Forest Inventory in north-eastern Spain, and is close to the expected height growth of 20–30 cm over a 6-year period. The observed discrepancy is therefore almost entirely attributable to natural tree growth rather than to LiDAR measurement error.
In contrast, the overestimation of crown base height (+0.77 m, 37%) represents a genuine methodological limitation rather than a temporal effect, as crown base height in grazed dehesas remains relatively stable due to continuous livestock browsing. This bias shows proportional characteristics (r = 0.516, p = 0.002) and is consistent with the findings of other studies [30], which reported an overestimation of approximately 20%.
Blázquez-Casado et al. [61] reported similar difficulties in characterizing the lower vertical structure in mixed Mediterranean forests using low-density LiDAR, attributing this to limited laser penetration in dense canopies. The present case differs in that the lower crown has low leaf density due to livestock browsing, yet the low pulse density (2 pulses per m²) is insufficient to detect the true ground-crown transition. These limitations were confirmed in ref. [62] by modeling the vertical distribution of fuels with PNOA LiDAR.
4.4.3. Crown Volume
The LiDAR-derived volume underestimates the reference (TSVM) crown volume by 134.94 m³ (20.9%), with significant proportional bias (r = 0.629, p < 0.001). Consequently, a regression-based correction (VP = 51.10 + 1.164 × VL, R² = 0.952) is required. Zhu et al. [51], in their comprehensive review of crown volume methods, reported that computational geometry approaches using airborne LiDAR typically achieve R² values of 0.85–0.93. The present model (R² = 0.952) exceeds the upper bound of that reported range, indicating a particularly strong fit for the dataset under study.
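Applying the reported correction is a one-line calculation, sketched here in Python with the coefficients given above. The function name and the input volume are hypothetical, and the equation should only be applied within the range of volumes and site conditions for which it was calibrated.

```python
def corrected_volume(v_lidar, intercept=51.10, slope=1.164):
    """Estimate the TSVM-equivalent crown volume (m^3) from a LiDAR-derived
    volume (m^3) using the reported regression VP = 51.10 + 1.164 * VL."""
    if v_lidar < 0:
        raise ValueError("crown volume cannot be negative")
    return intercept + slope * v_lidar


# Hypothetical LiDAR-derived volume of 500 m^3.
print(round(corrected_volume(500.0), 1))
```

Because the slope is greater than one and the intercept is positive, the absolute correction increases with crown size, in line with the proportional bias reported above.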
A study using high-density vehicle-mounted LiDAR and a concave hull slicing method [31] found differences of 15%–25% compared with manual measurements and attributed the discrepancies to undetected internal crown gaps. The present 20.9% is consistent with this range, but the underlying cause differs, i.e., the overestimation of crown base height propagates cubic error to volume. A variability of 15%–25% between techniques has also been reported when estimating crown volumes in olive trees using various methods [52], confirming that crown volume estimation carries inherent uncertainty even with advanced techniques.
4.5. Uncertainty Propagation from TSVM Reference Measurements
While TSVM served as the field reference (ground truth) for crown volume validation in this study, it is important to recognize that this method itself carries non-negligible uncertainty. TSVM volume estimates are subject to errors arising from silhouette delineation accuracy, camera perspective effects, scaling based on field-measured dimensions, and the assumption of rotational symmetry in asymmetric crowns. Although some of these limitations were mitigated by acquiring eight perimeter photographs per tree, previous studies [52] have reported that different crown volume estimation methods may vary by 15%–25% even under controlled conditions.
This inherent TSVM uncertainty propagates into the regression correction models presented here. The reported 20.9% LiDAR underestimation should therefore be interpreted as the discrepancy between two imperfect methods rather than as an absolute error relative to true crown volume. Nevertheless, the strong correlation between methods (R² = 0.952) and the consistency of bias patterns suggest that the regression correction provides substantial practical improvement over uncorrected LiDAR volumes, even if absolute accuracy remains somewhat uncertain because of the difficulty of acquiring ground truth.
4.6. Implications and Limitations
The methodological flow DBSCAN → Concaveman → volume calculation by slices is fully automated and scalable to large areas, a critical advantage over supervised methods or those requiring manual processing. The validity of historical LiDAR data (6 years old) significantly extends the usefulness of the national PNOA archive, which is updated periodically and is freely accessible.
The main limitations are: (1) the density of 2 LiDAR pulses per m² is insufficient for detailed vertical structure, (2) the correction equations are specific to dehesa formations with cattle grazing and Q. ilex, and require validation in other contexts, and (3) the TSVM reference method assumes rotational symmetry, which real crowns satisfy only imperfectly, although in this study this limitation was mitigated by taking 8 perimeter photographs.
Future research should evaluate whether machine learning techniques [9] improve crown boundary detection in low-density LiDAR and validate the approach in Iberian dehesas with different compositions (dominant or mixed cork oak trees) and management regimes (no grazing, pruning).
5. Conclusions
This study validates the use of low-density airborne LiDAR data from the PNOA (2 pulses per m²) for the morphological characterization of trees in Mediterranean dehesa formations, demonstrating that six-year-old data remain operationally valid with predictable biases. The site-specific correction equations developed here provide a methodological framework that should be recalibrated and validated before application to other dehesa contexts with different management regimes, species compositions, or environmental conditions.
The main methodological contribution is the successful application of DBSCAN for automatic and unsupervised segregation of tree vegetation units, achieving an F-score of 98.5% without requiring the specification of the number of clusters in advance, overcoming the limitations of partitioning algorithms. The combination with the Concaveman approach for crown delineation provides a fully automated workflow from raw point clouds to accurate vector geometries.
Validation results on 35 individual Q. ilex trees show: (1) diameters with R² = 0.985 and a LiDAR-GNSS difference of −0.96 m without proportional bias, (2) total height with a difference of −0.34 m consistent with expected growth, (3) crown base height with an overestimation of +0.77 m and significant proportional bias due to insufficient density for vertical structure, and (4) volume with an underestimation of 20.9% and R² = 0.952, correctable with the regression provided. While quantitative validation was restricted to isolated trees due to ground-truth measurement constraints, the visual concordance between LiDAR-derived delineations and orthophotography across all vegetation supports the broader applicability of the methodology.
The correction equations developed (height, diameter, volume) substantially improve the practical usefulness of the PNOA-LiDAR datasets for forest management in dehesa systems, facilitating the integration of historical data and the evaluation of temporal changes in tree structure. The DBSCAN + Concaveman methodology is scalable to the regional level, presenting new opportunities for automated forest inventories across millions of hectares of Spanish dehesas and Portuguese montados.