A Methodology for the Automated Delineation of Crop Tree Crowns from UAV-Based Aerial Imagery by Means of Morphological Image Analysis

: The popularisation of aerial remote sensing using unmanned aerial vehicles (UAV), has boosted the capacities of agronomists and researchers to offer farmers valuable data regarding the status of their crops. This paper describes a methodology for the automated detection and individual delineation of tree crowns in aerial representations of crop ﬁelds by means of image processing and analysis techniques, providing accurate information about plant population and canopy coverage in intensive-farming orchards with a row-based plant arrangement. To that end, after pre-processing initial aerial captures by means of photogrammetry and morphological image analysis, a resulting binary representation of the land plot surveyed is treated at connected component-level in order to separate overlapping tree crown projections. Then, those components are morphologically transformed into a set of seeds with which tree crowns are ﬁnally delineated, establishing the boundaries between them when they appear overlapped. This solution was tested on images from three different orchards, achieving semantic segmentations in which more than 94% of tree canopy-belonging pixels were correctly classiﬁed, and more than 98% of trees were successfully detected when assessing the methodology capacities for estimating the overall plant population. According to these results, the methodology represents a promising tool for automating the inventorying of plants and estimating individual tree-canopy coverage in intensive tree-based orchards.


Introduction
Modern agricultural practices developed around the precision agriculture (PA) paradigm, demand data-collecting systems for assembling information regarding the spatial and temporal variability of those factors of influence in agricultural production [1]. This datadriven approach is aimed at developing decision-making frameworks to help farmers with their daily tasks [2], thus supporting the eventual optimisation of the farming-inputs management and favouring the improvement of the overall agricultural activity in terms of crop productivity, sustainability, and profitability [3].
Hence, non-invasive data acquisition regarding crops condition has become of great interest to the agricultural sector [4,5]. Thus, extensive related research has been carried out on the development of solutions aimed at improving farming processes, by developing decision support models by means of remotely sensed data [6][7][8][9][10][11]. In this sense, aerial

Case Study Site
The developed methodology has been tested with three different types of tree-based crops: olive (Olea europaea L.), lemon (Citrus limon L.), and orange (Citrus sinesis L.) trees. To that end, three different orchards, one per crop, were surveyed.
In the case of the olive trees, the grove under study is in Gibraleón, province of Huelva (Andalusia, Southwest Spain). This land plot, centred in DMS coordinates 7 • 02 48.44 W and 37 • 20 39.80 N, spreads over an area of approximately 17.5 ha, with a total of 3916 olive trees planted. On the other hand, the two citrus-tree orchards can also be found in this province, in the surroundings of the village of La Redondela, Isla Cristina. The lemon-tree orchard is located centred in the coordinates 7 • 17 06.93 W and 37 • 14 07.24 N, and it has an approximate extent of 1.37 ha and 552 trees planted. The orange-tree orchard can be found in the coordinates 7 • 18 06.75 W and 37 • 13 49.55 N, with an area of 1.94 ha and 781 trees. Figure 1 shows aerial images of these three orchards. It should be remarked that none of these aerial images properly belong to the present study, as they were taken from the Spanish National Geographic Institute ("Instituto Geográfico Nacional") [46]. They were not used at any moment throughout the development, nor the validation of the methodology described in this paper, so they are proposed just for illustrative purposes.

Case Study Site
The developed methodology has been tested with three different types of tree-based crops: olive (Olea europaea L.), lemon (Citrus limon L.), and orange (Citrus sinesis L.) trees. To that end, three different orchards, one per crop, were surveyed.
In the case of the olive trees, the grove under study is in Gibraleón, province of Huelva (Andalusia, Southwest Spain). This land plot, centred in DMS coordinates 7°02′48.44″ W and 37°20′39.80″ N, spreads over an area of approximately 17.5 ha, with a total of 3916 olive trees planted. On the other hand, the two citrus-tree orchards can also be found in this province, in the surroundings of the village of La Redondela, Isla Cristina. The lemon-tree orchard is located centred in the coordinates 7°17′06.93″ W and 37°14′07.24″ N, and it has an approximate extent of 1.37 ha and 552 trees planted. The orange-tree orchard can be found in the coordinates 7°18′06.75″ W and 37°13′49.55″ N, with an area of 1.94 ha and 781 trees. Figure 1 shows aerial images of these three orchards. It should be remarked that none of these aerial images properly belong to the present study, as they were taken from the Spanish National Geographic Institute ("Instituto Geográfico Nacional") [46]. They were not used at any moment throughout the development, nor the validation of the methodology described in this paper, so they are proposed just for illustrative purposes. It should be underscored that they all show a regular row-based planting pattern. In this respect, it is worth noting that in the case of the orange grove, there is a very high degree of overlap between intra-row adjacent trees. This is the reason it was selected for the study, with the aim of carrying out a sort of stress test with which to evaluate the developed methodology. It should be underscored that they all show a regular row-based planting pattern. In this respect, it is worth noting that in the case of the orange grove, there is a very high degree of overlap between intra-row adjacent trees. This is the reason it was selected for the study, with the aim of carrying out a sort of stress test with which to evaluate the developed methodology.

Image Acquisition Equipment
Aerial images of the land plots under study were collected with a multispectral sensor MicaSense RedEdge-M TM (MicaSense, Inc., Seattle, WA, USA). This device is able to simultaneously capture in five different discrete spectral bands, and its main features are summarised in Table 1. The sensing equipment was completed with a GPS module, directly connected to the camera for georeferencing the images as they were captured, and a 5-band downwelling light sensor (DLS). From the data collected by this latter sensor, it is possible to make corrections in flight regarding global lighting changes. In addition, a MicaSense Reflectance Panel was included in the setup. Captures of this panel provide accurate information regarding the amount of light reaching the ground when performing image acquisition, so these data are used for transforming raw pixels values to absolute reflectance when later processing the collected images.
As UAV, a DJI TM Matrice 100 (SZ DJI Technology Co., Ltd., Shenzhen, Guangdong, China) quadcopter was used for the olive case study. On the other hand, for the two remaining cases corresponding to the citrus-tree orchards, image acquisition was performed with a DJI TM Phantom 3 (SZ DJI Technology Co., Ltd., Shenzhen, Guangdong, China). Because of the dimensions of both orchards, this task could be carried out with a smaller UAV with lesser capabilities. The main characteristics of both platforms are listed in Table 2. The UAVs, along with the sensing equipment specified in Table 1, are also shown in Figure 2.  Advance-based setup.

Flights Planning and Execution
All those flights carried out for image acquisition were planned with the DJIFlight-Planner TM (AeroScientific-Spatial Scientific Pty. Ltd., Adelaide, Australia; Photometrix, Kew, Australia; Dechant Consulting Services Inc., Beaux Arts Village, WA, USA) software. They all shared the same configuration, aiming at capturing aerial images with a forward overlap of 85%, a lateral overlap of 65%, and a ground sample distance (GSD) of 4.8 cm. Flight missions were performed with a height of 70 m with regard to the take-off point, a cruising speed of 4.2 m/s, and time between captures of 1.5 s. They were executed autonomously in GPS mode and monitored with the ground control software Litchi (VC Technology, Ltd., London, UK).
The olive grove was flown on 13 June 2019, approximately between 11 a.m. and 1 p.m. 8865 images were taken per spectral band, yielding a total of 44,325 captures. In the case of citrus trees, two different flights were made, one for each type of crop. Both flights took place on 1 March 2020, in the same time slot as in the olive grove case.
On the other hand, a total of 1520 images, 304 per spectral band, were taken in the orange grove. 1580 (316 per spectral band) images were acquired in the case of the lemontree orchard. In Figure 3, some examples of images captured during those flights are shown. Specifically, they correspond to the orange-tree orchard.

Flights Planning and Execution
All those flights carried out for image acquisition were planned with the DJIFlightPlanner TM (AeroScientific-Spatial Scientific Pty. Ltd., Adelaide, Australia; Photometrix, Kew, Australia; Dechant Consulting Services Inc., Beaux Arts Village, WA, USA) software. They all shared the same configuration, aiming at capturing aerial images with a forward overlap of 85%, a lateral overlap of 65%, and a ground sample distance (GSD) of 4.8 cm. Flight missions were performed with a height of 70 m with regard to the take-off point, a cruising speed of 4.2 m/s, and time between captures of 1.5 s. They were executed autonomously in GPS mode and monitored with the ground control software Litchi (VC Technology, Ltd., London, UK).
The olive grove was flown on 13 June 2019, approximately between 11 a.m. and 1 p.m. 8865 images were taken per spectral band, yielding a total of 44,325 captures. In the case of citrus trees, two different flights were made, one for each type of crop. Both flights took place on 1 March 2020, in the same time slot as in the olive grove case.
On the other hand, a total of 1520 images, 304 per spectral band, were taken in the orange grove. 1580 (316 per spectral band) images were acquired in the case of the lemontree orchard. In Figure 3, some examples of images captured during those flights are shown. Specifically, they correspond to the orange-tree orchard.

Flights Planning and Execution
All those flights carried out for image acquisition were planned with the DJIFlight-Planner TM (AeroScientific-Spatial Scientific Pty. Ltd., Adelaide, Australia; Photometrix, Kew, Australia; Dechant Consulting Services Inc., Beaux Arts Village, WA, USA) software. They all shared the same configuration, aiming at capturing aerial images with a forward overlap of 85%, a lateral overlap of 65%, and a ground sample distance (GSD) of 4.8 cm. Flight missions were performed with a height of 70 m with regard to the take-off point, a cruising speed of 4.2 m/s, and time between captures of 1.5 s. They were executed autonomously in GPS mode and monitored with the ground control software Litchi (VC Technology, Ltd., London, UK).
The olive grove was flown on 13 June 2019, approximately between 11 a.m. and 1 p.m. 8865 images were taken per spectral band, yielding a total of 44,325 captures. In the case of citrus trees, two different flights were made, one for each type of crop. Both flights took place on 1 March 2020, in the same time slot as in the olive grove case.
On the other hand, a total of 1520 images, 304 per spectral band, were taken in the orange grove. 1580 (316 per spectral band) images were acquired in the case of the lemontree orchard. In Figure 3, some examples of images captured during those flights are shown. Specifically, they correspond to the orange-tree orchard.  In order to accomplish the main goal of the developed methodology, which is the individual delineation of the tree crowns boundaries in aerial images, it is necessary to transform initial captures into a data structure suitable for that purpose. The first pre-processing stage was carried out by the application of the procedure proposed by Sarabia et al. [45]. It puts forward a comprehensive method to automatically process multispectral aerial images of a land plot, with the objective of obtaining a unique binary image representative of it, where crop tree-belonging pixel regions are segmented from the ground, and location points of each individual plant are estimated. To that end, drawing from the set of aerial captures, an initial photogrammetric imaging procedure generates a 3D-point representation of the plot to survey. In this case, PIX4Dmapper TM (Pix4D, Prilly, Switzerland) was used as photogrammetry software for the purpose of yielding a highdensity point cloud for each case study addressed. It should be noted that this software was configured such that each resulting 3D point could be correctly re-projected in a minimum of 3 different images, and no image scaling was applied. Once the 3D-point representations were obtained, corresponding digital height models (DHM) were computed by applying an inverse distance weighing (IDW) interpolation [47]. This task was conducted using ArcGis TM 10.3 (Esri, Inc., Redlands, CA, USA) along with its Geostatistical Analyst Tools extension. For this procedure, the software was configured for a fixed neighbourhood search, with a number of input neighbours up to 4, a search radius of 10 m, and a weighting exponent of 2. Then, each DHM was approached as a greyscale image. Height information at each point was treated as a pixel-intensity value, thus the models were processed by means of morphological analysis and statistical thresholding with the aim of segmenting those pixel-areas belonging to crop-tree canopies from the rest of that image. To that end, a background estimation was computed for each DHM-image, aiming at filtering each tree-crown projection by replacing the grey values of their corresponding pixels with the minimum intensity value found in the set of ground pixels present in their nearest surroundings. For that purpose, the process is carried out by iteratively applying a set of morphological opening operations [48], using a disk-shaped structuring element whose radius, β, is increased at each step, in an attempt to offer the optimum filtering for each tree canopy. The stop condition occurs when this kernel is large enough to completely contain the biggest tree-crown projection in the image. Thus, the maximum value to be reached by the radius of the structuring element corresponds to one of the two parameters that need to be set in order to apply this methodology, so it should be configured to guarantee that the corresponding structuring element is able to cover the largest individual canopy. The second required parameter relates to the height below which regional maxima of the image are discarded, assuming that tree crowns are represented at higher elevations, thus favouring their segmentation. Hence, it should be configured in accordance with the height expected for the tree crops. Table 3 shows the values of both configuration parameters for each case study addressed. After subtracting the background estimations from the corresponding grayscale representations of the DHMs, Otsu's method [49] was applied for eventually binarising those images, thus yielding the eventual canopy mass-ground segmentation. Finally, the estimation of the tree location points was undertaken by analysing the size of each of the connected components in the resulting segmentations, with regard to the dominant tree crown size in the binary image. These connected components can be defined as those groupings of pixels with the same intensity value and a neighbourhood relationship (8-connectivity was considered in this study) [50], which, in this case, potentially represent the projections of tree canopies. In Figure 4, the workflow followed by this methodology is illustrated.
After subtracting the background estimations from the corresponding grayscale representations of the DHMs, Otsu's method [49] was applied for eventually binarising those images, thus yielding the eventual canopy mass-ground segmentation. Finally, the estimation of the tree location points was undertaken by analysing the size of each of the connected components in the resulting segmentations, with regard to the dominant tree crown size in the binary image. These connected components can be defined as those groupings of pixels with the same intensity value and a neighbourhood relationship (8connectivity was considered in this study) [50], which, in this case, potentially represent the projections of tree canopies. In Figure 4, the workflow followed by this methodology is illustrated.  [45], aimed at generating a binary image from aerial multispectral captures of a land plot, where canopy coverage is shown segmented from the ground, and the set of location points of the crop trees in the plot is generated.

Image Processing
The starting point of the procedure here presented, which comprises the core of the methodology, is a binary image representative of the orchard understudy, and where tree canopies are segmented from the ground, along with a set of potential location points for the individual crop plants in that image. Figure 5 shows, as an example, the binary image resulting from pre-processing the aerial captures collected in one of the study sites considered in this work. Additionally, the estimated tree location points are represented on it.  [45], aimed at generating a binary image from aerial multispectral captures of a land plot, where canopy coverage is shown segmented from the ground, and the set of location points of the crop trees in the plot is generated.

Image Processing
The starting point of the procedure here presented, which comprises the core of the methodology, is a binary image representative of the orchard understudy, and where tree canopies are segmented from the ground, along with a set of potential location points for the individual crop plants in that image. Figure 5 shows, as an example, the binary image resulting from pre-processing the aerial captures collected in one of the study sites considered in this work. Additionally, the estimated tree location points are represented on it.  . Binary image resulting from pre-processing aerial captures of the orange grove-case study, according to the methodology proposed by Sarabia et al. [45], represented together with the estimated tree location points (red circles). Note in the zoomed area, highlighted in red, the occurrences of overlapping tree crowns.
Hence, given both pieces of information, the general idea behind the developed methodology goes through the individual processing of each of the connected components. It focuses on transforming those components into small disjoint groupings of pixels, somehow representative of each individual tree contained within them. These sorts of pixel-seeds are aimed at backing the eventual splitting of the connected component according to the plants contained. Hence, they are used to support the final individual tree coverage segmentation, by a Watershed transform-based operation. This process of transforming and segmenting the connected components is mainly based on the application of different morphological operators, which are formally defined in Appendix A, so the reader is encouraged to consult it for further information. The whole developed methodology that is presented in detail hereafter is briefly described in the flowchart shown in Figure 6.
Given the binary image IBIN yielded by the preprocessing, each of its connected components is individually processed, as mentioned above. First, it is checked the number of trees in the component, estimated during preprocessing. Given a connected component of IBIN, cci, if the estimated number of trees contained in it, nTi, is just one, it is not necessary to conduct any processing at all. When nTi is greater than one, its corresponding pixel region is initially isolated in an alternative intensity matrix, embedding it in the smallest all-black-pixel box that can contain it. For this auxiliary image, its Euclidean distance transform [51] is calculated. Thus, given this binary image ICCi, which encloses cci in its proper bounding box, its distance transform, IDTi, is calculated in accordance with the following definitions: where, given the binary image f and the pixel p, [DT(f)(p)] refers to the Euclidean distance from p to its closest background pixel. The resulting matrix, IDTi, can be seen and subsequently treated as a greyscale image, as illustrated in Figure 7. . Binary image resulting from pre-processing aerial captures of the orange grove-case study, according to the methodology proposed by Sarabia et al. [45], represented together with the estimated tree location points (red circles). Note in the zoomed area, highlighted in red, the occurrences of overlapping tree crowns.
Hence, given both pieces of information, the general idea behind the developed methodology goes through the individual processing of each of the connected components. It focuses on transforming those components into small disjoint groupings of pixels, somehow representative of each individual tree contained within them. These sorts of pixelseeds are aimed at backing the eventual splitting of the connected component according to the plants contained. Hence, they are used to support the final individual tree coverage segmentation, by a Watershed transform-based operation. This process of transforming and segmenting the connected components is mainly based on the application of different morphological operators, which are formally defined in Appendix A, so the reader is encouraged to consult it for further information. The whole developed methodology that is presented in detail hereafter is briefly described in the flowchart shown in Figure 6.
Given the binary image I BIN yielded by the preprocessing, each of its connected components is individually processed, as mentioned above. First, it is checked the number of trees in the component, estimated during preprocessing. Given a connected component of I BIN , cc i , if the estimated number of trees contained in it, nT i , is just one, it is not necessary to conduct any processing at all. When nT i is greater than one, its corresponding pixel region is initially isolated in an alternative intensity matrix, embedding it in the smallest all-black-pixel box that can contain it. For this auxiliary image, its Euclidean distance transform [51] is calculated. Thus, given this binary image I CCi , which encloses cc i in its proper bounding box, its distance transform, I DTi , is calculated in accordance with the following definitions:     Subsequently, the regional maxima of the resulting intensity matrix I DTi are explored as tree representative. The goal here is to isolate those more representative regional maxima, as they can be expected to take place where the centres of mass of the trees within the component are potentially located. This is carried out on the basis of the h-maxima transform, which allows us to remove from I DTi all maxima with depth lower or equal than scalar h. It can be computed as the morphological reconstruction by dilation (R δ ) of I DTi from the marker image resulting from subtracting h from the intensity value of each pixel in I DTi . Thus, irrelevant maxima are firstly removed from I DTi by computing: The surviving regional maxima in I DTi,h are those considered representatives and, therefore, are extracted from the image. It is achieved by subtracting from I DTi,h the result of suppressing from it all the remaining regional maxima: Note that the rightmost term of the subtraction is the h-maxima transform setting the value of h to 1. This value enables the removal of all remaining regional maxima from I DTi,h , as all of them will have, at least, that height. Then, by thresholding I RMAXi,h , it can be achieved a binary mask containing those maxima representing the centre of mass of the trees: The result of this process depends on the definition of the h value in Equation (4). The irregularity of the shape of trees, which can be corroborated in Figure 7, may force the appearance of peripheral regional maxima of reduced height. To avoid the selection of any of these non-representative maxima, the procedure described in Equations (4)-(6) is iteratively performed for increasing h values starting from 1; being the value selected that fulfilling to be the maximum h value, such that the number of segmented maxima contained in I RMAXBINi,h is equal to the number of estimated trees nT i : where for a given binary image f , #CC(f) refers to the number of connected components existing in it. Figure 8 shows the maxima extracted at different h values for the connected component previously proposed in Figure 7.
Agronomy 2022, 11, x FOR PEER REVIEW 11 of 28 Subsequently, the regional maxima of the resulting intensity matrix IDTi are explored as tree representative. The goal here is to isolate those more representative regional maxima, as they can be expected to take place where the centres of mass of the trees within the component are potentially located. This is carried out on the basis of the h-maxima transform, which allows us to remove from IDTi all maxima with depth lower or equal than scalar h. It can be computed as the morphological reconstruction by dilation ( ) of IDTi from the marker image resulting from subtracting h from the intensity value of each pixel in IDTi. Thus, irrelevant maxima are firstly removed from IDTi by computing: The surviving regional maxima in IDTi,h are those considered representatives and, therefore, are extracted from the image. It is achieved by subtracting from IDTi,h the result of suppressing from it all the remaining regional maxima: Note that the rightmost term of the subtraction is the h-maxima transform setting the value of h to 1. This value enables the removal of all remaining regional maxima from IDTi,h, as all of them will have, at least, that height. Then, by thresholding IRMAXi,h, it can be achieved a binary mask containing those maxima representing the centre of mass of the trees: The result of this process depends on the definition of the h value in Equation (4). The irregularity of the shape of trees, which can be corroborated in Figure 7, may force the appearance of peripheral regional maxima of reduced height. To avoid the selection of any of these non-representative maxima, the procedure described in Equations (4)-(6) is iteratively performed for increasing h values starting from 1; being the value selected that fulfilling to be the maximum h value, such that the number of segmented maxima contained in IRMAXBINi,h is equal to the number of estimated trees nTi: where for a given binary image f, #CC(f) refers to the number of connected components existing in it. Figure 8 shows the maxima extracted at different h values for the connected component previously proposed in Figure 7. From this binary representation of the maxima, IMAXi, an ad-hoc image is automatically generated, and eventually used as a marker image to perform the individual segmentation of each tree crown within the initial component cci. This new image, named hereafter ISEEDSi, results from transforming IMAXi such that each of its components is replaced by an all-white pixels-circle with the same area. This is, for each connected component belonging to IMAXi, a corresponding circle-shaped connected component can be found From this binary representation of the maxima, I MAXi , an ad-hoc image is automatically generated, and eventually used as a marker image to perform the individual segmentation of each tree crown within the initial component cc i . This new image, named hereafter I SEEDSi , results from transforming I MAXi such that each of its components is replaced by an all-white pixels-circle with the same area. This is, for each connected component belonging to I MAXi , a corresponding circle-shaped connected component can be found in I SEEDSi , with the same pixel-area and centre of mass. Figure 9 shows an example of this ad-hoc image, obtained in this specific case from the initially connected component presented in Figure 7.
Agronomy 2022, 11, x FOR PEER REVIEW 12 of 28 in ISEEDSi, with the same pixel-area and centre of mass. Figure 9 shows an example of this ad-hoc image, obtained in this specific case from the initially connected component presented in Figure 7. As stated before, ISEEDSi contains the set of markers with which the initially connected component cci is finally segmented at the individual tree crown level. ISEEDSi has been generated according to the hypothesis that the isolated maxima, from which it is obtained, are somehow representative of the size and positions of the overlapped treetops in the component. Therefore, the goal at this point is the achievement of an artifact of separation between the area of influence of each circle-shaped pixel-region, assuming that it will subsequently provide a proper segmentation for the tree crowns. This approach will be exploited in the application of the Watershed transform. Indeed, given a greyscale image, this segmentation tool approaches the intensity levels as values of altitude, so the darkest regions can be seen as catchment basins. By "flooding" them, the lines of water convergence can be detected. These ridgelines are determined by the morphological features of those catchment basins, such as their shape or depth, so they can adequately split the zones of influence of the basins. Hence, the idea is to approach the circled areas as those catchment basins by employing the distance transform formulated in Equation (2). Notwithstanding, in this case, the pixel-regions corresponding to the circles are expected to be the darkest (the deepest) areas in the transformed image, so the distance transform must be redefined to compute here the distance of each pixel to the closest white one. Thus, given ISEEDSi, the grayscale image IDTSEEDSi, over which the Watershed segmentation is subsequently performed, can be mathematically defined as follows: Next, the Watershed transform is applied on IDTSEEDSi. The mathematical formulation of this morphological transform can be quite extensive, so the reader is encouraged to consult related literature already published for further study [52,53]. In this sense, it should be underscored that the Watershed transform implementation used in this study is based on an algorithm proposed in [54]. Thus, after its application on IDTSEEDSi, those ridges establishing the boundaries of the basins can be extracted as follows: where WS denotes the Watershed transform. Finally, the binary image IRIDGESi is used to separate the tree crown projections contained in the initially connected component cci: As stated before, I SEEDSi contains the set of markers with which the initially connected component cc i is finally segmented at the individual tree crown level. I SEEDSi has been generated according to the hypothesis that the isolated maxima, from which it is obtained, are somehow representative of the size and positions of the overlapped treetops in the component. Therefore, the goal at this point is the achievement of an artifact of separation between the area of influence of each circle-shaped pixel-region, assuming that it will subsequently provide a proper segmentation for the tree crowns. This approach will be exploited in the application of the Watershed transform. Indeed, given a greyscale image, this segmentation tool approaches the intensity levels as values of altitude, so the darkest regions can be seen as catchment basins. By "flooding" them, the lines of water convergence can be detected. These ridgelines are determined by the morphological features of those catchment basins, such as their shape or depth, so they can adequately split the zones of influence of the basins. Hence, the idea is to approach the circled areas as those catchment basins by employing the distance transform formulated in Equation (2). Notwithstanding, in this case, the pixel-regions corresponding to the circles are expected to be the darkest (the deepest) areas in the transformed image, so the distance transform must be redefined to compute here the distance of each pixel to the closest white one. Thus, given I SEEDSi , the grayscale image I DTSEEDSi , over which the Watershed segmentation is subsequently performed, can be mathematically defined as follows: Next, the Watershed transform is applied on I DTSEEDSi . The mathematical formulation of this morphological transform can be quite extensive, so the reader is encouraged to consult related literature already published for further study [52,53]. In this sense, it should be underscored that the Watershed transform implementation used in this study is based on an algorithm proposed in [54]. Thus, after its application on I DTSEEDSi , those ridges establishing the boundaries of the basins can be extracted as follows: where WS denotes the Watershed transform. Finally, the binary image I RIDGESi is used to separate the tree crown projections contained in the initially connected component cc i :  Each connected component cci containing multiple trees in the original binary image IBIN, is replaced by those in its corresponding processed sub-image IFINALi, resulting in a final binary image, ISEG, where every tree crown appears individually segmented.

Image Postprocessing
When undertaking the last stage of segmentation, whereby initially connected components are properly split according to the ridgelines previously yielded, over segmentation phenomena can occur [55][56][57]. Indeed, given the irregularity in terms of the silhouette of the tree crowns, it is possible for non-significant parts of a plant to be outside the boundaries established by the Watershed transform for its corresponding projection. This would potentially provoke the generation of small connected components wrongly disconnected (according to the 8-connectivity considered) from the main ones representing tree crowns (see Figure 1a). These anomalous pixel-regions are removed by applying a morphological opening followed by a morphological reconstruction to exactly recover the surviving greater connected components corresponding to tree crowns: where R refers to the reconstruction by dilation, and γβ is the morphological opening by a disc-shaped structuring element β. The size of β must be large enough to remove the anomalous small connected components, but also smaller than tree crowns to preserve them and enable their reconstruction. The considerable difference in terms of size between the anomalous and tree-crown components meant this decision was not critical, due to a wide range of values being valid. In this context, the radius of β was set to 5. The explained postprocessing aimed at removing irrelevant components is illustrated in Figure 11.  Each connected component cc i containing multiple trees in the original binary image I BIN , is replaced by those in its corresponding processed sub-image I FINALi , resulting in a final binary image, I SEG , where every tree crown appears individually segmented.

Image Postprocessing
When undertaking the last stage of segmentation, whereby initially connected components are properly split according to the ridgelines previously yielded, over segmentation phenomena can occur [55][56][57]. Indeed, given the irregularity in terms of the silhouette of the tree crowns, it is possible for non-significant parts of a plant to be outside the boundaries established by the Watershed transform for its corresponding projection. This would potentially provoke the generation of small connected components wrongly disconnected (according to the 8-connectivity considered) from the main ones representing tree crowns (see Figure 1a). These anomalous pixel-regions are removed by applying a morphological opening followed by a morphological reconstruction to exactly recover the surviving greater connected components corresponding to tree crowns: where R δ refers to the reconstruction by dilation, and γ β is the morphological opening by a disc-shaped structuring element β. The size of β must be large enough to remove the anomalous small connected components, but also smaller than tree crowns to preserve them and enable their reconstruction. The considerable difference in terms of size between the anomalous and tree-crown components meant this decision was not critical, due to a wide range of values being valid. In this context, the radius of β was set to 5. The explained postprocessing aimed at removing irrelevant components is illustrated in Figure 11.

Evaluation of the Developed Methodology
As mentioned in Section 2.1, the developed methodology was tested on images acquired in three different crop fields. The approach chosen for assessing its performance was based on the comparison of the binary images resulting from each case study with handmade binary segmentation of the tree crowns. These latter were obtained from ad-hoc images resulting from orthomosaics representatives of each land plot under study, which were in turn generated from the aerial captures collected during the image acquisition stage. Indeed, given the set of multispectral aerial captures collected for each case study, they were processed with photogrammetry software (Pix4D TM Mapper), in order to achieve high-resolution aerial representations of the corresponding land plots. Resulting in five different orthomosaics after processing, one per spectral band of capture, three of them (blue, red-edge, and near-IR) were conveniently combined to finally yield a representative aerial image of the crop, wherein it became rather easy to differentiate the individual tree crowns for a human observer. Then, by using an image editor, a manual delineation of each tree crown was carried out, producing a binary image which, as a gold-standard, was eventually compared at pixel level with the corresponding segmentation provided by the developed methodology. Figure 12 shows a proposal of the colour images generated for each case study.
(see Figure 1a). These anomalous pixel-regions are removed by applying a morphological opening followed by a morphological reconstruction to exactly recover the surviving greater connected components corresponding to tree crowns: where R refers to the reconstruction by dilation, and γβ is the morphological opening by a disc-shaped structuring element β. The size of β must be large enough to remove the anomalous small connected components, but also smaller than tree crowns to preserve them and enable their reconstruction. The considerable difference in terms of size between the anomalous and tree-crown components meant this decision was not critical, due to a wide range of values being valid. In this context, the radius of β was set to 5. The explained postprocessing aimed at removing irrelevant components is illustrated in Figure 11. tation after splitting the component in (a) using (b) -note in the zoomed area the generation of anomalous disconnected artifacts; (d) segmentation-image IFINALPOSTi obtained after postprocessing-note in the zoomed area that all elements are connected, considering 8-connectivity, and how the shape of the crowns are not affected as a consequence of the postprocessing.

Evaluation of the Developed Methodology
As mentioned in Section 2.1, the developed methodology was tested on images acquired in three different crop fields. The approach chosen for assessing its performance was based on the comparison of the binary images resulting from each case study with handmade binary segmentation of the tree crowns. These latter were obtained from adhoc images resulting from orthomosaics representatives of each land plot under study, which were in turn generated from the aerial captures collected during the image acquisition stage. Indeed, given the set of multispectral aerial captures collected for each case study, they were processed with photogrammetry software (Pix4D TM Mapper), in order to achieve high-resolution aerial representations of the corresponding land plots. Resulting in five different orthomosaics after processing, one per spectral band of capture, three of them (blue, red-edge, and near-IR) were conveniently combined to finally yield a representative aerial image of the crop, wherein it became rather easy to differentiate the individual tree crowns for a human observer. Then, by using an image editor, a manual delineation of each tree crown was carried out, producing a binary image which, as a goldstandard, was eventually compared at pixel level with the corresponding segmentation provided by the developed methodology. Figure 12 shows a proposal of the colour images generated for each case study. It should be noted that any other combination of the available orthomosaics may be valid for this purpose. In this sense, the sole objective was to obtain an image comfortable It should be noted that any other combination of the available orthomosaics may be valid for this purpose. In this sense, the sole objective was to obtain an image comfortable to work with when editing it for manually segmenting the tree crowns using an image editor.
Hence, for each case study, given the binary image resulting from applying the developed methodology, and its corresponding ground-truth image, every pixel in the first one was labelled in accordance with the next four categories: • True Positive (TP): that foreground pixel (white pixel) for which its analogous one in the ground-truth image was categorised as tree crown-belonging pixel. • False Positive (FP): that foreground pixel which was labelled as a non-tree crownbelonging pixel in the ground-truth image. • True Negative (TN): that background pixel (black pixel) such that the corresponding one in the ground-truth image was categorised as a non-tree crown-belonging pixel. • False Negative (FN): that background pixel for which its analogous one was labelled as tree crown-belonging pixel in the ground-truth image.
Based on these pixel categories, the following metrics are proposed for the purpose of assessing the accuracy of the segmentation resulting from the application of the developed methodology: • Precision (PR): this metric refers to the probability with which a given foreground pixel was correctly categorised. It can be formulated as follows: where tp and fp are, respectively, the number of pixels categorised as TP and the number of them labelled as FP.
• Recall (RC): it represents the ratio between the number of foreground pixels correctly classified and the whole set of instances of actual tree-belonging pixels in the image. Mathematically: where fn is the number of pixels labelled as FN.
• F -score : as the harmonic mean of these two metrics just proposed: • Overall Accuracy (OA): it proposes the percentage of pixels correctly classified.
• Intersection-over-Union (IoU): also known as the Jaccard index, this metric signifies the similarity between the resulting segmentation and its corresponding ground-truth image. It can be defined with the following expression: where I RESUL and I GT are the final segmentation and its binary ground-truth image, respectively. They should be approached as the sets of their corresponding foreground instances, so this metric is computed as the ratio between the number of elements in their intersection and the cardinality of their union. In addition, a secondary analysis of the results was undertaken, focused on the capacities of the proposed solution for individual tree crown identification, by comparing the connected components yielded in the resulting binary images and the actual tree crowns, represented throughout the connected components in the corresponding ground-truth image. The categories defined below can be instantiated on the basis of this comparison:

•
True Positive, at tree-level (TP t ): that connected component in the final segmentation corresponding to an actual tree crown in the binary ground-truth image.

•
False Positive, at tree-level (FP t ): that connected component in the final segmentation for which it cannot be found an analogous component, representative of a tree crown, in the ground-truth image.

•
False Negative, at tree-level (FN t ): that actual tree crown in the ground-truth segmentation not represented in the final segmentation. In other words, those tree crowns are not detected by the algorithm.
Thus, metrics equivalent to those proposed in Equations (13)- (15) were computed for the defined cases.
On the other hand, it should be noted that methodology was not only tested according to the parametrisation proposed in Section 2.4.1, but also with different configurations regarding the maximum size of the filtering kernel used during preprocessing, in order to evaluate the influence of this value on its performance.

Results
As mentioned in Section 2.5, the assessment of the developed methodology was approached in two ways: first, analysing at pixel level the semantic segmentation provided by the final binary images; next, by quantifying its performance in terms of individual tree crown detection. All this, for each of the three different cases considered for testing.
Regarding the capacities of the developed methodology for providing accurate segmentation of individual tree crowns in a binary representation of the field plot, Table 4 summarises the results calculated for the metrics proposed for that purpose. observer. In this sense, given the nature of the images under study and the complexity of this process, it is not possible to guarantee that the different case studies were represented with the same degree of accuracy by their corresponding ground-truth images.
Agronomy 2022, 11, x FOR PEER REVIEW 17 of 28 Figure 13. Sub-images of the final segmentations obtained for each considered case study: (a) lemontree orchard; (b) orange-tree orchard; (c) olive grove.
On the other hand, Table 5 presents the performance of the developed methodology when detecting individual tree crowns projections, and subsequently providing an estimation of the plant population. As can be corroborated, the performance is consistent for the three crops studied, showing only slight deviations. At this point, it should be remembered that the initial estimation of the trees embedded in the connected components is realised on the basis of a dominant tree crown size within the whole population of tree projections in the binary representation to process. This reference measure, obtained during preprocessing, is used as a basis for partitioning the major diameter of each component, yielding such an estimation. Specifically, this value is obtained from the initial binary image representative of the orchard, computing for each connected component the ellipse with its same normalised second central moment [58]; it is taken from the whole set of minor axis lengths of all the ellipses obtained, as the maximum one. This reference value is increased by 20% before being used for considering certain tolerance. Finally, the number of trees in a given connected component is On the other hand, Table 5 presents the performance of the developed methodology when detecting individual tree crowns projections, and subsequently providing an estimation of the plant population. As can be corroborated, the performance is consistent for the three crops studied, showing only slight deviations. taken from the whole set of minor axis lengths of all the ellipses obtained, as the maximum one. This reference value is increased by 20% before being used for considering certain tolerance. Finally, the number of trees in a given connected component is determined by dividing its major axis length by this value. There are some related phenomena that may occur during this subprocess, that can condition the result of the individual tree crown delineation. Thus, small tree crowns that are actually overlapped may be detected as unique plants. It should be emphasised that these small tree-aggregations happen very occasionally, and mostly at the end of the planting rows, where for some reason there is a certain tendency to plant the trees closer to each other. Likewise, regular adjacent tree crowns with a high degree of canopy overlapping may exhibit this problem. This may provoke an underestimation, which is subsequently translated to the eventual individual segmentation, as the number of predicted trees is used as a threshold when retrieving relevant maxima during the procedure. Consequently, individual tree crown-pixel regions may not be properly split when underestimating the number of crop trees in a given connected component. This will cause an increase in false negatives at the tree level (FN t ). Furthermore, the opposite phenomenon can occur, i.e., the overestimation of the number of potential tree projections embedded in each component. This may happen mostly with large connected components, with occurrences of rather big individual tree canopies. This overestimation involves the generation of a larger number of segmentation seeds than should be expected, and a subsequent oversegmentation of the corresponding connected components, which in turn results in the appearance of new false positives (FP t ). Both issues are illustrated in Figure 14.
Agronomy 2022, 11, x FOR PEER REVIEW 18 of 28 determined by dividing its major axis length by this value. There are some related phenomena that may occur during this subprocess, that can condition the result of the individual tree crown delineation. Thus, small tree crowns that are actually overlapped may be detected as unique plants. It should be emphasised that these small tree-aggregations happen very occasionally, and mostly at the end of the planting rows, where for some reason there is a certain tendency to plant the trees closer to each other. Likewise, regular adjacent tree crowns with a high degree of canopy overlapping may exhibit this problem. This may provoke an underestimation, which is subsequently translated to the eventual individual segmentation, as the number of predicted trees is used as a threshold when retrieving relevant maxima during the procedure. Consequently, individual tree crownpixel regions may not be properly split when underestimating the number of crop trees in a given connected component. This will cause an increase in false negatives at the tree level (FNt). Furthermore, the opposite phenomenon can occur, i.e., the overestimation of the number of potential tree projections embedded in each component. This may happen mostly with large connected components, with occurrences of rather big individual tree canopies. This overestimation involves the generation of a larger number of segmentation seeds than should be expected, and a subsequent oversegmentation of the corresponding connected components, which in turn results in the appearance of new false positives (FPt). Both issues are illustrated in Figure 14. Nevertheless, prior estimations of the trees contained in each connected component were highly and comparably accurate in all the three case studies, thus limiting the specific occurrences of the above-mentioned segmentation problems, and the overall performance of the methodology may be concluded as highly positive.
Finally, Table 6 collects results of applying the whole procedure for different sizes of the filtering-kernel radius, all in a range of plus/minus 30 pixels with respect to the initial parameter-value fixed for each case study (Table 3), for the purpose of assessing its sensibility and impact on the overall performance. Table 6. Results of the developed methodology in terms of accuracy of semantic segmentation and individual crop-tree identification, regarding the maximum radius size of the preprocessing filtering kernel, rβ.

Semantic Segmentation Accuracy
Individual Tree Detection Case Study rβ (pixels)  Nevertheless, prior estimations of the trees contained in each connected component were highly and comparably accurate in all the three case studies, thus limiting the specific occurrences of the above-mentioned segmentation problems, and the overall performance of the methodology may be concluded as highly positive.
Finally, Table 6 collects results of applying the whole procedure for different sizes of the filtering-kernel radius, all in a range of plus/minus 30 pixels with respect to the initial parameter-value fixed for each case study (Table 3), for the purpose of assessing its sensibility and impact on the overall performance.  6 PR t = tp t /(tp t + fp t ). 7 RC t = tp t /(tp t + fn t ). 8 As commented in Section 2.4.1, the maximum size to be reached by the structuring element used for image filtering should ideally be the smallest possible to cover the largest tree crown in the image. However, this parameter is manually selected, and estimating an appropriate value for it can be complex, especially when the number of trees under study is large. As expected, the poorest results were obtained when underestimating the maximum size of the kernel. Indeed, if the largest kernel is not able to cover some of the individual canopies, these may not be properly filtered, so resulting residual maxima, being then part of the background estimation, are wrongly subtracted from the initial DHM intensity matrix, thus affecting the subsequent canopy-ground segmentation, and consequently the overall performance of the methodology. The accuracy of the segmentation at pixel level is compromised by wrongly discarding those regions of the canopies that could not be properly filtered out, increasing the number of pixels classified as FN. On the other hand, tree crowns can hardly be uniquely represented by a single connected component due to oversegmentation, impoverishing the capabilities of the proposed solution to automate plant detection as the number of false positives (FP t ) explodes. Indeed, some additional tests were carried out, using extremely small values as the limit for the kernel radius, with the aim of confirming this issue. Thus, for the olive case study, the methodology was applied with a maximum radius of 10 pixels, resulting in a final segmentation with a total number of 8795 connected components. Given an actual population of 3916 trees in the olive groove, this segmentation would reach 4879 FP t in the best-case scenario, implying a very poor performance in terms of individual tree identification. For the purpose of illustrating this issue, Figure 15 shows a canopy-ground segmentation of the orange-tree orchard, carried out with a maximum size for the filtering kernel too small to satisfy the requirements of the methodology (15 pixels), as an extreme example of the problems arising from underestimating this parameter, where it can be visualised some of the issues just commented.  Contrary, when overestimating the maximum size of the kernel it was not observed significant impoverishment of the results. In fact, even when performing stress tests with extremely large radius values, in the order of several hundred pixels, the results remained within similar ranges. However, caution must be exercised in drawing conclusions in this respect, as it may be possible that other case studies with important differences in terms of ground elevation or significant variability of trees' height would yield inadequate results if too large structuring elements are used. In general, based on the results reported, there is a sweet spot for this parameter, which it may be found around the average radius of the population of tree-crowns projections under study. The criterion suggested by the authors of the procedure, based on reaching a size large enough to cover the largest of the trees, seems appropriate, given that according to the results, the performance of the procedure is mainly compromised when the estimation of the background is carried out with filtering structuring elements that are too small, and not necessarily when this parameter is overestimated.

Discussion
Reported results regarding the assessment of the developed methodology, suggest its viability as a tool for the automated counting and delineation of individual tree canopy projections in aerial representations. Furthermore, its performance was comparable in the three cultivars tested, reinforcing its potential applicability in cultivation areas regardless of the type of crop. Notwithstanding, despite these compelling results, there are some points that are noteworthy.
In accordance with results reported, the number of undetected trees after individual canopy delineation is quite low, even in the orange orchard-case study, in which it can be observed large connected components where the pixel subregions to be split appear deeply fused, comprising a scenario prone to the underestimation of the tree crowns enclosed in those components during preprocessing; and by extension, to the occurrence of false negatives (FNt). Despite this, it must not be overlooked that any inaccuracy of the initial estimation of the number of trees contained in each connected component, which is performed at preprocessing stage, may affect the quality of the final segmentation. In this vein and as already mentioned, the number of trees in the aerial binary representation is estimated on the basis of a reference value during this preprocessing stage. This value, after automatically computed, is used to rather split the major diameter of each connected component predicting the number of plants contained. The obtention of this reference value might be approached in different ways for the purpose of slightly improving the overall performance. Indeed, different possibilities were considered during the experimentation process, as individual reference values were calculated from each connected component instead of a global one, but no tangible improvements were observed with Contrary, when overestimating the maximum size of the kernel it was not observed significant impoverishment of the results. In fact, even when performing stress tests with extremely large radius values, in the order of several hundred pixels, the results remained within similar ranges. However, caution must be exercised in drawing conclusions in this respect, as it may be possible that other case studies with important differences in terms of ground elevation or significant variability of trees' height would yield inadequate results if too large structuring elements are used. In general, based on the results reported, there is a sweet spot for this parameter, which it may be found around the average radius of the population of tree-crowns projections under study. The criterion suggested by the authors of the procedure, based on reaching a size large enough to cover the largest of the trees, seems appropriate, given that according to the results, the performance of the procedure is mainly compromised when the estimation of the background is carried out with filtering structuring elements that are too small, and not necessarily when this parameter is overestimated.

Discussion
Reported results regarding the assessment of the developed methodology, suggest its viability as a tool for the automated counting and delineation of individual tree canopy projections in aerial representations. Furthermore, its performance was comparable in the three cultivars tested, reinforcing its potential applicability in cultivation areas regardless of the type of crop. Notwithstanding, despite these compelling results, there are some points that are noteworthy.
In accordance with results reported, the number of undetected trees after individual canopy delineation is quite low, even in the orange orchard-case study, in which it can be observed large connected components where the pixel subregions to be split appear deeply fused, comprising a scenario prone to the underestimation of the tree crowns enclosed in those components during preprocessing; and by extension, to the occurrence of false negatives (FN t ). Despite this, it must not be overlooked that any inaccuracy of the initial estimation of the number of trees contained in each connected component, which is performed at preprocessing stage, may affect the quality of the final segmentation. In this vein and as already mentioned, the number of trees in the aerial binary representation is estimated on the basis of a reference value during this preprocessing stage. This value, after automatically computed, is used to rather split the major diameter of each connected component predicting the number of plants contained. The obtention of this reference value might be approached in different ways for the purpose of slightly improving the overall performance. Indeed, different possibilities were considered during the experimentation process, as individual reference values were calculated from each connected component instead of a global one, but no tangible improvements were observed with any of the alternatives proposed. That said, despite there being little room for improvement in this sense, further research may focus on this topic, exploring new ways to predict the potential number of tree crowns embedded in the components to split.
With regard to the generation of ad-hoc segmentation seeds representatives of each of the pixel subregions to be split, there are also some aspects to be discussed. Indeed, in a near-ideal scenario, with a high level of homogeneity in terms of form and size of the tree crowns and a regular spacing pattern between plants, the location points provided by the methodology used during preprocessing, approached as single white pixels, could properly work as representative of the centre of mass of the corresponding tree projections. They could subsequently be used as proper segmentation seeds from which to determine the boundaries of each subcomponent, thus making it unnecessary to calculate ad-hoc segmentation seeds, as representative artifacts of the spatial influence of each plant within each component. However, this can hardly be expected in a real setting, and occurrences of adjacent overlapping tree crowns with unbalanced sizes are usual. In those cases, the corresponding estimated location points may appear displaced from the hypothetical centres of mass of the tree projections. This is because of the way they are calculated from the aforementioned global reference value, somehow assuming every crop tree has the same size within a given plot. Hence, they cannot properly define the pixel-region influenced by each tree, and subsequently, they cannot ensure an accurate individual tree crown segmentation.
Another issue to be addressed is related to the fact that the solution here proposed does not use vegetation indices, nor any kind of machine learning technology. As already mentioned, one of the main goals pursued during the research has been the applicability of the methodology regardless of the type of tree-based crop to be studied. In this sense, machine learning approaches are based on the development of classification models from training sets, which in turn are fed with representative data of the visual features of the elements to be identified; in this case, tree canopies. So, there is a dependency between the classification models achieved and the kinds of plants with which the corresponding training sets have been generated, thus restricting the scope of application, and hence compromising the generality of the solution. Notwithstanding, this does not invalidate these kinds of approaches. Indeed, proposals based on this form of artificial intelligence for detecting and delineating tree crowns from aerial representations are quite common and effective, always under the assumption that their replicability with different types of crops may be limited. Thus, Csillik et al. [32] detected citrus-trees from multispectral imagery with precision and recall of 94.59% and 97.94%, respectively. Another example of the use of CNNs for individual tree identification is proposed by Ampatzidis and Partel [33]. They undertook the detection of citrus-trees by means of CNN, achieving a precision of 99.90% and a recall of 99.70%, and overall accuracy of 85.5% when segmenting the canopy cover in the grove tested. In this latter work, after tree detection, an alternative approach for estimating the individual canopy projections is proposed based on the normalized difference vegetation index (NDVI). In fact, vegetation indices are a common tool for extracting plant features, included dendrometric characteristics, as it is the case. Thus, Marques et al. [38] used a CIR (Colour-InfraRed) imagery-based vegetation index along with elevation data for individually detecting chestnut-trees, reporting a precision of 99.44%, a recall of 97.80%, and an F -score of 98.61%, and segmenting the canopy coverage with accuracies from 93% to 99% in all case studies addressed. Despite being proven effective in certain scenarios, this kind of approach, based on vegetation indices, can be penalised in terms of performance when a high presence of weed occurs, as their spectral reflectance signature is hardly distinguishable from that of the plants to identify [59][60][61][62]. Indeed, for the purpose of illustrating this issue, vegetation index-based segmentations for two of the case studies addressed in this work are proposed in Figure 16. For each citrus-tree orchard, its corresponding NDVI information, computed from the initial multispectral images, is presented as a grey-scale image, along with the binary representation resulting from the attempt to segment the tree-belonging canopies by exploiting this information. It should be noted that this segmentation has been approached by using statistical thresholding [49] with the corresponding NDVI data.
Agronomy 2022, 11, x FOR PEER REVIEW 22 of 28 be noted that this segmentation has been approached by using statistical thresholding [49] with the corresponding NDVI data. Figure 16. NDVI-based segmentation of the canopy coverage in the citrus-tree orchards: (a) lemontree orchard NDVI data, represented as a greyscale image; (b) binary representation of the lemontree orchard resulting from NDVI-based segmentation; (c) orange-tree orchard NDVI data, represented as a greyscale image; (d) binary representation of the orange-tree orchard resulting from NDVI-based segmentation.
As can be seen, the result obtained in the case of the lemon-tree orchard is more than acceptable. In fact, this segmentation could potentially be used as a basis for applying the methodology proposed here to delineate each tree crown individually. However, this is not the case for the orange grove, where the presence of a considerable amount of weeds, with a spectral response apparently similar to that offered by the orange trees, makes it difficult to discriminate between tree-canopy projections and soil. This does not disqualify the use vegetative indices within this area of research, since in this case it may be achieved another combination of spectral reflectance measurements that could improve segmentation results for the orange-tree orchard; but it does exemplify the problems that could be involved in using this type of indicators. On this basis, and because of the initial premise of yielding a methodology applicable without any adjustment, in as many diverse scenarios as possible, the use of any kind of vegetation indices was discarded in this work.
In fact, despite the use of vegetation indices and machine learning techniques may be considered as a trend in this field, several studies have addressed the problem of dis- Figure 16. NDVI-based segmentation of the canopy coverage in the citrus-tree orchards: (a) lemon-tree orchard NDVI data, represented as a greyscale image; (b) binary representation of the lemon-tree orchard resulting from NDVI-based segmentation; (c) orange-tree orchard NDVI data, represented as a greyscale image; (d) binary representation of the orange-tree orchard resulting from NDVI-based segmentation.
As can be seen, the result obtained in the case of the lemon-tree orchard is more than acceptable. In fact, this segmentation could potentially be used as a basis for applying the methodology proposed here to delineate each tree crown individually. However, this is not the case for the orange grove, where the presence of a considerable amount of weeds, with a spectral response apparently similar to that offered by the orange trees, makes it difficult to discriminate between tree-canopy projections and soil. This does not disqualify the use vegetative indices within this area of research, since in this case it may be achieved another combination of spectral reflectance measurements that could improve segmentation results for the orange-tree orchard; but it does exemplify the problems that could be involved in using this type of indicators. On this basis, and because of the initial premise of yielding a methodology applicable without any adjustment, in as many diverse scenarios as possible, the use of any kind of vegetation indices was discarded in this work.
In fact, despite the use of vegetation indices and machine learning techniques may be considered as a trend in this field, several studies have addressed the problem of discriminating individual tree-canopies by dispensing this kind of approach, and still reporting significant results. Thus, as previously introduced, Salamí et al. [39] proposed colour and stereo-vision -based segmentations to solve the automated detection of olive trees from aerial captures. They reported an F -score of 99.9% when detecting trees from a dataset with a total of 332 plants. Notwithstanding, it should be noted that the method was tested in olive orchards where plant arrangement guaranteed some spacing between trees, so the problem of overlapping adjacent canopies was not addressed. On the other hand, the study proposed by Ok and Ozdarici-Ok [35] may be of particular interest, given that their proposal was aimed at individually segmenting citrus-tree crowns, by means of morphological image analysis, and it also faced the overlapping canopies issue. After evaluating their methodology, they reported a precision of 91.1%, and a recall of 91.3% when assessing its performance at the pixel level. Positive results were also reported when evaluating the quality of the solution in terms of object detection, achieving an F -score value of 91.2%. In order to establish comparisons with this study, as well as with other works previously discussed, it can be underlined that the methodology here proposed achieved overall precision and recall values of 91.3% and 94.1%, respectively, when assessing the semantic segmentations in terms of pixel classification. Likewise, tests conducted for evaluating individual tree detection outperformed them yielding an F -score of 99.27%. In addition, it should be remarked that, contrary to most of the related published literature, the proposed methodology was conceived to be applicable as a generic solution regardless of the type of crop. Hence, it was tested with three different types of crop trees, obtaining promising results in all the threes case studies addressed.

Conclusions
In this paper, an image analysis methodology for the processing of aerial multispectral sensed data collected in crop tree-based intensive cultivation areas has been proposed, generating binary representations of them, and automatically delineating each tree crown that appears, thus providing individual plant canopy cover information.
Different case studies were set out to evaluate the effectiveness of the methodology, and to assess its viability as a solution independent from the type of tree-based crop cultivated, facing diverse scenarios in terms of the variability in the size of the tree population, or the characteristics of the plant's spatial arrangement. In all cases, including the borderline test realised in an orange-tree grove which presented challenging large intra-row canopy aggregations, results provided accurate segmentations at individual tree crown levels. Based on these results, along with the performance shown in terms of individual tree detection, the framework proposed here can be of broad use to farmers and agronomists for inventorying and monitoring crop trees in cultivated land plots.
Further research could be undertaken to investigate different approaches aimed at detecting and treating those exceptional cases reported, responsible for specific situations in which the under/overestimation of trees embedded in those connected components to process occurs. In addition, despite the fact that the methodology was tested with three different types of tree-based crops, it is assumable that undertaking new case studies focused on other additional crops would reinforce the applicability of the proposed solution, as independent of the kinds of trees farmed.