Individual Detection of Citrus and Avocado Trees Using Extended Maxima Transform Summation on Digital Surface Models

Individual tree detection (ITD) locates plants from images to estimate monitoring parameters, helping the management of forestry and agriculture systems. As a low-cost solution to help farm monitoring, digital surface models are increasingly involved together with mathematical morphology techniques within the framework of ITD tasks. However, morphology-based approaches are prone to omission and commission errors due to the shape and size of structuring elements. To reduce the error rate in ITD tasks, we introduce a morphological transform that is based on the local maxima segmentation (Cumulative Summation of Extended Maxima transform (SEMAX)) with the aim to enhance the seed selection by extracting information collected from different heights. Validation is performed on data collected from the plantations of citrus and avocado using different measures of precision. The results obtained by the SEMAX approach show that the devised ITD algorithm provides enough accuracy, and achieves the lowest false-negative rate than other compared state-of-art approaches do.


Introduction
Individual tree detection (ITD) locates plants from images to estimate agronomic parameters (like spatial distribution, crown sizes, and vegetation fraction), promoting the management of forestry and agriculture systems [1]. In fruit plantations, the monitoring parameters allow assessing tree density growth, orchard biomass changes, and fruit yield production [2]. At vast plantations, remote sensing technologies facilitate the extraction of ITD-based parameters, employing satellite images that may be acquired by a wide variety of sensors at several spectral and spatial resolutions [3,4]. Nonetheless, related to ITD systems, the use of satellite data suffers from a long repetition cycle and coarse resolutions, limiting its usage in those applications that involve small areas or time-sensitive decisions [5]. So, both (spatial and temporal) resolutions could be improved using private satellites, but they are more expensive [6].
To alleviate the monitoring expenses of small or medium-size areas, recent technological advances in aerospace engineering have introduced airborne imagery through cameras mounted on unmanned aerial vehicles (UAV), offering a trade-off between cost and spatial resolution [7]. Additional benefits of UAV images may include: collecting data from areas in short times, gathering data without affecting the crop, and flight mission scheduling according to the growth stages of plantations [8]. Besides, the UAV sensors can combine information acquired from several imaging modalities (like LIDAR, hyperspectral, multispectral, and RGB). In practice, aircraft can be outfitted with either LIDAR or photogrammetric technology surveying, being the first one an accurate alternative for getting a bare earth survey on land and slightly intersecting the tree canopies (see Figure 1c). Note that the chosen plantations have been pruned, and therefore almost every tree holds a single treetop. Besides, we do not use the ground control points in the present work.  As a rule, tree labeling is not usually performed on the DSM representation due to the presence of noisy regions that may resemble some plantation shapes. Instead, labeling is performed on the RGB representation that allows identifying directly from the original picture, whether one region can be labeled as a single tree. As stated in [24,25], this situation makes many studies use manually delineated tree crowns as reference data. However, such data could include subjective errors [26].
Preprocessing stage: The DSM models are created using Pix4Dmapper Pro photogrammetry software (version 4.1.3) that imports a densified point cloud set that is generated by an external source of UAV imagery. Then, we calculate the corresponding DSM, extracting high-quality maps [27]. Then, we calculate a single high-quality DSM map per each farm [28], obtaining the following number of matching points: 2,479,502, 16,664.3, and 15,556.9, for each one of the orchard areas, denoted as Farm # 1, #2, and #3, respectively). Afterward, the filter noise procedure smooths surfaces and merges tiles, resulting in DSM data with a spatial resolution adjusted to 2.24 cm, 3.98 cm, and 3.04 cm in each one of the tested orchard farms, respectively.
Image Segmentation using Extended Maxima Transform: Let Ω: f →R; p →Ω(p) be a real-valued function that maps a gray-tone pixel p ∈ N into a subset f . Thus, the morphological reconstruction yields a scalar-valued image, φ δ (g| f ), computed as follows [12]: where g⊃ f is a template image, δ * (g| f ) is a functional that provides the convergence rule: Notation ∧ stands for the pixel-wise minimum operator. Further, the tree detection relies on the extracted information from an individual interest region. Thus, using the morphological reconstruction in Equation (1), the objects of interest are initially detected by Extended Maxima transform (EMAX), noted as E (·; ·), and performed as follows [12]: where R(·) is the regional maxima transform that operates on the shifted argument R δ ( f | f − h) by saturating the pixels, which have an intensity higher than a given threshold of h ∈ R + . Therefore, an object represented by Equation (2) yields a binary image of connecting pixels, given as E (·; ·) = K k=1 {π k }, where {π k }⊂ f is the set of pixel coordinates, belonging to k-th regional maxima, and K is the total number of linked components. A frequently employed method of linking components is to connect neighboring pixels pairwise, describing a closed-form object with the largest perimeter. Thus, object segmentation is accomplished through an algorithm of region growing (namely, the watershed transform), segmenting images from their local minima set to the corresponding peaks. Nonetheless, the object images of considered farms hold high spatial resolution, so that they may have more than one maximum within each tree crown, causing over-segmentation. To cope with this situation, watershed approaches can be combined with local maxima detection to restrict the number of local maxima to one within one segment [28]. Concretely, we apply the marker-controlled watershed strategy that limits the number of the local maximum of f through a marker image, E (; ), containing the connected components of nonzero pixels within each of the desired objects. For the latter representation, we assume that the minima imposition is an image transformation that modifies an input grayscale image in such a way that the only remaining minima are those given by an additional input image [29]. Thus, we define the marker image as follows: where performed at step i. Notation ∨ stands for the pixel-wise maximum operator. Cumulative Summation of Extended Maxima Transform: For diminishing the threshold influence on Equations (2) and (3), we propose to evaluate the Cumulative Summation of Extended Maxima transform (SEMAX) and computed over a query image as below: where h = {h i ∈ R : h i <h i+1 , i ∈ N} is a set of N ranked threshold values, h N+1 = max p ( f (p)). Hence, the SEMAX rule aggregates stepwise each one of the ranked regional maxima values, which are ordered from the least to the most restrictive threshold.
To make clear our rationale behind the proposed SEMAX transform, Figure 2 displays a 1-D scenario of tree detection through morphological mapping using a region of interest extracted from DSM data. The top plot in Figure 2b shows the trees and noise objects, which are placed in a single row (marked with a magenta line), and denoted by blue and red circles, respectively. The bottom plot of Figure 2b shows one slide of heights (marked with the red line) for which SEMAX and EMAX are performed. For illustration, the latter transform is computed at different values of h (blue lines). Note that the ground-truth trees are shadowed by gray color. As seen, the EMAX transform is highly dependent on the considered threshold, resulting in several spurious maxima, while the high thresholds neglect low-height objects. The dashed line draws the optimal EMAX result, showing that not all the trees are extracted due to the differences between the treetop heights. Besides, both extracted regions (related to noise and trees) are very alike. Instead, the slide performed by SEMAX (marked with a black line) shows that the transform performs as a detrending filter so that a single threshold is required to detect all objects. Furthermore, the 1-D representation of SEMAX shows that the noise regions fade faster and become smoother than the ones performed by the trees, enabling a better object distinction from noise. Relying on this fact, we assess the evenness of the SEMAX morphological gradient for a region π k as below: where · p denotes p -norm.  Figure 2a displays the framework proposed for enhanced tree detection that is computed following the algorithm displayed in Algorithm 1. Initially, the SEMAX transform is applied over each image region using the mesh of linearly distributed thresholds {h i+1 = h i + ∆h}, being ∆h the threshold step that is adjusted to the image resolution. Secondly, the EMAX transform over the SEMAX map provides a binary image of candidate treetops. Next, we compute the controlled-marker watershed over the SEMAX morphological gradient, imposing the candidate treetops and the boundaries among them as seeds. Lastly, we label each region as a tree that assesses a higher evenness measure, exceeding a given decision threshold that is denoted as γ ∈ N. As seen in in Figure 2b, the high values of h reduce the number of noisy seeds at the cost of vanishing short trees. To overcome this limitation, we carry out an exhaustive search to adjust both thresholding parameters, h and γ, maximizing the F1-score. for n ∈= 1 . . . N do 8: 9: Trees ← F (π) > γ Tree classification 26: 27: return Trees 28: 29: end procedure

Results and Discussion
In the validation stage, the classifier performance obtained by SEMAX for the considered orchard field data is contrasted with three state-of-the-art methods for ITD: morphological gradient Hough transform (GHT) [2], controlled marker watershed with imposed LM from DSM (LGW) [17], and probabilistic LM with orientation information (PLO) [22]. For measuring the performance, the tree location reference (selected manually from UAV true-color orthophotos) is contrasted with the one provided automatically by each tested approach using the following evaluating metrics: where notations T p , F p , and F n are the outcomes of true positive, false positive, and false negative, respectively. The true negative rate is not included in the present evaluation since one image can contain millions of non-tree pixels, producing quite large values of this rate. In addition, to determine whether the detection is correct, we set a maximum distance of 20 cm from the ground-truth to a given detected object. As shown in Table 3, regions E and F present the highest false-positive values, meaning a lot of noisy objects. This result is explained by the fact that the DSMs extracted from both regions enable high rates of tree misdetection, at least, using GHT, LGW, and PLO algorithms. Instead, our proposal notably reduces the false positive rate of either area. In the remaining regions (A-D), the corresponding DSM representations are less influenced by noise and have about 80% less quantity of false positives. Therefore, we can assume that the ITD approaches benefit from the provided tree shape information so that the extracted DSM data are less affected by noisy objects. This conclusion becomes more evident in the case of region A that achieves the best false positive error.  As regards the averaged false positive rate, the PLO approach gets, on average, the worst value while achieving the best false-negative error. By contrast, the SEMAX* version that incorporates a classifier stage reaches the best F p , outperforming notably other contrasting methods. Moreover, the proposed approach decreases the false-positive error in about 70% while increments in 6% the F1-score that is also reported to reflect the balance between both decision-making frameworks (F p and F n ). It is worth noting that both proposed versions (SEMAX and SEMAX* ) do not demand additional information about variables like shadows, which can be conditioned by the sun position or the weather, and thus proving to be a promising ITD algorithm.
For illustration, Figure 3 draws the processing stages performed by SEMAX through the evolution of DSM representation for regions A and B, having different types of trees. Both input RGB images in the left column exemplify a tree distribution, which in the concrete case is manually labeled and denoted by the red dots. The middle column pictures the color maps to explain the differences in the terrain level estimated by DSM, while the right column depicts the resulting SEMAX data. In the case of region A (top row), all treetops are disseminated evenly, and the DSM representation reveals that the separation between ground and canopies is not constant. Thus, the affecting height variations are adjusted to the same level by the proposed operator in Equation (4). However, the landscape of region B holds differences in height with such an uneven tree distribution that causes subtle variations in elevation between the overlapped tree canopies. Consequently, the SEMAX* approach wrongly identifies several cumulus of trees as a single one. Therefore, additional restrictions to the detection task should be incorporated when dealing with overlapped canopies together with uneven sowing patterns.
The following aspect of consideration is the parameter influence (namely, the threshold) on the tree detection rate that is depicted in Figure 4. The X-axis denotes the recall value that is the fraction of the total amount of relevant instances (trees) that are actually retrieved, and the Y-axis represents the precision value that is the fraction of relevant instances among the retrieved ones [30]. Thus, whenever the threshold increases in LGW, PLO, SEMAX, and SEMAX* algorithms, the recall value also rises, meaning that more trees are correctly detected. However, this behavior continues to a certain degree. So, by raising the threshold too high, several noisy regions can be wrongly identified as trees, as more clearly perceived for regions E and F in Figure 3. Note that this situation holds for all algorithms except the case of GHT that should be more related to canopy shape detection rather than treetop identification. Lastly, LGW performs about 14% less true positive rate, on average across regions, due to the treetop heights are highly variable, hindering the tree identification by a single threshold value. Another aspect to discuss is the algorithm performance in farms having different orchard field types. The region A holds plantations of citrus and avocado trees with regular spacing and soft terrain slope, enabling to extract information from various heights. Hence, the performance of PLO, SEMAX, and SEMAX* provide a similar T p value. However, the GHT algorithm gets the worst true positive value because of the high variance between the sizes and shapes of trees. Further, the tree detection becomes more complicated in region B that contains citrus trees with overlapping canopies as well as different stand ages, and thus resulting in poor recall and precision values, as seen in Figure 4. Once again, the LGW algorithm fails, obtaining the highest errors of F p and F n . In turn, the regions C and D present the most overlapped canopies and the lowest image resolution so that the tested ITD approaches get negligible differences in precision and recall values. Still, the fact that the regions have a high terrain slope results in dropping the LGW precision. Finally, the regions E and F include significant variations in height with slightly intersecting tree canopies, yielding the lowest precision values in almost all algorithms.
Nevertheless, the SEMAX* algorithm performs the best F p values, and therefore, it can better deal with noisy DSM data. Even though our method achieves the highest precision, there are still large values of F n (that is, trees are not detected) when dealing with inaccurate DSM data. In particular, the omission error can arise due to a couple of factors: (i) soft DSM changes between terrain and tree-levels, as shown within a white frame in Figure 5; (ii) high-tree density having their canopies overlapped. One more important issue is the commission error, which is mostly promoted by low spatial resolution. Thus, some shadows are wrongly calculated in the DSM representation as the tree height, misleading the algorithms (see left plot Figure 6). In turn, trees with large canopies tend to have considerable height variations in the DSM data, resulting in two or more local maxima. That is, instead of detecting a single tree, our algorithm may incorrectly yield two or more trees (see the frame of right plot Figure 6).  Finally, we remark that many false positives and negatives may appear in this study due to the ground-truth is hand-annotated over the RGB image, raising a limitation. Namely, the DSM map is not always aligned with the RGB image, meaning that the treetops selected in the RGB do not correspond to the maxima regions in the DSM map. In consequence, our algorithm does not classify those regions as trees, decreasing the performance.

Concluding Remarks
In this study, a new approach is presented to detect citrus and avocado trees from using DSM data. The approach mixes both different height thresholds and a classification step that enhances seed selection. The validated accuracy proves that our proposal reaches an average value of F p of 6.7 ± 5.6, providing superior results in comparison with other considered state-of-the-art techniques.
The results obtained in real-world data suggest a potential application of the proposed technique for extracting useful parameters of orchard fields in extensive agricultural areas, and especially, is cases demanding low sensitivity (F p values) to overlapped canopies. The information provided, along with smart agriculture techniques, will enhance the efficiency of decision processes like fertilization, detection of crop diseases, update inventories, or crop yield estimation.
Nevertheless, an issue with applying the SEMAX* approach is the tree shape that should have adjacent branches immediately above the main trunk (e.g., olive trees). Otherwise, the number of false positives may increase. Another restriction that may arise is that SEMAX* is developed focusing in orchard zones so that its straightforward application in urban zones is likely to be strongly influenced by buildings or constructions, which may be incorrectly labeled as trees. This issue can be faced by using RGB image sets to extract distinguishing information from color and texture.
One of the challenging issues in the generation of digital surface models from optical stereo imageries is noise reduction that significantly increases the probability of noisy objects wrongly detected as trees [31]. Therefore, we put more attention into the decision-making framework based on false-positive error. However, in agricultural practices, the optimization must be performed in terms of the decision-making framework that reduces the risk of wasting financial resources derived from each particular case of implementation of individual tree detection systems.
Lastly, we deal with noisy digital surface models that tend to increase the F p error rate. Therefore, more attention is given to false-positive decision-making framework. As future work, the authors plan to extend the decision-making rule to more performance measures of implementing individual tree detection systems. The validation is to be carried out in plantations with more complex patterns.