Individual Tree Detection from UAV Imagery Using Hölder Exponent

This article explores the application of Hölder exponent analysis for the identification and delineation of single tree crowns from very high-resolution (VHR) imagery captured by unmanned aerial vehicles (UAV). Most of the present individual tree crown detection (ITD) methods are based on canopy height models (CHM) and are very effective as far as an accurate digital terrain model (DTM) is available. This prerequisite is hard to accomplish in some environments, such as alpine forests, because of the high tree density and the irregular topography. Indeed, in such conditions, the photogrammetrically derived DTM can be inaccurate. A novel image processing method supports the segmentation of crowns based only on the parameter related to the multifractality description of the image. In particular, the multifractality is related to the deviation from a strict self-similarity and can be treated as the information about the level of inhomogeneity of considered data. The multifractals, even if well established in image processing and recognized by the scientific community, represent a relatively new application in VHR aerial imagery. In this work, the Hölder exponent (one of the parameters related to multifractal description) is applied to the study of a coniferous forest in the Western Alps. The infrared dataset with 10 cm pixels is captured by a UAV-mounted optical sensor. Then, the tree crowns are detected by a basic workflow. This consists of the thresholding of the image on the basis of the Hölder exponent. Then, the single crowns are segmented through a multiresolution segmentation approach. The ITD segmentation was validated through a two-level validation analysis that included a visual evaluation and the computing of quantitative measures based on 200 reference crowns. The results were checked against the ITD performed in the same area but using only spectral, textural, and elevation information. Specifically, the visual assessment included the estimation of the producer’s and user’s accuracies and the F1 score. The quantitative measures considered are the root mean square error (RMSE) (for the area, the perimeter, and the distance between centroids) and the over-segmentation and under-segmentation indices, the Jaccard index, and the completeness index. The F1 score indicates positive results (over 73%) as well as the completeness index that does not exceed 0.23 on a scale of 0 to 1, taking 0 as the best result possible. The RMSE of the extension of crowns is 3 m2, which represents only 14% of the average extension of reference crowns. The performance of the segmentation based on the Hölder exponent outclasses those based on spectral, textural, and elevation information. Despite the good results of the segmentation, the method tends to under-segment rather than over-segment, especially in areas with sloping. This study lays the groundwork for future research into ITD from VHR optical imagery using multifractals.


Introduction
Unmanned aerial vehicle (UAV) systems have gained the approval of the scientific community for different applications related to the acquisition of information, becoming common in geospatial research and a wide range of applications [1]. The cost-and time-effectiveness of UAV systems, compared to traditional field surveys, is partially responsible for their increasing favor. An additional factor contributing to their popularity is that they can be equipped with several sensors, such as optical and hyperspectral cameras, light detection and ranging systems (LiDAR), synthetic aperture radars (SAR), inertial measurement units (IMU), and global positioning systems (GPS) [1][2][3][4].
Many disciplines benefit from these technologies, including forestry [5]. The application of UAV in forestry inventory and, more generally, in the extraction of the main forest parameters (e.g., forest stand density, crown widths, basal area, average diameter at breast height, height) is well established. The structural information of forest stands is vital for silviculture and forestry inventories. The accurate detection of tree crowns is necessary to estimate the dendrometric attributes of forest stands, such as the tree position, the stem diameter, the height, the crown extension, and the volume [6][7][8]. Besides, these forest parameters can be valuable ecological indicators, which determine, among others, the carbon sequestration, the shading, the risk of wind-breakage, and the tree growth [9]. The determination of these parameters is performed at the individual tree level and requires information about single trees.
Thus far, many approaches have been proposed for individual tree detection (ITD) via remote sensing. Generally, they are based on digital elevation models (DEM) that can be generated from LiDAR acquisitions [7,[10][11][12][13][14][15] or structure from motion (SfM) [5,9,11,16,17]. SfM uses optical images acquired from multiple points of view to recreate the three-dimensional geometry of an object [18,19]. The 3D model generation is carried out by incremental steps. First, the key-points are extracted from the images based on contrast and texture-related rules. The key-points are identified in all input images and then matched between different images [19,20]. Then, the bundle adjustment is performed and the sparse point cloud is usually scaled and georeferenced [21,22]. The final step consists of the densification of the point cloud thorough specific algorithms [23].
Regardless of the data source, some 2D ITD methodologies include the computation of the canopy height model (CHM) for the detection and delineation of tree crowns [5,24]. First, the local maxima of the CHM are computed to detect treetops [5,24], and then, the crowns are delineated using image-processing and segmentation algorithms [10,13,15,25]. The most common technique for the delineation of crowns consists of watershed segmentation, using as input seeds the local maxima. Segmentation works on contiguous pixels that are grouped based on similar digital number (DN) values [4,13,15,26]; when the local maxima are identified, they are used as input seeds, or starting points, for the generation of the segments. Many other 2D ITD spectral information methodologies have been explored, but, unlike the others, these procedures mainly work on the segmentation based on brightness levels [7,9,10,24,27,28]. They consider the brightest pixel in a neighborhood as the tree crown apex and identify the tree crown perimeters using dark-pixel and valley-following approaches. Most of the ITD techniques depend on CHM generation methods that may affect the accuracy of tree crown delineation [13,29]. CHM is calculated as the difference between the digital surface model (DSM) and the digital terrain model (DTM). Thus, a good DTM is a fundamental prerequisite for the accurate characterization of CHM [11].
When the DTM of a forest stand is interpolated from LiDAR or photogrammetric point clouds, their accuracy is strongly influenced by the density of the forest stand, meaning the number of ground points identified by the sensor [11]. Indeed, CHM-based methods for ITD assume that local maxima analysis detects treetops. However, in structurally complex forest stands and steep slope areas, the results should be carefully interpreted [9]. In this framework, LiDAR data is much more accurate [5] than the SfM-based approaches, since LiDAR can penetrate tree crowns and obtain terrain information by reaching the ground [30]. As a result of this, and of the commercialization of light-weighted sensors that can be mounted on UAVs, the most recent applications of ITD methodologies work on 3D datasets acquired with aerial laser scanners (ALS) [5,12,15]. Besides being able to generate more accurate point clouds, LiDAR technologies are more expensive than optical ones [24,30]. Even if some countries, such as Norway, Sweden, and Canada, use LiDAR technology for national forest inventories, several annual acquisitions at local and regional scales are generally cost-prohibitive [30]. Therefore, many countries are not in the economical position to rely on LiDAR technologies. According to White et al. [29], generally, SfM-derived data for forestry inventories are more cost-effective than LiDAR data and can cost about one-half to one-third of LiDAR data [29]. Moreover, LiDAR sensors are heavier than multispectral cameras and need to be mounded on UAVs with higher payload capacities. Besides being more expensive, larger UAVs with heavy payloads may require additional training and licensing (most UAV license national systems are based on maximum take-off weight, MTOW, categories). Among others, LiDAR requires also high data storage structures [24] and powerful computational technology to obtain accurate results [5]. LiDAR data do not provide users with the spectral information, although some models have a camera integrated into the acquisition system. Table 1 provides an analysis of the advantages and disadvantages of the optical and LiDAR systems focused on UAV data acquisition for ITD. The ITD approaches based on UAV aerial images promise to be a cost effective and valid alternative to LiDAR. They provide users with good accuracy data, with little usage of resources. Several studies have been carried out on the accuracy of ITD from UAV-derived information. Some methods identify the tree crowns from the brightness values of visible and infrared images [27,28], while some more recent ones work on multiscale filtering, segmentation of imagery, and math morphology algorithms [8] to define tree crowns [16,25,32]. These methods usually have complex segmentation workflows and require the application of image filters, such as Laplacian filters, Gaussian filters, and math morphology algorithms. Complex segmentation processes are necessary because UAV optical imagery of forested areas is frequently affected by shadows, slope-derived distortions, and low contrast [33,34]. These aspects, which are enhanced by the high spectral variability of very high resolution (VHR) imagery, make segmentation difficult. VHR images represent a challenge for segmentation and classification because, unlike in lower resolution images, single pixels no longer capture the characteristics of the classification targets [26]. Image-based methodologies for ITD, even if efficient, usually require several steps; therefore, high computational time is needed. This is one of the reasons why the image-based processes for ITD have been partially overcome by CHM-based methods. Nevertheless, when CHM is not accurate enough or too expensive, such as structural complex stands, image processing methods that do not require CHM exist and they can be a valuable alternative to CHM-based methods. Indeed, image-based segmentation techniques can provide good accuracy results, especially when a textural analysis is applied [35,36]. A shared methodology of texture analysis for segmentation (and classification) is based on the gray level co-occurrence matrix (GLCM) according to the Haralick measures [37]. For the images of complex structures, some researchers proposed the use of segmentation algorithms based on fractal and multifractal analyses [38][39][40]. It is worthwhile to remember here that a fractal is a rough or fragmented geometrical object that can be subdivided into parts, each of which is (at least approximately) a reduced-size copy of the whole object [41]. Fractals are described by one quantitative number-a fractal dimension, for the computation of which various methods have been proposed (see, e.g., [42]), but, generally, it can be treated as information about the considered object's measure of complexity and self-similarity.
Fractal dimension has been used together with other features for image texture description and segmentation, e.g., Keller et al. [43]. The fractal dimension has been also utilized in the forestry field. For instance, an interesting description of fractals in forest science can be found in Lorimer et al. [44]. Zeide and Pfeifer showed that the fractal dimension of tree crowns can be useful in crown classification and foliage distribution within a single tree crown analysis [45]. Similarly, Mandelbrot suggested applying fractals to modeling trees and analyzing their structure [41]. A comprehensive review of the application of fractal description in forest science can be found in Lorimer et al. [44]. Multifractal analysis is an extension of fractal theory and it is based on the assumption that the multifractal is a set of nontrivially intertwined fractals. Hence, the description of multifractal inner structure demands a set of parameters which permit a more detailed characterization both locally and globally.
At the beginning of the multifractal image analysis, a measure is assigned to the image and, in the next steps, the measure regularity of this measure is analyzed as the information on the image's complexity/inhomogeneity. It is worthwhile to underline the fact that various measures defined based on pixel intensities can be applied [38,40,46]. The local (pointwise) degree of regularity of a given measure is described by so-called Hölder exponent values, which strongly depend on the actual position on the image and allow researchers to identify points that differ from the background [40]. On the other hand, the distribution of Hölder exponents on the image is summarized in the form of the so-called multifractal spectrum, treated as the global characteristic of a measure's regularity (image complexity/inhomogeneity) [38,40]. Global multifractal characteristics have already been applied to VHR optical data [47,48], mostly to distinguish between different land cover types. One can find also their application in the context of the study of forest cover, such as in Danila et al.'s work [49], or to perform the segmentation of plants' disease images [50]. On the other hand, local multifractal description by using Hölder exponents has rarely been used, mainly to perform segmentation of medical data [38,40] or in the change detection aspects of satellite images [38,51]. Nevertheless, the results obtained in papers [38,40,51] suggest the usefulness of the Hölder exponent in the context of image content description. In particular, the authors of these studies underlined the fuller description of complex shapes, heterogeneous measures, and structures typical for satellite remote sensing. It is worth mentioning that, to the best of our knowledge, the Hölder exponent parameter has not been determined for VHR UAV-derived imagery yet or in the context of forest analysis. Therefore, in this study, we focus on the determination of the local Hölder exponent connected with multifractal theory and use it for the segmentation of single tree crowns from VHR UAV-derived imagery. More precisely, we propose to apply this quantitative descriptor as the unique input for the efficient identification of single tree crowns using only a cycle of multiresolution segmentation algorithms.

Study Site
This study was conducted in the North-Western Alps in a forest stand located in Cesana Torinese (TO) (44 • 56 46.1" N 6 • 46 29.5" E). The test study is a coniferous forest (

UAV Flight and Photogrammetric Data Acquisition
UAV technology was used in this research in order to generate photogrammetric products to be used as input data for the segmentation of single tree crowns using multifractal analysis. The UAV system used was chosen to take into account the characteristics of the study area, regarding the topography, and the environmental conditions that could affect the execution of flights, the resolution of the products to be generated, and the sensors to be integrated. Besides the radiometric information regarding the visible part of the electromagnetic spectrum (red, green, blue), the near infrared (NIR) part was necessary. Indeed, NIR information can enhance the presence of vegetation in the image-processing phase, and, generally, NIR information helps distinguish shadows from dark objects, which have higher reflectance in the NIR. Due to the large area involved in this application and the steep terrain, with an elevation difference of about 400 m, we used a commercial fixed-wings solution, an eBee Plus made by senseFly. The eBee has a payload of up to 0.3 kg, a flight autonomy of 59 min, and it can reach a cruise speed of 40-110 km/h. Moreover, it does not require expert users, because take-off and landing are completely automatic, thanks to the built-in global navigation satellite system (GNSS) receiver.
Two different camera devices were employed for the collection of the RGB and NIR electromagnetic spectra. To perform the RGB flight, the eBee Plus was equipped with the RGB senseFly S.O.D.A. digital camera, with a sensor of 20 MP (5472 × 3648), a focal length of 10.6 mm, and a sensor size of 13.2 × 8.8 mm. A fixed number of frames per second equal to 0.25 fps was automatically acquired by the camera using a shutter cable. The flight with the eBee was planned using the eMotion software, considering a photogrammetric overlap between images of 80% in the lateral and longitudinal direction, an altitude of 220 m, a speed of 9 m/s, and an average ground resolution of 5 cm. Due to the extension of the area and the significant difference in height of the terrain, which could have adversely affected the autonomy of the battery by not allowing the flight to end, it was decided to survey the area through two distinct flights ( Table 2). The flights were planned using as a base a digital surface model (DSM) of the area, from which the flight height was fixed. Given the steep terrain, the flight plan was created so that the survey lines of the flight path would be roughly parallel to the contour lines of similar elevation and then at constant height. In order to acquire NIR images, we used a commercial camera, the Canon S110 NIR. The main feature of the camera is that it has a modified filter that acquired the near infrared 850 nm, along with the red 625 nm and green 500 nm, light. The Canon S110 has a resolution of 12.1MP (4000 × 3000) and a focal length of 5.2 mm. Taking into account the characteristics of the camera sensor, the flight was performed with the eBee at a height of 220 m and a speed of 11 m/s, in order to guarantee an image overlap of 80% in both directions and an average ground sample distance (GSD) of about 6 cm. Table 2 shows the characteristics of the photogrammetric flights. The data acquisition is a key step of the photogrammetric process since the quality of the final result depends on it. The data acquisition phase includes not only flights but, if necessary, the measurement of ground control points (GCPs) for the point cloud georeferencing and of check points (CPs) for the evaluation of the accuracy of the final results. To this purpose, before performing flights, 20 colored markers of 40 × 40 cm size were placed within the study area. A total of 14 of them were used as GCPs during the data processing phase, while 6 markers were employed as CPs for the validation of the model ( Figure 1).
The position of the GCPs and CPs was acquired through a GNSS (global navigation satellite system) receiver using a real-time kinematic (RTK) (with a Global System for Mobile Communications GSM connection for real-time correction) approach, considering a session length of about 10 s for each point. The points' coordinates were estimated with fixed-phase ambiguities. The centimeter-level accuracy ( 3 cm) ensured a high level of precision for the georeferencing process.

Photogrammetric Data Processing
The aerial image acquisitions aimed to produce the RGB and RGN (red, green, NIR) orthomosaics. All the UAV data were post-processed through the structure from motion (SfM) approach [52]. These algorithms, which now are implemented in several commercial software, allow us to rapidly and accurately align the images, compute a three-dimensional dense point cloud and, then, to reconstruct a textured mesh of the object of study. In this case study, the photogrammetric process was carried out using the commercial solution AMP (Agisoft Metashape Professional).
The RGB datasets, acquired in two different flights, were processed together in the same project. A specific project was then dedicated to the processing of the RGN images. Nadiral images, in both projects, were aligned together, setting up the "high" level of accuracy of AMP, removing any limit on the key and tie points number. Subsequently, the measured GCPs and CPs were collimated in all the images, obtaining a 3D georeferenced model of known accuracy, as shown in Table 3. The 3D dense point clouds was produced using a "high" level of detail to obtain products suitable for medium/large-scale representations (1:500) and an "aggressive" depth filtering in order to remove the noise due to the presence of dense vegetation. The next step involved the generation of a "high" quality mesh, from which we were able to generate the DSM of the study area. The results of the UAV image data processing were two orthomosaics in the RGB ( Figure 1) and RGN ( Figure 2) channels of the area of interest, in the WGS84-UTM 32N coordinates system. According to the accuracy of the model, the orthomosaics were produced with a resolution of 10 cm, setting the "mosaic" blending option in AMP. The borders of the orthomosaics were cut out from the study area to avoid distortion of the images and to obtain a regular shape.
Comparing the two orthomosaics obtained, it can be observed that the product in the RGN channels is incomplete with regard to the central part of the study area. In fact, it was not possible to align the RGN images related to this portion of the area, probably due to the considerable difference in altitude of the terrain, due to an almost vertical rock wall. However, the vegetation present in this area was rather low and sparse and, therefore, this does not affect the application of the algorithms described below. Finally, in addition to the two products already described, it was possible to generate the DTM of the area, using the dense point cloud as input data. Due to the complex terrain orography and the presence of dense vegetation, a semi-automatic approach was chosen. In the first step, the points belonging to the ground were classified with a specific algorithm in AMP by setting the maximum angle equal to 45 (i.e., the maximum angle between the terrain model and the line to connect a point with a point from a ground class). Subsequently, the classification was optimized manually in order to replace the points not correctly classified by the software. Exploiting the identified points of the ground, it was therefore possible to generate the DTM with a resolution of 10 cm.

Hölder Exponent Calculations
In this analysis, we focused on the local description of VHR UAV-derived imagery, using parameters related to multifractal formalism. More precisely, we determined the singularity strength α (known as the Hölder exponent), which depends on the pixel's actual position in the structure (i.e., the single-band image), and this makes it possible to describe the local degree of regularity in the pixel's neighborhood [40,51]. The procedure used to calculate the Hölder exponent α is graphically presented in Figure 3 and briefly summarized below. The Hölder exponent was calculated in Matlab©.
For each pixel (m, n) of the NIR channel, we considered a square neighborhood of size ε i = 2i − 1, i = 1, 2, . . . , j, where j denotes the total number of squares, while ε i is the size of a region centered on the pixel (m, n). In this notation, ε 1 = 1 denotes a square, which contains only a single pixel, ε 2 = 3 represents a square of size 3 × 3 containing the pixel's neighbors, while ε 3 = 5 is a square of size 5 × 5, etc. It is worthwhile to stress that during the computation of α(m, n), various sizes of pixel neighborhoods j as well as shapes can be applied, allowing us to describe localized or more widespread singularities. Here, we consider cases where the maximum neighborhood (maximum square size) of a pixel is 5 × 5 ( j = 3). The next important aspect of Hölder exponent determination stated the use of various capacity measures (µ), which allows the emphasis of various effects on the image [40,48]. In the frame of this work, based on the initial tests, we applied the following type of capacity measure: where m, n denotes the pixel position, g(k, l) is a gray-scale intensity at point (k, l), and Ω i is the set of all pixels (k, l) in the square. Capacity measure ISO (Equation (1)) gives the number of pixels in the considered neighborhood, which have the same values as the centered pixel (m, n). ISO is the name of the capacity measure proposed by Véhel and Mignot (1994) [38] and Stojic et al., 2006 [40]. More precisely, the ISO measure, or ISO capacity measure, provides a presentation of a two-dimensional isosurface in the considered neighborhood window and is equal to the number of pixels with the same intensity as the analyzed pixel. A more detailed discussion about the used measures can be found in Véhel and Mignot (1994), Stojić et al. (2006), and Turner et al. (1998) [38,40,46]. After the calculation of the capacity measure µ ISO i , in the pixel neighborhood ε i , the discrete set of coarse Hölder exponents has been determined: Finally, the limiting value of the Hölder exponent for each pixel from the NIR channel has been estimated using the formula: as the slope of the linear regression through points on a log-log plot, where log ε i is plotted on the x-axis, and log µ ISO i (m, n) on the y-axis, as shown in the middle section of Figure 3 [51]. In the final step of analysis, a two-dimensional "α-image", which collects Hölder exponents, has been calculated. To compute Hölder exponents, we used the software Matlab.
Additionally, as we underlined in the Introduction, next to the local Hölder exponent, the multifractal description enables us also to analyze the global distribution of the regularity in a whole scene and to summarize it in the form of the multifractal spectrum; see, e.g., Stojić et al. [40]. However, the usefulness of this function in the context of tree detection will be the topic of a separate analysis.

Segmentation Process
In the further steps of analysis, the Hölder exponent layer (α-image) determined by using the ISO capacity measure was used as the base feature for the ITD through the segmentation process. First, it was smoothed with a simple average filter to remove small variations on the crown surface. The degree of smoothness was defined by the size of the filter (3 × 3). The segmentation was realized with eCognition Developer software. Two segmentation steps accomplished the crown extraction. In the first step, the high-fractality pixels were separated from the low-fractality ones using the contrast split algorithm applied to the Hölder exponent layer, calculated on the infrared band. It was necessary to find the threshold value that represented the breakpoint between tree crowns and other elements. Table 4 shows the adopted parameters. The threshold parameters were selected to satisfy the spectral difference between crowns and other elements. The second step consisted of the extraction of the single crowns by applying the multiresolution segmentation algorithm (Table 4). Since the segmentation visually resulted in objects slightly smaller than the crowns' RGB orthomosaics, they were up-sized to ensure the best match for the majority of the crowns. The objects were redefined by increasing the borders by 3 pixels and then removing those that measured less than 8 pixels. The growing phase interested only the tree pixels neighboring non-trees (class others) ones. The other pixels were segmented in objects of 3 × 3 pixel size, using chessboard segmentation algorithms. The crown objects were grown into the neighboring chessboard objects (Table 4). Finally, the segmentation was exported in the Quantum GIS (version 3.4.8) environment, where the jagged borders of the segments were smoothed (GDAL smoothing algorithm, set as three iterations with 0.5 offset) and validated.

Validation
Specific attention is given to the validation of the segmentation goodness methodology. Indeed, even if the literature is rich in methodologies for the evaluation of the goodness of segmentation and extraction of specific objects from imagery [53], a shared and accepted methodology for the accuracy assessment does not exist [54]. Besides this, the methods applied are quite similar to each other and, generally, they are based on the comparison between manually digitalized reference objects and the segmented objects [25,[53][54][55][56][57]. We opted for a two-level validation, which takes into consideration qualitative and quantitative accuracy measures. The first level was based on the work of Ke at al. [25] and it consisted of a simple visual evaluation, while the second level assessment was a single tree quantitative method that compares several variables and it assessed the under-segmentation and over-segmentation. Both levels will be described in detail in the following sections. The accuracy assessments used as reference 200 crowns that were randomly selected but manually delineated (Figure 4). To minimize the subjectivity, 200 random points were spread within the study area, and the crown on which the points fall was defined by manual segmentation, using as a background layer the RGN and RGB orthomosaics.

Visual Evaluation
The accuracy was evaluated in terms of correspondence between the reference crowns and the segmented ones. The evaluation methodology considers typical accuracy measures based on pixels (user's and producer's accuracy and F1 score) and applies them to measures based on objects. Particularly, the producer's accuracy (PA) and the user's accuracy (UA) are calculated using the following equations: where PA is the producer's accuracy, UA is the user's accuracy, M is the number of matching crowns, RC is the number of reference crowns, and DC is the number of defined crowns. The relationship between UA and PA is described by the F1 score, from the following equation: The situation shown in Figure 5a was considered as matching crowns (M), while the relationships of reference and segmented crowns in Figure 5b-d were considered as non-matching crowns. The segmented crowns were counted on the basis of their overlap with the reference crowns. For example, the segmented crowns in Figure 5b are zero, in Figure 5c are one, and in Figure 5d are three. Even if significant, these measures provide a partial view of the goodness of the segmentation. The omission and commission errors can describe more precisely the goodness of the segmentation.
As illustrated by Ke and Quackenbush (2011) [25], we took into consideration four possible cases of the relationship between the reference dataset and the segmented one: (i) match, (ii) simple omission, (iii) omission through under-segmentation, and (iv) commission through over-segmentation ( Figure 5). The areal difference, the perimeter, the distance of the centroid, the under-segmentation index, the over-segmentation index, and the completeness index are the evaluated metrics. The RMSE was calculated for the area and the perimeter.
The areal distance is the most common metric used as an indicator of segmentation goodness. It was calculated for the reference objects and segmented objects. In the case of over-segmentation, the reference area was compared to the sum of the segmented objects in correspondence with the reference tree.
The perimeter measures the length of the object borders; in the case of more than one crown corresponding to the reference, the segmented perimeter was calculated as the sum of the perimeters on every single object composing the crown in exam. With this approach, over-segmented objects have high RMSE values. It is worth mentioning that the perimeter metrics results should be considered with caution. Indeed, the values can vary according to the shape and the number of tree branches considered in the definition of the reference crowns.
The centroid distance represents the Euclidean distance between the gravitational centers of two shapes. The Euclidean distance between the centroids is calculated as the RMSE [55]; thus, it can be considered as the indicator of error on the distance between gravitational centers. In the event that more than one crown corresponded to the reference, the centroid distance was calculated between the reference crown and the closest centroid.
The RMSE of perimeter and area were calculated with the following formula: where R i is the value of metric m of the reference crown, and S i is the metric m for the segmented crown. Four indicators for the evaluation of the goodness of the segmentation were applied. The over-segmentation index (OS), the under-segmentation index (US), the intersection over union index (J), and the completeness (D) were evaluated for each reference tree.
The OS and US were proposed by Clinton et al. (2010) and Persello (2010) [53,54]. Their estimations are based on the relationship between the area of the segmented (S) and reference objects (R). OS and US are described by the following equations: where |R i ∩ S i | is the overlapping area between the reference crown (R i ) and the segmented crown (S i ) of object i. Zero value describes a perfect match, while values that approach 1 indicate disagreements between the reference and the segmented object. The OS and US indices were considered the maximum, minimum, median, and average values. The intersection over union (J), also known as the Jaccard index, quantifies also the false positives within the segmentation, and it is calculated as the ratio between the overlapping area (R i ∩ S i ) and the union area (R i ∪ S i ): It is worthwhile to stress that when J is equal to 1, there is a perfect segmentation. Finally, the completeness of the segmentation was evaluated through the completeness index (D) [53], calculated as the distance between the OS and the US, as follows: The completeness index D should be interpreted as the closeness to an ideal segmentation result in relation to the reference set. When the D index is close to 0, it indicates a perfect segmentation.

Comparison with Segmentation Methodologies Based on Spectral, Textural and Elevation Information
To evaluate the goodness of the Hölder exponent segmentation, the results were checked against four different segmentations based on the elaboration of spectral, textural, and elevation information. Namely, the following were used as terms of comparison: (i) original spectral bands (red, green, NIR), (ii) normalized difference vegetation index; (iii) Haralick's sum variance measure from GLCM [37], the CHM, and (iv) a multi-sourced approach that considers both the CHM and the sum variance. The goal of this validation was to evaluate on equal terms the performances for ITD of the Hölder exponent against other more common input data. Thus, the ITD from each of these measures was performed using the same ruleset applied for the Hölder exponent but with the tuning of the input parameters to achieve the best possible results. Basically, they were realized using contrast split and multi-resolution segmentation algorithms, with minor differences in the sequence to improve the final segmentation. Table A1 recaps the applied rules and parameters of each segmentation. As mentioned in the Introduction, the CHM was calculated as the difference between the digital surface model (DSM) and the digital terrain model (DTM). Then, the location of treetops was calculated by applying the local maxima algorithm and used in the multi-sourced segmentation. The NDVI, the sum variance GLCM measure, the CHM, and the local maxima were calculated using Quantum GIS. The selection of sum variance among all the GLCM existing measures is based on visual evaluation.
The comparison of the Hölder-based segmentation and the other segmentations (validation datasets) is based on the qualitative and quantitative measures for accuracy assessment described in Sections 2.5, 2.5.1 and 2.5.2.   Table 5 shows the computational time for each of the applied algorithms and the graphic restitution of their results. The final segmented objects were 9215, with an average area of 21 m 2 and an average perimeter of 18 m. Figure 7 provides a sample of the segmentation results. From the very first visual evaluation, it appears that most of the crowns were detected. Some smaller crowns neighboring the scree appear slightly over-grown.

Results of Validation
The visual assessment of the segmentation provides positive results; indeed, only three crowns out of 200 references were not detected (simple omissions). Table 6 summarizes the results of the visual assessment of Hölder exponent segmentation (and of the validation datasets). Even if the simple omissions are rare, those produced through under-segmentation (OUS) are 27. The results underline the process' tendency to under-segment. Although the PA is slightly better than the UA, it reaches 79%, against 69% of UA, while commission errors are much lower (only 13 out of 200). These affect the F1 score, which despite the OUS reach an acceptable value (73%). The outcome is positively confirmed by the area-based analysis. As Table 7 shows, the RMSE on the area represents only 14% of the average dimension of the crowns: it is 3 m 2 over 21 m 2 of average crown extension. The RMSE on the perimeter is almost 3 m over the 18 m average perimeter, corresponding to 15%. This may be caused by the difficulties related to the definition of the reference tree, but also to the non-appropriate threshold value selected for the contrast split algorithm. Table 7. Root mean square error, the average and the % of error on the average, of the perimeter, the area, and the compactness metrics of the Hölder exponent segmentation and the validation datasets.  Table 8 presents the summary statistics regarding the over-segmentation (OS), under-segmentation (US), completeness (D), intersection over union (J) indices, and the distance between centroids. The minimum, maximum, and average values for each index were computed. What stands out is the high values of under-segmentation, which confirm the results of the visual estimation. The completeness (D) and the intersection over union (J) indices show significant positive results that confirm the accuracy of the ITD.

Metric
The median values of D and J are respectively 0. 18 and 0.72. The mean distance between the centroids of the reference and segmented crowns is 83 cm, while the median distance is exceptionally 45 cm. This value is promising and indicates that the results are close to a 4-pixel error in crown localization. Table 8. Summary statistics of the over-segmentation index (OS), the under-segmentation index (US), the completeness index (D), the Jaccard index (J), and the distance between centroids. Overall, the assessment depicts a positive scenario. The method used identifies the location of the crowns (centroid distance is below 50 cm) as well as their extensions, with a segmentation mean error of 14% on the area. Figure 8 presents the median values of the Jaccard index plotted against the area of the reference crowns. It can be seen that the proposed method is very efficient on larger crowns and prone to under-segmenting on smaller crowns. Indeed, the J index for the medium extension crowns (10-30 m 2 ) is mostly above 0.5. The lowest values of J are recorded on very small crowns (less than 5m 2 ). Concerning the comparison with the ITD based on spectral, textural, and elevation information, Tables 6 and 7 show respectively the results from the visual evaluation and the RMSE for the other validation segmentation methodologies. Generally, the Hölder exponent performs better as an input feature for the segmentation ruleset. Regarding the visual assessments, at equal condition, the results from the Hölder exponent outclassed those obtained from the other five validation datasets. For all methods, the producer's accuracy shows higher values. Indeed, the number of objects describing the reference dataset, in any case, is less than 228 (the number of segments from Hölder analysis). The segmentation generated from the spectral information has the highest F1 score within the validation datasets, although it is very far from the F1 score of Hölder exponent segmentation (0.734 of Hölder against the 0.348 of spectral bands). The CHM methods show the larger value of simple omission, which might be attributable to the inaccuracies of photogrammetric DTM in areas with sloping.

OS
The geometrical accuracy does not reflect the performance of the visual assessment. Indeed, the spectral information, even if quite-well performing in the F1 score, does not provide a good geometrical match with the reference crowns, while the geometrical accuracy of the CHM method outperforms the Hölder exponent results. It is worth underlining that the CHM samples amount to only 162 reference objects due to the simply omitted crowns. Within the RMSE analysis, the performance of the multi-sourced approach is the closest to that of the Hölder exponent.
Analyzing the median values of the indices in Figure 9, the under-segmentation (US) index does not reveal any significant difference between the Hölder exponent and other segmentation procedures. Meanwhile, in the analysis of the over-segmentation (OS), we have similar values from Hölder, sum variance, and the NDVI. The mixed and CHM approaches show the worst results in the completeness (D) and OS. The lowest value of centroid distance is that detected by CHM. It appears that the results of the segmentation based on the NDVI and the multi-sourced inputs (CHM and sum variance textural analysis) are the closest to those of the Hölder exponent. Nevertheless, no methods provided results as accurate as those of the Hölder exponent by using the same simple segmentation.

Discussion
The results of this very first application of multifractals analysis of UAV imagery for the identification of single tree crowns are promising. In a relatively short time (around 13 min), it was possible to analyze 38 hectares of forest using one input layer only. The Hölder exponent analysis results in a clear image of the single tree crowns.
The pixels corresponding to the border of crowns present higher values of Hölder exponent. This most probably led to the underestimation of the dimension of the crowns after the contrast split segmentation. Nevertheless, growing the segmented objects of three pixels and smoothing them allowed us to limit such errors on most of the crowns.
The assessment of the classification reveals promising results. The visual evaluation suggests more than 73% of the F1 score, which is in accordance with similar studies. Indeed, the very recent application of Qiu et al. [8] reaches an accuracy level of 76% in the VHR imagery segmentation, but this is also higher than the producer's and user's accuracy values obtained by Ke and Quackenbush in 2011 [25]. Mohan's and Vieira's works [5,56], respectively, reached 86% and 70% of the F1 score. It is worth mentioning that these comparisons should be interpreted with caution since many aspects can influence the goodness of the ITD. Firstly, the high level of subjectivity affects visual evaluations. Secondly, the characteristics of the study areas have a dominant role in the results of the ITD. Indeed, the illumination distortions due to the topography, the density, and the structure of the stand, along with the dominant species, can influence the results (and the goodness) of the segmentation. To fairly compare the results, we should have at least similar case studies; indeed, the works mentioned above are realized in flat or low-sloped areas over different types of forest stands. The selected ruleset is an additional influencing factor: it must be underlined that the segmentation applied in this study is intentionally plain and can be further improved, especially in the refining phase.
As already mentioned, the visual evaluation is limited in the assessment of the goodness of the segmentation. Several other aspects regarding the shape and the size of the individual tree crowns can be taken into account. The results of the quantitative assessment are clear: the positions of the crowns, as well as their extension, are very well-identified. As evidence, the median value of the centroid distance is 45 cm. Additionally, the area difference is not particularly relevant, since the RMSE represents only 14% of the average crown area. Thanks to the smoothing process, there is an evident match between the borders of the segmented and reference objects (the RMSE on the perimeter is almost 3 m). Although the validation indicates a good segmentation, it is important to underline the difficulty of the manual segmentation of references: even for the human eyes, the identification of single trees is not immediate. This is a quite common weakness of ITD (and, more generally, segmentation) researches. The RMSE of the perimeter has been calculated by Yurtseven et al. in their ITD research [55]. They obtain a 6-m RMSE on the perimeter metric, even though they had the chance to identify the crowns on a 1.2 cm/pixel RGB orthomosaic, as an additional demonstration of the subjectivity and complexity of the reference dataset identification. Compared to the existing works of ITD and segmentation, the Hölder exponent provides results that are perfectly in line with the literature.
The tendency of the proposed method to under-segment more than over-segment is evident also from the comparison of US (0.284) and OS indices (0.084). The Jaccard indicator is 72%, a result which is in line with other studies, despite of the high variability of the delineation of the reference dataset. Hussin et al. [57] applied the OS and US indicators to the assessment of tree segmentation, using satellite imagery of 2-m resolution, and they obtained comparable values for both under-segmentation and over-segmentation. However, in their work, they faced the opposite situation: over-segmentation errors are more dominant than under-segmentation ones. Persello et al. and Clinton et al. [53,54] obtained very similar OS and US results too, even though both studies focused on the segmentation (and classification) of satellite imagery in urban areas. The 0.18 median value resulting from the D index mirrors the values in the literature and it is a relatively good result. The literature reports values between 0.31 and 0.42. Again, these metrics and comparison should be interpreted with caution since they are the results of segmentation from satellite imagery and this does not include the extraction of single tree crowns. Finally, the Jaccard index, or intersection over union index, values vary between 0.05 and 0.95, with 0.72 as the median value.
On the same segmentation process, the results of Hölder exponent segmentation clearly outclass the others from spectral, textural, and CHM information. From this first application, it emerged that Hölder exponent can facilitate the ITD from UAV VHR imagery. Indeed, by applying a basic segmentation process, we obtained satisfying results in line with the literature, but in a relatively short time and with one elevation-independent input layer only. With this approach, the ITD from optical imagery of densely forested areas might be more accurate than simple spectral and elevation-based analysis. Naturally, this work should not be interpreted as an attempt to discredit ITD from the spectral and CHM dataset but as an alternative and computational low-demanding solution to ITD.

Conclusions
The purpose of the current study was to determine the local Hölder exponent connected with multifractal theory and use it for the description of VHR UAV optical imagery and the detection of individual single tree crowns. Although multifractals analysis has been applied in image processing in many different fields, from the medical field to satellite remote sensing, their use for UAV imagery has not been confirmed. The high radiometric variability is typical of the VHR datasets that often introduced noise, which is reflected in imprecision in automatic segmentation and classifications. This aspect was reduced by the multifractal analysis, and the single tree crowns clearly emerged. The Hölder exponent makes the segmentation easier and simpler based on the threshold of the local contrast. The results of the validation are generally satisfying and in line with similar research realized on optical and LiDAR datasets. The main detected errors were classified as under-segmentation problems.
Unfortunately, as far as we know, little research on ITD applies quantitative methods similar to those that we used for the assessment of the segmentation. Indeed, a strong limit in the assessment of ITD is the subjectivity in the definition of the reference dataset. Nevertheless, the obtained results confirm the Hölder exponent applied to VHR imagery as a potentially powerful tool in the ITD. The analysis required a relatively short time and low computational power. Additionally, RGB and NIR sensors mounted on UAVs are systems that are becoming cheaper and easily operable. The present study lays the groundwork for future research into ITD from VHR optical imagery. Since this is its very first application, several aspects still need to be addressed and further investigated. Our focus area was coniferous-dominant, with crowns that present fractal patterns from a nadiral view; we might have very different results on broadleaves forests. Moreover, we worked with the Hölder exponent only and it would be interesting to explore additional measures in different forest types and try to work with different spatial resolutions, spectral bands, and parameters. Additionally, it may be worth testing different neighborhood sizes for the calculation of the Hölder exponent to verify its influence on the analysis. It is worth mentioning that multifractal descriptors can be applied in parallel with the DEM-based method, through the definition of the treetops from the CHM and the delineation of the crown boundaries with segmentation from the multifractal analysis. This may help to ease up the process with the optical sensor on the individual tree crown detection. Among others, some of the most interesting applications of the Hölder-ITD might be for the update of forestry inventories at the local scale and the multi-temporal monitoring of specific forest indicators (and parameters) related to crown size. An additional application of this methodology might be VHR satellite imagery. Several additional analyses and tests can still be conducted, and it is our intention to do so-our research is only the first application that moves in this direction.   Maximum threshold 100,000 Step size 500 Stepping type Add  Maximum threshold 100 Step size 5 Stepping type Add Maximum threshold 10 Step size 5 Stepping Maximum threshold 100 Step size 5 Stepping type Add