Canopy Volume Extraction of Citrus reticulate Blanco cv. Shatangju Trees Using UAV Image-Based Point Cloud Deep Learning

Automatic acquisition of the canopy volume parameters of the Citrus reticulate Blanco cv. Shatangju tree is of great significance to precision management of the orchard. This research combined point cloud deep learning algorithms with volume calculation algorithms to segment the canopy of Citrus reticulate Blanco cv. Shatangju trees. The 3D (Three-Dimensional) point cloud model of a Citrus reticulate Blanco cv. Shatangju orchard was generated using UAV tilt photogrammetry images. The segmentation performance of three deep learning models, PointNet++, MinkowskiNet and FPConv, on Shatangju trees and the ground was compared. Three volume algorithms, namely convex hull by slices, the voxel-based method and the 3D convex hull, were applied to calculate the volume of Shatangju trees. Model accuracy was evaluated using the coefficient of determination (R²) and the Root Mean Square Error (RMSE). The results show that the overall accuracy of the MinkowskiNet model (94.57%) is higher than that of the other two models, indicating the best segmentation performance. The 3D convex hull algorithm achieved the highest R² (0.8215) and the lowest RMSE (0.3186 m³) for the canopy volume calculation, and thus best reflects the real volume of Citrus reticulate Blanco cv. Shatangju trees. The results are closely related to the acquired point clouds of the trees and the characteristics of the different algorithms.


Introduction
Citrus has a long history of cultivation in China, especially in the southern hilly areas, where citrus planting resources are abundant and varieties are diverse. Among these varieties, Citrus reticulate Blanco cv. Shatangju is one of the best known and most highly valued. It is well liked by the public and is a common cash crop in south China. Fertile planting land and high-quality orchard management are required to ensure stable and increased production of Citrus reticulate Blanco cv. Shatangju. The volume and external structure of the canopy of Citrus reticulate Blanco cv. Shatangju trees are important indicators of the growth and biological characteristics of the fruit trees [1][2][3][4]. From the canopy volume, fruit growers can judge the nutrients required by the trees and the number of fruits in the fruiting period. Moreover, the canopy volume is closely related to water evaporation. These factors have a direct impact on the precision management and economic benefits of the orchard [5,6]. Therefore, the automatic acquisition of the canopy volume of the Citrus reticulate Blanco cv. Shatangju tree is of great significance to the precision management of the orchard.
The traditional manual measurement of tree canopies is time-consuming, labor-intensive and inefficient. Ground machinery relies on expensive LiDAR (Light Detection and Ranging) [7], infrared photoelectric sensors [8], ultrasonic sensors [9], etc., to acquire data and evaluate canopy structure from the side of the trees. In recent years, scholars have carried out extensive research on acquiring UAV (Unmanned Aerial Vehicle) tilt photogrammetry images with multi-angle photogrammetry and on building point cloud models of trees. Qin et al. [10] segmented individual trees over large forest areas using airborne LiDAR 3D (Three-dimensional) point clouds and very high-resolution optical imagery, and successfully obtained phenotypic characteristics at the individual tree level. Tian et al. [11] compared point cloud data obtained from a ground-based laser scanner and from UAV images to analyze the feasibility of extracting canopy height in a planted coniferous forest. Gülci [12] established a canopy height model with UAV remote sensing technology to estimate the number, height and canopy coverage of pine trees. Based on UAV tilt photogrammetry images and point cloud reconstruction, Jurado et al. [13] developed a robust automatic method for detecting grapevine trunks in 3D point cloud data. Camarretta et al. [14] used UAV-LiDAR for rapid phenotyping of eucalyptus trees to study inter-species differences in productivity and in key features of tree structure. Wang et al. [15] combined UAV tilt photogrammetry with machine learning classification algorithms to classify street tree species, and the BP (Back Propagation) neural network performed best. UAV tilt photogrammetry images combined with point cloud data processing have become the most popular method of individual tree volume assessment.
In a whole orchard or forest scenario, each tree must first be located and segmented before the canopy volume of a single tree can be measured. The extraction of single tree information mainly includes single tree identification and single tree canopy extraction. With the continuous development of deep learning, the segmentation and extraction of trees are also constantly improving. For the segmentation of 2D (Two-Dimensional) images, scholars have shifted from traditional segmentation methods based on edge detection, thresholds, regions and specific theoretical tools to convolutional neural networks [16][17][18]. For example, Martins et al. [19] segmented trees in urban environment images, Yan et al. [20] identified different tree species and Chadwick et al. [21] extracted the crown height of single coniferous trees. However, 2D images suffer from insufficient spatial information and occlusion. Therefore, 3D point cloud data, which contain rich spatial information, are gradually becoming dominant. Deep learning point cloud semantic segmentation has become a research hotspot in the fields of autonomous driving, navigation and positioning, smart cities and medical image segmentation [22][23][24][25]. However, there is relatively little research on tree segmentation, especially in orchard scenarios. In addition, fruit tree parameters include tree height, canopy diameter and canopy volume. Due to the fractal nature of plants, the definition of fruit tree volume is quite subjective [26]. At present, tree volume calculation algorithms are mainly geometric calculation methods and voxel-based methods. The geometric calculation method treats the acquired point cloud data as a geometric body and directly takes the volume of this body as the canopy volume. The most commonly used geometric calculation method is the 3D convex hull algorithm, which has been applied in many studies [27][28][29].
Remote Sens. 2021, 13, 3437 3 of 20
Fernández-Sarría et al. [30] improved the overall 3D convex hull algorithm by segmenting the canopy at height intervals of 5 cm. The volume of each segmented block was calculated with the 3D convex hull, and the volumes of the small blocks were then summed to obtain the canopy volume. The slicing convex hull method can also be adopted to calculate the canopy volume by dividing the plane area of a single canopy slice into small pieces and then adding up the convex hull volumes of the slices [31]. The voxel-based method [32,33] uses regular 3D grids to represent discrete canopy point clouds. For example, Wu et al. [34] applied the voxel-based method to separate the canopy and trunk of a single tree from point cloud data, and estimated the tree height, canopy diameter and canopy height. However, the volume calculation algorithms mentioned above all operate on a single tree, ignoring the importance of segmenting the entire orchard or forest. Therefore, in this study, a point cloud deep learning algorithm is combined with volume calculation algorithms to segment the canopy of Citrus reticulate Blanco cv. Shatangju trees. In this way, the advantages of both can be combined to obtain the canopy volume of a single Shatangju tree quickly, automatically and accurately.
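The layer-wise convex hull and voxel-based ideas described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in [30] or [32,33]: it assumes a single tree's canopy points in metres as an (N, 3) NumPy array with Z pointing up, computes a SciPy `ConvexHull` for each horizontal layer (5 cm by default, following the interval reported in [30]), and skips layers too sparse or flat to form a 3D hull. The function names are hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull


def sliced_hull_volume(points: np.ndarray, dz: float = 0.05) -> float:
    """Sum of 3D convex hull volumes of horizontal layers of thickness dz (m)."""
    z = points[:, 2]
    layer_id = np.floor((z - z.min()) / dz).astype(np.int64)
    total = 0.0
    for k in np.unique(layer_id):
        layer = points[layer_id == k]
        # A 3D hull needs at least 4 points that are not all coplanar.
        if len(layer) >= 4 and np.linalg.matrix_rank(layer - layer.mean(axis=0)) == 3:
            total += ConvexHull(layer).volume
    return total


def voxel_volume(points: np.ndarray, voxel: float = 0.05) -> float:
    """Number of occupied cubic voxels of edge `voxel` (m) times the volume of one voxel."""
    keys = np.floor((points - points.min(axis=0)) / voxel).astype(np.int64)
    occupied = np.unique(keys, axis=0).shape[0]
    return occupied * voxel ** 3


rng = np.random.default_rng(0)
cube = rng.uniform(0.0, 1.0, size=(5000, 3))  # synthetic 1 m^3 "canopy"
print(ConvexHull(cube).volume, sliced_hull_volume(cube), voxel_volume(cube, 0.1))
```

Because each layer's hull only spans the points inside that layer, the sliced estimate can never exceed the hull of the whole cloud, which is why slicing helps on concave canopies; the voxel estimate depends strongly on the voxel size relative to the point density.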
A set of methods for extracting the canopy volume parameters of an individual Citrus reticulate Blanco cv. Shatangju tree using UAV tilt photogrammetry images and point cloud deep learning was proposed in this study, in order to provide a reference for the precision management of orchards. UAV tilt photogrammetry images of Citrus reticulate Blanco cv. Shatangju trees were acquired to generate a 3D point cloud model of the orchard. Point cloud deep learning algorithms were applied to the model to segment individual trees. The height and canopy width of each tree were then calculated, and the canopy volume was computed with different volume algorithms, whose results were compared.

Experimental Site
The experimental site of this study is located in an orchard in Sihui City, Guangdong Province, at 112.68° E, 23.36° N (Figure 1). This area has a subtropical monsoon climate with an average annual temperature of about 21.9 °C and an average annual precipitation of about 1716 mm. A total of 160 Citrus reticulate Blanco cv. Shatangju trees in 16 rows × 10 columns were selected for this study, of which 128 trees were used as the training set and 32 trees as the validation set. Four trials were conducted at intervals of about 20 days to collect canopy data of Citrus reticulate Blanco cv. Shatangju trees in different growth periods. Experiments were conducted on sunny days with low wind speed to ensure the accuracy of the images captured by the UAV.

Acquisition of UAV Tilt Photogrammetry Images
The acquisition platform is a DJI Phantom 4 RTK (Real Time Kinematic) UAV (DJI, Shenzhen, China), which is equipped with a 20-megapixel CMOS (Complementary Metal-Oxide-Semiconductor) sensor. The RTK module and six-way vision sensors make the flight of the UAV safer and more stable [35][36][37]. The Phantom 4 RTK has a maximum take-off weight of 1391 g and a maximum horizontal flight speed of 14 m/s. Its maximum ascent and descent speeds are 5 and 3 m/s, respectively. The pitch angle range of the camera is approximately −90° to +30°. UAV tilt photogrammetry images were acquired at a flight height of 10 m and a cruise speed of 1 m/s. The flight path overlap is 85% in both the side and heading directions. The image resolution is 5472 pixels × 3648 pixels, and the ground resolution is 0.37 cm. The DJI platform and the UAV tilt photogrammetry image acquisition scene are shown in Figure 2. Ground Control Points (GCPs) were placed and an RTK base station was set up in this experiment to ensure the accuracy of the subsequent point cloud information. The horizontal accuracy of the RTK base station is 1 cm + 1 ppm, and the vertical accuracy is 2 cm + 1 ppm (1 ppm means that the error increases by 1 mm for every 1 km increase in distance from the base station). The high-precision GPS (Global Positioning System) information of the locating points was obtained using a punter (Figure 3). This method keeps the error between the point cloud analysis and the true value from manual measurement within 6 cm.
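The ppm-dependent RTK accuracy and the 85% overlap stated above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes a nadir image footprint equal to the stated 0.37 cm ground resolution multiplied by the image size in pixels, and it assumes the short (3648-pixel) image side is aligned with the heading direction. Neither assumption is stated in the text, and the function names are hypothetical.

```python
def rtk_error_mm(fixed_mm: float, baseline_km: float, ppm: float = 1.0) -> float:
    """Nominal RTK error: fixed term plus `ppm` millimetres per km of baseline."""
    return fixed_mm + ppm * baseline_km


def exposure_spacing_m(gsd_cm: float, pixels: int, overlap: float) -> float:
    """Distance between consecutive images for a given overlap along one image side."""
    footprint_m = gsd_cm / 100.0 * pixels
    return footprint_m * (1.0 - overlap)


# Horizontal (1 cm + 1 ppm) and vertical (2 cm + 1 ppm) error 2 km from the base station
print(rtk_error_mm(10.0, 2.0), rtk_error_mm(20.0, 2.0))  # 12.0 22.0
heading_gap = exposure_spacing_m(0.37, 3648, 0.85)  # about 2.02 m
side_gap = exposure_spacing_m(0.37, 5472, 0.85)     # about 3.04 m
print(heading_gap / 1.0)  # seconds between shots at the 1 m/s cruise speed
```

Under these assumptions, a 1 m/s cruise speed corresponds to roughly one image every 2 s along the flight line.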
Pix4Dmapper was applied to process the tilt photogrammetry images obtained using the UAV. This software can quickly and automatically convert thousands of images into professional and accurate planar maps or 3D models. Pix4Dmapper obtains and post-processes point cloud data based on the principle of multi-angle photogrammetric reconstruction from aerial photographs. It is widely used in the fields of aerial and engineering photogrammetry, computer animation and remote sensing [38].
After the image data obtained in this experiment are imported into Pix4Dmapper, the software automatically reads the pose information of the images based on the selected coordinate system. The WGS 84 (World Geodetic System 1984)/UTM (Universal Transverse Mercator) Zone 50N projected coordinate system is used by default. The software first checks the regional integrity and data quality of the aerial photogrammetry, then marks the control points, automatically generates the point cloud and texture information, and finally generates a 3D textured mesh to form a 3D model of the experimental site (Figure 4).

In CloudCompare software, the point cloud cropping tool was used to manually separate the trees from the ground of the whole experimental site and to remove point cloud noise. The samples for training and validation were determined as well.

Deep Learning Algorithms
Point cloud semantic segmentation is a technology that divides the point cloud into several specific regions with unique properties and identifies the point cloud information [39]. Traditional point cloud feature extraction methods are mainly classified into local features [22], regional features [23], global features and multi-scale features [40,41]. Classifying point cloud data directly is preferable because it avoids the errors introduced by transforming the data into other representations. Researchers have proposed many point-cloud-based 3D deep learning architectures, among which the PointNet algorithm first proposed by Qi et al. [42] has become the most classical deep learning algorithm for point cloud segmentation owing to its fewer model parameters and faster training speed. In this study, one classical point cloud deep learning network and two recent innovative network models were selected to carry out semantic segmentation of the point clouds of fruit trees and non-fruit trees, and their suitability for this scenario was compared.
PointNet++ [43] is an extension of the architecture of PointNet with an additional hierarchical structure, which performs hierarchical feature extraction by building a pyramid-like aggregation scheme to combine features at multiple scales. Its network structure is divided into sampling layer, grouping layer and PointNet layer. Figure 5 shows the overall network architecture of PointNet++.
In the sampling layer, the Farthest Point Sampling (FPS) method is applied to the input point cloud samples so that the sampled points cover the whole 3D point cloud as fully as possible. The FPS method works as follows: first, a starting point is selected from the N points of the input point cloud and denoted as K0; second, the Euclidean distances between the remaining N−1 points and the starting point are calculated, and the point with the largest distance is recorded as K1; then, for each of the remaining N−2 points, the shorter of its Euclidean distances to K0 and K1 is taken as its distance to the selected set, and the point with the largest such distance is recorded as K2; this continues until the specified number of points is reached. In the grouping layer, PointNet++ groups the point cloud using the K-nearest neighbor (KNN) algorithm and the ball query algorithm. The KNN algorithm finds a fixed number of nearest neighbors around each center point, creating multiple subsets of the point cloud according to the number of points. The ball query algorithm selects a center point and regards the points within a spherical region of a certain radius around it as a local region. Through the sampling and grouping operations, the input point cloud is divided into overlapping local regions.
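The sampling and grouping steps can be sketched with NumPy as follows. This is a minimal, brute-force illustration under stated assumptions (an (N, 3) coordinate array, Euclidean distance, hypothetical function names), not the optimized GPU operators used in practice; in particular, the PointNet++ reference implementation pads every ball-query group to a fixed size, whereas this sketch returns variable-length groups.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, n_samples: int, start: int = 0) -> np.ndarray:
    """Return indices of n_samples points chosen by farthest point sampling."""
    selected = np.empty(n_samples, dtype=np.int64)
    selected[0] = start  # K0
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for k in range(1, n_samples):
        selected[k] = np.argmax(min_dist)  # K1, K2, ...
        new_dist = np.linalg.norm(points - points[selected[k]], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected


def ball_query(points: np.ndarray, centers: np.ndarray, radius: float, max_pts: int) -> list:
    """For each center, indices of up to max_pts points within `radius`."""
    groups = []
    for c in centers:
        dist = np.linalg.norm(points - c, axis=1)
        groups.append(np.flatnonzero(dist <= radius)[:max_pts])
    return groups


def knn_group(points: np.ndarray, centers: np.ndarray, k: int) -> np.ndarray:
    """For each center, indices of its k nearest points (a center taken from `points` is its own nearest)."""
    dist = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=2)
    return np.argsort(dist, axis=1)[:, :k]


pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0.5, 0.5, 0]], dtype=float)
print(farthest_point_sampling(pts, 4))        # [0 3 1 2]
print(ball_query(pts, pts[[4]], 0.75, 3)[0])  # [0 1 2]
print(knn_group(pts, pts[[0]], 2))            # [[0 4]]
```

Tracking only the running minimum distance to the selected set keeps each FPS iteration at O(N), which is the usual way this step is implemented.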
A PointNet network is then used to convolve and pool each point cloud subset to obtain its higher-order feature representation. In addition, a density-adaptive layer is added to the PointNet layer to combine features from regions of different scales when the input sampling density changes.
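The PointNet layer applied to each local region can be illustrated with a shared per-point MLP followed by max pooling. The sketch below is a toy built on assumptions: the weights are random and untrained, the layer sizes (3 → 16 → 32) are arbitrary, the function names are hypothetical, and the density-adaptive multi-scale grouping is not modelled. It does show the key property that max pooling makes the region feature independent of point order.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical, untrained weights for a 3 -> 16 -> 32 shared MLP.
WEIGHTS = [
    (rng.normal(size=(3, 16)), np.zeros(16)),
    (rng.normal(size=(16, 32)), np.zeros(32)),
]


def shared_mlp(x: np.ndarray, weights) -> np.ndarray:
    """Apply the same linear + ReLU layers to every row (point) of x."""
    for w, b in weights:
        x = np.maximum(x @ w + b, 0.0)
    return x


def pointnet_layer(points, centers, groups, weights=WEIGHTS) -> np.ndarray:
    """One feature vector per local region: shared MLP on relative coords, then max pool."""
    feats = []
    for center, idx in zip(centers, groups):
        local = points[idx] - center          # coordinates relative to the region center
        per_point = shared_mlp(local, weights)
        feats.append(per_point.max(axis=0))   # symmetric function, so order invariant
    return np.stack(feats)


pts = rng.normal(size=(100, 3))
centers = pts[:4]
groups = [np.arange(i, i + 10) for i in range(4)]
out = pointnet_layer(pts, centers, groups)
shuffled = pointnet_layer(pts, centers, [g[::-1] for g in groups])
print(out.shape, np.allclose(out, shuffled))  # (4, 32) True
```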
The PointNet++ network addresses the problem of uneven sampling of point cloud data and also takes the distances between points into account. It learns the local regional features of point clouds through the hierarchical structure, which makes the network faster and more stable. However, PointNet++ shares a problem with PointNet: it extracts the local features of points individually without establishing connections between points, so the relationships within the point cloud data are not sufficiently learned.
MinkowskiNet is a generalized sparse 3D convolutional algorithm proposed by Choy et al. [44] for the efficient processing of high-dimensional data. It overcomes the low efficiency of dense convolutional neural networks on spatially sparse data and achieves better accuracy than 2D (Two-dimensional) or hybrid deep learning algorithms.
Convolution is a fundamental operation in many fields. This network adopts convolution on sparse tensors and proposes a generalized convolution on sparse tensors. Traditionally, the most common representations of dense feature data are vectors, matrices and tensors. However, for 3D or even higher-dimensional spaces, such dense representations are inefficient: the effective information occupies only a small portion of the space, which wastes resources. Therefore, information can be stored only for the non-empty regions of the space, as in a sparse matrix. This representation is the N-dimensional extension of the sparse matrix and is called a sparse tensor. The MinkowskiNet network defines neural networks specific to these sparse tensor inputs, which process and generate sparse tensors. To build a sparse tensor network, all standard neural network layers, such as MLP (Multi-Layer Perceptron), nonlinear activation, convolution and pooling operations, are constructed in the same way as those defined on dense tensors and implemented in the network model.
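The sparse tensor idea can be illustrated by quantizing a point cloud into integer voxel coordinates and keeping only the occupied ones. The sketch below is a NumPy illustration, not the MinkowskiEngine API: it assumes a fixed voxel size, averages the features of points that fall into the same voxel, prepends a batch index column in the spirit of the batch indexing described for MinkowskiNet, and uses a hypothetical function name.

```python
import numpy as np


def to_sparse_tensor(points: np.ndarray, feats: np.ndarray, voxel: float, batch: int = 0):
    """Return (coords, feats): unique [batch, i, j, k] integer coordinates and mean features."""
    grid = np.floor(points / voxel).astype(np.int64)
    coords = np.hstack([np.full((len(grid), 1), batch, dtype=np.int64), grid])
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep a flat index array across NumPy versions
    summed = np.zeros((len(uniq), feats.shape[1]))
    np.add.at(summed, inverse, feats)
    counts = np.bincount(inverse, minlength=len(uniq))
    return uniq, summed / counts[:, None]


rng = np.random.default_rng(0)
cloud = rng.uniform(0.0, 2.0, size=(10_000, 3))  # points in a 2 m cube
coords, feats = to_sparse_tensor(cloud, np.ones((10_000, 1)), voxel=0.05)
dense_cells = (2.0 / 0.05) ** 3                 # cells a dense 40^3 grid would store
print(len(coords), int(dense_cells))
```

Only the occupied voxels are stored, so memory scales with the number of points rather than with the volume of the bounding grid.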
Generalized sparse convolution treats all discrete convolutions as its subclasses, which is crucial for high-dimensional perception. It computes outputs only at predefined coordinates and saves them into a compact sparse tensor. The MinkowskiNet network supports convolution with arbitrary input and output coordinates and arbitrary kernel shapes. This allows the sparse tensor network to extend to very high-dimensional spaces and to dynamically generate coordinates for the task, saving memory and computation, especially for high-dimensional data. The operational flow of the MinkowskiNet network is specified in Figure 6.
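In the formulation of Choy et al. [44], the output at each coordinate u of the output coordinate set C^out is a sum over the kernel offsets i for which u + i is an occupied input coordinate:

$$x^{out}_{u} = \sum_{i \in N^{D}(u,\, C^{in})} W_{i}\, x^{in}_{u+i}, \qquad u \in C^{out}$$

The dictionary-based sketch below computes this sum directly in Python and NumPy. It is an unoptimized illustration that assumes a 3 × 3 × 3 cubic kernel and uses hypothetical function names; it is not MinkowskiEngine's kernel-map implementation.

```python
import itertools

import numpy as np


def cubic_offsets(size: int = 3, dim: int = 3) -> list:
    """All integer offsets of a size^dim hypercubic kernel centred at the origin."""
    r = size // 2
    return [np.array(o) for o in itertools.product(range(-r, r + 1), repeat=dim)]


def generalized_sparse_conv(in_coords, in_feats, out_coords, weights, offsets):
    """weights[n] is the (C_in, C_out) matrix applied for offsets[n]."""
    lookup = {tuple(c): row for row, c in enumerate(in_coords)}
    out = np.zeros((len(out_coords), weights[0].shape[1]))
    for r, u in enumerate(out_coords):
        for w, offset in zip(weights, offsets):
            row = lookup.get(tuple(u + offset))
            if row is not None:  # only offsets that land on an occupied input site
                out[r] += in_feats[row] @ w
    return out


offsets = cubic_offsets()
weights = [np.ones((1, 1)) for _ in offsets]  # toy kernel that sums neighbours
in_coords = np.array([[0, 0, 0], [1, 0, 0], [5, 5, 5]])
in_feats = np.array([[1.0], [2.0], [4.0]])
# Outputs may be requested at any coordinates, including empty ones.
out_coords = np.array([[0, 0, 0], [5, 5, 5], [9, 9, 9]])
result = generalized_sparse_conv(in_coords, in_feats, out_coords, weights, offsets)
print(result)
```

With the all-ones kernel, the three requested outputs are 3 (the two neighbouring inputs), 4 (an isolated input) and 0 (no occupied input within reach).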
The MinkowskiNet network starts with data processing to generate the sparse tensor, which uses batch indexing to extend the sparse tensor coordinates and converts the input into unique coordinates, associated features and, when training for semantic segmentation, optional labels. Then, the output coordinates Cout are generated from the given input coordinates Cin. This process also requires a kernel map, built from the convolution stride, the input coordinates and the stride (minimum distance between coordinates) of the input sparse tensor, because the kernel map defines how the input is mapped to the output through the kernel. Average pooling divides the pooled features by the number of inputs mapped to each output; since this may remove density information, a variant that does not perform this division, named sum pooling, is proposed. Finally, generalized sparse convolution is used to create a high-dimensional network, which makes the network simpler and more general. For the U-shaped variant, multiple strided sparse convolutions and strided sparse transposed convolutions are added to the basic residual network, and layers with the same stride are connected using skip connections. Figure 6 shows the overall network architecture of MinkowskiNet.
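The difference between average pooling and the sum pooling variant can be seen on a toy sparse tensor. In this hypothetical NumPy sketch (function name and a stride of 2 chosen for illustration), coordinates are downsampled by integer division by the stride; average pooling returns the same value for a densely and a sparsely occupied cell, whereas sum pooling preserves how many inputs were merged.

```python
import numpy as np


def sparse_pool(coords: np.ndarray, feats: np.ndarray, stride: int, mode: str = "avg"):
    """Pool features of all input coordinates that share the same strided output cell."""
    if mode not in ("avg", "sum"):
        raise ValueError("mode must be 'avg' or 'sum'")
    cells = np.floor_divide(coords, stride)
    out_cells, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    summed = np.zeros((len(out_cells), feats.shape[1]))
    np.add.at(summed, inverse, feats)
    if mode == "sum":
        return out_cells * stride, summed
    counts = np.bincount(inverse, minlength=len(out_cells))
    return out_cells * stride, summed / counts[:, None]


coords = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 1], [4, 4, 4]])  # 3 points vs 1 point
feats = np.ones((4, 1))
print(sparse_pool(coords, feats, 2, "avg")[1].ravel())  # [1. 1.]
print(sparse_pool(coords, feats, 2, "sum")[1].ravel())  # [3. 1.]
```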
FPConv [45] is a 2D-convolution-based algorithm that works directly on the surface geometry of a point cloud without converting it to an intermediate representation (e.g., a 3D grid or graph). FPConv applies regular 2D convolution for effective feature learning by automatically learning weight maps that softly project the surrounding points onto a 2D grid, a form of local flattening. The network maps the local point cloud onto a 2D plane through interpolation and finally uses 2D convolution to compute the features.
FPConv maps the N points within a point neighborhood onto a 2D plane of Mw × Mh, where Mw and Mh denote the width and height of the plane, respectively. A PointNet is first applied to compute local features from the relative coordinates of the N points, and these local features are concatenated with the coordinates of each point. Each position on the plane is then filled with a new feature obtained by interpolating the features of the N points. Interpolation is essentially a weighted sum of the N features, so a network is built to learn the weights of the weighted sum at each position. Since there are N points, each position corresponds to N weights, which are obtained through an MLP. There are Mw × Mh positions on the mapping plane; therefore, a total of N × (Mw × Mh) weights must be learned. With these weights, the features of the N points are interpolated into Mw × Mh features. Finally, the interpolation result is treated as a 2D image, and a conventional 2D convolutional network with pooling computes a feature that represents the point. Figure 7 shows the network architecture of FPConv.
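The interpolation-then-convolution idea can be sketched as follows. This is a simplified, hypothetical NumPy illustration rather than the FPConv reference implementation: a single random linear map stands in for the learned MLP that produces the N × (Mw × Mh) weights, the weights of each grid cell are normalized over the N points with a softmax (an assumption made here for numerical convenience), and one 3 × 3 convolution followed by global max pooling stands in for the 2D convolutional network. Mw and Mh must therefore be at least 3.

```python
import numpy as np


def fpconv_local(rel_xyz: np.ndarray, feats: np.ndarray, mw: int = 4, mh: int = 4,
                 c_out: int = 8, seed: int = 0) -> np.ndarray:
    """Flatten N neighbour features onto an Mh x Mw grid, then apply a 2D conv and max pool."""
    rng = np.random.default_rng(seed)
    c = feats.shape[1]
    # Stand-in for the learned MLP: per-point input -> one logit per grid cell.
    w_proj = rng.normal(size=(3 + c, mw * mh))
    logits = np.concatenate([rel_xyz, feats], axis=1) @ w_proj  # (N, Mw*Mh)
    weights = np.exp(logits - logits.max(axis=0))
    weights /= weights.sum(axis=0, keepdims=True)               # each column sums to 1
    # Every grid cell is a weighted sum of the N point features.
    grid = (weights.T @ feats).reshape(mh, mw, c)
    # A single 3x3 "valid" convolution, then global max pooling.
    kernel = rng.normal(size=(3, 3, c, c_out))
    out = np.empty((mh - 2, mw - 2, c_out))
    for i in range(mh - 2):
        for j in range(mw - 2):
            out[i, j] = np.einsum("abc,abcd->d", grid[i:i + 3, j:j + 3], kernel)
    return out.max(axis=(0, 1))


rng = np.random.default_rng(1)
neighbours = rng.normal(size=(16, 3))   # N = 16 relative coordinates
point_feats = rng.normal(size=(16, 5))  # e.g. PointNet local features
print(fpconv_local(neighbours, point_feats).shape)  # (8,)
```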

Accuracy Evaluation of Algorithms
In this study, the performance of the semantic segmentation models is evaluated by accuracy and mIoU (mean Intersection over Union). The segmentation accuracy is computed from the confusion matrix by dividing the sum of its diagonal elements by the sum of all its elements. The mIoU is the average of the IoU (Intersection over Union) over all classes, where the IoU of a class is the ratio of the intersection to the union of the ground-truth annotation and the model prediction. The two evaluation indexes are calculated as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{mIoU} = \frac{1}{C}\sum_{i=1}^{C}\frac{p_{ii}}{\sum_{j=1}^{C} p_{ij} + \sum_{j=1}^{C} p_{ji} - p_{ii}}$$

where TP denotes the correct classification of the detection target, TN the correct classification of the background, FP the wrong classification of the background as the detection target, FN the wrong classification of the detection target as the background, C is the number of categories and p_ij is the number of objects or points that belong to the i-th category but are predicted to be in the j-th category.
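Both indexes can be computed from the confusion matrix p_ij. The following NumPy sketch mirrors the two definitions; the helper names are hypothetical, class labels are assumed to be integers from 0 to C−1, and every class is assumed to appear in either the labels or the predictions so that no union is zero.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes: int) -> np.ndarray:
    """cm[i, j] = p_ij, the number of points of class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def overall_accuracy(cm: np.ndarray) -> float:
    """Sum of the diagonal divided by the sum of all elements."""
    return float(np.trace(cm) / cm.sum())


def mean_iou(cm: np.ndarray) -> float:
    """Average over classes of p_ii / (sum_j p_ij + sum_j p_ji - p_ii)."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=1) + cm.sum(axis=0) - tp
    return float(np.mean(tp / union))


# Toy labels: 0 = ground, 1 = tree
cm = confusion_matrix([0, 0, 0, 1, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1, 0, 1], 2)
print(cm.tolist(), overall_accuracy(cm), mean_iou(cm))  # cm = [[2, 1], [1, 4]], OA = 0.75, mIoU ≈ 0.583
```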

Manual Calculation Method
This experiment requires a large amount of manual measurement, which is time-consuming and laborious. Thirty-two Citrus reticulate Blanco cv. Shatangju trees were selected for manual measurement, which was carried out in each of the four trials with a field rod and a tape measure. Figure 8 shows the measured parameters. Since the Shatangju trees in this study are nearly spherical in shape, the volume of the spherical canopy was calculated using the following equation [46,47]: where VM is the manually measured volume (m³), D is the mean maximum diameter (m), Ht is the total canopy height (m), Hc is the height from the ground to the maximum diameter of the canopy (m) and Hs is the height from the ground to the bottom of the canopy (m).

Phenotypic Parameters Obtained by the Algorithms
Point cloud data of the Citrus reticulate Blanco cv. Shatangju trees is p Cartesian coordinate system, where the value of the highest point in the Z Zmax, the value of the lowest point is Zmin and the difference between them of tree, Ht, in meters (m).

= −
The canopy width is usually divided into width in east-west directi north-south direction, D2. Since the coordinate system in the Pix4Dmappe WGS 84, the X and Y axes of the point cloud data in the 3D coordinate system to the east-west and north-south directions in the geographic location, respec projecting 3D point cloud data onto a 2D plane, the maximum distance in th tions is calculated, and then the average value is taken as the average maximu D.
The key to the precision spraying with agricultural UAVs in orchards l curate volume calculation. In this study, the following three algorithms were s reference to the relatively well-researched volume algorithms: convex hu voxel-based method and 3D convex hull.
Convex hull by slices method divides the point cloud into several irreg drons and an irregular cone at the top layer according to the composition canopy point cloud. The whole canopy point cloud is stratified in the direc

Phenotypic Parameters Obtained by the Algorithms
Point cloud data of the Citrus reticulate Blanco cv. Shatangju trees are placed in a Cartesian coordinate system. The Z value of the highest point is $Z_{max}$, the Z value of the lowest point is $Z_{min}$, and their difference is the tree height, $H_t$, in meters (m):

$$H_t = Z_{max} - Z_{min}$$
The canopy width is usually divided into the width in the east-west direction, $D_1$, and the width in the north-south direction, $D_2$. Since the coordinate system in the Pix4Dmapper software is WGS 84, the X and Y axes of the point cloud data in the 3D coordinate system correspond to the east-west and north-south directions, respectively. After the 3D point cloud data are projected onto a 2D plane, the maximum extent in each of the two directions is calculated, and their average is taken as the average maximum diameter, $D$:

$$D = \frac{D_1 + D_2}{2}$$
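The two phenotypic parameters above reduce to coordinate extents. The sketch below is a minimal illustration, assuming the segmented tree is an (N, 3) NumPy array whose X, Y and Z columns are already in metres in a projected system (for example a UTM zone on WGS 84); raw latitude/longitude in degrees would first need projecting. The function name is hypothetical.

```python
import numpy as np

def height_and_mean_diameter(points):
    """points: (N, 3) array of one segmented tree; columns X (east), Y (north), Z (up) in metres."""
    pts = np.asarray(points, dtype=float)
    h_t = pts[:, 2].max() - pts[:, 2].min()   # Ht = Zmax - Zmin
    d1 = pts[:, 0].max() - pts[:, 0].min()    # east-west extent of the XY projection
    d2 = pts[:, 1].max() - pts[:, 1].min()    # north-south extent of the XY projection
    return float(h_t), float((d1 + d2) / 2)   # (Ht, D)

tree = np.array([[0.0, 0.0, 0.5], [1.2, 0.3, 2.0], [0.4, 1.0, 1.1]])
print(height_and_mean_diameter(tree))
```

Using axis-aligned extents mirrors the east-west/north-south definition of $D_1$ and $D_2$; it does not search for the widest direction of an obliquely shaped crown.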
The key to precision spraying with agricultural UAVs in orchards lies in accurate volume calculation. In this study, three relatively well-researched volume algorithms were selected: convex hull by slices, the voxel-based method and the 3D convex hull.
The convex hull by slices method divides the canopy point cloud into several irregular polyhedrons plus an irregular cone at the top layer, following the shape of the canopy point cloud. The whole canopy point cloud is stratified along the Z axis at a fixed interval ∆h. The choice of ∆h depends on the point cloud density: if ∆h is too large, the volume estimate is inaccurate; if ∆h is too small, the calculation becomes complicated and inefficient. In practice, ∆h is usually set to 1 to 5 times the average point spacing; in this study, it was set to 2 cm. The contour points of each slice are extracted, and the vertices of its convex hull are connected in turn to form a closed polygon. The convex hull area, $S_i$, of each slice is then calculated, and the volume of each part is obtained from the volume equations of a polyhedron and a cone. Finally, the overall canopy volume is the sum of the volumes of all parts [31,48]. A schematic diagram of the convex hull by slices method is shown in Figure 9.

The voxel-based method allows simple and efficient segmentation of a point cloud and represents it as a group of volume elements [32,34]. A voxel is a cuboid building block whose geometry is defined by its length, width and height. The voxel-based method transforms discrete, unordered points into voxels with topological relations [33]. First, the maximum and minimum values of the point cloud in the X, Y and Z directions are calculated to determine a cuboid block that encloses the whole canopy point cloud. Second, this block is divided equally into n small cubes (the step size in this study was 1 cm in all three directions). Then, the point coordinates are traversed to determine whether each small cube contains points. If it does, a cube is created centered on that grid point; otherwise, no cube is built. Finally, all grid points are iterated over and the number of cubes containing points is counted. Multiplying this count by the volume of a single cube gives the canopy volume of the Shatangju tree. The schematic diagram of this method is shown in Figure 10a.
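The two volume algorithms above can be sketched in Python/NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The slice volumes use the frustum formula $\frac{\Delta h}{3}(S_i + S_{i+1} + \sqrt{S_i S_{i+1}})$ between adjacent slices and a cone $\frac{1}{3} S_{top}\,\Delta h$ on the top slice, which is our assumed reading of "the volume equations of a polyhedron and a cone". Slice areas come from a 2D convex hull computed with Andrew's monotone chain. The voxel count bins points into grid cells anchored at the minimum corner of the bounding box, which counts the same occupied cells as the traversal described above, up to boundary handling. All function names are hypothetical.

```python
import numpy as np

def hull_area_2d(xy):
    """Area of the 2D convex hull of (N, 2) points (Andrew's monotone chain + shoelace)."""
    pts = sorted(set(map(tuple, np.asarray(xy, dtype=float))))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]  # hull vertices in counter-clockwise order
    area = sum(hull[i][0] * hull[i - 1][1] - hull[i - 1][0] * hull[i][1]
               for i in range(len(hull)))
    return abs(area) / 2.0

def convex_hull_by_slices_volume(points, dh=0.02):
    """Canopy volume (m^3) from horizontal slices of thickness dh (2 cm in the study)."""
    pts = np.asarray(points, dtype=float)
    layer = np.floor((pts[:, 2] - pts[:, 2].min()) / dh).astype(int)
    areas = [hull_area_2d(pts[layer == k, :2]) for k in range(layer.max() + 1)]
    # Assumed formulas: frustum between adjacent slices, cone on the top slice.
    volume = sum(dh / 3.0 * (a + b + np.sqrt(a * b)) for a, b in zip(areas[:-1], areas[1:]))
    volume += areas[-1] * dh / 3.0
    return float(volume)

def voxel_volume(points, size=0.01):
    """Canopy volume (m^3) = number of occupied size^3 voxels x volume of one voxel."""
    pts = np.asarray(points, dtype=float)
    idx = np.floor((pts - pts.min(axis=0)) / size).astype(np.int64)
    n_occupied = np.unique(idx, axis=0).shape[0]
    return float(n_occupied * size ** 3)
```

Because only occupied cells are counted, the voxel estimate depends on how completely the point cloud samples the canopy. A photogrammetric cloud that captures mainly the outer surface yields a smaller count than a solid canopy would.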
The 3D convex hull method constructs, for the point cloud of each Shatangju tree, the minimal convex hull that encloses all points in 3D space; this hull is defined by the set of small external facets that wrap the entire point cloud [27,48]. Facets are merged to ensure that the whole shape is convex and free of errors caused by non-convex solutions. The boundary of the hull consists of many Delaunay triangles, and the volume of the convex hull is calculated using these facets as boundaries, with the interior of the hull filled to form a solid (Figure 10b). Common algorithms for computing convex hulls include the incremental algorithm, the gift-wrapping algorithm, the divide-and-conquer method and the quick-hull algorithm [28,29]. The quick-hull algorithm was selected in this study. Its steps are as follows:
1. The point cloud of the Shatangju tree canopy was converted to .txt format. Six points in the point cloud (the points with the maximum and minimum values of the coordinates) were selected to generate an irregular octahedron as the initial convex hull model. Some points remained outside the octahedron; these candidate points for the new convex hull boundary were divided into eight separate regions by the faces of the octahedron. The points inside the initial convex hull were removed once the polyhedron was built.
2. In each of the eight regions, the perpendicular distances of the points to the corresponding face were compared, and the point with the largest distance was selected. The points selected in step 1 were merged with the newly selected points to form new triangles and a new convex hull, and the points inside the new convex hull were again deleted.
3. Step 2 was repeated, selecting the point farthest from each new triangular face to create a new convex hull and deleting the points inside it, until no points remained outside the convex hull. Finally, an n-sided convex hull is formed, and the volume of this 3D convex hull model was taken as the volume of the tree canopy.
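The quick-hull procedure above is available off the shelf: SciPy's `scipy.spatial.ConvexHull` wraps the Qhull library, which implements quickhull, and exposes the enclosed volume as the `volume` attribute. The sketch below assumes the canopy points are an (N, 3) array in metres; the function name is hypothetical and this is not presented as the authors' code.

```python
import numpy as np
from scipy.spatial import ConvexHull

def convex_hull_volume(points):
    """Volume (m^3) of the smallest convex polyhedron enclosing an (N, 3) canopy point cloud."""
    hull = ConvexHull(np.asarray(points, dtype=float))  # SciPy wraps Qhull (quickhull)
    return float(hull.volume)

# Unit cube corners plus one interior point: the interior point does not affect the hull.
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)] + [[0.5, 0.5, 0.5]])
print(convex_hull_volume(cube))
```

For a canopy exported as whitespace-separated X Y Z text, `np.loadtxt("tree_01.txt", usecols=(0, 1, 2))` would load the points; the file name and column layout are hypothetical.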

Evaluation of Model Accuracy
The accuracy of the volume models in this study is evaluated by two indicators: the coefficient of determination (R²) and the Root Mean Square Error (RMSE). An R² value close to 1 indicates a better fit of the volume calculation. The smaller the RMSE, the smaller the deviation of the predicted values from the true values; that is, the closer the values calculated by the volume algorithm are to the manually measured true values, the higher the prediction accuracy of the model. The equations are as follows:

$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{j=1}^{M}(Y_j - X_j)^2}{\sum_{j=1}^{M}(X_j - \bar{X})^2}$$

$$RMSE = \sqrt{\frac{1}{M}\sum_{j=1}^{M}(Y_j - X_j)^2}$$

where SSE is the sum of squares for error, SST is the total sum of squares, $Y_j$ is the predicted value of sample j, $X_j$ is the true value of sample j, $\bar{X}$ is the mean of the true values and M is the number of samples.
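A minimal sketch of the two indicators, assuming $Y_j$ is the algorithm output and $X_j$ the manual measurement as defined above. The input values are hypothetical and are not data from this study. Note that an R² reported from a fitted regression line (as in a linear regression plot) can differ from this direct formula when the algorithm output is biased.

```python
import numpy as np

def r2_and_rmse(predicted, measured):
    """R^2 = 1 - SSE/SST and RMSE, with Y_j = predicted and X_j = measured (true) values."""
    y = np.asarray(predicted, dtype=float)
    x = np.asarray(measured, dtype=float)
    sse = np.sum((y - x) ** 2)            # sum of squares for error
    sst = np.sum((x - x.mean()) ** 2)     # total sum of squares of the true values
    return float(1.0 - sse / sst), float(np.sqrt(sse / x.size))

# Hypothetical volumes (m^3) for four trees
r2, rmse = r2_and_rmse([1.1, 1.9, 3.2, 3.8], [1.0, 2.0, 3.0, 4.0])
print(r2, rmse)
```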

Segmentation Accuracy of Deep Learning
Three deep learning models were applied to classify the orchard point cloud data as described previously. Table 1 shows the performance of each model on the validation set, including mIoU. The PointNet++ model has the lowest value in all three evaluation indexes, with a tree segmentation accuracy of only 27.78%. The MinkowskiNet model achieved the highest value in all three evaluation indexes, with an mIoU as high as 94.57%, and gave the best segmentation. The FPConv model also classified well, although its tree segmentation accuracy was slightly lower. MinkowskiNet is more accurate than PointNet++ for both trees and ground: PointNet++ uses only the spatial information of the point cloud, whereas MinkowskiNet learns both the spatial and the color information of each point.

During training (Figure 11), the accuracy of the PointNet++ model began to converge, with fluctuations, from the 100th epoch and had almost converged after the 200th epoch, although small fluctuations remained in the later stages. The accuracy of the MinkowskiNet model was as high as 80% at the start of training, and the model converged after the 50th epoch. The accuracy of the FPConv model was 57% at the start of training, increased rapidly between the 180th and 200th epochs and basically converged after the 200th epoch. Comparing the three deep learning models shows that MinkowskiNet requires fewer training epochs and trains faster and more stably.
Remote Sens. 2021, 13, x FOR PEER REVIEW
The qualitative results of the three segmentation models are shown in Figure 12, where each point is labeled with its classified component (tree or ground). The segmentation accuracy of the PointNet++ model is relatively low: some ground points were incorrectly segmented as tree points, most of them concentrated on the left side of the figure, and points of every tree were missed, so its tree segmentation accuracy was low. The MinkowskiNet model segmented both the trees and the ground with very high accuracy; only some points of individual trees were incorrectly segmented as ground. The FPConv model segmented the ground with relatively high accuracy, but each tree had some points that were wrongly classified as ground. Therefore, the MinkowskiNet model gives the best point segmentation of the trees and the ground.

Figure 12. Semantic segmentation results for the orchard dataset.

In conclusion, the MinkowskiNet model achieved the best results among the three deep learning models, with the highest segmentation accuracy and the fewest training epochs needed to converge.

Accuracy of Phenotypic Parameters Acquisition Model
The canopy volumes of the 32 sample trees were calculated using the four calculation methods proposed in Section 2.3. The results of the three volume algorithms are illustrated in Figure 13, which shows the volume morphology of the Shatangju trees. The results for the sample trees in the first trial are listed in Table 2. Across all four trials, tree height ranged from 1.06 to 2.15 m and canopy width ranged from 0.92 to 1.94 m. The volumes obtained from manual measurement, the convex hull by slices algorithm, the voxel-based method and the 3D convex hull algorithm ranged from 0.53 to 4.28, 0.48 to 4.64, 0.40 to 2.94 and 0.46 to 4.02 m³, respectively. For the same tree, the convex hull by slices algorithm gave the largest volume, followed by the voxel-based method and the 3D convex hull algorithm. The large differences between the calculated results are closely related to the acquired point clouds of the trees and to the characteristics of the different algorithms.

Regression models between each attribute and the manually obtained true values were analyzed to verify whether the parameters derived from the tree point clouds reliably represent the canopy structure. The mean manually measured height and diameter were taken as true values, and the point cloud heights and canopy widths obtained in CloudCompare were compared with them. As shown in Figure 14, the height derived from the point clouds agrees closely with the manually measured height (R² = 0.9571, RMSE = 0.0445 m), and the canopy width derived from the point clouds agrees closely with the manually measured diameter (R² = 0.9215, RMSE = 0.0587 m). This indicates that the Shatangju tree parameters obtained from the point clouds reliably reflect the real tree height and canopy width.
The correlation remained high across the different growth periods. Therefore, to automate the calculation of volume parameters, the point cloud data can be used directly in place of manual measurements for volume calculation. To select the optimal algorithm for Shatangju tree volume calculation, linear regression analysis was also performed between each of the three volume algorithms and the manual measurement, with the spherical canopy volume calculated from the manual measurements taken as the true value and the output of each volume algorithm as the comparison value. As shown in Figure 15, the convex hull by slices algorithm gave an R² of 0.8004 and an RMSE of 0.3833 m³ against the manually measured data. Its R² was higher than that of the voxel-based method (R² = 0.5925, RMSE = 0.3406 m³).
However, the 3D convex hull algorithm performed best, with the highest R² (0.8215) and the lowest RMSE (0.3186 m³). The 3D convex hull algorithm thus achieved the R² closest to 1, followed by the convex hull by slices algorithm, while the correlation of the voxel-based method was poor. Figure 13c shows that the 3D convex hull algorithm encloses the whole canopy, so its principle is closest to the formula for the volume of a spherical canopy.
However, because the triangular facets that enclose the canopy do not form a smooth spherical surface, the volume calculated by the 3D convex hull algorithm is slightly smaller than the true volume. The convex hull by slices algorithm (Figure 13a) treats each slice of the point cloud as a separate calculation object, and many gaps remain in the resulting sliced convex hulls and segmented bodies, so the computed volume is larger than the actual canopy volume. The voxel-based algorithm (Figure 13b) converts the point cloud into small cubes of known volume and calculates the tree volume by counting the cubes. However, because the point cloud is reconstructed from the surfaces visible in the images and lacks interior points, the computed volume was smaller than the true value, and the correlation was also poor. Therefore, the 3D convex hull algorithm was selected to automatically calculate the volume of Shatangju trees.

Discussion
The results indicate that it is feasible to combine a point cloud deep learning algorithm with a volume calculation algorithm to automatically obtain the canopy volume parameters of Citrus reticulate Blanco cv. Shatangju trees. With the development of UAV technology, UAV images have been widely used [49][50][51][52][53][54][55][56]. Studies [10][11][12][13] have shown that UAV tilt photogrammetry can quickly and conveniently capture images and texture information of forests, pine trees, eucalyptus, etc. UAV tilt photogrammetry is also suitable for imaging Citrus reticulate Blanco cv. Shatangju orchards and building 3D models of them. It saves time and labor compared with manual measurement and is a low-cost alternative to ground-based mechanical sensors. To acquire the canopy volume of individual Citrus reticulate Blanco cv. Shatangju trees, the proposed method first optimizes the segmentation of the whole orchard using state-of-the-art point cloud deep learning segmentation algorithms. The segmented point cloud data are then input directly into the volume algorithm to improve the accuracy of single-tree volume calculation.
In this study, tilt photogrammetry images of Citrus reticulate Blanco cv. Shatangju trees were acquired with a DJI UAV. Ideally, images should be acquired at noon, when there are no shadows in the images to reduce the accuracy of the 3D point cloud model. However, because of the large area of the experimental site, data acquisition could not be completed within the short period around noon. Instead, a daytime period with good light conditions was selected for UAV tilt photogrammetry image acquisition. In addition, Citrus reticulate Blanco cv. Shatangju trees were easily obscured by the branches and leaves of adjacent trees during UAV tilt photogrammetry [15], resulting in occlusion of the lower part and bottom of the trees. Therefore, the camera angle and flight path needed to be adjusted: camera angles of −90°, −60°, −45° and −30°, and a flight path oriented 30° east of north, were set in the experiments to ensure that the bottom of the trees could be photographed.
Pix4Dmapper software was used to build the 3D point cloud model of the experimental site from the UAV tilt photogrammetry images. To reduce the computational complexity and noise introduced by preprocessing, three network models that extract feature information directly from the point cloud, PointNet++, MinkowskiNet and FPConv, were selected. Although PointNet++ addresses the non-uniform sampling of point cloud data, like PointNet it still learns local features insufficiently, which leads to misclassification with large errors. MinkowskiNet uses generalized sparse 3D convolution, which includes all discrete convolutions as special cases, to enhance the information interaction between points and better distinguish high-dimensional data. Therefore, MinkowskiNet produced high-quality segmentation of the 3D point cloud data in this study. FPConv also achieved good segmentation results, especially for the orchard ground, because its local features are combined with the coordinate information of each point. However, a few tree points were classified as ground, possibly because the 2D surface convolution loses a small part of the 3D information of the point cloud. Only these three network models were used in this study; a wider range of network models should be compared in subsequent studies to determine the most suitable deep learning network for orchard scenes.
The canopy volume and internal structure of Shatangju trees are important indicators of their growth and biological characteristics and provide a scientific reference for precision spraying. After the point cloud deep learning algorithm segments individual trees, the height, canopy width and volume of each Shatangju tree are calculated automatically using the volume algorithms. The height and canopy width from the point cloud were highly correlated with the manual measurements, and the volume algorithms also showed high correlation, except for the voxel-based method. However, because the manual measurement calculates the hull volume bounded by the outermost branches of the canopy, it includes the gaps and holes within the canopy; without removing these spaces, the calculated volume can be larger than the true canopy volume [48]. To reduce the error caused by manual measurement, LiDAR scan data could be used as the true value instead, to explore whether a linear relationship exists between the LiDAR data and the point cloud models established from UAV tilt photogrammetry images [49]. The LiDAR point cloud could then be used to calculate a more accurate volume of Shatangju trees and thereby improve the accuracy of the spray volume in pesticide applications.
The purpose of this study was to automatically obtain the canopy volume of Citrus reticulate Blanco cv. Shatangju trees, in order to provide a reference for precision orchard management. The canopy volume parameters of individual trees were obtained automatically by combining UAV tilt photogrammetry, deep learning and volume algorithms. Based on these results, a volume grading map can be produced to present the growth information of the orchard. In agricultural practice, producers can convert this growth information map into an operational prescription map, drawing on their spraying experience and decision making, to achieve precision spraying with plant protection UAVs. Subsequent research could use this dataset to explore algorithms for generating prescription maps for pesticide application. A transmission protocol is also required to transfer the prescription map to plant protection UAVs and guide practical pesticide spraying.
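A volume grading map of the kind described above could start from a per-tree classification such as the sketch below, which assigns each segmented tree a grade by comparing its canopy volume with a sorted list of boundaries. The boundaries in `THRESHOLDS`, the record layout `(tree_id, x, y, volume_m3)` and the function names are hypothetical and are not taken from this study; converting grades into spray doses would still rely on the producer's experience.

```python
from bisect import bisect_right

# Hypothetical grade boundaries in m^3 (illustrative only, not from the study).
THRESHOLDS = [1.0, 2.0, 3.0]


def volume_grade(volume_m3, thresholds=THRESHOLDS):
    """Return a grade from 0 to len(thresholds); a volume equal to a boundary
    is placed in the higher grade."""
    return bisect_right(thresholds, volume_m3)


def grading_map(trees, thresholds=THRESHOLDS):
    """Map (tree_id, x, y, volume_m3) records to {tree_id: (x, y, grade)}."""
    return {tid: (x, y, volume_grade(v, thresholds)) for tid, x, y, v in trees}


# Usage: two trees with positions in a local coordinate frame (metres).
example = grading_map([("T1", 0.0, 0.0, 0.8), ("T2", 3.0, 0.0, 2.2)])
```

Keeping the tree position alongside the grade is what allows the result to be rendered as a map and, later, rasterized into a prescription grid.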

Conclusions
This study established a method for automatically acquiring the canopy volume of Citrus reticulate Blanco cv. Shatangju trees. The point cloud model built from UAV tilt photogrammetry images was used to train three deep learning networks: PointNet++, MinkowskiNet and FPConv. The MinkowskiNet model performed best at segmenting Citrus reticulate Blanco cv. Shatangju trees from the ground, and its overall accuracy was higher than that of the other two models, with a mean intersection over union (mIoU) of 94.57%. Its segmentation accuracy was 90.82% for trees and 98.32% for ground, the average of which gives the 94.57% figure. The MinkowskiNet model reached an accuracy as high as 80% in the first training epoch and converged after the 50th epoch, requiring fewer training steps and training faster and more stably.
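The overall accuracy and mIoU reported above can be computed from a confusion matrix of per-point labels. The sketch below assumes integer class labels (for example 0 = ground and 1 = tree, an illustrative encoding) and defines per-class IoU as TP / (TP + FP + FN); it is a generic evaluation routine, not the authors' evaluation script, and the function name is hypothetical.

```python
def segmentation_metrics(y_true, y_pred, num_classes):
    """Return (overall_accuracy, per_class_iou, miou) for per-point labels."""
    # Confusion matrix: rows = true class, columns = predicted class.
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1

    total = sum(map(sum, cm))
    overall_accuracy = sum(cm[i][i] for i in range(num_classes)) / total

    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # predicted c, wrong
        fn = sum(cm[c]) - tp                                  # true c, missed
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))

    valid = [v for v in ious if v == v]  # skip NaN for classes that never occur
    miou = sum(valid) / len(valid)
    return overall_accuracy, ious, miou


# Usage: ten points, two classes (0 = ground, 1 = tree).
oa, ious, miou = segmentation_metrics(
    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1, 1, 0],
    num_classes=2,
)
```

Overall accuracy weights every point equally, so a class with many points (such as ground) dominates it, whereas mIoU averages the classes equally; reporting both gives a fuller picture of tree versus ground performance.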
Both the height and the canopy width obtained from the point cloud were highly correlated with the manually measured values and were not affected by the growth period of the Citrus reticulate Blanco cv. Shatangju trees. The R² and RMSE were 0.9571 and 0.0445 m for height, and 0.9215 and 0.0587 m for canopy width. These results indicate that the point cloud model has high estimation accuracy and can be combined with volume algorithms to obtain the canopy volume of Citrus reticulate Blanco cv. Shatangju trees.
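The R² and RMSE values above compare point-cloud estimates with manual measurements. A minimal sketch follows, assuming RMSE is computed directly between estimated and measured values and R² is that of an ordinary least-squares line fitted between them (for simple linear regression this equals the squared Pearson correlation). The function names are illustrative.

```python
import math


def rmse(estimated, measured):
    """Root mean square error between paired estimates and measurements."""
    n = len(measured)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estimated, measured)) / n)


def linear_fit_r2(x, y):
    """Least-squares fit y = a*x + b; return (a, b, r2).

    For simple linear regression, R^2 = Sxy^2 / (Sxx * Syy).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    r2 = sxy * sxy / (sxx * syy)
    return a, b, r2


# Usage: point-cloud heights (x) against manual heights (y), toy values.
a, b, r2 = linear_fit_r2([1, 2, 3], [2, 2, 4])
error = rmse([1, 2, 3], [2, 2, 4])
```

RMSE keeps the units of the measurement (m for height and width, m³ for volume), while R² is unitless, which is why both are reported together.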
Linear regression between each of the three volume algorithms and the manually measured volume showed that the 3D convex hull algorithm achieved the highest R² (0.8215) and the lowest RMSE (0.3186 m³), followed by the convex hull by slices algorithm and the voxel-based method. The 3D convex hull algorithm was therefore selected as the optimal algorithm for automatically calculating the canopy volume of Citrus reticulate Blanco cv. Shatangju trees.
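The 3D convex hull volume can be illustrated with the self-contained, pure-Python sketch below, which finds every supporting plane through three non-collinear points (each such plane is a hull facet) and sums the pyramids from the point-cloud centroid to each facet, V = Σ area × distance / 3. It is a brute-force O(n⁴) illustration suitable only for small point sets, not the authors' implementation; in practice a library routine such as `scipy.spatial.ConvexHull(points).volume` would normally be used. The function names and the tolerance `eps` are assumptions.

```python
import itertools
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _hull_area_2d(points):
    """Area of the 2D convex hull (monotone chain + shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    hull = []
    for seq in (pts, pts[::-1]):  # lower chain, then upper chain
        part = []
        for p in seq:
            while len(part) >= 2 and cross(part[-2], part[-1], p) <= 0:
                part.pop()
            part.append(p)
        hull += part[:-1]
    m = len(hull)
    s = sum(hull[i][0] * hull[(i + 1) % m][1] - hull[(i + 1) % m][0] * hull[i][1]
            for i in range(m))
    return abs(s) / 2.0


def convex_hull_volume_3d(points, eps=1e-9):
    """Volume of the 3D convex hull of `points` by brute-force facet search."""
    pts = list(set(map(tuple, points)))
    n = len(pts)
    c = tuple(sum(p[k] for p in pts) / n for k in range(3))  # interior point
    seen, volume = set(), 0.0
    for a, b, d in itertools.combinations(pts, 3):
        nrm = _cross(_sub(b, a), _sub(d, a))
        length = math.sqrt(_dot(nrm, nrm))
        if length < eps:
            continue  # collinear triple, no plane
        nrm = tuple(v / length for v in nrm)
        if _dot(nrm, _sub(c, a)) > 0:
            nrm = tuple(-v for v in nrm)  # orient the normal away from the centroid
        dist = [_dot(nrm, _sub(p, a)) for p in pts]
        if max(dist) > eps:
            continue  # points on both sides: not a supporting plane
        offset = _dot(nrm, a)
        key = tuple(round(v, 6) for v in (*nrm, offset))
        if key in seen:
            continue  # facet already counted through another triple
        seen.add(key)
        on_plane = [p for p, s in zip(pts, dist) if s >= -eps]
        # Project the facet onto the coordinate plane most parallel to it,
        # then undo the projection's area shrinkage.
        axis = max(range(3), key=lambda k: abs(nrm[k]))
        proj = [tuple(p[k] for k in range(3) if k != axis) for p in on_plane]
        area = _hull_area_2d(proj) / abs(nrm[axis])
        volume += area * (offset - _dot(nrm, c)) / 3.0
    return volume


# Usage: unit cube corners plus an interior point; expected volume 1 m^3.
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)] + [(0.5, 0.5, 0.5)]
cube_volume = convex_hull_volume_3d(cube)
```

Grouping coplanar points into one facet before measuring its area avoids double-counting faces such as the square sides of the cube, which several triples of points would otherwise describe. Because the hull wraps the outermost points, it also encloses internal gaps, the same effect noted for the manual hull measurement.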