Leaf Segmentation Based on k-Means Algorithm to Obtain Leaf Angle Distribution Using Terrestrial LiDAR

It is critical to take the variability of leaf angle distribution into account in a remote sensing analysis of a canopy system. Due to the physical limitations of field measurements, it is difficult to obtain leaf angles quickly and accurately, especially with a complicated canopy structure. An application of terrestrial LiDAR (Light Detection and Ranging) is a common solution for the purposes of leaf angle estimation, and it allows for the measurement and reconstruction of 3D canopy models with an arbitrary volume of leaves. However, in most cases, the leaf angle is estimated incorrectly due to inaccurate leaf segmentation. Therefore, the objective of this study was an emphasis on the development of efficient segmentation algorithms for accurate leaf angle estimation. Our study demonstrates a leaf segmentation approach based on a k-means algorithm coupled with an octree structure and the subsequent application of plane-fitting to estimate the leaf angle. Furthermore, the accuracy of the segmentation and leaf angle estimation was verified. The results showed average segmentation accuracies of 95% and 90% and absolute angular errors of 3◦ and 6◦ in the leaves sampled from mochi and Japanese camellia trees, respectively. It is our conclusion that our method of leaf angle estimation has high potential and is expected to make a significant contribution to future plant and forest research.


Introduction
A canopy structure not only reveals the leaf investment strategies in its growth and development but also influences ambient environmental factors, such as water balance, wind speed, carbon balance, and microclimates [1,2].
Leaf angle distribution (LAD) is considered to be an important canopy structural parameter and has been extensively studied to understand its impact on light transmission within the canopy and the biophysical process of its roles [3][4][5][6][7].Specifically, LAD is an important parameter used in the radiative transfer model for indirect estimations of the leaf area index [3][4][5], predictions of plant productivity, and estimations of energy balance [6,7].
In situ, LAD is manually measured with a protractor and compass [8].However, because it is dynamic and varies with a time scale, the task of LAD acquisition through manual measurement is difficult [9].Moreover, manual measurement is costly, labor-intensive, and limited in its ability to acquire and reproduce data from a tall tree [10].
Several theoretical simulations have also attempted to calculate the angular distribution of leaves by assuming the randomness of leaf arrangement and a regular shape distribution [11][12][13].
Such model-based approaches have greatly minimized the requirement of field measurements; however, the models were based on unclear reasoning, and there was insufficient field experimental verification.On the other hand, LAD measurement simulated by leveled digital photograph techniques has shown great potential, but it is susceptible to the direction in which the camera is facing the target [14,15].
Recently, there has been attention on using terrestrial LiDAR (Light Detection and Ranging) to capture canopy structural details down to millimeter-level resolutions [16].Terrestrial LiDAR achieves a high resolution of distance measurements or object structures by illuminating the target with a laser and measuring the reflected pulse phase difference or the round-trip time of the pulse (time of flight).Terrestrial LiDAR has great potential for accurately obtaining canopy structure details, which are widely used to estimate vegetation structure parameters such as tree height [17], leaf area index [17][18][19][20], leaf area density [10,21,22], and leaf angle distribution [10,16].In mathematical parlance, the leaf angle is typically described by the normal vector of the leaf surface.In short, it seems likely that the calculation of the normal vector should be an important task for obtaining accurate leaf angle estimates.Point cloud data (PCDs), a form of LiDAR-derived data, can reconstruct a 3D model of a leaf.Through 3D leaf models, the structure of leaf PCDs has been recognized and the leaf angle calculated by employing the plane-fitting method, which follows the same pattern as the least-squares method [16,19,22,23].
As a necessary process for obtaining appropriate leaf segments to fit the surface, leaf angles estimated from the fitted plane of the leaf PCDs have been manually extracted from entire LiDAR-derived PCDs [22,23].Yet, the process of manual segmentation of leaf PCD is time-consuming.In most cases, manual segmentation is not recommended for complicated canopy structures, while automatic segmentation still presents challenges.An algorithm that automatically separates LiDAR-derived PCDs into acceptable clusters has been reported.In the reported method, the LiDAR-derived PCDs were regularly segmented into clusters using sequence cubic grids [24].Moreover, neighboring points of center were repetitively selected to group the clusters within a certain Chebyshev distance (e.g., 1 cm) [25] as well as triangular intersect regions [10].Similarly, the application of a k-nearest-neighbor algorithm (KNN) is an option that groups points based on the distance of the k-number [16].In addition, classification with a mixed Gaussian model and an expectation/maximization (EM) algorithm also show great potential [26].
Improper segmentation leads to the same grouped segment containing points from different leaves, especially at adjacent points on the leaf edge.Unfortunately, the above-mentioned methods cannot prevent false segmentation because a grouping method based on the selection of PCDs within a specific distance from the centroid is inelastic [10,24,25], resulting in the inability to group PCDs that are not evenly distributed; further, sometimes the performance of the results is data-dependent [16,24].On the other hand, the k-means method is more elastic in terms of grouping, with flexibility in choosing the cluster sizes.In addition, in most studies, the discussion and verification of segmentation accuracy, which affects the cumulative accuracy of LAD calculations, have been omitted.Therefore, this study aimed to develop a method for developing resilience in region segmentation to achieve a higher leaf angle and LAD estimation accuracy based on the k-means method using LiDAR-derived PCDs.In addition, two standardization procedures are provided: (1) verifying the segmentation accuracy to check whether the leaf clusters are correctly segmented; and (2) verifying that the verified leaf angles match the actual leaf angles.

Study Site and Materials
The terrestrial LiDAR (Light Detection and Ranging) used in this study was a Focus3D X330 (FARO, FL, USA) weighing 5.2 kg with a vertical field of view of 300 • and a horizontal field of view of 360 • .The measurement speed was 122 to 976 kpts/sec (kpts: thousands of points), with a scan range between 0.6 m and 330 m, a wavelength of 1550 nm, a distance ranging error of ±1 mm, and an angular resolution of 0.009 • for each angular scan.The field trial was conducted at the University of Tokyo (Figure 1a), which is located at 35 • 43'03.1"N139 • 45'43.4"E, on 12 November 2018, and the day was windless and sunny.The sampled species included the mochi tree (Ilex integra) and the Japanese camellia (Camellia japonica), which were well isolated and themselves surrounded by the same species.The leaf lengths of the mochi and Japanese camellia tree were approximately 12 and 4 cm, respectively, and the tree heights were 4.5 and 3 m, respectively.

Data Collection/Preprocessing
The scanner was mounted on a 1.5-m-high tripod and placed 3 m in front of the target (Figure 1b).Before scanning, 10 sampling leaves from both of the species were marked using colored tape.The leaf angle was measured with a protractor for subsequent validation purposes [23][24][25].In order to prevent unreliable data from being taken from the measurement of the leaf angle, the measurements of leaf angle in each leaf sample were performed cautiously in a windless state and were repeated three times to obtain an average value.Each scanning angular resolution was 0.009, and the scanner spent approximately 15 min completing each scan.After scanning, PCDs were collected, and the target leaf PCD parts were manually extracted using CloudCompare software.Except for the leaf PCDs, the nonphotosynthetic parts, such as wood trunks and branches in the canopy, were pre-deleted by grouping the images based on established classification methods for estimating the leaf angle without effects from untargeted materials [26].Research studies use either geometric-or intensity-based classification methods to group leaf and wood PCDs, but the methods are slightly complicated in terms of procedures [27][28][29].In contrast, considering that the recently developed laser scanner had its own digital camera, color information could also be used as a basis for classification.In fact, the color components of wood and leaves are intuitively different; therefore, it is not difficult to distinguish between them and classify them according to color difference.
The digital number (DN) in the RGB channel (for red, blue, green three colors) shows the absolute magnitude of the color component in each channel.However, if only the DN is considered, the classification between wood and leaf PCDs is not always successful.We could not guarantee that the DN of leaves in the green channel would always be higher than that of wood even though the leaves looked greener, because the wood material could maintain a higher "brightness", resulting in a higher wood DN in the green channel.However, if compared under the same brightness conditions, the DN of the wood PCDs in the green channel is relatively lower than the leaf.Therefore, the DN of the PCDs was rescaled via normalization to allow for a comparison under the same brightness conditions.A simple method of doing this is to obtain the value in the target channel and divide it by the sum of the DNs in the R, G, and B channels [20].However, in this study, another simple way was proposed to emphasize the comparison in the green channel by converting the RGB color space into an L*a*b* color space (is the color space defined by International Commission on Illumination), which contained three representative values: L* for brightness from black (0) to white (100), a* for green (-) to red (+), and b* for blue (-) to yellow (+).Generally, the leaf is greener, which results in mostly negative a* values in the L*a*b* color space, while the a* value of wood is positive.Thus, a threshold a* value was defined to classify the wood and leaf PCDs by averaging the a* values of randomly selected PCDs.Then, the wood PCDs were removed according to the classification results.The accuracy of removal classification was calculated as given in Reference [27]: Here, T w is the number of points correctly classified as wood, F w is the number of points incorrectly classified as wood, T l is the number of points correctly classified as a leaf, and T w is the number of points incorrectly classified as a leaf.

k-Means
The concept of the developed algorithm is demonstrated in the workflow shown in Figure 2. The k-means algorithm has been widely used to group, classify, and segment clusters of point data [30].This algorithm was used in this study to implement leaf PCD segmentation.The leaf PCDs in a specific region were segmented into small groups, each of which depicted either a single leaf PCD or part of a leaf PCD, based on the initial given parameter k.The parameter k is used for deciding the number of segmentations.Specifying a larger k-value leads to a cluster of PCDs in the space to be segmented into smaller pieces of points groups.The method for deciding the optimal k-value for the k-means algorithm will be discussed later, in Section 2.3.the DN of leaves in the green channel would always be higher than that of wood even though the leaves looked greener, because the wood material could maintain a higher "brightness", resulting in a higher wood DN in the green channel.However, if compared under the same brightness conditions, the DN of the wood PCDs in the green channel is relatively lower than the leaf.Therefore, the DN of the PCDs was rescaled via normalization to allow for a comparison under the same brightness conditions.A simple method of doing this is to obtain the value in the target channel and divide it by the sum of the DNs in the R, G, and B channels [20].However, in this study, another simple way was proposed to emphasize the comparison in the green channel by converting the RGB color space into an L*a*b* color space (is the color space defined by International Commission on Illumination), which contained three representative values: L* for brightness from black (0) to white (100), a* for green (-) to red (+), and b* for blue (-) to yellow (+).Generally, the leaf is greener, which results in mostly negative a* values in the L*a*b* color space, while the a* value of wood is positive.Thus, a threshold a* value was defined to classify the wood and leaf PCDs by averaging the a* values of randomly selected PCDs.Then, the wood PCDs were removed according to the classification results.The accuracy of removal classification was calculated as given in Reference [27]: Here, Tw is the number of points correctly classified as wood, Fw is the number of points incorrectly classified as wood, Tl is the number of points correctly classified as a leaf, and Tw is the number of points incorrectly classified as a leaf.

k-Means
The concept of the developed algorithm is demonstrated in the workflow shown in Figure 2. The k-means algorithm has been widely used to group, classify, and segment clusters of point data [30].This algorithm was used in this study to implement leaf PCD segmentation.The leaf PCDs in a specific region were segmented into small groups, each of which depicted either a single leaf PCD or part of a leaf PCD, based on the initial given parameter k.The parameter k is used for deciding the number of segmentations.Specifying a larger k-value leads to a cluster of PCDs in the space to be segmented into smaller pieces of points groups.The method for deciding the optimal k-value for the k-means algorithm will be discussed later, in Section 2.3.3.The k-means involves iteratively exploring the centroid of every cluster and then picking and regrouping the near points into a new group.Figure 3a displays the changes in the class boundary and cluster centroid between the nth and (n + 1)th iterations.As shown, in the nth iteration, point p was closer to the centroid of the leaf a.However, it was grouped with leaf b (D a < D b ), which did not lead to the smallest sum of the distance between the points and the cluster's centroid within the group.Therefore, the leaf PCDs were reorganized to group the p point with leaf a, where Di > D i, and the iteration ended in D i, which was the smallest.For instance, assuming that D i in the (n + 1)th and (n + 2)th iterations were equal, the iteration stopped at the (n + 1)th iteration because the cluster in the (n+1)th iteration was the smallest.

Octree
Use of the k-means algorithm helps in careful segmentation of the PCD clusters, but running the program across all PCDs has limitations of memory and time consumption.For these reasons, the leaf PCDs were spatially discretized into subspaces based on an octree structure [31] to reduce computational complexity and memory usage before running them through the k-means.The octree structure is an effective approach for assigning point clouds to uniform subspaces.On the basis of an octree structure, the leaf PCDs were split into eight subspaces in every iteration until the number of points was less than the specified number (Figure 3b).The number of points in each octree space unit was limited to 1500 in both the sampled species to maintain high computational efficiency for each batch processing of k-means in every octree unit space.Besides, an estimation of the number of leaves in an octree unit is helpful in deciding the initial k-value and reducing unnecessary computation in the k-means algorithm.

k-Value for k-Means
The k-value, which is the initial parameter of the k-means, is the total number of segments expected in the PCDs.The elbow method has been introduced to determine the robustly optimal kvalue in the literature [30,32].However, the insignificant spatial deviation of datasets might occur with different k-values, leading to criticism due to unsuccessful performance.The k-value is expected to be equal to or greater than the number of leaves (N) to prevent multiple leaves from being grouped into the same cluster (k ≧ N).In other words, knowing the value of N explicitly is of great interest in

Octree
Use of the k-means algorithm helps in careful segmentation of the PCD clusters, but running the program across all PCDs has limitations of memory and time consumption.For these reasons, the leaf PCDs were spatially discretized into subspaces based on an octree structure [31] to reduce computational complexity and memory usage before running them through the k-means.The octree structure is an effective approach for assigning point clouds to uniform subspaces.On the basis of an octree structure, the leaf PCDs were split into eight subspaces in every iteration until the number of points was less than the specified number (Figure 3b).The number of points in each octree space unit was limited to 1500 in both the sampled species to maintain high computational efficiency for each batch processing of k-means in every octree unit space.Besides, an estimation of the number of leaves in an octree unit is helpful in deciding the initial k-value and reducing unnecessary computation in the k-means algorithm.This number was initially unknown, but the points of a single leaf were counted to approximately estimate the total number of leaves that might be included in an octree unit space.In our case, the average number of points in a single leaf was between 500 and 1500.Therefore, one octree unit space (Oi) was expected to contain 1 to 3 leaves.

k-Value for k-Means
The k-value, which is the initial parameter of the k-means, is the total number of segments expected in the PCDs.The elbow method has been introduced to determine the robustly optimal k-value in the literature [30,32].However, the insignificant spatial deviation of datasets might occur with different k-values, leading to criticism due to unsuccessful performance.The k-value is expected to be equal to or greater than the number of leaves (N) to prevent multiple leaves from being grouped into the same cluster (k N).In other words, knowing the value of N explicitly is of great interest in determining the optimal k-value.As discussed previously (Section 2.3.2),N can simply be estimated by evaluating the relationship between the total number of points in a single leaf and the octree unit space.In addition, as described in a previous section, one octree unit space (Oi) is expected to contain 1 to 3 leaves.Therefore, the initial k-value should be set to at least greater than 3 (k N = 3).However, in reality, the estimation of N is not always correct.In Figure 4a, N is truly equal to 4, but the estimation (N = 3) is intrinsically lower.A larger k-value (k → k') should be given until k' ≥ N to properly separate the clusters of leaf PCDs into a specific number of clusters.Increasing the k-value without limitation, regardless of a reduction in the number of PCDs, causes the leaf PCDs to become smaller, rendering them susceptible to noise and anomalies in calculating the leaf angle [10] and making the process computationally inefficient.
Remote Sens. 2019, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/remotesensingclose leaves.As shown in Figure 4a, the longest distance from the centroid to a cluster point in each group is r.Thus, for the groups Ci, Cj, and Ck, the longest distances are ri, rj, and ri, respectively, where rj > rk > ri.When multiple leaves are included in the group, such as in Cj, the distance r will be longer than half the distance of the leaf gap (r > L/2).In this case, the k-value can be added to have more domains for grouping the leaf PCDs.As can be seen in group Cj, as more than a single leaf is included in group j, the k-value can be increased to k + 1 (i.e., 4).The cluster is split into two different clusters if r of the cluster exceeds a pre-specified value (L/2).As a result, four domains are created, and two leaves being grouped together can be prevented.In other words, it is not required to increase the k-value indefinitely, but only gradually until all r-values in the domains are not beyond (L/2), which means r = {ri …rn} ≦(L/2) is satisfied.L was set at 2 cm because it mostly ranged from 2 cm to 5 cm in this study.There was consistent accuracy in the segmentation, justifying the approach.Segmentation based on this approach was compared to manually extracted leaf segments.Leaf PCDs from four to five regions of the sampled trees were manually divided into several individual leaf segments.Each region consisted of approximately 20-30 leaf segments.Figure 4b shows an example of invalid segmentation.Invalid segmentation of a cluster such as C1 might affect the estimation of a leaf angle individually as well as the total accumulation of the leaf angle distribution.The impact is evaluated by the accuracy of the segmentation, given by (a) (b) In this study, a simple method for deciding the k-value is provided by considering the distance of the points from the centroid and the leaf gap distance (L), i.e., the shortest distance between two close leaves.As shown in Figure 4a, the longest distance from the centroid to a cluster point in each group is r.Thus, for the groups Ci, Cj, and Ck, the longest distances are ri, rj, and ri, respectively, where rj > rk > ri.When multiple leaves are included in the group, such as in Cj, the distance r will be longer than half the distance of the leaf gap (r > L/2).In this case, the k-value can be added to have more domains for grouping the leaf PCDs.As can be seen in group Cj, as more than a single leaf is included in group j, the k-value can be increased to k + 1 (i.e., 4).The cluster is split into two different clusters if r of the cluster exceeds a pre-specified value (L/2).As a result, four domains are created, and two leaves being grouped together can be prevented.In other words, it is not required to increase the k-value indefinitely, but only gradually until all r-values in the domains are not beyond (L/2), which means r = {ri . . .rn} (L/2) is satisfied.L was set at 2 cm because it mostly ranged from 2 cm to 5 cm in this study.There was consistent accuracy in the segmentation, justifying the approach.
Segmentation based on this approach was compared to manually extracted leaf segments.Leaf PCDs from four to five regions of the sampled trees were manually divided into several individual leaf segments.Each region consisted of approximately 20-30 leaf segments.Figure 4b shows an example of invalid segmentation.Invalid segmentation of a cluster such as C 1 might affect the estimation of a leaf angle individually as well as the total accumulation of the leaf angle distribution.The impact is evaluated by the accuracy of the segmentation, given by Here, Fc is the number of points in false classes that contain the points from more than a single leaf, while Tc is the true class, including only a single leaf.For instance, the accuracy of the segmentation shown in Figure 4b is calculated as 1 -(C 1 / C 1 + C 2 + . . . . . .+ C 9 ).

Definition and Calculation
The leaf angle is defined as the clockwise angle from the zenith (z axis) (shown in Figure 5), which is the angle between the normal vector ( N) and the vector in the z direction.Here, ∑ Fc is the number of points in false classes that contain the points from more than a single leaf, while ∑ Tc is the true class, including only a single leaf.For instance, the accuracy of the segmentation shown in Figure 4b is calculated as 1 -(C1/ C1 + C2 +……+ C9).

Definition and Calculation
The leaf angle is defined as the clockwise angle from the zenith (z axis) (shown in Figure 5), which is the angle between the normal vector ( ⃑ ) and the vector in the z direction.The leaf angle θ is calculated with respect to the normal vector from the fitted plane ( x +  y + z = d) of segmented clusters.The following equation is used to calculate the leaf angle.
Multiplication by pi/180 is essential to convert units into degrees: Here,  ⃑ is the normal vector ( ,  ,  ).The normal vector ( ⃑ ) of the fitted plane was obtained using the least-squares method [26].The fitted plane has the smallest distance between the plane and the points of clusters.The least-squares method calculates the plane as follows: LAD is commonly proportioned by the number of PCDs, which means that the estimated leaf angle in the region of dense point clouds is likely to dominate the entire computation of LAD.It has been stated that the leaf angle should be weighted by a triangular leaf area for the calculation of the LAD, as the distribution of PCDs is not spatially uniform [10].However, implementing this is difficult in terms of computational complexity.Instead, the information for calculated leaf angles stored in PCDs The leaf angle θ L is calculated with respect to the normal vector from the fitted plane (n x x + n y y +n z z = d) of segmented clusters.The following equation is used to calculate the leaf angle.Multiplication by pi/180 is essential to convert units into degrees: Here, N is the normal vector (n x , n y, n z ).The normal vector ( N) of the fitted plane was obtained using the least-squares method [26].The fitted plane has the smallest distance between the plane and the points of clusters.The least-squares method calculates the plane as follows: LAD is commonly proportioned by the number of PCDs, which means that the estimated leaf angle in the region of dense point clouds is likely to dominate the entire computation of LAD.It has been stated that the leaf angle should be weighted by a triangular leaf area for the calculation of the LAD, as the distribution of PCDs is not spatially uniform [10].However, implementing this is difficult in terms of computational complexity.Instead, the information for calculated leaf angles stored in PCDs was weighted and averaged by cubic cell (0.1 cm) because it was approximately equal to the leaf area.All of the angle values stored in the LAD were calculated with f(θ), as follows: Here, f(θ j ) is a function of the LAD that is used to describe the proportion of the leaf angle in the jth interval (%); θ j represents the angle range in the jth interval ( 5• is assigned at each interval); N j is the number of cubic cells containing the angle of the jth interval range; and N is the sum of cubic cells in the complete range of intervals.

Removal of Non-Photosynthetic Wood
The classification of wood and leaves based on LiDAR-derived PCD geometric features or intensity extractions has been suggested earlier [27][28][29].In this study, instead of performing complex calculations that were data-dependent, a simple method was adopted based on the difference in color components between the wood and leaves.As a result, the accuracy of classification was 0.93 for the Japanese camellia and 0.91 for the mochi tree, respectively.Figure 6a is a representation of the a* value in the L*a*b* color space after conversion, and the a* value for wood was observed to be higher than for leaves.The classified results are shown in Figure 6b, where the wood is blue and the leaves are red.Our method has the advantage of not requiring any geometric feature extraction calculations and a LiDAR intensity database, but it is limited by its high dependence on lighting conditions.Therefore, it is recommended that these methods be combined according to the situation.
Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of 14 was weighted and averaged by cubic cell (0.1 cm) because it was approximately equal to the leaf area.All of the angle values stored in the LAD were calculated with f(θ), as follows: Here, f(θ ) is a function of the LAD that is used to describe the proportion of the leaf angle in the jth interval (%); θ represents the angle range in the jth interval (5° is assigned at each interval);  is the number of cubic cells containing the angle of the jth interval range; and  is the sum of cubic cells in the complete range of intervals.

Removal of Non-Photosynthetic Wood
The classification of wood and leaves based on LiDAR-derived PCD geometric features or intensity extractions has been suggested earlier [27][28][29].In this study, instead of performing complex calculations that were data-dependent, a simple method was adopted based on the difference in color components between the wood and leaves.As a result, the accuracy of classification was 0.93 for the Japanese camellia and 0.91 for the mochi tree, respectively.Figure 6a is a representation of the a* value in the L*a*b* color space after conversion, and the a* value for wood was observed to be higher than for leaves.The classified results are shown in Figure 6b, where the wood is blue and the leaves are red.Our met

Segmentation of Leaves
The results (Figure 7a,b) showed that our algorithm could segment the leaves with an average accuracy of 95% (ranging from 90% to 98%) for the sampled leaves from the mochi tree and an average accuracy of 90% (ranging from 87% to 94%) for the Japanese camellia.The average classification accuracy for the mochi tree was higher than that for the Japanese camellia.Between the two sampled species, the segmentation accuracy for the top region of the tree was relatively low, and the angular resolution that was degraded over the scanning distance was considered to have a slight influence on the accuracy, but this was not confirmed in the study.Furthermore, the size of the octree space of each terminal node (Oi) may have been the cause of the results.The computational efficiency was considered to be the most affected here.The size of the octree space determined the number of kvalues that needed to be specified.When the octree space was large, a larger k-value was required to run k-means, resulting in a longer calculation time.On the one hand, some of the leaves that surrounded the space boundary might have been split into small pieces of incomplete leaves.This might also be a factor affecting the accuracy of classification.Fortunately, our method gradually increases the value of k and can make the value of k eventually larger than the number of clusters in the space.Since the k-value was, finally, greater than the number of clusters in the space, the accuracy of the classification was not affected.
A higher leaf curvature was observed in the Japanese camellia than in the mochi tree, which was caused by the overlapping of leaves.Leaf PCDs of overlapped leaves consist of continuous and close points within the surrounding leaf edges, which makes the group centroid move to a position located in the middle of the leaf gap (Figure 4a, Cj).As the centroid of the grouped cluster is located in the space within the leaves, the distance r (the distance farthest from the centroid, as mentioned earlier, should be specified to be at least less than half of the leaf gap distance to avoid two leaves being classified into the same class (r ≦ L/2).In this case, as the group centroid is positioned in the middle of L, with L assumed at 2 cm, r ≦ 1 cm should be satisfied.Due to the mostly uneven distribution of the PCDs, the group centroid is not located in the middle, but is closer to the leaves, so the condition is satisfied even when the leaf gap is less than 2 cm.However, if the leaf gaps are smaller than 2 cm, for example, 1 cm, and the group centroid is positioned in the middle, the condition must be set at r

Segmentation of Leaves
The results (Figure 7a,b) showed that our algorithm could segment the leaves with an average accuracy of 95% (ranging from 90% to 98%) for the sampled leaves from the mochi tree and an average accuracy of 90% (ranging from 87% to 94%) for the Japanese camellia.The average classification accuracy for the mochi tree was higher than that for the Japanese camellia.Between the two sampled species, the segmentation accuracy for the top region of the tree was relatively low, and the angular resolution that was degraded over the scanning distance was considered to have a slight influence on the accuracy, but this was not confirmed in the study.Furthermore, the size of the octree space of each terminal node (Oi) may have been the cause of the results.The computational efficiency was considered to be the most affected here.The size of the octree space determined the number of k-values that needed to be specified.When the octree space was large, a larger k-value was required to run k-means, resulting in a longer calculation time.On the one hand, some of the leaves that surrounded the space boundary might have been split into small pieces of incomplete leaves.This might also be a factor affecting the accuracy of classification.Fortunately, our method gradually increases the value of k and can make the value of k eventually larger than the number of clusters in the space.Since the k-value was, finally, greater than the number of clusters in the space, the accuracy of the classification was not affected.
A higher leaf curvature was observed in the Japanese camellia than in the mochi tree, which was caused by the overlapping of leaves.Leaf PCDs of overlapped leaves consist of continuous and close points within the surrounding leaf edges, which makes the group centroid move to a position located in the middle of the leaf gap (Figure 4a, Cj).As the centroid of the grouped cluster is located in the space within the leaves, the distance r (the distance farthest from the centroid, as mentioned earlier, should be specified to be at least less than half of the leaf gap distance to avoid two leaves being classified into the same class (r L/2).In this case, as the group centroid is positioned in the middle of L, with L assumed at 2 cm, r 1 cm should be satisfied.Due to the mostly uneven distribution of the PCDs, the group centroid is not located in the middle, but is closer to the leaves, so the condition is satisfied even when the leaf gap is less than 2 cm.However, if the leaf gaps are smaller than 2 cm, for example, 1 cm, and the group centroid is positioned in the middle, the condition must be set at r 0.5 cm to prevent false segmentation.The lower the value of the required r distance, the larger the k-value to operate the k-means is.Even assuming that a larger k is possible in suppressing the limit, the decreased computational efficiency and the noise effect on the leaf angle estimation of a small group of clusters cannot be ignored (for example, in our case, using the general commercial computers, it took 2-5 min to complete the calculation using a sampled tree formed by about 2 million PCDs).In addition, after the verification of accuracy, the segmentation error was considered to be small enough to affect the estimation of LAD.Therefore, we compromised here by approximating more accurate results while maintaining higher computational efficiency without increasing the k-value.≦ 0.5 cm to prevent false segmentation.The lower the value of the required r distance, the larger the k-value to operate the k-means is.Even assuming that a larger k is possible in suppressing the limit, the decreased computational efficiency and the noise effect on the leaf angle estimation of a small group of clusters cannot be ignored (for example, in our case, using the general commercial computers, it took 2-5 min to complete the calculation using a sampled tree formed by about 2 million PCDs).In addition, after the verification of accuracy, the segmentation error was considered to be small enough to affect the estimation of LAD.Therefore, we compromised here by approximating more accurate results while maintaining higher computational efficiency without increasing the kvalue.

Leaf Angle Estimates Based on Plane-Fitting Method
As is evident from the results shown in Figure 8, the absolute error of leaf angles in the Japanese camellia (6°) was higher than in the mochi tree (3°), which is evident from the R-squared value for the mochi tree (R 2 = 0.81) and the Japanese camellia tree (R 2 = 0.69).The average absolute error was higher than the absolute error (2.5°) obtained in Reference [25].This was because most of the sampling species used in that study had broad leaves that had a lower curvature (flat) and contained less overlapped leaves.In general, the leaf angle estimate based on the fitting surface method is susceptible to surface curvature in the samples.The normal vector used to calculate the leaf angle cannot be accurately estimated in higher-curvature clusters by a single fitted plane [33,34].More internal leaf segmentation with multiple plane-fittings is required to calculate a more accurate local leaf angle within the leaf.Given this initial condition, r will be less than the specific number specified in the study, and the internal leaf segmentation will have to be split more; in other words, the partition size of segmentation will have to be smaller to make the leaf angle estimation accurate if following the plane-fitting method.However, as described in Section 3.1.2,the computational efficiency and the size of the segmentation area are susceptible to ambient noise.Because such an experiment requires a larger variety of samples, a detailed description of the effect of curvature on determining the k-means is omitted here.

Leaf Angle Estimates Based on Plane-Fitting Method
As is evident from the results shown in Figure 8, the absolute error of leaf angles in the Japanese camellia (6 • ) was higher than in the mochi tree (3 • ), which is evident from the R-squared value for the mochi tree (R 2 = 0.81) and the Japanese camellia tree (R 2 = 0.69).The average absolute error was higher than the absolute error (2.5 • ) obtained in Reference [25].This was because most of the sampling species used in that study had broad leaves that had a lower curvature (flat) and contained less overlapped leaves.In general, the leaf angle estimate based on the fitting surface method is susceptible to surface curvature in the samples.The normal vector used to calculate the leaf angle cannot be accurately estimated in higher-curvature clusters by a single fitted plane [33,34].More internal leaf segmentation with multiple plane-fittings is required to calculate a more accurate local leaf angle within the leaf.Given this initial condition, r will be less than the specific number specified in the study, and the internal leaf segmentation will have to be split more; in other words, the partition size of segmentation will have to be smaller to make the leaf angle estimation accurate if following the plane-fitting method.However, as described in Section 3.1.2,the computational efficiency and the size of the segmentation area are susceptible to ambient noise.Because such an experiment requires a larger variety of samples, a detailed description of the effect of curvature on determining the k-means is omitted here.

Leaf Angle Distribution (LAD)
The tendency observed in this study is consistent with other findings where the species maximized the solar interception by changing their LAD [10].In the mochi tree, the angle of the leaves in the canopy appeared to be mostly inclined at 50° in the higher regions and 32° near the ground (Figure 9).The LAD of the Japanese camellia behaved like the mochi tree, with the leaves inclined horizontally at the top and gradually inclining toward vertical toward the bottom, but the average LAD was more vertical (Figure 10).The LAD in both species revealed that their canopies employed the same concept to effectively use light energy.In agreement with Reference [23], the leaf angle of the species inclined toward horizontal at the top and toward vertical near the ground to increase the leaf outer edge to maximize sunlight interception.

Leaf Angle Distribution (LAD)
The tendency observed in this study is consistent with other findings where the species maximized the solar interception by changing their LAD [10].In the mochi tree, the angle of the leaves in the canopy appeared to be mostly inclined at 50 • in the higher regions and 32 • near the ground (Figure 9).The LAD of the Japanese camellia behaved like the mochi tree, with the leaves inclined horizontally at the top and gradually inclining toward vertical toward the bottom, but the average LAD was more vertical (Figure 10).The LAD in both species revealed that their canopies employed the same concept to effectively use light energy.In agreement with Reference [23], the leaf angle of the species inclined toward horizontal at the top and toward vertical near the ground to increase the leaf outer edge to maximize sunlight interception.

Leaf Angle Distribution (LAD)
The tendency observed in this study is consistent with other findings where the species maximized the solar interception by changing their LAD [10].In the mochi tree, the angle of the leaves in the canopy appeared to be mostly inclined at 50° in the higher regions and 32° near the ground (Figure 9).The LAD of the Japanese camellia behaved like the mochi tree, with the leaves inclined horizontally at the top and gradually inclining toward vertical toward the bottom, but the average LAD was more vertical (Figure 10).The LAD in both species revealed that their canopies employed the same concept to effectively use light energy.In agreement with Reference [23], the leaf angle of the species inclined toward horizontal at the top and toward vertical near the ground to increase the leaf outer edge to maximize sunlight interception.

Figure 2 .
Figure 2. Workflow of leaf angle estimates.Figure 2. Workflow of leaf angle estimates.

Figure 2 .
Figure 2. Workflow of leaf angle estimates.Figure 2. Workflow of leaf angle estimates.

Figure 3 .
Figure 3.The algorithm for leaf segmentation: (a) k-means, where triangles and squares represent the centroid of grouped clusters.The distance from the centroid to the group points is  in the nth iteration and   in the (n + 1)th iteration.As the sum of ∑   becomes minimum, the iteration will stop or the iteration will continue.(b) Octree (Oi) is an octree space unit that is divided by octree.

Figure 3 .
Figure 3.The algorithm for leaf segmentation: (a) k-means, where triangles and squares represent the centroid of grouped clusters.The distance from the centroid to the group points is Di in the nth iteration and D i in the (n + 1)th iteration.As the sum of D i becomes minimum, the iteration will stop or the iteration will continue.(b) Octree (Oi) is an octree space unit that is divided by octree.

Figure 4 .
Figure 4.The algorithm for leaf segmentation.(a) The relationship between the distance r and the leaf gap distance (L).The k-value increased from 3 to 4 to decompose more segmented pieces to make r j > (L/2).(b) Example of the result of segmentation.C 1 represents invalid segmentation.

Figure 4 .
Figure 4.The algorithm for leaf segmentation.(a) The relationship between the distance r and the leaf gap distance (L).The k-value increased from 3 to 4 to decompose more segmented pieces to make rj > (L/2).(b) Example of the result of segmentation.C1 represents invalid segmentation.

Figure 6 .
Figure 6.The classification for removing non-photosynthetic material in the case of Japanese camellia: (a) a* value in L*a*b* color space; (b) classified results (red: leaves; blue: wood); (c) original sampled tree; (d) sampled tree with wood removed.

Figure 7 .
Figure 7. Randomly extracted samples for the validation of leaf segmentation in (a) the mochi tree and (b) the Japanese camellia.

Figure 7 .
Figure 7. Randomly extracted samples for the validation of leaf segmentation in (a) the mochi tree and (b) the Japanese camellia.

Figure 8 .
Figure 8.Comparison between manual measurement and LiDAR-derived leaf angle for (a) mochi trees and (b) Japanese camellia.

Figure 8 .
Figure 8.Comparison between manual measurement and LiDAR-derived leaf angle for (a) mochi trees and (b) Japanese camellia.

Figure 8 .
Figure 8.Comparison between manual measurement and LiDAR-derived leaf angle for (a) mochi trees and (b) Japanese camellia.