Point2Tree (P2T) -- a framework for parameter tuning of semantic and instance segmentation used with mobile laser scanning data in coniferous forests

This article introduces Point2Tree, a novel framework that incorporates a three-stage process involving semantic segmentation, instance segmentation, and an optimization-based analysis of hyperparameter importance. It offers a comprehensive and modular approach to processing laser-scanning point clouds in forestry. We tested it on two independent datasets. The first area was located in an actively managed, coniferous-dominated boreal forest in V{\aa}ler, Norway, where 16 circular plots of 400 square meters were selected to cover a range of forest conditions in terms of species composition and stand density. We trained a model based on the Pointnet++ architecture, which achieved an F1-score of 0.92 in semantic segmentation. As a second step in our pipeline, we used a graph-based approach for instance segmentation, which reached an F1-score of approximately 0.6. The optimization further boosted the performance of the pipeline by approximately 4 percentage points.


Introduction
The use of high-resolution 3D point clouds from terrestrial laser scanning (TLS), personalized laser scanning (PLS), and drone- or helicopter-based laser scanning has long been an area of intensive research for the characterization of forest ecosystems. In addition to measuring traditional variables such as stem volume, diameter, and height (Astrup et al., 2014) or tree species (e.g. Allen et al., 2022), these very high-detail 3D forest structural data can allow new insight into single-tree properties such as biomass (e.g. Demol et al., 2022; Calders et al., 2015), stem curves (e.g. Hyyppä et al., 2020), tree height growth (e.g. Puliti et al., 2022), wood quality (e.g. Pyörälä et al., 2018), key ecological indicators (Calders et al., 2020), and phenotyping (Grubinger et al., 2020; Hartley et al., 2022).
The emergence of improved sensor technology and the implementation of Simultaneous Localization And Mapping (SLAM) algorithms have greatly improved the availability and reduced the cost of dense point clouds from mobile laser scanning (MLS) platforms. The continuous move from stationary TLS to mobile, personalized, or drone-based scanning systems has greatly increased the ease of scanning larger forest plots (Tockner et al., 2022a) and reduced the challenges associated with scanning from fixed stations (see Boucher et al., 2021). Recent studies have also pointed out that personalized or mobile laser devices may provide an improved (Donager et al., 2021) or more cost-efficient (Kükenbrink et al., 2022) alternative to traditional field measurements with calipers and hypsometers for the collection of ground truth for air- or space-borne remote sensing.
At the core of most forest applications of high-density 3D point clouds is the ability to efficiently, with high precision and accuracy, segment the point cloud into different compartments such as stem, leaf, or branches (semantic segmentation) and further into single-tree point clouds, hereafter referred to as instance segmentation. Even though the development of segmentation algorithms has been a substantial field of research for many years, both semantic and instance segmentation remain a significant bottleneck to unleashing the full potential of high-density point clouds in the forest context. Most studies have relied on algorithmic approaches (e.g., clustering and circle fitting, or voxel-based approaches, Wang et al. (2008)) to identify and segment single trees (Vicari et al., 2019; Burt et al., 2019). In all cases, the segmentation routines tend to produce artifacts, and the single-tree point clouds often require manual editing. In addition, such approaches are generally tailored to the specific dataset and sensors they were developed on and are seldom transferable to new data.
Recent advances in the field of deep learning are triggering a new wave of studies looking into the possibilities to disentangle the complexity of high-density 3D point clouds and solve semantic (Krisanski et al., 2021; Hyyppä et al., 2020) and instance segmentation (Windrim and Bryson, 2020) as well as regression tasks (Oehmcke et al., 2021). One promising avenue in this field is the development of sensor-agnostic models that can learn general point cloud features and allow their transferability independently of the characteristics of the input point cloud. The advantage of such models is that they can be used off-the-shelf on new data without hyperparameter tuning. One exemplary case of moving in the direction of sensor-agnostic models for forest point clouds is the study by Krisanski et al. (2021), who developed a semantic segmentation model to classify primary features (i.e., ground, wood, and leaf) in forest 3D scenes captured with TLS, ALS, and MLS. While desirable, there are currently no sensor-agnostic tree instance segmentation models for point cloud data. However, steps have been made to integrate sensor-agnostic deep learning semantic models with more traditional algorithmic pipelines to solve the instance segmentation challenge (Krisanski et al., 2021; Oehmcke et al., 2022; Chen et al., 2021).
TLS2Trees Wilkes et al. (2022) addressed the instance segmentation problem while leveraging the FSCT semantic segmentation model published by Krisanski et al. (2021). In their approach, the wood-classified points from the semantic segmentation are used to construct a graph through the point cloud, and a shortest-path analysis then attributes points to individual stem bases (Wilkes et al., 2022). In a final step, the leaf-classified points are added to each graph. A key aspect of this pipeline that affects the quality of the downstream products is the initial definition of the clusters, done using the DBSCAN clustering method (Ester et al., 1996). The performance of DBSCAN depends on the separability of the instances, which is tightly linked to the output of the FSCT segmentation model (i.e., wood class parts, including stems and branches) and to the forest type. In particular, instance clustering can be challenging in dense forests with a substantial amount of woody branches in the lower parts of the crown (e.g. Norway spruce forests). One potential avenue to boost the quality of the initial definition of the instances is to develop new point cloud semantic segmentation models that allow for a clearer separability of the instances by, for example, focusing on the main tree stem (i.e., excluding branches).
In TLS2Trees Wilkes et al. (2022), the instance segmentation performance depends on a set of hyperparameters which should be individually tuned for a given type of forest to achieve the best possible performance. So far, both in TLS2Trees and in other tree segmentation approaches, tuning of this type of hyperparameters has traditionally been done manually by individual researchers for each dataset through trial-and-error processes. However, the possibility of a systematic and automated approach to hyperparameter optimization exists. Several potential methods could solve this challenge (e.g. simulated annealing or Bayesian optimization, Brochu et al. (2010)). Furthermore, it is also worth keeping in mind that gradient-based methods may be ill-posed due to the non-convex profile of the hyperparameter space.
Leveraging recent advances in semantic segmentation (Krisanski et al., 2021) and instance segmentation (Wilkes et al., 2022), this study introduces Point2Tree, a new modular framework for semantic and instance segmentation of MLS data. Point2Tree has two main modules: (1) a newly trained Pointnet++-based semantic segmentation model with classes optimized for coniferous forest (p2t semantic, see Tab. 2), and (2) an instance segmentation hyperparameter optimization procedure based on the Bayesian flow approach Burt et al. (2019). Point2Tree is modular in the sense that each of its components can easily be replaced by an improved module. We evaluate the performance of both the semantic and instance segmentation of Point2Tree in settings including: (a) with and without hyperparameter optimization, and (b) with both the new semantic Pointnet++ model (i.e. p2t semantic) as well as the semantic model from FSCT (Krisanski et al., 2021), i.e. fsct semantic (see Tab. 2). The evaluation is done against a newly annotated dataset from our study area as well as an existing independent dataset from another part of Europe.

Study area
The study area was located in an actively managed, coniferous-dominated boreal forest in Våler municipality in south-eastern Norway (N 59.503219°, E 10.884240°). A total of 16 circular plots of 400 m² were purposefully selected to cover a range of forest conditions in terms of species composition (see Fig. 1) and stand density (200-2500 trees ha⁻¹; Tab. 1). These plots included forests where the dominant species was either Norway spruce (Picea abies (L.) Karst.), Scots pine (Pinus sylvestris L.), or birch (Betula pubescens or Betula pendula), with different degrees of mixing between the species. Concerning the developmental stages, the selected plots were located either in mature forest stands or in stands in the middle of their rotation period. However, no young forest in the regeneration phase was included.

MLS data acquisition
Mobile laser scanning (MLS) data were collected in June 2022 using a GeoSLAM ZEB-HORIZON (GeoSLAM 2020) in correspondence with the 16 circular field plots of 400 m² area. The data collection was initialized by booting the GeoSLAM ZEB-HORIZON in the center of the field plot. The operator then walked two perpendicular figure-eights extending for a diameter of approximately 30 m, followed by a walk around the plot's perimeter. The data collection lasted 10-15 min per plot. The raw MLS data were then processed within the GeoSLAM Hub software, relying on a proprietary SLAM algorithm. The resulting point clouds were down-sampled to only 9 % of the total points and exported as .las files. This value is the default in the GeoSLAM Hub software and, in previous experience, was found to reduce data redundancy while maintaining the 3D structural information. The point clouds were further clipped to include only the area of the plot plus a buffer of 5 m around it, to ensure that the crowns of trees at the edges of the plots could be fully segmented.

Point cloud annotation
The point clouds corresponding to the 16 selected plots were manually annotated using CloudCompare (Girardeau-Montaut et al.) by a team of two annotators, followed by a review step by the annotators' administrator. The annotation consisted of two consecutive steps: (1) Instance annotation: segmentation of single trees, provided they could be identified as trees (i.e., not always possible for small understory trees). The segmentation was done so that branches of intermingled trees were separated as far as practically possible.
(2) Semantic annotation: the annotators classified every single point into the following classes: ground, vegetation (branches, leaves, and low vegetation), coarse woody debris (CWD, i.e. deadwood), and stems. The classes were the same as those defined by Krisanski et al. (2021), with the difference that the stems were separated from the branches, which were assigned to a general vegetation class. The reason behind this modification of the semantic classes was that stems are distinct features in coniferous forests that can enable a more precise separation of the single instances. The points from trees with the stem at breast height outside the plot were removed from further analysis.
We followed the same split approach as Krisanski et al. (2021) and divided each circular plot into four radial slices, randomly assigning two to training (50 %), one to validation (25 %), and one to testing (25 %). A complete overview of the segmented plots can be found in the supplementary materials.

Methods
The pipeline used in this work comprises several stages, as presented in Fig. 3 and 4. It is worth noting that, due to the large size of point cloud files, arranging all the steps in the pipeline well and orchestrating their behavior is essential to obtain good system performance. In some cases, it may be hard to arrive at the processing results due to ineffective data processing that does not account for all aspects of the data. This effect mostly occurs when dealing with locally very sparse point clouds (often at the borders of the cloud). Therefore, all steps are prepared in a modular fashion, and the stages are parameterized so they can be adjusted to different densities and types of point clouds. Furthermore, the steps of the pipeline are prepared in a way that enables smooth substitution of selected components of the system. The elements of the system are interconnected using a programming-language-agnostic composition strategy, as presented in Fig. 3.
Within this work we used a custom naming convention. Tab. 2 presents a set of acronyms and features of the different pipelines on which we partially base our framework and against which we compare our performance. The Point2Tree framework is equipped with optimization and analytics modules and, thanks to its modular architecture, also enables the incorporation of external modules implemented in different programming languages. This feature of P2T was used to employ the semantic segmentation from the FSCT pipeline and the instance segmentation from the TLS2Trees pipeline.
It is worth noting (see Tab. 2) that "p2t semantic" was obtained from "fsct semantic" by replacing the original set of FSCT weights of the Pointnet++ model with a model trained on our data.

Data preprocessing
In the preprocessing step, the initial tiling operation is done. The tile size is adjustable and should be chosen based on the data profile. It is also worth noting that there is usually a whole range of low-density point cloud tiles at the edges of the point clouds. These data are difficult to digest for both the semantic segmentation and the instance extraction parts of the pipeline. Therefore, a dedicated procedure to remove these low-density tiles was adopted. The procedure examines all the tiles in terms of their point density, and a tile is removed from further processing if its density is below a critical threshold. As may be noticed, the tiles' granularity strictly affects the point cloud's shape after this protocol. The effect of the density-based tiling protocol is presented in Fig. 4.
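The density-based tile filtering described above can be sketched as follows; the function name, tile size, and density threshold are illustrative assumptions, not the pipeline's actual values.

```python
import numpy as np

def filter_sparse_tiles(points, tile_size=10.0, min_density=1.0):
    """Drop all points belonging to XY tiles whose density falls below
    min_density points per square meter (a sketch of the tiling protocol)."""
    # Assign each point to a square tile in the XY plane.
    ij = np.floor(points[:, :2] / tile_size).astype(int)
    _, inverse, counts = np.unique(ij, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    density = counts / (tile_size ** 2)          # points per m^2 in each tile
    keep = density[inverse] >= min_density
    return points[keep]
```

Because the filter acts on whole tiles, a coarser `tile_size` removes larger chunks at the cloud edges, which is why the granularity affects the shape of the remaining cloud.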

Semantic segmentation
In our flow, Pointnet++ (Qi et al., 2017; Krisanski et al., 2021), implemented in PyTorch, was used as the base model for semantic segmentation. The model was trained from scratch using the newly annotated dataset (p2t semantic, see Tab. 2).
The point cloud was sliced into cube-shaped regions to prepare the data for Pointnet++. Each cube was shifted to the origin before inference to avoid floating-point precision issues. The preprocessing is performed before training or inference, and each sample is stored in a file to minimize computational time and facilitate parallel processing. The preprocessing also takes advantage of vectorization through the NumPy package.
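The cube-slicing step can be sketched as follows (a simplified illustration; the function name, cube size, and return format are our own assumptions, not the pipeline's):

```python
import numpy as np

def slice_into_cubes(points, cube_size=6.0):
    """Split a point cloud into cube-shaped samples, each shifted to the origin.

    Shifting avoids floating-point precision loss when coordinates (e.g. UTM)
    are large. Returns (local_points, offset) pairs so that predictions can be
    mapped back to global coordinates afterwards.
    """
    keys = np.floor(points / cube_size).astype(int)
    samples = []
    for key in np.unique(keys, axis=0):
        mask = np.all(keys == key, axis=1)
        offset = key * cube_size
        samples.append((points[mask] - offset, offset))
    return samples
```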
During training, we used a subsampling and voxelization protocol with 1 cm voxels. The parameters used in the training process are listed in Tab. 3, and the training was done on an Nvidia GV100GL [Tesla V100S PCIe 32GB].
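The voxel subsampling might look like the sketch below; note that the pipeline keeps a single random point per voxel, while this simplified version keeps the first point encountered.

```python
import numpy as np

def voxel_downsample(points, voxel=0.01):
    """Keep one point per occupied voxel (1 cm by default), mirroring the
    training-time subsampling. Simplification: keeps the first point in each
    voxel rather than a random one."""
    keys = np.floor(points / voxel).astype(int)
    # np.unique returns the index of the first point in each occupied voxel.
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(idx)]
```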

Instance segmentation
We employed the TLS2trees instance segmentation technique Wilkes et al. (2022). It segments trees through a series of steps that use the initial semantic segmentation as input. The TLS2trees method first constructs a graph through the wood-classified points. A comprehensive explanation of this approach is provided in Wilkes et al. (2022). It is important to note that the accuracy of the instance segmentation relies on the results of the semantic segmentation as well as on the quality of the data employed for training this model.
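A much-simplified sketch of the graph-based attribution idea follows (this is not the TLS2trees implementation; the kNN graph construction and the `k` and `max_edge` values are illustrative assumptions):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import dijkstra

def assign_to_stems(points, base_idx, k=8, max_edge=1.0):
    """Attribute each wood-classified point to the stem base that is nearest
    along a kNN graph, via shortest-path distances.

    points   : (N, 3) wood-classified points
    base_idx : indices of points taken as candidate stem bases
    max_edge : cap on edge length, loosely mimicking graph_edge_length
    """
    n = len(points)
    tree = cKDTree(points)
    dist, nbr = tree.query(points, k=k + 1)      # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    cols = nbr[:, 1:].ravel()
    vals = dist[:, 1:].ravel()
    keep = vals <= max_edge                      # drop overly long edges
    graph = coo_matrix((vals[keep], (rows[keep], cols[keep])), shape=(n, n))
    # Shortest path from every base; each point goes to its closest base.
    d = dijkstra(graph, directed=False, indices=base_idx)
    labels = np.argmin(d, axis=0)
    labels[np.isinf(d.min(axis=0))] = -1         # unreachable points
    return labels
```

Capping the edge length is what makes the attribution sensitive to occlusion gaps, which is exactly the trade-off the graph hyperparameters of TLS2trees control.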

Evaluation
Evaluating machine learning models and pipelines can be challenging, as it requires an appropriate set of metrics and a protocol for applying them in a repeatable and reliable way. While it is much easier to develop such a protocol if point-level matching between the clouds is ensured, there is still a way to compare results against the ground truth when that is not the case.
In the solution presented in this paper, the input data is down-sampled during pre-processing, leaving only a single random point per voxel. As a result, the number of output points is lower than the number of input points. Small distortions are also introduced by the several point-cloud conversions performed when mapping between data formats. We therefore provide a method based on the KNN algorithm for point matching. Our method consists of the following steps:

1. An algorithm for iterative tree elimination:
   (a) Find the biggest tree in the GT (ground truth).
   (b) Find the prediction (PD) with the biggest overlap.
   (c) Assign the GT tree to that PD tree and eliminate the PD tree.
   (d) Add the pair to a collection (dictionary).
2. Compute tree-level metrics based on the dictionary.

We aggregate results on multiple levels and use a set of common metrics such as the F1-score and IoU (Jaccard index) to assess the performance of the model at the point level. The residual height at the tree level (Eq. 5) is also calculated as the difference between the ground truth height (h_gt) and the predicted height (h_pred) of each tree.
The square root of the average squared difference between the ground truth heights (h_gt) and the predicted heights (h_pred) over a dataset, i.e. the RMSE, is given by Eq. 6. For large datasets, serial execution of the metrics is rather slow; thus, a parallel version was implemented and used for the experiments in this work.
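The matching and the metrics above might be sketched as follows; the greedy largest-first matching is our reading of the elimination algorithm, and the function names are hypothetical.

```python
import numpy as np

def match_and_score(gt_labels, pred_labels):
    """Greedy matching of predicted to ground-truth trees, largest GT first,
    then per-tree IoU and F1 over points (a sketch of the protocol)."""
    matches, used_pred = {}, set()
    gt_ids = sorted(set(gt_labels), key=lambda g: -(gt_labels == g).sum())
    for g in gt_ids:
        gt_mask = gt_labels == g
        # Predicted instance with the largest overlap that is still unassigned.
        cands = [p for p in set(pred_labels[gt_mask]) if p not in used_pred]
        if not cands:
            continue
        p = max(cands, key=lambda p: (gt_mask & (pred_labels == p)).sum())
        used_pred.add(p)
        pr_mask = pred_labels == p
        tp = (gt_mask & pr_mask).sum()
        matches[g] = {
            "pred": p,
            "iou": tp / (gt_mask | pr_mask).sum(),
            "f1": 2 * tp / (gt_mask.sum() + pr_mask.sum()),
        }
    return matches

def height_rmse(h_gt, h_pred):
    """Eq. 6: sqrt of the mean squared height difference over the dataset."""
    diff = np.asarray(h_gt) - np.asarray(h_pred)
    return float(np.sqrt(np.mean(diff ** 2)))
```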

Optimization
This work proposes an optimization protocol based on a Bayesian approach (Brochu et al., 2010; Shahriari et al., 2016; Snoek et al., 2012). It is a sequential method that gradually explores the space of hyperparameters, focusing on the most promising manifolds within it. This method is especially suitable for applications where each iteration is time-consuming, as is the case with processing the large volumes of point cloud data presented in this work. In particular, we compute the F1-score (Eq. 3) of the point cloud instance segmentation as a function of the hyperparameters.
The method works by constructing a probabilistic model, typically a Gaussian Process (GP), to represent the unknown function, and then using an acquisition function to balance exploration and exploitation when deciding on the next point to sample. The objective is to find the global optimum with as few evaluations as possible. The GP is defined by a mean function µ(x) and a covariance function k(x, x′), which together describe the function's behavior. The choice of kernel function is crucial for the performance of Bayesian optimization, as it encodes the prior belief about the function's smoothness. A commonly used kernel is the squared exponential kernel

k(x, x′) = σ_f² exp(−‖x − x′‖² / (2l²)),

where σ_f² represents the signal variance and l is the length-scale parameter; x and x′ are the input vectors for which we want to compute the covariance. They are multi-dimensional and model the tree hyperparameter setup.
In the Bayesian optimization framework, we start with a prior distribution over the unknown function, and after each evaluation we update our beliefs using Bayes' rule. This results in a posterior distribution, which is used to guide the search for the global optimum (Brochu et al., 2010).
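As an illustration of this loop, below is a toy one-dimensional maximization using the squared exponential kernel above and an expected-improvement acquisition function. This is not the actual Point2Tree optimizer, which explores a multi-dimensional hyperparameter space; all names and settings here are ours.

```python
import numpy as np
from scipy.stats import norm

def sq_exp_kernel(a, b, sigma_f=1.0, length=0.2):
    """Squared exponential kernel: sigma_f^2 * exp(-|x - x'|^2 / (2 l^2))."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return sigma_f ** 2 * np.exp(-d2 / (2 * length ** 2))

def bayes_opt(f, bounds=(0.0, 1.0), n_init=3, n_iter=15, noise=1e-6, seed=0):
    """Minimal 1-D Bayesian optimization: GP posterior + expected improvement."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(*bounds, n_init)
    y = np.array([f(x) for x in X])
    grid = np.linspace(*bounds, 200)             # candidate points
    for _ in range(n_iter):
        K = sq_exp_kernel(X, X) + noise * np.eye(len(X))
        Ks = sq_exp_kernel(grid, X)
        mu = Ks @ np.linalg.solve(K, y)          # posterior mean
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        sd = np.sqrt(np.clip(var, 1e-12, None))  # posterior std. deviation
        # Expected improvement over the current best (maximization).
        best = y.max()
        z = (mu - best) / sd
        ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)
        x_next = grid[np.argmax(ei)]
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()
```

In Point2Tree the role of `f` is played by a full pipeline run returning the dataset-level F1-score, which is exactly why each evaluation is expensive and sample-efficient search matters.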
The instance segmentation stage is composed of multiple modules which contain a series of hyperparameters that should be optimized to reach the best possible performance of the model. The most important ones are depicted in Fig. 5 and listed, together with their ranges, in Table 4 (see Figure 5 and Wilkes et al. (2022) for parameter definitions).
The chosen values of the hyperparameters cover the most promising and useful ranges. It is worth noting that the choice and the number of parameters affect the performance of the optimization algorithm. Consequently, they should be picked according to the specific profile of the forest dataset in question.
The optimization process of Point2Tree involves many iterations of the pipeline execution with a distinct set of parameters.Therefore, applying a well-structured protocol to address this process is reasonable.
In each iteration of the optimization process, the F1-score is derived from the complete dataset, and the algorithm maximizes its value over the steps of the execution.
The optimization is done by maximizing the F1-score for the entire set. The overall F1-score is calculated using a three-level protocol:

Algorithm 1 F1-score calculation
1: for plot in dataset do
2:     for tree in plot do
3:         Compute F1-score
4:     end for
5:     Aggregate F1-score per plot
6: end for
7: Aggregate F1-score per dataset

Based on the F1-score, the optimization algorithm guides the next steps of the optimization. In our research and experiments we have noticed that it is possible to improve the optimization results by decomposing the process into several stages. After the initial stage (e.g. 40 runs), it is possible to stop the optimization and restart it for a limited set of the most important parameters.
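A minimal sketch of this tree-to-plot-to-dataset aggregation, assuming the per-level aggregate is a mean (the text does not specify the aggregation operator):

```python
import numpy as np

def dataset_f1(per_tree_f1):
    """Aggregate tree-level F1-scores to plot level, then to dataset level.

    per_tree_f1 : dict mapping plot id -> list of per-tree F1-scores
    Returns (dataset_f1, per_plot_f1).
    """
    plot_scores = {plot: float(np.mean(scores))
                   for plot, scores in per_tree_f1.items()}
    return float(np.mean(list(plot_scores.values()))), plot_scores
```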
Algorithm 2 Optimization algorithm -- two-stage protocol
Require: Initial parameters
1: Run initial optimization
2: Select fewer than 4 parameters of the highest importance
3: Run optimization for the selected parameters
Ensure: Optimized parameters

Point2Tree provides a module for assessing the importance of hyperparameters in the optimization process. The results are presented in the supplementary material.

Final validation
After completing the optimization, we evaluated the best set of hyperparameters for both Point2Tree (P2T) with fsct semantic and with p2t semantic. To validate our results, we compared them against the LAUTx dataset, a benchmark for instance segmentation (Tockner et al., 2022b). Aside from the F1-score, we evaluated additional metrics, including precision, recall, residual height, and the detection, commission, and omission rates. Finally, we provided comparisons between the regular model with the standard parameters used in Wilkes et al. (2022) and the optimized set of parameters.

Semantic segmentation performance
The newly trained p2t semantic model achieved precision, recall, and F1-score of 0.92. The F1-score ranking per class (vegetation, terrain, and stem) reflected the proportions of these classes in the training data, as shown in Table 5. Vegetation was the most common class, representing over 61 % of the dataset, while CWD and stem were the least common, accounting for only 0.34 % and 13.6 % of the dataset, respectively. The confusion matrix for the p2t semantic model is shown in Figure 6.

Instance segmentation optimization
We applied the Bayesian flow for the optimization of the instance segmentation hyperparameters (Brochu et al., 2010) and considered all the hyperparameters presented in Tab. 4. We optimized the hyperparameters for (1) TLS2Trees Wilkes et al. (2022), which uses the semantic segmentation from FSCT, i.e. fsct semantic (Krisanski et al., 2021), and (2) Point2Tree, which uses the semantic segmentation model developed in this paper and the instance segmentation framework from TLS2Trees, i.e. tls instance.
The optimization was interrupted after 50 iterations, as only minor marginal improvements were observed for both tested models (see Fig. 7). Interestingly, when considering all iterations, the rate of growth of the F1-score over the number of iterations was negative for the optimization of Point2Tree (slope of -0.0002), while it was slightly positive for TLS2Trees (slope of 0.0002). These numbers indicate that Point2Tree was more robust to variations in the choice of hyperparameters. Such a property is desirable and might be explained by the definition of the instances being more robust when based only on clean stems rather than on a class merging stems and woody branches.
The analysis of the importance of the different hyperparameters on the F1-score revealed differences between the two approaches (see Fig. 12). In particular, we found the following for the respective hyperparameters:

• find stem height: there was a contrasting effect between Point2Tree with p2t semantic and with fsct semantic. In the latter, find stem height was the most important hyperparameter, while it was the least important for the p2t semantic approach. The Point2Tree results showed that the fsct semantic segmentation approach selected higher slices (approx. 1.75 m above ground), whereas p2t semantic preferred close-to-the-ground slices (approx. 0.8 m). The need for fsct semantic to search for tree instances higher up the stem might be due to larger noise from low branches and low vegetation in the wood class used for clustering the instances. In this context, p2t semantic proved more robust in filtering out low-vegetation and non-stem points.
• find stems thickness: for this hyperparameter, which defines the thickness of the slice used for clustering the instances, the two approaches also behaved differently: the P2T-fsct semantic approach tended to select narrower slices (approx. 20 cm), whereas P2T with p2t semantic preferred wider slices (approx. 0.75 m). The selection of narrower search windows in the P2T-fsct semantic approach, coupled with the selection of higher slices, is needed to reduce the noise due to branches and low vegetation. Using wider slices allows more extensive portions of a tree stem to be included, thus increasing the chance of detecting single instances.
• find stem min points: this was the most crucial hyperparameter for the P2T-p2t semantic approach and was negatively correlated with the F1-score (-0.4), meaning that the preferred minimum number of points to trigger a new instance was 50-100 points, rather than the 200 points in the P2T-fsct semantic default values. On the other hand, for the P2T-fsct semantic approach, this hyperparameter was the second least important and negatively correlated with the F1-score (-0.26), preferring values between 100 and 120 points.
• graph edge length: this was the second most important hyperparameter for the P2T-p2t semantic approach but was very weakly correlated (0.008) with the F1-score. The preferred value for the P2T-p2t semantic model was around 1 m, the same as the default value in TLS2trees. For the P2T-fsct semantic approach, the most suitable values were in the 0.4-0.6 m range. This parameter is the maximum length an edge in the graph can have. If it is set to a larger value, then disconnected points (occlusion) can be connected, although this may bridge gaps between trees. It relates to the flexibility of the graph growth, and the P2T-fsct semantic approach is more rigid.
• graph maximum cumulative gap: Fig. 12 shows that this hyperparameter has a contrasting effect on the P2T-p2t semantic and P2T-fsct semantic approaches. In the P2T-p2t semantic approach, it has a relatively high positive correlation (0.25) and moderate importance, which means that long gaps are accepted, resulting in additional stems and branches being included (see Wilkes et al. (2022), Fig. 7). This is acceptable and desirable in the P2T-p2t semantic case since we skip branches and focus only on trunks. In the P2T-fsct semantic method, smaller values of graph maximum cumulative gap are preferred, which may be considered an attempt to reduce the noise in the form of small branches and stems. The parameter is also important in P2T-fsct semantic.
• add leaves voxel length: for this parameter, the impact is again contrasting. In the P2T-p2t semantic approach, the parameter is quite important (approx. 0.16) and slightly positively correlated. These values indicate that the voxel size affects the output F1-score of our method. This relation is expected because our method is based on trunk modeling without branches, so the size of the leaf voxel matters. On the other hand, manipulating this parameter does not lead to large gains in model performance. Conversely, this parameter has a negative correlation (approx. -0.35) in the P2T-fsct semantic method but even smaller importance. This lack of impact can be explained by the fact that, since the P2T-fsct semantic approach is based on the branches, the size of the leaf voxels is not as critical; it is also preferred to be low to give more freedom to the graph-construction algorithm in TLS2trees.
• add leaves edge length: this parameter has a low correlation and low importance in both approaches. Its overall impact on the output is relatively low.
• slice thickness: this parameter is unimportant in both methods. However, it is worth noting that in the P2T-fsct semantic method it has a more negative correlation. Therefore, in the P2T-fsct semantic approach it is better to have smaller slices, which can be beneficial since the algorithm operates on branches. On the other hand, in the P2T-p2t semantic approach, trunks are more distinguishable owing to the absence of branches, so the TLS2trees algorithm can allow larger slices, which is reflected in the correlation of the parameter.
A more detailed examination of Fig. 12 reveals two P2T-fsct semantic and five P2T-p2t semantic hyperparameters in the right part of the plot. This means that for the P2T-fsct semantic approach there are two important and contrasting parameters, namely graph maximum cumulative gap and find stem height, whereas for the P2T-p2t semantic method there are five of them. Furthermore, in the case of the P2T-p2t semantic approach, the optimization landscape is more diffuse, since five less contrasting parameters must be manipulated.

Metrics evaluation

On test data from this study
The evaluation of the metrics computed against the test data (see Tab. 7) revealed that using optimal parameters resulted in a performance boost compared to using default values, for both P2T with p2t semantic and with fsct semantic. In line with the optimization findings, the magnitude of the improvement for P2T-fsct semantic was twice as large (0.08 F1-score increase) compared to P2T-p2t semantic (0.04 F1-score increase). When optimized, both approaches reached similar levels of F1-score. Despite the marginal differences, the P2T with p2t semantic approach resulted in a smaller residual height of 0.87 m, an RMSE of 3.47, and a lower detection rate than P2T-fsct semantic. The analysis of the false positive rates indicated that both approaches tended to over-segment the point clouds, and such behavior was more prominent in P2T-p2t semantic. On the other hand, it is essential to highlight that, while having larger commission errors, the P2T-p2t semantic model reduced the false negatives (i.e., omitted trees) to nearly zero.
The experiments presented in Tab. 7 were conducted for the set of the best hyperparameters obtained in the optimization process. The set is given in Tab. 8.

On the LAUTx data
The results from applying the P2T with fsct semantic and P2T with p2t semantic pipelines to the LAUTx data, for each of the different sets of hyperparameters, revealed that both approaches performed as consistently as, or even better than, what was found for our initial test data (see Tab. 9).
Interestingly, the default parameters were more suitable than the optimized hyperparameters, highlighting the need for more extensive and varied datasets of annotated plots for more robust optimization. Alternatively, the optimization process can be done separately for each dataset, but this requires that a part of the dataset is labeled so that the Point2Tree pipeline can adjust the hyperparameters of the instance segmentation.
The P2T with p2t semantic approach performs slightly better than P2T with fsct semantic. This result may stem from the p2t semantic model focusing more on the trunks, while the LAUTx dataset (Tockner et al., 2022b) is composed of tall and mostly separated trees.
Sample results of the performance of the P2T-p2t semantic on the LAUTx dataset Tockner et al. (2022b) are provided in Fig. 8.
In order to visualize the performance of the metrics used in the experiments, we have provided a series of plots for results of different quality, given in Fig. 9, 11 and 10. We can see that the number of artifacts grows as the metric values go down. It is also worth noting that some artifacts are specific to a given dataset's labeling process. For instance, as seen in Fig. 9, 11 and 10, the bottoms of the tree trunks were labeled very deeply, such that the labeling accounts for a part of the ground. This is not the case for the data the p2t semantic model was trained on; thus it is hard to achieve perfect metrics on the LAUTx dataset (Tockner et al., 2022b).

Conclusions
The study presents a new framework for point cloud instance segmentation. The framework consists of a series of components structured in a flexible way, which allows selected parts of the pipeline (e.g., instance segmentation) to be replaced with new or alternative modules. The new modules may be implemented in other languages (e.g., C++, Java, etc.) and still be integrated. The framework is also equipped with an optimization module and a visualization of parameter importance.
We also tested the effect of tuning the hyperparameters of the TLS2trees instance segmentation pipeline, initially developed mainly for tropical forests, to optimal settings for coniferous forests. Further, we tested the effect on tree instance segmentation accuracy of using a semantic segmentation model specifically designed to focus on identifying the stems of coniferous trees. Our study found that the hyperparameter tuning positively affected the segmentation output quality on our data. However, when applying the same parameters to the external LAUTx dataset, the performance was poorer than with the default settings. This result indicates that, to estimate a more robust and transferable set of hyperparameters, we need to develop more extensive databases of openly available annotated point cloud data spanning a broader range of forest types than those used in this study. When optimized, the effect of using different semantic segmentation models (i.e., P2T with p2t semantic and with fsct semantic) was marginal. However, it is also true that the instance segmentation relying on the p2t semantic model seemed less sensitive to the choice of hyperparameters and thus more robust in dense forests or forests with many low branches (e.g., non-self-pruning species).
Due to the architecture of the instance segmentation algorithm, there is a set of hyperparameters that are not acceptable and lead to a collapse of the pipeline (one or several plots break). This imposed an additional constraint on the choice of protocol for hyperparameter tuning, i.e., a Bayesian approach, for which the space of the hyperparameters does not have to be convex.
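Because some hyperparameter combinations crash the pipeline, the objective function handed to the optimizer has to tolerate failures rather than abort the whole search. The pattern can be sketched as follows; this is a minimal illustration using a plain random search as a stand-in for the actual Bayesian optimizer, with a hypothetical `run_pipeline` function and a hypothetical `slice_thickness` hyperparameter (neither is taken from the Point2Tree code).

```python
import random

def run_pipeline(params):
    """Hypothetical stand-in for one instance-segmentation run;
    returns an F1-score, or raises if one or several plots break."""
    if params["slice_thickness"] < 0.1:  # illustrative failure region
        raise RuntimeError("pipeline collapsed on one or several plots")
    return 1.0 - abs(params["slice_thickness"] - 0.4)

def safe_objective(params):
    """Wrap the pipeline so invalid hyperparameters score worst (0.0)
    instead of terminating the optimization."""
    try:
        return run_pipeline(params)
    except RuntimeError:
        return 0.0

def tune(n_trials=50, seed=0):
    """Search the hyperparameter space, keeping the best surviving trial."""
    rng = random.Random(seed)
    best_params, best_f1 = None, -1.0
    for _ in range(n_trials):
        params = {"slice_thickness": rng.uniform(0.0, 1.0)}
        f1 = safe_objective(params)
        if f1 > best_f1:
            best_params, best_f1 = params, f1
    return best_params, best_f1
```

Penalizing crashed trials with the worst possible score lets an optimizer explore a non-convex, partially infeasible hyperparameter space without any trial terminating the search, which is the property that motivated the Bayesian approach here.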

Figure 1: The difference in forest structures included in the sample plots by tree species.

Figure 2: Visualization of the instance and semantic annotation with a detail showing how the stem class was separated from the rest of the crown.

Figure 3: Block diagram of the modular architecture of the processing pipeline.

Figure 4: Schematic representation of the adopted workflow; the tested values are shown in Tab. 4.

Figure 5: Schematic representations of the meaning of each different tls instance hyperparameter tested in this study, including hyperparameters related to the identification of the single tree instances both for p2t semantic (a) and fsct semantic (b) semantic segmentation models; the drawing of the stem/wood instance graph (c); and the drawing of the leaves graph (d).

Figure 7: F1-score across the optimization iterations of instance segmentation with two different semantic segmentation models, i.e., fsct semantic and p2t semantic.

Figure 12: Comparison of parameter importance and correlation for P2T with p2t semantic and P2T with fsct semantic.

Figure A.13: Visualization of the annotated plots for the semantic and instance segmentation.

Table 1: Summary statistics for the selected plots.

Table 2: Components of the pipeline and their properties. *This is fsct semantic but trained on our data.

Table 3: The parameters used in the training process for the p2t semantic module.

Table 5: Number of point-cloud points in the dataset used for training the semantic segmentation model p2t semantic.

Table 6: Class-wise metrics for p2t semantic semantic segmentation.

Table 7: Instance segmentation results on the test dataset.

Table 8: Set of the best parameters for both of the models.