Article

Improving Aerial Targeting Precision: A Study on Point Cloud Semantic Segmentation with Advanced Deep Learning Algorithms

1 Department of Geomatics Engineering, Istanbul Technical University, Maslak, 34469 İstanbul, Türkiye
2 Baykar Technology, Baykar Makina San. ve Tic. A.Ş., Orhangazi Mah., Hadımköy-İstanbul Cad., No:258, Esenyurt, 34538 İstanbul, Türkiye
* Author to whom correspondence should be addressed.
Drones 2024, 8(8), 376; https://doi.org/10.3390/drones8080376
Submission received: 23 June 2024 / Revised: 21 July 2024 / Accepted: 21 July 2024 / Published: 6 August 2024

Abstract

The integration of technological advancements has significantly impacted artificial intelligence (AI), enhancing the reliability of AI model outputs. This progress has led to the widespread utilization of AI across various sectors, including automotive, robotics, healthcare, space exploration, and defense. Today, air defense operations predominantly rely on laser designation. This process is entirely dependent on the capability and experience of human operators. Considering that UAV systems can have flight durations exceeding 24 h, this process is highly prone to errors due to the human factor. Therefore, the aim of this study is to automate the laser designation process using advanced deep learning algorithms on 3D point clouds obtained from different sources, thereby eliminating operator-related errors. Two different data sources were identified: dense 3D point clouds containing color information produced with photogrammetric methods, and point clouds produced with LiDAR systems. The photogrammetric point cloud data were generated from images captured by the Akinci UAV’s multi-axis gimbal camera system within the scope of this study. For the point cloud data obtained from the LiDAR system, the DublinCity LiDAR dataset was used for testing purposes. The segmentation of point cloud data utilized the PointNet++ and RandLA-Net algorithms. Distinct differences were observed between the evaluated algorithms. The RandLA-Net algorithm, relying solely on geometric features, achieved an approximate accuracy of 94%, while integrating color features significantly improved its performance, raising its accuracy to nearly 97%. Similarly, the PointNet++ algorithm, relying solely on geometric features, achieved an accuracy of approximately 94%. Notably, the model developed as a unique contribution in this study involved enriching the PointNet++ algorithm by incorporating color attributes, leading to significant improvements with an approximate accuracy of 96%. The obtained results demonstrate a notable improvement in the PointNet++ algorithm with the proposed approach. Furthermore, it was demonstrated that the methodology proposed in this study can be effectively applied directly to data generated from different sources in aerial scanning systems.

1. Introduction

Recent technological advancements have had a profound impact on the field of artificial intelligence (AI), instilling greater confidence in the outcomes produced by AI models. Consequently, AI has found widespread application across diverse sectors such as automotive, robotics, healthcare, space exploration, and defense, owing to its adeptness in swiftly rendering decisions based on comprehensive data analyses. Point clouds, originating from various methods like RADAR (Radio Detection and Ranging), LiDAR (Light Detection and Ranging), or photogrammetry, have gained paramount importance, notably within the defense and automotive industries. This significance is underscored by the incorporation of high-performance gimbal or RADAR systems into aerial vehicles. In the defense sector, AI has been pivotal in tasks encompassing target recognition, environmental situational awareness, and terrain analysis. This shift from rule-based methodologies to AI-driven paradigms is underpinned by the inherent intricacies associated with point cloud data processing [1]. Semantic segmentation of point clouds, constituting a pivotal realm of research, has witnessed the ascendancy of deep learning techniques. Although existing studies predominantly concentrate on point clouds generated through LiDAR technologies, those stemming from Unmanned Aerial Vehicles (UAVs) remain relatively underrepresented. This underrepresentation is somewhat surprising given the higher data density and photogrammetric processing prerequisites unique to UAV-derived point clouds. It is incumbent upon the research community to address this disparity, especially considering the extensive deployment of UAVs within defense contexts. Of notable significance are gimbal camera systems, which play instrumental roles in laser-guided munition deployments. In these scenarios, gimbal operators wield laser designation to guide munitions with precision. The primary objective of this study is to automate the segmentation and detection of 3D objects in point cloud data, which vary in features and densities across different sources. This study focuses on dense point clouds containing color information produced using photogrammetric methods and data obtained from LiDAR systems, which do not contain color information and are less dense. For photogrammetric point cloud data, image acquisition flights were conducted using the Akinci UAV’s multi-axis gimbal camera system within the scope of this study. The acquired images were processed using photogrammetric methods to produce dense point clouds containing color information. For the evaluation of point clouds obtained from LiDAR systems, the DublinCity [2] LiDAR dataset was used. To perform 3D object segmentation, popular 3D deep learning approaches—the PointNet++ [3] and RandLA-Net [4] algorithms—were chosen due to their success in various studies for different purposes. In this study, the results obtained from the algorithms were compared, and a new approach based on point-wise color and position information was proposed specifically to improve the segmentation performance of the PointNet++ algorithm. In conclusion, considering the overall evaluation of the results presented in Section 4, it is believed that this study will contribute to both defense industry operations and academia.

2. Related Works

Recent advancements in deep learning algorithms have catalyzed the proliferation of algorithms in the domain of point cloud processing. Notable among these algorithms are PointNet++ and RandLA-Net, distinguished for their computational efficiency and high accuracy, as evidenced by various studies [4,5]. The selection of PointNet++ and RandLA-Net as the focal point of this study stems from these promising outcomes. Within the realm of point cloud processing, in addition to PointNet++ and RandLA-Net, various other artificial intelligence algorithms have found utility. Lei et al. [6] introduced a framework founded on fuzzy kernel separation for 3D point cloud segmentation. This framework incorporates the SegGCN (Segmentation Graph Convolutional Network) structure for segmenting point clouds, integrating fuzzy logic into discrete convolutional kernels tailored for 3D point cloud data. Challenges were encountered during segmentation, particularly in delineating areas characterized by small geometries. In a study by Liu et al. [7], the under-utilization of semantic information in 3D data was addressed. They proposed the Point Context Encoding (PointCE) structure, aimed at integrating semantic information into 3D point clouds. Additionally, they introduced the Semantic Context Encoding Loss (SCE Loss) controller to guide the learning of semantic context features, obviating extensive hyperparameter tuning while ensuring high accuracy. Fan et al. [8] focused on learning spatial features, introducing the Spatial Contextual Features (SCF) structure, distinct from that proposed by Liu et al. [7]. SCF is designed to work harmoniously with different algorithms, facilitating evaluations with diverse algorithm integrations and datasets. Cheng et al. [9] recognized the high costs associated with labeling in 3D point cloud segmentation and the sensitivity of current algorithms to label information. They proposed SSPC-Net, a semi-supervised, semantic point cloud segmentation network. SSPC-Net predicts points based on their semantic significance, employing 3D superpoints and superpoint graphs for partially predicting unlabeled data. Atik and Duran [10] proposed an approach for enhancing deep learning-based 3D semantic segmentation results, leveraging 3D geometric features and filter-based feature selection with algorithms like RandLA-Net and superpoint graph. Hegde and Gangisetty [11] introduced PIG-Net, an inception-based architecture for characterizing local and global features of 3D point clouds, demonstrated on the ShapeNet and PartNet datasets. Jiang et al. [12] proposed the 3D PointSIFT algorithm, inspired by 2D SIFT, and verified its performance against prominent 3D segmentation algorithms using the S3DIS and ScanNet datasets. In a separate work by Duran et al. [13], machine learning algorithms were applied to photogrammetric and LiDAR data, integrating color information into segmentation processes, with the MLP algorithm yielding the highest accuracy. Wu et al. [14] proposed a novel network architecture for indoor point cloud semantic segmentation, addressing the limitations of existing methods that focus primarily on complex local feature extraction, neglecting global features. Their method, based on anisotropic separable set abstraction (ASSA), includes an improved ASSA module for enhanced local feature extraction, an inverse residual module to improve global feature extraction, and a mixed pooling method to fuse coarse- and fine-grained features. Lin et al.
[15] tackled the common issue of over- and under-segmentation arising from nonhomogeneous objects of interest and uneven sampling densities. Their approach combines conditional and voxel filtering to reduce the spatial range and amount of point cloud data. They further segmented the spatial range into a concentric circular grid to simplify data processing and employed a dynamic threshold model for accurate ground point identification, particularly on uneven, broken, and sloped roads. Additionally, they utilized a point cloud homogeneity model to enhance ground point identification in vegetated areas. Ozturk et al. [16] proposed a feature-wise fusion strategy of optical images and LiDAR point clouds to enhance road segmentation performance. Using high-resolution satellite images and LiDAR data, they trained a deep residual U-Net architecture, improving prediction statistics across different ResNet backbones. The integration of optical images and LiDAR point clouds enhances road segmentation performance, particularly in woodland and shadowed areas.
Despite the numerous studies conducted, there remains a limited number of research efforts that address operational problems while incorporating UAV systems and making substantial algorithmic contributions. Particularly notable is the scarcity of studies that effectively bridge the gap between academic research and industrial applications. This study addresses these gaps by introducing several key contributions. First, it presents a novel enhancement of the PointNet++ algorithm by integrating color attributes, which significantly improves segmentation accuracy. Second, it demonstrates the application of advanced deep learning techniques to automate 3D object segmentation and detection in aerial defense operations, a critical area of interest. Lastly, the study provides a comprehensive evaluation using both photogrammetric and LiDAR-derived point cloud data, offering practical insights into the use of UAV systems for data acquisition and processing. These contributions not only advance the field of 3D point cloud segmentation but also offer valuable solutions for real-world applications involving UAV technology.

3. Materials and Methods

3.1. Study Area

As part of the study, the flights required for generating the photogrammetric 3D point cloud data were conducted using the Akinci UAV produced by Baykar Technology in İstanbul, Türkiye. The selected study area is the district of Çorlu, located in Tekirdağ province, Türkiye. The study area covers approximately 5 km², featuring multi-story regular structures. A visual representation of the study area is presented in Figure 1.

3.2. Data

3.2.1. DublinCity LiDAR Point Cloud

The DublinCity dataset was created in 2015 in the city center of Dublin, the capital of Ireland, using the Aerial Light Detection and Ranging (A/LiDAR) method. The flights were conducted via a helicopter at an approximate altitude of 300 m over an area of 5.6 km² in the city center of Dublin. The resolution of the obtained point cloud data varies between 250 and 348 points per square meter. The dataset was generated through the collection and labeling of LiDAR data on a city-wide scale. It consists of 13 hierarchically organized classes, grouped under four main categories: building, vegetation, ground, and undefined. Within these main categories, there are subcategories, such as window, door, tree, and others. A hierarchical representation of the DublinCity dataset is shown in Figure 2.
The DublinCity dataset encompasses approximately 100,000 datasets and a vast collection of 260 million points. The data density is quantified at an average of 384.43 points per square meter [17]. An illustrative example of the DublinCity dataset is depicted in Figure 3.
The fundamental objective of the conducted study is to perform the segmentation of outdoor objects visible to aerial vehicles. The choice of the DublinCity LiDAR dataset is attributed to its comprehensive annotation of outdoor objects, making it one of the largest labeled datasets available for this purpose.

3.2.2. Photogrammetrically Generated Point Cloud

The data used in this study were collected using the Akinci UAV developed by Baykar Technology Inc. The data collection flights were conducted from an altitude of approximately 5000 ft (1524 m) due to flight permit restrictions. The acquisition of the required test data took approximately 2 h. During the data collection process, the UAV followed predefined circular flight paths with radii of 1 km and 2 km over the study area. The UAV’s gimbal camera system conducted scans in three axes over the region. These scans were conducted using an RGB daylight camera, capturing video recordings at a frame rate of 30 frames per second (FPS). Finally, images of the study area were obtained from the recorded video at a reduced rate of 5 FPS. The primary purpose of reducing the frame rate was to eliminate highly similar images, thereby minimizing the time and hardware requirements during network training.
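Frame decimation of this kind is straightforward to script. The sketch below illustrates the idea with OpenCV, keeping every sixth frame to reduce a 30 FPS recording to roughly 5 FPS; the file paths and function name are placeholders rather than the actual processing chain used for the flight data.

```python
import os
import cv2

def extract_frames(video_path: str, out_dir: str, source_fps: float = 30.0, target_fps: float = 5.0) -> int:
    """Save every n-th frame so the output rate approximates target_fps."""
    os.makedirs(out_dir, exist_ok=True)
    step = max(1, round(source_fps / target_fps))  # 30 / 5 -> keep every 6th frame
    cap = cv2.VideoCapture(video_path)
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Hypothetical usage:
# extract_frames("gimbal_recording.mp4", "frames_5fps")
```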
The utilized gimbal camera system is outfitted with a range of sensors tailored for distinct purposes. Within the framework of this study, the flights were carried out during daytime conditions, necessitating the utilization of the Electro-Optic Wide (EOW) sensor. The EOW sensor functions across an array of focal lengths, spanning from 8.6 to 154 mm. Furthermore, under standard lighting circumstances, the sensor offers a digital field of view encompassing angles between 1.57° and 27.6°, along with an analog field of view ranging from 1.05° to 18.6°. During low-light scenarios, the sensor’s effective field of view extends from 2.38° to 40.8°. Detailed information concerning other sensors remains undisclosed, primarily for security reasons.
The production of the point cloud involved a total of 1349 images and the utilization of 27 uniformly distributed ground control points. Due to the considerations of defense industry requirements and scenarios, field surveys were not conducted. Consequently, the ground control points were established based on orthophoto images with a ground sampling distance of 30 cm, provided by the General Directorate of Mapping. Height information was derived from the GNSS (Global Navigation Satellite System) and laser data of the aerial vehicle. At the conclusion of the process, a point cloud dataset containing 12,923,148 points was generated. This point cloud dataset has a resolution of 328 points per square meter (p/m2).
In the context of photogrammetric methods for generating three-dimensional point clouds, the initial step involves photogrammetric flight planning. However, when considering defense industry products, gimbal systems exhibit significantly higher capabilities compared to civilian equipment. Furthermore, flight missions often encompass multiple objectives simultaneously, necessitating the assessment and analysis of acquired data in this multifaceted manner. Given these considerations, photogrammetric approaches bring forth a variety of challenges. Examples of such challenges include the differences between the gimbal axes and the aerial vehicle’s axes, the incompatibility of end-user analysis programs with photogrammetric assumptions, and the need for laborious, computationally expensive manual calculations. In this study, efforts were directed toward resolving the aforementioned challenges during the point cloud data generation process.
To address these challenges, calculations were performed on the aerial vehicle’s image center coordinates to remedy issues. Utilizing the aerial vehicle’s reference coordinates, the image center coordinates for each image were computed. Subsequently, based on these computed center coordinates, the gimbal’s orientation data during the image capture moment were determined. This procedure facilitated the transition from the aerial vehicle’s coordinate system to the gimbal’s coordinate system. Ultimately, the acquired images, computed image center coordinates, and orientation data were processed using the trial version of the DJI Terra software V4.2.0 [18]. This integration culminated in the completion of the three-dimensional point cloud generation process. DJI Terra is a software tool capable of generating 3D point clouds from aerial photographs using photogrammetric methods. In this study, since the primary objective is the segmentation of 3D point clouds, the DJI Terra program was used in the point cloud production process. Software tools such as DJI Terra predominantly facilitate the generation of photogrammetric point clouds through the utilization of Structure-from-Motion (SfM) algorithms. SfM is a computer vision technique that employs images captured from various perspectives to compute the three-dimensional structure of an object. Within the algorithm’s framework, the initial step involves the identification of corresponding points among the images. These corresponding points typically encompass distinctive features, such as sharp edges, corners, or descriptive regions, often detected using algorithms like SIFT or SURF. As a result of this process, the algorithm establishes which points in different images correspond to the same scene. Subsequently, for each image, the perspective-n-point (PnP) problem is solved to estimate the camera’s position and orientation. The PnP problem involves determining the pose of a calibrated camera given a set of 3D points in the world and their corresponding 2D projections in the image. Optimization techniques such as RANSAC or Levenberg–Marquardt are commonly employed to solve PnP problems. Following the PnP estimation, a reverse-projection process is applied to the 2D corresponding points to generate the triangulation models of 3D points. This iterative process is conducted using multiple images to progressively refine camera pose and orientation parameters, thereby minimizing errors inherent in these parameters [19]. A visual depiction illustrating the working principle of the SfM algorithm is presented in Figure 4.
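For orientation, the following is a minimal two-view sketch of the SfM steps described above (feature matching with SIFT, RANSAC-based pose estimation, and triangulation) using OpenCV. It assumes a known intrinsic matrix K and is only an illustration of the principle, not the DJI Terra pipeline or the gimbal-axis transformation applied in this study.

```python
import cv2
import numpy as np

def two_view_reconstruction(img1_path: str, img2_path: str, K: np.ndarray):
    """Minimal SfM step: match features, recover relative pose, triangulate 3D points.

    K is the 3x3 camera intrinsic matrix obtained from calibration (assumed known here).
    """
    img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)

    # Detect and describe corresponding features
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Brute-force matching with Lowe's ratio test
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Essential matrix with RANSAC, then the relative camera pose (R, t)
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)

    # Triangulate the correspondences into 3D points
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    pts3d = (pts4d[:3] / pts4d[3]).T
    return R, t, pts3d
```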
The illustrations depicting the coordinate axes where the transformation processes were conducted are presented in Figure 5. Sample images showcasing the results obtained prior to the transformations and the point cloud data after the transformations are provided in Figure 6.
After the point cloud generation, the labeling process was conducted manually using the open-source software CloudCompare Stereo 2.12.1 [20]. An example image depicting the labeling process is presented in Figure 7.

3.3. Semantic Segmentation Methods

3.3.1. PointNet++ Algorithm

PointNet++, an extension of PointNet, overcomes limitations in capturing fine details and complex structures [3]. It employs a hierarchical approach that iteratively uses PointNet on input point clouds, progressively learning local features with increasing contextual scales using metric space distances. The key innovation is its division of points into overlapping local regions, akin to CNNs, which enables the extraction of features at different scales. PointNet++ consists of three sub-layers—sampling, grouping, and PointNet layers—which collectively create a hierarchical structure for abstracting local features. This approach allows the network to effectively capture contextual information. The architecture of PointNet++ is visually presented in Figure 8, showcasing how the algorithm’s hierarchical structure processes and abstracts local features from the input point cloud.
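To make the sampling and grouping layers concrete, the following NumPy-only sketch shows one simplified set-abstraction step: farthest point sampling selects centroids, and a ball query collects each centroid's local neighborhood, to which a shared PointNet (MLP plus max pooling) would then be applied. It mirrors the idea of the architecture rather than reproducing the implementation used in this study.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Select n_samples indices that are maximally spread over the cloud (sampling layer)."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    distances = np.full(n, np.inf)
    selected[0] = np.random.randint(n)
    for i in range(1, n_samples):
        diff = points - points[selected[i - 1]]
        distances = np.minimum(distances, np.einsum("ij,ij->i", diff, diff))
        selected[i] = int(np.argmax(distances))
    return selected

def ball_query(points: np.ndarray, centroids: np.ndarray, radius: float, k: int):
    """For each centroid, gather up to k neighbour indices within the given radius (grouping layer)."""
    groups = []
    for c in centroids:
        d2 = np.sum((points - c) ** 2, axis=1)
        idx = np.where(d2 <= radius ** 2)[0][:k]
        groups.append(idx)
    return groups

# One set-abstraction step: sample 512 centroids, group neighbours within an assumed 0.2-unit radius;
# a shared PointNet (MLP + max pooling) would then abstract each group into a single feature vector.
cloud = np.random.rand(4096, 3).astype(np.float32)
centroid_idx = farthest_point_sampling(cloud, 512)
groups = ball_query(cloud, cloud[centroid_idx], radius=0.2, k=32)
```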

3.3.2. PointNet++ Enriched with Color Information

The PointNet++ algorithm is an artificial intelligence model capable of directly utilizing the geometric features of point clouds in both training and prediction processes [3]. In this study, the point cloud data generated through photogrammetry also includes color information. It was hypothesized that the neighborhood relationships and distinctiveness provided by color information would positively contribute to the PointNet++ algorithm. Therefore, the algorithm’s layers were customized to process not only the spatial information but also the color information of each point in the RGB color space. Finally, enhancements were made to obtain prediction data from the PointNet++ model trained with enriched color information. A detailed evaluation of the results obtained by incorporating color information is conducted in Section 4.3.2. When evaluating the results obtained from the study, it is demonstrated that the modifications made to the PointNet++ algorithm significantly increase prediction accuracy.
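The kind of change involved can be illustrated with a short PyTorch-style sketch: normalized RGB values are concatenated with the XYZ coordinates so that the first shared MLP receives six input channels instead of three. Layer names and sizes here are illustrative assumptions, not the exact modification applied to the Matlab implementation.

```python
import torch
import torch.nn as nn

class SharedMLP(nn.Module):
    """First per-point feature extractor; in_channels=6 when XYZ is concatenated with RGB."""
    def __init__(self, in_channels: int = 6, out_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, out_channels, kernel_size=1),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(),
        )

    def forward(self, x):  # x: (batch, in_channels, n_points)
        return self.net(x)

# Build the enriched per-point input: coordinates plus colors scaled to [0, 1]
xyz = torch.rand(1, 4096, 3)                                 # point coordinates
rgb = torch.randint(0, 256, (1, 4096, 3)).float() / 255.0    # point colors
features = torch.cat([xyz, rgb], dim=-1).transpose(1, 2)     # (1, 6, 4096)
out = SharedMLP(in_channels=6)(features)                     # (1, 64, 4096)
```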

3.3.3. RandLA-Net Algorithm

RandLA-Net (Randomized Local Aggregation Network) is a powerful 3D point cloud segmentation algorithm [4]. It utilizes a hierarchical approach to capture both local and global point cloud features. By randomly sampling and dividing the input points into overlapping regions, it learns local features through MLPs (Multi-Layer Perceptrons) and aggregates them using a unique random fusion strategy. The size of the overlapping area is determined by a predefined radius, typically set to a value that balances computational efficiency and segmentation accuracy, commonly in the range of 0.1 to 0.3 m depending on the dataset. Additionally, global features are learned and combined with local ones, resulting in accurate point cloud segmentation [21]. Figure 9 visually demonstrates RandLA-Net’s hierarchical feature learning and aggregation process for point cloud segmentation.
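A compact sketch of the two ideas highlighted above, random downsampling and the relative-position encoding of each point's neighborhood prior to attentive pooling, is given below. It is a simplification of the published architecture [4] for illustration only, with a brute-force neighbor search that would not scale to full scenes.

```python
import numpy as np

def random_sample(points: np.ndarray, features: np.ndarray, ratio: float = 0.25):
    """RandLA-Net downsamples by plain random sampling rather than farthest point sampling."""
    n = points.shape[0]
    idx = np.random.choice(n, int(n * ratio), replace=False)
    return points[idx], features[idx]

def relative_position_encoding(points: np.ndarray, neighbor_idx: np.ndarray) -> np.ndarray:
    """Encode each neighbourhood as [center, neighbor, offset, distance] (local spatial encoding)."""
    k = neighbor_idx.shape[1]
    neighbors = points[neighbor_idx]                        # (n, k, 3)
    centers = points[:, None, :].repeat(k, axis=1)          # (n, k, 3)
    offsets = neighbors - centers
    dists = np.linalg.norm(offsets, axis=-1, keepdims=True)
    return np.concatenate([centers, neighbors, offsets, dists], axis=-1)  # (n, k, 10)

# k-nearest neighbours per point (brute force, illustration only)
pts = np.random.rand(1024, 3).astype(np.float32)
d2 = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
knn_idx = np.argsort(d2, axis=1)[:, :16]
encoded = relative_position_encoding(pts, knn_idx)          # would be fed to an MLP + attentive pooling
sub_pts, sub_feat = random_sample(pts, encoded.mean(axis=1))
```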

3.3.4. Evaluation Metrics

In this study, the evaluations of the produced outputs were conducted using a sequence of metrics: accuracy [22], recall (true positive rate) [23], precision (positive predictive value) [24], F1 score [23], mean intersection over union (mean IoU) [25], weighted intersection over union (W. IoU) [23], and Kappa [23]. These metrics were derived from the confusion matrices obtained from the testing of the trained artificial intelligence networks. The equations for these metrics are presented in Equations (1)–(7).
$\text{Accuracy} = \dfrac{TP_c + TN_c}{TP_c + TN_c + FP_c + FN_c}$ (1)

$\text{Recall} = \dfrac{TP_c}{TP_c + FN_c}$ (2)

$\text{Precision} = \dfrac{TP_c}{TP_c + FP_c}$ (3)

$\text{F1 Score} = \dfrac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$ (4)

$\text{Mean IoU} = \dfrac{TP_c}{TP_c + FP_c + FN_c}$ (5)

$\text{W. IoU} = \dfrac{\sum_c w_c \cdot TP_c}{\sum_c w_c \cdot (TP_c + FP_c + FN_c)}$ (6)

$\text{Kappa} = \dfrac{TP_c + TN_c - FP_c - FN_c}{TP_c + TN_c + FP_c + FN_c}$ (7)
In the provided equations, TP represents true positive samples, TN represents true negatives, FP represents false positives, and FN represents false negatives. TP and TN signify correct model predictions, while FP and FN signify incorrect predictions. These metrics are used for classification model evaluation, with ‘c’ representing the class, and ‘w’ denoting class weight.
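All of these metrics can be computed directly from a confusion matrix. The sketch below follows the per-class definitions in Equations (1)–(7); taking the class support as the weight w_c for the weighted IoU is an assumption here, and the Kappa value is computed with Cohen's standard observed-versus-chance formulation, which may differ slightly from the simplified form in Equation (7).

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray) -> dict:
    """Derive accuracy, mean recall/precision/F1, mean IoU, weighted IoU and kappa
    from a confusion matrix whose rows are true classes and columns are predictions."""
    conf = conf.astype(np.float64)
    total = conf.sum()
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp

    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)

    weights = conf.sum(axis=1)  # assumed class weights: per-class point count (support)
    weighted_iou = (weights * tp).sum() / (weights * (tp + fp + fn)).sum()

    p_observed = tp.sum() / total  # overall accuracy
    p_expected = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {"accuracy": p_observed, "mean_recall": recall.mean(),
            "mean_precision": precision.mean(), "mean_f1": f1.mean(),
            "mean_iou": iou.mean(), "weighted_iou": weighted_iou, "kappa": kappa}

# Example with a hypothetical 3-class confusion matrix (building, ground, roof)
conf = np.array([[950, 30, 20],
                 [25, 900, 15],
                 [40, 10, 880]])
print(segmentation_metrics(conf))
```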

4. Results

In the scope of this research, initial developments and tests were conducted on the DublinCity dataset, detailed in Section 3.2.1, which was obtained with LiDAR, followed by the dataset produced using photogrammetric methods. The DublinCity dataset does not contain color information, so studies were conducted only based on positional information for both PointNet++ and RandLA-Net. On the other hand, the dataset produced using photogrammetric methods contains color information, so tests were evaluated separately under different headings. Due to the hierarchical data structure of the DublinCity LiDAR dataset, the same label number is used for different objects. Therefore, software was developed in the .Net environment to separate all independent objects with unique label data. With this software, the label numbers of objects with the same label number in different hierarchical clusters were made independent. Consequently, this prevented incorrect learning during network training.
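The .NET tool itself is not reproduced here, but the underlying idea is simple: points that share a label number in different hierarchical clusters are given globally unique label IDs so that distinct objects are not merged during training. A Python sketch of that relabeling, with illustrative field names, is shown below.

```python
import numpy as np

def make_labels_unique(cluster_ids: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Assign a globally unique label to each (hierarchical cluster, local label) pair.

    In a hierarchical scheme the same label number can denote different objects in
    different clusters; combining both identifiers removes that ambiguity before training.
    """
    pairs = np.stack([cluster_ids, labels], axis=1)
    _, unique_labels = np.unique(pairs, axis=0, return_inverse=True)
    return unique_labels.astype(np.int64)

# Example: label 3 appears in two different hierarchical clusters and becomes two labels
clusters = np.array([0, 0, 1, 1, 1])
labels = np.array([3, 3, 3, 5, 5])
print(make_labels_unique(clusters, labels))   # -> [0 0 1 2 2]
```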
The initial testing procedures were conducted using the PointNet++ algorithm developed in the Matlab R2022a environment and the RandLA-Net algorithm developed in the Python environment. The Matlab application is licensed under Istanbul Technical University’s educational license. The training and testing processes of the RandLA-Net and PointNet++ algorithms were guided by recommended parameters from the literature. The network parameters utilized for training and testing are presented in Table 1. Lastly, the same process was applied to the Akinci dataset, which was produced photogrammetrically and detailed in Section 3.2.2.
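For orientation, the parameters in Table 1 map onto a standard Adam configuration with a stepped learning-rate schedule, as sketched below in PyTorch-style code. The exact option names differ in the Matlab and Python implementations actually used, and mapping the L2 regularization to the optimizer's weight decay is an assumption of this sketch.

```python
import torch
from torch import nn, optim

model = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 3))  # placeholder network

# Values taken from Table 1
optimizer = optim.Adam(
    model.parameters(),
    lr=0.01,               # learning rate
    betas=(0.9, 0.999),    # gradient / quadratic gradient decay factors
    weight_decay=0.01,     # L2 regularization (assumed equivalent)
)
# Drop the learning rate by a factor of 0.1 every 10 epochs (learning rate decay period)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

num_epochs = 50
batch_size = 16
device = "cuda" if torch.cuda.is_available() else "cpu"  # Table 1: training on GPU
model.to(device)
```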
The results obtained for the DublinCity dataset are presented in Table 2 and Table 3, while the results obtained for the dataset produced using photogrammetric methods are presented in Table 4, Table 5, Table 6 and Table 7. Detailed examinations and explanations regarding the results are presented in the subsections within this section.

4.1. Results of DublinCity LiDAR Data Using PointNet++

In this section, the outcomes achieved through the application of the PointNet++ algorithm to the DublinCity dataset are presented. Initially, the dataset was categorized into six classes: building, door, ground, roof, vegetation, and window. Following the training and testing phases, the overall accuracy results were obtained and are presented in the first row of Table 2. An example of the six-class training data from the DublinCity dataset can be seen in Figure 10a, while the results obtained from the trained network are presented in Figure 10b.
Upon reviewing the results, although the accuracy values were around 80%, it was observed that the average IoU values were approximately 38%. Considering this, it was determined that the algorithm’s class-specific accuracy for this scenario was below par. Particularly, challenges were observed in segmenting details like doors and windows. Based on the results obtained from the six-class scenario, a decision was made to reorganize the data classes during the network’s training phase to focus on areas of detail, such as doors and windows. Consequently, the dataset was reduced to four classes: building, door, roof, and window. The overall accuracy results for the retrained four-class data using the same parameters are presented in the second row of Table 2. An example of the four-class training data can be found in Figure 11a, and an example combining the predicted labels with the actual scene is presented in Figure 11b.
Upon examining the results obtained from the four-class data, it was observed that the loss in the training graph consistently decreased over the stages, while the accuracy values increased and then stabilized after a certain point. Therefore, it was speculated that the network had either learned the data or had fallen into an overfitting state. However, one of the main reasons for network overfitting is data density. Since the DublinCity dataset used is considered to be dense enough, it was concluded that overfitting was not a problem. When reviewing the results shown in Table 2, similar to the six-class data, no conclusions could be drawn for door and window data; therefore, predictions could not be made. Looking at the overall accuracy and IoU results, distinctive and guiding results were not achieved compared to the six-class scenario. However, in a general context, it was deduced that the algorithm struggled with extracting detailed geometries. On the other hand, when analyzing building, ground, and roof data, it was evident that successful outcomes could be obtained in extracting large geometries. As a result, the dataset underwent further adjustments, focusing solely on the building, roof, and ground classes, and the operations were repeated. The overall accuracy results from the three-class data are presented in the third row of Table 2. An example of the three-class training data can be found in Figure 12a, while an example combining the predicted labels with the actual scene is presented in Figure 12b. Upon examining the approximate 90% accuracy results obtained from the three-class data, it was concluded that the interpretations made based on the results from the four-class data were accurate and appropriate. When analyzing the overall accuracy results, it is evident that the PointNet++ algorithm can detect data with larger geometries, such as buildings, ground, and roofs. However, difficulties may arise in detecting data with smaller geometries and lower densities, such as doors and windows. This conclusion aligns with the insights from the four-class results. As a result, the decision was made to focus primarily on extracting objects with larger geometries from the Akinci gimbal imagery-based stereo-generated point cloud data. Therefore, this work evolved in this direction.
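Reducing the dataset to a subset of classes amounts to filtering the labeled points and remapping the remaining labels to a contiguous range. A short sketch of that step is given below; the numeric label codes are illustrative assumptions, not the actual DublinCity identifiers.

```python
import numpy as np

# Assumed label codes for illustration only; the actual DublinCity label ids differ.
CLASS_NAMES = {0: "building", 1: "door", 2: "ground", 3: "roof", 4: "vegetation", 5: "window"}
KEEP = {"building": 0, "ground": 1, "roof": 2}   # reduced three-class scheme

def reduce_to_three_classes(points: np.ndarray, labels: np.ndarray):
    """Keep only the large-geometry classes and remap their labels to 0..2."""
    keep_codes = [c for c, name in CLASS_NAMES.items() if name in KEEP]
    keep_mask = np.isin(labels, keep_codes)
    kept_labels = labels[keep_mask]
    remapped = np.array([KEEP[CLASS_NAMES[int(c)]] for c in kept_labels], dtype=np.int64)
    return points[keep_mask], remapped
```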

4.2. Results of DublinCity LiDAR Data Using RandLA-Net

In this section, the results obtained using the RandLA-Net algorithm on the DublinCity dataset are presented. Similar to Section 4.1, where class-based analysis was conducted for PointNet++, the same approach was followed for RandLA-Net. Therefore, the results are presented sequentially for six, four, and three classes. It is important to note that the RandLA-Net architecture can utilize color information to make decisions. However, since the DublinCity dataset lacks color information, the results could not be generated based on color data. The overall accuracy for six classes is provided in the first line of Table 3, while an example with the predicted labels is shown in Figure 13a. When examining the results, it can be observed that although the accuracy values were around 89%, the average IoU values were approximately 53%, and it can be deduced that the algorithm’s class-specific overall accuracy for this scenario was below the desired level. Similar to the findings for the PointNet++ algorithm, RandLA-Net also encountered issues in segmenting detailed geometries, such as doors and windows. Therefore, the process was repeated for four classes. The overall accuracies of the tests conducted using the four-class DublinCity dataset and the RandLA-Net algorithm are presented in the second row of Table 3, while Figure 13b shows an example with the predicted labels.
As in the six-class scenario, issues were also observed for RandLA-Net in segmenting detailed geometries such as doors and windows with four classes. Therefore, the process was repeated with three classes. The overall accuracies of the tests conducted using the three-class DublinCity dataset and the RandLA-Net algorithm are provided in the third row of Table 3, and Figure 13c shows an example with the predicted labels. In the tests conducted for the three-class scenario, it was observed that the overall accuracy values and the average IoU values were close to each other. For structures with large geometries, class-specific accuracy values were observed to range from approximately 83% to 99%. Based on these results, it can be concluded that the RandLA-Net algorithm can be effectively used for large geometries.

4.3. Results of Photogrammetric Point Cloud Using PointNet++

As can be seen in Figure 6c, the photogrammetrically produced point cloud data contain color information. Therefore, in the studies conducted on these data, color information was also used with the RandLA-Net algorithm. Additionally, the results of the approach developed in this study—the color-enhanced PointNet++ algorithm—are presented in a separate subsection in Section 4.3.2. To compare these results with those obtained from the DublinCity dataset, the results obtained using only positional information are presented under separate headings. In the DublinCity dataset, the best results were observed for the three-class scenario. Therefore, the processes on the generated point cloud data were also carried out based on three classes. The network parameters were kept the same as those used for the DublinCity dataset.

4.3.1. PointNet++ Trained with Only Geometric Features

The overall accuracy statistics after the training and testing processes are presented in the first row of Table 4. An example of the three-class training data is shown in Figure 14a, while an example of the predicted labels along with their corresponding ground-truth labels is presented in Figure 14b.
Upon initial inspection of the results shown in Figure 14b, it was observed that despite incorrect labels in the resulting visual, an accuracy value of 94.02% was achieved. The significant differences between the graphs, despite using the same parameters as the DublinCity dataset, can be attributed to variations in point cloud density between the two datasets. Therefore, it was decided that the stereo-based data might have lower density compared to the DublinCity data, and thus increasing the number of epochs and the minimum batch size would likely have a positive impact on the results. As a result, in the second stage, the number of epochs was increased from 50 to 100, and the minimum batch size was raised from 16 to 32. The overall accuracy values with the updated parameters are presented in the second row of Table 4, while an example with the overlaid resulting labels is shown in Figure 15a.
Upon reviewing all the results, it was observed that the modified parameters had a relatively minimal impact on the accuracy values. Therefore, considering the capabilities of the current hardware, only the number of epochs was increased by a factor of 10, reaching 1000, while the minimum batch size remained constant due to hardware limitations. These hardware limitations refer specifically to the GPU memory capacity, which restricts the size of the batch that can be processed at once without running into memory overflow issues. With these updated parameters, the test process was restarted. The overall accuracy values with the updated parameters are presented in the third row of Table 4, while an example with the overlaid resulting labels is shown in Figure 15b. It can be observed that the accuracy values did not exhibit significant changes compared to the results obtained using the previous parameters. However, upon comparing the overall accuracy matrices, it is noticeable that the accuracy values for building and ground classes improved, while there was a slight decrease in accuracy for the roof class. The partial decreases observed in the roof class could potentially be attributed to overfitting. It is believed that increasing the dataset size could effectively address this issue.

4.3.2. PointNet++ Trained with RGB Color and Geometric Information

The PointNet++ algorithm is fundamentally designed to utilize solely geometric features. Enhancements were made within the scope of this study to enable the algorithm to incorporate color information. Subsequently, training and testing procedures were conducted by including the color information in the photogrammetrically generated data. The statistics of the results at the conclusion of the process are presented in Table 5. A noticeable enhancement in accuracy values was observed when color information was introduced, in contrast to the PointNet++ algorithm trained solely on geometric features. While an accuracy of approximately 94% was achieved when exclusively utilizing geometric features, the inclusion of color information elevated the accuracy to approximately 96%.

4.4. Results of Photogrammetric Point Cloud Using RandLA-Net

In this section, the results obtained from applying the RandLA-Net algorithm to the point cloud generated from Akinci UAV data through photogrammetric methods are presented. Most point cloud segmentation algorithms primarily rely on geometric relationships for interpretation. The RandLA-Net architecture, however, allows for the integration of both geometric information and color relationships during network training. In this section, the results obtained solely based on geometric information are provided, as well as the results where color relationships are included in the analysis.

4.4.1. RandLA-Net Trained with Only Geometric Features

In this section, the segmentation results obtained using only geometric relationships with the RandLA-Net algorithm are presented. The statistics of the results are shown in Table 6. The original and segmented images of a building used as the test data in the algorithm are presented in Figure 16a,b, and the prediction results are shown in Figure 16c.

4.4.2. RandLA-Net Trained with RGB Color and Geometric Information

The statistics of the results obtained from the RandLA-Net algorithm, where color relationships are included in addition to geometric features, are presented in Table 7. The original and segmented images of a building used as test data in the algorithm are shown in Figure 17a,b, and the prediction results are shown in Figure 17c.

5. Discussion

In this study, the PointNet++ algorithm, originally designed to accept only 3D coordinates as input, was extended to incorporate color information. This enhancement significantly improved its performance, particularly in the semantic segmentation of photogrammetric point clouds.
The PointNet++ algorithm was evaluated using two different methods. The initial evaluation used the original architecture, relying solely on geometric features. The second evaluation utilized the modified PointNet++ algorithm with added color information. The assessments are detailed in Section 4.3.1 and Section 4.3.2. The standard geometric-only approach achieved a maximum accuracy of approximately 94.18%, whereas the enhanced method achieved a maximum accuracy of approximately 96.03%. Similar improvements were observed in other metrics, such as mean IoU and weighted IoU, indicating notable enhancements. RandLA-Net was also evaluated through two approaches. The first approach used only geometric features, while the second included color information during training. The results are detailed in Section 4.4.1 and Section 4.4.2. Using only geometric features, the overall accuracy was 94.29% with a mean IoU of 84.95%. Including color information increased the overall accuracy to 96.74% and the mean IoU to 91.30%. These results suggest that integrating color data from photogrammetric point clouds significantly enhances the accuracy of the RandLA-Net algorithm. However, for point clouds derived from RADAR or LiDAR sensors without color information, training on geometric features alone can still achieve accuracy values of around 94%. Examining the results of both the enhanced PointNet++ and RandLA-Net, it is evident, as noted by Deschaud et al. [26], Wang et al. [27], and Robert et al. [28], that incorporating color information substantially improves segmentation accuracy.
In Table 2 and Table 3, the results for the DublinCity dataset reveal that for both six and four classes, the class-wise accuracies of PointNet++ and RandLA-Net were significantly lower than the overall accuracy values. However, for the three-class scenario, both algorithms achieved overall accuracies exceeding 90%. Notably, RandLA-Net outperformed PointNet++ in terms of accuracy for this scenario. Significant improvements were observed with the developed PointNet++ algorithm enriched with color information. The enriched PointNet++ achieved an accuracy of 96.03%, while RandLA-Net achieved an accuracy of 96.74%. When comparing the photogrammetric point cloud results to those of the DublinCity LiDAR dataset, RandLA-Net consistently produced slightly superior results. Overall, RandLA-Net demonstrated higher accuracy in both dense point cloud data, like DublinCity, which lack color information, and less dense, photogrammetrically generated point cloud data that include color information. Thus, incorporating color information into 3D coordinates proves beneficial for segmenting large geometric objects in point clouds. Photogrammetric point clouds are a crucial data source for semantic segmentation, as they provide precise color information for each point.
As observed by Long et al. [24], Joulin et al. [29], and supported by this study, reducing the number of classes in point clouds and performing class-based generalization positively impacts the results. This observation is likely because increasing the number of classes, especially those with small geometries such as doors and windows, leads to characteristic mixing. Both PointNet++ and RandLA-Net show suboptimal results for small geometries. Since small geometries are not critical for UAV systems, this is not a significant issue in this study. Future research will explore this problem using high-resolution RADAR systems.
Examining the results of both algorithms highlights the positive impact of color information on the segmentation of all class structures. The greatest success is observed in the ground class, while roof and building classes occasionally become confused. This confusion is likely due to regional color similarities and the geometrically adjacent positions of roof and building objects.

6. Conclusions

In this study, it was observed that using the PointNet++ algorithm for segmenting photogrammetrically generated point cloud data can achieve an accuracy of 96.03%. Similarly, the RandLA-Net algorithm can reach an accuracy of approximately 96.74%. However, when evaluating the results with the DublinCity dataset, a potential decrease in accuracy was noted for areas with small geometries.
The outcomes indicate that 3D segmentation tasks can be effectively performed on point cloud data obtained through photogrammetric methods, even in the absence of RADAR and LiDAR systems. Additionally, based on the results from DublinCity LiDAR point cloud data, it is speculated that the proposed approach can be extended to RADAR- and LiDAR-based point clouds integrated into UAV systems or other aircraft.
Analyzing the study’s results, it is believed that the presented approach could be particularly beneficial for target detection, marking, and similar processes. In UAV systems, especially during gimbal targeting operations, there is significant potential for the automatic guidance of laser designation systems.
In conclusion, this study provides valuable insights into the potential applications of the proposed approach, particularly in industry, offering benefits for various aspects of UAV systems and point cloud analysis and segmentation.
Another significant contribution of this study is the introduction of a new approach to the PointNet++ algorithm. By including color information in the standard PointNet++ algorithm, it has been demonstrated that prediction outcomes are positively influenced. This contribution is expected to inspire and guide future studies with different objectives. Furthermore, efforts are planned to adapt the proposed methods for use with Akinci UAV RADAR systems in the near future.

Author Contributions

Conceptualization, S.B.; methodology, S.B. and M.E.A.; software, S.B. and M.E.A.; investigation, S.B.; resources, S.B.; data curation, S.B.; validation, S.B. and M.E.A.; visualization, S.B.; writing—original draft preparation, S.B.; project administration, S.B.; writing—review and editing, S.B. and M.E.A.; supervision, Z.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The codes developed and used, as well as the datasets produced and utilized within the scope of this study, are available on the GitHub repository (https://github.com/bzkrtslh/Enriched-PointNetPP-and-RandLA-NET-Point-Cloud-Semantic-Segmentation, accessed on 16 May 2024).

Acknowledgments

We would like to extend our sincere thanks to Baykar Technology Inc.’s Chairman and CTO Selçuk Bayraktar for the support of this project.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
A/LiDAR: Aerial Light Detection and Ranging
EOW: Electro-Optic Wide
FPS: Frames Per Second
GNSS: Global Navigation Satellite Systems
LiDAR: Light Detection and Ranging
SfM: Structure from Motion
RADAR: Radio Detection and Ranging
RandLA-Net: Randomized Local Aggregation Network
UAV: Unmanned Aerial Vehicle

References

  1. Atik, M.E.; Duran, Z. An Efficient Ensemble Deep Learning Approach for Semantic Point Cloud Segmentation Based on 3D Geometric Features and Range Images. Sensors 2022, 22, 6210. [Google Scholar] [CrossRef] [PubMed]
  2. Zolanvari, S.M.; Ruano, S.; Rana, A.; Cummins, A.; da Silva, R.E.; Rahbar, M.; Smolic, A. DublinCity: Annotated LiDAR point cloud and its applications. arXiv 2019, arXiv:1909.03613. [Google Scholar]
  3. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  4. Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Markham, A. RandLA-Net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11108–11117. [Google Scholar]
  5. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
  6. Lei, H.; Akhtar, N.; Mian, A. Seggcn: Efficient 3d point cloud segmentation with fuzzy spherical kernel. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11611–11620. [Google Scholar]
  7. Liu, H.; Guo, Y.; Ma, Y.; Lei, Y.; Wen, G. Semantic context encoding for accurate 3D point cloud segmentation. IEEE Trans. Multimed. 2020, 23, 2045–2055. [Google Scholar] [CrossRef]
  8. Fan, S.; Dong, Q.; Zhu, F.; Lv, Y.; Ye, P.; Wang, F.Y. SCF-Net: Learning spatial contextual features for large-scale point cloud segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14504–14513. [Google Scholar]
  9. Cheng, M.; Hui, L.; Xie, J.; Yang, J. SSPC-Net: Semi-supervised semantic 3D point cloud segmentation network. Proc. AAAI Conf. Artif. Intell. 2021, 35, 1140–1147. [Google Scholar] [CrossRef]
  10. Atik, M.E.; Duran, Z. Selection of Relevant Geometric Features Using Filter-Based Algorithms for Point Cloud Semantic Segmentation. Electronics 2022, 11, 3310. [Google Scholar] [CrossRef]
  11. Hegde, S.; Gangisetty, S. PIG-Net: Inception based learning architecture for 3D point cloud segmentation. Comput. Graph. 2021, 95, 13–22. [Google Scholar] [CrossRef]
  12. Jiang, M.; Wu, Y.; Zhao, T.; Zhao, Z.; Lu, C. Pointsift: A sift-like network module for 3d point cloud semantic segmentation. arXiv 2018, arXiv:1807.00652. [Google Scholar]
  13. Duran, Z.; Ozcan, K.; Atik, M.E. Classification of photogrammetric and airborne LiDAR point clouds using machine learning algorithms. Drones 2021, 5, 104. [Google Scholar] [CrossRef]
  14. Wu, W.; Liang, Y.; Zhang, W. Improved point cloud semantic segmentation network based on anisotropic separable set abstraction network. J. Appl. Remote Sens. 2023, 17, 036505. [Google Scholar] [CrossRef]
  15. Lin, C.; Yang, J.; Gong, B.; Liu, H.; Sun, G. Grid and homogeneity-based ground segmentation using light detection and ranging three-dimensional point cloud. J. Appl. Remote Sens. 2023, 17, 038506. [Google Scholar] [CrossRef]
  16. Ozturk, O.; Isik, M.S.; Kada, M.; Seker, D.Z. Improving Road Segmentation by Combining Satellite Images and LiDAR Data with a Feature-Wise Fusion Strategy. Appl. Sci. 2023, 13, 6161. [Google Scholar] [CrossRef]
  17. DublinCity LiDAR Dataset. DublinCity: Annotated LiDAR Point Cloud and its Applications. 2015. Available online: https://v-sense.scss.tcd.ie/dublincity/ (accessed on 31 May 2022).
  18. DJI. DJI Terra. 2022. Available online: https://www.dji.com/dji-terra (accessed on 16 June 2022).
  19. Snavely, N.; Seitz, S.M.; Szeliski, R. Modeling the World from Internet Photo Collections. Int. J. Comput. Vis. 2008, 80, 189–210. [Google Scholar] [CrossRef]
  20. Girardeau-Montaut, D. CloudCompare Stereo 2.12.1. 2022. Available online: https://www.cloudcompare.org/ (accessed on 11 May 2022).
  21. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. (TOG) 2019, 38, 1–12. [Google Scholar] [CrossRef]
  22. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques, 3rd ed.; Morgan Kaufmann: Burlington, MA, USA, 2011. [Google Scholar]
  23. Powers, D. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. J. Mach. Learn. Technol. 2011, 2, 37–63. [Google Scholar]
  24. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
  25. Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46. [Google Scholar] [CrossRef]
  26. Deschaud, J.E.; Duque, D.; Richa, J.P.; Velasco-Forero, S.; Marcotegui, B.; Goulette, F. Paris-CARLA-3D: A real and synthetic outdoor point cloud dataset for challenging tasks in 3D mapping. Remote Sens. 2021, 13, 4713. [Google Scholar] [CrossRef]
  27. Zhu, H.; Wang, Y.; Huang, D.; Ye, W.; Ouyang, W.; He, T. Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning. arXiv 2024, arXiv:2402.02500. [Google Scholar]
  28. Robert, D.; Vallet, B.; Landrieu, L. Learning multi-view aggregation in the wild for large-scale 3d semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5575–5584. [Google Scholar]
  29. Joulin, A.; Grave, E.; Bojanowski, P.; Mikolov, T. Bag of tricks for efficient text classification. arXiv 2016, arXiv:1607.01759. [Google Scholar]
Figure 1. The study area is the district of Çorlu, situated in Tekirdağ province, Türkiye. (a) A map of Türkiye’s provinces. (b) A map of Tekirdağ province’s districts. (c) An image of the study area located in the district of Çorlu.
Figure 2. Illustration of DublinCity LiDAR data hierarchical structure.
Figure 3. Sample of DublinCity LiDAR data.
Figure 4. Illustration of SfM algorithm.
Figure 5. Illustration of the conversion between the aircraft and gimbal axes.
Figure 6. Comparison of before and after axis transformations for the generated 3D point cloud. (a,b) Before axis transformation. (c) After axis transformation.
Figure 7. Example of CloudCompare labeling phase. (a) Label layers. (b) Regions in the labeling stage.
Figure 8. Illustration of the PointNet++ architecture for a single-scale point group.
Figure 9. Illustration of RandLA-Net architecture.
Figure 10. Sample of DublinCity LiDAR data with 6 classes. (a) Sample of PointNet++ training data. (b) Sample of PointNet++ predicted data.
Figure 11. Sample of DublinCity LiDAR data with 4 classes. (a) Sample of PointNet++ training data. (b) Sample of PointNet++ predicted data.
Figure 12. Sample of DublinCity LiDAR Data with 3 classes. (a) Sample of PointNet++ training data. (b) Sample of PointNet++ predicted data.
Figure 13. Sample of DublinCity LiDAR data RandLA-Net prediction results. (a) Prediction results for 6 classes. (b) Prediction results for 4 classes. (c) Prediction results for 3 classes.
Figure 14. Sample of generated point cloud with 3 classes. (a) Sample of PointNet++ training data. (b) Sample of PointNet++ predicted data (minimum batch size: 16; epochs: 50).
Figure 15. Sample of generated point cloud PointNet++ results created with 3 classes (a) produced with a minimum batch size of 32 and 100 epochs, and (b) produced with a minimum batch size of 32 and 1000 epochs.
Figure 16. RandLA-Net with only geometric features: sample of a building used as test data and its original and predicted label views. (a) Original view of building. (b) Manually labeled building. (c) Predicted labels of the building.
Figure 17. RandLA-Net with color and geometric features: Sample of a building used as test data and its original and predicted label views. (a) Original view of building. (b) Manually labeled building. (c) Predicted labels of the building.
Table 1. PointNet++ and RandLA-Net training parameters.
Definition                        | Value
Learning Rate                     | 0.01
L2 Regularization                 | 0.01
Number of Epochs                  | 50
Minimum Batch Size                | 16
Learning Rate Drop Factor         | 0.1
Learning Rate Decay               | 10
Gradient Decay Factor             | 0.9
Quadratic Gradient Decay Factor   | 0.999
Optimizer                         | Adam
Used Processor                    | GPU
Table 2. Metrics for DublinCity LiDAR data with PointNet++.
             | Accuracy | Recall | Mean IoU | F1 Score | Kappa  | Precision | W. IoU
DC + PN: 6C  | 0.8074   | 0.4623 | 0.3828   | 0.4723   | 0.6875 | 0.5009    | 0.6959
DC + PN: 4C  | 0.8789   | 0.4138 | 0.3634   | 0.4165   | 0.6655 | 0.4212    | 0.7863
DC + PN: 3C  | 0.9016   | 0.8315 | 0.7611   | 0.8537   | 0.8352 | 0.8960    | 0.8223
DC: DublinCity; PN: PointNet++; C: Class; W. IoU: weighted IoU. Since the DublinCity dataset does not contain color information, training was carried out using only geometric features.
Table 3. Metrics for DublinCity LiDAR data with RandLA-Net.
             | Accuracy | Recall | Mean IoU | F1 Score | Kappa  | Precision | W. IoU
DC + RN: 6C  | 0.8913   | 0.6492 | 0.5291   | 0.6084   | 0.8328 | 0.6182    | 0.8220
DC + RN: 4C  | 0.9388   | 0.5308 | 0.4746   | 0.5333   | 0.8476 | 0.5484    | 0.8943
DC + RN: 3C  | 0.8947   | 0.9024 | 0.8249   | 0.9033   | 0.8267 | 0.9162    | 0.8085
DC: DublinCity; RN: RandLA-Net; C: Class; W. IoU: weighted IoU. Since the DublinCity dataset does not contain color information, training was carried out using only geometric features.
Table 4. Metrics for photogrammetric point cloud with PointNet++ trained with only geometric features.
              | Accuracy | Recall | Mean IoU | F1 Score | Kappa  | Precision | W. IoU
(PPC + PN) X  | 0.9402   | 0.9143 | 0.8385   | 0.9093   | 0.8943 | 0.9058    | 0.8888
(PPC + PN) Y  | 0.9416   | 0.9063 | 0.8487   | 0.9163   | 0.8961 | 0.9277    | 0.8901
(PPC + PN) Z  | 0.9418   | 0.9529 | 0.8295   | 0.9027   | 0.8954 | 0.8742    | 0.8955
All metrics were produced on a 3-class basis. PPC: photogrammetric point cloud; PN: PointNet++; C: class, W. IoU: weighted IoU; X: minimum batch size is 16, epoch is 50; Y: minimum batch size is 32, epoch is 100; Z: minimum batch size is 32, epoch is 1000.
Table 5. Metrics for photogrammetric point cloud with PointNet++ trained with geometric and color features.
          | Accuracy | Recall | Mean IoU | F1 Score | Kappa  | Precision | W. IoU
PPC + PN  | 0.9603   | 0.9616 | 0.8613   | 0.9214   | 0.9284 | 0.8959    | 0.9285
All metrics were produced on a 3-class basis. PPC: photogrammetric point cloud; PN: PointNet++; C: class; W. IoU: weighted IoU.
Table 6. Metrics for photogrammetric point cloud with RandLA-Net trained with only geometric features.
          | Accuracy | Recall | Mean IoU | F1 Score | Kappa  | Precision | W. IoU
PPC + RN  | 0.9429   | 0.8917 | 0.8495   | 0.9169   | 0.8958 | 0.9577    | 0.8930
All metrics were produced on a 3-class basis. PPC: photogrammetric point cloud; RN: RandLA-Net; W. IoU: weighted IoU.
Table 7. Metrics for photogrammetric point cloud with RandLA-Net trained with geometric and color features.
          | Accuracy | Recall | Mean IoU | F1 Score | Kappa  | Precision | W. IoU
PPC + RN  | 0.9674   | 0.9424 | 0.9130   | 0.9540   | 0.9406 | 0.9690    | 0.9374
All metrics were produced on a 3-class basis. PPC: photogrammetric point cloud; RN: RandLA-Net; W. IoU: weighted IoU.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
