Efficient Underground Tunnel Place Recognition Algorithm Based on Farthest Point Subsampling and Dual-Attention Transformer

An autonomous place recognition system is essential in scenarios where GPS is unavailable, such as underground tunnels. However, existing algorithms struggle to fully exploit the small number of effective features in underground tunnel data, so recognition accuracy is hard to guarantee. To address this challenge, this paper proposes an efficient point cloud place recognition algorithm named Dual-Attention Transformer Network (DAT-Net). The algorithm first adopts a farthest point downsampling module to eliminate invalid redundant points from the point cloud data while retaining the basic shape of the point cloud, which reduces the size of the point cloud and, at the same time, limits the influence of invalid points on data analysis. This paper then proposes a dual-attention Transformer module that facilitates local information exchange through a multi-head self-attention mechanism. It extracts local descriptors and, with the help of a feature fusion layer, integrates them into highly discriminative global descriptors based on the global context, yielding a more accurate and robust global feature representation. Experimental results show that the proposed method achieves an average F1 score of 0.841 on the SubT-Tunnel dataset and outperforms many existing state-of-the-art algorithms in recognition accuracy and robustness tests.


Introduction
Place recognition is a fundamental capability for robot state estimation and is widely applied in robotic systems such as autonomous cars. In restricted environments like underground tunnels where GPS [1,2] is not available, place recognition can provide precise global positioning information for vehicles or robots, ensuring their stable operation. Specifically, in stagewise pose estimation, place recognition adds loop-closure constraints between segments of the trajectory that the vehicle revisits, reducing the cumulative error in globally consistent localization. Place recognition also helps initialize the positioning system for continuous pose tracking and serves as a backup localization scheme in case of odometry drift. Therefore, the study of an efficient place recognition method is of great significance for realizing global localization estimation [3] in underground tunnels. Existing 3D LiDAR-based place recognition methods have primarily been developed to address place recognition on urban roads. To the best of our knowledge, current 3D LiDAR-based place recognition methods have not been designed for underground tunnel environments and have not been tested on underground tunnel datasets.
Underground tunnels exhibit significant large-scale geometric repetition due to their unique scene configuration. Sensor data acquired in such environments inevitably lack easily distinguishable geometric features. Existing algorithms, which are mainly evaluated in ordinary environments such as school campuses and highways, generally rely on recognizable descriptive features in the data for place recognition. However, when faced with underground tunnel scenes containing a large number of repeating 3D structures and similar textures, their original recognition accuracy is difficult to guarantee. It is even harder to deal with the tricky situations that often occur in underground tunnels, such as obstacles blocking the sensors and changes in sensor viewpoint.
A few researchers have designed place recognition algorithms based on semantic information, focusing on the influence of point cloud categories on recognition performance. SG-PR [4] transforms point clouds into graph-structured data with the help of existing semantic segmentation methods. It describes the scene at the semantic level and focuses on encoding the relationships between semantic objects. However, the recognition accuracy of this method depends heavily on the performance of the segmentation algorithm, and the algorithm has high runtime complexity. RINet [5] designs a rotation-invariant global descriptor and utilizes semantic information and geometric features to improve its discriminative ability. It has low computational complexity but relies on highly accurate semantic labeling and spends extra time converting data formats. In addition, descriptor-based place recognition methods have also received extensive attention. M2DP [6] parses the point cloud from multiple views and obtains a density signature for each point cloud based on the spatial density distribution of the points projected onto each plane. The method is novel and robust, but recognition accuracy is difficult to guarantee because some data features are lost when projecting the point cloud. SegMatch [7] segments the point cloud and drastically reduces the number of matches by clustering. It first encodes the features using a 3D convolutional neural network (CNN), then identifies candidate matches through k-nearest neighbors (KNN), and finally uses a geometric validation step to convert the candidates into place recognition candidates. This approach combines the advantages of local and global descriptors, but the connections between objects are not fully considered, and the recognition accuracy needs improvement. PointNetVLAD [8] innovatively adopts PointNet [9] as the backbone network to extract point cloud features, with targeted consideration of point cloud permutation invariance and affine invariance. PCAN [10] improves on this method: it first extracts multi-scale local contextual information to generate point attention maps via the grouped sampling of PointNet++ [11], and then uses NetVLAD [12] to aggregate the attention-weighted local features into a global descriptor. DAGC [13] further improves the algorithm by combining a dynamic graph architecture with a dual-attention mechanism to aggregate local context information, and then uses a NetVLAD layer to aggregate the global descriptor. These algorithms have been continuously improved and have gained some accuracy. However, their generalized PointNet-based point cloud analysis architecture makes it difficult to improve recognition accuracy for complex underground scenes in a targeted manner. SeqLPD [14] and LPD-Net [15] first fuse the neighborhood features of each point in feature space and Cartesian space, and then use NetVLAD to generate global descriptors. However, some effective information is lost when the 3D data are projected to 2D. SOE-Net [16] accomplishes the recognition task end to end. The network first extracts local descriptors point by point using the PointOE module, a combination of PointNet and an orientation-encoding unit, and then aggregates discriminative global descriptors using a self-attention module and NetVLAD. However, the loss function threshold of this method needs to be set in advance.
In summary, some advanced algorithms rely on the accuracy of semantic segmentation algorithms and find it difficult to independently and stably achieve autonomous recognition in underground tunnels. Meanwhile, advanced descriptor-based place recognition methods struggle to ensure recognition accuracy because they cannot fully utilize the small number of effective features in underground tunnel data. To address these issues, this paper introduces an efficient point cloud place recognition algorithm, DAT-Net. The method first employs a farthest point subsampling module to remove invalid redundant points from the point cloud data, reducing the point cloud scale while preserving its basic shape and geometric features, which improves algorithm efficiency and convergence speed. Next, this paper proposes a dual-attention Transformer module, which uses dot-product attention and sinc attention to adaptively extract important features and enhance distortion sensitivity. Specifically, the simple, efficient, and parallelizable dot-product attention focuses attention on the key-value pairs (K, V) related to the query vector Q, capturing local features in the data and adaptively reinforcing local information in the point cloud. Additionally, the proposed sinc attention performs local feature filtering, improving fine-grained quality prediction in underground tunnel scenes. This dual-attention strategy effectively enhances the extraction of local descriptors, and a feature fusion layer integrates them into highly discriminative global descriptors based on the global context, resulting in a more accurate and robust global feature representation. The contributions of this article are as follows:

1. The farthest point subsampling module significantly reduces the point cloud size, decreasing the computational complexity of the model while preserving point cloud features.

2. A point cloud analysis module based on the dual-attention Transformer is developed, enhancing the accuracy and robustness of place recognition.

3. Experiments on place recognition and robustness testing on an underground tunnel dataset demonstrate that our approach performs exceptionally well, achieving an average F1 score of 0.841.

Model Design

Overall Network Design
In order to improve the accuracy of place recognition in underground tunnels, this paper proposes an efficient point cloud place recognition algorithm (DAT-Net), as shown in Figure 1. The algorithm first uses the farthest point downsampling module to eliminate invalid redundant points in the point cloud data and retain the basic shape of the point cloud, reducing the size of the point cloud and, at the same time, the impact of invalid points on data parsing. After that, the transformed feature space solver module is used to dimensionally diffuse the point cloud and map it into a space suitable for feature extraction. Finally, the dual-attention layer of the Transformer is used to facilitate the exchange of local information, and the feature fusion module integrates highly discriminative global descriptors to obtain a more accurate and robust global feature representation.

Farthest Point Downsampling
Compared to images, a single frame of 3D LiDAR captures a significantly larger amount of data, and using it directly as input to a Transformer would lead to a substantial increase in model size. Refs. [6,8,15] have used random downsampling or projected the 3D data into 2D images to reduce the point cloud data volume. However, they lose too many point cloud features during downsampling, making it challenging to guarantee algorithm accuracy. Inspired by the research on PointNet [9], we believe that the outer points of an object hold more analytical value than the inner points because, in an ideal scenario, a sufficient number of outer points can adequately represent an object. Therefore, this paper adopts the farthest point downsampling algorithm, which reduces the scale of the point cloud while preserving environmental shape features as much as possible. The process is as follows:

1. Input: the original point cloud data.

2. Select a start point: select a point p_o from the data at random or according to the density.

3. Initialize the subsampled point set: create an empty subsampled point set S, then add p_o, resulting in S = {p_o}.

4. Loop until the termination condition is met:
   a. For each point p_i in the initial point cloud, compute the Euclidean distance d_i from point p_o: d_i = sqrt((x_i − x_o)² + (y_i − y_o)² + (z_i − z_o)²).
   b. Sort the computed distances and select the point farthest from p_o that is not yet in the subsampled point set S, adding it to S.
   c. Update p_o to the selected point.

5. Termination condition: the preset number of downsampled points N is reached.

6. Output: the downsampled point cloud data X ∈ R^(N×3).
As shown in Figure 2, a frame of point cloud data from the SubT-Tunnel dataset typically contains around 120,000 points, and the over-representation of the environment in point-dense regions is redundant. When the number of downsampled points N is set to 10,240 or 5120, the points in some areas remain very concentrated. After a series of attempts, we found that with N set to 1024, the downsampled point cloud still expresses the shape of the original point cloud well, without becoming as sparse as with N set to 256. Therefore, this paper sets the number of downsampled points N to 1024, which is the optimal value for this method. The specifications of the sensors and the environmental conditions during data collection influence the shape, density, and features of the original point cloud, so the value of N for farthest point subsampling should be adjusted for the specific environment and sensors.
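The downsampling steps above can be sketched in NumPy. This is a minimal illustration of standard farthest-point sampling, which tracks each point's distance to the whole selected set rather than only to the most recent point; the function name and the synthetic cloud are our own, not part of DAT-Net's released code.

```python
import numpy as np

def farthest_point_subsample(points: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Greedy farthest-point subsampling of an (M, 3) point cloud.

    Keeps the overall shape of the cloud while reducing it to n_samples points.
    """
    rng = np.random.default_rng(seed)
    m = points.shape[0]
    selected = np.empty(n_samples, dtype=np.int64)
    # Step 2: pick a random start point.
    selected[0] = rng.integers(m)
    # Squared distance from every point to the current subsampled set.
    dist = np.full(m, np.inf)
    for k in range(1, n_samples):
        # Update distances against the most recently added point.
        diff = points - points[selected[k - 1]]
        dist = np.minimum(dist, np.einsum("ij,ij->i", diff, diff))
        # Step 4b: choose the point farthest from the set selected so far.
        selected[k] = int(np.argmax(dist))
    return points[selected]

# Example: reduce a synthetic 120,000-point cloud to N = 1024 points.
cloud = np.random.default_rng(1).normal(size=(120_000, 3))
sub = farthest_point_subsample(cloud, 1024)
print(sub.shape)  # (1024, 3)
```

Because each iteration only refreshes distances against the newest point, the whole procedure costs O(N·M) rather than O(N·M²).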

Solution Module for Transformed Eigenspaces
Setting the stride of a convolutional neural network (CNN) to 1 can achieve translational equivariance in image analysis. However, in point clouds, 0 degrees and 359 degrees are adjacent, so limiting the stride of a CNN to 1 does not fully achieve rotational equivariance for point cloud analysis. Therefore, a convolutional network suited to point cloud analysis is required to better advance place recognition performance. An intuitive solution is a recurrent (circular) convolution with a stride of 1, whose greatest property is rotational invariance, making it well suited for point cloud analysis. First, the downsampled point cloud data X ∈ R^(L×3) are passed through three rotationally invariant convolution blocks, and feature extraction is performed level by level to obtain features F_1, F_2, and F_3 at different levels. The rotation-invariant convolution applies a convolution kernel K ∈ R^((2M+1)×N) at each index i ∈ [0, N − 1]. As shown in Figure 3, the shallower feature F_1 is passed into the coordinate transformation module for feature dimension diffusion using 1D convolution. After that, the transformation matrix R is computed with the help of a linear connection layer. After transforming the feature F_1, it is fused with the deeper feature F_3 to enhance the algorithm's adaptability to changes in scene viewpoint.
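One common form of such a circular convolution, written with the kernel K and index i quoted above, is the following sketch (our reconstruction under the stated dimensions, not necessarily the paper's exact operator):

```latex
F(i) \;=\; \sum_{m=-M}^{M} K(m)\, X\big((i+m) \bmod N\big), \qquad i \in [0,\, N-1].
```

The modular index is what distinguishes this from an ordinary stride-1 convolution: cyclically rotating the input X by r positions cyclically shifts the output F by r, and a subsequent permutation-invariant pooling over i then yields a rotation-invariant feature.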
Transformer
Transformer and self-attention mechanisms have made a number of contributions in machine translation and NLP. This has inspired the development of self-attention models for images, the most influential of which is ViT [17]. Refs. [18,19] tried to analyze whole point clouds with self-attention and found that its learning mechanism is very effective for point clouds. This paper introduces a dual-attention Transformer module, which utilizes both dot-product attention and sinc attention to adaptively extract important features, thereby enhancing sensitivity to distortions. Specifically, it focuses attention on the key-value pairs (K, V) related to the query vector Q using simple, efficient, and parallelizable dot-product attention. This helps capture local features in the data, adaptively reinforcing local information within the point cloud. Additionally, the proposed sinc attention performs local feature filtering and establishes long-range dependencies, enhancing the algorithm's fine-grained quality prediction in underground tunnel scenes. This dual-attention strategy effectively improves the extraction of local descriptors. Subsequently, with the help of the feature fusion layer, it integrates highly distinguishable global descriptors based on the global context to obtain a more accurate and robust global feature representation. More specifically, the feature fusion layer autonomously learns and assigns larger weights to high-quality local descriptors, ensuring that the aggregated global descriptors better describe the entire point cloud. The multi-head self-attention layer is shown in Figure 4. The feature F is first transformed into local descriptors Q ∈ R^(N×D/h), K ∈ R^(N×D/h), and V ∈ R^(N×D/h) by a sigmoid layer, a rotationally invariant convolutional layer, and a linearly connected layer. Subsequently, the first round of local feature extraction is performed in the first attention layer. The computational process is represented by Equation (2)
as follows:

head_i = softmax((F B_Q^i)(F B_K^i)^T / sqrt(D/h)) (F B_V^i),   MultiHead(F) = Concat(head_1, ..., head_h) B_O,   (2)

where B denotes a matrix of learnable parameters, with B_Q^i, B_K^i, B_V^i ∈ R^(D×D/h) and B_O ∈ R^(D×D). The second attention layer, named the sinc attention mechanism, is proposed to parse the point cloud features in more depth. It first substitutes the features Q, K, and V into the sinc function to obtain a smoother numerical representation. After that, the parameters are exchanged and shared through a multilayer perceptron (MLP). On this basis, features Q and K are multiplied to compute the weights assigned to the local features. The size of the features remains constant after the attention layers, and the quality of the local descriptors can be improved by stacking the network layers L times. The final output is a concatenation of N D-dimensional local descriptors F ∈ R^(N×D). The number of heads h in the multi-head attention is set to 12, and the number of loops L is also set to 12.
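The multi-head dot-product attention of Equation (2) can be sketched in NumPy as follows. The shapes follow the dimensions quoted in the text (per-head projections of size D×D/h and an output projection of size D×D); the toy sizes in the example and the exact layer details are our assumptions, not DAT-Net's released implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(F, B_q, B_k, B_v, B_o, h):
    """Scaled dot-product multi-head self-attention over N local descriptors.

    F: (N, D) features; B_q/B_k/B_v: lists of h projection matrices (D, D/h);
    B_o: (D, D) output projection.
    """
    N, D = F.shape
    d = D // h
    heads = []
    for i in range(h):
        Q, K, V = F @ B_q[i], F @ B_k[i], F @ B_v[i]   # each (N, D/h)
        A = softmax(Q @ K.T / np.sqrt(d))              # (N, N) attention weights
        heads.append(A @ V)                            # (N, D/h)
    return np.concatenate(heads, axis=1) @ B_o         # (N, D)

rng = np.random.default_rng(0)
N, D, h = 8, 24, 12                                    # toy sizes; the paper uses h = 12
F = rng.normal(size=(N, D))
B_q = [rng.normal(size=(D, D // h)) for _ in range(h)]
B_k = [rng.normal(size=(D, D // h)) for _ in range(h)]
B_v = [rng.normal(size=(D, D // h)) for _ in range(h)]
B_o = rng.normal(size=(D, D))
out = multi_head_attention(F, B_q, B_k, B_v, B_o, h)
print(out.shape)  # (8, 24)
```

Note how the output keeps the input's (N, D) shape, which is what allows the paper to stack this layer L times.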

Feature Fusion Module
Inspired by SimGNN [20] and SG-PR [4], this paper assigns weights to local descriptors with the help of the attention mechanism.It learns autonomously and assigns larger weight values to high-quality local descriptors so that the aggregated global descriptors can better describe the whole point cloud.
As shown in Figure 5, a learnable parameter matrix J ∈ R^(D×D) is set. First, J is multiplied with the average of all local descriptors to construct a global context that provides global structure and feature information. After that, a sigmoid function is used to constrain the weights to [0, 1], assigning a weight value to each local descriptor. Finally, the global descriptor e ∈ R^D is obtained through weighted summation.
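A SimGNN-style [20] instantiation of this weighting scheme can be written as follows, where c denotes the global context, w_n the per-descriptor weight, and f_n ∈ R^(1×D) the n-th local descriptor (these symbols and this exact form are our reconstruction, not necessarily DAT-Net's equation):

```latex
c \;=\; \tanh\!\left( \Big( \tfrac{1}{N} \sum_{m=1}^{N} f_m \Big) J \right), \qquad
w_n \;=\; \sigma\!\left( f_n\, c^{\top} \right), \qquad
e \;=\; \sum_{n=1}^{N} w_n\, f_n \;\in\; \mathbb{R}^{D},
```

so that descriptors aligned with the global context receive weights near 1 and dominate the aggregated global descriptor e.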


Loss Function
In this paper, two linearly connected layers are used to reduce the dimensionality of the matching vectors, and a sigmoid function is used to obtain the similarity p(y_i) ∈ [0, 1] of the scenes. The loss function used for training is the binary cross-entropy loss:

Loss = −(1/N) Σ_{i=1}^{N} [ y_i log p(y_i) + (1 − y_i) log(1 − p(y_i)) ],

where y_i is the binary label 0 or 1, and p(y_i) is the probability that the output belongs to label y_i.
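A minimal NumPy sketch of this loss over a batch of scene-pair labels and predicted similarities (the clipping constant `eps` is a common numerical-stability convention we add, not a detail from the paper):

```python
import numpy as np

def bce_loss(y, p, eps=1e-7):
    """Binary cross-entropy over labels y (0/1) and predicted similarities p in [0, 1]."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

y = np.array([1.0, 0.0, 1.0, 0.0])   # ground-truth: same place / different place
p = np.array([0.9, 0.1, 0.8, 0.2])   # predicted similarity p(y_i)
print(round(bce_loss(y, p), 4))      # 0.1643
```

Confident correct predictions (p near the label) drive the loss toward 0, while confident wrong ones are penalized heavily through the logarithm.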

Test Results and Performance Analysis
In order to evaluate the effectiveness of the proposed method, experiments are conducted on the SubT-Tunnel [21] underground tunnel dataset, which consists of three different tunnel scenarios: S1, S2, and E1. The training server uses an Intel Core i7-6950X CPU with four NVIDIA GeForce GTX 1080Ti GPUs, each with 11 GB of memory. The experiments are implemented in the PyTorch [22] deep learning framework on Ubuntu 18.04, with a GPU (GTX 1080Ti) used for training and testing. The model parameters are optimized with the Adam [23] optimizer, with an initial learning rate of 0.0001 that decays with the number of iterations, a weight decay of 0.0005, a batch size of 1024, and a maximum of 500 training epochs.

Experimental Setup
In the experiments, point cloud pairs with timestamps differing by more than 30 s and Euclidean distances of less than 3 m are defined as positive samples, and point cloud pairs with Euclidean distances of more than 20 m are defined as negative samples.
During training, sample pairs with timestamp differences of less than 30 s are also included in the training set to add interference terms. During evaluation, positive and negative samples with a time difference of less than 30 s are eliminated. This means the algorithm is not evaluated on trivially easy positive pairs (neighboring scenarios), allowing a more accurate assessment of its capabilities in real-world environments. In this paper, DAT-Net is compared with state-of-the-art algorithms such as M2DP [6], LPD-Net [15], DISCO [24], PointNetVLAD [8], SeqOT [25], and RINet [5]. For a fair comparison, the F1 score [26] is used uniformly to measure algorithm performance; it is defined as F1 = 2PR/(P + R), where P denotes precision and R denotes recall.
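For concreteness, the F1 score used above can be computed from a confusion count as follows (the example counts are illustrative, not results from the paper):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR/(P+R) from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)   # P: fraction of declared matches that are correct
    recall = tp / (tp + fn)      # R: fraction of true revisits that are found
    return 2 * precision * recall / (precision + recall)

# e.g. 80 correctly recognized revisits, 10 false matches, 25 missed revisits
print(round(f1_score(80, 10, 25), 3))  # 0.821
```

Because F1 is the harmonic mean of P and R, it penalizes methods that trade one for the other, which is why the paper reports the maximum F1 along each P-R curve.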

Analysis of Experimental Results
Qualitative analysis: Figure 6 shows the precision-recall plot. The P-R curve of DAT-Net shows that it has higher recall and precision than the other algorithms in each scene, indicating that the proposed method maintains better recognition across different underground tunnel scenes. In addition, unlike in the S1 and S2 scenarios, the performance of every algorithm decreases somewhat in the mega-scene E1. However, DAT-Net maintains stable performance in E1 compared to the other state-of-the-art algorithms, indicating excellent accuracy and stability.

Quantitative analysis: According to the results in Table 1, DAT-Net achieved excellent results on the underground tunnel dataset, with an average maximum F1 score of 0.841. In this experiment, DAT-Net scored on average 0.134 higher than the second-place RINet across the S1, S2, and E1 scenarios. The results of M2DP and LPD-Net were unsatisfactory, mainly because they convert the data to a top view, losing some features of the underground tunnel data. SeqOT also loses some valid information during data conversion. PointNetVLAD only considers the features of individual points and ignores their connection with neighboring features. Underground tunnel scenes have sparser effective features than normal scenes, so these algorithms struggle to achieve high accuracy in tunnels due to their ineffective utilization of features. It is worth noting that the E1 series used in the experiments was recorded in an ultra-large-scale, complex underground tunnel environment, and all algorithms lost accuracy on this series. DAT-Net still obtained a score as high as 0.797 in the E1 scene, indicating that the proposed method retains excellent recognition capability and stability in large underground tunnel scenes.

Table 1. Average F1 scores of each advanced algorithm on the SubT-Tunnel dataset.