Hypergraph Position Attention Convolution Networks for 3D Point Cloud Segmentation

Point cloud segmentation, as the basis for 3D scene understanding and analysis, has made significant progress in recent years. Graph-based modeling and learning methods have played an important role in point cloud segmentation. However, due to the inherent complexity of point cloud data, it is difficult for graph learning methods to capture the higher-order and complex features of 3D data. In addition, extracting important features from point clouds quickly and efficiently remains a major challenge. To address these challenges, we propose a new framework, called hypergraph position attention convolution networks (HGPAT), for point cloud segmentation. First, we use a hypergraph to model the higher-order relationships among points. Second, to learn the feature information of point cloud data effectively, we propose a hyperedge position attention convolution module, which uses a hyperedge-hyperedge propagation pattern to extract and aggregate the more important features. Finally, we design a ResNet-like module to reduce the computational complexity of the network and improve its efficiency. We conducted point cloud segmentation experiments on the ShapeNet Part and S3DIS datasets, and the results demonstrate the effectiveness of the proposed method compared with state-of-the-art methods.


Introduction
With the development of 3D scanning and imaging technologies, point cloud data have become increasingly easy to acquire, and their applications have expanded from remote sensing to a wide range of fields, such as robotics, virtual reality, autonomous driving and smart cities. Three-dimensional point clouds have shown great application potential, facilitating the interaction between machines and the real world [1]. Point cloud semantic segmentation is an important part of 3D processing technology. By giving machines the ability to recognize and classify elements of the surrounding environment, semantic segmentation plays an important role in enhancing our perception of the world [2,3]. The segmentation results directly affect downstream applications such as autonomous driving and robot navigation; therefore, accurate and efficient point cloud segmentation is essential.
Point clouds are unordered, unstructured, high-dimensional and non-uniform in density. Currently, deep learning is the mainstream approach to point cloud semantic segmentation, and two commonly used families are point-based and graph convolution-based methods. Point-based methods such as PointNet [4] and PointNet++ [5] process points independently at a local scale to maintain permutation invariance. However, this independence ignores the geometric relationships between points, so these methods cannot fully capture local features [6,7]. Graph convolution-based methods, such as GACNet [8] and HDGCN [9], model point clouds as graphs and then learn point cloud features with graph neural networks. These methods improve point cloud segmentation performance considerably. However, they consider only pairwise relationships among data and ignore higher-order relationships, that is, relationships that jointly involve more than two objects [10]. A growing number of studies have shown that modeling higher-order relationships helps uncover latent connections between data samples and thus improves model capability. A hypergraph is a generalized graph structure that extends the traditional notion of a graph. It is composed of a vertex set and a hyperedge set, where each hyperedge is a subset of the vertex set and can connect one or more vertices [11]. This structure allows hypergraphs to represent and handle complex relationships more flexibly.
Owing to the advantages of hypergraphs in modeling data correlations, some researchers have recently adopted hypergraph tools to analyze and process point cloud data, although related studies remain sparse. Zhang et al. [12] introduced a tensor-based approach to estimate the hypergraph spectral components and frequency coefficients of point clouds in both ideal and noisy environments, established an analytical relationship between hypergraph frequencies and structural features, and then evaluated the effectiveness of hypergraph spectra in point cloud sampling and denoising. Subsequently, they investigated the capability of hypergraph spectral analysis for unsupervised segmentation of 3D point clouds [13]. In addition, exploiting the robustness of hypergraphs to noise, Jiang et al. [14] proposed a hypergraph construction-compression-conversion method for 3D object detection in noisy point clouds: hypergraphs are constructed with multi-scale voxelized structures and clustering methods, then converted into graphs whose features are learned with graph neural networks. Deng et al. [15] investigated point cloud resampling based on hypergraph signal processing (HGSP) and designed hypergraph spectral filters to capture multilateral interactions between nodes. Among these methods, those based on the hypergraph spectrum and hypergraph signal processing incur excessive computational complexity, while converting a hypergraph into a graph not only adds a conversion cost but also loses higher-order information. Therefore, it is necessary to apply deep learning techniques, such as hypergraph neural networks, directly to hypergraphs, so as to fully exploit their representation learning ability on higher-order data correlations, explore the latent information in the data comprehensively, and obtain better point cloud semantic segmentation performance.
A hypergraph neural network is a neural network structure that uses higher-order data correlations for representation learning. Compared with graph neural networks, it is better able to capture both global and local information in data. Hypergraph neural networks have shown excellent performance in a variety of tasks, such as object retrieval and classification [16], action recognition [17], sentiment prediction [18] and recommender systems [19]. In [20][21][22][23], hypergraphs are constructed with a k-nearest neighbor (kNN) strategy, and a hyperedge convolution operator is proposed that obtains the output feature of each vertex by aggregating the features of the hyperedges containing it. The works [19,21,24] consider attention at both the node level and the hyperedge level, introducing attention mechanisms into vertex convolution and hyperedge convolution to automatically learn different weights for vertices and hyperedges during feature transformation and propagation. Although hypergraph neural networks have shown significant advantages in a variety of tasks, applying them to point cloud semantic segmentation still poses challenges. First, the widely adopted hyperedge convolution operator [20,25] can effectively aggregate the local information of nodes in the hypergraph, but for discrete and unordered point cloud data it struggles to capture the correlations between local and global features, which leads to incomplete and distorted information. Second, some current hypergraph attention operators [26,27] adopt a vertex-hyperedge-vertex feature transformation mode that uses hyperedges as an intermediate layer, which increases the number of parameters and thus the model complexity. These limitations make it difficult for current hypergraph neural networks to process large-scale point cloud data. Therefore, there is an urgent need for hypergraph neural network structures better suited to point cloud processing, so as to improve point cloud semantic segmentation performance.
To this end, we propose an end-to-end hypergraph deep learning framework, the hypergraph position attention convolution network, for semantic segmentation of point clouds. Specifically, to organize disordered, unstructured and high-dimensional point clouds efficiently, we construct a hypergraph that captures correlations among points by combining farthest point sampling and ball query. We then propose a hyperedge position attention convolution operator for extracting high-level semantic features of point clouds. This operator adopts a hyperedge-hyperedge feature propagation model, which not only effectively uses the spatial position information and higher-order information of the point cloud but also avoids the vertex-to-hyperedge propagation step, reducing the number of network parameters. Finally, we design a ResNet-like module for feature learning, which further improves the efficiency of the network by introducing depthwise convolution. The main contributions of our work are summarized as follows:

1. We propose a new hyperedge position attention convolution module for feature extraction, which makes the network focus on task-related feature information by combining the position information of the points with hyperedges generated from other features.

2. We design a hypergraph position attention convolution network framework for semantic segmentation of point clouds. In particular, we introduce a ResNet-like depthwise convolution module to lighten the network and improve its efficiency.

3. We perform segmentation experiments and a series of ablation experiments on the S3DIS and ShapeNet Part datasets to validate the performance of the proposed method.

Related Work
The key techniques involved in this paper include deep learning-based semantic segmentation of 3D point clouds, hypergraph learning and hypergraph attention networks, which are briefly described below.

Deep Learning-Based Semantic Segmentation of 3D Point Clouds
Voxel-based methods [28] convert unstructured geometric data into regular 3D voxel grids to which standard CNNs can be applied directly. These methods effectively regularize unorganized point clouds. However, because point clouds are non-uniform, the resulting voxel grids contain many redundant cells, which require substantial computation and memory. Multiview-based methods [29,30] project 3D objects to 2D images from different viewpoints, process them with 2D convolution and then fuse these features to predict the results. Compared with voxel-based methods, they reduce the memory cost, but they are sensitive to the choice of viewpoints and lose geometric information of the point cloud. Point-based methods such as the well-known PointNet [4] and PointNet++ [5] process raw point cloud data directly to preserve spatial features as much as possible and achieve impressive results. However, these methods ignore the geometric relationships between points, resulting in a lack of local features. Graph convolution-based methods [8,9,31] treat each point in the point cloud as a graph vertex and connect it to neighboring points with directed edges, then learn point and edge features in the spatial or spectral domain to capture the local geometric structure of point clouds. However, they do not take higher-order relationships between objects into account and therefore struggle to capture the complex structures present in 3D data.

Hypergraph Neural Networks
Recently, a number of hypergraph neural networks have been proposed for various machine learning tasks, such as image classification, object detection, semantic segmentation and human pose estimation. Depending on how they process hypergraphs, they can be divided into two categories. The first maps the hypergraph into a graph so that graph convolution methods can be applied. For example, in HyperGCN [25], each hyperedge is approximated by a set of pairwise edges, and a GCN is then run on the resulting graph. Line hypergraph convolution networks (LHCNs) [32] map the hypergraph to a weighted and attributed line graph and use a GCN to learn on the line graph. These methods incur a transformation cost and do not retain higher-order information well. The second category designs convolution operators directly on the hypergraph. For example, Feng et al. [20] proposed the Hypergraph Neural Network (HGNN) framework, in which hyperedge convolution is designed with the hypergraph Laplacian so that output vertex features are obtained by aggregating the features of their associated hyperedges. Considering that the hypergraph structure may differ from layer to layer, Dynamic Hypergraph Neural Networks (DHGNNs) [22] further extended HGNN by adding a dynamic hypergraph construction block and designing vertex convolution and hyperedge convolution to aggregate vertex and hyperedge features, respectively. Bai et al. [26] proposed both a hyperedge convolution operator, identical to that of HGNN, and a hypergraph attention operator that uses an attention mechanism to measure the connectivity between vertices and hyperedges. These methods preserve higher-order information to a large degree while avoiding complex transformation processes, and they open up new perspectives for research on point cloud segmentation.

Hypergraph Attention Networks
In recent years, hypergraph attention networks (HGATs) have become an important research direction in the field of graph neural networks. The architecture of HANs [33] is designed to take full advantage of an adaptive attention mechanism, which helps capture the association information between the hypergraph and its nodes. Some works aggregate higher-order correlation information between nodes by constructing hyperedges and introducing attention mechanisms [34]. In addition, certain methods [21] introduce attention mechanisms at both the node and hyperedge levels, enabling the automatic learning of different weights for vertices and hyperedges during vertex and hyperedge convolution. However, these methods may face high computational complexity on large-scale hypergraphs, which has motivated researchers to design more efficient hypergraph attention mechanisms. Hypergraph attention networks have also been combined with gated recurrent unit networks [35] to capture higher-order correlations and temporal features. To further enhance the performance and scalability of hypergraph attention networks, Cui et al. [36] organized intra-hyperedge, inter-hyperedge and inter-hypergraph attention modules hierarchically to reduce information loss in hypergraph convolution networks. These works have opened new possibilities for hypergraph attention networks in challenging tasks with complex relationships. In this paper, we follow this line of thought and construct an attention mechanism that aggregates feature information more effectively while reducing complexity.

Method
In this section, we introduce the proposed hypergraph position attention convolution network.First, we review graph attention and hypergraph definitions.Second, we describe the hypergraph construction process.Finally, we describe the architecture of the network and the corresponding modules in detail.

Review
Because our hyperedge attention convolution is inspired by graph attention, we first introduce the relevant theory of graph attention and then review the definition of a hypergraph.

Graph Attention
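The input to the graph attention layer is a set of node features

$\mathbf{h} = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}, \quad \vec{h}_i \in \mathbb{R}^F,$

where N is the number of nodes and F is the number of features of each node. To obtain sufficient expressive power to transform the input features into higher-level features, a shared linear transformation with weight matrix $\mathbf{W} \in \mathbb{R}^{F' \times F}$ is applied to every node, and a shared attention mechanism $a$ is then executed on the nodes to compute the attention coefficients

$e_{ij} = a(\mathbf{W}\vec{h}_i, \mathbf{W}\vec{h}_j),$

which indicate the importance of the features of node j to node i. To make the coefficients easily comparable across different nodes, they are normalized over all choices of j with the softmax function:

$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},$

where $\mathcal{N}_i$ is the neighborhood of node i. Once obtained, the normalized attention coefficients are used to compute a linear combination of the corresponding features, which serves as the final output feature of every node:

$\vec{h}_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \mathbf{W}\vec{h}_j\Big),$

where σ denotes the activation function.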
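The three steps of graph attention (shared projection, softmax-normalized coefficients, weighted aggregation) can be sketched in plain Python. This is a minimal single-head sketch, not the paper's code: following the original GAT formulation, it assumes the attention mechanism a is a single-layer feed-forward network, i.e., a dot product of a weight vector `a` with the concatenation [W h_i || W h_j] followed by LeakyReLU (slope 0.2), and it takes σ to be ReLU for simplicity. The function name `gat_layer` and the neighbor-list input format are illustrative choices.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gat_layer(h, neighbors, W, a, slope=0.2):
    """Single-head graph attention layer (illustrative sketch).

    h:         N node feature vectors, each of length F
    neighbors: neighbors[i] lists the nodes j in N_i (include i for a self-loop)
    W:         F' x F shared projection matrix
    a:         attention weight vector of length 2F'
    Returns (outputs, alphas); alphas[i][k] is the weight of neighbors[i][k].
    """
    Wh = [matvec(W, hi) for hi in h]
    outputs, alphas = [], []
    for i, nbrs in enumerate(neighbors):
        # e_ij = LeakyReLU(a^T [W h_i || W h_j])
        scores = []
        for j in nbrs:
            s = sum(ak * zk for ak, zk in zip(a, Wh[i] + Wh[j]))
            scores.append(s if s > 0 else slope * s)
        # alpha_ij = softmax over j in N_i (shifted by the max for numerical stability)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [x / total for x in exps]
        # h'_i = sigma(sum_j alpha_ij W h_j), with sigma = ReLU in this sketch
        agg = [sum(al * Wh[j][d] for al, j in zip(alpha, nbrs)) for d in range(len(Wh[i]))]
        outputs.append([max(0.0, v) for v in agg])
        alphas.append(alpha)
    return outputs, alphas


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
neighbors = [[0, 1], [1, 0], [2, 0, 1]]
out, alphas = gat_layer(h, neighbors, W, a=[0.0, 0.0, 0.0, 0.0])
# out[0] == [0.5, 0.5] and alphas[0] == [0.5, 0.5]
```

With an all-zero `a`, every score is equal and the layer reduces to mean aggregation over each neighborhood; a non-zero `a` makes the weights depend on the projected features.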

Hypergraph Definition
A hypergraph is an important concept in discrete mathematics and a generalization of ordinary graphs [37]. Unlike an ordinary graph, in which each edge connects exactly two vertices, a hypergraph allows a hyperedge to connect more than two vertices. Thus, a hypergraph can represent higher-order relations more flexibly. A hypergraph $G = (V, E, \mathbf{W})$ consists of a vertex set V, a hyperedge set E and a diagonal matrix $\mathbf{W}$ of hyperedge weights, in which each hyperedge $e \in E$ is assigned a positive weight $w(e)$. A hypergraph with N vertices can be represented by an incidence matrix $\mathbf{H} \in \mathbb{R}^{N \times |E|}$ whose elements are defined as

$h(v, e) = \begin{cases} 1, & v \in e, \\ 0, & v \notin e. \end{cases}$

The degree of a vertex $v \in V$ based on $\mathbf{H}$ is

$d(v) = \sum_{e \in E} w(e)\, h(v, e),$

and the degree of a hyperedge $e \in E$ is

$\delta(e) = \sum_{v \in V} h(v, e).$

Further, let $\mathbf{D}_v$ and $\mathbf{D}_e$ denote the diagonal matrices of the vertex degrees and hyperedge degrees, respectively.
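To make these definitions concrete, the following plain-Python sketch builds the incidence matrix H from a list of hyperedges, computes the vertex degrees d(v) and hyperedge degrees δ(e), and applies one step of the HGNN-style hyperedge convolution of Feng et al. [20], D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X, without the learnable matrix Θ and the nonlinearity σ of the full HGNN layer. The function names are illustrative, and the sketch assumes every vertex belongs to at least one hyperedge so that d(v) > 0.

```python
import math


def incidence_matrix(num_vertices, hyperedges):
    """H[v][e] = 1 if vertex v belongs to hyperedge e, else 0."""
    H = [[0] * len(hyperedges) for _ in range(num_vertices)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


def vertex_degrees(H, w):
    """d(v) = sum_e w(e) h(v, e)."""
    return [sum(w_e * h_ve for w_e, h_ve in zip(w, row)) for row in H]


def hyperedge_degrees(H):
    """delta(e) = sum_v h(v, e)."""
    return [sum(row[e] for row in H) for e in range(len(H[0]))]


def hgnn_propagate(X, H, w):
    """One step of D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X (no Theta, no sigma)."""
    dv = vertex_degrees(H, w)
    de = hyperedge_degrees(H)
    n, m, f = len(H), len(w), len(X[0])
    # Y = D_v^{-1/2} X
    Y = [[x / math.sqrt(dv[v]) for x in X[v]] for v in range(n)]
    # hyperedge features: (w(e) / delta(e)) * sum of the member rows of Y
    E = [[w[e] / de[e] * sum(H[v][e] * Y[v][c] for v in range(n)) for c in range(f)]
         for e in range(m)]
    # back to vertices: D_v^{-1/2} H E
    return [[sum(H[v][e] * E[e][c] for e in range(m)) / math.sqrt(dv[v]) for c in range(f)]
            for v in range(n)]


hyperedges = [[0, 1], [3], [1, 2]]
H = incidence_matrix(4, hyperedges)   # [[1, 0, 0], [1, 0, 1], [0, 0, 1], [0, 1, 0]]
w = [1.0, 1.0, 1.0]
X = [[1.0], [1.0], [1.0], [1.0]]
X_new = hgnn_propagate(X, H, w)
```

Vertex 1 lies in two hyperedges, so d(v_1) = 2, while vertex 3 forms a singleton hyperedge and keeps its feature of 1.0 after propagation.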

Hypergraph Construction
In this paper, we represent point cloud data as a hypergraph. We sample the original point cloud with farthest point sampling (FPS) to identify representative centroids, and we then use ball query to generate hyperedges that capture the local and global structural relationships within the point cloud, as shown in Figure 1. First, a vertex set $V = \{v_1, v_2, \ldots, v_N\}$ is obtained from a point cloud $X = \{x_1, x_2, \ldots, x_N\}$ with N points. The representative vertex set $V_F = \{v_{F,1}, v_{F,2}, \ldots, v_{F,p}\} \subseteq V$ is then obtained with the farthest point sampling algorithm, where $n_c \le p \le N$, p is the number of sampled points and $n_c$ is the number of target classes. Next, we use ball query to divide the entire point set into a set of hyperedges. Specifically, one sampled point $v_{F,i} \in V_F$ is chosen as the centroid at a time, and all nodes within the ball of radius R centered on it (including the centroid) form a hyperedge. An important criterion for choosing the radius R is that the resulting hypergraph be connected, that is, a reachable path exists between any pair of vertices in the hypergraph. The sampling radius $R_i$ of a sampled point $v_{F,i}$ is therefore restricted to lie between its minimum and maximum distances from the other sampled points:

$\min(\mathrm{dist}(v_{F,i}, V_F')) \le R_i \le \max(\mathrm{dist}(v_{F,i}, V_F')),$

where $\max(\mathrm{dist}(v_{F,i}, V_F'))$ is the maximum sampling radius and $\min(\mathrm{dist}(v_{F,i}, V_F'))$ is the minimum sampling radius. $V_F'$ is obtained by removing from $V_F$ the starting point $v_{F,1}$, the centroid $v_{F,i}$ and the sampled points $v_{F,j_1}, \ldots, v_{F,j_q}$ that have already been used by other nodes to determine their radii:

$V_F' = V_F \setminus \{v_{F,1}, v_{F,i}, v_{F,j_1}, \ldots, v_{F,j_q}\}.$

In this way, we obtain the hypergraph $G = (V_F, E)$ generated from the point cloud, where $V_F$ denotes the vertex set and E denotes the hyperedge set. Compared with kNN-based hypergraph construction, this method is better suited to point cloud data with uneven distribution and irregular structure, and it also reduces the cost of computing distances.
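The construction steps can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `farthest_point_sampling` starts from a given index and repeatedly picks the point farthest from the centroids chosen so far; `radius_bounds` returns the interval [min dist(v_{F,i}, V_F'), max dist(v_{F,i}, V_F')] for one centroid, where the optional `used` argument (positions of centroids already consumed by other nodes) is a hypothetical input because the text does not specify how that set is tracked; and `ball_query_hyperedges` forms one hyperedge per centroid from all points within that centroid's radius. Taking the lower bound as each radius in the usage example is also an assumption made for illustration.

```python
import math


def farthest_point_sampling(points, p, start=0):
    """Return the indices of p centroids chosen by farthest point sampling."""
    chosen = [start]
    # distance from every point to its nearest chosen centroid so far
    nearest = [math.dist(points[start], q) for q in points]
    for _ in range(1, p):
        nxt = max(range(len(points)), key=lambda k: nearest[k])
        chosen.append(nxt)
        nearest = [min(d, math.dist(points[nxt], q)) for d, q in zip(nearest, points)]
    return chosen


def radius_bounds(points, centroids, i, used=()):
    """(min, max) distance from centroid i to V_F'. V_F' excludes the starting
    centroid (position 0), centroid i itself and any positions listed in `used`
    (positions are indices into `centroids`). Returns None if V_F' is empty."""
    excluded = {0, i, *used}
    others = [c for pos, c in enumerate(centroids) if pos not in excluded]
    if not others:
        return None
    dists = [math.dist(points[centroids[i]], points[c]) for c in others]
    return min(dists), max(dists)


def ball_query_hyperedges(points, centroids, radii):
    """One hyperedge per centroid: every point within its radius (centroid included)."""
    return [
        [k for k, q in enumerate(points) if math.dist(points[c], q) <= r]
        for c, r in zip(centroids, radii)
    ]


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
centroids = farthest_point_sampling(points, p=3)          # [0, 3, 2]
radii = [radius_bounds(points, centroids, i)[0] for i in range(len(centroids))]
# radii == [2.0, 8.0, 8.0]
hyperedges = ball_query_hyperedges(points, centroids, radii)
# hyperedges == [[0, 1, 2], [2, 3], [0, 1, 2, 3]]
```

In this toy example the isolated point at x = 10 still ends up sharing hyperedges with the other points, which illustrates why the radius bounds are tied to the distances between sampled centroids: they keep the hypergraph connected.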

Hypergraph Position Attention Convolution Networks
Figure 2 shows the hypergraph position attention convolution network framework for point cloud segmentation. First, a hypergraph is constructed from the input point cloud to obtain nodes and hyperedges. The data are then passed into the ResNet-like module, where they undergo a 1 × 1 convolution. Next, the features enter the hyperedge position attention convolution module and pass through the DWConv module, batch normalization (BN) and an activation function; the result is multiplied with the features carried by the residual connection to obtain the new features. Finally, the features are downsampled by a pooling operation and then upsampled, with interpolation and skip connections used to recover the per-point features.

Hyperedge Position Attention Convolution Module
The purple part of Figure 2 shows the proposed hyperedge position attention convolution module, which aims to focus efficiently on the local features of the point cloud. We use hyperedge-hyperedge propagation to construct a position-adaptive attention mechanism that focuses on the most relevant parts of the local features, allowing the convolution kernel to adapt dynamically to the complex spatial variations and geometric structure of point clouds. The attention mechanism dynamically adjusts the weights of the input features, and we compute attention weights between pairs of hyperedges. In this computation, || denotes concatenation and Conv1D denotes the 1 × 1 convolution operation. θ denotes the DWConv operation, which is used to reduce the number of parameters of the convolution. The first term in θ, $o_{e_i}$, denotes the hyperedge feature constructed from the spatial location relationships of the point cloud, which helps aggregate the spatial information of the unordered vertices. The second term, $q_{e_i}$, encodes the spatial location relationships between neighboring points on each hyperedge and aims to enhance the local feature information. By considering the spatial relationships between neighboring hyperedges, our method captures local structural information more accurately and thereby improves model performance. The attention weights quantify the degree of association between hyperedges, which leads to more accurate feature aggregation and point cloud segmentation. Next, we normalize the attention weights over all neighbors of hyperedge i to obtain the hyperedge attention factor α for the whole hypergraph, with α ∈ [0, 1]. In the final output of the module, BN denotes batch normalization and ReLU denotes the activation function, which is applied to obtain the final coefficients.
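The exact attention formulas of this module are not reproduced in this text, so the following is only a hypothetical sketch of the general idea of hyperedge-hyperedge attention: each hyperedge receives a feature (here assumed to be the mean of its member vertex features); a score for each neighboring hyperedge is computed from the concatenation [f_i || f_j || (c_j - c_i)] of the two hyperedge features and the offset between their centroids; the scores are softmax-normalized over the neighbors of hyperedge i so that every α lies in [0, 1]; and the neighbors' features are aggregated with these weights. The scoring vector `a`, the mean pooling and the centroid offset are assumptions standing in for the paper's Conv1D, θ (DWConv), $o_{e_i}$ and $q_{e_i}$ terms, and BN and ReLU are omitted.

```python
import math


def hyperedge_features(X, hyperedges):
    """Hyperedge feature = mean of its member vertex features (assumed pooling)."""
    return [
        [sum(X[v][c] for v in members) / len(members) for c in range(len(X[0]))]
        for members in hyperedges
    ]


def hyperedge_attention(feats, centers, neighbors, a):
    """Hypothetical hyperedge-hyperedge attention.

    feats:     one feature vector (length F) per hyperedge
    centers:   xyz coordinates of each hyperedge's centroid
    neighbors: neighbors[i] lists the hyperedges j attended to by hyperedge i
    a:         scoring vector of length 2F + 3 applied to [f_i || f_j || (c_j - c_i)]
    Returns (alphas, outputs); each row of alphas lies in [0, 1] and sums to 1.
    """
    alphas, outputs = [], []
    for i, nbrs in enumerate(neighbors):
        scores = []
        for j in nbrs:
            offset = [cj - ci for ci, cj in zip(centers[i], centers[j])]
            z = feats[i] + feats[j] + offset  # list concatenation
            scores.append(sum(ak * zk for ak, zk in zip(a, z)))
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [x / total for x in exps]
        alphas.append(alpha)
        outputs.append([sum(al * feats[j][c] for al, j in zip(alpha, nbrs))
                        for c in range(len(feats[i]))])
    return alphas, outputs


X = [[1.0], [3.0], [5.0]]
feats = hyperedge_features(X, [[0, 1], [1, 2]])            # [[2.0], [4.0]]
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
alphas, out = hyperedge_attention(feats, centers, [[0, 1], [1, 0]], a=[0.0] * 5)
# uniform weights: alphas == [[0.5, 0.5], [0.5, 0.5]], out == [[3.0], [3.0]]
```

Because the scores are computed directly between hyperedges, no intermediate vertex-to-hyperedge attention pass is needed, which mirrors the parameter-saving argument made for the module.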
Comparison with existing methods: Current hypergraph attention convolution operators [21,26] usually adopt the vertex-hyperedge-vertex feature transformation model, introducing attention into vertex convolution and hyperedge convolution and automatically learning different weights for vertices and hyperedges during feature transformation and propagation. However, this approach increases the number of model parameters and makes training more difficult. In contrast, we design the hypergraph position attention convolution with a hyperedge-hyperedge feature propagation model. It eliminates the vertex-hyperedge transformation step and simplifies the model architecture, which reduces the number of network parameters and the model complexity. The designed convolution operator thus addresses some limitations of current hypergraph attention networks and provides a more efficient solution for point cloud segmentation and related tasks. Experimental comparisons reported later demonstrate the effectiveness of the method.

ResNet-like Module
In deep learning, the residual network (ResNet) [38] is known for its residual module design, which consists mainly of multiple convolutional layers and shortcut connections. It effectively alleviates the gradient vanishing and network degradation problems of very deep networks while keeping the number of parameters low. DWConv [39] decomposes the standard convolution into a depthwise convolution and a pointwise convolution. In the depthwise stage, each input channel is convolved independently, which extracts the spatial features within each channel. The pointwise convolution then fuses the features from different channels to generate the final output feature map. Inspired by ResNet, we introduce DWConv to construct a ResNet-like module, shown in the red part of Figure 2a. The hyperedge information is fed into the hyperedge position attention convolution module, and the result is multiplied with the processed node information to obtain the output features. Integrating this module into the hypergraph position attention convolution network reduces the computational complexity of the model and improves the efficiency of the network. In the formula for the output features, $V_F$ denotes the set of input vertex features, and each feature $V_{F,i} \in \mathbb{R}^F$ is associated with a hypergraph vertex; besides the basic 3D coordinates (XYZ), the features of each vertex may include other information such as colors and normal vectors. W′ denotes the weight bank, defined as $W' = \{W_i \mid i = 1, \ldots, D\}$, where each $W_i$ is a weight matrix and D controls the number of weight matrices in the bank. ⊙ denotes the Hadamard product, i.e., the element-wise product of two vectors. With this module, we obtain a hypergraph neural network that performs point cloud segmentation efficiently.
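The parameter saving that motivates DWConv, and the element-wise gating by the Hadamard product ⊙, can be illustrated with a small plain-Python sketch. It assumes 1D feature sequences (one list per channel), a "valid" depthwise cross-correlation without padding or bias, and a pointwise 1 × 1 convolution given as a C_out × C_in matrix. `param_counts` compares k·C_in·C_out weights for a standard convolution with k·C_in + C_in·C_out for the depthwise-plus-pointwise factorization (biases ignored). How the paper combines the weight bank W′ with these operations is not modeled here, and all function names are illustrative.

```python
def depthwise_conv1d(x, kernels):
    """x: C channels, each a list of length L; kernels: one length-k kernel per channel.
    Each channel is filtered only by its own kernel (valid cross-correlation)."""
    out = []
    for channel, kern in zip(x, kernels):
        k = len(kern)
        out.append([sum(kern[t] * channel[s + t] for t in range(k))
                    for s in range(len(channel) - k + 1)])
    return out


def pointwise_conv1d(x, weight):
    """1 x 1 convolution: weight is C_out x C_in and mixes channels at every position."""
    length = len(x[0])
    return [[sum(w * x[c][s] for c, w in enumerate(row)) for s in range(length)]
            for row in weight]


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally shaped feature maps."""
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def param_counts(c_in, c_out, k):
    """(standard conv weights, depthwise separable conv weights), biases ignored."""
    return k * c_in * c_out, k * c_in + c_in * c_out


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]               # 2 channels, 3 positions
dw = depthwise_conv1d(x, [[1.0, 1.0], [0.0, 1.0]])   # [[3.0, 5.0], [5.0, 6.0]]
pw = pointwise_conv1d(dw, [[1.0, 1.0]])              # [[8.0, 11.0]]
gated = hadamard(pw, [[0.5, 2.0]])                   # [[4.0, 22.0]]
print(param_counts(64, 128, 3))                      # (24576, 8384)
```

For a 3-tap kernel mapping 64 to 128 channels, the factorized form needs 8384 weights instead of 24576, roughly a third, which is the kind of saving the ResNet-like module relies on.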

Pooling Operation
The pooling operation aggregates local features to downsample the point cloud, which helps retain important feature information while reducing computation. In point cloud segmentation, the feature of each vertex in hypergraph $G^{l+1}$ is aggregated from the output features of the previous-level hypergraph $G^{l}$. The vertex features $V^{l+1}$ of layer l+1 in $G^{l+1}$ are computed as

$V^{l+1} = \mathrm{pooling}(\Phi(V^{l}_{out})), \quad (13)$

where $V^{l}_{out}$ denotes the vertex output features at layer l, Φ denotes the ResNet-like attention convolution operation, and pooling denotes the pooling function; we use max pooling for downsampling.
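Equation (13) can be sketched as follows, assuming for illustration that each vertex of $G^{l+1}$ corresponds to one hyperedge of $G^{l}$ (for example, a ball-query group around an FPS centroid) and that pooling is a channel-wise maximum over the members of that hyperedge. Φ is replaced by a hypothetical placeholder argument `phi` (identity by default), since the ResNet-like operation is described separately.

```python
def max_pool_hyperedges(features, hyperedges, phi=lambda f: f):
    """V^{l+1} = pooling(Phi(V^l_out)): channel-wise max of the Phi-transformed
    features over the members of each hyperedge (one output vertex per hyperedge)."""
    transformed = [phi(f) for f in features]
    num_channels = len(transformed[0])
    return [
        [max(transformed[v][c] for v in members) for c in range(num_channels)]
        for members in hyperedges
    ]


features = [[1.0, 5.0], [3.0, 2.0], [0.0, 4.0]]
pooled = max_pool_hyperedges(features, [[0, 1], [1, 2]])   # [[3.0, 5.0], [3.0, 4.0]]
```

Max pooling keeps, for each channel, the strongest response within the group, so the coarser hypergraph has fewer vertices but preserves the most salient local features.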

Skip Connection
In point cloud segmentation, skip connections fuse features from different levels to improve model performance. The process consists of three main steps: interpolation, concatenation and 1 × 1 convolution. First, the weight of each known point is computed from the distance between the known point positions and the target point positions, so the magnitude of each weight depends on this distance, and new point features are generated by weighted average interpolation. The features are then concatenated to form a richer feature representation. Finally, a 1 × 1 convolution reduces the number of channels of the fused features, which keeps the model complexity low. In the equations for the upsampled output features, I denotes the weighted average interpolation operation, $V^{l-1}_{XYZ}$ and $V^{l}_{XYZ}$ denote the target and known point positions, respectively, and $V_{XYZ}$ denotes the result of the interpolation between them. This technique is an effective means of improving model performance in point cloud segmentation tasks.
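The interpolation, concatenation and 1 × 1 convolution steps can be sketched in plain Python. The text only states that the weights depend on distance; this sketch assumes the inverse squared-distance weighting over the k = 3 nearest known points used in PointNet++ feature propagation, with a small `eps` to avoid division by zero when a target coincides with a known point. The names `idw_interpolate` and `skip_connection` are illustrative.

```python
import math


def idw_interpolate(known_xyz, known_feats, target_xyz, k=3, eps=1e-8):
    """Inverse-distance-weighted average of the k nearest known features per target point."""
    out = []
    for t in target_xyz:
        nearest = sorted((math.dist(t, p), idx) for idx, p in enumerate(known_xyz))[:k]
        weights = [1.0 / (d * d + eps) for d, _ in nearest]
        total = sum(weights)
        out.append([
            sum(w * known_feats[idx][c] for w, (_, idx) in zip(weights, nearest)) / total
            for c in range(len(known_feats[0]))
        ])
    return out


def skip_connection(target_xyz, target_feats, known_xyz, known_feats, weight):
    """Interpolate coarse features to the target points, concatenate them with the
    skip features, then apply a 1 x 1 convolution (weight: C_out x C_in)."""
    interp = idw_interpolate(known_xyz, known_feats, target_xyz)
    fused = [a + b for a, b in zip(interp, target_feats)]  # per-point concatenation
    return [[sum(w * f for w, f in zip(row, point)) for row in weight] for point in fused]


known_xyz = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
known_feats = [[0.0], [10.0]]
out = skip_connection([(1.0, 0.0, 0.0)], [[1.0]], known_xyz, known_feats,
                      weight=[[1.0, 1.0]])
# approximately [[6.0]]: interpolated 5.0 (midpoint) plus skip feature 1.0
```

Closer known points dominate the average under this weighting, and the final 1 × 1 convolution is where the channel count of the concatenated features is reduced.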

Experiment
In this section, we evaluate our approach with semantic segmentation experiments on the S3DIS dataset [40] and part segmentation experiments on the ShapeNet Part dataset [41]. We first present the datasets and implementation details, then report the experimental results, and finally present the ablation experiments. For S3DIS, we use Area 5 as the test set and the remaining areas for training, which allows an objective evaluation of model performance. This dataset is an important resource for point cloud semantic segmentation; it can be used to develop and evaluate various algorithms and models, thereby advancing research and applications in indoor environment understanding.
The ShapeNet Part dataset is a richly annotated, large-scale 3D shape dataset designed for evaluating 3D object part segmentation networks. It covers 16 object categories with a total of 50 part labels. The dataset is divided into 12,137 samples for training, 1870 for validation and 2874 for testing, 16,881 samples in total; because 35 of these are duplicates, it contains 16,846 unique samples with diverse, non-uniform characteristics. Research on this dataset can advance 3D shape analysis in computer vision.

Evaluation Indicators
On the S3DIS dataset, we use three metrics to quantitatively evaluate semantic segmentation performance: overall accuracy (OA), mean class accuracy (mAcc) and mean intersection over union (mIoU). On the ShapeNet Part dataset, we use two metrics to evaluate part segmentation performance: class-average mIoU (Cls. mIoU) and instance-average mIoU (Ins. mIoU). FLOPs are used to measure the computational complexity of the models.
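For reference, these metrics can be computed as in the following sketch. OA, mAcc and mIoU are derived from a confusion matrix (rows: ground truth, columns: prediction), assuming every class occurs in the ground truth so that no denominator is zero. For ShapeNet Part, the sketch follows the usual convention that Ins. mIoU averages the per-shape IoU over all test shapes, while Cls. mIoU first averages within each object category and then across categories; the per-shape IoUs are taken as given, and the function names are illustrative.

```python
from collections import defaultdict


def segmentation_metrics(conf):
    """OA, mAcc and mIoU from a confusion matrix conf[i][j]
    (number of points of ground-truth class i predicted as class j)."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    tp = [conf[i][i] for i in range(n)]
    gt = [sum(conf[i]) for i in range(n)]                         # row sums
    pred = [sum(conf[r][i] for r in range(n)) for i in range(n)]  # column sums
    oa = sum(tp) / total
    macc = sum(t / g for t, g in zip(tp, gt)) / n
    miou = sum(t / (g + p - t) for t, g, p in zip(tp, gt, pred)) / n
    return oa, macc, miou


def shapenet_mious(shape_ious, shape_categories):
    """(Cls. mIoU, Ins. mIoU) from per-shape IoUs and their object categories."""
    per_cat = defaultdict(list)
    for iou, cat in zip(shape_ious, shape_categories):
        per_cat[cat].append(iou)
    cls_miou = sum(sum(v) / len(v) for v in per_cat.values()) / len(per_cat)
    ins_miou = sum(shape_ious) / len(shape_ious)
    return cls_miou, ins_miou


oa, macc, miou = segmentation_metrics([[3, 1], [1, 5]])               # 0.8, ~0.7917, ~0.6571
cls_miou, ins_miou = shapenet_mious([1.0, 0.5, 0.0], ["a", "a", "b"])  # 0.375, 0.5
```

The two ShapeNet Part averages differ whenever categories contain different numbers of shapes, which is why both are reported.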

Implementation Details
Our framework is implemented in PyTorch 1.8.1 on a machine with a 16-core Intel(R) Xeon(R) Platinum 8350C CPU @ 2.60 GHz and an RTX 3090 GPU. For the S3DIS dataset, the training data were generated with the same preprocessing scheme as well-known networks such as PointNet++ [5] and PAConv [42]. Specifically, each room is divided into blocks with a sliding window, and 4096 points are randomly sampled from each block. For the semantic segmentation task, we construct hypergraphs from the XYZ and RGB features of the sampled points with farthest point sampling and ball query. We train the model with the SGD optimizer by minimizing the cross-entropy loss, with the number of weight matrices set to 16, the PAConv convolution kernel set to [16,16,16] and an empirically chosen learning rate of 0.05. We use the same training and testing batch settings as PAConv to allow a fair evaluation of the network.
For the ShapeNet Part dataset, we use the same hardware and software configuration. We set the training batch size to 32, train for 200 epochs with 2048 sampled points per shape, and set the PAConv [42] convolution kernel to [16,16,16]. The model is trained with the Adam optimizer, the batch normalization momentum is set to 0.9, the minimum learning rate is set to 0.003, and a fixed-step learning rate decay schedule is used. For data augmentation, we apply random scaling, rotation and jittering to the input points. Comparisons with other models under this setting demonstrate the effectiveness of the proposed method.

Segmentation Evaluation Results
Tables 1 and 2 show the segmentation results on the S3DIS and ShapeNet Part datasets, respectively. On S3DIS, we use the same sample size and training configuration as models such as GACNet [8], DGCNN [6] and PAConv [42], i.e., 4096 sampled points per block and 100 training epochs, to ensure a fair comparison. Compared with PAConv, our method increases the mIoU by 0.21%. Compared with the BIDEL model, our method improves OA, mAcc and mIoU by 1.3%, 0.3% and 4.1%, respectively. Similarly, on the ShapeNet Part dataset, we evaluated multiple networks such as AGNet [43], DGCNN and PAConv with a consistent number of sampling points, and our method improves the instance-average mIoU by 0.7% compared with the SBSNet network.
PointNet++ [5], SegGCN [44] and our method represent point-based, graph-based and hypergraph-based methods, respectively. Figure 3 shows that our method achieves the best OA, mAcc and mIoU of the three, which supports the effectiveness of using a hypergraph for point cloud segmentation. Specifically, PointNet++ operates directly on the raw point cloud data, preserving as much of its spatial information as possible. However, it ignores the geometric relationships between points, so it lacks local features. SegGCN applies graph convolution to nodes and edges to extract features and learn representations from graph-structured data. However, because it does not account for higher-order relationships between points, it has difficulty capturing the complex structures present in 3D data. Our approach applies hypergraph convolution directly to the raw point cloud features, which naturally preserves the spatial position information of the points and helps capture local features, thereby improving the model's performance on point cloud data.
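To make the contrast between graph and hypergraph convolution concrete, the widely used spectral hypergraph convolution of HGNN is shown below as a reference formulation only; it is not the hyperedge–hyperedge position attention convolution proposed in this paper. Here $\mathbf{H} \in \{0,1\}^{N \times E}$ is the incidence matrix ($H_{ve}=1$ if point $v$ belongs to hyperedge $e$), $\mathbf{W}$ is a diagonal matrix of hyperedge weights, $\mathbf{D}_v$ and $\mathbf{D}_e$ are the diagonal vertex and hyperedge degree matrices, and $\boldsymbol{\Theta}^{(l)}$ is a learnable projection.

```latex
\mathbf{X}^{(l+1)} = \sigma\!\left(
  \mathbf{D}_v^{-1/2}\,\mathbf{H}\,\mathbf{W}\,\mathbf{D}_e^{-1}\,\mathbf{H}^{\top}\,
  \mathbf{D}_v^{-1/2}\,\mathbf{X}^{(l)}\,\boldsymbol{\Theta}^{(l)}\right),
\qquad
d(v) = \sum_{e} W_{ee} H_{ve}, \quad \delta(e) = \sum_{v} H_{ve}
```

When every hyperedge contains exactly two points, $\delta(e)=2$ and $\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^{\top} = \tfrac{1}{2}(\mathbf{A} + \mathbf{D}_v)$ with $\mathbf{A}$ the weighted adjacency matrix, i.e., ordinary pairwise graph propagation. A hyperedge built from a ball-query group instead links all of its points at once, which is how a hypergraph encodes higher-order relationships.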

Complexity Results Analysis
Table 3 compares the computational complexity of our model with that of other models on the two datasets. With the same hyperparameters on both datasets, our method requires markedly fewer FLOPs, i.e., the model is more efficient. This is because the ResNet-like module in our model incorporates a depthwise convolution, which convolves each channel independently and thus reduces FLOPs. In addition, our hypergraph position attention convolution uses a hyperedge–hyperedge feature propagation pattern. This removes the vertex–hyperedge transformation step and simplifies the model architecture, which reduces the number of network parameters and the complexity of the model. As a result, our model achieves higher computational efficiency than the compared methods and requires fewer computational resources.
Table 3. Comparison of the computational complexity (FLOPs) of different models (M denotes 10^6).
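Why a depthwise convolution lowers FLOPs can be seen by counting multiply-accumulate operations (MACs) per layer. The sketch below assumes a layer that applies a kernel over a fixed number of neighbour positions at every point; the layer sizes (4096 points, 16 neighbours, 64 channels) are hypothetical, and it counts MACs rather than FLOPs (FLOPs are often reported as about twice the MACs, and the convention behind Table 3 is not stated).

```python
def conv_macs(num_points: int, kernel: int, c_in: int, c_out: int) -> int:
    """Standard convolution: every output channel mixes all input channels."""
    return num_points * kernel * c_in * c_out


def depthwise_macs(num_points: int, kernel: int, channels: int) -> int:
    """Depthwise convolution: one filter per channel, applied independently."""
    return num_points * kernel * channels


def depthwise_separable_macs(num_points: int, kernel: int, c_in: int, c_out: int) -> int:
    """Depthwise convolution followed by a 1x1 (pointwise) convolution that mixes channels."""
    return depthwise_macs(num_points, kernel, c_in) + conv_macs(num_points, 1, c_in, c_out)


# Hypothetical layer: 4096 points, 16 neighbours per point, 64 -> 64 channels.
std = conv_macs(4096, 16, 64, 64)
sep = depthwise_separable_macs(4096, 16, 64, 64)
print(std, sep, sep / std)  # 268435456 20971520 0.078125
```

The ratio equals 1/c_out + 1/kernel, so the saving grows with the number of output channels and the kernel size.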

Visualization Results
Figure 4 visualizes the segmentation results on Area 5 of the S3DIS dataset [40]. The first row shows the original input scenes, the second row the ground-truth labels, and the third row the segmentation produced by our proposed method. These results show that our method produces segmentation results that closely match the ground truth.

Ablation Study
This section examines the impact of integrating the depthwise convolution (DWConv) module into the model architecture. Table 4 shows the effect of this modification: introducing the module reduces the computational complexity of the model by 143 M FLOPs. This reduction is substantial, because without the module the initial computational complexity of the model is as high as 1302 M, and it supports a more efficient model design. The results show that our model not only improves the accuracy of the segmentation task but also increases computational efficiency. As shown in Figure 5, without the convolution module the network is considerably less stable, and the change in mIoU provides a comprehensive view of the model's performance on the point cloud semantic segmentation task.
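The FLOPs figures stated above work out as follows (simple arithmetic on the reported numbers):

```latex
\mathrm{FLOPs}_{\text{with DWConv}} = 1302\,\mathrm{M} - 143\,\mathrm{M} = 1159\,\mathrm{M},
\qquad
\frac{143\,\mathrm{M}}{1302\,\mathrm{M}} \approx 0.110 = 11.0\%
```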

Parameter Sensitivity Study
In this section, we analyze the sensitivity of our method to its main parameters through a series of comparative experiments. All experiments are conducted on Area 5 of the S3DIS dataset.

Influence of Hyperedge Feature Information
We use different types of point cloud feature information to construct hyperedges and to concatenate them for the hyperedge attention mechanism. The hyperedge construction information includes spatial position information (XYZ), local spatial position information (local XYZ) and feature information containing RGB (features). To evaluate the effect of hyperedge information on segmentation, we test different hyperedge information settings, as shown in Table 5, where "||" denotes the concatenation of different hyperedge information. The comparison shows that constructing hyperedges from the spatial positions of the points outperforms constructing them from feature information, which confirms the advantage of spatial position information for hyperedge construction.

Influence of the Number of Weight Matrices
The weight matrices in the network help the model process and analyze the input data. We therefore examine how the number of weight matrices in the model's weight bank (denoted D) affects the point cloud segmentation results. As shown in Table 6, as D increases exponentially, FLOPs grow steadily, while mIoU begins to decrease once D exceeds 16. These results indicate that too many weight matrices introduce redundancy and heavy memory and computation overhead, which increases FLOPs and lowers mIoU. The comparison shows that mIoU is highest when D is 16. Figure 6 shows that on the test set the accuracy increases steadily when D is 16, whereas it fluctuates when D is 32, making the latter less stable.

Influence of the Number of Sampling Points
To further validate the feature learning of the proposed network under different sampling parameters, we conducted experiments with different numbers of sampling points (i.e., 2048, 4096, 8192, 16,384 and 32,768). As shown in Figure 7, mIoU tends to increase with the number of sampling points, indicating improved model accuracy. However, more sampling points also increase storage and computation requirements. Figure 7 also shows that the mIoU values for the last four settings differ by less than 0.1%. The model therefore achieves the best trade-off between segmentation accuracy and computational cost with 4096 sampling points.
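The weight-bank mechanism behind D can be illustrated with a PAConv-style kernel assembly, as we understand it from PAConv [42]: for each neighbour, D weight matrices are blended with softmax-normalised scores into one kernel. In PAConv the score logits come from a small network (ScoreNet) on relative positions; here they are passed in directly as a hypothetical stand-in, and the function names, the max-pooling aggregation and the toy sizes are illustrative assumptions rather than the paper's exact layer.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assemble_kernel(scores: Sequence[float], bank: Sequence[Matrix]) -> Matrix:
    """K = sum_m scores[m] * bank[m]; all D matrices share the shape (c_out, c_in)."""
    rows, cols = len(bank[0]), len(bank[0][0])
    return [[sum(s * b[r][c] for s, b in zip(scores, bank)) for c in range(cols)]
            for r in range(rows)]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def bank_conv_point(neighbor_feats: Sequence[Sequence[float]],
                    score_logits: Sequence[Sequence[float]],
                    bank: Sequence[Matrix]) -> List[float]:
    """Hypothetical per-point update: one kernel per neighbour, then max pooling."""
    outs = []
    for feat, logits in zip(neighbor_feats, score_logits):
        kernel = assemble_kernel(softmax(logits), bank)  # len(logits) must equal D
        outs.append(matvec(kernel, feat))
    return [max(vals) for vals in zip(*outs)]  # channel-wise max over neighbours


def bank_parameters(d: int, c_in: int, c_out: int) -> int:
    """Learnable weights in the bank grow linearly with D."""
    return d * c_in * c_out


bank = [[[1.0, 0.0], [0.0, 1.0]],   # B_1: identity
        [[0.0, 1.0], [1.0, 0.0]]]   # B_2: swaps the two channels
print(bank_conv_point([[2.0, 4.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], bank))
print(bank_parameters(16, 64, 64), bank_parameters(32, 64, 64))
```

Because both the bank size and the per-neighbour assembly cost scale linearly with D, doubling D doubles this part of the memory and computation, which is consistent with the rising FLOPs reported for larger D.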

Influence of Convolutional Layers
We test the effect of the number of convolutional layers in the network. Table 7 shows that as the number of convolutional layers increases, mIoU and accuracy gradually increase. However, when the number of convolutional layers reaches four, both metrics decrease rather than increase. We attribute this to overfitting caused by the additional convolutional layers, which degrades the segmentation results of the model.

Conclusions
In this paper, we propose a novel hypergraph position attention convolutional network for 3D point cloud segmentation. The method improves point cloud segmentation performance through a hypergraph position attention convolution model that makes full use of the inherent neighborhood information and spatial geometric structure among points. In addition, we propose a ResNet-like module that makes the model lighter and improves network efficiency. Extensive experiments demonstrate the effectiveness of our method, which offers a new approach to point cloud segmentation. However, our method has some limitations. For example, we only use the spatial position and color information of the point cloud and do not fully exploit other features, such as normal vectors. In future work, we will incorporate such information to further improve segmentation performance. We also plan to apply the network to other fields, such as sentiment recognition and recommender systems, to further verify its effectiveness.

Figure 2. Point cloud segmentation framework based on hypergraph position attention convolution networks. (a) denotes the ResNet-like module, (b) denotes the hypergraph position attention convolution module, and (c) denotes the skip connection module.

Datasets and Evaluation Indicators
Dataset
The S3DIS dataset covers different types of indoor areas, making it a representative indoor point cloud dataset. It contains over 200 million points, each labeled with one of 13 semantic categories, providing large-scale data for training and testing.

Figure 3. Accuracy of different methods.

Figure 4. Visualization of the semantic segmentation results on Area 5 of the S3DIS dataset.

Figure 6. The accuracy change on the test set when D is 16 and 32.

Figure 7. Influence of the number of sampling points.

Table 2. Part segmentation results on the ShapeNet Part dataset (%).

Table 4. Influence of the convolution module on the model.

Table 6. Influence of the number of weight matrices on the model (%).