3.1. Overview
We propose a 3D shape classification network based on triangular mesh features and graph convolutional neural networks.
Figure 2 shows the flowchart of the algorithm proposed in this paper. The pipeline consists of two parts: the simplification and feature extraction of the mesh data, and the 3D shape classification based on graph convolutional neural networks.
Triangle meshes are a common way to represent 3D models and consist of three parts: vertices, edges, and faces. A face refers to a triangle formed by connecting three adjacent vertices in the mesh data. The triangular mesh data can be defined as $M = (V, F)$, where $V$ denotes the set of vertices and $F$ denotes the set of faces. Triangular mesh data are better equipped to describe 3D models than other data types such as voxels, multi-views, and point clouds. Furthermore, the explicit connectivity of the mesh makes it easy to extract its adjacency matrix.
We first simplify the original input mesh to obtain a model with no more than 1024 faces. The face features are then fed into the model so that data from neighboring vertices can be combined, and multi-scale local feature concatenation and pooling produce the classification results. The mesh processing and classification model is described in full below.
3.3. Model Design
Graph convolutional neural networks are capable of processing unstructured data. In this paper, the graph features are learned with the spectral-domain graph convolutional neural network proposed by Kipf [9]. Grounded in spectral graph theory, the convolution kernel is defined as a filter for graph signal processing, giving the network a strong filtering capability. The input of the graph convolutional network consists of two parts. The first is a feature description $x_i$ for each node $i$, collected into the $N \times D$ feature matrix $X$ ($N$ denotes the number of nodes, and $D$ denotes the number of input features per node). The second is the adjacency matrix $A$ of the graph, constructed from the adjacency relationship between faces; the value of each element of the adjacency matrix is calculated by Equation (3).
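Since Equation (3) is not reproduced in this excerpt, the following sketch assumes the common rule that two faces are adjacent when they share an edge; `face_adjacency` and its edge-hashing scheme are our illustration, not the paper's implementation.

```python
import numpy as np

def face_adjacency(faces):
    """Binary face adjacency: A[i, j] = 1 when faces i and j share an edge.

    `faces` is an (F, 3) integer array of vertex indices, as in the paper's
    simplified mesh of at most 1024 faces.
    """
    F = len(faces)
    edge_to_faces = {}
    for fi, (a, b, c) in enumerate(faces):
        for e in ((a, b), (b, c), (c, a)):
            edge_to_faces.setdefault(frozenset(e), []).append(fi)
    A = np.zeros((F, F), dtype=np.float32)
    for fs in edge_to_faces.values():
        for i in fs:
            for j in fs:
                if i != j:
                    A[i, j] = 1.0
    return A

# Two triangles sharing the edge (1, 2):
faces = np.array([[0, 1, 2], [1, 3, 2]])
A = face_adjacency(faces)
```

In a closed manifold triangle mesh each face has exactly three such neighbors, so every row of the resulting matrix sums to 3.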
Suppose that there are $L$ layers of graph convolution and that $l$ denotes the current layer index. $H^{(l)}$ denotes the output of layer $l$, with $H^{(0)} = X$. $A \in \mathbb{R}^{N \times N}$ denotes the adjacency matrix of the graph of $N$ nodes, so each graph convolutional layer can be represented by the nonlinear function of Equation (4):

$H^{(l+1)} = f\left(H^{(l)}, A\right)$
For a graph convolutional neural network to retain information about the nodes themselves, each node needs to be connected to itself, giving $\tilde{A} = A + I_N$ ($I_N$ is the identity matrix). Next, $\tilde{A}$ is normalized as $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, where $\tilde{D}$ is the diagonal degree matrix of the nodes with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. The propagation equation of the graph convolutional neural network can then be expressed as Equation (5):

$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right)$

where $W^{(l)}$ is the trainable weight matrix of layer $l$ and $\sigma$ is the activation function.
Equation (5) shows that constructing the graph's adjacency matrix is the key to applying graph convolution. In this paper, the adjacency matrix and the degree matrix are built from the connection relationships between faces, so graph convolution can be applied directly to the mesh data.
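The propagation rule of Equation (5) can be sketched in NumPy as follows; `gcn_layer`, the toy path graph, and the use of ReLU for the activation $\sigma$ are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step: ReLU(D~^{-1/2} A~ D~^{-1/2} H W),
    with A~ = A + I, following the Kipf-style propagation rule."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                       # node degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)         # ReLU activation

# Toy graph: 3 nodes on a path, 2 input features, 4 output features.
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = rng.normal(size=(3, 2))
W = rng.normal(size=(2, 4))
H1 = gcn_layer(H, A, W)
```

Each output row mixes a node's own features with those of its immediate neighbors, weighted by the normalized adjacency.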
Before the face features are input into the graph convolutional network for aggregation, the center point features and the corner vector features need to be processed separately. Following the method in MeshNet [35], rotational convolution is used to process the corner vectors of the triangular face; it operates on only two corner vectors at a time. Figure 4 shows the diagram of rotational convolution. Suppose that $v_1$, $v_2$, $v_3$ are the vectors of a face from the center point to its three corners, and define the convolution output as Equation (6):

$g = \frac{1}{3} \sum_{i=1}^{3} f\left(v_i, v_{i+1}\right)$

where $f$ denotes a convolution operation on a pair of corner vectors and $v_4 = v_1$. Finally, the output is passed through a fully connected layer to obtain a feature with a length of 64.
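Assuming the MeshNet-style rotate convolution applies the same kernel to each cyclic pair of corner vectors and averages the results, a minimal sketch is below; the 1-D `kernel` acting on a concatenated pair is our simplification of the learned convolution in Equation (6).

```python
import numpy as np

def rotational_convolution(v1, v2, v3, kernel):
    """Rotate convolution over corner vectors (a sketch).

    Each corner vector points from the face center to one corner. The same
    kernel is applied to the cyclic pairs (v1,v2), (v2,v3), (v3,v1) and the
    outputs are averaged, so the result does not depend on which corner is
    listed first.
    """
    pairs = [(v1, v2), (v2, v3), (v3, v1)]
    outs = [kernel @ np.concatenate(p) for p in pairs]
    return np.mean(outs, axis=0)

rng = np.random.default_rng(0)
v1, v2, v3 = rng.normal(size=(3, 3))   # three 3-D corner vectors
kernel = rng.normal(size=(64, 6))      # maps a vector pair to 64 features
out = rotational_convolution(v1, v2, v3, kernel)
rotated = rotational_convolution(v2, v3, v1, kernel)  # cyclic relabeling
```

Because the pair set is unchanged under cyclic relabeling of the corners, `out` and `rotated` are identical, which is the invariance the rotate convolution is designed to provide.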
The other input is the center point feature of the face. The center point feature is expanded to 64 dimensions through a fully connected layer. We concatenate the convolutional output of the corner vectors with the center point features to obtain high-dimensional features and input them into the graph convolutional neural network for feature aggregation.
In this paper, a two-layer graph convolutional neural network is used. The high-dimensional features are input into the first graph convolutional layer to aggregate information from first-degree neighboring vertices. After the activation function, the output features pass through the second graph convolutional layer to aggregate information from second-degree neighboring vertices. The aggregated first-degree and second-degree information is then concatenated to obtain multi-resolution features, as shown in Figure 5, where degree = 1 denotes the aggregation of first-degree neighboring vertices and degree = 2 denotes the aggregation of second-degree neighboring vertices.
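The two-layer aggregation with multi-resolution concatenation described above can be sketched as follows; the function names and feature sizes are illustrative, not the paper's exact dimensions.

```python
import numpy as np

def gcn_layer(H, A_hat, W):
    # One propagation step with a pre-normalized adjacency A_hat.
    return np.maximum(A_hat @ H @ W, 0.0)

def multi_resolution_features(H, A, W1, W2):
    """Two stacked graph convolutions; concatenating their outputs mixes
    1-hop (degree = 1) and 2-hop (degree = 2) neighborhood information."""
    A_tilde = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
    H1 = gcn_layer(H, A_hat, W1)    # degree = 1: first-degree neighbors
    H2 = gcn_layer(H1, A_hat, W2)   # degree = 2: second-degree neighbors
    return np.concatenate([H1, H2], axis=1)

# Toy 4-node ring graph with 8 input features.
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
H = rng.normal(size=(4, 8))
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 16))
feats = multi_resolution_features(H, A, W1, W2)
```

Concatenation rather than summation keeps the two receptive-field scales distinguishable for the later fully connected layers.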
The multi-resolution features are passed through a fully connected layer (1024) to obtain higher-dimensional features; the information redundancy in the high-dimensional space facilitates the subsequent global pooling operation. To obtain global features, the extracted features must be fused globally. Max pooling is a nonlinear feature fusion function that is insensitive to the order of its elements, so we apply max pooling over the graph nodes. The pooled global features are then passed through fully connected layers (512, 256, and 40) and a Softmax layer to obtain the classification results.
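The global pooling and classification head can be sketched as below; biases are omitted for brevity, the weights are random rather than trained, and `classify` is an illustrative name, not the paper's code.

```python
import numpy as np

def classify(face_features, weights):
    """Global max pooling over faces, then the FC stack (512, 256, 40)
    and Softmax, as described above (a sketch, not the trained model)."""
    g = face_features.max(axis=0)          # order-insensitive global feature
    for W in weights[:-1]:
        g = np.maximum(g @ W, 0.0)         # hidden FC layers with ReLU
    logits = g @ weights[-1]
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
feats = rng.normal(size=(1024, 1024))      # 1024 faces, 1024-d features
Ws = [rng.normal(size=(1024, 512)) * 0.01,
      rng.normal(size=(512, 256)) * 0.01,
      rng.normal(size=(256, 40)) * 0.01]   # 40-way output for ModelNet40-style labels
probs = classify(feats, Ws)
```

Because max pooling reduces over the face axis, permuting the order of the faces leaves the prediction unchanged, which is exactly the order-insensitivity the text relies on.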