1. Introduction
As a cold mechanical joining process for lightweight components, Self-Piercing Riveting (SPR) is widely employed in the manufacturing of car bodies and aircraft fuselages [1,2]. The widespread use of SPR technology is expected to reduce energy consumption and emissions. Because the riveting process is complex [3,4], abnormal conditions during forming affect the state of the rivets and may lead to riveting failure, resulting in external or internal rivet defects [5,6]. Since the condition of the rivet head can be used to directly assess the reliability of the interlocking structure [7,8], defect detection for the rivet head plays a critical role in industrial manufacturing in ensuring the joining quality of products.
Existing methods for riveting defect detection can be classified into three main categories: 2D image-based detection methods, traditional 3D point cloud-based methods, and deep learning-based neural network methods. The respective strengths and limitations of these three categories are reviewed below.
Firstly, 2D image-based detection cannot adequately capture the critical spatial features of rivets [9,10], such as the three-dimensional posture and the quantitative assessment of rivet flushness and defects, which are essential for evaluating riveting quality [11]. Additionally, detection based on 2D images is sensitive to illumination and shooting angle, which affects its accuracy and robustness [12]. While deep learning can be incorporated to raise recognition performance [13,14], the approach still lacks depth information, which limits its ability to extract sufficient features of the riveting status. Consequently, such methods are seldom employed for rivet defect detection.
Secondly, 3D vision technology has been extensively adopted in various defect-detection domains due to its ability to precisely reproduce the spatial information of real targets from point clouds [15,16]. One representative class of methods processes point cloud data by relying exclusively on specific geometric features rather than deep learning, and is therefore referred to as traditional 3D point cloud-based methods. Using 3D vision sensors, high-precision defect detection can be achieved through the extraction and recognition of the geometric contour features of multi-type rivet head defects. Zhou et al. [17] utilized the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to extract the point clouds of rivets from 3D scanner data. The 3D rivet reconstruction was performed by generating localized regions and applying a gap-filling strategy. Based on the contour features of the reconstructed model, rivet head defects such as damage, indentation, and dislodgement can be identified. Wang et al. [18] measured the diameter and height of the rivet head using hybrid data combining 2D and 3D information. The detection accuracy was improved by 50%, and the height measurement error was less than 10 μm, thereby effectively enhancing the defect features. Xie et al. [19] carried out rivet detection via adaptive density computation on point clouds obtained from a laser 3D scanner. Rivet defects and flushness can be quantified through local density enhancement and circular fitting of the rivet contours. Although 3D vision detection technology can address the limitations of traditional 2D images, a dedicated lightweight and efficient network for defect recognition and classification still needs to be explored to facilitate industrial applications. Therefore, reasonable data-processing methods and defect feature-extraction techniques are key to the identification of riveting defects.
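To illustrate how density-based clustering can separate individual rivet heads from a scanned plate, the following is a minimal pure-Python sketch of the standard DBSCAN procedure. It is not the implementation of Zhou et al. or of this study; the function names, the toy coordinates, and the values of `eps` and `min_pts` are assumptions chosen only for illustration.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels

# Hypothetical toy data: two dense "rivet heads" and one stray scan point.
toy_points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.0),
    (10.0, 0.0, 0.0), (10.1, 0.0, 0.0), (10.0, 0.1, 0.0), (10.1, 0.1, 0.0),
    (5.0, 5.0, 5.0),
]
toy_labels = dbscan(toy_points, eps=0.2, min_pts=3)
```

In this toy case each dense group receives its own cluster label and the isolated point is marked as noise. The brute-force neighbor search is quadratic in the number of points, so a spatial index would be needed for real scan data.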
Thirdly, deep learning-based neural network methods have emerged as the dominant approaches in defect detection. Such methods can be combined with existing 3D point cloud processing techniques, which can be categorized into voxel-based methods [20], multi-view-based methods [21], and point-based methods [22]. The voxel-based and multi-view-based methods may miss feature details during processing, resulting in decreased detection accuracy. By contrast, methods that operate on the original point clouds preserve complete geometric feature information, achieve high-precision defect detection, and hence are widely used. As a pioneering work, PointNet directly processes unordered 3D point sets using a network architecture based on three 1D convolutions and max pooling, extracting global features for classification [23]. However, when point sets contain complex geometric structures, rich local features must be captured to boost the model’s generalization performance. To this end, PointNet++ adopts a hierarchical approach that passes through sampling, grouping, and feature-extraction layers in sequence to learn local features [24]. In this way, it can achieve high-precision classification and segmentation. Subsequently, Ma et al. [25] constructed the PointMLP model based on a lightweight geometric affine module and multiple residual MLP modules. By mapping local point features to a unified scale and extracting local features in a staged manner, the classification performance can be further improved, although at the cost of an increased parameter count and computational cost. Furthermore, a non-parametric network termed Point-NN was established based on a non-parametric encoder and a point-memory bank [26]. It can achieve favorable performance on different 3D tasks without training. Based on this foundational framework, Point-PN was further developed with learnable linear layers, which reduces the number of parameters and improves feature learning efficiency.
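The reason a PointNet-style network can consume unordered point sets is that a shared per-point function followed by max pooling is a symmetric operation. The sketch below demonstrates this property only; `shared_point_feature` is a hypothetical hand-written stand-in for the learned shared layers, not the actual PointNet architecture.

```python
import random

def shared_point_feature(point):
    """Toy per-point feature map applied identically to every point.

    Stand-in for the shared layers (1D convolutions) of a PointNet-style network.
    """
    x, y, z = point
    return (x + y, y * z, x * x + z, max(x, 0.0))

def global_feature(points):
    """Channel-wise max pooling over all per-point features."""
    features = [shared_point_feature(p) for p in points]
    return tuple(max(channel) for channel in zip(*features))

cloud = [(0.0, 1.0, 2.0), (1.0, -1.0, 0.5), (-2.0, 0.5, 1.0)]
shuffled = cloud[:]
random.Random(0).shuffle(shuffled)

# Reordering the input points leaves the pooled global feature unchanged.
same_feature = global_feature(cloud) == global_feature(shuffled)
```

Because the maximum is taken per channel over the whole set, any permutation of the points yields the same global descriptor, which is then passed to the classifier.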
Although point-based deep learning methods can extract local geometric features, challenges remain in fully extracting key features. For this reason, attention mechanisms have been incorporated into deep learning architectures to enhance the extraction of subtle features in image processing [27]. They allow models to selectively focus on key input features and to capture relationships across different dimensions effectively by offering global perception and flexible interaction capabilities [28,29,30]. Zhang et al. [31] presented the MA-SPRNet model for defect detection of riveted joints. The model integrates a multi-attention mechanism that enhances the defect information captured from the channel, spatial, and structural dimensions. While the riveting detection precision is improved, the computational load is high. Huang et al. [32] proposed a global attention mechanism based on YOLOv5 to capture defect features on the aircraft body surface and reduce redundant scene information, significantly improving defect-detection performance. Xie et al. [33] introduced a field attention unit that learns the characteristics of the rivet region of the fuselage surface through assigned weights. Based on 3D point clouds, rivet point prediction is carried out with the RRCNet model, and the accuracy can reach 93.0%. Tang et al. [34] proposed SCA-Net, a spatial and channel attention-based model that captures the geometric relationship between point cloud patches. The network extracts local features from three divided patches of the dataset and learns their geometric correlations. It then extracts global features via auto pooling. Classification and segmentation experiments validated its performance.
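As a generic illustration of how an attention mechanism reweights features, the following sketch implements plain scaled dot-product self-attention without learned projections. It is not the DSSA module proposed in this study, nor any of the cited attention designs; the feature values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Scaled dot-product self-attention with Q = K = V = features.

    Returns the attended features and the attention weight matrix.
    """
    d = len(features[0])
    outputs, weights = [], []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                  for key in features]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * value[c] for wi, value in zip(w, features))
                        for c in range(d)])
    return outputs, weights

# Three hypothetical 2-channel point features.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn = self_attention(toy_features)
```

Each output row is a weighted average of all input rows, with the weights in each row summing to one. The cost grows quadratically with the number of points, which is the computational burden that lightweight attention designs try to reduce.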
It is well known that a learning network with attention modules can improve classification accuracy. However, this may increase the number of parameters and the computational complexity. Therefore, many researchers focus on the design of lightweight networks to improve detection efficiency [35,36]. The earlier lightweight network MobileNet reduces the parameter count and computation by introducing depthwise separable convolution, which trades off computation against accuracy [37]. In contrast, the GhostNet model replaces conventional convolutional layers with cheap linear transformations and identity mappings based on the Ghost module [38]. This flexible embedding method with stacked modules helps improve detection efficiency. Mohammadi et al. [39] designed the Point-LN classification network with non-parametric position encoding, combining the non-parametric components with a streamlined learnable classifier. Classification tests on the ModelNet40 dataset show that Point-LN offers better accuracy and efficiency simultaneously, owing to its reduced number of parameters and shorter runtime. Peng et al. [40] replaced the depthwise separable convolution in the Ghost convolution with DO convolution, thereby constructing the DC-Ghost module, which enhances image features and reduces network parameters and computational costs. Zhang et al. [41] proposed a lightweight position-recognition network, termed LR-Net. It obtains rotation-invariant features with a low-dimensional feature-extraction structure, thereby improving recognition efficiency and accuracy. Recent studies have also explored lightweight 3D point cloud frameworks for edge deployment and industrial inspection applications. To reduce model parameters and computational complexity for deployment on edge devices, Li et al. [42] proposed a lightweight 3D point cloud object-detection architecture tailored for computationally constrained platforms. Liang et al. [43] proposed a Rotationally Invariant Features (RIF) framework for 3D anomaly detection and designed a lightweight Convolutional Transform Feature Network (CTF-Net) to achieve efficient and robust point cloud feature extraction, demonstrating its potential for industrial inspection applications. Despite these advances, a general approach to feature extraction and recognition has not been formulated for various practical applications. There is little research on comprehensive detection methods for various rivet head defects based on 3D point clouds.
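The parameter savings behind depthwise separable convolution and the Ghost module can be checked with simple counting. The sketch below compares weight counts for a single layer, ignoring biases and normalization layers; the channel sizes, the ratio of 2, and the 3 × 3 cheap operation are assumptions for illustration, and the Ghost formula assumes the output channel count is divisible by the ratio.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """One k x k depthwise filter per input channel plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

def ghost_module_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Ghost module: a primary conv produces c_out / ratio intrinsic maps,
    then cheap depthwise operations generate the remaining ghost maps."""
    intrinsic = c_out // ratio
    primary = k * k * c_in * intrinsic
    cheap = cheap_k * cheap_k * intrinsic * (ratio - 1)
    return primary + cheap

# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
layer = dict(c_in=64, c_out=128, k=3)
counts = {
    "standard": standard_conv_params(**layer),
    "depthwise separable": depthwise_separable_params(**layer),
    "ghost (ratio 2)": ghost_module_params(**layer),
}
```

For this hypothetical layer the standard convolution needs 73,728 weights, the depthwise separable version 8,768, and the Ghost module 37,440, which shows why such blocks are attractive building units for lightweight networks.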
Systematic research on defect detection for multi-type rivet heads remains limited in the relevant literature, and even fewer efforts have been devoted to lightweight network architectures that enable efficient defect detection in industrial settings. In fact, multiclass classification offers an ideal framework for rivet defect recognition, provided that the defects can be learned effectively [44]. To this end, this study proposes a lightweight classification network, PointGhost, for automatic rivet head defect detection based on 3D point clouds. This research makes the following three contributions:
We create a dataset containing five types of rivet head defects collected by a 3D scanner. The DBSCAN clustering method is first employed to extract individual rivet head data from the riveted plates. Then, the dataset of the rivet head can be formed using the Non-Maximum Eigenvalue Curvature Method (NMECM), which retains the defect features with reduced redundant information.
As a lightweight network, PointGhost is formulated for defect classification of the rivet head. The developed framework includes three main modules. In the sampling module, a Virtual Block Sampling (VBS) mechanism is proposed to reduce computational complexity. In the feature extraction module, a lightweight model, Grouped Pointwise Convolution Ghost (GPC-Ghost), is introduced for local and global feature learning. In addition, an efficient Dynamic Screening Self-Attention (DSSA) module is proposed to integrate and improve the feature expressiveness for defects. Through these means, multi-type head defects can be classified.
The severity levels of rivet protrusion and indentation defects are further quantified using the Principal Component Analysis (PCA) method and Total Least Squares (TLS) plane fitting algorithm.
The remainder of this study is organized as follows:
Section 2 introduces the types of rivet head defects and presents the riveting samples for detection. This section also proposes an efficient rivet data-extraction algorithm based on DBSCAN clustering.
Section 3 introduces our detection architecture.
Section 4 describes the experimental research, including the dataset, performance analysis, and flush evaluation of the rivet head.
Section 5 presents the conclusions and outlines directions for future work.
4. Experiments and Results
This section presents the experimental evaluation of the proposed method. It includes the dataset description, classification performance analysis with an ablation study, rivet head flushness quantification, and detection performance analysis.
4.1. Rivet Head Dataset and Experimental Environment
All riveted specimens are scanned by a KSCAN-Magic combined handheld 3D scanner with a scanning accuracy of 0.05 mm, as shown in Figure 9a. Figure 9b shows a partial riveted specimen. Using the aforementioned data preprocessing algorithm, individual rivet head data are extracted with the number of subregions n set to 20. This value is chosen according to the specific dimensions of the specimens, as well as the size and number of rivets, to ensure the completeness of the rivet head data extracted after plane fitting. The data reduction is performed with the number of nearest neighbors m set to 30, which can effectively characterize the local geometric features between each given point and its neighborhood. The constructed dataset includes six types of rivet head states listed in Table 5. Among them, rivet head protrusions and indentations are further classified into three severity levels. A total of 1680 samples are collected, of which 1260 are used for training, while the remaining 420 are reserved for testing.
The experimental environment is configured with an Intel Xeon Silver 4210 CPU and an NVIDIA RTX A5000 GPU. The software environment comprises Windows 10, Python 3.9, and PyTorch 2.1.0, with computations accelerated by CUDA 12.2 and cuDNN 8.8.1. The batch size for training is configured as 15, with six classified categories. The model is trained for 100 epochs. Each rivet data sample consists of spatial coordinates and normal vectors, with 1024 input points. Training is conducted with an initial learning rate of 0.001 using the Adam optimizer. The CPU utilization rate of the proposed method is approximately 5%, and the memory consumption reaches 2.3 GB at runtime.
4.2. Comparison of Defect Classification Performance
To examine the performance of the proposed network, a comparative performance analysis of different types of lightweight models, attention mechanisms, and point cloud classification networks is conducted under the same experimental environment, dataset, and parameter settings. Given the imbalanced class distribution, the Mean Accuracy (MA) metric is adopted to objectively reflect the overall balance of the network model in classification. Unlike the Overall Accuracy (OA), the MA is more conservative and provides a more effective indication of the model’s ability to recognize minority classes.
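The difference between OA and MA on an imbalanced test set can be seen in a small example. The sketch below assumes that MA is the unweighted mean of the per-class accuracy (per-class recall), which is the usual definition in point cloud classification; the labels and class counts are hypothetical.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class accuracy (per-class recall)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced case: 90 samples of class 0, 10 samples of class 1.
y_true = [0] * 90 + [1] * 10
# All majority samples are correct, but half of the minority class is missed.
y_pred = [0] * 90 + [1] * 5 + [0] * 5

oa = overall_accuracy(y_true, y_pred)
ma = mean_accuracy(y_true, y_pred)
```

Here OA remains at 95% while MA falls to 75%, because each class contributes equally to MA regardless of its size. This is why MA is the more conservative indicator for minority defect classes.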
4.2.1. Performance Comparison of Lightweight Models
The training performance of the GPC-Ghost model and the resulting PointGhost network is compared with that of several lightweight models, including CondenseNet, ShuffleNet, MobileNet V2, MobileNet V3, Ghost, and Ghost-PC. For a fair and consistent comparison, each model is integrated into PointNet++ by replacing its original feature learning module. Table 6 presents the comparison results of training performance, and Figure 10 shows the training process along with the mean classification accuracy over epochs.
Compared with PointNet++, CondenseNet shows no significant performance improvement. ShuffleNet reduces the computational cost and parameter count by 0.33 G and 0.23 M, respectively, but does not improve the MA. MobileNet V2 and MobileNet V3, owing to the introduction of expansion layers, incur a significant increase in both computational cost and parameter count without substantial improvements in accuracy. The Ghost module reduces the computational cost and parameter count by 0.43 G and 0.42 M, respectively, while increasing the mean accuracy by 3.8%, which demonstrates its effectiveness as a lightweight design. Ghost-PC also reduces the computational cost and parameter count while improving classification accuracy; however, its training process is unstable. The GPC-Ghost module reduces the computational cost and parameter count by 0.66 G and 0.59 M, respectively, and achieves a 4.31% improvement in MA, a substantial performance gain. Among all compared models, the PointGhost network achieves the largest reductions in computational cost and parameter count, namely 0.75 G and 1.22 M, respectively. It attains a mean classification accuracy of 99.86% and exhibits the most stable training behavior. The above comparison demonstrates that the lightweight design of the GPC-Ghost module can effectively improve the classification accuracy of the PointGhost network.
4.2.2. Performance Comparison of Attention Mechanisms
The performance of five attention mechanisms (EMA, CBAM, CPCA, CA, and SA) is compared with that of DSSA. To ensure consistency, all six attention mechanisms are individually integrated into the feature learning module of PointNet++. Their respective impacts on defect classification performance are then comparatively analyzed. Table 7 presents the training comparison results for each attention module. The corresponding MA curves are presented in Figure 11. Compared with the results of PointNet++ summarized in Table 6, the introduction of attention mechanisms affects the model’s MA to different degrees. The EMA attention mechanism increases the computational load due to the introduction of an additional fully connected layer, yet classification accuracy shows no significant improvement. The CBAM attention mechanism combines channel and spatial attention, which increases convolution operations. However, the additional operations fail to enhance global feature learning capability, leading to reduced classification accuracy and increased computational cost. The CA attention mechanism primarily analyzes and processes information across two spatial dimensions, resulting in a 1.29% decrease in classification accuracy.
In addition, compared with the results of PointNet++ in Table 6, the CPCA and SA modules improve MA by 0.98% and 2.6%, respectively. However, both modules significantly increase the computational cost and parameter count, thereby increasing the network’s computational burden. Although the self-attention mechanism of the SA module effectively captures global dependencies and enhances feature representation, this global learning strategy can result in high computational complexity when processing large input data. The comprehensive comparison reveals that the DSSA module achieves the highest mean classification accuracy of 99.44%. It shows the most significant performance improvement without a noticeable increase in the computational cost or parameter count. Furthermore, DSSA exhibits faster convergence during training and attains optimal classification accuracy in fewer epochs. In summary, the proposed DSSA attention mechanism effectively enhances feature extraction and classification performance through its dynamic screening self-attention mechanism. In addition, its training process is stable with fast convergence, making it more suitable for lightweight network design.
4.2.3. Performance Comparison of Classification Networks
To benchmark the performance of different types of point cloud classification networks, the PointGhost network is trained alongside PointNet, PointConv, PointMLP, PointNet++, PointNeXt, and Point-PN on the constructed dataset. Each network is trained for ten rounds, with 100 epochs per round. To optimize training performance, the batch size for PointMLP and PointNeXt is set to 20, and the learning rate for PointMLP is set to 0.1. Table 8 presents the MA of each classification network across all training rounds. The performance gap represents the difference in MA between each classification network and PointGhost. The downward arrow (↓) denotes inferior performance relative to PointGhost. Figure 12 shows the training process corresponding to the best accuracy achieved by each network.
An analysis of the training results indicates that PointNet exhibits an unstable training process with significant fluctuations, resulting in the lowest classification accuracy among all evaluated networks. In contrast, PointConv and PointMLP achieve slightly improved mean classification accuracy but require longer training times. PointNet++, PointNeXt, and Point-PN exhibit relatively smaller fluctuations in classification accuracy across rounds, indicating better stability. However, they still have limitations in learning and extracting subtle defect features, making it difficult to meet the requirements for detecting the different types of rivet head defects.
The comprehensive comparison shows that the PointGhost network achieves superior training performance, with an MA of 99.49%. This surpasses Point-PN and PointNeXt by 0.93% and 4.55%, respectively. Moreover, it maintains stable classification accuracy, with the majority of test results exceeding 99%. Its stable training process and consistent classification performance show that the network exhibits good robustness. Compared with PointNet++, the MA is improved by 4.41%, which preliminarily verifies the effectiveness and reliability of the proposed network for point cloud classification tasks.
An ablation study is conducted to further evaluate the effectiveness of each key component in PointGhost. As shown in Table 9, the introduction of VBS significantly reduces the inference time from 107.82 ms to 32.68 ms, while maintaining comparable classification performance. This indicates that it is effective in accelerating point cloud sampling. After integrating GPC-Ghost, the parameter count is reduced from 1.467 M to 0.211 M, while the classification accuracy is significantly improved. This shows that the lightweight feature extraction design is efficient. With the addition of DSSA, the final PointGhost framework achieves the best classification performance, with an F1-score of 99.50% and an MA of 99.72%. This confirms the effectiveness of the proposed attention mechanism.
Figure 13 presents the confusion matrix of PointGhost on the test set, providing an intuitive visualization of its classification performance. In the figure, the normal rivet head, head protrusion, head indentation, rivet rollover, empty rivet, and head damage are labeled as 1 through 6, respectively. These results indicate that PointGhost achieves strong classification performance across different rivet defect categories. Only three misclassifications are observed, further demonstrating the effectiveness and stability of the proposed method.
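An F1-score for a multiclass problem can be derived directly from a confusion matrix. The sketch below assumes a macro-averaged F1 (the averaging scheme is not stated in the text) and uses a hypothetical three-class matrix with three misclassified samples; it does not reproduce the data of Figure 13.

```python
def macro_f1(confusion):
    """Macro-averaged F1 from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    f1_scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class c samples that were missed
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other samples predicted as c
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n

def misclassified(confusion):
    """Number of off-diagonal entries, i.e. wrongly classified samples."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(len(confusion)))
    return total - correct

# Hypothetical 3-class confusion matrix (not the data of Figure 13).
toy_confusion = [
    [50, 0, 0],
    [0, 48, 2],
    [0, 1, 49],
]
toy_f1 = macro_f1(toy_confusion)
toy_errors = misclassified(toy_confusion)
```

The macro average weights every class equally, so a few errors concentrated in one defect category lower the score more visibly than they would lower the overall accuracy.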
To further evaluate the classification performance and generalization capability of the PointGhost network, tests are conducted based on the test set (420 data groups) using the classification networks trained in Table 8. The results are presented in Table 10 and Figure 14. For PointGhost, the misclassification results corresponding to the maximum misclassification rate are shown, whereas for the other five network models, the results with the lowest number of misclassifications are displayed. In the figure, the normal rivet head, head protrusion, head indentation, rivet rollover, empty rivet, and head damage are labeled as 1 through 6, respectively.
The comparison shows that PointNet++ achieves an average misclassification rate of 3.29%. The result corresponding to its minimum misclassification rate shows that all 13 head damage samples are misclassified as normal heads. This failure is due to the network’s inadequate learning of both the normal vector features of point cloud data and the characteristics of head damage. Although the model achieves a low overall misclassification rate, such misclassifications, which label defective rivets as normal, would severely degrade riveting quality inspection performance. PointNeXt yields the highest average misclassification rate of 7.83%. Its misclassification patterns are diverse, involving head protrusions, indentations, empty rivets, head damage, and rollover. For instance, among eight head protrusion samples, one is misclassified as head indentation, and the remaining seven are misclassified as head damage. This indicates that PointNeXt fails to effectively extract and learn the subtle features of different defect types, resulting in poor discrimination between samples with similar local geometric features but different defect categories. PointConv, Point-PN, and PointMLP show progressively lower misclassification rates relative to PointNeXt, indicating improved feature-learning capability. In contrast, PointGhost achieves the lowest average misclassification rate of 1.19%, with a maximum rate of 1.9%. Although this maximum slightly exceeds PointMLP’s minimum misclassification rate of 1.43%, PointGhost maintains a lower overall misclassification level. An analysis of the test result with the minimum misclassification rate reveals that only one head protrusion sample is misclassified as head damage.
An analysis of the test result with the maximum misclassification rate shows that, among the eight misclassified samples, five head protrusion samples are misclassified as head damage, one head damage sample is misclassified as head protrusion, and two head damage samples are misclassified as head indentation. These misclassifications arise because the geometric features that distinguish these head defects are very fine, which makes it difficult for the model to adequately learn their distinct morphological characteristics. This limitation can be addressed by incorporating additional training samples that exhibit such subtle defect features. The above analysis demonstrates that the PointGhost model achieves superior overall test performance. This is attributed to its stable training process and consistent classification behavior, which further indicate the network’s strong robustness and generalization ability. These analytical results validate that PointGhost can perform the classification of multi-type rivet head defects efficiently and accurately.
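The reported misclassification rates can be related to error counts on the 420-sample test set with a short calculation. The helper names below are hypothetical; only the test set size, the error counts of one and eight, and the quoted percentages come from the text.

```python
TEST_SET_SIZE = 420  # number of test samples stated in Section 4.1

def misclassification_rate(n_errors, n_samples=TEST_SET_SIZE):
    """Misclassification rate in percent."""
    return 100.0 * n_errors / n_samples

def errors_from_rate(rate_percent, n_samples=TEST_SET_SIZE):
    """Nearest whole number of errors implied by a reported percentage."""
    return round(rate_percent / 100.0 * n_samples)

best_run = misclassification_rate(1)    # PointGhost run with one error
worst_run = misclassification_rate(8)   # PointGhost run with eight errors

for label, rate in [("best run", best_run), ("worst run", worst_run)]:
    print(f"PointGhost {label}: {rate:.2f}%")
```

Eight errors out of 420 give roughly 1.90%, matching the reported maximum rate. By the same arithmetic, the average rate of 1.19% corresponds to about five errors per run, and PointMLP's minimum of 1.43% to about six.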
4.3. Quantification of Rivet Defect Severity
Following rivet head state classification, flushness quantification is performed on rivet head samples with protrusion and indentation defects. It is obtained by calculating the maximum distance from the fitted plane of the riveted plate to the projected filtered rivet head data on the fitted plane of the rivet head. Table 11 presents the quantified flushness results. Among them, two misclassifications are identified: a mild protrusion defect with a quantified value of 0.62 mm and a mild indentation defect with a quantified value of 0.61 mm, both of which are incorrectly classified as moderate defects. The quantified values exceed the upper bound of the mild defect range by 0.02 mm and 0.01 mm, respectively. Overall, the quantified results of the proposed method are highly consistent with the actual measured values. Among the 240 samples, only two cases are misjudged, yielding a misclassification rate of 0.83%. No significant deviations are observed, which verifies the feasibility and effectiveness of the proposed quantification method for efficient defect severity assessment.
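The geometric idea behind the flushness measurement can be sketched as a total least squares plane fit followed by a point-to-plane distance. The code below is a simplified illustration under stated assumptions, not the procedure of this study: it fits only the plate plane, takes the plane normal as the covariance eigenvector with the smallest eigenvalue (found by a shifted power iteration), and uses hypothetical point coordinates in millimetres.

```python
import math

def fit_plane_tls(points, iters=200):
    """Total least squares plane through 3D points: returns (centroid, unit normal)."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    # 3 x 3 covariance matrix of the centred points.
    cov = [[sum((p[a] - c[a]) * (p[b] - c[b]) for p in points) / n
            for b in range(3)] for a in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    # The dominant eigenvector of (trace * I - cov) is the eigenvector of cov
    # with the smallest eigenvalue, which is the plane normal.
    shifted = [[(trace if a == b else 0.0) - cov[a][b] for b in range(3)]
               for a in range(3)]
    v = [1.0, 1.0, 1.0]  # start vector; assumed not orthogonal to the normal
    for _ in range(iters):
        w = [sum(shifted[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return c, v

def signed_distance(point, centroid, normal):
    """Signed distance from a point to the plane (centroid, normal)."""
    return sum((point[k] - centroid[k]) * normal[k] for k in range(3))

def flushness(plate_points, head_points):
    """Largest absolute distance of the head points from the fitted plate plane."""
    centroid, normal = fit_plane_tls(plate_points)
    return max(abs(signed_distance(p, centroid, normal)) for p in head_points)

# Hypothetical data: a flat plate at z = 0 and a head protruding by up to 0.62 mm.
plate = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
head = [(2.0, 2.0, 0.62), (2.1, 2.0, 0.50), (1.9, 2.1, 0.55)]
height_mm = flushness(plate, head)
```

For this toy plate the fitted normal is the z-axis, so the measured value equals the largest head height of 0.62 mm. The sketch assumes non-degenerate input (the plate points must span a plane) and returns an unsigned value; distinguishing protrusion from indentation would additionally require a consistent normal orientation.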
Finally, the inspection process for an individual rivet comprises four sequential stages: individual rivet data extraction, rivet head point cloud data reduction, head status classification, and head flushness quantification (only for protrusion and indentation). The average processing times for these stages are 0.61 s, 0.64 s, 0.02 s, and 0.03 s, respectively, which sum to 1.30 s. Excluding the sampling time, the total time from data processing to classification and quantification for a single rivet is therefore within 1.5 s. This demonstrates efficient processing and reliable classification performance, making the method suitable for industrial inspection applications.
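The per-rivet time budget follows directly from the four stage times reported above. The short calculation below only restates that arithmetic; the dictionary keys are descriptive labels chosen for illustration.

```python
# Average per-rivet stage times in seconds, as reported in the text.
stage_times_s = {
    "rivet data extraction": 0.61,
    "point cloud data reduction": 0.64,
    "head status classification": 0.02,
    "flushness quantification": 0.03,
}

total_s = sum(stage_times_s.values())

# Share of the total spent on preprocessing (extraction plus reduction).
preprocessing_s = (stage_times_s["rivet data extraction"]
                   + stage_times_s["point cloud data reduction"])
preprocessing_share = preprocessing_s / total_s

for stage, t in stage_times_s.items():
    print(f"{stage}: {t:.2f} s ({100 * t / total_s:.1f}%)")
print(f"total: {total_s:.2f} s")
```

The stages sum to 1.30 s, and about 96% of that time is spent on data extraction and reduction rather than on classification or quantification, so preprocessing is the natural target for further speed-ups.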
4.4. Analysis of Detection Performance
The detection performance of the proposed method is governed by three factors: the limitation of scanning accuracy, the impact of data reduction, and the influence of residual noise. Firstly, all riveted specimens are scanned using a 3D scanner with a nominal scanning accuracy of 0.05 mm. However, the actual detection accuracy of the system is expected to be no better than 0.15 mm under extreme conditions (e.g., unfavorable scanning angles or elevated noise levels), i.e., at least three times coarser than the nominal sensor accuracy. Secondly, although the RRHD preserves the primary defect features of the rivet heads and significantly enhances detection efficiency, it inevitably removes some fine-scale geometric information, thereby limiting the detection capability for small defects. Finally, under extreme scanning conditions, some noise points may persist despite the application of filtering algorithms. These residual noise points may cause misclassifications, especially for very small defects where the signal-to-noise ratio is low. For industrial application, parallel computing techniques can be utilized to detect the rivet heads concurrently. In this way, the presented method is suitable for scenarios where the total inspection time, including the scanning time, is no less than 40 s, such as a quality inspection island in automotive production lines.
5. Conclusions
This research aims to address the challenges of complex defect feature extraction from rivet heads on an automotive body and the limited performance of conventional classification networks. The inspection method is investigated for five common types of rivet head defects. Riveted specimens are prepared for defect detection based on the point clouds of riveted plates collected using a 3D scanner. An individual rivet head dataset is created via extraction and reduction from the acquired point data of the plates based on the DBSCAN clustering method and the NMECM. To achieve efficient classification of multi-type rivet head defects, we propose PointGhost, a lightweight classification model that integrates a VBS mechanism, a GPC-Ghost network, and a DSSA mechanism. The model performs both local and global feature learning on the key geometric information of head defects, thereby enhancing defect feature representation and enabling accurate and efficient classification. Subsequently, the severity of rivet protrusion and indentation defects is quantified using PCA combined with the TLS plane fitting algorithm. On the custom-built dataset, the PointGhost model is validated for stable training and consistent classification performance with reasonable robustness. The model achieves a mean accuracy of 99.49%. Compared with PointNet, PointConv, PointNet++, PointMLP, PointNeXt, and Point-PN, the presented network achieves improvements of 24.32%, 2.47%, 4.41%, 3.68%, 4.55%, and 0.93%, respectively. A comparative analysis of test results reveals that PointGhost attains the lowest average misclassification rate of only 1.19%, outperforming PointNeXt, PointConv, Point-PN, and PointMLP by 6.64%, 3.83%, 3.41%, and 2.38%, respectively. Furthermore, the network requires only 0.13 G of computational cost and 0.25 M parameters and hence significantly improves computational efficiency. 
The total time for a single rivet from data processing to classification and quantification does not exceed 1.5 s, which demonstrates efficient processing and indicates the potential of the method for industrial inspection applications. However, the proposed method shows limited effectiveness in identifying small defect features, primarily due to the difficulty of fabricating small-defect specimens, which results in an insufficient number of such samples in the dataset. Future research will focus on the fabrication and detection of mild defect samples with defect sizes ranging from 0.1 to 0.2 mm. By further refining and optimizing the network architecture and parameters, we aim to enhance detection performance for small defect samples, reduce the misclassification rate, and deploy the proposed detection method for industrial applications.