Research on Defect Detection in Automated Fiber Placement Processes Based on a Multi-Scale Detector

Abstract: Various surface defects in automated fiber placement (AFP) processes affect the forming quality of the components. In addition, defect detection usually requires manual observation with the naked eye, which leads to low production efficiency. Therefore, automatic solutions for defect recognition have high economic potential. In this paper, we propose a multi-scale AFP defect detection algorithm, named the spatial pyramid feature fusion YOLOv5 with channel attention (SPFFY-CA). The spatial pyramid feature fusion YOLOv5 (SPFFY) adopts spatial pyramid dilated convolutions (SPDCs) to fuse the feature maps extracted in different receptive fields, thus integrating multi-scale defect information. For the feature maps obtained from a concatenate function, channel attention (CA) can improve the representation ability of the network and generate more effective features. In addition, the sparsity training and pruning (STP) method is utilized to achieve network slimming, thus ensuring the efficiency and accuracy of defect detection. The experimental results on the PASCAL VOC and our AFP defect datasets demonstrate the effectiveness of our scheme, which achieves superior performance.


Introduction
Carbon fiber-reinforced plastic (CFRP) has remarkable advantages such as light weight, high strength, fatigue resistance, and corrosion resistance, and it is often used in large, single-piece aircraft structures [1,2]. The manufacturing methods of CFRP include hand layup, automated tape laying, and automated fiber placement [3]. Given the problems associated with the hand layup process, which include difficulty in achieving complex shapes and manufacturing large-sized parts, low efficiency, and difficulty in achieving quality consistency, the relatively novel technique of automated fiber placement (AFP) is increasingly used in industry to make manufacturing economical, fast, and efficient [4][5][6]. An AFP system consists of a placement head and a robotic arm; the placement head lays the CFRP material layer by layer onto a mold. The AFP procedure is schematically shown in Figure 1.
In the actual production environment, various defects may occur during fiber layup, which will affect the quality [7][8][9][10]. These defects are often directly related to the layup process itself. Harik et al. [7] investigated the link between AFP defects and process planning, layup strategies, and machining.
The common types of AFP defects include wrinkles, twists, gaps, bubbles, and the presence of foreign material. A series of scanned images of actual defects and a reference sample without any defect are illustrated in Figure 2. Defect detection typically requires manual observation by the naked eye, which leads to low production efficiency. Manual online detection is easily affected by subjective experience, and it may cause problems such as missed detection when the manufacturing task is heavy. With the rapid development of computer vision, deep learning, and other technologies, defect visual inspection technology [11][12][13][14][15] based on deep learning can be effectively used in the quality control and monitoring of the CFRP manufacturing process. Sebastian Zambal et al. [8] considered defect detection in AFP as an image segmentation problem that can be trained with manually generated training sets. In their study, a laser triangulation sensor was used to obtain data from the layup machinery, and a dataset with 5000 samples was established. The trained neural network could recognize the gaps, overlaps, and foreign objects on the product's surface.
In this paper, we propose the spatial pyramid feature fusion YOLOv5 with channel attention (SPFFY-CA) to achieve defect detection in AFP. The SPFFY-CA includes spatial pyramid dilated convolutions (SPDCs) and channel attention (CA) modules. In addition, we used the sparsity training and pruning (STP) method to achieve network slimming and ensure the efficiency and accuracy of defect detection.
The contributions of this work can be briefly summarized as follows:
• We propose the spatial pyramid feature fusion YOLOv5 (SPFFY), which adopts spatial pyramid dilated convolutions (SPDCs) to fuse the feature maps extracted in different receptive fields, thus integrating multi-scale defect information;
• The channel attention (CA) mechanism was utilized to evaluate the importance of the channels obtained from concatenate functions, which improves the representation ability of the model and generates more effective features;
• The sparsity training and pruning (STP) method, based on the measurement of sparse and redundant features, was utilized to obtain a smaller and more compact network while maintaining accuracy;
• The proposed method was evaluated on the PASCAL VOC and our AFP defect datasets, and the results show that it performs better than the original models.
The remainder of this paper is organized as follows: Section 2 discusses related work. Section 3 describes the proposed methods in detail. Sections 4 and 5 present the experiments on the PASCAL VOC and AFP defect datasets, respectively. Finally, we conclude our work in Section 6.

Related Work
With the rapid development of deep learning in the field of object recognition, AFP defect detection algorithms based on deep convolutional neural networks (CNNs) have become a new research direction.

Deep CNNs for Object Detection
In recent years, deep convolutional neural networks (CNNs) have achieved great success in visual recognition tasks [16][17][18][19][20]. With the improvement of hardware capability and the rapid development of CNN architectures (AlexNet [16], VGGNet [21], ResNet [22], MobileNets [23,24], etc.), these models have powerful feature extraction capabilities for processing large-scale images and are suitable for object recognition in complex scenes. CNN-based target recognition methods are mainly divided into two-stage and one-stage detection [25]. Early two-stage recognition methods include R-CNN [26], SPP-Net [27], Fast R-CNN [28], and Faster R-CNN [29]. The R-CNN and SPP-Net algorithms use an SVM [30] for feature scoring and classification, which is complex to train and slow at detection time. Fast R-CNN uses a fully connected layer instead of the SVM classifier, but obtaining regions of interest (ROIs) takes a long time, so its detection speed is slow. Faster R-CNN uses region proposal networks (RPNs) to achieve end-to-end target recognition and detection, which improves the speed of target detection. However, as two-stage target detection algorithms require a large number of calculations and parameters, they cannot meet the requirements of real-time detection and batch application. One-stage detection methods include the YOLO series [31][32][33][34][35] and the SSD series [36,37]. When using the YOLO (You Only Look Once) algorithm for object recognition, the input image requires only one forward inference to predict all target positions and category information. Each series of algorithms can further improve recognition performance by changing its classification strategy and backbone network.

Defect Detection in AFP
Various methods currently exist for AFP defect detection. In the AFP process, environmental factors, laying temperature, laying speed, laying pressure, equipment accuracy, laying trajectory planning, etc., cause different types of defects in the final composite products. Many machine vision methods have been proposed to detect defects during the AFP process. Shadmehri et al. [38] proposed a laser vision detection system for the automated fiber placement manufacturing process. This laser-assisted detection system is very intuitive but is, in essence, still based on manual detection, which does not significantly improve efficiency. Marani et al. [39] used thermal imaging technology to obtain surface images of glass fiber-reinforced materials; the SURF operator and unsupervised K-means clustering were used to detect the surface defects of glass fiber composites. Denkena et al. [40] proposed a defect detection system based on infrared thermal imaging and related image processing for the inspection of AFP processes. An edge detection algorithm is used to analyze the specific area compacted by the roller, extract the geometric shape and position of the tow, obtain the relevant information of the layer, and further detect defects such as overlaps, gaps, and twists. Brüning et al. [41] proposed a machine learning algorithm using an integrated infrared (IR) camera, which detects different types of defects and provides real-time quality information for the inspection of AFP processes, achieving automated data capture, data storage, modeling, and optimization. Chen et al. [42] proposed an intelligent AFP detection system that uses infrared vision for defect recognition and measurement and includes intelligent decision making, multi-parameter optimization, and data storage.
Some related studies in the field of deep learning have addressed AFP defect recognition [43]. Carsten Schmidt et al. [44] proposed a defect detection and classification method based on thermal imaging and deep learning in the automatic fiber placement (AFP) process. They designed three different CNN architectures for the detection and monitoring of tow defects, as well as for path monitoring. This method is only used to classify different defects and cannot locate them. In addition, when the defect target is small, the image contains a large amount of invalid background information, which interferes with the accuracy of classification. Sebastian Zambal et al. [8] proposed image segmentation to address defect detection in AFP and used artificially generated data [45] to solve the problem of insufficient defect data. The authors used probabilistic graphical models to generate training images and annotations and designed a neural network for image segmentation using an architecture similar to U-Nets, which is suitable for training with few real data. Sebastian Meister et al. [46] proposed a defect detection method based on convolutional and recurrent neural networks. In this method, one-dimensional signals are used to analyze the input height distribution of a laser line scanning sensor line by line, which is suitable for classifying images with large defects.
In these existing studies, the quality inspection of automated fiber placement (AFP) is rarely addressed in terms of end-to-end networks for defect recognition and localization. Furthermore, the existing studies still cannot effectively solve the problem of background information interference in AFP defect detection. Thus, we aimed to design a deep learning algorithm that identifies and analyzes defects of different scales and types in an end-to-end framework and intuitively provides the inspection results.

Pruning
To achieve a more compact and effective network and avoid time-consuming detection of two-dimensional images, we utilized structured pruning for online AFP defect detection. Pruning methods commonly include unstructured and structured pruning. A pruning process consists of three steps: training a large network, pruning redundant channels, and retraining the pruned network. Regarding unstructured pruning, LeCun et al. used second-derivative information and removed weights based on their saliency [47]. An early weight pruning method is also described in [48]. Han et al. [49,50] proposed a weight pruning framework that removes some CNN parameters and connections by pruning low-magnitude weights, thus achieving model compression. In contrast, structured pruning can be utilized to perform network slimming and computational acceleration without requiring specialized hardware or libraries. Some studies [51][52][53] proposed a set of pruning criteria for CNNs to evaluate and remove unimportant feature channels and their corresponding kernels. In [54][55][56], sparsity regularization strategies were proposed to obtain sparse weights and features and reduce the time-intensiveness of the pruning-retraining step. In light of this body of research, we utilized feature sparsity training for the structured pruning and acceleration of CNNs to obtain a compact model.
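The unstructured (magnitude-based) pruning idea attributed to Han et al. above can be sketched in a few lines. The following NumPy toy operates on a single weight matrix; the function name and the thresholding scheme are illustrative, not taken from the cited implementations.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights (unstructured pruning sketch).

    sparsity: fraction of weights to remove, e.g. 0.5 removes half of them.
    """
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value; entries at or below it are pruned.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([[0.9, -0.05, 0.4],
              [-0.01, 0.7, 0.02]])
pruned = magnitude_prune(w, 0.5)  # zeroes the 3 smallest-|w| entries
```

Note that the resulting weight matrix keeps its shape: the zeros only bring a speedup on hardware or libraries with sparse support, which is exactly the limitation that motivates structured pruning.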

Methods
In this section, the proposed method is described in detail. We present the architecture of our proposed method with the spatial pyramid feature fusion YOLOv5 (SPFFY), channel attention (CA), and sparsity training-pruning (STP).

Multi-Scale Feature Fusion
The original YOLOv5 utilizes a C3 architecture (a CSP bottleneck with three convolutions) with an SPPF (spatial pyramid pooling-fast) layer as the backbone to extract the feature map of the last convolutional layer. The feature extraction capability of the backbone network directly affects the detection performance for AFP defects. Many recent studies [57,58] have revealed that the feature maps obtained from low-level convolutional layers have higher resolutions and, therefore, help to detect small objects. In these methods, a multi-scale spatial pyramid directs attention to the object by using its spatial features, which improves detection. An SPPF block uses pooling layers with a single kernel size, where the output of each pooling layer becomes the input of the next. Inspired by the SPPF, we propose a spatial pyramid dilated convolution (SPDC) module to fuse the multi-scale features extracted in different receptive fields of the same feature map, as shown in Figure 3. These modules replace the SPPF and are further integrated with a channel attention mechanism. In the SPDC module, CBS represents conv + BN + SiLU, and k3, s1, p2, and dr1 denote a convolution kernel of size 3, a stride of 1, a padding of 2, and a dilation rate of 1, respectively. SPDC modules can be regarded as a special CNN block in which the input and output feature maps have the same size; thus, they can easily be added to the backbone network of current detectors to obtain multi-scale feature maps. Here, we added an SPDC module behind each C3 module to replace the original SPPF in the backbone network of YOLOv5.
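The key property of the SPDC module, that parallel dilated convolutions cover different receptive fields while preserving spatial size, can be illustrated with a minimal single-channel NumPy sketch. The kernel, the dilation rates (1, 2, 3), and the plain stacking step are assumptions for illustration only; the actual module uses CBS (conv + BN + SiLU) blocks on multi-channel feature maps.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """'Same'-padded 2D convolution of a single-channel map with a dilated kernel."""
    k = kernel.shape[0]
    eff = k + (k - 1) * (dilation - 1)   # effective (dilated) kernel size
    pad = eff // 2                        # padding that preserves spatial size
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sample the padded input with the dilation stride.
            patch = xp[i:i + eff:dilation, j:j + eff:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

def spdc_block(x, kernel, rates=(1, 2, 3)):
    """Toy SPDC: parallel dilated convs over the same input, then stacked for fusion."""
    return np.stack([dilated_conv2d(x, kernel, r) for r in rates])

x = np.random.rand(8, 8)
k = np.ones((3, 3)) / 9.0
fused = spdc_block(x, k)  # shape (3, 8, 8): three scales, identical spatial size
```

Because every branch returns a map of the input's spatial size, the branches can be concatenated (here, stacked) without resampling, which is what lets the module drop into the backbone in place of the SPPF.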

Channel Attention
In the existing network architectures, multi-scale features are obtained by concatenating the output features from different layers, but the importance of the output feature channels after concatenation is often ignored. In high-level layers, the extracted features often contain target feature information, and the output channels carry little redundant information. In low-level layers, by contrast, only simple edges and color blocks can be extracted, and the extracted features contain a large amount of background interference. If the output feature channels extracted from high-level layers are directly concatenated with the low-level output features after upsampling, the target feature information suffers interference. Therefore, we added a channel attention module after each concatenation operation in the neck part of the model, so that different weights can be assigned to redundant feature channels to suppress noise.
The channel attention (CA) module assigns weights to fusion features from different scales. The channel attention mechanism is utilized after each concatenating operation in the neck network to direct more attention to the effective feature channels, as shown in Figure 4.
The CA module consists of two branches: multi-scale feature fusion and a channel attention mechanism. The input feature maps after concatenation are represented as F_in ∈ R^(H×W×C), and the feature fusion branch produces output feature maps of the same size. The channel attention mechanism contains two one-dimensional convolutional operations and the sigmoid activation function, which are used to obtain the weight of each channel. The i-th channel attention score is calculated as

s_i = σ(x_i) = 1 / (1 + e^(−x_i)), i = 1, …, C,

where the tensor x ∈ R^(1×1×C) is obtained from the one-dimensional convolution operations and s ∈ R^(1×1×C) represents the weight of each feature channel. Then, the output feature map is calculated as

F_out = s ⊗ F_in,

where ⊗ denotes channel-wise multiplication between the score s and the feature map F_in. The SPDC and CA modules are embedded in the backbone and neck networks, respectively. The proposed model is illustrated in Figure 5.
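The CA computation can be sketched with a toy NumPy implementation. The global-average-pooling step used here to obtain the channel descriptor is an assumption (the text only specifies two one-dimensional convolutions and a sigmoid), and the averaging kernels are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_same(v, kernel):
    """1D convolution over the channel descriptor with 'same' padding."""
    pad = len(kernel) // 2
    vp = np.pad(v, pad)
    return np.array([np.sum(vp[i:i + len(kernel)] * kernel) for i in range(len(v))])

def channel_attention(F_in, k1, k2):
    """Toy CA: pool each channel, apply two 1D convs, gate channels with a sigmoid.

    F_in: feature map of shape (H, W, C); returns a map of the same shape.
    """
    desc = F_in.mean(axis=(0, 1))                 # channel descriptor (assumed GAP) -> (C,)
    x = conv1d_same(conv1d_same(desc, k1), k2)    # two 1D convolutions
    s = sigmoid(x)                                # per-channel weights in (0, 1)
    return F_in * s                               # channel-wise multiplication F_out = s * F_in

F = np.random.rand(4, 4, 8)
out = channel_attention(F, np.ones(3) / 3, np.ones(3) / 3)
```

Since each weight s_i lies in (0, 1), the gating can only attenuate channels, which matches the stated goal of suppressing redundant, noisy feature channels.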

Sparsity Training and Pruning
With the addition of SPDC and CA modules to the network, we introduced the sparsity training and pruning (STP) method to obtain more compact models and ensure the speed and accuracy of defect detection.

In general, the model compression rate can be determined by the actual use environment. However, when the compression rate is high and the pretrained model has low sparsity, useful feature channels are easily pruned, resulting in reduced detection accuracy. Therefore, in typical pruning methods, the number of channels to prune needs to be set to a small value in each iteration, and the pruning-retraining step needs to be repeated many times to obtain the final compact model. To avoid this, we employed the sparsity training of the pretrained networks to increase the feature sparsity in each layer, applying feature sparsity regularization to the selected channels. During sparsity training, the channels to be removed are penalized, and their outputs gradually decrease to zero. In this way, pruning can be finished in one iteration.
Different from most of the existing typical pruning methods [51,52], which adopt multiple iteration schemes (including pruning and retraining), our model needs only one iteration of sparsity training and pruning to achieve network slimming. The STP framework is illustrated in Figure 6. First, the location and number of convolutional kernels that need to be pruned are determined by calculating the sparse redundancy of each feature map. Then, sparsity-constraint training is performed on each convolutional channel to be pruned, thus speeding up the sparsification of redundant channels and achieving one-step pruning and model precision recovery.

The loss function is one of the most important components of neural networks, used to calculate the gradients and update the weights of the network. The YOLOv5 loss function consists of three parts: class loss (BCE loss), objectness loss (BCE loss), and location loss (CIoU loss). It can be formulated as

L = λ_1 L_cls + λ_2 L_obj + λ_3 L_loc,

where λ_1, λ_2, and λ_3 are the control parameters balancing these three terms. Additionally, the proposed loss function with sparsity training for CNNs is given by

L_s = L + γ Σ_l R_p(F_l),

where R_p denotes the feature sparsity regularization on each layer l, calculated as the L_p norm of the feature map F, and γ balances the penalty. For the pruning process, different from a simple layer-stack structure, additional attention should be given to each special module of the proposed network.
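The sparsity-regularized loss can be sketched as follows. The balancing coefficient gamma and the function names are hypothetical, since the paper does not give their exact values; the detection loss is taken as a precomputed scalar.

```python
import numpy as np

def lp_norm(feature_map, p=1):
    """L_p norm of a feature map, used as the sparsity penalty R_p."""
    return np.sum(np.abs(feature_map) ** p) ** (1.0 / p)

def sparsity_loss(det_loss, penalized_features, gamma=1e-4):
    """Total loss = detection loss + gamma * sum of R_p over channels marked for pruning.

    gamma is a hypothetical balancing coefficient, not a value from the paper.
    """
    reg = sum(lp_norm(f, p=1) for f in penalized_features)
    return det_loss + gamma * reg

# One channel still active (penalized toward zero), one already fully sparse.
feats = [np.full((4, 4), 0.5), np.zeros((4, 4))]
total = sparsity_loss(2.0, feats)  # 2.0 + 1e-4 * 8.0 = 2.0008
```

Minimizing the penalty drives the outputs of the selected channels toward zero during training, which is what allows the subsequent pruning to remove them in a single pass without a large accuracy drop.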
For the backbone network, each block consists of C3 and SPDC modules. A C3 module with N bottlenecks is illustrated in Figure 7, where the symbol * denotes the convolution operation and the white blocks represent the pruned channels. The number of output channels of the bottlenecks must be consistent to complete the sum operation. We utilize the L_1 norm of the feature map to evaluate the sparsity and redundancy of the output feature channels obtained by the element-wise addition of the last bottleneck in each C3 module, thus determining the location and number of feature channels to be pruned. Then, the output channels of the convolutional kernels corresponding to the second layer in each bottleneck are pruned. The importance of the output feature map of the first layer in each bottleneck is also evaluated; then, the corresponding output channels of the kernels in the first layer and the input channels in the second layer can be pruned.

The pruning architecture of the SPDC module is shown in Figure 8, where the symbol * denotes the convolution operation and the white blocks represent the pruned channels. The importance of the feature maps obtained after the concatenation operation is first evaluated to determine the redundant feature channels. Then, the corresponding convolutional kernel channels of the previous layer and the input convolutional kernel channels of the next layer can be pruned.
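The channel-selection step described above, ranking output channels by the L1 norm of their feature maps and then removing the matching kernels of adjacent layers, can be sketched as follows. The keep_ratio parameter and the helper names are illustrative, not from the paper.

```python
import numpy as np

def select_prune_channels(feature_maps, keep_ratio):
    """Rank output channels by the L1 norm of their activations; return indices to keep.

    feature_maps: activations of shape (C, H, W).
    """
    importance = np.abs(feature_maps).sum(axis=(1, 2))   # L1 norm per channel
    n_keep = max(1, int(round(len(importance) * keep_ratio)))
    return np.sort(np.argsort(importance)[::-1][:n_keep])

def prune_conv_pair(w_curr, w_next, keep):
    """Remove pruned output channels of one layer and the matching inputs of the next.

    w_curr: (C_out, C_in, k, k) kernels of the pruned layer;
    w_next: (C_out2, C_out, k, k) kernels of the following layer.
    """
    return w_curr[keep], w_next[:, keep]

fmap = np.random.rand(8, 6, 6)
fmap[2] = 0.0                                   # a fully redundant (zeroed) channel
keep = select_prune_channels(fmap, keep_ratio=0.75)
w1, w2 = prune_conv_pair(np.random.rand(8, 3, 3, 3),
                         np.random.rand(16, 8, 3, 3), keep)
```

Pruning the output channels of one layer and the corresponding input channels of the next keeps the two layers dimensionally compatible, which is why structured pruning yields a genuinely smaller, faster network rather than a sparse mask.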
For the channel attention (CA) module at the neck part of the proposed network, the CA-pruning module is illustrated in Figure 9, where the symbol * denotes the convolution operation, and the white blocks represent the pruned channels.

Experiments
In this section, we evaluate the effectiveness of the proposed SPFFY-CA on the benchmark PASCAL VOC dataset and our AFP defect dataset. Data augmentation methods, namely random cropping, shifting, scaling, clipping, and random color jittering, were adopted to avoid overfitting. We trained the original network from scratch, defined as the baseline, on a computer with an Intel i7-8700 CPU and an NVIDIA RTX 3060 GPU with 12 GB of memory. YOLOv5 is an open-source framework that accelerates the process from research prototyping to production deployment.

Experiments on PASCAL VOC Datasets
The PASCAL Visual Object Classes Challenge (PASCAL VOC) dataset consists of VOC2007 and VOC2012. The dataset contains 20 object classes, namely, Human: person; Animal: bird, cat, cow, dog, horse, and sheep; Vehicle: airplane, bicycle, boat, bus, car, motorbike, and train; Indoor: bottle, chair, dining table, potted plant, sofa, and tv/monitor. The mean average precision (mAP) at an IoU threshold of 0.5 was calculated to measure the accuracy of target recognition. All the networks were trained on the combined VOC2007 and VOC2012 train-val sets (16,551 images) and were tested on the VOC2007 test set (4952 images). In terms of the training details, the proposed models were trained using the SGD optimizer. The mini-batch size was 30, and an initial learning rate of 10^−2 was used. The momentum was 0.937, and the weight decay was 0.0005. The inference latency (batch size of 1) and the number of parameters of the models were also measured.
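For reference, the reported training configuration can be collected in one place as follows; the values are taken from the text, while the PyTorch call shown in the comment is an assumed equivalent, not the authors' released code.

```python
# SGD configuration reported for the PASCAL VOC experiments.
config = {
    "optimizer": "SGD",
    "batch_size": 30,
    "initial_lr": 1e-2,
    "momentum": 0.937,
    "weight_decay": 5e-4,
    "map_iou_threshold": 0.5,
}

# With PyTorch, this would roughly correspond to:
# torch.optim.SGD(model.parameters(), lr=config["initial_lr"],
#                 momentum=config["momentum"],
#                 weight_decay=config["weight_decay"])
```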
On the PASCAL VOC dataset, the performance of the proposed SPFFY-CA was compared with that of other state-of-the-art methods, and the results are shown in Table 1. It can be seen from Table 1 that SPFFY-CA-STP obtains a 0.9% higher mAP than YOLOv5m with the same magnitude of parameters and latency. Compared with the other algorithms, SPFFY-CA and SPFFY-CA-STP have fewer parameters and higher recognition accuracy. Table 2 shows the average precision (AP) of the proposed SPFFY-CA and SPFFY-CA-STP compared with SSD300 [36], SSD512 [36], CenterNet [61], and YOLOv5 [62]. It can be seen that the proposed method outperforms the other algorithms in the recognition of each category.

Ablation Study
We conducted ablation studies to validate the proposed method as follows.
Spatial pyramid dilated convolutions (SPDC): We investigated the contribution of the spatial pyramid dilated convolution module by comparing the SPFFY-CA with and without the SPDC module. For this experiment, we trained the SPFFY-CA without SPDC on the PASCAL VOC dataset; the training strategy was the same as in the previous section. The performance comparison results are shown in Table 3. It can be seen that the SPFFY-CA with the SPDC module obtains better performance.
Channel attention (CA): In this experiment, we studied the effect of the multi-scale channel attention (CA) module by comparing the SPFFY-CA with and without it. We trained the SPFFY-CA without the CA module on the PASCAL VOC dataset with the same training strategy as in the previous experiment. The performance comparison results are shown in Table 4. It can be seen that the SPFFY-CA with the CA module obtains better performance.
Sparsity training and pruning (STP): In this experiment, we investigated the effect of sparsity training and pruning (STP) on SPFFY-CA. We used the SPFFY-CA trained on the PASCAL VOC dataset with the same training strategy as in the previous experiment. The SPFFY-CA model was pruned at three different compression rates, and the results are shown in Table 5, where "SPFFY-CA-pruned-2" is based on the "SPFFY-CA-pruned-1" model. From Table 5, it can be inferred that STP can compress the SPFFY-CA model while maintaining stable identification accuracy.
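To make the SPDC and CA ideas concrete, the following NumPy sketch shows a 1-D toy version: a spatial pyramid of dilated convolutions whose branches use different dilation rates (and thus different receptive fields), followed by an SE-style channel attention gate. The 1-D setting, the fixed toy weights, and the function names are illustrative assumptions; the actual modules operate on 2-D feature maps with learned weights.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-padded 1-D dilated convolution: a larger dilation samples the
    input more sparsely, enlarging the receptive field at no extra cost."""
    k = len(w)
    pad = (k - 1) * dilation // 2
    xp = np.pad(x, pad)
    return np.array([sum(w[j] * xp[i + j * dilation] for j in range(k))
                     for i in range(len(x))])

def spdc(x, kernels, dilations):
    """Spatial pyramid of dilated convs: run each branch, stack as channels."""
    return np.stack([dilated_conv1d(x, w, d) for w, d in zip(kernels, dilations)])

def channel_attention(feat, reduction=2):
    """SE-style channel attention: squeeze (global average), excite
    (two FC layers with ReLU and sigmoid), then rescale each channel."""
    c = feat.shape[0]
    s = feat.mean(axis=1)                    # squeeze: one scalar per channel
    w1 = np.ones((c // reduction, c)) / c    # fixed toy weights (learned in practice)
    w2 = np.ones((c, c // reduction))
    z = np.maximum(w1 @ s, 0.0)              # ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ z)))      # sigmoid gate in (0, 1) per channel
    return feat * a[:, None]

x = np.arange(8, dtype=float)
feat = spdc(x, kernels=[np.ones(3) / 3] * 3, dilations=[1, 2, 4])
out = channel_attention(feat)
```

Because the sigmoid gate lies in (0, 1), attention can only down-weight uninformative channels, which is the mechanism by which CA emphasizes the more effective features after concatenation.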

Experiments on AFP Defect Datasets
Due to the complexity of the AFP manufacturing process and factors such as the environment, process parameters, CFRP material defects, equipment accuracy, and laying trajectory planning, different types of defects appear in the final composite products, which affect their mechanical properties [63,64]. Common types of AFP defects include wrinkles, twists, gaps, bubbles, and the presence of foreign material. In this study, an AFP defect dataset of 3000 labeled images with an original resolution of 1000 × 1000 was constructed. Then, 80% of the defect samples were used as the train-val set, and the rest were used as the test set to evaluate the performance of the model. The number of instances of each type of defect is shown in Figure 10.
The mean average precision (mAP) at an IoU threshold of 0.5 was calculated to measure the accuracy of defect recognition. In terms of the training details, the models were trained using the SGD optimizer with a batch size of 30 and an initial learning rate of 10⁻². The momentum was 0.9, and the weight decay was 0.0005. The latency (batch size equal to 1) was also measured. On the AFP defect dataset, the performance of the proposed SPFFY-CA was compared with that of other detection algorithms, and the results are shown in Table 6. It can be seen from Table 6 that the SPFFY-CA proposed in this paper achieves an accuracy of 93.1% on the AFP defect dataset and has higher recognition confidence than YOLOv5m for defect detection. For all the various types of defects, SPFFY-CA achieves higher performance than YOLOv5m. Figure 11 shows the detection results of SPFFY-CA-STP and the original YOLOv5m for various defects, where the confidence score is higher than 0.5. Figure 11a shows the recognition results of the designed SPFFY-CA-STP model, and Figure 11b shows those of the original YOLOv5m.
It can be seen from Figure 11 that the SPFFY-CA-STP model has higher recognition confidence than YOLOv5m for defects of different scales and types.
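The IoU criterion behind the mAP@0.5 metric used throughout these experiments can be sketched as follows; the box coordinates and example values are illustrative.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

pred = [0, 0, 10, 10]   # predicted defect box
gt = [5, 0, 15, 10]     # ground-truth defect box
# intersection = 5 * 10 = 50; union = 100 + 100 - 50 = 150; IoU = 1/3
```

At the 0.5 threshold this prediction (IoU ≈ 0.33) would count as a false positive, which is how mAP@0.5 penalizes poorly localized detections even when the class label is correct.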
The proposed SPFFY-CA-STP achieves higher performance on the PASCAL VOC dataset while maintaining the same detection speed and realizes the real-time detection of multi-scale AFP defects. The quality inspection of automatic fiber placement (AFP) is thus addressed through the recognition and localization of target defects with an end-to-end learning and detection network. The main limitation is that this method applies only when the defect types are known; further data collection is required for unknown defects.

Conclusions and Future Work
In this paper, we proposed a multi-scale AFP defect detection algorithm named the spatial pyramid feature fusion YOLOv5 with channel attention (SPFFY-CA), which includes spatial pyramid dilated convolution (SPDC) and channel attention (CA) modules to fuse the feature maps extracted in different receptive fields, thus integrating multi-scale defect information. Through the CA mechanism, the importance of the channels obtained from the concatenate function was evaluated, and further attention was given to the effective feature channels, which improved the representation ability of the network and generated more effective features. In addition, we employed the sparsity training and pruning (STP) method to obtain more compact models and ensure the speed and accuracy of defect detection. The experimental results on the PASCAL VOC and AFP defect datasets prove the effectiveness of the proposed approach, which obtains state-of-the-art performance. In future research, we will further study defect detection in hand layup and better apply visual identification technology to the control of the composite manufacturing process.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author (wangwei_4524@163.com). The data are not publicly available due to privacy restrictions.
