Process Monitoring in Friction Stir Welding Using Convolutional Neural Networks

Abstract: Preliminary studies have shown the superiority of convolutional neural networks (CNNs) compared to other network architectures for determining the surface quality of friction stir welds. In this paper, CNNs were employed to detect cavities inside friction stir welds by evaluating inline measured process data. The aim was to determine whether CNNs are suitable for identifying surface defects exclusively, or if the approach is transferable to internal weld defects. For this purpose, 120 welds were produced and examined by ultrasonic testing, which was the basis for labeling the data as “good” or “defective.” Different types of artificial neural network were tested for predicting the placement of the welds into the defined classes. It was found that the way of labeling the data is significant for the accuracy achievable. When the complete welds were uniformly labeled as “good” or “defective,” an accuracy of 98.5% was achieved by a CNN, which was a significant improvement compared to the state of the art. When the welds were labeled segment-wise, an accuracy of 79.2% was obtained by using a CNN, showing that a segment-wise prediction of the cavities is also possible. The results confirm that CNNs are well suited for process monitoring in friction stir welding and their application enables the identification of various defect types.


Introduction
Friction stir welding (FSW) is a modern joining process in which a weld is produced through frictional heating and by the mixing of material in the plastic state using a rotating tool. Since it is a solid-state process that operates well below the melting temperature, the weldability of aluminum alloys is superior to that achieved with fusion welding technologies. Consequently, FSW is well suited for a variety of joining tasks, especially in the aerospace industry [1]. A recent trend is the use of FSW in the production of heat exchangers and battery trays for electric vehicles [2].
With the increasing application of FSW, demand is growing for non-destructive evaluation methods that are more reliable than those currently available on the market [3]. As FSW is a highly automated process, the application of sensors for inline process monitoring is feasible. Inline monitoring methods can be categorized as direct or indirect methods. While direct methods use technologies such as camera vision or ultrasonic testing, indirect methods evaluate information such as forces and temperatures. Indirect methods are usually less accurate but more economical and less sensitive to external influences, such as light exposure. Consequently, indirect methods are preferable to direct methods for industrial applications [4].
For indirect methods in particular, the appropriate processing and analysis of sensor signals are of crucial importance to correctly interpret information about the manufacturing process [4]. Developments in the field of machine learning in general and deep learning in particular offer great potential for manufacturers to profitably evaluate production data and monitor product quality [5].

Related Work
In the field of FSW, there have been various efforts applying artificial neural networks (ANNs) to identify weld defects by direct or indirect monitoring. The first research work in this area was published by Boldsaikhan et al. [8]; the authors recorded the process forces in three spatial directions and the spindle torque at a sampling rate of 51.2 Hz. The time signals were transformed into the frequency domain using a discrete Fourier transform. The signal features required to train and test various fully connected neural networks (FCNNs) were extracted in the frequency domain. One FCNN predicted whether the welds contained metallurgical defects. A total of 205 samples were available, whereby the split between "good" and "defective" samples was quite unbalanced in both the test and the training data set, with significantly more "good" than "defective" samples (the test data set contained 146 "good" and five "defective" samples). The highest test accuracy of 100% was achieved when evaluating the y-force.
Fleming et al. [9] used a regression neural network to detect an improper positioning of the welding tool during FSW. For data generation, the tool was displaced in the y-direction (orthogonal to the welding direction) from −4 mm up to +4 mm relative to the center position in 30 experiments. The forces in the x- and z-directions were evaluated using an FCNN. The mean absolute error for the prediction of the tool position relative to the centerline was 0.42 mm with a standard deviation of 0.51 mm.
Boldsaikhan et al. [10] recorded the occurring process forces in the welding direction and transverse to the welding direction with a sampling rate of 68.2 Hz and evaluated the resulting data using an FCNN. One cross section for metallography was taken from each weld to determine whether the welds actually contained cavities. Whenever the cross section revealed a cavity with a diameter of more than 0.08 mm, the entire weld was labeled as "defective". By this procedure, a prediction accuracy of up to 95% was achieved.
Du et al. [11] tested a total of five different procedures to predict defects in FSW. Two different machine learning methods (decision trees and FCNNs) and three different kinds of input data (experimental data, data from an analytical model, and data from a numerical model) were utilized. The 108 data samples were collected from the literature and labeled as "good" or "defective". The best results were obtained employing the data from the numerical model, whereby a test accuracy of 96.6% was achieved with both the FCNN and the decision tree algorithm. The analysis of the experimental data by using the FCNN led to an accuracy of 83.3%.
Hartl et al. [12] implemented a direct monitoring method using a CNN-based object detection algorithm to recognize friction stir welds on aluminum sheets, and up to 95.0% of the human performance level was achieved. Subsequently, the surface properties of the welds were classified by another CNN, whereby various surface defects such as toe flash or surface galling were identified. Color images recorded with a digital camera and topography images acquired by a three-dimensional surface profilometer were tested as input data. The topography images led to the best results, enabling a classification accuracy of 92.1% (the human repeatability in classifying the topography images corresponded to 93.9%) [12]. Mishra et al. [13] also applied a CNN to classify images into conventional fusion welds and friction stir welds. For this purpose, 100 images were utilized, which were scaled up to a total of 1000 images by using data augmentation. By employing the VGG-19 [14] network architecture, an accuracy of 85% was achieved for the classification task.
In Hartl et al. [15], the focus was on the indirect monitoring of the surface quality. Various sensors were employed for the inline acquisition of accelerations, forces, the spindle torque, and temperatures. To predict whether the weld surface quality will be "good" or "defective", three different network architectures were tested: FCNNs, recurrent neural networks (RNNs), and CNNs. The best results were obtained when evaluating the spindle torque by a CNN, whereby a prediction accuracy of 87.4% was reached.
In addition to their deployment in FSW, ANNs have also been applied in the field of friction stir processing (FSP), which is a surface modification technique based on the principles of FSW [16]. Fahd [17] used an ANN to predict the resulting grain size after performing FSP. The input variables were the tool rotational speed, the traverse speed, and the chemical composition of the aluminum alloy. The comparison between the experimental data and the values generated by the ANN revealed that for more than 90% of the predictions, the percentage error relative to the actual value was below 10%. Dinharan et al. [18] applied an ANN to predict the wear rate of copper surface composites that were produced using FSP. An FCNN with four input neurons, a hidden layer with 10 neurons, and one output neuron was employed. On the test data set, a correlation coefficient of 0.99 was obtained between the experimental data and the prediction of the ANN, which qualified the FCNN as an accurate and powerful tool for determining the wear rate of surface composites in FSP.
The present paper examines the crucial question of whether CNNs are superior to the other two network types, FCNNs and RNNs, for predicting internal weld defects such as cavities. If this were the case, it would strengthen the assumption that CNNs are superior to FCNNs and RNNs regardless of the defect type to be detected. The most relevant related work on the prediction of internal weld defects was published by Boldsaikhan et al. [10]. However, their approach of uniformly labeling the entire weld as "good" or "defective" depending on one cross section per weld was a considerable simplification and should be extended to a segment-wise assessment of the welds. This would enable a more precise localization of the cavities inside the welds. Consequently, the present paper explores two hypotheses:
I. CNNs provide greater accuracy than FCNNs and RNNs do for detecting cavities.
II. A non-destructive, data-based, and segment-wise prediction of cavities is possible.

Welding Experiments
The welding experiments were conducted on a four-axis horizontal milling machine, MCH 250 from Gebr. Heller Maschinenfabrik GmbH (Nuertingen, Germany), which had been adapted to perform FSW. To obtain a sufficient amount of data, 120 welds were produced using the aluminum alloy EN AW-6082-T6. In each experiment, two sheets with a thickness of 4.0 mm were welded in butt-joint configuration. The welds had a one-dimensional trajectory with a length of 205 mm. A two-piece tool consisting of a shoulder and a probe was utilized in the experiments. Figure 1 displays the tool geometry, and Table 1 lists the tool's relevant dimensions.
To obtain a sufficient number of welds with cavities for appropriately training the ANN, process parameters resulting in a low welding temperature were deliberately applied. At low welding temperatures, the likelihood of cavity occurrence is particularly high. All welds were produced in position-controlled mode employing a tool tilt angle of 2° and a plunge depth of 0.1 mm. The welding speed v_s and the tool rotational speed n (RPM) were varied according to a full factorial experimental plan: the welding speed v_s ranged from 500 mm/min to 1200 mm/min (in steps of 50 mm/min), and the n/v_s ratio varied from 1.0 mm⁻¹ to 1.7 mm⁻¹ (in steps of 0.1 mm⁻¹). The tool rotational speed n was adjusted accordingly. High welding speeds beyond 1000 mm/min are of major importance to meet the productivity requirements of the automotive industry [2]. Consequently, such high welding speeds were included in the experimental plan. In order to avoid damage to the welding machine, the welding tool, or the measuring equipment, the n/v_s ratio was at least 1.0 mm⁻¹. A full experimental plan is given in Figure S1 in the supplementary materials to this article.
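The full factorial plan described above can be enumerated in a few lines. The following is a minimal sketch that uses only the levels stated in the text; the function name `build_plan` and the rounding of n to whole revolutions per minute are assumptions made for illustration, not details taken from the experiments.

```python
from itertools import product

# Levels stated in the text: 500..1200 mm/min in steps of 50 mm/min.
welding_speeds = list(range(500, 1201, 50))        # v_s in mm/min
# n/v_s ratio: 1.0..1.7 per mm in steps of 0.1 (built from integers to avoid float drift).
ratios = [r / 10 for r in range(10, 18)]           # n/v_s in 1/mm


def build_plan(speeds, ratio_levels):
    """Return one (v_s, ratio, n) tuple per weld, with n = ratio * v_s in rpm.

    Rounding n to an integer is an assumption of this sketch.
    """
    return [(v_s, ratio, round(ratio * v_s))
            for v_s, ratio in product(speeds, ratio_levels)]


plan = build_plan(welding_speeds, ratios)
print(len(plan))            # number of parameter combinations
print(plan[0], plan[-1])    # slowest/lowest-ratio and fastest/highest-ratio weld
```

With 15 speed levels and 8 ratio levels, the enumeration yields 120 combinations, which matches the number of welds reported, and the implied rotational speeds span 500 rpm to 2040 rpm.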

Figure 1. Tool geometry.

Data Acquisition and Pre-Processing
The process forces in three spatial directions F_x, F_y, and F_z, and the spindle torque M_z were recorded with a sampling rate of 9.6 kHz using a dynamometer from HBM GmbH (Darmstadt, Germany). The temperatures at the tool shoulder T_S and the tool probe T_P were measured with a sampling rate of 220 Hz by thermocouples. The accelerations a_x, a_y, and a_z in three spatial directions were determined with a sampling rate of 20 kHz by an acceleration sensor from Kistler Instrumente GmbH (Winterthur, Switzerland). The experimental set-up is depicted in Figure 2, with the x-direction coinciding with the welding direction. In Table S2 in the supplementary materials to this article, mean values and root mean square (RMS) values are provided for the nine different process variables for all 120 welds.
The recorded process signals were cut to the relevant area where the feed occurred and were uniformly resampled at a frequency of 5.0 kHz. Outliers and noise in the signals were removed by employing moving average and interpolation filters. Then, the signals of each weld were divided into 17 weld segments of 10 mm in length, the so-called regions of interest (ROI). Further pre-processing of the signals depended on the architecture of the three different network types. For the FCNN, the mean values were calculated for each signal in each ROI. For the RNN, the instantaneous frequency [19] and the spectral entropy, which are also often used as features in medical signal processing [20], were determined and employed as input. For the CNN, spectrograms were generated, similar to Hartl et al. [15]. Spectrograms depict the spectral density of a signal as a function of time and frequency in a three-dimensional representation [21].
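The segmentation into ROI and the mean-value features for the FCNN can be sketched as follows. This is an illustration under stated assumptions: a constant feed speed during the weld, ROI that follow each other without gaps, and helper names (`samples_per_roi`, `split_into_rois`, `fcnn_features`) that are hypothetical and not taken from the original work. The signal is synthetic.

```python
from statistics import fmean

SAMPLE_RATE_HZ = 5000     # uniform resampling rate stated in the text
ROI_LENGTH_MM = 10.0      # segment length stated in the text
N_ROI = 17                # segments per weld stated in the text


def samples_per_roi(v_s_mm_per_min, rate_hz=SAMPLE_RATE_HZ, roi_mm=ROI_LENGTH_MM):
    """Number of samples recorded while the tool travels one ROI (constant feed assumed)."""
    seconds_per_roi = roi_mm / (v_s_mm_per_min / 60.0)
    return int(round(seconds_per_roi * rate_hz))


def split_into_rois(signal, v_s_mm_per_min, n_roi=N_ROI):
    """Cut a feed-phase signal into n_roi consecutive, equally long ROI."""
    n = samples_per_roi(v_s_mm_per_min)
    if len(signal) < n * n_roi:
        raise ValueError("signal too short for the requested number of ROI")
    return [signal[i * n:(i + 1) * n] for i in range(n_roi)]


def fcnn_features(signal, v_s_mm_per_min):
    """One mean value per ROI, as described for the FCNN input."""
    return [fmean(roi) for roi in split_into_rois(signal, v_s_mm_per_min)]


# Synthetic example: a linear ramp standing in for a force signal at 600 mm/min.
n = samples_per_roi(600)                      # 10 mm at 10 mm/s -> 1 s of data
signal = [float(i) for i in range(N_ROI * n)]
features = fcnn_features(signal, 600)
print(n, len(features), features[0])
```

Because the ROI length is fixed in millimeters, the number of samples per ROI depends on the welding speed: faster welds contribute fewer samples per segment.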

Material Testing
It is not possible to take a metallographic sample at every point of the weld to determine the actual occurrence of cavities. Consequently, ultrasonic testing was used as an alternative to detect cavities in the entire welds. The tests were performed by Element Materials Technology Aalen GmbH (Aalen, Germany) via straight-beam scanning in an immersion technique using the GE USIP40 equipment from GE Sensing and Inspection Technologies GmbH (Huerth, Germany) and an ISS Alpha 15 MHz 0.25" probe. Water with an added inhibitor served as a couplant. The tests were conducted according to the ISO 16810 standard [22]. For the calibration, a reference flat bottom hole with a diameter of 1.0 mm was prepared in one of the welds at a depth of 2.0 mm. The amplification during the calibration was 56 dB. The amplification during testing was 68 dB, corresponding to a flat bottom hole of approximately 0.5 mm in diameter at 80% screen height (SH). The test frequency was 15 MHz. Figure 3 displays the C-scan of the weld containing the reference flat bottom hole and the corresponding A-scan at the position of the reference hole. In the A-scan, the amplitude enabled a comparison of the size of a natural defect with the size of the reference defect. The sound path corresponded to the depth of a defect from the surface of the weld.
To validate the results of the ultrasonic tests, a total of 37 metallographic samples were prepared. The specimens were embedded in an epoxy system, ground to a fineness of P1200, and polished with a 3 µm diamond suspension and colloidal silica. Finally, the samples were etched using Kroll's etchant [23]. In the supplementary materials to this article, images of all 37 metallographic specimens are provided in Table S1.
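The gain settings reported for the ultrasonic test can be cross-checked with a short calculation. The sketch below assumes that the echo amplitude of a flat bottom hole scales with its area, that is, with the square of its diameter (a common far-field approximation), and that the reference hole was set to the same screen height during calibration. Neither assumption is stated in the text, and the function names are hypothetical.

```python
import math


def gain_db(amplitude_ratio):
    """Gain in decibels that corresponds to an amplitude ratio."""
    return 20.0 * math.log10(amplitude_ratio)


def equivalent_fbh_diameter(ref_diameter_mm, extra_gain_db):
    """Diameter of the flat bottom hole that reaches the reference echo height
    once extra_gain_db of additional amplification is applied.

    Assumes echo amplitude ~ reflector area ~ d**2 (far-field approximation).
    """
    amplitude_ratio = 10.0 ** (extra_gain_db / 20.0)
    return ref_diameter_mm / math.sqrt(amplitude_ratio)


extra = 68.0 - 56.0                     # dB: testing gain minus calibration gain
d = equivalent_fbh_diameter(1.0, extra)
print(f"{extra:.0f} dB extra gain -> about {d:.2f} mm")
```

Under these assumptions, the additional 12 dB corresponds to an amplitude factor of roughly 4 and therefore to half the reference diameter, which is consistent with the approximately 0.5 mm stated above.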

Artificial Neural Network (ANN) Modeling, Training, Validation, and Test
The FCNN contained one input neuron, one hidden layer with 10 neurons, and one classification layer with one output. Varying the number of hidden layers of the FCNN as well as the neurons in the hidden layers did not lead to any improvement.
The RNN had one sequence input layer with two neurons for the instantaneous frequency and the spectral entropy, one bi-directional long short-term memory layer with 100 hidden units, one fully connected layer with two outputs, one softmax layer, and finally one classification layer. Here again, varying the number of hidden units did not result in any further improvement in accuracy.
The CNN was based on the network architecture AlexNet [24]. Using deeper CNN architectures (VGG-16, VGG-19 [14], and ResNet-50 [25]) did not increase the obtained accuracy. Additionally, the computation time was significantly lower when using the AlexNet-based architecture compared to the other three tested CNN architectures.
The entire data set was divided into 70% training data, 15% validation data, and 15% test data. This division is frequently used in the field of machine learning and has also been demonstrated to be adequate in previous studies [15]. The allocation of the ROI to the three data sets and the initialization of the weights of the ANNs were conducted randomly. For this reason, all computations were performed 10 times, and subsequently the mean value and the standard deviation of the accuracies were calculated. The training of the ANNs took place for a maximum of 30 epochs. For the FCNN, the Levenberg-Marquardt training function [26] was used. For the RNN and the CNN, the Adam optimizer [27] was applied.
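The random 70/15/15 allocation and the repetition over 10 runs can be sketched as shown below. The seeding scheme, the function names, and the `train_and_evaluate` callback are assumptions for illustration; the dummy callback does not train anything and only demonstrates the bookkeeping.

```python
import random
from statistics import mean, stdev


def split_indices(n_items, seed, fractions=(0.70, 0.15, 0.15)):
    """Randomly allocate item indices to training, validation, and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_train = round(fractions[0] * n_items)
    n_val = round(fractions[1] * n_items)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def repeat_runs(train_and_evaluate, n_items=2040, n_runs=10):
    """Repeat the random split n_runs times; return mean and standard deviation."""
    accuracies = []
    for seed in range(n_runs):
        train, val, test = split_indices(n_items, seed)
        accuracies.append(train_and_evaluate(train, val, test, seed))
    return mean(accuracies), stdev(accuracies)


def dummy_evaluate(train, val, test, seed):
    """Stand-in for network training; returns the validation share, not an accuracy."""
    return len(val) / (len(train) + len(val) + len(test))


train, val, test = split_indices(2040, seed=0)
print(len(train), len(val), len(test))
mean_value, std_value = repeat_runs(dummy_evaluate)
```

For 2040 ROI, this split gives 1428 training, 306 validation, and 306 test ROI.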

Data Set
The data set consisted of 120 welds, each of which was further subdivided into 17 ROI. This resulted in a total of 2040 ROI that were available for the training, validation, and testing of the ANNs. In Figure 4, the amplitudes from the ultrasonic testing are plotted against the cavity sizes measured on the cross sections of the 37 prepared metallographic samples. It is evident that there is no distinct correlation between the cavity size and the amplitude. Consequently, it is not possible to determine the exact size of the cavity from the ultrasonic test.
Two criteria were considered for the selection of a suitable threshold value to separate the ROI into the categories "good" and "defective". First, the available data set of 2040 ROI should be divided as evenly as possible into the two classes. Second, as many data points as possible should be located in quadrants I and III of Figure 4, because this indicates a high consistency of the classes "good" and "defective" between the metallography and the ultrasonic tests. An amplitude of 65% SH was selected. This value revealed a high agreement with a cavity size of 0.5 mm: above an amplitude of 65% SH, 18 of 23 cross sections showed a cavity size above 0.5 mm (the corresponding 18 data points are located in quadrant I in Figure 4); below an amplitude of 65% SH, 12 of 14 cross sections revealed a cavity size below 0.5 mm (the corresponding 12 data points are located in quadrant III in Figure 4). Furthermore, the ROI were divided sufficiently evenly into the two classes (1226 good ROI; 814 defective ROI) when defining the threshold at 65% SH.
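The two selection criteria can be expressed as simple ratios computed from the counts given above. The sketch below uses only those counts; the helper `label_roi` and its handling of an amplitude exactly at the threshold are assumptions for illustration.

```python
THRESHOLD_SH = 65.0   # amplitude threshold in % screen height, from the text


def label_roi(amplitude_sh, threshold=THRESHOLD_SH):
    """Label one ROI from its ultrasonic amplitude (% screen height).

    Treating an amplitude equal to the threshold as "good" is an assumption.
    """
    return "defective" if amplitude_sh > threshold else "good"


# Counts reported in the text for the 37 metallographic cross sections.
above_total, above_with_large_cavity = 23, 18     # amplitude above 65% SH
below_total, below_with_small_cavity = 14, 12     # amplitude below 65% SH

agreement = (above_with_large_cavity + below_with_small_cavity) / (above_total + below_total)

# Class balance reported in the text for the 2040 ROI.
good_roi, defective_roi = 1226, 814
share_good = good_roi / (good_roi + defective_roi)

print(f"agreement between ultrasound and metallography: {agreement:.1%}")
print(f"share of good ROI: {share_good:.1%}")
```

The agreement of 30 out of 37 cross sections (about 81%) quantifies the label uncertainty, and the share of good ROI (about 60%) is the accuracy that a classifier always predicting "good" would reach, which is a useful baseline when reading the accuracies reported for the networks.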

Comparison of Different Process Variables
The results of the prediction of the cavities using different process variables and network architectures are summarized in Figure 5. Here the validation data set was used.
The mean values obtained from the 10 computations fluctuated between 54.9% and 80.1% depending on the process variable and the network architecture employed. The evaluation of the forces in the y- and x-directions using the CNN led to the highest accuracies, namely 80.1% and 78.3%. The presence of cavities inside the weld causes a distinct alteration of the forces in the x- and y-directions [28]. The high classification accuracy shows that this relation is recognized by the CNN and is the basis for the prediction. Of the 306 ROI used for validation, an average of 245 were classified correctly and 61 incorrectly when evaluating the y-force. Here, positive means that a cavity is indicated, regardless of whether a cavity is actually present. Of the 61 incorrect predictions, 27 were false positives (i.e., the ROI was good, but the CNN mistakenly classified it as defective), and 34 were false negatives (i.e., the ROI was defective, but the CNN mistakenly classified it as good), revealing a slight trend towards false negative predictions. Since the evaluation of the y-force by the CNN led to the best results on the validation data set, this configuration was also applied to the test data set. With that, a mean accuracy of 79.2% was reached, which demonstrates that a segment-wise prediction of cavities is possible via CNNs. The combination of different process variables did not lead to an improvement in accuracy.
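The validation figures reported for the y-force are mutually consistent, as the following short check illustrates. It uses only the averaged counts given in the text; the function name `accuracy` is chosen for this sketch.

```python
def accuracy(n_total, false_positives, false_negatives):
    """Share of correct predictions given the two error counts."""
    return (n_total - false_positives - false_negatives) / n_total


n_validation = 306          # 15% of the 2040 ROI
fp, fn = 27, 34             # averaged error counts reported for the y-force CNN

correct = n_validation - fp - fn
acc = accuracy(n_validation, fp, fn)
fn_share_of_errors = fn / (fp + fn)

print(f"correct: {correct}, accuracy: {acc:.1%}")
print(f"false negatives among errors: {fn_share_of_errors:.1%}")
```

The true positive and true negative counts are not reported separately, so measures such as precision or recall cannot be derived from the text alone.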
Furthermore, it is remarkable that FCNNs, which have a simple network architecture compared to CNNs, achieved similarly high accuracies for some process variables (see Figure 5). When evaluating the welding temperatures, the results using the FCNN were even better than for the CNN. As the formation of cavities strongly depends on the welding temperature, some crucial information for the prediction of cavities can already be obtained by evaluating the mean temperature in each ROI by applying the FCNN. The RNN did not achieve the highest accuracy for any of the process variables.
To compare the performance of the CNN to the performance of the FCNN presented by Boldsaikhan et al. [10], an additional test was conducted: the 17 ROI of each of the 120 welds were uniformly labeled "good" or "defective", depending on whether their mean amplitude from the ultrasonic test was higher or lower than the chosen threshold of 65% SH. In this way, a mean validation accuracy of 98.8% and a mean test accuracy of 98.5% were achieved when evaluating the F_y signal while applying the CNN. This demonstrates the difference between a segment-wise labeling and a uniform labeling of the data of each weld.
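The difference between the two labeling schemes can be made concrete with a small sketch. The amplitude values below are hypothetical and chosen only to show how a short, local cavity is treated by each scheme; the function names are likewise illustrative.

```python
from statistics import fmean


def label_weld_uniformly(roi_amplitudes_sh, threshold=65.0):
    """Give every ROI of one weld the label implied by the weld's mean amplitude."""
    label = "defective" if fmean(roi_amplitudes_sh) > threshold else "good"
    return [label] * len(roi_amplitudes_sh)


def label_weld_segmentwise(roi_amplitudes_sh, threshold=65.0):
    """Label each ROI from its own amplitude."""
    return ["defective" if a > threshold else "good" for a in roi_amplitudes_sh]


# Hypothetical amplitudes (% SH) for the 17 ROI of one weld with a short local cavity.
amplitudes = [30.0] * 14 + [90.0] * 3

uniform = label_weld_uniformly(amplitudes)
segmentwise = label_weld_segmentwise(amplitudes)
print(uniform.count("defective"), segmentwise.count("defective"))
```

In this constructed case, the uniform scheme labels the whole weld "good" because the mean amplitude stays below the threshold, whereas the segment-wise scheme marks the three affected ROI. Uniform labels are therefore easier to predict but carry no information about where a cavity is located.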

Dependence of the Validation Accuracy on the Sampling Rate and the Amount of Training Data
In a previous study, the dependence of the prediction accuracy on the sampling rate was investigated [15]. It was determined that the accuracy only increases up to a sampling rate of approximately 100 Hz. Beyond that, no significant improvement could be detected up to a frequency of 9000 Hz. This behavior was confirmed for the prediction of the cavities (see Figure 6): By investigating in more detail the evaluation of the y-force by using the CNN, it was found that the prediction accuracy tends to increase up to a sampling rate of 500 Hz. However, beyond that no further improvement was observed. This affirms that a high-frequency acquisition of process data during FSW in the kilohertz range offers no additional benefit for evaluations through ANNs.
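A reduced sampling rate can be emulated from the 5.0 kHz signals by decimation. How the signals were actually resampled for this investigation is not described in the text; the block-averaging approach below is an assumption chosen for simplicity, and the signal is synthetic.

```python
def decimate_by_averaging(signal, factor):
    """Reduce the sampling rate by an integer factor by averaging consecutive blocks."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(signal) - len(signal) % factor    # drop an incomplete last block
    return [sum(signal[i:i + factor]) / factor for i in range(0, usable, factor)]


original_rate_hz = 5000
target_rate_hz = 500
factor = original_rate_hz // target_rate_hz

signal = [float(i % 20) for i in range(original_rate_hz)]   # synthetic one-second signal
reduced = decimate_by_averaging(signal, factor)

# Highest frequency that can still be represented at the reduced rate (Nyquist limit).
nyquist_hz = target_rate_hz / 2
print(len(reduced), nyquist_hz)
```

At 500 Hz, only signal content up to 250 Hz is retained, so the observation that accuracy saturates at this rate suggests that the information relevant for the prediction lies in this lower frequency range.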
It was also observed in previous work that the accuracy of the prediction only increased significantly until 20% of the available data was used for training [15]. Beyond that, no significant increase in accuracy could be noted. This result was also confirmed in the present study (see Figure 7). Until 20% of the available data set was employed for training, that is, the data from 408 ROI, the accuracy increased considerably. Beyond that, no significant improvement was observed until 1428 ROI were utilized. This again indicates that the quality of the training data is as important for the performance of the ANN as is the quantity.
[Figure 6 (axis residue): sampling rate axis from 40 Hz to 9000 Hz; trendline manually inserted.]

Discussion
Table 2 lists the accuracies achieved in three different studies on process monitoring in FSW using CNNs.
Direct monitoring methods are usually more accurate than indirect methods (see Section 1). Therefore, it is plausible that the highest accuracy was achieved by direct monitoring [12]. The accuracy for the indirect recognition of the internal quality is lower than for the surface quality, presumably because the determination of the labels for the data is more complex. The identification of the cavities by ultrasonic testing is associated with uncertainty (see Section 4.1), whereas the surface characteristics can be determined reliably. Two reasons can be mentioned for the limited correlation between the ultrasonic testing results and the metallographic specimens prepared: first, although ultrasonic testing makes it possible to determine the location of a defect very reliably, the exact identification of the defect size is not readily possible and depends on the orientation of the cavity inside the weld. A better estimate of the dimension of the cavities would be possible by sonicating them from different angles. Phased array ultrasonic testing probes can provide this function of different angles of sonication [29]. Second, the exact extraction of the metallographic specimens and thus the precise assignment of the metallographic cross sections to the corresponding location from the ultrasonic image posed a problem, resulting in additional uncertainty. These two circumstances explain the lower accuracy achieved when monitoring the internal quality compared to the surface quality. The conducted study revealed that the way of labeling the data has a significant impact on the achievable accuracy. When all 17 segments of the individual welds were uniformly labeled as "good" or "defective" (which is a simplification), the accuracy of the non-destructive, data-based detection of cavities was increased from 95% to 98.5% compared to the state of the art [10]. This high accuracy makes the application of CNNs interesting for industrial purposes. 
In addition, the state of the art was extended by dividing the welds into 10-mm-long segments that were labeled individually, which became possible through the ultrasonic tests. In this case, an accuracy of 79.2% was reached on the test data set, which shows that CNNs also allow a segment-wise recognition and thus a more precise localization of the cavities.
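The difference between the two labeling schemes can be illustrated with a minimal Python sketch. It is not the evaluation code of this study. The function names, the 0/1 label encoding, the example weld, and the rule that a weld counts as "defective" as soon as one segment contains a cavity are all assumptions made for illustration only.

```python
from typing import List, Sequence

GOOD, DEFECTIVE = 0, 1  # hypothetical label encoding


def uniform_labels(segment_labels: Sequence[int]) -> List[int]:
    """Give every segment of a weld the same label.

    Assumption (not taken from the paper): a weld counts as defective
    as soon as at least one of its segments contains a cavity.
    """
    has_cavity = any(label == DEFECTIVE for label in segment_labels)
    weld_label = DEFECTIVE if has_cavity else GOOD
    return [weld_label] * len(segment_labels)


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of segments whose predicted label matches the reference label."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical weld with 17 segments and a cavity in segments 6 to 9.
segment_wise = [GOOD] * 5 + [DEFECTIVE] * 4 + [GOOD] * 8
uniform = uniform_labels(segment_wise)

# Hypothetical model output that flags the whole weld as defective.
prediction = [DEFECTIVE] * 17

acc_uniform = accuracy(prediction, uniform)            # scored against uniform labels
acc_segment_wise = accuracy(prediction, segment_wise)  # scored against segment-wise labels
```

In this toy case, the same prediction scores 1.0 against the uniform labels but only 4/17 (about 0.24) against the segment-wise labels. This suggests why accuracies obtained under the two labeling schemes should not be compared directly.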
The effective application of CNNs for predicting cavities in this work constitutes an important step towards a more reliable and accurate process monitoring in FSW. Both hypotheses established in the present study were confirmed:
I. By using the CNN, a higher prediction accuracy was achieved than by using the FCNN or the RNN.
II. It could be shown that a non-destructive, data-based, and segment-wise prediction of cavities is possible.
Based on the present work, the following future research is proposed:
• To further increase the prediction accuracy, it is recommended to improve the quality of the training data in future research work. An identification of the cavities in the welds used for training the CNNs by means of phased array ultrasonics or computed tomography scans could significantly increase the accuracy, but will also considerably raise the cost of the weld inspection.
• Further prospective research should also address the question of whether other welding imperfections (e.g., internal imperfections such as the hook and root flaws such as the bonded joint remnant [30]) can be recognized by evaluating the process variables using CNNs.
• Another future step should be the combination of the presented approach for process monitoring by means of ANNs with an intelligent process optimization. Promising modern algorithms for the optimization of the process parameters in FSW are Bayesian optimization and reinforcement learning [31].
• It is assumed that the presented approach is also applicable to other welding techniques. One example could be the monitoring of optical coherence tomography data in laser beam welding [32]. This must be verified.

Conclusions
In the present work, 120 friction stir welds were produced with different process parameters and inspected by ultrasonic testing to identify cavities inside the specimens. During the welding experiments, nine different process variables were recorded. Afterwards, three different types of ANN were tested to detect the cavities by evaluating the process variables in a non-destructive and data-based manner. Based on two previous studies [12,15] and the present work, the following conclusions can be drawn:

• CNNs are well suited for process monitoring in FSW. This applies to both surface defects and internal defects.
• When evaluating the accuracy achieved when using ANNs, it must be considered whether the welds were labeled uniformly or segment-wise.
• The prediction accuracy when applying CNNs for process monitoring in FSW initially increases significantly with an increasing sampling rate and with a growing amount of training data. However, as the sampling rate and the amount of training data continue to rise, the rate of improvement of the prediction accuracy drops.
It can be summarized that CNNs are well suited for process monitoring in FSW. This finding represents a decisive step towards a more reliable monitoring of FSW processes by using ANNs. It is assumed that CNNs are also appropriate for process monitoring in other welding technologies.