Predicting the Ultimate Tensile Strength of Friction Stir Welds Using Gaussian Process Regression

In the work described here, Gaussian process regression was applied to predict the ultimate tensile strength of friction stir welds through data evaluation and to therefore avoid destructive testing. For data generation, a total of 54 welding experiments were conducted in the butt joint configuration using the aluminum alloy EN AW-6082-T6. Four tensile samples were taken from each of the 54 experiments and the resulting ultimate tensile strength of the weld seam segment was modeled as a function of the weld’s surface topography. Further models were created for comparison, which received either the process variables or the process parameters to predict the ultimate tensile strength. It was shown that the ultimate tensile strength can be accurately predicted based on the weld’s surface topography. Especially for low welding speeds, the correlation coefficients between the true and the predicted ultimate tensile strength were high. However, overall, even higher correlation coefficients could be achieved when providing the process variables or the process parameters to the model. In conclusion, it was shown that the developed Gaussian process regression model is a powerful approach for replacing destructive testing and for predicting ultimate tensile strength based solely on data that can be collected non-destructively.


Introduction
Friction stir welding (FSW) produces welds by using a rotating, non-consumable welding tool to locally soften a workpiece using heat that is generated through friction and plastic work [1]. Ever since Thomas [2] first demonstrated friction stir welding in 1991, its application has been steadily growing throughout the industry [3]. With the use of FSW soaring, there is an increasing need for non-destructive testing (NDT) processes that are superior to those currently available on the market in order to provide adequate quality inspection, particularly for safety-critical applications [3]. The destructive evaluation of friction stir welds is not recommended in most cases because it is expensive, both in terms of lost workpieces and in terms of the time required [4]. The requirements that the industry places on NDT methods for assessing the weld quality are short execution times and cost-efficiency [3]. The ability to reliably detect small defects and determine the associated mechanical properties for products that are joined using FSW is important for maintaining quality assurance and for complying with defined standards [3]. NDT methods can be classified as either direct or indirect methods [5]. The direct methods make use of techniques such as camera vision or radiography. They are based on dimensional measurements and have a high degree of accuracy. However, due to numerous practical limitations, such as the positioning of the sensors in an environment with exposure to vibrations and improper illumination, the direct methods are often limited to laboratory use [5]. In order to overcome problems

The results gained by using the surface topography data as input information had to be compared to the results obtained when process variables (e.g., process forces and temperatures) or process parameters (welding speed and tool rotational speed) are used as input data.
It also had to be examined whether the use of more complex covariance functions, such as compositional structures introduced by Duvenaud et al. [18] (hereafter called additive GP or Add) or the spectral mixture (SM) covariance function introduced by Wilson et al. [19], leads to an improvement over simpler covariance functions, such as the RBF, the rational quadratic (RQ), or the Matérn 5/2 (Mat 5/2) covariance function. The fundamentals of the GPR are described in detail in Rasmussen et al. [9] and are briefly presented in Appendix B of the present work.

Welding Experiments
The experiments were performed on a four-axis milling center adapted for FSW. Each weld had a length of 205 mm and combined two sheets of the aluminum alloy EN AW-6082-T6 in the butt joint configuration. The chemical composition of the used material was specified by the selected supplier Bikar Metalle GmbH (Bad Berleburg, Germany), as listed in Table 1. The designation "T6" implies that the material was solution heat-treated and then artificially aged [20]. Each individual sheet had a dimension of 325 mm × 88.5 mm. The sheet thickness t was 4 mm, as shown in Figure 1.

Table 1. Chemical composition of the used material EN AW-6082-T6 in %, which was reported by the selected material supplier, Bikar Metalle GmbH (Bad Berleburg, Germany).

Si
Fe

The process forces in three spatial directions, F_x, F_y, and F_z, and the spindle torque M_z were recorded with a sampling rate of 9.6 kHz by a dynamometer, which is described in more detail in Krutzlinger et al. [21]. The temperatures at the tool shoulder T_S and the tool probe T_P were measured by thermocouples with a sampling rate of 220 Hz. The temperature measuring system was based on the one described by Costanzi et al. [22]. The accelerations a_x, a_y, and a_z in three spatial directions were acquired with a sampling rate of 20 kHz by an acceleration sensor of type 8762A50 from Kistler Instrumente GmbH (Winterthur, Switzerland). The accelerometer was positioned 20 mm away from the immersion point of the welding tool during the experiments. A two-piece tool with a concave shoulder with a radius r_S of 7 mm and a conical probe with a radius r_P of 3 mm was utilized. The probe had an M6 thread and three flats.
The tool geometry and the most important dimensions are presented in Figure 2 and Table 2.

The experiments were performed in position-controlled operation with an immersion depth of 0.1 mm and a tool tilt angle of 2°. The dwell time at the immersion point was one second. The welding speed v_s and the tool rotational speed n (in r/min) were varied. The examined welding speeds were 500 mm/min, 1000 mm/min, and 1500 mm/min. As high welding speeds are becoming increasingly important for industrial applications, especially in the context of electromobility [24], welding speeds of up to 1500 mm/min were applied. The ratio between the tool rotational speed and the welding speed n/v_s was varied over a wide interval from 1 mm⁻¹ to 7 mm⁻¹. Furthermore, the tool rotational speed did not exceed 5000 min⁻¹. Exceeding these boundaries could have damaged the welding tool or the measuring equipment. To generate a sufficient amount of data, the tool rotational speed was adjusted in steps of 200 min⁻¹ within the mentioned boundaries, which resulted in an experimental design totaling 54 experiments. Table A1 in Appendix A shows the process parameters applied in each welding process.

Figure 1 displays the areas of removal of the four slices (a)-(d) for the tensile specimens. Since four tensile specimens were taken from each of the 54 manufactured welds, a total of 216 tensile specimens were available. In order not to change the weld seam surface, the tensile specimens were prepared to the correct geometry for the tensile tests only after the surface topography of the 216 slices had been scanned. The dimensions of the tensile specimens are illustrated in Figure 3a.

The topography of the welds was examined using a three-dimensional profilometer VR-3100 (Keyence Deutschland GmbH, Neu-Isenburg, Germany). White LEDs projected light from two positions onto the weld, and the reflected light was measured by a CMOS sensor.
The smallest measurable difference in the z-direction, as shown in Figure 1, was 1 µm. The sheet surface was always defined as the zero height. The distance between the individual topography points in the x-y-plane was approximately 24 µm. Consequently, a total of about 470,000 topography points were generated for the area (A), as shown in Figure 3a, containing the weld seam on the 15-mm-wide tensile specimens.

The key indicators to quantify the flash formation and the weld seam width were calculated by using area (A), the key indicators for the seam underfill were specified by using area (B), and the arc texture formation was characterized along the weld centerline (C), as shown in Figure 3b,c. Figure 3b schematically shows the flash height f, the seam underfill u, and the weld seam width w for a section of the weld surface. The weld seam width w was defined as the distance between the two peaks of the flash formation on the advancing side (AS) and on the retreating side (RS). Due to the distance of the topography points of approximately 24 µm, there were 625 sections of the weld's topography for each of the 15-mm-wide tensile specimens. From the corresponding 625 values for the flash height f, the seam underfill u, and the weld seam width w, the mean values for the flash height f_m and the seam underfill u_m, as well as the standard deviations of the flash height S_f, of the seam underfill S_u, and of the weld seam width S_w, were calculated. Figure 3c schematically shows the topography along the weld centerline (C). Due to the seam underfill, the topography along the centerline is usually below the sheet surface.
The number of local valleys and local peaks along the centerline was counted (n_count) and compared with the theoretical number (n_theoret), which leads to the ratio r_arc of the two. The theoretical number n_theoret was calculated from the tool rotational speed n, the welding speed v_s, and the width of the tensile specimen, which was 15 mm. In addition, for each tensile specimen, the standard deviation S_d of the differences d_arc between the local valleys and the subsequent local peaks was calculated along the 15-mm-long centerline (C) of the tensile specimens. The peak material volume V_mp [25] was determined for area (B) of each tensile specimen. In a previous study, it was found that the surface galling of the weld can be quantified by employing the peak material volume V_mp [15]. The eight topography indicators utilized to quantify the 15-mm-long weld surface segment on the tensile specimens are summarized in Table 3. These values were later used as input variables for the Gaussian process regression model to predict the ultimate tensile strength based on the weld topography.

The signals of the nine different recorded process variables (F_x, F_y, F_z, M_z, T_P, T_S, a_x, a_y, a_z) were filtered, cut, and assigned to the corresponding weld segments of the tensile specimens. Afterwards, the following ten statistical values were calculated for each process variable corresponding to the 216 weld segments: arithmetic mean, maximum, minimum, median, root mean square (RMS), variance, kurtosis, skewness, highest amplitude in the frequency spectrum after performing a fast Fourier transform, and the span between the maximum and the minimum signal value of each segment. Thus, a total of 90 different features (nine process variables times ten statistical values) were available for each of the 216 tensile specimens.
Some of these values were later provided as inputs for the Gaussian process regression model to predict the ultimate tensile strength based on the process variables.
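The ten statistical values per signal segment can be illustrated with a short sketch. It is not the authors' implementation: the function name `segment_features`, the use of the population variance, the non-excess definition of kurtosis, and the exclusion of the constant (DC) component from the spectrum are all assumptions made for illustration. A naive discrete Fourier transform stands in for the fast Fourier transform so that the example needs only the Python standard library.

```python
import cmath
import math
import statistics


def segment_features(signal):
    """Compute the ten statistical values listed in the text for one signal segment."""
    n = len(signal)
    mean = statistics.fmean(signal)
    var = statistics.pvariance(signal, mu=mean)  # population variance (assumption)
    std = math.sqrt(var)
    centered = [x - mean for x in signal]
    # Standardized third and fourth central moments; kurtosis is not excess kurtosis.
    skew = sum(c ** 3 for c in centered) / n / std ** 3 if std > 0 else 0.0
    kurt = sum(c ** 4 for c in centered) / n / std ** 4 if std > 0 else 0.0
    # Naive O(n^2) DFT of the mean-removed segment, bins strictly below Nyquist.
    amplitudes = [
        2.0 * abs(sum(c * cmath.exp(-2j * math.pi * k * i / n)
                      for i, c in enumerate(centered))) / n
        for k in range(1, (n + 1) // 2)
    ]
    return {
        "mean": mean,
        "max": max(signal),
        "min": min(signal),
        "median": statistics.median(signal),
        "rms": math.sqrt(sum(x * x for x in signal) / n),
        "variance": var,
        "kurtosis": kurt,
        "skewness": skew,
        "fft_peak": max(amplitudes, default=0.0),
        "span": max(signal) - min(signal),
    }
```

Applying such a function to each of the nine process variables of a weld segment yields the 90 features per tensile specimen mentioned above. For signals sampled at several kilohertz, a fast Fourier transform routine would be preferable to the quadratic-time loop used here.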
A period of 7.5 weeks was scheduled between the welding process and the tensile tests. Based on the findings of Brenner et al. [26], it was assumed that the metallurgical transformations were completed after this period. For the tensile test, a Z050 AllroundLine material testing machine (ZwickRoell GmbH & Co. KG, Ulm, Germany) was utilized. In addition to the 216 tensile specimens for the welds, ten tensile tests were conducted on specimens with the base material for reference. The geometry of the tensile specimens corresponded to the specifications of DIN 50125, Form E [27]. All tensile tests were performed according to the standards DIN EN ISO 4136 [28] and DIN EN ISO 6892-1 [29]. According to the recommendation of the standard DIN EN ISO 6892-1 [29], the test speed was set to 0.0067 1/s to determine the ultimate tensile strength.
Metallographic specimens were prepared to inspect the welds for internal defects. After taking the samples for metallography from the welded parts, as shown in Figure 1, they were embedded in an epoxy resin, ground to a fineness of P1200, polished with a 3 µm diamond suspension, and then finely polished with colloidal silica. Finally, the samples were etched with Kroll's etchant, which is described in Vander Voort [30].
The values for the eight surface topography indicators, as shown in Table 3, as well as the ultimate tensile strengths of the 216 tensile specimens, are given in the Supplementary Materials to the article.

Application of the Gaussian Process Regression
For the present work, the Gaussian process regression model, as described in Appendix B, was applied. The experimental data were stored in a D × q design matrix X, where q represented the number of observations and D corresponded to the total number of features, for example, the mean flash height f_m, the mean z-force F_z,m, or the tool rotational speed n. The mean function m(x) was set to zero, which is a common choice when modeling with Gaussian processes [9]. As a consequence, the data were normalized to zero mean and unit variance before performing the regression. The data were divided randomly into 80% training data and 20% test data. Following Rasmussen et al. [9], a five-fold cross-validation was applied.
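The normalization and the random 80/20 division within a five-fold cross-validation can be sketched as follows. The helper names `standardize` and `five_fold_indices` are hypothetical, and computing the normalization statistics from the training portion only is an assumption; the text states only that the data were normalized.

```python
import random
import statistics


def standardize(train_col, test_col):
    """Scale one feature to zero mean and unit variance using training statistics."""
    mu = statistics.fmean(train_col)
    sigma = statistics.pstdev(train_col, mu=mu) or 1.0  # guard against constant columns
    scaled_train = [(v - mu) / sigma for v in train_col]
    scaled_test = [(v - mu) / sigma for v in test_col]
    return scaled_train, scaled_test


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold holds out about 1/n_folds of the data."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # random assignment of samples to folds
    folds = [idx[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx
```

With 216 tensile specimens, each of the five folds holds out 43 or 44 specimens, which corresponds to the 20% test share.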
Five different covariance functions were tested: RBF, RQ, Matérn 5/2, additive GPs, and SM. The covariance functions are described in Appendix B. For the RBF, the RQ, and the Matérn 5/2 covariance function, the automatic relevance determination (ARD) variant was used (see Equation (A19) in Appendix B). For the application of the additive GP according to Duvenaud et al. [18], it was necessary to define the base covariance function and the maximum order up to which calculations should be performed. The RBF covariance function was employed, and the maximum order was set to the maximum number of dimensions D. For the use of the SM covariance function as described by Wilson et al. [19], an integer parameter Q for the maximum number of mixture components had to be chosen. Models were calculated for every Q from 1 to 20, and the model with the lowest mean absolute error (MAE) was chosen. In order to learn sensible model parameters, gradient-based optimization of the logarithmic marginal likelihood was performed. To avoid getting stuck in local optima, the process of model fitting was repeated ten times. In each iteration, the initial values of the hyperparameters were randomly set between e^0 and e^10, and afterwards, the model with the lowest MAE for predicting the ultimate tensile strength of the specimens contained in the test data set was applied.
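For orientation, the three simpler covariance functions can be written down in their common textbook ARD forms. Appendix B is not reproduced in this section, so whether these forms match Equation (A19) term by term is an assumption, as are all function and parameter names. Reading "between e^0 and e^10" as a draw that is uniform in log space is likewise an assumption.

```python
import math
import random


def scaled_sq_dist(x, y, lengthscales):
    """Squared distance with one length scale per input dimension (ARD)."""
    return sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))


def rbf_ard(x, y, lengthscales, signal_var=1.0):
    """Radial basis function (squared exponential) covariance."""
    return signal_var * math.exp(-0.5 * scaled_sq_dist(x, y, lengthscales))


def rq_ard(x, y, lengthscales, alpha=1.0, signal_var=1.0):
    """Rational quadratic covariance; approaches the RBF as alpha grows."""
    r2 = scaled_sq_dist(x, y, lengthscales)
    return signal_var * (1.0 + r2 / (2.0 * alpha)) ** (-alpha)


def matern52_ard(x, y, lengthscales, signal_var=1.0):
    """Matern covariance with smoothness parameter 5/2."""
    s = math.sqrt(5.0 * scaled_sq_dist(x, y, lengthscales))
    return signal_var * (1.0 + s + s * s / 3.0) * math.exp(-s)


def random_initial_lengthscales(n_dims, rng):
    """Draw initial values in [e^0, e^10], uniform in log space (assumption)."""
    return [math.exp(rng.uniform(0.0, 10.0)) for _ in range(n_dims)]
```

In the ARD variant, a very large length scale in one dimension makes the covariance nearly independent of that input, which is how irrelevant features are effectively switched off during hyperparameter optimization.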
The performance measure ultimately utilized to evaluate the agreement between the predicted and the true ultimate tensile strength was the Pearson correlation coefficient (PCC) [31]. Using the PCC proved to be the best way to compare the results obtained with the different input data for the model and the different covariance functions that were employed. As a benchmark for the GPR, a multi-linear regression (MLR) [31] was conducted using the same input data in each case. As for the GPR, the data set for the MLR was divided into 80% training data and 20% test data by using a five-fold cross-validation.
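The two error measures named in this section, the PCC and the MAE, can be computed as in the following minimal sketch. The function names are chosen for illustration only.

```python
import math


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between true and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0.0 or var_p == 0.0:
        return 0.0  # correlation is undefined for constant inputs; 0 by convention here
    return cov / math.sqrt(var_t * var_p)


def mean_absolute_error(y_true, y_pred):
    """Mean absolute deviation between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Note that the PCC is unaffected by a constant offset or a positive scaling of the predictions, so a systematically biased model can still reach a PCC close to one. The MAE captures such deviations, which is one reason the two measures complement each other.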
The computations were performed on an Intel® Core™ i7-6700HQ CPU at 2.60 GHz with 16 GB of installed RAM.

Results
Training the algorithm and predicting the ultimate tensile strength were performed separately for the three different welding speeds of 500, 1000, and 1500 mm/min, as well as for the combined data using all three welding speeds. In the following paragraphs, not all results achieved with the five employed covariance functions are given. Instead, only the best result and the corresponding covariance function are presented. The mean measured ultimate tensile strength of the ten reference tensile specimens was 332.97 MPa with a standard deviation of 1.15 MPa.

Prediction of the Ultimate Tensile Strength Using Surface Topography Data
All eight surface topography indicators (f_m, S_f, u_m, S_u, S_w, r_arc, S_d, V_mp), described in Section 2.1, were used as input variables for the GPR model. The input matrix X thus had eight dimensions d in total. In the first step, the data from all 216 conducted tensile tests were utilized. Some outliers with significantly lower ultimate tensile strengths or very strong flash formation, which occurred especially at very high or very low tool rotational speeds n, were also included in the data. The results are displayed in Table 4. Overall, in the five-fold cross-validation, a mean PCC between the true and the predicted ultimate tensile strength of 0.76 was achieved when the data from all three welding speeds were taken into account. The standard deviation of the five values was 0.17. This result was achieved when using the SM covariance function. The total computation time was 6953 s, which covers the training and the testing of the five-fold cross-validation. As described in the previous chapter, each part of the cross-validation contained a nested inner loop in which the fit was repeated ten times with different initial hyperparameters. The highest mean PCC of 0.87 was achieved at a welding speed of 500 mm/min. However, no pattern could be identified as to why different covariance functions led to the highest mean PCC at the different welding speeds. The differences between the PCCs achieved with the different covariance functions were mostly small. For example, Figure 4 compares the means and standard deviations of the PCCs between the true and predicted ultimate tensile strength when using the data collected at all three welding speeds. As already presented in Table 4, the best result was achieved when using the SM covariance function. However, in Figure 4, it becomes evident that the PCCs were only slightly lower when using other covariance functions, while the computation time decreased considerably.
Further investigations also showed that the calculation with the SM covariance function required the longest computing time. The results when using the MLR were significantly worse than when using the GPR throughout the entire work, which confirmed the results of Verma et al. [8].

In Figure 5b, it is demonstrated that the three outliers with an ultimate tensile strength of less than 175 MPa were also very well predicted. In the result with the lowest PCC of 0.53, the two outliers with an ultimate tensile strength below 175 MPa were not well predicted.
At extremely low or high tool rotational speeds, internal weld defects, such as tunnel errors, occurred. These considerably reduce the ultimate tensile strength and cannot be detected via the weld seam topography. This makes it difficult to predict the ultimate tensile strength and sometimes leads to low PCCs, as shown in Figure 5a.

In the next step, some outliers were removed from the data set. These were the welds from experiments no. 1, 2, 3, 4, 5, 35, 36, 53, and 54, which were all experiments with extremely low or high tool rotational speeds, as shown in Table A1 in Appendix A. Extreme tool rotational speeds, as used in these experiments, lead to tunnel errors and other irregularities in the welds and are usually not employed in industrial applications. Table 5 shows that the PCCs significantly improved as a consequence. Using the data collected at all three welding speeds, the SM covariance function yielded a mean PCC of 0.96. The results presented in Tables 4 and 5 also demonstrate that the mean values for the PCC, when considering the individual welding speeds, tended to decrease as the welding speed increased. This is probably due to the fact that process stability and regularity decrease at higher welding speeds, making it more difficult to predict the ultimate tensile strength. The differences in the results between the various covariance functions were again small, as in the preceding investigation in which the outliers at extreme tool rotational speeds were provided to the model. It was noticeable, however, that the SM covariance function again led to the best result when using all the data. Figure 6 shows the worst and best results from the five-fold cross-validation when using the data from all three welding speeds. The five results ranged from a PCC of 0.95 to a PCC of 0.98, which was assessed as a very high degree of reproducibility. The PCC standard deviation of the five calculations was 0.01, as shown in Table 5.
One advantage of using GPR is that it can specify a confidence interval for the prediction. This can be observed in Figure 7. For better visualization, only the data from the calculation with a welding speed v_s of 500 mm/min and the Matérn 5/2 covariance function were used to generate Figure 7.
A mean PCC of 0.94 and a standard deviation of 0.05 were achieved in this way (see Table 5). In addition to the mean values of the predictions, the 95% confidence intervals are also given for the test samples. Figure 7a,b show, again, the results with the lowest and highest PCC. It becomes clear that almost all true values are within the 95% confidence interval of the prediction. Especially for safety-critical applications, such as in the aerospace industry or electromobility, it might be interesting to provide a confidence interval for each weld segment within which the ultimate tensile strength is located with a certain probability.
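How such an interval follows from a Gaussian predictive distribution can be sketched briefly. The sketch assumes that the GPR returns a predictive mean and a predictive standard deviation for each test sample, so that the 95% interval is the mean plus or minus about 1.96 standard deviations. The function names and the numbers in the usage line are illustrative and are not taken from the experiments.

```python
from statistics import NormalDist


def prediction_interval(mean, std, level=0.95):
    """Central interval of a Gaussian predictive distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return mean - z * std, mean + z * std


def coverage(y_true, means, stds, level=0.95):
    """Fraction of true values that fall inside their prediction interval."""
    inside = 0
    for y, m, s in zip(y_true, means, stds):
        low, high = prediction_interval(m, s, level)
        inside += low <= y <= high
    return inside / len(y_true)


# Illustrative values in MPa, not measured data.
low, high = prediction_interval(300.0, 10.0)
```

For a well-calibrated model, the coverage on held-out test samples should be close to the nominal level, which is consistent with the observation that almost all true values lie within the 95% interval.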
Overall, it was proven that predicting the ultimate tensile strength in friction stir welding is possible when exclusively using surface topography information. However, there are also limits: the higher the welding speed is, the more erroneous the prediction. The reason for this is probably that the irregularity of the FSW process increases as the welding speed rises. Additionally, individual outliers in the measured data, for example, due to tunnel errors or sporadic irregularities in the recording of the surface topography, have reduced the achieved PCCs. By removing these outliers, which have always occurred at extreme tool rotational speeds that are of minor importance for industrial applications, the results could be improved significantly. The differences among the correlation coefficients between the true and the predicted values for the various covariance functions were low. It was noticeable, though, that when using the data for all three welding speeds, the SM covariance function always achieved the best results. However, the calculation time was also significantly higher than for the other covariance functions. This must be taken into account in a future application of the Gaussian process regression as an online monitoring system.

Predicting Ultimate Tensile Strength Using Process Variables
For reasons of comparability, process variables were used as input data for the GPR model in the next step. As already described in Section 2.1, a total of nine different process variables (e.g., forces or temperatures) were recorded during the welding experiments. From each of these process variables, ten different statistical parameters were calculated. Thus, a total of 90 different features for the process variables were available for each of the 216 tensile specimens. In order to reduce the computing time, only the following six input dimensions were provided as input data for the model: the mean spindle torque M_z, the mean force F_y, the mean force F_z, and the root mean square (RMS) values of the accelerations in the x-, y-, and z-direction. These six input dimensions were determined based on a simple statistical analysis to assess which of the 90 different features have the highest correlation with the ultimate tensile strength.
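One plausible form of such a selection step is to rank all features by the absolute value of their Pearson correlation with the ultimate tensile strength and to keep the top six. The text does not specify the analysis that was actually used, so this ranking rule is an assumption. The function names and the feature names in the usage lines are hypothetical, and the numbers are illustrative only.

```python
import math


def abs_pearson(x, y):
    """Absolute Pearson correlation; returns 0.0 for constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def rank_features(features, target, k=6):
    """Return the names of the k features most correlated with the target."""
    scores = {name: abs_pearson(values, target) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical feature names and illustrative numbers, not measured data.
features = {
    "Mz_mean": [1.0, 2.0, 3.0, 4.0],
    "ax_rms": [4.0, 3.0, 2.0, 1.5],
    "Ts_kurtosis": [1.0, -1.0, 1.0, -1.0],
}
uts = [10.0, 20.0, 30.0, 40.0]
selected = rank_features(features, uts, k=2)
```

A ranking of this kind considers each feature on its own and ignores redundancy between features, so strongly correlated inputs may be selected together.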
The results in Table 6 were generated using all 216 tensile specimens from the 54 welding experiments. Outliers in the measurement data were not removed. The results listed in Table 6 demonstrate that the mean PCC between the true and the predicted ultimate tensile strength was always above 0.90. Taking into account the data of all welding speeds, the mean PCC was 0.99 with a standard deviation of 0.01. Again, the best result for all welding speeds was generated using the SM covariance function. The significantly higher computation time of 6374 s, compared to the other results listed in Table 6, resulted from the larger data set and the use of the more complex SM covariance function. The quality of the prediction when using process variables thus exceeded the result when using the topography data as input variables. Furthermore, the model is less sensitive to outliers and therefore more robust when process-related features are integrated into the model.

Overall, the result for predicting the ultimate tensile strengths when evaluating the process variables exceeded the performance when using the topography data as the input variables. One explanation for this could be that the process variables can better represent the occurrence of internal defects, such as tunnel errors, than the surface topography data. Monitoring the surface topography therefore cannot completely replace monitoring the process variables. Nevertheless, monitoring the surface topography can be a useful addition to an online monitoring system.

Predicting the Ultimate Tensile Strength Using Process Parameters
Finally, the correlation coefficients PCC that are achieved when using the two varied process parameters, welding speed and tool rotational speed, as input information were investigated. When using all three welding speeds, both the welding speed and the tool rotational speed were utilized as input information. If the three welding speeds were considered individually, only the tool rotational speed was used as input information, so that only one input dimension d was provided. Again, all existing data were employed and outliers with a significantly lower ultimate tensile strength were not removed. Verma et al. [8] also used the two process parameters of welding speed and tool rotational speed as input variables (see Section 1), whereby the six test samples yielded a correlation coefficient of 0.97 between the true and the predicted ultimate tensile strength when using the GPR with an RBF covariance function.
The results of the present work are listed in Table 7. The mean PCC from the five-fold cross-validation between the true and the predicted ultimate tensile strength was 0.99, which matches the result obtained when providing the process variables as input information. The standard deviation was, once again, 0.01. When only using the data generated at a welding speed of 500 mm/min, a mean correlation coefficient PCC of 1.00 was achieved with a standard deviation of zero. When providing the process parameters as input data, it was also noticeable that two of the simpler covariance functions, RBF and RQ, led to the best results.
Figure 9 visualizes the results for (a) the lowest and (b) the highest achieved correlation between the true and the predicted ultimate tensile strength from the five-fold cross-validation when using the SM covariance function. Most of the points are very close to the diagonal, which indicates good predictions.
Figure 9. (a) The lowest and (b) the highest correlation out of the five-fold cross-validation to predict the ultimate tensile strength R_m, taking into account the process parameter data from all three welding speeds v_s and using the SM covariance function.
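The evaluation protocol behind the reported mean and standard deviation of the PCC can be sketched as follows. This is only an illustration under stated assumptions: a plain NumPy GP with an RBF covariance function, fixed (not optimized) hyperparameters, a constant mean function, and synthetic two-dimensional inputs. The helper names `rbf`, `gp_predict`, and `cross_validated_pcc` are hypothetical, and the numbers produced are unrelated to Table 7.

```python
import numpy as np

def rbf(A, B, length=1.0, var=1.0):
    """RBF covariance between the rows of A (n x d) and B (m x d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length ** 2)

def gp_predict(X_tr, y_tr, X_te, noise=1e-2):
    """GP posterior mean with a constant mean function (assumption)."""
    mu = y_tr.mean()
    K = rbf(X_tr, X_tr) + noise * np.eye(len(X_tr))
    alpha = np.linalg.solve(K, y_tr - mu)
    return mu + rbf(X_te, X_tr) @ alpha

def cross_validated_pcc(X, y, n_folds=5, seed=0):
    """Mean and standard deviation of the PCC over n_folds held-out folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n_folds)
    pccs = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        pred = gp_predict(X[train], y[train], X[test])
        pccs.append(np.corrcoef(y[test], pred)[0, 1])  # true vs. predicted
    return float(np.mean(pccs)), float(np.std(pccs))

# Synthetic stand-in data: 216 samples, two standardized inputs.
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(216, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=216)

mean_pcc, std_pcc = cross_validated_pcc(X, y)
print(f"mean PCC = {mean_pcc:.3f}, std = {std_pcc:.3f}")
```

In practice, the hyperparameters would be learned on each training fold, for example by maximizing the marginal likelihood, rather than fixed as in this sketch.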
Since the tool rotational speed was the only input dimension when considering the individual welding speeds, the GP can be visualized for these cases. Figure 10 illustrates the mean value function and the 95% confidence interval for the ultimate tensile strength in dependence on the tool rotational speed. Figure 10a shows the graph when using the RBF covariance function and Figure 10b shows the graph for the SM covariance function. A difference between the two diagrams becomes clear when looking closely at the range between 1000 min⁻¹ and 3500 min⁻¹. In Figure 10a, the width of the confidence interval is nearly constant, whereas in Figure 10b, the width of the confidence interval varies in this range. In the ranges of the tool rotational speed that the model has not yet experienced, the width of the confidence interval increases. It seems plausible that predicting the ultimate tensile strength for unknown tool rotational speeds has a higher uncertainty than for tool rotational speeds that have already been tested. The SM covariance function is more flexible than the RBF covariance function and can probably, therefore, learn this uncertainty profile. Additionally, Figure 10 shows that the GPR models are able to reproduce the lower ultimate tensile strength at tool rotational speeds below 1000 min⁻¹, which was attributed to internal defects, such as tunnel errors.
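The widening of the confidence interval outside the tested range follows directly from the GP posterior variance and can be reproduced with a small one-dimensional sketch. The training grid, the target values, the length scale of 500 min⁻¹, and the noise variance are all hypothetical choices for illustration; a zero mean function and standardized targets are assumed, and the interval is computed for the latent function rather than for noisy observations.

```python
import numpy as np

def rbf(a, b, length=500.0, var=1.0):
    """One-dimensional RBF covariance (hypothetical hyperparameters)."""
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_posterior(x_tr, y_tr, x_te, noise=0.05):
    """Posterior mean and standard deviation of the latent function f."""
    K = rbf(x_tr, x_tr) + noise * np.eye(len(x_tr))
    K_s = rbf(x_te, x_tr)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = rbf(x_te, x_te).diagonal() - (v ** 2).sum(axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Hypothetical tool rotational speeds (min^-1) and standardized strength values.
x_train = np.linspace(1000.0, 3500.0, 11)
y_train = np.tanh((x_train - 1500.0) / 500.0)

x_test = np.array([2250.0, 5000.0])   # inside vs. far outside the tested range
mean, std = gp_posterior(x_train, y_train, x_test)
half_width = 1.96 * std               # half-width of the 95% interval
print("95% half-widths:", half_width)
```

Far from the training inputs the posterior variance returns to the prior variance, so the interval at 5000 min⁻¹ is much wider than the one at 2250 min⁻¹.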

Overall, predicting the ultimate tensile strength when utilizing the process parameters led to comparable PCCs as when using the process variables.

Discussion
The conducted investigations demonstrate that it can be beneficial to evaluate the topography data of friction stir welds. This provides an additional possibility for online monitoring during friction stir welding in order to predict the weld's ultimate tensile strength and the acceptability of the welded parts. Especially when no extreme tool rotational speeds are applied, the prediction of the ultimate tensile strength based on the surface topography shows good results. For safety-critical applications, such as in the aerospace industry or in electromobility, it might be worthwhile to monitor the weld's topography in order to supplement other commonly applied monitoring systems, such as a force or torque monitoring system. However, when utilizing the process variables or the process parameters, the results for predicting the ultimate tensile strength were better than for the surface topography data. Additionally, the presence of typical defects of friction stir welds, such as tunnel errors, kissing bonds, or lack of penetration, probably cannot be detected by a topography monitoring system. Therefore, other monitoring systems cannot be completely replaced.
Whether there are pronounced correlations between the surface topography and weld defects, such as tunnel errors or kissing bonds, or between the surface topography and the microstructure of the weld, should be investigated more closely in future research work. Since the formation of the surface topography is presumably related to the welding temperature, which in turn is also connected to the weld's microstructure [32], correlations are conceivable, in particular, between the surface topography and the microstructure. If significant correlations exist here, these could also be exploited to improve the monitoring of the FSW process.
A disadvantage of using the process parameters is assumed to be the lower degree of transferability of the trained model when applied to other welding tasks. However, further investigations are necessary to confirm the better transferability of the trained GPR model to other welding tasks when using the surface topography or process variables as input data. A disadvantage of using process variables compared to topography data is that they must be recorded during the process and cannot be determined subsequently. The weld's topography data can be determined offline at any time after the welding process. However, an advantage of the process variables compared to the topography data is that the required sensors (e.g., for recording forces or the spindle torque) are often already integrated in FSW systems. To record the surface topography of the welds, an additional sensor is necessary [14].
It was also notable that, when considering the data from all welding speeds, the SM covariance function always achieved the best result, as shown in Tables 4–7. It is therefore suspected that the more complex covariance functions (Add and SM) perform better than the simpler covariance functions (RBF, RQ, and Matérn 5/2) when more training data are available and more input dimensions are provided. However, this assumption also needs to be confirmed by further investigations. Since the computing time was by far the highest when using the SM covariance function, its suitability for online monitoring, which requires low computing times, must be evaluated in future research.

Conclusions
In the present work, a novel approach is proposed that estimates the ultimate tensile strength through a cost-efficient NDT method that is based on evaluating weld surface topography. For this purpose, a total of 54 welding experiments, which were performed at three different welding speeds, were evaluated. Four tensile samples were taken from each weld. By using Gaussian process regression and testing five different covariance functions, the ultimate tensile strength of friction stir welds was non-destructively assessed. Surface topography indicators were used as input data and the results were compared to cases in which process variables or process parameters were utilized as input data for the model. The achieved correlation coefficients PCC between the true and the predicted ultimate tensile strength when using the data from all welding speeds are summarized in Table 8. By removing some outliers at very low or very high tool rotational speeds, the mean correlation coefficient PCC when providing the surface topography data to the model was improved to 0.96 with a standard deviation of 0.01.
The most important conclusions were:
• The Gaussian process regression is a powerful approach to non-destructively predict ultimate tensile strength through data evaluation. The uncertainty of the prediction can be quantified, and a confidence interval can be specified within which the ultimate tensile strength is located with a certain probability.
• It is possible to predict the ultimate tensile strength of friction stir welds by evaluating the surface topography through Gaussian process regression. This is especially valid for low welding speeds and when extremely low or high tool rotational speeds are not employed.
• The correlation coefficients for the prediction of the ultimate tensile strength by using the process variables or the process parameters were even higher compared to when using the surface topography data as inputs to the model.
• The differences in the PCCs for the various covariance functions used were low. However, when using the data from all investigated welding speeds, the spectral mixture covariance function according to Wilson et al. [19] always yielded the best results.
Supplementary Materials: The following are available online at http://www.mdpi.com/2504-4494/4/3/75/s1, Table S1: Values for the eight surface topography indicators listed in Table 3 and the ultimate tensile strengths for all 216 tensile specimens; the location of the individual slices a, b, c and d taken from the weld is specified in Figure 1.

Conflicts of Interest: The authors declare that they have no conflict of interest.

Appendix B Fundamentals of the Gaussian Process Regression
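A Gaussian process (GP) is a distribution over functions. Formally, it is a collection of random variables that follow a joint Gaussian distribution and which are completely specified by the GP's mean function and covariance function [9]:
$$f(\mathbf{x}) \sim \mathcal{GP}\big(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\big) \tag{A1}$$
whereby the mean function $m(\mathbf{x})$ and the covariance function $k(\mathbf{x}, \mathbf{x}')$ of a real process $f(\mathbf{x})$ are defined by [9]:
$$m(\mathbf{x}) = \mathrm{E}[f(\mathbf{x})] \tag{A2}$$
and
$$k(\mathbf{x}, \mathbf{x}') = \mathrm{E}\big[(f(\mathbf{x}) - m(\mathbf{x}))\,(f(\mathbf{x}') - m(\mathbf{x}'))\big] \tag{A3}$$
Consequently, any input $\mathbf{x}$ has a mean, which can be evaluated by the mean function. Furthermore, every two inputs $\mathbf{x}$ and $\mathbf{x}'$ have a common covariance that can be evaluated by the covariance function. If $\mathbf{x}'$ is equal to $\mathbf{x}$, the covariance function returns the variance of $f(\mathbf{x})$.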
Assume a data set $\mathcal{D}$ with $q$ observations:
$$\mathcal{D} = \{(\mathbf{x}_i, y_i) \mid i = 1, \ldots, q\} \tag{A4}$$
is given, where $\mathbf{x}$ is a $D$-dimensional input vector and $y$ is a scalar output variable. All available data are aggregated in the $D \times q$ matrix $X$. The GP regression model with noise is given by:
$$\mathbf{y} = f(X) + \boldsymbol{\varepsilon} \tag{A5}$$
where $f(X)$ is a GP over the inputs $X$. The observed values $\mathbf{y}$ differ from the function values $f(X)$ by additive noise $\boldsymbol{\varepsilon}$ that follows an independent, identically distributed Gaussian distribution with zero mean and variance $\sigma_\varepsilon^2$ [9]. The data set $\mathcal{D}$ is divided into a training data set and a test data set. The training data are used for learning the GP model parameters. The already known training data are denoted by $\mathbf{y}$ at the inputs $X$ and the unknown test data are denoted by $\mathbf{f}_*$ at the points $X_*$. When modelling with GPs, two main steps have to be followed [9]. First, a joint distribution over all quantities of interest $(\mathbf{y}, \mathbf{f}_*)^T$, also called prior distribution, is defined as:
$$\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} m(X) \\ m(X_*) \end{bmatrix}, \begin{bmatrix} K(X, X) + \sigma_\varepsilon^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix} \right) \tag{A6}$$
whereby [9]:
$$K(X, X) + \sigma_\varepsilon^2 I = \mathrm{Cov}(\mathbf{y}, \mathbf{y}) \tag{A9}$$
$$K(X_*, X) = \mathrm{Cov}(\mathbf{f}_*, \mathbf{y}) \tag{A10}$$
$$K(X, X_*) = \mathrm{Cov}(\mathbf{y}, \mathbf{f}_*) \tag{A11}$$
$$K(X_*, X_*) = \mathrm{Cov}(\mathbf{f}_*, \mathbf{f}_*)$$
Second, after observing data, the observations can be combined with the prior distribution using Bayes' rule to determine the posterior distribution $p$ of $\mathbf{f}_*$, which is, again, Gaussian [9]:
$$p(\mathbf{f}_* \mid X, \mathbf{y}, X_*) = \mathcal{N}\big(\mathrm{E}[\mathbf{f}_* \mid X, \mathbf{y}, X_*],\ \mathrm{Cov}(\mathbf{f}_* \mid X, \mathbf{y}, X_*)\big)$$
[...] where $D$ is the dimension of the input space, and $\sigma_n^2$ is the variance of the $n$-th order. In particular, the $D$-th order additive covariance function is a product of each dimension's covariance function [18]:
$$k_{\mathrm{add}_D}(\mathbf{x}, \mathbf{x}') = \sigma_D^2 \prod_{d=1}^{D} k_d(x_d, x_d')$$
In the case where each base covariance function is an RBF covariance function, the $D$-th order term corresponds to the multivariate RBF-ARD covariance function (see Equation (A19)). The only design choice for additive covariance functions is the selection of the one-dimensional base covariance functions for each input dimension $d$.
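The statement that the D-th order additive term with RBF base covariance functions coincides with the multivariate RBF-ARD covariance function can be checked numerically. The sketch below assumes the usual definition of the n-th order additive covariance function as the sum, over all subsets of n input dimensions, of the products of the one-dimensional base covariance functions. The function names and the example values are hypothetical.

```python
import numpy as np
from itertools import combinations

def rbf_1d(a, b, length):
    """One-dimensional RBF base covariance function."""
    return np.exp(-0.5 * (a - b) ** 2 / length ** 2)

def additive_kernel(x, x2, lengths, order, variance=1.0):
    """n-th order additive covariance: sum over all size-n subsets of the
    input dimensions of the product of the 1D base covariances (assumed form)."""
    z = [rbf_1d(x[d], x2[d], lengths[d]) for d in range(len(x))]
    total = sum(np.prod([z[d] for d in subset])
                for subset in combinations(range(len(x)), order))
    return variance * total

def rbf_ard(x, x2, lengths, variance=1.0):
    """Multivariate RBF covariance with one length scale per dimension."""
    return variance * np.exp(-0.5 * np.sum(((x - x2) / lengths) ** 2))

# Hypothetical three-dimensional inputs and length scales.
x = np.array([0.3, -1.2, 2.0])
x2 = np.array([1.0, 0.5, 1.5])
lengths = np.array([1.0, 2.0, 0.5])

k_full_order = additive_kernel(x, x2, lengths, order=3)
k_ard = rbf_ard(x, x2, lengths)
print(k_full_order, k_ard)
```

For order 3 and three input dimensions there is only one subset, so the sum reduces to a single product of exponentials, which is the exponential of the summed scaled squared distances.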
Finally, a closed form covariance function for automatic pattern discovery and extrapolation, introduced by Wilson et al. [19] and called spectral mixture (SM) covariance function, was utilized in the present work. This covariance function is derived by modelling the spectral density of covariance functions (its Fourier transform) using scale-location mixtures of Gaussians.
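For one input dimension, the SM covariance function of Wilson et al. [19] can be written as a weighted sum of Gaussian-enveloped cosines in the distance τ = x − x′. The following sketch evaluates that one-dimensional form; the parameter names `weights`, `means`, and `variances` (the mixture weights and the means and variances of the Gaussians in the spectral density) and their values are hypothetical.

```python
import numpy as np

def spectral_mixture_1d(tau, weights, means, variances):
    """1D spectral mixture covariance:
    k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q)."""
    tau = np.asarray(tau, dtype=float)[..., None]   # add an axis for the components
    w, mu, v = (np.asarray(p, dtype=float) for p in (weights, means, variances))
    terms = w * np.exp(-2.0 * np.pi ** 2 * tau ** 2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return terms.sum(axis=-1)

# Hypothetical two-component mixture.
weights = [1.0, 0.5]
means = [0.0, 0.8]        # spectral means (frequencies)
variances = [0.05, 0.01]  # spectral variances

tau = np.linspace(-3.0, 3.0, 7)
k_values = spectral_mixture_1d(tau, weights, means, variances)
k_zero = spectral_mixture_1d(0.0, weights, means, variances)
print(k_values, k_zero)
```

A component with zero spectral mean reduces to an RBF covariance function, which is one way to see why the SM covariance function is the more flexible of the two.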