Optimization of friction stir weld joint quality using a meshfree fully-coupled thermo-mechanics approach

: There is currently a need for an efﬁcient numerical optimization strategy for the quality of friction stir welded (FSW) joints. However, due to the computational complexity of the multi-physics problem, process parameter optimization has been a goal that is out of reach of the current state-of-the-art simulation codes. In this work, we describe an advanced meshfree computational framework that can be used to determine numerically optimized process parameters while minimizing defects in the friction stir weld zone. The simulation code, SPHriction-3D, uses an innovative parallelization strategy on the graphics processing unit (GPU). This approach allows determination of optimal parameters faster than is possible with costly laboratory testing. The meshfree strategy is ﬁrstly outlined. Then, a novel metric is proposed that automatically evaluates the presence and severity of defects in the weld zone. Next, the code is validated against a set of experimental results for 1 / 2 ” AA6061-T6 butt joint FSW joints. Finally, the code is used to determine the optimal advancing speed and rpm while minimizing defect volume based on the proposed defect metric.


Introduction
The friction stir welding (FSW) process is quickly becoming the joining method of choice for aluminum alloys.The solid-state process is able to form high-fidelity welds at excellent throughput rates.Because of the solid-state nature of the method, many types of defects are avoided that are associated with melting and solidification in conventional fusion welding processes.Nevertheless, depending on the process parameters, FSW joints can have volumetric defects that are detrimental to the ultimate strength of the joint.Determining the FSW process parameters that will prevent defects is a difficult task-one that is typically performed using trial and error experimentation.These experiments are costly and time-consuming.Given the inherent complexity of the friction stir welding process, the determination of optimal process parameters is an important field of research.Many researchers [1][2][3][4][5][6][7] have worked on determining optimal process parameters.These groups have focused on optimizing joint strength via experimental work.The general conclusion from these groups is that the rotational and advancing speeds play the most important role in the final weld quality.
Tamjidi et al. [8] used multi-objective biogeography-based optimization to optimize the process parameters for dissimilar butt joints composed of AA6061-T6 and AA7075-T6.Their approach allowed them to find the optimal tool speed, tilt, and offset based on a series of experiments in the lab.
where W i , W j , and W k are the weights over the n, m, and l integration points in the ξ, η, and ζ isoparametric directions [20].Equation (1) shows that as | = J | diverges away from 1.0, the volume integration becomes less precise.In FSW, the deformation is excessive, and in order to prevent breakdown of quadrature, it is necessary to re-mesh the problem domain.Schmicker [21] had good success with FEM adaptive re-meshing algorithms in 2D for the rotary friction welding process.Other researchers [16,[22][23][24][25] have focused on coupled Eulerian-Lagrangian (CEL) and arbitrary Lagrangian-Eulerian (ALE) approaches.These methods have advantages over FEM with re-meshing when it comes to dealing with large deformation.However, they are not true Lagrangian approaches, which makes following the material history difficult.Because of this, quantification of material mixing is not feasible in either CEL or ALE.
Since the major issue with simulating large deformation processes is associated with the mesh, an innovative approach is to use a numerical method that does not require a mesh.Such an approach falls into the class of meshfree methods, whereby the set of partial differential equations describing the physics of FSW are solved at a set of material points.There is a vast array of meshfree methods available, as described by Liu et al. [26,27].Some of the more popular meshfree methods are the local radial point interpolation method (LRPIM) [28,29], meshless local Petrov-Galerkin (MLPG) [30], finite point method (FPM) [31], as well as the smoothed particle hydrodynamics method (SPH) [32,33].have recently developed an advanced meshfree framework on the graphics-processing unit (GPU) to simulate the entire FSW process.Their code-SPHriction-3D-is able to predict temperature, process forces/torques, internal and surface defects, tool wear, and residual stresses.
In this work, the meshfree formulation employed in SPHriction-3D is initially outlined.Following that, the code is validated against experimental results for temperature history and defect volume.Next, an automatic defect quantification metric is proposed, which is built on the idea that the material points physically move in the simulation owing to the Lagrangian framework of SPH.This means that a volumetric defect will be captured naturally during the simulation.From the simulation results, the process operating-window is mapped out by considering a maximum admissible defect size.Furthermore, the defect metric is used to form a response surface (RS) that is a function of tool rotation and advancing speed.A steepest descent algorithm is used to determine the optimal process parameters from the RS that minimize the volume of defects.This work presents the first use of a fully-coupled 3D thermo-mechanical numerical approach that minimizes defect volume.Because of the parallel framework on the GPU, the simulation code is able to predict optimal process parameters in an expedient manner, namely within one day, using a multi-GPU computer system as opposed to many weeks with a CPU implementation.The proposed numerical optimization framework is faster and more cost-effective than experiment-based process optimization.

Methodology
In the following section, an outline of the meshfree formation is provided, followed by a brief look at the implementation on the GPU.Then, a defect metric is introduced that can be used to numerically evaluate weld quality.Finally, the optimization strategy is outlined.

Meshfree Approach
The smoothed particle hydrodynamics method was originally developed for astrophysics applications by Gingold and Monaghan [32] as well as independently by Lucy [33].SPH is a true meshfree method that does not require a background mesh.This is in contrast to global space meshfree methods that require a background mesh to perform the spatial integration.
In SPH, spatial integration is performed directly at the nodes, as shown in Figure 1.To calculate the value of a field variable at an ith material point, a weighted sum is used over the jth neighboring material points within its influence domain.Mathematically, the discretized SPH approximation of a field variable, f , at the spatial location, x i α , of the ith material point is: where h is the smoothing length that determines the size of the influence domain of the jth point on point i.The material point mass and density is m j and ρ j , r = |x i α − x j α |, and W(r, h) is an interpolation kernel, often referred to as the smoothing function.In this work, the smoothing function of Yang et al. [44] is used.The sum is taken over the total number (N i ) of j material points within the influence domain of i; these are termed the neighbors of i.Using the SPH formalism, the gradient of a scalar and divergence of a vector are and where ℎ is the smoothing length that determines the size of the influence domain of the th point on point .The material point mass and density is and , = − , and ( , ℎ) is an interpolation kernel, often referred to as the smoothing function.In this work, the smoothing function of Yang et al. [44] is used.The sum is taken over the total number ( ) of material points within the influence domain of ; these are termed the neighbors of i.Using the SPH formalism, the gradient of a scalar and divergence of a vector are and Figure 1.Interpolation in the smoothed particle hydrodynamics method (SPH).Reproduced from [36].
SPHriction-3D solves a set of continuum mechanics equations described by the conservation of mass, momentum, and energy (heat diffusion in this case).An abbreviated set of the partial differential field equations and their discretized counterparts using the SPH approach are listed in Table 1.A summary of the supporting equations and the associated nomenclature is provided in Tables 2 and 3. Full details of the field equations and their SPH formulation can be found in [34,36,37,[40][41][42][43].
Table 1.Field equations and their SPH formulation.

Equation Description
Field Equation SPH Form

Conservation of mass
. Interpolation in the smoothed particle hydrodynamics method (SPH).Reproduced from [36].
SPHriction-3D solves a set of continuum mechanics equations described by the conservation of mass, momentum, and energy (heat diffusion in this case).An abbreviated set of the partial differential field equations and their discretized counterparts using the SPH approach are listed in Table 1.A summary of the supporting equations and the associated nomenclature is provided in Tables 2 and 3. Full details of the field equations and their SPH formulation can be found in [34,36,37,[40][41][42][43].
Table 1.Field equations and their SPH formulation.

Equation Description
Field Equation SPH Form

Equation of state
Equivalent stress (von Mises)

Plastic work heating
.
, where q α ≡ heat f lux due to convection and radiation The Fraser-Kiss-St-Georges (FKS) material model that was developed by Fraser et al. [36] is used here.This model provides an improved flow stress prediction for AA6061-T6 at high temperature and plastic strain levels compared to other commonly used material models such as Johnson-Cook (JC) [45].The material model is a multiplicative combination of strain hardening, (ε p ); thermal softening, Θ(T * ); and strain rate stiffening, Λ .ε, T * , of the form , and The model accounts for the fact that strain rate stiffening is insignificant at room temperature, but becomes more important at high temperature in 6xxx-series aluminum alloys.This phenomenon is not present in the JC model that is very popular for FSW simulation.Furthermore, the model is more precise in the range of plastic strain typical of FSW.The JC model predicts an ever-increasing flow stress with increasing plastic strain, which is not realistic in typical aluminum alloys.The flow stress as a function of plastic strain and temperature is shown in Figure 2a and as a function of strain rate and temperature in Figure 2b.
temperature and plastic strain levels compared to other commonly used material models such as Johnson-Cook (JC) [45].The material model is a multiplicative combination of strain hardening, ( ); thermal softening, ( * ); and strain rate stiffening, ( , * ), of the form , and * = − − (5) The model accounts for the fact that strain rate stiffening is insignificant at room temperature, but becomes more important at high temperature in 6xxx-series aluminum alloys.This phenomenon is not present in the JC model that is very popular for FSW simulation.Furthermore, the model is more precise in the range of plastic strain typical of FSW.The JC model predicts an ever-increasing flow stress with increasing plastic strain, which is not realistic in typical aluminum alloys.The flow stress as a function of plastic strain and temperature is shown in Figure 2a and as a function of strain rate and temperature in Figure 2b.The set of ordinary differential equations that result from the SPH formulation are integrated in time with a second-order explicit time stepping algorithm that is outlined in detail in [36].The mechanical and thermal timestep criteria to assure stability are: Typically, the critical timestep for the thermal problem is one or two orders of magnitude greater than that of the mechanical problem.However, we have found that improved results are obtained by performing consistent temporal integration of the thermal and mechanical problems.This can be explained by the rapidly changing contact condition as the tool rotates and the associated heat generation due to friction.
Plastic deformation is accounted for by using the radial return algorithm, where the body is first assumed to deform elastically, and then a check is performed to see if the calculated stress surpasses the yield.If the material has yielded, the stress is projected back onto the yield surface.Due to the small timestep required for stability, the plastic strain increment can be found by performing a Taylor series expansion with linearization from the current timestep [46].
Considering the difference in material stiffness between the tool and the workpieces, the support base, clamps, and tool can be modeled with rigid finite elements.On the other hand, the The set of ordinary differential equations that result from the SPH formulation are integrated in time with a second-order explicit time stepping algorithm that is outlined in detail in [36].The mechanical and thermal timestep criteria to assure stability are: Typically, the critical timestep for the thermal problem is one or two orders of magnitude greater than that of the mechanical problem.However, we have found that improved results are obtained by performing consistent temporal integration of the thermal and mechanical problems.This can be explained by the rapidly changing contact condition as the tool rotates and the associated heat generation due to friction.
Plastic deformation is accounted for by using the radial return algorithm, where the body is first assumed to deform elastically, and then a check is performed to see if the calculated stress surpasses the yield.If the material has yielded, the stress is projected back onto the yield surface.Due to the small timestep required for stability, the plastic strain increment can be found by performing a Taylor series expansion with linearization from the current timestep [46].
Considering the difference in material stiffness between the tool and the workpieces, the support base, clamps, and tool can be modeled with rigid finite elements.On the other hand, the large deformation in the workpieces requires the use of the SPH material points.Thermal-mechanical contact between SPH and FEM is accomplished with a penalty based method that "pushes" the ith material points out of the jth FEM along its surface normal vector, nFEM , of the contacted FEM.The developed contact force in the normal direction is then where δ and .
δ are the penetration and the penetration rate.The contact stiffness, k ij , and the damping, ζ contact , is given by The force due to friction is based on a modified Coulomb law where the tangential force that acts on the ith material point will be A tangential normal vector, nT , is found by normalizing the projection of the relative velocity between the ith material point and the contact point on the jth finite element onto the plane that passes through the nodes of the finite element.For the simulations reported here, a constant value of 0.5 is used for the coefficient of friction, µ.The contact area is denoted by A c , and is assumed to be a square patch with sides equal to the average spacing, ∆s, of the ith material point.

Parallelization on the GPU
Simulation of all the phases of the friction stir welding process within a multi-physics framework is computationally expensive.To achieve reasonable simulation times, the highly non-linear thermo-mechanical problem must be solved using an advanced parallelization strategy.SPHriction-3D has been developed to run on NVIDIA GPUs using the CUDA Fortran programming language (see Ruetsch and Fatica [47]).
The parallelization strategy used in the code is to assign each material point to a thread on the GPU (similar to a thread on a CPU).The summations in the SPH form are computed on each thread, which is an efficient use of the GPU architecture.Although the clock speed of the thread processors on the GPU are slower than on the CPU, the sheer number of threads and the rate at which they are able to change tasks (context switching) makes the GPU the ideal platform to parallelize the SPH code.The simulation is initialized (all model data, mesh, and material point information) on the CPU.The information is then transferred to the GPU and resides there for the remainder of the simulation.This is only possible in a code that has been entirely programmed to run on the GPU, thus minimizing costly data transfer back and forth from the CPU and the GPU.Various test cases have been run in SPHriction-3D and compared to an equivalent commercial SPH code running on an Intel i7-3630QM processor (Intel Corporation, Santa Clara, CA, USA).The results from these tests have shown that the GPU code is over 20 times faster.Models that previously required weeks of calculation time can now to run in a few hours.This improvement in calculation time makes the concept of finding optimal process parameters numerically a viable option.

Defect Metric
In order to evaluate the effect of different process parameters on the formation of defects in the simulation, a non-dimensional metric should be established.Because of the Lagrangian nature of the simulation code, internal and external free surface areas can be determined and used to evaluate the presence of defects.The metric will compare the pre-weld surface area, A so , to that in the post-weld, A s , based on The pre-weld area is selected to include the region within the weld zone as shown in Figure 3.The area is calculated from A so = 2(L wz W wz ) + 2(h wz W wz ) + π r pin 2 l pin + r pin (11) which includes an approximation for the surface area associated with the hole created by the pin of length l pin and radius r pin .The material points that belong to the free surface are determined via , and Metals 2018, 8, x FOR PEER REVIEW 8 of 23 presence of defects.The metric will compare the pre-weld surface area, , to that in the post-weld, , based on The pre-weld area is selected to include the region within the weld zone as shown in Figure 3.The area is calculated from which includes an approximation for the surface area associated with the hole created by the pin of length and radius .The material points that belong to the free surface are determined via The post-weld area is determined by counting the number of material points that are surface nodes, , in the measuring zone and multiplying by the surface area of the material point (a square patch with side of ∆ ): The number of surface nodes is found by looping over all the elements and counting the elements that are tagged = 1.In this sense, a perfect weld with no defects will have a value of = 1.0 ; this would correspond to a case where = .As the simulation progresses, the defect metric will naturally fall below 1.0.The actual surface area associated with the defects can be found from It is important to note that contains contributions from internal volumetric defects as well as the increase in surface area on the surface of the weld in the weld track.
Once the set of material points that belongs to the free surface is found, a triangulation algorithm is used to create a smooth surface.An example of the triangulated surface for the internal defects is shown in Figure 4a, and for the surface defects (flash and weld track striations) in Figure 4b.Very hot welds with low weld pitches will tend to produce more flash and weld track striations than a cold weld.For this reason, including the area associated with the weld surface in the metric is an important part of our approach.If this area were to be excluded from the calculation, welds with low weld pitches that have no or very little internal volume defects would score better than cold welds with small-to-medium-sized internal defects (voids and worm-holes).The proposed defect metric is a powerful tool to numerically determine the process operating-window as well as the optimal process parameters.The post-weld area is determined by counting the number of material points that are surface nodes, N sur f ace nodes , in the measuring zone and multiplying by the surface area of the material point (a square patch with side of ∆s): A s = ∆s 2 N sur f ace nodes (13) The number of surface nodes is found by looping over all the elements and counting the elements that are tagged Sur f ace node = 1.In this sense, a perfect weld with no defects will have a value of ψ de f ect = 1.0; this would correspond to a case where A s = A so .As the simulation progresses, the defect metric will naturally fall below 1.0.The actual surface area associated with the defects can be found from It is important to note that A sDe f ect contains contributions from internal volumetric defects as well as the increase in surface area on the surface of the weld in the weld track.
Once the set of material points that belongs to the free surface is found, a triangulation algorithm is used to create a smooth surface.An example of the triangulated surface for the internal defects is shown in Figure 4a, and for the surface defects (flash and weld track striations) in Figure 4b.Very hot welds with low weld pitches will tend to produce more flash and weld track striations than a cold weld.For this reason, including the area associated with the weld surface in the metric is an important part of our approach.If this area were to be excluded from the calculation, welds with low weld pitches that have no or very little internal volume defects would score better than cold welds with small-to-medium-sized internal defects (voids and worm-holes).The proposed defect metric is a powerful tool to numerically determine the process operating-window as well as the optimal process parameters.

Optimization Approach
Finding the set of process parameters that minimizes the probability of defects in the weld zone could be accomplished using a variety of different optimization strategies.The approach chosen should be implementable within the framework of the numerical code in order to provide optimized results without the need for visual inspection of the simulation results.The true beauty of using the Lagrangian meshfree code is that all calculations regarding defect size and location are handled automatically in the code.The code simply needs to run through a batch of process parameters and to store the associated defect metric for the different combinations of tool rotation and advancing speed.
Because the code runs on the GPU, a relatively inexpensive computer with four GPUs can simultaneously run four individual simulations.Once a simulation has finished computing a case, the GPU will be available to run another simulation.As soon as enough simulations have been performed to provide an adequate amount of data sets, the optimization routine is then called.In this work, we use a second-order polynomial surface function with eight coefficients to form a response surface.The number of simulations should be at least greater than the number of coefficients to be found.Evidently, more data sets are beneficial to ensure the highest level of accuracy for the response surface.However, running an extensive set of simulations will lead to longer calculation times.Fifteen different parameter sets (shown in Table 4) have been used to form the response surface.A summary of the optimization procedure with a three-GPU system is shown in Figure 5.In our approach, since a simulation model is run on one individual GPU, the performance will increase linearly when increasing the number of GPUs.That is to say, if you run one simulation on the GPU and it takes two hours, running four simulations on four GPUs will also take two hours, leading to a performance increase of four times.The linear improvement in performance is possible since there is no communication between the individual GPUs.This approach can be used as long as the data from the individual simulation models can fit onto the GPU's memory.Current state of the art consumer GPUs (for example the NVIDIA GTX 1080 Ti, Nvidia Corporation, Santa Clara, CA, USA) have up to 11 GB of onboard memory, which allows us to simulate meshfree thermo-mechanical models with over two million material points.The memory model currently employed in SPHriction-3D leads to ~5.5 KB storage requirements per material point.The high storage requirements help to improve the calculation efficiency of the code by saving valuable neighbor information, smoothing, and derivative values.

Optimization Approach
Finding the set of process parameters that minimizes the probability of defects in the weld zone could be accomplished using a variety of different optimization strategies.The approach chosen should be implementable within the framework of the numerical code in order to provide optimized results without the need for visual inspection of the simulation results.The true beauty of using the Lagrangian meshfree code is that all calculations regarding defect size and location are handled automatically in the code.The code simply needs to run through a batch of process parameters and to store the associated defect metric for the different combinations of tool rotation and advancing speed.
Because the code runs on the GPU, a relatively inexpensive computer with four GPUs can simultaneously run four individual simulations.Once a simulation has finished computing a case, the GPU will be available to run another simulation.As soon as enough simulations have been performed to provide an adequate amount of data sets, the optimization routine is then called.In this work, we use a second-order polynomial surface function with eight coefficients to form a response surface.The number of simulations should be at least greater than the number of coefficients to be found.Evidently, more data sets are beneficial to ensure the highest level of accuracy for the response surface.However, running an extensive set of simulations will lead to longer calculation times.Fifteen different parameter sets (shown in Table 4) have been used to form the response surface.A summary of the optimization procedure with a three-GPU system is shown in Figure 5.In our approach, since a simulation model is run on one individual GPU, the performance will increase linearly when increasing the number of GPUs.That is to say, if you run one simulation on the GPU and it takes two hours, running four simulations on four GPUs will also take two hours, leading to a performance increase of four times.The linear improvement in performance is possible since there is no communication between the individual GPUs.This approach can be used as long as the data from the individual simulation models can fit onto the GPU's memory.Current state of the art consumer GPUs (for example the NVIDIA GTX 1080 Ti, Nvidia Corporation, Santa Clara, CA, USA) have up to 11 GB of onboard memory, which allows us to simulate meshfree thermo-mechanical models with over two million material points.The memory model currently employed in SPHriction-3D leads to ~5.5 KB storage requirements per material point.The high storage requirements help to improve the calculation efficiency of the code by saving valuable neighbor information, smoothing, and derivative values.
Our approach leads to a powerful means of numerically determining optimal process parameters within a 24-h period without the need to perform time-consuming and costly experiments.Our approach leads to a powerful means of numerically determining optimal process parameters within a 24-h period without the need to perform time-consuming and costly experiments.

Validation of the Numerical Model
Validation of the material point temperatures as well as defect size and location is required before the simulation code can be used to determine optimal parameters.In this section, temperature history, temperature contours, and defect size will be compared with experimental results.The experimental setup for the butt joint weld of two ½" AA6061-T6 plates is shown in Figure 6.The surface of the workpieces was painted flat black with a high-temperature paint in order to obtain quality images with an infrared camera.The clamping system applied pressure to hold the plates together as well as hold them down on the support base.The material properties used in the simulations for the workpieces and the tool are listed in Tables 5-7.

Validation of the Numerical Model
Validation of the material point temperatures as well as defect size and location is required before the simulation code can be used to determine optimal parameters.In this section, temperature history, temperature contours, and defect size will be compared with experimental results.The experimental setup for the butt joint weld of two 1 /2" AA6061-T6 plates is shown in Figure 6.The surface of the workpieces was painted flat black with a high-temperature paint in order to obtain quality images with an infrared camera.The clamping system applied pressure to hold the plates together as well as hold them down on the support base.The material properties used in the simulations for the workpieces and the tool are listed in Tables 5-7.The validation case parameters were selected to provide a large variation of weld pitches.We will show that the simulation model was able to predict temperature and defect with errors under 10%, thus confirming the robustness of our approach.In Section 4.1, validation of the calculated optimal process parameters will be provided by comparing the predicted defects from simulation to those measured experimentally.

Speed
All three cases used a plunge speed of 38 mm/min and a dwell time of five seconds.The simulation model is shown in Figure 7.The material point spacing was ∆s = 1.27 mm, which provides 10 points through the thickness of the workpieces.Based on the stability criterion, the time step size was 2.07 × 10 −7 s, with timestep factor, CFL = 0.7.The simulation model for the workpieces was 100 mm wide by 150 mm long by 12.7 mm thick.The length of the workpieces was selected to provide enough distance along the length of the weld to attain a state where the change in system energy remained stable.A full transient calculation was performed wherein the tool starts entirely disengaged from the workpieces, then plunges, dwells, advances, and retreats.A good prediction of defects and other material history is only possible when the full sequence of welding events is modeled.The tool used was a tri-flute design with a shoulder diameter of 21.6 mm, upper pin diameter of 11 mm, lower pin diameter of 8.4 mm, and a pin length of 10.8 mm.Three cases were simulated and compared to experimental results.The cases are: 1. 800 rpm, 305 mm/min-0.38weld pitch 2. 800 rpm, 660 mm/min-0.83weld pitch 3. 800 rpm, 1069 mm/min-1.34weld pitch The validation case parameters were selected to provide a large variation of weld pitches.We will show that the simulation model was able to predict temperature and defect with errors under 10%, thus confirming the robustness of our approach.In Section 4.1, validation of the calculated optimal process parameters will be provided by comparing the predicted defects from simulation to those measured experimentally.
All three cases used a plunge speed of 38 mm/min and a dwell time of five seconds.The simulation model is shown in Figure 7.The material point spacing was ∆ = 1.27 mm, which provides 10 points through the thickness of the workpieces.Based on the stability criterion, the time step size was 2.07 × 10 s, with timestep factor, CFL = 0.7.The simulation model for the workpieces was 100 mm wide by 150 mm long by 12.7 mm thick.The length of the workpieces was selected to provide enough distance along the length of the weld to attain a state where the change in system energy remained stable.A full transient calculation was performed wherein the tool starts entirely disengaged from the workpieces, then plunges, dwells, advances, and retreats.A good prediction of defects and other material history is only possible when the full sequence of welding events is modeled.

Temperature History and Contour Validation
Thermocouples were embedded into the workpieces to measure the temperature history in the advancing phase.The thermocouples were used to validate the calculated temperatures in the simulation.The location of the thermocouples is shown in Figure 8a.The temperature history was compared for the 0.38 weld pitch case in Figure 8b, the 0.83 weld pitch case in Figure 8c, and the 1.34 weld pitch case in Figure 8d.We can see good correlation among the temperature histories.As the

Temperature History and Contour Validation
Thermocouples were embedded into the workpieces to measure the temperature history in the advancing phase.The thermocouples were used to validate the calculated temperatures in the simulation.The location of the thermocouples is shown in Figure 8a.The temperature history was compared for the 0.38 weld pitch case in Figure 8b, the 0.83 weld pitch case in Figure 8c, and the 1.34 weld pitch case in Figure 8d.We can see good correlation among the temperature histories.As the weld-pitch increased, the temperatures measured in the experiments and calculated in the simulations decreased.This is an expected result, since less heat is generated with increasing weld pitch.The peak measured/calculated temperature was 281/269 (4.3% error), 208/210 (1.0% error), and 171/184 (4.0% error) for the 0.38, 0.83, and 1.34 weld pitch cases, respectively.Note that the temperature histories end shortly after the peak temperature was obtained.This is because the simulations terminated at the end of the advancing phase and did not include the cooling phase.In future work, we plan to extend the simulations into the cooling stage to determine residual stresses and distortion; however, for the current work, prediction of the peak advancing temperature was sufficient.temperature histories end shortly after the peak temperature was obtained.This is because the simulations terminated at the end of the advancing phase and did not include the cooling phase.In future work, we plan to extend the simulations into the cooling stage to determine residual stresses and distortion; however, for the current work, prediction of the peak advancing temperature was sufficient.
The temperature contours at the end of the simulation in Figure 9a are compared to experiment in Figure 9b by using an infrared thermal camera.The camera requires a surface with uniform reflectivity to give a good result, which was obtained by painting the top surface of the plates a flat black.The region where the FSW tool will pass could not be painted, since this would adversely affect friction at the tool-work piece interface.For this reason, the temperature contours are not shown in the weld track.Again, good correlation between simulation and experiment was found.The highest temperature was closest to the FSW tool and decreased further away from the tool.The plunge and dwell parameters led to high temperatures at the start of the weld.This is obvious in Figure 9a,b due to the shape of the temperature contours at the start of the weld.The temperature contours at the end of the simulation in Figure 9a are compared to experiment in Figure 9b by using an infrared thermal camera.The camera requires a surface with uniform reflectivity to give a good result, which was obtained by painting the top surface of the plates a flat black.The region where the FSW tool will pass could not be painted, since this would adversely affect friction at the tool-work piece interface.For this reason, the temperature contours are not shown in the weld track.Again, good correlation between simulation and experiment was found.The highest temperature was closest to the FSW tool and decreased further away from the tool.The plunge and dwell parameters led to high temperatures at the start of the weld.This is obvious in Figure 9a,b due to the shape of the temperature contours at the start of the weld.

Defect Validation
The defects present in the welds performed in the experiment were evaluated using X-ray tomographic imaging (X-ray herein) with an advanced void detection algorithm.This method is non-destructive and provides an interesting means of measuring the location and size of volumetric defects in a friction stir welded joint.Certainly, the X-ray technique is perfectly suited for research work where the weld samples are small enough to fit into the confines of the X-ray enclosure.Such an approach would not be possible for industrial welds of parts with large dimensions.The X-ray system's void detection algorithm was calibrated using an aluminum block with cavities of known dimensions and location.The parameters used to perform the X-ray images are provided in Appendix A. A comparison between the predicted defects from simulation and those measured from the experiments for the 0.38 weld pitch case is shown in Figure 10a,b.The defects were mainly at the start of the weld in the plunge/dwell zone; however, small defects were predicted and measured along the weld as well.The 0.83 weld pitch case is shown in Figure 10c,d.This case showed defects in the plunge/zone only, and no defects were present elsewhere in the weld.The third validation case, with a weld pitch of 1.34, is shown in Figure 10e,f; here extensive defects were predicted and measured throughout the entire weld as well as in the plunge/dwell region.The void detection algorithm employed by the X-ray system evaluates the local density at discrete points throughout the sample.When a region with decreased density is found, the algorithm flags the region as a possible defect and associates a probability.The algorithm returns the location and size of the defect by assuming that the defect is a spheroid.In [36], the surface area calculated from the X-ray void algorithm was used directly to compare to the simulation results, which led to an inaccurate prediction of the defect surface area.In this work, we have used the location of the center of the spheroids and their radii to perform surface triangulation in order to get a measurement of the volume of the internal defects.The procedure is as follows: 1.
Create set of points from X-ray results with x, y, z position and defect radius 2.
Re-sample the data to add more points on the surface of each spheroid 3.
Find the nearest neighbors for the re-sampled data 4.
Calculate the normal vectors and find the free surface points using Equation (11) 5.
Perform Delaunay tessellation to discretize the defect domain with tetrahedral elements 6.
Calculate the volume of each of the tetrahedrons and determine the overall defect volume (sum of individual element volumes) In this section, the volume predicted in the numerical simulations only includes the internal defects; the increase in surface area in the weld track due to the semi-circular striations is excluded.This is done because the X-ray system is not able to determine such an increase in surface area.This approach is different from what was reported in [36].A comparison of the predicted and measured defect volumes is provided in Table 8.The volume was greatest for the case with a weld pitch of 1.34 and least for the 0.83 weld pitch case.
4. Calculate the normal vectors and find the free surface points using Equation ( 11) 5. Perform Delaunay tessellation to discretize the defect domain with tetrahedral elements 6. Calculate the volume of each of the tetrahedrons and determine the overall defect volume (sum of individual element volumes) In this section, the volume predicted in the numerical simulations only includes the internal defects; the increase in surface area in the weld track due to the semi-circular striations is excluded.This is done because the X-ray system is not able to determine such an increase in surface area.This approach is different from what was reported in [36].A comparison of the predicted and measured defect volumes is provided in Table 8.The volume was greatest for the case with a weld pitch of 1.34 and least for the 0.83 weld pitch case.Typically, in the FSW process, the probability of defects being present in the weld increases as the weld pitch increases.The simulation models accurately predicted this trend and showed a volumetric defect prediction error in the range of 6-10%.

Results
Once the simulation method was validated, it could be used to determine optimal process parameters.The first step in the optimization process is to run enough simulations to have a large enough data set to for a response surface.A total of 15 simulations were run in SPHriction-3D with the parameters provided in Table 4.

Optimal Process Parameters
At the end of each of the simulations, the final value of the defect metric was recorded and stored for the optimization algorithm.The predicted internal defects for the 15 cases are shown in Figure 11.The images are arranged for increasing advancing speed from left to right and increasing  Typically, in the FSW process, the probability of defects being present in the weld increases as the weld pitch increases.The simulation models accurately predicted this trend and showed a volumetric defect prediction error in the range of 6-10%.

Results
Once the simulation method was validated, it could be used to determine optimal process parameters.The first step in the optimization process is to run enough simulations to have a large enough data set to for a response surface.A total of 15 simulations were run in SPHriction-3D with the parameters provided in Table 4.

Optimal Process Parameters
At the end of each of the simulations, the final value of the defect metric was recorded and stored for the optimization algorithm.The predicted internal defects for the 15 cases are shown in Figure 11.The images are arranged for increasing advancing speed from left to right and increasing rpm from the top to the bottom.From inspection of the results, we can see that the defects were minimized when the weld pitch was in the range of 0.5-0.8.The response surface shows that the weld quality decreases drastically as the weld pitch increases above 0.8, and decreases slightly for weld pitches below 0.5.Physically speaking, the welds with a pitch below 0.5 tend to exhibit surface defects such as galling due to overheating as well as The data sets can now be used to form a response surface that characterizes the defect metric as a function of rpm (ω) and advancing speed (V a ).The function is of the general form where the b i , c i , and d i coefficients are found from linear least squares regression analysis.The response surface shows that the weld quality decreases drastically as the weld pitch increases above 0.8, and decreases slightly for weld pitches below 0.5.Physically speaking, the welds with a pitch below 0.5 tend to exhibit surface defects such as galling due to overheating as well as excess flash along the retreating side of the weld.Welds with a pitch of over 0.8 tend to have internal defects such as worm-holes, voids, and incomplete welds.More detail on defect classification and detections is available from [49][50][51][52][53]. Inspection of the response surface suggests that the optimal process parameters that will minimize the defects will be located at the apex of the surface.To find The response surface shows that the weld quality decreases drastically as the weld pitch increases above 0.8, and decreases slightly for weld pitches below 0.5.Physically speaking, the welds with a pitch below 0.5 tend to exhibit surface defects such as galling due to overheating as well as excess flash along the retreating side of the weld.Welds with a pitch of over 0.8 tend to have internal defects such as worm-holes, voids, and incomplete welds.More detail on defect classification and detections is available from [49][50][51][52][53]. Inspection of the response surface suggests that the optimal process parameters that will minimize the defects will be located at the apex of the surface.To find the exact value of the rpm, ω, and the advancing speed, V a , we want to maximize the value of the defect metric, ψ de f ect .To accomplish the aforementioned optimization, the problem to be solved is then: Subject to (constraints): 500 rpm ≤ ω ≤ 1800 rpm 100 mm/min ≤ V a ≤ 2000 mm/min The lower bound constraints were selected to eliminate the process parameters that would lead to slow and costly welds.The upper bound constraints represent the limitation of the welding machinery.Certainly, the choice of constraints will be dependent upon settings specific to the industry as well as on factors relating to process productivity.The maximum point on the response surface is found by solving simultaneously the two following equations: A series of results can be found by solving the set of equations.However, there is only one solution that satisfies the constraints.The optimal process parameters, ω * and V a * , are listed in Table 9.To ensure that a local maximum has been found, the positive/negative definite of the Hessian matrix, H, can be checked by: H i,j = ∂ 2 ∂ω i ∂V a j ψ de f ect , and Since the Hessian matrix is negative definite, a local maximum has been found.In order to validate the simulation model's ability to determine the optimal parameters, a simulation was run at ω * = 1194 rpm and V a * = 790 mm/min.The results of the simulation were compared to the defect reconstructed from the experimental X-ray images in Figure 13.The simulation predicted defects at the plunge/dwell zone only, which is in good agreement with the experimental results.A comparison of the predicted and measured volumes at the optimal process parameters is given in Table 10.The simulation predicted marginally less defect volume (9.7% precision error) than was measured from the experiment.The excellent correlation between the predicted and the measured defect volume at the optimal process parameters further validates the proposed optimization method and shows the robustness of our method.

Process Window
Determination of the process window for friction stir welding is possible using a number of techniques.A popular approach is to perform a series of welds in the lab with various parameters, and then attempt to draw a curve that separates the good weld parameters from the poor ones.This method is also possible using numerical simulation, whereby the tests in the lab are substituted by numerical simulations.One of the drawbacks of this approach is that the choice of curve location and form to define the process window can be difficult.
Another approach is to use the results from the response surface to implicitly choose the shape of the curve that defines the process window.To accomplish this, it is necessary to first re-arrange the equation defining the response surface to obtain the advancing speed as a function of the rpm.This is akin to finding a level-set (iso-contour) of the response surface for a specific value that we will call .The parametric level set is defined by: The choice of the value of will evidently change the size of the process window.For the FSW setup in this work, was varied until the process window encompassed the simulation points without defects and so that the boundary of the window was close to the welds with minor

Process Window
Determination of the process window for friction stir welding is possible using a number of techniques.A popular approach is to perform a series of welds in the lab with various parameters, and then attempt to draw a curve that separates the good weld parameters from the poor ones.This method is also possible using numerical simulation, whereby the tests in the lab are substituted by numerical simulations.One of the drawbacks of this approach is that the choice of curve location and form to define the process window can be difficult.
Another approach is to use the results from the response surface to implicitly choose the shape of the curve that defines the process window.To accomplish this, it is necessary to first re-arrange the equation defining the response surface to obtain the advancing speed as a function of the rpm.This is akin to finding a level-set (iso-contour) of the response surface for a specific value that we will call ψ PW .The parametric level set is defined by: The choice of the value of ψ PW evidently change the size of the process window.For the FSW setup in this work, ψ PW was varied until the process window encompassed the simulation points without defects and so that the boundary of the window was close to the welds with minor defects.Performing this procedure, the value determined for ψ PW was 0.872.The resulting process window is shown in Figure 14.The welds in the graph are classified according to the following: As previously mentioned, inspection of the simulation and experimental results suggests that weld pitched in the range of 0.5 to 0.8 led to satisfactory welds.This range of weld pitches is well captured in the process window.As previously mentioned, inspection of the simulation and experimental results suggests that weld pitched in the range of 0.5 to 0.8 led to satisfactory welds.This range of weld pitches is well captured in the process window.

Conclusions and Discussion
In this work, an efficient numerical approach was presented to determine optimal process parameters that minimize the presence of defects in a friction stir welded joint.The simulation models of the FSW process were calculated using an advanced meshfree coupled thermo-mechanical code on the GPU.The efficiency of the code allows determination of optimal process parameters within a few hours-a feat that was not previously possible when attempting to optimize using a multi-physics simulation approach.
The major developments presented are: • Large-deformation Lagrangian solid mechanics formulation;

Conclusions and Discussion
In this work, an efficient numerical approach was presented to determine optimal process parameters that minimize the presence of defects in a friction stir welded joint.The simulation models of the FSW process were calculated using an advanced meshfree coupled thermo-mechanical code on the GPU.The efficiency of the code allows determination of optimal process parameters within a few hours-a feat that was not previously possible when attempting to optimize using a multi-physics simulation approach.
The major developments presented are: • Large-deformation Lagrangian solid mechanics formulation; • Fully transient simulation of all phases of the FSW process; • Meshfree method able to predict surface and internal defects; • New material model for AA6061-T6; • Defect metric that allows for automatic, on the fly, evaluation of defects in a FSW simulation model;

•
Response surface methodology to determine optimal process parameters • Implicit determination of the process window shape from simulation results

•
Robust and efficient parallelization strategy.
The simulation code was validated in terms of temperature history and defects for three different parameter sets.The model results correlated excellently with the experimental data.The error associated with the prediction of the peak temperature during the advancing phase at a location in the work piece was found to be less than 4.3%, whereas the predicted defect volume error was under 10.1%.The location and appearance of the defects also corresponded well with the experimental results.By using a fully transient solution procedure along with a meshfree Lagrangian method, the intermittence and variability of the defects was also well captured.
The presented numerical simulation approach is a powerful design tool that allows the user to quickly determine the optimal process parameters for the FSW process.Since the actual process window is highly dependent on the FSW setup, tool shape, and operating conditions, only a highly sophisticated simulation model is sufficient to optimize the process.The presented method is able to take into account important aspects such as the support base material, clamping arrangement, as well as the features on the FSW tool.As such, the technique can be used to circumvent the need to perform time-consuming and costly testing in the laboratory or using production machinery.Ultimately, this leads to increased weld productivity and efficiency.

Figure 2 .
Figure 2. Fraser-Kiss-St-Georges (FKS) flow stress model for AA6061-T6: (a) as a function of plastic strain and temperature; and (b) as a function of strain rate and temperature.

Figure 2 .
Figure 2. Fraser-Kiss-St-Georges (FKS) flow stress model for AA6061-T6: (a) as a function of plastic strain and temperature; and (b) as a function of strain rate and temperature.

Figure 3 .
Figure 3. Measuring zone for initial pre-weld surface area.

Figure 3 .
Figure 3. Measuring zone for initial pre-weld surface area.

Figure 4 .
Figure 4. (a) shows the triangulated internal defects (in red) and (b) shows the flash formation and increased surface area in the weld track for a case with 500 rpm and 102 mm/min advance.

Figure 4 .
Figure 4. (a) shows the triangulated internal defects (in red) and (b) shows the flash formation and increased surface area in the weld track for a case with 500 rpm and 102 mm/min advance.

Figure 8 .
Figure 8.(a) Location of thermocouples for temperature history.Temperature history comparison between experiment and simulation for 800 rpm with: (b) 305 mm/min, (c) 660 mm/min, and (d) 1069 mm/min.

Figure 9 .
Figure 9.Comparison of temperature contours at the end of the advancing phase between the (a) numerical and (b) experimental results for 800 rpm with 1069 mm/min advance speed.

Figure 10 .
Figure 10.Comparison of predicted and measured defects at the end of the advancing phase: (a,b) show 0.38 weld pitch, (c,d) show 0.84 weld pitch, (e,f) show 1.34 weld pitch.

Figure 10 .
Figure 10.Comparison of predicted and measured defects at the end of the advancing phase: (a,b) show 0.38 weld pitch, (c,d) show 0.84 weld pitch, (e,f) show 1.34 weld pitch.

Figure 13 .
Figure 13.Comparison of predicted and measured defects at the end of the advancing phase for * = 1194 rpm and * = 790 mm/min: (a) simulation (b) experiment.

Figure 13 .
Figure 13.Comparison of predicted and measured defects at the end of the advancing phase for ω * = 1194 rpm and V a * = 790 mm/min: (a) simulation (b) experiment.

•
Good welds (shown with on plot)-No defects are present in the advancing region of the weld.Defects in the plunge and retraction zone are permissible; • Welds with minor defects (shown with on plot)-Minor internal defects are present.The extent of the defects is not to a point that the weld would necessarily be rejected; • Welds with defects (shown with on plot)-Welds that have extensive internal defects and would be rejected outright.Defects of this type would be voids, worm-holes, and incomplete welds; • Welds with surface defects (shown with on plot)-These welds are typically very hot, leading to surface galling, tearing, hot cracking, and excessive flash.These welds would likely be rejected; • Welds with surface and internal defects (shown with on plot)-These welds have significant internal defects as well as surface defects.

Table 4 .
Process parameter sets for optimization.

Table 4 .
Process parameter sets for optimization.

Table 8 .
Defect volume comparison between simulation and experiment.

Table 8 .
Defect volume comparison between simulation and experiment.

Table 9 .
Parameters that minimize defects.

Table 10 .
Comparison of predicted and experiment defect volume at optimal process parameters.