1. Summary
Inconel 718, a prominent member of the nickel-based superalloy family, is known for its strength, fatigue life, structural stability at elevated temperatures, and corrosion resistance [1,2,3,4,5]. Consequently, it is widely adopted in the aerospace and oil and gas industries [6,7,8,9,10]. The material meets the demands of the manufacturing industry by being amenable to casting, welding, and forming but continues to pose challenges during machining. As investigated by Amigo et al., In718 produces smoother finished parts when machined with an oil-based emulsion rather than cryogenic cooling [11]. Apart from the traditional manufacturing methods, selective laser melting has become a popular additive manufacturing technology by which geometrically complex parts of this material are produced [12,13,14,15,16]. As a demonstration, Ochoa et al. manufactured replicative octahedral structures of different sizes using the LPBF technique [17]; the printed parts were tested under compressive loads using FEM simulation and validated experimentally for the purpose of aerospace lightweighting. Laser powder bed fusion (LPBF) involves the layer-by-layer melting of metallic powder, which consolidates to form the solid specified in the design [18,19,20,21]. After the printing process is completed, surface and heat treatments are carried out to smoothen the part and enhance microstructural homogeneity [22,23,24,25,26]. Because this manufacturing technique allows greater geometric freedom than subtractive and mass-conserving techniques, it is used to fabricate equipment for the oil and gas processing industries; a few examples are mud motor modules, subsurface valve components, and pump manifolds. Since the drilling and extraction environment contains high-pressure, high-temperature chlorides and contaminants, the material tends to degrade through corrosion, and corrosion testing therefore plays a vital role in the quality assessment of these parts [27,28,29]. A persistent challenge in this domain is to design corrosion testing techniques that produce accurate, industry-relevant results and thereby aid in lifetime extension. Electrochemical corrosion measurement is an established method in which testing is performed in a simulated cell consisting of the tested material and the environment [30,31,32,33]. Potentiodynamic polarization (PD) measures the electrochemical activity in the cell, indicating the corrosion activity of the material–environment combination [34,35,36]. The evolution of the corrosion potential is an important indicator for understanding the long-term corrosion of a metallic alloy. Electrochemical impedance spectroscopy (EIS) measures the stability of the passive layer formed when a material is exposed to an aggressive environment [37,38,39]. Together, these two testing methods give a picture of the rate at which the material is likely to corrode and the nature of the protective layer that forms on the material surface upon reacting with the environment.
The data collected from the above experiments can be used to build machine learning (ML) models that predict the corrosion behavior of this material in the tested environment [40,41]. Machine learning algorithms are capable of performing multiple tasks, such as regression, classification, clustering, and dimensionality reduction, each finding its own application in materials science [42,43,44,45]. Model development consists of three important steps: data preprocessing, algorithm adoption, and model testing and verification [46,47]. Data preprocessing covers the collection, cleaning, and organization of raw data. Algorithm adoption involves choosing the right type of algorithm based on the type of output required. Model testing and verification is the final step, wherein the predictive capability of the model is tested and the model is modified based on the errors that arise. Many researchers are beginning to explore the adoption of ML models for accelerated material property prediction. Using the support vector regression algorithm (SVR) and a back-propagation neural network (BPNN), Wen et al. [48] predicted the rate of corrosion of a steel alloy exposed to a seawater environment. Particle swarm optimization was used for parameter tuning in the SVR algorithm, which consistently outperformed the BPNN. Leave-one-out cross-validation (LOOCV) was adopted to validate the efficiency of the model. Kamrunnahar et al. [49] predicted the corrosion behavior of metallic glasses using polarization curves, of carbon steel using weight loss data, and of a titanium alloy using crevice corrosion data, all collected from the available literature. The BPNN was able to capture the polarization behavior of a series of metallic glasses from the data of a single-member alloy. Compositional element details were used to model the corrosion rates in plain carbon steel and a steel alloy, and good agreement was found between the experimental and estimated results. In another attempt to model the electrochemical behavior of an alloy system, Gong et al. [50] built machine learning models using several algorithms: k-nearest neighbors, decision tree, random forest, SVM, and gradient boosting decision trees. These models were then tested using corrosion data obtained for copper in a repository environment. The random forest algorithm produced the model closest to the experimental data. Feature importance analysis showed that sulfide concentration influenced the corrosion potential the most, while temperature influenced the impedance behavior the most. A more recent set of machine learning models for EIS analysis was built by Zhu et al. [51]. Literature data for the electrical equivalent circuit (EEC) model used in EIS analysis were fed to an SVM algorithm; once trained, the model was able to produce a satisfactory EEC model when provided with impedance data, greatly reducing the human effort required in EEC modelling. As the literature survey suggests, machine learning has the potential to grow into a beneficial tool that can expedite material behavior prediction. Most of these ML approaches are confined to cast or wrought material and have not, to the authors' knowledge, been extended to selective-laser-melted material. Further, compared with prior research, this study focuses on a unique form of corrosion data: the results of an electrochemical testing environment. The motivation for this study is to develop a model that predicts the corrosion behavior of an LPBF component and mathematically computes the contribution of each postprocessing treatment in the additive processing cycle. As LPBF is currently being optimized for wide-scale industrial adoption, understanding the role of postprocessing could facilitate improvements in product quality. Although this manufacturing method has multiple advantages, it is presently expensive [52]. When machine learning is brought in to complete the prediction, the process becomes faster, more convenient, and more cost-efficient.
In this work, machine learning algorithms were used to predict the electrochemical behavior of Inconel 718 produced via the additive route. For the purpose of model development, data were collected from one of our earlier experiments. Four machine learning models, based on the decision tree (DT), extreme gradient boosting (XGB), support vector machine (SVM), and polynomial regression (PR) algorithms, were built using the collected data. The independent features considered were heat treatment temperature and duration, shot peening inclination and velocity, and the individual input electrical parameter for each test. The performances of the ML models were evaluated, and the best-performing algorithm was employed to carry out feature importance analysis, which ranks the independent features for each of the electrochemical tests. This analysis is essential for determining the postprocessing parameters that most strongly influence the corrosion behavior of LPBF-processed Inconel 718.
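Feature importance analysis of the kind described above can be sketched with impurity-based importances from a tree ensemble. The sketch below uses synthetic data, not the study's dataset, and scikit-learn's `GradientBoostingRegressor` as an illustrative stand-in model:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data in which only the first feature drives the response,
# mimicking how one postprocessing parameter could dominate a target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 4))   # e.g. HT temperature, HT duration, SP angle, SP velocity
y = 5.0 * X[:, 0] + 0.1 * X[:, 1]

model = GradientBoostingRegressor(random_state=0).fit(X, y)
ranking = np.argsort(model.feature_importances_)[::-1]  # most important first
```

Here the first feature dominates the target by construction, so it should top the importance ranking.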
2. Data Description
In order to collect data for building the machine learning model, corrosion experiments were conducted on selective-laser-melted and postprocessed Inconel 718 samples. These samples were divided into four categories: as built (AB), heat treated (HT), shot peened (SP), and heat treated with shot peening (HTSP). A detailed description of the experiment has been reported elsewhere [53]. A brief description of the printing process reported earlier in our work is provided below (https://www.mdpi.com/2075-4701/10/12/1562, accessed on 5 May 2021).
The Inconel 718 specimens were printed vertically as small rectangular blocks with dimensions of 15 mm × 5 mm × 15 mm in an EOS M280 machine. The printer was equipped with a 200 W Yb optical fiber laser with a beam diameter of 100 µm. Using a layer thickness of 20 µm and a scanning speed of 7 m/s, the blocks were printed with the 5 mm side as the base. The printing chamber was enclosed in an argon atmosphere to prevent reactions between the IN718 powder and the surroundings. The powder supply to the printing region was enabled by the upward movement of the powder delivery system. When the requisite amount of powder was available, the recoater blade spread the metallic powder over the build platform; in the equipment used for the current experiments, the recoater blade moved at a speed of 50 mm/s. The laser beam moved along the programmed path to melt the powder within the designed contour. After the current layer was completed, the next layer was spread, and this procedure was repeated until the object was printed completely. After printing, the build plate was removed, and the blocks were cut from the plate using a wire-cut electrical discharge machine.
A summary of the experimental conditions is presented in Table 1.
The postprocessing conditions are summarized in Table 2.
The specimens, thus printed, postprocessed, and categorized, were tested for general corrosion using an electrochemical method. The data from these tests represented the overall corroding tendency and passivation strength of the specimens. After running the tests four times to ensure reproducibility, data pertaining to the PD and EIS analyses were collected for all four specimen conditions. Two of the electrochemical parameters that represented the corrosion activity in the PD test were Icorr and Epit: a lower Icorr indicates less corrosion, and a higher Epit indicates improved passivating behavior. This trend was reflected by the HTSP specimen, with Icorr and Epit values of 0.04 µA/cm2 and 570 mV, respectively. The as-built specimen, offering the least corrosion resistance, had an Icorr value of 0.21 µA/cm2 and an Epit value of 220 mV. In the EIS testing, the passivation strength of the protective film was measured through its impedance, which was lowest in the as-built specimen at 235 kΩ and highest in the HTSP specimen at 682.2 kΩ. Heat treatment and shot peening, both forms of postprocessing, contributed to enhancing the corrosion resistance of the selective-laser-melted Inconel 718. The Laves phase distribution was reduced by the heat treatment, thereby reducing chromium depletion in the matrix. The high surface roughness typical of selective-laser-melted parts was greatly reduced by the shot peening process, which consequently reduced the pitting tendency of the material.
2.1. Model Building
Database
The experimental electrochemical results obtained from the tests were categorized into Tafel, Bode, and Nyquist plots. The curves in these plots consisted of individual data points, and for each input data point, a corresponding output data point was generated. In this manner, a dataset was created for each sample condition. These datasets were utilized for model generation and corrosion behavior prediction.
2.2. Feature Selection
While building a predictive model, it is important to select the input features that contribute most to the target variable [54,55,56]. Not all available variables necessarily influence the outcome, and hence the most causative parameters need to be carefully picked to improve the predictive capability of the model. In the current model, the postprocessing parameters and the input parameters for each test were taken as independent variables. The target parameters also varied for each test; these parameters are presented in Table 3, Table 4 and Table 5.
2.3. Model Development
A series of models were built for the four specimen conditions (AB, HT, SP, and HTSP) and the three electrochemical plots using four different algorithms. The train, validation, and test data split adopted was as follows: training set = 70% (to train the model), validation set = 15% (for hyperparameter tuning), and test set = 15% (for unbiased evaluation of model performance). The number of data points used for model building was 29,100. The scikit-learn library was used for algorithm adoption, and the programming was carried out in the Python language. A brief mathematical basis for each algorithm is provided in the following sections.
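The 70/15/15 split described above can be sketched with scikit-learn by applying `train_test_split` twice; the arrays here are synthetic stand-ins, not the actual electrochemical dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the electrochemical dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))   # postprocessing + electrical input features
y = rng.normal(size=1000)        # electrochemical response

# First carve off 70% for training, then split the remainder 50/50
# into validation (15% overall) and test (15% overall).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.70, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=42)
```

The model is then fitted on the training set, tuned against the validation set, and scored once on the held-out test set.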
2.3.1. Polynomial Regression
Polynomial regression (PR) [57,58] is a special case of linear regression in which a polynomial equation is fitted to data having a curvilinear relationship between the target variable and the independent variables. The features consist of all polynomial combinations of the original features with a degree less than or equal to the specified degree:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n + \varepsilon$$

where $\varepsilon$ is the error term. As in linear regression, the objective is to minimize the ordinary least squares sum.
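As a minimal sketch of this formulation, the polynomial feature expansion and least-squares fit can be composed in scikit-learn with `PolynomialFeatures` and `LinearRegression`; the quadratic target below is synthetic:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Noiseless quadratic target: y = 1 + 2x - 3x^2
x = np.linspace(-1, 1, 50).reshape(-1, 1)
y = 1 + 2 * x[:, 0] - 3 * x[:, 0] ** 2

# Degree-2 expansion [1, x, x^2] followed by an ordinary least-squares fit
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
r2 = model.score(x, y)  # coefficient of determination
```

Because the target is itself a degree-2 polynomial, the fit is essentially exact.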
2.3.2. Support Vector Regression
Support vector regression (SVR) is a supervised regression algorithm with the advantage of controlling the deviation between the actual and predicted values in order to find an appropriate hyperplane to fit the data [59]. Unlike linear regression, SVR minimizes the squared sum of the coefficients of the model:

$$\min_{w,\,b,\,\zeta,\,\zeta^*}\;\frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\left(\zeta_i + \zeta_i^{*}\right)$$

Constraining,

$$y_i - \left(w^{T}x_i + b\right) \le \varepsilon + \zeta_i, \qquad \left(w^{T}x_i + b\right) - y_i \le \varepsilon + \zeta_i^{*}, \qquad \zeta_i,\,\zeta_i^{*} \ge 0$$

where $\varepsilon$ is the maximum deviation or the margin length, $|\zeta|$ is the absolute deviation beyond the margin, and $C$ is the regularization parameter. To capture nonlinear relationships, the RBF kernel is employed. If the original feature space is represented by the vector $X = [x_1, x_2, x_3, \ldots, x_n]$, then for any two data points $A$ and $B$, the transformed feature space $\phi(X)$ satisfies the condition

$$K(A, B) = \langle \phi(A), \phi(B) \rangle = \exp\!\left(-\gamma \lVert A - B \rVert^{2}\right)$$

where $\lVert \cdot \rVert^{2}$ is the squared L2 norm and $\gamma$ is the scaling parameter. The RBF kernel decreases with distance and ranges between zero (as $\lVert A - B \rVert^{2}$ tends towards infinity) and one (when $A = B$), which gives it a ready interpretation as a similarity measure. Because of the nonlinearity introduced by the RBF kernel, a curve that is nonlinear in the original space becomes linear in the $\phi(X)$ feature space.
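A minimal sketch of RBF-kernel SVR on a synthetic nonlinear target (the hyperparameter values here are illustrative, not those used in this study):

```python
import numpy as np
from sklearn.svm import SVR

# Nonlinear target: the RBF kernel lets SVR fit it without
# explicit polynomial feature engineering.
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X[:, 0])

# epsilon sets the insensitive tube width; C penalizes deviations beyond it
svr = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale")
svr.fit(X, y)
score = svr.score(X, y)  # R^2 on the training data
```

Tightening `epsilon` and raising `C` trades a wider margin for a closer fit, mirroring the constrained objective above.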
2.3.3. Decision Tree
Decision tree regression (DT) is a supervised regression algorithm that learns a set of decision rules for segmenting the feature space [60]. After segmentation, a piecewise-constant approximation, such as the local average, is used for prediction. The tree is built by recursive partitioning, with the root node (the first parent) holding the complete training dataset and the split data forming the child nodes. These child nodes can be split further by treating them as new parent nodes. Node splitting is performed based on mean squared error minimization. When a node $S$ is split into child nodes $A$ and $B$, the split value is determined by minimizing the function

$$L(S) = \frac{n_A}{n_S}\,\mathrm{MSE}(A) + \frac{n_B}{n_S}\,\mathrm{MSE}(B)$$

where

$$\mathrm{MSE}(A) = \frac{1}{n_A}\sum_{i \in A}\left(y_i - \bar{y}_A\right)^{2}, \qquad \bar{y}_A = \frac{1}{n_A}\sum_{i \in A} y_i$$

The split value is obtained by minimizing the total mean squared loss of the child nodes $A$ and $B$, with the prediction in each child node being the sample mean of that node.
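The MSE-based splitting rule can be illustrated with a depth-1 tree (a single split) on a synthetic step function, where the optimal split point and the child-node means are known exactly:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Piecewise-constant data: a step function that one MSE-based
# split recovers exactly.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.where(X[:, 0] < 5, 1.0, 3.0)  # step at x = 5

tree = DecisionTreeRegressor(max_depth=1)  # a single split (a "stump")
tree.fit(X, y)
preds = tree.predict(np.array([[2.0], [8.0]]))
```

The split lands between the two plateaus, and each leaf predicts the sample mean of its side, 1.0 and 3.0.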
2.3.4. Extreme Gradient Boosting
Extreme gradient boosting (XGB) is a tree-based boosting ensemble method [61]. It is an evolved form of the boosting algorithm [62] formulated for enhanced predictive performance through optimization [63]. The objective function is given by the sum of a loss function and a regularization term:

$$\mathrm{Obj} = \sum_{i} l(y_i, p_i) + \sum_{k} \Omega(f_k)$$

where $l$ is the loss function representing the difference between the actual value ($y_i$) and the predicted value ($p_i$), and the regularization term $\Omega(f_k)$ is given by

$$\Omega(f_k) = \gamma T + \frac{1}{2}\,\lambda \lVert w \rVert^{2}$$

where $T$ is the number of leaves in the tree, $w$ is the vector of leaf weights, $\gamma$ is the pruning index, and $\lambda$ is the scaling factor of the weights.

If $p_i^{(t-1)}$ is the prediction for instance $i$ after $t-1$ iterations, an additional function $f_t(x_i)$ is added in order to minimize the objective function. The objective function at the $t$-th iteration becomes

$$\mathrm{Obj}^{(t)} = \sum_{i} l\!\left(y_i,\; p_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

A second-order Taylor approximation of the loss is used to solve this expression:

$$\mathrm{Obj}^{(t)} \simeq \sum_{i}\left[\, l\!\left(y_i, p_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)$$

where $g_i$ and $h_i$ are the first- and second-order derivatives of the loss with respect to the previous prediction. Eliminating the constant terms, the objective becomes

$$\mathrm{Obj}^{(t)} \simeq \sum_{i}\left[\, g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)$$

As can be seen, the objective function depends only on the $g$ and $h$ values, which therefore become the optimization goal for the subsequent tree. In this manner, each successive loss function is optimized. The $g$ and $h$ values are calculated for each instance and accumulated in the corresponding leaf, and the summed values are used to score candidate trees.
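As a sketch of the additive-tree idea (using scikit-learn's `GradientBoostingRegressor` as a stand-in, since the study itself used the XGBoost package), each new shallow tree is fitted to the gradient of the loss of the current ensemble:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Smooth nonlinear target on synthetic features
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 1]

# Many shallow trees added sequentially, each correcting the
# residual error of the ensemble built so far.
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=2, random_state=0)
gbr.fit(X, y)
train_r2 = gbr.score(X, y)
```

XGBoost additionally uses the second-order term $h_i$ and the $\gamma T + \frac{1}{2}\lambda\lVert w\rVert^2$ regularizer when scoring splits, which this stand-in omits.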
2.4. Hyperparameter Optimization
In order to obtain the set of hyperparameters that predicts the corrosion behavior closest to the experimental values, Bayesian optimization is performed over the hyperparameter space to minimize the root mean square error (RMSE). Let $H$ represent the overall hyperparameter space to be searched, $h$ a vector from the $H$ space, and $O$ the objective function to be minimized (i.e., the RMSE). In the Bayesian approach, $P(O \mid h)$ is used for hyperparameter sampling and is updated iteratively to obtain $h^*$, the best set of hyperparameters. In the probabilistic view, the expected improvement (EI) metric is used to quantify the improvement in performance between consecutive iterations:

$$\mathrm{EI}_{O^*}(h) = \int_{-\infty}^{O^*} \left(O^* - O\right) p(O \mid h)\, dO$$

where $O^*$ is the threshold value, i.e., the current best value of the objective function.
According to Bayes' rule,

$$p(O \mid h) = \frac{p(h \mid O)\, p(O)}{p(h)}$$

Here, $p(h \mid O)$ is the probability of the hyperparameters given the score on the objective function, represented as

$$p(h \mid O) = \begin{cases} l(h), & O < O^{*} \\ g(h), & O \ge O^{*} \end{cases}$$

On applying this expression, it can be shown that EI is proportional to $l(h)/g(h)$. Hence, to maximize EI, more samples should be drawn from $l(h)$ than from $g(h)$. As $l(h)$ corresponds to $O < O^*$, the objective function is bound to decrease. This algorithm is known as the tree-structured Parzen estimator (TPE). To run the search in parallel, the asynchronous successive halving algorithm (ASHA) is used for pruning in every iteration. The lowest RMSE was obtained with the XGBoost algorithm for the PD, Nyquist, and Bode plots, and its hyperparameters were optimized using the Optuna package.
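For intuition, the EI integral has a closed form when $p(O \mid h)$ is modelled as a Gaussian; the sketch below evaluates it for two hypothetical candidates against a current-best RMSE (TPE itself instead ranks candidates by the $l(h)/g(h)$ ratio):

```python
import math

def expected_improvement(mu, sigma, best):
    """EI for minimization under a Gaussian surrogate N(mu, sigma^2):
    EI = (best - mu) * Phi(z) + sigma * phi(z),  z = (best - mu) / sigma
    """
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))       # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (best - mu) * cdf + sigma * pdf

# Candidate predicted to beat the current best RMSE of 1.0 ...
ei_good = expected_improvement(mu=0.8, sigma=0.1, best=1.0)
# ... versus one predicted to be worse (EI stays small but positive).
ei_bad = expected_improvement(mu=1.2, sigma=0.1, best=1.0)
```

Candidates expected to undercut the current best RMSE receive much larger EI, so the optimizer samples them preferentially, exactly the behavior the $l(h)/g(h)$ argument describes.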
Figure 1 graphically represents the optimization process adopted for the PD modelling. As can be seen, the objective value approaches the best value more closely as the number of trials increases. The individual parameters after optimization are listed in Table 6, and the model development process flow is presented in Figure 2.