Machine Learning Model to Map Tribocorrosion Regimes in Feature Space

Abstract: Degradation by wear and corrosion is frequently encountered in a variety of tribosystems, including materials and tools in forming operations. The combined effect of wear and corrosion, known as tribocorrosion, can result in accelerated material degradation. Interfacial conditions can affect this degradation. Tribocorrosion maps serve the purpose of identifying operating conditions at the interface for an acceptable rate of degradation. This paper proposes a machine learning-based approach to generate tribocorrosion maps, which can be used to predict tribosystem performance. Two tribocorrosion datasets from the published literature are used. The materials have been chosen based on the wide availability of their tribocorrosion data in the literature. First, unsupervised machine learning is used to identify and label clusters from tribocorrosion data. The identified clusters are then used to train a support vector classification model. The trained support vector machine is used to generate tribocorrosion maps. The generated maps are compared with those from the literature. The general approach can be applied to create tribocorrosion maps of materials widely used in material forming.


Introduction
The development and implementation of predictive models from experimental data using machine learning (ML) has been an active research topic in material science and engineering. In ML, a computer program based on an algorithm learns from historical data to improve a performance metric. The selection of the algorithm and metric depends on the specific data and problem at hand. The commonly used ML algorithms can be divided into regression, probability estimation, classification and clustering [1]. Recently, ML tools have been used in tribology to build predictive models for friction coefficient [2,3], wear rate [4] and wear volume [5], to design lubricants [6–8] and functional materials [9], for tribocorrosion [10] and surface roughness [11] modeling, for wear particle classification [12], in corrosion modeling [13–19], and in medical implant classification [20].
Tribocorrosion, which is the combined effect of wear and corrosion, is often undesirable and can accelerate material degradation. It is frequently encountered where surfaces are in contact with each other in a corrosive environment. Interfacial conditions such as materials, lubrication, normal load, type of contact (sliding, fretting, rolling, impact), relative speed, surface topography, temperature, humidity and pH can affect the degradation process [21]. This phenomenon is relevant to a range of industries such as material forming and machining [22–26], mining and energy [27–29], healthcare [30], and transportation [31]. The total mass change by tribocorrosion ($K_{wc}$) can be explained using the analysis in [32] as

$$K_{wc} = K_w + K_c \qquad (1)$$

where $K_w$ is the total mass of material removed due to wear and $K_c$ is the total mass of material removed due to corrosion. $K_w$ is $K_{wo} + \Delta K_w$, where $K_{wo}$ is the mass loss by wear in the absence of corrosion, and $\Delta K_w$ is the synergistic effect of corrosion on wear. $K_c$ is $K_{co} + \Delta K_c$, where $K_{co}$ is the mass loss by corrosion in the absence of wear, and $\Delta K_c$ is the additive effect (enhancement) of corrosion due to wear. These terms can be determined experimentally and used to characterize the dominant degradation mechanism for the tribosystem. Tribocorrosion regimes [33] are defined based on the ratio $K_c/K_w$ as follows:

$$K_c/K_w < 0.1 \quad \text{(wear)} \qquad (2)$$
$$0.1 \le K_c/K_w < 1 \quad \text{(wear-corrosion)} \qquad (3)$$
$$1 \le K_c/K_w < 10 \quad \text{(corrosion-wear)} \qquad (4)$$
$$K_c/K_w \ge 10 \quad \text{(corrosion)} \qquad (5)$$

Tribocorrosion mechanism maps and synergy maps illustrate these regimes as functions of interfacial conditions such as normal load, sliding speed and pH. These maps are helpful to optimize material forming processes, identify conditions of minimal degradation, improve tool life, and design functional materials and coatings [34].
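The regime assignment from the ratio of corrosion loss to wear loss can be illustrated with a short sketch. It assumes the thresholds 0.1, 1 and 10 that are commonly used in the tribocorrosion mapping literature. The function name `tribocorrosion_regime` and the sample values are hypothetical and are not taken from the datasets in this study.

```python
def tribocorrosion_regime(k_c: float, k_w: float) -> str:
    """Name the regime for corrosion loss k_c and wear loss k_w.

    Assumption: regime boundaries at k_c / k_w = 0.1, 1 and 10.
    """
    if k_w <= 0:
        raise ValueError("k_w must be positive to form the ratio k_c / k_w")
    ratio = k_c / k_w
    if ratio < 0.1:
        return "wear"
    if ratio < 1:
        return "wear-corrosion"
    if ratio < 10:
        return "corrosion-wear"
    return "corrosion"


# Hypothetical mass losses (arbitrary units), chosen to fall in each regime.
examples = [(0.05, 1.0), (0.5, 1.0), (5.0, 1.0), (50.0, 1.0)]
regimes = [tribocorrosion_regime(k_c, k_w) for k_c, k_w in examples]
print(regimes)
```

A hand-written rule like this needs both mass-loss terms for every operating point. The ML approach proposed here instead learns the regime boundaries directly in the space of operating conditions.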
The standard approach to create tribocorrosion maps involves extensive experiments to generate data, determine the regimes and create graphical illustrations. Machine learning can be used to quickly and automatically create accurate predictive models from the experimental data. Although ML tools are widely used in tribology, no previous studies have dealt with identifying clusters and creating tribocorrosion maps using ML tools. This study uses both unsupervised and supervised learning techniques to make predictive models for tribocorrosion and to generate maps. The two datasets used to create the models are from published tribocorrosion studies. The materials used in these studies are the alloys Ti-25Nb-3Mo-3Zr-2Sn and Co-Cr. These have been chosen because their tribocorrosion data are widely available in the literature.
The titanium alloy has mechanical properties that make it suitable for a wide range of applications. It has a yield strength in the range of 440-500 MPa, elastic modulus in the range of 50-80 GPa, and ultimate tensile strength in the range of 708-715 MPa [35]. A corrosion potential of −0.51 V and a corrosion current density of 0.13 µA/cm² were reported for the alloy in phosphate-buffered solution simulating a physiological environment [36]. The high strength-to-weight ratio, low modulus of elasticity, corrosion resistance, and biocompatibility of titanium alloys make them ideal for biomedical implants.
Co-Cr alloys are also widely used in biomedical implants due to their high specific strength, hardness, corrosion resistance, and biocompatibility. Their tensile strength is in the range of 145-270 MPa and hardness in the range of 550-800 MPa [37]. The corrosion potential and corrosion current density of Co-Cr in artificial saliva are about −0.45 V and 0.16 µA/cm², respectively [38].
This study presents an ML-based general methodology to create tribocorrosion maps. The same process can be applied to create similar maps of materials widely used in material forming. The ML methodology is discussed in the following section.

Methodology
The predictive model is developed in two parts. The first part involves identifying clusters in tribocorrosion experimental data using the K-means clustering technique. The second part involves training a support vector classifier using the data and the cluster labels. The trained support vector classifier can then be used to draw wear-corrosion mechanism, synergy, and wastage maps. The schematic in Figure 1 shows an overview of the proposed methodology.
Two experimental datasets from the published literature are used in this study. The materials in these studies, albeit related to medical implants, are the subject of several tribocorrosion studies. Therefore, a relatively large volume of experimental tribocorrosion data is available for them. Note that the volume of the dataset is decisive in effectively training an ML algorithm. Dataset 1 (see Supplementary Materials, Table S1) was collected from the tribocorrosion study of Ti-25Nb-3Mo-3Zr-2Sn in [39]. Dataset 2 (see Supplementary Materials, Table S2) was collected from the work of Stack et al. [40]. That study focused on the tribocorrosion of Co-Cr in Ringer's solution mixed with silicon carbide particles at different loads and applied potentials.
It is essential to understand the degradation and predict the lifecycle of the materials under various conditions, both of which can possibly be done with ML. The data were standardized by removing the mean and scaling to unit variance before training the model. The models were developed using the open source ML package scikit-learn [41].
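The standardization step can be sketched with scikit-learn's `StandardScaler`, which removes the column mean and scales each column to unit variance. The feature matrix below is made up for illustration and does not reproduce Table S1 or Table S2.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: columns are normal load (N) and
# abrasive concentration (g/cm^3). Values are illustrative only.
X = np.array([
    [1.0, 0.05],
    [2.0, 0.10],
    [3.0, 0.15],
    [4.0, 0.20],
])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# After scaling, every column has zero mean and unit (population) variance.
col_means = X_std.mean(axis=0)
col_stds = X_std.std(axis=0)
print(col_means, col_stds)
```

Keeping the fitted `scaler` allows the same transformation to be applied later to new operating points via `scaler.transform`.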

Unsupervised Learning
Clustering is a multivariate statistical technique used to group data into clusters based on their underlying structure. A commonly used technique is K-means clustering. The number of clusters (K) and the starting centroids are provided as input parameters. This is followed by an iterative process of assigning each data point to the cluster with the smallest squared Euclidean distance to its centroid, and recalculating the centroids, until the centroids no longer change.
In this study, K-means clustering was used for exploratory analysis. For this, an unsupervised ML model was developed using the standardized datasets. In order to determine the optimal value of K for each dataset, the elbow method [42] and the silhouette coefficients [43] were used.
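A minimal sketch of how the elbow method and the silhouette coefficient might be computed with scikit-learn is given below. The data are synthetic blobs generated for illustration, and the range of K values and the `n_init` and `random_state` settings are assumptions rather than the settings used in this study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
X_std = StandardScaler().fit_transform(X)

wcss = {}        # within-cluster sum of squares, used for the elbow plot
silhouette = {}  # mean silhouette coefficient for each K
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_std)
    wcss[k] = model.inertia_
    silhouette[k] = silhouette_score(X_std, model.labels_)

# Choose the K with the largest mean silhouette coefficient.
best_k = max(silhouette, key=silhouette.get)
print(best_k)
```

Plotting `wcss` against K gives the elbow curve, while `silhouette` gives a single value per K that can be compared directly.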

Supervised Learning
Support vector machines (SVMs) [44] are supervised ML algorithms that can be used for classification and regression [45]. SVMs use kernel functions to map datapoints from an input space to a high-dimensional feature space to establish linear decision boundaries (hyperplanes). These boundaries become nonlinear when transformed back to the original input space, thus making non-linear classification possible. The datapoints closest to the hyperplane are called support vectors. Four basic kernel functions $k(x_i, x_j)$ are linear (Equation (6)), polynomial (Equation (7)), radial basis function (RBF) (Equation (8)) and sigmoid (Equation (9)). The hyperparameters γ, r, and d associated with the SVMs are generally tuned using cross-validation or grid search [46]:

$$k(x_i, x_j) = x_i^{T} x_j \qquad (6)$$
$$k(x_i, x_j) = (\gamma\, x_i^{T} x_j + r)^{d}, \quad \gamma > 0 \qquad (7)$$
$$k(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right), \quad \gamma > 0 \qquad (8)$$
$$k(x_i, x_j) = \tanh(\gamma\, x_i^{T} x_j + r) \qquad (9)$$

Coatings 2021, 11, 450

The labelled datasets obtained after clustering are used to train SVM classification models. Since SVMs are designed for binary classification, different strategies can be adopted based on the multiclass classification problem at hand [47].
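The four basic kernels can be written out directly, which may help when reading the hyperparameters γ, r and d. The sketch below uses the standard forms of the linear, polynomial, RBF and sigmoid kernels and checks three of them against scikit-learn's pairwise kernel functions. The two sample vectors and the hyperparameter values are arbitrary illustrations.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    polynomial_kernel, rbf_kernel, sigmoid_kernel,
)


def linear(xi, xj):
    return xi @ xj


def polynomial(xi, xj, gamma, r, d):
    return (gamma * (xi @ xj) + r) ** d


def rbf(xi, xj, gamma):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))


def sigmoid(xi, xj, gamma, r):
    return np.tanh(gamma * (xi @ xj) + r)


# Two arbitrary sample points in a 2-D input space.
xi = np.array([1.0, 2.0])
xj = np.array([0.5, -1.0])

k_lin = linear(xi, xj)
k_poly = polynomial(xi, xj, gamma=2.0, r=0.0, d=2)
k_rbf = rbf(xi, xj, gamma=0.5)
k_sig = sigmoid(xi, xj, gamma=0.5, r=0.0)

# Reference values from scikit-learn (inputs must be 2-D arrays).
Xi, Xj = xi[None, :], xj[None, :]
ref_poly = polynomial_kernel(Xi, Xj, degree=2, gamma=2.0, coef0=0.0)[0, 0]
ref_rbf = rbf_kernel(Xi, Xj, gamma=0.5)[0, 0]
ref_sig = sigmoid_kernel(Xi, Xj, gamma=0.5, coef0=0.0)[0, 0]
print(k_lin, k_poly, k_rbf, k_sig)
```

In scikit-learn's `SVC`, the parameters `gamma`, `coef0` and `degree` correspond to γ, r and d, respectively.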

Identifying the Clusters
Clustering was done for a range of K values based on the data, and the resulting within-cluster sum of squares (WCSS) was obtained. The red lines in Figure 2a,b show the WCSS for datasets 1 and 2, respectively. An elbow point in the WCSS graph typically signifies the optimal K value. The elbow can be observed at K = 3 in Figure 2a. However, the elbow in Figure 2b is not obvious. Further analysis was done by calculating the silhouette coefficients. The blue lines in Figure 2a,b show the silhouette coefficients for a range of K values for datasets 1 and 2, respectively. The best value of K is where the silhouette coefficient is maximum, which is equal to 3 for both datasets.
It is worth noting that, of the two methods discussed above to determine the optimal K, the elbow method can be ambiguous when the data are not clearly clustered. In such cases, the WCSS curve may not have a sharp elbow. The silhouette coefficient, on the other hand, is a value between −1 and +1, with values closer to the upper bound indicating appropriate clustering. Thus, the silhouette coefficient is a more robust metric to determine K.
The two datasets were clustered and labelled using K-means clustering with K = 3. The three clusters in dataset 1 can clearly be distinguished on a $K_w$ vs. $K_c$ plot as shown in Figure 3a. The single datapoint that forms a distinct cluster is identified as the corrosion-wear cluster, as it has a high $K_c$ compared to $K_w$. The other two clusters are closer to the $K_w$ axis, denoting wear-corrosion synergy. The cluster near the origin is labelled as the low wear-corrosion synergy cluster, and the cluster in the lower right-hand corner is labelled as the high wear-corrosion synergy cluster.
Similarly, the three clusters in dataset 2 can be clearly identified on a $K_c$ vs. $K_w$ plot as shown in Figure 4a. The cluster with very small values of $K_c$ and large values of $K_w$ can be identified as having wear as the predominant degradation mechanism. The cluster in the top left-hand corner with similar values of $K_c$ and $K_w$ can be identified as having corrosion-wear synergy. The third cluster, approximately in the center of the plot, is classed as having wear-corrosion synergy. This classification is similar to the results obtained in the source [40].

Tribocorrosion Maps
Since the two datasets under consideration have three clusters each, a one-vs.-one strategy was used to train the SVM and perform classification. The clusters in dataset 1 were labelled 0, 1 and 2. The features used were abrasive concentration (g/cm³) and normal load (N). A polynomial kernel function was chosen to map the datapoints to a higher dimensional space. Values for the hyperparameters in Equation (7) were as follows: kernel parameter γ = 2, penalty parameter C = 10, degree d = 2 and r = 0. The model was trained on the labelled dataset. The tribocorrosion map (Figure 3b) of the feature space (abrasive particle concentration vs. normal load) was generated by prediction using the trained model. The map shows the three previously identified clusters. From the map, the major degradation mechanism is identified as wear-induced corrosion, which is similar to the results obtained by [39]. Corrosion-induced wear is the degradation mechanism for a small segment of the operating conditions.
Similarly, the clusters in dataset 2 were labelled as 0, 1 and 2. The features used were potential (V) and normal load (N). An RBF kernel (Equation (8)) was used to map dataset 2 to a higher dimensional space. The model was trained on the labelled dataset with hyperparameter γ = 7 and penalty parameter C = 10. The tribocorrosion map obtained by predicting with the trained model on the feature space is shown in Figure 4b. The three degradation mechanisms (wear, wear-corrosion, and corrosion-wear) are mapped on the potential vs. normal load graph. The wear-corrosion cluster, as identified in Figure 4a, covers most of the area in the tribocorrosion map. Wear is the dominant degradation mechanism in the potential range from −0.5 to −0.3 V. Corrosion-induced wear dominates in the region near 0 V potential.
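The map-generation step can be sketched as follows, assuming scikit-learn's `SVC` with the polynomial-kernel settings quoted for dataset 1 (γ = 2, C = 10, d = 2, r = 0). The data are synthetic, the cluster labels are obtained by clustering the two features directly, and the grid resolution is arbitrary. All of these are simplifying assumptions for illustration, not a reproduction of Figure 3b.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a two-feature dataset:
# column 0 = normal load (N), column 1 = abrasive concentration (g/cm^3).
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.05], [4.0, 0.05], [2.5, 0.25]])
spread = np.array([0.3, 0.01])
X = np.vstack([c + spread * rng.standard_normal((15, 2)) for c in centers])

# Step 1: label the data with K-means (K = 3) on standardized values.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X)
)

# Step 2: train a polynomial-kernel SVM on the cluster labels.
# SVC handles the three classes with a one-vs-one scheme.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", gamma=2, C=10, degree=2, coef0=0,
        decision_function_shape="ovo"),
)
model.fit(X, labels)

# Step 3: predict the cluster label at every point of a grid that spans
# the feature space; the resulting array is the map.
load = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
conc = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
xx, yy = np.meshgrid(load, conc)
grid = np.column_stack([xx.ravel(), yy.ravel()])
regime_map = model.predict(grid).reshape(xx.shape)
print(np.unique(regime_map))
```

The array `regime_map` can then be drawn as a filled contour plot, for example with matplotlib's `contourf(xx, yy, regime_map)`. Switching to `kernel="rbf"` with `gamma=7` would mirror the settings quoted for dataset 2.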
Thus, the relationship between competing material degradation mechanisms is determined for the two tribosystems under consideration. From Figures 3b and 4b, the dominant degradation mechanism for a given set of features can be identified. With this knowledge, an acceptable material loss rate can potentially be achieved by maintaining the interfacial conditions within certain bounds identified from the maps. Based on the availability of empirical data, maps can be generated in any desired feature space. Note that Figure 3b is mapped on load vs. abrasive concentration, while Figure 4b is mapped on load vs. potential.
Although the data used in this study are for specific materials, the method proposed is relevant to mapping tribocorrosion in a range of tribosystems. The maps provide an overall framework in which we can analyze the relationship between the variables, their correlation to material degradation rate, and the dominant wastage mechanisms. Potential applications include prediction of degradation mechanism and wastage rate, material selection, and lifecycle estimation of tools, material processing equipment, automotive and marine machine components.
In this study, the performance of the models was not evaluated using test and validation datasets due to limited sample sizes. It is important to note that test and validation datasets are necessary to prevent over-fitting the models. In future work, more robust ML predictive models will be created using a larger training dataset.

Conclusions
A machine learning-based approach is proposed to generate tribocorrosion maps. First, tribocorrosion experimental data are clustered and labelled based on their underlying structure. The labelled datasets are used to train SVM classification models, with the labels being the targets. Since the problem is non-linear, RBF and polynomial kernels are used to map the data and establish the decision boundaries. The trained models are used to predict the targets on the feature space to generate the tribocorrosion maps. The proposed methodology to create tribocorrosion maps is relevant to any tribocorrosion system. Thus, it can find applications such as optimizing material forming processes, material selection, identifying degradation mechanisms, lifecycle estimation, and improving tool life.