Recent Progress of Machine Learning Algorithms for the Oil and Lubricant Industry

Abstract: Machine learning (ML) algorithms have brought about a revolution in many industries where otherwise operation time, cost, and safety would have been compromised. Likewise, in lubrication research, ML has been utilized on many occasions. This review provides an in-depth understanding of seven ML algorithms from a tribological perspective. More specifically, it presents a comprehensive overview of recent advancements in ML applied to lubrication research, organized into four distinct categories. The first category, experimental parameter prediction, highlights the significant contributions of artificial neural networks (ANNs) in accurately forecasting operating conditions related to friction and wear. These predictions offer valuable insights that aid in forensic preparation. Discriminant analysis, Bayesian modeling, and transfer learning approaches have also been used to predict experimental parameters. Second, to predict the lubrication film thickness and identify the lubrication regime, algorithms such as logistic regression and ANN were useful. Such predictions provide up to 99.25% accuracy. Third, to predict the friction and wear for a given experimental condition, support vector machine (SVM), polynomial regression, and ANN offered an accuracy above 93%. Finally, for condition monitoring of bearings, gearboxes, gear trains, and similar critical situations where regular in-person inspection is difficult, Naïve Bayes, SVM, decision trees, and ANN were utilized to predict the safe life of lubricants. This review highlights these four aspects with state-of-the-art examples and discusses the current situation and projected future possibilities of lubricant design facilitated by ML techniques.


Introduction
The modern world is dependent on versatile machinery that is susceptible to continuous friction and wear. As a result, numerous lubricants and additives have been developed to reduce the cost associated with friction and wear [1]. As per the literature, effective use of lubricants could save an amount equivalent to 1% to 1.55% of the yearly gross domestic product (GDP) of the United States [2][3][4]. Therefore, effective lubrication is extremely important to strengthen the national economy. Lubricants can be categorically divided into three types: liquid, solid, and gaseous [5,6]. Each type has its own application, but overall the purpose is the same: optimizing friction and wear.
Among all lubricants, liquid lubricants have unique functional features that have helped them expand their applicability over many years [7,8]. Solubility, viscosity, heat capacity, wettability, dampening ability, resistance to corrosion, wear, and friction are typical characteristics of liquid lubricants [9][10][11]. Engineers have added solid particles to liquid lubricants and utilized their synergy to reduce friction and wear [12]. Tribologists have also developed superior lubricants that can provide high thermal stability, exhibit low volatility, and have a broad liquid range [13,14]. Therefore, the list of lubricants and their properties has become innumerable nowadays. As a result, it has become difficult even for an expert to choose the best lubricant or simply predict the optimum condition for a chosen lubricant.

In the case of supervised learning, the model has access to both the training data and their labels, and the model can be trained on the labeled data to perform prediction tasks on any unseen data. The supervised learning task can be formulated as a classification or a regression problem [24]. Classification provides discrete-valued output. It is helpful to predict a characterization for a particular case; for a naive example, machine learning in tribology can predict whether wear is coming from a specific type of coated surface or an uncoated surface, which is a discrete-valued output [25].
Regression, on the other hand, provides a continuous numerical value. It is helpful to estimate any relationship between variables. Common examples include linear regression and logistic regression. One example of regression could be the investigation of the relationship between the coefficient of friction (COF) and surface roughness in sliding contact [26,27]. For instance, let us consider a study where different sample surfaces with varying roughness values are subjected to sliding against a counter material. The COF is measured under controlled conditions, and the corresponding surface roughness parameters (Ra, Rq, Rsk, and Rku) are quantified for each sample. A regression analysis could then be performed to determine the correlation between the COF and surface roughness, allowing for the formulation of a predictive model. Such a model could help estimate the friction behavior of similar materials with known surface roughness, aiding in the design and optimization of tribological systems.
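As a concrete sketch of such a regression, the snippet below fits a straight line relating a single roughness parameter (Ra) to the COF. The data points are invented for illustration only, not taken from the cited studies.

```python
# Hypothetical sketch: fitting COF as a linear function of surface roughness Ra.
# The data points below are synthetic illustrative values, not measurements.

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
         sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Ra in micrometres vs. measured COF (synthetic numbers)
ra  = [0.1, 0.2, 0.4, 0.8, 1.6]
cof = [0.08, 0.10, 0.14, 0.22, 0.38]

b0, b1 = fit_line(ra, cof)
predicted_cof = b0 + b1 * 1.0   # estimate the COF for Ra = 1.0 um
```

Once fitted, the intercept and slope give a simple predictive model of friction for surfaces whose roughness falls within the tested range.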
For unsupervised learning, no labeled data is provided to the model; rather, an unlabeled data set is fed, and the capabilities of the model are utilized to find the hidden structures in the dataset. Under the unsupervised learning category, cluster analysis is helpful to group a similar set of objects in one group while grouping another set of similar objects in another group. The wear particle data obtained from the lubricated system could be analyzed using this technique. Clustering can form distinct clusters of data that share similarities [28]. Such clustering can provide insights into the wear mechanisms, lubricant effectiveness, or the presence of any irregular wear conditions.
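A minimal illustration of such clustering, assuming synthetic wear-particle sizes rather than real oil-analysis data, is a one-dimensional k-means with two groups:

```python
# Illustrative sketch (not from the review): clustering synthetic wear-particle
# sizes (in micrometres) into two groups with a minimal 1-D k-means.

def kmeans_1d(values, centers, iters=20):
    for _ in range(iters):
        groups = [[], []]
        # Assignment step: each value joins its nearest center
        for v in values:
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Update step: each center moves to the mean of its group
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

# Small "normal wear" particles vs. larger "severe wear" particles
sizes = [1.1, 1.3, 0.9, 1.2, 8.5, 9.1, 7.8, 8.9]
centers, groups = kmeans_1d(sizes, centers=[0.0, 10.0])
```

The two resulting cluster centers could then be interpreted, for example, as typical particle sizes for mild and severe wear regimes.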

Machine Learning Algorithms
There are multiple algorithms available for characterizing a dataset, and each takes a unique approach to learning. Choosing an algorithm to favor one particular benefit could trade off other benefits, including accuracy, complexity, and speed [24]. Therefore, the success of an algorithm depends on the specific nature of the problem, its complexity, cost, and often trial and error.
Currently, there are different platforms to utilize ML algorithms, such as MATLAB, Python, C, and C++, to name a few. In Figure 1, different algorithms are presented that are compatible with the MATLAB platform [15].

Explored Machine Learning Algorithms
Machine learning is implemented through computer programs, written on platforms such as MATLAB and Python, that must be structured according to specific algorithms. Scientists have explored many fundamental algorithms for lubrication, and a few of them have repeatedly appeared in tribological studies: linear regression, logistic regression, support vector machines, discriminant analysis, Naïve Bayes, decision trees, and artificial neural networks. The mathematical modeling of these seven algorithms is summarized here with relevant tribological examples.

Linear Regression
Regression analysis is a widely used technique to analyze multifactor data [29]. The basic idea behind linear regression is to obtain a line that best describes the relationship between the input and output variables. For example, if x represents an experimental time and y represents a wear volume, then as shown in Figure 2, the straight-line equation relating these two variables could be as follows:

h(x) = β0 + β1x

where β0 is the intercept and β1 is the slope. Here, h(x) is called the hypothesis function that is used to approximate the target variable (i.e., the wear volume here). By defining an objective function or cost function, computer scientists try to minimize the error between the obtained line and the actual data points. If y is the actual output variable and there are m total training samples, we can write

J(β0, β1) = (1/2m) Σ_{i=1}^{m} (h(x^(i)) − y^(i))²

To estimate the parameter values (β0 and β1) so that the line best fits the data, the cost function needs to be minimized. Researchers use a gradient descent algorithm to minimize the cost function and find the best set of parameters. The multiple linear regression model is used in the case of more than one input variable [30]. There are different variations of linear regression, such as Lasso and Ridge regression, to ensure the generalizability of the model [31].
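The gradient descent procedure described above can be sketched as follows; the wear-volume data is synthetic and chosen so that the true line is y = 2 + 3x.

```python
# Sketch of gradient descent on the cost J = (1/2m) * sum((h(x) - y)^2),
# with hypothesis h(x) = b0 + b1*x. Data and learning rate are illustrative.

def gradient_descent(xs, ys, lr=0.05, steps=5000):
    b0, b1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / m                       # dJ/db0
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/db1
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1

# Synthetic wear-volume data generated from y = 2 + 3x (no noise)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 8.0, 11.0, 14.0]
b0, b1 = gradient_descent(xs, ys)   # converges near b0 = 2, b1 = 3
```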


Logistic Regression
Logistic Regression is a classification algorithm that estimates the probability that a certain data point belongs to a particular class. An investigation to classify the lubrication regime of hydrodynamic journal bearings was carried out using logistic regression [32]. Logistic regression follows a similar learning principle to linear regression. However, unlike linear regression, the output variable is binary, and thus, the input-to-output mapping goes through a non-linear transformation called the sigmoid function to make a binary prediction. The sigmoid function can transform any real-numbered value into a range between 0 and 1, which makes it a powerful choice of transformation in classification algorithms. For example, in Figure 3, if x → ∞, then sigmoid(x) → 1, and if x → −∞, then sigmoid(x) → 0. The sigmoid function is defined as

sigmoid(z) = 1 / (1 + e^(−z))

Therefore, we obtain

h(x) = 1 / (1 + e^(−(β0 + β1x)))   (5)

Here, the cost function is derived using the maximum log-likelihood method.
In a similar way to linear regression, the gradient descent method is used to minimize the cost function and achieve the right set of parameters [33,34].
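A minimal sketch of this training loop, with synthetic one-dimensional data standing in for a binary lubrication-regime label (the labels and feature values are invented for illustration):

```python
import math

# Sketch of logistic regression trained by gradient descent, using the
# sigmoid hypothesis h(x) = 1 / (1 + exp(-(b0 + b1*x))). The data is
# synthetic: label 1 and 0 stand for two hypothetical lubrication regimes.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, steps=2000):
    b0, b1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        # The gradient of the negative log-likelihood has the same form as
        # in linear regression, with h(x) replaced by sigmoid(b0 + b1*x).
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        b0 -= lr * sum(errors) / m
        b1 -= lr * sum(e * x for e, x in zip(errors, xs)) / m
    return b0, b1

xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = train_logistic(xs, ys)
prob = sigmoid(b0 + b1 * 2.25)   # near the class midpoint
```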


Support Vector Machines (SVM)
Support vector machines (SVM) are one type of ML algorithm that analyzes data for supervised learning. SVM provides a sorted data map with two groups, with some margins to the farthest possibility. This algorithm has been widely utilized in text recognition, image classification, and many other scientific applications [35,36]. This algorithm was observed as useful for the oil and gas exploration phase of the hydrocarbon industry [37]. It was reported that for hydrocarbon reservoir prediction, the SVM classification method was more suitable than expert judgment or filter approaches [37]. Especially for relatively complex problems, SVM was proven to be more suitable. A similar technique could be useful in tribology for predicting any cracks or irregularities on a material surface. Such a prediction would help to estimate the necessity of additional lubricants.
SVM was first introduced in 1992, and over the last thirty years, it has become one of the most powerful ML algorithms [38]. The basic intuition behind the SVM is to build a separating hyperplane in an optimized way to create the maximum margin between two classes.
Let's assume that in Figure 4, our separating hyperplane forms the following equation:

w·x + b = 0

However, in most cases, we deal with more than one input variable. So, we can think of x as an n-dimensional vector, where x = [x1, x2, . . . , xn]. Here w and b can be defined as model parameters, where w = [w1, w2, . . . , wn], and b is the intercept term. Therefore, the equation can be rewritten as:

wᵀx + b = 0

Unlike logistic regression, our target variable y ∈ {−1, 1}. So, we define g(z) = 1 if z ≥ 0, and g(z) = −1 otherwise. In SVM, the concept of maximum margin comes into play. The margin is defined as the closest distance from a data point to the hyperplane. It can be derived mathematically as

γ = y(wᵀx + b)/||w||

If there are m training examples, the constrained optimization problem of SVM can be written as:

minimize (1/2)||w||²  subject to  y^(i)(wᵀx^(i) + b) ≥ 1, i = 1, . . . , m

There can be non-separable cases where positive and negative examples overlap. Therefore, the soft margin is used by introducing the C and γ parameters. The parameter C determines the trade-off between the misclassification of training examples and the simplicity of the decision boundary, while the γ parameter determines the influence of individual training examples on the decision boundary. Different kernels can be used to learn the SVM in high-dimensional feature space, namely polynomial, Gaussian, and radial basis function kernels [39].
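The soft-margin idea can be illustrated with a small subgradient-descent sketch on the hinge-loss form of the problem. This is a generic linear-SVM toy with synthetic data, not code from the cited studies.

```python
# Minimal sketch of a linear soft-margin SVM trained by batch subgradient
# descent on  (1/2)||w||^2 + C * sum(hinge losses).  Labels are in {-1, +1},
# as in the text. The 2-D data points are synthetic.

def train_linear_svm(points, labels, C=1.0, lr=0.01, steps=5000):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(steps):
        gw = [w[0], w[1]]        # gradient of the (1/2)||w||^2 term
        gb = 0.0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) < 1:   # margin violation
                gw[0] -= C * y * x1
                gw[1] -= C * y * x2
                gb -= C * y
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b

def predict(w, b, x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1

points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
```

Kernelized SVMs replace the inner products here with a kernel function to obtain non-linear decision boundaries.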


Discriminant Analysis
Discriminant analysis is a statistical technique that can differentiate between two or more groups of objects while simultaneously taking several variables into account [40]. This technique was first developed by Fisher in 1936. Discriminant analysis has been used in anthropology, biology, criminology, and political science. Recently, some expeditions on tribology have been carried out with this analysis [41].
The main intuition behind a (linear) discriminant analysis is to find a hyperplane where the projections of the classes will have maximum separability, as shown in Figure 5. Let us assume that the projection of data point X onto the line is y, where X = [x1, x2, . . . , xm] and the projection vector is W = [w1, w2, . . . , wm]. We can write

y = WᵀX

Now, if the means of the features (X) in classes 1 and 2 are µ1 and µ2, the means of the projections (y) in classes 1 and 2 are

µ̃1 = Wᵀµ1 and µ̃2 = Wᵀµ2

From that, we define scatter as the sum of square differences between the projected samples and their class mean. Hence, the scatter for class c can be written as:

s̃c² = Σ_{y ∈ class c} (y − µ̃c)²

According to Fisher's linear discriminant, the objective function can be defined as:

J(W) = (µ̃1 − µ̃2)² / (s̃1² + s̃2²)

Maximizing J ensures that we find a projection line where projections of the same class are in the closest proximity and their means are at the farthest distance.
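For two classes, maximizing J has a standard closed-form solution, W ∝ Sw⁻¹(µ1 − µ2), where Sw is the pooled within-class scatter matrix; the sketch below applies it to synthetic 2-D data (the points are invented for illustration).

```python
# Sketch of Fisher's linear discriminant for two 2-D classes, using the
# closed-form direction W = Sw^-1 (mu1 - mu2). The data is synthetic.

def mean(vectors):
    n = len(vectors)
    return [sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n]

def scatter(vectors, mu):
    """Within-class scatter matrix of one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in vectors:
        dx, dy = x - mu[0], y - mu[1]
        s[0][0] += dx * dx; s[0][1] += dx * dy
        s[1][0] += dy * dx; s[1][1] += dy * dy
    return s

def fisher_direction(c1, c2):
    mu1, mu2 = mean(c1), mean(c2)
    s1, s2 = scatter(c1, mu1), scatter(c2, mu2)
    sw = [[s1[0][0] + s2[0][0], s1[0][1] + s2[0][1]],
          [s1[1][0] + s2[1][0], s1[1][1] + s2[1][1]]]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[ sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det,  sw[0][0] / det]]
    d = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    return [inv[0][0] * d[0] + inv[0][1] * d[1],
            inv[1][0] * d[0] + inv[1][1] * d[1]]

class1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
class2 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0)]
w = fisher_direction(class1, class2)
# Projections y = W^T X of the two classes separate along w
```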

Naïve Bayes
Naive Bayes is a simple classification technique that utilizes Bayes' rule along with a strong assumption that the attributes are conditionally independent given the class [42]. Naive Bayes has been widely used due to its many desirable features. Sreenath et al. [43] implemented the Naive Bayes algorithm to monitor the failure mode of a gearbox.
According to Bayes' theorem, we can write:

P(H|D) = P(D|H) · P(H) / P(D)

where
• P(H) is the prior, the probability that a hypothesis is true, irrespective of the data;
• P(D) is the overall probability of the data being observed, irrespective of the hypothesis;
• P(D|H) is the likelihood, the probability that the data will be observed if the hypothesis is true;
• P(H|D) is the posterior, the probability that the hypothesis is true, given the data being observed.
In Naive Bayes, we first calculate the prior probabilities for given classes. Then the conditional probability of the attributes given the class labels is computed. All these values are used to compute the posterior. We pick the class (the hypothesis) with the maximum posterior, which is also known as the Maximum Posteriori (MAP) decision rule.
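A toy sketch of this MAP rule with categorical features follows; the oil-condition features and labels are invented, and Laplace smoothing is omitted for brevity.

```python
# Toy sketch of the Naive Bayes MAP rule. The features (viscosity level,
# particle contamination) and labels ("healthy"/"degraded" oil) are invented
# for illustration. No Laplace smoothing, so unseen values get probability 0.

from collections import Counter, defaultdict

def train_nb(samples, labels):
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)   # (class, feature index) -> value counts
    for feats, label in zip(samples, labels):
        for i, v in enumerate(feats):
            feat_counts[(label, i)][v] += 1
    return class_counts, feat_counts

def predict_nb(class_counts, feat_counts, feats):
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for label, c in class_counts.items():
        p = c / total                     # prior P(H)
        for i, v in enumerate(feats):     # likelihood, independence assumed
            p *= feat_counts[(label, i)][v] / c
        if p > best_p:                    # MAP: keep the largest posterior
            best, best_p = label, p
    return best

samples = [("low", "yes"), ("low", "yes"), ("high", "no"),
           ("high", "no"), ("low", "no"), ("high", "yes")]
labels  = ["degraded", "degraded", "healthy", "healthy", "healthy", "degraded"]

model = train_nb(samples, labels)
guess = predict_nb(*model, ("low", "yes"))
```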

Decision Trees
A Decision Tree is a non-parametric learning algorithm that works by splitting the dataset based on different conditions. This algorithm helps to model the consequences of possible decisions in the context of any outcome or event [44]. Decision tree classification was utilized in lubricant condition monitoring to generate a model for predicting wear conditions of equipment using used oil analysis (UOA), wear particle data, and failure data [45]. Furthermore, it has been used for classifying oil samples and for fault detection [43]. However, despite its flexibility, this model can be sensitive to missing values and outliers.
The Decision Tree works in a tree-like structure, where the topmost node is known as the root ( Figure 6). The root is an attribute selected by measuring the information gain, and the feature with the maximum information gain is selected as the root node. The internal nodes are the rest of the features. The branch represents the decision rule, and the leaf nodes (nodes without any children) denote the decision.
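Root selection by information gain can be sketched as follows, using an invented two-feature failure dataset (the features and labels are illustrative only):

```python
import math
from collections import Counter

# Sketch of root-node selection by information gain, as described above.
# The dataset is a toy example: two categorical features and a binary label.

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature_idx):
    """Entropy of the parent minus the weighted entropy of the children."""
    gain = entropy(labels)
    n = len(rows)
    for value in set(r[feature_idx] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[feature_idx] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Feature 0: load ("high"/"low"); feature 1: oil age ("old"/"new"); label: failure
rows   = [("high", "old"), ("high", "new"), ("low", "old"), ("low", "new")]
labels = ["yes", "no", "yes", "no"]

gains = [info_gain(rows, labels, i) for i in range(2)]
root = gains.index(max(gains))   # feature with the maximum information gain
```

In this toy dataset, failure depends entirely on oil age, so the oil-age feature yields the full one bit of information gain and becomes the root.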

Artificial Neural Network (ANN)
An artificial neural network (ANN) is capable of creating models using interconnected mathematical nodes [46,47]. As such, an ANN can imitate some abilities of the human brain and offers the advantages of parallel processing, noise immunity, strong fault tolerance, and good memorization [48][49][50]. An ANN can learn from samples and explore them to realize the relationships between the inputs and results. Therefore, ANN is very suitable for treating nonlinear and complex tribological problems that cannot be solved using traditional physical theories or regular mathematical approaches. The basic structure of an ANN is presented in Figure 7, where an artificial neural network scheme is shown for fuel consumption prediction based on various engine parameters such as torque, speed, intake air pressure, and EGR%. In a similar fashion, lubrication research has been facilitated by ANN to predict lubricant performance or the need for maintenance by analyzing input parameters of the tribological system [51].
A neural network provides a layered representation of the input features in the latent space, with hidden layers and their corresponding activation functions. Activation functions are used to add non-linearity to the neural network.
Let us assume a dataset with n examples has three input features x1, x2, x3, with corresponding outputs. The mathematical notation a_i^(j) denotes the neuron in the ith unit of the jth layer. In Figure 8, the first layer is the input layer, which is defined as a vector a^(1). All the internal layers other than the final layer are called hidden layers. The first (and only) hidden layer a^(2) is mapped from the input layer by some weights. The final layer is obtained in the same way and then transformed into a regression or classification setting. The network uses a backpropagation algorithm to learn from the error, which is an algorithm to find the gradients required to estimate the weights of the ANN. Backpropagation was first introduced in 1970 but brought to light by Rumelhart et al. in 1986 [53,54]. To introduce non-linearity, the nodes of the layers go through further transformation by an activation function.
Depending on the specific problem at hand, the choice of activation function includes sigmoid, tanh, ReLU, and LeakyReLU [43]. As such, for layer 2, the following set of equations can be derived:

a_1^(2) = g(θ_10^(1) x_0 + θ_11^(1) x_1 + θ_12^(1) x_2 + θ_13^(1) x_3)
a_2^(2) = g(θ_20^(1) x_0 + θ_21^(1) x_1 + θ_22^(1) x_2 + θ_23^(1) x_3)

Here, θ with different subscripts and superscripts denotes the weights. For example, θ_ij^(k) denotes the connection weight from the jth unit of the kth layer to the ith unit of the (k + 1)th layer. x_0 and a_0 denote the bias terms, and activation is applied through g(·). This process is also called forward propagation (FP). If the dataset has m examples and there are K classes, the cost function can be written as:

J(θ) = −(1/m) Σ_{i=1}^{m} Σ_{k=1}^{K} [ y_k^(i) log(h_θ(x^(i))_k) + (1 − y_k^(i)) log(1 − h_θ(x^(i))_k) ]

Using backpropagation, the θ values are updated with every iteration. The gradient descent algorithm ensures that we find the best set of θ so as to minimize the cost function.
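Forward propagation for a network of this shape (three inputs, one hidden layer, sigmoid activation g(·)) can be sketched as below; the weight values are arbitrary illustrative numbers, not from any cited study.

```python
import math

# Sketch of forward propagation for the small network described above:
# three inputs, one hidden layer of two units, sigmoid activation, and
# bias terms x0 = a0 = 1. All weights are arbitrary illustrative values.

def g(z):                      # sigmoid activation
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, theta1, theta2):
    a1 = [1.0] + x             # input layer a(1) with bias unit prepended
    a2 = [1.0] + [g(sum(w * v for w, v in zip(row, a1))) for row in theta1]
    return g(sum(w * v for w, v in zip(theta2, a2)))   # output unit

theta1 = [[0.1, 0.4, -0.3, 0.2],     # weights into hidden unit 1
          [-0.2, 0.3, 0.1, -0.4]]    # weights into hidden unit 2
theta2 = [0.05, 0.6, -0.7]           # weights into the output unit

y_hat = forward([1.0, 2.0, 3.0], theta1, theta2)   # a value in (0, 1)
```

Training would then compare y_hat against the label via the cost function above and push gradients back through these same weights.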
There are many variations of ANN based on the nature of the task. A few popular variants include the Recurrent Neural Network (RNN), the Convolutional Neural Network (CNN), and the Generative Adversarial Network (GAN) [55][56][57].

Application of Machine Learning Algorithms for the Lubricant Industry
The lubricant industry is heading towards automation through condition monitoring and useful life prediction. This shift is not only happening on the experimental side to choose a lubricant; rather, it starts with the extraction of the crude oil from the borehole [58]. Geological data and drill bit condition monitoring have been a primary focus for safe oil extraction. Once the crude oil is refined and formulated into lubricants, monitoring is very important to ensure safe operation of the machine with minimum breakdown. For that, the condition of the lubricant is critical. In addition, at the design phase, the prediction of COF and wear is important to avoid unwanted machine breakdown due to lubrication failure. If the algorithm is trained well, it will predict with higher accuracy, which would be beneficial for designing lubricants with superior qualities. In addition, there have been studies where experimental parameter prediction was a key concern to identify failure conditions. Also, ML algorithms have been utilized for film thickness prediction, which significantly dictates lubricant performance. Overall, the recent progress of ML algorithms for lubrication could be segmented into the following four categories:
• Experimental parameter prediction;
• Lubricant film thickness prediction;
• COF and wear prediction;
• Lubricant condition monitoring.
Each of these categories is discussed in the following sub-sections with a state-of-the-art literature review in tabulated format.

Predicting Experimental Parameters
Experimental parameter prediction is an important aspect where ML could be utilized. In a few earlier expeditions, wear profiles were inspected and trained against their operating conditions in order to enable the algorithm to predict the experimental conditions for any similar wear profile. This prediction is important because it can reveal whether any experimental condition has gone wrong. If significant wear is observed, the prediction can identify the condition behind that wear so that the operator can intervene quickly to mitigate it. Also, performing tests under critical conditions might be more convenient in computer simulation than in real experiments.
In the recent past, Gong et al. [59] proposed an approach for simulating the tribochemistry, wear, and stress of a general fully formulated oil and found that tribofilm growth can be promoted by temperature and shear. Information from such simulations could be helpful in training ML models to predict experimental conditions. Also, molecular dynamics simulations can help predict a lubricant's performance and evaluate its lubrication mechanisms. These techniques have been proven helpful in complementing experimental observation to evaluate lubricant performance to some extent. Therefore, parameters from simulations are often used as descriptors to evaluate lubricant performance [60]. In Table 1, a few examples are illustrated where researchers have investigated the experimental conditions against some output results using popular algorithms such as discriminant analysis, Bayesian modeling, and the artificial neural network (ANN).
Table 1. Prediction of experimental parameters (columns: applied algorithm; input parameters/descriptors; involved lubricants; remarks; ref.).

Six common automobile gasoline and diesel engine oils were collected from seventy-six sources, analyzed using inductively coupled plasma-optical emission spectrometry (ICP-OES), and then statistically compared. Brand of oil, engine type, and source vehicle type were predicted using ML with 92.1%, 82.9%, and 92.1% accuracy, respectively. This oil analysis would be beneficial for forensic analysis.

Bayesian modeling and transfer learning approach: descriptors were selected based on the physical models for estimating dissipation in 2D materials, including M-X bond length, covalent radii of M/X atoms, in-plane stiffness, cohesive energy per unit cell, thermal conductivity, exciton binding energy, average atomic mass, and bandgap energy. Five experimental data sets (from the literature) and ten molecular dynamics simulation data sets of the maximum energy barrier (MEB) of the potential energy surface were obtained. MEB is correlated to intrinsic friction (friction for a crystalline material with no defects) obtained from ten different 2D materials. Less than an 8% difference was observed between the MEB values predicted via the ML model and the PES profiles obtained from the MD simulation [60].

Back propagation neural network (BPNN), Genetic algorithm (GA)

Predicting Film Thickness
ML has been implemented to predict the film thickness of lubricants. This is important for gearboxes, where oil lubrication restricts the real contact between gear teeth and reduces friction, heat buildup, vibration, and corrosion. Ali et al. [64] prescribed a lambda ratio (λ) to predict the fatigue life of gears, where λ is the ratio between the film thickness and the RMS surface roughness. The factors that affect the λ ratio are load, temperature, surface roughness, and gear speed [65]. The minimum film thickness could be derived using the following equation [64]:

h = k (η₀u)^0.7 R^0.43 ω^(−0.13)

Here, η₀ is the dynamic viscosity of the lubricant in Pa·s, u is the entrainment speed, k = 1.6α^0.6E^0.03, α is the pressure-viscosity coefficient in mm²/N, E is the equivalent elastic modulus, R is the equivalent radius in m, and ω is the load applied normal to the line of contact in N/m. It was reported that gears usually operate in the elastohydrodynamic and boundary regimes [64]. In the boundary regime, the film thickness becomes lower, and contact between the tribo-pairs is likely to occur [3,66,67]. Therefore, knowledge of the film thickness is very important for designing and monitoring oil performance. In their studies, the network simulation conducted in MATLAB Simulink attained 100% success in prediction and classification at high speed [64]. Table 2 presents individual cases where researchers have investigated the film thickness and highlighted the lubrication regime per the Stribeck curve.
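As a rough illustration (not the authors' code), the λ-ratio workflow can be scripted directly, assuming a Dowson-Higginson-type film thickness h = k(η₀u)^0.7 R^0.43 ω^(−0.13) consistent with the k = 1.6α^0.6E^0.03 given in the text, where u is an assumed entrainment-speed parameter. The regime thresholds follow the common rule of thumb (λ < 1 boundary, 1 ≤ λ < 3 mixed, λ ≥ 3 full film), and all numerical inputs are illustrative, in SI units throughout.

```python
import math

def min_film_thickness(eta0, u, R, w, alpha, E):
    """Minimum film thickness h = k * (eta0*u)^0.7 * R^0.43 * w^-0.13,
    with k = 1.6 * alpha^0.6 * E^0.03 (all quantities in SI units)."""
    k = 1.6 * alpha**0.6 * E**0.03
    return k * (eta0 * u)**0.7 * R**0.43 * w**(-0.13)

def lambda_ratio(h, rq1, rq2):
    """Lambda = film thickness over composite RMS roughness of the pair."""
    return h / math.sqrt(rq1**2 + rq2**2)

def regime(lam):
    """Rule-of-thumb lubrication regime from the lambda ratio."""
    if lam < 1.0:
        return "boundary"
    if lam < 3.0:
        return "mixed"
    return "full film"

# Illustrative gear-contact values (assumed, not taken from [64]):
h = min_film_thickness(eta0=0.05,      # dynamic viscosity, Pa*s
                       u=2.0,          # entrainment speed, m/s
                       R=0.02,         # equivalent radius, m
                       w=2.0e5,        # load per unit length, N/m
                       alpha=2.0e-8,   # pressure-viscosity coeff., m^2/N
                       E=2.2e11)       # equivalent elastic modulus, Pa
lam = lambda_ratio(h, rq1=1.0e-7, rq2=1.0e-7)  # 0.1 um RMS roughness each
```

With these inputs the film thickness lands in the sub-micrometer range typical of EHL contacts, and λ then determines which branch of the Stribeck curve the contact sits on.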

Predicting COF and Wear
The prediction of COF and wear rate is important from a research and development point of view. A lubricant may perform well under one operating condition but may not be effective under others. Predicting the best-fit situation is important for formulating novel lubricants with superior performance. ML has started to ease this task by helping to predict COF and wear values for extreme conditions, provided that the algorithm is trained with data obtained from previous tests. In Table 3, several examples are presented where ML algorithms have been successfully utilized to predict the coefficient of friction and wear.

Descriptor: Wiener index [72]. Thirty-six compounds, such as benzoxazole, benzimidazole, benzothiazole, dihydrothiazole, piperazine, and piperidine, were tested as lubricant additives, and their tribological data were examined. The authors concluded that the elements S and P in the lubricant significantly improved tribological properties. Also, a minimal carbon chain length is needed in the case of additives; however, extending the carbon chain length might not always be beneficial; rather, there could be a threshold number of carbons in the chain that would facilitate the lowest wear. [73]

Linear regression (descriptors: low orbital energy, dipole moment): Thirty-six lubricant data sets were used for wear estimation; among them, 29 groups were used for training, whereas 7 groups were used for prediction. For COF and wear estimation, a standard four-ball tester was used at 1450 rpm over 30 min. The linear regression algorithm predicted the wear and COF of the seven test lubricants, and the database was established in MySQL. [74]

Artificial neural network (one input neuron, ten hidden-layer neurons, and three output neurons with a non-linear activation function; lubricant: Mobil 0W-40): The load-carrying capacity of a journal bearing with a steel shaft and varying surface textures was investigated. The developed neural network was able to learn quickly and predict the load-carrying capacity. [75]

Experiments were carried out on a mini-traction machine (MTM) using the ball-on-disk method. The friction coefficient was predicted using a feed-forward artificial neural network simulated in MATLAB. [76]

Artificial neural network (seven inputs from the lubricant properties plus inputs from the machining conditions): A tool-chip tribometer was used to simulate aspects of a metal-cutting operation with a pool of oil samples. Predictions were made of the surface roughness and COF of bio-lubricants. [77]

Artificial neural network (loading-condition inputs such as vibration, temperature, and torque): The wear condition and film thickness of a gearbox were predicted using an ANN. MATLAB was used to create the simulation model, and the Hamrock-Dowson equation was utilized in the model's development. [68]

A new equation, X = (1 + λ^k)^(−a), is proposed that enables the simple calculation of the proportion of mixed/boundary friction in a contact as a function of the λ ratio. When X is plotted against λ, it traces a logistic (sigmoid) curve. For varied λ, this curve could be useful for predicting friction under the boundary and mixed lubrication regimes. [81]
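Assuming the sigmoid form X = (1 + λ^k)^(−a) for the proportion of mixed/boundary friction, its behaviour is easy to verify numerically; the exponents k and a below are arbitrary illustration values, not fitted constants from [81].

```python
def boundary_fraction(lam, k=2.0, a=1.0):
    """Assumed form X = (1 + lam**k)**(-a): the proportion of
    mixed/boundary friction as a function of the lambda ratio."""
    return (1.0 + lam**k) ** (-a)

# X decays from ~1 (boundary-dominated) toward 0 (full film) as lambda
# grows, tracing a sigmoid when plotted against log(lambda).
curve = [boundary_fraction(lam) for lam in (0.1, 0.5, 1.0, 3.0, 10.0)]
```

Multiplying X by a boundary friction coefficient and (1 − X) by a full-film one is the usual load-sharing way to turn such a curve into a friction prediction across regimes.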

Lubricant Condition Monitoring
Lubricants play an important role in reducing friction, wear, and the associated costs in machinery. Therefore, it is crucial to monitor a lubricant's condition regularly. In small-scale applications, periodic checks by maintenance technicians can ensure the safe usage of a lubricant. However, at sites such as wind turbines or in space, regular checking might be difficult. Also, the opportunity cost of inspection could be high if the lubricant condition is monitored periodically by shutting down running machines. To reduce these costs, lubrication condition monitoring (LCM) plays a vital role. Manghai et al. [82] reviewed brake fault monitoring and highlighted the significance of ML in that process. For lubricant fault monitoring, a few classes of parameters hold significance; in Table 4, such parameters have been tabulated for reference.

Table 4. Common parameters used in lubrication monitoring (adapted from [21]).

Class: Physical. Parameters: viscosity at 40 °C and 100 °C, thermal stability, temperature, density, vibration [91][92][93][94].

Continuous lubrication condition monitoring is especially important for large-scale machine tribo-pairs, wind turbine gearsets, and large plants where regular maintenance might be difficult or may interrupt operating hours. Such interruptions could lead to a significant reduction in production; therefore, automatic condition monitoring could be a great tool to implement. In recent years, companies like SKF have offered remote diagnostic services that help monitor machine health regularly [95]. The Schaeffler Group also offers condition monitoring services for customers based on vibration and temperature [96]. A vibration and temperature monitoring system involves a set of tools that measure one or more parameters in order to identify changes in the behavior of machinery.
The key advantage of such monitoring is that it helps schedule maintenance activities based on predictive analysis. In addition to temperature and vibration, other parameters such as load, material hardness, and lubricant properties also play an important role and can help predict the service life of lubricants. In some recent research articles in this area, ML algorithms such as ANN, decision trees, Naïve Bayes, and SVM have been utilized. In Table 5, lubricant condition monitoring and surface condition prediction techniques are summarized based on the available literature.

Dry condition (no lubricant): Three cooling rates (0.5, 2, and 3 °C/s) were applied to six specimens made from pre-alloyed Astaloy 85 Mo and Distaloy AB powders through the powder metallurgy (PM) method. [98]

Decision tree, Naïve Bayes (statistical features as inputs; no lubricant was considered): The vibration signal was monitored from an actual automobile gearbox with simulated fault conditions within the gears and bearings. Fault detection was performed using the two algorithms, and classification accuracy was above 80% for all cases. [43]

Random forest (RF) (inputs: gearbox input load moment and force measurement, torque arm displacement and angular misalignment, wind direction and speed, generator mount displacement, generator rotational speed, and blade pitch angle; no lubricant): The authors highlight the need for machine learning in wind turbine drivetrain load monitoring. An RF model is employed to determine the sensor positions that most influence the accuracy of virtual load sensing in wind turbine transmissions. [99]

Artificial neural network (lubricant properties as inputs; data were observed and fed in after certain life intervals): The remaining utilizable life of the lubricants (RULL) was predicted using an ANN. [100]

Support vector machine: Bearing failure prediction plays an important role in safety. The SVM classifier provided at least 92% mean accuracy. [104]

Back-propagation neural network, modular neural network, radial basis neural network (a wide range of inputs was used for each of the three algorithms; used and unused engine oils): Two types of vehicle engines were used for the analysis of oil quality and performance. The input layer consisted of 1 neuron, the hidden layer of 10 neurons, and the output layer of 4 neurons. This research provided helpful information for industrial vehicle oil analysis and fault detection. [105]

Artificial neural network (leakage oil quantity and some other parameters as inputs): The effect of slippers (which enhance the efficiency of axial piston pumps and motors) on lubrication was studied under different surface roughnesses and conditions. [106]
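The gearbox-fault entry above (statistical vibration features classified with Naïve Bayes) can be mimicked end-to-end on synthetic data. The impulsive-spike fault model, the chosen features (RMS, peak, kurtosis), and the numpy-only Gaussian Naïve Bayes below are all illustrative assumptions, not the setup of [43].

```python
import numpy as np

rng = np.random.default_rng(0)

def make_signal(faulty, n=2048):
    """Synthetic vibration: Gaussian noise, plus impulsive spikes
    (a crude stand-in for a damaged gear tooth) when faulty."""
    sig = rng.normal(0.0, 1.0, n)
    if faulty:
        idx = rng.integers(0, n, 20)
        sig[idx] += rng.normal(8.0, 2.0, 20)
    return sig

def features(sig):
    """Statistical features commonly used in condition monitoring."""
    rms = np.sqrt(np.mean(sig**2))
    peak = np.max(np.abs(sig))
    kurt = np.mean((sig - sig.mean())**4) / np.var(sig)**2
    return np.array([rms, peak, kurt])

class GaussianNB:
    """Minimal Gaussian Naive Bayes classifier."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.logprior = np.log([np.mean(y == c) for c in self.classes])
        return self
    def predict(self, X):
        # log-likelihood of each sample under each class' feature Gaussians
        d = (X[None, :, :] - self.mu[:, None, :])**2 / self.var[:, None, :]
        ll = -0.5 * (np.log(2 * np.pi * self.var[:, None, :]) + d)
        return self.classes[np.argmax(ll.sum(axis=2).T + self.logprior, axis=1)]

labels = np.array([0] * 20 + [1] * 20)          # 0 = healthy, 1 = faulty
train = np.array([features(make_signal(bool(c))) for c in labels])
clf = GaussianNB().fit(train, labels)

test_labels = np.array([0, 1] * 10)
test = np.array([features(make_signal(bool(c))) for c in test_labels])
accuracy = np.mean(clf.predict(test) == test_labels)
```

With only three well-separated features this toy classifier is near-perfect; the >80% accuracies reported in [43] reflect much harder real gearbox signals.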

Future of ML in the Lubrication Industry
Lubricant design and development over the last century were somewhat dependent on trial and error [107]. Now that ML has been introduced in the lubricant industry, future prospects will undoubtedly be influenced by this technique. Lubricant performance depends on the operating conditions, lubricant properties, and material-pair properties. Therefore, there is scope to incorporate a variety of parameters under machine learning other than those that have already been considered. In recent studies, it was observed that friction and wear can be correlated not only to sliding speed, sliding distance, and normal load but also to properties such as hardness, yield strength, tensile strength, and ductility [108]. It is very likely that ML descriptors will incorporate these variables to project COF and wear and may be able to further predict the associated real-time cost of energy due to such interaction.
Moreover, it is known that to reduce friction and wear, lubricant is introduced between tribo pairs, where the shear strength of the lubricant needs to be less than that of the mating surfaces. Otherwise, there will be abrasion due to the lubricants. Therefore, when using additives or solid lubricants, the shear modulus could be an important parameter. Similarly, the hardness of both mating surfaces is critical to determining the adhesive or plowing wear. ML can be utilized to identify correlations between different properties of lubricants, such as film thickness, viscosity, and COF, which can then be used to optimize lubrication performance for specific applications. As such, the use of ML in tribology has initiated a new branch of studies called "Triboinformatics" [109]. Triboinformatics aims to enhance understanding, prediction, and optimization of tribological processes through the utilization of innovative data-driven approaches. It involves the development of models, algorithms, and databases to capture, store, analyze, and visualize tribological data for various applications, such as lubricant design, material selection, and predictive maintenance. For instance, ML can be used to predict the lubrication properties of a specific set of lubricants under different operating conditions and can help determine the most effective lubricant to use for a particular application. Recent trends in ML for the lubrication industry include the development of efficient models that can forecast the performance of lubricants with a high degree of accuracy [110]. These models can be trained using large datasets, which allows for more accurate predictions of the performance of lubricants in real-world applications. Overall, by using ML algorithms, it is possible to gain insights into lubricant properties that could otherwise be difficult to predict and analyze [74,111,112].
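As a toy example of the correlation-mining idea (the feature set, units, and the generating relationship below are all invented), an ordinary least-squares fit on standardized features exposes the sign and relative strength of each property's association with COF:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Invented lubricant/test dataset (ranges and units are assumptions):
visc = rng.uniform(10.0, 100.0, n)      # kinematic viscosity, cSt
additive = rng.uniform(0.0, 2.0, n)     # additive concentration, wt%
load = rng.uniform(5.0, 50.0, n)        # normal load, N

# Synthetic COF: decreases with viscosity and additive, rises with load.
cof = (0.12 - 0.0004 * visc - 0.01 * additive
       + 0.0008 * load + rng.normal(0.0, 0.004, n))

feats = np.column_stack([visc, additive, load])
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)  # standardize

A = np.column_stack([np.ones(n), feats])        # intercept + features
coef, *_ = np.linalg.lstsq(A, cof, rcond=None)
# coef[1:] now rank the (linear) influence of each property on COF
```

Standardizing first makes the coefficients directly comparable across properties, which is the simplest form of the correlation analysis that triboinformatics scales up with richer models and databases.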
In addition to lubricant design, ML can play a significant role in the predictive maintenance of machines and equipment that require lubrication. The analysis of machine data is one of the fastest-growing areas in the industrial Internet of Things (IoT) and data analytics [113]. By analyzing real-time sensor data, machine-learning algorithms can predict when a machine requires lubrication or maintenance. Another exciting prospect of ML in the lubrication industry is its potential to reduce environmental impact. With the growing concern about climate change, companies are looking for ways to reduce their carbon footprint. ML can be used to optimize lubricant use, minimize waste, and reduce the environmental impact of manufacturing processes. By identifying patterns in data and analyzing performance, ML algorithms can optimize lubrication systems to minimize waste and improve efficiency. Furthermore, with the rise of Industry 4.0, the lubrication industry is rapidly moving towards a more connected, digital future [114]. Companies can leverage the vast amounts of data generated by sensors and equipment to optimize performance, reduce costs, and improve quality using ML techniques. As a result, valuable insights into the operations are possible, which not only predict failures but also make data-driven decisions to improve the bottom line.
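A minimal version of such sensor-based predictive maintenance is a rolling z-score alarm on a temperature (or vibration) channel; the drifting signal, window size, and threshold below are assumptions for illustration only:

```python
import numpy as np

def rolling_anomaly_flags(values, window=50, z_thresh=4.0):
    """Flag samples whose deviation from the trailing-window mean
    exceeds z_thresh standard deviations of that window."""
    flags = np.zeros(len(values), dtype=bool)
    for i in range(len(values)):
        hist = values[max(0, i - window):i]
        if len(hist) < 10:              # not enough history yet
            continue
        mu, sd = hist.mean(), hist.std() + 1e-9
        flags[i] = abs(values[i] - mu) / sd > z_thresh
    return flags

rng = np.random.default_rng(2)
temp = 60.0 + rng.normal(0.0, 0.2, 300)  # steady bearing temperature, deg C
temp[250:] += 5.0                        # sudden jump: possible lubrication fault
flags = rolling_anomaly_flags(temp)
```

In production, such flags would feed a maintenance scheduler rather than alarming on single samples, and richer ML models would replace the fixed threshold.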
In summary, ML is transforming the lubrication industry by providing a more accurate and efficient way of designing lubricants, optimizing lubrication systems, predicting maintenance needs, reducing waste, and improving environmental sustainability [115]. As the technology continues to evolve, ML will play a critical role in the future of lubrication, helping companies to remain competitive and reduce their carbon footprint.

Conclusions
ML is a field with versatile applications. Under the supervised and unsupervised umbrella terms, ML holds a wide variety of algorithms, each with its own unique advantages. Therefore, understanding these algorithms is important for tribologists. Along with the discussions of ML algorithms, this review offered the following conclusions:
• ML techniques are being increasingly utilized in lubrication research and the lubrication industry to enhance lubricant design and optimization and to predict maintenance needs;
• Various ML algorithms, such as artificial neural networks, support vector machines, and decision trees, have been successfully applied to predict lubricant properties, such as viscosity, COF, and wear, under different operating conditions;
• ML can assist in identifying correlations between lubricant properties and performance, enabling the optimization of lubrication solutions for specific applications;
• From the literature, it was observed that SVM, linear regression, and Bayesian regression have been utilized several times, whereas the number of studies involving artificial neural networks is significant. The ANN algorithm has been employed for all four discussed aspects with significant accuracy;
• The future of ML in the lubrication industry holds great promise, including advancements in lubricant design, predictive maintenance, environmental sustainability, and the optimization of manufacturing processes;
• Lubricant condition monitoring is being improved through the use of ML, allowing real-time analysis of lubricant parameters and early detection of potential issues in lubrication systems;
• By leveraging ML, companies can make data-driven decisions, reduce costs, improve efficiency, and minimize the environmental impact of lubrication processes;
• For experimental condition prediction and lubricant film estimation, ANN showed significant advantages.
Moreover, for COF and wear prediction and for lubricant condition monitoring, researchers relied mostly on ANNs. This is because of the built-in flexibility of neural networks, where the input and output layers are separated by as many intermediate layers as the designer requires. Therefore, the prediction capability of the lubricant industry will depend largely on how efficiently tribologists can develop neural networks that take triboinformatics to the next level.
In conclusion, this research provided a thorough examination of seven machine learning algorithms applied in the field of tribology. The selection of tribological examples in this review was meticulously based on the specific algorithms employed, ensuring comprehensive coverage of the four discussed categories. It is important to acknowledge that the boundaries set for this review may have excluded certain articles that explore alternative algorithms and other intriguing topics within the broader realm of machine learning in lubrication. Therefore, considering the rapid advancement of this field, there is immense potential for future review articles to delve into those unexplored aspects and provide valuable insights for aspiring machine learning enthusiasts. Overall, this review will be helpful for researchers to understand the machine learning perspective better from a tribological point of view. The incorporation of ML in tribological studies will reduce machine breakdown and operating costs, help reduce carbon emissions, and unlock many possibilities. This will, in turn, help save energy and build a sustainable future.