Machine Learning for Film Thickness Prediction in Elastohydrodynamic Lubricated Elliptical Contacts

: This study extends the use of Machine Learning (ML) approaches for lubricant ﬁlm thickness predictions to the general case of elliptical elastohydrodynamic (EHD) contacts, by considering wide and narrow contacts over a wide range of ellipticity and operating conditions. Finite element (FEM) simulations are used to generate substantial training and testing datasets that are used within the proposed ML framework. The complete dataset entails 915 samples; split into an 823-sample training dataset and a 92-sample testing dataset, corresponding to 90% and 10% of the combined dataset samples, respectively. The proposed ML model consists of a pre-processing stage in which conventional EHD dimensionless groups are used to minimize the number of inputs into the model, reducing them to only three. The core of the model is based on Gaussian Process Regression (GPR), a powerful ML regression tool, well-suited for small-sized datasets, producing output central and minimum ﬁlm thicknesses, also in dimensionless form. The last stage is a post-processing one, in which the output ﬁlm thicknesses are retrieved in dimensional from. The results reveal the capabilities and potential of the proposed ML framework, producing quasi-instantaneous predictions that are far more accurate than conventional ﬁlm thickness analytical formulae. In fact, the produced central and minimum ﬁlm thickness predictions are on average within 0.3% and 1.0% of the FEM results, respectively.


Introduction
The development of new mechanical systems focuses on maximizing their efficiency and service life.Accordingly, ensuring that machine elements operate in safe conditions is imperative to build reliable systems with low maintenance requirements.In practice, machine elements that undergo high loads, namely gears and bearings, are often lubricated to reduce component wear and optimize system efficiency.For these elements, accurately quantifying the thickness of lubricant films at the contact level is critical to maximizing the system uptime and the lifetime of its components while preventing excessive wear and reduced efficiency.Moreover, suboptimal efficiency is linked to increased energy consumption, resulting in higher operating costs, and greater greenhouse gas emissions [1].
Many lubricated mechanical components (e.g., gears, bearings, etc.) generally operate in a regime in which contact surfaces are fully separated by the lubricant, referred to as the full film separation regime.In many cases, the pressure endured by these machine elements at the contact level can exceed several gigapascals.In such cases, the solids in contact can experience elastic deformation in the contact region, a regime known as "elastohydrodynamic lubrication" (EHL).
In the general elastohydrodynamic (EHD) contact, the solids are approximated (at the contact level) by ellipsoids, each of different principal radii of curvature, R x and R y , in the x-developed for isothermal point contacts and their respective range of interest is provided by Wheeler et al. [19].
When predicting film thicknesses in EHD contacts, a trade-off has traditionally existed between accuracy and speed.This limitation can be overcome by adopting specific data-driven approaches, such as Machine Learning (ML).Although developing sizeable datasets for ML requires significant time and effort, the resulting predictive models offer the potential for accurate and real-time film thickness predictions.To fully replace timeconsuming-but accurate-physics simulations, generated models should be carefully optimized to minimize prediction errors.
ML methods are becoming increasingly popular in physical and engineering sciences.In recent years, these methods have been introduced into tribology for both classification and regression problems [20].One popular classification example is bearing fault detection and identification, which has been tackled using a variety of ML models.For example, Kankar et al. [21] used Support Vector Machines (SVMs) [22] and Artificial Neural Networks (ANNs) [23], while Shen et al. [24] used physics-informed convolutional neural networks (CNNs).One regression example is the prediction of the remaining useful life of rolling bearings, which was addressed using, for example, Random Forests [25], and several deep learning approaches, including Stacked Autoencoder and Recurrent Neural Networks [26] and Generative Adversarial Networks [27], among others.For regression problems similar to film thickness prediction, ML methods can benefit from both the speed of analytical formulas and the accuracy of simulations.Additionally, ML methods can be used to solve differential equations.For instance, physics-informed neural networks (PINNs), which do not necessarily require a training dataset, have been used to solve hydrodynamic lubrication problems [28,29].
In terms of EHL, Marian et al. [30] were the first to employ machine learning models to predict EHD parameters, namely central and minimum film thicknesses of line and circular contacts, based on learning datasets generated using FEM simulation results.The study looked at the performance of Support Vector Regression (SVR) [31], Gaussian Process Regression (GPR) [32][33][34], and ANNs.Different input parameters were also tried, namely the dimensional input variables and the two aforementioned sets of dimensionless groups (Hamrock and Dowson as well as Moes).It was concluded that GPR models can offer the smallest prediction error and that dimensionless groups fail to train accurate models.This is somehow unexpected, given their popular and relatively successful use in analytical formulas.More recently, Walker et al. [35] worked on predicting the central film thickness and viscous and boundary friction in EHD line contacts using ANNs.
While the current study similarly utilizes a dataset of simulation results to build predictive machine learning models for EHL parameters, it is the first to examine central and minimal film thicknesses for the general case of elliptical contacts.

Finite Element Model
The FEM model employed for dataset generation is detailed in this section.It follows a full coupling or full-system approach [5] and assumes smooth solid surfaces operating under steady-state, isothermal, and Newtonian conditions, for simplicity.Moreover, a fullfilm regime is assumed, with the lubricant entrainment direction being in the x-direction (i.e., solids are moving at constant surface velocities u 1 and u 2 in the x-direction).The contact is subjected to a constant external applied load F.
As previously discussed, the solids are approximated by ellipsoids at the contact level, as illustrated in Figure 1a.For simpler modeling, an equivalent reduced configuration is adopted, consisting of a contact between an elastic ellipsoid of equivalent radii of curvature and solid properties and a rigid flat plane, as shown in Figure 1b.The elastic properties (i.e., Young's modulus of elasticity E and Poisson's coefficient υ) of the equivalent ellipsoid, as a function of the properties of each solid, are given by the following: , and υ = 0 (1) Lubricants 2022, 10, x FOR PEER REVIEW 4 of 26 properties (i.e., Young's modulus of elasticity E and Poisson's coefficient υ ) of the equiv- alent ellipsoid, as a function of the properties of each solid, are given by the following: The radii of curvature of the reduced configuration, and x y R R , would then be writ- ten as follows: , , and Finally, let R be the equivalent radius of curvature of the reduced geometry, written as follows: For a convenient and generalized study, the equations are written in dimensionless form as a function of the operating conditions and the Hertzian dry contact parameters.The latter consist of the Hertzian contact pressure h p (i.e., the maximum pressure reached at the center of the contact) and the semi-axes of the contact ellipse in the x-and y-directions,

and
x y a a , respectively.The Hertzian contact pressure is given by the following: The ellipticity ratio of the Hertzian dry contact patch  The radii of curvature of the reduced configuration, R x and R y , would then be written as follows: , and Finally, let R be the equivalent radius of curvature of the reduced geometry, written as follows: 1 For a convenient and generalized study, the equations are written in dimensionless form as a function of the operating conditions and the Hertzian dry contact parameters.The latter consist of the Hertzian contact pressure p h (i.e., the maximum pressure reached at the center of the contact) and the semi-axes of the contact ellipse in the xand y-directions, a x and a y , respectively.The Hertzian contact pressure is given by the following: The ellipticity ratio of the Hertzian dry contact patch θ = a x /a y can be found given the ratio of the radii D = R x /R y .For wide elliptical contacts, where 0 < D ≤ 1 and 0 < θ ≤ 1, the Hertzian contact parameters are given as follows: , and where Ψ 1 is the complete elliptical integral of the first kind.For narrow or slender elliptical contacts, corresponding to D ≥ 1 and θ ≥ 1, the Hertzian contact parameters are written as follows: , and a y = Lubricants 2023, 11, 497

of 23
The dimensionless parameters employed in the governing equations can be defined using the aforementioned variables as follows: where u, v, and w are the solid elastic deformations in the x-, y-, and zdirections, respectively, h and p the lubricant film thickness and pressure, respectively, and ρ 0 and µ 0 the density and viscosity of the lubricant, respectively, at ambient pressure.

Governing Equations
The computational domain Ω, shown in Figure 2, is the region over which the elastic deformation equations are applied.Its boundary includes a bottom domain, denoted as ∂Ω b , and a contact domain, denoted as Ω c , located on the upper surface of Ω, over which the Reynolds equation is applied.Moreover, given the unidirectional lubricant flow in the x−direction, the problem is symmetric with respect to an xz−plane, denoted as ∂Ω s , passing through the center of the contact.As a result of this symmetry, the number of degrees of freedom (dofs) of the elastic and hydrodynamic parts of the problem is reduced by half.The dimensionless side length of the domain Ω was taken to be 60 [5], which is reduced to 30 in the y-direction due to symmetry (−30 ≤ X ≤ 30, −30 ≤ Y ≤ 0, and −60 ≤ Z ≤ 0).Such large dimensions are required to attain a half-space configuration.The dimensionless size of the contact domain Ω c was taken as 6 × 3, accounting for symmetry (−4.5 ≤ X ≤ 1.5, −3 ≤ Y ≤ 0).Note that the origin of the domain is taken at the dry undeformed point of contact.with where

∂Ω ∂Ω 
. For the latter, a symmetry condition is required ( Lastly, the dimensionless film thickness H is defined as follows: The Reynolds equation describes the pressure distribution over the contact domain Ω c and is written in its dimensionless form as follows:

∂X
Where : where u m = (u 1 + u 2 )/2 is the mean entrainment speed.A zero-pressure boundary condition (P = 0) is imposed on the boundary ∂Ω c of the contact domain Ω c , excluding the symmetry boundary ∂Ω c ∩ ∂Ω s .For the latter, a symmetry condition is required (∂P/∂Y = 0).Lastly, the dimensionless film thickness H is defined as follows: The linear elasticity equations govern the dimensionless deformations U, V, and W, in the x-, y-, and zdirections, respectively.The equations are given as follows: The boundary conditions of the linear elastic deformation equations are: over ∂Ω s σ n = 0 and {σ t } = {∅} elsewhere (11) where σ n and {σ t } are the normal and tangential stresses, respectively (with τ ij being the latter's component in the j direction within a plane having i as normal).
Finally, for the load balance between the external load and the lubricant pressure, the equation is written as follows: This equation ensures that the correct constant external load F is applied to the contact by monitoring the value of the rigid body separation term H 0 .The lubricant adopted for this study is a well-characterized mineral oil, "Shell T9", for which the density-pressure response is characterized by the Murnaghan equation given by the following: with the ambient temperature T 0 = 30 • C, K 0 = 10.545,K 00 = 9.234 GPa, β K = 6.09 × 10 −3 K −1 , and ρ 0 = 872 kg/m 3 [36].The viscosity-pressure dependence of this lubricant is characterized by the modified Yasutomi-WLF model given by the following: where T g is the glass transition temperature, with T g0 being its ambient-pressure value, and with parameters , and µ g = 10 12 Pa•s [36].Given these values, the ambient-pressure viscosity µ 0 = 0.0125 Pa•s, and the reciprocal asymptotic isoviscous pressure coefficient [37] α * = 21.21GPa −1 .

Overall Numerical Procedure
The domain Ω was discretized into a mesh of second-order (quadratic) Lagrange finite elements of a tetrahedral shape (10 nodes).The mesh across the contact domain Ω c was taken as the two-dimensional projection of the three-dimensional mesh, that is, a mesh of 6-node Lagrange triangular elements.To decrease the computational time, the mesh was refined the most at Ω c and made increasingly coarser in further regions of the domain Ω.Moreover, a mesh refinement process was conducted to ensure grid-independent solutions.
All equations were discretized using FEM and solved simultaneously.Since the Reynolds equation is a convection-diffusion equation, artifact oscillations may arise in the high-load solution when using the standard Galerkin formulation [5].Several formulations have been introduced to remedy this issue, namely the Streamline Upwind Petrov-Galerkin (SUPG), Galerkin Least-Squares (GLS), and Isotropic Diffusion (ID) [5].In the model employed in this paper, the SUPG stabilizing term is combined with the ID term, which allows for the resolution of cases with Hertzian pressures of several gigapascals.
Furthermore, the penalty term introduced by Wu [38] was added to the Reynolds equation to comply with the cavitation boundary condition.The resulting non-linear algebraic system of equations is then solved using the damped-Newton method [39], starting from a carefully chosen initial guess.For more details on the FEM modeling of the EHL problem (convergence criteria, meshing, FEM formulations, etc.), interested readers are referred to [5].

Experimental Validation
The FEM model was previously validated against experiments for circular contacts on numerous occasions, but not for elliptical contacts.Therefore, experimental validation for wide and narrow elliptical contacts is required.
Experiments were conducted using an optical interferometry roller-on-disc apparatus in a temperature-controlled environment at 30 • C, with a steel roller (E = 210 GPa and υ= 0.3) against a glass plane (E = 72 GPa and υ= 0.23).For the wide contact experiments, the roller's radii of curvature at the point of contact R x = 13.05 mm and R y = 84 mm, and the experiments were conducted for a constant load of F = 150 N, corresponding to a Hertzian pressure of p h = 0.484 GPa.Under these conditions, the contact ellipticity ratio θ ≈ 0.295.A total of 20 different values of entrainment speeds ranging from u m = 0.05 m/s to u m ≈ 6.458 m/s were considered.For narrow contacts, the roller's radii of curvature at the point of contact R x = 12.7 mm and R y = 4.82 mm, and the experiments were conducted for a constant load of F = 13 N, corresponding to a Hertzian pressure of p h = 0.526 GPa.Under these conditions, the contact ellipticity ratio θ = 1.89.A total of 15 different values of entrainment speeds ranging from u m = 0.8214 m/s to u m ≈ 6.46 m/s were considered.For both cases, the lubricant used was the Shell T9 lubricant.Note that film thickness measurements for wide contacts extend to much lower speeds compared to slender ones.This is because the film thickness range covered by the employed test rig is fixed.It spans from a few tens of nanometers up to slightly less than a micron.Given that the same lubricant is employed for both wide and slender contacts, with the same inlet temperature, a wide contact would generate thicker films than a slender one (at similar speeds), allowing for measurements at smaller entrainment speeds for the former.In addition, for wide contacts, the higher external load F enhances contact stability, allowing for measurements of even thinner films (i.e., at even lower speeds).
As can be seen in Figure 3, which shows central and minimum film thicknesses against mean entrainment speeds on a log-log scale, the FEM model generally complies with the experimental data, especially at low speeds where the deviation is minimal.For higher entrainment speeds, the discrepancy is seen to increase, which can be justified by the prevalence of thermal and shear-thinning effects that were not accounted for in the model.Coefficient of determination R 2 values (between experimental and numerical data) are reported within Figure 3 for each set.

Machine Learning
In general, supervised ML models for regression are algorithms that learn the relati ship between variables in a dataset and predict a continuous target or output variable.T ML model is provided with input features that will be used to interpret the variation of output variable in a learning process known as model training.Once the dataset is genera or compiled, it is split into at least two datasets: a training and a testing dataset.A variety ML models are trained on the former set with varying parameters.Then, the performan and accuracy of these models are evaluated on the testing dataset based on appropri evaluation metrics to empirically find the optimal model for a given application.An i portant procedure conducted by ML engineers is feature selection, which involves select the necessary input features (here, the relevant EHD parameters) to properly capture variation in the outputs (here, the central and minimum film thicknesses).This section vers the development of the dataset, the feature selection process, the description of the e ployed ML model (namely GPR), its underlying choices for the kernel function, and the e ployed data standardization techniques and performance metrics.

Data Generation
Prior to generating the operating conditions of every sample in the dataset, a ran of operating conditions should be specified.The ranges are defined such that every da point belongs to the full-film elastohydrodynamic lubrication regime.In other words, lubricant film should be sufficiently thick to prevent a surface asperity contact, indicat of the mixed lubrication regime.In addition, the pressure endured by the compone should not result in plastic deformation, which is avoided in practice as it would lead failure.To achieve this, specific ranges for the Hertzian pressure h p , mean entrainm speed m u , and ratio of radii D were established, and the values of the remaining para eters were calculated accordingly.Moreover, cutoff values for the Moes dimensionl parameters M and L [12] were set to enforce the same full-film EHL conditions.This ditional restriction ensures that all considered samples remain within the vicinity of piezoviscous elastic lubrication regime.Further verification will be applied through careful visual inspection of all obtained pressure and film thickness profiles to ensure t they exhibit all typical characteristics of this regime (i.e., a flat central film thickness reg with a minimum thickness in the vicinity of the side lobes, a near-Hertzian pressure d tribution with a zero pressure gradient on the inlet side to ensure that the chosen in length is sufficient to avoid numerical starvation, etc.).The Moes dimensionless grou are written as a function of the Hamrock and Dowson dimensionless groups U , G , a W [11], which are given as follows:

Machine Learning
In general, supervised ML models for regression are algorithms that learn the relationship between variables in a dataset and predict a continuous target or output variable.The ML model is provided with input features that will be used to interpret the variation of the output variable in a learning process known as model training.Once the dataset is generated or compiled, it is split into at least two datasets: a training and a testing dataset.A variety of ML models are trained on the former set with varying parameters.Then, the performance and accuracy of these models are evaluated on the testing dataset based on appropriate evaluation metrics to empirically find the optimal model for a given application.An important procedure conducted by ML engineers is feature selection, which involves selecting the necessary input features (here, the relevant EHD parameters) to properly capture the variation in the outputs (here, the central and minimum film thicknesses).This section covers the development of the dataset, the feature selection process, the description of the employed ML model (namely GPR), its underlying choices for the kernel function, and the employed data standardization techniques and performance metrics.

Data Generation
Prior to generating the operating conditions of every sample in the dataset, a range of operating conditions should be specified.The ranges are defined such that every datapoint belongs to the full-film elastohydrodynamic lubrication regime.In other words, the lubricant film should be sufficiently thick to prevent a surface asperity contact, indicative of the mixed lubrication regime.In addition, the pressure endured by the components should not result in plastic deformation, which is avoided in practice as it would lead to failure.To achieve this, specific ranges for the Hertzian pressure p h , mean entrainment speed u m , and ratio of radii D were established, and the values of the remaining parameters were calculated accordingly.Moreover, cutoff values for the Moes dimensionless parameters M and L [12] were set to enforce the same full-film EHL conditions.This additional restriction ensures that all considered samples remain within the vicinity of the piezoviscous elastic lubrication regime.Further verification will be applied through the careful visual inspection of all obtained pressure and film thickness profiles to ensure that they exhibit all typical characteristics of this regime (i.e., a flat central film thickness region with a minimum thickness in the vicinity of the side lobes, a near-Hertzian pressure distribution with a zero pressure gradient on the inlet side to ensure that the chosen inlet length is sufficient to avoid numerical starvation, etc.).The Moes dimensionless groups are written as a function of the Hamrock and Dowson dimensionless groups U, G, and W [11], which are given as follows: Lubricants 2023, 11, 497 9 of 23 Then, the Moes dimensionless groups are given by the following: −3/4 , and L = G(2U) 1/4  (16) The ranges of interest of all parameters are presented in Table 1.
To provide representative sampling across the ranges of interest, Latin Hypercube Sampling (LHS) [40] was employed.LHS is a space-filling design method that maximizes the minimum distance between all datapoints.Such a method is employed as it is ideal for ML models to have spaced-out samples.
Two separate datasets, each of 625 points, were generated using LHS.One dataset was for wide elliptical contacts, and another one was for narrow contacts.Both datasets were generated for steel-steel contacts, lubricated with Shell T9.The lhsdesign function in MATLAB [41] generated a unit hypercube, which was then mapped linearly to match the ranges of the operating conditions in Table 1.Note that argument "Smooth" was set to "off" to obtain equally spaced steps in each dimension, and the argument "Criterion" was set to "maximin" to maximize the minimum distance between datapoints, as desired.The distribution of LHS datapoints of each dataset can be seen in Figure 4a,b for wide and narrow contacts, respectively.After applying the constraints in Table 1, the former dataset was left with 413 samples, while the latter had 503 samples.The distribution of constrained LHS datapoints of each dataset is shown in Figure 4c,d for wide and narrow contacts, respectively.Note that for both wide and narrow contacts, the low-speed high-load cases and the high-speed low-load cases were filtered out in the constrained datasets.This is because the former would yield extremely thin films (below 10 nm, roughly) where the direct contact between asperities is likely to occur (mixed lubrication), while the latter would yield extremely thick films with low pressures and little-to-no solid elastic deformations (hydrodynamic lubrication).For every simulated case, the values of parameters M, L, and θ were recorded in a table, along with target variables H * c and H * m , as detailed in Section 3.2.The datasets were finally combined into one larger dataset of 915 samples, which was split into an 823-sample training dataset and a 92-sample testing dataset, corresponding to 90% and 10% of the combined dataset samples, respectively.

Feature Selection
Feature selection is a key component in any ML process, consisting of the careful selection of the input parameters that would be needed for the prediction of specified outputs.The simplest and most obvious choice for the EHL problem would be a brute-

Feature Selection
Feature selection is a key component in any ML process, consisting of the careful selection of the input parameters that would be needed for the prediction of specified outputs.The simplest and most obvious choice for the EHL problem would be a brute-force approach, whereby all physical inputs of the contact (i.e., kinematics, load, solid and fluid material properties, etc.) are selected as features.This would result though in a relatively high number of input parameters, some of which may be correlated.Knowledge of the EHL problem and its characteristics will turn out to be fundamental in selecting the right inputs, as will be discussed in what follows.
This study aims to develop ML models that can accurately predict the central and minimum film thicknesses of EHD elliptical contacts.A common practice in ML is to reduce the number of input features into a model and hence the popularity of methods, such as Principal Component Analysis (PCA) [42].This practice boosts the model's simplicity and interpretability, as well as its computational efficiency, both in training and prediction.In this work, the number of input features was minimized based on the existing knowledge of the EHL problem, its underlying physics, and governing mechanisms.The Moes dimensionless groups, M and L [12], defined in Equation ( 16), were exclusively used as input features for GPR, as this set has the lowest number of dimensionless groups due to the optimum similarity analysis employed to derive it.This set was also proven to be relatively successful at film thickness prediction in analytical formulas [19].Furthermore, the ellipticity of the contact θ (or the ratio of radii D) is required to convey the shape or aspect ratio of the contact.The feature selection process is summarized in Figure 5, where the numbers between brackets refer to the equation(s) defining their corresponding parameters.This figure details the pre-processing and post-processing phases applied prior to and following the ML phase.The values of parameters R x , E, and α * were fixed, along with the density and viscosity models and their parameters for the Murnaghan equation of state (13) and the modified Yasutomi-WLF Equation ( 14), respectively.Only parameters D, p h , and u m were varied based on LHS.The number of input features was then reduced by deriving parameter F using Equation (4) as a prerequisite to finding the Hamrock and Dowson dimensionless groups, U, G, and W using Equation (15).Then, these groups were further combined into the Moes dimensionless groups M and L using Equation (16).Furthermore, these features were transformed into their logarithmic values ln(M) and ln(L), respectively.At this stage, it is essential to introduce another commonly-adopted definition for the dimensionless film thickness: This definition is used in the ML process to conform with the Moes dimensionless groups (M and L) as input parameters.The same M, L, and θ values result in the same H * and not H (not accounting for lubricant compressibility, and assuming an idealistic exponential viscosity-pressure response) [12].Analytical formulas featuring this definition were previously established by Evans and Snidle [43] and Nijenbanning et al. [17].This allows the model to be generalized to different solid material properties.A sufficiently accurate generalization would also be attained for fluids with a non-exponential viscositypressure behavior, by using α * to describe their response.are also transformed into their logarithmic values, ln(H * c ) and ln(H * m ).The logarithmic transformation is based on analytical formulas indicating a power-law relationship between the input and output features.It reduces the non-linearity of the problem and allows for easier and more accurate ML modeling, as will be discussed in Section 4. Dimensional film thicknesses were then retrieved using Equation (17), and the ML-model-performance metrics were evaluated based on the output dimensional film thicknesses, as discussed in Section 3.3.Note that fixing a subset of EHD parameters while maintaining the ability to generalize onto other values is made possible through the use of dimensionless groups.
equation of state (13) and the modified Yasutomi-WLF Equation ( 14 This definition is used in the ML process to conform with the Moes dimensio groups ( M and L ) as input parameters.The same M , L , and θ values result i same * H and not H (not accounting for lubricant compressibility, and assuming an alistic exponential viscosity-pressure response) [12].Analytical formulas featuring definition were previously established by Evans and Snidle [43] and Nijenbanning [17].This allows the model to be generalized to different solid material properties.A ficiently accurate generalization would also be attained for fluids with a non-expone viscosity-pressure behavior, by using α * to describe their response.The output fea logarithmic transformation is based on analytical formulas indicating a power-law rela ship between the input and output features.It reduces the non-linearity of the problem allows for easier and more accurate ML modeling, as will be discussed in Section 4. Di sional film thicknesses were then retrieved using Equation (17), and the ML-model-p mance metrics were evaluated based on the output dimensional film thicknesses, a cussed in Section 3.3.Note that fixing a subset of EHD parameters while maintainin ability to generalize onto other values is made possible through the use of dimensio groups.

Gaussian Process Regression (GPR)
Several models may be used for regression given tabular data, each of different w ing principles.For the sake of brevity, only GPR [33] will be considered here, giv superior performance over SVR [31] and its better suitability for limited datasets pared to ANNs [23].GPR is a non-parametric-the number of parameters is a functi the number of training points-probabilistic model that can account for noise and u tainty.This model employs a kernel function to capture data nonlinearity.The k

Gaussian Process Regression (GPR)
Several models may be used for regression given tabular data, each of different working principles.For the sake of brevity, only GPR [33] will be considered here, given its superior performance over SVR [31] and its better suitability for limited datasets compared to ANNs [23].GPR is a non-parametric-the number of parameters is a function of the number of training points-probabilistic model that can account for noise and uncertainty.This model employs a kernel function to capture data nonlinearity.The kernel function should be carefully selected or designed to inform the model on the relationship and covariance between variables and therefore potentially achieve better performance.The different kernel function choices that are considered in this work are detailed in Appendix A.
ML models typically have hyperparameters that should be tuned to improve performance.Hyperparameters traditionally were fixed before training begins and not altered during the process.However, models now employ optimization algorithms to fine-tune some hyperparameters during training.This helps to avoid overfitting and improve generalization to unseen data [44].In GPR, the hyperparameters include the kernel length-scale, which can be different for every input feature, and determines the flexibility and sensitivity of the model (range over which one datapoint can influence other datapoints) and the kernel signal variance, which controls the scale of the function and the distribution of data.The values of the kernel hyperparameters are initialized prior to training the models.Then, through an optimization process, these values are updated until convergence is reached.In the scikit-learn GPR implementation employed in this work, the l-bfgs-b algorithm [45] maximizes the log marginal likelihood [33], which reduces errors and improves model accuracy.
In GPR, the predicted function is modeled as a Gaussian process distribution.In other words, the predicted function does not have one specific parametric form, but rather it is a distribution over several functions.This allows the model to approximate uncertainty and noise in data by calculating the standard deviation of predictions.Let x and x be the sets of values of input features from two sample datasets X and X of dimensions n and n, respectively, with x i and xi being their individual inputs (i = 1 . . .N f , where N f corresponds to the number of input features).A multivariate Gaussian process is defined by a mean function m(x) and a kernel function k( x, x) evaluated for every possible pairwise combination ( x, x).The former is the expected value for an input x, while the latter represents the dependence and correlation between inputs x and x.A function y(x) following a Gaussian distribution would then be denoted as follows: In a Gaussian process, any subset of points belonging to the same dataset follows a joint normal, Gaussian distribution.As such, the training dataset output values y tr and the testing dataset output values y te are drawn from a joint multivariate Gaussian distribution, in a GPR model.Initially and before training or observing any data, a prior distribution over functions is assumed.The prior distribution can be seen as an initial assumption about the model parameters prior to observing any data.Then, by applying Bayes' rule [33] and conditioning to the training dataset, the posterior distribution is obtained.This distribution combines the prior with the likelihood function, both of which are assumed to be Gaussian.The latter is derived from training data and reflects the probability of observing the output value, given the set of input values.The posterior distribution is also a multivariate Gaussian process representing the updated belief about model parameters, characterized by a posterior mean function and a posterior kernel function.The posterior mean function represents the noise-free or average value of predictions, while the posterior kernel function can quantify noise and uncertainty in predictions.The output (and input) data will be standardized around a zero mean, as will be discussed later.Accordingly, the mean function m(x) is taken as zero, allowing for notational and computational simplicity.Note that since the dataset was developed using deterministic FEM simulations, the data are assumed to be noise-free (no uncertainty).The noise variance value is negligible, and the uncertainty of predicted values is not considered, i.e., the posterior kernel function is not considered, and the predicted output is a specific value and not a range of values.Therefore, to make predictions, the posterior mean function is simply evaluated for the input values of data points to be predicted.Given the following prior distribution over functions: the prior joint multivariate Gaussian distribution between all dataset points (split into training and testing subsets, denoted by the subscripts "tr" and "te", respectively) is written as follows: and the prediction function of the GPR model for testing datapoints is as follows: The GPR prediction, Equation ( 21), can be seen as a linear regression or weighted sum equation [32], where every output in y tr is multiplied by a given weight.The weight stems from the evaluation of the kernel function, and its value is based on the similarity between the input values of the testing and training points.For a more detailed derivation of Equation ( 21), interested readers are referred to [33].Notice that only two kernel matrices need to be evaluated for noise-free predictions.The first is k(X tr , X tr ) for every possible pairwise combination of training samples, resulting in a matrix of size N tr × N tr , where N tr is the number of samples in the training set.The second kernel matrix k(X te , X tr ) evaluates the kernel function for every possible pairwise combination of samples, where one sample is from the testing set and the other is from the training set.The size of k(X te , X tr ) would be N te × N tr , where N te is the number of samples in the testing set.The computational complexity is imposed by the inversion of matrix k(X tr , X tr ), which results in an algorithm scaling with O N 3 tr .For a relatively small dataset of several hundred datapoints, this is not a concern.However, it may turn out to be prohibitive for larger datasets.The exact GPR implementation used in scikit-learn is based on Algorithm 2.1 in [33].Finally, note that all input features fed into the ML model and the outputs should be normalized or standardized prior to training and predicting, as detailed in Appendix B. Also, listed in Appendix B are the performance metrics that are employed in evaluating the pertinence and accuracy (in comparison with FEM simulation results) of the different considered ML model configurations, namely the adjusted R-squared, mean absolute percentage error (MAPE), and maximum absolute percentage error (MAXAPE).The "error" terminology in these metrics simply corresponds to output deviations with respect to FEM simulation results.

Results and Discussion
In this section, the performance of different GPR models is evaluated in order to find an optimal configuration, for which the predictive performance is then compared to that of a conventional analytical film thickness formula for the central and minimum film thickness in elliptical EHL contacts.
First, several kernel functions and a combination of these functions were considered, in order to determine the best performing configuration.The performance of all functions based on the testing dataset is summarized in Table 2. Clearly, ARD-Matern kernels offer the best fit, while the ARD-RBF kernel seems to be the worst.Moreover, combining the ARD-Matern kernel for ν = 3/2 with that for ν = 5/2 further reduces prediction errors with respect to the FEM simulations.For this configuration, MAPE values drop to 0.31% and 1.00% only, for the central and minimum film thicknesses of testing points, respectively, with a MAXAPE of 3.05% and 6.97%, respectively.The corresponding kernel function is obtained by simply combining or adding the ARD-Matern kernel for ν = 3/2 with that for ν = 5/2.One noticeable trend depicted in Table 2 is the larger errors for minimum film thickness predictions compared to central film thickness.Such a trend is not unusual though, and it has often been observed in analytical film thickness formulae [19].This is because the governing physical mechanisms for the central film thickness are better understood than those for the minimum film thickness.Even after decades of trials, the quest for a reliable minimum film thickness formula remains elusive.From the earliest theoretical studies on EHL, it was identified that the central film thickness is governed by lubricant rheology in the low-pressure inlet region of the contact, though proper quantification of the inlet pressure was only recently carried out [46].This has led to the identification of a single pressure-viscosity parameter that would govern the central film thickness response in analytical formulae.There is still no general consensus though on the definition of this parameter, and several variants have been proposed over the years [47].Nonetheless, it is at least clear that central film thickness is governed by the inlet rheology.The minimum film thickness was also thought to be governed by the inlet rheology.Only recently did Habchi et al. [48] discover that it is also influenced by the high-pressure rheology in the central part of the contact.This implies that an additional high-pressure definition for a pressure-viscosity parameter would be needed-whether in analytical formulae or ML frameworks-to properly describe the governing physics of minimum film thickness.Such a definition is yet to be developed, and it is definitely beyond the scope of the current work.Nonetheless, once available, it would be straightforward to include as an additional input feature for analytical formulae or ML models.The lack of a proper understanding of the governing mechanisms of the minimum film thickness is probably the reason why corresponding recent analytical formulae have shifted-with somewhat more successtowards a prediction of the ratio of the central-to-minimum film thickness, rather than a direct prediction of the minimum film thickness itself [49].
The performance of the best performing model (and kernel) based on the testing dataset is illustrated in Figure 6. Figure 6a,b show the predicted central and minimum film thickness values, respectively (using the ML model), compared to their "true" values (given by the FEM simulations).Note that the closer the predictions are to the diagonal, the more accurate they are.The performance metrics for each feature are displayed within its corresponding figure.Clearly, the proposed ML framework is perfectly capable of predicting both the central and minimum film thickness for all considered testing points, with a slightly better precision for the former.Figure 6c,d show the percentage residuals/errors of the central and minimum film thickness, respectively, for all testing points, as a function of their corresponding so-called true value, obtained using the FEM model described in Section 2. The results reveal that errors are rather scattered, with no significant error bias towards low, medium, or high film-thickness valuese.
Lubricants 2022, 10, x FOR PEER REVIEW 16 of 26 Section 2. The results reveal that errors are rather scattered, with no significant error bias towards low, medium, or high film-thickness valuese.3.In comparison with the results of Table 2, it can be seen that all models for all kernel functions perform worse when the scale of input features M and L and outputs H and * m H is linear instead of logarithmic.For the best-performing model (last row in both tables), the logarithmic transformation reduces the MAPE from 0.52% to 0.31% and MAXAPE from 7.48% to 3.05% for the central film thickness.For the minimum film thickness, MAPE is reduced from 1.50% to 1.00% and MAXAPE from 7.32% to 6.97%.Reductions are more significant for other choices of kernel functions.This shows the importance Next, in order to showcase the benefits of transforming the features M, L, H * c , and H * m into their logarithmic values, the models were re-trained using their untransformed/original values.The corresponding results for the testing dataset are presented in Table 3.In comparison with the results of Table 2, it can be seen that all models for all kernel functions perform worse when the scale of input features M and L and outputs H * c and H * m is linear instead of logarithmic.For the best-performing model (last row in both tables), the logarithmic transformation reduces the  Trained ML models can generate thousands of predictions per second, hence offering a similar speed to analytical formulas, such as, for example, the Hamrock and Dowson equations [16] (still probably the most widely used film thickness formulas to date [50], despite decades of development).It would therefore be interesting to compare the accuracy of these formulas to that of the proposed ML model, based on the same testing set, as employed here.Figure 8 shows the performance metrics of the Hamrock and Dowson formulas, in a similar fashion to Figures 6 and 7. Out of fairness, narrow elliptical contacts Trained ML models can generate thousands of predictions per second, hence offering a similar speed to analytical formulas, such as, for example, the Hamrock and Dowson equations [16] (still probably the most widely used film thickness formulas to date [50], despite decades of development).It would therefore be interesting to compare the accuracy of these formulas to that of the proposed ML model, based on the same testing set, as employed here.Figure 8 shows the performance metrics of the Hamrock and Dowson formulas, in a similar fashion to Figures 6 and 7. Out of fairness, narrow elliptical contacts were left out of the testing set, since the Hamrock and Dowson formulas were originally developed for wide and circular contacts only.Not only are the analytical formulas' predictions significantly less accurate than the ML model, but they are also mostly of a non-conservative nature (i.e., they overpredict the film thickness).In addition, the usual significant loss of accuracy of analytical formulae for the minimum film thicknessdiscussed earlier-can be clearly seen in Figure 8b,d.The results of Figure 8 are in general agreement with those of Wheeler et al. [19] who reported relative deviations in central and minimum film thickness predictions using the Hamrock and Dowson formulae that are as high as approximately 22% and 95%, respectively, compared to EHL simulation results.
Lubricants 2022, 10, x FOR PEER REVIEW 18 of 26 predictions significantly less accurate than the ML model, but they are also mostly of a non-conservative nature (i.e., they overpredict the film thickness).In addition, the usual significant loss of accuracy of analytical formulae for the minimum film thickness-discussed earlier-can be clearly seen in Figure 8b,d.The results of Figure 8 are in general agreement with those of Wheeler et al. [19] who reported relative deviations in central and minimum film thickness predictions using the Hamrock and Dowson formulae that are as high as approximately 22% and 95%, respectively, compared to EHL simulation results.The superior predictive performance of ML regression frameworks, like GPR, over analytical film thickness formulas may be attributed to the fact that the former are nonparametric (i.e., they operate in a functional space of infinite dimensions or, in other words, with an infinite number of possible regression functions), whereas the latter are parametric, and they usually rely on a pre-defined single regression function with a certain fixed number of parameters (e.g., Hamrock and Dowson [16]) or the combination of a limited number of pre-defined functions (e.g., Nijenbanning et al. [17], which employs a combination of four pre-defined functions, one for each of the known EHL regimes: rigid-isoviscous, elastic-isoviscous, rigid-piezoviscous, and elastic-piezoviscous).

Remark 1.
Out of fairness, it should be emphasized that the range of M and L that was originally covered in developing the Hamrock and Dowson formulas is significantly smaller than the one covered by the testing dataset.Actually, only a few samples of the testing dataset fall within that range.Therefore, the current comparison is not entirely fair.Nonetheless, it is quite illustrative, The superior predictive performance of ML regression frameworks, like GPR, over analytical film thickness formulas may be attributed to the fact that the former are nonparametric (i.e., they operate in a functional space of infinite dimensions or, in other words, with an infinite number of possible regression functions), whereas the latter are parametric, and they usually rely on a pre-defined single regression function with a certain fixed number of parameters (e.g., Hamrock and Dowson [16]) or the combination of a limited number of pre-defined functions (e.g., Nijenbanning et al. [17], which employs a combination of four pre-defined functions, one for each of the known EHL regimes: rigid-isoviscous, elastic-isoviscous, rigid-piezoviscous, and elastic-piezoviscous).

Remark 1.
Out of fairness, it should be emphasized that the range of M and L that was originally covered in developing the Hamrock and Dowson formulas is significantly smaller than the one covered by the testing dataset.Actually, only a few samples of the testing dataset fall within that range.Therefore, the current comparison is not entirely fair.Nonetheless, it is quite illustrative, since it gives the reader an idea of the range of errors that are involved in using such analytical formulas outside their range of application, a common practice in the EHL literature.In addition, the Hamrock and Dowson formulas were developed using the simplistic exponential pressure-viscosity relationship, which fails to accurately capture the high-pressure rheology of typical lubricants.As such, this would yield relatively inaccurate minimum film thickness predictions, as discussed earlier.The impact on central film thickness should be minimal, however, as long as the correct pressure-viscosity coefficient is employed.This is because the central film thickness is governed by the low-pressure inlet rheology of the lubricant [46], which is well captured, even by such simplistic rheological models.

Conclusions
This study extends the use of Machine Learning (ML) approaches for EHL film thickness predictions to the general case of elliptical contacts by considering wide and narrow contacts over a wide range of ellipticity and operating conditions.FEM simulations are used to generate substantial training and testing datasets that are used within the proposed ML framework.The complete dataset entails 915 samples, split into an 823-sample training dataset and a 92-sample testing dataset, corresponding to 90% and 10% of the combined dataset samples, respectively.The proposed ML model consists of a pre-processing stage in which the conventional EHD dimensionless groups are used to minimize the number of inputs to the model, reducing them to only three.The core of the model is based on GPR, a powerful ML regression tool, well-suited for small-sized datasets, producing output central and minimum film thicknesses also in a dimensionless form.The last stage is a postprocessing one, in which the output film thicknesses are retrieved in a dimensional form.
First, the ML model was tuned to find the most suitable choice/combination of kernel functions, which was then used to make film thickness predictions for the testing dataset.The results revealed the power of the proposed ML approach, producing predictions that are far superior to analytical film thickness formulae in terms of accuracy, for a similar negligible computational effort.Then, the importance of transforming the input Moes dimensionless parameters and the output film thicknesses into a logarithmic scale was quantified.Such a transformation reduces the non-linearity in the ML model, leading to improved prediction accuracy, since central and minimum film thicknesses are known to vary as a power function of the Moes parameters.
To conclude, this study constitutes a first approach towards establishing a generalized ML framework for elliptical EHD contacts, through the use of conventional EHL dimensionless groups, namely the Moes parameters.It was shown that such groups can, in fact, be employed as input features and produce accurate models, contrarily to what was suggested by Marian et al. [30].The generality of the proposed framework is not attributed to the ML model itself, but rather to the well-known central and minimum film thickness similitude of EHD contacts with similar values of the Moes parameters.Nonetheless, some improvements can still be applied to enhance the predictive accuracy.For instance, the influence of lubricant compressibility may be incorporated either a priori by employing several density-pressure responses and adding their corresponding parameters to the input features of the dataset or a posteriori by using a correction factor for the central film thickness [51,52].This is because the influence of lubricant compressibility is known to be restricted to the central film thickness, with no noticeable effect on the minimum film thickness.Incorporating the influence of compressibility would enhance the accuracy of central film thickness predictions.As for minimum film thickness predictions, these could be enhanced through the derivation of a dedicated pressure-viscosity parameter at high-pressure, to be added as an additional input feature.

Nomenclature α *
Reciprocal asymptotic isoviscous pressure coefficient (Pa where α is the scale mixture parameter, which controls the trade-off between capturing long-scale variations and short-scale fluctuations.Lastly, the general ARD-Matern kernel function is written as follows: where K v is the modified Bessel function of the second kind [54], Γ is the gamma function, and ν is an additional strictly positive hyperparameter that controls the function smoothness (ν > 0).In fact, for ν = 1/2, the function is reduced to the Absolute Exponential kernel [33].
When ν → ∞ , the RBF function is obtained.For ν = 3/2, the kernel is once differentiable (i.e., the function and its derivative are continuous), and for ν = 5/2, the kernel is twice differentiable (i.e., the function, its first and second derivatives are continuous).Note that ν remains fixed during optimization, unlike the length-scale hyperparameters, which are optimized.More importantly, these kernel functions can be added or multiplied to model more complex behavior as seen in [34].

Appendix B. Data Standardization and Performance Metrics
The input features fed into the ML model and the outputs should be normalized or standardized prior to training and predicting.This pre-processing step is a typical practice in ML and presents two main advantages [55].The first advantage is that it prevents the dominance of features of higher values and scale over features of lower values and scale (i.e., the dominance of M over L and D in EHL, for example).This effect is reflected by the value of the Euclidean distance employed in the kernels.The second advantage of this practice is that it can accelerate the convergence of the hyperparameter optimization algorithm.Moreover, the inversion of matrix k(X tr , X tr ) becomes a more stable operation when input data are standardized.Given the zero-mean assumption of the output values introduced in Equation ( 19) and for computational reasons, a z-score normalization (or standardization) is preferred for GPR [56].The resulting distribution conforms with the Gaussian distribution, which improves the performance.The scaling transformation centers the data around zero and normalizes them with respect to the standard deviation.The normalized value x i for a given feature x i reads as follows: x i = x i − µ tr (x i ) where µ tr (x i ) and σ tr (x i ) correspond to the feature's mean value and its standard deviation, respectively, both obtained from the training set only.This transformation is applied to each of the N f input features and both output variables individually and to each of the N te + N tr data points of the entire dataset.Every feature has its own mean and variance values and hence its own transformation equation.Note that when normalizing the testing dataset, it is important to use the training dataset mean and standard deviation to avoid data leakage, which indirectly informs the ML model about the testing dataset during training.For outputs, this transformation is reversed after prediction, by inverting Equation (A4) to restore the original scale of values.The performance of trained ML models is assessed using the testing set based on several metrics.One popular metric used to evaluate the trained models is the R-squared metric or coefficient of determination [57].The advantages of this metric include its generality and dimensionless property, which make comparing different models convenient.It is calculated over the testing dataset and is given as follows:

Figure 1 .
Figure 1.The contact geometry of (a) an actual elliptical contact and (b) a reduced elliptical contact.
the Hertzian contact parameters are given as follows:

Figure 1 .
Figure 1.The contact geometry of (a) an actual elliptical contact and (b) a reduced elliptical contact.

Figure 2 .
Figure 2. Computational domain of the EHD point contact.The Reynolds equation describes the pressure distribution over the contact domain

0 P
entrainment speed.A zero-pressure boundary condition ( = ) is imposed on the boundary c ∂Ω of the contact domain c Ω , excluding the symmetry boundary c s

Figure 2 .
Figure 2. Computational domain of the EHD point contact.

Figure 3 .
Figure 3.Comparison of numerical and experimental central and minimum film thickness va tions as a function of the entrainment speed for (a) wide contacts and (b) narrow contacts.

Figure 3 .
Figure 3.Comparison of numerical and experimental central and minimum film thickness variations as a function of the entrainment speed for (a) wide contacts and (b) narrow contacts.

Figure 4 .
Figure 4. Distribution of data points across the input space for unconstrained (a) wide and (b) narrow contacts, as well as constrained (c) wide and (d) narrow contacts.

Figure 4 .
Figure 4. Distribution of data points across the input space for unconstrained (a) wide and (b) narrow contacts, as well as constrained (c) wide and (d) narrow contacts.

Figure 5 .
Figure 5. Diagram illustrating the dataset generation, dimensionality reduction, and feature tion for this study.

Figure 5 .
Figure 5. Diagram illustrating the dataset generation, dimensionality reduction, and feature selection for this study.

Figure 6 .
Figure 6.Predicted vs. true values and percentage residual plots of central ((a) and (c), respectively) and minimum film thicknesses ((b) and (d), respectively) for the best-performing ML model.Next, in order to showcase the benefits of transforming the features M , L , * c H , and

Figure 6 .
Figure 6.Predicted vs. true values and percentage residual plots of central ((a) and (c), respectively) and minimum film thicknesses ((b) and (d), respectively) for the best-performing ML model.

Figure 7 .
Figure 7. Predicted vs. true values and percentage residual plots of central ((a) and (c), respectively) and minimum film thicknesses ((b) and (d), respectively) for the best-performing ML model, without logarithmic scaling of M , L , * c H , and * m H .

Figure 7 .
Figure 7. Predicted vs. true values and percentage residual plots of central ((a) and (c), respectively) and minimum film thicknesses ((b) and (d), respectively) for the best-performing ML model, without logarithmic scaling of M, L, H * c , and H * m .

Figure 8 .
Figure 8. Predicted vs. true values and percentage residual plots of central ((a) and (c), respectively) and minimum film thicknesses ((b) and (d), respectively) for the Hamrock and Dowson analytical formulas for wide and circular cases of the testing dataset only.

Figure 8 .
Figure 8. Predicted vs. true values and percentage residual plots of central ((a) and (c), respectively) and minimum film thicknesses ((b) and (d), respectively) for the Hamrock and Dowson analytical formulas for wide and circular cases of the testing dataset only.

Table 1 .
Operating conditions' ranges of interest for dataset generation.

Table 2 .
Performance metrics of GPR models based on the testing dataset for different kernel functions.

Table 3 .
Performance metrics of GPR models using the initial scale (instead of logarithmic) of M , L , * c h m h Shear stress in the j-direction within a plane having i as normal (Pa) a x , a y Hertzian elliptical contact semi-axes in the x, y-directions (m)A 1 , C 2 Modified Yasutomi-WLF viscosity model parameters ( • C) A 2 , b 1 Modified Yasutomi-WLF viscosity model parameters (Pa −1 ) b 2 , C 1 Modified Yasutomi-WLF viscosity model parameters D Ratio of contact equivalent radii of curvature R x and R y E Equivalent solid Young's modulus of elasticity (Pa) E 1 , E 2Young's moduli of elasticity of solids 1 and 2 (Pa)