Data-Driven Sliding Bearing Temperature Model for Condition Monitoring in Internal Combustion Engines

Condition monitoring of components in internal combustion engines is an essential tool for increasing engine durability and avoiding critical engine operation. If lubrication at the crankshaft main bearings is insufficient, metal-to-metal contacts become likely and thus wear can occur. Bearing temperature measurements with thermocouples serve as a reliable, fast responding, individual bearing-oriented method that is comparatively simple to apply. In combination with a corresponding reference model, such measurements could serve to monitor the bearing condition. Based on experimental data from an MAN D2676 LF51 heavy-duty diesel engine, the derivation of a data-driven model for the crankshaft main bearing temperatures under steady-state engine operation is discussed. A total of 313 temperature measurements per bearing are available for this task. Readily accessible engine operating data that represent the corresponding engine operating points serve as model inputs. Different machine learning methods are thoroughly tested in terms of their prediction error with the help of a repeated nested cross-validation. The methods include different linear regression approaches (i.e., with and without lasso regularization), gradient boosting regression and support vector regression. As the results show, support vector regression is best suited for the problem. In the final evaluation on unseen test data, this method yields a prediction error of less than 0.4 °C (root mean squared error). Considering the temperature range from approximately 76 °C to 112 °C, the results demonstrate that it is possible to reliably predict the bearing temperatures with the chosen approach. Therefore, the combination of a data-driven bearing temperature model and thermocouple-based temperature measurements forms a powerful tool for monitoring the condition of sliding bearings in internal combustion engines.


Introduction
Internal combustion engines (ICE) are employed as energy converters in manifold applications such as transportation of goods and people, machinery and power generation [1][2][3][4]. Their widespread utilization is due to advantageous key characteristics such as high power-to-weight ratio, robustness, efficiency, affordability and large-scale fuel supply infrastructure availability [2,3,5]. Global issues such as climate change, environmental pollution and scarcity of resources are currently posing major challenges to engine manufacturers, who must meet the requirements of substantially reduced emissions of CO2 and other greenhouse gases, elimination of pollutant emissions and increased service life of ICEs [3,4,6]. Because the development of entirely new ICE concepts requires extensive research and development work, engine manufacturers are focusing on increasing the efficiency of existing ICE technology in the short term [7]. One possible solution employs condition monitoring (CM), which typically comprises three steps:

1. Condition detection refers to the acquisition of one or more informative parameters which reflect the current condition of the machinery.

2. Condition comparison consists of comparing the actual condition with a reference condition of the same parameter.

3. Diagnosis evaluates the results of the condition comparison and determines the type and location of failure. Based on the diagnosis, compensation measures or maintenance activities can be initiated at an early stage.
Besides diagnosis to determine the type and location of failure, a CM system can pursue other evaluation goals as well: Vanem [44] and Mechefske [42] introduce prognostics as a task that provides information about the likely future condition of the machinery. Furthermore, condition monitoring can generally be classified as either permanent or intermittent monitoring [43,45].
Existing literature proposes several measurement parameters that can help in detecting the sliding bearing condition. Two main categories are observed. First, there are significant parameters such as vibration, acoustic emission and oil contaminants [46][47][48][49], which can be measured at a certain distance from the bearing. Second, there are informative parameters that have to be measured directly at or inside the bearing. These include bearing temperature [19,50,51], bearing deformation and vibration [19], oil film temperature [19,51], oil film pressure and thickness [50,52,53] and metal-to-metal contact [47]. With the second category, information can be obtained about the condition of each individual bearing, and the signal quality is higher and transient response faster in case of a rapid change in bearing condition compared to measurement parameters acquired at a distance from the bearing [51,54]. At the same time, it likely requires a larger effort for instrumentation due to the restricted access to the bearings and the need to not influence bearing functionality [7]. In the existing literature, bearing temperature measurement with instruments such as thermocouples has proven to be a reliable, continuous, fast responding measurement method that is comparatively simple to apply [15,50,55]. With these characteristics, the method is particularly useful to diagnose bearing failure modes which lead to a rapid change in bearing temperature [47,56,57].
A straightforward approach to condition comparison of bearing temperature values would simply employ a global temperature limit that may not be exceeded during engine operation. With this approach, however, anomalies in bearing temperature behavior may go undetected if the defined temperature limit is not reached during the anomaly. In contrast, a bearing temperature model that incorporates the current engine operation would enable the identification of anomalies in bearing temperature as soon as the measured temperature falls outside a comparatively small tolerance range around the predicted model value. For such a model, transient engine operation poses a specific challenge: due to the thermal inertia of the engine components and the engine operating media, the bearing temperature reacts slowly to swift changes in engine operating conditions such as engine speed and engine torque. This, however, is beyond the scope of this paper, which focuses on steady-state engine operation. There are two main types of approaches for deriving a bearing temperature model: data-driven approaches and physics-based approaches (also referred to as model-based or model-driven) [44]. While the latter apply physical domain knowledge to formulate a mathematical model of the monitored machinery condition [58], data-driven approaches simply utilize the inherent information in the available data [44]. The combination of a physics-based and a data-driven approach is often referred to as a hybrid approach [44].
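To make the condition comparison step concrete, the following minimal Python sketch illustrates the tolerance-band logic described above; the function name, the tolerance value and the example inputs are illustrative assumptions, not part of the measurement system described in this paper.

```python
def bearing_temperature_anomaly(t_measured_degC, t_predicted_degC, tolerance_degC=1.0):
    """Flag a bearing temperature anomaly if the measured value leaves the
    tolerance band around the model prediction (tolerance is illustrative)."""
    residual = t_measured_degC - t_predicted_degC
    return abs(residual) > tolerance_degC, residual

# Example: measured 84.1 degC vs. predicted 82.6 degC -> flagged as anomalous
is_anomalous, residual = bearing_temperature_anomaly(84.1, 82.6)
```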
Today, artificial intelligence (AI) and in particular machine learning (ML) form the backbone of data-driven approaches. Although AI and ML have only recently gained widespread attention, their origins date back to the 1950s and even earlier [59,60]. Machine learning refers to the ability of an AI system to extract its own knowledge from raw data [60]. Statistical learning methods such as (linear) regression models are therefore usually counted among ML methods [60][61][62]. Data-driven methods for machine fault diagnosis, CM or predictive maintenance (PdM) have proven to work in various engineering application scenarios [63][64][65], yet applying and training such methods is usually not straightforward because suitable data and domain knowledge are required to train an adequate model. Consequently, it is common to use more controllable experimental data rather than data from a real-world application for model training [58]. However, by taking into account the application-specific background and the underlying structure of the experimental data, it is possible to derive a model that can be generalized to real-world applications or at least serve as the basis for further developments.
The goal of this paper is to demonstrate that the combination of a data-driven bearing temperature model and thermocouple-based temperature measurements forms a powerful tool for monitoring the condition of sliding bearings in ICEs. The data-driven model of the crankshaft main bearing temperatures under steady-state engine operation is derived based on experimental data. In order to obtain a model that is realistic for real-world applications, only measured or calculated parameters that would also be available on a production engine are considered as model inputs. In Section 2, information on the experimental investigations is provided and the acquired data are analyzed. In addition, the requirements for a suitable data-driven model, the considered ML methods as well as the model training and selection approach are discussed. The results of the modeling process are then presented and analyzed in Section 3. Finally, the main conclusions and possible next steps are discussed and summarized in Sections 4 and 5.

Experimental Investigations
To generate a measurement database, experimental investigations with a test engine were carried out on one of the Large Engines Competence Center's (LEC) engine test beds at the Graz University of Technology campus. The engine under study is an MAN D2676 LF51 in-line six-cylinder diesel engine, which has a displacement of approximately 12.4 dm³ and is used for heavy-duty applications. During fired engine operation, a VA Tech Elin EBG GmbH Indy 80/4P/5500 dynamometer with a pendulum stator acted as a brake and thus controlled the engine speed level. The engine was operated and monitored with the test bed automation system PUMA Open version 1.5.3 from AVL List GmbH (Graz, Austria). Some engine parameters were directly measured and retrieved via the engine's electronic control unit (ECU). The additionally applied measurement technology, the experimental setup and the engine operating strategy are summarized below. A more detailed description of the experimental methodology can be found in [7].
Comprehensive external conditioning systems for coolant and lubricating oil and for fuel, charge air and ambient air were employed to ensure defined and accurately reproducible engine operating conditions and to allow the independent adjustment of specific parameters. All relevant parameters such as engine torque and speed as well as media temperatures, pressures and flow rates were measured and recorded with measuring instruments and a data acquisition system. The measuring instruments applied are specified in Section 2.2 (data selection process, cf. Table 1). The temperatures of the seven crankshaft main bearings were measured with type K thermocouples (class 1 accuracy) that were fitted through a bore in the bearing support so that the measuring tip is in contact with the external surface of the bearing shell. Figure 1 schematically illustrates this measurement setup for a single crankshaft main bearing. The instrumented bearings are numbered one to seven, starting from the clutch side. No measurements of bearing #2 are available due to sensor failure.

The operating points used for the engine tests were based on the sixteen specific operating points illustrated in Figure 2, which include various combinations of engine speed and engine torque. Nearly the entire engine operating map is covered, where engine load is defined as the percentage of the maximum available brake torque at a defined engine speed. After adjustment of each operating point, a settling time of approximately 15 min was observed to ensure that all relevant measurement parameters were in a steady state. These parameters were then recorded and averaged over a period of 30 s. In order to examine and ensure the repeatability of the measurements, three consecutive recordings were performed at each investigated operating point.

At the 50 % engine load operating points, oil inlet temperature and oil inlet pressure were varied individually and independently with the oil conditioning system. Oil temperature was varied from 90 °C (standard temperature) to 80 °C and 100 °C, and oil pressure was varied from 4 bar(g) (standard pressure) to 3 bar(g) and 6 bar(g). All parameter variations described above were carried out with three different lubricant oil viscosity grades, which were set through oil changes at engine standstill. The employed grades were SAE 10W-40, 10W-20 and 5W-20. Due to testing time limitations, the oil temperature variation was not carried out with the viscosity grade 5W-20. The viscosity of each oil in relation to the oil temperature is shown in Figure 2 (values provided by the oil supplier).

Data Selection and Model Requirements
The data include a total of 313 temperature measurements per bearing that originate from 105 different operating points (for two operating points, only two repetitions were valid). Although a large number of parameters were measured during the engine tests, only specific parameters are used to model the bearing temperatures under steady-state operation. The modeled bearing temperature should depend solely on engine parameters that would also be available on a production engine. Furthermore, parameters whose influence on the bearing temperature can be ruled out on the basis of physical considerations are excluded; otherwise, the model risks relying on parameters that merely correlate with the bearing temperature but do not cause it. The following considerations have also affected data selection:

• Due to the applied conditioning systems, coolant-related parameters such as pressure and temperature at inlet or outlet are constant and therefore irrelevant to modeling. This also applies to ambient air temperature. The coolant mass flow, however, is a measurement result that varies according to the applied engine operating point. Therefore, it is considered as a model input candidate.

• Indication system-based measurements such as in-cylinder pressures and key figures derived from them are not considered because they are usually not available on a production engine.

• Linear transformations of a single measured or calculable parameter are not used for modeling. For example, the brake mean effective pressure (BMEP) is a linear transformation of the brake torque, which is considered to be available from the ECU of a production engine. Parameters that are calculated from multiple other parameters (e.g., brake-specific fuel consumption is calculated from fuel mass flow and engine power) are considered as model input candidates.
Taken together, the measured or calculated parameters listed in Table 1 are considered for modeling the seven crankshaft main bearing temperatures. With the bearing-related information, i.e., the targeted temperature as well as the bearing identification parameter, it is possible to distinguish between the bearing positions during modeling. As an additional model input candidate, the oil type information is indirectly included via the temperature-dependent oil viscosity curves shown in Figure 2, where for each measurement, the oil temperature at the engine inlet is used as reference. To avoid any circular reasoning, it was decided not to use the crankshaft main bearing temperatures as references for a bearing-specific viscosity approximation, but rather the technically most proximate one. Moreover, this viscosity information could be adapted for use in a production engine as well.

Figure 3 shows all available temperature measurements, where each temperature profile corresponds to an associated measurement recording. As already mentioned, the temperature values for bearing #2 (MB2) are missing (for graphical representation, these values were linearly interpolated, but the interpolation must be considered inaccurate). The bearing position-related information is treated as a nominal variable, i.e., there is no ranking of the positions. To avoid an arbitrary joint numerical encoding (e.g., a single bearing variable taking values from 1 to 7), so-called one-hot encoding (also referred to as full dummy encoding) is used. As described in Table 1, with one-hot encoding each bearing position becomes a single Boolean variable (i.e., true or false), which is then binary encoded for modeling (i.e., 1 or 0). Through this encoding, it is possible to derive a single model that includes all bearing positions and can serve as reference during condition comparison of newly measured bearing temperatures.

Based on Figure 3, it also appears that the peripheral bearings #1 and #7 generally exhibit a significantly lower temperature level than bearings #3 to #6. This is likely because the thermal load is lower at the ends of the crankshaft, where there is only one neighboring crankpin and where heat dissipates more quickly due to a considerable temperature gradient towards the crankcase. All six observed bearing-related temperature distributions are similar but have shifted centers. The single temperature profiles are smooth and appear shifted by an individual base level, which is another motivation for using the one-hot encoded bearing position. However, not all curves behave uniformly, especially at bearing #5.
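As a minimal illustration of the one-hot encoding step, the following Python sketch (using pandas, with hypothetical column names and illustrative temperature values) shows how the nominal bearing position can be expanded into binary indicator variables:

```python
import pandas as pd

# Hypothetical excerpt: one row per bearing and measurement recording
df = pd.DataFrame({
    "bearing": ["MB1", "MB3", "MB4", "MB5", "MB6", "MB7"],
    "T_MB": [79.2, 95.1, 97.4, 96.0, 97.0, 92.3],  # illustrative values in degC
})

# One-hot (full dummy) encoding: one 0/1 column per bearing position
encoded = pd.get_dummies(df, columns=["bearing"], prefix="", prefix_sep="", dtype=int)
print(encoded.head())
```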
The distributions of the measured and calculated parameters shown in Figure 4 do not provide a uniform picture either. Since these parameters do not change between the bearings, they are included once (i.e., 313 observations per plot). Due to the experimental design, several parameters have a rather discrete or multimodal distribution. Furthermore, some parameter distributions are skewed or heavy-tailed.
As shown in Figure 5, some parameters are also correlated with each other. The Pearson correlations in the left correlation matrix plot are calculated using the raw values and indicate the linear relationship between two parameters. The Spearman's rank correlations are based on the parameter rankings and indicate how monotonic the relationship between two parameters is. Both correlation matrices show a similar picture and indicate so-called multicollinearity and redundancy of several parameters. For example, while all air temperature-related parameters are positively correlated, the intake air pressure at the intake manifold has a (weak) negative correlation with all of them. As might also be expected, load and engine power are positively correlated with engine torque.

However, if there are non-monotonic relationships, both Pearson and Spearman-type correlations can miss important associations [67]. In contrast, Hoeffding's D [68] is a general and robust similarity measure for detecting dependencies [67]. Dendrograms from hierarchical cluster analyses of the Pearson correlation and the Hoeffding's D similarity measures are shown in Figure 6. The further right a split occurs in a dendrogram, the stronger the correlation/dependency between the two subsequent clusters. Both dendrograms reveal a similar basic relationship structure between the engine operation parameters, but they also indicate different relationships for some parameters. While m_oil_inlet and p_oil_inlet are only strongly (linearly) correlated with each other (cf. Figure 5), m_oil_inlet is closer to p_air_intake and visc_oil_inlet in terms of Hoeffding's D. While collinearity makes it difficult to interpret the effect of individual model parameters, it does not affect predictions that are made on datasets similar to those on which a model was fit [67]. Therefore, whether collinearity is problematic is closely related to the actual aim for which a data-driven model should be used.
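A possible Python sketch of this correlation and clustering analysis is shown below. It assumes a hypothetical DataFrame engine_params holding the engine operation parameters and omits Hoeffding's D (for which no standard scikit-learn/SciPy implementation exists), so it approximates the described workflow rather than reproducing the authors' analysis.

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

# engine_params: hypothetical DataFrame with one column per engine operation parameter
pearson = engine_params.corr(method="pearson")    # linear association
spearman = engine_params.corr(method="spearman")  # monotonic (rank-based) association

# Hierarchical clustering on a correlation-based distance (1 - |r|)
distance = 1.0 - pearson.abs()
condensed = squareform(distance.values, checks=False)
tree = linkage(condensed, method="average")

dendrogram(tree, labels=pearson.columns.tolist(), orientation="right")
plt.tight_layout()
plt.show()
```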
Deciding between interpretable model types and black boxes or between parsimony and complexity are two of the many choices that need to be made when deriving a model [67]. While a simple and rather inflexible method is advantageous if inference is the goal of a data-driven approach, the interpretability of a model does not matter if the focus is on prediction [61]. However, complex and highly flexible ML methods do not necessarily result in more accurate predictions because they also carry a higher risk of overfitting [61]. For this reason, a proper model training strategy is required, especially for highly flexible ML methods. In the event that a black-box model eventually yields the best results, it is still possible to gain insights by developing an interpretable approximation to the black-box model [67].
First and foremost, the data-driven model for monitoring the bearing temperatures should be capable of accurately predicting the temperature values based on the engine parameter inputs. In terms of machine learning, a predictive model attempts to predict a given target using other variables (or features) in the dataset as inputs [70]. Since the target is given, such a task is usually referred to as supervised learning. In contrast, unsupervised learning lacks a target and aims to better understand and describe a given dataset [70]. Depending on whether the target is a numeric (continuous) or a categorical (discrete) outcome, supervised learning is further divided into regression tasks and classification tasks, respectively [70]. Since the bearing temperature is a numeric variable, ML methods for regression tasks can be applied in principle.

Machine Learning Methods
From ordinary linear regression to highly complex deep neural networks, various ML methods are available to address a regression problem. Yet since there is no single best method for all possible datasets, selecting the best approach is challenging [61]. In addition to the considerations regarding the model aim discussed above, the available data also inherently influence the choice of potential methods. This paper examines three different ML methods: linear regression (with and without lasso regularization), gradient boosting regression and support vector regression. They differ in their interpretability as well as their flexibility, where flexibility is significantly affected by so-called hyperparameters, the "adjusting screws" of an ML method that require sophisticated tuning during model training with the so-called training data.
Many different software solutions are available for the computational implementation of a machine learning project. For this paper, model training is basically carried out and controlled using the statistical programming language R [71]. Furthermore, seamless integration of Python [72] libraries such as scikit-learn [73] has been achieved through a self-implemented solution based on the R packages reticulate [74] and R6 [75].

Linear Regression and Regularization
Linear regression is a fairly simple method that often provides an adequate and interpretable description of how features affect a target [62]. The (multiple) linear model (LM) has the form

f(x) = \beta_0 + \sum_{j=1}^{p} \beta_j x_j,     (1)

where x = (x_1, \ldots, x_p)^T is the p-dimensional vector of input variables and \beta = (\beta_0, \ldots, \beta_p)^T are the corresponding model coefficients [62]. The most popular method for obtaining estimates of the model coefficients is the least squares method, which aims to minimize the residual sum of squares

RSS(\beta) = \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2,     (2)

where the (training) data consist of n instances (or observations) of targets y_i and feature vectors x_i = (x_{i1}, \ldots, x_{ip})^T, i = 1, \ldots, n. An LM has no hyperparameters to tune and is easy to fit (i.e., there is a unique solution to the least squares minimization problem). However, the set of features and the inherent model formula have to be set beforehand. Stepwise feature selection techniques (e.g., those based on statistical hypothesis tests) are often applied to find the most important features, but these techniques are associated with major problems and should be avoided [67]. In contrast, the so-called lasso [76] is a regularization method that shrinks coefficients towards zero by adding the penalty term \lambda \cdot \sum_{j=1}^{p} |\beta_j| to the RSS minimization problem (2), where the tunable hyperparameter \lambda \geq 0 determines the strength of the shrinkage penalty. With the lasso, all features are considered, but if \lambda is large enough, some coefficients are forced to be exactly zero and a feature selection is performed [61]. For a fair comparison of the features, they have to be on similar scales; thus, an initial standardization is required (i.e., each variable is centered by its mean and divided by its standard deviation) [77].
In this paper, two fixed LM structures (both realized directly in R) are considered for predicting the bearing temperatures:

• An LM including all available engine parameters (cf. Table 1) together with the one-hot encoded bearing positions (LM_all).

• An LM that relies on the categorical (one-hot encoded) bearing position only and serves as a crude baseline (LM_bearing).

For a sparser representation of the bearing temperatures, an LM with lasso regularization (LM_lasso) is also evaluated. In this paper, the R package glmnet [78] was chosen for the lasso computations because it allows for individual penalty factors per coefficient. In this way, the shrinkage of a coefficient can be omitted (i.e., the corresponding variable is always included in the model) or enforced even more strongly [77,78]. Based on the underlying data structure and the discussed model aim, the bearing position identifiers (MB1, . . . , MB7) are not penalized. All engine parameters (cf. Table 1) are penalized equally except for parameters that are directly calculated from other parameters, which are additionally penalized by the number of other parameters involved. Therefore, engine power P (the product of N and Md) and BSFC (calculated from m_fuel and P) are penalized two and three times as strongly, respectively.
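For illustration, a rough scikit-learn analogue of this standardized lasso fit is sketched below; the paper itself uses glmnet in R. The column names, the data objects X_train and y_train and the regularization value are assumptions, and the per-feature penalty factors offered by glmnet (e.g., leaving the bearing indicators unpenalized) are not reproduced here.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical column groups: one-hot bearing indicators are passed through,
# all engine parameters are standardized before the penalized fit
bearing_cols = ["MB1", "MB3", "MB4", "MB5", "MB6", "MB7"]
engine_cols = ["N", "Md", "load", "P", "m_fuel", "BSFC", "T_oil_inlet",
               "p_oil_inlet", "m_oil_inlet", "T_oil_sump", "m_air_inlet",
               "p_air_intake", "visc_oil_inlet"]

preprocess = ColumnTransformer([
    ("bearings", "passthrough", bearing_cols),
    ("engine", StandardScaler(), engine_cols),
])

lasso_lm = Pipeline([
    ("prep", preprocess),
    # alpha plays the role of lambda (assumed value); unlike glmnet, scikit-learn's
    # Lasso penalizes all coefficients uniformly, including the bearing indicators
    ("lasso", Lasso(alpha=0.02)),
])

lasso_lm.fit(X_train, y_train)  # X_train/y_train: hypothetical training data
```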

Gradient Boosting Regression
Gradient boosting (GB) regression is based on decision trees. A decision tree is a summary of rules used to split the feature space into different regions/partitions [61]. Although a single decision tree is hardly capable of adequately modeling a regression problem, many of these weak trees can be aggregated into a potentially very powerful predictive model (a "committee") using a so-called ensemble method [61,62]. With GB, the regression tree ensemble learns sequentially, i.e., each tree learns from the previous one by being fit on the residuals of the previous tree (i.e., the differences between actual and predicted target values), thereby improving the ensemble [70]. The aggregated predictive model of B trees has the form

\hat{f}(x) = \sum_{b=1}^{B} f_b(x),     (3)

where x = (x_1, \ldots, x_p)^T is the p-dimensional input vector and each f_b is a single decision tree that was fit on the residuals of f_{b-1} [70]. This GB approach minimizes the mean squared error loss function and can be considered a gradient descent algorithm that can be generalized to other loss functions as well [70].
There are many software solutions available with different variants of GB algorithms. For this paper, the widely used XGBoost library (short for eXtreme Gradient Boosting) [79] is applied via its scikit-learn API. XGBoost offers a variety of tunable hyperparameters, including additional regularization terms. Table 2 summarizes the hyperparameters used for tuning.

Table 2. XGBoost hyperparameters of the scikit-learn API for gradient boosting regression used for tuning [79].

Hyperparameter     Description
n_estimators       Number of trees used for boosting (corresponds to B)
eta                Learning rate of the boosting updates (cf. gradient descent)
max_depth          Maximum depth of a single tree
min_child_weight   Minimum number of data instances/weight required for a child node in a tree
gamma              Minimum loss reduction required for further partitioning on a leaf node
lambda             Ridge regression-analogous L2 regularization on tree weights
alpha              Lasso regression-analogous L1 regularization on tree weights
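A minimal sketch of a gradient boosting regressor set up through the XGBoost scikit-learn API is shown below; the hyperparameter values and the data objects X_train and y_train are illustrative assumptions, not the tuned settings from the study.

```python
from xgboost import XGBRegressor

# Gradient boosting regressor via the XGBoost scikit-learn API; the values of the
# Table 2 hyperparameters below are illustrative and would be tuned in practice
gb_model = XGBRegressor(
    objective="reg:squarederror",
    n_estimators=300,      # number of boosting trees (B)
    learning_rate=0.05,    # "eta"
    max_depth=4,
    min_child_weight=2,
    gamma=0.1,             # minimum loss reduction for a further split
    reg_lambda=1.0,        # L2 regularization ("lambda")
    reg_alpha=0.0,         # L1 regularization ("alpha")
)

gb_model.fit(X_train, y_train)           # hypothetical training data
t_predicted = gb_model.predict(X_train)  # predicted bearing temperatures
```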

Support Vector Regression
Support vector regression (SVR) is essentially an adaptation of the support vector machine (SVM), which was originally intended for binary classification problems. SVR aims to find a function in the feature space that does not deviate from each target by more than a tolerance margin ε while at the same time being as flat as possible [80]. While errors smaller than ε are not penalized, errors greater than ε are penalized, with the penalty weighted by an additional hyperparameter C > 0. As a result, there is a trade-off between the flatness of the function and the tolerance for larger errors [80]. The corresponding loss function for an error ξ is therefore called the ε-insensitive loss function, |ξ|_ε := max(0, |ξ| − ε) [80]. This type of SVR is usually referred to as ε-SVR [80].
Analogous to the SVM, the real strength of the SVR comes from the so-called kernel trick, in which the feature space is implicitly projected into a higher-dimensional space where the problem may be easier (i.e., linear) to solve. In this way, it is also possible to model nonlinear target behavior in the original feature space. For a p-dimensional input vector x = (x_1, \ldots, x_p)^T and based on the data consisting of n feature vectors x_i = (x_{i1}, \ldots, x_{ip})^T, i = 1, \ldots, n, the SVR model has the form

\hat{f}(x) = \beta_0 + \sum_{i=1}^{n} \alpha_i \, k(x_i, x),     (4)

where \beta_0 is an intercept term, \alpha_i, i = 1, \ldots, n, are instance-related coefficients that need to be optimized with regard to the targets and k is the kernel function [81]. There are various kernel types with different properties and even additional hyperparameters [80,81]. This paper considers the linear kernel k(x_i, x) = x_i \cdot x and the radial basis function (RBF) kernel k(x_i, x) = \exp(-\gamma \cdot \lVert x_i - x \rVert^2), \gamma > 0. While the linear kernel is very simple, the RBF kernel actually projects into an infinite-dimensional space. Thus, in contrast to the linear kernel, the interpretability of the model with respect to the feature space is lost with the RBF kernel.
The kernel selection is considered as an additional tunable hyperparameter. The SVR is evaluated using the scikit-learn implementation [73], and Table 3 summarizes the hyperparameters used for tuning. Since SVR is a distance-based method like the lasso LM, similar feature scales (e.g., via standardization) are required.

Table 3. Hyperparameters of the scikit-learn implementation for support vector regression used for tuning [73].

Hyperparameter   Description
epsilon          Margin tolerance ε of the ε-SVR
C                Trade-off (regularization) parameter
kernel           Kernel function to be used ("linear" or "rbf")
gamma            Coefficient γ of the RBF kernel
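As a small illustration of how such an ε-SVR with standardized features can be set up in scikit-learn, the following sketch uses illustrative hyperparameter values (their actual tuning is described in the next section) and hypothetical training data X_train and y_train:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Standardization + epsilon-SVR with an RBF kernel; epsilon, C, gamma and the
# kernel choice are shown with illustrative values and would be tuned in practice
svr_model = Pipeline([
    ("scale", StandardScaler()),   # SVR requires similar feature scales
    ("svr", SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.1)),
])

svr_model.fit(X_train, y_train)            # hypothetical training data
t_predicted = svr_model.predict(X_train)   # predicted bearing temperatures in degC
```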

Model Training and Selection
A good predictive model does not merely need to perform well on the already known training data; above all, it should accurately predict previously unseen test data [61]. Therefore, it is of interest to find the method that yields the lowest test error (or loss) rather than the lowest training error [61]. There are various measures for assessing the prediction error/loss of an ML method. This paper uses the mean squared error (MSE) for training and evaluation of the ML approaches presented above. Later, the mean absolute error (MAE) will also be used for comparison purposes. For n pairs of actual target values y = (y_1, \ldots, y_n)^T and model predictions \hat{y} = (\hat{y}_1, \ldots, \hat{y}_n)^T = (\hat{f}(x_1), \ldots, \hat{f}(x_n))^T, they are defined as follows:

MSE(y, \hat{y}) := \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,  MAE(y, \hat{y}) := \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|.     (5)

For better interpretation in terms of the original units, the root mean squared error (RMSE), RMSE(y, \hat{y}) := \sqrt{MSE(y, \hat{y})}, will also be reported in some instances.

In ML, the entire dataset is usually split into data for training and data for testing that is not used during model training in order to accommodate the idea of unseen test data. However, there are some pitfalls to this approach. On the one hand, if the model is trained on data completely different from what it is tested on, there is a high risk that the results are not accurate. On the other hand, if there is any information leakage from the test data to the training data, the performance evaluation on the test data might be positively biased. An example of such information leakage is the use of all data for standardization. Therefore, proper model training requires that such steps be performed solely on the training data. This also applies to the data analyses presented in Section 2.
To evaluate the performance of the final bearing temperature model, the data are randomly split into approximately 75 % training data (235 measurement recordings) and 25 % test data (78 measurement recordings). Since the random sampling is performed on the 313 measurement recordings, all six bearing measurements from one measurement recording are in the same split. In addition, the random sampling is restricted so that all two or three measurement recordings of each of the 105 engine operating points are in the same split (i.e., grouped sampling to ensure that probably very similar measurements are kept together) and the training and test data both have similar temperature distributions (i.e., stratified sampling on the mean bearing temperature per measurement recording).
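A simplified Python sketch of such a grouped train/test split is given below. It uses scikit-learn's GroupShuffleSplit on a hypothetical DataFrame recordings (one row per measurement recording with an operating-point identifier and a mean bearing temperature column) and only loosely approximates the additional stratification, so it illustrates the idea rather than the exact sampling procedure.

```python
from sklearn.model_selection import GroupShuffleSplit

# recordings: hypothetical DataFrame with one row per measurement recording,
# an "operating_point" column (105 groups) and a "T_MB_mean" column
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(
    splitter.split(recordings, groups=recordings["operating_point"])
)

train_recordings = recordings.iloc[train_idx]  # ~75 % of the recordings
test_recordings = recordings.iloc[test_idx]    # ~25 %, untouched until the final assessment

# Sanity check: temperature distributions of both splits should be similar
print(train_recordings["T_MB_mean"].describe())
print(test_recordings["T_MB_mean"].describe())
```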
Only the training data are used to tune all the ML approaches and find the best predictive model. To tune the hyperparameters of an ML method, it is again necessary to compare and evaluate the performance of different hyperparameter settings on previously unseen data. A so-called k-fold cross-validation (CV) is often used for hyperparameter tuning. For the CV, the training data are split into k equally sized parts, and each part is used once for validation (testing) while training is performed on the remaining k − 1 parts. The average of the k errors, the CV error, indicates the hyperparameter setting with the best overall performance. A strategy is also required to define the set of hyperparameter candidates in the first place. Two common approaches are the grid search and the random search. While the grid search involves evaluating all combinations of a predefined hyperparameter grid with a CV, the random search consists of randomly sampling a fixed number of combinations from specified hyperparameter distributions.
However, CV also has its potential pitfalls [82]. For example, the approach discussed above (for hyperparameter tuning) may yield optimistically biased estimates of the generalization error of a model [83]. The entire process of tuning a model should be seen as an integral part of model fitting and be validated as well [83] including all preprocessing steps. To this end, complete modeling procedures (or pipelines) must be evaluated and compared.
In this paper, the bearing temperature model is derived using a so-called repeated nested cross-validation. Figure 7 illustrates a nested CV process. During each outer CV iteration, the currently available training data are again split for the inner CV. While the inner CVs are used to train and tune the modeling procedures, the outer loop is used to compare their performance. Analogous to the basic train-test split described above, all outer and inner CV samplings are again grouped by engine operating points and stratified by the bearing temperature values. To derive reliable generalization error estimates, the entire nested CV procedure is repeated multiple times. The lowest mean CV error determines the most suitable modeling procedure for the bearing temperature CM model. This modeling procedure is then fit again on the entire training data before it is assessed on the unseen test data.
The entire nested CV procedure has been self-implemented in R, including support for parallel computing. In combination with the R-based solution, which allows seamless integration of Python, it is possible to evaluate the R and the Python procedures with random but identical CV splits. With this CV implementation, a nested CV is repeated 25 times to derive the results presented below. In the course of this, five folds are used for all outer as well as inner CVs. Since the hyperparameter search is an integral part of each modeling procedure, algorithms suited for each ML method are applied. While glmnet's default log scale-based 1D grid search [78] is used for the LM with lasso regularization, scikit-learn's [73] RandomizedSearchCV (with 1000 samples) and GridSearchCV (with 1456 combinations) are used for the GB regression and the SVR, respectively.
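As an illustration of a single (non-repeated) nested CV run, the following scikit-learn sketch nests a grouped inner CV for hyperparameter tuning inside a grouped outer CV for procedure comparison. The pipeline, the grid and the data objects X_train, y_train and groups are illustrative assumptions; the authors' implementation is in R and additionally repeats the whole procedure 25 times.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

pipeline = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])
param_grid = {"svr__C": [1, 10, 100], "svr__epsilon": [0.05, 0.1, 0.5],
              "svr__gamma": [0.01, 0.1]}  # illustrative grid

outer_cv = GroupKFold(n_splits=5)  # procedure comparison
inner_cv = GroupKFold(n_splits=5)  # hyperparameter tuning
outer_rmse = []

for train_idx, val_idx in outer_cv.split(X_train, y_train, groups=groups):
    X_tr, y_tr = X_train.iloc[train_idx], y_train.iloc[train_idx]
    X_val, y_val = X_train.iloc[val_idx], y_train.iloc[val_idx]
    g_tr = groups.iloc[train_idx]

    # Inner CV: tune the complete modeling procedure on the outer training part only
    tuned = GridSearchCV(pipeline, param_grid,
                         scoring="neg_mean_squared_error", cv=inner_cv)
    tuned.fit(X_tr, y_tr, groups=g_tr)

    # Outer validation error of the tuned procedure
    residuals = y_val.to_numpy() - tuned.predict(X_val)
    outer_rmse.append(np.sqrt(np.mean(residuals**2)))

print("Nested CV RMSE per outer fold:", np.round(outer_rmse, 3))
```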

Model Training Results

Figure 7. Nested CV process: inner CVs for tuning, outer CV for procedure comparison; the best procedure is then fit on the entire training data.

Figure 8 presents the distributions of the CV errors over the 25 nested CV repetitions. Both the MSE-based CV errors (i.e., the minimization target during training) and the CV-related MAEs are provided. The two plots show that for all CV repetitions, the SVR modeling procedure predicts the bearing temperatures best. Considering the temperature range from approximately 76 °C to 112 °C, considerably small CV errors of less than 1 °C (both RMSE and MAE) are achieved with the SVR. The XGBoost regressions perform worse than the LMs using all engine parameters (LM_all) and the lasso-regularized LMs (LM_lasso). Compared to the other methods, the results of the GB approach are also less stable. Nevertheless, all these plotted methods are able to predict the bearing temperatures well. For graphical reasons, the results of the LM that relies on the categorical bearing position only (LM_bearing) are not displayed. As listed in Table 4, this crude approach yields stable yet much higher errors than all other ML methods.

Model Assessment on Test Data
To assess the performance of the SVR on the test data, the entire modeling procedure is carried out again using all training data. Evaluated by means of a 5-fold CV, the grid search yields the optimal hyperparameters reported in Table 5. As shown in Figure 9, the SVR also performs very well on the previously unseen test data. There are only minor errors and the residuals do not show any patterns, so the model has a similar predictive accuracy throughout the entire temperature range. The largest residuals of the test data predictions are observed for bearing #5. Moreover, the error comparison in Table 6 shows that the test errors are in accordance with the CV errors. The higher errors for bearing #5 are also in agreement with the graphical observations of the bearing temperature profiles (cf. Figure 3).

Figure 9. Graphical analysis of SVR predictions on previously unseen test data.

Since the RBF kernel is chosen, the SVR does not allow for a direct interpretation of the features' importance. The LM with lasso regularization (also trained on the entire training data), however, is strongly correlated with the SVR results (Pearson correlation of 0.9881). This also applies to the previously derived CV results, where the correlation between the predictions of these two ML methods ranges between 0.9871 and 0.9905. The LM with lasso regularization (with an optimal hyperparameter λ = 0.0237) allows a direct interpretation of the features. The selected variables (nonzero coefficients) form the simple bearing temperature model

T_MB = 13.2245 − 5.7486 · MB1 − 0.2673 · MB3 + 1.8001 · MB4 + 0.7950 · MB5 + 1.7327 · MB6 − 2.5297 · MB7 + 0.0083 · N + 2.0051 · load − 0.0006 · m_oil_inlet + 0.8787 · T_oil_inlet − 0.0644 · T_oil_sump + 0.0034 · m_air_inlet − 4.7267 · p_air_intake + 0.3624 · visc_oil_inlet,     (6)

where the coefficients correspond to the non-standardized inputs of the engine parameters.

For proper interpretation of the interrelationships, it is necessary to rely on currently available information only. Although not very different from those for the entire dataset, the Pearson correlations and the Hoeffding's D statistics of the engine parameters for the training data only are provided in Figure 10. Except for the strong positive correlation between N and m_air_inlet, there are no other strong correlations among the model variables due to the lasso regularization. Replacing a model variable in the LM equation (6) above with another non-included engine parameter that is only correlated with that model variable would not greatly change the predictive performance of the model. For instance, replacing m_oil_inlet with p_oil_inlet would not significantly change the model results (if the unit-related coefficient is compensated). As might be expected, T_oil_inlet has an effect on the bearing temperature and is (weakly) related to visc_oil_inlet. However, no further dependencies on T_oil_inlet are observed. All other selected features of the LM_lasso model (6)
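To show how this interpretable reference model could be used within the condition comparison step, the sketch below evaluates Equation (6) for a single observation; the coefficients are those reported above, while the helper function and the example input values are purely illustrative assumptions.

```python
# Coefficients of the lasso LM from Equation (6); keys are the model inputs
LM_LASSO_COEFFS = {
    "intercept": 13.2245,
    "MB1": -5.7486, "MB3": -0.2673, "MB4": 1.8001,
    "MB5": 0.7950, "MB6": 1.7327, "MB7": -2.5297,
    "N": 0.0083, "load": 2.0051, "m_oil_inlet": -0.0006,
    "T_oil_inlet": 0.8787, "T_oil_sump": -0.0644, "m_air_inlet": 0.0034,
    "p_air_intake": -4.7267, "visc_oil_inlet": 0.3624,
}

def predict_t_mb(inputs):
    """Evaluate the non-standardized lasso LM (Equation (6)) for one observation.
    `inputs` maps model variables to their values; missing keys default to 0,
    which also covers the inactive one-hot bearing indicators."""
    t_mb = LM_LASSO_COEFFS["intercept"]
    for name, coeff in LM_LASSO_COEFFS.items():
        if name != "intercept":
            t_mb += coeff * inputs.get(name, 0.0)
    return t_mb

# Purely illustrative operating-point values (not taken from the measurements)
example = {"MB4": 1, "N": 1200, "load": 0.5, "m_oil_inlet": 2000,
           "T_oil_inlet": 90, "T_oil_sump": 95, "m_air_inlet": 500,
           "p_air_intake": 1.5, "visc_oil_inlet": 15}
print(f"Predicted bearing temperature: {predict_t_mb(example):.1f} degC")
```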

Discussion
Considering the temperature range from approximately 76°C to 112°C, with a prediction error of 0.3995°C (RMSE on previously unseen test data), the results presented above show that it is possible to reliably predict bearing temperatures on the basis of engine operation parameters. However, the results also demonstrate that there is often a trade-off between the interpretability and the predictive quality of a data-driven approach. While the best model obtained (an SVR with a radial basis kernel) does indeed perform excellently as it predicts the bearing temperatures on the basis of engine operation parameters, it does not allow for a direct interpretation of their importance. Nevertheless, given the wide range of ML methods applied, it has also been demonstrated that a simpler and more easily interpretable approach (the LM with lasso regularization) serves as an understandable approximation of the best model obtained. Since only a subset of the engine parameters is used for predicting the bearing temperatures, the simpler approach would also be more robust to potential sensor failures.
Considering the comparatively small amount of data available for an ML application, more data will be acquired in a follow-up measurement campaign to improve and validate the derived data-driven CM model as well as to acquire data from the currently missing bearing position #2. Hence, the bearing position correlation and encoding will be reevaluated. With the insights already gained (especially from the interpretable ML approach), meaningful data can be efficiently acquired and very low or high bearing temperatures can be specifically studied.
To further improve the performance of the predictive model, additional ML methods such as kernel ridge regression or random forests could be evaluated as well. It might also be beneficial to enhance the LM approaches by using parameter transformations or interaction terms. Linear additive models, for example, also permit modeling of the nonlinearity of certain features. Additional preprocessing steps such as principal component analysis may help to further improve the performance of a modeling procedure. All these model types and methods could easily be implemented in the previously created modeling pipeline. Of course other ML methods such as artificial neural networks could improve the predictions as well, yet such methods usually require even greater training effort, which would necessitate an adapted framework.
The present paper does not address data from transient engine operation. Since bearing temperature reacts comparatively slowly to swift changes in engine operating conditions such as engine speed and engine torque, transient operation poses special challenges to both data collection and experimental design. For reliably modeling transient engine operation, it will probably be necessary to consider time-dependent effects and correlations. In order to reflect all possible engine operating modes, future investigations will focus on transient engine operation as well.

Conclusions
This paper demonstrates that it is possible to reliably predict sliding bearing temperatures that are measured with thermocouples fitted through a bore in the bearing support. Solely depending on engine operation parameters, the data-driven model that is ultimately derived is well suited to serve as a reference during condition comparison in a CM system under steady-state engine operation. As part of such a system, it enables the identification of anomalies in bearing temperature as soon as the measured temperature is outside the limits of a comparatively small tolerance range around the predicted model value. The combination of a data-driven bearing temperature model and thermocouple-based temperature measurements, therefore, is an eminently suitable solution for monitoring the condition of sliding bearings in ICEs. An application is particularly suitable for large ICEs because the cost for bearing instrumentation is relatively low compared to the potential cost of an engine failure caused by the bearing system. Although this paper investigates crankshaft main bearings in a heavy-duty diesel engine, the approaches it discusses could be applied to other engine types or similar problems as well.

Data Availability Statement:
The study did not report any data.

Acknowledgments:
The authors would like to acknowledge the financial support of the "COMET-Competence Centres for Excellent Technologies" Programme of the Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK) and the Federal Ministry for Digital and Economic Affairs (BMDW) and the Provinces of Styria, Tyrol and Vienna for the COMET Centre (K1) LEC EvoLET. The COMET Programme is managed by the Austrian Research Promotion Agency (FFG).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: