Fatigue Life Prediction of 25CrMo4 Alloy Steel Based on Interpretable Methods

Li, Ze-Cheng; Chen, Xiao-Min

doi:10.3390/ma19122544


by Ze-Cheng Li ¹ and Xiao-Min Chen ²,*

¹ International Institute of Engineering, Changsha University of Science and Technology, Changsha 410114, China

² College of Mechanical and Vehicle Engineering, Changsha University of Science and Technology, Changsha 410114, China

* Author to whom correspondence should be addressed.

Materials 2026, 19(12), 2544; https://doi.org/10.3390/ma19122544

Submission received: 13 May 2026 / Revised: 5 June 2026 / Accepted: 9 June 2026 / Published: 12 June 2026

(This article belongs to the Special Issue Advances in Fatigue and Fracture of Materials: Mechanisms, Modelling and Emerging Experimental Methods)


Abstract

The fatigue failure of railway axles is directly associated with the operational safety of trains. As 25CrMo4 steel is commonly employed for high-speed train axles, precise evaluation of its fatigue life is essential for transportation reliability. This study compared six machine learning models following hyperparameter optimization via a differential evolution (DE) algorithm. The DE-optimized Gaussian process regression (DE-GPR) model exhibited superior predictive performance, achieving a coefficient of determination (R²) of 0.8020 and a root mean square error (RMSE) of 0.1250 on the best-performing outer test fold. Furthermore, the model was interpreted using a combination of SHapley Additive exPlanations (SHAP) and partial dependence plots (PDP) to elucidate feature importance. The results indicate that the applied stress level is the predominant feature affecting fatigue life predictions and that it interacts slightly with surface residual stress and full width at half maximum to influence the predicted fatigue life. This study can provide valuable insights into the fatigue life assessment and process optimization of 25CrMo4 steel components.

Keywords: 25CrMo4; fatigue life prediction; machine learning; SHAP method

1. Introduction

Axles, as critical components of high-speed trains, bear complex cyclic loading. 25CrMo4 alloy steel, also designated as EA4T steel, has been extensively employed in the fabrication of high-speed railway axles due to its superior mechanical properties [1,2]. Various defects inevitably occur in axles during manufacturing and in-service operation, which increases the likelihood of fatigue crack initiation and poses serious potential safety hazards to train operations [3,4,5]. To extend the fatigue life of axles, multiple fatigue performance enhancement techniques have been applied, including ultrasonic surface rolling [6], induction hardening [7], and deep rolling processes [8]. Among these, shot peening, a well-established surface modification technology, can significantly improve fatigue performance by introducing compressive residual stress into the material [9]. A previous study confirmed that micro-shot peening (MSP) can simultaneously achieve maximum surface compressive residual stresses (up to −530 MPa), minimize surface roughness, and increase surface microhardness, substantially improving the material’s fatigue limit [10]. Thus, the efficient prediction of fatigue life in 25CrMo4 steel, taking into account material defects and MSP treatment, is a critical scientific issue.

Currently, the development of physics-based theoretical models is a common approach. Rooted in well-understood physical mechanisms, this method uses experimental fatigue data to validate and refine the model, thereby enhancing predictive accuracy. For example, Ling et al. [11] incorporated the effects of defects and the nonlinear mean stress arising from the laser powder bed fusion (LPBF) process into the Chaboche fatigue damage model. Through parameter calibration based on experimental S-N fatigue data and defect characteristics, they successfully predicted the high-cycle fatigue life of Ti-6Al-4V alloy. However, physics-based models rely on empirical assumptions and mechanistic simplifications, and they may overlook the actual service conditions of railway axles. When developing a high-cycle fatigue life prediction model for TC17 alloy based on the classical Paris law of fracture mechanics, Ding et al. [12] did not consider microcrack orientation and length in their calculations. Similarly, Xu et al. [13] noted that the Paris model produces overly conservative predictions of press-fitted axle life because it neglects the stress ratio effect caused by press-fitting. Although the NASGRO model developed by their research group offers improved prediction accuracy, it ignores the load spectrum effect. Such omissions inevitably introduce prediction errors.

Machine learning models have gained widespread attention as a means of mitigating the prediction errors that arise from the incomplete analysis associated with conventional methods. Leveraging their robust theoretical foundations, machine learning approaches can effectively extract data features and quickly establish prediction models under multifactorial mechanisms, thereby overcoming the limitations of idealization [14,15]. After evaluating three different machine learning models, Zhan et al. [16] found that the random forest (RF) model achieved superior accuracy in predicting the fatigue life of SS316L material. Bao et al. [17] utilized X-ray tomography to characterize Ti-6Al-4V specimens and incorporated the extracted location, size, and morphology of internal defects into the feature set used to train a support vector machine (SVM) model, achieving a coefficient of determination (R²) of 0.99 between predictions and experimental results. Horňas et al. [18] employed a Bayesian optimization variant known as the tree-structured Parzen estimator (TPE) to optimize the hyperparameters of a gradient boosting regression (GBR) model, thus producing a highly precise predictive model. Gan et al. [19] utilized the kernel extreme learning machine (KELM), a neural network with a single hidden layer, to predict fatigue life considering mean stress effects, with the model yielding an R² of 0.961 and a mean squared error (MSE) of 0.063.

However, machine learning models are constrained by their inherent “black-box” nature, which limits their interpretability: the specific decision-making processes of these models remain opaque [15], and the relationships between input features and predicted outcomes are not intuitive. In recent years, SHapley Additive exPlanations (SHAP) [20], which is rooted in game theory, has been applied to interpret machine learning models. Horňas et al. [21] used various machine learning models to predict the fatigue life of Ti-6Al-4V and subsequently applied the SHAP method for interpretable analysis. Their results indicated that critical features such as maximum stress and defect size exhibited a significant negative correlation with fatigue life, whereas features including defect compactness (d_cmpt) and sphericity (d_sphr) positively influenced the fatigue life predictions of most models. These findings closely align with existing experimental observations. Jafari et al. [22] employed machine learning to predict the brittle fracture strength of glass and ceramic materials and utilized the SHAP method to analyze feature contributions. Their results demonstrated that the dimensionless parameter $\sqrt{t / R}$ was the most influential factor governing fracture strength, whereas Poisson’s ratio ($ν$) had negligible effects on the prediction. The feature importance interpreted by SHAP aligned closely with classical brittle fracture mechanics theories. Zhang et al. [23], in their study of the fatigue behavior of high-strength bolts, employed the SHAP method to examine the interaction effects among input features. Their analysis revealed that the interaction between SA and MAXS exerted the most pronounced influence on fatigue life, suggesting that a concurrent increase in both parameters should be avoided in practical engineering applications due to their synergistic detrimental effects. Conversely, the interaction between SA and MAXF exhibited an insignificant impact on fatigue performance. Yu et al. [24] also focused on feature interactions through SHAP analysis, elucidating that stress amplitude $σ_{am}$ exhibits strong and complex interaction mechanisms with density $ρ$, laser volume energy density $E_{V}$, and yield strength (YS). Specifically, elevated values of ρ_σ_{am}_sub and Ev_σ_{am}_sub played critical roles in enhancing the fatigue life of laser powder bed fused Ti-6Al-4V alloy. Another study [25], focusing on the fatigue life of corroded steel wires, quantified the average contribution of each feature using the SHAP method. It clarified that the stress amplitude range (S) constitutes a key parameter influencing fatigue life. Concurrently, SHAP dependence plots revealed a threshold effect for S at 360 MPa, and the adverse effect of corrosion level w on fatigue life shows a pronounced nonlinear increase once it exceeds 5%. In conclusion, SHAP effectively quantifies the contribution of features to model outputs and analyzes the influence mechanisms and interaction effects among features, substantially enhancing the interpretability of machine learning models.

This study aims to develop an interpretable machine learning model for efficiently predicting the high-cycle fatigue life of 25CrMo4 axle steel. The differential evolution (DE) algorithm, used in conjunction with nested five-fold cross-validation [26], optimizes the hyperparameters of the candidate machine learning models. The optimal model, selected through comparative evaluation, is further analyzed by integrating SHAP with partial dependence plots (PDP). This research constitutes the first application of the SHAP-PDP joint interpretative framework to MSP 25CrMo4 steel, providing relevant insights from a machine learning perspective. Additionally, partial dependence plots are employed to verify the SHAP analysis, minimizing the risk of misinterpretation from a single method and enhancing the reliability of model explanations. By balancing predictive efficiency with interpretability, this framework provides a theoretical reference for the fatigue reliability design of 25CrMo4 alloy axles.

2. Data Source and Preprocessing

2.1. Data Source

The data utilized in this study were sourced from the notched fatigue performance experiments on 25CrMo4 alloy steel conducted by Li et al. [27]. A total of 89 rotating bending fatigue tests were performed on both MSP and untreated specimens. The dataset includes the applied stress level ($σ_{a}$), surface residual stress ($σ_{rs}$), full width at half maximum (FWHM), average roughness ($R_{a}$), equivalent notch size ($\sqrt{area}$), and corresponding fatigue life values of the specimens. These variables describe the external load, residual stress field, work hardening, surface morphology, and notch defects, rendering the dataset particularly suitable for this study.

2.2. Data Analysis and Preprocessing

To investigate the correlations between features, the Pearson correlation coefficient is employed to quantify linear relationships, as defined by [28]:

r_{x y} = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}} \sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}}

(1)

where $r_{xy}$ denotes the Pearson correlation coefficient, $n$ represents the sample size, $x_{i}$ and $y_{i}$ are the ith observations of the two variables ($i = 1, …, n$), and $\bar{x}$ and $\bar{y}$ are their averages.
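Equation (1) can be evaluated directly from its definition. The following is a minimal sketch in plain Python; the function name `pearson_r` and the sample values are illustrative assumptions and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences, Eq. (1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: product of the root sums of squared deviations.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical stress levels (MPa) and log10 fatigue lives, for illustration only.
stress = [400.0, 450.0, 500.0, 550.0, 600.0]
log_life = [6.9, 6.4, 6.1, 5.5, 5.2]
r = pearson_r(stress, log_life)
```

For these made-up values the coefficient is strongly negative, mirroring the qualitative trend that higher stress shortens fatigue life. The function is undefined when either variable is constant, since the denominator is then zero.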

As illustrated in Figure 1, a strong negative correlation exists between $σ_{a}$ and $\sqrt{area}$. The most pronounced negative correlations are observed between $σ_{rs}$ and FWHM (r = −1.00), and between $σ_{rs}$ and $R_{a}$ (r = −0.99). Additionally, an exceptionally strong positive correlation is detected between FWHM and $R_{a}$, with a correlation coefficient of 0.99. These findings indicate that MSP can introduce compressive residual stresses into the material, simultaneously increasing work hardening and surface roughness, consistent with the experimental results [29].

Moreover, the relationships among the input features and between each input feature and fatigue life were analyzed. A negative correlation was found between $σ_{a}$ and $N_{f}$ (r = −0.57). By contrast, only weak correlations were observed between $N_{f}$ and the other features: the correlation coefficient is 0.14 for $σ_{rs}$, 0.25 for $\sqrt{area}$, and −0.15 and −0.16 for FWHM and $R_{a}$, respectively.

To develop the predictive model, the dataset underwent data preprocessing, which included data cleaning, feature scaling, and target variable transformation.

During feature scaling, all input features were standardized using the Z-score method, defined as follows [14]:

x^{'} = \frac{x - μ}{σ}

(2)

Here, $x$ represents the original data, $x^{'}$ is the standardized data, and $μ$ and $σ$ denote the mean and standard deviation of the original data, respectively.

The transformation of the target variable was necessary since the fatigue life ($N_{f}$) values varied across several orders of magnitude. To facilitate model convergence, the raw values were transformed to $\log_{10} N_{f}$.
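The two preprocessing steps above, Z-score scaling of the inputs (Eq. (2)) and the $\log_{10}$ transform of the target, can be sketched as follows. The helper name `zscore`, the use of the population standard deviation, and the sample values are assumptions made for illustration; the source does not state which standard deviation estimator was used.

```python
import math

def zscore(values):
    """Standardise a feature column with Eq. (2); returns (scaled, mean, std)."""
    n = len(values)
    mu = sum(values) / n
    # Population standard deviation (divides by n); this choice is an assumption.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sigma for v in values], mu, sigma

# Hypothetical applied stress levels (MPa) and cycles to failure.
sigma_a = [400.0, 450.0, 500.0, 550.0, 600.0]
cycles = [8.0e6, 2.5e6, 1.2e6, 3.0e5, 1.5e5]

scaled, mu, sigma = zscore(sigma_a)
# Target transform: lives spanning several orders of magnitude become comparable.
log_life = [math.log10(n_f) for n_f in cycles]
```

After scaling, the feature has zero mean and unit variance. When this is combined with cross-validation, estimating the mean and standard deviation on the training folds only is a common way to keep information from the held-out fold out of the scaler.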

Given the limited sample size, a fixed train–test partition was not adopted to avoid evaluation bias from a single split. Instead, 5-fold cross-validation was employed to evaluate model performance.

3. Predictive Models and Interpretable Analysis Methods

3.1. Predictive Models and Optimization Algorithms

3.1.1. Paris Law

Paris law is a fundamental empirical model for describing fatigue crack propagation behavior in engineering materials. It quantifies the relationship between the fatigue crack growth rate and the stress intensity factor range, and has long been a standard benchmark for fatigue life prediction. The core formula of the Paris law is expressed as follows [30]:

\frac{d a}{d N} = C {(Δ K)}^{m}

(3)

where $da/dN$ is the fatigue crack propagation rate (in mm/cycle), $ΔK$ is the stress intensity factor range, and $C$ and $m$ are the Paris law coefficient and exponent, respectively, with their values taken from the existing literature [27].
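As a worked illustration of how Eq. (3) leads to a life estimate, the sketch below integrates the Paris law between an initial and a final crack size. It assumes $ΔK = Y Δσ \sqrt{π a}$ with a constant geometry factor $Y$, which is a textbook form and not necessarily the formulation used in [27]. The function names and every numerical value (C, m, stress range, crack sizes) are hypothetical and serve only to show the arithmetic.

```python
import math

def delta_k(delta_sigma, a, y=1.0):
    """Assumed stress intensity factor range: Y * delta_sigma * sqrt(pi * a)."""
    return y * delta_sigma * math.sqrt(math.pi * a)

def paris_life_closed_form(c, m, delta_sigma, a0, af, y=1.0):
    """Cycles to grow a crack from a0 to af, integrating Eq. (3) analytically."""
    if m == 2:
        raise ValueError("this closed form requires m != 2")
    k = c * (y * delta_sigma * math.sqrt(math.pi)) ** m
    p = 1.0 - m / 2.0
    return (af ** p - a0 ** p) / (k * p)

def paris_life_numeric(c, m, delta_sigma, a0, af, y=1.0, steps=20000):
    """Same integral by the midpoint rule: N = sum of da / (C * dK^m)."""
    da = (af - a0) / steps
    total = 0.0
    for i in range(steps):
        a_mid = a0 + (i + 0.5) * da
        total += da / (c * delta_k(delta_sigma, a_mid, y) ** m)
    return total

# Hypothetical inputs in mutually consistent (illustrative) units.
life_cf = paris_life_closed_form(c=1e-13, m=3.0, delta_sigma=300.0, a0=0.1, af=5.0)
life_num = paris_life_numeric(c=1e-13, m=3.0, delta_sigma=300.0, a0=0.1, af=5.0)
```

The two routines agree closely, which is a useful sanity check when the closed form is replaced by a more elaborate $ΔK$ expression that can only be integrated numerically.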

3.1.2. K-Nearest Neighbor Regression

The K-nearest neighbor (KNN) regression, a well-established machine learning algorithm, is simple to implement. The fundamental principle involves finding the k nearest training samples in the feature space for a sample to be predicted, and the prediction is made by averaging the target values of these neighboring samples. The performance of the algorithm is influenced by the choice of distance metrics and the k value. The Euclidean distance is commonly employed, and the relevant computational formulas are presented below [31]:

d_{E} (x, y) = \sqrt{\sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}}

(4)

\hat{f} (x_{q}) = \frac{1}{k} \sum_{i = 1}^{k} f (x_{i})

(5)

where $x$ and $y$ are the feature vectors of two samples, $d_{E}(x, y)$ denotes the Euclidean distance, $\hat{f}(x_{q})$ represents the predicted value for the query sample $x_{q}$, $f(x_{i})$ is the actual value of the ith nearest neighbor, $k$ is the number of selected nearest neighbors, and $n$ is the total number of feature dimensions.
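Equations (4) and (5) translate almost line for line into code. The sketch below is a plain-Python illustration with hypothetical function names and toy data; it is not the implementation used in the study.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two feature vectors, Eq. (4)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training samples nearest to `query`, Eq. (5)."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    neighbours = ranked[:k]
    return sum(target for _, target in neighbours) / k

# Toy two-feature training set with made-up target values.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [6.0, 5.0, 5.5, 4.0]
pred = knn_predict(train_x, train_y, query=(0.2, 0.1), k=3)
```

With k = 3, the distant sample at (5, 5) is excluded and the prediction is the mean of the three nearby targets. Because the distance in Eq. (4) sums raw coordinate differences, features on different scales should be standardized first, as done in Section 2.2.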

3.1.3. Random Forest Regression

Random forest regression (RFR) is an ensemble learning technique that constructs multiple decision trees. Each tree in the training process utilizes bootstrap sampling of a fixed size on the training samples and random feature selection, promoting diversity among the trees and enhancing prediction accuracy. The calculation method using the averaging approach is shown in the formulas below [32]:

s_{m} = \{(x_{1}, y_{1}), \dots, (x_{m}, y_{m})\}

(6)

{\hat{y}}_{r f} = \frac{1}{k} \sum_{i = 1}^{k} f (x, s_{m}^{(i)})

(7)

where $s_{m}$ represents the data subset obtained through proportional sampling, $x_{m}$ and $y_{m}$ are the input and output vectors, respectively, ${\hat{y}}_{rf}$ is the predicted value of the target variable, $x$ denotes the input sample, $k$ is the number of decision trees, and $f(x, s_{m}^{(i)})$ is the output of the ith decision tree, trained on the ith sampled subset.

3.1.4. Adaptive Boosting Regression

Adaptive boosting regression (ABR) is an iterative ensemble algorithm that trains a series of weak learners and adjusts the sample weights based on the prediction results from the previous iteration. The final prediction result encompasses the weighted sum of the predicted values from all weak learners. This weighted combination is expressed as follows [33]:

H (x) = ν \sum_{k = 1}^{N} (\ln \frac{1}{α_{k}}) g (x)

(8)

α_{k} = \frac{e_{k}}{1 - e_{k}}

(9)

e_{k} = \sum_{i = 1}^{m} e_{k i}

(10)

In these equations, $H(x)$ represents the prediction of the model for the given input sample $x$, $ν$ is the regularization factor, $α_{k}$ denotes the weight assigned to the kth weak learner, $N$ is the total number of weak learners, $g(x)$ signifies the median of the weighted outputs from the weak learners, $e_{k}$ indicates the total error, $e_{ki}$ represents the relative error for an individual sample, and $m$ is the total number of samples.

3.1.5. Gradient Boosting Regression

Gradient boosting regression (GBR) is an ensemble model that employs decision trees as base learners. Each new weak learner is trained on the residuals of the preceding learners to gradually reduce the overall loss [34]. By iteratively superposing multiple learners, the model can approximate complex nonlinear functions with considerable accuracy. The procedure is expressed as follows [35]:

F_{0} (x) = \underset{c}{\arg \min} \sum_{i = 1}^{N} L (y_{i}, c)

(11)

c_{m, j} = \underset{c}{\arg \min} \sum_{x_{i} \in R_{m, j}} L (y_{i}, F_{m - 1} (x_{i}) + c)

(12)

F_{M} (x) = F_{0} (x) + \sum_{m = 1}^{M} \sum_{j = 1}^{J} c_{m, j} \cdot I (x \in R_{m, j})

(13)

Here, $F_{0}(x)$ is the initial model, $c$ stands for the output parameter of the initial leaf node, $L(y_{i}, c)$ defines the loss function, $N$ is the total number of samples, $c_{m,j}$ indicates the optimal correction value for the jth leaf node of the mth regression tree, $R_{m,j}$ represents the sample region corresponding to the jth leaf node of the mth regression tree, $F_{M}(x)$ is the final prediction model, $M$ signifies the total number of regression trees, and $J$ is the total number of leaf nodes in a single regression tree.
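The stage-wise construction in Eqs. (11) to (13) can be illustrated with a deliberately small sketch. It assumes a squared-error loss (so the minimizers in Eqs. (11) and (12) are simple means), a single input feature, trees with J = 2 leaves, and no shrinkage; none of these choices is stated in the source, and all names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-split tree (J = 2 leaves) on a 1-D feature under squared loss."""
    best = None
    ordered = sorted(set(xs))
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        # Eq. (12) under squared loss: the optimal leaf value is the mean residual.
        c_left = sum(left) / len(left)
        c_right = sum(right) / len(right)
        sse = (sum((r - c_left) ** 2 for r in left)
               + sum((r - c_right) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, c_left, c_right)
    return best[1:]

def fit_gbr(xs, ys, n_trees):
    """Return (F0, stumps); each stump is fitted to the current residuals."""
    f0 = sum(ys) / len(ys)  # Eq. (11) under squared loss: the mean target
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, c_left, c_right = fit_stump(xs, residuals)
        stumps.append((threshold, c_left, c_right))
        preds = [p + (c_left if x <= threshold else c_right)
                 for x, p in zip(xs, preds)]
    return f0, stumps

def predict_gbr(model, x):
    """Eq. (13): initial constant plus the leaf corrections of every tree."""
    f0, stumps = model
    return f0 + sum(c_left if x <= t else c_right for t, c_left, c_right in stumps)

# Toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [6.8, 6.5, 6.1, 5.6, 5.3, 5.1]

def training_mse(model):
    return sum((y - predict_gbr(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

mse_1 = training_mse(fit_gbr(xs, ys, n_trees=1))
mse_20 = training_mse(fit_gbr(xs, ys, n_trees=20))
```

Adding trees lowers the training error monotonically in this setting, which is why practical implementations limit the number of trees or add a learning rate to avoid overfitting small datasets.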

3.1.6. Extreme Gradient Boosting Regression

Extreme Gradient Boosting (XGBoost) is an enhanced gradient boosting regression model that uses decision trees as base learners. It iteratively trains new trees to correct prediction residuals while introducing regularization terms to control model complexity, effectively improving generalization on regression tasks [36]. The core modeling process is expressed as follows [37]:

L^{(k)} = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}^{(k - 1)} + f_{k} (x_{i})) + Ω (f_{k})

(14)

Ω (f_{k}) = γ T + \frac{1}{2} λ \sum_{j = 1}^{T} w_{j}^{2}

(15)

where $l$ is a loss function that measures the difference between the total predicted value ${\hat{y}}_{i}^{(k - 1)} + f_{k}(x_{i})$ and the target value $y_{i}$, ${\hat{y}}_{i}^{(k - 1)}$ denotes the accumulated prediction of the previous $k - 1$ trees, and $f_{k}(x_{i})$ is the correction contributed by the kth tree to improve the prediction results of the previous $k - 1$ trees. $Ω(f_{k})$ is the regularization term of the kth tree, which penalizes the complexity of the model. Here, $T$ is the total number of leaf nodes of the kth tree, $w_{j}$ is the output weight of the jth leaf node, and $γ$ and $λ$ are hyperparameters regulating the regularization term.

3.1.7. Gaussian Process Regression

Gaussian Process Regression (GPR) is a nonparametric Bayesian regression method with a solid theoretical foundation, which can simultaneously estimate the predicted value and quantify the associated uncertainty. It is particularly suitable for nonlinear regression tasks with small datasets, and has been widely applied in fatigue life prediction fields [38,39]. In this study, the GPR model is constructed with a composite kernel function consisting of a constant kernel, a radial basis function kernel and a white noise kernel. The core modeling process is expressed as follows:

y = f (x) + ε

(16)

k (x_{i}, x_{j}) = σ_{f}^{2} \exp (- \frac{\| x_{i} - x_{j} \|^{2}}{2 l^{2}}) + σ_{n}^{2} δ_{i j}

(17)

Here, $y$ is the observed target value, $f(x)$ represents the latent regression function, and $ε$ denotes Gaussian noise. $k(x_{i}, x_{j})$ is the composite kernel function of GPR, where $σ_{f}^{2}$ is the signal variance, $l$ is the length scale of the radial basis function, $σ_{n}^{2}$ is the noise variance, and $δ_{ij}$ is the Kronecker delta function.
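To make the composite kernel of Eq. (17) concrete, the sketch below builds the covariance matrix for a handful of points in plain Python. The function names and hyperparameter values are hypothetical. In scikit-learn, a kernel of this shape is commonly written as `ConstantKernel() * RBF() + WhiteKernel()`; whether the study used exactly that expression is not stated in the text.

```python
import math

def composite_kernel(xi, xj, signal_var, length_scale, noise_var, same_point):
    """Eq. (17): scaled RBF term plus a noise term on the diagonal only."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    value = signal_var * math.exp(-sq_dist / (2.0 * length_scale ** 2))
    if same_point:  # Kronecker delta: noise is added only when i == j
        value += noise_var
    return value

def kernel_matrix(points, signal_var, length_scale, noise_var):
    """Covariance matrix K with K[i][j] = k(x_i, x_j)."""
    n = len(points)
    return [[composite_kernel(points[i], points[j], signal_var, length_scale,
                              noise_var, same_point=(i == j))
             for j in range(n)]
            for i in range(n)]

# Three one-dimensional inputs with illustrative hyperparameters.
points = [(0.0,), (1.0,), (3.0,)]
K = kernel_matrix(points, signal_var=2.0, length_scale=1.0, noise_var=0.1)
```

The diagonal entries equal the signal variance plus the noise variance, while off-diagonal entries decay with distance at a rate set by the length scale. These are the quantities that the DE search tunes when GPR is the candidate model.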

3.1.8. Differential Evolution Algorithm

To prevent the convergence of the hyperparameter search to suboptimal solutions and to enhance prediction performance, this study adopts the DE algorithm for global optimization of the key hyperparameters in each candidate model. The DE algorithm is a population-based stochastic optimization algorithm that derives new candidates through differential mutation operations among vectors and demonstrates superior global search capability [40]. Specifically, hyperparameters such as the number of trees and maximum depth in an RF model are encoded as high-dimensional vectors, representing individuals in the DE population. After initializing a random population consisting of N individuals, this algorithm systematically evolves through cycles of mutation, crossover, and selection operations. A detailed description of the process is as follows [41].

In the mutation step, the mutant vector $V_{i}^{(t)}$ is produced using one of the following differential strategies:

DE / rand / 1 : V_{i}^{(t)} = X_{r_{1}}^{(t)} + F \cdot (X_{r_{2}}^{(t)} - X_{r_{3}}^{(t)})

(18)

DE / rand / 2 : V_{i}^{(t)} = X_{r_{1}}^{(t)} + F \cdot (X_{r_{2}}^{(t)} - X_{r_{3}}^{(t)}) + F \cdot (X_{r_{4}}^{(t)} - X_{r_{5}}^{(t)})

(19)

DE / best / 1 : V_{i}^{(t)} = X_{b e s t}^{(t)} + F \cdot (X_{r_{1}}^{(t)} - X_{r_{2}}^{(t)})

(20)

DE / best / 2 : V_{i}^{(t)} = X_{b e s t}^{(t)} + F \cdot (X_{r_{1}}^{(t)} - X_{r_{2}}^{(t)}) + F \cdot (X_{r_{3}}^{(t)} - X_{r_{4}}^{(t)})

(21)

DE / current - to - best / 1 : V_{i}^{(t)} = X_{i}^{(t)} + F \cdot (X_{b e s t}^{(t)} - X_{i}^{(t)}) + F \cdot (X_{r_{1}}^{(t)} - X_{r_{2}}^{(t)})

(22)

Here, $X_{r_{1}}^{(t)}$, $X_{r_{2}}^{(t)}$, $X_{r_{3}}^{(t)}$, $X_{r_{4}}^{(t)}$, and $X_{r_{5}}^{(t)}$ are mutually distinct individuals selected at random from the population, all different from the target individual $X_{i}^{(t)}$. $X_{best}^{(t)}$ represents the individual with the optimal fitness in the current generation, and the parameter $F$, known as the scaling factor, is a randomly generated value between 0 and 1.

During the crossover step, the trial vector $U_{i}^{(t)}$ is generated through binomial crossover:

u_{i, j}^{(t)} = \begin{cases} v_{i, j}^{(t)}, & j = j^{*} \text{ or } r \leq C_{r} \\ x_{i, j}^{(t)}, & \text{otherwise} \end{cases}

(23)

In this equation, $u_{i,j}^{(t)}$, $v_{i,j}^{(t)}$, and $x_{i,j}^{(t)}$ are the jth components of the trial, mutant, and target vectors, respectively, $j^{*}$ denotes a randomly chosen dimension, $C_{r}$ represents the crossover probability, and $r$ is a randomly generated value in the range [0, 1].

In the selection stage, individuals are retained based on their fitness, evaluated using the following equation:

X_{i}^{(t + 1)} = \begin{cases} U_{i}^{(t)}, & f (U_{i}^{(t)}) \leq f (X_{i}^{(t)}) \\ X_{i}^{(t)}, & \text{otherwise} \end{cases}

(24)

where $f(\cdot)$ denotes the fitness function of the individual.

Within the nested 5-fold cross-validation framework, the fitness function for each candidate in the DE algorithm is defined as the negative average coefficient of determination (R²) calculated via 5-fold inner cross-validation on the training partition of the outer fold: $f(\cdot) = -{\bar{R^{2}}}_{\text{inner 5-fold CV}}$. This fitness metric guides the evolution of model hyperparameters toward configurations that enhance prediction performance on inner validation folds. Following iterations up to the predetermined maximum number of generations, the optimal hyperparameter combination is determined (Figure 2).
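The mutation, crossover, and selection cycle of Eqs. (18), (23), and (24) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' implementation: it uses the DE/rand/1 strategy only, a fixed scaling factor instead of a randomly drawn one, clipping to keep candidates inside the search box, and a simple sphere function as a stand-in for the negative inner cross-validation R². All names and default values are hypothetical.

```python
import random

def differential_evolution(fitness, bounds, pop_size=20, f=0.6, cr=0.9,
                           generations=200, seed=0):
    """Minimise `fitness` over box `bounds` with DE/rand/1 and binomial crossover."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        next_pop, next_scores = list(pop), list(scores)
        for i in range(pop_size):
            # Mutation, Eq. (18): three mutually distinct individuals, none equal to i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[r1][d] + f * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
            # Clip to the search box (an assumption; the source does not say how
            # out-of-bounds candidates are handled).
            mutant = [min(max(v, lo), hi) for v, (lo, hi) in zip(mutant, bounds)]
            # Binomial crossover, Eq. (23): dimension j_star always comes from the mutant.
            j_star = rng.randrange(dim)
            trial = [mutant[d] if (d == j_star or rng.random() <= cr) else pop[i][d]
                     for d in range(dim)]
            # Greedy selection, Eq. (24): keep the trial if it is no worse.
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                next_pop[i], next_scores[i] = trial, trial_score
        pop, scores = next_pop, next_scores
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]

def sphere(v):
    """Stand-in objective; in the study the fitness is the negative mean inner-CV R²."""
    return sum(x * x for x in v)

best_x, best_f = differential_evolution(sphere, bounds=[(-5.0, 5.0)] * 3)
```

For hyperparameter tuning, each vector would encode one candidate configuration (for example, a kernel length scale and a noise level), and `fitness` would train the model on the inner folds and return the negative mean validation R².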

3.2. Model Performance Evaluation Metrics

To compare the accuracy of predictions across different models and identify the most suitable model for subsequent interpretable analysis, two evaluation metrics, R² and RMSE, are utilized. R² reflects the goodness of fit between the predicted and actual values. RMSE is the square root of the mean of the squared residuals between predicted and true values, thereby quantifying the overall magnitude of the prediction errors. Both metrics were computed using the transformed fatigue life values ($\log_{10} N_{f}$), as introduced in Section 2.2, to align with the model training objectives and ensure consistent error interpretation.

During the model training and optimization phase, the average R² derived from 5-fold inner cross-validation on the outer training partition was utilized as the primary criterion for evaluation. During the final testing and comparison phase, both R² and RMSE were calculated on the independent test partitions of each outer fold. The average values across all five outer folds served as the definitive criteria for evaluating model performance. The formulas for these metrics are provided as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(25)

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(26)

where $y_{i}$ is the true value of the sample, ${\hat{y}}_{i}$ is the predicted value generated by the model, $\bar{y}$ denotes the arithmetic mean of all true values, and $n$ is the total number of samples.
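Both metrics follow directly from Eqs. (25) and (26). The sketch below uses hypothetical function names and made-up values on the $\log_{10} N_{f}$ scale.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination, Eq. (25)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (26)."""
    n = len(y_true)
    return math.sqrt(sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n)

# Illustrative log10 fatigue lives and predictions (not results from the study).
y_true = [5.0, 5.5, 6.0, 6.5]
y_pred = [5.1, 5.4, 6.2, 6.4]
r2 = r2_score(y_true, y_pred)
err = rmse(y_true, y_pred)
```

Because the targets are on a logarithmic scale, an RMSE of about 0.13 here corresponds to a typical multiplicative error of roughly 10^0.13, or about 1.35 times, in cycles. Note also that R² computed on a small test fold can vary considerably between folds, which is consistent with reporting fold-to-fold standard deviations.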

3.3. Interpretable Analysis Methods

This study establishes an interpretable analysis framework to mitigate the prevalent “black-box” challenge associated with many machine learning models. This framework incorporates SHAP and PDP.

SHAP is founded on the Shapley value concept from cooperative game theory and expresses the model prediction as a linear combination of the contributions of the individual features. The formula is presented as follows [42]:

g (x^{'}) = ϕ_{0} + \sum_{i = 1}^{M} ϕ_{i} x_{i}^{'}

(27)

In this equation, $g$ represents the explanatory model, $x^{'} \in {\{0, 1\}}^{M}$ indicates the presence state of corresponding features, $ϕ_{0}$ is the baseline constant of the model, $M$ is the total number of input features, and $ϕ_{i}$ signifies the contribution value of the ith feature.

Using the entire dataset, the SHAP method first calculates the baseline (mean value of the model predictions across all samples). It then measures the deviation between the prediction for an individual instance and the baseline. Each feature is considered a participant in a cooperative game, where its contribution to this deviation is quantified through its SHAP value [43]. SHAP values collectively explain the deviation from the mean prediction to that of an individual sample’s prediction. The corresponding formula is shown as:

y_{i} = y_{b a s e} + \sum_{j = 1}^{k} f (X_{i j})

(28)

Here, $y_{i}$ denotes the predicted value for the ith sample, $X_{ij}$ represents the jth feature variable of the ith sample, $k$ is the total number of features, $y_{base}$ is the mean prediction value across all samples, and $f(X_{ij})$ corresponds to the SHAP value of $X_{ij}$. The positive or negative sign of this value indicates the corresponding feature’s positive or negative impact on the predicted value of the sample.

From a game-theoretic perspective, the SHAP value for a given feature is defined as the weighted average of its marginal contributions across all possible feature subsets, represented by [20]:

φ_{j}^{i} = \sum_{S} \frac{| S |! (k - | S | - 1)!}{k!} [f_{x} (S \cup \{x^{j}\}) - f_{x} (S)]

(29)

where $x^{j}$ denotes the jth feature of the ith sample, $φ_{j}^{i}$ is the SHAP value of the jth feature for the ith sample, $S$ ranges over all feature subsets excluding feature $x^{j}$, $k$ is the total number of input features, $|S|$ indicates the cardinality of set $S$, and $f_{x}(S \cup \{x^{j}\})$ and $f_{x}(S)$ are the model predictions with and without feature $x^{j}$, respectively.
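For a small number of features, Eq. (29) can be evaluated exactly by enumerating every subset. The sketch below does this in plain Python. It assumes that a feature absent from the subset is replaced by a single baseline value, which is a simplification (practical SHAP implementations typically average over a background dataset), and the toy model and all names are hypothetical.

```python
from itertools import combinations
from math import factorial

def coalition_value(model, x, baseline, subset):
    """Model output with features in `subset` taken from x, the rest from baseline."""
    mixed = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return model(mixed)

def shapley_values(model, x, baseline):
    """Exact Shapley value of every feature by brute-force enumeration, Eq. (29)."""
    k = len(x)
    phi = []
    for j in range(k):
        others = [i for i in range(k) if i != j]
        total = 0.0
        for size in range(k):  # subset sizes 0 .. k-1
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, baseline, s | {j})
                        - coalition_value(model, x, baseline, s))
                total += weight * gain
        phi.append(total)
    return phi

def toy_model(v):
    """Made-up model with one interaction term between features 0 and 1."""
    return 2.0 * v[0] + v[0] * v[1] - v[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The values sum to the difference between the prediction for `x` and the prediction at the baseline, which is the additivity property expressed by Eq. (28). The interaction term `v[0] * v[1]` is split equally between features 0 and 1. The enumeration cost grows as 2^k, which is manageable for the five input features considered here but not for high-dimensional problems.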

The pure interaction effect between two features is calculated using the following equation [44]:

\left\{ \begin{array}{l} φ_{j, l}^{i} = \sum_{S \subseteq F ∖ \{j, l\}} \frac{| S |! \cdot (k - | S | - 2)!}{2 \cdot (k - 1)!} \cdot δ_{j, l}^{i} (S), \quad j \neq l \\ δ_{j, l}^{i} (S) = f_{x} (S \cup \{j, l\}) - f_{x} (S \cup \{j\}) - f_{x} (S \cup \{l\}) + f_{x} (S) \end{array} \right.

(30)

where $F$ is the complete set of input features, $j$ and $l$ represent two distinct features, $δ_{j,l}^{i}(S)$ denotes the second-order marginal contribution of features $j$ and $l$ under feature subset $S$, and $φ_{j,l}^{i}$ is the pure interaction effect between feature $j$ and feature $l$.
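Equation (30) can likewise be evaluated by enumeration when the feature count is small. The sketch below is self-contained and reuses the same simplifying assumption of a single baseline value for absent features; the toy model, in which only features 0 and 1 interact, and all names are hypothetical.

```python
from itertools import combinations
from math import factorial

def coalition_value(model, x, baseline, subset):
    """Model output with features in `subset` taken from x, the rest from baseline."""
    mixed = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return model(mixed)

def shap_interaction(model, x, baseline, j, l):
    """Pure interaction effect between features j and l, Eq. (30)."""
    if j == l:
        raise ValueError("Eq. (30) is defined for two distinct features")
    k = len(x)
    others = [i for i in range(k) if i not in (j, l)]
    total = 0.0
    for size in range(k - 1):  # subset sizes 0 .. k-2
        weight = factorial(size) * factorial(k - size - 2) / (2 * factorial(k - 1))
        for subset in combinations(others, size):
            s = set(subset)
            # Second-order marginal contribution of the pair (j, l) given S.
            delta = (coalition_value(model, x, baseline, s | {j, l})
                     - coalition_value(model, x, baseline, s | {j})
                     - coalition_value(model, x, baseline, s | {l})
                     + coalition_value(model, x, baseline, s))
            total += weight * delta
    return total

def toy_model(v):
    """Made-up model: features 0 and 1 interact, feature 2 is purely additive."""
    return 2.0 * v[0] + v[0] * v[1] - v[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi_01 = shap_interaction(toy_model, x, baseline, 0, 1)
phi_02 = shap_interaction(toy_model, x, baseline, 0, 2)
```

The product term contributes 2 to the prediction at `x`, and Eq. (30) assigns half of it to the ordered pair (0, 1) and half to (1, 0). The purely additive feature 2 has zero interaction with feature 0, as expected.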

To validate and complement the SHAP analysis, PDP is employed as an auxiliary interpretative tool in this study. PDP quantifies the marginal effect of a single feature on model predictions by averaging over the remaining features [45], thereby demonstrating how the target variable varies as the feature changes. For a single feature, the PDP is defined as follows [46]:

{\hat{PD}}_{j} (x_{j}) = \frac{1}{n} \sum_{i = 1}^{n} \hat{f} (x_{j}, x_{- j}^{(i)})

(31)

where $x_{j}$ represents the target feature of interest, $n$ is the total number of training samples, $x_{- j}^{(i)}$ denotes the feature vector of the ith sample with the jth feature removed, and ${\hat{PD}}_{j}(x_{j})$ refers to the average model prediction output when the jth feature of all samples is uniformly set to the fixed value $x_{j}$.
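The averaging in Eq. (31) can be written as a short loop over a grid of values for the feature of interest. The sketch below uses a hypothetical linear toy model and made-up samples, with column 0 standing in for the applied stress and column 1 for the surface residual stress.

```python
def partial_dependence(model, samples, feature_index, grid):
    """Eq. (31): mean prediction with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in samples:
            modified = list(row)             # keep the other features as observed
            modified[feature_index] = value  # overwrite the feature of interest
            total += model(modified)
        curve.append(total / len(samples))
    return curve

def toy_model(v):
    """Made-up predictor of log10 life from two features."""
    return 7.0 - 0.004 * v[0] + 0.001 * v[1]

# Illustrative samples: [applied stress (MPa), surface residual stress (MPa)].
samples = [[400.0, -100.0], [500.0, -300.0], [600.0, -500.0]]
grid = [400.0, 500.0, 600.0]
curve = partial_dependence(toy_model, samples, feature_index=0, grid=grid)
```

For this toy model the curve decreases linearly with stress. With strongly correlated inputs, such as the residual stress, FWHM, and roughness in this dataset, some of the combinations created by overwriting one feature may be physically unrealistic, which is a known limitation of PDP and one reason to read it alongside SHAP.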

In summary, this study establishes a machine learning framework for predicting the fatigue life of 25CrMo4 steel, encompassing model construction, performance evaluation, and interpretability analysis. The framework was implemented in a Python 3.9.12 environment with scikit-learn version 1.6.1, and the overall workflow consists of the following steps. First, the fatigue life dataset undergoes preprocessing. Second, global hyperparameter optimization is performed for six candidate machine learning models using the DE algorithm combined with nested 5-fold cross-validation, in which the outer five folds evaluate generalization performance while the inner five folds guide the DE search. Third, the model demonstrating the best average performance across the outer folds is selected, and its best-performing fold is extracted for subsequent interpretable analysis. The combined SHAP-PDP approach is then employed to assess feature importance, analyze the influence mechanisms of individual features, dissect the prediction process of representative samples, and elucidate interaction patterns between features. The complete workflow is illustrated in Figure 3.
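The nested cross-validation structure described above can be illustrated by generating the index partitions alone. The sketch below is a plain-Python illustration with hypothetical function names and a simple shuffled split; the splitting utility and random seeds actually used in the study are not stated. The sample count of 89 matches the dataset described in Section 2.1.

```python
import random

def k_fold_indices(indices, n_splits, seed):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_splits] for i in range(n_splits)]
    for i in range(n_splits):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def nested_cv_splits(n_samples, outer_splits=5, inner_splits=5, seed=0):
    """Yield (outer_train, outer_test, inner_folds) for nested cross-validation."""
    for outer_train, outer_test in k_fold_indices(range(n_samples), outer_splits, seed):
        # Inner folds are drawn from the outer training partition only, so the
        # outer test fold never influences the hyperparameter search.
        inner_folds = list(k_fold_indices(outer_train, inner_splits, seed + 1))
        yield outer_train, outer_test, inner_folds

splits = list(nested_cv_splits(89))
```

In a full pipeline, the DE search would be run once per outer fold using only `inner_folds` to score candidates, and the tuned model would then be refitted on `outer_train` and scored on `outer_test`.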

4. Results and Discussion

4.1. Model Prediction Results

The machine learning models described in Section 3.1 were utilized to predict the fatigue life of 25CrMo4 steel. Global hyperparameter optimization was conducted using a nested five-fold cross-validation paired with the DE algorithm. The final performance metrics, average R² and RMSE along with their standard deviations, obtained from the outer five-fold cross-validation, were employed to select the optimal prediction model.

The performance metrics for each model from the outer five-fold cross-validation are presented in Table 1. The results reveal that the GPR model achieved the best overall prediction performance, registering an average R² of 0.6630 ± 0.1243 and an average RMSE of 0.1705 ± 0.0375, surpassing the other five machine learning models. Consequently, the DE-optimized Gaussian Process Regression (DE-GPR) was preliminarily chosen as the foundational model for further interpretability analysis.

To further validate the predictive accuracy of the DE-GPR model and to compare it with a traditional fatigue life prediction method, the test-set predictions from the best-performing outer fold of the DE-GPR model (optimal hyperparameters listed in Table 2) were compared with predictions from the classical Paris law, as depicted in Figure 4. All DE-GPR predictions for the test samples lie within the ±2× error band. In contrast, the classical Paris law yields conservative predictions for the fatigue life of both MSP-treated and untreated 25CrMo4 steel specimens: all values fall below the ideal prediction line, and the majority of samples lie outside the ±2× error band. For untreated specimens, the Paris law predictions are more consistent and closer to the ideal prediction line, whereas for the MSP-treated specimens, the predictions are more dispersed and deviate further from the ideal line. This discrepancy occurs because the classical Paris law primarily describes crack propagation and does not capture the effects on fatigue life of the surface integrity changes introduced by MSP, such as compressive residual stress and work hardening.
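For readers unfamiliar with the ±2× error band, the check amounts to asking whether the predicted life lies between half and twice the experimental life. A minimal sketch follows; the function names and the example life values are hypothetical and are not data from this study.

```python
import math


def within_scatter_band(n_pred, n_exp, factor=2.0):
    """True if n_pred lies between n_exp / factor and n_exp * factor."""
    return abs(math.log10(n_pred) - math.log10(n_exp)) <= math.log10(factor)


def band_coverage(pairs, factor=2.0):
    """Fraction of (predicted, experimental) life pairs inside the scatter band."""
    hits = sum(within_scatter_band(p, e, factor) for p, e in pairs)
    return hits / len(pairs)


def is_conservative(n_pred, n_exp):
    """A prediction below the experimental life falls under the ideal line."""
    return n_pred < n_exp


# Hypothetical (predicted, experimental) cycles to failure, for illustration only.
example_pairs = [(2.0e5, 3.1e5), (8.0e5, 5.0e5), (1.0e5, 4.5e5)]
coverage = band_coverage(example_pairs)
```

In this toy example the third pair is conservative by more than a factor of two, so it falls outside the band while remaining below the ideal prediction line, which is the pattern described for most of the Paris law predictions.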

Existing studies [27] have demonstrated that the NASGRO crack growth model can be adapted to provide accurate fatigue life predictions for MSP 25CrMo4 steel, with results falling within a ±2× error band. Similarly, the DE-GPR model yields predictions within this error band, indicating accuracy comparable to that of the modified NASGRO model. However, the construction of the NASGRO model necessitates a profound understanding of the physical mechanisms underlying fatigue crack propagation. It depends on parameter calibration and entails a complex calculation process. By contrast, the DE-GPR model, a data-driven approach, circumvents the need for explicit physical equations and can autonomously learn the relationships among multiple variables from data, offering a straightforward and efficient prediction process. Nevertheless, it is important to recognize that the DE-GPR model operates as a “black-box” with an opaque internal decision-making process, which does not provide as intuitive and clear an understanding of the physical causal relationships between features and fatigue life as do the Paris law and NASGRO model.

Based on the feature correlation analysis in Section 2.2, $σ_{rs}$, FWHM, and $R_{a}$ are very strongly linearly correlated, suggesting significant information redundancy among these features. To investigate the impact of this redundancy on model performance, three feature reduction experiments were designed. Each experiment removed one pair of the three highly correlated features and retained the other three features, then repeated the nested five-fold DE hyperparameter optimization and model training. The results were compared with those of the DE-GPR model trained on the full feature set, as shown in Table 3.

The results indicate that model performance improves slightly after feature dimensionality reduction compared with the all-feature model. The maximum differences in the average R² and RMSE were 0.0266 and 0.0065, respectively, which indicates a certain degree of information redundancy among $σ_{rs}$, FWHM, and $R_{a}$. Removing some redundant features can reduce noise interference. Although feature reduction improves model performance to some extent, the full-feature model retains more complete physical information and supports a more thorough analysis of feature contributions and interaction mechanisms. This study therefore retained the full-feature DE-GPR model for the subsequent interpretability analysis.
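The three reduced feature sets and the reported maximum differences can be checked with a few lines of Python. In this sketch the column names (`sigma_a`, `sqrt_area`, `R_a`, `sigma_rs`, `FWHM`) and the helper `reduced_feature_sets` are hypothetical labels chosen for illustration; the scores are copied from Table 3.

```python
from itertools import combinations

ALL_FEATURES = ["sigma_a", "sqrt_area", "R_a", "sigma_rs", "FWHM"]
CORRELATED = ["sigma_rs", "FWHM", "R_a"]


def reduced_feature_sets(all_features, correlated, n_drop=2):
    """Yield (dropped, kept) lists, dropping n_drop of the correlated features at a time."""
    for dropped in combinations(correlated, n_drop):
        kept = [f for f in all_features if f not in dropped]
        yield list(dropped), kept


subsets = list(reduced_feature_sets(ALL_FEATURES, CORRELATED))

# Mean outer-fold scores copied from Table 3: (mean R², mean RMSE).
TABLE_3 = {
    "all": (0.6630, 0.1705),
    ("sigma_rs", "FWHM"): (0.6848, 0.1652),
    ("sigma_rs", "R_a"): (0.6896, 0.1640),
    ("FWHM", "R_a"): (0.6777, 0.1665),
}
base_r2, base_rmse = TABLE_3["all"]
reduced = [scores for key, scores in TABLE_3.items() if key != "all"]
max_r2_gain = max(r2 - base_r2 for r2, _ in reduced)
max_rmse_drop = max(base_rmse - rmse for _, rmse in reduced)
```

Each reduced set keeps three features (the applied stress level, the equivalent notch size, and one of the three correlated features), and the largest gains both come from the experiment that excludes $σ_{rs}$ and $R_{a}$.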

4.2. Results of Interpretability Analysis

4.2.1. Feature Importance

Using the SHAP method, a global attribution analysis was first conducted on the best fold of the outer five-fold cross-validation for the optimally configured DE-GPR model. The mean absolute SHAP value was used to evaluate the contribution of each input feature to the predicted fatigue life. As illustrated in Figure 5, $σ_{a}$ has the highest mean absolute SHAP value (0.3033), indicating that it is the most influential feature for the model’s predictions. $\sqrt{area}$ follows with a value of 0.1569, ranking second in importance. The features $R_{a}$, $σ_{rs}$, and FWHM have considerably lower values of 0.0987, 0.0958, and 0.0669, respectively, with FWHM contributing the least. The overall feature importance ranking for the predicted fatigue life, derived from the SHAP analysis, is therefore $σ_{a}$, $\sqrt{area}$, $R_{a}$, $σ_{rs}$, and FWHM.
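As a small illustration of how this ranking is obtained, the sketch below computes the mean absolute SHAP value per feature from a matrix of per-sample SHAP values and then ranks the values reported for Figure 5. The helper names and feature labels are hypothetical; the normalized shares are an extra convenience that the study itself does not report.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Column-wise mean of |SHAP value| over samples, as a {feature: value} dict."""
    n = len(shap_rows)
    return {name: sum(abs(row[j]) for row in shap_rows) / n
            for j, name in enumerate(feature_names)}


def rank_features(importance):
    """Feature names sorted from most to least important."""
    return sorted(importance, key=importance.get, reverse=True)


# Mean |SHAP| values reported in the text for Figure 5.
REPORTED = {"sigma_a": 0.3033, "sqrt_area": 0.1569, "R_a": 0.0987,
            "sigma_rs": 0.0958, "FWHM": 0.0669}

ranking = rank_features(REPORTED)
total = sum(REPORTED.values())
shares = {name: value / total for name, value in REPORTED.items()}
```

On these numbers the applied stress level accounts for roughly 42% of the summed mean absolute SHAP values, which gives a sense of how strongly it dominates the other four features.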

Figure 6 depicts the SHAP beeswarm plot, where each point represents the SHAP value of an individual sample, color-coded according to the magnitude of the corresponding feature. Each row shows the influence of a specific feature on predicted fatigue life. High-value sample points for $σ_{a}$ and $\sqrt{area}$ are predominantly located on the left side of the zero axis, with negative SHAP values, whereas low-value samples are found on the right side with positive SHAP values. This pattern indicates that both features have a detrimental effect on the predicted fatigue life. The distribution range of $\sqrt{area}$ is narrower than that of $σ_{a}$, signifying its weaker overall influence on the model’s predictions. For $σ_{rs}$, which ranges from −430 MPa to 34 MPa, samples with tensile residual stress are concentrated on the left of the zero axis and therefore lower the predicted fatigue life. Conversely, samples with compressive residual stress are concentrated on the right side and contribute positively to predicted fatigue life. This pattern agrees with the literature, which reports that tensile residual stress is deleterious, whereas compressive residual stress is advantageous for fatigue performance [47].

$R_{a}$ exhibits a distribution trend comparable to that of $σ_{rs}$: high values correspond to negative SHAP values, and low values are associated with positive SHAP values, indicating its adverse effect on predicted fatigue life. Among all features, FWHM presents the most significant SHAP distribution differences. Elevated FWHM values correlate with positive SHAP values, reflecting the established relationship between FWHM and the square root of the dislocation density [48]; a higher dislocation density enhances plasticity and thus improves fatigue resistance.

4.2.2. Interpretation of the Prediction for a Single Sample

Figure 7 illustrates the prediction process for Sample 18 and the underlying decision logic of the model. The baseline prediction $E[f(x)]$ derived from the SHAP method is 5.8091 (on a logarithmic scale), corresponding to the mean predicted fatigue life of the dataset, i.e., $10^{5.8091}$ cycles. The $σ_{a}$ for this sample is 460.0 MPa, significantly higher than the average in the training set. The SHAP analysis assigns a negative correction of −0.5392 to this feature, resulting in a corrected fatigue life of $10^{5.2699}$ cycles. This negative contribution is consistent with the established understanding that higher stress levels impair fatigue life, indicating that the model has captured this relationship. Additionally, both the FWHM and $σ_{rs}$ contribute positively to the model prediction. Conversely, the $R_{a}$ value, which is higher than the average, provides a small negative correction of −0.0810. The value of $\sqrt{area}$ for this sample is 67.4, and the SHAP analysis assigns this feature a positive contribution of 0.1549. Consequently, the baseline prediction is adjusted to a final predicted fatigue life of $10^{5.5270}$ cycles. This sample clearly demonstrates the additive nature of SHAP values in decomposing model predictions.
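The additivity property can be verified with the numbers quoted above. The following sketch uses only the values stated in the text; the individual contributions of FWHM and $σ_{rs}$ are not quoted, so only their combined contribution is inferred from the additivity relation. The variable and function names are hypothetical.

```python
BASE_VALUE = 5.8091   # E[f(x)] in log10(cycles), from the text
FINAL_PRED = 5.5270   # final prediction in log10(cycles), from the text

# Contributions quoted in the text (log10 scale).
REPORTED_CONTRIB = {"sigma_a": -0.5392, "R_a": -0.0810, "sqrt_area": 0.1549}


def remaining_contribution(base, prediction, known):
    """SHAP additivity: prediction = base + sum of all feature contributions."""
    return prediction - base - sum(known.values())


# Combined contribution of the two features whose values are not quoted.
fwhm_plus_sigma_rs = round(
    remaining_contribution(BASE_VALUE, FINAL_PRED, REPORTED_CONTRIB), 4)

# Intermediate value after applying only the stress correction.
after_stress = round(BASE_VALUE + REPORTED_CONTRIB["sigma_a"], 4)

# Back-transform the final prediction from log10(cycles) to cycles.
predicted_cycles = 10 ** FINAL_PRED
```

The inferred combined contribution of FWHM and $σ_{rs}$ is positive, which agrees with the statement that both features raise the prediction for this sample.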

4.2.3. Feature Marginal Effects Based on PDP

Figures 8 and 9 display the partial dependence plots for the input features, showing their marginal effects on the predicted fatigue life of 25CrMo4 steel. As depicted in Figure 8, the predicted fatigue life decreases from 3.25 × 10⁶ cycles to 7.9 × 10⁴ cycles as the applied stress level increases from 200 MPa to 550 MPa. This trend is consistent with classical S-N curve behavior, supporting both the reliability of the data and the predictive capacity of the established model.

Figure 9 displays the partial dependence plots for the remaining four features. As shown in Figure 9a, the predicted fatigue life diminishes with an increase in average roughness. This observation aligns with the theory that surface roughness influences the fatigue crack growth threshold by inducing crack closure [47]. Figure 9b depicts an upward trend in predicted fatigue life as the surface residual stresses become more compressive. This pattern confirms that compressive residual stress effectively inhibits the initiation and propagation of fatigue cracks [49]. For full width at half maximum (Figure 9c), the predicted fatigue life demonstrates a gradual increase as its value rises. Conversely, as depicted in Figure 9d, the predicted fatigue life decreases with an increase in the equivalent notch size. The trends derived from the PDPs are highly consistent with those obtained in the previous SHAP beeswarm plot analysis. The consistency between these two analytical results and the established mechanisms further authenticates the model interpretation.
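The marginal effects discussed in this subsection follow the usual partial dependence recipe: fix one feature at a grid value for every sample, predict, and average. A minimal pure-Python sketch is given below. The stand-in model `toy_predict`, its coefficients, and the toy rows are hypothetical and unrelated to the fitted DE-GPR model.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over the data with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)              # copy so the data stay untouched
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_predict(row):
    """Hypothetical model: log10(life) falls with stress (col 0), rises with col 1."""
    return 8.0 - 0.005 * row[0] + 0.1 * row[1]


toy_rows = [[300.0, 2.5], [400.0, 3.9], [500.0, 2.6]]
stress_grid = [200.0, 375.0, 550.0]
pd_curve = partial_dependence(toy_predict, toy_rows, 0, stress_grid)
```

For this toy model the curve decreases monotonically with stress, mirroring the qualitative S-N trend reported for Figure 8. In practice the same computation would be applied to the trained model's `predict` method.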

4.2.4. Interaction Effects Among Features

To further explore interaction effects among key features, the pure SHAP interaction values between paired features were computed. Figure 10 depicts the pure SHAP interaction values between surface residual stress and applied stress level, while Figure 11 displays those between full width at half maximum and applied stress level.
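To make the notion of a pairwise interaction value concrete, the sketch below computes the exact off-diagonal Shapley interaction index for a toy model by enumerating feature subsets. It assumes that the "pure" interaction values correspond to the off-diagonal SHAP interaction values defined by Lundberg and co-workers, and it replaces absent features with a single baseline vector instead of averaging over a background dataset as SHAP libraries typically do. The toy model and all names are hypothetical.

```python
from itertools import combinations
from math import factorial


def value(model, x, baseline, subset):
    """Model output with features in `subset` taken from x and the rest from baseline."""
    mixed = [x[k] if k in subset else baseline[k] for k in range(len(x))]
    return model(mixed)


def shap_interaction(model, x, baseline, i, j):
    """Exact off-diagonal Shapley interaction value between features i and j (i != j)."""
    m = len(x)
    others = [k for k in range(m) if k not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        # Weight |S|! (M - |S| - 2)! / (2 (M - 1)!) for coalitions of this size.
        weight = factorial(size) * factorial(m - size - 2) / (2 * factorial(m - 1))
        for combo in combinations(others, size):
            s = set(combo)
            delta = (value(model, x, baseline, s | {i, j})
                     - value(model, x, baseline, s | {i})
                     - value(model, x, baseline, s | {j})
                     + value(model, x, baseline, s))
            total += weight * delta
    return total


def toy_model(v):
    """Hypothetical model with one product term coupling features 0 and 1."""
    return 2.0 * v[0] - v[1] + 0.5 * v[0] * v[1] + 3.0 * v[2]


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi_01 = shap_interaction(toy_model, x, baseline, 0, 1)
phi_02 = shap_interaction(toy_model, x, baseline, 0, 2)
```

Here `phi_01` equals half of the product term (the other half is attributed to the symmetric entry), while `phi_02` is zero because features 0 and 2 enter the toy model additively. Small interaction values relative to the main effects, as reported in this subsection, therefore indicate a model that is close to additive in the paired features.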

The interaction values shown in Figure 10 exhibit distinct characteristics corresponding to the two surface treatment states. In the high compressive residual stress region (approximately −400 MPa), which includes the MSP specimens, the pure interaction values range from −0.0018 to +0.0041. Higher applied stress levels tend to yield positive interaction values, while lower applied stress levels are associated with negative ones. This pattern may reflect a physical coupling mechanism between the stable compressive stress layer induced by MSP and external cyclic loading. The variation in interaction values with applied stress magnitude may be linked to residual stress relaxation [50]. For 25CrMo4 alloy steel, relaxation may be associated with the number of cycles: lower applied stress leads to longer fatigue lives, and under extended cyclic periods, relaxation is more likely to occur, potentially resulting in a weak negative pure interaction in the SHAP results.

For the low residual stress region (ranging from −50 MPa to 35 MPa), corresponding to the untreated specimens, the interaction values are distributed between −0.0010 and +0.0035. Here, higher applied stress levels correlate with negative interaction values, while relatively low applied stress levels yield slightly positive values. This distribution likely reflects the absence of an effective protective stress layer in untreated specimens. In the absence of compressive residual stress, crack propagation is primarily driven by the applied stress itself, and the interaction between the surface stress state and external loading appears relatively weak and passive.

Figure 11 reveals two distinct interaction trends associated with different ranges of FWHM values. For a low FWHM value (approximately 2.55), characteristic of untreated specimens, the pure interaction value shifts gradually and monotonically from +0.0031 to −0.0012 as the applied stress level increases, that is, from positive to negative. This trend may originate from the initially low dislocation density in the untreated surface, whereby the interaction is predominantly influenced by the external applied stress.

Conversely, for a high FWHM value (approximately 3.9), associated with MSP-treated specimens, the pure interaction value decreases from +0.0040 to −0.0017 as the applied stress level decreases. In this scenario, higher levels of applied stress correspond to positive interactions, whereas lower stress levels result in negative interactions. This differential trend can be attributed to the pre-established high-density dislocation network caused by MSP, which creates a work-hardened surface layer. Under conditions of high applied stress, this work-hardened layer retards crack initiation, thus positively affecting the predicted fatigue life. For 25CrMo4 alloy steel, the influence of the work-hardened layer appears to be minimal under low stress amplitudes, which could account for the predominantly negative or neutral SHAP interaction values at these lower stresses.

Although the interaction effects observed are significantly smaller than the primary effects of individual features, their orderly distribution might indicate potential synergistic relationships between surface treatment characteristics and external cyclic loading. While primary effects continue to dominate the model predictions, these secondary interaction features enrich and refine the model’s interpretation. However, the physical interpretations of SHAP interaction values, being derivations from statistical associations within the model, require further experimental validation.

5. Conclusions

Based on 89 sets of fatigue test data of 25CrMo4 alloy steel, six machine learning algorithms were employed to predict fatigue life. Ultimately, a DE-GPR model was established. To elucidate the model decisions, this study utilized an interpretable framework combining the SHAP method with PDP to assess the significance and influence mechanisms of each feature in predicting fatigue life, as well as to examine the interaction patterns between features. The main conclusions are summarized as follows:

(1) Feature importance ranking based on mean absolute SHAP values shows that applied stress level is the most influential feature in the fatigue life predictions, followed by equivalent notch size, average roughness, surface residual stress, and FWHM. Among these, applied stress level, equivalent notch size, and average roughness exhibit a negative association with predicted fatigue life, whereas FWHM shows a positive association. The effect of surface residual stress is more complex.
(2) Marginal effect analysis confirms that the variation in predicted fatigue life with applied stress level follows the typical S-N curve shape. The trends for all features are consistent with the SHAP beeswarm plot, thus enhancing the reliability of the interpretation.
(3) Surface residual stress and FWHM exhibit distinct interactive characteristics with varying levels of applied stress under different surface treatment conditions. For MSP specimens within regions of high compressive residual stress, the interaction values transition from positive to negative as the applied stress level decreases. This shift may be linked to the relaxation of residual stress under long-term cyclic loading. In the case of MSP specimens with high FWHM values, the interaction values decrease gradually as applied stress declines. This observation suggests that in 25CrMo4 alloy steel, the work-hardened layer appears to contribute only minimally under low stress amplitudes.

This study successfully developed a DE-GPR model with satisfactory predictive performance for the fatigue life prediction of 25CrMo4 alloy steel. It also systematically explored the influence mechanisms of features using the SHAP-PDP interpretable framework. However, some limitations remain. The current dataset comprises only 89 samples. Although nested five-fold cross-validation was employed to rigorously evaluate the generalization performance and mitigate overfitting, the model’s applicability to broader conditions still requires confirmation with larger and more diverse datasets. Consequently, the proposed model is currently only suitable for predicting the high-cycle fatigue life of MSP 25CrMo4 steel at room temperature. Additionally, the physical mechanisms underlying the observed interaction effects necessitate further targeted experimental validation.

Author Contributions

Methodology, Z.-C.L.; Validation, Z.-C.L.; Formal analysis, Z.-C.L.; Resources, X.-M.C.; Data curation, X.-M.C.; Writing—original draft, Z.-C.L.; Writing—review and editing, X.-M.C.; Supervision, X.-M.C.; Project administration, X.-M.C.; Funding acquisition, X.-M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 52475148).

Data Availability Statement

The raw experimental data presented in this study (the fatigue test and surface condition datasets of notched UP and MSP specimens) are available in reference [27] at https://doi.org/10.1016/j.engfracmech.2022.108992. The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Liao, D.; Gao, J.-W.; Zhu, S.-P.; Correia, J.; De Jesus, A.; Calçada, R. Fatigue Behaviour of EA4T Notched Specimens: Experiments and Predictions Using the Theory of Critical Distance. Eng. Fract. Mech. 2023, 286, 109269.
2. Zhang, Y.; Huang, G.; He, J.; Wang, M. Hot Deformation Behavior and Microstructure Evolution Mechanisms of EA4T Axle Steel for High-Speed Train Application. J. Iron Steel Res. Int. 2025, 32, 2847–2863.
3. Gao, J.-W.; Dai, X.; Zhu, S.-P.; Zhao, J.-W.; Correia, J.A.F.O.; Wang, Q. Failure Causes and Hardening Techniques of Railway Axles—A Review from the Perspective of Structural Integrity. Eng. Fail. Anal. 2022, 141, 106656.
4. Luo, Y.; Yuan, P.; Li, G.; Yang, B.; Ao, N.; Li, Z.; Wu, Y.; Zhang, G.; Wu, S. Corrosion Fatigue Behavior and Life Prediction of Railway Axle EA4T Alloy Steel with Artificial Indentation. Eng. Fract. Mech. 2024, 296, 109835.
5. Zhao, H.; Han, R.-P.; Gao, J.-W.; Zhu, S.-P.; Han, J. Fatigue Performance Evaluation of High-Strength Railway Axles Subjected to Different Surface Defects. Alex. Eng. J. 2025, 127, 66–74.
6. Qin, T.; Ao, N.; Ren, X.; Zhao, X.; Wu, S. Determination of Optimal Ultrasonic Surface Rolling Parameters to Enhance the Fatigue Strength of Railway Axle EA4T Steel. Eng. Fract. Mech. 2022, 275, 108831.
7. Makino, T.; Kozuka, C.; Hata, T.; Kato, T.; Yamamoto, M.; Minoshima, K. Fatigue Properties of Non-Press-Fitted Part of Full-Scale Induction-Hardened Axles of Medium-Carbon Steel for High-Speed Railway Vehicles. Int. J. Fatigue 2025, 190, 108664.
8. Pertoll, T.; Buzzi, C.; Leitner, M.; Simunek, D.; Boronkai, L. Residual Life Assessment of Deep Rolled Railway Axles Considering the Effect of Process Parameters. Procedia Struct. Integr. 2024, 57, 250–261.
9. Unal, O.; Maleki, E.; Karademir, I.; Husem, F.; Efe, Y.; Das, T. Effects of Conventional Shot Peening, Severe Shot Peening, Re-Shot Peening and Precised Grinding Operations on Fatigue Performance of AISI 1050 Railway Axle Steel. Int. J. Fatigue 2022, 155, 106613.
10. Li, X.; Zhang, J.; Yang, B.; Zhang, J.; Wu, M.; Lu, L. Effect of Micro-Shot Peening, Conventional Shot Peening and Their Combination on Fatigue Property of EA4T Axle Steel. J. Mater. Process. Technol. 2020, 275, 116320.
11. Ling, C.; Jia, Y.; Fu, R.; Zheng, L.; Zhong, Z.; Hong, Y. Fatigue Life Prediction for LPBF-Fabricated Ti-6Al-4V up to Very-High-Cycle Regime Based on Continuum Damage Mechanics Incorporating Effect of Defects. Int. J. Fatigue 2024, 181, 108131.
12. Ding, M.C.; Zhang, Y.L.; Lu, H.T. Fatigue Life Prediction of TC17 Titanium Alloy Based on Micro Scratch. Int. J. Fatigue 2020, 139, 105793.
13. Xu, T.; Lu, L.; Zeng, D.; Zou, L. Fretting Fatigue Crack Growth Simulation and Residual Life Assessment of Railway Press-Fitted Axle. Eng. Fract. Mech. 2023, 286, 109290.
14. Wang, H.; Li, B.; Gong, J.; Xuan, F.-Z. Machine Learning-Based Fatigue Life Prediction of Metal Materials: Perspectives of Physics-Informed and Data-Driven Hybrid Methods. Eng. Fract. Mech. 2023, 284, 109242.
15. Hamada, A.; Elyamny, S.; Abd-Elaziem, W.; Elkatatny, S.; Darwish, M.A.; Sebaey, T.A.; Järvenpää, A.; Vineesh, K.P.; Elsheikh, A.H. Advancing Fatigue Life Prediction with Machine Learning: A Review. Mater. Today Commun. 2025, 43, 111525.
16. Zhan, Z.; Li, H. Machine Learning Based Fatigue Life Prediction with Effects of Additive Manufacturing Process Parameters for Printed SS 316L. Int. J. Fatigue 2021, 142, 105941.
17. Bao, H.; Wu, S.; Wu, Z.; Kang, G.; Peng, X.; Withers, P.J. A Machine-Learning Fatigue Life Prediction Approach of Additively Manufactured Metals. Eng. Fract. Mech. 2021, 242, 107508.
18. Horňas, J.; Běhal, J.; Homola, P.; Doubrava, R.; Holzleitner, M.; Senck, S. A Machine Learning Based Approach with an Augmented Dataset for Fatigue Life Prediction of Additively Manufactured Ti-6Al-4V Samples. Eng. Fract. Mech. 2023, 293, 109709.
19. Gan, L.; Wu, H.; Zhong, Z. Fatigue Life Prediction Considering Mean Stress Effect Based on Random Forests and Kernel Extreme Learning Machine. Int. J. Fatigue 2022, 158, 106761.
20. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
21. Horňas, J.; Materna, A.; Glinz, J.; Yosifov, M.; Senck, S. Multivariate Interpolation and Machine Learning Models for Extreme Defects-Based Fatigue Life Prediction of Ti6Al4V Specimens Fabricated by SLM. Eng. Fract. Mech. 2025, 314, 110756.
22. Jafari, A.; Mollaali, M.; Ma, L.; Shahmansouri, A.A.; Zhou, Y.; Dugnani, R. Brittle Fracture Strength Prediction via XML with Reliability Considerations. Eng. Fract. Mech. 2025, 328, 111555.
23. Zhang, S.; Lei, H.; Zhou, Z.; Wang, G.; Qiu, B. Fatigue Life Analysis of High-Strength Bolts Based on Machine Learning Method and SHapley Additive exPlanations (SHAP) Approach. Structures 2023, 51, 275–287.
24. Yu, A.; Zhou, Q.; Pan, Y.; Wan, F.; Kuang, F.; Lu, X. Hybrid Clustering-Enhanced Interpretable Machine Learning for Fatigue Life Prediction across Various Cyclic Stages in Laser Powder Bed Fused Ti-6Al-4V Alloy. Int. J. Fatigue 2025, 198, 108995.
25. Huang, T.; Wan, C.; Liu, T.; Zhang, Y.; Lu, X.; Ding, Y.; Zhao, H.; Miao, C.; Xue, S. Data-Driven Fatigue Life Prediction of Corroded Steel Wires: A Transfer Learning on Stacking Interpretable Model and Feature Sensitivity Analysis. Int. J. Fatigue 2026, 207, 109498.
26. Wong, T.-T. Performance Evaluation of Classification Algorithms by K-Fold and Leave-One-out Cross Validation. Pattern Recognit. 2015, 48, 2839–2846.
27. Li, H.; Zhang, J.; Hu, L.; Su, K. Notch Fatigue Life Prediction of Micro-Shot Peened 25CrMo4 Alloy Steel: A Comparison between Fracture Mechanics and Machine Learning Methods. Eng. Fract. Mech. 2023, 277, 108992.
28. Xu, L.; Wang, Y.; Mo, L.; Tang, Y.; Wang, F.; Li, C. The Research Progress and Prospect of Data Mining Methods on Corrosion Prediction of Oil and Gas Pipelines. Eng. Fail. Anal. 2023, 144, 106951.
29. Ma, H.; Li, B.; Xue, H. Shot Peening-Induced Surface Integrity Governing Fatigue Performance of Al Alloy in High to Very High Cycle Regime: Synergistic Effects of Residual Stress, Roughness and Microstructure. J. Mater. Res. Technol. 2025, 39, 4866–4881.
30. Tanaka, K. Fatigue Crack Propagation from a Crack Inclined to the Cyclic Tensile Axis. Eng. Fract. Mech. 1974, 6, 493–507.
31. Sotiropoulou, K.F.; Vavatsikos, A.P.; Botsaris, P.N. A Hybrid AHP-PROMETHEE II Onshore Wind Farms Multicriteria Suitability Analysis Using kNN and SVM Regression Models in Northeastern Greece. Renew. Energy 2024, 221, 119795.
32. Guo, J.; Zan, X.; Wang, L.; Lei, L.; Ou, C.; Bai, S. A Random Forest Regression with Bayesian Optimization-Based Method for Fatigue Strength Prediction of Ferrous Alloys. Eng. Fract. Mech. 2023, 293, 109714.
33. Feng, D.-C.; Liu, Z.-T.; Wang, X.-D.; Chen, Y.; Chang, J.-Q.; Wei, D.-F.; Jiang, Z.-M. Machine Learning-Based Compressive Strength Prediction for Concrete: An Adaptive Boosting Approach. Constr. Build. Mater. 2020, 230, 117000.
34. Guo, Y.; Rui, S.-S.; Xu, W.; Sun, C. Machine Learning Method for Fatigue Strength Prediction of Nickel-Based Superalloy with Various Influencing Factors. Materials 2022, 16, 46.
35. Zhao, Y.; Lai, M.; Wu, Y.; Li, G.; Jiang, H.; Cui, J. Fatigue Life Prediction of Aluminum-Steel Magnetic Pulse Crimped Joints Based on Point Cloud Measurement and Gradient Boosting Regression Trees. Int. J. Fatigue 2025, 198, 109020.
36. Wang, Q.; Yao, G.; Kong, G.; Wei, L.; Yu, X.; Jianchuan, Z.; Ran, C.; Luo, L. A Data-Driven Model for Predicting Fatigue Performance of High-Strength Steel Wires Based on Optimized XGBOOST. Eng. Fail. Anal. 2024, 164, 108710.
37. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: San Francisco, CA, USA, 2016; pp. 785–794.
38. Gao, J.; Wang, C.; Xu, Z.; Wang, J.; Yan, S.; Wang, Z. Gaussian Process Regression Based Remaining Fatigue Life Prediction for Metallic Materials under Two-Step Loading. Int. J. Fatigue 2022, 158, 106730.
39. Tang, C.-Z.; Li, H.-W.; Li, K.-S.; Lei, X.-L.; Cheng, L.-Y.; Ju, L.; Li, W.; Zeng, F.; Zhang, X.-C. Data-Driven Fatigue Life Prediction of Small-Deep Holes in a Nickel-Based Superalloy after a Cold Expansion Process. Int. J. Fatigue 2024, 181, 108159.
40. Storn, R.; Price, K. Differential Evolution—A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. J. Glob. Optim. 1997, 11, 341–359.
41. Cai, Z.; Yang, X.; Zhou, M.; Zhan, Z.-H.; Gao, S. Toward Explicit Control between Exploration and Exploitation in Evolutionary Algorithms: A Case Study of Differential Evolution. Inf. Sci. 2023, 649, 119656.
42. Mangalathu, S.; Hwang, S.-H.; Jeon, J.-S. Failure Mode and Effects Analysis of RC Members Based on Machine-Learning-Based SHapley Additive exPlanations (SHAP) Approach. Eng. Struct. 2020, 219, 110927.
43. Palar, P.S.; Zuhal, L.R.; Shimoyama, K. Enhancing the Explainability of Regression-Based Polynomial Chaos Expansion by Shapley Additive Explanations. Reliab. Eng. Syst. Saf. 2023, 232, 109045.
44. Alomari, Y.; Andó, M. SHAP-Based Insights for Aerospace PHM: Temporal Feature Importance, Dependencies, Robustness, and Interaction Analysis. Results Eng. 2024, 21, 101834.
45. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
46. Xin, X.; Hooker, G.; Huang, F. Pitfalls in Machine Learning Interpretability: Manipulating Partial Dependence Plots to Hide Discrimination. Insur. Math. Econ. 2025, 125, 103135.
47. Luo, Y.; Qin, T.; Jia, X.; Hu, Y.; Li, C.; Mu, G.; Wu, S. Fatigue Life Enhancement of Foreign Object Impacted Railway Axle EA4T Steel with Surface Shot Peening. Eng. Fail. Anal. 2022, 142, 106782.
48. Pešička, J.; Kužel, R.; Dronhofer, A.; Eggeler, G. The Evolution of Dislocation Density during Heat Treatment and Creep of Tempered Martensite Ferritic Steels. Acta Mater. 2003, 51, 4847–4862.
49. Zhang, J.W.; Lu, L.T.; Shiozawa, K.; Shen, X.L.; Yi, H.F.; Zhang, W.H. Analysis on Fatigue Property of Microshot Peened Railway Axle Steel. Mater. Sci. Eng. A 2011, 528, 1615–1622.
50. Benedetti, M.; Fontanari, V.; Bandini, M.; Savio, E. High- and Very High-Cycle Plain Fatigue Resistance of Shot Peened High-Strength Aluminum Alloys: The Role of Surface Morphology. Int. J. Fatigue 2015, 70, 451–462.

Figure 1. Correlation heatmap based on Pearson correlation coefficient.

Figure 2. Schematic of differential evolution algorithm.

Figure 3. Explainable fatigue life prediction framework for 25CrMo4 steel.

Figure 4. Prediction scatter plots of fatigue life: (a) DE-GPR model; (b) Paris law.

Figure 5. Mean absolute SHAP values for each feature.

Figure 6. SHAP beeswarm plot for all features.

Figure 7. SHAP interpretation of prediction results for the 39th sample.

Figure 8. Partial dependence plot of applied stress level.

Figure 9. Partial dependence plots: (a) Average roughness; (b) Surface residual stress; (c) Full width at half maximum; (d) Equivalent notch size.

Figure 10. Pure SHAP interaction values between surface residual stress and applied stress level.

Figure 11. Pure SHAP interaction values between FWHM and applied stress level.

Table 1. Comparison of average performance of six machine learning models from outer 5-fold cross-validation.

Algorithm	Average R²	Std. Dev. of R²	Average RMSE	Std. Dev. of RMSE
KNN	0.5244	0.2447	0.1838	0.0090
RFR	0.5173	0.1973	0.2002	0.0227
ABR	0.5623	0.2086	0.1900	0.0304
GBR	0.5274	0.1964	0.1980	0.0217
XGBoost	0.5083	0.1331	0.2057	0.0203
GPR	0.6630	0.1243	0.1705	0.0375

Table 2. Optimized hyperparameter combination for the DE-GPR model.

Algorithm	Parameter	Value
Gaussian Process Regression (GPR)	length_scale	36.19
	constant_value	36.19
	noise_level	0.033

Table 3. Performance comparison of DE-GPR models before and after feature dimensionality reduction.

Feature Combination	Average R²	Std. Dev. of R²	Average RMSE	Std. Dev. of RMSE
All features	0.6630	0.1243	0.1705	0.0375
Excluding $σ_{r s}$ and FWHM	0.6848	0.0913	0.1652	0.0276
Excluding $σ_{r s}$ and $R_{a}$	0.6896	0.0926	0.1640	0.0289
Excluding FWHM and $R_{a}$	0.6777	0.1014	0.1665	0.0271
