Prediction of Multi-Axis Fatigue Life of Metallic Materials Using a Feature-Optimised Hybrid GRU-Attention-DNN Model

Zhou, Mi; Lu, Haishen; Cao, Yuan; Wang, Chunsheng; Chen, Dian

doi:10.3390/eng7010009

Open AccessArticle

Prediction of Multi-Axis Fatigue Life of Metallic Materials Using a Feature-Optimised Hybrid GRU-Attention-DNN Model

by

Mi Zhou

¹,

Haishen Lu

¹,

Yuan Cao

²

,

Chunsheng Wang

² and

Dian Chen

^2,*

¹

CRRC Nanjing Puzhen Co., Ltd., Nanjing 210000, China

²

School of Automation, Central South University, Changsha 410083, China

^*

Author to whom correspondence should be addressed.

Eng 2026, 7(1), 9; https://doi.org/10.3390/eng7010009

Submission received: 6 November 2025 / Revised: 17 December 2025 / Accepted: 22 December 2025 / Published: 29 December 2025

(This article belongs to the Section Electrical and Electronic Engineering)

Download

Browse Figures

Versions Notes

Abstract

To address the challenge of simultaneously modelling temporal evolution and static properties in fatigue life prediction, this paper proposes a Hybrid GRU–Attention–DNN model: The Gated Recurrent Unit (GRU) captures time-evolution features, while the attention mechanism adaptively focuses on critical stages. These are then fused with static properties via a fully connected network to generate life estimates. Training and validation were conducted using an 8:2 split, with baselines including Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and GRU. Performance was evaluated using the coefficient of determination (R²), root mean squared error (RMSE), mean absolute error (MAE), and root mean squared logarithmic error (RMSLE), together with error band plots. Results demonstrate that the proposed model outperforms baseline CNN/GRU/LSTM models in overall accuracy and robustness, and that these improvements remain statistically significant according to bootstrap confidence intervals (CI) of R², RMSE, MAE and RMSLE on the test set. Additionally, this paper conducts an interpretability analysis: attention visualisations reveal the model’s significant emphasis on the early stages of the lifespan. Time window masking experiments further indicate that removing early information causes the most significant performance degradation. Both lines of evidence show high consistency in qualitative and quantitative trends, providing a basis for engineering sampling window design and trade-offs in test duration.

Keywords:

fatigue life prediction; Hybrid GRU-Attention-DNN model; feature optimisation; multiaxial fatigue; deep learning

1. Introduction

Metal materials are extensively utilised in critical industries such as aerospace, automotive, and energy equipment. Their service safety and fatigue life directly determine the reliability and total life-cycle cost of major engineering projects [1]. Actual structures often operate under multiaxial conditions such as coupled tension-torsion loading and variable-amplitude fatigue [2]. Crack initiation and propagation exhibit distinct nonlinearity, stage-dependent behaviour, and path dependency: the combined effects of loading history, nonproportional effects, and material parameters render the fatigue evolution complex and challenging to model explicitly. Traditional methods like the Palmgren-Miner linear cumulative damage model struggle to capture load sequence and interactions [3], often yielding systematic biases under sequential effects or non-proportional loading [4]. Empirical regression models, while convenient for engineering applications, exhibit limited generalisation when transferring to new operating conditions [5]. Meanwhile, multi-axis fatigue testing is costly and has limited coverage, constraining model validation and generalisation.

Over the past decades, multiaxial fatigue has been predominantly treated by critical-plane and stress-invariant criteria, such as the classical approaches proposed by Dang Van, Findley, McDiarmid and Matake. These methods project the local stress or strain tensor onto candidate planes and relate suitable combinations of shear and normal components to fatigue strength. They have been successfully applied to a wide range of metallic components under proportional and non-proportional loading, and remain the backbone of many design codes. However, most of these criteria rely on relatively low-dimensional descriptors (for example, equivalent shear stress ranges and mean normal stresses) and simple linear or bilinear damage laws. As a result, their accuracy may degrade when dealing with complex variable-amplitude histories, strong mean-stress effects or geometrically intricate components, especially when only limited experimental data are available to calibrate material parameters.

Recent studies have further demonstrated the practical importance and limitations of traditional multiaxial schemes in engineering applications. For example, Lee et al. analysed the multiaxial fatigue behaviour of additively manufactured Ti-6Al-4V and showed that surface texture and manufacturing-induced defects strongly affect life under complex loading paths, even when conventional criteria are employed for modelling [6]. Venturini et al. combined modal analysis and surrogate meta-models to assess the fatigue strength of automotive steel wheels subjected to service-like multiaxial loads, highlighting the need to couple structural dynamics with fatigue criteria in vehicle components [7]. Liu et al. investigated high-cycle multiaxial fatigue of 30CrMnSiA steel with combined mean tensile and shear stresses, and discussed the capability of critical-plane models to reproduce the observed failure envelopes [8]. These works underscore that, while classical criteria remain valuable for design and interpretation, there is increasing demand for models that can directly exploit rich experimental databases and capture highly nonlinear interactions among loading history, surface condition and material properties.

Against this background, neural-network (NN) approaches can be seen as a data-driven complement to classical multiaxial criteria. Instead of prescribing a specific combination of stress or strain components, deep models learn mappings directly from high-dimensional inputs—such as full loading histories, strain-rate signals and extended sets of material descriptors—to fatigue life. In principle, this enables the network to infer complex, nonlinear interactions that are difficult to encode in closed-form criteria, while still allowing classical models to provide baseline predictions, sanity checks or additional engineered features. The present work follows this complementary philosophy by using multiaxial fatigue data both to build a learning-based predictor and to interpret its behaviour in the light of established fatigue mechanisms [9].

Data-driven approaches offer a novel technical pathway for fatigue life prediction [10,11]. Convolutional networks excel at local/spatial feature extraction, while gated recurrent networks can capture sequence dependencies over extended time scales, demonstrating potential in life-related tasks. However, a single model struggles to simultaneously address the dual demands of “time-series cumulative damage” and “static feature coupling” [12]: Time-series models alone inadequately express higher-order relationships in static parameters like stress amplitude and load frequency; while spatial models alone struggle to preserve cyclic history information. Simply blending or concatenating multi-source features often leads to information competition and weight imbalance. Attention mechanisms [13] driven solely by data lack explicit correspondence with critical fatigue mechanism stages (e.g., damage jump phase, rapid stiffness decay phase), compromising interpretability and stability.

To overcome these limitations, hybrid models have emerged as a research focus. Compared to Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU) structures are simpler with fewer parameters, theoretically offering convergence and generalisation advantages in low-sample-size tasks [14]. This paper adopts GRU as one methodological rationale but does not conduct additional low-sample experiments for validation. Attention mechanisms can focus on critical damage stages like crack propagation by computing feature weights, thereby amplifying the influence of core information on prediction outcomes [15]. Deep Neural Networks (DNNs) excel at capturing higher-order coupling relationships in static features such as stress amplitude and load frequency through multi-layer nonlinear mappings [16,17]. Based on this, this paper constructs a feature-optimised Hybrid GRU-Attention-DNN fusion model. The overall approach involves decoupling dynamic and static information for modelling, followed by deep interaction in the fusion layer: the dynamic branch employs GRU to represent cyclic history while introducing attention to weight potentially critical damage stages; the static branch utilises multilayer perceptrons to capture high-order nonlinear couplings of static features like stress amplitude and load frequency [18,19]. The fusion layer achieves complementary integration of both information types within a unified representation space. To enhance consistency with the underlying mechanism, the attention design incorporates a lightweight prior bias constructed from the rate of change in strain amplitude. This strength parameter is learnable, enabling end-to-end training while guiding the model to focus on phase signals with engineering significance. To reduce redundancy and improve training stability, core features are pre-selected using Pearson correlation coefficients to eliminate noise and invalid inputs [20,21].

Compared with previously reported hybrid deep-learning architectures for fatigue and remaining-life prediction, such as LSTM–attention–Multilayer Perceptron (MLP) models, Convolutional Neural Network (CNN)–Recurrent Neural Network (RNN) cascades and Transformer–GRU schemes, the present framework is designed with three distinctive characteristics that are particularly tailored to multiaxial fatigue data [22]. First, it explicitly decouples static material descriptors from dynamic multiaxial loading histories: scalar mechanical properties and loading parameters are processed by a dedicated DNN branch, whereas the cyclic histories are encoded by a GRU branch. Fusion is only performed at a high-level representation stage, which helps disentangle “material-related” and “path-related” influences on life and avoids the feature competition that may occur when all variables are fed into a single backbone. Second, the attention mechanism is attached to the entire GRU hidden-state sequence and is biased by a physics-inspired prior built from the rate of change in strain amplitude, so that life stages associated with rapid stiffness loss or strain jumps receive systematically higher weights; by contrast, existing LSTM–attention–MLP designs usually apply purely data-driven attention without an explicit link to damage-evolution stages. Third, the static branch is realised as a compact DNN that operates only on a feature-optimised subset of inputs instead of the full metadata table, which reduces redundancy and improves robustness under the moderate sample size of the Materials Cloud dataset. This specific combination of GRU, physics-guided attention and feature-optimised DNN was chosen to balance expressiveness, interpretability and parameter efficiency, and the experiments in Section 3 show that it consistently outperforms single-branch CNN/GRU/LSTM baselines in both accuracy and robustness.

The main contributions of this work can be summarised as follows: (1) A feature-optimised Hybrid GRU–Attention–DNN architecture is proposed for multiaxial fatigue-life prediction, in which dynamic loading histories and static material descriptors are explicitly decoupled and processed by specialised branches before being fused in a joint representation space. (2) A physics-guided attention mechanism biased by strain-rate changes is introduced to highlight critical damage stages, providing improved robustness and enhanced interpretability compared with purely data-driven attention schemes. (3) A comprehensive experimental study is conducted on the Materials Cloud multiaxial fatigue dataset, including comparisons with CNN/GRU/LSTM baselines and bootstrap-based confidence-interval analysis, as well as attention visualisation and time-window masking experiments that jointly reveal the dominant role of early-life dynamic responses in fatigue-life prediction.

The subsequent sections of this paper are structured as follows: Section 2 details the data sources, preprocessing methods, feature selection workflow, and hybrid model construction specifics; Section 3 presents experimental results, including feature selection outcomes, model performance comparisons, and analysis; Section 4 summarises research conclusions and outlines future improvement directions.

2. Data and Methods

2.1. Data Sources and Preprocessing

The data used in this study are derived from the public dataset “A deep learning dataset for metal multiaxial fatigue life prediction” published on Materials Cloud. The original dataset contains 1167 fatigue samples from 40 metallic materials under 48 multiaxial loading paths. Each sample consists of a summary record storing mechanical properties and fatigue life, and a separate CSV file storing the corresponding multiaxial loading path as a two-channel time series. In this work, we select the strain-controlled subset with complete mechanical properties and loading-path files, and obtain 914 samples from 40 materials after basic cleaning, which are further split into 731 training samples and 183 test samples. According to the data descriptor in [23], the experiments were performed on a servo-hydraulic multiaxial fatigue testing machine.

During each test, axial stress and strain responses were recorded by the machine’s data acquisition system and exported as comma-separated value (CSV) files. These CSV files, together with the associated test metadata, are curated by the dataset providers into a structured digital database on the Materials Cloud platform. In this study, we download the metadata table and loading-path CSV files, import them into a Python-based scientific computing environment, and merge the scalar metadata with the two-channel loading paths to form input–output pairs for subsequent feature construction, normalisation and model training. In the present work, we directly used the processed digital dataset released by the authors. The metadata table and all associated time-series files were imported into Python 3.9 for subsequent feature construction and normalisation. Table 1 summarises all fields available in the raw database used in this work, including both metadata and per-test time-series quantities, together with their types, units and roles in the proposed model.

Before modelling, the metadata were checked for quality. An extra empty column that contained only missing entries was removed. For the remaining quantities listed in Table 1, no missing values were detected. To enhance robustness against potential missing data in future applications, a median-based imputation step was retained in the preprocessing pipeline, although it did not modify any samples in the current dataset.

To assess whether any input variables were nearly constant and thus uninformative, we performed a simple variance analysis on the scalar features. For each of the four mechanical properties listed in Table 1, the sample variance was computed over all training specimens. The results show that all four quantities exhibit a clear spread across the dataset, and none of them behaves as a near-constant variable. Therefore, no feature was discarded on the basis of low variance, and all scalar inputs in Table 1 were retained for subsequent modelling.

In addition, Pearson correlation analysis was carried out among the four scalar input features and the fatigue-life target. For each pair of variables, the linear correlation coefficient was computed on the training data to quantify their mutual dependence. The resulting correlation matrix indicates that some mechanical properties (for example, ultimate tensile strength and yield strength) are moderately correlated, whereas their correlations with fatigue life remain at a moderate level and no pair of variables exhibits near-perfect linear dependence. Consequently, none of the features listed in Table 1 was removed on the basis of excessive correlation, and all scalar inputs were retained in the subsequent modelling.

In addition, Pearson correlation analysis was performed among the four scalar input features to quantify their linear interdependency, as shown in Figure 1. A strong positive correlation is observed between TS and the yield strength

σ_{s}

(

r = 0.71

), indicating a high degree of linear dependence between these two strength-related descriptors. E exhibits a moderate correlation with

σ_{s}

(

r = 0.65

) and a weaker correlation with TS (

r = 0.40

). In contrast, the hardening exponent m shows negative correlations with the three mechanical properties, with

r = - 0.31

(E − m),

r = - 0.40

(TS − m), and

r = - 0.42

(

σ_{s}

− m). Although TS and

σ_{s}

are strongly correlated, all four descriptors were retained because they capture complementary mechanical aspects and are jointly used as inputs in the subsequent modelling.

All continuous variables were subsequently standardised to make them comparable in scale. For each scalar feature and for the fatigue-life target, the mean

μ

and standard deviation

σ

were first computed on the training set, and every value

x

was then transformed according to

(x - μ) / σ

. The same statistics were used to transform the test set, so that no information from the test data leaked into training. This z-score normalisation was implemented by the StandardScaler routine provided in the scikit-learn library. For the two-channel loading-path time series, the same standardisation procedure was applied separately to each channel before they were fed into the neural network.

After preprocessing, the specimens were randomly divided at the specimen level into a training set and a test set, with 80% of the samples used for training and 20% held out for testing (fixed random seed). As a consequence, different specimens of the same material may appear in both subsets. This setting matches the intended application scenario, where basic mechanical properties of a material are already available from previous tests and the model is used to predict fatigue life under new multiaxial loading paths for materials that have been characterised before. Generalisation to completely unseen materials is beyond the scope of the present study and will be considered in future work using material-wise partitioning.

To ensure data quality, outliers were first processed using the

3 σ

rule [24]: the mean (

μ

) and standard deviation (

σ

) of each feature were calculated, and data points satisfying Formula (1) were identified as outliers.

|x - μ| > 3 σ

(1)

Subsequently, all features were standardised using Z-score normalisation [25], with the formula being:

z = \frac{x - μ}{σ}

(2)

This transformation uniformly maps the eigenvalues to a distribution range with a mean of 0 and a standard deviation of 1, thereby eliminating interference from dimensional differences in parameters such as the number of cycles (unit: cycles), stress amplitude (unit: MPa), and load frequency (unit: Hz) during model training.

2.2. Fusion Model Construction

2.2.1. GRU Model

The GRU is a simplified variant of the LSTM network [26,27]. GRU merges the input and forget gates into a single update gate, and uses a reset gate to control how past information is combined with the current input. Like LSTM, GRU retains relevant past information while selectively incorporating new inputs, thereby providing effective memory and mitigating vanishing gradients [28]. The key difference lies in GRU’s removal of the memory control mechanism present in LSTM, thereby simplifying computational complexity, as illustrated in Figure 2.

Owing to its simplified gating, GRU is more compact. The update and reset gates jointly determine how the hidden state is updated at time step

t

. The hidden state is driven by both the current input and the hidden state from the previous time step [29]. Below is an introduction to these two components.

The update gate determines which information from the previous hidden state should be retained and carried forward to the current time step, as expressed by the formula:

z_{t} = σ (W_{z} x_{t} + U_{z} h_{t - 1} + b_{z})

(3)

σ

is the sigmoid function,

W_{z}

is the parameter matrix input to the update gate,

U_{z}

is the connection weight from the hidden state to the update gate,

b_{z}

is the bias term,

x_{t}

is the input vector at this time step,

h_{t - 1}

is the hidden state from the previous time step.

The reset gate determines which hidden states from the previous time step should be ignored.

r_{t} = σ (W_{r} x_{t} + U_{r} h_{t - 1} + b_{r})

(4)

W_{r}

is the weight matrix input to the reset gate,

U_{r}

is the parameter matrix input to the reset gate, and

b_{r}

is the offset.

Calculated based on the current input and the previous hidden state adjusted by the reset gate, it serves as a candidate for updating the hidden state:

\tilde{h} = t a n h (W_{h} x_{t} + U_{h} (r_{t} \otimes h_{t - 1}) + b_{h})

(5)

t a n h

is the hyperbolic tangent function.

W_{h}

and

U_{h}

are the parameter matrices for transforming the hidden state into candidate hidden states after processing by the input and reset gates, respectively.

b_{h}

is the bias parameter.

The current hidden state is obtained by weighting and combining the previous hidden state and candidate hidden states through the update gate:

h_{t} = (1 - z_{t}) \otimes h_{t - 1} + z_{t} \otimes \tilde{h}

(6)

In this paper, the temporal inputs are organised by cycle index

t = 1, \dots, T

, where each time step contains two types of dynamic features: “cycle count” and “strain rate”. A single-layer GRU maps sequences of length

T

to hidden state sequences

{h_{t}}_{t = 1}^{T}

, where

h_{t}

represents a compressed representation of cumulative damage evolution up to the

t

-th cycle pair. Intuitively, if a sudden change in strain magnitude or accelerated damage occurs at a certain stage, the

{h_{t}}

corresponding to adjacent segments will exhibit significant differences, providing discriminative power for subsequent attention weighting.

In practice, the temporal branch uses a single-layer GRU with 64 hidden units. These hyperparameters were determined by a small grid search (hidden size ∈ {32, 64, 128}, number of layers ∈ {1, 2}) on the validation set. Larger configurations improved the validation accuracy by less than one percentage point but almost doubled the training time and memory consumption. Because the dataset is of moderate size and the model has to be retrained many times for ablation and interpretability studies, running a full Bayesian or evolutionary hyper-parameter optimisation, which would require tens to hundreds of additional full trainings, was not feasible within our computational budget and would not substantially change the conclusions. Therefore, we adopt a limited grid/hand-tuning strategy guided by validation performance and domain knowledge.

2.2.2. Attention Mechanism

The attention mechanism is a model that simulates the human brain’s attention system. It can be viewed as a composite function that calculates the probability distribution of attention to emphasise the influence of a key input on the output [30].

This model maps a variable-length input

X = (x_{1}, x_{2}, \dots, x_{n})

to a variable-length output

Y = (y_{1}, y_{2}, \dots, y_{m})

. Specifically, the Encoder transforms a variable-length input sequence X into an intermediate semantic representation

C : C = f (x_{1}, x_{2}, \dots, x_{n}

) through a nonlinear transformation. The Decoder’s task is to predict and generate the output

y_{1}, y_{2}, \dots, y_{i - 1}

at time step i based on the intermediate semantic representation C of the input sequence X and the previously generated

y_{i} = g (y_{1}, y_{2}, \dots, y_{i - 1}, C)

, where both

f ()

and

g ()

are nonlinear transformation functions. Due to the lack of discriminative power in traditional Encoder–Decoder frameworks for input sequences X, the attention mechanism was introduced to address this issue [31]. Their proposed model architecture is illustrated in Figure 3.

Among these,

s_{t - 1}

represents the hidden state at time step

t - 1

on the decoder side,

y_{t}

denotes the target word, and

C_{t}

signifies the context vector. The hidden state at time step

t

is then calculated as shown in Equation (7).

s_{t} = f (s_{t - 1}, y_{t - 1}, C_{t})

(7)

C_{t}

depends on the hidden layer representation of the input sequence at the encoding end, which can be expressed as shown in Equation (8) after weighted processing.

C_{t} = \sum_{j = 1}^{T} α_{t, j} h_{j}

(8)

Among these,

h_{j}

represents the latent vector of the j-th word on the Encoder side. It contains information about the entire input sequence but focuses primarily on the vicinity of the j-th word. T denotes the input sequence length.

α_{t, j}

represents the attention weight assigned by the j-th word in the encoder to the t-th word in the decoder, where the sum of all

α_{t, j}

probability values equals 1. The calculation formula for

α_{t, j}

is shown in Equation (9).

\begin{array}{l} α_{t, j} = \frac{e x p (a_{t, j})}{\sum_{j = 1}^{T} e x p (a_{t, j})} \\ a_{t, j} = a (s_{t - 1}, h_{j}) \end{array}

(9)

Among these,

a_{t, j}

represents an alignment model used to measure the degree of alignment of a word at position

j

on the encoder side relative to a word at position

t

on the decoder side.

This paper focuses attention on identifying “critical stages” of fatigue. The approach involves: using the hidden state obtained by the GRU at each cycle as a temporal slice representation, first calculating its similarity with the global sequence summary as a base weight; Simultaneously, the normalised absolute value of the “strain rate” extracted from dynamic features serves as a learnable strength-based prior bias superimposed on the base weight. This naturally amplifies attention intensity during phases of strain sudden changes or rapid stiffness decay. Subsequently, the softmax function yields attention weights for each time step, which are then weighted and aggregated to form a temporal feature summary [32].

The scaled dot-product attention module in this work is deliberately kept lightweight: it shares the same hidden dimension as the GRU output (64), and uses single-head projections for queries, keys and values followed by softmax pooling. In preliminary trials we compared mean pooling, max pooling and the proposed single-head attention, and found that attention consistently improved the prediction accuracy while introducing only a small number of additional parameters. More complex variants with multi-head attention or larger projection dimensions brought only marginal gains (less than one percentage point in validation accuracy) but nearly doubled the computational cost and tended to overfit due to the limited dataset size. For this reason we did not perform a separate large-scale Bayesian search for the attention hyperparameters, and instead fixed this minimal configuration based on the above validation experiments.

2.2.3. Deep Neural Network Model

DNN consists of multiple layers of fully connected neurons. Through nonlinear activation functions and multi-layer mapping, they can approximate complex nonlinear functions. In fatigue life prediction problems, DNNs are primarily used for high-order nonlinear modelling of static features such as stress amplitude and load frequency [33].

As shown in Figure 4, a deep neural network can be divided into an input layer, hidden layers, and an output layer. All these layers are fully connected, meaning that any neuron in layer

i

is connected to all neurons in layer

i +

. When performing forward propagation in a deep neural network, the calculation formula is as follows:

u_{k} = \sum_{i = 1}^{n} W_{K i} X_{i}

(10)

y_{k} = f (u_{k} - b_{k})

(11)

Here,

X_{i}

denotes the i-th input variable,

W_{K i}

represents the weight connected to the i-th input variable,

u_{k}

is the weighted sum of all input variables,

b_{k}

is the threshold,

f ()

is the activation function, and

y_{k}

is the output of the neural network. Common activation functions include ReLU, sigmoid, and tanh. ReLU is frequently employed to mitigate the vanishing gradient problem during backpropagation. This paper utilises ReLU activation in both the input layer and hidden layers. The expression is as follows:

f (x) = m a x (0, x)

(12)

In this paper, the DNN module adopts a three-layer architecture (64-32-16 neurons) to progressively extract and compress static features. These are then combined with the temporal features output by the GRU-Attention module for lifespan prediction, thereby achieving collaborative modelling of dynamic and static features.

The fully connected sub-network that processes the static material properties consists of three hidden layers with 64, 32 and 16 neurons, respectively. Each hidden layer uses a linear transformation followed by a ReLU activation, batch normalisation and a dropout layer with a rate of 0.2 to reduce overfitting. The output of this DNN branch is a 16-dimensional feature vector, which is later concatenated with the temporal representation learned by the GRU–attention branch for final fatigue life prediction.

The sizes of the hidden layers were selected by a small grid and hand-tuning procedure on the validation set. We tested one- to three-layer configurations with hidden widths in the range of 32–128 units and observed that the three-layer setting with 64-32-16 units provided a good balance between prediction accuracy and model complexity. The whole hybrid model was trained end-to-end using the AdamW optimiser with an initial learning rate of 1 × 10⁻³ and a weight decay of 1 × 10⁻⁵. The batch size was set to 32 and the maximum number of training epochs was 1000, with a learning-rate scheduler that reduces the learning rate when the validation mean squared error stops improving. The model parameters used for testing correspond to the epoch with the lowest validation error.

2.2.4. Integration Method

In complex fatigue life prediction tasks, a single model often struggles to simultaneously capture long-term dependencies in time-series features and nonlinear expressions of static structural characteristics. To overcome this limitation, this paper proposes a Hybrid GRU-Attention-DNN fusion model. Through modular design, this model enables collaborative modelling of multi-source features: GRU captures temporal dependencies during cyclic loading, Attention enhances key stage features, and DNN models nonlinear effects of static parameters like stress amplitude and load frequency. Finally, the fusion layer achieves unified representation and outputs life prediction values.

The overall architecture of the Hybrid GRU-Attention-DNN model is illustrated in Figure 5. It comprises six distinct yet seamlessly integrated components: an input layer, a temporal feature extraction module, an attention-weighted module, a static feature modelling module, a feature fusion layer, and an output layer. The input layer normalises raw features and distinguishes between dynamic and static information; the GRU module focuses on capturing fatigue evolution patterns over extended time scales; the Attention layer highlights critical damage stages; the DNN module extracts higher-order relationships among static features; the fusion layer merges both feature types into a unified representation; and the output layer ultimately maps these to fatigue life prediction results.

This integration strategy differs from conventional “sequence encoder + MLP” schemes, where all variables are concatenated and processed by a single backbone. By decoupling static material properties from dynamic loading histories and combining them only at the fusion layer, the proposed model can assign specialised sub-networks to learn long-term temporal patterns and higher-order static relationships, respectively. The attention mechanism then adaptively weighs the temporal hidden states before fusion, so that the final prediction is mainly driven by those life stages that are most critical for damage accumulation. This modular design improves both prediction accuracy and interpretability compared with monolithic architectures.

In the input layer, the original multiaxial fatigue database provides both numerical and categorical descriptors, such as material label, specimen identifier, loading type and specimen geometry, in addition to several fatigue-related variables. To build a compact and physically meaningful input space, this study focuses on four core fatigue features: number of cycles, strain rate, stress amplitude and load frequency. The number of cycles and strain rate are recorded along the loading path and are therefore treated as dynamic temporal features, while the stress amplitude and load frequency are used as static features. Other numerical or categorical fields either contain a large fraction of missing or inconsistent entries, or mainly encode information that is already reflected in these four variables and in the loading histories. Including them would substantially increase the input dimensionality and the risk of overfitting for the given dataset size. To avoid training interference caused by scale differences, all selected features are further processed by Z-score normalisation and batch normalisation.

For each specimen, the dynamic input is constructed from the multiaxial strain-controlled loading history provided in the Materials Cloud database. The history is represented as a two-channel time series consisting of the (normalised) number of cycles and the strain rate. One representative loading block covering the whole fatigue life is extracted for every specimen and uniformly sampled along the life axis to obtain a fixed length of T = 241 points. In this way, all sequences share the same number of time steps and can be stacked into an array of shape (N, T, 2) for N specimens. At each time step, the two channels are standardised by subtracting the mean and dividing by the standard deviation computed from the training set. No additional numerical integration or moving-average smoothing is applied; instead, the GRU–attention branch directly learns temporal patterns such as changes in strain amplitude, load reversals and damage-accumulation stages from the original normalised histories.

The temporal feature extraction module employs a GRU network. Through dynamic adjustment via update and reset gates, the GRU effectively preserves early damage effects within million-cycle load sequences and propagates them to later stages [34], thereby revealing latent correlations between “low-amplitude cycling” and “high-amplitude cycling” phases. The GRU outputs hidden state vectors embodying temporal evolution patterns.

The Attention module further applies weighted distribution to the GRU output. Its core principle involves calculating similarity between queries and keys to generate a weight matrix for value vector weighting [35]. This significantly amplifies the impact of damage transition points (e.g., sudden stiffness drop phases) on life prediction, enhancing the model’s sensitivity to critical stages.

The static feature modelling module employs a three-layer DNN. Stress amplitude and load frequency undergo sequential nonlinear mapping to progressively extract their higher-order combinatorial features [36]. This module identifies the coupled effect of “high stress inducing stress concentration + high load accelerating damage accumulation”, outputting a compact static feature representation. To mitigate overfitting, each DNN layer incorporates dropout and batch normalisation [37].

The fusion layer concatenates the attention output (64-D) and the DNN output (16-D) into an 80-D vector, which is fed to a fully connected layer with ReLU to promote interaction between dynamic and static information.

It is worth emphasising that this integration strategy differs from previously reported hybrid architectures that combine LSTM units with attention and MLP heads or embed GRU cells inside Transformer-style temporal encoders. In LSTM–attention–MLP fatigue models, static and dynamic variables are often concatenated and processed by the same recurrent backbone, and attention is applied purely on the hidden states. In contrast, the present Hybrid GRU–Attention–DNN assigns separate, specialised branches to dynamic and static information, and uses attention that is explicitly guided by strain-rate changes, which enhances both physical interpretability and numerical stability. Compared with Transformer–GRU hybrids, which introduce multi-head self-attention layers with a relatively large number of parameters, the proposed design deliberately adopts a lightweight single-layer GRU with single-head scaled dot-product attention. This keeps the parameter count and computational cost commensurate with the available dataset size, reducing the risk of overfitting while still capturing the key temporal patterns that drive fatigue damage accumulation.

The output layer uses a single linear neuron to produce the final fatigue-life prediction, consistent with a regression objective. The mean squared error (MSE) loss function balances sensitivity to overall error and extreme values.

2.3. Verification Plan

To assess the proposed Hybrid GRU-Attention-DNN for multiaxial fatigue-life prediction, we design a systematic validation protocol. The protocol comprises four components: dataset partitioning, training/testing procedures, metric selection, and comparative experiments.

First, after preprocessing, all samples were split at the specimen level into a training set and a held-out test set with an 80/20 ratio using a fixed random seed. From the training set, a validation subset was further split off for model selection and early stopping, while the test set was kept completely untouched until the final evaluation. Because the split is specimen-level rather than material-wise, different specimens of the same material may appear in both training and test sets. Therefore, the reported performance mainly reflects generalisation to unseen loading paths of already characterised materials, rather than to completely unseen materials.

Second, within the training set we created a validation subset for model selection and early-stopping decisions. The training subset facilitated parameter updates, while the validation subset monitored performance changes during training to prevent overfitting. The training phase employed a mini-batch gradient descent strategy combined with the Adam optimizer for efficient parameter updates [38]. The learning rate was selected via a small grid search on the validation subset to balance convergence speed and stability. Other hyperparameters such as the GRU hidden size, the widths of the DNN layers, the batch size and the dropout rate were adjusted through several exploratory runs on the validation data. We monitored the training and validation curves under different settings and chose the configuration that achieved a good trade-off between fitting accuracy, generalisation on the validation subset and training time, and then kept it fixed when comparing different architectures.

During testing, the model outputs the logarithmic life

z = {l o g}_{10} (N_{f})

. Both predictions and ground truths are transformed back to the cycle-life scale by

N_{f} = 10^{z}

, and all metrics in Equations (13)–(16) are computed on the original life scale

N_{f}

.

Regarding evaluation metrics, this paper adopts four complementary indicators: the coefficient of determination (R²), root mean squared error (RMSE), mean absolute error (MAE), and root mean squared logarithmic error (RMSLE). During testing, the predicted and true fatigue lives are first mapped back from the logarithmic outputs

l g (N f)

to the original number-of-cycles scale via

N f = 10^{l g (N f)}

. All four metrics are then computed on this original life scale

N f

. MAE reflects the model’s ability to control absolute deviations between predicted and true fatigue lives. RMSE penalises larger errors more strongly than MAE and therefore highlights occasional large deviations. RMSLE evaluates the discrepancy between predicted and true values on a logarithmic scale, which is suitable when fatigue life spans several orders of magnitude and engineering practice often cares about relative rather than purely absolute errors. R² measures the goodness of fit (with values closer to 1 indicating better agreement). Combining these four metrics provides a comprehensive assessment of both absolute accuracy and relative stability across the entire life range. Their respective calculation formulas are:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - {\hat{y}}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(13)

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - {\hat{y}}_{i})^{2}}

(14)

M A E = \frac{1}{n} \sum_{i = 1}^{n} ∣ y_{i} - {\hat{y}}_{i} ∣

(15)

M S L E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(l n (1 + y_{i}) - l n (1 + {\hat{y}}_{i}))}^{2}}

(16)

where

y_{i}

and

{\hat{y}}_{i}

denote the ground-truth and predicted fatigue lives of the

i

-th sample on the original cycle-life scale

N_{f}

, respectively;

n

is the number of test samples;

\bar{y} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}

is the mean of the ground-truth values; and

l n (\cdot)

denotes the natural logarithm. The

l n (1 + \cdot)

transform in RMSLE is used to evaluate relative discrepancies and to avoid taking the logarithm of zero.

3. Experiments and Results

3.1. Experimental Setup

To validate the performance of the proposed Hybrid GRU–Attention–DNN model for predicting the multiaxial fatigue life of metallic materials, a unified experimental environment was established. All models were implemented in Python 3.9 using the TensorFlow 2.10 framework and were trained on a workstation equipped with an Intel i7-14650HX CPU, 32 GB RAM, and an NVIDIA RTX 4060 GPU running 64-bit Windows 11. Data preprocessing and metric computation were carried out using standard scientific libraries, including NumPy, Pandas, SciPy, and scikit-learn, and all plots were generated using Matplotlib (v3.7.1). Random seeds were fixed for the relevant libraries to enhance reproducibility.

The architectures and training hyperparameters of the Hybrid model and all baseline networks follow the design described in Section 2.2 and the validation protocol in Section 2.3. In particular, the CNN, GRU, and LSTM baselines share exactly the same data preprocessing, input normalisation, loss function, and optimisation settings as the Hybrid model; only the temporal backbone is replaced in each case. During testing, the trained models are applied to the held-out test set, and their predictions are evaluated using the four metrics and the error-band analyses defined in Section 2.3.

3.2. Comparison of Results

To intuitively and accurately evaluate the performance advantages of the Hybrid GRU-Attention-DNN model in predicting the multi-axis fatigue life of metallic materials, this paper conducts comparative experiments against standalone CNN, GRU, and LSTM models. Through scatter plots of predicted versus actual values (including error bands of ±1.5×, ±2× and ±3×) and test-set fitting curves, the analysis comprehensively evaluates both visual trends and numerical accuracy using four quantitative metrics: R², MAE, RMSE and RMSLE. In Figure 6a–d, the predicted and actual fatigue lives are plotted on the logarithmic scale

l g (N f)

for visual clarity, whereas the numerical values of R², MAE, RMSE and RMSLE reported in the inset of each subfigure are computed on the original life scale

N f

after inverse transformation from

l g (N f)

.

From the scatter plot (Figure 6a,b), it can be seen that the core differences among the four models are significant: The CNN model exhibits a more dispersed scatter overall, particularly in the medium-to-high fatigue life range (5 × 10⁵–1.5 × 10⁶ cycles). A significant number of predicted points deviate from the diagonal line and fall outside the ±2 error band. This occurs because CNNs excel at extracting local spatial features but struggle to capture the temporal correlations in the fatigue evolution of metallic materials, making it difficult to match the damage accumulation patterns that develop over cycles. The GRU model exhibits greater scatter point concentration than CNN. Prediction points in the low-life range (<5 × 10⁵ cycles) largely align with the diagonal line. However, systematic deviations persist in the medium-to-high life range, with most predictions falling below actual values. Additionally, some points breach the ±1.5× error band during abrupt strain amplitude changes. This reflects GRU’s ability to capture temporal trends but its lack of focus on critical damage nodes. The LSTM model exhibits more dispersed scatter points than GRU: significant fluctuations occur across the entire lifespan range, with some points exceeding ±2 times the error band in the low lifespan range and irregular deviations appearing in the medium-to-high lifespan range. This may stem from its more complex gating structure and larger parameter scale, which under the current dataset size and feature combination are more prone to parameter redundancy and noise sensitivity, thereby impacting generalisation stability. In contrast, the Hybrid GRU-Attention-DNN model’s scatter points cluster closer to the diagonal line. Over 90% of predictions across the entire lifespan fall within ±1.5 times the error band, with no significant systematic bias in the medium-to-high lifespan range. Even during abrupt strain amplitude changes, the model maintains high concentration. This advantage stems from the synergistic interaction of “GRU capturing temporal dependencies + Attention focusing on critical nodes + DNN fitting static coupling characteristics,” enabling the model to more accurately represent the mechanism of “cumulative temporal damage + static stress regulation”.

The fitting curve of the test set (Figure 7a–d) further demonstrates the differences from the dimension of “sample number–life value”. In this figure, the vertical axis represents the logarithmic fatigue life

l g (N f)

rather than the raw number of cycles to failure. This logarithmic representation compresses the several-orders-of-magnitude span of fatigue life into a numerically moderate range, so that both short-life and long-life specimens can be displayed within the same plot without the latter dominating the scale. On this scale, equal vertical distances correspond to similar multiplicative factors in the original life, making deviations at different life levels visually comparable and highlighting trend consistency between the fitted and true curves. The CNN model exhibits the greatest deviation from the true value curve, with significant gaps in the medium-life (sample 40–80) and high-life (sample 120–160) ranges. It completely fails to accurately predict outliers such as sample 72 (high stress amplitude condition) and sample 145 (variable amplitude cycling condition), further validating its lack of temporal modelling capability; The GRU model’s fitted curve generally follows the true value trend but exhibits a “lag phenomenon” in samples 80–100 (the strain amplitude sudden change phase). When the true value steeply declines, the predicted curve remains flat, reflecting a delayed response to sudden damage signals. The LSTM model’s fitted curve shows significant fluctuations: 100–120, and exhibits irregular deviations in the remaining samples. This may stem from its more complex gating structure, parameter scale, and regularisation settings, making it more sensitive to noise under the current dataset size and feature combination. In contrast, the Hybrid GRU-Attention-DNN model’s fitted curve shows high consistency with actual values: it maintains good alignment in the conventional sample range (0–40, 160–180), and effectively tracks both the strain magnitude transition zone (80–100) and outliers (72, 145). This demonstrates its ability to balance overall trends with critical local signals, resulting in smaller prediction errors for individual samples.

Further quantitative analysis was conducted based on four complementary metrics. As shown in Table 2, the Hybrid model achieves an R² value of 0.9156, representing a 9.9% improvement over CNN (0.8332) and a 7.5% improvement over LSTM (0.8520). It is slightly higher than GRU (0.9115), indicating that it more accurately captures the “feature-lifespan” mapping relationship and demonstrates the strongest consistency between its overall prediction trends and actual patterns. MAE of the Hybrid model is the lowest among all candidates, indicating the smallest average prediction deviation per sample and good stability under normal conditions. The Hybrid model also achieves the lowest RMSE, confirming that it effectively suppresses occasional large absolute errors. In addition, its RMSLE is clearly lower than those of CNN, GRU and LSTM, which shows that the model maintains smaller relative errors across the full fatigue-life range, including very short- and very long-life specimens. Together with the highest R², these results demonstrate that the Hybrid GRU–Attention–DNN provides the best overall trade-off between accuracy and robustness.

Compared to other models, the Hybrid GRU–Attention–DNN model demonstrates significant advantages in scatter distribution and three key metrics. It effectively integrates the temporal and static characteristics of metallic materials, enabling high-precision prediction of their multi-axis fatigue life. Its superior performance in the medium-to-high life range and during sudden strain amplitude changes further validates the role of the Attention module and DNN module in enhancing prediction accuracy and model robustness.

To further verify whether the observed performance improvement was statistically significant rather than accidental, we analysed the distribution of prediction errors on the test set and estimated the confidence intervals of the main indicators. For each model, the residuals on the original life-scale

N f

are collected and visualised as box plots as shown in Figure 8. The hybrid GRU-Attention-DNN exhibits the narrowest quartile range and the smallest extreme residual distribution, indicating that its superiority is not driven by a few favourable samples but remains unchanged in the majority of samples. In addition, the 95% confidence intervals of the four metrics estimated by the non-parametric bootstrap programme for 1000 samples were recorded in Table 3. Compared with the CNN and LSTM baseline, the R² of the hybrid GRU-Attention-DNN was the highest, while the RMSE, MAE and RMSLE were the lowest. Its confidence intervals shifted significantly in a better direction, but were still lower than the GRU baseline. These results indicate that, under the current dataset, the improvement of the model on the baseline is statistically significant.

3.3. Explainability Analysis

To reveal the discriminative basis of the Hybrid GRU–Attention–DNN in lifespan prediction, we derived attention weights for each time branch and normalised sequences of varying lengths to the range [0, 1] (approximately 241 points) based on “lifespan progress”. We plotted sample × time attention heatmaps and global average attention curves on the test set, while dividing the timeline into Early/Mid/Late segments to calculate their proportions. Furthermore, through time window masking experiments, we set the input of each segment to zero to evaluate

Δ M A E

.

To construct the sample–time attention heatmap in Figure 9a, the attention-weight sequence of each test specimen is first extracted along the fatigue-life axis. Because different specimens fail after different numbers of cycles, the life axis of each specimen is normalised by dividing the current cycle number by its failure cycle number, yielding a life fraction between 0 and 1. The attention sequence is then interpolated on this normalised axis so that every specimen is represented by a fixed number of points (241 positions in this study). These interpolated sequences are stacked row by row to form a two-dimensional matrix whose rows correspond to specimens and whose columns correspond to the normalised life positions. The rows are further ordered according to the absolute prediction error of each specimen so that samples with higher errors appear at the top of the diagram. This matrix is finally displayed as a colour-coded heatmap, where warmer colours indicate larger attention weights.

Figure 9a shows a consistent “bright band on the left” across samples, primarily concentrated in the early range of approximately 35–72 on the horizontal axis. This band appears as a continuous patch vertically rather than scattered points, indicating that the model extracts stable and reusable discriminative cues from the early stages in most samples. Entering the mid-stage (approximately 84–168), the overall colour tone darkens, with only scattered bright spots appearing near 132–156 in individual samples. This suggests the model occasionally utilises localised events from the middle segment. Near the failure threshold (≥0.85), only a few isolated high-weight pixels are visible, indicating that late-stage information holds some reference value but is not dominant. Since the vertical axis is sorted in descending order by absolute prediction error, the upper samples with high error exhibit flatter attention distributions and less concentrated hotspots. Conversely, samples with lower error in the middle and lower regions generally retain distinct early high-weight hotspots. This inverse correlation between “hotspot aggregation” and “error” indicates that the availability of early cues is closely linked to the model’s predictability. Corresponding to the heatmap, the global average attention in Figure 9b peaks at approximately 0.22 of the lifetime, then gradually decays over time, reaching a trough near the 0.75–0.85 interval. It shows only a slight rebound near failure, exhibiting an overall pattern of “significant height early on, a mid-term trough, and a slight lift in the late stage”.

To quantitatively summarise how the model allocates attention across the three life stages, the attention weights of each test specimen are first aggregated within the Early, Mid and Late windows. For every specimen, the attention values in each window are summed and divided by the total attention over the full sequence, yielding three proportions that add up to one. The bars in Figure 10 report the mean of these proportions across all test specimens, and the vertical error bars represent one standard deviation around the mean. The quantitative results are shown in Figure 10, where the attention shares for Early/Mid/Late are approximately 0.59 ± 0.16/0.32 ± 0.17/0.09 ± 0.11. In terms of magnitude, the average Early proportion is approximately 1.8 times that of Mid and 6.5 times that of Late, consistent with the peak positions in Figure 9b, reflecting the dominance of early information. It is important to note that the sum of these three segments always equals 1 (combined data), meaning an increase in one segment’s proportion necessarily accompanies a decrease in others. For most samples, the early proportion is high while the mid and late proportions correspondingly decrease. On the other hand, while Late has the lowest mean, its relative dispersion is significant, indicating a small number of “late-stage driven” outliers. These samples exhibited locally exploitable features near failure that could be leveraged by the model, though their contribution to the overall pattern is limited. Collectively, this segmentation not only aligns with the shape of the average curve but also reveals structural differences: early information is more reusable, mid-stage information is more heterogeneous, and late-stage information is more sporadic.

Figure 11 presents the results of the time-window occlusion experiment. In this analysis, each test specimen is evaluated not only with the full input sequence, but also with three modified sequences in which the Early, Mid or Late window is masked while the remaining parts are kept unchanged. For every specimen and for each masking scheme, the increase in root-mean-square logarithmic error relative to the prediction using the full sequence is calculated. The three bars in Figure 11 show, for the three windows, the mean error increments over all test specimens, and the error bars represent one standard deviation. When the Early segment is occluded, the error increment is approximately 0.29, which is markedly higher than the increments obtained when masking the Mid and Late segments (approximately 0.09 and 0.08, respectively). This pattern indicates that the early segment contains non-redundant and irreplaceable critical information: once removed, performance deteriorates immediately and substantially. By contrast, masking the mid-to-late segments only causes minor degradation, suggesting that effective cues within these regions are either weaker or have already been anticipated in the early phase. The agreement between the attention distribution and the masking-induced error increments further supports the interpretation that the model primarily relies on the dynamic response during the early life stage for discrimination.

In summary, the model primarily relies on the dynamic response during the early life stage for discrimination. This aligns with the understanding that initial material differences manifest early on, and that the evolution of stiffness in the early phase is indicative of subsequent life performance. It should be noted that attention is not a strictly causal indicator, but the credibility of the explanation is enhanced by two independent pieces of evidence: segmented attention distribution and temporal occlusion. Furthermore, replacing actual observations with zeros during occlusion may introduce distribution shifts. This paper minimises this effect through unified normalisation and comparative analysis.

4. Conclusions

This study proposed a feature-optimised Hybrid GRU–Attention–DNN model for predicting the multiaxial fatigue life of metallic materials. By assigning specialised branches to dynamic loading histories and static material descriptors and fusing them in a joint representation space, the model simultaneously captures temporal damage evolution and higher-order static couplings between material properties and loading paths.

On the Materials Cloud multiaxial fatigue dataset, the proposed Hybrid GRU–Attention–DNN consistently outperforms the baseline CNN, GRU and LSTM models. On the held-out test set, it achieves the highest R² (0.9156) and the lowest MAE, RMSE and RMSLE among all candidates, indicating both improved global fitting quality and reduced absolute as well as relative errors. Error-band plots show that more than 90% of the predictions fall within the ±1.5× band, with markedly fewer outliers than the baselines, which demonstrates the overall superiority of the proposed architecture.

Analysis of residual boxplots and bootstrap-based confidence intervals further reveals that the performance gain is statistically robust rather than driven by a few favourable samples. The Hybrid model exhibits the narrowest interquartile range and the smallest extreme residuals, and its 95% confidence intervals of R², RMSE, MAE and RMSLE are consistently shifted towards better values compared with CNN and LSTM, while being comparable to or slightly better than GRU. These results suggest that the proposed fusion strategy effectively enhances robustness against variability in loading paths and material properties, which is crucial for practical fatigue assessment.

The interpretability analysis provides additional insight into the prediction mechanism. Attention visualisation reveals a stable “early-life attention band”, where the average attention curve peaks in the early stage of the loading history and the Early/Mid/Late attention proportions follow an approximate 0.59/0.32/0.09 pattern. Time-window masking experiments show that removing the Early segment causes the largest increase in RMSLE, whereas masking the Mid or Late segments leads to only minor degradation. These two independent lines of evidence consistently indicate that early dynamic responses carry the most informative and irreplaceable cues for life prediction, which agrees with the physical understanding that initial stiffness evolution and early damage accumulation strongly influence subsequent fatigue life.

From an engineering perspective, the proposed model provides a practical tool for multiaxial fatigue assessment when only a moderate number of tests are available. It can be used to rapidly screen candidate loading paths or component geometries and to support design decisions by quantifying the influence of both loading histories and static material parameters. Nevertheless, the present evaluation protocol has some limitations. The train–test split is performed at the specimen level, so different specimens of the same material may appear in both sets; consequently, the reported performance mainly reflects generalisation to unseen loading paths of already characterised materials rather than to completely new alloys. In addition, the dataset covers a finite set of materials and loading conditions, and the current model does not explicitly incorporate microstructural information or crack-growth mechanics. Future work will consider material-wise partitioning or leave-one-material-out evaluation to more exhaustively assess generalisation to unseen materials. Extending the framework to other fatigue-critical components and to variable-temperature or environmental loading conditions is also an important direction.

Author Contributions

Conceptualization, M.Z.; methodology, H.L.; software, Y.C.; validation, C.W.; formal analysis, D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The multiaxial fatigue dataset analysed in this study is publicly available from the Materials Cloud platform as described in reference [23]. The Python scripts implementing the proposed models are not publicly available due to ongoing related work and institutional regulations.

Conflicts of Interest

Authors Mi Zhou and Haishen Lu were employed by CRRC Nanjing Puzhen Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Nomenclature

Symbol	Description
$N_{f}$	Number of cycles to fatigue failure (fatigue life)
$E$	Young’s modulus of the material
TS	Ultimate tensile strength under monotonic loading
$y_{i}$	True fatigue-life value of the $i$ -th specimen
${\hat{y}}_{i}$	Predicted fatigue-life value of the $i$ -th specimen
R²	Coefficient of determination used for performance evaluation
MAE	Mean absolute error between predicted and true fatigue lives
RMSE	Root mean squared error between predicted and true fatigue lives
RMSLE	Root mean squared logarithmic error between predicted and true fatigue lives
$T$	Number of time steps in the loading-history sequence
$h_{t}$	Hidden state vector of the GRU at time step $t$
$α_{t}$	Attention weight assigned to time step $t$

References

Zhang, X.; Liu, F.; Shen, M.; Han, D.; Wang, Z.; Yan, N. Ultra-High-Cycle Fatigue Life Prediction of Metallic Materials Based on Machine Learning. Appl. Sci. 2023, 13, 2524. [Google Scholar] [CrossRef]
He, G.; Zhao, Y.; Yan, C. Multiaxial fatigue life prediction using physics-informed neural networks with sensitive features. Eng. Fract. Mech. 2023, 289, 109456. [Google Scholar] [CrossRef]
Huang, T.; Ding, R.-C.; Li, Y.-F.; Zhou, J.; Huang, H.-Z. A Modified Model for Nonlinear Fatigue Damage Accumulation of Turbine Disc Considering the Load Interaction Effect. Metals 2019, 9, 919. [Google Scholar] [CrossRef]
Anes, V.; Bumba, F.; Reis, L.; Freitas, M. Determination of the Relationship between Proportional and Non-Proportional Fatigue Damage in Magnesium Alloy AZ31 BF. Crystals 2023, 13, 688. [Google Scholar] [CrossRef]
Harlow, D.G. Statistically Modeling the Fatigue Life of Copper and Aluminum Wires Using Archival Data. Metals 2023, 13, 1419. [Google Scholar] [CrossRef]
Lee, S.; Molaei, R.; Carrion, P.E.; Shao, S.; Shamsaei, N. Multiaxial fatigue behavior and modeling for additive manufactured Ti-6Al-4V: The effect of surface texture. Int. J. Fatigue 2025, 201, 109205. [Google Scholar] [CrossRef]
Venturini, S.; Bonisoli, E.; Rosso, C.; Rovarino, D.; Velardocchia, M. Modal Analyses and Meta-Models for Fatigue Assessment of Automotive Steel Wheels. In Model Validation and Uncertainty Quantification; River Publishers: Cham, Switzerland, 2020; Volume 3, pp. 155–163. [Google Scholar]
Liu, T.; Shi, X.; Zhang, J.; Fei, B. Multiaxial high-cycle fatigue failure of 30CrMnSiA steel with mean tension stress and mean shear stress. Int. J. Fatigue 2019, 129, 105219. [Google Scholar] [CrossRef]
Liao, H.; Pan, J.; Su, X.; Sun, X.; Chen, X. A path-dependent adaptive physics-informed neural network for multiaxial fatigue life prediction. Int. J. Fatigue 2025, 193, 108799. [Google Scholar] [CrossRef]
Mosallam, A.; Medjaher, K.; Zerhouni, N. Data-driven prognostic method based on Bayesian approaches for direct remaining useful life prediction. J. Intell. Manuf. 2014, 27, 1037–1048. [Google Scholar] [CrossRef]
Liao, L.; Köttig, F. A hybrid framework combining data-driven and model-based methods for system remaining useful life prediction. Appl. Soft Comput. 2016, 44, 191–199. [Google Scholar] [CrossRef]
Zhu, Y.; Zhang, J.; Luo, J.; Guo, X.; Liu, Z.; Zhang, R. A Real-Time Remaining Fatigue Life Prediction Approach Based on a Hybrid Deep Learning Network. Processes 2023, 11, 3220. [Google Scholar] [CrossRef]
Komninos, P.; Verraest, A.E.C.; Eleftheroglou, N.; Zarouchas, D. Intelligent fatigue damage tracking and prognostics of composite structures utilizing raw images via interpretable deep learning. Compos. Part B Eng. 2024, 287, 111863. [Google Scholar] [CrossRef]
Chaves, R.R.; Oliveira, A.F.; Rubinger, R.M.; Silva, A.J. Recurrent Neural Networks (LSTM and GRU) in the Prediction of Current–Voltage Characteristics Curves of Polycrystalline Solar Cells. Electronics 2025, 14, 3342. [Google Scholar] [CrossRef]
Wu, S.; Liu, J. A Multi-Scale CNN-BiLSTM Framework with An Attention Mechanism for Interpretable Structural Damage Detection. Infrastructures 2025, 10, 82. [Google Scholar] [CrossRef]
Li, X.; Yang, H.; Yang, J. Fretting Fatigue Life Prediction for Aluminum Alloy Based on Particle-Swarm-Optimized Back Propagation Neural Network. Metals 2024, 14, 381. [Google Scholar] [CrossRef]
Baktheer, A.; Aldakheel, F. Physics-based machine learning for fatigue lifetime prediction under non-uniform loading scenarios. Comput. Methods Appl. Mech. Eng. 2025, 444, 118116. [Google Scholar] [CrossRef]
Xie, Z.; Du, S.; Lv, J.; Deng, Y.; Jia, S. A Hybrid Prognostics Deep Learning Model for Remaining Useful Life Prediction. Electronics 2020, 10, 39. [Google Scholar] [CrossRef]
Bock, F.E.; Keller, S.; Huber, N.; Klusemann, B. Hybrid Modelling by Machine Learning Corrections of Analytical Model Predictions towards High-Fidelity Simulation Solutions. Materials 2021, 14, 1883. [Google Scholar] [CrossRef]
Logeswari, G.; Thangaramya, K.; Selvi, M.; Roselind, J.D. An improved synergistic dual-layer feature selection algorithm with two type classifier for efficient intrusion detection in IoT environment. Sci. Rep. 2025, 15, 8050. [Google Scholar] [CrossRef]
Wu, H.; Wang, A.; Gan, Z.; Gan, L. Graphical Feature Construction-Based Deep Learning Model for Fatigue Life Prediction of AM Alloys. Materials 2024, 18, 11. [Google Scholar] [CrossRef]
Shin, H.; Yoon, T.; Yoon, S. Fatigue life predictor: Predicting fatigue life of metallic material using LSTM with a contextual attention model. RSC Adv. 2025, 15, 15781–15795. [Google Scholar] [CrossRef]
Chen, S.; Bai, Y.; Zhou, X.; Yang, A. A deep learning dataset for metal multiaxial fatigue life prediction. Sci. Data 2024, 11, 1027. [Google Scholar] [CrossRef] [PubMed]
Yaro, A.S.; Maly, F.; Prazak, P. Outlier Detection in Time-Series Receive Signal Strength Observation Using Z-Score Method with Sn Scale Estimator for Indoor Localization. Appl. Sci. 2023, 13, 3900. [Google Scholar] [CrossRef]
Kim, Y.-S.; Kim, M.K.; Fu, N.; Liu, J.; Wang, J.; Srebric, J. Investigating the impact of data normalization methods on predicting electricity consumption in a building using different artificial neural network models. Sustain. Cities Soc. 2025, 118, 105570. [Google Scholar] [CrossRef]
Mienye, I.D.; Swart, T.G.; Obaido, G. Recurrent Neural Networks: A Comprehensive Review of Architectures, Variants, and Applications. Information 2024, 15, 517. [Google Scholar] [CrossRef]
Apaydin, H.; Feizi, H.; Sattari, M.T.; Colak, M.S.; Shamshirband, S.; Chau, K.-W. Comparative Analysis of Recurrent Neural Network Architectures for Reservoir Inflow Forecasting. Water 2020, 12, 1500. [Google Scholar] [CrossRef]
Zhang, Y.; Fill, H.D. TS-GRU: A Stock Gated Recurrent Unit Model Driven via Neuro-Inspired Computation. Electronics 2024, 13, 4659. [Google Scholar] [CrossRef]
Noh, J.; Park, H.-J.; Kim, J.S.; Hwang, S.-J. Gated Recurrent Unit with Genetic Algorithm for Product Demand Forecasting in Supply Chain Management. Mathematics 2020, 8, 565. [Google Scholar] [CrossRef]
Brauwers, G.; Frasincar, F. A General Survey on Attention Mechanisms in Deep Learning. IEEE Trans. Knowl. Data Eng. 2023, 35, 3279–3298. [Google Scholar] [CrossRef]
Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
Wei, Y.; Wu, D.; Terpenny, J. Remaining useful life prediction using graph convolutional attention networks with temporal convolution-aware nested residual connections. Reliab. Eng. Syst. Saf. 2024, 242, 109776. [Google Scholar] [CrossRef]
Mizuno, Y.; Hosoi, A.; Koshita, H.; Tsunoda, D.; Kawada, H. Fatigue life prediction of composite materials using strain distribution images and a deep convolution neural network. Sci. Rep. 2024, 14, 25418. [Google Scholar] [CrossRef] [PubMed]
Lei, W.; Dong, X.; Cui, F.; Huang, G. A Remaining Useful Life Prediction Method for Rolling Bearings Based on Hierarchical Clustering and Transformer–GRU. Appl. Sci. 2025, 15, 5369. [Google Scholar] [CrossRef]
Deng, Y.; Guo, C.; Zhang, Z.; Zou, L.; Liu, X.; Lin, S. An Attention-Based Method for Remaining Useful Life Prediction of Rotating Machinery. Appl. Sci. 2023, 13, 2622. [Google Scholar] [CrossRef]
Huang, Z.; Yan, J.; Zhang, J.; Han, C.; Peng, J.; Cheng, J.; Wang, Z.; Luo, M.; Yin, P. Deep Learning-Based Fatigue Strength Prediction for Ferrous Alloy. Processes 2024, 12, 2214. [Google Scholar] [CrossRef]
Garbin, C.; Zhu, X.; Marques, O. Dropout vs. batch normalization: An empirical study of their impact to deep learning. Multimed. Tools Appl. 2020, 79, 12777–12815. [Google Scholar] [CrossRef]
Ebadi, A.; Kaur, M.; Liu, Q. Hyperparameter optimization and neural architecture search algorithms for graph Neural Networks in cheminformatics. Comput. Mater. Sci. 2025, 254, 113904. [Google Scholar] [CrossRef]

Figure 1. Heatmap of the Pearson correlation matrix among the four scalar input features.

Figure 2. Schematic Diagram of GRU Structure.

Figure 3. Schematic Diagram of the Attention Mechanism Structure.

Figure 4. Schematic Diagram of DNN Model Structure.

Figure 5. Schematic Diagram of the Overall Structure of the Hybrid GRU-Attention-DNN Model.

Figure 6. Comparison between predicted and actual fatigue lives on the test set for different models: (a) CNN; (b) GRU; (c) LSTM; (d) Hybrid GRU–Attention–DNN. Scatter plots are drawn for the logarithmic life

l g (N f)

, while the R², MAE, RMSE and RMSLE values shown in the inset of each subfigure are calculated on the original life scale

N f

after inverse transformation from

l g (N f)

.

Figure 6. Comparison between predicted and actual fatigue lives on the test set for different models: (a) CNN; (b) GRU; (c) LSTM; (d) Hybrid GRU–Attention–DNN. Scatter plots are drawn for the logarithmic life

l g (N f)

, while the R², MAE, RMSE and RMSLE values shown in the inset of each subfigure are calculated on the original life scale

N f

after inverse transformation from

l g (N f)

.

Figure 7. Fitting results of different model test set: (a) CNN; (b) GRU; (c) LSTM and (d) GRU Attention DNN.

Figure 8. Boxplots of prediction residuals on the test set for different models. Residuals are computed on the original fatigue-life scale

N f

. The Hybrid GRU–Attention–DNN exhibits the smallest dispersion and the fewest extreme outliers: (a) CNN; (b) GRU; (c) LSTM and (d) GRU Attention DNN.

Figure 8. Boxplots of prediction residuals on the test set for different models. Residuals are computed on the original fatigue-life scale

N f

. The Hybrid GRU–Attention–DNN exhibits the smallest dispersion and the fewest extreme outliers: (a) CNN; (b) GRU; (c) LSTM and (d) GRU Attention DNN.

Figure 9. Attention visualisation results: (a) sample–time attention heatmap. Each row represents one test specimen, whose attention sequence has been mapped to the life-fraction axis and interpolated to 241 points before being shown as a colour-coded image. (b) global average attention curve with 95% confidence interval.

Figure 10. Early/Mid/Late attention allocation. For each test specimen, the attention weights within the three life windows are summed and normalised so that their total equals one; the bars show the mean proportion across all test specimens and the error bars indicate one standard deviation.

Figure 11. Time-window occlusion experiment. For each specimen, three additional predictions are generated by masking the Early, Mid and Late windows (0–80, 81–160 and 161–241 sampling steps, respectively); the bars show the mean increase in root-mean-square logarithmic error caused by each masking scheme, and the error bars denote one standard deviation across all test specimens.

Table 1. Overview of the raw database.

Feature Name	Type	Unit	Description
E (Young’s modulus)	Numeric	GPa	Elastic modulus of the material
TS (tensile strength)	Numeric	MPa	Ultimate tensile strength under monotonic loading
$σ_{s}$ (yield strength)	Numeric	MPa	Yield strength under monotonic loading
$m$ (hardening exponent)	Numeric	–	Strain hardening exponent of the material
$\log_{10} (N_{f})$	Numeric	–	Fatigue life (number of cycles to failure)
Stress history	Numerical sequence	MPa	Recorded axial (or equivalent) stress versus time or strain
Strain history	Numerical sequence	–	Recorded axial (or equivalent) strain versus time

Table 2. Quantitative comparison of different models in terms of R², MAE, RMSE, and RMSLE on the test set.

Model	R²	RMSE	MAE	RMSLE
CNN	0.8332	0.3247	0.2383	0.5461
GRU	0.9115	0.2366	0.1837	0.5095
LSTM	0.8520	0.3960	0.3028	0.5551
GRU-Attention-DNN	0.9156	0.2311	0.1756	0.5007

Table 3. Bootstrap-based 95% confidence intervals (CI) of evaluation metrics on the test set.

Model	R² (95% CI)	RMSE (95% CI)	MAE (95% CI)	RMSLE (95% CI)
CNN	[0.8197, 0.8467]	[0.3128, 0.3366]	[0.2241, 0.2525]	[0.5382, 0.5849]
GRU	[0.9068, 0.9201]	[0.2254, 0.2478]	[0.1728, 0.1939]	[0.4863, 0.5234]
LSTM	[0.8394, 0.8705]	[0.3791, 0.4128]	[0.2883, 0.3196]	[0.5338, 0.5781]
GRU-Attention-DNN	[0.9102, 0.9229]	[0.2205, 0.2416]	[0.1644, 0.1857]	[0.4891, 0.5119]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhou, M.; Lu, H.; Cao, Y.; Wang, C.; Chen, D. Prediction of Multi-Axis Fatigue Life of Metallic Materials Using a Feature-Optimised Hybrid GRU-Attention-DNN Model. Eng 2026, 7, 9. https://doi.org/10.3390/eng7010009

AMA Style

Zhou M, Lu H, Cao Y, Wang C, Chen D. Prediction of Multi-Axis Fatigue Life of Metallic Materials Using a Feature-Optimised Hybrid GRU-Attention-DNN Model. Eng. 2026; 7(1):9. https://doi.org/10.3390/eng7010009

Chicago/Turabian Style

Zhou, Mi, Haishen Lu, Yuan Cao, Chunsheng Wang, and Dian Chen. 2026. "Prediction of Multi-Axis Fatigue Life of Metallic Materials Using a Feature-Optimised Hybrid GRU-Attention-DNN Model" Eng 7, no. 1: 9. https://doi.org/10.3390/eng7010009

APA Style

Zhou, M., Lu, H., Cao, Y., Wang, C., & Chen, D. (2026). Prediction of Multi-Axis Fatigue Life of Metallic Materials Using a Feature-Optimised Hybrid GRU-Attention-DNN Model. Eng, 7(1), 9. https://doi.org/10.3390/eng7010009

Article Menu

Prediction of Multi-Axis Fatigue Life of Metallic Materials Using a Feature-Optimised Hybrid GRU-Attention-DNN Model

Abstract

1. Introduction

2. Data and Methods

2.1. Data Sources and Preprocessing

2.2. Fusion Model Construction

2.2.1. GRU Model

2.2.2. Attention Mechanism

2.2.3. Deep Neural Network Model

2.2.4. Integration Method

2.3. Verification Plan

3. Experiments and Results

3.1. Experimental Setup

3.2. Comparison of Results

3.3. Explainability Analysis

4. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Nomenclature

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI