SA-PhyGRU: A Self-Attention-Enhanced Physics-Informed GRU for Structural Seismic Response Prediction with Small Datasets

Gan, Cheng-Wu; Li, Bo; Wang, Yao-Yue; Yang, Dong

doi:10.3390/buildings16091738


¹ Earthquake Engineering Research & Test Center, Guangzhou University, Guangzhou 510006, China

² School of Civil Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China

³ Key Laboratory of Earthquake Resistance, Earthquake Mitigation and Structural Safety, Ministry of Education, Guangzhou University, Guangzhou 510006, China

* Author to whom correspondence should be addressed.

Buildings 2026, 16(9), 1738; https://doi.org/10.3390/buildings16091738

Submission received: 3 March 2026 / Revised: 10 April 2026 / Accepted: 10 April 2026 / Published: 28 April 2026

(This article belongs to the Section Building Structures)


Abstract

Accurate prediction of structural dynamic responses is critical for seismic analysis and decision-making throughout the structural life cycle. While model-driven and data-driven approaches have advanced practice, reliable prediction under limited data remains challenging due to the high cost of acquisition and simulation. This study proposes a Self-Attention-Enhanced Physics-Informed Gated Recurrent Unit network, SA-PhyGRU, for efficient and accurate seismic response prediction. The proposed network integrates GRU dynamics with a self-attention mechanism to capture long-range temporal dependencies and improve computational efficiency, while embedding physical constraints to enhance fidelity and generalization. Numerical and experimental validations on a three-story frame and a California hotel building show that SA-PhyGRU consistently outperforms conventional baselines in both accuracy and runtime, achieving improvements of up to 11.6% in $R^{2}$, with pronounced gains in small-sample regimes. These results highlight SA-PhyGRU as an effective and generalizable approach for structural seismic response prediction and performance evaluation.

Keywords:

seismic response prediction; structural dynamics; self-attention-enhanced GRU; physics-informed learning; seismic excitation

1. Introduction

Earthquakes are unpredictable and highly destructive natural hazards that pose serious threats to human life and societal development [1,2,3]. In recent decades, the expansion of megacities and the increasing vulnerability of modern societies and advanced technologies have highlighted the need to evaluate seismic responses [4,5,6], assess building damage, and devise effective rescue plans [7].

To analyze structural responses to earthquakes, two broad classes of methods have been adopted: physics-driven and data-driven. Among the former, the finite element method is the most widely used and highly accurate; however, its modeling complexity and substantial computational cost hinder timely decision-making for post-earthquake emergency response [8,9].

Deep learning, as a data-driven solution to the above challenges, has advanced rapidly in civil engineering in recent years [10,11,12]. It avoids the complex process of iterative integration in time history analysis [13]. Numerous models have been developed to predict structural responses [14,15,16,17,18]. For example, recurrent neural networks (RNNs) have been used to predict nonlinear structural responses under dynamic loading for structural health assessment [19]. Zhang et al. [20] proposed two Long Short-Term Memory (LSTM)-based, data-driven schemes for structural response modeling—stacked sequence-to-sequence and full sequence-to-sequence—both showing strong predictive capability. Huang et al. [21] combined a Convolutional Neural Network (CNN) and an LSTM network to predict the seismic response of a two-story subway station. Wu et al. [22] introduced a gated recurrent unit (GRU)-based model that uses ground motion as input to accurately predict accelerations in a subway station, embedded in multi-layered soils. Attention mechanisms can assess the interrelationships among individual segments within a long sequence [23]. To further improve reliability and accuracy, many studies incorporate attention mechanisms: Li et al. [24] enhanced neural network performance for seismic response prediction using attention; Liao et al. [25] developed an attention-augmented LSTM for bridge seismic response prediction; and SeisFormer applied a self-attention architecture for structural response modeling [26]. All proposed models exhibit superior performance compared to traditional models; however, large-scale shaking-table tests are costly, and strong-motion records from field monitoring are scarce. Consequently, data acquisition for physical and engineering systems can be expensive, forcing researchers to make inferences with partial information [27]. In such small-data regimes, most purely data-driven methods lack robustness and offer no convergence guarantees [28]. 
Beyond data demands, the interpretability of black-box, data-driven deep learning models also remains a concern.

Physics-informed deep learning algorithms are increasingly used in civil engineering due to their reduced data requirements and improved interpretability [29]. Raissi et al. [30,31] showed that embedding partial differential equation constraints into the loss function enforces physical laws during training, thereby reducing reliance on large datasets and laying the groundwork for physics-informed approaches. Li et al. [32] proposed a deep residual neural network grounded in fundamental physics to address challenges such as dynamic behavior and gradient explosion in deep learning. A physics-informed RNN was developed to estimate the seismic response of both linear and nonlinear multi-degree-of-freedom systems [33]. Building on this, Zhang et al. [34,35] integrated physics knowledge into CNN and LSTM models to mitigate overfitting, improve robustness, and lessen dependence on training data. Liu et al. [36] and Liao et al. [37] adapted physical constraints to handle both linear and nonlinear cases, applying physics-informed LSTMs for response prediction. Moreover, Hu et al. [38] introduced pseudo-labeling into physics-informed neural networks to enhance the structural seismic response prediction performance. Similarly, several recent studies have explored AI-based structural damage identification methods, aiming to enhance the interpretability and reliability of machine-learning-based structural health monitoring systems [39,40,41].

Motivated by the success of physics-informed frameworks and attention mechanisms in structural response prediction, and the need for more efficient training via concise, explicit physical constraints, this paper proposes SA-PhyGRU, a model that integrates physical information with a self-attention-enhanced GRU to predict structural responses. Compared to models without self-attention, SA-PhyGRU computes attention weights and generates outputs with global contextual awareness. The embedding of physical constraints enhances the stability of the model’s predictions. Compared to LSTM networks, a GRU uses only a reset gate and an update gate to learn information, which reduces the computational cost [42]. The consistency and accuracy of SA-PhyGRU’s forecasts are assessed, and its advantages in small-data settings are further demonstrated.

The paper is organized as follows. Section 1 reviews advances in acquiring structural responses during earthquakes. Section 2 presents the GRU and self-attention mechanisms, formulates a loss function that integrates physical and data constraints, and details the network hyperparameters. As illustrated in Figure 1, Section 3 describes dataset construction using finite element analysis to generate seismic inputs and structural responses for training and testing; the network is then trained with limited data to evaluate the stability of SA-PhyGRU in seismic response prediction. Section 4 validates the model on an experimental case. Finally, Section 5 provides the conclusions.

2. The Foundation of the SA-PhyGRU Model

The SA-PhyGRU model is first proposed for seismic response prediction under physical constraints. It is constructed by integrating a GRU with a self-attention mechanism, with physical information embedded as explicit constraints. This section first introduces the theoretical foundations of the GRU and the self-attention mechanism, followed by a detailed description of the SA-PhyGRU loss function.

2.1. Gated Recurrent Unit (GRU)

The GRU is a specialized architecture of recurrent neural networks designed to mitigate the issues of gradient vanishing and gradient explosion that often arise in traditional RNNs when processing long sequences of data. Simultaneously, it simplifies the complex structure of LSTM networks, thereby enhancing computational efficiency and practicality.

The GRU eliminates an explicit memory cell and controls the hidden state directly via two gates, the reset gate and the update gate, which filter and update information, as illustrated in Figure 2. The formulas for the reset gate $r^{t}$ and the update gate $z^{t}$ are as follows:

$$r^{t} = \sigma\left(w_{r} [h^{t-1}, x^{t}] + b_{r}\right) \tag{1}$$

$$z^{t} = \sigma\left(w_{z} [h^{t-1}, x^{t}] + b_{z}\right) \tag{2}$$

where $x^{t}$ is the current input; $h^{t-1}$ is the previous hidden state; $w_{r}$ and $w_{z}$ are the weight matrices; $b_{r}$ and $b_{z}$ are the bias terms; and $\sigma$ is the sigmoid activation function. The reset gate $r^{t}$ outputs weights between 0 and 1 to control how much past information is retained or discarded, thereby regulating the influence of $h^{t-1}$ on the current computation. Conversely, the update gate determines how much current input information is incorporated into the hidden state, using a sigmoid function to balance the contributions of the current input and the previous hidden state to the new state. The candidate hidden state $\tilde{h}^{t}$ and the updated hidden state $h^{t}$ are computed as follows:

$$\tilde{h}^{t} = \tanh\left(w_{h} [r^{t} * h^{t-1}, x^{t}] + b_{h}\right) \tag{3}$$

$$h^{t} = (1 - z^{t}) * \tilde{h}^{t} + z^{t} * h^{t-1} \tag{4}$$

where $w_{h}$ is the weight matrix; $b_{h}$ is the bias vector; and the influence of the previous state $h^{t-1}$ on $h^{t}$ is mediated by the values of $z^{t}$ and $r^{t}$.
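As an illustration, Equations (1) to (4) can be written as a single NumPy update step. This is a minimal sketch rather than the authors' TensorFlow implementation: the helper names (`gru_step`, `init_params`, `run_gru`), the random initialization, and the layer sizes are assumptions made only for this example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, params):
    """One GRU update following Eqs. (1)-(4)."""
    hx = np.concatenate([h_prev, x_t])                      # [h^{t-1}, x^t]
    r = sigmoid(params["w_r"] @ hx + params["b_r"])         # Eq. (1), reset gate
    z = sigmoid(params["w_z"] @ hx + params["b_z"])         # Eq. (2), update gate
    rhx = np.concatenate([r * h_prev, x_t])                 # [r^t * h^{t-1}, x^t]
    h_cand = np.tanh(params["w_h"] @ rhx + params["b_h"])   # Eq. (3), candidate state
    return (1.0 - z) * h_cand + z * h_prev                  # Eq. (4), new hidden state

def init_params(n_in, n_hidden, seed=0):
    """Hypothetical small random initialization, for illustration only."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("r", "z", "h"):
        params["w_" + gate] = 0.1 * rng.standard_normal((n_hidden, n_hidden + n_in))
        params["b_" + gate] = np.zeros(n_hidden)
    return params

def run_gru(x_seq, params, n_hidden):
    """Apply gru_step along a sequence of shape (T, n_in); returns (T, n_hidden)."""
    h = np.zeros(n_hidden)
    states = []
    for x_t in x_seq:
        h = gru_step(x_t, h, params)
        states.append(h)
    return np.stack(states)

# Usage: a 20-step scalar input sequence and 4 hidden units.
params = init_params(n_in=1, n_hidden=4)
x_seq = np.sin(np.linspace(0.0, 3.0, 20)).reshape(20, 1)
H = run_gru(x_seq, params, n_hidden=4)
print(H.shape)  # (20, 4)
```

With the convention of Equation (4), an update gate close to 1 copies the previous hidden state forward almost unchanged, while a value close to 0 replaces it with the candidate state.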

2.2. Self-Attention Mechanism

As a key module of the proposed SA-PhyGRU, the self-attention mechanism is used to improve predictive accuracy by highlighting the most informative components of the input and capturing their mutual dependencies. As shown in Figure 3, the self-attention block receives the GRU-generated hidden-state sequence, constructs query (Q), key (K), and value (V) representations, computes attention weights, and forms a weighted sum of the value vectors. The resulting context-enriched representation is then passed through fully connected layers to obtain the final predictions. This design yields outputs with global contextual awareness and enhances the model’s ability to identify the structural dependencies and latent interactions in the data.

Rather than a simple serial stacking, the integration of GRU and self-attention reflects a tight coupling of complementary strengths: self-attention augments the GRU hidden states with the global context, thereby informing the GRU’s gating dynamics (reset and update) and strengthening sequence modeling under complex, nonlinear dependencies. The self-attention formula is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{5}$$

where Q, K, and V are the query, key, and value matrices, generated by linear transformations of the input using three independent weight matrices $W^{Q}$, $W^{K}$, and $W^{V}$, which are trainable and learned during model training; $d_{k}$ is the dimensionality of K; $K^{T}$ is the transpose of matrix K; and softmax is a normalization function used to convert attention scores into weights.
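The self-attention step described above (Q, K, V projections, softmax-normalized scores, and a weighted sum of the value vectors) can be sketched in NumPy as follows. This is a single-head sketch under stated assumptions: the function names and the projection size of 3 are illustrative, and the paper does not state the number of heads or the projection dimensions.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(H, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a hidden-state sequence H of shape (T, d)."""
    Q, K, V = H @ W_q, H @ W_k, H @ W_v
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # (T, T); each row sums to 1
    context = weights @ V                      # weighted sum of the value vectors
    return context, weights

# Usage: 6 time steps of 4-dimensional GRU hidden states, projected to size 3.
rng = np.random.default_rng(0)
H = rng.standard_normal((6, 4))
W_q, W_k, W_v = (rng.standard_normal((4, 3)) for _ in range(3))
context, weights = self_attention(H, W_q, W_k, W_v)
print(context.shape, weights.shape)  # (6, 3) (6, 6)
```

Each row of `weights` describes how strongly one time step attends to every other time step, which is what gives the output its global contextual awareness.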

2.3. Loss Function

The loss function in the SA-PhyGRU serves as the principal conduit for embedding physical knowledge into the learning process. As depicted in Figure 4, the self-attention module processes the GRU hidden-state sequence to produce a context-enriched representation. This representation is mapped by fully connected networks to displacement, velocity, and acceleration, which are then used to define the training objective.

To enforce physical consistency, the loss comprises two components: a data-fidelity term and a physics-informed term. The data-fidelity term penalizes discrepancies between predicted and measured responses, such as displacement, velocity, and acceleration, while the physics-informed term penalizes violations of the governing equations and constraints, such as equilibrium relations, constitutive consistency, and kinematic compatibility.

During backpropagation, gradients of the total loss reflect both contributions, and the Adam optimizer updates the model parameters to minimize the composite objective. This formulation promotes solutions that are not only statistically accurate but also consistent with the underlying dynamics. In contrast, purely data-driven models can yield predictions that fit the observed data yet contravene fundamental physical principles, leading to poor generalization or unphysical behavior outside the training regime.

The training objective combines a data-fidelity term with physics-based penalties derived from the equation of motion and the associated difference relations. Data fidelity is measured using the mean squared error. The data loss components, denoted $\mathrm{loss}_{1}$, $\mathrm{loss}_{2}$, and $\mathrm{loss}_{3}$, are defined as follows:

$$\mathrm{loss}_{1} = \frac{1}{N} \sum_{i=1}^{N} (y_{p} - y_{t})^{2} \tag{6}$$

$$\mathrm{loss}_{2} = \frac{1}{N} \sum_{i=1}^{N} (\dot{y}_{p} - \dot{y}_{t})^{2} \tag{7}$$

$$\mathrm{loss}_{3} = \frac{1}{N} \sum_{i=1}^{N} (\ddot{y}_{p} - \ddot{y}_{t})^{2} \tag{8}$$

where $y_{p}$, $\dot{y}_{p}$, and $\ddot{y}_{p}$ denote the predicted displacement, velocity, and acceleration, respectively; $y_{t}$, $\dot{y}_{t}$, and $\ddot{y}_{t}$ represent the corresponding true values; and N is the length of the response sequence. The embedding of structural dynamics equations offers a path that is more efficient than purely numerical simulations and more reliable than purely data-driven approaches. The equation of motion for the structure under seismic loading is as follows:

$$M \ddot{y}(t) + h(t) = -M l \ddot{x}_{g}(t) \tag{9}$$

where $h(t)$ is the total nonlinear restoring force, given by $h(t) = c \dot{y} + k y$; M is the mass matrix; $l$ is the unit column vector; and the ground acceleration $\ddot{x}_{g}(t)$ is applied to the structural system as an external load. During the computation, both sides are divided by M, thereby canceling out the mass term. The numerical value of $h(t)$ is then generated from $\ddot{x}_{g}$ and $\ddot{y}$. The loss function, which incorporates physical information derived from the structural equations of motion, is expressed as follows:

$$\mathrm{loss}_{4} = \frac{1}{N} \sum_{i=1}^{N} \left(M \ddot{y}(t) + h(t) + M l \ddot{x}_{g}(t)\right)^{2} \tag{10}$$

Penalizing the discrepancy between the directly predicted velocity and acceleration and their finite-difference counterparts computed from the predicted displacement further enhances the model’s predictive capability. The physical constraints derived from the finite-difference approximations are as follows:

$$\mathrm{loss}_{5} = \frac{1}{N} \sum_{i=1}^{N} (\dot{y}_{p} - \dot{y}_{d})^{2} \tag{11}$$

$$\mathrm{loss}_{6} = \frac{1}{N} \sum_{i=1}^{N} (\ddot{y}_{p} - \ddot{y}_{d})^{2} \tag{12}$$

Here, $\dot{y}_{d}$ is obtained by taking the first derivative of the predicted displacement $y_{p}$, while $\ddot{y}_{d}$ is obtained by taking the second-order derivative of $y_{p}$. In summary, the data constraint $L_{d}$ and the physical constraint $L_{p}$ are as follows:

$$L_{d} = \mathrm{loss}_{1} + \mathrm{loss}_{2} + \mathrm{loss}_{3} \tag{13}$$

$$L_{p} = \mathrm{loss}_{4} + \mathrm{loss}_{5} + \mathrm{loss}_{6} \tag{14}$$
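The loss terms in Equations (6) to (14) can be sketched for a single degree of freedom as follows. Several choices here are assumptions rather than details from the paper: the restoring force is taken as linear with known mass-normalized coefficients `c_m` (c/M) and `k_m` (k/M), the derivatives are approximated with `np.gradient`, and the function name `loss_terms` is hypothetical.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def loss_terms(y_p, v_p, a_p, y_t, v_t, a_t, xg_acc, c_m, k_m, dt):
    """Return (L_d, L_p) for one response sequence sampled at time step dt."""
    # Data-fidelity terms, Eqs. (6)-(8).
    loss1, loss2, loss3 = mse(y_p, y_t), mse(v_p, v_t), mse(a_p, a_t)

    # Equation-of-motion residual, Eq. (10), after dividing by the mass:
    # a + (c/M) v + (k/M) y + x_g'' = 0 for a single degree of freedom (l = 1).
    residual = a_p + c_m * v_p + k_m * y_p + xg_acc
    loss4 = float(np.mean(residual ** 2))

    # Finite-difference velocity and acceleration from the predicted displacement.
    v_d = np.gradient(y_p, dt, edge_order=2)
    a_d = np.gradient(v_d, dt, edge_order=2)
    loss5, loss6 = mse(v_p, v_d), mse(a_p, a_d)   # Eqs. (11)-(12)

    L_d = loss1 + loss2 + loss3                   # Eq. (13)
    L_p = loss4 + loss5 + loss6                   # Eq. (14)
    return L_d, L_p
```

For a prediction that matches an exact undamped free vibration (for example `y = sin(wt)` with `k_m = w**2` and zero ground acceleration), `L_d` is zero and `L_p` reduces to the small finite-difference truncation error, which is the behavior the physics term is meant to reward.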

A physical loss function can reduce the dependence of deep learning on large training datasets while improving the model’s accuracy and stability. It is important to note that the role of the physical loss function is limited to the training phase of the model; constraints based on physical laws are not applied during the validation or testing phases.

$$L = \alpha L_{d} + \beta L_{p} \tag{15}$$

The total loss function combines the data loss and the physical loss, where α and β are the weight coefficients of the data and physical loss functions, respectively. These weights often need to be adjusted to achieve the desired convergence, especially when dealing with different datasets and models. Generally, the weight for the acceleration-related physics loss should be set to a value less than one, such as 0.1, because acceleration tends to be large in magnitude and can therefore dominate the total loss and hinder convergence. It is recommended to adjust the weights so that the peak value of the total loss remains below 10.

The network parameters are obtained as the $\arg\min$ of the training loss $L$, computed with the Adam optimizer. The individual $\mathrm{loss}_{i}$ terms correspond to physical quantities with units of m, m/s, and m/s², but their dimensional differences are not taken into account when they are summed. Therefore, $L_{d}$, $L_{p}$, and $L$ are treated as dimensionless.
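The effect of the weighting can be illustrated with a few lines of Python. The loss values below are hypothetical numbers chosen only to show the mechanism, and applying a separate weight to each physics term (here 0.1 on the acceleration-related term) is an assumption about how the recommendation above is implemented.

```python
def total_loss(data_terms, physics_terms, physics_weights, alpha=1.0, beta=1.0):
    """Eq. (15) with optional per-term weights inside the physics constraint."""
    L_d = sum(data_terms)
    L_p = sum(w * term for w, term in zip(physics_weights, physics_terms))
    return alpha * L_d + beta * L_p

# Hypothetical early-training values for (loss1, loss2, loss3) and (loss4, loss5, loss6).
data_terms = (0.02, 0.3, 2.0)
physics_terms = (2.0, 0.5, 40.0)   # the acceleration-related term dominates

unweighted = total_loss(data_terms, physics_terms, (1.0, 1.0, 1.0))
weighted = total_loss(data_terms, physics_terms, (1.0, 1.0, 0.1))
print(unweighted, weighted)
```

In this toy case the unweighted total is dominated by the acceleration term, whereas the 0.1 weight brings the total below the suggested ceiling of 10 without removing the constraint.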

The inclusion of a physics loss function reduces the model’s dependence on large-scale training datasets while concurrently enhancing its predictive accuracy and numerical stability. It is critical to emphasize that this physics-based regularization is exclusively active during the model training phase; it is not enforced during validation or inference, ensuring that the evaluation metrics reflect the model’s intrinsic predictive capability.

2.4. SA-PhyGRU Network

This study develops the SA-PhyGRU network, which integrates a GRU backbone with a self-attention module and embeds structural physics. The proposed model is evaluated for predictive accuracy and stability in seismic response prediction under limited data. The architecture of SA-PhyGRU is shown in Figure 5.

For structural response prediction, the input is the ground acceleration $\ddot{x}_{g}$, and the outputs are the structural responses: displacement $y_{p}$, velocity $\dot{y}_{p}$, and acceleration $\ddot{y}_{p}$. The equations of motion and the residuals from a finite-difference discretization are imposed as physics-based constraints, while the discrepancy between predictions and measurements serves as the data-fidelity term. These elements together define the SA-PhyGRU model.

This study implements SA-PhyGRU alongside baseline models, including a GRU, an LSTM, a physics-informed LSTM (PhyLSTM), and a physics-informed GRU (PhyGRU) to enable a controlled comparison between architectures with and without self-attention under the same physics constraints. All models were trained under the same conditions and hyperparameters to ensure a fair comparison. To better capture temporal dependencies, SA-PhyGRU adopts a two-layer GRU backbone with 100 units per layer. Two fully connected layers with ReLU activations are appended to enhance representational capacity.

Network training uses the Adam optimizer for its stable convergence. The hyperparameters are set as a batch size of 50, a learning rate of 0.001, and a dropout rate of 0.1 to mitigate overfitting. Appropriate selection of the dropout rate and learning rate can prevent overfitting, improve generalization capability, and reduce tuning time. Input and output tensors are shaped [10, 1000, 1] and [10, 1000, 3], corresponding to batch size, sequence length, and channel dimensions. PhyLSTM and LSTM baselines use the same two-layer recurrent architecture, identical fully connected heads, ReLU activations, Adam optimization, learning rates, maximum iterations, dropout rates, and batch sizes. For the physics-informed variants, the loss weights are tuned to improve training efficiency and avoid divergence, ensuring comparable optimization dynamics across models.

All models were trained in Python 3.7 using TensorFlow 1.15 on an RTX 3090 GPU. The TensorFlow installation was an NVIDIA-compiled build for RTX 30-series GPUs, which ensured compatibility and performance and provided an efficient environment for prototyping and training the networks. The training time for the proposed model was 2 h 10 min. On average, the training time of the GRU model was 20% shorter than that of the LSTM model.

3. The Numerical Case

This study assesses the effectiveness of SA-PhyGRU for structural response prediction using both numerical and experimental cases. The numerical case is constructed via the finite element method, and the experimental case is derived from a six-story hotel in San Bernardino, California. The loss functions employed for SA-PhyGRU are summarized in Table 1. SA-PhyGRU combines physics-based and data-fidelity terms, whereas the GRU baseline uses data loss only.

3.1. The Dataset Generation

This section constructs a three-story frame model in OpenSees, as shown in Figure 6. The first-story height is 4500 mm, the second and third stories are 3000 mm high, and the bay width is 4500 mm. Each story carries a lumped mass of 8400 kg. The model employs the elasticBeamColumn element and assumes that the components remain linearly elastic throughout the entire loading process, without yielding or plastic deformation. The elastic modulus is $2.68 \times 10^{4}$ N/mm², and Rayleigh damping with a value of 0.5 was adopted. The model is analyzed in OpenSees to obtain the dynamic response, including displacement, velocity, and acceleration. These data are used to assemble the dataset for SA-PhyGRU and to support structural response prediction.

One hundred ground motion records from the PEER Strong Motion Database are used as seismic inputs. The records span a broad range of peak accelerations, providing diverse training and testing samples that enable a comprehensive assessment of model stability and generalization. Each record has a duration of 50 s and a sampling frequency of 20 Hz, yielding 1000 time steps per record. When the length of ground motion records exceeds 1000 steps, downsampling is applied to meet the requirements. Additionally, the data are scaled prior to being used as model inputs. Representative structural responses are shown in Figure 7. The training datasets used for the network are summarized in Table 2.
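The preprocessing described above (bringing each record to 1000 steps and scaling it) could look like the sketch below. The paper does not specify the downsampling or scaling method, so linear interpolation and peak normalization are assumptions, as are the function names; a production pipeline would normally low-pass filter a record before reducing its sampling rate.

```python
import numpy as np

def resample_to_length(record, n_steps=1000):
    """Linearly interpolate a record onto n_steps evenly spaced samples (assumed method)."""
    record = np.asarray(record, dtype=float)
    old_axis = np.linspace(0.0, 1.0, record.size)
    new_axis = np.linspace(0.0, 1.0, n_steps)
    return np.interp(new_axis, old_axis, record)

def scale_by_peak(record):
    """Scale a record by its peak absolute value (assumed scaling scheme)."""
    peak = np.max(np.abs(record))
    return record / peak if peak > 0 else record

# Usage: a synthetic 5000-sample record reduced to 1000 scaled samples.
raw = 3.0 * np.sin(np.linspace(0.0, 10.0, 5000))
model_input = scale_by_peak(resample_to_length(raw, 1000))
print(model_input.shape)  # (1000,)
```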

Real seismic acceleration records are used as external excitation, and the ground acceleration serves as the model input. The outputs comprise displacement, velocity, and acceleration at one of the three lumped masses. These outputs are used to assemble the training and test datasets. Each sample spans 50 s with a time step of 0.05 s. An example of the input acceleration and the corresponding structural response is shown in Figure 7. As is customary, 10 of the 100 available records were randomly selected for training, and the remaining 90 were reserved for testing to evaluate performance under limited data [34]. The random selection of the training set has only a marginal influence on the final results.
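The random 10/90 split of the 100 records can be reproduced with a seeded permutation, as in the following sketch. The seed value and the function name are assumptions; the paper does not report how the random selection was drawn.

```python
import numpy as np

def split_records(n_records=100, n_train=10, seed=0):
    """Randomly pick n_train record indices for training; the rest are for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_records)
    return np.sort(order[:n_train]), np.sort(order[n_train:])

train_idx, test_idx = split_records()
print(len(train_idx), len(test_idx))  # 10 90
```

Repeating the split with different seeds is one way to check the statement that the choice of training records has only a marginal influence on the results.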

3.2. Numerical Validation

Using the constructed dataset and network configurations, SA-PhyGRU was trained with 10 percent of the records and evaluated on the remaining 90 percent. For a controlled comparison under limited data, the same splits were used to train and test PhyGRU and GRU.

This study introduces the lower quartile $CI_{ql}$ and the upper quartile $CI_{qu}$ as judgment indicators. The interval formed by the lower and upper quartiles stably reflects the distribution characteristics of the main data, directly corresponding to the 50% confidence level. This approach avoids the excessive influence of individual outlier predictions on interval estimation. Representative comparisons of predicted versus measured displacement, velocity, and acceleration for SA-PhyGRU, PhyGRU, and GRU are shown in Figure 8 and Figure 9. For displacement, SA-PhyGRU, which incorporates physics-based constraints, yields smaller errors than GRU. The interquartile confidence bounds $CI_{ql}$ and $CI_{qu}$ indicate robust performance, with $R^{2}$ concentrated between 0.948 and 0.985. Displacement errors for SA-PhyGRU are predominantly within 0.04, for PhyGRU within 0.055, and for GRU up to 0.06 and in some cases 0.075. These results indicate that SA-PhyGRU achieves a closer fit and superior overall accuracy.
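The quartile bounds can be computed directly from the per-record $R^{2}$ values of the test set. The sketch below uses `np.percentile` with its default linear interpolation; the sample values are made up for illustration and are not results from the paper.

```python
import numpy as np

def quartile_bounds(r2_values):
    """Return (CI_ql, CI_qu): the 25th and 75th percentiles of per-record R^2 values."""
    ci_ql, ci_qu = np.percentile(r2_values, [25, 75])
    return float(ci_ql), float(ci_qu)

# Hypothetical per-record scores; the second list contains one strong outlier.
typical = [0.90, 0.95, 0.96, 0.97, 0.99]
with_outlier = [0.10, 0.95, 0.96, 0.97, 0.99]
print(quartile_bounds(typical))
print(quartile_bounds(with_outlier))
```

Both lists give the same bounds, which illustrates why the interquartile interval is less sensitive to a single poor prediction than a mean or a min-max range.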

Because of the physics penalty, the initial loss is higher. After training, however, the final loss for SA-PhyGRU is slightly lower than that of PhyGRU and GRU, reflecting improved convergence toward physically consistent solutions.

To comprehensively assess predictive performance, the coefficient of determination ($R^{2}$), root mean square error (RMSE), and mean absolute error (MAE) are adopted. The formulas are as follows:

$$R^{2} = 1 - \frac{\sum_{i} (y_{p} - y_{t})^{2}}{\sum_{i} (y_{t} - \bar{y})^{2}} \tag{16}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} | y_{p} - y_{t} | \tag{17}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{t} - y_{p})^{2}} \tag{18}$$

where $y_{p}$ denotes the predicted value; $y_{t}$ is the true value; and $\bar{y}$ represents the sample mean. The coefficient of determination $R^{2}$ measures the goodness-of-fit between predictions and observations. The MAE quantifies the average absolute deviation between predicted and true responses, indicating the typical error magnitude. The RMSE measures the root mean square difference between predicted and true responses, reflecting the average deviation with greater emphasis on larger errors.
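Equations (16) to (18) translate directly into NumPy. The following sketch uses hypothetical function names and a four-point toy series to show how the three metrics are computed for one response sequence.

```python
import numpy as np

def r2_score(y_t, y_p):
    """Coefficient of determination, Eq. (16)."""
    y_t, y_p = np.asarray(y_t, dtype=float), np.asarray(y_p, dtype=float)
    ss_res = np.sum((y_p - y_t) ** 2)
    ss_tot = np.sum((y_t - y_t.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mae(y_t, y_p):
    """Mean absolute error, Eq. (17)."""
    return float(np.mean(np.abs(np.asarray(y_p) - np.asarray(y_t))))

def rmse(y_t, y_p):
    """Root mean square error, Eq. (18)."""
    return float(np.sqrt(np.mean((np.asarray(y_t) - np.asarray(y_p)) ** 2)))

# Toy example: true and predicted values for four time steps.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(r2_score(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
```

For this toy series the squared errors sum to 0.10 against a total variance sum of 5.0, so $R^{2}$ is 0.98, the MAE is 0.15, and the RMSE is the square root of 0.025.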

The predictive performance for displacement, velocity, and acceleration under SA-PhyGRU, PhyGRU, and GRU is summarized in Table 3 and illustrated in Figure 8. Displacement and acceleration are taken as examples. For displacement, SA-PhyGRU attains $R^{2}$ = 0.9724, exceeding GRU by 8.56% and PhyGRU by 2.01%, indicating a better fit achieved by combining physics-informed losses with self-attention. The MAE for GRU, PhyGRU, and SA-PhyGRU is 0.0227, 0.0140, and 0.0125, respectively, corresponding to reductions of 44.9% relative to GRU and 10.7% relative to PhyGRU; the RMSE values are 0.0459, 0.0287, and 0.0255, showing similar improvements.
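The quoted reductions follow from the reported MAE and RMSE values as relative differences with respect to each baseline. The short check below recomputes them; the helper name `relative_reduction` is introduced only for this example.

```python
def relative_reduction(baseline, value):
    """Percentage reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline

# Displacement metrics reported in the text above.
mae = {"GRU": 0.0227, "PhyGRU": 0.0140, "SA-PhyGRU": 0.0125}
rmse = {"GRU": 0.0459, "PhyGRU": 0.0287, "SA-PhyGRU": 0.0255}

for name, table in (("MAE", mae), ("RMSE", rmse)):
    for baseline in ("GRU", "PhyGRU"):
        drop = relative_reduction(table[baseline], table["SA-PhyGRU"])
        print(f"{name} reduction vs {baseline}: {drop:.1f}%")
```

The MAE reductions come out at 44.9% and 10.7%, matching the text, and the RMSE reductions at 44.4% and 11.1%, which is consistent with the statement that the RMSE shows similar improvements.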

For acceleration, $R^{2}$ improves from 0.8528 for GRU and 0.9436 for PhyGRU to 0.9652 for SA-PhyGRU, while MAE decreases from 0.6211 for GRU and 0.4225 for PhyGRU to 0.3676 for SA-PhyGRU, and RMSE decreases from 1.2575 for GRU and 0.8511 for PhyGRU to 0.7421 for SA-PhyGRU. These results demonstrate that incorporating physical constraints and self-attention reduces prediction errors and enhances accuracy and robustness. Overall, SA-PhyGRU delivers substantially higher accuracy than the data-driven GRU and improved robustness over PhyGRU under small-sample conditions. The consistent gains in $R^{2}$, MAE, and RMSE demonstrate the benefits of combining physics-informed losses with self-attention.

For additional comparison, the data-driven GRU model was trained using larger training partitions of 30% and 60%; the results are summarized in Table 4. Increasing the training share from 10% to 30% improved accuracy, yielding $R^{2}$ = 0.9134, but the performance remained below the target level. With 60% of the data used for training, the GRU achieved results closer to SA-PhyGRU, with $R^{2}$ values of 0.9425, 0.9478, and 0.9361. Corresponding reductions in MAE and RMSE were also observed, indicating further gains in predictive accuracy.

Finally, to assess the advantages of SA-PhyGRU over PhyLSTM and related baselines, three neural network models (LSTM, PhyLSTM, and SA-PhyGRU) were implemented and evaluated on the same dataset; their performance metrics are reported in Table 5. As shown, SA-PhyGRU outperforms PhyLSTM, achieving lower MAE and RMSE values and a higher $R^{2}$. The improvement is attributable in part to the GRU unit, whose gating design simplifies the architecture, reduces the number of parameters, and improves computational efficiency. These results underscore the superior performance of SA-PhyGRU.

As shown in Figure 10, LSTM and PhyLSTM exhibit larger prediction errors than SA-PhyGRU, particularly for displacement and acceleration in the 6 s to 20 s interval. Although LSTM trains faster than PhyLSTM, it yields greater errors, weaker data fitting, and less reliable predictions. The interquartile confidence bounds $CI_{ql}$ and $CI_{qu}$ indicate that the $R^{2}$ of SA-PhyGRU is concentrated between 0.945 and 0.982.

As shown in Figure 11, for displacement, velocity, and acceleration, LSTM yields substantially higher MAE and RMSE values and a notably lower $R^{2}$. These results demonstrate the clear advantage of the proposed SA-PhyGRU in response prediction accuracy, offering a practical and precise approach for predicting structural responses under seismic loading when using 10% of the dataset as the training set.

4. The Experimental Case

To benchmark the proposed SA-PhyGRU against a representative physics-informed baseline, we adopt the PI-LSTM of Liu et al. [36], which demonstrated improved seismic response prediction relative to conventional LSTM by embedding physical constraints. Both models are evaluated on the same experimental dataset to ensure a fair comparison.

4.1. The Dataset

The dataset comprises measured responses from a six-story hotel in San Bernardino, California, instrumented with nine accelerometers located on the first floor, third floor, and roof (Figure 12). Twenty earthquake records were obtained from the Center for Engineering Strong Motion Data. Detailed descriptions of the structure, instrumentation, and events are available in the study of Zhang et al. [35]. Given the limited sample size, ten records were used for training and the remaining ten for testing.

Acceleration responses recorded by the accelerometers installed on the six-story hotel are used as targets for prediction. Consistent with the strategy in Section 2, the SA-PhyGRU network incorporates the equation of motion as a physics constraint and employs self-attention to reduce prediction error. By contrast, the GRU baseline is trained with a data-only loss that penalizes discrepancies between predicted and experimental responses.

4.2. Experimental Validation

As shown in Figure 13, GRU, PhyGRU, and SA-PhyGRU deliver broadly comparable performance for third-floor displacement, although the GRU prediction is generally shifted downward, which underestimates the displacement response and increases the error. Because the experimental dataset is small and cannot be enlarged, cross-validation is employed in this study to fully exploit the limited data by exchanging the training and testing sets. The advantage of SA-PhyGRU becomes more apparent when predicting roof displacement and acceleration, where it achieves lower errors and a better fit. The displacement error analysis further indicates that SA-PhyGRU offers a more robust and accurate approach for structural response prediction when using 50% of the dataset as the training set.
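Cross-validation by exchanging the training and testing sets amounts to a two-fold scheme in which each half of the records is used once for training and once for testing. The sketch below shows only the control flow: `fit` and `evaluate` are hypothetical callables standing in for model training and scoring, and how the two fold scores are combined is not stated in the paper.

```python
def swap_cross_validation(records, fit, evaluate, n_train=10):
    """Train on one half and test on the other, then exchange the two halves."""
    first, second = records[:n_train], records[n_train:]
    scores = []
    for train, test in ((first, second), (second, first)):
        model = fit(train)
        scores.append(evaluate(model, test))
    return scores

# Toy stand-ins: the "model" is the training mean, the score is the mean absolute error.
def fit(train):
    return sum(train) / len(train)

def evaluate(model, test):
    return sum(abs(x - model) for x in test) / len(test)

records = list(range(20))          # 20 records, as in the experimental dataset
print(swap_cross_validation(records, fit, evaluate))
```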

Table 6 and Figure 14 compare the performance of GRU, PhyGRU, SA-PhyGRU, and PI-LSTM in predicting displacement and acceleration at the third floor and the roof. For third-floor displacement, SA-PhyGRU attains the highest accuracy ($R^{2} = 0.9734$). PI-LSTM, PhyGRU, and GRU also perform well for displacement, with $R^{2}$ values of 0.9650, 0.9589, and 0.9521 at the third floor and 0.9300, 0.9482, and 0.9213 at the roof, respectively. SA-PhyGRU also attains the lowest MAE for third-floor displacement and acceleration and the lowest RMSE for third-floor acceleration; for third-floor displacement, its RMSE of 0.0014 is below those of PhyGRU (0.0019) and GRU (0.0021), while Table 6 lists 0.001 for PI-LSTM. Compared with PI-LSTM, SA-PhyGRU shows superior predictive performance for roof displacement: the $R^{2}$ rises from 0.930 to 0.9806, the MAE decreases from 0.001 to 0.0008, and the RMSE decreases from 0.002 to 0.0016, a reduction of 20.0%. For roof acceleration, SA-PhyGRU's $R^{2}$ (0.9536) is lower than that of PI-LSTM (0.977), whereas at the third floor it remains higher (0.9551 versus 0.937). The reduced accuracy at the roof is likely attributable to the higher acceleration intensity and associated noise in the roof measurements.
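
For readers who wish to reproduce this kind of comparison, the sketch below gives the three metrics in their usual textbook form, together with a helper for relative change. The paper does not spell out its metric formulas in this section, so the standard definitions are an assumption, and the helper name `relative_change` is introduced here for illustration only.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def relative_change(old, new):
    """Signed relative change in percent (negative means a reduction)."""
    return 100.0 * (new - old) / old

# Roof-displacement values quoted in the text (PI-LSTM -> SA-PhyGRU).
rmse_change = relative_change(0.002, 0.0016)
mae_change = relative_change(0.001, 0.0008)
r2_change = relative_change(0.930, 0.9806)
```

With the quoted roof-displacement values, the RMSE and MAE changes both come out at about -20%, and the $R^{2}$ increase at about 5.4%.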

5. Conclusions

This study addressed seismic response prediction for a frame structure and a six-story hotel and introduced SA-PhyGRU, which integrates self-attention with physics-informed constraints. Across both numerical and experimental evaluations under limited training data, SA-PhyGRU consistently achieved higher accuracy and robustness than PhyLSTM, PhyGRU, and GRU, with maximum $R^{2}$ increases of 2.2%, 3.3%, and 11.6%, respectively. The data-driven GRU improved with larger training sets but remained less reliable under small-sample conditions. Although PI-LSTM attained a higher $R^{2}$ for roof acceleration in one setting, SA-PhyGRU exhibited superior overall accuracy and stability.

The contribution of this study is the SA-PhyGRU model, which combines physics-informed constraints with a self-attention mechanism to enable structural response prediction from small datasets. Because few cases were available, the performance of the model on larger structural systems and under nonlinear behavior remains unclear, and its training speed poses certain difficulties for practical deployment. Future research could improve predictive performance by integrating physical knowledge more deeply and by incorporating additional experimental response data.

Author Contributions

Conceptualization, C.-W.G., B.L. and D.Y.; methodology, C.-W.G., B.L. and D.Y.; software, C.-W.G.; validation, C.-W.G., B.L. and Y.-Y.W.; formal analysis, C.-W.G.; investigation, C.-W.G. and Y.-Y.W.; resources, C.-W.G., B.L., Y.-Y.W. and D.Y.; data curation, C.-W.G.; writing—original draft preparation, C.-W.G.; writing—review and editing, C.-W.G. and B.L.; visualization, C.-W.G., Y.-Y.W. and D.Y.; supervision, C.-W.G., B.L. and Y.-Y.W.; project administration, C.-W.G., B.L., Y.-Y.W. and D.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data generated and/or analyzed during the current study are not publicly available due to legal/ethical reasons but are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Chen, X.; Wang, H.; Gao, H.; Liu, Y.; Pan, Z.; Song, Q.; Qin, H.; Jiang, Y. Prior-Knowledge-Guided Missing Data Imputation for Bridge Cracks: A Temperature-Driven SP-VMD-CNN-GRU Framework. Buildings 2026, 16, 669.
2. Takagi, J.; Wada, A. Higher performance seismic structures for advanced cities and societies. Engineering 2019, 5, 184–189.
3. Kazama, M.; Noda, T. Damage statistics (Summary of the 2011 off the Pacific Coast of Tohoku Earthquake damage). Soils Found. 2012, 52, 780–792.
4. Huang, X.; Wang, R.; Zhang, X.; Huang, G.; Teng, D.; Zhang, X. Damage Inspection and Seismic Assessment of Lingzhao Xuan in the Palace Museum: A Case Study. Buildings 2024, 14, 3311.
5. Kang, X.; Chen, H.; Zhao, G.; Lin, X.; Zheng, L.; Chen, Y.; Liu, Q.; Zhao, Z.; Chen, X.; Wang, F. Nearby Real-Time Earthquake Simulation on an Urban Scale Based on Structural Monitoring. Buildings 2024, 14, 3574.
6. Gao, Y.; Xiao, Z.; Gong, Z.; Huang, S.; Zhu, H. Spatiotemporal Deformation Prediction Model for Retaining Structures Integrating ConvGRU and Cross-Attention Mechanism. Buildings 2025, 15, 2537.
7. Liu, Y.; Zhang, B.; Wang, T.; Su, T.; Chen, H. Dynamic analysis of multilayer-reinforced concrete frame structures based on NewMark-β method. Rev. Adv. Mater. Sci. 2021, 60, 567–577.
8. Geller, R.; Jackson, D.; Kagan, Y.; Mulargia, F. Earthquakes cannot be predicted. Science 1997, 275, 1616–1617.
9. Zhang, Y.; Wu, G. Seismic vulnerability analysis of rc bridges based on kriging model. J. Earthq. Eng. 2019, 23, 242–260.
10. Zheng, Z.; Tian, Y.; Yang, Z.; Lu, X. Hybrid framework for simulating building collapse and ruin scenarios using finite element method and physics engine. Appl. Sci. 2020, 10, 4408.
11. Cha, Y.; Choi, W.; Suh, G. Autonomous Structural Visual Inspection Using Region-Based Deep Learning for Detecting Multiple Damage Types. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 731–747.
12. Xu, Y.; Lu, X.; Cetiner, B.; Taciroglu, E. Real-time regional seismic damage assessment framework based on long short-term memory neural network. Comput.-Aided Civ. Infrastruct. Eng. 2020, 36, 504–521.
13. Li, Y.; Bao, T.; Gao, Z.; Shu, X.; Zhang, K.; Xie, L. A new dam structural response estimation paradigm powered by deep learning and transfer learning techniques. Struct. Health Monit. 2021, 21, 770–787.
14. Xu, Y.; Quan, Q.; Zhang, Z. Research on Long-Term Structural Response Time-Series Prediction Method Based on the Informer-SEnet Model. Buildings 2026, 16, 189.
15. Liu, D.; Yang, J.; Li, J.; Shen, J.; Zhang, Y.; Chen, L.; Zhou, L. LGSTA-GNN: A Local-Global Spatiotemporal Attention Graph Neural Network for Bridge Structural Damage Detection. Buildings 2026, 16, 348.
16. Li, H.; Wang, T.; Gang, W. Dynamic response prediction of vehicle-bridge interaction system using feedforward neural network and deep long short-term memory network. Structures 2021, 34, 2415–2431.
17. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166.
18. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
19. Li, J.; Chen, W.; Fan, G. Structural health monitoring data anomaly detection by transformer enhanced densely connected neural networks. Smart Struct. Syst. 2022, 30, 613.
20. Zhang, R.; Zhao, C.; Su, C.; Jing, Z.; Oral, B.; Hao, S. Deep long short-term memory networks for nonlinear structural seismic response prediction. Comput. Struct. 2019, 220, 55–68.
21. Huang, P.; Chen, Z. Deep learning for nonlinear seismic responses prediction of subway station. Eng. Struct. 2021, 244, 112735.
22. Wu, W.; Ge, S.; Yuan, Y.; Ding, W.; Anastasopoulos, I. Seismic response of subway station in soft soil: Shaking table testing versus numerical analysis. Tunn. Undergr. Space Technol. 2020, 100, 103389.
23. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Advances in Neural Information Processing Systems. pp. 5998–6008. Available online: https://arxiv.org/abs/1706.03762 (accessed on 9 April 2026).
24. Li, T.; Pan, Y.; Tong, K.; Ventura, C.; de Silva, C. A multi-scale attention neural network for sensor location selection and nonlinear structural seismic response prediction. Comput. Struct. 2021, 248, 106507.
25. Liao, Y.; Lin, R.; Zhang, R.; Wu, G. Attention-based LSTM (AttLSTM) neural network for Seismic Response Modeling of Bridges. Comput. Struct. 2023, 275, 106915.
26. Meng, S.; Zhou, Y.; Gao, Z. Refined self-attention mechanism based real-time structural response prediction method under seismic action. Eng. Appl. Artif. Intell. 2024, 129, 107380.
27. Hu, S.; Guo, T.; Shahria, A.; Koetaka, Y.; Ghafoori, E.; Karavasilis, T.L. Machine learning in earthquake engineering: A review on recent progress and future trends in seismic performance evaluation and design. Eng. Struct. 2025, 340, 120721.
28. Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440.
29. Xu, Y.; Sara, K.; Jessica, B.; Paolo, G. Physics-informed machine learning for reliability and systems safety applications: State of the art and challenges. Reliab. Eng. Syst. Saf. 2023, 230, 108900.
30. Raissi, M.; Perdikaris, P.; Karniadakis, G. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
31. Raissi, M.; Yazdani, A.; Karniadakis, G. Hidden fluid mechanics: Learning velocity and pressure fields from flow visualizations. Science 2020, 367, 1026–1030.
32. Li, J.; Chen, Y. A physics-constrained deep residual network for solving the sine-Gordon equation. Commun. Theor. Phys. 2021, 73, 015001.
33. Soheil, S.; Martin, T.; Shamim, N.; Majid, J. DynNet: Physics-based neural architecture design for nonlinear structural response modeling and prediction. Eng. Struct. 2021, 229, 111582.
34. Zhang, R.; Liu, Y.; Sun, H. Physics-informed multi-LSTM networks for metamodeling of nonlinear structures. Comput. Methods Appl. Mech. Eng. 2020, 369, 113226.
35. Zhang, R.; Liu, Y.; Sun, H. Physics-guided convolutional neural network (PhyCNN) for data-driven seismic response modeling. Eng. Struct. 2020, 215, 110704.
36. Liu, F.; Li, J.; Wang, L. PI-LSTM: Physics-informed long short-term memory network for structural response modeling. Eng. Struct. 2023, 292, 116500.
37. Liao, Y.; Tang, H.; Li, R.; Ran, L.; Xie, L. Seismic response prediction and parameters estimation of the frame structure equipped with the base isolation-fluid inerter system (FS-BIFI) based on the PhyLSTM model. Eng. Struct. 2024, 309, 118077.
38. Hu, Y.; Tsang, H.H.; Lam, N.; Lumantarna, E. Physics-informed neural networks for enhancing structural seismic response prediction with pseudo-labelling. Arch. Civ. Mech. Eng. 2024, 24, 7.
39. Wang, X.W.; Zheng, W.; Wei, Z.H.; Li, S.Q. Explainable AI-Driven Optimal Feature Selection for the Identification of Structural Damage. Struct. Control Health Monit. 2025, 23, 7253150.
40. Wang, X.W.; Wang, Z.H.; Yang, S.X.; Wei, S.Q.; Wang, T.L. An incremental broad ensemble learning framework for the identification of structural damage in conditions of data scarcity and non-stationarity. Eng. Struct. 2026, 353, 122261.
41. Wang, X.W.; Zhao, Y.H.; Wei, Z.H.; Hu, N. An ultrafast and robust structural damage identification framework enabled by an optimized extreme learning machine. Mech. Syst. Signal Process. 2026, 216, 111509.
42. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); Association for Computational Linguistics: Doha, Qatar; pp. 1724–1734.

Figure 1. Overall process of the SA-PhyGRU.

Figure 2. The scheme of SA-PhyGRU network.

Figure 3. The architecture of self-attention.

Figure 4. The loss function of the SA-PhyGRU.

Figure 5. The architecture of SA-PhyGRU network.

Figure 6. Three-layer frame structure.

Figure 7. Frame structural response.

Figure 8. Structural response prediction results.

Figure 9. Structural response prediction results.

Figure 10. Structural response prediction results by different models.

Figure 11. Structural response prediction results.

Figure 12. The 6-story hotel in San Bernardino, California (Station Number: 23287).

Figure 13. Prediction performance of GRU, PhyGRU, and SA-PhyGRU.

Figure 14. Prediction results on the 3rd floor and roof.

Table 1. The loss functions of SA-PhyGRU.

Loss	SA-PhyGRU
Physical loss $(L_{P})$	$\frac{1}{N}\left[(M\ddot{y} + h + Ml\ddot{x}_{g})^{2} + (\dot{y}_{p} - \dot{y}_{d})^{2} + (\ddot{y}_{p} - \ddot{y}_{d})^{2}\right]$
Data loss $(L_{d})$	$\frac{1}{N}\left[(y_{p} - y_{t})^{2} + (\dot{y}_{p} - \dot{y}_{t})^{2} + (\ddot{y}_{p} - \ddot{y}_{t})^{2}\right]$
Total loss $(L)$	$L(\delta_{1}, \delta_{2}) = \alpha L_{d} + \beta L_{P}$

Table 2. The training and testing datasets.

Datasets	Total Duration	Sample Duration	Sample Number	Proportion
Training	50 s	0.05 s	1000	10%
Testing	50 s	0.05 s	1000	90%

Table 3. The performance of the GRU, PhyGRU, and SA-PhyGRU networks.

Response	Metric	GRU	PhyGRU	SA-PhyGRU
Displacement	MAE	0.0227	0.0140	0.0125
Displacement	RMSE	0.0459	0.0287	0.0255
Displacement	$R^{2}$	0.8891	0.9523	0.9724
Velocity	MAE	0.0896	0.0696	0.0630
Velocity	RMSE	0.1823	0.1497	0.1375
Velocity	$R^{2}$	0.8979	0.9525	0.9737
Acceleration	MAE	0.6211	0.4225	0.3676
Acceleration	RMSE	1.2575	0.8511	0.7421
Acceleration	$R^{2}$	0.8528	0.9436	0.9652

Table 4. The performance of GRU under different datasets.

Response	Metric	30% Dataset	60% Dataset
Displacement	MAE	0.0205	0.0142
Displacement	RMSE	0.0423	0.0286
Displacement	$R^{2}$	0.9134	0.9425
Velocity	MAE	0.0803	0.0709
Velocity	RMSE	0.1700	0.1502
Velocity	$R^{2}$	0.9174	0.9478
Acceleration	MAE	0.5371	0.4239
Acceleration	RMSE	1.0762	0.8564
Acceleration	$R^{2}$	0.8816	0.9361

Table 5. The performance of the SA-PhyGRU, PhyLSTM, and LSTM networks.

Response	Metric	SA-PhyGRU	PhyLSTM	LSTM
Displacement	MAE	0.0125	0.0141	0.0246
Displacement	RMSE	0.0255	0.0293	0.0483
Displacement	$R^{2}$	0.9724	0.9512	0.8713
Velocity	MAE	0.0630	0.0688	0.0991
Velocity	RMSE	0.1375	0.1483	0.1992
Velocity	$R^{2}$	0.9737	0.9575	0.8867
Acceleration	MAE	0.3676	0.3988	0.6134
Acceleration	RMSE	0.7421	0.8773	1.2120
Acceleration	$R^{2}$	0.9652	0.9521	0.8529

Table 6. The performance of the different models.

Location	Response	Metric	SA-PhyGRU	PI-LSTM	PhyGRU	GRU
Third Floor	Displacement	MAE	0.0007	0.001	0.0009	0.0010
Third Floor	Displacement	RMSE	0.0014	0.001	0.0019	0.0021
Third Floor	Displacement	$R^{2}$	0.9734	0.965	0.9589	0.9521
Third Floor	Acceleration	MAE	0.0887	0.096	0.0994	0.1112
Third Floor	Acceleration	RMSE	0.1756	0.210	0.2143	0.2357
Third Floor	Acceleration	$R^{2}$	0.9551	0.937	0.9315	0.9280
Rooftop	Displacement	MAE	0.0008	0.001	0.0010	0.0012
Rooftop	Displacement	RMSE	0.0016	0.002	0.0020	0.0023
Rooftop	Displacement	$R^{2}$	0.9806	0.930	0.9482	0.9213
Rooftop	Acceleration	MAE	0.1935	0.145	0.2633	0.3478
Rooftop	Acceleration	RMSE	0.3956	0.280	0.5026	0.6783
Rooftop	Acceleration	$R^{2}$	0.9536	0.977	0.9228	0.8796

Share and Cite

MDPI and ACS Style

Gan, C.-W.; Li, B.; Wang, Y.-Y.; Yang, D. SA-PhyGRU: A Self-Attention-Enhanced Physics-Informed GRU for Structural Seismic Response Prediction with Small Datasets. Buildings 2026, 16, 1738. https://doi.org/10.3390/buildings16091738
