1. Introduction
With the advancement of architectural structures toward higher-rise and more complex designs, steel frame structures have gained widespread application in engineering due to their advantages such as high strength and good ductility. The seismic reliability of these structures directly impacts the safety and durability of building structures. Traditional seismic reliability analysis methods for steel frame structures predominantly rely on the combination of finite element analysis and Monte Carlo simulation (MCS), yet they face challenges such as low computational efficiency and difficulties in handling implicit limit state functions. In recent years, machine learning methods have garnered significant attention in the field of structural engineering reliability analysis due to their robust capabilities in data mining and nonlinear fitting.
The core of structural reliability analysis lies in accurately describing the probability distribution characteristics of structural load effects and resistance, while establishing reasonable limit state functions. In early research, researchers predominantly employed traditional methods such as the response surface method [
1] and first-order second-moment method [
2] for reliability calculations. However, these approaches struggle to balance precision and efficiency for complex structures. With advancements in machine learning technologies, models such as back-propagation neural networks (BPNNs), support vector regression (SVR), and extreme gradient boosting (XGBoost) have been progressively applied to structural response prediction and reliability analysis. Nie et al. [
3] conducted static reliability analysis of steel frames using a combination of BPNN and radial basis function networks with the Fekete point method, validating the feasibility of neural networks in structural reliability studies. Wakjira et al. [
4] determined optimal input variables for composite reinforced concrete beams through regression analysis of load and flexural capacity using various machine learning methods. Dai et al. [
5] achieved a 99% coefficient of determination in predicting flexural capacity of cold-formed thin-walled steel sections using XGBoost, demonstrating the high-precision predictive capabilities of machine learning models.
The successful application of machine learning models in structural reliability analysis depends critically on two additional factors: sampling methods and parameter optimization. Proper sampling techniques can provide high-quality training data for models, directly influencing surrogate accuracy. Wang [
6] combined number theory point selection with artificial neural networks (ANNs), effectively improving the computational efficiency of global seismic reliability analysis for reinforced concrete-steel frames. Hu et al. [
7] adopted a central composite design to select training data, significantly reducing the time cost of reliability calculations for complex structures. Meanwhile, optimization algorithms play a vital role in parameter tuning for machine learning models. Asteris et al. [
8] utilized a feedforward neural network based on the krill herd algorithm (KHA) to predict the inter-story drift ratio (IDR), achieving minimal model errors. Nguyen et al. [
9] employed particle swarm optimization–artificial neural networks (PSO-ANNs) to predict seismic responses of short columns, with a coefficient of determination exceeding 0.999, demonstrating the effectiveness of optimization algorithms in enhancing model performance.
It should be noted that the present study adopts a regular, symmetrical steel frame, which inherently exhibits lower vulnerability to torsional and geometric irregularities than many real-world structures. Recent post-earthquake investigations following the 2023 Kahramanmaraş sequence have demonstrated the critical influence of infill walls and subsurface conditions on seismic performance. Tan et al. [
10] showed that including infill walls in nonlinear models yields damage distributions significantly closer to field observations than bare frame models. Tan et al. [
11] further reported that variations in foundation soil stiffness caused structurally identical buildings on the same site to exhibit completely different outcomes, with softer soils increasing drift demands by 30–50%. While the current study intentionally focuses on a regular bare steel frame to isolate parameter sensitivity effects, extending the proposed machine learning (ML)-based reliability framework to irregular configurations, infilled frames, and soil–structure interaction remains an important direction for future work.
Some recent works have further advanced the integration of ML surrogates and metaheuristic optimization in structural seismic analysis. Afshari et al. [
12] provided a comprehensive review of ML-based methods in structural reliability, concluding that a surrogate model replaces the true limit state function with a computationally inexpensive approximation. This reduces the number of calls to the structural finite element analysis while maintaining acceptable accuracy, thereby significantly lowering the computational cost of structural reliability analysis. Li et al. [
13] proposed an improved extreme learning machine combined with seagull optimization for seismic evacuation risk assessment of educational buildings, achieving 92% accuracy. Qin and Kaewunruen [
14] compared multiple ML models with traditional approaches for shear reliability analysis of steel fiber reinforced concrete beams, demonstrating that comparative evaluation across ML paradigms yields practical guidance for model selection. Asgarkhani et al. [
15] developed a dynamic ensemble ML model with genetic algorithm (GA) and PSO hyperparameter optimization for predicting seismic probability curves of infilled steel frames considering soil–foundation–structure interaction, and reported that the proposed ensemble model achieved 99.3% accuracy, whereas conventional models (XGBoost, gradient boosting machine (GBM), random forest, and LightGBM) ranged from 91% to 94.2%. These studies collectively confirm that PSO and other metaheuristic algorithms are effective tools for hyperparameter tuning in ML-assisted structural reliability, and that the achievable prediction accuracy depends strongly on the specific model architecture and the nature of the structural response. However, these studies do not approach the analysis from the perspective of global structural performance. Combining surrogate models with global structural reliability analysis could therefore provide an efficient computational route for large-scale structural reliability assessment.
Nevertheless, the global seismic reliability analysis of steel frame structures still faces several challenges. First, the probability distribution of critical response parameters such as the maximum IDR of the structure is difficult to accurately fit, which compromises the precision of reliability calculations. Second, the selection of input features for machine learning models lacks systematicity, and redundant features may impair the generalization ability of the model. Third, the impact of sample size on model prediction performance remains unclear, making it challenging to determine the optimal sample size. It is worth noting that the aforementioned studies [
3,
4,
5,
6,
8,
9] each adopt a single machine learning paradigm, typically one variant of neural networks or gradient boosting, and lack a systematic comparison against alternative models under identical conditions.
Therefore, this study focuses on multi-story steel frame structures and aims to address three specific gaps. First, while the general combination of sensitivity analysis, surrogate models, and Monte Carlo simulation is well established, existing studies rarely provide a systematic comparison among multiple machine learning paradigms (in particular, PSO-optimized SVR, PSO-optimized XGBoost, and PSO-optimized BPNN) within the same seismic reliability context. Such a comparison is practically valuable because the optimal model choice is not known a priori for structural response data that deviate from standard distributions. Second, the study explicitly demonstrates that when the structural response does not conform to conventional distributions (as confirmed by the Kolmogorov–Smirnov (K-S) test rejecting all four candidate distributions), traditional distribution-fitting approaches such as maximum likelihood estimation (MLE) can introduce appreciable errors in reliability indices, whereas ML-based surrogates bypass this limitation by avoiding distributional assumptions. Third, many existing ML-assisted reliability studies on frame structures have been conducted at the component level, for example, predicting the flexural or shear capacity of individual beams, columns, or connections. In contrast, this study addresses the global reliability of the overall deformation capacity of multi-story steel frames: it computes the maximum inter-story drift ratio under random pushover analysis by simultaneously propagating uncertainties from nine structural parameters through a nonlinear finite element model. This introduces a higher-dimensional, more strongly coupled input–output mapping than component-level prediction tasks. This work thus provides a reference for the application of machine learning to global reliability assessment.
Beyond these methodological contributions, the study also establishes a practical 1000-sample threshold for training data adequacy and clarifies its dependence on the input dimensionality and structural complexity. Through a series of research steps including finite element modeling and validation, global sensitivity analysis, machine learning model optimization and reliability calculations, a framework with high potential for large-scale computational applications is established for seismic reliability of steel frame structures.
2. Finite Element Model of Multi-Story Steel Frame Structure
To establish a reliable numerical platform for seismic response analysis and subsequent uncertainty quantification of multi-story steel frame structures, a single-bay plane frame from the middle span of a 9-story steel frame structure is selected as the analysis model. Referencing the structural configuration of the benchmark steel frame designed for the SAC Phase II Steel Project [
16], its initial design is carried out based on the Chinese Code for Seismic Design of Buildings (GB 50011-2010) [
17]. The resulting cross-sectional dimensions and material grades (
Table 1 and
Table 2) are then adopted to build consistent numerical models in both OpenSEES 3.4 and ETABS 2013 for subsequent nonlinear and modal analyses. The middle span is chosen because interior frames typically carry larger gravity loads and are less affected by torsional effects than perimeter frames, making them representative of the most critical seismic demand. This structural form is intended to represent a typical regular steel layout for multi-story buildings of 4 to 9 stories.
Figure 1a and
Figure 1b show its plan layout and structural form, respectively.
To validate the accuracy of the finite element model, a lumped-mass model of the planar frame is established using both OpenSEES and the commercial software ETABS. The OpenSEES model employs force-based distributed-plasticity beam-column elements (nonlinearBeamColumn) to simulate the beams and columns. The steel constitutive behavior is modeled using the Steel02 model, which accounts for the Bauschinger effect (see
Figure 2). The strain hardening ratio is taken as 0.01, and the transition curve parameters
R0 = 18,
cR1 = 0.925, and
cR2 = 0.15 are adopted following OpenSEES default recommendations. The yield strength and elastic modulus are treated as random variables in the subsequent uncertainty analysis, with their statistical properties provided later in the text. The P-Delta effect is also accounted for in the analysis.
The first five modes were selected for comparison because, for regular multi-story planar steel frames of this height, the cumulative effective mass participation typically exceeds 90% within the first five modes, ensuring that the dominant in-plane dynamic characteristics are captured. A comparison of the natural frequencies from the two models (
Table 3) shows relative differences below 5% in all five modes, confirming the reliability of the OpenSEES model for subsequent seismic performance and reliability analysis. In addition, a deterministic pushover analysis was performed on the structure. The resulting pushover curve and inter-story drift distribution are shown in
Figure 3. These results are in close agreement with those of existing studies [
18,
19], thereby validating the finite element model in terms of nonlinear structural behavior.
4. Seismic Reliability Analysis of Overall Deformation Capacity Based on Machine Learning
4.1. Model Feature Parameter Selection Based on Sensitivity Analysis
Building on the 9-story steel frame model established in the previous sections, this section constructs an efficient and reliable machine learning-based reliability analysis method. To reduce the complexity of the machine learning models and enhance their generalization ability, a global sensitivity analysis is conducted to identify the input parameters (features) that most significantly influence the output, i.e., the maximum IDR. The variance-based Sobol’ method, applicable to strongly nonlinear systems, is employed to quantify the contributions of each input variable and their interactions to the output variance.
Sobol’ sequence sampling of the nine random variables is carried out, and the first-order and total sensitivity indices of the maximum IDR are calculated. The results are shown in the interval bar chart in
Figure 8. The analysis shows that:
- (1)
The elastic modulus of steel, the yield strength of the steel beams, and the section height of the beams and columns are the key parameters that affect the overall deformation capacity of the structure.
- (2)
The total sensitivity indices of the flange width and flange thickness of the beams and columns are significantly higher than their first-order indices, which indicates that these parameters affect the output mainly through interactions with other variables.
Therefore, the five key parameters—steel Young’s modulus
E, steel beam yield strength
fyb, section height coefficient
H, flange width coefficient
bf, and flange thickness coefficient
tf—are selected as input features for the subsequent machine learning model. It is worth noting that the four excluded variables—column yield strength
fyc, column ultimate strength
fuc, beam ultimate strength
fub, and web thickness coefficient
tw—each display a total sensitivity index
ST,i noticeably larger than their first-order index
Si (see
Figure 8), suggesting that their contribution to the output variance operates primarily through interactions with the retained parameters rather than through independent main effects. Consequently, the surrogate model captures their influence only indirectly via the five selected features with which they interact. While this simplification is justified by the sensitivity ranking and practically necessary for dimensionality reduction, it carries implications for model robustness and transferability that warrant acknowledgment. The trained models are expected to maintain high predictive accuracy within the current parameter space, as the dominant variance contributors are retained. However, if these models were applied to structural configurations where the interaction structure among variables differs (for instance, cases where web thickness or column strength plays a more direct role), predictive performance could degrade. Furthermore, the selected feature subset is inherently specific to the nine-story steel frame and to the probability distributions adopted in this study; its transfer to frames of different heights, irregular layouts, or alternative steel grades should be preceded by a re-evaluation of the sensitivity structure. Notwithstanding these considerations, the five-feature set is considered adequate for the primary objective of this study, i.e., establishing and validating a machine learning-based reliability analysis methodology, as corroborated by the consistently high coefficients of determination (all exceeding 0.987) achieved by all three models on the test set.
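The distinction between first-order and total indices discussed above can be illustrated with a minimal sketch. It is not the study's implementation: the response function `toy_response` is a hypothetical stand-in for the pushover model, the inputs are assumed independent and uniform on [-1, 1], plain pseudo-random sampling replaces the Sobol’ sequence, and the estimators chosen here (Saltelli 2010 for first-order, Jansen for total) are one common option among several.

```python
import random

def sobol_indices(f, dim, n, rng):
    """Estimate first-order (S_i) and total (S_Ti) Sobol' indices.

    Uses two independent sample matrices A and B plus the "pick-freeze"
    matrices AB_i (A with column i taken from B). First-order indices use
    the Saltelli (2010) estimator, total indices the Jansen estimator.
    Assumption: inputs are independent and uniform on [-1, 1].
    """
    A = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    B = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    fA = [f(x) for x in A]
    fB = [f(x) for x in B]

    # Output variance estimated from the pooled A and B evaluations.
    pooled = fA + fB
    mean = sum(pooled) / len(pooled)
    var = sum((y - mean) ** 2 for y in pooled) / (len(pooled) - 1)

    first, total = [], []
    for i in range(dim):
        v_i = 0.0
        vt_i = 0.0
        for j in range(n):
            x = list(A[j])
            x[i] = B[j][i]          # replace only column i
            f_abi = f(x)
            v_i += fB[j] * (f_abi - fA[j])
            vt_i += (fA[j] - f_abi) ** 2
        first.append(v_i / n / var)
        total.append(vt_i / (2 * n) / var)
    return first, total

def toy_response(x):
    # Hypothetical stand-in for the pushover model: x[0] acts alone,
    # x[1] and x[2] act only through their product (pure interaction).
    return x[0] + x[1] * x[2]

S, ST = sobol_indices(toy_response, dim=3, n=20000, rng=random.Random(0))
for i, (s, st) in enumerate(zip(S, ST)):
    print(f"x{i}: S = {s:.3f}, ST = {st:.3f}")
```

For this toy function the analytical values are S = (0.75, 0, 0) and ST = (0.75, 0.25, 0.25), so the second and third inputs show the same pattern as the interaction-dominated variables in the text: a total index well above the first-order index.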
4.2. Machine Learning Data Preprocessing and Model Evaluation Metrics
A dataset comprising 10,000 samples is generated using the Latin Hypercube Sampling (LHS) method, with each sample containing the aforementioned five input features and one output label representing the maximum IDR. The dataset is randomly divided into a training set (8000 samples) and a test set (2000 samples) at an 8:2 ratio. To remove the influence of differing units and magnitudes among the features, the input features are standardized to have a mean of zero and a standard deviation of 1.
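The sampling, splitting, and standardization steps above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' code: samples are drawn on the unit hypercube only (mapping them to the physical distributions of the five features through inverse CDFs is omitted), and the scaling statistics are assumed to be fitted on the training set and reused for the test set, which the text does not specify.

```python
import random
import statistics

def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum."""
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n_samples equal-width strata,
        # then shuffle so strata are paired randomly across dimensions.
        col = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]

def fit_standardizer(rows):
    """Column-wise mean and standard deviation computed on the given rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return means, stds

def standardize(rows, means, stds):
    """Apply (value - mean) / std column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

rng = random.Random(42)
X = latin_hypercube(10_000, 5, rng)  # unit-cube samples for five features

# 8:2 random split into training and test sets.
indices = list(range(len(X)))
rng.shuffle(indices)
n_train = len(X) * 8 // 10
X_train = [X[i] for i in indices[:n_train]]
X_test = [X[i] for i in indices[n_train:]]

# Assumption: scaling statistics come from the training set only and are
# then reused for the test set, which avoids leaking test information.
means, stds = fit_standardizer(X_train)
X_train_std = standardize(X_train, means, stds)
X_test_std = standardize(X_test, means, stds)
print(len(X_train_std), len(X_test_std))
```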
The following four commonly used regression evaluation indicators are calculated to evaluate and compare the model performance:
Mean Squared Error (MSE) is the average of the squared differences between predicted and actual values. A smaller MSE indicates that the model’s predictions are closer to the true values.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
Root Mean Squared Error (RMSE) is the square root of the MSE. It shares the same unit as the original data and is more sensitive to outliers than MAE.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
Mean Absolute Error (MAE) is the average absolute value of the difference between predicted and actual values.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
The coefficient of determination ($R^2$) indicates the model’s fit to observed values. The range of $R^2$ is (−∞, 1], where a value closer to 1 signifies a better fit to the data.
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where $y_i$ represents the true value, $\hat{y}_i$ represents the predicted value, $\bar{y}$ is the mean of the true values, and $n$ is the sample size. Lower MSE, RMSE, and MAE values, along with an $R^2$ closer to 1, indicate better model performance.
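The four indicators can be computed with a few lines of code. The sketch below uses the standard textbook definitions of MSE, RMSE, MAE, and the coefficient of determination; the function name `regression_metrics` and the sample numbers are illustrative only and are not results from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R2 for paired true/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

# Made-up numbers for illustration only; they are not results from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For these values the MSE is 0.025, the MAE is 0.15, and the coefficient of determination is 0.98.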
4.3. Machine Learning Training and Optimization
Three optimized machine learning models are employed for training and prediction:
- (1)
PSO-SVR: The particle swarm optimization (PSO) algorithm is employed to optimize the kernel function coefficients and penalty factors of support vector regression (SVR).
- (2)
PSO-XGBoost: The PSO algorithm is employed to optimize key parameters of extreme gradient boosting trees, including maximum depth and learning rate.
- (3)
PSO-BPNN: The PSO algorithm is employed to optimize the weights and biases of a single-hidden-layer backpropagation neural network with seven hidden-layer nodes.
4.3.1. PSO-SVR Training and Optimization Details
In SVR, the kernel coefficient γ, the penalty factor C, and the tolerance ε are three key model parameters. When the radial basis function (RBF) kernel is selected, a larger γ leads to higher model complexity, which makes the model more prone to overfitting and reduces its generalization ability. Conversely, a smaller γ results in lower model complexity and may fail to capture the underlying patterns in the dataset. A larger C indicates a lower tolerance for training errors outside the ε-insensitive band, placing greater emphasis on fitting every training point closely, which can also lead to overfitting. In contrast, a smaller C makes the model more tolerant of such errors and yields a smoother regression function; however, this may result in underfitting, as the model may struggle to capture complex patterns in the data. The tolerance ε denotes the width of the insensitive band; a larger ε allows more data points to fall within the band. When the band is sufficiently wide, it can encompass all sample points, but this also makes the model more prone to underfitting.
The accuracy of the SVR model also depends heavily on the choice of its parameters. To select optimal parameters for better prediction of structural responses and accurate evaluation of the overall seismic reliability of the structure, this subsection adopts PSO to tune the three key parameters of the SVR model. The coefficient of determination
R2 is used as the fitness function for the optimization. To ensure that the particle swarm can fully explore potential solutions while maintaining computational efficiency, the population size is set to 10 times the number of parameters to be optimized, resulting in 30 particles. Repeated tests showed that 50 iterations are sufficient for the fitness value to stabilize. Because the search space of the parameters is very large and the initial positions of the particles are randomly distributed, constraints must be imposed on the search ranges. After repeated trials and considering hardware limitations, the constrained search ranges for the PSO are determined and presented in
Table 7.
During the early stage of particle swarm evolution, a relatively large inertia weight and a large cognitive (individual best) learning factor are adopted to allow particles to explore the search space fully, while a smaller social (global best) learning factor is used to reduce information sharing among particles. In the later stage of evolution, the opposite strategy is employed: information sharing among particles is enhanced to move the swarm toward the global best. The inertia weight
w and the learning factors
c1 and
c2 are conventionally set to 0.4, 2, and 2, respectively. In this study, a linearly varying strategy is adopted, in which the inertia weight and learning factors change gradually with the number of iterations. The inertia weight starts at 0.9 in the early stage and decreases linearly to 0.4 as the iterations proceed. The cognitive learning factor decreases linearly from 3 to 1, while the social learning factor increases linearly from 1 to 3. The optimized model parameters are also presented in
Table 7. The results show that the coefficient of determination
R2 has improved considerably, indicating that the adoption of PSO enhances the training effectiveness of the SVR model.
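The coefficient schedule described above (inertia weight 0.9 to 0.4, cognitive factor 3 to 1, social factor 1 to 3, 30 particles, 50 iterations) can be sketched as a small PSO loop. This is a minimal illustration, not the study's implementation: `toy_loss` is a hypothetical smooth function standing in for 1 − R2 of a trained SVR, the search bounds are placeholders rather than the ranges of Table 7, and clamping positions to the bounds is an assumed way of enforcing the constrained search range.

```python
import random

def coefficient_schedule(t, n_iter):
    """Linearly varying PSO coefficients at iteration t (0-based)."""
    frac = t / (n_iter - 1)
    w = 0.9 + (0.4 - 0.9) * frac   # inertia weight: 0.9 -> 0.4
    c1 = 3.0 + (1.0 - 3.0) * frac  # cognitive factor: 3 -> 1
    c2 = 1.0 + (3.0 - 1.0) * frac  # social factor: 1 -> 3
    return w, c1, c2

def pso_minimize(objective, bounds, n_particles=30, n_iter=50, seed=0):
    """Minimize objective over box bounds; returns best point, value, history."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]

    for t in range(n_iter):
        w, c1, c2 = coefficient_schedule(t, n_iter)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep particles inside the constrained search range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history

def toy_loss(p):
    # Hypothetical smooth stand-in for 1 - R2 over three hyperparameters.
    return sum((x - 1.0) ** 2 for x in p)

bounds = [(-5.0, 5.0)] * 3  # placeholder ranges, not those of Table 7
best, best_val, history = pso_minimize(toy_loss, bounds)
print(best, best_val)
```

In an actual tuning run, the objective would train the SVR with the candidate (γ, C, ε) and return a loss derived from R2 on held-out data; that step is left out here to keep the sketch self-contained.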
4.3.2. PSO-XGBoost Training and Optimization Details
Before parameter tuning, XGBoost exhibited very good training performance on the training set, but performed poorly on the test set, indicating poor generalization ability and suggesting possible overfitting. In addition, the prediction performance on the tail data of the test set was also unsatisfactory, which would lead to increased errors in reliability calculations in subsequent analyses.
To overcome the above issues, PSO is adopted for parameter optimization in this subsection. In XGBoost, the maximum tree depth
dmax, learning rate
η, number of boosting rounds
Nb, L2 regularization term on leaf weights
λ, and minimum split loss
γ are five key parameters. Among them,
dmax and
Nb determine the complexity of the model, and
λ and
γ serve as two penalty factors to control model complexity. Since XGBoost is significantly affected by these parameters, this section optimizes these five parameters of the XGBoost model. The population size is set to 50, and the number of particle swarm iterations is 50. After multiple trials and considering hardware limitations, the constrained search ranges and the optimized parameters of the PSO-XGBoost are shown in
Table 8.
4.3.3. PSO-BPNN Training and Optimization Details
The BP neural network was implemented based on numerical simulation software, and the Levenberg–Marquardt (L-M) algorithm was used for model training. Since the neural network involves activation functions, its implementation differs slightly from the two aforementioned methods, resulting in minor differences in data partitioning and preprocessing. For this module, the dataset was divided into training set, validation set, and test set in a ratio of 7:1:2. A single-hidden-layer BP neural network was used for training. The number of hidden layer nodes was determined according to the empirical formula:
$$h = \sqrt{m + n} + a$$
where $h$ is the number of hidden layer nodes; $m$ is the number of input nodes; $n$ is the number of output nodes; and $a$ is an empirical coefficient ranging from 0 to 10. Since the data in this study have five input features and one output, the number of hidden layer nodes was selected within the range of 3 to 13. After multiple experimental trials, the number of hidden layer nodes was optimally chosen as seven. The tansig function was used as the activation function for the hidden layer neurons, while the purelin function was used for the output layer neuron.
The weights and biases are adjustable parameters in BPNN. They determine the connection strength between neurons and the activation thresholds, directly influencing the learning capacity and performance of the neural network. Similar to the previous sections, PSO was adopted to optimize the weights and biases in BPNN. The BPNN architecture is 5-7-1, with fully connected layers, resulting in a total of 50 weights and biases to be optimized. Accordingly, the population size was set to 500, and the number of particle swarm iterations was set to 50. The constrained search range for the particle swarm was set to [−10, 10].
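The node-count range and the figure of 50 tunable parameters can be checked with a short sketch, which also shows one way a flat PSO particle could be decoded into a 5-7-1 network. Several choices here are assumptions made for illustration: rounding the empirical formula upward (which reproduces the quoted 3-to-13 range), the ordering of weights and biases inside the particle, and the helper names. The hyperbolic tangent is used for the hidden layer because tansig is mathematically equivalent to tanh, and the output layer is linear, as with purelin.

```python
import math

def hidden_node_range(n_in, n_out, a_min=0, a_max=10):
    """Range of hidden-node counts from h = sqrt(m + n) + a.

    Assumption: non-integer values are rounded up, which reproduces the
    3-to-13 range quoted in the text for five inputs and one output.
    """
    base = math.sqrt(n_in + n_out)
    return math.ceil(base + a_min), math.ceil(base + a_max)

def count_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected single-hidden-layer network."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def forward(particle, x, n_in=5, n_hidden=7):
    """Evaluate a 5-7-1 network whose parameters are one flat PSO particle.

    Hypothetical layout: hidden weights, hidden biases, output weights,
    output bias. tanh plays the role of tansig; the output is linear.
    """
    k = n_in * n_hidden
    w1 = [particle[j * n_in:(j + 1) * n_in] for j in range(n_hidden)]
    b1 = particle[k:k + n_hidden]
    w2 = particle[k + n_hidden:k + 2 * n_hidden]
    b2 = particle[k + 2 * n_hidden]
    hidden = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
              for j in range(n_hidden)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

n_params = count_parameters(5, 7, 1)
print(hidden_node_range(5, 1), n_params, 10 * n_params)
```

The count is 5 × 7 + 7 + 7 × 1 + 1 = 50, and ten particles per parameter gives the population of 500 stated above.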
It should be noted that due to implementation constraints, the initial gradient and initial damping parameters of the L-M algorithm cannot be adjusted during training. Therefore, PSO is used to obtain locally optimal solutions under different initial gradients and initial damping parameters. The results in the next section will also show that, when BPNN itself can already achieve good predictive performance, applying PSO to optimize its weights and biases results in only a marginal improvement compared to the original BPNN, but it helps ensure the stability of BPNN.
It is acknowledged that PSO-based hyperparameter optimization adds a non-negligible one-time offline computational cost. However, as noted by Afshari et al. [
12] in their comprehensive review of ML-based structural reliability methods, this offline investment is justified when the surrogate model is subsequently used for a large number of online evaluations. In the present framework, the direct MCS benchmark requires $10^5$ nonlinear pushover analyses, whereas the ML-assisted approach reduces the required number of finite element runs to $10^4$ training samples. Once trained, the surrogate evaluates $10^7$ Monte Carlo samples at negligible computational cost (less than $10^{-4}$ s per prediction). The offline optimization cost is therefore amortized over the substantial reduction in online finite element evaluations, making the overall framework cost-effective for the global seismic reliability assessment of multi-story steel frames.
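As a rough worked check of the figures quoted above, the reduction in finite element runs and an upper bound on the surrogate evaluation time follow directly. The symbols are introduced here for illustration only, and the time is a bound rather than a measurement because the per-prediction cost is stated only as an upper limit; the one-time PSO training cost is not included.

```latex
\frac{N_{\mathrm{FE,\,direct}}}{N_{\mathrm{FE,\,ML}}} = \frac{10^{5}}{10^{4}} = 10,
\qquad
t_{\mathrm{surrogate}} < 10^{7} \times 10^{-4}\,\mathrm{s} = 10^{3}\,\mathrm{s} \approx 16.7\ \mathrm{min}.
```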
4.4. Machine Learning Predictive Results and Reliability Analysis
Overall, all three models perform well in predicting the maximum IDR, with coefficients of determination R2 above 0.987. Among them, the BPNN demonstrates the highest stability and accuracy, achieving an R2 of 0.991 on the test set.
The trained model serves as a surrogate model, with MCS applied for reliability assessment. The reliability analysis procedure is divided into two parts: an “offline training stage” and an “online reliability calculation stage” (see
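Figure 15). The left side shows the sensitivity screening and surrogate model training process, while the right side shows the process of performing large-scale MCS and failure probability calculation using the trained surrogate model. The procedure on the right side comprises: (1) generating $10^7$ input feature sets via LHS; (2) predicting maximum IDRs using machine learning models; (3) calculating indicator function values based on the limit state function $g$; (4) estimating the failure probability as the fraction of samples that fall in the failure domain:
$$P_f \approx \frac{1}{N}\sum_{i=1}^{N} I\left[g(\mathbf{x}_i)\right]$$
where $N$ is the number of samples and $I[\cdot]$ is the indicator function, equal to 1 when the limit state function indicates failure and 0 otherwise.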
The reliability indices derived from the machine learning–MCS method are compared with the direct MCS (considered as the exact solution) and the MLE method described in
Section 3.4, with results summarized in
Table 12,
Table 13 and
Table 14.
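The online stage described above can be sketched in a few lines. This is an illustrative outline under explicit assumptions, not the study's code: failure is taken to mean that the predicted response exceeds the limit (limit state g = limit − response), the reliability index is obtained from the standard relation β = −Φ⁻¹(P_f), plain random sampling with a reduced sample count replaces LHS with 10^7 samples, and `toy_surrogate` and `toy_sampler` are hypothetical placeholders chosen so that the exact answer is known.

```python
import random
from statistics import NormalDist

def surrogate_mcs(surrogate, sampler, limit, n_samples, rng):
    """Estimate P_f and the reliability index with a surrogate model.

    Assumption: failure means the predicted response exceeds the limit,
    i.e. the limit state function is g = limit - response.
    """
    failures = 0
    for _ in range(n_samples):
        x = sampler(rng)
        g = limit - surrogate(x)
        failures += 1 if g < 0.0 else 0   # indicator function
    p_f = failures / n_samples
    # beta = -Phi^{-1}(P_f); undefined if no (or only) failures were observed.
    beta = -NormalDist().inv_cdf(p_f) if 0.0 < p_f < 1.0 else float("nan")
    return p_f, beta

def toy_sampler(rng):
    # Hypothetical one-dimensional standard normal input.
    return rng.gauss(0.0, 1.0)

def toy_surrogate(x):
    # Hypothetical surrogate: the response equals the input, so the
    # exact answer for limit = 2 is beta = 2.
    return x

p_f, beta = surrogate_mcs(toy_surrogate, toy_sampler, limit=2.0,
                          n_samples=200_000, rng=random.Random(1))
print(f"P_f = {p_f:.5f}, beta = {beta:.3f}")
```

Because each surrogate call is cheap, the sample count can be raised until the statistical error of the estimated failure probability is small relative to the probability itself, which is the main motivation for replacing the finite element model in this stage.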
It can be seen from the tables that the probability of the structural response exceeding a larger IDR limit is smaller, and the corresponding reliability index is larger. At lower reliability levels, the error of PSO-XGBoost-MCS is larger than that of the MLE method; however, at higher reliability levels, it achieves more accurate results than the MLE method. Based on reasonable extrapolation from the results under the 1/70 to 1/55 limits, the reliability indices of the steel frame under the code-required IDR limit of 1/50, as obtained with the three trained models, all satisfy the requirements of the Chinese code. BPNN-MCS achieves the highest accuracy, with relative errors of only 0.5311% to 1.0259% in the reliability indices under the various IDR limits. PSO-SVR-MCS and PSO-XGBoost-MCS also demonstrate high precision, with maximum relative errors of 2.7404% and 4.3578%, respectively. At higher reliability levels, all machine learning methods significantly outperform the MLE method based on the Gumbel distribution assumption, which exhibits a relative error of 11.9383% at the 1/55 IDR limit. The analysis of the nine-story steel frame structure highlights the advantages of machine learning in handling complex and unknown response distributions.
It should be acknowledged that the present comparison between ML-MCS and direct MCS focuses primarily on the accuracy of the resulting reliability indices, without a detailed quantification of the statistical uncertainty inherent in the direct MCS reference values or a formal comparison of computational costs. The direct MCS results reported in
Table 12,
Table 13 and
Table 14 serve as the best available benchmark under the current computational budget. A systematic assessment of computational efficiency gains, which encompass wall-clock time, the number of finite element evaluations saved, and overall speedup factors, is deferred to future work, where it will be integrated into a more comprehensive performance evaluation framework.
4.5. Impact of Sample Size on Model Training
To explore the data efficiency, the effects of sample size (500, 1000, 5000, 10,000) on model training and the accuracy of final reliability calculation are studied, as shown in
Figure 16,
Figure 17 and
Figure 18. Specifically,
Figure 16 shows the coefficient of determination
R2 for the training set across different sample sizes,
Figure 17 presents the
R2 for the test set across different sample sizes, and
Figure 18 displays the relative error of the reliability indices obtained from the three ML-MCS methods under varying sample sizes.
For the nine-story steel frame structure, with the increase in sample size, the training performance (R2) and reliability calculation accuracy of all models steadily improve. When the sample size is larger than 1000, the relative error of the reliability indices calculated by the three models can be controlled within 10%; when the sample size reaches 10,000, the error can be further reduced to within 5%. PSO-BPNN demonstrates the strongest robustness across various sample sizes, maintaining acceptable predictive performance even with just 500 samples. In contrast, PSO-XGBoost shows relatively unstable performance with small sample sizes (<1000).
The 1000-sample threshold can be understood from a simple perspective: with five input features, 1000 samples give roughly 200 data points per feature, which is about the minimum needed for a tree-based model to reliably tell meaningful patterns apart from random noise. When the sample size drops much below this, PSO-XGBoost tends to make splitting decisions based on noise rather than genuine trends, which explains the unstable performance seen in
Figure 16,
Figure 17 and
Figure 18. PSO-BPNN, on the other hand, learns a smooth overall relationship rather than making hard local splits, so it is naturally less affected by having fewer data points. That said, this threshold of 1000 should not be taken as a universal rule. If the structure has more uncertain parameters, for instance, a taller frame or one with randomized connection behavior and damping, more training samples would likely be needed. The present finding is best understood as a rough lower bound for regular low-to-mid-rise steel frames with a similar number of input variables.
5. Conclusions
The maximum IDR of the designed nine-story steel frame structure does not conform to any common distribution form. Traditional parameter estimation and hypothesis testing methods therefore result in significant errors in reliability calculations, necessitating more accurate and efficient calculation methods such as machine learning. Accordingly, this research investigates the key engineering demand parameter (EDP), model performance, and sample size sensitivity of machine learning-based seismic reliability analysis methods for structures, and carries out the global reliability analysis of the structure using machine learning methods. This process also constitutes an important step in the global reliability-based design of structures. The following conclusions are drawn:
- (1)
The global sensitivity analysis identifies the steel elastic modulus, steel beam yield strength, beam-column section height, beam-column flange width and beam-column flange thickness as the key parameters affecting the maximum IDR of steel frame structures. Based on these parameters, the machine learning model can accurately predict the structural response.
- (2)
Among the three machine learning models, PSO-BPNN demonstrates the highest accuracy and stability, with a maximum error of 1.0259%. PSO-SVR ranks second, with a maximum error of 2.7388%, while PSO-XGBoost records the highest error at 4.3578%. The particle swarm optimization algorithm effectively enhances model accuracy and stability and helps prevent overfitting.
- (3)
The sample size has a significant effect on the predictive performance of the models. When the sample size is not less than 1000, the relative error of the reliability indices can be kept within 10%. The predictive ability of the PSO-BPNN model is least affected by the sample size.
- (4)
The ML-based reliability framework, by avoiding distributional assumptions, achieves substantially smaller deviations from direct MCS than the conventional MLE approach (maximum error 1.0259% vs. 11.9383%), confirming its suitability when structural responses deviate from standard distributions.
It should be noted that this study provides insights for addressing the issue of random parameter characterization that does not conform to conventional distributions for multi-story steel frame structures. However, several limitations should be acknowledged. First, the conclusions may not be applicable to other structural forms where the random parameters follow conventional distributions. Second, this study does not consider other steel frame systems with different layouts or irregularities. Third, the comparison between machine learning-based and direct MCS methods is conducted primarily in terms of accuracy; the statistical uncertainty of the direct MCS reference values and the quantitative computational efficiency gains, including runtime, number of finite element evaluations saved, and overall speedup factors, are not systematically reported. These limitations will be addressed in future research, where a formal convergence analysis of direct MCS and a detailed cost–benefit assessment of surrogate-assisted reliability analysis will be undertaken.