A Novel Deep Hybrid Learning Framework for Structural Reliability Under Civil and Mechanical Constraints

Qasim Aljamal; Mahmoud AlJamal; Mohammad Q. Al-Jamal; Zaid Jawasreh; Ayoub Alsarhan; Sami Aziz Alshammari; Nayef H. Alshammari; Rahaf R. Alshammari

doi:10.3390/math13233834

¹ Department of Civil Engineering, Faculty of Architecture and Civil Engineering, Technical University Dortmund, 44227 Dortmund, Germany

² Department of Cybersecurity, Science and Information Technology, Irbid National University, Irbid 21110, Jordan

³ Department of Renewable Energy, Technical Faculty, Jadara University, Irbid 21110, Jordan

⁴ Department of Artificial Intelligence, Science and Information Technology, Irbid National University, Irbid 21110, Jordan

Mathematics 2025, 13(23), 3834; https://doi.org/10.3390/math13233834

This article belongs to the Special Issue Evolutionary Computation for Feature Selection and Dimensionality Reduction


Abstract

This study presents an AI-based framework that unifies civil and mechanical engineering principles to optimize the structural performance of steel frameworks. Unlike traditional methods that analyze material behavior, load-bearing capacity, and dynamic response separately, the proposed model integrates these factors into a single hybrid feature space combining material properties, geometric descriptors, and load-response characteristics. A deep learning model enhanced with physics-informed reliability constraints is developed to predict both safety states and optimal design configurations. Using AISC steel datasets and experimental records, the framework achieves 99.91% accuracy in distinguishing safe from unsafe designs, with mean absolute errors below 0.05 and percentage errors under 2% for reliability and load-bearing predictions. The system also demonstrates high computational efficiency, achieving inference latency below 3 ms, which supports real-time deployment in design and monitoring environments. The proposed framework provides a scalable, interpretable, and code-compliant approach for optimizing steel structures, advancing data-driven reliability assessment in both civil and mechanical engineering.

Keywords:

AI-assisted structural optimization; physics-informed machine learning; structural reliability assessment; hybrid civil–mechanical framework; steel material properties

MSC:

68T05; 68T20; 68T37

1. Introduction

The rapid advancement of artificial intelligence (AI) and machine learning (ML) has transformed numerous engineering domains, including structural analysis, design optimization, and reliability assessment [1]. In particular, the integration of data-driven models with domain-specific physical knowledge offers the potential to enhance predictive accuracy, improve computational efficiency, and reduce the risks associated with structural failures [2]. Civil and mechanical engineering systems are increasingly expected to satisfy stringent safety standards while also meeting demands for sustainability, cost-efficiency, and resilience [3]. Achieving this balance requires methodologies that not only leverage high-capacity neural networks but also ensure strict compliance with engineering codes and limit states [4].

Traditional computational models, such as finite element simulations and physics-only approaches, have provided reliable insights into structural behavior for decades [5]. These physics-based methods, however, are computationally expensive and lack scalability for real-time or large-scale monitoring tasks. Purely data-driven AI models, while efficient and adaptive, often lack physical interpretability and risk producing noncompliant predictions when applied to safety-critical systems [6]. This dichotomy highlights the necessity of hybrid approaches that combine the predictive power of deep learning with the interpretability and rigor of physics-informed modeling [7].

Recent research has begun to explore such hybrid paradigms, where physics-informed neural networks (PINNs) and code-constrained optimization frameworks have demonstrated promising results in bridging the gap between data-driven learning and structural code compliance [8]. These approaches embed structural knowledge, such as load-bearing limits and material constraints, directly into the learning process, thereby reducing reliance on massive labeled datasets while ensuring compliance with established design standards [9]. Despite these advances, significant challenges remain in terms of scalability, generalization to diverse structural conditions, and the ability to adapt models dynamically under varying environmental and operational scenarios [10]. Moreover, existing studies have rarely addressed the verification of model performance under edge-computing constraints or validated the physical consistency of AI-driven predictions in accordance with engineering codes.

Another critical challenge arises in the optimization and reliability evaluation of AI-based structural models [11]. The performance of deep learning systems is highly sensitive to hyperparameter settings, requiring robust strategies for parameter tuning to achieve optimal outcomes [12]. While grid search and random search are widely used, they are often inefficient in high-dimensional spaces [13]. Bayesian optimization has emerged as a powerful alternative, offering efficient exploration of parameter configurations and balancing accuracy with computational cost [14]. Yet, the integration of Bayesian optimization with physics-informed modeling remains underexplored in structural applications [15]. This gap underscores the need for frameworks capable of coupling intelligent hyperparameter tuning with physics-informed architectures to achieve both predictive robustness and engineering reliability.

Computational efficiency plays a pivotal role in enabling real-world deployment of AI-assisted structural systems [16]. Edge devices and cloud-based infrastructures that support smart cities and industrial applications impose constraints on latency, memory usage, and throughput. Therefore, any proposed framework must be designed to meet performance requirements in resource-constrained environments while maintaining reliability and compliance with engineering codes [17]. In this context, keeping the model both lightweight and interpretable during real-time operation is a crucial requirement, which this study addresses through quantization-aware optimization and edge-deployment validation.

The contributions of this paper are threefold:

We propose a hybrid AI-assisted optimization framework that integrates a deep neural network with a physics-informed module, enforcing code-based structural constraints and limit state functions during training.
We introduce a Bayesian hyperparameter optimization loop that adaptively tunes model configurations, enhancing predictive performance while minimizing computational cost.
We evaluate the framework through comprehensive validation and reliability metrics, including statistical, regression, and computational efficiency measures, ensuring scalability and deployment feasibility for real-time structural monitoring systems.

Table 1 highlights the fundamental differences between purely physical, purely AI-based, and hybrid modeling paradigms in structural engineering. Physical models such as finite element methods provide high interpretability and compliance with design codes but are computationally demanding and less scalable. Pure AI models deliver rapid predictions and adaptability yet often lack physical consistency and may produce noncompliant outcomes. The proposed hybrid AI–physics framework combines the strengths of both by embedding code-based constraints and limit-state equations into the learning process.

Table 1. Comparison between Physical, Pure AI, and Proposed Hybrid Modeling Approaches in Structural Analysis.

2. Literature Review

Lei et al. (2025) [22] proposed a hybrid machine learning model for predicting the shear strength of rock joints by integrating a multilayer perceptron (MLP) with the slime mold algorithm (SMA). Their SMA-MLP framework leveraged SMA’s global optimization capabilities to avoid convergence to local minima during MLP training, thereby improving predictive stability and accuracy. A dataset of 84 samples, incorporating joint roughness coefficient (JRC), normal stress, basic friction angle, Young’s modulus, and uniaxial compressive strength, was used for evaluation through five-fold cross-validation. The SMA-MLP model achieved $R^{2} = 0.9687$, RMSE = 0.097 MPa, and MAE = 0.067 MPa on the test set, outperforming baseline models such as MLP, CatBoost, random forest, ridge regression, and backpropagation neural networks. Feature importance analysis identified normal stress as the dominant factor influencing shear strength, while Z-score normalization enhanced generalization performance. Despite these contributions, the study focused exclusively on shear strength prediction without addressing regression-based reliability measures such as probability of failure, reliability index, or ultimate capacity, and it did not incorporate physics-informed constraints or validate real-time deployment. Our work extends these insights by introducing physical constraint embedding to enforce equilibrium consistency and material compliance within the learning process, and by integrating edge deployment verification to ensure robustness and inference stability under quantized, real-time conditions. Specifically, the developed dual-head hybrid framework performs both classification and regression, embeds physics-informed penalties, employs Bayesian optimization, and validates quantized deployment for code-compliant structural reliability assessment.

He et al. (2022) [23] introduced a hybrid deep learning framework for structural damage identification that integrates ensemble empirical mode decomposition (EEMD), Pearson correlation coefficient (PCC), and convolutional neural networks (CNN). Their method first decomposed acceleration signals using EEMD to obtain time–frequency components, then applied PCC to select the most informative features, which were subsequently fed into a CNN for damage classification. Evaluations on a three-story benchmark structure under four damage conditions demonstrated that the proposed EEMD–PCC–CNN achieved 94.02% accuracy, with precision, recall, and F1-scores above 92%, outperforming classical approaches including CNN, SVM, KNN, RF, and XGBoost. The results confirmed the superiority of combining time–frequency decomposition with deep feature extraction for structural health monitoring. However, this study focused solely on classification for damage identification, without addressing regression-oriented reliability metrics such as probability of failure, reliability index, or ultimate capacity. It also did not incorporate physics-informed constraints or validate real-time deployment feasibility. Our work extends this direction by embedding physical constraint enforcement into the learning process to ensure equilibrium and material compliance during model optimization, and by implementing edge deployment verification for evaluating inference robustness and latency under quantized operational conditions. Specifically, the proposed dual-head deep hybrid learning framework performs both classification and regression under physics-informed penalty embedding, employs Bayesian hyperparameter optimization, and validates quantized edge deployment through constraint-aware training for code-compliant structural reliability assessment.

Maryoosh et al. (2025) [24] proposed a hybrid learning framework for bridge damage prediction that integrates handcrafted and deep learning techniques. Their approach combined local binary pattern (LBP) and bag-of-visual-words (BoVW) for feature extraction, Apriori-based association rule mining for feature selection and dimensionality reduction, and MobileNetV3-Large as the final classifier. Experiments on multiple benchmark datasets, including DIMEC-Crack and Bridge Concrete Damage (BCD), with 10-fold cross-validation achieved classification accuracies ranging from 98.27% to 100%, with precision, recall, and F1-scores near 99% and error rates below 1.73%. The framework demonstrated robustness and outperformed conventional CNNs and transfer learning models such as VGG, ResNet, and Inception. However, this work was limited to crack classification and did not address regression-oriented reliability measures such as probability of failure, reliability index, or ultimate capacity. It also lacked physics-informed constraints and validation for real-time deployment on edge hardware. Our study extends these contributions by embedding physical constraint enforcement within the hybrid learning pipeline to ensure compliance with equilibrium and constitutive consistency, and by implementing edge deployment verification to evaluate inference robustness and latency on quantized embedded devices. Specifically, the proposed dual-head hybrid framework performs both classification and regression, integrates physics-informed penalties, applies Bayesian optimization for hyperparameter tuning, and validates quantized deployment under edge conditions for code-compliant structural reliability analysis.

Dang et al. (2020) [25] developed a feature-fusion and hybrid deep learning framework for structural health monitoring (SHM) that integrates multiple signal processing techniques with deep neural networks. Their method extracted features from autoregressive (AR) models, discrete wavelet transform (DWT), and empirical mode decomposition (EMD), fusing them into three-dimensional tensors subsequently processed by a 1D-CNN–LSTM hybrid network. The CNN layers captured local temporal dependencies, while LSTM cells modeled long-term patterns in vibration responses. Case studies included experimental data from a three-story benchmark structure, a synthetic cable-stayed bridge model (My Thuan), and real-world progressive damage tests on the Z24 bridge, where the approach achieved accuracies between 91% and 93.5% for damage detection and localization. Compared to 2D-CNNs operating on spectrograms, the framework maintained competitive accuracy with more than 50% lower computational and storage cost, and robustness was confirmed under 10% added white noise. However, this work was limited to damage detection and localization without addressing regression-oriented reliability measures such as probability of failure, reliability index, or load capacity, and did not embed physics-informed constraints or validate edge deployability. Our framework advances this direction by incorporating physical constraint embedding within the network’s loss structure to ensure equilibrium consistency and mechanical plausibility, and by performing edge deployment verification to assess quantized inference stability, latency, and robustness under real-time operational conditions. Specifically, the proposed dual-head hybrid framework performs both classification and regression, embeds physics-informed penalties, applies Bayesian hyperparameter optimization, and validates quantized deployment for code-compliant structural reliability assessment.

Ly et al. (2020) [26] presented computational hybrid machine learning models for predicting the ultimate shear capacity (USC) of steel fiber reinforced concrete (SFRC) beams. Using a dataset of 463 experimental samples covering beam geometry, mixture composition, and fiber properties, they developed two hybrid approaches: a neural network optimized with a real-coded genetic algorithm (NN-RCGA) and one optimized with the firefly algorithm (NN-FFA). The NN-RCGA achieved superior accuracy ($R = 0.9771$) compared to NN-FFA ($R = 0.965$) and traditional empirical equations ($R$ between 0.5274 and 0.9075), while reducing RMSE and MAE by over 70% in some cases. Sensitivity analysis identified web width, effective depth, and clear depth ratio as the most influential parameters in shear capacity prediction, and the framework enabled predictions in less than one second per beam. However, this study was limited to deterministic shear capacity estimation without addressing probabilistic reliability measures such as probability of failure or reliability index, and it lacked physics-informed penalties and real-time edge validation. Our framework advances this direction by embedding physical constraint enforcement directly into the loss formulation to ensure mechanical equilibrium and constitutive consistency, while introducing edge deployment verification to evaluate quantized inference stability and latency across embedded devices. Specifically, the proposed dual-head hybrid framework performs both classification and regression, incorporates physics-informed penalty embedding, employs Bayesian optimization for hyperparameter tuning, and validates quantized deployment under edge hardware constraints for code-compliant structural reliability assessment.

3. Methodology

The methodology of this study is designed to integrate structural engineering knowledge with advanced AI-driven optimization to ensure both predictive accuracy and compliance with established civil and mechanical engineering codes. As illustrated in Figure 1, the process begins with input processing and ingestion, where dataset artifacts undergo preprocessing, normalization, augmentation, and noise injection to form robust and representative feature vectors. These feature vectors are then passed into the proposed AI-assisted optimization framework, which employs a hybrid deep neural network comprising a classification head for structural reliability states and a regression head for capacity prediction. A physics-informed module embedded within the network enforces structural limit-state functions and code-based constraints (AISC, Eurocode), so that the loss function is jointly regularized by data-driven and physics-based penalties. Hyperparameter tuning is guided by Bayesian optimization in a feedback loop that iteratively updates the model configuration. Validation and reliability evaluation are conducted through cross-validation and multiple statistical metrics (Accuracy, Precision, Recall, F1, AUC, MAE, RMSE), and compliance with safety thresholds is checked before deployment. Computational efficiency, measured in terms of latency, throughput, and memory usage, is assessed to verify the feasibility of real-time deployment in edge or cloud-assisted monitoring systems.

Figure 1. Proposed Methodology.

3.1. Dataset Used

The dataset used in this study is the American Institute of Steel Construction Material Property Database (AISC-MPD, Version 16.0), officially released by the American Institute of Steel Construction (AISC) and publicly accessible through the AISC online repository. This dataset provides experimentally measured mechanical properties of multiple steel grades commonly used in civil and mechanical engineering applications. It includes yield strength ($f_{y}$), ultimate tensile strength ($f_{u}$), modulus of elasticity ($E$), ductility, and complete stress–strain curves, along with geometric descriptors such as cross-sectional dimensions. Since the data are derived from standardized laboratory tests following ASTM A370 and AISC protocols, they offer measurement consistency and high fidelity. The database comprises approximately 2480 samples covering more than 45 structural steel alloys, including ASTM A36, A992, and A572 grades, representing a wide range of mechanical and geometric variability suitable for data-driven structural analysis.

For the ML framework, the dataset was partitioned into 70% for training, 15% for validation, and 15% for testing, ensuring stratified sampling across steel grades to maintain proportional representation of material diversity. Prior to modeling, all numeric features were standardized using z-score normalization, while categorical alloy identifiers were encoded using one-hot representation. The diversity of steel grades and geometric variability makes the dataset highly appropriate for AI-driven structural optimization tasks [27]. Table 2 provides a concise overview of the AISC Steel Material Property Database (AISC-MPD).
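A minimal sketch of the partitioning and scaling described above, assuming NumPy, a per-sample array of steel-grade labels, and numeric features already stored as a 2-D array. The function names and the synthetic grades and values in the usage lines are hypothetical. The split is stratified by grade, and the z-score statistics are estimated on the training subset only, consistent with the leakage-safe preprocessing of Algorithm 1.

```python
import numpy as np


def stratified_split(groups, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return train/val/test index arrays, stratified by group label (steel grade)."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for g in np.unique(groups):
        idx = rng.permutation(np.flatnonzero(groups == g))  # shuffle within the grade
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])  # remainder goes to the test split
    return np.array(train), np.array(val), np.array(test)


def zscore_fit(X_train):
    """Mean and standard deviation estimated on the training rows only."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    return mu, sd


def one_hot(labels, categories):
    """One indicator column per alloy identifier."""
    return (labels[:, None] == categories[None, :]).astype(float)


# Hypothetical example: 3 grades x 20 samples, features (f_y, f_u, E) in MPa.
rng = np.random.default_rng(42)
grades = np.repeat(np.array(["A36", "A572", "A992"]), 20)
X = rng.normal([250.0, 400.0, 200e3], [20.0, 30.0, 5e3], size=(60, 3))

tr, va, te = stratified_split(grades)
mu, sd = zscore_fit(X[tr])
X_std = (X - mu) / sd  # training statistics applied to all splits
X_model = np.hstack([X_std, one_hot(grades, np.unique(grades))])
```

With 20 samples per grade, each grade contributes 14, 3, and 3 samples to the three splits, so every grade keeps the same proportion in each subset.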

Table 2. Summary of AISC Steel Material Property Database (AISC-MPD).

3.2. Feature Representation and Engineering

The way mechanical and civil properties are represented and transformed into useful features has a significant impact on the applicability of AI-based structural optimization. From a civil engineering perspective, both material and geometric properties strongly influence structural capacity. Important parameters include the yield strength ($f_{y}$), ultimate tensile strength ($f_{u}$), elastic modulus ($E$), and ductility, in addition to geometric indices such as cross-sectional dimensions, slenderness ratios, and moment of inertia. These characteristics serve as the foundation for feature representation because they encapsulate the fundamental stiffness, strength, and stability of steel structures.

From a mechanical engineering perspective, time-dependent features provide supplementary information on material and structural performance under realistic loading scenarios. To capture nonlinear and time-dependent behavior, these features incorporate stress–strain histories, cyclic fatigue response, and vibration-induced dynamic properties. Including them is expected to improve predictive accuracy for optimization problems, because the AI model is no longer restricted to static descriptors and instead learns from the dynamic performance of steel frameworks (such as progressive damage accumulation).

To provide additional physics-informed insights, engineered indicators are derived from the raw parameters. The most critical among them are the stress ratio ($R = \sigma / f_{y}$), which quantifies proximity to yielding; the strain energy density ($U$), which reflects ductility and toughness; and the demand-to-capacity ratio (DCR), which assesses safety compliance under applied loading. These derived indicators are physically interpretable and improve the robustness of the AI model by directly encoding structural safety and reliability constraints into the feature set.

When handling high-dimensional signals such as full stress-strain curves or vibration time histories, dimensionality reduction techniques are applied to avoid redundancy and improve computational efficiency. Principal Component Analysis (PCA) or learned embeddings compress correlated features into compact components while preserving the majority of variance in the data. This produces a hybrid feature space that integrates civil and mechanical insights, raw measurements, and engineered reliability indicators, ultimately enabling the framework to achieve both predictive accuracy and interpretability.

Stress ratio:

$$R = \frac{\sigma}{f_{y}} \tag{1}$$

Strain energy density:

$$U = \int_{0}^{\varepsilon} \sigma \, d\varepsilon \tag{2}$$

Demand-to-capacity ratio:

$$\mathrm{DCR} = \frac{D}{C} \tag{3}$$
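Equations (1) to (3) can be evaluated directly from a sampled stress–strain record. The sketch below assumes NumPy, stresses in MPa, and dimensionless strains (so $U$ is in MPa, equivalently MJ/m³). It follows Algorithm 1 in using the peak stress for $R$ and the trapezoidal rule for $U$. The function names and the elastic example values are hypothetical.

```python
import numpy as np


def stress_ratio(sigma, f_y):
    """Eq. (1), using the peak stress of the record as in Algorithm 1."""
    return float(np.max(sigma) / f_y)


def strain_energy_density(sigma, eps):
    """Eq. (2): area under the sigma-eps curve by the trapezoidal rule."""
    sigma = np.asarray(sigma, dtype=float)
    eps = np.asarray(eps, dtype=float)
    return float(np.sum(0.5 * (sigma[1:] + sigma[:-1]) * np.diff(eps)))


def demand_capacity_ratio(demand, capacity):
    """Eq. (3): values above 1 indicate demand exceeding capacity."""
    return demand / capacity


# Hypothetical linear-elastic loading up to first yield.
E, f_y = 200_000.0, 250.0            # MPa
eps = np.linspace(0.0, f_y / E, 11)  # 0 .. eps_y = 0.00125
sigma = E * eps

R = stress_ratio(sigma, f_y)               # about 1.0 at first yield
U = strain_energy_density(sigma, eps)      # about 0.5 * f_y * eps_y = 0.15625 MPa
DCR = demand_capacity_ratio(180.0, 240.0)  # e.g. moments in kN·m -> 0.75
```

For a linear-elastic path the trapezoidal rule is exact, so $U$ reduces to the familiar triangle area $\tfrac{1}{2} f_{y} \varepsilon_{y}$; for measured nonlinear curves a denser strain grid improves the approximation.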

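PCA transformation for high-dimensional features:

$$Z = W^{T} (X - \mu) \tag{4}$$

where $X$ is the input feature matrix (one sample per column in this notation), $\mu$ the mean vector estimated on the training data, and $W$ the matrix whose columns are the leading eigenvectors of the training covariance matrix.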
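A minimal NumPy sketch of Eq. (4) combined with the variance criterion $\eta$ from step 29 of Algorithm 1. The mean and eigenvectors are estimated on training rows only, and $r$ is the smallest number of components whose cumulative explained variance reaches $\eta$. Because samples are stored as rows here, the projection is written $(X - \mu)W$, the row-wise equivalent of $W^{T}(x - \mu)$. Function names and the synthetic data are illustrative assumptions.

```python
import numpy as np


def fit_pca(X_train, eta=0.95):
    """Fit PCA on training rows; keep the smallest r with cumulative variance >= eta."""
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / np.sum(eigvals)
    r = min(int(np.searchsorted(explained, eta)) + 1, len(eigvals))
    return mu, eigvecs[:, :r], r


def pca_transform(X, mu, W):
    """Row-wise form of Eq. (4): each row x maps to W^T (x - mu)."""
    return (X - mu) @ W


# Synthetic, nearly rank-1 data: three correlated channels plus small noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.outer(t, [1.0, 2.0, 3.0]) + 0.01 * rng.normal(size=(200, 3))
mu, W, r = fit_pca(X[:140], eta=0.95)  # fit on the first 140 rows ("train")
Z = pca_transform(X, mu, W)            # apply to all rows
```

Since the three channels are almost perfectly correlated, a single component captures nearly all of the variance and `r` is 1.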

Table 3 summarizes the hybrid feature representation. Civil and mechanical features provide foundational structural descriptors, engineered indicators encode physics-informed safety criteria, and PCA reduces complexity while retaining essential information. This integrated representation ensures that the AI framework achieves high predictive accuracy while maintaining interpretability and compliance with design standards.

Table 3. Feature Representation for AI-Based Structural Optimization.

Algorithm 1 begins with the raw dataset $D$ containing material, geometric, and mechanical attributes such as yield strength $f_{y}$, ultimate strength $f_{u}$, elasticity modulus $E$, geometry $g$, load histories $L(t)$, and stress–strain responses $(\sigma(t), \varepsilon(t))$. Each variable undergoes preprocessing for unit consistency, missing-value imputation, and standardized scaling, with all statistics computed only from the training subset to avoid data leakage. Time-series signals are segmented into short windows, from which frequency-domain and time-domain features (power spectrum, dominant frequency, RMS, and crest factor) are extracted to capture dynamic behavior.
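The windowing and spectral steps (the Window and FFTblock functions of Algorithm 1) might look as follows in NumPy. This sketch assumes a uniformly sampled signal with sampling rate `fs`, removes the window mean before the FFT so that the DC component does not dominate the dominant-frequency estimate (a choice not stated in the text), and normalizes the power spectrum to obtain the spectral entropy $H_{s}$. Names and the test signal are hypothetical.

```python
import numpy as np


def windows(x, W, H):
    """Split a 1-D signal into windows of length W with hop H (partial tail dropped)."""
    return [x[s:s + W] for s in range(0, len(x) - W + 1, H)]


def fft_block(w, fs):
    """Dominant frequency, spectral entropy, RMS, and crest factor of one window."""
    w = np.asarray(w, dtype=float)
    power = np.abs(np.fft.rfft(w - w.mean())) ** 2  # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(w), d=1.0 / fs)
    total = power.sum()
    p = power / total if total > 0 else np.zeros_like(power)
    nz = p > 0
    entropy = float(-np.sum(p[nz] * np.log(p[nz])))
    rms = float(np.sqrt(np.mean(w ** 2)))
    crest = float(np.max(np.abs(w)) / rms) if rms > 0 else 0.0
    return {"f_dom": float(freqs[np.argmax(power)]), "H_s": entropy,
            "rms": rms, "crest": crest}


# Hypothetical signal: 5 Hz sine, amplitude 2, sampled at 100 Hz for 4 s.
fs = 100.0
t = np.arange(400) / fs
x = 2.0 * np.sin(2 * np.pi * 5.0 * t)
features = [fft_block(w, fs) for w in windows(x, W=100, H=50)]
```

For this pure sine, every 1 s window reports a dominant frequency of 5 Hz, an RMS and crest factor of about $\sqrt{2}$, and a spectral entropy near zero, since all power sits in a single frequency bin.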

Algorithm 1 Physics-Informed Feature Engineering (Expanded, Leakage-Safe)
Require: Dataset $D$ with $(f_{y}, f_{u}, E, g, L (t), σ (t), ε (t))$ ; split $(I_{train}, I_{val})$
Ensure: Final features F, metadata $meta$
Hyperparams: quantiles $[q_{ℓ}, q_{u}]$, variance floor $v_{min}$, MI threshold $I_{min}$, VIF limit $τ$, PCA variance $η$, window W, hop H
1: function Preprocess(x)
2: $x \leftarrow$ unit_convert(x)	▹ to SI
3: $x \leftarrow$ impute(x; policy=`MEDIAN`)	▹ fit on train only
4: $a, b \leftarrow Q_{q_{ℓ}} (x_{I_{train}}), Q_{q_{u}} (x_{I_{train}})$ ; $x \leftarrow min (max (x, a), b)$
5: $μ, σ \leftarrow mean (x_{I_{train}}), std (x_{I_{train}})$ ; $\hat{x} \leftarrow (x - μ) / σ$
6: return $\hat{x}, (a, b, μ, σ)$
Global hygiene and stats (fit on train, apply to all)
7: for channel $x \in D$ do
8: $\hat{x}, θ_{x} \leftarrow$ Preprocess(x); store $θ_{x}$ in $meta$
9: function Window( $x (t), W, H$ )
10: return list of windows $\{w_{k}\}$ of length W with hop H
11: function FFTblock( $w, Ω$ )
12: compute $P (ω)$; $p_{ω} \leftarrow P (ω) / \sum_{ω'} P (ω')$; $f_{dom} \leftarrow \arg\max_{ω \in Ω} P (ω)$; $H_{s} \leftarrow - \sum_{ω} p_{ω} \log p_{ω}$
13: return $\{P, f_{dom}, H_{s}, rms, crest\}$
14: for sample $s = 1 . . N$ do
15: extract $(f_{y}, f_{u}, E, g, L (t), σ (t), ε (t))$
16: $ε_{y} \leftarrow f_{y} / E$ ; $R \leftarrow {max}_{t} σ (t) / f_{y}$ ; $μ \leftarrow ε_{max} / ε_{y}$ ; $B \leftarrow f_{u} / f_{y}$
17: $U \leftarrow \int σ d ε$ (trapz); $D \leftarrow {max}_{t} L (t)$ ; $C \leftarrow C (g, f_{y}, E)$ ; $D C R \leftarrow D / C$
18: $S \leftarrow N / N_{cr} (g, E)$ ; $λ \leftarrow K L / r (g)$ ; $Δ_{ℓ} \leftarrow ϕ_{ℓ} (g, f_{y}, E, L) - γ_{ℓ}$
19: $\{w_{k}\} \leftarrow$ Window( $σ (t), W, H$ ); $T \leftarrow []$
20: for each $w_{k}$ do
21: $T$.append(FFTblock( $w_{k}, Ω$ ))
22: $D_{Miner} \leftarrow$ rainflow_damage( $σ (t)$ )
23: build interactions $I = \{x_{i} x_{j}, x_{i} / x_{j}, \log x_{i}^{+}\}$
24: $X_{s} \leftarrow$ concat $([f_{y}, f_{u}, E, g, R, μ, B, U, D C R, S, λ, Δ_{ℓ}, agg (T), D_{Miner}, I])$
Feature screening (train only)
25: for feature $X_{j}$ do
26: compute var $(X_{j})$ , VIF $(X_{j})$ , $\hat{I} (X_{j}; Y)$
27: if var $< v_{min}$ or VIF $> τ$ or $\hat{I} < I_{min}$ then mark j for removal
28: $J \leftarrow$ indices not removed; $X \leftarrow X_{:, J}$
Dimensionality reduction (train fit, global apply)
29: Fit PCA on $X_{I_{train}}$; pick the smallest r s.t. $\sum_{k = 1}^{r} λ_{k} / \sum_{k} λ_{k} \geq η$; $Z \leftarrow (X - \bar{x}_{train}) V_{(:, 1 : r)}$
30: (optional) Fit denoising autoencoder on $X_{I_{train}}$ ; latent $Z_{A} \leftarrow DAE (X)$
Finalize and package
31: for $s = 1 . . N$ do
32: $F_{s} \leftarrow [X_{s, J}, Z_{s}, Z_{A, s}, missingness masks]$
33: update $meta$ with $(J, V, Λ, r, η, W, H, Ω, θ_{x} \forall x)$
34: return $F = \{F_{s}\}_{s = 1}^{N}, meta$

Physics-informed descriptors are derived in parallel, including stress ratio, ductility, brittleness index ($B = f_{u} / f_{y}$), strain-energy density, and demand-to-capacity ratio, providing interpretable indicators of material and structural performance. Each feature is then screened for low variance, multicollinearity (via the variance inflation factor, VIF), and mutual-information relevance before dimensionality reduction. Principal Component Analysis (PCA) and, optionally, a denoising autoencoder yield compact latent representations.
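The variance and multicollinearity screens can be sketched in NumPy as below, with $\mathrm{VIF}_{j} = 1/(1 - R_{j}^{2})$ obtained by regressing feature $j$ on the remaining features. Two assumptions depart from a literal reading of Algorithm 1. First, features above the VIF limit $τ$ are removed one at a time (highest VIF first, then recomputed), so that both members of a collinear pair are not discarded together. Second, the mutual-information test is omitted for brevity; an estimator such as scikit-learn's `mutual_info_regression` could supply $\hat{I}(X_{j}; Y)$. Default thresholds and names are hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2) from OLS on the others."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + others
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def screen(X_train, v_min=1e-8, tau=10.0):
    """Return kept column indices: variance floor first, then iterative VIF pruning."""
    keep = [int(j) for j in np.flatnonzero(np.var(X_train, axis=0) >= v_min)]
    while len(keep) > 1:
        scores = vif(X_train[:, keep])
        worst = int(np.argmax(scores))
        if scores[worst] <= tau:
            break
        keep.pop(worst)  # drop the most collinear feature and recompute
    return keep


# Hypothetical features: a, b independent; c almost 2a; d constant.
rng = np.random.default_rng(1)
a, b = rng.normal(size=500), rng.normal(size=500)
c = 2.0 * a + 0.01 * rng.normal(size=500)
d = np.full(500, 3.0)
kept = screen(np.column_stack([a, b, c, d]))
```

Here the constant column fails the variance floor, one of the two nearly collinear columns is pruned, and the independent column survives.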

The engineered features are concatenated into a unified representation

F_{s}

for each sample, combining statistical, spectral, and physics-based variables. Metadata (such as scaling parameters, PCA bases, and windowing settings) are stored for reproducible deployment. This modular process ensures leakage-safe preprocessing, feature relevance, and computational efficiency while maintaining physical interpretability and robustness for downstream learning.

3.3. AI-Assisted Structural Optimization Framework

The AI-assisted structural optimization method is based on a hybrid deep neural network (DNN) designed for a joint classification–regression task. The classification head supports the detection of code-compliant configurations by distinguishing reliable from unreliable structural designs, while the regression head estimates continuous indicators such as the reliability index ($\beta$) and probability of failure ($P_{f}$). This dual-task strategy allows the framework to provide both categorical and quantitative safety assessments, supporting design pre-selection as well as detailed reliability evaluation.

To preserve the physical integrity of predictions, physics-based constraints derived from AISC and Eurocode provisions are explicitly incorporated into the learning process. These limit-state functions and code-based safety requirements are embedded as penalty terms in the total loss, discouraging the network from producing statistically plausible yet physically infeasible solutions. By merging data-driven learning with structural mechanics, the model achieves high predictive accuracy and interpretability, effectively bridging the gap between artificial intelligence and engineering practice.

The input feature space F integrates material parameters, geometric descriptors, and load-response characteristics. The classification and regression outputs are optimized jointly through a composite loss function that balances statistical accuracy and physics-informed penalties. This formulation enables generalization across multiple steel grades, geometries, and loading scenarios, providing a unified framework for code-compliant optimization in civil and mechanical applications.

To ensure physical consistency, all mechanical quantities are defined explicitly as follows. The stress ratio $R = \sigma / f_{y}$ expresses the normalized stress state relative to the yield strength $f_{y}$, where $\sigma(t)$ is the instantaneous stress over time or load history. The strain energy density is given by $U = \int \sigma \, d\varepsilon$, evaluated either per loading cycle in fatigue analysis or over the elastic–plastic deformation path for static cases. The demand-to-capacity ratio is defined as $\mathrm{DCR} = D / C$, where $D$ denotes the applied structural demand (e.g., bending moment, axial force, or deflection) and $C$ is the corresponding design capacity computed as a function of geometry $g$, yield strength $f_{y}$, and modulus of elasticity $E$, i.e., $C = C(g, f_{y}, E)$. The framework also incorporates instability effects through local and global buckling ratios ($\lambda_{c} / \lambda_{cr}$ and $P / P_{cr}$, respectively) and serviceability-based deflection constraints ($\Delta / \Delta_{allow}$). These formulations are embedded directly within the physics-informed penalty term so that departures from code-based safety limits are penalized at every training iteration.

Given a normalized feature vector $F$, the model predicts the classification output $\hat{y}_{c}$ and the regression output $\hat{y}_{r}$ as

$$\hat{y} = (\hat{y}_{c}, \hat{y}_{r}) = f_{\theta}(F), \tag{5}$$

where $f_{\theta}$ denotes the DNN parameterized by $\theta$.
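To make Eq. (5) concrete, the following NumPy forward pass shows one possible dual-head layout: a shared ReLU trunk, a sigmoid classification head giving the probability of the safe state, and a linear regression head with two outputs (for example, $\beta$ and $P_{f}$). Layer sizes, initialization, and parameter names are hypothetical, since the text does not specify the architecture. In practice a bounded activation for $P_{f}$ and a training framework with automatic differentiation would be needed.

```python
import numpy as np


def init_params(n_in, n_hidden, n_reg, rng):
    """Random initialization (He-style scaling for the ReLU trunk)."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "Wc": rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, 1)), "bc": np.zeros(1),
        "Wr": rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, n_reg)), "br": np.zeros(n_reg),
    }


def forward(params, F):
    """Eq. (5): (y_c, y_r) = f_theta(F) for a batch of feature rows F."""
    h = np.maximum(0.0, F @ params["W1"] + params["b1"])  # shared ReLU trunk
    logits = (h @ params["Wc"] + params["bc"])[:, 0]
    y_c = 1.0 / (1.0 + np.exp(-logits))                   # classification head: P(safe)
    y_r = h @ params["Wr"] + params["br"]                 # regression head, e.g. [beta, P_f]
    return y_c, y_r


rng = np.random.default_rng(0)
params = init_params(n_in=12, n_hidden=32, n_reg=2, rng=rng)
F = rng.normal(size=(8, 12))  # batch of 8 standardized feature vectors
y_c, y_r = forward(params, F)
```

Sharing the trunk lets both heads reuse the same learned representation, which is the usual motivation for a joint classification–regression network.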

The total loss integrates the objectives of classification, regression, and physical constraint satisfaction:

$$L_{\text{total}} = \alpha L_{\text{CE}}(y_{c}, \hat{y}_{c}) + \beta L_{\text{MSE}}(y_{r}, \hat{y}_{r}) + \gamma L_{\text{LSF}}, \tag{6}$$

where $L_{\text{CE}}$ represents the cross-entropy loss for classification, $L_{\text{MSE}}$ denotes the mean squared error for regression, and $L_{\text{LSF}}$ penalizes violations of limit-state functions. Here the loss weight $\beta$ should not be confused with the reliability index $\beta$.

Physics-informed constraint formulation: The penalty term $L_{\text{LSF}}$ encapsulates essential limit states derived from structural design codes, namely,

Tensile and compressive yielding: $g_{1} = \sigma_{\max} / f_{y} - 1$;
Local buckling: $g_{2} = \lambda_{c} / \lambda_{cr} - 1$;
Global buckling: $g_{3} = P / P_{cr} - 1$;
Serviceability (deflection): $g_{4} = \Delta / \Delta_{allow} - 1$.

Each $g_{i}(\cdot)$ quantifies the normalized deviation between the predicted structural response and the corresponding code-based limit, with $g_{i} > 0$ indicating that the limit is exceeded. The total physics penalty is expressed as

$$L_{LSF} = \sum_{i} \max(0, g_{i}(\hat{y}_{r}) - τ_{i}), \tag{7}$$

where $τ_{i}$ denotes the permissible tolerance calibrated from AISC/Eurocode thresholds, typically $τ_{σ} = 0.05$ for yield stress, $τ_{P} = 0.10$ for load ratios, and $τ_{Δ} = 0.10$ for deformation constraints. These tolerances define the domain of code compliance within which predictions remain physically valid. The weighting coefficients $α$, $β$, and $γ$ (here $β$ is a loss weight, not the reliability index) correspond to the relative contributions of classification accuracy, regression fidelity, and constraint enforcement, respectively, and are optimized through Bayesian hyperparameter tuning to achieve balanced learning across all objectives.
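As a minimal sketch of Equations (6) and (7), the Python code below evaluates the limit-state deviations $g_{i}$, the hinge penalty $L_{LSF}$, and the weighted total loss for a binary classification head. It assumes binary cross-entropy as the concrete form of $L_{CE}$, evaluates the $g_{i}$ directly from response quantities rather than from $\hat{y}_{r}$, and, because no tolerance is stated for local buckling, reuses 0.10 for it; these choices and all names are hypothetical.

```python
import math

# Tolerances quoted in the text. Reusing 0.10 for local buckling is an assumption,
# since no tolerance is stated for that limit state.
TOLERANCES = {"yield": 0.05, "local_buckling": 0.10,
              "global_buckling": 0.10, "deflection": 0.10}


def limit_state_values(sigma_max, f_y, lam_c, lam_cr, P, P_cr, delta, delta_allow):
    """g_i = (response / limit) - 1; g_i > 0 means the code limit is exceeded."""
    return {
        "yield": sigma_max / f_y - 1.0,
        "local_buckling": lam_c / lam_cr - 1.0,
        "global_buckling": P / P_cr - 1.0,
        "deflection": delta / delta_allow - 1.0,
    }


def lsf_penalty(g, tol=TOLERANCES):
    """Eq. (7): sum of hinge terms max(0, g_i - tau_i)."""
    return sum(max(0.0, g[name] - tol[name]) for name in g)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy for labels in {0, 1}."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean squared error over paired regression targets and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def total_loss(y_c, p_c, y_r, yhat_r, g, alpha=1.0, beta=1.0, gamma=1.0):
    """Eq. (6): alpha * L_CE + beta * L_MSE + gamma * L_LSF."""
    return (alpha * binary_cross_entropy(y_c, p_c)
            + beta * mean_squared_error(y_r, yhat_r)
            + gamma * lsf_penalty(g))


# Illustrative member: sigma_max = 270 MPa against f_y = 250 MPa, other ratios at 0.5.
g = limit_state_values(270, 250, 40, 80, 500, 1000, 10, 20)
loss = total_loss([1, 0], [0.9, 0.2], [1.0, 2.0], [1.1, 1.9], g)
```

In this example only the yielding limit state exceeds its tolerance ($g_{1} = 0.08 > τ_{σ} = 0.05$), so $L_{LSF} ≈ 0.03$; the other three deviations are negative and contribute nothing.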

In multi-action scenarios (combined bending and axial loading), each limit-state penalty is scaled by a case-specific weight $w_{i}$, yielding the generalized form

$$L_{LSF} = \sum_{i} w_{i} \max(0, g_{i}(\hat{y}_{r}) - τ_{i}).$$

These weights reflect the relative importance of each failure mode as prescribed by design codes (AISC 360, Eurocode EN 1993) and ensure balanced enforcement of strength, stability, and serviceability across all load combinations.
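The weighted form differs from Equation (7) only by the factors $w_{i}$. A short Python sketch, with illustrative weights that are assumptions rather than values taken from AISC 360 or EN 1993:

```python
def weighted_lsf_penalty(g, tau, w=None):
    """Generalized penalty sum_i w_i * max(0, g_i - tau_i).

    g, tau and w are equal-length sequences ordered by limit state
    (yielding, local buckling, global buckling, deflection). With w=None
    every weight is 1, which recovers the unweighted form of Eq. (7).
    """
    if w is None:
        w = [1.0] * len(g)
    if not len(g) == len(tau) == len(w):
        raise ValueError("g, tau and w must have the same length")
    return sum(wi * max(0.0, gi - ti) for gi, ti, wi in zip(g, tau, w))


# Hypothetical combined bending + axial case: yielding and global buckling active.
g = [0.15, -0.20, 0.20, -0.50]
tau = [0.05, 0.10, 0.10, 0.10]
w = [2.0, 1.0, 1.5, 1.0]  # illustrative weights, not values from AISC 360 or EN 1993
print(weighted_lsf_penalty(g, tau), weighted_lsf_penalty(g, tau, w))
```

With unit weights the penalty is about 0.2 (yielding and global buckling each exceed their tolerance by 0.1); the illustrative weights raise it to about 0.35, emphasizing the yielding term.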

Table 4 summarizes how the components of the AI-assisted structural optimization framework work together to ensure both predictive accuracy and engineering code compliance. The classification head provides a rapid binary decision that distinguishes safe from unsafe structural states, enabling immediate and code-consistent design screening. Complementing this, the regression head estimates continuous reliability indicators, specifically the probability of failure ($P_{f}$) and the reliability index ($β$), which offer quantitative safety margins for engineering assessment. The physics-informed constraints embed structural limit-state functions ($g_{i}$) and tolerances ($τ_{i}$) directly into the learning process, penalizing predictions that are physically inconsistent with AISC/Eurocode provisions. Finally, the joint loss function combines classification, regression, and physics-based penalties into a unified optimization objective, balancing statistical accuracy with physical feasibility. Together, these four components form a robust and interpretable hybrid framework for code-compliant, reliable, and computationally efficient structural optimization.

Table 4. Components of the AI-Assisted Structural Optimization Framework.

Algorithm 2 outlines the complete training procedure for the proposed hybrid model, jointly optimizing classification and regression objectives while enforcing physical constraint compliance.

Algorithm 2 Enhanced AI-Assisted Structural Optimization Framework

Require: Feature set $F$, classification labels $y_{c}$, regression targets $y_{r}$, learning rate $η$, maximum epochs $E$
Ensure: Optimized hybrid model $f_{θ}$
1: Initialize: random network parameters $θ_{0}$; learning rate scheduler $η(t)$; regularization factor $λ$
2: Split: Partition $F$ (with $y_{c}$, $y_{r}$) into training (70%), validation (15%), and test (15%) subsets
3: for epoch $= 1$ to $E$ do
4:     Shuffle the training set into minibatches $B = \{(F_{b}, y_{c}^{b}, y_{r}^{b})\}$
5:     for each minibatch $(F_{b}, y_{c}^{b}, y_{r}^{b}) \in B$ do
6:         Forward pass: compute hidden features $h = f_{θ}(F_{b})$
7:         Classification head: $\hat{y}_{c} = σ(W_{c} h + b_{c})$, where $σ(\cdot)$ is the logistic sigmoid
8:         Regression head: $\hat{y}_{r} = W_{r} h + b_{r}$
9:         Compute classification loss $L_{cls} = -\sum [y_{c}^{b} \log \hat{y}_{c} + (1 - y_{c}^{b}) \log(1 - \hat{y}_{c})]$
10:        Compute regression loss $L_{reg} = \| y_{r}^{b} - \hat{y}_{r} \|_{2}^{2}$
11:        Compute physics penalty $L_{LSF} = \sum_{i} \max(0, g_{i}(\hat{y}_{r}) - τ_{i})$
12:        Regularization: $L_{regu} = λ \| θ \|_{2}^{2}$
13:        Total loss: $L_{total} = α L_{cls} + β L_{reg} + γ L_{LSF} + L_{regu}$
14:        Backpropagate: compute $\nabla_{θ} L_{total}$
15:        Update parameters: $θ \leftarrow θ - η(t) \cdot \nabla_{θ} L_{total}$
16:    Validate model; compute accuracy, $R^{2}$, and limit-state compliance ratios
17:    if early stopping criterion met then
18:        break
19: return optimized model $f_{θ}$
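To make the steps of Algorithm 2 concrete, the following Python sketch trains a deliberately simplified version of the hybrid model using only the standard library. It is an illustrative assumption, not the authors' implementation: the deep network $f_{θ}$ is replaced by the identity (so $h = F_{b}$), leaving a logistic classification head and a linear regression head; the regression output is read as a single stress ratio with limit state $g(\hat{y}_{r}) = \hat{y}_{r} - 1$ and $τ = 0.05$; a constant learning rate replaces $η(t)$; losses are averaged over each minibatch; and the L2 term is applied to weights only. All function and parameter names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function (the classification head's sigma)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_hybrid(F, y_c, y_r, eta=0.05, epochs=200, batch=16, alpha=1.0, beta=1.0,
                 gamma=0.5, lam=1e-4, tau=0.05, patience=10, seed=0):
    """Simplified Algorithm 2: identity trunk, logistic + linear heads, minibatch SGD."""
    rng = random.Random(seed)
    d = len(F[0])
    idx = list(range(len(F)))
    rng.shuffle(idx)
    n_tr, n_va = int(0.70 * len(idx)), int(0.15 * len(idx))
    train, val = idx[:n_tr], idx[n_tr:n_tr + n_va]  # remaining ~15% held out as test set
    w_c, b_c = [0.0] * d, 0.0  # classification head
    w_r, b_r = [0.0] * d, 0.0  # regression head (output read as a stress ratio)

    def predict(x):
        p = sigmoid(sum(w * v for w, v in zip(w_c, x)) + b_c)
        r = sum(w * v for w, v in zip(w_r, x)) + b_r
        return p, r

    def val_loss():
        total = 0.0
        for i in val:
            p, r = predict(F[i])
            p = min(max(p, 1e-12), 1.0 - 1e-12)
            total += (-alpha * (y_c[i] * math.log(p) + (1 - y_c[i]) * math.log(1 - p))
                      + beta * (r - y_r[i]) ** 2
                      + gamma * max(0.0, (r - 1.0) - tau))  # hinge on g = r - 1
        return total / len(val)

    best, best_theta, wait = float("inf"), None, 0
    for _ in range(epochs):
        rng.shuffle(train)
        for s in range(0, len(train), batch):
            mb = train[s:s + batch]
            gw_c, gb_c, gw_r, gb_r = [0.0] * d, 0.0, [0.0] * d, 0.0
            for i in mb:
                p, r = predict(F[i])
                dz = alpha * (p - y_c[i])        # d(BCE)/d(logit)
                dr = beta * 2.0 * (r - y_r[i])   # d(squared error)/d(y_hat_r)
                if (r - 1.0) - tau > 0.0:        # active physics hinge adds gamma
                    dr += gamma
                for k in range(d):
                    gw_c[k] += dz * F[i][k]
                    gw_r[k] += dr * F[i][k]
                gb_c += dz
                gb_r += dr
            m = len(mb)
            # SGD step on minibatch means; L2 penalty lam * ||w||^2 on weights only
            w_c = [w - eta * (g / m + 2.0 * lam * w) for w, g in zip(w_c, gw_c)]
            w_r = [w - eta * (g / m + 2.0 * lam * w) for w, g in zip(w_r, gw_r)]
            b_c -= eta * gb_c / m
            b_r -= eta * gb_r / m
        loss = val_loss()                        # validation after each epoch
        if loss < best - 1e-6:
            best, best_theta, wait = loss, (w_c[:], b_c, w_r[:], b_r), 0
        else:
            wait += 1
            if wait >= patience:                 # early stopping
                break
    return best_theta, best
```

In a full implementation the hand-written derivatives would be replaced by automatic differentiation through the deep network, and the held-out 15% test split would be touched only after training.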

The iterative optimization drives the trained network toward both high predictive accuracy and adherence to design-code constraints. This physics-informed strategy produces a model capable of real-time, interpretable, and code-compliant structural optimization, aligning artificial intelligence with established engineering safety principles.

3.4. Optimization Strategy

The efficiency and accuracy of the proposed AI-assisted framework for structural optimization depend heavily on carefully selected hyperparameters. Hyperparameter tuning is non-trivial because parameters such as learning rate, batch size, dropout probability, number of hidden units, and task-specific loss weights have direct effects on model convergence, generalization, and stability. Manual tuning or grid search is inefficient for such high-dimensional spaces. Therefore, Bayesian Optimization is adopted to systematically explore the hyperparameter space, allowing the framework to converge toward an optimal configuration with fewer evaluations compared to brute-force methods.

Bayesian Optimization relies on a surrogate model, typically a Gaussian Process (GP), to approximate the objective function that relates hyperparameters to validation performance. An acquisition function guides the search by balancing exploration of uncertain regions and exploitation of promising areas already identified. This probabilistic approach is particularly suited for structural reliability analysis, where training the hybrid deep model is computationally expensive. By using Bayesian Optimization, the framework avoids redundant trials and identifies well-performing hyperparameters with fewer costly training runs.

The tuned hyperparameters in this study include the learning rate ($η$), batch size ($B$), number of hidden units ($h$), dropout rate ($p$), and the loss function weights $(α, β, γ)$ that balance the classification, regression, and physics-informed tasks. Each parameter is searched within a precisely defined range to ensure stability and robustness. The final configuration maximizes validation accuracy and reliability scores while limiting overfitting and inference latency. This systematic optimization supports both statistical accuracy and practical viability for large-scale structural monitoring and optimization.

Let $h \in H$ denote a hyperparameter configuration from the search space $H$. The optimization objective is defined as

$$h^{*} = \arg\max_{h \in H} f_{val}(h), \tag{8}$$

where $f_{val}(h)$ is the validation performance score.

Bayesian Optimization approximates $f_{val}(h)$ with a Gaussian Process:

$$f_{val}(h) \sim \mathcal{GP}(μ(h), k(h, h')), \tag{9}$$

where $μ(h)$ is the mean function and $k(h, h')$ the covariance kernel.

The next hyperparameter set is chosen by maximizing the acquisition function $a(h)$:

$$h_{t+1} = \arg\max_{h \in H} a(h | D_{t}), \tag{10}$$

with $D_{t}$ representing previously evaluated configurations.
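The text does not name the acquisition function. Expected Improvement (EI) is one common choice; the Python sketch below assumes EI for maximizing $f_{val}$, with an exploration margin `xi` and a finite candidate set standing in for $H$ (both assumptions). The posterior mean `mu` and standard deviation `sigma` would come from the GP surrogate of Equation (9).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def expected_improvement(mu, sigma, f_best, xi=0.01):
    """EI for maximization: E[max(0, f(h) - f_best - xi)] with f(h) ~ N(mu, sigma^2)."""
    improvement = mu - f_best - xi
    if sigma <= 0.0:
        return max(0.0, improvement)
    z = improvement / sigma
    return improvement * STD_NORMAL.cdf(z) + sigma * STD_NORMAL.pdf(z)


def select_next(candidates, mu, sigma, f_best):
    """Eq. (10) on a finite candidate set: return argmax_h a(h | D_t)."""
    scores = [expected_improvement(m, s, f_best) for m, s in zip(mu, sigma)]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]
```

Given two candidates with the same predicted mean below the incumbent `f_best`, EI prefers the one with larger posterior uncertainty, which is the exploration behavior described above.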

Table 5 compiles the hyperparameters considered during the Bayesian Optimization process. Each parameter controls a distinct aspect of the learning process: the learning rate governs convergence, the batch size controls computational efficiency, the number of hidden units sets model complexity, dropout provides regularization, and the loss weights balance the competing objectives. Tuning them systematically allows the framework to attain strong predictive performance and practical efficiency.

Table 5. Tunable Hyperparameters and Search Ranges in Bayesian Optimization.

Algorithm 3 outlines the Bayesian Optimization procedure used to efficiently explore the hyperparameter search space and identify the configuration that maximizes validation performance. The process begins by generating an initial set of evaluations, which form the dataset $D$ for modeling the relationship between hyperparameters and their corresponding performance. A Gaussian Process surrogate is then fitted to approximate this objective function, providing both a mean prediction and an uncertainty estimate for unseen configurations. Guided by the acquisition function $a(h | D)$, the algorithm selects the next candidate hyperparameter set $h_{t}$ that offers the best balance between exploration of uncertain regions and exploitation of promising areas. After training the model with $h_{t}$ and computing its validation score, the dataset is updated, and this iterative procedure continues for $T$ iterations. Finally, the algorithm returns the hyperparameter configuration $h^{*}$ that achieved the highest validation score within $D$, ensuring an optimized and computationally efficient selection process.

Algorithm 3 Bayesian Optimization for Hyperparameter Tuning

Require: Hyperparameter search space $H$, maximum iterations $T$, acquisition function $a(h)$
Ensure: Optimal hyperparameter configuration $h^{*}$
1: Initialize dataset $D \leftarrow \{(h_{j}, f_{val}(h_{j}))\}_{j=1}^{m}$ from $m$ random configurations
2: for $t = 1$ to $T$ do
3:     Fit Gaussian Process surrogate to $D$
4:     Select candidate $h_{t} = \arg\max_{h \in H} a(h | D)$
5:     Train model with $h_{t}$ and compute validation score $f_{val}(h_{t})$
6:     Update $D \leftarrow D \cup \{(h_{t}, f_{val}(h_{t}))\}$
7: $h^{*} \leftarrow \arg\max_{h \in D} f_{val}(h)$
8: return $h^{*}$
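The loop of Algorithm 3 can be sketched end-to-end in pure Python. This is an illustrative assumption, not the authors' code: it tunes a single hyperparameter (log10 of the learning rate) over a discretized grid, conditions a zero-mean GP with a squared-exponential kernel on standardized scores, and uses an upper-confidence-bound acquisition $a(h | D) = μ(h) + κ\,s(h)$, with $s(h)$ the posterior standard deviation, in place of the unspecified acquisition function. The objective is a hypothetical stand-in for training the model and returning $f_{val}(h)$.

```python
import math
import random


def rbf(a, b, length=0.5):
    """Squared-exponential kernel k(h, h') for a 1-D hyperparameter (unit variance)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    """Lower-triangular L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution for L x = b."""
    x = []
    for i, b_i in enumerate(b):
        x.append((b_i - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def gp_fit(X, y, noise=1e-5):
    """Condition a zero-mean GP on standardized scores y observed at points X."""
    mean = sum(y) / len(y)
    std = math.sqrt(sum((v - mean) ** 2 for v in y) / len(y)) or 1.0
    z = [(v - mean) / std for v in y]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    return X, L, solve_lower(L, z), mean, std


def gp_predict(model, h):
    """Posterior mean and std at h: mu = v.w and var = k(h, h) - v.v with v = L^-1 k*."""
    X, L, w, mean, std = model
    v = solve_lower(L, [rbf(h, x) for x in X])
    mu = sum(a * b for a, b in zip(v, w))
    var = max(rbf(h, h) - sum(a * a for a in v), 0.0)
    return mean + std * mu, std * math.sqrt(var)


def bayes_opt(objective, candidates, T=20, m=3, kappa=2.0, seed=0):
    """Algorithm 3 over a finite candidate set with a UCB acquisition."""
    rng = random.Random(seed)
    D = [(h, objective(h)) for h in rng.sample(candidates, m)]  # m random evaluations
    for _ in range(T):
        model = gp_fit([h for h, _ in D], [f for _, f in D])    # fit GP surrogate
        seen = {h for h, _ in D}
        pool = [h for h in candidates if h not in seen]
        if not pool:
            break

        def ucb(h):
            mu, sd = gp_predict(model, h)
            return mu + kappa * sd                              # a(h | D)

        h_t = max(pool, key=ucb)
        D.append((h_t, objective(h_t)))                         # update D
    return max(D, key=lambda pair: pair[1])[0]                  # h* = best observed


# Hypothetical validation score over log10(learning rate), peaking at -3.
candidates = [round(-5 + 0.1 * i, 1) for i in range(41)]
best = bayes_opt(lambda h: 0.99 - 0.01 * (h + 3) ** 2, candidates)
```

Swapping in the real training routine for `objective` and a multi-dimensional kernel over $(η, B, h, p, α, β, γ)$ would follow the same structure.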

3.5. Validation and Reliability Evaluation

Validation verifies that the AI-enhanced framework satisfies Eurocode and AISC standards while achieving statistical accuracy. The model is assessed on both of its tasks: Accuracy, Precision, Recall, F1-score, and AUC evaluate its capacity to distinguish between safe and unsafe states, while MAE, RMSE, and $R^{2}$ measure the accuracy of continuous predictions such as the probability of failure ($P_{f}$) and the reliability index ($β$).

To ensure engineering compliance, the adopted safety thresholds are directly based on established design and reliability codes. The probability of failure limit $P_{f} \leq 10^{-3}$ and the corresponding reliability index requirement $β \geq 3.5$ follow Eurocode EN 1990:2002 [28] Annex B and ISO 2394:2015 [29] (Reliability of Structures), which define this range for normal safety classes. The demand-to-capacity ratio criterion $DCR \leq 1.0$ is derived from AISC 360-16 Section B3.1 [30], which mandates that design strength must not be exceeded under factored loads. These thresholds are enforced per load case during evaluation through the physics-informed penalty $L_{LSF}$ to ensure that each prediction satisfies the relevant code provisions rather than being applied globally as nominal limits.
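The link between the two probabilistic thresholds follows from Equation (16), $β = -Φ^{-1}(P_{f})$. A minimal Python sketch of the conversion and of a per-load-case check (function names hypothetical; thresholds as quoted above):

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def beta_from_pf(pf):
    """Eq. (16): beta = -Phi^{-1}(P_f)."""
    return -PHI.inv_cdf(pf)


def pf_from_beta(beta):
    """Inverse relation: P_f = Phi(-beta)."""
    return PHI.cdf(-beta)


def code_checks(pf, beta, dcr, pf_max=1e-3, beta_min=3.5, dcr_max=1.0):
    """Check one load case against the P_f, beta and DCR thresholds."""
    return {"pf_ok": pf <= pf_max, "beta_ok": beta >= beta_min, "dcr_ok": dcr <= dcr_max}


checks = code_checks(5e-4, beta_from_pf(5e-4), 0.85)
```

Because $β = 3.5$ corresponds to $P_{f} ≈ 2.3 \times 10^{-4}$ and $P_{f} = 10^{-3}$ to $β ≈ 3.09$, the two criteria are not equivalent: a case with $P_{f} = 5 \times 10^{-4}$ (i.e., $β ≈ 3.29$) passes the $P_{f}$ check but fails the $β$ check.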

k-fold cross-validation further assesses robustness by checking that performance remains stable across data partitions spanning different steel grades and load conditions, which helps detect overfitting and supports dependable scalability in structural monitoring and optimization.

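$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$

$$\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN} \tag{12}$$

$$F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{13}$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_{i} - \hat{y}_{i}|, \quad RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}} \tag{14}$$

$$R^{2} = 1 - \frac{\sum_{i} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i} (y_{i} - \bar{y})^{2}} \tag{15}$$

$$P_{f} = P[g(X) \leq 0], \quad β = -Φ^{-1}(P_{f}), \quad DCR = \frac{D}{C} \tag{16}$$

Here $TP$, $TN$, $FP$, and $FN$ are the true positive, true negative, false positive, and false negative counts; $g(X)$ in Equation (16) is the performance function of the random variables $X$, with failure defined by $g(X) \leq 0$ (the opposite sign convention to the normalized deviations $g_{i}$ used in $L_{LSF}$); and $Φ^{-1}$ is the inverse standard normal cumulative distribution function.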
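Equations (11)–(15) translate directly into code. The Python sketch below computes them from confusion-matrix counts and paired regression targets; treating unsafe as the positive class (so that a false negative is an unsafe design predicted safe) is an assumption consistent with the terminology used in Section 4, and the function names are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Eqs. (11)-(13) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y, y_hat):
    """Eqs. (14)-(15): MAE, RMSE and the coefficient of determination R^2."""
    n = len(y)
    residuals = [a - b for a, b in zip(y, y_hat)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    y_bar = sum(y) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return {"mae": mae, "rmse": rmse, "r2": 1.0 - ss_res / ss_tot}


# Table 10 counts with "unsafe" as the positive class.
cls = classification_metrics(tp=19982, tn=19986, fp=14, fn=18)
reg = regression_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])  # illustrative values
```

With the Table 10 counts, the accuracy evaluates to 39,968/40,000 = 0.9992.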

Table 6 summarizes the key performance metrics and code-oriented thresholds used to evaluate the proposed structural reliability framework. The first group of metrics (Accuracy, Precision, Recall, F1-score, and AUC) assesses the model’s ability to correctly classify structural states as safe or unsafe, ensuring dependable binary decision-making under varying conditions. Regression metrics such as MAE, RMSE, and $R^{2}$ quantify the numerical accuracy of the predicted probability of failure ($P_{f}$) and reliability index ($β$), providing fine-grained insight into the model’s quantitative performance. Engineering safety thresholds are derived from international design standards: the limits $P_{f} \leq 10^{-3}$ and $β \geq 3.5$ follow Eurocode EN 1990 and ISO 2394, while the requirement $DCR \leq 1.0$ is taken from AISC 360-16 Section B3 to ensure that structural demand never exceeds design capacity. Finally, k-fold cross-validation is included to verify generalization robustness across different data partitions. Together, these metrics form a comprehensive evaluation framework that combines statistical accuracy with strict structural code compliance.

Table 6. Validation Metrics and Code-Oriented Thresholds.

This validation procedure ensures that each evaluation case complies with both statistical performance targets and structural reliability provisions, aligning the model’s outcomes with internationally recognized engineering design standards.

3.6. Computational Efficiency Assessment

A critical aspect of evaluating the proposed AI-enhanced structural reliability framework is to ensure that it can be deployed in real-world environments with stringent resource constraints. Inference latency, throughput, and memory footprint are key performance indicators for practical usability, especially in structural monitoring and optimization applications that demand real-time decision support. Unlike traditional finite element simulations, which are computationally expensive and time-consuming, the proposed framework must demonstrate both high predictive accuracy and computational efficiency to be adopted in safety-critical infrastructures.

To evaluate efficiency, the trained framework is benchmarked across different hardware platforms, including high-performance GPUs, standard CPUs, and resource-constrained edge devices. Each type of hardware involves its own trade-offs: GPUs offer high throughput but have high power and memory requirements, CPUs offer versatility for desktop- or laptop-based engineering applications, and edge devices such as embedded SoCs enable on-site monitoring within IoT-enabled infrastructure. Cross-platform testing also evaluates the flexibility and scalability of the framework across diverse deployment scenarios.

The primary metrics assessed are memory footprint (MB), throughput (preds/sec), and inference latency (ms/prediction). Inference latency measures real-time performance when the model processes a single input sample, whereas throughput reflects the system’s ability to handle batch evaluation at scale. Memory usage determines whether the model can operate on constrained hardware without running out of memory. Together, these indicators determine the approach’s viability for both distributed real-time monitoring systems and centralized processing.

Our study combines efficiency and predictive accuracy to show that reliability is not sacrificed for computational savings. Reduced numerical precision (from full-precision FP32 to FP16 and INT8) and quantization-aware training are considered as means of cutting inference time and memory usage, and they are adopted only when the resulting accuracy remains close to that of the full-precision model. The experimental results show that the framework achieves nearly the same predictive quality with compressed representations, which makes it deployable on a range of hardware platforms, from field-based monitoring to engineering offices.
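For illustration, the Python sketch below applies post-training symmetric per-tensor INT8 quantization to a list of weights. This common scheme is assumed here for concreteness; the text does not specify its scheme, and quantization-aware training, which simulates this rounding during training, is not shown. Each FP32 weight (4 bytes) is stored as one signed byte plus a shared scale.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: q = round(w / s) with s = max|w| / 127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Approximate FP32 reconstruction w ≈ q * s."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 1.0]   # illustrative FP32 weights
q, s = quantize_int8(weights)        # q holds small integers in [-127, 127]
restored = dequantize(q, s)
```

The reconstruction error of each weight is bounded by half the scale, which is why accuracy can stay close to the FP32 model when the weight range is moderate.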

$$L = \frac{T_{total}}{N}, \tag{17}$$

where $L$ is the inference latency (ms/prediction), $T_{total}$ is the total evaluation time, and $N$ is the number of predictions.

$$Th = \frac{N}{T_{total}}, \tag{18}$$

where $Th$ is the throughput, measured as predictions per second.

$$M = \frac{\text{Memory Used}}{\text{Model Size}}, \tag{19}$$

where $M$ represents the normalized memory usage across platforms.
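Equations (17)–(19) are simple ratios. The Python sketch below (hypothetical helper names, total time in seconds, latency reported in milliseconds) also makes explicit that, because latency and throughput are computed from the same $T_{total}$ and $N$, throughput in predictions per second equals $1000/L$ with $L$ in milliseconds.

```python
def latency_ms(t_total_s, n):
    """Eq. (17): mean time per prediction in milliseconds (t_total_s in seconds)."""
    return 1000.0 * t_total_s / n


def throughput(n, t_total_s):
    """Eq. (18): predictions per second."""
    return n / t_total_s


def normalized_memory(memory_used_mb, model_size_mb):
    """Eq. (19): memory used relative to the model size."""
    return memory_used_mb / model_size_mb


# 1,000 single predictions taking 3.8 s in total (illustrative timing).
L = latency_ms(3.8, 1000)     # 3.8 ms per prediction
Th = throughput(1000, 3.8)    # about 263 predictions per second, i.e. 1000 / L
```

For example, a latency of 3.8 ms corresponds to about 263 predictions per second, which agrees with the laptop-CPU figures reported for Table 7.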

Table 7 presents the computational performance across hardware platforms. On a workstation GPU, the framework achieves the lowest latency of 0.9 ms with over 1000 predictions per second, making it suitable for large-scale batch analysis. On a laptop CPU, the system maintains acceptable real-time performance with 3.8 ms latency and 263 predictions per second. Importantly, when deployed on an edge device with INT8 quantization, latency remains as low as 2.0 ms, while memory usage drops to only 12.6 MB. These results confirm the adaptability of the framework, ensuring efficient operation across high-performance, general-purpose, and resource-constrained platforms.

Table 7. Computational Efficiency Across Hardware Platforms.

Algorithm 4 presents the procedure used to evaluate the computational efficiency of the proposed structural reliability framework across different hardware platforms. For each platform, the trained model $f_{θ}$ is deployed and executed on a fixed number of predictions, and the total evaluation time $T_{total}$ is recorded. From this, the inference latency $L$ is computed as the average time per prediction, while the throughput $Th$ quantifies the number of predictions processed per second, reflecting the model’s real-time applicability. Memory usage $M$ is also measured to assess resource requirements and hardware compatibility, especially for edge or embedded systems with restricted capacity. After collecting these metrics for all platforms in the set $H$, the algorithm compares their performance characteristics to produce a consolidated efficiency report. This procedure ensures a systematic and reproducible assessment of latency, throughput, and memory consumption, enabling a comprehensive evaluation of the framework’s deployability across diverse computational environments.

Algorithm 4 Computational Efficiency Assessment Procedure

Require: Trained model $f_{θ}$, dataset $D$, hardware platforms $H$
Ensure: Efficiency metrics $(L, Th, M)$
1: for each platform $h \in H$ do
2:     Load model $f_{θ}$ onto platform $h$
3:     Record total evaluation time $T_{total}$ for $N$ predictions
4:     Compute inference latency $L = T_{total} / N$
5:     Compute throughput $Th = N / T_{total}$
6:     Measure memory usage $M$
7:     Store results for platform $h$
8: Compare efficiency metrics across platforms
9: return performance report $(L, Th, M)$ for each platform
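Algorithm 4 can be sketched with Python’s standard `time.perf_counter` and `tracemalloc` modules. Here each platform is represented by a hypothetical callable wrapping the deployed model; `tracemalloc` only tracks Python heap allocations, so it is a stand-in for device-level measurements such as the nvidia-smi readings mentioned in Section 4, not a replacement for them.

```python
import time
import tracemalloc


def benchmark(platforms, inputs):
    """Algorithm 4 sketch: time N single predictions per platform and record peak memory.

    platforms maps a platform name to a predict(x) callable (hypothetical wrappers
    around the deployed model); tracemalloc sees Python allocations only.
    """
    report = {}
    n = len(inputs)
    for name, predict in platforms.items():
        start = time.perf_counter()
        for x in inputs:                      # one prediction at a time, as in Eq. (17)
            predict(x)
        t_total = time.perf_counter() - start
        tracemalloc.start()                   # separate pass so tracing does not skew timing
        for x in inputs:
            predict(x)
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        report[name] = {
            "latency_ms": 1000.0 * t_total / n,   # Eq. (17)
            "throughput": n / t_total,            # Eq. (18), predictions per second
            "peak_mem_mb": peak_bytes / 2**20,
        }
    return report
```

A real deployment would replace the callables with the GPU, CPU, and edge runtimes and read memory from the device itself.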

4. Discussion, Results, and Comparison

In this section, we present the results of our proposed framework, focusing on how well it performs in both prediction and practical application. The evaluation covers several key aspects: the model’s ability to reliably separate safe and unsafe designs, its accuracy in predicting important reliability measures like probability of failure, reliability index, and ultimate capacity, and its consistency with established engineering codes. We also include cross-validation tests to show robustness, hyperparameter tuning experiments to highlight optimization, and hardware benchmarking to demonstrate real-time feasibility. Together, these results give a complete picture of the framework’s accuracy, reliability, and practicality for use in structural engineering.

Table 8 shows that the proposed AI-assisted design optimization framework is robust, with nearly perfect overall classification, regression, and efficiency performance. The classification results of 99.91% accuracy, 99.92% precision, 99.91% recall, and a 99.91% F1-score show that the proposed approach consistently discriminates between safe and unsafe structural states. The AUC value of 0.9991 further indicates very strong discriminative ability, so the model makes reliable decisions across different decision thresholds. The high precision and recall values show that both false positives and false negatives occur at very low rates, which is crucial in structural engineering, where an incorrect prediction can have safety or cost implications.

Table 8. Overall Test-Set Performance of the AI-Assisted Structural Optimization Framework.

Beyond classification, the regression metrics demonstrate the system’s ability to provide precise numerical estimates of structural safety. The mean absolute error ($MAE(P_{f}) = 0.0063$) and root mean squared error ($RMSE(P_{f}) = 0.0115$) of the predicted probability of failure show that the model predicts failure probability with minimal deviation from the ground-truth values. The coefficient of determination ($R^{2}(P_{f}) = 0.9978$) indicates a very good fit, showing that the predicted values closely match the true failure probabilities. Moreover, the mean absolute error in estimating the reliability index is only 0.047, which implies that the framework can offer engineers the reliable safety margins required for meeting international standards (Eurocode and AISC).

The results also demonstrate how reliably the framework predicts ultimate load capacity. Compared with test results across various steel grades and geometries, the predicted capacity values achieve a mean absolute percentage error (MAPE) of 1.92%, below the 2% level. This accuracy indicates that the model captures overall structural behavior and response to changing loads by combining engineered indicators with raw material properties. With such accuracy, the framework provides reliable reference outputs for structural optimization and real-time inspection, making it a useful decision-making tool in both mechanical and civil engineering.

The computational efficiency of the proposed method is also notable. With a small inference memory footprint of 25.1 MB (measured with nvidia-smi at FP32 precision) and an inference time of 2.3 ms per prediction, the framework is lightweight enough for real-world deployments such as edge devices with constrained resources. In contrast to traditional finite element models, which are computationally costly and time-consuming, the proposed approach is both fast and accurate, making it suitable for continuous structural health monitoring. By balancing computational cost, physical interpretability, and predictive effectiveness, the framework offers a powerful tool for AI-based structural optimization of steel frameworks.

Table 9 reports these results for each class and shows that the AI-based optimization framework performs nearly flawlessly in both safe and unsafe structural scenarios. On the safe class, the model achieved 99.92% precision, 99.90% recall, and a 99.91% F1-score. The high safe-class precision means that a design labeled safe is almost never actually unsafe, so unsafe structures are rarely mistaken for safe ones, while the high recall means that few safe designs are needlessly flagged. These results, obtained on 20,000 safe test samples, confirm the effectiveness of the preprocessing and feature engineering pipeline, which extracts information from the material and geometric properties of the steel members.

Table 9. Per-Class Classification Metrics (Safe vs. Unsafe).

For the unsafe class, the system achieved a precision of 99.91% and a recall of 99.92%, resulting in an F1-score of 99.91%. This result confirms the system’s ability to detect structurally deficient designs with few false negatives. In engineering practice, false negatives (identifying an unsafe design as safe) are the most hazardous errors. The very low false negative rates observed here show that the model remains sensitive, so unsafe conditions are reported and brought to attention for remediation.

The macro-averaged performance measures (all 99.91–99.92%) also show that the classifier is balanced across both classes. This balance means that the model does not favor either class, avoiding oversensitivity in prediction and ensuring fair treatment of both safe and unsafe cases. The civil–mechanical integration, together with physics-informed indicators such as the stress ratio and strain energy density [10–15], directly contributes to this stability by encoding domain knowledge in the learning process.

The per-class metrics validate the scalability and generalizability of the proposed framework across large-scale datasets. With more than 40,000 samples assessed and near-perfect precision, recall, and F1-scores across both categories, the framework proves to be a highly dependable tool for structural optimization. These results not only demonstrate strong statistical performance but also reinforce the system’s suitability for deployment in real-world applications, where dependable classification of safe and unsafe states is essential for ensuring code compliance and structural reliability.

The confusion matrix in Table 10 further details the classification performance of the AI-assisted framework on 40,000 test samples. It correctly identified 19,986 of the 20,000 safe perturbed states and misclassified only 14 as unsafe. This low false positive rate (0.07%) indicates that the approach avoids over-penalizing safe designs and rarely labels reliable structures as dangerous. This matters in practical engineering applications because such Type I errors can lead to overdesign and hence reduce the cost efficiency of new construction.

Table 10. Confusion Matrix on the Test Set (40,000 samples).

Of the 20,000 structurally unsafe states in the test set, the model correctly identified 19,982 as unsafe and misclassified only 18 as safe. This corresponds to a false negative rate of only 0.09%, which is particularly important in safety-critical applications, where undetected unsafe states may lead to catastrophic failures. The small false negative rate reflects the model’s high sensitivity, so dangerous conditions are effectively captured before they compromise structural integrity, reducing undetected risks in practice.

The strong diagonal dominance of the confusion matrix reflects excellent model balance. With 39,968 correct classifications out of 40,000 samples, the framework attains an overall accuracy of 99.92%. Importantly, the small and remarkably balanced numbers of false positives and false negatives show that the framework has negligible class bias and performs similarly on both classes. This balance can be attributed to feature engineering that integrates material properties with geometric descriptors and physics-based reliability indicators, providing sufficient representation of both safe and unsafe behaviors and the interactions between them.

The confusion matrix results illustrate that the proposed AI-assisted optimization framework is robust, stable, and feasible for real-world applications. The extremely low misclassification rates further demonstrate the model’s potential for engineering practice, where safety and efficiency are equally important. By balancing sensitivity to unsafe conditions with fewer false alarms for safe designs, the approach offers a reliable decision support system for structural optimization, design verification, and continuous monitoring in civil and mechanical steel engineering.

The regression results in Table 11 show the high accuracy of the structural reliability and capacity metrics predicted by the proposed AI-based optimization framework. For the probability of failure ($P_{f}$), the mean absolute error (MAE) and root mean squared error (RMSE) are 0.0063 and 0.0115, respectively, and the coefficient of determination is $R^{2} = 0.9978$. These findings confirm the model’s ability to estimate failure probability without major error, and thus to provide accurate estimates even for rare failure events. This is important in structural safety evaluation, where small mis-estimates of $P_{f}$ may be amplified into large risks under code-based compliance checks.

Table 11. Regression Performance for Reliability and Capacity Targets.

The regression results validate the model’s robustness and dependability in structural applications. The reliability index ($β$) predictions yielded MAE = 0.0470, RMSE = 0.0847, and $R^{2} = 0.9969$, indicating near-perfect agreement between estimated and actual safety indices. This accuracy supports reliable use in design verification and optimization, since $β$ directly reflects the safety margins in codes such as Eurocode and AISC. Likewise, the ultimate capacity estimates achieved MAE = 13.1 kN, RMSE = 22.5 kN, and $R^{2} = 0.9973$, demonstrating scalability across a range of steel grades and geometries. Because it integrates material properties, geometric descriptors, and engineered indicators to capture nonlinear load–response interactions, the framework is robust and reliable for capacity-based design and structural safety assessment.

Taken together, the regression performance establishes the framework as a dual-purpose tool capable of both classification and precise quantitative estimation. Its ability to accurately predict $P_{f}$, $β$, and ultimate capacity ensures that engineers can move beyond binary safe/unsafe outcomes toward detailed reliability-informed decision-making. This comprehensive predictive capability makes the proposed system well-suited for real-world structural optimization and monitoring, where precision, code compliance, and interpretability are all critical requirements.

The compliance results in Table 12 demonstrate the exceptional alignment of the proposed AI-assisted optimization framework with established structural code requirements. The reliability index ($β$), which is one of the most critical safety indicators, exceeded the threshold of $β \geq 3.5$ in 99.87% of cases, with a mean of 3.85 and a narrow confidence interval [3.76, 3.94]. This indicates that almost all predicted structural states maintain an adequate safety margin, ensuring robustness and consistency across diverse steel grades and geometries. The tight confidence interval confirms the stability of predictions, highlighting that the framework produces reliable outputs with minimal variability under different conditions.

Table 12. Reliability Compliance Against Code-Oriented Thresholds.

Equally significant are the results for the probability of failure ($P_{f}$). The framework achieved compliance in 99.85% of cases, maintaining predicted values below the threshold of $10^{-3}$, with a mean of $7.4 \times 10^{-4}$. This level of performance illustrates the framework’s ability to accurately capture and constrain low-probability events, which are often the most critical in structural safety assessments. The reported confidence interval $[6.1, 8.8] \times 10^{-4}$ demonstrates that the predictions remain tightly bound within safe limits, offering assurance that the system not only achieves statistical accuracy but also supports risk-informed engineering decision-making.

The demand-to-capacity ratio (DCR) results further reinforce the framework’s reliability and practical compliance. With a compliance rate of 99.93%, mean DCR of 0.85, and confidence interval [0.81, 0.88], the results show that nearly all structural states remain well below the failure threshold of unity. This suggests that the designs predicted by the AI model not only satisfy code requirements but also retain sufficient reserve strength under applied demands. Maintaining a DCR below 1.0 across almost the entire dataset confirms that the system effectively integrates both material strength and geometric characteristics into safe, optimized outcomes.

Taken together, these compliance results illustrate the dual strengths of the framework: statistical precision and engineering validity. The extremely high compliance rates across all three indicators ($\beta$, $P_{f}$, and DCR) show that the framework is not only capable of near-perfect predictive performance but also remains consistent with the code-oriented thresholds of international design codes such as Eurocode and AISC. This dual validation is crucial for bridging the gap between AI predictions and real-world structural practice, confirming that the system can be confidently deployed for optimization, design verification, and real-time monitoring of steel frameworks in safety-critical applications.
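Each compliance result above combines three quantities per indicator: the share of samples meeting the threshold, the mean, and a confidence interval. The section does not state how the intervals were computed, so the sketch below assumes a normal-approximation 95% interval for the mean ($\bar{x} \pm 1.96\, s/\sqrt{n}$); the function name and its arguments are illustrative only.

```python
from math import sqrt
from statistics import fmean, stdev
from typing import Sequence


def compliance_summary(values: Sequence[float], threshold: float, upper: bool,
                       z: float = 1.96) -> dict:
    """Compliance rate (%), mean, and normal-approximation CI of the mean.

    upper=True  -> requirement is value <= threshold (e.g. P_f, DCR)
    upper=False -> requirement is value >= threshold (e.g. beta)
    """
    if len(values) < 2:
        raise ValueError("need at least two samples")
    compliant = [v <= threshold if upper else v >= threshold for v in values]
    mean = fmean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return {
        "compliance_rate": 100.0 * sum(compliant) / len(values),
        "mean": mean,
        "ci": (mean - half_width, mean + half_width),
    }
```

For example, `compliance_summary(predicted_dcr, 1.0, upper=True)` would report the DCR ≤ 1.0 compliance rate, where `predicted_dcr` is a hypothetical list of model outputs. For strongly skewed quantities such as $P_{f}$, a bootstrap interval is a common alternative to the normal approximation.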

The five-fold cross-validation results in Table 13 demonstrate the exceptional robustness and stability of the proposed AI-assisted framework for structural optimization. Each fold achieves accuracy above 99.88%, with precision, recall, and F1-scores tightly clustered in the range of 99.88–99.94%. These consistently high results confirm that the framework generalizes well across different partitions of the dataset, avoiding bias toward specific subsets of steel grades or geometric configurations. The uniformly strong AUC values, all above 0.9989, further validate that the classifier maintains excellent discriminative capability across varying decision thresholds.

Table 13. Five-Fold Cross-Validation Results (Classification Task).

The nearly identical performance metrics across folds demonstrate how dependable the framework’s feature engineering and preprocessing techniques are. By combining geometric descriptors, engineered reliability indicators, and raw mechanical properties, the model limits overfitting while capturing crucial structural behavior. This explains why there is very little variation in performance between folds, suggesting that the model learns general structural relationships instead of memorizing particular samples. Because these results are so consistent, the system can be used in real-world scenarios where precise classification of unseen data from various steel structures is required.

Cross-validation results show that the proposed AI framework is robust, with mean Accuracy, Precision, Recall, and F1-score values of 99.91% and low variability (standard deviation of 0.02%). Such consistency shows that predictions hold up well across different data splits, which is crucial for structural safety because even small variations can have serious safety or financial repercussions. The nearly flawless and consistent classification performance across folds supports the effectiveness of the framework’s regularization and its ability to generalize. In real-world engineering contexts, these results validate the method as accurate and useful, making it a reliable tool for structural optimization, design verification, and ongoing structural health monitoring.
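Five-fold cross-validation for a safe/unsafe classifier is usually stratified, so that every fold keeps the overall class ratio. The section does not say whether stratification was used; the pure-Python sketch below assumes it, splits sample indices into folds, and aggregates per-fold metrics into the mean and sample standard deviation reported above. Model training and evaluation are omitted, and the function names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import fmean, stdev
from typing import Dict, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k disjoint folds that preserve class proportions."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[(offset + position) % k].append(idx)
        offset += len(indices)
    return folds


def summarize_folds(fold_scores: Sequence[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of every metric across folds."""
    return {
        metric: (fmean([s[metric] for s in fold_scores]),
                 stdev([s[metric] for s in fold_scores]))
        for metric in fold_scores[0]
    }


# Usage outline: each fold serves once as the test set, the rest as training data.
# for test_idx in stratified_kfold(y, k=5):
#     test_set = set(test_idx)
#     train_idx = [i for i in range(len(y)) if i not in test_set]
#     ... train on train_idx, evaluate on test_idx, collect a dict of metrics ...
```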

Figure 2. Performance Metrics and Inference Efficiency across Hyperparameter Optimization Experiments. The figure illustrates accuracy, F1-score, and AUC alongside RMSE and latency for experiments E1–E5, demonstrating that configuration E3 achieves the best trade-off between predictive accuracy and computational efficiency.

The hyperparameter optimization experiments presented in Table 14 show how sensitive the AI-assisted method is to its hyperparameters and why Bayesian optimization was used to reach the best-performing configuration. Although all experiments achieved very high performance, with accuracies above 99.86% and AUC values above 0.9986, finer differences emerge when RMSE($P_{f}$) and inference latency are examined. These findings demonstrate that hyperparameter tuning is not just a technicality: it governs the trade-off between prediction accuracy, reliability, and computational resources.

Table 14. Hyperparameter Optimization Experiments and Outcomes.

Of all the experiments, E3 yielded the best results, with a learning rate of $2.0 \times 10^{-4}$, a batch size of 128, 512 hidden units, and a dropout ratio of 0.20. This configuration achieved an accuracy and F1-score of 99.92% and an AUC of 0.9992, together with the lowest RMSE($P_{f}$) of 0.011, while keeping the inference latency as low as 2.2 ms. These findings highlight that balancing accuracy and computational efficiency requires careful tuning toward a moderate learning rate, adequate model capacity, and appropriate regularization. The fact that E3 combines the best classification accuracy with the lowest $P_{f}$ error suggests that the chosen hyperparameters improve not only learning but also the reliability estimates on which code compliance depends.

Experiments with smaller hidden layers or higher learning rates (E1 and E5) performed marginally worse, although their results remained very high in absolute terms. These differences point to two likely causes: excessively aggressive learning rates, which can hinder convergence, and an unsuitable number of hidden units (too few or too many), which can affect the model’s capacity to capture nonlinear dependencies among geometric, material, and load-response features. The intermediate configurations E2 and E4, compared with the best configuration, further support our view that better performance results from a proper trade-off between model complexity, stability, and generalization capability.

Another encouraging finding is that latency was similar across experiments (2.2–2.6 ms), demonstrating that hyperparameter optimization can improve predictive performance without compromising computational viability. All configurations remain fast enough for real-time structural monitoring and design verification, and although the tuning process itself is computationally demanding, it shows that data-driven models of mechanical behavior can still be improved without a latency penalty at deployment. Overall, the hyperparameter experiments show that Bayesian optimization can find configurations that achieve the smallest prediction error and the best reported statistical accuracy while retaining practical deployment efficiency, making the framework both powerful and operationally scalable.
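Bayesian optimization replaces exhaustive search by fitting a probabilistic surrogate to the configurations evaluated so far and choosing the next one by maximizing an acquisition function. The section does not specify the surrogate, kernel, or acquisition function used, so the minimal pure-Python sketch below rests on assumptions: a Gaussian-process surrogate with a squared-exponential kernel, expected improvement as the acquisition function, a single tuned hyperparameter ($\log_{10}$ of the learning rate) on a grid, and a hypothetical quadratic toy loss standing in for an actual training-and-validation run.

```python
import math
from statistics import NormalDist, fmean
from typing import Callable, List, Sequence, Tuple

STANDARD_NORMAL = NormalDist()


def rbf(a: float, b: float, length: float = 0.5) -> float:
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs: List[float], ys: List[float],
           noise: float = 1e-6) -> Callable[[float], Tuple[float, float]]:
    """GP surrogate; returns a function giving the posterior mean and std at x."""
    y_mean = fmean(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = cho_solve(L, [y - y_mean for y in ys])

    def predict(x: float) -> Tuple[float, float]:
        k_star = [rbf(x, a) for a in xs]
        mean = y_mean + sum(k * w for k, w in zip(k_star, alpha))
        v = cho_solve(L, k_star)
        var = max(1.0 - sum(k * w for k, w in zip(k_star, v)), 1e-12)
        return mean, math.sqrt(var)

    return predict


def expected_improvement(mean: float, std: float, best: float, xi: float = 0.01) -> float:
    """Expected improvement for minimisation of the objective."""
    improvement = best - mean - xi
    z = improvement / std
    return improvement * STANDARD_NORMAL.cdf(z) + std * STANDARD_NORMAL.pdf(z)


def bayes_opt(objective: Callable[[float], float], grid: Sequence[float],
              init: Sequence[float], n_iter: int = 10) -> Tuple[float, float]:
    """Sequentially evaluate the grid point with the highest expected improvement."""
    xs, ys = list(init), [objective(x) for x in init]
    for _ in range(n_iter):
        predict = fit_gp(xs, ys)
        best = min(ys)
        candidates = [x for x in grid if x not in xs]
        x_next = max(candidates, key=lambda x: expected_improvement(*predict(x), best))
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


if __name__ == "__main__":
    # Hypothetical stand-in for "train the model, return validation loss" at log10(lr)
    toy_loss = lambda log_lr: (log_lr + 3.7) ** 2
    grid = [round(-5.0 + 0.05 * i, 2) for i in range(61)]  # log10(lr) from -5 to -2
    best_log_lr, best_loss = bayes_opt(toy_loss, grid, init=[-5.0, -4.0, -2.0])
    print(f"best learning rate ~ {10 ** best_log_lr:.1e} (toy loss {best_loss:.4f})")
```

A study of this kind would tune batch size, width, and dropout jointly and would normally rely on an established optimization library rather than hand-written linear algebra; the sketch only makes the surrogate-plus-acquisition loop explicit.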

The ablation study results in Table 15 clearly highlight the progressive contribution of different feature groups to the overall performance of the AI-assisted structural optimization framework. When the model relies solely on material properties (A1), such as yield strength and tensile strength, it achieves an accuracy and F1-score of 98.72%. While this demonstrates that basic material descriptors are strong predictors of structural behavior, the relatively lower performance indicates that material properties alone cannot fully capture the complexity of load-bearing capacity, especially under varying geometric and dynamic conditions.

Table 15. Ablation Study: Contribution of Feature Groups.

Adding geometric descriptors in configuration A2 raises the accuracy to 99.19%, underscoring the crucial role of cross-sectional dimensions, slenderness ratios, and structural form in determining stiffness and stability. This improvement highlights the importance of civil engineering insights in complementing material data, showing that structural performance cannot be evaluated independently of geometry. The increase of 0.47 percentage points over A1 reflects the added predictive strength gained from considering geometric variability across different steel frameworks.

In configuration A3, where load-response features such as stress–strain histories and fatigue behavior are integrated, performance further increases to 99.58%. This additional gain of 0.39 percentage points demonstrates the importance of incorporating mechanical engineering insights, as dynamic and cyclic response characteristics provide critical information about nonlinearities, energy dissipation, and progressive damage accumulation. The integration of load-response data ensures that the framework accounts not just for static capacity but also for time-dependent and real-world performance conditions of steel structures.

The best performance is achieved in configuration A4, where all feature groups, including reliability indicators such as stress ratio, strain energy density, and demand-to-capacity ratios, are combined. Accuracy and F1-score reach 99.92%, a further 0.34 percentage points over A3, confirming that physics-informed indicators are essential for bridging raw data with code-compliant safety margins. This progression from A1 through A4 validates the hybrid feature engineering strategy, demonstrating that optimal structural prediction emerges from integrating civil, mechanical, and reliability-based features. The near-perfect results in A4 emphasize that combining data-driven learning with engineered safety metrics yields a framework that is both statistically robust and practically aligned with engineering standards.
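Because accuracies near 100% compress differences, it can help to restate the ablation in terms of error rate (100% minus accuracy). The arithmetic below uses only the accuracies quoted in the text for A1 to A4. The step gains are 0.47, 0.39, and 0.34 percentage points, while the error rate falls from 1.28% to 0.08%, a sixteen-fold reduction. The dictionary and function names are illustrative.

```python
from typing import Dict, Tuple

# Accuracy (%) of the ablation configurations as quoted in the text (Table 15)
ABLATION_ACCURACY: Dict[str, float] = {"A1": 98.72, "A2": 99.19, "A3": 99.58, "A4": 99.92}


def ablation_summary(acc: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Percentage-point gain of each step and the overall error-rate reduction factor."""
    names = list(acc)
    gains = {f"{a}->{b}": acc[b] - acc[a] for a, b in zip(names, names[1:])}
    first_error = 100.0 - acc[names[0]]
    last_error = 100.0 - acc[names[-1]]
    return gains, first_error / last_error


if __name__ == "__main__":
    gains, factor = ablation_summary(ABLATION_ACCURACY)
    for step, gain in gains.items():
        print(f"{step}: +{gain:.2f} pp")                  # +0.47, +0.39, +0.34
    print(f"error-rate reduction A1 -> A4: {factor:.0f}x")  # 16x
```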

The results in Table 16 provide compelling evidence that the proposed AI-enhanced framework is not only accurate but also computationally efficient across different deployment environments. On a workstation GPU using FP16 precision, the framework achieves the lowest latency of 0.9 ms per prediction and an impressive throughput of 1120 predictions per second, with a modest memory footprint of 18.4 MB. This performance indicates that the system is well-suited for high-volume batch processing in research and design offices where large numbers of structural evaluations must be performed quickly and efficiently.

Table 16. Computational Efficiency Across Inference Hardware.

When evaluated on a laptop CPU in FP32 precision, the latency rises to 3.9 ms per prediction and throughput decreases to 256 predictions per second. Although this is slower compared to GPU execution, it still satisfies real-time performance requirements for most engineering workflows. The memory footprint of 25.1 MB also demonstrates that the model remains lightweight enough to run effectively on general-purpose machines without requiring specialized hardware, making the system widely accessible to practicing engineers and researchers.
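When predictions are processed one at a time, throughput and latency are reciprocal: predictions per second ≈ 1000 / (latency in ms). This quick check assumes sequential single-sample inference, since the benchmark protocol is not described here, and shows that the GPU and CPU figures are mutually consistent within the rounding of the reported latencies. The function name is illustrative.

```python
def single_stream_throughput(latency_ms: float) -> float:
    """Predictions per second when each prediction runs sequentially."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # (device, reported latency in ms, reported throughput per s) as quoted in the text
    for device, latency_ms, reported in [("GPU FP16", 0.9, 1120), ("CPU FP32", 3.9, 256)]:
        estimate = single_stream_throughput(latency_ms)
        print(f"{device}: 1000 / {latency_ms} ms = {estimate:.0f}/s, reported {reported}/s")
```

A latency of 0.9 ms rounded to one decimal place spans roughly 0.85 to 0.95 ms, i.e. about 1050 to 1180 predictions per second, which contains the reported 1120. Batched inference would break this reciprocal relation, because several predictions then share one latency interval.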

The results on an edge SoC using INT8 quantization-aware training are particularly noteworthy. With a latency of 2.1 ms, throughput of 490 predictions per second, and memory usage of only 12.4 MB, the framework proves its adaptability for deployment in resource-constrained environments. This shows that the model can be embedded into IoT-based monitoring systems or smart infrastructure devices, enabling real-time, on-site structural safety evaluation without the need for cloud or server-based resources.
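Quantization-aware training inserts quantize-dequantize ("fake quantization") operations into the forward pass so that the network learns weights that tolerate INT8 rounding. The section does not describe the quantization scheme. The sketch below therefore assumes symmetric per-tensor scaling to the integer range [-127, 127] and shows only the forward-pass operation. It omits the straight-through gradient estimator that QAT toolchains use during backpropagation.

```python
from typing import List, Tuple


def int8_symmetric_quantize(weights: List[float]) -> Tuple[List[int], float]:
    """Per-tensor symmetric INT8 quantization: w ~= scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def fake_quantize(weights: List[float]) -> List[float]:
    """Quantize-dequantize pass, so training sees the rounding error of INT8 inference."""
    q, scale = int8_symmetric_quantize(weights)
    return [scale * v for v in q]
```

Stored as INT8, each weight occupies one byte instead of four in FP32, and the rounding error per weight is at most half the scale.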

Taken together, these results highlight the scalability and practicality of the proposed AI-assisted optimization framework. By delivering high accuracy while maintaining efficiency across GPUs, CPUs, and edge devices, the system demonstrates its readiness for real-world applications ranging from centralized design verification to distributed real-time monitoring. The balance between speed, memory efficiency, and predictive performance ensures that the framework is both robust and versatile, addressing the diverse computational environments in modern civil and mechanical engineering practice.

Beyond the predictive accuracy, code compliance, and computational performance summarized above, the framework points to broader engineering applications. It can be applied to large-scale and complex structural systems such as bridges, high-rise buildings, and smart infrastructure networks. These scenarios benefit from the framework’s real-time monitoring capability, lightweight architecture, and dual-task learning mechanism, which allow continuous assessment of safety and reliability under variable loading and environmental conditions. Its adaptability to diverse civil and mechanical engineering contexts indicates that the framework is not only a high-performing predictive tool but also a scalable solution with clear potential for integration into future digital twin systems, intelligent maintenance platforms, and next-generation smart city infrastructure.

The results in Figure 3 demonstrate the effectiveness of the proposed framework in every evaluation domain. First, examining the classification results (a–c), the model achieves an almost perfect accuracy of 99.92%, with very low rates of false positives and false negatives. The per-class metrics all stay above 99.9%, indicating that safe and unsafe structural states are identified with equal reliability. This balance is crucial because it shows that the framework does not favor one class over the other, so safe structures are not unduly penalized and unsafe designs are rarely missed.
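Per-class precision, recall, and F1 follow directly from the four confusion-matrix counts, with the roles of the counts swapped for the second class. The sketch below assumes that "unsafe" is the positive class, since the section does not state which class is treated as positive; the function names are illustrative.

```python
from typing import Dict


def _prf(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall and F1 for one class, guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def binary_report(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy and per-class metrics, with 'unsafe' as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "unsafe": _prf(tp, fp, fn),
        # For the 'safe' class, true negatives are its hits and the two error types swap
        "safe": _prf(tn, fn, fp),
    }
```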

Figure 3. Results dashboard for the AI-Assisted Structural Optimization of Steel Frameworks. (a) Overall classification metrics. (b) Per-class Precision/Recall/F1. (c) Confusion matrix with counts and percentages. (d) Regression performance for $P_{f}$, $\beta$, and capacity (Cap.). (e) Code-oriented compliance rates for $\beta \geq 3.5$, $P_{f} \leq 10^{-3}$, and $\mathrm{DCR} \leq 1.0$. (f) Five-fold cross-validation metrics with AUC. (g) Hyperparameter study (bubble chart: size = latency, color = RMSE($P_{f}$)). (h) Computational efficiency across hardware devices.

The regression results (d) build on this strength by showing that the system can accurately predict quantitative measures such as the probability of failure, reliability index, and ultimate capacity. The predictions align closely with ground truth, with $R^{2}$ consistently near 1.0, indicating that the model captures complex structural behavior. More importantly, the compliance rates in (e) confirm that predictions meet established safety thresholds in nearly every case, with reliability index $\beta \geq 3.5$, failure probability $P_{f} \leq 10^{-3}$, and demand-to-capacity ratio $\mathrm{DCR} \leq 1.0$. These findings, reinforced by the five-fold cross-validation in (f), highlight the framework’s consistency across different data splits, making it robust for practical engineering use.

Figure 3g,h illustrate how the system balances predictive accuracy with computational efficiency. The hyperparameter study demonstrates the importance of careful tuning, as optimal configurations deliver higher accuracy with lower error and latency. The hardware benchmarking results then confirm that the model can run efficiently on GPUs, CPUs, and even resource-limited edge devices, with prediction times of just a few milliseconds and small memory requirements. This combination of accuracy, safety compliance, and speed makes the framework both powerful and practical, offering engineers a dependable tool for real-time monitoring and structural optimization.

5. Comparison with Related Works

In order to situate our proposed framework within the broader research landscape, we critically compare it with recent state-of-the-art studies that address structural reliability prediction, damage identification, and hybrid machine learning approaches. Table 17 summarizes the methodologies, datasets, evaluation metrics, and key limitations of existing works, while also highlighting how our study addresses the identified gaps. Although prior research has made significant contributions in applying hybrid AI techniques for tasks such as shear strength prediction, damage classification, and capacity estimation, most studies remain constrained to either classification or regression, neglecting reliability-oriented measures such as probability of failure, reliability index, or ultimate capacity. Furthermore, few works embed physics-informed constraints or validate deployment on resource-constrained environments. By integrating classification and regression within a dual-head hybrid framework, embedding physics-informed penalties, and validating quantized deployment for edge feasibility, our approach advances beyond existing efforts and provides a more comprehensive and practically deployable solution for structural reliability assessment.
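A dual-head design with physics-informed penalties can be pictured as a single training loss with three parts. The exact formulation is given in the methodology and is not restated in this section, so the sketch below is a hypothetical illustration rather than the authors' loss. It combines binary cross-entropy for the safe/unsafe head, squared error for the reliability regressions (on $\log_{10} P_{f}$, because $P_{f}$ spans several orders of magnitude), and a penalty, weighted by an assumed parameter `lam`, that pushes the predicted $P_{f}$ toward the first-order relation $P_{f} = \Phi(-\beta)$.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def dual_head_loss(p_unsafe: float, y_unsafe: int,
                   beta_hat: float, beta_true: float,
                   pf_hat: float, pf_true: float,
                   lam: float = 1.0, eps: float = 1e-12) -> float:
    """Hypothetical per-sample loss: classification + regression + physics penalty."""
    # Classification head: binary cross-entropy on the predicted probability of 'unsafe'
    p = min(max(p_unsafe, eps), 1.0 - eps)
    bce = -(y_unsafe * math.log(p) + (1 - y_unsafe) * math.log(1.0 - p))

    # Regression head: squared error on beta and on log10(P_f)
    log_pf_hat = math.log10(max(pf_hat, eps))
    log_pf_true = math.log10(max(pf_true, eps))
    mse = (beta_hat - beta_true) ** 2 + (log_pf_hat - log_pf_true) ** 2

    # Physics-informed penalty: the two regression outputs should satisfy P_f = Phi(-beta)
    log_pf_implied = math.log10(max(STANDARD_NORMAL.cdf(-beta_hat), eps))
    physics = (log_pf_hat - log_pf_implied) ** 2

    return bce + mse + lam * physics
```

A prediction whose $\beta$ and $P_{f}$ agree with each other incurs no physics penalty, whereas a $P_{f}$ one decade away from $\Phi(-\hat{\beta})$ adds exactly `lam` to the loss.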

Table 17. Comparison of Related Studies with Our Proposed Work.

The study by Lei et al. [22] highlights the benefit of metaheuristic-optimized neural networks, where SMA helped the MLP avoid local minima during training and improved predictive accuracy for the shear strength of rock joints. However, the framework remained limited to regression-only modeling without addressing probability of failure, reliability indices, or ultimate capacity. Moreover, it lacked multi-task learning capability, operating purely as a single regression model without classification or reliability estimation. No real-time or edge-level performance evaluation was reported, and compliance with engineering code requirements was not considered. Additionally, the SMA–MLP architecture lacked physical constraint embedding or any mechanism to ensure equilibrium consistency and mechanical compliance during learning, and no validation was performed for quantized or low-latency deployment. In contrast, our proposed framework integrates both classification and regression outputs under a unified multi-task architecture, embedding physics-informed constraints within the loss structure, ensuring engineering-code compliance, and verifying inference stability through edge deployment benchmarking.

The work of He et al. [23] demonstrated the value of time–frequency decomposition and correlation-based feature selection when combined with CNN for structural damage classification. The method achieved high accuracy but was limited to single-task classification, lacking reliability quantification. It did not implement a multi-task structure or perform joint regression of reliability indices. No latency, throughput, or energy efficiency tests were performed to evaluate real-time readiness, and the model did not incorporate compliance or constraint-based safety verification. Additionally, it lacked physics-informed penalties and equilibrium consistency terms, and no deployment analysis was conducted under resource-constrained conditions. Motivated by this limitation, we propose a dual-head architecture capable of both classification and regression, enabling multi-task learning with physics-guided regularization, verified for real-time edge inference (latency < 3 ms) and compliance with engineering safety codes.

Maryoosh et al. [24] presented a hybrid handcrafted–deep learning approach for bridge crack detection, which achieved near-perfect classification accuracy. However, it focused solely on image-based classification, without regression for structural reliability or capacity prediction. This single-task setup lacked multi-task adaptability and offered no analysis of model behavior in real-time or embedded environments. It also omitted any compliance framework with physical or safety codes. Furthermore, the model did not integrate physics-informed constraint enforcement or edge-level deployment validation, restricting its practical usability in monitoring systems. By contrast, our framework generalizes to multi-task learning that jointly handles reliability regression and classification, embedding code-based limit-state physics constraints and validating edge deployment efficiency through quantization-aware optimization, thereby ensuring real-time, compliant, and interpretable performance.

The hybrid CNN–LSTM framework proposed by Dang et al. [25] effectively combined signal processing and deep neural networks, improving robustness for damage detection even under noise. However, the method remained confined to detection and localization, without addressing multi-task prediction or probability-based reliability assessment. It lacked any multi-task integration for combined classification and regression, and real-time deployability was neither measured nor optimized. Compliance with mechanical or design codes was also not discussed. No physics-informed constraints or quantized edge verifications were implemented, which limited real-world scalability. In contrast, our model extends these capabilities by jointly performing classification and regression under physics-based penalty embedding, achieving verified real-time edge performance, low-latency inference, and compliance with structural safety and reliability standards.

Ly et al. [26] introduced hybrid computational models optimized via genetic and firefly algorithms to predict the ultimate shear capacity of SFRC beams. While accurate, their models were purely regression-based and did not include classification or real-time performance validation. The absence of a multi-task design limited generalization across tasks, and no timing, latency, or hardware efficiency assessments were provided. The framework also lacked physics-informed consistency and code-compliance verification. Similarly, no mechanism was implemented to validate model robustness under quantized or embedded deployment scenarios. Our proposed framework advances this direction by integrating classification and regression in a unified multi-task structure, embedding physics-informed penalties, achieving sub-3 ms edge inference latency, and attaining compliance rates above 99.8% with code-oriented reliability thresholds.

The comparative visualization in Figure 4 summarizes the key advantages. The proposed dual-head framework achieves the highest overall performance, with accuracy near 100%, outperforming prior single-task methods such as Maryoosh et al. (≈99%) and He et al. (≈94%) while maintaining real-time efficiency. Unlike previous studies, our model integrates multi-task learning for classification and regression, validates sub-3 ms inference latency on quantized hardware, and maintains compliance through embedded physics-informed constraints consistent with AISC/Eurocode standards. This demonstrates how the proposed approach combines accuracy, interpretability, real-time operation, and code-based reliability, establishing a comprehensive benchmark for physics-informed multi-task structural learning frameworks.

Figure 4. Coefficient of determination ($R^{2}$) across the studies.

Figure 5 illustrates the coefficient of determination ($R^{2}$) across the studies, which reflects the explanatory power of each model. The proposed framework achieves the highest $R^{2}$ of 0.985, outperforming Lei et al. (0.969) and Ly et al. (0.977). This indicates that the proposed model captures the relationship between structural features and responses with greater fidelity, offering both statistical reliability and physical interpretability. Together, the three figures demonstrate that the proposed work not only surpasses prior methods in accuracy, error reduction, and reliability but also addresses their gaps by embedding physics-informed constraints and validating deployment feasibility for real-time applications.

Figure 5. Coefficient of determination ($R^{2}$).

The error minimization performance is presented in Figure 6, where the RMSE values of related works are contrasted. The proposed model records the lowest RMSE at 0.05, compared with Lei et al. (0.097) and other studies such as He et al. and Dang et al., which remain above 0.1. This result reflects the framework’s low prediction error, achieved through Bayesian hyperparameter optimization and retained under quantization-aware deployment. Reducing RMSE is critical in engineering contexts, as even small improvements in predictive error translate into more accurate estimates of the safety margins and load-bearing capacities of structures.
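For reference, the two metrics compared in Figures 5 and 6 are $\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$ and $R^{2} = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. RMSE carries the units and scale of the target, whereas $R^{2}$ is dimensionless, so RMSE values from studies with different targets are only loosely comparable. A minimal standard-library sketch of both (function names illustrative):

```python
from math import sqrt
from statistics import fmean
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the target."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length and at least two samples")
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target")
    return 1.0 - ss_res / ss_tot
```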

Figure 6. Error minimization performance.

6. Conclusions and Future Work

This study presented an AI-assisted optimization framework that effectively integrates civil and mechanical engineering insights for the structural optimization of steel frameworks. By combining material properties, geometric descriptors, and load-response features within a hybrid deep learning architecture constrained by physics-informed indicators, the framework achieved near-perfect performance across both classification and regression tasks. With classification accuracy reaching 99.91% and regression errors remaining below 2%, the system demonstrates its reliability in predicting safety states, probabilities of failure, and ultimate load-bearing capacities. Furthermore, the inclusion of Bayesian hyperparameter optimization enhanced robustness while maintaining computational efficiency, with inference times as low as 2.3 ms and minimal memory usage, enabling real-time deployment across diverse computational platforms. The exceptionally high compliance rates with code-oriented thresholds confirm that the framework not only achieves statistical precision but also ensures engineering validity and safety.

Several limitations remain that outline opportunities for future investigation. The present study is primarily based on standardized laboratory data (AISC-MPD), which do not capture long-term deterioration mechanisms such as welding residual stresses, corrosion, temperature-induced degradation, or fatigue under cyclic loading. These unmodeled effects can cause domain shifts between laboratory and field environments, potentially affecting generalization accuracy when the framework is deployed on aged or corroded structures. Future research will therefore extend the framework to incorporate environmental and material degradation variables, including corrosion kinetics, weld quality indices, and thermomechanical fatigue parameters. Expanding the dataset to include multi-material structures such as composites or reinforced concrete and validating the model under varying climatic and operational conditions will strengthen its domain robustness. Integrating continual learning strategies and transfer learning techniques will further enable adaptive model updates in response to progressive structural or environmental changes. Addressing these aspects will enhance the framework’s long-term reliability, sustainability, and applicability in real-world structural health monitoring and design optimization.

Author Contributions

Conceptualization, Q.A. and M.Q.A.-J.; methodology, M.A. and M.Q.A.-J.; software, Z.J. and A.A.; validation, Q.A., M.A. and S.A.A.; formal analysis, M.A. and N.H.A.; investigation, Q.A. and R.R.A.; resources, S.A.A. and N.H.A.; data curation, A.A. and R.R.A.; writing—original draft preparation, M.Q.A.-J.; writing—review and editing, Q.A. and M.A.; visualization, Z.J.; supervision, M.Q.A.-J.; project administration, M.Q.A.-J.; funding acquisition, S.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA, for funding this research work through the project number NBU-FFR-2025-2119-xx.

Data Availability Statement

The structural and material data supporting the findings of this study are openly available in the AISC Shapes Database v16.0, provided by the American Institute of Steel Construction (AISC). The dataset can be accessed at: https://www.aisc.org/publications/steel-construction-manual-resources/16th-ed-steel-construction-manual/aisc-shapes-database-v16.0/ (accessed on 17 November 2025). This resource includes standardized geometric, dimensional, and mechanical property data for U.S. customary and metric steel shapes as referenced in the Steel Construction Manual, 16th Edition. No proprietary or confidential data were used in this research.

Acknowledgments

The authors gratefully acknowledge the institutions and departments that supported this research. This work was made possible through the collaborative efforts of Irbid National University, Jadara University, the Technical University of Dortmund, Al-Ahliyya Amman University, Northern Border University, the University of Tabuk, and Princess Nourah Bint Abdulrahman University. The authors also extend their appreciation to the American Institute of Steel Construction (AISC) for providing access to the AISC Material Property Database, which served as a foundational dataset for the structural analysis conducted in this study. Special thanks are given to colleagues and research staff who contributed technical insights, computational resources, and domain expertise that strengthened the development, validation, and analysis of the proposed hybrid framework.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sarfarazi, S.; Mascolo, I.; Modano, M.; Guarracino, F. Application of artificial intelligence to support design and analysis of steel structures. Metals 2025, 15, 408.
2. Reddy, Y.B.S.; Kangda, M.Z.; Farsangi, E.N. Integration of building information modeling and machine learning for predictive maintenance. In Digital Transformation in the Construction Industry; Elsevier: Amsterdam, The Netherlands, 2025; pp. 361–378.
3. Pierli, G. Development of an Integrated Analysis Methodology for Lean Manufacturing and Sustainable Development in the Mechanical Engineering Industry. Ph.D. Thesis, University of Urbino, Urbino, Italy, 2025.
4. Ampratwum, I.; Nayak, A. Hybrid Approach for WDM Network Restoration: Deep Reinforcement Learning and Graph Neural Networks. IEEE Open J. Comput. Soc. 2025, 6, 1012–1026.
5. Hasan, M.M. Machining-induced Residual Stress Modeling Using Physics-Informed Neural Networks. Ph.D. Thesis, University of Kentucky, Lexington, KY, USA, 2025.
6. Perez-Cerrolaza, J.; Abella, J.; Borg, M.; Donzella, C.; Cerquides, J.; Cazorla, F.J.; Englund, C.; Tauber, M.; Nikolakopoulos, G.; Flores, J.L. Artificial intelligence for safety-critical systems in industrial and transportation domains: A survey. ACM Comput. Surv. 2024, 56, 176.
7. Jacobsen, C. Enhancing Physical Modeling with Interpretable Physics-Aware Machine Learning. Ph.D. Thesis, Technical University of Denmark, Lyngby, Denmark, 2024.
8. Zhang, F.; Gao, Y.; Fang, C. Mechanics-Coupled Deep Reinforcement Learning for Automated Design of Internal Supporting Structure in Foundation Pits. Available online: https://ssrn.com/abstract=5667557 (accessed on 17 November 2025).
9. Breish, F.; Hamm, C.; Andresen, S. Nature’s load-bearing design principles and their application in engineering: A review. Biomimetics 2024, 9, 545.
10. Łach, Ł.; Svyetlichnyy, D. Advances in numerical modeling for heat transfer and thermal management: A review of computational approaches and environmental impacts. Energies 2025, 18, 1302.
11. Malik, H.; Brantner, T. Advancing structural health monitoring: Statistical shape modeling and AI-driven reliability estimation. J. Struct. Health Monit. 2025, 14, 112–130.
12. Idoko, J.B.; Ma’aitah, M.K.S.; Alwhelat, A.; Smart, K.; Alwaeli, Z. Evaluation of hyperparameter optimization techniques in deep learning considering accuracy, runtime and computational efficiency metrics. J. Soft Comput. Data Min. 2025, 6, 182–199.
13. Danach, K.; Aly, W.H.F. Adaptive hyperheuristic framework for hyperparameter tuning: A Q-learning-based heuristic selection approach with simulated annealing acceptance criteria. Eur. J. Pure Appl. Math. 2025, 18, 6348.
14. Paulson, J.A.; Tsay, C. Bayesian optimization as a flexible and efficient design framework for sustainable process systems. Curr. Opin. Green Sustain. Chem. 2025, 51, 100983.
15. Martinez, Y.; Rojas, L.; Peña, A.; Valenzuela, M.; Garcia, J. Physics-informed neural networks for the structural analysis and monitoring of railway bridges: A systematic review. Mathematics 2025, 13, 1571.
16. Xu, G.; Guo, T. Advances in AI-powered civil engineering throughout the entire lifecycle. Adv. Struct. Eng. 2025, 28, 1515–1541.
17. Kisina, D.; Akpe, O.E.E.; Ubanadu, B.C.; Daraojimba, A.I.; Gbenle, T.P.; Adanigbo, O.S. Advances in application profiling techniques for performance optimization in resource-constrained environments. Int. J. Future Eng. Innov. 2024, 1, 108–114.
18. Afshari, S.S.; Enayatollahi, F.; Xu, X.; Liang, X. Machine learning-based methods in structural reliability analysis: A review. Reliab. Eng. Syst. Saf. 2022, 219, 108223.
19. Salman, M.R.; Al-Shaikhli, M.; Abbas, H.A.; Ahmad, H.H.; Kudus, S.A. A critical review of deep learning applications, challenges, and future directions in structural engineering. Int. J. Comput. Civ. Struct. Eng. 2025, 21, 146–156.
20. Liang, R.; Liu, W.; Fu, Y.; Ma, M. Physics-informed deep learning for structural dynamics under moving load. Int. J. Mech. Sci. 2024, 284, 109766.
21. Ren, Z.; Zhou, S.; Liu, D.; Liu, Q. Physics-informed neural networks: A review of methodological evolution, theoretical foundations, and interdisciplinary frontiers toward next-generation scientific computing. Appl. Sci. 2025, 15, 8092.
22. Lei, D.; Zhang, Y.; Lu, Z.; Lin, H.; Chen, Y. Hybrid Machine Learning Model for Predicting Shear Strength of Rock Joints. Appl. Sci. 2025, 15, 7097.
23. He, Y.; Huang, Z.; Liu, D.; Zhang, L.; Liu, Y. A novel structural damage identification method using a hybrid deep learning framework. Buildings 2022, 12, 2130.
24. Maryoosh, A.A.; Pashazadeh, S.; Salehpour, P. A Hybrid Learning Framework for Enhancing Bridge Damage Prediction. Appl. Syst. Innov. 2025, 8, 61.
25. Dang, H.V.; Tran-Ngoc, H.; Nguyen, T.V.; Bui-Tien, T.; De Roeck, G.; Nguyen, H.X. Data-driven structural health monitoring using feature fusion and hybrid deep learning. IEEE Trans. Autom. Sci. Eng. 2020, 18, 2087–2103.
Ly, H.-B.; Le, T.-T.; Vu, H.-L.T.; Tran, V.Q.; Le, L.M.; Pham, B.T. Computational hybrid machine learning based prediction of shear capacity for steel fiber reinforced concrete beams. Sustainability 2020, 12, 2709. [Google Scholar] [CrossRef]
American Institute of Steel Construction. AISC Shapes Database v16.0; Version 16.0; Includes U.S. Customary and Metric Units; Replaces v15.0; American Institute of Steel Construction: Chicago, IL, USA, 2025; Available online: https://www.aisc.org/publications/steel-construction-manual-resources/16th-ed-steel-construction-manual/aisc-shapes-database-v16.0/ (accessed on 17 November 2025).
EN 1990:2002; Eurocode 0: Basis of Structural Design. European Committee for Standardization: Brussels, Belgium, 2002.
ISO 2394:2015; General Principles on Reliability for Structures. International Organization for Standardization: Geneva, Switzerland, 2015.
AISC 360-16; Specification for Structural Steel Buildings. American Institute of Steel Construction: Chicago, IL, USA, 2016.

Figure 1. Proposed Methodology.

Figure 2. Performance Metrics and Inference Efficiency across Hyperparameter Optimization Experiments.

Figure 3. Results dashboard for the AI-Assisted Structural Optimization of Steel Frameworks. (a) Overall classification metrics. (b) Per-class Precision/Recall/F1. (c) Confusion matrix with counts and percentages. (d) Regression performance for $P_{f}$, $β$, and capacity (Cap.). (e) Code-oriented compliance rates for $β \geq 3.5$, $P_{f} \leq 10^{-3}$, and $\mathrm{DCR} \leq 1.0$. (f) Five-fold cross-validation metrics with AUC. (g) Hyperparameter study (bubble chart: size = latency, color = $\mathrm{RMSE}(P_{f})$). (h) Computational efficiency across hardware devices.

Figure 4. Coefficient of determination ($R^{2}$) across the studies.

Figure 5. Coefficient of determination ($R^{2}$).

Figure 6. Error minimization performance.

Table 1. Comparison between Physical, Pure AI, and Proposed Hybrid Modeling Approaches in Structural Analysis.

Aspect	Purely Physical Models (FEM, Analytical) [9]	Pure AI Models (DNN, CNN) [18,19]	Proposed Hybrid AI–Physics Framework [20,21]
Core Principle	Governed by material laws, boundary conditions, and equilibrium equations	Learns from data without explicit physics or design rules	Combines data-driven learning with embedded physical constraints and code-based limit states
Accuracy	High for well-defined geometry and loading but deteriorates with complex boundary or nonlinear effects	Dependent on training data; may generalize poorly beyond training domain	Maintains high accuracy (>99%) with improved generalization across varying structural conditions
Computational Cost	High (minutes–hours per simulation for large FEM models)	Low (milliseconds–seconds per prediction)	Moderate (milliseconds; sub-3 ms latency achieved through quantized inference)
Physical Interpretability	Fully interpretable and code-compliant	Limited; may yield non-physical or code-violating predictions	High; enforces safety factors, stress limits, and reliability indices ( $β$ , $P_{f}$ ) during learning
Scalability/Real-Time Feasibility	Poor; computationally intensive for large-scale or online monitoring	High; suitable for real-time monitoring but prone to instability	Excellent; verified edge-deployable performance with low latency and minimal memory footprint
Scope of Application	Static and deterministic analysis of single components	Pattern recognition and regression from historical data	Dynamic structural optimization, reliability prediction, and real-time design validation

Table 2. Summary of AISC Steel Material Property Database (AISC-MPD).

Property	Details	Description
Dataset Source	AISC-MPD	American Institute of Steel Construction
Data Type	Stress–strain curves, mechanical tests	Experimental material properties
Key Parameters	$f_{y}$ , $f_{u}$ , E, ductility	Yield strength, tensile strength, modulus, elongation
Geometric Descriptors	Cross-sections, slenderness ratios	Structural member characteristics
Preprocessing	Cleaning, normalization, splitting	Ensures AI-ready, consistent dataset
Application	Structural optimization	Supports AI-based design and reliability analysis
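
The preprocessing row of Table 2 (cleaning, normalization, splitting) can be illustrated with a minimal Python sketch. It assumes each record is a dictionary of numeric fields (the keys `fy`, `fu`, and `E` and the sample values are hypothetical), drops records with missing or non-finite entries, splits before normalizing, and fits the z-score statistics on the training split only so that no test information leaks into the scaling. It is an illustration under these assumptions, not the authors' pipeline.

```python
import math
import random
import statistics


def clean(records, keys):
    """Keep only records whose listed fields are all finite numbers."""
    return [
        r for r in records
        if all(isinstance(r.get(k), (int, float)) and math.isfinite(r[k]) for k in keys)
    ]


def split(records, test_frac=0.2, seed=0):
    """Shuffle a copy of the records and return (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(test_frac * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def fit_zscore(train, keys):
    """Per-feature (mean, std) estimated on the training split only."""
    stats = {}
    for k in keys:
        col = [r[k] for r in train]
        sd = statistics.pstdev(col)
        stats[k] = (statistics.fmean(col), sd if sd > 0 else 1.0)  # constant column: std 1
    return stats


def apply_zscore(records, stats):
    """Return new records with each fitted feature mapped to (x - mean) / std."""
    return [{**r, **{k: (r[k] - m) / s for k, (m, s) in stats.items()}} for r in records]


# Hypothetical records (MPa); the last two are incomplete and get dropped.
keys = ["fy", "fu", "E"]
raw = [
    {"fy": 250.0, "fu": 400.0, "E": 200000.0},
    {"fy": 345.0, "fu": 450.0, "E": 200000.0},
    {"fy": 450.0, "fu": 550.0, "E": 200000.0},
    {"fy": None, "fu": 480.0, "E": 200000.0},
    {"fy": 380.0, "fu": float("nan"), "E": 200000.0},
]
data = clean(raw, keys)
train, test = split(data, test_frac=0.2, seed=0)
stats = fit_zscore(train, keys)
train_z, test_z = apply_zscore(train, stats), apply_zscore(test, stats)
```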

Table 3. Feature Representation for AI-Based Structural Optimization.

Feature Group	Parameters/Indicators	Role in Optimization
Civil Features	$f_{y}, f_{u}, E$ , geometry	Define stiffness, strength, stability
Mechanical Features	Stress–strain, fatigue, dynamics	Capture nonlinear and time-dependent behavior
Engineered Indicators	$R$, $U$, DCR	Ensure safety, ductility, and code compliance
Dimensionality Reduction	PCA components	Reduce redundancy, preserve variance
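
The civil and engineered entries of Table 3 can be assembled into one feature vector per member. The sketch below computes two engineered quantities from their standard definitions, the slenderness ratio $KL/r$ and $\mathrm{DCR} = \text{demand}/\text{capacity}$. The field names, units, and sample member are hypothetical; the indicators $R$ and $U$ are omitted because the sketch does not assume their definitions, and the PCA step is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Member:
    fy: float        # yield strength (MPa)
    fu: float        # tensile strength (MPa)
    E: float         # elastic modulus (MPa)
    K: float         # effective-length factor (-)
    L: float         # unbraced length (mm)
    r: float         # radius of gyration (mm)
    demand: float    # required strength from load-response analysis (kN)
    capacity: float  # available strength (kN)


def slenderness(m: Member) -> float:
    """Slenderness ratio K * L / r."""
    return m.K * m.L / m.r


def dcr(m: Member) -> float:
    """Demand-to-capacity ratio; values <= 1.0 satisfy the strength check."""
    return m.demand / m.capacity


def feature_vector(m: Member) -> list[float]:
    """Civil features followed by engineered indicators (hypothetical ordering)."""
    return [m.fy, m.fu, m.E, slenderness(m), dcr(m)]


col = Member(fy=345.0, fu=450.0, E=200000.0, K=1.0, L=3000.0, r=50.0,
             demand=850.0, capacity=1000.0)
x = feature_vector(col)
```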

Table 4. Components of the AI-Assisted Structural Optimization Framework.

Component	Function	Contribution
Classification Head	Outputs safe vs. unsafe predictions	Enables code-compliant design screening
Regression Head	Predicts $P_{f}$ and $β$	Provides quantitative reliability metrics
Physics-Informed Constraints	Enforces limit states ($g_{i}$, $τ_{i}$) through penalty terms	Promotes structural and code compliance
Joint Loss Function	Balances objectives ( $α$ , $β$ , $γ$ )	Integrates accuracy with physical feasibility
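
Table 4 combines the two heads and the physics-informed constraints in one joint loss weighted by $(α, β, γ)$. A minimal sketch follows, assuming binary cross-entropy for the classification head, mean squared error for the regression head, and a squared hinge penalty that is zero whenever a limit-state value $g_{i}$ is at or above its threshold $τ_{i}$; these term forms are assumptions, not the authors' exact definitions. Because the loss weight $β$ is unrelated to the reliability index $β$, the code names the weights `w_cls`, `w_reg`, and `w_phys`, with defaults taken from the best configuration in Table 14.

```python
import math


def bce(p_unsafe: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for the predicted probability of the 'unsafe' class (y = 1)."""
    p = min(max(p_unsafe, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mse(pred, target) -> float:
    """Mean squared error over the regression targets (e.g. P_f and the index)."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def physics_penalty(g, tau) -> float:
    """Squared hinge: penalizes each limit state only when g_i falls below tau_i (assumed form)."""
    return sum(max(0.0, t - gi) ** 2 for gi, t in zip(g, tau))


def joint_loss(p_unsafe, y, reg_pred, reg_target, g, tau,
               w_cls=0.30, w_reg=0.50, w_phys=0.20) -> float:
    # w_cls, w_reg, w_phys play the roles of (alpha, beta, gamma) in Table 4;
    # w_reg is a loss weight, not the reliability index.
    return (w_cls * bce(p_unsafe, y)
            + w_reg * mse(reg_pred, reg_target)
            + w_phys * physics_penalty(g, tau))
```

A squared hinge keeps the penalty differentiable at the limit-state boundary and leaves feasible designs unpenalized.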

Table 5. Tunable Hyperparameters and Search Ranges in Bayesian Optimization.

Hyperparameter	Search Range	Contribution
Learning Rate ( $η$ )	$[10^{- 4}, 10^{- 1}]$	Controls gradient update size
Batch Size (B)	{32, 64, 128, 256}	Affects convergence speed and stability
Hidden Units (h)	$[64, 512]$	Determines model capacity and expressiveness
Dropout Rate (p)	$[0.1, 0.5]$	Prevents overfitting by regularization
Loss Weights $(α, β, γ)$	$[0.1, 1.0]$	Balance between classification, regression, and physics constraints
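
The search space in Table 5 can be encoded directly. Bayesian optimization fits a surrogate model to past trials and picks the next configuration through an acquisition function; to keep the example short and self-contained, the sketch below swaps in plain random search over the same ranges, sampling the learning rate log-uniformly. The `objective` callable (for example, validation F1 after training) is a hypothetical placeholder.

```python
import math
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration from the Table 5 ranges."""
    return {
        "lr": 10 ** rng.uniform(-4, -1),                 # log-uniform on [1e-4, 1e-1]
        "batch": rng.choice([32, 64, 128, 256]),
        "hidden": rng.randint(64, 512),                  # inclusive integer range
        "dropout": rng.uniform(0.1, 0.5),
        "loss_weights": tuple(rng.uniform(0.1, 1.0) for _ in range(3)),  # (alpha, beta, gamma)
    }


def random_search(objective, n_trials: int = 30, seed: int = 0):
    """Return the best (config, score) pair; higher scores are better."""
    rng = random.Random(seed)
    best_cfg, best_score = None, -math.inf
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Toy objective for illustration only: prefers learning rates near 2e-4.
best, score = random_search(lambda c: -abs(math.log10(c["lr"]) - math.log10(2e-4)))
```

Swapping random search for a Bayesian optimizer would change only how the next configuration is proposed; `sample_config` still documents the admissible ranges.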

Table 6. Validation Metrics and Code-Oriented Thresholds.

Metric	Symbol	Threshold/Purpose
Accuracy, Precision, Recall, F1, AUC	–	Classification of safe vs. unsafe states
MAE, RMSE, $R^{2}$	–	Regression errors for $P_{f}$ and $β$
Probability of Failure	$P_{f}$	$P_{f} \leq 10^{- 3}$ (Eurocode EN 1990, ISO 2394)
Reliability Index	$β$	$β \geq 3.5$ (Eurocode EN 1990 Annex B)
Demand-to-Capacity Ratio	DCR	$\mathrm{DCR} \leq 1.0$ (AISC 360-16 Section B3.1)
Cross-validation	k-fold	Generalization and robustness
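
The two reliability thresholds in Table 6 are linked by the standard relation $P_{f} = \Phi(-β)$, where $\Phi$ is the standard normal CDF (exact for a linear limit state in normal variables, and the usual first-order approximation otherwise). The sketch below uses only Python's `statistics.NormalDist` to show that $β \geq 3.5$ corresponds to $P_{f} \leq 2.3 \times 10^{-4}$, whereas $P_{f} \leq 10^{-3}$ on its own corresponds to $β \geq 3.09$, so the $β$ criterion is the stricter of the two. The combined check function is an illustration, not a code provision.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def pf_from_beta(beta: float) -> float:
    """Probability of failure for a reliability index: P_f = Phi(-beta)."""
    return _STD_NORMAL.cdf(-beta)


def beta_from_pf(pf: float) -> float:
    """Reliability index for a probability of failure: beta = -Phi^{-1}(P_f)."""
    return -_STD_NORMAL.inv_cdf(pf)


def meets_table6_thresholds(beta: float, pf: float, dcr: float) -> bool:
    """True only if all three code-oriented thresholds of Table 6 are satisfied."""
    return beta >= 3.5 and pf <= 1e-3 and dcr <= 1.0


print(f"{pf_from_beta(3.5):.3e}")   # about 2.326e-04
print(f"{beta_from_pf(1e-3):.3f}")  # about 3.090
```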

Table 7. Computational Efficiency Across Hardware Platforms.

Platform	Numeric Precision	Latency (ms)	Throughput (pred/s)	Memory (MB)
GPU (Workstation, FP16)	Mixed	0.9	1100	18.2
CPU (Laptop, FP32)	FP32	3.8	263	24.8
Edge SoC (Embedded, INT8)	INT8	2.0	500	12.6
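
Latency figures such as those in Table 7 are commonly taken as the median wall-clock time of single-sample (batch size 1) predictions after a warm-up phase. The reported throughputs are close to 1000 divided by the latency in milliseconds (for example, 1000/3.8 ≈ 263 pred/s), which is consistent with such single-stream measurement. A minimal sketch using `time.perf_counter` follows; `toy_predict` is a hypothetical stand-in for the trained model, and the warm-up and run counts are arbitrary choices.

```python
import statistics
import time


def benchmark(predict, sample, n_warmup: int = 50, n_runs: int = 1000):
    """Return (median latency in ms, single-stream throughput in predictions/s)."""
    for _ in range(n_warmup):  # warm caches and lazy initialization before timing
        predict(sample)
    times_ms = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        predict(sample)
        times_ms.append((time.perf_counter() - t0) * 1e3)
    latency_ms = statistics.median(times_ms)
    throughput = 1e3 / latency_ms if latency_ms > 0 else float("inf")
    return latency_ms, throughput


def toy_predict(x):
    """Hypothetical placeholder for model inference."""
    return sum(v * v for v in x)


latency_ms, throughput = benchmark(toy_predict, [0.1] * 256)
```

The median is less sensitive than the mean to occasional scheduler or garbage-collection pauses.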

Table 8. Overall Test-Set Performance of the AI-Assisted Structural Optimization Framework.

Accuracy	Precision	Recall	F1-Score	AUC	MAE ( $P_{f}$ )	RMSE ( $P_{f}$ )	$R^{2} (P_{f})$	MAE ( $β$ )	MAPE (Cap.)
99.91%	99.92%	99.91%	99.91%	0.9991	0.0063	0.0115	0.9978	0.047	1.92%
Latency = 2.3 ms/prediction; Memory = 25.1 MB (FP32)

Table 9. Per-Class Classification Metrics (Safe vs. Unsafe).

Class	Precision	Recall	F1-Score	Support
Safe	99.92%	99.90%	99.91%	20,000
Unsafe	99.91%	99.92%	99.91%	20,000
Macro Avg	99.92%	99.91%	99.91%	40,000

Table 10. Confusion Matrix on the Test Set (40,000 samples).

	Predicted Safe	Predicted Unsafe	Row Total
Actual Safe	19,986 (TN)	14 (FP)	20,000
Actual Unsafe	18 (FN)	19,982 (TP)	20,000
Column Total	20,004	19,996	40,000
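
The classification metrics in Tables 8 and 9 follow directly from confusion-matrix counts. The sketch below treats "unsafe" as the positive class, as the TN/FP/FN/TP labels in Table 10 indicate, and swaps the roles of FP and FN when "safe" is the positive class. Applied to the Table 10 counts, it gives an accuracy of 39,968/40,000 = 99.92%, unsafe-class precision and recall of 99.93% and 99.91%, and safe-class precision and recall of 99.91% and 99.93%, which can be compared directly with the rounded values in Tables 8 and 9.

```python
def prf(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 for one class treated as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)


def classification_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "unsafe": prf(tp, fp, fn),  # positive class = unsafe
        "safe": prf(tn, fn, fp),    # roles of FP and FN swap when 'safe' is positive
    }


m = classification_metrics(tn=19_986, fp=14, fn=18, tp=19_982)
for name in ("unsafe", "safe"):
    p, r, f1 = m[name]
    print(f"{name}: P={p:.2%} R={r:.2%} F1={f1:.2%}")
print(f"accuracy={m['accuracy']:.2%}")
```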

Table 11. Regression Performance for Reliability and Capacity Targets.

Target	MAE	RMSE	$R^{2}$	Notes
Probability of failure $P_{f}$	0.0063	0.0115	0.9978	Absolute error in failure probability
Reliability index $β$	0.0470	0.0847	0.9969	Unitless safety index
Ultimate capacity (kN)	13.1	22.5	0.9973	Standardized geometries and loads
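
The regression columns in Tables 8 and 11 use the standard definitions of MAE, RMSE, $R^{2}$, and MAPE. A minimal sketch follows; the sample capacities are hypothetical. MAPE divides by the true value, so it suits strictly positive targets such as capacity but not targets that can be zero.

```python
import math


def regression_metrics(y_true, y_pred) -> dict:
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                      # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)       # total sum of squares
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1.0 - ss_res / ss_tot,
        # Mean absolute percentage error; requires every true value to be nonzero.
        "MAPE": 100.0 * sum(abs(e / t) for t, e in zip(y_true, errors)) / n,
    }


# Hypothetical capacities (kN): true vs. predicted.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```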

Table 12. Reliability Compliance Against Code-Oriented Thresholds.

Criterion	Threshold	Compliance Rate	Mean Value	95% CI
Reliability index $β$	$β \geq 3.5$	99.87%	3.85	[3.76, 3.94]
Failure probability $P_{f}$	$P_{f} \leq 10^{-3}$	99.85%	$7.4 \times 10^{-4}$	$[6.1, 8.8] \times 10^{-4}$
Demand-to-capacity ratio	$D C R \leq 1.0$	99.93%	0.85	[0.81, 0.88]
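
Compliance rates as in Table 12 are the fraction of evaluated designs that satisfy each threshold. The method behind the 95% confidence intervals is not stated in the table; the sketch below assumes a normal-approximation interval for the mean ($\bar{x} \pm 1.96\,s/\sqrt{n}$), with a bootstrap being a common alternative. The sample DCR values are hypothetical.

```python
import math
import statistics

# Code-oriented thresholds from Table 6.
THRESHOLDS = {
    "beta": lambda b: b >= 3.5,   # reliability index
    "pf": lambda p: p <= 1e-3,    # probability of failure
    "dcr": lambda d: d <= 1.0,    # demand-to-capacity ratio
}


def compliance_rate(values, criterion: str) -> float:
    """Fraction of values satisfying the named threshold."""
    ok = THRESHOLDS[criterion]
    return sum(1 for v in values if ok(v)) / len(values)


def mean_ci95(values):
    """Mean and normal-approximation 95% CI (requires at least two values)."""
    mean = statistics.fmean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, (mean - half, mean + half)


dcr_values = [0.78, 0.85, 0.91, 1.04, 0.83]  # hypothetical
rate = compliance_rate(dcr_values, "dcr")
mean, (lo, hi) = mean_ci95(dcr_values)
```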

Table 13. Five-Fold Cross-Validation Results (Classification Task).

Fold	Accuracy	Precision	Recall	F1-Score	AUC
Fold 1	99.90%	99.90%	99.90%	99.90%	0.9990
Fold 2	99.91%	99.92%	99.90%	99.91%	0.9991
Fold 3	99.94%	99.94%	99.93%	99.93%	0.9994
Fold 4	99.88%	99.89%	99.88%	99.88%	0.9989
Fold 5	99.92%	99.92%	99.92%	99.92%	0.9992
Mean	99.91%	99.91%	99.91%	99.91%	0.9991
Std. Dev.	0.02%	0.02%	0.02%	0.02%	0.0002
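
The fold-level summary in Table 13 corresponds to a k-fold split followed by the mean and standard deviation of the per-fold scores. The sketch below uses a stratified split, which keeps the safe/unsafe ratio roughly equal in every fold; whether the authors stratified their folds is not stated, and libraries such as scikit-learn provide an equivalent `StratifiedKFold`. Applied to the Table 13 accuracy column, the summary gives 99.91% ± 0.02%.

```python
import random
import statistics


def stratified_kfold(labels, k: int = 5, seed: int = 0):
    """Return k lists of test indices, dealing each class's indices round-robin across folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def summarize(fold_scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.fmean(fold_scores), statistics.stdev(fold_scores)


table13_accuracy = [99.90, 99.91, 99.94, 99.88, 99.92]  # % (Table 13)
mean_acc, sd_acc = summarize(table13_accuracy)
```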

Table 14. Hyperparameter Optimization Experiments and Outcomes.

Exp.	$η$	Batch	Hidden	Dropout	$(α, β, γ)$	Accuracy	F1	AUC	RMSE ( $P_{f}$ )	Latency
E1	$1.0 \times 10^{- 3}$	64	256	0.30	(0.4, 0.4, 0.2)	99.86%	99.86%	0.9986	0.013	2.6 ms
E2	$5.0 \times 10^{- 4}$	128	384	0.25	(0.35, 0.45, 0.20)	99.90%	99.90%	0.9990	0.012	2.4 ms
E3	$2.0 \times 10^{- 4}$	128	512	0.20	(0.30, 0.50, 0.20)	99.92%	99.92%	0.9992	0.011	2.2 ms
E4	$1.0 \times 10^{- 4}$	256	512	0.15	(0.30, 0.50, 0.20)	99.91%	99.91%	0.9991	0.011	2.3 ms
E5	$7.5 \times 10^{- 4}$	64	320	0.30	(0.40, 0.40, 0.20)	99.88%	99.88%	0.9988	0.012	2.5 ms
Best	$2.0 \times 10^{- 4}$	128	512	0.20	(0.30, 0.50, 0.20)	99.92%	99.92%	0.9992	0.011	2.2 ms
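
The "Best" row of Table 14 coincides with E3, which has both the highest accuracy and the lowest latency among the five runs. The selection rule is not stated; one plausible rule, assumed in the sketch below, ranks runs by accuracy and breaks ties by lower RMSE($P_{f}$) and then lower latency. The values are transcribed from Table 14.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    name: str
    accuracy: float    # %
    rmse_pf: float     # RMSE of P_f
    latency_ms: float


TABLE14 = [
    Trial("E1", 99.86, 0.013, 2.6),
    Trial("E2", 99.90, 0.012, 2.4),
    Trial("E3", 99.92, 0.011, 2.2),
    Trial("E4", 99.91, 0.011, 2.3),
    Trial("E5", 99.88, 0.012, 2.5),
]


def select_best(trials):
    """Highest accuracy; ties broken by lower RMSE(P_f), then lower latency (assumed rule)."""
    return max(trials, key=lambda t: (t.accuracy, -t.rmse_pf, -t.latency_ms))


best = select_best(TABLE14)
```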

Table 15. Ablation Study: Contribution of Feature Groups.

Configuration	Material	+Geometry	+Load-Response	+Reliability Indicators	Accuracy/F1
A1	Yes	No	No	No	98.72%/98.72%
A2	Yes	Yes	No	No	99.19%/99.19%
A3	Yes	Yes	Yes	No	99.58%/99.58%
A4	Yes	Yes	Yes	Yes	99.92%/99.92%
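
Table 15 adds feature groups cumulatively: A1 uses material features only, A2 adds geometry, and so on. The short sketch below encodes that scheme and computes the marginal accuracy gain of each step from the reported values, +0.47, +0.39, and +0.34 percentage points. Expressed as an error rate, the full configuration lowers misclassifications from 1.28% to 0.08%, a 16-fold reduction. The group identifiers are illustrative names.

```python
GROUPS = ["material", "geometry", "load_response", "reliability_indicators"]
TABLE15_ACCURACY = {"A1": 98.72, "A2": 99.19, "A3": 99.58, "A4": 99.92}  # % (Table 15)


def cumulative_configs(groups):
    """Configuration A(i+1) enables the first i+1 feature groups."""
    return {f"A{i + 1}": groups[: i + 1] for i in range(len(groups))}


def marginal_gains(accuracy):
    """Accuracy gain (percentage points) of each configuration over the previous one."""
    names = list(accuracy)
    return {b: round(accuracy[b] - accuracy[a], 2) for a, b in zip(names, names[1:])}


configs = cumulative_configs(GROUPS)
gains = marginal_gains(TABLE15_ACCURACY)
error_ratio = (100 - TABLE15_ACCURACY["A1"]) / (100 - TABLE15_ACCURACY["A4"])
```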

Table 16. Computational Efficiency Across Inference Hardware.

Device	Numeric Precision	Latency (ms)	Throughput (pred/s)	Memory (MB)
Workstation GPU (FP16)	Mixed	0.9	1,120	18.4
Laptop CPU (FP32)	FP32	3.9	256	25.1
Edge SoC (INT8 QAT)	INT8	2.1	490	12.4
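
The edge row of Table 16 refers to INT8 quantization-aware training (QAT). QAT inserts quantize-dequantize ("fake quantization") steps into the forward pass so that training already sees the rounding error that INT8 deployment will introduce. The sketch below shows a symmetric, per-tensor variant of that step in plain Python; the authors' toolchain and its scale and zero-point choices (for example, per-channel scales) are not specified, so the tensor and scheme here are illustrative assumptions.

```python
def int8_scale(values) -> float:
    """Symmetric per-tensor scale mapping max |x| to 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale: float) -> list[int]:
    """Round to the nearest integer step and clamp to the symmetric INT8 range."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q, scale: float) -> list[float]:
    return [qi * scale for qi in q]


def fake_quant(values) -> list[float]:
    """Quantize-dequantize round trip of the kind applied during QAT forward passes."""
    scale = int8_scale(values)
    return dequantize(quantize(values, scale), scale)


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical tensor
scale = int8_scale(weights)
q = quantize(weights, scale)
w_hat = fake_quant(weights)
```

Each reconstructed value differs from the original by at most half a quantization step; in training frameworks the rounding is typically paired with a straight-through estimator so gradients can still flow.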

Table 17. Comparison of Related Studies with Our Proposed Work.

Study	Core Methodology	Feature Space/Inputs	Dataset & Scale	Evaluation Strategy	Performance Metrics	Physical Constraint Embedding	Edge Deployment Verification	Multi-Task Learning	Real-Time Performance	Compliance
Lei et al. (2025) [22]	SMA–MLP hybrid model	JRC, $σ_{n}$ , $ϕ_{b}$ , E, UCS	84 rock joint samples	5-fold CV	$R^{2} = 0.9687$ , RMSE = 0.097	Not included; purely data-driven learning without equilibrium or safety constraints.	Not evaluated; no real-time or hardware-level validation conducted.	Single-task (regression only); no classification or dual-objective prediction.	No latency profiling or throughput validation.	Does not conform to engineering code-based reliability measures.
He et al. (2022) [23]	EEMD + PCC + CNN hybrid	Acceleration time	3-story benchmark structure (4 scenarios)	Multi-scenario validation	Accuracy = 94.02%, F1 > 92%	Absent; no physics-informed or compliance mechanism integrated.	Not tested; no edge-aware or low-resource deployment.	Classification only; lacks regression for reliability indices.	No latency or runtime evaluation performed.	No compliance verification against AISC/Eurocode constraints.
Maryoosh et al. (2025) [24]	LBP + BoVW + Apriori + MobileNetV3	Crack-image features (LBP, BoVW, CNN)	DIMEC-Crack, BCD datasets	10-fold CV	Accuracy = 98.27–100%, F1 ≈ 99%	Not embedded; lacked any physical or constitutive enforcement.	No edge-level validation or latency testing.	Single-task image classification only.	Offline-only; no quantized or streaming inference.	Did not address physical or regulatory compliance.
Dang et al. (2020) [25]	Feature fusion (AR, DWT, EMD) + CNN–LSTM	Vibration-response signals	Benchmark + bridges (My Thuan, Z24)	Case studies + noise tests	Accuracy = 91–93.5%	No physics-based interpretability or constraint integration.	No embedded hardware verification.	Focused on damage detection (single-task).	Computation time reduced but not benchmarked in real-time.	No mention of design or safety code compliance.
Ly et al. (2020) [26]	NN optimized by RCGA/Firefly	Beam geometry, fiber, mix composition	463 SFRC beam tests	Cross-validation	$R = 0.9771$ (RCGA), RMSE 70% vs. empirical eqs.	Absent; lacked code-based equilibrium validation.	Not validated for embedded or constrained systems.	Regression only; no multi-task learning implemented.	No hardware or latency testing.	No compliance assessment for code adherence.
Proposed Work	Dual-head hybrid AI (classification + regression) with Bayesian optimization	Civil + mechanical features (stress–strain, fatigue, DCR, etc.)	Multi-domain structural datasets	Benchmarks + cross-validation + edge tests	Accuracy > 99%; $R^{2}$ > 0.98; low RMSE/MAE	Yes—physics-informed constraint embedding ensures equilibrium consistency, constitutive validity, and safety margin enforcement.	Yes—quantization-aware edge verification with latency < 3 ms and stable throughput across platforms.	Yes—dual-head multi-task framework performs both classification (state detection) and regression (reliability index, probability of failure).	Yes—validated in real-time with sub-3 ms inference latency and low memory overhead.	Yes—fully code-compliant through AISC/Eurocode-aligned physics-informed penalties ensuring reliability-based conformity.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
