1. Introduction
Unreinforced masonry (URM) walls remain ubiquitous in the global building stock and constitute a significant portion of existing structures worldwide. Their substantial mass and inherently low tensile strength, however, render them highly susceptible to lateral collapse under seismic events, wind, or blast loads. Because of this brittleness and the absence of reinforcement, URM walls often fail catastrophically unless they are retrofitted, so strengthening existing masonry is critical for safety and structural integrity [1,2].
Traditional retrofit schemes, including steel or concrete jacketing and shotcrete, can increase capacity, but they add significant dead weight and stiffness and are labor-intensive and invasive to apply. The added mass can even exacerbate seismic forces [3,4].
In recent decades, externally bonded fiber-reinforced polymer (FRP) composites have emerged as a superior alternative. Among FRP options, carbon-fiber FRP (CFRP) is particularly preferred for rigid-wall flexural and shear strengthening due to its exceptionally high elastic modulus, excellent fatigue resistance, and durability, outperforming glass (GFRP) and basalt (BFRP) systems in these applications [4,5,6,7].
Carbon-fiber composites offer unmatched strength-to-weight performance, but their premium cost makes optimization essential to balance safety and economy [8]. Unfortunately, standard design codes such as ACI 440 (e.g., 440.2R and 440.7R) rely on iterative “trial-and-error” procedures: engineers must assume a reinforcement ratio, calculate the resulting structural strength, and repeat the process until code compliance is achieved. ACI 440.1R explicitly directs engineers to this time-consuming approach [9], which Bekdaş et al. describe as “cumbersome,” rendering manual optimization nearly impossible [10]. Consequently, in the absence of a direct design formula, finding an economical, code-compliant CFRP solution is highly tedious without computational assistance.
Researchers have long employed optimization algorithms to address this design challenge. For instance, Rahman et al. developed a Genetic Algorithm-based optimization framework for reinforced concrete beams, aiming to minimize the combined cost of CFRP plates and adhesive while satisfying both serviceability and ultimate limit-state requirements [11].
Broader population-based algorithms have also been widely adopted to explore the highly nonlinear CFRP design space. Kayabekir et al. proposed a Jaya-based optimization approach to determine the optimum placement of CFRP strips for increasing the shear capacity of reinforced concrete beams, and showed that the Jaya algorithm is a competitive and computationally efficient alternative to previously used optimization techniques [12]. Yücel conducted a comparative optimization study showing that advanced methods such as the Flower Pollination Algorithm and Particle Swarm Optimization can effectively determine CFRP design parameters, achieving significant reductions in structural weight while maintaining compliance with design constraints [13]. Extending these optimization strategies to wall structures, Bekdaş et al. applied the Jaya algorithm to generate large datasets of optimum CFRP configurations for cantilever walls. However, although these metaheuristic methods successfully identify code-compliant, near-optimal layouts, their standalone computational cost remains relatively high [10].
In recent years, research has shifted from this approach toward data-driven models, with the aim of overcoming the computational burden and limited generalization capacity of metaheuristic optimization methods used alone. Accordingly, artificial neural networks (ANNs) and ensemble learning models have been developed to quickly and reliably predict CFRP design requirements for different structural scenarios.
For example, Kayabekir et al. trained an ANN on metaheuristic-generated data to predict the optimum CFRP amount and orientation for the shear strengthening of RC beams [14]. Zhang et al. utilized interpretable ensemble learning methods, such as gradient-boosted trees and random forests, to estimate the flexural capacity of FRP-strengthened beams and identify key variables through feature importance analysis [15]. Furthermore, Bekdaş et al. proposed a hybrid ANN-Jaya framework for cantilever walls, training an MLP model on 500 optimized examples to rapidly predict the CFRP area with approximately 3.7% error while maintaining full ACI code compliance [10].
However, metaheuristic algorithms have practical limitations. In particular, their high computational cost and the need for numerous repeated runs to obtain reliable solutions are significant disadvantages, which can be time-consuming in large-scale engineering problems [16].
In contrast, artificial intelligence models such as ANNs and ensemble learners can make rapid predictions once trained. However, these models generally operate as “black box” systems: their internal structure and weight distributions cannot be easily interpreted. Validating their results and using them directly in design processes can therefore be difficult, and this lack of transparency is considered a significant reliability concern in engineering applications [17,18].
Therefore, there is a need for an approach that can both make rapid predictions and provide clear analytical expressions. Symbolic regression offers a bridge between these two approaches. This method aims to find human-readable mathematical expressions that explain the relationships within the data. Thus, instead of complex computational models, clear “white-box” equations are obtained [19].
For example, PySR is a tool developed to discover interpretable symbolic models from data [20]. When optimum CFRP design datasets are analyzed using PySR, closed-form equations can be obtained for key design variables such as the required fiber area or reinforcement ratio. These expressions remain transparent and applicable while maintaining the predictive power of data-driven methods.
In short, symbolic regression combines the speed of learning-based models with the interpretability of traditional design equations, and can thus overcome the limitations of both metaheuristic methods and black-box ANN models [19,20].
Recently, symbolic regression (SR) has demonstrated strong potential across various structural applications by providing compact, interpretable, and highly accurate formulas. In structural engineering, Sorour et al. utilized PySR to model damage initiation in hybrid FRP-steel joints, achieving higher accuracy than traditional regression [21]. Similarly, Megahed developed explicit prediction equations for the shear strength of reinforced concrete (RC) deep beams, offering superior transparency compared to opaque black-box models [22]. SR has also been effectively combined with existing design rules to predict the capacity of concrete-filled steel tube (CFST) columns, outperforming the European EC4 and AISC code estimates [23]. Beyond structural components, SR is increasingly adopted in geotechnical and earthquake engineering to capture complex nonlinear behaviors. Pham et al. proposed a general SR-based formula for the soil compression index (Cc) [24], while Almasoudi et al. developed a highly accurate (R² ≈ 0.99), physically consistent model for soil-structure interface shear strength [25]. Furthermore, Diaz et al. and Ghosh and Debbarma applied PySR to predict clay swelling pressure [26] and seismic amplification factors in open ground-story buildings [27], respectively. These studies demonstrate that PySR can yield both strong predictive performance and fully interpretable mathematical expressions.
Previous studies have used metaheuristic optimization or machine learning models to estimate CFRP strengthening requirements. However, these methods either require iterative optimization processes or rely on prediction models that are difficult to interpret. Symbolic regression applications for direct CFRP design formulation are still limited. The originality of this work lies in the development of a hybrid process combining optimization and symbolic regression. This process transforms a large number of optimum design solutions into explicit engineering equations.
The main goal of the proposed hybrid process is to address this gap directly. In this context, clear and interpretable white-box design equations are proposed for the flexural strengthening of URM walls using CFRP. For this purpose, a comprehensive dataset of 1300 optimum design scenarios, fully compliant with the ACI 440.7R and ACI 530 guidelines, was generated using the Jaya algorithm. Two transparent mathematical models that directly predict the required CFRP area (Af) were then obtained through symbolic regression implemented in PySR. The first model offers a balanced, simplified structure designed for practical engineering applications, whereas the second focuses on maximum predictive accuracy. Both models were validated with independent test cases. By replacing complex iterative design procedures with direct analytical equations, this study offers structural engineers a practical design tool that combines safety with economic efficiency.
2. Methodology
The main steps of this study can be summarized as follows:
A large dataset containing optimum CFRP strengthening solutions for masonry walls was generated using a metaheuristic optimization framework based on the Jaya algorithm.
Symbolic regression was used to determine the relationship between the structural parameters and the required CFRP area, yielding explicit analytical expressions.
Two interpretable design equations were developed.
The proposed formulas were validated with previously unseen design cases.
The developed equations offer a fast alternative to the traditional iterative calculation procedures recommended in design guidelines.
2.1. Jaya Algorithm
The Jaya algorithm is a population-based metaheuristic optimization method first proposed by Rao [28]. Its name derives from the Sanskrit word Jaya, meaning “victory”. The algorithm is based on the principle that each candidate solution should move toward the best solution in the population while simultaneously moving away from the worst one. Unlike other population-based algorithms such as Genetic Algorithms (GA) or Particle Swarm Optimization (PSO), this method does not require algorithm-specific control parameters such as crossover probability, mutation rate, or inertia weight. For this reason, the Jaya algorithm is described as a “parameter-free” method: it needs only two basic inputs, the population size and the number of iterations. This feature makes the method easy to use and computationally efficient for complex engineering problems [29].
The fundamental concept of Jaya combines the “survival of the fittest” principle with the collective intelligence of swarm-based methods. In every iteration, the algorithm identifies the best solution (the best candidate in the current population) and the worst solution to guide the search process, thereby maintaining a balance between exploration of the search space and exploitation of promising regions [30].
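Mathematically, let f(x) be the objective function to be minimized. The algorithm considers a population of n candidate solutions (indexed k = 1, 2, …, n) and a set of m design variables (indexed j = 1, 2, …, m). At any iteration i, the value of the jth variable for the kth candidate solution, denoted as Xj,k,i, is updated to X′j,k,i according to Equation (1):

$$X'_{j,k,i} = X_{j,k,i} + r_1\left(X_{j,\mathrm{best},i} - \left|X_{j,k,i}\right|\right) - r_2\left(X_{j,\mathrm{worst},i} - \left|X_{j,k,i}\right|\right) \tag{1}$$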
where
Xj,k,i is the current value of the jth variable for the kth candidate.
Xj,best,i is the value of the jth variable for the best solution obtained among the n candidates in the current iteration.
Xj,worst,i is the value of the jth variable for the worst solution obtained among the n candidates in the current iteration.
r1 and r2 are random numbers uniformly distributed in the range [0, 1].
During the optimization process, each newly generated vector constitutes a potential update to the population. A “greedy selection” strategy is then employed: the new solution is compared with the corresponding current solution. If the new vector yields a better (lower) objective function value, it replaces the current solution in the population; otherwise, the old solution is retained. This generate-and-test cycle repeats until the pre-specified stopping criterion (maximum number of iterations) is reached [30,31]. The flowchart of the algorithm is illustrated in Figure 1.
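The update rule of Equation (1) together with greedy selection can be illustrated with the minimal Python sketch below. It is not the implementation used in this study: box bounds stand in for the side constraints, out-of-range candidates are simply clipped back to those bounds (an assumption), and constraint handling is left to the objective function `f`.

```python
import numpy as np


def jaya_minimize(f, lower, upper, pop_size=20, iterations=500, seed=0):
    """Minimise f over the box [lower, upper] using the Jaya update of
    Equation (1) followed by greedy selection. Returns (best_x, best_f)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    m = lower.size
    # Random initial population: pop_size candidates with m design variables each
    X = lower + (upper - lower) * rng.random((pop_size, m))
    fx = np.array([f(x) for x in X])
    for _ in range(iterations):
        # Best and worst candidates of the current iteration
        best = X[np.argmin(fx)].copy()
        worst = X[np.argmax(fx)].copy()
        for k in range(pop_size):
            r1 = rng.random(m)
            r2 = rng.random(m)
            # Equation (1): move toward the best solution and away from the worst
            cand = X[k] + r1 * (best - np.abs(X[k])) - r2 * (worst - np.abs(X[k]))
            cand = np.clip(cand, lower, upper)  # assumption: repair by clipping
            fc = f(cand)
            if fc < fx[k]:  # greedy selection: keep the new vector only if better
                X[k] = cand
                fx[k] = fc
    k_best = int(np.argmin(fx))
    return X[k_best], fx[k_best]
```

With strictly positive design variables, as for Wf, Sf and tf, the absolute-value terms in Equation (1) have no effect, so a shifted sphere function on a positive box is a convenient smoke test, e.g. `jaya_minimize(lambda x: float(np.sum((x - 2.0) ** 2)), [0, 0, 0], [5, 5, 5])`.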
In this study, the optimization was executed with a population size of 20 and a maximum of 30,000 iterations to promote convergence toward the global optimum. The design variables are the CFRP strip width (Wf), strip spacing (Sf), and thickness (tf); the algorithm searches for the combination that minimizes the total CFRP area (Af) while satisfying the structural requirements of ACI 440.7R-10 [32].
The physical model, including the wall cross-section, height, and the schematic layout of the externally bonded CFRP plates, is illustrated in Figure 2. The input parameters and their ranges used in the optimization process, which correspond to the structural dimensions and loads depicted in Figure 2, are detailed in Table 1.
It should be noted that the CFRP layout considered in this study consists of vertically aligned strips rather than diagonal configurations. Although diagonal CFRP applications are known to be more effective for enhancing shear resistance, the primary objective of this study is flexural strengthening of masonry walls.
In flexural behavior, the dominant tensile stresses develop along the vertical direction due to bending moments. Therefore, vertical CFRP strips are more effective in resisting these tensile forces and improving the flexural capacity of the wall. This configuration is also consistent with common design practices and ACI 440 recommendations for flexural strengthening using externally bonded FRP systems.
Accordingly, the selected vertical layout is intentionally adopted to represent the governing structural behavior and ensure compatibility with the underlying design assumptions.
During the optimization process, the objective function was defined to minimize the area of CFRP material used per unit length of the wall, calculated as Equation (2):
where n denotes the number of CFRP layers, which is fixed at one in the present study. All design variables are constrained within practical bounds to ensure feasible and constructible CFRP configurations. These side constraints, which define the admissible search space of the optimization algorithm, are presented in Equations (3)–(5). The lower and upper bounds for the design variables (Wf, Sf, and tf) were selected based on standard commercial availability and practical application constraints.
The optimization was subject to seven critical constraints (g1 to g7) derived from ACI 440.7R-10 and ACI 530 to ensure safety and serviceability [32,33]. Any candidate solution violating these constraints was penalized with a high objective function value. The constraints are as follows:
Tensile Stress Constraint (g1): This constraint evaluates the flexural tensile stress (Fb) calculated under the applied moment. According to ACI 530, this stress is compared with the design modulus of rupture (ΦFr), where a strength reduction factor of Φ = 0.6 is applied. The modulus of rupture (Fr) is determined through in situ testing or estimated from standards such as ASCE 41-06 or ACI 530. If Fb > ΦFr, the unreinforced wall is deemed insufficient, necessitating the CFRP retrofit.
FRP Strength Constraint (g2): The effective tensile stress in the CFRP reinforcement (Ffe) must remain below its design tensile strength (Ffu) to avoid rupture failure.
FRP Strain Constraint (g3): The effective strain in the CFRP (εfe) is limited to the design rupture strain (εfu) to ensure strain compatibility and prevent debonding.
Masonry Strain Constraint (g4): To prevent crushing failure of the masonry in the compression zone, the maximum compressive strain (εm) must not exceed the ultimate usable strain (εmu), typically taken as 0.0025.
FRP Spacing Constraint (g5): This constraint limits the clear spacing between CFRP strips to ensure effective stress distribution and prevent local failure mechanisms.
FRP Stress Upper Limit Constraint (g6): The maximum tensile force developed in the FRP system per unit strip width must not exceed the specified threshold value. This verification guarantees that the FRP material operates within its design strength limits.
Optimum FRP Area Constraint (g7): This final constraint ensures that the CFRP area provided in the optimized solution (Af,prov) is equal to or greater than the theoretically required area (Af,req).
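A minimal sketch of how the objective of Equation (2) and the static penalty described above might be combined. Equation (2) is not reproduced in this text, so the expression Af = n·tf·Wf·1000/Sf (area per metre of wall, all lengths in mm) is an assumption, as are the normalised g ≤ 0 forms and the penalty magnitude. Only g2, g3 and g4, whose limits are stated explicitly above, are written out; the remaining constraint values would be supplied precomputed.

```python
PENALTY = 1.0e10  # assumption: the "high objective function value" for violations
EPS_MU = 0.0025   # ultimate usable masonry strain used in g4


def cfrp_area_per_metre(w_f, s_f, t_f, n_layers=1):
    """Assumed form of Equation (2): CFRP area per metre of wall (mm^2/m) for
    strips of width w_f and thickness t_f (mm) placed at spacing s_f (mm)."""
    return n_layers * t_f * w_f * 1000.0 / s_f


# Normalised constraint forms: a value <= 0 means the constraint is satisfied.
def g2_frp_stress(f_fe, f_fu):
    return f_fe / f_fu - 1.0


def g3_frp_strain(eps_fe, eps_fu):
    return eps_fe / eps_fu - 1.0


def g4_masonry_strain(eps_m, eps_mu=EPS_MU):
    return eps_m / eps_mu - 1.0


def penalised_objective(w_f, s_f, t_f, g_values, n_layers=1):
    """Return Af for a feasible design; otherwise Af plus a large penalty
    proportional to the total constraint violation (sum of positive g_k)."""
    a_f = cfrp_area_per_metre(w_f, s_f, t_f, n_layers)
    violation = sum(max(0.0, g) for g in g_values)
    return a_f + PENALTY * violation
```

A static penalty of this kind keeps infeasible candidates in the population but ranks them behind every feasible one, which fits the greedy selection step of the Jaya algorithm.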
By applying this objective function and constraint-penalty scheme, a comprehensive dataset of 1300 optimized design scenarios was generated. To illustrate the range and nature of the generated variables, a representative sample of seven randomly selected cases from the dataset is presented in Table 2. This dataset subsequently served as the input for symbolic regression with the PySR package, which discovered explicit, closed-form mathematical equations that predict the required optimum CFRP area (Af) directly from the structural parameters of the masonry walls. To validate the generalization capability of the proposed equations, ten independent test cases were generated using the same optimization framework. These cases were excluded from the regression process and used solely as unseen benchmarks to evaluate the models’ accuracy, as detailed in Table 3.
2.2. Symbolic Regression and PySR
Symbolic regression (SR), a data-driven method, is employed in this study. It aims to find short mathematical expressions that reveal relationships in datasets, and it differs from other regression and machine learning methods in its focus on interpretability and structural exploration. Traditional machine learning models, such as deep neural networks and ensemble methods like random forests, often operate as “black boxes” [34,35].
Black-box models provide high predictive power but lack transparency, which can make scientific interpretation and model validation difficult. In contrast, standard parametric regression methods require the researcher to specify the functional form in advance, for example a linear relationship or a polynomial of a certain degree. If the actual data structure differs from these assumptions, bias is introduced into the model. Symbolic regression overcomes these limitations by using evolutionary algorithms inspired by the principles of natural selection. In this process, a diverse population of candidate equations, usually represented as expression trees, is created and iteratively evolved using genetic operators. Mutation makes random changes to the equations to create new variations; crossover transfers beneficial traits by combining the sub-expressions of successful individuals; and selection eliminates low-performing candidates while allowing better solutions to advance [36]. The result of this process is clear and interpretable “white-box” equations. Because these equations are directly understandable, they can be easily integrated into theoretical frameworks and engineering design [37].
PySR is a high-performance, open-source implementation of symbolic regression, designed for scientific and engineering applications that require speed and ease of use. Its search engine is built on the Julia programming language, chosen for its strength in numerical computation; Julia’s just-in-time compilation and parallel processing support allow PySR to perform evolutionary searches very quickly [20].
PySR’s Python interface (used here with Python 3.12) is user-friendly and compatible with popular data science libraries such as scikit-learn and pandas, so it can be easily integrated into existing data analysis workflows [38]. The evolutionary core of the tool uses a multi-population strategy in which candidate equations are distributed across several populations to increase diversity and improve convergence. The basic mechanisms include mutation, crossover, and tournament selection [20]. This structure also allows custom operators and constraints to be defined, so that domain-specific knowledge can be incorporated, for example by specifying physical units or prohibiting certain mathematical operations [39].
PySR selects its final models from a Pareto front, an approach that addresses the inherent trade-off between accuracy and complexity in symbolic expressions. Accuracy is evaluated using loss functions, which measure how well an equation fits the data; these may include root mean squared error (RMSE) or custom domain-specific loss functions [20]. Complexity is evaluated using criteria such as the number of operators, tree depth, and overall expression length. Penalizing complexity aims to reduce overfitting and increase the generalizability of the model, which is particularly important in fields such as structural engineering, where overly complex models may be physically difficult to interpret [40]. As a result of Pareto optimization, PySR produces a ranked set of non-dominated equations from which users can choose according to their needs; for example, simpler models may be preferred to facilitate analytical design calculations [41]. This multi-objective approach is rooted in the evolutionary computation literature, where Pareto dominance preserves a diverse set of solutions. Furthermore, benchmark studies have shown that PySR can rediscover known physical laws from noisy data [20,34].
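The idea of a Pareto front of non-dominated expressions can be illustrated in a few lines of Python. This is a generic sketch of non-dominated filtering over (complexity, loss) pairs, not PySR’s internal code, and the candidate values used to exercise it are made up.

```python
def pareto_front(candidates):
    """Return the non-dominated (complexity, loss) pairs, ordered by complexity.

    A candidate is kept only if its loss is strictly lower than that of every
    candidate with lower complexity; at equal complexity the lowest loss wins."""
    front = []
    for complexity, loss in sorted(candidates):
        if not front or loss < front[-1][1]:
            front.append((complexity, loss))
    return front
```

For example, among the hypothetical candidates `[(5, 1.0), (3, 2.0), (7, 1.5), (5, 0.8), (9, 0.5)]`, the expression of complexity 7 is dominated because a simpler one (complexity 5) already reaches a lower loss.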
In this study, a dataset consisting of 1300 design scenarios optimized with the Jaya algorithm was randomly divided into two subsets. 80% of the dataset was used for training and 20% for testing. PySR models were trained using only the training data. The test dataset was kept completely separate to evaluate the generalization ability of the models.
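A random 80/20 partition of the 1300 scenarios can be sketched with NumPy as follows. The seed value and the helper name are illustrative assumptions; the text states only that the split was random.

```python
import numpy as np


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Randomly partition sample indices into disjoint training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return order[n_test:], order[:n_test]


train_idx, test_idx = split_indices(1300)  # 1040 training and 260 test indices
```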
The main objective of this study is to develop a closed-form equation that directly predicts the required CFRP area (Af) based on the following structural parameters: wall thickness (t), masonry compressive strength (Fm), applied axial load (Pu), ultimate moment demand (Mu), CFRP rupture strength (Ffu′), and CFRP modulus of elasticity (Ef).
To ensure the reproducibility of the study and to obtain physically interpretable equations, PySR was run in the Python environment with specific hyperparameters. The mathematical search space was restricted to fundamental binary operators and a limited set of unary operators. The maximum complexity of the generated equations was limited to 30 nodes (maxsize = 30) to prevent overfitting. The evolutionary search was run for 40 iterations (niterations = 40) using 15 subpopulations (populations = 15), each containing 33 individuals (population_size = 33). A fixed random seed (random_state = 42) was used to make the stochastic evolutionary process repeatable. During the study, PySR generated several candidate models on the Pareto front.
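The reported hyperparameters map directly onto `PySRRegressor` keyword arguments, as in the sketch below. The exact operator lists are an assumption: the text states only that fundamental binary operators and specific unary operators were used, and `log`, `sqrt` and `square` are chosen here because logarithmic, square-root and quadratic terms appear in the selected models. The feature names are hypothetical (the prime in Ffu′ is dropped because PySR expects valid identifiers).

```python
# Settings reported in Section 2.2; the operator lists are assumptions.
PYSR_SETTINGS = {
    "niterations": 40,
    "populations": 15,
    "population_size": 33,
    "maxsize": 30,
    "random_state": 42,
    "binary_operators": ["+", "-", "*", "/"],
    "unary_operators": ["log", "sqrt", "square"],
}

# Hypothetical column names for the six structural parameters
FEATURES = ["t", "Fm", "Pu", "Mu", "Ffu", "Ef"]


def build_regressor(model_selection="best"):
    """Create a PySRRegressor with the settings above ("best" or "accuracy")."""
    from pysr import PySRRegressor  # imported lazily: requires `pip install pysr`

    # PySR's documentation notes that fully deterministic runs additionally need
    # deterministic=True with parallelism disabled; random_state alone may not suffice.
    return PySRRegressor(model_selection=model_selection, **PYSR_SETTINGS)
```

In use, passing the training inputs as a pandas DataFrame whose columns are named as in `FEATURES` lets the discovered equations refer to those names directly.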
Two different mathematical models were selected from this set of non-dominated models. The first model was obtained using a balanced selection strategy (model_selection = “best”). This approach optimizes the balance between prediction accuracy and algebraic complexity. The second model was selected using an accuracy-focused strategy (model_selection = “accuracy”). This approach minimizes only the loss function and does not consider equation complexity. The explicit mathematical expressions and statistical performance of the selected models are analyzed in detail in the next section.
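The two selection strategies can be mimicked on a Pareto front of (complexity, loss) pairs with the sketch below. It follows our reading of the PySR documentation for `model_selection`: “accuracy” returns the lowest-loss expression, while “best” returns the highest-scoring expression among those whose loss is within 1.5 times the lowest loss, where the score is the drop in log-loss per unit of added complexity. Details such as scoring the simplest expression as zero are assumptions and may differ between PySR versions.

```python
import math


def select_index(front, criterion="best"):
    """Pick an expression index from a Pareto front of (complexity, loss)
    pairs sorted by increasing complexity."""
    losses = [loss for _, loss in front]
    if criterion == "accuracy":
        return min(range(len(front)), key=losses.__getitem__)
    # Score: -d(log loss)/d(complexity) relative to the previous front member
    scores = [0.0]  # the simplest expression has no predecessor
    for k in range(1, len(front)):
        d_complexity = front[k][0] - front[k - 1][0]
        scores.append(-(math.log(losses[k]) - math.log(losses[k - 1])) / d_complexity)
    if criterion == "score":
        return max(range(len(front)), key=scores.__getitem__)
    if criterion == "best":
        threshold = 1.5 * min(losses)
        eligible = [k for k in range(len(front)) if losses[k] <= threshold]
        return max(eligible, key=scores.__getitem__)
    raise ValueError(f"unknown criterion: {criterion!r}")
```

On a hypothetical front `[(3, 2.0), (5, 0.5), (9, 0.45)]`, “accuracy” picks the complexity-9 expression, whereas “best” prefers the complexity-5 one, since the extra four nodes buy only a small reduction in loss.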
3. Results and Discussion
This section presents the results of the symbolic regression analysis performed with the PySR framework on the dataset of designs optimized by the Jaya algorithm. The explicit closed-form design equations derived from the Pareto front are introduced; these equations balance algebraic simplicity against predictive accuracy. The statistical performance of the proposed models is evaluated in detail using both the training and testing datasets, and the generalization capability of the obtained formulas is validated using independent, previously unseen test data. Finally, a global sensitivity analysis is performed to determine the relative influence of structural and material parameters on the required CFRP area.
3.1. Proposed Explicit Design Equations and Pareto Front Analysis
In symbolic regression, the optimization process does not produce a single formula; instead, it generates a diverse population of models consisting of different mathematical expressions. These candidate models are evaluated and distributed along the Pareto front, which visualizes the trade-off between model accuracy and structural complexity. Model accuracy is quantified by the loss function, with lower loss indicating higher accuracy.
Structural complexity is determined by the number of mathematical nodes, operators, and constants. In this study, two different closed-form equations were obtained to predict the optimum CFRP area (Af) required for the flexural strengthening of masonry walls. Table 4 shows the candidate expressions evaluated using a balanced selection strategy. Table 5 presents the expressions obtained when only prediction accuracy is prioritized.
In the context of the PySR algorithm, model complexity is defined as the total number of nodes in the mathematical expression tree. It is calculated by summing the predefined weights of all constituent elements in the formula, where each variable, constant, and mathematical operator (e.g., +, −, ×, ÷, log, sqrt) is counted as a separate node with a baseline weight of 1. Therefore, the complexity values reported in Table 4 and Table 5 directly represent the structural size of each candidate expression. This metric is used in the Pareto optimization process to balance predictive accuracy against equation simplicity: lower complexity values correspond to simpler, more interpretable equations, while higher values indicate more sophisticated mathematical structures aimed at capturing deeper nonlinear mechanics.
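The unit-weight node count described above can be reproduced for any expression written in Python syntax using the standard `ast` module. This is an illustrative sketch, not PySR’s own counter, and counts can differ from PySR in edge cases: here `-2.5` counts as two nodes (negation plus constant) and `x**2` as three, whereas PySR stores a negative constant as one node and `square(x)` as two.

```python
import ast


def node_complexity(expr: str) -> int:
    """Count the nodes of an arithmetic expression, giving every variable,
    constant and operator/function a weight of 1."""
    tree = ast.parse(expr, mode="eval")
    # Names used as function identifiers (e.g. log, sqrt) belong to their Call node
    func_ids = {id(node.func) for node in ast.walk(tree) if isinstance(node, ast.Call)}
    count = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.BinOp, ast.UnaryOp, ast.Call, ast.Constant)):
            count += 1  # operator, function or constant
        elif isinstance(node, ast.Name) and id(node) not in func_ids:
            count += 1  # input variable
    return count
```

For instance, the hypothetical expression `"Mu / (Ffu * t) + 0.5"` has seven nodes: three variables, one constant and three binary operators.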
The first proposed equation, Model 1, was extracted from the Pareto front using a “balanced” selection criterion (model_selection = “best”). This criterion ranks candidate models by a score that rewards large accuracy gains per unit of added complexity, thereby favoring algebraic simplicity while maintaining high predictive precision. This model has a complexity score of 18. Although its node count is slightly higher than that of Model 2, Model 1 is practically advantageous because its structure consists solely of fundamental arithmetic operations and a straightforward quadratic term, avoiding logarithmic and square-root operators entirely. The explicit formula for the balanced model is presented in Equation (13):
The second proposed equation, Model 2, was derived from the accuracy-driven Pareto front (Table 5) using an “accuracy-focused” selection strategy (model_selection = “accuracy”), which prioritizes minimization of the prediction error. Interestingly, this approach yields a highly accurate formulation with a slightly lower complexity score of 17. However, to capture the strongly nonlinear relationships within the dataset and maximize the coefficient of determination, this model incorporates more advanced unary operators, specifically the natural logarithm and square root functions. The explicit formula for the accuracy-focused model is expressed in Equation (14), where Af is the required area of the CFRP reinforcement (mm²/m), Mu denotes the ultimate moment demand (N·mm/m), Ffu′ represents the CFRP rupture strength (MPa), t is the thickness of the unreinforced masonry wall (mm), and Ef corresponds to the CFRP modulus of elasticity (MPa).
Both models autonomously eliminated the applied axial load (Pu) and the masonry compressive strength (Fm) from their final expressions. This indicates that, within the parameter ranges of the dataset, these parameters have a negligible influence on the optimal CFRP flexural area relative to the dominant variables (Mu, Ffu′, and t).
3.2. Predictive Performance and Statistical Accuracy
In order to numerically evaluate the reliability and generalization ability of the proposed symbolic regression models, the 1300 optimal design scenarios were used. The dataset was divided into 80% training (1040 samples) and 20% testing (260 samples). Prediction performance was evaluated using four standard statistics: coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The performance of the balanced model (Model 1) and the accuracy-focused model (Model 2) is summarized in Table 6.
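For reference, the four statistics can be computed from reference values (here, the Jaya-optimized Af) and model predictions as in the NumPy sketch below; the function name is illustrative.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """R^2, RMSE, MAE and MAPE (%) between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        # MAPE requires non-zero reference values
        "MAPE": float(100.0 * np.mean(np.abs(err / y_true))),
    }
```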
As seen in Table 6, both models make very good predictions. Model 1 achieved R² = 0.9913 and R² = 0.9927 on the training and test sets, respectively. The small difference between training and test performance indicates that the model captures the underlying relationships rather than memorizing the training data. Model 2 provided higher accuracy, with R² > 0.997 on both datasets and a test MAPE of only 1.32%.
To further demonstrate the reliability of the predictions, a residual analysis was performed on the entire dataset (1300 samples) using the accuracy-focused model (Model 2). The mean residual was almost zero (µ = −0.000163) and the standard deviation was low (σ = 0.3183), with residuals ranging from −0.9040 to 0.9951. The near-zero mean indicates that the equation is unbiased, that is, it does not systematically overestimate or underestimate the CFRP area. This absence of systematic bias supports designs that are both safe and economical.
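The residual statistics reported above can be obtained with a short helper such as the one below. The sign convention (reference minus prediction) and the use of the sample standard deviation are assumptions, since the text does not specify them.

```python
import numpy as np


def residual_summary(y_true, y_pred):
    """Mean, sample standard deviation, minimum and maximum of the residuals,
    defined here as reference minus prediction (assumed convention)."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return {
        "mean": float(r.mean()),
        "std": float(r.std(ddof=1)),
        "min": float(r.min()),
        "max": float(r.max()),
    }
```

A mean close to zero indicates no systematic bias overall; individual residuals of either sign still occur, which is what the reported minimum and maximum describe.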
3.3. Validation on Unseen Data (Independent Test Cases)
The statistics obtained from the 20% test subset show that the models are consistent. However, testing with completely independent data is necessary to assess their true generalization ability. For this purpose, ten new design scenarios were created using the Jaya algorithm. These examples were strictly isolated from the original 1300 examples and were not encountered by PySR during training or model selection.
The optimum CFRP areas (Af) found with the Jaya algorithm were compared with the predictions of the two proposed equations. Table 7 shows the results for the algebraically simple balanced model (Model 1).
According to Table 7, the balanced equation successfully predicted the required CFRP area, with an average error of 3.26%. Even the highest error (6.82%, Case 8) is within acceptable engineering tolerances. Because Model 1 uses only basic mathematical operations, this level of accuracy makes it well suited to fast, manual calculations.
The same ten examples were also tested with the accuracy-focused equation (Model 2). The results are shown in Table 8.
The results in Table 8 clearly illustrate the superior predictive precision of Model 2. By incorporating logarithmic and square root operators to better capture the nonlinear behavior of the masonry walls, the average prediction error was reduced to 2.10%. For Cases 4, 5, and 10, the percentage error fell below 1%.
These validation results indicate that both models are robust within the investigated design domain and suitable for practical application. Structural engineers can use Model 1 for straightforward preliminary design calculations, or implement Model 2 in computational spreadsheets when higher precision and material optimization are priorities.
It is important to position the proposed framework explicitly as a data-driven engineering tool. The derived PySR equations are intended to serve as explicit surrogate models for the established ACI 440.7R and ACI 530 design provisions. Consequently, the validation in this study evaluates how well the models reproduce the iterative solutions dictated by these codes, rather than how well they predict standalone experimental test results. Because the ACI provisions are themselves derived from, and calibrated against, extensive experimental databases of masonry structures, the proposed equations inherit this empirical basis indirectly. Within the scope of the code provisions they reproduce, the derived formulas are therefore expected to be as safe and reliable as those provisions, without requiring a separate experimental calibration of the equations themselves.
3.4. Feature Sensitivity Analysis
To thoroughly understand the governing mechanics behind the CFRP strengthening design and to quantify the global importance of each input parameter for the predicted CFRP area (Af), three complementary sensitivity analysis approaches were employed: (i) global gradient-based sensitivity using mean absolute partial derivatives, (ii) local One-At-a-Time (OAT) elasticity analysis, and (iii) variance-based Sobol global sensitivity analysis (Figure 3, Figure 4 and Figure 5).
Using several independent methods improves robustness and avoids methodological bias, because each technique quantifies a different aspect of the input–output dependency. The gradient-based global sensitivity analysis (Figure 3) showed that the wall thickness t had the highest absolute sensitivity, accounting for approximately 92.8% and 93.0% of the total sensitivity in Model 1 (balanced) and Model 2 (accuracy-focused), respectively, while Ffu contributed approximately 7% and all other variables showed negligible gradients. This result is consistent with the symbolic equations, in which t appears in the denominator of nonlinear terms, so that a unit change in t produces a relatively large absolute change in the output. However, gradient-based sensitivity measures absolute changes and is therefore influenced by the scales and units of the variables.
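A gradient-based global sensitivity measure of this kind can be sketched as follows: sample the inputs uniformly within their ranges, estimate each partial derivative by a central finite difference, average the absolute values, and normalize them into shares. The model function, the input ranges and the sample size below are hypothetical placeholders (Model 1 and Model 2 are not reproduced here), and the study does not specify its sampling or differencing scheme. Because each derivative carries the units of its input, expressing Mu in different units would change its share, which is the scale dependence noted above.

```python
import random


def mean_abs_gradients(model, bounds, n=2000, rel_step=1e-4, seed=0):
    """Mean absolute partial derivatives of `model` over uniformly sampled inputs.

    `model` maps a dict of named inputs to a scalar; `bounds` maps each name to
    a (low, high) range with low > 0, because the finite-difference step is
    taken relative to the sampled value. Returns (means, normalized shares).
    """
    rng = random.Random(seed)
    names = list(bounds)
    totals = dict.fromkeys(names, 0.0)
    for _ in range(n):
        x = {k: rng.uniform(*bounds[k]) for k in names}
        for k in names:
            h = rel_step * x[k]
            up, down = dict(x), dict(x)
            up[k] += h
            down[k] -= h
            # central finite difference of the model with respect to input k
            totals[k] += abs(model(up) - model(down)) / (2.0 * h)
    means = {k: s / n for k, s in totals.items()}
    total = sum(means.values())
    shares = {k: m / total for k, m in means.items()}
    return means, shares


def placeholder_af(v):
    # Hypothetical stand-in for Af; NOT Model 1 or Model 2 from this study.
    return v["Mu"] / (v["Ffu"] * v["t"])


# Hypothetical input ranges; Pu is deliberately unused by the placeholder model.
bounds = {"Mu": (5.0, 50.0), "t": (0.1, 0.4), "Ffu": (500.0, 3000.0), "Pu": (10.0, 100.0)}
means, shares = mean_abs_gradients(placeholder_af, bounds)
print(shares)
```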
To overcome this limitation, a scale-independent elasticity analysis was performed. The elasticity results showed that t, Ffu, and Mu have similar relative influence, with elasticity magnitudes exceeding unity in both models (e.g., Model 2: t = 1.211, Ffu = 1.171, Mu = 1.145). A 1% change in any of these variables therefore produces a change of slightly more than 1% in the predicted Af. This finding confirms that the system behavior is governed collectively by these three variables rather than by a single dominant parameter.
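The OAT elasticity is the ratio of the relative change in output to the relative change in one input, E_k = (ΔY/Y)/(ΔX_k/X_k), evaluated at a base point while all other inputs are held fixed. The sketch below uses a central difference; the placeholder power-law model and the base point are hypothetical, chosen because the elasticity of a power law equals its exponent (about +1 for Mu and −1 for t and Ffu here), which makes the result easy to check. The sign shows the direction of the effect, while the values quoted above are magnitudes.

```python
def oat_elasticities(model, base, rel_step=1e-3):
    """Local one-at-a-time elasticities E_k = (dY/Y) / (dX_k/X_k) at `base`.

    Each input is perturbed by +/- rel_step (relative) while the others are
    held at their base values; a central difference is used.
    """
    y0 = model(base)
    elasticities = {}
    for k, xk in base.items():
        up, down = dict(base), dict(base)
        up[k] = xk * (1.0 + rel_step)
        down[k] = xk * (1.0 - rel_step)
        elasticities[k] = (model(up) - model(down)) / (2.0 * rel_step * y0)
    return elasticities


def placeholder_af(v):
    # Hypothetical power-law stand-in for Af; NOT Model 1 or Model 2.
    return v["Mu"] / (v["Ffu"] * v["t"])


base_point = {"Mu": 25.0, "t": 0.25, "Ffu": 1500.0}  # hypothetical base design
elasticities = oat_elasticities(placeholder_af, base_point)
print(elasticities)
```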
Finally, the variance-based Sobol global sensitivity analysis showed that Mu is the variable contributing most to the variance of the output, with total-order Sobol indices of approximately 0.609 and 0.633 for Model 1 and Model 2, respectively, followed by t and Ffu. Sobol analysis provides the most complete measure of global importance because it quantifies the contribution of each variable, including its interactions with the other variables, to the total output variance across the entire input range. The dominance of Mu shows that variations in this parameter explain the largest share of the output variance; its small gradient-based sensitivity is therefore misleading and reflects the scale of the variable rather than its actual importance.
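Total-order Sobol indices can be estimated by Monte Carlo sampling. The sketch below uses the Jansen estimator with two independent uniform sample sets A and B; the study does not state which estimator, sampling scheme or sample size it used, so these choices are assumptions. Applying it to Model 1 or Model 2 would require their equations and the input ranges of Table 1, which are not reproduced here, so the usage example instead checks the estimator on a linear function whose indices are known analytically.

```python
import random


def sobol_total_order(model, bounds, n=20000, seed=1):
    """Monte Carlo total-order Sobol indices (Jansen estimator).

    Inputs are sampled independently and uniformly within `bounds`.
    S_T,k = E[(f(A) - f(A_B^k))^2] / (2 Var[Y]), where A_B^k equals the
    sample set A except that input k is taken from the independent set B.
    """
    rng = random.Random(seed)
    names = list(bounds)

    def draw():
        return [{k: rng.uniform(*bounds[k]) for k in names} for _ in range(n)]

    A, B = draw(), draw()
    fA = [model(a) for a in A]
    fB = [model(b) for b in B]
    pooled = fA + fB
    mean = sum(pooled) / len(pooled)
    var = sum((y - mean) ** 2 for y in pooled) / (len(pooled) - 1)

    indices = {}
    for k in names:
        # A with input k replaced by the corresponding value from B
        fABk = [model({**a, k: b[k]}) for a, b in zip(A, B)]
        indices[k] = sum((ya - yk) ** 2 for ya, yk in zip(fA, fABk)) / (2 * n * var)
    return indices


# Sanity check on Y = x1 + 2*x2 with x1, x2 ~ U(0, 1) and an unused input x3:
# the analytical total-order indices are 0.2, 0.8 and 0.
check = sobol_total_order(lambda v: v["x1"] + 2.0 * v["x2"],
                          {"x1": (0.0, 1.0), "x2": (0.0, 1.0), "x3": (0.0, 1.0)})
print(check)
```

Unlike gradients, these indices are dimensionless fractions of output variance, so they do not depend on the units in which each input is expressed.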
The consistency among the Sobol indices, the elasticity analysis, and the mathematical structure of the symbolic equations supports the conclusion that the models are governed by physically meaningful relationships rather than by numerical artifacts or overfitting. Furthermore, the remaining variables (Fm, Pu, and Ef) showed negligible sensitivity in all three methods, confirming that they do not significantly affect the model estimates within the studied domain.
Overall, the agreement of results from different independent sensitivity analysis techniques indicates that the predictive behavior of the proposed models is primarily controlled by Mu, t, and Ffu. This confirms the internal consistency, interpretability, and physical meaning of the derived symbolic relationships.
4. Discussion
In current practice, determining the required CFRP area (Af) relies largely on the ACI 440.7R-10 and ACI 530 design provisions. However, these procedures inherently involve a laborious trial-and-error process. This iteration is not only time-consuming but also makes it practically difficult to optimize the use of expensive CFRP materials. The explicit equations obtained in this study remove this iterative burden: structural engineers can calculate the optimum reinforcement area in a single, direct step. Within the investigated design domain, this approach supports both structural safety and material economy.
To highlight the novelty of this research, the proposed method is compared with existing studies on structural strengthening in Table 9. Early research mostly relied on metaheuristic algorithms to avoid the manual application of design provisions. Later, Artificial Neural Networks (ANNs) and ensemble models were widely proposed in the civil engineering literature to make rapid predictions and reduce design iterations. However, as Table 9 shows, these machine learning models are mostly opaque “black box” systems that depend on complex trained weight matrices evaluated in a computational environment. This makes their verification and the assessment of their reliability difficult, especially in structural engineering applications where life safety is critical.
Unlike previous black-box methods, the proposed PySR approach transforms highly nonlinear structural-mechanics relationships into transparent, analytical “white-box” formulas. Once obtained, these formulas do not depend on any software environment. From a practical standpoint, the two proposed models give engineers a flexible design tool. Model 1 (the balanced equation) consists only of basic arithmetic operations and is therefore suitable for rapid preliminary sizing, field checks, and manual calculations. In contrast, Model 2 (the accuracy-focused equation) includes logarithmic and square-root functions and achieves a very high accuracy (R2 = 0.997), which makes it particularly suitable for the final design phase. Although mathematically more complex, it can easily be implemented in standard engineering spreadsheets (e.g., Microsoft Excel) or automated structural design software, enabling accurate and material-efficient designs.
The physical validity of the proposed transparent models is supported by the study’s findings, which are consistent with actual structural behavior. As the sensitivity analysis in Section 3.4 shows, the PySR algorithm automatically excluded the axial load (Pu) and the masonry compressive strength (Fm) from the final explicit equations. This outcome is physically meaningful and consistent with the mechanics of CFRP flexural strengthening: the required CFRP area primarily represents a tensile demand associated with bending.
While the axial load (Pu) introduces a pre-compression effect that can delay tensile cracking, its influence within the investigated design domain is relatively limited compared to the dominant effect of the ultimate moment (Mu). Furthermore, due to ACI-based design constraints that limit the maximum compressive strain to prevent brittle masonry crushing, the feasible solutions are inherently restricted to tension-controlled behavior. Consequently, the magnitude of Fm does not govern the required CFRP area within this framework.
Accordingly, the models assign the highest sensitivity to the ultimate moment demand (Mu), the CFRP rupture strength (Ffu′), and the wall thickness (t). In such systems, the tensile demand is resisted primarily by the CFRP reinforcement, which makes Pu and Fm secondary. Consistent with established structural mechanics, the required CFRP area increases with increasing moment demand and decreases with increasing CFRP strength, in direct agreement with the ACI-based design provisions. These results indicate that the proposed equations are not merely data-driven curve fits but physically interpretable and reliable representations of structural behavior.
However, an inherent limitation of this data-driven approach is that the applicability of the derived equations is strictly bounded by the parameter ranges of the generated dataset (detailed in Table 1) and by the assumptions of the underlying ACI codes. Extrapolating the equations beyond this design domain, for example to substantially different material strengths, wall dimensions, or loading conditions, should therefore be done with caution.
Beyond the physical interpretation, the resulting equations generalize well. In the validation on independent, previously unseen data (Section 3.3), the accuracy-focused equation (Model 2) showed an average error of only 2.10%, and the balanced equation (Model 1) an average error of 3.26%, which is acceptable for engineering applications. These small deviations suggest that the models do not merely memorize the training data but capture the underlying structural-mechanics relationships. Consequently, the method provides structural engineers with a reliable, computationally inexpensive tool based on explicit mathematical expressions, contributing to safe, economical, and code-compliant strengthening of unreinforced masonry structures.
5. Conclusions
This study presents a novel “white-box” artificial intelligence framework that combines the Jaya optimization algorithm with symbolic regression (PySR) to derive explicit closed-form design equations for the flexural strengthening of unreinforced masonry walls using CFRP. By generating a comprehensive dataset consisting of 1300 optimized design scenarios, the proposed approach eliminates the iterative trial-and-error procedures required by ACI guidelines while overcoming the lack of transparency associated with conventional black-box machine learning models.
The developed explicit equations demonstrated very high predictive accuracy across all statistical measures. The accuracy-focused model (Model 2) achieved a testing R2 of 0.9978, RMSE of 0.3186, MAE of 0.2248, and MAPE of only 1.32%. The balanced model (Model 1) also exhibited strong predictive performance, with a testing R2 of 0.9927 and a MAPE of 2.85%. These results confirm the ability of symbolic regression to accurately capture complex nonlinear relationships governing CFRP strengthening behavior.
Validation using ten completely independent structural design scenarios further confirmed the robustness and generalization capability of the proposed models. Model 2 predicted the optimum CFRP area with an average error of 2.10%, while Model 1 maintained a practical error level of 3.26%. The maximum observed deviation was 6.82% (Model 1, Case 8), which falls within acceptable engineering tolerances. These findings indicate that the proposed equations capture the underlying structural mechanics rather than merely fitting the training data.
The sensitivity analyses (gradient-based, elasticity, and Sobol) showed that the PySR algorithm effectively identifies the governing structural parameters. The resulting models consistently showed that the ultimate moment demand (Mu), wall thickness (t), and CFRP rupture strength (Ffu′) are the dominant variables controlling the required CFRP area. In contrast, axial load (Pu) and masonry compressive strength (Fm) were automatically excluded due to their negligible influence within the investigated design domain. This behavior is consistent with the tension-controlled flexural response of CFRP-strengthened masonry walls.
In addition to their predictive accuracy, the proposed models offer significant practical advantages. The derived equations exhibit relatively low mathematical complexity (18 for Model 1 and 17 for Model 2), enabling straightforward implementation in both manual calculations and standard engineering software environments. Model 1 is particularly suitable for rapid preliminary design and field applications due to its simplicity, while Model 2 provides high-precision predictions for final design and optimization purposes.
Overall, the proposed approach transforms a large set of optimization-based design results into direct analytical expressions, achieving high predictive performance (R2 > 0.99) while significantly reducing computational effort and design time compared to traditional code-based iterative procedures. This contributes to the development of fast, reliable, and interpretable design tools for structural engineers.
Future studies will focus on extending the proposed framework to shear strengthening applications, investigating different composite materials such as GFRP and BFRP, and analyzing the behavior of masonry walls under out-of-plane loading conditions. Furthermore, validation using independent experimental and literature-based datasets is recommended to further enhance the robustness and general applicability of the proposed models.