Prediction and Global Sensitivity Analysis of Long-Term Deflections in Reinforced Concrete Flexural Structures Using Surrogate Models

Reinforced concrete (RC) combines steel reinforcing bars, which have high tensile strength, with concrete, which has high compressive strength. Predicting the long-term deformation of RC flexural structures, and quantifying the influence of the relevant material and geometric parameters, are important for evaluating their serviceability and safety throughout their life cycles. Empirical methods for predicting the long-term deformation of RC structures are limited because it is difficult to account for all the influencing factors. In this study, four popular surrogate models, i.e., polynomial chaos expansion (PCE), support vector regression (SVR), Kriging, and radial basis function (RBF), are used to predict the long-term deformation of RC structures. The surrogate models were developed and evaluated using RC simply supported beam examples, and experimental datasets were collected for comparison with common machine learning models (back propagation neural network (BP), multilayer perceptron (MLP), decision tree (DT) and linear regression (LR)). The models were evaluated using the statistical metrics R², RAAE, RMAE, RMSE, VAF, PI, A10-index and U95. The results show that all four proposed models can effectively predict the deformation of RC structures, with PCE and SVR having the best accuracy, followed by Kriging and RBF. Moreover, in terms of RMSE, the prediction error of the surrogate models is much lower than that of the empirical method and the machine learning models. Furthermore, a PCE-based global sensitivity analysis of the material and geometric parameters affecting structural deflection is proposed. It was found that the geometric parameters are more influential than the material parameters. Additionally, there is a coupling effect between the material and geometric parameters, which jointly influence the long-term deflection of RC structures.


Introduction
Reinforced concrete (RC) structures are widely used as the primary components of civil engineering structures due to their high strength and durability. Long-term deflection is a major concern for civil engineers when designing RC structural elements and assessing their long-term serviceability [1][2][3][4]. The deflection of RC structures may increase over time due to internal factors, such as creep and shrinkage of the concrete, and external factors, such as sustained loading, elastic deformation associated with service loads, and environmental influences [5]. During the service life of RC structures, local strains within component cross-sections can reach values several times greater than the initial elastic strain; this can cause serviceability problems such as excessive deflections or crack widths, even in structures that meet code requirements. Excessive deflections can shorten the service life of RC structural elements [2] and have significant
In recent years, global sensitivity analysis has received much attention in finite element model updating and damage identification for civil engineering structures [50][51][52]. Surprisingly, global sensitivity analysis techniques for assessing the factors that affect the long-term deflection of RC beams have not yet been developed or studied.
Motivated by the above analysis, this paper uses surrogate models to predict the long-term deflection of RC flexural members, compares the accuracy of the global sensitivity indices calculated using PCE, SVR and RBF through numerical examples, and proposes a PCE-based global sensitivity analysis of the factors affecting the deflection of RC flexural members, which helps predict the long-term deflection of RC beams in advance. The proposed method provides civil engineers with a set of data-driven tools for assessing the long-term serviceability and safety of structures.
In this study, a data-driven modelling approach for long-term deflection prediction of concrete structures using surrogate models is proposed. In particular, four well-known surrogate models are used to predict the long-term deflection of RC flexural structures, namely PCE, SVR, Kriging and RBF. All surrogate models offer good transparency because they can generate explicit mathematical formulations that better describe the physical relationships between inputs and outputs. Additionally, the use of a PCE-based global sensitivity analysis of the factors influencing the long-term deflection of concrete structures is presented, which may help designers and civil engineers predict the long-term deflection of RC beams in advance.
The remainder of this article is organised as follows. Section 2 presents the theoretical basis of surrogate models. Global sensitivity analysis is presented in Section 3. In Section 4, a finite element analysis of an RC simply supported beam and the collected long-term deflection dataset of RC members are analysed. Concluding remarks and directions for future work are summarised in Section 5. For clarity, the abbreviations used in the text are listed in full in the Abbreviations section.

Theoretical Basis of Surrogate Models
Mainstream surrogate models can be divided into regression-type models (e.g., PCE and SVR), interpolation-type models (e.g., Kriging and RBF), and combinations of both [39]. A regression-type model does not pass exactly through the training samples, so a fitting error remains; in exchange, it filters out noise and experimental error in the training samples and is therefore suitable for problems with some computational noise and error. An interpolation-type model passes through all training samples and has no fitting error at those points, which makes it suitable for problems with little or no noise. Popular surrogate modelling methods in engineering include PCE, least squares SVR (LS-SVR), Kriging and RBF. The characteristics of each method are analysed below.
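The distinction between regression-type and interpolation-type fitting can be illustrated with a minimal sketch on synthetic noisy data (the sine response and noise level are assumptions for illustration, not data from this study). A low-order least-squares polynomial stands in for a regression-type model, and a polynomial of degree m - 1 through all m samples stands in for an interpolation-type model; neither is PCE, Kriging or RBF itself.

```python
import numpy as np

# Noisy samples of a smooth "true" response (synthetic, for illustration only)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2.0 * np.pi * x) + 0.05 * rng.standard_normal(x.size)

# Regression-type fit: a cubic least-squares polynomial does not pass through
# every sample, so a non-zero fitting residual remains (noise is smoothed out).
reg_coef = np.polyfit(x, y, deg=3)
reg_residual = y - np.polyval(reg_coef, x)

# Interpolation-type fit: a polynomial of degree (m - 1) through m distinct
# points reproduces every sample exactly, noise included (up to round-off).
int_coef = np.polyfit(x, y, deg=x.size - 1)
int_residual = y - np.polyval(int_coef, x)

print(f"regression max |residual|:    {np.max(np.abs(reg_residual)):.3e}")
print(f"interpolation max |residual|: {np.max(np.abs(int_residual)):.3e}")
```

The exact reproduction of noisy samples is precisely why interpolation-type models suit problems with little or no noise.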

PCE
PCE is an explicit representation of the stochastic model response as a series of orthonormal multivariate polynomials [53]. Polynomial chaos was first introduced into stochastic mechanics by Ghanem and Spanos [54] and was later extended by Xiu and Karniadakis [55] to other types of statistical distributions (e.g., uniform, beta and gamma distributions), as shown in Table 1. Typically, two approaches are used to compute the PCE coefficients: intrusive and non-intrusive. Intrusive methods require modifying the solution scheme of the deterministic governing equations of the model [56], whereas non-intrusive methods, such as projection [57] and regression [47,58], compute the PC coefficients from repeated model evaluations at a limited number of input samples.
The output of the physical model or system of interest can be expressed as a black-box function $y = F(x)$ of its input variables, and this functional relationship can be written as a polynomial chaos expansion:
$$ y = F(x) = \sum_{\alpha \in \mathbb{N}^n} \beta_\alpha \psi_\alpha(x) \tag{1} $$
where $\alpha = (\alpha_1, \dots, \alpha_n)$, $\alpha_i \ge 0$, is an $n$-dimensional multi-index, $\beta_\alpha$ is the unknown coefficient to be determined, and $\psi_\alpha$ denotes the tensor product of univariate orthogonal polynomials:
$$ \psi_\alpha(x) = \prod_{i=1}^{n} \psi^{(i)}_{\alpha_i}(x_i) $$
As shown in Table 1, different univariate orthogonal polynomial bases can be chosen for different types of input distributions: for example, the Hermite basis for Gaussian inputs and the Legendre basis for uniform inputs. In practical engineering applications, the PCE in Equation (1) is usually truncated to save computational resources, retaining only the terms whose total degree $|\alpha| = \sum_{i=1}^{n} \alpha_i$ does not exceed a given polynomial order $p$:
$$ y \approx F^{p}(x) = \sum_{\alpha \in \mathcal{A}^{p,n}} \beta_\alpha \psi_\alpha(x), \qquad \mathcal{A}^{p,n} = \{\alpha \in \mathbb{N}^n : |\alpha| \le p\} \tag{3} $$
Equation (3) is called the $p$-order full polynomial chaos expansion of the model response $y$. The total number of unknown coefficients $P$ depends on the maximum order $p$ and the input dimension $n$ as follows:
$$ P = \binom{n+p}{p} = \frac{(n+p)!}{n!\,p!} \tag{4} $$
From Equation (4), the number of terms in the truncated PCE grows rapidly as the dimension of the problem and the order of the PCE increase, which greatly increases the computational cost. Numerous experimental cases show that only a small number of the basis terms in the truncated PCE have a significant impact on the output response. Therefore, let $\mathcal{A}$ be a nonempty finite subset of $\mathbb{N}^n$; the sparse PCE is then defined as
$$ y \approx F^{\mathcal{A}}(x) = \sum_{\alpha \in \mathcal{A}} \beta_\alpha \psi_\alpha(x) \tag{5} $$
In Equation (5), the set $\mathcal{A}$ is referred to as the truncation set. A truncated PCE is called sparse if its index of sparsity $IS$ satisfies
$$ IS = \frac{\operatorname{card}(\mathcal{A})}{\operatorname{card}(\mathcal{A}^{p_{\max},n})} < 1 $$
where $p_{\max}$ corresponds to the order of the truncated PCE in Equation (5). In addition, the order of any multi-index $\alpha$ in $\mathcal{A}$ and the order of the coupling (interaction) effect among the variables are defined, respectively, as
$$ |\alpha| = \sum_{i=1}^{n} \alpha_i $$
$$ j(\alpha) = \sum_{i=1}^{n} \mathbf{1}_{\alpha_i > 0} $$
where $\mathbf{1}_{\alpha_i > 0} = 1$ if $\alpha_i > 0$ and 0 otherwise. A popular way of obtaining the PCE coefficients is to construct, from Equation (5), the objective function
$$ \min_{\beta \in \mathbb{R}^{P}} E(\beta) = \|\beta\|_1 + \lambda \|\Psi\beta - y\|_2^2 $$
and solve it using a greedy algorithm.
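Equation (4) and the truncation set of Equation (3) can be checked with a short sketch that uses only the Python standard library. The helper names are hypothetical, and n = 9 with p = 3 is an arbitrary illustration (9 matches the number of beam inputs in Section 4, but the PCE order used in this study is not stated here); the sparsity index follows the card(A)/card(A^{p_max,n}) definition above.

```python
from itertools import product
from math import comb


def total_degree_set(n, p):
    """Enumerate A^{p,n} = {alpha in N^n : |alpha| <= p} (Equation (3))."""
    return [alpha for alpha in product(range(p + 1), repeat=n) if sum(alpha) <= p]


def num_pce_terms(n, p):
    """Number of coefficients P = (n + p)! / (n! p!) (Equation (4))."""
    return comb(n + p, p)


def interaction_order(alpha):
    """Number of variables that actually appear in the basis term alpha."""
    return sum(1 for a in alpha if a > 0)


def sparsity_index(active_set, n, p_max):
    """IS = card(A) / card(A^{p_max,n}); values well below 1 indicate a sparse PCE."""
    return len(active_set) / num_pce_terms(n, p_max)


n, p = 9, 3
A_full = total_degree_set(n, p)
assert len(A_full) == num_pce_terms(n, p)   # 220 candidate basis terms

# Growth of P with the expansion order for n = 9 inputs: 10, 55, 220, 715, 2002
for order in range(1, 6):
    print(order, num_pce_terms(n, order))
```

Even at order 5, nine inputs already require 2002 coefficients in the full expansion, which is what motivates the sparse truncation set of Equation (5).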
Zhang et al. [59] comprehensively compared the accuracy and efficiency of orthogonal matching pursuit (OMP), least angle regression (LAR) and Bregman-iterative greedy coordinate descent (BGCD) for solving sparse PCE. In this paper, the BGCD algorithm is used to compute the PCE coefficients. Once a sparse PCE model of the deformation of the RC structure has been established, the global sensitivity indices can be obtained directly by post-processing the PCE coefficients.

SVR
SVM is a machine learning algorithm based on statistical learning theory, developed by Vapnik et al. [60]. It adopts the principle of structural risk minimisation and integrates techniques such as convex quadratic programming, maximum-margin hyperplane classification and Mercer kernels, which allow it to find the optimal compromise between model complexity and learning ability. This compromise helps avoid overfitting and convergence to local optima during learning. Training an SVM amounts to solving a quadratic programming problem with linear inequality constraints, which has a high training complexity and becomes very time-consuming when the sample size is large. To overcome this problem, Suykens et al. [61] modified the SVM by replacing the inequality constraints with equality constraints and minimising the squared error terms. This transforms the problem into solving a set of linear equations and improves computational efficiency; the resulting method is called the least squares support vector machine (LS-SVM). When LS-SVM is used for regression prediction and modelling, we refer to it as SVR.
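SVR is suitable for nonlinear model prediction and has a wide range of applications. For $m$ training samples $\{(x_i, y_i)\}_{i=1}^{m}$, the SVR regression function is defined as
$$ f(x) = \beta^{T}\varphi(x) + b $$
where $b$ is the bias term, $\beta$ denotes the regression coefficient vector and $\varphi(x)$ denotes the nonlinear mapping function that maps the original low-dimensional design variables to a high-dimensional feature space. Model construction can be converted into the optimisation problem
$$ \min_{\beta, e} J(\beta, e) = \frac{1}{2}\beta^{T}\beta + \frac{\gamma}{2}\sum_{i=1}^{m} e_i^2 $$
subject to the constraints
$$ y_i = \beta^{T}\varphi(x_i) + b + e_i, \quad i = 1, \dots, m $$
where $\gamma$ and $e_i$ are the penalty factor and the error variable, respectively. By introducing Lagrange multipliers $\alpha_i$, the constrained optimisation problem is transformed into
$$ L(\beta, b, e, \alpha) = J(\beta, e) - \sum_{i=1}^{m} \alpha_i \left[\beta^{T}\varphi(x_i) + b + e_i - y_i\right] $$
Setting the partial derivatives with respect to $\beta$, $b$, $e_i$ and $\alpha_i$ to zero gives
$$ \beta = \sum_{i=1}^{m} \alpha_i \varphi(x_i), \qquad \sum_{i=1}^{m} \alpha_i = 0, \qquad \alpha_i = \gamma e_i, \qquad \beta^{T}\varphi(x_i) + b + e_i - y_i = 0 $$
Eliminating $\beta$ and $e_i$ yields the following system of linear equations:
$$ \begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}, \qquad \Omega_{ij} = K(x_i, x_j) $$
where the kernel function replaces the explicit mapping of the low-dimensional inputs into the high-dimensional space:
$$ K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j) $$
Finally, the SVR regression function is
$$ f(x) = \sum_{i=1}^{m} \alpha_i K(x, x_i) + b $$
The commonly used kernel functions for constructing SVR are shown in Table 2. In this paper, a radial basis kernel function with hyperparameter $\sigma$ is used to construct the SVR, and $\sigma$ is determined by cross-validation [61]. Table 2. Some commonly used kernel functions for SVR.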
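The bordered linear system above can be solved directly with NumPy. The following is a minimal sketch, not the implementation used in this study: the Gaussian kernel is assumed to take the form exp(-||x - x'||² / (2σ²)) (Table 2 may use a different scaling), the helper names are hypothetical, and the values of γ and σ are arbitrary rather than cross-validated.

```python
import numpy as np


def rbf_kernel(X1, X2, sigma):
    """Gaussian kernel K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) (assumed form)."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def lssvr_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for b and alpha."""
    m = X.shape[0]
    A = np.zeros((m + 1, m + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(m) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]          # bias b, Lagrange multipliers alpha


def lssvr_predict(X_new, X, b, alpha, sigma):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X, sigma) @ alpha + b


# Usage on a synthetic 2-D function (illustration only)
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(40, 2))
y = X[:, 0] ** 2 + np.sin(3.0 * X[:, 1])
gamma, sigma = 1e3, 0.5
b, alpha = lssvr_fit(X, y, gamma, sigma)
y_hat = lssvr_predict(X, X, b, alpha, sigma)
```

Because α_i = γe_i, the training residuals equal α/γ, and the first row of the system enforces Σα_i = 0; both identities make convenient sanity checks. Every α_i is generally non-zero, so unlike the classical SVM the LS-SVR solution keeps all training points in the predictor.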

Kriging
As a semi-parametric model based on the statistical prediction of stochastic processes, Kriging provides a linear, unbiased, minimum-variance estimate of unknown response values in the design space by fitting a functional relationship between the sample points and their responses. Kriging was first proposed in 1951 by the South African geologist Krige while he was studying the distribution of mineral reserves. It was subsequently applied to engineering design by Sacks [62], and Simpson et al. [63,64] were the first to use it in structural design optimisation.
Kriging jointly consists of a parametric model and a non-parametric stochastic process. For a set of $m$ sample points of dimension $N$, let the sample set be $S = \{x_1, \dots, x_m\}^{T}$ with responses $y = \{y_1, \dots, y_m\}^{T}$. The relationship between them can then be expressed by the Kriging surrogate model as
$$ y(x) = f(x)^{T}\beta + z(x) $$
The first part of the equation is a linear regression of the data, shown in Equation (17), which provides a global approximation and usually consists of $p$ polynomials:
$$ f(x)^{T}\beta = \sum_{i=1}^{p} \beta_i f_i(x) \tag{17} $$
The second part, $z(x)$, is a stochastic process that provides a local approximation. $z(x)$ is assumed to be a stationary Gaussian process following the normal distribution $N(0, \sigma^2)$, with non-zero covariance between different points, generally expressed as
$$ \operatorname{Cov}[z(x_i), z(x_j)] = \sigma^2 R(\theta, x_i, x_j) $$
where $\theta$ is the correlation function parameter and $R(\theta, x_i, x_j)$ is the spatial correlation function between any two sample points $x_i$ and $x_j$. The correlation function plays a dominant role in the fitting accuracy of the model; the Gaussian correlation model is commonly used:
$$ R(\theta, x_i, x_j) = \exp\left(-\sum_{k=1}^{N} \theta_k d_k^2\right), \qquad d_k = x_{i,k} - x_{j,k} $$
where $d_k^2$ is the square of the distance between the two sample points in the $k$-th dimension, $N$ is the number of input dimensions, and $\theta_k$ is the decay rate controlling the correlation in each dimension. The correlation matrix of a sample set of $m$ points is
$$ R = \begin{bmatrix} R(\theta, x_1, x_1) & \cdots & R(\theta, x_1, x_m) \\ \vdots & \ddots & \vdots \\ R(\theta, x_m, x_1) & \cdots & R(\theta, x_m, x_m) \end{bmatrix} $$
Under this model, the likelihood of the true responses at the sample points is
$$ L(\beta, \sigma^2, \theta) = \frac{1}{(2\pi\sigma^2)^{m/2}\,|R|^{1/2}} \exp\left[-\frac{(y - F\beta)^{T} R^{-1} (y - F\beta)}{2\sigma^2}\right] $$
where $F$ is the matrix of $f(x)$ vector values at the sample points. Applying the maximum likelihood rule gives
$$ \hat{\beta} = (F^{T}R^{-1}F)^{-1}F^{T}R^{-1}y, \qquad \hat{\sigma}^2 = \frac{1}{m}(y - F\hat{\beta})^{T}R^{-1}(y - F\hat{\beta}) $$
Substituting these estimates, the logarithm of the likelihood function becomes, up to a constant,
$$ \ln L(\theta) = -\frac{m}{2}\ln\hat{\sigma}^2 - \frac{1}{2}\ln|R| $$
Maximising $\ln L(\theta)$ with an optimisation algorithm determines the decay rates $\theta_k$ in each dimension, which completes the construction of the Kriging model. The Kriging prediction at an unknown point $x_0$ is
$$ \hat{y}(x_0) = f(x_0)^{T}\hat{\beta} + r^{T}(x_0)R^{-1}(y - F\hat{\beta}) $$
where $r(x_0)$ is the vector of correlations between the unknown point and each sample point:
$$ r(x_0) = \left[R(\theta, x_0, x_1), \dots, R(\theta, x_0, x_m)\right]^{T} $$
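A compact NumPy sketch of the Kriging equations above follows. It assumes a constant regression term f(x) = 1 (ordinary Kriging), adds a tiny nugget to R for numerical stability, and, instead of a general optimisation algorithm, selects a single isotropic θ by grid search over the concentrated log-likelihood. These are simplifications for illustration, and the helper names are hypothetical; they are not the settings used in this study.

```python
import numpy as np


def gauss_corr(X1, X2, theta):
    """Gaussian correlation R(theta, x, x') = exp(-sum_k theta_k (x_k - x'_k)^2)."""
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-np.sum(d2 * theta, axis=-1))


def kriging_fit(X, y, theta, nugget=1e-10):
    m = X.shape[0]
    R = gauss_corr(X, X, theta) + nugget * np.eye(m)
    F = np.ones((m, 1))                                  # constant trend f(x) = 1
    Ri_F, Ri_y = np.linalg.solve(R, F), np.linalg.solve(R, y)
    beta = np.linalg.solve(F.T @ Ri_F, F.T @ Ri_y)       # generalised least squares
    resid = y - F @ beta
    gamma = np.linalg.solve(R, resid)                    # R^{-1} (y - F beta)
    sigma2 = resid @ gamma / m                           # MLE of the process variance
    _, logdet = np.linalg.slogdet(R)
    neg_ln_L = 0.5 * (m * np.log(sigma2) + logdet)       # -ln L(theta), constants dropped
    return {"X": X, "theta": theta, "beta": beta, "gamma": gamma, "neg_ln_L": neg_ln_L}


def kriging_predict(model, X0):
    """y_hat(x0) = f(x0)^T beta + r(x0)^T R^{-1} (y - F beta)."""
    r = gauss_corr(X0, model["X"], model["theta"])
    return model["beta"][0] + r @ model["gamma"]


# Usage: 1-D synthetic example, isotropic theta chosen by grid search
X = np.linspace(0.0, 1.0, 8)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
fits = [kriging_fit(X, y, np.array([t])) for t in np.logspace(1.0, 2.5, 10)]
best = min(fits, key=lambda mdl: mdl["neg_ln_L"])
x_new = np.array([[0.3], [0.55]])
print(best["theta"], kriging_predict(best, x_new))
```

With a negligible nugget, the predictor passes through the training responses, which is the interpolating behaviour attributed to Kriging in this section.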

RBF
RBF is a radially symmetric function, and RBF interpolation has the advantages of a simple form, adaptability and accuracy. By Micchelli's theorem [65], the interpolation matrix built from commonly used radial basis functions, such as the multiquadric, is nonsingular for distinct sample points, so the RBF interpolation problem always has a unique solution. Franke [66] interpolated a large amount of scattered data using various interpolation methods and verified that methods based on radial basis functions were the most effective.
Suppose the function $y = f(x)$ is an $N$-dimensional real-valued function, $m$ sample points are selected using an experimental design method, and the set of these sample points is denoted as $S = \{x_1, x_2, \dots, x_m\}^{T}$. The response vector $y = \{f(x_1), \dots, f(x_m)\}^{T}$ is obtained by evaluating the function at each sample point. The radial basis function model $\tilde{y}(x)$ is then used to fit $y$ in the following form:
$$ \tilde{y}(x) = \sum_{i=1}^{m} \lambda_i \phi\left(\|x - x_i\|\right) \tag{27} $$
where $\lambda_i$ is the coefficient to be determined for the $i$-th basis function and $\|x - x_i\|$ is the Euclidean distance between the prediction point $x$ and the sample point $x_i$. $\phi(\cdot)$ denotes the radial basis function; commonly used forms are listed in Table 3, where $r = \|x - x_i\|$.

Table 3. Some commonly used basis functions for RBF.
Types | Expressions
Multiquadric basis (MQ) | $\phi(r) = \sqrt{r^2 + c^2}$, where $c$ is a shape parameter

Substituting the $m$ sample points and response values into Equation (27) gives
$$ \sum_{i=1}^{m} \lambda_i \phi\left(\|x_j - x_i\|\right) = f(x_j), \quad j = 1, \dots, m \tag{28} $$
This system has $m$ equations and $m$ unknowns. For ease of presentation, Equation (28) is written in matrix form as
$$ \Phi\lambda = y, \qquad \Phi_{ji} = \phi\left(\|x_j - x_i\|\right) $$
Materials 2023, 16, 4671

Once the sample points and response values are given, the coefficient vector $\lambda$ can be determined by the least squares method; for distinct sample points, $\Phi$ is square and nonsingular, so this reduces to $\lambda = \Phi^{-1}y$. Equation (27) shows that RBF describes the complex implicit functional relationship between the structural response and the structural parameters through a linear combination of basis functions. As the number of parameters to be updated increases, the number of sample points required to determine the coefficients grows only linearly with it. Thus, compared with PCE, Kriging and SVR, RBF has the advantage of lower computational cost when fitting unknown problems.
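Equations (27) and (28) translate directly into a few lines of NumPy. This minimal sketch assumes the multiquadric basis with an arbitrary shape parameter c = 0.5 and solves the square system Φλ = y directly, which for distinct sample points gives the same λ as least squares. The helper names are hypothetical.

```python
import numpy as np


def multiquadric(r, c=0.5):
    """Multiquadric basis phi(r) = sqrt(r^2 + c^2); c is a shape parameter."""
    return np.sqrt(r ** 2 + c ** 2)


def pairwise_dist(X1, X2):
    """Euclidean distances ||x - x_i|| between every row of X1 and every row of X2."""
    return np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1)


def rbf_fit(X, y, c=0.5):
    """Solve Phi lambda = y with Phi_ji = phi(||x_j - x_i||) (Equation (28))."""
    return np.linalg.solve(multiquadric(pairwise_dist(X, X), c), y)


def rbf_predict(X_new, X, lam, c=0.5):
    """y~(x) = sum_i lambda_i phi(||x - x_i||) (Equation (27))."""
    return multiquadric(pairwise_dist(X_new, X), c) @ lam


# Usage on a synthetic 5 x 5 grid in the unit square (illustration only)
g = np.linspace(0.0, 1.0, 5)
X = np.array([(a, b) for a in g for b in g])
y = np.exp(-X[:, 0]) * np.cos(3.0 * X[:, 1])
lam = rbf_fit(X, y)
print(rbf_predict(np.array([[0.4, 0.6]]), X, lam))
```

Fitting costs a single m × m solve, which is the source of the low computational cost noted above; the shape parameter c trades smoothness against the conditioning of Φ.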

Sobol' Decomposition
Let us consider a mathematical model with an $n \times m$ input $x$, consisting of $m$ samples of $n$ variables, and an $m \times 1$ output $y$:
$$ y = f(x) $$
where the input variables are defined on the $n$-dimensional unit cube $K^n$:
$$ K^n = \{x \mid 0 \le x_i \le 1,\ i = 1, \dots, n\} $$
The Sobol' decomposition expresses $f(x)$ as a sum of terms of increasing dimensionality [44]:
$$ f(x) = f_0 + \sum_{i=1}^{n} f_i(x_i) + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j) + \dots + f_{1,2,\dots,n}(x_1, \dots, x_n) \tag{32} $$
where the constant $f_0$ is the mean value of the function:
$$ f_0 = \int_{K^n} f(x)\,dx \tag{33} $$
The sum in Equation (32) contains $\sum_{j=1}^{n} \binom{n}{j} = 2^n - 1$ summands. Each summand $f_{i_1,\dots,i_s}(x_{i_1},\dots,x_{i_s})$ integrates to zero over any of its own variables, and the summands are mutually orthogonal [43]:
$$ \int_0^1 f_{i_1,\dots,i_s}(x_{i_1},\dots,x_{i_s})\,dx_{i_k} = 0, \quad 1 \le k \le s $$
$$ \int_{K^n} f_{i_1,\dots,i_s}\,f_{j_1,\dots,j_t}\,dx = 0 \quad \text{for } \{i_1,\dots,i_s\} \ne \{j_1,\dots,j_t\} \tag{35} $$
Each summand can be written as the difference between a multidimensional integral and lower-order summands; for example, $f_i(x_i) = \int_{K^{n-1}} f(x)\,dx_{\sim i} - f_0$, where $dx_{\sim i}$ denotes integration over all variables except $x_i$. Now consider the input parameters $X = \{X_1, \dots, X_n\}$ as independent random variables uniformly distributed in $[0, 1]$. The model response $Y = f(X)$ is then a random variable whose total variance $D$ is
$$ D = \int_{K^n} f^2(x)\,dx - f_0^2 \tag{38} $$
By integrating the square of Equation (32) and using Equation (35), the total variance in Equation (38) can be decomposed as
$$ D = \sum_{i=1}^{n} D_i + \sum_{1 \le i < j \le n} D_{ij} + \dots + D_{1,2,\dots,n} \tag{39} $$
where the partial variances are
$$ D_{i_1,\dots,i_s} = \int_{K^s} f_{i_1,\dots,i_s}^2(x_{i_1},\dots,x_{i_s})\,dx_{i_1} \cdots dx_{i_s} $$
The Sobol' indices are defined as
$$ S_{i_1,\dots,i_s} = \frac{D_{i_1,\dots,i_s}}{D} $$
From this definition and Equation (39), it follows that
$$ \sum_{i=1}^{n} S_i + \sum_{1 \le i < j \le n} S_{ij} + \dots + S_{1,2,\dots,n} = 1 $$
Thus, each index $S_{i_1,\dots,i_s}$ is a sensitivity measure describing how much of the total variance is due to uncertainty in the set of input parameters $\{i_1, \dots, i_s\}$. The first-order indices $S_i$ give the effect of each parameter acting individually on the output, the second-order indices $S_{ij}$ describe the coupling effect of variables $x_i$ and $x_j$ on the output, and the higher-order indices describe the effects of interactions among three or more parameters.
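The total sensitivity indices $S_{T_i}$ are defined to evaluate the total effect of an input variable [43]. They are the sum of all partial sensitivity indices $S_{i_1,\dots,i_s}$ that involve parameter $i$:
$$ S_{T_i} = \sum_{\mathcal{I}_i} S_{i_1,\dots,i_s}, \qquad \mathcal{I}_i = \{(i_1,\dots,i_s) : \exists k,\ 1 \le k \le s,\ i_k = i\} $$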
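As a quick check of these definitions, consider the additive function $f(x_1, x_2) = x_1 + 2x_2$ with independent inputs uniform on $[0, 1]$ (a textbook illustration, not a model from this study). Because the function has no interaction term, all of the variance is first-order and the total indices equal the first-order ones:
$$ f_0 = \tfrac{1}{2} + 1 = \tfrac{3}{2}, \qquad f_1(x_1) = x_1 - \tfrac{1}{2}, \qquad f_2(x_2) = 2x_2 - 1, \qquad f_{12} = 0 $$
$$ D_1 = \int_0^1 \left(x_1 - \tfrac{1}{2}\right)^2 dx_1 = \tfrac{1}{12}, \qquad D_2 = \int_0^1 (2x_2 - 1)^2\,dx_2 = \tfrac{1}{3}, \qquad D = D_1 + D_2 = \tfrac{5}{12} $$
$$ S_1 = \tfrac{1}{5}, \qquad S_2 = \tfrac{4}{5}, \qquad S_{12} = 0, \qquad S_{T_1} = S_1, \qquad S_{T_2} = S_2 $$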

Global Sensitivity Analysis Based on Monte Carlo Simulation
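The traditional method for computing global sensitivity indices is Monte Carlo simulation (MCS). Based on Equations (33) and (39), the mean, total variance and partial variances can be estimated from $N_{MC}$ samples as
$$ \hat{f}_0 = \frac{1}{N_{MC}}\sum_{k=1}^{N_{MC}} f(x_k) $$
$$ \hat{D} = \frac{1}{N_{MC}}\sum_{k=1}^{N_{MC}} f^2(x_k) - \hat{f}_0^2 $$
$$ \hat{D}_{i_1,\dots,i_s} = \frac{1}{N_{MC}}\sum_{k=1}^{N_{MC}} f\left(x_k^{(1)}\right) f\left(x_{(\sim i),k}^{(2)}, x_{(i),k}^{(1)}\right) - \hat{f}_0^2 \tag{46} $$
where $x_{(i)}$ denotes the subvector of variables $\{x_{i_1}, \dots, x_{i_s}\}$ and $x_{(\sim i)}$ its complement. The superscripts (1) and (2) in Equation (46) indicate that two independent samples are generated and mixed. A similar expression allows the total sensitivity indices $S_{T_i}$ to be estimated in a single pass:
$$ \hat{S}_{T_i} = 1 - \frac{\hat{D}_{\sim i}}{\hat{D}}, \qquad \hat{D}_{\sim i} = \frac{1}{N_{MC}}\sum_{k=1}^{N_{MC}} f\left(x_k^{(1)}\right) f\left(x_{(\sim i),k}^{(1)}, x_{(i),k}^{(2)}\right) - \hat{f}_0^2 $$
As described above, global sensitivity analysis requires no assumptions about the model (e.g., linearity or monotonicity). In practice, analysts usually calculate first-order and total sensitivity indices, and sometimes second-order indices. However, computing the full set of sensitivity indices with MCS requires evaluating $2^n$ integrals, which is not practically feasible unless $n$ is small. Recent work has been devoted to further reducing the computational cost of evaluating Sobol' indices (see [46]); nevertheless, the computational cost of evaluating all indices through MCS remains an issue.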
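A vectorised pick-freeze sketch of Monte Carlo Sobol' estimation follows. Note that it swaps in the Saltelli (2010) first-order and Jansen total-effect estimators, which are common lower-variance relatives of the estimators in Equation (46), rather than reproducing those formulas exactly; inputs are assumed independent and uniform on the unit hypercube, and the function names are hypothetical. The cost is N(n + 2) model runs, which is why MCS becomes expensive when each run is a finite element analysis.

```python
import numpy as np


def sobol_mc(model, n_vars, n_samples, rng):
    """Monte Carlo first-order (S) and total (ST) Sobol' indices by pick-freeze sampling."""
    A = rng.random((n_samples, n_vars))          # sample (1)
    B = rng.random((n_samples, n_vars))          # independent sample (2)
    fA, fB = model(A), model(B)
    f_all = np.concatenate([fA, fB])
    f0, D = np.mean(f_all), np.var(f_all)        # mean and total variance estimates
    S, ST = np.empty(n_vars), np.empty(n_vars)
    for i in range(n_vars):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                      # column i from (2), the rest from (1)
        fABi = model(ABi)
        # Saltelli (2010) first-order estimator; centring fB reduces its variance
        S[i] = np.mean((fB - f0) * (fABi - fA)) / D
        # Jansen total-effect estimator
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / D
    return S, ST


def additive(X):
    """Test function x1 + 2*x2 with exact indices S = ST = [0.2, 0.8]."""
    return X[:, 0] + 2.0 * X[:, 1]


S, ST = sobol_mc(additive, n_vars=2, n_samples=100_000, rng=np.random.default_rng(0))
print(S, ST)
```

The statistical error shrinks only as 1/√N, so tight estimates of many indices quickly demand hundreds of thousands of model runs.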

Global Sensitivity Analysis Based on PCE
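Define the multi-index set
$$ \mathcal{L}_{i_1,\dots,i_s} = \{\alpha \in \mathcal{A} : \alpha_k > 0 \Leftrightarrow k \in \{i_1,\dots,i_s\}\} $$
which collects the PCE terms that depend exactly on the variables $x_{i_1}, \dots, x_{i_s}$. The PCE in Equation (5) can then be rewritten as
$$ F^{\mathcal{A}}(x) = \beta_0 + \sum_{i=1}^{n}\sum_{\alpha\in\mathcal{L}_i}\beta_\alpha\psi_\alpha(x_i) + \sum_{1\le i<j\le n}\sum_{\alpha\in\mathcal{L}_{ij}}\beta_\alpha\psi_\alpha(x_i,x_j) + \dots + \sum_{\alpha\in\mathcal{L}_{1,\dots,n}}\beta_\alpha\psi_\alpha(x_1,\dots,x_n) \tag{51} $$
Owing to the orthogonality of the PC basis, the mean, total variance and partial variances of the response follow directly from Equation (51):
$$ \mu^{\mathcal{A}} = \beta_0, \qquad D^{\mathcal{A}} = \sum_{\alpha\in\mathcal{A},\,\alpha\ne 0}\beta_\alpha^2\,\mathbb{E}[\psi_\alpha^2], \qquad D^{\mathcal{A}}_{i_1,\dots,i_s} = \sum_{\alpha\in\mathcal{L}_{i_1,\dots,i_s}}\beta_\alpha^2\,\mathbb{E}[\psi_\alpha^2] $$
The PCE-based global sensitivity indices $S^{\mathcal{A}}_{i_1,\dots,i_s}$ and $S^{T,\mathcal{A}}_i$ are therefore
$$ S^{\mathcal{A}}_{i_1,\dots,i_s} = \frac{D^{\mathcal{A}}_{i_1,\dots,i_s}}{D^{\mathcal{A}}}, \qquad S^{T,\mathcal{A}}_i = \frac{1}{D^{\mathcal{A}}}\sum_{\alpha\in\mathcal{L}^{T}_i}\beta_\alpha^2\,\mathbb{E}[\psi_\alpha^2], \qquad \mathcal{L}^{T}_i = \{\alpha\in\mathcal{A} : \alpha_i > 0\} $$
Hence, once a PCE of the response of interest has been built, the global sensitivity indices can be calculated analytically from the PCE coefficients, which significantly reduces the computational cost.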
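Once the multi-indices and coefficients are available, the post-processing step takes only a few lines. This sketch assumes an orthonormal basis, so that E[ψ_α²] = 1; the function name is hypothetical, and the example expansion (y = x₁ + 2x₂ + x₁x₂ with x_i ~ U(-1, 1) in normalised Legendre polynomials, ψ₁(x) = √3·x) is a test case whose indices can be verified by hand.

```python
import numpy as np


def pce_sobol(multi_indices, coeffs):
    """Sobol' indices from an orthonormal PCE y = sum_alpha beta_alpha psi_alpha(x).

    With an orthonormal basis E[psi_alpha^2] = 1, so each non-constant term
    contributes beta_alpha^2 to the variance; the constant term is the mean.
    """
    alphas = np.asarray(multi_indices)
    beta = np.asarray(coeffs, dtype=float)
    nonconst = alphas.sum(axis=1) > 0
    mean = beta[~nonconst].sum()
    D = np.sum(beta[nonconst] ** 2)                       # total variance D^A
    n = alphas.shape[1]
    first, total = np.zeros(n), np.zeros(n)
    for i in range(n):
        active = alphas[:, i] > 0                         # set L_i^T
        only_i = active & (np.count_nonzero(alphas, axis=1) == 1)   # set L_i
        first[i] = np.sum(beta[only_i] ** 2) / D
        total[i] = np.sum(beta[active] ** 2) / D
    return mean, D, first, total


# y = x1 + 2*x2 + x1*x2, x1, x2 ~ U(-1, 1); x = psi_1(x)/sqrt(3), x1*x2 = psi_(1,1)/3
alphas = [(0, 0), (1, 0), (0, 1), (1, 1)]
coeffs = [0.0, 1.0 / np.sqrt(3.0), 2.0 / np.sqrt(3.0), 1.0 / 3.0]
mean, D, S, ST = pce_sobol(alphas, coeffs)
print(S, ST)   # S -> [0.1875, 0.75], ST -> [0.25, 0.8125]
```

The only model evaluations needed are those used to fit the PCE; the indices themselves cost nothing extra.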
Wu et al. [49] and Cheng et al. [48] derived methods for calculating global sensitivity indices based on RBF and SVR, respectively. Consider the Ishigami function, which is highly nonlinear and non-monotonic and is widely used for benchmarking global sensitivity analysis methods:
$$ f(x) = \sin x_1 + a\sin^2 x_2 + b\,x_3^4\sin x_1, \qquad x_i \sim \mathcal{U}(-\pi, \pi),\ i = 1, 2, 3 $$
The sensitivity indices of the model response can be calculated analytically, as in [67]. Here, they are approximated by post-processing the PCE, RBF and SVR models of the response according to [48,49,59]. Table 4 shows that, for the same number of model evaluations, PCE, SVR and RBF all yield accurate sensitivity indices, with the values calculated by PCE being closest to the theoretical ones. Therefore, PCE was subsequently used for the global sensitivity analysis of the variables affecting long-term deflection.
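For reference, the analytical Ishigami indices follow from closed-form partial variances, of which only D₁, D₂ and D₁₃ are non-zero. The sketch below assumes the common choice a = 7 and b = 0.1; the values of a and b used for Table 4 are not given in this section, and the function names are hypothetical.

```python
from math import pi, sin


def ishigami(x1, x2, x3, a=7.0, b=0.1):
    """Ishigami function; inputs are assumed uniform on [-pi, pi]."""
    return sin(x1) + a * sin(x2) ** 2 + b * x3 ** 4 * sin(x1)


def ishigami_indices(a=7.0, b=0.1):
    """Analytical first-order and total Sobol' indices of the Ishigami function."""
    D1 = 0.5 * (1.0 + b * pi ** 4 / 5.0) ** 2
    D2 = a ** 2 / 8.0
    D13 = b ** 2 * pi ** 8 * (1.0 / 18.0 - 1.0 / 50.0)
    D = D1 + D2 + D13                    # D3 = D12 = D23 = D123 = 0
    return {
        "S1": D1 / D, "S2": D2 / D, "S3": 0.0,
        "ST1": (D1 + D13) / D, "ST2": D2 / D, "ST3": D13 / D,
    }


# S1 ≈ 0.3139, S2 ≈ 0.4424, S3 = 0, ST1 ≈ 0.5576, ST2 ≈ 0.4424, ST3 ≈ 0.2437
print({k: round(v, 4) for k, v in ishigami_indices().items()})
```

The zero first-order index of x₃ alongside its non-zero total index mirrors the behaviour later reported for d₂ and a in the beam example: a variable can matter only through coupling.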

Numerical and Experimental Validations
In this section, we first establish a finite element model of an RC simply supported beam to verify the feasibility of surrogate models for deflection prediction. Second, the accuracy of different surrogate models in calculating global sensitivity indices is investigated through a numerical example, and the main variables affecting the maximum deflection of RC simply supported beams are identified through a global sensitivity analysis of the geometric and material variables. Finally, long-term deflection prediction and global sensitivity analysis of RC flexural members are carried out on the collected experimental dataset to further validate the feasibility of surrogate models for civil engineering structures. Figure 1 illustrates the process of predicting the deflections of RC flexural structures and analysing their global sensitivity using the PCE, SVR, Kriging and RBF surrogate models. The process consists of three main steps. In step 1, the dataset is normalised. In step 2, the data are randomly split into training and testing sets; the training sets are used to train the surrogate models, and the testing sets are used to evaluate them. In step 3, a global sensitivity analysis of the factors affecting the deflection of the RC flexural structure is performed. All experiments were conducted on a desktop computer running Windows 10, equipped with an Intel(R) Core(TM) i7-9700 CPU @ 3.00 GHz and 16 GB of DDR4-2666 RAM.
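Steps 1 and 2 can be sketched as follows. Min-max scaling to [0, 1] and an 80/20 split are assumptions for illustration, since the normalisation method and split ratio are not specified in this section, and the helper names are hypothetical. Unlike the step order above, the sketch splits first and then normalises both sets with bounds taken from the training set only, so that no information from the test data leaks into the model.

```python
import numpy as np


def random_split(X, y, test_fraction=0.2, seed=0):
    """Randomly split samples into training and testing sets (step 2)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]


def minmax_fit(X_train):
    """Column-wise bounds for min-max normalisation (step 1), from training data."""
    return X_train.min(axis=0), X_train.max(axis=0)


def minmax_apply(X, lo, hi):
    """Map each column to [0, 1] using the training-set bounds."""
    return (X - lo) / (hi - lo)


# Usage with a synthetic dataset of 9 input variables (illustration only)
rng = np.random.default_rng(4)
X = rng.uniform(size=(100, 9))
y = X.sum(axis=1)
X_tr, X_te, y_tr, y_te = random_split(X, y)
lo, hi = minmax_fit(X_tr)
X_tr_n, X_te_n = minmax_apply(X_tr, lo, hi), minmax_apply(X_te, lo, hi)
```

Test inputs scaled with training bounds may fall slightly outside [0, 1]; this is expected and harmless.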

Predictive Accuracy Measures
In this paper, the coefficient of determination R², relative average absolute error (RAAE), relative maximum absolute error (RMAE) and root mean square error (RMSE), which are widely used to assess the accuracy of surrogate models [68][69][70], together with the variance accounted for (VAF), performance index (PI), A10-index and uncertainty analysis (U95), are used as evaluation criteria for prediction performance. They are defined as follows: where $f(x_i)$ and $\hat{f}(x_i)$ are the observed and simulated values, respectively, $\bar{f}$ is the mean of the observed values and $nt$ is the number of samples. Additionally, $m_{10}$ is the number of records whose ratio of measured to predicted value lies between 0.9 and 1.1. The closer R² and the A10-index are to 1, the better the agreement between the actual and predicted values, whereas smaller RAAE, RMAE and RMSE and larger VAF indicate more reliable predictions.
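A minimal sketch of the metrics with standard definitions follows. The formulas are assumptions based on common usage in the surrogate-modelling literature (RAAE and RMAE normalised by the standard deviation of the observations, VAF in percent, A10 as m₁₀/nt), because the defining equations are not reproduced here. PI and U95 are omitted since their exact forms in this study cannot be confirmed from the text, and the function name is hypothetical.

```python
import numpy as np


def accuracy_metrics(y_obs, y_pred):
    """R2, RAAE, RMAE, RMSE, VAF and A10-index (common definitions, assumed)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_obs - y_pred
    nt = y_obs.size
    std = np.std(y_obs)                                  # spread of the observations
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    raae = np.sum(np.abs(err)) / (nt * std)
    rmae = np.max(np.abs(err)) / std
    rmse = np.sqrt(np.mean(err ** 2))
    vaf = (1.0 - np.var(err) / np.var(y_obs)) * 100.0    # in percent
    ratio = y_obs / y_pred                               # measured / predicted
    a10 = np.count_nonzero((ratio >= 0.9) & (ratio <= 1.1)) / nt   # m10 / nt
    return {"R2": r2, "RAAE": raae, "RMAE": rmae, "RMSE": rmse, "VAF": vaf, "A10": a10}


m = accuracy_metrics([1.0, 2.0, 3.0, 4.0], [1.05, 1.9, 3.3, 4.0])
print(m)
```

RMSE carries the units of the response, whereas R², RAAE, RMAE, VAF and A10 are dimensionless, which makes the latter easier to compare across datasets.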

Description of RC Simply Supported Beam Parameters
In this subsection, an RC simply supported beam structure is analysed. The structure and the reinforcement arrangement of its section are shown in Figure 2. The concrete strength grade is C30, and the elastic moduli of concrete and reinforcement are $E_c$ and $E_s$, respectively. The width and height of the beam section are denoted as $B$ and $H$, respectively, the thickness of the concrete protective layer is denoted as $a$, and the span length of the structure is $L$. A load, denoted as $F$, is applied at the midpoint of the structure. The distribution parameters of all 9 input variables are listed in Table 5. The output $Y$ is the midpoint deflection of the RC beam. Additionally, the linear relationship between two parameters can be quantified using the Pearson correlation coefficient (PCC):
$$ \mathrm{PCC}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X\,\sigma_Y} $$
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively, and $\operatorname{cov}(X, Y)$ is the covariance between $X$ and $Y$. As shown in Figure 3, high positive or negative correlation coefficients affect the accuracy of the model and make it difficult to explain the effect of the input parameters on the target parameter. It can be seen that the correlations between $Y$ and $H$, $L$ and $B$ are very high, whereas the PCCs between the other variables are quite small.
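A PCC matrix like the one in Figure 3 can be reproduced from any sample matrix. The sketch below computes the PCC from its definition and cross-checks it against NumPy's built-in correlation matrix; the synthetic samples and the toy response are placeholders, not the Table 5 distributions or the beam model, and the function name is hypothetical.

```python
import numpy as np


def pcc(x, y):
    """PCC(X, Y) = cov(X, Y) / (sigma_X * sigma_Y), population moments throughout."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())


# Synthetic samples: three placeholder inputs and a toy response in the last column
rng = np.random.default_rng(5)
inputs = rng.uniform(size=(200, 3))
response = inputs[:, 0] ** 3 + 0.1 * inputs[:, 1]
data = np.column_stack([inputs, response])

corr = np.corrcoef(data, rowvar=False)   # full PCC matrix, as plotted in a heat map
print(np.round(corr, 2))
```

Because the PCC measures only linear association, a strongly nonlinear but monotonic dependence (such as deflection on section height) can still show a high but imperfect coefficient.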
As shown in Figure 4a,b, the finite element model (FEM) of the RC simply supported beam was established in ANSYS 15.0. The FEM analysis was performed with all 9 input variables fixed at their mean values, and the results are shown in Figure 4c. The largest vertical displacement occurs at the middle of the simply supported beam. The maximum vertical displacement of the structure is taken as the model output and is denoted as $Y$. The parameter settings and optimal parameters of the four surrogate models are shown in Table 6. In this numerical example, 100 sample points generated with a Sobol' quasi-random sequence were used to build the surrogate models. The Sobol' quasi-random sequence of Ref. [71] is implemented in UQLab, a MATLAB-based framework available at http://www.uqlab.com (accessed on 15 May 2023).
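UQLab is the tool used in this study; for readers working in Python, an analogous Sobol' design can be generated with SciPy's `scipy.stats.qmc` module. This is a sketch under stated assumptions: the three variables and their bounds are hypothetical placeholders (not the Table 5 distributions), and uniform inputs are assumed. Non-uniform inputs would additionally need an inverse-CDF transform of the unit-cube points.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical ranges for three inputs (placeholders, NOT the Table 5 values)
names = ["B", "H", "L"]
l_bounds = [0.18, 0.36, 3.6]
u_bounds = [0.22, 0.44, 4.4]

sampler = qmc.Sobol(d=len(names), scramble=True)
unit = sampler.random_base2(m=7)[:100]   # 2^7 = 128 balanced points; keep the first 100
X = qmc.scale(unit, l_bounds, u_bounds)  # map [0, 1)^d onto the design ranges
print(np.round(X[:3], 3))
```

Drawing a power-of-two block and truncating it avoids SciPy's warning about unbalanced Sobol' sample sizes, at the cost of slightly weaker uniformity for the 100 retained points.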
Four performance evaluation metrics, namely R², RMAE, RMSE and the A10-index, were determined for the above models, and the results for the training and test data are shown in Figures 5a and 5b, respectively. For the training data, Kriging is the optimal model, and for the test data, PCE is the optimal model. Figure 6 further shows the overall ranking of the models in the form of a stacked graph. Considering both the training and test data, PCE and Kriging perform best, and SVR has the lowest accuracy.
As shown in Table 7, the models were scored from 1 to 4 on each of the seven indices, and the scores were summed to give each model a total score. Because interpolation-type surrogate models pass exactly through all sample points in the training set, Kriging and RBF have the highest modelling accuracy on the training sets, followed by PCE and SVR. However, the regression-type surrogate models perform better on the testing sets. PCE has the highest prediction accuracy (R² = 0.9907, RAAE = 0.0535, RMAE = 0.5194, RMSE = 0.0115), followed by Kriging, RBF and SVR. Notably, the RMSE values of all four surrogate models are below 0.0150. Figure 7 shows the actual versus predicted values of the maximum deflection of an RC simply supported beam obtained with PCE, SVR, Kriging and RBF on the same training and testing data. The closer a data point is to the line of best fit, the more accurate the prediction.
It can be seen that all four surrogate models predict the actual deflection values well, especially at lower deflections, where the points lie almost exactly on the line of best fit. The maximum deflections of RC beams predicted by the surrogate models can therefore reliably support the design of RC elements. The results of the uncertainty analysis are shown in Table 8: for both the training and test data, all four surrogate models have low U95 values. Figure 8a-d show the error distributions of the PCE, SVR, Kriging and RBF models on the training and testing datasets. Most errors are distributed around zero, indicating high model accuracy; for all models, the errors form a Gaussian bell shape centred on zero. The Taylor diagram of the surrogate models for the RC simply supported beam is presented in Figure 9. Although all models achieve high precision, PCE performs best in predicting both the training and testing data.
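The scoring scheme behind Table 7, in which each model is scored from 1 to 4 on every metric and the scores are summed, can be sketched as below. The assumptions that the best model on a metric receives the highest score and that ties are broken arbitrarily are ours, the function name is hypothetical, and the metric values for the generic models A to D are invented for illustration only.

```python
import numpy as np


def rank_scores(metrics, higher_is_better):
    """Score models 1..k on each metric (best = k) and sum the scores per model."""
    models = list(metrics)
    totals = dict.fromkeys(models, 0)
    for name, better_high in higher_is_better.items():
        values = np.array([metrics[m][name] for m in models])
        order = np.argsort(values if better_high else -values)   # worst ... best
        for score, idx in enumerate(order, start=1):
            totals[models[idx]] += score
    return totals


# Hypothetical metric values for four generic models (illustration only)
metrics = {
    "A": {"R2": 0.99, "RMSE": 0.011},
    "B": {"R2": 0.97, "RMSE": 0.014},
    "C": {"R2": 0.98, "RMSE": 0.012},
    "D": {"R2": 0.96, "RMSE": 0.013},
}
totals = rank_scores(metrics, {"R2": True, "RMSE": False})
print(totals)   # {'A': 8, 'B': 3, 'C': 6, 'D': 3}
```

Rank-sum scoring discards how large the gaps between models are, so it is best read alongside the raw metric values.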

Global Sensitivity Analysis of RC Simply Supported Beam
All global sensitivity indices are obtained by post-processing the PCE coefficients. The results of the Monte Carlo simulation (MCS) method are also listed in Table 7 for comparison; they show that PCE provides accurate values of all sensitivity indices with only 100 model evaluations.
As shown in Table 9, the first-order sensitivity indices indicate that the section width $H$ of the RC simply supported beam has the greatest effect on deflection ($S_H = 0.4016$), followed by the span length $L$ and the load $F$ ($S_L = 0.3608$ and $S_F = 0.1040$, respectively). Interestingly, the first-order sensitivity indices of the tensile reinforcement diameter $d_2$ and the concrete cover thickness $a$ are both 0.0000 to four decimal places; this does not mean that these variables have no effect on deflection. The total sensitivity indices show that both $d_2$ and $a$ affect deflection through coupling with other variables. The sum of the first-order sensitivity indices is 0.9433, indicating that the variables acting alone dominate the deflection response, and the sum of the first- and second-order sensitivity indices is 0.9911, close to 1, indicating that there is little higher-order coupling between the variables.
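The post-processing of PCE coefficients follows from the orthonormality of the PCE basis: the response variance is the sum of the squared coefficients (excluding the mean term), and each Sobol' index is the share of that variance carried by the basis terms involving the corresponding input or pair of inputs. The sketch below assumes an orthonormal basis described by a multi-index matrix; the function name and the three-input toy expansion are hypothetical and do not reproduce Table 9. In the toy example the third input has a zero first-order index but a non-zero total index, which is the situation described above for $d_2$ and $a$.

```python
import numpy as np


def pce_sobol_indices(multi_indices, coeffs):
    """Sobol' indices from an orthonormal polynomial chaos expansion.

    multi_indices: (P, d) integer array; row k gives the polynomial degree of
        basis term k in each of the d inputs (the all-zero row is the mean term).
    coeffs: (P,) array of the corresponding PCE coefficients.
    Returns first-order (d,), second-order (d, d) and total (d,) indices.
    """
    alpha = np.asarray(multi_indices, dtype=int)
    c2 = np.asarray(coeffs, dtype=float) ** 2
    active = alpha > 0                  # inputs each basis term depends on
    order = active.sum(axis=1)          # interaction order of each term
    variance = c2[order > 0].sum()      # total variance, mean term excluded
    d = alpha.shape[1]

    first = np.array([c2[(order == 1) & active[:, i]].sum() for i in range(d)])
    total = np.array([c2[active[:, i]].sum() for i in range(d)])
    second = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            pair = (order == 2) & active[:, i] & active[:, j]
            second[i, j] = second[j, i] = c2[pair].sum()
    return first / variance, second / variance, total / variance


if __name__ == "__main__":
    # Hypothetical three-input toy expansion (not the paper's fitted PCE).
    terms = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1], [2, 0, 0]]
    c = [5.0, 2.0, 1.0, 1.0, 0.5, 1.0]
    S1, S2, ST = pce_sobol_indices(terms, c)
    print("first-order:", S1.round(4))
    print("total:      ", ST.round(4))
    print("sum S_i + S_ij:", round(S1.sum() + np.triu(S2, 1).sum(), 4))
```

Because only squared coefficients are summed, no additional model evaluations are needed once the PCE has been fitted.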
Another interesting result of the global sensitivity analysis is the set of second-order sensitivity indices shown in Figure 10. The x- and y-axes give the indices of the variables, and the colour denotes the value of the sensitivity index. White areas indicate no coupling between a pair of variables, and the darker the colour, the larger the second-order sensitivity index of the pair. The variable pairs (H, L), (H, F) and (L, F) have very high values, indicating strong coupling between these variables. Therefore, the proposed method can also be used to evaluate the coupling between uncertain parameters in the structure.
In summary, this numerical example of an RC simply supported beam demonstrates that all four surrogate models are efficient and accurate in engineering applications. It also validates the accuracy of PCE-based global sensitivity analysis in practical engineering applications.

Experiments of Long-Term Deflection of RC Flexural Members
To illustrate the effectiveness of surrogate models in civil engineering applications, this section presents the prediction and global sensitivity analysis of long-term deflection tests on RC flexural members.

Data Collection and Pre-Processing
We analysed data from 191 experiments in 29 different research programmes, summarised and documented by Espion [72]. The experimental dataset consists of 181 samples describing the long-term deflections of RC simply supported beams and slabs with a variety of geometries, load levels and distributions, concrete strengths, reinforcement ratios and environmental conditions. To better evaluate efficiency, the performance of the surrogate models was compared with that of other machine learning models frequently used for practical civil engineering problems, including back propagation neural networks (BP), decision trees (DT) and linear regression (LR). The hyperparameter settings were either taken from previous studies (e.g., LR from Pham et al. [33]) or, for the surrogate models, set to default values. Table 10 describes the 16 input variables and the ultimate long-term deflection. The input variables comprise geometric parameters, namely the section width (b), total depth (h) and area of tensile reinforcement ($A_s$), and experimental parameters, namely the distance from the extreme compression fibre to the centroid of the tensile reinforcement (d), the tensile reinforcement ratio ($A_s/bd$), the relative humidity (RH), the concrete strength at age t ($f'_c$), the span length (l), the span-to-depth ratio (l/h), the loading age ($t_i$), the maximum moment under constant load ($M_d$), consisting of the beam's self-weight and a uniform load applied at the same age, the maximum moment under additional sustained load ($M_q$), consisting of a concentrated load and a uniform load applied at different ages, the factors of the elastic deflection equation that depend on the static system and load distribution ($K_d$, $K_q$), the instantaneous (immediate) measured deflection a(i) under $M_d + M_q$, and the age t. The response is the total measured deflection a(t) of the concrete flexural member at age t.
Figure 11 shows histograms of the 17 variables in the final dataset, with their minima, maxima, means and standard deviations. Except for $K_d$ and $K_q$, most variables are well distributed and suitable for modelling. As shown in Figure 12, correlations exist between most pairs of variables, except between RH and B and between RH and $K_q$ (0.00). H and d have the highest correlation (0.99), followed by a(t) with a(i) (0.94) and a(t) with l/h (0.82). The relationship between the response and the input variables can be expressed as:

$$a(t) = f\left(b, h, A_s, d, \frac{A_s}{bd}, RH, f'_c, l, \frac{l}{h}, t_i, M_d, M_q, K_d, K_q, a(i), t\right)$$
Figure 11. Histograms of the 17 variables in the final dataset (sample count: 197); the minimum, maximum, mean and standard deviation are also shown on each histogram.

The dataset of 197 samples was randomly partitioned into two subsets: a training set of 178 samples (90% of the dataset) and a test set of the remaining 19 samples (10%). To mitigate the dominance of attributes with large values, the dataset was normalised. Because a single partition is likely to introduce bias, 10 experiments were conducted with 10 random data partitions, and the surrogate models were run on each of these partitions. The surrogate models were therefore compared on the mean and standard deviation of the results of the 10 experiments. Table 11 lists the parameters and optimal parameter values of the four surrogate models (the RBF model uses a multiquadric (MQ) basis function).
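The repeated random partitioning can be sketched as follows. This is an assumed workflow rather than the authors' code: `fit_predict` stands for any of the surrogate models, ordinary least squares is used only as a hypothetical stand-in, and the min-max bounds are taken from the training part of each split to avoid leaking test information (the text does not state whether normalisation preceded or followed partitioning). Using `int(test_frac * n)` with 197 samples gives 19 test and 178 training samples, matching the split reported above.

```python
import numpy as np


def repeated_holdout(X, y, fit_predict, n_repeats=10, test_frac=0.1, seed=0):
    """Mean and std of the test RMSE over repeated random train/test splits.

    fit_predict(X_train, y_train, X_test) -> y_pred is a placeholder for any
    surrogate model (PCE, SVR, Kriging, RBF, ...).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(test_frac * n)          # 19 test samples when n = 197
    rmses = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        # Min-max normalisation with bounds from the training part only.
        lo, hi = X[train].min(axis=0), X[train].max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)
        X_tr, X_te = (X[train] - lo) / span, (X[test] - lo) / span
        y_pred = fit_predict(X_tr, y[train], X_te)
        rmses.append(np.sqrt(np.mean((y[test] - y_pred) ** 2)))
    rmses = np.array(rmses)
    return rmses.mean(), rmses.std()


def least_squares_fit_predict(X_train, y_train, X_test):
    """Hypothetical stand-in model: linear regression with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef
```

Fixing `seed` makes the 10 partitions reproducible, so every compared model can be evaluated on exactly the same splits.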

Results and Discussion
Because the experimental data are discrete and contain different levels of noise, a single division into training and testing data may place some "bad" data in the training set, resulting in poor accuracy on the testing data. This paper therefore used 10 random divisions of the training and testing data to obtain reliable training results. The evaluation metrics $R^2$, RAAE, RMAE, RMSE, VAF, PI and A10-index were calculated on the test data to assess the accuracy of the surrogate models in predicting the long-term deflection of RC structures; Table 12 lists their values. Four of them, namely $R^2$, RMAE, RMSE and A10-index, are shown for the experimental datasets in Figure 13a,b. For the training data, Kriging was the best model, while for the test data, SVR was the best model. Figure 14 further shows the overall ranking of the efficiency of each model as a stacked graph. Considering both the training and test data, Kriging is the best, and SVR and PCE have the lowest accuracy. As shown in Table 12, when the training sets were fed back into the surrogate models, both Kriging and RBF reconstructed them exactly, because they are interpolating surrogate models. On the testing data, however, the fitted surrogate models PCE and SVR were more accurate than Kriging and RBF.
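The difference between interpolating and fitted surrogates can be illustrated with a minimal radial basis function interpolator. The sketch assumes a multiquadric kernel $\phi(r) = \sqrt{r^2 + c^2}$ (the MQ basis mentioned for Table 11) with a hypothetical shape parameter `c`, and omits the polynomial tail that some RBF formulations add. Because the weights solve the square interpolation system, the model reproduces its training targets, noise included, whereas a least-squares quadratic fit leaves non-zero training residuals.

```python
import numpy as np


def mq_rbf_fit(X, y, c=1.0):
    """Interpolating RBF surrogate with a multiquadric kernel
    phi(r) = sqrt(r**2 + c**2); c is a user-chosen shape parameter."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Solve Phi w = y exactly, so every training target is reproduced.
    weights = np.linalg.solve(np.sqrt(dist ** 2 + c ** 2), np.asarray(y, dtype=float))

    def predict(X_new):
        X_new = np.asarray(X_new, dtype=float)
        d_new = np.linalg.norm(X_new[:, None, :] - X[None, :, :], axis=-1)
        return np.sqrt(d_new ** 2 + c ** 2) @ weights

    return predict


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 8)
    noise = 0.05 * np.array([1, -1, 1, -1, 1, -1, 1, -1])
    y = np.sin(2 * np.pi * x) + noise          # noisy toy data
    rbf = mq_rbf_fit(x[:, None], y, c=0.2)
    quad = np.polyfit(x, y, 2)                 # least-squares (fitted) model
    print("RBF training RMSE:", np.sqrt(np.mean((rbf(x[:, None]) - y) ** 2)))
    print("Quadratic training RMSE:", np.sqrt(np.mean((np.polyval(quad, x) - y) ** 2)))
```

Reproducing noisy training data exactly is what gives interpolating models their near-perfect training scores, and it is also why they can generalise less well than fitted models on noisy experimental data.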
The prediction accuracy of SVR is the highest, with $R^2$ = 0.9765 ± 0.0080, RAAE = 0.0965 ± 0.0141, RMAE = 0.4463 ± 0.1167 and RMSE = 0.0555 ± 0.0175 (mean ± standard deviation). Figure 15 shows the actual versus predicted values of the maximum deflection of RC simply supported beams obtained via PCE, SVR, Kriging and RBF on the training and testing data. The closer the data points are to the line of best fit, the more accurate the predictions. Although RBF has the lowest prediction accuracy, its $R^2$ of 0.8975 still approaches 0.90. PCE, SVR and Kriging all give good results when predicting the actual deflection values, especially for lower deflections, where the data points lie almost exactly on the line of best fit.
The maximum deflections predicted by the surrogate models for RC beams can therefore reliably support the design process for RC members. The results of the uncertainty analysis are shown in Table 13: for both the training and test data, all four surrogate models have low U95 values, with RBF having the lowest (0.0749). Figure 16a-d show the error distributions of the training and testing datasets for the PCE, SVR, Kriging and RBF models. Consistent with the RC simply supported beam example, most errors are concentrated around zero in an approximately Gaussian bell-shaped distribution. Figure 17 shows Taylor diagrams of the surrogate models for the experimental data: RBF and Kriging perform best on the training data, and SVR performs best on the test data. Figure 18 compares the RMSE values obtained from the surrogate models, the LR model and empirical methods. The surrogate models have much smaller RMSE values than the ACI 318-83 building code and the CEB model code MC78.
The RMSE values are also very competitive with those of the PSO-XGBoost model [73]. Therefore, surrogate models are effective tools for civil engineers and designers in predicting the long-term deflections of RC flexural members.

To verify whether the surrogate models truly outperform the other models in predicting the long-term deformation of RC beams, a one-tailed t-test was performed. RMSE was tested because it is a common error metric for comparing models. The test was carried out on the RMSE values obtained on the test sets, with equal sample sizes and unequal variances. The results at a 95% confidence level (α = 0.05) are presented in Table 14. In all cases except RBF vs. BP, the calculated p value is smaller than α, indicating that the surrogate models significantly outperformed the other models in terms of the RMSE of long-term deflection predictions for reinforced concrete beams.
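A one-tailed test with equal sample sizes and unequal variances corresponds to Welch's t-test. The dependency-free sketch below assumes the alternative hypothesis that model A has a lower mean RMSE than model B, with each list holding the RMSE values from the repeated partitions. The p value is obtained by integrating the Student-t density numerically with Simpson's rule, an illustrative choice; with SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False, alternative="less")` performs the same test. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def welch_one_tailed(a, b, steps=20000):
    """One-tailed Welch t-test of H1: mean(a) < mean(b), unequal variances.

    Returns (t statistic, Welch-Satterthwaite degrees of freedom, p value).
    Assumes at least two values per sample and non-zero sample variance.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))

    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        # Student-t density with df degrees of freedom
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    # Integral of the density from 0 to |t| by Simpson's rule (steps is even)
    h = abs(t) / steps
    s = pdf(0.0) + pdf(abs(t))
    s += sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    area = s * h / 3
    # The density is symmetric, so P(T <= t) = 0.5 - area or 0.5 + area
    p = 0.5 - area if t < 0 else 0.5 + area
    return t, df, p
```

A p value below α = 0.05 supports the claim that model A has a significantly lower mean RMSE than model B.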
The same conclusion is visually reflected in Figure 19, which shows box plots of the RMSE and $R^2$ values yielded by the comparative models. Although the LR model exhibits a high $R^2$, it has an outlier (+). PCE, SVR and Kriging (KRG) are less variable. The surrogate models show lower RMSE values with much smaller variability. Therefore, the surrogate models are the best prediction methods in this experiment.

Global Sensitivity Analysis of Characteristic Parameters That Affect Long-Term Deflection
Global sensitivity analysis not only identifies the important variables that influence the long-term deformation of RC structures, but also quantifies the effect of coupling between variables on the predicted deformation. This subsection uses PCE-based global sensitivity analysis to provide a comprehensive assessment of the importance of the model input variables for predicting the long-term deformation of RC structures. In Figure 20, the coloured blocks on the diagonal give the first-order sensitivity indices. Notably, the variable a(i) has the largest sensitivity index, while the variables H and $K_q$ have no effect on the response when acting alone. The other variables have smaller first-order sensitivity indices. The second-order sensitivity indices show that there is coupling between most of the variables. The sum of all first-order sensitivity indices is $\sum_i S_i = 0.9306$, and the sum of the first- and second-order sensitivity indices is $\sum_i S_i + \sum_{i<j} S_{ij} = 0.9665$, indicating that there is little higher-order coupling between the variables affecting the long-term deflection of RC flexural members.

Figure 20. First- and second-order sensitivity indices for the experimental data.

Figure 21 shows the total sensitivity indices of the variables affecting the long-term deflection of RC structures. Every variable influences the response through coupling with other variables, and a(i) has the greatest degree of influence through such coupling.

Conclusions
This paper presents the first prediction of the long-term deformation of RC structures using surrogate models (PCE, SVR, Kriging and RBF). Model accuracy was assessed using the evaluation metrics $R^2$, RAAE, RMAE, RMSE, VAF, PI, A10-index and U95, and a global sensitivity analysis of the parameters affecting the long-term deformation of RC structures was carried out. The feasibility of the proposed method was verified on a numerical example of an RC simply supported beam and on a collected experimental dataset. For the RC simply supported beam, PCE has the highest prediction accuracy ($R^2$ = 0.9907, RAAE = 0.0535, RMAE = 0.5194, RMSE = 0.0115, VAF = 99.0663, PI = 0.0249, A10-index = 0.9400), followed by Kriging, RBF and SVR, and the RMSE values of all four surrogate models are below 0.0150. For the experiments on the long-term deflection of RC flexural members, RBF has the lowest prediction accuracy, although its $R^2$ of 0.8975 approaches 0.90. PCE, SVR and Kriging all give good results when predicting the actual deflection values, especially for lower deflections, where the data points lie almost exactly on the line of best fit. Taylor diagrams show that although all surrogate models predict RC beam deflection accurately, Kriging and RBF perform best on the training data, and SVR and PCE on the testing data. The U95 uncertainty analysis shows that all four surrogate models have low uncertainty on both the FEM numerical model and the experimental data, with a value of 0.0269 for the FEM numerical model and a maximum value of 0.0753 for the experimental data. In addition, the prediction accuracy of the surrogate models is competitive with that of empirical methods (ACI 318-83 and the CEB model code) and machine learning models (PSO-XGBoost, BP, DT and LR).
In addition, a PCE-based global sensitivity analysis is proposed for the first time to determine the most important parameters for predicting the long-term deflection of RC structures. The effects of each factor, acting alone or coupled with other factors, on the long-term deflection of RC structures are analysed by means of first-order and total sensitivity indices. For the RC simply supported beam, the section width H has the greatest effect on deflection ($S_H = 0.4016$), followed by the span length L and the load F ($S_L = 0.3608$ and $S_F = 0.1040$, respectively). Additionally, the variable pairs (H, L), (H, F) and (L, F) have very high second-order sensitivity indices, indicating strong coupling between these variables. For the experiments on the long-term deflection of RC flexural members, the instantaneous (immediate) measured deflection a(i) has the largest sensitivity index; all variables influence the response through coupling with other variables, with a(i) having the greatest degree of influence through such coupling.
The results of this paper provide civil engineers and designers with an effective model for predicting the long-term deflection of RC structures and for analysing the factors, such as material and geometric parameters, that affect the deflection of concrete beams. Future research directions include the development of easy-to-use software for surrogate-model-based prediction of long-term deflection and global sensitivity analysis, which engineers could use directly to solve practical problems. In addition, because this paper focuses only on long-term sustained loading tests of ordinary RC structures, future research will address the use of surrogate models for high-strength and lightweight concrete.
Author Contributions: Supervision, conceptualisation, methodology, writing-review and editing, funding acquisition, W.D. and J.Z.; methodology, investigation, software, data curation, formal analysis, writing-original draft preparation, X.Y.; visualisation, investigation, writing-review and editing, M.Y.; writing-review and editing, T.L. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data will be made available on request.