Machine Learning Techniques in Structural Wind Engineering: A State-of-the-Art Review

Abstract: Machine learning (ML) techniques, which are a subset of artificial intelligence (AI), have played a crucial role across a wide spectrum of disciplines, including engineering, over the last decades. The promise of ML lies in its ability to learn from given data, identify patterns, and accordingly make decisions or predictions without being specifically programmed to do so. This paper provides a comprehensive state-of-the-art review of the implementation of ML techniques in the structural wind engineering domain and presents the most promising methods and applications in this field, such as regression trees, random forest, neural networks, etc. The existing literature was reviewed and categorized into three main topics: (1) prediction of wind-induced pressure/velocities on different structures using data from experimental studies, (2) integration of computational fluid dynamics (CFD) models with ML models for wind load prediction, and (3) assessment of the aeroelastic response of structures, such as buildings and bridges, using ML. Overall, the review found that some of the examined studies show satisfactory and promising results in predicting wind load and aeroelastic responses, while others showed less conservative results compared to the experimental data. The review demonstrates that the artificial neural network (ANN) is the most powerful tool that is widely used in wind engineering applications, but the paper also identifies other powerful ML models for prospective operations and future research.


Introduction
Artificial intelligence (AI) has evolved rapidly since its inception at the 1956 Dartmouth summer workshop and has attracted significant attention from academics in different fields of research [1]. Machine learning (ML), which is a subset of AI, is used widely in many applications in engineering, business, and science [2]. ML algorithms are capable of learning and detecting patterns and then improving their own performance to better complete the assigned tasks. In addition, they offer advantages in handling more complex problems, ensuring computational efficiency, dealing with uncertainties, and facilitating predictions with minimal human interference [3]. Meanwhile, the capability of ML to handle complex applications with large-scale and high-dimensional nonlinear data has been enhanced over the years by the expansion of computational power [4].
There are four main types of learning for ML algorithms: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning [5,6]. In supervised learning, the computer is trained with a labeled set of data to develop predictive models through a relationship between the input and the labeled data (e.g., regression and classification). In unsupervised learning, which is more complex, the computer is trained with an unlabeled set of data to derive the structure present in the data by extracting general rules (e.g., clustering and dimensionality reduction). In semi-supervised learning, the computer is trained with a mixture of labeled and unlabeled sets. In reinforcement learning, which is so far the least common learning type, the computer acquires knowledge by observing the data through iterations that require reinforcement signals to identify the predictive behavior or action (i.e., to make decisions) [3,7].
ML is becoming more prevalent in civil engineering, with numerous studies publishing reviews and applications of ML in this field. While this paper focuses only on structural wind applications, as explained later, a few key general summary studies or reviews are listed first for the convenience of readers interested in broader applications. Adeli [8] reviewed the applications of artificial neural networks (ANN) in the fields of structural engineering and construction management. The study presented the integration of neural networks with different computing paradigms (e.g., fuzzy logic, genetic algorithms, etc.). Çevik et al. [9] reviewed different studies on the support vector machine (SVM) method in structural engineering and studied the feasibility of this approach through three case studies. Similarly, Dibike et al. [10] investigated the usability of SVM for classification and regression problems using data for the horizontal force initiated by dynamic waves on a vertical structure. Recently, Sun et al. [4] presented a review of historical and recent developments of ML applications in the area of structural design and performance assessment for buildings.
More recently, ML applications have been involved in predicting catastrophic natural hazards. Recent studies investigated the integration of real-time hybrid simulation (RTHS) with deep learning (DL) algorithms to represent the dynamic behavior of nonlinear analytical substructures [11,12]. A comprehensive review was also provided by Xie et al. [13] on the progress and challenges of ML applications in the field of earthquake engineering, including seismic hazard analysis and seismic fragility. Mosavi et al. [14] demonstrated state-of-the-art ML methods for flood prediction and the most promising methods to predict long- and short-term floods. Likewise, Munawar et al. [15] presented a novel approach for detecting the areas that are affected by flooding through the integration of ML and image processing. Moreover, ML applications were implemented in many other fields related to civil engineering generally and structural engineering particularly [16-25], as well as in structural damage detection [26-29], structural health monitoring [30-33], and geotechnical engineering [34-39]. In addition, ML techniques, such as Gaussian regression, can be used for numerical weather predictions [40]. Despite the above efforts to summarize ML techniques and their applications for different civil engineering sub-disciplines, no previous study has focused on structural wind engineering. Thus, the objective of this paper is to fill this important knowledge gap by providing a thorough and comprehensive review of ML techniques and implementations in structural wind engineering.
To better relate ML implementations, a brief overview of typical structural wind engineering problems is provided first. Bluff body aerodynamics is associated with a high level of complexity due to the several ways that wind flow interacts with civil engineering structures. Wind flow at the bottom of the atmosphere is influenced by the roughness of the natural terrain as well as by the built environment itself. As a result, eddies are formed that vary in size and shape and travel with the wind, creating the well-known atmospheric boundary layer (ABL) flow characteristics [41]. Studying and understanding the behavior of wind and its interaction with buildings and other structures is critical in the analysis and design process. Generally, ABL wind tunnel testing is still the most reliable tool to assess the aerodynamics of any structure and provide an accurate surface pressure and/or aeroelastic response. Computational fluid dynamics (CFD) tools have become more popular and can perform well in predicting mostly mean, and in some cases peak, wind flow characteristics and corresponding loads on structures. To address larger problems, ML techniques were recently introduced in different applications in wind engineering, but mostly to support and expand experimental and numerical wind engineering studies.
Based on the above introduction and the increased interest in incorporating ML techniques in structural wind engineering, a state-of-the-art review of the existing literature is beneficial and timely, which motivates this study. The goal of this paper is to present an overview of the state of knowledge for commonly used ML methods in structural wind engineering and to identify prospective research domains. We focus on the different ML methods that were used mainly for predicting wind-induced loads or aeroelastic responses. Therefore, eight major ML methods that were commonly used in the previous studies are the core of this review. These are: (1) artificial neural networks (ANN), (2) decision tree regression (DT), (3) ensemble methods (EM), which include random forest (RF), gradient boosting regression tree (GBRT), alternatively referred to as gradient boosting decision tree (GBDT), and XGBoost, (4) fuzzy neural networks (FNN), (5) Gaussian process regression (GPR), (6) generative adversarial networks (GAN), (7) k-nearest neighbor regression (KNN), and (8) support vector regression (SVR).
The review and discussion following this introduction are divided into four sections. The first section goes over the different ML methods that were previously used through an overview of the formulation and the theoretical background for each method. This is to provide a fair context before discussing their applications for prediction and classification purposes. The second section is the core of this paper, which focuses on reviewing the previous studies that are categorized and presented through three main applications: (1) the prediction of wind-induced pressure/speed on different structures using data from experimental models, (2) integration of CFD models with ML models for wind load prediction, and (3) assessment of the aeroelastic responses for two major types of structures, i.e., buildings and bridges. The third section provides a summary of the ML assessment tools and error estimation metrics based on the reviewed studies. The provided summary includes a list of assessment equations for the convenience of future researchers. The last section provides an overall comparison of the methods and recommendations to pave the path for using ML techniques in addressing future challenges and prospective research opportunities in wind engineering. It is important to note that this study did not review ML implementations in non-structural wind applications such as wind turbine wake modeling, condition monitoring, blade fault detection, etc.

ML Methods Used in Structural Wind Engineering
This section gives a brief theoretical background and an overview of the formulation for the ML methods commonly used in structural wind engineering. The discussion includes the eight classes mentioned before: ANN, FNN, DT, EM, GPR, GAN, KNN and SVM. ANN methods are found to be the most commonly used methods in the area of focus; therefore, ANN is discussed in this section in more detail than the other methods.

Artificial Neural Network (ANN)
The concept of ANN is derived from the biological sciences, where it mimics the complexity of the human brain in recognizing patterns through biological neurons, and thus imitates the process of thinking, recognizing, making decisions, and solving problems [42,43]. ANN was the most popular method found in the reviewed literature to predict wind-induced pressures compared to other neural network methods (e.g., CNN or RNN). ANN is robust enough to solve multivariate and nonlinear modeling problems, such as classification and prediction. An ANN is a group of layers that comprise multiple neurons at each layer and is also known as a feed-forward neural network (FFNN). It is composed of an input layer, where all the variables are defined, hidden layers, into which the weighted inputs are fed, and an output layer that represents the response of the operation. The ANN architecture can be written as x-h-h-y, which defines x number of inputs (variables), h number of hidden layers, and y number of outputs (responses), as shown in Figure 1. Each hidden layer comprises a certain number of neurons; the number that gives a robust model is found through training and trials.
The hidden layers are composed of activation functions that apply different weights to the input layer and transfer them to the output layers. The most common activation functions are the nonlinear continuous sigmoid, the tangent sigmoid, and the logarithmic sigmoid [44]. The weights are multiplied with the inputs and calibrated through a training process between the input and output layers to reduce the loss. The training process is applied using the Levenberg-Marquardt backpropagation algorithm; backpropagation belongs to the family of the Multi-Layer Perceptron (MLP) network [45] and was originally proposed by Rumelhart et al. [46]. It consists of two steps: feed the values forward to calculate the error, and then propagate the error back to the previous layers [47,48]. The iterations (epochs) of backpropagation continue to adjust the interconnecting weights until the network error is reduced to an acceptable level. Once the most accurate solution is formed during the training process, the weights and biases are fixed, and the training process stops. The Levenberg-Marquardt algorithm is a standard numerical method that achieves second-order training speed with no need to compute the Hessian matrix and was demonstrated to be efficient for training networks with up to a few hundred weights [47,49].
Figure 2 shows the output signal for a generic neuron j in the hidden layer h defined in Equation (1), where w_ij^h is the weight that connects the ith neuron of the current layer to the jth neuron of the following layer, x_i is the input variable, b is the bias associated with the jth neuron to adjust the output along with the weighted sum, and f is the activation function, which is usually taken as either a tangent sigmoid or a logarithmic sigmoid, Equations (2) and (3), respectively. The radial basis function neural network (RBF-NN), which was used first by [50], instead relies on a function whose response either decreases or increases with the distance from a center point [51,52].
During the training process of a backpropagation neural network (BPNN), the training is usually terminated when one of the following criteria is first met: (i) the number of epochs reaches a fixed limit, (ii) the training error is less than a specific training goal, or (iii) the magnitude of the training gradient is less than a specified small value (e.g., 1.0 × 10⁻¹⁰). The training error is the error obtained from running the trained model back on the data used in the training process, while the training gradient is the error calculated as a direction and magnitude during the training of the network, which is used to update the network weights in the right direction and by the right amount.
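As a minimal sketch of the neuron output described above, the following Python snippet evaluates f(Σ_i w_ij x_i + b_j) for one hidden layer. Equations (1) to (3) are not reproduced in the text, so the closed forms used here for the tangent sigmoid and logarithmic sigmoid are the commonly used definitions and are an assumption; the function names and the numerical values are hypothetical and for illustration only.

```python
import math


def tansig(x):
    """Tangent sigmoid (assumed form), output in (-1, 1); equal to tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def logsig(x):
    """Logarithmic sigmoid (assumed form), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias, activation=tansig):
    """Output of one neuron j: f(sum_i w_ij * x_i + b_j)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def layer_output(inputs, weight_rows, biases, activation=tansig):
    """Outputs of a whole layer; weight_rows[j] holds the weights into neuron j."""
    return [
        neuron_output(inputs, w, b, activation)
        for w, b in zip(weight_rows, biases)
    ]


# Hypothetical layer with 2 inputs and 2 hidden neurons (illustrative values).
x = [0.5, -1.0]
W = [[0.1, 0.4], [-0.3, 0.2]]
b = [0.0, 0.1]
hidden = layer_output(x, W, b)
```

In a trained network the weights and biases would come from the backpropagation process rather than being set by hand as they are here.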

Fuzzy Neural Network (FNN)
The FNN approach combines the capability of neural networks with fuzzy logic reasoning attributes [53,54]. The architecture of FNN is composed of an input layer, a membership layer, an inference layer, and an output layer (defuzzification layer), as shown in Figure 3. The membership and inference layers replace the hidden layers in the ANN. The input layer consists of n variables and the inference layer is composed of m rules; accordingly, n × m neurons exist in the membership layer. The activation function adopted in the membership layer is a Gaussian function, as shown in Equation (4) and illustrated in Figure 3,
where u_ij is the value of the membership function of the ith input corresponding to the jth rule, and m_ij and σ_ij are the mean and the standard deviation of the Gaussian function.
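The membership layer can be sketched as follows. Equation (4) is not reproduced in the text, so the form exp(−(x_i − m_ij)² / σ_ij²) used below is an assumption (some formulations divide by 2σ_ij² instead); the function names and values are hypothetical.

```python
import math


def gaussian_membership(x_i, m_ij, sigma_ij):
    """Membership u_ij of input i under rule j (assumed Gaussian form)."""
    return math.exp(-((x_i - m_ij) ** 2) / (sigma_ij ** 2))


def membership_layer(inputs, means, sigmas):
    """Return the n x m grid of memberships, one per (input i, rule j) pair."""
    return [
        [gaussian_membership(x, m, s) for m, s in zip(m_row, s_row)]
        for x, m_row, s_row in zip(inputs, means, sigmas)
    ]


# Hypothetical example with n = 2 inputs and m = 3 rules.
inputs = [0.2, 0.8]
means = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]]
sigmas = [[0.3, 0.3, 0.3], [0.3, 0.3, 0.3]]
u = membership_layer(inputs, means, sigmas)
```

The resulting grid has n × m entries, which matches the neuron count of the membership layer stated above.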

Decision Tree (DT)
The DT method is a supervised ML model in which the algorithm assigns the output by testing the data through a tree of nodes, filtering it down through the decision nodes into the split sub-nodes (leaf nodes) to reach the final output. Decision trees may differ in several dimensions: the test may be multivariate or univariate, the test may have two or more outcomes, and the attributes may be numeric or categorical [55-57].

Ensemble Methods (EM)
The EM methods include: (1) the bagging regression tree, also referred to as the random forest (RF) algorithm, (2) the gradient boosting regression tree (GBRT) or decision tree (GBDT), and (3) extreme gradient boosting (XGB). All EM methods can be defined as a combination of different decision trees that overcomes the weaknesses that may occur in a single tree, such as sensitivity to training data and instability [58]. The forest generated by the RF algorithm is trained through bagging (bootstrap aggregating), which was proposed by Breiman [59,60]. At each node, RF splits on n features among the total m features, where n is recommended to be m/3 or √m [61]. This reduces overfitting and increases precision. Overfitting is overtraining the model, which causes it to be particular to certain datasets and lose the generalized aspect desired in ML models. The DT and RF methods are commonly used in classification and regression problems.
GBRT, also known as GBDT as mentioned above, was first developed by Friedman [62] and is one of the most powerful ML techniques, deemed successful in a broad range of applications [63,64]. GBDT combines a set of weak learners called classification and regression trees (CART). To reduce overfitting, each regression tree is scaled by a factor called the learning rate (Lr), which represents the contribution of each tree to the predicted values of the final model. The predicted values are computed as the sum of all trees multiplied by the learning rate [65]. Lr, together with the maximum tree depth (Td), determines the number of regression trees for building the model [66]. Previous studies showed that a smaller Lr decreases the test error but increases computational time [63,64,67]. A subsampling procedure was introduced by Friedman [60] to improve the generalization capability of the model using a subsampling fraction (Fs) that is chosen randomly from the full data set to fit the base learner.
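The statement that the prediction is the sum of all trees multiplied by the learning rate can be illustrated with a short, self-contained sketch. It assumes a squared-error loss, one-dimensional inputs, and depth-1 trees (stumps) as the weak learners; these choices, the function names, and the toy data are illustrative assumptions and not the setup of any reviewed study.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (one split) that minimizes squared error."""
    best = None
    # Candidate thresholds exclude the largest x so the right side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_gbrt(xs, ys, n_trees=100, learning_rate=0.1):
    """Boosting loop: each new tree is fitted to the current residuals."""
    base = sum(ys) / len(ys)  # initial constant prediction
    trees = []
    predictions = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for x, p in zip(xs, predictions)]
    # Final model: constant plus learning-rate-scaled sum of all trees.
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


# Toy data (hypothetical): a quadratic trend.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]
model = fit_gbrt(xs, ys, n_trees=100, learning_rate=0.1)
```

With a smaller learning rate each tree contributes less, so more trees are needed to reach the same training error, which is consistent with the trade-off between accuracy and computational time noted above.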
Another popular method from the EM family is XGBoost, or XGB as defined above, which was developed by Chen and Guestrin [68] and is similar to the random forest in that it combines many decision trees. XGB offers several enhancements over other ensemble methods. It can penalize more complex models by using both LASSO (L1) and Ridge (L2) regularization to avoid overfitting. It handles different types of sparsity patterns in the data, and it uses the distributed weighted quantile sketch algorithm to find split points among weighted datasets. There is no need to specify the exact number of iterations in every single run, as the algorithm has built-in cross-validation that takes care of this task.

Gaussian Process Regression (GPR)
The GPR is a supervised learning model that combines two processes: (1) a prior process, where the random variables are collected, and (2) a posterior process, where the results are interpolated. This method was introduced by Rasmussen [69] and developed on the basis of statistical and Bayesian theory. GPR has a strong generalization ability, determines its own hyper-parameters, and produces outputs that have a clear probabilistic meaning [70]. These advantages make GPR preferable to BPNN, as it can handle complex regression problems with high dimensions and a small sample size [69,71]. Background theory and informative equations can be found in detail in the literature [69,70].
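The prior and posterior steps can be sketched for a one-dimensional problem as below. The sketch assumes a zero-mean prior, a squared-exponential (RBF) kernel, and fixed hyper-parameters, whereas a full GPR implementation would normally estimate the hyper-parameters from the data; all names and values are hypothetical.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def gpr_predict(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and variance at x_query under a zero-mean GP prior."""
    K = [
        [rbf_kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(x_train)]
        for i, a in enumerate(x_train)
    ]
    k_star = [rbf_kernel(x_query, a) for a in x_train]
    alpha = solve_linear(K, y_train)  # K^-1 y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_linear(K, k_star)  # K^-1 k*
    variance = rbf_kernel(x_query, x_query) - sum(k * w for k, w in zip(k_star, v))
    return mean, variance


# Toy data (hypothetical): samples of a sine curve.
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [math.sin(x) for x in x_train]
mean_at_train, var_at_train = gpr_predict(x_train, y_train, 1.0)
mean_far, var_far = gpr_predict(x_train, y_train, 10.0)
```

The returned variance is what gives the outputs their probabilistic meaning: it is small near the training samples and approaches the prior variance far away from them.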

Generative Adversarial Networks (GAN)
The GAN technique was proposed by Goodfellow et al. [72] and is based on the game theory of a two-player minimax game. The GAN has attracted worldwide interest for generative modeling tasks. The purpose of this approach is to estimate generative models via an adversarial process. The approach is achieved by training two models: first, a generative model G that captures the distribution of the data, and second, a discriminative model D that estimates the probability that a sample came from the training data rather than from G. The G model defines p_model(x) and draws samples from the distribution p_model. The input is a vector z, and the model is defined by a prior distribution p(z) over the vector z and a generator function G(z; θ(G)), where θ(G) is a set of learnable parameters that define the generator's strategy in the game [73]. More details about GAN models can be found in [72,73].

K-Nearest Neighbors (KNN)
The KNN algorithm is a supervised non-parametric classification machine learning algorithm that was developed by Fix and Hodges [74]. KNN does not perform any training or make any assumptions when storing the data; instead, it assigns unseen data to the nearest set of data used in the training process. The algorithm determines the class of an unseen point according to the value of K. For instance, if K is 1, the unseen point is assigned to the class of the nearest point; if K is 5, it is assigned according to the classes of the nearest five points, etc. KNN is one of the simplest ML classification algorithms, and more details can be found in [75].
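A minimal sketch of this assignment rule is given below, assuming Euclidean distance and a simple majority vote among the K nearest points; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Assign the query to the majority class among its k nearest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance (assumed)
    )
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["low", "low", "low", "high", "high", "high"]

label_k1 = knn_classify(points, labels, (0.2, 0.1), k=1)
label_k5 = knn_classify(points, labels, (4.0, 4.0), k=5)
```

No model is fitted in advance: all the work happens at query time, which is why the stored training data itself acts as the model.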

Support Vector Machine (SVM)
The SVM is a supervised learning method that uses kernel functions for classification and regression. The SVM algorithm is based on determining a hyperplane in an N-dimensional space, where N is the number of features, that classifies the dataset. The optimum hyperplane for classification purposes is associated with the maximum margin between the support vectors, which are the data points nearest to that hyperplane [76]. SVM was developed by Vapnik [77] and is considered to be one of the simplest and most robust classification algorithms. More details about SVM can be found in [78].

Prior Studies on Applying ML Techniques in Structural Wind Engineering
A broad range of studies is summarized in this section based on the three categories mentioned before, i.e., (1) prediction of wind-induced pressure/speed on different structures using data from experimental models, (2) integration of CFD models with ML models for wind load prediction, and (3) assessment of the aeroelastic responses for buildings and bridges. Like several ML trends, the number of studies applying or implementing ML for wind engineering has been increasing significantly, specifically in the last couple of years. This reveals the future potential within the wind engineering community, where ML techniques continue to gain more attention and interest from academics and researchers. More than 50% of the studies considered in this survey, which spans the past 30 years, were published in the last two years alone (Figure 4), which underlines the importance of implementing ML techniques in this critical domain.

Prediction of Wind-Induced Pressure
Wind-induced pressure prediction forms an essential area in structural wind engineering. In addition to field studies, different tools can be used for estimating wind loads and pressure coefficients on surfaces, such as atmospheric boundary layer wind tunnels (ABLWT) or CFD simulations. Both ABLWT and CFD are commonly used but in some cases may require significant time, cost and expertise [79]. As in other fields of civil engineering, studies using ML techniques have gained some momentum, and wind engineers have shown interest in identifying a reliable approach to predict wind speeds and/or wind-induced pressures for common wind-related structural applications. A summary of the key attributes and ML implementation in the studies that were included in the review related to the first category, i.e., the prediction of wind-induced pressures and time series from experimental testing or databases, is first provided in Table 1; each study is then discussed in more detail in this section.
The input variables used in each study are significant to the desired output needed from training the ML model. Their choice depends mainly on the architecture of the model and the parameters included in each dataset. For predicting surface pressure, the inputs may be the coordinates of the pressure taps, the slope of the roof, the wind direction, or the building height. For the aeroelastic responses of bridges, the input variables mainly depend on parameters such as the displacement, velocity and acceleration response of the bridge. One of the studies used the spacing between the buildings as input variables (Sx, Sy) to predict the interference effect on surface pressure.
Many methods can be used for predicting and interpolating multivariate modeling problems, such as linear interpolation and regression polynomials. However, linear interpolation cannot solve nonlinear problems, and while regression polynomials are commonly used to obtain empirical equations, these equations lack the generality to be used with other data and a large number of variables [81]. Therefore, ML models in general, and ANN in particular, have advantages over these methods in complex problems.
Most of the studies have adopted the three-stage evaluation process of training, testing and validation (TTV), which was proposed by [93] to build a robust ML model. The cross-validation process comprises two steps: first, the dataset is randomly shuffled and divided into k subsets of similar size; then k − 1 subsets are used for training and the remaining subset is used as the testing set to assess the performance of the model. The procedure is repeated k times so that each subset serves once as the testing set. The stability and accuracy of the validation method depend mainly on the value of k. Hence, the method is usually referred to as k-fold cross-validation (CV) [19,94] and is illustrated in Figure 5. Many of the reviewed studies used the 10-fold CV method following Refaeilzadeh et al.'s [95] recommendation of using k = 10 as a good estimate.
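The shuffling and splitting steps of k-fold cross-validation can be illustrated with a minimal sketch. It uses only the Python standard library; the function names `k_fold_indices` and `k_fold_splits`, the sample count of 25, and the random seed are assumptions chosen for illustration and do not come from any of the reviewed studies.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and divide them into k folds of similar size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices); each fold is the testing set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        # The remaining k - 1 folds together form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical dataset of 25 pressure-tap records, split with k = 10.
splits = list(k_fold_splits(n_samples=25, k=10))
for train, test in splits:
    print(len(train), len(test))
```

In practice a model would be trained on each `train` set and scored on the matching `test` set, and the k scores would be averaged.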
ANN is the most commonly used technique in the reviewed studies (see Table 1). A study by Chen et al. [81] predicted the pressure coefficients on a gable roof using ANN. This was one of the earliest and most important studies to implement ML models for predicting wind-induced pressure on building surfaces. Later, Chen et al. [96] interpolated pressure time series from existing buildings to buildings of different roof heights, and then successfully extrapolated to other buildings with different dimensions and roof slopes using ANN.
Zhang and Zhang [82] evaluated the wind-induced interference effects among tall buildings, expressed by the interference factor (IF), using radial basis function neural networks (RBF-NN). The RBF-NN is a feed-forward neural network whose activation function differs from those commonly used (e.g., tangent sigmoid or logarithmic sigmoid). The RBF-NN was first used by [50]; a radial basis function is a function whose response either decreases or increases with the distance from a center point [51,52]. The predicted IF values were in very good agreement with their experimental counterparts. English [97] used neural network models to predict the interference index due to shielding between buildings from wind tunnel data. The study found that the neural network model was able to accurately predict the interference index for building configurations that had not been tested experimentally. The interference index can be calculated by subtracting 1 from the shielding (buffeting) factor.
Bre et al. [85] predicted the surface-averaged pressure coefficients of low-rise buildings with different types of roofs using ANN. The predicted mean pressure coefficients, using the Tokyo Polytechnic University (TPU) database [98] as input data, were reasonable when compared to the "M&P" parametric equation [99] and the "S&C" equation [100]. Those two equations are provided here (Equations (5) and (6), respectively) for convenience.
$$C_p\left(\theta, \frac{D}{B}\right) = \frac{a_0 + a_1 G + a_2 \theta + a_3 \theta^2 + a_4 G\theta}{1 + b_1 G + b_2 \theta + b_3 \theta^2 + b_4 G\theta} \qquad (5)$$
$$C_p\left(\theta, \frac{D}{B}\right) = C_p(0^\circ)\,\ln\left[1.248 - 0.703\sin\frac{\theta}{2} - 1.175\sin^2\theta + 0.131\sin^3(2\theta G) + 0.769\cos\frac{\theta}{2} + 0.07\,G^2\sin^2\frac{\theta}{2} + 0.717\cos^2\frac{\theta}{2}\right] \qquad (6)$$
where $a_i$ and $b_i$ are adjustable coefficients, $\theta$ is the wind angle, $D/B$ is the side ratio, $G = \ln(D/B)$, and $C_p(0^\circ)$ is assumed by Swami and Chandra [100] to be equal to 0.6, independent of $D/B$.
Hu and Kwok [66] successfully predicted the wind pressures around cylinders using different ML techniques for Reynolds numbers ranging from $10^4$ to $10^6$ and turbulence intensity levels ranging from 0% to 15%, using data from the previous literature. In this particular study, the RF and GBRT performed better than the single regression tree model. Fernández-Cabán et al. [86] used ANN to predict the mean, RMS, and peak pressure coefficients on low-rise building flat roofs for three different scaled models. The predicted mean and RMS pressure coefficients showed very good agreement with the experimental data, especially for the smaller-scale model. Hu and Kwok [88] investigated the wind pressure on tall buildings under interference effects using different ML models. The models were trained on different portions of the dataset, ranging from 10% to 90% of the available data. The results showed that the GAN model could predict wind pressures when trained on only 30% of the data, which may eliminate 70% of the wind tunnel test cases and accordingly decrease the cost of testing. In addition, RF exhibited good performance when the number of grown trees, the number of features, and the maximum depth of the tree were set to 100, 3, and 25, respectively. Likewise, Vrachimi [101] predicted wind pressure coefficients for box-shaped obstructed building facades using ANN with a ±0.05 confidence interval for a confidence level of 95%.
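For readers who wish to evaluate the S&C correlation (Equation (6)) numerically, a minimal sketch is given below. The numerical coefficients are those of the form of the Swami and Chandra correlation commonly quoted in the literature and should be checked against [100] before use; the function name `swami_chandra_cp` and the choice of degrees for the input angle are assumptions made here for illustration.

```python
import math

def swami_chandra_cp(theta_deg, side_ratio, cp0=0.6):
    """Surface-averaged Cp for wind angle theta_deg (degrees) and side ratio D/B.

    cp0 is Cp at normal incidence, taken as 0.6 following the text.
    """
    theta = math.radians(theta_deg)
    g = math.log(side_ratio)  # G = ln(D/B)
    arg = (1.248
           - 0.703 * math.sin(theta / 2)
           - 1.175 * math.sin(theta) ** 2
           + 0.131 * math.sin(2 * theta * g) ** 3
           + 0.769 * math.cos(theta / 2)
           + 0.07 * g ** 2 * math.sin(theta / 2) ** 2
           + 0.717 * math.cos(theta / 2) ** 2)
    return cp0 * math.log(arg)

# Square plan (D/B = 1): windward, side, and leeward incidence.
for angle in (0, 90, 180):
    print(angle, round(swami_chandra_cp(angle, 1.0), 3))
```

For a square plan the sketch returns a positive coefficient close to 0.6 at normal incidence and negative (suction) coefficients at 90° and 180°, which matches the expected sign pattern for windward, side, and leeward walls.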
Tian et al. [90] focused on predicting the mean and peak pressure coefficients on a low-rise gable building using a deep neural network (DNN). The study presented a strategy for predicting peak pressure coefficients, which is considered a more challenging task for ML models. The strategy first predicts the mean pressure coefficient and then uses the predicted mean pressure, together with the other input variables, as input to predict the peak pressure coefficients. This strategy reflects the idea behind ensemble methods [58], which is effective for solving complex problems with limited inputs. FNN models were also successfully used in several studies [53,54,102] to predict the mean pressure distribution and the power spectra of fluctuating pressures. The most significant feature of FNN models is their capability to approximate any nonlinear continuous function to a desired degree of accuracy. Thus, this family of methods can capture the nonlinear relationship between the different input variables such as wind pressures, wind directions, and coordinates of pressure taps.
Another technique based on ANN was used by Mallick et al. [92] to predict surface mean pressure coefficients using equations from the group method of data handling neural network (GMDH-NN), a method derived from ANN. The GMDH-NN is a self-organized system that provides a parametric equation to predict the output and can solve extremely complex problems [103]. This ML algorithm was established using the GMDH Shell software [104] and is based on the principle of termination [104][105][106] to find the nonlinear relation between the pressure coefficients and the input variables. Termination is the process in which the parameters are seeded, reared, hybridized, selected, and rejected to determine the input variables. The study investigated in detail the effect of curvature and corners on pressure distribution and obtained an equation with different variables to predict the mean pressure coefficients. One major difference between ANN and GMDH-NN is that, in the latter, the neurons are filtered simultaneously based on their ability to predict the desired values; only the beneficial neurons are fed forward to be trained in the following layer, while the rest are discarded.
Another method for predicting wind-induced pressures and the full dynamic response (i.e., time histories) on high-rise building surfaces was proposed by Dongmei et al. [84], who combined a backpropagation neural network (BPNN) with proper orthogonal decomposition (POD-BPNN). POD was utilized by Armitt [107] and later by Lumley [108] to deal with wind turbulence-related issues. The advantage of the POD-BPNN method over the ANN is its capability to predict pressure time series for trained data with the time parameter t. POD is based on a linear combination of a series of orthogonal load modes, through which, together with the loading principal coordinates, the spatially distributed multivariable random loads can be reconstructed [109]. The orthogonal load modes are space-related and time-independent, while the loading principal coordinates are time-varying and space-independent. Before applying the BPNN, the wind loads were decomposed using POD, whereby the interdependent variables are transformed into a weighted superposition of several independent variables. More details about the POD background theory can be found in the literature [110][111][112]. The training algorithm applied in that study was the improved global Levenberg-Marquardt algorithm, which can achieve a faster convergence speed [113,114]. A similar study by Ma et al. [87] investigated the wind pressure time history on a low-rise building with a flat roof using both Gaussian process regression (GPR) and BPNN. The study concluded that GPR has high accuracy for time history interpolation and extrapolation.
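The POD decomposition described above can be written compactly in its standard textbook form. The notation below is an assumption introduced for illustration and is not taken from [84] or [109]: $\mathbf{p}(t)$ collects the fluctuating pressures at all taps, $\mathbf{R}_p$ is their covariance matrix, $\boldsymbol{\phi}_n$ are the orthonormal load modes, and $a_n(t)$ are the loading principal coordinates.

```latex
\begin{aligned}
\mathbf{R}_p\,\boldsymbol{\phi}_n &= \lambda_n\,\boldsymbol{\phi}_n,
  &&\text{(modes: space-related, time-independent)}\\
a_n(t) &= \boldsymbol{\phi}_n^{\mathsf{T}}\,\mathbf{p}(t),
  &&\text{(principal coordinates: time-varying, space-independent)}\\
\mathbf{p}(t) &\approx \sum_{n=1}^{M} a_n(t)\,\boldsymbol{\phi}_n,
  &&M \le N_{\text{taps}}.
\end{aligned}
```

Truncating the sum to the first $M$ modes with the largest eigenvalues $\lambda_n$ retains most of the pressure variance; under this reading, the neural network only needs to learn a small number of modes and coordinates rather than every tap signal.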
More recently, the wind pressure time series and power spectra on tall buildings were simulated and interpolated by Chen et al. [91] using three ML methods: BPNN, genetic algorithm neural network (GANN), and wavelet neural network (WNN). The WNN produced the most accurate results among the three methods. The WNN combines the advantages of ANN with those of the wavelet transformation: the wavelet transformation has time-frequency localization and focal features, whereas neural networks are self-adaptive, fault tolerant, robust, and have strong inference ability [115]. The reviewed literature showed that the developed BPNN models could generalize the complex, multivariate nonlinear functional relationships among different variables such as wind-induced pressures and locations of pressure taps. Predicting pressure time series at different roof locations was achieved using ANN, and the robustness of the models overcame the problems associated with linear interpolation for low-resolution data.
A recent study [92] developed an ML model to predict the wind-induced mean and peak pressures for non-isolated buildings, considering the interference effect of neighboring structures, using GBDT combined with the grid search algorithm (GSA). The study used wind tunnel data from TPU for non-isolated buildings. The data were split by a ratio of 9:1, i.e., 90% of the dataset was used for training and 10% for testing. Four hyperparameters were considered in developing the ML model: two for CART (i.e., the maximum depth, d, of each decision tree and the minimum number of samples required to split an internal node) and two for the gradient boosting approach (i.e., the learning rate, Lr, and the number of CART models). The developed method was shown to be robust and accurate in predicting the wind-induced pressure on structures under the interference effects of neighboring structures. Zhang et al. [116] predicted the typhoon-induced response (TIR) of long-span bridges using quantile random forest (QRF) with Bayesian optimization instead of the traditional FE analysis. The QRF with Bayesian optimization was able to provide adequate probabilistic estimations to quantify the uncertainty in predictions.
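The 9:1 split and the exhaustive grid search over the four hyperparameters can be sketched as follows. This is a generic illustration and not the implementation of the cited study: the candidate values in `param_grid`, the helper names, and the `toy_error` function (a stand-in for training a GBDT and returning its validation error) are all hypothetical.

```python
import itertools
import random

def split_9_to_1(samples, seed=0):
    """Shuffle the samples and split them into 90% training and 10% testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.9 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def grid_search(param_grid, evaluate):
    """Evaluate every hyperparameter combination; return the lowest-error one."""
    names = sorted(param_grid)
    best_params, best_error = None, float("inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error

# Hypothetical candidate values for the four hyperparameters named in the text.
param_grid = {
    "max_depth": [3, 5, 7],          # maximum depth of each tree
    "min_samples_split": [2, 4],     # minimum samples to split an internal node
    "learning_rate": [0.01, 0.1],    # gradient boosting learning rate
    "n_estimators": [100, 300],      # number of CART models
}

def toy_error(params):
    """Stand-in for 'train a GBDT with params and return its validation error'."""
    return (abs(params["max_depth"] - 5)
            + abs(params["min_samples_split"] - 2)
            + abs(params["learning_rate"] - 0.1)
            + abs(params["n_estimators"] - 300) / 100)

train, test = split_9_to_1(range(200))
best_params, best_error = grid_search(param_grid, toy_error)
print(len(train), len(test), best_params, best_error)
```

In a real workflow, `toy_error` would be replaced by a function that fits the model on the training portion and scores it, typically with cross-validation, so that the testing portion remains untouched until the final evaluation.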

Integration of CFD with Machine Learning
Several studies integrated CFD simulations with ML techniques to predict either the wind force exerted on bluff bodies or the aeroelastic response of bridges and other flexible structures [117][118][119][120][121][122]. Chang et al. [123] predicted the peak pressure coefficients on a low-rise building using 12 types of output data from a CFD model (e.g., mean pressure coefficient, dynamic pressure, and wind speed) as input variables in the ANN model. The predicted peak pressures were in good agreement with the wind tunnel data. Similarly, Vesmawala et al. [124] used ANN to predict the pressure coefficients on domes of different span-to-height ratios. The data were generated from a CFD model of a dome and the wind flow around it. The mean pressure coefficients predicted by CFD were used for training the ML model, with a maximum of 50,000 epochs to achieve the specified error tolerance. There were three main inputs: the span/height ratio, the angle measured vertically with respect to the vertical axis of the dome to the ring beam, and the angle measured horizontally with respect to the wind direction. The study used neuroscience software for model training and testing, and it was found that the BPNN accurately predicted the mean pressure coefficients at different locations along the dome.
Bairagi and Dalui [125] investigated the effect of setbacks in tall buildings by predicting pressure coefficients along the building's face. The study used ANN and the Fast Fourier Transform (FFT) to validate the wind-induced pressures on different setback buildings predicted by CFD simulation models. The predicted wind pressures were first validated against similar experimental data. The study showed that CFD was capable of predicting pressure coefficients similar to the experimental data and that ANN was capable of predicting and validating these pressure coefficients. The Levenberg-Marquardt algorithm was used as the training function, starting with 500 training epochs, which were increased until the correlation coefficient exceeded the 99th percentile. The model was trained using the MATLAB neural network toolbox [126].
A recent study [127] proposed a multi-fidelity ML approach to predict wind loads on tall buildings by integrating CFD models with ML models. The study combined data for a large number of wind directions from the computationally efficient Reynolds-averaged Navier-Stokes (RANS) model with data for a smaller number of wind directions from the more computationally intensive Large Eddy Simulation (LES) method to predict the RMS pressure coefficients on a tall building. The study utilized four types of ML models: linear regression, quadratic regression, RF, and DNN, with the latter being the most accurate. In addition, a bootstrap algorithm was used to generate an ensemble of ML models with accurate confidence intervals. This study used the Adam optimization algorithm [128] and the Rectified Linear Unit (ReLU) activation function [129,130] with a learning rate of 0.001 and a regularization strength of 0.01 to avoid overfitting. This contrasts with other studies that used the Levenberg-Marquardt algorithm and tangent sigmoid or logarithmic sigmoid activation functions; the difference arises because those studies used ANNs with two or fewer hidden layers, whereas this study used a DNN with three hidden layers.
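The idea of a bootstrapped ensemble with confidence intervals can be illustrated with a deliberately simple stand-in model. The sketch below resamples the training data with replacement, refits a one-variable least-squares line each time (instead of the DNN used in [127]), and reads a percentile interval from the ensemble of predictions. The data, the function names, and the ensemble size of 200 are assumptions made for illustration only.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def bootstrap_interval(xs, ys, x_new, n_models=200, level=0.95, seed=0):
    """Return (mean, lower, upper) of the ensemble prediction at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # degenerate resample: a line cannot be fitted
            continue
        a, b = fit_line(bx, [ys[i] for i in idx])
        preds.append(a + b * x_new)
    preds.sort()
    lower = preds[int((1 - level) / 2 * (n_models - 1))]
    upper = preds[int((1 + level) / 2 * (n_models - 1))]
    return sum(preds) / n_models, lower, upper

# Hypothetical data: a linear trend y = 1 + 2x with small alternating noise.
xs = list(range(10))
ys = [1 + 2 * x + 0.1 * (-1) ** x for x in xs]
mean, lower, upper = bootstrap_interval(xs, ys, x_new=4.5)
print(mean, lower, upper)
```

The width of the interval reflects how sensitive the fitted model is to the particular samples it was trained on, which is the quantity a bootstrap ensemble is meant to expose.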
To conclude this section, a summary of the attributes of the reviewed previous studies that integrate ML applications with CFD is provided in Table 2.

Aeroelastic Response Prediction Using ML
The prediction of aeroelastic responses of buildings and other structures using ML models is also of interest to this review. The input used for the prediction of these responses comes from either CFD simulations (Table 2) or physical testing databases (Table 3). Similar to the previous two sections, Table 3 provides a summary of the attributes of the key studies reviewed in this section, which is concerned with using ML for aeroelastic response prediction.
Chen et al. [135] used a BPNN built from a limited dataset of existing dynamic responses of rectangular bridge sections. The results indicated that the ANN prediction scheme performed well in predicting the dynamic responses. The authors claimed that such an approach may reduce cost and save time by avoiding extensive wind tunnel testing, especially in preliminary design. Wu and Kareem [131] developed a new approach that utilizes ANN with a cellular automata (CA) scheme to model the hysteretic behavior of bridge aerodynamic nonlinearities in the time domain. This approach was developed because ANN training is time-consuming until the ideal number of hidden layers and of neurons between the input and output is determined. By embedding the CA scheme, which was originally proposed by [136] and later developed by [137], in the ANN, the authors of that study aimed to improve the efficiency of the ANN models. The CA scheme is an approach that evolves dynamically in discrete space and time using a local rule belonging to a class of Boolean functions. This scheme is appealing because it can simulate very complicated problems with a simple local rule that is applied to the system consistently in space and time. The activation function used in the ANN training was the bipolar sigmoid, as shown in Equation (7). The CA scheme is an indirect encoding scheme based on the CA representation and can be designed using two cellular systems, i.e., the growing cellular system and the pruning cellular system [138]. The ANN configuration based on the CA scheme was examined using a fitness index, defined in Equation (8), which is a function of the learning cycles and connections of the ANN [139].
The dynamic response of tall buildings was studied by Nikose and Sonparote [141,147] using ANN, and the proposed graphs were able to predict the along- and across-wind responses in terms of base shear and base bending moments according to the Indian Wind Code (IWC). Both studies found that the backpropagation neural network algorithm was able to satisfactorily estimate the dynamic along- and across-wind responses of tall buildings. Similarly, Hu and Kwok [144] applied different ML models (DT, KNN regression, RF, and GBRT) to predict four types of crosswind vibrations (i.e., over-coupled, coupled, semi-coupled, and decoupled) of rectangular cylinders. The data used in the training and testing processes were extracted from wind tunnel tests. It was found that GBRT can accurately predict crosswind responses, which can supplement wind tunnel tests and numerical simulation techniques. One of the input variables used in that study was the Scruton number (Sc).
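The bipolar sigmoid activation mentioned above for the ANN-CA scheme (Equation (7)) is assumed here to take its standard form, f(x) = (1 − e^(−x)) / (1 + e^(−x)), which maps any real input into the interval (−1, 1). Whether the cited study includes an additional slope parameter is not known from the text, so the sketch below uses the plain form; the function names are illustrative.

```python
import math

def bipolar_sigmoid(x):
    """Standard bipolar sigmoid, (1 - exp(-x)) / (1 + exp(-x)).

    Evaluated as tanh(x / 2), which is algebraically identical and avoids
    overflow of exp(-x) for large negative x.
    """
    return math.tanh(x / 2.0)

def bipolar_sigmoid_derivative(x):
    """Derivative used in backpropagation: 0.5 * (1 + f(x)) * (1 - f(x))."""
    f = bipolar_sigmoid(x)
    return 0.5 * (1.0 + f) * (1.0 - f)

for x in (-5.0, 0.0, 2.0, 5.0):
    print(x, bipolar_sigmoid(x), bipolar_sigmoid_derivative(x))
```

Unlike the logarithmic sigmoid, whose output lies in (0, 1), the bipolar form is centered at zero, which is convenient when the target quantities change sign.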
Oh et al. (2019) [140] studied the wind-induced response of tall buildings using CNN, with a focus on structural safety evaluation. The trained model predicted the column strains using wind tunnel data such as wind speed and top floor displacements. The architecture of the trained model is composed of an input layer, two convolutional layers, two pooling layers, one fully connected layer, and an output layer. The input map forms the convolutional layer through convolution using the kernel operator. The ML-based model was utilized to overcome the uncertainties in the material properties, geometric properties, and stiffness contribution of nonstructural elements, which make it difficult to construct a refined finite element model.
Li et al. [133] used LSTM, originally proposed by Hochreiter and Schmidhuber [148], to predict the nonlinear unsteady bridge aerodynamic responses and to overcome the growing difficulties that gradient-based learning algorithms face in the recurrent neural network (RNN). The RNN was developed to introduce the time dimension into the network structure, and it was found to be capable of predicting full time series where a nonlinear relation exists between input and output. The study used displacement time series as input variables, and by weighting these time series, both the acceleration and velocity were obtained. The LSTM model was able to calculate the deck vibrations (i.e., lift displacement and torsional angle) under unsteady nonlinear wind loads. Hu and Kwok [136] investigated the vortex-induced vibrations (VIV) of two circular cylinders with the same dimensions in staggered configurations using three ML algorithms: DT, RF, and GBRT. The two cylinders were first modeled in a CFD simulation, and the mass ratio, wind direction, distance between the cylinders, and wind velocity were used as input variables. The GBRT algorithm was the most accurate in predicting the amplitude of the upstream and downstream vibrations. Abbas et al. [132] employed ANN to predict the aeroelastic response of bridge decks using response time histories as the input variables. The predicted forces were compared with CFD findings to evaluate the ANN model. The ANN model was also coupled with the structural model to determine the aeroelastic instability limit of the bridge section, which demonstrated the potential of this framework for predicting the aeroelastic response of other bridge cross-sections.
More recently, surrogate models have been widely used in different areas related to structural wind engineering [149][150][151][152]. One type of surrogate model uses finite element models (FEM) to obtain an output that serves as an input to the trained ML model. Chen et al. [153] used a surrogate model in which ANN was applied to the FE model to update the model parameters for computing the dynamic response of a cable-suspended roof, using the wind loads from full-scale measurements for three typhoon events in three consecutive years from 2011 to 2014. Luo and Kareem [154] proposed a surrogate model using a convolutional neural network (CNN) for systems with high-dimensional inputs/outputs. Rizzo and Caracoglia [145] predicted the wind-induced vertical displacement of a cable net roof using ANN. The trained model used wind tunnel pressure coefficient datasets and FEM wind-induced vertical displacement datasets. The surrogate model showed that it can successfully replicate more complex, geometrically nonlinear structural behavior. Rizzo and Caracoglia [155] used surrogate flutter derivative models to predict the flutter velocity of a suspension bridge. The ANN model was trained using a dataset of critical flutter velocities obtained from experimentally determined flutter derivatives. The model successfully generated a large dataset of critical flutter velocities. In addition, surrogate modeling can be used to analyze the structural performance of vertical structures under tornado loads by training fragilities using ANN [156,157].
Lin et al. [146] used the light gradient boosting machine (LGBM) method, an optimized version of the GBDT algorithm proposed by Ke et al. [158], together with a clustering algorithm to predict the crosswind force spectra of tall buildings. This optimized algorithm combines two techniques in training the models: gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). The results showed that the proposed method is effective and efficient in predicting the crosswind force spectrum of a rectangular tall building. Liao et al. [143] used four different ML techniques (i.e., SVR, ANN, RF, and GBRT) to predict the flutter wind speed of a box girder bridge. The ANN and GBRT models accurately predicted the flutter wind speed for the streamlined box girders. The buffeting response of bridges can be predicted analytically using buffeting theory. However, some previous studies [159][160][161][162][163] have shown inconsistency between the full-scale measured response and buffeting theory estimates. Thus, Castellon et al. [142] trained two ML models (ANN and SVR) to estimate the buffeting response speed using full-scale data from the Hardanger bridge in Norway. The two ML models predicted the bridge response more accurately than the buffeting theory when compared to the full-scale measurements. Furthermore, the drag force on a circular cylinder can be reduced by using neural networks to optimize control parameters, such as the feedback gain and the phase lag, so that the velocity fluctuations in the cylinder wake are minimized [164].

Summary of Tools of Performance Assessment of ML Models
The performance of the ML models in wind engineering applications throughout the reviewed literature was assessed through one or more standard statistical error metrics and indices. It is important for any ML model that its performance be evaluated using some error metrics or factors. Thus, this section aims to provide future researchers with a summary of the tools and equations that have been used to date in structural wind engineering ML applications, along with an assessment of which tools are more appropriate for the applications at hand. The compiled metrics, or factors, quantify the error between the ML-predicted data and a form of ground truth, such as experimental data or independent datasets that were not used in training. There is a lack of consensus on the most accurate metric to use. Nonetheless, this section attempts to provide more guidance on which methods are preferred based on the surveyed studies.
Several error metrics were used throughout the reviewed literature, including the Akaike information criterion (AIC), coefficient of efficiency ($E_f$), coefficient of determination ($R^2$), Pearson's correlation coefficient ($R$), mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), root mean square error (RMSE), scatter index (SI), and sensitivity error ($S_i$). For the convenience of the readers and for completeness, the equations used to express each of these error metrics for assessing predicted data ($p_i$) against measured data ($m_i$) are summarized below (Equations (9)-(18)). For $N$ data points (e.g., $N$ could be the number of pressure taps used to provide experimental data), some of the error equations also use the mean values of the predicted data ($\bar{p}$) and of the measured data ($\bar{m}$).
where $f_{max}(x_i)$ and $f_{min}(x_i)$ are the maximum and minimum values, respectively, of the predicted output over the $i$th input factor while the other factors are held at their mean values.
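Several of the listed metrics can be computed directly from paired predicted and measured values, as in the minimal sketch below. It uses the standard textbook definitions, which may differ in detail from Equations (9)-(18); in particular, $R^2$ is taken here as one minus the ratio of the residual to the total sum of squares. The function name `error_metrics` and the sample pressure coefficients are hypothetical.

```python
import math

def error_metrics(measured, predicted):
    """Return MAE, MAPE (%), MSE, RMSE, R^2, and Pearson's R for paired data.

    MAPE divides by each measured value, so it is undefined when a measured
    value is exactly zero and becomes very large when one is close to zero.
    """
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / m) for e, m in zip(errors, measured)) / n
    m_bar = sum(measured) / n
    p_bar = sum(predicted) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((m - m_bar) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot
    cov = sum((m - m_bar) * (p - p_bar) for m, p in zip(measured, predicted))
    r = cov / math.sqrt(ss_tot * sum((p - p_bar) ** 2 for p in predicted))
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse, "R2": r2, "R": r}

# Hypothetical mean pressure coefficients at four taps.
measured = [-1.2, -0.8, -0.5, 0.4]
predicted = [-1.1, -0.9, -0.5, 0.5]
metrics = error_metrics(measured, predicted)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Replacing one measured value with a number close to zero (for example 0.01) leaves MAE, MSE, and RMSE almost unchanged but inflates MAPE sharply, which illustrates why normalized metrics need care when wall pressures near zero are included.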
In general, MSE was employed in most of the studies and is considered one of the most common error metrics for pressure distribution prediction, but it is not always an accurate error metric. The accuracy of MSE decreases when the pressures on the walls are included in the prediction, because walls might introduce pressure coefficients near zero, which may cause a large increase in the error when they appear in the normalizing denominator [90]. Nevertheless, MSE is generally stable when used in RF models once the number of trees reaches 100 [88]. The RMSE is not affected by near-zero pressure coefficients in the way MSE is because it does not include a normalization factor in the calculation. However, the lack of normalization is considered a limitation of this metric in cases where the scale of the pressure coefficients changes [90]. For some error metrics, values approaching one indicate higher accuracy (e.g., the coefficient of determination, $R^2$), meaning that the predicted data are close to the experimental data, whereas for others, values close to zero indicate higher accuracy (e.g., the root mean square error, RMSE).
The correlation coefficient, $R$, is considered a reliable measure of prediction accuracy because it quantifies how similar two sets of data are, but its limitation is that it reflects neither the range nor the bias between the two datasets. The coefficient of efficiency, $E_f$, describes the match between the model and the observed data; it can range from −∞ to 1, and a perfect match corresponds to $E_f = 1$ [89]. AIC is an information criterion that evaluates how well the model fits the training data and is used to select the best-fit model. Another error metric, not commonly used in the literature, is the SI, a normalized measure of error for which a lower value indicates better model performance. Besides the error metrics that assess the performance level of the model, other factors are used to indicate the effect of the input variables on the output. The most common example is the sensitivity analysis error percentage ($S_i$) (Equation (18)), which computes the contribution of each input variable to the output variable [165][166][167]. The $S_i$ is an important factor for determining the contribution of each input, especially when different inputs are used in the ML model training, and it could be of great significance for informing and changing the assigned weights of neurons in neural networks.
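The sensitivity percentage can be illustrated with a short sketch. It assumes the commonly used form in which the output range obtained by sweeping the ith input (with the other inputs held at their mean values) is divided by the sum of such ranges over all inputs and multiplied by 100; this form is an assumption here and should be checked against Equation (18). The function name, the number of sweep steps, and the toy model are hypothetical.

```python
def sensitivity_percentages(model, samples, n_steps=21):
    """Percentage contribution of each input to the output of `model`.

    model:   callable taking a list of input values and returning a number
    samples: list of input rows used to define each input's range and mean
    """
    n_inputs = len(samples[0])
    columns = [[row[i] for row in samples] for i in range(n_inputs)]
    means = [sum(col) / len(col) for col in columns]
    ranges = []
    for i, col in enumerate(columns):
        lo, hi = min(col), max(col)
        outputs = []
        for step in range(n_steps):
            x = list(means)  # all other inputs stay at their mean values
            x[i] = lo + (hi - lo) * step / (n_steps - 1)
            outputs.append(model(x))
        ranges.append(max(outputs) - min(outputs))  # f_max(x_i) - f_min(x_i)
    total = sum(ranges)
    return [100.0 * r / total for r in ranges]

# Toy stand-in for a trained ML model with two inputs.
def toy_model(x):
    return 3.0 * x[0] + 1.0 * x[1]

samples = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
shares = sensitivity_percentages(toy_model, samples)
print(shares)
```

For the toy model, the first input drives three times as much output variation as the second, so the two percentages come out at 75% and 25%.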
Overall, it is important to note that each error metric or factor usually conveys specific information regarding the performance of the ML model, especially in wind engineering applications (due to the variation of wall versus roof pressures, for instance), and most of these metrics and factors are interdependent. Thus, our recommendation is to consider the following together: (1) use $R^2$ to assess the similarity between the actual and predicted sets; (2) use MSE when the model predicts roof surface pressure coefficients only, without walls, but use either MAPE or RMSE when pressure coefficients for wall surfaces are included in the model; (3) use AIC to select the best-fit model in the case of linear regression. This recommendation stresses that using several error metrics together, as opposed to relying on a single metric, is essential for assessing the performance of ML models for structural wind engineering.

Discussion and Conclusions
As in any other application, the quantity and quality of data are the main challenges in successfully implementing ML models in the broader area of structural wind engineering. The quality of the dataset used for training is as important as the quantity of data. Measurements usually involve some anomalies, such as missing data or outliers; thus, removing the outliers is essential for the accuracy and robustness of the model [168,169]. ML algorithms are data-hungry processes that require thousands, if not millions, of observations to reach acceptable performance levels. Bias in data collection is another major drawback that could dramatically affect the performance of ML models [170]. To this end, some literature recommends that the number of data samples be no less than 10 times the number of independent variables, following the rule of 10 events per variable (EPV) [171]. Meanwhile, K-means clustering was used in many different studies due to its ability to analyze the dataset and recognize its underlying pattern. Most ML techniques need several trials and experiments through the validation process to develop a robust model with high prediction accuracy. For instance, whenever ANN is used, several training trials are conducted to choose the number of hidden layers and the number of neurons in each layer.
The ANN method is not recommended for datasets with a small sample size because it was found to yield double the mean absolute error (MAE) of other ML techniques [134]. ANN is capable of learning and generalizing nonlinear complex functional relationships via the training process, but there is currently no theoretical basis for determining the ideal neural network configuration [81]. The architecture of ANN and its training parameters cannot be generalized even within data of a similar nature [141]. Generally, one hidden layer is enough for most problems, but for very complex, fuzzy, and highly nonlinear problems, more than one hidden layer might be required to capture the significant features in the data [172]. The number of hidden nodes is determined through trials, and in most cases this number is set to no more than 2n + 1, where n is the number of input variables [173]. In addition, a study by Sheela and Deepa [174] reviewed different models for calculating the number of hidden neurons and proposed a method that gave the lowest MSE among the models compared. The proposed approach was applied to wind speed prediction and was very effective compared to the other models. Furthermore, as a general principle, a ratio of 3:1 or 3:2 between the numbers of nodes in the first and second hidden layers provides better prediction performance than other combinations [175]. Generally, a robust neural network model can be built with two hidden layers and ten neurons and will give a very reasonable response.
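The two sizing heuristics above can be turned into a starting point for the trial process, as in this minimal sketch. The helper names are hypothetical, and splitting a total node budget according to the 3:1 or 3:2 ratio is an assumed reading of the guideline rather than a procedure prescribed by the cited studies.

```python
def max_hidden_nodes(n_inputs):
    """Upper bound on hidden nodes from the 2n + 1 heuristic."""
    return 2 * n_inputs + 1

def split_two_layers(total_nodes, ratio=(3, 1)):
    """Split a node budget between two hidden layers in the given ratio."""
    first = round(total_nodes * ratio[0] / sum(ratio))
    return first, total_nodes - first

# Hypothetical network with 5 input variables (illustrative only).
print(max_hidden_nodes(5))                 # 11
print(split_two_layers(12, ratio=(3, 1)))  # (9, 3)
print(split_two_layers(10, ratio=(3, 2)))  # (6, 4)
```

These values only seed the search; the final configuration still has to be confirmed through training trials and validation.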
ANN also appears to have a significant computational advantage over a CFD-based scheme. In ANN, the computational work is mainly focused on identifying the proper weights in the network. Once the training phase is completed, the output of the simulated system can be obtained for any desired input through simple arithmetic operations. In a CFD scheme, on the other hand, each new input scenario requires a complete reevaluation of the fluid-structure interaction over the discretized domain.
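The arithmetic performed by a trained network can be illustrated with a minimal forward pass. The weights, biases, and inputs below are hypothetical placeholders rather than values from any trained model, and the layout (one tanh hidden layer followed by a linear output) is an assumption made for illustration.

```python
import math

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Propagate the inputs through each (weights, biases, activation) layer."""
    outputs = inputs
    for weights, biases, activation in layers:
        outputs = dense(outputs, weights, biases, activation)
    return outputs

def identity(z):
    """Linear output activation."""
    return z

# Hypothetical trained parameters: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1], math.tanh),
    ([[1.0, -1.0]], [0.05], identity),
]

# Hypothetical normalized inputs (e.g., wind direction and tap position).
prediction = forward([0.3, -0.6], layers)
print(prediction)
```

Each prediction costs only a handful of multiplications and additions per neuron, which is the source of the speed advantage over re-solving the flow field for every new scenario.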
From the review of the literature, it was also apparent that ANN has notable advantages over other ML methods. However, implementing ANN in certain types of wind engineering applications brings some challenges. ANN has difficulty predicting the pressure coefficients at the leading corners and edges because of flow separation, which is accompanied by high rms pressure coefficient values and corner vortices. This difficulty may be eliminated by training on datasets from full- or large-scale models that contain high-resolution pressure-tapped areas. It is important to note that whenever data are fed into a regression or ANN model (for training, validation, or testing), all the predictors are normalized to the range [−1, 1] to condition the input matrix. When ANN models are implemented, the Levenberg-Marquardt algorithm and tangent sigmoid or logarithmic sigmoid activation functions should be used. By contrast, the Adam optimization algorithm and the rectified linear unit (ReLU) activation function should be used whenever a DNN model (i.e., three or more hidden layers) is the chosen ML technique.
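The normalization step and the activation functions named above can be written out as a minimal sketch. The function names are hypothetical, and min-max scaling is assumed as the way of mapping predictors to [−1, 1], since the scaling formula is not stated here.

```python
import math

def scale_to_unit_range(values):
    """Min-max scale a predictor so its minimum maps to -1 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant predictor cannot be scaled")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def tansig(z):
    """Tangent sigmoid activation, output in (-1, 1)."""
    return math.tanh(z)

def logsig(z):
    """Logarithmic sigmoid activation, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit activation."""
    return max(0.0, z)

# Hypothetical wind directions in degrees (illustrative only).
print(scale_to_unit_range([0, 45, 90]))  # [-1.0, 0.0, 1.0]
print(tansig(0.0), logsig(0.0), relu(-2.0))
```

In practice the minimum and maximum would usually be taken from the training set and reused for the validation and testing sets, so that all three are scaled consistently.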
The literature review revealed that some ML techniques, although not yet as popular as ANN, have potential for future wind engineering applications and for specific structural wind engineering problems. Less common ML methods, such as the wavelet neural network (WNN), are gaining increasing attention due to their advantage over ANN and other models in terms of prediction accuracy and goodness of fit [176]. In addition, wavelet analysis is becoming popular due to its capacity to reveal simultaneous spectral and temporal information within a single signal [177]. Other ML techniques such as DL can be used as a probabilistic model for predictions based on limited and noisy data [178]. GAN models can be used in structural health monitoring to detect damage in buildings using images of damage that occurred during an extreme wind event. BPNN and GRNN were used to recover data that were missing because pressure sensors failed during testing [179]. GPR has high accuracy for time history interpolation and extrapolation, and in the same context, WNN predicts time series accurately compared to other methods. Surrogate models have proved to be a powerful tool for integrating FEM with ML models. They can solve complex problems, such as the dynamic response of roofs and bridges under wind loads from physical testing measurements, and can replicate more complex geometrically nonlinear structural behavior.
Ensemble methods have shown good results in predicting wind-induced forces and vibrations of structures. Because conducting many wind tunnel tests is time-consuming and cost-prohibitive, ML models such as DT, KNN, RF, and GBRT are found to be efficient [144] and, in turn, are recommended for accurately predicting crosswind vibrations. GBRT specifically can accurately predict crosswind responses when it is needed to supplement wind tunnel tests and numerical simulation techniques. ANN and GBRT are found to be the ideal ML models for wind speed prediction. Moreover, RF and GBRT are found to predict wind-induced loads more accurately than DT. GBDT is preferable to ANN when only a small amount of input data is available, as ANN requires a large amount of input data for an accurate prediction, as explained above. Wind gust prediction, which has not been a common application in the work reviewed in this study, can be achieved accurately using ensemble methods or neural networks and logistic regression [180–185].
If only wind tunnel testing is considered, the wind flow around buildings, which provides deep insight into their aerodynamic behavior, is usually captured using particle image velocimetry (PIV). However, measuring wind velocities at some locations is a challenge due to laser-light shielding. In such cases, DL might be used to predict the unmeasured velocities at those locations, as proposed in previous work [186]. The wind fields of tropical cyclones and typhoons can be predicted with ML models using storm parameters such as spatial coordinates, storm size, and intensity [187,188].
Overall, this review demonstrated that ML techniques offer a powerful tool and were successfully implemented in several areas of research related to structural wind engineering. The areas that can extend previous work and continue to benefit from ML techniques are mostly the prediction of wind-induced pressure time series and overall loads, the prediction of aeroelastic responses, wind gust estimates, and damage detection following extreme wind events. Nonetheless, other areas can also benefit from ML but are yet to be explored further and are recommended for future wind engineering research. These include the development and future codification of ML-based wind vulnerability models, advanced testing methods such as cyber-physical testing or hybrid wind simulation that incorporate surrogate and ML models for geometry optimization, and wind-structure interaction evaluation, among other future applications. Finally, physics-informed ML methods could provide a promising way to further improve the performance of traditional ML techniques and finite element analysis.
iables), h number of hidden layers, and y number of outputs (responses) as shown in Figure 1. Each hidden layer comprises a certain number of neurons that gives a robust model, and this could be achieved by training and trials.

Figure 2. The generic model of neuron j in hidden layer h.

Figure 3. The architecture of the four-layer fuzzy neural network.

Figure 4. Number of published ML-related studies with wind engineering applications.

Figure 5. Illustration of the k-fold cross-validation method.

Table 1. Summary of studies reviewed for wind-induced predictions.

Table 2. Summary of studies reviewed for integrating ML models with CFD simulation.

Table 3. Summary of studies reviewed for aeroelastic response.