Model for Estimating the Modulus of Elasticity of Asphalt Layers Using Machine Learning

The management of roads, as well as their maintenance, calls for an adequate assessment of the load-bearing capacity of the pavement structure. This serves as the basis on which future maintenance requirements are planned and plays a significant role in determining whether the rehabilitation or reconstruction of the pavement structure is required. The stability of the pavement structure depends on a large number of parameters, and it is not possible to fully assess all of them when making an estimation. One of the most significant parameters is the modulus of elasticity of asphalt layers (E AC ). The goal of this study is to use models based on machine learning to perform a quick and efficient assessment of the modulus of elasticity of asphalt layers, as well as to compare the formed models. The paper defines models for E AC estimation using machine learning, in which the input data include the measured deflections and the temperature of the upper surface of the asphalt layer. Analyses of modeling using artificial neural networks (ANNs), support vector machines (SVMs) and boosted regression trees (BRT) were compared. The SVM method showed a higher accuracy in estimating the E AC modulus, with a mean absolute percentage error (MAPE) of 7.64%, while the ANN method and the BRT achieved accuracies of 9.13% and 8.84%, respectively. Models formed in this way can be practically implemented in the management and maintenance of roads. They enable an adequate assessment of the remaining load-bearing capacity and the level of reliability of the pavement structure using non-destructive methods, at the same time reducing the financial costs.


Introduction
The pavement structures of highways, main roads and local roads are most often built as flexible pavements. Due to the negative impact of traffic load, various weather conditions and other parameters on the load-bearing capacity of pavement structures, it is necessary to implement a maintenance program in order to determine when and where maintenance works need to be carried out.
Non-destructive methods and the backcalculation of the modulus of elasticity of asphalt layers (E AC ) have proven to be reliable when it comes to analyzing the mechanical properties of the pavement structure, and can therefore be used for the assessment of the load-bearing capacity of pavement structures and their remaining service life [1][2][3][4][5][6].
A non-destructive method provides a mechanical approach to pavement design. In this paper, deflection data were obtained non-destructively using a falling weight deflectometer (FWD) device. For the data obtained in this way, backcalculation procedures were used to determine the stiffness of the individual layers of the pavement structure. In order to calculate the moduli as accurately as possible, it was necessary to measure the deflections at different locations along the road section, all of which featured a uniform layer thickness. Machine learning models are applied in this study to obtain accurate E AC values in a more rapid way. The research aims to find the best forecasting model, which would be much more practical to apply.

Theory and Calculation
The ANN [26] method requires a large data set for network training; it then calculates the required output data quickly, with significant accuracy and precision in comparison with classical methods and neuro-fuzzy systems [27]. The SVM method is derived from the statistical theory of learning [28] and applies structural risk minimization, while the ANN only applies the minimization of the mean square error on the data set; as a result, a better prediction is often obtained by the SVM model. The BRT [29] method is a supervised learning method that contrasts with the ANN and SVM: whereas those methods result in a single "best" model, the BRT approach uses boosting to combine a large number of relatively simple models in order to optimize the predictive performance.

Artificial Neural Networks (ANN)
An ANN is a machine-learning method used for evaluating and processing information. The general structure of an ANN consists of an input layer, one or more hidden layers and an output layer.
The formation of an ANN model undergoes three phases: the network modeling phase, network training phase and performance evaluation phase.
To define the network architecture, a range of parameters are used, such as:
• number of input parameters;
• number of layers and the neurons in them;
• number of output data;
• selection of the activation functions of the hidden and output neurons;
• type of training function, i.e., whether the network is forward- or backward-oriented.
In this paper, the available data set is divided into training data, validation data and testing data. In the training phase, the model is trained with the learning algorithm Levenberg-Marquardt (LM), a second-order numerical optimization technique which combines the advantages of the Gauss-Newton and the steepest descent algorithms. It is very effective when training networks that have up to several hundred units of weight [30].
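The Levenberg-Marquardt idea of blending Gauss-Newton and steepest-descent behavior through a damping parameter can be illustrated on a toy one-parameter least-squares problem. The model y = a·x, the data and the damping schedule below are illustrative assumptions for this sketch, not the paper's actual training setup.

```python
# Minimal sketch of a Levenberg-Marquardt loop for a one-parameter
# least-squares fit y ≈ a * x.  Model, data and starting values are
# invented for illustration.

def levenberg_marquardt(xs, ys, a=0.0, mu=0.001, iters=50):
    """Fit y = a*x by damped Gauss-Newton (scalar LM update)."""
    for _ in range(iters):
        residuals = [y - a * x for x, y in zip(xs, ys)]
        sse = sum(r * r for r in residuals)
        # For this model the Jacobian entry for each sample is x.
        jt_j = sum(x * x for x in xs)
        jt_r = sum(x * r for x, r in zip(xs, residuals))
        step = jt_r / (jt_j + mu)          # (J^T J + mu I)^-1 J^T r
        new_sse = sum((y - (a + step) * x) ** 2 for x, y in zip(xs, ys))
        if new_sse < sse:
            a += step
            mu /= 10     # success: behave more like Gauss-Newton
        else:
            mu *= 10     # failure: behave more like steepest descent
    return a

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x
a_hat = levenberg_marquardt(xs, ys)
print(round(a_hat, 2))
```

The damping parameter mu plays the same role as the Mu adaptation parameter mentioned below: small values give near-Gauss-Newton steps, large values shorten the step toward gradient descent.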
Supervised learning involves two basic steps, a forward one and a backward one. Neural network training incorporates the following steps for the fine-tuning of the weights and biases, in order to obtain the most accurate solutions at the output [26,31]:
Step 1: Model initialization: the weights and biases are initialized.
Step 2: Forward propagation: using the given inputs x i , weights w i and biases b, the linear combination of inputs and weights (z i ) is calculated for each layer. The output values (a i ) are then calculated by applying the activation function to z i , as represented in Equation (1). In the current study, the logistic sigmoid (LogS) was used as the hidden activation function, while the output layer was associated with the linear activation function (purelin) [32]. The activation functions and their expressions are shown in Figure 1.
Step 3: The loss function is computed, which quantifies the difference between the obtained prediction and the given target values. The calculation is based on the mean square error, represented in Equation (2).
Step 4: Backpropagation is conducted, in which the gradients of the loss function are calculated. Using these gradients, the parameters are adjusted from the last layer to the first one.
Step 5: Iterations are repeated until it is observed that the loss function has been minimized, without overfitting the training data.
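As a minimal illustration of Steps 2 and 3, the following sketch performs a forward pass through one hidden layer with the logistic sigmoid (LogS) activation and a linear (purelin) output, then computes the mean-square-error loss. The tiny network size, weights, biases and input are invented numbers, not the trained model from the paper.

```python
import math

def logsig(z):
    """Logistic sigmoid activation used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer (LogS) followed by a purelin output neuron."""
    # z_i: linear combination of inputs and weights; a_i: activation of z_i
    z_hidden = [sum(wi * xi for wi, xi in zip(w, x)) + b
                for w, b in zip(w_hidden, b_hidden)]
    a_hidden = [logsig(z) for z in z_hidden]
    # purelin: identity activation on the output linear combination
    return sum(wi * ai for wi, ai in zip(w_out, a_hidden)) + b_out

def mse(targets, predictions):
    """Mean square error loss of Step 3."""
    n = len(targets)
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n

# Two inputs -> two hidden neurons -> one output (illustrative sizes)
x = [0.5, -1.0]
w_hidden = [[0.1, 0.4], [-0.3, 0.2]]
b_hidden = [0.0, 0.1]
w_out = [0.7, -0.5]
b_out = 0.2
y_pred = forward(x, w_hidden, b_hidden, w_out, b_out)
print(round(y_pred, 4), round(mse([1.0], [y_pred]), 4))
```

Backpropagation (Step 4) would then differentiate this loss with respect to each weight and bias and adjust them layer by layer.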
The described process of the feed-forward neural network and backpropagation is schematically shown in Figure 2. In this paper, the input neurons include the measured deflections and the surface temperature of the asphalt layer. The best ANN architecture for the E AC prediction comprises eight input neurons, one hidden layer with eleven neurons and one output neuron (Figure 3).
The learning rate and the adaptation parameter Mu affect the convergence speed of the backpropagation algorithm. A learning rate of 0.001 and an adaptation parameter Mu of 0.001 were chosen. The number of epochs (iterations) was limited to 1000 during training. The database for ANN modeling is divided in the following way: 85% for the training set, 10% for the validation set and 5% for the testing set.

Support Vector Machine (SVM)
An SVM is a machine-learning method often applied for classification and regression, first introduced by Vapnik. SVM regression is considered a non-parametric method since it relies on kernel functions.
In the training data set {(x 1 , y 1 ), (x 2 , y 2 ), . . . , (x l , y l )} ∈ X × R, X denotes the space of the input data x of n-dimensional vectors. The SVM uses f (x, w) = ∑ N i=1 w i ·ϕ i (x) as an approximating function. The function f (x, w) is explicitly written as a function of the weights w, which are the subject of the training.
The evaluation of the regression model based on the SVM rests upon the evaluation of the approximation error, which tolerates errors within the ε-insensitivity zone. Errors outside of the ε-insensitivity zone can also be accounted for by introducing the linear loss function, represented in Equation (3) [28,33]. The main goal of the SVM algorithm is to simultaneously minimize R ε emp and ‖w‖. The linear regression hyperplane f (x, w) = w T x + b is obtained by minimizing the actual error, as given in Equation (4). If slack variables ξ and ξ* for the deviations above and below the ε tube are introduced (Figure 4), in order to overcome the otherwise infeasible constraints of the optimization problem, the formulation stated by Cortes and Vapnik is obtained. It is represented in Equations (5) and (6) [28]:

Table 1. SVM algorithm parameters.
precision parameter (ε): 0.9
tolerance parameter (C): 1600

The constant C > 0 is a pre-determined tolerance parameter which penalizes errors (large values of ξ and ξ*) that are greater than ε, thus reducing the approximation error.
The optimal hyperplane is obtained by maximizing the dual Lagrange function. After solving the dual regression problem, the optimal weight vector is obtained, which defines the optimal regression hyperplane, represented in Equation (7), where α * i , α i are the Lagrange multipliers.
A kernel function is used to approximate a non-linear function. The vectors x i are transferred into the multi-dimensional space through the mapping Φ : X → F, where F (z-space) represents the feature space. The final approximating function is defined by the expression given in Equation (8). In this paper, the Gaussian (RBF) kernel function was used, defined by the expression represented in Equation (9), where σ is the width of the RBF function.
The SVM algorithm was tested with different values of the ε precision parameter, the C tolerance parameter and the kernel scale factor γ. The best results for SVM modeling were obtained with the algorithm parameters shown in Table 1.
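The two ingredients above, the ε-insensitive loss of Equation (3) and the Gaussian RBF kernel of Equation (9), can be sketched in a few lines. The sample values of ε and σ used here are illustrative only; the paper's tuned parameters are those listed in Table 1.

```python
import math

def eps_insensitive_loss(y_true, y_pred, eps):
    """Linear ε-insensitive loss: 0 inside the ε tube, |error| - ε outside."""
    return max(0.0, abs(y_true - y_pred) - eps)

def rbf_kernel(x, x_i, sigma):
    """Gaussian RBF kernel K(x, x_i) = exp(-||x - x_i||^2 / (2*sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

print(eps_insensitive_loss(5.0, 5.3, eps=0.5))            # inside the tube -> 0.0
print(round(eps_insensitive_loss(5.0, 6.0, eps=0.5), 2))  # outside the tube -> 0.5
print(round(rbf_kernel([1.0, 2.0], [1.0, 2.0], sigma=1.0), 2))  # identical vectors -> 1.0
```

The tube makes small deviations cost nothing, which is what allows the SVM to trade approximation error against the flatness term ‖w‖.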

Boosted Regression Tree (BRT)
Gradient boosting (GB) is one of the machine learning methods used to solve classification and regression problems, and it was developed by Jerome Friedman [34]. The GB algorithm is optimized with respect to the error function. Strong prediction models are created by merging base prediction models, the decision trees. GB is a method in which decision trees are repeatedly added so that each new decision tree corrects the error of the previous one. The gradient optimization method is based on adjusting the last result by adding the negative of the gradient of the function which is to be minimized. Hence, by constantly adjusting the weight of a base learner, it is turned into a stronger learner. A schematic diagram of the BRT is shown in Figure 5 for demonstration. The least-squares boosting (LS Boost) algorithm successively updates the approximation F m-1 so as to minimize the expected value, over the training data x i , of a specified loss function.
The calculation of the loss function is based on the mean square error, represented in Equation (10). The BRT model and the boosting regression tree h m (x; w) are formed as follows [34,35]:
Step 1: Model initialization, represented in Equation (11).
Step 2: Iterative procedure of obtaining M regression trees: (1) the negative value of the gradient of the loss function L(y i , F(x i )) is calculated and then used as the estimate of the residual, as given in Equation (12); (2) a regression tree h m (x; w) is fitted to the residual obtained in the previous iteration, and the step size of the gradient descent is calculated, as represented in Equation (13).
Step 3: An approximation update is performed, represented in Equation (14), where ν represents the learning rate, which prevents the over-fitting of the model.
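The LS Boost loop above can be sketched with depth-1 regression stumps as the base learners: for squared loss the negative gradient is simply the residual, each stage fits a stump to those residuals, and the ensemble is updated with a learning rate ν. The toy data and hyper-parameters are invented for illustration; they are not the model or settings used in the paper.

```python
def fit_stump(xs, residuals):
    """Best single-split stump minimizing squared error on the residuals."""
    best = None
    for split in xs:
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, split, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x <= split else rm

def ls_boost(xs, ys, n_stages=50, nu=0.1):
    # Step 1: initialize F0 with the mean of the targets.
    f0 = sum(ys) / len(ys)
    stumps = []
    def predict(x):
        return f0 + sum(nu * h(x) for h in stumps)
    for _ in range(n_stages):
        # Step 2(1): for squared loss the negative gradient is the residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        # Step 2(2) and Step 3: fit a stump to the residuals, update the model.
        stumps.append(fit_stump(xs, residuals))
    return predict

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.1, 1.3, 3.9, 4.1, 4.0]   # a step-shaped target
model = ls_boost(xs, ys)
print(all(abs(model(x) - y) < 0.3 for x, y in zip(xs, ys)))
```

Each stump is a weak learner on its own; the boosted sum of many ν-scaled stumps recovers the step-shaped target, which is the "weak learners combined into a strong learner" idea described above.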
The LS Boost algorithm was tested with different values of the learning rate, the number of boosting stages and the leaf size. The best results for BRT modeling were obtained with the algorithm parameters shown in Table 2.
Table 2. BRT algorithm parameters.

Dataset
Based on the measured deflections, the E AC values are obtained through backcalculation procedures. Classical backcalculation procedures require the assumption of an initial modulus of elasticity at the beginning of the calculation, on which the convergence further depends. A wrong choice of the initial modulus of elasticity often leads to a calculation that is not sufficiently accurate. Other problems that affect the accuracy of the calculation include the accuracy of the measurements, the number and thickness of the pavement structure layers, as well as the temperature of the asphalt layers [36][37][38][39].
In order to form a model for predicting the elastic modulus of asphalt layers of flexible pavement structures, it is necessary to form a set of data of the measured deflections and temperature, which contain a sufficient amount of information. Based on this data set, the modulus of elasticity is calculated using classic backcalculation procedures.
Starting from the defined goals and the analysis of the research plan and program, a database is chosen that first needs to be thoroughly examined and processed. Data analysis and processing mean that, when deflections are measured on the primary and secondary roads and highways, recurring points that represent extremes and significant deviations are removed, so that they do not affect the data analysis. In total, 462 data points were collected for the analysis, 438 of which were taken for network training, amounting to 95% of the total amount of data. Network testing was performed on the remaining 23 data points, which is 5% of the total amount of data. The statistics of the data used in the modeling are shown in Table 3.
The models were developed by the ANN, SVM and BRT methods using the MATLAB software package. The input variables are the deflections d 0 , d 300 , d 600 , d 900 , d 1200 , d 1500 , d 1800 , i.e., the deformations measured at radial distances of 0, 300, 600, 900, 1200, 1500 and 1800 mm from the center of the loading plate, together with the temperature of the upper surface of the asphalt layer. The deflections were mostly measured in spring and autumn in the early morning hours, except for one section on the highway, which was measured in the summer period in the morning hours. The ELMOD 6 software was used for the backcalculation. Three methods for calculating the asphalt modulus can be selected in this software: the finite element method, linear elastic theory and the method of equivalent thicknesses. In this paper, the linear elastic theory was used. Since in the Republic of Serbia there is no recommendation for choosing among these three methods, British standards are used [40]. The output variable is the modulus of elasticity of the asphalt layers (E AC ), expressed in MPa. The deflections were measured under loads ranging between 48.63 and 50.89 kN. Due to such a small range, the load is considered constant and is hence not taken as an input variable. The thickness of the asphalt layers was measured only at intervals of 1 km and for that reason was not taken as an input variable.
According to British standards [40], the bearing capacity of the structure is divided into three classes, according to the module values:
• modules up to 3000 MPa - poor bearing capacity overall;
• modules from 3000 MPa to 7000 MPa - damage in some places;
• modules above 7000 MPa - good bearing capacity overall.
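The three bearing-capacity classes above reduce to a simple lookup; the 3000 MPa and 7000 MPa thresholds are the values quoted in the text, while the handling of the exact boundary values is an assumption of this sketch, since the standard's boundary convention is not stated here.

```python
def bearing_capacity_class(e_ac_mpa):
    """Classify an asphalt-layer modulus E_AC (MPa) per the ranges above."""
    if e_ac_mpa <= 3000:
        return "poor bearing capacity overall"
    if e_ac_mpa <= 7000:
        return "damage in some places"
    return "good bearing capacity overall"

print(bearing_capacity_class(2500))   # poor bearing capacity overall
print(bearing_capacity_class(5200))   # damage in some places
print(bearing_capacity_class(8100))   # good bearing capacity overall
```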
As a result of the aforementioned division, the models in this research were formed with modules of up to 7000 MPa as the maximum value. Figure 6 shows the percentage distribution of the backcalculated elastic moduli E AC within the given ranges. The model's predictive capability and the degree of fit of the E AC for a certain range depend on the amount of data in that range. All models analyzed in this paper were trained on an identical training set, while the accuracy of the models was evaluated on an identical testing set.

Results and Discussions
The parameters of the correlation coefficient (R), the determination coefficient (R 2 ), the mean absolute percentage error (MAPE) and the root mean square error (RMSE) are used to measure the accuracy of the models. The MAPE represents the mean of the absolute percentage differences between the actual and predicted values. The RMSE is the standard deviation of the errors, which quantifies the average difference between the actual and predicted values; in this paper, the normalized value is used. These parameters can be expressed mathematically as given in Table 4. The closer the values of R 2 and R are to 1, the more reliable the model and the higher the degree of fit. The MAPE and RMSE are used to test the prediction ability of the model, and the smaller these values, the higher the prediction ability.
Table 4. Model performance evaluation criteria.

Coefficient of correlation: R = ∑ (E act,i - E act )(E pred,i - E pred ) / √[ ∑ (E act,i - E act ) 2 ∑ (E pred,i - E pred ) 2 ]
Coefficient of determination: R 2
Mean absolute percentage error: MAPE = (100/n) ∑ |E act,i - E pred,i | / E act,i
Root mean squared error: RMSE = √[ (1/n) ∑ (E act,i - E pred,i ) 2 ]
Note: E act,i and E pred,i are the target and predicted modulus values, respectively; E act and E pred are the means of the target and predicted modulus values corresponding to the n patterns.
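The evaluation criteria can be written as a short, dependency-free sketch. The two vectors of moduli below are invented sample values, and range-normalization of the RMSE is an assumption here, since the paper does not state which normalization it uses.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - p) / a for a, p in zip(actual, predicted))

def rmse(actual, predicted, normalize=True):
    """Root mean square error; range-normalization is an assumption."""
    n = len(actual)
    root = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    if normalize:
        root /= (max(actual) - min(actual))
    return root

def r_squared(actual, predicted):
    """Coefficient of determination R^2."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Invented sample moduli in MPa, purely for demonstration
e_act = [4200.0, 5100.0, 3600.0, 6400.0]
e_pred = [4000.0, 5300.0, 3500.0, 6600.0]
print(round(mape(e_act, e_pred), 2))        # percent error
print(round(rmse(e_act, e_pred), 3))        # normalized RMSE
print(round(r_squared(e_act, e_pred), 3))   # degree of fit
```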
A summary of the obtained model performance results for the ANN, SVM and BRT models is shown in Table 5. The linear correlation between the actual and predicted E AC values for the ANN, SVM and BRT methods is shown in Figures 7-9, respectively.
The correlation coefficient R between the backcalculated and predicted E AC values is higher than 0.96 for all models, and the coefficient of determination R 2 is higher than 0.93 for all models. Furthermore, the ANN model has an RMSE of 0.066 and a MAPE of 9.13%, the SVM model has an RMSE of 0.059 and a MAPE of 7.64%, and the BRT model has an RMSE of 0.078 and a MAPE of 8.84%. These results indicate that the SVM model features better E AC prediction abilities than the other two analyzed methods. However, the rather small differences in the performance results obtained between the ANN, SVM and BRT methods show that all three methods provide similar E AC prediction capabilities.
Similar research was presented by the authors Baldo [21], Gopalakrishnan [24] and Saltan [25]. In their papers, they analyzed the following ML methods: ANN [21,24,25], SVM [24,25] and BRT [25]. The coefficients of determination (R 2 ) obtained in their research are shown in Table 6. In all three works, the results were analyzed in terms of R 2 , which is therefore the basis of this comparison. Looking at the results from the literature, the best fit, i.e., the smallest error, was achieved in [24]. The difference between that work and our research is that, in addition to the deflection values, the input data also included the thicknesses of the pavement structure layers, which contributes to the accuracy of the analysis. In their research, Baldo et al. [21] also used deflections, as well as the differences between the deflections of the surface geophone and a geophone at a certain depth.
They used different combinations of inputs, based on which an R 2 of 0.9477 was obtained, which is approximately the value obtained in this paper. The methods analyzed in this paper were also studied by Saltan [25], who used deflections as the input parameters and obtained lower determination coefficients than the ones obtained in this research. Analyzing the methods individually, the R 2 obtained by the ANN method in [25] deviates only slightly from the value obtained in this research, while the SVM and BRT methods in [25] yield weaker R 2 values than those obtained here.

Conclusions
Flexible pavement structures are characterized by the modulus of elasticity of the asphalt layers, which is of great significance in determining the strength of the pavement and its behavior under traffic loads. These moduli are obtained by backcalculation procedures from deflections measured non-destructively with an FWD device.
This paper presents a successful backcalculation system based on the machine learning methods ANN, SVM and BRT. Taking into account the fact that the presented approach is relatively new in solving the problem of estimating the modulus of elasticity using ML, the prediction results are presented in a critical comparative analysis. Based on the results shown in Table 5, it can be concluded that the SVM method features a smaller error in predicting E AC (MAPE of 7.64%; RMSE of 0.059) than the other two analyzed methods. The errors of the ANN and BRT methods deviate only slightly from those of the SVM method (MAPE by 1.2 to 1.49 percentage points; RMSE by 0.007 to 0.019).
The models for estimating the modulus of elasticity of asphalt layers based on ML in this paper are applicable to all flexible pavement structures of highways and regional and local roads in the territory of the Republic of Serbia. The primary prerequisite for the application of the model is the formation of an adequate database with measured deflections and temperatures.
The application of the formed model makes it possible to obtain results within a very short period, using non-destructive methods. Based on these results, it is possible to see which sections require rehabilitation, reconstruction or strengthening of the pavement structure.