Stiffness Modulus and Marshall Parameters of Hot Mix Asphalts: Laboratory Data Modeling by Artificial Neural Networks Characterized by Cross-Validation

The present paper discusses the analysis and modeling of laboratory data regarding the mechanical characterization of hot mix asphalt (HMA) mixtures for road pavements, by means of artificial neural networks (ANNs). The HMAs investigated were produced using aggregate and bitumen of different types. Stiffness modulus (ITSM) and Marshall stability (MS) and quotient (MQ) were assumed as mechanical parameters to analyze and predict. The ANN modeling approach was characterized by multiple layers, the k-fold cross validation (CV) method, and the positive linear transfer function. The effectiveness of such an approach was verified in terms of the coefficients of correlation (R) and mean square errors; in particular, R values were within the range 0.965–0.919 in the training phase and 0.881–0.834 in the CV testing phase, depending on the predicted parameters.


Introduction
Road pavements are transport infrastructures built with different types of hot mix asphalt (HMA), namely mixtures made of aggregates and bitumen, mixed at temperatures higher than 150 • C. In order to withstand traffic loads and environmental conditions, such infrastructures have to be properly designed in terms of their thickness and material properties. With respect to the composition and mechanical characteristics of HMAs, experimental methods are currently used to design and optimize such bituminous mixtures [1][2][3][4][5][6]. Particularly, for what concerns the bitumen content identification and the mixtures' performance evaluation, quite onerous laboratory tests are necessary; moreover, such experimental procedures require experienced and skilled technicians. Any modification of the HMAs composition, in terms of type or quantity of bitumen, rather than of aggregates, requires new laboratory tests. A numerical model of HMAs' mechanical behavior that could quickly elaborate a reliable prediction of the material's response would allow a reduction in time and costs for the design of the mixture itself.
Typically, a mathematical model of the material's behavior is based on constitutive equations, expressed in analytical terms [7,8], eventually implemented in a computational system for finite element analysis [9,10].
One of the milestone studies on the modeling of the mechanical behavior of bituminous mixtures aimed to investigate the elastic-viscoelastic correspondence principle, which allowed the elastic continuum damage model to extend to a corresponding viscoelastic damage model, characterized by an evolution law able to properly describe the growth of damage within asphalt concrete, under uniaxial tension tests at different strain rates [11]. The following developments of such a constitutive approach have included cyclic loading conditions [12,13] and an analysis of the healing phenomenon [14]. On the basis of the elastic-viscoelastic correspondence principle and continuum damage mechanics, the constitutive framework has been further elaborated to obtain a fatigue prediction model, which has also been compared to a classic phenomenological fatigue model [15][16][17]. It has been demonstrated that the regression coefficients of the phenomenological model can be expressed as functions of viscoelastic and damage characteristics of asphalt concretes. More recently, the modeling approach has been modified to consider the viscoplastic rate-dependent hardening-softening response of bituminous mixtures under compression [18][19][20][21]. A viscoelastoplastic formulation of the constitutive approach has also been developed [22,23].
Linear viscoelastic, rather than thermo-viscoplastic laws have also been studied by Di Benedetto et al. [24,25] to model the mechanical response of bituminous mixtures.
A fundamental constitutive approach proposed by Masad et al. [26] to model the permanent deformation of asphalt concretes at high temperatures was characterized by an anisotropic non-associated flow rule, on the basis of the Drucker-Prager yield surface. Laboratory data obtained by triaxial tests, carried out at different confining stress levels and strain rates, were used to validate the proposed model, with respect to three different asphalt concretes. Another relevant aspect of such an approach is the relationship that has been established between the model parameters and some significant aggregate properties (shape, surface energy, anisotropic distribution), particularly with regard to the optimization of the mixture.
The research group of the Texas A & M University, after a series of studies [27][28][29][30], achieved significant advances, leading up to a temperature-dependent viscodamage model coupled to the Schapery's nonlinear viscoelasticity and the Perzyna's viscoplasticity [31]. Such a model has allowed the ability to distinguish between the compressive and tensile response of asphalt concretes with respect to the damage development phenomenon. Advanced numerical algorithms were used to implement the proposed model in a general-purpose finite element code. An effective method to obtain the model parameters through the creep test carried out at various stress levels and temperatures was also developed.
Further improvements have been accomplished over the years with more refined models based on the mechanics of materials [32][33][34][35][36][37][38][39][40] to acquire a deeper understanding about the complex relationship between asphalt mixtures' microstructure and their mechanical behavior. Particular emphasis has been given to the possibility of using such constitutive approaches to properly select asphalt concretes' components and their optimal proportions to ensure the required design properties and performance of the mixture. Furthermore, advanced material characterization methods have also been used to support such modeling approaches, for instance, techniques based on X-ray computed tomography image analysis. A coupled constitutive model, still based on Schapery nonlinear viscoelasticity and Perzyna-type viscoplasticity, but properly modified to take into account the hardening-relaxation phenomena, represents the current state of the art achieved by such a research group [41]. One of the key characteristics of the last approach is given by the possibility to calibrate and validate the model on the basis of conventional tests, namely dynamic modulus tests and repeated creep-recovery tests. The model has shown the capacity to properly describe the complex behavior of asphalt concretes under different loading conditions.
Other studies, based on continuum thermomechanics, have allowed constitutive equations suitable for modelling the mechanical response of asphalt concretes to be obtained. Krishnan et al. [42] developed a nonlinear rate type viscoelastic model, by means of thermodynamic principles. A coupled temperature-dependent viscoelastic, viscoplastic, and viscodamage model was obtained by Darabi et al. [43], on the basis of a thermodynamic framework.
Introducing the Helmholtz free energy and the concept of internal state variables, a viscoelasticviscoplastic damage model, congruent from the thermodynamics point of view, was developed by Zhu and Sun [44]. A deep discussion on the capacity of the model to reliably predict the volumetric deformation was provided; the contraction and dilation response of the material was successfully associated to the viscoelastic component and viscoplastic damage component, respectively.
Within the ambit of thermomechanics, Chen et al. [45] recently derived a constitutive formulation in finite strains and characterized by a viscoelastic dissipation potential to take into account the deviatoric and volumetric response, a modified Perzyna viscoplastic law with a non-associated flow rule, to model the inelastic deformation by means of a Drucker-Prager type plastic dissipation potential and a damage model, specifically introduced to represent the different behavior of the material in tension and compression.
The environmental as well as the physical and chemical behaviors of the bituminous mixtures, related to ageing, healing, or moisture-induced damage, have also been introduced in some constitutive formulations [46][47][48][49].
The brief discussion just presented represents a partial overview of the total amount of studies conducted by different research groups on the modeling of asphalt concretes' behavior, based on the mechanics of materials. Nevertheless, it can be stated that the main and very important characteristic of such methods is the mechanistic framework of the numerical or mathematical expressions, which allows a rational analysis and deep understanding of the bituminous mixtures' response under different testing conditions. For this reason, the mechanistic constitutive methods also represent a really useful tool to support the mix design phase.
Alternative approaches rely on methods that are not physically based, for instance, statistical rather than artificial neural networks (ANNs) models.
Statistical regressions of experimental data sets could be an alternative approach to develop predictive equations of the HMA's property analyzed [58][59][60][61]. Nevertheless, it has also been reported that ANNs can produce more accurate predictions with respect to multiple linear regression [62,63]. Indeed, in recent years, ANNs have arisen in scientific research as a powerful numerical technique for data analysis [64], even in the pavement-engineering field. Tapkın et al. [65] and Baldo et al. [66] verified the possibility of fitting Marshall test results of bituminous mixes to provide predictive equations that can be used to quicken the empirical Marshall mix design; nevertheless, such neural models are not mechanistically based and for this reason, they are considered as "black box" methods. However, the so called "black box" principle is basically based on a nonlinear fitting approach, as it has previously been widely discussed [66]. Furthermore, even without a physical basis, such approaches can be useful to reduce the number of laboratory tests required by the Marshall mix design, which is still widely adopted in many asphalt laboratories [67][68][69][70][71].
ANNs have also been used in other studies to model different mechanical parameters of bituminous mixtures for road pavement; such researches have reported the use of a basic network architecture, a standard approach for the data sampling and adoption of just one type of transfer function, namely the hyperbolic tangent one [65,[72][73][74][75][76][77]. This type of methodological approach has been followed primarily because the data sets were large enough to avoid the use of more complex neural network structure; however, it has been mentioned that more sophisticated ANNs could be useful to improve the prediction performance of neural models [64].
Therefore, the main objective of the current paper was to verify the effectiveness of an ANN modeling approach based on multiple layer structures, a more reliable data sampling technique and a more efficient transfer function, for accurate and fast prediction of mechanical parameters of bituminous materials to enhance the mix design phase in the laboratory. The ANNs models developed in this study were trained and tested on the basis of laboratory data of HMAs for road pavements, characterized by different types of aggregate and bitumen.

Artificial Neural Network Modeling
An artificial neural network is a computational approach that is increasingly used in the development of predictive models; its fundamental unit is the artificial neuron [78,79]. The function of an artificial neuron, like a biological one, is to process input signals and modulate its own response through an activation function, called the transfer function, which determines the interruption or transmission of the outgoing impulse. The computing power of a biological brain depends on the number of connections between neurons and the same goes for ANNs. In a feedforward ANN, neurons are organized into layers and since information flows only in one direction without any recurrent cycle, connections are established between neurons belonging to different layers so that none of the possible pathways touch the same neuron twice. Within an ANN, each neuron ( j) computes a weighted sum (n j ) of elements (p i ) in the input vector through weights (w j,i ) associated with each connection. Then, it calculates an output value (a) by applying an assigned transfer function to n j [80]: where R is the number of elements in the input vector and b is the neuron bias. The transfer function may take different analytical expressions. In the current study, the positive linear (Equation (2)) and hyperbolic tangent (Equation (3)) functions were used: Connections' weights and neuron bias are adjustable scalar parameters and their values determine the network function. The central idea of neural networks is to adjust such parameters, through an iterative process (called "training" or "learning"), so that the ANN is able to perform a desired task [78,79,81]. Given an experimental data set, the learning of a feedforward network is based on a supervised approach: It consists of iteratively adjusting weights' values to minimize the difference between experimental targets and calculated output by using an optimization algorithm [78,79,81]. The supervised learning is divided into two phases ( Figure 1): After the initialization of connection's weights, the forward pass begins, which consists of processing the incoming information through the parallel processing of many neurons to compute net outputs. Then, the backward pass carries out a comparison between the experimental targets and calculated outputs so that a backpropagation algorithm can evaluate a weight's correction. With such an approach, the artificial network learns to recognize the implicit relationship between the input and target and can provide a solution to new input data. A detailed description of the structure of feedforward networks, the computational process performed by the neuron, and the supervised learning has previously been discussed [66].

Artificial Neural Network Modeling
An artificial neural network is a computational approach that is increasingly used in the development of predictive models; its fundamental unit is the artificial neuron [78,79]. The function of an artificial neuron, like a biological one, is to process input signals and modulate its own response through an activation function, called the transfer function, which determines the interruption or transmission of the outgoing impulse. The computing power of a biological brain depends on the number of connections between neurons and the same goes for ANNs. In a feedforward ANN, neurons are organized into layers and since information flows only in one direction without any recurrent cycle, connections are established between neurons belonging to different layers so that none of the possible pathways touch the same neuron twice. Within an ANN, each neuron ( ) computes a weighted sum ( ) of elements ( ) in the input vector through weights ( , ) associated with each connection. Then, it calculates an output value ( ) by applying an assigned transfer function to [80]: where is the number of elements in the input vector and is the neuron bias. The transfer function may take different analytical expressions. In the current study, the positive linear (Equation (2)) and hyperbolic tangent (Equation (3)) functions were used: Connections' weights and neuron bias are adjustable scalar parameters and their values determine the network function. The central idea of neural networks is to adjust such parameters, through an iterative process (called "training" or "learning"), so that the ANN is able to perform a desired task [78,79,81]. Given an experimental data set, the learning of a feedforward network is based on a supervised approach: It consists of iteratively adjusting weights' values to minimize the difference between experimental targets and calculated output by using an optimization algorithm [78,79,81]. The supervised learning is divided into two phases ( Figure 1): After the initialization of connection's weights, the forward pass begins, which consists of processing the incoming information through the parallel processing of many neurons to compute net outputs. Then, the backward pass carries out a comparison between the experimental targets and calculated outputs so that a backpropagation algorithm can evaluate a weight's correction. With such an approach, the artificial network learns to recognize the implicit relationship between the input and target and can provide a solution to new input data. A detailed description of the structure of feedforward networks, the computational process performed by the neuron, and the supervised learning has previously been discussed [66].

The K-Fold Cross Validation
In machine learning, the available data set is usually divided into subsets for training, validation, and testing. After the model's hyperparameters have been set up (for example, the number of hidden layers, number of neurons in each layer, learning rate, etc.), the training data set, generally made up of 60% to 80% of the data, is used to train the model itself. If the model performs satisfactorily on the training data, then it is run on the validation data set, which usually ranges from 5% to 20% of the data. The validation data set helps to provide an assessment of the model's capability. If the error (i.e., the mean squared difference between the predicted and experimental data, MSE) on the validation data set significantly increases (or decreases) compared to the training error, then an overfitting (rather than an underfitting) condition has occurred. The test data set, typically consisting of 5% to 20% of the data, is employed to complete the analysis and assessment of the model prediction capacity.
The procedure described above, considered as "conventional", poses two issues: On the one hand, the subdivision of the data into subsets with different purposes involves the risk of leaving out some relevant trends from the training data set and, on the other hand, a model can lead to a lower prediction error on training data but at the same time to high test error due to data sample variability [82].
An alternative to the conventional approach is the so-called k-fold cross validation (CV). It is a statistical technique that allows a more accurate estimation of a model's performance and then to compare and select a model for a given predictive modeling problem [83][84][85][86]. It is strongly recommended to follow such a procedure in the case of a relatively small data set [78,81]. It involves a random division of data into k folds of equal sizes: One of these folds is used as a test set to evaluate the model's performance and others (k−1) are grouped to form the training data set ( Figure 2). The training and test phases are performed k times and each time a fold is assumed as a test set while the remaining ones are used to train the model. Performing multiple rounds of CV with different subsets from the same data allows a significant reduction in sample variability: All data are included in a training set (k−1) times, and in a test set once. Usually, the value of k is set to 5 or 10, as these values have been shown to result in more accurate estimations of a model's capacity [82,86]. By calculating the average of the testing error, MSE i , over these multiple rounds, an estimation of the model's predictive performance is obtained: where M is the number of samples in the i-th fold, t l is the experimental targets, and y l is the ANN-calculated outputs. Including a measure of the variance of the skill scores is also recommended [82,83], such as the standard deviation (i.e., the standard deviation is the square root of the variance) of MSE i : However, to find the hyperparameters of the model that best fit the experimental data and that satisfactorily generalize the problem, it is necessary to define a model selection procedure. This consists of designing several neural networks by assigning numerical values, within a range, to one of the hyperparameters and by fixing those of the remaining ones. This operation is repeated by varying at least one pair of hyperparameters. Resulting models have to adapt well to the data used for training. Subsequently, a k-fold CV is performed on each of the models thus defined. The network that shows the best predictive performance is initialized and then trained on the entire data set available; this is the final model that is used for predictions.

ANN Structure
In the current study, ANN models of the selected mechanical parameters of the bituminous mixtures investigated were developed using the software MATLAB ® (R2014b, The MathWorks Inc., academic use). The designed ANNs are of the feedforward type and are characterized by multiple layers, which make use of the positive linear rather than the hyperbolic tangent transfer function. The structure of all the ANNs is completed by an output layer, adopting a linear transfer function. The use of the positive linear transfer function (also known as ReLu, rectified linear unit [64,87]) has been shown empirically that it leads to better performance than the use of the hyperbolic tangent, which is prone to problems of rapid saturation [64,87]. Each hidden layer has the same number of neurons. The backpropagation algorithm, used to train the networks and improve their generalization, is the Bayesian regularization, which updates the weight values according to the Levenberg-Marquardt optimization [88].
The network was set up with random initial weights and biases, whereas the learning rate (i.e., the parameter that controls changes of weight and bias during learning) was 0.001, the number of epochs (i.e., is the maximum number of presentations of the set of training vectors to a network and updates of weights and errors) was 3000, the error goal was 0.01, and the minimum gradient (i.e., the magnitude of the gradient of performance used to stop the training) was 0.0000001. In order to train networks more efficiently, a pre-processing function was assigned to normalize the data set, so that the input variables and the target parameter had a mean equal to zero and standard deviation equal to one. The same function, as a post-processing function, also reconverts normalized outputs into the targets' original units. The following equation illustrates the normalization process performed on the training and test data sets: where is the i-th component of the input vector or one of the three mechanical parameters considered in the current study; and are respectively the average and the standard deviation of the parameter, ; and finally, is the normalized input or target variable treated (Table S1). Each new data that will be used by the network has to be pre-processed by the mean of Equation (6).
The number of hidden layers and the number of neurons in each hidden layer were selected as hyperparameters to be optimized in the model selection procedure to search for the best predictive capabilities on the basis of the selected machine learning method (i.e., the Bayesian regularization).

ANN-Model's Equation
Referring to the computational process performed by neurons (Section 2.1) and to the architecture of multiple-layer ANNs, the vector equation of a one-hidden layer model (Equation (7)), two-hidden layer model (Equation (8)), and three-hidden layer model (Equation (9)) are reported in the following:

ANN Structure
In the current study, ANN models of the selected mechanical parameters of the bituminous mixtures investigated were developed using the software MATLAB ® (R2014b, The MathWorks Inc., academic use). The designed ANNs are of the feedforward type and are characterized by multiple layers, which make use of the positive linear rather than the hyperbolic tangent transfer function. The structure of all the ANNs is completed by an output layer, adopting a linear transfer function. The use of the positive linear transfer function (also known as ReLu, rectified linear unit [64,87]) has been shown empirically that it leads to better performance than the use of the hyperbolic tangent, which is prone to problems of rapid saturation [64,87]. Each hidden layer has the same number of neurons. The backpropagation algorithm, used to train the networks and improve their generalization, is the Bayesian regularization, which updates the weight values according to the Levenberg-Marquardt optimization [88].
The network was set up with random initial weights and biases, whereas the learning rate (i.e., the parameter that controls changes of weight and bias during learning) was 0.001, the number of epochs (i.e., is the maximum number of presentations of the set of training vectors to a network and updates of weights and errors) was 3000, the error goal was 0.01, and the minimum gradient (i.e., the magnitude of the gradient of performance used to stop the training) was 0.0000001. In order to train networks more efficiently, a pre-processing function was assigned to normalize the data set, so that the input variables and the target parameter had a mean equal to zero and standard deviation equal to one. The same function, as a post-processing function, also reconverts normalized outputs into the targets' original units. The following equation illustrates the normalization process performed on the training and test data sets: where x is the i-th component of the input vector or one of the three mechanical parameters considered in the current study; x mean and x std are respectively the average and the standard deviation of the parameter, x; and finally, x n is the normalized input or target variable treated (Table S1). Each new data that will be used by the network has to be pre-processed by the mean of Equation (6). The number of hidden layers and the number of neurons in each hidden layer were selected as hyperparameters to be optimized in the model selection procedure to search for the best predictive capabilities on the basis of the selected machine learning method (i.e., the Bayesian regularization).

ANN-Model's Equation
Referring to the computational process performed by neurons (Section 2.1) and to the architecture of multiple-layer ANNs, the vector equation of a one-hidden layer model (Equation (7)), two-hidden layer model (Equation (8)), and three-hidden layer model (Equation (9)) are reported in the following: where p is the R length input vector and y is the network scalar output associated with the considered mechanical parameter. All ANN models were elaborated on the basis of four numerical input variables, namely the bitumen content (% by weight of mix), filler to bitumen ratio (%), air voids (%), and voids in the mineral aggregates (%), and two categorical input variables, namely the bitumen type and aggregate type. Such input data have been considered as effective in properly representing the composition of HMA mixes, which were designed and produced with different aggregate (limestone or diabase) and bitumen (standard 50/70 or modified 25-55/75) types. Predictive models characterized by such types of input data allow modifications of the mixture composition to be numerically simulated, obtaining an estimation of the mechanical parameter under analysis without the necessity of performing further experimental tests, thus representing an extremely useful design tool during the optimization of the material. The parameters' meaning of Equation (8) is explained in the following, while a graphical representation is reported in Figure 3. Vector and scalar variables are indicated in bold and italics type, respectively. Each layer is characterized by a connections' weight matrix, W, adds a vector, b, of neurons' bias, and returns an output vector, a. To distinguish one layer from another, the number of the layer is added to the variable of interest; furthermore, a distinction between the weight matrix that is connected to inputs and ones that are connected between layers is made by associating letters, I and L, to W, respectively. A pair of indexes is also assigned to each weight matrix to identify the destination (first index) and the origin (second index) of represented connections. Thus, in a three-layer feedforward network (Figure 3), the S 1 neurons in the first hidden layer add an S 1 × 1 bias vector, b1, to the dot product of IW1, 1 (i.e., the S 1 × R weight matrix of connections between R inputs and S 1 neurons) and the input vector, p. The sum of the bias, b1, and the product, IW1, 1·p, is the so-called net input, n1, with the dimension, S 1 × 1. This sum is processed by the transfer function, f 1 , to get the neurons' S 1 × 1 output vector, a1, of the first hidden layer, which is also the input vector of the second hidden layer. This layer, made of S 2 neurons, elaborates the incoming information by the S 2 × 1 bias vector, b2, the S 2 × S 1 layer weight matrix, LW2, 1 (having a source 1, neurons' outputs of the first layer, and a destination 2, neurons of the second layer), and the transfer function, f 2 , to produce the S 2 x1 output vector, a2. This process is repeated for all the hidden layers that belong to the architecture of the ANN. The last layer, which produces the network output, is called the output layer; function-fitting neural networks often have just one output neuron, because most of the time the prediction needed regards only one target value associated with a specific mechanical or physical parameter. Thus, the sum of the scalar bias, b3, and the product, LW3, 2·a2, is elaborated by the transfer function, f 3 , typically of the linear type, to obtain the scalar network output, labelled as y. The concepts discussed above can also be applied to the other equations' variables.
where is the R length input vector and is the network scalar output associated with the considered mechanical parameter. All ANN models were elaborated on the basis of four numerical input variables, namely the bitumen content (% by weight of mix), filler to bitumen ratio (%), air voids (%), and voids in the mineral aggregates (%), and two categorical input variables, namely the bitumen type and aggregate type. Such input data have been considered as effective in properly representing the composition of HMA mixes, which were designed and produced with different aggregate (limestone or diabase) and bitumen (standard 50/70 or modified 25-55/75) types. Predictive models characterized by such types of input data allow modifications of the mixture composition to be numerically simulated, obtaining an estimation of the mechanical parameter under analysis without the necessity of performing further experimental tests, thus representing an extremely useful design tool during the optimization of the material.
The parameters' meaning of Equation (8) is explained in the following, while a graphical representation is reported in Figure 3. Vector and scalar variables are indicated in bold and italics type, respectively. Each layer is characterized by a connections' weight matrix, , adds a vector, , of neurons' bias, and returns an output vector, . To distinguish one layer from another, the number of the layer is added to the variable of interest; furthermore, a distinction between the weight matrix that is connected to inputs and ones that are connected between layers is made by associating letters, and , to , respectively. A pair of indexes is also assigned to each weight matrix to identify the destination (first index) and the origin (second index) of represented connections. Thus, in a threelayer feedforward network (Figure 3), the S neurons in the first hidden layer add an S 1 bias vector, , to the dot product of , (i.e., the S R weight matrix of connections between R inputs and S neurons) and the input vector, . The sum of the bias, , and the product, , • , is the so-called net input, , with the dimension, S 1. This sum is processed by the transfer function, , to get the neurons' S 1 output vector, , of the first hidden layer, which is also the input vector of the second hidden layer. This layer, made of S neurons, elaborates the incoming information by the S 1 bias vector, , the S S layer weight matrix, , (having a source 1, neurons' outputs of the first layer, and a destination 2, neurons of the second layer), and the transfer function, , to produce the S x1 output vector, . This process is repeated for all the hidden layers that belong to the architecture of the ANN. The last layer, which produces the network output, is called the output layer; function-fitting neural networks often have just one output neuron, because most of the time the prediction needed regards only one target value associated with a specific mechanical or physical parameter. Thus, the sum of the scalar bias, 3, and the product, , • , is elaborated by the transfer function, f , typically of the linear type, to obtain the scalar network output, labelled as . The concepts discussed above can also be applied to the other equations' variables.

Model Selection Procedure and Error Estimation
Quantities and functions illustrated in Section 2.3 represent the networks' hyperparameters that are set in the model selection procedure in order to ensure better efficiency of the training process. As mentioned, the number of hidden layers (N) and the number of neurons in each hidden layer (S) were modified to obtain 15 case studies for each selected transfer function (positive linear or hyperbolic tangent): N was changed from 1 to 3, S from 6 to 14, by increments of 2 neurons; such research design is reported in Table 1. In the literature, the use of a lower number of neurons and a greater number of hidden layers is recommended in order to increase the capacity of an ANN and avoid overfitting problems [72]. The hyperparameters' selection is based on the evaluation of the models' prediction capacity [83,89], which is obtained by means of statistical indicators. In fact, the aim of the model evaluation is to estimate the generalization error of the selected model, calculated on the basis of a defined test set. The term "generalization" means the ability of a model to return a good prediction on unseen data, i.e., that have not been used for training. The most used statistical indicators, especially in the "conventional" procedure, are the mean squared error (MSE) and the correlation coefficient (R). The first one (whose analytical expression is reported in the last side of Equation (4)) calculates the average squared difference between the experimental targets (t) and ANN-computed outputs (y); it is also employed as performance function (also called the "loss" function) to optimize the ANN training process [78]. In general, the lower the mean square error, the closer the ANN-calculated data are to the experimental ones (i.e., calculated values are dispersed closely to experimental ones). Instead, the correlation coefficient, R, measures the linear relationship between t and y, and it is expressed by the ratio between the covariance of the two considered variables (t and y) and the product of the respective standard deviations: where t and y are the mean value of the targets and of network outputs, respectively. In general, the closer the value of R is to the unity, the stronger the results of the linear relation between t and y, thus confirming that the training has been completed successfully (if R 1 for the training data set) and that the degree of generalization achieved can be considered optimal (if R 1 for the testing data set). The mean squared error and the correlation coefficient have already been used in previous performance analysis of some ANNs designed to predict the mechanical parameters of HMA mixtures [65,66,[72][73][74][75][76]90].
On the contrary, when a k-fold CV is performed, the ANN performance evaluation takes place through three statistical indicators [82][83][84][85][86], namely the CV mean squared error (MSE cv ), the MSEs' standard deviation (σ cv ), and the CV correlation coefficient (R cv ), which is the average of the test correlation coefficient (R i ) over k rounds. A fourth indicator is given by the mean difference (∆) between the training and test MSEs over k-folds, which allows the occurrence of an overfitting problem to be assessed. In fact, when a model over-adapts to the training data set, the difference between training and test errors increases [82,85]. Therefore, the hyperparameters that allow the lowest values of MSE cv (i.e., higher R cv ), σ cv , and ∆ to be obtained can be selected to identify the final model, which has to be trained on the entire data set available.

Standard Approach
Five ANNs with a different number of neurons were also developed for each mechanical parameter of interest using the MATLAB ® ANN toolbox; these networks were trained with 80% of the available data set and tested on the remaining 20%. This configuration was chosen to match the data subdivision performed by the five-fold CV. The sampling process was completely random and carried out just one time by the standard approach implemented in the MATLAB ® ANN toolbox. If such an initial subdivision of the data set, with the following training and testing of the model, gives an unsatisfactory performance, it is suggested, on the basis of the experience of the researcher, that the whole procedure is repeated with a new sampling of the data set; however there is no automatic procedure aimed at optimizing model performance. In order to evaluate the performance of the standard approach with respect to the sample variability, in this study, random sampling was repeated 10 times to obtain 10 random subdivisions of the available data set.
The k-fold cross validation involves partitioning the available data set into k non-overlapping groups of equal size that represent the partition's classes of the initial data set [78,81]. Therefore, given that the performance of a model is evaluated on k test sets (which by definition cannot contain common elements), the average of test errors across k trials represents a more reliable estimation of the model's predictive capacity [78,81]. Indeed, in the standard approach, the subdivision of the data set in the training and test subsets is randomly performed and by repeating this operation, the identified test sets can contain common elements, which leads to a noisy estimation of the predictive performance for small-size data sets [78,81]. Furthermore, the MATLAB ® ANN toolbox carries out the data sampling as a "black box" and consequently it does not allow the performed subdivision of the data set to be easily identified; thus, the variability of the training and test set is difficult to establish.
Therefore, this study of the standard-ANN's aimed to verify the wide variability of the models' predictive capacity that could be caused by data sample variability.

Standard ANN Structure
The designed standard-ANNs are characterized by one hidden layer with a hyperbolic tangent transfer function, and one output layer with a linear transfer function; this architecture is set as the default in the MATLAB ® ANN toolbox. The number of neurons in the hidden layer ranges from 6 to 14 by increments of 2 neurons; as a result, five different standard-ANNs were obtained for each mechanical parameter considered. The seven types of input data already mentioned above (Section 2.3.1) were considered. Moreover, all the hyperparameters fixed in the model selection procedure for k-fold analysis were set to the same values (Section 2.3). In the following, these ANNs are simply called "standard".

Materials and Methods
The type of HMA mixture analyzed in the present research was dense asphalt concrete (AC). HMAs were produced with limestone or diabase aggregates and two types of bituminous binder; conventional or modified bitumen. The HMAs were designed within the framework of six different projects conducted in Greece. The mixtures were characterized by various bituminous binder percentages and aggregate grading curves. The preparation of the HMAs was conducted in the laboratory in order to support the mix design procedure as well as the stiffness evaluation of the design mixture.

Aggregates
The aggregates used in the current study were either limestone or diabase aggregates. The limestone aggregates came from the same quarry, while the diabase aggregates came from three different quarries; the results of the aggregates' characterization, along with the test protocols adopted, are reported in Table 2.

Bitumen
As mentioned above, two types of bituminous binder were considered in the present research, a 50/70 conventional bitumen and a bitumen modified with styrene-butadiene-styrene polymers (SBS). The results of the fundamental tests carried out on the two bitumen types, along with the test protocols used, are presented in Table 3.

Hot Mix Asphalts (HMAs)
The hot mix asphalts used in the current study (dense asphalt concrete mixtures) had a maximum aggregate size equal to 20 mm (HMA20) in all cases. More specifically, the HMAs20 with 50/70 conventional bituminous binder, either with limestone or diabase aggregates, were coded as HMA20-5070-Lm or HMA20-5070-Db, respectively. Similarly, for the mixtures with SBS-modified bitumen, the following codes, HMA20-SBS-Lm and HMA20-SBS-Db, for limestone and diabase aggregates, respectively, were used. The specimens of all HMAs were compacted in the laboratory by means of an impact compactor (EN 12697-30). All the specimens were characterized by a diameter of 100 mm and an average thickness of 63.4 for the mixtures with the limestone aggregates and 63.7 mm for the mixtures with the diabase aggregates. In total, 69 specimens were prepared for the mixtures with limestone aggregates and 60 specimens were produced for the mixtures with diabase aggregates. Therefore, 129 specimens in total were used in the present research. The grading curves of the HMA20-5070-Lm and HMA20-SBS-Lm are reported in Figure 4. The grading curves of the HMA20-5070-Db and HMA20-SBS-Db are shown in Figure 5.
Tables S2-S5 report the specimens' volumetric characteristics (EN 12697-8), Marshall stability (MS), and Marshall quotient (MQ) values (EN 12697-34) for each mixture investigated. The Marshall quotient was determined for each specimen, as the ratio between Marshall stability and Marshall flow. Despite having already been discussed as partial representativity of the Marshall results with regard to the bituminous mixtures' response [91][92][93][94], such a test is currently used due to the huge experience gained all around the world.

Stiffness Modulus Evaluation
The stiffness modulus (ITSM) was measured, for all the specimens, according to EN 12697-26, Annex C (IT-CY), using the standard testing conditions: Temperature of 20 °C and a target deformation and rise time equal to 5 μm and 124 ms, respectively. In total, 129 specimens were evaluated for stiffness and the results are presented in Table S6 for HMAs with limestone and diabase aggregates, respectively.

Stiffness Modulus Evaluation
The stiffness modulus (ITSM) was measured, for all the specimens, according to EN 12697-26, Annex C (IT-CY), using the standard testing conditions: Temperature of 20 °C and a target deformation and rise time equal to 5 μm and 124 ms, respectively. In total, 129 specimens were evaluated for stiffness and the results are presented in Table S6 for HMAs with limestone and diabase aggregates, respectively.

Stiffness Modulus Evaluation
The stiffness modulus (ITSM) was measured, for all the specimens, according to EN 12697-26, Annex C (IT-CY), using the standard testing conditions: Temperature of 20 • C and a target deformation and rise time equal to 5 µm and 124 ms, respectively. In total, 129 specimens were evaluated for stiffness and the results are presented in Table S6 for HMAs with limestone and diabase aggregates, respectively.

Artificial Neural Networks Modelling Results
The experimental data set, used for the training and testing of ANNs, included 39 specimens for the HMA mixtures with 50/70 penetration bitumen and limestone aggregate, 30 for ones with modified bitumen (25-55/75 penetration) and limestone aggregate, and 30 for each of the mixtures with diabase aggregate and a different type of bitumen (i.e., 129 specimens overall). According to the relevant literature [78,81], for such a data set size, k-fold cross validation is recommended.
The data of all the mixtures were fitted together to obtain a unique predictive model for each considered mechanical parameter; such models can take into account the different composition of the HMAs in terms of the bitumen content and type, aggregate type, filler to bitumen ratio, and volumetric properties.

Cross-Validation Results
Regarding the k-fold CV, the value of k was set to 5 and consequently, by having 129 available experimental data about the physical and mechanical properties of the considered HMA mixtures, each training data set was randomly made up of 103 observations, whereas the test data set was made up of the remaining 26 observations.
The four indicators' values for each of the case studies considered in the model selection procedure (Table 1) are shown in Tables 4-6, one for each mechanical parameter considered; the models with the best prediction performance are indicated in bold type. Table 4. Estimates of stiffness modulus models' predictive performance by five-fold cross validation.   Table 4, which reports the ITSM models' results, shows that the hyperbolic tangent transfer function leads to higher values of MSE cv , σ cv , and ∆ as the number of hidden layers increases. Conversely, R cv decreases for an increasing number of hidden layers (for example, 0.878 to 0.827 for 10 neurons passing from one to three hidden layers), as would be expected. This trend is verified whatever the number of neurons. Accordingly, hyperbolic-tangent ANNs exhibit an overfitting problem when the number of hidden layers is greater than one.

Number of Neurons in Each Hidden
Instead, regarding the linear positive transfer function, a general trend cannot be observed for the considered indicators, with respect to the number of hidden layers as well as the neurons. Nevertheless, in the case of a single hidden layer, overall, the ANNs performance is poor compared to the cases of two or three hidden layers; it means that, with one hidden layer, the ANNs models show an underfitting problem (i.e., higher MSE cv and smaller ∆). However, the optimal network structure should not be identified by analyzing each indicator individually. Instead, it should be selected by an overall assessment of the four indicators. Indeed, the model with 2 hidden layers and 12 neurons could potentially be considered the best one by focusing the attention only on the R or MSE values  881 and 158,193, respectively). Based on such a consideration, the ITSM's best prediction capacity is achieved by a three hidden layers feedforward network, which makes use of the positive linear transfer function and eight neurons in each hidden layer ( Figure 6).   Similar considerations can be outlined for MS models, according to the results reported in Table 5. Therefore, the MS final model is a three hidden layers network characterized by a positive linear transfer function and eight neurons in each hidden layer ( Figure 6). Table 6 shows that the best results for the MQ parameter are obtained by one hidden layer networks, particularly by means of the positive linear transfer function and 12 neurons (Figure 7). However, less marked differences can be observed between the performance indicators of the MQ models, with respect to the transfer function type, being equal to the number of neurons and hidden layers.   In summary, for the MQ parameter, a multiple hidden layers structure is not needed; for the MS parameter, a slight performance improvement can be observed when using three hidden layers, while a substantial predictive capacity improvement can be appreciated by means of a more complex neural network structure for the stiffness modulus. Among the mechanical parameters considered, the stiffness modulus represents the most important one for a rational performance-based characterization of the materials [1,66,72,91]. Therefore, the use of multiple hidden layers architectures and the positive linear transfer function is justified for modeling stiffness laboratory data with regard to the HMAs considered.
In the Supplementary Materials, the weights and bias values (Tables S8-S16), as well as normalization factors, are reported for each of the ANN models proposed to give the possibility of other researchers reproducing the ANNs topologies developed in this study. Using Equations (7) and (9) along with their associated weights bias values, as well as normalization factors, it will be possible for other researchers to determine predictions of stiffness modulus and Marshall parameters, whatever the composition of the mixture, within the range of composition parameter values considered, without any other additional effort with respect to the standard ANN approach.
The highest accuracy (R cv = 0.881) was obtained from the prediction of the ITSM, while R cv values greater than 0.830 were achieved for the MS (R cv = 0.834) and MQ (R cv = 0.859). These results can be considered satisfactory from an engineering point of view, taking into account the relatively low number of available specimens and the different characteristics of the HMAs considered.
The results of training phases of the ANN final models are summarized in        Comparisons between the experimental targets and ANN-calculated outputs for each mechanical parameter considered are reported in Table S7. The difference between the predicted and experimental data, expressed as a percentage value of the experimental data, is also reported (PD, percentage difference value). Both over-estimations and under-estimations can be observed for all the mechanical parameters considered. For the stiffness modulus, the PD absolute value is within the range 0.01%-20.49%, with an average value equal to 3.44%, while the average values for the MS and MQ are equal to 4.14% and 9.07%, respectively. Therefore, it can be concluded that cross validation is a useful approach to accurately evaluate a model's prediction capacity and, consequently, to identify the network's hyperparameters that lead to the highest possible accuracy for the predictive modeling problem of HMAs' mechanical parameters.

Standard Approach Results
Concerning the standard ANNs, 103 data (80% of the data set) were used for the training of networks and the remaining 26 (20%) for the test. Tables 7-9 show the "best" and the "worst" (over the 10 identified random partitions of the data set) prediction performance of the standard ANNs with a different number of neurons in the hidden layer on the three mechanical parameters considered, namely the stiffness modulus, Marshall stability, and quotient. The standard deviation of the test MSE, included in Tables 7-9, represents a measure of the variance of the performance scores over the 10 groups. Table 7. Effect of data sample variability on the stiffness modulus model's performance. Layer   6  8  10  12  14   Test  Train  Test  Train  Test  Train  Test  Train  Test  Train   Best  MSE  60,863    As can be observed, on the basis of the standard approach, the network performance can vary significantly depending on the test set considered, whatever the number of neurons in the hidden layer; for example, in terms of the stiffness modulus, with regards to 10 neurons, test MSE values are within the range 88, 046-550, 323, with a standard deviation equal to 138, 381 (Table 7). On the basis of the 10 different data samplings carried out, for each considered number of neurons, models characterized by the "worst" performance showed large differences between training and test MSE values; for example, with regards to the stiffness modulus, the MSE values for testing are an order of magnitude higher than the ones for training (550, 323 vs. 47, 722 for 10 neurons, Table 7). These results demonstrate that standard ANN models, on the basis of the "worst" random data partition, do not have the capacity to adequately generalize independently by the number of neurons. Therefore, in such cases, it is absolutely necessary to repeat the standard procedure, namely, random data sampling along with training and test phases, following a trial-and-error philosophy. In this way, it is possible to obtain good predictions (i.e., "best" cases), but these results depend on the data sets used for training and testing the model. Such data sets are randomly determined on the basis of a trial-and-error procedure that requires repetitions of the random partition; this means that a favorable data partition has to be looked for on the basis of a random repetition of the procedure, without any rational line guide. Hence, the standard approach, based on a few data sampling repetitions (up to 10 in this study), cannot be considered as the best suited method for ANN model optimization because the best results can be obtained only for comparable training and test sets variability.

Conclusions
In this study, an extensive ANN modelling activity was developed, with regards to the results of laboratory tests on HMA mixtures for road pavements. The experimental data set included 129 specimens, prepared with different aggregate and bitumen types. The main innovative feature of the research was the ANN model selection procedure, based on k-fold cross validation. In fact, to overcome the limitations of the standard approach of splitting the available data set into subsets with specific functions (i.e., for training and testing a model), a five-fold cross validation was used to compare the performance of 30 different network architectures and select the one that allows the best prediction accuracy of ITSM, MQ, and MS to be achieved, on the basis of four indicators. Five standard ANNs were also developed for each mechanical parameter of interest to demonstrate the wide variability of the models' predictive capacity that can be observed as a consequence of data sample variability.
The main results can be summarized as follows: 1.
The standard approach has proved to be unreliable as the accuracy obtained for one test set can change considerably for a different test set. Hence, basic use of the MATLAB ® ANN toolbox cannot ensure an identification of the ANN model that best fits the experimental data, especially in the case of relatively small data sets.

2.
The five-fold cross validation allowed the predictive capacity of the mechanical parameters' models to be objectively estimated. In particular, the three-hidden layers ANN with eight neurons in each hidden layer and a positive linear transfer function was found to be the best performing model for predicting ITSM and MS. The development of a multiple hidden layers structure represents the second main innovative feature of the research. The MQ final model is a one-hidden layer network with a positive linear transfer function and 12 neurons. In summary, it was verified that the development of a multiple hidden layers structure could be useful, depending on the mechanical parameter investigated.

3.
The use of the hyperbolic tangent transfer function, for multiple layer network architectures, has led to models that are characterized by overfitting problems.

4.
It was verified that the positive linear transfer function is more effective with respect to the common hyperbolic tangent function; this is the third main innovative feature of the research, regarding the applications of ANNs in the pavement engineering field. 5.
Analytical expressions of the proposed ANNs models, along with their associated weight values and normalization factors, were specified in detail to allow other researchers and engineers to compute a prediction of the parameters analyzed, modifying at their choice the composition of the mixture, within the range of variability of hot mix asphalt components considered in the study. 6.
The good results achieved in the training and CV phases demonstrates that ANNs are an efficient data analysis system that is useful for the development of HMA's prediction models, as they were able to satisfactorily identify the complex intrinsic relation between the mixtures' properties and the analyzed mechanical parameters. 7.
The ANNs models represent a typical "black box" method, which is not physically based; such a major drawback is compensated for by the possibility of obtaining satisfactory predictions of the mechanical parameters considered in a relatively easy way, within the ambit of materials and parameters considered. 8.
The use of predictive models, elaborated by means of artificial neural networks, can be really useful during the design phase of hot mix asphalts because it is possible to obtain very quickly an accurate estimation of the analyzed mechanical parameters as a consequence of different input values regarding the mixture's composition, so avoiding any other experimental tests. 9.
The reliability and effectiveness of the outlined ANN approach deserve further assessment by increasing the number of specimens, the variability of the mixtures' properties, and also considering different mechanical parameters, for instance, those associated to fatigue and permanent deformation resistance.
Supplementary Materials: The following are available online at http://www.mdpi.com/2076-3417/9/17/3502/s1, Table S1: Normalization parameters for the input and target variables, Table S2: Characteristics of HMA20-50/70-Lm specimens, Table S3: Characteristics of HMA20-SBS-Lm specimens, Table S4: Characteristics of HMA20-50/70-Db specimens, Table S5: Characteristics of HMA20-SBS-Db specimens, Table S6: Stiffness tests results for HMAs with limestone and diabase aggregates, Table S7: Comparison between experimental and ANN predicted data, Table S8: Component of the input weight matrix IW1,1 of the Stiffness Modulus final model, Table S9: Component of layers' weight matrices LW2,1, LW3,2, LW4,3 of the Stiffness Modulus final model, Table S10: Component of the hidden layers' bias vectors b1, b2, b3 and the output layer's scalar bias b4 of the Stiffness Modulus final model, Table S11: Component of the input weight matrix IW1,1 of the Marshall Stability final model, Table S12: Component of layers' weight matrices LW2,1, LW3,2, LW4,3 of the Marshall Stability final model, Table S13: Component of the hidden layers' bias vectors b1, b2, b3 and the output layer's scalar bias b4 of the Marshall Stability final model, Table S14: Component of the input weight matrix IW1,1 of the Marshall Quotient final model, Table S16: Component of the hidden layer's bias vectors b1 and the output layer's scalar bias b2 of the Marshall Quotient final model.