Porosity Prediction of Granular Materials through Discrete Element Method and Back Propagation Neural Network Algorithm

: Granular materials are used directly or as the primary ingredients of the mixtures in industrial manufacturing, agricultural production and civil engineering. It has been a challenging task to compute the porosity of a granular material which contains a wide range of particle sizes or shapes. Against this background, this paper presents a newly developed method for the porosity prediction of granular materials through Discrete Element Modeling (DEM) and the Back Propagation Neural Network algorithm (BPNN). In DEM, ball elements were used to simulate particles in granular materials. According to the Chinese speciﬁcations, a total of 400 specimens in di ﬀ erent gradations were built and compacted under the static pressure of 600 kPa. The porosity values of those specimens were recorded and applied to train the BPNN model. The primary parameters of the BPNN model were recommended for predicting the porosity of a granular material. Veriﬁcation was performed by a self-designed experimental test and it was found that the prediction accuracy could reach 98%. Meanwhile, considering the inﬂuence of particle shape, a shape reduction factor was proposed to achieve the porosity reduction from sphere to real particle shape.


Introduction
Granular mixture is widely applied in industrial manufacturing, agricultural production and civil engineering, including powder metallurgy, food accumulation and building materials. For these mixtures, porosity is one of the most crucial factors that has a considerable impact on mixture structure and performance. However, due to the complex particle composition, it is hard to calculate or predict the mixture porosity without the help of lab experiments.
In order to solve the above problems, many researchers have proposed different methods to calculate and analyze the porosity of granular mixtures. In particuology, packing density calculation models based on the packing theory were derived, including the Furnas model [1], linear packing density model [2], compressible packing model [3] and other improved models [4,5]. All of them are analytical models which were used to analyze particle spatial distributions and mixture packing densities. However, some literature [6,7] confirmed that these models were not accurate enough for complex problems and where particles have a wide range of sizes. A typical mixture in pavement engineering, however, is mainly composed of mineral aggregates with a wide range of particles sizes. The fine aggregates are less than 2.36 mm while the large particles can be larger than 19.0 mm. article the aggregate gradations were selected as the major control parameters, while the particle shapes were considered as minor parameters by introducing a reduction factor.

Determination of Possible Gradation Scope
In order to contain all possible proportions of coarse aggregates from 2.36 mm to maximum particle size, the recommended range of each particle size group was summarized on the basis of Chinese Construction Specification JTG F40-2004 [31], including various common gradation types: asphalt concrete (AC), asphalt macadam (AM), open graded friction course (OGFC) and stone mastic asphalt (SMA). In this way, the maximum possible gradation scopes of different maximum nominal particle sizes are obtained finally, as shown in Tables 1 and 2. It could be observed that the mineral mixtures had a large particle size polydispersity, but the particle size ratios in each sieving size group in pavement materials were relatively small. However, the particle size distribution of the mixture can be adjusted by changing the proportion of sieving groups.

Establishment of Coarse Aggregate Proportion Database
Machine learning algorithms require large sample size to ensure the stability and accuracy or prediction results. In order to obtain enough test data for training for both Gradation-13 and Gradation-16, 200 coarse aggregates' proportions were randomly created by VBA within the upper and lower limit listed in Tables 1 and 2. As shown in Figures 1 and 2, 200 combinations cover almost all possible gradations between upper and lower limit for one gradation type.  In the above two figures, the upper and lower rough lines represent the upper and lower bounds of the gradation. Meanwhile, each fine line in the middle of upper and lower limits represents a particle combination. Although particle compositions cannot be listed completely, figures justify that the samples used for model training are representative.

Virtual Compacted Model Design
In this study, Particle Flow Code (PFC) was applied to carry out the virtual compaction test. Static compaction, as a commonly used compaction method in road material's DEM modelling, was chosen, considering both compacting effect and computing efficiency. In addition, in order to  In the above two figures, the upper and lower rough lines represent the upper and lower bounds of the gradation. Meanwhile, each fine line in the middle of upper and lower limits represents a particle combination. Although particle compositions cannot be listed completely, figures justify that the samples used for model training are representative.

Virtual Compacted Model Design
In this study, Particle Flow Code (PFC) was applied to carry out the virtual compaction test. Static compaction, as a commonly used compaction method in road material's DEM modelling, was chosen, considering both compacting effect and computing efficiency. In addition, in order to In the above two figures, the upper and lower rough lines represent the upper and lower bounds of the gradation. Meanwhile, each fine line in the middle of upper and lower limits represents a particle combination. Although particle compositions cannot be listed completely, figures justify that the samples used for model training are representative.

Virtual Compacted Model Design
In this study, Particle Flow Code (PFC) was applied to carry out the virtual compaction test. Static compaction, as a commonly used compaction method in road material's DEM modelling, was chosen, considering both compacting effect and computing efficiency. In addition, in order to determine a suitable static compaction pressure, the influence of static pressure applied for the compaction was tested in detail. It should be noted that all particles are ball shaped in order to control a single variable in this stage. Three gradations were selected randomly and their passing rates are shown in Table 3. Since all the particle combinations were a coarse part of an entire gradation, the total mass of different mixtures may also be different. All elements were generated in accordance with passing rate and total volume of design specimen. The boundary condition for the container is simulated by a wall, which is similar to its counterpart steel container in the laboratory test. The compaction pressure only applied on top and bottom wall. The height of design cylindrical specimen is 100 mm and the diameter of the container is also 100 mm.
For each specimen, the virtual test mainly follows three steps: (1) The ball elements were generated without overlap in a random position in a cylindrical area; (2) Seven levels of hydrostatic pressure were applied to compact seven specimens, including 100 kPa, 200 kPa, 300 kPa, 400 kPa, 500 kPa, 600 kPa and 700 kPa; (3) Compression was performed until all elements become stable as shown in Figure 3. At that time, mixtures are considered to be in the most compact state.
Appl. Sci. 2020, 10, x FOR PEER REVIEW  5 of 19 determine a suitable static compaction pressure, the influence of static pressure applied for the compaction was tested in detail. It should be noted that all particles are ball shaped in order to control a single variable in this stage. Three gradations were selected randomly and their passing rates are shown in Table 3. Since all the particle combinations were a coarse part of an entire gradation, the total mass of different mixtures may also be different. All elements were generated in accordance with passing rate and total volume of design specimen. The boundary condition for the container is simulated by a wall, which is similar to its counterpart steel container in the laboratory test. The compaction pressure only applied on top and bottom wall. The height of design cylindrical specimen is 100 mm and the diameter of the container is also 100 mm.
For each specimen, the virtual test mainly follows three steps: (1) The ball elements were generated without overlap in a random position in a cylindrical area; (2) Seven levels of hydrostatic pressure were applied to compact seven specimens, including 100 kPa, 200 kPa, 300 kPa, 400 kPa, 500 kPa, 600 kPa and 700 kPa; (3) Compression was performed until all elements become stable as shown in Figure 3. At that time, mixtures are considered to be in the most compact state. Because there are only coarse aggregates in this model, linear contact model was applied to all particles. The density of aggregates was set as 2650 g/cm 3 and effective modulus and friction coefficient were 50 GPa and 0.4, respectively. The results are shown in Table 4. Because there are only coarse aggregates in this model, linear contact model was applied to all particles. The density of aggregates was set as 2650 g/cm 3 and effective modulus and friction coefficient were 50 GPa and 0.4, respectively. The results are shown in Table 4. It can be observed that compaction pressure caused little effect on final porosity for same gradation mixture. Taking Gradation#1 as an example, the range of mixture porosity is from 0.328 to 0.334, which indicates the mixture with sphere particles can be well compacted at all above pressures. Meanwhile, a compaction pressure of 600 kPa is commonly used in laboratory tests in many countries' specification, for instance, Superior Performing Asphalt Pavement (SUPERPAVE) specification. Therefore, a fixed compaction pressure of 600 kPa was chosen to carry out virtual compaction tests in this study. The particle number in each specimen varies from about 1000 to 17,000 for different gradations, and computing time changes from 10 min to nearly 2 h.

Data Recording
Through above virtual compaction models, the porosity of coarse aggregates (larger than 2.36 mm) mixtures were measured and recorded. In order to eliminate the impact of contingency, three gradations were randomly selected and five parallel virtual experiments were conducted. In each group of parallel virtual experiments, the particle size distributions were identical, but the initial positions of ball elements were different. The detailed results are shown in Table 5. According to the results, it can be observed, that for the same gradation, the porosity measured by different parallel specimens is very close. However, there are great differences among different gradations, which shows that the results of the virtual compaction test were credible. Therefore, the porosity of Gradation-13 and Gradation-16 specimens were recorded respectively in the same way. The following Tables 6 and 7 show the final results of two types of gradation. Based on above data, the internal relationship between independent variables (retained percentage of each particle size group) and dependent variable (porosity) will be further analyzed in the next chapter.

Data analysis and BP Neural Network Model Building
MATLAB software included plenty of useful functions of neural network algorithm in the Neural Network Toolbox as shown in the literature [32]. These functions provide the possibility to build a basic model for those not professional at machine learning. Therefore, the BPNN models in this article were built with "newff" function in Neural Network Toolbox of MATLAB, which is easy for reference.
In general, the BP neural network model includes an input layer, an output layer and multiple hidden layers. The previous research shows that the three layer BP neural network with one hidden layer can approximate any non-linear function [32]. Hence, the model in this article was established with one hidden layer and the model structure is shown in Figure 4. Appl. Sci. 2020, 10, x FOR PEER REVIEW 8 of 19 In the BP model, there are three kinds of key parameters: (1) input and output variables; (2) transfer function; (3) the number of hidden layer nodes and their corresponding weight thresholds. Through the above three parameters, a complete BP neural network prediction model can be obtained and calculated through Equation (1).
where (1) represents model input variables (particle volume fraction of different particle size groups); T represents model output result; k represents the number of input variables in the range of 1 to m; m represents the total number of input variables, four for Gradation-13 and five for Gradation-16; (2) f(x) refers to transfer function; (3) i represents the number of hidden layer nodes in the range of 1 to n; n refers to the total number of hidden layer nodes; represents weight from input layer to hidden layer; refers to weight from hidden layer to output layer; represents threshold from input layer to hidden layer; η represents threshold from hidden layer to output layer.
In addition, in order to evaluate model training accuracy, Mean Square Error (MSE) and coefficient of determination ( 2 ) of the output results are applied and MSE can be calculated according to Equation (2).
where represents the test value and represents the real value. Meanwhile, n refers to the test set sample number.
In "newff" function, there are not too many parameters to be determined. Hence, all of them are listed in the following Table 8.  In the BP model, there are three kinds of key parameters: (1) input and output variables; (2) transfer function; (3) the number of hidden layer nodes and their corresponding weight thresholds. Through the above three parameters, a complete BP neural network prediction model can be obtained and calculated through Equation (1).
where (1) X k represents model input variables (particle volume fraction of different particle size groups); T represents model output result; k represents the number of input variables in the range of 1 to m; m represents the total number of input variables, four for Gradation-13 and five for Gradation-16; (2) f(x) refers to transfer function; (3) i represents the number of hidden layer nodes in the range of 1 to n; n refers to the total number of hidden layer nodes; W ki represents weight from input layer to hidden layer; V i refers to weight from hidden layer to output layer; ε i represents threshold from input layer to hidden layer; η represents threshold from hidden layer to output layer. In addition, in order to evaluate model training accuracy, Mean Square Error (MSE) and coefficient of determination (R 2 ) of the output results are applied and MSE can be calculated according to Equation (2).
where T i represents the test value and t i represents the real value. Meanwhile, n refers to the test set sample number.
In "newff" function, there are not too many parameters to be determined. Hence, all of them are listed in the following Table 8. Because of the high relativity between gradation and mixture porosity, the prediction accuracy was great enough when the default functions were applied. In order to simplify the procedures of building the prediction model, the default transfer functions were finally determined. It should be noted that the determination of other key parameters is illustrated in the following Chapter 3.1, Input and output variables processing.

Input and Output Variables Processing
For the model in this article, the input layer contains four variables (i.e., particle volume fraction of 13.2-16.0 mm, 9.5-13.2 mm, 4.75-9.5 mm and 2.36-4.75 mm) for Gradation-13 and five variables for Gradation-16, and output layer includes the only dependent variable, porosity, for both gradation types.
The first and foremost step in BP neural network modeling is to process the original input data. Because the recorded data are relatively close to each other in the whole range of 0.30-0.45, it is necessary to standardize the original data and transfer variables into the range of [-1, 1] according to Equation (3). Meanwhile, the original gradation should also be processed through Equation (3). This procedure can achieve the best performance for modeling.
where y max and y min represent 1 and −1 respectively in this article, while x max and x min refer to the maximum and minimum value of recorded porosity and x 0 refers to original porosity, while y 0 refers to standardized porosity. Through the above equation, the procedure for standardizing the input variables includes these three parts: (1) finding maximum and minimum values in the recorded porosity as x max and x min ; (2) calculating each standardized porosity (y 0 ) value through Equation (3); (3) recording standardized porosity (y 0 ) values. The next step is to divide the available data set into two groups, one for training and another for testing the developed model. Meanwhile, the testing set will not be used in the model training process. It should be noted that grouping is done randomly. In this study, 200 particle compositions and corresponding porosity for both Gradation-13 and Gradation-16 were randomly divided into training and testing groups.
Finally, the output results still need to be anti-standardized in a similar way by following Equation (4): where, x max and x min refer to the maximum and minimum value of recorded porosity and x 1 refers to prediction porosity, while y 1 refers to standardized prediction porosity. The standardized and anti-standardized can be completed automatically by function "mapminmax" in MATLAB only if the original data is inputted.

Transfer Function Determination
There are three transfer functions commonly used in the BP neural network model and these functions are applied in the hidden and output layer. The default transfer functions in "newff" function include a hyperbolic tangent function("tansig") in hidden layer as shown in Equation (5) and a pure linear function("purelin") in the output layer in Equation (6).
purelin(n) = n The final prediction formula can be obtained by training the model to determine the appropriate parameters. Among these parameters, the number of hidden layer nodes should be tested by user. The other corresponding parameters could be determined automatically.

Hidden Layer Node Number Determination
Generally, the model accuracy will be improved by increasing the training sample size. However, when the sample size is large enough, even more data cannot improve the model performance. For the purpose of studying the relationship between original sample size and training model performance, different training sample sizes were tested and compared. The mean square error (MSE) and coefficient of determination (R 2 ) between prediction value and virtual test value were calculated to evaluate the accuracy and stability of final prediction model. Seven models for Gradation-13 were built in this article with sample sizes of 25, 50, 75, 100, 125, 150 and 175 respectively. For all seven models, 25 samples were applied as the testing group to check the model performance by calculating R 2 .
For the BP model, except for training sample size, the hidden layer node number also plays a significant role in model training. According to empirical Equation (7), the range of node number is set as 3 to 13.
where, m and n refer to the number of input and output layer nodes respectively and a refers to a regulating constant in the range of 1 to 10. For each group, 10 parallel tests were carried out to determine the optimum node number. The detailed procedure is omitted in this article since it was introduced in another publication [32]. The following Figure 5 shows that coefficient of determination (R 2 ) changes with node number.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 10 of 19 The final prediction formula can be obtained by training the model to determine the appropriate parameters. Among these parameters, the number of hidden layer nodes should be tested by user. The other corresponding parameters could be determined automatically.

Hidden Layer Node Number Determination
Generally, the model accuracy will be improved by increasing the training sample size. However, when the sample size is large enough, even more data cannot improve the model performance. For the purpose of studying the relationship between original sample size and training model performance, different training sample sizes were tested and compared. The mean square error (MSE) and coefficient of determination (R 2 ) between prediction value and virtual test value were calculated to evaluate the accuracy and stability of final prediction model. Seven models for Gradation-13 were built in this article with sample sizes of 25, 50, 75, 100, 125, 150 and 175 respectively. For all seven models, 25 samples were applied as the testing group to check the model performance by calculating R 2 .
For the BP model, except for training sample size, the hidden layer node number also plays a significant role in model training. According to empirical Equation (7), the range of node number is set as 3 to 13.
where, m and n refer to the number of input and output layer nodes respectively and a refers to a regulating constant in the range of 1 to 10. For each group, 10 parallel tests were carried out to determine the optimum node number. The detailed procedure is omitted in this article since it was introduced in another publication [32]. The following Figure 5 shows that coefficient of determination (R 2 ) changes with node number. It can be observed that: (1) the optimum node number can be different for different models; (2) with training sample size increasing, model performance is less sensitive to node number. The optimum hidden layer node number for all seven groups are shown in the following Table 9. It can be observed that: (1) the optimum node number can be different for different models; (2) with training sample size increasing, model performance is less sensitive to node number. The optimum hidden layer node number for all seven groups are shown in the following Table 9. Meanwhile, the following Figure 6 shows the R 2 for different models based on their optimum node number. The error bars here represent standard deviation.  Meanwhile, the following Figure 6 shows the R 2 for different models based on their optimum node number. The error bars here represent standard deviation. From this figure, it is clear that with the increase of training group sizes, R 2 rises sharply in the initial stage. After the number of training groups goes up to more than 75, the growth rate of R 2 becomes very small. Meanwhile, for model accuracy, when the amount of training data is too small, the model error is very large. With the increase of training data size, the model tends to be more stable. In this state, a more accurate prediction model can be obtained.
Based on the above analysis, the number of training sets was recommended as 75 or 100, considering training accuracy and efficiency. Meanwhile, the best number of testing sets is about 1/2-1/3 of the training set. From this figure, it is clear that with the increase of training group sizes, R 2 rises sharply in the initial stage. After the number of training groups goes up to more than 75, the growth rate of R 2 becomes very small. Meanwhile, for model accuracy, when the amount of training data is too small, the model error is very large. With the increase of training data size, the model tends to be more stable. In this state, a more accurate prediction model can be obtained.

Testing Results
Based on the above analysis, the number of training sets was recommended as 75 or 100, considering training accuracy and efficiency. Meanwhile, the best number of testing sets is about 1/2-1/3 of the training set.

Testing Results
On the basis of the existing database, 150 groups were used as the training group and 50 groups were applied as the testing group to build the BP neural network model. The training results are shown in Figure 7 for Gradation-13 and Figure 8 for Gradation- 16. In the process of data transmission, through repeated training, four hidden layer nodes for Gradation-13 and five nodes for Gradation-16 were finally determined as the most suitable number of hidden layer nodes.
On the basis of the existing database, 150 groups were used as the training group and 50 groups were applied as the testing group to build the BP neural network model. The training results are shown in Figure 7 for Gradation-13 and Figure 8 for Gradation- 16. In the process of data transmission, through repeated training, four hidden layer nodes for Gradation-13 and five nodes for Gradation-16 were finally determined as the most suitable number of hidden layer nodes.  For two models, their coefficients of determination both achieve 99%, which means porosity prediction models have enough accuracy to provide reliable results.
In addition, all the related parameters applied in model training are listed in the following tables. Tables 10 and 11 provide the gradation and porosity normalization parameters  (  ,  ,  , ) for Gradation-13 and Gradation-16. Through these parameters, the original were applied as the testing group to build the BP neural network model. The training results are shown in Figure 7 for Gradation-13 and Figure 8 for Gradation-16.
In the process of data transmission, through repeated training, four hidden layer nodes for Gradation-13 and five nodes for Gradation-16 were finally determined as the most suitable number of hidden layer nodes.  For two models, their coefficients of determination both achieve 99%, which means porosity prediction models have enough accuracy to provide reliable results.
In addition, all the related parameters applied in model training are listed in the following tables. Tables 10 and 11 provide the gradation and porosity normalization parameters  (  ,  ,  , ) for Gradation-13 and Gradation-16. Through these parameters, the original For two models, their coefficients of determination both achieve 99%, which means porosity prediction models have enough accuracy to provide reliable results.
In addition, all the related parameters applied in model training are listed in the following tables. Tables 10 and 11 provide the gradation and porosity normalization parameters (x max , x min , y max , y min ) for Gradation-13 and Gradation-16. Through these parameters, the original gradation and porosity data could be normalized according to Equation (3). Then, the weights and thresholds used to transfer data from the input layer to the hidden layer are shown in Table 12, while those parameters from the hidden layer to the output layer are shown in Table 13.   The complete prediction calculation process is listed as follows: Step1: Original gradation and porosity data are recorded and inputted to MATLAB program.
Step2: All the original data are normalized according to Equation (3) and parameters in Tables 10  and 11.
Step3: The values in hidden layer are calculated according to Equations (8).
Through the above steps, the porosity of a certain kind of particulate mixture can be accurately predicted.

Verification and Application of Prediction Method
For the purpose of evaluating model validity and accuracy, a static compression test was also completed in laboratory. The steel balls were used to compare with the virtual ball element model. Meanwhile, the relationship among the measured porosity, the simulated porosity and the predicted value was compared. In order to simulate the real composition of granular particles applied in highway engineering, a great number of steel balls in different sizes were prepared and mixed in the same proportion as uniformly as possible, then sieved to several different size groups. The original particle size distribution and sieved results of the steel balls are shown in Table 14 and Figure 9.

Verification and Application of Prediction Method
For the purpose of evaluating model validity and accuracy, a static compression test was also completed in laboratory. The steel balls were used to compare with the virtual ball element model. Meanwhile, the relationship among the measured porosity, the simulated porosity and the predicted value was compared. In order to simulate the real composition of granular particles applied in highway engineering, a great number of steel balls in different sizes were prepared and mixed in the same proportion as uniformly as possible, then sieved to several different size groups. The original particle size distribution and sieved results of the steel balls are shown in Table 14 and Figure 9.  According to Table 14 and Figure 9, every sieving group was mixed with several different particle size groups uniformly. For 9.5-13.2 mm group, 13.0 mm particles were not found, so 12.303 mm balls were added to keep a continuous and real size distribution in the steel balls.
Then, in order to make the results of laboratory and DEM comparable, the size of steel ball experiments' container was exactly the same as the DEM model. According to different gradations, the steel balls were mixed in a standard Marshall mold, whose diameter was 100 mm as shown in Figure 10. Because there was no filler or binder in this mixture, universal testing machine were applied to compact the mixed steel balls with a static pressure of 600 kPa. Meanwhile, particles' total According to Table 14 and Figure 9, every sieving group was mixed with several different particle size groups uniformly. For 9.5-13.2 mm group, 13.0 mm particles were not found, so 12.303 mm balls were added to keep a continuous and real size distribution in the steel balls.
Then, in order to make the results of laboratory and DEM comparable, the size of steel ball experiments' container was exactly the same as the DEM model. According to different gradations, the steel balls were mixed in a standard Marshall mold, whose diameter was 100 mm as shown in Figure 10. Because there was no filler or binder in this mixture, universal testing machine were applied to compact the mixed steel balls with a static pressure of 600 kPa. Meanwhile, particles' total volume was measured by the displacement method. On the other hand, virtual compression tests on computer and porosity prediction on BP neural network model were completed at the same time. The detailed test data is shown in the following Figure 11.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 15 of 19 volume was measured by the displacement method. On the other hand, virtual compression tests on computer and porosity prediction on BP neural network model were completed at the same time. The detailed test data is shown in the following Figure 11.  As can be seen from the above figure, model prediction accuracy for the steel ball test was very high and model error was extremely slight. The maximum relative error of prediction was no more than 1.15%. This proved the credibility of the prediction model. Meanwhile, the results of lab tests and virtual tests were very close, which also proved that for the sphere element-based model, sufficient credible raw data could be obtained by DEM.

Reduction Factor Analysis Considering Particle Shape Effect on Mixture Porosity
From the above illustration, a porosity prediction model for sphere element has been established. However, as mentioned in the Introduction, paving materials in reality generally have complex but similar particle shape combinations, which indicate that shape effect is non-negligible but consistent. Therefore, a reduction factor to consider particle shape effect was introduced in this chapter.
A gradation was selected randomly for comparing the mixture porosity between sphere elements model and real aggregate shape model. The volume of each particle size group was: Appl. Sci. 2020, 10, x FOR PEER REVIEW 15 of 19 volume was measured by the displacement method. On the other hand, virtual compression tests on computer and porosity prediction on BP neural network model were completed at the same time. The detailed test data is shown in the following Figure 11.  As can be seen from the above figure, model prediction accuracy for the steel ball test was very high and model error was extremely slight. The maximum relative error of prediction was no more than 1.15%. This proved the credibility of the prediction model. Meanwhile, the results of lab tests and virtual tests were very close, which also proved that for the sphere element-based model, sufficient credible raw data could be obtained by DEM.

Reduction Factor Analysis Considering Particle Shape Effect on Mixture Porosity
From the above illustration, a porosity prediction model for sphere element has been established. However, as mentioned in the Introduction, paving materials in reality generally have complex but similar particle shape combinations, which indicate that shape effect is non-negligible but consistent. Therefore, a reduction factor to consider particle shape effect was introduced in this chapter.
A gradation was selected randomly for comparing the mixture porosity between sphere elements model and real aggregate shape model. The volume of each particle size group was: As can be seen from the above figure, model prediction accuracy for the steel ball test was very high and model error was extremely slight. The maximum relative error of prediction was no more than 1.15%. This proved the credibility of the prediction model. Meanwhile, the results of lab tests and virtual tests were very close, which also proved that for the sphere element-based model, sufficient credible raw data could be obtained by DEM.

Reduction Factor Analysis Considering Particle Shape Effect on Mixture Porosity
From the above illustration, a porosity prediction model for sphere element has been established. However, as mentioned in the Introduction, paving materials in reality generally have complex but similar particle shape combinations, which indicate that shape effect is non-negligible but consistent. Therefore, a reduction factor to consider particle shape effect was introduced in this chapter.
A gradation was selected randomly for comparing the mixture porosity between sphere elements model and real aggregate shape model. The volume of each particle size group was: Volume 13.2−16.0 mm : Volume 9.5−13.2 mm : Volume 4.75−9.5 mm : Volume 2.36−4.75 mm = 1 : 5 : 4 : 1.
A total number of 15 DEM models with different real aggregated shapes were built based on previous researches [33][34][35][36][37]. It should be noted that there is only one particle shape in each specimen to amplify its effect, as shown in Figure 12. A total number of 15 DEM models with different real aggregated shapes were built based on previous researches [33][34][35][36][37]. It should be noted that there is only one particle shape in each specimen to amplify its effect, as shown in Figure 12. All the aggregate shapes used in this chapter were quite common. Although there are some special shapes in civil engineering, for instance, elongated particle, their volume percentages in mixture were generally low and strictly controlled, due to their poor engineering properties. The model parameters were identical with sphere element-based models. In this way, these mixtures' porosities were recorded and plotted on the following Figure 13. From Figure 13, the red line represents the mixture porosity of the sphere elements-based model, which was 0.381, while the green points refer to 15 mixture porosities in 15 real aggregate shapes respectively. It can be observed that there is an obvious difference between these two models. Nevertheless, this difference is quite stable and consistent among 15 green points. Detailed statistical analysis has been done as shown in the following Table 15. All the aggregate shapes used in this chapter were quite common. Although there are some special shapes in civil engineering, for instance, elongated particle, their volume percentages in mixture were generally low and strictly controlled, due to their poor engineering properties. The model parameters were identical with sphere element-based models. In this way, these mixtures' porosities were recorded and plotted on the following Figure 13. A total number of 15 DEM models with different real aggregated shapes were built based on previous researches [33][34][35][36][37]. It should be noted that there is only one particle shape in each specimen to amplify its effect, as shown in Figure 12. All the aggregate shapes used in this chapter were quite common. Although there are some special shapes in civil engineering, for instance, elongated particle, their volume percentages in mixture were generally low and strictly controlled, due to their poor engineering properties. The model parameters were identical with sphere element-based models. In this way, these mixtures' porosities were recorded and plotted on the following Figure 13. From Figure 13, the red line represents the mixture porosity of the sphere elements-based model, which was 0.381, while the green points refer to 15 mixture porosities in 15 real aggregate shapes respectively. It can be observed that there is an obvious difference between these two models. Nevertheless, this difference is quite stable and consistent among 15 green points. Detailed statistical analysis has been done as shown in the following Table 15. From Figure 13, the red line represents the mixture porosity of the sphere elements-based model, which was 0.381, while the green points refer to 15 mixture porosities in 15 real aggregate shapes respectively. It can be observed that there is an obvious difference between these two models. Nevertheless, this difference is quite stable and consistent among 15 green points. Detailed statistical analysis has been done as shown in the following Table 15. From Table 15, 95% confidence interval has been determined and plotted as shown in Figure 13. Therefore, a shape reduction factor range was recommended considering the shape effect through Equations (10) and (11).
Upper limit of shape reduction factor = Confidence upper limit Mixture porosity o f sphere model = 0.231 0.381 = 0.606 (10) Lower limit of shape reduction factor = Confidence lower limit Mixture porosity o f sphere model = 0.206 0.381 = 0.541 (11) When the prediction results based on sphere element models are multiplied by shape reduction factor, a more accurate result could be obtained.

Summary and Conclusions
In this research, a porosity prediction model was established for granular mixtures through the DEM model and BPNN algorithm. A total of 400 mixes were simulated with the discrete element method to provide the database for BPNN training and testing. The prediction results were verified by steel ball experiments. Through this research, the following findings and conclusions were observed:

1.
This study provided an efficient and detailed method to build a porosity prediction model according to DEM model and BPNN algorithm. After model training, testing in MATLAB and verifying in lab experiments, the model's coefficients of determination could achieve 99%.

2.
A suitable original database size was compared and recommended: prediction model accuracy could reach about 98% when the original data size was larger than 75. The amount of raw data can be determined according to different accuracy requirements.

3.
A shape reduction factor was proposed to achieve the porosity reduction from sphere to real particle shape, which made the porosity prediction results closer to reality and easy to use. In this research, particle shapes were not considered as the primary control parameters, which will be considered in a future study. Evaluation is recommended of particle shapes through combining the findings of this research with experimental testing methods.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.