Using Artificial Neural Networks to Predict Influences of Heterogeneity on Rock Strength at Different Strain Rates

Pre-existing cracks and associated filling materials cause significant heterogeneity in natural rocks and rock masses. The induced heterogeneity changes the rock properties. This paper targets the gap in the existing literature regarding the adoption of artificial neural network approaches to efficiently and accurately predict the influences of heterogeneity on the strength of 3D-printed rocks at different strain rates. Herein, rock heterogeneity is reflected by different pre-existing crack and filling material configurations, quantitatively defined by the crack number, initial crack orientation with respect to the loading axis, crack tip distance, and crack offset distance. The artificial neural network model is trained, validated, and tested on a finite dataset of 42 quasi-static and 42 dynamic Brazilian disc experiments to establish the relationship between the rock strength and the heterogeneity parameters at different strain rates. The artificial neural network architecture, including the hidden layer number and transfer functions, is optimized by a corresponding parametric study. Once trained, the proposed artificial neural network model generates excellent prediction accuracy for the influences of high-dimensional heterogeneity parameters and strain rate on rock strength. The sensitivity analysis indicates that strain rate is the most important physical quantity affecting the strength of heterogeneous rock.


Introduction
Natural rocks typically have substantial heterogeneity in the form of pre-existing flaws at multiple scales resulting from a variety of geological processes. The heterogeneity alters rock mechanical strength, further affecting the deformation behaviors and failure patterns upon loading [1,2]. In fact, to a certain degree, almost all rock-related engineering projects include structure constructions in rock masses, which involve pre-existing open cracks [3,4]. For example, human excavations can greatly alter the stability of rock masses as a result of disturbance of the primary stress equilibrium, leading to artificially induced stress redistribution. Consequently, stress concentration may occur around the pre-existing flaws, which triggers crack coalescence and ultimate failure [5,6]. These open flaws are usually naturally filled with different fine-grained materials due to weathering or joint shearing [7]. In geological constructions, open flaws are manually filled via grouting and shotcrete to strengthen the load capacity of flawed rocks [8,9]. The addition of filling materials further enhances the rock heterogeneity. Fracture behaviors of flawed rock become more complex with the involvement of filling materials due to the complicated interactions between rock and flaw filling. In addition, natural rocks are often subjected to high strain rate loadings, such as earthquakes, rock blasting and oil-well fracturing. As a rate-dependent material, the mechanical strength of heterogeneous rock is also highly related to the strain rate [10].
Hence, it is vital to establish the relationship between the heterogeneity of the flawed rock (induced by pre-existing flaws and filling materials) and its mechanical strength at different strain rates, bringing great convenience to rock engineering application and geotechnical design. In terms of evaluating the rock mechanical strength, laboratory tests are the most widely applied direct approaches, where standardized specimens are required according to the corresponding testing standards. So far, several laboratory experiments have been performed on flawed rocks with different pre-existing crack configurations, such as crack geometry, number, orientation, etc. [11,12]. For instance, by performing static and dynamic experiments on cubic rock specimens with a single pre-existing flaw, Yan et al. [13] illustrated the significant coupled effects of pre-compression and strain rate on rock strength and fracture responses. Zhao et al. [14] investigated the cracking and coalescence modes of rock-like specimens containing two parallel cracks via uniaxial compression tests. They highlighted that the peak strength of the tested samples is related to the crack inclination angle and the associated bridge angle. In addition, Huang et al. [15] put more emphasis on probing the effects of confining pressures. Regarding three pre-existing fissures, Zhou et al. [16] focused on the investigation of coupled thermomechanical effects and the underlying cracking mechanisms. However, the major drawbacks of experiments are also obvious; they are known to be time-consuming and expensive. Considering the multiple heterogeneity parameters and the strain rate that influence the mechanical behaviors of flawed rock, a large number of samples with variations is required. For some special fragile or heavily weathered rocks, an adequate number of high-quality formed samples is not always feasible to obtain.
Therefore, laboratory-based rock mechanical strength estimation may be problematic, especially for investigating high dimensional variables.
An alternative approach is to run numerical simulations by performing parametric studies to reduce the excessive experimental workload and cut unnecessary cost [17,18]. For example, Wang et al. [19] performed extended finite element method simulations to investigate the initiation mechanisms of wing cracks, which stayed consistent with the theoretical analysis. Liu et al. [20] incorporated a fluid coupled model into a grain-based model to discover the promotion mechanisms for crack coalescence in two systems with pre-existing flaws. Shen et al. [21] adopted a discrete element method to establish the relationships between P wave velocity and the number of pre-existing cracks, as well as confining pressure. However, as an indirect approach, the simulation study has its inherent disadvantages. Before conducting a simulation, a thorough understanding is needed, with an awareness of all the factors involved. Meanwhile, a considerable amount of time is spent on validating the material constitutive parameters. Other conventional methods either seek an analytical solution or rely on an empirical solution for simplicity and reliability [22]. Nevertheless, these two options are not always achievable for complicated problems involving nonlinear and high-dimensional variables [23].
Recently, the artificial neural network (ANN) approach has shown promising capabilities in handling the abovementioned complex problems [24,25]. The ANN-based approach presented more reliable and acceptable results than conventional statistical analysis, as reported in [26]. ANN has an inherent ability to learn from available datasets and has been proven to be a powerful tool in various fields. For example, Gope et al. [27] designed an efficient ANN architecture to predict the crack growth direction of aluminum alloys. Liu and Athanasiou [23] successfully captured the complex relationship among the plane-strain stress intensity factor, the indentation load, and the specimen dimensions through ANN. Yan et al. [28] proved that ANN models are viable for predicting fracture parameters with high accuracy by using only 77 sets of experimental data. For predicting rock strength, previous ANN models primarily take the inherent physical properties of rock materials into account, e.g., porosity [29], P wave velocity [29], unit weight [30] and point load index [29,30]. Some of the input parameters considered, such as lithology, are only qualitatively introduced into the neural network [30]. However, limited attention was paid to forecasting the influences of rock heterogeneity on rock strength by the ANN approach. In addition, the strain rate effect was also ignored in the previous ANN-based models despite its importance.
Targeting the gaps in the existing literature, this study adopts the ANN approach to predict the influences of heterogeneity on rock strength at different strain rates. The ANN model is implemented in Matlab (R2020a). The raw dataset is collected from our previous 84 Brazilian disc tests [31][32][33]. In these tests, 3D-printed Brazilian disc specimens with pre-existing single and double flaws were compressed under quasi-static and dynamic loadings. The application of the 3D printing technique eliminates the creation of micro cracks and avoids imprecise flaw geometry during the flaw embedding process, making the experimental results more reliable. The heterogeneity parameters considered in this study include the flaw number, relative flaw location, flaw orientation, and strain rate. The optimization of the ANN architecture is performed using a trial-and-error method in terms of transfer functions and hidden neuron numbers. The predictions obtained from the trained ANN model are compared with the experiments to evaluate the ANN model performance. In addition, a sensitivity study demonstrates the relative importance of each individual input variable, and the effects of different input numbers are also investigated.

Methodology
As a versatile approach for dealing with nonlinear problems, the artificial neural network (ANN) takes the form of a multilayer network with two propagation directions, namely feed forward and back propagation [24]. As shown in Figure 1, a typical ANN architecture is divided into three parts: one input layer, one or multiple hidden layers, and one output layer. The layers contain different numbers of neurons and are connected by either nonlinear or linear transfer functions. The complex nonlinear relationship between input and output variables can be learned via these transfer functions. The neuron nodes distributed in each layer have their own weight and bias values to describe the interconnection strength within the network. In Figure 1, IW1 and IW2 represent the input weight matrixes, while b1 and b2 represent the bias matrixes in the hidden and output layers, respectively. Herein, the updating of the weight and bias matrixes is achieved by adopting the Levenberg-Marquardt algorithm as the training function. As a typical back-propagation algorithm, the Levenberg-Marquardt optimization has inherent advantages pertaining to convergence rate, generalization performance and precision, although more memory is required [34]. Since the Levenberg-Marquardt algorithm is an iterative procedure, applying the training function is a process of adjusting the network weight and bias matrixes at each epoch. The initial weight and bias values are random, and there is no need to make a prior assumption. The algorithm implementation has two phases. Firstly, input variables are propagated forward through the network via transfer functions to emerge as output variables, defined as the feed-forward process.
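The feed-forward phase described above can be sketched as a minimal NumPy pass through a 7-5-1 network with tan-sigmoid transfer functions in both layers (the layout the paper ultimately selects); the weight matrixes IW1, IW2 and bias vectors b1, b2 below are random placeholders, not the trained values:

```python
import numpy as np

def tansig(x):
    # Hyperbolic-tangent sigmoid transfer function
    return np.tanh(x)

def forward(x, IW1, b1, IW2, b2):
    # Feed-forward pass: input -> hidden (tansig) -> output (tansig)
    h = tansig(IW1 @ x + b1)   # hidden-layer activations
    y = tansig(IW2 @ h + b2)   # output-layer activation
    return y

# Toy 7-5-1 network with random weights (illustrative only)
rng = np.random.default_rng(0)
IW1 = rng.normal(size=(5, 7)); b1 = rng.normal(size=5)
IW2 = rng.normal(size=(1, 5)); b2 = rng.normal(size=1)
x = rng.random(7)              # one normalized input sample
y = forward(x, IW1, b1, IW2, b2)
```

Because both transfer functions are tan-sigmoid, the output is bounded in (-1, 1), which is why the target data must be normalized before training.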
To assess the ANN model performance, the averaged squared difference between the outputs and targets is quantified by a mean square error (MSE):

MSE = (1/n) Σ_{i=1}^{n} (O_i − T_i)²

Meanwhile, the correlation between the outputs and targets is expressed by a regression value R:

R = Σ_{i=1}^{n} (O_i − Ō)(T_i − T̄) / √[ Σ_{i=1}^{n} (O_i − Ō)² · Σ_{i=1}^{n} (T_i − T̄)² ]

where n is the number of outputs, O_i and T_i are the ith output and target values, respectively, and Ō and T̄ are their mean values. During training, if the MSE meets the target error, the training process is terminated because the predictive accuracy is sufficient. Otherwise, the error between the predicted output and target output values is propagated backward through the network to update the connection weights and biases, in order to minimize the errors, until the MSE satisfies the target error. This is the second phase, also known as the back-propagation process. The detailed derivation of the Levenberg-Marquardt algorithm can be found in the previous literature [35,36].
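A minimal sketch of the two performance measures, assuming the regression value R is the Pearson correlation between outputs and targets (which is what Matlab's regression plots report):

```python
import numpy as np

def mse(outputs, targets):
    # Mean squared error between network outputs O_i and targets T_i
    o, t = np.asarray(outputs, float), np.asarray(targets, float)
    return np.mean((o - t) ** 2)

def regression_r(outputs, targets):
    # Pearson correlation between outputs and targets (regression value R)
    o, t = np.asarray(outputs, float), np.asarray(targets, float)
    return np.corrcoef(o, t)[0, 1]
```

A perfect fit gives MSE = 0 and R = 1; training drives the MSE down while R is reported as a summary of fit quality.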
To avoid overfitting and increase the generalization capability of the trained ANN, an early stopping technique is utilized during the training procedure. The entire dataset is divided into three subsets, namely, the training set, the validation set and the testing set. During the training procedure, the first two datasets are used. The training set is used for computing and updating the network weights and biases. During the training process, as with the training error, the model performance on the validation set is simultaneously monitored. When the validation error starts to increase for a certain number of iterations (set as 10 here), the training is terminated. The increase in the validation error indicates the decrease in the model generalization ability. The entire network will then return the weight and bias matrixes when the overall error is at its minimum.
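The early stopping rule can be sketched as a generic loop that halts after a fixed number of consecutive epochs (10, as set in the paper) without improvement on the validation set and returns the best epoch seen; `step` and `val_error` are hypothetical callbacks standing in for one Levenberg-Marquardt epoch and the validation error:

```python
def train_with_early_stopping(step, val_error, max_fail=10, max_epochs=1000):
    """Stop after `max_fail` consecutive epochs without validation
    improvement; return the best epoch and its validation error."""
    best_err, best_epoch, fails = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        step()                       # one training epoch (placeholder)
        err = val_error()            # current validation error
        if err < best_err:
            best_err, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:    # generalization has stopped improving
                break
    return best_epoch, best_err

# Illustrative error curve: falls until epoch 11, then rises
errors = [20 - e for e in range(1, 12)] + [10 + e for e in range(1, 15)]
it = iter(errors)
best_epoch, best_err = train_with_early_stopping(lambda: None, lambda: next(it))
```

With this illustrative curve the loop stops at epoch 21 and returns the weights of epoch 11, mirroring the behavior described later for the trained network.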

Input and Output Variables
The current study aims to investigate the ANN application in establishing the relationship between the rock strength (peak load) and rock heterogeneity factors at different strain rates. An experimental dataset including 84 Brazilian disc experiments has been collected from our previous studies [31][32][33]. For the multiple-crack system, descriptive and appropriate crack geometric parameters should be properly selected, which is vital to obtain an acceptable predictive accuracy with a relatively small dataset. Herein, the selected parameters include (1-2) the crack inclination angles with the loading axis α1, α2, where the subscripts distinguish the different flaws; (3) the crack offset distance H; (4) the crack tip distance S; (5-6) the flaw fillings f1, f2; and (7) the strain rate ε̇, as shown in Figure 2a. These seven parameters construct the input variables, and the peak load PL, as marked in Figure 2b, is the output variable. The entire dataset containing 84 experiments is listed in Table A1 in Appendix A. For the single-flaw tests, H and S are equal to 0. If the open flaw is filled, its corresponding filling parameter is set as 1; otherwise, it is 0.
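As an illustration only (the function name, default values and example strain rate are hypothetical, not from the paper), one test can be packed into the seven-element input vector in the order listed above:

```python
# Hypothetical packing of one Brazilian disc test into the seven inputs
# (alpha1, alpha2, H, S, f1, f2, strain_rate); values are illustrative.
def encode_sample(alpha1, alpha2, H=0.0, S=0.0, f1=0, f2=0, strain_rate=1e-5):
    # Single-flaw tests: H and S stay 0; filling flags are 1 if filled, else 0
    assert f1 in (0, 1) and f2 in (0, 1)
    return [alpha1, alpha2, H, S, f1, f2, strain_rate]

# e.g. two filled flaws at 45 degrees under a quasi-static strain rate
x = encode_sample(45.0, 45.0, H=10.0, S=8.0, f1=1, f2=1)
```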
The raw dataset can be normalized on a scale of 0 to 1 to improve the network performance. The normalization equation is given as follows:

x̄_i = (x_i − x_min) / (x_max − x_min)

where x̄_i and x_i are the ith normalized and original variable values, respectively, and x_min and x_max are the minimum and maximum values of the variable set.
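A minimal sketch of this min-max normalization; the constant-column guard is an added assumption for inputs such as H and S, which are all zero in the single-flaw tests:

```python
def min_max_normalize(values):
    # Scale a variable set to [0, 1]: (x_i - x_min) / (x_max - x_min)
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column (e.g. all-zero H):
        return [0.0 for _ in values]  # map to 0 to avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]
```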

Optimization of ANN
In this application, the trial-and-error method is adopted for optimizing the network structure and operational parameters because of the small dataset and low computational consumption for each trial [27]. Different transfer function combinations and the min-max normalization effect are tuned in order to find the best setting. Meanwhile, a parametric study for hidden layer neuron number is conducted.
For better comparison, different ANNs are required to be trained on the same training, validation and testing dataset division, accounting for 60, 12 and 12 tests, respectively. Since the network training starts from different initial weights and biases, different solutions may be obtained even for the same dataset division. Each ANN structure is therefore trained ten times, and the ANN with the best performance is selected. The MSE or R values for the training and testing datasets are chosen to evaluate the training and generalization performances, respectively. The ANN with the smaller average MSE or the larger average R is considered to have better performance.
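The fixed dataset division and the best-of-ten selection can be sketched as follows; `split_dataset` and `best_of_n` are hypothetical helper names, and `train_once` stands in for one full training run returning its MSE and model:

```python
import random

def split_dataset(n_samples, n_train=60, n_val=12, n_test=12, seed=0):
    # Fixed division of the 84 tests, shuffled once and then reused for
    # every candidate ANN so that comparisons are fair
    assert n_train + n_val + n_test == n_samples
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def best_of_n(train_once, n=10):
    # Train from n random initializations and keep the lowest-MSE run
    runs = [train_once() for _ in range(n)]   # each run -> (mse, model)
    return min(runs, key=lambda r: r[0])

train_idx, val_idx, test_idx = split_dataset(84)
```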

Transfer Functions
Based on the previous studies [28,37], one hidden layer is applied due to the limited input and output variables. A network with too many hidden layers or neurons can easily be over-trained and will generalize poorly to new data. A detailed analysis of the neuron number contained in the hidden layer is elaborated in the subsequent section. The different setting combinations are performed on the same 7-10-1 ANN, i.e., an ANN containing 10 hidden layer neurons. Table 1 lists the seven different transfer function and normalization setting combinations. Figure 3 presents the regression values for the different ANN settings.
Table 1. Transfer function and normalization setting combinations.

Case | Input data | Hidden layer transfer function | Output layer transfer function | Output data
1 | Normalized | Tan-sigmoid | Tan-sigmoid | Normalized
2 | Normalized | Tan-sigmoid | Sigmoid | Normalized
3 | Normalized | Sigmoid | Sigmoid | Normalized
4 | Normalized | Tan-sigmoid | Linear | Raw
5 | Normalized | Sigmoid | Linear | Raw
6 | Raw | Tan-sigmoid | Linear | Raw
7 | Raw | Sigmoid | Linear | Raw

As shown in Figure 3, the setting of case 1 is superior to the other cases, and the following study adopts the same transfer functions and normalization setting. It should also be noted that, for cases 2, 3 and 4, the trained ANNs present a better testing accuracy than the training results, which may result from the fixed dataset splitting used here. Although these three cases have poorer accuracy on the training dataset, the trained ANNs appear to be more suitable for the testing dataset, leading to a higher regression value [27].
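For reference, the three candidate transfer functions in Table 1 correspond to Matlab's tansig, logsig and purelin; a plain-Python sketch:

```python
import math

# The three candidate transfer functions compared in Table 1
def tansig(x):   # tan-sigmoid, output in (-1, 1)
    return math.tanh(x)

def logsig(x):   # log-sigmoid, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def purelin(x):  # linear, unbounded output
    return x
```

The bounded functions (tansig, logsig) require normalized output data, whereas the linear output function can work directly on the raw targets, which is why the output normalization column in Table 1 tracks the output transfer function.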

Hidden Layer Neuron Number
Yan et al. [28] reported the following formulation to estimate the neuron number in the single hidden layer:

m = √(n + l) + a

where m, n and l are the neuron numbers of the hidden, input and output layers, respectively, and a is a constant ranging from 1 to 10. The neuron number in the hidden layer is therefore investigated in the range of 4 to 13. The comparison between different ANNs is performed on the same abovementioned dataset division, and the MSE values for each case are shown in Figure 4. The training MSE is usually smaller than the testing MSE. The neuron number in the hidden layer is chosen to be 5, since this leads to a relatively smaller average MSE value. Table 2 provides the ANN setting after optimization. The training is repeated 10 times and the ANN with the best performance is selected.
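A sketch of this sizing rule, reading m = √(n + l) + a with the square root rounded to the nearest integer (an assumption; with 7 inputs and 1 output this reproduces the paper's 4-13 candidate range):

```python
import math

def hidden_neuron_range(n_inputs, n_outputs, a_min=1, a_max=10):
    # m = sqrt(n + l) + a, rounding the square root to the nearest integer
    base = round(math.sqrt(n_inputs + n_outputs))
    return [base + a for a in range(a_min, a_max + 1)]

candidates = hidden_neuron_range(7, 1)  # seven inputs, one output
```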

ANN Training
The training procedure of the picked ANN is illustrated in Figure 5. After epoch-11, the error of the validation dataset starts to increase, as does that of the testing dataset. Although the training accuracy is still increasing, the generalization ability tends to decrease. As a result, the training stops at epoch-21 and returns the training values at epoch-11 due to the early stopping setting. The outcome of the chosen ANN is shown in Figure 6. It is observed that the R values for training, validation and testing are 0.9962, 0.9863, and 0.9983, respectively. The overall regression value R is 0.9951. The weights and biases associated with this trained ANN are given as follows:

Results and Discussion
The entire dataset is used to evaluate the accuracy of the predicted peak load by the ANN approach. All predicted results by the trained ANN are listed in Appendix A. The importance factors of each input variable on the peak load are calculated by the sensitivity analysis.

Predicting Results
Figure 7a presents a comparison of the peak loads predicted by the ANN model with the experimental results, and Figure 7b shows the relative error values between the ANN-predicted and experimental results. The relative error is calculated as (PL_ANN − PL_Experiment)/PL_Experiment, where PL_ANN and PL_Experiment are the ANN-predicted and experimental results, respectively. Meanwhile, a statistical analysis based on the ratio PL_ANN/PL_Experiment and the mean absolute relative error is performed, and the key descriptive statistics are summarized in Table 3. It is clear that the dynamic test section has better predicted results than the static test section in terms of the mean, standard deviation (SD) and coefficient of variation (COV) of the ratio PL_ANN/PL_Experiment, as well as a smaller mean absolute relative error. Except for several data points in the static test section, the majority of the predicted data have a relative error within ±10%.
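The descriptive statistics of Table 3 can be reproduced from any prediction set with a short helper (`prediction_stats` is a hypothetical name, shown here on toy values):

```python
import statistics

def prediction_stats(predicted, measured):
    # Ratio PL_ANN / PL_Experiment and mean absolute relative error
    ratios = [p / m for p, m in zip(predicted, measured)]
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)
    cov = sd / mean                   # coefficient of variation
    mare = statistics.mean(abs(p - m) / m
                           for p, m in zip(predicted, measured))
    return mean, sd, cov, mare

# Toy example: two predictions against two measurements
mean, sd, cov, mare = prediction_stats([1.0, 2.2], [1.0, 2.0])
```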
Generally speaking, the ANN model demonstrates a strong ability to quickly (within seconds) and accurately predict the peak load. The current trained ANN model has a mean ratio close to 1 and a small standard deviation, indicating good generalization ability. In addition, the relatively small COV value indicates less scatter in the results. The overall mean absolute relative error is only 4.49%.

Sensitivity Analysis
To quantify the relative influence of each input variable on the peak load, Garson's equation, derived from the neuron weight matrixes (IW1 and IW2), is applied as illustrated below [38,39]:

I_ik = Σ_{j=1}^{L} [ (|IW1_ij| / Σ_{r=1}^{N} |IW1_rj|) |IW2_jk| ] / Σ_{i=1}^{N} Σ_{j=1}^{L} [ (|IW1_ij| / Σ_{r=1}^{N} |IW1_rj|) |IW2_jk| ]

where I_ik is the relative importance factor of the ith input neuron on the kth output neuron, N and L are the numbers of input and hidden neurons, respectively, IW1_ij is the connection weight between the ith input neuron and the jth hidden neuron, and IW2_jk is the connection weight between the jth hidden neuron and the kth output neuron.
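A sketch of Garson's method using NumPy; note that the weight matrixes here are laid out as (hidden × input) and (output × hidden), the transpose of the indexing in the text:

```python
import numpy as np

def garson_importance(IW1, IW2):
    """Garson relative importance of each input on each output.
    IW1: (L, N) hidden-by-input weights; IW2: (K, L) output-by-hidden."""
    A = np.abs(IW1)
    # share of each input within each hidden neuron (rows sum to 1)
    share = A / A.sum(axis=1, keepdims=True)          # (L, N)
    # weight each hidden neuron's shares by its output connection strength
    contrib = np.abs(IW2) @ share                     # (K, N)
    # normalize so importances for each output sum to 1
    return contrib / contrib.sum(axis=1, keepdims=True)
```

For a hidden layer where each neuron listens to exactly one input with equal output weights, the importances split evenly, as the test below illustrates.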
Note that the ANN model is obtained based on min-max normalized input and output data. Figure 8 shows the relative influence of each individual input variable. The results of the analysis illustrate that the strength of heterogeneous rock is affected by all considered input variables, while the strain rate ε̇ is the most predominant variable affecting the rock strength in this study. The effects of the input number are further evaluated by comparing the mean absolute relative error derived from the entire dataset. The results for different selections of inputs are summarized in Table 4. Note that all ANN models with different inputs are trained under the same ANN setting listed in Table 2. It can be observed from Table 4 that reducing the inputs decreases the overall performance of the ANN models in the current study. In particular, the strain rate ε̇ cannot be ignored here because of the large prediction error presented in Case 8, which also validates the sensitivity analysis results. The error difference among the other reported cases is, to some extent, not obvious.

Conclusions
The present study shows that the artificial neural network (ANN) approach can be applied to predict the influences of heterogeneity on rock strength at different strain rates. In the current study, seven input variables (initial crack inclination angles α1, α2; relative positions H, S; filling materials f1, f2; and strain rate ε̇) are considered, with only one output variable, the peak load PL. The best prediction performance of the ANN is observed when using tan-sigmoid transfer functions in both the hidden and output layers, with five neurons contained in the single hidden layer. Although only a limited number of experimental tests are available, the trained ANN presents fast and accurate strength prediction performance, as reflected in the overall regression value of 0.9951. Finally, the sensitivity study of the input variables determines that strain rate is the most important factor affecting the mechanical performance of heterogeneous rocks. In practical applications, the rock strength can thus be forecasted from several simple and measurable physical variables. In conclusion, the strong capacity for rapid convergence and the high predictive accuracy of the ANN model are corroborated.
Limited by the current experimental tests, the input variable pairs, namely α1, α2 and f1, f2, both have the same values. Therefore, the applicability of the trained ANN model to more complex conditions concerning flaws with diverse crack inclination angles and filling materials needs to be further validated by corresponding experiments. Because the ANN approach has the inherent advantage of being extensible (in terms of both depth and width), it possesses promising potential for establishing a more complex nonlinear relationship between the rock strength and rock heterogeneity when more heterogeneous physical quantities are involved. However, from a practical point of view, further systematic sampling with variations poses a great challenge owing to the large quantity of high-quality laboratory-tested samples required, which may be problematic for some special fragile or heavily weathered rocks, as mentioned above. Hence, the proposed ANN approach can, to some extent, minimize the number of unnecessary experiments and ensure the rapid estimation of rock mechanical properties, providing an alternative means of handling complex rock engineering problems. At the same time, with the aid of a more complete database, the reliability of the proposed ANN model can be further validated. It is worth noting that the current ANN approach cannot provide detailed information on failure patterns, such as that available from experiments and simulations, and further research is deemed necessary.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.