A Predictive Mimicker of Fracture Behavior in Fiber Reinforced Concrete Using Machine Learning

Owing to its exceptional properties, fiber reinforced concrete is being used ever more widely. However, its mix design is still based mainly on extensive experimentation. This study aims to construct a machine learning model capable of predicting the fracture behavior of all conceivable fiber reinforced concrete subclasses, especially strain-hardening engineered cementitious composites. The study evaluates 15 input parameters that include the ingredients of the mix design and the fiber properties. As a result, it predicts, for the first time, the post-peak fracture behavior of fiber-reinforced concrete matrices. Five machine learning models are developed, and their outputs are compared: artificial neural networks, the support vector machine, the classification and regression tree, Gaussian process regression, and the extreme gradient boosting tree. Because the available dataset is small, this article employs a generative adversarial network to build a virtual dataset that augments the data and improves accuracy. The results indicate that the extreme gradient boosting tree model has the lowest error and is therefore the best mimicker for predicting fiber reinforced concrete properties. This article is anticipated to provide a considerable improvement in the recipe design of effective fiber reinforced concrete formulations.


Introduction
Because concrete is brittle and fractures abruptly in tension, it absorbs very little energy. Ductile materials can be combined with concrete to improve its tensile and energy absorption properties. Reinforced Cement Concrete (RCC) uses rebar to obtain better ductility and tensile strength. However, because of the larger diameter of the rebar, the cracks that it bridges are relatively large, which leads to durability issues [1]. The use of fibers has been increasing because of the enhanced mechanical and fracture properties they provide. Fibers bridge cracks at the micro-scale, which controls crack width, improves crack resistance, and ensures better ductility.
Concrete reinforced with fibers can show either strain-softening or strain-hardening behavior. A mix that shows strain-hardening behavior is classified as an Engineered Cementitious Composite (ECC). Earlier, it was thought that strain hardening could be achieved only by increasing the volume of fibers, but it was later found that fiber content is not the only controlling parameter. It is also a function of parameters such as:

• Fiber properties: mechanical properties, aspect ratio, and volume fraction;
• Matrix properties: initial flaw size distribution and its mechanical properties;
• Fiber-matrix interfacial properties: chemical and frictional bond.
These properties can be adjusted to achieve strain-hardening behavior while keeping a practical fiber content [2]. To effectively use the combination of all parameters, researchers

2.1.1. Artificial Neural Networks (ANN)
ANN is a bio-inspired computational model that mimics the way biological neurons work, hence its name. The model has three basic components: (1) an input layer, (2) an output layer, and (3) hidden layers. Other parameters exist as well, but they were kept at their defaults. For our problem, the sizes of the input and output layers are fixed. The number of hidden layers depends on the data and defines the complexity of the model. More hidden layers let the model fit the training data more closely. However, too many may cause overfitting, a situation in which the model fits the training dataset very well but performs poorly on the testing dataset [99]. Thus, the network was trained with different numbers of hidden layers, and the optimum number was found by comparing the RMSE of the training and validation datasets. Figure 1a depicts a typical ANN model.

Figure 1. Typical ML models: (a) ANN; (b) CART; (c) XGBoost. f_t is the tensile strength; C/C is the cement-to-cement ratio; F/C is the fly ash-to-cement ratio; S/C is the sand-to-cement ratio; E_f is the elastic modulus of fiber; f_c is the compressive strength; ε is the tensile strain capacity.

Regression Analysis
Regression is the most common family of models. It contains many different types, each with its own parameters, and the shared goal is to fit the data as closely as possible [100]. Common regression models include linear regression and polynomial regression.

Regression Tree Analysis
The regression tree is one of the most potent ML tools for regression analysis. It performs the calculations in a hierarchical (tree-like) manner. The number of trees is a parameter that defines its complexity; however, to avoid overfitting, the model must not be made too complex [101]. Common regression tree models include CART and XGBoost (iterative tree). Figure 1b,c show typical CART and XGBoost models, respectively.
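The iterative, additive idea behind boosted trees can be illustrated with a minimal sketch. It is not the XGBoost library and not the authors' implementation: it fits depth-1 regression trees (stumps) to the residuals of the current prediction on a single feature, using plain squared error. The data values, function names, `n_trees`, and `learning_rate` are all hypothetical and chosen only for illustration.

```python
# Illustrative only: additive boosting with depth-1 regression trees (stumps)
# on one hypothetical feature. This is NOT the XGBoost library.

def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_boosted(x, y, n_trees=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for pi, xi in zip(preds, x)]
    return base, stumps, learning_rate


def predict_boosted(model, xi):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


def training_sse(model, x, y):
    return sum((predict_boosted(model, xi) - yi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: one normalized mix parameter versus a strength-like target.
x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [20.0, 25.0, 32.0, 40.0, 41.0, 45.0]
one_tree = fit_boosted(x, y, n_trees=1)
many_trees = fit_boosted(x, y, n_trees=50)
```

Each added stump corrects part of the remaining error, so the training error falls as trees are added; this is also why the number of trees controls complexity and has to be checked against a validation set.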

Generative Adversarial Network (GAN)
In addition to the above models, a specialized data augmentation technique known as a Generative Adversarial Network (GAN) was used. A GAN is used when the available dataset is small: it generates virtual data that enlarge the dataset and thereby improve accuracy [14]. Figure 2 shows how the data are processed to obtain the virtual dataset.

Overview
For the development of the ML model, a dataset with 19 attributes was used, of which 15 are input parameters and 4 are outputs. The input parameters comprise matrix constituents and fiber properties, as shown in Table 1. To cover a wide range of cement replacement materials, the (1) matrix constituents used were: the cement-to-cement ratio, the fly ash-to-cement ratio, the sand-to-cement ratio, the coarse aggregate-to-cement ratio, the limestone powder-to-cement ratio, the slag-to-cement ratio, the silica fume-to-cement ratio, the metakaolin-to-cement ratio, the fiber content, the water-to-binder ratio, and the superplasticizer content. The major parameters that define the (2) fiber properties are: the fiber length, the fiber diameter, the fiber tensile strength, and the fiber elastic modulus. Predicting the fracture response of the material is made possible by using a dataset that contains conventional FRC and HPFRCC as well as ECC samples. If only ECC samples had been used, the model could not have distinguished ECC from other types of FRC: it would have treated any fiber concrete input as an ECC and predicted higher strain values on the assumption of strain hardening. The ML models were therefore trained on the behavior of both FRC and ECC materials. The trained model can predict the post-cracking response based on the differences in the input variables between FRC and ECC.
The HPFRCC sample data were included to capture the effect of coarse aggregate addition on the fracture attributes of the matrix. Because ECC lacks coarse aggregate whereas FRC contains it, the model might have wrongly attributed strain-softening behavior to the presence of coarse aggregates. HPFRCC data were therefore added to avoid this, as HPFRCC contains coarse aggregate and still shows strain hardening.
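The layout of one dataset record, as described in this overview, can be sketched as follows. The quantities come from the text; the identifier names, and the helper `split_record`, are hypothetical and only illustrate how the 15 inputs and 4 outputs might be separated.

```python
# Hypothetical column identifiers; the text lists the quantities but not the names.
MATRIX_CONSTITUENTS = [
    "cement_to_cement",
    "fly_ash_to_cement",
    "sand_to_cement",
    "coarse_aggregate_to_cement",
    "limestone_powder_to_cement",
    "slag_to_cement",
    "silica_fume_to_cement",
    "metakaolin_to_cement",
    "fiber_content",
    "water_to_binder",
    "superplasticizer_content",
]
FIBER_PROPERTIES = [
    "fiber_length",
    "fiber_diameter",
    "fiber_tensile_strength",
    "fiber_elastic_modulus",
]
INPUT_FEATURES = MATRIX_CONSTITUENTS + FIBER_PROPERTIES

# Three regression targets plus one classification target (strain hardening or not).
OUTPUT_TARGETS = [
    "compressive_strength",
    "tensile_strength",
    "tensile_strain",
    "strain_hardening",
]


def split_record(record):
    """Split one sample (a dict keyed by column name) into inputs and outputs."""
    inputs = [record[name] for name in INPUT_FEATURES]
    outputs = [record[name] for name in OUTPUT_TARGETS]
    return inputs, outputs
```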

Dataset Normalization
The data were initially collected from the literature in raw form. The ranges of the parameters differed drastically; e.g., cement content was normally around 1, but other parameters such as fiber diameter or elastic modulus of fiber were in the range of hundreds. Data normalization was therefore necessary so that the model could capture the sensitivity of each parameter, which ultimately affects the results. To keep all the parameters between 0 and 1, the normalization in Equation (1) was used.

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$

where x is any original input parameter; x_min is the minimum value of the same parameter; x_max is the maximum value of the parameter; and x* is the normalized value of the parameter.
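A minimal sketch of this scaling follows, assuming the standard min-max form implied by the definitions of x, x(min), x(max), and x*. The function name, the handling of a constant column, and the fiber diameter values are illustrative assumptions, not taken from the dataset.

```python
def min_max_normalize(values):
    """Scale one parameter column to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical fiber diameters; real values come from the collected dataset.
fiber_diameter = [12.0, 39.0, 200.0, 550.0]
normalized = min_max_normalize(fiber_diameter)
```

The minimum and maximum should be taken from the training data and reused for any new sample, so that training and prediction share the same scale.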

Hyperparameter Tuning
Hyperparameter tuning is one of the most crucial steps in building machine learning models. In ANN, it concerns the number of hidden layers and the learning rate; in regression trees, it concerns the number of branches. A simple iterative technique was used: the hyperparameters were varied and the performance of the model was recorded for each setting. The values that performed best on both the training and validation sets were selected to counter underfitting and overfitting.
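The iterative search can be sketched as a plain loop over candidate values. This is a minimal illustration under stated assumptions: a k-nearest-neighbour regressor stands in for the actual models (ANN, CART, and so on), the hyperparameter is the hypothetical `k`, the data are invented, and the candidate with the lowest validation RMSE is chosen. The exact selection rule used in the study is not stated, so that choice is an assumption.

```python
import math


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def knn_predict(train_x, train_y, query, k):
    """Stand-in model: mean target of the k training points nearest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def tune(train_x, train_y, val_x, val_y, candidates):
    """Try each candidate and record training and validation RMSE side by side."""
    results = []
    for k in candidates:
        train_pred = [knn_predict(train_x, train_y, q, k) for q in train_x]
        val_pred = [knn_predict(train_x, train_y, q, k) for q in val_x]
        results.append({
            "k": k,
            "train_rmse": rmse(train_y, train_pred),
            "val_rmse": rmse(val_y, val_pred),
        })
    # Assumption: pick the lowest validation error; the training error is kept
    # so that a large train/validation gap (overfitting) stays visible.
    best = min(results, key=lambda r: r["val_rmse"])
    return best, results


# Hypothetical normalized data.
train_x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
train_y = [1.0, 1.4, 2.1, 2.4, 3.1, 3.4]
val_x = [0.1, 0.5, 0.9]
val_y = [1.2, 2.2, 3.2]
best, results = tune(train_x, train_y, val_x, val_y, candidates=[1, 2, 3, 6])
```

With `k = 1` the stand-in model reproduces the training targets exactly, so its training RMSE is zero; comparing it with the validation RMSE shows why both sets have to be inspected.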

Performance Evaluation
To test the accuracy of the models on the regression outputs, three basic performance measures were used to relate the predicted (Y_pre) and actual (Y_actual) results [102,103]: (1) root mean squared error (RMSE), (2) coefficient of determination (R^2), and (3) Pearson correlation coefficient (R), as given by Equations (2)-(4).
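The three regression measures can be computed as in the sketch below. It assumes the standard textbook definitions, with R^2 taken as one minus the ratio of residual to total sum of squares; the article's own Equations (2)-(4) may be written differently. The sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical actual and predicted strengths.
y_actual = [30.0, 45.0, 60.0, 80.0]
y_pre = [32.0, 44.0, 58.0, 83.0]
scores = {
    "rmse": rmse(y_actual, y_pre),
    "r2": r_squared(y_actual, y_pre),
    "r": pearson_r(y_actual, y_pre),
}
```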
However, the classification data were evaluated using the AUC (area under the ROC curve) and the accuracy of the predictions.
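For the classification output, accuracy and AUC can be sketched as follows. The AUC is computed here through its rank interpretation (the fraction of positive-negative pairs in which the positive sample receives the higher score, ties counting half), which is equivalent to the area under the ROC curve. The labels, scores, and the 0.5 decision threshold are hypothetical.

```python
def accuracy(actual, predicted_labels):
    """Fraction of samples whose predicted class matches the actual class."""
    return sum(a == p for a, p in zip(actual, predicted_labels)) / len(actual)


def roc_auc(actual, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels: 1 = strain hardening, 0 = strain softening.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
predicted_labels = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

acc = accuracy(actual, predicted_labels)
auc = roc_auc(actual, scores)
```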

Anomalous Data
Anomalous data are outliers that can affect the model's accuracy. The data were extracted from published articles, which include hypothetical trials of different mixes. Fourteen such samples were removed; e.g., a mix with 10% fibers was removed as an outlier. In the same way, a reported compressive strength of over 200 MPa was also an outlier and was removed.
Table 2 shows the optimal hyperparameters of the different machine learning models for each output parameter. The input and output parameters are interlinked in different ways. Therefore, for better results, each hyperparameter was determined with a simple loop that searched for the combination with minimal error on both the training and validation sets, while keeping a special check on overfitting. Since some models have a very large number of hyperparameters, only some of them were tuned; the hyperparameters missing from Table 2 were kept at their defaults. XGBoost showed good results using the default hyperparameters without any tuning.
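The outlier screening described in this section can be sketched as a simple filter. The two rules mirror the examples given in the text (a mix with 10% fibers and a compressive strength above 200 MPa); treating them as fixed cut-offs, as well as the field names and sample values, is an assumption made only for illustration.

```python
# Assumed cut-offs derived from the two examples in the text.
MAX_FIBER_CONTENT_PCT = 10.0
MAX_COMPRESSIVE_STRENGTH_MPA = 200.0


def is_anomalous(sample):
    """Flag a sample that matches either of the two example outlier rules."""
    return (
        sample["fiber_content_pct"] >= MAX_FIBER_CONTENT_PCT
        or sample["compressive_strength_mpa"] > MAX_COMPRESSIVE_STRENGTH_MPA
    )


# Hypothetical samples.
samples = [
    {"id": "A", "fiber_content_pct": 2.0, "compressive_strength_mpa": 55.0},
    {"id": "B", "fiber_content_pct": 10.0, "compressive_strength_mpa": 60.0},
    {"id": "C", "fiber_content_pct": 1.5, "compressive_strength_mpa": 215.0},
    {"id": "D", "fiber_content_pct": 0.5, "compressive_strength_mpa": 42.0},
]
cleaned = [s for s in samples if not is_anomalous(s)]
removed_ids = [s["id"] for s in samples if is_anomalous(s)]
```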

Training Process
For the training process, the optimal hyperparameters listed in Table 2 were used to train the machine learning models. Special care was taken to ensure that the models were neither underfitted nor overfitted. After training, the performance of each model was calculated separately for the training and testing datasets using the measures defined in Section 2.3.4. Figure 3 shows the approach employed for training the models.

Predicted Results and Discussions
Based on the trained models described above, the compressive strength, tensile strength, tensile strain, and post-cracking behavior (whether or not strain hardening occurs) can be predicted. Tables 3-5 compare the actual and predicted results for these output parameters. Prediction accuracy was measured in terms of R^2 and R, for which larger values correspond to higher prediction accuracy, and RMSE, for which a lower value indicates higher accuracy. For post-cracking behavior, AUC and accuracy are used; higher values indicate higher prediction accuracy.

Table 3. Comparison of predicted and actual values of mechanical properties (compressive strength and tensile strength).
Table 4. Comparison of predicted and actual values of ductility properties (tensile strain).
Table 5. AUC and confusion matrix for predicting post-cracking response (strain hardening).

The performance of each model is summarized in Table 6 and was evaluated as per Section 2.3.4. The results on the training and testing datasets were compared to guard against underfitting and overfitting. Among all models, XGBoost shows the best accuracy for all the output parameters, followed by ANN, GPR, CART, and SVM. For XGBoost, the training-set RMSE for compressive strength, tensile strength, and ductility is 1.59, 0.2, and 0.163, respectively, while the testing-set RMSE is 2.35, 0.31, and 0.18, respectively. This is more accurate than the previous models [12], for which the training-set RMSE for compressive strength, tensile strength, and ductility was 2.5, 0.36, and 0.25, respectively, and the testing-set RMSE was 6.75, 0.774, and 0.785, respectively. The R^2 of XGBoost for compressive strength, tensile strength, and tensile strain was 0.99, 0.98, and 0.98, respectively, for training and 0.95, 0.95, and 0.97 for testing. XGBoost also classified the fracture behavior of the samples more accurately than the other models: its classification accuracy was 98.5% and 98.4% for training and testing, respectively. The high accuracy of XGBoost is attributed to its iterative architecture, shown in Figure 1c, which creates a better relationship between input and output parameters. The complete working process of the predictive model is shown in Figure 4.
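The comparison with the previous models [12] can be made concrete with a short calculation on the testing-set RMSE values quoted above. The relative reduction is a derived quantity that the article does not report, and it assumes that both studies express RMSE in the same units.

```python
# Testing-set RMSE values as quoted in the text.
xgboost_test_rmse = {
    "compressive_strength": 2.35,
    "tensile_strength": 0.31,
    "ductility": 0.18,
}
previous_test_rmse = {
    "compressive_strength": 6.75,
    "tensile_strength": 0.774,
    "ductility": 0.785,
}

# Relative reduction in RMSE, in percent, for each output.
reduction_pct = {
    name: 100.0 * (previous_test_rmse[name] - xgboost_test_rmse[name]) / previous_test_rmse[name]
    for name in xgboost_test_rmse
}
```

Under that assumption, the testing-set RMSE is lower by roughly 65%, 60%, and 77% for compressive strength, tensile strength, and ductility, respectively.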

Validation of Predictive Models
It is clear from Table 6 that the XGBoost model has the highest accuracy of all the models. Thus, it was used to predict the compressive strength, tensile strength, tensile strain, and post-cracking response of samples that are not included in any dataset (neither the original nor the virtual one). The model was validated against data from published experiments. Samples in which only one parameter varies were checked: two types of samples were tested, one with varying fly ash content and one with varying fiber content. Figure 5 compares the actual and predicted properties of the samples. The results reveal that XGBoost predicts the values accurately and that its classification of the post-cracking behavior is 100% correct.

Conclusions and Recommendations
This research developed a new way to predict the fracture properties (i.e., mechanical properties, ductility, and the post-cracking response) of FRC with the aid of machine learning. Five models were developed to predict four outputs from 15 input parameters of FRC. The performance of each model was evaluated, and the following conclusions were drawn:

• The predictive models are accurate enough to replace the extensive experimental trials required for optimizing FRC according to desired needs.
• These models can be effectively used to classify fracture behavior as strain hardening or strain softening based on selected inputs, as they are well trained for both types of behavior. The XGBoost model shows 98.4% accuracy in segregating the fracture response of fiber-reinforced matrices.
• The proposed models can realistically be used to predict the mechanical properties, ductility, and post-cracking behavior of both traditional and high-performance FRC. Among all models, the XGBoost model shows the best accuracy for all the output parameters. Its training-set RMSE for compressive strength, tensile strength, and ductility is 1.59, 0.2, and 0.163, respectively, while its testing-set RMSE is 2.35, 0.31, and 0.18, respectively. These RMSE values are lower than those of previously implemented models [12].
• GAN was used to successfully produce a virtual dataset of 1000 samples from the original dataset. This virtual dataset further increased the accuracy of the models.
• These models can also be used to optimize the mix so that it is economical and has improved mechanical properties while minimizing environmental impacts (e.g., carbon footprint and reuse of waste products).

Future research is needed to examine other parameters and their dependence on important properties of FRC, e.g., fresh properties, durability properties, the use of other types of cement, or the packing density concept for high strength concrete. More research is also needed on applying these models to other types of special-purpose concretes.

Acknowledgments: The authors would like to give special thanks to Muhammad Riyyan Khan (Khan.riyyan@yahoo.com) and Muhammad Umar Javed (mjaved.bscs18seecs@seecs.edu.pk) for useful discussions on the implementation of the ML models, and would also like to thank Sadia Arshad and Hammad Anis Khan for guidance on writing and formatting the article.

Conflicts of Interest:
The authors declare no conflict of interest.