Prediction of the Compressive Strength of Waste-Based Concretes Using Artificial Neural Network

In the 21st century, numerous numerical calculation techniques have been discovered and used in several fields of science and technology. The purpose of this study was to use an artificial neural network (ANN) to forecast the compressive strength of waste-based concretes. The specimens studied include different kinds of mineral additions: metakaolin, silica fume, fly ash, limestone filler, marble waste, recycled aggregates, and ground granulated blast furnace slag. This method is based on the experimental results available for 1303 different mixtures gathered from 22 bibliographic sources for the ANN learning process. Based on a multilayer feedforward neural network model, the data were arranged and prepared to train and test the model. The model consists of 18 inputs following the type of cement, water content, water to binder ratio, replacement ratio, the quantity of superplasticizer, etc. The ANN model was built and applied with MATLAB software using the neural network module. According to the results forecast by the proposed neural network model, the ANN shows a strong capacity for predicting the compressive strength of concrete and is particularly precise with satisfactory accuracy (R² = 0.9888, MAPE = 2.87%).


Introduction
According to the World Bank, the amount of global solid waste is currently 2.2 billion tons per year. This figure is likely to increase due to global demographic and economic growth. The overconsumption and inefficient use of materials also have a critical impact on the environment and climate. In France, annual cement demand was 16.9 million tons in 2020, and the demand for aggregates in the same year exceeded 400 million tons, 96% of which was of natural origin. Concrete is an ancient and widely used material because of its mechanical properties, which have long been appreciated. The Egyptians were already using it around 2600 BC in the pyramid of Abu Rawash. This material continued to evolve until the invention of reinforced concrete in 1867 by Joseph Monier. Concrete is a multiphase material consisting of a granular skeleton and a cementitious matrix. Due to chemical reactions of hydration followed by hardening, the assembly stiffens. This gives the material specific physicochemical properties: elastic properties, high compressive strength, low tensile strength, permeability, etc. These properties are the result of a series of chemical reactions, triggered as soon as the anhydrous cement and water come into contact. These reactions initially give rise to portlandite (Ca(OH)₂), which acts as a trigger for setting, and then to various hydrates (C-S-H, C-A-H, C-A-S-H, etc.) representing the "glue" of the matrix.
Concrete structures now represent more than 90% of modern structures. Concrete has thus become the most important building material on the planet, in terms of volume and turnover. Its success stems from, among other things, its extraordinary versatility and It is clear that the prediction of concrete properties can be efficiently performed using machine learning technology [40,41].
In the early 2000s, several studies [42,43] showed the great potential of ANNs to optimize mix proportioning and forecast concrete properties. Apostolopoulou et al. [44] investigated the use of ANNs to simulate the characteristics of lime-based mortars, such as compressive and flexural strength and consistency. The final results showed that the developed ANN models fit satisfactorily with the experimental data. Gupta et al. [45] used an ANN in a recent study since there was no mathematical model for the rapid prediction of mechanical properties of rubberized concrete. The network, trained on data compiled from recent research, predicted compressive strength, modulus of elasticity (static and dynamic), and mass loss. Several other studies on this type of concrete led to similar conclusions, while others focused on predicting the properties of recycled aggregate-based concrete [35,46]. The test data are generally sets of compressive strength, splitting strength, porosity, the permeability coefficient of recycled aggregate, etc. Based on mean squared error (MSE), root mean square error (RMSE), and coefficient of regression (R²), the results proved to have a very good fit, as stated by Dantas et al. [47]. It has been demonstrated that ANNs can predict the compressive and tensile strength of concretes containing construction and agricultural wastes [32,48], blast furnace slag [35], and alkali-activated mortars [34,37]. Some authors combined an ANN with other techniques, such as a genetic algorithm (GA) [35], statistics and holistic models [44], the cuckoo search method [49,50], ANFIS models [24,51], fuzzy logic models [52,53], and the Monte Carlo approach [54], to optimize the prediction results. Jiang et al. [53] and Farooq et al. [55] studied the prediction of mechanical properties of self-compacting concretes and high-performance concretes using an ANN on over 1030 datasets.
The excellent findings obtained suggested that machine learning processes are quite robust and efficient, becoming indispensable for concrete property prediction. In addition, Asteris et al. [56] developed a methodology that predicts the effects of seismic loads on masonry structures. The authors were able to take into account the weakness, damage, fragility, and general properties of structures. Ray et al. [39], like Sadowski et al. [38], recently showed that ANN techniques are relevant in predicting properties of waste-based concretes and mineral admixtures such as metakaolin, silica fume, dust-based, filler-based, glass waste-based quartz mineral, and fibers. Bui et al. [57] used a whale optimization algorithm (WOA) coupled with a neural network (NN) with over 400 nodes to simulate the 28-day compressive strength of concrete. The results showed that the WOA-NN is reliable and has the highest correlation of 0.8976 when compared to different techniques of modeling. Other recent studies [58,59] conducted on the prediction of concrete's compressive strength used several methods (Support Vector Regression (SVR), Decision Tree Regression (DTR), Gradient Boosting Regression (GBR), and ANN) for comparative purposes. It was shown that SVR, DTR, and ANN were reliable methods.

Research Significance
More than 10 billion tons of concrete are currently used worldwide, and in the USA, for example, USD 9.4 billion would be needed to restore the country's 600,000 bridges. It is therefore important to emphasize that the emergence of innovative processes and techniques in the formulation and composition of concretes is very much needed. The use of supplementary cementitious material may be one of the best solutions for nature and resource preservation. One of the best-established approaches to reducing the impact of cement on the environment is the replacement of clinker with other materials. This method reduces energy consumption and increases production, without any additional industrial installation [60]. These substitutes are generally reactive byproducts from other industries: granulated blast furnace slag (GBFS), a byproduct of the iron industry; and fly ash (FA), generated by electricity production after the burning of coal. Moreover, natural materials such as calcined clays, pozzolans, and limestone fillers have proved suitable for concrete use. Several theoretical methods exist to predict concrete strength: those of Feret (1897), Abrams (1920), Bolomey (1925) [3,6], etc. Depending on the case, certain preponderant parameters, such as the water/binder ratio (W/L), the substitution rate (p(%)), and the maturation time, may influence the final strength of the concrete. However, these existing formulae are not always adapted to the materials cited above. In this work, ANN modeling and results in predicting concrete properties were investigated. The aim was to develop and set up AI-based tools to predict the properties of concretes containing byproducts reused as supplementary cementitious materials. This research highlights the potential of using an ANN with satisfactory and reliable results in predicting the characteristics of environmentally friendly concretes [38,61]. The novelty of this article relies on the scale of the dataset used and its extensiveness. 
Contrary to several previous studies, this study considered multiple supplementary cementitious materials concomitantly using machine learning.

Artificial Neural Networks
Artificial neural networks (ANNs), which are part of the machine learning process, involve mathematical techniques based on the conception of interconnected layers of nodes [62]. An artificial neural network (ANN) is an artificial intelligence system that focuses on the identification and solving of complex issues and phenomena. A parallel can be drawn with conventional digital computing techniques, yet neural networks have many additional assets. For instance, they use equivalent processing modes and distributed information storage, and also have high accuracy. Furthermore, these methods are very robust when operating following the training process and are flexible to new information and learning [63]. The ANN system is meant to recreate the biological characteristics of the nerve cell structure of the brain.
Usually, an ANN is made up of an input layer of neurons, which includes other layers within it. These neurons predict the process results [10]. The junction of the layers is based on link weights according to Rafiq et al. [64]. As a definition, it can be said that an ANN is a computing system composed of multiple simple units and highly interconnected processing elements. These elements analyze information through the dynamic state response to external inputs. An ANN is skilled in memorizing the characteristics or features of given data and can match or make connections from new data to old with different levels of success [62,65]. The hidden layers (HLs) play the role of connecter or information carrier. The structure then enables the nets to extract a non-linear correlation from the available dataset [24].
There are six main parts in an ANN around a considered neuron nj: inputs (pi), bias (bj), weights (wij), sum function (n)j, activation function (f), and outputs (aj), as displayed in Figure 1. Inputs can be defined as information considered to be decision variables coming from neurons or the external environment. Weights are values that convey the effect of inputs or process elements on each other. Random weight values can be assigned when the process starts. The sum function is an operation that reflects the whole effect of inputs and weights by taking into account a bias value on this process element [13,66] (Equation (1)):

$$ (n)_j = \sum_{i=1}^{k} w_{ij}\, p_i + b_j \qquad (1) $$

where:
• i = [1;k] is the index of the ith input neuron,
• j = [1;m] is the index of the jth output neuron,
• k is the number of units in the input vector,
• bj is the value of the bias (referred to as the activation threshold) associated with the jth node.

The activation function or transfer function (usually the log-sigmoid function or the hyperbolic tangent [24]) is a function that processes the (n)j value and then determines the corresponding output value according to the formula in Equation (2) [16,67]. It also represents a way to simulate a phenomenon's reaction using input and output parameters [68].

$$ (a)_j = f\big((n)_j\big) = \frac{1}{1 + e^{-\alpha (n)_j}} \qquad (2) $$

where (a)j is the output of the jth neuron and α is a constant used to control the slope of the semi-linear region [13], and usually α = 1.
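As an illustration of Equations (1) and (2), the following minimal Python sketch computes the net input and the output of a single neuron. It assumes the log-sigmoid form of the activation with slope constant α; the function name `neuron_output` and the numerical values are hypothetical and are not taken from the dataset.

```python
import math

def neuron_output(inputs, weights, bias, alpha=1.0):
    """Return (n_j, a_j) for one neuron with a log-sigmoid activation."""
    # Equation (1): weighted sum of the inputs plus the bias
    n_j = sum(w * p for w, p in zip(weights, inputs)) + bias
    # Equation (2): log-sigmoid with slope constant alpha (assumed form)
    a_j = 1.0 / (1.0 + math.exp(-alpha * n_j))
    return n_j, a_j

# Hypothetical inputs, weights, and bias, chosen only for illustration
p = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1
n, a = neuron_output(p, w, b)
print(f"net input n_j = {n:.3f}, output a_j = {a:.3f}")
```

With these values the net input is negative, so the output falls below 0.5, the midpoint of the log-sigmoid.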

Neuron Model (Logsig, Tansig, Purelin)
In an ANN, each input is weighted with an appropriate w. The sum of the weighted inputs and the bias forms the input to the transfer function f(n)j. Multilayer networks often use the log-sigmoid transfer function logsig(n)j. The function logsig(n)j generates outputs between 0 and 1 as the neuron's net input goes from negative to positive infinity. Alternatively, multilayer networks can use the tan-sigmoid transfer function tansig(n)j or the linear transfer function purelin. Logsig(n)j appears to be better suited to the current study, as it was found to be more accurate for some predictions [69]. Several algorithms can be implemented in ANN modeling, such as Bayesian regularization, Scaled Conjugate Gradient, Levenberg-Marquardt, one-step secant, and some other combination rules [19]. The most popular ANN method is the feedforward multilayer perceptron (MLP) system. The general scheme of the adopted neural network system is given in Figure 2. The final weight values are obtained at the end of the training process and depend on how well the model was trained.
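The three transfer functions named above can be sketched in a few lines of Python. This is an illustrative re-implementation of their usual mathematical definitions, not the MATLAB toolbox code; the branch in `logsig` is only there to avoid overflow for large negative inputs.

```python
import math

def logsig(n):
    """Log-sigmoid: maps any real n to the interval (0, 1)."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    z = math.exp(n)  # avoids overflow of exp(-n) when n is very negative
    return z / (1.0 + z)

def tansig(n):
    """Tan-sigmoid: maps any real n to (-1, 1); equal to tanh(n)."""
    return math.tanh(n)

def purelin(n):
    """Linear transfer function: returns the net input unchanged."""
    return n

for n in (-5.0, 0.0, 5.0):
    print(n, logsig(n), tansig(n), purelin(n))
```

The bounded output range of logsig and tansig is what makes them suitable for hidden layers, while purelin leaves the output unbounded.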

Training Methods
The neural network models applied in this study were developed using the Neural Network Toolbox in MATLAB software. The models were generated with two hidden layers and 10 neurons per hidden layer (Table 1). Of the total data, 70% was used for the training process; the remaining 30% was split equally, with 15% of the total used for testing and 15% for validation. The training process was operated using the Levenberg-Marquardt backpropagation algorithm (LMBPA), similar to Abu Yaman et al. [20] and Kumar et al. [70]. The LMBPA was chosen due to its simplicity of use. Backpropagation (BP), which distributes the network error to arrive at the best fit or minimum error, has also been shown to be one of the most reliable ANN training algorithms [71,72] and was, accordingly, used in this study.
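The 70/15/15 division described above can be sketched as a random split of sample indices. The helper `split_indices`, the seed, and the rounding rule are assumptions made for illustration; the MATLAB toolbox may round the subset sizes differently.

```python
import random

def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Randomly split sample indices into training, validation, and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed only for reproducibility
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to testing
    return train, val, test

# 1303 mixtures, as in the bibliographic dataset
train_idx, val_idx, test_idx = split_indices(1303)
print(len(train_idx), len(val_idx), len(test_idx))
```

Each mixture ends up in exactly one subset, so the test error is measured on data never seen during training.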

Feedforward Network
A feedforward neural network was used in this study. This seems to be the most commonly used ANN architecture type. In a feedforward network, the neurons are arranged in layers, and every neuron in one layer is connected to the neurons in the next layer. The multilayer architecture considered in this study, also called a multilayer perceptron [70], is given in Figure 3. There is no reliable method for deciding the number of neural units or layers required for a particular problem; the best network configuration is found through experience and trials [9]. The structure using multiple layers of neurons creates nonlinear relationships between input and output vectors. The number of layers determines the complexity of the architecture and the forecast precision. When the training process is completed, a positive weight signifies that the corresponding feature is directly related to the output, whereas a negative weight implies that the corresponding feature is inversely linked to the output. The larger the magnitude of the weight related to a feature, the greater the effect of that feature on the output.
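To make the layered structure concrete, the sketch below runs one forward pass through a network with 18 inputs, two hidden layers of 10 neurons, and one output. The weights are random and untrained, the input vector is a hypothetical mix scaled to [0, 1], and the linear output layer is an assumption; only the layer sizes come from the text.

```python
import math
import random

def logsig(n):
    """Log-sigmoid transfer function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def linear(n):
    """Identity transfer function, assumed here for the output layer."""
    return n

def init_layer(n_in, n_out, rng):
    """Random initial weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases

def forward(inputs, layers):
    """Propagate an input vector through every layer in turn."""
    signal = inputs
    for weights, biases, activation in layers:
        signal = [activation(sum(w * p for w, p in zip(row, signal)) + b)
                  for row, b in zip(weights, biases)]
    return signal

rng = random.Random(0)
sizes = [18, 10, 10, 1]  # inputs, hidden layer 1, hidden layer 2, output
activations = [logsig, logsig, linear]
layers = [init_layer(n_in, n_out, rng) + (act,)
          for n_in, n_out, act in zip(sizes[:-1], sizes[1:], activations)]

mix = [rng.random() for _ in range(18)]  # hypothetical scaled input vector
prediction = forward(mix, layers)
n_parameters = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
print(prediction, n_parameters)
```

For this 18-10-10-1 layout the network has 311 adjustable parameters (weights plus biases), which is the quantity the training algorithm has to tune.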

The Backpropagation Algorithm (BPA)
The term 'backpropagation' indicates a method in which a correction gradient is calculated for nonlinear multilayer networks [73]. This step is an essential part of the network learning process and is performed by the learning algorithm [64]. To assess the performance of the neural network model, an error measure such as the root mean square error (RMSE) can be used [24]. The error value or cost function can be determined and reduced using the so-called generalized delta rule [11]. In fact, the error (which is the gap between forecast and actual values) is reduced using a backpropagation algorithm [9]. During the BPA process, the neuron weights are then adjusted (Figure 4). According to Oztas et al. [18], the BPA is one of the most famous and most widely used training algorithms [11]. In a multilayer perceptron (MLP), this method corresponds to a gradient descent technique that minimizes the error or cost of the process. The LMBPA was used in the present study, as implemented in MATLAB and its neural network fitting module. According to Sobhani et al. [10], the LMBPA is the fastest backpropagation algorithm for many engineering problems and is highly recommended as a first-choice supervised algorithm. However, it requires more memory than other algorithms such as the Momentum, Adagrad, and RMSprop methods.
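The idea behind Levenberg-Marquardt training can be sketched on a toy problem: fitting the weight and bias of a single log-sigmoid neuron. The code below uses the textbook damped Gauss-Newton step, Δw = −(JᵀJ + μI)⁻¹Jᵀe, with a simple accept/reject rule for the damping factor μ. It is not the MATLAB implementation; the data, starting point, and μ schedule are assumptions chosen for illustration.

```python
import math

def logsig(n):
    """Numerically safe log-sigmoid."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    z = math.exp(n)
    return z / (1.0 + z)

def residuals_and_jacobian(w, b, xs, ts):
    """Residuals e = a - t and their derivatives with respect to (w, b)."""
    res, jac = [], []
    for x, t in zip(xs, ts):
        a = logsig(w * x + b)
        slope = a * (1.0 - a)  # derivative of logsig at the net input
        res.append(a - t)
        jac.append((slope * x, slope))
    return res, jac

def sse(res):
    """Sum of squared errors."""
    return sum(e * e for e in res)

def levenberg_marquardt(xs, ts, w=0.0, b=0.0, mu=1e-3, n_iter=50):
    res, jac = residuals_and_jacobian(w, b, xs, ts)
    cost = sse(res)
    for _ in range(n_iter):
        # Damped normal equations: (J^T J + mu*I) * delta = -J^T e
        h11 = sum(jw * jw for jw, _ in jac) + mu
        h12 = sum(jw * jb for jw, jb in jac)
        h22 = sum(jb * jb for _, jb in jac) + mu
        g1 = sum(jw * e for (jw, _), e in zip(jac, res))
        g2 = sum(jb * e for (_, jb), e in zip(jac, res))
        det = h11 * h22 - h12 * h12
        dw = -(h22 * g1 - h12 * g2) / det
        db = -(h11 * g2 - h12 * g1) / det
        new_res, new_jac = residuals_and_jacobian(w + dw, b + db, xs, ts)
        new_cost = sse(new_res)
        if new_cost < cost:
            # Step accepted: keep it and move toward Gauss-Newton
            w, b, res, jac, cost = w + dw, b + db, new_res, new_jac, new_cost
            mu /= 10.0
        else:
            # Step rejected: increase damping (closer to gradient descent)
            mu *= 10.0
    return w, b, cost

# Synthetic, noise-free targets generated with w = 2 and b = -1
xs = [-3.0 + 0.3 * k for k in range(21)]
ts = [logsig(2.0 * x - 1.0) for x in xs]
w_fit, b_fit, final_cost = levenberg_marquardt(xs, ts)
print(w_fit, b_fit, final_cost)
```

The JᵀJ matrix grows with the square of the number of weights, which is consistent with the higher memory requirement mentioned above.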
A cost function (error function) J(w) can be defined to quantify the difference between the experimental (target) and forecast outputs (Equation (3)), where aPREDICT = a(j, PREDICT) is the forecast value and aTARGET = a(j, TARGET) is the experimental value.
Gradient descent is an optimization algorithm that approaches a local minimum of a function by taking steps proportional to the negative of the gradient of the function at the current point. The main objective of the algorithm is then to reduce the cost value and adjust the weight that must be updated very smoothly and slowly by iteration until convergence.
In the gradient descent technique, the adjusted weight can be expressed as (Equation (4)):

new weight = old weight − learning rate × derivative rate

$$ w_{ij}^{\text{new}} = w_{ij}^{\text{old}} - \tau \, \frac{\partial J(w_{ij})}{\partial w_{ij}} \qquad (4) $$

where τ is known as the step-size parameter (learning rate) and affects the rate of convergence of the algorithm, and ∂J(wij)/∂wij is the derivative rate or gradient of the loss function J(wij).
The learning process consists of changing the weights in order to minimize this J(w) in a gradient descent technique. The training process is considered as successfully completed when the iterative process has converged [9].
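The weight update of Equation (4) can be illustrated on the smallest possible case: one linear neuron with a single weight. The sketch assumes a squared-error cost J(w) = ½ Σ (w·p − t)², which is one common choice and not necessarily the exact form of Equation (3); the data and the step size τ are hypothetical.

```python
def cost(w, inputs, targets):
    """Assumed squared-error cost J(w) = 1/2 * sum((w*p - t)^2)."""
    return 0.5 * sum((w * p - t) ** 2 for p, t in zip(inputs, targets))

def gradient(w, inputs, targets):
    """Derivative dJ/dw of the assumed cost."""
    return sum((w * p - t) * p for p, t in zip(inputs, targets))

def gradient_descent(inputs, targets, w=0.0, tau=0.05, n_iter=40):
    """Repeat Equation (4): new weight = old weight - tau * gradient."""
    history = [cost(w, inputs, targets)]
    for _ in range(n_iter):
        w = w - tau * gradient(w, inputs, targets)
        history.append(cost(w, inputs, targets))
    return w, history

# Hypothetical data generated by the relation t = 2 * p
inputs = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]
w_final, cost_history = gradient_descent(inputs, targets)
print(w_final, cost_history[0], cost_history[-1])
```

Here the cost shrinks at every iteration and the weight approaches 2; a step size that is too large would instead make the iterations diverge, which is why τ controls the rate of convergence.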

Modeling Performance Criteria
The accuracy and error quantification of the proposed system were evaluated using performance parameters. The first parameter is the R² coefficient (coefficient of determination), which is the absolute fraction of variance of a variable. It is a measure of the proportion of the information in the data that is explained by the model [62]. The value of R² varies from 0 to 1. The closer R² is to 1, the closer the forecast value is to the experimental one (Equation (5)).
The root mean square error (RMSE) is the square root of the mean square error and indicates the average distance of a data point (targeted) from the expected value (predicted) provided by the model. The lower the RMSE value, the better the model (Equation (6)). For a better understanding, RMSE can be normalized using the mean of the actual value. This can facilitate comparisons between datasets or models [20].
MAPE (Equation (7)) is the mean absolute percentage error and is a statistical measure of prediction accuracy. It expresses the model fit as a percentage; the lower the value, the better the fit. However, MAPE places a heavier penalty on negative errors than positive errors due to the division by the factor aPREDICT.
MAE is the mean absolute error and is given by Equation (8).
In all the formulae above:
• N is the number of experiments,
• aPREDICT = a(j, PREDICT) is the predicted value for the jth neuron,
• aTARGET = a(j, TARGET) is the experimental value for the jth neuron.
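The four criteria can be computed with a few lines of Python. The sketch below uses the conventional textbook definitions, which are assumptions here and may differ in detail from Equations (5) to (8); in particular, MAPE divides by the experimental value, so the denominator should be swapped if the predicted value is intended. The strength values are hypothetical.

```python
import math

def r_squared(targets, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot (conventional form)."""
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot

def rmse(targets, predictions):
    """Root mean square error, in the unit of the targets (e.g. MPa)."""
    n = len(targets)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n)

def mape(targets, predictions):
    """Mean absolute percentage error, relative to the experimental value."""
    n = len(targets)
    return 100.0 / n * sum(abs(t - p) / abs(t) for t, p in zip(targets, predictions))

def mae(targets, predictions):
    """Mean absolute error, in the unit of the targets."""
    n = len(targets)
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / n

# Hypothetical compressive strengths in MPa, for illustration only
a_target = [30.0, 40.0, 50.0]
a_predict = [33.0, 38.0, 50.0]
print(r_squared(a_target, a_predict), rmse(a_target, a_predict),
      mape(a_target, a_predict), mae(a_target, a_predict))
```

RMSE and MAE are expressed in MPa, whereas R² and MAPE are dimensionless, which is why the latter two are easier to compare across datasets.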

Bibliographic Dataset and Data Preparation
The comprehensiveness, structure, and volume of the data used for training are vital to building an effective network. This is what must lead to better learning, testing, and validating for the network and accurate prediction of all aspects of the relationship between inputs and outputs [20].
Experimental datasets from different sources were used. The notation used is given in Table 2. This is an inhomogeneous collection of experimental data from previous research work. The present database was built from the literature and includes a total of 1303 concrete formulations from 22 different studies. We used a large dataset to minimize the lack of data that causes informational uncertainty and to minimize model accuracy problems. Data were assembled from the bibliography and 18 selected inputs were considered: the contents of water, cement, fine and coarse aggregate, admixture, and superplasticizer; the age; the water/binder ratio; the slump; etc. One output was considered, which is compressive strength.
This study takes into account a very wide spectrum of materials and quantities, as shown in Table 3. In Table 4, we also present an excerpt of concrete mixes from the dataset used. Part of the extensive list of formulations used for the ANN training-testing is given in the Supplementary Materials.

Table 3. General characteristics of concrete formulations used in the dataset.

Results and Discussion
The structure of the ANN applied in this study is shown in Figures 5 and 6. The network consists of 18 inputs, two hidden layers, and one output, and was used for 1310 data values.
The results in Figure 7 show the model performance results measured through error minimization techniques. During the learning process, the error drops as the network is continuously trained. The patterns in Figure 7 are respectively training, validation, and testing relative to model error.

• Pattern 1 (blue, Training) describes the training error obtained from 70% of the samples and improves the model's fit by adjusting the network according to its error.
• Pattern 2 (green, Validation) measures the network's generalization ability and indicates when to stop the training process. Pattern 2 represents the ability of the model to predict new data [32] (predictive performance). The training process is halted when the validation error stops decreasing, which inherently avoids over-fitting.
• Pattern 3 (red, Testing) does not affect training and is an independent measure of network performance. This error measured on the test data indicates how well the model generalizes during and after training.
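The stopping rule described for Pattern 2 can be sketched as follows: training halts once the validation error has failed to improve for a given number of consecutive epochs, and the weights from the best epoch are retained. The function name, the `patience` value, and the error sequence are hypothetical; the stopping threshold actually used in the study is not stated in the text.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return (best_epoch, best_error) under a validation-based stopping rule."""
    best_error = float("inf")
    best_epoch = 0
    fails = 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            # Validation error improved: remember this epoch, reset the counter
            best_error, best_epoch, fails = err, epoch, 0
        else:
            fails += 1
            if fails >= patience:
                break  # validation error stopped decreasing: halt training
    return best_epoch, best_error

# Hypothetical validation errors per epoch (epoch numbering starts at 0)
val_errors = [9.0, 6.0, 4.5, 4.0, 4.2, 4.1, 4.3, 4.4]
best_epoch, best_error = early_stopping_epoch(val_errors)
print(best_epoch, best_error)
```

In this example the error reaches its minimum at epoch 3 and training stops three epochs later, before the network starts fitting noise in the training set.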

In Figures 7 and 8, the results show that the gradient begins to stabilize at epoch 6. The values of the coefficient of determination R² are 0.9982, 0.9763, and 0.9566 for training, validation, and testing, respectively. The histogram in Figure 9 shows that the error value decreases from training to testing. This corresponds to an improvement of the model throughout the processing and precision. The high values of R² in Figure 10 mean that the model seems to have sufficient accuracy and is also well trained.
In Figure 10, the network outputs for the training, validation, and test sets are plotted against the target values. For a perfect fit, the data should fall along a 45° line, where the network outputs are equal to the targets. For this problem, the fit is very good for all datasets, with R² values of 0.95 or above in each case.
The performance indicators for compressive strength accuracy are given in Table 5. The RMSE value of 2.91 MPa shows that the gap between predicted and experimental values is small. MAPE shows that the predicted compressive strength deviated on average by 2.87% from the experimental data. This indicates that the differences between forecast and actual results were negligible. All these points indicate that the ANN strength predictive model was able to reproduce the experimental compressive strength results with high accuracy. These results are comparable to those obtained in similar studies [29,51]. Similarly, the determination coefficients reported in [39,78] are between 0.9443 and 0.9836.

Conclusions
This study aimed to use an artificial neural network to predict the compressive strength of waste-based concretes. The methodology, architecture, and learning methods were explained, based on feedforward and backpropagation techniques. A bibliographic dataset compiled from the literature was then used, including a total of 1303 concrete formulations from 22 different studies. The important conclusions that can be drawn from this work are:

• The ANN model can predict compressive strength with high accuracy by learning the deep features of the water-cement ratio, the cement and admixture content, the age of the concrete, etc.
• The results have demonstrated that multilayer feedforward artificial neural networks are practicable methods to forecast compressive strength in concretes.
• Errors of the model calculated from R², MSE, MAPE, and MAE show small gaps between experimental and forecast values.
The above results suggest that the use of an ANN is suitable for concrete compressive strength prediction. A forthcoming study will test the use of Decision Tree Regression (DTR) in the prediction of concrete properties. This machine learning method has been reported to be a very efficient approach.

Data Availability Statement:
The data used in this study are available from the corresponding author on submission of a reasonable request.