Prediction of the Chloride Resistance of Concrete Modified with High Calcium Fly Ash Using Machine Learning

The aim of the study was to generate rules for the prediction of the chloride resistance of concrete modified with high calcium fly ash using machine learning methods. The rapid chloride migration test, according to the Nordtest Method Build 492, was used for determining the chloride ions’ penetration in concrete containing high calcium fly ash (HCFA) as a partial replacement of Portland cement. The results of the performed tests were used as the training set to generate rules describing the relation between material composition and chloride resistance. Multiple methods for rule generation were applied and compared. The rules generated by algorithm J48 from the Weka workbench provided the means for adequate classification of plain concretes and concretes modified with high calcium fly ash as materials of good, acceptable or unacceptable resistance to chloride penetration.


Introduction
The increased use of high calcium fly ash (HCFA) for partial replacement of Portland cement in concrete could result in a number of environmental benefits (reduced consumption of cement clinker, reduced CO2 emissions during cement production, savings of natural resources, reduced landfill space and storage costs). The resources of high calcium fly ash are large, since it is produced as a by-product of power generation in brown coal burning plants. However, this type of ash is usually characterized by low silica content, a high content of free lime and an increased content of sulfur compounds. It could be used in concrete following the requirements of ASTM (American Society for Testing and Materials) C618 Class C, but in Europe, it does not meet the requirements defined in standard EN 450-1. At present, HCFA is not in common use in European countries in spite of positive examples of its suitability provided by Greek and Turkish researchers. It was shown [1] that in the case of cement replacement with HCFA, the compressive strength of concrete was increased if the content of active silica in the fly ash was higher than that in the cement. Similar results were obtained earlier by Naik et al. [2]: partial replacement of cement by fine-grained HCFA resulted in the same or better compressive strength of concrete; the results for drying shrinkage were also positive. The optimization of fineness coupled with the adjustment of water content were found to be the key parameters for the effective utilization of high calcium fly ashes for strength maximization of cement mortars [3]. The application of HCFA as a partial cement replacement in mortar beams stimulated self-healing of cracks and particularly of microcracks [4]. It was also found that concrete specimens incorporating HCFA exposed to long-term chloride ponding experiments exhibited significantly lower total chloride content at all depths from the surface [5].
The key factors for the adequate performance of HCFA in concrete seem to be both the composition and the gradation of fly ash.
The assessment of concrete resistance to chloride ingress is fundamental for the durability of reinforced concrete structures exposed to deicing salt and the marine environment [6]. Numerous papers on chloride penetration resistance of concrete modified with standard siliceous fly ash were recently reviewed in [7]. The addition of fly ash is generally found (and confirmed in [8]) to reduce chloride permeability and also to increase the chloride binding capacity of concrete. Despite lower chloride threshold values, the addition of fly ash was found to provide better corrosion protection to steel reinforcements. There is a need to extend such a study to include high calcium fly ash. For rational use of HCFA in structural concrete, there is also a need to propose tools for the prediction of the chloride penetration resistance of concrete.
The prediction of the engineering properties of composite materials is usually based on experimental test results with reference to the observed material microstructure. The relevant material characteristics can be extracted from an experimental dataset using various artificial intelligence methods, developed over the last two decades for various engineering applications [9,10]. Artificial neural networks were successfully applied for the prediction of the compressive strength of concrete containing silica fume [11] or coal ash [12]. Moreover, the application of neural networks and optimization technologies created the possibility to search for the optimum mixture of concrete: the mixture with the lowest cost and the required performance, such as strength and slump [13]. Machine learning methods were also tested on the classification of concrete modified by fluidized bed fly ash as materials of adequate resistance to chloride penetration [14] and resistance to surface scaling [15]. The application of machine learning for the prediction of the scaling resistance of concrete modified with high calcium fly ash is described in [16]. The authors of [17,18] proposed to combine artificial neural networks and machine learning methods in one system to estimate and predict various properties of concrete materials.
The aim of this study is to generate rules using a machine learning algorithm to evaluate the chloride resistance of concrete modified with high calcium fly ash. The rules are generated using selected attributes from a database created by storing the experimental results of the chloride migration coefficient determined for three concrete series.

Composition of Concrete Mixes and Test Results of the Chloride Migration Coefficient
The chloride migration coefficient in concrete specimens with different contents of high calcium fly ash was experimentally measured. Concrete mixes were prepared with high calcium fly ash used for replacement of 15% or 30% of the cement mass. Experimental tests were performed on several mixes. For concrete manufacturing, two types of Portland cement, CEM I 42.5R (with 10% C3A content) or CEM I 42.5 HSR NA (with 2% C3A content), siliceous sand of fraction 0-2 mm and amphibolite as a coarse aggregate (two fractions, 2-8 mm and 8-16 mm) were used. The following admixtures were used: a high range water reducer (based on polycarboxylate ethers) and a plasticizer (lignosulfonate). Because of the expected variability of ash properties, three lots of high calcium fly ash from different deliveries from the power plant were tested, namely S1 (16 March 2010), S2 (19 May 2010) and S3 (28 June 2010). The chemical composition of HCFA is given in Table 1. For HCFA beneficiation, grinding in a ball mill for 10-28 min was applied. The physical properties of ash before and after grinding are given in Table 2 [19]. HCFA was used as an additive to the concrete mix in unprocessed form (as collected) and after grinding.
The Nordtest Method Build 492 (Non-Steady State Migration Test) [20] was used to determine the chloride migration coefficient. The principle of the test is to apply an external electrical potential across the concrete specimen, forcing chloride ions to migrate into the concrete. The specimens are then split open and sprayed with silver nitrate solution, which reacts to give white insoluble silver chloride on contact with chloride ions. This makes it possible to measure the depth to which chlorides have penetrated the sample. The non-steady-state migration coefficient, D_nssm, is determined on the basis of Fick's second law. This coefficient is dependent on the voltage magnitude, the temperature of the anolyte measured at the beginning and the end of the test and the depth of chloride ions' penetration. The criteria for evaluating the resistance of concrete against chloride penetration proposed by L. Tang [21] are shown in Table 3.
Table 1. The chemical composition of high calcium fly ashes from Bełchatów power plant in Poland, determined using the XRF (X-ray fluorescence) method. Fly ash sampling date and batch designation [19].
Table 2. Physical properties of high calcium fly ashes before and after processing [19].
Table 3. Criteria for the classification of the concrete resistance to chloride ions' penetration [21].

Chloride Migration Coefficient D_nssm    Resistance to Chloride Penetration
<2 × 10⁻¹² m²/s                          Very good
2-8 × 10⁻¹² m²/s                         Good
8-16 × 10⁻¹² m²/s                        Acceptable
>16 × 10⁻¹² m²/s                         Unacceptable

Experimental tests revealed a decrease of the chloride migration coefficient with the increase in the HCFA amount added to the mix. The most significant reduction of D_nssm, by 36%-75% and 54%-89% after 28 and 90 days of curing, respectively, was obtained when using ground HCFA to substitute 30% of binder mass. With such a reduction of D_nssm, the level of chloride resistance changed from acceptable to good or from unacceptable to acceptable [22]. For a few mixes prepared with a water-to-binder ratio of 0.60, the change of D_nssm did not improve the level of chloride penetration resistance. Sieving through a 0.125-mm mesh size sieve was found to improve HCFA performance: it significantly reduced the value of D_nssm, which was most evident after 90 days of curing. No clear relationship could be found between D_nssm and the water-to-binder ratio or the compressive strength of concrete.
The resistance against chloride ingress of concrete containing low calcium fly ash was previously tested by Baert et al. [23], and at 28 days, the chloride migration coefficient increased with increasing fly ash content. However, at later ages (3, 6 or 12 months), due to the pozzolanic reaction, the D_nssm coefficient was lower for all concrete mixes with siliceous fly ash. The effects of blast furnace slag on the chloride migration coefficient summarized by Gjorv [6] were clearly favorable, even at the age of 14 days. After 28 days of water curing, increasing amounts of slag up to 80% replacement reduced the apparent chloride diffusion coefficient from 11 × 10⁻¹² down to 2 × 10⁻¹² m²/s. The comparison with the obtained results on HCFA in concrete reveals an efficiency almost comparable to that of blast furnace slag. This could be attributed to both the pozzolanic and the hydraulic activity of HCFA. The hydraulic properties of these fly ashes should be related to reactive aluminate phases and their hydration and also to the formation of ettringite in the initial phase of hydration [24]. A high hydraulic and pozzolanic activity index after a prolonged hydration and hardening process is connected with hydraulic phases, mainly belite and gehlenite, as well as with the reactivity of the glassy phase. The complexity of the phenomena involved in chloride ion penetration in concrete containing such a mineral addition of pozzolanic and hydraulic activity justifies the application of machine learning techniques to reveal the possible governing rules.
In Table 4, the database containing data on the composition of the concrete mixes, the specific surface of fly ash obtained by the Blaine method and the chloride migration coefficient determined after 28 days of curing is presented. The estimation of the concrete resistance to chloride penetration, based on the values of the diffusion coefficients according to the criterion presented in Table 3, is placed in the last column of Table 4.
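The assignment of the class attribute from a measured migration coefficient can be sketched as a simple threshold lookup based on Table 3. The function name is hypothetical, and the treatment of the shared boundary values (8 and 16 × 10⁻¹² m²/s are assigned to the better class here) is an assumption, since Table 3 does not specify it.

```python
def chloride_resistance_class(d_nssm: float) -> str:
    """Map D_nssm, given in units of 1e-12 m^2/s, to a Table 3 class.

    Boundary handling (<= 8 -> good, <= 16 -> acceptable) is an assumption.
    """
    if d_nssm < 0:
        raise ValueError("D_nssm must be non-negative")
    if d_nssm < 2.0:
        return "very good"
    if d_nssm <= 8.0:
        return "good"
    if d_nssm <= 16.0:
        return "acceptable"
    return "unacceptable"


# Hypothetical coefficients, not values taken from Table 4.
for value in (1.5, 5.0, 12.0, 20.0):
    print(value, chloride_resistance_class(value))
```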
The permeability of concrete is known to depend largely on the water-to-cement ratio (w/c). However, the definition of w/c is not unambiguous when using supplementary cementitious materials. Following the EN 206 standard, the effect of active mineral additions on w/c is quantified using the k-efficiency factor: the content of the additive (a) is multiplied by a k-value, and the water-to-cement ratio (w/c) is replaced by (w/c)_eq = w/(c + k·a). The efficiency k factor approach is adequate to address the mix design for compressive strength when using additives of established efficiency. Even in such a case, as for siliceous fly ash, the efficiency factors are not the same for durability performance and for compressive strength [25]. The compiled fly ash efficiency data [6,26] revealed a much higher efficiency coefficient k in relation to the compressive strength than the value given in EN 206, even reaching the value of two in relation to the resistance to chloride attack. For nonstandard fly ashes and coal combustion products from so-called clean coal technology, the efficiency factors are not established [27]. Therefore, it is not possible to describe all of the effects of the nonstandard fly ashes, including HCFA, on concrete performance when exposed to various environmental factors with only one efficiency coefficient. In order to avoid an ambiguous (w/c) definition, the content of water in the mix is used as a descriptor in the machine learning database. The database presented in Table 4 is a general database, which can be transformed into a "working database" by column selection.
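The sensitivity of the equivalent ratio to the chosen k-value can be illustrated with a short sketch of the formula (w/c)_eq = w/(c + k·a). The mix quantities and the k-values below are hypothetical and serve only as an example; they are not data from Table 4.

```python
def equivalent_wc(water: float, cement: float, addition: float, k: float) -> float:
    """Equivalent water-to-cement ratio, (w/c)_eq = w / (c + k * a)."""
    binder = cement + k * addition
    if binder <= 0:
        raise ValueError("c + k * a must be positive")
    return water / binder


# Hypothetical mix: 160 kg/m^3 water, 280 kg/m^3 cement, 70 kg/m^3 fly ash.
for k in (0.0, 0.4, 1.0, 2.0):
    print(f"k = {k:.1f}: (w/c)_eq = {equivalent_wc(160.0, 280.0, 70.0, k):.3f}")
```

The same mix yields noticeably different equivalent ratios depending on k, which is why a single, unestablished k-value makes a poor descriptor for nonstandard fly ashes.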

Introduction to Machine Learning
Determining the relationship between material composition and the chloride resistance of concrete is a difficult and time-consuming process, even in the case of a small dataset, such as that presented in Table 4. For the considered dataset, it requires simultaneous analysis of 12 attributes (columns) for over 50 examples (rows). This task can be done manually; however, using a computer system to support data exploration is much more efficient. The branch of artificial intelligence concerned with algorithms that let computers learn patterns from empirical data is called machine learning.
The aim of machine learning is to automatically learn to recognize complex patterns and make intelligent decisions based on the dataset. By a dataset, we mean a collection of logically-related records: a database. Each record can be called an instance or example, and each one is characterized by the values of predetermined attributes. The difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (training data). Hence, the learner must generalize from the given examples, so as to be able to produce a useful output in new cases.
Pattern recognition, usually associated with classification, is the most popular example of utilizing machine learning. However, machine learning or, more generally, statistical algorithms can support knowledge discovery at different stages, from outlier detection and attribute (feature) selection to knowledge modeling and model validation.

Feature Selection
Feature selection, also known as attribute selection or feature reduction, is the technique of selecting a subset of relevant features for building robust learning models. By removing the most irrelevant and redundant attributes from the data, feature selection helps improve the performance of learning models by speeding up the learning process and alleviating the effect of the curse of dimensionality. Moreover, irrelevant attributes degrade the performance of state-of-the-art decision tree and rule learners [28].
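One common filter criterion for ranking attributes is information gain, i.e., the reduction of class entropy obtained by splitting the data on a single attribute. The sketch below illustrates this criterion only; the text does not state that it is the selection method applied to Table 4, and the attribute names and records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy reduction obtained by splitting rows on a nominal attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def rank_attributes(rows, attrs, target):
    """Attributes sorted from most to least informative."""
    return sorted(attrs, key=lambda a: information_gain(rows, a, target), reverse=True)


# Hypothetical records: "hcfa" separates the classes, "cement" does not.
rows = [
    {"hcfa": "yes", "cement": "A", "cls": "good"},
    {"hcfa": "yes", "cement": "B", "cls": "good"},
    {"hcfa": "no", "cement": "A", "cls": "acceptable"},
    {"hcfa": "no", "cement": "B", "cls": "acceptable"},
]
print(rank_attributes(rows, ["cement", "hcfa"], "cls"))
```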

Classification
As was written earlier in Section 3.1, classification is the most common type of machine learning application. The goal of the classification process is to find a way of classifying unseen examples based on the knowledge extracted from the provided set of classified instances. Extracting the knowledge from the provided dataset requires the attribute set characterizing the example to be divided into two groups: the class attribute and the non-class attributes. For unseen instances, only non-class attributes are known; hence, the aim of data mining algorithms is to create such a knowledge model that allows predicting the example class membership based only on non-class attributes.
The knowledge model depends on the way the classifier is constructed, and it can be represented by classification rules (e.g., algorithm AQ21 [29]), decision trees (e.g., algorithm C4.5 [30]) or many other representations. Regardless of the representation, both classification rule and decision tree algorithms create hypotheses.
In the considered problem, we search for the dependence of the chloride resistance of concrete (class attribute) on the material composition and some properties of the concrete (non-class attributes). We concentrated on the most popular representative of decision tree classifiers: the J48 algorithm, the open-source implementation of the last publicly-available version of the C4.5 method developed by J. Ross Quinlan [30]. This algorithm was compared to selected algorithms available in Weka [28] in Section 4.2.
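A central step of decision tree induction for numeric attributes, such as the water content, is the search for a threshold that best separates the classes. The following simplified sketch evaluates candidate thresholds by information gain. It is not the J48 implementation: taking candidates midway between consecutive distinct values is a textbook simplification, and the data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) of the best binary split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best[1]:
            best = ((low + high) / 2, gain)
    return best


# Hypothetical water contents (L/m^3) and resistance classes.
water = [150, 155, 158, 165, 170, 175]
classes = ["good", "good", "good", "unacceptable", "unacceptable", "unacceptable"]
print(best_threshold(water, classes))
```

A full tree learner applies such a search to every attribute, selects the best split and then recurses on the resulting subsets.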

Classifier Evaluation
To evaluate the classifier, i.e., to judge the hypotheses generated from the provided training set, we have to verify the classifier performance on an independent dataset, which is called the testing set. The classifier predicts the class of each instance from the test set; if it is correct, it is counted as a success; if not, it is an error. The measure of the overall performance of the classifier is the classification accuracy. This is the number of correct classifications of the instances from the test set divided by the total number of these instances, expressed as a percentage. The greater the classification accuracy, the better the classifier.
In order to get a deeper understanding of which types of errors are the most frequent, the result obtained from a test set is often displayed as a two-dimensional confusion matrix with a row and a column for each class. Each matrix element shows the number of test examples for which the actual class is the row and the predicted class is the column. Good results correspond to large numbers down the main diagonal and small, ideally zero, values for the elements off the diagonal. The sum of the numbers down the main diagonal divided by the total number of test examples determines the classification accuracy.
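The construction of a confusion matrix and the derivation of the classification accuracy from its diagonal can be sketched as follows. The actual and predicted labels are hypothetical and do not reproduce the results of the study.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Classification accuracy in percent: diagonal sum over total count."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Hypothetical test-set labels.
classes = ["good", "acceptable", "unacceptable"]
actual = ["good", "good", "acceptable", "acceptable", "unacceptable", "unacceptable"]
predicted = ["good", "acceptable", "acceptable", "acceptable", "unacceptable", "good"]

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(f"{name:>12}: {row}")
print(f"accuracy = {accuracy(matrix):.1f}%")
```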
Consider what can be done when the amount of data for training and testing is limited. The simplest way to handle this situation is to reserve a certain number of examples for testing and to use the remainder for training. Of course, the selection should be done randomly. The main disadvantage of this simple method is that the random selection may not be representative. A more general way to mitigate any bias caused by the particular sample chosen for holdout is to repeat the whole process, training and testing, several times with different random samples. The random selection repeated many times can be treated as the basis of a statistical technique called cross-validation. In k-fold cross-validation, the dataset U is split into k approximately equal portions (U = E_1 ∪ ... ∪ E_k) [31]. In each iteration i, the set E_i is used for testing, and the remainder U \ E_i is used for training. The overall classification accuracy is calculated as the average of the classification accuracies obtained in the individual iterations.
When we have only one database consisting of a very small number of records, the estimation of classification accuracy (the measure of the overall performance of the classifier) can be done using n-fold cross-validation, where n is the number of examples in the database. In this method, called leave-one-out cross-validation, each example in turn is left out, and the learning method is trained on all of the remaining examples. It is judged by its correctness on the left-out example, one or zero for success or failure, respectively. The results from n judgments, one for each member of the database, are averaged, and that average represents the classification accuracy [28].
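The splitting of a dataset into k folds, with each fold serving once as the testing set, can be sketched as follows. The function names and the fixed random seed are illustrative assumptions; only record indices are handled, so any classifier can be plugged in.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_splits(n, k, seed=0):
    """Yield (train, test) index lists; fold i is the testing set in iteration i."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Example: 56 records split into 10 folds of 5 or 6 records each.
for train, test in cross_val_splits(56, 10):
    print(len(train), len(test))
```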
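Leave-one-out cross-validation can be sketched generically for any learner given as a pair of fit and predict functions. A one-nearest-neighbour rule on a single numeric attribute is used below purely as a stand-in learner; it is not the J48 algorithm, and the data are hypothetical.

```python
def loo_accuracy(X, y, fit, predict):
    """Leave-one-out accuracy in percent for a learner given as fit/predict."""
    hits = 0
    for i in range(len(X)):
        # Train on all examples except the i-th one.
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        # Score 1 for a correct prediction of the left-out example, else 0.
        hits += int(predict(model, X[i]) == y[i])
    return 100.0 * hits / len(X)


def fit_1nn(X, y):
    """Stand-in learner: simply memorise the training examples."""
    return list(zip(X, y))


def predict_1nn(model, x):
    """Return the class of the closest memorised example."""
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


# Hypothetical water contents (L/m^3) and resistance classes.
X = [150, 152, 154, 170, 172, 174]
y = ["good", "good", "good", "unacceptable", "unacceptable", "unacceptable"]
print(loo_accuracy(X, y, fit_1nn, predict_1nn))
```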

Feature Selection
In Table 4, the dataset with 12 attributes is presented. It is clear that for a database with a few dozen instances, this number of attributes is too large. Some attributes can be eliminated, but it is important to eliminate the most irrelevant ones.

Classification
As was mentioned in Section 3.3, the dependence of the chloride resistance of concrete on material composition can be searched for using one of many software suites available on the market, and we decided to utilize the Weka workbench. The Weka workbench provides over one hundred algorithms supporting classification. They belong to different types, such as Bayesian classifiers, rule classifiers, tree classifiers or meta classifiers. In our research, we decided to determine the chloride resistance of concrete using 20 selected algorithms belonging to three different types. As a training set, all of the instances from the database (Table 6) were considered. The classification accuracy was evaluated using leave-one-out cross-validation. The obtained results are collected in Table 7. The best accuracy, almost 90%, was obtained using the J48 algorithm. The decision tree generated by the J48 algorithm is presented in Figure 1, where the first number in brackets denotes the number of examples from the training set covered by a selected leaf, and the second number, just after the sign "/", indicates the number of incorrectly-classified instances (negative examples). The obtained decision tree can be easily transformed into decision rules, for which p denotes the number of positive examples covered by the rule (i.e., the number of records from this class satisfying the rule) and n denotes the number of negative examples covered by the rule (i.e., the number of records from the other classes satisfying the rule).
The obtained decision rules determine the conditions concretes have to fulfill to provide appropriate resistance against chloride penetration.
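The counts p and n attached to a rule can be computed by checking the rule condition against every record. In the sketch below, the rule condition, the attribute names and the records are hypothetical and only illustrate the counting; they are not the rules or data of the study.

```python
def rule_coverage(rows, condition, rule_class, target="cls"):
    """Return (p, n): covered records of the rule's class and of other classes."""
    covered = [row for row in rows if condition(row)]
    p = sum(1 for row in covered if row[target] == rule_class)
    return p, len(covered) - p


# Hypothetical records: water content (L/m^3), HCFA share (% of cement mass).
rows = [
    {"water": 150, "hcfa": 15, "cls": "good"},
    {"water": 155, "hcfa": 30, "cls": "good"},
    {"water": 158, "hcfa": 15, "cls": "acceptable"},
    {"water": 150, "hcfa": 0, "cls": "acceptable"},
    {"water": 170, "hcfa": 30, "cls": "unacceptable"},
]


# Hypothetical rule: IF water <= 158 AND hcfa > 0 THEN class = good.
def low_water_with_hcfa(row):
    return row["water"] <= 158 and row["hcfa"] > 0


p, n = rule_coverage(rows, low_water_with_hcfa, "good")
print(f"p = {p}, n = {n}")
```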
Using the leave-one-out method (n = 56), we obtained a classification accuracy equal to 89.3%. The confusion matrix for the solved problem is presented in Table 8. Such a result can be considered satisfactory with respect to the limited number of records in the database.

Conclusions
The rules generated by algorithm J48 from the Weka workbench provided a means for the adequate classification of plain concretes and concretes modified with high calcium fly ash as materials of good, acceptable and unacceptable resistance to chloride penetration.
According to the generated rules, it is found that if the content of water in mixes is small enough (in the investigated concretes, w ≤ 158 L/m³), then concretes modified with high calcium fly ash are qualified as materials of good resistance to chloride penetration, whereas concretes without high calcium fly ash are qualified as materials of acceptable resistance. For greater content of water (w > 158 L/m³), concretes using cement of low C3A content, with or without high calcium fly ash, are characterized by unacceptable resistance to chloride penetration. However, when using cement of high C3A content, the replacement of 15% or 30% of cement mass by high calcium fly ash, particularly by ground fly ash, improves the resistance of concretes to chloride penetration.
It is found that both the specific surface of fly ash and the content of water and cement play a significant role in providing the required concrete resistance. The classifier was evaluated using the leave-one-out method. The obtained classification accuracy was equal to 89.3%. This value seems to be sufficient to acknowledge the correctness of the classifier. Due to a small number of tested specimens, the rules are applicable only to concrete mix compositions of similar binder content. Further tests are needed in order to enlarge the experimental database and to cover a broader range of concrete compositions.