Enhancing the Analysis of Rheological Behavior in Clinker-Aided Cementitious Systems Through Large Language Model-Based Synthetic Data Generation

Eser, Murat; Kaya, Yahya; Mardani, Ali; Bilgin, Metin; Bozdemir, Mehmet

doi:10.3390/ma18153579

Open AccessArticle

Enhancing the Analysis of Rheological Behavior in Clinker-Aided Cementitious Systems Through Large Language Model-Based Synthetic Data Generation

by

Murat Eser

¹

,

Yahya Kaya

²,

Ali Mardani

²

,

Metin Bilgin

^1,*

and

Mehmet Bozdemir

¹

Department of Computer Engineering, Bursa Uludag University, Bursa 16059, Turkey

²

Department of Civil Engineering, Bursa Uludag University, Bursa 16059, Turkey

^*

Author to whom correspondence should be addressed.

Materials 2025, 18(15), 3579; https://doi.org/10.3390/ma18153579

Submission received: 27 May 2025 / Revised: 11 July 2025 / Accepted: 15 July 2025 / Published: 30 July 2025

Download

Browse Figures

Versions Notes

Abstract

This study investigates the parameters influencing the compatibility between cement and polycarboxylate ether (PCE) admixtures in cements produced with various types and dosages of grinding aids (GAs). A total of 29 cement types (including a control) were prepared using seven different GAs at four dosage levels, and 87 paste mixtures were produced with three PCE dosages. Rheological behavior was evaluated via the Herschel–Bulkley model, focusing on dynamic yield stress (DYS) and viscosity. The data were modeled using CNN, Random Forest (RF), and Neural Classification and Regression Tree (NCART), and each model was enhanced with synthetic data generated by Large Language Models (LLMs), resulting in CNN-LLM, RF-LLM, and NCART-LLM variants. All six variants were evaluated using R-squared, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Logcosh. This study is among the first to use LLMs for synthetic data augmentation. It augmented the experimental dataset synthetically and analyzed the effects on the study results. Among the baseline methods, NCART achieved the best performance for both viscosity (MAE = 1.04, RMSE = 1.33, R² = 0.84, Logcosh = 0.57) and DYS (MAE = 8.73, RMSE = 11.50, R² = 0.77, Logcosh = 8.09). Among baseline models, NCART performed best, while LLM augmentation significantly improved all models’ predictive accuracy. It was also observed that cements produced with GA exhibited higher DYS and viscosity than the control, likely due to finer particle size distribution. Overall, the study highlights the potential of LLM-based synthetic augmentation in modeling cement admixture compatibility.

Keywords:

rheological properties; cement admixture compatibility; grinding aids; artificial intelligence; supervised learning; large language models

1. Introduction

Greenhouse gas emissions, air pollution, and climate change are among the leading causes of global problems. In addressing these issues, developing alternatives to products that produce carbon dioxide (CO₂) emissions during production, as well as renewable energy sources and energy efficiency, have become increasingly important [1,2].

Cement is the building block of the modern world, but the large amount of energy and natural resources used in its production pose a serious burden on our planet. More than 2% of the electricity used worldwide and 5% of the electricity used in industry is spent on cement production [3]. This significantly increases the amount of carbon dioxide in the atmosphere, accelerating climate change. In fact, each ton of cement production causes approximately one ton of CO₂ emissions [2,4,5]. This shows that the cement industry accounts for a significant portion of global carbon emissions. Therefore, urgent steps must be taken to develop more sustainable production methods and reduce the environmental impact of cement production.

Approximately 35% of the energy consumed in cement production is spent in the clinker grinding stage. Furthermore, a significant portion of the energy expended in the grinding process is wasted as heat, sound, and vibration. As a result of studies conducted to reduce both energy consumption and costs, as well as to decrease the amount of CO₂ released into the environment, the use of grinding aids (GAs) has come to the fore [6,7].

GAs are organic or inorganic compounds used to reduce energy consumption and improve cement properties in the cement production process. These substances adsorb on the surface of cement particles thanks to the highly polar functional groups they contain, such as -OH, -NH₂, and -COOR. Adsorption occurs as a result of interactions such as neutralization of surface charges, van der Waals forces, and hydrogen bonds. In this way, friction between particles is reduced, grinding energy is reduced, and cement particles exhibit a more homogeneous distribution. The use of GA can accelerate hydration reactions by reducing the surface energy of cement and improve cement–admixture compatibility [2,8,9]. It can also affect properties such as fluidity, setting time, and strength of cement. When choosing GA, both grinding efficiency and environmental effects should be considered [10,11]. The use of GA is important in terms of environmental sustainability.

Plank et al. [12] emphasized that GAs and water-reducing admixtures (PCE) have a similar mechanism in terms of adsorption onto the cement grain. Therefore, PCE adhesion to the cement grain may be negatively affected in cementitious systems prepared with cements produced using GA. It has been stated that in case of binder–admixture incompatibility, negative effects such as rapid or slow setting [13], segregation [14], loss of consistency, and increased shrinkage [15] may occur in cementitious systems. Some of these negative effects may be due to binder properties, while some may be due to additive properties. Since the PCEs used affect the cement properties, the cement–PCE compatibility of the cements in which PCEs are used during production should be examined comprehensively. Depending on the type, dosage, and chemical content of the GA used, its effects on both cement surface properties and hydration reactions vary [16]. Some studies examining the compatibility of cements produced with GA with PCE are summarized below.

Sun et al. [17] investigated the compatibility of cements produced using glycerine type GA at 0.01, 0.02, 0.03, and 0.04 ratios with PCE. It was observed that the flow performance of PCE is higher when the GA dosage is less than 0.02%. It was determined that a lower percentage of PCE was adsorbed on cements containing more than 0.02% GA. In another study, the effect of the use of GA on the surface energy of the cement grain was investigated by Prziwara and Kwade [11]. It was stated that GA adsorbed onto the cement grain, causing a decrease in surface energy. On the other hand, the cements produced using GA may physically have higher surface energy compared to the grains without GA due to the smaller particles. This situation revealed that there are different opinions about the adsorption of PCE to cements containing GA. It was stated that PCE adsorption was seriously affected by the high positive charge density of the cement surface [18,19]. Sun et al. [17] emphasized that the cement surface charge density can vary depending on the type of GA and its usage rate. Accordingly, the degree of adsorption of PCE to cement varies in cements containing GA. As can be understood from the literature, the PCE compatibility of cemented systems should be examined in the presence of GA.

Mardani-Aghabaglou et al. [13] reported that determining the rheological parameters of cementitious systems is one of the most reliable methods for examining cement-admixture compatibility. It was emphasized that many experiments should be performed to determine the parameters in question [20]. Many modeling and regression studies were carried out to facilitate this very laborious process [21]. Artificial neural networks and regression models are widely applied for this purpose [21]. In their study, Venkata Rao et al. [22] used different approaches to predict the shear behavior of glass fiber reinforced concrete slabs. They adopted artificial neural network modeling in their study and used Mean Absolute Error and Root Mean Square Error criteria. Revathi et al. [23] conducted modeling studies to alleviate the experimental load of geopolymer mixtures for sustainable production. In this direction, they used machine learning techniques effectively in mixture design and interpretation. In another study, Nagaraju et al. [24] estimated the compressive strength of concrete mixtures with the help of linear regression and Bayesian optimization. The compressive strength of concrete was determined by multivariate adaptive regression splines (MARS), artificial colony, and artificial neural networks (ANNs) by Cheng and Cao [25]. In another study, the compressive strength of concrete was modeled by multiple summation regression trees (MARTs) and ANN methods [26]. Kaveh et al. [27] modeled the compressive strength and fresh state properties of cementitious systems using the M5 Tree and MARS methods. Asteris and Mokos [28] modeled the compressive strength and elasticity modulus of cementitious systems with MARS, M5 Tree model, and ANN, and compared the results. Compressive strength of cementitious systems [28,29,30,31] flexural strength [32,33], V-funnel flow time [31], L-box [27], and durability performance [34,35] were modeled using many methods. It has been understood from the literature that many studies were conducted to model the strength and durability properties of cementitious systems. However, a limited number of studies were carried out on the modeling of rheological parameters. Mardani-Aghabaglou et al. [36] investigated the compatibility of different types of cements with water-reducing admixture by modeling them with ANN, MARS, and CRA methods. No study was found in which the rheological parameters of cementitious systems containing GA were modeled. It has been determined that there is a limited number of studies in the literature examining and modeling the cement–water reducing admixture compatibility of cements produced with GA. In this study, the rheological properties, such as dynamic yield shear stress and final viscosity of cement paste mixtures containing GA, were modeled, and the effective parameters were determined according to their order of action. For this purpose, Neural Classification and Regression Tree (NCART), Random Forest (RF), and Convolutional Neural Network (CNN) methods were applied. In particular, the ncart method has been recently introduced to the literature and stands out due to its high performance in cases where the dataset is limited and because it has not been used before in the estimation of DYS and viscosity values. In addition, synthetic data was generated using Large Language Models to improve model performance, and the effects on the models were examined. Mean Absolute Error (MAE), Root Mean Square Error (RMSE), R-squared, and Logcosh metrics were used to measure the performance of the models.

2. Experimental Study

In the modeling study, data obtained from the authors’ previous experimental study [3,37] were used. The properties and test methodology of the materials used in the study are given below.

2.1. Materials

The cements used in the study were obtained by grinding 4.8 kg of clinker and 0.2 kg of gypsum (gypsum at s rate of 4% of the total ground mass) in a laboratory-type mill. In addition to 5 different GAs that are widely used in the industry, 2 modified GAs—obtained as a result of the modification of the TEA additive through the esterification reaction— were also included, resulting in a total of 7 GAs used. The GAs were used in four different ratios, 0.025%, 0.05%, 0.075%, and 0.1% of the total weight of clinker and gypsum during the clinker grinding phase. Thus, in addition to the control cement that does not contain GA, a total of 29 different CEM I 42.5R type cements in compliance with the EN 197-1 Standard [38] were produced. All cements were ground to a target Blaine fineness of 4100 ± 100 cm²/g. The Blaine fineness value and particle size distributions of the cements used are shown in Table 1. A uniform polycarboxylate ether-based high-rate water-reducing admixture (PCE) was used. Some properties of the used GA and PCE provided by the manufacturer are shown in Table 2, and their molecular representations are shown in Figure 1.

2.2. Preparation of Mixtures

In order to determine the rheological parameters, 87 paste mixes with 0.32 w/c ratio and 0, 0.1, and 0.15% PCE by cement weight were prepared. In the selection of these ratios, the measuring capacity of the rheometer used and the mixing capacity of the mixer used were taken into account. The naming of the paste mixtures was made according to the type of GA used and its rate of use. For example, the mixture without GA is denoted by C, while the mixture containing 0.025% TIPA type GA is named 0.025 TIPA.

2.3. Method

The rheological parameters of the mixtures were measured with the help of the Herschel–Bulkley model shown in Equation (1) using a ball measuring system–BMS rheometer. Similar measurement systems were applied by other researchers [2,40,41,42]

The relationship between shear stress (τ) and shear rate

\overset{\cdot}{γ} = (\frac{d γ}{d t})

according to the Herschel–Buckley model is shown in Equation (1).

τ = τ o + b \cdot {\dot{γ}}^{ρ}

(1)

Here, τ is the dynamic shear stress, τ_o is the dynamic threshold shear stress, b is the consistency coefficient, p is the Herschel–Bulkley index, and

\overset{\cdot}{γ}

= (dγ/dt) is the deformation rate.

3. Modeling Methodology

The rheological properties (DYS and viscosity) of paste mixtures prepared using GA and control cement without GA are shown in Table 3. Three different PCE ratios were used to examine the compatibility of the cements with the water-reducing admixture. Cement–PCE compatibility was examined based on the changes in these rheological parameters.

Two different methods were used in modeling the data: hybrid learning, machine learning, and deep learning. Neural Classification and Regression Tree (NCART) was used as a hybrid learning method, Random Forest (RF) was used as the learning method, and Convolutional Neural Network (CNN) was used as the deep learning method. Four different metrics were used to evaluate the results: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), R-squared, and Logcosh.

Cross-validation is a statistical method used to evaluate the predictive performance of a model by dividing the original dataset into multiple subsets, training the model on some of them, and validating it on the remaining data. This approach is especially useful in cases where data is limited, as it maximizes the use of the dataset while reducing the risk of overfitting [43,44,45]. One of the most commonly used cross-validation techniques is k-fold cross-validation, where the dataset is divided into k equal parts. The model is trained on k-1 folds and tested on the remaining folds; this process is repeated k times, so that each fold serves as a validation set once [45,46]. In the study, the 10-fold cross-validation technique was used to avoid the memorization problem due to the low number of data points and to create robust models. The general methodology followed in this study is presented in Figure 2.

Python programming language version 3.9 was used in the modeling processes. The Keras library was preferred for the implementation of the CNN model. While obtaining the results of the NCART method, 5000 epochs were run, for the n_trees parameter 2–10, for the n_layers parameter 2–10, and for the n_selected parameter 4–6. While obtaining the results of the NCART method, 5000 epochs were run, for the n_trees parameter 2–10, for the n_layers parameter 2–10, and for the n_selected parameter 4–6. As a result of setting the parameters for the DYS experiment, n_trees: 2, n_layers: 4, n_selected: 4, and for the viscosity experiment, n_trees: 10, n_layers: 4, n_selected: 6. For CNN, the average values of the metrics obtained after 5000 iterations were calculated. In the CNN method, Sigmoid and Linear activation functions were used for the hidden layer and output layer, respectively. Optimization was performed between 3 and 199 trees while deciding how many decision trees would be used for the RF algorithm. The depth value that produced the best results was selected, and the operations were performed. According to the results of the optimization process, the depth values for the viscosity and dynamic yield stress (DYS) parameters were found to be 19 and 61, respectively. Equation (2) was used in the Linear Regression Analysis (LR) method to determine the coefficients within the scope of the study. The aim of determining the coefficients is for researchers to obtain output with their data.

\sum_{i = 1}^{M} {(y_{i} - \underline{y_{i}})}^{2} = \sum_{i = 1}^{M} {(y_{i} - \sum_{j = 0}^{p} w_{j} \times x_{i j})}^{2}

(2)

3.1. Neural Classification and Regression Tree (NCART)

NCART, a new model for analyzing tabular data, combines the advantages of deep learning and decision tree [47]. NCART combines the computational power of deep learning models and the interpretability of decision trees to create effective and successful prediction models. During the implementation of the method, 5000 training time, 2–10 as the number of decision trees, 2–10 as the neural network layers, and 4–6 value ranges for the n_selected parameter were used.

3.2. Random Forest (RF)

It was stated that the RF method is one of the most widely used decision tree-based ensemble learning methods, which produces multiple classifiers instead of a classifier and then classifies the test data with the votes obtained from the estimates [48].

Each dataset was generated from the original dataset by displacement, then the trees were developed using random feature selection [49]. It was reported that the RF method is very fast and has high accuracy [50,51].

3.3. Convolutional Neural Network (CNN)

Convolution layer and neuron numbers were applied as 1–2 and 64, 128, 16–8, 32–16, respectively. The hidden layer numbers are 2 and 3; the number of neurons in the hidden layers is realized as 32–16; 16–8–4; 32–16–8. The learning rate and momentum coefficients were taken as 0.01 and 0, respectively.

3.4. LLMs (Large Language Models)

Large Language Models (LLMs) are artificial intelligence systems designed to generate human-like text. Their foundations trace back to the 2017 paper “Attention Is All You Need” [52], which marked a pivotal milestone in natural language processing; the Transformer architecture introduced therein has become the cornerstone of contemporary large-scale language models. By leveraging LLMs, a wide array of tasks—such as text comprehension, text generation, machine translation, information retrieval, information extraction, and even synthetic dataset creation—can be performed with high efficiency. GPT-4o was used for synthetic data augmentation in the study [53].

3.5. Evaluation of Model Performance

In order to evaluate the results of the 3 modeling methods applied in the study, 3 different metrics, namely MAE, RMSE, R-squared, and Logcosh, presented in Equations (3)–(6), were used. Based on the statistical parameters used in Equations (3)–(6), lower MAE, RMSE, and Logcosh and higher R-squared values are the desired outcomes.

M A E = \frac{1}{n} \sum_{i = 1}^{n} |E_{i} - P_{i}|

(3)

R M S E = {[\frac{1}{n} \sum_{i = 1}^{n} {(E_{i} - P_{i})}^{2}]}^{1 / 2}

(4)

R - s q u a r e d = 1 - \frac{\sum {(E_{i} - E_{p})}^{2}}{\sum {(E_{i} - M e a n)}^{2}}

(5)

L o g c o s h = \sum_{i = 1}^{n} l o g (c o s h (P_{i} - E_{i})) c o s h (x) = \frac{e^{x} + e^{- x}}{2}

(6)

4. Results and Discussion

When the results in Table 3 were examined, the yield stress and viscosity values generally decreased with the addition of PCE to the mixture, as expected. This behavior became more evident with the increase in PCE dosage. DYS and final viscosity values of the mixtures generally increased with the increase in GA dosage. As GAs are adsorbed on the cement particle, they reduce agglomeration and provide fluidity [2,17]. In the case of low-dosage GA use, reductions in the rheological values of paste mixtures were observed depending on the mentioned mechanism. However, the adsorption of PCE on the particle surface decreases when the surface of the cement particles is surrounded by the GA in the case of high GA. It is understood from Table 1 that as the GA dosage increases, the fine particle ratio increases in cements having GA with the same Blaine fineness value as the control cement. Obtaining finer cements by using GA has both positive and negative effects. While finer grains increase water demand and negatively affect rheological parameters, they can suppress this negative effect by causing higher PCE adsorption. Sun et al. [17] reported that GA can adsorb more strongly to finer particles. It is thought that PCE adsorption is lower due to the higher GA adsorption in finer particles. As the adsorption of PCE to the cement particle increases, the dispersion ability of the admixture may also increase [18]. Moreover, a high dosage of GAs may reduce PCE adsorption due to the competitive adsorption of GAs [9,19,54]. Under such conditions, increasing the PCE dosage in the presence of elevated GA levels results in a higher concentration of non-adsorbed PCE within the system. This may cause the main and side chains of non-adsorbed PCE molecules to become entangled and aggregate [39,42,55]. On the other hand, depending on the formation of smoother particles in the presence of GA, it can positively affect the rheological properties. This observation underlines the need for accurate modeling of material behavior in systems affected by additive interactions. In a related study, Ai et al. [56] proposed a numerical model to simulate physical phenomena such as melt pool and keyhole evolution, demonstrating that detailed modeling can enhance the predictive capability in complex material systems.

It has been understood that while the positive effects of GA are dominant with the use of GA at a lower dosage (0.025), the negative effects come to the fore as the GA dosage increases.

The mixtures containing 0.025% and 0.05% GA showed the lowest performance in terms of the rheological properties (high DYS and FV of the TEA mixture). The lowest performance was measured in the DEIPA mixture when the GA content was above 0.05%. Mixtures containing M-TEA-2 exhibited the highest rheological performance (low DYS and FV values), regardless of the GA usage rate.

4.1. Determining the Dominant Parameters

The RF method was used to determine the dominant parameters affecting the rheological properties. For this purpose, the feature importance method was taken into account. The ranking obtained for the dominant parameters from the RF model for viscosity and DYS data is shown in Figure 3.

Figure 3 shows the percentage of the independent variables affecting the values of the dependent variables DYS and viscosity in such a way that the total values are 100%. In terms of viscosity characteristics, the dominant parameters were determined, respectively, as follows:

The material amount residue on 45 microns, water-reducing admixture dosage, the material amount residue on 60 microns, the material amount residue on 32 microns, zeta potential, molecular weight, pH, GA utilization rate, density, the number of hydrogen bond acceptors, the number of functional groups, number of molecules per fineness.

Water-reducing admixture dosage, the material amount residue on 45 microns, the material amount residue on 32 microns, the material amount residue on 60 microns, pH, zeta potential, molecular weight, GA utilization rate, density, the number of functional groups, the number of hydrogen bond acceptors, the number of molecules per fineness. The mentioned ranking for DYS was as follows:

As expected, water reducing admixture dosage and particle size distribution were the most dominant parameters in terms of rheological properties. In this context, the cement particle size distribution parameter was more dominant than the Blaine fineness value. It was previously emphasized that cement particle size distribution was directly affected by GA type and usage rate. Therefore, rheological parameters of cementitious systems were affected by the presence of GA. In addition to the positive effects of GA usage in terms of energy efficiency, it should be considered that it may also provide benefits in terms of rheological properties and cement-PCE compatibility of cementitious systems.

The statistical information of DYS and viscosity-dependent variables is presented in Figure 4. After determining the dominant parameters affecting the DYS and viscosity-independent variables, the statistical information of the four most effective independent parameters is presented in Figure 5.

4.2. Factors Affecting Cement–Admixture Compatibility

4.2.1. Cement Fineness and Its Role in Cement–PCE Compatibility

Cement fineness significantly influences compatibility with water-reducing admixtures [18]. Both Blaine fineness and particle size distribution impact surface energy and adsorption capacity [57,58]. Finer particles (<10 µm) enhance PCE adsorption due to higher surface energies [17,36]. Additionally, the presence of GAs alters surface energies, adsorption capacities, and Zeta potential, affecting the compatibility of GAs with PCEs [2].

4.2.2. Grinding Aid Properties and Their Effect on Cement Rheology

The performance of GAs depends on parameters such as pH, molecular weight, utilization rate, and functional groups [11]. Comparisons based on molecular count rather than mass provide better insight into adsorption performance. GAs with pH values differing from neutrality (e.g., TEA, TIPA) enhance rheological properties [2,11]. These findings align with earlier studies on the impact of pH on GA performance [2].

4.2.3. Structural and Chemical Factors of PCEs Affecting Cement Rheology

PCE characteristics such as backbone length, side chain number, molecular weight, and intermolecular bonds significantly influence dispersion and rheological parameters [58,59,60,61]. Comb-like side chains of PCEs enhance dispersion compared to GAs [2], and the utilization rate remains the most critical factor for improving rheological properties [36].

4.3. Dataset Power Analysis

When performing power analyses on the dataset, power calculations were performed using Cohen’s F² effect size given in Equation (7) and the F-test statistic given in Equation (8). The equation shows the R² variance ratio, k predictor variable count, and n sample count.

f^{2} = \frac{R^{2}}{1 - R^{2}}

(7)

F = \frac{\frac{R^{2}}{k}}{\frac{1 - R^{2}}{n - k - 1}}

(8)

According to the power analysis evaluations, the dataset used in the study was found to be sufficient, with a power value of 0.956 in terms of a large effect. The small and medium effect values are 0.096 and 0.573, respectively, and it has been calculated that the number of data points needs to be increased. Therefore, synthetic data augmentation was applied using the LLM to improve the reliability of the model in terms of small and medium effects.

5. Modeling Results

The coefficients obtained from the regression analysis applied to the DYS and viscosity outputs are summarized in Table 4. The coefficients of the experimental parameters in the viscosity and DYS experiments were calculated by regression analysis methods and presented in Table 4. The weights in Table 4 can be used as in Equation (9). In the table, weights were calculated for 12 different features (inputs) of the experiments. By using the weights of these features, predictions can be generated for data that is not available in the dataset or is available to another researcher. When Equation (9) is examined, the value in the viscosity experiment for the fineness feature (input), denoted by

X_{1},

is multiplied by the weight coefficient 13.2043. After this process is applied to the remaining 11 features, the intercept value in the last column of the table is added to the result. The result is the output prediction of the data from the related viscosity experiment. Using the weights presented in Table 4, prediction values can be obtained for all data.

P r e d i c t i o n = X_{1} \times 13.2043 + X_{2} \times (- 4.4515) + X_{3} \times (- 60.9363) + X_{4} \times 3.9518 + X_{5} \times 15.2265 + X_{6} \times (- 7.5471) + X_{7} \times 0.0000 + X_{8} \times (- 5.1809) + X_{9} \times 14.2919 + X_{10} \times 6.1875 + X_{11} \times 6.1875 + X_{12} \times (- 43.2357) + 65.8933

(9)

CNN results for viscosity and dynamic yield stress (DYS) are presented in Table 5. To achieve optimal performance, the architecture was systematically varied by adjusting the number of convolutional layers and their neurons, as well as the number of hidden layers and their neuron counts.

As shown in Table 5, the best result for viscosity using the CNN method is obtained with one convolutional layer containing 128 neurons and two hidden layers with 32 and 16 neurons, respectively. Table 5 shows that the best result for DYS in the CNN method is obtained when the number of neurons in one convolutional layer is 128 neurons, and the number of neurons in two hidden layers is 32-16 neurons.

In the viscosity experiment, the CNN model was more accurate in predicting the target value, whereas in the DYS experiment, it performed less well in predicting the target value.

The RF model results are presented in Table 6. As shown, viscosity predictions made with the RF method achieved higher accuracy compared to those for DYS.

Results from the NCART model for viscosity and DYS are shown in Table 7. When the data in Table 7 were examined, the results obtained for viscosity in the NCART method were better than DYS. It is thought that the reason why all models produce better results in the viscosity experiment than in the DYS experiment is that the DYS results are spread over a larger area compared to the viscosity results. The stability of the metric rankings measured using other methods prior to this point is crucial to ensure consistent results.

In order to determine the most successful method for viscosity and DYS properties, the values obtained for all metric types are compared in Table 8, and the values obtained with augmented data are compared in Table 9.

Charts that compare results from experimental data and models are shown in Figure 6 and Figure 7. When the figures for both DYS and viscosity were examined, it was seen that the results obtained from the modeling are close to the experimental values. The closeness of these values to each other can be shown as evidence for the validity of the models. Figure 6 and Figure 7 show the experimental values of viscosity and DYS and the values obtained by the models during testing.

From Table 8, it is seen that the NCART method gives the best results in all metrics for both viscosity and DYS parameters. The RF method gives the second-best results in all metrics for both viscosity and DYS parameters. The CNN method was behind the other two models in both experiments. As a result, in terms of viscosity parameters, the results obtained with the NCART method were 0.19 and 1.36 times better than the RF and CNN methods, respectively. For DYS, it was calculated as 0.24 and 10.29, respectively.

According to Table 9, NCART achieves the highest performance across all metrics for both viscosity and DYS. Furthermore, when the experimental data is augmented with synthetic samples generated by the LLM, all baseline models show significant improvements in all measurement metrics. Specifically, NCART-LLM reduces the viscosity test MAE from 1.04 to 0.53 and the DYS test MAE from 8.73 to 3.29, thus outperforming the original NCART by 0.51 and 5.44 MAE points, respectively.

Similarly, RF-LLM reduced RF’s viscosity MAE from 1.23 to 0.97 and DYS MAE from 8.97 to 6.88, while CNN-LLM reduced CNN’s viscosity MAE from 2.40 to 0.75 and DYS MAE from 19.02 to 3.27. Similarly, for all other metrics, RF-LLM and CNN-LMM show significant improvement in target predictions. In particular, the CNN-LLM shows a performance improvement over the CNN model in all metrics, as expected. This can be explained by the fact that the performance of CNN models improves in direct proportion to the amount of data. The performance of deep learning-based models, such as CNN, is directly related to the amount of data. Therefore, the results of the model trained with 87 examples lag behind those of other models. Deep learning-based models require more data for better performance. Therefore, the limited amount of data has reduced the model’s generalization ability. As a result, it demonstrates the significant benefit of LLM-based data augmentation for improving the generalization capabilities of the models.

Charts comparing the results obtained from the augmented experimental data and models are shown in Figure 7. When the results for viscosity and DYS are examined, it is seen that the results obtained from the modeling are closer to the experimental values than the normal models. The closeness of the predictions made by the models with the augmented experimental data confirms the metric results given in Table 9. The close agreement observed in the viscosity and DYS experiments further validates our modeling approach.

6. Conclusions

This study aimed to model the compatibility between cements produced with grinding aids (GAs) and water-reducing admixtures (PCE) and to identify the key parameters affecting rheological behavior.

The results showed that paste viscosity is predominantly influenced by PCE dosage, particle size distribution (residues on 45, 32, and 60 microns), pH, zeta potential, molecular weight, GA dosage, and several molecular descriptors. Similarly, dynamic yield stress (DYS) is primarily affected by residue on 45 microns, PCE dosage, residue on 60 and 32 microns, zeta potential, molecular weight, and other related parameters.
It was concluded that particle size distribution is the most critical factor determining cement–PCE compatibility. Since GA usage directly influences fineness characteristics, the type and dosage of GA should be carefully selected. Future research is recommended to focus on the relationship between GA use and cement powder properties in greater detail.
Among the modeling methods employed, NCART showed the best performance in predicting both viscosity and DYS, followed closely by Random Forest (RF). The models achieved R² values of 0.84 for viscosity and 0.77 for DYS, indicating strong predictive capabilities, though some room for improvement remains.
The study also highlighted the limitations of small datasets, which may reduce model generalizability and increase the risk of overfitting. However, the decision tree-based models used (NCART and RF) proved advantageous due to their non-parametric nature and ability to handle limited data without distributional assumptions.
In terms of error metrics, Logcosh yielded the lowest errors across all models, followed by MAE and RMSE. When Large Language Model (LLM)-based data augmentation was applied, all models showed substantial improvement. For example, NCART-LLM reduced the MAE of viscosity from 1.04 to 0.53 and of DYS from 8.73 to 3.29. Similar improvements were observed in RF-LLM and CNN-LLM, with CNN showing the most notable gains, likely due to its higher learning capacity and sensitivity to increased data volume.
Overall, the integration of LLM-generated synthetic data significantly enhanced model performance, especially for CNN and NCART structures. NCART-LLM achieved an R² value of 0.95 for viscosity prediction, and both NCART-LLM and CNN-LLM achieved R² values of 0.95 for DYS prediction. These findings confirm that LLM-augmented data can successfully strengthen model accuracy and robustness in small to medium-sized datasets.
LLM-generated examples added to real-world data have significantly improved model accuracy by compensating for the lack of power in medium and small domains.

Author Contributions

Conceptualization, M.E., Y.K., A.M., M.B. (Metin Bilgin) and M.B. (Mehmet Bozdemir); methodology, M.E., Y.K., A.M., M.B. (Metin Bilgin) and M.B. (Mehmet Bozdemir); software, M.E., Y.K., A.M., M.B. (Metin Bilgin) and M.B. (Mehmet Bozdemir); formal analysis, M.E., Y.K., A.M., M.B. (Metin Bilgin) and M.B. (Mehmet Bozdemir); investigation, M.E., Y.K., A.M., M.B. (Metin Bilgin) and M.B. (Mehmet Bozdemir); resources, M.E., Y.K., A.M., M.B. (Metin Bilgin) and M.B. (Mehmet Bozdemir); writing—original draft preparation, M.E., Y.K., A.M., M.B. (Metin Bilgin) and M.B. (Mehmet Bozdemir); writing—review and editing, M.E., Y.K., A.M., M.B. (Metin Bilgin) and M.B. (Mehmet Bozdemir). All authors have read and agreed to the published version of the manuscript.

Funding

They extend their appreciation to the Bursa Uludağ University Science and Technology Centre (BAP) for their contributions under grant number FGA-2025-2123.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

They extend their appreciation to the Bursa Uludağ University Science and Technology Centre (BAP) for their contributions under grant number FGA-2025-2123. The authors wish to acknowledge the support of the Scientific and Technological Research Council of Turkey (TÜBİTAK) through grant number 222M245. They also extend their thanks to the Bursa Uludağ University Science and Technology Centre (BAP) for their contributions under grant numbers FHIZ-2025-2428 and FDK-2024 1960. Additionally, the lead author expresses gratitude to TÜBİTAK for awarding the 2211A scholarship during their doctoral studies. The third author thanks the Turkish Academy of Sciences (TÜBA).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial intelligence
BMS	Ball measuring system
CNN	Convolutional Neural Network
DEG	Diethylene glycol
DEIPA	Diethanol isopropanol amine
DYS	Dynamic yield stress
E	Ethylene glycol
F	Final viscosity
LF	Linear Regression Function
LS	Lignosulphonates
MAE	Mean Absolute Error
MARS	Multivariate Adaptive Regression Splines
MARC	Multiple summation regression trees
M-TEA	Modified triethanolamine
NCART	Neural Classification and Regression Tree
GA	Grinding aids
PCE	Polycarboxylate Ether-Based Water-Reducing Admixture Superplasticizer
PF	Power Regression Function
RF	Random Forest
RMSE	Root Mean Square Error
TEA	Triethanolamine
TIPA	Triisopropanolamine

References

Boesch, M.E.; Hellweg, S. Identifying Improvement Potentials in Cement Production with Life Cycle Assessment. Environ. Sci. Technol. 2010, 44, 9143–9149. [Google Scholar] [CrossRef]
Zhang, Y.; Kong, X. Correlations of the dispersing capability of NSF and PCE types of superplasticizer and their impacts on cement hydration with the adsorption in fresh cement pastes. Cem. Concr. Res. 2015, 69, 1–9. [Google Scholar] [CrossRef]
Kobya, V.; Kaya, Y.; Mardani-Aghabaglou, A. Effect of amine and glycol-based grinding aids utilization rate on grinding efficiency and rheological properties of cementitious systems. J. Build. Eng. 2022, 47, 103917. [Google Scholar] [CrossRef]
Mardani-Aghabaglou, A.; Karakuzu, K.; Kobya, V.; Hatungimana, D. Durability performance and dimensional stability of road concrete containing dry-shake surface hardener admixture. Constr. Build. Mater. 2021, 274, 121789. [Google Scholar] [CrossRef]
Altun, M.G.; Özen, S.; Mardani-Aghabaglou, A. Effect of side chain length change of polycarboxylate-ether based high range water reducing admixture on properties of self-compacting concrete. Constr. Build. Mater. 2020, 246, 118427. [Google Scholar] [CrossRef]
Kaya, Y.; Kobya, V.; Mardani, A. Evaluation of fresh state, rheological properties, and compressive strength performance of cementitious system with grinding aids. J. Appl. Polym. Sci. 2024, 141, e55212. [Google Scholar] [CrossRef]
Kaya, Y.; Kobya, V.; Mardani, A.; Mardani, N.; Beytekin, H.E. Effect of grinding conditions on clinker grinding efficiency: Ball size, mill rotation speed, and feed rate. Buildings 2024, 14, 2356. [Google Scholar] [CrossRef]
Kobya, V.; Karakuzu, K.; Mardani, A.; Felekoğlu, B.; Ramyar, K. Effect of Chain Characteristics of Polycarboxylate-Based Water-Reducing Admixtures on Behavior of Cementitious Systems: A Review. J. Mater. Civ. Eng. 2023, 35, 03123002. [Google Scholar] [CrossRef]
Kaya, Y.; Kobya, V.; Kaya, Y.; Mardani, A. Impact of PCE-based grinding aids on hydration kinetics in fly ash substituted systems: Influence of pH and dosage. Constr. Build. Mater. 2025, 458, 139531. [Google Scholar] [CrossRef]
Sun, Z.; Liu, Y. A grindability model for grinding aids and their impact on cement properties. Adv. Cem. Res. 2016, 28, 475–484. [Google Scholar] [CrossRef]
Prziwara, P.; Kwade, A. Grinding aids for dry fine grinding processes–Part I: Mechanism of action and lab-scale grinding. Powder Technol. 2020, 375, 146–160. [Google Scholar] [CrossRef]
Plank, J.; Sachsenhauser, B.; De Reese, J. Experimental determination of the thermodynamic parameters affecting the adsorption behaviour and dispersion effectiveness of PCE superplasticizers. Cem. Concr. Res. 2010, 40, 699–709. [Google Scholar] [CrossRef]
Mardani-Aghabaglou, A. Portland Çimentosu ve Süperakışkanlaştırıcı Katkı Uyumunun Incelenmesi. Ph.D. Thesis, Ege Üniversitesi Fen Bilimleri Enstitüsü, İnşaat Mühendisliği Anabilim Dalı, İzmir, Türkiye, 2016. [Google Scholar]
Mardani-Aghabaglou, A.; Boyacı, O.C.; Hosseinnezhad, H.; Felekoğlu, B.; Ramyar, K. Effect of gypsum type on properties of cementitious materials containing high range water reducing admixture. Cem. Concr. Compos. 2016, 68, 15–26. [Google Scholar] [CrossRef]
Mardani-Aghabaglou, A.; Felekoğlu, B.; Ramyar, K. Effect of Cement C3A Content on Properties of Cementitious Systems Containing High-Range Water-Reducing Admixture. J. Mater. Civ. Eng. 2017, 29, 04017066. [Google Scholar] [CrossRef]
Assaad, J.J.; Asseily, S.E.; Harb, J. Use of cement grinding aids to optimise clinker factor. Adv. Cem. Res. 2010, 22, 29–36. [Google Scholar] [CrossRef]
Sun, Z.; Liu, H.; Ji, Y.; Pang, M. Influence of glycerin grinding aid on the compatibility between cement and polycarboxylate superplasticizer and its mechanism. Constr. Build. Mater. 2020, 233, 117104. [Google Scholar] [CrossRef]
Karakuzu, K. Su Azaltıcı Katkı Anyonik Monomer Değişiminin Farklı C₃A İçeriğindeki Çimentolu Sistemlerin Davranışına Etkisi. Ph.D. Thesis, Bursa Uludağ University, Bursa, Türkiye, 2023. [Google Scholar]
Kobya, V.; Karakuzu, K.; Mardani, A.; Felekoğlu, B.; Ramyar, K. Combined interaction of PCE chains lengths, C3A and water content in cementitious systems. Constr. Build. Mater. 2023, 378, 131178. [Google Scholar] [CrossRef]
Assaad, J.J.; Issa, C.A. Effect of clinker grinding aids on flow of cement-based materials. Cem. Concr. Res. 2014, 63, 1–11. [Google Scholar] [CrossRef]
Mardani-Aghabaglou, A.; Öztürk, H.T.; Kankal, M.; Ramyar, K. Assessment and prediction of cement paste flow behavior; Marsh-funnel flow time and mini-slump values. Constr. Build. Mater. 2021, 301, 124072. [Google Scholar] [CrossRef]
Venkata Rao, M.; Sivagamasundari, R.; Nagaraju, T.V.; Subrahmanyam Raju, A. Neural Network Modeling for Assessing Punching Shear Strength in GFRP-Reinforced Concrete Slabs. In World Conference on Artificial Intelligence: Advances and Applications; Springer Nature: Singapore, 2024; pp. 387–396. [Google Scholar]
Revathi, B.; Gobinath, R.; Bala, G.S.; Nagaraju, T.V.; Bonthu, S. Harnessing explainable Artificial Intelligence (XAI) for enhanced geopolymer concrete mix optimization. Results Eng. 2024, 24, 103036. [Google Scholar] [CrossRef]
Nagaraju, T.V.; Mantena, S.; Azab, M.; Alisha, S.S.; El Hachem, C.; Adamu, M.; Murthy, P.S.R. Prediction of high strength ternary blended concrete containing different silica proportions using machine learning approaches. Results Eng. 2023, 17, 100973. [Google Scholar] [CrossRef]
Cheng, M.Y.; Cao, M.T. Estimating strength of rubberized concrete using evolutionary multivariate adaptive regression splines. J. Civ. Eng. Manag. 2016, 22, 711–720. [Google Scholar] [CrossRef]
Chou, J.S.; Chiu, C.K.; Farfoura, M.; Al-Taharwa, I. Optimizing the prediction accuracy of concrete compressive strength based on a comparison of data-mining techniques. J. Comput. Civ. Eng. 2011, 25, 242–253. [Google Scholar] [CrossRef]
Kaveh, A.; Bakhshpoori, T.; Hamze-Ziabari, S.M. M5’ and Mars based prediction models for properties of self-compacting concrete containing fly ash. Period. Polytech. Civ. Eng. 2018, 62, 281–294. [Google Scholar] [CrossRef]
Asteris, P.G.; Mokos, V.G. Concrete compressive strength using artificial neural networks. Neural Comput. Appl. 2020, 32, 11807–11826. [Google Scholar] [CrossRef]
Zhang, J.; Ma, G.; Huang, Y.; Aslani, F.; Nener, B. Modelling uniaxial compressive strength of lightweight self-compacting concrete using random forest regression. Constr. Build. Mater. 2019, 210, 713–719. [Google Scholar] [CrossRef]
Amlashi, A.T.; Abdollahi, S.M.; Goodarzi, S.; Ghanizadeh, A.R. Soft computing based formulations for slump, compressive strength, and elastic modulus of bentonite plastic concrete. J. Clean. Prod. 2019, 230, 1197–1216. [Google Scholar] [CrossRef]
Saha, P.; Debnath, P.; Thomas, P. Prediction of fresh and hardened properties of self-compacting concrete using support vector regression approach. Neural Comput. Appl. 2020, 32, 7995–8010. [Google Scholar] [CrossRef]
Xu, J.; Zhao, X.; Yu, Y.; Xie, T.; Yang, G.; Xue, J. Parametric sensitivity analysis and modelling of mechanical properties of normal-and high-strength recycled aggregate concrete using grey theory, multiple nonlinear regression and artificial neural networks. Constr. Build. Mater. 2019, 211, 479–491. [Google Scholar] [CrossRef]
Gholampour, A.; Mansouri, I.; Kisi, O.; Ozbakkaloglu, T. Evaluation of mechanical properties of concretes containing coarse recycled concrete aggregates using multivariate adaptive regression splines (MARS), M5 model tree (M5Tree), and least squares support vector regression (LSSVR) models. Neural Comput. Appl. 2020, 32, 295–308. [Google Scholar] [CrossRef]
Yu, Y.; Zhang, C.; Gu, X.; Cui, Y. Expansion prediction of alkali aggregate reactivity-affected concrete structures using a hybrid soft computing method. Neural Comput. Appl. 2019, 31, 8641–8660. [Google Scholar] [CrossRef]
Sezer, A.; Sezer, G.İ.; Mardani-Aghabaglou, A.; Altun, S. Prediction of mechanical and penetrability properties of cement-stabilized clay exposed to sulfate attack by use of soft computing methods. Neural Comput. Appl. 2020, 32, 16707–16722. [Google Scholar] [CrossRef]
Mardani-Aghabaglou, A.; Kankal, M.; Nacar, S.; Felekoğlu, B.; Ramyar, K. Assessment of cement characteristics affecting rheological properties of cement pastes. Neural Comput. Appl. 2021, 33, 12805–12826. [Google Scholar] [CrossRef]
Kaya, Y. Farklı Tip Klinker Öğütme Kolaylaştırıcı Katkı Kullanımının Çimentolu Sistemlerin Özelliklerine Etkisi. Master’s Thesis, Bursa Uludağ University, Bursa, Türkiye, 2022. [Google Scholar]
EN 197-1:2011; Cement-Part 1: Composition, Specifications and Conformity Criteria for Common Cements. European Committee for Standardization: London, UK, 2011.
Kaya, Y.; Öztürk, H.T.; Kobya, V.; Mardani, N.; Mardani, A. Experimentally and Modeling Assessment of Parameters Affecting Grinding Aid-Containing Cement–PCE Compatibility: CRA, MARS and AOMA-ANN Methods. Polymers 2025, 17, 1583. [Google Scholar] [CrossRef] [PubMed]
Mardani-Aghabaglou, A.; Tuyan, M.; Yılmaz, G.; Arıöz, Ö.; Ramyar, K. Effect of different types of superplasticizer on fresh, rheological and strength properties of selfconsolidating concrete. Constr. Build. Mater. 2013, 47, 1020–1025. [Google Scholar] [CrossRef]
Altun, M.; Özen, S.; Mardani Aghabaglou, A. Effect of Side Chain Length Change of Polycarboxylate-Ether-Based High-Range Water-Reducing Admixture on Properties of Cementitious Systems Containing Fly Ash. J. Mater. Civ. Eng. 2021, 33, 118427. [Google Scholar] [CrossRef]
Kobya, V.; Karakuzu, K.; Mardani, A.; Felekoğlu, B.; Ramyar, K. Effect of polycarboxylate-based water-reducing admixture chains length on portland cement-admixture compatibility. J. Sustain. Cem. Based Mater. 2024, 13, 69–86. [Google Scholar] [CrossRef]
Simon, R.; Subramanian, J.; Li, M.; Menezes, S. Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data. Brief. Bioinform. 2011, 12, 203–214. [Google Scholar] [CrossRef]
Hawkins, D.; Kraker, J. Deterministic fallacies and model validation. J. Chemom. 2010, 24, 188–193. [Google Scholar] [CrossRef]
Krstajic, D.; Buturović, L.; Leahy, D.; Thomas, S. Cross-validation pitfalls when selecting and assessing regression and classification models. J. Cheminform. 2014, 6, 10. [Google Scholar] [CrossRef]
Yates, L.; Aandahl, Z.; Richards, S.; Brook, B. Cross validation for model selection: A review with examples from ecology. Ecol. Monogr. 2023, 93, e1557. [Google Scholar] [CrossRef]
Luo, J.; Xu, S. NCART: Neural Classification and Regression Tree for tabular data. Pattern Recognit. 2024, 154, 110578. [Google Scholar] [CrossRef]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Archer, K.J. Empirical characterization of random forest variable importance measure. Comput. Stat. Data Anal. 2008, 52, 2249–2260. [Google Scholar] [CrossRef]
Bilgin, M. Makine Öğrenmesi Teorisi ve Algoritmaları; Papatya Bilim: İstanbul, Türkiye, 2018. [Google Scholar]
Akar, Ö.; Güngör, O. Rastgele orman algoritması kullanılarak çok bantlı görüntülerin sınıflandırılması. Jeodezi Jeoinformasyon Derg. 2012, 106, 139–146. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
OpenAI. GPT-4o: Introducing GPT-4o, Our New Flagship Model. Available online: https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-free (accessed on 13 May 2024).
Kaya, Y.; Kobya, V.; Mardani, A.; Assaad, J.J. Effect of modified Triethanolamine on grinding efficiency and performance of cementitious materials. Talanta Open 2024, 9, 100293. [Google Scholar] [CrossRef]
Ma, Y.; Sha, S.; Zhou, B.; Lei, F.; Liu, Y.; Xiao, Y.; Shi, C. Adsorption and dispersion capability of polycarboxylate-based superplasticizers: A review. J. Sustain. Cem.-Based Mater. 2022, 11, 319–344. [Google Scholar] [CrossRef]
Ai, Y.; Ye, C.; Liu, J.; Zhou, M. Study on the evolution processes of keyhole and melt pool in different laser welding methods for dissimilar materials based on a novel numerical model. Int. Commun. Heat Mass Transf. 2025, 163, 108629. [Google Scholar] [CrossRef]
Assaad, J.J. Assessment of rheology variations of cement pastes containing clinker grinding aids compliant with ASTM C465. Adv. Civ. Eng. Mater. 2015, 4, 94–114. [Google Scholar] [CrossRef]
Katsioti, M.; Tsakiridis, P.E.; Giannatos, P.; Tsibouki, Z.; Marinos, J. Characterization of various cement grinding aids and their impact on grindability and cement performance. Constr. Build. Mater. 2009, 23, 1954–1959. [Google Scholar] [CrossRef]
Flatt, R.J.; Houst, Y.F. A simplified view on chemical effects perturbing the action of superplasticizers. Cem. Concr. Res. 2001, 31, 1169–1176. [Google Scholar] [CrossRef]
Ouyang, X.; Jiang, X.; Qiu, X.; Yang, D.; Pang, Y. Effect of molecular weight of sulfanilic acid-phenol-formaldehyde condensate on the properties of cementitious system. Cem. Concr. Res. 2009, 39, 283–288. [Google Scholar] [CrossRef]
Alonso, M.M.; Palacios, M.; Puertas, F. Compatibility between polycarboxylate-based admixtures and blended-cement pastes. Cement Concr. Compos. 2013, 35, 151–162. [Google Scholar] [CrossRef]

Figure 1. (a) TEA, (b) TIPA, (c) DEIPA, (d) DEG, (e) EG, (f) M-TEA-1-2, (g) PCE [39].

Figure 2. Regression modeling flowchart for clinker-aided cementitious rheology.

Figure 3. Dominant value ranking affecting viscosity and DYS predictions by the RF method.

Figure 4. Distributions of experimental values (a) Viscosity, (b) DYS.

Figure 5. The four most effective parameters for distribution (a) Residue on 45 µm, (b) Water-reducing admixture dosage, (c) Residue on 60 µm, (d) Residue on 32 µm.

Figure 6. Performance of the models in predicting viscosity values in the test phase: (a) Experimental results and NCART, (b) experimental results and RF, (c) experimental results and CNN.

Figure 7. Performance of the models in predicting DYS values in the test phase: (a) Experimental results and NCART, (b) experimental results and RF, (c) experimental results and CNN.

Table 1. Particle size distribution of cements.

Cements	Blaine Fineness (cm²/g)	Residue on 32 µm Sieve (%)	Residue on 45 µm Sieve (%)	Residue on 60 µm Sieve (%)
Control	4100	31.9	10.4	1.2
0.025 TEA	4110	26.5	7.6	0.8
0.05 TEA	4115	18	4.7	0.7
0.075 TEA	4100	16.6	4	0.5
0.1 TEA	4090	14	3.1	0.3
0.025 TIPA	4100	24.9	8.2	0.9
0.05 TIPA	4180	19.0	4.9	0.8
0.075 TIPA	4200	11.7	2.9	0.5
0.1 TIPA	4140	9.6	2.1	0.2
0.025 DEIPA	4080	25.8	9.5	0.8
0.05 DEIPA	4090	21.0	5.8	0.7
0.075 DEIPA	4100	15.2	3.2	0.5
0.1 DEIPA	4020	13.2	3.0	0.4
0.025 DEG	4110	23.8	8.2	0.9
0.05 DEG	4140	16.7	6.0	0.7
0.075 DEG	4190	12.5	4.8	0.6
0.1 DEG	4150	9.5	3.1	0.4
0.025 EG	4160	25.4	9.3	0.9
0.05 EG	4190	25.8	8.1	0.9
0.075 EG	4180	22.3	6.2	0.9
0.1 EG	4160	17.3	4.3	0.7
0.025 M-TEA-1	4120	24.6	9	0.9
0.05 M-TEA-1	4070	21	6.9	0.6
0.075 M-TEA-1	4100	14.8	3.5	0.4
0.1 M-TEA-1	4050	14.5	3.4	0.3
0.025 M-TEA-2	4080	28.2	11.2	1.1
0.05 M-TEA-2	4080	26.2	10.2	1
0.075 M-TEA-2	4060	24.4	9	0.9
0.1 M-TEA-2	4050	23.5	9	0.8

Table 2. Some chemical properties of Gas [39].

GA Name	Alkaline Content (%) (Na₂O)	Intensity (g/cm³)	Solid Content (%)	Chloride Content (%)	pH 25 °C
TEA	<10	1.095	50.0	<0.1	10.5
TIPA	<10	1.124	50.0	<0.1	10.8
DEIPA	<10	1.079	50.0	<0.1	9.7
DEG	<10	1.118	50.0	<0.1	7.2
EG	<10	1.26	50.0	<0.1	8.2

Table 3. Dynamic yield stress and viscosity values of mixtures containing different amounts of PCE.

Mixture	DYS (Pa)			FV (Pa s)
	PCE Ratio (by Weight of Cement)
	0%	0.1%	0.15%	0%	0.1%	0.15%
C	38.28	30.01	10.93	38.28	30.01	3.77
0.025 TEA	53.5	25.86	10.16	11.1	8.69	4.63
0.05 TEA	83.76	31.19	14.35	12.6	11.06	6.14
0.075 TEA	90.61	46.99	22.65	12.64	14.09	8.77
0.1 TEA	117.81	96.93	54.34	14.42	15.91	13.72
0.025 TIPA	33.91	15.69	5.31	8.78	5.98	2.66
0.05 TIPA	59.39	28.4	12.25	11.44	9.35	5.42
0.075 TIPA	105.09	68.51	32.31	12.61	13.99	9.4
0.1 TIPA	81.4	106.5	34.5	12.52	17.67	10.77
0.025 DEIPA	27.66	11.89	4.32	6.93	4.69	2.22
0.05 DEIPA	34.01	15.88	6.72	9.45	6.97	3.65
0.075 DEIPA	76.25	44.96	21.47	13.06	11.86	9.11
0.1 DEIPA	86.12	62.94	41.63	15.28	15.28	11.68
0.025 DEG	30.08	15.82	5.13	7.14	5.62	2.43
0.05 DEG	36.38	17.48	4.82	8.95	6.27	3.01
0.075 DEG	55.36	29.59	13.58	10.37	8.81	5.39
0.1 DEG	68.73	33.58	15.2	12.6	10.28	6.36
0.025 EG	24.95	14.34	5.45	7.3	5.6	2.75
0.05 EG	43.82	23.32	7.35	10.33	7.91	3.52
0.075 EG	41.29	23.8	8.36	12.05	7.83	3.85
0.1 EG	52.94	30.55	12.24	12.37	10.26	5.82
0.025 M-TEA-1	29.92	10.6	3.66	6.99	4.47	1.94
0.05 M-TEA-1	33.4	15.87	4.09	9.89	7.36	2.88
0.075 M-TEA-1	68.84	42.47	19.78	12.33	13.45	8.53
0.1 M-TEA-1	105	63.17	26.44	13.97	15.89	10.35
0.025 M-TEA-2	34.56	10.72	7.32	7.95	4.52	1.58
0.05 M-TEA-2	41.53	10.55	3.35	8.83	4.23	1.91
0.075 M-TEA-2	29.47	12.04	3.7	6.68	5.03	1.99
0.1 M-TEA-2	33.85	14.81	5.87	7.71	6.22	3.09

Table 4. Coefficients of regression models for viscosity and DYS values.

Experiment	w0	w1	w2	w3	w4	w5	w6	w7	w8	w9	w10	w11	Intercept
Viscosity	0.268	−0.883	−7.029	−0.016	22.494	−0.005	0.000	0.453	0.107	0.670	0.670	−24.778	9.9780
DYS	13.204	−4.451	−60.936	3.9518	15.226	−7.547	0.000	−5.180	14.291	6.187	6.185	−43.235	65.893

Table 5. CNN results for viscosity and DYS values.

Experiment	Number of Conv. Layer Neurons	Number of Hidden Layers	Number of Hidden Layer Neurons	MAE		RMSE		R-Squared		Logcosh
				Train	Test	Train	Test	Train	Test	Train	Test
Viscosity	128	2	32-16	1.96	2.40	2.43	2.76	0.61	0.28	1.41	1.78
	128	3	32-16-8	2.01	2.44	2.47	2.85	0.60	0.24	1.46	1.85
	64	3	32-16-8	2.92	3.10	3.50	3.61	0.18	0.01	2.32	2.47
	32-16	3	16-8-4	3.30	3.47	3.95	4.01	0.01	0.01	2.68	2.83
	16-8	3	32-16-8	3.31	3.48	3.96	4.02	0.01	0.01	2.69	2.84
DYS	128	2	32-16	15.07	19.02	19.99	23.00	0.47	0.01	14.41	18.33
	128	3	32-16-8	20.78	21.11	28.35	26.67	0.01	0.01	20.12	20.44
	64	3	32-16-8	20.54	20.92	28.09	26.39	0.01	0.01	19.87	20.25
	32-16	3	16-8-4	20.80	21.12	28.39	26.67	0.01	0.01	20.13	20.44
	16-8	3	32-16-8	20.80	21.10	28.38	26.65	0.01	0.01	20.13	20.42

Table 6. RF results for viscosity and DYS values.

Metrics	MAE		RMSE		R-Squared		Logcosh
Metrics	Train	Test	Train	Test	Train	Test	Train	Test
Viscosity	0.68	1.23	0.86	1.54	0.95	0.80	0.29	0.73
DYS	6.20	8.97	8.59	12.11	0.90	0.76	5.55	8.32

Table 7. NCART results for viscosity and DYS values.

Metrics	Parameter			MAE		RMSE		R-Squared		Logcosh
Metrics	Tree	Layer	Select	Train	Test	Train	Test	Train	Test	Train	Test
Viscosity	2	2	4	0.22	1.04	0.38	1.33	0.98	0.84	0.06	0.57
	2	2	6	0.06	1.20	0.09	1.58	0.99	0.79	0.005	0.73
	6	2	6	0.05	1.28	0.07	1.68	0.99	0.76	0.003	0.80
DYS	6	2	6	0.62	8.73	1.26	11.50	0.99	0.77	0.31	8.09
	2	4	4	0.57	9.65	0.84	13.42	0.99	0.66	0.27	9.00
	6	4	6	0.73	10.42	1.32	13.82	0.99	0.64	0.39	9.76

Table 8. Best results obtained for viscosity and DYS parameters.

Model	MAE		RMSE		R-Squared		Logcosh
Model	Train	Test	Train	Test	Train	Test	Train	Test
Viscosity
NCART	0.22	1.04	0.38	1.33	0.98	0.84	0.06	0.57
RF	0.68	1.23	0.86	1.54	0.95	0.80	0.29	0.73
CNN	1.96	2.40	2.43	2.76	0.61	0.28	1.41	1.78
DYS
NCART	0.62	8.73	1.26	11.50	0.99	0.77	0.31	8.09
RF	6.20	8.97	8.59	12.11	0.90	0.76	5.55	8.32
CNN	15.07	19.02	19.99	23.00	0.47	0.01	14.41	18.33

Table 9. Best results obtained for viscosity and DYS parameters using augmented data.

Model	MAE		RMSE		R-Squared		Logcosh
Model	Train	Test	Train	Test	Train	Test	Train	Test
Viscosity
NCART	0.22	1.04	0.38	1.33	0.98	0.84	0.06	0.57
NCART-LLM	0.37	0.53	0.58	0.78	0.97	0.95	0.11	0.21
RF	0.68	1.23	0.86	1.54	0.95	0.80	0.29	0.73
RF-LLM	0.78	0.97	0.97	1.20	0.93	0.89	0.35	0.50
CNN	1.96	2.40	2.43	2.76	0.61	0.28	1.41	1.78
CNN-LLM	0.70	0.75	0.97	1.04	0.93	0.91	0.32	0.36
DYS
NCART	0.62	8.73	1.26	11.50	0.99	0.77	0.31	8.09
NCART-LLM	1.11	3.29	1.66	5.01	0.99	0.95	0.65	2.84
RF	6.20	8.97	8.59	12.11	0.90	0.76	5.55	8.32
RF-LLM	5.30	6.88	7.12	9.52	0.93	0.85	4.66	6.24
CNN	15.07	19.02	19.99	23.00	0.47	0.01	14.41	18.33
CNN-LLM	2.54	3.27	4.57	5.43	0.97	0.95	2.01	2.71

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Eser, M.; Kaya, Y.; Mardani, A.; Bilgin, M.; Bozdemir, M. Enhancing the Analysis of Rheological Behavior in Clinker-Aided Cementitious Systems Through Large Language Model-Based Synthetic Data Generation. Materials 2025, 18, 3579. https://doi.org/10.3390/ma18153579

AMA Style

Eser M, Kaya Y, Mardani A, Bilgin M, Bozdemir M. Enhancing the Analysis of Rheological Behavior in Clinker-Aided Cementitious Systems Through Large Language Model-Based Synthetic Data Generation. Materials. 2025; 18(15):3579. https://doi.org/10.3390/ma18153579

Chicago/Turabian Style

Eser, Murat, Yahya Kaya, Ali Mardani, Metin Bilgin, and Mehmet Bozdemir. 2025. "Enhancing the Analysis of Rheological Behavior in Clinker-Aided Cementitious Systems Through Large Language Model-Based Synthetic Data Generation" Materials 18, no. 15: 3579. https://doi.org/10.3390/ma18153579

APA Style

Eser, M., Kaya, Y., Mardani, A., Bilgin, M., & Bozdemir, M. (2025). Enhancing the Analysis of Rheological Behavior in Clinker-Aided Cementitious Systems Through Large Language Model-Based Synthetic Data Generation. Materials, 18(15), 3579. https://doi.org/10.3390/ma18153579

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Enhancing the Analysis of Rheological Behavior in Clinker-Aided Cementitious Systems Through Large Language Model-Based Synthetic Data Generation

Abstract

1. Introduction

2. Experimental Study

2.1. Materials

2.2. Preparation of Mixtures

2.3. Method

3. Modeling Methodology

3.1. Neural Classification and Regression Tree (NCART)

3.2. Random Forest (RF)

3.3. Convolutional Neural Network (CNN)

3.4. LLMs (Large Language Models)

3.5. Evaluation of Model Performance

4. Results and Discussion

4.1. Determining the Dominant Parameters

4.2. Factors Affecting Cement–Admixture Compatibility

4.2.1. Cement Fineness and Its Role in Cement–PCE Compatibility

4.2.2. Grinding Aid Properties and Their Effect on Cement Rheology

4.2.3. Structural and Chemical Factors of PCEs Affecting Cement Rheology

4.3. Dataset Power Analysis

5. Modeling Results

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI