Leveraging Machine Learning for Designing Sustainable Mortars with Non-Encapsulated PCMs

Abstract: The development and understanding of the behavior of construction materials is extremely complex due to the great variability of raw materials that can be used, and it becomes even more challenging when functional materials, such as phase-change materials (PCM), are incorporated. Currently, we are witnessing an evolution of advanced construction materials as well as of powerful tools for modeling engineering problems using artificial intelligence, which makes it possible to predict the behavior of composite materials. Thus, the main objective of this study was to explore the potential of machine learning to predict the mechanical and physical behavior of mortars with direct incorporation of PCM, based on the authors' own experimental databases. For the data preparation and modelling process, the cross-industry standard process for data mining was adopted. Seven different models, namely multiple regression, decision trees, principal component regression, extreme gradient boosting, random forests, artificial neural networks, and support vector machines, were implemented. The results show potential, as machine learning models such as random forests and artificial neural networks achieved a very good fit for the prediction of the compressive strength, flexural strength, water absorption by immersion, and water absorption by capillarity of the mortars with direct incorporation of PCM.


Introduction
The development of construction materials is extremely complex due to the enormous number of different raw materials that constitute them and the influence that these have on their properties. If functional materials are added, the degree of complexity increases significantly, as these can strongly influence the basic properties of the materials and play a leading role in their performance in buildings. Thus, it becomes essential to resort to techniques that support decision-making during the formulation and development of new and advanced construction materials.
Phase-change materials (PCM) incorporated into construction materials are still a developing area, as confirmed by the increasing number of scientific publications on the subject across different topics. In the construction industry, PCM has also been attracting enormous interest from the scientific community, again reflected in a growing number of publications, particularly in recent years [1]. Until now, PCMs have been considered one of the most viable strategies for energy saving, since they can be incorporated into a wide variety of construction materials, with applications in different building construction solutions [2][3][4][5][6]. Currently, several incorporation techniques and different types of PCM are in use. So far, a large part of the research has focused on the encapsulation technique [1], using PCM microcapsules or macrocapsules embedded in concrete [7], mortars [8,9], panels [10], and bricks [11]. The direct incorporation technique is still an underdeveloped area. However, it has environmental and economic advantages: since the PCM does not need any additional treatment, it can be used in its pure and free state, and it is about six times cheaper than a microencapsulated solution. Thus, considering that in some studies the cost of construction materials doped with PCM is very high, compromising the practical application of the technology [12], it is extremely important and useful to optimize the amount of PCM used in construction materials, mainly in mortars for interior coating, which are one of the preferred practical applications for thermal storage technology.
As advanced construction materials evolve, so do powerful tools for modeling engineering problems [13]. New digital technologies, such as artificial intelligence, make it possible to predict the behavior of composite materials [14][15][16][17][18][19][20]. Prediction of material properties and process optimization are the main areas in which machine learning is gaining popularity in materials science, due to its advantages [21] and the possibility of practical application. However, before optimization, it is necessary to implement suitable forecast models that allow the prediction of the effect that the presence and content of each raw material will have on the performance of the mortars. Chou and Tsai [19] proposed a hierarchical classification and regression approach for predicting the compressive strength of high-performance concrete, concluding that the new approach outperforms conventional flat prediction models. Yaseen et al. [20] used an extreme learning machine model to predict the compressive strength of foamed concrete, concluding that the extreme learning machine gave the most accurate predictions compared to other algorithms (multivariate adaptive regression spline, M5 tree model, and support vector machine). Young et al. [17] developed a method for predicting concrete compressive strength using three different machine learning methods (artificial neural networks (ANN), support vector machines (SVM), and decision trees) applied to a database of laboratory- and industry-scale concrete-mixture designs. The results led to the conclusion that the predictive accuracy of the models was higher for laboratory-fabricated concrete than for the industry-scale concrete mixtures.
To date, few studies have focused on predicting the mechanical properties of construction materials with PCM integration, and these studies focus on concrete and mortars functionalized with PCM microcapsules. Marani and Nehdi [22] applied different machine learning models (random forest, extra trees, gradient boosting, and extreme gradient boosting) to predict the compressive strength of cementitious composites incorporating PCM microcapsules. For this, they used an experimental database built from the open literature. The resulting machine learning models predicted the compressive strength with an accuracy within the range of 0.93 to 0.97. However, more comprehensive and specific experimental studies are needed to define the importance of different parameters and obtain a better view of the main aspects of materials science. Later, Marani et al. [23] developed a unified concrete-mixture design framework with microencapsulated PCM using a novel ternary machine learning paradigm. The authors used a tabular generative adversarial network to generate a large synthetic-mixture design database based on the limited available experimental observations. The test results showed that the gradient boosting regressor model trained on the synthetic data outperformed the model trained on the real data. Cunha et al. [24] predicted the compressive and flexural strength of mortars incorporating PCM microcapsules subjected to different temperatures. They used different data mining techniques, such as ANN, SVM, and multiple linear regression (MLR), concluding that ANN models have the best predictive capacity for compressive and flexural strength.
Currently, there are few studies on the prediction of the physical and mechanical properties of mortars with PCM incorporation using machine learning models. These studies mostly address the incorporation of PCM microcapsules into cementitious composites, mainly focusing on their mechanical behavior [22][23][24][25]. The application of prediction models to mortars incorporating non-encapsulated PCM using the direct incorporation technique remains an underdeveloped area, as does the prediction of properties related to the physical behavior of mortars. Predicting the physical properties of mortars functionalized with PCM is extremely important, since the parameters related to their porosity, such as water absorption by capillarity and by immersion, greatly influence the thermal performance of this type of material [8,12]. In addition, existing studies on the prediction of the mechanical behavior of mortars incorporating phase-change materials are still scarce and are limited to mortars activated with PCM microcapsules. Thus, this work intends to fill some of the gaps currently existing in this area of knowledge.
The originality and novelty of this paper are briefly summarized as follows:
• Prediction of the mechanical and physical characteristics of mortars incorporating non-encapsulated PCM through the direct incorporation technique, which is unprecedented in this field;
• In-depth comparative analysis of how each implemented model is able to understand the relationships between variables and how they affect the behavior of mortars with direct incorporation of PCM, which contributes to the state of knowledge in both the ML and PCM-enhanced mortar fields;
• Utilization of own experimental databases from an experimental campaign in which a novel PCM incorporation technique was studied.
The developed work makes a significant contribution to the field of sustainability, mainly concerning the construction sector. This is particularly because mortars incorporating phase-change materials contribute significantly to improving the energy efficiency of buildings, reducing the use of fossil fuels and emissions of CO₂. In addition, the use of tools for predicting the behavior of these mortars is not only innovative in this area but also allows for a significant advancement in knowledge. The adoption of machine learning models, based on existing knowledge, enables the prediction of which components have the greatest impact on the mortar formulation. This constitutes fundamental knowledge, facilitating the practical implementation of this type of construction material in the construction industry.

Raw Materials and Mortars Design
Gypsum-based mortars, cement-based mortars, and cement- and fly ash-based mortars, activated with different PCM contents (0%, 2.5%, 5%, 7.5%, 10%, and 20% of aggregate volume), were developed (Table 1). In this way, it was possible to experimentally obtain broader knowledge about the behavior of non-encapsulated PCM incorporated into mortars formulated with different binders.
The binders used were produced by Portuguese companies: the cement was supplied by Secil (Lisboa, Portugal) and the gypsum by Sival (Leiria, Portugal). The fly ash used was produced in a Portuguese thermoelectric coal-fired power plant. The fibers used are based on polyamide, with a length of 6 mm, acted as a shrinkage control agent, and were supplied by Weber (Aveiro, Portugal). The superplasticizer used is based on polyacrylate, allowing control of the water/binder ratio used in the mortars, and was supplied by BASF (Lisboa, Portugal).
The aggregate used has a natural origin. Sand 1, supplied by Weber, has a minimum dimension of 0.063 mm and a maximum dimension of 0.5 mm, a D10 of 105 µm, a D50 of 310 µm, and a D90 of 480 µm. Sand 2, supplied by Extractopuro, Lda. (Santarém, Portugal), has a minimum dimension of 0.125 mm and a maximum dimension of 4 mm, a D10 of 162.5 µm, a D50 of 0.7 mm, and a D90 of 2.8 mm. Finally, the non-encapsulated PCM used is a paraffin with a transition temperature of 22 °C, an enthalpy of 200 kJ/kg, and a maximum operating temperature of 50 °C, providing thermal storage capacity to the developed mortars. The PCM was supplied by the German company Rubitherm (Berlin, Germany). The density of the raw materials is presented in Table 2.

Experimental Methods
The databases used for the development of this work were based on experimental tests. Physical and mechanical properties of different mortars were determined according to the same test procedures. The performance of the mortars was determined after 28 days, and their curing procedure was carried out in accordance with European standardization, EN 1015-11 [26].
The physical behavior of mortars was determined based on their water absorption properties, namely the coefficient of water absorption by capillarity and water absorption by immersion.
The water absorption tests were carried out in accordance with the European standard EN 1015-18 [27]. A total of 85 samples were tested. The test specimens had dimensions of 40 × 40 × 160 mm³. After 28 days of curing, the specimens were dried in an oven at a temperature of 60 ± 5 °C until they reached a constant mass. The lateral surfaces of the specimens were coated with silicone to ensure that water contact occurred only on the specimen's bottom face. After the side waterproofing dried, the specimens were placed in contact with a water layer of approximately 6 mm. The weights of the specimens were recorded after 10 and 90 min of contact with water. The coefficient of water absorption by capillarity was determined based on Equation (1).
where C is the coefficient of water absorption by capillarity (kg/(m²·min^0.5)); M₂ is the sample mass at 90 min of water contact (g); and M₁ is the sample mass at 10 min of water contact (g).
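As an illustration of the quantities defined above, the following minimal Python sketch computes the capillarity coefficient from the two recorded masses. It assumes that the coefficient is the mass gain per unit of wetted area divided by the difference between the square roots of the two reading times; for a 40 × 40 mm face and readings at 10 and 90 min this reduces to approximately 0.1 (M₂ − M₁), the simplified form commonly associated with EN 1015-18. The function name, parameter names, and sample masses are hypothetical and are not taken from the study.

```python
import math


def capillarity_coefficient(m1_g, m2_g, face_area_m2=0.04 * 0.04,
                            t1_min=10.0, t2_min=90.0):
    """Return C in kg/(m^2 * min^0.5) from masses in grams.

    Assumption: C = (M2 - M1) / (A * (sqrt(t2) - sqrt(t1))), with the
    mass difference converted from grams to kilograms.
    """
    delta_kg = (m2_g - m1_g) / 1000.0
    return delta_kg / (face_area_m2 * (math.sqrt(t2_min) - math.sqrt(t1_min)))


# Hypothetical readings: 420.0 g after 10 min and 424.0 g after 90 min.
c = capillarity_coefficient(m1_g=420.0, m2_g=424.0)
print(f"C = {c:.3f} kg/(m2.min0.5)")
```

With these hypothetical masses, the result is close to the simplified estimate 0.1 × (424.0 − 420.0) = 0.4, which is a quick consistency check on the units.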
The water absorption by immersion tests were performed in accordance with the Portuguese specification E LNEC 394 [28]. Test specimens with dimensions of 40 × 40 × 160 mm³ were prepared, for a total of 101 samples. First, the samples were dried in an oven at 60 °C until they reached a constant mass (m₃). Next, the specimens were submerged in water at approximately 20 °C under atmospheric pressure to obtain the saturated mass (m₁). Finally, the hydrostatic mass was measured by weighing the sample in water (m₂). The water absorption by immersion was determined based on Equation (2).
where m₁ is the saturated sample mass (g); m₂ is the hydrostatic sample mass (g); and m₃ is the dry sample mass (g).
The mechanical behavior of the mortars was determined based on their flexural strength and compressive strength. The experimental tests were carried out in accordance with the European standard EN 1015-11 [26]. The flexural and compressive tests were conducted under load control at rates of 50 N/s and 150 N/s, respectively, at 28 days of curing. A total of 66 samples were tested to determine flexural strength and 126 samples to determine compressive strength. The flexural strength was determined based on Equation (3), and the compressive strength was determined based on Equation (4).
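The quantities described in this subsection can be sketched in a few lines of Python. The expressions below are assumptions based on the usual forms of the cited procedures, not formulas taken from this text: water absorption by immersion as (m₁ − m₃)/(m₁ − m₂) × 100, flexural strength from three-point bending as 1.5·F·l/(b·d²), and compressive strength as F/A. The 100 mm support span, the 40 × 40 mm loaded area, the function names, and all sample values are hypothetical.

```python
def water_absorption_by_immersion(m1_g, m2_g, m3_g):
    """Percent absorption, assuming A = (m1 - m3) / (m1 - m2) * 100.

    m1: saturated mass, m2: hydrostatic mass, m3: dry mass (all in grams).
    """
    return (m1_g - m3_g) / (m1_g - m2_g) * 100.0


def flexural_strength_mpa(load_n, span_mm=100.0, width_mm=40.0, depth_mm=40.0):
    """Three-point bending strength in MPa, assuming f = 1.5 * F * l / (b * d^2)."""
    return 1.5 * load_n * span_mm / (width_mm * depth_mm ** 2)


def compressive_strength_mpa(load_n, area_mm2=40.0 * 40.0):
    """Compressive strength in MPa as failure load over loaded area."""
    return load_n / area_mm2


# Hypothetical measurements for a single specimen.
absorption = water_absorption_by_immersion(m1_g=520.0, m2_g=300.0, m3_g=480.0)
flexural = flexural_strength_mpa(load_n=2000.0)
compressive = compressive_strength_mpa(load_n=32000.0)
print(f"{absorption:.2f} %  {flexural:.4f} MPa  {compressive:.1f} MPa")
```

Keeping the geometry as keyword arguments makes the same helpers usable if a different specimen size or support span applies.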

Data Processing and Predictive Models
Together with the formulations presented in Table 1, the results obtained in the experimental campaigns for the mechanical and physical properties of the mortars comprise the ground truth for the AI-based prediction of mortar behavior. As each formulation corresponds to different resulting properties, these essentially comprise the database for the ML model training and testing procedures. Four main properties were assessed during the experimental campaign, which correspond to the dependent variables in the models, namely compressive and flexural strength concerning mechanical properties, and water absorption by capillarity and by immersion regarding physical properties of the mortars.
With respect to the preparation of data and the modelling process, the cross-industry standard process for data mining (CRISP-DM) was adopted [29] to implement a systematic, tool- and industry-neutral approach for the analysis of the data and the training and testing of the predictive models. The CRISP-DM process involves an iterative cycle with six stages, ranging from understanding the needs and goals of the project and understanding and preparing the data to the modelling, evaluation, and implementation of the models (Figure 1). Throughout this process, the model training and evaluation stages featured the application of several different regression models to gain insight into which algorithms better fit the data. In total, for each mortar property (each constituting a dependent variable), the initial study included the implementation of seven different models, namely multiple regression (MR), decision trees (DT), principal component regression (PCR), extreme gradient boosting (xGB), random forests (RF), artificial neural networks (ANN), and support vector machines (SVM). The package rminer [30] for R [31] was used to derive the results. Given that the capacity for generalization is a critical factor for future application and model evaluation, a cross-validation method with five runs was implemented. A k-fold value of ten was selected due to the relatively small dataset size. In each run, the data were partitioned into ten folds and the model was trained ten times, each time reserving a different fold as the testing dataset, thus maximizing the use of the available data [32].
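The validation scheme described above (five runs of ten-fold cross-validation) can be sketched as follows. This is a minimal standard-library Python illustration rather than the rminer workflow used in the study; the single-feature least-squares learner, the synthetic data, and all names are hypothetical placeholders chosen only to keep the example self-contained.

```python
import random
import statistics


def repeated_kfold_indices(n_samples, k=10, runs=5, seed=0):
    """Yield (run, train_idx, test_idx) for `runs` reshuffled k-fold splits."""
    rng = random.Random(seed)
    for run in range(runs):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]  # k nearly equal folds
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield run, train_idx, test_idx


def fit_line(xs, ys):
    """Placeholder learner: one-feature ordinary least squares."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return lambda a: my + slope * (a - mx)


def cross_validated_rmse(x, y, fit, k=10, runs=5, seed=0):
    """Average over runs of the RMSE computed on all held-out predictions."""
    sq_errors = [[] for _ in range(runs)]
    for run, train_idx, test_idx in repeated_kfold_indices(len(y), k, runs, seed):
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        sq_errors[run].extend((y[i] - model(x[i])) ** 2 for i in test_idx)
    return statistics.fmean((sum(e) / len(e)) ** 0.5 for e in sq_errors)


# Synthetic, noise-free data (hypothetical): strength falls linearly with PCM content.
pcm_content = [float(i % 21) for i in range(126)]
strength = [30.0 - 0.8 * p for p in pcm_content]
score = cross_validated_rmse(pcm_content, strength, fit_line)
print(f"mean RMSE over 5 runs: {score:.6f}")
```

Because every sample is held out exactly once per run, each run produces one prediction per record, which is what allows small datasets to be used in full for both training and testing.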
A noteworthy aspect at this stage is that, as depicted in Figure 1, the modelling process recommended by CRISP-DM is iterative. This implies a constant analysis of the quality and predictive capabilities of a model as a function of the data on which it was trained, together with the iterative search for the best combination of variables used for the training process (i.e., independent variables). This process was supported both by expert knowledge in the field of PCM-enhanced mortars and the associated experimental campaigns and by the analysis of several metrics representative of model quality. Metric-wise model assessment relied not only on the correlation between the observed and the predicted values but also on the value of the error defining the degree of learning of a given model [33]. Two main metrics were used: the correlation coefficient (R²) and the root mean squared error (RMSE), calculated according to Equation (5).

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \qquad (5)$$

where y is the computed network output vector; ŷ is the target output vector; and N is the number of samples in the database.
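A minimal sketch of the two metrics is given below. The RMSE follows its standard definition; for R², the sketch assumes the squared Pearson correlation between observed and predicted values, which is one common reading of "correlation coefficient (R²)" and may differ from the exact definition used in the study. The function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical compressive strengths in MPa.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(f"RMSE = {rmse(observed, predicted):.4f}, R2 = {r_squared(observed, predicted):.4f}")
```

The RMSE is expressed in the units of the predicted property, so values are only comparable between models for the same property, whereas R² is dimensionless.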
In addition, very good insight into the quality of different models could be attained by analyzing their corresponding regression error characteristic (REC) curves as well as through sensitivity analysis concerning the importance of each variable for the predictive capability of each model. The latter analysis is also highly valuable for promoting the interpretability of the models, allowing for a better understanding of what has been learned by each one and potentially increasing trust in the corresponding model. Table 3 shows the different combinations of independent variables adopted throughout the model training iterations. These were the result of the iterative process of analyzing the significance of each variable, both throughout the study and according to expert knowledge in the field. Hence, the first variation of data corresponds to the use of all the variables ("allVars") associated with binder type as well as the contents of gypsum, cement, fly ash, sand 1 and sand 2, superplasticizer, fiber, PCM, and water (see Section 2 for the properties of each material). In turn, the database variations denominated "noFibers", "noFibers.SP", and "noFibers.SP.Water" correspond to the cumulative removal of the variables associated with fiber content, superplasticizer content, and water content, respectively.
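For readers unfamiliar with REC curves, the sketch below shows one common way to build them: for each absolute-error tolerance, the curve reports the fraction of samples predicted within that tolerance, and the area under the curve (AUC) summarizes the model in a single number. The tolerance grid, the trapezoidal integration, the normalization by the largest tolerance, and the sample values are assumptions made for illustration.

```python
def rec_curve(observed, predicted, tolerances):
    """Fraction of samples whose absolute error is within each tolerance."""
    abs_errors = [abs(o - p) for o, p in zip(observed, predicted)]
    n = len(abs_errors)
    return [sum(e <= t for e in abs_errors) / n for t in tolerances]


def area_under_curve(xs, ys):
    """Trapezoidal area under the points (xs[i], ys[i])."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))


# Hypothetical observed and predicted strengths in MPa.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
tolerances = [0.0, 1.0, 2.0, 3.0, 4.0]

accuracy = rec_curve(observed, predicted, tolerances)
normalized_auc = area_under_curve(tolerances, accuracy) / tolerances[-1]
print(accuracy, normalized_auc)
```

A model whose curve rises faster toward 1.0 keeps more predictions within small tolerances, which is why a larger AUC indicates a better fit.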

Mechanical Properties
As previously stated, the implementation of the CRISP-DM methodology involved the training and testing of several different ML models, ranging from simpler MR methods that are mainly adopted for comparison purposes to more complex models such as ANN, SVM, and RF. The exploration of these different models was accompanied by the associated iterations concerning the features listed in Table 3. One of the advantages of iterating different features over several models is that it allows for a better understanding of their ability to fit a problem, by analyzing their performance as a function of the resulting metrics. Thus, the same analysis sequence is followed for all models concerning the four main assessed mortar properties (compressive and flexural strength and water absorption by capillarity and by immersion) throughout Section 3. This sequence begins with the comparison of the predictive performance of all models for each given mortar property. This comparison is then followed by a selection of the ones featuring a better fit, which, in turn, are then analyzed in more detail.
In this context, concerning the mechanical properties of PCM mortars, specifically uniaxial compressive strength (UCS), Table 4 shows a matrix-like distribution of the model assessment metrics described by Equation (5) across the several adopted models and feature selection alternatives. From the analysis of this table, one can infer that, according to the resulting R² and RMSE metrics, the ANN has the best fit of all the models for the combination of data corresponding to "allVars" (highest R² of 0.98, with lowest RMSE of 1.03), closely followed by the RF model (R² of 0.97 and RMSE of 1.17). Given this, the regression error characteristic (REC) curve for the "allVars" data was drawn to validate the previous analysis and to provide additional insight into the behavior of these models for this database variation (Figure 2). The REC curves corroborate the findings related to Table 4, showing that the ANN outperforms the other models in terms of the area under the curve (AUC), closely followed by the RF model.
Accordingly, these results prompted a more in-depth analysis of the performance of the ANN and RF models for the prediction of UCS, carried out by plotting the values predicted by each model during its testing phase against the actual values obtained during the experimental campaign, which represent the ground truth for the models. In these plots, shown in Figure 3a,b for the ANN and the RF models, respectively, the closer the points are to the diagonal line, the better the fit and, consequently, the higher the R² value. The figures reveal that both models effectively replicated the behavior of the target variable (UCS), particularly in the lower-to-middle range (i.e., UCS values up to 20 MPa), though the values at the upper range (i.e., above 40 MPa) were slightly over- or underestimated. This discrepancy is attributed to the lower number of records in this upper range in the database, and is anticipated to diminish as the database expands during future experimental campaigns. Regardless, the ANN still seems slightly more able to provide a relatively close estimation of the values at these upper ranges.
Another significant aspect for consideration is the relative importance of the variables for both models, shown in Figure 4. The figure illustrates how significant each of the used variables (in this case corresponding to the "allVars" database variation) is for each model's prediction of UCS. It is noteworthy that, for both models, the variables related to the contents of sand (both sand 1 and sand 2) and cement are considered among the most relevant. Bearing in mind that the parameter being predicted is UCS, it is indeed intuitive that the coarser material, especially sand 2 (which is coarser than sand 1), is likely to have a greater impact on compressive strength, together with the main binding agent.
Yet, whereas the ANN model seems to follow the more conservative approach in terms of variation of importance between variables, the RF model seems to be more assertive, nearly neglecting the contributions of aspects such as the content of fibers, fly ash, binder type, or gypsum in favor of a higher significance of sand and cement content, which is more in line with the expert knowledge in the field. In addition, the ANN model seems to allocate a high level of importance to the presence of fibers. This can make sense in many cases (depending on the type of polymer and the length and width of the fiber strips), as mixing fibers into aggregates typically results in a more even distribution of stresses and increased ductility, which may result in higher compressive strength. However, the experimental campaign results did not emphasize this. In fact, while the presence of fibers may have increased ductility and even tensile strength, the direct analysis of experimental results indicates that compressive strength was not affected by it.
Conversely, the RF model assigns only minimal importance to the content of PCM, which appears to be undervalued relative to what expert knowledge would suggest. This expectation stems from the fact that the addition of PCM, especially when directly incorporated in the form of a paraffin (as was the case throughout the experimental campaign), delays the hydration process of the binders, which ultimately leads to a reduction in mechanical performance in most cases. Despite this, the overall UCS results seem to indicate that, even though both models attained a very good performance in terms of metrics, the prioritization of variable influence is slightly more intuitive in the case of RF when compared to expert knowledge in the field.
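The relative importances discussed above are obtained through sensitivity analysis. The sketch below illustrates a simplified one-dimensional variant under stated assumptions: each input is swept across its observed range while the others are held at their mean values, and the spread of the model response is normalized into a relative importance. It is not necessarily the procedure used in the study; the toy model, the variable names, and the mix values are hypothetical.

```python
def sensitivity_importance(predict, rows, levels=7):
    """Relative importance from a one-at-a-time sweep of each input.

    predict: callable taking a dict of inputs and returning a number.
    rows: list of dicts sharing the same keys (the training inputs).
    """
    names = list(rows[0])
    baseline = {n: sum(r[n] for r in rows) / len(rows) for n in names}
    spans = {}
    for name in names:
        low = min(r[name] for r in rows)
        high = max(r[name] for r in rows)
        responses = []
        for step in range(levels):
            probe = dict(baseline)  # all other inputs stay at their mean
            probe[name] = low + (high - low) * step / (levels - 1)
            responses.append(predict(probe))
        spans[name] = max(responses) - min(responses)
    total = sum(spans.values())
    return {name: (span / total if total else 0.0) for name, span in spans.items()}


# Hypothetical mixes and a toy model standing in for a trained ANN or RF.
rows = [
    {"cement": 400.0, "pcm": 0.0, "fibers": 0.0},
    {"cement": 500.0, "pcm": 10.0, "fibers": 0.5},
    {"cement": 600.0, "pcm": 20.0, "fibers": 1.0},
]


def toy_model(mix):
    # Strength rises with cement and falls with PCM; fibers are ignored.
    return 0.05 * mix["cement"] - 0.4 * mix["pcm"]


importance = sensitivity_importance(toy_model, rows)
print(importance)
```

In this toy case the ignored input receives zero importance, which mirrors how a trained model can effectively neglect a variable that expert knowledge would still consider relevant.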
As far as mechanical properties are concerned, the other PCM-enhanced mortar parameter studied in this work was flexural strength. Similarly to the process adopted for UCS, the first step taken in the analysis of flexural strength was metrics-based, as detailed in Table 5. Although the metrics seem to indicate that the SVM model is capable of obtaining a slightly higher performance under the "noFibers" database variation (R² of 0.84 and RMSE of 1.01), this variation is also accompanied by a slightly worse performance of every other model when compared to the "allVars" database. Moreover, considering that expert knowledge in the field indicates that the inclusion of fibers in mortars enhances flexural strength by increasing ductility and tensile strength (due to the fibers' ability to bridge cracks forming under tensile stress), the approach adopted for analyzing these parameters was to resort to the "allVars" database. The reason behind this choice is that the inclusion of the additional variables (in this case related to fibers) can potentially provide added insights, particularly concerning the relative importance of variables.
Following this decision, the REC curves depicted in Figure 5 were assessed in order to confirm the metrics-based indication that the SVM, ANN, and RF models outperform most of their peers, with the SVM featuring a slightly higher AUC. The seemingly higher performance of the SVM model is further supported by the predicted vs. actual values plots presented in Figure 6, at least in the lower-to-mid range of values (i.e., below 10 MPa).
Bearing in mind that flexural strength features a component related to compression and another related to tensile strength, the expectation on the relative importance of variables (Figure 7) is that not only should the sand and cement content continue to display a high significance on the results (as the main contributors to compressive strength), but the fact that the cement, together with the fiber content, are the major factors influencing tensile strength should enhance their relative importance further.
Sustainability 2024, 16, x FOR PEER REVIEW 10 of 20
In this context, and similarly to the UCS case, the ANN model once again adopted a more conservative approach, while correctly identifying the sand content (especially sand 2), the cement content, and the fibers as highly relevant, in line with expectations. The RF model also performed similarly to the UCS case, providing a more assertive choice of the most important factors, namely both sands and especially the cement content, which fits the expert knowledge. Still, this was achieved at the expense of other factors that seem to be undervalued, specifically the presence of fibers. The SVM model, while behaving similarly to the RF in terms of selection assertiveness, allocated an extremely high importance to the water content. Although water content is obviously important in the mechanical behavior of mortar, favoring this factor to the detriment of those most typically related to mechanical performance hinders the generalization potential of this model, even though its assessment metrics were among the best of all models. In summary, the more conservative approach that characterized the ANN model comes across as the best fit for the estimation of the flexural strength behavior of PCM-enhanced mortars.

Physical Properties
As mentioned, the physical properties of PCM-enhanced mortars considered in this study were water absorption by capillarity and by immersion. Beginning with the former and following the same methodology adopted in the previous subsection on mechanical properties, Table 6 presents the assessment of the seven implemented models for the different database combinations. Once again, though some of the models perform well over all databases, namely ANN, RF, and, to a slightly lesser extent, SVM, there is no clear gain in adopting one of the less encompassing database variations over the "allVars" variation for this parameter. In fact, except for the RF model, which seems to have a slight increase in R² of 1% for the "noFibers" variation, the performance of these models tends to decrease as the number of variables is reduced, indicating that all variables are relevant for the prediction of water absorption by capillarity. Figure 8, which compares the REC curves of the models, seems to support the claim that the ANN model displays the best fit for this parameter, followed by the competing RF and SVM models. Regarding the predicted vs. actual value plot analysis, depicted in Figure 9, the ability of the ANN to predict the behavior of the mortar in terms of water absorption by capillarity over the entire range of the data is noteworthy. Indeed, even at the upper ranges, which are characterized by a lack of data, the ANN shows a very good fit to the data, corroborating the high R² and low RMSE that characterized this model. This is further validated by the fact that its selection of variables in terms of relative importance (Figure 10) seems very reasonable, as it highlighted water content and the finer materials, such as the sand content, especially sand 1. It also assigned a moderate importance to superplasticizer and PCM, which fits the expert knowledge in the field. Indeed, whereas the former reduces the amount of water typically added to the mortar mixes (which in turn translates into lower porosity and thus less water absorption by capillarity), the latter tends to enfold the aggregate components of the mixes, especially when directly incorporated in the form of paraffin, hindering the absorption of water by capillarity. The duality between superplasticizer and water content also seems to have been identified by the RF and the SVM models, although these tended to favor the superplasticizer and the water content (respectively) much more than the corresponding counterpart. Thus, the analysis of results concerning water absorption by capillarity confirms the ANN model's effective fit to the prediction of this parameter.
The second and final physical property of mortars with direct incorporation of PCM addressed in this work is water absorption by immersion. Beginning once again with the interpretation of the assessment metrics of the implemented models, detailed in Table 7, one can immediately infer that the overall R² and RMSE values are less favorable than the metrics obtained in the study of the other mortar parameters (both mechanical and physical). This is likely to be related to a much higher dispersion in the results of the experimentally tested samples, as a consequence of the interaction between the directly incorporated PCM and the mortar aggregates. As a matter of fact, the direct incorporation of PCM into the mortar tends to result in the aggregates being enfolded by the PCM paraffin in several layers, which are randomly distributed throughout the mortar. In turn, this constitutes a major factor contributing to a high variation in results concerning the absorption of water by immersion, ultimately hindering the accurate estimation of this aspect in the studied mortars. Naturally, as the experimental campaign proceeds towards gathering additional data, this hindrance is expected to be gradually mitigated.
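The effect of result dispersion on the two assessment metrics can be illustrated with a minimal sketch. It assumes that R² is the usual coefficient of determination and that RMSE is the root of the mean squared residual; the sample values and the function names `r2_score` and `rmse` are hypothetical and are not taken from the experimental database.

```python
import math

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean squared error, in the units of the target variable."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

# Hypothetical target values (illustrative only, not measured data)
actual = [10.0, 12.0, 15.0, 20.0, 30.0]
# Predictions with a small, constant error
tight = [a + 0.5 for a in actual]
# Predictions with a larger, scattered error
scattered = [a + e for a, e in zip(actual, [4.0, -5.0, 3.0, -6.0, 5.0])]

for label, pred in (("tight", tight), ("scattered", scattered)):
    print(label, round(r2_score(actual, pred), 3), round(rmse(actual, pred), 3))
```

In this toy case the scattered predictions yield a lower R² and a higher RMSE, which is the pattern one would expect when the experimental results themselves are more dispersed.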
Notwithstanding this fact, the current metrics-based assessment of models seems to indicate that the "noFibers" database variation yielded two reasonably consistent models for the prediction of water absorption by immersion, namely the RF (0.70 R² and 3.72 RMSE) and the ANN (0.61 R² and 4.19 RMSE). A reasonable performance can also be found in the database variations with fewer variables, possibly as a result of a higher difficulty for the models to understand the relationships between variables and identify patterns, which is also a consequence of the higher dispersion of results. However, taking into account that the gains in predictive performance with the reduction of variables are not conclusive, the "noFibers" database variation was selected as the one with the most available information for the purpose of comparative analysis as well as potential for additional insight. While the SVM model initially appears capable of competing with the RF and the ANN models when observing the subsequent REC curves shown in Figure 11, the two latter models quickly overcome the former, outperforming it in terms of AUC.
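For readers less familiar with REC curves, the comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the error is taken as the absolute deviation, the tolerance grid is arbitrary, and the AUC is computed with the trapezoidal rule and normalized by the tolerance range. The data and the names `rec_curve` and `rec_auc` are hypothetical.

```python
def rec_curve(actual, predicted, tolerances):
    """Fraction of samples whose absolute error is within each tolerance."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    n = len(errors)
    return [sum(e <= tol for e in errors) / n for tol in tolerances]

def rec_auc(tolerances, accuracies):
    """Trapezoidal area under the REC curve, normalized to the range [0, 1]."""
    area = 0.0
    for i in range(1, len(tolerances)):
        width = tolerances[i] - tolerances[i - 1]
        area += width * (accuracies[i] + accuracies[i - 1]) / 2.0
    return area / (tolerances[-1] - tolerances[0])

# Hypothetical observations and predictions from two models (illustrative only)
actual = [2.0, 4.0, 6.0, 8.0]
model_a = [2.5, 4.5, 5.5, 8.5]   # absolute error of 0.5 on every sample
model_b = [3.5, 2.5, 7.5, 6.5]   # absolute error of 1.5 on every sample

tolerances = [i * 0.25 for i in range(9)]  # 0.0, 0.25, ..., 2.0
auc_a = rec_auc(tolerances, rec_curve(actual, model_a, tolerances))
auc_b = rec_auc(tolerances, rec_curve(actual, model_b, tolerances))
print(auc_a, auc_b)
```

A model whose curve rises earlier (more samples predicted within a small tolerance) accumulates a larger area, which is why the AUC serves as a single-number summary when curves cross at low tolerances.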
The aforementioned higher dispersion of results is evident in the predicted vs. actual value plots depicted in Figure 12, once again showing the clear difficulty experienced by both models in estimating values in the mid-to-high ranges, which are characterized by a lower amount of data in comparison to the lower ranges. Nonetheless, it is possible to observe that the RF model predictions seem to be closer to the actual values, as conveyed by their closer proximity to the 45° line in the figure, substantiating the better values achieved by this model in terms of the previous metrics.
Regarding the relative importance of variables assumed by each model, the RF model strongly points to the PCM and water content as the paramount variables, which together account for nearly 75% of the total importance of all variables in this model. This is consistent with the previously described behavior of the PCM paraffin enfolding the mortar aggregates, partially isolating the aggregates randomly throughout the mortar body, and subsequently resulting in a strong influence over its rate of water absorption by immersion. In contrast, the typical pattern of the ANN model, characterized by the tendency to distribute the weights of relative importance slightly more evenly, seems to slightly hamper its predictive ability in this case, ultimately supporting the RF model's metrics-based indication of a better fit in the context of water absorption by immersion behavior prediction (Figure 13).
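One common way of obtaining percentages of relative importance such as those discussed above is permutation importance, sketched below. This is an assumption made for illustration and is not necessarily the procedure used to produce Figure 13; the stand-in predictor `toy_model`, its coefficients, and the generated data are all hypothetical.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of a predictor over a set of rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def relative_importance(model, rows, targets, names, repeats=20, seed=0):
    """Permutation importance, normalized so that all variables sum to 100%."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    raw = {}
    for j, name in enumerate(names):
        increase = 0.0
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between variable j and the target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increase += mse(model, shuffled, targets) - baseline
        raw[name] = max(increase / repeats, 0.0)
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}

# Hypothetical stand-in for a trained model: depends on PCM and water only
def toy_model(row):
    pcm, water, fibers = row
    return 30.0 - 0.8 * pcm - 0.5 * water

names = ["pcm", "water", "fibers"]
data_rng = random.Random(42)
rows = [[data_rng.uniform(0, 20), data_rng.uniform(10, 20), data_rng.uniform(0, 1)]
        for _ in range(40)]
targets = [toy_model(r) for r in rows]

importance = relative_importance(toy_model, rows, targets, names)
print({name: round(value, 1) for name, value in importance.items()})
```

Because the stand-in predictor ignores the fiber content, shuffling that column leaves the error unchanged and its share is zero, whereas the variables the predictor actually relies on absorb the whole 100%.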

Conclusions
This work aimed to explore the potential of machine learning to predict the behavior of mortars with direct incorporation of PCM, based on our own experimental databases, contributing to an underdeveloped research area with great research needs. Thus, this work adds new knowledge to what is currently known about mortars incorporating microencapsulated PCM. It also presents the application of machine learning models to predicting the physical and mechanical behavior of mortars based on different binders (cement and gypsum). Not only is this implementation of machine learning models aimed at providing insight into whether these models are capable of understanding how the mortar constituents affect its behavior, but this paper also includes a comparative study on which models display the best fit to the data for each predicted variable.
Thus, four mortar parameters were studied (two mechanical parameters, specifically compressive and flexural strength, and two physical parameters, namely water absorption by capillarity and by immersion) over several database variations encompassing different combinations of variables, ranging from the content of aggregates, binders, and PCM to the use of fibers, superplasticizer, gypsum, or fly ash in the mortar mixes. Based on the results, it was possible to carry out a comparative analysis of the implemented models' ability to understand the relationships between variables and their impact on the mortars' behavior.
The results show potential, as ML models, specifically random forest and artificial neural network, were demonstrated to achieve a very good fit for the prediction of the four target variables.The results were assessed by several different metrics and analyses, which were validated and strongly supported by resorting to expert knowledge in the field of PCM-enhanced mortars and associated experimental campaigns.The proposed models also represent pre-design tools at the project stage, allowing a reduction in the number of experimental samples, saving time and resources.
The limitations of this study are related to the possibility of applying it to other mortars, since its generalization is conditioned by the type of binders, sands, and PCM used. Therefore, future work is proposed as a response to this limitation, namely the following:
• The database can be increased in terms of the number of results, but essentially by including other similar experimental works based on different raw materials (binders, aggregates, and PCM types) to increase its application/generalization potential. Regarding PCMs, it will be important to include PCMs of inorganic nature and eutectic mixtures in addition to PCMs of organic nature;
• The ability to predict the behavior of PCM-enhanced mortars can be expanded towards the ability to select the best combination of variables for a certain mortar application. This can be achieved by implementing an optimization algorithm capable of resorting to the predictive capabilities of the ML models to ascertain the best combination of mortar components (e.g., content of PCM, sand, cement, and water) to produce a mortar with the target mechanical and physical characteristics for specific uses.
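The second proposal could take a form along the lines of the following sketch, which uses a plain random search purely for illustration. Everything in it is an assumption: `predict_strength` is a hypothetical stand-in for a trained ML model, the component ranges and units are arbitrary, and the objective (maximizing the PCM content subject to a minimum predicted strength) is only one of many possible formulations.

```python
import random

def predict_strength(mix):
    """Hypothetical surrogate for a trained model (not a fitted relationship)."""
    return 40.0 - 0.9 * mix["pcm"] + 0.02 * (mix["cement"] - 400.0)

def search_mix(predictor, min_strength, n_candidates=2000, seed=0):
    """Random search: highest PCM content whose predicted strength meets the target."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        mix = {
            "pcm": rng.uniform(0.0, 20.0),        # illustrative range
            "cement": rng.uniform(300.0, 600.0),  # illustrative range
        }
        if predictor(mix) < min_strength:
            continue  # discard candidates that miss the target property
        if best is None or mix["pcm"] > best["pcm"]:
            best = mix
    return best

best_mix = search_mix(predict_strength, min_strength=25.0)
print(best_mix, round(predict_strength(best_mix), 2))
```

A more elaborate optimizer could replace the random search without changing the overall structure, since the ML model only enters through the predictor function.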

Figure 4. Relative importance of variables for UCS predictive performance of both ANN and RF models ("allVars" database variation).

Figure 5. REC curve for flexural strength predictive performance under the "allVars" data variation.

Figure 7. Relative importance of variables for flexural strength predictive performance of ANN, RF, and SVM models ("allVars" database variation).

Figure 8. REC curve for water absorption by capillarity predictive performance under the "allVars" data variation.

Figure 10. Relative importance of variables for water absorption by capillarity predictive performance of ANN, RF, and SVM models ("allVars" database variation).

Figure 11. REC curve for water absorption by immersion predictive performance under the "noFibers" data variation.

Figure 13. Relative importance of variables for water absorption by immersion predictive performance of ANN and RF models ("noFibers" database variation).

Table 3. Adopted database variations featuring the different independent variables.

Table 4. Obtained metrics for every adopted model trained on the different database variations for UCS. Best values marked as background green; lower values marked as background orange.

Table 5. Obtained metrics for every adopted model trained on the different database variations for flexural strength. Best values marked as background green; lower values marked as background orange.

Table 6. Obtained metrics for every adopted model trained on the different database variations for water absorption by capillarity. Best values marked as background green; lower values marked as background orange.

Table 7. Obtained metrics for every adopted model trained on the different database variations for water absorption by immersion. Best values marked as background green; lower values marked as background orange.
