A Comprehensive Study on the Styrene–GTR Radical Graft Polymerization: Combination of an Experimental Approach, on Different Scales, with Machine Learning Modeling

The study of the styrene–Ground Tire Rubber (GTR) graft radical polymerization is particularly challenging due to the complexity of the underlying kinetic mechanisms and the nature of GTR. In this work, an experimental study on two scales (∼10 mL and ∼100 mL) and a machine learning (ML) modeling approach are combined to establish a quantitative relationship between operating conditions and styrene conversion. The two-scale experimental approach makes it possible to verify the impact of upscaling on thermal and mixing effects, which are particularly important in this heterogeneous system, as also evidenced in previous works. The adopted experimental setups are designed with a view to producing multiple data, while paying specific attention to data reliability by eliminating the uncertainty related to sampling for analyses. At the same time, all the potential sources of uncertainty, such as the mass loss along the different steps of the process and the precision of the experimental equipment, are also carefully identified and monitored. The experimental results on both scales validate previously observed effects of GTR, benzoyl peroxide initiator and temperature on styrene conversion but, at the same time, reveal the need for an efficient design of the experimental procedure in terms of mixing and the monitoring of uncertainties. Subsequently, the most reliable experimental data (i.e., 69 data points from the 10 mL system) are used for the screening of a series of diverse supervised-learning regression ML models and the optimization of the hyperparameters of the best-performing ones. These are gradient boosting, multilayer perceptrons and random forest with, respectively, a test R² of 0.91 ± 0.04, 0.90 ± 0.04 and 0.89 ± 0.05.
Finally, the effect of additional parameters, such as the scaling method, the number of folds and the random partitioning of data in the train/test splits, as well as the integration of the experimental uncertainties in the learning procedure, is explored as a means of improving the performance of the developed models.


Introduction
As the consequences of climate change have become more and more visible in recent years, it has become a priority for all sectors to reduce the environmental impact of their activities, a shift also encouraged by the increasing number of policies going in this direction. In this sense, greener ways to deal with the increasing amount of end-of-life tires are needed to avoid their ending up in landfills or being burned for energy recovery [1][2][3][4].
One environmentally-friendly and popular solution consists of transforming the rubber parts of tires into a micrometric powder, commercially called Ground Tire Rubber (GTR), by mechanical grinding. This powder is subsequently used as raw material or additive in other products of interest for diverse applications. In particular, the elastomeric properties of GTR make it an interesting filler agent to improve the mechanical properties of a wide range of materials including thermoplastics, thermosets, virgin or composite rubbers, concrete, bitumen or asphalt [1][2][3][5][6][7][8][9]. For example, the introduction of GTR fillers in polystyrene (PS), which is investigated in this work, can produce a composite material with improved stress cracking resistance and impact strength, with respect to the brittle pure PS matrix, and with reduced material costs.
However, the different structures of the two phases (i.e., the PS matrix and GTR, GTR having the form of a 3D crosslinked network) present a challenge in terms of their compatibility and the resulting poor mechanical properties of the final blend. Accordingly, several compatibilization strategies can be envisioned, such as devulcanization and reclamation or surface activation [2,3,5,6,10,11]. In this work, a chemical surface activation strategy in the form of in situ cross-linking by grafting polymerization, which is among the most prominent GTR surface modification techniques [6], is employed to produce PS-GTR composites. More specifically, PS is directly grafted onto the surface of GTR particles that are present during the radical polymerization of styrene. Similar in situ bulk radical polymerization techniques have been used to finely disperse and incorporate not only rubber particles in polymer matrices [4,[12][13][14], but also other nanofillers, such as for the synthesis of graphene-based nanocomposite materials [15][16][17][18][19][20][21], carbon tube-based nanocomposite materials [22,23] and many others [24][25][26].
A second challenging aspect of this process lies in the complexity of the GTR itself as raw material [2]. GTR is effectively obtained from tires of diverse origins and/or with unknown compositions of rubbers and additives, depending on the vehicle type and the required performance. This limits the mastery of the process and reduces the capacity of a detailed modeling of the system to better understand the link between the process parameters (e.g., temperature/time of the reaction and initial compositions of reactants) and the final productivity and product quality indicators (e.g., styrene conversion rate and grafting efficiency). For example, a previous work demonstrated that additives and moieties found in varying compositions in GTR, such as carbon black, display a significant effect on the course of the polymerization [27]. Nevertheless, developing a model capable of handling this inherent system complexity and predicting the effects of the process conditions on the desired indicators of the produced GTR-based composite would be extremely beneficial to the associated recycling and circular economy sectors.
This work builds on previously reported results of an ongoing study on the styrene-GTR radical graft polymerization system [27][28][29]. More specifically, a part of this ongoing study has already focused on the modeling of the system via the introduction of a generalized kinetic scheme that has been employed to describe the evolution of the polymerization in the presence of different amounts of GTR and/or benzoyl peroxide (BPO) initiator, under different reaction temperatures [29]. The mathematical model proposed in that work presented an initial framework for the consideration of the chemical reactions that might occur in the system, as a consequence of the presence of GTR (i.e., as opposed to the presence of pure polybutadiene rubber that is considered in most reported studies [30][31][32][33]), as well as a methodology for the quantification and the gradual "masking" of reaction sites on the surface of GTR. In the present work, a complementary approach is adopted as a means to circumvent the lack of knowledge on the exact composition of the system and the associated chemical developments. Accordingly, the present study focuses exclusively on the case where GTR is present in the system, contrary to [29] where both the pure styrene homopolymerization and the GTR-grafting cases were investigated.
Machine learning (ML) is a prominent tool in polymer materials design and property prediction that allows saving significant time and effort, related to the development of purely phenomenological modeling approaches, when the underlying phenomena are not completely elucidated and/or highly complex [34,35]. Accordingly, this work presents the implementation of a data-driven modeling approach, on the basis of supervised ML techniques, under varying operating conditions. The developed ML models are implemented as standalone regression models for the prediction of the monomer conversion developments, in an attempt to assess their suitability in modeling this highly complex polymerization system without using any prior knowledge or mechanistic description of the underlying phenomena, contrary to previous works [27][28][29]. As a perspective for future work, the findings of this work will be used in the development of a hybrid modeling approach where both phenomenological and ML models of the system will be combined.
The proposed ML modeling framework necessitates the existence of experimental data to serve during the training of the models and the subsequent testing of their generalization capacity. However, the use of the previously reported bench-scale experimental setup [28] was prohibitive for the production of sufficient amounts of data for the ML model (i.e., under a wide range of process conditions), due to its highly time-consuming nature. In addition, this system also presented a limited capacity of controlling the homogeneity of temperature and composition within the reactor, especially at high GTR loading [28]. Therefore, in order to produce the required experimental data for this work, with the lowest possible uncertainty and within a reasonable amount of time, the grafting polymerization experiments were performed on laboratory-scale experimental setups, namely a small-sized system (∼10 mL) and a medium-sized one (∼100 mL). In the small-sized system, a parallel realization of grafting polymerizations in sealed test tubes, placed in a heating bath and without internal agitation, was carried out. On the other hand, the medium-sized system was based on an ad hoc experimental setup allowing the simultaneous realization of the polymerization reaction in six glass reactors, under controlled temperature and internal agitation. Both systems served to produce multiple samples rapidly and to investigate the effect of scale-up on the thermal and compositional homogeneity within the reacting mixture. In both cases, small volumes of reactants were used in order to allow using the complete reactor content in the post-reaction analyses, thus limiting the uncertainties related to the sampling of the heterogeneous mixture.
Note that most reported studies are either limited to low GTR loadings (i.e., up to 30% wt.) or concern different polymer matrices and/or compatibilization strategies [10,12,[36][37][38][39][40][41][42]. At the same time, the precision and repeatability of the results are rarely discussed. To the authors' knowledge, this is the first time that the styrene-GTR radical graft polymerization system has been comprehensively investigated, for a wide range of GTR content, by combining a two-scale experimental approach with a ML modeling strategy. All the produced experimental data, as well as the developed mathematical models, are readily available through a GitHub repository. In addition, an extensive analysis of the uncertainty introduced in the different steps of the experimental process is also included in the discussion of the results.

Materials
The reagents used for the polymerization reactions are styrene monomer ReagentPlus® (with a purity ≥ 99%, stabilized with 4-tert-butylcatechol) and BPO initiator Luperox® A75. They were purchased from Sigma-Aldrich Chemie GmbH (Steinheim, Germany) and Sigma-Aldrich Co. (St. Louis, MO, USA), respectively, and both were used without further purification. GTR particles were obtained from DeltaGom France and used without further purification as well. The characteristics of GTR have been discussed elsewhere [27,28]. Although the presence of stabilizers and impurities can greatly affect polymerization kinetics, these were not removed, in the perspective of implementing this process at a larger scale. Therefore, the idea in this work was to use GTR as received, without further purification with organic solvents, to avoid the generation of chemical waste and to reduce production costs.

Small-Sized System
On the small scale, reactions took place in glass test tubes containing a total of 1.5 g of reactants (i.e., styrene, BPO and GTR) at various proportions. The test tubes were first filled with the desired amounts of GTR. The mixtures of monomer and initiator were preliminarily prepared in beakers and then introduced directly onto the GTR, in the corresponding test tubes, followed by sealing of the tubes. Once the oil bath reached the target temperature, about ten test tubes were introduced in the bath and kept submerged during the whole reaction time. Occasionally, the sealing of the tubes was loosened to allow pressure relief in case of gas buildup. At the end of each reaction, the test tubes were immediately placed into a cold water bath for cooling. Due to the very low volumes, the heating and cooling times of the tubes were limited to a few minutes (i.e., typically less than three minutes). Finally, the test tubes were unsealed and placed in a vacuum oven at 40°C until constant weight, signifying the complete evaporation of any remaining styrene. The measured mass loss during this stage served in the calculation of styrene conversion, according to:

X_styrene = (m_i,styrene − m_f,styrene) / m_i,styrene (1)

where m_i,styrene denotes the mass of styrene that was introduced in the test tube and m_f,styrene is the recorded mass loss during vacuum evaporation.
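The conversion calculation can be sketched as follows, assuming, as described above, that the mass loss on vacuum drying corresponds to the unreacted styrene; the function name and the mass values are illustrative, not measurements from the study.

```python
def styrene_conversion(m_i_styrene, m_f_styrene):
    """Styrene conversion from Equation (1).

    m_i_styrene: mass of styrene introduced in the test tube (g)
    m_f_styrene: mass loss recorded during vacuum evaporation, i.e.
                 the unreacted styrene that evaporates (g)
    """
    return (m_i_styrene - m_f_styrene) / m_i_styrene

# Hypothetical example: 1.0 g of styrene introduced, 0.35 g lost on drying
print(styrene_conversion(1.0, 0.35))  # 0.65, i.e. 65% conversion
```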

Medium-Sized System
Figure 1 depicts a schematic of the experimental system, which was designed to perform six polymerization reactions simultaneously under identical heating, stirring, inerting and refrigeration conditions. More precisely, Figure 1a corresponds to a side view of the system (i.e., showing only three of the six reactors), while the top view proposed in Figure 1b makes it possible to better visualize all six positions. Besides these schematics, photos of the actual system are provided in Appendix A. In the following, the elements of the system are described.

• The reactors were designed with a limited volume (<100 mL) to keep the duration of sample analysis after polymerization reasonable (note that the whole reactor content is analyzed), while enabling reactions with sufficient amounts of GTR. The collars are large enough to facilitate the introduction of the reactants and the recovery of the products, the latter being more or less viscous depending on the operating conditions. The reactor diameter remains constant along the reactor height (except at the bottom), which facilitates the introduction and removal of the agitator. The reactor material (glass) facilitates the observation of the mixing conditions. Finally, a lateral orifice enables the occasional introduction of a thermocouple, inside the reactor, for temperature measurement.
• A 6-cm thick aluminum plate, containing 6 drilled reactor holdings, was used for reactor temperature control (Figure 1b,c). This plate was heated via conduction by an electrical heating plate on which it was directly placed. Silicone oil was also added in the holding positions to maximize the heat transfer between the aluminum plate and the reactors.
• A cooling system was also installed at the upper part of the setup, enabling the monomer vapors to be cooled down during the reaction. It was composed of a glass tube, positioned on top of each reactor and wrapped with a transparent hose in which glycerol circulated, at a temperature of 3.5°C, from a cooling thermostat bath.
• The mechanical agitator (Figure A1c) was designed with double propellers and an anchor to better scrape the reacting mixture from the walls and bottom of the reactors. In fact, since the mixture of GTR with PS became quite sticky during the polymerization, it was important to make sure that the reactor content would remain under mixing throughout the polymerization, without forming an inverse bell shape with the agitator spinning in a void in the middle of it. The rotation speed was fixed at 30 rpm. A pulley was fixed on top of each stirring axis and a belt system made the 6 stirring axes rotate simultaneously (Figure A1b). The system was designed to keep the 6 axes parallel, thus avoiding the stirrers scratching and damaging the reactor walls.
• Nitrogen inerting, before the reactions, was also implemented via specifically designed inlets on the seals of the glass tubes and a dedicated nitrogen feeding network.

All reactors were first filled with the desired amounts of GTR. The respective mixtures of monomer and initiator were preliminarily prepared in beakers and then introduced in the corresponding reactors directly onto the GTR. An initial blend with the mechanical agitator was performed manually to ensure a good distribution of the mixture of monomer and initiator within the GTR particles. The experimental system was then mounted as shown in Figure 1a. Reactors were sealed and purged with nitrogen while initiating the cooling system and the mechanical agitation. The heating plate was turned on only once the cooling thermostat had reached its set-point temperature. During the polymerization, a temperature measurement was taken inside each reactor every 15 min during the first hour, then every hour until the end of the reaction. For these measurements, agitation was momentarily paused. In general, mixtures with low GTR content displayed higher temperature variations during the initial stages of the reaction but, eventually, all reactors reached a steady reaction temperature.
At the end of the polymerization, the reactors were removed from the experimental system and cooled down in an ice bath to room temperature. The full contents of the reactors were removed, weighed and then transferred to a vacuum oven at 40°C until constant mass. Throughout the different steps of the experimental procedure, special care was taken to measure eventual mass losses of reactants and products (i.e., due to their transfer between recipients or due to evaporation). The uncertainty related to the weighing equipment was also considered. The conversion of styrene during the polymerization reaction, X_styrene, was determined on the basis of an expression similar to Equation (1):

X_styrene = (m_i,styrene − m_f,styrene − ∆) / m_i,styrene (2)

where ∆ accounts for the total loss of styrene, from the preparation steps to the final measurement, as detailed in Appendix B. Finally, the measurement uncertainty, due to the equipment used, was assessed on the basis of the principle of error propagation [43] (cf. Appendix B for details).
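A minimal sketch of the error-propagation principle mentioned above, assuming a conversion expression of the form X = (m_i − m_f − ∆)/m_i and a uniform balance precision of ±0.1 mg on each weighed mass; the exact mass balance and uncertainty budget of Appendix B may differ. The input masses are hypothetical.

```python
import math

def conversion_uncertainty(m_i, m_f, delta, sigma=0.0001):
    """First-order error propagation for X = (m_i - m_f - delta) / m_i,
    assuming each weighing carries the same balance precision sigma
    (default 0.1 mg = 0.0001 g) and independent errors."""
    # partial derivatives of X with respect to each measured mass
    dX_dmi = (m_f + delta) / m_i**2
    dX_dmf = -1.0 / m_i
    dX_dd = -1.0 / m_i
    # combine the contributions in quadrature
    return math.sqrt((dX_dmi * sigma) ** 2
                     + (dX_dmf * sigma) ** 2
                     + (dX_dd * sigma) ** 2)

# Hypothetical masses: 10 g styrene in, 3 g evaporated, 0.05 g lost in handling
print(conversion_uncertainty(10.0, 3.0, 0.05))
```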

Design of Experiments
A wide range of operating conditions, for the different experimental runs, was defined during this study in order to map the widest possible domain of the feasible experimental space. These were defined in terms of three principal conditions, namely the reaction temperature, the peroxide initiator content and the GTR loading, as shown in Table 1. In particular, an effort was made to avoid limiting the amount of GTR to very low values, as is often the case in other reported studies [12,36], which would also be coherent with an ultimate objective of maximizing the recycling of end-of-life tires [10] in an eventual industrial realization of the system. In this respect, a GTR loading as high as 70% was tested. On the basis of these established ranges, D-optimal designs of experiments were prepared for both experimental setups, considering a maximum reaction time of 8 h per experiment. These are depicted in Figure 2 for both systems.
However, along the experiments and in view of the obtained results, it was decided to slightly modify the initial design by introducing a few additional points to better investigate a specific experimental domain of interest. These points are shown in Figure 2, in orange for the small-sized system and in green and yellow for the medium-sized system at 90°C and 100°C, respectively. Conversely, some points were omitted due to extremely low reaction advancement for the adopted reaction temperature and duration (i.e., practically zero styrene conversion) or an important temperature overshoot within the first minutes of the reaction. All the realized experiments, along with the measured conversion values, are grouped in Appendix C.

[Table 1. Factors of the design of experiments with their minimum and maximum values.]

ML Algorithms
Data-driven modeling is based on the premise that patterns and trends, which are otherwise difficult to identify or extract on the basis of knowledge and/or observation, can be identified within data coming from a system or process. Among the most common data-driven approaches encountered in the modeling of physicochemical systems are response surface methodology (i.e., as part of design of experiments) and ML methods, the latter being particularly adapted to highly complex or multidimensional problems [35]. In the present work, ten popular supervised ML regression models were initially screened on the produced experimental data of the grafting polymerization system to identify the ones best suited to the specific problem characteristics. Indeed, each ML technique can be more or less suitable to a modeling problem, depending on different factors such as the quantity and nature of the data and the form of the underlying sought pattern. During this initial screening, the parameters of the tested models were fitted to match the experimental data, via the model training step, but without any further optimization of the hyperparameters (HPs) of these models. The tested models belong to different categories, including linear (linear regression (LR) and two of its regularized counterparts, namely ridge and lasso) and non-linear ones (support vector regression (SVR), Gaussian processes (GP) and multilayer perceptrons (MLP)), with extended application in polymer science [34]. In addition, similarity-based (k-nearest neighbors (kNN)), decision tree (DT) and ensemble (random forest (RF) for bagging and gradient boosting (GB) for boosting) methods were also tested.
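A screening of this kind can be sketched as below with scikit-learn, looping the ten model families over a common cross-validation. The data here are synthetic stand-ins for the 69 experimental points (operating conditions as inputs, conversion as output); the actual inputs, targets and default settings of the study may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(69, 4))  # placeholder operating conditions
y = X @ np.array([0.5, -0.3, 0.2, 0.4]) + 0.05 * rng.normal(size=69)

models = {
    "LR": LinearRegression(), "Ridge": Ridge(), "Lasso": Lasso(),
    "SVR": SVR(), "GP": GaussianProcessRegressor(),
    "MLP": MLPRegressor(max_iter=2000), "kNN": KNeighborsRegressor(),
    "DT": DecisionTreeRegressor(), "RF": RandomForestRegressor(),
    "GB": GradientBoostingRegressor(),
}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)  # scale inside each fold
    scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
    print(f"{name}: R2 = {scores.mean():.2f} +/- {scores.std():.2f}")
```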
At a second stage, the four best-performing ML models, namely GB, RF, MLP and SVR, were subjected to HP optimization, in an attempt to further improve their performance. Note that it is extremely difficult to select a priori the best-suited ML technique for a specific problem, mainly due to the plethora of existing methods and to the non-deterministic character of the outcome of the final combination of the model setup, training and HP optimization. This is the reason why this selection is not really justified in most reported studies. However, besides attempting a screening of different techniques and selecting the best-performing ones, as is the case here, it is also useful to attempt to identify the characteristics of the best-performing models that make them suitable to the problem at hand. This can provide valuable conclusions and serve as future guidance in similar problems. Accordingly, the basic principles of these four techniques are briefly explained below and will serve in the later discussion of the results and the conclusions of the study. Moreover, more detailed descriptions and comparative studies are also reported in the relevant literature [44]. Finally, note that all models were developed in Python, using the scikit-learn library.
GB and RF are ensemble ML methods, as they consist of combining the outputs of a large number of prediction models to obtain an improved ensemble prediction. The difference between GB and RF lies in the way the outputs are combined as well as in the construction of the constitutive prediction models. More specifically, they respectively belong to the boosting and bagging (or bootstrap aggregation) categories. In boosting, simple weak estimators (in underfitting) are combined sequentially in such a way that, at each iteration, a new weak learner is trained by considering the errors of the previous one (i.e., each new learner tries to correct the errors made by its predecessor), thus reducing underfitting/bias. In GB, the considered learners are DTs. The principal advantages of GB are its capacity to model complex non-linear relationships, its customization possibilities and its generalization ability. However, it suffers from a high computational cost. The main HPs of GB are generally the number of estimators (or learners) and the learning rate. Other HPs, such as the fraction of features and samples considered in each tree, can also introduce some randomness, which has been shown to improve generalization performance and calculation time. More details can be found in [45][46][47][48][49].
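The boosting idea described above can be illustrated with a minimal hand-rolled loop: each new shallow tree is fitted to the residuals of the current ensemble prediction and its (damped) output is added to it. The 1-D data are synthetic and the HP values are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.normal(size=200)

learning_rate, n_estimators = 0.1, 100   # the two main HPs of GB
pred = np.full_like(y, y.mean())         # start from the mean prediction
trees = []
for _ in range(n_estimators):
    tree = DecisionTreeRegressor(max_depth=2)  # weak learner (underfitting)
    tree.fit(X, y - pred)                      # fit the current residuals
    pred += learning_rate * tree.predict(X)    # damped correction
    trees.append(tree)

print("train MSE:", np.mean((y - pred) ** 2))  # far below the variance of y
```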
Conversely, in bagging, several well-performing prediction models (in overfitting) are trained in parallel on a bootstrap sample of the data set (i.e., a subset of the train data set that is randomly drawn, with replacement, to train an individual prediction model, so that all subsets are drawn from the same data distribution). The individual predictions are then combined to give one final prediction (a majority vote for a classification problem, an average value for a regression problem), which reduces the overfitting/variance of the individual models. In RF, the prediction models are DTs. However, the main particularity of the RF method lies in the fact that each DT is built only on a random subset of features. The random subsets of data and features reduce the possible correlations between trees, creating a diversity of individual prediction models (specific to particular conditions), and thus resulting in better accuracy and stability. Moreover, as the number of trees grows, the generalization error converges, which makes RF particularly robust against overfitting.
Other advantages of RF are its low sensitivity to HP values and its capacity to evaluate the importance of input features. As RF is based on DT, it also benefits from the latter's simplicity and interpretability. More details about RF method can be found in [44,[50][51][52][53][54]. The main HPs of RF are the number of trees and the number of random input features and samples considered for each tree.
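The built-in feature-importance evaluation mentioned above can be sketched as follows, on synthetic data where only the first two of four inputs carry signal; `max_features` restricts the random subset of features considered at each split, as described for RF.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(300, 4))
# only features 0 and 1 are informative; 2 and 3 are pure noise
y = 2 * X[:, 0] + X[:, 1] + 0.05 * rng.normal(size=300)

rf = RandomForestRegressor(n_estimators=200,   # number of trees (main HP)
                           max_features=2,     # random feature subset per split
                           random_state=0)
rf.fit(X, y)
print(rf.feature_importances_)  # the informative features dominate
```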
MLP, which are a specific type of artificial neural network (ANN), consist of several layers containing the so-called neurons: an input layer, containing the inputs of the problem, an output layer, containing the outputs of the problem, and one or several intermediate hidden layers. The neurons present in the different layers are interconnected, and each connection is associated with a weight. The information exchange is made possible by the neurons, which transform their input (i.e., the sum of all the inputs arriving from the previous layer multiplied by their corresponding weights, plus a bias term) into an output by means of an activation function. Weights are iteratively adjusted during the learning process by minimizing the error between the predicted and the true outputs. A great number of HPs need to be set (and eventually optimized) in MLP, such as the number of hidden layers, the number of neurons in each hidden layer, the activation function, the learning rate and the method to adjust the weights, which renders this step especially time-consuming. On the other hand, MLP presents a powerful capacity to model complex problems, as it can approximate any linear or nonlinear mathematical function. In return, the training of an MLP presents several local minima and is prone to overfitting and a lack of generalization as the complexity and size of the network architecture increase, especially for small data sets. Furthermore, the structure of the MLP itself makes it harder to interpret compared to RF. More details about ANN and MLP can be found in [55][56][57][58][59][60][61][62].
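The neuron computation described above (weighted sum plus bias, passed through an activation) can be made concrete with a toy forward pass through one hidden layer; all weights and inputs are arbitrary illustrative values.

```python
import numpy as np

def relu(z):
    """A common activation function: max(0, z) element-wise."""
    return np.maximum(0.0, z)

x = np.array([0.5, -1.0, 2.0])      # one input sample (3 features)
W1 = np.array([[0.2, -0.4, 0.1],
               [0.7, 0.3, -0.2]])   # hidden layer: 2 neurons, 3 inputs each
b1 = np.array([0.1, -0.3])          # hidden-layer biases
W2 = np.array([[1.5, -0.5]])        # output layer: 1 neuron (linear output)
b2 = np.array([0.05])

h = relu(W1 @ x + b1)               # weighted sums + biases, then activation
y_hat = W2 @ h + b2                 # regression output
print(y_hat)
```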
SVR is the counterpart, for regression problems, of the support vector machine (SVM) classification algorithm and is widely used due to its good generalization ability, high prediction accuracy and robustness during the training process (convex problem), even in the presence of small, high-dimensional and noisy data sets [63][64][65][66]. In SVM, the goal is to find the best hyperplane that will separate the data correctly into classes, which is performed by maximizing the margin (the distance between the hyperplane and the closest point, also called a support vector) on both sides of the hyperplane, together with the number of data points classified correctly. When data are not linearly separable, kernel functions can be employed to re-describe them in another, higher-dimensional space where they will be linearly separable, thus improving classification performance. More details can be found in [63,64,[67][68][69][70]. The principle of SVR is also based on finding the best hyperplane. However, the goal here is to have the maximum number of points located inside the narrowest margin (or epsilon tube), the latter containing the hyperplane which represents the regression function. The main HPs are the regularization parameter C (penalizing the largest deviations outside the margin), epsilon (the radius of the epsilon tube) and the kernel. Other HPs, more specific to the kernel, can also be optimized.
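An SVR fit showing the three main HPs named above (C, epsilon, kernel) might look as follows; the data are synthetic (a noisy parabola, hence a non-linear kernel) and the HP values are illustrative, not those of the study.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(150, 1))
y = X[:, 0] ** 2 + 0.05 * rng.normal(size=150)

svr = SVR(kernel="rbf",    # RBF kernel handles the non-linear relationship
          C=10.0,          # regularization: penalty on deviations outside the tube
          epsilon=0.05)    # half-width of the epsilon tube
svr.fit(X, y)

# points falling outside the epsilon tube become support vectors
print("support vectors:", len(svr.support_), "of", len(X))
```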

ML Procedure
In any data-driven modeling study, the data pretreatment represents the most crucial and time-consuming step, as the quality of the data (besides their quantity) largely determines the performance of the model (i.e., the widely known "garbage in, garbage out" principle). In this work, data pretreatment consisted essentially of keeping the most reliable experimental measurements (i.e., eliminating the ones in which important losses or measurement errors were observed). The final data set, used for the training of all ML models, is composed of a total of 69 experiments, as shown in Table C1. These were shuffled and partitioned into train and test sets according to a 5-fold cross-validation procedure (Figure 3a). The latter made it possible to evaluate the variability of the performance subject to partitioning, while ensuring that each sample was present in one of the test sets. As part of the data pretreatment process, the considered inputs also need to be standardized (i.e., rescaled to present a common mean and standard deviation) as a means of ensuring that their eventual difference in magnitude will not bias the model. In the present work, a mean value of 0 and a standard deviation of 1 were used for the normalization, and this was fitted solely on the train set to avoid data leakage (i.e., to ensure the complete independence of the train and test sets).
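The leakage-free standardization described above can be sketched as follows: within each fold, the scaler statistics are computed on the train split only and then applied, unchanged, to the test split. The data are placeholders for the 69 experiments.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.uniform(size=(69, 3))  # placeholder inputs for the 69 experiments

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    scaler = StandardScaler().fit(X[train_idx])  # statistics from train only
    X_train = scaler.transform(X[train_idx])     # mean 0, std 1 on the train set
    X_test = scaler.transform(X[test_idx])       # same transform, no refitting

print(X_train.mean(axis=0).round(6), X_train.std(axis=0).round(6))
```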
For the HP optimization step, a grid search cross-validation method was employed to exhaustively test all possible combinations. This was also motivated by the relatively low number of models/HPs to be tested. Otherwise, other HP optimization strategies, such as genetic algorithms or random search, could be employed, depending on the problem characteristics and the type of ML algorithms [71]. To fine-tune the boundaries and levels of the considered HPs, a prior manual investigation of the impact of each individual HP on the model performance was initially carried out. The different types of HPs for each model, along with their respective optimization domains, are detailed in Table 2. A nested cross-validation scheme was adopted in order to limit the introduction of bias in the combined HP tuning and model selection steps. More specifically, this procedure consists of two nested cross-validation schemes, namely an external one based on a train/test split of the whole data set and an internal one based on a train/validation split of each training set of the outer loop. The outer loop is used for model training (i.e., estimation of its parameters) and model selection, while the inner one aims at HP optimization. In this work, 5-fold outer and inner cross-validations were employed (Figure 3b). Although this procedure is more time-consuming than using a common k-fold for both model training/selection and HP optimization, it produces performance estimates that are more robust and unbiased against overfitting, and it is highly suitable for small data sets [72,73].
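A nested 5x5 scheme of this kind can be sketched with scikit-learn by wrapping a `GridSearchCV` (the inner, HP-tuning loop) inside `cross_val_score` (the outer, performance-estimation loop). The data are synthetic and the HP grid below is illustrative, not the one of Table 2.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(5)
X = rng.uniform(size=(69, 4))  # placeholder for the 69 experiments
y = X @ np.array([0.5, -0.3, 0.2, 0.4]) + 0.05 * rng.normal(size=69)

param_grid = {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]}
inner = KFold(n_splits=5, shuffle=True, random_state=0)  # HP optimization
outer = KFold(n_splits=5, shuffle=True, random_state=1)  # performance estimate

search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=inner,
                      scoring="neg_root_mean_squared_error")
scores = cross_val_score(search, X, y, cv=outer,
                         scoring="neg_root_mean_squared_error")
print(f"outer RMSE = {-scores.mean():.3f} +/- {scores.std():.3f}")
```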
Among the different metrics typically employed to evaluate the performance of ML models, the root mean squared error (RMSE), mean absolute error (MAE), coefficient of determination (R²), and max and median errors are the most commonly encountered ones. In this work, RMSE was selected as the principal model evaluation metric, as it specifically penalizes the largest errors, which are to be avoided if the model is to be applied for industrial use. However, in the results section, R² and MAE values are also reported in order to provide a more complete overview of the performance of the developed ML models.
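The three reported metrics can be computed as below on a toy prediction vector; the one deliberately large error inflates the RMSE much more than the MAE, which illustrates why RMSE was chosen here. The values are invented for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([0.10, 0.25, 0.40, 0.60, 0.80])
y_pred = np.array([0.12, 0.24, 0.38, 0.61, 0.60])  # last point: large error

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # penalizes large errors
mae = mean_absolute_error(y_true, y_pred)           # treats all errors alike
r2 = r2_score(y_true, y_pred)                       # fraction of variance explained
print(f"RMSE = {rmse:.4f}, MAE = {mae:.4f}, R2 = {r2:.3f}")
```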

Experimental Results
All the results for the small-sized and medium-sized systems are presented in detail in Tables C1 and C2, respectively, and are also publicly accessible in electronic format (cf. Data Availability Statement). In the following, the behavior of the two systems in terms of temperature control, polymerization time and effect of GTR and BPO is discussed on the basis of the obtained results. In all the presented graphs, error bars represent the standard deviation (i.e., whenever an experiment was repeated several times). The average absolute uncertainty in the determination of monomer conversion was estimated (cf. Appendix C) at 0.13% wt and 0.29% wt for the small-sized and the medium-sized systems, respectively. This estimate was primarily based on the uncertainty of the equipment used (i.e., the ±0.1 mg precision of the balance). At the same time, the average standard deviation of three measurement repetitions, for a total of seven samples, was found to be 0.3 mg. Finally, other sources of random error in the measurements, such as balance drift influenced by air streams, temperature or humidity, were not quantified.
During the experiments, the temperature inside the reactors was monitored to ensure the reliability of the data for the subsequent modeling. In the small-sized system, a thermocouple was kept inside three reference tubes (corresponding to GTR/(GTR + styrene) contents of 10%, 40% and 70% wt, with the BPO/styrene ratio set at 5.5% wt) during the whole reaction, for the three temperatures of interest. An example of the recorded temperature evolution, at 90°C, is shown in Figure 4 and in Table 3. Despite an initial overshoot of the reaction temperature, which is more pronounced for mixtures with lower GTR content, the set-point temperature was reached within the first 10 min of reaction, within a margin of ±2°C in all cases. In the medium-sized system, it was impossible to follow the evolution of the internal temperature on a permanent basis due to the form of the mechanical agitator, which covered the reactor radially and throughout the complete mixture height. Accordingly, at specific time intervals, the agitation was briefly stopped and thermocouples were introduced into the reactors through the lateral orifice for temperature measurement. In this case, the temperature overshoot was significantly more pronounced, reaching up to 56°C under certain conditions (see Table C2 for all temperature measurement results). Note that the measured temperature was also affected by the positioning of the thermocouple inside the reactor, a phenomenon that was not observed in the small-sized system. The highest temperature values, which are the ones reported here, were observed below the agitator, near the bottom of the reactor. These observations validate those made in a previous work [28] in a larger-scale reactor and are evidence of the difficulty of scaling up this system.
They are also indicative of the necessity to accompany published experimental measurements in similar systems with details about the achieved temperature homogeneity and any observed heat-transfer limitations. In view of these elements, in the present work, experiments at 110°C were limited for the medium-sized system, while in the subsequent ML modeling treatment only small-sized system data were considered as isothermal experiments. Another parameter of interest for this polymerization system is the reaction time, which, depending on the conditions, may extend to more than 30 h until complete styrene or BPO consumption [12,36]. In the present work, the maximum reaction time was limited to 8 h due to technical constraints. However, the experiments carried out in the small-sized system, under the different reaction conditions set in the experimental design, were also performed for shorter durations, namely 2 h and 5 h (see Table C1). This allowed monitoring the temporal evolution of the system and including time as a factor in the subsequent ML modeling study.
The evolution of styrene conversion, with respect to the mixture composition in GTR and BPO, is presented in Figures 5 and 6, for reaction temperatures of 90°C and 100°C, respectively. In these figures, continuous curves correspond to the small-sized system (denoted as S) while dashed curves correspond to the medium-sized system (denoted as M). The overall trend, observed for both systems and at both temperatures, is completely consistent with previous observations [27][28][29], where an increase in the amount of GTR in the system displayed a negative effect on monomer conversion. At the same time, a rather expected positive impact of the amount of BPO on the final styrene conversion is also observed. The inhibiting effect of GTR has been attributed to the reaction between the dissociated benzoyloxy radicals and the unpaired electrons on the surface of carbon black, creating new active sites which are either inert or have lower reactivity than the BPO primary radicals, thus limiting styrene conversion. Another possible mechanism, acting synergistically in the same direction, could be the redox reaction between BPO and the oxide groups of the carbon black surface, which would be favored at lower reaction temperatures due to its lower activation energy compared to the thermal decomposition of BPO [27].
It is also worth noting that the effect of GTR content on monomer conversion is not linear but is more pronounced in the range of 10-40% wt loadings. For example, at 90°C (i.e., Figure 5), the final monomer conversion after 8 h of reaction drops from around 80% to only 10% when passing from 10% to 40% wt in GTR content; however, half of this decrease is already evident at 15% GTR content. This phenomenon had already been identified in previous studies [27,29], where a critical GTR concentration, in the range between 10% and 50% wt of GTR content, had been postulated, beyond which the observed inhibiting effects are gradually attenuated.
The same trend is observed in both the small-sized and medium-sized systems. At the same time, there is a slight disagreement between the results of the two systems under similar reaction conditions, with the small-sized system displaying higher styrene conversions at lower GTR/(GTR + styrene) ratios (i.e., 10% and 15%) than the medium-sized one. This tendency is gradually reversed as the GTR ratio increases (i.e., up to 70% wt of GTR). This difference could be attributed to the aforementioned unavoidable mass losses that occur during the different steps in the medium-sized system, especially during the transfer of the reactor content to the recipients for mass measurement and evaporation in the vacuum oven (e.g., non-recovered material from the reactor walls and agitator). Although an effort was made to meticulously monitor these losses and account for them during the calculations (cf. ∆2 in Equations (2) and (B.1)), it was assumed that the total mass loss is distributed proportionally between styrene and GTR, according to their initially introduced amounts in the reactor. In this sense, for low GTR/(GTR + styrene) ratios, a relatively higher percentage of the measured mass loss is attributed to styrene, thus leading to a higher calculated monomer conversion than in a case where the same mass loss would correspond to a higher GTR content. A different treatment of the distribution of losses (e.g., an equal distribution between GTR and styrene) would have artificially decreased this observed deviation between the two systems. In any case, it remains technically impossible to verify this assumption. Besides the negative effect of GTR content, it is seen that the reaction temperature displays an overall positive effect by increasing styrene conversion. It is also observed that the effect of temperature becomes less pronounced at higher GTR loadings.
These observations are fully in line with the previously reported measurements [27], as well as with the proposed hypothesis of a redox reaction between BPO and the oxide groups of the carbon black surface that would be favored at lower reaction temperatures.

Figure caption: Effect of GTR/(GTR + styrene) and temperature for the small-sized system, at BPO/styrene = 5.5% (dashed curves correspond to trend lines).

ML Results
Following the pretreatment of the measured experimental data, as explained in Section 3, 10 different supervised-learning regression algorithms were screened and compared, prior to any further optimization of their HPs. In Figure 9, the parity plots of the split corresponding to the lowest RMSE on the test data set are presented for each model. In addition, comparative bar plots for all tested models, on the basis of the adopted performance criteria (i.e., R², RMSE, MAE), are depicted in Figures 10 and 11 for the train and test data sets, respectively. The error bars correspond to the standard deviation, calculated on the basis of the 5-fold learning procedure. The exact values of all the metrics are also provided in Table D1. The best-performing models, in terms of their test data set metrics, were found to be the ensemble models (GB and RF), DT, MLP and SVR. The corresponding test R² values for these models varied between 0.927 ± 0.032 and 0.762 ± 0.066.
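The screening step described above can be sketched as a simple loop over default-HP regressors under 5-fold cross-validation. The data are placeholders and only a representative subset of the ten screened models is shown:

```python
# Minimal sketch of the model-screening step: several regressors with default
# HPs, each evaluated by 5-fold CV with per-fold input scaling. Data and the
# model subset are illustrative assumptions.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.random((69, 4))   # placeholder features
y = rng.random(69)        # placeholder conversions

models = {
    "GB": GradientBoostingRegressor(random_state=0),
    "RF": RandomForestRegressor(random_state=0),
    "DT": DecisionTreeRegressor(random_state=0),
    "MLP": MLPRegressor(max_iter=2000, random_state=0),
    "SVR": SVR(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)  # scaler fitted inside folds
    r2 = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
    results[name] = (r2.mean(), r2.std())
    print(f"{name:6s} test R2 = {r2.mean():.3f} +/- {r2.std():.3f}")
```

Fitting the scaler inside the pipeline, rather than on the full data set, avoids leaking test-fold statistics into training.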
These models, identified as the top-performing ones in this case, are among the most commonly employed in similar reported studies [35]. By construction, ensemble models can generalize very well to new data, as they are composed of several models, each trained under specific data conditions. DT models also perform particularly well, but typically show lower performance than ensemble models, as they can be considered a sub-case of RF. MLP networks are known for their capacity to model highly non-linear response surfaces but suffer from low generalization and a need for extensive data sets. On the other hand, SVR presents a more robust training performance and performs well with limited data. It would thus seem that these models displayed better performance than the rest due to a combination of problem characteristics that partially match their specific advantages. Accordingly, GB, RF, MLP and SVR were selected for the HP optimization phase, in order to investigate whether further significant improvement could be attained in their performance, differentiating one (or more) of them as a clear choice for the modeling of this system. Note that the selection of these models was not based strictly on their performance but also on a choice to further test models that are structurally different (i.e., ensemble models vs. SVR and ANN) and quite popular in such applications. Note also that some of them, such as SVR and MLP, are typically considered to be more sensitive to their HP values than others (e.g., RF). Finally, the low values of standard deviation for the selected models are indicative of their robustness against the train/test splits, which allows generalizing the conclusions about their applicability with higher confidence.
Concerning the rather poorly performing models, it seems that the experimental data could not be described by linear models and/or by the use of a reduced number of input features (i.e., as is the case in strongly regularized linear models), thus leading to evident underfitting. This becomes obvious in the case of Lasso, which significantly reduces the number of input features (as opposed to, e.g., Ridge). An optimization of the regularization parameter in the Ridge and Lasso models could probably help improve the performance results and/or reduce underfitting, but this was not further pursued in this work. On the contrary, the GP model performs perfectly on the training data set but fails to generalize to the test set, displaying clear evidence of severe overfitting. Similarly, an optimization of the GP kernel could have been explored as a means to improve its performance, just as an optimized value of the number of neighbors might have improved the performance of the kNN model and rendered it more interesting for this problem.
The four best-performing ML models (GB, RF, MLP and SVR) were subsequently subjected to HP optimization using a "nested cross-validation" procedure, instead of a simple k-fold validation (i.e., as the one applied before HP optimization). The obtained optimized HPs are presented in Table 4, for each of the 5 train/test splits of the 5-fold outer cross-validation, and are complemented by the average computation time for one split. Note that certain HPs of some of the models, such as the number of estimators in RF, the hidden layer sizes in MLP and the regularization parameter C in SVR, vary depending on the split, while the rest are not affected. It is also observed that the train/test split has a more or less significant effect on the model performance, depending on the method. For example, Figure 12 shows that the test R² of SVR with optimized HPs varies from 0.7 to 0.9, depending on the split. Once these optimized values were obtained, they were used as part of the 5-fold outer cross-validation, to train the models on the train set and evaluate them on the test set for each train/test split. The test performances (R², RMSE and MAE) with default (i.e., prior to HP optimization) and optimized HPs are compared in Figure 12, where full bars represent the performances with default HPs while hatched ones correspond to optimized HPs. The error bars represent the standard deviations calculated on the basis of the five different train/test splits. The exact values of all the metrics are also provided in Table D2. Note that the final results are subject to the selected HP optimization method and metric.
Accordingly, the selection of a method other than grid search cross-validation, or optimization on the basis of a metric other than RMSE, would probably have resulted in different values for the optimized HPs and for the model performances. A net improvement is observed in the SVR and MLP model performances on the test data set, whilst GB and RF display no improvement but rather a slight degradation in their performance. Moreover, as discussed previously, RF models are known to show low sensitivity to their HP values. The improvement of the SVR and MLP models after HP optimization is also visible in Figure 13, where train and test data are better aligned on the y = x bisector of the parity plots. Finally, note that the test performance of the models displays a higher variation than their respective train performance (cf. Table D2), which can be considered further justification of the need to employ a k-fold validation procedure rather than evaluating the performance of the models on the basis of a single random split.
Several options could be envisioned to further improve the prediction performances of the models. First of all, it is clear that the limited number of samples prevents the models from learning from more data and thus improving their generalization performance. Indeed, as discussed in the introduction of this work, data-driven techniques require a sufficient amount and quality of data. Although it is not straightforward to define a priori what "sufficient" means, especially since different methods have different requirements in terms of the amount of data, related studies have shed light on the interplay that exists between the amount of data used for the training of ML models and their precision [74]. Accordingly, in the case of problems with limited data, one option would be to introduce a crude estimation of the target property in the feature space to enhance the accuracy of ML models.
At the same time, it is clear that fundamental experimental studies of physicochemical systems or materials cannot compete, in terms of data availability, with problems related to the fields of computer science, medical records or industrial data analysis. In this sense, it remains crucial that the experimental procedure be rigorous, involving the identification and monitoring of all the steps that introduce uncertainties into the measurements. The impact of the experimental uncertainties on ML model performance can subsequently be evaluated and taken into account in the model development phase. For example, the incorporation of known data uncertainty measures on input and/or target data has been previously investigated for ANN, using least-squares-inspired methods, and resulted in better generalization capacity [75][76][77][78]. In the present work, this possibility was exploited via a weighting procedure of the data used in the training steps of SVR, RF and GB by the estimated uncertainties on the monomer conversion measurements. In particular, the normalized inverses of the uncertainties were used to weight the samples in the regression functions (i.e., through the option sample_weight). However, only a marginal improvement was observed, solely for SVR, corresponding to an increase in R² from 0.806 to 0.817, and this option was therefore not further pursued. The comparison of the test performances between the default case (all samples having the same weight) and the case with weighted samples, after HP optimization, is provided in Figure E1.
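The uncertainty-based weighting can be sketched as follows; the data and per-sample uncertainty values are placeholders, with only the normalized-inverse weighting and the `sample_weight` option taken from the text:

```python
# Sketch of uncertainty-based sample weighting: samples with lower measured
# uncertainty get a higher weight in the SVR fit. Data and uncertainties are
# illustrative assumptions.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((69, 4))
y = rng.random(69)
u = 0.1 + 0.3 * rng.random(69)   # hypothetical conversion uncertainties

w = 1.0 / u
w /= w.sum()                      # normalized inverse uncertainties

model = SVR()
model.fit(X, y, sample_weight=w)  # weighted fit via the sample_weight option
print(model.predict(X[:3]))
```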
Other parameters related to the ML model development procedure, such as the ratio for the partitioning of data into train and test sets, the data standardization method, and the random data shuffling and model initialization, could also influence the final model performance. The influence of these parameters on the performance of GB, RF, MLP and SVR, prior to HP optimization, was also studied in this work. The results are provided in Figures E2-E4, respectively, for the three previously described parameters. They showed that SVR can be slightly improved by using MinMax scaling instead of standard scaling. At the same time, MLP showed a net degradation of its performance with MinMax and robust scaling, while the other two models did not display any significant difference among the three scaling techniques. The 90/10 partitioning ratio for train and test sets, corresponding to 10-fold cross-validation, despite improving the average test performance, was accompanied by higher standard deviations. Indeed, more data were available to train the models in this case but, at the same time, the models were evaluated on a very small test set, which resulted in a broader error distribution. Finally, the random state, which controls the sequence of the pseudo-random number generator, was also found to affect the model performance. As discussed previously, this effect can only be counterbalanced by the implementation of a k-fold cross-validation procedure for the model training and HP optimization steps. However, it is important to remain aware of this random fluctuation of the model performance, especially given the limited amount of available data. Finally, another option for potentially improving the model performances would be to integrate their data-driven character with existing knowledge of the system.
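The scaler-sensitivity check described above can be sketched by evaluating the same model with the three scaling methods under 5-fold cross-validation. The data are placeholders and SVR stands in for any of the four tested models:

```python
# Sketch of the scaler comparison: one model, three scalers, 5-fold CV.
# Data are illustrative; the scaler is fitted inside each fold via a pipeline.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((69, 4))
y = rng.random(69)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler()):
    pipe = make_pipeline(scaler, SVR())
    rmse = -cross_val_score(pipe, X, y, cv=cv,
                            scoring="neg_root_mean_squared_error")
    print(f"{type(scaler).__name__:14s} RMSE = {rmse.mean():.3f} "
          f"+/- {rmse.std():.3f}")
```

Changing `KFold(n_splits=5)` to `n_splits=10` reproduces the 90/10 partitioning variant, and varying `random_state` exposes the shuffling effect discussed above.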
One way to achieve this would be to couple them with the previously developed mechanistic model [29], in an attempt to simultaneously exploit the advantages of each of the two approaches in a single hybrid model. This remains a perspective for future work.

Conclusions
The study of the kinetics of the styrene-GTR radical graft polymerization represents a great challenge due to the complexity of the prevailing kinetic mechanisms as well as the nature of GTR. In addition, the existing knowledge of, and the capacity to control, these aspects remain limited. In this respect, a rigorous experimental study and a ML modeling approach were combined in this work in an attempt to overcome these limitations and relate the operating conditions to a key performance index, namely styrene conversion.
The particularity of the adopted experimental approach lay in the attention paid to ensuring data reliability and quantity, which are crucial in any ML approach. For this purpose, two laboratory-scale experimental setups were implemented to evaluate the thermal and compositional homogeneity of the reactive medium and to investigate scale-up effects. The experimental procedure for both systems was designed so that data could be produced rapidly, while increasing data reliability by eliminating the uncertainty related to sampling for analyses. At the same time, all the potential sources of uncertainty, such as the mass loss along the different steps of the process and the precision of the experimental equipment, were carefully identified and monitored. The analysis of the results on both experimental scales revealed that the effects of GTR, BPO and temperature on styrene conversion were coherent with previous studies. However, the heterogeneous nature of the system created difficulties in the mixing and in the estimation of the losses for the medium-sized system, which should not be neglected in view of an eventual further scale-up.
The most reliable experimental data, comprising 69 experiments from the small-sized system, were subsequently used for the screening of diverse supervised-learning regression ML models and the optimization of their HPs, in order to identify the best-performing ones. These were identified as GB, MLP and RF, with test R² values of 0.91 ± 0.04, 0.90 ± 0.04 and 0.89 ± 0.05, respectively. In view of the small available data set, a nested cross-validation scheme was employed to account for the variance in both validation and test sets, and the results revealed the impact of the random data partitioning on the performance of the developed models. Finally, the effect of additional parameters, such as the scaling method and the number of folds, as well as the integration of the experimental uncertainties in the learning procedure, were exploited as means to improve the predictions of the models.
From the perspective of upscaling the studied system towards industrial application, further research on how to enhance the homogeneity of temperature and composition is more than necessary, especially at higher GTR loadings. The latter are indeed necessary to increase the recycling of used tires. To limit the costs of the process, the use of commercially available reagents and GTR without further purification was preferred in this work. However, the presence of additives, stabilizers or impurities affects the polymerization course, and the variable GTR composition represents a major part of this added uncertainty. This complexity encourages the further use of ML tools, ideally in combination with already developed phenomenological models in hybrid approaches, to quickly map and, if possible, understand the relationship between operating conditions and final product properties at different scales. Developing ML models compatible with small data sets and capable of integrating/predicting uncertainties is therefore an interesting direction for future work, as is the extension to other molecular properties. For the latter, experimental procedures enabling the collection of sufficient data of good quality are required. Finally, an effort to share experimental data and to accompany each experimental result with precise and detailed conditions is crucial, as demonstrated in this work.

Acknowledgments: The authors acknowledge the "Service Etudes et Réalisation Mécanique" of LRGP as well as the "Service de soufflage du verre" of the University of Lorraine who, respectively, designed and built the medium-sized system and the glass parts. The authors also acknowledge the following masters students who contributed to setting up the experiments and collecting experimental data: Redouanne Chenna, Justine Guerry, Bahia Lamari, Betty Soulet and Jérémy Lanoizelée.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations

Appendix B. Evaluation of the Experimental Uncertainties
During the different steps of the experimental procedure, especially in the medium-sized setup, several losses of reactants take place. These need to be identified and quantified for the correct calculation of styrene conversion (i.e., the term denoted as ∆ in Equation (2)). These losses are identified as follows:

∆ = ∆1 + ∆2 + ∆3    (B.1)

where:
• ∆1 is the loss of styrene during the polymerization reaction due to evaporation;
• ∆2 is the loss of styrene during the transfer of the reactor content to the gravimetry cup at the end of the polymerization;
• ∆3 is the styrene remaining in the glass tube at the end of the polymerization reaction.
All the above losses were quantified by meticulous weighing of the used equipment before and after each individual step, using a Mettler Toledo XP504 precision balance. During these measurements, the ratio of styrene to GTR was considered constant, as defined in the recipe of each experiment and measured during the initial material introduction step.
According to [43], the uncertainty u_f of a function f = f(x1, x2, ..., xn) can be expressed in terms of the uncertainties u_xi of each of its variables xi and the corresponding partial derivatives:

u_f = sqrt( Σi (∂f/∂xi)² · u_xi² )    (B.2)

The application of Equation (B.2) to Xstyrene and ∆, defined in Equations (2) and (B.1), respectively, results in the corresponding uncertainty expressions.

Figure E2. Scaler impact on test RMSE, without HP optimization.
Figure E3. Partitioning train/test impact on test RMSE, without HP optimization.
Figure E4. Random state impact on test RMSE, without HP optimization.
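The propagation rule of Equation (B.2) can also be sketched numerically, with the partial derivatives approximated by central finite differences. The example function (a simple conversion-like ratio) and the ±0.1 mg uncertainties are illustrative assumptions, not the actual expressions of Equations (2) and (B.1):

```python
# Numerical sketch of Equation (B.2): u_f = sqrt(sum_i (df/dx_i)^2 * u_i^2),
# with derivatives estimated by central differences. Example values are
# placeholders (hypothetical masses with +/-0.1 mg balance uncertainty).
import math

def propagated_uncertainty(f, x, u, h=1e-6):
    """Propagate the uncertainties u[i] of inputs x[i] through f."""
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        dfdx = (f(xp) - f(xm)) / (2 * h)  # central-difference df/dx_i
        total += (dfdx * u[i]) ** 2
    return math.sqrt(total)

# Hypothetical conversion-like ratio f = (m0 - m) / m0, masses in grams
f = lambda v: (v[0] - v[1]) / v[0]
u_f = propagated_uncertainty(f, [10.0, 2.0], [0.0001, 0.0001])
print(u_f)
```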