On the data-driven modeling of reactive extrusion

This paper analyzes the ability of different machine learning techniques, able to operate in the low-data limit, to construct the model linking material and process parameters with the properties and performances of parts obtained by reactive polymer extrusion. The use of data-driven approaches is justified by the absence of reliable modeling and simulation approaches able to predict the induced properties in such complex processes. The experimental part of this work is based on the in situ synthesis of a thermoset (TS) phase during the mixing step with a thermoplastic polypropylene (PP) phase in a twin-screw extruder. Three reactive epoxy/amine systems have been considered, and maleic anhydride grafted polypropylene (PP-g-MA) has been used as compatibilizer. The final objective is to define the appropriate processing conditions for improving the mechanical properties of these new PP materials obtained by reactive extrusion.


Introduction
Initially, the industry adopted virtual twins in the form of simulation tools that represented the physics of materials, processes, structures, and systems from physics-based models. These computational tools transformed engineering science and technology to offer optimized design tools and became essential in almost all industries at the end of the 20th century.
Despite the revolution that Simulation Based Engineering (SBE) represented, some domains resisted fully assimilating simulation into their practices, for different reasons:
• Computational issues related to the treatment of too complex material models involved in too complex processes, requiring a numerical resolution that is difficult to attain. Examples in polymer processing include reactive extrusion and foaming, among many others.
• Modeling issues when addressing materials with poorly known rheologies, as usually encountered in multi-phasic reactive flows where multiple reactions occur.
• The extremely multi-parametric space defined by both the material and the process, where the processed material properties and performances strongly depend on several parameters related, in the case of reactive extrusion for example, to the nature of the reactants or the processing parameters, such as the flow rate and viscosity, the processing temperature, etc.

Reactive Polymers Processing
Reactive extrusion is considered to be an effective tool for the continuous polymerization of monomers, the chemical modification of polymers, and the reactive compatibilization of polymer blends. In particular, co-rotating and counter-rotating twin-screw extruders have proven to be a relevant technical and economical solution for the reactive processing of thermoplastic polymers. The literature dedicated to reactive extrusion shows that a very broad spectrum of chemical reactions and polymer systems has been studied [11][12][13][14][15].
The many advantages of using the extruder as a chemical reactor can be described as follows: (i) polymerization and/or chemical modifications can be carried out in bulk, in the absence of solvents, the process is fast and continuous (residence time of the order of a few minutes); (ii) if necessary, devolatilization is effective, leading to the rapid removal of residual monomers and/or reaction by-products; and (iii) the screw design is modular, allowing the implementation of complex formulations (fillers, plasticizers, etc.).
However, there are also some disadvantages in using an extruder as a chemical reactor, such as: (i) the high viscosity of the molten polymers, which leads to self-heating and therefore to side reactions (thermal degradation, for example); (ii) the short residence time, which limits reactive extrusion to fast reactions; and (iii) the difficulty of scale-up to industrial pilots and plants.
In terms of modeling and simulation, various strategies [16] can be considered as it needs to deal with a large number of highly nonlinear and coupled phenomena. Actually, the strategy of modeling depends on the objectives in terms of process understanding, material development from machine design or process optimization, and control. For example, in the case of free radical grafting of polyolefins, a two-phase stochastic model to describe mass transport and kinetics based on reactive processing data was proposed in [17].
Regarding process optimization, a simple 1D simulation approach provides a global description of the process all along the screws, whereas 3D models allow a more or less accurate description of the flow field in the different full zones of the extruder. However, most of these simulations are based on simplified steady-state 1D models (e.g., the Ludovic© software [18]).
Actually, the main processing parameters such as residence time, temperature, and extent of the reaction are assumed homogeneously distributed in any axial cross section. The use of one-dimensional models allows significant reductions of the simulation effort (computing time savings). In any case, the flow model is coupled with reaction kinetics that impact the fluid rheology [19].
Thus, one-dimensional models are especially appropriate when addressing optimization or control in reactive extrusion. In particular, the model proposed in [20] predicts the transient and steady-state behaviors, i.e., pressure, monomer conversion, temperature, and residence time distribution under different operating conditions. However, these simulations require several sub-models establishing constitutive equations (viscosity, chemical kinetics, mass and heat transfer), which takes time as well as the intuition and accumulated knowledge of experienced specialists. Furthermore, it is important to note that, despite the impressive effort spent by hundreds of researchers and thousands of published papers, no constitutive equation exists describing, for example, the behavior of complex polymer formulations such as reactive extrusion systems.
In summary, such a process is quite complex and would require a detailed study of the influence of the nature of the polymers and chemical reactions (kinetics and rheology) and of the processing conditions (temperature, screw speed, flow rate, screw profile). Nevertheless, a deterministic answer for each of these parameters is out of reach, and we actually believe that understanding such a process through the usual approaches is quite unrealistic.

Objectives of the Study
The present work aims at addressing a challenge in terms of industrial applications that is not necessarily based on improving the understanding of the process itself, but rather on replacing the complex fluid and complex flow by an alternative modeling approach able to extract the link between the process outputs and inputs, which is key for transforming experience into knowledge.
A model of a complex process could be envisaged with two main objectives: (i) online process control from the collected and assimilated data; and (ii) offline process optimization, extracting the optimal process parameters enabling the target properties and performances. Even if the modeling procedure addressed in this work could be used in both domains, the present work mainly focuses on the second one, process modeling for optimization purposes; however, as soon as data can be collected in real time, with the model available, process control could be attained without major difficulties.
There are many works, each using a different data-driven modeling technique, a diversity that makes it difficult to know whether there is an optimal technique for each model, or whether most of them apply and perform similarly. Thus, this paper first aims at comparing several techniques and then, using one of them that the authors recently proposed and that performs well in the multi-parametric setting, addresses some potential uses.

Modeling
In this section, we revisit some regression techniques that will be employed later for modeling reactive extrusion. For additional details and valuable references, the interested reader can refer to Appendix A.
In many applications like chemical and process engineering or materials processing, product performances depend on a series of parameters related to both the considered materials and the processing conditions. The number of involved parameters is denoted by D and each parameter by x_i, i = 1, ..., D, all of them grouped in the array x.
The process results in a product characterized by different properties or performances in number smaller or greater than D. In what follows, for the sake of simplicity and without loss of generality, we will assume that we are interested in a single scalar output noted by y.
From the engineering point of view, one is interested in discovering the functional relation between the quantity of interest (QoI) y and the involved parameters x_1, ..., x_D ≡ x, mathematically y = y(x), because it offers a practical and useful way of optimizing the product by choosing the most adequate parameters x_opt.
There are many techniques for constructing such a functional relation, currently known as regression, some of them sketched below, and detailed in Appendix A where several valuable references are given.

From Linear to Nonlinear Regression
The simplest choice consists of the linear relationship

$$y(\mathbf{x}) = \beta_0 + \sum_{i=1}^{D} \beta_i x_i. \qquad (1)$$

If D + 1 data are available, that is, D + 1 couples {y_s, x_s}, s = 1, ..., D + 1, then the previous equation can be written in the matrix form

$$\begin{pmatrix} y_1 \\ \vdots \\ y_{D+1} \end{pmatrix} = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{D,1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1,D+1} & \cdots & x_{D,D+1} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_D \end{pmatrix}, \qquad (2)$$

where x_{i,s} denotes the value of parameter x_i at measurement s, with i = 1, ..., D and s = 1, ..., D + 1. The previous linear system can be expressed in the more compact matrix form

$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta}. \qquad (3)$$

Thus, the regression coefficients β_0, ..., β_D are computed by simple inversion of Equation (3), from which the original regression form (1) can be rewritten as

$$y(\mathbf{x}) = \beta_0 + \mathbf{W}^T \mathbf{x}, \qquad (4)$$

where W^T = (β_1 ··· β_D). When the number of measurements P becomes larger than the number of unknowns β_0, ..., β_D, i.e., P > D + 1, the problem can be solved in a least-squares sense.
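To make the construction concrete, the least-squares computation of the coefficients can be sketched in a few lines; the data below are synthetic and purely illustrative, not taken from the paper's tables.

```python
import numpy as np

# Synthetic data: P measurements of D parameters and a noise-free
# linear response (dimensions and coefficients are illustrative).
rng = np.random.default_rng(0)
D, P = 6, 35
X = rng.uniform(-1.0, 1.0, size=(P, D))        # rows: measurements x_s
y = 2.0 + X @ np.arange(1.0, D + 1.0)          # y = beta_0 + W^T x

# Augment with a column of ones so that beta_0 plays the intercept role.
A = np.hstack([np.ones((P, 1)), X])

# Least-squares solution of A beta = y (exact inversion when P = D + 1).
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(x):
    """Evaluate the regression y = beta_0 + W^T x."""
    return beta[0] + x @ beta[1:]

print(np.allclose(predict(X), y))
```

On noise-free data the coefficients are recovered exactly; with P > D + 1 noisy measurements, the same call returns the least-squares fit.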
However, linear regressions sometimes become too poor for describing nonlinear solutions, and in that case one is tempted to extend regression (1) by increasing the polynomial degree. Thus, the quadratic counterpart of Equation (1) reads

$$y(\mathbf{x}) = \beta_0 + \sum_{i=1}^{D} \beta_i x_i + \sum_{i=1}^{D} \sum_{j=i}^{D} \beta_{ij} x_i x_j,$$

where the number of unknown coefficients now scales with D². When considering third-degree approximations, the number of unknown coefficients scales with D³, and so on. Thus, higher-degree approximations are limited to cases involving few parameters, and multi-parametric cases must use low-degree approximations, because the available data are usually limited due to the cost of experiments and time.
The so-called sparse-PGD [7] tries to reconcile both wishes in multi-parametric settings: higher degree and few data. For that purpose, the regression reads

$$y(\mathbf{x}) \approx \sum_{i=1}^{N} \prod_{j=1}^{D} F_i^j(x_j),$$

where the different single-valued functions F_i^j(x_j) are a priori unknown and are determined sequentially using an alternating-directions fixed point algorithm. As at each step one looks for a single single-valued function, a higher degree can be envisaged by expressing it in a richer (higher-degree) approximation basis, while keeping the number of available data-points (measurements) reduced.
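A minimal sketch of such a separated-representation regression, with greedy enrichment and an alternating-directions fixed point over the basis coefficients, could look as follows; the target function, dimensions, and basis sizes are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(1)
D, P, Q, N = 2, 40, 4, 3            # parameters, samples, basis size, modes
X = rng.uniform(-1.0, 1.0, (P, D))  # inputs already mapped to [-1, 1]
y = np.sin(2 * X[:, 0]) * (1 + X[:, 1])   # synthetic separable target

def basis(xj):
    # Chebyshev basis T_0 .. T_{Q-1} evaluated at the sample points.
    return C.chebvander(xj, Q - 1)

B = [basis(X[:, j]) for j in range(D)]    # P x Q basis matrix per dimension

modes = []                                # per-mode coefficient vectors
r = y.copy()                              # current residual
for _ in range(N):                        # greedy enrichment
    c = [np.ones(Q) for _ in range(D)]
    for _ in range(20):                   # alternating-directions fixed point
        for j in range(D):
            w = np.ones(P)
            for k in range(D):
                if k != j:
                    w *= B[k] @ c[k]      # product of the other 1D functions
            # Linear least squares for the coefficients of dimension j.
            c[j], *_ = np.linalg.lstsq(w[:, None] * B[j], r, rcond=None)
    modes.append(c)
    r = r - np.prod([B[j] @ c[j] for j in range(D)], axis=0)

print(np.sqrt(np.mean(r**2)))  # training RMSE after N enrichments
```

Each mode is a product of one-dimensional polynomials, so the number of unknowns grows linearly with D instead of with D to the power of the degree.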

Code2Vect
This technique, revisited in depth in Appendix A, proposes mapping points x_s, s = 1, ..., P, into another space of points ξ_s, such that the distance between any pair of data-points ξ_i and ξ_j scales with the difference of their respective outputs, that is, with |y_i − y_j|.
Thus, using this condition for all the data-point pairs, the mapping W is obtained, enabling, for any other input array x, the computation of its image ξ = Wx. If ξ is very close to ξ_s, one can expect its output y to be very close to y_s, i.e., y ≈ y_s. In the most general case, an interpolation of the output is envisaged.

iDMD, Support Vector Regression, and Neural Networks
Inspired by dynamic mode decomposition (DMD) [8,9], one could look for the vector W minimizing a functional F(W) [10], typically the sum of the squared residuals over the data, whose minimization yields the vector W that in turn defines the regression y = W^T x. Appendix A and the references therein propose alternative formulations. Neural Networks (NN) perform the same kind of minimization and introduce specific treatments of the nonlinearities, addressing multiple outputs by using a number of hidden neuron layers [21].
Finally, Support Vector Regression (SVR) shares some ideas with the so-called Support Vector Machine (SVM) [22], the latter widely used for supervised classification. In SVR, the regression reads

$$y(\mathbf{x}) = \mathbf{W}^T \mathbf{x} + b,$$

and flatness is enforced by minimizing a functional G(W) involving ||W||², while enforcing as constraints a regularized form of the condition that the data lie within an ε-tube, |y_s − W^T x_s − b| ≤ ε.
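As a didactic illustration of the ε-insensitive idea, a linear SVR can be approximated with plain subgradient descent on the regularized loss; real implementations solve the dual quadratic program instead, and the data here are synthetic.

```python
import numpy as np

# Didactic sketch of linear eps-insensitive SVR by subgradient descent;
# the coefficients and data are synthetic, purely for illustration.
rng = np.random.default_rng(2)
D, P = 3, 60
X = rng.uniform(-1.0, 1.0, (P, D))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.3       # noise-free linear response

w, b = np.zeros(D), 0.0
Creg, eps, lr = 100.0, 0.05, 0.001
for _ in range(5000):
    err = X @ w + b - y
    # Subgradient of the eps-insensitive loss: 0 inside the tube, +/-1 outside.
    g = np.where(err > eps, 1.0, np.where(err < -eps, -1.0, 0.0))
    w -= lr * (w + Creg * (X.T @ g) / P)       # flatness term + loss term
    b -= lr * Creg * g.mean()

print(np.round(w, 2), round(b, 2))
```

Errors smaller than ε contribute nothing to the loss, which is what makes the fit "flat" and robust to small measurement noise.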

Experiments
The purpose of this project is the dispersion of a thermosetting (TS) polymer in a polyolefin matrix using reactive extrusion, by in situ polymerization of the thermoset (TS) phase from an epoxide resin and an amine crosslinker. Here, polypropylene (PP) has been chosen as the polyolefin matrix. A maleic anhydride grafted PP (PP-g-MA) has been used to ensure a good compatibility between the PP and the thermoset phases.
These studies were carried out as part of a project with TOTAL on the basis of a HUTCHINSON patent [23]. This patent describes a process for preparing a reinforced and reactive thermoplastic phase by dispersing an immiscible reactive reinforcing agent (e.g., an epoxy resin as precursor of the thermoset dispersed phase). This process is characterized by a high shear rate in the extruder combined with the in situ grafting, branching, and/or crosslinking of the dispersed phase. These in situ reactions permit the crosslinking of the reinforcing agent as well as the compatibilization of the blend, with or without compatibilizer or crosslinker. The result of this process is a compound with a homogeneous reinforced phase with a fine dispersion (<5 µm), leading to an improvement of the mechanical properties of the thermoplastic polymer. The experiments carried out in the framework of the present project are mainly based on some experiments described in the patent. However, new complementary experiments have been carried out to complete the study.

Materials
The main polypropylene used as the matrix is the homopolymer polypropylene PPH3060 from TOTAL. Two other polypropylenes have been used to study the influence of the viscosity, and several impact copolymer polypropylenes have also been tested in order to combine a good impact resistance with the reinforcement brought by the thermoset phase. A PP-g-MA (PO1020 from Exxon) with around 1 wt% of maleic anhydride has been used as a compatibilizer between the polypropylene matrix and the thermoset phase. All the polypropylenes used are listed in Table 1 with their main characteristics. Concerning the thermoset phase, three systems have been studied. As a common point, these three systems are based on epoxy resins that are DGEBA derivatives with two epoxide groups; two different resins (DER 667 and DER 671 from DOW Chemicals) have been used. The first two systems, named R1 and R2 here, are both constituted of an epoxy resin mixed with an amine at stoichiometry. The first uses the DER 667 with a triamine (Jeffamine T403 from Huntsman) that is sterically hindered, whereas the second one uses the DER 671 with a cyclic diamine (norbornanediamine from TCI Chemicals). Melamine has also been tested in one of the formulations. The third system, named R5 here, mixes the epoxy resin DER 671 with a phenolic hardener (DEH 84 from DOW Chemicals) that is a blend of three molecules: 70 wt% of an epoxy resin, a diol, and less than 1 wt% of a phenolic amine. These systems have been chosen in order to assess the influence of the structure, molar mass, and chemical nature on the in situ generation of the thermoset phase within our polyolefin matrix. Table 2 summarizes the systems studied. The kinetics of these chemical systems have been studied through the variation of the complex shear modulus in a time-sweep experiment with an ARES-G2 rheometer (TA Instruments).
The experiments have been performed at temperatures from 115 °C to 165 °C using a 25 mm plate-plate geometry with a 1 mm gap, at the frequency ω = 10 rad/s and a constant strain of 1%. The kinetics have been measured on a stoichiometric premix of the reactants. The gel times of the systems have thus been identified as the crossover point between the loss and storage moduli. Note that the reaction is too fast to be followed at temperatures beyond T = 165 °C. Consequently, an extrapolation according to an Arrhenius law allowed us to determine the gel time of the systems at T = 200 °C (barrel temperature of the extruder). The results give a gel time lower than 10 s for the three systems (t_gel(R1) = 4.5 s, t_gel(R2) = 10 s, and t_gel(R5) < 1 s), so we made the hypothesis that the reaction time is much lower than 1 min and thus that the reaction is fully completed at the die exit of the extruder. Moreover, a Dynamic Mechanical Analysis (DMA) showed that the main mechanical relaxation T_α associated with the T_g of the thermoset phase is close to 80 °C, which is the T_g observed for bulk TS systems.
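The Arrhenius extrapolation of the gel time can be sketched as follows; the measured gel times in the script are placeholders (the actual per-temperature values for R1, R2, and R5 are not reported here), so only the procedure, not the numbers, is meaningful.

```python
import numpy as np

R = 8.314  # gas constant, J/(mol K)

# Hypothetical gel times (s) measured between 115 and 165 degC;
# placeholder values, not the data of systems R1, R2, R5.
T_C = np.array([115.0, 125.0, 135.0, 145.0, 155.0, 165.0])
t_gel = np.array([520.0, 260.0, 140.0, 75.0, 42.0, 24.0])

# Arrhenius form: ln(t_gel) = ln(A) + Ea / (R * T), linear in 1/T.
invT = 1.0 / (T_C + 273.15)
slope, intercept = np.polyfit(invT, np.log(t_gel), 1)
Ea = slope * R  # apparent activation energy, J/mol

# Extrapolate to the 200 degC barrel temperature of the extruder.
t_200 = np.exp(intercept + slope / (200.0 + 273.15))
print(f"Ea = {Ea/1e3:.0f} kJ/mol, t_gel(200 C) = {t_200:.1f} s")
```

Because gel time decreases with temperature, the fitted slope (and hence the apparent activation energy) is positive, and the extrapolated gel time at 200 °C falls well below the lowest measured value.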
The influence of the addition of silica on the final properties has been studied with two different silicas (Aerosil R974 and Aerosil 200).

Extrusion Processing
The formulations have been compounded in a single step with a co-rotating twin-screw extruder (Leistritz ZSE18, L/D = 60, D = 18 mm), with the screw profile described in Figure 1. Two different temperature profiles have been used, one at 230 °C and the other at 200 °C, both with lower temperatures for the first blocks to minimize clogging effects at the inlet. These temperature profiles are described in Figure 2. Several screw rotation speeds and flow rates have been used to study the influence of the process on the final materials (N = 300, 600, 450, 800 rpm; ẇ = 3, 5, 6, 10 kg/h).
The solid materials were mixed and introduced at the extruder entrance through a hopper for the pellets and with a powder feeder for the micronized powders. As for the liquid reagents, they were injected over the third block with an HPLC pump. The formulations are air-cooled at the exit of the extruder and then pelletized.

Characterization
Tensile-test pieces (5A) and impact-test pieces have been injection-molded with a Babyplast injection press at 200 °C and 100 bar. The Young's modulus has been determined by a tensile test at a speed of 1 mm/min, and the stress at yield, elongation at break, and stress at break have been measured at a tensile speed of 50 mm/min. The impact strength has been measured by Charpy tests on notched samples at room temperature.

Data-Driven Modeling: Comparing Different Machine Learning Techniques
As previously mentioned, a model linking the material and processing parameters with the processed material properties is of crucial interest. With it, two major opportunities could be envisaged: the first one concerns the possibility of inferring the processed material properties for any choice of manufacturing parameters; the second, for given target properties, concerns inferring the processing parameters enabling them.
In this particular case, the process parameters are grouped in the six-entry array x,

$$\mathbf{x} = (x_1, \ldots, x_6)^T,$$

whereas the processed material properties are grouped in the five-entry array y, containing the Young's modulus, the yield stress, the stress at break, the strain at break, and the impact strength,

$$\mathbf{y} = (y_1, \ldots, y_5)^T.$$

As previously discussed, our main aim is extracting (discovering) the regression relating the inputs (material and processing parameters) x with the outputs (processed material properties) y, a regression that can be written in the form

$$y_i = y_i(\mathbf{x}), \quad i = 1, \ldots, 5, \qquad (14)$$

where y_i(·) represents the linear or nonlinear regression associated with the i-th output, or, when proceeding in compact form, as the multi-valued regression y = y(x) relating the whole input and output data pairs.
The intrinsic material and processing complexity justifies the nonexistence of valuable and reliable physics-based models able to predict the material evolution and the process-induced properties. For this reason, in the present work, the data-driven route is retained, using regression techniques such as the ones previously summarized.
The available data come from the experiments described in the previous section, and consist of P pairs of arrays (x_s, y_s), s = 1, ..., P, all of them reported in Tables A1 and A2 of Appendix B (for the sake of completeness and to allow researchers to test alternative regression procedures). Table A1 groups the set of input parameters involved in the regression techniques. The hyper-parameter MaskIn is a Boolean mask indicating whether a datum is included in the training set (MaskIn = 1) or excluded from it to serve for quantifying the regression performance (MaskIn = 0). On the other hand, Table A2 groups the responses, i.e., the experimental measurements, for each processing condition.
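The MaskIn convention amounts to a Boolean train/test split, which can be expressed as follows; the arrays are random placeholders standing in for Tables A1 and A2.

```python
import numpy as np

# Sketch of the MaskIn convention: True -> training, False -> held out.
# The arrays below are placeholders, not the values of Tables A1 and A2.
rng = np.random.default_rng(3)
P = 59
X = rng.uniform(size=(P, 6))    # inputs (as in Table A1)
Y = rng.uniform(size=(P, 5))    # measured outputs (as in Table A2)

mask_in = np.zeros(P, dtype=bool)
mask_in[rng.choice(P, size=35, replace=False)] = True   # 35 training points

X_train, Y_train = X[mask_in], Y[mask_in]
X_test, Y_test = X[~mask_in], Y[~mask_in]
print(X_train.shape, X_test.shape)
```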
As indicated in the introduction, one of the objectives of the present paper is analyzing whether different machine learning techniques perform similarly, or whether their performances differ significantly. For this purpose, this section compares the techniques introduced in Section 2, whereas the next section focuses on the use of one of them.
In order to compare the performances of the different techniques, an error metric was introduced to compare the regression predictions. In particular, we consider the most standard one, the Root Mean Squared Error (RMSE). When applied to the different regression results, it offers a first indication of the prediction performances. Table 3 reports the errors associated with each regression when evaluating the output of interest, that is, the array y for a given input x, for all the data reported in Tables A1 and A2. Because the different outputs (the components of array y) present significant differences in their typical magnitudes, Table 4 reports the relative errors, computed as the ratio of the difference between predicted and measured data to the measured data. The sparse PGD (sPGD) employed second-degree Chebyshev polynomials and performed a regression for each of the quantities of interest according to Equation (14). The use of low-degree polynomials avoided overfitting, being a compromise ensuring a reasonable predictability for data inside and outside the training data-set. From a computational point of view, 20 enrichments (N in Equation (A16)) were needed to define the finite sum involved in the separated representation that constitutes the regression of each output of interest y_i.
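For reference, the two error metrics used in Tables 3 and 4 can be computed as follows; the numerical values are illustrative, not those of the tables.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root Mean Squared Error, per output column.
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))

def relative_error(y_true, y_pred):
    # Ratio of the prediction-measurement gap to the measurement magnitude,
    # useful when the outputs differ by orders of magnitude.
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true), axis=0)

# Illustrative values only (two outputs of very different magnitudes).
y_true = np.array([[1500.0, 30.0], [1600.0, 28.0], [1450.0, 33.0]])
y_pred = np.array([[1480.0, 31.0], [1650.0, 27.0], [1500.0, 30.0]])
print(rmse(y_true, y_pred), relative_error(y_true, y_pred))
```

The absolute RMSE of the first column dwarfs that of the second, while the relative errors are comparable, which is precisely why both tables are reported.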
Code2Vect addressed the low-data limit constraint by imposing a linear mapping between the representation (data) and target (metric) spaces, avoiding spurious oscillations when making predictions on the data outside the training set.
Concerning the iDMD, because of the reduced amount of available data, the simplest option was retained, consisting of a unique matrix relating the input-output data pairs in the training set (linear model); i.e., with respect to Equation (15), it was assumed that F(x) = Fx, and the matrix F ensuring the linear mapping was obtained following the rationale described in Section 2. The computed regression performs very well despite the linearity assumption.
The quite standard Neural Network we considered (among a very large variety of possible choices) presents a severe overfitting phenomenon in the low-data limit addressed here. This limitation is not intrinsic to NNs and could be alleviated by considering richer architectures, better optimizers, and parameters, which is out of the scope of the present study.
The main conclusion of this section is that similar results are obtained independently of the considered technique, which seems quite satisfactory from an engineering point of view. Even if the errors seem quite high, it is important to note that: (i) the highest errors concern the variables exhibiting the largest dispersion in the measurements; (ii) the prediction errors are of the same order as the dispersion amplitudes; and (iii) we only used 35 of the 59 available data-points for the training (regression construction), while the reported errors were calculated using the whole available data (the 59 data-points). The next section shows that the prediction quality increases with the number of points involved in the regression construction (training).

Data-Driven Process Modeling
In view of the reported results, it can be stressed that all the analyzed techniques show similar performances and work reasonably well in the low-data limit (only 60% of the 59 available data points composed the training data-set used in the regressions).
As can be noticed, some quantities of interest, such as the Young's modulus and the stress at break, are quite well predicted, whereas the predictions of the others are less accurate. There is a strong correlation between this predictive capability and the experimental dispersion observed when measuring these other quantities, like the strain at break. That dispersion undoubtedly represents a limit in the predictability that should be addressed within a probabilistic framework. All the mechanical tests were performed on five samples from the same extrusion experiment, the final value being the average of these five tests. The confidence intervals are estimated at 10% for the Young's modulus and yield stress, and 20% for the elongation and stress at break.
Extracting a model of a complex process could serve for real-time control purposes, but also, as it is the case in the present work, for understanding the main tendencies of each quantity of interest with respect to each process or material parameter (the last constituting the regression inputs), enabling process and material optimization.
In order to perform that sensitivity analysis, we consider a given quantity of interest and evaluate its evolution with respect to each of the input parameters. When considering the dependence on a particular input parameter, all the others are fixed to their mean values, even if any other choice is possible. Figure 3 shows the evolution of σ_b with respect to the six input parameters, using the lowest-order sPGD modes to extract the main tendencies.
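Such a one-at-a-time sensitivity sweep can be sketched generically; the `model` callable below is a toy stand-in for the fitted sPGD regression, and the data are synthetic.

```python
import numpy as np

def sensitivity_curves(model, X, n_steps=50):
    """Sweep each input over its observed range while the other
    inputs stay at their mean values; return one curve per input."""
    means = X.mean(axis=0)
    curves = []
    for j in range(X.shape[1]):
        sweep = np.linspace(X[:, j].min(), X[:, j].max(), n_steps)
        grid = np.tile(means, (n_steps, 1))
        grid[:, j] = sweep                 # vary only parameter j
        curves.append((sweep, model(grid)))
    return curves

# Toy stand-in model and data (sensitive to inputs 2 and 5 only).
rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, (59, 6))
model = lambda G: 20.0 + 5.0 * G[:, 2] - 2.0 * G[:, 5]
curves = sensitivity_curves(model, X)
print(len(curves), curves[0][1].shape)
```

Plotting each `(sweep, values)` pair reproduces the kind of tendency curves shown in Figure 3.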
From these AI-based metamodels, one should be able to identify the process conditions and the concentration of the TS phase in order to enhance a given mechanical property. Thus, in order to increase the stress at break, increasing the thermoset content seems a good option, with all the other properties (Young's modulus, stress at yield, strain at break, and impact strength) being almost insensitive to that parameter. A more detailed analysis, involving multi-objective optimization (making use of the Pareto front) and its experimental validation, constitutes a work in progress, out of the scope of the present work.
To further analyze the accuracy of the methodology and the convergence behavior, in what follows, we consider one of the regression techniques previously described and employed, the sPGD, and perform a convergence analysis, by evaluating the evolution of the error with respect to the size of the training data-set.
The training-set was progressively enriched, starting from 30 data points, and then considering 35, 41, 47, and finally 53 (approximately corresponding to 50%, 60%, 70%, 80%, and 90% of the available data-set). The error was calculated again by considering both the training and test data-sets. Table 5 reports the results for the elastic modulus prediction and clearly proves, as expected, that the prediction accuracy increases with the size of the training-set, the error evolving from around 15% down to slightly below 10%.
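The convergence study can be mimicked with any regression technique; the sketch below uses a plain linear least-squares fit on synthetic data in place of the sPGD, keeping the same training-set sizes.

```python
import numpy as np

# Convergence sketch mimicking Table 5: refit on growing training sets
# and track the error over the whole data-set. Synthetic noisy data.
rng = np.random.default_rng(5)
P, D = 59, 6
X = rng.uniform(-1.0, 1.0, (P, D))
y = 1.0 + X @ rng.normal(size=D) + 0.1 * rng.normal(size=P)

def features(Z):
    # Design matrix with an intercept column.
    return np.hstack([np.ones((len(Z), 1)), Z])

order = rng.permutation(P)
errors = []
for n_train in (30, 35, 41, 47, 53):
    idx = order[:n_train]
    beta, *_ = np.linalg.lstsq(features(X[idx]), y[idx], rcond=None)
    errors.append(np.sqrt(np.mean((features(X) @ beta - y) ** 2)))
print([round(e, 3) for e in errors])
```

The error over the full data-set settles toward the noise floor as the training fraction grows, which is the qualitative behavior reported in Table 5.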
It is important to note that one could decrease the training-set error even further, but overfitting would then occur and the error out of the training set would increase tremendously, compromising robustness. The errors reported here are a good compromise between accuracy inside and outside the training-set.

In order to facilitate reproducibility, in what follows we give the explicit form of the sPGD regression. As previously discussed, the sPGD makes use of a separated representation of the parametric solution, which for a generic quantity of interest u(x) reads

$$u(\mathbf{x}) \approx \sum_{i=1}^{N} \prod_{j=1}^{D} F_i^j(x_j).$$

More explicitly, each univariate function F_i^j(x_j) is approximated using an approximation basis,

$$F_i^j(x_j) = \sum_{k=1}^{Q} a_{i,k}^j \, T_k^j(x_j). \qquad (18)$$

When approximating the elastic modulus, whose results were reported in Table 5, we considered six parameters, i.e., D = 6, and a Chebyshev polynomial basis consisting of the functions T_k^j(x_j) (requiring a pre-mapping of the parameter intervals onto the reference interval [−1, 1] where the Chebyshev polynomials are defined). The number of modes (terms involved in the finite-sum separated representation) and the number of approximation functions per dimension were set to N = 10 and Q = 3, respectively. The coefficients of Equation (18) for the elastic modulus approximation are reported in Appendix C.
An important limitation, inherent to machine learning strategies, is that factors other than the ones considered as inputs could be determinant for expressing the selected outputs. This point constitutes a work in progress.

Conclusions
We showed in this paper that different machine learning techniques are relevant in the low-data limit, for constructing the model that links material properties and process parameters in reactive polymer processing. Actually, these techniques are undeniably effective in complex processes such as reactive extrusion. More precisely, this work was based on the in situ synthesis of a thermoset phase during its mixing/dispersion with a thermoplastic polymer phase, which is certainly one of the most complex cases in the processing of polymers.
We proved that a variety of procedures can be used for performing the data-driven modeling, whose accuracy increases with the size of the training-set. Then, the constructed regression can be used for predicting the different quantities of interest, for evaluating their sensitivity to the parameters, crucial for offline process optimization, and also for real-time process monitoring and control.

Appendix A.1. Support Vector Regression
Support Vector Regression (SVR) shares some ideas with the so-called Support Vector Machine (SVM), widely used in supervised classification. In SVR, the regression reads

$$y(\mathbf{x}) = \mathbf{W}^T \mathbf{x} + b,$$

and flatness is enforced by minimizing the functional

$$\mathcal{G}(\mathbf{W}) = \frac{1}{2} \|\mathbf{W}\|^2 + C \sum_{s=1}^{P} (\xi_s + \xi_s^*),$$

while enforcing as constraints a regularized form of the ε-tube conditions,

$$y_s - \mathbf{W}^T \mathbf{x}_s - b \le \epsilon + \xi_s, \qquad \mathbf{W}^T \mathbf{x}_s + b - y_s \le \epsilon + \xi_s^*,$$

in particular with ξ_s ≥ 0 and ξ*_s ≥ 0, and with many other more sophisticated alternatives to extend the formulation to the nonlinear case.
Appendix A.2. Code-to-Vector-Code2Vect

Code2Vect maps data, possibly heterogeneous, discrete, categorical, etc., into a vector space equipped with a Euclidean metric allowing the computation of distances, and in which points with similar outputs y remain close to one another, as sketched in Figure A1. We assume that points in the origin space (space of representation) consist of P arrays composed of D entries, denoted by x_i. Their images in the vector space are denoted by ξ_i ∈ R^d, with d ≪ D. The mapping is described by the d × D matrix W, where both the components of W and the images ξ_i ∈ R^d, i = 1, ..., P, must be calculated. Each point ξ_i keeps the label (value of the output of interest) associated with its origin point x_i, denoted by y_i. We would like to place the points ξ_i such that the Euclidean distance between each pair scales with their output difference, i.e.,

$$\|\boldsymbol{\xi}_i - \boldsymbol{\xi}_j\| = |y_i - y_j|, \quad \forall i, j \in \{1, \ldots, P\},$$

where the coordinates of one of the points can be arbitrarily chosen. Thus, there are (P² − P)/2 relations to determine the d × D + P × d unknowns.
Linear mappings are limited and do not allow proceeding in nonlinear settings. Thus, a better choice consists of a nonlinear mapping W(x) [6].
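A minimal numerical sketch of the linear Code2Vect construction, fitting W by gradient descent on the pairwise-distance conditions and predicting by the nearest neighbour in the mapped space, could read as follows (synthetic data; the actual implementation in [6] differs).

```python
import numpy as np

rng = np.random.default_rng(6)
P, D, d = 25, 6, 2
X = rng.uniform(-1.0, 1.0, (P, D))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0, 0.0])  # synthetic outputs

# Fit W so that ||W(x_i - x_j)|| matches |y_i - y_j| for all pairs.
W = rng.normal(scale=0.1, size=(d, D))
lr = 0.01
for _ in range(500):
    G = np.zeros_like(W)
    for i in range(P):
        for j in range(i + 1, P):
            diff = X[i] - X[j]
            v = W @ diff
            dist = np.linalg.norm(v) + 1e-12
            target = abs(y[i] - y[j])
            # Gradient of (dist - target)^2 with respect to W.
            G += 2.0 * (dist - target) / dist * np.outer(v, diff)
    W -= lr * G / (P * (P - 1) / 2)

xi = X @ W.T   # images of the data in the target vector space

def predict(x):
    # Nearest neighbour in the mapped space inherits its label.
    k = np.argmin(np.linalg.norm(xi - W @ x, axis=1))
    return y[k]

print(abs(predict(X[0]) - y[0]))
```

Inputs that barely affect the output are compressed by the mapping, so distances in the ξ-space reflect output similarity rather than raw input similarity.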

Appendix A.3. Incremental DMD
We reformulate the identification problem in a general multipurpose matrix form

y = K x,

where x and y represent the input and output vectors, involving variables of different natures, both of them accessible from measurements. In what follows, both are assumed to be D-component arrays.
If we assume both evolve in low-dimensional subspaces of dimension d, with d ≪ D, the rank of K, the so-called model, is expected to reduce to d. The construction of such a reduced model was reported in [10]; of the two procedures proposed there, we summarize below one of them, the so-called Progressive Greedy Construction.
In this case, we proceed progressively. We consider the first available datum, the pair (x_1, y_1). The first, rank-one, reduced model then reads

W_1 = (y_1 x_1^T)/(x_1^T x_1),

ensuring W_1 x_1 = y_1. Suppose now that a second datum (x_2, y_2) arrives, from which we can also compute its associated rank-one approximation, and so on for any new datum (x_i, y_i):

W_i = (y_i x_i^T)/(x_i^T x_i).

For any other x, the model can be interpolated from the just-defined rank-one models W_i, i = 1, ..., P, according to

W(x) = Σ_{i=1}^P I_i(x) W_i,

with I_i(x) the interpolation functions operating in the space of the data x, functions that in general decrease with the distance between x and x_i (e.g., polynomials, radial basis functions, ...) and that are able to proceed in multidimensional settings.
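The progressive construction above can be sketched in a few lines, assuming Gaussian weights for the interpolation functions I_i(x) (the data and the kernel width are illustrative choices):

```python
import numpy as np

def rank_one_models(X, Y):
    """One rank-one operator per datum: W_i = y_i x_i^T / (x_i^T x_i),
    so that W_i x_i = y_i holds exactly."""
    return [np.outer(Y[i], X[i]) / (X[i] @ X[i]) for i in range(len(X))]

def interpolated_model(x, X, Ws, h=0.05):
    """Blend the rank-one models with normalized Gaussian weights I_i(x)."""
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / h ** 2)
    w /= w.sum()
    return sum(wi * Wi for wi, Wi in zip(w, Ws))

# four input/output pairs (illustrative data)
X = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [1., 1., 0.]])
Y = np.array([[1., 2., 0.], [0., 1., 1.], [2., 0., 1.], [1., 1., 1.]])
Ws = rank_one_models(X, Y)
```

Each new datum simply appends one rank-one model to the list, which is what makes the construction incremental.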

Appendix A.4. From Polynomial Regression to Sparse PGD-Based Regression
In the regression setting, one could consider a polynomial dependence of the QoI, y, on the parameters x_1, ..., x_D. The simplest choice, linear regression, reads

y(x_1, ..., x_D) = β_0 + β_1 x_1 + ... + β_D x_D,

where the D + 1 coefficients β_k can be computed from the available data: if P = 1 + D data y^j, j = 1, ..., 1 + D, are available, the coefficients β_k can be calculated. Linear regression requires an amount of data of the order of the number of involved parameters; however, it is usually unable to address nonlinearities.
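For instance, with D = 3 parameters, the 1 + D coefficients follow from 1 + D data by solving the resulting square linear system (the data below are synthetic, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
D = 3
beta_true = np.array([1.0, 2.0, -1.0, 0.5])        # beta_0, ..., beta_D
X = rng.standard_normal((D + 1, D))                # P = 1 + D samples
y = beta_true[0] + X @ beta_true[1:]

A = np.hstack([np.ones((D + 1, 1)), X])            # design matrix [1 | x]
beta = np.linalg.solve(A, y)                       # exactly determined system
```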
Nonlinear regressions can be envisaged when the number of parameters remains small, since the number of terms roughly scales with D to the power of the considered approximation degree.
In this section, we propose a technique able to ensure rich approximations while keeping the required sampling quite reduced, the so-called multi-local sparse nonlinear PGD-based regression (sPGD). The latter reads

∫_Ω w(x) ( y(x) − Σ_{j=1}^P y^j δ(x − x^j) ) dx = 0,

where Ω is the domain in the parametric space in which the approximation is sought, i.e., x ∈ Ω, and w(x) represents the test function, whose arbitrariness serves to enforce that the regression y(x) approximates the available data y^j, with δ the Dirac mass expressing that data are only available at the locations x^j in the parametric space. Following the Proper Generalized Decomposition (PGD) rationale, the next step is to express the approximated function y(x) in the separated form

y(x_1, ..., x_D) ≈ Σ_{i=1}^M ∏_{j=1}^D F_i^j(x_j),    (A16)

constructed by using the standard rank-one update [7], which leads to the calculation of the different functions F_i^j(x_j) involved in the separated form (A16).
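The rank-one update can be sketched with an alternating least-squares fit of a two-parameter separated form F^1(x_1) F^2(x_2) on polynomial bases; the target function, polynomial degree, and iteration count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
P = 40
x1 = rng.uniform(-1.0, 1.0, P)
x2 = rng.uniform(-1.0, 1.0, P)
y = (1.0 + x1) * x2 ** 2                 # separable target (illustrative)

deg = 3
B1 = np.vander(x1, deg + 1)              # polynomial basis in x_1
B2 = np.vander(x2, deg + 1)              # polynomial basis in x_2
a = rng.standard_normal(deg + 1)         # coefficients of F^1
b = rng.standard_normal(deg + 1)         # coefficients of F^2

for _ in range(200):                     # alternating least squares
    f2 = B2 @ b                          # freeze F^2, solve for F^1
    a, *_ = np.linalg.lstsq(B1 * f2[:, None], y, rcond=None)
    f1 = B1 @ a                          # freeze F^1, solve for F^2
    b, *_ = np.linalg.lstsq(B2 * f1[:, None], y, rcond=None)

rms = np.sqrt(np.mean(((B1 @ a) * (B2 @ b) - y) ** 2))
```

Each alternation solves a small one-dimensional least-squares problem, which is what keeps the sampling requirements low compared with a full D-dimensional polynomial fit.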

Appendix A.5. A Simple Neural Network
Deep learning is mostly based on the use of neural networks, i.e., networks composed of components that emulate the functioning of a neuron, which from some incoming data generates an output that, within larger and more complex networks, can become the input of other neurons in another layer.
We consider the scheme in Figure A2, which illustrates a neuron receiving two input data x_1 and x_2 to produce the output Y. The simplest functioning consists of collecting both data, multiplying each by a weight, W_1 and W_2, and generating the output by adding both contributions according to

Y = W_1 x_1 + W_2 x_2,

which in the more general case can be written as

Y = W^T x.

The main issue is precisely the determination of the vector W. If an input-output couple (x^1, y^1) is available, with the input normalized, i.e., ‖x^1‖ = 1, then the best choice for the searched vector consists of W = y^1 x^1, which ensures recovering the known output, i.e.,

W^T x^1 = y^1 (x^1 · x^1) = y^1.    (A20)

Figure A2. Sketch of a simple neuron.
Imagine now that, instead of a single input-output couple, P couples (x^1, y^1), ..., (x^P, y^P) are available; the learning can then be expressed by minimizing the functional

Σ_{s=1}^P (y^s − W^T x^s)².    (A21)

The nonlinear case employs a nonlinear function of the predictor for the neuron activation. When multicomponent inputs produce multicomponent outputs, W becomes a matrix instead of the vector previously considered. However, the procedures for computing that matrix from the knowledge of the P couples (x^s, y^s), s = 1, ..., P, remain almost the same as the ones previously discussed. In some circumstances, instead of considering a single layer of neurons, multiple layers perform better.
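The minimization of the functional (A21) for a single linear neuron can be sketched by gradient descent (the synthetic data and learning rate are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
P, D = 30, 3
X = rng.standard_normal((P, D))
W_true = np.array([0.5, -1.0, 2.0])
y = X @ W_true                           # outputs of a "true" neuron

# minimize sum_s (y_s - W.x_s)^2 by gradient descent on W
W = np.zeros(D)
lr = 0.05
for _ in range(500):
    grad = -2.0 * X.T @ (y - X @ W) / P  # gradient of the mean squared error
    W -= lr * grad
```

For this linear, single-neuron case the problem is an ordinary least-squares fit; the nonlinear and multilayer cases replace the predictor with a composition of activations and weights, trained by the same gradient-based rationale.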