Multiple Tensor Train approximation of parametric constitutive equations in elasto-viscoplasticity

This work presents a novel approach to constructing surrogate models of parametric Differential Algebraic Equations based on a tensor representation of the solutions. The procedure consists of simultaneously building, for every output of the reference model, an approximation in tensor-train format. A parsimonious exploration of the parameter space, coupled with a compact data representation, alleviates the curse of dimensionality. The approach is therefore appropriate when many parameters with large domains of variation are involved. The numerical results obtained for a nonlinear elasto-viscoplastic constitutive law show that the constructed surrogate model is accurate enough to enable parametric studies such as the calibration of material coefficients.


Introduction
Predictive numerical simulations in solid mechanics require material laws that involve systems of highly nonlinear Differential Algebraic Equations (DAEs). These models are essential in challenging industrial applications, for instance to study the effects of the extreme thermo-mechanical loadings that turbine blades may sustain in helicopter engines [1,2], as well as in biomechanical analyses [3,4].
These DAE systems are referred to as constitutive laws in the material science community. They express, for a specific material, the relationship between mechanical quantities such as the strain, the stress, and miscellaneous internal variables, and stand as the closure relations of the physical equations of mechanics. When the model aims to reproduce physically-complex behaviors, constitutive equations are often tuned through numerous parameters called material coefficients.
An appropriate calibration of these coefficients is necessary to ensure that the numerical model mimics the actual physical behavior [5]. Numerical parametric studies, consisting of analyzing the influence of the parameter values on the solutions, are typically used to perform the identification. However, when the number of parameters increases and unless the computational effort required for a single numerical simulation is negligible, the exploration of the parameter domain turns into a tedious task, and exhaustive analyses become unfeasible. Moreover, defining an unambiguous criterion measuring the fidelity of the model to experimental data is a challenge for models with complex behaviors.
A common technique to mitigate the aforementioned challenges is to build surrogate models (or metamodels) mapping points of a given parameter space (considered as the inputs of the model) to the outputs of interest of the model. The real-time prediction of DAE solutions for arbitrary parameter values, enabled by the surrogate model, aids the understanding of constitutive laws and facilitates parametric studies. In particular, the robustness of the calibration process can be dramatically improved using surrogate model approaches.
The idea of representing the set of all possible parameter-dependent solutions of ODEs and PDEs as a multiway tensor was pioneered with the introduction of the Proper Generalized Decomposition (PGD) [6-8]. In this representation, each dimension corresponds to a spatial/temporal coordinate or a parameter. The resulting tensor is never assembled explicitly, but instead remains an abstract object for which a low-rank approximation based on a canonical polyadic decomposition [9] is computed. The PGD method further alleviates the curse of dimensionality by introducing a multidimensional weak formulation over the entire parameter space, and the solutions are sought in a particular form where all variables are separated. When differential operators admit a tensor decomposition, the PGD method is very efficient because the multiple integrals involved in the multidimensional weak form of the equations can be rewritten as a sum of products of simple integrals.
Unfortunately, realistic constitutive equations, or even less sophisticated elasto-viscoplastic models, admit no tensor decomposition with respect to the material coefficients and the time variable. An extension of the PGD to highly nonlinear laws is therefore non-trivial. However, many other tensor decomposition approaches have been successfully proposed to approximate functions or solutions of differential equations defined over high-dimensional spaces. We refer the reader to [10-12] for detailed reviews on tensor decomposition techniques and their applications.
Among the existing formats (the CP decomposition [9,13,14], the Tucker decomposition [11,15], and the hierarchical Tucker decomposition [11,16]), this work investigates the Tensor-Train (TT) decomposition [17,18]. The TT-cross algorithm, introduced in [17] and further developed in [19,20], is a sampling procedure to build an approximation of a given tensor under the tensor-train format. Sampling procedures in parameter space have proven their ability to reduce nonlinear and non-separable DAEs by using the Proper Orthogonal Decomposition (POD) [21], the gappy POD [22], or the Empirical Interpolation Method (EIM) [23,24]. These last methods are very convenient when the solutions have only two variables and can hence be considered as second-order tensors.
This paper aims to extend the sampling procedure of the TT-cross method to DAEs having heterogeneous and time-dependent outputs. A common sampling of the parameter space is proposed, though several TT-cross approximations are computed to cope with heterogeneous outputs. These outputs can be scalars, vectors, or tensors, with various physical units. In the proposed algorithm, sampling points are not specific to any output, even though the parameters do not affect each DAE output equally. The proposed method is named the multiple TT-cross approximation. Similarly to the construction of a reduced integration domain for the hyper-reduction of partial differential equations [25] or for the GNAT method [26], the set of sampling points is the union of contributions from the various outputs of the DAE. In this paper, the multiple TT-cross incorporates the gappy POD method, and the developments are focused on the numerical outputs obtained through a numerical integration scheme applied to the DAE.

Materials and Methods
The parametrized material model generates several time-dependent Quantities of Interest (QoI). These quantities can be scalar-, vector-, or even tensor-valued (e.g., stress) and are generally of distinct natures, namely expressed in different physical units and/or having different magnitudes. For instance, in the physical model described in Appendix A, the outputs ε ∼ , ε ∼ vp , and σ ∼ have six components each, while p is a scalar. Therefore, the generated data are segregated according to the QoI to which they relate, and structured in a tensor-like fashion to make them amenable to the numerical methods presented in this paper. We restrict our attention to discrete values of the d − 1 parameters and discrete time instants, related to indices i 1 , . . ., i d with i k ∈ {1, . . ., n k } for k = 1, . . ., d. For instance, all the computable scalar outputs p are gathered into a single tensor. For a given χ = 1, . . ., N denoting an arbitrary QoI, the tensor A χ of order d (denoted with bold calligraphic letters) refers to a multidimensional array (also called a multiway array), each element of which is identified by the indices (i 1 , . . ., i d ). The last index accounts for the number of components in each QoI. Therefore, the last index is specific to each A χ , while the others are common to all tensors for χ = 1, . . ., N. Hence, a common sampling of the parameter space D 1 × . . . × D d−1 can be achieved. The vector A χ (i 1 , . . ., i d−1 , :) ∈ R n χ d contains all the components of output χ, at all time instants used for the numerical solution of the DAE, for a given point in the parameter space.
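As an illustration of this data organization, the following Python/NumPy sketch (with hypothetical mode sizes for two parameters and a placeholder function in lieu of the DAE solver, neither of which is part of the reference model) fills one fiber A χ (i 1 , i 2 , :) of a third-order tensor with one simulated output:

```python
import numpy as np

# Hypothetical setting: d - 1 = 2 parameters, each discretized with 30 values,
# and a scalar QoI sampled at n_t = 537 time instants, so the last mode has
# size n_d = n_t here.
n1, n2, nd = 30, 30, 537

# The tensor A^chi collects the output for every parameter combination;
# in practice it is never assembled in full, only sampled entry-wise.
A_chi = np.zeros((n1, n2, nd))

def fake_dae_solver(p1, p2, n_t=nd):
    # Placeholder standing in for one numerical solution of the DAE system.
    t = np.linspace(0.0, 1.0, n_t)
    return p1 * np.sin(t) + p2 * t

# One solver call fills one fiber A^chi(i1, i2, :) of the tensor.
i1, i2 = 3, 11
A_chi[i1, i2, :] = fake_dae_solver(0.5, 2.0)
```

The key point is that a single call to the solver produces the whole last-mode fiber at once, which is why the last index is never subsampled later on.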
Matricization designates a special case of tensor reshaping that allows representing an arbitrary tensor as a matrix. The q-th matricization of A χ , denoted by A χ q , consists of dividing the dimensions of A χ into two groups, the q leading dimensions and the (d − q) trailing dimensions, such that the newly-defined multi-indices enumerate respectively the rows and columns of the matrix A χ q . Here again, these matricizations are purely formal because of the curse of dimensionality.
The Frobenius norm is denoted by ‖·‖, without the usual subscript F. For A χ ∈ R n 1 ×···×n χ d , it reads: ‖A χ‖ 2 = ∑ i 1 ,...,i d A χ (i 1 , . . ., i d ) 2 . The Frobenius norm of a tensor is invariant under all matricizations of that tensor.
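The matricization operation and the norm invariance can be sketched with NumPy reshapes (the tensor below is an arbitrary example; the row-major reshape matches the convention in which the leading multi-index enumerates the rows):

```python
import numpy as np

# A hypothetical 4th-order tensor (d = 4) with small mode sizes.
A = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)

def matricize(tensor, q):
    """q-th matricization: the q leading modes enumerate the rows,
    the d - q trailing modes enumerate the columns."""
    rows = int(np.prod(tensor.shape[:q]))
    return tensor.reshape(rows, -1)

A2 = matricize(A, 2)   # shape (2*3, 4*5) = (6, 20)

# The Frobenius norm is invariant under any matricization.
norms = [np.linalg.norm(matricize(A, q)) for q in range(1, 4)]
```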
In [17], the Singular-Value Decomposition (SVD) is considered in the algorithm called TT-SVD. Because of the curse of dimensionality, the TT-SVD has no practical use, even if tensors have a low rank. More workable approaches aim to sample the entries of tensors.
For instance, in the snapshot Proper Orthogonal Decomposition (POD) [21], the sampling procedure aims to estimate the rank and an orthogonal reduced basis for the approximation of a matrix A. The method consists of applying the truncated SVD to the submatrix Ã = A(:, J pod ) constituted by a selection of columns J pod of A. Hence, the accuracy of the resulting POD reduced basis relies on the quality of the sampling procedure, which generally introduces a sampling error. This sampling procedure is convenient when considering the first matricizations A χ q , provided the product n 1 n 2 . . . n q and Card(J pod ) are reasonably small with regard to the available computing resources. However, for large values of q, the curse of dimensionality makes the snapshot POD, alone, intractable.
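A minimal sketch of the snapshot POD on a synthetic low-rank matrix (the matrix, the sample size, and the truncation tolerance are illustrative assumptions) could read:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical snapshot matrix A of exact rank 3.
A = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 400))

# Snapshot POD: truncated SVD of a column subsample A(:, J_pod).
J_pod = rng.choice(400, size=20, replace=False)
A_tilde = A[:, J_pod]

V, s, _ = np.linalg.svd(A_tilde, full_matrices=False)
tol = 1e-8
r = int(np.sum(s > tol * s[0]))   # truncation rank detected from the sample
V = V[:, :r]                      # POD reduced basis

# V (approximately) spans the column space of the full matrix A.
residual = np.linalg.norm(A - V @ (V.T @ A)) / np.linalg.norm(A)
```

Since the sampled columns of an exactly low-rank matrix almost surely span its column space, the residual here is at round-off level; for real simulation data, it reflects the sampling error mentioned above.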
A more practical approach to effectively construct an approximate TT decomposition, called the TT-cross method, is proposed in [17]. The TT-cross consists of dropping the concept of a POD basis and using the Pseudo-Skeleton Decomposition (PSD) introduced in [27] as the low-rank approximation.
Unlike the TT-SVD, the TT-cross enables building an approximation based on a sparse exploration of a reference tensor. The PSD can be used to approximate any matrix A ∈ R n×m and is written as:

T psd = A(:, J psd ) A(I psd , J psd ) −1 A(I psd , :),    (1)

where the sets I psd and J psd are respectively a selection of row and column indices. The definition is valid only when the matrix A(I psd , J psd ) is non-singular; in particular, the number s of selected rows and columns has to be identical. The approximation (1) features an interpolation property at the selected rows and columns:

T psd (I psd , :) = A(I psd , :) and T psd (:, J psd ) = A(:, J psd ).    (2)

The PSD is a matrix factorization similar to the decomposition used in the Adaptive Cross Approximation (ACA) [28] and the CUR decomposition [29,30]. Additionally, these references provide algorithms to build the factorization efficiently. This decomposition has also been used in the context of model order reduction, for instance in the Empirical Interpolation Method (EIM) proposed in [23,24].
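For a synthetic low-rank matrix, the PSD (1) and its interpolation property (2) can be checked numerically as follows (the row and column selections are arbitrary choices that are assumed to yield a non-singular core):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical rank-3 matrix: the PSD with s = 3 well-chosen rows/columns
# reconstructs it exactly.
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 60))

I = np.array([4, 17, 33])   # selected rows (assumed non-singular core)
J = np.array([2, 25, 48])   # selected columns

core = A[np.ix_(I, J)]                             # A(I, J), s x s
T_psd = A[:, J] @ np.linalg.solve(core, A[I, :])   # Equation (1)
```

Because A has rank 3 and the 3 x 3 core is invertible, the PSD is exact here; in general it is only an approximation, exact on the selected rows and columns.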
The condition that A(I psd , J psd ) must be non-singular makes it difficult to share sampling points for various matrices A χ q with χ = 1, . . ., N having their own ranks. The gappy POD introduced in [22] aims at relaxing the aforementioned constraint by combining beneficial features of the snapshot POD and the PSD. Indeed, the gappy POD (a) relies on a POD basis that remains computationally affordable, (b) requires only a limited number of rows of the matrix to be approximated, and (c) enables reusing the set of selected rows for different matrices. These properties are key ingredients for an efficient, parsimonious exploration of the reference tensors. The gappy POD approximation T gap of a matrix A ∈ R n×m is given by:

T gap = V V(I gap , :) † A(I gap , :),

where † denotes the Moore-Penrose pseudo-inverse [31], I gap is a selection of s rows, and V ∈ R n×r is a POD basis matrix of rank r. In the sequel, because the simulation data in A χ are outputs of a DAE system, it does not make sense to sample the last index i d during the column sampling of A χ q : each numerical solution of the DAE system generates all the last components of each tensor A χ . Hence, the column sampling is restricted to the indices i q+1 , . . ., i d−1 , and all the values of i d in D χ d are kept. This special column sampling is denoted by J χ pod . It is performed randomly by using a low-discrepancy Halton sequence [32]. The matrix V(I gap , :) must have linearly independent columns to ensure that the approximation is meaningful. Since V is a rank-r POD basis, there exists a set of s rows such that this property holds as long as s ≥ r. Here, I gap contains at least the interpolation indices related to V. This latter set is denoted by I χ , such that V(I χ , :) is invertible. In the numerical results presented hereafter, I χ is obtained using the Q-DEIM algorithm [33], which was shown to be a superior alternative to the better-known DEIM procedure ([34], Algorithm 1).
Unlike the PSD, the gappy POD enables selecting a number of rows that exceeds the rank of the low-rank approximation: Card(I gap ) = s ≥ r. This makes it possible to share sampling points between matrices having their own ranks. In this case, however, the interpolation property (2) of the PSD does not hold.
T gap is thus the approximation of A by the product of three matrices: V, V(I gap , :) † , and A(I gap , :).
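A sketch of the gappy POD on synthetic data, illustrating both the oversampled row selection (s > r) and the reuse of the same rows for a second matrix with a different rank (all sizes and index sets are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((80, 3)) @ rng.standard_normal((3, 120))  # rank 3

# POD basis of rank r = 3 (computed from the full matrix, for illustration).
V = np.linalg.svd(A, full_matrices=False)[0][:, :3]

# Oversampled row selection: s = 6 rows > r = 3, allowed by the gappy POD.
I_gap = np.array([0, 10, 20, 40, 60, 79])

# T_gap = V V(I_gap, :)^dagger A(I_gap, :)
T_gap = V @ np.linalg.pinv(V[I_gap, :]) @ A[I_gap, :]

# Property (c): the same row set is reused for another matrix with its own
# rank (here 2) and its own POD basis W.
B = rng.standard_normal((80, 2)) @ rng.standard_normal((2, 90))
W = np.linalg.svd(B, full_matrices=False)[0][:, :2]
T_gap_B = W @ np.linalg.pinv(W[I_gap, :]) @ B[I_gap, :]
```

Since each basis spans the column space of its matrix and the sampled basis rows have full column rank, both reconstructions are exact here up to round-off.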
The TT-cross approximation can be understood as a generalization of such a product of matrices. A tensor T ∈ R n 1 ×···×n d is said to be in Tensor-Train format (TT format) if its elements are given by the following matrix products:

T(i 1 , . . ., i d ) = G 1 (i 1 ) G 2 (i 2 ) · · · G d (i d ),

where the so-called tensor carriages (or core tensors) are such that G k (i k ) ∈ R r k−1 ×r k for k = 1, . . ., d. In the original definition of the tensor-train format [17], the leading and trailing factors (corresponding to G 1 (i 1 ) and G d (i d ) for any choice of i 1 and i d ) are respectively row and column vectors. Here, the convention r 0 = r d = 1 is adopted so that the row matrices G 1 (i 1 ) and the column matrices G d (i d ) can be interpreted as vectors or matrices depending on the context.
The TT format allows significant gains in terms of memory storage and is therefore well-suited to high-order tensors. The storage complexity is O(n r 2 d), where r = max(r 1 , . . ., r d−1 ), and depends linearly on the order d of the tensor. In many applications of practical interest, the small TT-ranks r k enable alleviating the curse of dimensionality [17].
The sequential computational complexity of the evaluation of a single element of a tensor in TT format is O(d r 2 ). Assuming that r is small enough, the low computational cost allows a real-time evaluation of the underlying tensor. Therefore, in terms of online exploitation, this representation conforms with the expected requirements of the surrogate model. Figure 1 illustrates the sequence of matrix multiplications required to compute one element of the tensor train. The objective of the proposed approach is to build, for each physics-based tensor A χ , an approximate tensor T χ given in TT format by using a nested row sampling of the simulation data. Algorithm 1 provides the set of matrices {G χ 1 , . . ., G χ d } that define the tensor-train decompositions, together with aggregated sets for row sampling. It is a sequential algorithm that navigates from dimension one to dimension d − 1 of the tensors A χ .
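The element evaluation and the storage gain can be illustrated on random TT carriages (the mode sizes and ranks below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = [4, 5, 6, 5, 4]        # mode sizes n_1, ..., n_d (hypothetical)
r = [1, 3, 3, 3, 3, 1]     # TT-ranks, with the convention r_0 = r_d = 1

# Random TT carriages G_k, stored as arrays of shape (r_{k-1}, n_k, r_k).
G = [rng.standard_normal((r[k], n[k], r[k + 1])) for k in range(len(n))]

def tt_element(G, idx):
    """T(i_1, ..., i_d) = G_1(i_1) G_2(i_2) ... G_d(i_d):
    d - 1 small matrix products, i.e. O(d r^2) operations."""
    out = G[0][:, idx[0], :]            # 1 x r_1 row matrix
    for k in range(1, len(G)):
        out = out @ G[k][:, idx[k], :]  # r_{k-1} x r_k factor
    return out.item()                   # the final product is 1 x 1

value = tt_element(G, (1, 2, 3, 2, 1))

# Storage: sum of carriage sizes (O(n r^2 d)) versus the full tensor.
tt_storage = sum(g.size for g in G)
full_storage = int(np.prod(n))
```

Even at these toy sizes the carriages hold 168 values against 2400 for the full tensor; the gap widens dramatically as d grows.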
The method provided by Algorithm 1 is non-intrusive and relies on the numerical solutions of the DAEs in a black-box fashion. In the row sampling step, specific sets of interpolant rows I χ k are first determined independently for each output χ, by applying the Q-DEIM algorithm [33] to the basis V χ k ; a common, aggregated set I k (10), defined as the union of all selected rows with the corresponding row selection matrix, is then used to sample the entries of all outputs. The algorithm then formally defines the matrix A χ k+1 as the second matricization of the updated tensor A χ,(k+1) . Computing the elements of all submatrices A χ k (I k , :) requires m k calls to the DAE system solver, with m k = Card(I k−1 ) ñ k and I 0 = D 1 . Furthermore, the gappy POD naturally accommodates a number of rows larger than the rank r χ k for each approximation of A χ k , and considering a larger sample size for each individual χ is expected to provide a more accurate approximation.
The tensorization and matricization steps are purely formal: no call to the DAE system solver is made here. They define the way the simulation data must be ordered in matrices to be approximated at the next iteration. The recursive definition of the matrix A χ k implies that the latter is equal to the k-th matricization of a subtensor extracted from A χ . Equivalently, the matrix A χ k corresponds to a submatrix of the k-th matricization of A χ , as illustrated in Figure 3. To quantify the theoretical accumulation of errors introduced at each iteration, Proposition 1 gives an upper bound for the approximation error associated with a tensor-train decomposition built by the snapshot POD followed by the row sampling steps, when a full column sampling is performed.
the following inequality (16) holds, where σ min and σ max refer to the smallest and the largest singular values of their matrix argument.

The proof is given in ([35], Proposition 12). Proposition 1 suggests that the approximation error ‖A χ − T χ ‖ can be controlled by the truncation tolerances set by the user. However, the bound (16) tends to be very loose, and the hypothesis (15) may be difficult to verify when the basis V χ k stems from a column sampling of the matrix A χ k . Hence, the convergence should be assessed empirically in practical cases.

Outputs' Partitioning as Formal Tensors
The physical model described in Appendix A is represented as the relations between six (d = 7) parameters (inputs of the model) and the time-dependent mechanical variables (outputs of the model). For each parameter, the interval of definition is discretized by a regular grid with 30 points, so that n k = 30 for k = 1, . . ., 6. The discretized time interval is the one used for the numerical solution; it corresponds to a regular grid with n t = 537 points. The snapshot POD sample sizes are: ñ 1 = ñ 2 = ñ 3 = ñ 4 = ñ 5 = 100 and ñ 6 = 30.

Performance Indicators
The truncation tolerance is chosen here to be ε = 10 −3 . The construction of the tensor-train decompositions requires solving the system of DAEs ∑ d−1 k=1 s k n k ñ k times, with as many sets of parameter values. In the proposed numerical example, this amounts to 514,050 solutions. Fifteen hours were necessary on a 16-core workstation to carry out the computations. Ninety-eight percent of the effort was devoted to the solution of the physical model and the remaining 2% to the decomposition operations.
For a single simulation on a personal laptop computer, the solution of the physical model took 0.7 s, whereas the surrogate model was evaluated in only 1 ms, corresponding to a speed-up of 700.
Storing the multiple TT approximations requires 2,709,405 double-precision floating-point values. For comparison purposes, storing a single solution (constituted by the multiple time-dependent outputs) of the DAE system involves 10,203 values. Therefore, the storage of the tensor-train decompositions is commensurate with the storage of 265 solutions, while it can express the approximation of 30^6 solutions.
For χ = 1, . . ., 4, the rank r χ k is bounded from above by the theoretical maximum rank r χ max,k of the matrix A χ k . More specifically, r χ max,k corresponds to the case where A χ k , the k-th matricization of the tensor A χ , has full rank. Given the choice of truncation tolerance ε = 10 −3 , the TT-ranks listed in Table 1 show that the resulting tensor trains involve low-rank approximations. Table 2 emphasizes that, in practice, r χ k ≪ r χ max,k , except for k = 1 where r χ max,k is already small.
Table 1. TT-ranks of the outputs of interest and theoretical maximum ranks.

Table 2. Ratio between the theoretical maximum ranks and the TT-ranks of the outputs of interest.

Approximation Error
The accuracy of the surrogate model is estimated a posteriori by measuring the discrepancy between its own outputs and the outputs of the original physical model. The estimation is conducted by comparing solutions associated with 20,000 new samples of parameter set values randomly selected according to a uniform law on each discretized parameter interval. The difference between the surrogate and the physical models is measured with time-integrated norms of the form ‖x‖ = ( ∫ x(t) 2 dt ) 1/2 and ‖X ∼ ‖ = ( ∫ X ∼ (t) : X ∼ (t) dt ) 1/2 , where x and X ∼ are respectively scalar- and tensor-valued time-dependent functions.
For the mechanical variable Z (where Z can stand for any one of ε ∼ , ε ∼ vp , σ ∼ , and p), Z PM and Z TT denote the outputs corresponding respectively to the solution of the DAEs and to the surrogate model. A relative error e Z is associated with each mechanical variable.
Depending on the parameter values, the viscoplastic part of the behavior may or may not be negligible, as measured by the magnitudes of p and ε ∼ vp relative to ε ∼ . Hence, in the proposed application, the focus is on comparing the norm of the approximation error for ε ∼ , ε ∼ vp , and p with respect to the norm of ε ∼ . The histograms featured in Figure 4a-d present, for each mechanical variable, the empirical distribution of the relative error for all simulation results. The surrogate model given by the tensor-train decompositions features a level of error that is sufficiently low to carry out parametric studies such as the calibration of constitutive laws, where errors lower than 2% are typically tolerable.
Figure 5a-d present the evolution of the relative error distribution (for the different mechanical variables) with respect to the truncation tolerance ε, based on a random sample of 20,000 parameter set values chosen as in Section 3.3. Figure 6 details the graphical notations. The results empirically show, for each mechanical output, that the relative error decreases together with ε. This is consistent with the expected behavior of the algorithm.

On-the-Fly Error Estimation
Based on the physical model, the surrogate model gives an approximation of each output of interest.However, the approximate outputs may be inconsistent with the physics in the sense that they may lead to non-zero residuals when introduced into (the discrete version of) the DAE system describing the physical model.
A coherence estimator is an indicator that measures how closely the physical equations are verified by the outputs of the surrogate model. It is reasonable to expect the accuracy of the metamodel to be correlated with the coherence estimator.
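The principle of such a residual-based indicator can be sketched on a toy ODE (the scalar problem and the perturbed "surrogate" below are illustrative assumptions, not the paper's constitutive model or its estimator):

```python
import numpy as np

# Toy problem: y' = -y, y(0) = 1, with exact solution y(t) = exp(-t).
t = np.linspace(0.0, 2.0, 201)
y_exact = np.exp(-t)

# A hypothetical "surrogate" output with a small built-in perturbation.
y_surr = y_exact * (1.0 + 0.01 * np.sin(5.0 * t))

# Coherence estimator: norm of the equation residual y' + y evaluated on
# the surrogate output (finite differences), relative to the output norm.
residual = np.gradient(y_surr, t) + y_surr
eta = np.linalg.norm(residual) / np.linalg.norm(y_surr)

# True relative error, and the effectivity of the estimator.
err = np.linalg.norm(y_surr - y_exact) / np.linalg.norm(y_exact)
effectivity = eta / err
```

Note that only the surrogate output and the governing equation are needed to compute the estimator; the exact solution enters only in the a posteriori check of the effectivity.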
Using Equation (A1), a stress residual is obtained by inserting the surrogate outputs into the elastic relation; the associated coherence estimator η σ is defined as the norm of this residual relative to the norm of σ ∼ TT , and the effectivity of the estimator as the ratio between η σ and the relative error e σ . Figure 8 displays the relation between the relative error for σ ∼ and the effectivity of the estimator for 20,000 simulation results drawn randomly. The error increases with the final cumulative deformation, that is, when the material exhibits a more intense viscoplastic behavior. Furthermore, the plot shows a correlation between the coherence estimator and the relative error. In particular, the effectivity tends to be larger than one, which indicates that the coherence estimator behaves like an upper bound of the relative error. Excluding a few outliers, the coherence estimator does not overestimate the relative error by more than a factor of seven.
Finally, the effectivity of the coherence estimator empirically converges to one (that is, the estimator becomes sharper) as the magnitude of the relative error increases. This coherence estimator is very inexpensive to compute and only relies on the outputs of the surrogate model. The results suggest that the coherence estimator could be used as an online error indicator that increases the reliability of the surrogate model at the current point when exploring the parameter domain in real time.

Discussion
The TT-cross decomposition enables building an approximation based on a sparse exploration of a reference tensor, by using the gappy POD. Several outputs of a parametric DAE are approximated, assuming they have d − 1 common indices and one specific index. The present work assesses the performance of tensor-train representations for the approximation of numerical solutions of nonlinear DAE systems. The proposed method enables incorporating a large number of simulation results (about 500,000 numerical solutions) to produce a metamodel that is accurate over the entire parameter domain. More specifically, the numerical results show that the multiple TT decomposition gives promising results when used as a surrogate model for an elasto-viscoplastic constitutive law. For this particular application, the surrogate model exhibits a satisfying accuracy given the moderate computational effort spent for its construction and the data storage requirements. Moreover, the observed behavior of the proposed empirical coherence estimator indicates that the latter could be exploited to assess the approximation error in real time.
The application to more complex material constitutive laws of industrial interest and involving a larger number of parameters [35] corroborates the aforementioned results in terms of compactness and accuracy of the surrogate models. Surrogate models have the potential to transform the way of carrying out parametric studies in material science. In particular, Reference [35] demonstrates that the exploitation of models based on the multiple TT approach simplifies the process of calibrating constitutive laws. Future work will investigate the combination of the proposed method with the "usual" model order reduction techniques such as hyper-reduction [36] in order to take into account the space dimension.

Appendix A. System of Equations
The elastic behavior is governed by (A1). The viscoplastic behavior is described by the Norton flow rule (A2) formulated with the von Mises criterion (A5). The yield function and the normal to the yield function are given by (A3) and (A4). (A6) gives the definition of the deviatoric stress tensor involved in (A5).
where (.) + denotes the positive part function.
The operator : denotes the contracted product, defined as A ∼ : B ∼ = ∑ i,j A ij B ij . The nonlinear isotropic hardening is modeled by (A7), where (A8) gives the viscoplastic cumulative rate.
Finally, the linear kinematic hardening is given by (A9). The case of a uniaxial cyclic tensile test driven by deformation is considered. The loading is applied by imposing ε 11 (t) with the pattern shown in Figure A1 and σ 12 (t) = σ 13 (t) = σ 23 (t) = 0. The initial conditions for the internal variables are p(t = 0) = 0 and X ∼ (t = 0) = 0 ∼ . The model is highly nonlinear. First, the isotropic hardening law introduces an exponential nonlinearity. The most significant nonlinearity arises from the Norton law (A2) featuring the positive part function. Capturing the resulting threshold effect is particularly challenging for surrogate models.

For each χ = 1, . . ., N, define the matrix G χ d ∈ R (s d−1 n χ d )×s d with s d = 1 such that: G χ d = A χ d (…). At each iteration k = 1, . . ., d − 1, the snapshot POD method, used to build the POD reduced basis (9), requires sampling a set J χ k . The column sampling amounts to a parsimonious selection of ñ k points in the partial discretized parameter domain D k+1 × · · · × D d−1 and an exhaustive sampling of the last dimension for each tensor A χ . The considered submatrices Ã χ k = A χ k (:, J χ k ) are then constituted of ñ χ k = ñ k n χ d columns (see Figure 2).

Figure 2 .
Figure 2. Definition of the submatrix Ã χ k used to construct the POD reduced basis. In the illustration, the snapshot POD sample size is ñ k = 3.

Figure 3 .
Figure 3. Definition of A χ k based on A χ . In the illustration, the number of rows selected at the previous iteration k − 1 is s k−1 = 3.
Proposition 1. Consider a tensor A χ ∈ R n 1 ×···×n χ d and its tensor-train approximation T χ constructed by Algorithm 1, assuming that hypothesis (15) holds for all k ∈ [1 : d − 1].

Figure 4 .
Figure 4. Empirical distribution of the errors for every mechanical variable. (a) Empirical distribution for e ε ; the size of the histogram bucket is 0.009%. (b) Empirical distribution for e ε vp ; the size of the histogram bucket is 0.008%. (c) Empirical distribution for e σ ; the size of the histogram bucket is 0.024%. (d) Empirical distribution for e p ; the size of the histogram bucket is 0.066%.

Figure 5 .
Figure 5. Empirical distribution of the relative approximation error for every mechanical variable.

Figure 6 .
Figure 6. The left and right sides of each box are the first and third quartiles (respectively Q 1 and Q 3 ). The line inside the box represents the median. The whiskers reach 1.5 times the Interquartile Range (IQR) past the first and third quartiles. The crosses represent the outliers lying beyond the whiskers. The plots in Figure 7a,b show the dependence of the number of stored elements and the number of calls to the physical model on ε.

Figure 7 .
Figure 7. Dependence of computational cost and memory storage indicators on ε. (a) Dependence of the number of calls to the physical model on ε; (b) Dependence of the number of stored elements on ε.

Figure 8 .
Figure 8. Effectivity of the coherence estimator η σ (17) associated with σ ∼ . The color scale indicates the final cumulative deformation.

Figure A1. The applied strain component ε 11 (t) consists of a triangular pattern of period 400 s with a peak-to-peak amplitude of 2% centered at zero.
The outputs ε ∼ , ε ∼ vp , and p have the same units, but have different physical meanings. The surrogate model is defined by introducing N = 4 groups of outputs as tensors A χ . The formal tensors A 1 , . . ., A 4 are related to p, ε ∼ , ε ∼ vp , and σ ∼ , where ε ∼ , ε ∼ vp , and σ ∼ have six components each and p is a scalar.

Table A1 .
Parameter ranges of variation considered in the model. When applicable, the unit is indicated between brackets. Note that the dimension of K depends on the parameter n according to Equation (A2).