Data pruning of tomographic data for the calibration of strain localization models

The development and generalization of Digital Volume Correlation (DVC) on X-ray computed tomography data highlight the issue of long-term storage. The present paper proposes a new model-free method for pruning DVC data. The size of the remaining sampled data can be user-defined, depending on the needs concerning storage space. The data pruning procedure is deeply linked to hyper-reduction techniques. The DVC data of a resin-bonded sand tested in uniaxial compression are used as an illustrating example. The relevance of the pruned data is then tested for model calibration. A new Finite Element Model Updating (FEMU) technique coupled with a hybrid hyper-reduction method is used to successfully calibrate a constitutive model of the resin-bonded sand with the pruned data only.


Introduction
With the development and the generalization of digital image correlation (DIC) and digital volume correlation (DVC) techniques on Computed Tomography (CT) data, the volume of acquired data has drastically increased. This raises new challenges, such as data storage, data mining, and the development of relevant methods for the dialogue between experiments and simulations, such as model validation and model calibration.
In experimental mechanics, the access to full 3D fields, like displacement or strain fields, is far richer than 1D load-displacement curves. These data can drive finite element simulations for model calibration. Although extremely convincing, the increasing resolution of full-field measurement tools, such as X-ray Computed Tomography, leads to an explosion of the volume of data to store. The long-term storage of CT data sets is nowadays an issue (see van Ooijen et al. [1]). This paper proposes a numerical method for pruning 3D data sets related to DVC when it becomes necessary to free up storage capacity. It aims to preserve the ability to identify constitutive equations reflecting strain localization. It is a mechanics-based approach to prune DVC data. The original experimental data are preserved solely in a reduced experimental domain (RED).
Compression of data is known to be a convenient approach to restore storage capacity. For instance, MP3 files are a fairly common way to reduce the size of audio files for daily use (see Pan [2]). A non-negligible, but controlled, loss of information is accepted: the MP3 compression roughly consists in filtering those components of the non-reduced audio file that are actually inaudible for most people. In other words, the MP3 algorithm was made to prune the audio data that are not absolutely necessary. Usually the compression rate is around 12. In the same philosophy, there can be a way to massively compress experimental data with a controlled loss of information, based on an algorithm that detects the pertinent information. This has been proposed in Cioaca and Sandu [3] by using a sensitivity analysis with respect to variations of calibration parameters. These parameters are the coefficients of a given model that should reflect the experimental observations. The result is that the pruned data are dedicated to a given model. In this paper, a model-free approach is proposed. It aims to make various calibrations with different models possible after data pruning. Here, the relevant information is local, situated in regions submitted to strain localization.
The data submitted to the pruning procedure are the outputs of a Digital Volume Correlation that reconstructs the displacement field u(x, t) for observations at time instants (t_j)_{j=1,...,N_t}, over a spatial domain Ω, where x is a position vector. The geometry of the experimental sample is approximated by a mesh and the determined displacement is decomposed on finite element (FE) shape functions [4].
The proposed method can be linked to data pruning or data cleaning methods described in the literature for machine learning [5]. The aim of these procedures is not to reduce data storage but to improve the data quality, for instance by accurate outlier detection [6]. In Hong et al. [7], a data pruning method is employed to filter the noise in the data set.
Using the FE approximation of the experimental fields paves the way to further simulations. In the calibration procedure, the full-field measurements are used as inputs of an inverse problem that aims to determine a given set of parameters µ = {µ_1, ..., µ_m}. These parameters are the coefficients of given constitutive equations. Their values are unknown or not known precisely. The most straightforward method is called Finite Element Model Updating (FEMU) (see Kavanagh and Clough [8], Kavanagh [9]). It is a rather common way to optimize a set of parameters taking into account the experimental data and the balance equations in mechanics. It consists in computing the discrepancy between the FE approximation of the experimental fields and the FE simulations.
Thus, an optimization loop is performed on µ, where the FE method is used as a tool for assessing the relevance of the parameter set. The objective function, or cost function, of the optimization can focus on the difference between the computed and experimental displacement fields (FEMU-U), forces (FEMU-F, or force balance method), or strain fields (FEMU-ε), or a mix of these sub-methods. A review of FEMU applications can be found in Ienny et al. [10]. The method is particularly suitable for:
• Non-isotropic materials (e.g., orthotropic materials in Lecompte et al. [11] or Molimard et al. [12], or anisotropic materials such as the human skin [13]);
• Heterogeneous materials such as composites [14];
• Heterogeneous tests like open-hole tests (e.g., Lecompte et al. [11], Molimard et al. [12]) or CT samples [15];
• Special cases of local phenomena like strain localization or necking (e.g., Forestier et al. [16], Giton et al. [17]), or the illustrating case of the present paper;
• Multi-material configurations (e.g., solder joints studied in Cugnoni et al. [18], or heterogeneous material identification done in Latourte et al. [19]);
• Determination of the boundary conditions [20].
One of the recent developments concerning FEMU is to couple this method with reduced order models (ROMs) to cut down the computation time in the parameter optimization loop. Examples of such recent developments can be found in Neggers et al. [21], where a method called FEMU-b is highlighted, or in Cugnoni et al. [22]. The FEMU-b consists in determining an intermediate space of predominant empirical modes associated with a reduction procedure, like the Proper Orthogonal Decomposition (see Aubry et al. [23]) or the PGD [24]. The discrepancy is computed between the experimental and simulated reduced variables, where the reduced variables are solutions of reduced equations.
In Ryckelynck [25], it has been shown that ROMs can be supplemented by a reduced integration domain, by following the hyper-reduction method. This opens the way for data pruning methods that preserve calibration capabilities. Here, the dimensionality reduction of experimental data enables the restriction of experimental data to a reduced experimental domain (RED). This RED is a subdomain of the specimen where the experimental data are sampled. It is not necessarily a connected domain. The calibration capabilities of the proposed data pruning are assessed by using the FEMU with a hybrid hyper-reduction method (H²ROM) [26]. Hence, the FEMU is not done on the complete domain but on the RED determined by the data pruning. The result is a fast calibration procedure, with low memory requirements and a validated data pruning protocol.
The remaining part of the paper is structured as follows. In Section 2, the proposed method for data pruning is described. The DVC is recalled. A dimensionality reduction and then a hyper-reduction are performed to compute the pruned data. The pruning procedure is applied in Section 3 to a resin-bonded sand tested in in situ uniaxial compression with X-ray tomography. In Section 4, the calibration of an elastoplastic model enables the validation of the pruning protocol.

Notations: 2nd-order tensors are denoted by a_∼. Matrices are denoted by capital bold letters A and vectors are denoted by bold lower-case characters a. The colon notation is used to denote the extraction of a submatrix or a vector (at column i for example): a = A[:, i]. Sets of indices are denoted by calligraphic characters A. The element of a matrix A at row i and column j is denoted A_ij, or A_α[i, j] when the matrix notation A_α has a subscript. ā is the restriction of a to the reduced experimental domain.

2. Data pruning by following a hyper-reduction scheme

Digital Volume Correlation
Let's consider a specimen occupying a domain Ω undergoing a certain mechanical test. With image acquisition techniques, grayscale images are obtained in 3D. The Digital Volume Correlation aims to determine the displacement field u at every position x in Ω at a given deformed state at time t. f and g are the gray levels at the reference and deformed states. They are related by the conservation of the optical flow:

f(x) = g(x + u(x, t))

The conservation of the optical flow is linearized by assuming that the reference image is differentiable, hence the local residual to minimize, η(x, t), is:

η(x, t) = g(x) + u(x, t) · ∇f(x) − f(x)

where ∇f is the gradient of f. This local residual is integrated over the whole domain Ω to give the functional to minimize:

Φ(t) = ∫_Ω η(x, t)² dx

This is an ill-posed problem. To well-pose the problem, the displacement field can be restricted to a kinematic subspace. Here, the displacement field is assumed to be decomposed over a set of vector functions ψ_i(x) that correspond to the shape functions of a FE model defined on Ω:

u(x, t) = Σ_{i=1}^{N_d} a_i^exp(t) ψ_i(x)

where N_d is the number of degrees of freedom of the mesh and a_i^exp is the i-th nodal degree of freedom in the FE model. a^exp denotes the vector of degrees of freedom to be determined. With this restriction to the kinematic subspace, the objective function is now a quadratic form of the a_i^exp, and its minimization is a linear system, set up for each observation of a deformed state:

M a^exp = f

where the matrix M and the vector f are:

M_ij = ∫_Ω (ψ_i(x) · ∇f(x)) (ψ_j(x) · ∇f(x)) dx,   f_i = ∫_Ω (ψ_i(x) · ∇f(x)) (f(x) − g(x)) dx

In the sequel, N_t observations of the specimen deformation at time instants t_j, j = 1, ..., N_t, are considered. The DVC gives access to the final correlated displacement field u(x, t_j) for each observation, through the coefficient vector a^exp(t_j), and the local residual η(x, t_j). From the displacement field, a strain field ε_∼ is extracted assuming small strains:

ε_∼ = ½ (∇u + ∇uᵀ)     (8)

This strain is thus calculated at each Gauss point of the mesh used for the DVC. For pressure-dependent or plastic materials, it can be convenient to subdivide the strain field into its deviatoric part e_∼ and its hydrostatic part ε_H:

ε_∼ = e_∼ + ε_H I_∼,   ε_H = tr(ε_∼) / 3

where I_∼ is the unit tensor.
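To make the deviatoric/hydrostatic split concrete, here is a minimal NumPy sketch. The array layout (one 3×3 tensor per Gauss point) and the function name are assumptions for illustration, not the paper's data format.

```python
import numpy as np

def split_strain(eps):
    """Split a small-strain tensor field into deviatoric and hydrostatic parts.

    eps: array of shape (n_gauss, 3, 3), one strain tensor per Gauss point.
    Returns (deviatoric part, hydrostatic strain) such that
    eps = dev + eps_h[:, None, None] * I, with eps_h = tr(eps) / 3.
    """
    eye = np.eye(3)
    eps_h = np.trace(eps, axis1=1, axis2=2) / 3.0   # hydrostatic strain
    dev = eps - eps_h[:, None, None] * eye          # deviatoric part
    return dev, eps_h

# toy usage: two Gauss points with random symmetric strains
rng = np.random.default_rng(0)
a = rng.normal(size=(2, 3, 3))
eps = 0.5 * (a + np.transpose(a, (0, 2, 1)))
dev, eps_h = split_strain(eps)
assert np.allclose(np.trace(dev, axis1=1, axis2=2), 0.0)  # the deviator is traceless
```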
It is worth noting that the pruning procedure only focuses on the displacement and not on the strain. It is considered that the strain can be computed in post-processing (thanks to Equation 8) and is not worth saving. The strain tensor is actually considered as temporary data used to compute a reduced experimental domain.

Dimensionality reduction
The first step of the pruning procedure consists in performing a dimensionality reduction of the experimental data. It is based on the singular value decomposition. This approach is similar to Principal Component Analysis (PCA), but here a reduced basis of empirical modes is obtained without centering the data.
The experimental data from DVC are saved into two matrices, Q_u and Q_ε. The displacement snapshot matrix gathers the DVC degrees of freedom column by column:

Q_u = [a^exp(t_1), ..., a^exp(t_{N_t})] ∈ R^{N_d × N_t}

The strain snapshot matrix Q_ε gathers, in the same fashion, the strain components evaluated at each Gauss point e_γ, γ = 1, ..., N_g, with N_g being the number of integration points in the mesh. For the sake of simplicity, we did not account for the symmetry of the strain tensor.
The first step of the pruning procedure consists in performing a first dimensionality reduction of the DVC data. Only the reduced basis and the reduced coordinates are kept instead of the snapshot matrix Q_u. The procedure is also applied to the snapshot matrix of the strain Q_ε, but not in order to reduce storage (as the strain data are not saved): the corresponding reduced basis is used as a temporary tool to compute the reduced domain afterwards. The determination of the empirical modes is performed thanks to a Singular Value Decomposition (SVD):

Q_x ≈ V_x S_x W_xᵀ,   x = u or ε     (14)

where V_x ∈ R^{N_d × N_x} is an empirical reduced basis for the displacement or the strain respectively, obtained by keeping the N_x singular values larger than tol times the largest one, where tol is a numerical parameter (typically 10⁻³). According to the Eckart-Young theorem, the matrix V_x (V_x)ᵀ Q_x is the best approximation of rank N_x for Q_x by using the reduced basis V_x.
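A minimal NumPy sketch of this truncated SVD step follows; the function name is hypothetical and the truncation rule is the one stated above (singular values below tol times the largest are discarded).

```python
import numpy as np

def truncated_svd(Q, tol=1e-3):
    """Truncated SVD in the spirit of Equation (14): Q ~ V S W^T.

    Returns the empirical basis V (N_d x N_x) and the reduced coordinates,
    where N_x is the number of singular values above tol * s_max.
    """
    V, s, Wt = np.linalg.svd(Q, full_matrices=False)
    n_x = int(np.sum(s > tol * s[0]))            # retained rank N_x
    return V[:, :n_x], s[:n_x, None] * Wt[:n_x, :]  # Q ~ V @ coords
```

On a displacement snapshot matrix of shape (N_d, N_t), this keeps at most N_t modes, which is why, as discussed below, the dimensionality reduction alone barely compresses tomographic data.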
The relevance of the dimensionality reduction of the displacement data appears to be conditioned by the difference between the number of time steps N_t and the order of the approximation N_u, as N_u ≤ N_t and V_u ∈ R^{N_d × N_u}. In situ tests observed in X-ray CT tend to have few time steps, so this first dimensionality reduction may not be efficient. Moreover, due to the resolution of the Computed Tomography, the data generally have a large number of degrees of freedom. In other words, the snapshot matrix Q_u has many rows (N_d) but few columns (N_t). The memory cost is mostly due to the number of dof of the problem. That is why the proposed pruning protocol is based on a hyper-reduction method, in order to reduce this number of dof significantly.

Hyper-reduction
The proposed pruning method has its roots in the hyper-reduction method [27]. A hyper-reduced order model is a set of FE equations restricted to a reduced integration domain (RID) when seeking an approximate solution of the FE equations with a given reduced basis. In a few words, this approach accounts for the low rank of the reduced approximation to set up the reduced equations of a given FE model. Let's explain this with a simple linear elastic finite element model. Let K ∈ R^{N_d × N_d} be the stiffness matrix of this FE model and c ∈ R^{N_d} the right-hand side term of the following FE balance equations:

K a^FE = c

where a^FE ∈ R^{N_d} is the solution of the FE equations. For a given reduced basis V ∈ R^{N_d × N_R} of rank N_R, the approximate reduced solution of the balance equations is denoted by a^R such that:

a^R = V b^R

where b^R ∈ R^{N_R} are the variables of the reduced order model. It turns out that the rank of K V must be N_R in order to find a unique solution b^R. Since N_d is usually larger than N_R, there exists a selection of a few rows of K V that preserves the rank of the selected submatrix. By following the hyper-reduction method proposed in Ryckelynck et al. [27], this row selection is achieved by considering balance equations set up on a reduced integration domain (RID). In former works on hyper-reduction, the RIDs were generated by using simulation data.
Here, the RED is similar to a RID, but its construction uses solely experimental data, that is to say that the reduced basis used to perform this row selection comes from Equation 14. That is why the pruning method is called a model-free approach. One of the advantages of such a method is that the data pruning does not have to be performed again if the constitutive model is changed. The RED is denoted by Ω_R^exp ⊂ Ω. For a given RED, the set F of the subscripts of the few degrees of freedom located in Ω_R^exp can be defined. The hyper-reduced balance equations are restricted to the RED by using convenient test functions [27] such that:

V[F, :]ᵀ (K V)[F, :] b^R = V[F, :]ᵀ c[F]

When the reduced basis contains empirical modes and a few FE shape functions located in Ω_R^exp, the method is termed hybrid hyper-reduction [26]. The RID must be large enough for (K V)[F, :] to have rank N_R. In the usual hyper-reduction method, the RID is generated by the assembly of elements containing interpolation points related to various reduced bases. These reduced bases are extracted from simulation data generated by a given mechanical model for various parameter variations [27]. Here, the RED construction is based exclusively on the reduced bases related to Q_u and Q_ε. The RED is the union of several subdomains: Ω_u and Ω_ε, generated from the reduced matrices V_u and V_ε; a domain denoted by Ω_+ corresponding to a set of elements neighboring the previous subdomains; and a zone of interest (ZOI) denoted by Ω_user. In the sequel, Ω_user is set up to evaluate the force applied by the experimental setup on the specimen.
Ω_u is designed as if we would like to reconstruct the experimental displacements outside Ω_u by using V_u and given experimental displacements in Ω_u. On a restricted subdomain Ω_u, we only have access to a restricted set of nodal displacements. The set of their indices is denoted by P_u. The set of remaining displacement indices is denoted by H_u, such that a^exp[H_u] is the vector to be reconstructed by knowing a^exp[P_u]. Various approaches have been proposed in the literature to perform this kind of reconstruction. They are related to data completion [28] or data imputation [29], for instance. Here, we have the opportunity to choose the set P_u, because the reconstruction issue is only formal. By using the DEIM method proposed in Chaturantabut and Sorensen [30], we can obtain the set P_u such that V_u[P_u, :] is a square and invertible matrix. In that situation, the number of selected degrees of freedom in P_u is the number of empirical modes in V_u. But in the present application, this set could be too small to get robust calibrations after data pruning. We therefore propose a modification of the DEIM algorithm in order to multiply the number of selected indices by a given factor K. We name this algorithm K-SWIM; for K = 1, it is exactly the same as the usual DEIM algorithm in Chaturantabut and Sorensen [30]. In the sequel, the set of indices selected by using K-SWIM is denoted by P_u^(K). The same reasoning is applied to the reconstruction of the experimental strain tensors: the K-SWIM algorithm applied to V_ε defines P_ε^(K). For given sets of indices P_u^(K) and P_ε^(K), the RED is:

Ω_R^exp = Ω_u ∪ Ω_ε ∪ Ω_+ ∪ Ω_user,   with Ω_u = ∪_{i ∈ P_u^(K)} supp(ψ_i) and Ω_ε = ∪_{k ∈ P_ε^(K)} supp(ψ_k^ε)

where supp is the support of the function and the ψ_k^ε are the shape functions related to the strain tensor in the FE model used to compute a^exp.

Algorithm 1: K-SWIM (Selection of variables With empIrical Modes)
Input: integer K, linearly independent empirical modes v_k ∈ R^d, k = 1, ..., M.
Output: variables index set P^(K).

Algorithm 1 is properly defined if, in line 3, the matrix (U_l[P_j, :])ᵀ U_l[P_j, :] is invertible for l > 1 with j = (l − 1) K, or equivalently if the following property is fulfilled.
Proof. Let's assume that (U_l[P_j, :])ᵀ U_l[P_j, :] is invertible for l > 1 and j = (l − 1) K. Then, we can compute q_l. (v_k)_{k=1,...,M} is a set of linearly independent vectors, so max_{i ∈ {1,...,d}} |q_l[i]| > 0. Let's introduce the first additional index, j = (I − 1) K + 1, P_j = P_j ∪ {arg max_{i ∈ {1,...,d}} |q_I[i]|}, and the corresponding residual vector. Then q_l = q_I[P_j] and ||q_l||_2 > 0, so U_{l+1}[P_j, :] is full column rank. Since P_j ⊂ P_{j+K}, U_{l+1}[P_{j+K}, :] is full column rank as well.

Another interesting property is the possible cancellation of the data pruning by using a large value of the parameter K in the input of Algorithm 1. The following property holds.
For K = N_d, the RED covers the full domain and all the data are preserved.
Proof. By following Algorithm 1, for l = 1 with K = N_d, all the N_d indices are selected, so the RED covers the full domain Ω. The second property is quite restrictive: in practice, large values of K, with K < N_d, are enough to preserve all the data. The value of K has to be chosen according to the size of the memory that we would like to free up.
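To make the selection step concrete, here is a hedged NumPy sketch of a K-SWIM-like algorithm. It is one plausible reading of Algorithm 1 (greedily picking K indices per empirical mode with a least-squares residual), not the authors' exact code; the function name is hypothetical.

```python
import numpy as np

def k_swim(V, K):
    """Sketch of a K-SWIM-like index selection (DEIM generalization).

    V : (d, M) array of linearly independent empirical modes.
    Returns the list P of K * M selected row indices; K = 1 recovers the
    classical DEIM selection of Chaturantabut and Sorensen [30].
    """
    d, M = V.shape
    P = []
    for l in range(M):
        for _ in range(K):
            if l == 0:
                r = V[:, 0]            # no previous modes: residual is the mode itself
            else:
                # least-squares fit of mode l on the rows selected so far
                U = V[:, :l]
                c, *_ = np.linalg.lstsq(U[P, :], V[P, l], rcond=None)
                r = V[:, l] - U @ c
            mag = np.abs(r)
            mag[P] = -np.inf           # avoid re-selecting an already chosen index
            P.append(int(np.argmax(mag)))
    return P
```

Note that for l = 1 and K = N_d the sketch selects every index, consistently with the property above.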
When the RED is available, the DVC experimental bases (V_u^exp and V_ε^exp) are restricted to Ω_R^exp and the data to be stored are:
1. The pruned data Q̄_u^exp or the reduced basis V̄_u^exp, where V̄_u^exp is obtained by the SVD applied on Q̄_u^exp only, together with the consecutive reduced coordinates b_u^exp;
2. A reduced mesh, that is, the restriction of the FE mesh to Ω_R^exp;
3. The load history applied to the specimen on the subdomain Ω_user.
It is also advised to store the distribution of a value of interest in the full domain and in the reduced domain. These data can be saved as histograms, for example. In the present paper, the shear strain distribution was saved, as this variable is extremely interesting in the case of strain localization. The additional memory cost is actually negligible, as it consists in storing a few hundred floats.
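A minimal sketch of this storage step is given below; the function name, file name, and bin count are hypothetical choices for illustration.

```python
import numpy as np

def store_distribution(values_full, values_red, bins=200):
    """Save the distribution of a value of interest (e.g., shear strain)
    in the full domain and in the RED as compact histograms.

    Only the bin edges and two count arrays are kept: a few hundred
    floats, a negligible cost compared with the raw field data.
    """
    edges = np.histogram_bin_edges(values_full, bins=bins)
    h_full, _ = np.histogram(values_full, bins=edges)
    h_red, _ = np.histogram(values_red, bins=edges)
    np.savez("distribution_of_interest.npz",
             edges=edges, hist_full=h_full, hist_red=h_red)
```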
Remark: when choosing to save Q̄_u^exp, the noise is also saved, whereas in V̄_u^exp it is partially filtered, as in a PCA analysis.
The data concerning the strains are not stored, as they can be computed from the displacement data thanks to Equation 8.
Generally, in situ experiments observed in X-ray CT do not have numerous time steps, hence the above dimensionality reduction does not drastically reduce the size of the data to store. This will be illustrated with the example in Section 3. The hyper-reduction of the domain is actually the predominant step of the data pruning.

Reduced mesh of the RED
In order to set up the hybrid hyper-reduced order model on the RED Ω_R^exp for the calibration procedure, we introduce a FE model restrained to the RED. This defines a reduced mesh. The FE shape functions of the reduced mesh are denoted by (ψ̄_i)_{i=1,...,card(F)} such that:

ψ̄_i(x) = ψ_{F_i}(x)   ∀ x ∈ Ω_R^exp

where F_i is the i-th index in the set F of the degrees of freedom in the RED; card(F) is the number of degrees of freedom of the reduced mesh. This changes the index numbering: the set F is then transformed into F̃ = {1, ..., card(F)}. The complement set of F̃ is denoted Ĩ. It contains the degrees of freedom in Ω_R^exp that are connected to Ω\Ω_R^exp in the full original mesh. For a given empirical reduced basis V, V̄ ∈ R^{card(F) × N} is its restriction to the RED:

V̄ = V[F, :]

The hybrid FE/reduced approximation is obtained by adding a few columns of the identity matrix to V̄. In this hybrid approximation, we only add FE degrees of freedom that are not connected to the degrees of freedom in Ĩ. The resulting set of degrees of freedom is denoted by R. In Baiges et al. [26], it has been shown that this permits a strong coupling in the resulting hybrid approximation. Let's define the subdomain connected to Ĩ:

Ω_Ĩ = ∪_{i ∈ Ĩ} supp(ψ̄_i)

Then we get:

R = { i ∈ F̃ : supp(ψ̄_i) ∩ Ω_Ĩ = ∅ }

The hybrid reduced basis is denoted by V_H. It reads, by using the Kronecker delta (δ_ji):

V_H = [ V̄ , (δ_ji)_{j ∈ F̃, i ∈ R} ]

so that V_H contains V̄ in its first columns and the selected columns of the identity matrix (with renumbered rows) in the last columns. We assume that the balance equations can be set up on the reduced mesh alone, where K̄ and c̄ are computed on the reduced mesh of Ω_R^exp. This assumption is relevant in mechanical problems without contact conditions, in the framework of first strain-gradient theory. We refer the reader to Fauque et al. [31] for the extension of the hybrid hyper-reduction method to contact problems. It follows that the hybrid hyper-reduced equations for a linear problem read:

V_Hᵀ K̄ V_H b_H = V_Hᵀ c̄

In the case of nonlinear problems, K̄ is the FE tangent stiffness matrix computed on the reduced mesh and c̄ is the opposite of the residual of the FE balance equations in the reduced mesh. If the matrix V_Hᵀ K̄ V_H does not have full rank, it is suggested to remove the columns of V̄ that cause the rank deficiency, incrementally, starting from the last column. When using the SVD to obtain V from data, the last columns have the smallest contribution in the data approximation.
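The following NumPy sketch illustrates the hybrid hyper-reduced solve above. It assumes K̄ and c̄ have already been assembled on the reduced mesh and that the set R is known; the function name is hypothetical and the rank-deficiency handling is only indicated in a comment.

```python
import numpy as np

def hybrid_hr_solve(K_bar, c_bar, V_bar, R):
    """Sketch of the hybrid hyper-reduced solve on the reduced mesh.

    K_bar, c_bar : FE (tangent) stiffness and right-hand side on the reduced
                   mesh of the RED, shapes (card(F), card(F)) and (card(F),).
    V_bar        : empirical basis restricted to the RED, card(F) x N.
    R            : indices (reduced-mesh numbering) of the added FE dofs,
                   i.e. those not connected to the interface set I.
    """
    n = K_bar.shape[0]
    V_H = np.hstack([V_bar, np.eye(n)[:, R]])  # hybrid basis: modes + FE shape functions
    A = V_H.T @ K_bar @ V_H                    # projected balance equations
    b_H = np.linalg.solve(A, V_H.T @ c_bar)
    # If A is rank deficient, drop trailing columns of V_bar (the smallest
    # SVD contributions), as suggested in the text, and re-assemble V_H.
    return b_H, V_H @ b_H                      # reduced coordinates and displacement
```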
Theorem 3. When Ω_R^exp = Ω, the hybrid hyper-reduced equations are the original FE equations on the full mesh.

Proof. When Ω_R^exp = Ω, the set Ĩ is empty and the reduced mesh is the original mesh. In addition, all the empirical modes have to be removed from V_H to get a full-rank system of equations. Hence V_H is the identity matrix, so the hybrid hyper-reduced equations are exactly the original FE equations. There is no complexity reduction.
In the sequel, the empirical reduced basis is extracted from the data restrained to the RED, by using the SVD. Let's denote by X the data available on the full mesh before the data pruning. Then, after pruning, the empirical reduced basis V̄ is related to X̄ = X[F, :]:

X̄ ≈ V̄ S̄ W̄ᵀ

where the singular values smaller than tol · max(diag(S̄)) are truncated. After pruning, V̄ is no longer a submatrix of a given V.
Property 4. Assume that the experimental data Q̄_u^exp fulfill the FE balance equations on Ω_R^exp, with additional Dirichlet boundary conditions on Ĩ. If both the hybrid hyper-reduced equations and these FE equations have a unique solution, then the solution of the hybrid hyper-reduced equations is the exact projection of the experimental data on the empirical reduced basis, [(b̄^FE)ᵀ, 0_Rᵀ]ᵀ, where 0_R is a vector of zeros in R^{card(R)}. This means that the hyper-reduced solution is exact and the FE correction in the hybrid approximation is null.

Proof. Let's introduce the matrix V̄ = X̄ W̄ S̄⁻¹ and consider a linear elastic FE problem. If Q̄_u^exp fulfills the FE equations on Ω_R^exp, with additional Dirichlet boundary conditions, then the balance equations of the hybrid hyper-reduced equations are fulfilled by [(b̄^FE)ᵀ, 0_Rᵀ]ᵀ. If both the hybrid hyper-reduced equations and the FE equations have a unique solution respectively, then [(b̄^FE)ᵀ, 0_Rᵀ]ᵀ is the unique solution of the hybrid hyper-reduced equations, and the exact projection of the restrained FE solution.

The last property does not imply that imposing a^FE(t_j, µ)[Ĩ] = Q̄_u^exp[Ĩ, j] as a boundary condition on the degrees of freedom in Ĩ is the best way to fulfill the FE balance equations. In fact, with the additional boundary conditions on Ĩ, the maximum number of available FE equations is card(F). Property 4 means that if the empirical reduced basis is exact, then all the N_d FE balance equations are fulfilled in Ω. In a sense, in the proposed calibration protocol, we put more trust in the FE balance equations than in the experimental data: accurate FE balance equations can be obtained with a convenient mesh of Ω, whereas noise is always present in experimental data.

3. Illustrating example: polyurethane bonded sand studied with X-ray CT

Material and test description
The material studied here is a polyurethane-bonded sand used in casting foundries to mold the internal cavities of foundry parts. The resin creates bonds between grains and drastically improves the mechanical properties of the cores (stiffness, maximum yield stress, traction strength...). The material has been extensively studied with standard laboratory tests, focusing on macroscopic displacement-force curves. This casting sand has been experimentally investigated by Jomaa et al. [32] and Bargaoui et al. [33]. These macroscopic data are complemented by an in situ uniaxial compression test studied in X-ray CT on an as-received sample. According to Bargaoui et al. [33], the process used to make the cores (Cold Box process) guarantees the homogeneity of the material. In the sequel, the resin-bonded sand is supposed homogeneous.
The sample is a parallelepiped (20.0 × 22.4 × 22.5 mm³). The load was increased (with a constant displacement rate of 0.5 mm/min) and the displacement was stopped at several levels, noted P_i. During these stopped-displacement periods, the sample was scanned with a beam tension of 80 kV and an intensity of 280 µA. P_0 corresponds to the initial state, before the application of the load. Then seven tomography scans were performed at increasingly compressed states. At P_7, the sample is broken. The bottom and top extremities were excluded from the images because of the artifacts induced by the plates. A grayscale image of the tested cemented sand is displayed in Figure 1. During the test, the reaction is measured at the top of the sample. It is plotted in Figure 2. The first 6 steps (non-broken sample) are situated before the peak of the loading curve.

DVC and error estimation
The displacement fields at these different stages were calculated using a digital volume correlation (DVC) software named Ufreckles, developed by LaMCos (see Réthoré et al. [4]). A finite element continuum method is used to calculate the displacement field with a nonlinear least-squares error minimization method. The chosen element size is near 0.5 mm. The final region of interest is 20.0 × 22.4 × 15.8 mm³. The top of the sample has been excluded. The DVC is performed on a parallelepipedic mesh composed of around 470,000 degrees of freedom.
The DVC showed that the pre-peak displacement field is extremely non-homogeneous, as shown in Figure 3. The pre-peak noise is relatively homogeneous (Figure 4). The noise is more predominant at the first steps, where the displacement is really small and thus the DVC reconstruction was more complicated. The test showed a complex and rich behavior of the tested material, namely a non-homogeneous displacement field and pre-peak strain bifurcations. The experimental data are well suited for testing the ability of a given model to predict such phenomena.

Figure 3. u_ax at the pre-peak steps 1 to 6 (deformed ×75)

Building the reduced experimental basis
As specified in the data pruning procedure, the experimental displacement and strain snapshot matrices are computed. Attention is drawn to the fact that the studied test has few time steps (N_t = 7) and that the experimental mesh is not that big. The DVC matrices Q_u and Q_ε are respectively 474,405 × 7 and 1,774,080 × 14.
If the truncated SVD is applied on these matrices, only 6 modes are extracted for the displacement and 13 for the strain. As the number of time steps is rather small, the use of empirical modes does not reduce the size of the experimental data, as stated before. In other words, the experimental data are not suited for the dimensionality reduction. This method is efficient on matrices with numerous columns and rather few rows, whereas tomographic data tend to have the exact opposite: few columns (time steps) and a lot of rows (degrees of freedom).

RED after DVC on the specimen
The building procedure has three main inputs:
• The empirical modes V_u and V_ε;
• The K parameter for K-SWIM (Algorithm 1);
• A Zone Of Interest (ZOI) Ω_user. During the test, the loading curve was measured at the top of the sample. In order to compare computed and measured reactions for model assessment, the elements at the top of the mesh are considered as a ZOI.
In the remainder, Ω_+ is one layer of elements around Ω_u ∪ Ω_ε ∪ Ω_user.
The RED was determined by varying the number K of selected lines in the K-SWIM algorithm. Its influence is assessed in Figure 5. For K = 1, the standard DEIM algorithm selects very few degrees of freedom; most of the RED is actually the ZOI. This is due to the relatively low number of modes contained in the reduced basis (only 6). This apparent issue can be overcome by selecting more lines during the K-SWIM algorithm. When increasing K, the number of degrees of freedom rises linearly. Attention is drawn to the fact that the resulting REDs for K = 25 or K = 50 are discontinuous, as is usually the case when using hyper-reduction methods. The newly selected zones are situated in the sheared regions. A summary of the different matrix sizes at each step is displayed in Table 1. As stated before, it is clear that for this kind of data, the PCA analysis does not significantly reduce the memory usage. The hyper-reduction scheme allowed saving up to 85% of the memory space for the illustrating example.
4. Calibration of an elastoplastic model

The constitutive model used here is an extension of the Clay And Sand Model (CASM) [34], initially developed for unbonded sand and clay, to bonded geomaterials within the framework developed by Gens and Nova [35]. The C-CASM has been extensively described in Rios et al. [36]. The Modified Cemented Clay And Sand Model (MC-CASM) presented here introduces some modifications of the C-CASM:
• Addition of a damage law whose equation is phenomenological (based on cyclic compressive tests);
• A different hardening law for the bonding parameter b: a first hardening precedes the softening. It is supposed here that the polyurethane resin goes through a first hardening before breaking.
It is supposed here that the yield function was previously calibrated with standard laboratory tests. The calibration concerns the parameters involved in the different damage and hardening laws, which can be more difficult to assess with macroscopic loading curves. In the continuation of the paper, the equivalent von Mises stress will be denoted q and the mean pressure p. The MC-CASM equations are summarized hereafter.

Yield function
The yield function, f, of the constitutive model depends on the stress invariants q and p and on three constant parameters M, r, and n that control the shape of the yield surface. p_c is the preconsolidation pressure, that is to say, the maximum yield pressure during an isotropic compressive test (see Roscoe et al. [37]). b is the bonding parameter modeling the amplification of the yield surface due to intergranular bonding. p_t is the traction resistance of the soil, defined by Gens and Nova [35] as proportional to the bonding, where the constant parameter α models the influence of the binder on the traction resistance. The yield function is supposed to be calibrated: this means that M, r, n, α and the initial values of p_c and b are known.
The yield surfaces of the unbonded (blue) and bonded sand (red) are plotted in Figure 8.

Hardening and damage laws
The model has two hardening variables: the preconsolidation pressure p_c and the bonding parameter b. The evolution of p_c is directly controlled by the incremental plastic volumetric strain ε̇_v^p, whereas b relies on a plastic strain damage measure h. The incremental value of h is defined as a weighting of the effects of the incremental plastic shear strain and the incremental plastic volumetric strain. The model also includes a damage law whose formulation is purely phenomenological. The hardening and damage laws provide m = 7 unknown parameters to calibrate.

Calibration protocol by using the hybrid hyper-reduction method
The FEMU-H²ROM is preceded by an off-line phase similar to an unsupervised machine learning phase. It consists in building the empirical reduced basis V̄ that is mandatory to set up the hybrid hyper-reduced equations. It is similar to the first step of the data pruning method: a snapshot matrix is constructed, based on both simulation and experimental results (and not on experiments only).
The starting point of the off-line phase is to assess the parameter sensitivities of the model starting from an initial guess µ⁰ = {µ⁰_1, ..., µ⁰_m}. This guess can come from a previous calibration, or from a calibration done using macroscopic force-displacement curves of standard tests, without predicting strain localization.
The off-line calculations are performed on the full domain Ω and thus can be time consuming. The boundary conditions are the experimental displacements taken from the computed tomography, imposed at the top and the bottom of the sample. The displacement field is not imposed inside the sample because one of the aims of the model is to correctly capture the strain localization appearing inside the sample during the test, under the constraint of balance equations; imposing the displacement field inside the specimen would give fewer balance equations to fulfill. Attention is drawn to the fact that these calculations can be done in parallel, and that only the displacement snapshot matrices are needed. A total of m + 1 independent calculations are performed on Ω:
• One initial calculation where µ = µ⁰, which gives Q_u(µ⁰);
• m parameter sensitivity calculations where µ = µⁱ = {µ⁰_1, ..., µ⁰_i + δµ⁰_i, ..., µ⁰_m}, which give Q_u(µⁱ).
Once done, these calculations are restricted to the reduced experimental domain Ω_R^exp. They are denoted Q̄_u(µⁱ) for i = 0, ..., m. All these results have to be aggregated in one snapshot matrix X̄ before the computation of the empirical modes V̄. Instead of concatenating the m + 1 matrices into one, a Derivative Extended Proper Orthogonal Decomposition (DEPOD) method is used (see Schmidt et al. [38]); a sketch is given below. This approach has been validated in previous works on model calibration with hyper-reduction (Ryckelynck and Missoum Benziane [39]). It allows capturing the effects of each parameter variation.
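The following NumPy sketch illustrates a DEPOD-style assembly of the snapshot matrix. The finite-difference sensitivity columns and the function name are assumptions for illustration; the exact weighting in Schmidt et al. [38] may differ.

```python
import numpy as np

def depod_basis(Q0, Q_perturbed, dmu, Q_exp=None, alpha=1.0, tol=1e-4):
    """Hypothetical DEPOD-style assembly of the snapshot matrix X (sketch).

    Q0          : snapshots on the RED for the initial guess mu0, shape (n, N_t).
    Q_perturbed : list of m snapshot matrices, one per perturbed set mu_i.
    dmu         : list of the m perturbations delta mu_i.
    Q_exp       : optional pruned experimental snapshots, weighted by alpha.
    """
    # derivative-extended columns: finite-difference parameter sensitivities
    cols = [Q0] + [(Qi - Q0) / d for Qi, d in zip(Q_perturbed, dmu)]
    if Q_exp is not None:
        cols.insert(0, alpha * Q_exp)      # experimental term, weighted by alpha
    X = np.hstack(cols)
    V, s, _ = np.linalg.svd(X, full_matrices=False)
    return V[:, s > tol * s[0]]            # truncated empirical basis
```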
The first term of the snapshot matrix,

X̄ = [ α V̄_u^exp b_u^exp , Q̄_u(µ⁰) , (Q̄_u(µⁱ) − Q̄_u(µ⁰)) / δµ⁰_i |_{i=1,...,m} ]     (40)

corresponds to the pruned experimental data. It is weighted by a custom parameter α that enables giving more impact to the experimental fluctuations in the empirical modes; the finite element method tends to smooth these fluctuations, thus provoking a certain loss of information. Empirical modes depending on the factor α are displayed in Figure 9. For α = 0, that is to say without experimental data in the bulk, the empirical modes have strong fluctuations only at the top and the bottom of the specimen, where the experimental boundary conditions are imposed. This can be explained by the natural smoothing ensured by the finite element method with rather elliptic equations. Increasing the importance of the experimental data tends to naturally perturb the displacement field inside the sample. Even for strongly perturbed modes (α = 10), the last empirical mode is roughly smooth: this is due to the POD algorithm that filters the data.
In the sequel, we choose α = 1.The experimental data are as important as simulation data related to FE balance equations.
Once V̄ is available, the hybrid reduced basis V_H can be defined. Then, the experimental reduced coordinates are obtained by projecting the pruned experimental data on the empirical reduced basis, to be compared during the optimization loop. For the proposed example, there is a fast decay of the singular values (see Figure 10, where the truncation tolerance tol is set to 10⁻⁴). When this decay is not sufficient to provide a small number of empirical modes, we refer the reader to Ghavamian et al. [40], Peherstorfer et al. [41] and Haasdonk et al. [42] to cluster the data, in order to divide the time interval and construct local reduced bases in time.

Cost function and parameters updating
In the optimization loop, a given set of parameters µ is assessed. The H²ROM calculations provide the reduced coordinates associated with the empirical basis previously determined on the RED, denoted b_R(µ). The top reaction F_comp(µ) is also calculated, as the average axial stress in the ZOI.
In the example, the cost function evaluates errors at two scales: the microscale error between experimental and computed reduced coordinates, and the macroscale error between the measured and computed top reactions.
The microscale error ω_u(µ) compares the computed reduced coordinates b_R(µ) with the experimental reduced coordinates. The choice of the norm is user-dependent: the inverse covariance matrix of the displacement is the best norm for a Gaussian noise according to Tarantola [43] and Kaipio and Somersalo [44] in a Bayesian framework. However, in the present study, to keep the treated problem rather simple, a simpler norm of the difference between computed and experimental reduced coordinates is chosen. The macroscale error ω_F(µ) is defined on Ω_user, the top of the sample, where the experimental load was measured. The experimental load measurements are supposed uncorrelated and their variance is denoted by σ_F². In a Bayesian framework, for a Gaussian noise corrupting the load measurements [21], this amounts to weighting the squared difference between computed and measured reactions by 1/σ_F². For the optimization loop, the final objective function is a weighted sum of the two previous sub-objective functions:

ω(µ) = c_u ω_u(µ) + c_F ω_F(µ)

where c_u and c_F are the weights. They can be chosen to balance the two cost functions or to favor one scale over the other. In the illustrating example, the cost function is balanced. A classical Levenberg-Marquardt algorithm is employed for the minimization of the error function and the update of the parameter vector µ.
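A minimal sketch of this optimization loop is given below, using SciPy's Levenberg-Marquardt solver. The residual weighting assumes the simple least-squares form of the cost stated above, and h2rom_predict is a hypothetical solver (not the paper's code) returning b_R(µ) on the RED and the top reaction F(µ).

```python
import numpy as np
from scipy.optimize import least_squares

def femu_residual(mu, b_exp, F_exp, sigma_F, c_u, c_F, h2rom_predict):
    """Weighted residual vector for the FEMU-H2ROM loop (sketch).

    Squaring and summing these residuals reproduces a cost of the form
    c_u * ||b_R(mu) - b_exp||^2 + c_F * ||(F(mu) - F_exp) / sigma_F||^2.
    """
    b_R, F = h2rom_predict(mu)
    res_u = np.sqrt(c_u) * (b_R - b_exp).ravel()          # microscale term
    res_F = np.sqrt(c_F) * (F - F_exp).ravel() / sigma_F  # macroscale term
    return np.concatenate([res_u, res_F])

# Levenberg-Marquardt update of the parameter vector mu:
# result = least_squares(femu_residual, mu0, method="lm",
#                        args=(b_exp, F_exp, sigma_F, 1.0, 1.0, h2rom_predict))
# mu_star = result.x
```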

Model calibration and FEM validation
The optimization loop took 53 iterations. The speed ratio between FEM calculations and H²ROM predictions is around 70. Moreover, the H²ROM predictions only needed around 3% of the FEM calculation memory cost.
The H²ROM predictions converge far more easily than the FEM calculations. The problem simulated in the optimization loop is a displacement-imposed problem, and the use of the reduced basis to predict the displacement field drastically facilitates the convergence. This also explains the important speed-up, which does not come only from the reduction of the integration domain.
Figure 11 displays the experimental and the computed top reactions (initial and optimized). At the end of the optimization loop, it is mandatory to assess the relevance of the H²ROM prediction. The FEMU-H²ROM is dependent on the initial guess µ⁰: this input determines the relevance of the reduced basis of the model after the parameter sensitivity study and the DEPOD analysis. During the model updating, the parameter set can drift too far from the initial guess. As a consequence, the empirical reduced basis V_H may not be accurate and the H²ROM predictions will not be admissible, that is to say that the discrepancy between hyper-reduced and finite element calculations may not be negligible. That is why the optimized parameter set µ* must be validated with FEM calculations on the full domain Ω. It is worth noting that if the experimental data are included in the DEPOD, the final H²ROM prediction should be close to the experiments.
In a similar manner to the optimization loop, an error function between both calculations can be defined focusing on the microscale (displacement error) and macroscale (top reactions differences).
Concerning the microscale, the discrepancy is only computed in the RED, as the H²ROM predictions are only made on this domain and cannot be reconstructed in the full domain with this particular approach.
In the same manner, the macroscale discrepancy compares the top reactions of the FEM and H²ROM calculations. The microscale and macroscale errors should not exceed a few percent of the FEM calculations. In Figure 11, the FEM top reaction is plotted in orange. It is clear that its value is extremely close to the one computed thanks to the H²ROM: the error is around 1% at each step. This final verification is purely numerical. If the H²ROM predictions are validated, it is advised to analyze the full-field FEM calculation more deeply: the calculation and the experiment can be extremely different in the zones outside the RED. In the illustrating example, the computed and measured shear strain distributions were compared. The analysis is summarized in the histograms displayed in Figure 12 for the last pre-peak step. The discrepancy between computed and measured distributions was considered here as satisfying.
In the case of notable differences between the H²ROM prediction and the FEM calculations, or between the FEM calculations and the experiment, the FEMU-H²ROM is not validated. Two solutions are possible to overcome this issue:
1. Perform the whole parameter sensitivity study again with µ⁰ = µ*. This implies doing the m parallel calculations again.
2. Concatenate the previously determined matrix X̄ from Equation 40 with Q̄_u(µ*) and perform a new truncated SVD to determine an enriched reduced basis V_H. No new FEM calculations are needed.
The first solution should be performed in the case of strong differences between the H²ROM prediction and the FEM calculations. The second option "only" costs a FEM calculation. It is also possible to modify the optimization loop to regularly include FEM-H²ROM comparisons and enrich V_H incrementally.

5. Discussion

Limitations of the pruning procedure
The present paper focused on the DVC data sets and not on the images themselves. The images are known to be particularly heavy as well, and perhaps more problematic than the DVC data. The pruning procedure considers that they can be deleted. This can actually be problematic: for instance, new DVC algorithms could improve the determination of the displacement field (for example, for complex problems involving cracks).
The images could be pruned too, in the sense that only the voxels of the images inside the determined RED would be conserved. However, we recommend storing only the reduced DVC data when data storage is an issue. In the case of non-homogeneous materials, the data concerning the inhomogeneity outside the RED must be saved as well.

A posteriori study of the RED
An a posteriori study of the determined RED was performed. The present discussion focuses on the shear strain distributions inside the whole domain Ω and inside the RED Ω_R^exp for the illustrating example. It would be preferable for the pruning procedure to store in the RED the most diverse configurations. The shear strain distributions in the whole domain and in the RED might thus differ (not the same mean value, for example). Figures 13 (a) and 14 (a) present the shear strain distributions at the first and last pre-peak steps. It appears that the statistical distribution of the shear strain inside the RED is not the same as the one inside the full domain.
Nevertheless, zooms on both histograms in Figures 13 (b) and 14 (b) reveal that the extremal values of the shear strain are conserved. One can see that the RED contains nearly all the elements where the shear is maximal.
Even if the proposed procedure is model-free, it is intimately linked with the mechanics of solids: it preferentially stores the data that are mechanically most relevant. For strain localization phenomena, this is the most sheared zone. The proposed method is not statistical: it actually induces a sampling bias.

Conclusion
The present paper proposed a data pruning procedure for DVC data that is model-free and versatile. The K-SWIM algorithm, through its parameter K, enables the user to define the size of the stored data. The resulting data can still be used afterwards, for calibration for instance. The use of hybrid hyper-reduction is particularly suitable for the pruned data, as it enables a non-negligible reduction of the memory and time costs in the FEMU optimization loop. The FEMU-H²ROM method is thus a new way to use massive DVC data for deeper mechanical studies.


Figure 5. Influence of K in the K-SWIM algorithm

Figure 9. u of the DEPOD modes (first, second, third, and last) depending on α


Figure 11. Result of the H²ROM optimization

Figure 12. Probability distribution of the shear strain at the last pre-peak step in the whole domain Ω, comparing FEM calculation and experimental data

Figure 13. Shear strain distributions in the whole domain and in the RED at the first step
Figure 14. Shear strain distributions in the whole domain and in the RED at the last pre-peak step


Table 1. Size of the matrices stored at each step of the data pruning (experimental data, empirical modes, pruned data)

