The Use of Machine-Learning Techniques in Material Constitutive Modelling for Metal Forming Processes

Accurate numerical simulations require constitutive models capable of providing precise material data. Several calibration methodologies have been developed to improve the accuracy of constitutive models. Nevertheless, a model's performance is always constrained by its mathematical formulation. Machine learning (ML) techniques, such as artificial neural networks (ANNs), have the potential to overcome these limitations. However, the use of ML for material constitutive modelling is very recent and not fully explored. Difficulties related to data requirements and training are still open problems. This work explores and discusses the use of ML techniques regarding the accuracy of material constitutive models in metal plasticity, particularly contributing (i) a parameter identification inverse methodology, (ii) a constitutive model corrector, (iii) a data-driven constitutive model using empirically known concepts and (iv) a general implicit constitutive model using a data-driven learning approach. These approaches are discussed, and examples are given in the framework of non-linear elastoplasticity. To conveniently train these ML approaches, a large amount of data concerning material behaviour must be used. Therefore, tests with non-homogeneous strain fields and complex strain paths, measured with digital image correlation (DIC) techniques, must be used for that purpose.


Introduction
Considerable developments in the field of computational mechanics and the growth in computational power have made it possible for numerical simulation software to emerge and completely revolutionize the engineering industry. In fact, the industry's high-demand-fuelled and fast-paced development cycles have accelerated the adoption of computational analysis software [1,2]. With a wide array of software solutions in the market, capable of virtualizing entire design workflows, numerical simulation has become a vital component of the product development process. Indeed, this virtualization has allowed companies to massively reduce the number of time-consuming experiments between design iterations, thus reducing delays and costs and allowing companies to reach higher levels of competitiveness [3]. In this context, finite element analysis (FEA) has been widely used as a powerful numerical simulation tool in engineering analyses, such as structural analysis, heat transfer and fluid flow. In FEA, the material behavior is defined by specific constitutive models whose ability to adequately describe a given material directly impacts the reliability of the numerical results. As such, material characterization has received increasing attention, given that computational analysis software requires precise material data [3].
Conventionally, constitutive laws are established according to first-principle assumptions in order to generalize experimental observations made on simple loading regimes. The paper is structured as follows: in Section 2, the fundamentals of the conventional approach to material modelling are briefly explained and the mathematical formulations applied to elasticity and plasticity are introduced. In Section 3, the application potential of ML/ANNs to different sub-fields of material constitutive modelling is discussed, while also introducing the concept of implicit material modelling. In Section 4, a literature review on the latter topic is presented, as well as a discussion concerning its implementation into FEA codes; finally, in Section 5, concrete application examples of the different approaches discussed in Section 3 are presented.

Phenomenological Approach
Material modelling refers to the development of constitutive laws representing the material behavior at the continuum level. Conventionally, constitutive laws are developed based on simplified assumptions and empirical expressions. Such expressions rely on empirical parameters, computed or calibrated via experimental methods. The calibration process ensures the compatibility between the observed and numerically simulated mechanical responses [5,24]. As models become more advanced in their physical basis, empirical expressions get more complex, often requiring extensive experimental campaigns [2].
The highly complex nature of material behavior and the research into new materials in recent years have led to the development of a great number of constitutive models targeting different phenomena. In the framework of plasticity, one can highlight the classic models by Tresca, von Mises [25] or Hill [26]. Such models are formulated as flow rules based on the yield function of the material.

Elastoplasticity
A material under load suffers plastic deformation starting at the uniaxial yield stress σ_y. Beyond this point, hardening phenomena take place. Reversing the load at a given strain ε, plastic deformation ceases and the stress starts to decrease linearly with strain, with a gradient equal to the Young's modulus, E [27]. Once the stress reaches zero, the strain remaining in the material is the plastic strain ε_p, whereas the recovered strain ε_e is the elastic strain. The following relationships are easily derived [25]:

$$\varepsilon = \varepsilon_e + \varepsilon_p, \qquad \varepsilon_e = \frac{\sigma}{E}.$$

There are plenty of developed yield criteria, the most commonly used in engineering practice being that of von Mises. The criterion relies on the knowledge of an equivalent stress $\bar{\sigma}$, the von Mises stress, defined in tensor notation as [27]:

$$\bar{\sigma} = \sqrt{\tfrac{3}{2}\,\boldsymbol{\sigma}' : \boldsymbol{\sigma}'},$$

where $\boldsymbol{\sigma}'$ is the deviatoric stress tensor. Similarly, an effective plastic strain rate $\dot{p}$ can be established, such that:

$$\dot{p} = \sqrt{\tfrac{2}{3}\,\dot{\boldsymbol{\varepsilon}}^p : \dot{\boldsymbol{\varepsilon}}^p}.$$

The yield function thus assumes the following representation [27]:

$$f(\boldsymbol{\sigma}, p) = \bar{\sigma}(\boldsymbol{\sigma}) - \sigma_y(p) \leq 0,$$

written here as a function of the stress $\boldsymbol{\sigma}$ and the accumulated plastic strain $p$. The behaviour of the material thus depends on the previous states of stress and strain; that is, plasticity is path-dependent [18]. In this case, the plastic range can be characterized by two internal variables: the back stress tensor $\boldsymbol{\chi}$, representing kinematic hardening, and the drag stress $R$, representing isotropic hardening. The yield function is, therefore, modified as [27]:

$$f = J(\boldsymbol{\sigma} - \boldsymbol{\chi}) - R - k,$$

where $k$ is a material constant and $J$ is a distance in the stress space. For large deformations, elastoplasticity can be formulated based on the co-rotational configuration. An in-depth overview of this can be seen in [28].

Machine Learning Approaches for Constitutive Modelling in Metal Forming
Material constitutive models are mathematical formulations representing the main characteristics, behaviours and functions of materials. Generally, continuum mechanical models, such as the ones presented in Section 2, are written in a differential form and require a calibration procedure. However, even with a very robust calibration, known in the field as parameter identification, the accuracy of the model is always constrained by its mathematical formulation. Additionally, the development of analytical constitutive models requires a large effort in both time and resources; today's state-of-the-art constitutive models required decades to develop and validate. Nevertheless, there is still margin for improvement, particularly using ML techniques, such as the one presented in Appendix A. The most straightforward fields of application are the following:

Parameter Identification Inverse Modelling
One category of inverse problems in metal forming is called parameter identification. The aim is to estimate material parameters for constitutive models, i.e., to estimate parameters that calibrate the models. The development of new materials and the effort to characterize existing ones have led to the formulation of new complex constitutive laws. Nevertheless, many of these new laws demand the identification of a large number of parameters adjusted to the material whose behaviour is to be simulated. In cases where the number of parameters is high, it might be necessary to solve the problem as an inverse non-linear optimization problem.
The parameters' determination should always be performed confronting mathematical/numerical and experimental results. Therefore, the comparison between the mathematical model and the experimental results (observable data) is a function that must be evaluated and minimized. This is generally the objective function of these problems [29][30][31].
When using the finite element method (FEM) to reproduce the material behaviour, the most commonly used method is the finite element model updating (FEMU) technique, which consists of creating a FEM model of a mechanical test, collecting displacement or strain components at some points/nodes and calculating a cost function from the difference between the numerical and experimental displacements at these nodes. Minimizing this cost function with respect to the unknown constitutive parameters provides the solution to the problem. This method is very general and flexible and can be used with local, average or full-field measurements. Other methods have also been proposed, such as the constitutive equation gap method (CEGM), the equilibrium gap method (EGM), the reciprocity gap method (RGM) and the virtual fields method (VFM) [32]. A review of these methods can be found in [2].
Considering the direct problem, which is to obtain the strains, displacements and stresses given the input variables (such as boundary conditions, geometry and material parameters), an inverse problem can be devised to obtain the material parameters and stresses when the inputs are the geometry, boundary conditions, displacements and strains. In this context, ML can be used to solve the inverse problem efficiently, modelling the inverse process using the direct problem results as training data. The solution for the direct problem can be easily obtained using FEA software. This approach is illustrated in Figure 1.
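As a minimal sketch of this inverse strategy, a closed-form Hollomon hardening law stands in for the FEA direct problem, and a nearest-neighbour lookup in response space stands in for the trained ML regressor (the forward model, parameter ranges and all values are illustrative):

```python
import numpy as np

# Toy direct problem (illustrative): Hollomon hardening sigma = K * eps^n.
# In practice, the direct problem would be solved with FEA software.
def direct_problem(K, n, eps):
    return K * eps**n

rng = np.random.default_rng(0)
eps = np.linspace(0.01, 0.2, 20)

# Build a training database by sampling material parameters and
# solving the direct problem for each sample.
params = np.column_stack([rng.uniform(400.0, 800.0, 500),   # K [MPa]
                          rng.uniform(0.1, 0.3, 500)])      # n [-]
responses = np.array([direct_problem(K, n, eps) for K, n in params])

# "Inverse model": nearest-neighbour lookup in response space,
# standing in for a trained ANN regressor.
def identify(observed):
    i = np.argmin(np.linalg.norm(responses - observed, axis=1))
    return params[i]

# Pseudo-experimental curve generated with known parameters (600, 0.2).
K_id, n_id = identify(direct_problem(600.0, 0.2, eps))
print(K_id, n_id)
```

A trained regressor would replace the lookup with a smooth response-to-parameter mapping, but the data flow (direct-problem results as training pairs, inverted at inference time) is the same.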

Constitutive Model Corrector
In recent decades, a large number of analytical constitutive models have been developed. Although a large effort has been put into these successive developments, a model is always a simplification of reality and is always constrained by its mathematical formulation. Nevertheless, the effort and the knowledge gained over these decades should not be neglected in the development of new constitutive models. Therefore, one approach to developing new constitutive models using ML techniques can take advantage of this knowledge as a starting point, acting as what can be called a model corrector or model enhancer.
The approach of the model corrector starts after the calibration of the well-known analytical model. It consists of confronting the real observations, generally from experimental tests, with the numerical simulations. After the parameter identification process, the differences between the real and numerical values can be collected and evaluated in a way similar to a data decomposition process (see Figure 2). These differences do not come only from the errors of the measurement techniques but also from the limitations (non-mathematical flexibility) of the analytical model. Then, the ML model is used to model the differences as a different (refined) scale model. For accurate simulations, both analytical and ML corrector models must be introduced in the FEA software, resulting in a hybrid constitutive model.
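The corrector idea can be sketched as follows, with synthetic data in place of experiments and a polynomial fit standing in for the ANN corrector (the flow curve and parameter values are illustrative):

```python
import numpy as np

eps = np.linspace(0.01, 0.3, 60)

# Pseudo-experimental flow curve: Hollomon-type hardening plus a
# saturation term the analytical model cannot represent (illustrative).
sigma_exp = 550.0 * eps**0.22 + 40.0 * (1.0 - np.exp(-20.0 * eps))

# Step 1: calibrate the analytical model sigma = K * eps^n (log-linear fit).
n, logK = np.polyfit(np.log(eps), np.log(sigma_exp), 1)
sigma_ana = np.exp(logK) * eps**n

# Step 2: model the residuals; a polynomial fit stands in here for the
# ML corrector trained on the analytical/experimental differences.
corr = np.polyfit(eps, sigma_exp - sigma_ana, 4)
sigma_hyb = sigma_ana + np.polyval(corr, eps)

err_ana = np.max(np.abs(sigma_exp - sigma_ana))
err_hyb = np.max(np.abs(sigma_exp - sigma_hyb))
print(err_ana, err_hyb)  # the hybrid error is smaller
```

In a FEA setting, both the calibrated analytical model and the trained corrector would be evaluated at each integration point, forming the hybrid constitutive model described above.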

Data-Driven Constitutive Model Using Empirical Known Concepts
An upper level of the previous approaches is using ML exclusively to model the behaviour of the material. However, the architecture and the development of the ML model consider some physical and empirical knowledge gained over the last decades in the development of analytical models, which are numerically implemented a posteriori. An example of this influence is the selection of the input and output features of the ML model. Generally, these ML material models that use empirical knowledge are developed from a differential point of view, like the known analytical models, and all output features are time-derivatives of the input features. Additionally, all features, including internal variables, have specific physical or empirical representations or concepts, such as hardening, grain size, recovery, etc.
These ML material models can fully replace the existing constitutive models implemented in FEA software because both share the same type of structure and input/output features (see Figure 3). Although the majority of the implementations are done explicitly, these models can theoretically be implemented in implicit FEA codes, such as ABAQUS [22]. To train these models, a non-straightforward analysis and decomposition of the experimental data is required for each specific feature of the model. This procedure takes into account the meaning of each feature (i.e., of each internal variable).

Figure 3. ML model as a full constitutive model, implemented into FEA simulation software. The ML material model is trained indirectly using full-field measurements from experimental tests. This approach can optionally use empirically known concepts.

General Implicit Constitutive Model Using Data-Driven Learning Approach
The use of ML technology to fully model the behaviour of the material without any preconception, knowledge, or analytical/mathematical formulation is the ultimate goal of this scientific topic. Although not mature, this approach has large potential because (i) the material behavior is caused by a multitude of phenomena too complex to be handled individually, as is traditionally done in the development of analytical models; (ii) the amount of data now retrieved using full-field methods and heterogeneous tests is difficult to analyse in the same way as the data previously acquired from classical homogeneous mechanical tests; and (iii) no previous concepts or knowledge are required, and preconceptions can mean constraints for disruptive developments.
An illustration of the development of an ML constitutive model using full-field experimental data and its integration into FEA simulation software is depicted in Figure 3. The most difficult step in the development of this approach is training the model, which, in this case, does not use any physical knowledge as a training constraint. Additionally, output features of the model, such as components of the stress tensor, cannot be measured in real experimental mechanical tests.

Review on Implicit Data-Driven Material Modelling
The concept of implicit material modelling relies on the material behavior being captured within the connections of an ANN trained using experimental or even synthetic data. That is, the relationships between stress and strain are learned purely based on data without resorting to a mathematical model or any kind of assumptions. The neural network should not only be able to reproduce the data it was trained on, but also possess the ability to approximate the results of other experiments. In this regard, while explicit material models may often not be able to reproduce new sets of experiments [8], ANNs are highly flexible and adaptable in the sense that these can be improved by being retrained with new data [33,34].
The possibility of extending neural network concepts to constitutive modelling was first explored by Ghaboussi, Garret and Wu [35]. The authors trained feedforward neural networks with backpropagation using experimental data and tested them against several well-known analytical models describing the behavior of concrete in plane stress subjected to two load settings: monotonic biaxial loading and uniaxial compressive cyclic loading. For the first setting, both stress- and strain-controlled models were developed in order to tackle the path dependency of the material behaviour. The models were trained to predict strain and stress increments, respectively, using the current states of stress and strain as inputs, with the addition of a stress or strain increment. For the cyclic load setting, the model was trained to predict the strain increment based on the current state and the previous two states of stress and strain. The authors reported promising results for both load settings, with the ANN-based models showing, in some cases, better performance than some of the analytical formulations. With their research, Ghaboussi and coworkers effectively proved the powerful interpolation and generalization capabilities of ANNs applied to constitutive modelling tasks, thus laying the foundations for the further development of implicit material models. Their methodology was later applied to the constitutive modelling of concrete (Wu and Ghaboussi [36]), sand (Ghaboussi et al. [37] and Ellis et al. [38]) and composites (Ghaboussi et al. [39]).
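The incremental input/output layout used by Ghaboussi and coworkers can be sketched in one dimension, with a toy saturating flow curve in place of experimental data and ordinary least squares standing in for the ANN (all values are illustrative):

```python
import numpy as np

# Toy 1D flow curve with saturating hardening (illustrative data).
eps = np.linspace(0.0, 0.1, 200)
sigma = 400.0 * (1.0 - np.exp(-30.0 * eps))

# Incremental training pairs in the spirit of Ghaboussi et al.:
# inputs are the current stress/strain state plus the strain
# increment; the target is the resulting stress increment.
X = np.column_stack([sigma[:-1], eps[:-1], np.diff(eps), np.ones(len(eps) - 1)])
y = np.diff(sigma)

# Ordinary least squares stands in for the trained ANN.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Recursive use: the model updates the stress state increment by increment.
s = 0.0
for i in range(len(eps) - 1):
    s += np.array([s, eps[i], eps[i + 1] - eps[i], 1.0]) @ w
print(s, sigma[-1])  # the predicted and true final stresses agree
```

The recursion shows the key feature of the incremental formulation: the model's own stress prediction is fed back as part of the next input, which is how these networks are used inside a simulation.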
An early application of the implicit modelling approach in the framework of metal plasticity is due to Furukawa and Yagawa [18]. The authors introduced the first mathematical description of the concept, relying on a state-space representation, and proposed an implicit viscoplastic model using neural networks based on the well-known Chaboche model. The model was trained using synthetic data and was tested against experimental data. Because the internal variables of the Chaboche model are not experimentally obtainable, a simple technique was proposed in order to derive the states of kinematic and isotropic stresses from the experimental curves. Overall, the authors reported that the implicit model correlated well with the results provided by the conventional formulations when applied to testing data within the training range, certifying the great interpolative capacity of the neural network. However, considerable errors were observed in the stress and strain curves when using testing data far off the limits of the training set, indicating the limited generalization capacity of the model.
Following a purely data-driven approach, several authors successfully applied very similar methodologies to train ANN-based constitutive models using experimental data to predict the flow behavior of different metals under hot deformation (i.e., compression, torsion). Many industrial and engineering applications require the hot deformation of materials, which is associated with numerous metallurgical phenomena affecting the structure of the material at the micro-scale. As such, several factors affecting the flow stress generate highly complicated and non-linear relationships, limiting the application range and accuracy of the conventional constitutive models. Li et al. [40] studied the high-temperature flow characteristics of Ti-15-3 alloy for the determination of hot-forging processing parameters and used an ANN model to predict the flow stress, finding that the ANN was able to correctly reproduce the behavior in sampled and non-sampled data. Lin et al. [12] used an ANN to predict the flow stress of 42CrMo steel under hot compressive deformation tests, reporting that the predictions were in good agreement with the experimental results. Mandal et al. [41] stated that the ANN-based model was an efficient tool to evaluate and predict the deformation behavior of stainless steel type AISI 304L. Sun et al. [42] developed a constitutive relationship model for Ti40 alloy and indicated that the flow stress predicted by the ANN-based model correlated well with experimental results. Li et al. [43] reported that an ANN-based model showed better performance and proved to be more accurate and effective in predicting the hot deformation behavior of modified 2.25Cr-1Mo steel than the conventional constitutive equations.

Neural Network Architectures and Generalization Capacity
The implicit approach to material modelling based on ANNs presents obvious advantages; however, important choices regarding the network architecture and the training process have to be made in order to avoid limited predictive capabilities. For example, non-linear material behavior is highly path-dependent, making it difficult for ANNs to learn and generalize the full spectrum of material behavior. The issue is acknowledged by Ghaboussi and coworkers [35] in their original work. Aiming to improve the generalization capacity of ANN-based material models, Ghaboussi and Sidarta [20] later proposed a new neural network architecture: nested adaptive neural networks (NANNs). Their proposal is based on the fact that most data have an inherent structure which, in the case of material behavior, is a nested structure observable in terms of:
• Dimensionality, where constitutive functions are taken as members of function spaces, each of which is contained in a space of subsequently higher dimensions;
• Path-dependency, where a state of the material behavior represented with a given number of history points is a subset of another state represented with a larger number of history points.
In line with this notion, the underlying concept of NANNs is to train a base module to represent the material behavior in the lowest function space and, subsequently, augment it with additional modules representing higher function spaces (Figure 4). All the nodes in each layer of the newly added module are connected to all the nodes in the next layer of the lower levels. Each module is a standard feedforward neural network (FFNN) trained using adaptive evolution; that is, neurons are subsequently added to the hidden layers as the network approaches its capacity. With new nodes, new connections are generated, and the objective is for the new links to capture the knowledge that the old ones were not able to capture. As such, training is carried out for the new connections while the old ones are frozen. With this technique, Ghaboussi and Sidarta [20] reported a significant error reduction when using high-level NANNs. In more recent works, regarding the path-dependency of material behavior, several authors (Abueidda et al. [44], Heider et al. [45], Ghavamian and Simone [46], Mozaffar et al. [47] and Gorji et al. [48]) demonstrated that recurrent neural networks (RNNs) can be particularly useful for modelling path-dependent plasticity, reporting superior performance compared to conventional ANNs. In contrast to the latter, RNNs are designed to handle time sequences [47,49] and are equipped with history-dependent hidden states, enabling them to carry information from previous inputs onto future predictions. These internal variables have the potential to emulate the role of objects such as plastic strains and back stresses in physics-based plasticity. Nevertheless, standard RNNs suffer from vanishing/exploding gradients that hinder the error backpropagation process when dealing with long sequences. Long short-term memory (LSTM) networks and gated recurrent units (GRUs) are more robust derivations of the standard RNN (Figure 5) proposed to avoid those problems.
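A minimal, untrained GRU cell written out in NumPy illustrates how the hidden state carries loading history (the dimensions and random weights are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_h = 3, 8   # e.g., strain-increment inputs and hidden-state size
Wz, Wr, Wh = (rng.normal(0.0, 0.1, (n_h, n_in)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(0.0, 0.1, (n_h, n_h)) for _ in range(3))

def gru_step(x, h):
    z = sigmoid(Wz @ x + Uz @ h)            # update gate
    r = sigmoid(Wr @ x + Ur @ h)            # reset gate
    h_new = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1.0 - z) * h + z * h_new

# The hidden state accumulates the loading history, so identical strain
# increments can produce different responses along different paths.
h = np.zeros(n_h)
for x in [np.array([0.01, 0.0, 0.0])] * 5 + [np.array([-0.01, 0.0, 0.0])] * 5:
    h = gru_step(x, h)
print(h)
```

In a trained material model, a final dense layer would map this hidden state to the stress components; the hidden state itself plays a role analogous to the internal variables of physics-based plasticity.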
Apart from generalization issues, a much more fundamental problem is emphasized by several authors [20,35,50,51]: the "black-box" nature of ANNs does not guarantee that the implicit material model obeys the conservation laws, symmetry and invariance requirements of conventional models. Nonetheless, if a neural network has been trained with a comprehensive dataset, it is reasonable to assume that it would be able to approximate the laws that the actual material obeys and use its generalization capability to appropriately predict stress-strain paths not included in the training data. However, building comprehensive datasets means acquiring large amounts of data. The cost of acquisition is often prohibitive, and models end up being built in a small data regime.
To tackle these issues, Raissi et al. [51] proposed a new type of neural network: physics-informed neural networks (PINNs). The authors exploit automatic differentiation to differentiate neural networks with respect to their input coordinates (i.e., space and time) and model parameters to obtain PINNs. The technique makes it possible to constrain neural network models to respect symmetries, invariances or conservation principles originating from the physical laws that govern the observed data. This type of information, encoded in such a way into the neural network, acts as a regularization agent that greatly reduces the space of admissible solutions. The methodology was proven by the authors to be able to tackle a wide range of problems in computational science, allowing the use of relatively simple FFNN architectures trained with small amounts of data. The general methodology proposed by Raissi and coworkers [51] was applied to constitutive modelling in a recent work by Masi et al. [50] through the proposal of thermodynamics-based artificial neural networks (TANNs), a subtype of PINNs. In TANNs, basic laws of thermodynamics are encoded directly in the architecture of the neural network. In the framework of material modelling, these thermodynamic constraints concern the stresses and the internal state variables and their relationship with the free energy and the dissipation rate. Masi and coworkers [50] report that TANN-based models were able to deliver thermodynamically consistent predictions both for seen and unseen data, achieving more accurate results than standard ANN material models. Moreover, the authors state that the inclusion of the laws of physics directly into the network architecture allows for the use of smaller training datasets, since the network does not have to capture the underlying relationships from data alone.
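The core physics-informed idea, penalizing the residual of a governing equation alongside the data misfit, can be sketched on a toy first-order ODE, with a polynomial basis standing in for the network (the equation, values and weighting are illustrative, not the PINN/TANN formulations themselves):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 2.0
t_data = np.linspace(0.0, 1.0, 8)                  # sparse, noisy observations
u_data = np.exp(-k * t_data) + rng.normal(0.0, 0.01, t_data.size)

deg = 5
t_col = np.linspace(0.0, 1.0, 50)                  # collocation points
V = np.vander(t_data, deg + 1, increasing=True)    # data basis
Vc = np.vander(t_col, deg + 1, increasing=True)    # collocation basis
D = np.column_stack([j * t_col ** max(j - 1, 0)    # derivative basis
                     for j in range(deg + 1)])

# Joint least squares: data misfit plus the residual of du/dt = -k*u,
# the latter acting as the physics-based regularizer.
lam = 1.0
A = np.vstack([V, np.sqrt(lam) * (D + k * Vc)])
b = np.concatenate([u_data, np.zeros(t_col.size)])
c, *_ = np.linalg.lstsq(A, b, rcond=None)

err = np.max(np.abs(Vc @ c - np.exp(-k * t_col)))
print(err)  # small despite only 8 noisy data points
```

The physics residual constrains the fit between the sparse data points, which is exactly the role the encoded physical laws play in PINNs and TANNs.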
Xu et al. [52], on the other hand, note that even though these physical constraints allow for improved accuracy in stress predictions, ANN-based constitutive models may still not yield a symmetric positive definite stiffness matrix. Therefore, nonphysical solutions and numerical instabilities arise when coupling ANN-based models with FEM solvers. The authors propose symmetric positive definite neural networks (SPD-NNs) as a new architecture to deal with these issues. The neural network model is trained to predict the Cholesky factors of the tangent stiffness matrix, from which the stress is obtained. According to Xu et al. [52], the approach weakly imposes convexity on the strain energy function and satisfies Hill's criterion and time consistency for path-dependent materials. The authors compared the SPD-NN with standard architectures in a series of problems involving elasto-plastic, hyper-elastic and multi-scale composite materials. The results showed that, overall, the SPD-NN was able to provide superior results, proving the effectiveness of the approach for learning path-dependent, path-independent and multi-scale material behaviour.
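The SPD output construction can be sketched as follows: a stand-in "network" outputs the entries of a lower-triangular factor L, and the tangent stiffness is assembled as L Lᵀ, which is symmetric positive semi-definite by construction (the mapping itself is random here, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def net(x):
    # Stand-in for a trained network: maps a strain state to the 6
    # independent entries of a 3x3 lower-triangular Cholesky factor.
    return rng.normal(size=6)

def tangent_stiffness(x):
    L = np.zeros((3, 3))
    L[np.tril_indices(3)] = net(x)
    return L @ L.T   # symmetric positive semi-definite by construction

H = tangent_stiffness(np.array([0.01, 0.0, 0.0]))
print(np.linalg.eigvalsh(H))  # no negative eigenvalues
```

Whatever the network outputs, the assembled stiffness cannot violate symmetry or positive semi-definiteness, which is what protects the coupled FEM solution from the instabilities described above.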

Integration in Finite Element Analysis
An ANN-based constitutive model can be used in a way similar to conventional material models in FEA to solve boundary value problems. An important disadvantage regarding the numerical implementation of ANN-based material models in FEA was pointed out by Ghaboussi and Sidarta [20]. The issue is related to the fact that the ANN model already provides an updated state of stress, without the need for integration of the model's equations over the strain increment. Therefore, it is not possible to obtain an explicit form of the material's stiffness matrix. As a solution, the authors propose an explicit formulation of the material's stiffness matrix for the ANN-based material model, allowing for an efficient convergence of the finite element Newton-Raphson iterations.
Successful embeddings of ANNs as material descriptions in finite element (FE) codes were recorded by several authors. Usually, the ANN architecture is implemented in the software's subroutines as a set of matrices containing the optimized weights obtained during training. Substantial gains in computational efficiency have been reported due to the practically instantaneous mapping offered by the ANNs. Lefik and Schrefler [21] trained a simple ANN as an incremental non-linear constitutive model for a FE code and demonstrated that a well-trained, albeit simple, network model was able to reproduce the behavior of simulated material hysteresis loops.
Kessler et al. [53] developed an ANN-based constitutive model for aluminum 6061 based on experimental data for different temperatures, strains and strain rates. The authors implemented the model in Abaqus VUMAT and reported the ANN's superior ability to mirror the experimental results.
Jang et al. [10] proposed a constitutive model to predict elastoplastic behavior for J2-plasticity, where an ANN was implemented in Abaqus User MATerial (UMAT) to replace the conventional nonlinear stress-integration scheme under isotropic hardening. The ANN was used only for nonlinear plastic loading, while keeping linear elastic loading and unloading relegated to the physics-based model. The authors reported that the ANN-based model provided results in good agreement with those from the conventional model for single-element tensile test and circular cup drawing simulations.
Zhang and Mohr [54] rewrote a standard algorithm for von Mises plasticity such that the stress update was governed by a single non-linear function predicting the apparent modulus (i.e., the Young's modulus in the case of elastic loading/unloading, and the elastoplastic tangent modulus in the case of plastic loading). This function was then replaced by an FFNN and implemented as a user material subroutine in Abaqus/Explicit. The authors demonstrated that a neural network with 5 hidden layers and 15 neurons per layer was able to "learn" the yield condition, flow rule and hardening law from data, closely matching the predictions of a J2 plasticity model, including the large deformation response for arbitrary multi-axial loading paths and reverse loading.
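The apparent-modulus stress update can be sketched in one dimension, with a crude closed-form elastic/elastoplastic switch standing in for the trained FFNN (parameter values are illustrative, and the partial elastic-to-plastic transition step is ignored for brevity):

```python
# 1D linear isotropic hardening: the whole stress update reduces to
# sigma += E_app * d_eps, driven by one scalar apparent modulus.
E, H, sy0 = 200e3, 2e3, 300.0   # toy elastic modulus, hardening, yield stress
Et = E * H / (E + H)            # elastoplastic tangent modulus

def e_app(sigma, sy, d_eps):
    trial = sigma + E * d_eps
    return E if abs(trial) <= sy else Et   # stand-in for the trained FFNN

sigma, sy = 0.0, sy0
for d_eps in [1e-4] * 40:       # monotonic tension into the plastic range
    Ea = e_app(sigma, sy, d_eps)
    sigma += Ea * d_eps
    if Ea != E:                 # plastic step: the yield stress grows
        sy = abs(sigma)
print(round(sigma, 2))
```

In the actual method, the trained network replaces `e_app` and must infer the elastic/plastic switch, the flow rule and the hardening from data alone.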

Indirect/Inverse Training
The vast majority of the approaches documented in the literature for implicit constitutive modelling consists of feeding the ANN with paired data (usually, stress and strain) during the training process in order to assimilate the material behavior [55]. The process requires copious amounts of data, and obtaining comprehensive stress-strain relationships while relying on the standard simple mechanical tests poses a great challenge, especially when dealing with anisotropic materials. Moreover, in a complex experiment, certain variables such as stresses cannot be directly measured [56]. Therefore, the training process must be carried out in such a way that allows the ANN-based model to indirectly predict the stress using measurable data, e.g., displacements and global force, obtained from full-field measurements [56]. One of the first approaches to bypass these limitations was proposed by Ghaboussi et al. [39], consisting of an auto-progressive method to train an ANN to learn the behavior of concrete from experimental load-deflection curves. However, the method is inefficient due to the fact that FEM simulations are needed in order to compute the stress and strain.
The generalized SPD-NN architecture proposed by Xu et al. [52] was able to deal with both direct and indirect training procedures. For the latter, the authors used displacement and force data to indirectly train the SPD-NN-based model and predict the stress in an incremental form by coupling the ANN with a non-linear equation solver. Despite providing superior results compared to standard architectures, the authors point out the requirement of full-field data as the main limitation, especially for three-dimensional solid bodies.
Both Huang et al. [56] and Xu et al. [57] proposed a method to use an ANN to replace the constitutive model in a FEM solver in order to impose the physical constraints during training, enabling the ANN to learn the material behaviour based on the experimental measurements. Nonetheless, the approach depends highly on full-field force and displacement data, while, in most experiments, only partially observed data are usually available, limiting its application to more complex three-dimensional bodies. A different method was reported by Liu et al. [55], who propose coupling an ANN with the FEM to form a coupled mechanical system, with the inputs and outputs being determined at each incremental step (Figure 6). With this, the ANN is able to learn the constitutive behavior based on partial force and displacement data. The authors applied the methodology to learn non-linear in-plane shear and failure initiation in composite materials, achieving good models with results in excellent agreement with the analytical solutions. The main drawback is the need to write a FEM code based on an automatic differentiation package (e.g., PyTorch or TensorFlow), making the FEA step very time-consuming. In order to overcome this limitation, Tao et al. [58] propose an approach similar to that of Liu, with the ANN model being coupled with the FEA software Abaqus. However, the authors propose a modified set of backward propagation equations that enable the communication between Abaqus and the ANN, keeping the whole training process internal to Abaqus. A similar approach to that of Liu et al. [59] is proposed here; however, instead of coupling the ANN with a FEM solver to enforce the physical constraints, the virtual fields method (VFM) is used to guarantee global equilibrium (Figure 7). The VFM, first introduced by Grédiac [32], is known for its computational efficiency and does not require FEA to conduct any forward calculations [60].
The key elements behind the VFM are the principle of virtual work (PVW) and the choice of virtual fields. According to the PVW, the internal virtual work must be equal to the external virtual work performed by the external forces, written as [61]:

∫_V σ : ε* dV = ∫_{S_f} T · u* dS,

where ε* is the virtual strain, u* is the virtual displacement and T is the global traction vector. The virtual fields are mathematical test functions which act as weights and can be defined independently of the measured displacements/strains. An infinite number of virtual fields can be used; nonetheless, the following two conditions should be met [4,61]: the chosen virtual fields should be kinematically admissible, that is, the displacement boundary conditions must be satisfied; and the virtual fields should be zero or constant along the boundary where the force is applied. By coupling the VFM with the ANN model, one can use force and displacement data, numerically generated or coming from full-field measurements, to indirectly train the ANN. The strain components are easily obtained from the displacements and are fed as inputs to the neural network, which provides the stress components. The virtual strains are obtained from the virtual displacements, such that:

ε* = ½ (∇u* + (∇u*)ᵀ).

Then, the stress equilibrium is evaluated globally by means of the PVW, and the parameters (W, b) are optimized until equilibrium is respected, that is, by minimizing the loss:

L = Σ_{v=1}^{n_V} ( ∫_V σ : ε*_v dV − ∫_{S_f} T · u*_v dS )²,

where n_V represents the number of virtual fields. This methodology is illustrated in Section 5.3.
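The ANN-VFM coupling can be made concrete with a short sketch. The following NumPy snippet, a minimal illustration under assumed plane-stress conditions with centroid stresses and virtual strains per element (all function and variable names are hypothetical), evaluates the PVW residual for one virtual field and sums the squared residuals over n_V fields, which is the quantity the ANN parameters (W, b) would be trained to minimize:

```python
import numpy as np

def pvw_residual(stress, virtual_strain, areas, thickness, F_ext, u_star_at_load):
    """Internal minus external virtual work for one virtual field.

    stress:          (n_elem, 3) in-plane components (s_xx, s_yy, s_xy)
    virtual_strain:  (n_elem, 3) virtual strains at element centroids
    areas:           (n_elem,) element areas
    F_ext:           global force applied on the loaded boundary
    u_star_at_load:  constant virtual displacement on the loaded boundary
    """
    # internal virtual work: sum over elements of sigma : eps* dV;
    # the factor 2 on shear accounts for the symmetric off-diagonal pair
    w = np.array([1.0, 1.0, 2.0])
    w_int = thickness * np.sum(areas[:, None] * w * stress * virtual_strain)
    # external virtual work: F . u* on the loaded boundary
    w_ext = F_ext * u_star_at_load
    return w_int - w_ext

def vfm_loss(stress, virtual_strains, areas, thickness, F_ext, u_stars):
    """Sum of squared PVW residuals over the n_V virtual fields."""
    return sum(pvw_residual(stress, vs, areas, thickness, F_ext, us) ** 2
               for vs, us in zip(virtual_strains, u_stars))
```

In an actual implementation, the stresses would be the output of the ANN and the loss would be minimized by a gradient-based optimizer; this sketch only shows how the virtual-work balance is assembled.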

Application Examples
This section presents an example of each approach, showing the large potential of ML in material constitutive modelling.

Parameter Identification
The parameter identification problem could be reduced to a curve-fitting problem if physical constraints were not taken into account. However, most material constitutive models have physical constraints, such as material parameter boundary values and mathematical relations between them, in order to guarantee that the parameters have a physical meaning [62]. The formulation for the solution of the constitutive model parameter identification is as follows. First, consider a physical system whose behaviour can be described by a numerical model and for which experimental results are available, together with a set of measurable variables that can be experimentally determined, defined as Zᵀ = [z₁, z₂, . . . , z_m]. In the case where simple mechanical tests are considered, such as tensile and shear tests, these measurable variables would be the stresses or strains. Considering this information, it is then possible to formulate the solution of the identification problem as the minimization of a function that measures the difference between theoretical predictions Z_num (obtained by the numerical model) and experimental data. This function is known as the objective function L(A) and can be formulated as [62]:

L(A) = Σ_{q=1}^{N} L_q(A),

with L_q(A) computed as:

L_q(A) = ∫_{t₀}^{t₁} [Z_num(A, t) − Z_exp(t)]ᵀ D_q [Z_num(A, t) − Z_exp(t)] dt,

where A = [A₁, A₂, ..., A_r]ᵀ is the set of the r ∈ ℕ constitutive model parameters, and Z_exp is a known experimental value of Z in the N experimental tests. The tuple of time points (t₀, t₁) is the time period of the generic test q. Additionally, D_q is a given weight matrix associated with the test q. The optimization problem consists of finding the minimum of L(A), i.e.,

min L(A) subject to g_m(A) ≤ 0, m = 1, ..., M, and h_l(A) = 0, l = 1, ..., L,

where the M inequalities g_m(A) and the L equalities h_l(A) define the model constraints. The search space is limited and, as a consequence, the optimized parameters should lie within it.
The previous classical methodology is indeed computationally expensive because an optimization problem must be solved. Therefore, ML techniques directly applied to solve the inverse problem without the need for iterative procedures can be beneficial. Additionally, the ML methodology is easier to implement and faster at obtaining results in comparison with the current state-of-the-art procedures, such as the VFM and FEMU methods. Considering that the parameters to be identified will be used in numerical FEA, the training phase can directly and accurately use a numerical FEA database. Even if the creation of this database is computationally time-consuming, it is less costly than an experimental database and it is done only once for each mechanical test. Then, the parameters are identified (i.e., predicted within the ML model) instantaneously and using one experimental test.
The example hereby presented is based on the work done in [63] and uses the biaxial cruciform test to generate data to train and identify the parameters of both Swift's hardening law (parameters σ₀, k and n) and the Hill48 anisotropy criterion (parameters r₀, r₄₅ and r₉₀). A Young's modulus E = 210 GPa and a Poisson's ratio ν = 0.3 are used to characterize the elastic response of the material. As in [63], synthetic images are used for comparison and validation purposes. The reference parameters for the plastic anisotropy coefficients r_α at 0°, 45° and 90° from the rolling direction are listed in Table 1. The full geometry of the cruciform test specimen is depicted in Figure 8a; however, only one-fourth is used. The solid geometry is discretized as shown in Figure 8b using a mesh composed of a total of 114 combined CPS4 and CPS3R elements. Symmetry boundary conditions are applied at the x = 0 and y = 0 boundaries. A displacement condition u_x = u_y = 2 mm is applied along the x and y directions. The ranges of the six constitutive parameters used to generate the database are shown in Table 2. Due to the large number of possible combinations, the Latin hypercube sampling method was used, resulting in a total of 6000 samples. The optimal ANN hyperparameters for the inverse model of the cruciform test were searched, and the ideal architecture is listed in Table 3.
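The database generation step can be sketched with SciPy's Latin hypercube sampler; the parameter bounds below are placeholders standing in for the actual ranges of Table 2, which are not reproduced here:

```python
import numpy as np
from scipy.stats import qmc

# hypothetical bounds for (sigma_0, k, n, r_0, r_45, r_90); the actual
# ranges are those listed in Table 2 of the text
lower = [100.0, 400.0, 0.05, 0.7, 0.7, 0.7]
upper = [300.0, 900.0, 0.40, 3.0, 3.0, 3.0]

sampler = qmc.LatinHypercube(d=6, seed=42)
unit_samples = sampler.random(n=6000)            # points in [0, 1)^6
samples = qmc.scale(unit_samples, lower, upper)  # one FEA run per row
```

Each of the 6000 rows then defines one parameter set for a cruciform-test FEA simulation, whose outputs populate the training database.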
The input layer has 7203 input features, since 21 time steps were considered for each test sample and, for each time step, the global force and the strain tensor in all elements are used: (3 strain components × 114 elements + 1 force value) × 21 time steps = 7203. Therefore, each sample has the structure {Force, (ε_xx, ε_yy, ε_xy)_i}, with i = 1, ..., 114 elements and j = 1, ..., 21 time steps.
The suggested learning rate α = 1.512 × 10⁻⁵ was also found using the Hyperband approach. In Figure 9, the learning curves are plotted, registering errors lower than 0.02 during training. The best result was achieved after 275 epochs, with an MSE of 0.00083 and an MAE of 0.0072. Figure 11 shows the influence of the training set size on the MSE. Considering that a plateau was not reached, it can be concluded that further information should be given to the model to avoid underfitting. The lack of convergence can also be due to unrepresentative data (lack of quality).

Prediction and Comparison
In order to evaluate the performance of the ANN inverse model, three reference sets of material parameters were chosen (see reference values in Table 4) to create synthetic experimental data and used as test cases. Test case 2 uses the same parameters as in [63]. The predictions and corresponding errors are also listed in Table 4. The smallest error occurred in test case 2 and the largest in test case 3, which is expected, since the reference parameters of test case 3 lie outside the range of the training space. Furthermore, the errors in test cases 1 and 2 are very low, which contrasts with the discussion in the training and validation phase. The force-displacement and strain-stress curves resulting from the obtained parameters can be seen in Figure 12a,b, where a near-perfect fit can be observed. The results obtained using the ANN approach were compared with the results of the other approaches in [63]. The comparison, presented in Table 5, shows a minimal difference between the errors of the different approaches, meaning that the ANN is effective and can predict the parameters with the same precision as other, more established approaches. However, it should be noted that, although the identification itself took less than a second, the ANN approach required around 96 h to generate the 6000-sample database on an Intel i7-8700 CPU. The VFM solution took less than 5 min for the required 15 optimization iterations [63]. Figure 13 shows the force-displacement and stress-strain curves obtained with the learned material parameters for test cases 1 and 3. Even though the errors are larger, the curves show an excellent fit. However, for test case 3, larger error amplitudes are observed. This can be explained by the fact that the reference data were outside the training space, meaning the network needed to extrapolate to predict a result.
To test the robustness of the ANN parameter identification approach and to take into account possible errors coming from experimental measurements, the training dataset was polluted with normally distributed random noise (with a standard deviation of 1). Table 6 lists the predictions of the ANN after training with the noise-polluted data. The predictions exhibit larger errors, demonstrating that this approach is influenced by the quality of the training data.
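Polluting the training inputs as described takes only a few lines of NumPy; `pollute` is a hypothetical helper name used for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def pollute(X, std=1.0):
    """Add zero-mean Gaussian noise of the given standard deviation
    to every feature of the training inputs."""
    return X + rng.normal(loc=0.0, scale=std, size=X.shape)
```

The polluted copy of the dataset is then used for training in place of the clean one, leaving the reference (noise-free) test cases untouched.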

Remarks
This example shows the capabilities of an ML methodology applied to the calibration of numerical material models. Here, an easy-to-implement FFNN was trained as an inverse model with the aid of a large dataset generated from FEA numerical simulations. The ANN-based model achieved its objectives, and the results show that the network can predict material parameters, albeit with some errors. It was also seen that the reliability of the ANN inverse model is highly dependent on the quantity and quality of the training data.

ML Constitutive Model Using Empirical Known Concepts
The selection of the features of the ML model can be done using the knowledge gained over the last decades. Therefore, concepts such as isotropic and kinematic hardening can be used as inputs/outputs of the ML model. These concepts and their values can be retrieved from real material mechanical curves obtained with classical homogeneous tests using the results' decomposition, as done in [18], and used for training purposes. Additionally, the differential formulation of the majority of the known material models can also be maintained in the development of an ML plasticity model. In classical plasticity or viscoplasticity models with mixed isotropic-kinematic hardening, the inputs are the stress tensor σ, the plastic (or viscoplastic) strain tensor ε_vp, the backstress tensor χ, the isotropic hardening R and the equivalent plastic strain p, which is a scalar value representing the overall plasticity level. This results in 11 inputs, which can be used in a FFNN architecture with one single hidden layer. The eight outputs are the differential (rate) values of the inputs, i.e., the set {ε̇_vp, χ̇, Ṙ, ṗ}. The stress tensor can be updated by a hypoelastic relation given as

σ̇ = C (ε̇ − ε̇_vp),

where C is the material stiffness matrix and ε is the total strain tensor. Generally, the formulation in Equation (15) is used in its incremental form due to the non-linearity of the plastic behaviour. In this section, the material constitutive behaviour of a metal is modelled using ML/ANN techniques and empirical concepts additional to the ones previously introduced. Then, this model is used in a finite element analysis. In this example, the material ML model has an artificial neural network (ANN) architecture, and it is trained (fitted) using a virtual simulated material (as synthetic data). This virtual material has an a priori known behaviour, following the classical Chaboche elasto-viscoplastic model.
The advantage of knowing beforehand the behaviour is that this allows the ML capabilities and competences in material constitutive modelling to be evaluated at least to replace classical models [22].
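The incremental form of the hypoelastic update in Equation (15) can be sketched as follows, assuming plane-stress conditions and Voigt notation for the (xx, yy, xy) components (the helper names are illustrative):

```python
import numpy as np

def plane_stress_stiffness(E, nu):
    """Plane-stress elastic stiffness for (xx, yy, xy) Voigt components."""
    f = E / (1.0 - nu**2)
    return f * np.array([[1.0, nu, 0.0],
                         [nu, 1.0, 0.0],
                         [0.0, 0.0, (1.0 - nu) / 2.0]])

def stress_update(sigma, d_eps_total, d_eps_vp, C):
    """Incremental hypoelastic relation:
    sigma_{n+1} = sigma_n + C (d_eps - d_eps_vp),
    where d_eps_vp would come from the ANN's predicted rates."""
    return sigma + C @ (d_eps_total - d_eps_vp)
```

In the ML model, the viscoplastic strain increment d_eps_vp is obtained by integrating the rate output of the network over the time increment, and the stress is updated element by element.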

Creating the Dataset for Training
To train the ML/ANN model, a synthetic database representing the behaviour of a virtual material was artificially created. Although a large set of analytical elastoplastic constitutive models could be chosen, in this work, the elasto-viscoplastic Chaboche model was selected due to the challenge of adding rate-dependent phenomena in the behaviour of the material. Plane stress conditions were applied. The viscoplastic strain rate tensor and the plastic strain rate can be seen in Equation (A11). Additionally, the kinematic and isotropic hardening evolution are, respectively, defined by Equations (A13) and (A14). The stress-strain states of the differential equations governing the behaviour depend also on the elasto-plastic material properties (e.g., E = 5000 MPa, ν = 0.3, R 1 = 500 MPa, k = 0, K = 50 MPa s −1 , a = 7500 MPa, b = 0.6, c = 100, n = 3).
The system of differential equations for two tensile tests (T_xx, T_yy) and one simple shear test (S_xy), for three cyclic strain ranges of 0.05, 0.12 and 0.16, is solved using the Runge-Kutta method [64], and the results are added to the virtual material database. The 9 virtual experimental results are created and represented in Figure 14: (a) the scaled input data (σ, χ, ε_vp, R and p) are transformed by the virtual material model into (b) the scaled differential output data (ε̇_vp, χ̇, Ṙ and ṗ). Each test is simulated with 1000 data points, as done in [22].
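As an illustration of how such a database entry is produced, the following sketch integrates a uniaxial reduction of a Chaboche-type elasto-viscoplastic model under a constant imposed strain rate with an explicit Runge-Kutta scheme (SciPy's RK45). The material constants are those listed above; the uniaxial simplification and the imposed strain rate are assumptions made for brevity, not the full plane-stress model of the text:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Chaboche-type constants from the text
E, R1, k, K, a, b, c, n = 5000.0, 500.0, 0.0, 50.0, 7500.0, 0.6, 100.0, 3.0
eps_rate = 0.01  # imposed total strain rate (assumed for this sketch)

def rhs(t, y):
    """State y = [sigma, chi, R, p] for a monotonic uniaxial test."""
    sigma, chi, R, p = y
    over = abs(sigma - chi) - R - k
    p_dot = (max(over, 0.0) / K) ** n            # Norton viscoplastic flow
    evp_dot = p_dot * np.sign(sigma - chi)       # flow direction
    sigma_dot = E * (eps_rate - evp_dot)         # hypoelastic stress rate
    chi_dot = a * evp_dot - c * chi * p_dot      # kinematic hardening
    R_dot = b * (R1 - R) * p_dot                 # isotropic hardening
    return [sigma_dot, chi_dot, R_dot, p_dot]

sol = solve_ivp(rhs, (0.0, 5.0), [0.0, 0.0, 0.0, 0.0],
                method="RK45", max_step=0.01)
```

Sampling the solution at 1000 time points and pairing the state (inputs) with the corresponding rates (outputs) yields one virtual experiment of the training database.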

Training the ANN Model
As supervised learning, the training finds the regression parameters that minimize the cost function. In this work, the best topology, activation functions and optimization algorithm are found using a grid search function [65] and a 5-fold cross-validation scheme. For that purpose, (i) four topologies (four, ten, sixteen and twenty-two hidden neurons) for one hidden layer, (ii) three activation functions (logistic, tanh, ReLU) and (iii) three optimization algorithms (LBFGS, SGD, ADAM) were candidates. An automatic batch size of min(200, number of training samples) and a learning rate of 0.001 were used. For the output layer, a linear activation function is used. Figure 15 presents the grid search results, where (i) the LBFGS optimization algorithm is responsible for the best result for all activation functions and hidden layer sizes, and (ii) ReLU together with LBFGS seems to be the best combination for the training set.
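The grid search described above can be reproduced with scikit-learn's GridSearchCV and MLPRegressor; since the scaled 11-input/8-output dataset is not reproduced here, random placeholder data stand in for it:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# placeholder data standing in for the scaled (11-input, 8-output) samples
rng = np.random.default_rng(0)
X, y = rng.random((100, 11)), rng.random((100, 8))

# the 4 x 3 x 3 candidate grid described in the text
param_grid = {
    "hidden_layer_sizes": [(4,), (10,), (16,), (22,)],
    "activation": ["logistic", "tanh", "relu"],
    "solver": ["lbfgs", "sgd", "adam"],
}
search = GridSearchCV(
    MLPRegressor(learning_rate_init=0.001, max_iter=100, random_state=0),
    param_grid, cv=5, scoring="r2")
search.fit(X, y)
```

`search.best_params_` then reports the winning combination; on the real dataset, the text reports LBFGS with ReLU as the best pairing.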
The validation curve [66] for the ANN size can be seen in Figure 16a, where the optimal value (R² = 0.965 for training and R² = 0.964 for cross-validation) is obtained for 16 neurons. The influence of the training database size was also analysed [66] using the 16-hidden-node configuration. However, Figure 16b shows that increasing the training database does not improve the model, because both the training and cross-validation R² seem to converge after 4000 training samples. The regularization term λ was also tuned using the same procedure [66]. It can be seen in Figure 16c that a regularization term lower than 10⁻² should be used for both training and cross-validation.

Testing the Trained Model in an FEA Simulation
The ANN model was implemented in the Abaqus standard (implicit) FEA code using a UMAT user routine. Although UMATs are written in Fortran, here the routine calls a Python script. The advantage of using a differential ANN model is that the FEA time integration is straightforward. Nevertheless, small time increments had to be used because a consistent tangent stiffness matrix was not employed.
To test the ANN model in a simulation environment, a heterogeneous mechanical test, the adapted butterfly specimen [67] illustrated in Figure 17a, is used. Although not represented in Figure 17b, a tool displacement of 9.6 mm is imposed on the top of the specimen. Symmetry conditions are considered along both the x and y axes, allowing only 1/4 of the specimen to be represented. Then, 2D 4-node bilinear elements (CPS4) are used with an element size of 0.75 mm, as can be seen in Figure 17c. A fixed increment size of 0.2 s was used for a total time of 5 s. This specimen has the advantage of producing heterogeneous stress fields, as seen in Figure 17d. Three points were selected to assess the accuracy of the ANN model against the results obtained using the virtual material modelled by the Chaboche elasto-viscoplastic model. It must be noted that the ANN model was trained with the virtual material using only two tensile tests and one simple shear test. Table 7 compares the values of the stress and backstress tensors and the isotropic hardening/drag stress for the three selected points of Figure 17d at the end of the test. Here, the experimental values correspond to the virtual material experiments. In general, the ANN model reproduces the virtual material fairly well. However, although lower errors can be seen for σ_xx and σ_yy (16% and 0.75%) and their corresponding backstresses for the first two points, higher errors were obtained for the shear stress component. These results can be explained by the fact that, for the first point, ε = {−0.0914, 0.2008, 0.0442}, and the ANN model was never trained with such a complex strain field, only with a maximum strain of 0.16. For points 2 and 3, the performance of the ANN model is better because the strain values are lower. It must be highlighted that the largest error for point 2 is 8%.
Table 8 compares the equivalent plastic strain and the viscoplastic strain tensor at the end of the test for the three selected points. Again, the performance of the ANN model is fairly good; however, the shear strain presents the worst results (errors larger than 20%). The evolution of σ during the test for the selected points when using the ANN model can be seen in Figure 18. The experimental evolution of the virtual material is also added for comparison purposes. Although both the σ_xx and σ_yy evolutions reproduce the behaviour of the virtual material well, the ANN τ_xy fails to follow the experimental curve. This fact can be attributed to the absence of complex shear states in the training data.
The other internal state variables were also assessed and compared. Figure 19 presents the time evolution of the viscoplastic strain tensor, equivalent plastic strain, backstress tensor and the isotropic hardening/drag stress. In a general overview, it can be stated that the ANN model curves can fit the virtual experiments. However, larger computational times (>20×) were observed in comparison with the reference model.

Final Remarks
In this section, it can be seen that an ML/ANN model can be used to model the behaviour of an elasto-viscoplastic material. Here, an ANN model was effectively trained using virtually created (synthetic) data of cyclic tensile compression and shear tests. This was successfully reproduced for 2D (plane-stress) conditions, thus achieving the main objective.
Subsequently, the ANN model was applied in an FEA simulation of complex stress-strain states using a heterogeneous mechanical test denoted the butterfly test. Even though stress-strain states of this complexity were not used for training, the ANN model was quite able to simulate the stress-strain evolution. In conclusion, the proposed ANN model, even under strain combinations never used in training, is capable of reproducing the material behaviour defined by the virtual material.
Moreover, the ANN implementation in the FEA code is straightforward, and no integration problems were observed, even in complex geometries. This example shows the capability of ML/ANN to model the material behaviour using some concepts already used in analytical models. Although this example used synthetic data, real experimental data could be used under the same conditions.

Implicit Elastoplastic Modelling Using the VFM
The heterogeneous test developed by Martins et al. [2] was used to generate numerical experimental data to train the ANN model. The configuration consists of a solid 3 × 3 mm² plate with a thickness t = 0.1 mm. The domain is discretized by 9 four-node bilinear plane stress elements. The initial mesh, geometry and boundary conditions are depicted in Figure 20. Symmetry boundary conditions are applied to the boundaries at x = 0 and y = 0, and a surface traction is defined at x = 3 mm. The traction follows a non-uniform distribution, composed of a single component along the x-direction, which varies linearly in the y-direction according to f_x(y) = my + b, where m and b control the slope and the intercept of the distribution, respectively. The numerical simulations were conducted using the commercial finite element code Abaqus. The model was built with CPS4R elements (bilinear reduced-integration plane stress). The material was simulated employing a non-linear isotropic elasto-plastic model, with the isotropic hardening response obeying Swift's law, given by:

σ_y = K (ε₀ + ε̄_p)ⁿ,

where σ_y is the flow stress, K is the hardening coefficient, n is the hardening exponent, σ₀ is the yield stress and ε₀ is the deformation at the yielding point, computed as:

ε₀ = (σ₀ / K)^(1/n).

The elastic parameters were defined as E = 210 GPa and ν = 0.3, while σ₀ = 160 MPa, K = 565 MPa and n = 0.26 were adopted for Swift's hardening law.
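For reference, Swift's law with the parameters above can be evaluated with a short helper (the function name is illustrative); with ε₀ = (σ₀/K)^(1/n), the flow stress at zero plastic strain recovers the yield stress σ₀ by construction:

```python
import numpy as np

def swift_flow_stress(eps_p, sigma0=160.0, K=565.0, n=0.26):
    """Swift's law: sigma_y = K (eps_0 + eps_p)^n, with the yield-point
    deformation eps_0 = (sigma0 / K)^(1/n) so that sigma_y(0) = sigma0.
    Stresses in MPa; eps_p is the equivalent plastic strain."""
    eps0 = (sigma0 / K) ** (1.0 / n)
    return K * (eps0 + eps_p) ** n
```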

Data Generation and ANN Training
To generate the training data, all simulations were performed using a small-displacement formulation, with the time period set to 1 and a fixed time increment ∆t = 0.005. For each time increment, the deformation components at the centroid were extracted for all the elements, and the global force was determined by computing the equilibrium of the internal forces, such that:

F = (t / L) Σ_e σ_xx^(e) A_e,

where F is the global force, L is the length of the solid, A_e is the element's area and t is the thickness. The training data were generated for different load distributions, keeping the slope fixed at m = 10 N/mm and varying the intercept parameter, such that b = {50, 170, 270} N.
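A small sketch of this force recovery, assuming (as a plausible reading of the relation described above) that the global force along x is the area-averaged normal stress multiplied by the loaded cross-section t·L:

```python
import numpy as np

def global_force(sigma_xx, areas, thickness, length):
    """Global reaction along x recovered from element centroid stresses:
    F = (t / L) * sum_e sigma_xx_e * A_e,
    i.e., the area-averaged normal stress times the cross-section t*L."""
    return thickness / length * np.sum(sigma_xx * areas)
```

For a uniform stress field, this reduces to the familiar stress times cross-sectional area, which serves as a sanity check.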
Prior to training, and for each mechanical trial, the dataset was organized into batches of 9 elements per time increment and shuffled before being split into training (80%) and test (20%) data. The input features were standardized to follow a normal distribution with zero mean and unit variance. Two models were trained, aimed at predicting the linear elastic and elasto-plastic responses of the material. Once trained, the models were validated with different mechanical trials using the following load distributions: m = 12, b = 100 for the elastic model and m = 12, b = 200 for the elasto-plastic model. The neural network model consisted of a simple FFNN with one hidden layer, with 4 hidden neurons for the elastic case and 8 hidden neurons for the elasto-plastic case. In both cases, the parametric rectified linear unit (PReLU) activation function [69] was used. The inputs given to the models were the deformation components in the current and previous time increments, ε_t and ε_{t−1}, respectively, and the outputs were the stress tensor components at the current time increment, σ_t. The ADAM algorithm was used to optimize the network weights, with the initial learning rate set to 0.1, scheduled to be reduced by a multiplier of 0.2 if no improvement in the training loss was registered after 5 epochs. For the elastic model, the network was trained for 30 epochs; for the elasto-plastic model, training was set to run for a maximum of 150 epochs.

Results and Discussion
The learning curves for both models are depicted in Figure 21. The plots show the loss decreasing during the training epochs, with convergence being achieved earlier for the elastic model. In this case, four virtual fields were used during training, while for the elasto-plastic response model, a total of six virtual fields were used. The complete set of virtual fields is presented in Table 9 and illustrated in Appendix C (virtual fields 1 through 4 were used to train the elastic model, while the whole set was used to train the elasto-plastic model). The validation results (Figure 22) show that, in general, the ANN was able to learn both elastic and elasto-plastic behaviors. In the former case, the ANN provided near-excellent stress predictions, although a higher error was registered for the stress component in the y-direction for element 9 (Figure 22c). In the latter case, the ANN provided overall excellent predictions for the stress response along the x-direction. Although the use of additional virtual fields slightly improves the shear stress response for the plastic model (Figure 22b,d), the ANN seems to have a noticeably lower sensitivity to the stress responses along y and xy. The issue may be due to the fact that either the number of chosen virtual fields is not enough to capture the material behavior or the chosen set of virtual fields gives more weight to the stresses along x, to the detriment of the remaining components. Another factor at play is that the virtual fields were chosen manually. This strategy is often used for non-linear models and is the easiest to implement; nonetheless, it does not guarantee that the chosen virtual fields produce the best results, and it is tied to the user's expertise [2]. Additionally, the mechanical test considered in this example may be too simple, thus not providing enough dissimilar behaviour data.
Although there is a wide margin for further improvement, this application example has effectively shown that the new methodology works.

Conclusions
This paper presents and discusses the recent advances and applications of ML, particularly ANNs, in metal forming processes, from which the following points can be highlighted:
• ANNs provide an opportunity for a complete change of paradigm in the field of constitutive modelling. Their powerful capabilities allow material models to be devised without the need to formulate complicated symbolic expressions that, conventionally, require extensive parameter calibration procedures;
• Nevertheless, even in calibration/parameter identification procedures, the power and potential of ANNs can still be used. This paper showed that an ANN can be trained as an inverse model for parameter identification;
• ANNs are paving the way for the development of implicit material models capable of reproducing complex material behavior purely from data. Additionally, the network architectures themselves can be designed based on the structure of existing, well-known constitutive models;
• A great number of methodologies employ ANNs as "black boxes" that are not guaranteed to respect the basic laws of physics governing the dynamics of the systems, thus requiring large amounts of training data to do so. This is an important issue, as the cost of data acquisition is often prohibitive and time-consuming;
• Considerable efforts have been made to improve the generalization capacity of ANN-based implicit constitutive models, either through the development of new architectures, more robust training procedures or the augmentation of the datasets. The use of NANNs, for example, is an interesting approach to capture the full material response, but it does not solve the underlying dependency on large datasets. On the other hand, PINNs, particularly TANNs, have the potential to leverage the knowledge obtained from training data through the direct encoding of basic thermodynamic laws in the neural network, resulting in models able to perform very well in low-data regimes. The SPD-NN goes a step further by imposing time consistency, strain energy function convexity and Hill's criterion, thus solving numerical instabilities derived from coupling standard ANN-based models with FEM solvers;
• Another powerful attribute of implicit constitutive models is the ease of implementation within FE subroutines. The architecture of an ANN can be stored in the code via matrices of the appropriate dimensions, containing the optimized weights obtained during training. The ANN operations are reduced to a sequence of matrix operations that provide an almost instantaneous mapping between input and output values, providing substantial gains in computational efficiency;
• A major part of the ANN-based methodologies for implicit constitutive modelling relies on labeled data pairs (strain and stress) to train the network. However, the stress tensor is not measurable in complex experimental tests, thus requiring the ANN-based model to be trained indirectly, using only measurable data, such as displacements and global force;
• Indirect training approaches for ANN-based constitutive modelling reported in the literature resort to coupling with FEA software so that the numerical solver imposes the necessary physical and boundary constraints, while also allowing full-field data to be taken advantage of;
• A new disruptive methodology to train an ANN-based implicit constitutive model was proposed using the VFM. The new approach allows an ANN to learn both linear elastic and elasto-plastic behaviors through the application of the PVW, using only measurable data, e.g., displacements and global force, to feed the training process. An interesting advantage of this new approach is that it does not require the FEM for any further calculations.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Different network topologies were formulated as variants of FFNNs and became popular in the last two decades [73]: convolutional neural networks (CNNs) and long short-term memory neural networks (LSTMs). Typically, in CNNs, a convolutional layer moves across the data, similar to a filter in computer vision algorithms, requiring only a few parameters, as the convolving layer allows for effective weight replication [74]. LSTMs, a variation of recurrent neural networks (RNNs), offer specialized memory neurons with the purpose of dealing with the vanishing gradient problem, which often leads to sub-optimal local minima, especially as the number of neuronal layers increases [73].

Appendix A.3. Mathematical Representation
Consider a case where each training instance of a given dataset has the form (x, y), where x ∈ ℝⁿ is an input vector containing the set of input features [x₁, . . . , xₙ] and y is the observed value, obtained as part of the training data. Given a FFNN of arbitrary architecture, as shown in Figure A2, the dimensionality of the feature vector x dictates the number of neurons in the input layer [33], with a_m^(k) denoting the activation of the mth neuron in the network's kth layer. The activation potential from layer (k − 1) to layer k is controlled by a matrix of parameters W^(k) ∈ ℝ^{m×n} and computed as the weighted sum of the output values {x₁, . . . , xₙ} from the incoming connections. A function g(·) is then applied to this weighted sum, leading to the activations of the kth layer being computed as:

a^(k) = g(W^(k) a^(k−1)).

A set of biases b ∈ ℝᵐ is added as the weight of a link, which in practice means incorporating a bias neuron that conventionally transmits a value of 1 to the output node [33]. In essence, in a FFNN, the n-dimensional input vector x is transformed into the outputs using the following recursive relations [33,75]:

a^(1) = x (input layer),
a^(p+1) = g(W^(p+1) a^(p) + b^(p+1)), ∀p ∈ {1, ..., k − 1} (hidden layers),
o = g(W^(k+1) a^(k) + b^(k+1)) (output layer). (A3)

In the above expressions, g(·) is an activation function, a mathematical entity that receives the output of the weighted sum as an input and converts it into the final output of a node [33,49]. There are several different types of activation functions, and their selection is an important part of the design process of a neural network. The most common activation functions [33,75] are shown in Figure A3. Worthy of note is the fact that all the activation functions shown are monotonic. Moreover, other than the identity function, most of them saturate at large absolute values of the argument.
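The recursive relations of Equation (A3) translate directly into a short NumPy forward pass (the function name is illustrative; tanh and the identity stand in for the hidden and output activations):

```python
import numpy as np

def forward(x, weights, biases, g=np.tanh, g_out=lambda z: z):
    """Forward pass of a FFNN following Equation (A3): a^(1) = x, then
    a^(p+1) = g(W^(p+1) a^(p) + b^(p+1)) through the hidden layers,
    and o = g_out(W^(k+1) a^(k) + b^(k+1)) at the output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = g(W @ a + b)                      # hidden-layer activations
    return g_out(weights[-1] @ a + biases[-1])  # output layer
```

The parameter lists `weights` and `biases` hold one (W, b) pair per layer transition, so the same function covers networks of any depth.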
The use of nonlinear activation functions greatly increases the modelling capabilities of a neural network [33,49]. Historically, both the sigmoid and tanh functions have been the popular choices to incorporate non-linearity in the neural network. However, in recent years, the preference has been directed towards a family of piecewise linear functions, such as the rectified linear unit (ReLU) and its variations. Due to their simplicity, these functions are easier to differentiate, thus providing a faster training process [33].

Appendix A.4. Learning Paradigms and Training Procedures
The training of an ANN is accomplished through a learning procedure during which sample data are fed to the network and a learning algorithm modifies a set of parameters so that the ANN provides the best possible prediction [9]. There are essentially two learning paradigms: unsupervised learning and supervised learning [49]. Unsupervised learning is concerned with finding patterns in unlabeled datasets containing many features. Supervised learning is the most widespread method used to train ANNs and uses labeled datasets, where each training sample is linked to a label [49,75]. The process is similar to a standard fitting procedure, accompanied by the selection of a learning algorithm that is used to fit the target values [9,49]. The main task relates to data generation and preparation, in order to ensure the dataset is both accurate and consistent. The main dataset is split into various subsets, mainly a training set and a test set. Additionally, a validation set, different from the latter ones, may be used to optimize the hyper-parameters [9]. The following step is concerned with translating the raw information into meaningful features that can be used as inputs for the ANN. Finally, the model is trained by optimizing its performance, that is, finding the set of parameters (W, b) that minimizes a given loss function [9]. For regression tasks, the mean squared error (MSE), possibly complemented by a regularization term, is commonly employed:

L(W, b) = (1/m) Σ_{i=1}^{m} (ŷ^{(i)} − y^{(i)})² + (λ/(2m)) Σ_{l=1}^{k} Σ_{i=1}^{s_l} Σ_{j=1}^{s_{l+1}} (W_{ji}^{(l)})²,

where m is the total number of training instances, ŷ^{(i)} and y^{(i)} are the predicted and observed values of the ith instance, λ is a regularization parameter, k is the number of layers, s_k is the number of neurons in the kth layer and s_{k+1} is the number of neurons in the (k + 1)th layer. There is a wide variety of optimization algorithms available to minimize L(W, b). A gradient-based algorithm (e.g., gradient descent or adaptive moment estimation (ADAM)) is normally used to minimize the loss by subsequently moving in the direction of the negative gradient [33].
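As a minimal sketch of the gradient-based training loop described above, the following fits a single linear layer by minimizing the (unregularized) MSE with plain gradient descent; the synthetic data, learning rate and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))              # m = 100 training instances
y = X @ np.array([2.0, -1.0]) + 0.5        # hypothetical noise-free target

W = np.zeros(2)                            # parameters (W, b) to identify
b = 0.0
lr = 0.1                                   # learning rate
for _ in range(500):
    pred = X @ W + b                       # forward pass
    err = pred - y
    loss = np.mean(err ** 2)               # MSE over the m instances
    grad_W = 2.0 / len(y) * X.T @ err      # gradient of the loss w.r.t. W
    grad_b = 2.0 * np.mean(err)            # gradient w.r.t. the bias
    W -= lr * grad_W                       # step along the negative gradient
    b -= lr * grad_b
```

After training, (W, b) recovers the generating coefficients (2, −1, 0.5); ADAM follows the same structure but rescales each step using running moment estimates of the gradient.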
where k, K and n are the initial yield stress and two material constants, respectively, R is the isotropic hardening variable, and σ′ and χ′ are the deviatoric parts of the stress and backstress tensors [27,76]. The J(σ′ − χ′) invariant is computed through the following formula:

J(σ′ − χ′) = sqrt((3/2) (σ′ − χ′) : (σ′ − χ′)).

The kinematic hardening rate χ̇ is defined by means of a differential equation, given as:

χ̇ = c ((2/3) a ε̇ᵖ − χ ṗ),

where ε̇ᵖ is the plastic strain rate tensor and ṗ is the accumulated plastic strain rate, while the isotropic hardening rate is calculated from:

Ṙ = b (R₁ − R) ṗ.

The variables a, b, c and R₁ are material parameters, which must be identified for each material.
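As an illustration of how such hardening rate equations are handled numerically, the isotropic law Ṙ = b(R₁ − R)ṗ can be integrated with an explicit Euler scheme and checked against its closed-form solution R(p) = R₁(1 − exp(−b p)) for R(0) = 0; the parameter values below are illustrative, not calibrated:

```python
import numpy as np

# Illustrative (not calibrated) material parameters
b, R1 = 5.0, 100.0        # isotropic hardening rate and saturation value

dp = 1e-4                 # accumulated plastic strain increment
p = np.arange(0.0, 1.0, dp)
R = np.zeros_like(p)
for i in range(1, len(p)):
    # explicit Euler step of  Rdot = b * (R1 - R) * pdot
    R[i] = R[i - 1] + b * (R1 - R[i - 1]) * dp

# closed-form solution for constant parameters and R(0) = 0
R_exact = R1 * (1.0 - np.exp(-b * p))
```

The hardening variable saturates towards R₁ as plastic strain accumulates, which is the behaviour the parameter identification procedures discussed in this work must capture.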