In this section, we review research that uses ML algorithms in scientific computing to achieve a better balance between computational cost and accuracy. We divide the section into two subcategories: simulations and surrogate modelling. The first concerns the application of ML to improve or speed up simulation techniques (e.g., DFT, MD, and CFD), while the second refers to the use of ML to create reduced-order models that completely circumvent the need for computationally expensive yet more physically accurate simulations.

#### 2.1. Simulations

An important goal in computational modelling and simulations is to find a good balance between computational resources and accuracy. For example, in the absence of computational limits, QM-based simulations could be used to resolve complex, turbulent flows accurately; yet in practice this is an inconceivable task. In this section, we present research that uses ML to improve the balance between accuracy and speed.

A good starting point is MD, a simulation technique that resolves the system down to the atomic scale. MD is based on Newtonian physics: each atom is represented as a single point, and its electronic structure, the dynamics of which are described by QM, is ignored. The forces acting on the particles are calculated as the gradient (i.e., spatial derivative) of the potentials: semi-empirical functions of the atomic coordinates that are usually derived through regressions from QM-based data. Once the forces are calculated, the system can evolve in time by integrating Newton’s second law of motion. The computational cost of MD scales with the number of atoms, and simulations are usually limited to systems comprising a few hundred thousand particles.
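
As a toy illustration of this integration step, the following sketch applies the velocity Verlet scheme to a single particle in a hypothetical 1D harmonic potential, a stand-in for the semi-empirical potentials used in real MD codes:

```python
import numpy as np

# Minimal sketch of MD time integration (velocity Verlet) for one particle
# in a 1D harmonic potential U(x) = 0.5*k*x^2 (an illustrative stand-in for
# a real interatomic potential). Units are arbitrary.
k, m, dt = 1.0, 1.0, 0.01          # spring constant, mass, timestep

def force(x):
    return -k * x                   # F = -dU/dx

x, v = 1.0, 0.0                     # initial position and velocity
f = force(x)
for _ in range(int(2 * np.pi / dt)):   # integrate roughly one period
    x += v * dt + 0.5 * (f / m) * dt**2
    f_new = force(x)
    v += 0.5 * (f + f_new) / m * dt
    f = f_new
# After one period the particle should return close to its initial state.
```

The same update rule, with forces obtained from the gradient of a fitted potential, is the core loop of an MD code.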

The semi-empirical potentials used in MD, however, are not always accurate. Depending on the data used for the extraction of these potentials, they might be tailored to specific situations, such as a specific range of thermodynamic states (e.g., specific temperature and pressure). For example, there are many potential functions available for the interaction of water molecules, each with its own strengths and weaknesses [21].

On the other hand, QM-based models resolve the system down to individual electrons, and as such can accurately calculate the inter- and intra-molecular forces and interactions. Such techniques, however, are even more computationally demanding than MD, as atoms consist of many electrons that are correlated with each other (i.e., quantum entanglement). Simulation methods such as DFT attempt to simplify such many-body systems while still providing a quantum mechanical resolution. Regardless of the relative efficiency of DFT, however, it is still exceptionally computationally expensive compared to larger-scale approaches.

To approximate the accuracy of QM-based methods, while retaining the computational advantage of MD, ML-based methods, trained on QM data, can be used to derive accurate force-fields that can then be used by MD simulations. ML algorithms, such as ANNs [22,23,24,25] or Gaussian processes (GP) [26,27], have been trained, using DFT simulations, to reconstruct more general and accurate Potential Energy Surfaces (PES). The gradient of these PES provides the interatomic forces used by MD simulations. Chmiela et al. [27] have used Kernel Ridge Regression (KRR) to calculate interatomic forces by taking as input the atomic coordinates. More recently, an ANN was trained to produce potentials that match the accuracy of the computationally demanding but accurate Coupled-Cluster approach, a numerical scheme that can provide exact solutions to the time-independent Schrödinger equation, exceeding the accuracy of DFT [28].
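
A minimal sketch of the KRR idea, using synthetic energies from a hypothetical 1D harmonic potential in place of real DFT data (the scikit-learn `KernelRidge` model and all parameter values here are illustrative):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Learn a 1D potential energy surface E(r) from synthetic "quantum"
# reference data, then obtain forces as the negative gradient of the fit.
rng = np.random.default_rng(0)
r_train = rng.uniform(0.5, 2.0, 200).reshape(-1, 1)
E_train = (r_train.ravel() - 1.0) ** 2          # stand-in for DFT energies

model = KernelRidge(kernel="rbf", gamma=10.0, alpha=1e-6)
model.fit(r_train, E_train)

r = np.linspace(0.6, 1.9, 100).reshape(-1, 1)
E_pred = model.predict(r)
# Force = -dE/dr, estimated here by finite differences on the fitted PES.
F_pred = -np.gradient(E_pred, r.ravel())
```

In a real force-field fit the inputs would be many-body descriptors of the atomic environment rather than a single coordinate, but the regression step is the same.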

Yet these potentials usually have a constant functional form. As such, they do not always generalize well, and cannot accurately capture complex transitions such as chemical reactions and phase change. Hybrid models, referred to as First-Principles Molecular Dynamics (FPMD), use Newton’s laws to integrate the system in time (i.e., as in normal, classical MD), but use quantum mechanical simulations such as DFT to calculate the force-fields at each timestep. Variations of this approach attempt to limit the quantum mechanical calculations only to when and where necessary [29]. ML algorithms, such as GP [30] or KRR [31], can reduce the computational cost of such approaches. The general idea is that when a new state is encountered, the QM simulation calculates the force-fields. The data, however, are also used to train an ML algorithm, such that, if similar states are encountered again, the fast ML component replaces the computationally expensive QM simulation; the expensive QM calculations are invoked only when a genuinely new state is encountered.
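
The on-the-fly idea can be sketched with a GP whose predictive uncertainty decides when to fall back to the expensive solver. The `qm_oracle` function and the uncertainty threshold below are illustrative stand-ins, not the scheme of any specific cited paper:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Call the expensive "QM" oracle only when the GP is uncertain about a
# state; otherwise use the cheap GP prediction. The oracle is a toy
# function standing in for, e.g., a DFT force evaluation.
def qm_oracle(x):
    return np.sin(3.0 * x)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3),
                              alpha=1e-8, optimizer=None)
X, y = [], []
threshold = 0.05                      # uncertainty above which we call QM
oracle_calls = 0

rng = np.random.default_rng(1)
for _ in range(200):                  # stream of states met by the MD loop
    x = rng.uniform(0.0, 1.0)
    if len(X) > 0:
        mu, std = gp.predict([[x]], return_std=True)
    else:
        std = [np.inf]
    if std[0] > threshold:            # unfamiliar state: query QM, retrain
        X.append([x]); y.append(qm_oracle(x))
        gp.fit(X, y)
        oracle_calls += 1
    # otherwise mu[0] replaces the QM call
```

As the simulation visits more of state space, the fraction of timesteps that require the oracle drops, which is the source of the speed-up.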

Such ML-based MD methods have successfully been used to simulate complex physical systems. ANNs have been used to simulate phase-change materials [32,33,34,35], make accurate predictions on the many phases of silicon [36] and describe the structural properties of amorphous silicon [37], as well as efficiently and accurately calculate infrared spectra of materials [38].

Clustering methods have also been used to evaluate the accuracy of classical MD force-fields for various tasks. Pérez et al. [39] used a combination of Principal Component Analysis (PCA) and k-means clustering to evaluate and categorize different force-fields used for modelling carbohydrates in MD. The PCA reduced the dimensionality of the system from individual positions and energetics into a two-dimensional orthonormal basis. This reduction resulted in a high-level map indicating similarities or differences between different force-fields. A hierarchical k-means clustering algorithm was then used on the PCA data to formally categorize the force-fields. Raval et al. [40] used k-means clustering on data produced by MD simulations in order to assess the effectiveness of various classical force-fields in simulating the formation of specific proteins, i.e., for homology modelling, and to propose alternatives that can mitigate the identified weaknesses.
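
The PCA-plus-clustering pipeline can be sketched as follows, with synthetic force-field "fingerprints" in place of the carbohydrate data of [39]:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Project high-dimensional force-field fingerprints (synthetic here) onto
# two principal components, then group similar force-fields by clustering
# in that reduced space.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, (20, 50))   # two synthetic families,
group_b = rng.normal(1.0, 0.1, (20, 50))   # 50 features each
X = np.vstack([group_a, group_b])

X2 = PCA(n_components=2).fit_transform(X)             # high-level 2D map
labels = KMeans(n_clusters=2, n_init=10,
                random_state=0).fit_predict(X2)       # formal grouping
```

The 2D scores (`X2`) give the visual "map" of force-field similarity, while `labels` provides the formal categorization.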

For macroscopic fluid dynamics, on the other hand, the most common simulation technique is CFD. This method solves the fluid dynamics equations: the continuity equation, derived from the conservation of mass; the momentum equations, derived from the conservation of momentum; and the energy equation, derived from the conservation of energy. These equations ignore the molecular nature of fluids, and instead consider material properties as continuum functions of space and time. For a computer to solve these equations, they must first be discretized. This is achieved by mapping the physical domain onto a grid, and solving the equations, using numerical methods, at discrete points (e.g., grid points, cell centers). As the density of grid points increases, so does the accuracy of the solution, at the expense of computational resources.
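
This cost-accuracy trade-off of grid refinement can be illustrated with a simple finite-difference derivative, whose error shrinks roughly fourfold when the grid spacing is halved (second-order accuracy):

```python
import numpy as np

# Approximate du/dx for u(x) = sin(x) with central differences on a coarse
# and a fine grid; the maximum error drops roughly by 4x when the grid
# spacing is halved, at twice the number of grid points.
def max_error(n_points):
    x = np.linspace(0.0, 2.0 * np.pi, n_points)
    u = np.sin(x)
    dx = x[1] - x[0]
    dudx = (u[2:] - u[:-2]) / (2.0 * dx)   # central difference, interior
    return np.max(np.abs(dudx - np.cos(x[1:-1])))

coarse_err = max_error(50)
fine_err = max_error(100)
```

The same convergence behavior, in three dimensions and for coupled nonlinear equations, is what makes well-resolved CFD so expensive.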

In between the nanoscale (commonly treated with QM-based methods and MD) and macroscopic systems (usually treated with CFD or a similar continuum method), there are physical problems that lie at an intermediate scale. For example, microfluidic systems, systems with micrometer characteristic dimensions, are usually composed of a massive number of molecules (i.e., in liquid water there are $\sim {10}^{10}$ molecules in a cubic micrometer). Classical MD is, practically speaking, not an option. On the other hand, such scales are often too small for continuum methods. Models such as CFD are often incapable of capturing the complex physics that emerge at solid-liquid interfaces, such as variations in the thermodynamic properties [41,42], or accounting for surface roughness [43,44,45].

Instead, hybrid models have been proposed, in which the computationally favorable continuum solver is predominantly used, while a molecular solver handles the under-resolved flow features. However, the MD simulations must run relatively frequently, sometimes at each macroscopic timestep, which significantly increases the computational requirements. Instead, ANNs have been used to provide molecular-level information to a CFD solver, thus enabling a computationally efficient resolution of such scales [46,47]. As with the QM-based simulations described above, the MD solver is only used when it encounters relatively new states, i.e., states that are outside a pre-defined confidence interval from previously encountered states. As the simulation progresses and more states are encountered, the ANN is primarily used. The ANN-based hybrid model captures the flow physics very well. Furthermore, while the thermal noise inherent in MD simulations prevented the continuum solver from converging satisfactorily, the NN-based hybrid model suppressed the thermal fluctuations, resulting in lower residuals and better convergence of the solution (Figure 3).

A long-standing, unresolved problem of fluid dynamics is turbulence, a flow regime characterized by unpredictable, time-dependent velocity and pressure fluctuations [48,49,50,51]. As mentioned in the introduction, accurate resolution of all the turbulent scales requires an extremely fine grid, resulting in often prohibitive run-times. Instead, reduced turbulence models are often used at the expense of accuracy. A simplified approach is the Reynolds-Averaged Navier–Stokes (RANS) equations, which describe the time-averaged flow physics and treat turbulent fluctuations as an additional stress term, the Reynolds stress, which encapsulates the velocity perturbations. This new term requires additional models to close the system of equations. While several such models are available, e.g., k-epsilon, k-$\omega $, and Spalart–Allmaras (SA), none of them is universal, meaning that different models are appropriate under different circumstances. Choosing the correct model, as well as appropriate parameters for the models (e.g., turbulence kinetic energy and dissipation rate), is crucial for an accurate representation of the average behavior of the flow. Furthermore, this choice (particularly the choice of parameters) is often empirical, requiring a trial-and-error process to validate the model against experimental data.

ML has been used for turbulence modelling for the last two decades. Examples include the use of ANNs to predict the instantaneous velocity vector field in turbulent wake flows [52], or for turbulent channel flows [53,54]. The use of data-driven methods for fluid dynamics has increased significantly over the last few years, providing new modelling tools for complex turbulent flows [55].

A significant amount of effort is currently being put into using ML algorithms to improve the accuracy of RANS models. GPs have been used to quantify and reduce uncertainties in RANS models [56]. Tracey et al. [57] used an ANN that takes flow properties as its input and reconstructs the SA closure equations. The authors studied how different components of ML algorithms, such as the choice of the cost function and feature scaling, affect performance. Overall, these ML-based models performed well for a range of different cases, such as a 2D flat plate and transonic flow around 3D wings. The conclusion was that ML can be a useful tool in turbulence modelling, particularly if the ANN is trained on high-fidelity data, such as data from DNS or Large Eddy Simulation (LES) simulations. Similarly, Zhu et al. [58] trained an ANN with one hidden layer on flow data using the SA turbulence model to predict the flow around an airfoil.

Deep Neural Networks (DNNs) incorporating up to 12 hidden layers have also been used to make accurate predictions of the Reynolds stress anisotropy tensor [59,60]. The ML model preserved Galilean invariance, meaning that the choice of inertial frame of reference does not affect the predictions of the DNN. The authors further discuss the effect of more layers and nodes, indicating that more is not necessarily better. They also stressed the importance of cross-validation in providing credence to the ML model. Bayesian inference has been used to optimize the coefficients of RANS models [61,62], as well as to derive functions that quantify and reduce the gap between the model predictions and high-fidelity data, such as DNS data [63,64,65]. Similarly, a Bayesian Neural Network (BNN) has been used to improve the predictions of RANS models and to specify the uncertainty associated with the prediction [66]. Random forests have also been trained on high-fidelity data to quantify the error of RANS models based on mean flow features [67,68].

Another popular approach to turbulence is Large Eddy Simulation (LES). LES resolves eddies larger than a particular length scale. Unresolved scales, i.e., those comparable to or smaller than the cell size, are predicted through subgrid models. Several papers have successfully used ANNs to replace these subgrid models [69,70]. Convolutional Neural Networks (CNNs) have also been used for the calculation of subgrid closure terms [71]. The model requires no a priori information, generalizes well to various types of flow, and produces results that compare well with current LES closure models. Similarly, Maulik et al. [72] used a DNN, taking vorticity and stream function stencils as input, to predict a dynamic, time- and space-dependent closure term for the subgrid models.

A recent study successfully used ML algorithms that take very coarse CFD results of two-dimensional turbulent data and reconstruct a very high-resolution flow field [73]. The coarse input data was obtained by applying average or max pooling to the DNS data. The authors downsampled the data to different resolutions: medium, low, and very low. They reconstructed the data back to the DNS resolution using either basic bicubic interpolation, CNNs, or a hybrid ML algorithm called Downsampled Skip-Connection/Multi-Scale (DSC/MS), which, in a way, is a specific CNN architecture. For various two-dimensional cases, the reconstructed output accurately matched the original DNS data. The kinetic energy spectra calculated by the DSC/MS matched the results of the DNS very well (Figure 4), at least within the resolution of the corresponding downsampled input, i.e., the data resulting from reconstructing the medium-resolution input matched the DNS data for a larger range of wave numbers than that obtained from the low-resolution input (c.f. grey and orange curves in Figure 4).

ML can also be used to predict closure terms in systems of equations in other multi-scale, multi-physics problems [74,75]. Ma et al. [76] have used ANNs to predict the behavior of multiphase, bubbly flows. The ANN consists of a single hidden layer with 10 neurons and uses data from DNS-based multiphase flows to provide closure terms for a two-fluid model.

Another complex problem when studying the flow behavior of immiscible fluids is to describe the shape of the interface between them [75]. Qi et al. [77] used an ANN that takes the volume fractions of cells at the interface and predicts the curvature and shape. The ANN was subsequently incorporated into a multiphase CFD solver that uses the volume-of-fluid technique for interface modelling.

A recent study provided a general overview of how ANNs can be used to provide closure models for partial differential equations [78]. The paper presents five different general ML frameworks, mostly consisting of various architectures and combinations of DNNs and CNNs, for the solution of thermo-fluid problems, including turbulent flows. The paper provides a useful roadmap for using ANNs to solve engineering problems, and outlines considerations that need to be made according to the nature of the physical problem. A frequent issue with ML algorithms is that they often perform poorly when applied to physics significantly different from those provided for training. Studies have attempted to provide metrics that indicate a priori how well an ML model will generalize [79,80]. Wu et al. [80], for example, compare the Mahalanobis distance and a Kernel Density Estimation-based distance between points in feature space to assess how well the training data will generalize to other cases.
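
A minimal sketch of the Mahalanobis-distance idea follows; the data and the two query points are illustrative, not taken from [80]:

```python
import numpy as np

# Flag test states that lie far from the training data in feature space,
# where an ML model is likely to extrapolate poorly. The Mahalanobis
# distance accounts for the covariance structure of the training features.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, (500, 3))          # training features

mean = train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train, rowvar=False))

def mahalanobis(x):
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

in_dist = mahalanobis(np.zeros(3))               # near the training mean
out_dist = mahalanobis(np.full(3, 6.0))          # far outside the data
```

A large distance for a query state signals that the model's prediction there should be treated with caution.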

ANNs have also been used to speed up numerical methods. Feed-forward ANNs [81] and CNNs [82] have been used to accelerate the projection method, a time-consuming numerical scheme used for the solution of the incompressible Euler equations.

Another application of NNs is multi-scale modelling of materials. DNNs with two hidden layers have been used to predict the stresses applied on concrete, given the strain calculated by mesoscale simulations as input [83,84]. This information was then used in larger-scale, structure-level simulations. Subsequent studies used similar data-based multi-scale procedures to numerically investigate bone adaptation processes, such as the accumulation of apparent fatigue damage in 3D trabecular bone architecture at a given bone site [85,86]. The predictions used ANNs with one hidden layer, trained using mesoscale simulations. The ANN takes stresses as inputs and outputs homogenized bone properties that the macroscale model takes as input. Sha and Edwards [87] give a brief overview of considerations that need to be taken into account to create accurate NN-based models for materials science.

Another severe bottleneck of computational physics and engineering is storage. High-fidelity simulations often require a considerable number of grid cells (or of particles, in particle-based methods). Compressing this data before storage has been studied for many years now [88,89,90]. Furthermore, there is an increased interest in high-order methods (spectral methods, MUSCL and WENO schemes), in which even individual cells are characterized by a large number of degrees of freedom. Finally, when simulating transient physics, data on these grid points must be stored at finely spaced increments of time. Reconstructing the time-dependence of the physics requires storing a sequence of datasets, each containing the high-order data of meshes with many cells. As such, common industrial problems, such as high-speed flows around complex geometries, often require storing petabytes of data.

Carlberg et al. [91] tackle this problem by proposing a compression of the spatial data, followed by a compression of the time-sequence. The study examines high-order methods, specifically the Discontinuous Galerkin (DG) method, which represents the solution at each cell as a linear combination of a potentially large number of basis functions. The spatial data is compressed in two stages: the first stage uses an autoencoder to reduce the degrees of freedom within each cell from 320 to 24. The second stage uses PCA to reduce the dimensionality of the encoded vectors across all the cells in the mesh, a reduction from $\sim {10}^{6}$ degrees of freedom to only 500. The compressed data is then stored for sparsely separated timesteps. Finally, regression is used on the compressed data, allowing the reconstruction of data at any timestep. The authors tried several regression algorithms, with Support Vector Machines (SVM) (with a radial basis kernel function) and the vectoral kernel orthogonal greedy algorithm performing best.
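
A simplified sketch of the spatial-compression stage, using PCA alone (standing in for the autoencoder-plus-PCA pipeline of [91]) on synthetic snapshots that lie on a low-dimensional subspace:

```python
import numpy as np
from sklearn.decomposition import PCA

# A sequence of solution snapshots built from a handful of modes is
# compressed from 1000 degrees of freedom to 4 numbers per snapshot,
# then reconstructed with negligible loss.
x = np.linspace(0.0, 1.0, 1000)                  # spatial DOFs
times = np.linspace(0.0, 1.0, 50)
snapshots = np.array([np.sin(2*np.pi*(x - t)) + 0.5*np.cos(4*np.pi*(x + t))
                      for t in times])           # travelling-wave-like data

pca = PCA(n_components=4)
codes = pca.fit_transform(snapshots)             # 1000 -> 4 per snapshot
recon = pca.inverse_transform(codes)
rel_err = np.linalg.norm(recon - snapshots) / np.linalg.norm(snapshots)
```

The regression-over-time step of [91] would then operate on the 4-dimensional `codes` rather than on the full snapshots.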

#### 2.2. Surrogate Modelling

Many engineering applications need to make real-time predictions, whether for safety-critical reasons, such as in nuclear power plants and aircraft, or simply practical reasons, such as testing many variants of some engineering design. While Section 2.1 discussed ways in which ML can speed up conventional simulation techniques, the associated timescales are still many orders of magnitude greater than the requirements of such applications. Here we discuss ways in which ML can be used to make direct physical predictions, rather than assist numerical simulations of a physical experiment.

In their most basic form, ML algorithms can be trained to make static predictions. Given a set of data, the ML component can calculate instantaneous information that is either inaccessible through experiment or too time-consuming to calculate through conventional simulation methods. In this sense, the ML algorithm can be viewed as a generic, black-box, purely data-driven component.

ML is often used to directly solve QM-based problems [92], such as the calculation of the ground state of a molecular system, a task that otherwise requires the computationally expensive solution of the Schrödinger equation. Rupp et al. [93] predicted the ground state using non-linear ridge regression that takes as its input nuclear charges and atomic positions. The authors investigated the effect of the size of the training set on the accuracy of the predictions and found that as it ranged from 500 to 7000 molecules, a significant drop in the mean absolute error was observed. The trained model can make predictions on new organic molecules at practically no computational cost, with an error of less than 10 kcal/mol, an accuracy that surpasses most semi-empirical quantum chemistry-based methods. Subsequently, Hansen et al. [94] assessed the performance and usage of different ML algorithms for the same task, i.e., the calculation of the ground-state energy. They considered linear ridge regression, non-linear ridge regression using a Laplacian or Gaussian kernel, Support Vector Machines (SVM), k-nearest neighbors, and fully connected, feed-forward, multilayered neural networks. They found that linear regression and k-nearest neighbor algorithms were inadequate for the task. Instead, kernel-based methods and the ANN yielded errors of less than 3 kcal/mol, a significant improvement on the results presented by Rupp et al. [93]. More recently, a paper proposed a novel method for calculating the ground state and evolution of highly entangled, many-body QM problems using ANNs [95]. The ANN uses a Restricted Boltzmann Machine (RBM) architecture. In general, RBMs include two layers: the first acts as the input to the network and is usually referred to as the visible layer, while the second is called the hidden layer. The network outputs the probability of occurrence of the input configuration in the visible layer, which in that paper represented the configuration of N spins. The RBM therefore acts as the wavefunction of the system.

Handley and Popelier [96] used a feed-forward ANN with one hidden layer, trained on QM-based data, to calculate the multipole moments of a water molecule, based on the relative coordinates of its neighbors. Hautier et al. [97] used Bayesian inference with a Dirichlet prior to identify new compounds and structures. The method generates a shortlist of possible compounds, the stability of which was then verified through DFT simulations. The model was trained on a database of experimental data on various crystal structures and, as a proof-of-concept, it was subsequently used to discover 209 new ternary oxides.

Another area that demonstrates the predictive capability of ML is applications involving multiphase flows, such as in the Oil and Gas industry, where information on the composition and behavior of multiphase fluids is necessary. Yet measurement might be impossible, and numerical simulations are costly. An example is acquiring information regarding the multiphase-flow regime inside a pipeline. The multiphase-flow regime refers to how different immiscible fluids co-exist within a mixture. Gas can form small bubbles within a liquid, or larger structures (sometimes referred to as slugs), or can even form two parallel streams, with a flat (flat-stratified) or wavy (wavy-stratified) interface. Different multiphase regimes can result in significantly different flow fields and pressure losses.

Several studies used ANNs to predict the phase configuration and volume fractions of multiphase flows using different types of data, such as from dual-energy gamma densitometers [98], differential pressure measurements [99,100], flow impedance [101], and bubble chord lengths taken at various points in a pipe [102,103]. ANNs have also been used to calculate the regime, phase velocity, and pressure drop in various geometries [104].

ML has been used for flow control. Just over two decades ago, Lee et al. [105] used NNs to create a drag-reduction protocol for aircraft. A blowing and suction mechanism at the wing was employed, actuated by the ML component, which takes as its input shear stresses in the spanwise direction on the wall. A more recent study used an ANN to reduce the drag around a cylindrical body by adjusting the mass flow rates of two jets, depending on velocity readings in the vicinity of the cylinder [106].

ML has also been used in material characterization. NNs have been used in solid mechanics for almost three decades now, in an attempt to understand the stress-strain relationship of materials [107,108]. A good example of where ML and NNs have been used is the investigation of concrete, a material whose properties depend significantly on the proportions of its various constituents. Yeh [109] used experimental data to train a NN with one hidden layer to accurately predict the compressive strength of high-performance concrete, taking as its input the amounts of the various ingredients used for its production. Other studies use NNs that accept a strain tensor as their input and predict the stress applied on the material [83,84,107,108,109,110]. Waszczyszyn and Ziemiański [110], for example, used several different NN architectures and evaluated their efficacy in several physical problems, such as calculating vibrational properties of buildings and damage detection in beams. ANNs have also been used in structural dynamics [111,112].

Another good example of material characterization is nanofluids: fluids that contain solid particles of nanometer-sized characteristic dimensions. Properties of nanofluids, such as thermal conductivity and viscosity, are complex functions of many different parameters, including temperature, particle size, and particle aggregation. Expensive experiments [113,114,115] and time-costly simulations [116,117,118,119,120] have attempted to delineate this complex behavior with little success.

Instead, over the last decade, NNs have been used for the prediction of nanofluid properties. These NNs take a subset of system parameters (e.g., particle volume fraction and the thermal conductivity of the base fluid and particles) as inputs and output one or more nanofluid properties. Papari et al. [121] have used a NN with one hidden layer to predict the thermal conductivity enhancement of the nanofluid, i.e., the ratio of the thermal conductivities of the nanofluid and base fluid. The two-dimensional input vector consists of the thermal conductivity of the base fluid and the particle volume fraction. The authors trained and tested the model on various nanofluids consisting of carbon nanotubes suspended in different liquids. Other studies used similar models, extending, however, the feature vector to include the temperature [122,123,124] and average particle cluster size (to account for aggregation) [123]. Ariana et al. [125] used an ANN with one hidden layer that takes the temperature, volume fraction, and particle diameter as input and predicts the thermal conductivity. The authors studied the effect of the number of neurons on the calculation of the thermal conductivity of nanofluids. They concluded that 14 neurons provided optimal results. DNNs with two hidden layers have also been considered for the calculation of the thermal conductivity [126,127,128,129,130,131] and viscosity [131]. An ANN with a single hidden layer and 12 neurons has also been used to calculate the thermal conductivity of hybrid nanofluids, with different types of particles [132].
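
A minimal sketch of such a single-hidden-layer network (14 neurons, echoing [125]); the training data here is synthetic, generated from the classical Maxwell effective-medium relation rather than from experiments:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Train a one-hidden-layer NN to map particle volume fraction to the
# conductivity enhancement k_nf/k_f, using the Maxwell model as a
# stand-in for experimental data.
def maxwell_enhancement(phi, kp_over_kf=10.0):
    r = kp_over_kf
    return (r + 2.0 + 2.0 * phi * (r - 1.0)) / (r + 2.0 - phi * (r - 1.0))

rng = np.random.default_rng(0)
phi_pct = rng.uniform(0.0, 5.0, 300)             # volume fraction, percent
X_train = phi_pct.reshape(-1, 1)
y_train = maxwell_enhancement(phi_pct / 100.0)

nn = MLPRegressor(hidden_layer_sizes=(14,), solver="lbfgs",
                  max_iter=5000, random_state=0)
nn.fit(X_train, y_train)

phi_test = np.linspace(0.5, 4.5, 20)
max_err = np.max(np.abs(nn.predict(phi_test.reshape(-1, 1))
                        - maxwell_enhancement(phi_test / 100.0)))
```

The studies above extend the input vector with temperature, particle size, and base-fluid conductivity, but the regression setup is the same.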

Fault detection and system health management is another area where ML has been used [133]. ANNs can be employed to identify damage to the aerodynamic surfaces [134] and sensors [135,136,137] of the aircraft flight control system. More advanced RNNs with Long Short-Term Memory (LSTM) have been used for real-time anomaly detection on aircraft [138].

ANNs have also been used to optimize the design of structures, such as that of a yacht [139], a task which usually requires several time-consuming CFD simulations. Esau [140] used NNs to calculate fluid-structure interactions, even when the surface morphology is not well resolved.

While the static predictions described thus far are often very useful, other applications require predicting how a system will evolve at a later time. Designing ML algorithms for such dynamical behavior is tantamount to performing unsteady simulations and is a highly active topic of research.

ANNs have been used to study transient physics. The most basic form involves providing the ANN with the values of a variable at consecutive times. In turn, it attempts to output the value of the variable at some future point. Equivalently, the ANN can output the time-derivatives of the data, which can be integrated in time to yield the future state. The data in the time-series is quite often compressed using some dimensionality-reduction algorithm. This approach has been used successfully to predict the behavior of various physical systems, such as damped harmonic oscillators [141] and the 2D Navier–Stokes equations [142,143].
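
The basic setup can be sketched as follows: a model maps two consecutive samples of a damped harmonic oscillator to the next one. A linear model suffices here because the sampled dynamics satisfy a linear recurrence; the cited works use ANNs for the same role:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Learn the map (x_{t-1}, x_t) -> x_{t+1} from the first half of a damped
# oscillation, then predict the second half one step at a time.
t = np.linspace(0.0, 20.0, 2000)
x = np.exp(-0.1 * t) * np.cos(t)                 # damped oscillation

X = np.column_stack([x[:-2], x[1:-1]])           # consecutive value pairs
y = x[2:]                                        # value one step ahead

model = Ridge(alpha=1e-8).fit(X[:1000], y[:1000])
pred = model.predict(X[1000:])
max_err = np.max(np.abs(pred - y[1000:]))
```

For nonlinear systems the linear map is replaced by an ANN, but the input/output structure of the learning problem is identical.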

The above approach can also model chaotic, unstable systems. Nakajima et al. [144] used simple ANNs to study fluidized beds, a complex mixture of fluid and solid particles, and have even managed to identify bifurcating patterns [145], indicative of instabilities. The predictive capability of these methods, however, is usually limited to only a short time interval past that of the training set.

For predictions over a longer period of time, ANNs can redirect their output, append it to the time-series corresponding to their input, and use it for the calculation of data at successively later times. However, chaotic systems are sensitive to initial and boundary conditions. Therefore, an accurate long-term prediction requires careful selection and pre-processing of the features [146]; otherwise, the error for highly chaotic systems grows with time [147,148]. More recently, regression forests [149], as well as LSTMs [150], have been used for transient flow physics.

A recent breakthrough in dynamic prediction is Sparse Identification of Non-linear Dynamics (SINDy) [151]. SINDy reconstructs governing equations based on data measured across different times. In general, the user empirically selects several possible non-linear functions of the input features. Using sparse regression, SINDy then selects the functions that are relevant to the input data, thus recreating an attractor corresponding to the physical problem. The reconstructed equations can then be used without the need for ML, and can theoretically make predictions outside the scope represented by the training data.
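
A minimal SINDy-style sketch, recovering logistic growth $\mathrm{d}x/\mathrm{d}t = x - x^2$ from data via sequentially thresholded least squares; the candidate library and threshold are illustrative choices:

```python
import numpy as np

# Recover dx/dt = x - x^2 from a trajectory, using a hand-picked library
# of candidate terms [1, x, x^2, x^3] and sparse regression.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0, 1000)
x = 0.1 * np.exp(t) / (1.0 + 0.1 * (np.exp(t) - 1.0))   # logistic data
dxdt = x * (1.0 - x) + rng.normal(0.0, 1e-3, x.size)    # noisy derivatives

library = np.column_stack([np.ones_like(x), x, x**2, x**3])
xi = np.linalg.lstsq(library, dxdt, rcond=None)[0]
for _ in range(10):                    # sequentially thresholded least squares
    small = np.abs(xi) < 0.05
    xi[small] = 0.0
    active = ~small
    xi[active] = np.linalg.lstsq(library[:, active], dxdt, rcond=None)[0]
# xi should end up close to [0, 1, -1, 0]: the governing equation.
```

The nonzero entries of `xi` name the active terms, so the output is an interpretable equation rather than a black-box predictor.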

Lui and Wolf [152] used a methodology similar to SINDy, the main difference being that DNNs were used to identify the non-linear functions describing the problem, rather than manually and empirically selecting them. The authors tested their method successfully on non-linear oscillators, as well as on compressible flows around a cylinder.

SINDy might appear to be the optimal solution to dynamic prediction, as it extracts a relatively global function that is not necessarily restricted to the confines of the training data. Pan and Duraisamy [153], however, identified that while SINDy is near perfect for systems described by polynomial functions, its performance drops significantly when considering non-polynomial systems. The authors showcased this for the case of a non-rational, non-polynomial oscillator. They attributed this behavior to the complexity of the dynamic behavior of the system, which cannot be decomposed into a sparse number of functions. The paper concluded that for such general cases, the fundamental ANN-based way of dynamic modelling is more appropriate.

Efforts are also directed at incorporating our theoretical knowledge of physics into the ML algorithms, rather than using them as trained, black-box tools. Bayesian inference has long been linked to numerical analysis (i.e., Bayesian numerical analysis), and has recently been used for numerical homogenization [154], i.e., finding solutions to partial differential equations. Subsequent studies have used GP regression that, given the solution of a physical problem, can discover the underlying linear [155,156] and non-linear [157] differential equations, such as the Navier–Stokes and Schrödinger equations. Alternatively, similar GP-based methods can infer solutions, given the governing differential equations as well as (potentially noisy) initial conditions [158]. A particularly attractive feature of GPs is that they enable accurate predictions with a small amount of data.

While GPs generally allow the injection of knowledge through the covariance operators, creating physics-informed ANNs is a more challenging task, considering the inherent black-box nature of traditional ANNs. ANNs are, however, amenable to automatic differentiation [159], enabling the calculation of derivatives of a broad range of functions.
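
As a self-contained illustration of forward-mode automatic differentiation, the dual-number sketch below propagates a derivative alongside each value, yielding derivatives that are exact to machine precision rather than finite-difference approximations:

```python
# Minimal forward-mode automatic differentiation via dual numbers: each
# value carries its derivative, and arithmetic operations update both
# according to the usual differentiation rules.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot     # value and its derivative
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):             # product rule
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1          # f'(x) = 6x + 2

out = f(Dual(2.0, 1.0))                   # seed dx/dx = 1
# out.val == f(2) == 17.0, out.dot == f'(2) == 14.0
```

Deep-learning frameworks implement the reverse-mode counterpart of this idea, which is what allows physics-informed networks to evaluate the differential operators appearing in a PDE residual.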

Automatic differentiation has been used over the last few years to solve supervised learning problems using ANNs that respect the laws of physics, given as partial differential equations [160,161,162]. An excellent example of the potential of such an approach is given by Raissi et al. [163]. The paper solves the inverse problem of calculating the lift of a bluff body, given sparse data of the flow field, a difficult task using conventional numerical schemes. The algorithm is based on two coupled DNNs (Figure 5): the first is a traditional black-box ANN that takes flow input parameters, such as time and coordinates, and outputs details of the flow, such as the velocity components, pressure, and the displacement of the solid body. Sigmoid activation functions were used for the neurons of this ANN. The output of the first network is then fed into the physics-informed network that generates the equations, where derivatives are used as activation functions. The cost function of the entire NN is the sum of the cost functions of the two NNs.

A recent approach to solving differential equations is the Deep Galerkin Method (DGM) [164]. The conventional Galerkin method reconstructs solutions to partial differential equations as a linear superposition of various basis functions. DGM instead forms more complicated combinations of these basis functions using a DNN.

CNNs have also been used for data-driven approaches that are constrained by the laws of physics [165]. This method carefully embeds the differential equations into the cost function. Thus, it can inherently calculate the error, without requiring provision of the pre-calculated solution for training. The accuracy of the model was comparable to fully connected DNNs, while significantly reducing the required time.