Article

First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations: Mathematical Framework and Illustrative Application to the Nordheim–Fuchs Reactor Safety Model

by
Dan Gabriel Cacuci
Department of Mechanical Engineering, University of South Carolina, Columbia, SC 29208, USA
J. Nucl. Eng. 2024, 5(3), 347-372; https://doi.org/10.3390/jne5030023
Submission received: 29 July 2024 / Revised: 3 September 2024 / Accepted: 9 September 2024 / Published: 13 September 2024
(This article belongs to the Special Issue Reliability Analysis and Risk Assessment of Nuclear Systems)

Abstract

This work introduces the mathematical framework of the novel “First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations” (1st-CASAM-NODE) which yields exact expressions for the first-order sensitivities of NODE decoder responses to the NODE parameters, including encoder initial conditions, while enabling the most efficient computation of these sensitivities. The application of the 1st-CASAM-NODE is illustrated by using the Nordheim–Fuchs reactor dynamics/safety phenomenological model, which is representative of physical systems that would be modeled by NODE while admitting exact analytical solutions for all quantities of interest (hidden states, decoder outputs, sensitivities with respect to all parameters and initial conditions, etc.). This work also lays the foundation for the ongoing work on conceiving the “Second-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations” (2nd-CASAM-NODE) which aims at yielding exact expressions for the second-order sensitivities of NODE decoder responses to the NODE parameters and initial conditions while enabling the most efficient computation of these sensitivities.

1. Introduction

Concepts of dynamical systems theory have frequently been used to improve neural network performance [1,2,3], but Neural Ordinary Differential Equations (NODE) appear to have been formally introduced by Chen et al. [4]. NODE provide an explicit connection between deep feed-forward neural networks and dynamical systems and are considered to provide a bridge between modern deep learning and classical mathematical/numerical modeling. NODE offer a flexible trade-off between efficiency, memory costs, and accuracy. The approximation capabilities [5,6] of NODE are particularly useful for time-series modeling [4,7,8], generative models for continuous normalizing flows [4,9], and modeling/controlling physical environments (see, e.g., [10]).
Neural ODEs are trained by minimizing a least-squares quadratic scalar-valued "loss function", whose gradients with respect to the weights to be optimized are computed and used by a first-order optimizer such as "stochastic gradient descent" [11,12]. Since ODE solvers (e.g., Runge–Kutta solvers) perform differentiable algebraic operations, the gradients of the loss function can be calculated by the so-called "direct method", which backpropagates directly through the operations performed by the ODE solver. However, when the dynamics are complex, the "direct method" can lead to an arbitrarily large number of function evaluations for adaptive solvers while storing all of the intermediate activations during the "solving" process, so the "direct method" becomes prohibitively memory intensive. A NODE-training method which is less memory intensive is the so-called "adjoint method" [13,14,15], which solves an ODE (related to the original NODE) backwards in time. The direct method is faster but more memory intensive than the adjoint method. The one-dimensional definite integrals which appear when computing gradients via the "adjoint method" are traditionally evaluated by solving them as differential equations, which considerably slows down the training process. Evaluating these one-dimensional definite integrals by using Gauss–Legendre quadrature (rather than solving them as ODEs) has been shown [16] to be faster than ODE-based methods while retaining memory efficiency, thus speeding up the training of NODE.
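The quadrature idea mentioned above can be sketched in a few lines. This is a minimal illustration (not code from the article or from [16]): a definite integral of the kind that arises in adjoint-based gradients is evaluated on Gauss–Legendre nodes instead of being integrated as an auxiliary ODE; the integrand here is a simple stand-in for the actual adjoint integrand.

```python
import numpy as np

def gauss_legendre_integral(g, t0, tf, n_nodes=16):
    """Approximate the integral of g over [t0, tf] by Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    # Map the standard nodes from [-1, 1] onto [t0, tf].
    t = 0.5 * (tf - t0) * nodes + 0.5 * (tf + t0)
    return 0.5 * (tf - t0) * np.sum(weights * g(t))

# Stand-in integrand: integral of exp(t) over [0, 1] has the exact value e - 1.
approx = gauss_legendre_integral(np.exp, 0.0, 1.0)
exact = np.e - 1.0
```

For smooth integrands, a small fixed number of nodes already reaches machine precision, which is why quadrature can be much cheaper than carrying an extra ODE component through an adaptive solver.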
The gradients of the loss function are often called "sensitivities" in the literature on neural nets, and aspects of the optimization/training procedure are occasionally called "sensitivity analysis". But the "loss function" is of interest only during the "training" phase of the NODE, and the "sensitivities of the loss function" are driven towards the ideal zero-values by the minimization process while optimizing the NODE weights/parameters. Furthermore, after the NODE has been optimized to reproduce the underlying physical system as closely as possible, the responses of interest for the NODE-modeled system are no longer a "loss function" but are various functions of the NODE's "decoder" output. Since the physical system being modeled by the NODE itself comprises parameters that stem from measurements or computations, these parameters are not perfectly well known but are afflicted by uncertainties that stem from the respective experiments and/or computations. Hence, it is important to quantify the uncertainties induced in the NODE decoder output by the uncertainties that afflict the parameters/weights underlying the physical system modeled by the NODE. The quantification of the uncertainties in the NODE decoder and derived results of interest (i.e., "NODE responses") requires the computation of the sensitivities of the NODE decoder with respect to the optimized NODE weights/parameters. However, a "NODE sensitivity analysis" methodology for efficiently computing exact expressions of decoder sensitivities with respect to the post-training optimized parameters/weights, including with respect to the initial conditions/encoder, does not seem to be available in the literature.
The scope of this work is to present a novel methodology for computing all of the first-order sensitivities, exactly and exhaustively, of the responses of the post-training optimized NODE decoder with respect to the optimized/trained weights involved in the NODE’s decoder, hidden layers, and encoder. The general mathematical representation of the NODE network considered in this work is presented in Section 2. As a specific illustrative paradigm application, Section 3 presents the NODE conceptual representation of the Nordheim–Fuchs phenomenological reactor dynamics/safety model [17,18]. This paradigm illustrative model has been chosen because it is representative of typical NODE applications while admitting closed-form analytical solutions for the quantities of interest, including the functions describing the hidden layers, encoder, decoder, and sensitivities of decoder responses. Section 4 presents the mathematical framework of the novel “First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations” (1st-CASAM-NODE). Section 5 illustrates the application of the 1st-CASAM-NODE methodology to compute all of the first-order sensitivities of Nordheim–Fuchs model responses with respect to the underlying parameters. Specifically, Section 5.1, Section 5.2, Section 5.3 and Section 5.4, respectively, illustrate the application of the 1st-CASAM-NODE methodology for computing the first-order sensitivities with respect to the underlying model parameters and initial conditions of the following responses: (i) the reactor’s flux; (ii) the reactor’s energy release; (iii) the reactor’s temperature; and (iv) the reactor’s thermal conductivity. 
Using the “energy-released” response as a paradigm, Section 5.5 illustrates an alternative path for computing first-order sensitivities by applying the “First-Order Feature Adjoint Sensitivity Analysis Methodology for Nonlinear Systems” (1st-FASAM-N) [19], which is the most efficient procedure for computing first-order sensitivities, but which may require the construction of a dedicated neural net for this purpose.

2. Neural Ordinary Differential Equations (NODE): Basic Properties and Uses

A general mathematical representation of a NODE network is provided by the following system of so-called “augmented” equations:
$$\frac{d\mathbf{h}(t)}{dt}=\mathbf{f}\left[\mathbf{h}(t);\boldsymbol{\theta};t\right],\quad t>0,\tag{1}$$
$$\mathbf{h}(t_0)=\mathbf{h}^{e}(\mathbf{x},\mathbf{w}),\quad \text{at}\ t=t_0,\tag{2}$$
$$\mathbf{r}(t_f)=\mathbf{h}^{d}\left[\mathbf{h}(t_f);\boldsymbol{\varphi}\right],\quad \text{at}\ t=t_f,\tag{3}$$
where:
(i)
The quantity $t$ is a time-like independent variable which parameterizes the dynamics of the hidden/latent neuron units; the initial value is denoted as $t_0$ (which can be considered to be an initial measurement time) while the stopping value is denoted as $t_f$ (which can be considered to be the next measurement time).
(ii)
The $T_H$-dimensional vector-valued function $\mathbf{h}(t)\equiv\left[h_1(t),\ldots,h_{T_H}(t)\right]^{\dagger}$ represents the hidden/latent neural networks. In this work, all vectors are considered to be column vectors and the dagger "$\dagger$" symbol will be used to denote "transposition". The symbol "$\equiv$" signifies "is defined as" or, equivalently, "is by definition equal to".
(iii)
The $T_H$-dimensional vector-valued nonlinear function $\mathbf{f}\left[\mathbf{h}(t);\boldsymbol{\theta};t\right]\equiv\left[f_1(\mathbf{h};\boldsymbol{\theta};t),\ldots,f_{T_H}(\mathbf{h};\boldsymbol{\theta};t)\right]^{\dagger}$ models the dynamics of the latent neurons, with learnable scalar adjustable weights represented by the components of the vector $\boldsymbol{\theta}\equiv\left(\theta_1,\ldots,\theta_{T_W}\right)^{\dagger}$, where $T_W$ denotes the total number of adjustable weights in all of the latent neural nets.
(iv)
The $T_H$-dimensional vector-valued function $\mathbf{h}^{e}(\mathbf{x},\mathbf{w})\equiv\left[h_1^{e}(\mathbf{x},\mathbf{w}),\ldots,h_{T_H}^{e}(\mathbf{x},\mathbf{w})\right]^{\dagger}$ represents the "encoder", which is characterized by "inputs" $\mathbf{x}\equiv\left(x_1,\ldots,x_{T_I}\right)^{\dagger}$ and "learnable" scalar adjustable weights $\mathbf{w}\equiv\left(w_1,\ldots,w_{T_{EW}}\right)^{\dagger}$, where $T_I$ denotes the total number of "inputs" and $T_{EW}$ denotes the total number of "learnable encoder weights" that define the "encoder".
(v)
The $T_R$-dimensional vector-valued function $\mathbf{r}(t_f)\equiv\left[r_1\left(\mathbf{h}(t_f);\boldsymbol{\varphi}\right),\ldots,r_{T_R}\left(\mathbf{h}(t_f);\boldsymbol{\varphi}\right)\right]^{\dagger}=\mathbf{h}^{d}\left[\mathbf{h}(t_f);\boldsymbol{\varphi}\right]$ represents the vector of "system responses". The vector-valued function $\mathbf{h}^{d}\left[\mathbf{h}(t_f);\boldsymbol{\varphi}\right]\equiv\left[h_1^{d}\left(\mathbf{h}(t_f);\boldsymbol{\varphi}\right),\ldots,h_{T_R}^{d}\left(\mathbf{h}(t_f);\boldsymbol{\varphi}\right)\right]^{\dagger}$ represents the "decoder", with learnable scalar adjustable weights represented by the components of the vector $\boldsymbol{\varphi}\equiv\left(\varphi_1,\ldots,\varphi_{T_D}\right)^{\dagger}$, where $T_D$ denotes the total number of adjustable weights that characterize the "decoder". Each component $r_n\left(\mathbf{h}(t_f);\boldsymbol{\varphi}\right)$ can be represented in integral form as follows:
$$r_n(\mathbf{h};\boldsymbol{\varphi})=\int_{t_0}^{t_f}h_n^{d}\left[\mathbf{h}(t);\boldsymbol{\varphi}\right]\delta(t-t_f)\,dt;\quad n=1,\ldots,T_R.\tag{4}$$
The weights of the NODE are adjusted/calibrated by "training" the NODE, using gradients of a scalar loss functional, denoted as $L\left[\mathbf{h}(t);\boldsymbol{\theta};t\right]$, which is designed to represent the deviations/discrepancies between the responses/outputs of the NODE and the "true" values obtained from measurements (or by other means, independently of the NODE). There are several methods for accomplishing this "training", all of which require that the functions underlying the NODE, i.e., $\mathbf{h}(t)$, $\mathbf{f}\left[\mathbf{h}(t);\boldsymbol{\theta};t\right]$, $\mathbf{h}^{e}(\mathbf{x},\mathbf{w})$, and $\mathbf{h}^{d}\left[\mathbf{h}(t);\boldsymbol{\varphi}\right]$, be differentiable with respect to their arguments. For complex systems involving many parameters, the so-called "adjoint method" [13,14,15] offers an optimal compromise between memory requirements and computational intensity. This method computes the required gradients of the loss function by evaluating the following integral:
$$\frac{\partial L}{\partial\boldsymbol{\theta}}=\int_{t_0}^{t_f}\mathbf{a}^{\dagger}(t)\,\frac{\partial\mathbf{f}\left[\mathbf{h}(t);\boldsymbol{\theta};t\right]}{\partial\boldsymbol{\theta}}\,dt,\tag{5}$$
where the so-called "adjoint function" $\mathbf{a}(t)\equiv\left[a_1(t),\ldots,a_{T_H}(t)\right]^{\dagger}$ satisfies the following "adjoint equation", which is solved backwards in time:
$$\frac{d\mathbf{a}(t)}{dt}=-\left[\frac{\partial\mathbf{f}\left[\mathbf{h}(t);\boldsymbol{\theta};t\right]}{\partial\mathbf{h}}\right]^{\dagger}\mathbf{a}(t),\quad t>0,\tag{6}$$
$$\mathbf{a}(t)=\left[\frac{\partial L\left[\mathbf{h}(t);\boldsymbol{\theta};t\right]}{\partial\mathbf{h}}\right]^{\dagger},\quad \text{at}\ t=t_f.\tag{7}$$
After the "training" of the NODE has been accomplished, the various "weights" will have been assigned "optimal" values which minimize the chosen loss functional $L\left[\mathbf{h}(t);\boldsymbol{\theta};t\right]$. These "optimal" values will be denoted using a superscript "zero", as follows: $\boldsymbol{\theta}^{0}\equiv\left(\theta_1^{0},\ldots,\theta_{T_W}^{0}\right)^{\dagger}$ and $\mathbf{w}^{0}\equiv\left(w_1^{0},\ldots,w_{T_{EW}}^{0}\right)^{\dagger}$. These optimal values are used to compute the optimal values of the system responses, which will be denoted as $r_n^{0}\left(\mathbf{h}(t_f);\boldsymbol{\varphi}\right)$. However, since the physical parameters and the initial conditions underlying the actual physical system (which is represented by the optimized NODE) are not known exactly (because they are subject to uncertainties), it follows that the optimal values obtained for the weights are actually just nominal values that are used to compute the nominal/optimal response values $r_n^{0}\left(\mathbf{h}(t_f);\boldsymbol{\varphi}\right)$. The uncertainties in the various weights and initial conditions will induce uncertainties in the system responses, which can be computed deterministically by using the well-known "propagation of errors" methodology, originally proposed by Tukey [20] and subsequently extended to sixth order by Cacuci [21].
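The training-phase adjoint computation described by Equations (5)–(7) can be illustrated on a scalar example that admits closed-form solutions. This is a hedged sketch (not the article's code): for the scalar NODE $dh/dt=f=-\theta h$, $h(0)=h_0$, with loss $L=\tfrac{1}{2}[h(t_f)-d]^2$, the adjoint equation gives $da/dt=\theta a$ (solved backwards) with $a(t_f)=h(t_f)-d$, and the gradient is the integral of $a(t)\,\partial f/\partial\theta$ with $\partial f/\partial\theta=-h(t)$; all values below are illustrative.

```python
import numpy as np

# Illustrative (hypothetical) values: decay rate, initial state, final time, data point.
theta, h0, tf, d = 0.7, 2.0, 1.5, 0.5

h = lambda t: h0 * np.exp(-theta * t)                 # forward solution of dh/dt = -theta*h
a = lambda t: (h(tf) - d) * np.exp(theta * (t - tf))  # adjoint solution, a(tf) = h(tf) - d

# Gradient of the loss via Eq. (5): integral of a(t) * df/dtheta, df/dtheta = -h(t),
# evaluated here by Gauss-Legendre quadrature on [0, tf].
nodes, weights = np.polynomial.legendre.leggauss(32)
t = 0.5 * tf * (nodes + 1.0)
grad_adjoint = 0.5 * tf * np.sum(weights * a(t) * (-h(t)))

# Closed-form check: dL/dtheta = (h(tf) - d) * dh(tf)/dtheta = (h(tf) - d)*(-tf*h0*exp(-theta*tf)).
grad_exact = (h(tf) - d) * (-tf * h0 * np.exp(-theta * tf))
```

The agreement with the closed-form derivative shows the mechanics; for a multi-dimensional NODE the same recipe applies componentwise, with the Jacobian transpose in Equation (6).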

3. Illustrative Paradigm Application: NODE Conceptual Modeling of the Nordheim–Fuchs Phenomenological Reactor Dynamics/Safety Model

The Nordheim–Fuchs phenomenological model [17,18] describes a short-time self-limiting power transient in a nuclear reactor system having a negative temperature coefficient, in which a large amount of reactivity is suddenly inserted, either intentionally or by accident. The response of such a reactor system can be estimated by considering that the reactivity insertion is sufficiently large and the time-span of the transient phenomena under consideration is of the order of the lifetime of prompt neutrons. For such short times, the effects of delayed neutrons and the local spatial variations of the neutron distribution in the reactor can be neglected, and the heat generated during the transient remains within the reactor. Using the notation of Lamarsh [17], the Nordheim–Fuchs paradigm model describing such a self-limiting power transient comprises the following balance equations:
1. The time-dependent neutron balance (point kinetics) equation for the neutron flux $\varphi(t)$:
$$\frac{d\varphi(t)}{dt}=\frac{k(t)-1}{l_p}\,\varphi(t),\quad t>0,\tag{8}$$
$$\varphi(0)=\varphi_0,\quad t=0,\tag{9}$$
where $l_p$ denotes the prompt-neutron lifetime, $k(t)$ denotes the reactor's multiplication factor, and $\varphi_0$ denotes the initial (i.e., extant) flux prior to initiating the transient at time $t=0$.
2.
The energy production equation:
$$E(t)=\gamma\,\Sigma_f\int_0^t\varphi(x)\,dx,\tag{10}$$
where $\gamma$ denotes the recoverable energy per fission, and $\Sigma_f\equiv\sigma_f N_f$ denotes the reactor's effective macroscopic fission cross-section, where $\sigma_f$ denotes the reactor's equivalent microscopic fission cross-section while $N_f$ denotes the reactor's equivalent atomic number density.
3.
The energy conservation equation:
$$c_p\left[T(t)-T_0\right]=E(t),\tag{11}$$
where $E(t)$ denotes the total energy released (per cm³) in the reactor at time $t$ since the onset of the reactivity change, and $c_p$ denotes the specific heat (per cm³) of the reactor.
4.
The reactivity–temperature feedback equation: $k(t)=k_0-\alpha_T\left[T(t)-T_0\right]$, where $k_0\equiv k(0)$ denotes the changed multiplication factor following the reactivity insertion at $t=0$, $\alpha_T$ denotes the magnitude of the negative temperature coefficient, $T(t)$ denotes the reactor's temperature, and $T_0$ denotes the reactor's initial temperature at time $t=0$. For illustrating the application of the 1st-CASAM-NODE methodology, it suffices to consider the special case of a "prompt critical transient", when the reactor becomes prompt critical after the reactivity insertion, i.e., when $k_0=1$, so that the reactivity–temperature feedback equation takes on the following particular form:
$$k(t)=1-\alpha_T\left[T(t)-T_0\right].\tag{12}$$
Equations (8)–(12) can be transformed into the following system of nonlinear differential equations:
$$\frac{d\varphi(t)}{dt}=-\frac{\alpha_T}{l_p c_p}\,E(t)\,\varphi(t),\quad t>0;\qquad \varphi(0)=\varphi_0,\quad t=0,\tag{13}$$
$$\frac{dE(t)}{dt}=\gamma\,\sigma_f N_f\,\varphi(t),\qquad E(0)=0,\tag{14}$$
$$\frac{dT(t)}{dt}=\frac{\gamma\,\sigma_f N_f}{c_p}\,\varphi(t);\qquad T(0)=T_0.\tag{15}$$
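Before turning to the analytical solution, the coupled system (13)–(15) can be integrated numerically in a few lines. The following is a hedged sketch using dimensionless, purely illustrative (hypothetical, not the article's) parameter values, chosen only to produce a visible self-limiting pulse; it also checks the energy-conservation relation (11) along the computed trajectory.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative (hypothetical) dimensionless parameter values.
alpha_T, l_p, c_p = 1e-5, 1e-4, 1.0
gamma, sigma_f, N_f = 1.0, 1.0, 1.0
phi0, T0 = 1.0, 300.0

def rhs(t, y):
    phi, E, T = y
    return [-(alpha_T / (l_p * c_p)) * E * phi,   # Eq. (13)
            gamma * sigma_f * N_f * phi,          # Eq. (14)
            gamma * sigma_f * N_f * phi / c_p]    # Eq. (15)

sol = solve_ivp(rhs, (0.0, 5.0), [phi0, 0.0, T0], rtol=1e-10, atol=1e-12)
phi, E, T = sol.y
# Energy conservation, Eq. (11): c_p*(T - T0) should equal E along the trajectory.
conservation_error = np.max(np.abs(c_p * (T - T0) - E))
```

Because Equations (14) and (15) differ only by the factor $1/c_p$, the computed $E(t)$ and $c_p[T(t)-T_0]$ agree to integrator precision, which is a useful consistency check on any implementation.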
The Nordheim–Fuchs model described by Equations (13)–(15) can be solved analytically to obtain exact closed-form expressions for the state functions φ t , E t , and T t , as follows:
(i)
Eliminating the function φ(t) from Equations (13) and (14) yields a nonlinear differential equation which can be integrated directly to obtain the following relation:
$$\varphi(t)=-\frac{\alpha_T}{2\,l_p c_p\,\gamma\,\sigma_f N_f}\,E^2(t)+\varphi_0.\tag{16}$$
(ii)
Using Equation (16) in Equation (14) yields the following nonlinear equation for the released energy E t :
$$\frac{dE(t)}{dt}=-\frac{\alpha_T}{2\,l_p c_p}\,E^2(t)+\varphi_0\,\gamma\,\sigma_f N_f,\qquad E(0)=0.\tag{17}$$
The closed-form solution of Equation (17) has the following form:
$$E(t)=K_1(\boldsymbol{\alpha})\tanh\left[t\,K_2(\boldsymbol{\alpha})\right],\tag{18}$$
where:
$$K_1(\boldsymbol{\alpha})\equiv\left[\frac{2\,\varphi_0\,\gamma\,\sigma_f N_f\,l_p c_p}{\alpha_T}\right]^{1/2};\qquad K_2(\boldsymbol{\alpha})\equiv\left[\frac{\alpha_T\,\varphi_0\,\gamma\,\sigma_f N_f}{2\,l_p c_p}\right]^{1/2}.\tag{19}$$
(iii)
Replacing Equation (18) into Equation (16) yields the following closed-form expression for φ t :
$$\varphi(t)=\varphi_0\left\{1-\tanh^2\left[t\,K_2(\boldsymbol{\alpha})\right]\right\}=\frac{\varphi_0}{\cosh^2\left[t\,K_2(\boldsymbol{\alpha})\right]}.\tag{20}$$
(iv)
Replacing Equation (18) into Equation (11) yields the following closed-form expression for T t :
$$T(t)=T_0+\frac{K_1(\boldsymbol{\alpha})}{c_p}\tanh\left[t\,K_2(\boldsymbol{\alpha})\right].\tag{21}$$
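The closed forms (18)–(21) can be cross-checked against a direct numerical integration of Equations (13)–(15); this is what makes the Nordheim–Fuchs model a convenient verification paradigm. The sketch below uses the same illustrative (hypothetical) parameter values as above.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative (hypothetical) dimensionless parameter values.
alpha_T, l_p, c_p = 1e-5, 1e-4, 1.0
gamma, sigma_f, N_f = 1.0, 1.0, 1.0
phi0, T0 = 1.0, 300.0

K1 = np.sqrt(2.0 * phi0 * gamma * sigma_f * N_f * l_p * c_p / alpha_T)   # Eq. (19)
K2 = np.sqrt(alpha_T * phi0 * gamma * sigma_f * N_f / (2.0 * l_p * c_p))

E_exact = lambda t: K1 * np.tanh(t * K2)                 # Eq. (18)
phi_exact = lambda t: phi0 / np.cosh(t * K2) ** 2        # Eq. (20)
T_exact = lambda t: T0 + (K1 / c_p) * np.tanh(t * K2)    # Eq. (21)

def rhs(t, y):
    phi, E, T = y
    return [-(alpha_T / (l_p * c_p)) * E * phi,
            gamma * sigma_f * N_f * phi,
            gamma * sigma_f * N_f * phi / c_p]

tau = 4.0
sol = solve_ivp(rhs, (0.0, tau), [phi0, 0.0, T0], rtol=1e-11, atol=1e-13)
phi_num, E_num, T_num = sol.y[:, -1]   # numerical state at t = tau
```

At $t=\tau$ the numerically integrated state functions reproduce the closed-form expressions to within the integrator tolerances.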
The typical results of interest (called “model response”) for the Nordheim–Fuchs model are as follows:
(i)
The neutron flux $\varphi(\tau)$ in the reactor at a "final-time" instance denoted as $t=\tau$, after the initiation at $t=0$ of the prompt-critical power transient, which can be defined mathematically as follows:
$$\varphi(\tau)=\int_0^{\tau}\varphi(t)\,\delta(t-\tau)\,dt;\tag{22}$$
(ii)
The total energy per cm³, $E(\tau)$, released at a user-chosen "final-time" instance denoted as $t=\tau$, after the initiation at $t=0$ of the prompt-critical power transient, which can be defined mathematically as follows:
$$E(\tau)=\int_0^{\tau}E(t)\,\delta(t-\tau)\,dt,\tag{23}$$
where $\delta(t-\tau)$ denotes the Dirac-delta functional.
(iii)
The reactor's temperature $T(\tau)$ at a "final-time" instance denoted as $t=\tau$ after the initiation at $t=0$ of the prompt-critical power transient, which can be defined mathematically as follows:
$$T(\tau)=\int_0^{\tau}T(t)\,\delta(t-\tau)\,dt.\tag{24}$$
Comparing the structure of the Nordheim–Fuchs model, cf. Equations (13)–(15), to the generic structure of a NODE, cf. Equations (1) and (2), indicates the following correspondences:
$$\mathbf{h}(t)\equiv\left[h_1(t),h_2(t),h_3(t)\right]^{\dagger}\equiv\left[\varphi(t),E(t),T(t)\right]^{\dagger};\quad T_H=3;\tag{25}$$
$$\boldsymbol{\theta}\equiv\left(\theta_1,\ldots,\theta_{T_W}\right)^{\dagger}\equiv\left(\alpha_T,l_p,c_p,\gamma,\sigma_f,N_f\right)^{\dagger};\quad \mathbf{x}\equiv\left(x_1,x_2\right)^{\dagger}\equiv\left(\varphi_0,T_0\right)^{\dagger};\quad T_W=6,\ T_I=2;\tag{26}$$
$$f_1(\mathbf{h};\boldsymbol{\theta};t)\equiv-\frac{\alpha_T}{l_p c_p}\,E(t)\,\varphi(t)\equiv-\frac{\theta_1}{\theta_2\theta_3}\,h_1(t)\,h_2(t);\tag{27}$$
$$f_2(\mathbf{h};\boldsymbol{\theta};t)\equiv\gamma\,\sigma_f N_f\,\varphi(t)\equiv\theta_4\theta_5\theta_6\,h_1(t);\tag{28}$$
$$f_3(\mathbf{h};\boldsymbol{\theta};t)\equiv\frac{\gamma\,\sigma_f N_f}{c_p}\,\varphi(t)\equiv\frac{\theta_4\theta_5\theta_6}{\theta_3}\,h_1(t).\tag{29}$$
The actual values of the components of the vectors θ and x are unknown even after having trained the NODE, since the actual values of the parameters underlying the Nordheim–Fuchs model are experimentally measured and are thus subject to uncertainties. However, the nominal values of these parameters are considered to be known and are considered to be exactly reproducible by the “trained” NODE; these nominal values will be denoted using a superscript “zero”, as follows:
$$\boldsymbol{\theta}^{0}\equiv\left(\theta_1^{0},\ldots,\theta_6^{0}\right)^{\dagger}\equiv\left(\alpha_T^{0},l_p^{0},c_p^{0},\gamma^{0},\sigma_f^{0},N_f^{0}\right)^{\dagger};\quad \mathbf{x}^{0}\equiv\left(x_1^{0},x_2^{0},x_3^{0}\right)^{\dagger}\equiv\left(\varphi_0^{0},0,T_0^{0}\right)^{\dagger}.\tag{30}$$
Consequently, the exact values of the functions $\mathbf{h}(t)\equiv\left[h_1(t),h_2(t),h_3(t)\right]^{\dagger}\equiv\left[\varphi(t),E(t),T(t)\right]^{\dagger}$ are unknown, but their nominal values $\mathbf{h}^{0}(t)\equiv\left[h_1^{0}(t),h_2^{0}(t),h_3^{0}(t)\right]^{\dagger}\equiv\left[\varphi^{0}(t),E^{0}(t),T^{0}(t)\right]^{\dagger}$ are known after having solved Equations (13)–(15) at the nominal values $\left(\boldsymbol{\theta}^{0},\mathbf{x}^{0}\right)$.
The NODE representations, cf. Equation (4), of the responses considered in Equations (23) and (24) have the following expressions, respectively:
$$r_1(\mathbf{h})=\int_{t_0=0}^{t_f}h_1(t)\,\delta(t-t_f)\,dt=\varphi(t_f);\tag{31}$$
$$r_2(\mathbf{h})=\int_{t_0=0}^{t_f}h_2(t)\,\delta(t-t_f)\,dt=E(t_f);\tag{32}$$
$$r_3(\mathbf{h})=\int_{t_0=0}^{t_f}h_3(t)\,\delta(t-t_f)\,dt=T(t_f).\tag{33}$$
To illustrate the efficient computation of responses involving decoders having their own parameters/weights, the thermal conductivity of the conceptual material of the Nordheim–Fuchs reactor model will be considered to be a “decoder” response having the following expression:
$$r_4(\mathbf{h};\boldsymbol{\varphi})=\int_{t_0}^{t_f}h_4^{d}\left[\mathbf{h}(t);\boldsymbol{\varphi}\right]\delta(t-t_f)\,dt;\quad h_4^{d}\left[\mathbf{h}(t);\boldsymbol{\varphi}\right]\equiv k(T)=\varphi_1+\varphi_2 h_3(t)+\varphi_3 h_3^2(t)=\varphi_1+\varphi_2 T(t)+\varphi_3 T^2(t).\tag{34}$$
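The quadratic-in-temperature decoder of Equation (34) is simple enough to evaluate directly. The sketch below uses purely illustrative (hypothetical) decoder weights and final-time temperature; since $k(T)$ is a polynomial in the weights, its gradient with respect to $(\varphi_1,\varphi_2,\varphi_3)$ is simply $\left(1,\ T(t_f),\ T^2(t_f)\right)$, which is the "direct" decoder-weight contribution exploited later in the methodology.

```python
import numpy as np

# Illustrative (hypothetical) decoder weights and final-time temperature.
phi1, phi2, phi3 = 10.0, 5e-3, -1e-6
T_tf = 450.0                       # stands in for the hidden state h3(tf)

# Thermal-conductivity decoder response, Eq. (34): r4 = k(T(tf)).
r4 = phi1 + phi2 * T_tf + phi3 * T_tf ** 2

# Gradient of r4 with respect to the decoder weights (phi1, phi2, phi3):
grad_phi = np.array([1.0, T_tf, T_tf ** 2])
```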

4. First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-CASAM-NODE): Mathematical Framework

At the optimal/nominal parameter values, the optimal/nominal solution $\mathbf{h}^{0}(t)$ will satisfy the following forms of Equations (1) and (2):
$$\frac{d\mathbf{h}^{0}(t)}{dt}=\mathbf{f}\left[\mathbf{h}^{0}(t);\boldsymbol{\theta}^{0};t\right],\quad t>0,\tag{35}$$
$$\mathbf{h}^{0}(t_0)=\mathbf{h}^{e}\left(\mathbf{x}^{0},\mathbf{w}^{0}\right),\quad \text{at}\ t=t_0.\tag{36}$$
Furthermore, the vector of optimal/nominal responses will have components that are obtained by using the nominal values of the respective functions and parameters, i.e.:
$$r_n^{0}\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0}\right)=\int_{t_0}^{t_f}h_n^{d}\left[\mathbf{h}^{0}(t);\boldsymbol{\varphi}^{0}\right]\delta(t-t_f)\,dt;\quad n=1,\ldots,T_R.\tag{37}$$
The known nominal values $\mathbf{x}^{0}$ of the initial conditions will differ from the true but unknown values $\mathbf{x}$ of the initial conditions by variations denoted as $\delta\mathbf{x}\equiv\mathbf{x}-\mathbf{x}^{0}$. Furthermore, the known nominal values $\mathbf{w}^{0}$ of the weights characterizing the encoder will differ from the true but unknown values $\mathbf{w}$ of the respective weights by variations denoted as $\delta\mathbf{w}\equiv\mathbf{w}-\mathbf{w}^{0}$. Similarly, the nominal values $\boldsymbol{\theta}^{0}$ and $\boldsymbol{\varphi}^{0}$, respectively, will differ by variations $\delta\boldsymbol{\theta}\equiv\boldsymbol{\theta}-\boldsymbol{\theta}^{0}$ and $\delta\boldsymbol{\varphi}\equiv\boldsymbol{\varphi}-\boldsymbol{\varphi}^{0}$, respectively, from the corresponding true but unknown values $\boldsymbol{\theta}$ and $\boldsymbol{\varphi}$. Since the forward state functions $\mathbf{h}(t)$ are related to the weights and initial conditions through Equations (1) and (2), it follows that the variations in these weights and initial conditions will induce corresponding variations $\mathbf{v}^{(1)}(t)\equiv\left[\delta h_1(t),\ldots,\delta h_{T_H}(t)\right]^{\dagger}$ around the nominal solution $\mathbf{h}^{0}(t)$. In turn, the variations $\delta\boldsymbol{\varphi}$ and $\mathbf{v}^{(1)}(t)$ will induce variations $\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\mathbf{v}^{(1)};\delta\boldsymbol{\varphi}\right)$ in the system's responses.
The 1st-CASAM-NODE methodology for computing the first-order sensitivities of the response with respect to the model's weights and initial conditions will be established by following the same principles as those underlying the 1st-CASAM-N methodology [22], which commence by noting that Cacuci [23] has shown that the most general definition of the sensitivity of an operator-valued model response $R(\mathbf{e})$ with respect to variations $\delta\mathbf{e}$ in the model parameters and state functions, in a neighborhood around the nominal functions and parameter values $\mathbf{e}^{0}$, is given by the first-order Gateaux (G-) variation, which will be denoted as $\delta R\left(\mathbf{e}^{0};\delta\mathbf{e}\right)$ and is defined as follows:
$$\delta R\left(\mathbf{e}^{0};\delta\mathbf{e}\right)\equiv\left\{\frac{d}{d\varepsilon}R\left(\mathbf{e}^{0}+\varepsilon\,\delta\mathbf{e}\right)\right\}_{\varepsilon=0}\equiv\lim_{\varepsilon\to 0}\frac{R\left(\mathbf{e}^{0}+\varepsilon\,\delta\mathbf{e}\right)-R\left(\mathbf{e}^{0}\right)}{\varepsilon},\tag{38}$$
for a scalar $\varepsilon$ and for all (i.e., arbitrary) vectors $\delta\mathbf{e}$ in a neighborhood $\left(\mathbf{e}^{0}+\varepsilon\,\delta\mathbf{e}\right)$ around $\mathbf{e}^{0}$. The G-variation $\delta R\left(\mathbf{e}^{0};\delta\mathbf{e}\right)$ is an operator defined on the same domain as $R(\mathbf{e})$ and has the same range as $R(\mathbf{e})$. The G-variation $\delta R\left(\mathbf{e}^{0};\delta\mathbf{e}\right)$ satisfies the relation $R\left(\mathbf{e}^{0}+\varepsilon\,\delta\mathbf{e}\right)-R\left(\mathbf{e}^{0}\right)=\delta R\left(\mathbf{e}^{0};\delta\mathbf{e}\right)+\Delta\left(\delta\mathbf{e}\right)$, with $\lim_{\varepsilon\to 0}\left[\Delta\left(\varepsilon\,\delta\mathbf{e}\right)\right]/\varepsilon=0$. When the G-variation $\delta R\left(\mathbf{e}^{0};\delta\mathbf{e}\right)$ is linear in the variation $\delta\mathbf{e}$, it can be written in the form $\delta R\left(\mathbf{e}^{0};\delta\mathbf{e}\right)=\left[\partial R/\partial\mathbf{e}\right]_{\mathbf{e}^{0}}\delta\mathbf{e}$, where $\left[\partial R/\partial\mathbf{e}\right]_{\mathbf{e}^{0}}$ denotes the first-order G-derivative of $R(\mathbf{e})$ with respect to $\mathbf{e}$, evaluated at $\mathbf{e}^{0}$.
Applying the definition provided in Equation (38) to Equation (4) yields the following expression for the first-order G-variation $\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\mathbf{v}^{(1)};\delta\boldsymbol{\varphi}\right)$ of the response $r_n(\mathbf{h};\boldsymbol{\varphi})$:
$$\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\mathbf{v}^{(1)};\delta\boldsymbol{\varphi}\right)=\left\{\frac{d}{d\varepsilon}\int_{t_0}^{t_f}h_n^{d}\left[\mathbf{h}^{0}(t)+\varepsilon\,\mathbf{v}^{(1)}(t);\boldsymbol{\varphi}^{0}+\varepsilon\,\delta\boldsymbol{\varphi}\right]\delta(t-t_f)\,dt\right\}_{\varepsilon=0}=\left\{\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\delta\boldsymbol{\varphi}\right)\right\}_{dir}+\left\{\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\mathbf{v}^{(1)}\right)\right\}_{ind};\quad n=1,\ldots,T_R,\tag{39}$$
where $\mathbf{v}^{(1)}\equiv\left[v_1^{(1)}(t),\ldots,v_{T_H}^{(1)}(t)\right]^{\dagger}$ and:
$$\left\{\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\delta\boldsymbol{\varphi}\right)\right\}_{dir}\equiv\int_{t_0}^{t_f}\delta(t-t_f)\left[\frac{\partial h_n^{d}\left[\mathbf{h}(t);\boldsymbol{\varphi}\right]}{\partial\boldsymbol{\varphi}}\right]_{\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0}\right)}\delta\boldsymbol{\varphi}\,dt,\tag{40}$$
$$\left\{\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\mathbf{v}^{(1)}\right)\right\}_{ind}\equiv\int_{t_0}^{t_f}\delta(t-t_f)\left[\frac{\partial h_n^{d}\left[\mathbf{h}(t);\boldsymbol{\varphi}\right]}{\partial\mathbf{h}(t)}\right]_{\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0}\right)}\mathbf{v}^{(1)}(t)\,dt.\tag{41}$$
Thus, the quantity $\left[\partial h_n^{d}\left[\mathbf{h}(t);\boldsymbol{\varphi}\right]/\partial\boldsymbol{\varphi}\right]_{\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0}\right)}$ in Equation (40) denotes the partial G-derivatives of the response $h_n^{d}\left[\mathbf{h}(t);\boldsymbol{\varphi}\right]$ with respect to the decoder weights $\boldsymbol{\varphi}\equiv\left(\varphi_1,\ldots,\varphi_{T_D}\right)^{\dagger}$, evaluated at the nominal values $\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0}\right)$. The quantity $\left\{\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\delta\boldsymbol{\varphi}\right)\right\}_{dir}$ is called the "direct-effect term" because it arises directly from the parameter variations $\delta\boldsymbol{\varphi}$ and can be computed directly using the nominal values $\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0}\right)$. The quantity $\left\{\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\mathbf{v}^{(1)}\right)\right\}_{ind}$ is called the "indirect-effect term" because it arises indirectly, through the variations $\mathbf{v}^{(1)}(t)$ in the hidden state functions $\mathbf{h}(t)$. The indirect-effect term can be quantified only after having determined the variations $\mathbf{v}^{(1)}(t)$, which are caused by the variations $\delta\mathbf{x}$, $\delta\mathbf{w}$, and $\delta\boldsymbol{\theta}$.
The first-order relationships between the variations $\mathbf{v}^{(1)}(t)$, $\delta\mathbf{x}$, $\delta\mathbf{w}$, and $\delta\boldsymbol{\theta}$ are obtained by computing the first-order G-variations of Equations (1) and (2), which are obtained, by definition, as follows:
$$\left\{\frac{d}{d\varepsilon}\frac{d}{dt}\left[\mathbf{h}^{0}+\varepsilon\,\mathbf{v}^{(1)}\right]\right\}_{\varepsilon=0}=\left\{\frac{d}{d\varepsilon}\mathbf{f}\left[\mathbf{h}^{0}+\varepsilon\,\mathbf{v}^{(1)};\boldsymbol{\theta}^{0}+\varepsilon\,\delta\boldsymbol{\theta};t\right]\right\}_{\varepsilon=0},\tag{42}$$
$$\left\{\frac{d}{d\varepsilon}\left[\mathbf{h}^{0}(t_0)+\varepsilon\,\mathbf{v}^{(1)}(t_0)\right]\right\}_{\varepsilon=0}=\left\{\frac{d}{d\varepsilon}\mathbf{h}^{e}\left(\mathbf{x}^{0}+\varepsilon\,\delta\mathbf{x},\mathbf{w}^{0}+\varepsilon\,\delta\mathbf{w}\right)\right\}_{\varepsilon=0}.\tag{43}$$
Carrying out the operations indicated in Equations (42) and (43) yields the following system of equations:
$$\frac{d\mathbf{v}^{(1)}(t)}{dt}-\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\right]_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\mathbf{v}^{(1)}(t)=\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right]_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\delta\boldsymbol{\theta},\tag{44}$$
$$\mathbf{v}^{(1)}(t_0)=\left[\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial\mathbf{x}}\right]_{\left(\mathbf{x}^{0},\mathbf{w}^{0}\right)}\delta\mathbf{x}+\left[\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial\mathbf{w}}\right]_{\left(\mathbf{x}^{0},\mathbf{w}^{0}\right)}\delta\mathbf{w},\tag{45}$$
where:
$$\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\equiv\begin{pmatrix}\partial f_1/\partial h_1&\cdots&\partial f_1/\partial h_{T_H}\\ \vdots&\ddots&\vdots\\ \partial f_{T_H}/\partial h_1&\cdots&\partial f_{T_H}/\partial h_{T_H}\end{pmatrix}_{T_H\times T_H},\tag{46}$$
$$\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\equiv\begin{pmatrix}\partial f_1/\partial\theta_1&\cdots&\partial f_1/\partial\theta_{T_W}\\ \vdots&\ddots&\vdots\\ \partial f_{T_H}/\partial\theta_1&\cdots&\partial f_{T_H}/\partial\theta_{T_W}\end{pmatrix}_{T_H\times T_W},\tag{47}$$
$$\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial\mathbf{x}}\equiv\begin{pmatrix}\partial h_1^{e}/\partial x_1&\cdots&\partial h_1^{e}/\partial x_{T_I}\\ \vdots&\ddots&\vdots\\ \partial h_{T_H}^{e}/\partial x_1&\cdots&\partial h_{T_H}^{e}/\partial x_{T_I}\end{pmatrix}_{T_H\times T_I},\tag{48}$$
$$\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial\mathbf{w}}\equiv\begin{pmatrix}\partial h_1^{e}/\partial w_1&\cdots&\partial h_1^{e}/\partial w_{T_{EW}}\\ \vdots&\ddots&\vdots\\ \partial h_{T_H}^{e}/\partial w_1&\cdots&\partial h_{T_H}^{e}/\partial w_{T_{EW}}\end{pmatrix}_{T_H\times T_{EW}}.\tag{49}$$
The system comprising Equations (44) and (45) is called the “1st-Level Variational Sensitivity System” (1st-LVSS), and its solution, v 1 t , is called the “1st-level variational sensitivity function”. Note that the 1st-LVSS would need to be solved anew for each component of the variations δ x , δ w , and δ θ , which would be prohibitively expensive computationally.
The need for solving the 1st-LVSS can be avoided if the indirect-effect term defined in Equation (41) could be expressed in terms of a "right-hand side" that does not involve the function $\mathbf{v}^{(1)}(t)$. This goal can be achieved by expressing the right side of Equation (41) in terms of the solutions of the "1st-Level Adjoint Sensitivity System" (1st-LASS), the construction of which requires the introduction of adjoint operators. Adjoint operators can be defined in Banach spaces but are most useful in Hilbert spaces. For the NODE considered in this work, the appropriate Hilbert space is defined on the domain $\Omega_t\equiv\left(t_0,t_f\right)$ and will be denoted as $H_1\left(\Omega_t\right)$, so that $\mathbf{v}^{(1)}(t)\in H_1\left(\Omega_t\right)$. In $H_1\left(\Omega_t\right)$, the inner product of two vectors $\mathbf{u}^{(a)}(t)\in H_1\left(\Omega_t\right)$ and $\mathbf{u}^{(b)}(t)\in H_1\left(\Omega_t\right)$ will be denoted as $\left\langle\mathbf{u}^{(a)},\mathbf{u}^{(b)}\right\rangle_1$ and is defined as follows:
$$\left\langle\mathbf{u}^{(a)},\mathbf{u}^{(b)}\right\rangle_1\equiv\left\{\int_{t_0}^{t_f}\mathbf{u}^{(a)}(t)\cdot\mathbf{u}^{(b)}(t)\,dt\right\}_{\left(\mathbf{x}^{0};\boldsymbol{\theta}^{0};\mathbf{w}^{0};\boldsymbol{\varphi}^{0}\right)},\tag{50}$$
where the "dot" indicates the "scalar product of two vectors", defined as follows: $\mathbf{u}^{(a)}(t)\cdot\mathbf{u}^{(b)}(t)\equiv\left[\mathbf{u}^{(a)}(t)\right]^{\dagger}\mathbf{u}^{(b)}(t)\equiv\sum_{i=1}^{T_H}u_i^{(a)}(t)\,u_i^{(b)}(t)=\left[\mathbf{u}^{(b)}(t)\right]^{\dagger}\mathbf{u}^{(a)}(t)$.
The next step is to form the inner product of Equation (44) with a vector $\mathbf{a}^{(1)}(t)\equiv\left[a_1^{(1)}(t),\ldots,a_{T_H}^{(1)}(t)\right]^{\dagger}\in H_1\left(\Omega_t\right)$, where the superscript "(1)" indicates "1st-level", to obtain the following relationship:
$$\left\langle\mathbf{a}^{(1)}(t),\frac{d\mathbf{v}^{(1)}(t)}{dt}-\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\right]_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\mathbf{v}^{(1)}(t)\right\rangle_1=\left\langle\mathbf{a}^{(1)}(t),\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right]_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\delta\boldsymbol{\theta}\right\rangle_1.\tag{51}$$
Using the definition of the adjoint operator in $H_1\left(\Omega_t\right)$, the left side of Equation (51) is transformed as follows, after integrating by parts over the independent variable $t$:
$$\int_{t_0}^{t_f}\mathbf{a}^{(1)}(t)\cdot\frac{d\mathbf{v}^{(1)}(t)}{dt}\,dt-\int_{t_0}^{t_f}\mathbf{a}^{(1)}(t)\cdot\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\right]_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\mathbf{v}^{(1)}(t)\,dt=\mathbf{a}^{(1)}(t_f)\cdot\mathbf{v}^{(1)}(t_f)-\mathbf{a}^{(1)}(t_0)\cdot\mathbf{v}^{(1)}(t_0)+\int_{t_0}^{t_f}\mathbf{v}^{(1)}(t)\cdot\left\{-\frac{d\mathbf{a}^{(1)}(t)}{dt}-\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\right]^{\dagger}_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\mathbf{a}^{(1)}(t)\right\}dt.\tag{52}$$
The last term on the right side of Equation (52) is now required to represent the "indirect-effect" term defined in Equation (41), which is achieved by requiring that the 1st-level adjoint function $\mathbf{a}^{(1)}(t)$ satisfy the following relation:
$$-\frac{d\mathbf{a}^{(1)}(t)}{dt}-\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\right]^{\dagger}_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\mathbf{a}^{(1)}(t)=\left[\frac{\partial h_n^{d}\left[\mathbf{h}(t);\boldsymbol{\varphi}\right]}{\partial\mathbf{h}(t)}\right]^{\dagger}_{\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0}\right)}\delta(t-t_f).\tag{53}$$
The definition of the 1st-level adjoint sensitivity function $\mathbf{a}^{(1)}(t)$ is now completed by requiring it to satisfy the (adjoint) "boundary condition at the final time" $t=t_f$ so as to eliminate the term containing the unknown values $\mathbf{v}^{(1)}(t_f)$ in Equation (52). This aim is achieved by requiring that:
$$\mathbf{a}^{(1)}(t_f)=\mathbf{0}.\tag{54}$$
The system of equations comprising Equations (53) and (54) constitutes the "1st-Level Adjoint Sensitivity System" (1st-LASS) for the 1st-level adjoint function $\mathbf{a}^{(1)}(t)$. Evidently, the 1st-LASS is independent of the parameter variations and needs to be solved just once to obtain the 1st-level adjoint function $\mathbf{a}^{(1)}(t)$. Notably, the 1st-LASS has the same form as the "adjoint equations" used for training the NODE, cf. Equations (6) and (7), but with the "response" term $\left[\partial h_n^{d}\left[\mathbf{h}(t);\boldsymbol{\varphi}\right]/\partial\mathbf{h}(t)\right]^{\dagger}\delta(t-t_f)$ being the "source" for the 1st-LASS, whereas the "source" in the "training" of the NODE was the "loss functional" term $\left[\partial L\left[\mathbf{h}(t);\boldsymbol{\theta};t\right]/\partial\mathbf{h}\right]^{\dagger}\delta(t-t_f)$. Evidently, the 1st-level adjoint sensitivity function $\mathbf{a}^{(1)}(t)$ is the counterpart of the "adjoint function" $\mathbf{a}(t)$ in the "training" of the NODE.
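The "solve once, reuse for every sensitivity" property of the 1st-LASS can be seen on a scalar example with closed-form solutions. This is a hedged sketch (not from the article): for $dh/dt=f=-\theta h$ with encoder $h(0)=x$ and response $r=h(t_f)$, the 1st-LASS reduces to $da/dt=\theta a$ (solved backwards from the final-time condition $a(t_f)=1$, which absorbs the delta-function source), giving $a(t)=e^{\theta(t-t_f)}$; all numerical values are illustrative.

```python
import numpy as np

# Illustrative (hypothetical) values for the scalar example.
theta, x, tf = 0.8, 3.0, 2.0

a = lambda t: np.exp(theta * (t - tf))   # 1st-level adjoint function a(t)
h = lambda t: x * np.exp(-theta * t)     # nominal hidden state

# Initial-condition sensitivity: dr/dx = a(t0) * dh^e/dx = a(0) * 1.
dr_dx = a(0.0)

# Parameter sensitivity: dr/dtheta = integral of a(t) * df/dtheta dt,
# with df/dtheta = -h(t); evaluated by Gauss-Legendre quadrature.
nodes, weights = np.polynomial.legendre.leggauss(32)
t = 0.5 * tf * (nodes + 1.0)
dr_dtheta = 0.5 * tf * np.sum(weights * a(t) * (-h(t)))
```

Both sensitivities come from the single adjoint function $a(t)$; no additional solve is needed per parameter, and both agree with the closed form $r=x\,e^{-\theta t_f}$, for which $\partial r/\partial x=e^{-\theta t_f}$ and $\partial r/\partial\theta=-t_f\,x\,e^{-\theta t_f}$.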
Using the results represented by Equations (53), (54), (51) and (41) in Equation (52) yields the following alternative expression for the "indirect-effect" term, which does not involve the 1st-level variational sensitivity function $\mathbf{v}^{(1)}(t)$ but involves the 1st-level adjoint function $\mathbf{a}^{(1)}(t)$:
$$\left\{\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\mathbf{v}^{(1)}\right)\right\}_{ind}=\left\langle\mathbf{a}^{(1)}(t),\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right]_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\delta\boldsymbol{\theta}\right\rangle_1+\mathbf{a}^{(1)}(t_0)\cdot\mathbf{v}^{(1)}(t_0).\tag{55}$$
Using in Equation (55) the expression provided for $\mathbf{v}^{(1)}(t_0)$ in Equation (45) yields the following expression for the "indirect-effect" term:
$$\left\{\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\mathbf{v}^{(1)}\right)\right\}_{ind}=\left\langle\mathbf{a}^{(1)}(t),\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right]_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\delta\boldsymbol{\theta}\right\rangle_1+\mathbf{a}^{(1)}(t_0)\cdot\left[\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial\mathbf{x}}\right]_{\left(\mathbf{x}^{0},\mathbf{w}^{0}\right)}\delta\mathbf{x}+\mathbf{a}^{(1)}(t_0)\cdot\left[\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial\mathbf{w}}\right]_{\left(\mathbf{x}^{0},\mathbf{w}^{0}\right)}\delta\mathbf{w}.\tag{56}$$
Replacing the expression obtained in Equation (56) for the "indirect-effect" term, together with the expression of the "direct-effect" term provided by Equation (40), into Equation (39) yields the following expression for the first-order G-variation $\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\mathbf{v}^{(1)};\delta\boldsymbol{\varphi}\right)$ of the response $r_n(\mathbf{h};\boldsymbol{\varphi})$:
$$\delta r_n\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0};\mathbf{v}^{(1)};\delta\boldsymbol{\varphi}\right)=\left[\frac{\partial h_n^{d}\left[\mathbf{h}(t_f);\boldsymbol{\varphi}\right]}{\partial\boldsymbol{\varphi}}\right]_{\left(\mathbf{h}^{0};\boldsymbol{\varphi}^{0}\right)}\delta\boldsymbol{\varphi}+\mathbf{a}^{(1)}(t_0)\cdot\left[\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial\mathbf{w}}\right]_{\left(\mathbf{x}^{0},\mathbf{w}^{0}\right)}\delta\mathbf{w}+\mathbf{a}^{(1)}(t_0)\cdot\left[\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial\mathbf{x}}\right]_{\left(\mathbf{x}^{0},\mathbf{w}^{0}\right)}\delta\mathbf{x}+\int_{t_0}^{t_f}\mathbf{a}^{(1)}(t)\cdot\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\right]_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\delta\boldsymbol{\theta}\,dt;\quad n=1,\ldots,T_R.\tag{57}$$
As indicated by the right side of Equation (57), the (partial) sensitivities of the response $r_n(\mathbf{h};\boldsymbol{\varphi})$ are provided by the following expressions, all of which are to be evaluated at the nominal values of all functions and parameters/weights:
$$\frac{\partial r_n}{\partial\varphi_i}=\frac{\partial h_n^{d}\left[\mathbf{h}(t_f);\boldsymbol{\varphi}\right]}{\partial\varphi_i};\quad i=1,\ldots,T_D;\quad n=1,\ldots,T_R;\tag{58}$$
$$\frac{\partial r_n}{\partial w_i}=\mathbf{a}^{(1)}(t_0)\cdot\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial w_i};\quad i=1,\ldots,T_{EW};\quad n=1,\ldots,T_R;\tag{59}$$
$$\frac{\partial r_n}{\partial x_i}=\mathbf{a}^{(1)}(t_0)\cdot\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial x_i};\quad i=1,\ldots,T_I;\quad n=1,\ldots,T_R;\tag{60}$$
$$\frac{\partial r_n}{\partial\theta_i}=\int_{t_0}^{t_f}\mathbf{a}^{(1)}(t)\cdot\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\theta_i}\,dt;\quad i=1,\ldots,T_W;\quad n=1,\ldots,T_R.\tag{61}$$
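The sensitivity formula (61) can be exercised numerically on the Nordheim–Fuchs model itself. The following hedged sketch (with the same illustrative, hypothetical parameter values used earlier) solves the 1st-LASS backwards from $t_f$ for the energy response $r_2=E(t_f)$, i.e., with terminal condition $\mathbf{a}^{(1)}(t_f)=(0,1,0)^{\dagger}$, and then evaluates $\partial r_2/\partial\gamma$ via Equation (61), comparing against the derivative of the closed form (18).

```python
import numpy as np
from scipy.integrate import solve_ivp, trapezoid

# Illustrative (hypothetical) dimensionless parameter values.
alpha_T, l_p, c_p = 1e-5, 1e-4, 1.0
gamma, sigma_f, N_f = 1.0, 1.0, 1.0
phi0, T0 = 1.0, 300.0
tf = 2.0

K1 = np.sqrt(2.0 * phi0 * gamma * sigma_f * N_f * l_p * c_p / alpha_T)
K2 = np.sqrt(alpha_T * phi0 * gamma * sigma_f * N_f / (2.0 * l_p * c_p))
phi = lambda t: phi0 / np.cosh(t * K2) ** 2   # nominal h1(t), Eq. (20)
E = lambda t: K1 * np.tanh(t * K2)            # nominal h2(t), Eq. (18)

def adjoint_rhs(t, a):
    # da/dt = -[df/dh]^T a  (no source for t < tf), cf. Eq. (53)
    J = np.array([[-alpha_T * E(t) / (l_p * c_p), -alpha_T * phi(t) / (l_p * c_p), 0.0],
                  [gamma * sigma_f * N_f, 0.0, 0.0],
                  [gamma * sigma_f * N_f / c_p, 0.0, 0.0]])
    return -J.T @ a

# One backwards solve with a(tf) = (0, 1, 0) for the response r2 = E(tf).
sol = solve_ivp(adjoint_rhs, (tf, 0.0), [0.0, 1.0, 0.0],
                dense_output=True, rtol=1e-10, atol=1e-12)

# Eq. (61): dr2/dgamma = integral of a(t) . df/dgamma over (0, tf),
# with df/dgamma = (0, sigma_f*N_f*phi(t), sigma_f*N_f*phi(t)/c_p)^T.
t_grid = np.linspace(0.0, tf, 2001)
a_grid = sol.sol(t_grid)
integrand = (a_grid[1] + a_grid[2] / c_p) * sigma_f * N_f * phi(t_grid)
dr2_dgamma = trapezoid(integrand, t_grid)

# Analytic check from E(tf) = K1(gamma)*tanh(tf*K2(gamma)), K1, K2 ~ sqrt(gamma):
th = np.tanh(tf * K2)
dr2_exact = (K1 / (2 * gamma)) * th + K1 * tf * (K2 / (2 * gamma)) / np.cosh(tf * K2) ** 2
```

A single backwards adjoint solve suffices here for all six parameter sensitivities of $r_2$; only the trivial quadrature in Equation (61) changes from parameter to parameter.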

5. Illustrative Application of the 1st-CASAM-NODE Methodology to Compute First-Order Sensitivities of Nordheim–Fuchs Model Responses with Respect to the Underlying Parameters

The application of the 1st-CASAM-NODE methodology to compute the first-order sensitivities of the responses r 1 h , r 2 h , r 3 h , and r 4 h with respect to the Nordheim–Fuchs model’s parameters and initial conditions will be presented below in Section 5.1, Section 5.2, Section 5.3 and Section 5.4, respectively. Using the “energy-released” response r 2 h = E t f as a paradigm, Section 5.5 will illustrate an alternative path for computing first-order sensitivities by applying the “First-Order Feature Adjoint Sensitivity Analysis Methodology for Nonlinear Systems” (1st-FASAM-N) [24], which is the most efficient procedure for computing sensitivities, but which may require the construction of a dedicated neural net for this purpose.

5.1. First-Order Sensitivities of the Flux Response $r_1(\mathbf{h})=\varphi(t_f)$

The first-order sensitivity of the response $r_1(\mathbf{h})=\varphi(t_f)$ is provided by the first-order G-differential of the expression in Equation (31), which is, by definition, obtained as follows:
$$\delta r_1(\mathbf{h};\delta\mathbf{h})=\left\{\frac{d}{d\varepsilon}\int_0^{t_f}\left[\varphi^{0}(t)+\varepsilon\,\delta\varphi(t)\right]\delta(t-t_f)\,dt\right\}_{\varepsilon=0}=\int_0^{t_f}\delta\varphi(t)\,\delta(t-t_f)\,dt.\tag{62}$$
The variation $\delta\varphi(t)$ is the solution of the "1st-Level Variational Sensitivity System" (1st-LVSS), which is obtained by G-differentiating Equations (13)–(15), as follows:
$$\left\{\frac{d}{d\varepsilon}\frac{d}{dt}\left[\varphi^{0}+\varepsilon\,\delta\varphi\right]\right\}_{\varepsilon=0}=\left\{\frac{d}{d\varepsilon}\left[-\frac{\alpha_T^{0}+\varepsilon\,\delta\alpha_T}{\left(l_p^{0}+\varepsilon\,\delta l_p\right)\left(c_p^{0}+\varepsilon\,\delta c_p\right)}\left(E^{0}+\varepsilon\,\delta E\right)\left(\varphi^{0}+\varepsilon\,\delta\varphi\right)\right]\right\}_{\varepsilon=0},\tag{63}$$
$$\left\{\frac{d}{d\varepsilon}\frac{d}{dt}\left[E^{0}+\varepsilon\,\delta E\right]\right\}_{\varepsilon=0}=\left\{\frac{d}{d\varepsilon}\left[\left(\gamma^{0}+\varepsilon\,\delta\gamma\right)\left(\sigma_f^{0}+\varepsilon\,\delta\sigma_f\right)\left(N_f^{0}+\varepsilon\,\delta N_f\right)\left(\varphi^{0}+\varepsilon\,\delta\varphi\right)\right]\right\}_{\varepsilon=0},\tag{64}$$
$$\left\{\frac{d}{d\varepsilon}\frac{d}{dt}\left[T^{0}+\varepsilon\,\delta T\right]\right\}_{\varepsilon=0}=\left\{\frac{d}{d\varepsilon}\left[\frac{\left(\gamma^{0}+\varepsilon\,\delta\gamma\right)\left(\sigma_f^{0}+\varepsilon\,\delta\sigma_f\right)\left(N_f^{0}+\varepsilon\,\delta N_f\right)}{c_p^{0}+\varepsilon\,\delta c_p}\left(\varphi^{0}+\varepsilon\,\delta\varphi\right)\right]\right\}_{\varepsilon=0},\tag{65}$$
$$\left\{\frac{d}{d\varepsilon}\left[\varphi^{0}(t)+\varepsilon\,\delta\varphi(t)\right]_{t=0}\right\}_{\varepsilon=0}=\left\{\frac{d}{d\varepsilon}\left[\varphi_0^{0}+\varepsilon\,\delta\varphi_0\right]\right\}_{\varepsilon=0},\tag{66}$$
$$\left\{\frac{d}{d\varepsilon}\left[E^{0}(t)+\varepsilon\,\delta E(t)\right]_{t=0}\right\}_{\varepsilon=0}=0,\tag{67}$$
$$\left\{\frac{d}{d\varepsilon}\left[T^{0}(t)+\varepsilon\,\delta T(t)\right]_{t=0}\right\}_{\varepsilon=0}=\left\{\frac{d}{d\varepsilon}\left[T_0^{0}+\varepsilon\,\delta T_0\right]\right\}_{\varepsilon=0}.\tag{68}$$
Performing the operations involving the scalar ε in Equations (63)–(68) yields the following expression for the 1st-LVSS:
$$\frac{d\left[\delta\varphi(t)\right]}{dt}+\frac{\alpha_T^{0}E^{0}(t)}{l_p^{0}c_p^{0}}\,\delta\varphi(t)+\frac{\alpha_T^{0}\varphi^{0}(t)}{l_p^{0}c_p^{0}}\,\delta E(t)=\left[-\frac{\delta\alpha_T}{l_p^{0}c_p^{0}}+\frac{\alpha_T^{0}\,\delta l_p}{\left(l_p^{0}\right)^{2}c_p^{0}}+\frac{\alpha_T^{0}\,\delta c_p}{l_p^{0}\left(c_p^{0}\right)^{2}}\right]E^{0}(t)\,\varphi^{0}(t),$$
$$\frac{d\left[\delta E(t)\right]}{dt}-\gamma^{0}\sigma_f^{0}N_f^{0}\,\delta\varphi(t)=\left[\sigma_f^{0}N_f^{0}\,\delta\gamma+\gamma^{0}N_f^{0}\,\delta\sigma_f+\gamma^{0}\sigma_f^{0}\,\delta N_f\right]\varphi^{0}(t),$$
$$\frac{d\left[\delta T(t)\right]}{dt}-\frac{\gamma^{0}\sigma_f^{0}N_f^{0}}{c_p^{0}}\,\delta\varphi(t)=\left[\sigma_f^{0}N_f^{0}\,\delta\gamma+\gamma^{0}N_f^{0}\,\delta\sigma_f+\gamma^{0}\sigma_f^{0}\,\delta N_f-\frac{\gamma^{0}\sigma_f^{0}N_f^{0}}{c_p^{0}}\,\delta c_p\right]\frac{\varphi^{0}(t)}{c_p^{0}},$$
$$\delta\varphi(t)\big|_{t=0}=\delta\varphi_0,\qquad\delta E(t)\big|_{t=0}=0,\qquad\delta T(t)\big|_{t=0}=\delta T_0.$$
The 1st-LVSS comprising Equations (69)–(74) represents the specific form taken on by the general NODE representation of the 1st-LVSS provided by Equations (44) and (45) for the Nordheim–Fuchs model. Comparing Equations (69)–(74) to Equations (44) and (45) indicates the following correspondences:
$$\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\equiv\begin{pmatrix}-\alpha_T^{0}E^{0}(t)/\left(l_p^{0}c_p^{0}\right)&-\alpha_T^{0}\varphi^{0}(t)/\left(l_p^{0}c_p^{0}\right)&0\\\gamma^{0}\sigma_f^{0}N_f^{0}&0&0\\\gamma^{0}\sigma_f^{0}N_f^{0}/c_p^{0}&0&0\end{pmatrix};\qquad\mathbf{v}^{(1)}(t)\equiv\begin{pmatrix}\delta\varphi(t)\\\delta E(t)\\\delta T(t)\end{pmatrix};$$
$$\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\equiv\begin{pmatrix}\partial f_1/\partial\theta_1&\partial f_1/\partial\theta_2&\partial f_1/\partial\theta_3&0&0&0\\0&0&0&\partial f_2/\partial\theta_4&\partial f_2/\partial\theta_5&\partial f_2/\partial\theta_6\\0&0&\partial f_3/\partial\theta_3&\partial f_3/\partial\theta_4&\partial f_3/\partial\theta_5&\partial f_3/\partial\theta_6\end{pmatrix};\qquad\delta\boldsymbol{\theta}\equiv\begin{pmatrix}\delta\alpha_T\\\delta l_p\\\delta c_p\\\delta\gamma\\\delta\sigma_f\\\delta N_f\end{pmatrix};$$
$$\frac{\partial f_1}{\partial\theta_1}\equiv-\frac{E^{0}(t)\,\varphi^{0}(t)}{l_p^{0}c_p^{0}};\quad\frac{\partial f_1}{\partial\theta_2}\equiv\frac{\alpha_T^{0}E^{0}(t)\,\varphi^{0}(t)}{\left(l_p^{0}\right)^{2}c_p^{0}};\quad\frac{\partial f_1}{\partial\theta_3}\equiv\frac{\alpha_T^{0}E^{0}(t)\,\varphi^{0}(t)}{l_p^{0}\left(c_p^{0}\right)^{2}};\quad\frac{\partial f_2}{\partial\theta_4}\equiv\sigma_f^{0}N_f^{0}\,\varphi^{0}(t);\quad\frac{\partial f_2}{\partial\theta_5}\equiv\gamma^{0}N_f^{0}\,\varphi^{0}(t);\quad\frac{\partial f_2}{\partial\theta_6}\equiv\gamma^{0}\sigma_f^{0}\,\varphi^{0}(t);$$
$$\frac{\partial f_3}{\partial\theta_3}\equiv-\frac{\gamma^{0}\sigma_f^{0}N_f^{0}\,\varphi^{0}(t)}{\left(c_p^{0}\right)^{2}};\quad\frac{\partial f_3}{\partial\theta_4}\equiv\frac{\sigma_f^{0}N_f^{0}\,\varphi^{0}(t)}{c_p^{0}};\quad\frac{\partial f_3}{\partial\theta_5}\equiv\frac{\gamma^{0}N_f^{0}\,\varphi^{0}(t)}{c_p^{0}};\quad\frac{\partial f_3}{\partial\theta_6}\equiv\frac{\gamma^{0}\sigma_f^{0}\,\varphi^{0}(t)}{c_p^{0}}.$$
$$\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial\mathbf{x}}\equiv\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix};\qquad\delta\mathbf{x}\equiv\begin{pmatrix}\delta\varphi_0\\0\\\delta T_0\end{pmatrix};\qquad\frac{\partial\mathbf{h}^{e}(\mathbf{x},\mathbf{w})}{\partial\mathbf{w}}\equiv\mathbf{0}.$$
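The Jacobian correspondence above can be spot-checked numerically. The sketch below uses hypothetical dimensionless parameter values and a hypothetical state point (not reactor data) and compares the analytic $\partial\mathbf{f}/\partial\mathbf{h}$ of the Nordheim–Fuchs right side against central finite differences:

```python
# Hypothetical (dimensionless) parameter values, chosen only for illustration:
aT, lp, cp, gam, sf, Nf = 0.5, 1.0, 2.0, 0.3, 0.4, 0.5
k = gam * sf * Nf                       # shorthand for gamma*sigma_f*N_f

def f(h):
    """Right side of the Nordheim-Fuchs model, h = (phi, E, T)."""
    phi, E, T = h
    return [-(aT / (lp * cp)) * E * phi, k * phi, (k / cp) * phi]

def jac(h):
    """Analytic df/dh appearing in the 1st-LVSS."""
    phi, E, T = h
    c1 = aT / (lp * cp)
    return [[-c1 * E, -c1 * phi, 0.0],
            [k, 0.0, 0.0],
            [k / cp, 0.0, 0.0]]

h0 = [1.0, 0.4, 3.0]                    # hypothetical state point (phi, E, T)
J = jac(h0)
eps = 1e-6
for j in range(3):                      # central-difference columns of df/dh
    hp, hm = h0[:], h0[:]
    hp[j] += eps
    hm[j] -= eps
    col = [(f(hp)[i] - f(hm)[i]) / (2 * eps) for i in range(3)]
    for i in range(3):
        assert abs(col[i] - J[i][j]) < 1e-8
print("analytic Jacobian matches finite differences")
```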
It is evident that the 1st-LVSS would need to be solved repeatedly in order to compute the 1st-level variational function $\mathbf{v}^{(1)}(t)\equiv\left[\delta\varphi(t),\delta E(t),\delta T(t)\right]$ for every possible variation $\delta\boldsymbol{\theta}$ in the model parameters and every possible variation $\delta\mathbf{x}$ in the initial conditions (“encoder”). This computationally expensive path can be avoided by applying the concepts of the 1st-CASAM-NODE previously outlined in Section 4, as follows:
  • Consider that the 1st-level variational function $\mathbf{v}^{(1)}(t)\equiv\left[\delta\varphi(t),\delta E(t),\delta T(t)\right]\in H_1(\Omega_t)$ is an element in a Hilbert space denoted as $H_1(\Omega_t)$, $\Omega_t\equiv(0,t_f)$, comprising elements of the form $\mathbf{u}^{(a)}(t)\equiv\left[u_1^{(a)}(t),u_2^{(a)}(t),u_3^{(a)}(t)\right]$ and $\mathbf{u}^{(b)}(t)\equiv\left[u_1^{(b)}(t),u_2^{(b)}(t),u_3^{(b)}(t)\right]$, and being endowed with the inner product $\left\langle\mathbf{u}^{(a)},\mathbf{u}^{(b)}\right\rangle_1$ introduced in Equation (50), which takes on the following particular form for the Nordheim–Fuchs model:
$$\left\langle\mathbf{u}^{(a)},\mathbf{u}^{(b)}\right\rangle_1\equiv\int_0^{t_f}\mathbf{u}^{(a)}(t)\cdot\mathbf{u}^{(b)}(t)\,dt=\sum_{i=1}^{3}\int_0^{t_f}u_i^{(a)}(t)\,u_i^{(b)}(t)\,dt.$$
  • Use Equation (79) to form the inner product of Equations (69)–(71) with a yet undefined function $\mathbf{a}^{(1)}(t)\equiv\left[a_1^{(1)}(t),a_2^{(1)}(t),a_3^{(1)}(t)\right]\in H_1(\Omega_t)$, to obtain the following relation, which is the particular form taken on by Equation (51) for the Nordheim–Fuchs model:
$$\int_0^{t_f}a_1^{(1)}(t)\left\{\frac{d\left[\delta\varphi(t)\right]}{dt}+\frac{\alpha_T^{0}E^{0}(t)}{l_p^{0}c_p^{0}}\,\delta\varphi(t)+\frac{\alpha_T^{0}\varphi^{0}(t)}{l_p^{0}c_p^{0}}\,\delta E(t)\right\}dt+\int_0^{t_f}a_2^{(1)}(t)\left\{\frac{d\left[\delta E(t)\right]}{dt}-\gamma^{0}\sigma_f^{0}N_f^{0}\,\delta\varphi(t)\right\}dt+\int_0^{t_f}a_3^{(1)}(t)\left\{\frac{d\left[\delta T(t)\right]}{dt}-\frac{\gamma^{0}\sigma_f^{0}N_f^{0}}{c_p^{0}}\,\delta\varphi(t)\right\}dt$$
$$=\int_0^{t_f}a_1^{(1)}(t)\left[-\frac{\delta\alpha_T}{l_p^{0}c_p^{0}}+\frac{\alpha_T^{0}\,\delta l_p}{\left(l_p^{0}\right)^{2}c_p^{0}}+\frac{\alpha_T^{0}\,\delta c_p}{l_p^{0}\left(c_p^{0}\right)^{2}}\right]E^{0}(t)\,\varphi^{0}(t)\,dt+\int_0^{t_f}a_2^{(1)}(t)\left[\sigma_f^{0}N_f^{0}\,\delta\gamma+\gamma^{0}N_f^{0}\,\delta\sigma_f+\gamma^{0}\sigma_f^{0}\,\delta N_f\right]\varphi^{0}(t)\,dt+\int_0^{t_f}a_3^{(1)}(t)\left[\sigma_f^{0}N_f^{0}\,\delta\gamma+\gamma^{0}N_f^{0}\,\delta\sigma_f+\gamma^{0}\sigma_f^{0}\,\delta N_f-\frac{\gamma^{0}\sigma_f^{0}N_f^{0}}{c_p^{0}}\,\delta c_p\right]\frac{\varphi^{0}(t)}{c_p^{0}}\,dt.$$
  • Integrating by parts the terms on the left side of Equation (80) yields the following relation:
$$\int_0^{t_f}a_1^{(1)}(t)\left\{\frac{d\left[\delta\varphi(t)\right]}{dt}+\frac{\alpha_T^{0}E^{0}(t)}{l_p^{0}c_p^{0}}\,\delta\varphi(t)+\frac{\alpha_T^{0}\varphi^{0}(t)}{l_p^{0}c_p^{0}}\,\delta E(t)\right\}dt+\int_0^{t_f}a_2^{(1)}(t)\left\{\frac{d\left[\delta E(t)\right]}{dt}-\gamma^{0}\sigma_f^{0}N_f^{0}\,\delta\varphi(t)\right\}dt+\int_0^{t_f}a_3^{(1)}(t)\left\{\frac{d\left[\delta T(t)\right]}{dt}-\frac{\gamma^{0}\sigma_f^{0}N_f^{0}}{c_p^{0}}\,\delta\varphi(t)\right\}dt$$
$$=a_1^{(1)}(t_f)\,\delta\varphi(t_f)-a_1^{(1)}(0)\,\delta\varphi(0)+a_2^{(1)}(t_f)\,\delta E(t_f)-a_2^{(1)}(0)\,\delta E(0)+a_3^{(1)}(t_f)\,\delta T(t_f)-a_3^{(1)}(0)\,\delta T(0)+\int_0^{t_f}\mathbf{v}^{(1)}(t)\cdot\left\{\mathbf{A}^{(1)}(\mathbf{h};\boldsymbol{\theta})\,\mathbf{a}^{(1)}(t)\right\}_{\left(\mathbf{h}^{0};\boldsymbol{\theta}^{0}\right)}dt,$$
where:
$$\mathbf{A}^{(1)}(\mathbf{h};\boldsymbol{\theta})\,\mathbf{a}^{(1)}(t)\equiv-\frac{d\mathbf{a}^{(1)}(t)}{dt}-\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\right]^{T}_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\mathbf{a}^{(1)}(t);$$
with
$$\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\right]^{T}_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\equiv\begin{pmatrix}-\alpha_T^{0}E^{0}(t)/\left(l_p^{0}c_p^{0}\right)&\gamma^{0}\sigma_f^{0}N_f^{0}&\gamma^{0}\sigma_f^{0}N_f^{0}/c_p^{0}\\-\alpha_T^{0}\varphi^{0}(t)/\left(l_p^{0}c_p^{0}\right)&0&0\\0&0&0\end{pmatrix}.$$
The relation obtained in Equation (81) is the particular form taken on by Equation (52) for the Nordheim–Fuchs model.
4.
The definition of the function $\mathbf{a}^{(1)}(t)$ is now completed by requiring that: (i) the integral term on the right side of Equation (81) represent the G-differential $\delta r_1(\mathbf{h};\delta\mathbf{h})$ defined in Equation (62); and (ii) the unknown values of the components of $\mathbf{v}^{(1)}(t_f)$ be eliminated from Equation (81). These requirements will be satisfied if the function $\mathbf{a}^{(1)}(t)\equiv\left[a_1^{(1)}(t),a_2^{(1)}(t),a_3^{(1)}(t)\right]\in H_1(\Omega_t)$ is the solution of the following “1st-Level Adjoint Sensitivity System” (1st-LASS):
$$\mathbf{A}^{(1)}(\mathbf{h};\boldsymbol{\theta})\,\mathbf{a}^{(1)}(t)\equiv-\frac{d\mathbf{a}^{(1)}(t)}{dt}-\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\right]^{T}_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\mathbf{a}^{(1)}(t)=\left[\delta(t-t_f),\,0,\,0\right];$$
$$\mathbf{a}^{(1)}(t_f)\equiv\left[a_1^{(1)}(t_f),\,a_2^{(1)}(t_f),\,a_3^{(1)}(t_f)\right]=\left[0,\,0,\,0\right].$$
It is important to note that if the vector-valued function f h ; θ is linear in h t (in which case the NODE would be linear), then the 1st-level adjoint sensitivity function a ( 1 ) t would not depend on h t , so the “forward solution path” would not need to be stored in order to compute a ( 1 ) t . Otherwise, however, the “forward solution path” h t would need to be stored in order to compute a ( 1 ) t .
5.
Using Equations (84), (85), (80), (62), (72), (73) and (74) in Equation (81) yields the following expression for the first G-differential $\delta r_1(\mathbf{h};\delta\mathbf{h})$ of the response under consideration:
$$\delta r_1(\mathbf{h};\delta\mathbf{h})=\delta\varphi(t_f)=\int_0^{t_f}a_1^{(1)}(t)\left[-\frac{\delta\alpha_T}{l_p^{0}c_p^{0}}+\frac{\alpha_T^{0}\,\delta l_p}{\left(l_p^{0}\right)^{2}c_p^{0}}+\frac{\alpha_T^{0}\,\delta c_p}{l_p^{0}\left(c_p^{0}\right)^{2}}\right]E^{0}(t)\,\varphi^{0}(t)\,dt+\int_0^{t_f}a_2^{(1)}(t)\left[\sigma_f^{0}N_f^{0}\,\delta\gamma+\gamma^{0}N_f^{0}\,\delta\sigma_f+\gamma^{0}\sigma_f^{0}\,\delta N_f\right]\varphi^{0}(t)\,dt+\int_0^{t_f}a_3^{(1)}(t)\left[\sigma_f^{0}N_f^{0}\,\delta\gamma+\gamma^{0}N_f^{0}\,\delta\sigma_f+\gamma^{0}\sigma_f^{0}\,\delta N_f-\frac{\gamma^{0}\sigma_f^{0}N_f^{0}}{c_p^{0}}\,\delta c_p\right]\frac{\varphi^{0}(t)}{c_p^{0}}\,dt+a_1^{(1)}(0)\,\delta\varphi_0+a_3^{(1)}(0)\,\delta T_0.$$
It follows from Equation (86) that the first-order sensitivities of the response φ t f with respect to the parameters and initial conditions underlying the Nordheim–Fuchs model have the following expressions, all of which are to be evaluated at the nominal values of the respective parameters and functions (but the superscript “zero” is omitted to simplify the notation):
$$\frac{\partial\varphi(t_f)}{\partial\alpha_T}=-\frac{1}{l_p c_p}\int_0^{t_f}a_1^{(1)}(t)\,E(t)\,\varphi(t)\,dt;$$
$$\frac{\partial\varphi(t_f)}{\partial l_p}=\frac{\alpha_T}{l_p^{2}c_p}\int_0^{t_f}a_1^{(1)}(t)\,E(t)\,\varphi(t)\,dt;$$
$$\frac{\partial\varphi(t_f)}{\partial c_p}=\frac{\alpha_T}{l_p c_p^{2}}\int_0^{t_f}a_1^{(1)}(t)\,E(t)\,\varphi(t)\,dt-\frac{\gamma\sigma_f N_f}{c_p^{2}}\int_0^{t_f}a_3^{(1)}(t)\,\varphi(t)\,dt;$$
$$\frac{\partial\varphi(t_f)}{\partial\gamma}=\sigma_f N_f\int_0^{t_f}\left[a_2^{(1)}(t)+\frac{1}{c_p}a_3^{(1)}(t)\right]\varphi(t)\,dt;$$
$$\frac{\partial\varphi(t_f)}{\partial\sigma_f}=\gamma N_f\int_0^{t_f}\left[a_2^{(1)}(t)+\frac{1}{c_p}a_3^{(1)}(t)\right]\varphi(t)\,dt;$$
$$\frac{\partial\varphi(t_f)}{\partial N_f}=\gamma\sigma_f\int_0^{t_f}\left[a_2^{(1)}(t)+\frac{1}{c_p}a_3^{(1)}(t)\right]\varphi(t)\,dt;$$
$$\frac{\partial\varphi(t_f)}{\partial\varphi_0}=a_1^{(1)}(0);\qquad\frac{\partial\varphi(t_f)}{\partial E_0}=0;\qquad\frac{\partial\varphi(t_f)}{\partial T_0}=a_3^{(1)}(0).$$
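The complete workflow of this subsection can be sketched numerically as follows. The example uses hypothetical dimensionless parameter values (not reactor data), a fixed-step RK4 integrator in place of a NODE solver, and the standard device of absorbing the delta-function source of the 1st-LASS into the final-time condition $\mathbf{a}^{(1)}(t_f)=(1,0,0)$; the resulting $\gamma$-sensitivity of $\varphi(t_f)$, obtained from a quadrature of the Equation (90) type, is verified against a central finite difference:

```python
# Hypothetical (dimensionless) parameter values, chosen only for illustration:
aT, lp, cp, gam, sf, Nf = 0.5, 1.0, 2.0, 0.3, 0.4, 0.5
phi0, T0, tf, n = 1.0, 0.0, 2.0, 2000
c1 = aT / (lp * cp)
dt = tf / n

def forward(k):
    """RK4 solution of the Nordheim-Fuchs model with k = gamma*sigma_f*N_f."""
    h = [phi0, 0.0, T0]
    path = [h[:]]
    for _ in range(n):
        def rhs(s):
            return [-c1 * s[1] * s[0], k * s[0], (k / cp) * s[0]]
        k1 = rhs(h)
        k2 = rhs([h[i] + dt / 2 * k1[i] for i in range(3)])
        k3 = rhs([h[i] + dt / 2 * k2[i] for i in range(3)])
        k4 = rhs([h[i] + dt * k3[i] for i in range(3)])
        h = [h[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(3)]
        path.append(h[:])
    return path

k0 = gam * sf * Nf
path = forward(k0)                      # stored "forward solution path"

# 1st-LASS solved backward, in reversed time s = tf - t: da/ds = [df/dh]^T a,
# with the delta source absorbed into the final condition a(tf) = (1, 0, 0).
a = [1.0, 0.0, 0.0]
a_rev = [a[:]]
for step in range(n):
    idx = n - step                      # forward-path index at the current time
    def arhs(a_, w):                    # [df/dh]^T a, with h(t) interpolated
        phi = (1 - w) * path[idx][0] + w * path[idx - 1][0]
        E = (1 - w) * path[idx][1] + w * path[idx - 1][1]
        return [-c1 * E * a_[0] + k0 * a_[1] + (k0 / cp) * a_[2],
                -c1 * phi * a_[0],
                0.0]
    k1 = arhs(a, 0.0)
    k2 = arhs([a[i] + dt / 2 * k1[i] for i in range(3)], 0.5)
    k3 = arhs([a[i] + dt / 2 * k2[i] for i in range(3)], 0.5)
    k4 = arhs([a[i] + dt * k3[i] for i in range(3)], 1.0)
    a = [a[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(3)]
    a_rev.append(a[:])
a_path = a_rev[::-1]                    # a(t) on the forward time grid

# Quadrature of the Eq.-(90) type: dphi(tf)/dgamma = sf*Nf*int (a2 + a3/cp) phi dt
g = [(a_path[i][1] + a_path[i][2] / cp) * path[i][0] for i in range(n + 1)]
sens_gamma = sf * Nf * dt * (sum(g) - 0.5 * (g[0] + g[-1]))

# Independent verification by central differences in gamma:
eps = 1e-6
fd = (forward((gam + eps) * sf * Nf)[-1][0]
      - forward((gam - eps) * sf * Nf)[-1][0]) / (2 * eps)
print(sens_gamma, fd)
```

The sensitivity is negative, as expected physically: a larger $\gamma$ releases energy faster, which increases the temperature feedback and depresses the flux at $t_f$.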

5.2. First-Order Sensitivities of the Energy Released Response $r_2(\mathbf{h})=E(t_f)$

The first-order G-differential of the response $r_2(\mathbf{h})=E(t_f)$ defined in Equation (32) is obtained as follows:
$$\delta r_2(\mathbf{h};\delta\mathbf{h})=\left\{\frac{d}{d\varepsilon}\int_0^{t_f}\left[E^{0}(t)+\varepsilon\,\delta E(t)\right]\delta(t-t_f)\,dt\right\}_{\varepsilon=0}=\int_0^{t_f}\delta E(t)\,\delta(t-t_f)\,dt,$$
where the variation δ E t is the solution of the First-Level Variational Sensitivity System (1st-LVSS) defined by Equations (69)–(74).
The sensitivities of the response $r_2(\mathbf{h})=E(t_f)$ are determined by following the same procedure as outlined in Section 5.1, using an adjoint function denoted as $\boldsymbol{\chi}^{(1)}(t)\equiv\left[\chi_1^{(1)}(t),\chi_2^{(1)}(t),\chi_3^{(1)}(t)\right]\in H_1(\Omega_t)$. Following the same steps as in Section 5.1 (which are omitted here to avoid undue repetition) leads to the following 1st-LASS for the 1st-level adjoint sensitivity function $\boldsymbol{\chi}^{(1)}(t)$:
$$\mathbf{A}^{(1)}(\mathbf{h};\boldsymbol{\theta})\,\boldsymbol{\chi}^{(1)}(t)\equiv-\frac{d\boldsymbol{\chi}^{(1)}(t)}{dt}-\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\right]^{T}_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\boldsymbol{\chi}^{(1)}(t)=\left[0,\,\delta(t-t_f),\,0\right];$$
$$\boldsymbol{\chi}^{(1)}(t_f)\equiv\left[\chi_1^{(1)}(t_f),\,\chi_2^{(1)}(t_f),\,\chi_3^{(1)}(t_f)\right]=\left[0,\,0,\,0\right].$$
The sensitivities of $E(t_f)$ with respect to the model parameters and initial conditions have the same formal expressions as shown in Equations (87)–(93), but with the components of the 1st-level adjoint sensitivity function $\boldsymbol{\chi}^{(1)}(t)$ replacing the components of $\mathbf{a}^{(1)}(t)\equiv\left[a_1^{(1)}(t),a_2^{(1)}(t),a_3^{(1)}(t)\right]$.

5.3. First-Order Sensitivities of the Temperature Response $r_3(\mathbf{h})=T(t_f)$

The first-order G-differential of the response $r_3(\mathbf{h})=T(t_f)$ defined in Equation (33) is obtained as follows:
$$\delta r_3(\mathbf{h};\delta\mathbf{h})=\left\{\frac{d}{d\varepsilon}\int_0^{t_f}\left[T^{0}(t)+\varepsilon\,\delta T(t)\right]\delta(t-t_f)\,dt\right\}_{\varepsilon=0}=\int_0^{t_f}\delta T(t)\,\delta(t-t_f)\,dt,$$
where the variation δ T t is the solution of the First-Level Variational Sensitivity System (1st-LVSS) defined by Equations (69)–(74).
The sensitivities of the response $r_3(\mathbf{h})=T(t_f)$ are determined by following the same procedure as outlined in Section 5.1, using an adjoint function denoted as $\boldsymbol{\xi}^{(1)}(t)\equiv\left[\xi_1^{(1)}(t),\xi_2^{(1)}(t),\xi_3^{(1)}(t)\right]\in H_1(\Omega_t)$. Following the same steps as in Section 5.1 (which are omitted here to avoid undue repetition) leads to the following 1st-LASS for the 1st-level adjoint sensitivity function $\boldsymbol{\xi}^{(1)}(t)$:
$$\mathbf{A}^{(1)}(\mathbf{h};\boldsymbol{\theta})\,\boldsymbol{\xi}^{(1)}(t)\equiv-\frac{d\boldsymbol{\xi}^{(1)}(t)}{dt}-\left[\frac{\partial\mathbf{f}(\mathbf{h};\boldsymbol{\theta})}{\partial\mathbf{h}}\right]^{T}_{\left(\mathbf{h}^{0},\boldsymbol{\theta}^{0}\right)}\boldsymbol{\xi}^{(1)}(t)=\left[0,\,0,\,\delta(t-t_f)\right];$$
$$\boldsymbol{\xi}^{(1)}(t_f)\equiv\left[\xi_1^{(1)}(t_f),\,\xi_2^{(1)}(t_f),\,\xi_3^{(1)}(t_f)\right]=\left[0,\,0,\,0\right].$$
The sensitivities of $T(t_f)$ with respect to the model parameters and initial conditions have the same formal expressions as shown in Equations (87)–(93), but with the components of the 1st-level adjoint sensitivity function $\boldsymbol{\xi}^{(1)}(t)\equiv\left[\xi_1^{(1)}(t),\xi_2^{(1)}(t),\xi_3^{(1)}(t)\right]$ replacing the components of $\mathbf{a}^{(1)}(t)\equiv\left[a_1^{(1)}(t),a_2^{(1)}(t),a_3^{(1)}(t)\right]$.

5.4. First-Order Sensitivities of the Thermal Conductivity Response $r_4(\mathbf{h};\boldsymbol{\varphi})=k(T_f;\boldsymbol{\varphi})$

The first-order G-differential of the response $r_4(\mathbf{h};\boldsymbol{\varphi})=k(T_f;\boldsymbol{\varphi})$ defined in Equation (34) is obtained as follows:
$$\delta r_4(\mathbf{h};\boldsymbol{\varphi};\delta\mathbf{h};\delta\boldsymbol{\varphi})=\delta k(T;\boldsymbol{\varphi};\delta T;\delta\boldsymbol{\varphi})=\left\{\frac{d}{d\varepsilon}\int_0^{t_f}\left(\varphi_1^{0}+\varepsilon\,\delta\varphi_1\right)\delta(t-t_f)\,dt\right\}_{\varepsilon=0}+\left\{\frac{d}{d\varepsilon}\int_0^{t_f}\left[\left(\varphi_2^{0}+\varepsilon\,\delta\varphi_2\right)\left(T^{0}+\varepsilon\,\delta T\right)+\left(\varphi_3^{0}+\varepsilon\,\delta\varphi_3\right)\left(T^{0}+\varepsilon\,\delta T\right)^{2}\right]\delta(t-t_f)\,dt\right\}_{\varepsilon=0}=\left\{\delta k(T;\boldsymbol{\varphi};\delta\boldsymbol{\varphi})\right\}_{dir}+\left\{\delta k(T;\boldsymbol{\varphi};\delta T)\right\}_{ind},$$
where the direct-effect and the indirect-effect terms, respectively, are defined as follows:
$$\left\{\delta k(T;\boldsymbol{\varphi};\delta\boldsymbol{\varphi})\right\}_{dir}\equiv\delta\varphi_1+\delta\varphi_2\int_0^{t_f}T^{0}(t)\,\delta(t-t_f)\,dt+\delta\varphi_3\int_0^{t_f}\left[T^{0}(t)\right]^{2}\delta(t-t_f)\,dt;$$
$$\left\{\delta k(T;\boldsymbol{\varphi};\delta T)\right\}_{ind}\equiv\int_0^{t_f}\left[\varphi_2^{0}+2\varphi_3^{0}\,T^{0}(t)\right]\delta T(t)\,\delta(t-t_f)\,dt.$$
The direct-effect term yields the following sensitivities, which can be evaluated immediately:
$$\frac{\partial k(T_f)}{\partial\varphi_1}=1;\qquad\frac{\partial k(T_f)}{\partial\varphi_2}=T^{0}(t_f);\qquad\frac{\partial k(T_f)}{\partial\varphi_3}=\left[T^{0}(t_f)\right]^{2}.$$
The indirect-effect term can be evaluated only after determining the variational function $\delta T(t)$, which is the solution of the 1st-LVSS defined by Equations (69)–(74). The need for solving (repeatedly) the 1st-LVSS can be circumvented by applying the principles of the 1st-CASAM-NODE, as previously outlined. Thus, following the same procedure as detailed in Section 5.1 leads to the following 1st-LASS for the 1st-level adjoint sensitivity function, denoted as $\boldsymbol{\psi}^{(1)}(t)\equiv\left[\psi_1^{(1)}(t),\psi_2^{(1)}(t),\psi_3^{(1)}(t)\right]\in H_1(\Omega_t)$, for computing the sensitivities stemming from the indirect-effect term $\left\{\delta k(T;\boldsymbol{\varphi};\delta T)\right\}_{ind}$:
$$\mathbf{A}^{(1)}(\mathbf{h};\boldsymbol{\theta})\,\boldsymbol{\psi}^{(1)}(t)=\left[0,\,0,\,\left(\varphi_2^{0}+2\varphi_3^{0}\,T^{0}(t)\right)\delta(t-t_f)\right];$$
$$\boldsymbol{\psi}^{(1)}(t_f)\equiv\left[\psi_1^{(1)}(t_f),\,\psi_2^{(1)}(t_f),\,\psi_3^{(1)}(t_f)\right]=\left[0,\,0,\,0\right].$$
The source term appears in the third component of the 1st-LASS because the indirect-effect term involves the variation $\delta T(t)$, which is the third component of $\mathbf{v}^{(1)}(t)$.
It is important to note that all of the following 1st-Level Adjoint Sensitivity Systems, enumerated in items (i) through (iv), below:
(i)
the 1st-LASS defined by Equations (84) and (85), which is solved to obtain the 1st-level adjoint sensitivity function needed for computing the sensitivities of the component $h_1(t)\equiv\varphi(t)$ of the state function $\mathbf{h}(t)$;
(ii)
the 1st-LASS defined by Equations (95) and (96), which is solved to obtain the 1st-level adjoint sensitivity function needed for computing the sensitivities of the component $h_2(t)\equiv E(t)$ of the state function $\mathbf{h}(t)$;
(iii)
the 1st-LASS defined by Equations (98) and (99), which is solved to obtain the 1st-level adjoint sensitivity function needed for computing the sensitivities of the component $h_3(t)\equiv T(t)$ of the state function $\mathbf{h}(t)$; and
(iv)
the 1st-LASS defined by Equations (104) and (105), which is solved to obtain the 1st-level adjoint sensitivity function needed for computing the sensitivities stemming from the indirect-effect term $\left\{\delta k(T;\boldsymbol{\varphi};\delta T)\right\}_{ind}$,
have the same structures/operators on their left sides, and the respective adjoint sensitivity functions all satisfy the same final-time conditions; only the source terms on the right sides of the respective 1st-LASS differ from each other. Consequently, the same numerical procedures and/or neural nets can be used for computing the respective 1st-level adjoint sensitivity functions.
Since the NODE is a first-order ODE, the corresponding 1st-LASS is solved “backwards” in time, starting at the final time-step t = t f , as indicated by the general 1st-CASAM-NODE methodology presented in Section 4. If the NODE is linear in the state function (dependent variable) h t , then the 1st-LASS will be independent of h t , so the “forward solution path” would not need to be stored in order to compute the 1st-level adjoint sensitivity functions. In contradistinction, if the NODE is nonlinear in the state function (dependent variable) h t , then the 1st-LASS will depend on h t , so the “forward solution path” would need to be stored in order to compute the respective 1st-level adjoint sensitivity functions.
Furthermore, the same formal expressions are obtained for the sensitivities of the responses considered. Thus, the respective 1st-level adjoint sensitivity functions differ from each other according to the response considered, but the quadrature schemes needed to evaluate the integrals defining the respective sensitivities are the same. Therefore, the same numerical procedures and/or neural nets can be used for computing the respective integrals that define the 1st-order sensitivities, while using the appropriate/corresponding 1st-level adjoint sensitivity functions. If the decoder response depends on parameters/weights, additional sensitivities arise from the respective nonvanishing “direct-effect term”.
If simple relations can be obtained among the responses of interest, such as Equations (11) and (16) for the illustrative paradigm example, then the sensitivities of the various responses can be obtained by using these relationships, but this is seldom the case in practice.

5.5. Most Efficient Computation of First-Order Sensitivities: Application of the 1st-FASAM-N

In most, if not all, practical situations, the equations modeling the physical system under consideration can be recast to suit the computation of the response under consideration and, consequently, the computation of the response sensitivities with respect to the underlying model parameters. For example, the response $r_2(\mathbf{h})=E(t_f)$ involves just the function $E(t)$; hence, this response would be ideally computed, together with its sensitivities to parameters, by using an equation containing as few dependent variables as possible other than the ones (e.g., $E(t)$) needed for computing the response. Such an equation was obtained in Equation (17), which contains just the dependent variable $E(t)$, so it would be more advantageous to use it for the sensitivity analysis of $r_2(\mathbf{h})=E(t_f)$ rather than use the entire system of equations underlying the Nordheim–Fuchs model, as was performed, for illustrative purposes, in Section 5.2. Furthermore, the form of Equation (17) indicates that the “features” (i.e., functions) of model parameters characterizing this balance equation can be chosen as follows:
$$F_1(\mathbf{p})\equiv\frac{\alpha_T}{2\,l_p\,c_p};\qquad F_2(\mathbf{p})\equiv\varphi_0\,\gamma\,\sigma_f\,N_f;\qquad\mathbf{F}(\mathbf{p})\equiv\left[F_1(\mathbf{p}),\,F_2(\mathbf{p})\right],$$
where the vector of primary model parameters is defined as follows:
$$\mathbf{p}\equiv\left[p_1,\dots,p_7\right]\equiv\left[\alpha_T,\,l_p,\,c_p,\,\gamma,\,\sigma_f,\,N_f,\,\varphi_0\right].$$
Note that the vector p includes the initial condition φ 0 .
In terms of the “feature function” $\mathbf{F}(\mathbf{p})\equiv\left[F_1(\mathbf{p}),F_2(\mathbf{p})\right]$, Equation (17) can alternatively be written as follows:
$$\frac{dE(t)}{dt}=-F_1(\mathbf{p})\,E^{2}(t)+F_2(\mathbf{p}),\qquad E(0)=0.$$
In terms of the feature function $\mathbf{F}(\mathbf{p})$, the solution of Equation (108) has the following form:
$$E(t)=\left[\frac{F_2(\mathbf{p})}{F_1(\mathbf{p})}\right]^{1/2}\tanh\left[t\,G(\mathbf{p})\right];\qquad G(\mathbf{p})\equiv\left[F_1(\mathbf{p})\,F_2(\mathbf{p})\right]^{1/2}.$$
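The closed-form solution can be checked against a direct numerical integration of the feature-space balance equation. The sketch below uses hypothetical feature values $F_1$, $F_2$ (not reactor data):

```python
import math

# Hypothetical feature values, chosen only for checking Eq. (109) against Eq. (108):
F1, F2 = 0.25, 0.06          # F1 = alpha_T/(2*lp*cp), F2 = phi0*gamma*sigma_f*N_f
G = math.sqrt(F1 * F2)

def E_exact(t):
    """Closed-form solution, Eq. (109)."""
    return math.sqrt(F2 / F1) * math.tanh(t * G)

def E_numeric(tf, n=4000):
    """RK4 solution of dE/dt = F2 - F1*E^2, E(0) = 0, i.e., Eq. (108)."""
    dt, E = tf / n, 0.0
    for _ in range(n):
        k1 = F2 - F1 * E**2
        k2 = F2 - F1 * (E + dt / 2 * k1)**2
        k3 = F2 - F1 * (E + dt / 2 * k2)**2
        k4 = F2 - F1 * (E + dt * k3)**2
        E += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return E

print(E_exact(2.0), E_numeric(2.0))   # the two values agree to RK4 accuracy
```

Note that $E(t)$ saturates, as $t\to\infty$, at the value $\sqrt{F_2/F_1}$, which the $\tanh$ form makes explicit.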
Of course, a specific NODE would need to be constructed to model Equation (108).
The form of Equation (108) is suitable for applying the “nth-Order Features Adjoint Sensitivity Analysis Methodology for Nonlinear Systems” (nth-FASAM-N) [24], which is the most efficient methodology for computing sensitivities, particularly for sensitivities of second and higher orders. This methodology considers the specific “features” of model parameters, such as the function F p F 1 p , F 2 p , to compute sensitivities with respect to model parameters more efficiently than by considering directly the respective primary parameters.
For the computation of first-order sensitivities, the 1st-FASAM-N commences by constructing the 1st-Level Variational Sensitivity System (1st-LVSS) for the variational function $\delta E(t)$ by applying the definition of the first-order G-differential to Equation (108), which yields:
$$\left\{\frac{d}{d\varepsilon}\left[\frac{d\left[E^{0}(t)+\varepsilon\,\delta E(t)\right]}{dt}+\left(F_1^{0}+\varepsilon\,\delta F_1\right)\left(E^{0}+\varepsilon\,\delta E\right)^{2}-\left(F_2^{0}+\varepsilon\,\delta F_2\right)\right]\right\}_{\varepsilon=0}=0,$$
$$\left\{\frac{d}{d\varepsilon}\left[E^{0}(t)+\varepsilon\,\delta E(t)\right]_{t=0}\right\}_{\varepsilon=0}=0.$$
Performing the operations indicated in Equations (110) and (111) yields the following expression for the 1st-LVSS satisfied by the variational function $\delta E(t)$:
$$\left[\frac{d}{dt}+2F_1E(t)\right]\delta E(t)=-\delta F_1\,E^{2}(t)+\delta F_2,\quad t>0,$$
$$\delta E(0)=0,\quad t=0.$$
The 1st-LVSS represented by Equation (112) is to be solved at the nominal values of the parameters and of the state function $E(t)$, but the superscript “0” (which indicates “nominal values”) has been omitted to simplify the notation.
Numerically, the 1st-LVSS would need to be solved anew for the various variations $\delta F_1$, $\delta F_2$ in the components of the feature function $\mathbf{F}(\mathbf{p})$. This need for repeatedly solving the 1st-LVSS can be avoided by constructing the corresponding 1st-Level Adjoint Sensitivity System (1st-LASS). The Hilbert space appropriate for the construction of the 1st-LASS corresponding to Equation (112) is endowed with the following particular form of Equation (79):
$$\left\langle u^{(a)}(t),\,u^{(b)}(t)\right\rangle_1\equiv\int_0^{t_f}u^{(a)}(t)\,u^{(b)}(t)\,dt.$$
Using Equation (114) to form the inner product of Equation (112) with a yet undefined function $\omega^{(1)}(t)$ yields the following relation:
$$\int_0^{t_f}\omega^{(1)}(t)\left[\frac{d}{dt}+2F_1E(t)\right]\delta E(t)\,dt=-\delta F_1\int_0^{t_f}\omega^{(1)}(t)\,E^{2}(t)\,dt+\delta F_2\int_0^{t_f}\omega^{(1)}(t)\,dt.$$
Integrating by parts the left side of Equation (115) yields the following relation:
$$\int_0^{t_f}\omega^{(1)}(t)\left[\frac{d}{dt}+2F_1E(t)\right]\delta E(t)\,dt=\omega^{(1)}(t_f)\,\delta E(t_f)-\omega^{(1)}(0)\,\delta E(0)+\int_0^{t_f}\delta E(t)\left[-\frac{d\omega^{(1)}(t)}{dt}+2F_1E(t)\,\omega^{(1)}(t)\right]dt.$$
Requiring that the integral on the right side of Equation (116) represent the G-differential $\delta E(t_f)$ of the response $E(t_f)$ obtained in Equation (32), and eliminating the unknown value $\delta E(t_f)$ from the right side of Equation (116) by setting $\omega^{(1)}(t_f)=0$, yields the following 1st-Level Adjoint Sensitivity System (1st-LASS) for the 1st-level adjoint sensitivity function $\omega^{(1)}(t)$:
$$\left[-\frac{d}{dt}+2F_1E(t)\right]\omega^{(1)}(t)=\delta(t-t_f),\quad t>0,$$
$$\omega^{(1)}(t_f)=0,\quad t=t_f.$$
The 1st-LASS represented by Equations (117) and (118) is independent of variations in the feature functions (and/or parameters) so it would need to be solved only once, numerically. In the present case, the 1st-LASS can be solved analytically to obtain the following closed-form expression for the 1st-level adjoint sensitivity function ω ( 1 ) t :
$$\omega^{(1)}(t)=H(t_f-t)\left\{\frac{\cosh\left[t\,G(\mathbf{p})\right]}{\cosh\left[t_f\,G(\mathbf{p})\right]}\right\}^{2},$$
where $H(t_f-t)$ denotes the Heaviside functional.
Using Equations (116)–(118) in Equation (115) yields the following expression for the first-order total G-differential $\delta E(t_f)$ of the response $E(t_f)$ in terms of the 1st-level adjoint function $\omega^{(1)}(t)$:
$$\delta E(t_f)=-\delta F_1\int_0^{t_f}\omega^{(1)}(t)\,E^{2}(t)\,dt+\delta F_2\int_0^{t_f}\omega^{(1)}(t)\,dt.$$
It follows from Equations (120), (119) and (109) that the two sensitivities of the response $E(t_f)$ with respect to the two components of the feature function $\mathbf{F}\equiv\left[F_1,F_2\right]$ have the following expressions:
$$\frac{\partial E(t_f)}{\partial F_1}=-\int_0^{t_f}\omega^{(1)}(t)\,E^{2}(t)\,dt=-\frac{F_2(\mathbf{p})}{2F_1(\mathbf{p})}\left\{\frac{\tanh\left[t_f\,G(\mathbf{p})\right]}{G(\mathbf{p})}-\frac{t_f}{\cosh^{2}\left[t_f\,G(\mathbf{p})\right]}\right\};$$
$$\frac{\partial E(t_f)}{\partial F_2}=\int_0^{t_f}\omega^{(1)}(t)\,dt=\frac{\tanh\left[t_f\,G(\mathbf{p})\right]}{2G(\mathbf{p})}+\frac{t_f}{2\cosh^{2}\left[t_f\,G(\mathbf{p})\right]}.$$
The above expressions are to be evaluated at the nominal parameter values, but the superscript “zero” has been omitted for simplicity. The expressions obtained in Equations (121) and (122) can be verified by differentiating the expression provided in Equation (109) with respect to $F_1$ and $F_2$, respectively, evaluated at a user-chosen time $t=t_f$ within the interval $0<t_f<\infty$.
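Such a verification can be carried out numerically. The sketch below evaluates the closed-form feature sensitivities of $E(t_f)$ (equivalent to Equations (121) and (122), with $G=\sqrt{F_1F_2}$) and compares them against central finite differences of the analytic solution of Equation (109); the feature values are hypothetical:

```python
import math

F1, F2, tf = 0.25, 0.06, 2.0   # hypothetical feature values and final time

def E(F1_, F2_):
    """Analytic solution of Eq. (109), evaluated at t = tf."""
    G = math.sqrt(F1_ * F2_)
    return math.sqrt(F2_ / F1_) * math.tanh(tf * G)

G = math.sqrt(F1 * F2)
# Closed-form first-order sensitivities with respect to the features:
dE_dF1 = -(F2 / (2 * F1)) * (math.tanh(tf * G) / G - tf / math.cosh(tf * G)**2)
dE_dF2 = math.tanh(tf * G) / (2 * G) + tf / (2 * math.cosh(tf * G)**2)

# Central-difference verification against the analytic solution:
eps = 1e-7
fd1 = (E(F1 + eps, F2) - E(F1 - eps, F2)) / (2 * eps)
fd2 = (E(F1, F2 + eps) - E(F1, F2 - eps)) / (2 * eps)
print(dE_dF1, fd1)
print(dE_dF2, fd2)
```

As expected on physical grounds, $\partial E(t_f)/\partial F_1<0$ (a stronger quadratic feedback reduces the energy released) while $\partial E(t_f)/\partial F_2>0$.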
The sensitivities of the response $E(t_f)$ with respect to the model parameters and initial condition are obtained by using the following “chain-rule” relationship:
$$\frac{\partial E(t_f;F_1;F_2)}{\partial p_i}=\frac{\partial E(t_f)}{\partial F_1}\frac{\partial F_1(\mathbf{p})}{\partial p_i}+\frac{\partial E(t_f)}{\partial F_2}\frac{\partial F_2(\mathbf{p})}{\partial p_i};\quad i=1,\dots,7.$$
The explicit expressions for the specific sensitivities of the response E t f with respect to the parameters underlying the feature functions are obtained using Equation (123) in conjunction with Equations (121) and (122) while recalling the definitions of the feature functions F 1 p and F 2 p defined in Equation (106). The detailed expressions of these sensitivities are as follows:
$$\frac{\partial E(t_f)}{\partial\alpha_T}=\frac{\partial E(t_f)}{\partial F_1}\frac{\partial F_1}{\partial\alpha_T}+\frac{\partial E(t_f)}{\partial F_2}\frac{\partial F_2}{\partial\alpha_T}=\frac{1}{2\,l_p\,c_p}\,\frac{\partial E(t_f)}{\partial F_1};$$
$$\frac{\partial E(t_f)}{\partial l_p}=\frac{\partial E(t_f)}{\partial F_1}\frac{\partial F_1}{\partial l_p}+\frac{\partial E(t_f)}{\partial F_2}\frac{\partial F_2}{\partial l_p}=-\frac{\alpha_T}{2\,l_p^{2}\,c_p}\,\frac{\partial E(t_f)}{\partial F_1};$$
$$\frac{\partial E(t_f)}{\partial c_p}=\frac{\partial E(t_f)}{\partial F_1}\frac{\partial F_1}{\partial c_p}+\frac{\partial E(t_f)}{\partial F_2}\frac{\partial F_2}{\partial c_p}=-\frac{\alpha_T}{2\,l_p\,c_p^{2}}\,\frac{\partial E(t_f)}{\partial F_1};$$
$$\frac{\partial E(t_f)}{\partial\gamma}=\frac{\partial E(t_f)}{\partial F_1}\frac{\partial F_1}{\partial\gamma}+\frac{\partial E(t_f)}{\partial F_2}\frac{\partial F_2}{\partial\gamma}=\varphi_0\,\sigma_f\,N_f\,\frac{\partial E(t_f)}{\partial F_2};$$
$$\frac{\partial E(t_f)}{\partial\sigma_f}=\frac{\partial E(t_f)}{\partial F_1}\frac{\partial F_1}{\partial\sigma_f}+\frac{\partial E(t_f)}{\partial F_2}\frac{\partial F_2}{\partial\sigma_f}=\varphi_0\,\gamma\,N_f\,\frac{\partial E(t_f)}{\partial F_2};$$
$$\frac{\partial E(t_f)}{\partial N_f}=\frac{\partial E(t_f)}{\partial F_1}\frac{\partial F_1}{\partial N_f}+\frac{\partial E(t_f)}{\partial F_2}\frac{\partial F_2}{\partial N_f}=\varphi_0\,\gamma\,\sigma_f\,\frac{\partial E(t_f)}{\partial F_2};$$
$$\frac{\partial E(t_f)}{\partial\varphi_0}=\frac{\partial E(t_f)}{\partial F_1}\frac{\partial F_1}{\partial\varphi_0}+\frac{\partial E(t_f)}{\partial F_2}\frac{\partial F_2}{\partial\varphi_0}=\gamma\,\sigma_f\,N_f\,\frac{\partial E(t_f)}{\partial F_2}.$$
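The chain-rule mapping from the two feature sensitivities to the seven primary-parameter sensitivities can be sketched as follows. The parameter values are hypothetical, and the feature-sensitivity expressions are the closed forms derived above; two of the resulting parameter sensitivities are spot-checked by central finite differences of the analytic solution:

```python
import math

# Hypothetical primary parameter values (not reactor data):
aT, lp, cp, gam, sf, Nf, phi0, tf = 0.5, 1.0, 2.0, 0.3, 0.4, 0.5, 1.0, 2.0

def E(aT_, lp_, cp_, gam_, sf_, Nf_, phi0_):
    """Analytic E(tf) expressed through the features F1(p) and F2(p)."""
    F1 = aT_ / (2 * lp_ * cp_)
    F2 = phi0_ * gam_ * sf_ * Nf_
    G = math.sqrt(F1 * F2)
    return math.sqrt(F2 / F1) * math.tanh(tf * G)

F1 = aT / (2 * lp * cp)
F2 = phi0 * gam * sf * Nf
G = math.sqrt(F1 * F2)
dE_dF1 = -(F2 / (2 * F1)) * (math.tanh(tf * G) / G - tf / math.cosh(tf * G)**2)
dE_dF2 = math.tanh(tf * G) / (2 * G) + tf / (2 * math.cosh(tf * G)**2)

# Chain rule of Eq. (123): dE/dp_i = (dE/dF1)(dF1/dp_i) + (dE/dF2)(dF2/dp_i)
dE_daT = dE_dF1 / (2 * lp * cp)
dE_dlp = -dE_dF1 * aT / (2 * lp**2 * cp)
dE_dgam = dE_dF2 * phi0 * sf * Nf
dE_dphi0 = dE_dF2 * gam * sf * Nf

# Spot-checks by central differences in the primary parameters:
eps = 1e-7
fd_aT = (E(aT + eps, lp, cp, gam, sf, Nf, phi0)
         - E(aT - eps, lp, cp, gam, sf, Nf, phi0)) / (2 * eps)
fd_gam = (E(aT, lp, cp, gam + eps, sf, Nf, phi0)
          - E(aT, lp, cp, gam - eps, sf, Nf, phi0)) / (2 * eps)
print(dE_daT, fd_aT)
print(dE_dgam, fd_gam)
```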
Notably, the application of the 1st-FASAM-N requires one “large-scale” computation to solve the 1st-LASS, cf. Equations (117) and (118), which is a single ODE, to obtain the 1st-level adjoint function ω ( 1 ) t , which is a scalar-valued function. However, solving the forward model, cf. Equation (17), and the corresponding 1st-LASS, comprising Equations (117) and (118), would require the construction of a separate (albeit simpler) NODE. The 1st-level adjoint function ω ( 1 ) t is subsequently used in performing two integrals (quadrature) for obtaining the two sensitivities of the response E t f with respect to the two components F 1 p and F 2 p of the feature function F p F 1 , F 2 . Subsequently, all of the response sensitivities with respect to the model’s primary parameters are obtained analytically by using the chain rule to differentiate the components of the feature function with respect to the underlying model parameters and initial conditions.
In contradistinction, if one wishes to compute directly the sensitivities of the response with respect to the model parameters and initial conditions, it has been shown in Section 5.1, Section 5.2, Section 5.3 and Section 5.4 that the original NODE can be used to solve (backward in time) the 1st-LASS, which comprises a system of three coupled ODEs (rather than a single ODE if the 1st-FASAM is used) for obtaining the 1st-level adjoint function, which is a vector-valued function comprising three components, cf. χ ( 1 ) t χ 1 ( 1 ) t , χ 2 ( 1 ) t , χ 3 ( 1 ) t for the response E t f . The respective vector-valued 1st-level adjoint function is subsequently used in computing six (rather than two, if the 1st-FASAM is used) integrals (quadrature) for obtaining the six sensitivities of the respective response with respect to the six model parameters.
Equations similar to Equation (17) can be derived for the reactor flux and reactor temperature responses, so the 1st-FASAM can be applied in a similar fashion to compute the first-order sensitivities of these responses. Using the sensitivities with respect to the reactor temperature response would readily provide the first-order sensitivities of the reactor thermal conductivity response. However, corresponding to each of these responses, a specific NODE would need to be constructed. Of course, any of these specific NODEs would have much simpler structures than the NODE for solving simultaneously the system of coupled ODEs presented in Section 5.1, Section 5.2, Section 5.3 and Section 5.4.

6. Use of First-Order Sensitivities for Uncertainty Analysis of NODE Responses

As has been discussed in Section 1, even if the NODE parameters are perfectly well matched (after completing the “training”) to the physical model, the model’s parameters are not known exactly but are affected by uncertainties stemming from the physical processes that were used to determine them in the first place. Consequently, the NODE parameters, namely: (i) the latent neurons with learnable scalar adjustable weights represented by the components of the vector $\boldsymbol{\theta}\equiv\left[\theta_1,\dots,\theta_{TW}\right]$; (ii) the encoder’s “inputs” $\mathbf{x}\equiv\left[x_1,\dots,x_{TI}\right]$; (iii) the encoder’s “learnable” scalar adjustable weights $\mathbf{w}\equiv\left[w_1,\dots,w_{TEW}\right]$; and (iv) the decoder’s learnable scalar adjustable weights represented by the components of the vector $\boldsymbol{\varphi}\equiv\left[\varphi_1,\dots,\varphi_{TD}\right]$, are all affected by uncertainties. It is convenient to consider that the vectors $\boldsymbol{\theta}$, $\mathbf{x}$, $\mathbf{w}$, and $\boldsymbol{\varphi}$ are the components of a (partitioned column) vector $\boldsymbol{\alpha}\equiv\left[\alpha_1,\dots,\alpha_{TP}\right]$, where $TP$ denotes “the total number of model parameters” and which is defined as follows:
$$\boldsymbol{\alpha}\equiv\left[\alpha_1,\dots,\alpha_{TP}\right]\equiv\left[\boldsymbol{\theta},\,\mathbf{x},\,\mathbf{w},\,\boldsymbol{\varphi}\right];\qquad TP\equiv TW+TI+TEW+TD;$$
$$\alpha_i\equiv\theta_i,\ i=1,\dots,TW;\quad\alpha_i\equiv x_i,\ i=TW+1,\dots,TW+TI;\quad\alpha_i\equiv w_i,\ i=TW+TI+1,\dots,TW+TI+TEW;\quad\alpha_i\equiv\varphi_i,\ i=TW+TI+TEW+1,\dots,TW+TI+TEW+TD.$$
Although the model parameters are not bona fide random quantities, these model parameters are considered in practice to be variates that obey a multivariate probability distribution function, which will be denoted as $p_\alpha(\boldsymbol{\alpha})$. The multivariate distribution $p_\alpha(\boldsymbol{\alpha})$ is seldom known exactly, particularly for large-scale systems involving many parameters. Nevertheless, the various moments of $p_\alpha(\boldsymbol{\alpha})$ can be defined in a standard manner by considering that $p_\alpha(\boldsymbol{\alpha})$ is formally defined on a domain $D_\alpha$. When the vectors of parameters $\boldsymbol{\theta}$, $\mathbf{x}$, $\mathbf{w}$, and $\boldsymbol{\varphi}$ are independent of each other, $p_\alpha(\boldsymbol{\alpha})$ is given by the product of the normalized probability distributions of the respective weights and inputs, namely: $p_\alpha(\boldsymbol{\alpha})=p_\theta(\boldsymbol{\theta})\,p_x(\mathbf{x})\,p_w(\mathbf{w})\,p_\varphi(\boldsymbol{\varphi})$. The moments of the probability distribution $p_\alpha(\boldsymbol{\alpha})$ of the model parameters are defined as follows:
  • The expected (or mean) value of a model parameter $\alpha_i$, denoted as $\alpha_i^{0}$, is defined as follows:
$$\alpha_i^{0}\equiv\left\langle\alpha_i\right\rangle_\alpha\equiv\int_{D_\alpha}\alpha_i\,p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha},\quad i=1,\dots,TP.$$
In particular, the definition provided in Equation (132) implies that the expected values of the components of the vectors $\boldsymbol{\theta}$, $\mathbf{x}$, $\mathbf{w}$, and $\boldsymbol{\varphi}$ are defined similarly, namely: $\theta_j^{0}\equiv\langle\theta_j\rangle_\theta\equiv\int_{D_\theta}\theta_j\,p_\theta(\boldsymbol{\theta})\,d\boldsymbol{\theta}$, $j=1,\dots,TW$, and so on. The expected parameter values $\alpha_i^{0}$, $i=1,\dots,TP$, are considered to be the “nominal values” for computing the nominal value $r_n^{0}(\mathbf{h}^{0};\boldsymbol{\varphi}^{0})$ of the response $r_n(\mathbf{h}(t_f);\boldsymbol{\varphi})$ predicted by the NODE decoder. The nominal or mean values are considered to be the components of the following vector of mean (expected) values:
$$\boldsymbol{\alpha}^{0}\equiv\left[\alpha_1^{0},\dots,\alpha_{TP}^{0}\right].$$
2.
The covariance, $\mathrm{cov}(\alpha_i,\alpha_j)$, of two parameters, $\alpha_i$ and $\alpha_j$, is defined as follows:
$$\mu_2^{ij}(\boldsymbol{\alpha})\equiv\mathrm{cov}(\alpha_i,\alpha_j)\equiv\left\langle\left(\alpha_i-\alpha_i^{0}\right)\left(\alpha_j-\alpha_j^{0}\right)\right\rangle_\alpha,\quad i,j=1,\dots,TP.$$
The variance, $\mathrm{var}(\alpha_i)$, of a parameter $\alpha_i$ is defined as follows:
$$\mathrm{var}(\alpha_i)\equiv\left\langle\left(\alpha_i-\alpha_i^{0}\right)^{2}\right\rangle_\alpha,\quad i=1,\dots,TP.$$
The standard deviation, $\sigma_i$, of $\alpha_i$ is defined as $\sigma_i\equiv\sqrt{\mathrm{var}(\alpha_i)}$. The correlation, $\rho_{ij}$, between two parameters, $\alpha_i$ and $\alpha_j$, is defined as follows:
$$\rho_{ij}\equiv\mathrm{cov}(\alpha_i,\alpha_j)/\left(\sigma_i\,\sigma_j\right);\quad i,j=1,\dots,TP.$$
3.
The third-order moment, $\mu_3^{ijk}$, of the distribution of parameters and the associated third-order correlation $t_{ijk}$ among three parameters are defined as follows, for $i,j,k=1,\dots,TP$:
$$\mu_3^{ijk}(\alpha_i,\alpha_j,\alpha_k)\equiv\left\langle\left(\alpha_i-\alpha_i^{0}\right)\left(\alpha_j-\alpha_j^{0}\right)\left(\alpha_k-\alpha_k^{0}\right)\right\rangle_\alpha\equiv t_{ijk}\,\sigma_i\,\sigma_j\,\sigma_k.$$
4.
The fourth-order moment, $\mu_4^{ijkl}$, of the distribution of parameters and the associated fourth-order correlation $q_{ijkl}$ among four parameters are defined as follows, for $i,j,k,l=1,\dots,TP$:
$$\mu_4^{ijkl}(\alpha_i,\alpha_j,\alpha_k,\alpha_l)\equiv\left\langle\left(\alpha_i-\alpha_i^{0}\right)\left(\alpha_j-\alpha_j^{0}\right)\left(\alpha_k-\alpha_k^{0}\right)\left(\alpha_l-\alpha_l^{0}\right)\right\rangle_\alpha\equiv q_{ijkl}\,\sigma_i\,\sigma_j\,\sigma_k\,\sigma_l.$$
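The second-order definitions above can be illustrated with a small sketch (hypothetical covariance values, not taken from any model): given a parameter covariance matrix, the standard deviations and correlations follow directly from Equations (135) and (136).

```python
import math

# Hypothetical 3-parameter covariance matrix (symmetric, positive definite):
cov = [[4.0, 0.6, 0.0],
       [0.6, 1.0, -0.2],
       [0.0, -0.2, 0.25]]

# Standard deviations: sigma_i = sqrt(var(alpha_i)) = sqrt(cov_ii)
sigma = [math.sqrt(cov[i][i]) for i in range(3)]

# Correlations: rho_ij = cov(alpha_i, alpha_j)/(sigma_i*sigma_j)
rho = [[cov[i][j] / (sigma[i] * sigma[j]) for j in range(3)] for i in range(3)]

print(sigma)        # [2.0, 1.0, 0.5]
print(rho[0][1])    # 0.6/(2.0*1.0) = 0.3
```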
The uncertainties induced in the decoder response by uncertainties in the NODE model’s parameters can be quantified by using the “propagation of errors” concepts introduced by Tukey [20] and generalized to sixth order (in the model correlations) by Cacuci [21]. The “propagation of errors” method uses the Taylor series of a system response, $r_n(\mathbf{h}(t_f);\boldsymbol{\varphi})$, $n=1,\dots,TR$, around the expected (or nominal) parameter values $\boldsymbol{\alpha}^{0}$. When only first-order sensitivities are available, this Taylor series has the following formal expression:
$$r_n\left(\mathbf{h}(t_f);\boldsymbol{\varphi}\right)\equiv r_n(\boldsymbol{\alpha})=r_n(\boldsymbol{\alpha}^{0})+\sum_{i=1}^{TP}\frac{\partial r_n(\boldsymbol{\alpha}^{0})}{\partial\alpha_i}\left(\alpha_i-\alpha_i^{0}\right)+O\left[\left(\alpha_i-\alpha_i^{0}\right)^{2}\right]$$
$$=r_n(\boldsymbol{\alpha}^{0})+\sum_{i=1}^{TW}\frac{\partial r_n(\boldsymbol{\alpha}^{0})}{\partial\theta_i}\left(\theta_i-\theta_i^{0}\right)+\sum_{i=1}^{TI}\frac{\partial r_n(\boldsymbol{\alpha}^{0})}{\partial x_i}\left(x_i-x_i^{0}\right)+\sum_{i=1}^{TEW}\frac{\partial r_n(\boldsymbol{\alpha}^{0})}{\partial w_i}\left(w_i-w_i^{0}\right)+\sum_{i=1}^{TD}\frac{\partial r_n(\boldsymbol{\alpha}^{0})}{\partial\varphi_i}\left(\varphi_i-\varphi_i^{0}\right)+O\left[\left(\alpha_i-\alpha_i^{0}\right)^{2}\right].$$
The expectation (value), $E(r_n)$, of a response $r_n(\mathbf{h}(t_f);\boldsymbol{\varphi})$, $n=1,\dots,TR$, is obtained using Equation (139), which yields the following result:
$$E(r_n)\equiv\int_{D_\alpha}r_n(\boldsymbol{\alpha})\,p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}=r_n(\boldsymbol{\alpha}^{0})+O\left[\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)^{2}p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}\right].$$
The expression of the covariance between a parameter $\alpha_i$, $i=1,\dots,TP$, and a response $r_n(\mathbf{h}(t_f);\boldsymbol{\varphi})$, $n=1,\dots,TR$, is obtained using Equations (139) and (136), which yields the following result:
$$\mathrm{cov}(\alpha_i,r_n)\equiv\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)r_n(\boldsymbol{\alpha})\,p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}=\sum_{j=1}^{TP}\frac{\partial r_n(\boldsymbol{\alpha}^{0})}{\partial\alpha_j}\,\mathrm{cov}(\alpha_i,\alpha_j)+O\left[\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)^{3}p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}\right].$$
The expression of the covariance, $\mathrm{cov}(r_k,r_l)$, between two decoder responses $r_k(\mathbf{h}(t_f);\boldsymbol{\varphi})$ and $r_l(\mathbf{h}(t_f);\boldsymbol{\varphi})$ is obtained using Equations (139), (140) and (136), which yields the following result:
$$\mathrm{cov}(r_k,r_l)\equiv\int_{D_\alpha}\left[r_k(\boldsymbol{\alpha})-E(r_k)\right]\left[r_l(\boldsymbol{\alpha})-E(r_l)\right]p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}=\sum_{i=1}^{TP}\sum_{j=1}^{TP}\frac{\partial r_k(\boldsymbol{\alpha}^{0})}{\partial\alpha_i}\,\frac{\partial r_l(\boldsymbol{\alpha}^{0})}{\partial\alpha_j}\,\mathrm{cov}(\alpha_i,\alpha_j)+O\left[\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)^{3}p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}\right].$$
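Retaining only the leading terms, this relation is the familiar first-order “sandwich rule,” $\mathrm{cov}(\mathbf{r})\approx\mathbf{S}\,\mathbf{C}\,\mathbf{S}^{T}$, where $\mathbf{S}$ is the (responses × parameters) sensitivity matrix and $\mathbf{C}$ the parameter covariance matrix. A minimal sketch with a hypothetical sensitivity matrix and parameter covariance matrix:

```python
# Hypothetical sensitivities dr_k/dalpha_i (2 responses, 3 parameters):
S = [[0.8, -1.2, 0.0],
     [0.1, 0.4, 2.0]]
# Hypothetical (symmetric) parameter covariance matrix:
C = [[0.04, 0.01, 0.0],
     [0.01, 0.09, 0.0],
     [0.0, 0.0, 0.25]]

def sandwich(S, C):
    """First-order response covariance: cov(r_k, r_l) = sum_ij S_ki C_ij S_lj."""
    TP, TR = len(C), len(S)
    return [[sum(S[k][i] * C[i][j] * S[l][j]
                 for i in range(TP) for j in range(TP))
             for l in range(TR)] for k in range(TR)]

cov_r = sandwich(S, C)
print(cov_r)   # symmetric 2x2 response covariance matrix
```

The diagonal entries of `cov_r` are the first-order response variances; the off-diagonal entries quantify the correlation between responses induced by the shared parameter uncertainties.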
The expression of the triple-moment, $\mu_3(r_k,r_l,r_m)$, among three responses, $r_k(\mathbf{h}(t_f);\boldsymbol{\varphi})$, $r_l(\mathbf{h}(t_f);\boldsymbol{\varphi})$, and $r_m(\mathbf{h}(t_f);\boldsymbol{\varphi})$, is obtained using Equations (139), (140) and (137), which yields the following result:
$$\mu_3(r_k,r_l,r_m)\equiv\int_{D_\alpha}\left[r_k(\boldsymbol{\alpha})-E(r_k)\right]\left[r_l(\boldsymbol{\alpha})-E(r_l)\right]\left[r_m(\boldsymbol{\alpha})-E(r_m)\right]p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}=\sum_{a=1}^{TP}\sum_{b=1}^{TP}\sum_{c=1}^{TP}\frac{\partial r_k(\boldsymbol{\alpha}^{0})}{\partial\alpha_a}\,\frac{\partial r_l(\boldsymbol{\alpha}^{0})}{\partial\alpha_b}\,\frac{\partial r_m(\boldsymbol{\alpha}^{0})}{\partial\alpha_c}\,\mu_3^{abc}(\alpha_a,\alpha_b,\alpha_c)+O\left[\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)^{4}p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}\right].$$
The expression of the triple-moment, $\mu_3(\alpha_i,r_k,r_l)$, among one parameter, $\alpha_i$, and two responses, $r_k(\boldsymbol{\alpha})$ and $r_l(\boldsymbol{\alpha})$, is obtained using Equations (139), (140) and (137), which yields the following result:
$$\mu_3(\alpha_i,r_k,r_l)\equiv\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)\left[r_k(\boldsymbol{\alpha})-E(r_k)\right]\left[r_l(\boldsymbol{\alpha})-E(r_l)\right]p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}=\sum_{a=1}^{TP}\sum_{b=1}^{TP}\frac{\partial r_k(\boldsymbol{\alpha}^{0})}{\partial\alpha_a}\,\frac{\partial r_l(\boldsymbol{\alpha}^{0})}{\partial\alpha_b}\,\mu_3^{iab}(\alpha_i,\alpha_a,\alpha_b)+O\left[\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)^{4}p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}\right].$$
The expression of the triple-moment, $\mu_3(\alpha_i,\alpha_j,r_k)$, among two parameters, $\alpha_i$, $\alpha_j$, and one response, $r_k(\boldsymbol{\alpha})$, is obtained using Equations (139), (140) and (137), which yields the following result:
$$\mu_3(\alpha_i,\alpha_j,r_k)\equiv\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)\left(\alpha_j-\alpha_j^{0}\right)\left[r_k(\boldsymbol{\alpha})-E(r_k)\right]p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}=\sum_{a=1}^{TP}\frac{\partial r_k(\boldsymbol{\alpha}^{0})}{\partial\alpha_a}\,\mu_3^{ija}(\alpha_i,\alpha_j,\alpha_a)+O\left[\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)^{4}p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}\right].$$
The expression of the quadruple-moment, denoted as $\mu_4(r_k,r_l,r_m,r_n)$, among four responses, $r_k(\mathbf{h}(t_f);\boldsymbol{\varphi})$, $r_l(\mathbf{h}(t_f);\boldsymbol{\varphi})$, $r_m(\mathbf{h}(t_f);\boldsymbol{\varphi})$, and $r_n(\mathbf{h}(t_f);\boldsymbol{\varphi})$, is obtained using Equations (138)–(140), which yields the following result:
$$\mu_4(r_k,r_l,r_m,r_n)\equiv\int_{D_\alpha}\left[r_k(\boldsymbol{\alpha})-E(r_k)\right]\left[r_l(\boldsymbol{\alpha})-E(r_l)\right]\left[r_m(\boldsymbol{\alpha})-E(r_m)\right]\left[r_n(\boldsymbol{\alpha})-E(r_n)\right]p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}=\sum_{a=1}^{TP}\sum_{b=1}^{TP}\sum_{c=1}^{TP}\sum_{d=1}^{TP}\frac{\partial r_k(\boldsymbol{\alpha}^{0})}{\partial\alpha_a}\,\frac{\partial r_l(\boldsymbol{\alpha}^{0})}{\partial\alpha_b}\,\frac{\partial r_m(\boldsymbol{\alpha}^{0})}{\partial\alpha_c}\,\frac{\partial r_n(\boldsymbol{\alpha}^{0})}{\partial\alpha_d}\,\mu_4^{abcd}(\alpha_a,\alpha_b,\alpha_c,\alpha_d)+O\left[\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)^{5}p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}\right].$$
The expression of the quadruple-moment, denoted as $\mu_4(\alpha_i,r_k,r_l,r_m)$, among one parameter, $\alpha_i$, and three responses, $r_k(\mathbf{h}(t_f);\boldsymbol{\varphi})$, $r_l(\mathbf{h}(t_f);\boldsymbol{\varphi})$, and $r_m(\mathbf{h}(t_f);\boldsymbol{\varphi})$, is obtained using Equations (138)–(140), which yields the following result:
$$\mu_4(\alpha_i,r_k,r_l,r_m)\equiv\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)\left[r_k(\boldsymbol{\alpha})-E(r_k)\right]\left[r_l(\boldsymbol{\alpha})-E(r_l)\right]\left[r_m(\boldsymbol{\alpha})-E(r_m)\right]p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}=\sum_{a=1}^{TP}\sum_{b=1}^{TP}\sum_{c=1}^{TP}\frac{\partial r_k(\boldsymbol{\alpha}^{0})}{\partial\alpha_a}\,\frac{\partial r_l(\boldsymbol{\alpha}^{0})}{\partial\alpha_b}\,\frac{\partial r_m(\boldsymbol{\alpha}^{0})}{\partial\alpha_c}\,\mu_4^{abci}(\alpha_a,\alpha_b,\alpha_c,\alpha_i)+O\left[\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)^{5}p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}\right].$$
The expression of the quadruple-moment, denoted as $\mu_4(\alpha_i,\alpha_j,r_k,r_l)$, among two parameters, $\alpha_i$ and $\alpha_j$, and two responses, $r_k(\mathbf{h}(t_f);\boldsymbol{\varphi})$ and $r_l(\mathbf{h}(t_f);\boldsymbol{\varphi})$, is obtained using Equations (138)–(140), which yields the following result:
$$\mu_4(\alpha_i,\alpha_j,r_k,r_l)\equiv\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)\left(\alpha_j-\alpha_j^{0}\right)\left[r_k(\boldsymbol{\alpha})-E(r_k)\right]\left[r_l(\boldsymbol{\alpha})-E(r_l)\right]p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}=\sum_{a=1}^{TP}\sum_{b=1}^{TP}\frac{\partial r_k(\boldsymbol{\alpha}^{0})}{\partial\alpha_a}\,\frac{\partial r_l(\boldsymbol{\alpha}^{0})}{\partial\alpha_b}\,\mu_4^{abij}(\alpha_a,\alpha_b,\alpha_i,\alpha_j)+O\left[\int_{D_\alpha}\left(\alpha_i-\alpha_i^{0}\right)^{5}p_\alpha(\boldsymbol{\alpha})\,d\boldsymbol{\alpha}\right].$$
The expression of the quadruple-moment, denoted as $\mu_4(\alpha_i, \alpha_j, \alpha_k, r_l)$, among three parameters, $\alpha_i$, $\alpha_j$, and $\alpha_k$, and one response, $r_l[\mathbf{h}(t_f); \boldsymbol{\varphi}]$, is obtained using Equations (138)–(140), which yields the following result:
$$
\mu_4(\alpha_i, \alpha_j, \alpha_k, r_l) \triangleq \int_{D_{\boldsymbol{\alpha}}} \left( \alpha_i - \alpha_i^0 \right) \left( \alpha_j - \alpha_j^0 \right) \left( \alpha_k - \alpha_k^0 \right) \left[ r_l(\boldsymbol{\alpha}) - E(r_l) \right] p_{\boldsymbol{\alpha}}(\boldsymbol{\alpha}) \, d\boldsymbol{\alpha} = \sum_{a=1}^{TP} \frac{\partial r_l(\boldsymbol{\alpha}^0)}{\partial \alpha_a} \, \mu_4(\alpha_a, \alpha_i, \alpha_j, \alpha_k) + O\!\left[ \int_{D_{\boldsymbol{\alpha}}} \left( \alpha_i - \alpha_i^0 \right)^5 p_{\boldsymbol{\alpha}}(\boldsymbol{\alpha}) \, d\boldsymbol{\alpha} \right].
$$
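The quadruple-moment expressions follow the same contraction pattern: each response slot contributes a contraction with the sensitivity vector, while each parameter slot leaves an index of the fourth-order parameter moments free. A minimal numerical sketch, with hypothetical sensitivities `S` and Gaussian parameter samples (whose fourth-order moments are nonzero even though the distribution is symmetric), and a sampling check that again relies on the remainder term vanishing for responses exactly linear in the parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
TP = 3
n = 50_000

# Illustrative sensitivities dr_k/dalpha_a of two responses (assumed values).
S = np.array([[1.0, -0.5, 2.0],
              [0.3,  1.2, -1.0]])

# Centered parameter samples.
d_alpha = rng.normal(0.0, 0.1, size=(n, TP))

# Fourth-order central parameter moments mu4_param[a, b, c, d]
mu4_param = np.einsum('sa,sb,sc,sd->abcd',
                      d_alpha, d_alpha, d_alpha, d_alpha) / n

# Pure-response quadruple moment mu4(r_k, r_l, r_m, r_n): four contractions.
mu4_resp = np.einsum('ka,lb,mc,nd,abcd->klmn', S, S, S, S, mu4_param)

# Mixed moment mu4(alpha_i, r_k, r_l, r_m): three contractions, with the
# parameter index i left free.
mu4_mixed = np.einsum('ka,lb,mc,abci->klmi', S, S, S, mu4_param)

# Sampling check with exactly linear responses (fifth-order remainder is zero).
r = d_alpha @ S.T
assert np.allclose(mu4_resp,
                   np.einsum('sk,sl,sm,sn->klmn', r, r, r, r) / n)
assert np.allclose(mu4_mixed,
                   np.einsum('sk,sl,sm,si->klmi', r, r, r, d_alpha) / n)
```

The remaining cases, $\mu_4(\alpha_i, \alpha_j, r_k, r_l)$ and $\mu_4(\alpha_i, \alpha_j, \alpha_k, r_l)$, are obtained by leaving two or three parameter indices free, respectively.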

7. Discussion and Conclusions

This work has introduced the mathematical framework of the novel “First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations” (1st-CASAM-NODE), which yields exact expressions for the first-order sensitivities of NODE decoder responses to the NODE parameters, including encoder initial conditions, while enabling the most efficient computation of these sensitivities. The application of the 1st-CASAM-NODE has been illustrated using the Nordheim–Fuchs reactor dynamics/safety phenomenological model, which is representative of physical systems that would be modeled by NODEs while admitting exact analytical solutions for all quantities of interest (hidden states, decoder outputs, sensitivities with respect to all parameters and initial conditions, etc.). It has also been shown that, if the equations underlying the physical model can be rearranged so as to group the parameters/weights into functional “features” of several parameters, then the “First-Order Feature Adjoint Sensitivity Analysis Methodology for Nonlinear Systems” (1st-FASAM-N) can be applied advantageously to compute the response sensitivities with respect to the feature functions, which are by definition fewer than the number of parameters. The response sensitivities with respect to the primary parameters are subsequently obtained analytically by using the chain rule to differentiate the components of the feature functions with respect to the underlying model parameters and initial conditions. Applying the 1st-FASAM-N, however, would require the construction of a specific NODE for this purpose.
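The feature-to-parameter chain rule mentioned above can be sketched concretely. In the toy example below, the feature functions, their Jacobian, and the feature-level sensitivities `dR_dF` are all hypothetical stand-ins (they are not the Nordheim–Fuchs quantities); the point is only the mechanics of $\partial R / \partial \alpha_i = \sum_j (\partial R / \partial F_j)(\partial F_j / \partial \alpha_i)$, verified against finite differences:

```python
import numpy as np

# Hypothetical example: a response R depends on two "features",
# F1 = sigma_a * N and F2 = lam * beta, built from four primary
# parameters alpha = (sigma_a, N, lam, beta).
sigma_a, N, lam, beta = 2.0, 0.5, 0.1, 0.0065

def features(sigma_a, N, lam, beta):
    return np.array([sigma_a * N, lam * beta])

# Feature-level sensitivities dR/dF, as would be delivered by a single
# adjoint computation (assumed values for illustration).
dR_dF = np.array([3.0, -7.0])

# Analytic Jacobian dF/dalpha of the feature functions.
dF_dalpha = np.array([[N,   sigma_a, 0.0,  0.0],
                      [0.0, 0.0,     beta, lam]])

# Chain rule: dR/dalpha_i = sum_j (dR/dF_j) (dF_j/dalpha_i)
dR_dalpha = dR_dF @ dF_dalpha

# Finite-difference check on the concrete response R = 3*F1 - 7*F2.
def R(p):
    F = features(*p)
    return 3.0 * F[0] - 7.0 * F[1]

p0 = np.array([sigma_a, N, lam, beta])
h = 1e-6
fd = np.array([(R(p0 + h * e) - R(p0 - h * e)) / (2 * h) for e in np.eye(4)])
assert np.allclose(dR_dalpha, fd)
```

Because the features are fewer than the primary parameters, only two adjoint-level sensitivities are needed here, while the four parameter sensitivities follow analytically at negligible cost.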
This work has also laid the foundation for the ongoing work on conceiving the “Second-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations” (2nd-CASAM-NODE), which aims at yielding exact expressions for the second-order sensitivities of NODE decoder responses to the NODE parameters and initial conditions while enabling the most efficient computation of these sensitivities.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.
