1. Introduction
Neural Ordinary Differential Equations (NODEs) [1,2,3] have enabled the use of deep learning for modeling discretely sampled dynamical systems. NODEs provide a flexible trade-off between efficiency, memory costs, and accuracy while bridging traditional numerical modeling with modern deep learning, as demonstrated by various applications, including time-series, dynamics, and control [1,2,3,4,5,6,7,8,9]. However, because each time-step is determined locally in time, NODEs are limited to describing systems whose dynamics are instantaneous. On the other hand, integral equations (IEs) model global “long-distance” spatio-temporal relations, and IE solvers often possess stability properties that are superior to those of solvers for ordinary and/or partial differential equations. Therefore, differential equations are occasionally recast in integral-equation form so that they can be solved more efficiently using IE solvers, as exemplified by the applications described in [10,11,12].
Due to their non-local behavior, IE solvers are suitable for modeling complex dynamics and for learning the operator underlying the system under consideration from data sampled from the respective system. As discussed in [13], the operator learning problem is formulated on finite grids using finite-difference methods that approximate the domain of the functions under investigation; the learning is performed by using an IE solver, which samples the domain of integration continuously. As shown in [14], Neural Integral Equations (NIEs) and Attentional Neural Integral Equations (ANIEs) can be used to generate dynamics and infer the spatio-temporal relations that originally generated the data, thus enabling the continuous learning of non-local dynamics with arbitrary time resolution. The ANIE interprets the self-attention mechanism as a Nystrom method for approximating integrals [15], which enables efficient integration over higher dimensions, as discussed in [10,11,12,13,14,15] and references therein.
Neural nets are trained by minimizing a “loss functional” chosen by the user to represent the discrepancy between the output produced by the neural net’s decoder and some user-chosen “reference solution”. However, the physical system modeled by a neural net inevitably comprises imperfectly known parameters that stem from measurements and/or computations and are therefore afflicted by uncertainties. Hence, even if the neural net perfectly reproduces a given state of a physical system, the neural net’s “optimized weights” are subject to the uncertainties inherent in the parameters that characterize the underlying physical system, and these uncertainties inevitably propagate to the decoder’s output response. It is therefore important to quantify the impact of parameter/weight uncertainties on the uncertainties induced in the decoder’s output response. This impact is quantified by the sensitivities of the decoder’s response with respect to the optimized weights/parameters comprised within the neural net.
Neural nets comprise not only scalar-valued weights/parameters but also functions (e.g., correlations) of such scalar model parameters, which can be conveniently called “features of primary model parameters”. Cacuci [16] has developed the “nth-Order Features Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (nth-FASAM-N)”, which enables the most efficient computation of the exact expressions of arbitrarily high-order sensitivities of model responses with respect to the model’s “features”. In turn, the sensitivities of the responses with respect to the primary model parameters are determined, analytically and trivially, by applying the “chain rule” to the expressions obtained for the response sensitivities with respect to the model’s “features”. The nth-FASAM-N [16] has been applied to develop general first- and second-order sensitivity analysis methodologies for NODEs [17] and for Neural Integral Equations of the Fredholm type [18], which enable the computation, with unsurpassed efficiency, of the exact expressions of first- and second-order sensitivities of decoder responses with respect to the underlying neural net’s optimized weights.
This work continues the application of the nth-FASAM-N [16] methodology by developing the “First- and Second-Order Features Adjoint Sensitivity Analysis Methodologies for Neural Integral Equations of the Volterra Type” (acronyms: “1st-FASAM-NIE-V” and “2nd-FASAM-NIE-V”, respectively). The 1st-FASAM-NIE-V methodology, presented in Section 2, enables the most efficient computation of the exact expressions of all first-order sensitivities of NIE decoder responses with respect to all optimal values of the NIE-net’s parameters/weights after the respective NIE-Volterra net has been optimized to represent the underlying physical system. The efficiency of the 1st-FASAM-NIE-V is illustrated in Section 3 by applying it to perform a comprehensive first-order sensitivity analysis of the well-known model [19,20,21] of neutron slowing down in a homogeneous medium containing fissionable material. The general mathematical framework of the 2nd-FASAM-NIE-V methodology, presented in Section 4, enables the most efficient computation of the exact expressions of the second-order sensitivities of NIE decoder responses with respect to all optimal values of the NIE-net’s parameters/weights. The efficiency of the 2nd-FASAM-NIE-V is illustrated in Section 5 by applying it to perform a comprehensive second-order sensitivity analysis of the neutron slowing-down model [19,20,21] considered in Section 3. Section 6 concludes this work with a discussion that highlights the unparalleled efficiency of the 2nd-FASAM-NIE-V methodology for performing sensitivity analysis of Volterra-type Neural Integral Equations.
2. First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integral Equations of the Volterra Type (1st-FASAM-NIE-V)
Following [14], a network of nonlinear “Neural Integral Equations of Volterra-type (NIE-Volterra)” can be represented by the system of coupled equations shown below:
The quantities appearing in Equation (1) are defined as follows:
- (i)
The real-valued scalar quantities that parameterize the dynamics of the hidden/latent neuron units are time-like independent variables; customarily, one of these variables is called the “global time”, while the other is called the “local time”. The dynamics commence at an initial time-value and terminate at a stopping time-value.
- (ii)
The components of the vector of weights represent scalar learnable adjustable weights, comprising all of the adjustable weights in all of the latent neural nets. These components are considered to be “primary parameters”, while the components of the associated vector-valued “feature” function represent functions of the respective weights; each such feature function is, in general, a nonlinear function of the weights. The total number of feature functions must necessarily be smaller than the total number of primary parameters (weights). In the extreme case, when there are no feature functions, the feature functions become identical to the primary parameters themselves. In this work, all vectors are considered to be column vectors, and the dagger symbol “†” will be used to denote “transposition”. The symbol “≜” will be used to denote “is defined as” or, equivalently, “is by definition equal to”.
- (iii)
The vector-valued function whose components represent the hidden/latent neural networks is of fixed, known dimension. At the initial time-value, these components take on known initial values.
- (iv)
The encoder functions model the initial state of the network, while the functions appearing under the integral sign depend nonlinearly on the hidden states and model the dynamics of the latent neurons. (A minimal numerical sketch of a Volterra-type NIE of this form is provided below.)
The “training” of the NIE-Volterra net is accomplished by using the “adjoint” or other methods to minimize the user-chosen “loss functional” intended to represent the discrepancy between the output produced by the NIE decoder and a “reference solution” chosen by the user. After the training is completed, the primary parameters (“weights”) will have been assigned “optimal” values, obtained as a result of having minimized the chosen loss functional. These optimal values will be denoted by using a superscript “zero”. Using these optimal/nominal parameter values to solve the NIE system yields the optimal/nominal solution, which satisfies the following form of Equation (1):
After the NIE-net is optimized to reproduce the underlying physical system as closely as possible, the subsequent responses of interest are no longer “loss functionals” but become specific functionals of the NIE’s “decoder” response/output. Such a decoder response can generally be represented by a scalar-valued functional of the hidden states and the feature functions, defined as follows:
The function that models the decoder may contain distributions (e.g., Dirac-delta and/or Heaviside functionals) if the decoder response is to be evaluated at some particular point in time or over a subinterval of the time interval of interest.
The optimal value of the decoder response is obtained by evaluating Equation (3) at the optimal/nominal parameter values and the corresponding optimal/nominal solution, as follows:
The true values of the primary parameters (“weights”) that characterize the physical system modeled by the NIE-V net are afflicted by uncertainties inherent to the experimental and/or computational methodologies employed to model the original physical system. Therefore, the true values of the primary parameters (“weights”) will differ from the known nominal values (which are obtained after training the NIE-net to represent the model of the physical system) by parameter variations. These parameter variations will induce corresponding variations in the feature functions, which in turn will induce variations in the hidden state functions around their nominal/optimal values. Subsequently, all of these variations will induce variations in the NIE decoder’s response.
The 1st-FASAM-NIE-V methodology for computing the first-order sensitivities of the decoder’s response with respect to the NIE’s weights will be established by applying the same principles as those underlying the 1st-FASAM-N [16] methodology. These first-order sensitivities are embodied in the first-order G-variation of the response for variations of the feature functions and of the hidden states around their respective nominal values, which is, by definition, obtained as follows:
In Equation (5), the “direct-effect term” arises directly from variations in the feature functions (which, in turn, stem from parameter variations) and is defined as follows:
Meanwhile, the “indirect-effect term” arises through the variations in the hidden state functions and is defined as follows:
The first-order relationship between the variations in the hidden states and the variations in the feature functions is obtained from the first-order G-variation of Equation (1), as follows:
Performing the operations indicated in Equation (8) yields the following NIE-V net, which will be called the “1st-Level Variational Sensitivity System” (1st-LVSS), for the components of the “1st-level variational function”:
where
As indicated in Equation (9), the 1st-LVSS is to be computed at the nominal/optimal values of the respective model parameters. It is important to note that the 1st-LVSS is linear in the 1st-level variational function, although it generally remains nonlinear in the original hidden state functions.
The 1st-LVSS would need to be solved anew to obtain the 1st-level variational function that corresponds to each parameter variation; this procedure would become prohibitively expensive computationally if the total number of parameters is large. The need to solve the 1st-LVSS repeatedly can be avoided by recasting the indirect-effect term in terms of an expression that does not involve the variational function. This goal can be achieved by expressing the indirect-effect term in terms of another function, which will be called the “1st-level adjoint function” and will be the solution of the “1st-Level Adjoint Sensitivity System (1st-LASS)”, to be constructed next.
The 1st-LASS will be constructed in a Hilbert space comprising elements of the same form as the hidden state functions. The inner product of two such elements will be defined as follows:
This inner product is required to hold in a neighborhood of the nominal parameter and state values.
The next step is to form the inner product of Equation (9) with a yet-undetermined vector-valued function, where the superscript “(1)” indicates “1st-level”, to obtain the following relationship:
The second term on the left side of Equation (12) is transformed using “integration by parts”, as follows:
Inserting the result obtained in Equation (13) into Equation (12) transforms the left side of Equation (12) into the following expression:
The term on the right side of Equation (14) is now required to represent the “indirect-effect” term defined in Equation (7), which is achieved by requiring that the components of the 1st-level adjoint function satisfy the following system of equations:
The Volterra-like neural system obtained in Equation (15) will be called the “1st-Level Adjoint Sensitivity System”, and its solution will be called the “1st-level adjoint sensitivity function”. The 1st-LASS is to be solved using the nominal/optimal values for the parameters and for the forward (hidden state) function, but this fact has not been indicated explicitly, in order to simplify the notation. The 1st-LASS is linear in the 1st-level adjoint sensitivity function but is, in general, nonlinear in the forward function. Notably, the 1st-LASS is independent of any parameter variations and needs to be solved only once to determine the 1st-level adjoint sensitivity function. The 1st-LASS is a “final-value problem” because the computation of the adjoint function commences at the stopping time-value, with known final values.
It follows from Equations (12)–(15) that the indirect-effect term defined in Equation (7) can be expressed in terms of the 1st-level adjoint sensitivity function, as follows:
Using the results obtained in Equations (16) and (6) in Equation (5) yields the following expression for the G-variation of the response, which is seen to be linear in the variations of the feature functions:
Identifying in Equation (17) the expressions that multiply the respective variations yields the following expressions for the first-order sensitivities of the response with respect to the components of the feature function:
The expression on the right side of Equation (18) is to be evaluated at the nominal/optimal values of the respective model parameters, but this fact has not been indicated explicitly, in order to simplify the notation.
The sensitivities with respect to the primary model parameters can be obtained by using the result obtained in Equation (18) together with the “chain rule” for differentiating compound functions, as follows:
The sensitivities with respect to the feature functions are obtained from Equation (18), while the derivatives of the feature functions with respect to the primary parameters are obtained analytically, and exactly, from the known expressions of the feature functions.
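To make this chain rule concrete, the following hedged sketch (all names are illustrative) converts the sensitivities with respect to the features into sensitivities with respect to the weights by contracting them with the analytically known Jacobian of the feature functions:

```python
import numpy as np

def weight_sensitivities(dR_df, feature_jacobian, w0):
    """Chain rule: dR/dw_k = sum_j (dR/df_j) * (df_j/dw_k).

    dR_df            : (TF,) sensitivities w.r.t. the features, computed
                       once from the 1st-level adjoint function.
    feature_jacobian : callable returning the (TF, TW) Jacobian df/dw,
                       known exactly from the feature definitions.
    w0               : (TW,) optimal/nominal weights.
    """
    J = feature_jacobian(w0)  # exact analytical entries, no extra solves
    return dR_df @ J          # (TW,) sensitivities w.r.t. the weights

# Example with TF = 1 feature of TW = 2 weights: f(w) = w1 / w2.
jac = lambda w: np.array([[1.0 / w[1], -w[0] / w[1] ** 2]])
print(weight_sensitivities(np.array([2.0]), jac, np.array([3.0, 4.0])))
```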
Particular Case: The First-Order Comprehensive Adjoint Sensitivity Analysis Methodology for Neural Integral Equations of the Volterra Type (1st-CASAM-NIE-V)
When no feature functions can be constructed from the model parameters/weights, the feature functions become identical to the parameters themselves. In this case, the expression obtained in Equation (18) directly yields the first-order sensitivities of the decoder response with respect to the model weights/parameters, taking on the following specific form:
Because the 1st-LASS is independent of any parameter variations, the 1st-level adjoint sensitivity function, which appears in Equation (20), remains the solution of the 1st-LASS defined by Equation (15). In this case, however, all of the first-order sensitivities with respect to the weights would be obtained by computing as many integrals, using quadrature formulas, as there are weights. Thus, when there are no feature functions of parameters, the 1st-FASAM-NIE-V reduces to the “First-Order Comprehensive Adjoint Sensitivity Analysis Methodology [16] applied to Neural Integral Equations of Volterra-Type” (1st-CASAM-NIE-V). On the other hand, when features of parameters can be constructed, only as many numerical quadratures as there are feature functions are required in Equation (18) to obtain the sensitivities with respect to the features; the sensitivities with respect to the model’s weights/parameters are subsequently obtained analytically, using the chain rule provided in Equation (19). (A schematic quadrature-based evaluation of these sensitivities is sketched below.)
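The following hedged sketch illustrates the quadrature step: given the 1st-level adjoint function tabulated on a time grid, each sensitivity in Equation (18) reduces to an integral of the adjoint function weighted by the partial derivative of the model’s forcing terms with respect to one feature. Since Equation (18) could not be reproduced here, both the names and the form of the integrand are stated assumptions:

```python
import numpy as np

def trapezoid(vals, grid):
    """Plain trapezoidal rule, kept explicit for clarity."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def feature_sensitivities(psi, dsource_df, t_grid):
    """Evaluate dR/df_j = int psi(t) * d(source)/df_j(t) dt for every
    feature j, reusing the single adjoint solution psi.

    psi        : (n,) adjoint function on t_grid (one 1st-LASS solve).
    dsource_df : (TF, n) partial derivatives of the forcing terms with
                 respect to each feature, tabulated on t_grid.
    """
    return np.array([trapezoid(psi * dsource_df[j], t_grid)
                     for j in range(dsource_df.shape[0])])

# TF "small-scale" quadratures replace TF (or TW) separate variational solves.
t = np.linspace(0.0, 1.0, 101)
psi = np.exp(-t)                      # illustrative adjoint values
dS = np.vstack([np.ones_like(t), t])  # TF = 2 illustrative integrands
print(feature_sensitivities(psi, dS, t))
```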
3. Illustrative Application of the 1st-CASAM-NIE-V and 1st-FASAM-NIE-V Methodologies to Neutron Slowing Down in an Infinite Homogeneous Hydrogenous Medium
The illustrative model considered in this section is a Volterra-type integral equation that describes the energy distribution of neutrons in a homogeneous hydrogenous medium (such as a water-moderated/cooled reactor system) containing 238U (among other materials), which is a heavy element that strongly absorbs neutrons. The distribution of collided neutrons in such a medium is described [19,20,21] by the following linear integral equation of the Volterra type, customarily called the “neutron slowing-down equation”, for the neutron collision density:
The various quantities that appear in Equation (21) are defined as follows:
- (i)
The source strength denotes the rate at which the source neutrons, considered to be monoenergetic, are emitted at the “source energy”. Neutron upscattering is considered to be negligible; therefore, the source energy is the highest energy in the medium.
- (ii)
The energy variable denotes the instantaneous energy of the collided neutrons; it ranges between the lowest neutron energy in the model and the source energy.
- (iii)
The medium’s macroscopic scattering cross-section is defined as follows:
where M denotes the number of materials in the medium and where, for the ith material, the remaining quantities denote, respectively, the relative weighting of the material in the medium, the material’s number density, and the material’s energy-dependent microscopic scattering cross-section.
- (iv)
The medium’s macroscopic total cross-section is defined analogously, as follows:
where the microscopic cross-section appearing in the sum now denotes the energy-dependent total microscopic cross-section of the ith material. The relative weightings, number densities, and microscopic cross-sections are subject to uncertainties because they are determined from experimentally obtained data.
Notably, the Volterra-type Equation (21) is a “final-value problem” because the computation is started at the highest energy value (the source energy) and progresses towards the lowest energy value. Customarily, the solution of Equation (21) is written in the following form:
where the medium’s macroscopic absorption cross-section appears explicitly. The expression provided in Equation (24) is amenable to computations of the loss of neutrons due to absorbing materials, particularly in the so-called “resonance” energy region.
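For orientation, since Equations (21)–(24) could not be reproduced here, the standard hydrogen slowing-down result from the literature cited in [19,20,21] is recalled below as a plausible reconstruction (not a verbatim restoration of Equation (24)); F denotes the collision density, S the source strength, E0 the source energy, and Σs, Σt, Σa the macroscopic scattering, total, and absorption cross-sections:

```latex
F(E) \;=\; \frac{S}{E}\,\frac{\Sigma_{s}(E_{0})}{\Sigma_{t}(E_{0})}\,
\exp\!\left[-\int_{E}^{E_{0}}\frac{\Sigma_{a}(E')}{\Sigma_{t}(E')}\,
\frac{dE'}{E'}\right],
\qquad
\Sigma_{a}(E') \,\triangleq\, \Sigma_{t}(E') - \Sigma_{s}(E').
```

The exponential factor is the familiar resonance-escape probability, which quantifies the above-mentioned loss of neutrons to absorbing materials.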
A typical “decoder response” for the NIE-Volterra network modeled by Equation (21) is the energy-averaged collision density that would be measured by a detector having a known interaction cross-section. Mathematically, this detector response can be expressed as follows:
where the detector material’s atomic number density and the microscopic cross-section describing the interaction (e.g., absorption) of neutrons with the detector’s material can be considered to be the “weights” that characterize the neural net’s “decoder”.
Because the energy dependence of the cross-sections does not play a significant role in the sensitivity analysis of the NIE-Volterra net modeled by Equation (21), the respective microscopic cross-sections will henceforth be considered to be energy-independent, in order to simplify the ensuing derivations while illustrating the application of the 1st-FASAM-NIE-V. For energy-independent cross-sections, Equations (21) and (25) take on the following forms, respectively:
In Equations (26) and (27), the source strength is an imprecisely known “weight” that characterizes the neural net’s “encoder”. Furthermore, the (column) vector of parameters that comprises as components the “imprecisely known primary model parameters” (or “weights”, as they are customarily called when referring to neural nets) is defined as follows:
where the vector’s dimension represents the “total number of imprecisely known weights/parameters”. These primary model parameters/weights are not known exactly but are affected by uncertainties because they stem from experimental procedures, which determine their nominal/mean/optimal values and the second-order moments of their otherwise unknown joint distributions; the third- and higher-order moments are rarely known. It is convenient to denote the nominal values of these primary model parameters/weights by using the superscript “zero”, as follows:
The “feature function of primary parameters” for this model is defined as follows:
The closed-form solution of Equation (26) has the following expression in terms of this feature function:
The closed-form expression of the decoder response can be readily obtained by inserting the result obtained in Equation (31) into Equation (27) and performing the integration over the energy variable to obtain
The expression obtained in Equation (32) reveals that the imprecisely known quantities that affect the decoder response are as follows (a numerical cross-check is sketched after the list below):
- (i)
the source strength;
- (ii)
the detector’s macroscopic interaction cross-section, which can itself be considered to be a “feature function” of the detector’s parameters;
- (iii)
the feature function defined in Equation (30).
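As a cross-check on this closed-form reasoning, the following sketch solves a constant-cross-section slowing-down equation of the assumed form F(E) = c·S/E0 + c·∫ from E to E0 of F(E′) dE′/E′, with c = Σs/Σt, and compares the numerical solution and the resulting decoder response with their closed forms. Since Equations (26), (31), and (32) could not be reproduced here, the equation form, the closed-form expressions, and all variable names are assumptions consistent with the standard slowing-down literature [19,20,21]:

```python
import numpy as np

# Illustrative constant-cross-section data (all values are assumptions).
Sig_s, Sig_t, Sig_d = 0.8, 1.0, 0.05   # scattering/total/detector cross-sections
S, E0, Emin = 1.0, 2.0e6, 1.0          # source strength, source/cutoff energies
c = Sig_s / Sig_t                      # the feature function f = Sigma_s/Sigma_t

# March backwards from E0 (a final-value Volterra problem in energy),
# accumulating the integral term with a predictor-corrector trapezoidal rule.
E = np.logspace(np.log10(E0), np.log10(Emin), 4001)  # descending energies
F = np.empty_like(E)
F[0], integral = c * S / E0, 0.0
for i in range(1, len(E)):
    dE = E[i - 1] - E[i]
    pred = c * S / E0 + c * (integral + 0.5 * dE * (F[i - 1] / E[i - 1]
                                                    + F[i - 1] / E[i]))
    integral += 0.5 * dE * (F[i - 1] / E[i - 1] + pred / E[i])
    F[i] = c * S / E0 + c * integral

F_exact = (c * S / E0) * (E0 / E) ** c
print("max relative error in F:", np.max(np.abs(F / F_exact - 1.0)))

# Decoder response R = Sig_d * int_{Emin}^{E0} F(E) dE: numeric vs. closed form.
R_num = Sig_d * np.sum(0.5 * (F[1:] + F[:-1]) * (E[:-1] - E[1:]))
R_exact = Sig_d * c * S * (1.0 - (Emin / E0) ** (1.0 - c)) / (1.0 - c)
print("decoder response:", R_num, "vs", R_exact)
```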
3.1. Application of 1st-CASAM-NIE-V to Directly Compute the First-Order Sensitivities of the Decoder Response with Respect to the Primary Model Parameters
The first-order sensitivities of the decoder response with respect to the model parameters are obtained by applying the definition of the G-differential to Equations (26) and (27) for arbitrary parameter variations around the parameters’ nominal values. These parameter variations will induce variations in the neutron collision density around its nominal value. In turn, the parameter variations and the variations in the collision density will induce variations in the decoder’s response.
The first-order Gateaux (G-)variation of the decoder response is obtained, by definition, from Equation (27), as follows:
where the “direct-effect” term arises directly from the parameter variations and is defined as follows:
Meanwhile, the indirect-effect term arises from the variations in the collision density and is defined as follows:
As indicated in Equations (34) and (35), both the direct-effect and the indirect-effect terms are to be evaluated at the nominal parameter values.
The first-order relation between the variation in the collision density and the parameter variations is obtained by evaluating the G-variation of Equation (26) for variations around the nominal parameter values, which yields, by definition, the following NIE-Volterra equation:
where
The second equality in Equation (37) has been obtained by using Equations (26) and (31) to eliminate the integral term involving the nominal collision density.
The particular form of the first-order derivative of the feature function, which appears in Equation (37), is obtained by using the definition provided in Equation (30), which yields the following expression:
In view of the definition provided in Equation (22), the derivatives of the macroscopic scattering cross-section with respect to the underlying parameters have the following particular expressions:
In view of the definition provided in Equation (23), the derivatives of the macroscopic total cross-section with respect to the underlying parameters have the following particular expressions:
The NIE-Volterra net represented by Equation (36) will be called the “1st-Level Variational Sensitivity System (1st-LVSS)”, and its solution will be called the “1st-level variational sensitivity function”. Evidently, Equation (36) would need to be solved anew for the source variation and for every parameter variation, i.e., once per imprecisely known weight/parameter. This need to solve Equation (36) repeatedly can be circumvented by applying the principles of the 1st-CASAM-NIE-V, outlined in Section 2, to eliminate the appearance of the variation in the collision density from the indirect-effect term defined in Equation (35), while expressing this indirect-effect term as a functional of a first-level adjoint function that does not depend on any parameter variations, as follows.
- 1.
Consider that the variational function belongs to a Hilbert space defined on the energy domain of the model. The inner product of two functions belonging to this Hilbert space is defined as follows:
- 2.
Form the inner product of Equation (36) with a yet-undetermined function (the superscript “(1)” indicating “1st-Level”) to obtain the following relationship:
- 3.
Transform the left side of Equation (46) as follows:
In obtaining the expression on the right side of the last equality in Equation (47), the well-known “integration by parts” formula has been used to reverse the order of integration in the double integral, as follows:
- 4.
Require the last term in Equation (47) to represent the indirect-effect term defined in Equation (35), which yields the following “1st-Level Adjoint Sensitivity System (1st-LASS)” for the first-level adjoint sensitivity function:
The 1st-LASS represented by Equation (50) is a linear NIE-Volterra net, which is independent of any parameter variations and needs to be solved just once to obtain the first-level adjoint sensitivity function. Notably, the 1st-LASS is an “initial-value problem”, in that the computation of the adjoint function commences at the lowest energy value and progresses towards the highest energy value (the source energy). For further reference, the closed-form solution of Equation (50) can be obtained by differentiating this equation with respect to the energy variable and subsequently integrating the resulting first-order linear differential equation, thus obtaining the following exact expression:
The expression on the right side of Equation (51) is to be evaluated at the nominal parameter values, but the superscript “zero” has been omitted for notational simplicity.
- 5.
Using Equations (46), (47), and (50) yields the following expression for the indirect-effect term defined in Equation (35):
The expression on the right side of Equation (52) is to be evaluated at the nominal parameter values, but the superscript “zero” has been omitted for notational simplicity.
- 6.
Adding the expression obtained in Equation (52) to the expression of the direct-effect term represented by Equation (34) yields the following expression for the first-order G-variation of the decoder response:
- 7.
It follows from Equation (53) that the first-order sensitivities of the decoder response with respect to the (encoder’s) source strength and the optimal weights/parameters have the following expressions:
Inserting into Equations (54)–(57) the closed-form expression for the neutron collision density obtained in Equation (31) yields the following closed-form explicit expressions for the first-order sensitivities of the decoder response with respect to the (encoder’s) source strength and the optimal weights/parameters:
The correctness of the expressions obtained in Equations (58)–(61) can be readily verified by differentiating the expression of the decoder’s response obtained in Equation (32); a finite-difference version of this verification is sketched below.
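The following hedged sketch performs this verification numerically: central finite differences applied to the closed-form response assumed in the earlier sketch (not a verbatim restoration of Equation (32)) should reproduce the analytically differentiated sensitivities with respect to the source strength S, the feature c = Σs/Σt, and the detector cross-section:

```python
import numpy as np

E0, Emin = 2.0e6, 1.0  # assumed source and cutoff energies

def response(S, c, Sig_d):
    """Closed-form decoder response assumed in the previous sketch."""
    return Sig_d * c * S * (1.0 - (Emin / E0) ** (1.0 - c)) / (1.0 - c)

def fd_sensitivity(fun, args, k, rel=1e-6):
    """Central finite difference of fun with respect to its k-th argument."""
    a_plus, a_minus = list(args), list(args)
    h = rel * abs(args[k])
    a_plus[k] += h
    a_minus[k] -= h
    return (fun(*a_plus) - fun(*a_minus)) / (2.0 * h)

S, c, Sig_d = x0 = (1.0, 0.8, 0.05)  # nominal (S, c, Sigma_d)
r = Emin / E0
q = r ** (1.0 - c)
# Analytical first-order sensitivities of the assumed closed-form response:
dR_dS = Sig_d * c * (1.0 - q) / (1.0 - c)
dR_dc = Sig_d * S * (((1.0 - q) + c * q * np.log(r)) / (1.0 - c)
                     + c * (1.0 - q) / (1.0 - c) ** 2)
dR_dSig_d = c * S * (1.0 - q) / (1.0 - c)
for k, exact in enumerate((dR_dS, dR_dc, dR_dSig_d)):
    print(fd_sensitivity(response, x0, k), "vs", exact)
```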
In practice, only the exact mathematical expression of the 1st-LASS, namely, Equation (50), and the exact mathematical expressions of the first-order sensitivities obtained in Equations (54)–(57) are available. The solution of the 1st-LASS, which is a linear NIE-Volterra net for the first-level adjoint sensitivity function, would need to be obtained numerically. This numerical solution would then be used to determine the first-order sensitivities stemming from the “indirect-effect” term, by using quadrature formulas to evaluate the integrals obtained in Equations (54) and (57). It is very important to note that a single “large-scale” computation, namely, the numerical solution of the 1st-LASS (an NIE-Volterra-type equation) to determine the adjoint function, suffices for evaluating all of the first-order sensitivities. The numerical computations using quadrature formulas for evaluating the integrals in Equations (54) and (57) are considered to be “small-scale” computations.
As already noted in the remarks following Equation (37), the first-order sensitivities of the decoder response with respect to the encoder’s source strength S and the model weights/parameters could also have been computed by repeatedly solving, numerically, the NIE-Volterra net (1st-LVSS) represented by Equation (36). This procedure would be computationally very expensive, as it would require a large-scale computation to solve the 1st-LVSS for every parameter variation and for the source variation. In addition, the same number of “quadrature” computations would need to be performed using Equation (35) as would be needed for evaluating the first-order sensitivities using Equations (54) and (57).
3.2. Efficient Indirect Computation Using the 1st-FASAM-NIE-V of the First-Order Sensitivities of the Decoder Response with Respect to Primary Model Parameters
When feature functions of the model parameters can be identified, as is the case for the NIE-Volterra net and the decoder response represented by Equations (26) and (27), respectively, it is considerably more efficient to determine the first-order sensitivities of the decoder response with respect to the feature functions and to subsequently derive, analytically, the sensitivities with respect to the primary model parameters by using the “chain rule” of differentiation, as will be shown in this section. Thus, considering arbitrary variations of the feature functions around their respective nominal values, the first-order G-variation of the decoder response has the following expression:
where the expression of the indirect-effect term is defined in Equation (35). The first-order relation between the variation in the collision density and the variations in the feature functions is obtained, by definition, from Equation (26), as follows:
where
Comparing Equation (63) to Equation (36) indicates that the only difference between these equations is the expression of the source term, which is written in terms of the variations of the feature functions in Equation (64). Consequently, the first-level adjoint sensitivity function that corresponds to the variational function is determined by following the same procedure as that outlined in Equations (46)–(50), ultimately obtaining the same 1st-LASS as in Equation (50), whose solution has the same expression as that obtained in Equation (51). It further follows that the indirect-effect term has the following expression:
It follows from Equations (62) and (65) that the first-order G-variation of the decoder response has the following expression:
As indicated by the expression obtained in Equation (66), the first-order sensitivities of the decoder response with respect to the feature functions and the encoder’s source strength are as follows:
The closed-form expressions of the above sensitivities are readily determined by using in Equations (67)–(69) the expressions obtained in Equations (51) and (24), and by performing the respective integrations to obtain
The first-order sensitivities with respect to the primary parameters are obtained analytically from Equations (67) and (68), respectively, by using the following “chain rule” of differentiation:
The specific expressions of the first-order sensitivities with respect to the individual primary parameters are obtained by using Equation (75) in conjunction with Equation (69) and Equations (38)–(44).
3.3. Discussion: Direct Versus Indirect Computation of the First-Order Sensitivities of Decoder Response with Respect to the Primary Model Parameters
The principles of the 1st-CASAM-NIE-V were applied in Section 3.1 to determine the first-order sensitivities of the decoder response directly with respect to the model’s primary parameters/weights. It was shown that this procedure requires a single “large-scale” computation, for solving an NIE-Volterra equation to determine the (single) first-level adjoint sensitivity function, which is subsequently used in as many quadrature-evaluated integrals as there are primary parameters. The two additional first-order sensitivities, with respect to the components of the decoder’s (detector) feature function, require a single quadrature involving the forward function.
The principles of the 1st-FASAM-NIE-V were applied in Section 3.2 to determine the first-order sensitivities of the decoder response with respect to the feature functions. This path required just two (as opposed to one per primary parameter) numerical evaluations of integrals, using quadrature formulas involving the first-level adjoint sensitivity function. The sensitivities of the decoder response with respect to the primary parameters/weights were subsequently determined analytically, by applying the “chain rule” of differentiation to the explicitly known expression of the feature function. Evaluating the two additional first-order sensitivities with respect to the components of the decoder’s (detector) feature function requires a single quadrature involving the forward function, as in Section 3.1. Evidently, the indirect path presented in Section 3.2 is computationally more efficient, as it requires substantially fewer numerical quadratures than the direct path presented in Section 3.1. The superiority of the indirect path, via “feature functions”, over the direct computation of sensitivities with respect to the model parameters becomes considerably more evident for the computation of second-order sensitivities, as will be shown in Section 4 and Section 5 below.
Of course, when no feature functions can be identified, the 1st-FASAM-NIE-V methodology becomes identical to the 1st-CASAM-NIE-V methodology.
4. The Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integral Equations of the Volterra Type (2nd-FASAM-NIE-V)
The second-order sensitivities of the response defined in Equation (3) will be computed by conceptually using their basic definition as the “first-order sensitivities of the first-order sensitivities”. Thus, the second-order sensitivities stemming from the first-order sensitivities with respect to the feature functions are obtained from the first-order G-differential of Equation (18), as follows:
In Equation (76), the expression of the direct-effect term is obtained by performing the differentiations with respect to the scalar parameter that defines the G-differential; this direct-effect term comprises the variations in the feature functions (stemming from variations in the model parameters) and is defined as follows:
The expression on the right side of Equation (77) is to be evaluated at the nominal/optimal values for the respective model parameters, but this fact has not been indicated explicitly in order to simplify the notation.
The expression of the indirect-effect term defined in Equation (76) is obtained by performing the corresponding differentiations with respect to the scalar parameter that defines the G-differential; this indirect-effect term comprises the variations in the hidden states and in the 1st-level adjoint sensitivity function, as follows:
The expressions in Equation (78) are to be evaluated at the nominal values of the respective functions and parameters, but the respective indication (i.e., the superscript “zero”) has been omitted in order to simplify the notation.
The direct-effect term can be evaluated at this stage for all parameter variations, but the indirect-effect term can be evaluated only after the variations in the hidden states and in the 1st-level adjoint sensitivity function have been determined. The variation in the hidden states is the solution of the 1st-LVSS defined by Equation (9). On the other hand, the variation in the 1st-level adjoint sensitivity function is the solution of the system of equations obtained by G-differentiating the 1st-LASS. By definition, the G-differential of Equation (15) is obtained as follows:
Performing the operations indicated in Equation (79) and rearranging the various terms yields the following relations:
where
As indicated by the result obtained in Equation (80), the variations in the 1st-level adjoint sensitivity function are coupled to the variations in the hidden states. Therefore, these variations can be obtained by simultaneously solving Equations (80) and (9), which together will be called the “2nd-Level Variational Sensitivity System (2nd-LVSS)”. The solution of the 2nd-LVSS will be called the “2nd-level variational sensitivity function”. Because the 2nd-LVSS depends on the variations in the feature functions (stemming from variations in the model parameters), it would need to be solved anew for each such variation. The repeated solving of the 2nd-LVSS can be avoided by following the general principles underlying the 2nd-FASAM [16], which considers the 2nd-level variational sensitivity function to be an element in an appropriately defined Hilbert space. This Hilbert space is considered to be endowed with an inner product between two vector-valued elements, which is defined as follows:
Following the general principles underlying the 2nd-FASAM [16], the 2nd-level variational sensitivity function will be eliminated from the expression of each indirect-effect term defined in Equation (78). This elimination is achieved by considering, for each index j, a corresponding 2nd-level adjoint vector-valued function. Using the definition provided in Equation (82), we construct the inner product of Equations (9) and (80) with this vector-valued function to obtain the following relation:
where
Following the principles of the 2nd-CASAM [16], the left side of Equation (83) will be identified with the indirect-effect term defined in Equation (78), thereby determining the (as yet undetermined) 2nd-level adjoint functions. For this purpose, the right side of Equation (78) is cast in the form of the inner product defined in Equation (82). The terms on the right side of Equation (78) involving the variations in the hidden states are already in the desired format, but the terms involving the variations in the 1st-level adjoint sensitivity function must be rearranged, as follows.
- (i)
The fourth term on the right side of Equation (78) is recast by using “integration by parts” as follows:
- (ii)
The sixth (last) term on the right side of Equation (78) is recast by using “integration by parts”, as above, to obtain the following relation:
Using in Equation (78) the results obtained in Equations (85) and (86) yields the following expression for the indirect-effect term:
The left side of Equation (83) is now recast in the form of the inner product by performing the following operations:
- (i)
The second term on the left side of Equation (83) is rearranged by using “integration by parts”, as follows:
- (ii)
The fourth term on the left side of Equation (83) is rearranged by using “integration by parts”, as follows:
- (iii)
The fifth term on the left side of Equation (83) is rearranged as follows:
- (iv)
The sixth term on the left side of Equation (83) is rearranged as follows:
Inserting the results obtained in Equations (88)–(91) into the left side of Equation (83) yields the following relation:
The right side of Equation (92) can now be required to represent the indirect-effect term defined in Equation (87) by imposing the requirement that the hitherto arbitrary 2nd-level adjoint function be the solution of the following NIE-Volterra equations:
It follows from Equations (92)–(94) that the indirect-effect term defined by Equation (78) or, equivalently, by Equation (87) can be expressed in terms of the 2nd-level adjoint function, as follows:
The second-order sensitivities of the decoder response with respect to the components of the feature function are obtained by adding the expression of the indirect-effect term obtained in Equation (95) to the expression of the direct-effect term obtained in Equation (77) and by subsequently identifying the expressions that multiply the respective variations in the feature functions. The expressions thus obtained are as follows:
The NIE-Volterra system presented in Equations (93) and (94) is called the “2nd-Level Adjoint Sensitivity System (2nd-LASS)”, and its solution is called the “2nd-level adjoint sensitivity function”. Because the sources on the right sides of Equations (93) and (94) stem from the first-order sensitivities with respect to the feature functions, they depend on the index “j”, which implies that to each first-order sensitivity there corresponds a distinct 2nd-LASS with a distinct solution; this fact has been emphasized by including the index “j” in the list of arguments of the second-level adjoint sensitivity function. Therefore, there will be as many second-level adjoint functions as there are distinct first-order sensitivities, i.e., as many as there are components of the “feature function”. Notably, the integral operators on the left sides of Equations (93) and (94) do not depend on the index “j”, which means that the same left side needs to be inverted when computing every second-level adjoint function, regardless of the source term on the right side (which corresponds to a particular component of the feature function). Therefore, if the inverse (or factorization) of the operators appearing on the left sides of Equations (93) and (94) can be stored, these operators need not be inverted repeatedly, so the various second-level adjoint functions can be computed most efficiently, as illustrated schematically below.
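The following hedged sketch illustrates this “factorize once, solve for many right-hand sides” strategy on a discretized linear Volterra operator (I − K) of the kind that arises when a 2nd-LASS is put on a quadrature grid; the kernel and the source terms are illustrative placeholders, not the operators of Equations (93) and (94):

```python
import numpy as np

n, TF = 200, 3
t = np.linspace(0.0, 1.0, n)
dt = t[1] - t[0]

# Discretized linear Volterra operator (I - K): lower-triangular quadrature
# of an illustrative kernel k(t, s) = exp(-(t - s)); this left side does not
# depend on the feature index j.
K = np.tril(np.exp(-(t[:, None] - t[None, :]))) * dt
A = np.eye(n) - K

# The TF distinct, j-dependent source terms of the 2nd-LASS (placeholders).
B = np.stack([np.sin((j + 1) * np.pi * t) for j in range(TF)], axis=1)

# One factorization of the shared operator serves all TF right-hand sides:
Psi2 = np.linalg.solve(A, B)  # columns are the TF second-level adjoint functions
print(Psi2.shape, np.linalg.norm(Psi2, axis=0))
```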
The second-order sensitivities of the decoder response with respect to the optimal weights/parameters are obtained analytically by using the chain rule in conjunction with the expressions obtained in Equations (96) and (18), as follows:
When there are no feature functions but only individual model parameters, i.e., when the feature functions are identical to the parameters themselves, the expression obtained in Equation (96) directly yields the second-order sensitivities with respect to all of the weights/parameters. In this case, however, the 2nd-LASS would need to be solved once per weight/parameter, rather than just once per feature function when feature functions can be constructed; since the number of feature functions is necessarily smaller than the number of weights, the savings can be substantial. A schematic illustration of the second-order chain rule is provided below.
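Since Equation (97) could not be reproduced here, the following hedged sketch assumes the generic second-order chain rule d²R/dw dwᵀ = Jᵀ H J + Σⱼ (∂R/∂fⱼ)·Hess(fⱼ), where J is the Jacobian of the features with respect to the weights and H is the matrix of second-order sensitivities with respect to the features; all names are illustrative:

```python
import numpy as np

def second_order_weight_sensitivities(dR_df, d2R_df2, jac, hess, w0):
    """Assumed second-order chain rule:
    d2R/dw dw^T = J^T H J + sum_j (dR/df_j) * Hess(f_j).

    dR_df   : (TF,) first-order sensitivities w.r.t. the features.
    d2R_df2 : (TF, TF) second-order sensitivities w.r.t. the features.
    jac     : callable, (TF, TW) Jacobian df/dw (known analytically).
    hess    : callable, (TF, TW, TW) Hessians of each feature function.
    """
    J, Hf = jac(w0), hess(w0)
    return J.T @ d2R_df2 @ J + np.einsum("j,jkl->kl", dR_df, Hf)

# Example: one feature f(w) = w1/w2 of two weights.
jac = lambda w: np.array([[1.0 / w[1], -w[0] / w[1] ** 2]])
hess = lambda w: np.array([[[0.0, -1.0 / w[1] ** 2],
                            [-1.0 / w[1] ** 2, 2.0 * w[0] / w[1] ** 3]]])
print(second_order_weight_sensitivities(
    np.array([2.0]), np.array([[1.0]]), jac, hess, np.array([3.0, 4.0])))
```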