1. Introduction
The introduction of “Neural Ordinary Differential Equations” (NODE) models by Chen et al. [1] has significantly advanced the versatility and applicability of neural-nets by providing an explicit connection between deep feed-forward neural networks and dynamical systems. By using differential-equation solvers to learn dynamics through continuous-depth neural network models, NODE models provide a bridge between modern deep learning and traditional numerical modeling while offering trade-offs between efficiency, memory costs, and accuracy, as demonstrated by various applications [1,2,3,4,5,6,7,8,9]. However, NODE models are limited to describing systems that are instantaneous, since each time-step is determined locally in time, without contributions from the state of the system at other times.
In contradistinction to differential equations, integral equations (IEs) model global spatio-temporal relations, which are learned through an IE-solver (see, e.g., [10]) that samples the domain of integration continuously. Owing to their non-local behavior, IE-solvers are suitable for modeling complex dynamics. Zappala et al. [11] have introduced the Neural Integral Equation (NIE) and the Attentional Neural Integral Equation (ANIE), which can be used to infer the spatio-temporal relations that generated the data, thus enabling the continuous learning of non-local dynamics with arbitrary time resolution [11,12]. Often, ordinary and/or partial differential equations can be recast in integral-equation forms that can be solved more efficiently using IE-solvers, as exemplified in [13,14,15].
Zappala et al. [16] have also developed a deep learning method called the Neural Integro-Differential Equation (NIDE), which “learns” an integro-differential equation (IDE) whose solution approximates data sampled from given non-local dynamics. The motivation for using NIDE stems from the need to model systems that present spatio-temporal relations transcending local modeling, as illustrated by the pioneering works of Volterra on population dynamics [17]. Combining the properties of differential and integral equations, IDEs also present properties that are unique to their non-local behavior [18,19,20], with applications in computational biology, physics, engineering, and the applied sciences [18,19,20,21,22,23,24,25,26].
All neural-nets are trained by minimizing a user-chosen “loss functional” that represents the discrepancy between the output produced by the respective net’s decoder and a user-chosen “reference solution”. The neural-net is thereby optimized to reproduce the underlying physical system as closely as possible. However, the physical system modeled by a neural-net comprises parameters that stem from measurements and/or computations, which are subject to uncertainties. Therefore, even if the neural-net reproduced the underlying system perfectly, the uncertainties inherent in the system’s parameters would propagate to the subsequent results produced by the decoder. Quantifying the uncertainties in the decoder’s response can only be performed if the sensitivities of the decoder’s response with respect to the neural-net’s optimized parameters are known.
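For orientation, the first-order contribution of parameter uncertainties to the uncertainty in a response can be sketched via the well-known “sandwich rule” of uncertainty propagation; the notation below is illustrative (it is not taken from this work), with the first-order sensitivities evaluated at the nominal parameter values:

```latex
% Illustrative first-order ("sandwich rule") propagation of parameter
% covariances cov(theta_i, theta_j) to the variance of a response R:
\operatorname{var}(R) \;\approx\; \sum_{i=1}^{TP}\sum_{j=1}^{TP}
  S_i \,\operatorname{cov}(\theta_i,\theta_j)\, S_j ,
\qquad
S_i \triangleq \left.\frac{\partial R}{\partial \theta_i}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}^{0}},
```

where TP denotes (in this illustrative notation) the total number of parameters; this is why the first-order sensitivities are the indispensable ingredients for quantifying the uncertainty induced in the decoder’s response.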
In addition to scalar-valued parameters, neural-nets often comprise scalar-valued functions (e.g., correlations, material properties, etc.) of the model’s scalar parameters. Calling such scalar-valued functions “features of primary model parameters”, Cacuci [27] has recently introduced the “nth-Order Features Adjoint Sensitivity Analysis Methodology for Nonlinear Systems (nth-FASAM-N)”. The nth-FASAM-N enables the most efficient computation of the exact expressions of arbitrarily high-order sensitivities of model responses with respect to the model’s “features”. Subsequently, the sensitivities of the responses with respect to the primary model parameters are determined, analytically and trivially, by using the well-known “chain rule of differentiation” to convert the response sensitivities with respect to the model’s features/functions of parameters into sensitivities with respect to the underlying primary parameters.
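Schematically, in illustrative notation (not reproduced from [27]), with features f_i and primary parameters θ_j, the chain rule mentioned above reads:

```latex
% Chain rule converting the TF feature-sensitivities into the TP
% parameter-sensitivities (illustrative notation):
\frac{\partial R}{\partial \theta_j}
  \;=\; \sum_{i=1}^{TF} \frac{\partial R}{\partial f_i}\,
        \frac{\partial f_i(\boldsymbol{\theta})}{\partial \theta_j},
\qquad j = 1,\dots,TP,
```

so that the computationally expensive adjoint solves are needed only for the TF derivatives with respect to the features, while the derivatives of the features with respect to the parameters are obtained analytically.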
Based on the general framework of the nth-FASAM-N methodology [27], Cacuci has developed specific sensitivity analysis methodologies for NODE-nets, as follows: the “First-Order and Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (1st-FASAM-NODE)” and the “Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Ordinary Differential Equations (2nd-FASAM-NODE)” [28]. The 1st-FASAM-NODE and the 2nd-FASAM-NODE are pioneering sensitivity analysis methodologies which enable the computation, with unparalleled efficiency, of exactly-determined first-order and, respectively, second-order sensitivities of the decoder response with respect to the optimized/trained weights involved in the NODE’s hidden layers, decoder, and encoder.
Two important families of IDEs are the Volterra and the Fredholm equations. In a Volterra IDE, the interval of integration grows with time as the system’s dynamics evolves, while in a Fredholm IDE the interval of integration remains fixed during the dynamic history of the system, but at any given time instance within this interval, the system depends on the past, present, and future states of the system. By applying the general concepts underlying the nth-FASAM-N methodology [27], Cacuci [29,30] has also developed the general methodologies underlying the “Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integral Equations of Fredholm-Type (2nd-FASAM-NIE-F)” and the “Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integral Equations of Volterra-Type (2nd-FASAM-NIE-V)”. The 2nd-FASAM-NIE-F encompasses the “First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integral Equations of Fredholm-Type (1st-FASAM-NIE-F)”, while the 2nd-FASAM-NIE-V encompasses the “First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integral Equations of Volterra-Type (1st-FASAM-NIE-V)”. The 1st-FASAM-NIE-F and 1st-FASAM-NIE-V methodologies enable the computation, with unparalleled efficiency, of exactly-determined first-order sensitivities of the decoder response with respect to the NIE-parameters, requiring a single “large-scale” computation for solving the 1st-Level Adjoint Sensitivity System (1st-LASS), regardless of the number of weights/parameters underlying the NIE-net. The 2nd-FASAM-NIE-F and 2nd-FASAM-NIE-V methodologies enable the computation, with unparalleled efficiency, of exactly-determined second-order sensitivities of the decoder response with respect to the NIE-parameters, requiring only as many “large-scale” computations as there are first-order sensitivities with respect to the feature functions.
Generalizing the methodologies presented in [29,30], this work presents the “First- and Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integro-Differential Equations of Fredholm-Type”, abbreviated as “1st-FASAM-NIDE-F” and “2nd-FASAM-NIDE-F”, respectively. These methodologies are also based on the general principles underlying the nth-FASAM-N methodology [27]. The 1st-FASAM-NIDE-F is presented in Section 2, while the 2nd-FASAM-NIDE-F is presented in Section 3. The discussion presented in Section 4 concludes this work by highlighting the unparalleled efficiency of the 1st-FASAM-NIDE-F and 2nd-FASAM-NIDE-F methodologies for computing exact first- and second-order sensitivities, respectively, of decoder responses to model parameters in optimized NIDE-F networks. The accompanying work [31] presents an illustrative application of the 1st-FASAM-NIDE-F and 2nd-FASAM-NIDE-F methodologies to a paradigm heat conduction model which admits exact closed-form solutions/expressions for all quantities of interest and is of fundamental importance in many scientific fields [32,33,34,35,36,37].
2. First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integro-Differential Equations of Fredholm-Type (1st-FASAM-NIDE-F)
The mathematical expression of the network of nonlinear Fredholm-type Neural Integro-Differential Equations (NIDE-F) considered in this work generalizes the NIDE-net model introduced in [16] and is represented in component form by the following system of Nth-order integro-differential equations:
The boundary conditions, imposed at the “initial time” and/or “final time” on the hidden functions and their time-derivatives associated with the encoder of the NIDE-F net represented by Equation (1), are represented in operator form as follows:
The quantities appearing in Equations (1) and (2) are defined as follows:
- (i)
The real-valued scalar quantities are time-like independent variables which parameterize the dynamics of the hidden/latent neuron units. Customarily, one of these variables is called the “global time” while the other is called the “local time”; the corresponding initial and stopping time-values delimit the time interval underlying the dynamics. Thus, the dynamics modeled by Equation (1) depends on non-local effects as well as on instantaneous information.
- (ii)
The components of the vector-valued function of hidden/latent state functions represent the hidden/latent neural networks; the dimension of this vector denotes the total number of such components. In this work, the symbol “≜” will be used to denote “is defined as” or, equivalently, “is by definition equal to”. The various vectors will be considered to be column vectors and will typically be denoted using bold lower-case letters. The dagger symbol “†” will be used to denote “transposition”.
- (iii)
The components of the column-vector of “primary” network parameters represent the scalar learnable/adjustable parameters (weights) in all of the latent neural-nets, including the encoder(s) and decoder(s); the dimension of this vector denotes the total number of adjustable parameters/weights.
- (iv)
The scalar-valued components of the vector-valued feature function represent the “features/functions of the primary model parameters”; the dimension of this vector denotes the total number of such feature functions comprised in the NIDE-F. In particular, all of the model parameters that might appear solely in the boundary and/or initial conditions are considered to be included among the arguments of the feature function. In general, the feature function is a nonlinear vector-valued function of the primary parameters. The total number of feature functions must necessarily be smaller than the total number of primary parameters (weights). When the NIDE-F comprises only primary parameters, each feature function is considered to coincide with a single primary parameter.
- (v)
Certain functions model the dynamics of the neurons in a latent space where the local-time integration occurs, while other functions map the latent space back to the original data space; additional functions model further dynamics in the original data space. In general, these functions are nonlinear in their arguments.
- (vi)
The coefficient-functions, which may depend nonlinearly on the hidden functions and on the features, are associated with the respective orders of the time-derivatives of the hidden functions.
- (vii)
The boundary-condition operators represent boundary conditions associated with the encoder and/or decoder, imposed at the initial and/or final times on the hidden functions and on their time-derivatives; the quantity “BC” denotes the “total number of boundary conditions”.
Customarily, the NIDE-F net is “trained” by minimizing a user-chosen loss functional representing the discrepancy between a reference solution (“target data”) and the output produced by the NIDE-F decoder. The “training” process produces “optimal” values for the primary parameters, which will be denoted in this work by using the superscript “zero”. Using these optimal/nominal parameter values to evaluate the NIDE-F net yields the optimal/nominal solution, which satisfies the following form of Equation (1):
subject to the following optimized/trained boundary conditions:
After the NIDE-F net is optimized to reproduce the underlying physical system as closely as possible, the subsequent responses of interest are no longer “loss functionals” but become specific functionals of the NIDE-F’s “decoder” output, which can be generally represented by the functional defined below:
The function appearing in this definition models the decoder. The scalar-valued quantity thus defined is a functional of the hidden functions and of the features, and represents the NIDE-F’s decoder-response. At the optimal/nominal parameter values, the decoder response takes on the following formal expression:
The physical system modeled by the NIDE-F net comprises parameters that stem from measurements and/or computations. Consequently, even if the NIDE-F net models the underlying physical system perfectly, the NIDE-F’s optimal weights/parameters are unavoidably afflicted by uncertainties stemming from the parameters underlying the physical system. Hence, it is important to quantify the uncertainties induced in the decoder output by the uncertainties that afflict the parameters/weights underlying the physical system modeled by the NIDE-F net. The relative contributions of the uncertainties afflicting the optimal parameters to the total uncertainty in the decoder response are quantified by the sensitivities of the NIDE-F decoder-response with respect to the optimized NIDE-F parameters. The general methodology for computing the first-order sensitivities of the decoder output with respect to the components of the feature function, and with respect to the primary model parameters, is presented in this Section.
The known nominal values of the primary model parameters (“weights”) characterizing the NIDE-F net will differ from the true but unknown values of the respective weights by variations in the parameters. These parameter variations will induce corresponding variations in the feature functions. In turn, the variations in the parameters and features will induce, through Equation (1), variations in the state functions around the nominal/optimal solution and, ultimately, variations in the NIDE-F decoder’s response.
The “First-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integro-Differential Equations of Fredholm-Type (1st-FASAM-NIDE-F)” aims at obtaining the exact expressions of the first-order sensitivities (i.e., functional derivatives) of the decoder’s response with respect to the feature function and the primary model parameters, followed by the most efficient computation of these sensitivities. The 1st-FASAM-NIDE-F will be established by applying the same principles as those underlying the 1st-FASAM-N methodology [27]. The fundamental concept for defining the sensitivity of an operator-valued quantity with respect to variations of its arguments in a neighborhood around their nominal values was shown in 1981 by Cacuci [38] to be provided by the 1st-order Gateaux- (G-) variation of the respective quantity, which is defined as follows:
for a real scalar and for arbitrary vectors of variations in a neighborhood around the nominal values. When the G-variation is linear in the variations, it can be written in terms of the first-order G-derivative of the respective quantity, evaluated at the nominal values.
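As a sketch in assumed notation (the paper’s own symbols are defined in [38]), for a response R depending on the state functions and features collected in a vector e ≜ (h, f), the first-order G-variation at the nominal point is the directional derivative:

```latex
% First-order Gateaux (G-) variation of R at the nominal point e^0
% in the direction of the variations \delta e (assumed notation):
\delta R\left(\mathbf{e}^{0};\delta\mathbf{e}\right)
  \triangleq \left.\frac{d}{d\varepsilon}\,
      R\left(\mathbf{e}^{0}+\varepsilon\,\delta\mathbf{e}\right)\right|_{\varepsilon=0}
  = \lim_{\varepsilon\to 0}
      \frac{R\left(\mathbf{e}^{0}+\varepsilon\,\delta\mathbf{e}\right)-R\left(\mathbf{e}^{0}\right)}{\varepsilon}.
```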
Applying the definition provided in Equation (7) to Equation (5) yields the following expression for the first-order G-variation of the response:
where the “direct effect term” arises directly from variations in the feature functions and is defined as follows:
and where the “indirect effect term”, which arises indirectly through the variations in the hidden state functions, is defined as follows:
The direct-effect term can be quantified using the nominal values, but the indirect-effect term can be quantified only after determining the variations in the state functions, which are caused by the variations in the features through the NIDE-F net defined in Equation (1).
The first-order relationship between the variations in the state functions and the variations in the features is obtained from the first-order G-variations of Equations (1) and (2), which are obtained, by definition, as follows:
Carrying out the operations indicated in Equations (11) and (12) yields the following NIDE-F net of Fredholm-type for the 1st-level variational function:
where:
The NIDE-F net represented by Equations (13) and (14) is called [27] the “1st-Level Variational Sensitivity System (1st-LVSS)” and its solution is called [27] the “1st-level variational function”. All of the quantities in Equations (13) and (14) are to be computed at the nominal parameter values, but the respective indication has not been explicitly shown, in order to simplify the notation.
It is important to note that the 1st-LVSS is linear in the variational function. Therefore, the 1st-LVSS represented by Equation (13) can be written in matrix-vector form as follows:
where the rectangular source matrix comprises as components the quantities defined in Equation (15), while the components of the square matrix are operators (algebraic, differential, integral) defined below:
Note that the 1st-LVSS would need to be solved anew for each variation in the feature functions in order to determine the corresponding 1st-level variational function, which is prohibitively expensive computationally if the number of feature functions is large. The need for repeatedly solving the 1st-LVSS can be avoided if the variational function could be eliminated from the expression of the indirect-effect term defined in Equation (10). This goal can be achieved [27] by expressing the right-side of Equation (10) in terms of the solutions of the “1st-Level Adjoint Sensitivity System (1st-LASS)” to be constructed next. The construction of this 1st-LASS will be performed in a Hilbert space comprising elements of the same form as the variational function, defined on the underlying time domain. This Hilbert space is endowed with an inner product of two such elements, defined as follows:
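The specific inner product depends on the structure of the hidden functions; a typical choice for two vector-valued elements u(t) and v(t) defined on the time interval [t₀, t_f] has the form sketched below (illustrative notation):

```latex
% Illustrative inner product of two N-component vector-valued functions
% u(t) and v(t) on the time interval [t_0, t_f]:
\left\langle \mathbf{u},\mathbf{v}\right\rangle
  \triangleq \sum_{i=1}^{N}\int_{t_{0}}^{t_{f}} u_{i}(t)\,v_{i}(t)\,dt .
```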
The next step is to construct the inner product of Equation (13) with a yet-undetermined vector-valued function, where the superscript “(1)” indicates “1st-Level”, to obtain the following relationship:
The terms appearing in Equation (20) are to be computed at the nominal values but the respective notation has been omitted for simplicity.
Using the definition of the adjoint operator in the Hilbert space introduced above, the term on the left-side of Equation (20) is integrated by parts and the order of summations is reversed to obtain the following relation:
where the starred operator denotes the formal adjoint of the original matrix-valued operator and where the bilinear concomitant is a scalar-valued quantity evaluated on the boundary, i.e., at the initial and/or final times. Note that the adjoint matrix-valued operator acts linearly on the 1st-level adjoint function. The “star” superscript (*) will be used in this work to denote “formal adjoint operator”.
It follows from Equations (20) and (21) that the following relation holds:
The term on the left-side of Equation (22) is now required to represent the indirect effect term defined in Equation (10) by imposing the following relation:
Using Equations (22) and (23) in Equation (10) yields the following expression for the indirect effect term:
The boundary conditions accompanying Equation (23) for the 1st-level adjoint function are now chosen at the initial and/or final time values so as to eliminate, from the bilinear concomitant, all unknown values of the 1st-level variational function which remain after implementing the boundary conditions provided in Equation (2). These boundary conditions for the 1st-level adjoint function can be represented in operator form as follows:
The Fredholm-like NIDE net represented by Equations (23) and (25) will be called the “1st-Level Adjoint Sensitivity System (1st-LASS)” and its solution will be called the “1st-level adjoint sensitivity function”. The 1st-LASS is solved using the nominal/optimal values for the parameters and for the forward functions, but this fact has not been explicitly indicated, in order to simplify the notation. Notably, the 1st-LASS is independent of any parameter variations, so it needs to be solved just once to obtain the 1st-level adjoint sensitivity function. The 1st-LASS is linear in the 1st-level adjoint function but is, in general, nonlinear in the forward functions.
Adding the result obtained in Equation (24) for the indirect-effect term to the result obtained in Equation (9) for the direct-effect term yields the following expression for the first-order G-differential of the response:
where each of the quantities thus identified denotes the first-order sensitivity of the response with respect to the corresponding component of the “feature” function. Each sensitivity is obtained by identifying the expression that multiplies the corresponding variation and can be represented formally in the following integral form:
These first-order sensitivity functions will subsequently be used for determining the exact expressions of the second-order sensitivities of the response with respect to the components of the feature function of model parameters.
In the following subsections, the detailed forms of the 1st-LASS are provided for first-order (n = 1) and second-order (n = 2) Fredholm-like NIDE, respectively.
2.1. First-Order Neural Integro-Differential Equations of Fredholm-Type (1st-NIDE-F)
The representation of the first-order neural integro-differential equations of Fredholm-type (1st-NIDE-F) is provided below, componentwise:
The typical boundary conditions, provided at the initial time (“encoder”), are as follows:
where the scalar initial values are known, albeit imprecisely, since they are considered to stem from experiments and/or computations. Equations (28) and (29) are customarily considered an “initial-value (NIDE-F) problem”, although the independent variable t could represent some other physical entity (e.g., space, energy, etc.) rather than time.
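For orientation, a generic system of this kind can be written schematically as follows (illustrative notation, not the paper’s exact Equation (28)): each hidden function evolves under a local term and a Fredholm-type integral term whose interval of integration spans the entire time domain:

```latex
% Schematic first-order Fredholm-type NIDE (assumed notation): hidden
% functions h_i(t), features f(theta), local dynamics g_i, kernel k_i:
\frac{d h_{i}(t)}{dt}
  = g_{i}\bigl[\mathbf{h}(t),\mathbf{f}(\boldsymbol{\theta});t\bigr]
  + \int_{t_{0}}^{t_{f}} k_{i}\bigl[\mathbf{h}(s),\mathbf{f}(\boldsymbol{\theta});t,s\bigr]\,ds,
\qquad h_{i}(t_{0}) = h_{i}^{0}, \quad i=1,\dots,N.
```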
The 1st-LVSS for the 1st-level variational function is obtained by G-differentiating Equations (28) and (29), and has the following particular forms of Equations (13) and (14) for n = 1:
where:
The 1st-LASS is constructed by using Equation (19) to form the inner product of Equation (30) with a yet-undetermined 1st-level adjoint function, to obtain the following relationship:
Examining the structure of the left-side of Equation (33) reveals that the bilinear concomitant will arise from the integration by parts of the first term on the left-side of Equation (33), which yields the following relation:
where the bilinear concomitant has, by definition, the following expression:
The second term on the left-side of Equation (33) will be recast in its “adjoint form” by reversing the order of summations so as to transform the inner product involving the 1st-level variational function into an inner product involving the 1st-level adjoint function, as follows:
The third term on the left-side of Equation (33) is now recast in its “adjoint form” by reversing the order of summations and integrations so as to transform the inner product involving the 1st-level variational function into an inner product involving the 1st-level adjoint function, as follows:
The fourth term on the left-side of Equation (33) will be recast in its “adjoint form” by reversing the order of summations and integrations so as to transform the inner product involving the 1st-level variational function into an inner product involving the 1st-level adjoint function, as follows:
Using the results obtained in Equations (34)–(38) in the left-side of Equation (33) yields the following relation:
The relation in Equation (39) is rearranged as follows:
The term on the right-side of Equation (40) is now required to represent the “indirect-effect” term defined in Equation (10), which is achieved by requiring the components of the 1st-level adjoint function to satisfy the following system of first-order NIDE-F equations:
The relation obtained in Equation (41) is the explicit form of the relation provided in Equation (23) for the particular case n = 1, i.e., when considering first-order neural integro-differential equations of Fredholm-type (1st-NIDE-F).
The unknown final-time values of the 1st-level variational function in the bilinear concomitant in Equation (40) are eliminated by imposing the following final-time conditions:
It follows from Equations (33)–(42) and (31) that the indirect-effect term defined in Equation (10) has the following expression in terms of the 1st-level adjoint sensitivity function:
The first-order NIDE-F obtained in Equations (41) and (42) represents the explicit form, for the particular case n = 1, of the 1st-LASS represented in general by Equations (23) and (25). To obtain the 1st-level adjoint sensitivity function, the 1st-LASS is solved backwards in time (globally) using the nominal/optimal values for the parameters and for the forward functions, but this fact has not been explicitly indicated, in order to simplify the notation. Notably, the 1st-LASS is independent of any parameter variations, so it needs to be solved just once to obtain the 1st-level adjoint sensitivity function. The 1st-LASS is linear in the 1st-level adjoint function but is, in general, nonlinear in the forward functions.
Using the results obtained in Equations (43) and (9) in Equation (8) yields the following expression for the G-variation of the response, which is seen to be linear in the variations of the model’s feature functions (induced by variations in the model’s primary parameters) and in the variations of the encoder’s initial conditions:
The expression in Equation (44) is to be satisfied at the nominal/optimal values for the respective model parameters, but this fact has not been indicated explicitly in order to simplify the notation.
Identifying in Equation (44) the expressions that multiply the variations in the initial conditions yields the following expressions for the decoder response sensitivities with respect to the encoder’s initial conditions:
It is apparent from Equation (45) that these sensitivities are functionals of the form predicted in Equation (27). It is also apparent from Equation (45) that each such sensitivity is proportional to the value of the respective component of the 1st-level adjoint function evaluated at the initial time. This relation provides an independent mechanism for verifying the correctness of solving the 1st-LASS backwards in time, from the final time to the initial time, since the sensitivities with respect to the initial conditions can be computed independently of the 1st-LASS by using finite differences of appropriately high order in conjunction with known variations of the initial conditions and the correspondingly induced variations in the decoder response. Special attention needs to be devoted, however, to ensuring that the respective finite-difference formula is accurate, which may require several trials with different values chosen for the variations.
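To make this verification mechanism concrete, the following minimal sketch (a hypothetical scalar toy model, not the NIDE-F net of this work; all names and numerical values are illustrative) solves a scalar Fredholm-type IDE forward in time, solves the corresponding adjoint equation backwards in time, and checks the adjoint-computed sensitivities, including the proportionality of the initial-condition sensitivity to the adjoint function at the initial time, against central finite differences:

```python
# Minimal sketch (hypothetical toy model, illustrative names throughout):
# forward IDE:   u'(t) = -a*u(t) + b*I_u,  I_u = \int_0^T u(s) ds,  u(0) = u0
# response:      R = \int_0^T w(t) u(t) dt   (decoder-like functional)
# adjoint:      -psi'(t) + a*psi(t) - b*I_psi = w(t),  psi(T) = 0  (1st-LASS analogue)
# sensitivities: dR/da = -\int psi*u dt,  dR/db = (\int psi dt)(\int u dt),  dR/du0 = psi(0)
import numpy as np

T, n = 1.0, 2001
t = np.linspace(0.0, T, n)
dt = t[1] - t[0]
a, b, u0 = 2.0, 0.5, 1.0
w = np.ones(n)                      # weight function defining the response

def trap(y):
    """Trapezoidal quadrature on the uniform grid t."""
    return dt * (y.sum() - 0.5 * (y[0] + y[-1]))

def solve_forward(a, b, u0, iters=60):
    """Fixed-point iteration on the scalar integral term; explicit midpoint marching."""
    u = np.full(n, u0)
    for _ in range(iters):
        I_u = trap(u)               # current estimate of \int_0^T u ds
        v = np.empty(n); v[0] = u0
        for k in range(n - 1):
            um = v[k] + 0.5 * dt * (-a * v[k] + b * I_u)
            v[k + 1] = v[k] + dt * (-a * um + b * I_u)
        u = v
    return u

def solve_adjoint(a, b, iters=60):
    """Backward-in-time solve of psi' = a*psi - b*I_psi - w, with psi(T) = 0."""
    psi = np.zeros(n)
    for _ in range(iters):
        I_p = trap(psi)
        p = np.empty(n); p[-1] = 0.0
        for k in range(n - 1, 0, -1):
            pm = p[k] - 0.5 * dt * (a * p[k] - b * I_p - w[k])
            p[k - 1] = p[k] - dt * (a * pm - b * I_p - w[k])
        psi = p
    return psi

u, psi = solve_forward(a, b, u0), solve_adjoint(a, b)
dR_da, dR_db, dR_du0 = -trap(psi * u), trap(psi) * trap(u), psi[0]

# Independent finite-difference verification (the check described in the text):
h = 1e-5
fd_a = (trap(w * solve_forward(a + h, b, u0)) - trap(w * solve_forward(a - h, b, u0))) / (2 * h)
fd_u0 = (trap(w * solve_forward(a, b, u0 + h)) - trap(w * solve_forward(a, b, u0 - h))) / (2 * h)
print(f"dR/da:  adjoint {dR_da:.6f} vs FD {fd_a:.6f}")
print(f"dR/du0: psi(0) {dR_du0:.6f} vs FD {fd_u0:.6f}")
```

Note that a single adjoint solve yields all of the sensitivities in this sketch, whereas the finite-difference check requires two additional forward solves per parameter.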
It also follows from Equations (44) and (32) that the sensitivities of the response with respect to the components of the feature function have the following expressions, written in the form of Equation (27):
where
The subscript “1” indicates that the respective quantity refers to a “first-order” NIDE-F net, while the superscript “(1)” indicates that this quantity refers to “first-order” sensitivities.
The sensitivities with respect to the primary model parameters can be obtained by using the result shown in Equation (46) together with the “chain rule” for differentiating compound functions, as follows:
When there are only primary model parameters (i.e., when there are no feature functions of model parameters), each feature coincides with a single parameter, and the expression obtained in Equation (46) directly yields the first-order sensitivities with respect to the primary parameters. In this case, as many integrals would need to be computed (using quadrature formulas) as there are primary parameters. In contradistinction, when features of parameters can be established, only as many integrals would need to be computed (using quadrature formulas) as there are feature functions; the sensitivities with respect to the model parameters would subsequently be obtained analytically by using the chain rule provided in Equation (48).
Occasionally, the boundary conditions may be provided through a measurement at the final-time boundary (“decoder”), as follows:
where the scalar final-time values are known, albeit imprecisely, since they are considered to stem from experiments and/or computations. In such a case, the determination of the first-order sensitivities of the response with respect to the components of the feature function follows the same steps as above, yielding the following results:
- (i)
The 1st-LASS will become an “initial-value problem” comprising Equation (41), subject not to the conditions shown in Equation (42), but to the following “initial conditions”:
- (ii)
The sensitivities of the response with respect to the components of the feature function will have the same formal expressions as in Equation (46), but the components of the 1st-level adjoint function will now be the solution of Equations (41) and (50).
- (iii)
The sensitivities of the response with respect to the boundary conditions imposed at the final time will have the following expressions:
2.2. Second-Order Neural Integro-Differential Equations of Fredholm-Type (2nd-NIDE-F)
The representation of the second-order neural integro-differential equations of Fredholm-type (2nd-NIDE-F) is provided below, componentwise:
There are several combinations of boundary conditions that can be provided, either for the hidden functions and/or for their first-order time-derivatives, at either the initial time (encoder) or the final time (decoder), or a combination thereof. For illustrative purposes, consider that the boundary conditions are as follows:
The 1st-LVSS is obtained by taking the G-variations of Equations (52) and (53) to obtain the following system, comprising the particular forms taken on for n = 2 by Equations (13) and (14), respectively:
where the following definitions hold for the respective component indices:
The 1st-LASS is constructed by using Equation (19) to form the inner product of Equation (54) with a yet-undetermined 1st-level adjoint function, to obtain the following relationship:
Examining the structure of the left-side of Equation (57) reveals that the bilinear concomitant will arise from the integration by parts of the first and third terms on the left-side of Equation (57), as follows:
where the bilinear concomitant has the following expression:
The remaining terms on the left-side of Equation (57) will be recast into their corresponding “adjoint forms” by using the results obtained in Equations (34)–(38). Using these results together with the results obtained in Equations (58) and (59) yields the following expression for the left-side of Equation (57):
Using Equation (58) and rearranging the terms on the right-side of Equation (60) yields the following relation:
The term on the right-side of Equation (61) is now required to represent the “indirect-effect” term defined in Equation (10), which is achieved by requiring the components of the 1st-level adjoint function to satisfy the following 1st-LASS:
The relation obtained in Equation (62) is the explicit form of the relation provided in Equation (23) for the particular case n = 2, i.e., when considering second-order neural integro-differential equations of Fredholm-type (2nd-NIDE-F).
The unknown values involving the 1st-level variational function in the bilinear concomitant defined in Equation (59) are eliminated by imposing the following conditions:
It follows from Equations (57)–(63) and (55) that the indirect-effect term defined in Equation (10) has the following expression in terms of the 1st-level adjoint sensitivity function:
where the residual boundary quantity contains the known terms remaining after having implemented the known boundary conditions given in Equations (55) and (63), and has the following explicit expression:
Using the results obtained in Equations (64), (65), (56) and (9) in Equation (8) yields the following expression for the G-variation of the response, which is seen to be linear in the variations of the feature functions and in the variations of the imposed boundary values:
The expression in Equation (66) is to be satisfied at the nominal/optimal values for the respective model parameters, but this fact has not been indicated explicitly in order to simplify the notation.
It also follows from Equations (66) and (56) that the sensitivities of the response with respect to the components of the feature function have the following expressions, written in the form of Equation (27):
where
The subscript “2” indicates that the respective quantity refers to a “second-order” NIDE-F net, while the superscript “(1)” indicates that this quantity refers to “first-order” sensitivities. As expected, the expression corresponding to the “second-order NIDE-F net” reduces to the expression corresponding to the “first-order NIDE-F net” when the former net reduces to the latter.
Identifying in Equation (66) the expressions that multiply the variations in the initial-time values yields the following expressions for the decoder response sensitivities with respect to the encoder’s initial-time conditions:
Identifying in Equation (66) the expressions that multiply the variations in the final-time values yields the following expressions for the decoder response sensitivities with respect to the final-time conditions:
If the boundary conditions imposed on the forward functions and/or on their first-order time-derivatives differ from the illustrative ones selected in Equation (53), then the corresponding boundary conditions for the 1st-level adjoint function would also differ from the ones shown in Equation (63), as would be expected. The components of the 1st-level adjoint function would consequently have different values; therefore, all of the first-order sensitivities would have values different from those computed using Equation (68), even though the formal mathematical expressions of the respective sensitivities would remain unchanged. Of course, the sensitivities with respect to the boundary values would have expressions that differ from those in Equations (69) and (70), respectively, if the boundary conditions in Equation (53), and consequently those in Equation (63), were different, since the residual bilinear concomitant would have an expression different from that shown in Equation (65).
3. Second-Order Features Adjoint Sensitivity Analysis Methodology for Neural Integro-Differential Equations of Fredholm-Type (2nd-FASAM-NIDE-F)
The second-order sensitivities of the response defined in Equation (5) will be computed by conceptually using their basic definitions as the “first-order sensitivities of the first-order sensitivities”. Recall that the generic expression of the first-order sensitivities of the response with respect to the components of the feature function is provided in Equation (46). It follows that the second-order sensitivities of the response with respect to the components of the feature function will be provided by the first-order G-differentials of the respective first-order sensitivities, which are obtained, by definition, as follows:
where the indirect-effect term comprises all dependencies on the vectors of variations in the state functions and in the 1st-level adjoint sensitivity functions around their respective nominal values, which are computed at the nominal parameter values. This indirect-effect term is defined as follows:
The variational function appearing in this indirect-effect term is the solution of the system of equations obtained by G-differentiating the 1st-LASS defined in Equations (23) and (25), which is obtained, by definition, as follows:
Carrying out the operations indicated in Equations (73) and (74) yields the following relations:
For subsequent derivations, it is convenient to represent the relations in Equation (75) in matrix-vector form, as follows:
where
As indicated by Equation (78), the variational functions are the solutions of the system of matrix equations obtained by concatenating the 1st-LVSS defined by Equations (14) and (16) with Equations (77) and (78). The concatenated system thus obtained will be called the “2nd-Level Variational Sensitivity System (2nd-LVSS)” and has the block-matrix form provided below:
To distinguish block-matrices from block-vectors, two bold capital letters have been used (and will henceforth be used) to denote block-matrices, as in the case of the “second-level variational matrix”; the “2nd-level” is indicated by the superscript “(2)”. The argument “2”, which appears in the list of arguments of this matrix, indicates that it is a block-matrix comprising four submatrices, each having the dimensions of the matrix underlying the 1st-LVSS. The structure of this block-matrix is provided below:
The argument “2” which appears in the list of arguments of the source vector and of the “variational vector” in Equation (80) indicates that each of these vectors is a 2-block column vector, each block comprising a column-vector having the dimension of the original state-function vector; these two vectors are defined as follows:
The 2-block vector of boundary conditions is defined as follows:
The 2-block column vector in Equation (81) represents the concatenated boundary/initial conditions provided in Equations (14) and (77), evaluated at the nominal parameter values. The argument “2” in the expression in Equation (81) indicates that this expression is a two-block column vector comprising two vectors, each of which has the dimension of the original state-function vector and all of whose components are zero-valued.
The need for solving the 2nd-LVSS is circumvented by deriving an alternative expression for the indirect-effect term defined in Equation (72), in which the 2nd-level variational function is replaced by a 2nd-level adjoint function that is independent of variations in the model parameters and state functions. This 2nd-level adjoint function will be the solution of a 2nd-Level Adjoint Sensitivity System (2nd-LASS), which will be constructed by using the same principles as those employed for deriving the 1st-LASS. The 2nd-LASS is constructed in a Hilbert space comprising block-vectors having the same structure as the 2nd-level variational function. This Hilbert space is endowed with the following inner product of two such block-vectors:
The inner product defined in Equation (85) will be used to construct the 2nd-Level Adjoint Sensitivity System (2nd-LASS) for a 2nd-level adjoint function by implementing the following sequence of steps, which are conceptually similar to those implemented in Section 2 for constructing the 1st-FASAM-NIDE-F methodology:
- 1
Using Equation (85), construct the inner product of the yet-undetermined 2nd-level adjoint function with Equation (80) to obtain the following relation:
- 2
Use the definition of the operator adjoint to the 2nd-level variational matrix in the Hilbert space to transform the inner product on the left-side of Equation (86) as follows:
where the corresponding bilinear concomitant is evaluated on the domain’s boundary, at the nominal values for the parameters and respective state functions, and where the starred block-matrix operator denotes the formal adjoint of the 2nd-level variational matrix, comprising four block-matrices and having the following block-matrix structure:
- 3
Require the inner product on the right-side of Equation (87) to represent the indirect-effect term defined in Equation (72) by imposing the following relation:
where
Since the source-term on the right-side of Equation (89) is a distinct quantity for each index labeling the first-order sensitivities, this index has been added to the list of arguments of the 2nd-level adjoint function in order to emphasize that a distinct such function corresponds to each index. Of course, the adjoint operator that acts on the 2nd-level adjoint function is independent of this index and could, in principle, be inverted just once and stored for subsequent repeated applications to the index-dependent source terms, thereby computing the corresponding 2nd-level adjoint functions.
- 4
The definition of the 2nd-level adjoint function is completed by requiring it to satisfy adjoint boundary/initial conditions represented in operator form as follows:
The boundary/initial conditions represented by Equation (91) are determined by imposing the following requirements:
- (a)
they must be independent of unknown values of the 2nd-level variational function;
- (b)
the substitution of the boundary and/or initial conditions represented by Equations (81) and (91) into the expression of the bilinear concomitant must cause all terms containing unknown boundary/initial values of the 2nd-level variational function to vanish.
The NIDE-net comprising Equations (89) and (91) is called the “2nd-Level Adjoint Sensitivity System (2nd-LASS)” and its solution is called the “2nd-level adjoint sensitivity function”. The unique properties of the 2nd-LASS will be highlighted in the sequel.
Using in Equation (72) the relations defining the 2nd-LASS together with the 2nd-LVSS and the relation provided in Equation (87) yields the following alternative expression for the indirect-effect term, involving the 2nd-level adjoint sensitivity function instead of the 2nd-level variational function:
where the residual quantity denotes known (non-zero) boundary terms which may not have vanished after having used the boundary and/or initial conditions represented by Equations (81) and (91).
Inserting the expression obtained in Equation (92) into Equation (71) yields the following expression:
The expressions of the second-order sensitivities of the response with respect to the components of the feature function are obtained by performing the following sequence of operations:
- (i)
Use Equation (84) to recast the second term on the right-side of Equation (93) as follows:
- (ii)
Recall that the source quantities appearing in the first block were defined in Equation (15), while those appearing in the second block were defined in Equation (76). Insert these expressions into Equation (94) to obtain the following relation:
- (iii)
Insert into Equation (93) the equivalent expression obtained in Equation (95) and subsequently identify the quantities that multiply the variations in the feature functions, to obtain the following expressions for the second-order sensitivities:
It is important to note that the 2nd-LASS is independent of parameter variations and of variations in the respective state functions. It is also important to note that the block-matrix adjoint operator is independent of the index labeling the first-order sensitivities; only the source-term depends on this index. Therefore, the same solver can be used to invert the adjoint operator when solving the 2nd-LASS numerically for each index-dependent source, thereby obtaining the corresponding 2nd-level adjoint function. Computationally, it would be most efficient to store, if possible, the inverse of the adjoint operator, so that it can be applied directly to the corresponding source term for each index, thereby obtaining the corresponding 2nd-level adjoint function.
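The factorize-once/solve-many strategy described above can be sketched as follows for a generic discretized adjoint operator (the matrix A and the source columns below are hypothetical stand-ins, not the operators of this work):

```python
# Sketch of the "invert once, apply to many sources" strategy for a 2nd-LASS-like
# system: factorize the (hypothetical) discretized adjoint operator A once, then
# back-substitute for each index-dependent source term q_i.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
N = 200                                             # size of the discretized operator
A = np.eye(N) + 0.1 * rng.standard_normal((N, N))   # stand-in adjoint operator
Q = rng.standard_normal((N, 10))                    # one column per source term q_i

lu_piv = lu_factor(A)                               # single "large-scale" factorization
Psi = np.column_stack([lu_solve(lu_piv, Q[:, i]) for i in range(Q.shape[1])])
# Psi[:, i] is the (discretized) 2nd-level adjoint function for source q_i.
```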
Since the adjoint matrix is block-diagonal, solving the 2nd-LASS is equivalent to solving two 1st-LASS-type systems with two different source terms. Thus, the “solvers” and the computer program used for solving the 1st-LASS can also be used for solving the 2nd-LASS. The 2nd-LASS was designated as the “second-level” rather than the “second-order” adjoint sensitivity system, since the 2nd-LASS does not involve any explicit 2nd-order G-derivatives of the operators underlying the original system, but involves the inversion of the same operators that need to be inverted for solving the 1st-LASS.
If the 2nd-LASS is solved once for each first-order sensitivity, the 2nd-order mixed sensitivities will be computed twice, in two different ways, in terms of two distinct 2nd-level adjoint functions. Consequently, the symmetry property of the mixed second-order sensitivities provides an intrinsic (numerical) verification that the 1st-level adjoint function and the components of the 2nd-level adjoint functions are computed accurately.
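In illustrative notation, the intrinsic verification mentioned above exploits the symmetry of the mixed second-order sensitivities:

```latex
% Symmetry of the mixed second-order sensitivities (illustrative notation):
% the value obtained using the 2nd-level adjoint function for index i must
% agree with the value obtained independently using the function for index j.
\frac{\partial^{2} R}{\partial f_{i}\,\partial f_{j}}
  \;=\;
\frac{\partial^{2} R}{\partial f_{j}\,\partial f_{i}},
\qquad i,j = 1,\dots,TF .
```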
The second-order sensitivities of the decoder-response with respect to the optimal weights/parameters are obtained analytically by using the chain rule in conjunction with the expressions obtained in Equations (46) and (96), as follows: