Jeffreys Divergence and Generalized Fisher Information Measures on Fokker–Planck Space–Time Random Field

In this paper, we derive the Jeffreys divergence, generalized Fisher divergence, and the corresponding De Bruijn identities for space–time random fields. First, we establish the connection between the Jeffreys divergence and the generalized Fisher information of a single space–time random field with respect to the time and space variables. Furthermore, we obtain the Jeffreys divergence between two space–time random fields generated by different parameters under the same Fokker–Planck equations. Then, we find the identities between the partial derivatives of the Jeffreys divergence with respect to the space–time variables and the generalized Fisher divergence; these are also known as the De Bruijn identities. Finally, we present three examples of Fokker–Planck equations on space–time random fields, identify their density functions, and derive their Jeffreys divergence, generalized Fisher information, generalized Fisher divergence, and the corresponding De Bruijn identities.


Introduction
Information entropy and Fisher information are quantities that measure random information, and entropy divergence is derived from information entropy to measure the difference between two probability distributions. Formally, we can construct straightforward definitions of entropy divergence and Fisher information for a space-time random field founded on the classical definitions. The density function in these definitions can be obtained in many different ways. In this paper, the density function of a space-time random field is obtained from Fokker-Planck equations. The traditional Fokker-Planck equation is a partial differential equation that describes the probability density function of a random process [1]; it describes how the density function evolves in time. However, the Fokker-Planck equations for random fields, especially for space-time random fields, do not yet possess a definitive form. The classical equation needs to be generalized because the variable changes from time to space-time.
In this paper, we mainly obtain the relation between the Jeffreys divergence and the generalized Fisher information measure for space-time random fields generated by Fokker-Planck equations. Jeffreys divergence is a symmetric entropy divergence generalized from the Kullback-Leibler divergence (KL divergence). It is a measure in information theory and statistics that evaluates the difference between anticipated and actual probability distributions. However, if there is no overlap between the two distributions, the result is infinite, which is a limitation of this approach. To prevent infinite results, we examine how the Jeffreys divergence relates to the generalized Fisher information for a space-time random field with slight variations in the space-time parameters.
Moreover, the classical De Bruijn identity describes the relationship between the differential entropy and the Fisher information of the Gaussian channel [2], and it can be generalized to other cases [3][4][5][6][7]. Building on these works and following their ideas, we obtain De Bruijn identities on the Jeffreys divergence and generalized Fisher information of space-time random fields whose density functions satisfy Fokker-Planck equations.

Kramers-Moyal Expansion and Fokker-Planck Equation
In the literature on stochastic processes, the Kramers-Moyal expansion refers to a Taylor series of the master equation, named after Kramers and Moyal [28,29]. It is an infinite-order partial differential equation

∂p(u, t)/∂t = Σ_{n=1}^∞ (−∂/∂u)^n [D^(n)(u, t) p(u, t)],

where p(u, t) is the density function and D^(n)(u, t) = (1/n!) ∫_R (u′ − u)^n W(u′|u, t) du′ is the n-th-order conditional moment. Here, W(u′|u, t) is the transition probability rate. The Fokker-Planck equation is obtained by keeping only the first two terms of the Kramers-Moyal expansion. In statistical mechanics, the Fokker-Planck equation is usually used to describe the time evolution of the probability density function of the velocity of a particle under the influence of drag forces and random forces, as in the famous Brownian motion, and this equation is commonly employed for determining the density function of an Itô stochastic differential equation [1].
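As a numerical illustration of the first two Kramers-Moyal coefficients (a sketch added here, not part of the original derivation; the Ornstein-Uhlenbeck drift and diffusion are our choices), the n = 1, 2 conditional moments can be estimated from one-step increments of a simulated diffusion:

```python
import numpy as np

rng = np.random.default_rng(0)

def km_coefficients(a, b, u, dt=1e-2, n_samples=400_000):
    """Estimate the first two Kramers-Moyal coefficients at state u,
    D1 = lim E[dX]/dt and D2 = lim E[dX^2]/(2 dt),
    from one Euler-Maruyama step of dX = a(u) dt + sqrt(b(u)) dW."""
    du = a(u) * dt + np.sqrt(b(u) * dt) * rng.standard_normal(n_samples)
    D1 = du.mean() / dt
    D2 = (du ** 2).mean() / (2 * dt)
    return D1, D2

# Ornstein-Uhlenbeck drift a(u) = -u and diffusion b(u) = 1,
# so D1(u) = -u and D2(u) = 1/2 up to O(dt) bias and sampling noise
D1, D2 = km_coefficients(lambda u: -u, lambda u: 1.0, u=0.5)
```

Truncating the expansion after these two coefficients yields exactly the Fokker-Planck equation discussed below.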

Differential Entropy and De Bruijn Identity
The entropy of a continuous distribution was proposed by Shannon in 1948 and is known as differential entropy [30]: h(X) = −∫ p(u) log p(u) du, where h(•) represents the differential entropy and p(•) is the probability density function of X. However, differential entropy is not easy to calculate and seldom exists in closed form. There are related studies on the entropy of stochastic processes and continuous systems [31][32][33][34]. Consider a classical one-dimensional Gaussian channel model Y_t = X + √t G, where X is the input signal, G is standard Gaussian noise, t ≥ 0 is the strength, and Y_t is the output. The density of Y_t satisfies the Fokker-Planck equation ∂_t p(u; t) = (1/2) ∂²_u p(u; t). Furthermore, the differential entropy of Y_t can be calculated, and its derivative with respect to t is

d/dt h(Y_t) = (1/2) FI(Y_t),

where FI(Y_t) is the Fisher information of Y_t. This equation is the De Bruijn identity. It connects the differential entropy h(•) and the Fisher information FI(•), showing that they are different aspects of the concept of "information".
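The classical identity can be checked numerically in the Gaussian case, where all quantities are available in closed form (an added sketch under the assumption X ~ N(0, σ²), so that Y_t ~ N(0, σ² + t)):

```python
import numpy as np

def entropy_gauss(var):
    # differential entropy of N(0, var)
    return 0.5 * np.log(2 * np.pi * np.e * var)

sigma2, t, dt = 1.0, 0.5, 1e-6
# Y_t = X + sqrt(t) G with X ~ N(0, sigma2), so Y_t ~ N(0, sigma2 + t)
dh_dt = (entropy_gauss(sigma2 + t + dt)
         - entropy_gauss(sigma2 + t - dt)) / (2 * dt)   # centered difference
fisher = 1.0 / (sigma2 + t)   # Fisher information of N(0, sigma2 + t)
# De Bruijn identity: d/dt h(Y_t) = (1/2) FI(Y_t)
```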

Entropy Divergence
In information theory and statistics, an entropy divergence is a statistical distance generated from information entropy to measure the difference between two probability distributions. Various divergences are generated by information entropy, such as the Kullback-Leibler divergence [35], Jeffreys divergence [36], Jensen-Shannon divergence [37], and Rényi divergence [38]. These measures are applied in a variety of fields, such as finance, economics, biology, signal processing, pattern recognition, and machine learning [39][40][41][42][43][44][45][46][47][48][49]. In this paper, we mainly focus on the Jeffreys divergence of two distributions, formed as

JD(P, Q) = ∫ [p(u) − q(u)] log [p(u)/q(u)] dµ(u),

where µ is a measure of u.
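For two equal-variance Gaussians, the Jeffreys divergence has the closed form (µ1 − µ2)²/σ², which direct numerical integration of the definition reproduces (an illustrative sketch; the Gaussian choice is ours):

```python
import numpy as np

def gauss(u, mu, var):
    return np.exp(-(u - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

u = np.linspace(-20, 20, 400_001)
p, q = gauss(u, 0.0, 1.0), gauss(u, 1.0, 1.0)

kl_pq = np.trapz(p * np.log(p / q), u)          # KL(P || Q)
kl_qp = np.trapz(q * np.log(q / p), u)          # KL(Q || P)
jeffreys = np.trapz((p - q) * np.log(p / q), u)
# JD = KL(P||Q) + KL(Q||P); for equal-variance Gaussians JD = (mu1-mu2)^2/var
```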

Notations and Assumptions
In this paper, we use the subsequent notations and definitions:

• The probability density functions of P and Q are denoted as p and q. For all u ∈ R, p(u; t, x) is the density value at (t, x) of X and q(u; s, y) is the density value at (s, y) of Y.

• Unless there are specific restrictions on the ranges of the variables, we suppose that our density functions p(u; t, x) and q(u; s, y) belong to the class of functions that are twice partially differentiable with respect to u and once with respect to (t, x) or (s, y), respectively.

• For a vector x ∈ R^d, x^(k) denotes a vector that differs from x only in the k-th coordinate.

Definitions
To obtain the generalized De Bruijn identities between Jeffreys divergence and Fisher divergence, we need to introduce some new definitions and propositions.
The primary and most important measure of information is the Kullback-Leibler divergence for random fields. Definition 1 is easily obtained as follows.
Definition 1. The Kullback-Leibler divergence between two space-time random fields X(t, x) and Y(s, y), (t, x), (s, y) ∈ R+ × R^d, with density functions p(u; t, x) and q(u; s, y), is defined as

KL(P(t, x) ‖ Q(s, y)) = ∫_R p(u; t, x) log [p(u; t, x)/q(u; s, y)] du. (11)

Similar to the classical Kullback-Leibler divergence, the Kullback-Leibler divergence on random fields is not symmetric, i.e., in general

KL(P(t, x) ‖ Q(s, y)) ≠ KL(Q(s, y) ‖ P(t, x)). (12)

Following the classical definition of the Jeffreys divergence of two random variables, we mainly consider the Jeffreys divergence for random fields in this paper.
Definition 2. The Jeffreys divergence between space-time random fields X(t, x) and Y(s, y), (t, x), (s, y) ∈ R+ × R^d, with density functions p(u; t, x) and q(u; s, y), is defined as

JD(P(t, x), Q(s, y)) = KL(P(t, x) ‖ Q(s, y)) + KL(Q(s, y) ‖ P(t, x)) = ∫_R [p(u; t, x) − q(u; s, y)] log [p(u; t, x)/q(u; s, y)] du. (13)

Here, we replace "‖" with "," in the distortion measure to emphasize the symmetric property.
Another significant measure of information is Fisher information. In this paper, we consider the generalized Fisher information of a space-time random field.

Definition 3. The generalized Fisher information of the space-time random field X(t, x), (t, x) ∈ R+ × R^d, with density function p(u; t, x), defined by a nonnegative function f(•), is formed as

FI_f(P(t, x)) = ∫_R f(u) p(u; t, x) [∂_u log p(u; t, x)]² du. (14)

In the case where f is equal to 1, FI_1(P(t, x)) represents the typical Fisher information. In addition to Equation (14), there are similar forms of generalized Fisher information on the time and space variables,

FI_f^t(P(t, x)) = ∫_R f(u) p(u; t, x) [∂_t log p(u; t, x)]² du (15)

and

FI_f^{x_k}(P(t, x)) = ∫_R f(u) p(u; t, x) [∂_{x_k} log p(u; t, x)]² du. (16)

Obviously, (15) and (16) are generalized Fisher information on the space-time variables. Regarding the generalized Fisher information (14), we can state the following simple proposition.

Proposition 1. For an arbitrary positive continuous function f(•), suppose the generalized Fisher information of the independent continuous random variables X and Y is well defined, where p_X(u) represents the probability density of X. Then, we have the generalized Fisher information inequality

1/FI_f(X + Y) ≥ 1/FI_f(X) + 1/FI_f(Y);

when f ≡ 1, FI_1(X) represents the Fisher information in the standard case.
Proof. Denote Z = X + Y, and let p_X, p_Y, and p_Z represent the densities, i.e., p_Z(z) = ∫_R p_X(x) p_Y(z − x) dx, with derivative p′_Z(z) = ∫_R p′_X(x) p_Y(z − x) dx. If p_X, p_Y, and p_Z never vanish, then p′_Z(z)/p_Z(z) = E[p′_X(X)/p_X(X) | Z = z] is the conditional expectation of p′_X(X)/p_X(X) for given z. Similarly, we can obtain p′_Z(z)/p_Z(z) = E[p′_Y(Y)/p_Y(Y) | Z = z], and for all µ, λ ∈ R, we also find that

(µ + λ) p′_Z(z)/p_Z(z) = E[µ p′_X(X)/p_X(X) + λ p′_Y(Y)/p_Y(Y) | Z = z].

Then, by the conditional Cauchy-Schwarz inequality, we have

(µ + λ)² [p′_Z(z)/p_Z(z)]² ≤ E[(µ p′_X(X)/p_X(X) + λ p′_Y(Y)/p_Y(Y))² | Z = z],

with equality only if the right-hand side is constant with probability 1 whenever z = x + y. Averaging both sides over the distribution of Z with the weight f, we obtain

(µ + λ)² FI_f(Z) ≤ µ² FI_f(X) + λ² FI_f(Y).

Let µ = 1/FI_f(X) and λ = 1/FI_f(Y); we obtain 1/FI_f(Z) ≥ 1/FI_f(X) + 1/FI_f(Y). □

According to Definition 3, we can obtain related definitions based on the generalized Fisher information measure.
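Proposition 1 reduces, for f ≡ 1, to the classical Stam-type inequality 1/FI(X + Y) ≥ 1/FI(X) + 1/FI(Y), with equality for Gaussians. A quick numerical check of the f ≡ 1 Gaussian case (an added sketch; the variance choices are ours):

```python
import numpy as np

def fisher_info(p, du):
    """Numerical Fisher information FI_1 = integral of (p')^2 / p du."""
    dp = np.gradient(p, du)
    mask = p > 1e-12            # ignore vanishing tails
    return float(np.sum(dp[mask] ** 2 / p[mask]) * du)

u = np.linspace(-30.0, 30.0, 6001)
du = u[1] - u[0]
pX = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)   # N(0,1): FI = 1
pY = np.exp(-u ** 2 / 4) / np.sqrt(4 * np.pi)   # N(0,2): FI = 1/2
pZ = np.convolve(pX, pY, mode="same") * du      # Z = X + Y ~ N(0,3): FI = 1/3

stam_lhs = 1.0 / fisher_info(pZ, du)            # 1 / FI(X + Y)
stam_rhs = 1.0 / fisher_info(pX, du) + 1.0 / fisher_info(pY, du)
# Stam inequality: lhs >= rhs, with equality in the Gaussian case
```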

Definition 4.
The generalized cross-Fisher information for space-time random fields X(t, x) and Y(s, y), (t, x), (s, y) ∈ R+ × R^d, with density functions p(u; t, x) and q(u; s, y), defined by the nonnegative function f(•), is defined as (30). Similar to the concept of cross-entropy, it is easy to verify that (30) is symmetric in P and Q.

Definition 5. The generalized Fisher divergence for space-time random fields X(t, x) and Y(s, y), for (t, x), (s, y) ∈ R+ × R^d, with density functions p(u; t, x) and q(u; s, y), defined by a nonnegative function f(•), is defined as

FD_f(P(t, x) ‖ Q(s, y)) = ∫_R f(u) p(u; t, x) [∂_u log p(u; t, x) − ∂_u log q(u; s, y)]² du. (31)

In particular, when f ≡ 1, FD_1(P(t, x) ‖ Q(s, y)) represents the typical Fisher divergence.
Obviously, the generalized Fisher divergence between two random fields is not a symmetric measure of information. We need a new formula that expands on (31) in order to achieve symmetry.

Definition 6. The generalized Fisher divergence for space-time random fields X(t, x) and Y(s, y), (t, x), (s, y) ∈ R+ × R^d, with density functions p(u; t, x) and q(u; s, y), defined by nonnegative functions f(•) and g(•), is defined as (32). In particular, if f equals g, the generalized Fisher divergence for random fields using a single function is denoted as FD_(f,f)(P(t, x) ‖ Q(s, y)).
In general, FD_(f,g)(P(t, x) ‖ Q(s, y)) is asymmetric with respect to P and Q. If we suppose that f and g are functions related only to P and Q, respectively, i.e., f = T(P) and g = T(Q), where T is an operator, then the generalized Fisher divergence FD_(f,g)(P(t, x) ‖ Q(s, y)) can be rewritten as (35), which is easily seen to be symmetric. In this case, we call (35) the symmetric Fisher divergence for random fields generated by the operator T. Using an elementary identity for A, B, a, b ∈ R, (37) can then be rewritten in an equivalent form.

Lemma 1 (Kramers-Moyal expansion [28,29]). Suppose that the random process X(t) has moments of every order; then, the probability density function p(u, t) satisfies the Kramers-Moyal expansion

∂p(u, t)/∂t = Σ_{n=1}^∞ (−∂/∂u)^n [D^(n)(u, t) p(u, t)],

where D^(n)(u, t) = (1/n!) ∫_R (u′ − u)^n W(u′|u, t) du′ is the n-th-order conditional moment and W(u′|u, t) is the transition probability rate.
Lemma 2 (Pawula theorem [50,51]). If the limits defining the conditional moments of the random process X(t) exist for all n ∈ N+, and the limit value equals 0 for some even n, then the limit values are 0 for all n ≥ 3.
The Pawula theorem states that there are only three possible cases in the Kramers-Moyal expansion: (1) The Kramers-Moyal expansion is truncated at n = 1, meaning that the process is deterministic; (2) The Kramers-Moyal expansion stops at n = 2, with the resulting equation being the Fokker-Planck equation, and describes diffusion processes; (3) The Kramers-Moyal expansion contains all the terms up to n = ∞.
In this paper, we only focus on the case of the Fokker-Planck equation.

Main Results and Proofs
In this section, we establish the Fokker-Planck equations for continuous space-time random fields. Additionally, we present the theorem relating the Jeffreys divergence and the Fisher information, as well as the De Bruijn identities connecting the Jeffreys divergence and the Fisher divergence.
Theorem 1. The probability density function p(u; t, x) of the continuous space-time random field X(t, x), u ∈ R, (t, x) ∈ R+ × R^d, satisfies the following Fokker-Planck equations:

∂_t p(u; t, x) = −∂_u [a_0(u; t, x) p(u; t, x)] + (1/2) ∂²_u [b_0(u; t, x) p(u; t, x)],
∂_{x_k} p(u; t, x) = −∂_u [a_k(u; t, x) p(u; t, x)] + (1/2) ∂²_u [b_k(u; t, x) p(u; t, x)], k = 1, 2, · · · , d, (43)

where a_0, b_0 and a_k, b_k are the first- and second-order conditional moments with respect to t and x_k, respectively (Equations (44) and (45)), and e_k = (0, · · · , 0, 1, 0, · · · , 0) is the unit vector with 1 in the k-th coordinate.

Proof. For all ∆t ≠ 0, we can expand the difference of the density function in the time variable in terms of the n-th-order conditional moments, as in the Kramers-Moyal expansion. Letting ∆t → 0, the partial derivative of the density function with respect to t becomes an infinite-order expansion in ∂_u. The Pawula theorem implies that if the Kramers-Moyal expansion stops after the second term, we obtain the Fokker-Planck equation in the time variable t, with coefficients a_0(u; t, x) and b_0(u; t, x) given by the limits of the first and second conditional moments (44). Similarly, considering the increment ∆x_k of the spatial variable x_k, we obtain the Fokker-Planck equations in x_k with coefficients a_k(u; t, x) and b_k(u; t, x) (45). □

The Fokker-Planck equations above are partial differential equations that describe the probability density function of the space-time random field, similar to the classical Fokker-Planck equation. Solving a system of general Fokker-Planck equations proves to be challenging. Fortunately, in Section 4 we present three distinct categories of space-time random fields in detail, along with their corresponding Fokker-Planck equations, and deduce their probability density functions.
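A minimal finite-difference sketch (our own illustration, with a = 0 and b = 1, i.e., pure diffusion in the time variable) shows how a density evolving under an equation of the form (43) can be computed and compared against the exact Gaussian solution:

```python
import numpy as np

# Explicit (FTCS) solve of dp/dt = -d/du[a p] + (1/2) d^2/du^2[b p]
# with a = 0, b = 1, so the exact solution is N(0, v0) -> N(0, v0 + t).
u = np.linspace(-10, 10, 401)
du = u[1] - u[0]
v0, T = 0.1, 0.5
p = np.exp(-u ** 2 / (2 * v0)) / np.sqrt(2 * np.pi * v0)

dt = 0.2 * du ** 2                # stability requires dt <= du^2 / b
for _ in range(int(T / dt)):
    lap = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / du ** 2
    lap[0] = lap[-1] = 0.0        # boundary rows; tails are negligible here
    p = p + 0.5 * dt * lap

exact = np.exp(-u ** 2 / (2 * (v0 + T))) / np.sqrt(2 * np.pi * (v0 + T))
err = float(np.max(np.abs(p - exact)))
```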
Next, we examine the relationship between Jeffreys divergence and Fisher information in a single space-time random field when there are different time or spatial variables.
Theorem 2. Suppose that p(u; t, x) > 0 is a continuously differentiable density function of the space-time random field X(t, x), that the partial derivatives ∂_u p(u; t, x), ∂_t p(u; t, x), ∂_{x_k} p(u; t, x) are continuous bounded functions, and that the integrals in the proof are well defined. Then, for fixed x and all s ≠ t,

lim_{|s−t|→0} JD(P(t, x), P(s, x)) / (s − t)² = ∫_R [∂_t p(u; t, x)]² / p(u; t, x) du.

Similarly, for fixed t and all x′_k ≠ x_k, we can obtain the identity on the Jeffreys divergence and the Fisher information for the space coordinates:

lim_{|x_k − x′_k|→0} JD(P(t, x), P(t, x^(k))) / (x_k − x′_k)² = ∫_R [∂_{x_k} p(u; t, x)]² / p(u; t, x) du.

Theorem 2 states that, as the space-time variable difference approaches zero, the Fisher information of the space-time random field is the limit of the ratio of the Jeffreys divergence at different locations to the square of the space-time variable difference. It is noteworthy that Theorem 2 specifically addresses the Jeffreys divergence only in cases where a single space-time random field is evaluated at distinct space-time positions and the difference between the space-time variables approaches 0. This ensures that the Jeffreys divergence will not be infinite.
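The limit in Theorem 2 can be visualized with the density N(0, t) of Brownian motion, for which JD(N(0, t), N(0, s)) = (t − s)²/(2ts) and the Fisher information in t equals 1/(2t²) (an added numerical sketch; these closed forms are standard Gaussian computations):

```python
import numpy as np

def jd_centered_gauss(v1, v2):
    # Jeffreys divergence between N(0, v1) and N(0, v2):
    # JD = KL(p||q) + KL(q||p) = (v1 - v2)^2 / (2 v1 v2)
    return (v1 - v2) ** 2 / (2 * v1 * v2)

t = 2.0
fi_t = 1.0 / (2 * t ** 2)   # Fisher information of N(0, t) w.r.t. t

# JD at nearby time points, divided by the squared time difference
ratios = [jd_centered_gauss(t, t + h) / h ** 2 for h in (1e-1, 1e-2, 1e-3)]
# the ratios approach fi_t as h -> 0
```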
Theorem 3. Suppose that p(u; t, x) and q(u; t, x) are continuously differentiable density functions of the space-time random fields X(t, x) and Y(t, x), (t, x) ∈ R+ × R^d, satisfying condition (60), which requires the terms involving a_k, b_k, p, q, and log(q/p) that arise from integration by parts to vanish, where a_k, b_k are the coefficients in (44) and (45). Then, the Jeffreys divergence JD(P(t, x), Q(t, x)) satisfies the generalized De Bruijn identities (61); here, we omit (u; t, x) in the integrals for convenience.
Proof. By Definition 2, we have JD(P(t, x), Q(t, x)) = ∫_R (p − q) log(p/q) du, where p := p(u; t, x) and q := q(u; t, x) are the density functions of X(t, x) and Y(t, x); here, we omit (u; t, x). Differentiating under the integral sign and substituting the Fokker-Planck Equations (43) yields the identities.
Unlike Theorem 2, Theorem 3 focuses on the Jeffreys divergence between two separate space-time random fields X(t, x) and Y(t, x), both at the same position (t, x), and establishes the identities connecting the Jeffreys divergence and the Fisher divergence of X(t, x) and Y(t, x). These are known as the De Bruijn identities. To prevent the Jeffreys divergence from becoming infinite, the difference between the probability density functions of X(t, x) and Y(t, x) must be small. In Section 4, we obtain the Jeffreys divergence and the Fisher divergence using the same type of Fokker-Planck equations but with different parameters. This allows for the selection of only the appropriate parameters.

Three Fokker-Planck Random Fields and Their Corresponding Information Measures
In this section, we present three types of Fokker-Planck equations and derive their corresponding density functions and information measures: the Jeffreys divergence, the generalized Fisher information, and the Fisher divergence. With these quantities, the results corresponding to the applications of Theorems 2 and 3 are obtained. On the one hand, we calculate the ratio of the Jeffreys divergence to the square of the space-time variation for the same Fokker-Planck space-time random field at different space-time points and compare it with the generalized Fisher information. On the other hand, we derive the De Bruijn identities for the Jeffreys divergence and the generalized Fisher divergence from Fokker-Planck equations of the same type but with different parameters, evaluated at the same space-time location.
First, we present a theorem regarding a simple type of Fokker-Planck equations for the random field.

Theorem 4. Suppose the functions in the Fokker-Planck Equations (43) for the continuous random field X(t, x) are formulated as follows: a_0, a_k, b_0, and b_k are continuously differentiable functions independent of u, and two continuously differentiable functions α(t, x) and β(t, x) exist such that dα(t, x) = a_0(t, x)dt + Σ_k a_k(t, x)dx_k and dβ(t, x) = b_0(t, x)dt + Σ_k b_k(t, x)dx_k; the initial density function is p(u; t, x) = δ[u − u_0(x)] as prod(t, x) = 0. Then, the density function of X(t, x) is presented as follows:

p(u; t, x) = (2πβ(t, x))^{−1/2} exp(−[u − u_0(x) − α(t, x)]²/(2β(t, x))). (74)

Proof. It can be easily inferred that the Fokker-Planck equations are simple parabolic equations, and their solution can be obtained through the Fourier transform. Recalling that there are two functions α(t, x) and β(t, x) as above, we obtain the probability density function. □

Actually, numerous examples exist in which the Fokker-Planck equations comply with Theorem 4. Let B(t, x) be the (1 + d, 1) Brownian sheet [52,53], that is, a centered continuous Gaussian process that is indexed by (1 + d) real, positive parameters and takes its values in R. Its covariance structure is given by

E[B(t, x)B(s, y)] = (t ∧ s)(x_1 ∧ y_1) · · · (x_d ∧ y_d),

where (• ∧ •) represents the minimum of two numbers. We can easily obtain Var[B(t, x)] = prod(t, x), where prod(t, x) = t x_1 x_2 · · · x_d is the coordinate product of (t, x), and the density function is

p(u; t, x) = (2π prod(t, x))^{−1/2} exp(−u²/(2 prod(t, x))). (80)

Moreover, the Fokker-Planck equations of the Brownian sheet are of the parabolic form above, with the initial condition p(u; t, x) = δ(u) as prod(t, x) = 0.
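The covariance structure above can be checked by Monte Carlo (an added sketch, for d = 1): cumulative sums of independent N(0, ∆t ∆x) increments over a grid approximate a Brownian sheet, and the empirical variance at (t, x) approaches prod(t, x):

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample a (1+1)-parameter Brownian sheet B(t, x) on a grid:
# cumulative sums in both directions of independent N(0, dt*dx)
# increments give covariance (t ^ s)(x ^ y), hence Var[B(t, x)] = t*x.
n_paths, nt, nx = 10_000, 20, 20
dt, dx = 0.05, 0.05
inc = rng.standard_normal((n_paths, nt, nx)) * np.sqrt(dt * dx)
B = inc.cumsum(axis=1).cumsum(axis=2)

var_corner = B[:, -1, -1].var()     # (t, x) = (1.0, 1.0): Var = 1.0
var_mid = B[:, 9, 9].var()          # (t, x) = (0.5, 0.5): Var = 0.25
```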
Following the concept of constructing a Brownian bridge from Brownian motion [53], we refer to B*(t, x) = B(t, x) − prod(t, x) B(1, · · · , 1) as a Brownian sheet bridge on the cube (t, x) ∈ [0, 1] × [0, 1]^d, where B(t, x) represents the Brownian sheet. Obviously, B*(t, x) is Gaussian with E[B*(t, x)] = 0, and its covariance structure is

E[B*(t, x)B*(s, y)] = (t ∧ s)(x_1 ∧ y_1) · · · (x_d ∧ y_d) − prod(t, x) prod(s, y).

In particular, Var[B*(t, x)] = prod(t, x)[1 − prod(t, x)], and the density function of B*(t, x) is

p(u; t, x) = {2π prod(t, x)[1 − prod(t, x)]}^{−1/2} exp(−u²/{2 prod(t, x)[1 − prod(t, x)]}). (85)

In addition, the Fokker-Planck equations of the Brownian sheet bridge are of the same parabolic form, with the initial condition p(u; t, x) = δ(u) as prod(t, x) = 0, and we obtain the solution (85).
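The bridge construction can likewise be checked by simulation (an added sketch for d = 1; the explicit formula B*(t, x) = B(t, x) − prod(t, x) B(1, 1) below follows the Brownian-bridge analogy, and Var[B*] = prod(1 − prod) follows from the sheet covariance):

```python
import numpy as np

rng = np.random.default_rng(2)

# Brownian sheet on [0,1]^2 via cumulative sums of N(0, d*d) increments,
# then the bridge B*(t, x) = B(t, x) - prod(t, x) * B(1, 1),
# which has Var[B*(t, x)] = prod(t, x) * (1 - prod(t, x)).
n_paths, n = 10_000, 20
d = 1.0 / n
inc = rng.standard_normal((n_paths, n, n)) * d      # sqrt(d * d) = d
B = inc.cumsum(axis=1).cumsum(axis=2)

i = n // 2 - 1                                      # grid point t = x = 0.5
prod = 0.5 * 0.5
B_star = B[:, i, i] - prod * B[:, -1, -1]
var_emp = B_star.var()
# Var[B*(0.5, 0.5)] = 0.25 * (1 - 0.25) = 0.1875
```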
Combining the two probability density functions (80) and (85) yields their respective Jeffreys divergences and generalized De Bruijn identities. For two Gaussian densities of the form (74), the Jeffreys divergence at different space-time points can be computed in closed form, as can the Fisher divergence between P^(1) and P^(2) at the same space-time point. Substituting the density function of the Brownian sheet, we can easily obtain the Jeffreys divergence of the Brownian sheet at different space-time points as

JD(P^(1)(t, x), P^(1)(s, x)) = (t − s)²/(2ts), JD(P^(1)(t, x), P^(1)(t, x^(k))) = (x_k − x′_k)²/(2 x_k x′_k),

and the generalized Fisher information on the space-time variables (with f ≡ 1) is

FI^t(P^(1)(t, x)) = 1/(2t²), FI^{x_k}(P^(1)(t, x)) = 1/(2x_k²).

Then, the quotients of the Jeffreys divergence by the squared difference of the space-time variables are

JD(P^(1)(t, x), P^(1)(s, x))/(t − s)² = 1/(2ts), JD(P^(1)(t, x), P^(1)(t, x^(k)))/(x_k − x′_k)² = 1/(2 x_k x′_k),

and letting the space-time points approach each other, these quotients converge to the generalized Fisher information, which satisfies the conclusion of Theorem 2.
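The closed-form Jeffreys divergence between two centered Gaussian marginals, such as Brownian sheet densities at different space-time points, can be verified against the integral definition (an added sketch; the particular prod values are arbitrary choices):

```python
import numpy as np

def gauss(u, v):
    return np.exp(-u ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

# Brownian-sheet-style marginals N(0, prod(t, x)) and N(0, prod(s, y))
v1, v2 = 1.5 * 2.0, 1.2 * 2.0        # prod(t, x) = 3.0, prod(s, y) = 2.4
u = np.linspace(-40, 40, 200_001)
p, q = gauss(u, v1), gauss(u, v2)

jd_numeric = np.trapz((p - q) * np.log(p / q), u)
jd_closed = (v1 - v2) ** 2 / (2 * v1 * v2)   # closed form for centered Gaussians
```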
Similarly, we can obtain the Jeffreys divergence of the Brownian sheet bridge at different space-time points and the generalized Fisher information on the space-time variables; the quotients of the Jeffreys divergence by the squared difference of the space-time variables again converge to the generalized Fisher information, and the result also satisfies Theorem 2. Next, we evaluate the Jeffreys divergence between the density functions (80) and (85) at the same space-time points. It should be noted that the density function of the Brownian sheet bridge is defined on a bounded domain; therefore, we limit our analysis to the space-time region (t, x) ∈ (0, 1) × (0, 1)^d. The Jeffreys divergence between P^(1) and P^(2), and the Fisher divergence of the form (88) with its remainder terms, can be obtained in closed form from the two Gaussian densities, and from them we obtain the generalized De Bruijn identities. Next, we present two further categories of significant Fokker-Planck equations and provide pertinent illustrations for computing the Jeffreys divergence, Fisher information, and Fisher divergence. Theorem 5.
Suppose the functions in the Fokker-Planck Equations (43) for the continuous random field X(t, x) are formulated so that b_k are continuously differentiable functions independent of u and a continuously differentiable function β(t, x) exists with dβ(t, x) = b_0(t, x)dt + b_1(t, x)dx_1 + · · · + b_d(t, x)dx_d, such that the initial value is X(t, x) = 1 as prod(t, x) = 0 and the initial density function is p(u; t, x) = δ(u − 1) as prod(t, x) = 0. Then, the density function p(u; t, x) of X(t, x) is given by (103).

Proof. Under these conditions, it is easy to write down the Fokker-Planck equations. Taking the transformation v = log u, i.e., u = e^v, and writing p(v; t, x) = p(u(v); t, x), we obtain parabolic equations whose solution is Gaussian in v. Recalling that a continuously differentiable function β(t, x) exists as above, this enables the derivation of the probability density (103). □

Remark 1. In stochastic process theory, there is a correspondence between the Fokker-Planck equation and the Itô process. Specifically, if the Itô process is

dX_t = µ(X_t, t)dt + σ(X_t, t)dB_t,

then the corresponding Fokker-Planck equation is

∂_t p(u, t) = −∂_u [µ(u, t)p(u, t)] + (1/2) ∂²_u [σ²(u, t)p(u, t)],

where µ and σ represent the drift and diffusion, and B_t is the standard Brownian motion; equivalently, the Itô equation can be written with W_t = dB_t/dt, the white noise. If we consider the Itô processes corresponding to the Fokker-Planck equations of Theorem 5, we obtain Equation (113), in which W_k represents the space white noise with respect to x_k, k = 1, 2, · · · , d. Further, Equation (113) can be written in vector form, where ∇ represents the gradient operator and the product is taken element by element. Notice that each equation in (113) is similar in form to the geometric Brownian motion of stochastic process theory; accordingly, we call a space-time random field satisfying Equation (113) a geometric Brownian field.
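The correspondence in Remark 1 can be illustrated with ordinary geometric Brownian motion, the one-parameter analogue of the geometric Brownian field: Euler-Maruyama samples of dX = µX dt + σX dB should have log X_T approximately N((µ − σ²/2)T, σ²T) (an added sketch; the µ and σ values are our choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Geometric Brownian motion dX = mu*X dt + sigma*X dB, X_0 = 1.
# By the Fokker-Planck / Ito correspondence, its density is lognormal:
# log X_T ~ N((mu - sigma^2/2) T, sigma^2 T).
mu, sigma, T, n_steps, n_paths = 0.1, 0.4, 1.0, 500, 100_000
dt = T / n_steps
X = np.ones(n_paths)
for _ in range(n_steps):
    X *= 1.0 + mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

logX = np.log(X)
mean_th = (mu - 0.5 * sigma ** 2) * T   # theoretical mean of log X_T = 0.02
var_th = sigma ** 2 * T                 # theoretical variance = 0.16
```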
If we consider different β_3(t, x) and β_4(t, x) in the density function (103), we obtain density functions p^(3)(u; t, x) and p^(4)(u; t, x); from these, we obtain the Jeffreys divergence, the generalized Fisher information, the corresponding quotients, and the relation between them.

For the third category of equations, implementing the transformation u = sin v and defining p(v; t, x) = p(u(v); t, x), the equations can be restructured as parabolic equations with an explicit solution, and the density functions are obtained. Moreover, we obtain the Jeffreys divergence of a space-time random field at different space-time positions, and we obtain the approximation of the ratio of the Jeffreys divergence to the square of the space-time coordinate difference by the generalized Fisher information (54). Additionally, we use the Jeffreys divergence of two space-time random fields from Fokker-Planck equations of the same type but with different parameters to obtain the generalized De Bruijn identities (61). Finally, we give three examples of Fokker-Planck equations, with their solutions, to calculate the corresponding Jeffreys divergence, generalized Fisher information, and Fisher divergence, and to obtain the De Bruijn identities. These results encourage further research into the entropy divergence of space-time random fields, which advances the pertinent fields of information entropy, Fisher information, and De Bruijn identities.
Recall that a continuously differentiable function β(t, x) exists such that dβ(t, x) = b_0(t, x)dt + b_1(t, x)dx_1 + · · · + b_d(t, x)dx_d. Similar to the discussion in Remark 1, we can obtain the Itô processes corresponding to the Fokker-Planck equations in Theorem 6:

∂_t X(t, x) = −(3/2) b_0(t, x)X(t, x) + b_0(t, x)[1 − X²(t, x)]W_t,
∂_{x_k} X(t, x) = −(3/2) b_k(t, x)X(t, x) + b_k(t, x)[1 − X²(t, x)]W_k, k = 1, 2, · · · , d. (132)

In fact, this random field can be solved with a sinusoidal transformation, and the corresponding probability density function can be obtained. Although the random field (132) has not yet found an application scenario, it gives us ideas for constructing different forms of space-time random fields in the future.