
Entropy 2014, 16(2), 675-698; doi:10.3390/e16020675

Article
New Methods of Entropy-Robust Estimation for Randomized Models under Limited Data
Yuri Popkov 1,2,3,* and Alexey Popkov 1
1
Institute for Systems Analysis of Russian Academy of Sciences, 9 prospect 60-let Octyabrya, Moscow 117312, Russia; E-Mail: apopkov@isa.ru
2
Moscow Institute of Physics and Technology, 9 Institutskiy pereulok, g. Dolgoprudny, Moskovskaya oblast 141700, Russia
3
Higher School of Economics, 20 Myasnitskaya, Moscow 101000, Russia
*
Author to whom correspondence should be addressed; E-Mail: popkov@isa.ru.
Received: 17 October 2013; in revised form: 17 December 2013 / Accepted: 14 January 2014 /
Published: 23 January 2014

Abstract

The paper presents a new approach to restoring the characteristics of randomized models under small amounts of input and output data. The approach involves randomized static and dynamic models and estimates the probabilistic characteristics of their parameters. We consider static and dynamic models described by Volterra polynomials. Procedures of robust parametric and non-parametric estimation are constructed by exploiting the entropy concept based on the generalized informational Boltzmann and Fermi entropies.
Keywords:
randomized data models; robustness; entropy function and entropy functional; entropy functional variation; likelihood function and likelihood functional; Volterra polynomials; multiplicative algorithms; symbolic computing

1. Introduction

The problem of useful information retrieval (we comprehend it as parametric and nonparametric estimation based on real data) is a major one in modern science. Different scientific disciplines suggest numerous methods of solving this problem. Each method stems from certain hypotheses regarding the properties of data accumulated during the normal functioning of their source. Among advanced scientific disciplines in this field, we mention mathematical statistics [1–4], econometrics [5–7], financial mathematics [8,9], control theory [10,11] and others.

Methods developed within their frameworks rest upon two groups of fundamental hypotheses. The first one relates to models, whereas the other concerns data. Notably, models are supposed to have well-defined parameters (we call such models deterministic). Parameter values are unknown and cannot be measured directly.

The second group of hypotheses applies to data and plays an essential role. In fact, these hypotheses are stated in terms of the statistical properties of data arrays (e.g., a sufficient number of data arrays, a property of a sample from a universal set, normal probability density). In practice, it seems impossible to check such properties and assumptions in concrete problems.

The described situation happens in a class of problems, where the sizes of real data arrays are limited and data incorporate errors [5,6,12].

Consequently, the characteristics (parameters) of a model are estimated from a small amount of incompletely reliable data. Such data can be treated as random objects, and the estimated characteristics of a model then acquire the properties of random variables.

Therefore, one naturally arrives at the idea of considering model parameters as random quantities. This idea transforms the model with deterministic parameters to the model with random parameters. In the sequel, we adopt the term of a randomized model (RM). The characteristics of an RM include the probability density functions (pdfs) of the model parameters. Thus, one should estimate the pdfs of model parameters (not the estimates of their values) based on available data. Having such estimates at one’s disposal, one can apply an RM for:

  • constructing moment models (MMs), where appropriate moments of random parameters serve as the model parameters;

  • generating an ensemble of random vectors (ERV) of RM “output” with a pdf estimate (by the Monte Carlo method) and performing the statistical processing of the ensemble to form the desired numerical characteristics (including moment ones).
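The ensemble-generation usage can be sketched for a linear RM (Equation (2) with η = 0). The pdfs below are placeholders (uniform densities on the intervals that reappear in Section 7.1); in practice they would be the entropy-optimal pdf estimates constructed later in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear RM: v = X a + xi (Equation (2) with eta = 0).
X = np.array([[1.8, 2.1, 3.3, 2.0, 1.5],
              [4.1, 3.8, 3.0, 2.8, 1.9]])   # "input" measurements (s x n)

# Placeholder pdf estimates: uniform on the intervals A_i = [0, 10] and
# Xi_1 = [-3, 3], Xi_2 = [-6, 6] (the intervals used in Section 7.1).
def sample_parameters(size):
    return rng.uniform(0.0, 10.0, size=(size, X.shape[1]))

def sample_noise(size):
    lo, hi = np.array([-3.0, -6.0]), np.array([3.0, 6.0])
    return rng.uniform(lo, hi, size=(size, X.shape[0]))

N = 100_000
a = sample_parameters(N)            # ensemble of random parameter vectors
xi = sample_noise(N)
v = a @ X.T + xi                    # ensemble of RM "outputs" (ERV)

# Statistical processing: first and second moments of each output component.
print("mean:", v.mean(axis=0))
print("std: ", v.std(axis=0))
```

Any other numerical characteristic of the ensemble (quantiles, higher moments) is obtained the same way, by processing the rows of `v`.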

Both directions of RM usage enlarge appreciably the application domains of such models (especially, the ones with a high level of uncertainties). However, a researcher still faces a certain problem. How could the probability density functions of parameters be estimated in randomized models?

This paper proposes involving the informational entropy maximization principle on sets defined by the “input-output” measurements of an RM. The proposal originates from a couple of considerations discussed and formalized below.

The first consideration is connected with the generalized notion of likelihood, viz., with transition from likelihood functions to likelihood functionals and, subsequently, to informational entropy functionals.

The second consideration is based on methodological interpretations of the notion of informational entropy as a measure of uncertainty. Entropy maximization guarantees the best solutions under the maximal uncertainty. This line of reasoning was pioneered in [13]. Informational entropy characterizes uncertainty caused by random parameters of an RM and by measurement noises. The last property of informational entropy ensures the best estimates for maximally uncertain noises (in units of entropy). Hence, the pdf estimates resulting from informational entropy maximization can be viewed as robust. This interpretation varies from the classic conception of robustness suggested in [14].

2. Randomized Models

2.1. Static Objects

Consider a static parameterized object with a measurable “input” described by a matrix, X, of sizes (s × n) and a measurable “output” characterized by a vector, y, of length s. Here, s indicates the amount of input observations, and n stands for the number of the object’s parameters.

The relationship between the “input” and “output” (including measurement errors) is defined by the randomized static model (RSM):

v = F [ X + η , a ] + ξ

We adopt the following notation: F is a given s-dimensional vector-function; a designates a random n-dimensional vector of parameters with independent components $a_i$, $i = \overline{1,n}$, possessing values in the ranges $\mathcal{A}_i = [a_i^-, a_i^+]$, $i = \overline{1,n}$.

The linear modification of the RSM acquires the form:

v = ( X + η ) a + ξ

Measurements incorporate errors modeled by a matrix, η, of sizes (s × n) (“input” errors) and by a vector, ξ, of length s (“output” errors). The elements $\eta_{ji}$ ($j=\overline{1,s}$, $i=\overline{1,n}$) and the components $\xi_j$ ($j=\overline{1,s}$) represent independent random variables. Their values lie within the intervals $\mathcal{H}_{ji} = [\eta_{ji}^-, \eta_{ji}^+]$ and $\Xi_j = [\xi_j^-, \xi_j^+]$, respectively.

2.2. Dynamic Objects

Consider a discrete dynamic object with a finite “memory”, m. The object’s input, x[k], is measured precisely, while the output, y[k], is measured with an additive noise, ξ[k]. Here, $k \in \mathcal{T} = [m, m + s]$, and s corresponds to the number of measurements. Suppose that the process, ξ[k], is random with independent values.

The connection between the observed input and output (including output measurement errors) is defined by a randomized dynamic model (RDM). This model is described by the discrete functional Volterra polynomial of degree, R [15]:

$$v[k] = \sum_{h=1}^{R} \sum_{(n_1,\ldots,n_h)=0}^{m} \Big( w^{(h)}[n_1,\ldots,n_h] \prod_{r=1}^{h} x[k-n_r] \Big) + \xi[k]$$

Equality Equation (3) employs the weight functions (actually, impulse responses) w(h)[n1, . . . , nh]. These functions are random with independent ordinates belonging to the intervals:

$$\mathcal{W}^{(h)} = \big[\beta_-^{(h)} \exp\big(-\alpha_-^{(h)}(n_1+\cdots+n_h)\big),\; \beta_+^{(h)} \exp\big(-\alpha_+^{(h)}(n_1+\cdots+n_h)\big)\big]$$
where $\beta_\pm^{(h)}$, $\alpha_\pm^{(h)}$ are given constants.

The discrete dynamic model has the nonlinear expression Equation (3). Nevertheless, it can be linearized by the lexicographic ordering of the variables {n1, . . . , nh}. Renumber the resulting sets from zero to $t_h = (m+1)^h - 1$ according to a lexicographic rule and introduce a local index, $i^{(h)} \in [0, t_h]$, such that:

$$\{n_1,\ldots,n_h\} \Rightarrow i^{(h)}, \quad i^{(h)} \in [0, t_h]$$

According to the accepted numbering, adopt the following indexing of random parameters that correspond to the values of weight functions in Equation (3):

$$a^{(h)}_{i^{(h)}} \Rightarrow w^{(h)}[n_1,\ldots,n_h], \quad i^{(h)} \in [0, t_h]$$
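The lexicographic renumbering can be sketched in a few lines; the mapping below enumerates the multi-indices {n1, . . . , nh} with `itertools.product` (which yields them in lexicographic order) and reproduces $t_h = (m+1)^h - 1$:

```python
from itertools import product

def lexicographic_index_map(m, h):
    """Map each multi-index {n1,...,nh}, nr in [0, m], to a local index
    i in [0, t_h], t_h = (m+1)**h - 1, in lexicographic order."""
    return {idx: i for i, idx in enumerate(product(range(m + 1), repeat=h))}

m, h = 2, 2
table = lexicographic_index_map(m, h)
t_h = (m + 1) ** h - 1

print(t_h)                                           # 8
print(table[(0, 0)], table[(0, 1)], table[(2, 2)])   # 0 1 8
```

The inverse mapping (local index back to the multi-index) is just the list of keys in enumeration order.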

Construct the random vector:

a ( h ) = { a 0 ( h ) , , a t h ( h ) }

Components of the vector, a(h), belong to the ranges:

$$\mathcal{A}^{(h)}(i^{(h)}) = [a_-^{(h)}(i^{(h)}),\; a_+^{(h)}(i^{(h)})]$$

By virtue of Equation (4), we have:

$$a_-^{(h)}(i^{(h)}) = \beta_-^{(h)} \exp[-i^{(h)} \alpha_-^{(h)}], \qquad a_+^{(h)}(i^{(h)}) = \beta_+^{(h)} \exp[-i^{(h)} \alpha_+^{(h)}]$$

Similarly to Equation (5), introduce the lexicographic ordering of the variables {kn1, . . . , knh}, where k indicates a fixed parameter and indexes n1, . . . , nh possess values in the interval [0, m]. For each fixed k, renumber the resulting sets according to Equation (5):

$$\{k-n_1,\ldots,k-n_h\} \Rightarrow (k, i^{(h)})$$

Consequently, for fixed k in formula Equation (3), the lexicographically ordered products of the variables, x[knr], form the vector:

x ( h ) [ k ] = { x k , 0 ( h ) , , x k , t h ( h ) }
where:
x ( h ) [ k ] = x ( h ) [ m + k ] , k [ 0 , s ]

Therefore, equality Equation (3) can be rewritten as:

$$v[k] = \sum_{h=1}^{R} \langle a^{(h)}, x^{(h)}[k] \rangle + \xi[k], \quad v[k] = v[m+k],\; \xi[k] = \xi[m+k];\; k \in [0, s]$$

Consider the interval 𝒯 = [m, m + s], which corresponds to the measurements of the RDM input and output at instants m + k, k ∈ [0, s]. The observed output and noise are characterized by the vectors:

v = { v [ 0 ] , , v [ s ] } , ξ = { ξ [ 0 ] , , ξ [ s ] }

Define the input measurement matrices, whose rows are formed from vector Equation (10):

X ( h ) = [ x ( h ) [ k ] , k [ 0 , s ] ] , h [ 1 , R ]

Build the block matrix, X, of sizes [(s + 1) × u], where $u = \sum_{h=1}^{R}(t_h + 1)$:

X = [ X ( 1 ) , , X ( R ) ]

Finally, construct the block random vector of model parameters (of length u):

a = { a ( 1 ) , , a ( R ) }

Thus, dynamic model Equation (3) belongs to the class of randomized models with the random parameters, a, and the output noise, ξ. It can be reduced to the linear form:

v = X a + ξ

Structurally, this expression is analogous to linear static data model Equation (2). However, the input matrix embraces the nonlinearity and dynamics of model Equation (14).
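The reduction to the linear form can be sketched directly (hypothetical input samples; the full lexicographic ordering is used here, without merging symmetric terms of the weight functions):

```python
import numpy as np
from itertools import product

def volterra_regressor(x, k, m, h):
    """Lexicographically ordered products x[k-n1]*...*x[k-nh], nr in [0, m]:
    the vector x^(h)[k] of length (m+1)**h."""
    return np.array([np.prod([x[k - n] for n in ns])
                     for ns in product(range(m + 1), repeat=h)])

def block_input_matrix(x, m, R, s):
    """Block matrix X = [X^(1), ..., X^(R)] of sizes (s+1) x u,
    u = sum_h (t_h + 1), rows indexed by the instants m + k, k in [0, s]."""
    rows = [np.concatenate([volterra_regressor(x, m + k, m, h)
                            for h in range(1, R + 1)])
            for k in range(s + 1)]
    return np.vstack(rows)

x = [2.8, 1.9, 3.9, 9.3]          # hypothetical input samples x[0..3]
m, R, s = 2, 2, 1                 # memory 2, quadratic polynomial, 2 measurements
X = block_input_matrix(x, m, R, s)
u = sum((m + 1) ** h for h in range(1, R + 1))

print(X.shape)                    # (2, 12): (s+1) x u with u = 3 + 9
```

With such an X, simulating the RDM reduces to the matrix product v = X a + ξ of the linear form above.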

3. Probabilistic Characteristics of RMs

Numerical characteristics of RMs are understood as the probability density functions (alternatively, probability functions) of model parameters and noise components.

In the sequel, RMs whose random components are described by probability density functions are said to belong to the class RM-PWQ. By analogy, RMs described by the probabilities of lying within appropriate intervals form the class RM-pwq.

3.1. RM-PWQ

The parameters of an RM from this class and the measurement noises are continuous random variables. They take values in intervals on which probability density functions (pdfs) exist:

  • for the random parameters of an RSM:

    $$P(a) = \prod_{i=1}^{n} p_i(a_i), \quad a_i \in \mathcal{A}_i$$

  • for the random parameters of an RDM:

    $$P(a) = \prod_{h=1}^{R} \prod_{i=1}^{t_h} p_i^{(h)}(a_i^{(h)})$$

  • for the “input” measurement noises:

    $$W(\eta) = \prod_{j=1}^{s} \prod_{i=1}^{n} w_{ji}(\eta_{ji}), \quad \eta_{ji} \in \mathcal{H}_{ji}$$

  • for the “output” measurement noises:

    $$Q(\xi) = \prod_{j=1}^{s} q_j(\xi_j), \quad \xi_j \in \Xi_j$$

The above-mentioned pdfs of the random parameters and noises should be estimated using measurements of the RM “input” and “output” and a priori information on the pdfs. Formalization of such information involves the a priori pdfs of the parameters and noises, (P0(a), W0(η) and Q0(ξ)), as well as the classes of pdfs.

An RM generates an ensemble, 𝒱, of the random vectors, v, Equations (2) and (14). Measurements make up the vector, y, with measured components. Therefore, the estimation of pdfs lies in the forming of the vectors of the appropriate numerical characteristics of this ensemble. We employ moments of random components of the vector v:

$$m^{(k)} = \{\mathcal{M}(v_1^{(k)}), \ldots, \mathcal{M}(v_s^{(k)})\}$$
where k denotes the order of the moment,
$$\mathcal{M}(v_j^{(k)}) = \int_{a \in \mathcal{A},\, \eta \in \mathcal{H},\, \xi \in \Xi_j} (F_j[X+\eta, a] + \xi_j)^k\, P(a) W(\eta) Q(\xi)\, da\, d\eta\, d\xi, \quad j = \overline{1,s}$$

This paper utilizes the first moments, viz., the average values of components from the vector, v. For static model RSM-PWQ Equation (1), we obtain:

$$\mathcal{M}(v) = \bar v = \int_{a \in \mathcal{A},\, \eta \in \mathcal{H},\, \xi \in \Xi} (F[X+\eta, a] + \xi)\, P(a) W(\eta) Q(\xi)\, da\, d\eta\, d\xi$$

For the dynamic model RM-PQ Equation (14), we similarly have:

$$\mathcal{M}(v) = \bar v = X \int_{a \in \mathcal{A}} a\, P(a)\, da + \int_{\xi \in \Xi} \xi\, Q(\xi)\, d\xi$$

3.2. RM-pwq

The parameters of an RM from this class and the measurement noises are continuous random variables. Their belonging to an appropriate interval is characterized by some probability.

The parameters a1, . . . , an possess values within the intervals 𝒜1, . . . , 𝒜n with the probabilities p1, . . . , pn, respectively, where pi ∈ [0, 1], $i=\overline{1,n}$. By analogy, the elements, ηji, of the “input” measurement noise matrix take values from the intervals, $\mathcal{H}_{ji}$, with the probabilities, wji ∈ [0, 1]. Finally, the components, ξj, of the “output” measurement noises lie inside the intervals, Ξj, with the probabilities, qj ∈ [0, 1]. Denote by $p_i^0$, $w_{ji}^0$, $q_j^0$ the a priori values of the listed probabilities ($j=\overline{1,s}$, $i=\overline{1,n}$).

Just like models from the class RM-PWQ, RMs belonging to the class under consideration reproduce an ensemble, 𝒱, of the random vectors, v. This is done by generating the random parameters with the probabilities, p, the elements of the “input” measurement noise matrix with the probabilities, W, and the components of the “output” measurement noises with the probabilities, q.

As a numerical characteristic for the ensemble generated by an RM of this class, select the vector of the first quasi-moments (see [5]):

$$\tilde a = a^- + L_a \otimes p, \qquad \tilde\eta = \eta^- + L_\eta \otimes W, \qquad \tilde\xi = \xi^- + L_\xi \otimes q$$

In the previous formula, ⊗ stands for element-wise multiplication,

$$L_a = \mathrm{diag}\big[(a_i^+ - a_i^-)\,\big|\, i=\overline{1,n}\big]; \quad L_\xi = \mathrm{diag}\big[(\xi_j^+ - \xi_j^-)\,\big|\, j=\overline{1,s}\big]; \quad L_\eta = \big[(\eta_{ji}^+ - \eta_{ji}^-)\,\big|\, i=\overline{1,n},\, j=\overline{1,s}\big]$$

The transform Equation (23) can be interpreted as a substitution of the random parameters and noises by their “quasi-average” values.

By applying Equation (23) to Equation (1), we obtain the following expression for the first quasi-moment of the random vector, v, for RSM Equation (1):

$$\tilde v = F[X + (\eta^- + L_\eta \otimes W),\; a^- + L_a \otimes p] + \xi^- + L_\xi \otimes q$$

In the case of RDMs, the first quasi-moment formula of the random vector, v, Equation (10), acquires the form:

$$\tilde v = X(a^- + L_a \otimes p) + \xi^- + L_\xi \otimes q$$
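A small numeric illustration of the quasi-moment transform Equation (23) applied to a linear model, using the intervals that reappear in Section 7.1 (the probability values p, q are arbitrary placeholders here):

```python
import numpy as np

# Intervals from Section 7.1: a_i in [0, 10], xi_1 in [-3, 3], xi_2 in [-6, 6].
a_lo, a_hi = np.zeros(5), np.full(5, 10.0)
xi_lo, xi_hi = np.array([-3.0, -6.0]), np.array([3.0, 6.0])

L_a = np.diag(a_hi - a_lo)          # diag[(a_i^+ - a_i^-)]
L_xi = np.diag(xi_hi - xi_lo)       # diag[(xi_j^+ - xi_j^-)]

X = np.array([[1.8, 2.1, 3.3, 2.0, 1.5],
              [4.1, 3.8, 3.0, 2.8, 1.9]])

def quasi_moment_output(p, q):
    """First quasi-moment of v for the linear model:
    v~ = X (a^- + L_a p) + xi^- + L_xi q."""
    a_tilde = a_lo + L_a @ p        # "quasi-average" parameters
    xi_tilde = xi_lo + L_xi @ q     # "quasi-average" noise
    return X @ a_tilde + xi_tilde

p = np.full(5, 0.5)                 # placeholder interval probabilities
q = np.full(2, 0.5)
print(quasi_moment_output(p, q))    # [53.5, 78.0]
```

The substitution thus replaces every random component by its “quasi-average” value, after which the model output is a deterministic function of (p, q).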

Concluding this section, we emphasize a relevant aspect. For the class RM-PWQ, it is necessary to estimate probability density functions. For the class RM-pwq, one has to estimate vectors characterizing probability distributions.

4. Principles of Entropy-Robust Estimation

We propose to introduce a likelihood functional for the estimation of probability density functions for RM-PWQ (RM-PQ). So long as the model parameters and measurement noises are independent, their joint distribution function takes the form:

Φ ( a , η , ξ ) = P ( a ) W ( η ) Q ( ξ )

Suppose that we know the a priori density functions, P0(a), W0(η), Q0(ξ).

Following [4], specify the log-likelihood ratio (LLR) by:

$$\varphi(a, \eta, \xi) = \ln\frac{P(a)}{P^0(a)} + \ln\frac{W(\eta)}{W^0(\eta)} + \ln\frac{Q(\xi)}{Q^0(\xi)}$$

Hence, the LLR represents a nonrandom function of random arguments.

When the probabilities, p, W, q, of certain events (e.g., the parameters and the “input” and “output” noises falling within appropriate intervals) are estimated, one should define the LLR by:

φ ( p , W , q ) = ln p p 0 + ln W W 0 + ln q q 0
where p0, W0 and q0 are a priori probabilities.

Now, introduce the likelihood functional (the generalized informational Boltzmann entropy functional), (see Equation (29)), in the form:

$$\mathcal{H}[P(a), W(\eta), Q(\xi)] = -\int_{\mathcal{A}} P(a) \ln\frac{P(a)}{P^0(a)}\, da - \int_{\mathcal{H}} W(\eta) \ln\frac{W(\eta)}{W^0(\eta)}\, d\eta - \int_{\Xi} Q(\xi) \ln\frac{Q(\xi)}{Q^0(\xi)}\, d\xi$$

Obviously (see Equation (28)), this functional is the likelihood functional taken with the minus sign and represents the generalized Boltzmann entropy functional. It has numerous interpretations: it is treated as the “distance” between pdfs [16], as an uncertainty measure [12,13,17] or as a robustness measure [6] (the degree of invariance of the pdfs P(a), W(η), Q(ξ) with respect to observations). In this paper, we adopt the last interpretation of the entropy functional to construct estimates of the pdfs.

Define the likelihood function (the generalized informational Boltzmann entropy function) by (see [18]):

$$H(p, W, q) = -\sum_{i=1}^{n} p_i \ln\frac{p_i}{p_i^0} - \sum_{j=1}^{s}\sum_{i=1}^{n} w_{ji} \ln\frac{w_{ji}}{w_{ji}^0} - \sum_{j=1}^{s} q_j \ln\frac{q_j}{q_j^0}$$

The estimation quality for pdfs or probability vector components is characterized by the maximal value of the generalized informational Boltzmann entropy functional (or function). In the sequel, we distinguish between two types of estimates.

  • The $\mathcal{S}^k_{PWQ}$-robust entropy estimate (respectively, the $\mathcal{S}^k_{PQ}$-robust entropy estimate) of the pdfs of the parameters and measurement noises results from solving the problem:

    $$\mathcal{H}[P(a), W(\eta), Q(\xi)] \to \max$$
    provided that:
    • the probability density functions belong to the class:

      $$\mathcal{S} = \mathcal{P} \times \mathcal{W} \times \mathcal{Q}$$
      where 𝒫, 𝒲 and 𝒬 denote the classes of the pdfs of the parameters, “input” and “output” noises, respectively;

    • the balance condition holds for the (1/k)-th power of the k-th moment of v and the measured vector, y. For RSM-PWQ, this condition becomes:

      $$\big(\mathcal{M}\{F^{(k)}[(X+\eta), a] + \xi^{(k)}\}\big)^{(1/k)} = y$$
      For RDM-PQ, the balance condition takes the form:
      $$\big(\mathcal{M}\{X a + \xi\}\big)^{(1/k)} = y$$

  • The $\mathcal{S}^{\tilde 1}_{pwq}$-robust entropy estimate of the probability distributions (pds) of the model parameters and measurement noise components follows from solving the problem:

    H ( p , W , q ) max
    provided that:
    • the probability distributions belong to the class:

      $$\mathbb{S} = \mathbb{P} \times \mathbb{W} \times \mathbb{Q}$$
      where ℙ, 𝕎 and ℚ designate the classes of the pds of the parameters, “input” and “output” noises, respectively;

    • the balance condition holds true for the quasi-average vector, ṽ, and the measured vector, y:

      $$F[X + (\eta^- + L_\eta \otimes W),\; (a^- + L_a \otimes p)] + \xi^- + L_\xi \otimes q = y$$

5. Structural Properties of 𝒮 P Q 1-Estimates

5.1. Power-Type RSM-PQ

Consider the subclass of power-type RSM-PQ with the following nonlinearities:

$$v(t) = \sum_{h=1}^{R} \sum_{i=1}^{n} a_i^h\, x_i^{(h)}(t) + \xi(t)$$

The vector of parameters, a, is random with independent components (a1, . . . , an) possessing values in the intervals $\mathcal{A}_i = [a_i^-, a_i^+]$ ($i=\overline{1,n}$) with the pdfs p1(a1), . . . , pn(an).

The “input” and “output” are observed at instants t1, . . . , ts. The measured “input” is characterized by the set of R matrices:

X ( h ) = ( x 1 ( h ) ( t 1 ) x n ( h ) ( t 1 ) x 1 ( h ) ( t s ) x n ( h ) ( t s ) ) = ( x 11 ( h ) x 1 , n ( h ) x s , 1 ( h ) x s , n ( h ) ) , h [ 1 , R ]

Additionally, the observed “output” of the RM is characterized by the random vector v = {v(t1), . . . , v(ts)}.

Therefore, using observation results, we rewrite the RSM-PQ as:

v = h = 1 R X ( h ) a ( h ) + ξ

Here, $a^{(h)} = \{a_1^{(h)}, \ldots, a_n^{(h)}\}$, and ξ = {ξ(t1), . . . , ξ(ts)} = {ξ1, . . . , ξs} denotes the “output” measurement noise vector. It has independent components belonging to the intervals $\Xi_j = [\xi_j^-, \xi_j^+]$ ($j=\overline{1,s}$) with the pdfs q1(ξ1), . . . , qs(ξs). Suppose that a priori information is absent, i.e., P0(a) = Q0(ξ) = const.

To proceed, we analyze the 𝒮 P Q 1-robust entropy estimate of the pdfs P(a) = {p1(a1), . . . , pn(an)} and Q(ξ) = {q1(ξ1), . . . , qs(ξs)}. Moreover, we study some structural properties of the estimate.

Revert to problem Equations (32)–(34). Reexpress these as:

$$\mathcal{H}[P(a), Q(\xi)] = -\sum_{i=1}^{n} \int_{\mathcal{A}_i} p_i(a_i) \ln p_i(a_i)\, da_i - \sum_{j=1}^{s} \int_{\Xi_j} q_j(\xi_j) \ln q_j(\xi_j)\, d\xi_j \to \max$$
subject to the constraints:
$$D_i[p_i(a_i)] = 1 - \int_{\mathcal{A}_i} p_i(a_i)\, da_i = 0, \quad i=\overline{1,n}; \qquad T_j[q_j(\xi_j)] = 1 - \int_{\Xi_j} q_j(\xi_j)\, d\xi_j = 0, \quad j=\overline{1,s}$$
$$\Phi_j[p(a), q(\xi)] = \sum_{h=1}^{R} \sum_{i=1}^{n} x_{ji}^{(h)} \int_{\mathcal{A}_i} a_i^h\, p_i(a_i)\, da_i + \int_{\Xi_j} \xi_j\, q_j(\xi_j)\, d\xi_j - y_j = 0, \quad j=\overline{1,s}$$

The structural properties of the $\mathcal{S}^1_{PQ}$-estimate are understood as a certain class of the robust family of probability density functions of the model parameters and noise. The procedure of $\mathcal{S}^1_{PQ}$-estimation consists in solving problem Equations (32)–(34) for RSM-PWQ (alternatively, problem Equations (32), (33) and (35) for RDM-PQ). Both problems belong to the class of functional entropy-linear programming problems with equality constraints. The standard approach to such variational problems (see [19]) yields the following result. The entropy-optimal pdfs (in the class of continuously differentiable functions) of the parameters and noise components are defined by:

$$p_i^*(a_i) = \pi_i \exp\Big(-\sum_{h=1}^{R} \alpha_{ih} a_i^h\Big),\; i \in [1, n]; \qquad q_j^*(\xi_j) = \varkappa_j \exp(-\theta_j \xi_j),\; j \in [1, s]$$
where:
$$\alpha_{ih} = \sum_{j=1}^{s} \theta_j x_{ji}^{(h)}, \quad \pi_i = \exp(-1-\gamma_i); \quad \varkappa_j = \exp(-1-\omega_j)$$

In the last formulas, the Lagrange multipliers, γi and ωj, correspond to constraint Equation (43), while θj correspond to constraint Equation (44).
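For illustration, the exponential structure of the entropy-optimal pdf can be reproduced numerically. The multiplier values below are hypothetical; the normalizing constant $\pi_i$ is recovered from the condition that $p_i^*$ integrates to one over $\mathcal{A}_i$:

```python
import numpy as np

# Hypothetical Lagrange-multiplier values for one parameter a_i in A_i = [0, 10]
# of a quadratic RM (R = 2): p_i^*(a_i) ~ exp(-(alpha_i1 a_i + alpha_i2 a_i^2)).
alpha = [0.08, 0.01]
a_lo, a_hi = 0.0, 10.0

grid = np.linspace(a_lo, a_hi, 20001)
dx = grid[1] - grid[0]
unnorm = np.exp(-(alpha[0] * grid + alpha[1] * grid**2))

Z = np.sum(unnorm[:-1] + unnorm[1:]) * dx / 2     # trapezoidal normalizer
p_star = unnorm / Z                               # pi_i = 1/Z

check = np.sum(p_star[:-1] + p_star[1:]) * dx / 2
print(round(check, 9))                            # 1.0: a valid pdf on A_i
```

Changing the hypothetical multipliers reshapes the curve (from nearly uniform to sharply decaying) without leaving the exponential class, which is the structural property discussed above.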

Figures 1 and 4 demonstrate some examples of estimating the pdfs of the parameters. For a linear RM, the 𝒮 P Q 1-estimate always represents an exponential function (see Figure 1). The “input” and “output” measurements do not affect the structure of the estimate (yet, they change the shape of the functions).

For nonlinear RMs, the structure of the 𝒮 P Q 1-estimates varies depending on the “input” and “output” measurements. For instance, Figures 2 and 4 show the estimates of the pdfs for a quadratic, quadratic-linear and cubic RM-PQ.

5.2. Power-Type RM-PQ

Address the linear form of RM-PQ Equation (26). In this case, the problem of 𝒮 P Q 1-estimation of the pdfs for parameter Equation (16) and noise Equation (18) acquires the form:

$$\mathcal{H}[P(a), Q(\xi)] = -\int_{\mathcal{A}} P(a) \ln P(a)\, da - \int_{\Xi} Q(\xi) \ln Q(\xi)\, d\xi \to \max$$
subject to the constraints:
$$D[P(a)] = 1 - \int_{\mathcal{A}} P(a)\, da = 0; \qquad T[Q(\xi)] = 1 - \int_{\Xi} Q(\xi)\, d\xi = 0$$
$$\bar\Phi[P(a), Q(\xi)] = X \int_{\mathcal{A}} a\, P(a)\, da + \int_{\Xi} \xi\, Q(\xi)\, d\xi - y = 0$$

The vector, Φ̄[P(a), Q(ξ)], has the length of (s + 1) (the number of measurements).

Problem Equations (47)–(49) represent a modification of functional entropy-linear programming problem Equations (42)–(44), which involves the linear data model. To obtain the solution, again adopt the standard technique for such variational problems [19]. This yields the following entropy-optimal pdfs of the parameters and noises:

$$\tilde P(a) = \exp(-1 - \gamma - \langle \theta, X a \rangle), \qquad \tilde Q(\xi) = \exp(-1 - \omega - \langle \theta, \xi \rangle)$$

Here, γ, ω designate the Lagrange multipliers associated with constraint Equation (48). In addition, the vector, θ (whose length equals (s + 1)), comprises the Lagrange multipliers for constraint Equation (49).

Formula (50) indicates that the entropy-optimal pdfs of the parameters and noises belong to the exponential class, with the parameters representing the Lagrange multipliers. The latter quantities meet a system of equations generated by constraint Equations (48),(49).

6. Normalized and Interval 𝒮 p q 1 ˜-Estimates for Linear RSM-pq

Consider the subclass of linear RSM-pq with accurate “input” measurements:

$$\tilde v = X L_a p + L_\xi q + K(a^-, \xi^-)$$
where:
$$K(a^-, \xi^-) = X a^- + \xi^-$$

Denote by p0 and q0 the a priori probabilities of the parameters and noises, respectively.

Let us study the 𝒮 p q 1 ˜-estimates for some classes of the vectors, p, q.

Problem 1. Normalized probabilities: the 𝒮 p q 1 ˜-estimate results from solving the problem:

$$H(p, q) = -\sum_{i=1}^{n} p_i \ln\frac{p_i}{p_i^0} - \sum_{j=1}^{s} q_j \ln\frac{q_j}{q_j^0} \to \max$$
subject to the probability normalization conditions:
$$\sum_{i=1}^{n} p_i = 1, \qquad \sum_{j=1}^{s} q_j = 1$$
and the balance conditions for the real observations, yj, with the first quasi-moment of the output of RSM-pq:
$$\sum_{i=1}^{n} x_{ji} L_{a_i} p_i + L_{\xi_j} q_j + K_j = y_j, \quad j \in [1, s], \; s < n$$

This problem belongs to the class of entropy-linear programming problems [20]. Its solution employs the necessary conditions of optimality in terms of the Lagrange function:

$$L(p, q, \lambda, \varepsilon, \theta) = H(p, q) + \lambda\Big(1 - \sum_{i=1}^{n} p_i\Big) + \varepsilon\Big(1 - \sum_{j=1}^{s} q_j\Big) + \sum_{j=1}^{s} \theta_j \Big[y_j - \sum_{i=1}^{n} x_{ji} L_{a_i} p_i - L_{\xi_j} q_j - K_j\Big]$$

Consequently, we derive a system of equations determining the entropy-optimal probabilities, p*, q*:

$$0 \le p_i^*(\theta) = \frac{p_i^0 \exp\big(-\sum_{j=1}^{s} \theta_j x_{ji} L_{a_i}\big)}{\sum_{i=1}^{n} p_i^0 \exp\big(-\sum_{j=1}^{s} \theta_j x_{ji} L_{a_i}\big)} \le 1, \quad i=\overline{1,n}$$
$$0 \le q_j^*(\theta) = \frac{q_j^0 \exp(-\theta_j L_{\xi_j})}{\sum_{j=1}^{s} q_j^0 \exp(-\theta_j L_{\xi_j})} \le 1, \quad j=\overline{1,s}$$
and the corresponding Lagrange multipliers θ1, . . . , θs:
$$\Phi_j(\theta) \equiv \frac{y_j - K_j}{\sum_{i=1}^{n} x_{ji} L_{a_i} p_i^*(\theta) + L_{\xi_j} q_j^*(\theta)} = 1, \quad j=\overline{1,s}$$

Problem 2. The interval probabilities: the 𝒮 p q 1 ˜-estimate satisfies the following maximization problem for the generalized informational Fermi–Dirac entropy. The application of the generalized informational Fermi–Dirac entropy provides estimates lying in appropriate intervals [18]:

$$H(p, q) = -\sum_{i=1}^{n} \Big(p_i \ln\frac{p_i}{\varphi_i^0} + (1-p_i)\ln(1-p_i)\Big) - \sum_{j=1}^{s} \Big(q_j \ln\frac{q_j}{\phi_j^0} + (1-q_j)\ln(1-q_j)\Big) \to \max$$
subject to the interval probability constraints:
$$0 \le p_i \le 1, \quad i=\overline{1,n}; \qquad 0 \le q_j \le 1, \quad j \in [1, s], \; s < n$$
and the balance conditions for the real observations, yj, with the first quasi-moment of the output of RM-pq:
$$\sum_{i=1}^{n} x_{ji} L_{a_i} p_i + L_{\xi_j} q_j + K_j = y_j, \quad j=\overline{1,s}$$
where:
$$\varphi_i^0 = \frac{p_i^0}{1 - p_i^0}, \qquad \phi_j^0 = \frac{q_j^0}{1 - q_j^0}$$
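A useful sanity check: in the absence of balance constraints, the generalized Fermi–Dirac entropy attains its maximum exactly at the a priori probabilities, since the derivative of each p-summand, $\ln\big(\varphi_i^0 (1-p_i)/p_i\big)$, vanishes at $p_i = p_i^0$. A quick numeric confirmation with a hypothetical a priori value $p_i^0 = 0.3$:

```python
import numpy as np

p0 = 0.3
phi0 = p0 / (1.0 - p0)                    # varphi_i^0 from the formula above

p = np.linspace(1e-6, 1 - 1e-6, 200001)
H = -(p * np.log(p / phi0) + (1 - p) * np.log(1 - p))   # one p-summand of H

p_max = p[np.argmax(H)]
print(round(p_max, 3))                    # 0.3, i.e., p_i^0
```

This is the Fermi counterpart of the Boltzmann property used in Section 7 (where the unconstrained maximum sits at $0.36\, p_i^0$ instead).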

Lagrange’s method of multipliers gives the following system of equations for the estimates of the probabilities, p, q:

$$0 \le p_i^*(\theta) = \frac{p_i^0}{p_i^0 + (1-p_i^0)\exp\big(\sum_{j=1}^{s} \theta_j x_{ji} L_{a_i}\big)} \le 1, \quad i=\overline{1,n}$$
$$0 \le q_j^*(\theta) = \frac{q_j^0}{q_j^0 + (1-q_j^0)\exp(\theta_j L_{\xi_j})} \le 1, \quad j=\overline{1,s}$$
and the Lagrange multipliers θ1, . . . , θs:
$$\Phi_j(\theta) = \frac{y_j - K_j}{\sum_{i=1}^{n} x_{ji} L_{a_i} p_i^*(\theta) + L_{\xi_j} q_j^*(\theta)} = 1, \quad j=\overline{1,s}$$

So long as the estimates of p and q are expressed analytically through the Lagrange multipliers, θ, one should only solve the system of Equation (64). For this, it is possible to use the following multiplicative algorithm [20] for the exponential Lagrange multipliers zj = exp(−θj), j ∈ [1, s]:

$$z_j^{k+1} = z_j^k\, \Phi_j(z^k), \quad z_j^0 > 0, \quad j \in [1, s]$$

Under certain conditions, the estimates generated by Problems 1–2 differ in the values of some entropy (e.g., generalized Boltzmann entropy Equation (53)). Let us introduce the following notation:

  • the entropy:

    $$H(p, q) = H(g), \quad g = (p, q) \in R_+^{n+s}$$

  • the estimate, g 1 *, obtained for normalized probabilities (Problem 1), and the estimate, g 2 *, obtained for interval probabilities (Problem 2);

  • the absolute maximum of the entropy ĝ = arg max H(g);

  • the set:

    $$\Pi = \{g : 0 \le g \le 1\}, \qquad \tilde\Pi = \{g : \langle p, \mathbf{1} \rangle = 1,\; \langle q, \mathbf{1} \rangle = 1\}$$

Theorem. Assume that $\hat g \in \big(R_+^{(n+s)} \setminus \tilde\Pi\big)$.

Then:

$$H(g_1^*) \le H(g_2^*)$$
and the equality takes place iff g 1 * = g ^.

Proof. Entropy Equation (53) represents a strictly concave function with a unique absolute maximum at the point, ĝ. The value of the entropy at an arbitrary point, g, depends on the distance to the absolute maximum point. Denote by $\varrho(\hat g, g_1^*)$ and $\varrho(\hat g, g_2^*)$ the distances between the absolute maximum point, ĝ, and the points corresponding to the optimal estimates in Problems (53)–(55) and (59)–(62), respectively. By virtue of the premise above, expression Equation (67) and strict concavity, we have:

$$\varrho(\hat g, g_1^*) \ge \varrho(\hat g, g_2^*)$$

These distances coincide only if g 1 * = g ^.

7. Examples

7.1. Entropy Estimation of the Parameters of Linear RSM-pq under a Small Amount of Data

Consider a linear RM with five random parameters Equation (51), all belonging to the same interval 𝒜 = [0, 10], and with the noise vector ξ = {ξ1, ξ2}, whose components lie within the intervals Ξ1 = [−3, 3] and Ξ2 = [−6, 6]. There are two “output” measurements, y = {y1, y2}.

The reference model has fixed parameters a0 = {1, 2, 2, 4, 1}. The deviation from these values is described by the relative square error:

$$\varepsilon = \frac{\|a^0 - a\|}{\|a^0 + a\|}$$

The “input” measurement matrix makes up:

$$X = \begin{pmatrix} 1.8 & 2.1 & 3.3 & 2.0 & 1.5 \\ 4.1 & 3.8 & 3.0 & 2.8 & 1.9 \end{pmatrix}$$

The “output” measurement vector (distorted by noises from appropriate intervals) takes the form:

y = { 21.1 ; 32.8 }

The first quasi-moments for the parameters and noise are defined by:

$$\tilde a_i = 10\, p_i, \quad i=\overline{1,5}; \qquad \tilde\xi_1 = -3 + 6 q_1, \quad \tilde\xi_2 = -6 + 12 q_2$$

Substituting the last equalities into Equation (51) yields the following RM-pq:

$$T p + L q = \mathbf{1}$$
with the matrix:
$$T = X L_a = \begin{pmatrix} 0.75 & 0.87 & 1.37 & 0.83 & 0.62 \\ 1.06 & 0.98 & 0.77 & 0.72 & 0.45 \end{pmatrix}$$
the matrix:
$$L = L_\xi = \begin{pmatrix} 0.25 & 0 \\ 0 & 0.31 \end{pmatrix}$$
and the vector 1 = {1, 1}.

As a priori information, we have selected three scenarios for the a priori probabilities (see Tables 1 and 2).

Scenario AE corresponds to uniform a priori distributions of the parameters and noise. Next, Scenario BD reflects nonuniform a priori distributions of the parameters and noise. Finally, CE is the combined scenario (the parameters possess nonidentical a priori distributions, whereas the noise has uniform a priori distributions).

We study the 𝒮 p q 1 ˜-estimates corresponding to Problems 1–2.

Problem 1. Normalized probabilities.

$$H(p, q) = -\sum_{i=1}^{5} p_i \ln\frac{p_i}{p_i^0} - \sum_{j=1}^{2} q_j \ln\frac{q_j}{q_j^0} \to \max$$
$$\sum_{i=1}^{5} p_i = 1, \; p_i > 0; \qquad \sum_{j=1}^{2} q_j = 1, \; q_j > 0$$
$$0.75 p_1 + 0.87 p_2 + 1.37 p_3 + 0.83 p_4 + 0.62 p_5 + 0.25 q_1 = 1$$
$$1.06 p_1 + 0.98 p_2 + 0.77 p_3 + 0.72 p_4 + 0.45 p_5 + 0.31 q_2 = 1$$

Problem 2. Interval probabilities.

$$H(p, q) = -\sum_{i=1}^{5} p_i \ln\frac{p_i}{p_i^0} - \sum_{j=1}^{2} q_j \ln\frac{q_j}{q_j^0} \to \max$$
$$0 \le p \le 1, \qquad 0 \le q \le 1$$
$$0.75 p_1 + 0.87 p_2 + 1.37 p_3 + 0.83 p_4 + 0.62 p_5 + 0.25 q_1 = 1$$
$$1.06 p_1 + 0.98 p_2 + 0.77 p_3 + 0.72 p_4 + 0.45 p_5 + 0.31 q_2 = 1$$
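The multiplicative algorithm Equation (65) can be sketched on these data for the interval-probability case. The a priori probabilities below are a uniform-scenario assumption ($p_i^0 = q_j^0 = 0.5$), and a damped variant of the update (exponent 0.5 instead of 1) is used for numerical stability; this is an illustrative sketch under those assumptions, not the computation behind Tables 7 and 8:

```python
import numpy as np

# Data of Section 7.1: matrix T, diagonal of L, right-hand side 1.
T = np.array([[0.75, 0.87, 1.37, 0.83, 0.62],
              [1.06, 0.98, 0.77, 0.72, 0.45]])
Ldiag = np.array([0.25, 0.31])

p0 = np.full(5, 0.5)   # assumed uniform a priori interval probabilities
q0 = np.full(2, 0.5)

def p_star(theta):
    c = T.T @ theta                                   # sum_j theta_j T_ji
    return p0 / (p0 + (1 - p0) * np.exp(c))

def q_star(theta):
    return q0 / (q0 + (1 - q0) * np.exp(theta * Ldiag))

def balance(theta):
    return T @ p_star(theta) + Ldiag * q_star(theta)  # should equal 1

theta = np.zeros(2)
for _ in range(20000):
    Phi = 1.0 / balance(theta)          # Phi_j(theta); fixed point at Phi = 1
    theta -= 0.5 * np.log(Phi)          # z_j <- z_j * Phi_j^0.5, z_j = exp(-theta_j)

print(np.round(p_star(theta), 3), np.round(q_star(theta), 3))
print(np.round(balance(theta), 6))      # ~[1, 1]: balance conditions satisfied
```

Because the estimates p*, q* are analytic in θ, only the two multipliers θ1, θ2 are iterated, exactly as noted for system Equation (64).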

The entropy, H, attains the absolute maximum at the point $\hat p_i = 0.36\, p_i^0$, $\hat q_j = 0.36\, q_j^0$, i ∈ [1, 5], j ∈ [1, 2]. Tables 3 and 4 present the coordinates of the absolute maximum point for the above scenarios of the a priori probabilities (see Tables 1 and 2).
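The coefficient 0.36 can be checked directly: each summand $-p \ln(p/p^0)$ of the entropy H is maximized where its derivative $-\ln(p/p^0) - 1$ vanishes, i.e., at $p = p^0 e^{-1} \approx 0.368\, p^0$. A minimal numeric check with a hypothetical a priori value $p^0 = 0.2$:

```python
import math
import numpy as np

p0 = 0.2                                 # hypothetical a priori probability
p = np.linspace(1e-9, 1.0, 2000001)
H = -p * np.log(p / p0)                  # one summand of the entropy H(p, q)

p_hat = p[np.argmax(H)]
print(round(p_hat / p0, 3))              # 0.368 = 1/e, i.e., the "0.36" above
```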

Solutions to Problem 1 can be found in Tables 5 and 6.

Solutions to Problem 2 are shown by Tables 7 and 8.

The comparative analysis of these computations draws the following conclusions:

  • the estimates of the parameters derived for interval probabilities have a larger constrained maximum of the entropy than the ones derived for normalized probabilities (see Theorem 2);

  • the reference parameters and a priori probabilities appear interconnected: their “successful” choice (Scenario BD) leads to “better” approximation of the reference parameters in terms of the relative squared error than in the case of an “unsuccessful” choice (Scenario CD).

7.2. The Entropy Estimation of the pdfs of Second-Order RDM-PQ Characteristics

Consider the following RDM-PQ:

$$v[k] = \sum_{n_1=0}^{2} w^{(1)}[n_1]\, x[k-n_1] + \sum_{n_1, n_2 = 0}^{2} w^{(2)}[n_1, n_2]\, x[k-n_1]\, x[k-n_2] + \xi[k], \quad k \ge 2$$
with the random impulse response:
$$w^{(1)}[n_1] = w^{(2)}[n_1, n_2] = 0 \;\text{ if any of the indexes } n_1, n_2 > 2; \qquad w^{(2)}[n_1, n_2] = w^{(2)}[n_2, n_1]$$

Therefore, the dynamic RM-PQ in question incorporates nine parameters:

$$a_0 = a_0^{(1)} = w^{(1)}[0], \quad a_1 = a_1^{(1)} = w^{(1)}[1], \quad a_2 = a_2^{(1)} = w^{(1)}[2]$$
$$a_3 = a_0^{(2)} = w^{(2)}[0,0], \quad a_4 = a_1^{(2)} = w^{(2)}[0,1] + w^{(2)}[1,0]$$
$$a_5 = a_2^{(2)} = w^{(2)}[0,2] + w^{(2)}[2,0], \quad a_6 = a_3^{(2)} = w^{(2)}[1,1]$$
$$a_7 = a_4^{(2)} = w^{(2)}[1,2] + w^{(2)}[2,1], \quad a_8 = a_5^{(2)} = w^{(2)}[2,2]$$

The values of the constants in Equation (8) are collected in Table 9.

Table 10 lists the intervals of Equation (6) covering the components of the vector a.

We have two measurements of the “input” and the “output.” Construct the blocks X⁽¹⁾ and X⁽²⁾ according to Equation (11):

$$X^{(1)} = \begin{pmatrix} x[2] & x[1] & x[0] \\ x[3] & x[2] & x[1] \end{pmatrix} = \begin{pmatrix} x_{00} & x_{01} & x_{02} \\ x_{10} & x_{11} & x_{12} \end{pmatrix}$$

$$X^{(2)} = \begin{pmatrix} x^2[2] & x[2]x[1] & x[2]x[0] & x^2[1] & x[1]x[0] & x^2[0] \\ x^2[3] & x[3]x[2] & x[3]x[1] & x^2[2] & x[2]x[1] & x^2[1] \end{pmatrix} = \begin{pmatrix} x_{03} & x_{04} & x_{05} & x_{06} & x_{07} & x_{08} \\ x_{13} & x_{14} & x_{15} & x_{16} & x_{17} & x_{18} \end{pmatrix}$$

The matrix X = [X(1), X(2)] has the form:

$$X = \begin{pmatrix} 3.9 & 1.9 & 2.8 & 0.9 & 1.6 & 5.2 & 3.6 & 1.9 & 4.2 \\ 9.3 & 3.9 & 1.9 & 3.8 & 8.5 & 4.9 & 0.9 & 1.6 & 2.6 \end{pmatrix}$$
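The block construction can be sketched as follows. The input samples here are hypothetical placeholders (the data behind the numeric matrix above is not reproduced in the text); the sketch only illustrates how each measurement instant contributes one row of linear and quadratic features:

```python
import numpy as np

def build_blocks(x, k0=2, rows=2):
    # Build X(1) (linear lags) and X(2) (quadratic products), one row per
    # measurement instant k = k0, k0+1, ..., for a model with 3-tap memory.
    X1, X2 = [], []
    for k in range(k0, k0 + rows):
        lags = [x[k], x[k - 1], x[k - 2]]
        X1.append(lags)
        X2.append([lags[0]**2, lags[0] * lags[1], lags[0] * lags[2],
                   lags[1]**2, lags[1] * lags[2], lags[2]**2])
    return np.array(X1), np.array(X2)

# Hypothetical input samples x[0..3].
x = [0.7, 1.3, 2.1, 0.9]
X1, X2 = build_blocks(x)
X = np.hstack([X1, X2])   # the combined 2x9 block X = [X(1), X(2)]
assert X.shape == (2, 9)
```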

The noise components satisfy ξ[0] = ξ₀ ∈ Ξ₀ = [−3, 3] and ξ[1] = ξ₁ ∈ Ξ₁ = [−6, 6]. Additionally, the measured values of the “output” are y[2] = y₀ = 18.51 and y[3] = y₁ = 43.36.

The $\mathcal{S}_{PQ}^{1}$-estimate of the pdfs of the parameters, $p_j^*(a_j)$, j ∈ [0, 8], and of the pdfs of the noise, $q_0^*(\xi_0)$, $q_1^*(\xi_1)$, is defined by:

$$p_j^*(a_j) = \exp\Big[-1 - \gamma_j - \Big(\sum_{l=0}^{1} \theta_l x_{lj}\Big) a_j\Big], \quad j \in [0, 8]$$
$$q_i^*(\xi_i) = \exp\big[-1 - \omega_i - \theta_i \xi_i\big], \quad i \in [0, 1]$$

The Lagrange multipliers, γj, ωi and θl, meet the following equations:

$$\Gamma_j(\gamma, \theta) = \int_{a_j^-}^{a_j^+} \exp\Big[-1 - \gamma_j - \Big(\sum_{l=0}^{1} \theta_l x_{lj}\Big) a_j\Big]\, da_j = 1, \quad j \in [0, 8]$$
$$\Omega_i(\omega, \theta) = \int_{\xi_i^-}^{\xi_i^+} \exp\big[-1 - \omega_i - \theta_i \xi_i\big]\, d\xi_i = 1, \quad i \in [0, 1]$$
$$\Phi_i(\gamma, \omega, \theta) = \sum_{j=0}^{8} x_{ij} \int_{a_j^-}^{a_j^+} a_j \exp\Big[-1 - \gamma_j - \Big(\sum_{l=0}^{1} \theta_l x_{lj}\Big) a_j\Big]\, da_j + \int_{\xi_i^-}^{\xi_i^+} \xi_i \exp\big[-1 - \omega_i - \theta_i \xi_i\big]\, d\xi_i = y_i, \quad i \in [0, 1]$$

To solve these equations, we used MATLAB: symbolic transformations followed by the numerical solution of the nonlinear equations via the “trust-region dogleg” technique. The computed Lagrange multipliers are given in Tables 11–13.
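As a sanity check, the noise multipliers in Table 13 can be substituted back into the normalization conditions Ωᵢ(ω, θ) = 1, which admit a closed-form integral. A minimal sketch (assuming the exponential form exp[−1 − ωᵢ − θᵢξᵢ] for the noise pdfs, as reconstructed above):

```python
import math

# Lagrange multipliers from Table 13 and the noise intervals from the text.
omega = [26.6458, 14.7240]
theta = [9.9822, -2.7918]
xi_lo = [-3.0, -6.0]
xi_hi = [3.0, 6.0]

def Omega(i):
    # Closed-form value of the integral of exp(-1 - omega_i - theta_i * xi)
    # over [xi_lo_i, xi_hi_i].
    w, t = omega[i], theta[i]
    return math.exp(-1.0 - w) * (math.exp(-t * xi_lo[i]) - math.exp(-t * xi_hi[i])) / t

for i in (0, 1):
    # Both normalization conditions hold up to rounding of the tabulated values.
    assert abs(Omega(i) - 1.0) < 1e-2
```

That the tabulated (ωᵢ, θᵢ) pairs reproduce Ωᵢ ≈ 1 to within the four-decimal rounding supports the reconstructed sign convention of the exponential estimates.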

Figure 5 shows the curves of the probability density functions of the parameters, $p_j(a_j^{(1)})$, j ∈ [0, 2] (curves 0–2), and $p_j(a_j^{(2)})$, j ∈ [3, 8] (curves 3–8). Finally, Figure 6 presents the curves of the probability density functions of the noise, $q_0(\xi_0)$ and $q_1(\xi_1)$.

8. Conclusions

New methods of parametric (probabilities) and non-parametric (probability density functions) estimation of randomized model characteristics have been proposed. These methods are based on entropy functions or entropy functionals maximized under certain constraints. The obtained estimates can be interpreted as robust ones, since the entropy function is used for their calculation. The methods target problems where the data are limited and distorted by noise. It has been shown that the entropy-robust estimates of the probabilities and probability density functions belong to the exponential class.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cox, D.R.; Donnelly, C.A. Principles of Applied Statistics; Cambridge University Press: New York, NY, USA, 2011.
  2. Kendall, M.G.; Stuart, A. The Advanced Theory of Statistics: Inference and Relationship; Volume 2, Griffin & Co: London, UK, 1961.
  3. Harrell, F.E., Jr. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis; Springer Series in Statistics; Springer-Verlag: New York, NY, USA, 2001.
  4. Cramér, H. Mathematical Methods of Statistics (PMS-9); Volume 9, Princeton University Press: Princeton, NJ, USA, 1999.
  5. Golan, A.; Judge, G.; Miller, D. Maximum Entropy Econometrics: Robust Estimation with Limited Data; John Wiley & Sons: New York, NY, USA, 1996.
  6. Golan, A. Information and entropy econometrics—A review and synthesis. Found. Trends Econometr. 2006, 2, 1–145.
  7. Racine, J.; Maasoumi, E. A versatile and robust metric entropy test of time-reversibility, and other hypotheses. J. Econometr. 2007, 138, 547–567.
  8. Shiryaev, A.N. Essentials of Stochastic Finance: Facts, Models, Theory; World Scientific Publishing: River Edge, NJ, USA, 2000.
  9. Del Ruiz, M.C. A new approach to measure volatility in energy markets. Entropy 2012, 14, 74–91.
  10. Polyak, B.T. Robustness Analysis for Multilinear Perturbations. In Robustness of Dynamic Systems with Parameter Uncertainties; Mansour, M., Balemi, S., Truöl, W., Eds.; Birkhäuser: Basel, Switzerland, 1992; pp. 93–104.
  11. Lebiedz, D. Entropy-related extremum principles for model reduction of dissipative dynamical systems. Entropy 2010, 12, 706–719.
  12. Gupta, M.; Srivastava, S. Parametric Bayesian estimation of differential entropy and relative entropy. Entropy 2010, 12, 818–843.
  13. Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620–630.
  14. Huber, P.J. Robust Statistics; John Wiley & Sons: New York, NY, USA, 1984.
  15. Tsypkin, Y.Z.; Popkov, Y.S. Theory of Nonlinear Discrete Systems; Nauka: Moscow, Russia, 1973. (In Russian)
  16. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
  17. Shannon, C. Communication theory of secrecy systems. Bell Syst. Tech. J. 1949, 28, 656–715.
  18. Popkov, Y.S. Macrosystems Theory and Its Applications: Equilibrium Models; Lecture Notes in Control and Information Sciences, Volume 203; Springer: London, UK, 1995.
  19. Gelfand, I.M.; Fomin, S.V. Calculus of Variations; Dover Publications: New York, NY, USA, 2000.
  20. Popkov, Y.S. New class of multiplicative algorithms for solving of entropy-linear programs. Eur. J. Oper. Res. 2006, 174, 1368–1379.
Figure 1. $p(a_1, a_2) = \exp(-0.3 a_1 + 2.5 a_2)$.

Figure 2. $p(a_1, a_2) = \exp(4 a_1 - a_1^2 + a_2)$.

Figure 3. $p(a_1, a_2) = \exp(4 a_1 - a_1^2 + 6 a_2 - a_2^2)$.

Figure 4. $p(a_1, a_2) = \exp(0.5 a_1 + 3.5 a_2 + 2 a_1^2 - a_1^3 - 0.2 a_2^2 - 0.2 a_2^3)$.

Figure 5. PDFs of the parameters.

Figure 6. PDFs of the noise.
Table 1. The a priori probabilities, p_i^0, i ∈ [1, 5].

Scenario    i = 1    i = 2    i = 3    i = 4    i = 5
A           1.0      1.0      1.0      1.0      1.0
B           0.1      0.2      0.3      0.3      0.1
C           0.3      0.4      0.1      0.05     0.15
Table 2. The a priori probabilities, q_1^0, q_2^0.

Scenario    j = 1    j = 2
D           0.2      0.8
E           1.0      1.0
Table 3. The components, p̂_i, i ∈ [1, 5].

Scenario    i = 1    i = 2    i = 3    i = 4    i = 5
A           0.36     0.36     0.36     0.36     0.36
B           0.036    0.072    0.108    0.108    0.036
C           0.108    0.144    0.036    0.018    0.054
Table 4. The components, q̂_1, q̂_2.

Scenario    j = 1    j = 2
D           0.072    0.288
E           0.36     0.36
Table 5. The $\mathcal{S}_{pq}^{\tilde{1}}$-estimates of the probabilities, p and q, in Problem 1.

Scenario    p_1*    p_2*    p_3*    p_4*    p_5*    q_1*    q_2*    H*
AD          0.14    0.17    0.28    0.20    0.21    0.28    0.72    1.56
BD          0.08    0.16    0.26    0.34    0.16    0.25    0.75    −0.03
CD          0.17    0.31    0.22    0.06    0.24    0.40    0.60    −0.23
AE          0.23    0.22    0.19    0.19    0.16    0.45    0.55    2.29
BE          0.14    0.24    0.18    0.32    0.12    0.39    0.61    0.63
CE          0.26    0.38    0.13    0.05    0.18    0.58    0.42    0.67
Table 6. The first quasi-moments of the parameters and noises in Problem 1.

Scenario    a_1*    a_2*    a_3*    a_4*    a_5*    ξ_1*     ξ_2*     ε
AD          1.43    1.66    2.78    2.01    2.13    −1.33    2.65     0.13
BD          0.78    1.64    2.62    3.37    1.58    −1.52    3.04     0.03
CD          1.67    3.08    2.25    0.63    2.37    −0.62    1.23     0.30
AE          2.35    2.22    1.91    1.88    1.64    −0.31    0.62     0.15
BE          1.41    2.36    1.84    3.21    1.18    −0.65    1.30     0.02
CE          2.56    3.76    1.30    0.56    1.82    0.46     −0.91    0.36
Table 7. The $\mathcal{S}_{pq}^{\tilde{1}}$-estimates of the probabilities, p and q, in Problem 2.

Scenario    p_1*    p_2*    p_3*    p_4*    p_5*    q_1*    q_2*    H*
AD          0.24    0.22    0.18    0.23    0.26    0.07    0.29    2.05
BD          0.15    0.26    0.27    0.27    0.07    0.07    0.46    0.27
CD          0.23    0.39    0.24    0.05    0.12    0.11    0.26    0.23
AE          0.25    0.22    0.15    0.22    0.25    0.30    0.39    2.37
BE          0.15    0.26    0.23    0.26    0.06    0.34    0.60    0.67
CE          0.26    0.40    0.17    0.05    0.11    0.48    0.37    0.71
Table 8. The $\tilde{1}$-quasi-moments of the parameters and noise in Problem 2.

Scenario    a_1*    a_2*    a_3*    a_4*    a_5*    ξ_1*     ξ_2*     ε
AD          2.37    2.26    1.83    2.35    2.64    −2.61    −2.57    0.14
BD          1.50    2.64    2.71    2.74    0.68    −2.58    −0.47    0.06
CD          2.34    3.95    2.44    0.52    1.23    −2.34    −2.92    0.33
AE          2.50    2.24    1.50    2.21    2.50    −1.17    −1.35    0.16
BE          1.52    2.58    2.35    2.60    0.66    −0.97    1.21     0.06
CE          2.62    3.98    1.68    0.47    1.11    −0.12    −1.51    0.36
Table 9. The values of $\beta_+^{(1,2)}$, $\beta_-^{(1,2)}$, $\alpha_+^{(1,2)}$ and $\alpha_-^{(1,2)}$.

β_+^(1)    β_+^(2)    β_−^(1)    β_−^(2)    α_+^(1)    α_+^(2)    α_−^(1)    α_−^(2)
1.0        2.0        0.5        1.0        0.08       0.08       0.08       0.08
Table 10. The intervals for the parameters.

j        1       2       3       4       5       6       7       8       9
a_j^−    0.50    0.46    0.42    1.00    0.92    0.85    0.85    0.79    0.72
a_j^+    1.00    0.92    0.85    2.00    1.84    1.70    1.70    1.58    1.44
Table 11. The Lagrange multipliers, γ_j.

j      0           1          2           3         4          5
γ_j    −10.0475    −6.8421    −13.7688    1.5448    11.2746    −37.2191
Table 12. The Lagrange multipliers, γ_j (continued).

j      6           7           8
γ_j    −32.9907    −15.0797    −29.7189
Table 13. The Lagrange multipliers, ω_i and θ_i.

i      0          1
ω_i    26.6458    14.7240
θ_i    9.9822     −2.7918