Parameterization of the Stochastic Model for Evaluating Variable Small Data in the Shannon Entropy Basis

Oleh Bisikalo; Vyacheslav Kharchenko; Viacheslav Kovtun; Iurii Krak; Sergii Pavlov

doi:10.3390/e25020184

¹ Department of Automation and Intelligent Information Technologies, Faculty of Intelligent Information Technologies and Automation, Vinnytsia National Technical University, Khmelnitske Shose Str. 95, 21000 Vinnytsia, Ukraine

² Department of Computer Systems, Networks and Cybersecurity, Faculty of Radio Electronics, Computer Systems and Information Communications, National Aerospace University KhAI, Chkalov Str. 17, 610700 Kharkiv, Ukraine

³ Department of Computer Control Systems, Faculty of Intelligent Information Technologies and Automation, Vinnytsia National Technical University, Khmelnitske Shose Str. 95, 21000 Vinnytsia, Ukraine

⁴ Department of Theoretical Cybernetics, Faculty of Computer Sciences and Cybernetics, Taras Shevchenko National University of Kyiv, Volodymyrska Str. 60, 01033 Kyiv, Ukraine

Entropy 2023, 25(2), 184; https://doi.org/10.3390/e25020184

This article belongs to the Special Issue Entropy: The Cornerstone of Machine Learning

Abstract

The article analytically summarizes the idea of applying Shannon’s principle of entropy maximization to sets that represent the results of observations of the “input” and “output” entities of the stochastic model for evaluating variable small data. To formalize this idea, a sequential transition from the likelihood function to the likelihood functional and then to the Shannon entropy functional is described analytically. Shannon’s entropy characterizes the uncertainty caused not only by the probabilistic nature of the parameters of the stochastic data evaluation model but also by interference that distorts the measured values of these parameters. Accordingly, based on the Shannon entropy, it is possible to determine the best estimates of the values of these parameters for maximally uncertain (per entropy unit) distortions that cause measurement variability. It follows that the estimates of the probability density of the parameters of the stochastic model of small data, obtained by Shannon entropy maximization, also take into account the variability of the measurement process. In the article, this principle is developed into an information technology for the parametric and non-parametric evaluation, on the basis of Shannon entropy, of small data measured under the influence of interference. The article analytically formalizes three key elements: (1) instances of the class of parameterized stochastic models for evaluating variable small data; (2) methods of estimating the probability density functions of their parameters, represented by normalized or interval probabilities; (3) approaches to generating an ensemble of random vectors of initial parameters.

Keywords:

Shannon entropy; machine learning; evaluation of small data; measurement errors; stochastic model; parametric optimization; normalized probabilities; interval probabilities

1. Introduction

One of the most relevant problems of modern science is the extraction of useful information from available data. Methodologies aimed at solving this problem are being developed in various fields of science. Each such methodology is based on a certain hypothesis about the properties of the data and the real or hypothetical source of their origin. In the context of the data evaluation problem, two fundamental hypotheses can be distinguished [1,2,3,4,5]. The first hypothesis focuses on directly measurable, deterministic parameters in order to identify potential functional dependencies between them. All data that cannot be attributed to one or more defined parameters are treated as interference under this hypothesis and are rejected. Naturally, such an approach is adequate and productive only if the information is extracted from data obtained from a known, sufficiently investigated source. The second hypothesis focuses on the analysis of the data as such and aims to identify patterns in them, the presence of which can be assessed using a defined metric. This can be, for example, a measure of data sufficiency, a property of a sample from the general population, the normality of probability distribution densities, etc. It is practically impossible to guarantee these properties for specific data. However, the improbable becomes common if we analyze not data, but Big Data. This trend is the basis for the progress of such methodologies as mathematical statistics [2,6,7,8], machine learning [9,10,11,12], econometrics [13,14,15,16], financial mathematics [17,18,19] and control theory [20,21,22,23].

In recent decades, the first two of the methodologies just mentioned have attracted the most attention. Machine learning is based on the axiomatic treatment of probability spaces, as outlined in the paradigm of statistical learning theory developed in the 1960s [24,25,26]. There are several dominant categories of machine learning, but the most common is supervised learning [9,10,27,28]. In this category, researchers work with symmetric finite datasets, summarized in the “input” and “output” entities. The purpose of data analysis is to identify the functional dependence between these entities. The set of admissible types of functions forms the hypothesis space of this category of machine learning. The machine learning algorithm consistently evaluates the expected risk of describing the dependence between the existing “input” and “output” entities by each type of function from the hypothesis space. The evaluation is carried out with a single loss function that is fixed for the entire study. The expected risk is understood as the loss averaged over the probability distribution of the data. If the joint probability distribution is known, then finding the best hypothesis is a trivial task. In the general case, the distribution is unknown, so the machine learning algorithm chooses the most appropriate hypothesis according to a certain rule and supports this choice by calculating the empirical risk. In addition to computational complexity, a disadvantage of machine learning is the tendency of its algorithms to minimize the loss function by fitting the potentially best hypothesis too closely to the available data (so-called overfitting [9,27,29]). A typical way to detect (but not prevent) overfitting is to test the best hypothesis on data that the algorithm has not yet seen (the control sample). Methods of mathematical statistics are not subject to overfitting, because they do not assess empirical risk as such.

A typical example of a problem whose solution brings out the characteristic features of both mathematical statistics and machine learning is linear regression [7,8,9,10,11]. In the classic formulation of this problem, we need to find the regression coefficients that minimize the root mean square error between the reference entity “output” and its pattern as generated by the model. Such a problem can be solved in closed form. Statistical learning theory states that, if we choose the root mean square error as the loss function and carry out empirical risk optimization, the result will coincide with the one obtained by traditional linear regression analysis. The maximum likelihood method [2,6,7,30], characteristic of mathematical statistics, will demonstrate a similar result in this situation. Note that the methods of mathematical statistics do not operate with the concepts of training and test samples, but use metrics to evaluate the results of the model. In our example, the statistical approach allows us to reach the optimal solution because the solution itself exists in closed form. Unlike a machine learning algorithm, the maximum likelihood method does not test alternative hypotheses and does not iteratively converge to the optimal solution. However, if a piecewise linear loss function is used for the machine learning algorithm in the same problem, the final result does not coincide with that of the maximum likelihood method. The machine learning algorithm allows us to expand the space of relevant hypotheses with an a priori chosen loss function, and the process of their evaluation is carried out automatically. The maximum likelihood method can estimate the accuracy of the original model but does not allow us to change its form automatically. Therefore, the methods of machine learning and mathematical statistics work in different ways while producing similar results. If the task of the researcher is to accurately predict the cost of housing, then machine learning tools are exactly what is needed. If a scientist is investigating the relationships between parameters or drawing scientifically grounded conclusions about the data, then a statistical model is indispensable.
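The coincidence between the closed-form least-squares solution and empirical risk minimization with a squared loss can be checked numerically. The following is a minimal sketch on synthetic data; the data, the step size and the number of iterations are illustrative assumptions and are not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "input" (with an intercept column) and "output"; all values are illustrative.
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
true_beta = np.array([1.0, 2.0, -0.5])
y = X @ true_beta + 0.1 * rng.normal(size=50)

# Closed-form least-squares solution of the regression problem.
beta_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

# Empirical risk minimization: plain gradient descent on the mean squared error.
beta_erm = np.zeros(3)
step = 0.1  # assumed step size
for _ in range(5000):
    grad = 2.0 * X.T @ (X @ beta_erm - y) / len(y)
    beta_erm -= step * grad

# Largest coordinate-wise difference between the two estimates.
max_gap = float(np.max(np.abs(beta_closed - beta_erm)))
```

Both routes minimize the same quadratic objective, so `max_gap` should be negligible; swapping the squared loss for a piecewise linear one would generally break this agreement, as the paragraph above notes.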

Finally, machine learning experts say, “There are no such things as unsolvable problems; either data or computing power is scarce”. Indeed, everyone has heard about Big Data analysis [10,11,12,31]. Now, however, the problem of analyzing so-called “small data” is becoming increasingly common [32,33]. Classical machine learning approaches are helpless in such a situation. This circumstance prompted the authors to write this article.

Taking into account the strengths and weaknesses of the mentioned methods, we will formulate the necessary attributes of scientific research.

The object of the research is the process of the parameterization of the stochastic model for evaluating variable small data for machine learning purposes.

The research subject is probability theory and mathematical statistics, evaluation theory, information theory, mathematical programming methods and experiment planning theory.

The research aims to formalize the process of finding the best estimates of the probability density functions for the characteristic parameters of instances of the class of stochastic models for evaluating variable small data.

The research objectives are:

(1) To formalize the process of calculating the variable entropy estimation of the probability density functions of the characteristic parameters of the stochastic variable small data estimation model, represented by normalized probabilities;

(2) To formalize the process of calculating the variable entropy estimation of the probability density functions of the characteristic parameters of the stochastic variable small data estimation model, represented by interval probabilities;

(3) To justify the adequacy of the proposed mathematical apparatus and demonstrate its functionality with an example.

The main contribution of the research is that the article analytically summarizes the idea of applying the Shannon entropy maximization principle to sets that represent the results of observations of the “input” and “output” entities of the stochastic model for evaluating variable small data. To formalize this idea, a sequential transition from the likelihood function to the likelihood functional and the Shannon entropy functional is analytically described. Shannon’s entropy characterizes the uncertainty caused not only by the probabilistic nature of the parameters of the stochastic data evaluation model but also by influences that distort the results of the measurements of the values of these parameters. Accordingly, based on the Shannon entropy, it is possible to determine the best estimates of the values of these parameters for maximally uncertain (per entropy unit) influences that cause measurement variability. This postulate is organically transferred to the statement that the estimates of the probability distribution density of the parameters of the stochastic model of small data obtained as a result of Shannon entropy maximization will also take into account the fact of the variability of the process of their measurements. In the article, this principle is developed into the information technology of parametric and non-parametric evaluation on the basis of Shannon entropy of small data measured under the influence of interferences.

The highlights of the research are:

(1) Instances of the class of parameterized stochastic models for evaluating variable small data;

(2) Methods of estimating the probability density function of their parameters, represented by normalized or interval probabilities;

(3) Approaches to generating an ensemble of random vectors of initial parameters;

(4) A technique for statistical processing of such an ensemble using the Monte Carlo method to bring it to the desired numerical characteristics.

2. Models and Methods

2.1. Statement of the Research

Evaluation based on data that represent parametric signals or phenomena of physical, medical, economic, biological and other origin is the functional purpose of evaluation theory as a branch of mathematical statistics. To solve the evaluation problem, parametric and non-parametric approaches are used. In recent decades, the latter have noticeably dominated the former, which has become possible thanks to rapid progress in the field of machine learning and artificial intelligence. At the same time, the focus of researchers’ interest is shifting from the study of processes represented by Big Data to processes for which the amount of data is small and the data themselves contain errors. This motivates treating the parameters of the small data evaluation model as stochastic quantities. Accordingly, we will call such a model a stochastic model for small data evaluation. The characteristics of such a model are the probability density functions of the stochastic parameters. The primary task in identifying a stochastic estimation model for specific small data is to estimate the parameters of these probability density functions. Once this step is completed, the identified stochastic evaluation model can be taken as a basis for forming moment models of small data, generating an ensemble of random vectors of the initial parameters and carrying out the statistical processing of such an ensemble using the Monte Carlo method [6,7,8] to bring it to the desired numerical characteristics. The formalization of a way to solve the primary problem formulated above has scientific potential and applied value.

Let there be a stochastic parameterized research object represented by the results of measurements, in which the matrix of values of the input parameters $X$ with the dimension $[o \times n]$ (entity “input”) is matched by a vector of values of the output parameter $y$ with the dimension $[o \times 1]$ (entity “output”), where $o$ is the number of censored observations, and $n$ is the number of input characteristic parameters of the research object.

The process of measuring the values of matrix $X$ and vector $y$ is characterized by errors, which are represented by the symmetrical matrix $N = (ν_{j i})$ (variability of the measurement process), $i = \overline{1, n}$, $j = \overline{1, o}$, and vector $υ = (υ_{j})$, where $ν_{j i}$, $υ_{j}$ are independent stochastic values, $\forall i, j$. The values of these stochastic quantities belong to the intervals $N_{j i} = [ν_{j i}^{-}, ν_{j i}^{+}]$ and $ϒ_{j} = [υ_{j}^{-}, υ_{j}^{+}]$, respectively.

The stochastic model of the $⟨X, y⟩$ data evaluation is represented by the expression

s = F (X + N, α) + υ,

(1)

where $F$ is a defined $o$-dimensional vector function, $α$ is a random $n$-dimensional vector formed by independent stochastic parameters $α_{i}$, $i = \overline{1, n}$, $\forall α_{i} \in A_{i} = [α_{i}^{-}, α_{i}^{+}]$.

Let us assume that the parameters of the stochastic model and the variability of the measurements are continuous stochastic quantities, the values of which belong to the corresponding intervals of the tuple $⟨N_{j i}, ϒ_{j}, A_{i}⟩$ (hereinafter, the “genuine” version of the stochastic Model (1), or $GνSM$).

In this case, the probability density functions of the independent stochastic parameters of $GνSM$ (the model parameters, $P (α)$; the variability of the “input” measurements, $W (N)$; and the variability of the “output” measurements, $Q (υ)$) are described by the expressions:

P (α) = \prod_{i = 1}^{n} p_{i} (α_{i}),

(2)

W (N) = \prod_{j = 1}^{o} \prod_{i = 1}^{n} w_{j i} (ν_{j i}),

(3)

Q (υ) = \prod_{j = 1}^{o} q_{j} (υ_{j}),

(4)

where $α_{i} \in A_{i}$, $ν_{j i} \in N_{j i}$ and $υ_{j} \in ϒ_{j}$, respectively. In formulating Expressions (2)–(4), the authors assumed a priori that the measurement results were obtained in accordance with the provisions of experiment planning theory, so the corresponding variables are statistically independent.

Functions (2)–(4) will be evaluated based on data $⟨X, y⟩$ according to Model (1), taking into account the available a priori information summarized by the tuple $⟨P^{0} (α), W^{0} (N), Q^{0} (υ)⟩$.

The stochastic Model (1) generates an ensemble of random vectors $s$, which can be compared with the vector $y$ obtained as a result of measurements. To estimate the probability density Functions (2)–(4) in this way, we will use the $k$th moments of the stochastic components of the vector $s$:

m^{(k)} = {M (s_{j}^{(k)})}, j = \bar{1, o},

where the numerical characteristics for estimating these stochastic parameters are

M (s_{j}^{(k)}) = \int_{\begin{array}{l} α \in A, \\ ν \in N, \\ υ \in ϒ \end{array}} {(F_{j} (X + N, α) + υ_{j})}^{k} d P (α) d W (N) d Q (υ) .

Next, we will use moments of the first order $(k = 1)$. Accordingly:

M (s) = \bar{s} = \int_{\begin{array}{l} α \in A, \\ ν \in N, \\ υ \in ϒ \end{array}} (F (X + N, α) + υ) d P (α) d W (N) d Q (υ) .

(5)
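The first moment (5) can be approximated by averaging an ensemble generated by Model (1). The sketch below assumes a linear vector function $F (X + N, α) = (X + N) α$ and uniform densities on all intervals; the interval bounds, the matrix `X` and the sample size are illustrative assumptions, not values from the article. For this particular choice the mean is also available exactly, which gives a simple check.

```python
import numpy as np

rng = np.random.default_rng(1)

o, n = 2, 3                                  # observations, input parameters (illustrative)
X = rng.uniform(1.0, 3.0, size=(o, n))       # illustrative "input" matrix

# Assumed interval bounds for A_i, N_ji and Y_j.
a_lo, a_hi = np.zeros(n), np.ones(n)
nu_lo, nu_hi = np.full((o, n), -0.1), np.full((o, n), 0.1)
up_lo, up_hi = np.full(o, -0.2), np.full(o, 0.2)


def F(X_noisy, alpha):
    """Assumed linear model: one product (X + N) @ alpha per sample."""
    return np.einsum("sji,si->sj", X_noisy, alpha)


samples = 100_000
alpha = rng.uniform(a_lo, a_hi, size=(samples, n))        # draws from P(alpha)
nu = rng.uniform(nu_lo, nu_hi, size=(samples, o, n))      # draws from W(N)
upsilon = rng.uniform(up_lo, up_hi, size=(samples, o))    # draws from Q(upsilon)

s = F(X + nu, alpha) + upsilon        # ensemble of random vectors s, Model (1)
s_bar_mc = s.mean(axis=0)             # Monte Carlo estimate of the first moment

# Exact mean for a linear F with independent, uniformly distributed parameters.
s_bar_exact = (X + (nu_lo + nu_hi) / 2) @ ((a_lo + a_hi) / 2) + (up_lo + up_hi) / 2
```

With a non-linear $F$ or non-uniform densities the exact expression is no longer available, and only the sampling part of the sketch carries over.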

Another version of the implementation of Model (1) is one in which the parameters of the stochastic model and the variability of the measurements are continuous stochastic values whose membership in the corresponding interval of the tuple $⟨N_{j i}, ϒ_{j}, A_{i}⟩$ is characterized by a certain probability (hereinafter, the “quasi” version of the stochastic Model (1), or $QνSM$). In this case:

(1) the parameters $α_{i}$ take values in intervals $A_{i}$ with probabilities $p_{i} \in [0, 1]$, $i = \overline{1, n}$;

(2) the parameters $ν_{j i}$ take values in intervals $N_{j i}$ with probabilities $w_{j i} \in [0, 1]$, $j = \overline{1, o}$, $i = \overline{1, n}$;

(3) the parameters $υ_{j}$ take values in intervals $ϒ_{j}$ with probabilities $q_{j} \in [0, 1]$, $j = \overline{1, o}$.

The available a priori information is summarized by the vector $(p_{i}^{0}, w_{j i}^{0}, q_{j}^{0})$, $j = \overline{1, o}$, $i = \overline{1, n}$.

At the same time, Expressions (2)–(4) remain valid. We generalize the initial numerical characteristics of $QνSM$ in the form of a vector of first-order quasi-moments:

α = α^{-} + P L_{α}, ν = ν^{-} + W \otimes L_{ν}, υ = υ^{-} + Q L_{υ},

(6)

where $L_{α} = \mathrm{diag} (α_{i}^{+} - α_{i}^{-} | i = \overline{1, n})$, $L_{ν} = \mathrm{diag} (ν_{j i}^{+} - ν_{j i}^{-} | j = \overline{1, o}, i = \overline{1, n})$, $L_{υ} = \mathrm{diag} (υ_{j}^{+} - υ_{j}^{-} | j = \overline{1, o})$ and the $\otimes$ sign represents the element-by-element multiplication operation. Expressions (6) replace the elements of the tuple $⟨α, ν, υ⟩$ with their quasi-average values.

The analytical expression for the first-order quasi-moment of the stochastic vector $s$ can be obtained by substituting the numerical Characteristics (6) into Expression (1):

\tilde{s} = F (ν^{-} + X + W \otimes L_{ν}, α^{-} + P L_{α}) + υ^{-} + Q L_{υ} .

(7)
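For a concrete reading of Expressions (6) and (7), the sketch below evaluates the quasi-moment under the assumption of a linear $F (X, α) = X α$. The function name `quasi_moment` and all numerical values are hypothetical and serve only as an illustration; interval widths are stored as plain arrays rather than diagonal matrices.

```python
import numpy as np


def quasi_moment(X, p, W, q, a_lo, a_hi, nu_lo, nu_hi, up_lo, up_hi):
    """First-order quasi-moment (7) for an assumed linear F(X, alpha) = X @ alpha."""
    alpha = a_lo + p * (a_hi - a_lo)          # alpha^- + P L_alpha
    nu = nu_lo + W * (nu_hi - nu_lo)          # nu^- + W (element-wise) L_nu
    upsilon = up_lo + q * (up_hi - up_lo)     # upsilon^- + Q L_upsilon
    return (X + nu) @ alpha + upsilon


# Illustrative values: o = 2 observations, n = 2 parameters.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
s_tilde = quasi_moment(
    X,
    p=np.array([0.5, 0.5]),
    W=np.full((2, 2), 0.5),
    q=np.array([0.5, 0.5]),
    a_lo=np.zeros(2), a_hi=np.full(2, 2.0),
    nu_lo=np.full((2, 2), -0.1), nu_hi=np.full((2, 2), 0.1),
    up_lo=np.full(2, -1.0), up_hi=np.full(2, 1.0),
)
```

With all probabilities equal to 0.5 and symmetric error intervals, the quasi-averages are the interval midpoints, so `s_tilde` reduces to `X @ [1, 1]`.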

In the context of the proposed statement of the research, we specify its aim and objectives.

The research aims to formalize the process of finding the best estimates of the probability density functions for the $⟨p, q⟩$ parameters of $GνSM$ and $QνSM$ represented by Expressions (5) and (7), respectively.

The objectives of the research are:

(1) To formalize the process of calculating the variable entropy estimation of the probability density functions of characteristic parameters of $GνSM$ represented by normalized probabilities;

(2) To formalize the process of calculating the variable entropy estimation of the probability density functions of characteristic parameters of $QνSM$ represented by interval probabilities;

(3) To justify the adequacy of the proposed mathematical apparatus and demonstrate its functionality with an example.

2.2. Parameterization of the Stochastic Model for Evaluating Variable Small Data in the Shannon Entropy Basis

Let us formulate the corresponding probability functionals for the available information about the values of the input and output parameters of the stochastic Model (1).

Taking into account the independence of the parameters of the “input” and “output” entities in the stochastic Model (1) and the variability of their measurement procedure, we determine the joint probability density function $Φ (α, ν, υ)$ and the corresponding logarithmic likelihood ratio $ϕ (α, ν, υ)$ as

Φ (α, ν, υ) = P (α) W (ν) Q (υ),

(8)

ϕ (α, ν, υ) = \ln \frac{P (α)}{P^{0} (α)} + \ln \frac{W (ν)}{W^{0} (ν)} + \ln \frac{Q (υ)}{Q^{0} (υ)} .

(9)

Based on Expressions (8) and (9), we formulate the likelihood functional $L (P (α), W (ν), Q (υ))$:

\begin{matrix} L (P (α), W (ν), Q (υ)) = \int_{\begin{array}{l} α \in A, \\ ν \in N, \\ υ \in ϒ \end{array}} Φ (α, ν, υ) ϕ (α, ν, υ) d α d ν d υ = \int_{α \in A} P (α) \ln \frac{P (α)}{P^{0} (α)} d α \\ + \int_{ν \in N} W (ν) \ln \frac{W (ν)}{W^{0} (ν)} d ν + \int_{υ \in ϒ} Q (υ) \ln \frac{Q (υ)}{Q^{0} (υ)} d υ . \end{matrix}

(10)

Expression (10) taken with the opposite sign, $- L (P (α), W (ν), Q (υ))$, is the Shannon entropy functional [34,35]. By its purpose, such a functional is a measure of the degree of variability of the elements of the tuple $⟨P (α), W (ν), Q (υ)⟩$. This determines the prospect of using such a functional for evaluating Functions (2)–(4). With this motivation, let us transform Expression (10) into the form

\begin{matrix} H (\bar{s}) = - \sum_{i = 1}^{n} \int_{α_{i} \in A_{i}} p_{i} (α_{i}) \ln \frac{p_{i} (α_{i})}{p_{i}^{0} (α_{i})} d α_{i} - \sum_{j = 1}^{o} \sum_{i = 1}^{n} \int_{ν_{j i} \in N_{j i}} w_{j i} (ν_{j i}) \ln \frac{w_{j i} (ν_{j i})}{w_{j i}^{0} (ν_{j i})} d ν_{j i} \\ - \sum_{j = 1}^{o} \int_{υ_{j} \in ϒ_{j}} q_{j} (υ_{j}) \ln \frac{q_{j} (υ_{j})}{q_{j}^{0} (υ_{j})} d υ_{j} . \end{matrix}

(11)

The Functional (11) is defined for estimating the probability density functions of stochastic parameters of $GνSM$. For $QνSM$, based on Expression (10), we obtain:

H (\tilde{s}) = - \sum_{i = 1}^{n} p_{i} \ln \frac{p_{i}}{p_{i}^{0}} - \sum_{j = 1}^{o} \sum_{i = 1}^{n} w_{j i} \ln \frac{w_{j i}}{w_{j i}^{0}} - \sum_{j = 1}^{o} q_{j} \ln \frac{q_{j}}{q_{j}^{0}} .
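The discrete functional above is straightforward to evaluate. A minimal sketch follows; the function name `generalized_entropy` is hypothetical, and all probabilities are assumed to be strictly positive so that the logarithms are defined.

```python
import numpy as np


def generalized_entropy(p, p0, w, w0, q, q0):
    """H(s~) = -sum p ln(p/p0) - sum w ln(w/w0) - sum q ln(q/q0).

    Assumes every probability and every prior value is strictly positive.
    """
    def term(x, x0):
        x = np.asarray(x, dtype=float)
        x0 = np.asarray(x0, dtype=float)
        return -np.sum(x * np.log(x / x0))

    return term(p, p0) + term(w, w0) + term(q, q0)


# Illustrative priors for n = 2 parameters and o = 2 observations.
p0 = np.array([0.25, 0.75])
w0 = np.full((2, 2), 0.25)
q0 = np.array([0.5, 0.5])

h_prior = generalized_entropy(p0, p0, w0, w0, q0, q0)                      # estimate equals prior
h_shift = generalized_entropy(np.array([0.5, 0.5]), p0, w0, w0, q0, q0)    # p moved away from p0
```

The functional vanishes when the estimates coincide with the a priori values and, for normalized probabilities, becomes negative as soon as they differ, which is why its maximization pulls the estimates towards the prior as far as the restrictions allow.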

Based on Definition (11), we formulate the problem of finding the optimal estimate of the probability density functions of stochastic parameters of $GνSM$, taking into account the fact of their variability, i.e., $E_{\bar{s}}$.

We define the objective function of such an optimization problem as:

H (\bar{s}) \to \max .

(12)

We define the restrictions of the $E_{\bar{s}}$ optimization problem as

E = P \cup W \cup Q,

(13)

that is, the probability density functions of the model parameters $P (α) \in P$, of the variability of the “input” measurements $W (N) \in W$ and of the variability of the “output” measurements $Q (υ) \in Q$ of $GνSM$ must belong to the space $E$ defined by Expression (13), and

y = {(M (F^{(k)} (X + ν, α) + υ^{(k)}))}^{\frac{1}{k}},

(14)

that is, the elements of the vector with the results of measurements $y$ are equal to the elements of the $k$th moment of the vector $s$ raised to the power $k^{- 1}$.

By analogy with the formulation of the optimization Problem (12)–(14), we formulate the problem of finding the optimal estimate of the probability density functions of stochastic parameters of $QνSM$, taking into account the fact of their variability, i.e., $E_{\tilde{s}}$.

We define the objective function of such an optimization problem as:

H (\tilde{s}) \to \max .

(15)

Recall that the complex parameter $\bar{s}$ generalizes a tuple of interval-controlled parameters $⟨P (α), W (ν), Q (υ)⟩$ (see Expressions (10) and (5)), and the complex parameter $\tilde{s}$ focuses on the variability of measuring these characteristic parameters (see Expression (7)).

Considerations regarding the formulation of restrictions for finding the extremum of the objective Function (15) are identical to those embodied in Restrictions (13) and (14). Restriction (13) carries over to Problem (15) unchanged, while Restriction (14) can be written in terms of the definition of $QνSM$:

y = F (X + ν^{-} + W \otimes L_{ν}, α^{-} + P L_{α}) + υ^{-} + Q L_{υ} .

(16)

Let us pay attention to the situation when the measurement errors $υ (t)$ and the values of the vector of the initial parameters of the stochastic model $s (t)$ are characterized by non-linearity of the $r$th degree:

s (t) = \sum_{r = 1}^{R} \sum_{i = 1}^{n} α_{i}^{r} x_{i}^{(r)} (t) + υ (t),

where $α = (α_{i})$ is a vector of parameters, the independent stochastic elements of which take values from the ranges $A_{i} = [α_{i}^{-}, α_{i}^{+}]$ with the probability distribution densities $p_{i} (α_{i})$, $i = \overline{1, n}$.

The measurement of the components of the entities “input” and “output” of the investigated process takes place at moments $t_{j}$, $j = \overline{1, o}$. The entity “input” is represented by a set of $r$-matrices, $r = \overline{1, R}$, of the form

X^{(r)} = (\begin{matrix} x_{1}^{(r)} (t_{1}) & \dots & x_{n}^{(r)} (t_{1}) \\ ⋮ & ⋱ & ⋮ \\ x_{1}^{(r)} (t_{o}) & \dots & x_{n}^{(r)} (t_{o}) \end{matrix}) = (\begin{matrix} x_{1, 1}^{(r)} & \dots & x_{1, n}^{(r)} \\ ⋮ & ⋱ & ⋮ \\ x_{o, 1}^{(r)} & \dots & x_{o, n}^{(r)} \end{matrix})

and the entity “output” is represented by stochastic elements of the vector $s = (s (t_{j}))$, $j = \overline{1, o}$.

Denoting $α^{(r)} = (α_{i}^{(r)})$, $i = \overline{1, n}$; $υ = (υ (t_{j})) = (υ_{j})$, $j = \overline{1, o}$, we present the above expression for $s (t)$ in the form

s = \sum_{r = 1}^{R} X^{(r)} α^{(r)} + υ,

where the independent elements of the vector of the variability of measurements of the entity “output” $υ$ take values in intervals $ϒ_{j} = [υ_{j}^{-}, υ_{j}^{+}]$ with the probability density functions $Q (υ) = (q_{j} (υ_{j}))$, $j = \overline{1, o}$.

Let us identify and investigate the variable entropy estimate of the probability density functions $P (α) = (p_{i} (α_{i}))$, $i = \overline{1, n}$, and $Q (υ) = (q_{j} (υ_{j}))$, $j = \overline{1, o}$.

We present the objective function of the optimization Problem (12)–(14) in the form

H (\bar{s}) = - \sum_{i = 1}^{n} \int_{α_{i} \in A_{i}} p_{i} (α_{i}) \ln p_{i} (α_{i}) d α_{i} - \sum_{j = 1}^{o} \int_{υ_{j} \in ϒ_{j}} q_{j} (υ_{j}) \ln q_{j} (υ_{j}) d υ_{j} \to \max .

We present the system of Restrictions (13) and (14) in the form

{\bar{P}}_{i} (p_{i} (α_{i})) = 1 - \int_{α_{i} \in A_{i}} p_{i} (α_{i}) d α_{i} = 0,

{\bar{Q}}_{j} (q_{j} (υ_{j})) = 1 - \int_{υ_{j} \in ϒ_{j}} q_{j} (υ_{j}) d υ_{j} = 0,

Φ_{j} (P (α), Q (υ)) = - y_{j} + \sum_{r = 1}^{R} \sum_{i = 1}^{n} x_{j i}^{(r)} \int_{α_{i} \in A_{i}} α_{i}^{r} p_{i} (α_{i}) d α_{i} + \int_{υ_{j} \in ϒ_{j}} υ_{j} q_{j} (υ_{j}) d υ_{j} = 0,

where $i = \overline{1, n}$, $j = \overline{1, o}$.

Based on the necessary conditions of stationarity of the Lagrange functional [6,7,8], we assert that the entropy estimates $E_{\bar{s}}^{(1)}$ of the probability density functions $P (α)$ and $Q (υ)$ belong to the class of continuously differentiable functions, respectively:

p_{i}^{\cup} (α_{i}) \sim a_{i} \exp (- \sum_{r = 1}^{R} b_{i r} α_{i}^{r}),

(17)

q_{j}^{\cup} (υ_{j}) \sim c_{j} \exp (- d_{j} υ_{j}),

(18)

where $a_{i}$, $b_{i r}$, $c_{j}$, $d_{j}$ are fixed coefficients, $i = \overline{1, n}$, $j = \overline{1, o}$.

The conclusion generalized by Expressions (17) and (18) can be interpreted as follows:

(1) For a linear stochastic model of estimation of variable small data: entropy estimates $E_{\bar{s}}^{(1)}$ are always exponential functions. The results of measuring the entities “input” and “output” of the investigated process determine the form, and not the type, of the $E_{\bar{s}}^{(1)}$-functions of the corresponding linear stochastic model;

(2) For a non-linear stochastic model for evaluating variable small data: the nomenclature of the types of functions of entropy estimates $E_{\bar{s}}^{(1)}$ of the “input” and “output” entities of the investigated process is wider and includes both exponential and power types. The type of $E_{\bar{s}}^{(1)}$-functions depends on the organization of the measurement process of these “input” and “output” entities.

Therefore, it remains to formalize the variable entropy estimates $E_{\bar{s}}^{(1)}$ ($E_{\tilde{s}}^{(1)}$) of the probability density functions $p$ and $q$ of the parameters of $GνSM$ ($QνSM$), respectively. Let us investigate the linear $GνSM$ without taking into account the variability of the measurement of the “input” entity:

\bar{s} = X p L_{α} + q L_{υ} + Ξ (α^{-}, υ^{-}),

(19)

where $Ξ (α^{-}, υ^{-}) = X α^{-} + υ^{-}$. We define the a priori probabilities by the elements of the tuple $⟨p^{0}, q^{0}⟩$.

Let us present the objective function of the optimization Problem (15) and (16) in the form

H (\bar{s}) = - \sum_{i = 1}^{n} p_{i} \ln \frac{p_{i}}{p_{i}^{0}} - \sum_{j = 1}^{o} q_{j} \ln \frac{q_{j}}{q_{j}^{0}} \to \max,

(20)

and the system of restrictions in the form

\sum_{i = 1}^{n} x_{j i} p_{i} L_{α}^{i} + q_{j} L_{υ}^{j} + Ξ_{j} = y_{j}, \forall j = \overline{1, o},

(21)

at $\sum_{i = 1}^{n} p_{i} = 1$, $\sum_{j = 1}^{o} q_{j} = 1$.

In terms of the Lagrange function, we present the solution of the mathematical programming Problem (20) and (21) as

L (\bar{s}) = H (\bar{s}) + β (1 - \sum_{i = 1}^{n} p_{i}) + μ (1 - \sum_{j = 1}^{o} q_{j}) + \sum_{j = 1}^{o} ψ_{j} (y_{j} - \sum_{i = 1}^{n} x_{j i} p_{i} L_{α}^{i} - q_{j} L_{υ}^{j} - Ξ_{j}),

(22)

where $β$, $μ$ are fixed coefficients and $ψ = (ψ_{1}, \dots, ψ_{o})$ is a set of Lagrange multipliers.

Entropy estimates $E_{\bar{s}}^{(1)} = [(p_{i}^{\cup} (ψ), i = \overline{1, n}), (q_{j}^{\cup} (ψ), j = \overline{1, o})]$ are determined based on Expression (22):

0 \leq p_{i}^{\cup} (ψ) = \frac{p_{i}^{0} \exp (- \sum_{j = 1}^{o} x_{j i} ψ_{j} L_{α}^{i})}{\sum_{l = 1}^{n} p_{l}^{0} \exp (- \sum_{j = 1}^{o} x_{j l} ψ_{j} L_{α}^{l})} \leq 1,

0 \leq q_{j}^{\cup} (ψ) = \frac{q_{j}^{0} \exp (- ψ_{j} L_{υ}^{j})}{\sum_{m = 1}^{o} q_{m}^{0} \exp (- ψ_{m} L_{υ}^{m})} \leq 1,

Φ_{j} (ψ) = \frac{1}{y_{j} - Ξ_{j}} (\sum_{i = 1}^{n} x_{j i} p_{i}^{\cup} (ψ) L_{α}^{i} + q_{j}^{\cup} (ψ) L_{υ}^{j}) = 1 .

(23)
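The normalized estimates in (23) have the form of a prior-weighted softmax over the Lagrange multipliers. The sketch below evaluates them together with the balance ratio $Φ_{j} (ψ)$; the function names, the priors, the interval widths, `y` and `Xi` are illustrative assumptions, and only the matrix `X` is taken from Section 3.

```python
import numpy as np


def entropy_estimates(psi, X, p0, q0, L_alpha, L_upsilon):
    """Normalized estimates p_i(psi), q_j(psi) in the form of Expression (23)."""
    p_un = p0 * np.exp(-(psi @ X) * L_alpha)     # exponent: sum_j x_ji psi_j L_alpha^i
    q_un = q0 * np.exp(-psi * L_upsilon)
    return p_un / p_un.sum(), q_un / q_un.sum()


def balance_ratio(psi, X, y, Xi, p0, q0, L_alpha, L_upsilon):
    """Phi_j(psi): model output divided by (y_j - Xi_j); equals 1 at a solution."""
    p, q = entropy_estimates(psi, X, p0, q0, L_alpha, L_upsilon)
    return (X @ (p * L_alpha) + q * L_upsilon) / (y - Xi)


X = np.array([[1.805, 2.103, 3.310, 2.007, 1.505],
              [4.992, 3.800, 2.996, 2.812, 1.899]])
p0, q0 = np.full(5, 0.2), np.array([0.5, 0.5])       # assumed priors
L_alpha, L_upsilon = np.ones(5), np.ones(2)          # assumed interval widths
y, Xi = np.array([3.0, 4.0]), np.zeros(2)            # assumed measurements and offsets

p_zero, q_zero = entropy_estimates(np.zeros(2), X, p0, q0, L_alpha, L_upsilon)
p, q = entropy_estimates(np.array([0.3, -0.2]), X, p0, q0, L_alpha, L_upsilon)
phi_ratio = balance_ratio(np.array([0.3, -0.2]), X, y, Xi, p0, q0, L_alpha, L_upsilon)
```

At $ψ = 0$ the estimates coincide with the normalized priors, which is consistent with the entropy functional (20) being maximal at the prior when the balance restrictions are inactive.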

Now let us investigate how the formulation and solution of the optimization Problem (20) and (21) will change if the interval restrictions $0 \leq p_{i} \leq 1$, $\forall i \in [\overline{1, n}]$, $0 \leq q_{j} \leq 1$, $\forall j \in [\overline{1, o}]$, are imposed on the values of the elements of the stochastic vectors $⟨p, q⟩$.

Under such conditions, the variable entropy estimate $E_{\tilde{s}}^{(1)}$ of the probability density functions of the parameters $p$ and $q$ of $QνSM$ can be obtained by solving the problem of finding the extremum of the generalized entropy of the form

H (\tilde{s}) = - \sum_{i = 1}^{n} (p_{i} \ln \frac{p_{i}}{{\overset{⌢}{p}}_{i}^{0}} + (1 - p_{i}) \ln (1 - p_{i})) - \sum_{j = 1}^{o} (q_{j} \ln \frac{q_{j}}{{\overset{⌢}{q}}_{j}^{0}} + (1 - q_{j}) \ln (1 - q_{j})) \to \max,

(24)

where

{\overset{⌢}{p}}_{i}^{0} = p_{i}^{0} / (1 - p_{i}^{0})

,

{\overset{⌢}{q}}_{j}^{0} = q_{j}^{0} / (1 - q_{j}^{0})

,

i = \bar{1, n}

,

j = \bar{1, o}

.
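
For illustration, the objective Function (24) can be written as a short Python function. This is a sketch under stated assumptions: the name `generalized_entropy` is hypothetical, and all probabilities and priors are assumed to lie strictly between 0 and 1, since a prior equal to 1 makes the ratio p^0 / (1 - p^0) undefined.

```python
import math

def generalized_entropy(p, q, p0, q0):
    """Generalized entropy of Expression (24) for interval probabilities.

    Assumes 0 < v < 1 for every probability and every prior value.
    """
    def term(v, v0):
        v0_hat = v0 / (1.0 - v0)  # the "hatted" prior defined after (24)
        return v * math.log(v / v0_hat) + (1.0 - v) * math.log(1.0 - v)

    return (-sum(term(a, a0) for a, a0 in zip(p, p0))
            - sum(term(b, b0) for b, b0 in zip(q, q0)))

# Without the balance constraint (25), each summand is largest at v = v0.
h_at_prior = generalized_entropy([0.3], [0.5], [0.3], [0.5])
h_shifted = generalized_entropy([0.35], [0.5], [0.3], [0.5])
print(h_at_prior > h_shifted)
```

Setting the derivative of a single summand to zero gives p / (1 - p) = p^0 / (1 - p^0), so in the absence of constraint (25) the maximizer is the prior itself; the constraint is what moves the estimate away from it.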

The objective Function (24) is supplemented by the adapted balance Equation (21):

\sum_{i = 1}^{n} x_{j i} p_{i} L_{α}^{i} + q_{j} L_{υ}^{j} + Ξ_{j} = y_{j},

(25)

where

0 \leq p_{i} \leq 1

,

0 \leq q_{j} \leq 1

,

i = \bar{1, n}

,

j = \bar{1, o}

.

Applying the method of Lagrange multipliers [7,8,36], the extreme entropy estimates

E_{\tilde{s}}^{(1)}

for the optimization Problem (24) and (25) will be obtained as a result of solving the system of equations

0 \leq p_{i}^{\cup} (ψ) = p_{i}^{0} / (p_{i}^{0} + (1 - p_{i}^{0}) \exp (\sum_{j = 1}^{o} x_{j i} ψ_{j} L_{α}^{i})) \leq 1, 0 \leq q_{j}^{\cup} (ψ) = q_{j}^{0} / (q_{j}^{0} + (1 - q_{j}^{0}) \exp (- ψ_{j} L_{υ}^{j})) \leq 1, Φ_{j} (ψ) = \frac{1}{y_{j} - Ξ_{j}} (\sum_{i = 1}^{n} x_{j i} p_{i}^{\cup} (ψ) L_{α}^{i} + q_{j}^{\cup} (ψ) L_{υ}^{j}) = 1,

(26)

where

i = \bar{1, n}

,

j = \bar{1, o}

.

The starting point for calculating the variable entropy estimate

E_{\tilde{s}}^{(1)}

of the probability density functions of the parameters

p

and

q

of

Q ν S M

, both in the Interpretation (20) and (21), and in the Interpretation (24) and (25), is the calculation of the Lagrange multipliers

ψ

as a result of solving the systems of equations represented by Expressions (23) and (26), respectively. This process can be arranged, for example, according to the multiplicative algorithm [36]:

φ_{j}^{k + 1} = φ_{j}^{k} Φ_{j} (φ^{k}),

where

φ_{j} = \exp (- ψ_{j})

are exponential Lagrange multipliers,

φ_{j}^{0} > 0

,

j = \bar{1, o}

.
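
The multiplicative update can be sketched as a generic fixed-point loop. The Python code below is illustrative only: the function name, the stopping rule (all Φ_j close to 1) and the iteration limit are assumptions, and the toy function `toy_big_phi` is a hypothetical stand-in for Φ_j from (23) or (26), chosen because its fixed point is known in advance. The source does not state convergence conditions, so none are claimed here.

```python
import math

def multiplicative_iterations(phi0, big_phi, tol=1e-10, max_iter=1000):
    """Iterate phi_j <- phi_j * Phi_j(phi) until every Phi_j is close to 1.

    big_phi is any callable returning the list [Phi_1(phi), ..., Phi_o(phi)].
    Returns the final phi and the number of iterations performed.
    """
    phi = list(phi0)
    for k in range(max_iter):
        ratios = big_phi(phi)
        new_phi = [f * r for f, r in zip(phi, ratios)]
        # Phi_j = 1 for all j is the balance condition, i.e. a fixed point.
        if max(abs(r - 1.0) for r in ratios) < tol:
            return new_phi, k + 1
        phi = new_phi
    return phi, max_iter

# Hypothetical toy system: Phi_j(phi) = sqrt(c_j / phi_j), fixed point phi_j = c_j.
TARGET = [4.0, 0.25]

def toy_big_phi(phi):
    return [math.sqrt(c / f) for c, f in zip(TARGET, phi)]

phi_star, n_iter = multiplicative_iterations([1.0, 1.0], toy_big_phi)
psi_star = [-math.log(f) for f in phi_star]  # back to psi_j = -ln(phi_j)
print(phi_star, psi_star, n_iter)
```

The starting values must be positive, as required by φ_j^0 > 0, because the update only rescales each component and can never change its sign.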

3. Experiments

Let us demonstrate the functionality of the mathematical apparatus proposed in Section 2 using the example of calculating the variable entropy estimate of the probability density functions of the characteristic parameters of the linear stochastic small data estimation model with the dimension of the entities “input” × “output” of

[5] \times [2]

. The matrix of the measurements of the “input” entity looks like this:

X = (\begin{matrix} 1.805 & 2.103 & 3.310 & 2.007 & 1.505 \\ 4.992 & 3.800 & 2.996 & 2.812 & 1.899 \end{matrix}) .

The vector of the measurements of the “output” entity, taking into account variability, looks like this:

y = (\begin{matrix} 21.091 & 32.814 \end{matrix}) .

Quasi-moments of the first order are described by the expressions:

α_{i} = 3.333 p_{i}, α_{i} \in A_{i}, \forall A_{i} \in A = [0, 10], i = \bar{1, 5}; υ_{1} = - 1 + 2 q_{1}, υ_{2} = - 2 + 4 q_{2}, υ_{1} \in ϒ_{1} = [- 3, 3], υ_{2} \in ϒ_{2} = [- 6, 6] .
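
As a small illustration, the first-order quasi-moments above map probability estimates to parameter values by affine formulas. The Python sketch below simply encodes those formulas; the function name `quasi_moments` is an assumption introduced here.

```python
def quasi_moments(p, q):
    """Map probabilities (p, q) to first-order quasi-moments of the example.

    alpha_i = 3.333 * p_i for i = 1..5,
    upsilon_1 = -1 + 2 * q_1, upsilon_2 = -2 + 4 * q_2.
    """
    alpha = [3.333 * p_i for p_i in p]
    upsilon = [-1.0 + 2.0 * q[0], -2.0 + 4.0 * q[1]]
    return alpha, upsilon

alpha, upsilon = quasi_moments([0.3, 0.3, 0.3, 0.3, 0.3], [0.5, 0.5])
print(alpha, upsilon)
```

With q_1 = q_2 = 0.5 both interference quasi-moments vanish, which corresponds to the midpoint of each interference range.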

The fixed parameters of the reference model are described by the vector

α^{0} = (\begin{matrix} 1.011 & 2.212 & 1.918 & 3.986 & 0.996 \end{matrix}) .

The deviations from the values specified in the vector

α^{0}

caused by the variability of the measurements are characterized by an error

ε = ‖ α^{0} - α ‖ / (‖ α^{0} ‖ + ‖ α ‖)

.
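
The error ε can be computed as follows. This is a sketch that assumes the Euclidean norm, which the source does not specify; the function name `relative_error` is likewise an assumption.

```python
import math

def relative_error(alpha0, alpha):
    """epsilon = ||alpha0 - alpha|| / (||alpha0|| + ||alpha||).

    The Euclidean norm is assumed here; the source leaves the norm unspecified.
    """
    diff = math.sqrt(sum((a0 - a) ** 2 for a0, a in zip(alpha0, alpha)))
    norm0 = math.sqrt(sum(a0 ** 2 for a0 in alpha0))
    norm = math.sqrt(sum(a ** 2 for a in alpha))
    return diff / (norm0 + norm)

ALPHA0 = [1.011, 2.212, 1.918, 3.986, 0.996]  # reference vector from the text
print(relative_error(ALPHA0, ALPHA0))          # identical vectors give zero error
```

By the triangle inequality the value always lies between 0 and 1, so ε is a normalized measure of deviation from the reference vector.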

Summarizing the given initial information in the format of Expression (19), we obtain:

X L_{α} p + L_{υ} q = \vec{1},

where

L_{υ} = (\begin{matrix} 0.249 & 0 \\ 0 & 0.312 \end{matrix})

,

X L_{α} = (\begin{matrix} 0.747 & 0.873 & 1.366 & 0.834 & 0.622 \\ 1.065 & 0.982 & 0.767 & 0.721 & 0.449 \end{matrix})

,

\vec{1} = (\begin{matrix} 1 & 1 \end{matrix})

.
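
To check how well a candidate pair (p, q) satisfies this balance equation, one can compute the residual of each row. The sketch below uses the matrices given above; the function name `balance_residual` and the trial vectors are assumptions chosen for illustration and are not estimates from the source.

```python
# Matrices of the normalized balance equation, taken from the text.
XL_ALPHA = [[0.747, 0.873, 1.366, 0.834, 0.622],
            [1.065, 0.982, 0.767, 0.721, 0.449]]
L_UPSILON = [0.249, 0.312]  # diagonal of L_upsilon

def balance_residual(p, q):
    """Return (X L_alpha p + L_upsilon q - 1) row by row."""
    residual = []
    for j in range(len(XL_ALPHA)):
        lhs = sum(x * p_i for x, p_i in zip(XL_ALPHA[j], p)) + L_UPSILON[j] * q[j]
        residual.append(lhs - 1.0)
    return residual

# Hypothetical trial point: uniform p and q (not a solution of the problem).
print(balance_residual([0.2] * 5, [0.5, 0.5]))
```

A pair (p, q) is feasible for the balance constraints when both residuals are zero; the optimization problems below select, among such pairs, the one with maximal entropy.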

A priori information about the initial values of the vectors

p^{0} = (p_{i}^{0})

,

i = \bar{1, 5}

, and

q^{0} = (q_{j}^{0})

,

j = \bar{1, 2}

, is summarized in the corresponding named sets:

p_{A}^{0} = {1; 1; 1; 1; 1}

,

p_{B}^{0} = {0.1; 0.2; 0.3; 0.3; 0.1}

,

p_{C}^{0} = {0.3; 0.4; 0.1; 0.05; 0.15}

,

q_{D}^{0} = {0.2; 0.8}

,

q_{E}^{0} = {1; 1}

.

The tuple

⟨ p_{A}^{0}, q_{E}^{0} ⟩

implies a uniform distribution of the characteristic parameters

p

and disturbing influences causing measurement variability,

q

, respectively. Tuples

⟨ p_{B}^{0}, q_{D}^{0} ⟩

and

⟨ p_{C}^{0}, q_{E}^{0} ⟩

imply uneven distributions: in the former, both the characteristic parameters and the influences are unevenly distributed, while the latter is a combined variant in which the characteristic parameters are unevenly distributed and the influences are uniformly distributed.

We obtain optimization problem Statements (20) and (24) for the initial parameters presented above.

The formulation of the optimization Problem (20) and (21) for the above-mentioned initial data has the form:

H (\bar{s}) = - \sum_{i = 1}^{5} p_{i} \ln \frac{p_{i}}{p_{i}^{0}} - \sum_{j = 1}^{2} q_{j} \ln \frac{q_{j}}{q_{j}^{0}} \to \max, 0.747 p_{1} + 0.873 p_{2} + 1.366 p_{3} + 0.834 p_{4} + 0.622 p_{5} + 0.249 q_{1} = 1, 1.065 p_{1} + 0.982 p_{2} + 0.767 p_{3} + 0.721 p_{4} + 0.449 p_{5} + 0.312 q_{2} = 1; \sum_{i = 1}^{5} p_{i} = 1, p_{i} > 0; \sum_{j = 1}^{2} q_{j} = 1, q_{j} > 0 .

(27)

The formulation of the optimization Problem (24) and (25) for the above-mentioned initial data has the form:

H (\tilde{s}) = - \sum_{i = 1}^{5} (p_{i} \ln \frac{p_{i}}{{\overset{⌢}{p}}_{i}^{0}} + (1 - p_{i}) \ln (1 - p_{i})) - \sum_{j = 1}^{2} (q_{j} \ln \frac{q_{j}}{{\overset{⌢}{q}}_{j}^{0}} + (1 - q_{j}) \ln (1 - q_{j})) \to \max, 0.747 p_{1} + 0.873 p_{2} + 1.366 p_{3} + 0.834 p_{4} + 0.622 p_{5} + 0.249 q_{1} = 1, 1.065 p_{1} + 0.982 p_{2} + 0.767 p_{3} + 0.721 p_{4} + 0.449 p_{5} + 0.312 q_{2} = 1; {\overset{⌢}{p}}_{i}^{0} = p_{i}^{0} / (1 - p_{i}^{0}), {\overset{⌢}{q}}_{j}^{0} = q_{j}^{0} / (1 - q_{j}^{0}), i = \bar{1, 5}, j = \bar{1, 2}; 0 \leq p_{i} \leq 1, 0 \leq q_{j} \leq 1, i = \bar{1, 5}, j = \bar{1, 2} .

(28)

Such optimization problems can be solved by methods of non-linear mathematical programming [36]. In particular, for the above optimization problems, the extremum point is analytically identified as

(p_{i}^{*} = 0.36 p_{i}^{0}, q_{j}^{*} = 0.36 q_{j}^{0})

,

i = \bar{1, 5}

,

j = \bar{1, 2}

. So, for our example, the entropy

H (\tilde{s})

reaches its maximum at the point

(p^{*}, q^{*})

, where

p^{*} = f (i, p_{i}^{0})

,

q^{*} = f (j, q_{j}^{0})

,

i = \bar{1, 5}

,

j = \bar{1, 2}

.
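
One possible reading of the factor 0.36 (an interpretation, not stated in the source) is that it approximates 1/e ≈ 0.368: for a single summand -p ln(p / p^0) of the objective in (27), the unconstrained stationarity condition -ln(p / p^0) - 1 = 0 gives p = p^0 / e. The Python sketch below checks this numerically by grid search; the function names are assumptions introduced for illustration.

```python
import math

def entropy_term(p, p0):
    """Single summand -p * ln(p / p0) of the objective in (27)."""
    return -p * math.log(p / p0)

def argmax_ratio(p0, steps=10000):
    """Grid search for the ratio p / p0 in (0, 1] that maximizes the summand."""
    best_ratio, best_value = None, -math.inf
    for k in range(1, steps + 1):
        ratio = k / steps
        value = entropy_term(ratio * p0, p0)
        if value > best_value:
            best_ratio, best_value = ratio, value
    return best_ratio

# The maximizing ratio does not depend on the prior value p0.
print(argmax_ratio(0.3), argmax_ratio(1.0), math.exp(-1))
```

This check covers only the unconstrained summand of the normalized variant; the balance and normalization constraints of (27), and the different objective of (28), are not taken into account here.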

Let us examine these dependencies, taking into account that we previously defined schemes for a priori values:

p^{0} = {p_{A}^{0}, p_{B}^{0}, p_{C}^{0}}

,

q^{0} = {q_{D}^{0}, q_{E}^{0}}

. For clarity, we present the dependences

p^{*} = f (i, p_{{A, B, C}}^{0})

and

q^{*} = f (j, q_{{D, E}}^{0})

in the form of diagrams (Figure 1 and Figure 2, respectively).

Figure 1. Visualization of dependence

p^{*} = f (i, p_{{A, B, C}}^{0})

.

Figure 2. Visualization of dependence

q^{*} = f (j, q_{{D, E}}^{0})

.

More detailed information on the values of the characteristic parameters of the investigated linear stochastic model of the small data evaluation presented in Section 3 can be seen in Figure 3 and Figure 4 (for

G ν S M

and for

Q ν S M

, respectively).

Figure 3. Visualization of dependence

E_{\bar{s}}^{(1)} (p_{i}^{\cup}, q_{j}^{\cup}, H (\bar{s})) = f (p_{{A, B, C}}^{0}, q_{{D, E}}^{0})

,

i = \bar{1, 5}

,

j = \bar{1, 2}

.

Figure 4. Visualization of dependence

E_{\tilde{s}}^{(1)} (p_{i}^{\cup}, q_{j}^{\cup}, H (\tilde{s})) = f (p_{{A, B, C}}^{0}, q_{{D, E}}^{0})

,

i = \bar{1, 5}

,

j = \bar{1, 2}

.

These figures visualize the values at the extremum point

(p^{*}, q^{*})

of

E^{(1)}

-estimates of such characteristic parameters as

p_{i}^{\cup}

,

i = \bar{1, 5}

;

q_{j}^{\cup}

,

j = \bar{1, 2}

, and

H^{*} (\bar{s})

(calculated by Expression (20) adapted to form (27)) and

H^{*} (\tilde{s})

(calculated by Expression (24) adapted to form (28)). At the same time, the schemes of the initial values of the vectors

p^{0} = (p_{i}^{0})

,

i = \bar{1, 5}

, and

q^{0} = (q_{j}^{0})

,

j = \bar{1, 2}

, are taken into account.

Comparing the symmetrical values visualized in Figure 3 and Figure 4, it can be concluded that the parameter estimates calculated for interval probabilities (i.e., for

Q ν S M

) are characterized by a larger value of the conditional maximum entropy than that obtained for

G ν S M

(i.e., for the normalized probabilities). The theoretical justification of this empirical fact is presented in Section 4.

Information about the state of the linear stochastic models, summarized by Expressions (27) and (28), is supplemented by such calculated data as:

(1) the value at the point of extremum

(p^{*}, q^{*})

of the quasi-moments of the characteristic parameters of

G ν S M

and

Q ν S M

(

α_{i}^{*}

,

i = \bar{1, 5}

),

(2) estimates of the variability of the above-mentioned parameters caused by interferences (

υ_{j}^{*}

,

j = \bar{1, 2}

),

(3) the errors

\bar{ε}

and

\tilde{ε}

, which characterize the deviation of the measured parameters

⟨ α, υ ⟩

from the reference

⟨ α^{0}, υ^{0} ⟩

for

G ν S M

and

Q ν S M

, respectively.

These data are visualized in Figure 5 and Figure 6.

Figure 5. Visualization of dependence

(α_{i}^{*}, υ_{j}^{*}, \bar{ε}) = f (p_{{A, B, C}}^{0}, q_{{D, E}}^{0})

,

i = \bar{1, 5}

,

j = \bar{1, 2}

.

Figure 6. Visualization of dependence

(α_{i}^{*}, υ_{j}^{*}, \tilde{ε}) = f (p_{{A, B, C}}^{0}, q_{{D, E}}^{0})

,

i = \bar{1, 5}

,

j = \bar{1, 2}

.

From the information shown in Figure 5 and Figure 6 (in addition to the information presented in Figure 3 and Figure 4), it can be concluded that the reference parameters and a priori probabilities are correlated. That is, the closer the values in the scheme of a priori probabilities are to the values of the reference parameters, the smaller the value of the error

ε

. This interpretation, in particular, explains the superiority of the scheme

(p_{B}, q_{D})

over the scheme

(p_{C}, q_{D})

, because

{\tilde{ε}}_{B D} < {\tilde{ε}}_{C D}

.

4. Discussion

Let us begin the analysis of the results presented in Section 3 of the applied use of the mathematical apparatus proposed in Section 2 with the fact that the estimates of the parameters

⟨ p, q ⟩

obtained by solving optimization Problems (27) (derived from Problem (20) and (21)) and (28) (derived from Problem (24) and (25)) differ in the value of the generalized entropy (Expressions (20) and (24), respectively). We will explain this fact on the theoretical basis of the models presented in Section 2.

To simplify the formulations, we introduce some additional notation. Let us redefine entropy

H (s)

as

H (e) = H (s)

, where

e = ⟨ p, q ⟩

. Accordingly,

e_{1}^{*}

will be the optimal estimate of the parameters

⟨ p^{*}, q^{*} ⟩

represented by normalized probabilities (

H (\bar{s})

variant) and

e_{2}^{*}

will be the optimal estimate of the parameters

⟨ p^{*}, q^{*} ⟩

represented by interval probabilities (

H (\tilde{s})

variant). Let us denote

\hat{e} = \arg \max H (e)

and define the sets

\bar{E} = {e : ⟨ p, 1 ⟩ = 1, ⟨ q, 1 ⟩ = 1} \subset \tilde{E} = {e : 0 \leq e \leq 1} .

(29)

Summarizing what has been entered, we formulate the following: if

\hat{e} \in (R_{+}^{(n + o)} \ \bar{E})

then

H (e_{1}^{*}) < H (e_{2}^{*})

. The equality

H (e_{1}^{*}) = H (e_{2}^{*})

holds when

e_{1}^{*} = \hat{e}

. Let us explain our conclusions. The analysis of the function described by Expression (20) shows that it is a concave function with a single maximum at the point

\hat{e}

. The value of entropy

H (e)

depends on the distance of a point

e

from the extreme point

\hat{e}

. In this context, we denote as

Δ (\hat{e}, e_{1}^{*})

the distance between the extreme point

\hat{e}

and the point

e_{1}^{*}

, the coordinates of which we obtain as a result of solving optimization Problem (20) and (21). Accordingly, the parameter

Δ (\hat{e}, e_{2}^{*})

characterizes the distance between the extreme point

\hat{e}

and the point

e_{2}^{*}

, the coordinates of which we obtain as a result of solving optimization Problem (24) and (25). Since Function (20) is strictly concave, based on the Relation (29) we can conclude that

Δ (\hat{e}, e_{1}^{*}) > Δ (\hat{e}, e_{2}^{*})

. The equality

Δ (\hat{e}, e_{1}^{*}) = Δ (\hat{e}, e_{2}^{*})

holds only when

e_{1}^{*} = \hat{e}

. This theoretical argument explains the discrepancy between the empirical values, presented in Figure 3 and Figure 4, of

E_{\bar{s}}^{(1)} (H (\bar{s}))

and

E_{\tilde{s}}^{(1)} (H (\tilde{s}))

for the same schemes

(p_{{A, B, C}}^{0}, q_{{D, E}}^{0})

. Comparing the symmetrical values visualized in Figure 3 and Figure 4, it can be concluded that parameter estimates calculated for the interval probabilities (i.e., for

Q ν S M

) are characterized by a larger value of the conditional maximum entropy estimate than that characteristic of the normalized probabilities of

G ν S M

. Thus, the mathematical apparatus presented in Section 2 was empirically confirmed in Section 3.
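
The set-inclusion argument can be illustrated on a toy case with two components. The Python sketch below is a simplification under stated assumptions: it evaluates the same function H(e) = -Σ e_k ln(e_k / e_k^0) on both sets, as the argument of this section does, uses a coarse grid search instead of a proper solver, and ignores the balance constraints. The function names and the prior e^0 = (1, 1) are chosen for illustration only.

```python
import math

def entropy(e, e0):
    """H(e) = -sum e_k * ln(e_k / e0_k); a zero component contributes zero."""
    return -sum(v * math.log(v / v0) for v, v0 in zip(e, e0) if v > 0.0)

def grid_max(e0, steps, on_simplex):
    """Maximize H over a grid on the box [0, 1]^2 or on its subset e_1 + e_2 = 1."""
    best = -math.inf
    for a in range(steps + 1):
        second = [steps - a] if on_simplex else range(steps + 1)
        for b in second:
            best = max(best, entropy((a / steps, b / steps), e0))
    return best

E0 = (1.0, 1.0)
h_bar = grid_max(E0, 200, on_simplex=True)     # normalized probabilities
h_tilde = grid_max(E0, 200, on_simplex=False)  # interval probabilities
print(h_bar, h_tilde)
```

Because every grid point of the normalized set also belongs to the interval set, the second maximum can never be smaller than the first. In this toy case the unconstrained maximizer (1/e, 1/e) lies outside the normalized set, so the inequality is strict.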

In addition, the results of the experiments presented in Section 3 confirmed the conclusion generalized by Expressions (17) and (18) that, for a linear stochastic model of variable small data estimation, entropy estimates

E_{\bar{s}}^{(1)}

are always exponential functions. The results of measuring the “input” and “output” entities of the investigated process determine the form, and not the type, of the

E_{\bar{s}}^{(1)}

-functions of the corresponding linear stochastic model of small data estimation.

The results in Figure 3 and Figure 4 show that a priori information about the initial values of the vectors

p^{0} = (p_{i}^{0})

,

i = \bar{1, 5}

, and

q^{0} = (q_{j}^{0})

,

j = \bar{1, 2}

, summarized in the corresponding named sets of

p_{{A, B, C}}^{0}

,

q_{{D, E}}^{0}

, has a significant effect on the

E_{{\bar{s}, \tilde{s}}}^{(1)} (p_{i}^{\cup}, q_{j}^{\cup}, H ({\bar{s}, \tilde{s}}))

estimates.

In this context, the fact that the authors’ mathematical apparatus allows the calculation of the quasi-moments of the characteristic parameters

α_{i}^{*}

,

i = \bar{1, 5}

, of both the

G ν S M

and

Q ν S M

, as well as accounting for their variability

υ_{j}^{*}

,

j = \bar{1, 2}

, caused by the measurement errors, is very relevant. From the data visualized in Figure 5 and Figure 6, it can be seen that the

ε

deviations from the values indicated in the vector

α^{0}

caused by the variability of the measurements are most pronounced for the schemes

(p_{C}^{0}, q_{D}^{0})

and

(p_{C}^{0}, q_{E}^{0})

. In these schemes, the essential parameters of the models have an uneven distribution (see Figure 1, “C”), while the influence parameters have either an uneven (see Figure 2, “D”) or a uniform (see Figure 2, “E”) distribution. For both schemes, we obtained:

{\bar{ε}}_{C D} = 0.30

,

{\bar{ε}}_{C E} = 0.36

;

{\tilde{ε}}_{C D} = 0.33

,

{\tilde{ε}}_{C E} = 0.36

. Therefore, for the considered example, the unevenness of the distribution of parameters

p_{i}

,

i = \bar{1, 5}

provided a significant contribution to the high value of errors

ε

. Reliable a priori information turned out to be very important in the entropy estimation of variable small data.

5. Conclusions

The article analytically summarizes the idea of applying the Shannon entropy maximization principle to sets that represent the results of observations of the “input” and “output” entities of the stochastic model for evaluating variable small data. To formalize this idea, a sequential transition from the likelihood function to the likelihood functional and the Shannon entropy functional is analytically described. Shannon’s entropy characterizes the uncertainty caused not only by the probabilistic nature of the parameters of the stochastic data evaluation model but also by influences that distort the results of measurements of the values of these parameters. Accordingly, based on the Shannon entropy, it is possible to determine the best estimates of the values of these parameters for maximally uncertain (per entropy unit) influences that cause measurement variability. This postulate is organically transferred to the statement that the estimates of the probability distribution density of the parameters of the stochastic model of small data obtained as a result of Shannon entropy maximization will also take into account the fact of the variability of the process of their measurements. In the article, this principle is developed into the information technology of the parametric and non-parametric evaluation on the basis of Shannon entropy of small data measured under the influence of interferences.

The article also examines the structural properties of stochastic models for variable data evaluation, the parameters of which were represented by normalized or interval probabilities. At the same time, the inherent non-linearity of these models and the errors in measuring the values of the “output” entity were taken into account.

The functionality and adequacy of the created mathematical apparatus are proven based on the empirical results obtained during the investigation of the linear stochastic model of evaluating specific variable small data.

The authors acknowledge that the research presented in the article is formulated in academic form. This circumstance complicates the applied use of the obtained results. At the same time, the developed methodological approach can be useful in various important applications. In particular, it concerns the assessment of software reliability, when the sample of data is usually not large due to the difficulties of reliably assessing them during the testing and operation of the system. In this case, the lack of testing data or information about failures during pilot software operation can be compensated for by analyzing the assumptions that are specific to the software and selecting appropriate models using assumption matrices [37]. Thus, studies that combine the analysis of small data and expert methods are interesting.

Another important application is in safety-critical systems, which, owing to multi-level redundancy, as a rule have a low failure rate and therefore little data about failures. At the same time, it is extremely important for such systems to have accurate, or at least interval, estimates of indicators within an acceptable range. To that end, the described method could be combined with traditional methods of reliability analysis and risk-oriented assessment of safety indicators using formal and semi-formal methods [38].

In this regard, further research is proposed to formalize the obtained information technology on a UML basis. This will allow future work to reach the stage of implementing the profile framework. In addition, it would be very interesting and useful from a practical point of view to combine Big and Small Data analysis to create a universal or adaptable framework focusing on the assessment of data quality and the selection of data according to a quality indicator.

Author Contributions

Conceptualization, O.B., V.K. (Vyacheslav Kharchenko), V.K. (Viacheslav Kovtun), I.K. and S.P.; methodology, V.K. (Viacheslav Kovtun); software, V.K. (Viacheslav Kovtun); validation, V.K. (Vyacheslav Kharchenko) and V.K. (Viacheslav Kovtun); formal analysis, V.K. (Viacheslav Kovtun); investigation, V.K. (Viacheslav Kovtun); resources, V.K. (Viacheslav Kovtun); data curation, V.K. (Viacheslav Kovtun); writing—original draft preparation, V.K. (Vyacheslav Kharchenko) and V.K. (Viacheslav Kovtun); writing—review and editing, O.B., V.K. (Vyacheslav Kharchenko), V.K. (Viacheslav Kovtun), I.K. and S.P.; visualization, V.K. (Viacheslav Kovtun); supervision, V.K. (Viacheslav Kovtun); project administration, V.K. (Viacheslav Kovtun); funding acquisition, V.K. (Viacheslav Kovtun). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Most data are contained within the article. All the data are available on request due to restrictions, e.g., privacy or ethical.

Acknowledgments

The authors would like to thank the Armed Forces of Ukraine for providing security to perform this work. This work has become possible only because of the resilience and courage of the Ukrainian Army.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ebrahimi, B.; Dellnitz, A.; Kleine, A.; Tavana, M. A novel method for solving data envelopment analysis problems with weak ordinal data using robust measures. Expert Syst. Appl. 2021, 164, 113835.
2. Kovtun, V.; Kovtun, O.; Semenov, A. Entropy-Argumentative Concept of Computational Phonetic Analysis of Speech Taking into Account Dialect and Individuality of Phonation. Entropy 2022, 24, 1006.
3. Viacheslav, K.; Kovtun, O. System of methods of automated cognitive linguistic analysis of speech signals with noise. Multimedia Tools Appl. 2022, 81, 43391–43410.
4. Angeles, K.; Kijewski-Correa, T. Advancing building data models for the automation of high-fidelity regional loss estimations using open data. Autom. Constr. 2022, 140, 104382.
5. Hao, R.; Zheng, H.; Yang, X. Data augmentation based estimation for the censored composite quantile regression neural network model. Appl. Soft Comput. 2022, 127, 109381.
6. Garza-Ulloa, J. Methods to develop mathematical models: Traditional statistical analysis. In Applied Biomechatronics Using Mathematical Models; Elsevier: Amsterdam, The Netherlands, 2018; pp. 239–371.
7. Gao, Y.; Shi, Y.; Wang, L.; Kong, S.; Du, J.; Lin, G.; Feng, Y. Advances in mathematical models of the active targeting of tumor cells by functional nanoparticles. Comput. Methods Programs Biomed. 2020, 184, 105106.
8. Yang, X.-S.; He, X.-S.; Fan, Q.-W. Mathematical framework for algorithm analysis. In Nature-Inspired Computation and Swarm Intelligence; Academic Press: Cambridge, MA, USA, 2020; pp. 89–108.
9. Wang, X.; Liu, A.; Kara, S. Machine learning for engineering design toward smart customization: A systematic review. J. Manuf. Syst. 2022, 65, 391–405.
10. Khan, T.; Tian, W.; Zhou, G.; Ilager, S.; Gong, M.; Buyya, R. Machine learning (ML)-centric resource management in cloud computing: A review and future directions. J. Netw. Comput. Appl. 2022, 204, 103405.
11. Gonçales, L.J.; Farias, K.; Kupssinskü, L.S.; Segalotto, M. An empirical evaluation of machine learning techniques to classify code comprehension based on EEG data. Expert Syst. Appl. 2022, 203, 117354.
12. Sholevar, N.; Golroo, A.; Esfahani, S.R. Machine learning techniques for pavement condition evaluation. Autom. Constr. 2022, 136, 104190.
13. Alam, J.; Georgalos, K.; Rolls, H. Risk preferences, gender effects and Bayesian econometrics. J. Econ. Behav. Organ. 2022, 202, 168–183.
14. Cladera, M. Assessing the attitudes of economics students towards econometrics. Int. Rev. Econ. Educ. 2021, 37, 100216.
15. Joubert, J.W. Accounting for population density in econometric accessibility. Procedia Comput. Sci. 2022, 201, 594–600.
16. MacKinnon, J.G. Using large samples in econometrics. J. Econ. 2022.
17. Nazarkevych, M.; Voznyi, Y.; Hrytsyk, V.; Klyujnyk, I.; Havrysh, B.; Lotoshynska, N. Identification of Biometric Images by Machine Learning. In Proceedings of the 2021 IEEE 12th International Conference on Electronics and Information Technologies (ELIT), Lviv, Ukraine, 19–21 May 2021.
18. Yusuf, A.; Qureshi, S.; Shah, S.F. Mathematical analysis for an autonomous financial dynamical system via classical and modern fractional operators. Chaos Solitons Fractals 2020, 132, 109552.
19. Balbás, A.; Balbás, B.; Balbás, R. Omega ratio optimization with actuarial and financial applications. Eur. J. Oper. Res. 2021, 292, 376–387.
20. Giua, A.; Silva, M. Petri nets and Automatic Control: A historical perspective. Annu. Rev. Control 2018, 45, 223–239.
21. Sleptsov, E.S.; Andrianova, O.G. Control Theory Concepts: Analysis and Design, Control and Command, Control Subject, Model Reduction. IFAC-PapersOnLine 2021, 54, 204–208.
22. Knorn, S.; Varagnolo, D. Automatic control: The natural approach for a quantitative-based personalized education. IFAC-PapersOnLine 2020, 53, 17326–17331.
23. Rubio-Fernández, P.; Mollica, F.; Ali, M.O.; Gibson, E. How do you know that? Automatic belief inferences in passing conversation. Cognition 2019, 193, 104011.
24. Aljohani, M.D.; Qureshi, R. Proposed Risk Management Model to Handle Changing Requirements. Int. J. Educ. Manag. Eng. 2019, 9, 18–25.
25. Arefin, M.A.; Islam, N.; Gain, B.; Roknujjaman. Accuracy Analysis for the Solution of Initial Value Problem of ODEs Using Modified Euler Method. Int. J. Math. Sci. Comput. 2021, 7, 31–41.
26. Ramadan, I.S.; Harb, H.M.; Mousa, H.M.; Malhat, M.G. Reliability Assessment for Open-Source Software Using Deterministic and Probabilistic Models. Int. J. Inf. Technol. Comput. Sci. 2022, 14, 1–15.
27. Nayim, A.M.; Alam, F.; Rasel; Shahriar, R.; Nandi, D. Comparative Analysis of Data Mining Techniques to Predict Cardiovascular Disease. Int. J. Inf. Technol. Comput. Sci. 2022, 14, 23–32.
28. Goncharenko, A.V. Specific Case of Two Dynamical Options in Application to the Security Issues: Theoretical Development. Int. J. Comput. Netw. Inf. Secur. 2021, 14, 1–12.
29. Padmavathi, C.; Veenadevi, S.V. An Automated Detection of CAD Using the Method of Signal Decomposition and Non Linear Entropy Using Heart Signals. Int. J. Image Graph. Signal Process. 2019, 11, 30–39.
30. Mwambela, A. Comparative Performance Evaluation of Entropic Thresholding Algorithms Based on Shannon, Renyi and Tsallis Entropy Definitions for Electrical Capacitance Tomography Measurement Systems. Int. J. Intell. Syst. Appl. 2018, 10, 41–49.
31. Dronyuk, I.; Fedevych, O.; Poplavska, Z. The generalized shift operator and non-harmonic signal analysis. In Proceedings of the 2017 14th International Conference The Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), Lviv, Ukraine, 21–25 February 2017.
32. Hu, Z.; Tereykovskiy, I.A.; Tereykovska, L.O.; Pogorelov, V.V. Determination of Structural Parameters of Multilayer Perceptron Designed to Estimate Parameters of Technical Systems. Int. J. Intell. Syst. Appl. 2017, 9, 57–62.
33. Izonin, I.; Tkachenko, R.; Shakhovska, N.; Lotoshynska, N. The Additive Input-Doubling Method Based on the SVR with Nonlinear Kernels: Small Data Approach. Symmetry 2021, 13, 612.
34. Hu, Z.; Mashtalir, S.V.; Tyshchenko, O.K.; Stolbovyi, M.I. Clustering Matrix Sequences Based on the Iterative Dynamic Time Deformation Procedure. Int. J. Intell. Syst. Appl. 2018, 10, 66–73.
35. Hu, Z.; Khokhlachova, Y.; Sydorenko, V.; Opirskyy, I. Method for Optimization of Information Security Systems Behavior under Conditions of Influences. Int. J. Intell. Syst. Appl. 2017, 9, 46–58.
36. Pineda, S.; Morales, J.M.; Wogrin, S. Mathematical programming for power systems. In Encyclopedia of Electrical and Electronic Power Engineering; Elsevier: Amsterdam, The Netherlands, 2023; pp. 722–733.
37. Kharchenko, V.S.; Tarasyuk, O.M.; Sklyar, V.V.; Dubnitsky, V.Y. The method of software reliability growth models choice using assumptions matrix. In Proceedings of the 26th Annual International Computer Software and Applications Conference (COMPSAC), Oxford, UK, 26–29 August 2002; pp. 541–546.
38. Babeshko, E.; Kharchenko, V.; Leontiiev, K.; Ruchkov, E. Practical Aspects of Operating and Analytical Reliability Assessment of Fpga-Based I&C Systems. Radioelectron. Comput. Syst. 2020, 3, 75–83.


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
