Polynomial Distributions and Transformations

Yu, Yue; Loskot, Pavel

doi:10.3390/math11040985

Open AccessArticle

Polynomial Distributions and Transformations

by

Yue Yu

and

Pavel Loskot

^*

ZJU-UIUC Institute, Haining 314400, China

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(4), 985; https://doi.org/10.3390/math11040985

Submission received: 9 December 2022 / Revised: 8 February 2023 / Accepted: 13 February 2023 / Published: 15 February 2023

(This article belongs to the Special Issue Probability, Statistics and Their Applications 2021)

Download

Browse Figures

Versions Notes

Abstract

:

Polynomials are common algebraic structures, which are often used to approximate functions, such as probability distributions. This paper proposes to directly define polynomial distributions in order to describe stochastic properties of systems rather than to assume polynomials for only approximating known or empirically estimated distributions. Polynomial distributions offer great modeling flexibility and mathematical tractability. However, unlike canonical distributions, polynomial functions may have non-negative values in the intervals of support for some parameter values; their parameter numbers are usually much larger than for canonical distributions, and the interval of support must be finite. Hence, polynomial distributions are defined here assuming three forms of a polynomial function. Transformations and approximations of distributions and histograms by polynomial distributions are also considered. The key properties of the polynomial distributions are derived in closed form. A piecewise polynomial distribution construction is devised to ensure that it is non-negative over the support interval. A goodness-of-fit measure is proposed to determine the best order of the approximating polynomial. Numerical examples include the estimation of parameters of the polynomial distributions and generating polynomially distributed samples.

Keywords:

approximation; distribution; histogram; least-squares; polynomial; probability density

MSC:

60E05

1. Introduction

Approximating functions is motivated by reducing the computational complexity and achieving the analytical tractability of mathematical models. This also includes the problems of finding low-complexity and low-dimensional mathematical models for continuous or discrete-time observations, such as time-series data, and empirically determined features, such as histograms. This paper is concerned with the latter problem, i.e., how to effectively model the probability distributions of observation data. In particular, it is proposed to define polynomial probability distributions rather than to assume polynomial approximations of probability distributions. This is a major departure from the reasoning found in the existing literature.

Polynomial distributions provide superior flexibility over other canonical distributions, albeit at a cost of a larger number of parameters, and the support interval is constrained to a finite range of values. The main advantages of polynomial distributions are that they can yield parameterized closed-form expressions and enable the modeling of complex multi-modal and time-evolving probability distributions. These distributions are encountered, for example, when describing causal interactions and state transitions in dynamic systems. This may lead to the development of novel probabilistic mathematical frameworks. The disadvantage is that, in the case of a general polynomial function, it may be difficult to ensure that the polynomial is non-negative over the whole intended interval of support. The non-negativity can be guaranteed, for example, by assuming squared polynomials.

The Weierstrass theorem [1] is the fundamental result in the approximation theory of functions. It states that every continuous function can be uniformly approximated with an arbitrary precision over any finite interval by a polynomial of a sufficient order. The uniform approximation can be expressed as a sequence of algebraic polynomials uniformly converging over a given interval to the function of interest. The approximation accuracy can be evaluated by different metrics including

l_{p}

-norms, minimax norm, and others. The best approximating function from a set or a sequence of functions and its properties can be determined by the Jackson theorem. The Stone–Weierstrass theorem generalizes the function approximation to cases of multivariate functions and functions in multiple dimensions [2].

Runge’s phenomenon arises when the approximating polynomials contain a set of predefined points, which can prevent uniform convergence from being possible [3]. The equidistant approximating points can be optimized using the Lebesgue constant as a measure of the approximation accuracy [4,5].

Polynomials can be used to approximate known probability distributions as well as distributions estimated as histograms [6,7]. Reference [8] is one of the earlier works that assume the approximation of probability distributions by a polynomial. The fitting of multivariate polynomials to multivariate cumulative distributions and their partial derivatives is studied in [9], whereas multivariate polynomial interpolation is studied in [10]. The conditions for the coefficients of a polynomial to be a sum of two squared polynomials are determined in [11].

The problem of fitting a polynomial into a finite number of data samples has been investigated in the classic reference [12]. The polynomial curve fitting methods are often available in various software packages [13]. Modeling times series data by piecewise polynomials is considered in [14,15]. The least-squares polynomial approximation of random data samples with standard and induced densities is compared in [16]. A new method for polynomial interpolation of data points within a plane is proposed in [17]. Interestingly, the recent survey [18] on approximating probability distributions does not mention polynomial approximation as one of the available methods.

The polynomial expansion of chaos for the reliability analysis of systems is proposed in [19]. A polynomial kernel for feature learning from data is considered in [20]. The Stone–Weierstrass theorem is assumed in [21] to design a neural network that can approximate an arbitrary measurable function. The method for function approximation by a polynomial using a neural network is investigated in [22].

Polynomials can be sparse, i.e., only some of their coefficients—including the coefficient determining the order—are non-zero. The polynomials with special properties are named; for example, there are Lagrange, Legendre, Diskson, Chebyshev, and Bernstein polynomials [23,24]. Special polynomials, such as Hermite and Lagrangian polynomials, can form the basis for function decomposition. There is a close link between approximating periodic continuous functions and trigonometric polynomials in the Fourier analysis [25]. A procedure for orthogonal polynomial decomposition of multivariate distributions was devised in [26] in order to compute the output of a multidimensional system with the stochastic input.

Reference [23] is a comprehensive textbook on the theory of polynomials covering fundamental theorems, special polynomials, polynomial algebra, finding and approximating polynomial roots, finding polynomial factors, solving polynomial equations, and defining polynomial inequalities and properties of polynomial approximations. The other textbook [24] includes additional topics, such as critical points of polynomials, the compositions of polynomial functions, theorems and conjectures about polynomials, and defining extremal properties of polynomials. Although the textbook [27] focuses on solving differential equations by polynomial approximations, it also provides a necessary background on polynomials including their definitions and properties. Differential equations are solved by Jacobi polynomial approximation in [28].

The properties of minima and maxima of polynomials were studied in [29]. An algorithm for finding the global minimum of a general multivariate polynomial was developed in [30]. The number of local minima of a multivariate polynomial is bounded in [31]. Sturm series are assumed in [32] to find the maxima of a polynomial.

In this paper, polynomial distributions are introduced in Section 2 including transformations of the polynomial distributions, fitting a histogram with a polynomial distribution, constructing a piecewise polynomial distribution, and defining the basic properties of a random polynomial function. In Section 3, selected properties of the polynomial distributions are derived. The estimation problems involving polynomial distributions and determining the polynomial order are considered in Section 4. Numerical examples are presented in Section 5 including constructing a piecewise polynomial, generating polynomially distributed random samples, estimating parameters of polynomial distributions by the method of moments and by fitting the observations, and approximating distributions by Lagrange interpolation. Section 5 ends with a summary of key findings. The paper is concluded in Section 6. In addition, the key expressions for polynomial functions in Form I, II, and III are summarized in Appendix A, Appendix B and Appendix C, respectively.

The following notations are used in the paper: X denotes a random variable whereas x denotes a specific value of this random variable;

{(\cdot)}^{T}

is matrix transpose;

{(\cdot)}^{- 1}

is matrix inverse; the operators,

E [\cdot]

and

var [\cdot]

, denote expectation and variance, respectively;

(\cdot)!

denotes factorial,

\overset{!}{=}

is used to find a value satisfying the indicated equality, and

〈\cdot, \cdot〉

denotes the dot-product of two vectors.

2. Defining Polynomial Distributions

Given a continuous and finite interval,

(l, u)

,

- \infty < l < u < + \infty

, the probability density function (PDF),

p (x)

, of a random variable, X, with the support,

(l, u)

, must satisfy the following two conditions,

\begin{matrix} p (x) & \geq 0, \forall x \in (l, u) \\ \int_{l}^{u} p (x) & d x = 1 . \end{matrix}

(1)

Assume that the PDF,

p (x)

, can be linearly expanded as

p (x) = a_{0} + \sum_{i = 1}^{n} a_{i} b_{i} (x)

(2)

into the n-dimensional basis of generally non-linear functions,

b_{i} (x)

. These functions can also be parameterized as

b_{i} (x) \equiv b (x; θ_{i})

. Provided that the functions,

b_{i} (x)

, are themselves PDFs, i.e., they satisfy the conditions (1), then the PDF (2) is referred to as mixture distribution, and

\sum_{i = 0}^{n} a_{i} = 1

.

In this paper, the functions,

b_{i} (x) = x^{i}

, are assumed, so that, the expression (2) represents an ordinary univariate polynomial of degree, n. The coefficients,

a_{i}

, can be a function of another common variable, e.g.,

a_{i} (y)

,

i = 0, 1, \dots, n

; this multivariate polynomial is referred to as algebraic function. The multivariate polynomial having the same degree of a non-zero term is referred to as being homogeneous (formerly a quantic polynomial).

The following three representations of real-valued polynomial functions are considered in this paper.

Definition 1.

\begin{matrix} Form I : & p_{n} (x) = \sum_{i = 0}^{n} a_{i} x^{i}, & a_{i} \in R, a_{n} \neq 0 \end{matrix}

(3a)

\begin{matrix} Form II : & p_{n} (x) = a_{n} \prod_{i = 1}^{n} (x - r_{i}), & r_{i} \in, a_{n} \neq 0 \end{matrix}

(3b)

\begin{matrix} Form III : & p_{n} (x) = \sum_{i = 1}^{n} \frac{a_{i}}{x - r_{i}}, & a_{i} \neq 0, r_{i} \neq r_{j} \forall i \neq j \end{matrix}

(3c)

Form I is a canonical polynomial function. Form II indicates that every n-degree polynomial has exactly n, generally complex-valued, roots

r_{i}

[33]. The number of real-valued roots can be determined by Sturm’s theorem. Form III is a rational polynomial function. The basic properties of the polynomial Forms I, II, and III are summarized in Appendix A, Appendix B and Appendix C, respectively, including roots, indefinite and definite integrals, derivatives, general statistical moments, and characteristic or moment-generating functions. Note that every polynomial function,

p_{n} (x)

, of any order, n, diverges when its argument, x, becomes unbounded. In addition, Forms I and II are equivalent as shown in Appendix B, and, for complex-conjugate roots,

(x - r_{i}) (x - r_{i}^{*}) = {(x - Re (r_{i}))}^{2} + Im {(r_{i})}^{2} > 0

. Form I defined by (3a) can also be computed recursively as

\begin{matrix} p_{n} (x) & = (. . . (((a_{n} x + a_{n - 1}) x + a_{n - 2}) x + a_{n - 3}) . . .) x + a_{0} \\ = x p_{n - 1} (x) + a . \end{matrix}

(4)

A Form I or II polynomial,

p_{n} (x)

, of degree n and all of its derivatives,

p_{n}^{(k)} (x) = \frac{d^{k}}{d x^{k}} p_{n} (x)

,

k \leq n

, is continuous and strictly bounded over a finite interval,

x \in (l, u)

. However, the polynomial forms in Definition 1 represent a PDF, if and only if, they satisfy both conditions (1). This can be achieved by using linear and non-linear transformations, which are defined in the following lemma.

Lemma 1.

A polynomial,

p_{n} (x)

, can become a PDF by using either of the following transformations.

(a): There exist finite real constants, A and B, such that the linearly transformed polynomial, $A p_{n} (x) + B$ , satisfies PDF conditions (1).
(b): There exists a real positive constant, $A > 0$ , such that the polynomial, $A | p_{n} (x) |$ , or, $A {(p_{n} (x))}_{+}$ , satisfies PDF conditions (1) where $| \cdot |$ denotes absolute value, and ${(\cdot)}_{+}$ changes the negative values of its argument to zero.
(c): There exists a low-degree polynomial, $q_{k} (x)$ , such that the polynomial, $q_{k} (p_{n} (x))$ , satisfies PDF conditions (1); for instance, $q_{1} (x) = A x + B$ [cf. (a)], or, $q_{2} (x) = A x^{2}$ , $A > 0$ .

Proof.

(a): Let, $b = {min}_{x \in (l, u)} p_{n} (x)$ , so that, $p_{n} (x) - b \geq 0$ . Then, $A^{- 1} = \int_{l}^{u} p_{n} (x) - b d x$ , and, $B = - A b$ .
(b): For any, x, the functions, $| p_{n} (x) | \geq 0$ , and, ${(p_{n} (x))}_{+} \geq 0$ . Then, $A^{- 1} = \int_{l}^{u} | p_{n} (x) | d x$ , or, $A^{- 1} = \int_{l}^{u} {(p_{n} (x))}_{+} d x$ , respectively.
(c): The linear transformation, $q_{1} (x)$ , was considered in (a). The transformation, $p_{n}^{2} (x) \geq 0$ , and, $A^{- 1} = \int_{l}^{u} p_{n}^{2} (x) d x$ .

□

The polynomial PDFs defined in Lemma 1 can be further constrained by the required number of local minima, maxima, and roots within the interval of support,

(l, u)

. There can also be additional constraints on smoothness expressed in terms of the minimum required polynomial order.

By Bolzano’s theorem, a continuous function having opposite sign values in an interval also has a root between these values. Consequently, a polynomial,

p_{n} (x)

, of order n have at least one maximum or minimum between every two adjacent roots, and there can be a maximum or minimum located at the roots themselves [29]. Moreover, provided that the polynomial is considered over a finite interval, the boundary points of the support interval should be treated as additional roots, i.e., the boundary points can create local maximum or minimum as well as allow additional extrema to exist before the first nearest root. In the case of Form II polynomials, the condition of the first derivative to be zero can be equivalently expressed as

\frac{d}{d x} p_{n} (x) \overset{!}{=} 0 \Leftrightarrow \frac{d}{d x} log p_{n} (x) = \frac{{\dot{p}}_{n} (x)}{p_{n} (x)} = \sum_{i = 1}^{n} \frac{1}{x - r_{i}} \overset{!}{=} 0 .

(5)

However, this approach still requires finding the roots of (5) for every sub-interval,

(r_{i}, r_{i + 1})

,

i = 0, 1, \dots, n

, where

r_{0} \equiv l

and

r_{n + 1} \equiv u

. It may be much easier to find the local extrema by considering the recursion,

p_{n} (x) = \int p_{n - 1} (x) d x = \sum_{i = 0}^{n - 1} \frac{a_{i}}{i + 1} x^{i + 1} + c

(6)

provided that the roots of the polynomial,

p_{n - 1} (x) = a_{n - 1} \prod_{i = 1}^{n - 1} (x - r_{i})

, are known, and,

c = a_{0}

, denotes the constant of integration. These roots can be known by design, i.e., the locations of minima and maxima are selected a priori in a given interval of support. More importantly, in the case of a polynomial PDF, the local maxima represent the modes of such a distribution.

The Form I polynomial PDF can be generalized as

p_{n} (x) = \sum_{i = 0}^{n} a_{i} g^{i} (x), x \in (l, u)

(7)

where

g (x)

is a mathematical expression (i.e., not a transformation). For instance, it is possible to assume polynomials with fractional rather than integer powers of the independent variable [34].

For

g (x) = e^{j ω_{0} x}

,

j = \sqrt{- 1}

,

ω_{0} = 2 π / (u - l) > 0

, the PDF (7) becomes the truncated exponential Fourier series, i.e.,

p_{n} (x) = \sum_{i = 0}^{n} a_{i} e^{j ω_{0} i x}, a_{i} = \frac{1}{u - l} \int_{l}^{u} p_{n} (x) e^{- j ω_{0} i x} d x .

(8)

The corresponding k-th general moments are then computed as

\int_{l}^{u} x^{k} p_{n} (x) d x = \sum_{i = 0}^{n} a_{i} \int_{l}^{u} x^{k} e^{j ω_{0} i x} d x = \sum_{i = 0}^{n} a_{i} {(- 1)}^{k} W^{(k)} (j ω_{0} i)

(9)

where

W (j ω) = \int_{l}^{u} e^{j ω x} d x

is the Fourier transform of a rectangular window located over the interval,

(l, u)

.

For

g (x) = e^{x}

, PDF (7) becomes,

p_{n} (x) = \sum_{i = 0}^{n} a_{i} e^{i x}, x \in (l, u)

(10)

which can be readily integrated, although general statistical moments can only be expressed as special functions.

Consider now the general case of PDF (7), for

n = 2

. Thus, given

p_{2} (x)

and

g (x)

, and positive integers

i_{1}

and

i_{2}

, the task is to find the coefficients

a_{0}

,

a_{1}

and

a_{2}

of the PDF decomposition,

p_{2} (x) = a_{2} g^{i_{2}} (x) + a_{1} g^{i_{1}} (x) + a_{0}, x \in (l, u) .

(11)

Multiplying both sides of (11) by

g^{- i} (x) \dot{g} (x)

and integrating, we obtain,

\int_{l}^{u} p_{2} (x) g^{- i} (x) \dot{g} (x) d x = a_{2} \int_{l}^{u} g^{i_{2}} (x) g^{- i} (x) \dot{g} (x) d x + a_{1} \int_{l}^{u} g^{i_{1}} (x) g^{- i} (x) \dot{g} (x) d x + a_{0} \int_{l}^{g (u)} g^{- i} (x) \dot{g} (x) d x .

(12)

Assuming a substitution,

y = g (x)

, Equation (12) can be rewritten as

\int_{g (l)}^{g (u)} p_{2} (g^{- 1} (y)) y^{- i} d y = a_{2} \int_{g (l)}^{g (u)} y^{i_{2} - i} d y + a_{1} \int_{g (l)}^{g (u)} y^{i_{1} - i} d y + a_{0} \int_{g (l)}^{g (u)} y^{- i} d y .

(13)

Provided that,

g (u) = - g (l) = v

, and

i_{1} > 0

is an odd-integer, and

i_{2} > 0

is an even-integer, then, for

i = i_{1}

and

i = i_{2}

, respectively,

\begin{matrix} \int_{- v}^{v} p_{2} (g^{- 1} (y)) y^{- i_{1}} d y & = a_{2} \underset{= 0}{\underset{︸}{\int_{- v}^{v} y^{i_{2} - i_{1}} d y}} + 2 a_{1} v + a_{0} \underset{= 0}{\underset{︸}{\int_{- v}^{v} y^{- i_{1}} d y}} \\ \int_{- v}^{v} p_{2} (g^{- 1} (y)) y^{- i_{2}} d y & = 2 a_{2} v + a_{1} \underset{= 0}{\underset{︸}{\int_{- v}^{v} y^{i_{1} - i_{2}} d y}} + a_{0} \int_{- v}^{v} y^{- i_{2}} d y \end{matrix}

(14)

and, therefore,

\begin{matrix} a_{1} & = \frac{1}{2 v} \int_{- v}^{v} p_{2} (g^{- 1} (y)) y^{- i_{1}} d y \\ a_{2} & = \frac{1}{2 v} \int_{- v}^{v} (p_{2} (g^{- 1} (y)) - a_{0}) y^{- i_{2}} d y . \end{matrix}

(15)

The offset,

a_{0}

, must be computed from some other constraint, for example, as the minimum value to guarantee a non-negativity of

p_{2} (x)

. Note that the function,

g (x)

, in (11) must be chosen, so the integrals (15) converge.

2.1. Probability Density Transformations

In general, if

g (x)

is an invertible transformation of a random variable, X, having the PDF,

p (x)

, the PDF,

q (x)

, of random output variable,

g (x)

, is, [35]

q (x) = \frac{p (g^{- 1} (x))}{| \dot{g} (g^{- 1} (x))} = p (g^{- 1} (x)) |{\dot{g}}^{- 1} (x)|

(16)

where

\dot{g} (x) = \frac{d}{d x} g (x)

. Assuming

p (x)

is a Form I polynomial PDF,

p_{n} (x)

, the transformed PDF is also a Form I polynomial, i.e.,

q_{n} (x) = | {\dot{g}}^{- 1} (x) | \sum_{i = 1}^{n} a_{i} g^{- i} (x) = \sum_{i = 1}^{n} a_{i} {(\frac{\sqrt[i]{| {\dot{g}}^{- 1} (x) |}}{g (x)})}^{i}

(17)

in the variable,

\sqrt[i]{| {\dot{g}}^{- 1} (x) |} g^{- 1} (x)

.

Assuming a linear transformation,

g_{1} (x) = b_{1} x + b_{0}

, the PDF (17) is also a polynomial PDF of the same order, i.e.,

q_{n} (x) = \sum_{i = 1}^{n} \frac{a_{i}}{| b_{1} | b_{1}^{i}} {(x - b_{0})}^{i} .

(18)

However, the linear transformation changes the support,

(l, u)

of

p_{n} (x)

, to

(b_{1} l + b_{0}, b_{1} u + b_{0})

, if

b_{1} > 0

, and

(b_{1} u + b_{0}, b_{1} l + b_{0})

, if

b_{1} < 0

.

Another example of a non-linear transformation with memory that preserves polynomial form of the resulting distribution is an integrator. In particular, let,

g^{- 1} (x) = \int_{- \infty}^{x} f (u) d u \equiv F (x)

, i.e.,

g (x) = F^{- 1} (x)

, such that,

f (u) \geq 0

, for

\forall u

. Then, substituting into (16), the transformed PDF can be written as

q_{m} (x) = b_{1} f (x) p_{n} (b_{1} F (x) + b_{0})

(19)

where

b_{1} \neq 0

and

b_{0}

are arbitrary real constants. Provided that

f (x)

is a polynomial of order, k,

F (x)

is a polynomial of order,

(k + 1)

(cf. Appendix A) and, thus,

m = n (k + 1) k

. The family of PDFs with a form similar to (19) are considered in [36], which could be investigated also for our case of the polynomial distributions.

Consider now a general case of a polynomial nonlinear transformation,

g_{k} (x)

, and denote as

x_{i}

,

i = 1, 2, \dots, N (y)

, all the roots of,

g_{k} (x) = y

. Then, the PDF (16) is rewritten as

q_{m} (y) = \sum_{j = 1}^{N (y)} \frac{p_{n} (x_{j} (y))}{| \dot{g_{k}} (x_{j} (y)) |}

(20)

i.e., it is a sum of ratios of polynomials, i.e.,

q_{m} (y)

is a polynomial of a certain order, m.

Linear and non-linear transformations of a random variable can be used to change the interval of support of its probability distribution. The following Lemma 2 assumes linear transformations to convert the interval of support,

(l, u)

, into

(- 1, + 1)

and vice versa. Lemma 3 proposes two transformations on how to convert the interval of support,

(- 1, + 1)

, to semi-finite or infinite intervals of support, respectively.

Lemma 2.

The interval of support,

(l, u)

, of a PDF,

p (x)

, is changed to the interval,

(- 1, + 1)

, by a linear transformation,

\frac{2}{u - l} x - \frac{u + l}{u - l}

, which transforms the PDF,

p (x)

, to the PDF,

\frac{u - l}{2} p (\frac{(u - l) x - (u + l)}{2})

. Furthermore, the linear transformation,

\frac{(u - l)}{2} x + \frac{u + l}{2}

, transforms the PDF,

p (x)

, with support,

(- 1, + 1)

, into the PDF,

\frac{2}{u - l} p (\frac{2 x - (u + l)}{u - l})

, with the interval of support,

(l, u)

.

Proof.

The linear transformation,

A x + B

, transforms any PDF,

p (x)

, into the PDF,

{| A |}^{- 1} p (\frac{x - B}{A})

, [35]. □

Lemma 3.

The PDF,

p (x)

, defined over the finite interval of support,

(- 1, + 1)

, can be transformed into the PDF,

{(x^{2} + x + 1 / 4)}^{- 1} p (\frac{2 x - 1}{2 x + 1})

, with semi-infinite support,

(0, + \infty)

, using the non-linear transformation,

\frac{1}{2} (\frac{1 + x}{1 - x})

. Similarly, the non-linear transformation,

atanh (x)

, can be assumed to extend the support to all real numbers, for a PDF,

p (x)

, defined over the support interval,

(- 1, + 1)

. The transformed PDF becomes

\cosh^{- 2} (x) p (\tanh (x))

.

Proof.

The functions,

g (x) = \frac{1}{2} (\frac{1 + x}{1 - x})

, and,

g (x) = atanh (x)

, are increasing, i.e., invertible in the interval,

(- 1, + 1)

. Then, in (20),

N (y) = 1

, the inverse transforms,

\frac{2 x - 1}{2 x + 1}

, and,

\tanh (x)

, and their derivatives,

{(x^{2} + x + 1 / 4)}^{- 1}

, and,

\cosh^{- 2} (x)

, respectively. □

Furthermore, there are scenarios when deterministic values,

x \in (l, u)

, are transformed by a polynomial function,

p_{n} (x)

, having the random coefficients,

a_{i}

. The output value of such a transformation,

y = p_{n} (x) = \sum_{i = 0}^{n} a_{i} x^{i}

(21)

is a random variable. Provided that

a_{i}

are independent and distributed as

f_{a_{i}} (a_{i})

, and conditioned on x, the random variable Y is a sum of independent random variables, so its PDF is given by a multi-fold convolution,

f_{y | x} (y | x) = (\prod_{i = 0}^{n} | x^{- i} |) f_{a_{0}} (y) ⊛ f_{a_{1}} (y / x) ⊛ \dots ⊛ f_{a_{n}} (y / x^{n})

(22)

since,

a_{i} x^{i}

, are distributed as

| x^{- i} | f_{a_{i}} (y / x^{i})

. Furthermore, the conditional mean and variance of Y, respectively, are,

E [Y | X] = \sum_{i = 0}^{n} E [a_{i}] x^{i}, and, var [Y | X] = \sum_{i = 0}^{n} var [a_{i}] x^{2 i} .

(23)

Provided that

a_{i}

have equal means, then for any n and

x \in (- 1, + 1)

,

E [Y | X] > 0

, if

E [a_{i}] > 0

, and

E [Y | X] < 0

, if

E [a_{i}] < 0

. Note also that, if

a_{i}

have equal variances, then the variance of Y is minimized for

x = 0

, and, it is equal to,

var [Y | X = 0] = var [a_{0}]

.

The bounds for the number of real roots of random but sparse polynomials were provided in [37]. A numerical method for efficiently finding the zeros of complex-valued polynomials of very large orders was developed in [38]. Another method for a rapid root finding of polynomials is presented in [39].

2.2. Polynomial PDF Fit of a Histogram

Approximating a continuous function by a polynomial over a finite interval is formalized by the well-known Weierstrass theorem [1]. Runge’s phenomenon occurs when the approximating function must contain predefined points [3]. The polynomial approximation represents the problems of existence as well as the uniqueness of such a polynomial, and also the problem of how to find it. These problems are crucially dependent on the choice of metric for the goodness of approximation.

Hence, consider the problem of approximating a PDF having the finite support by a polynomial PDF. For instance, the empirical histogram can be fitted by a polynomial function, or a known PDF can be approximated by a polynomial in order to achieve mathematical tractability. However, in neither of these cases, the resulting polynomial is guaranteed to satisfy conditions (1), since the polynomial coefficients are normally chosen to obtain the best fit.

The polynomial PDF can be obtained by assuming a polynomial function, which is non-negative over a given interval for any values of its coefficients. One example of such a polynomial is,

p_{n}^{2} (x)

, which has degree,

2 n

. The true PDF,

q (x)

, can be then approximated as

q (x) \approx p_{n}^{2} (x), or, \sqrt{q (x)} \approx p_{n} (x) .

(24)

The latter strategy by first transforming

q (x)

with a square root is numerically more stable. Other invertible transformations of

q (x)

can also be assumed, provided that they yield the non-negative polynomial,

p_{n} (x)

, since scaling

p_{n} (x)

to have a unit area usually does not affect the approximation error significantly.

For instance, the data points,

(x_{i}, \sqrt{y_{i}})

,

l \leq x_{i} \leq u

,

i = 1, 2, \dots, M

, can be interpolated by Lagrange polynomials [5],

L_{i} (x) = \prod_{\binom{j = 1}{i \neq j}}^{M} \frac{x - x_{i}}{x_{i} - x_{j}} .

(25)

Then, the true PDF,

p (x)

, is approximated as

p (x) \approx {(\sum_{i = 1}^{M} \sqrt{y_{i}} L_{i} (x))}^{2} \equiv q_{2 (M - 1)} (x)

(26)

which is a polynomial of order,

2 (M - 1)

. In order to normalize the approximation (26), let,

c_{i j} = \prod_{\binom{j_{1} = 1}{j_{1} \neq i}}^{M} \prod_{\binom{j_{2} = 1}{j_{2} \neq j}}^{M} (x_{i} - x_{j_{1}}) (x_{j} - x_{j_{2}})

(27)

and,

\begin{matrix} s_{i j} = \int_{l}^{u} L_{i} (x) L_{j} (x) d x & = c_{i j}^{- 1} \int_{l}^{u} \prod_{\binom{j_{1} = 1}{j_{1} \neq i}}^{M} \prod_{\binom{j_{2} = 1}{j_{2} \neq j}}^{M} (x - x_{j_{1}}) (x - x_{j_{2}}) d x \\ = c_{i j}^{- 1} \sum_{k = 0}^{2 (M - 1)} a_{k} \int_{l}^{u} x^{k} d x = c_{i j}^{- 1} \sum_{k = 0}^{2 (M - 1)} \frac{a_{k}}{k + 1} (u^{k + 1} - l^{k + 1}) . \end{matrix}

(28)

Then, the area is calculated as

\int_{l}^{u} q_{2 (M - 1)} (x) d x = \sum_{i = 1}^{M} \sum_{j = 1}^{M} \sqrt{y_{i} y_{j}} s_{i j} .

(29)

The Lagrange polynomials satisfy,

q_{2 (M - 1)} (x_{i}) = y_{i}

, so they are subject to Runge’s phenomenon [3]. In particular, the approximation error can be estimated as [3],

p (x) - q_{2 (M - 1)} (x) = \frac{p^{(M + 1)} (\tilde{x})}{(M + 1)!} \prod_{i = 1}^{M} (x - x_{i})

(30)

where

l \leq \tilde{x} \leq u

. Provided that

| p^{(M + 1)} (x) | < \infty

, for

\forall M

, and,

\forall x \in (l, u)

,

q_{2 (M - 1)} (x)

converges uniformly to

p (x)

, i.e.,

{sup}_{x} | p (x) - q_{2 (M - 1)} (x) | = 0

, as

M \to \infty

. The PDF,

p (x)

, is often only known as a sequence of sample points,

(x_{i}, y_{i})

, so the derivatives,

p^{(M + 1)} (x)

, cannot be determined. However, the approximation can be improved by using Chebyshev or extended Chebyshev points instead of equally spaced points [4].

The most common method for fitting a polynomial to a histogram is a linear regression [40]. Denote the vectors,

y = [y_{i}]

,

i = 1, 2, \dots, M

, and,

a = [a_{j}]

,

j = 0, 1, \dots, n

, and the matrix,

X = [x_{i}^{j}]

. The constrained least squares (LS) problem is then formulated as

min_{a} {∥y - X a∥}^{2}, s.t. w^{T} a = 1

(31)

where the weights,

w_{i} = \frac{1}{i + 1} (u^{i + 1} - l^{i + 1})

, assuming the support interval,

(l, u)

. The first derivative of the corresponding Lagrangian is set equal to zero, and the estimated coefficients,

\hat{a}

, of the fitting polynomial,

p_{n} (x)

, are computed as [41],

\begin{matrix} \frac{d}{d a} L (λ) & = 2 X^{T} X a - 2 X^{T} y + λ w^{T} \overset{!}{=} 0 \\ \Rightarrow \hat{a} & = {(X^{T} X)}^{- 1} (X^{T} y + \frac{λ}{2} w^{T}) . \end{matrix}

(32)

In order to approximate a known continuous distribution,

f (x)

, over a finite interval,

(l, u)

, representing the full or truncated support of that distribution, the constrained least-squares (32) can be again used assuming the distribution samples,

f (l + Δ_{x} i)

,

i = 0, 1, \dots

.

If

p_{n} (x)

is the best polynomial fit of a histogram, or of a sampled known PDF, then it must be evaluated whether it is non-negative over the whole support of interest,

(l, u)

, for example, due to Runge’s phenomenon as discussed in Section 2.2. This can be readily and reliably tested by numerically computing the integral,

I_{1} = Im (\int_{l}^{u} \sqrt{p_{n} (x)} d x)

, or,

I_{2} = \int_{l}^{u} p_{n} (x) - | p_{n} (x) | d x

. If

p_{n} (x)

contains negative values within the interval,

(l, u)

, then

I_{1} \neq 0

, and

I_{2} < 0

, respectively. It is also possible to assume a logarithm instead of the square root in the definition of the integral,

I_{1}

.

In this case, the polynomial fitted to a histogram contains negative values, a constant,

d > 0

, can be added to the observed data points, i.e.,

y_{i} \mapsto \frac{y_{i} + d / Δ_{x}}{1 + M d}

, where

Δ_{x} = x_{i + 1} - x_{i}

, and the scaling ensures that,

Δ_{x} \sum_{i = 1}^{M} y_{i} = 1

. Correspondingly, the fitted polynomial is also shifted and scaled as

p_{n} (x) \mapsto \frac{p_{n} (x) + d / Δ_{x}}{1 + d M}

, so

\int_{l}^{u} p_{n} (x) d x = 1

.

Furthermore, the roots of a polynomial,

p_{n} (x)

, can be constrained in order to guarantee that it is non-negative over a finite interval,

(l, u)

. This is formulated in the following theorem.

Theorem 1.

A Form II real-valued polynomial,

p_{n} (x)

, of order n with

a_{n} > 0

and the roots,

r_{1} \leq r_{2} \leq \dots \leq r_{n}

, is non-negative over the interval,

(l, u)

, provided that all its roots satisfy at least one of the following conditions:

(a): a root has even multiplicity;
(b): a root has a complex conjugate pair;
(c): a (real-valued) root is smaller than l;
(d): a real-valued root has odd multiplicity and is larger than u; the number of such roots must be even.

Proof.

Form II polynomial is a product of linear functions,

(x - r_{i})

. Cases (a), (b), and (c) are trivial. Case (d) is a combinatorial problem. The roots with odd multiplicity cannot be smaller than u. Even if these roots are all larger than u, then their number must be even in order for their negative parts to cancel out for all values smaller than u. □

Corollary 1.

A Form II real-valued polynomial,

p_{n} (x)

, of order n with

a_{n} > 0

and the roots,

r_{1} \leq r_{2} \leq \dots \leq r_{n}

, has negative values in the interval,

(l, u)

, provided that there is an odd-number of real-valued roots with odd multiplicities that are greater than l, or, there is an even number of real-valued roots with odd multiplicities, and at least one root is located between l and u.

Theorem 1 can also be used for Form I polynomials, provided that they are converted to Form II as indicated in Appendix B. Even though the roots cannot be obtained analytically for polynomials of order

n > 4

(Abel-–Ruffini’s theorem), it may be sometimes possible to consider a product,

\prod_{j} p_{n_{j}} (x)

, of polynomials of orders,

n_{j} \leq 4

, for

\forall j

.

2.3. Piecewise Polynomial PDF

In some applications, a piecewise polynomial curve fitting can be assumed. In particular, the following construction is proposed to fit a set of

(M + 1)

points,

(x_{i}, y_{i})

,

i = 1, \dots, (M + 1)

,

x_{i} < x_{i + 1}

, and

y_{i} \geq 0

, representing local extrema of a histogram, or of a known PDF. The construction yields a piecewise polynomial PDF,

p_{n} (x)

, of the same order n, over the interval,

(l, u)

, with

l = x_{1}

and

u = x_{M + 1}

, such that, exactly,

p_{n} (x_{i}) = y_{i}

. The data points,

(x_{i}, y_{i})

, are referred to as control points of the piecewise polynomial

p_{n}

. It should be noted that univariate piecewise polynomial functions are generally referred to as splines, and their control points as knots.

The following construction defines a piecewise polynomial PDF by sample points representing an alternating sequence of local maxima and minima. The points between the subsequent extrema are then interpolated by increasing or decreasing polynomial segments with a defined continuity order between these segments. As long as the minima are non-negative, all segments are non-negative. However, the resulting curve must be normalized, so that it has a unit area, i.e., it is a PDF.

Definition 2.

Let

p_{n} (x)

be piecewise continuous, and composed of M non-overlapping polynomial segments,

q_{(i) n} (x)

, i.e.,

p_{n} (x) = A \sum_{i = 1}^{M} w_{i} q_{(i) n} (x) .

(33)

The segments,

q_{(i) n} (x)

, are increasing, i.e.,

q_{(i) n}^{'} (x) > 0

, over their support intervals,

(x_{i}, x_{i + 1})

. The points,

x_{i}

, define the local minima and maxima, such that, if

y_{i}

is a local minimum, then

y_{i + 1}

is a local maximum and vice versa. The weights,

w_{i} = + 1

, if

y_{i}

is a local minimum, and

w_{i} = - 1

, if

y_{i}

is a local maximum. The constant,

A > 0

, is chosen, so that,

p_{n} (x)

, integrates to unity over its interval of support,

(x_{1}, x_{M + 1})

. In addition, a continuity (smoothness) of order C requires that the first C derivatives,

lim_{ϵ \to 0^{+}} p_{n}^{(k)} (x + ϵ) = lim_{ϵ \to 0^{+}} p_{n}^{(k)} (x - ϵ), \forall x \in (x_{1}, x_{M + 1}), a n d, \forall k = 0, 1, \dots, C

(34)

which needs to be true at all points of the local minima and maxima, i.e.,

lim_{ϵ \to 0^{+}} q_{(i) n}^{(k)} (x_{i + 1} - ϵ) = lim_{ϵ \to 0^{+}} q_{(i + 1) n}^{(k)} (x_{i + 1} + ϵ), i = 1, \dots, M - 1 .

(35)

In order to construct the desired segment polynomials,

q_{(i) n} (x)

, consider two increasing polynomials,

u_{n} (x) = \sum_{i = 0}^{n} a_{i} x^{i}

, and,

v_{n} (x) = \sum_{i = 0}^{m} b_{i} x^{i}

, such that, for some

x_{0}

, the derivatives,

\begin{matrix} u_{n}^{(k)} (x_{0}) & = - v_{n}^{(k)} (x_{0}), k = 0, 1, \dots, C \\ \sum_{i = k}^{n} a_{i} x_{0}^{i - k} & = - \sum_{i = k}^{m} b_{i} x_{0}^{i - k} \end{matrix}

(36)

or, in matrix notation,

\underset{X_{C} (x_{0})}{\underset{︸}{[\begin{matrix} x_{0}^{n} & x_{0}^{n - 1} & \dots & x_{0} & 1 \\ x_{0}^{n - 1} & x_{0}^{n - 2} & \dots & 1 & 0 \\ ⋱ & ⋱ \\ x_{0}^{n - C} & \dots & 1 & 0 & 0 \end{matrix}]}} \cdot \underset{a}{\underset{︸}{[\begin{matrix} a_{n} \\ a_{n - 1} \\ ⋮ \\ a_{0} \end{matrix}]}} = - [\begin{matrix} x_{0}^{m} & x_{0}^{m - 1} & \dots & x_{0} & 1 \\ x_{0}^{m - 1} & x_{0}^{m - 2} & \dots & 1 & 0 \\ ⋱ & ⋱ \\ x_{0}^{m - C} & \dots & 1 & 0 & 0 \end{matrix}] \cdot \underset{b}{\underset{︸}{[\begin{matrix} b_{m} \\ b_{m - 1} \\ ⋮ \\ b_{0} \end{matrix}]}} .

(37)

For

m = n

, Equation (37) can be rewritten as

X_{C} (x_{0}) (a + b) = 0

(38)

so the coefficients

a

and

b

are in the null space of

X_{C} (x_{0})

.

Provided that

a_{(i)}

denotes the coefficients of

q_{(i) n} (x) = \sum_{i = 0}^{n} a_{i} x^{i}

, it is required that,

\begin{matrix} X_{C} (x_{2}) (a_{(1)} + a_{(2)}) & = 0 \\ X_{C} (x_{3}) (a_{(2)} + a_{(3)}) & = 0 \\ ⋮ \\ X_{C} (x_{M}) (a_{(M - 1)} + a_{(M)}) & = 0 . \end{matrix}

(39)

Note that the matrices,

X_{C} (x_{i})

, are computed assuming the control points,

x_{i}

.

Given the first vector of coefficients,

a_{(1)}

, the other coefficient vectors,

a_{(i)}

,

i = 2, 3, \dots, M - 1

, can be computed iteratively using the underdetermined sets of Equation (39). The numerical feasibility of this problem requires that the order,

n ≫ C

.

The vector,

a_{(1)}

, must be selected, so that

q_{(1) n} (x_{1}) = y_{1}

, and,

q_{(1) n} (x_{M + 1}) = y_{M + 1}

, and

q_{(1) n} (x) > 0

is C-continuous for

x \in (x_{1}, x_{2})

. Let sample

q_{(1) n} (x)

at K equidistant points between

x_{1}

and

x_{2}

. The coefficients

a_{(1)}

are then the solution of the quadratic program,

\begin{matrix} min 〈a_{(1)}, a_{(1)}〉 \\ s . t . 〈X_{0} (x_{1}), a_{(1)}〉 = y_{1}, 〈X_{0} (x_{2}), a_{(1)}〉 = y_{2} \\ 〈w_{1} X_{0} Δ_{1}, a_{(1)}〉 > 0 \end{matrix}

(40)

where

X_{0} (x) = [x^{n}, x^{n - 1}, \dots, x, 1]

, and

Δ_{1} = (x_{2} - x_{1}) / (K - 1)

is the equidistant sampling step. The last condition in (40) enforces

q_{(1) n} (x)

to be approximately increasing between the points

x_{1}

and

x_{2}

. The other coefficients,

a_{(i)}

,

i > 1

, are computed similarly to the program (40), but with an additional constraint due to (39). The extended quadratic program to compute these coefficients is defined as

\begin{matrix} min 〈a_{(i)}, a_{(i)}〉 \\ s . t . 〈X_{0} (x_{i}), a_{(i)}〉 = y_{i}, 〈X_{0} (x_{i + 1}), a_{(i)}〉 = y_{i + 1} \\ 〈w_{i} X_{0} Δ_{i}, a_{(i)}〉 > 0 \\ X_{C} (x_{i}) (a_{(i - 1)} + a_{(i)}) = 0 \end{matrix}

(41)

where

Δ_{i} = (x_{i + 1} - x_{i}) / (K - 1)

, and

i = 2, 3, \dots, M

.

More importantly, quadratic programs (40) and (41) require that the constraints are sufficiently underdetermined, i.e.,

n ≫ C

, otherwise, the solution may be difficult to find, or even may not exist. Moreover, the solution is less numerically stable for a linear program than for a quadratic program and, therefore, the quadratic programs should be considered.

3. Derived Characteristics of a Polynomial Distribution

The cumulative distribution function (CDF) can be readily obtained for Form I polynomial PDF as shown in Appendix A, i.e.,

\begin{matrix} P_{n} (x) & = \int_{l}^{x} p_{n} (v) d v = \sum_{i = 0}^{n} \frac{a_{i}}{i + 1} (x^{i + 1} - l^{i + 1}), x \in (l, u) \\ = x \sum_{i = 0}^{n} \frac{a_{i}}{i + 1} x^{i} - l \sum_{i = 0}^{n} \frac{a_{i}}{i + 1} l^{i} = x {\tilde{p}}_{n} (x) - l {\tilde{p}}_{n} (l) . \end{matrix}

(42)

Note also that, for symmetric support interval, when

u = - l

,

P_{n} (u) = \sum_{i = 0}^{n} \frac{a_{i}}{i + 1} (u^{i + 1} - {(- u)}^{i + 1})

, so the normalization of PDF to unity is only affected by even-index coefficients,

a_{i}

.

In the case of a Form II polynomial PDF, it is best to convert it to Form I first as shown in Appendix B.

The median (

q = 1 / 2

), and more generally, the quantile,

X_{q}

, of a polynomial distribution is defined as

P_{n} (X_{q}) = X_{q} {\tilde{p}}_{n} (X_{q}) - l {\tilde{p}}_{n} (l) \overset{!}{=} q, 0 < q < 1 .

(43)

Denoting,

P_{0} (q) = l {\tilde{p}}_{n} (l) + q

, the quantile is the unique root of the polynomial,

x {\tilde{p}}_{n} (x) - P_{0} (q) = a (q) (x - X_{q})

.

The expressions for general moments and characteristic or moment-generating functions are derived in Appendix A, Appendix B and Appendix C, respectively.

The Kullback–Leibler (KL) divergence or relative entropy between two polynomial distributions,

p_{n} (x) = \sum_{i = 0}^{n} a_{i} x^{i} = a_{n} \prod_{i = 1}^{n} (x - r_{i})

, and,

q_{m} (x) = b_{m} \prod_{j = 1}^{m} (x - s_{j})

, is defined as

\begin{matrix} KL (p_{n} ∥ q_{m}) & = \int_{l}^{u} p_{n} (x) log \frac{p_{n} (x)}{q_{m} (x)} d x = \int_{l}^{u} \sum_{i = 0}^{n} a_{i} x^{i} log \frac{a_{n} \prod_{l = 1}^{n} (x - r_{l})}{b_{m} \prod_{j = 1}^{m} (x - s_{j})} d x \\ = \sum_{i = 0}^{n} a_{i} \int_{l}^{u} x^{i} (log \frac{a_{n}}{b_{m}} + \sum_{l = 1}^{n} log (x - r_{l}) - \sum_{j = 1}^{m} log (x - s_{j})) d x \\ = \sum_{i = 0}^{n} a_{i} (\frac{u^{i + 1} - l^{i + 1}}{i + 1} log \frac{a_{n}}{b_{m}} + \sum_{l = 1}^{n} \int_{l}^{u} x^{i} log (x - r_{l}) d x - \sum_{j = 1}^{m} \int_{l}^{u} x^{i} log (x - s_{j}) d x) . \end{matrix}

(44)

The inner integral,

\int_{l}^{u} x^{i} log (x - r_{l}) d x

, can be expressed in terms of the hypergeometric,

_{2} F_{1}

, functions with the help of, for example, Mathematica software.

Differential entropy of a polynomial distribution,

p_{n} (x)

, is defined as

\begin{matrix} H (p_{n}) & = - \int_{l}^{u} p_{n} (x) log p_{n} (x) d x = - \int_{l}^{u} \sum_{i = 0}^{n} a_{i} x^{i} log a_{n} \prod_{i = 1}^{n} (x - r_{i}) d x \\ = \sum_{i = 0}^{n} a_{i} (\frac{log a_{n}}{i + 1} (u^{i + 1} - l^{i + 1}) + \sum_{l = 1}^{n} \int_{l}^{u} x^{i} log (x - r_{l}) d x) . \end{matrix}

(45)

Finally, the sum,

Z = X + Y

, of two independent random variables, X, and, Y, having the polynomial distributions,

p_{n} (x)

, and,

q_{m} (y)

, respectively, with the same interval of support,

(l, u)

, also has the polynomial distribution,

f_{m} (z)

, given by the convolution,

\begin{matrix} f_{m} (z) & = \int_{l}^{u} p_{n} (x) q_{m} (z - x) d x = \int_{l}^{u} \sum_{i = 0}^{n} a_{i} x^{i} \sum_{j = 0}^{m} b_{j} {(z - x)}^{j} d x \\ = \sum_{i = 0}^{n} \sum_{j = 0}^{m} a_{i} b_{j} \int_{l}^{u} x^{i} {(z - x)}^{j} d x = \sum_{i = 0}^{n} \sum_{j = 0}^{m} a_{i} b_{j} \sum_{k = 0}^{j} (\binom{j}{k}) z^{j - k} \int_{l}^{u} x^{i + k} d x \\ = \sum_{i = 0}^{n} \sum_{j = 0}^{m} a_{i} b_{j} \sum_{k = 0}^{j} (\binom{j}{k}) z^{j - k} \frac{1}{i + k + 1} (u^{i + k + 1} - l^{i + k + 1}) \\ = \sum_{i = 0}^{n} \sum_{j = 0}^{m} \sum_{k = 0}^{j} (\binom{j}{k}) \frac{a_{i} b_{j}}{i + k + 1} (u^{i + k + 1} - l^{i + k + 1}) z^{j - k} . \end{matrix}

(46)

4. Estimation Problems Involving Polynomial Distributions

The parameter estimation of the polynomial distributions is subject to the following equality and inequality constraints, respectively,

\begin{matrix} a^{T} X_{0} - 1 & = 0 \\ - a^{T} X_{k} & \leq 0, k = 1, 2, \dots, K \end{matrix}

(47)

where the column vectors,

X_{0} = [(u^{i + 1} - l^{i + 1}) / (i + 1)]

, and,

X_{k} = [{(l + (k - 1) (u - l) / (K - 1))}^{i}]

,

i = 0, 1, \dots, n

. The equality constraint guarantees that the estimated polynomial integrates to unity. The second constraint requires that the estimated polynomial is non-negative at K equidistant points within the interval of support,

(l, u)

. The value of K must be determined empirically.

Consider the problem of estimating the coefficients,

a

, of a polynomial PDF,

p_{n} (x; a)

. For M independent measurements,

x_{m}

, the likelihood function is,

L (x; a) = \prod_{m = 1}^{M} p_{n} (x_{m}; a) = \prod_{m = 1}^{M} \sum_{i = 0}^{n} a_{i} x_{m}^{i} = \prod_{m = 1}^{M} a^{T} x_{m}

(48)

where the column vector,

x_{m} = [x_{m}^{i}]

,

i = 0, 1, \dots, n

. The Karush–-Kuhn-–Tucker (KKT) function representing the constrained maximum (log-) likelihood (ML) estimation is [42],

K (a; x; {μ_{k}}) = log L (x; a) + μ_{0} (a^{T} X_{0} - 1) + \sum_{k = 1}^{K} μ_{k} a^{T} X_{k}

(49)

where

μ_{k} \leq 0

, and,

μ_{k} a^{T} X_{k} = 0

, for

\forall k \geq 1

. The first vector-derivative of

K

by

a

[41] must be equal to zero, i.e.,

\frac{\partial}{\partial a} K (a; x; {μ_{k}}) \overset{!}{=} 0 \Leftrightarrow \frac{\partial}{\partial a} \sum_{m = 1}^{M} log (a^{T} x_{m}) = \sum_{m = 1}^{M} \frac{x_{m}^{T}}{a^{T} x_{m}} = - \sum_{k = 0}^{K} μ_{k} X_{k}^{T} .

(50)

Expression (50) together with constraint (47) represent the set of

(n + K + 2)

non-linear equations with the same number of unknowns, which must be solved numerically.

The cosine theorem can be assumed instead of logarithm in maximizing (49) and (50), since,

\underset{a}{argmax} \prod_{m = 1}^{M} a^{T} x_{m} = \underset{a}{argmax} \prod_{m = 1}^{M} \frac{a^{T}}{∥a∥} \frac{x_{m}}{∥x_{m}∥} = \underset{a}{argmax} \prod_{m = 1}^{M} cos ϕ_{m}

(51)

where

ϕ_{m}

is the angle between the vectors

a

and

x_{m}

. The unconstrained maximization of (51) can be performed using the geometric-arithmetic mean inequality [43]. Specifically, the likelihood (51) is maximized when the coefficient vector,

a

, is aligned with all the observations,

x_{m}

. It can be approximated by assuming that the distances between the normalized vectors,

a^{T} / ∥a∥

, and,

x_{m} / ∥x_{m}∥

, are constant, for

\forall m

. Then, the estimate is the centroid of observations, i.e.,

\hat{a} = \frac{1}{M} \sum_{m = 1}^{M} x_{m} / ∥x_{m}∥

.

The estimator complexity can be greatly reduced if constraint (47) is ignored. In this case, the estimated coefficients,

\hat{a}

, may not satisfy the PDF conditions (1). Assuming that the estimate,

\hat{a}

, is not too far from the true vector,

a

, the first estimate can be improved by the subsequent estimator,

\hat{\hat{a}} = \underset{a}{argmin} ∥a - \hat{a}∥ s . t . constraints (47) .

(52)

The estimator (52) minimizes the distance between the first estimate,

\hat{a}

, and the subsequent estimate,

\hat{\hat{a}}

, under constraint (47). More importantly, the corresponding set of equations to solve (52) is now linear.

In the sequel, we drop the inequality constraints in (47) since closed-form expressions can be obtained with the equality constraint. For

M = 2

, or, equivalently, only two out of M measurements are considered at a time, the ML estimator can be constrained as in (31), i.e.,

{\hat{a}}_{12} = \underset{a}{argmax} a^{T} X_{12} a, s.t. w^{T} a = 1

(53)

where

x_{i} = [x_{i}^{j}]

,

j = 0, 1, \dots, n

, and

X_{12} = x_{1} x_{2}^{T}

is a

(n + 1) \times (n + 1)

square matrix. The first derivative of the corresponding Lagrangian must be equal to zero, i.e., [41]

\begin{matrix} \frac{\partial}{\partial a} (a^{T} X_{12} a + λ (w^{T} a - 1)) \overset{!}{=} 0 \\ \Rightarrow a = - \frac{λ}{2} X_{12}^{- 1} w, λ = \frac{- 2}{w^{T} X_{12}^{- 1} w} . \end{matrix}

(54)

Consequently, the ML estimate is,

{\hat{a}}_{12} = \frac{X_{12}^{- 1} w}{w^{T} X_{12}^{- 1} w}

(55)

and its likelihood is equal to,

{(w^{T} X_{12}^{- 1} w)}^{- 1}

. Finally, the observation pairs,

(x_{1}, x_{2})

,

(x_{3}, x_{4})

, …, are independent, so the final ML estimate is,

\hat{a} = \frac{2}{M} \sum_{i = 1}^{M / 2} \frac{X_{2 i - 1, 2 i}^{- 1} w}{w^{T} X_{2 i - 1, 2 i}^{- 1} w} .

(56)

The Cramer–Rao bound has been defined to lower-bound the covariance matrix of the estimation error,

\hat{a} - a

, of any unbiased estimator, i.e.,

cov [\hat{a} - a] \overset{E [\hat{a}] = a}{=} var [\hat{a}] \geq J^{- 1} (a)

(57)

where

J (a)

is the Fisher information matrix. In order to calculate the elements of this matrix, it is better to assume Form II of the polynomial distribution,

p_{n} (x; r) = a_{n} \prod_{i = 1}^{n} (x - r_{i})

, and the problem of estimating the parameters,

r

, i.e.,

\begin{matrix} {[J]}_{i j} & = E [(\frac{\partial}{\partial r_{i}} log p_{n} (x; r)) (\frac{\partial}{\partial r_{j}} log p_{n} (x; r))] \\ = E [(\frac{\partial}{\partial r_{i}} log (x - r_{i})) (\frac{\partial}{\partial r_{j}} log (x - r_{j}))] \\ = E [\frac{1}{(x - r_{i})} \frac{1}{(x - r_{j})}] = \int_{l}^{u} a_{n} \prod_{\begin{matrix} k = 1 \\ k \neq i, j \end{matrix}}^{n} (x - r_{k}) d x . \end{matrix}

(58)

The last integral in (58) can be computed by converting the Form II polynomial into Form I.

The coefficients,

a

, can also be estimated by the method of moments [7,44]. In particular, the k-th general moment of a polynomial distribution,

p_{n} (x)

, is, (cf. Appendix A)

M_{k} = \int_{l}^{u} x^{k} \sum_{i = 0}^{n} a_{i} x^{i} d x = \sum_{i = 0}^{n} \frac{a_{i}}{i + k + 1} (u^{i + k + 1} - l^{i + k + 1}) .

(59)

Observing a vector of the first K general moments,

M = [M_{k}]

,

k = 1, 2, \dots, K

, and pre-computing the matrix,

B = [{(i + k + 1)}^{- 1} (u^{i + k + 1} - l^{i + k + 1})]

,

i = 0, 1, \dots, n

, and ignoring non-negativity constraint in (47), the estimation can be again defined as the constrained or unconstrained least-square regression, i.e.,

\hat{a} = \underset{a}{argmin} ∥M - B a∥, s.t. w^{T} a = 1

(60)

which can be efficiently solved as in (32).

5. Numerical Examples

This section briefly explores the numerical properties of the polynomial distributions by means of several examples, since the practical approximations of functions is a rather extensive subject [45]. The examples are presented in Section 5.1, Section 5.2, Section 5.3 and Section 5.4, including constructing a piecewise polynomial PDF, generating samples of a polynomially distributed random variable, estimating the parameters of a polynomial PDF from observed random samples, and approximating a given PDF by a polynomial PDF.

5.1. Constructing Piecewise Polynomial PDF

The first example in Figure 1 demonstrates the construction of a piecewise polynomial PDF as described in Section 2.3. Given the continuity order

C = 2

, the given set of 5 control points can be connected by 4 increasing or decreasing polynomial functions of order

n = 5

or

n = 6

. For polynomial orders smaller than 5 or larger than about 8, it was observed that Runge’s phenomenon starts to appear, which indicates that the polynomial order cannot be chosen completely arbitrarily.

5.2. Generating Random Samples Using Polynomial PDF

Consider now the methods how to generate polynomially distributed random samples. Similar to most other distributions, polynomial distributions cannot be easily inverted. It is then challenging to use the inverse method for generating random variables. On the other hand, the CDF of a polynomially distributed random variable is another polynomial as shown in Appendix A. A CDF discretization can be used as a general strategy for implementing the inverse method of generating random samples from the distribution with a known CDF. In particular, consider approximating the CDF by a piecewise linear function between the samples,

(x_{i}, F (x_{i}))

,

i = 1, 2, \dots

. The inverse value,

X = F^{- 1} (U)

, where

U \in (0, 1)

is a uniformly distributed random variable, is then approximated by a piecewise linear function as

x = x_{i} + \frac{(x_{i + 1} - x_{i}) (u - F (x_{i}))}{F (x_{i + 1} - F (x_{i}))} .

(63)

Moreover, polynomial distributions can also be assumed to obtain a proposal distribution for the rejection and importance of sampling methods. Assuming the latter, denote the expected value,

θ = E_{q} [g (x)]

, assuming a random observation, X, from a complex distribution,

q (x)

. The mean can be computed by assuming instead the distribution,

p (x)

, as

θ = \int_{l}^{u} g (x) \frac{q (x)}{p (x)} p (x) d x = E_{p} [g (x) \frac{q (x)}{p (x)}] .

(64)

The corresponding difference in variances is,

{var}_{q} [g (x)] - {var}_{p} [g (x) \frac{q (x)}{p (x)}] = \int_{l}^{u} g^{2} (x) (1 - \frac{q (x)}{p (x)}) q (x) d x .

(65)

Consequently, the better

p (x)

approximates

g (x) q (x)

, the larger the variance reduction by sampling from

p (x)

instead of

q (x)

. It is easy to show that, if

p (x) = g (x) q (x) / θ

, then

{var}_{p} [g (x) \frac{q (x)}{p (x)}] = 0

. Consequently and importantly, it is likely that the flexibility of the polynomial PDF,

p_{n} (x)

, allows approximating

g (x) q (x)

with much better accuracy than any other canonical distribution.

Figure 2 assumes the piecewise polynomial distribution,

p_{5} (x)

, designed in the example in Figure 1 in order to generate samples from the target distribution,

q (x)

. The distribution,

q (x)

, is chosen to be a mixture of two truncated skewed-Gaussian distributions [46,47]. In particular, the rejection sampling method requires that,

q (x) \leq A p_{5} (x)

, for some

A > 0

and

\forall x

. Note that the third and the fifth (counting from left) control points were moved slightly above zero in Figure 2 (and already also in Figure 1) in order to satisfy the required inequality.

In the sequel, the examples assume the polynomial PDF,

p_{4} (x) = \frac{75}{896} (- \frac{x^{4}}{30} + \frac{x^{3}}{5} + \frac{x^{2}}{10} - \frac{26 x}{15} + 2), x \in (- 3, 5) .

(66)

The random samples from this distribution were generated by discretizing the support interval,

(- 3, 5)

, into 30 equally sized bins. The distribution and the empirical histogram for

10^{4}

generated samples are shown in Figure 3.

5.3. Estimating Parameters of Polynomial PDF

Estimating the coefficients of the polynomial PDF must be constrained in order to satisfy the PDF conditions (1). The numerical experiments revealed that ML methods described in Section 4 are generally less suitable for estimating longer vectors of parameters. The ML methods require a relatively large number of observations, which quickly leads to the accumulation of numerical errors. On the other hand, non-parametric methods, such as the method of moments and the LS fitting appear to be much more robust, and they work well, even if the number of observations is relatively small.

Figure 4 shows the absolute error between the PDF (66) and the PDF,

{\hat{p}}_{4} (x)

, estimated using the method of moments with 500 data samples. It can be observed that at least the first

K = 5

moments are required to achieve a good estimation accuracy.

Figure 5 shows the distributions of the estimated coefficients for the PDF (66) using a constrained LS fit of the histogram with 10 bins and 50 data samples. The little triangles at the sides of the boxes in Figure 5 indicate the true values of the coefficients. Interestingly, the smallest accuracy in terms of the largest variance occurs for the lower-order coefficients,

a_{1}

, and,

a_{0}

. If the polynomial order, n, is not known a priori, it can be determined by plotting the

GoF

measure (62) as shown in Figure 6. It can be observed that the value of

GoF

drops sharply when n is increased to 4, and then it remains nearly constant even if n is increased further.

5.4. Approximating PDF by Polynomial PDF

Even if the closed-form expression of a given PDF is available, the expression may be too complex to use in mathematical derivations [46,47]. In this case, a polynomial approximation of the PDF can be assumed. The approximation of a given PDF by a polynomial PDF is studied in Figure 7. In particular, Figure 7 compares the ordinary (unconstrained) LS fit (panel A), and the constrained LS fit (panel B) in order to satisfy the PDF constraints (1). It can be observed that assuming the constraints suppresses Runge’s phenomenon. However, in all cases considered, the polynomial PDF struggles to approximate the peak value of the given PDF, even when the polynomial order is increased.

Finally, the approximation of a given PDF by Lagrange polynomials defined in Section 2.2 is investigated in Figure 8. Recall that Lagrange approximation is an interpolating polynomial function having the order given by the number of sampling points. Specifically, the left panel (A) in Figure 8 assumes equidistant sampling points, whereas the right panel (B) in Figure 8 shows the approximations with Chebyshev sampling points [4]. It can be observed that the latter case is clearly more accurate including the peak of the given PDF. More importantly, for the PDF example in Figure 8, the polynomial PDF approximation consisting of Lagrange polynomials is much easier to compute than performing the constrained LS curve fitting or using other statistical estimation methods, such as the method of moments.

5.5. Summary of Observations

In general, the ML estimation of a vector of parameters of a polynomial distribution becomes numerically problematic when the polynomial orders are larger than only about 2 (and because numerical errors quickly accumulate with the number of observations). On the other hand, non-parametric and likelihood-free methods appear to be much more robust and numerically stable. For instance, the method of moments and the constrained LS fit can estimate the parameters of polynomial distributions efficiently, even if the number of observations is relatively small.

It is always useful to inspect the estimated polynomial coefficient values since some of those values can be near zero. The corresponding coefficients can then be rounded to zero, i.e., removed. Provided that these coefficients are the ones with the largest indices, the order of the estimated polynomial is reduced.

If the likelihood is not computed, the AIC cannot be used to determine the best polynomial order. In this paper, we propose defining the goodness of the polynomial fit as the squared Euclidean distance between the empirical and estimated general moments, which are penalized by the number of parameters to be estimated.

The Lagrange interpolation yields a polynomial function of the same order as the number of distribution samples. This method is usually less accurate when the samples are equidistant. Better accuracy can be achieved with Chebyshev or extended Chebyshev samples. Consequently, the bins of a histogram should be optimized rather than assumed with equal sizes in order to improve the interpolation accuracy.

In some cases, Runge’s phenomenon can be avoided or mitigated by extending the given finite support interval with zero samples since oscillations of the approximating function tend to concentrate near the approximation interval boundaries.

6. Conclusions

Polynomials are often used for approximating univariate and multivariate functions, including probability distributions in order to enable mathematical tractability and simulation efficiency. This paper defined polynomial distributions, which can also be used to approximate other canonical and empirically estimated distributions over a finite interval of support. Polynomial distributions can be considered as more flexible alternatives to commonly used canonical distributions. More importantly, in this paper, many key properties, as well as limitations of the polynomial distributions, were derived and presented as closed-form expressions.

There is a need for defining a family of distributions, such as polynomial distributions that can be used more universally for solving problems in probability, statistics, and data analysis. The polynomial distributions considered in this paper are univariate and continuous; the extension to multivariate and discrete polynomial distributions may be the subject of future work. Polynomials could be generalized as weighted linear sums of non-linear functions in the same variables. A number of research problems remain open. For example, given a polynomial, it would be useful to algebraically identify all sub-intervals where it is non-negative. Or, given a polynomial order and an interval of support, the task is to find all polynomials that represent a PDF. This problem can be further constrained by the desired number of modes, the smoothness and/or sparsity conditions, and assuming other statistical and algebraic properties of the polynomial distributions. Moreover, the problem of determining the minimum polynomial degree or sparsity to satisfy the given constraints was not fully considered in this paper. It would be very useful to investigate how to interpret polynomial distributions, especially as they may arise naturally when observing some stochastic phenomena.

Author Contributions

Conceptualization, P.L.; Formal analysis, Y.Y. and P.L.; Investigation, Y.Y. and P.L.; Supervision, P.L.; Project administration, P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a research grant from Zhejiang University.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Basic Properties of Form I Polynomials

Definition:

$p_{n} (x) = \sum_{i = 0}^{n} a_{i} x^{i}, a_{i} \in R, a_{n} \neq 0$

(A1)
Roots, $n = 1$ :

$p_{1} (x) = 0 \Leftrightarrow x_{1} = - \frac{a_{0}}{a_{1}}$

(A2)
Roots, $n = 2$ :

$p_{2} (x) = 0 \Leftrightarrow \begin{matrix} x_{1, 2} = \frac{- a_{1} \pm \sqrt{a_{1}^{2} - 4 a_{2} a_{0}}}{2 a_{2}} & a_{1}^{2} > 4 a_{2} a_{0} \\ x_{1} = x_{2} = - \frac{a_{1}}{2 a_{2}} & a_{1}^{2} = 4 a_{2} a_{0} \\ x_{1, 2} \notin R & a_{1}^{2} < 4 a_{2} a_{0} \end{matrix}$

(A3)
Roots, $n = 3$ :

$p_{3} (x) = 0 \Leftrightarrow \begin{matrix} x_{1} = \sqrt[3]{- \frac{b}{2} + \sqrt{\frac{b^{2}}{4} + \frac{a^{3}}{27}}} + \sqrt[3]{- \frac{b}{2} - \sqrt{\frac{b^{2}}{4} + \frac{a^{3}}{27}}}, x_{2, 3} \notin R & D < 0 \\ x_{1} = x_{2} = x_{3} = - \frac{a_{2}}{3 a_{3}} & D = 0, a_{2}^{2} = 3 a_{3} a_{1} \\ x_{1} = x_{2} = \frac{9 a_{3} a_{0} - a_{2} a_{1}}{2 (a_{2}^{2} - 3 a_{3} a_{1})}, x_{3} = \frac{4 a_{3} a_{2} a_{0} - 9 a_{3}^{2} a_{0} - a_{2}^{3}}{a_{3} (a_{2}^{2} - 3 a_{3} a_{1})} & D = 0, a_{2}^{2} \neq 3 a_{3} a_{1} \\ x_{k} = - \frac{1}{3 a} (b + ξ^{k - 1} C + \frac{Δ_{0}}{ξ^{k - 1} C}), k = 1, 2, 3 & D > 0 \end{matrix}$

(A4)

where

$a = \frac{- a_{2}}{3 a_{3}}, b = a^{3} + \frac{a_{2} a_{1} - 3 a_{3} a_{0}}{6 a_{3}^{2}}, D = \frac{4 (a_{2}^{2} - 3 a_{3} a_{1}) - (2 a_{2}^{3} - 9 a_{3} a_{2} a_{1} + 27 a_{3}^{2} a_{0}^{2})}{27 a_{3}^{2}}$

$Δ_{0} = a_{2}^{2} - 3 a_{3}, Δ_{1} = 2 a_{2}^{3} - 9 a_{3} a_{2} a_{1} + 27 a_{3}^{2} a_{0}, C = \sqrt[3]{\frac{Δ_{1} \pm \sqrt{Δ_{1}^{2} - 4 Δ_{0}^{3}}}{2}}, ξ = \frac{- 1 + \sqrt{- 3}}{2}$
Roots, general case:
- By Abel–-Ruffini’s theorem, closed-form expressions for roots of a polynomial exist for polynomials of degree at most $n = 4$ , and there is no algebraic solution for the polynomial roots for degree $n > 4$ .
- The total number of real roots of a polynomial within a given interval or overall real numbers can be determined by Sturm’s theorem.
- Other relationships between polynomial coefficients and roots can be obtained, such as the Vieta’s formulas:
  
  $\sum_{i_{1} \neq i_{2} \neq \dots \neq i_{k}} r_{i_{1}} r_{i_{2}} \dots r_{i_{k}} = {(- 1)}^{i} \frac{a_{n - i}}{a_{n}}, k \leq i = 1, 2, \dots, n$
  
  (A5)
Indefinite integral:

${\tilde{p}}_{n} (x) \equiv \int p_{n} (x) d x = \sum_{i = 0}^{n} \frac{a_{i}}{i + 1} x^{i + 1}$

(A6)
Definite integral:

$P_{n} (u) \equiv \int_{- \infty}^{u} p_{n} (x) d x$

(A7)

$\begin{matrix} \int_{l}^{u} p_{n} (x) d x & = \int_{- \infty}^{u} p_{n} (x) d x - \int_{- \infty}^{l} p_{n} (x) d x = P_{n} (u) - P_{n} (l), l < u \\ = \sum_{i = 0}^{n} \frac{a_{i}}{i + 1} (u^{i + 1} - l^{i + 1}) \equiv \tilde{p} (u) - \tilde{p} (l) \end{matrix}$

(A8)
Indefinite k-fold integral, $k > 1$ :

$\underset{k}{\int ... \int} p_{n} (x) d x^{k} = \sum_{i = 0}^{n} \frac{a_{i}}{(i + 1) (i + 2) . . . (i + k)} x^{i + k} = \sum_{i = 0}^{n} a_{i} \frac{i!}{(i + k)!} x^{i + k}$

(A9)
Derivative:

${\dot{p}}_{n} (x) = \frac{d}{d x} p_{n} (x) = \sum_{i = 1}^{n} i a_{i} x^{i - 1}$

(A10)
k-th derivative, $1 < k \leq n$ :

$\begin{matrix} p_{n}^{(k)} (x) & = \sum_{i = k}^{n} i (i - 1) . . . (i - k + 1) a_{i} x^{i - k} \\ = \sum_{i = k}^{n} a_{i} \frac{i!}{(i - k)!} x^{i - k} = \sum_{i = k}^{n} a_{i} k! (\binom{i}{k}) x^{i - k} \end{matrix}$

(A11)
k-th moment, $k \geq 1$ :

$\begin{matrix} \int x^{k} p_{n} (x) d x & = \sum_{i = 0}^{n} \frac{a_{i}}{i + k + 1} x^{i + k + 1} \\ \int_{l}^{u} x^{k} p_{n} (x) d x & = \sum_{i = 0}^{n} \frac{a_{i}}{i + k + 1} (u^{i + k + 1} - l^{i + k + 1}) \end{matrix}$

(A12)
Characteristic function:

$\begin{matrix} ϕ_{X} (t) & = E [e^{j t X}] = \int_{- 1}^{+ 1} p_{n} (x) e^{j t x} d x = \sum_{i = 0}^{n} a_{i} \int_{- 1}^{+ 1} x^{i} e^{j t x} d x \\ = \sum_{i = 0}^{n} a_{i} \frac{j}{t} \int_{- t / j}^{t / j} {(\frac{x j}{t})}^{i} e^{- x} d x \\ = \sum_{i = 0}^{n} a_{i} {(\frac{j}{t})}^{i + 1} (\int_{- t / j}^{\infty} x^{i} e^{- x} d x - \int_{t / j}^{\infty} x^{i} e^{- x} d x) \\ = \sum_{i = 0}^{n} \frac{a_{i}}{{(- j t)}^{i + 1}} (Γ (1 + i, j t) - Γ (1 + i, - j t)) \end{matrix}$

(A13)

where $Γ (a, z)$ is the incomplete Gamma function

Appendix B. Basic Properties of Form II Polynomials

Definition:

$p_{n} (x) = a_{n} \prod_{i = 1}^{n} (x - r_{i}), a_{n} \neq 0, r_{i} \in$

(A14)
Recursive form:

$\begin{matrix} p_{n} (x) & = \frac{a_{n}}{a_{n - 1}} (x - r_{n}) p_{n - 1} (x), n > 1 \\ p_{1} (x) & = x - r_{1} \end{matrix}$

(A15)
Converting Form I to Form II, general case:

$\sum_{i = 0}^{n} a_{i} x^{i} = a_{n} \prod_{i = 1}^{n} (x - r_{i})$

(A16)

$\begin{matrix} a_{n} & \equiv a_{n}, a_{n - 1} = a_{n} {(- 1)}^{1} \sum_{i = 1}^{N} r_{i}, a_{n - 2} = a_{n} {(- 1)}^{2} \sum_{i = 1, j = 1, i \neq j}^{N} r_{i} r_{j} \\ a_{n - 3} & = a_{n} {(- 1)}^{3} \sum_{i = 1, j = 1, k = 1, i \neq j \neq k}^{N} r_{i} r_{j} r_{k}, \dots, a_{0} = a_{n} {(- 1)}^{n} \prod_{i = 1}^{n} r_{i} \end{matrix}$

(A17)
Converting Form I to Form II, $n = 2$ :

$a_{2} \equiv a_{2}, a_{1} = - a_{2} (r_{1} + r_{2}), a_{0} = a_{2} r_{1} r_{2}$

(A18)
Converting Form I to Form II, $n = 3$ :

$a_{3} \equiv a_{3}, a_{2} = - a_{3} (r_{1} + r_{2} + r_{3}), a_{1} = a_{3} (r_{1} r_{2} + r_{1} r_{3} + r_{2} r_{3}), a_{0} = - a_{3} r_{1} r_{2} r_{3}$

(A19)
Converting Form I to Form II, $n = 4$ :

$\begin{matrix} a_{4} \equiv a_{4}, a_{3} = & - a_{4} (r_{1} + r_{2} + r_{3} + r_{4}), a_{2} = a_{4} (r_{1} r_{2} + r_{1} r_{3} + r_{1} r_{4} + r_{2} r_{3} + r_{2} r_{4} + r_{3} r_{4}) \\ a_{1} = & - a_{4} (r_{1} r_{2} r_{3} + r_{1} r_{2} r_{4} + r_{1} r_{3} r_{4} + r_{2} r_{3} r_{4}), a_{0} = a_{4} r_{1} r_{2} r_{3} r_{4} \end{matrix}$

(A20)
Indefinite integral: (recursive form)

$\begin{matrix} I_{n} (x) & = \frac{1}{a_{n}} \int p_{n} (x) d x, n > 1 \\ = (x - r_{n}) I_{n - 1} (x) - \int I_{n - 1} (x) d x \\ I_{1} (x) & = \frac{1}{a_{1}} \int p_{1} (x) d x = \int (x - r_{1}) d x = \frac{1}{2} x^{2} - r_{1} x \end{matrix}$

(A21)
Indefinite k-fold integral:

$\begin{matrix} I_{n} (x) & = (x - r_{n}) I_{n - 1} (x) - \int I_{n - 1} (x) d x \\ \int I_{n} (x) d x & = \int (x - r_{n}) I_{n - 1} (x) d x - \int \int I_{n - 1} (x) d x^{2} \\ = (x - r_{n}) \int I_{n - 1} (x) d x - 2 \int \int I_{n - 1} (x) d x^{2} \\ \underset{k}{\int ... \int} I_{n} (x) d x^{k} & = (x - r_{n}) \underset{k}{\int ... \int} I_{n - 1} (x) d x^{k} - (k + 1) \underset{k + 1}{\int ... \int} I_{n - 1} (x) d x^{k + 1} \end{matrix}$

(A22)
Derivative:

$\begin{matrix} {\dot{p}}_{n} (x) & = a_{n} \prod_{i = 1}^{n - 1} (x - r_{i}) + \frac{a_{n}}{a_{n - 1}} (x - r_{n}) {\dot{p}}_{n - 1} (x), n > 1 \\ = \frac{a_{n}}{a_{n - 1}} p_{n - 1} (x) + \frac{a_{n}}{a_{n - 1}} (x - r_{n}) {\dot{p}}_{n - 1} (x) \\ {\dot{p}}_{1} (x) & = a_{1} \end{matrix}$

(A23)
k-th derivative, $1 < k \leq n$ :

$\begin{matrix} {\dot{p}}_{n} (x) & = \frac{a_{n}}{a_{n - 1}} p_{n - 1} (x) + \frac{a_{n}}{a_{n - 1}} (x - r_{n}) {\dot{p}}_{n - 1} (x) \\ p_{n}^{(k)} (x) & = k \frac{a_{n}}{a_{n - 1}} p_{n - 1}^{(k - 1)} (x) + \frac{a_{n}}{a_{n - 1}} (x - r_{n}) p_{n - 1}^{(k)} (x) \end{matrix}$

(A24)
k-th moment, $k \geq 1$ :

$\begin{matrix} {\overset{}{\tilde{p}}}_{n} (x) \equiv \int p_{n} (x) d x & , {\overset{k}{\tilde{p}}}_{n} (x) \equiv \underset{k}{\int ... \int} p_{n} (x) d x \\ \int x^{k} p_{n} (x) d x & = x^{k} {\overset{}{\tilde{p}}}_{n} (x) - k \int x^{k - 1} {\overset{}{\tilde{p}}}_{n} (x) d x^{2} \\ \int x^{k - 1} {\overset{}{\tilde{p}}}_{n} (x) d x & = x^{k - 1} {\overset{2}{\tilde{p}}}_{n} (x) - (k - 1) \int x^{k - 2} {\overset{2}{\tilde{p}}}_{n} (x) d x \\ ⋮ \\ \int x {\overset{k - 1}{\tilde{p}}}_{n} (x) d x & = x {\overset{k}{\tilde{p}}}_{n} (x) - {\overset{k + 1}{\tilde{p}}}_{n} (x) \\ \int_{l}^{u} x^{k} p_{n} (x) d x = & {[x^{k} {\overset{}{\tilde{p}}}_{n} (x)]}_{l}^{u} - k \int_{l}^{u} x^{k - 1} {\overset{}{\tilde{p}}}_{n} (x) d x \end{matrix}$

(A25)
Moment-generating function:

$\begin{matrix} M (t) = \int_{l}^{u} e^{t x} p_{n} (x) d x & = \frac{e^{t x}}{t} p_{n} (x) - \frac{1}{t} \int_{l}^{u} e^{t x} {\overset{1}{\dot{p}}}_{n} (x) d x \\ \int_{l}^{u} e^{t x} {\overset{1}{\dot{p}}}_{n} (x) d x & = \frac{e^{t x}}{t} p_{n}^{'} (x) - \frac{1}{t} \int_{l}^{u} e^{t x} {\overset{2}{\dot{p}}}_{n} (x) d x \\ ⋮ \\ \int_{l}^{u} e^{t x} {\overset{n}{\dot{p}}}_{n} (x) d x & = \int_{l}^{u} e^{t x} a_{n} n! d x = \frac{a_{n} n!}{t} e^{t^{u} - t^{l}} \end{matrix}$

(A26)

Appendix C. Basic Properties of Form III Polynomials

Definition:

$p_{n} (x) = \frac{s_{m} (x)}{q_{n} (x)} = \frac{s_{m} (x)}{c_{n} \prod_{i = 1}^{n} (x - r_{i})} = \sum_{i = 1}^{n} \frac{a_{i}}{x - r_{i}}, m < n, c_{n} \neq 0, r_{i} \neq r_{j} \forall i \neq j$

(A27)

where the residuals, $a_{i} = \frac{s_{m} (r_{i})}{{\dot{q}}_{n} (r_{i})} \neq 0$
Indefinite integral:

$\begin{matrix} r_{i} \in R : \int_{l}^{u} \frac{1}{x - r_{i}} d x & = \{\begin{matrix} ln \frac{u - r_{i}}{l - r_{i}}, & r_{i} < l or r_{i} > u \\ n . c . & otherwise \end{matrix} \\ r_{i} \in : \int_{l}^{u} \frac{1}{x - r_{i}} d x & = \{\begin{matrix} ln \frac{u - r_{i}}{l - r_{i}}, & Re (r_{i}) < l or Re (r_{i}) > u or Im (r_{i}) \neq 0 \\ n . c . & otherwise \end{matrix} \\ \int \sum_{i = 1}^{n} \frac{c_{i}}{x - r_{i}} d x & = \sum_{i = 1}^{n} c_{i} ln (x - r_{i}) \end{matrix}$

(A28)
Definite integral:

$\int_{l}^{u} \sum_{i = 1}^{n} \frac{c_{i}}{x - r_{i}} d x = \sum_{i = 1}^{n} c_{i} \frac{ln (u - r_{i})}{ln (l - r_{i})}$

(A29)
k-th derivative, $1 \leq k \leq n$ :

$p_{n}^{(k)} (x) = {(- 1)}^{k} k! \sum_{i = 1}^{n} \frac{c_{i}}{{(x - r_{i})}^{k + 1}}$

(A30)
k-fold integral, $k \geq 1$ :

$\frac{d^{k}}{d x^{k}} (\frac{x^{k - 1} ln x}{(k - 1)!}) = \frac{1}{x}$

(A31)

$\underset{k}{\int ... \int} p_{n} (x) d x^{k} = \sum_{i = 1}^{n} c_{i} \frac{{(x - r_{i})}^{k - 1} ln (x - r_{i})}{(k - 1)!}$

(A32)
k-th moment, $k \geq 1$ :

$\begin{matrix} \int_{l}^{u} x^{k} \sum_{i = 1}^{n} \frac{c_{i}}{x - r_{i}} d x & = \sum_{i = 1}^{n} c_{i} r_{i}^{k} (β_{l / r_{i}} (1 + k, 0) - β_{u / r_{i}} (1 + k, 0)), 0 \leq l < u \leq 1 \\ \int_{l}^{u} \frac{x^{k}}{x - r_{i}} d x & = - r_{i}^{k - 1} \int_{l}^{u} {(\frac{x}{r_{i}})}^{k} {(1 - \frac{x}{r_{i}})}^{- 1} d x = - r_{i}^{k} \int_{l / r_{i}}^{u / r_{i}} x^{k} {(1 - x)}^{- 1} d x \\ = r_{i}^{k} (β_{l / r_{i}} (1 + k, 0) - β_{u / r_{i}} (1 + k, 0)), r_{i} \neq 0 \\ \int_{l}^{u} \frac{x^{k}}{x} d x & = \frac{1}{k} (u^{k} - l^{k}), r_{i} = 0 \end{matrix}$

(A33)

where $β_{z} (a, b) = \int_{0}^{z} t^{a - 1} {(1 - t)}^{b - 1} d t$ is incomplete $β$ -function
Characteristic function:

$ϕ_{X} (t) = E [e^{j t X}] = \sum_{i = 1}^{n} c_{i} \int_{l}^{u} \frac{e^{j t x}}{x - r_{i}} d x = \sum_{i = 1}^{n} c_{i} e^{j r_{i} t} (Γ (0, j t (r_{i} - l)) - Γ (0, j t (r_{i} - u))), r \in (l, u)$

(A34)

References

Phillips, G.M.; Taylor, P.J. “Best”Approximation. In Theory and Applications of Numerical Analysis; Academic Press: London, UK, 1996; pp. 86–130. [Google Scholar] [CrossRef]
Cheney, E.W. Introduction to Approximation Theory, 2nd ed.; AMS Chelsea Publishing: Providence, RI, USA, 1982. [Google Scholar]
Epperson, J. On the Runge example. Ameican Math. Mon. 1987, 94, 329–341. [Google Scholar] [CrossRef]
Smith, S.J. Lebesgue constants in polynomial interpolation. Ann. Math. Inform. 2006, 33, 109–123. [Google Scholar]
Ibrahimoglu, B.A. Lebesgue functions and Lebesgue constants in polynomial interpolation. J. Inequalities Appl. 2016, 93, 93. [Google Scholar] [CrossRef] [Green Version]
Freedman, D.; Diaconis, P. On the histogram as a density estimator:L2 theory. Z. Wahrscheinlichkeitstheorie Verwandte Geb. 1981, 57, 453–476. [Google Scholar] [CrossRef] [Green Version]
Munkhammar, J.; Mattsson, L.; Rydén, J. Polynomial probability distribution estimation using the method of moments. PLoS ONE 2017, 12, e0174573. [Google Scholar] [CrossRef] [Green Version]
Badinelli, R.D. Approximating probability density functions and their convolutions using orthogonal polynomials. Eur. J. Oper. Res. 1996, 95, 211–230. [Google Scholar] [CrossRef]
Abdous, B.; Bensaid, E. Multivariate local polynomial fitting for a probability distribution function and its partial derivatives. J. Nonparametric Stat. 2007, 13, 77–94. [Google Scholar] [CrossRef]
Gasca, M.; Sauer, T. Polynomial interpolation in several variables. Adv. Comput. Math. 2000, 12, 377–410. [Google Scholar] [CrossRef]
Ghasemi, M.; Marshall, M. Lower bounds for a polynomial in terms of its coefficients. Arch. Math. 2010, 95, 343–353. [Google Scholar] [CrossRef]
Forsythe, G.E. Generation and Use of Orthogonal Polynomials for Data-Fitting with a Digital Computer. J. Soc. Ind. Appl. Math. 1957, 5, 74–88. [Google Scholar] [CrossRef]
Cunis, T. The pwpfit Toolbox for Polynomial and Piece-wise Polynomial Data Fitting. In Proceedings of the International Federation of Automatic Control, Baku, Azerbaijan, 13–15 September 2018; pp. 682–687. [Google Scholar] [CrossRef]
Hiang, T.S.; Ali, J.M. Quartic and quintic polynomial interpolation. In Proceedings of the AIP Conference Proceedings, Kuala Lumpur, Malaysia, 24–26 July 2013; Volume 1522, pp. 664–675. [Google Scholar] [CrossRef]
Gao, J.; Ji, W.; Zhang, L.; Shao, S.; Wang, Y.; Shi, F. Fast Piecewise Polynomial Fitting of Time-Series Data for Streaming Computing. IEEE Access 2020, 8, 43764–43775. [Google Scholar] [CrossRef]
Guo, L.; Narayan, A.; Zhou, T. Constructing Least-Squares Polynomial Approximations. SIAM Rev. 2020, 62, 483–508. [Google Scholar] [CrossRef]
Han, H.; Liu, H.; Ji, X. Interpolation to Data Points in Plane with Cubic Polynomial Precision. In Proceedings of the Technologies for E-Learning and Digital Entertainment, Hong Kong, China, 11–13 June 2007; pp. 677–686. [Google Scholar] [CrossRef]
Melucci, M. A brief survey on probability distribution approximation. Comput. Sci. Rev. 2019, 33, 91–97. [Google Scholar] [CrossRef]
He, W.; Hao, P.; Li, G. A novel approach for reliability analysis with correlated variables based on the concepts of entropy and polynomial chaos expansion. Mech. Syst. Signal Process. 2021, 146, 106980. [Google Scholar] [CrossRef]
Chen, D.; Yuan, Z.; Hua, G.; Zheng, N.; Wang, J. Similarity Learning on an Explicit Polynomial Kernel Feature Map for Person Re-Identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar] [CrossRef]
Cotter, N.E. The Stone-Weierstrass theorem and its application to neural networks. IEEE Trans. Neural Netw. 1990, 1, 290–295. [Google Scholar] [CrossRef]
Tong, Y.; Yu, L.; Li, S.; Liu, J.; Qin, H.; Li, W. Polynomial Fitting Algorithm Based on Neural Network. ASP Trans. Pattern Recognit. Intell. Syst. 2021, 1, 32–39. [Google Scholar] [CrossRef]
Barbeau, E.J. Polynomials; Springer: New York, NY, USA, 1989. [Google Scholar]
Rahman, Q.I.; Schmeisser, G. Analytic Theory of Polynomials; Oxford University Press: New York, NY, USA, 2002. [Google Scholar]
Apostol, T.M. Mathematical Analysis, 2nd ed.; Addison-Wesley: Reading, MA, USA, 1974. [Google Scholar]
Rahman, S. An extended polynomial dimensional decomposition method for arbitrary probability distributions. J. Eng. Mech. 2009, 135, 1439–1451. [Google Scholar] [CrossRef] [Green Version]
Funaro, D. Polynomial Approximation of Differential Equations; Springer: Berlin/Heidelberg, Germany, 1992. [Google Scholar]
Guo, B.Y.; Shen, J.; Wang, L.L. Generalized Jacobi Polynomials/Functions and Their Applications. Appl. Numer. Math. 2009, 59, 1011–1028. [Google Scholar] [CrossRef] [Green Version]
Boas, R.P.; Klamkin, M.S. Extrema of Polynomials. Math. Mag. 1977, 50, 75–78. [Google Scholar] [CrossRef]
Hanzon, B.; Jibetean, D. Global Minimization of a Multivariate Polynomial using Matrix Methods. J. Glob. Optim. 2003, 27, 1–23. [Google Scholar] [CrossRef]
Qi, L.; Teo, K.L. Multivariate Polynomial Minimization and Its Application in Signal Processing. J. Glob. Optim. 2003, 26, 419–433. [Google Scholar] [CrossRef]
Uteshev, A.Y.; Cherkasov, T.M. The Search for the Maximum of a Polynomial. J. Symb. Comput. 1998, 25, 587–618. [Google Scholar] [CrossRef] [Green Version]
Pan, V.Y. Solving A Polynomial Equation: Some History And Recent Progress. SIAM Rev. 1997, 39, 187–220. [Google Scholar] [CrossRef]
Beji, S. Polynomial Functions Composed of Terms with Non-Integer Powers. Adv. Pure Math. 2021, 11, 791–806. [Google Scholar] [CrossRef]
Papoulis, A.; Pillai, S.U. Probability, Random Variables, and Stochastic Processes, 4th ed.; McGraw-Hill: New York, NY, USA, 2002. [Google Scholar]
Alzaatreh, A.; Lee, C.; Famoye, F. A new method for generating families of continuous distributions. Metron 2013, 71, 63–79. [Google Scholar] [CrossRef] [Green Version]
Gorav, J.; Pandey, A.; Shukla, H.; Zisopoulos, C. How many zeros of a random sparse polynomial are real? In Proceedings of the International Symposium on Symbolic and Algebraic Computation, Hong Kong, China, 14–18 December 2020; pp. 273–280. [Google Scholar] [CrossRef]
Bini, D.A. Numerical computation of polynomial zeros by means of Aberth’s method. Numer. Algorithms 1996, 13, 179–200. [Google Scholar] [CrossRef]
Lang, M.; Frenzel, B.C. Polynomial root finding. IEEE Signal Process. Lett. 1994, 1, 141–143. [Google Scholar] [CrossRef]
Maulud, D.; Abdulazeez, A.M. A Review on Linear Regression Comprehensive in Machine Learning. J. Appl. Sci. Technol. Trends 2020, 1, 140–147. [Google Scholar] [CrossRef]
Seber, G.A.F. A Matrix Handbook for Statisticians; John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
Aldaz, J.M. Self–Improvement Of The Inequality Between Arithmetic And Geometric Means. J. Math. Inequalities 2009, 3, 213–216. [Google Scholar] [CrossRef] [Green Version]
Mnatsakanov, R.M.; Hakobyan, A.S. Recovery of Distributions via Moments. In Proceedings of the Optimality: The Third Erich L. Lehmann Symposium, Houston, TX, USA, 16–19 May 2007; Volume 57, pp. 252–265. [Google Scholar] [CrossRef]
Trefethen, L.N. Approximation Theory and Approximation Practice; SIAM-Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2019. [Google Scholar]
Flecher, C.; Allard, D.; Naveau, P. Truncated skew-normal distributions: Moments, estimation by weighted moments and application to climatic data. Int. J. Stat. 2012, 68, 331–345. [Google Scholar] [CrossRef]
Morán-Vásquez, R.A.; Zarrazola, E.; Nagar, D.K. Some Statistical Aspects of the Truncated Multivariate Skew-t Distribution. Mathematics 2022, 10, 2793. [Google Scholar] [CrossRef]

Figure 1. The piecewise polynomial distributions,

p_{n} (x)

, consisting of 4 segments with 5 control points (red squares), when the continuity order

C = 2

. The segments were computed using 20 equally spaced samples within each segment.

Figure 1. The piecewise polynomial distributions,

p_{n} (x)

, consisting of 4 segments with 5 control points (red squares), when the continuity order

C = 2

. The segments were computed using 20 equally spaced samples within each segment.

Figure 2. The rejection sampling from distribution

p_{5} (x)

designed in Figure 1 to generate samples from the target distribution,

q (x) \leq A p_{5} (x)

, where

A = 1.6

.

Figure 2. The rejection sampling from distribution

p_{5} (x)

designed in Figure 1 to generate samples from the target distribution,

q (x) \leq A p_{5} (x)

, where

A = 1.6

.

Figure 3. The PDF (66) and the empirical histogram for

10^{4}

generated random samples.

Figure 3. The PDF (66) and the empirical histogram for

10^{4}

generated random samples.

Figure 4. The estimation error,

p_{4} (x) - {\hat{p}}_{4} (x)

, for the method of moments with 500 data samples.

Figure 4. The estimation error,

p_{4} (x) - {\hat{p}}_{4} (x)

, for the method of moments with 500 data samples.

Figure 5. The box plots of estimated coefficients of the polynomial PDF (66) from only 50 samples, and repeated 1000 times. The triangles on the side of the boxes indicate the true coefficient values.

Figure 6. The goodness-of-fit measure (62) as a function of the polynomial order, n.

Figure 7. The approximations of a given PDF by polynomial PDFs,

p_{n} (x)

, with

n = 3, 5

, and 7 using the unconstrained (A), and constrained (B) LS approximation, respectively.

Figure 7. The approximations of a given PDF by polynomial PDFs,

p_{n} (x)

, with

n = 3, 5

, and 7 using the unconstrained (A), and constrained (B) LS approximation, respectively.

Figure 8. The approximations of a given PDF by Lagrange polynomials assuming

N = 6, 8

, and 10 equidistant samples (A), and the same number of Chebyshev samples (B).

Figure 8. The approximations of a given PDF by Lagrange polynomials assuming

N = 6, 8

, and 10 equidistant samples (A), and the same number of Chebyshev samples (B).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yu, Y.; Loskot, P. Polynomial Distributions and Transformations. Mathematics 2023, 11, 985. https://doi.org/10.3390/math11040985

AMA Style

Yu Y, Loskot P. Polynomial Distributions and Transformations. Mathematics. 2023; 11(4):985. https://doi.org/10.3390/math11040985

Chicago/Turabian Style

Yu, Yue, and Pavel Loskot. 2023. "Polynomial Distributions and Transformations" Mathematics 11, no. 4: 985. https://doi.org/10.3390/math11040985

APA Style

Yu, Y., & Loskot, P. (2023). Polynomial Distributions and Transformations. Mathematics, 11(4), 985. https://doi.org/10.3390/math11040985

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Polynomial Distributions and Transformations

Abstract

1. Introduction

2. Defining Polynomial Distributions

2.1. Probability Density Transformations

2.2. Polynomial PDF Fit of a Histogram

2.3. Piecewise Polynomial PDF

3. Derived Characteristics of a Polynomial Distribution

4. Estimation Problems Involving Polynomial Distributions

Other Estimation Problems

5. Numerical Examples

5.1. Constructing Piecewise Polynomial PDF

5.2. Generating Random Samples Using Polynomial PDF

5.3. Estimating Parameters of Polynomial PDF

5.4. Approximating PDF by Polynomial PDF

5.5. Summary of Observations

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Appendix A. Basic Properties of Form I Polynomials

Appendix B. Basic Properties of Form II Polynomials

Appendix C. Basic Properties of Form III Polynomials

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI