Numerical Algorithms for Estimating Probability Density Function Based on the Maximum Entropy Principle and Fup Basis Functions

Estimation of the probability density function from statistical power moments presents a challenging nonlinear numerical problem, plagued by unbalanced nonlinearities, numerical instability and a lack of convergence, especially for larger numbers of moments. Despite many numerical improvements over the past two decades, the classical moment problem of maximum entropy (MaxEnt) remains a very demanding numerical and statistical task. Among other advances, it has been shown that Fup basis functions with compact support can significantly improve the convergence properties of this nonlinear algorithm, but many obstacles to an efficient pdf solution still remain in applied examples. Therefore, besides the classical nonlinear Algorithm 1, in this paper we present a linear approximation of the MaxEnt moment problem as Algorithm 2, using exponential Fup basis functions. Algorithm 2 solves the linear problem, satisfying only the prescribed moments, using an optimal exponential tension parameter that maximizes the Shannon entropy. Algorithm 2 is very efficient for larger numbers of moments and especially for skewed pdfs. Since both algorithms have pros and cons, a hybrid strategy is proposed to combine their best approximation properties.


Introduction
Many physical processes cannot be characterized deterministically owing to the presence of intrinsic or parametric uncertainty due to their physical nature, interpretation or measurements. Therefore, results are usually given in the form of a certain number of the first few statistical power moments or, rarely, as a probability density function (pdf) [1][2][3].
The maximum entropy principle, defined by Jaynes [4], is a versatile tool for statistical inference of the probability density function from its moments by maximizing Shannon entropy [5]. The principle states that among all possible pdfs that satisfy our incomplete information about the system, the one that maximizes entropy is the least biased estimate that can be made. It agrees with everything that is known but carefully avoids anything that is unknown [6,7].
In the past few decades, a great number of maximum entropy algorithms (MEAs) have been developed and applied in various fields of science, such as continuum mechanics [8], signal processing [9,10], chemical engineering [11][12][13] and heat transfer [14]. In coastal engineering, MEAs are used as a powerful tool for the prediction of extreme significant wave heights [15][16][17] and for the distribution of streamwise velocity in open channels [18]. MEAs are especially popular in structural reliability analysis [19][20][21][22][23][24][25]. Their application goes much further, though; for example, MEAs are used for the evaluation of the reliability of semiconductor devices [26]. MEAs have even been extended to multidimensional problems [27][28][29][30] and models in 2D domains and on surfaces [31].
MaxEnt algorithms for higher numbers of moments are subject to highly unbalanced nonlinearities and to ill-conditioned Jacobian and Hessian matrices in Newton-type algorithms.

Maximum Entropy Principle
Let f(x) be an unknown probability density function (pdf) defined on a finite real interval [a, b] that satisfies the basic normalized zeroth statistical moment, which says that all pdf outcomes present a certain event:

∫_a^b f(x) dx = 1 (1)

Suppose that additional moment constraints on f are given in the form of classical statistical moments of higher order:

∫_a^b x^i f(x) dx = µ_i, i = 1, . . . , m (2)

where µ_i represents the given and known statistical moments. Without loss of generality, the interval [a, b] can be transformed into the interval [0, 1], as we will assume later in this article. The estimation of the function f(x) by the maximum entropy principle is obtained by maximizing the Shannon entropy H(f) associated with the function f, subject to the constraints given by Equation (2), where the Shannon entropy is given in the form:

H(f) = −∫_0^1 f(x) ln f(x) dx (3)

Jaynes [4] showed that maximizing H(f) with respect to f, under constraints (2), leads to the following analytical solution:

f(x) = e^(−λ_0 − λ_1 x − . . . − λ_m x^m) (4)

where the Lagrange multipliers λ_0, . . . , λ_m satisfy the following relations:

exp(λ_0) = ∫_0^1 exp(−∑_{j=1}^m λ_j x^j) dx (5)

∫_0^1 x^i exp(−∑_{j=1}^m λ_j x^j) dx / ∫_0^1 exp(−∑_{j=1}^m λ_j x^j) dx = µ_i, i = 1, . . . , m (6)

From a practical point of view, determining the maximum entropy pdf from the given moments comes down to solving the nonlinear system of m Equations (6), a challenging nonlinear numerical problem faced with unbalanced nonlinearities, numerical instability and a lack of convergence, especially for larger numbers of moments.
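As an illustration only, the system (5)-(6) can be solved directly for a small number of moments with a general-purpose nonlinear solver. The sketch below (Python with SciPy, not the paper's Fortran implementation; the function names are ours) recovers the uniform pdf on [0, 1] from its first two moments:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve

def maxent_pdf(mu):
    """Solve the MaxEnt moment problem for f(x) = exp(-sum_j lam_j x^j) on [0, 1].
    mu = [mu_0 = 1, mu_1, ..., mu_m] are the prescribed power moments."""
    m = len(mu) - 1

    def f_of(lam):
        return lambda x: np.exp(-sum(l * x**j for j, l in enumerate(lam)))

    def residuals(lam):
        f = f_of(lam)
        # Equations (2): the integral of x^i f(x) over [0, 1] must equal mu_i
        return [quad(lambda x: x**i * f(x), 0.0, 1.0)[0] - mu[i] for i in range(m + 1)]

    lam = fsolve(residuals, np.zeros(m + 1))
    return f_of(lam), lam

# moments of the uniform pdf on [0, 1]: mu_i = 1 / (i + 1)
f, lam = maxent_pdf([1.0, 1.0 / 2.0, 1.0 / 3.0])
```

For the uniform target all multipliers vanish, so the solver converges immediately; for larger m this direct approach suffers exactly from the ill-conditioning discussed above, which is what motivates the Fup-based algorithms.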

Numerical Algorithms Using Fup Basis Functions
This section describes two basic numerical algorithms for estimating the probability density function from a finite number of known moments. Both algorithms for solving the classical moment problem were developed using atomic basis functions: one of the algebraic type, Fup4(x), and one of the exponential type, EFup4(x, ω).

The function Fup4(x) is shown in Figure 1.
The function support contains six characteristic intervals. By a linear combination of these functions, shifted from each other by the length of the characteristic interval, algebraic polynomials up to the fourth degree can be accurately represented (see Appendix A, Appendix A.1). To approximate the pdf using the maximum entropy principle in the range [0, 1], a base of at least six basis functions, Fup4(x), is required, as shown in Figure 2.
Therefore, the MaxEnt optimization procedure for estimating the probability density function starts from the sixth moment (with the zeroth moment engaged, i.e., m = 5). For numerical operations in the domain to be performed efficiently, it is necessary to modify the basis functions that have non-zero values within the domain but whose vertices lie outside the domain, by expressing them as a linear combination of the original basis functions [50]. Figure 3 shows the basis functions in the observed domain for the estimation with six moments (m = 5).

Fup basis functions of the exponential type are much less well-investigated than those of the algebraic type [56]. A linear combination of the functions EFupn(x, ω) can accurately represent the exponential function E(x) = e^(2^n ωx) (Appendix A, Appendix A.2). The EFup4(x, ω) function retains all the properties of the Fup4(x) function, such as smoothness, compact support, polynomial reproduction, etc. The special feature of these functions is that they contain the parameter ω, which allows them to change their slope depending on the parameter value, as shown in Figure 4.

It can be seen that the function EFup4(x, ω) for ω > 0 is tilted to the right, and for ω < 0 it is tilted to the left. In the boundary case ω = 0, the function is symmetric and identical to the algebraic function Fup4(x). Thus, the vector function space of the functions EFupn(x, ω) also contains the functions Fupn(x).

This property gives EFup functions of the exponential type an advantage over other compactly supported functions (such as algebraic Fups, splines and wavelets): it is their flexibility that allows them to "adapt" more closely to the solution. On a similar track, exponential splines that also contain a tension parameter are used in the literature; these are often applied to solve singularly perturbed boundary problems [58,59]. When using the exponential type of atomic basis functions, the basic task is to determine the value of the parameter ω.
Since the maximum entropy optimization problem is the search for a pdf that maximizes the Shannon integral (3), this will be taken as the criterion for choosing the parameter ω. Thus, of all possible solutions obtained for different values of the parameter ω, the one that gives the maximum value of the Shannon entropy functional (3) is adopted.
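The selection criterion can be stated compactly: evaluate H(f_ω) for each candidate ω and keep the maximizer. A minimal numerical sketch (Python; the tilted-exponential family below is a toy stand-in, not an EFup basis):

```python
import numpy as np

def integrate(y, x):
    # simple trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def shannon_entropy(f, x):
    # H(f) = -int f ln f dx, Equation (3); clip to avoid log(0)
    p = np.clip(f, 1e-300, None)
    return -integrate(p * np.log(p), x)

x = np.linspace(0.0, 1.0, 2001)
omegas = np.linspace(-5.0, 5.0, 101)

def candidate(w):
    g = np.exp(w * x)           # toy one-parameter family on [0, 1]
    return g / integrate(g, x)  # normalized pdf

best_w = max(omegas, key=lambda w: shannon_entropy(candidate(w), x))
```

With no moment constraints, entropy on [0, 1] is maximized by the uniform pdf, so the scan returns ω = 0; in Algorithm 2 the same scan is applied to moment-constrained solutions.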

Classic MaxEnt Algorithm (Algorithm 1)
Here, we describe the procedure for solving the problem of pdf estimation from known moments using the principle of maximum entropy and Fup4(x) basis functions [34]. The procedure is simplified in relation to [34] and consists of the following steps:

1. Let the set of m algebraic moments, µ_i, be given on the general interval [a, b]. Their values on the unit interval [0, 1] are calculated using a simple linear transformation.

2. In expression (4), we express the monomials x^i, i = 0, . . . , m as a linear combination of basis functions Fup4_j(x), j = 0, . . . , m, as shown in Appendix A, Appendix A.1. Since the Fupn basis functions exactly describe algebraic monomials up to and including the nth order, in the algorithm from [34] the residual functions ε_i(x) are introduced, i.e.:

x^i = ∑_{j=0}^m d_ij Fup4_j(x) + ε_i(x), i = 0, . . . , m

where ε_i(x) are residual functions that describe the difference between monomials of the ith order and their Fupn approximation, and d_ij is the connection matrix, which depends on the Fupn approximation and on the location and number of basis functions and, consequently, moments. For instance, the collocation procedure [48] reproduces the monomials exactly at the collocation points, and the residual functions ε_i(x) fluctuate around zero, with significantly smaller values than the Fup basis functions. For increasing numbers of moments and basis functions, the residual functions ε_i(x) converge to zero, i.e., in the limit, for m → ∞, ε_i(x) is zero. Given the fact that the influence of the lower moments in the process of determining the pdf is the largest, in this paper we choose a Fup basis function of higher order, i.e., n = 4, which allows us to simplify the algorithm from [34] (keeping in mind that monomials above the fourth degree are now expressed approximately):

x^i ≈ ∑_{j=0}^m d_ij Fup4_j(x), i = 0, . . . , m (9)

Next, we calculate the connection matrix d_ij. Unlike [34], where the connection matrix d_ij is obtained by the collocation method, here it is obtained by the Galerkin method, i.e., by requiring the residual of (9) to be orthogonal to every basis function:

∫_0^1 (x^i − ∑_{j=0}^m d_ij Fup4_j(x)) Fup4_k(x) dx = 0, k = 0, . . . , m

Multiplying expression (9) by the optimal pdf in the form

f(x) = exp(−∑_{j=0}^m γ_j Fup4_j(x)) (11)

and integrating over the domain [0, 1], we can relate the classical algebraic and Fup4 moments:

µ_i = ∑_{j=0}^m d_ij µ_j^Fup4, i = 0, . . . , m (12)

where µ_i represents the given moments of the requested pdf, while

µ_j^Fup4 = ∫_0^1 Fup4_j(x) f(x) dx, j = 0, . . . , m (13)

represents the moments of the Fup4 basis functions. Since the Lagrange multipliers, γ_j, are unknown, the system of Equations (13) must be solved iteratively.
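The Galerkin computation of the connection matrix d_ij can be sketched generically. Since Fup4 has no simple closed form, the sketch below (Python) uses piecewise-linear hat functions as a hypothetical stand-in basis; hats reproduce polynomials only up to degree one, so only the residuals for x^0 and x^1 vanish here, whereas the Fup4 basis extends this exact reproduction to degree four.

```python
import numpy as np

def integrate(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def hat(j, n, x):
    # piecewise-linear hat centered at node j/(n-1); stand-in for Fup4_j
    nodes = np.linspace(0.0, 1.0, n)
    h = nodes[1] - nodes[0]
    return np.clip(1.0 - np.abs(x - nodes[j]) / h, 0.0, None)

x = np.linspace(0.0, 1.0, 4001)
n, m = 6, 5
B = np.array([hat(j, n, x) for j in range(n)])

# Galerkin: require x^i - sum_j d_ij B_j to be orthogonal to every B_k
G = np.array([[integrate(B[j] * B[k], x) for k in range(n)] for j in range(n)])
rhs = np.array([[integrate(x**i * B[k], x) for k in range(n)] for i in range(m + 1)])
d = np.array([np.linalg.solve(G, rhs[i]) for i in range(m + 1)])

eps1 = x - d[1] @ B   # residual function for the monomial x^1
```

Because hats reproduce linear functions exactly, eps1 is zero up to quadrature error; for i ≥ 2 the residuals are small but nonzero, mirroring the role the residual functions ε_i(x) play in the expansion above.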

3. The algorithm starts with an initial pdf assumption, i.e., for the step k = 0, the initial values of the Lagrange multipliers γ_j^(k=0), j = 0, . . . , m are selected, from which the initial pdf is calculated according to (11). For simplicity, for the initial values γ_j^(k=0) we take zero values, as the numerical procedure is not sensitive to the selection of the initial pdf. Then, the Fup moments µ_j^Fup4, j = 0, . . . , m, are determined in the kth iterative step by solving system (12).

4. From the known Fup moments, the problem of pdf estimation using the principle of maximum entropy is now solved:

∫_0^1 Fup4_i(x) exp(−∑_{j=0}^m γ_j Fup4_j(x)) dx = µ_i^Fup4, i = 0, . . . , m (14)

To determine the Lagrange multipliers, γ_j, the nonlinear system of Equations (14) is not solved at once but iteratively, by the classical Newton-Raphson method, equation by equation, using Romberg's numerical integration.

5. The index of the current equation, i, is set. The correction of the ith Lagrange coefficient is performed using the classical Newton iterative procedure. If we demand that the residual function of the Lagrange coefficient, Ψ_i(γ_i^k), vanishes, then for the k + 1 iterative step the relation γ_i^(k+1) = γ_i^k + ∆γ_i^k holds. In the current iterative step, k, the correction of the value γ_i^k is obtained by first determining the residual of the Lagrange coefficient belonging to the particular moment:

Ψ_i(γ_i^k) = ∫_0^1 Fup4_i(x) exp(−∑_{j=0}^m γ_j^k Fup4_j(x)) dx − µ_i^Fup4

then the first derivative of that residual:

∂Ψ_i/∂γ_i = −∫_0^1 [Fup4_i(x)]^2 exp(−∑_{j=0}^m γ_j^k Fup4_j(x)) dx

and finally the correction of the Lagrange multiplier in the kth iterative step:

∆γ_i^k = −Ψ_i(γ_i^k) / (∂Ψ_i/∂γ_i)

If all values of ∆γ_i^k, i = 0, . . . , m are less than the given tolerance, the iterative procedure is stopped; otherwise, step k + 1 is taken and point five is repeated until the convergence condition is satisfied or until the maximum number of given iterative steps is reached.
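Step 5 amounts to a coordinate-wise (Gauss-Seidel-like) Newton iteration. The sketch below (Python) illustrates the update on the plain monomial form (4)-(6), with SciPy's adaptive quadrature in place of Romberg integration and monomials in place of the Fup4 basis, so it is a structural illustration rather than the paper's algorithm. The target moments belong to the truncated exponential pdf f(x) = e^(−x)/(1 − e^(−1)).

```python
import numpy as np
from scipy.integrate import quad

def coordinate_newton(mu, tol=1e-10, max_sweeps=200):
    """Solve int_0^1 x^i exp(-sum_j gam_j x^j) dx = mu_i, i = 0..m,
    one equation at a time by Newton's method."""
    m = len(mu) - 1
    gam = np.zeros(m + 1)
    for _ in range(max_sweeps):
        worst = 0.0
        for i in range(m + 1):
            f = lambda x: np.exp(-sum(g * x**j for j, g in enumerate(gam)))
            psi = quad(lambda x: x**i * f(x), 0, 1)[0] - mu[i]   # residual
            dpsi = -quad(lambda x: x**(2 * i) * f(x), 0, 1)[0]   # d psi / d gam_i
            step = -psi / dpsi
            gam[i] += step
            worst = max(worst, abs(step))
        if worst < tol:   # all corrections below tolerance -> stop
            break
    return gam

# moments of f(x) = exp(-x) / (1 - exp(-1)) on [0, 1]
mu = [1.0, (1.0 - 2.0 / np.e) / (1.0 - 1.0 / np.e)]
gam = coordinate_newton(mu)
```

The exact solution here is γ_1 = 1 and γ_0 = ln(1 − e^(−1)), and the sweep converges quickly; for many moments the same loop becomes slow and ill-conditioned, which is the weakness discussed in the examples below.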

Fast Linear Algorithm (Algorithm 2)
Instead of solving the nonlinear system of Equations (13) in the classical MaxEnt problem, which, in some cases, can cause greater numerical instabilities and slow the solution convergence, the process of pdf estimation from known moments using the maximum entropy principle can be simplified by reducing the problem to solving a linear system of equations. The procedure is described using EFup4(x, ω) basis functions and consists of the following steps:

1. The probability density function is assumed in the form of a linear combination of EFup4 basis functions:

f(x) = ∑_{j=0}^m c_j EFup4_j(x, ω) (19)

The procedure of pdf determination from the given moments, µ_i, comes down to directly solving a linear system of equations:

∑_{j=0}^m c_j ∫_0^1 x^i EFup4_j(x, ω) dx = µ_i, i = 0, . . . , m (20)

from which the coefficients of the linear combination, c_j, follow.

2. By substituting the calculated coefficients, c_j, into expression (19), a pdf estimate is obtained, which can also take negative values. Since the probability density function must by definition be non-negative in the range [0, 1], the corresponding R-function [60] is used to remove the negative part of the function (or to control the lower limit of the pdf):

f_1 ∨_α f_2 = (1 / (1 + α)) (f_1 + f_2 + √(f_1^2 + f_2^2 − 2α f_1 f_2)) (21)

where α is an arbitrary function that satisfies the condition −1 < α ≤ 1. For α = 1, it follows that:

f_1 ∨ f_2 = (f_1 + f_2 + |f_1 − f_2|) / 2 = max(f_1, f_2)

In the case of maximum entropy, f_1(x) represents the probability density function and the function f_2(x) is the lower limit, which has a value of zero, so (21) becomes:

f(x) ← (f(x) + |f(x)|) / 2 = max(f(x), 0)

If f(x) obtained from (19) and (20) is already a non-negative function, then (21) has no effect on the pdf solution.

3. The function obtained according to expression (21) is normalized so that the pdf fulfills the basic probability condition (1). The moments of the pdf obtained in this way are calculated and compared with the given moments, and the error of the calculated moments is estimated. The value of the Shannon entropy is also calculated according to expression (3).
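Steps 2-3 are a pointwise clip followed by renormalization. A minimal sketch (Python, on a grid; the negative linear "pdf" is an arbitrary test function):

```python
import numpy as np

def integrate(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def clip_and_normalize(f, x):
    """R-function with alpha = 1: max(f, 0) = (f + |f|)/2, then rescale
    so that the result integrates to one (condition (1))."""
    g = 0.5 * (f + np.abs(f))
    return g / integrate(g, x)

x = np.linspace(0.0, 1.0, 1001)
f = 1.5 - 3.0 * x              # dips below zero on (0.5, 1]
pdf = clip_and_normalize(f, x)
```

If the linear-combination estimate is already non-negative, the clip leaves it unchanged and only the normalization acts.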

4. The procedure is repeated for a number of real parameters, ω, within the selected interval. The interval can be arbitrary and can contain positive and negative parameter values. When ω = 0, the EFup4 basis functions are identical to the Fup4 basis functions.

5. The pdf with the parameter ω that gives the highest value of the Shannon entropy according to expression (3) is selected.
In the limit, when m → ∞, the obtained function f(x) approaches the exact pdf because it satisfies all moments. For a finite number of moments, the procedure is approximate, but the possibility of choosing the parameter ω, which maximizes the functional (3) and thus adapts to the shape of the pdf, often allows for a very good approximation of the pdf. On the other hand, the form of the maximum entropy solution (4) shows that the pdf is a function of the exponential type; thus, the EFup4 basis functions are very good candidates for the linear approximation (19). The procedure is particularly suitable for greater numbers of moments, when Algorithm 1 can fail.
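Putting steps 1-5 together, Algorithm 2 is one linear solve per candidate ω plus an entropy comparison. The sketch below (Python) follows that structure but substitutes exponentially tilted hat functions for the EFup4 basis and trapezoidal quadrature for the paper's integration, so it is a structural illustration under those assumptions, not a reimplementation.

```python
import numpy as np

def integrate(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def hat(j, n, x):
    nodes = np.linspace(0.0, 1.0, n)
    h = nodes[1] - nodes[0]
    return np.clip(1.0 - np.abs(x - nodes[j]) / h, 0.0, None)

def entropy(f, x):
    p = np.clip(f, 1e-300, None)
    return -integrate(p * np.log(p), x)

def linear_maxent(mu, omega, x):
    """Steps 1-3: solve the linear moment system for the tilted basis,
    clip negatives (alpha = 1 R-function) and renormalize."""
    n = len(mu)
    B = np.array([np.exp(omega * x) * hat(j, n, x) for j in range(n)])
    A = np.array([[integrate(x**i * B[j], x) for j in range(n)] for i in range(n)])
    c = np.linalg.solve(A, mu)
    f = c @ B
    f = 0.5 * (f + np.abs(f))
    return f / integrate(f, x)

x = np.linspace(0.0, 1.0, 2001)
mu = np.array([2.0 / (i + 2) for i in range(6)])   # moments of f(x) = 2x, m = 5
omegas = np.linspace(-10.0, 10.0, 41)              # step 4: scan the tension parameter
best_w = max(omegas, key=lambda w: entropy(linear_maxent(mu, w, x), x))
best_pdf = linear_maxent(mu, best_w, x)            # step 5: entropy-maximizing candidate
```

By construction the selected candidate has entropy at least as large as the ω = 0 (untilted) solution, which is the essence of the tension-parameter selection.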

Examples
In this section, the approximation possibilities of the proposed algorithms and their computational efficiency are explored through various examples of pdfs in order to show the advantages and disadvantages of nonlinear classical Algorithm 1 and linear Algorithm 2.

Example 1-Impulse Function
The efficiency of Algorithm 1 is demonstrated on the example of a numerically demanding probability density function which, for the MaxEnt moment problem (1)-(2), represents a challenging numerical problem. It is an impulse function, very similar to Dirac's delta function. The peak of this function is at the coordinate x = 0.5, with a value of 144.4325. Since this function is symmetric, the parameter ω is zero, which means that Algorithm 1 and basis functions of the algebraic type Fup4(x) are used. It is interesting to show the convergence of the solution (Figure 5) that is obtained with only six basis functions, i.e., using the minimum number of moments for this algorithm, as the number of iterative nonlinear steps increases.

For 3·10^6 iterations, the achieved pdf shape practically coincides with the shape of the analytical function. Table 1 shows how the moment error calculated using Algorithm 1 decreases with the total number of iterations and how the value of the pdf at the point x = 0.5 increases. This example shows the accuracy and stability of numerical Algorithm 1, but also the deficiency of a nonlinear algorithm that requires a large number of iterations. With only six moments, an excellent solution is achieved. With the fast linear Algorithm 2, it is not possible to achieve a satisfactory solution for this symmetric impulse pdf, not even with a larger number of moments, due to, among other things, the oscillations that occur around the point x = 0.5. For example, the maximum pdf value obtained with 20 moments is 16.097, which is very far from the exact value.

Example 2-Bimodal Function
In the second example, we consider a bimodal function composed of two Gaussian distributions normalized to the domain [0, 1]. Figure 6a-c show approximations of the bimodal pdf obtained by Algorithm 1, and Figure 6d-f show approximations obtained using Algorithm 2 for m = 5, 13 and 20. Since this function is approximately symmetric, a value close to zero is obtained for the parameter ω, so that the numerical calculation is done using Fup4(x) basis functions.
Improving the pdf estimation by increasing the number of moments taken into account is monitored, besides its shape, through the achieved accuracy of the moments and through the values of the Shannon entropy functional (3). The exact value of the Shannon integral for the bimodal function is H(f) = −0.42420. Tables 2 and 3 show that, as the number of constraints on the required pdf increases, the absolute moment error decreases, while the approximation of the Shannon entropy functional (3) improves, for calculations using Algorithm 1 and Algorithm 2, respectively. Algorithm 1 satisfies the moments only approximately, due to solving a nonlinear system of equations with a selected accuracy criterion ("threshold"), as well as the approximation of monomials with Fup4 basis functions. On the other hand, the functional (3) is also calculated approximately and, with an increasing number of moments, converges relatively slowly to the exact value of H(f) = −0.42420. Algorithm 2 only solves a linear system that satisfies the given moments. With a small number of moments (m = 5) for this symmetric type of pdf, the solution is significantly worse than that of Algorithm 1, which is reflected not only in the visual impression but also in the accuracy of the moments and the value of the functional (3). Yet, when the number of moments reaches m = 13, very accurate results are obtained with a much simpler procedure.
The calculation time ("CPU time") depending on the number of moments taken, for both algorithms, is shown graphically in Figure 7. The computer time is given in seconds and is obtained using the classic programming language Fortran on a computer with an Intel i7 2.80 GHz processor. As the number of moments increases, both algorithms require a nonlinear increase in computation time. When the number of moments exceeds the limit of m = 10-12, a significant saving of computation time is obtained with Algorithm 2. As seen in Figure 7, in the log-log scale, an approximately linear dependence of the number of moments and CPU time is obtained.

Table 2. Absolute error of moments and values of functional (3) for Algorithm 1.

Number of Moments               | 6 (m = 5)   | 14 (m = 13) | 21 (m = 20)
Absolute moment error           | 0.33 × 10⁻⁶ | 0.36 × 10⁻⁷ | 0.47 × 10⁻⁹
Value of the Shannon functional | −0.39841    | −0.42283    | −0.42419

Table 3. Absolute error of moments and values of functional (3) for Algorithm 2.
From the presented results, it can be concluded that the classical nonlinear MaxEnt Algorithm 1 gives an approximate pdf shape with only six moments. By increasing the number of moments or the number of basis functions, the calculated pdf shape approaches the exact form of the bimodal function, the accuracy of the moments increases and the value of the Shannon entropy functional also approaches the exact value. It can be said that a satisfactory solution is achieved with 21 moments.
However, due to the large number of iterative steps required to solve the nonlinear system of equations, this procedure is computationally expensive and also numerically very sensitive. Algorithm 2 is a fast algorithm, is not numerically demanding, and the procedure is stable. The system of linear equations is solved directly and, as can be seen in Figure 7, the required computational time is significantly less than that of Algorithm 1. Note that the pdf estimation for a small number of moments is not sufficient; only with nine moments does the obtained pdf shape acquire the characteristics of the required solution that Algorithm 1 provides with only six moments.
However, for a larger number of moments (m ≥ 13), the accuracy of the solution increases significantly (see Table 3), both in terms of the pdf approximation approaching the exact function shape and in terms of the moment accuracy and the Shannon integral value (3). Using 21 moments, a pdf solution was achieved in which the absolute error of the moments was practically equal to zero and the value of H(f) coincided with the exact value. The advantages of both numerical algorithms for estimating the pdf from known moments can be combined by first obtaining the shape of the required function with six moments using the Fup MaxEnt Algorithm 1, from which the basic characteristics of the function can be deduced, and then using the fast Algorithm 2 to obtain a more accurate pdf estimate with a greater number of moments.

Example 3-Beta Function
The third example is the beta distribution, given in the form of Equation (26). When applying atomic basis functions of the exponential type EFup4(x, ω), the basic task is to determine the value of the parameter ω. Function (26) is asymmetric and inclined to the right, which means that the parameter ω > 0 (see Figure 4). Since the principle of maximum entropy requires a pdf that maximizes the Shannon integral (3), this condition is used as the criterion for selecting the parameter ω. The procedure reduces to the direct solution of the linear system (20) for different values of the parameter ω. Of all the numerical solutions thus obtained, the one that gives the maximum value of the Shannon entropy (3) is adopted.
The procedure for determining the parameter ω for pdf estimation by the principle of maximum entropy with 12 moments, using the basis functions EFup4(x, ω), is illustrated in Figure 8. The exact value of the integral (3) for the beta pdf (26) is H(f) = −1.29632, and the closest value is obtained with ω = 22.5.

Example 3-Beta Function
The third example of the beta distribution is given in the form: When applying atomic basis functions of exponential type EFup (x, ω), the basic task is to determine the value of the parameter ω. Function (26) is asymmetric and inclined to the right, which means that the parameter ω > 0 (see Figure 4). Since the principle of maximum entropy requires a pdf that maximizes the Shannon integral (3), this condition will be used as a criterion for selecting the parameter ω. The procedure is reduced to the simultaneous direct solution of the linear system (20) for different values of the parameter ω. Of all the numerical solutions thus obtained, the one that gives the maximum value of Shannon entropy (3) is adopted.
The procedure for determining the parameter ω for pdf estimation by the principle of maximum entropy with 12 moments using the basis functions EFup 4 (x, ω) is illustrated in Figure 8. The exact value of the integral (3) for the beta pdf (26) is H(f) = −1.29632, and the closest value is obtained with ω = 22.5. Figure 9 compares the numerical solutions obtained by applying the basis functions Fup 4 (x) and EFup 4 (x, ω) with Algorithms 1 and 2, respectively, for m = 5. The value of the parameter ω for Algorithm 2, determined by the previously described procedure for pdf estimation with six moments, is ω = 23.0.
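The ω-selection criterion described above can be sketched as a simple scan: solve the moment problem for each candidate ω, evaluate the Shannon integral (3) numerically, and keep the ω with the largest entropy. In the sketch below, `candidate_pdf` is a hypothetical stand-in family (a normalized truncated Gaussian) rather than the actual solution of linear system (20), which would be substituted in practice; the grid of ω values is likewise illustrative.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 2001)

def trapz(y):
    """Trapezoidal rule on the fixed grid x."""
    return float(np.sum((y[1:] + y[:-1]) / 2) * (x[1] - x[0]))

def candidate_pdf(omega):
    """Stand-in for the pdf the linear system (20) would return for a
    given tension omega: here just a normalized truncated Gaussian."""
    f = np.exp(-omega * (x - 0.5) ** 2)
    return f / trapz(f)

def shannon_entropy(f):
    """Shannon integral (3): H(f) = -∫ f ln f dx."""
    return -trapz(f * np.log(f))

omegas = [1.0, 5.0, 25.0]                     # illustrative candidate grid
entropies = [shannon_entropy(candidate_pdf(w)) for w in omegas]
best = omegas[int(np.argmax(entropies))]      # criterion: maximize H(f)
```

For this stand-in family the entropy decreases as ω sharpens the peak, so the scan returns the flattest candidate; with the real system (20) the maximizer lands at an interior ω such as the ω = 22.5 reported above.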
From Figure 9, we can see the advantage that exponential finite basis functions have over basis functions of the algebraic type when approximating such a strongly asymmetric pdf. The greater accuracy of the numerical solutions obtained using the basis functions EFup 4 (x, ω) can also be seen in Table 4, which gives the absolute errors of the approximate moments for calculations with both algorithms using algebraic and exponential Fup basis functions.
Table 4 also shows the convergence of the values of the Shannon entropy functional (3) obtained by increasing the number of moments starting from m = 5, using Algorithms 1 and 2 with the atomic basis functions Fup 4 (x) and EFup 4 (x, ω). In this example of the probability density function, the advantage of exponential finite functions over basis functions of the algebraic type is particularly evident. The functions EFup 4 (x, ω) achieve a significantly better convergence of the numerical values of the Shannon integral to the exact value H(f) = −1.29632. Since the exponential functions adapt better to the shape of the MaxEnt solution thanks to the parameter ω, they ensure the stability of the numerical procedure, which is especially important for the classical MaxEnt algorithm; in this example, the procedure exhibited extreme numerical sensitivity when the Fup 4 (x) basis functions were used.
For practical application, i.e., for estimating the pdf from known moments when the form of the function is unknown, the advantages of both described numerical algorithms can be used. First, using the nonlinear MaxEnt Algorithm 1 with the basis functions Fup 4 (x), which are easier to apply than EFup 4 (x, ω), we obtain the shape of the required function with six moments (m = 5), from which certain conclusions can be drawn about the basic characteristics of the pdf, such as its shape, peak positions, number of peaks and skewness, while the number of statistical moments required to accurately describe all the properties of the pdf can be estimated. If this first approximation yields an approximately symmetric function, the calculation can be continued with the basis functions Fup 4 (x).
If the first approximation gives an asymmetric function or a larger number of moments is required, then using the fast linear Algorithm 2, the parameter ω is determined by the criterion of maximizing the Shannon entropy functional (3), and the calculation continues with the exponential functions EFup 4 (x, ω). If it is a simple pdf for which fewer than six moments are enough, Algorithm 1 with Fup 2 [34] or Fup 1 basis functions can be used. In this way, using the accuracy and robustness of the classical nonlinear MaxEnt Algorithm 1 up to 10-12 moments, we can solve a large number of problems and diagnose whether more moments are needed, especially for skewed pdfs, for which the linear Algorithm 2 becomes much simpler and more efficient. With this hybrid technique, a large number of challenging problems can be solved much more easily than with either of the two presented algorithms individually.
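The branch logic of this hybrid strategy can be sketched as follows. The helper names (`skewness`, `choose_algorithm`), the skewness threshold, and the moment-count limit are illustrative assumptions; the 10-12 moment limit for the nonlinear algorithm follows the discussion above, and skewness is computed from the first three raw moments.

```python
def skewness(moments):
    """Skewness from raw moments: moments[k] = E[X^k], moments[0] = 1."""
    m1, m2, m3 = moments[1], moments[2], moments[3]
    var = m2 - m1 ** 2
    third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    return third_central / var ** 1.5

def choose_algorithm(moments, skew_tol=0.5, max_nonlinear_moments=12):
    """Hybrid strategy sketch: nonlinear Algorithm 1 (Fup) for roughly
    symmetric pdfs with few moments; linear Algorithm 2 (EFup) for
    skewed pdfs or larger numbers of moments."""
    n_moments = len(moments) - 1
    if abs(skewness(moments)) > skew_tol or n_moments > max_nonlinear_moments:
        return "Algorithm 2 (linear, EFup)"
    return "Algorithm 1 (nonlinear, Fup)"

uniform = [1, 1/2, 1/3, 1/4, 1/5, 1/6]   # U(0,1): E[X^k] = 1/(k+1), symmetric
exponential = [1, 1, 2, 6, 24, 120]      # Exp(1): E[X^k] = k!, strongly skewed
```

The symmetric uniform moments route to Algorithm 1, while the skewed exponential moments route to Algorithm 2, mirroring the decision rule in the text.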

Conclusions
In this paper, the application of numerical algorithms based on the maximum entropy principle using Fup basis functions, which belong to the class of atomic basis functions, has been shown. For demanding pdfs, the number of iterative steps in the nonlinear procedure becomes very large. The characteristics of the numerical solutions obtained with Algorithms 1 and 2 using algebraic finite basis functions are shown in the example of a demanding bimodal function. The classical nonlinear MaxEnt Algorithm 1 gives a good approximation of the pdf shape with only six moments. By increasing the number of moments or the number of basis functions, the approximate solution converges to the exact function. However, the numerical procedure is computationally expensive and sensitive to the computer's precision, so the optimization problem becomes poorly conditioned. The fast linear Algorithm 2 ensures complete stability of the procedure, but a higher number of moments is required to reach the desired accuracy of the pdf estimation.
The third example of a pdf is a beta function that corresponds to a polynomial of the 20th degree, which makes it numerically demanding to approximate. The nonlinear Algorithm 1 showed particular sensitivity for this pdf when algebraic finite basis functions were applied. In this example of a strongly asymmetric pdf, the advantage of exponential finite basis functions over functions of the algebraic type can be noted both in the stability of the classical nonlinear MaxEnt algorithm and in the achieved accuracy of the calculated pdf moments and of the calculated Shannon integral values as the basic criterion for pdf estimation by the principle of maximum entropy.
For practical applications, i.e., for pdf estimation from known moments when the shape of the function is unknown, we can use the advantages of both numerical algorithms described here, which we defined as a hybrid strategy in this article.

Acknowledgments: This research is partially supported through project KK.01.1.1.02.0027, a project co-financed by the Croatian Government and the European Union through the European Regional Development Fund-the Competitiveness and Cohesion Operational Programme.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Fup n and EFup n Basis Functions
Appendix A.1. Calculation and Approximation Properties of the Fup n Basis Functions
Atomic basis functions are infinitely differentiable functions with compact support [35,56]. Atomic functions are defined as solutions of differential functional equations of the following type:

Lf(x) = λ Σ_k C k f(ax − b k) (A1)

where L is a linear differential operator with constant coefficients, λ is a nonzero scalar, C k are the coefficients of the linear combination, a > 1 is a parameter that defines the length of the compact support, and b k are coefficients that determine the displacements of the basis functions. Rvačev and Rvačev, in their pioneering work [35], called these basis functions "atomic" because they span the vector spaces of all three fundamental classes of functions in mathematics: algebraic, exponential and trigonometric polynomials.

The simplest and most studied of the atomic basis functions is the function up(x). The function up(x) is a smooth function with compact support [−1, 1], obtained as the solution of the differential functional equation:

up′(x) = 2 up(2x + 1) − 2 up(2x − 1) (A2)

The function up(x) can be expressed as an inverse Fourier transform:

up(x) = (1/(2π)) ∫ e^(itx) ∏_(j=1)^∞ [sin(t/2^j)/(t/2^j)] dt (A3)

Since (A3) represents an exact but mathematically intractable expression, a more numerically adequate expression (A4) for calculating the function up(x) is provided in [36,56], where the coefficients C jk are rational numbers determined according to expression (A5). The calculation of up(−1 + 2^−r), r = 0, 1, 2, . . ., at binary-rational points, as well as all details regarding the calculation of the values of the function up(x), is provided in [56,61]. The argument (x − 0.p 1 . . . p k) in (A4) is the difference between the real value of the coordinate x and its binary form in k bits, where p 1 . . . p k are the digits, 0 or 1, of the binary representation of the coordinate x. Therefore, the accuracy of the computation of the coordinate x, and thus the accuracy of the function up(x) at an arbitrary point, depends on machine accuracy.
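Although (A3) is intractable in closed form, it can be evaluated by numerical quadrature once the infinite product and the integration interval are truncated. The sketch below does exactly that; the truncation parameters (`terms`, `t_max`, `n_t`) are illustrative assumptions, chosen so that the characteristic function, which decays faster than any polynomial, is negligible outside the retained interval.

```python
import numpy as np

def up(x, terms=40, t_max=60.0, n_t=12001):
    """Evaluate up(x) from its inverse Fourier transform (A3):
    up(x) = (1/2π) ∫ e^{itx} ∏_j sin(t/2^j)/(t/2^j) dt,
    truncating the product at `terms` factors and the integral
    at |t| <= t_max, with trapezoidal quadrature on n_t nodes."""
    t = np.linspace(-t_max, t_max, n_t)
    F = np.ones_like(t)
    for j in range(1, terms + 1):
        F *= np.sinc(t / (2 ** j * np.pi))   # np.sinc(u) = sin(πu)/(πu)
    integrand = np.cos(t * x) * F            # F is even and real
    dt = t[1] - t[0]
    # trapezoidal rule, then the 1/(2π) normalization
    return float(np.sum(integrand[1:] + integrand[:-1]) * dt / (4 * np.pi))
```

As a sanity check, up(0) = 1 and up(±1) = 0 (the support endpoints), and integer displacements of up form a partition of unity, so up(0.5) + up(−0.5) = 1; the quadrature reproduces these values to a few decimal places.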
The Fup n (x) function satisfies a differential-functional equation of the same type as (A1), in which C k n+1 and C k−1 n+1 are binomial coefficients and n is the Fup order. The index n also denotes the highest degree of the polynomial that can be expressed exactly as a linear combination of n + 2 Fup n (x) basis functions, uniformly displaced by the characteristic interval 2^−n.
A linear combination of displaced Fup n (x) functions is a polynomial of degree n if the coefficients D k are polynomials of degree n in the index k. The coefficients D k are determined using a collocation procedure [36].
Therefore, a polynomial of degree n can be exactly expressed on an interval of length 2^−n as a linear combination of basis functions obtained by displacing the function Fup n (x) over the characteristic interval ∆x n = 2^−n.
A linear combination of displaced Fup 4 (x) basis functions on the interval [0, 1] describes exactly the monomials 1, x, x^2, x^3 and x^4. Higher-order monomials are expressed only approximately. The differences between the monomials and their Fup 4 (x) approximations are defined by the residual functions ε i (x) (see Section 3.1).
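This reproduction property can be checked numerically in the simplest case n = 0, where the basis function is the mother function up(x) itself and its integer displacements reproduce the constant 1 exactly (the partition-of-unity property, i.e., degree-0 reproduction). The sketch below reuses a quadrature evaluation of up(x) from (A3); the truncation parameters are illustrative assumptions.

```python
import numpy as np

def up(x, terms=40, t_max=60.0, n_t=12001):
    """up(x) via numerical inverse Fourier transform (A3) with the
    infinite product and integration interval truncated."""
    t = np.linspace(-t_max, t_max, n_t)
    F = np.ones_like(t)
    for j in range(1, terms + 1):
        F *= np.sinc(t / (2 ** j * np.pi))   # sin(u)/u factors
    integrand = np.cos(t * x) * F
    dt = t[1] - t[0]
    return float(np.sum(integrand[1:] + integrand[:-1]) * dt / (4 * np.pi))

def constant_reproduction(x):
    """Degree-0 reproduction: sum of integer displacements of up(x).
    For x in [-1, 1], shifts k = -2..2 cover the compact supports."""
    return sum(up(x - k) for k in range(-2, 3))
```

Evaluating `constant_reproduction` at any point of [−1, 1] returns 1 up to quadrature error, which is the n = 0 instance of the polynomial reproduction stated above.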

Appendix A.2. Calculation and Approximation Properties of the EFup n Basis Functions
The EFup n (x, ω) functions are atomic finite functions of class C ∞ with compact support [35] and are the elements of the linear vector space EUP n . The simplest function from the class EFup n (x, ω) is the function Eup(x, ω) for n = 0 [57], which, analogous to the algebraic ABF [56], is called the maternal function. The functions EFup n (x, ω) retain all the properties of 'their' maternal basis function, Eup(x, ω) [57]. The index n denotes the largest degree of an exponential polynomial that can be represented exactly in the form of a linear combination of shifted EFup n (x, ω) functions on a segment of length ∆x n = 2 −n . The parameter ω gives them an additional property of 'adaptability' with respect to the ABF of the algebraic type (see Figure 4).
The function EFup n (x, ω) can be expressed as an inverse Fourier transform, satisfying the Paley-Wiener normalization condition:

EFup n (x, ω) = (1/(2π)) ∫ e^(itx) F n (t) dt (A13)

By expanding expression (A13) in a Fourier series, the function EFup n (x, ω) can be determined at arbitrary points. In practice, however, the best way of constructing the functions EFup n (x, ω) is as a linear combination of mutually shifted Eup(x, ω) basis functions:

EFup n (x, ω) = Σ_(k=0)^∞ C k (n) · Eup(x − 1 − k/2^n + (n + 2)/2^(n+1), ω) (A14)