Article

Parameter Estimation of the Dirichlet Distribution Based on Entropy

by Büşra Şahin 1,†, Atıf Ahmet Evren 2,†, Elif Tuna 2,†, Zehra Zeynep Şahinbaşoğlu 2,*,† and Erhan Ustaoğlu 3,†

1 Department of Computer Engineering, Faculty of Engineering, Halic University, Eyupsultan, 34060 Istanbul, Turkey
2 Department of Statistics, Faculty of Sciences and Literature, Yildiz Technical University, Davutpasa, Esenler, 34210 Istanbul, Turkey
3 Department of Informatics, Faculty of Management, Marmara University, Göztepe, 34180 Istanbul, Turkey
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Axioms 2023, 12(10), 947; https://doi.org/10.3390/axioms12100947
Submission received: 31 July 2023 / Revised: 28 September 2023 / Accepted: 30 September 2023 / Published: 5 October 2023

Abstract

The Dirichlet distribution, as a multivariate generalization of the beta distribution, is especially important for modeling categorical distributions. Its applications range from modeling the cell probabilities of contingency tables to modeling income inequalities, and it is commonly used as the conjugate prior of the multinomial distribution in Bayesian statistics. In this study, the parameters of a bivariate Dirichlet distribution are estimated by entropy formalism. As an alternative to maximum likelihood and the method of moments, two methods based on the principle of maximum entropy are used, namely the ordinary entropy method and the parameter space expansion method. It is shown that, in estimating the parameters of the bivariate Dirichlet distribution, the ordinary entropy method and the parameter space expansion method give the same results as the method of maximum likelihood. Thus, we emphasize that these two methods can be used as alternatives in modeling bivariate and multivariate Dirichlet distributions.

1. Introduction

In statistics, the method of moments and maximum likelihood are used frequently, details of which can be found in [1,2]. For a long time, their asymptotic properties have been studied in detail [3]. Since the asymptotic distributions of estimators found by these two methods are normal, they have been proven to be very powerful tools for parameter estimation. However, nowadays, alternative estimation methods based on entropy maximization are applied increasingly frequently.
In 1948, Shannon [4] defined entropy as a numerical measure of uncertainty, or conversely of the information content, associated with a probability distribution $f(x;\theta)$ with parameter $\theta$. It is used to describe a random variable $X$ and is mathematically expressed as

$$ I[f] = -\int f(x;\theta)\,\ln f(x;\theta)\,dx, \qquad \int f(x;\theta)\,dx = 1 $$  (1)

for continuous $X$, where $I[f]$ can be considered the mean value of $-\ln f(x;\theta)$. For discrete probability distributions, the integration operator in (1) is simply replaced by the summation operator. Rényi (1961) provided a generalization of Shannon entropy [5]. Rényi entropy is also called $\alpha$-class entropy. For the discrete case, it is defined as

$$ H_R = \frac{\ln\left(\sum_{i=1}^{K} p_i^{\alpha}\right)}{1-\alpha} \quad \text{for } \alpha > 0 \text{ and } \alpha \neq 1 $$  (2)

By L'Hôpital's rule,

$$ \lim_{\alpha \to 1} \frac{\frac{d}{d\alpha}\ln\left(\sum_{i=1}^{K} p_i^{\alpha}\right)}{\frac{d}{d\alpha}(1-\alpha)} = \lim_{\alpha \to 1} \frac{\sum_{i=1}^{K} p_i^{\alpha}\ln p_i}{-\sum_{i=1}^{K} p_i^{\alpha}} = -\sum_{i=1}^{K} p_i \ln p_i = H_S $$  (3)

Therefore, Shannon entropy can be viewed as a special case of Rényi entropy. Another generalization of Shannon entropy was introduced by Constantino Tsallis (1988) [5]. Tsallis entropy is also known as $\beta$-class entropy [6]. It is defined as

$$ H_T = \frac{1 - \sum_{i=1}^{K} p_i^{\alpha}}{\alpha - 1} \quad \text{for } \alpha > 0 \text{ and } \alpha \neq 1 $$  (4)

By L'Hôpital's rule,

$$ \lim_{\alpha \to 1} \frac{1 - \sum_{i=1}^{K} p_i^{\alpha}}{\alpha - 1} = \lim_{\alpha \to 1} \frac{\frac{d}{d\alpha}\left(1 - \sum_{i=1}^{K} p_i^{\alpha}\right)}{\frac{d}{d\alpha}(\alpha - 1)} = \lim_{\alpha \to 1} \frac{-\sum_{i=1}^{K} p_i^{\alpha}\ln p_i}{1} = -\sum_{i=1}^{K} p_i \ln p_i $$  (5)

In other words, Tsallis entropy approaches Shannon entropy as $\alpha \to 1$, just as Rényi entropy does. Note that for continuous distributions, the summation signs in the defining equations are replaced by integration signs.
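As a quick numerical illustration (not part of the original derivation; the probability vector is an arbitrary example), the following Python sketch computes the Shannon, Rényi, and Tsallis entropies of a discrete distribution and shows that the two generalized entropies approach the Shannon value as $\alpha \to 1$.

```python
import numpy as np

def shannon_entropy(p):
    """H_S = -sum p_i ln p_i (Equation (3))."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def renyi_entropy(p, alpha):
    """H_R = ln(sum p_i^alpha) / (1 - alpha), alpha > 0, alpha != 1 (Equation (2))."""
    p = np.asarray(p, dtype=float)
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def tsallis_entropy(p, alpha):
    """H_T = (1 - sum p_i^alpha) / (alpha - 1), alpha > 0, alpha != 1 (Equation (4))."""
    p = np.asarray(p, dtype=float)
    return (1.0 - np.sum(p ** alpha)) / (alpha - 1.0)

p = [0.1, 0.2, 0.3, 0.4]          # arbitrary example distribution
for alpha in (0.5, 0.9, 0.99, 1.01, 2.0):
    print(alpha, renyi_entropy(p, alpha), tsallis_entropy(p, alpha))
print("Shannon:", shannon_entropy(p))  # both generalized entropies tend to this value as alpha -> 1
```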
Kullback (1959) used entropy and relative entropy as the two key concepts in multivariate statistical analysis [7]. Asymptotic distributions of various entropy measures can be found in [8]. Pardo emphasizes that entropy and relative entropy formulas can be derived as special cases of divergence measures [9].
Entropy-Based Parameter Estimation in Hydrology is the first book to focus on parameter estimation using entropy for a number of distributions frequently used in hydrology [10], including uniform, exponential, normal, two-parameter lognormal, extreme value type I, Weibull, gamma, Pearson, and two-parameter Pareto distributions, among others. Singh also applies entropy theory to some problems of hydraulic and environmental engineering [11,12,13].
The principle of maximum entropy (POME), described by Jaynes as “the least biased estimate possible on the given information”, can be stated mathematically as follows [14]: Given $m$ linearly independent constraints $C_i$ in the form

$$ C_i = \int_a^b y_i(x)\, f(x)\,dx, \qquad i = 1, 2, \ldots, m, $$  (6)

where the $y_i(x)$ are some functions whose averages over $f(x)$ are specified, the maximum of $I$, subject to the conditions in Equation (6), is given by the distribution

$$ f(x) = \exp\left(-\lambda_0 - \sum_{i=1}^{m} \lambda_i y_i(x)\right), $$  (7)

where $\lambda_i$, $i = 0, 1, \ldots, m$, are Lagrange multipliers and can be determined from Equations (6) and (7) along with the normalization condition in Equation (1).
The general procedure for entropy-based parameter estimation involves (1) defining given information in terms of constraints, (2) maximizing entropy subject to given information, and (3) relating parameters to the given information. In this procedure, Lagrange multipliers are related to the constraints on one hand and to the distribution parameters on the other. One can eliminate the Lagrange multipliers and obtain parameter estimations as well.
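As an illustration of this procedure (a minimal sketch, not taken from the paper), the following Python code maximizes Shannon entropy over a finite support subject to a single mean constraint $E[X] = \mu$. The maximum-entropy solution has the exponential form $p_i \propto \exp(-\lambda_1 x_i)$ of Equation (7), and the Lagrange multiplier $\lambda_1$ is obtained numerically from the constraint; the support and target mean are arbitrary choices for the example.

```python
import numpy as np
from scipy.optimize import brentq

def maxent_given_mean(support, mu):
    """Maximum-entropy pmf on `support` subject to E[X] = mu (POME with one constraint).

    By Equation (7) the solution has the form p_i = exp(-lam0 - lam1 * x_i); lam0 is
    fixed by normalization, so only lam1 must be solved from the mean constraint."""
    x = np.asarray(support, dtype=float)

    def pmf(lam1):
        logw = -lam1 * x
        logw -= logw.max()                 # stabilize before exponentiating
        w = np.exp(logw)
        return w / w.sum()                 # normalization plays the role of exp(-lam0)

    # Solve E_p[X] - mu = 0 for lam1 by bracketing the root.
    lam1 = brentq(lambda l: pmf(l) @ x - mu, -50.0, 50.0)
    return pmf(lam1), lam1

p, lam1 = maxent_given_mean(support=range(11), mu=3.0)
print(lam1, p, p @ np.arange(11))          # the fitted pmf has mean 3, as required
```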
The parameter space expansion method was developed by Singh and Rajagopal (1986). This method differs from the ordinary entropy method in that it employs an enlarged parameter space and maximizes entropy with respect to both the parameters and the Lagrange multipliers [15]. The method works as follows: for the given distribution, the constraints are first defined, and the POME formulation is obtained in terms of the parameters to be estimated and the Lagrange multipliers. After the maximization procedure, the parameter estimates are obtained.
Entropy-based models have been used intensively for parameter estimation in recent years. For example, Song et al. examined two entropy-based methods, both using the POME, for the estimation of the parameters of the four-parameter exponential gamma distribution [16]. Hao and Singh applied two entropy-based methods, also using the POME, for the estimation of the parameters of the extended Burr XII distribution [17]. Singh and Deng revisited the four-parameter kappa distribution, presented an entropy-based method for estimating its parameters, and compared its performance with that of maximum likelihood estimation, the method of moments, and L-moments [18]. Gao and Han used the maximum entropy method to obtain a concrete solution to a special nonlinear expectation problem in a special parameter space and analyzed the convergence of the maximum entropy solution [19].
The objective of the present paper is to apply ordinary entropy and parameter space expansion to estimate the parameters of a bivariate Dirichlet distribution as an alternative to the known methods, and then to compare them with those estimated by the maximum likelihood method and method of moments.

2. Dirichlet Distribution

The beta distribution plays an important role in Bayesian statistics, especially in modeling the parameters of the Bernoulli distribution [20]. The Dirichlet distribution is a multivariate generalization of the beta distribution. Thus, the Dirichlet distribution and the generalized Dirichlet distribution can both be used as a conjugate prior for a multinomial distribution [21].
Let $\mathbf{X}_k = [X_1, X_2, \ldots, X_k]$ be a vector with $k$ components, where $X_i \geq 0$ for $i = 1, 2, \ldots, k$ and $\sum_{i=1}^{k} x_i = 1$. Also, let $\mathbf{a}_k = [a_1, a_2, \ldots, a_k]$, where $a_i > 0$ for each $i$. The probability density function (pdf) of the Dirichlet distribution is given as

$$ f(\mathbf{x}_k) = \frac{\Gamma(a_0)}{\prod_{i=1}^{k}\Gamma(a_i)} \prod_{i=1}^{k} x_i^{a_i - 1}, $$  (8)

where $a_0 = \sum_{i=1}^{k} a_i$, $x_i > 0$, $x_1 + x_2 + \cdots + x_{k-1} < 1$, and $x_k = 1 - x_1 - \cdots - x_{k-1}$, and $\Gamma$ is Euler's gamma function, defined by $\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t}\,dt$ (so that $\Gamma(x) = (x-1)!$ for positive integers $x$).
It can be noted that the marginals of this Dirichlet distribution are beta distributions [22], namely $X_i \sim \mathrm{Beta}\!\left(a_i,\ \left(\sum_{j=1}^{k} a_j\right) - a_i\right)$. The moments are given by

$$ E[X_i] = \frac{a_i}{a_0} $$  (9)

$$ \mathrm{Var}[X_i] = \frac{a_i (a_0 - a_i)}{a_0^2 (a_0 + 1)} $$  (10)

$$ \mathrm{Cov}(X_i, X_j) = \frac{-a_i a_j}{a_0^2 (a_0 + 1)} $$  (11)

$$ \mathrm{Cor}(X_i, X_j) = -\sqrt{\frac{a_i a_j}{(a_0 - a_i)(a_0 - a_j)}} $$  (12)
For further properties, one may refer to [22,23,24].
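The following short Python check (not from the paper; the parameter values are arbitrary) compares the closed-form moments in Equations (9)–(12) with Monte Carlo estimates from NumPy's Dirichlet sampler.

```python
import numpy as np

a = np.array([3.0, 2.0, 4.0])                      # arbitrary example parameters
a0 = a.sum()

# Closed-form moments, Equations (9)-(12)
mean = a / a0
var = a * (a0 - a) / (a0**2 * (a0 + 1))
cov01 = -a[0] * a[1] / (a0**2 * (a0 + 1))
cor01 = -np.sqrt(a[0] * a[1] / ((a0 - a[0]) * (a0 - a[1])))

# Monte Carlo estimates
rng = np.random.default_rng(0)
x = rng.dirichlet(a, size=200_000)
print(mean, x.mean(axis=0))                        # E[X_i]
print(var, x.var(axis=0))                          # Var[X_i]
print(cov01, np.cov(x[:, 0], x[:, 1])[0, 1])       # Cov(X_1, X_2)
print(cor01, np.corrcoef(x[:, 0], x[:, 1])[0, 1])  # Cor(X_1, X_2)
```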

3. Ordinary Entropy Method

In the ordinary entropy method, there are three steps in parameter estimation: (1) specification of appropriate constraints, (2) derivation of the entropy function of the distribution, and (3) derivation of the relations between parameters and constraints.

3.1. Specification of Constraints

Taking the natural logarithm of Equation (8), we obtain
$$ \ln f(\mathbf{x}_k) = \ln \Gamma(a_0) - \ln \prod_{i=1}^{k}\Gamma(a_i) + \sum_{i=1}^{k} \ln\left(x_i^{a_i-1}\right) $$  (13)

Multiplying Equation (13) by $[-f(\mathbf{x}_k)]$ and integrating between $[0,1]$ and $[0, 1-x_i]$, we obtain the entropy function

$$ I[f] = -\int \cdots \int f(\mathbf{x}_k)\ln f(\mathbf{x}_k)\,dx_1 \cdots dx_{k-1} = -\ln\frac{\Gamma(a_0)}{\prod_{i=1}^{k}\Gamma(a_i)}\int \cdots \int f(\mathbf{x}_k)\,dx_1 \cdots dx_{k-1} - \sum_{i=1}^{k}\int \cdots \int \ln\left(x_i^{a_i-1}\right) f(\mathbf{x}_k)\,dx_1 \cdots dx_{k-1} $$  (14)

To maximize $I[f]$ in Equation (14), the following constraints should be satisfied:

$$ \int \cdots \int f(\mathbf{x}_k)\,dx_1 \cdots dx_{k-1} = 1 $$  (15)

$$ \int \cdots \int \ln x_i \, f(\mathbf{x}_k)\,dx_1 \cdots dx_{k-1} = E[\ln x_i], \qquad i = 1, \ldots, k-1 $$  (16)

$$ \int \cdots \int \ln(1 - x_1 - \cdots - x_{k-1})\, f(\mathbf{x}_k)\,dx_1 \cdots dx_{k-1} = E[\ln(1 - x_1 - \cdots - x_{k-1})] $$  (17)

3.2. Construction of the Partition Function and Zeroth Lagrange Multiplier

The least biased pdf $f(\mathbf{x}_k)$ consistent with Equations (15)–(17), obtained by POME, takes the following form:

$$ f(\mathbf{x}_k) = \exp\left[-\lambda_0 - \sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right], $$  (18)

where $\lambda_0, \lambda_1, \ldots, \lambda_k$ are Lagrange multipliers. Substituting (18) in (15) yields

$$ \int \cdots \int \exp\left[-\lambda_0 - \sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1} = 1 $$  (19)

Equation (19) gives the partition function as

$$ \exp(\lambda_0) = \int \cdots \int \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1}, $$  (20)

which may be further simplified as follows:

$$ \exp(\lambda_0) = \int \cdots \int \prod_{i=1}^{k-1} x_i^{-\lambda_i}\,(1 - x_1 - \cdots - x_{k-1})^{-\lambda_k}\, dx_1 \cdots dx_{k-1} = \frac{\prod_{i=1}^{k}\Gamma(1-\lambda_i)}{\Gamma(k - \lambda_1 - \cdots - \lambda_k)} $$  (21)

The zeroth Lagrange multiplier $\lambda_0$ is obtained from Equation (21) as

$$ \lambda_0 = \sum_{i=1}^{k}\ln\Gamma(1-\lambda_i) - \ln\Gamma(k - \lambda_1 - \cdots - \lambda_k) $$  (22)

The zeroth Lagrange multiplier is also obtained from (20) as

$$ \lambda_0 = \ln \int \cdots \int \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1} $$  (23)

3.3. Relation between Lagrange Multipliers and Constraints

Differentiating Equation (23) with respect to $\lambda_1, \ldots, \lambda_k$, we obtain the derivatives of $\lambda_0$ with respect to $\lambda_1, \ldots, \lambda_k$:

$$ \frac{\partial \lambda_0}{\partial \lambda_1} = \frac{-\int \cdots \int \ln x_1 \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1}}{\int \cdots \int \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1}} = -\int \cdots \int \ln x_1 \exp\left[-\lambda_0 - \sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1} = -E[\ln X_1] $$  (24)

$$ \frac{\partial \lambda_0}{\partial \lambda_2} = \frac{-\int \cdots \int \ln x_2 \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1}}{\int \cdots \int \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1}} = -\int \cdots \int \ln x_2 \exp\left[-\lambda_0 - \sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1} = -E[\ln X_2] $$  (25)

Continuing in this manner up to the $(k-1)$th component,

$$ \frac{\partial \lambda_0}{\partial \lambda_{k-1}} = \frac{-\int \cdots \int \ln x_{k-1} \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1}}{\int \cdots \int \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1}} = -\int \cdots \int \ln x_{k-1} \exp\left[-\lambda_0 - \sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})\right] dx_1 \cdots dx_{k-1} = -E[\ln X_{k-1}] $$  (26)

Furthermore,

$$ \frac{\partial \lambda_0}{\partial \lambda_k} = \frac{-\int \cdots \int \ln\left(1 - \sum_{i=1}^{k-1} x_i\right) \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln\left(1 - \sum_{i=1}^{k-1} x_i\right)\right] dx_1 \cdots dx_{k-1}}{\int \cdots \int \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln\left(1 - \sum_{i=1}^{k-1} x_i\right)\right] dx_1 \cdots dx_{k-1}} = -\int \cdots \int \ln(1 - x_1 - \cdots - x_{k-1}) \exp\left[-\lambda_0 - \sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln\left(1 - \sum_{i=1}^{k-1} x_i\right)\right] dx_1 \cdots dx_{k-1} = -E[\ln(1 - x_1 - \cdots - x_{k-1})] $$  (27)
Differentiating Equation (22) with respect to $\lambda_1, \lambda_2, \ldots, \lambda_k$, we obtain

$$ \frac{\partial \lambda_0}{\partial \lambda_1} = \frac{\partial}{\partial \lambda_1}\left(\sum_{i=1}^{k}\ln\Gamma(1-\lambda_i) - \ln\Gamma(k - \lambda_1 - \cdots - \lambda_k)\right) = -\psi(1-\lambda_1) + \psi(k - \lambda_1 - \cdots - \lambda_k) $$  (28)

$$ \frac{\partial \lambda_0}{\partial \lambda_2} = \frac{\partial}{\partial \lambda_2}\left(\sum_{i=1}^{k}\ln\Gamma(1-\lambda_i) - \ln\Gamma(k - \lambda_1 - \cdots - \lambda_k)\right) = -\psi(1-\lambda_2) + \psi(k - \lambda_1 - \cdots - \lambda_k) $$  (29)

Similarly, for the $k$th term,

$$ \frac{\partial \lambda_0}{\partial \lambda_k} = \frac{\partial}{\partial \lambda_k}\left(\sum_{i=1}^{k}\ln\Gamma(1-\lambda_i) - \ln\Gamma(k - \lambda_1 - \cdots - \lambda_k)\right) = -\psi(1-\lambda_k) + \psi(k - \lambda_1 - \cdots - \lambda_k), $$  (30)

where $\psi(x)$ is the digamma function, defined as $\psi(x) = \frac{d}{dx}\ln\Gamma(x)$ [25].

By equating (24) and (28), we obtain

$$ E[\ln X_1] = \psi(1-\lambda_1) - \psi(k - \lambda_1 - \cdots - \lambda_k) $$  (31)

Secondly, by equating (25) and (29), we obtain

$$ E[\ln X_2] = \psi(1-\lambda_2) - \psi(k - \lambda_1 - \cdots - \lambda_k) $$  (32)

Continuing until the $(k-1)$th term,

$$ E[\ln X_{k-1}] = \psi(1-\lambda_{k-1}) - \psi(k - \lambda_1 - \cdots - \lambda_k) $$  (33)

Next, by equating (27) and (30), we obtain

$$ E[\ln(1 - x_1 - \cdots - x_{k-1})] = \psi(1-\lambda_k) - \psi(k - \lambda_1 - \cdots - \lambda_k) $$  (34)

3.4. Relation between Lagrange Multipliers and Parameters

Substituting (22) into (18) yields

$$ f(\mathbf{x}_k) = \exp\left[-\sum_{i=1}^{k}\ln\Gamma(1-\lambda_i) + \ln\Gamma(k - \lambda_1 - \cdots - \lambda_k) - \sum_{i=1}^{k-1}\lambda_i \ln x_i - \lambda_k \ln\left(1 - \sum_{i=1}^{k-1} x_i\right)\right] = \frac{\Gamma(k - \lambda_1 - \cdots - \lambda_k)}{\prod_{i=1}^{k}\Gamma(1-\lambda_i)} \prod_{i=1}^{k} x_i^{-\lambda_i} $$  (35)

A comparison of Equation (35) with Equation (8) shows that

$$ a_i = 1 - \lambda_i $$  (36)

3.5. Relation between Parameters and Constraints

The parameters of the Dirichlet distribution are related to the Lagrange multipliers, which in turn are related to the known constraints through Equations (31)–(34). By eliminating the Lagrange multipliers from these sets of equations, the parameters can be related directly to the constraints, as shown below:

$$ E[\ln X_1] = \psi(a_1) - \psi(a_0) $$  (37)

$$ E[\ln X_2] = \psi(a_2) - \psi(a_0) $$  (38)

$$ E[\ln X_{k-1}] = \psi(a_{k-1}) - \psi(a_0) $$  (39)

$$ E[\ln(1 - x_1 - \cdots - x_{k-1})] = \psi(a_k) - \psi(a_0) $$  (40)
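A quick numerical check of Equations (37)–(40) (an illustrative sketch only; the parameters are arbitrary): the sample means of $\ln X_i$ from simulated Dirichlet data should match $\psi(a_i) - \psi(a_0)$.

```python
import numpy as np
from scipy.special import digamma

a = np.array([3.0, 2.0, 4.0])                 # arbitrary parameters, a_0 = 9
a0 = a.sum()

rng = np.random.default_rng(1)
x = rng.dirichlet(a, size=200_000)            # columns: X_1, ..., X_{k-1}, X_k = 1 - X_1 - ... - X_{k-1}

lhs = np.log(x).mean(axis=0)                  # E[ln X_i], the last entry is E[ln(1 - X_1 - ... - X_{k-1})]
rhs = digamma(a) - digamma(a0)                # psi(a_i) - psi(a_0)
print(lhs)
print(rhs)                                    # the two vectors agree up to Monte Carlo error
```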

3.6. Distribution Entropy

From (14),

$$ I[f] = -\int \cdots \int f(\mathbf{x}_k)\ln f(\mathbf{x}_k)\,dx_1 \cdots dx_{k-1} = -\ln\frac{\Gamma(a_0)}{\prod_{i=1}^{k}\Gamma(a_i)} - \sum_{i=1}^{k}\int \cdots \int \ln\left(x_i^{a_i-1}\right) f(\mathbf{x}_k)\,dx_1 \cdots dx_{k-1} = \ln\frac{\prod_{i=1}^{k}\Gamma(a_i)}{\Gamma(a_0)} - \sum_{i=1}^{k}(a_i - 1)\,E[\ln x_i] $$  (41)
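A numerical sanity check on Equation (41) (illustrative only; the parameter vector is arbitrary): substituting $E[\ln X_i] = \psi(a_i) - \psi(a_0)$ from Equations (37)–(40) gives the differential entropy of the Dirichlet distribution, which can be compared against SciPy's built-in value.

```python
import numpy as np
from scipy.special import gammaln, digamma
from scipy.stats import dirichlet

a = np.array([3.0, 2.0, 4.0])                 # arbitrary parameters
a0 = a.sum()

# Equation (41) with E[ln X_i] = psi(a_i) - psi(a_0)
entropy = (gammaln(a).sum() - gammaln(a0)) - np.sum((a - 1) * (digamma(a) - digamma(a0)))

print(entropy, dirichlet(a).entropy())        # the two values coincide
```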

4. Parameter Space Expansion Method

4.1. Specification of Constraints

Following [15], the constraints for this method are Equation (15) and
$$ \int \cdots \int \ln x_i^{a_i - 1}\, f(\mathbf{x}_k)\,dx_1 \cdots dx_{k-1} = E\left[\ln X_i^{a_i-1}\right], \qquad i = 1, \ldots, k-1 $$  (42)

$$ \int \cdots \int \ln(1 - x_1 - \cdots - x_{k-1})^{a_k - 1}\, f(\mathbf{x}_k)\,dx_1 \cdots dx_{k-1} = E\left[\ln(1 - x_1 - \cdots - x_{k-1})^{a_k-1}\right] $$  (43)

4.2. Derivation of the Entropy Function

The pdf that corresponds to POME and that is consistent with Equations (15), (42), and (43) takes the form

$$ f(\mathbf{x}_k) = \exp\left[-\lambda_0 - \sum_{i=1}^{k-1}\lambda_i \ln x_i^{a_i-1} - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})^{a_k-1}\right], $$  (44)

where $\lambda_0, \lambda_1, \ldots, \lambda_k$ are Lagrange multipliers. Substituting (44) into Equation (15) yields

$$ \exp(\lambda_0) = \int \cdots \int \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i^{a_i-1} - \lambda_k \ln(1 - x_1 - \cdots - x_{k-1})^{a_k-1}\right] dx_1 \cdots dx_{k-1} = \frac{\prod_{i=1}^{k}\Gamma(1-\lambda_i(a_i-1))}{\Gamma(k - \lambda_1(a_1-1) - \cdots - \lambda_k(a_k-1))} $$  (45)
Substitution of Equation (45) into (44) gives
$$ f(\mathbf{x}_k) = \frac{\Gamma(k - \lambda_1(a_1-1) - \cdots - \lambda_k(a_k-1))}{\prod_{i=1}^{k}\Gamma(1-\lambda_i(a_i-1))}\, \exp\left[-\sum_{i=1}^{k-1}\lambda_i \ln x_i^{a_i-1} - \lambda_k \ln\left(1 - \sum_{i=1}^{k-1} x_i\right)^{a_k-1}\right] $$  (46)

A comparison of Equation (46) with Equation (8) shows that $\lambda_1 = \cdots = \lambda_k = -1$. Taking the logarithm of (46), multiplying by $[-f(\mathbf{x}_k)]$, and integrating between $[0,1]$ and $[0, 1-x_i]$, we obtain the entropy function

$$ I[f] = -\ln\Gamma(k - \lambda_1(a_1-1) - \cdots - \lambda_k(a_k-1)) + \sum_{i=1}^{k}\ln\Gamma(1-\lambda_i(a_i-1)) + \sum_{i=1}^{k}\lambda_i\, E\left[\ln x_i^{a_i-1}\right] $$  (47)

4.3. Relation between Parameters and Constraints

Equating to zero the partial derivatives of (47) with respect to $\lambda_1, \ldots, \lambda_k$ and $a_1, \ldots, a_k$, one obtains

$$ \frac{\partial I[f]}{\partial \lambda_1} = 0 = (a_1 - 1)\psi(K_1) - (a_1 - 1)\psi(K_2) + E\left[\ln X_1^{a_1-1}\right], $$  (48)

where $K_1 = k - \lambda_1(a_1-1) - \cdots - \lambda_k(a_k-1)$ and $K_2 = 1 - \lambda_1(a_1-1)$;

$$ \frac{\partial I[f]}{\partial \lambda_k} = 0 = (a_k - 1)\psi(K_1) - (a_k - 1)\psi(K_k) + E\left[\ln X_k^{a_k-1}\right], $$  (49)

where $K_k = 1 - \lambda_k(a_k-1)$;

$$ \frac{\partial I[f]}{\partial a_1} = 0 = \lambda_1\psi(K_1) - \lambda_1\psi(K_2) + \lambda_1 E[\ln X_1] $$  (50)

$$ \frac{\partial I[f]}{\partial a_k} = 0 = \lambda_k\psi(K_1) - \lambda_k\psi(K_k) + \lambda_k E[\ln X_k] $$  (51)

The simplification of Equations (48)–(51) yields

$$ E[\ln X_1] = \psi(K_2) - \psi(K_1) $$  (52)

$$ E[\ln X_k] = \psi(K_k) - \psi(K_1) $$  (53)
These equations provide the parameter estimators of the Dirichlet distribution.

5. Two Other Parameter Estimation Methods

5.1. Method of Moments

The Dirichlet distribution has $k$ parameters $a_i$, $i = 1, \ldots, k$; therefore, $k$ moment equations are needed for parameter estimation. The mean, variance, and covariance are given in (9), (10), and (11). Because of (11),

$$ E(X_i X_j) = \mathrm{Cov}(X_i, X_j) + E(X_i)E(X_j) = \frac{a_i a_j}{a_0(a_0 + 1)} $$  (54)

$$ -\frac{E(X_i X_j)}{\mathrm{Cov}(X_i, X_j)} = a_0 $$  (55)

Multiplying (55) by the mean in (9) gives

$$ \hat{a}_i = -\frac{E(X_i)\, E(X_i X_j)}{\mathrm{Cov}(X_i, X_j)} $$  (56)

From (55) and (56), the estimate of the last parameter is

$$ \hat{a}_k = -\frac{E(X_i X_j)\left(1 - E(X_1) - \cdots - E(X_{k-1})\right)}{\mathrm{Cov}(X_i, X_j)} $$  (57)
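The following sketch (not from the paper) implements the moment estimators of Equations (54)–(57) for the trivariate case ($k = 3$) used later in the simulations, where the data are pairs $(X_1, X_2)$ with $X_3 = 1 - X_1 - X_2$; the sample-generation step and the true parameters are arbitrary choices for illustration.

```python
import numpy as np

def dirichlet_mom(x1, x2):
    """Method-of-moments estimates (a1, a2, a3) for a Dirichlet with k = 3,
    following Equations (54)-(57): a0 = -E(X1*X2)/Cov(X1, X2) and a_i = E(X_i)*a0."""
    x1, x2 = np.asarray(x1), np.asarray(x2)
    e12 = np.mean(x1 * x2)
    cov12 = np.cov(x1, x2)[0, 1]
    a0 = -e12 / cov12
    a1 = np.mean(x1) * a0
    a2 = np.mean(x2) * a0
    a3 = (1.0 - np.mean(x1) - np.mean(x2)) * a0
    return np.array([a1, a2, a3])

rng = np.random.default_rng(2)
sample = rng.dirichlet([3.0, 2.0, 4.0], size=1000)     # arbitrary true parameters
print(dirichlet_mom(sample[:, 0], sample[:, 1]))       # estimates should be near (3, 2, 4)
```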

5.2. Method of Maximum Likelihood Estimation

The likelihood function $L$, where $n$ is the sample size, is

$$ L = \left(\frac{\Gamma(a_0)}{\prod_{i=1}^{k}\Gamma(a_i)}\right)^{n} \prod_{j=1}^{n}\prod_{i=1}^{k} x_{ij}^{a_i - 1} $$  (58)

Then the log-likelihood function $\ln L$ is

$$ \ln L = n\ln\Gamma(a_0) - n\sum_{i=1}^{k}\ln\Gamma(a_i) + \sum_{j=1}^{n}\ln\prod_{i=1}^{k} x_{ij}^{a_i - 1} $$  (59)

Differentiating Equation (59) with respect to the parameters $a_1, \ldots, a_k$ and equating each derivative to zero yields the following equations:

$$ E[\ln X_1] = \psi(a_1) - \psi(a_0) $$  (60)

$$ E[\ln X_2] = \psi(a_2) - \psi(a_0) $$  (61)

$$ E[\ln X_{k-1}] = \psi(a_{k-1}) - \psi(a_0) $$  (62)

$$ E[\ln(1 - x_1 - \cdots - x_{k-1})] = \psi(a_k) - \psi(a_0) $$  (63)
These results are the same as those found by the ordinary entropy method and parameter space expansion method.
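Since the entropy-based estimators coincide with the maximum likelihood estimators, both can be computed by solving the digamma system (60)–(63) numerically. Below is a minimal sketch (not the authors' Excel Solver implementation) that uses scipy.optimize.fsolve; the starting point is a simple guess, although in the paper's simulations the moment estimates serve as natural initial values.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import fsolve

def dirichlet_mle(x, a_init):
    """Solve Equations (60)-(63): mean(ln X_i) = psi(a_i) - psi(a_0) for all i.

    `x` is an (n, k) array of Dirichlet observations (rows sum to 1);
    `a_init` is a starting point, e.g. the moment estimates."""
    mean_log = np.log(x).mean(axis=0)

    def equations(a):
        return digamma(a) - digamma(a.sum()) - mean_log

    return fsolve(equations, np.asarray(a_init, dtype=float))

rng = np.random.default_rng(3)
x = rng.dirichlet([3.0, 2.0, 4.0], size=1000)      # arbitrary true parameters
print(dirichlet_mle(x, a_init=[1.0, 1.0, 1.0]))    # ML (= maximum entropy) estimates near (3, 2, 4)
```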
The maximum likelihood (ML) estimation method provides singular point estimates for the model parameters while overlooking the residual uncertainty inherent in the estimation process. Conversely, the Bayesian estimation method adopts a different approach, yielding posterior probability distributions over the entire spectrum of model parameters. This is achieved by integrating the observed data with prior distributions [26]. Broadly speaking, when contrasted with ML estimation, Bayesian parameter estimation within a statistical model has the potential to yield a robust and stable estimate, primarily because it incorporates the accompanying uncertainty into the estimation process, a particularly valuable attribute when dealing with limited amounts of observed data [27]. The Dirichlet distribution, being a member of the exponential family, possesses a corresponding conjugate prior. Nevertheless, due to the intricate nature of the posterior distribution, its practical utility in problem-solving scenarios is limited. Consequently, the task of Bayesian estimation for the Dirichlet distribution, in a general context, lacks analytical tractability. To address this, Ma employed an approximation approach to model the parameter distribution of the Dirichlet distribution; specifically, the parameters were approximated with a multivariate Gaussian distribution by leveraging the expectation propagation (EP) framework [28]. Furthermore, there are studies in reliability engineering that estimate parameters together with the determination of quantiles by application of the maximum likelihood method, such as [29].

6. Simulation and Comparison of Parameter Estimation Methods

Simulation from the Dirichlet distribution can be performed in two steps. The probability integral transform states that the distribution function of any continuous random variable is uniform on $(0,1)$; hence, by the inverse distribution function of the gamma distribution, one may simulate as many independent gamma variates as needed. In other words, one first simulates $k$ independent gamma variates $X_1, X_2, \ldots, X_k$ such that $X_i \sim \mathrm{Gamma}(\alpha_i, 1)$, $i = 1, 2, \ldots, k$, and calculates $Y_j = X_j / \sum_{i=1}^{k} X_i$, $j = 1, 2, \ldots, k$. Then the random vector $(Y_1, Y_2, \ldots, Y_k)$ follows the Dirichlet distribution with parameter vector $(\alpha_1, \alpha_2, \ldots, \alpha_k)$ [30]. This procedure can easily be carried out even in Microsoft Excel; a sketch in code is given below.
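The gamma-ratio construction described above can be written in a few lines of Python (an illustrative sketch of the same procedure; in the paper it was carried out in Microsoft Excel, and the parameter values here are arbitrary):

```python
import numpy as np

def simulate_dirichlet(alpha, n, seed=0):
    """Simulate n draws from Dirichlet(alpha) by normalizing independent gamma variates:
    X_i ~ Gamma(alpha_i, 1) independently, then Y_i = X_i / sum_j X_j."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    g = rng.gamma(shape=alpha, size=(n, alpha.size))   # column i uses shape parameter alpha_i
    return g / g.sum(axis=1, keepdims=True)

y = simulate_dirichlet([3.0, 2.0, 4.0], n=1000)        # 1000 (X, Y) pairs plus the third component
print(y.mean(axis=0))                                  # close to alpha / alpha_0 = (1/3, 2/9, 4/9)
```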
In the present study, we first simulated 1000 $(X, Y)$ pairs from the Dirichlet distribution for some arbitrary parameters $\alpha_1$, $\alpha_2$, and $\alpha_3$. We obtained estimates by four methods, but since the maximum likelihood estimators and the estimators obtained by the ordinary entropy method and the parameter space expansion method are all identical, the comparison is effectively between the moment estimates and the rest. We then repeated this experiment 5000 times. The summary statistics are shown in Table 1.
Note that the maximum likelihood estimates (MLEs) are obtained by Excel Solver. In general, moment estimates and MLEs are close to each other. Absolute percentage errors (APEs) are calculated by the following formula:
$$ \mathrm{APE} = 100\,\frac{|\text{parameter} - \text{estimate}|}{\text{parameter}} $$
It can be inferred that no single method dominates all the time; there are instances in which the moment estimators perform better and others in which the maximum likelihood estimators do. In any case, it is clear from the table that increasing the number of simulations increases precision considerably.
Note that, by the central limit theorem, moment estimators are expected to be distributed normally for a large number of observations (or simulations) since a moment estimator considers a sum of random observations (or the sum of some power of these random observations). Maximum likelihood estimators also have the asymptotic normality property with lower variances. For the Dirichlet distribution, we found that the entropy estimators mentioned above and maximum likelihood estimators are identical. Therefore, entropy estimators also have the asymptotic normality property.
Finally, we note that the selection of $(\alpha_1, \alpha_2, \alpha_3)$ is quite arbitrary and serves only to illustrate that maximum likelihood estimators (and maximum entropy estimators) are better, i.e., show lower sampling variability than moment estimators. In fact, this was not the case in every simulation. Since, in our study, the initial estimates for maximum likelihood (and maximum entropy) are provided by the method of moments, and since these initial estimates are close enough to the actual parameters, a great improvement in sampling variability may not be achieved; this is probably due to the nature of nonlinear estimation. To obtain a better picture, it may be meaningful first to simulate a random vector $(\alpha_1, \alpha_2, \alpha_3)$ several times, then compute the moment estimates, and then, starting from these initial estimates, move on to the maximum likelihood estimates.

7. Conclusions

In the present study, parameter estimates of the Dirichlet distribution are obtained by four methods. For a Dirichlet distribution with three parameters, the parameter estimates found by the entropy methods considered here and by maximum likelihood are almost the same. Maximum likelihood estimators are consistent, most efficient, sufficient, tend to normality (as the sample size increases), and are invariant under functional transformations [31]. Therefore, the parameter estimators found by the entropy methodology have the same appealing properties as the maximum likelihood estimators. Based on the fact that a sample moment tends to be more concentrated around the corresponding population moment for larger samples, sample moments can also be used to estimate population moments [1]. In general, moment estimators are asymptotically normally distributed and consistent; however, their variance may be larger than that of estimators derived by other methods [32]. It may therefore be a good idea to start nonlinear estimation, whether for maximum likelihood or for entropy maximization, from initial moment estimates. In the present study, we started with moment estimates of a Dirichlet distribution with arbitrarily selected parameters to demonstrate that better parameter estimates (i.e., estimates with both lower bias and lower sampling variability) can be achieved. The simulation part of this work can be extended by determining the parameters randomly in further simulations for further generalizations.

Author Contributions

All authors made the same contributions. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All results presented in the article were produced from model simulations. Therefore, there are no data to be made available. Researchers who wish to replicate the study should use Microsoft Excel and the parameters described in the article. With those parameters, researchers can use modeling simulations to replicate the tables and figures presented in the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mood, A.M.; Graybill, F.A.; Boes, D. Introduction to the Theory of Statistics; McGraw-Hill Edition: New York, NY, USA, 1974. [Google Scholar]
  2. Casella, G.; Berger, R.L. Statistical Inference, 2nd ed.; Duxbury Advanced Series; Cengage Learning: Pacific Grove, CA, USA, 2002. [Google Scholar]
  3. Dasgupta, A. Asymptotic Theory of Statistics and Probability; Springer: New York, NY, USA, 2002. [Google Scholar]
  4. Shannon, C.E. A mathematical theory of communication. Bell. Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  5. Rényi, A. On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1960; Volume 1, pp. 547–561. [Google Scholar]
  6. Ullah, A. Entropy, divergence and distance measures with econometric applications. J. Stat. Plan. Inference 1996, 49, 137–162. [Google Scholar] [CrossRef]
  7. Kullback, S. Information Theory and Statistics; Dover Publications: New York, NY, USA, 1978. [Google Scholar]
  8. Esteban, M.D.; Morales, D. A summary on entropy statistics. Kybernetika 1995, 1, 337–346. [Google Scholar]
  9. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall/CRC: New York, NY, USA, 2006. [Google Scholar]
  10. Singh, V.P. Entropy-Based Parameter Estimation in Hydrology; Kluwer Academic Publishers: Boston, MA, USA, 1998. [Google Scholar]
  11. Singh, V.P. Entropy Theory and Its Application in Environmental and Water Engineering; John Wiley and Sons: West Sussex, UK, 2013. [Google Scholar]
  12. Singh, V.P. Entropy Theory in Hydraulic Engineering: An Introduction; ASCE Press: Reston, VA, USA, 2015. [Google Scholar]
  13. Singh, V.P. Entropy Theory in Hydrologic Science and Engineering; McGraw-Hill Education: New York, NY, USA, 2014. [Google Scholar]
  14. Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2002. [Google Scholar]
  15. Singh, V.P.; Rajagopal, A.K. A new method of parameter estimation for hydrologic frequency analysis. Hydrol. Sci. Technol. 1986, 3, 33–40. [Google Scholar]
  16. Song, S.; Song, X.; Kang, Y. Entropy-Based Parameter Estimation for the Four-Parameter Exponential Gamma Distribution. Entropy 2017, 19, 189. [Google Scholar] [CrossRef]
  17. Hao, Z.; Singh, V.P. Entropy-based parameter estimation for extended Burr XII distribution. Stoch Env. Res Risk Assess 2009, 23, 1113–1122. [Google Scholar] [CrossRef]
  18. Singh, V.P.; Deng, Z.Q. Entropy-based parameter estimation for kappa distribution. J. Hydrol. Eng. 2003, 8, 81–92. [Google Scholar] [CrossRef]
  19. Gao, L.; Han, D. Methods of Moment and Maximum Entropy for Solving Nonlinear Expectation. Mathematics 2019, 7, 45. [Google Scholar] [CrossRef]
  20. De Groot, M.; Shervish, M. Probability and Statistics, 4th ed.; Addison-Wesley: Boston, MA, USA, 2002. [Google Scholar]
  21. Press, S.J. Applied Multivariate Analysis: Using Bayesian and Frequentist Methods of Inference; Dover Publications: Mineola, NY, USA, 1981. [Google Scholar]
  22. Lin, J. On the Dirichlet Distribution. Master’s Thesis, Queen’s University, Kingston, ON, Canada, 2016. [Google Scholar]
  23. Hankin, R.K.S. A generalization of the Dirichlet distribution. J. Stat. Softw. 2010, 33, 1–18. [Google Scholar]
  24. Bilodeau, M.; Brenner, D. Theory of Multivariate Statistics; Springer: New York, NY, USA, 1999. [Google Scholar]
  25. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions; Dover Publications: Washington, DC, USA, 1964. [Google Scholar]
  26. Bishop, C. Pattern Recognition and Machine Learning. J. Electron. Imaging 2006, 4, 049901. [Google Scholar] [CrossRef]
  27. Ma, Z.; Rana, P.K.; Taghia, J.; Flierl, M.; Leijon, A. Bayesian estimation of Dirichlet mixture model with variational inference. Pattern Recognit. 2014, 47, 3143–3157. [Google Scholar] [CrossRef]
  28. Ma, Z. Bayesian estimation of the Dirichlet distribution with expectation propagation. In Proceedings of the 20th European Signal Processing Conference (EUSIPCO), Bucharest, Romania, 27–31 August 2012; pp. 689–693. [Google Scholar]
  29. Zhuang, L.; Xu, A.; Wang, X.L. A prognostic driven predictive maintenance framework based on Bayesian deep learning. Reliab. Eng. Syst. Saf. 2023, 234, 109181. [Google Scholar] [CrossRef]
  30. Devroye, L. Non-Uniform Random Variate Generation; Springer Science+Business Media: New York, NY, USA, 1986. [Google Scholar]
  31. Keeping, E.S. Introduction to Statistical Inference; Dover Publications: New York, NY, USA, 1995; pp. 126–127. [Google Scholar]
  32. Hines, W.W.; Montgomery, D.C.; Goldsman, D.M.; Borror, C.M. Probability and Statistics in Engineering, 4th ed.; John Wiley Sons, Inc.: Hoboken, NJ, USA, 2008; pp. 222–225. [Google Scholar]
Table 1. Results of some simulations (values reported as 1000 runs/5000 runs).

1000 runs/5000 runs | α1 = 3      | α2 = 2      | α3 = 4
MOM                 | 2.79/2.92   | 1.86/1.94   | 3.81/3.88
APE                 | 6.75/2.35   | 6.99/2.9    | 4.73/2.94
MLE                 | 3.06/3.23   | 2.21/2.11   | 4.64/4.32
APE                 | 2.07/7.85   | 10.5/5.98   | 16.24/8.1

1000 runs/5000 runs | α1 = 4      | α2 = 0.25   | α3 = 2
MOM                 | 3.96/3.94   | 0.25/0.24   | 1.96/1.97
APE                 | 0.99/1.26   | 2.82/0.05   | 1.53/1.18
MLE                 | 4.28/4.02   | 0.26/0.22   | 2.03/1.96
APE                 | 7.01/0.57   | 4.27/9.69   | 1.85/1.79

1000 runs/5000 runs | α1 = 0.5    | α2 = 3      | α3 = 2
MOM                 | 0.48/0.5    | 3.1/3.09    | 2.09/2.04
APE                 | 3.91/1.94   | 3.36/3.13   | 4.67/2.09
MLE                 | 0.61/0.5    | 3.39/3.11   | 2.32/2.06
APE                 | 22.48/0.58  | 13.01/3.92  | 16.1/3.34

1000 runs/5000 runs | α1 = 3      | α2 = 3      | α3 = 4
MOM                 | 3.01/3.04   | 3.05/2.99   | 4.13/4.04
APE                 | 0.49/1.63   | 1.82/0.36   | 3.38/1.04
MLE                 | 3.22/2.86   | 3.41/2.88   | 4.61/3.61
APE                 | 7.57/4.38   | 13.67/3.71  | 15.48/9.64

1000 runs/5000 runs | α1 = 13     | α2 = 2      | α3 = 0.75
MOM                 | 14.04/12.99 | 2.1/1.98    | 0.83/0.73
APE                 | 8/0.02      | 5.34/0.56   | 11/1.45
MLE                 | 12.52/13.11 | 1.96/2.02   | 0.75/0.69
APE                 | 3.64/0.87   | 1.89/1.34   | 0.72/7.85

