1. Introduction
Information theoretic estimators have been receiving increasing attention in the econometric-statistics literature [1,2,3,4,5,6,7]. In other work, [3] proposed an information theoretic estimator based on minimization of the Kullback-Leibler Information Criterion as an alternative to optimally-weighted generalized method of moments estimation. This estimator handles weakly dependent data generating mechanisms, and under reasonable regularity assumptions it is consistent and asymptotically normally distributed. Subsequently, [1] proposed an information theoretic estimator based on minimization of the Cressie-Read discrepancy statistic as an alternative approach to inference in moment condition models. The author of [1] identified a special case of the Cressie-Read statistic, the Kullback-Leibler Information Criterion (e.g., maximum entropy), as being preferred over other estimators (e.g., empirical likelihood) because of its efficiency and robustness properties. Special issues of the Journal of Econometrics (March 2002) and Econometric Reviews (May 2008) were devoted to this particular topic of information estimators.
Historically, information theoretic estimators have been motivated in several ways. The Cressie-Read statistic directly minimizes an information-based measure of closeness between the estimated and empirical distributions [1]. Alternatively, the maximum entropy principle is based on an axiomatic approach that defines a unique objective function for measuring the uncertainty of a collection of events [8,9,10]. Interest in maximum entropy estimators stems from the prospect of recovering and processing information when the underlying sampling model is incompletely or incorrectly known and the data are limited, partial, or incomplete [10]. To date, the principle of maximum entropy has been applied in an abundance of circumstances, including the fields of econometrics and statistics [11,12,13,14,15,16,17], economic theory and applications [18,19,20,21,22,23,24], accounting and finance [25,26,27], and resources and agricultural economics [28,29,30,31,32]. Moreover, widely used econometric software packages now incorporate procedures to calculate maximum entropy estimators in their latest releases (e.g., SAS, SHAZAM, and GAUSSX).
Rigorous investigation of the small and large sample properties of information theoretic estimators has generally lagged far behind empirical applications [3]. Exceptions include [1,2,3], who examined information theoretic alternatives to generalized method of moments estimation; [14], who derived the statistical properties of the generalized maximum entropy estimator in the context of modeling multinomial response data; and [10], who provided asymptotic properties of the moment-constrained generalized maximum entropy (GME) estimator for the general linear model (showing it is asymptotically equivalent to ordinary least squares). An alternative information theoretic estimator of the general linear model (GLM), yet to be rigorously investigated but arising in empirical applications (e.g., [24]), is the purely data-constrained formulation of the generalized maximum entropy estimator [10]. In a purely data-constrained formulation, the regression model itself, as opposed to moment conditions derived from it, represents the constraining function for the entropy objective. In the maximum entropy framework, unlike the ordinary least squares or maximum likelihood estimators of the GLM, moment constraints are not necessary to uniquely identify parameter estimates. Moreover, there exist distinct differences between the data- and moment-constrained versions of the GME estimator for the GLM. For example, [10] have shown the data-constrained GME estimator to be mean square error superior to the moment-constrained GME estimator of the GLM in selected Monte Carlo experiments.
Our paper contributes to the econometric literature in several ways. First, regularity conditions are identified that provide a solid foundation from which to develop statistical properties of the data-constrained GME estimator of the GLM and hypothesis tests on model parameters. Given the regularity conditions, we define a conditional maximum entropy function to rigorously prove consistency and asymptotic normality. As demonstrated in this paper, the data-constrained GME estimator is not asymptotically equivalent to the moment-constrained GME estimator or the ordinary least squares estimator. However, the GME estimator is shown to be nearly asymptotically efficient. Moreover, we derive formulae to compute the asymptotic variance of the proposed estimator, which allows us to define classical Wald, Likelihood Ratio, and Lagrange Multiplier tests for testing hypotheses about model parameters.
Second, theoretical extensions to unbiased, cross entropy, and Bayesian estimation are identified. Further, we demonstrate that the GME specification can be extended from finite-discrete parameter and error spaces to infinite-continuous parameter and error spaces. Alternative formulations of the data-constrained GME estimator of the GLM under selected regularity conditions, and the implications for the properties of the estimator, are also discussed.
Third, to complement the theoretical results, Monte Carlo experiments are used to compare the performance of the data-constrained GME estimator to least squares estimates for small and medium sample sizes. The performance of the GME estimator is evaluated with respect to selected error distributions, to the user-supplied supports for the parameters and errors, and to its robustness to model misspecification. Monte Carlo experiments are also performed to examine the size and power of the Wald, Likelihood Ratio, and Lagrange Multiplier test statistics.
Fourth, insight into computational efficiency and practical guidelines for setting the boundaries of parameter and error support spaces are provided. The conditional maximum entropy formulation utilized in the proof of asymptotic properties provides the basis for a new, computationally efficient method of calculating GME estimates. The approach involves a nonlinear search over a K-vector of coefficient parameters, which is much more efficient than numerical approaches proposed elsewhere in the literature. Finally, practical guidelines for setting the boundaries of parameter and error support spaces are analyzed and discussed.
2. The Data-Constrained GME Formulation
Let Y = Xβ + ε represent the general linear model, with Y being an N × 1 dependent variable vector, X being a fixed N × K matrix of explanatory variables, β being a K × 1 vector of parameters, and ε being an N × 1 vector of disturbance terms. (All of our results can be extended to stochastic X. For example, if x_n is iid with E(x_n x_n′) = Q, a positive definite matrix, then the asymptotic properties are identical to those developed below.) The GME rule for defining the estimator of the unknown β in the general linear model formulation is given by b = Zp̂, with p̂ derived from the following constrained maximum entropy problem:

max over (p, w): H(p, w) = −p′ ln(p) − w′ ln(w)

subject to:

Y = XZp + Vw, with each p_k and each w_i summing to unity.
In the preceding formulation, the matrices Z and V are K × KM and N × NJ block-diagonal matrices of support points for the β and ε vectors, respectively, as:

Z = diag(z_1′, ..., z_K′) and V = diag(v_1′, ..., v_N′)

where z_k is an M × 1 vector such that z_k1 ≤ β_k and β_k ≤ z_kM, and similarly v_i is a J × 1 vector such that v_i1 ≤ ε_i and ε_i ≤ v_iJ (in their original formulation, [10] required ε_i to be contained in a fixed interval with arbitrarily high probability; here we assume such an event occurs with probability one). The M × 1 vectors p_k and the J × 1 vectors w_i are weight vectors having nonnegative elements that sum to unity and are used to represent the β and ε vectors as β_k = z_k′ p_k for k = 1, ..., K, and ε_i = v_i′ w_i for i = 1, ..., N.
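To make the formulation concrete, the following is a minimal numerical sketch of the data-constrained GME problem, solved directly in the (p, w) space with a generic constrained optimizer. The simulated data, the M = J = 3 support vectors, and the use of SciPy's SLSQP routine are illustrative assumptions, not the paper's design; the concentrated search described in Section 6 is far more efficient.

```python
# Minimal sketch of the data-constrained GME estimator of the GLM, solved
# directly over (p, w). All numbers below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
N, K, M, J = 20, 2, 3, 3
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=N)

z = np.array([-5.0, 0.0, 5.0])            # assumed common support for each beta_k
v = np.array([-3.0, 0.0, 3.0])            # assumed common support for each eps_i
Z = block_diag(*([z] * K))                # K x KM block-diagonal support matrix
V = block_diag(*([v] * N))                # N x NJ block-diagonal support matrix

def neg_entropy(q):                       # negated objective: sum q*ln(q)
    q = np.clip(q, 1e-12, 1.0)
    return np.sum(q * np.log(q))

def data_constraint(q):                   # Y = X Z p + V w
    p, w = q[:K * M], q[K * M:]
    return y - X @ (Z @ p) - V @ w

def adding_up(q):                         # each p_k and each w_i sums to one
    p, w = q[:K * M], q[K * M:]
    return np.concatenate([p.reshape(K, M).sum(1) - 1,
                           w.reshape(N, J).sum(1) - 1])

q0 = np.concatenate([np.full(K * M, 1 / M), np.full(N * J, 1 / J)])
res = minimize(neg_entropy, q0, method="SLSQP",
               bounds=[(0, 1)] * (K * M + N * J),
               constraints=[{"type": "eq", "fun": data_constraint},
                            {"type": "eq", "fun": adding_up}])
beta_gme = Z @ res.x[:K * M]              # recover beta as Z p
print("GME estimate:", beta_gme)
```

Even at this toy scale, the (p, w) space has KM + NJ = 66 unknowns, which previews why the concentrated K-dimensional search discussed later is attractive.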
The basic principle underlying the estimator for β is to choose an estimate that contains only the information available. In this way the maximum entropy estimator is not constrained by any extraneous assumptions. The information used is the observed information contained in the data, the information contained in the constraints on the admissible values of β, and the information inherent in the structure of the model, including the choice of the supports for the β_k's and ε_i's. In effect, the information set used in estimation is shrunk to the boundary of the observed data and the parameter constraint information. Because the objective function value increases as the weights in p and w are more uniformly distributed, any deviation from uniformity represents the effect of the data constraints on the weighting of the support points used for representing β and ε. This fact also motivates the interpretation of the GME as a shrinkage-type estimator that, in the absence of constraints on β, will shrink to the centers of the supports defined in the specification of Z. We next establish consistency and asymptotic normality results for the GME estimator under general regularity conditions on the specification of the estimation problem.
3. Consistency and Asymptotic Normality of the GME Estimator
Regularity Conditions. To establish asymptotic results for the GME estimator, we utilize the following regularity conditions for the problem of estimating β in Y = Xβ + ε.
- R1 The ε_i's are iid, with support contained in [−c + δ, c − δ] with probability one, for some δ > 0 and large enough finite positive c.
- R2 The pdf of ε_i is symmetric around 0 with finite variance σ².
- R3 β_k ∈ (z_k1, z_kM) for finite z_k1 and z_kM, k = 1, ..., K.
- R4 X has full column rank.
- R5 N⁻¹X′X is O(1), and the smallest eigenvalue of N⁻¹X′X exceeds some ε > 0 for all N ≥ N**, where N** is some positive integer.
- R6 N⁻¹X′X → Q, a finite positive definite symmetric matrix.
Note that condition R1 states that the support of ε_i is contained in the interior of some large enough closed finite interval [−c, c]. Condition R3 states that the true value of each parameter β_k can be enclosed within some open interval (z_k1, z_kM). The conditions R4-R6 on X are familiar analogues to typical assumptions made in the least squares context for establishing asymptotic properties of the least squares estimator of β. We utilize condition R6 to simplify the demonstration of asymptotic normality, but the result can be established under weaker conditions, as alluded to in the proof. Finally, our proof of the asymptotic results utilizes symmetry of the disturbance distribution, which is the content of condition R2.
Reformulated GME Rule. The asymptotic results are derived within the context of the following representation of the GME model, stated in scalar notation to facilitate exposition of the proof. The GME representation described below is completely consistent with the formulation in Section 2 under the condition that the support points represented by the vector v_i are chosen to be symmetrically dispersed around 0. We use the same vector of support points for each of the ε_i's, consistent with the iid nature of the disturbances, and so henceforth v_j refers to the common j-th scalar support point in the development below. The representation is also more general than the representation in Section 2 in the sense that different numbers of support points can be used for the representation of different parameters. The constrained maximum entropy problem is as follows:

max over (p, w): −Σ_k Σ_m p_km ln(p_km) − Σ_i Σ_j w_ij ln(w_ij)   (2)
subject to:
- C1
- C2
- C3
- C4
- C5
- C6
As will become apparent, the nonnegativity restrictions on the p_km's and w_ij's are inherently enforced by the structure of the optimization problem itself, and thus need not be explicitly incorporated into the constraint set.
Asymptotic Properties. The following theorem establishes the consistency and asymptotic normality of the GME estimator of β in the GLM.
Theorem. Under regularity conditions R1-R5, the GME estimator is a consistent estimator of β. With the addition of regularity condition R6, the GME estimator is asymptotically normally distributed, with √N(b − β) converging in distribution to a mean-zero multivariate normal vector, for an appropriate definition of the asymptotic covariance matrix.
Proof. Define the maximized entropy function, conditional on τ, as:
The optimal value of w_ij in the conditionally-maximized entropy function is given by:

which is the maximizing solution to the Lagrangian:
The optimal value of w_ij is then:

where λ_i(e_i) is the optimal value of the Lagrangian multiplier λ_i under the condition Σ_j w_ij v_j = e_i, and e_i = y_i − x_i′τ. It follows from the symmetry of the v_j's around zero that:
Similarly, the optimal value of p_km in the conditionally-maximized entropy function is given by:

which is the maximizing solution to the Lagrangian:

The optimal value of p_km is then:

where η_k(τ_k) is the optimal value of the Lagrangian multiplier η_k under the condition Σ_m p_km z_km = τ_k.
Substituting the optimal solutions for the w_ij's and p_km's into (2) obtains the conditional maximum value function:
Define the gradient vector of F(τ) so that:

and thus the gradient can be expressed in terms of the Lagrangian multipliers, where λ and η are the N × 1 and K × 1 vectors of Lagrangian multipliers associated with the error and parameter constraints, respectively. It follows that the Hessian matrix of F(τ) is given by:
Regarding the functional form of the derivatives of the Lagrangian multipliers appearing in the definition of the Hessian, it follows from (C2) that:

so that, from (3):

Then, from (C2):

and thus:

Also, based on (C1):

so that:
Because the denominators of the terms in the definition of these derivatives are positive valued, it follows that the Hessian matrix of F(τ) is negative definite, given that X′X is positive definite.
Now consider the case where τ = β, so that:

are iid with mean zero, and thus:

are iid with mean zero. Because ε_i is bounded in the interior of [−c, c], the range of λ_i is bounded as well. In addition, λ_i is symmetrically distributed around zero because the ε_i's are so distributed, and, from (4):

It follows that the summands of the gradient are iid with mean zero and finite variance. The limiting normal distribution of the scaled gradient then follows from a multivariate version of Liapounov's central limit theorem, given condition R6. (Asymptotic normality can be established without regularity condition R6; in fact, the boundedness properties on the X-matrix stated in R5 would be sufficient. See [33] for a related proof under weaker regularity conditions.)
3.1. Consistency
For any τ, represent the conditional maximum value function, F(τ), by a second order Taylor series around β as:

where τ* lies between τ and β. The value of the quadratic term in the expansion can be bounded by:

where ξ denotes the smallest eigenvalue of the Hessian matrix [34]. The smallest eigenvalue exhibits a positive lower bound whatever the value of τ*.

The value of the linear term in the expansion is bounded in probability; that is, for any given probability level, there exists a finite constant such that:

because the scaled gradient at β is bounded in probability. It follows from Equations (7)–(9) that, for all τ bounded away from β, the quadratic term eventually dominates the linear term. Thus the maximizer of the conditional maximum value function converges in probability to β, and the GME estimator of β is consistent.
3.2. Asymptotic Normality
Expand G(b) in a Taylor series around β, where b is the GME estimator of β, to obtain:

where b* is between b and β. In general, different b* points will be required to represent the different coordinate functions in G. At the optimum, G(b) = 0, and b is a consistent estimator of β; therefore b* converges in probability to β, and:

where ≈ denotes equivalence of limiting distributions. Using the gradient representation derived above, note that:

where the summands are iid. It follows from R6 that the normalized gradient converges in distribution to a mean-zero normal limit with finite covariance matrix. Recalling the probability limit of the Hessian, Slutsky's Theorem [34] implies that:
Note that, holding the support of ε constant, one can reduce the interval (c1, cJ). As the interval shrinks, the asymptotic variance of the GME estimator may tend to zero, but it cannot grow without bound. For example, if the interval is reduced at a rate such that the implied disturbance variance vanishes, then the asymptotic variance of the estimator tends to zero as well.
Also note that, for large samples, the parameters' reliance on the supports vanishes. In contrast, the supports on the errors influence the computed covariance matrix. Finally, for non-homogeneous errors, the covariance matrix estimator could be adjusted following a standard White's covariance correction.
3.3. Cross-Entropy Extensions
To extend the previous asymptotic results to the case of cross-entropy maximization [10], first suppose that some of the support points coincide, so that z_km = z_kn and/or v_j = v_l for some m ≠ n or j ≠ l. Let the distinct values among the z_km's and v_j's be indexed separately, and let the multiplicities of those distinct values be recorded. From Equations (3) and (5), coinciding support points receive identical optimal weights. Thus, the maximization problem given by Equation (2) and Conditions C1-C6 is equivalent to:
with obvious changes being made to C1-C6. The only alterations needed to the preceding proof are:

More generally, the same representation (11)-(13) applies for any choice of reference weights. Furthermore, Equations (12) and (13) are homogeneous of degree zero in their respective reference weights. Thus, without loss of generality, the normalization conditions:

can be imposed.

Using Equations (11), (12), and (13), we have characterized the maximum cross-entropy solution. Upon substitution of Equations (11)–(13) into the appropriate arguments, all results, including the results in the next section on statistical testing, apply to the maximum cross-entropy paradigm.
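To illustrate the cross-entropy variant computationally, the sketch below exponentially tilts a reference distribution q over the support points instead of the implicit uniform weights. The support vector, reference weights, and bracketing interval for the multiplier are illustrative assumptions.

```python
# Sketch of the inner cross-entropy problem: minimize sum p*ln(p/q) subject
# to the mean constraint p @ z = tau_k and sum(p) = 1. The solution is an
# exponentially tilted version of the reference weights q.
import numpy as np
from scipy.optimize import brentq

def cross_entropy_weights(tau_k, z, q):
    def p_of(eta):
        s = eta * z + np.log(q)          # tilt the reference weights
        w = np.exp(s - s.max())          # numerically stabilized exponentials
        return w / w.sum()
    eta = brentq(lambda t: p_of(t) @ z - tau_k, -50.0, 50.0)
    return p_of(eta)

z = np.array([-5.0, 0.0, 5.0])           # illustrative support
q = np.array([0.2, 0.6, 0.2])            # illustrative reference distribution
print(cross_entropy_weights(1.0, z, q))  # tilted toward the positive support point
```

With uniform q, this reduces to the maximum entropy weights, which is the sense in which GME is a special case of generalized cross entropy.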
5. Monte Carlo Simulations
A Monte Carlo experiment was conducted to explore the sampling behavior of the test statistics based on the generalized maximum entropy estimator. The data were generated from a linear model containing an intercept term, a dichotomous explanatory variable, and two continuously measured explanatory variables. The results of the Monte Carlo experiment also add perspective to the simulation results on the bias and mean square error of the maximum entropy estimator generated previously by [10].
The linear model Y = Xβ + ε is specified with true coefficient vector β = (2, 1, −1, 3)′, where x2 is a discrete random variable such that x2 ~ Bernoulli(.5), observations on the pair of explanatory random variables (x3, x4) are generated as iid outcomes censored at the mean ±3 standard deviations, and outcomes of the disturbance term are defined as a transformation of U, where U ~ Uniform(0,1). The support points for the disturbance terms were specified as V = (−10, 0, 10)′ (recall C2 and C3) for all experiments. Three different sets of support points were specified for the β-vector, given by ZI, ZII, and ZIII (recall C1). The support points in ZI were chosen to be most favorable to the GME estimator, with the elements of the true β-vector located at the centers of their respective supports and the widths of the supports relatively narrow. The supports represented by ZII are tilted to the left of β1 and β2 and to the right of β3 and β4 by 1 unit, with the widths of the supports being the same as their counterparts in ZI. The last set of supports, represented by ZIII, are wider and effectively define an upper bound of 10 on the absolute values of each of the elements of β.
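The following sketch reproduces the experimental design in outline. The text leaves the exact distribution of (x3, x4), the uniform-to-disturbance transformation, and the numeric entries of the Z matrices unstated, so the normal draws, the centered-uniform error, the half-width of 2 for ZI, and the (−10, 0, 10) rows of ZIII below are explicit assumptions.

```python
# Sketch of the Monte Carlo design. True coefficients follow Table 2;
# items marked "assumed" are not specified in the text.
import numpy as np

rng = np.random.default_rng(42)
beta = np.array([2.0, 1.0, -1.0, 3.0])         # true beta (Table 2)

def draw_sample(n):
    x2 = rng.binomial(1, 0.5, size=n)          # x2 ~ Bernoulli(.5)
    x34 = rng.normal(size=(n, 2))              # assumed N(0,1) draws for (x3, x4)
    x34 = np.clip(x34, -3.0, 3.0)              # censored at the mean +/- 3 sd
    eps = 6.0 * (rng.uniform(size=n) - 0.5)    # assumed transform of U ~ Uniform(0,1)
    X = np.column_stack([np.ones(n), x2, x34])
    return X, X @ beta + eps

# Assumed support matrices consistent with the verbal description:
half = 2.0                                     # assumed half-width for Z_I
Z_I = np.array([[b - half, b, b + half] for b in beta])   # centered on true beta
tilt = np.array([-1.0, -1.0, 1.0, 1.0])        # left for beta1, beta2; right for beta3, beta4
Z_II = Z_I + tilt[:, None]                     # same widths, mis-centered by 1 unit
Z_III = np.tile([-10.0, 0.0, 10.0], (4, 1))    # wide supports bounding |beta_k| by 10
V = np.array([-10.0, 0.0, 10.0])               # error support used in all experiments

X, y = draw_sample(25)
```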
To explore the respective sizes of the various tests presented in Section 4, the hypothesis β2 = c was tested using the TZ test, and the joint hypothesis β2 = c and β3 = d was tested using the Wald, pseudo-likelihood, and Lagrange Multiplier tests, with c and d set equal to the true values of β2 and β3, i.e., c = 1 and d = −1. Critical values of the tests were based on their respective asymptotic distributions and a 0.05 level of significance. An observation on the power of the respective tests was obtained by performing a test of significance whereby c = d = 0 in the preceding hypotheses. All scenarios were analyzed using 10,000 Monte Carlo repetitions, and sample sizes of n = 25, 100, 400, and 1,600 were examined. In the course of calculating values of the test statistics, both unrestricted and restricted (by β2 = c and/or β3 = d) GME estimators needed to be calculated. Therefore, bias and mean square error measures relating to these and the least squares estimators were calculated as well. Monte Carlo results for the test statistics and for the unrestricted GME and OLS estimators are presented in Table 1 and Table 2, respectively, while results relating to the restricted GME and OLS estimators are presented in Table 3. Because the choice of which asymptotic covariance matrix to use in calculating the TZ and Wald tests was inconsequential, only the results for the second suggested covariance matrix representation are presented here.
Regarding properties of the test statistics, their behavior under a true H0 is consistent with the behavior expected from the respective asymptotic distributions when n is large (sample size of 1,600), their sizes being approximately 0.05 regardless of the choice of support for β. The sizes of the tests remain within 0.01 of their asymptotic size when n decreases to 400, except for the Lagrange Multiplier test under support ZII, which has a slightly larger size. Across all support choices, and ranging over all sample sizes from small to large, the sizes of the TZ and Wald tests remain in the 0 to 0.10 range; for ZI supports and small sample sizes, the sizes of the tests are substantially less than 0.05. Results were similar for the pseudo-likelihood and Lagrange Multiplier tests, except for the cases of ZII support and n ≤ 100, where the size of the test increased to as high as 0.36 for the pseudo-likelihood test and 0.73 for the Lagrange Multiplier test when n = 25.
Table 1. Rejection Probabilities for True and False Hypotheses.
| Supports | Tz: β2 = 1 | Tz: β2 = 0 | Wald: β2 = 1, β3 = −1 | Wald: β2 = 0, β3 = 0 | Pseudo-Likelihood: β2 = 1, β3 = −1 | Pseudo-Likelihood: β2 = 0, β3 = 0 | Lagrange Multiplier: β2 = 1, β3 = −1 | Lagrange Multiplier: β2 = 0, β3 = 0 |
|---|---|---|---|---|---|---|---|---|
| ZI | | | | | | | | |
| n = 25 | 0.000 | 0.825 | 0.004 | 0.998 | 0.021 | 1.000 | 0.059 | 1.000 |
| n = 100 | 0.017 | 0.999 | 0.022 | 1.000 | 0.038 | 1.000 | 0.056 | 1.000 |
| n = 400 | 0.041 | 1.000 | 0.042 | 1.000 | 0.048 | 1.000 | 0.053 | 1.000 |
| n = 1600 | 0.047 | 1.000 | 0.046 | 1.000 | 0.049 | 1.000 | 0.050 | 1.000 |
| ZII | | | | | | | | |
| n = 25 | 0.101 | 0.047 | 0.080 | 0.894 | 0.357 | 0.980 | 0.734 | 0.995 |
| n = 100 | 0.085 | 0.996 | 0.067 | 1.000 | 0.114 | 1.000 | 0.172 | 1.000 |
| n = 400 | 0.053 | 1.000 | 0.048 | 1.000 | 0.058 | 1.000 | 0.066 | 1.000 |
| n = 1600 | 0.052 | 1.000 | 0.052 | 1.000 | 0.055 | 1.000 | 0.057 | 1.000 |
| ZIII | | | | | | | | |
| n = 25 | 0.038 | 0.670 | 0.070 | 0.967 | 0.097 | 0.980 | 0.088 | 0.972 |
| n = 100 | 0.045 | 0.999 | 0.050 | 1.000 | 0.057 | 1.000 | 0.052 | 1.000 |
| n = 400 | 0.045 | 1.000 | 0.050 | 1.000 | 0.051 | 1.000 | 0.050 | 1.000 |
| n = 1600 | 0.051 | 1.000 | 0.051 | 1.000 | 0.052 | 1.000 | 0.051 | 1.000 |
The powers of the tests were all substantial in rejecting false null hypotheses except for the TZ test in the case of ZII support and the smallest sample size, the latter result being indicative of a notably biased test. Overall, the choice of support did impact the power of tests for rejecting the errant hypotheses, although the effect was small for all but the TZ test.
In the case of unrestricted estimators and the most favorable support choice (ZI), the GME estimator dominated the OLS estimator in terms of MSE, and the GME superiority was substantial for sample sizes of n ≤ 100 (Table 2). The GME-ZI estimator and, of course, the OLS estimator were unbiased, with the GME-ZI estimator exhibiting substantially smaller variances for smaller n. The choice of support has a significant effect on the bias and MSE of the GME estimator for small sample sizes. Neither the GME-ZII nor the GME-ZIII estimator dominates the OLS estimator, although the GME-ZIII estimator is generally the better estimator across the various sample sizes. When n = 25, the GME-ZII estimator offers notable improvement over OLS for estimating three of the four elements of β, but is significantly worse for estimating β2. For larger sample sizes, the GME-ZII estimator is generally inferior to the OLS estimator. Although the centers of the ZIII supports are on average further from the true β's than are the centers of the ZII supports, the wider widths of the former result in a superior GME estimator.
The results for the restricted GME estimators in Table 3 indicate that, under the errant constraints β2 = β3 = 0, the GME estimator dominates the OLS estimator for all sample sizes and all support choices. The superiority of the GME estimator is substantial for smaller sample sizes, but dissipates as the sample size increases. The results suggest a misspecification robustness of the GME estimator that deserves further investigation.
Table 2. Mean and Mean Square Error Measures for Unrestricted Estimators.
| Estimator | Mean (β1 = 2) | MSE | Mean (β2 = 1) | MSE | Mean (β3 = −1) | MSE | Mean (β4 = 3) | MSE |
|---|---|---|---|---|---|---|---|---|
| GME-ZI | | | | | | | | |
| n = 25 | 2.000 | 0.015 | 1.001 | 0.038 | −1.001 | 0.028 | 3.000 | 0.006 |
| n = 100 | 2.003 | 0.034 | 1.003 | 0.026 | −1.000 | 0.011 | 2.999 | 0.004 |
| n = 400 | 2.000 | 0.032 | 1.001 | 0.009 | −1.000 | 0.003 | 3.000 | 0.002 |
| n = 1600 | 2.000 | 0.014 | 1.000 | 0.002 | −1.000 | 0.001 | 3.000 | 0.001 |
| GME-ZII | | | | | | | | |
| n = 25 | 1.022 | 0.977 | 0.484 | 0.309 | −0.840 | 0.058 | 3.182 | 0.040 |
| n = 100 | 1.306 | 0.519 | 0.826 | 0.056 | −0.966 | 0.013 | 3.139 | 0.023 |
| n = 400 | 1.672 | 0.141 | 0.960 | 0.010 | −0.996 | 0.003 | 3.066 | 0.006 |
| n = 1600 | 1.892 | 0.026 | 0.991 | 0.002 | −1.000 | 0.001 | 3.022 | 0.001 |
| GME-ZIII | | | | | | | | |
| n = 25 | 1.278 | 0.757 | 0.946 | 0.131 | −0.881 | 0.069 | 3.092 | 0.028 |
| n = 100 | 1.709 | 0.252 | 0.995 | 0.037 | −0.978 | 0.014 | 3.046 | 0.011 |
| n = 400 | 1.914 | 0.068 | 0.999 | 0.010 | −0.996 | 0.003 | 3.015 | 0.003 |
| n = 1600 | 1.978 | 0.017 | 0.999 | 0.002 | −0.999 | 0.001 | 3.004 | 0.001 |
| OLS | | | | | | | | |
| n = 25 | 1.997 | 1.342 | 1.002 | 0.181 | −1.002 | 0.066 | 3.001 | 0.065 |
| n = 100 | 2.009 | 0.283 | 1.003 | 0.041 | −1.000 | 0.014 | 2.998 | 0.014 |
| n = 400 | 2.001 | 0.068 | 1.001 | 0.010 | −1.000 | 0.003 | 3.000 | 0.003 |
| n = 1600 | 2.000 | 0.017 | 1.000 | 0.003 | −1.000 | 0.001 | 3.000 | 0.001 |
Table 3. Mean and Mean Square Error Measures for Restricted Estimators under the Errant Restriction β2 = β3 = 0.
| Estimator | Mean (β1 = 2) | MSE | Mean (β4 = 3) | MSE |
|---|---|---|---|---|
| GME-ZI | | | | |
| n = 25 | 2.078 | 0.041 | 2.681 | 0.011 |
| n = 100 | 2.340 | 0.191 | 2.630 | 0.142 |
| n = 400 | 2.689 | 0.537 | 2.600 | 0.196 |
| n = 1600 | 2.898 | 0.832 | 2.520 | 0.232 |
| GME-ZII | | | | |
| n = 25 | 1.064 | 0.915 | 2.885 | 0.018 |
| n = 100 | 1.603 | 0.234 | 2.772 | 0.056 |
| n = 400 | 2.330 | 0.169 | 2.630 | 0.140 |
| n = 1600 | 2.776 | 0.628 | 2.543 | 0.210 |
| GME-ZIII | | | | |
| n = 25 | 1.686 | 0.589 | 2.750 | 0.084 |
| n = 100 | 2.468 | 0.542 | 2.601 | 0.172 |
| n = 400 | 2.842 | 0.823 | 2.530 | 0.225 |
| n = 1600 | 2.958 | 0.948 | 2.508 | 0.243 |
| OLS | | | | |
| n = 25 | 3.011 | 3.342 | 2.497 | 0.342 |
| n = 100 | 3.013 | 1.575 | 2.497 | 0.274 |
| n = 400 | 3.005 | 1.138 | 2.499 | 0.256 |
| n = 1600 | 2.999 | 1.030 | 2.500 | 0.251 |
Asymmetric Error Supports
We present further Monte Carlo simulations to show that regularity condition R2, which assumes symmetry of the disturbance distribution, is not a necessary condition for identification of the GME slope parameters. It is demonstrated below that if the supports of the error distribution are asymmetric, then only the intercept term of the GME regression estimator is asymptotically biased.
The Monte Carlo experiments that follow are identical to those above except for the specification of the user-supplied support points for the error terms and the underlying true error distribution. To illustrate the impact of asymmetric errors, experiments are based on one set of support points symmetric about zero, VI, and two sets of support points not symmetric about zero, VII and VIII. The support VII is a simple translation of VI by five positive units, retaining symmetry about 5. The asymmetric support VIII translates the truncation points by five positive units, but retains the center support point of 0. The true error distribution is generated in two ways: a symmetric distribution specified as a N(0,1) distribution truncated at (−3,3), and an asymmetric distribution specified as a Beta(3,2) distribution translated and scaled from support (0,1) to (−3,3), with mean 0.6. Supports on the coefficients are retained as ZI, providing support points symmetric about the true coefficient values.
Monte Carlo experiments presented in Table 4 and Table 5 are generated for sample sizes 25, 100, and 400, with 1,000 replications for each sample size. Consider first the case where the true distribution is symmetric about zero. Slope coefficients for error supports that are not symmetric about zero appear biased in smaller sample sizes. However, the bias and MSE of the slope coefficients decrease as the sample size increases. Next, suppose the true distribution is asymmetric. For both symmetric and asymmetric supports, only the intercept terms are persistently biased, diverging from the true parameter values as the sample size increases. These results demonstrate the robustness of the GME slope coefficients to asymmetric error distributions and user-supplied supports.
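A sketch of the modified design follows. The numeric entries of VI are not given in the text, so the (−10, 0, 10) support below, matching the earlier experiments, is an assumption; VII and VIII are then derived exactly as described, and the two error distributions follow the stated truncated-normal and rescaled-Beta specifications.

```python
# Sketch of the asymmetric-error-support experiment.
import numpy as np

rng = np.random.default_rng(7)
V_I = np.array([-10.0, 0.0, 10.0])      # assumed symmetric support about 0
V_II = V_I + 5.0                        # translated by +5, symmetric about 5
V_III = np.array([-5.0, 0.0, 15.0])     # endpoints shifted +5, center kept at 0

def eps_symmetric(n):
    """N(0,1) truncated at (-3, 3), drawn by rejection."""
    e = rng.normal(size=n)
    bad = np.abs(e) >= 3.0
    while bad.any():                    # redraw any value outside the bounds
        e[bad] = rng.normal(size=bad.sum())
        bad = np.abs(e) >= 3.0
    return e

def eps_asymmetric(n):
    """Beta(3,2) rescaled from (0,1) to (-3,3); mean = -3 + 6*0.6 = 0.6."""
    return -3.0 + 6.0 * rng.beta(3.0, 2.0, size=n)
```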
Table 4. Mean and MSE of 1,000 Monte Carlo Simulations with True Distribution Symmetric. Symmetric and Asymmetric Error Supports and Coefficient Support ZI.
| Estimator | E(β1) (β1 = 2) | MSE | E(β2) (β2 = 1) | MSE | E(β3) (β3 = −1) | MSE | E(β4) (β4 = 3) | MSE |
|---|---|---|---|---|---|---|---|---|
| GME-ZI,VI | | | | | | | | |
| n = 25 | 2.002 | 0.016 | 1.003 | 0.042 | −1.000 | 0.030 | 2.997 | 0.007 |
| n = 100 | 2.000 | 0.033 | 1.001 | 0.026 | −1.002 | 0.011 | 3.002 | 0.004 |
| n = 400 | 2.000 | 0.035 | 1.001 | 0.010 | −0.998 | 0.003 | 2.999 | 0.002 |
| GME-ZI,VII | | | | | | | | |
| n = 25 | 1.259 | 0.585 | 0.815 | 0.101 | −1.009 | 0.048 | 2.209 | 0.636 |
| n = 100 | 0.208 | 3.258 | 0.804 | 0.071 | −0.944 | 0.020 | 2.381 | 0.389 |
| n = 400 | −1.144 | 9.903 | 0.868 | 0.028 | −0.959 | 0.005 | 2.640 | 0.132 |
| GME-ZI,VIII | | | | | | | | |
| n = 25 | 1.506 | 0.271 | 0.875 | 0.069 | −1.005 | 0.038 | 2.476 | 0.282 |
| n = 100 | 0.752 | 1.598 | 0.875 | 0.045 | −0.961 | 0.015 | 2.602 | 0.163 |
| n = 400 | −0.235 | 5.024 | 0.925 | 0.015 | −0.977 | 0.004 | 2.794 | 0.044 |
| OLS | | | | | | | | |
| n = 25 | 2.014 | 1.321 | 1.007 | 0.204 | −0.998 | 0.069 | 2.993 | 0.065 |
| n = 100 | 1.999 | 0.280 | 1.001 | 0.042 | −1.002 | 0.014 | 3.002 | 0.014 |
| n = 400 | 2.001 | 0.075 | 1.001 | 0.011 | −0.997 | 0.003 | 2.999 | 0.003 |
Table 5. Mean and MSE of 1,000 Monte Carlo Simulations with True Distribution Asymmetric. Symmetric and Asymmetric Error Supports and Coefficient Support ZI.
| Estimator | E(β1) (β1 = 2) | MSE | E(β2) (β2 = 1) | MSE | E(β3) (β3 = −1) | MSE | E(β4) (β4 = 3) | MSE |
|---|---|---|---|---|---|---|---|---|
| GME-ZI,VI | | | | | | | | |
| n = 25 | 2.089 | 0.031 | 1.038 | 0.060 | −1.005 | 0.041 | 3.094 | 0.018 |
| n = 100 | 2.233 | 0.108 | 1.023 | 0.033 | −1.006 | 0.016 | 3.071 | 0.010 |
| n = 400 | 2.427 | 0.229 | 1.015 | 0.012 | −1.004 | 0.005 | 3.033 | 0.004 |
| GME-ZI,VII | | | | | | | | |
| n = 25 | 1.358 | 0.449 | 0.843 | 0.103 | −1.021 | 0.057 | 2.305 | 0.496 |
| n = 100 | 0.410 | 2.583 | 0.826 | 0.073 | −0.966 | 0.019 | 2.463 | 0.294 |
| n = 400 | −0.860 | 8.209 | 0.890 | 0.025 | −0.966 | 0.006 | 2.700 | 0.092 |
| GME-ZI,VIII | | | | | | | | |
| n = 25 | 1.597 | 0.190 | 0.905 | 0.075 | −1.019 | 0.049 | 2.574 | 0.193 |
| n = 100 | 0.964 | 1.129 | 0.889 | 0.055 | −0.967 | 0.020 | 2.674 | 0.112 |
| n = 400 | 0.126 | 3.553 | 0.946 | 0.016 | −0.981 | 0.005 | 2.835 | 0.030 |
| OLS | | | | | | | | |
| n = 25 | 2.600 | 2.324 | 1.041 | 0.261 | −1.009 | 0.097 | 2.998 | 0.099 |
| n = 100 | 2.616 | 0.813 | 1.001 | 0.052 | −0.999 | 0.020 | 2.997 | 0.021 |
| n = 400 | 2.610 | 0.471 | 1.003 | 0.013 | −1.000 | 0.005 | 2.997 | 0.005 |
6. Further Results
Unbiased GME Estimation. It is apparent from the proof of the theorem in Section 3 that the entropy terms associated with the parameter weights are asymptotically uninformative. It is instructive to note that if these terms are deleted from the GME objective function and the resulting objective function is then maximized through choosing b and w subject to constraints C2–C4 and C6, the resulting GME estimator is in fact unbiased for estimating β. This follows because the ε_i's are iid, mean zero, and symmetrically distributed around zero, and the new estimator, say b̃, is such that b̃ − β is a symmetric function of the ε_i's.
Bayesian Analogues. As pointed out by [35], maximum entropy methods can be motivated as an empirical Bayes rule. We expand on their analogy by noting a strong formal parallel to the traditional Bayesian framework of inference. In particular, one can view the entropy of the parameter weights as the maximum entropy analogue to the log of a non-normalized Bayesian prior, and the entropy of the error weights as the maximum entropy analogue to the non-normalized log of the probability density kernel or log-likelihood function. For any given set of support points Z and V, we can define corresponding density functions by:

Then, for the implied error density, the maximum likelihood estimator of β maximizes the associated log-likelihood, and if one adds the implied prior for β, the resulting estimator is the Bayesian posterior mode estimator of β. We note the following consequences of these equivalences. First, if the support points can be chosen so that the implied error density is very close to the true distribution of ε, then the GME estimator should be nearly asymptotically efficient. Second, in finite samples the prior information influences the estimator, so that it is generally not unbiased. Third, the support points used in the GME estimator have no particular relationship to the points of support of the distribution of a discrete random variable; the implied distributions are absolutely continuous for any choice of Z and V.
The previous Monte Carlo results illustrate the Bayesian-like character of the maximum entropy approach. The GME estimator with reasonably narrow supports centered on the true values of β dominated the OLS estimator and was sometimes far better. On the other hand, the GME estimator performed poorly when the supports were similarly narrow and mis-centered by only one-eighth of the range of the supports. In the latter case, mean squared errors were often much worse than OLS and biases were often substantial. Finally, results for wider supports, even though they were the most mis-centered of the cases examined, were quite similar to OLS results for moderate to large sample sizes, and provided some improvement over OLS for small samples.
Finally, the GME approach is a special case of generalized cross entropy, which incorporates a reference probability distribution over the support points. This allows a direct method of including prior information, akin to a Bayesian framework, although in a classical sense the empirical estimation strategies are inherently different.
GME Calculation Method. The conditional maximum entropy formulation (2) utilized in the proof of the asymptotic results provides the basis for a computationally efficient method of obtaining GME estimates. In particular, maximizing F(τ) through choice of τ involves a nonlinear search over a vector of relatively low dimension (K), as opposed to searching over the (KM + NJ)-dimensional space of (p,w) values. In the process of concentrating the objective function, note that the needed Lagrange multiplier functions can be expressed as elementary functions for three or fewer support points, and still exist in closed form (using inverse hyperbolic functions) for support vectors having five elements. As a point of comparison, the calculation of GME estimates in the Monte Carlo experiment with N = 1,600 was completed in a matter of seconds on a 133 MHz personal computer. Such a calculation would be far more burdensome, if feasible at all, in the space of (p,w) values. We note further that the dual algorithm of [10] would still involve a search over a space of dimension N = 1,600, which would be infeasible here and in other problems in which the number of data points is large.
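The sketch below illustrates this concentrated strategy under simplifying assumptions: a symmetric three-point error support, for which the inner error problem reduces to the closed-form root of a quadratic equation, a common three-point coefficient support handled by a one-dimensional root solve, and an illustrative simulated dataset. None of the specific numbers come from the paper.

```python
# Concentrated GME: search over the K-vector tau only, solving the inner
# entropy problems analytically (errors) or by 1-D root finding (coefficients).
import numpy as np
from scipy.optimize import minimize, brentq

rng = np.random.default_rng(1)
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.uniform(-1, 1, size=(N, K - 1))])
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.uniform(-1, 1, size=N)   # bounded disturbances

c = 5.0                                  # assumed error support (-c, 0, c)
Z = np.tile([-5.0, 0.0, 5.0], (K, 1))    # assumed common coefficient support

def w_entropy(e):
    """Entropy of the optimal w for support (-c, 0, c): with a = exp(lambda*c),
    the mean constraint reduces to (1-m)a^2 - m*a - (1+m) = 0, m = e/c."""
    m = e / c
    a = (m + np.sqrt(4.0 - 3.0 * m ** 2)) / (2.0 * (1.0 - m))
    w = np.vstack([1.0 / a, np.ones_like(a), a]) / (1.0 / a + 1.0 + a)
    return -(w * np.log(w)).sum(axis=0)

def p_entropy(tau_k, z):
    """Entropy of the optimal p_k: find eta such that sum_m p_m z_m = tau_k."""
    def p_of(eta):
        s = eta * z
        w = np.exp(s - s.max())          # numerically stabilized weights
        return w / w.sum()
    eta = brentq(lambda t: p_of(t) @ z - tau_k, -50.0, 50.0)
    p = p_of(eta)
    return float(-(p * np.log(p)).sum())

def neg_F(tau):
    """Negative of the conditional maximum value function F(tau)."""
    if np.any(tau <= Z[:, 0]) or np.any(tau >= Z[:, -1]):
        return np.inf                    # tau must stay inside its support
    e = y - X @ tau
    if np.any(np.abs(e) >= c):
        return np.inf                    # residuals must stay inside (-c, c)
    return -(w_entropy(e).sum() + sum(p_entropy(t, z) for t, z in zip(tau, Z)))

res = minimize(neg_F, np.zeros(K), method="Nelder-Mead")
print("concentrated GME estimate:", res.x)
```

For three-point error supports, the quadratic-root step replaces a general root solve for each observation, which is what keeps the search over τ alone cheap even when N is large.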