Generalized Maximum Entropy Analysis of the Linear Simultaneous Equations Model

A generalized maximum entropy estimator is developed for the linear simultaneous equations model. Monte Carlo sampling experiments are used to evaluate the estimator's performance in small and medium sized samples, suggesting contexts in which the current generalized maximum entropy estimator is superior in mean square error to two and three stage least squares. Analytical results are provided relating to asymptotic properties of the estimator and associated hypothesis testing statistics. Monte Carlo experiments are also used to provide evidence on the power and size of test statistics. An empirical application is included to demonstrate the practical implementation of the estimator.


Introduction
The simultaneous equations model (SEM) is applied extensively in econometric-statistical studies. Examples of traditional estimators for the SEM include two stage least squares [1], three stage least squares [2], limited information maximum likelihood [3], and full information maximum likelihood [4,5]. These estimators yield consistent estimates of structural parameters by correcting for simultaneity between the endogenous variables and the disturbance terms of the statistical model. However, in the presence of small samples or ill-posed problems, traditional approaches may provide parameter estimates with high variance and/or bias, or provide no solution at all. As an alternative to traditional estimators, we present a generalized maximum entropy estimator for the linear SEM and rigorously analyze its sampling properties in small and large sample situations, including the case of contaminated error models.
Finite sampling properties of the SEM have been discussed in [6-10], where alternative estimation techniques that have potentially superior sampling properties are suggested. Specifically, these studies discussed limitations of asymptotically justified estimators in finite sample situations and the lack of research on estimators that have small sample justification. In a special issue of The Journal of Business and Economic Statistics, the authors of [11,12] examined small sample properties of generalized method of moments estimators for model parameters and covariance matrices. References [13-15] pointed out that even small deviations from model assumptions in parametric econometric-statistical models that are only asymptotically justified can lead to undesirable outcomes. Moreover, Reference [16] singled out the extreme sensitivity of least squares estimators to modest departures from strictly Gaussian conditions as a justification for examining robust methods of estimation. These studies motivate the importance of investigating alternatives to parameter estimation methods for the SEM that are robust in finite samples and lead to improved prediction, forecasting, and policy analysis.
The principle of maximum entropy has been applied in a variety of modeling contexts. Reference [10] proposed estimation of the SEM based on generalized maximum entropy (GME) to deal with small samples or ill-posed problems, and defined a criterion that balances the entropy in both the parameter and residual spaces. The estimator was justified on information theoretic grounds, but the repeated sampling properties of the estimator and its asymptotic properties were not analyzed extensively. Reference [17] suggested an information theoretic estimator based on minimization of the Kullback-Leibler Information Criterion as an alternative to optimally-weighted generalized method of moments estimation that can accommodate weakly dependent data generating mechanisms. Subsequently, [18] investigated an information theoretic estimator based on minimization of the Cressie-Read discrepancy statistic as an alternative approach to inference in models whose data information was cast in terms of moment conditions. Reference [18] identified both exponential empirical likelihood (negative entropy) and empirical likelihood as special cases of the Cressie-Read power divergence statistic. More recently, [19,20] applied the Kullback-Leibler Information Criterion to define empirical moment equations leading to estimators with improved predictive accuracy and mean square error in some small sample estimation contexts. Reference [21] provided an overview of information theoretic estimators for the SEM. Reference [22] demonstrated that maximum entropy estimation of the SEM has relevant application to spatial autoregressive models wherein autocorrelation parameters are inherently bounded and in circumstances when traditional spatial estimators become unstable. Reference [23] examined the effect of management factors on enterprise performance using a GME SEM estimator. Finally, [24] estimated spatial structural equation models that were also extended to a panel data framework.
In this paper we investigate a GME estimator for the linear SEM that is fundamentally different from traditional approaches and identify classes of problems (e.g., contaminated error models) in which the proposed estimator outperforms traditional estimators. The estimator: (1) is completely consistent with data and other model information constraints on parameters, even in finite samples; (2) has large sample justification in that, under regularity conditions, it retains properties of consistency and asymptotic normality to provide practitioners with means to apply standard hypothesis testing procedures; and (3) has the potential for improved finite sample properties relative to alternative traditional methods of estimation. The proposed estimator is a one-step instrumental variable-type estimator based on a nonlinear-in-parameters SEM model discussed in [1,7,25]. The method does not deal with data information by projecting it in the form of moment constraints but rather, in GME parlance, is based on data constraints that deal with the data in individual sample observation form. Additional information utilized in the GME estimator includes finite support spaces that are imposed on model parameters and disturbances, which allows users to incorporate a priori interval restrictions on the parameters of the model.
Monte Carlo (MC) sampling experiments are used to investigate the finite sample performance of the proposed GME estimator. In the small sample situations analyzed, the GME estimator is superior to two and three stage least squares based on mean square error considerations. Further, we demonstrate the improved robustness of GME relative to 3SLS in the case of contaminated error models. For larger sample sizes, the consistency of the GME estimator results in sampling behavior that emulates that of 2SLS and 3SLS estimators. Observations on power and size of asymptotic test statistics suggest that the GME does not dominate, nor is it dominated by, traditional testing methods. An empirical application is provided to demonstrate practical implementation of the GME estimator and to delineate inherent differences between GME and traditional estimators in finite samples. The empirical analysis also highlights the sensitivity of GME coefficient estimates and predictive fit to specification of error truncation points, underscoring the need for care in specifying the empirical error support.

The GME-Parameterized Simultaneous Equations Model
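Consider the SEM with G equations, which can be written in matrix form as:
$$\mathbf{Y}\boldsymbol{\Gamma} + \mathbf{X}\mathbf{B} + \mathbf{E} = \mathbf{0} \qquad (1)$$
where $\mathbf{Y}$ is an $N \times G$ matrix of jointly determined endogenous variables, $\boldsymbol{\Gamma}$ is an invertible $G \times G$ matrix of structural coefficients of the endogenous variables, $\mathbf{X}$ is an $N \times K$ matrix of exogenous variables, $\mathbf{B}$ is a $K \times G$ matrix of coefficients of exogenous variables, and $\mathbf{E}$ is an $N \times G$ matrix of structural disturbances. The reduced form model is:
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\Pi} + \mathbf{V} \qquad (2)$$
where $\boldsymbol{\Pi} = -\mathbf{B}\boldsymbol{\Gamma}^{-1}$ is a $K \times G$ matrix of reduced form coefficients and $\mathbf{V} = -\mathbf{E}\boldsymbol{\Gamma}^{-1}$ is an $N \times G$ matrix of reduced form disturbances. The reduced form for the ith endogenous variable is:
$$\mathbf{y}_i = \mathbf{X}\boldsymbol{\pi}_i + \mathbf{v}_i \qquad (3)$$
The ith equation in Equation (1) can be rewritten in terms of a nonlinear structural parameter representation of the reduced form model as [1]:
$$\mathbf{y}_i = \mathbf{X}\boldsymbol{\Pi}_{(-i)}\boldsymbol{\gamma}_i + \mathbf{X}_i\boldsymbol{\beta}_i + \boldsymbol{\mu}_i = \mathbf{Z}_i\boldsymbol{\delta}_i + \boldsymbol{\mu}_i \qquad (4)$$
where $\mathbf{Z}_i = [\mathbf{X}\boldsymbol{\Pi}_{(-i)},\ \mathbf{X}_i]$ and $\boldsymbol{\delta}_i = \mathrm{vec}(\boldsymbol{\gamma}_i, \boldsymbol{\beta}_i)$. In general the notation (-i) in the subscript of a variable represents the explicit exclusion of the ith column vector, such as $\mathbf{y}_i$ being excluded from $\mathbf{Y}$ to form $\mathbf{Y}_{(-i)}$, in addition to the exclusion of any other column vectors implied by the structural restrictions. Then $\mathbf{X}_i$ represents the $K_i$ exogenous variables with nonzero coefficients in the ith equation, and $\boldsymbol{\beta}_i$ is the corresponding $K_i \times 1$ vector of coefficients.
Historically, Equation (4) has provided motivation for two stage least squares (2SLS) and three stage least squares (3SLS) estimators. The presence of the right hand side endogenous variables $\mathbf{Y}_{(-i)}$ in a structural equation yields biased and inconsistent OLS estimates of its coefficients [1]. In 2SLS and 3SLS, the first stage is to approximate $E[\mathbf{Y}_{(-i)}]$ by applying ordinary least squares (OLS) to the unrestricted reduced form model in Equation (2) and thereby obtain predicted values of $\mathbf{Y}_{(-i)}$. Then, using the predicted values to replace $E[\mathbf{Y}_{(-i)}]$, the second stage is to estimate the model in Equation (4) with OLS. In the event that the error terms are normally distributed, homoskedastic, and serially independent, the 3SLS estimator is asymptotically equivalent to the asymptotically efficient full-information maximum likelihood (FIML) estimator [21]. Under the same conditions, it is equivalent to apply FIML to either Equation (1) or to Equation (4) under the restriction $\boldsymbol{\Pi} = -\mathbf{B}\boldsymbol{\Gamma}^{-1}$.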
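The two-stage logic described above can be sketched in a few lines. This is a minimal illustration, not the estimator studied in this paper: it assumes a single structural equation with one right hand side endogenous variable, one exogenous instrument, and no intercept, and the function names (`ols_slope`, `two_stage_least_squares`, `simulate`) and the simulated design are hypothetical.

```python
import random

def ols_slope(x, y):
    """No-intercept OLS slope of y on x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def two_stage_least_squares(y1, y2, x):
    """2SLS for y1 = gamma*y2 + e, using the exogenous variable x as instrument."""
    pi_hat = ols_slope(x, y2)              # stage 1: reduced form y2 = pi*x + v
    y2_hat = [pi_hat * xi for xi in x]     # predicted values approximate E[y2]
    return ols_slope(y2_hat, y1)           # stage 2: OLS of y1 on the predictions

def simulate(n, gamma=0.5, pi=1.0, seed=0):
    """Assumed toy design: the structural error e is correlated with v."""
    rng = random.Random(seed)
    x, y1, y2 = [], [], []
    for _ in range(n):
        xi = rng.gauss(0, 1)
        v = rng.gauss(0, 1)
        e = 0.8 * v + 0.6 * rng.gauss(0, 1)   # correlation with v creates endogeneity
        y2i = pi * xi + v
        x.append(xi)
        y2.append(y2i)
        y1.append(gamma * y2i + e)
    return y1, y2, x

y1, y2, x = simulate(5000)
gamma_ols = ols_slope(y2, y1)                    # inconsistent under endogeneity
gamma_2sls = two_stage_least_squares(y1, y2, x)  # consistent for gamma = 0.5
print(gamma_ols, gamma_2sls)
```

In this assumed design the OLS slope converges to about 0.9 rather than the true 0.5, while the two-stage estimate settles near 0.5 as the sample grows.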

GME Estimation of the SEM
Following the maximum entropy principle, the entropy of a distribution of probabilities $\mathbf{q} = (q_1, \ldots, q_N)'$, with $\sum_{n=1}^{N} q_n = 1$, is defined by $H(\mathbf{q}) = -\sum_{n=1}^{N} q_n \ln q_n$ [26]. The value of H(q) reaches a maximum when $q_n = N^{-1}$ for n = 1,...,N, which characterizes the uniform distribution. Generalizations of the entropy function that have been examined elsewhere in the econometrics and statistics literature include the Cressie-Read power divergence statistic [18], Kullback-Leibler Information Criterion [27], and the α-entropy measure [28]. We restrict our analysis to the entropy objective function due to its efficiency and robustness properties [18], and its current universal use within the context of GME applications [9].
GME estimators previously proposed for the SEM include (a) the data constrained estimator for the general linear model, hereafter GME-D, which amounts to applying the GME principle to a vectorized version of the structural model in Equation (1); and (b) a two stage estimator analogous to 2SLS whereby GME-D is applied to the reduced form model in the first stage and to the structural model in the second stage, hereafter GME-2S. Alternatively, [10] applied the GME principle to the reduced form model in Equation (3) with the restriction $\boldsymbol{\Pi} = -\mathbf{B}\boldsymbol{\Gamma}^{-1}$ imposed, hereafter GME-GJM. Our approach follows 2SLS and 3SLS in the sense that the restriction $\boldsymbol{\Pi} = -\mathbf{B}\boldsymbol{\Gamma}^{-1}$ is not explicitly enforced and $E[\mathbf{Y}_{(-i)}]$ is replaced by $\mathbf{X}\boldsymbol{\Pi}_{(-i)}$. However, unlike 2SLS and 3SLS, our approach is formulated under the GME principle completely consistent with Equation (4) retained as a nonlinear constraint and concurrently solved with the unrestricted reduced form model in Equation (3) to identify structural and reduced form coefficient estimates. Reference [7] refers to Equations (3) and (4) as a nonlinear-in-parameters (NLP) form of the SEM model.
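As a small numerical check of the entropy definition, the following sketch evaluates H(q) for three hypothetical probability vectors and shows that the uniform distribution attains the maximum value ln N. The function name `entropy` and the example vectors are illustrative only.

```python
import math

def entropy(q):
    """Shannon entropy H(q) = -sum_n q_n ln q_n, taking 0 ln 0 as 0."""
    return -sum(p * math.log(p) for p in q if p > 0)

uniform = [0.25] * 4                 # q_n = 1/N with N = 4
skewed = [0.7, 0.1, 0.1, 0.1]        # same support, less uncertainty
degenerate = [1.0, 0.0, 0.0, 0.0]    # all mass on one point

print(entropy(uniform), math.log(4)) # the two values coincide
print(entropy(skewed), entropy(degenerate))
```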
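To formulate a GME estimator for the NLP model of the SEM, henceforth referred to as GME-NLP, parameters and disturbance terms of Equations (3) and (4) are reparameterized as convex combinations of reference support points and unknown convexity weights. Support matrices $\mathbf{S}^i$ for $i = \pi, \gamma, \beta, z, w$ identify finite bounded feasible spaces for individual parameters and disturbances, and $\mathbf{p}^i$ are the corresponding weight vectors, so that $\boldsymbol{\pi} = \mathbf{S}^{\pi}\mathbf{p}^{\pi}$, $\boldsymbol{\gamma} = \mathbf{S}^{\gamma}\mathbf{p}^{\gamma}$, $\boldsymbol{\beta} = \mathbf{S}^{\beta}\mathbf{p}^{\beta}$, $\boldsymbol{\mu} = \mathbf{S}^{z}\mathbf{p}^{z}$, and $\mathbf{v} = \mathbf{S}^{w}\mathbf{p}^{w}$. With $\mathbf{p} = \mathrm{vec}(\mathbf{p}^{\pi}, \mathbf{p}^{\gamma}, \mathbf{p}^{\beta}, \mathbf{p}^{z}, \mathbf{p}^{w})$, the GME-NLP estimator is defined by:
$$\max_{\mathbf{p}} \ \{-\mathbf{p}'\ln \mathbf{p}\} \qquad (5)$$
subject to:
$$\mathbf{y}_i = \mathbf{X}(\mathbf{S}^{\pi}\mathbf{p}^{\pi})_{(-i)}(\mathbf{S}^{\gamma}_i\mathbf{p}^{\gamma}_i) + \mathbf{X}_i(\mathbf{S}^{\beta}_i\mathbf{p}^{\beta}_i) + \mathbf{S}^{z}_i\mathbf{p}^{z}_i \qquad (6)$$
$$\mathbf{y}_i = \mathbf{X}(\mathbf{S}^{\pi}_i\mathbf{p}^{\pi}_i) + \mathbf{S}^{w}_i\mathbf{p}^{w}_i \qquad (7)$$
$$\mathbf{1}'\mathbf{p}_j = 1 \ \text{ for the weight vector } \mathbf{p}_j \text{ of every individual parameter and disturbance} \qquad (8)$$
The $\mathbf{S}^i$ support matrices (for $i = \pi, \gamma, \beta, z, w$) present in Equations (6) and (7) consist of user supplied reference support points defining feasible spaces for parameters and disturbances. For example, $\mathbf{S}^{w}$ is a block diagonal matrix whose block for the nth observation of the gth equation is the row vector $(s^{w}_{ng1}, \ldots, s^{w}_{ngM})$, where the nth disturbance term of the gth equation with M support points is defined, in summation notation, as $v_{ng} = \sum_{m=1}^{M} s^{w}_{ngm}\, p^{w}_{ngm}$. Equation (8) contains the required adding up conditions for each of the sets of convexity weights used in forming the GME-NLP estimator. Nonnegativity of the weights is an inherent characteristic of the maximum entropy objective and does not need to be explicitly enforced with inequality constraints. Regarding notation in (8), the total number of unknown parameters in the structural and reduced form equations is the number of unknown $\beta_{kg}$'s plus the number of unknown $\gamma_{ig}$'s plus the KG reduced form parameters, the $\pi_{kg}$'s.
Optimizing the objective function defined in Equation (5) optimizes the entropy in the parameter and disturbance spaces for both the structural model in Equation (6) and the reduced form model in Equation (7). The optimized objective function can mitigate the detrimental effects of ill-conditioned explanatory and/or instrumental variables and extreme outliers due to heavy tailed sampling distributions. In these circumstances traditional estimators are unstable and often represent an unsatisfactory basis for estimation and inference [20,25,29].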
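To make the reparameterization concrete, the sketch below solves a deliberately simplified data-constrained GME problem, the location model y_n = β + e_n, rather than the GME-NLP system itself. It assumes three support points for β and for each disturbance, and it minimizes the unconstrained dual by plain gradient descent; the function names, support values, data, and step size are all illustrative choices tuned for this toy example.

```python
import math

def maxent_weights(support, lam):
    """Weights p_m proportional to exp(-s_m * lam), the entropy-maximizing form."""
    logits = [-s * lam for s in support]
    top = max(logits)                          # subtract the max for numerical stability
    expo = [math.exp(v - top) for v in logits]
    total = sum(expo)
    return [e / total for e in expo]

def expect(support, weights):
    """Convex combination of support points, e.g. beta = sum_m s_m p_m."""
    return sum(s * w for s, w in zip(support, weights))

def gme_location(y, s_beta, s_err, step=0.005, iters=20000):
    """GME-D estimate of beta in y_n = beta + e_n via gradient descent on the dual."""
    lam = [0.0] * len(y)                       # one multiplier per data constraint
    for _ in range(iters):
        beta = expect(s_beta, maxent_weights(s_beta, sum(lam)))
        resid = [expect(s_err, maxent_weights(s_err, l)) for l in lam]
        # Dual gradient: violation of each data constraint y_n = beta + e_n
        grad = [yn - beta - en for yn, en in zip(y, resid)]
        lam = [l - step * g for l, g in zip(lam, grad)]
    p = maxent_weights(s_beta, sum(lam))
    w = [maxent_weights(s_err, l) for l in lam]
    return p, w

y = [1.2, 0.7, 1.9, 1.1, 0.4]                  # hypothetical sample
s_beta = [-5.0, 0.0, 5.0]                      # parameter support (assumed)
s_err = [-3.0, 0.0, 3.0]                       # disturbance support (assumed)
p, w = gme_location(y, s_beta, s_err)
beta_hat = expect(s_beta, p)
residuals = [expect(s_err, wn) for wn in w]
print(beta_hat, residuals)
```

The recovered weights sum to one by construction, and each data constraint holds at the solution, mirroring the roles of the adding up conditions and data constraints in the full problem. The nonlinear constraint in the GME-NLP structural equation would require a general constrained optimizer instead of this simple dual iteration.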
We emphasize that the proposed GME-NLP is a data-constrained estimator. Equations (5)-(8) constitute a data-constrained model in which the regression models themselves, as opposed to moment conditions based on them, represent constraining functions to the entropy objective function. Reference [16] pointed out that outside the Gaussian error model, estimation based on sample moments can be inefficient relative to other procedures. Reference [9] provided MC evidence that data-constrained GME models, making use of the full set of observations, outperformed moment-constrained GME models in mean square error. In the GME-NLP model, the constraints in Equations (6) and (7) remain completely consistent with sample data information in Equations (3) and (4).
We also emphasize that the proposed GME-NLP estimator is a one-step approach, simultaneously solving for reduced form and structural parameters. As a result, the nonlinear specification of Equation (6) leads to first order optimization conditions (Equation (16) derived in the Appendix) that are different from other multiple-step or asymptotically justified estimators. The most obvious difference is that the first order conditions do not require orthogonality between right hand side variables and error terms, i.e., GME-NLP relaxes the orthogonality condition between instruments and the structural error term. Perhaps more importantly, multiple-step estimators (e.g., 2SLS or GME-2S) only approximate the NLP model and ignore nonlinear interactions between reduced and structural form coefficients. Thus, the constraints in Equations (6) and (7) are not completely satisfied by multiple-step procedures, yielding an estimator that is not fully consistent with the entire information set underlying the specification of the model. Although this is not a critical issue in large sample estimation, as demonstrated below, estimation inefficiency can be substantial in small samples if multiple-step estimators do not adequately approximate the NLP model.
The proposed GME-NLP estimator has some econometric limitations similar to, and other limitations which set it apart from, 2SLS that are evident when inspecting Equations (5)-(8). Firstly, like 2SLS, the residuals in Equations (4) and (6) are not identical to those of the original structural model, nor are they the same as the reduced form error term, except when evaluated at the true parameter values. Secondly, the GME-NLP estimator does not attempt to correct for contemporaneous correlation among the errors of the structural equations. Although a relevant efficiency issue, contemporaneous correlation is left for future research. Thirdly, and perhaps most importantly, the use of bounded disturbance support spaces in GME estimation introduces a specification issue in empirical analysis that typically does not arise with traditional estimators. These issues are discussed in more detail ahead.

Parameter Restrictions
In practice, parameter restrictions for coefficients of the SEM have been imposed using constrained maximum likelihood or Bayesian regression [7,30]. Neither approach is necessarily simple to specify analytically or to estimate empirically, and each has its empirical advantages and disadvantages. For example, Bayesian estimation is well-suited for representing uncertainty with respect to model parameters, but can also require extensive MC sampling when numerical estimation techniques are required, as is often the case in non-normal, non-conjugate prior model contexts. In comparison to constrained maximum likelihood or Bayesian analysis, the GME-NLP estimator also enforces restrictions on parameter values, is arguably no more difficult to specify or estimate, and does not require the use of MC sampling in the estimation phase of the analysis. Moreover, and in contrast to constrained maximum likelihood or the typical parametric Bayesian analysis, GME-NLP does not require explicit specification of the distributions of the disturbance terms or of the parameter values. However, both the coefficient and the disturbance support spaces are compact in the GME-NLP estimation method, which may not apply in some idealized empirical modeling contexts.
Imposing bounded support spaces on coefficients and error terms has several implications for GME estimation. Consider support spaces for coefficients. Selecting bounds and intermediate reference support points provides an effective way to restrict parameters of the model to intervals. If prior knowledge about coefficients is limited, wider truncation points can be used to increase the confidence that the support space contains the true parameter values. If knowledge exists about, say, the sign of a specific coefficient from economic theory, this can be straightforwardly imposed together with a reasonable bound on the coefficient.
Importantly, there is a bias-efficiency tradeoff that arises when parameter support spaces are specified in terms of bounded intervals. A disadvantage of bounded intervals is that they will generally introduce bias into the GME estimator unless the intervals happen to be centered on the true values of the parameters. An advantage of restricting parameters to finite intervals is that they can lead to increases in efficiency by lowering parameter estimation variability. In the MC analysis ahead, it is demonstrated that the bias introduced by bounded parameter intervals in the GME-NLP estimator can be more than compensated for by substantial decreases in variability, leading to notable increases in overall estimation efficiency.
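The tradeoff can be read off the standard decomposition of mean square error into variance and squared bias. The numbers in the second display are purely hypothetical and are not taken from the experiments reported in this paper; they only illustrate how a biased estimator with lower variance can have the smaller MSE.

```latex
\[
\mathrm{MSE}(\hat{\beta})
  = E\big[(\hat{\beta}-\beta)^{2}\big]
  = \mathrm{Var}(\hat{\beta}) + \big[E(\hat{\beta})-\beta\big]^{2}
\]
% Hypothetical illustration: an unbiased estimator with variance 0.25
% versus a bounded-support estimator with bias 0.10 and variance 0.09.
\[
\underbrace{0.25 + 0^{2}}_{\text{unbiased}} = 0.25
\qquad\text{versus}\qquad
\underbrace{0.09 + 0.10^{2}}_{\text{bounded support}} = 0.10
\]
```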
In practice, support spaces for disturbances can always be chosen in a manner that provides a reasonable approximation to the true disturbance distribution because upper and lower truncation points can always be selected sufficiently wide to contain the true disturbances of regression models [31]. The number, M, of support points for each disturbance can be chosen to account for additional information relating to higher moments (e.g., skewness and kurtosis) of each disturbance term. MC experiments by [9] demonstrated that between 2 and 10 support points are acceptable for empirical applications.
For the GME-NLP estimator, identifying bounds for the disturbance support spaces is complicated by the interaction among truncation points of the parameters and disturbance support points of both the reduced and structural form models. Yet, several informative generalizations can be drawn. First, [32] demonstrated that ordinary least squares-like behavior can be obtained by appropriately selecting truncation points of the GME-D estimator of the general linear model. This has direct implications for SEM estimation in that appropriately selected truncation points of the GME-2S estimator lead to 2SLS-like behavior. However, as demonstrated ahead, given the nonlinear interactions between the structural and reduced form models, adjusting truncation points of the GME-NLP does not necessarily lead to two stage like behavior in finite samples. Second, the reduced form model in Equation (3) and the nonlinear structural parameter representation of the reduced form model in Equation (4) have identical error structure at the true parameter values. Hence, in the empirical applications below, we specify identical support matrices for error terms of both the structural and reduced form models. Third, in the limiting case where the disturbance boundary points of the GME-NLP structural model expand in absolute value to infinity, the parameter estimates converge to the mean of their support points.
Given ignorance regarding the disturbance distribution, [9,10] suggest using a sample scale parameter and the multiple-sigma truncation rule to determine error bounds. For example, the three sigma rule for random variables states that the probability of a unimodal continuous random variable assuming outcomes distant from its mean by more than three standard deviations is at most 5% [33]. Intuitively, this multiple-sigma truncation rule provides a means of encompassing an arbitrarily large proportion of the disturbance support space. From the empirical evidence presented below, it appears that combining the three sigma rule with a sample scale parameter to estimate the GME-NLP model is a useful approach.
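A minimal sketch of the multiple-sigma truncation rule follows. It assumes that the sample scale parameter is the sample standard deviation of the supplied data and that support points are equally spaced and symmetric about zero; both are illustrative choices, and the function names are hypothetical. The tail bound is the Vysochanskij-Petunin inequality for unimodal distributions, which gives 4/81 (about 4.9%) at three standard deviations.

```python
import statistics

def vp_tail_bound(k):
    """Upper bound on P(|X - mean| >= k*sd) for a unimodal X (needs k > sqrt(8/3))."""
    if k <= (8.0 / 3.0) ** 0.5:
        raise ValueError("bound in this form requires k > sqrt(8/3)")
    return 4.0 / (9.0 * k * k)

def sigma_rule_support(sample, j=3, m=3):
    """M equally spaced disturbance support points on [-j*s, j*s]."""
    if m < 2:
        raise ValueError("at least two support points are required")
    s = statistics.stdev(sample)       # sample scale parameter (assumed choice)
    lower, upper = -j * s, j * s
    step = (upper - lower) / (m - 1)
    return [lower + i * step for i in range(m)]

sample = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical data
support = sigma_rule_support(sample, j=3, m=3)
print(vp_tail_bound(3), support)
```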

GME-NLP Asymptotic Properties and Inference
To derive consistency and asymptotic normality results for the GME-NLP estimator, we assume the following regularity conditions.

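R1. The N rows of the $N \times G$ disturbance matrix $\mathbf{E}$ are independent random drawings from a G-dimensional population with zero mean vector and unknown finite covariance matrix $\boldsymbol{\Sigma}$.
R2. The $N \times K$ matrix $\mathbf{X}$ of exogenous variables has rank K and consists of nonstochastic elements, with $\lim_{N \to \infty} N^{-1}\mathbf{X}'\mathbf{X} = \boldsymbol{\Omega}$, where $\boldsymbol{\Omega}$ is a positive definite matrix.
R3. The elements $v_{ng}$ and $\mu_{ng}$ of the vectors $\mathbf{v}_g$ and $\boldsymbol{\mu}_g$ (n = 1,...,N, g = 1,...,G) are independent and bounded such that they lie strictly inside the disturbance support interval $[c_{g1}, c_{gM}]$, separated from its endpoints by a positive margin, for large enough positive $c_{gM} = -c_{g1}$. The probability density function of $\boldsymbol{\mu}$ is assumed to be symmetric about the origin with a finite covariance matrix.
R4. The true values of the model parameters can be enclosed within a bounded interval.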

Estimator Properties
The regularity conditions (R1)-(R5) provide a basic set of assumptions sufficient to establish asymptotic properties for the GME-NLP estimator of the SEM. For notational convenience let $\boldsymbol{\delta} = \mathrm{vec}(\boldsymbol{\delta}_1, \ldots, \boldsymbol{\delta}_G)$.
Theorem 1. Under the regularity conditions (R1)-(R5), the GME-NLP estimator $\hat{\boldsymbol{\delta}}$ is a consistent estimator of the true coefficient values $\boldsymbol{\delta}$.
The intuition behind the proof is that without the reduced form component in Equation (7) the parameters of the structural component in Equation (6) are not identified. As shown in the Appendix, the reduced form component yields estimates that are consistent and contribute to identifying the structural parameters, and the structural component in Equation (6) ties the structural coefficients to the data and draws the GME-NLP estimates toward the true parameter values as the sample size increases.

Theorem 2. Under the conditions of Theorem 1, the GME-NLP estimator $\hat{\boldsymbol{\delta}}$ is asymptotically normally distributed, with an asymptotic covariance matrix expressed as a product of $\boldsymbol{\Omega}$ matrices.
Estimators of the SEM are generally categorized as "full information" (e.g., 3SLS or FIML) or "limited information" (e.g., 2SLS or LIML) estimators. GME-NLP is not a full information estimator because the estimator neither enforces the restriction $\boldsymbol{\Pi} = -\mathbf{B}\boldsymbol{\Gamma}^{-1}$ nor explicitly characterizes the contemporaneous correlation of the disturbance terms. An advantage of GME-NLP is that it is completely consistent with data constraints in both small and large samples, because we concurrently estimate the parameters of the reduced form and structural models. As a limited information estimator, GME-NLP has several additional attractive characteristics. First, similar to other limited information estimators, it is likely to be more robust to misspecification than a full information alternative because in the latter case misspecification of any one equation can lead to inconsistent estimation of all the equations in the system [34]. Second, GME-NLP is easily applied in the case of a single equation, G = 1, and it retains the asymptotic properties identified above. Finally, the single equation case is a natural generalization of the data-constrained GME estimator for the general linear model.

Hypothesis Tests
Because the GME-NLP estimator $\hat{\boldsymbol{\delta}}$ is consistent and asymptotically normally distributed, asymptotically valid normal and chi-square test statistics can be used to test hypotheses about $\boldsymbol{\delta}$. To implement such tests a consistent estimate of the asymptotic covariance of $\hat{\boldsymbol{\delta}}$ is required. The required $\boldsymbol{\Omega}$ matrix can be estimated in either of two ways: using the quantities $\hat{w}_{ng}(\hat{\boldsymbol{\delta}})$, which are the elements of $\boldsymbol{\Xi}^{w}_{g}$ as defined in the Appendix, or alternatively using equation-level estimates computed for i, j = 1,...,G. Combining these elements yields the estimated asymptotic covariance matrix of $\hat{\boldsymbol{\delta}}$.

Wald Tests
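To define Wald tests on the elements of $\boldsymbol{\delta}$, let $H_0: \mathbf{R}(\boldsymbol{\delta}) = \mathbf{0}$ be the null hypothesis to be tested. Here $\mathbf{R}(\boldsymbol{\delta})$ is a continuously differentiable L-dimensional vector function whose Jacobian $\partial\mathbf{R}(\boldsymbol{\delta})/\partial\boldsymbol{\delta}'$ has rank L. In the special case of a linear null hypothesis $H_0: \mathbf{R}\boldsymbol{\delta} = \mathbf{r}$, the Jacobian is the constant matrix $\mathbf{R}$. It follows from Theorem 5.37 in [35] that $\mathbf{R}(\hat{\boldsymbol{\delta}})$ is asymptotically normally distributed under the null hypothesis. The Wald test statistic has a $\chi^2$ limiting distribution with L degrees of freedom under the null hypothesis and is given as:
$$W = \mathbf{R}(\hat{\boldsymbol{\delta}})' \left[ \frac{\partial \mathbf{R}(\hat{\boldsymbol{\delta}})}{\partial \boldsymbol{\delta}'} \, \widehat{\mathrm{Var}}(\hat{\boldsymbol{\delta}}) \, \frac{\partial \mathbf{R}(\hat{\boldsymbol{\delta}})'}{\partial \boldsymbol{\delta}} \right]^{-1} \mathbf{R}(\hat{\boldsymbol{\delta}})$$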
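For the linear case, the Wald statistic reduces to a quadratic form that is straightforward to compute. The sketch below is a generic textbook implementation under the assumption that a coefficient vector and an estimate of its covariance matrix are already available; the function names, the example numbers, and the small table of 5% chi-square critical values are illustrative and are not results from this paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def wald_statistic(delta_hat, cov, R, r):
    """W = (R d - r)' [R V R']^{-1} (R d - r) for the linear null R delta = r."""
    L, K = len(R), len(delta_hat)
    g = [sum(R[i][k] * delta_hat[k] for k in range(K)) - r[i] for i in range(L)]
    RV = [[sum(R[i][k] * cov[k][j] for k in range(K)) for j in range(K)] for i in range(L)]
    RVR = [[sum(RV[i][k] * R[j][k] for k in range(K)) for j in range(L)] for i in range(L)]
    z = solve(RVR, g)
    return sum(gi * zi for gi, zi in zip(g, z))

CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}   # 5% critical values by degrees of freedom

delta_hat = [0.5, 1.0]                          # hypothetical estimates
cov = [[0.01, 0.0], [0.0, 0.04]]                # hypothetical covariance estimate
W_joint = wald_statistic(delta_hat, cov, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
W_equal = wald_statistic(delta_hat, cov, [[1.0, -1.0]], [0.0])
print(W_joint, W_joint > CHI2_CRIT_05[2])
print(W_equal, W_equal > CHI2_CRIT_05[1])
```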

Monte Carlo Experiments
For the sampling experiments we set up an overdetermined simultaneous system with contemporaneously correlated errors that is similar, but not identical, to empirical models discussed in [10,36,37]. Reference [10] provided empirical evidence of the performance of the GME-GJM estimator for both ill-posed (multicollinearity) and well-posed problems using a sample size of 20 observations. In this study we focus on both smaller and larger sample size performance of the GME-NLP estimator, the size and power of single and joint hypothesis tests, and the performance of GME-NLP relative to 2SLS and 3SLS. In addition, the performance of GME-NLP is compared to Golan, Judge, and Miller's GME-GJM estimator. The estimation performance measure is the mean square error (MSE) between the empirical coefficient estimates and the true coefficient values.
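The performance measure can be computed from a set of replicated estimates as in the following sketch. It assumes that variances are averaged over the number of replications (rather than using a degrees-of-freedom correction), so that MSE equals squared bias plus variance exactly; the function name and the three example estimates are hypothetical.

```python
import math

def mc_summary(estimates, true_value):
    """Mean, bias, standard error, and MSE of replicated estimates of one coefficient."""
    reps = len(estimates)
    mean = sum(estimates) / reps
    var = sum((e - mean) ** 2 for e in estimates) / reps      # divisor R (assumed)
    mse = sum((e - true_value) ** 2 for e in estimates) / reps
    return {"mean": mean, "bias": mean - true_value, "se": math.sqrt(var), "mse": mse}

summary = mc_summary([1.0, 2.0, 3.0], true_value=2.5)        # hypothetical replications
print(summary)
```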

Parameters and Support Spaces
The parameters $\boldsymbol{\Gamma}$ and $\mathbf{B}$ and the covariance structure $\boldsymbol{\Sigma}$ of the structural system in Equation (1) are fixed at known values for the experiments. The exogenous variables are drawn from an iid N(0,1) distribution, while the errors for the structural equations are drawn from a multivariate normal distribution with mean zero and covariance $\boldsymbol{\Sigma} \otimes \mathbf{I}$ that is truncated at ±3 standard deviations.
To specify the GME models, additional information beyond that traditionally used in 2SLS and 3SLS is required. Upper and lower bounds, as well as intermediate support points for the individual coefficients and disturbance terms, are supplied for the GME-NLP and GME-GJM models along with starting values for the parameter coefficients. The difference in specification of GME-GJM relative to GME-NLP is that in the former, $\boldsymbol{\Pi} = -\mathbf{B}\boldsymbol{\Gamma}^{-1}$ replaces the structural model in Equation (6) and the GME-GJM objective function excludes any parameters associated with the structural form disturbance term. The upper and lower bounds of the support spaces specified for the structural and reduced form models are identical to [10] except that we use three rather than five support points. The error supports for the reduced form and structural model were specified symmetrically about zero using the three sigma rule, where $\sigma_i$ is the standard deviation of the errors from the ith equation, and the margin from R3 is set to 2.5 to ensure feasibility. See appendix material for a more complete discussion of computational issues.

Estimation Performance
Table 1 contains the mean values of the estimated $\boldsymbol{\Gamma}$ parameters based on 1,000 MC repetitions for sample sizes of 5, 25, 100, 400, and 1,600 observations per equation. From this information, we can infer several implications about the performance of the GME estimators. For a sample size of five observations per equation, 2SLS and 3SLS estimators provide no solution due to insufficient degrees of freedom. For five and 25 observations the GME-NLP and GME-GJM estimators have mean values that are similar, although GME-NLP exhibits more bias. When the sample size is 100, the GME-NLP estimator generally exhibits less bias. Like 2SLS and 3SLS, the GME-NLP estimator is converging to the true coefficient values as N increases to 1,600 observations per equation (3SLS estimates are not reported for 1,600 observations).
In Table 2 the standard error (SE) and MSE are reported for 3SLS and GME-NLP. The GME-NLP estimator has uniformly lower standard error and MSE than does 3SLS. For small samples of 25 observations the MSE performance of the GME-NLP estimator is vastly improved relative to the 3SLS estimator, which is consistent with MC results from other studies relating to other GME-type estimators [9,32]. As the sample size increases from 25 to 400 observations, both the standard error and mean squared error of the 3SLS and GME-NLP converge towards each other. Interestingly, even at a sample size of 100 observations the GME-NLP mean squared error remains notably superior to 3SLS.

Inference Performance
To investigate the size of the asymptotically normal test, the single hypothesis $H_0: \gamma_{ij} = k$ was tested with k set equal to the true values of the structural parameters. Critical values of the tests were based on a normal distribution with a 0.05 level of significance. An observation on the power of the respective tests was obtained by performing a test of significance whereby k = 0 in the preceding hypothesis. To complement this analysis, we investigated the size and power of a joint hypothesis that sets two structural parameters equal to specified values, using the Wald test. The scenarios were analyzed using 1,000 MC repetitions for sample sizes of 25, 100, and 400 per equation.
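The size and power calculations described above amount to counting rejections across replications. The sketch below assumes a two-sided test with the normal 5% critical value of 1.96 and takes the replicated estimates and standard errors as given; the function name and the four example replications are hypothetical and far fewer than the 1,000 repetitions used in the experiments.

```python
def rejection_rate(estimates, std_errors, k, critical=1.96):
    """Share of replications in which H0: coefficient = k is rejected (two-sided)."""
    rejections = sum(
        1 for b, se in zip(estimates, std_errors) if abs((b - k) / se) > critical
    )
    return rejections / len(estimates)

estimates = [0.20, 0.25, 0.45, 0.22]     # hypothetical replicated estimates
std_errors = [0.10, 0.10, 0.10, 0.10]    # hypothetical standard errors

size = rejection_rate(estimates, std_errors, k=0.222)   # true value: estimated test size
power = rejection_rate(estimates, std_errors, k=0.0)    # false value: estimated power
print(size, power)
```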
Table 3 contains the rejection probabilities for the true and false hypotheses of both the GME-NLP and 3SLS estimators. The single hypothesis test for the parameter $\gamma_{21} = 0.222$ based on the asymptotically normal test responded well for GME-NLP (3SLS), yielding an estimated test size of 0.066 (0.043) and power of 0.980 (0.964) at 400 observations per equation. In contrast, for the remaining parameters, the size and power of the hypothesis tests were considerably less satisfactory. This is due in part to the second and third equations having substantially larger disturbance variability.
For the joint hypothesis test based on the Wald test the size and power perform well for GME-NLP (3SLS) with an estimated test size of 0.047 (0.047) and power of 0.961 (0.934) at 400 observations. Overall, the results indicate that based on asymptotic test statistics GME-NLP does not dominate, nor is it dominated by, 3SLS.

Further Results: 3-Sigma Rule and Contaminated Errors
Further MC results are presented to demonstrate the sensitivity of the GME-NLP to the sigma truncation rule (Table 4) and to illustrate robustness of the GME-NLP relative to 3SLS in the presence of contaminated error models (Table 5). Each of these issues plays a critical role in empirical analysis of the SEM, while the latter can compound estimation problems, especially in small sample estimation.
To obtain the results in Table 4, the error supports for the reduced form and structural model were specified as before using the j-sigma rule, where $\sigma_i$ is the standard deviation of the errors from the ith equation, j = 3,4,5, and the margin from R3 is again set to 2.5 for solution feasibility. The results exhibit a tradeoff between bias and MSE specific to the individual coefficient estimates. For $\gamma_{21}$ the bias and the MSE decrease as the truncation points are shrunk from five to three sigma. In contrast, for the remaining coefficients in Table 4, the MSE increases as the truncation points are decreased. The bias decreases for $\gamma_{32}$ and $\gamma_{13}$ as the truncation points are shrunk, while the direction of bias is ambiguous for $\gamma_{12}$. Predominantly, the empirical standard error of the coefficients decreased with wider truncation points. Overall, these results underscore that the mean and standard error of GME-NLP coefficient values are sensitive to the choice of truncation points.
Results from Table 5 provide the mean and MSE of the distribution of coefficient estimates for 3SLS and GME-NLP when the error term is contaminated by outcomes from an asymmetric distribution [14,15]. For a given percentage level of contamination, the errors for the structural equations are drawn from a mixture of the multivariate normal distribution and the contaminating asymmetric distribution, and then truncated at ±3 standard deviations. We examine the robustness of 3SLS and GME-NLP with contamination levels of 0.1, 0.5, and 0.9. The error supports for the reduced form and structural model were specified with the three sigma rule. As evident in Table 5, when the percent of contamination induced in the error component of the SEM increases, performance of both estimators is detrimentally impacted. For 25 observations, the 3SLS coefficient estimates are much less robust to the contamination process than are the GME-NLP estimates as measured by the MSE values. At 100 observations the performance of 3SLS improves, but it still remains less robust than GME-NLP.
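A contaminated, truncated error process of the kind described above can be sketched as follows. The contaminating distribution used in the experiments is not reproduced here; this sketch assumes, purely for illustration, a centered exponential distribution (mean zero, unit standard deviation, right skewed) as the asymmetric contaminant, univariate standard normal baseline errors, and truncation by redrawing any outcome beyond three standard deviations. All function names are hypothetical.

```python
import random

def contaminated_draw(rng, level, bound=3.0):
    """One error draw: contaminant with probability `level`, else N(0,1); redraw if |e| > bound."""
    while True:
        if rng.random() < level:
            e = rng.expovariate(1.0) - 1.0   # assumed asymmetric contaminant
        else:
            e = rng.gauss(0.0, 1.0)
        if abs(e) <= bound:                  # truncation at +/- 3 standard deviations
            return e

def contaminated_sample(n, level, seed=0):
    rng = random.Random(seed)
    return [contaminated_draw(rng, level) for _ in range(n)]

def skewness(x):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

clean = contaminated_sample(5000, level=0.0)
heavy = contaminated_sample(5000, level=0.9)
print(skewness(clean), skewness(heavy))
```

With no contamination the sample skewness is close to zero, while at a contamination level of 0.9 it is clearly positive, which is the kind of departure from the Gaussian setting that the robustness comparison is meant to probe.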

Discussion
The performance of the GME-NLP estimator was evaluated using a variety of MC experiments. In small and medium sample situations (≤100 observations) the GME-NLP is MSE superior to 3SLS for the defined experiments. Increasing the sample size clearly demonstrated consistency of the GME-NLP estimator for the SEM. Regarding performance in single or joint hypothesis testing contexts, the empirical results indicate that the GME-NLP did not dominate, nor was it dominated by, 3SLS.
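The MSE comparisons above rest on the usual decomposition of mean square error into squared bias plus variance. The sketch below computes these summaries from a set of Monte Carlo replications; the function name `mc_summary`, the true value, and the simulated draws are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Mean, bias, standard error, and MSE of Monte Carlo estimates."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean()
    bias = mean - true_value
    se = est.std()  # ddof=0, so mse equals bias**2 + se**2 exactly
    mse = np.mean((est - true_value) ** 2)
    return {"mean": float(mean), "bias": float(bias),
            "se": float(se), "mse": float(mse)}

# Hypothetical replications of one coefficient estimate (1000 draws).
rng = np.random.default_rng(1)
draws = 0.25 + 0.1 * rng.standard_normal(1000)
summary = mc_summary(draws, true_value=0.2)
print(summary)
```

An estimator can therefore be MSE superior despite some bias, provided its sampling variance is sufficiently smaller.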
The MC evidence provided above indicates that applying the multiple-sigma truncation rule with a sample scale parameter to estimate the GME-NLP model is a useful empirical approach. Across the 3-, 4-, and 5-sigma rule sampling experiments, GME-NLP continued to dominate 3SLS in MSE for 25, 100, and 400 observations per equation. For wider truncation points the empirical SE of the coefficients decreased. However, these results also demonstrate that the GME-NLP coefficients are sensitive to the choice of truncation points, with no consensus in choosing narrower (3-sigma) over wider (5-sigma) truncation supports under a Gaussian error structure. We suggest that additional research is needed to optimally identify error truncation points.
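As an illustration of the multiple-sigma truncation rule, the sketch below builds a symmetric error support from a sample scale estimate. The three-point form of the support, the use of the sample standard deviation of a single series as the scale parameter, and the name `error_support` are assumptions for illustration only.

```python
import numpy as np

def error_support(sample, j=3):
    """Symmetric three-point support (-j*s, 0, j*s), where s is the sample
    standard deviation of the supplied series (assumed scale estimate)."""
    s = np.std(np.asarray(sample, dtype=float), ddof=1)
    return np.array([-j * s, 0.0, j * s])

# Hypothetical series; compare the 3-, 4-, and 5-sigma supports.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
for j in (3, 4, 5):
    print(j, error_support(y, j))
```

Wider supports leave more room for the error terms, which is one way to read the sensitivity of the coefficient estimates to the truncation points.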
Finally, the GME-NLP estimator exhibited more robustness in the presence of contaminated errors relative to 3SLS. The MC analysis illustrates that deviations from normality assumptions in asymptotically justified econometric-statistical models lead to dramatically less robust outcomes in small samples. References [9,16] emphasized that under traditional econometric assumptions, when samples are Gaussian in nature and sample moments are taken as minimal sufficient statistics, no information may be lost. However, they point out that outside the Gaussian setting, reducing data constraints to moment constraints can be a wasteful use of sample information and can result in estimators that are less than fully efficient. The above MC analysis suggests that GME-NLP, which relies on full sample information but does not rely on a full parametric specification such as maximum likelihood, can be more robust to alternative error distributions.

Empirical Illustration
In this section, an empirical application is examined to demonstrate implementation of the GME-NLP estimator. The application is the well-known three-equation system that comprises Klein's Model I, which further benchmarks the GME-NLP estimator relative to least squares.

Klein Model
Klein's Model I was selected as an empirical application because it has been extensively applied in many studies. Klein's macroeconomic model is highly aggregated with relatively low parameter dimensionality, making it useful for pedagogical purposes. It is a three-equation SEM based on annual data for the United States from 1920 to 1941. All variables are in billions of constant dollars with base year 1934 (for a complete description of the model and data see [1,38]).
The model comprises three stochastic equations and five identities. The stochastic equations include demand for consumption, investment, and labor. In Klein's consumption function, $CN_t$ is consumption, $W_{1t}$ is wages earned by workers in the private sector, $W_{2t}$ is wages earned by government workers, $P_t$ is nonwage income (profit), and $\varepsilon_{1t}$ is a stochastic error term. This equation describes aggregate consumption as a function of the total wage bill and current and lagged profit. In the investment equation, $I_t$ is net investment, $K_t$ is the stock of capital goods at the end of the year, and $\varepsilon_{2t}$ is a stochastic error term. This equation implies that net investment reacts to current and lagged profits, as well as beginning-of-year capital stocks. In the demand for labor equation, $E_t$ is a measure of private product and $\varepsilon_{3t}$ is a stochastic error term. It implies that the wage bill paid by private industry varies with the current and lagged total private product and a time trend. A time trend is included to capture institutional changes over the period, primarily the bargaining strength of labor.
Five identities complete the structural model. The first identity states that national income, $Y_t$, plus business taxes, $TX_t$, equals the sum of goods and services demanded by consumers, $CN_t$, plus investors, $I_t$, plus net government demands, $G_t + W_{2t}$. The second identity defines total income, $Y_t$, as the sum of profit, $P_t$, and wages, $W_t$, while the third implies that end-of-year capital stocks, $K_t$, are equal to investment, $I_t$, plus last year's end-of-year capital stock, $K_{t-1}$. In the fourth identity, the total wage bill, $W_t$, is the sum of wages earned from the private sector, $W_{1t}$, and wages earned from the government, $W_{2t}$. The fifth identity states that private product, $E_t$, is equal to income, $Y_t$, plus business taxes, $TX_t$, less government wages, $W_{2t}$.
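For reference, the verbal description above can be written out as a system of equations. This is a sketch under stated assumptions: the coefficient symbols $a_k$, $b_k$, $c_k$, the error symbols $\varepsilon_{it}$, and the label $\mathrm{trend}_t$ are placeholders that need not match the paper's notation, and beginning-of-year capital is taken to be $K_{t-1}$. The identities follow the text as stated.

```latex
\begin{align*}
CN_t   &= a_0 + a_1 (W_{1t} + W_{2t}) + a_2 P_t + a_3 P_{t-1} + \varepsilon_{1t}
        && \text{(consumption)} \\
I_t    &= b_0 + b_1 P_t + b_2 P_{t-1} + b_3 K_{t-1} + \varepsilon_{2t}
        && \text{(investment)} \\
W_{1t} &= c_0 + c_1 E_t + c_2 E_{t-1} + c_3\,\mathrm{trend}_t + \varepsilon_{3t}
        && \text{(labor demand)} \\[4pt]
Y_t + TX_t &= CN_t + I_t + G_t + W_{2t} \\
Y_t    &= P_t + W_t \\
K_t    &= I_t + K_{t-1} \\
W_t    &= W_{1t} + W_{2t} \\
E_t    &= Y_t + TX_t - W_{2t}
\end{align*}
```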

Klein Model I Results
Table 6 contains the estimates of the three stochastic equations using ordinary least squares (OLS), two stage least squares (2SLS), three stage least squares (3SLS), and GME-NLP. Parameter restrictions for GME-NLP were specified using the fairly uninformative reference support points (-50, 0, 50) for the intercept, (-5, 0, 5) for the slope parameters of the reduced form models, and (-2, 0, 2) for the slope parameters of the structural form models. Truncation points for the error supports of the structural model are specified using both three- and five-sigma rules.
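To illustrate how a three-point support such as (-5, 0, 5) represents a coefficient, the sketch below finds the maximum entropy weights on the support points whose weighted sum equals a given coefficient value. It shows only the single-coefficient reparameterization idea, not the full GME-NLP problem; the function name `maxent_weights` and the bisection solver are illustrative assumptions.

```python
import numpy as np

def maxent_weights(support, value):
    """Maximum entropy weights p on `support` with sum(p) = 1 and
    p @ support = value. The solution has the form p_m proportional to
    exp(lam * z_m); lam is found by bisection on the implied mean."""
    z = np.asarray(support, dtype=float)
    if not z.min() < value < z.max():
        raise ValueError("value must lie strictly inside the support")

    def weights(lam):
        a = lam * z
        w = np.exp(a - a.max())  # subtract the max for numerical stability
        return w / w.sum()

    lo, hi = -1.0, 1.0
    while weights(lo) @ z > value:  # widen the bracket until it contains lam
        lo *= 2.0
    while weights(hi) @ z < value:
        hi *= 2.0
    for _ in range(200):            # the implied mean is increasing in lam
        mid = 0.5 * (lo + hi)
        if weights(mid) @ z < value:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))

support = (-5.0, 0.0, 5.0)
print(maxent_weights(support, 0.0))  # center of the support
print(maxent_weights(support, 2.0))  # weights tilt toward the upper point
```

With no information pulling the coefficient away from the center, the weights are uniform and the implied value is the middle support point, which is why symmetric supports centered at zero are described as fairly uninformative.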
For the given truncation points, the GME-NLP estimates of asymptotic standard errors are greater than those of the other estimators. It is to be expected that if more informative parameter support ranges had been used when representing the feasible space of the parameters, standard errors would have been reduced. In most cases, the parameter, standard error, and $R^2$ measures were not particularly sensitive to the choice of error truncation point, although there were a few notable exceptions dispersed throughout the three-equation system.
The Klein Model I benchmarks the GME-NLP estimator relative to OLS, 2SLS, and 3SLS. Comparisons are based on the sum of squared differences (SSD) between the GME-NLP and the OLS, 2SLS, and 3SLS parameter estimates. Turning to the consumption model, the SSD is smallest (largest) between GME-NLP and OLS (3SLS) parameter estimates for both the three- and five-sigma rules, but only marginally. For example, the SSD between OLS (3SLS) and GME-NLP under the 3-sigma rule is 3.35 (4.15). Alternatively, for the labor model, the SSD is smallest (largest) between GME-NLP and 3SLS (OLS) parameter estimates for both the three- and five-sigma rules. The most dramatic differences arise in the investment model. For example, the SSD between OLS (3SLS) and GME-NLP under the 3-sigma rule is 3.00 (391.79). This comparison underscores divergences that exist between the GME-NLP and the 2SLS and 3SLS estimators. In addition to the information introduced by the parameter support spaces, another reason for this divergence may be that GME-NLP is a single-step estimator that is completely consistent with the data constraints in Equations (6) and (7), while 2SLS and 3SLS are multiple-step estimators that only approximate the NLP model and ignore nonlinear interactions between reduced and structural form coefficients. The nonlinear specification of GME-NLP leads to first order optimization conditions (Equation (16) derived in the Appendix) that are different from those of other multiple-step or asymptotically justified estimators such as 2SLS and 3SLS. Overall, the SSD comparisons characterize finite sample differences in the GME-NLP estimator relative to more traditional estimators.
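The SSD measure used in these comparisons is simply the sum of squared differences between two coefficient vectors for the same equation. A minimal sketch follows; the function name `ssd` and the coefficient values are hypothetical and are not the Table 6 estimates.

```python
import numpy as np

def ssd(estimates_a, estimates_b):
    """Sum of squared differences between two coefficient vectors."""
    a = np.asarray(estimates_a, dtype=float)
    b = np.asarray(estimates_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coefficient vectors must have the same shape")
    return float(np.sum((a - b) ** 2))

# Hypothetical estimates of one equation (intercept plus three slopes).
gme = [15.0, 0.10, 0.20, 0.80]
ols = [16.0, 0.20, 0.10, 0.80]
print(ssd(gme, ols))
```

Because the measure is not scale free, a difference in a large coefficient such as an intercept can dominate the total.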

Conclusions
In this paper a one-step, data-constrained generalized maximum entropy estimator is proposed for the nonlinear-in-parameters model of the SEM (GME-NLP). Under the assumed regularity conditions, it is shown that the estimator is consistent and asymptotically normal in the presence of contemporaneously correlated errors. We define an asymptotically normal test (single scalar hypothesis) and an asymptotically chi-square-distributed Wald test (joint vector hypothesis) that are capable of performing hypothesis tests typically used in empirical work. Moreover, the GME-NLP estimator provides a simple method of introducing prior information into the model by means of informative supports on the parameters that can decrease the mean square error of the coefficient estimates. The reformulated GME-NLP model, which is optimized over the structural and reduced form parameter set, provides a computationally efficient approach for large and small sample sizes.
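The two kinds of tests mentioned above have familiar generic forms, sketched below for a coefficient vector with an estimated asymptotic covariance matrix. This is the textbook construction, not the paper's specific covariance estimator; the function names `z_test` and `wald_statistic` and all inputs are hypothetical.

```python
import math
import numpy as np

def z_test(estimate, null_value, std_error):
    """Asymptotic normal test of a single scalar hypothesis.
    Returns the Z statistic and its two-sided p-value."""
    z = (estimate - null_value) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

def wald_statistic(theta, cov, R, r):
    """Wald statistic for the joint hypothesis R @ theta = r.
    Returns the statistic and its chi-square degrees of freedom."""
    theta = np.asarray(theta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    R = np.atleast_2d(np.asarray(R, dtype=float))
    r = np.atleast_1d(np.asarray(r, dtype=float))
    diff = R @ theta - r
    stat = float(diff @ np.linalg.solve(R @ cov @ R.T, diff))
    return stat, R.shape[0]

# Hypothetical estimates and covariance matrix.
theta_hat = [0.25, 0.10]
cov_hat = [[0.01, 0.0], [0.0, 0.04]]
print(z_test(theta_hat[0], 0.0, math.sqrt(cov_hat[0][0])))
print(wald_statistic(theta_hat, cov_hat, np.eye(2), [0.0, 0.0]))
```

The Wald statistic is compared with a chi-square critical value whose degrees of freedom equal the number of restrictions.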
We evaluated the performance of the GME-NLP estimator based on a variety of Monte Carlo experiments and in an illustrative empirical application. In small and medium sample situations (≤100 observations) the GME-NLP is mean square error superior to 3SLS for the defined experiments. Relative to 3SLS, the GME-NLP estimator exhibited dramatically more robustness in the presence of contaminated error problems. These results illustrate advantages of a one-step, data-constrained estimator over multiple-step, moment-constrained estimators. Increasing the sample size clearly demonstrated consistency of the GME-NLP estimator for the SEM. The empirical results indicate that the GME-NLP did not dominate, nor was it dominated by, 3SLS in single or joint asymptotic hypothesis testing.
The three-equation Klein Model I was estimated as an empirical application of the GME-NLP method. Results of the Klein Model I benchmarked parameter estimates of GME-NLP relative to OLS, 2SLS, and 3SLS using the summed squared difference between parameter values of the estimators. GME-NLP was most similar to OLS for the consumption equation (only marginally) and the investment equation, while it was most similar to 3SLS for the labor demand equation. In all, the empirical example also demonstrated some disadvantages of GME estimation in that coefficient estimates and predictive fit were somewhat sensitive to the specification of error truncation points. This suggests additional research is needed to optimally identify error truncation points.
The analytical results in this study contribute toward establishing a rigorous foundation for GME estimation of the SEM and analogous properties of test statistics. They also furnish a starting point for empirical economists desiring to apply maximum entropy to linear simultaneous systems (e.g., normalized quadratic demand systems used extensively in applied research). While the empirical results are intriguing, this approach does not definitively solve the problem of estimating the SEM in small samples or ill-posed problems, and it underscores the need for continued research on a number of problems in small sample estimation based on asymptotically justified estimators. ...
Constraints A2-A6 define the reparameterized coefficients and errors with supports. The optimal value of $z_{ngm}$ in the conditionally-maximized entropy function is the solution to the Lagrangian [...], while the optimal value of $w_{ngm}$ is [...], where [...] represent the optimal values of the Lagrangian multipliers. Substituting the solutions defined from Equations (A10), (A11), and (A14) into the conditional objective function yields the conditional maximum value entropy function: [...] The Lagrangian multipliers are vertically concatenated into the vector $\lambda$ [...], where $\odot$ denotes the Hadamard (elementwise) product between two matrices.

[...] (see [40,41]). From the above results and by applying Slutsky's Theorem: [...]
Proof. With the exception that we account for contemporaneous correlation in the errors, this is the proof for consistency of the data-constrained GME estimator of the general linear model [32]. Consider the conditional maximum function: [...] (see [40,41]). The covariance matrix is [...]. Hence the gradient is bounded in probability. The value of the quadratic term in the Taylor expansion can be bounded above by: [...]

ΣΓ
unobserved random disturbances.The standard stochastic assumptions of the disturbance vectors are that [ ] contains the unknown ' ij s  for i,j = 1,...,G.The reduced form model is obtained by post-multiplying Equation (1) by1   and solving for Y as:

1 iK
 subvector of the parameter vector i Β .It is assumed that the linear exclusion restrictions on the structural parameters are sufficient to identify each equation.The   endogenous variables in Y (−i) .

1 vec
z w that consist of unknown parameters to be estimated are explicitly defined below.The parameters are redefined as N(0,1) under the null hypothesis H o : 0 ij ij    , the statistic Z can be used to test hypothesis about the values of the ' ij s  .
the matrix   Ξ τ is made up of derivatives of the Lagrangian multipliers w λ and z λ .It is defined as:

By the Cauchy-Schwarz inequality, the symmetry assumption on the supports, and the adding up conditions, [...] is a [...] definite matrix. Next, we prove consistency and asymptotic normality of the GME-NLP estimator.

Theorem 1. Under the regularity conditions R1-R5, the GME-NLP estimator, [...]

Let the parameter space be bounded, convex, and dense and contain the true coefficient values $\theta$. Consider the just identified case. From Equations (5)-(8): [...] is not a function of $p$ or $z$ almost everywhere. Furthermore, it is not a function of the reduced form coefficients satisfying the identification conditions that are discussed after Equation (4). In addition, [...] the irrelevant terms that vanish in the convergence of the scaled Hessian, $N^{-1}H$. Accordingly, the GME-NLP estimates of the reduced form parameters, $\hat{\pi}$, are asymptotically and uniquely determined by Equation (7) and a normalization condition in Equation (8). The $\hat{\pi}$ are consistent, or $\hat{\pi} \xrightarrow{p} \pi$, which is proved in the Proposition below. Next define the conditional estimator [...] parameter set that satisfies the identification conditions. By [32]: [...] for the just identified case. Further results pertaining to the overidentified case are available from the authors upon request.

Theorem 2. Under the conditions of Theorem 1, the GME-NLP estimator, [...]

Let $\hat{\delta}$ be the GME-NLP estimator of [...] gradient vector in a Taylor series around $\delta$ to obtain: [...] where $\delta^{*}$ lies between $\hat{\delta}$ and $\delta$. Since $\hat{\delta}$ is a consistent estimator of $\delta$, then $\delta^{*} \xrightarrow{p} \delta$. Using this information and the fact that [...] left hand and right hand side terms have equivalent limiting distributions. Note that [...] iid for $n = 1, \ldots, N$, then [...]. The scaled gradient term is asymptotically normally distributed as: [...]

Proposition 1. Under the assumptions of Theorem 1, the reduced form estimates of (3) are consistent, [...]

[...] about $\pi$ with a Taylor series expansion that yields: [...] where $\pi^{*}$ lies between $\tau$ and $\pi$. The gradient vector is given by [...] by a multivariate version of Liapounov's central limit theorem.

The parameter indexed by $s$ denotes the smallest eigenvalue of $H_{\pi}$ for any $\pi^{*}$ that lies between $\tau$ and $\pi$, and $\|\cdot\|$ denotes the standard vector norm.

Table 2. Standard error (SE) and mean square error (MSE) of parameter estimates from 1000 Monte Carlo simulations using 3SLS and GME-NLP.

Table 3. Rejection probabilities for true and false hypotheses.

Table 4. Mean, standard error (SE), and mean square error (MSE) of parameter estimates from 1000 Monte Carlo simulations for GME-NLP with 3-, 4-, and 5-sigma truncation rules.

Table 5. Mean and mean square error (in parentheses) of parameter estimates from 1000 Monte Carlo simulations for 3SLS and GME-NLP with contaminated normal distribution.

Table 6. Structural parameter estimates and standard errors (in parentheses) of Klein's Model I using OLS, 2SLS, 3SLS, and GME-NLP.