Abstract
Simultaneous Equations Models (SEM) are a statistical technique widely used in economics to model simultaneous relationships between variables. In recent years, the technique has also been applied in other fields such as psychology and medicine, so the development of new estimation methods is an important line of research. This is particularly true when SEMs are applied to medical problems, where the main goal is to obtain estimations as close as possible to the parameters of the model. This paper presents a computational study of different methods for estimating simultaneous equations models, together with a new method that estimates the parameters by optimizing the Bayesian Method of Moments so as to minimize the Akaike Information Criterion. In addition, an entropy measure is calculated as a criterion for comparing the estimation methods studied. The comparison is performed through an experimental study using randomly generated models. The experimental study compares the estimations obtained by the different methods, as well as the efficiency of the Akaike Information Criterion and the entropy measure when comparing solutions. The study shows that the proposed estimation method offers better approximations and that the entropy measure evaluates the results more efficiently than the other criteria.
1. Introduction
Simultaneous Equations Models (SEM) [1] are statistical models formed by a set of regression equations that reflect the simultaneity between the dependent and independent variables of the model. A SEM is used when there is a bidirectional influence between both types of variables. The coefficients of a SEM can be estimated by methods based on either the classical statistical approach or the Bayesian approach.
Regarding classical inference, a SEM can be estimated by limited information and full information methods. Limited information methods estimate each equation of the structural form [1] without using the information contained in the detailed specification of the rest of the model, considering only the endogenous and exogenous variables included in that equation. Ordinary Least Squares (OLS), Indirect Least Squares (ILS), and Two Stage Least Squares (2SLS) are examples of limited information methods [1]. Full information methods estimate the whole model jointly in the structural form. These methods require the specification of all equations, and all of them have to be identified. In general, they are asymptotically more efficient than limited information methods since they incorporate all the information of the system, with the drawback that if any equation is incorrectly specified, the estimates of the other equations may become inconsistent. Examples of these methods are Full Information Maximum Likelihood (FIML) and Three Stage Least Squares (3SLS) [1].
On the other hand, Bayesian inference, which is based on a given set of data, does not use sampling assumptions, but it introduces a high degree of complexity because a prior distribution must be specified and the posterior distribution must be obtained. Some estimation techniques in the Bayesian approach are the Bayesian Method of Moments (BMOM) [2] and the methods used by Chao and Phillips [3], Geweke [4], and Kleibergen and Van Dijk [5]. The development of Markov Chain Monte Carlo methods has been key in making possible the computation of large models that require integration over hundreds or even thousands of unknown parameters. The Metropolis–Hastings algorithm and Gibbs sampling [6] are examples of such methods.
Regarding model selection, the literature is limited to a comparison between Bayesian and classical estimators, concluding that Bayesian methods perform better in the case of a small sample [2].
Applications of SEMs are found mainly in economics, although there are applications in other fields. For instance, the effects of fiscal decentralization on regional income inequality in Indonesia have been studied using provincial-level data over the period 2001–2014 [7], and a SEM has been used to relate employment to mental health [8]. Others have studied the impact of foreign trade on energy efficiency in China’s textile industry [9] and biomass energy consumption, economic growth, and carbon emissions in West Africa [10]. Other studies have compared the results obtained by SEMs with linear regression modeling of complex phenotypes [11], forecast the demand on the airside facilities provided at airports [12], or modeled prescriptions in primary care [13]. The Bayesian SEM approach has been used in agricultural science [14] and for studying the impact of product information on third-party websites on the feedback mechanism between internal word-of-mouth and retail sales on Download.com and Amazon.com (accessed on 1 March 2020) [15]. It has also been used for analyzing the interdependence of television program viewership between spouses [16], for exploring peer effects in casino gambling behavior [17], for modeling the interaction between people’s health risk perception and betel chewing habits in Taiwan [18], and for studying the effects of repetitive iodine thyroid blocking on the development of the foetal brain and thyroid in rats [19].
When several estimated models are available, a selection criterion is needed. There are many useful information criteria for comparing SEMs, such as the Akaike Information Criterion (AIC) [20,21], its corrected version (AICc) [22], the Schwarz Information Criterion (SIC) [23], the Bayesian Information Criterion (BIC) [24], the Hannan and Quinn criterion (HQ) [25], and the Model Selection Criterion based on Kullback–Leibler’s Symmetric Divergence [26].
Entropy was initially introduced in thermodynamics, where it provided the basis for the second law. Subsequently, statistical mechanics provided a connection between the macroscopic properties of entropy and the states of the system. From a mathematical point of view, entropies are non-negative functions defined on probability distributions, with multiple applications such as the use of information theory to measure the stability of a system [27]. Entropy has been applied in finance [28], environmental and water engineering [29], urban systems [30], and customer satisfaction surveys [31].
In our paper, a new method for the SEM estimation is developed and compared with other methods through the AIC and an entropy measure developed by Amigó [32], which allows us to select the estimation method with the highest homogeneity in the estimation errors.
The paper is organized as follows. In Section 2, the model is set up and several classical and Bayesian methods for estimating SEMs are briefly reviewed. Section 3 describes the proposed estimation method. In Section 4, the entropy measure is presented and a new version is obtained as an information criterion for selecting the estimation method with minimum error. The experimental design and the results are shown in Section 5 and, finally, the conclusions and future lines of work are presented in Section 6.
2. Definition of the Model and Methods for Estimating a SEM Problem
2.1. Definition of the Model
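Consider m interdependent or endogenous variables which depend on k independent or exogenous variables. Suppose that each endogenous variable can be expressed as a linear combination of the other endogenous variables, the exogenous ones, and white noise that represents stochastic interference. Thus, a SEM is a system of equations of the form [1]:

$$
\begin{aligned}
y_1 &= b_{1,2}\,y_2 + b_{1,3}\,y_3 + \cdots + b_{1,m}\,y_m + \gamma_{1,1}\,x_1 + \cdots + \gamma_{1,k}\,x_k + u_1 \\
y_2 &= b_{2,1}\,y_1 + b_{2,3}\,y_3 + \cdots + b_{2,m}\,y_m + \gamma_{2,1}\,x_1 + \cdots + \gamma_{2,k}\,x_k + u_2 \\
&\;\;\vdots \\
y_m &= b_{m,1}\,y_1 + \cdots + b_{m,m-1}\,y_{m-1} + \gamma_{m,1}\,x_1 + \cdots + \gamma_{m,k}\,x_k + u_m
\end{aligned} \tag{1}
$$

The equations can be represented in matrix form as:

$$
Y = Y B^{T} + X \Gamma^{T} + u \tag{2}
$$

where $B \in \mathbb{R}^{m \times m}$ and $\Gamma \in \mathbb{R}^{m \times k}$ are matrices of coefficients, $Y \in \mathbb{R}^{n \times m}$ is the matrix of endogenous variables, $X \in \mathbb{R}^{n \times k}$ is the matrix of exogenous variables, and $u \in \mathbb{R}^{n \times m}$ is the matrix of white noise variables, n being the sample size. Some coefficients of B and $\Gamma$ are zero, and are known a priori. The number of endogenous and exogenous variables in the ith equation of (1) is denoted by $n_i$ and $k_i$, where $n_i$ includes the dependent variable $y_i$. An equation is identified if the number of variables (endogenous and exogenous) on its right-hand side is lower than or equal to k, that is, $n_i - 1 + k_i \le k$ (order condition [1]). When $n_i - 1 + k_i = k$, the equation is exactly identified, and when $n_i - 1 + k_i < k$, it is over-identified. Only identified equations can be solved. Solving the model is equivalent to obtaining an estimation of B and $\Gamma$ in (2) from a representative sample of the model (a set of values of the data variables X and Y), in order to explain the relationship between both sets of variables.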
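As an illustration of the order condition, the following minimal sketch classifies each equation from the zero pattern of the coefficient matrices. It assumes that a zero entry means an excluded variable and that the diagonal of B is ignored (the dependent variable is not counted as a regressor). The function name `order_condition` and the example matrices are hypothetical. Note that the order condition is necessary but not sufficient for identification.

```python
import numpy as np

def order_condition(B, G):
    """Classify each equation of a SEM by the order condition.

    B (m x m) and G (m x k) hold the structural coefficients; a zero
    entry means the variable is excluded from that equation.
    """
    m, k = G.shape
    status = []
    for i in range(m):
        # Endogenous regressors on the right-hand side of equation i
        # (the dependent variable y_i itself is not counted).
        rhs_endog = np.count_nonzero(np.delete(B[i], i))
        # Exogenous variables of the system left out of equation i.
        excluded_exog = k - np.count_nonzero(G[i])
        if excluded_exog < rhs_endog:
            status.append("not identified")
        elif excluded_exog == rhs_endog:
            status.append("exactly identified")
        else:
            status.append("over-identified")
    return status

# Made-up example with m = 2 equations and k = 3 exogenous variables.
B = np.array([[0.0, 2.0],
              [1.5, 0.0]])
G = np.array([[1.0, 0.0, 0.0],
              [0.0, 3.0, 4.0]])
print(order_condition(B, G))
```

In this example the first equation excludes two exogenous variables while containing one endogenous regressor, and the second excludes exactly one.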
2.2. Methods for Estimating an SEM Problem
There are different techniques for estimating SEM parameters. In the classical approach, examples of these are 3SLS, 2SLS, OLS, K-class estimators [33], etc., and in the Bayesian approach, MCMC algorithms, and several conjugate distributions are some examples. In this section, a brief review of those estimators that have been used in our work is shown.
2.2.1. Two Stage Least Squares (2SLS)
2SLS is the most common estimation method for a SEM [1], developed independently by Theil (1953) [34] and Basmann (1957) [35]. The method is called two-stage because Ordinary Least Squares (OLS) is applied twice. In the first stage, new variables $\hat{Y}$, called proxies, are calculated by least squares using the exogenous variables of the model as independent variables. In the second stage, the endogenous variables are replaced by the proxies $\hat{Y}$, and least squares is applied once more. Both the 2SLS and OLS methods are single K-class estimators [33], expressed in (3) with $K = 0$ (OLS) and $K = 1$ (2SLS).
2.2.2. Bayesian Method of Moments (BMOM)
This method, proposed by Zellner in 1998 [2], applies the principle of maximum entropy and generates optimal estimations expressed as double K-class estimators, with the parameters shown in Table 1. It is useful when there is not enough information available to obtain the likelihood function, since it allows data analysis without specifying a probability function or sampling assumptions.
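The two stages described above can be sketched for a single equation as follows. This is a minimal illustration with NumPy, not the implementation used in the study; the function name `two_stage_least_squares`, the argument layout, and the simulated two-equation system are assumptions made for the example.

```python
import numpy as np

def two_stage_least_squares(y, Y1, X1, X):
    """2SLS for one equation y = Y1 b + X1 g + u.

    y  : (n,)   dependent endogenous variable
    Y1 : (n, p) endogenous regressors of the equation
    X1 : (n, q) exogenous regressors of the equation
    X  : (n, k) all exogenous variables of the system
    """
    # Stage 1: regress Y1 on all exogenous variables to get the proxies.
    Pi_hat, *_ = np.linalg.lstsq(X, Y1, rcond=None)
    Y1_hat = X @ Pi_hat
    # Stage 2: replace Y1 by its proxy and apply least squares again.
    Z_hat = np.hstack([Y1_hat, X1])
    coef, *_ = np.linalg.lstsq(Z_hat, y, rcond=None)
    p = Y1.shape[1]
    return coef[:p], coef[p:]

# Illustrative system (all numbers are made up):
#   y1 = 0.5*y2 + 1.0*x1 + u1
#   y2 = 0.3*y1 + 2.0*x2 + 1.0*x3 + u2
rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal((n, 3))
U = rng.standard_normal((n, 2))
B = np.array([[0.0, 0.5], [0.3, 0.0]])
G = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 1.0]])
# Solve Y = Y B' + X G' + U for Y.
Y = (X @ G.T + U) @ np.linalg.inv(np.eye(2) - B.T)

# Estimate the first equation: y1 on y2 (endogenous) and x1 (exogenous).
b_hat, g_hat = two_stage_least_squares(Y[:, 0], Y[:, [1]], X[:, [0]], X)
print(b_hat, g_hat)
```

With a sample of this size, the estimates should land close to the coefficients 0.5 and 1.0 used to simulate the first equation.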
Table 1. $K_1$ and $K_2$ parameters proposed to minimize each loss function in the Bayesian Method of Moments (BMOM).
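Consider, for example, the first structural equation, $y_1 = Y_1 \beta_1 + X_1 \gamma_1 + u_1$, where $Y_1$ and $X_1$ are the matrices of endogenous and exogenous variables included in the equation and $u_1$ is the white noise vector. The BMOM estimates of the parameters $\beta_1$ and $\gamma_1$ that minimize the loss functions are given by:

$$
\begin{pmatrix} \hat{\beta}_1 \\ \hat{\gamma}_1 \end{pmatrix} =
\begin{pmatrix} Y_1^{T} Y_1 - K_1 \hat{V}_1^{T} \hat{V}_1 & Y_1^{T} X_1 \\ X_1^{T} Y_1 & X_1^{T} X_1 \end{pmatrix}^{-1}
\begin{pmatrix} (Y_1 - K_2 \hat{V}_1)^{T} y_1 \\ X_1^{T} y_1 \end{pmatrix} \tag{3}
$$

with $\hat{V}_1 = Y_1 - X \hat{\Pi}_1$, where $\Pi_1$ is the coefficient matrix of the reduced form equation for $Y_1$, that is, $Y_1 = X \Pi_1 + V_1$. Those coefficients can be calculated by least squares, obtaining $\hat{\Pi}_1 = (X^{T} X)^{-1} X^{T} Y_1$ and, from it, $\hat{V}_1$.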
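A double K-class estimator of this kind can be sketched in a few lines. The code below uses the standard double K-class form with reduced-form residuals, which is an assumption about the notation rather than a transcription of the authors' code; the name `double_k_class` and the simulated data are hypothetical. Setting $K_1 = K_2 = 0$ recovers OLS and $K_1 = K_2 = 1$ recovers 2SLS, which gives a simple sanity check.

```python
import numpy as np

def double_k_class(y, Y1, X1, X, k1, k2):
    """Double K-class estimate for y = Y1 b + X1 g + u.

    X1 must be made of columns of X (the full exogenous matrix).
    """
    # Reduced-form residuals: V = Y1 - X (X'X)^{-1} X' Y1.
    Pi_hat, *_ = np.linalg.lstsq(X, Y1, rcond=None)
    V = Y1 - X @ Pi_hat
    A = np.block([[Y1.T @ Y1 - k1 * (V.T @ V), Y1.T @ X1],
                  [X1.T @ Y1,                  X1.T @ X1]])
    rhs = np.concatenate([(Y1 - k2 * V).T @ y, X1.T @ y])
    return np.linalg.solve(A, rhs)

# Made-up system: y1 = 0.5*y2 + 1.0*x1 + u1, y2 = 0.3*y1 + 2*x2 + x3 + u2.
rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal((n, 3))
U = rng.standard_normal((n, 2))
B = np.array([[0.0, 0.5], [0.3, 0.0]])
G = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 1.0]])
Y = (X @ G.T + U) @ np.linalg.inv(np.eye(2) - B.T)
y, Y1, X1 = Y[:, 0], Y[:, [1]], X[:, [0]]

ols = double_k_class(y, Y1, X1, X, 0.0, 0.0)    # K1 = K2 = 0
tsls = double_k_class(y, Y1, X1, X, 1.0, 1.0)   # K1 = K2 = 1
print(ols, tsls)
```

Intermediate values of $K_1$ and $K_2$, such as those proposed for BMOM, plug into the same function.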
Table 1 shows the $K_1$ and $K_2$ parameters for two loss functions using the BMOM approach, where n and k are the sample size and the number of exogenous variables.
2.2.3. Bayesian Approach in Two Stages
In Bayesian inference, a pragmatic solution for choosing the prior distribution is to select a member of a conjugate family, so that the posterior distribution remains in the same family as the prior distribution. If the prior is conjugate, the posterior distribution after the first observation belongs, by definition, to the same family and is used as the new prior distribution for the next observation. After incorporating this second observation, the new posterior distribution also belongs to the conjugate family. This sequential process only updates the values of the parameters of the distribution [6]. In this work, the method uses a Normal-Inverse Gamma prior to obtain exact analytic expressions for the posterior distribution of the structural coefficients B and $\Gamma$ of the SEM. It is applied in two stages, like the 2SLS method, using Bayesian least squares instead of Ordinary Least Squares in both stages.
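The sequential updating property of a conjugate prior can be checked numerically. The sketch below uses the textbook Normal-Inverse Gamma update for a linear regression; it is not the authors' two-stage procedure. The function name `nig_update`, the simulated data, and the hyperparameters `a0 = b0 = 1` are assumptions, while the prior mean of 5 and precision of 0.2 mirror the initial values mentioned in the experimental design.

```python
import numpy as np

def nig_update(X, y, mu0, Lam0, a0, b0):
    """Posterior of a Normal-Inverse Gamma prior for y = X beta + e.

    Prior: beta | s2 ~ N(mu0, s2 * inv(Lam0)),  s2 ~ InvGamma(a0, b0).
    Returns the posterior parameters (mu_n, Lam_n, a_n, b_n).
    """
    Lam_n = Lam0 + X.T @ X
    mu_n = np.linalg.solve(Lam_n, Lam0 @ mu0 + X.T @ y)
    a_n = a0 + len(y) / 2.0
    b_n = b0 + 0.5 * (y @ y + mu0 @ Lam0 @ mu0 - mu_n @ Lam_n @ mu_n)
    return mu_n, Lam_n, a_n, b_n

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = X @ np.array([4.0, 6.0]) + rng.normal(0.0, 0.5, 200)

# Prior mean 5 and precision 0.2; a0 and b0 are illustrative choices.
prior = (np.full(2, 5.0), 0.2 * np.eye(2), 1.0, 1.0)

# One update with all the data ...
full = nig_update(X, y, *prior)
# ... versus two updates, feeding the first posterior back as the prior.
half = nig_update(X[:100], y[:100], *prior)
seq = nig_update(X[100:], y[100:], *half)
print(full[0], seq[0])
```

Both routes give the same posterior parameters, which is why only the parameters of the distribution need to be carried from one step to the next.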
2.2.4. Markov Chain Monte Carlo (MCMC)
In the Bayesian approach, it is essential to select a prior distribution; however, there are situations where this selection is difficult due to the absence of previous information about the model. The MCMC methodology provides a wide scope for statistical modeling and is widely used to summarize complicated posterior distributions in econometric models. In particular, Bayesian methods need to integrate over the posterior distribution of the model parameters, and MCMC does so by drawing samples from this posterior distribution. There are many ways of constructing these chains, such as the Gibbs sampler or other special cases of the general framework of Metropolis and Hastings [6]. In this work, Gibbs sampling has been used to simulate the posterior distribution, and the average of the draws is used to estimate the model parameters.
3. The Proposed Estimation Method: Optimized BMOM Method
A variation of BMOM (Section 2.2.2) is proposed in which $K_1$ and $K_2$ are obtained by optimizing a criterion, instead of being set to the values proposed in Table 1. Concretely, we propose choosing the $K_1$ and $K_2$ parameters that minimize the AIC, which is a quality measure of statistical models [20] based on the fit to the sample through the likelihood of the model. Thus, given a collection of models estimated from the same data, the AIC measures the quality of each model relative to the others, providing a means for model selection.
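To make the Gibbs sampling step concrete, here is a minimal sketch for a single linear regression. It assumes the noninformative prior $p(\beta, \sigma^2) \propto 1/\sigma^2$, which is not necessarily the default used by `MCMCregress`; the function name `gibbs_linear_regression`, the chain length, and the simulated data are illustrative choices.

```python
import numpy as np

def gibbs_linear_regression(X, y, n_iter=2000, burn_in=500, seed=0):
    """Posterior mean of beta in y = X beta + e via Gibbs sampling.

    Assumes the prior p(beta, sigma2) proportional to 1/sigma2.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_ols = XtX_inv @ X.T @ y
    chol = np.linalg.cholesky(XtX_inv)
    sigma2 = 1.0
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # beta | sigma2, y ~ N(beta_ols, sigma2 * inv(X'X))
        beta = beta_ols + np.sqrt(sigma2) * (chol @ rng.standard_normal(p))
        # sigma2 | beta, y ~ Inverse-Gamma(n/2, SSR/2)
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(shape=n / 2.0, scale=2.0 / (resid @ resid))
        draws[t] = beta
    # The average of the retained draws estimates the coefficients.
    return draws[burn_in:].mean(axis=0)

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 2))
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.normal(0.0, 0.5, 500)

post_mean = gibbs_linear_regression(X, y)
print(post_mean)
```

The two conditional draws alternate at every iteration, and the initial draws are discarded as burn-in before averaging.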
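The expression for this measure is:

$$
\mathrm{AIC} = n \ln \lvert \hat{\Sigma}_e \rvert + 2 \sum_{i=1}^{m} (n_i + k_i - 1) + m(m+1) \tag{4}
$$

where n is the sample size, m is the number of equations, $n_i$ and $k_i$ are the numbers of endogenous and exogenous variables in the ith equation, and $\hat{\Sigma}_e$ is the variance-covariance matrix of the errors e.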
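For orientation, one common way of writing the AIC of a SEM is $n \ln\lvert\hat{\Sigma}_e\rvert + 2\sum_i (n_i + k_i - 1) + m(m+1)$, where the penalty counts the coefficients of each equation and the free elements of the covariance matrix. The sketch below computes that form; treating it as the expression used in this study is an assumption, and the function name `sem_aic` and its arguments are hypothetical.

```python
import numpy as np

def sem_aic(E, n_endog, n_exog):
    """AIC of a SEM from its residual matrix (assumed form).

    E       : (n, m) residuals, one column per equation
    n_endog : endogenous variables per equation (dependent included)
    n_exog  : exogenous variables per equation
    """
    n, m = E.shape
    Sigma = E.T @ E / n                       # error covariance estimate
    sign, logdet = np.linalg.slogdet(Sigma)   # stable log-determinant
    n_coef = sum(ni + ki - 1 for ni, ki in zip(n_endog, n_exog))
    return n * logdet + 2 * n_coef + m * (m + 1)

# Toy residuals for m = 2 equations and n = 4 observations.
E = np.array([[0.3, -0.1], [-0.2, 0.4], [0.1, 0.2], [-0.2, -0.5]])
print(sem_aic(E, n_endog=[2, 2], n_exog=[1, 2]))
```

Using the log-determinant routine avoids underflow when the residual covariance matrix has very small entries.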
For the experimental study, a large number of SEMs, called real models, have been randomly generated with a model generator tool. New errors have then been obtained as the difference between the values generated by the real models and the values estimated by each method described in the previous section. These errors are substituted in (4), and a new measure is calculated. This value is a measure of the error, and we propose it as a reference parameter that could indicate the goodness of the estimated model. It can only be calculated if the real coefficients are known, that is, in an experimental study. To reach the minimum value, an algorithm based on the quasi-Newton method is used, which builds an approximation of the inverse of the Hessian matrix at each iteration, so that an approximate solution is available at every step of the process.
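The search over $K_1$ and $K_2$ with a quasi-Newton method can be sketched with SciPy's `minimize`. Several things here are assumptions and not the study's setup: the criterion is a single-equation stand-in (n times the log of the residual variance) instead of the system-wide AIC, the bounds restrict both parameters to [0, 1], and the names `double_k_class` and `fit_criterion` and the simulated data are hypothetical. Because the stand-in only measures in-sample fit, it mainly illustrates the mechanics of the search.

```python
import numpy as np
from scipy.optimize import minimize

def double_k_class(y, Y1, X1, X, k1, k2):
    """Double K-class estimate for y = Y1 b + X1 g + u."""
    Pi_hat, *_ = np.linalg.lstsq(X, Y1, rcond=None)
    V = Y1 - X @ Pi_hat
    A = np.block([[Y1.T @ Y1 - k1 * (V.T @ V), Y1.T @ X1],
                  [X1.T @ Y1,                  X1.T @ X1]])
    rhs = np.concatenate([(Y1 - k2 * V).T @ y, X1.T @ y])
    return np.linalg.solve(A, rhs)

def fit_criterion(k, y, Y1, X1, X):
    """Stand-in criterion: n * log(residual variance) of one equation."""
    coef = double_k_class(y, Y1, X1, X, k[0], k[1])
    resid = y - np.hstack([Y1, X1]) @ coef
    return len(y) * np.log(resid @ resid / len(y))

# Made-up system: y1 = 0.5*y2 + 1.0*x1 + u1, y2 = 0.3*y1 + 2*x2 + x3 + u2.
rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 3))
U = rng.standard_normal((n, 2))
B = np.array([[0.0, 0.5], [0.3, 0.0]])
G = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 1.0]])
Y = (X @ G.T + U) @ np.linalg.inv(np.eye(2) - B.T)
y, Y1, X1 = Y[:, 0], Y[:, [1]], X[:, [0]]

start = np.array([1.0, 1.0])  # start the search at the 2SLS values
res = minimize(fit_criterion, start, args=(y, Y1, X1, X),
               method="L-BFGS-B", bounds=[(0.0, 1.0), (0.0, 1.0)])
print(res.x, res.fun)
```

Replacing `fit_criterion` with a function that evaluates the criterion of the whole estimated system would follow the same calling pattern.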
4. Entropy as an Information Parameter Criteria
Another motivation of this work is to obtain an alternative measure to the AIC, which is closely related to the BIC. This section presents a criterion of SEM quality based on entropy. Although entropy was introduced by Clausius [36] in thermodynamics to measure the amount of energy in a system that cannot produce work, the concept appears in many contexts (statistical mechanics, information theory, etc.) as disorder, uncertainty, randomness, complexity, etc. Claude Shannon [37] built his theory of information and communication in 1948, and it has since been generalized by authors such as Tsallis [38]. The expression for the Shannon entropy [39] of a probability distribution $p = (p_1, p_2, \ldots)$ is:

$$
H(p) = -\sum_{i} p_i \log p_i \tag{5}
$$
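As a small worked example, the Shannon entropy of a discrete distribution can be computed as below. The function name `shannon_entropy` is hypothetical, the natural logarithm is an arbitrary choice of base, and zero-probability terms are dropped under the usual convention that $0 \log 0 = 0$.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (natural log) of a non-negative weight vector."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()          # normalize so the weights sum to one
    nz = p[p > 0]            # convention: 0 * log(0) = 0
    return float(-(nz * np.log(nz)).sum())

print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # uniform case
print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))      # degenerate case
```

The uniform distribution gives the largest value, log 4 for four outcomes, and a distribution concentrated on one outcome gives zero.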
On the other hand, a new entropy measure, developed by Amigó [32] as a variation of generalized entropy, has been developed to allow having small values in the distribution. The expression for this entropy, , is:
The model SEM has m equations, the same number as endogenous variables. Then, applying this entropy to each equation in the estimated model obtains:
where n is the sample size and m the number of endogenous variables, and the values for each endogenous variable have been obtained as follows:
where is the error mass in each endogenous variable and has been calculated from the error matrix as the difference between endogenous variables and its estimation through each method. Finally, for each method and estimated model, the average of logarithms of (7) is calculated:
The minimum value of is reached when are homogeneous, so, the values are more well-balanced.
5. Experimental Design and Results
5.1. Experimental Design
In the experimental study, a large number of SEMs are generated (that is, the matrices X, Y, B and $\Gamma$ of each model are generated), then estimated through the methods presented in Sections 2 and 3, and finally the models are compared to their estimations.
The SEMs have been generated as follows: the values of matrices B and $\Gamma$ are generated randomly following a Uniform distribution in [0, 10], matrix X follows a multivariate normal distribution, and finally, matrix Y is obtained from the model equations plus Normal noise with mean 0 and standard deviation $\sigma$. Two functions from R packages have been used: in MCMC, Markov chains have been simulated through the MCMCregress function of the MCMCpack package without prior information, and, in the proposed method, the function optim has been used to obtain the optimal values of $K_1$ and $K_2$. In the two-stage Bayesian approach, a Normal-Inverse Gamma prior has been used, with a mean of 5 and a precision of 0.2 as the initial parameters.
For the comparison study, measures based on the generated model parameters and measures based on the estimated parameters have been calculated, in order to have criteria for choosing the best estimation method. The first type comprises the Euclidean distance between the coefficient matrix and its estimation, and the error-based measure introduced in Section 3. The second type comprises the AIC and the entropy. Finally, the execution time has been measured. Table 2 shows the average and the standard deviation of 50 simulations for each measure, model, and method when the number of variables and the sample size of the SEMs vary.
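The generation scheme can be sketched as follows. This is a simplified illustration and not the model generator used in the study: it assumes the matrix form Y = Y B' + X Γ' + U with a zero diagonal in B, uses an identity covariance for X, and omits the zero restrictions that identification requires. The function name `generate_sem` and the default noise level are hypothetical.

```python
import numpy as np

def generate_sem(n, m, k, sigma=0.1, seed=0):
    """Generate one random SEM satisfying Y = Y B' + X G' + U."""
    rng = np.random.default_rng(seed)
    # Coefficients drawn from a Uniform distribution on [0, 10].
    B = rng.uniform(0.0, 10.0, size=(m, m))
    np.fill_diagonal(B, 0.0)   # y_i does not appear in its own equation
    G = rng.uniform(0.0, 10.0, size=(m, k))
    # Exogenous variables from a multivariate normal distribution.
    X = rng.multivariate_normal(np.zeros(k), np.eye(k), size=n)
    # White noise with mean 0 and standard deviation sigma.
    U = rng.normal(0.0, sigma, size=(n, m))
    # Y (I - B') = X G' + U, solved through the transposed system.
    Y = np.linalg.solve(np.eye(m) - B, (X @ G.T + U).T).T
    return X, Y, B, G, U

X, Y, B, G, U = generate_sem(n=100, m=3, k=4)
print(X.shape, Y.shape)
```

Each estimation method can then be run on X and Y, and its estimates compared against the known B and G.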
Table 2.
Average and standard deviation of 50 simulations of , , , , and execution time in seconds. Markov Chain Monte Carlo (MCMC). Sigma .
5.2. Experimental Results
Regarding , the results shows that the method is better in all estimated models. Nevertheless, the high computational cost used by this method could be an issue in large problems. In such cases it would be more efficient to use , since this method obtains good estimation in less time. shows similar results than . Regarding estimated measures, does not offer satisfactory results because its values are not in agreement with the Euclidean distance or with , which is proposed as a reformulation that works well. The entropy has yielded satisfactory results and it could be considered as a new comparative measurement.
In all methods, when the complexity of model increases, increases, being the opposite when the sample size increases. Both BMOMs methods (Goodness of fit and Precision of estimation) show similar results, having a small difference between them. MCMC provides estimations with the biggest average of , in which no trend is appreciated when the number of variables and sample size vary. The minimum average of is reached by , with MCMC being the worst method. Regarding the entropy, the minimum average value has been obtained by , except in the smallest case, where obtains the minimum value with a small difference. and 2SLS require less execution time in all cases, with performing the worst.
6. Conclusions and Future Work
In this paper, the estimation of simultaneous equations models was studied through the comparison of models, carried out through an experimental study using randomly generated models. A new estimation method was proposed, , based on the optimization of some parameter of the Bayesian Method of Moments and minimizing the Akaike Information Criteria. The computational study showed that the proposed method was the best one regarding the minimum and entropy. The study also showed that the parameter presented deficiencies for selecting the estimation method with a minimum value and minimum .
The AIC is one of the most widely used criteria for comparing different estimation methods. Nevertheless, in this study, the results showed that using entropy instead of the AIC to evaluate the methods provides values in agreement with the quality of the estimation (similarity with the real values).
In the future, the study of information criteria and their application to SEM problems, the use of other criteria for the optimization, and ways of reducing the execution cost can be considered.
Author Contributions
Conceptualization, J.J.L.-E. and C.P.; software, M.G.; validation, B.P.-S., J.J.L.-E. and C.P.; formal analysis, B.P.-S. and C.P.; writing—original draft preparation, B.P.-S.; writing—review and editing, J.J.L.-E. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Spanish Ministry for Economy and Competitiveness (Ministerio de Economía, Industria y Competitividad) under grant TIN2016-80565-R.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data are available from the authors upon request.
Acknowledgments
The authors are grateful for the computer resources and assistance provided by the Scientific Computing and Parallel Programming Group of the University of Murcia for the simulation study. Constructive comments from the referees to improve the presentation of the paper are greatly appreciated.
Conflicts of Interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
References
- Gujarati, D.; Porter, D. Econometría, 5th ed.; México, D.F., Ed.; McGraw-Hill Interamericana Editores SA: Ciudad de México, México, 2004.
- Zellner, A. The finite sample properties of simultaneous equations’ estimates and estimators Bayesian and non-Bayesian approaches. J. Econom. 1998, 83, 185–212.
- Chao, J.C.; Phillips, P.C. Jeffreys prior analysis of the Simultaneous Equations Model in the case with n+1 endogenous variables. J. Econom. 2002, 111, 251–283.
- Geweke, J. Bayesian Reduced Rank Regression in Econometrics. J. Econom. 1996, 75, 121–146.
- Kleibergen, F.; Dijk, H.V. Bayesian simultaneous equation analysis using reduced rank structures. Econom. Theory 1998, 14, 701–743.
- Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis; Chapman and Hall: New York, NY, USA, 2015.
- Siburian, M.E. Fiscal decentralization and regional income inequality: Evidence from Indonesia. Appl. Econ. Lett. 2019, 1–4.
- Steele, F.; French, R.; Bartley, M. Adjusting for Selection Bias in Longitudinal Analysis Using Simultaneous Equations Modeling. The Relationship Between Employment Transitions and Mental Health. Epidemiology 2013, 24, 703–711.
- Zhao, H.; Lin, B. Impact of foreign trade on energy efficiency in China’s textile industry. J. Clean. Prod. 2019, 245, 118878.
- Adewuyi, A.O.; Awodumi, O.B. Biomass energy consumption, economic growth and carbon emissions: Fresh evidence from West Africa using a simultaneous equation model. Energy 2017, 119, 453–471.
- King, T. Using simultaneous equation modelling for defining complex phenotypes. BMC Genet. 2003, 4, S10.
- Pitfield, D.; Caves, R.; Quddus, M. Airline strategies for aircraft size and airline frequency with changing demand and competition: A simultaneous-equations approach for traffic on the north Atlantic. J. Air Transp. Manag. 2009, 16, 151–158.
- Olmeda, N.G.; Martinez, I.B. Application of simultaneous equation models to temporary disability prescriptions in primary healthcare centres. Int. J. Comput. Math. 2014, 91, 252–260.
- Strathe, A.; Jørgensen, H.; Kebreab, E.; Danfær, A. Bayesian simultaneous equation models for the analysis of energy intake and partitioning in growing pigs. J. Agric. Sci. 2012, 150, 764–774.
- Zhou, W.; Duan, W. An empirical study of how third-party websites influence the feedback mechanism between online Word-of-Mouth and retail sales. Decis. Support Syst. 2015, 76, 14–23.
- Yang, S.; Narayan, V.; Assael, H. Estimating the Interdependence of Television Program Viewership Between Spouses: A Bayesian Simultaneous Equation Model. Mark. Sci. 2006, 25, 336–349.
- Park, H.; Manchanda, P. When Harry Bet with Sally: An Empirical Analysis of Multiple Peer Effects in Casino Gambling Behavior. Mark. Sci. 2015, 2, 179–194.
- Chen, C.; Chang, K.; Lin, L.; Lee, J. Health risk perception and betel chewing behavior. The evidence from Taiwan. Addict. Behav. 2013, 38, 2714–2717.
- Cohen, D.P.; Benadjaoud, M.A.; Lestaevel, P.; Lebsir, D.; Benderitter, M.; Souidi, M. Effects of repetitive Iodine Thyroid Blocking on the Development of the Foetal Brain and Thyroid in rats: A Systems Biology approach. bioRxiv 2019.
- Akaike, H. Information Theory and an Extension of the Maximum Likelihood Principle; Springer: New York, NY, USA, 1998.
- Keerativibool, W. New Criteria for Selection in Simultaneous Equations Model. Thail. Stat. 2012, 10, 163–181.
- Hurvich, C.; Tsai, C. Regression and time series model selection in small samples. Biometrika 1989, 76, 297–397.
- Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464.
- Findley, D.F. Counterexamples to parsimony and BIC. Ann. Inst. Stat. Math. 1991, 43, 505–514.
- Hannan, E.J.; Quinn, B.G. The Determination of the Order of an Autoregression. J. R. Stat. Soc. Ser. B (Methodol.) 1979, 41, 190–195.
- Keerativibool, W.; Jitthavech, J. Model Selection Criterion Based on Kullback-Leibler’s Symmetric Divergence for Simultaneous Equations Model. Chiang Mai J. Sci. 2015, 42, 761–773.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley and Sons, Inc.: Hoboken, NJ, USA, 1991.
- Zhou, R.; Cai, R.; Tong, G. Applications of Entropy in Finance: A Review. Entropy 2013, 15, 4909–4931.
- Cui, H.; Sivakumar, B.; Singh, V. Entropy Applications in Environmental and Water Engineering. Entropy 2019, 20, 598.
- Purvis, B.; Mao, Y.; Robinson, D. Entropy and its Application to Urban Systems. Entropy 2019, 21, 56.
- Oruç, Ö.E.; Kuruoglu, E.; Gündüz, A. Entropy Applications for Customer Satisfaction Survey in Information Theory. Front. Sci. 2011, 1, 1–4.
- Amigó, J.M.; Balogh, S.G.; Hernández, S. A Brief Review of Generalized Entropies. Entropy 2018, 20, 813.
- Qayyum, Z.; Hasan, S.S. K-Class estimators-a Review. Int. J. Math. Trends Technol. 2017, 50, 104–107.
- Theil, H. Repeated Least Squares Applied to Complete Equation Systems; Central Planning Bureau: The Hague, The Netherlands, 1953.
- Basmann, R.L. A Generalized Classical Method of Linear Estimation of Coefficients in a Structural Equation. Econometrica 1957, 25, 77–83.
- Clausius, R. The Mechanical Theory of Heat; Macmillan: New York, NY, USA, 1879.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
- Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
- Lombardi, O.; Holik, F.; Vanni, L. What is Shannon information? Synthese 2016, 193, 1983–2012.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).