# Generalized Information Matrix Tests for Detecting Model Misspecification


## Abstract


## 1. Introduction

#### 1.1. Information Matrix Test Methods for Detection of Model Misspecification

#### 1.2. Recent Developments in Information Matrix Test Theory

## 2. GIMT Theoretical Framework: Definitions and Assumptions

#### 2.1. Data Generating Process

**Assumption 1. Data Generating Process (DGP).** Let ${X}_{i},\text{ }i=1,2,\mathrm{...}$ be a sequence of independent and identically distributed (i.i.d.) random vectors where ${X}_{i}$ has a common probability measure $P$ on the measurable space $\left({\mathcal{R}}^{d},\mathcal{B}\left({\mathcal{R}}^{d}\right)\right)$ with completion $\left({\mathcal{R}}^{d},{\mathcal{F}}_{0},{P}_{0}\right)$.

For example, the first element of ${\mathit{x}}_{i}$ (a realization of ${\mathit{X}}_{i}$) may be a particular value of the outcome (dependent) variable for a regression model associated with the ith data record, the second element of ${\mathit{x}}_{i}$ may be the number 1 for the purpose of introducing an intercept parameter, and the remaining elements of ${\mathit{x}}_{i}$ may be particular values for the predictor variables associated with the ith data record, i = 1, …, n.

Although ${\mathit{X}}_{i}$, i = 1, 2, … are i.i.d., the theory presented here is also applicable to panel data analyses. For example, consider a situation where data are collected in a longitudinal study on a group of individuals over a period of time. The observations across participants are assumed to be i.i.d., but the observations for a particular participant are neither necessarily identically distributed nor independent. Let ${\mathit{X}}_{it}$ denote the observation associated with the measurement of the ith participant in the study at time index t for t = 1, …, T (where T is a fixed finite number) and i = 1, …, n. The theory described in this article is applicable to evaluating the degree to which a probability model can account for the observed data ${\mathit{X}}_{i}\equiv \left[\begin{array}{ccc}{\mathit{X}}_{i,1}& \dots & {\mathit{X}}_{i,T}\end{array}\right]$, i = 1, …, n.

**Assumption 2. Absolute Continuity.** Let ${\nu}_{j}\left({x}_{j}\right)$ be a σ-finite measure on the measurable space $\left(\mathcal{R},\mathcal{B}\left(\mathcal{R}\right)\right)$, j = 1, …, d. Let $\nu \equiv \underset{j=1}{\overset{d}{{\displaystyle \otimes}}}{\nu}_{j}\left({x}_{j}\right)$ be a σ-finite product measure on the measurable space $\left({\mathcal{R}}^{d},\mathcal{B}\left({\mathcal{R}}^{d}\right)\right)$. Assume ${P}_{0}$ is absolutely continuous with respect to $\nu$.

Assumption 2 ensures that the probability distribution of ${\mathit{X}}_{i}$, ${P}_{0}$, may be represented using a Radon-Nikodým density function. The Radon-Nikodým density ${p}_{x}\equiv d{P}_{0}/d\nu $ is common to the i.i.d. random variables ${X}_{i}$, i = 1, …, n on the measurable space $\left({\mathcal{R}}^{d},\mathcal{B}\left({\mathcal{R}}^{d}\right)\right)$.

#### 2.2. Probability Model

A probability model is now defined as a set of candidate parametric probability distributions for **X**.

**Assumption 3. Parametric Densities.** (i) Let $\mathsf{\Theta}$ be a compact and non-empty subset of ${\mathcal{R}}^{k}$, $k\in \mathbb{N}$; (ii) Let $f:{\mathcal{R}}^{d}\times \mathsf{\Theta}\to [0,\infty )$. For each **θ** in $\mathsf{\Theta}$, $f(\cdot ;\mathsf{\theta})$ is a density with respect to $\nu$ and $f(x;\cdot )$ is continuous on $\mathsf{\Theta}$ for each $\mathit{x}\in supp\text{ }\mathit{X}$; (iii) $\mathrm{log}f(x;\cdot )$ is continuously differentiable on $\mathsf{\Theta}$ for each $\mathit{x}\in supp\text{ }\mathit{X}$; (iv) $\mathrm{log}f(x;\cdot )$ is twice continuously differentiable on $\mathsf{\Theta}$ for each $\mathit{x}\in supp\text{ }\mathit{X}$; (v) $\mathrm{log}f(x;\cdot )$ is thrice continuously differentiable on $\mathsf{\Theta}$ for each $\mathit{x}\in supp\text{ }\mathit{X}$.

**Definition. Probability Model.** Let $f$ be defined as in Assumption 3(i) and Assumption 3(ii). Let $F:{\mathcal{R}}^{d}\times \mathsf{\Theta}\to [0,1]$ be defined such that for each **θ** in $\mathsf{\Theta}$, $F\left(\cdot ;\mathsf{\theta}\right):{\mathcal{R}}^{d}\to [0,1]$ is the probability distribution for **X** specified by density $f\left(\cdot ;\mathsf{\theta}\right)$. The set $\mathcal{M}\equiv \left\{F\left(\cdot ;\mathsf{\theta}\right):{\mathcal{R}}^{d}\to [0,1]|\mathsf{\theta}\in \mathsf{\Theta}\right\}$ is the probability model on $\mathsf{\Theta}$ specified by $f$.

**Definition. Misspecified Model.** The probability model $\mathcal{M}$ is misspecified when ${P}_{0}\notin \mathcal{M}$; otherwise, $\mathcal{M}$ is correctly specified.

#### 2.3. Hypothesis Function

**Definition. GIMT Hypothesis Function.** Let $\mathsf{\Upsilon}$ be a compact and non-empty subset of ${\mathcal{R}}^{k\times k}$, $k\in \mathbb{N}$. A Generalized Information Matrix Test (GIMT) hypothesis function $s:\mathsf{\Upsilon}\times \mathsf{\Upsilon}\to {\mathcal{R}}^{r}$ has the property that if $A=B$, then $s\left(A,B\right)={0}_{r}$ for every symmetric positive definite matrix $A\in \mathsf{\Upsilon}$ and for every symmetric positive definite matrix $B\in \mathsf{\Upsilon}$.

**Definition. Nondirectional and Directional GIMT Hypothesis Functions.** Let $\mathsf{\Upsilon}$ be a compact and non-empty subset of ${\mathcal{R}}^{k\times k}$, $k\in \mathbb{N}$. A nondirectional GIMT hypothesis function $s:\mathsf{\Upsilon}\times \mathsf{\Upsilon}\to {\mathcal{R}}^{r}$ has the property that $A=B$ if and only if $s\left(A,B\right)={0}_{r}$ for all $\left(A,B\right)\in \mathsf{\Upsilon}\times \mathsf{\Upsilon}$. A directional GIMT hypothesis function is a GIMT hypothesis function that is not nondirectional.
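To make the distinction concrete, here is a minimal Python sketch (not from the original article; the function names and example matrices are illustrative) of a directional hypothesis function, the trace difference $s(A,B)=\mathrm{tr}(A)-\mathrm{tr}(B)$ with $r=1$, and a nondirectional one, the half-vectorized difference $s(A,B)=vech(A-B)$. Both vanish when $A=B$, but only the latter vanishes *only* when $A=B$:

```python
# Sketch of directional vs. nondirectional GIMT hypothesis functions.
# A and B are symmetric positive definite matrices given as lists of lists;
# names and helpers here are illustrative only.

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def s_trace(A, B):
    # Directional: r = 1; zero whenever tr(A) = tr(B), even if A != B.
    return [trace(A) - trace(B)]

def vech(M):
    # Half-vectorization: stack the lower-triangular entries column-wise.
    k = len(M)
    return [M[i][j] for j in range(k) for i in range(j, k)]

def s_full(A, B):
    # Nondirectional: r = k(k+1)/2; zero if and only if A = B.
    k = len(A)
    return vech([[A[i][j] - B[i][j] for j in range(k)] for i in range(k)])

A = [[2.0, 0.5], [0.5, 1.0]]
B = [[1.5, 0.5], [0.5, 1.5]]  # tr(B) = tr(A) = 3 but B != A

print(s_trace(A, B))  # [0.0] even though A != B: misspecification missed
print(s_full(A, B))   # nonzero vector: detects A != B
```

The example shows why a directional test can have power against only some alternatives: the trace-difference statistic is blind to any discrepancy that leaves the traces equal.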

**Assumption 4. Hypothesis Function Regularity Conditions.** (i) Let $\mathsf{\Upsilon}$ be a compact and non-empty subset of ${\mathcal{R}}^{k\times k}$, $k\in \mathbb{N}$. Let $s:\mathsf{\Upsilon}\times \mathsf{\Upsilon}\to {\mathcal{R}}^{r}$ be continuous on $\mathsf{\Upsilon}\times \mathsf{\Upsilon}$; (ii) ${A}^{\ast}$ and ${B}^{\ast}$ are in the interior of $\mathsf{\Upsilon}\subseteq {\mathcal{R}}^{k\times k}$; (iii) $\mathsf{\nabla}s$ exists and is continuous on $\mathsf{\Upsilon}\times \mathsf{\Upsilon}$; (iv) $\mathsf{\nabla}{s}^{\ast}$ has full row rank r on $\mathsf{\Upsilon}\times \mathsf{\Upsilon}$.

**Definition. Antisymmetric GIMT Hypothesis Function.** Let $s:\mathsf{\Upsilon}\times \mathsf{\Upsilon}\to {\mathcal{R}}^{r}$ be a GIMT hypothesis function satisfying Assumption 4(i), Assumption 4(ii), and Assumption 4(iii). If, in addition, $s\left(A,B\right)=-s\left(B,A\right)$ for all $\left(A,B\right)\in \mathsf{\Upsilon}\times \mathsf{\Upsilon}$, then $s:\mathsf{\Upsilon}\times \mathsf{\Upsilon}\to {\mathcal{R}}^{r}$ is called an antisymmetric GIMT hypothesis function.

#### 2.4. Notation

#### 2.5. Regularity Conditions

**Assumption 5. Domination Conditions.** Assumption 5 holds, for example, when the support of **X** is bounded. The assumption that the support of **X** is bounded is satisfied, for example, by observational data consisting of discrete random variables. Assumptions 5(i) and 5(ii) more generally are satisfied for many commonly used finite-dimensional parametric smooth probability models for observational data modeled as combinations of both discrete and absolutely continuous random variables.

**Assumption 6. Uniqueness.** (i) For some ${\mathsf{\theta}}^{\ast}\in \mathsf{\Theta}$, $l$ has a unique minimum at ${\mathsf{\theta}}^{\ast}$; (ii) ${\mathsf{\theta}}^{\ast}$ is interior to $\mathsf{\Theta}$.

Our ultimate goal is to construct a statistical test of the GIMT null hypothesis ${H}_{0}:s\left({A}^{\ast},{B}^{\ast}\right)={0}_{r}$ by characterizing the asymptotic behavior of the test statistic ${\widehat{s}}_{n}\equiv s\left({\widehat{A}}_{n},{\widehat{B}}_{n}\right)$. Note that ${\widehat{s}}_{n}$ is an estimator of ${s}^{\ast}\equiv s\left({A}^{\ast},{B}^{\ast}\right)$ (see Theorem 6).

**Assumption 7. Positive Definiteness.** (i) ${A}^{\ast}$ is positive definite; (ii) ${B}^{\ast}$ is positive definite; and (iii) ${\Sigma}_{s}^{\ast}$ is positive definite.

## 3. GIMT Theoretical Framework: Theorems and Formulas

#### 3.1. Classical Results

**Theorem 1. Estimator Measurability ([30], Lemma 2).** Assume that Assumptions 1, 2, 3(i), and 3(ii) hold. Let ${P}_{0}^{n}$ be the joint distribution of ${X}_{1},\mathrm{...},{X}_{n}$. Then for each n = 1, 2, …, there exists a measurable function ${\widehat{\mathsf{\theta}}}_{n}:{\mathcal{R}}^{dn}\to \mathsf{\Theta}$ and an element, ${B}_{n}$, of ${\left(\mathcal{B}\left({\mathcal{R}}^{d}\right)\right)}^{n}$ with ${P}_{0}^{n}\left({B}_{n}\right)=1$ such that for all $\left\{{x}_{1},\dots ,{x}_{n}\right\}\in {B}_{n}$:

**Theorem 2. Estimator Consistency ([31], Theorem 2.1).** Assume Assumptions 1, 2, 3(i), 3(ii), 5(i)a, and 6 hold. Then as $n\to \infty $, ${\widehat{\mathsf{\theta}}}_{n}\to {\mathsf{\theta}}^{\ast}$ with probability one.

**Theorem 3. Estimator Asymptotic Distribution ([1], Theorem 3.2; also see [32]).** Assume Assumptions 1, 2, 3(i), 3(ii), 3(iii), 3(iv), 5(i), 6, 7(i), and 7(ii) hold. As $n\to \infty $, $\sqrt{n}\left({\widehat{\mathsf{\theta}}}_{n}-{\mathsf{\theta}}^{\ast}\right)$ converges in distribution to a zero-mean Gaussian random vector with non-singular covariance matrix ${C}^{\ast}\equiv {\left({A}^{\ast}\right)}^{-1}{B}^{\ast}{\left({A}^{\ast}\right)}^{-1}$.

**Theorem 4. Contrapositive Information Matrix Equality ([1], Theorem 3.3).** Assume Assumptions 1, 2, 3(i), 3(ii), 3(iii), 3(iv), 5, and 6 hold. If ${A}^{\ast}\ne {B}^{\ast}$, then the probability model $\mathcal{M}\equiv \left\{F\left(\cdot ;\mathsf{\theta}\right):{\mathcal{R}}^{d}\to [0,1]|\mathsf{\theta}\in \mathsf{\Theta}\right\}$ is misspecified.

**Theorem 5. Consistent QMLE Covariance Matrix Estimators (e.g., [1]).** Assume Assumptions 1, 2, 3(i), 3(ii), 3(iii), 3(iv), 5(i), 6, 7(i), and 7(ii) hold. Then, with probability one as $n\to \infty $: ${\widehat{B}}_{n}\to {B}^{\ast}$, ${\left({\widehat{B}}_{n}\right)}^{-1}\to {\left({B}^{\ast}\right)}^{-1}$, ${\widehat{A}}_{n}\to {A}^{\ast}$, ${\left({\widehat{A}}_{n}\right)}^{-1}\to {\left({A}^{\ast}\right)}^{-1}$, ${\widehat{C}}_{n}\to {C}^{\ast}$, and ${\left({\widehat{C}}_{n}\right)}^{-1}\to {\left({C}^{\ast}\right)}^{-1}$.
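As a hedged illustration of Theorem 5's estimators (a sketch under simplifying assumptions, not the article's implementation), the snippet below computes $\widehat{A}_n$ (the average negative Hessian of the log-likelihood), $\widehat{B}_n$ (the average outer product of per-observation scores), and the sandwich estimator $\widehat{C}_n = \widehat{A}_n^{-1}\widehat{B}_n\widehat{A}_n^{-1}$ for a simple Gaussian model $N(\mu ,v)$ with parameter vector $\mathsf{\theta}=(\mu ,v)$ estimated by quasi-maximum likelihood:

```python
import random

def qmle_sandwich(xs):
    """Sketch: QMLE and sandwich covariance for a N(mu, v) model, v = sigma^2."""
    n = len(xs)
    mu = sum(xs) / n                        # QMLE of the mean
    v = sum((x - mu) ** 2 for x in xs) / n  # QMLE of the variance
    A = [[0.0, 0.0], [0.0, 0.0]]            # average negative Hessian (A_hat)
    B = [[0.0, 0.0], [0.0, 0.0]]            # average outer product of scores (B_hat)
    for x in xs:
        # Per-observation score of log f(x; mu, v):
        g = [(x - mu) / v, -0.5 / v + (x - mu) ** 2 / (2 * v * v)]
        # Per-observation negative Hessian of log f(x; mu, v):
        h = [[1.0 / v, (x - mu) / v ** 2],
             [(x - mu) / v ** 2, -0.5 / v ** 2 + (x - mu) ** 2 / v ** 3]]
        for i in range(2):
            for j in range(2):
                B[i][j] += g[i] * g[j] / n
                A[i][j] += h[i][j] / n
    # Invert the 2x2 matrix A_hat, then form C = A^{-1} B A^{-1}.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    Ai = [[A[1][1] / det, -A[0][1] / det],
          [-A[1][0] / det, A[0][0] / det]]
    AB = [[sum(Ai[i][k] * B[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    C = [[sum(AB[i][k] * Ai[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return mu, v, A, B, C

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(5000)]
mu_hat, v_hat, A_hat, B_hat, C_hat = qmle_sandwich(xs)
```

At the Gaussian QMLE the identities $\widehat{A}_n[1,1]=\widehat{B}_n[1,1]=1/\widehat{v}_n$ and $\widehat{C}_n[1,1]=\widehat{v}_n$ hold exactly, which gives a quick internal consistency check; more generally, when the model is correctly specified $\widehat{A}_n\approx \widehat{B}_n$, consistent with the information matrix equality.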

#### 3.2. GIMT Statistic Asymptotic Behavior

**Theorem 6. GIMT Statistic Consistency.** Assume Assumptions 1, 2, 3, 4(i), 4(ii), 4(iii), 5(i), and 6 hold. Then as $n\to \infty $, ${\widehat{s}}_{n}\to {s}^{\ast}$ with probability one. If, in addition, Assumptions 5(ii) and 7(iii) hold, then with probability one ${\widehat{\Sigma}}_{s}^{n}\to {\Sigma}_{s}^{\ast}$ and ${\left({\widehat{\Sigma}}_{s}^{n}\right)}^{-1}\to {\left({\Sigma}_{s}^{\ast}\right)}^{-1}$ as $n\to \infty $.

**Theorem 7. Generalized Information Matrix Wald Test.** Assume Assumptions 1, 2, 3, 4, 5(i), 5(ii), 6, and 7 hold with respect to a GIMT hypothesis function $s:\mathsf{\Upsilon}\times \mathsf{\Upsilon}\to {\mathcal{R}}^{r}$ and probability model $\mathcal{M}$. Let ${\widehat{\mathcal{W}}}_{n}\equiv n{\left({\widehat{s}}_{n}\right)}^{T}{\left({\widehat{\Sigma}}_{s}^{n}\right)}^{-1}\left({\widehat{s}}_{n}\right)$. If ${H}_{0}:{s}^{\ast}={0}_{r}$ holds, then ${\widehat{\mathcal{W}}}_{n}\stackrel{d}{\to }{\chi}_{r}^{2}$ as $n\to \infty $. If ${H}_{0}:{s}^{\ast}={0}_{r}$ is false, then ${\widehat{\mathcal{W}}}_{n}\to \infty $ with probability one as $n\to \infty $.
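As an illustrative sketch (the numbers and function name are hypothetical, not from the article), the Wald statistic of Theorem 7 can be computed directly in the scalar case $r=1$, where the ${\chi}_{1}^{2}$ p-value has the closed form $P\left({\chi}_{1}^{2}>w\right)=\mathrm{erfc}\left(\sqrt{w/2}\right)$:

```python
import math

# Hedged sketch of the GIMT Wald statistic W_n = n * s_n^T (Sigma_s)^{-1} s_n
# for the scalar case r = 1, using the closed-form chi-squared(1) tail
# probability P(chi2_1 > w) = erfc(sqrt(w/2)). Inputs are illustrative.

def gimt_wald_1df(n, s_hat, sigma_hat):
    """n: sample size; s_hat: scalar GIMT statistic; sigma_hat: its
    estimated asymptotic variance (must be positive)."""
    w = n * s_hat * s_hat / sigma_hat
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

w, p = gimt_wald_1df(n=1000, s_hat=0.02, sigma_hat=0.5)
# w = 1000 * 0.02^2 / 0.5 = 0.8; p ~ 0.37, so H0 is not rejected at 0.05
```

For $r>1$ the same statistic would be compared against a ${\chi}_{r}^{2}$ distribution, whose tail probability is typically obtained from a statistics library rather than a closed form.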

**Proposition 1. Interpretation of GIMT Null and Alternative Hypotheses.** Suppose the Assumptions of Theorem 4 hold. Let **s** be a GIMT hypothesis function. (i) If $\mathcal{M}$ is correctly specified, then ${H}_{0}:{s}^{\ast}={0}_{r}$ holds; (ii) If ${H}_{0}:{s}^{\ast}={0}_{r}$ is false, then $\mathcal{M}$ is misspecified.

#### 3.3. GIMT Covariance Matrix Estimators

**Theorem 8. Lancaster-Chesher Estimator (see [12]).** Assume Assumptions 1, 2, 3, 5(i)a, 5(i)c, 5(i)d, 5(ii)a, 5(ii)c, 5(iii), and 6 hold with respect to a GIMT hypothesis function $s:\mathsf{\Upsilon}\times \mathsf{\Upsilon}\to {\mathcal{R}}^{r}$ and probability model $\mathcal{M}$. If $\mathcal{M}$ is correctly specified, then with probability one $\ddot{\mathsf{\nabla}}{d}_{n}\left({\widehat{\mathsf{\theta}}}_{n}\right)\to \mathsf{\nabla}{d}^{\ast}$ as $n\to \infty $.

If the GIMT hypothesis function **s** is antisymmetric and ${A}^{\ast}={B}^{\ast}$, then the term $\left(\mathsf{\nabla}{s}^{\ast}\right){\mathcal{D}}_{k}^{\otimes}{d}^{\ast}={0}_{r}$, so the centering term ${d}^{\ast}$ in (1) can be set equal to ${0}_{r}$. Thus, an alternative estimator of ${d}^{\ast}$ that can be used instead of the centering term estimator ${\widehat{d}}_{n}$ in (2) is simply a vector of zeros. These two methods yield six different non-directional GIMT covariance matrix estimators.

#### 3.4. Adjusted GIMT Hypothesis Functions

A GIMT hypothesis function **s** may have the property that the r-dimensional matrix ${\Sigma}_{s}^{\ast}$ is singular with rank $g$, where $g<r$, so that Assumption 7(iii) fails. However, it is often possible to replace the original GIMT hypothesis function $s:\mathsf{\Upsilon}\times \mathsf{\Upsilon}\to {\mathcal{R}}^{r}$ with an alternative "adjusted" GIMT hypothesis function ${s}^{\prime}:\mathsf{\Upsilon}\times \mathsf{\Upsilon}\to {\mathcal{R}}^{g}$ that tests a similar null hypothesis yet has the properties that: (i) the resulting asymptotic covariance matrix of ${n}^{1/2}{\widehat{s}}_{n}^{\prime}$ is nonsingular; and (ii) rejection of ${H}_{0}:{s}^{\prime}({A}^{\ast},{B}^{\ast})={0}_{g}$ implies rejection of ${H}_{0}:s({A}^{\ast},{B}^{\ast})={0}_{r}$.

**Proposition 2. Adjusted GIMT Hypothesis Function Properties.** Let ${\Sigma}_{s}^{\ast}$ be an r-dimensional GIMT asymptotic covariance matrix for GIMT hypothesis function $s:\mathsf{\Upsilon}\times \mathsf{\Upsilon}\to {\mathcal{R}}^{r}$ such that Assumption 7(iii) holds. Let the $g$ rows of the rank-$g$ matrix $T\in {\mathcal{R}}^{g\times r}$ be r-dimensional orthonormal eigenvectors of ${\Sigma}_{s}^{\ast}$ $\left(r>g\ge 1\right)$ for GIMT hypothesis function **s**. Define an alternative GIMT hypothesis function ${s}^{\prime}\equiv Ts$ whose respective g-dimensional GIMT asymptotic covariance matrix is ${\Sigma}_{T}^{\ast}=T{\Sigma}_{s}^{\ast}{T}^{T}$. (i) If ${H}_{0}:{s}^{\prime}\left({A}^{\ast},{B}^{\ast}\right)={0}_{g}$ is false, then ${H}_{0}:s\left({A}^{\ast},{B}^{\ast}\right)={0}_{r}$ is false; (ii) The g-dimensional GIMT asymptotic covariance matrix, ${\Sigma}_{T}^{\ast}$, for ${s}^{\prime}$ is finite and positive definite.

The matrix **T** in Proposition 2 is called the adjusted GIMT hypothesis projection matrix. The proof of Proposition 2(i) follows from the observation that if $s={0}_{r}$, then ${s}^{\prime}=Ts={0}_{g}$. Proposition 2(ii) follows from the observation that ${\Sigma}_{T}^{\ast}=T{\Sigma}_{s}^{\ast}{T}^{T}$ is non-singular by the construction of **T** and Assumption 7(iii).

## 4. Simulation Studies

#### 4.1. Generalized Information Matrix Tests

#### 4.1.1. Adjusted Classical GIMT (Directional) [23]

Let the $g$ rows of the adjusted GIMT hypothesis projection matrix **T** be r-dimensional orthonormal eigenvectors of ${\Sigma}_{s}^{\ast}$ ($r>g\ge 1$). Then, instead of testing the null hypothesis ${H}_{0}:{A}^{\ast}={B}^{\ast}$ associated with the classical full non-directional Information Matrix Test [1], the null hypothesis ${H}_{0}:Tvech\left({A}^{\ast}\right)=Tvech\left({B}^{\ast}\right)$ is tested using the GIMT hypothesis function **s** defined such that: $s\left({A}^{\ast},{B}^{\ast}\right)=Tvech\left({A}^{\ast}-{B}^{\ast}\right)$. The GIMT associated with this hypothesis function is called the Adjusted Classical GIMT (Directional). Golden et al. [23] provided further discussion of this GIMT and showed that it had good level and power properties in simulation studies of a realistic epidemiological data analysis problem.

#### 4.1.2. Fisher Spectra GIMT (Directional)

The Fisher Spectra GIMT (Directional) uses a GIMT hypothesis function **s** defined such that:

#### 4.1.3. Robust Log GAIC GIMT (Directional)

The Robust Log GAIC GIMT (Directional) uses a GIMT hypothesis function **s** defined such that:

#### 4.1.4. Robust Log GAIC Ratio GIMT (Directional)

The Robust Log GAIC Ratio GIMT (Directional) uses a GIMT hypothesis function **s** defined such that:

#### 4.1.5. Composite Log GAIC GIMT (Nondirectional)

The Composite Log GAIC GIMT (Nondirectional) uses a GIMT hypothesis function **s** defined such that:

#### 4.1.6. Composite GAIC GIMT (Non-Directional)

The Composite GAIC GIMT (Non-Directional) uses a GIMT hypothesis function **s** defined such that:

#### 4.2. Methods

#### 4.2.1. Simulated Data Generating Processes

#### 4.2.2. Estimation of Type 1 and Type 2 Error Rates

Parameter estimates were computed to a convergence tolerance of 10^{−8}. Further, we avoided fitting models to degenerate simulated data by omitting samples with condition numbers greater than 4.5 × 10^{14} to ensure numerical stability. The condition number is defined as the maximum eigenvalue divided by the minimum eigenvalue of the inverse of the Hessian covariance matrix estimator. Each simulation was run until m = 10,000 simulated data samples of size n were reached. The sample sizes n for the simulated data represented 6.25%, 12.5%, 25%, 50%, and 100% of the original 16,000-member sample.
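The screening rule described above can be sketched as follows (a hypothetical illustration; the eigenvalue computation itself is assumed to be done elsewhere):

```python
# Sketch of the sample-screening rule: a simulated sample is discarded when
# the condition number (max eigenvalue / min eigenvalue) of the covariance
# estimate exceeds 4.5e14. Eigenvalues are passed in directly; the function
# name and inputs are illustrative only.

COND_LIMIT = 4.5e14

def keep_sample(eigenvalues):
    lam_max, lam_min = max(eigenvalues), min(eigenvalues)
    if lam_min <= 0.0:  # not positive definite: degenerate fit
        return False
    return (lam_max / lam_min) <= COND_LIMIT

print(keep_sample([3.2, 0.8, 0.05]))   # True: well-conditioned
print(keep_sample([3.2, 0.8, 1e-16]))  # False: near-singular
```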

#### 4.3. Results and Discussion

#### 4.3.1. Type 1 Error Performance

#### 4.3.2. Level-Power Analyses

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix A. Proofs of Theorems and Propositions

**Definition. Dominated by an Integrable Function.** Let **X** be a random d-dimensional real vector defined on a complete probability space $(\mathrm{\Omega},\Im ,P)$, where $P$ has Radon-Nikodým density $p$ with respect to a σ-finite measure ${\nu}_{x}$. Let $\mathsf{\Theta}\subset {\mathcal{R}}^{r}$ be a compact set, $r\in \mathbb{N}$. Let $Q:{\mathcal{R}}^{d}\times \mathsf{\Theta}\to {\mathcal{R}}^{m\times n}$ be a function defined such that each element of $Q(x,\cdot )$ is continuous on $\mathsf{\Theta}$ for all $\mathit{x}\in supp\text{ }\mathit{X}$, and each element of $Q(\cdot ,\mathsf{\theta})$ is measurable for all $\mathsf{\theta}\in \mathsf{\Theta}$. Suppose there exists a function $K:{\mathcal{R}}^{d}\to {\mathcal{R}}^{+}$ such that each element ${q}_{ij}$ of **Q** satisfies $\left|{q}_{ij}(x,\mathsf{\theta})\right|\le K(x)$ for all $\mathsf{\theta}\in \mathsf{\Theta}$ and for all $\mathit{x}\in supp\text{ }\mathit{X}$. Also assume that the expected value of $K(X)$ with respect to $p$ is finite. Then **Q** is dominated by an integrable function $K$ on $\mathsf{\Theta}$ with respect to $p$.

**Proof of Theorem 6.**

**Proof of Theorem 7.** Multiplying by ${n}^{1/2}$ we then have:

## References

- H. White. “Maximum Likelihood Estimation of Misspecified Models.” Econometrica 50 (1982): 1–25.
- H. White. Estimation, Inference, and Specification Analysis. New York, NY, USA: Cambridge University Press, 1994.
- T.M. Kashner, S.S. Henley, R.M. Golden, A.J. Rush, and R.B. Jarrett. “Assessing the preventive effects of cognitive therapy following relief of depression: A methodological innovation.” J. Affect. Disord. 104 (2007): 251–261.
- T.M. Kashner, R. Rosenheck, A.B. Campinell, A. Suris, R. Crandall, N.J. Garfield, P. Lapuc, K. Pyrcz, T. Soyka, and A. Wicker. “Impact of work therapy on health status among homeless, substance-dependent veterans: A randomized controlled trial.” Arch. Gen. Psychiatry 59 (2002): 938–944.
- T.M. Kashner, T.J. Carmody, T. Suppes, A.J. Rush, M.L. Crismon, A.L. Miller, M. Toprac, and T. Madhukar. “Catching up on health outcomes: The Texas Medication Algorithm Project.” Health Serv. Res. 38 (2003): 311–331.
- T.M. Kashner, S.S. Henley, R.M. Golden, J.M. Byrne, S.A. Keitz, G.W. Cannon, B.K. Chang, G.J. Holland, D.C. Aron, E.A. Muchmore, et al. “Studying the Effects of ACGME Duty Hours Limits on Resident Satisfaction: Results From VA Learners’ Perceptions Survey.” Acad. Med. 85 (2010): 1130–1139.
- S.S. Henley, T.M. Kashner, R.M. Golden, and A.N. Westover. “Response to letter regarding “A systematic approach to subgroup analyses in a smoking cessation trial”.” Am. J. Drug Alcohol Abuse 42 (2016): 112–113.
- A.N. Westover, T.M. Kashner, T.M. Winhusen, R.M. Golden, and S.S. Henley. “A Systematic Approach to Subgroup Analyses in a Smoking Cessation Trial.” Am. J. Drug Alcohol Abuse 41 (2015): 498–507.
- S.C. Brakenridge, S.S. Henley, T.M. Kashner, R.M. Golden, D. Paik, H.A. Phelan, M. Cohen, J.L. Sperry, E.E. Moore, J.P. Minei, et al. “Comparing Clinical Predictors of Deep Venous Thrombosis vs. Pulmonary Embolus After Severe Blunt Injury: A New Paradigm for Post-Traumatic Venous Thromboembolism?” J. Trauma Acute Care Surg. 74 (2013): 1231–1238.
- S.C. Brakenridge, H.A. Phelan, S.S. Henley, R.M. Golden, T.M. Kashner, A.E. Eastman, J.L. Sperry, B.G. Harbrecht, E.E. Moore, J. Cuschieri, et al. “Early blood product and crystalloid volume resuscitation: Risk association with multiple organ dysfunction after severe blunt traumatic injury.” J. Trauma 71 (2011): 299–305.
- A. Chesher. “The information matrix test: Simplified calculation via a score test interpretation.” Econ. Lett. 13 (1983): 45–48.
- T. Lancaster. “The Covariance Matrix of the Information Matrix Test.” Econometrica 52 (1984): 1051–1054.
- T. Aparicio, and I. Villanua. “The asymptotically efficient version of the information matrix test in binary choice models. A study of size and power.” J. Appl. Stat. 28 (2001): 167–182.
- R. Davidson, and J.G. MacKinnon. “Graphical Methods for Investigating the Size and Power of Hypothesis Tests.” Manch. Sch. 66 (1998): 1–26.
- R. Davidson, and J.G. MacKinnon. “A New Form of the Information Matrix Test.” Econometrica 60 (1992): 145–157.
- G. Dhaene, and D. Hoorelbeke. “The information matrix test with bootstrap-based covariance matrix estimation.” Econ. Lett. 82 (2004): 341–347.
- C. Stomberg, and H. White. Bootstrapping the Information Matrix Test. Discussion Paper; San Diego, CA, USA: Department of Economics, University of California, 2000.
- L.W. Taylor. “The Size Bias of White’s Information Matrix Test.” Econ. Lett. 24 (1987): 63–67.
- B. Presnell, and D.D. Boos. “The IOS Test for Model Misspecification.” J. Am. Stat. Assoc. 99 (2004): 216–227.
- M. Capanu, and B. Presnell. “Misspecification tests for binomial and beta-binomial models.” Stat. Med. 27 (2008): 2536–2554.
- M. Capanu. “Tests of Misspecification for Parametric Models.” University of Florida, 2005. Available online: http://etd.fcla.edu/UF/UFE0010943/capanu_m.pdf (accessed on 1 June 2016).
- S. Zhang, P.X.K. Song, D. Shi, and Q.M. Zhou. “Information ratio test for model misspecification on parametric structures in stochastic diffusion models.” Comput. Stat. Data Anal. 56 (2012): 3975–3987.
- R.M. Golden, S.S. Henley, H. White, and T.M. Kashner. “New Directions in Information Matrix Testing: Eigenspectrum Tests.” In Causality, Prediction, and Specification Analysis: Recent Advances and Future Directions: Essays in Honor of Halbert L. White, Jr. (Festschrift Hal White Conference). Edited by X. Chen and N.R. Swanson. New York, NY, USA: Springer, 2013, pp. 145–178.
- J.S. Cho, and H. White. “Testing the Equality of Two Positive-Definite Matrices with Application to Information Matrix Testing.” In Essays in Honor of Peter C. B. Phillips. Edited by Y. Chang, T.B. Fomby and J.Y. Park. Bingley, UK: Emerald Group Publishing Limited, 2014, pp. 491–556.
- Q.M. Zhou, P.X.K. Song, and M.E. Thompson. “Information Ratio Test for Model Misspecification in Quasi-Likelihood Inference.” J. Am. Stat. Assoc. 107 (2012): 205–213.
- W. Huang, and A. Prokhorov. “A Goodness-of-Fit Test for Copulas.” Econom. Rev. 33 (2014): 751–771.
- W.H. Marlow. Mathematics for Operations Research. Mineola, NY, USA: Dover Publications, 2012.
- J.R. Magnus. “On the concept of matrix derivative.” J. Multivar. Anal. 101 (2010): 2200–2206.
- J.R. Magnus, and H. Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics. New York, NY, USA: John Wiley & Sons, 1999.
- R.I. Jennrich. “Asymptotic Properties of Non-linear Least Squares Estimators.” Ann. Math. Stat. 40 (1969): 633–643.
- H. White. “Consequences and detection of misspecified nonlinear regression models.” J. Am. Stat. Assoc. 76 (1981): 419–433.
- P. Huber. “The Behavior of Maximum Likelihood Estimates under Non-Standard Conditions.” In Proceedings Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Berkeley, CA, USA: University of California Press, 1967, pp. 221–233.
- A. Prokhorov, U. Schepsmeier, and Y. Zhu. Generalized Information Matrix Tests for Copulas, Working Paper. Sydney, Australia: University of Sydney Business School, Discipline of Business Analytics, 2015.
- H. Bozdogan. “Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions.” Psychometrika 52 (1987): 345–370.
- H. Linhart, and W. Zucchini. Model Selection. New York, NY, USA: Wiley, 1986.
- K. Takeuchi. “Distribution of information statistics and a criterion of model fitting for adequacy of models.” Math. Sci. 153 (1976): 12–18.
- J. Cho, and P. Phillips. “Testing Equality of Covariance Matrices via Pythagorean Means.” 2014. Available online: http://ssrn.com/abstract=2533002 (accessed on 1 June 2016).
- R.M. Golden. “Statistical tests for comparing possibly misspecified and nonnested models.” J. Math. Psychol. 44 (2000): 153–170.
- R.M. Golden. “Discrepancy risk model selection test theory for comparing possibly misspecified or nonnested models.” Psychometrika 68 (2003): 229–249.
- S.S. Henley, R.M. Golden, T.M. Kashner, and H. White. Exploiting Hidden Structures in Epidemiological Data: Phase II Project. Plano, TX, USA: NIH/NIAAA, 2000.
- S.S. Henley, R.M. Golden, T.M. Kashner, H. White, and D. Paik. Robust Classification Methods for Categorical Regression: Phase II Project. Plano, TX, USA: National Cancer Institute, 2008.
- S.S. Henley, R.M. Golden, T.M. Kashner, H. White, and R.D. Katz. Model Selection Methods for Categorical Regression: Phase I Project. Plano, TX, USA: NIH/NIAAA, 2003.
- Q.H. Vuong. “Likelihood ratio tests for model selection and non-nested hypotheses.” Econometrica 57 (1989): 307–333.
- H. Bozdogan. “Akaike’s Information Criterion and Recent Developments in Information Complexity.” J. Math. Psychol. 44 (2000): 62–91.
- T. Fawcett. “An introduction to ROC analysis.” Pattern Recogn. Lett. 27 (2006): 861–874.
- M.S. Pepe. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford, UK: Oxford University Press, 2004.
- T.D. Wickens. Elementary Signal Detection Theory. New York, NY, USA: Oxford University Press, 2002.
- T. Hastie, and R. Tibshirani. Generalized Additive Models. New York, NY, USA: Chapman and Hall, 1990.
- P. McCullagh, and J.A. Nelder. Generalized Linear Models. London, UK; New York, NY, USA: Chapman and Hall, 1989.
- B. Wei. Exponential Family Nonlinear Models. New York, NY, USA: Springer, 1998.
- D.W. Hosmer, and S. Lemeshow. Applied Logistic Regression. New York, NY, USA: Wiley, 1989.
- F.E. Harrell. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York, NY, USA: Springer, 2001.
- G. Arminger, and M.E. Sobel. “Pseudo-maximum likelihood estimation of mean and covariance structures with missing data.” J. Am. Stat. Assoc. 85 (1990): 195–203.
- J. Gallini. “Misspecifications that can result in path analysis structures.” Appl. Psychol. Meas. 7 (1983): 125–137.
- S.W. Raudenbush, and A.S. Bryk. Hierarchical Linear Models: Applications and Data Analysis Methods. Thousand Oaks, CA, USA: Sage Publications, Inc., 2002.
- D.W. Hosmer, and S. Lemeshow. “A goodness-of-fit test for the multiple logistic regression model.” Commun. Stat. A10 (1980): 1043–1069.
- D.W. Hosmer, S. Lemeshow, and J. Klar. “Goodness-of-Fit Testing for Multiple Logistic Regression Analysis when the Estimated Probabilities are Small.” Biom. J. 30 (1988): 1–14.
- D.W. Hosmer, S. Taber, and S. Lemeshow. “The importance of assessing the fit of logistic regression models: A case study.” Am. J. Public Health 81 (1991): 1630–1635.
- R.J. Serfling. Approximation Theorems of Mathematical Statistics. New York, NY, USA: Wiley-Interscience, 1980.
- H. White. Asymptotic Theory for Econometricians, Revised Edition. New York, NY, USA: Academic Press, 2001.
- H. White. “Using least squares to approximate unknown regression functions.” Int. Econ. Rev. 21 (1980): 149–170.

**Figure 1.**Level-power for GIMTs using the analytic 3rd derivative formula is characterized by Area Under the Receiver Operating Characteristic curve (AUROC) as a function of sample size. With respect to the chosen test problem, these GIMTs obtain nearly perfect performance in correct rejection of the null hypothesis and correct acceptance of the null hypothesis when the sample size in this simulation study exceeds 4000 exemplars. Each data point in the above graph was generated from 10,000 bootstrap data samples.

**Figure 2.** Level-power for GIMTs using the Lancaster-Chesher 3rd derivative approximation is characterized by Area Under the Receiver Operating Characteristic curve (AUROC) as a function of sample size. With respect to the chosen test problem, these GIMTs obtain excellent performance in correct rejection of the null hypothesis and correct acceptance of the null hypothesis when the sample size in this simulation study is near 16,000 exemplars. While the Adjusted Classical GIMT shows excellent performance across sample sizes, the other GIMTs show poor level-power performance below 15,000 exemplars. Each data point in the above graph was generated from 10,000 bootstrap data samples.

**Table 1.**Type 1 error performance of GIMTs using the analytic third derivative formula for pre-specified (nominal) significance levels: 0.01, 0.025, 0.05, and 0.10. Level performance for the directional GIMTs was better than level performance for the non-directional GIMTs. Bootstrap simulation standard errors are shown in parentheses. Computed values are for 10,000 simulated data samples for sample size n = 16,000. df = degrees of freedom.

Generalized Information Matrix Test (GIMT) | Test Type | p = 0.01 | p = 0.025 | p = 0.05 | p = 0.10 |
---|---|---|---|---|---|
Adjusted Classical (≤10 df) | Directional | 0.0136 | 0.0308 | 0.0550 | 0.1059 |
 | | (0.0012) | (0.0017) | (0.0023) | (0.0031) |
Composite GAIC (2 df) | Non-Directional | 0.0830 | 0.1014 | 0.1225 | 0.1546 |
 | | (0.0027) | (0.0030) | (0.0032) | (0.0036) |
Composite Log GAIC (2 df) | Non-Directional | 0.0564 | 0.0742 | 0.0930 | 0.1219 |
 | | (0.0023) | (0.0026) | (0.0029) | (0.0032) |
Fisher Spectra (4 df) | Directional | 0.0205 | 0.0337 | 0.0584 | 0.1035 |
 | | (0.0014) | (0.0018) | (0.0023) | (0.0030) |
Robust Log GAIC (1 df) | Directional | 0.0185 | 0.0360 | 0.0618 | 0.1144 |
 | | (0.0013) | (0.0018) | (0.0024) | (0.0031) |
Robust Log GAIC Ratio (1 df) | Directional | 0.0158 | 0.0335 | 0.0590 | 0.1135 |
 | | (0.0012) | (0.0018) | (0.0023) | (0.0031) |

**Table 2.** Type 1 error performance of GIMTs using the Lancaster-Chesher third derivative approximation for pre-specified (nominal) significance levels: 0.01, 0.025, 0.05, and 0.10. As with the analytic third derivative formula in Table 1, level performance for the directional GIMTs was better than level performance for the non-directional GIMTs. Further, for the non-directional GIMTs, level performance using the Lancaster-Chesher third derivative approximation was better than level performance using the analytic third derivative formula. Bootstrap simulation standard errors are shown in parentheses. Computed values are for 10,000 simulated data samples for sample size n = 16,000. df = degrees of freedom.

Generalized Information Matrix Test (GIMT) | Test Type | p = 0.01 | p = 0.025 | p = 0.05 | p = 0.10 |
---|---|---|---|---|---|
Adjusted Classical (≤10 df) | Directional | 0.0085 | 0.0195 | 0.0409 | 0.0916 |
 | | (0.0009) | (0.0014) | (0.0020) | (0.0029) |
Composite GAIC (2 df) | Non-Directional | 0.0662 | 0.0821 | 0.1006 | 0.1259 |
 | | (0.0024) | (0.0026) | (0.0029) | (0.0032) |
Composite Log GAIC (2 df) | Non-Directional | 0.0403 | 0.0498 | 0.0646 | 0.0884 |
 | | (0.0019) | (0.0021) | (0.0023) | (0.0027) |
Fisher Spectra (4 df) | Directional | 0.0071 | 0.0161 | 0.0264 | 0.0535 |
 | | (0.0008) | (0.0012) | (0.0015) | (0.0021) |
Robust Log GAIC (1 df) | Directional | 0.0045 | 0.0138 | 0.0236 | 0.0622 |
 | | (0.0006) | (0.0011) | (0.0014) | (0.0023) |
Robust Log GAIC Ratio (1 df) | Directional | 0.0032 | 0.0097 | 0.0285 | 0.0588 |
 | | (0.0005) | (0.0009) | (0.0016) | (0.0022) |

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license ( http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Golden, R.M.; Henley, S.S.; White, H.; Kashner, T.M. Generalized Information Matrix Tests for Detecting Model Misspecification. *Econometrics* **2016**, *4*, 46. https://doi.org/10.3390/econometrics4040046