# An Information Criterion for Auxiliary Variable Selection in Incomplete Data Analysis

## Abstract

## 1. Introduction

## 2. Preliminaries

#### 2.1. Incomplete Data Analysis for Primary Variables

#### 2.2. Statistical Analysis with Auxiliary Variables

#### 2.3. Comparing the Two Estimators

## 3. An Illustrative Example with Auxiliary Variables

#### 3.1. Model Setting

- Case 1: ${q}_{a|x}(a|y,z)={q}_{a|z}(a|z)=zN(a;1.8,0.49)+(1-z)N(a;-1.8,0.49)$.
- Case 2: ${q}_{a|x}(a|y,z)={q}_{a}(a)=0.6N(a;1.8,0.49)+0.4N(a;-1.8,0.49)$.
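
The two auxiliary-variable distributions above can be simulated with a minimal sketch such as the following. It assumes that the third argument of $N(a;\cdot,\cdot)$ is a variance (so the standard deviation is 0.7) and that $z\in\{0,1\}$; the function names `sample_a_case1` and `sample_a_case2` are hypothetical and are not taken from the paper.

```python
import numpy as np

# Assumption: N(a; m, v) is parameterized by mean m and variance v, so sd = sqrt(0.49) = 0.7.
SD = np.sqrt(0.49)

def sample_a_case1(z, rng):
    """Case 1: a | z ~ N(1.8, 0.49) when z == 1 and N(-1.8, 0.49) when z == 0."""
    z = np.asarray(z)
    mean = np.where(z == 1, 1.8, -1.8)
    return rng.normal(mean, SD)

def sample_a_case2(n, rng):
    """Case 2: a ~ 0.6 N(1.8, 0.49) + 0.4 N(-1.8, 0.49), drawn independently of (y, z)."""
    first_component = rng.random(n) < 0.6  # pick the mixture component
    mean = np.where(first_component, 1.8, -1.8)
    return rng.normal(mean, SD)

rng = np.random.default_rng(0)
a_case1 = sample_a_case1(np.ones(10_000, dtype=int), rng)  # all z = 1
a_case2 = sample_a_case2(100_000, rng)
```

In Case 1 the draw depends on the latent label $z$, so $a$ carries information about $z$; in Case 2 the draw ignores $(y,z)$ entirely, which is why that auxiliary variable is described as useless.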

#### 3.2. Estimation Results

## 4. Information Criterion

#### 4.1. Asymptotic Expansion of the Risk Function

#### 4.2. Estimating the Risk Function

#### 4.3. Akaike Information Criteria for Auxiliary Variable Selection

#### 4.4. The Illustrative Example (Cont.)

## 5. Leave-One-Out Cross Validation

## 6. Experiments with Simulated Datasets

#### 6.1. Unbiasedness

#### 6.2. Auxiliary Variable Selection

## 7. Experiments with Real Datasets

## 8. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A. Proofs

#### Appendix A.1. Proof of Lemma 1

#### Appendix A.2. Proof of Lemma 2

#### Appendix A.3. Proof of Theorem 2

#### Appendix A.4. Proof of Theorem 3

**Figure 1.** Useful auxiliary variable (Case 1). The left panel plots ${\{({y}_{i},{a}_{i})\}}_{i=1}^{100}$ with labels indicating ${z}_{i}$. The estimated ${p}_{b}({\widehat{\beta}}_{b})$ is shown by the contour lines. The right panel shows the histogram of ${\{{y}_{i}\}}_{i=1}^{100}$, and three density functions ${p}_{y}({\widehat{\theta}}_{x})$ (broken line), ${p}_{y}({\widehat{\theta}}_{y})$ (dotted line), and ${p}_{y}({\widehat{\theta}}_{b})$ (solid line). In Section 4.4, this useful auxiliary variable is selected by our method (Case 1 in Table 2).

**Figure 2.** Useless auxiliary variable (Case 2). The symbols are the same as in Figure 1. In Section 4.4, this useless auxiliary variable is NOT selected by our method (Case 2 in Table 2).

**Table 1.** Random variables in incomplete data analysis with auxiliary variables. $B=(Y,A)$ is used for estimation of unknown parameters, and $X=(Y,Z)$ is used for evaluation of candidate models.

| | Observed | Latent | Complete |
|---|---|---|---|
| Primary | Y | Z | X |
| Auxiliary | A | – | – |
| All | B | – | C |

**Table 2.** Comparison of ${\widehat{\theta}}_{b}$ and ${\widehat{\theta}}_{y}$ for predicting X and for predicting Y.

| | ${p}_{x}({\widehat{\theta}}_{b})$ vs. ${p}_{x}({\widehat{\theta}}_{y})$ | ${p}_{y}({\widehat{\theta}}_{b})$ vs. ${p}_{y}({\widehat{\theta}}_{y})$ |
|---|---|---|
| | ${\mathrm{AIC}}_{x;b}-{\mathrm{AIC}}_{x;y}$ | ${\mathrm{AIC}}_{y;b}-{\mathrm{AIC}}_{y;y}$ |
| Case 1 | −2.67 | −0.96 |
| Case 2 | 9.86 | 10.37 |
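
The AIC differences for predicting X can be read as a decision rule, sketched below. The rule is an assumption based on the usual minimum-AIC convention: select ${\widehat{\theta}}_{b}$ when the difference ${\mathrm{AIC}}_{x;b}-{\mathrm{AIC}}_{x;y}$ is negative and ${\widehat{\theta}}_{y}$ otherwise. It agrees with the captions of Figures 1 and 2, which state that the auxiliary variable is selected in Case 1 and not selected in Case 2. The function name `select_estimator` is hypothetical.

```python
def select_estimator(aic_diff):
    """Return the estimator with the smaller AIC.

    aic_diff is AIC_{x;b} - AIC_{x;y}; a negative value favors theta_b
    (estimation with the auxiliary variable). Assumed minimum-AIC rule.
    """
    return "theta_b" if aic_diff < 0 else "theta_y"

# AIC_{x;b} - AIC_{x;y} values reported in Table 2.
table2_aic_diff_x = {"Case 1": -2.67, "Case 2": 9.86}

choices = {case: select_estimator(d) for case, d in table2_aic_diff_x.items()}
```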

**Table 3.** The expected Akaike Information Criterion (AIC) difference compared with the risk difference. The values are computed from $T={10}^{4}$ simulation runs; standard errors are in parentheses.

| n | 100 | 200 | 500 | 1000 | 2000 | 5000 |
|---|---|---|---|---|---|---|
| $E[{\mathrm{AIC}}_{x;b}-{\mathrm{AIC}}_{x;y}]$ | −3.559 | −3.263 | −3.221 | −3.197 | −3.195 | −3.180 |
| | (0.074) | (0.021) | (0.015) | (0.013) | (0.013) | (0.012) |
| $2n\{{\mathcal{R}}_{x}({\widehat{\theta}}_{b})-{\mathcal{R}}_{x}({\widehat{\theta}}_{y})\}$ | −3.603 | −3.333 | −3.275 | −3.208 | −3.182 | −3.232 |
| | (0.071) | (0.054) | (0.050) | (0.050) | (0.050) | (0.050) |

**Table 4.** Useful auxiliary variable (Case 1): selection frequencies of ${\widehat{\theta}}_{b}$ and ${\widehat{\theta}}_{y}$.

| n | 100 | 200 | 500 | 1000 | 2000 | 5000 |
|---|---|---|---|---|---|---|
| ${\widehat{\theta}}_{b}$ | 9230 | 9475 | 9649 | 9687 | 9711 | 9727 |
| ${\widehat{\theta}}_{y}$ | 770 | 525 | 351 | 313 | 289 | 273 |

**Table 5.** Useless auxiliary variable (Case 2): selection frequencies of ${\widehat{\theta}}_{b}$ and ${\widehat{\theta}}_{y}$.

| n | 100 | 200 | 500 | 1000 | 2000 | 5000 |
|---|---|---|---|---|---|---|
| ${\widehat{\theta}}_{b}$ | 1508 | 212 | 1 | 0 | 0 | 0 |
| ${\widehat{\theta}}_{y}$ | 8492 | 9788 | 9999 | 10,000 | 10,000 | 10,000 |
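
The selection frequencies in Tables 4 and 5 sum to 10,000 in every column, so they can be converted to selection proportions. The sketch below does this and attaches a binomial standard error, which assumes the simulation runs are independent. The helper name `proportion_with_se` is hypothetical.

```python
import math

T = 10_000  # total count per column in Tables 4 and 5

n_values = [100, 200, 500, 1000, 2000, 5000]
case1_theta_b = [9230, 9475, 9649, 9687, 9711, 9727]  # Table 4, row for theta_b
case2_theta_b = [1508, 212, 1, 0, 0, 0]               # Table 5, row for theta_b

def proportion_with_se(count, total=T):
    """Selection proportion and its binomial standard error (independent runs assumed)."""
    p = count / total
    return p, math.sqrt(p * (1 - p) / total)

for n, c1, c2 in zip(n_values, case1_theta_b, case2_theta_b):
    p1, se1 = proportion_with_se(c1)
    p2, se2 = proportion_with_se(c2)
    print(f"n={n}: Case 1 selects theta_b {p1:.4f} ({se1:.4f}); "
          f"Case 2 selects theta_b {p2:.4f} ({se2:.4f})")
```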

**Table 6.** Useful auxiliary variable (Case 1): estimated risk functions of ${\widehat{\theta}}_{b}$, ${\widehat{\theta}}_{y}$, and ${\widehat{\theta}}_{best}$, with their standard errors in parentheses.

| n | 100 | 200 | 500 | 1000 | 2000 | 5000 |
|---|---|---|---|---|---|---|
| $2n\{{\mathcal{R}}_{x}({\widehat{\theta}}_{b})-{\mathcal{L}}_{x}({\theta}_{0})\}$ | 4.229 | 4.079 | 4.051 | 4.039 | 4.029 | 4.033 |
| | (0.032) | (0.030) | (0.029) | (0.028) | (0.029) | (0.028) |
| $2n\{{\mathcal{R}}_{x}({\widehat{\theta}}_{y})-{\mathcal{L}}_{x}({\theta}_{0})\}$ | 7.831 | 7.412 | 7.326 | 7.247 | 7.211 | 7.266 |
| | (0.078) | (0.061) | (0.058) | (0.058) | (0.058) | (0.058) |
| $2n\{{\mathcal{R}}_{x}({\widehat{\theta}}_{best})-{\mathcal{L}}_{x}({\theta}_{0})\}$ | 5.109 | 4.741 | 4.501 | 4.491 | 4.479 | 4.454 |
| | (0.052) | (0.045) | (0.041) | (0.042) | (0.042) | (0.041) |

**Table 7.** Useless auxiliary variable (Case 2): estimated risk functions of ${\widehat{\theta}}_{b}$, ${\widehat{\theta}}_{y}$, and ${\widehat{\theta}}_{best}$, with their standard errors in parentheses.

| n | 100 | 200 | 500 | 1000 | 2000 | 5000 |
|---|---|---|---|---|---|---|
| $2n\{{\mathcal{R}}_{x}({\widehat{\theta}}_{b})-{\mathcal{L}}_{x}({\theta}_{0})\}$ | 105.527 | 214.659 | 543.685 | 1091.105 | 2182.647 | 5452.623 |
| | (0.111) | (0.167) | (0.301) | (0.474) | (0.723) | (1.151) |
| $2n\{{\mathcal{R}}_{x}({\widehat{\theta}}_{y})-{\mathcal{L}}_{x}({\theta}_{0})\}$ | 7.831 | 7.412 | 7.326 | 7.247 | 7.211 | 7.266 |
| | (0.078) | (0.061) | (0.058) | (0.058) | (0.058) | (0.058) |
| $2n\{{\mathcal{R}}_{x}({\widehat{\theta}}_{best})-{\mathcal{L}}_{x}({\theta}_{0})\}$ | 22.064 | 11.555 | 7.375 | 7.247 | 7.211 | 7.266 |
| | (0.358) | (0.304) | (0.079) | (0.058) | (0.058) | (0.058) |

**Table 8.** Experimental average of ${n}_{te}\{{\mathcal{L}}_{x}({\widehat{\theta}}_{y})-{\mathcal{L}}_{x}({\widehat{\theta}}_{best})\}$ for each case of $Y={V}_{\ell}$, $\ell =1,\dots ,13$. Standard errors are in parentheses.

| Y | ${V}_{1}$ | ${V}_{2}$ | ${V}_{3}$ | ${V}_{4}$ | ${V}_{5}$ | ${V}_{6}$ | ${V}_{7}$ |
|---|---|---|---|---|---|---|---|
| ${n}_{te}\{{\mathcal{L}}_{x}({\widehat{\theta}}_{y})-{\mathcal{L}}_{x}({\widehat{\theta}}_{best})\}$ | 0.13 | −0.14 | 89.71 | 46.24 | −1.76 | 3.34 | 76.54 |
| | (0.08) | (0.12) | (3.82) | (4.17) | (2.52) | (1.34) | (6.09) |
| **Y** | ${V}_{8}$ | ${V}_{9}$ | ${V}_{10}$ | ${V}_{11}$ | ${V}_{12}$ | ${V}_{13}$ | |
| ${n}_{te}\{{\mathcal{L}}_{x}({\widehat{\theta}}_{y})-{\mathcal{L}}_{x}({\widehat{\theta}}_{best})\}$ | 13.91 | 39.45 | 1.72 | 111.24 | 15.48 | 0.23 | |
| | (2.21) | (3.12) | (0.29) | (8.46) | (2.11) | (0.09) | |

