How Informative Is the Marginal Information in a 2 × 2 Table for Assessing the Association Between Variables? The Aggregate Informative Index
Abstract
1. Introduction
“Let us blot out the contents of the table, leaving only the marginal frequencies. If it be admitted that these marginal frequencies by themselves supply no information on the point at issue, namely, as to the proportionality of the frequencies in the body of the table, we may recognize that we are concerned only with the relative probabilities of occurrence of the different ways in which the table can be filled in, subject to these marginal frequencies.”
“Choosing the method that is appropriate to a particular data set involves several considerations. Firstly, do the assumptions fit the data? All EI methods make assumptions about the data to compensate for the loss of information due to aggregation. Secondly, statistical evaluations of models can point to theoretically better alternatives…Thirdly, [the] plausibility and consistency of results are, in themselves, important indicators for success. Fourthly, testing EI methods relies on using empirical evaluation on a range of data sets.”
“As with any model, EI is built on assumptions, and these can be far off or right on target. The estimates therefore may also be far off or right on the true parameters. Substantive discussions of the results of EI should thus always include a discussion of the assumptions, how reasonable they are for the problem at hand, and how these assumptions drive the results. Excitement about the advances to ecological inference provided by EI should not be allowed to lead to insufficient attention to the strong and potentially inappropriate assumptions at the heart of the model. The model is useful if and only if the assumptions fit.”
“The conclusions of an ecological study should be carefully evaluated in order to assess whether they are biologically plausible, whether alternative explanations exist to interpret the results and whether all potential confounders were taken into account in the data analysis. When reading an ecological study, we should always be aware of the possibility of an ecological fallacy whereby potentially misleading causal inferences might be generated.”
“How informative is the marginal information for determining whether there exists a statistically significant association between the variables?”
2. Methods
2.1. The 2 × 2 Contingency Table
2.2. The Aggregate Association Index
2.3. The Aggregate Informative Index
2.3.1. The Benchmark Situation (No Information)
2.3.2. The New Index
3. Results
3.1. Analysis of Fisher’s Criminal Twin Data
3.1.1. The Data
3.1.2. On the Robustness of the New Index
3.1.3. Concerning Extreme Marginal Information
3.2. Analysis of Selikoff’s Asbestosis Data
4. Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
| | Column 1 | Column 2 | Total |
|---|---|---|---|
| Row 1 | | | |
| Row 2 | | | |
| Total | | | |

| | Convicted | Not Convicted | Total |
|---|---|---|---|
| Monozygotic | 10 | 3 | 13 |
| Dizygotic | 2 | 15 | 17 |
| Total | 12 | 18 | 30 |

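
As an illustration of what the margins alone allow, the sketch below enumerates every way the twin table above can be filled in given only its row and column totals, and attaches to each filling the hypergeometric probability it has when the two variables are independent. This is the standard calculation behind Fisher's exact test, not the AAI or the AII; the function name `feasible_tables` and its argument names are hypothetical and chosen only for this example.

```python
from math import comb


def feasible_tables(row1_total, row2_total, col1_total):
    """Return (n11, probability) pairs for all 2 x 2 tables with these margins.

    The probability is hypergeometric, i.e. it assumes independence between
    the row and column variables with all margins held fixed.
    """
    n = row1_total + row2_total
    lower = max(0, col1_total - row2_total)  # smallest admissible (1, 1) cell
    upper = min(row1_total, col1_total)      # largest admissible (1, 1) cell
    denominator = comb(n, col1_total)
    return [
        (n11, comb(row1_total, n11) * comb(row2_total, col1_total - n11) / denominator)
        for n11 in range(lower, upper + 1)
    ]


# Margins of the twin table: 13 monozygotic, 17 dizygotic, 12 convicted.
tables = feasible_tables(13, 17, 12)
observed_n11 = 10

# One-sided tail: probability of a (1, 1) cell at least as large as observed.
upper_tail = sum(p for n11, p in tables if n11 >= observed_n11)

print(f"{len(tables)} feasible tables, n11 from {tables[0][0]} to {tables[-1][0]}")
print(f"P(n11 >= {observed_n11}) = {upper_tail:.6f}")
```

With these margins the (1, 1) cell can take any integer value from 0 to 12, so the marginal totals by themselves leave 13 candidate tables.
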
Sample Size (n) | AII | AAI |
---|---|---|
30 | 54.93 | 69.40 |
50 | 52.98 | 75.83 |
100 | 51.49 | 84.89 |
250 | 50.60 | 92.84 |
500 | 50.30 | 96.13 |
1000 | 50.15 | 97.96 |
2500 | 50.06 | 99.15 |
5000 | 50.03 | 99.56 |

| Onset of Exposure | Asbestosis: Yes | Asbestosis: No | Total |
|---|---|---|---|
| 0–19 years | 522 | 203 | 725 |
| 20+ years | 53 | 339 | 392 |
| Total | 575 | 542 | 1117 |

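
When the cell counts are available, as they are for the asbestosis table above, the association can be checked directly. The sketch below computes Pearson's chi-squared statistic and the odds ratio for a 2 × 2 table from their standard textbook formulas. It does not implement the AAI or the AII, and the function names are hypothetical.

```python
def pearson_chi_squared(n11, n12, n21, n22):
    """Pearson's chi-squared statistic for a 2 x 2 table (no continuity correction)."""
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    n = row1 + row2
    return n * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * col1 * col2)


def odds_ratio(n11, n12, n21, n22):
    """Cross-product ratio; undefined when n12 or n21 is zero."""
    return (n11 * n22) / (n12 * n21)


# Cell counts from the asbestosis table (rows: onset of exposure; columns: yes / no).
cells = (522, 203, 53, 339)

chi2 = pearson_chi_squared(*cells)
theta = odds_ratio(*cells)

# 3.841 is the upper 5% point of the chi-squared distribution with 1 degree of freedom.
CRITICAL_VALUE_5PCT = 3.841
significant = chi2 > CRITICAL_VALUE_5PCT

print(f"chi-squared = {chi2:.2f}, odds ratio = {theta:.2f}, significant at 5%: {significant}")
```
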
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cheema, S.; Beh, E.J.; Hudson, I.L. How Informative Is the Marginal Information in a 2 × 2 Table for Assessing the Association Between Variables? The Aggregate Informative Index. Mathematics 2024, 12, 3719. https://doi.org/10.3390/math12233719