# Diagnosing a 12-Item Dataset of Raven Matrices: With Dexter

## Abstract


## 1. Introduction

## 2. Materials and Methods

### 2.1. Data

### 2.2. Methods

- the usual statistics of classical test theory (CTT; Lord and Novick 1968);
- distractor plots, i.e., nonparametric regressions of each response alternative on the sum score;

- the empirical regression, shown with pink dots and representing, simply, the proportion of correct responses to the item (or the mean item score, for partial credit items), at each test score;
- the regression predicted by the Rasch (or partial credit) model, shown as a thin black line;
- the regression predicted by Haberman’s interaction model, shown as a thicker gray line.
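The Rasch-predicted regression in the list above can be computed without reference to ability, because under the Rasch model the probability of a correct response given the total score depends only on the item parameters: P(X_i = 1 | X_+ = s) = b_i γ_{s−1}(b^(i)) / γ_s(b), where b_i = exp(−δ_i) and γ denotes the elementary symmetric functions. The following is a minimal base R sketch of that formula, not dexter's implementation; the function names `esf` and `rasch_itr` and the difficulty values are hypothetical and chosen only for illustration.

```r
# Elementary symmetric functions gamma_0 .. gamma_n of the vector b,
# built up one item at a time with the usual summation recursion.
esf <- function(b) {
  g <- 1                           # gamma_0 = 1 for the empty item set
  for (bi in b) {
    g <- c(g, 0) + c(0, g * bi)    # add one item to the set
  }
  g
}

# Rasch-predicted item-total regression for item i:
# P(X_i = 1 | total score s) for s = 0 .. n.
rasch_itr <- function(delta, i) {
  b <- exp(-delta)
  n <- length(b)
  g_all  <- esf(b)                 # gamma_0 .. gamma_n, all items
  g_rest <- esf(b[-i])             # gamma_0 .. gamma_{n-1}, item i left out
  s <- 1:n
  p <- c(0, b[i] * g_rest[s] / g_all[s + 1])   # probability is 0 at s = 0
  names(p) <- 0:n
  p
}

# Hypothetical difficulties for a 4-item test (not estimates from the SPM-LS data)
delta <- c(-1.0, -0.5, 0.5, 1.0)
itr <- sapply(seq_along(delta), function(i) rasch_itr(delta, i))
round(itr, 3)                      # rows: total score 0..4; columns: items

# Sanity check: at each total score s the predicted item scores add up to s
rowSums(itr)
```

The final check reflects a property of the conditional distribution: given a total score of s, the expected item scores must sum to s, so each row of `itr` adds up to its score.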

## 3. Results

## 4. Discussion

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| 2PL | Two-parameter logistic (model) |
| 3PL | Three-parameter logistic (model) |
| CTT | Classical test theory |
| IM | Interaction model |
| IRF | Item response function |
| IRT | Item response theory |
| ITR | Item-total regression |
| PCM | Partial credit model |
| SPM-LS | Standard Progressive Matrices (last series) |

## Appendix A

| R code | Comment |
|---|---|
| `library(dexter)` | # load the dexter library |
| `setwd('~/WD/Raven')` | # set the work directory |
| `keys = data.frame(` | # data frame as required |
| `item_id = sprintf('SPM%02d', 1:12),` | # by keys_to_rules function |
| `noptions = 8,` | |
| `key = c(7,6,8,2,1,5,1,6,3,2,4,5)` | # (the correct responses) |
| `)` | |
| `rules = keys_to_rules(keys)` | # scoring rules as required by dexter |
| `db = start_new_project(rules, 'raven.db')` | # data base from the rules |
| `dat = read.csv('dataset.csv', header = TRUE)` | # read in data... |
| `add_booklet(db, dat, 'r')` | # ... and add to the data base |
| `tia_tables(db)` | # tables of CTT statistics |
| `mo = fit_inter(db)` | # fit the Rasch and the IM |
| `plot(mo)` | # produce all ITR plots |
| `distractor_plot(db, 'SPM01')` | # distractor plot for item SPM01 |
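To make the scoring step in the listing above more concrete, the following base R sketch expands the answer key into one row per item and response option, with a score of 1 for the keyed option and 0 otherwise. It assumes that scoring rules take the form of a data frame with columns `item_id`, `response` and `item_score`; the object name `rules_sketch` is hypothetical, and the row order produced by `keys_to_rules()` itself may differ.

```r
# The answer key from the listing above
keys <- data.frame(
  item_id  = sprintf("SPM%02d", 1:12),
  noptions = 8,
  key      = c(7, 6, 8, 2, 1, 5, 1, 6, 3, 2, 4, 5)
)

# One row per item and response option; the keyed option scores 1, all others 0
rules_sketch <- do.call(rbind, lapply(seq_len(nrow(keys)), function(i) {
  responses <- seq_len(keys$noptions[i])
  data.frame(
    item_id    = keys$item_id[i],
    response   = responses,
    item_score = as.integer(responses == keys$key[i])
  )
}))

head(rules_sketch, 8)          # the eight options of SPM01; option 7 is the key
nrow(rules_sketch)             # 12 items x 8 options
sum(rules_sketch$item_score)   # one keyed option per item
```

Writing the rules out this way makes it easy to verify the key by eye before the project database is created.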

## Appendix B

**Table A1.** Parameter estimates for the 3PL model obtained for the SPM-LS dataset with three different programs.

| Item | mirt: a | mirt: b | mirt: c | ltm: a | ltm: b | ltm: c | BILOG-MG: a | BILOG-MG: b | BILOG-MG: c |
|---|---|---|---|---|---|---|---|---|---|
| SPM01 | 0.85 | −1.55 | 0.00 | 0.87 | −1.51 | 0.00 | 0.83 | −1.57 | 0.00 |
| SPM02 | 1.93 | −1.82 | 0.00 | 2.00 | −1.76 | 0.00 | 2.00 | −1.80 | 0.00 |
| SPM03 | 1.61 | −1.24 | 0.00 | 1.66 | −1.21 | 0.00 | 1.62 | −1.24 | 0.00 |
| SPM04 | 3.65 | −1.01 | 0.00 | 4.31 | −0.95 | 0.00 | 3.60 | −1.02 | 0.00 |
| SPM05 | 4.70 | −1.11 | 0.00 | 5.59 | −1.04 | 0.00 | 4.57 | −1.13 | 0.00 |
| SPM06 | 2.26 | −0.89 | 0.00 | 2.36 | −0.86 | 0.00 | 2.23 | −0.91 | 0.00 |
| SPM07 | 1.55 | −0.75 | 0.02 | 1.57 | −0.76 | 0.00 | 1.55 | −0.75 | 0.02 |
| SPM08 | 1.58 | −0.29 | 0.00 | 1.62 | −0.29 | 0.00 | 1.57 | −0.28 | 0.00 |
| SPM09 | 2.28 | 0.19 | 0.24 | 2.27 | 0.18 | 0.23 | 2.27 | 0.19 | 0.24 |
| SPM10 | 2.09 | 0.35 | 0.00 | 2.15 | 0.34 | 0.00 | 1.88 | 0.39 | 0.00 |
| SPM11 | 5.83 | 0.63 | 0.11 | 32.28 | 0.67 | 0.12 | 6.04 | 0.63 | 0.11 |
| SPM12 | 3.39 | 0.90 | 0.14 | 3.25 | 0.88 | 0.14 | 3.35 | 0.91 | 0.14 |

**Table A2.** Parameter estimates for the 3PL model obtained for the SPM-LS dataset with BILOG-MG and three different settings.

| Item | Priors on a and c: a | Priors on a and c: b | Priors on a and c: c | Prior on a: a | Prior on a: b | Prior on a: c | No prior: a | No prior: b | No prior: c |
|---|---|---|---|---|---|---|---|---|---|
| SPM01 | 0.90 | −1.29 | 0.11 | 0.85 | −1.53 | 0.00 | 0.83 | −1.57 | 0.00 |
| SPM02 | 1.93 | −1.75 | 0.11 | 1.97 | −1.80 | 0.00 | 2.00 | −1.80 | 0.00 |
| SPM03 | 1.65 | −1.13 | 0.10 | 1.61 | −1.24 | 0.00 | 1.62 | −1.24 | 0.00 |
| SPM04 | 3.23 | −1.01 | 0.06 | 3.36 | −1.03 | 0.00 | 3.60 | −1.02 | 0.00 |
| SPM05 | 3.85 | −1.13 | 0.06 | 3.97 | −1.15 | 0.00 | 4.57 | −1.13 | 0.00 |
| SPM06 | 2.34 | −0.82 | 0.07 | 2.21 | −0.90 | 0.00 | 2.23 | −0.91 | 0.00 |
| SPM07 | 1.64 | −0.62 | 0.10 | 1.49 | −0.80 | 0.00 | 1.55 | −0.75 | 0.02 |
| SPM08 | 1.67 | −0.18 | 0.07 | 1.58 | −0.29 | 0.00 | 1.57 | −0.28 | 0.00 |
| SPM09 | 1.79 | 0.05 | 0.16 | 1.91 | 0.10 | 0.19 | 2.27 | 0.19 | 0.24 |
| SPM10 | 2.18 | 0.41 | 0.03 | 1.85 | 0.38 | 0.00 | 1.88 | 0.39 | 0.00 |
| SPM11 | 3.97 | 0.64 | 0.10 | 3.98 | 0.63 | 0.10 | 6.04 | 0.63 | 0.11 |
| SPM12 | 2.63 | 0.91 | 0.13 | 2.61 | 0.91 | 0.13 | 3.35 | 0.91 | 0.14 |
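To read the a, b and c columns in the tables above, it may help to evaluate the item response function they define. The sketch below assumes the usual 3PL form on the logistic metric, P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))), with no additional scaling constant; whether each program reports its estimates on exactly this metric is an assumption here. The function name `irf_3pl` is hypothetical.

```r
# 3PL item response function on the logistic metric (no scaling constant assumed)
irf_3pl <- function(theta, a, b, c) {
  c + (1 - c) * plogis(a * (theta - b))
}

# BILOG-MG estimates for SPM09 without priors, taken from the tables above
theta <- seq(-3, 3, by = 1)
round(irf_3pl(theta, a = 2.27, b = 0.19, c = 0.24), 3)

# At theta = b the curve is halfway between the lower asymptote c and 1
irf_3pl(0.19, a = 2.27, b = 0.19, c = 0.24)   # 0.62
```

For very low abilities the function approaches c, which is why c is read as a lower asymptote, and the slope at θ = b grows with a.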

## Appendix C

The dexter package includes a function, `fit_domains()`, for the analysis of subtests within the test. The function transforms the items belonging to each subtest, or domain, into one large partial credit item. Such ‘polytomisation’, as discussed by Verhelst and Verstralen (2008), is a simple and efficient way to deal with testlets. The formal, constructed, and homogeneous nature of the SPM-LS test makes it a good candidate for some further experimentation. Note that I am not proposing a new method; I am just being curious.
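The polytomisation step itself is simple to illustrate: the scores of the items in a domain are summed into a single partial credit item. The following base R sketch does only that, on simulated toy data rather than the SPM-LS dataset, and does not call `fit_domains()` or fit any model. The pairing of item k with item k + 6 is a hypothetical domain assignment used for illustration.

```r
set.seed(1)
# Simulated 0/1 scores for 200 persons on 12 items (toy data, not the SPM-LS dataset)
scores <- matrix(rbinom(200 * 12, 1, 0.6), nrow = 200,
                 dimnames = list(NULL, sprintf("SPM%02d", 1:12)))

# Hypothetical domain assignment: item k is paired with item k + 6
domain <- rep(sprintf("Item %d", 1:6), times = 2)

# Sum the item scores within each domain: one partial credit item per pair
combined <- t(rowsum(t(scores), group = domain))

table(combined[, "Item 1"])                 # possible scores are 0, 1 and 2
all(rowSums(combined) == rowSums(scores))   # the total score is unchanged
```

Because the combined items are sums of the original scores, the test score on which the item-total regressions are conditioned stays exactly the same; only the unit of analysis changes.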

**Figure A1.** Category trace lines for partial credit items obtained by combining the original items SPM01 and SPM07 (Item 1), SPM02 and SPM08 (Item 2) etc. The partial credit model is shown with thinner and darker lines, and the polytomous IM with broader and lighter lines of the same hue.

**Figure A2.** Item-total regressions for partial credit items obtained by combining the original items SPM01 and SPM07 (Item 1), SPM02 and SPM08 (Item 2) etc. Observed data is shown with pink dots, the PCM with thin black lines, and the interaction model with thick gray lines.

**Figure A3.** Item-total regressions for partial credit items obtained by combining triplets of items. Observed data is shown with pink dots, the PCM with thin black lines, and the interaction model with thick gray lines.

**Figure A4.** Item-total regressions for partial credit items obtained by combining quadruples of items. Observed data is shown with pink dots, the PCM with thin black lines, and the interaction model with thick gray lines.

**Figure A5.** Item-total regressions for two subtests of six items each. Observed data is shown with pink dots, the PCM with thin black lines, and the interaction model with thick gray lines.

## References

- American Psychological Association. 2010. Publication Manual of the American Psychological Association, 6th ed. Washington: American Psychological Association.
- Andersen, Erling B. 1973. Conditional inference for multiple-choice questionnaires. British Journal of Mathematical and Statistical Psychology 26: 31–44.
- ATP Tour Inc. 2020. The 2020 ATP® Official Rulebook. Available online: https://www.atptour.com/en/corporate/rulebook (accessed on 1 April 2020).
- Azevedo, C. L. N. 2009. Some Observations on the Identification and Interpretation of the 3PL IRT Model. Measurement: Interdisciplinary Research and Perspectives 7: 89–91.
- Bock, R. Darrell, and Murray Aitkin. 1981. Marginal Maximum Likelihood Estimation of Item Parameters: Application of an EM Algorithm. Psychometrika 46: 443–59.
- Bock, R. Darrell. 1972. Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika 37: 29–51.
- Brouwers, S. A., F. J. van de Vijver, and D. A. van Hemert. 2009. Variation in Raven’s Progressive Matrices scores across time and place. Learning and Individual Differences 19: 330–38.
- Carroll, Lewis. 1865. Alice’s Adventures in Wonderland. London: MacMillan.
- Chalmers, Robert P. 2012. mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software 48: 1–29.
- Chalmers, Robert P., A. Counsell, and D. B. Flora. 2016. It Might Not Make a Big DIF: Improved Differential Test Functioning Statistics That Account for Sampling Variability. Educational and Psychological Measurement 76: 114–40.
- Dorans, Neil J. 2012. The Contestant Perspective on Taking Tests: Emanations From the Statue Within. Educational Measurement: Issues and Practice 31: 20–37.
- Garcia-Garzon, Eduardo, Francisco J. Abad, and Luis E. Garrido. 2019. Searching for G: A New Evaluation of SPM-LS Dimensionality. Journal of Intelligence 7: 14.
- Glas, Cees A. W. 2009. What IRT Can and Cannot Do. Measurement: Interdisciplinary Research and Perspectives 7: 91–93.
- González, Jorge, and Marie Wiberg. 2017. Applying Test Equating Methods: Using R. Berlin/Heidelberg: Springer.
- Gulliksen, Harold. 1950. Theory of Mental Tests. Hoboken: Wiley.
- Haberman, Shelby J. 2007. The Interaction Model. In Multivariate and Mixture Distribution Rasch Models: Extensions and Applications. Edited by M. von Davier and C. H. Carstensen. New York: Springer, chap. 13. pp. 201–16.
- Kolen, Michael J., and Robert L. Brennan. 2014. Test Equating, Scaling, and Linking: Methods and Practices, 3rd ed. New York: Springer.
- Koops, Jesse, Eva de Schipper, Ivailo Partchev, Gunter Maris, and Timo Bechger. 2019. dextergui: A Graphical User Interface for Dexter, (Version 0.2.0); R Package. Available online: https://cran-r.project.org (accessed on 1 April 2020).
- Leighton, J. P., and M. J. E. Gierl. 2007. Cognitive Diagnostic Assessment for Education: Theory and Applications. Cambridge: Cambridge University Press.
- Lord, Frederic M. 1980. Applications of Item Response Theory to Practical Testing Problems. Hillsdale: Lawrence Erlbaum.
- Lord, Frederic M., and Melvin R. Novick. 1968. Statistical Theories of Mental Test Scores (with Contributions by A. Birnbaum). Reading: Addison-Wesley.
- Maris, Gunter, and Timo Bechger. 2009. On Interpreting the Model Parameters for the Three Parameter Logistic Model. Measurement: Interdisciplinary Research and Perspectives 7: 75–88.
- Maris, Gunter, Timo Bechger, Jesse Koops, and Ivailo Partchev. 2019. Dexter: Data Management and Analysis of Tests, (Version 1.0.1); R Package. Available online: https://cran-r.project.org (accessed on 1 April 2020).
- Masters, Geoffrey N. 1982. A Rasch Model for Partial Credit Scoring. Psychometrika 47: 149–74.
- Myszkowski, Neil, and M. Storme. 2018. A snapshot of g? Binary and polytomous item-response theory investigations of the last series of the Standard Progressive Matrices (SPM-LS). Intelligence 68: 109–16.
- Partchev, Ivailo. 2009. 3PL: A Useful Model with a Mild Estimation Problem. Measurement: Interdisciplinary Research and Perspectives 7: 94–96.
- R Core Team. 2013. R: A Language and Environment for Statistical Computing; Vienna, Austria: R Foundation for Statistical Computing. Available online: http://www.R-project.org/ (accessed on 1 April 2020).
- Rasch, Georg. 1980. Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: University of Chicago Press. First published 1960.
- Raven, J. C. 1941. Standardization of Progressive Matrices, 1938. British Journal of Medical Psychology 19: 137–50.
- Rizopoulos, Dimitris. 2006. ltm: An R package for Latent Variable Modelling and Item Response Theory Analyses. Journal of Statistical Software 17: 1–25.
- San Martín, Ernesto, Jorge González, and Francis Tuerlinckx. 2009. Identified Parameters, Parameters of Interest and Their Relationships. Measurement: Interdisciplinary Research and Perspectives 7: 97–105.
- Sinharay, Sandip, and Shelby J. Haberman. 2014. How Often Is the Misfit of Item Response Theory Models Practically Significant? Educational Measurement: Issues and Practice 33: 23–35.
- Thissen, David. 1976. Information in Wrong Responses to the Raven Progressive Matrices. Journal of Educational Measurement 13: 201–14.
- Thissen, David. 2009. On Interpreting the Parameters for any Item Response Model. Measurement: Interdisciplinary Research and Perspectives 7: 106–10.
- Verhelst, Norman D. 2019. Exponential Family Models for Continuous Responses. In Theoretical and Practical Advances in Computer-Based Educational Measurement. Edited by B. P. Veldkamp and C. Sluijter. Berlin/Heidelberg: Springer, chap. 7. pp. 135–59.
- Verhelst, Norman D., and Huub Verstralen. 2008. Some Considerations on the Partial Credit Model. Psicologica: International Journal of Methodology and Experimental Psychology 29: 229–54.
- von Davier, Matthias. 2009. Is There Need for the 3PL Model? Guess What? Measurement: Interdisciplinary Research and Perspectives 7: 110–14.
- von Davier, Alina, ed. 2011. Statistical Models for Test Equating, Scaling, and Linking. Berlin/Heidelberg: Springer.
- von Davier, Alina, Paul W. Holland, and Dorothy T. Thayer. 2004. The Kernel Method of Test Equating. New York: Springer.
- Zimowski, Michelle F., Eiji Muraki, Robert J. Mislevy, and R. Darrell Bock. 1996. BILOG–MG. Multiple-Group IRT Analysis and Test Maintenance for Binary Items. Chicago: SSI Scientific Software International.

**Figure 1.** Example plot comparing three item-total regressions for the fourth item. Pink dots show the observed regression (in this case, proportion of correct responses at each distinct total score), predictions from the Rasch model are shown with a thin black line, and those from the interaction model with a thick gray line.

**Figure 2.** Item facility (**left**) and correlation with the rest score (**right**) by position of the item in the SPM-LS test.

**Figure 3.** Item-total regressions for the items in the SPM-LS test obtained from the data (pink dots), the Rasch model (thin black lines), and the interaction model (thick gray lines).

**Figure 4.** Non-parametric option-total regressions (distractor plots) for the twelve items in the SPM-LS test. The title of each plot shows the item label, in which booklet the item appears, and in what position. The legend shows the actual responses and the scores they will be given. Response alternatives that do not show up have not been chosen by any person.
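The quantity behind an option-total regression can be sketched in base R as the proportion of persons choosing each response alternative at each total score. The example below uses simulated toy data for a single hypothetical 8-option item whose key is option 7; it is not the SPM-LS dataset, and it leaves the proportions unsmoothed, which is a simplification and not a description of how `distractor_plot()` draws its curves.

```r
set.seed(2)
n <- 300
# Simulated data (not the SPM-LS dataset): total scores 0..12 and the option chosen
# on one hypothetical 8-option item whose key is option 7
total <- sample(0:12, n, replace = TRUE)
p_key <- plogis((total - 6) / 2)           # higher scorers pick the key more often
response <- ifelse(runif(n) < p_key, 7L,
                   sample(c(1:6, 8L), n, replace = TRUE))

# Proportion choosing each option at each total score (each row sums to 1)
option_by_score <- prop.table(table(total, response), margin = 1)
round(option_by_score, 2)
```

In a well-functioning item the column for the keyed option rises with the total score while the distractor columns fall; a distractor whose proportion rises with the score deserves a closer look.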

| Item | Facility | rit (item-total correlation) | rir (item-rest correlation) |
|---|---|---|---|
| SPM01 | 0.76 | 0.43 | 0.30 |
| SPM02 | 0.91 | 0.48 | 0.40 |
| SPM03 | 0.80 | 0.56 | 0.46 |
| SPM04 | 0.82 | 0.66 | 0.58 |
| SPM05 | 0.86 | 0.65 | 0.57 |
| SPM06 | 0.76 | 0.66 | 0.56 |
| SPM07 | 0.70 | 0.59 | 0.47 |
| SPM08 | 0.58 | 0.63 | 0.52 |
| SPM09 | 0.57 | 0.57 | 0.44 |
| SPM10 | 0.39 | 0.63 | 0.51 |
| SPM11 | 0.36 | 0.55 | 0.42 |
| SPM12 | 0.32 | 0.48 | 0.34 |
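The three statistics in the table above are straightforward to compute from a matrix of 0/1 item scores: facility is the mean item score, rit the correlation of the item with the total score, and rir the correlation of the item with the total score after removing the item itself. The sketch below uses simulated toy data, not the SPM-LS dataset, so the numbers will not match the table; the object names are hypothetical.

```r
set.seed(3)
# Simulated 0/1 scores (toy data, not the SPM-LS dataset): 500 persons, 12 items
theta  <- rnorm(500)
delta  <- seq(-1.5, 1.5, length.out = 12)
scores <- sapply(delta, function(d) rbinom(500, 1, plogis(theta - d)))
colnames(scores) <- sprintf("SPM%02d", 1:12)

total <- rowSums(scores)
ctt <- data.frame(
  item     = colnames(scores),
  facility = colMeans(scores),                                  # proportion correct
  rit      = apply(scores, 2, function(x) cor(x, total)),       # item-total correlation
  rir      = apply(scores, 2, function(x) cor(x, total - x)),   # item-rest correlation
  row.names = NULL
)
ctt[, -1] <- round(ctt[, -1], 2)
ctt
```

Because the item contributes to its own total score, rit is inflated relative to rir; the gap is most noticeable in short tests such as this one.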

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Partchev, I. Diagnosing a 12-Item Dataset of Raven Matrices: With Dexter. *J. Intell.* **2020**, *8*, 21.
https://doi.org/10.3390/jintelligence8020021
