# Evaluating an Automated Number Series Item Generator Using Linear Logistic Test Models

## Abstract

## 1. Introduction

#### 1.1. Automatic Item Generation

#### 1.2. Inductive Reasoning

#### 1.3. Automatic Number Series Item Generator

#### 1.4. ANSIG Item Models

#### 1.5. numGen R Package

#### 1.6. Statistically Modelling ANSIG

#### 1.7. Construct Validity

## 2. Method

#### 2.1. Measures

#### 2.2. Participants

#### 2.3. Data Analyses

## 3. Results

#### 3.1. Rasch Model Fit

(χ^{2}(48) = 45.83, p = 0.56), suggesting that the remaining item models fitted the Rasch model well. The items covered a wide range of difficulty: on the logit scale, the parameters ran from −3.80 (easiest item) to 4.01 (most difficult item). This range is visible in the ICC plot in Figure 1, where each coloured sigmoid curve represents one item and the horizontal locations of the curves span roughly θ = −4 to θ = 4.
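
To make the reported difficulty range concrete, the following minimal Python sketch evaluates the standard Rasch item characteristic curve, P(correct) = 1 / (1 + exp(−(θ − b))), at the two extreme difficulties quoted above. The function name and the chosen ability values are illustrative assumptions, not part of the original analysis.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Extreme item difficulties reported in the text (logits).
EASIEST, HARDEST = -3.80, 4.01

# Illustrative ability levels spanning the plotted theta range (an assumption).
for theta in (-4.0, 0.0, 4.0):
    p_easy = rasch_probability(theta, EASIEST)
    p_hard = rasch_probability(theta, HARDEST)
    print(f"theta={theta:+.1f}  P(easiest)={p_easy:.3f}  P(hardest)={p_hard:.3f}")
```

A person whose ability equals an item's difficulty has a 0.5 probability of solving it, which is why the horizontal location of each curve can be read as that item's difficulty.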

#### 3.2. Item and Item Model Variation in Difficulty

#### 3.3. Test Information

#### 3.4. LLTM(s) Comparison with Different Q-Matrices

^{2} = 0.96. The models were nested within each other. This is not obvious from the Q-matrices, but every column of the Holzman et al. [23] Q-matrix is a linear combination of the columns of the newly revised Q-matrix (see Table 1). A likelihood ratio test for nested models (χ^{2}(2) = 115.9, p < 0.0001) and the information criteria both indicated superior fit of the newly revised cognitive operators (AIC = 14,901.8 and BIC = 14,955.2 for the newly revised Q-matrix vs. AIC = 15,013.7 and BIC = 15,051.9 for Holzman et al. [23]).
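
The reported test statistic and AIC values can be cross-checked with a few lines of arithmetic. The Python sketch below is only an illustration: it relies on the textbook definition AIC = −2 log L + 2k, so that the AIC gap between two nested models equals the likelihood ratio statistic minus twice the number of extra parameters, and on the closed-form chi-square tail for two degrees of freedom. The function names are hypothetical.

```python
import math


def lr_pvalue_df2(lr_statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom.

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-lr_statistic / 2.0)


def aic_gap_from_lr(lr_statistic: float, extra_parameters: int) -> float:
    """AIC(restricted) - AIC(general) implied by AIC = -2 log L + 2k."""
    return lr_statistic - 2.0 * extra_parameters


# Values quoted in the text.
LR, DF = 115.9, 2
AIC_REVISED, AIC_HOLZMAN = 14901.8, 15013.7

p_value = lr_pvalue_df2(LR)
implied_gap = aic_gap_from_lr(LR, DF)
reported_gap = AIC_HOLZMAN - AIC_REVISED

print(f"p = {p_value:.2e}")
print(f"implied AIC gap = {implied_gap:.1f}, reported AIC gap = {reported_gap:.1f}")
```

Under these assumptions the implied and reported AIC gaps agree, and the p-value falls well below the 0.0001 threshold given in the text.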

#### 3.5. Difficulty Prediction with the LLTM and LLTM Plus Error

(χ^{2}(1) = 1444, p < 0.001). Because the parameter estimates of all models did not differ substantially, the LLTM plus error is the preferred choice from a model comparison perspective.

#### 3.6. Goodness of Fit between the Rasch, LLTM and LLTM Plus Error

#### 3.7. Item and Person Parameter Estimates

R^{2} = 0.69. The results suggest that the LLTM(s) accounted for 69% of the total variance in the item difficulty parameters estimated by the Rasch model. Finally, the person parameter correlations between the models are reported in Table 9.

#### 3.8. Nomothetic Span

## 4. Discussion

^{5}) combinations. Future research should therefore include more item models, which would yield greater variation in the number series items. Maximising the number of item models would also give researchers an opportunity to study interaction effects between the interrelated cognitive operators involved in the task solution and to evaluate the structural relationships between the individual, the task and the result of their interaction.

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Embretson, S.; Yang, X. 23 Automatic Item Generation and Cognitive Psychology. Handb. Stat. **2006**, 26, 747–768.
2. Irvine, S.; Kyllonen, P. Generating Items for Cognitive Tests: Theory and Practice; Erlbaum: Mahwah, NJ, USA, 2002.
3. Gierl, M.J.; Lai, H. Instructional topics in educational measurement (ITEMS) module: Using automated processes to generate test items. Educ. Meas. Issues Pract. **2013**, 32, 36–50.
4. LaDuca, A.; Staples, W.I.; Templeton, B.; Holzman, G.B. Item modelling procedure for constructing content-equivalent multiple choice questions. Med. Educ. **1986**, 20, 53–56.
5. Arendasy, M.E.; Sommer, M. Evaluating the contribution of different item features to the effect size of the gender difference in three-dimensional mental rotation using automatic item generation. Intelligence **2010**, 38, 574–581.
6. Bejar, I.I. A Generative Analysis of a Three-Dimensional Spatial Task. Appl. Psychol. Meas. **1990**, 14, 237–245.
7. Embretson, S.E. Generating abstract reasoning items with cognitive theory. In Item Generation for Test Development; Irvine, S.H., Kyllonen, P.C., Eds.; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 2002; pp. 219–250.
8. Freund, P.A.; Holling, H. Creativity in the classroom: A multilevel analysis investigating the impact of creativity and reasoning ability on GPA. Creat. Res. J. **2008**, 20, 309–318.
9. Arendasy, M.; Sommer, M. The effect of different types of perceptual manipulations on the dimensionality of automatically generated figural matrices. Intelligence **2005**, 33, 307–324.
10. Blum, D.; Holling, H.; Galibert, M.S.; Forthmann, B. Task difficulty prediction of figural analogies. Intelligence **2016**, 56, 72–81.
11. Arendasy, M.E.; Sommer, M.; Mayr, F. Using automatic item generation to simultaneously construct German and English versions of a word fluency test. J. Cross-Cult. Psychol. **2012**, 43, 464–479.
12. Zeuch, N.; Holling, H.; Kuhn, J.T. Analysis of the Latin Square Task with linear logistic test models. Learn. Individ. Differ. **2011**, 21, 629–632.
13. Arendasy, M.; Sommer, M.; Gittler, G.; Hergovich, A. Automatic Generation of Quantitative Reasoning Items. J. Individ. Differ. **2006**, 27, 2–14.
14. Carroll, J.B. The higher-stratum structure of cognitive abilities: Current evidence supports g and about ten broad factors. In The Scientific Study of General Intelligence: Tribute to Arthur R. Jensen; Nyborg, H., Ed.; Pergamon: San Diego, CA, USA, 2003; pp. 5–22.
15. Sternberg, R.; Sternberg, K. Cognitive Psychology; Nelson Education: Scarborough, ON, Canada, 2016.
16. Raven, J.C. Progressive Matrices: A Perceptual Test of Intelligence; HK Lewis: London, UK, 1938; Volume 19, p. 20.
17. Spearman, C. The Nature of Intelligence and the Principles of Cognition; Macmillan and Co.: London, UK, 1923.
18. Leighton, J.P.; Sternberg, R.J. Reasoning and problem solving. In Handbook of Psychology; Wiley: New York, NY, USA, 2003.
19. Thorndike, R.L.; Hagen, E. Cognitive Abilities Test; Houghton Mifflin: Boston, MA, USA, 1971.
20. Thurstone, T.G.; Thurstone, L.L. Primary Mental Abilities Tests; Science Research Associates: Chicago, IL, USA, 1962.
21. Wechsler, D. Wechsler Intelligence Scale for Children–Fifth Edition (WISC-V); Pearson: Bloomington, MN, USA, 2014.
22. Simon, H.A.; Kotovsky, K. Human acquisition of concepts for sequential patterns. Psychol. Rev. **1963**, 70, 534–546.
23. Holzman, T.G.; Pellegrino, J.W.; Glaser, R. Cognitive variables in series completion. J. Educ. Psychol. **1983**, 75, 603–618.
24. LeFevre, J.A.; Bisanz, J. A cognitive analysis of number-series problems: Sources of individual differences in performance. Mem. Cogn. **1986**, 14, 287–298.
25. Verguts, T.; Maris, E.; De Boeck, P. A dynamic model for rule induction tasks. J. Math. Psychol. **2002**, 46, 455–485.
26. Arendasy, M.E.; Sommer, M. Using automatic item generation to meet the increasing item demands of high-stakes educational and occupational assessment. Learn. Individ. Differ. **2012**, 22, 112–117.
27. Gierl, M.J.; Lai, H. The role of item models in automatic item generation. Int. J. Test. **2012**, 12, 273–298.
28. Lemay, S.; Bédard, M.A.; Rouleau, I.; Tremblay, P.L. Practice effect and test-retest reliability of attentional and executive tests in middle-aged to elderly subjects. Clin. Neuropsychol. **2004**, 18, 284–302.
29. Sun, K.T.; Lin, Y.C.; Huang, Y.M. An Efficient Genetic Algorithm for Item Selection Strategy. Available online: https://www.researchgate.net/profile/Koun-Tem_Sun/publication/228531836_An_efficient_genetic_algorithm_for_item_selection_strategy/links/09e4150b6b5700a519000000/An-efficient-genetic-algorithm-for-item-selection-strategy.pdf (accessed on 30 June 2016).
30. Bejar, I.I.; Lawless, R.R.; Morley, M.E.; Wagner, M.E.; Bennett, R.E.; Revuelta, J. A feasibility study of on-the-fly item generation in adaptive testing. ETS Res. Rep. Ser. **2002**, 2002.
31. Nathan, M.J.; Koedinger, K.R. Teachers’ and researchers’ beliefs about the development of algebraic reasoning. J. Res. Math. Educ. **2000**, 31, 168–190.
32. Nathan, M.J.; Petrosino, A. Expert blind spot among preservice teachers. Am. Educ. Res. J. **2003**, 40, 905–928.
33. Drasgow, F.; Luecht, R.M.; Bennett, R. Technology and testing. In Educational Measurement, 4th ed.; Brennan, R.L., Ed.; American Council on Education: Washington, DC, USA, 2006; pp. 471–516.
34. Kotovsky, K.; Simon, H.A. Empirical tests of a theory of human acquisition of concepts for sequential patterns. Cogn. Psychol. **1973**, 4, 399–424.
35. Cho, S.J.; De Boeck, P.; Embretson, S.; Rabe-Hesketh, S. Additive multilevel item structure models with random residuals: Item modeling for explanation and item generation. Psychometrika **2014**, 79, 84–104.
36. Glas, C.A.W.; van der Linden, W.J. Computerized adaptive testing with item cloning. Appl. Psychol. Meas. **2003**, 27, 247–261.
37. Millman, J.; Westman, R.S. Computer assisted writing of achievement test items: Toward a future technology. J. Educ. Meas. **1989**, 26, 177–190.
38. Roid, G.H.; Haladyna, T.M. Toward a Technology of Test-Item Writing; Academic: New York, NY, USA, 1982.
39. Freund, P.A.; Hofer, S.; Holling, H. Explaining and controlling for the psychometric properties of computer-generated figural matrix items. Appl. Psychol. Meas. **2008**, 32, 195–210.
40. Holling, H.; Bertling, J.P.; Zeuch, N. Probability word problems: Automatic item generation and LLTM modelling. Stud. Educ. Eval. **2009**, 35, 71–76.
41. Daniel, R.C.; Embretson, S.E. Designing cognitive complexity in mathematical problem-solving items. Appl. Psychol. Meas. **2010**, 34, 348–364.
42. Carpenter, P.A.; Just, M.A.; Shell, P. What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test. Psychol. Rev. **1990**, 97, 404–431.
43. Babcock, R.L. Analysis of age differences in types of errors on the Raven’s Advanced Progressive Matrices. Intelligence **2002**, 30, 485–503.
44. Loe, B.S. numGen: Number Series Generator. R Package Version 0.1.0. 2017. Available online: https://CRAN.R-project.org/package=numGen (accessed on 30 June 2016).
45. Glas, C.A.W.; van der Linden, W.J. Modeling variability in item parameters in CAT. Presented at the North American Psychometric Society Meeting, Valley Forge, PA, USA, 2001.
46. Sinharay, S.; Johnson, M.S.; Williamson, D.M. Calibrating item families and summarizing the results using family expected response functions. J. Educ. Behav. Stat. **2003**, 28, 295–313.
47. Geerlings, H.; Glas, C.A.W.; van der Linden, W.J. Modeling rule-based item generation. Psychometrika **2011**, 76, 337–359.
48. Birnbaum, A. Test scores, sufficient statistics and the information structures of tests. In Statistical Theories of Mental Test Scores; Lord, L., Novick, M., Eds.; Addison-Wesley: Reading, MA, USA, 1968; pp. 425–435.
49. Fischer, G.H. The linear logistic test model as an instrument in educational research. Acta Psychol. **1973**, 37, 359–374.
50. Janssen, R.; Schepers, J.; Peres, D. Models with item and item group predictors. In Explanatory Item Response Models; Springer: New York, NY, USA, 2004; pp. 189–212.
51. Whitely, S.E. Construct validity: Construct representation versus nomothetic span. Psychol. Bull. **1983**, 93, 179–197.
52. Embretson, S.; Gorin, J. Improving construct validity with cognitive psychology principles. J. Educ. Meas. **2001**, 38, 343–368.
53. Cyders, M.A.; Coskunpinar, A. Measurement of constructs using self-report and behavioral lab tasks: Is there overlap in nomothetic span and construct representation for impulsivity? Clin. Psychol. Rev. **2011**, 31, 965–982.
54. Flynn, J.R. What Is Intelligence? Beyond the Flynn Effect; Cambridge University Press: Cambridge, UK, 2007.
55. Condon, D.M.; Revelle, W. The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence **2014**, 43, 52–64.
56. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2017; Available online: https://www.R-project.org/ (accessed on 17 September 2016).
57. Stillwell, D.; Lis, P.; Sun, L. Concerto: Open-Source On-Line R-Based Adaptive Testing Platform [Computer Software]. 2015. Available online: http://www.psychometrics.cam.ac.uk (accessed on 17 June 2016).
58. Cronbach, L.J. Coefficient alpha and the internal structure of tests. Psychometrika **1951**, 16, 297–334.
59. Andrich, D. An index of person separation in latent trait theory, the traditional KR.20 index and the Guttman scale response pattern. Educ. Res. Perspect. **1982**, 9, 95–104.
60. Dandurand, F.; Shultz, T.R.; Onishi, K.H. Comparing online and lab methods in a problem-solving experiment. Behav. Res. Methods **2008**, 40, 428–434.
61. Litman, L.; Robinson, J.; Abberbock, T. TurkPrime.com: A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behav. Res. Methods **2017**, 49, 433–442.
62. Behrend, T.S.; Sharek, D.J.; Meade, A.W.; Wiebe, E.N. The viability of crowdsourcing for survey research. Behav. Res. Methods **2011**, 43, 800–813.
63. Buhrmester, M.; Kwang, T.; Gosling, S.D. Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspect. Psychol. Sci. **2011**, 6, 3–5.
64. Horton, J.J.; Rand, D.G.; Zeckhauser, R.J. The online laboratory: Conducting experiments in a real labor market. Exp. Econ. **2011**, 14, 399–425.
65. Lord, F. Applications of Item Response Theory to Practical Testing Problems; Routledge: Hillsdale, NJ, USA, 1980.
66. Fischer, G.H.; Ponocny, I. An extension of the partial credit model with an application to the measurement of change. Psychometrika **1994**, 59, 177–192.
67. Mair, P.; Hatzinger, R. Extended Rasch modeling: The eRm package for the application of IRT models in R. J. Stat. Softw. **2007**, 20, 1–20.
68. Bates, D.; Maechler, M.; Bolker, B.M.; Walker, S.C. Fitting Linear Mixed-Effects Models Using lme4. J. Stat. Softw. **2015**, 67, 1–48.
69. De Boeck, P.; Bakker, M.; Zwitser, R.; Nivard, M.; Hofman, A.; Tuerlinckx, F.; Partchev, I. The estimation of item response models with the lmer function from the lme4 package in R. J. Stat. Softw. **2011**, 39, 1–28.
70. Doran, H.; Bates, D.; Bliese, P.; Dowling, M. Estimating the multilevel Rasch model: With the lme4 package. J. Stat. Softw. **2007**, 20, 1–18.
71. Glas, C.A.W.; Verhelst, N.D. Testing the Rasch model. In Rasch Models: Foundations, Recent Developments and Applications; Fischer, G.H., Molenaar, I.W., Eds.; Springer: New York, NY, USA, 1995; pp. 69–96.
72. Andersen, E.B. A goodness of fit test for the Rasch model. Psychometrika **1973**, 38, 123–140.
73. Suárez-Falcón, J.C.; Glas, C.A. Evaluation of global testing procedures for item fit to the Rasch model. Br. J. Math. Stat. Psychol. **2003**, 56, 127–143.
74. Scheiblechner, H. Personality and system influences on behavior in groups: Frequency models. Acta Psychol. **1972**, 36, 322–336.
75. Kubinger, K.D. On the revival of the Rasch model-based LLTM: From constructing tests using item generating rules to measuring item administration effects. Psychol. Sci. **2008**, 50, 311–327.
76. Van den Noortgate, W.; De Boeck, P.; Meulders, M. Cross-classification multilevel logistic models in psychometrics. J. Educ. Behav. Stat. **2003**, 28, 369–386.
77. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control **1974**, 19, 716–723.
78. Schwarz, G. Estimating the dimension of a model. Ann. Stat. **1978**, 6, 461–464.
79. Vrieze, S.I. Model selection and psychological theory: A discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychol. Methods **2012**, 17, 228–243.
80. Baghaei, P.; Kubinger, K.D. Linear Logistic Test Modeling with R. Pract. Assess. Res. Eval. **2015**, 20, 1–11.
81. Baker, F.B. Sensitivity of the linear logistic test model to misspecification of the weight matrix. Appl. Psychol. Meas. **1993**, 17, 201–210.
82. Van der Linden, W.J.; Hambleton, R.K. Handbook of Modern Item Response Theory; Springer: New York, NY, USA, 2013.
83. Fischer, G.H.; Formann, A.K. Some applications of logistic latent trait model with linear constraints on the parameters. Appl. Psychol. Meas. **1982**, 6, 396–416.
84. Janssen, R.; De Boeck, P. Confirmatory analyses of componential test structure using multidimensional item response theory. Multivar. Behav. Res. **1999**, 34, 245–268.
85. Gierl, M.J.; Lai, H.; Hogan, J.B.; Matovinovic, D. A Method for Generating Educational Test Items that are Aligned to the Common Core State Standards. J. Appl. Test. Technol. **2015**, 16, 1–18.
86. Embretson, S.E. Generating items during testing: Psychometric issues and models. Psychometrika **1999**, 64, 407–433.
87. Fischer, G.H.; Pendl, P. Individualized Testing on the Basis of Dichotomous Rasch Model. In Psychometrics for Educational Debates; Van der Kamp, L.J.T., Langerak, W.F., de Gruijter, D.N.M., Eds.; John Wiley & Sons: Chichester, UK, 1980; pp. 171–187.

**Figure 2.**Item difficulty variations between and within item models using Rasch estimates. CO = cognitive operators.

| Index | Cognitive Operator | Description | Example | Relation Detection | Discovery of Periodicity | Pattern Description | Extrapolation |
|---|---|---|---|---|---|---|---|
| 1 | Apprehension of succession | Identification of the missing value is determined by the immediate preceding value. | 1,2,3,4,5,6 | Low | Low | Low | Low |
| 2 | Identification of parallel sequences | Two parallel sequences are inherent within an item that forms two number series. | 1,2,1,4,1,6 | | High | | |
| 3 | Object cluster formation | The missing value is determined by the relationship within groups of elements. | 1,1,1,2,2,2 | High | | | |
| 4 | Non-progressive coefficient patterns | Identification of the missing value is influenced by the difference between two preceding values. | 2,4,6,8,10 | | | Middle | Middle |
| 5 | Complex progressive coefficient patterns | The missing value increases or decreases largely based on more than one arithmetic operation or increasing values. | 1,2,4,7,11 | | | High | High |

| Item Model | Sample Item | Task Objective | Item Logic |
|---|---|---|---|
| 1 | 10 20 30 40 (50) | Elementary understanding of sequence succession | Simple linear sequences which do not require use of advanced arithmetic operations, such as ordered multiples of 1, 10, or 100. Example: A sequence of ordered multiples of 10. |
| 2 | 1 1 1 5 5 (5) | Understanding of object clusters | Sequences consist of elements belonging to two homogeneous groups with equal number of elements. Missing element belongs to the group with fewer elements present in the sequence. Example: Ordered groups of 1s and 5s. Number 5 added to the sequence results in equal number of elements in the two groups. |
| 3 | 1 2 4 8 16 (32) | Use of basic algebraic skills | Each element in the sequence is derived from the preceding by applying one of four basic arithmetic operations (addition, subtraction, multiplication, or division). Coefficient of change is invariant across the sequence. Example: A sequence of elements using a multiplication of 2. |
| 4 | 1 10 2 20 3 30 (4) (40) | Identification of co-occurring relationships between elements (minimum use of arithmetic skills) | Sequences that consist of regularly alternating parallel sub-sequences. Understanding of succession requires minimum use of algebraic skills. Sub-sequences involve items from item model 1. Example: Odd elements of the sequence are multiples of 1 and even elements of the sequence are multiples of 10. |
| 5 | 2 7 4 14 8 28 16 (56) (32) | Identification of co-occurring relationships between elements (with use of arithmetic skills) | Logic analogous to the item model 4 but at least one sub-sequence involves the basic arithmetic operations. Sequences combine items from item models 1 and 3. Example: Both odd and even elements of the sequence are multiplied by 2 but with different starting values. |
| 6 | 2 4 7 11 16 (22) | Identification of progressively evolving coefficients of change | Non-linear progressive sequences which require a higher level of abstraction; the coefficient of change between two neighbouring elements is not invariable and its elements form a new sequence. The coefficient sequences correspond to items from item models 1 and 3. Example: The coefficient of change between each pair of neighbouring elements in the sequence increases by 1. |
| 7 | 3 10 24 52 108 (220) | Identification of complex coefficients of change | Ability to identify complex coefficients; the coefficient of change involves a combination of arithmetic operations (e.g., addition and multiplication) applied serially. Example: Each element in the sequence is derived from the preceding by adding two and multiplying the result by two. |
| 8 | 1 3 8 10 207 (209) | Identification of non-successive relationships within a sequence | Sequences consist of pairs (or triads) of elements which share common features, while the values across pairs (triads) are unrelated. Example: A sequence formed by three pairs of elements. The difference between elements in each pair equals two. Individual pairs are not otherwise related. |
| 9 | 1 1 2 3 5 8 (13) | Identification of relationships within a chain of elements | Progressive sequences which involve relationships between multiple preceding objects (e.g., Fibonacci sequence). Example: Each element of the sequence is a result of addition of its two preceding elements. |
| 10 | 2 15 4 17 7 19 11 21 16 (23) (22) | Combined identification of parallel sub-sequences and progressively evolving coefficients of change | Logic analogous to the item model 4 but at least one sub-sequence involves a progressively evolving coefficient. Sub-sequences involve items from item models 1, 3 and 6. Example: The coefficient of change between odd elements in the sequence increases by 1. The even elements increase by 2. |
| 11 | 1 7 14 20 40 46 (92) (98) | Identification of alternating coefficients of change | Progressively evolving sequences whose elements develop following multiple alternating rules (e.g., addition for even elements and multiplication for odd elements). Example: A sequence whose coefficient of change alternates between (+6) and (×2). |
| 12 | 1 22 44 2 66 88 3 110 (132) (4) | Identification of unevenly ordered sub-sequences | Logic analogous to the item model 4 but sub-sequences follow irregular pattern: S_{1}, S_{2}, S_{2}, S_{1}, S_{2}, S_{2}, S_{1}, S_{2}, S_{2}. Sub-sequences involve items from item models 1, 3 and 6. Example: Sub-sequences with coefficients of (+1) and (+22) ordered according to the pattern above. |
| 13 | 1 5 8 3 209 212 5 41 (44) (7) | Combined identification of unevenly ordered sub-sequences and non-successive relationships between elements | Logic analogous to the item model 12 but the second sequence belongs to the item model 8. As a result, pairs of elements following certain rule are embedded into a progressive sequence. Example: Sequence with coefficient of (+2) is interposed with pairs of elements which differ by 3. |
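
The item logic in the table above translates naturally into small sequence generators. The Python sketch below reproduces a handful of the sample items; it is an illustration only and does not reflect the interface of the numGen R package. All function names and parameters are hypothetical.

```python
from itertools import count, islice
from typing import Iterator, List


def take(series: Iterator[int], n: int) -> List[int]:
    """Return the first n elements of a (possibly infinite) series."""
    return list(islice(series, n))


def geometric(start: int, ratio: int) -> Iterator[int]:
    """Item model 3 style: invariant multiplicative coefficient of change."""
    value = start
    while True:
        yield value
        value *= ratio


def growing_step(start: int, first_step: int, growth: int) -> Iterator[int]:
    """Item model 6 style: the coefficient of change itself forms a sequence."""
    value, step = start, first_step
    while True:
        yield value
        value += step
        step += growth


def add_then_multiply(start: int, addend: int, factor: int) -> Iterator[int]:
    """Item model 7 style: two arithmetic operations applied serially."""
    value = start
    while True:
        yield value
        value = (value + addend) * factor


def fibonacci_like(first: int, second: int) -> Iterator[int]:
    """Item model 9 style: each element is the sum of its two predecessors."""
    a, b = first, second
    while True:
        yield a
        a, b = b, a + b


def alternating_rules(start: int, addend: int, factor: int) -> Iterator[int]:
    """Item model 11 style: addition and multiplication alternate."""
    value = start
    for position in count():
        yield value
        value = value + addend if position % 2 == 0 else value * factor


def interleave(left: Iterator[int], right: Iterator[int]) -> Iterator[int]:
    """Item models 4, 5 and 10 style: two regularly alternating sub-sequences."""
    while True:
        yield next(left)
        yield next(right)


print(take(add_then_multiply(3, 2, 2), 6))
print(take(interleave(growing_step(2, 2, 1), count(15, 2)), 11))
```

Composing `interleave` with the simpler generators mirrors how the table describes item models 4, 5 and 10 as combinations of sub-sequences drawn from earlier item models.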

| | Form A (n = 396) | Form B (n = 174) |
|---|---|---|
| **Gender** | | |
| Male | 124 (31.3%) | 51 (29.3%) |
| Female | 270 (68.2%) | 121 (69.5%) |
| Prefer not to say | 2 (0.5%) | 2 (1.2%) |
| **Nationality** | | |
| American | 337 (85.1%) | 146 (83.9%) |
| Others | 57 (14.4%) | 26 (14.9%) |
| Prefer not to say | 2 (0.5%) | 2 (1.2%) |
| **Education** | | |
| Doctorate | 9 (2.3%) | 6 (3.5%) |
| Master’s Degree | 67 (16.9%) | 25 (14.4%) |
| Bachelor’s Degree | 150 (37.9%) | 67 (38.5%) |
| Vocational Qualifications | 55 (13.9%) | 32 (18.4%) |
| At least Primary Education | 103 (26%) | 42 (23.6%) |
| Prefer not to say | 2 (0.5%) | 2 (1.2%) |

| Item Model | Apprehension of Succession | Parallel Sequences | Cluster Formation | Non-Progressive Coefficient Patterns | Progressive Coefficient Patterns |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 1 | 0 | 0 |
| 3 | 1 | 0 | 0 | 1 | 0 |
| 4 | 0 | 1 | 0 | 0 | 0 |
| 5 | 0 | 1 | 0 | 1 | 0 |
| 6 | 1 | 0 | 0 | 0 | 1 |
| 7 | 1 | 0 | 0 | 0 | 1 |
| 8 | 0 | 0 | 1 | 1 | 0 |
| 9 | 1 | 0 | 1 | 1 | 0 |
| 10 | 0 | 1 | 0 | 0 | 1 |
| 11 | 1 | 0 | 1 | 1 | 0 |
| 12 | 0 | 1 | 1 | 1 | 0 |
| 13 | 0 | 1 | 1 | 1 | 0 |

**Table 5.**Q-matrix of the cognitive operators proposed by Holzman et al. [23].

| Item Model | Relation Detection | Discovery of Periodicity | Pattern Description | Extrapolation |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 |
| 3 | 0 | 0 | 1 | 1 |
| 4 | 0 | 1 | 0 | 0 |
| 5 | 0 | 1 | 1 | 1 |
| 6 | 0 | 0 | 2 | 2 |
| 7 | 0 | 0 | 2 | 2 |
| 8 | 1 | 0 | 1 | 1 |
| 9 | 1 | 0 | 1 | 1 |
| 10 | 0 | 1 | 2 | 2 |
| 11 | 1 | 0 | 1 | 1 |
| 12 | 1 | 1 | 1 | 1 |
| 13 | 1 | 1 | 1 | 1 |
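
The nesting of the two Q-matrices can be checked directly from the tabulated entries. The Python sketch below assumes that the untitled five-column Q-matrix preceding Table 5 is the newly revised one, and it tests one particular set of linear combinations found by inspection of the tables (it is not taken from the original analysis): Relation Detection equals Cluster Formation, Discovery of Periodicity equals Parallel Sequences, and both Pattern Description and Extrapolation equal Non-Progressive plus twice Progressive Coefficient Patterns.

```python
# Rows are item models 1..13.
# Revised columns: (Apprehension of Succession, Parallel Sequences,
# Cluster Formation, Non-Progressive, Progressive Coefficient Patterns).
REVISED = [
    (1, 0, 0, 0, 0), (0, 0, 1, 0, 0), (1, 0, 0, 1, 0), (0, 1, 0, 0, 0),
    (0, 1, 0, 1, 0), (1, 0, 0, 0, 1), (1, 0, 0, 0, 1), (0, 0, 1, 1, 0),
    (1, 0, 1, 1, 0), (0, 1, 0, 0, 1), (1, 0, 1, 1, 0), (0, 1, 1, 1, 0),
    (0, 1, 1, 1, 0),
]

# Holzman et al. columns: (Relation Detection, Discovery of Periodicity,
# Pattern Description, Extrapolation).
HOLZMAN = [
    (0, 0, 0, 0), (1, 0, 0, 0), (0, 0, 1, 1), (0, 1, 0, 0),
    (0, 1, 1, 1), (0, 0, 2, 2), (0, 0, 2, 2), (1, 0, 1, 1),
    (1, 0, 1, 1), (0, 1, 2, 2), (1, 0, 1, 1), (1, 1, 1, 1),
    (1, 1, 1, 1),
]

# Candidate weights on the revised columns for each Holzman column
# (found by inspection; an assumption of this sketch).
WEIGHTS = [
    (0, 0, 1, 0, 0),  # Relation Detection
    (0, 1, 0, 0, 0),  # Discovery of Periodicity
    (0, 0, 0, 1, 2),  # Pattern Description
    (0, 0, 0, 1, 2),  # Extrapolation
]


def combine(row, weights):
    """Weighted sum of one revised Q-matrix row."""
    return sum(value * weight for value, weight in zip(row, weights))


reconstructed = [tuple(combine(row, w) for w in WEIGHTS) for row in REVISED]
print("nested:", reconstructed == HOLZMAN)
```

Because Pattern Description and Extrapolation carry identical entries, the Holzman et al. Q-matrix spans only three independent directions, while the revised Q-matrix has five columns; this difference of two is in line with the two degrees of freedom of the likelihood ratio test reported in Section 3.4.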

**Table 6.**Cognitive operator estimates in the linear logistic test model (LLTM) and LLTM + ε predicting item easiness.

| Effects | Parameter | LLTM Estimate | LLTM SE | LLTM + ε Estimate | LLTM + ε SE |
|---|---|---|---|---|---|
| Fixed effects | Constant | 4.77 *** | 0.16 | 5.32 *** | 0.85 |
| | AOS | 0.35 *** | 0.07 | 0.37 | 0.58 |
| | PS | −1.53 *** | 0.07 | −1.83 ** | 0.59 |
| | CF | −2.13 *** | 0.06 | −2.25 *** | 0.41 |
| | NPCP | −2.65 *** | 0.15 | −3.09 *** | 0.71 |
| | PCP | −3.76 *** | 0.14 | −4.28 *** | 0.69 |
| | | **LLTM Variance** | **LLTM Std. Dev** | **LLTM + ε Variance** | **LLTM + ε Std. Dev** |
| Random effects | θ_{j} (persons) | 1.19 | 1.09 | 1.67 | 1.29 |
| | ε_{i} (item) | - | - | 1.03 | 1.01 |
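
The fixed effects in Table 6 predict item easiness, so the difficulty implied for an item model is the negative of the constant plus the weights of the operators that the Q-matrix assigns to it. The Python sketch below illustrates this for four item models. It assumes that AOS, PS, CF, NPCP and PCP abbreviate the five column headers of the revised Q-matrix in that order, which the surviving text does not state explicitly; the function and dictionary names are hypothetical.

```python
# Fixed-effect estimates copied from Table 6 (easiness scale, logits).
LLTM = {"Constant": 4.77, "AOS": 0.35, "PS": -1.53,
        "CF": -2.13, "NPCP": -2.65, "PCP": -3.76}
LLTM_ERROR = {"Constant": 5.32, "AOS": 0.37, "PS": -1.83,
              "CF": -2.25, "NPCP": -3.09, "PCP": -4.28}

# Operators flagged with 1 in the revised Q-matrix for selected item models
# (abbreviation-to-column mapping is an assumption of this sketch).
ITEM_MODELS = {
    3: ("AOS", "NPCP"),
    4: ("PS",),
    5: ("PS", "NPCP"),
    12: ("PS", "CF", "NPCP"),
}


def predicted_difficulty(weights, operators):
    """Difficulty = -(constant + sum of operator weights)."""
    easiness = weights["Constant"] + sum(weights[op] for op in operators)
    return -easiness


for model, operators in ITEM_MODELS.items():
    lltm = predicted_difficulty(LLTM, operators)
    lltm_error = predicted_difficulty(LLTM_ERROR, operators)
    print(f"item model {model:>2}: LLTM {lltm:+.2f}, LLTM + error {lltm_error:+.2f}")
```

Under these assumptions the values agree, to within about 0.01 logits of rounding, with the model-based difficulties tabulated per item further below, where every item from the same item model shares one predicted difficulty.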

| Model | No. of Parameters | AIC | BIC |
|---|---|---|---|
| Rasch | 50 | 13,321 | 13,702 |
| LLTM | 7 | 14,902 | 14,955 |
| LLTM + ε | 8 | 13,460 | 13,521 |

| Item | Item Model | Rasch Estimate | Std. Error | LLTM | Bootstrap SE | LLTM + ε | Bootstrap SE |
|---|---|---|---|---|---|---|---|
| 1 | 3 | −2.62 | 0.18 | −2.46 | 0.07 | −2.60 | 0.34 |
| 2 | 3 | −2.86 | 0.19 | −2.46 | 0.07 | −2.60 | 0.34 |
| 3 | 3 | −2.39 | 0.16 | −2.46 | 0.07 | −2.60 | 0.34 |
| 4 | 3 | −2.20 | 0.16 | −2.46 | 0.07 | −2.60 | 0.34 |
| 5 | 3 | −2.06 | 0.15 | −2.46 | 0.07 | −2.60 | 0.34 |
| 6 | 4 | −3.10 | 0.24 | −3.24 | 0.14 | −3.49 | 0.58 |
| 7 | 4 | −2.72 | 0.21 | −3.24 | 0.14 | −3.49 | 0.58 |
| 8 | 4 | −3.80 | 0.31 | −3.24 | 0.14 | −3.49 | 0.58 |
| 9 | 5 | −0.91 | 0.21 | −0.59 | 0.07 | −0.40 | 0.36 |
| 10 | 5 | −0.57 | 0.20 | −0.59 | 0.07 | −0.40 | 0.36 |
| 11 | 5 | −0.19 | 0.19 | −0.59 | 0.07 | −0.40 | 0.36 |
| 12 | 5 | 3.80 | 0.32 | −0.59 | 0.07 | −0.40 | 0.36 |
| 13 | 5 | −0.61 | 0.20 | −0.59 | 0.07 | −0.40 | 0.36 |
| 14 | 6 | −0.97 | 0.14 | −1.36 | 0.06 | −1.40 | 0.31 |
| 15 | 6 | −1.83 | 0.17 | −1.36 | 0.06 | −1.40 | 0.31 |
| 16 | 6 | 0.16 | 0.13 | −1.36 | 0.06 | −1.40 | 0.31 |
| 17 | 6 | −1.73 | 0.16 | −1.36 | 0.06 | −1.40 | 0.31 |
| 18 | 7 | −0.76 | 0.21 | −1.36 | 0.06 | −1.40 | 0.31 |
| 19 | 7 | −0.19 | 0.19 | −1.36 | 0.06 | −1.40 | 0.31 |
| 20 | 7 | 0.39 | 0.19 | −1.36 | 0.06 | −1.40 | 0.31 |
| 21 | 7 | 0.36 | 0.19 | −1.36 | 0.06 | −1.40 | 0.31 |
| 22 | 7 | −1.26 | 0.22 | −1.36 | 0.06 | −1.40 | 0.31 |
| 23 | 8 | 0.67 | 0.11 | 0.01 | 0.06 | 0.01 | 0.50 |
| 24 | 8 | 0.18 | 0.11 | 0.01 | 0.06 | 0.01 | 0.50 |
| 25 | 8 | 0.19 | 0.11 | 0.01 | 0.06 | 0.01 | 0.50 |
| 26 | 8 | 0.85 | 0.11 | 0.01 | 0.06 | 0.01 | 0.50 |
| 27 | 9 | −1.08 | 0.22 | −0.33 | 0.07 | −0.35 | 0.27 |
| 28 | 9 | −1.50 | 0.24 | −0.33 | 0.07 | −0.35 | 0.27 |
| 29 | 9 | −0.87 | 0.21 | −0.33 | 0.07 | −0.35 | 0.27 |
| 30 | 9 | −0.87 | 0.21 | −0.33 | 0.07 | −0.35 | 0.27 |
| 31 | 9 | −0.50 | 0.20 | −0.33 | 0.07 | −0.35 | 0.27 |
| 32 | 10 | 1.97 | 0.15 | 0.51 | 0.06 | 0.80 | 0.32 |
| 33 | 10 | −0.42 | 0.13 | 0.51 | 0.06 | 0.80 | 0.32 |
| 34 | 10 | −0.86 | 0.14 | 0.51 | 0.06 | 0.80 | 0.32 |
| 35 | 10 | 1.55 | 0.14 | 0.51 | 0.06 | 0.80 | 0.32 |
| 36 | 10 | 1.09 | 0.13 | 0.51 | 0.06 | 0.80 | 0.32 |
| 37 | 11 | 1.62 | 0.20 | −0.33 | 0.07 | −0.35 | 0.27 |
| 38 | 11 | 0.11 | 0.19 | −0.33 | 0.07 | −0.35 | 0.27 |
| 39 | 11 | 0.51 | 0.19 | −0.33 | 0.07 | −0.35 | 0.27 |
| 40 | 11 | 0.39 | 0.19 | −0.33 | 0.07 | −0.35 | 0.27 |
| 41 | 11 | 1.29 | 0.19 | −0.33 | 0.07 | −0.35 | 0.27 |
| 42 | 12 | 2.98 | 0.18 | 1.54 | 0.07 | 1.84 | 0.29 |
| 43 | 12 | 4.01 | 0.25 | 1.54 | 0.07 | 1.84 | 0.29 |
| 44 | 12 | 1.76 | 0.14 | 1.54 | 0.07 | 1.84 | 0.29 |
| 45 | 13 | 1.10 | 0.19 | 1.54 | 0.07 | 1.84 | 0.29 |
| 46 | 13 | 3.80 | 0.32 | 1.54 | 0.07 | 1.84 | 0.29 |
| 47 | 13 | 3.91 | 0.33 | 1.54 | 0.07 | 1.84 | 0.29 |
| 48 | 13 | 1.13 | 0.19 | 1.54 | 0.07 | 1.84 | 0.29 |
| 49 | 13 | 3.08 | 0.26 | 1.54 | 0.07 | 1.84 | 0.29 |

| Models | Rasch | LLTM | LLTM + ε |
|---|---|---|---|
| Rasch | 1 | - | - |
| LLTM | 0.998 | 1 | - |
| LLTM + ε | 1 | 0.998 | 1 |

**Table 10.** Correlations between the factor scores of the number series items and the 16-item International Cognitive Ability Resource short form overall and individual item types.

| Variable | Numeric Series Ability (Form A) | Form A (Adjusted) | Numeric Series Ability (Form B) | Form B (Adjusted) |
|---|---|---|---|---|
| 16-item ICAR Short Form Test | 0.60 *** | 0.79 *** | 0.66 ** | 0.84 *** |
| Verbal Reasoning (4 items) | 0.36 *** | 0.56 *** | 0.34 *** | 0.62 *** |
| Letter-Number (4 items) | 0.42 *** | 0.64 *** | 0.41 *** | 0.58 *** |
| 3D Rotation (4 items) | 0.33 *** | 0.46 *** | 0.45 *** | 0.55 *** |
| Matrix Reasoning (4 items) | 0.40 *** | 0.63 *** | 0.28 ** | 0.49 *** |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Loe, B.S.; Sun, L.; Simonfy, F.; Doebler, P. Evaluating an Automated Number Series Item Generator Using Linear Logistic Test Models. *J. Intell.* **2018**, *6*, 20.
https://doi.org/10.3390/jintelligence6020020
