# Automated Test Assembly in R: The eatATA Package


## Abstract

In this tutorial, we introduce the `R` package `eatATA`, which allows using several mixed-integer programming solvers for automated test assembly in `R`. We describe the general functionality and the common work flow of `eatATA` using a minimal example. We also provide four more elaborate use cases of automated test assembly: (a) the assembly of multiple test forms for a pilot study; (b) the assembly of blocks of items for a multiple matrix booklet design in the context of a large-scale assessment; (c) the assembly of two linear test forms for individual diagnostic purposes; (d) the assembly of multi-stage testing modules for individual diagnostic purposes. All use cases are accompanied by example item pools and commented `R` code.

## 1. Theoretical Background

With the `R` package `eatATA` (educational assessment tools: Automated Test Assembly) and this tutorial, we aim to give more practitioners easy access to ATA. The paper is structured as follows. First, we briefly explain why an `R` package is suitable in this context. We then give an overview of the functionality of `eatATA` and of the solvers that are accessible via the package, and we illustrate the general work flow of automated test assembly with `eatATA` using a minimal example. Subsequently, we provide four practical use cases alongside detailed and commented `R` code to illustrate the package functionality in depth.

## 2. eatATA

`R` [7] is a common tool for psychometric and statistical analyses. `R` is an open source and free software environment, and its extensive and actively maintained libraries offer tools for a rich diversity of data analysis use cases. Furthermore, a variety of mathematical programming solvers are available through `R`, including both open source and commercial solvers. These solvers can usually be accessed via packages that function as APIs (application programming interfaces) to a specific solver, and the solver itself is often included directly in the respective package (e.g., `lpSolveAPI` is an API to the solver `lp_solve`). In principle, this enables researchers to use `R` for test assembly purposes. For example, a short tutorial on how ATA can be implemented in `R` using `lpSolveAPI` [8] is given by Diao and van der Linden [9]. However, while such an implementation is possible, translating test specifications into mathematical constraints can be an interesting challenge for some, but a cumbersome task for others.

With the `R` package `eatATA` [10] and this tutorial paper, we want to promote ATA methods and provide easier access to ATA for measurement practitioners. The package facilitates access to mathematical programming and its potential for ATA problems without requiring users to worry about how test specifications are formulated mathematically. In the spirit of the `R` programming language, the functionality of the package is based on functions. Every test specification can be expressed by a function call, which enables a work flow that will feel familiar to practitioners with `R` experience.

`eatATA` supports four solvers: `GLPK` [11], `lp_solve` [12], `SYMPHONY` [13], and `Gurobi` [14]. These solvers are used via the `R` package APIs `Rglpk` [15], `lpSolve` [16], `Rsymphony` [17], and `gurobi` [18]. For a general overview of available open source and commercial MIP solvers, see, for example, Donoghue [19] and Luo [20].
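The following base `R` check shows which of these solver interfaces are available in a given installation. It is a minimal sketch. The package names are the four APIs listed above. The `gurobi` package usually requires a separate Gurobi installation and license rather than a plain CRAN install.

```r
# R packages that provide the solver APIs listed above, named by solver
solverAPIs <- c(GLPK = "Rglpk", lp_solve = "lpSolve",
                SYMPHONY = "Rsymphony", Gurobi = "gurobi")

# TRUE if the API package can be loaded in this R installation
available <- vapply(solverAPIs, requireNamespace, logical(1), quietly = TRUE)
available
```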

### 2.1. Work Flow

The general work flow when using `eatATA` is the following:

- (1) Item Pool: A `data.frame` including all information on the item pool is loaded or created. If the items have already been calibrated (e.g., based on data from a pilot study), this will include the calibrated item parameters.
- (2) Test Specifications: Usually a combination of (a) typically one objective and (b) multiple constraints.
  - (a) Objective Function: Usually a single object corresponding to the optimization goal, created via one of the objective functions. This refers to a test specification without an absolute criterion, for which we want to minimize or maximize something.
  - (b) Further Constraints: Further constraint objects, created using various constraint functions. These refer to test specifications with a fixed value or an upper and/or lower bound.
- (3) Solver Call: The `useSolver()` function is called with the objective and constraint objects to find an optimal solution.
- (4) Solution Processing: The solution can be inspected using the `inspectSolution()` and `appendSolution()` functions.

### 2.2. Minimal Example

- (1) Item Pool

For the minimal example, we use a small item pool that is included in the `eatATA` package (`items_mini`). In general, item pool information should be stored in a single `data.frame` with each row representing an item. In the example item pool, items are characterized by their format (`"format"`), average response times (`"time"`), and a difficulty parameter (`"difficulty"`) based on a calibration according to a Rasch model [21]. To calculate the item information function (IIF), we use the `calculateIIF()` function (see Figure 1). The same function can also calculate the IIF for item parameters from the two- and three-parameter logistic models. (In principle, any response model can be used within `eatATA`, and various `R` packages exist to calculate IIFs for a wide range of response models.) We provide the item parameters and one or multiple ability points (`theta`) at which the IIF should be calculated. In our case, we are only interested in the information at a medium ability, so we set `theta = 0` and append the IIF to our item pool `data.frame`. The resulting first five rows of the item pool can be seen in Table 1.
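The IIF computation can be sketched in base `R`. This is not `calculateIIF()` itself but a hand-written illustration under one assumption: the logistic scaling constant is taken to be $D = 1.7$. We chose that value because it reproduces the `IIF_0` column of Table 1 from the rounded difficulties shown there. The actual defaults of `calculateIIF()` should be checked in the package documentation.

```r
# Rasch item information with scaling constant D (D = 1.7 is an assumption
# that reproduces the IIF_0 column of Table 1)
iifRasch <- function(theta, b, D = 1.7) {
  p <- 1 / (1 + exp(-D * (theta - b)))  # probability of a correct response
  D^2 * p * (1 - p)
}

# First five items of items_mini as shown in Table 1
items <- data.frame(
  item = 1:5,
  format = "mc",
  time = c(27.79, 15.45, 31.02, 29.87, 23.13),
  difficulty = c(-1.88, 0.84, 1.12, 0.73, -0.49)
)

# Append the IIF at theta = 0 to the item pool data.frame
items$IIF_0 <- iifRasch(theta = 0, b = items$difficulty)
round(items$IIF_0, 2)
#> [1] 0.11 0.45 0.33 0.50 0.61
```

Because $p(1 - p)$ peaks at $p = 0.5$, an item is most informative where $\theta = b$. Item 5 ($b = -0.49$) is therefore the most informative of these five items at $\theta = 0$.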

- (2a) Objective Function

Our optimization goal is to maximize the test information function (TIF) at $\theta = 0$, which we specify using the `maxObjective()` function (see Figure 2). This is achieved by maximizing the sum of the IIFs of the items in the test form. (As a Rasch model has been used for calibration, maximizing the TIF at ability level 0 amounts to preferring items whose difficulties lie close to 0.) Note that item identifiers should be supplied to all objective and constraint functions. This guarantees that all constraints relate to the same set of items and makes the solver output more readable. Other available functions for defining optimization goals are `minObjective()`, `maximinObjective()`, `minimaxObjective()`, and `cappedMaximinObjective()`.

- (2b) Constraints

The number of items in the test form is constrained using the `itemsPerFormConstraint()` function (see Figure 3), which specifically serves the purpose of setting the test length. Using the `operator` and `targetValue` arguments, we can set a fixed target value or an upper or lower bound. The total test time is constrained to approximately eight minutes using the `itemValuesDeviationConstraint()` function. This function belongs to a family of functions that can be used to set constraints using numerical item values. By setting `allowedDeviation = 5`, we allow the testing time to vary between 7 min 55 s and 8 min 5 s. Note that requiring a test time of exactly eight minutes would be overly restrictive and is not necessary from a practical standpoint. An overview of the available constraint functions and their functionality can be found here: https://CRAN.R-project.org/package=eatATA/vignettes/overview.html (accessed on 20 May 2021).
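Written as a mixed-integer program, the minimal example corresponds roughly to the 0-1 formulation below. Here $x_i = 1$ if item $i$ of the $N$ pool items is selected, $I_i(0)$ is its IIF at $\theta = 0$, $t_i$ is its response time in seconds, and $n$ is the required test length. This is a standard textbook formulation offered for orientation. The exact model that `eatATA` builds internally may differ in detail.

$$
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{N} I_i(0)\, x_i \\
\text{subject to}\quad & \sum_{i=1}^{N} x_i = n, \\
& 480 - 5 \;\le\; \sum_{i=1}^{N} t_i\, x_i \;\le\; 480 + 5, \\
& x_i \in \{0, 1\}, \qquad i = 1, \dots, N.
\end{aligned}
$$

With a single test form, the binary domain of $x_i$ already ensures that each item is used at most once.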

- (3) Solver Call

We combine the objective function and all constraints in a `list` and hand them to the solver of our choice via the `useSolver()` function (see Figure 4). The order in which the constraints (including the objective function) are created or listed has no impact on the solution of the test assembly problem.

A time limit for the solver can be set via the `timeLimit` argument. If the time limit is reached after at least one feasible solution has been found, but before the search of the solution space is complete, the function returns the best solution found so far. In most practical applications, the quality of this solution will be sufficient. As this illustrative example is a very simple ATA problem for which `GLPK` finds a solution almost instantly, it is not necessary to set a time limit here. The solver used for ATA can be specified via the `solver` argument, with `"GLPK"` being the default.

If the ATA problem is infeasible, that is, if no solution satisfies all constraints, `useSolver()` will issue a corresponding message. To identify which (combination of) constraints causes the infeasibility, it can be helpful to remove constraints from the ATA problem step by step until feasibility is achieved. Alternatively, constraints can be added to the ATA problem step by step, starting with just the objective function, until the problem becomes infeasible [22].
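A deliberately naive sketch illustrates what "feasible", "optimal", and "infeasible" mean here. It enumerates every candidate test form, discards those that violate the constraints, and keeps the one with the highest summed information. The sketch uses the five items of Table 1 with a hypothetical test length of three items and a hypothetical time window of 75 ± 5 s; the real example targets 8 min. This exhaustive enumeration is a swapped-in illustration, not how `eatATA` or `GLPK` work. It only scales to tiny pools, which is why MIP solvers that avoid enumerating all combinations are used in practice.

```r
# Items 1-5 from Table 1 (response time in seconds, IIF at theta = 0)
items <- data.frame(
  item = 1:5,
  time = c(27.79, 15.45, 31.02, 29.87, 23.13),
  IIF_0 = c(0.11, 0.45, 0.33, 0.50, 0.61)
)

# Hypothetical brute-force assembler: k items, time within target +/- dev
bruteForceATA <- function(items, k, target, allowedDeviation) {
  subsets <- combn(nrow(items), k)  # one column per candidate form
  totalTime <- apply(subsets, 2, function(idx) sum(items$time[idx]))
  totalInfo <- apply(subsets, 2, function(idx) sum(items$IIF_0[idx]))
  feasible <- abs(totalTime - target) <= allowedDeviation
  if (!any(feasible)) {
    message("No feasible solution: the constraints cannot be met jointly.")
    return(NULL)
  }
  best <- which(feasible)[which.max(totalInfo[feasible])]
  items[subsets[, best], ]
}

form <- bruteForceATA(items, k = 3, target = 75, allowedDeviation = 5)
form$item  # items 2, 3, and 4

# No three items can take 90 to 95 s, so this problem is infeasible
none <- bruteForceATA(items, k = 3, target = 92.5, allowedDeviation = 2.5)
```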

- (4) Solution Processing

`eatATA` provides two functions to process the output of `useSolver()`. `inspectSolution()` directly shows the assembled test forms as a list that contains only the items in the assembled test form(s). `appendSolution()` appends the assignment matrix, containing 0 (item not in this test form) and 1 (item in this test form), to the item pool `data.frame` (see Figure 5).
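The relation between the assignment matrix and the per-form item lists can be sketched in base `R`. The sketch does not call `appendSolution()` or `inspectSolution()`. It uses a hypothetical 0/1 assignment matrix (items in rows, test forms in columns) with made-up column names to show the two views of a solution described above.

```r
# Hypothetical item pool excerpt (times in seconds, from Table 1)
items <- data.frame(item = 1:5, time = c(27.79, 15.45, 31.02, 29.87, 23.13))

# Hypothetical solution: rows = items, columns = test forms, 1 = selected
assignment <- matrix(c(1, 0,
                       0, 1,
                       1, 0,
                       0, 1,
                       0, 0),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(NULL, c("form_1", "form_2")))

# View 1: append the 0/1 columns to the item pool (cf. appendSolution())
itemsWithSolution <- cbind(items, assignment)

# View 2: one data.frame per form with only its items (cf. inspectSolution())
forms <- lapply(colnames(assignment), function(f) items[assignment[, f] == 1, ])
names(forms) <- colnames(assignment)
formTimes <- sapply(forms, function(f) sum(f$time))
formTimes
```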

## 3. Use Cases

In this section, we present four use cases that apply the `eatATA` package to test assembly problems. The use cases were chosen to cover a broad range of contexts for ATA application: (1) a pilot study setting in which we assemble multiple test forms while depleting the item pool (without prior item calibration), (2) a typical large-scale assessment situation in which calibrated items are assembled into blocks for a multiple matrix booklet design, (3) the assembly of multiple parallel test forms for a high-stakes assessment from a calibrated item pool, and (4) the assembly of modules from a calibrated item pool for a multi-stage assessment. To illustrate the accessibility of `eatATA` compared to plain solver APIs, use cases (3) and (4) correspond to two of the problems used in the tutorial paper by Diao and van der Linden [9]. Because the solver calls and the solution processing do not differ much between the minimal example and the use cases, we primarily focus on how the constraint and objective function definitions have to be altered from application to application. The complete code for all use cases can be found in the corresponding Supplementary Files.

### 3.1. Pilot Study

For the pilot study use case, we use the item pool `items_pilot`, which is included in the `eatATA` package. The item pool consists of 100 items with various characteristics, for example the expected response times in seconds (`"time"`), the item format (`"format"`), and a rough estimate of the item difficulty grouped into five categories (`"diffCategory"`). The first five items of the item pool can be seen in Appendix B Table A1. From this item pool, we want to assemble test forms that meet the following requirements: (1) each item should appear in exactly one test form (this implies no item overlap between test forms), (2) all items should be used (item pool depletion), (3) the expected test form response times should be as close to 10 min as possible, (4) the number of test forms should be determined accordingly, (5) item difficulty categories and item formats should be distributed as evenly as possible across test forms, (6) each content domain should appear at least once in each test form, and (7) item exclusions should be incorporated.

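To determine the number of test forms (test specification (4)), the total expected response time of all items is divided by the target time of 10 min and rounded up using the `ceiling()` function, resulting in eight test forms. The actual objective function (test specification (3)) is defined via the `minimaxObjective()` function, which allows us to specify a `targetValue`. The maximum absolute deviation of the test form times from this target value is then minimized.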
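The arithmetic behind the number of forms and the minimax criterion can be sketched as follows. The response times and the assignment of items to forms are hypothetical, using ten items instead of the 100 items of `items_pilot`. The sketch only evaluates the criterion for one given assignment. With `minimaxObjective()`, the solver instead searches for the assignment that minimizes this criterion.

```r
# Hypothetical expected response times (seconds) for a 10-item pool
time <- c(310, 280, 290, 330, 250, 320, 300, 270, 260, 340)
targetTime <- 10 * 60  # 10 minutes per test form

# Forms needed so that all items are used and forms last about 10 minutes
nForms <- ceiling(sum(time) / targetTime)  # 2950 / 600 is about 4.92, so 5

# Hypothetical assignment: item i goes to form form[i]
form <- rep(seq_len(nForms), each = 2)
formTimes <- tapply(time, form, sum)

# Minimax criterion: largest absolute deviation of a form time from the target
maxDeviation <- max(abs(formTimes - targetTime))
maxDeviation
```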

Test specifications (1) and (2) are implemented using `itemUsageConstraint()`. The `operator` argument is set to `"="`, which means that every item will occur exactly once across all test forms. Test specification (5) refers to the difficulty column `"diffCategory"` as well as to the item format column `"format"`. For item difficulty, we define the column `"diffCategory"` to be a factor variable, because we do not want the numerical mean to be equal across test forms but rather the distribution of the distinct difficulty levels. We use the function `autoItemValuesMinMaxConstraint()` to determine the required `targetValues` automatically, after which the function directly calls the respective constraint functions using the calculated `targetValues`. By default, the function returns the resulting minimum and maximum levels. For example, items of difficulty category 1 will occur once or twice in each test form. Similarly, for item formats, the `"cmc"` format will occur four or five times in each test form. Test specification (6) requires that each domain occurs at least once in each test form. Using the `itemCategoryMinConstraint()` function and its `min` argument, we define the minimum occurrence frequency for each of the three categories (levels) of `domain` (`"listening"`, `"reading"`, `"writing"`).
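One natural way to derive per-form minimum and maximum counts for a categorical variable is to divide the number of items in each category by the number of forms, then round down for the minimum and up for the maximum. Whether `autoItemValuesMinMaxConstraint()` uses exactly this rule is an assumption here. The category counts below are hypothetical and only illustrate the arithmetic for eight forms.

```r
# Hypothetical difficulty categories for a pool of 100 items
diffCategory <- factor(rep(1:5, times = c(12, 20, 30, 25, 13)))
nForms <- 8

counts <- as.vector(table(diffCategory))
bounds <- data.frame(
  level = levels(diffCategory),
  n = counts,
  min = floor(counts / nForms),   # every form gets at least this many
  max = ceiling(counts / nForms)  # and at most this many
)
bounds
```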

Test specification (7) is based on the `"exclusions"` column of the `items_pilot` `data.frame`. The column contains item exclusions as a single character string for each item. The items in the data set have either no exclusions (`NA`), only one exclusion (e.g., `"76"`), or multiple exclusions (e.g., `"70, 64"`). As some items have multiple exclusions, we need to separate the strings into discrete item identifiers via the function `itemTuples()`, which produces pairs (tuples) of mutually exclusive items (also called enemy items). Using the `sepPattern` argument of `itemTuples()`, the user must specify the pattern that separates the item identifiers within the string. These tuples can then be used to define exclusion constraints via the `itemExclusionConstraint()` function. The complete code for the pilot study use case, including the solver call and the solution inspection, can be seen in Supplement S1.
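The string splitting performed by `itemTuples()` can be mimicked in base `R` to show what the resulting pairs look like. This sketch is not the package's implementation. The small pool excerpt (in particular item `"8"`), the column names `idx1`/`idx2`, and the separator `", "` are assumptions for illustration.

```r
# Hypothetical excerpt of an item pool with an exclusion column
pool <- data.frame(
  item = c("1", "3", "5", "8"),
  exclusions = c(NA, "76", "9", "70, 64"),
  stringsAsFactors = FALSE
)
sepPattern <- ", "

# One row per (item, excluded item) pair; items without exclusions are skipped
tupleList <- lapply(seq_len(nrow(pool)), function(i) {
  if (is.na(pool$exclusions[i])) return(NULL)
  excluded <- strsplit(pool$exclusions[i], sepPattern, fixed = TRUE)[[1]]
  data.frame(idx1 = pool$item[i], idx2 = excluded, stringsAsFactors = FALSE)
})
tuples <- do.call(rbind, Filter(Negate(is.null), tupleList))
tuples
```

Each row would then translate into one constraint stating that the two items may not appear in the same test form.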

### 3.2. LSA Blocks for Multiple Matrix Booklet Designs

A detailed discussion of booklet designs is beyond the scope of this tutorial on `eatATA`, and the reader is referred to the respective literature [24,25].

For this use case, we use an item pool included in the `eatATA` package (`items_lsa`). The first 10 items of this item pool can be seen in Appendix C Table A2. The assembled item blocks should conform to the following test specifications: (1) blocks should contain as many well-fitting items as possible, (2) hierarchical stimulus-item structures should be incorporated, (3) no item overlap, (4) a fixed set of anchor items has to be included in the block assembly, (5) the average response time per block should be around 20 min, (6) difficulty levels should be distributed evenly across item blocks, (7) all blocks should contain at least three different item formats, and (8) at most two items per block should have an average proportion of correct responses below 8 or above 92 percent. Regarding test specification (4): if LSAs intend to measure trends between different times of measurement, new assessment cycles partially reuse items from former studies, so-called anchor items, to establish a common scale [26]. Usually, anchor items are chosen beforehand based on their advantageous psychometric properties.

Test specification (1) refers to the item fit statistic stored in the column `"infit"`. As we are only interested in absolute deviations from 1 (otherwise, positive and negative deviations could cancel each other out), we create a new variable, `infitDev`. The deviation of this variable from 0 is then minimized using the `minimaxObjective()` function.
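The transformation of the infit values can be illustrated with the ten items shown in Appendix C Table A2. The last two lines show why absolute deviations are used: two items with infit values of 0.8 and 1.2 would otherwise cancel each other out. These two values are hypothetical.

```r
# Infit values of the first ten items of items_lsa (Appendix C, Table A2)
infit <- c(1.22, 1.01, 1.22, 1.21, 1.21, 1.08, 1.25, 1.05, 1.10, 1.02)

# Absolute deviation from the ideal infit value of 1
infitDev <- abs(infit - 1)
round(infitDev, 2)

# Signed deviations of two hypothetical items cancel out, absolute ones do not
signedSum <- sum(c(0.8, 1.2) - 1)
absoluteSum <- sum(abs(c(0.8, 1.2) - 1))
```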

In the `eatATA` package, options (a) and (b) are implemented, and option (b) is chosen for this specific use case. Option (b) can be implemented very similarly to the item exclusion constraints introduced in the pilot study use case. Inclusion tuples are built using the function `stemInclusionTuples()` and then provided to the `itemInclusionConstraint()` function.
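The idea behind stem-based inclusion tuples can be sketched with the testlet structure of the ten items in Appendix C Table A2. Every pair of items that shares a stimulus is listed, and each pair would then be required to end up in the same block. Pairing all items within a stem is our assumption for illustration. `stemInclusionTuples()` may build a different but equivalent set of pairs.

```r
# Testlet (stem) and item identifiers of the first ten items of items_lsa
# (Appendix C, Table A2)
testlet <- rep(c("TRA5308", "TRB6832", "TRC9792"), times = c(4, 4, 2))
item <- c(paste0("TRA5308", letters[1:4]),
          paste0("TRB6832", letters[1:4]),
          paste0("TRC9792", letters[1:2]))

# All pairs of items sharing a stem; each pair must be placed in one block
pairsPerStem <- lapply(split(item, testlet), function(ids) {
  if (length(ids) < 2) return(NULL)
  t(combn(ids, 2))
})
inclusionPairs <- do.call(rbind, unname(pairsPerStem))
colnames(inclusionPairs) <- c("idx1", "idx2")
nrow(inclusionPairs)  # 6 + 6 + 1 pairs
```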

Test specification (3), no item overlap, is implemented using the `itemUsageConstraint()` function. The "less than or equal" operator `"<="` is used because complete depletion of the item pool is not required. Test specification (4) refers to the forced inclusion of certain items in the block assembly, which can also be implemented using the `itemUsageConstraint()` function. In this case, we specify the `whichItems` argument, which lets us choose the items to which this constraint applies. For this specification, the `operator` argument is set to `"="`, as the anchor items have to appear exactly once across the blocks.

Test specification (5) is implemented using the `itemValuesDeviationConstraint()` function. Test specifications (6), (7), and (8) are implemented by transforming the respective variables to factors so that we can apply the `itemCategoryMinConstraint()` or `itemCategoryMaxConstraint()` functions. Every block should contain at least two items at each of the two intermediate difficulty levels and at least one item at each of the outer difficulty levels, so we set the `min` argument for this test specification to `c(1, 2, 2, 1)`. For test specification (7), item formats are grouped into three different groups, which are then constrained by setting the minimum number of items of each group per block to two. In some LSA studies, items whose empirical proportions correct fall below and/or above a certain value are flagged (cf. test specification (8)). Therefore, we limit the number of items with a proportion correct below 8 percent or above 92 percent to a maximum of two items per category per block.
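The factor needed for test specification (8) can be sketched in base `R`. We assume that the "Frequency" column of Appendix C Table A2 holds proportions correct. Because none of the ten items shown there is extreme, two hypothetical items are appended.

```r
# "Frequency" column of Appendix C, Table A2 (assumed proportions correct),
# plus two hypothetical extreme items appended for illustration
pCorrect <- c(0.19, 0.24, 0.42, 0.41, 0.51, 0.20, 0.33, 0.49, 0.70, 0.61,
              0.05, 0.95)

# Flag items below 8 percent or above 92 percent correct
pCategory <- factor(
  ifelse(pCorrect < 0.08, "low", ifelse(pCorrect > 0.92, "high", "ok")),
  levels = c("low", "ok", "high")
)
table(pCategory)
```

A maximum constraint of two items per block would then apply to the `low` and `high` levels.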

### 3.3. High-Stakes Assessment

For this use case, we use a generated item pool (`items_diao` in the `eatATA` package). The item pool consists of 165 items following the three-parameter logistic (3PL) model. Each item belongs to one of six content categories. The first five items of the generated item pool can be seen in Appendix D Table A3. In this example, the goal is to assemble two parallel test forms with the following test specifications: (1) absolute target values for the TIFs are set to $T(\theta) = 5.4, 10, 5.4$ at $\theta = -1.5, 0, 1.5$, and the distances of the TIFs of the two new forms from these targets should be minimized; (2) the number of items per content category should be distributed evenly across test forms (Appendix E Table A4 presents the numbers of items per content category available in the complete item pool as well as the numbers required in each of the two forms); (3) no overlapping items; and (4) each test form should contain exactly 55 items. These specifications are taken directly from Diao and van der Linden [9]. The code for calculating the IIFs and setting up the minimax objective function as well as the other constraints can be seen in Figure 8.
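The quantities behind test specification (1) can be sketched in base `R`: the 3PL IIF, the TIF of a candidate form at the three ability values, and the largest absolute deviation from the targets. A minimax objective would minimize that deviation over all candidate forms. The scaling constant $D = 1.7$ is an assumption. The candidate form is hypothetical: it consists only of the five items of Appendix D Table A3, whereas the actual forms contain 55 items, so its TIF falls far below the targets.

```r
# 3PL item information (scaling constant D = 1.7 assumed)
iif3PL <- function(theta, a, b, c, D = 1.7) {
  p <- c + (1 - c) / (1 + exp(-D * a * (theta - b)))
  D^2 * a^2 * ((1 - p) / p) * ((p - c) / (1 - c))^2
}

# Parameters of the first five items of items_diao (Appendix D, Table A3)
a <- c(0.54, 0.71, 0.84, 1.38, 1.26)
b <- c(-0.09, -1.07, -1.11, -0.71, -0.44)
guess <- c(0.17, 0.24, 0.17, 0.21, 0.12)

thetas <- c(-1.5, 0, 1.5)
targets <- c(5.4, 10, 5.4)

# Items in rows, ability values in columns
iifMatrix <- sapply(thetas, function(th) iif3PL(th, a, b, guess))
tif <- colSums(iifMatrix)

# Minimax criterion for this candidate form
maxDeviation <- max(abs(tif - targets))
```

With $a = 1$ and $c = 0$, `iif3PL()` reduces to $D^2 p(1 - p)$, the Rasch-type information used in the minimal example.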

### 3.4. Multi-Stage Testing

#### 3.4.1. Original Approach

The `allowedDeviation` argument is set to 0.5. The syntax for the implementation of the objective function and the constraints can be found in Appendix F Figure A1 and Figure A2.

#### 3.4.2. Combined Capped Approach

With the `eatATA` package, it is possible to assemble the modules for the two stages in one combined assembly. Especially in situations with multiple stages, a simultaneous assembly may prevent infeasibility at later stages. When the modules are assembled sequentially, the assembly of the first stages may deplete the item pool so that it becomes impossible to meet certain test specifications at later stages. In addition, from a practitioner's perspective, a simultaneous assembly may also be easier, as the item pool does not need to be adjusted after every assembly step.

The `R` syntax for the combined capped approach can be seen in Figure 9. To implement test specification (1), we specify the maximization of the minimum TIF values at the ability values for the routing module in the first stage. Note that we do not use the original maximin approach but rather the capped maximin approach [20]. The capped maximin approach does not require setting a maximally allowed deviation. Instead, it combines maximizing the minimal TIF with minimizing the maximal difference between the TIFs. The capped maximin approach is also used for the modules at the second stage (test specification (2)). The TIF values obtained in the first stage and those obtained in the second stage do not need to be in the same range, so we can set a weight for the TIF values when combining these objectives. In this case, the minimal TIF in both stages can be considered equally important. Hence, the weights are set to 1, which is the default. Because the other test specifications in Figure 9 correspond to test specifications illustrated earlier, further explanations are omitted. The complete code for the multi-stage assessment use case, both for the original two-stage assembly and for the new combined assembly, including the solver call and the solution inspection, can be found in Supplement S4.

## 4. Discussion

The `eatATA` package and this tutorial paper have been written to promote ATA methods and make them more accessible to practitioners and researchers. We have provided a short overview of the basic ideas of ATA and an illustration of the typical `eatATA` work flow. Using a small illustrative example and four different, more realistic use cases, we demonstrated how the package can be used to implement ATA in `R`. By choosing a wide range of ATA applications with diverse test specifications, we hope to spark interest in ATA methods among a broad audience.

### 4.1. Limitations

There are some limitations to using `eatATA` to solve ATA problems. As mentioned in use case (2), hierarchical item-stimulus structures cannot be implemented as flexibly as suggested by van der Linden [1], that is, with optional item selection. However, in practice, item sets are often treated as fixed units anyway, as altering the item set that is presented alongside a stimulus might have undesirable effects on the psychometric properties of the individual items. Furthermore, item overlap specifications between test forms cannot be specified directly in `eatATA`. Generally, a direct implementation of item overlap constraints drastically increases the complexity of the mathematical programming problem, resulting in long computing times. Moreover, item overlap specifications can often be met indirectly. For instance, one can first select a set of items that can serve as overlap items, and then constrain both the number of overlap items per test form and how many times the overlap items can appear across the test forms. Therefore, direct overlap constraints are deliberately not included in `eatATA`. Finally, solver selection in `eatATA` is limited to the solvers mentioned in the introduction. For example, `CPLEX` and `XPRESS` are powerful commercial alternatives to `Gurobi`. Another potentially promising open source solver, `SCIP` [20], unfortunately currently lacks an `R` API. However, we believe that for many ATA contexts the available selection of solvers is more than sufficient.

### 4.2. Alternatives

While a wide range of `R` packages exists for many psychometric methods, this is not the case for ATA methods. More precisely, we are only aware of four other `R` packages on CRAN that implement some ATA functionality, of which only two (`TestDesign`, `RSCAT`) seem to be under active development. `TestDesign` [28] provides access to the same selection of solvers as `eatATA` but has a strong focus on adaptive testing. For example, `TestDesign` is not suited for the assembly of multiple parallel test forms. In a similar vein, `RSCAT` provides functionality specific to the shadow test approach in computerized adaptive testing [29]. However, other testing approaches, such as multi-stage testing or linear testing, are not supported in that package. Finally, `Rata` [30] and `xxIRT` [31] also implement ATA methods in `R`. Yet both packages only provide access to the solvers available via `Rglpk` and `lpSolve`. In addition, although both packages generally follow a work flow similar to that of `eatATA`, their functionality is more limited (e.g., no specific categorical item constraint functions, no automatic calculation of target values, and no item inclusion constraints).

`eatATA`. Another potential benefit of heuristic algorithms is that some allow the introduction of soft constraints, which might be helpful for dealing with feasibility issues. However, to our knowledge, only limited ATA applications of these algorithms exist. For example, we are not aware of a single ATA application of heuristic algorithms using `R`. Furthermore, it can be argued that for most practical ATA applications, MIP solvers perform sufficiently well from a computational standpoint [36].

### 4.3. Conclusions

`eatATA` can be a helpful tool for researchers and practitioners who want to assemble test forms. It is applicable in a wide range of scenarios, and its user interface should be rather intuitive for `R` users. By providing this tool, we hope to promote automated test assembly methods, which are almost always superior to manual test assembly approaches.

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

The item pools used in this study are available in the `eatATA` package.

## Acknowledgments

`eatATA` and early testing of the package. We also thank Simone Dubiel for further ideas and for testing `eatATA`.

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
ATA | Automated Test Assembly |
MIP | Mixed-integer Programming |
CAT | Computerized Adaptive Testing |
MST | Multi-stage Testing |
LSA | Large-scale Assessment |
HST | High-stakes Assessment |
IIF | Item Information Function |
TIF | Test Information Function |

## Appendix A

## Appendix B

**Table A1.** First five items of the item pool `items_pilot`.

Item | diffCategory | Format | Domain | Time (s) | Exclusions |
---|---|---|---|---|---|
1 | 2 | cmc | listening | 44.54 | |
2 | 4 | cmc | listening | 44.81 | |
3 | 4 | mc | writing | 32.36 | 76 |
4 | 2 | mc | listening | 48.03 | |
5 | 2 | mc | writing | 42.06 | 9 |

## Appendix C

**Table A2.** First 10 items of the item pool `items_lsa`.

Testlet | Item | Level | Format | Frequency | Infit | Time | Anchor |
---|---|---|---|---|---|---|---|
TRA5308 | TRA5308a | IV | multiple choice | 0.19 | 1.22 | 54.00 | 0 |
TRA5308 | TRA5308b | IV | multiple choice | 0.24 | 1.01 | 66.00 | 0 |
TRA5308 | TRA5308c | II | multiple choice | 0.42 | 1.22 | 89.00 | 0 |
TRA5308 | TRA5308d | III | multiple choice | 0.41 | 1.21 | 92.00 | 0 |
TRB6832 | TRB6832a | III | open answer | 0.51 | 1.21 | 85.00 | 0 |
TRB6832 | TRB6832b | III | open answer | 0.20 | 1.08 | 61.00 | 0 |
TRB6832 | TRB6832c | IV | open answer | 0.33 | 1.25 | 84.00 | 0 |
TRB6832 | TRB6832d | II | open answer | 0.49 | 1.05 | 109.00 | 0 |
TRC9792 | TRC9792a | I | cmc | 0.70 | 1.10 | 94.00 | 0 |
TRC9792 | TRC9792b | I | cmc | 0.61 | 1.02 | 110.00 | 0 |

## Appendix D

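**Table A3.** First five items of the generated item pool `items_diao` (3PL parameters and content category).

Item | a | b | c | Category |
---|---|---|---|---|
1 | 0.54 | −0.09 | 0.17 | 6 |
2 | 0.71 | −1.07 | 0.24 | 1 |
3 | 0.84 | −1.11 | 0.17 | 2 |
4 | 1.38 | −0.71 | 0.21 | 3 |
5 | 1.26 | −0.44 | 0.12 | 4 |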

## Appendix E

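**Table A4.** Numbers of items per content category in the complete item pool and required per test form (HST) or per module (MST).

| | Cat. 1 | Cat. 2 | Cat. 3 | Cat. 4 | Cat. 5 | Cat. 6 |
|---|---|---|---|---|---|---|
| Item Pool | 23 | 26 | 22 | 29 | 29 | 36 |
| HST | 9 | 9 | 7 | 9 | 9 | 11 |
| MST: Stage 1 | 4 | 4 | 3 | 4 | 4 | 5 |
| MST: Stage 2 | 3 | 3 | 2 | 3 | 3 | 4 |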

## Appendix F

## References

1. Van der Linden, W.J. Linear Models for Optimal Test Assembly; Springer: New York, NY, USA, 2005.
2. Luecht, R.M.; Sireci, S.G. A Review of Models for Computer-Based Testing; Research Report 2011-12; College Board: New York, NY, USA, 2011.
3. Kuhn, J.T.; Kiefer, T. Optimal test assembly in practice. Z. Für Psychol. **2015**, 221, 190–200.
4. OECD. PISA 2018 Technical Report; Technical Report; OECD Publishing: Paris, France, 2019.
5. Yan, D.; Von Davier, A.A.; Lewis, C. Computerized Multistage Testing: Theory and Applications; CRC Press: Boca Raton, FL, USA, 2016.
6. Van der Linden, W.J.; Glas, C.A. Computerized Adaptive Testing: Theory and Practice; Kluwer Academic Publishers: New York, NY, USA, 2000.
7. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020.
8. Konis, K.; Schwendinger, F. lpSolveAPI: R Interface to ‘lp_solve’ Version 5.5.2.0, R Package Version 5.5.2.0-17.7; 2020. Available online: https://CRAN.R-project.org/package=lpSolveAPI (accessed on 20 May 2021).
9. Diao, Q.; van der Linden, W.J. Automated test assembly using lp_solve version 5.5 in R. Appl. Psychol. Meas. **2011**, 35, 398–409.
10. Becker, B.; Debeer, D. eatATA: Create Constraints for Small Test Assembly Problems, R Package Version 0.11.2; 2021. Available online: https://CRAN.R-project.org/package=eatATA (accessed on 20 May 2021).
11. Makhorin, A. GLPK (GNU Linear Programming Kit). 2018. Available online: https://www.gnu.org/software/glpk/ (accessed on 20 May 2021).
12. Berkelaar, M.; Eikland, K.; Notebaert, P. lp_solve 5.5.2.5. 2016. Available online: http://lpsolve.sourceforge.net/5.5/ (accessed on 20 May 2021).
13. Ladanyi, L.; Ralphs, T.; Menal, G.; Mahajan, A. coin-or/SYMPHONY: Version 5.6.17. 2019. Available online: https://projects.coin-or.org/SYMPHONY (accessed on 20 May 2020).
14. Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual; Gurobi Optimization, LLC: Houston, TX, USA, 2021.
15. Theussl, S.; Hornik, K. Rglpk: R/GNU Linear Programming Kit Interface, R Package Version 0.6-4; 2019. Available online: https://CRAN.R-project.org/package=Rglpk (accessed on 20 May 2020).
16. Berkelaar, M.; Csárdi, G. lpSolve: Interface to ‘Lp_solve’ v. 5.5 to Solve Linear/Integer Programs; R Package Version 5.6.15; 2020. Available online: https://CRAN.R-project.org/package=lpSolve (accessed on 20 May 2021).
17. Harter, R.; Hornik, K.; Theussl, S. Rsymphony: SYMPHONY in R, R Package Version 0.1-29; 2020. Available online: https://CRAN.R-project.org/package=Rsymphony (accessed on 20 May 2021).
18. Gurobi Optimization, LLC. Gurobi: Gurobi Optimizer 9.1 interface, R package version 9.1-1; Gurobi Optimization, LLC: Houston, TX, USA, 2021.
19. Donoghue, J.R. Comparison of Integer Programming (IP) Solvers for Automated Test Assembly (ATA); Research Report 15-05; Educational Testing Service: Princeton, NJ, USA, 2015.
20. Luo, X. Automated Test Assembly with Mixed-Integer Programming: The Effects of Modeling Approaches and Solvers. J. Educ. Meas. **2020**, 57, 547–565.
21. Rasch, G. Studies in Mathematical Psychology: I. Probabilistic Models for Some Intelligence and Attainment Tests; Nielsen & Lydiche: Copenhagen, Denmark, 1960.
22. Spaccapanico Proietti, G.; Matteucci, M.; Mignani, S. Automated Test Assembly for Large-Scale Standardized Assessments: Practical Issues and Possible Solutions. Psych **2020**, 2, 315–337.
23. Gonzalez, E.; Rutkowski, L. Principles of multiple matrix booklet design and parameter recovery in large-scale assessments. In IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments: Volume 3; von Davier, M., Hastedt, D., Eds.; IEA-ETS Research Institute: Hamburg, Germany, 2010; pp. 125–156.
24. Frey, A.; Hartig, J.; Rupp, A.A. An NCME instructional module on booklet designs in large-scale assessments of student achievement: Theory and practice. Educ. Meas. Issues Pract. **2009**, 28, 39–53.
25. Pokropek, A. Missing by design: Planned missing-data designs in social science. Res. Methods **2011**, 20, 81–105.
26. Kolen, M.J.; Brennan, R.L. Test Equating, Scaling, and Linking: Methods and Practices; Springer Science & Business Media: New York, NY, USA, 2014.
27. OECD. PISA 2012 Technical Report; Technical Report; OECD Publishing: Paris, France, 2014.
28. Choi, S.W.; Lim, S. TestDesign: Optimal Test Design Approach to Fixed and Adaptive Test Construction, R Package Version 1.2.2; 2021. Available online: https://CRAN.R-project.org/package=TestDesign (accessed on 20 May 2021).
29. Jiang, B. RSCAT: Shadow-Test Approach to Computerized Adaptive Testing, R Package Version 1.1.0; 2021. Available online: https://CRAN.R-project.org/package=RSCAT (accessed on 20 May 2021).
30. Luo, X. Rata: Automated Test Assembly; R Package Version 0.0.2; 2019. Available online: https://CRAN.R-project.org/package=Rata (accessed on 20 May 2021).
31. Luo, X. xxIRT: Item Response Theory and Computer-Based Testing in R, R Package Version 2.1.2; 2019. Available online: https://CRAN.R-project.org/package=xxIRT (accessed on 20 May 2021).
32. Chang, T.Y.; Shiu, Y.F. Simultaneously construct IRT-based parallel tests based on an adapted CLONALG algorithm. Appl. Intell. **2012**, 36, 979–994.
33. Sun, K.T.; Chen, Y.J.; Tsai, S.Y.; Cheng, C.F. Creating IRT-based parallel test forms using the genetic algorithm method. Appl. Meas. Educ. **2008**, 21, 141–161.
34. Verschoor, A.J. Genetic Algorithms for Automated Test Assembly. Ph.D. Thesis, Twente University, Enschede, The Netherlands, 2007.
35. Veldkamp, B.P. Multiple objective test assembly problems. J. Educ. Meas. **1999**, 36, 253–266.
36. Van der Linden, W.J.; Li, J. Comment on three-element item selection procedures for multiple forms assembly: An item matching approach. Appl. Psychol. Meas. **2016**, 40, 641–649.

**Figure 3.** Define constraints: number of items in the test form, number of times an item can be used, and total average testing time.

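**Table 1.** First five rows of the item pool `items_mini` with the appended IIF at $\theta = 0$ (`IIF_0`).

Item | Format | Time (s) | Difficulty | IIF_0 |
---|---|---|---|---|
1 | mc | 27.79 | −1.88 | 0.11 |
2 | mc | 15.45 | 0.84 | 0.45 |
3 | mc | 31.02 | 1.12 | 0.33 |
4 | mc | 29.87 | 0.73 | 0.50 |
5 | mc | 23.13 | −0.49 | 0.61 |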


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Becker, B.; Debeer, D.; Sachse, K.A.; Weirich, S. Automated Test Assembly in R: The eatATA Package. *Psych* **2021**, *3*, 96-112.
https://doi.org/10.3390/psych3020010

**AMA Style**

Becker B, Debeer D, Sachse KA, Weirich S. Automated Test Assembly in R: The eatATA Package. *Psych*. 2021; 3(2):96-112.
https://doi.org/10.3390/psych3020010

**Chicago/Turabian Style**

Becker, Benjamin, Dries Debeer, Karoline A. Sachse, and Sebastian Weirich. 2021. "Automated Test Assembly in R: The eatATA Package" *Psych* 3, no. 2: 96-112.
https://doi.org/10.3390/psych3020010