Flexible Item Response Modeling in R with the flexmet Package

: The ﬁltered monotonic polynomial (FMP) model is a semi-parametric item response model that allows ﬂexible response function shapes but also includes traditional item response models as special cases. The flexmet package for R facilitates the routine use of the FMP model in real data analysis and simulation studies. This tutorial provides several code examples illustrating how the flexmet package may be used to simulate FMP model parameters and data (both for dichotomous and polytomously scored items), estimate FMP model parameters, transform traditional item response models to different metrics, and more. This tutorial serves as both an introduction to the unique features of the FMP model and as a practical guide to its implementation in R via the flexmet package.


Background
Although many applications of item response theory are in the context of parametric models such as the Rasch, two, and three-parameter logistic models [1], there is also a recognized need for models that allow for more flexible relationships between the latent trait and an item category response [2]. Models and techniques that allow for more flexible item response functions are variously known as non-parametric [2], quasi-parametric [3], or semi-parametric [4] models, and include methods such as Mokken scale analysis [5], kernel smoothing [6], and polynomial splines [4]. Historically, flexible item response models have been used to analyze data sets with small sample sizes, to check the assumptions of parametric item response models, and as an alternative to poorly fitting parametric models [2]. In recent years, flexible item response models have increasingly been used in confirmatory contexts. For example, flexible item response models have recently been applied to computerized adaptive testing [7,8], the creation of item banks for measuring health outcomes [9], and the development of optimal scoring procedures [10].
A compelling recent addition to the family of semi-parametric item response models is the filtered monotonic polynomial (FMP) model [3,11]. The general form of the FMP model [12] used in this paper is a generalization of Muraki's generalized partial credit model [13] (GPCM) that replaces a linear function of the latent trait θ with a polynomial expansion of θ. As clarified below, constraints are placed on the item parameters so that this polynomial function is a monotonically increasing function of θ. For item i with C i ordered response categories and person j, the item response function (IRF) of the general FMP model gives the probability of a response in category c, c = 0, . . . C i − 1, as where Psych 2021, 3 ∑ 0 v=0 (b 0 v i + m i (θ j , b 1i , . . . b 2k i +1,i )) ≡ 0, and b i = (b 0 1 i , . . . , b 0 C i −1 i , b 1i , b 2i , . . . , b (2k i +1)i ) is a vector of item parameters in a polynomial coefficient parameterization.
In Equations (1) and (2), the k i parameter is an item-specific non-negative integer that controls the maximum degree of the polynomial function of θ. Specifically, the highestorder polynomial equals 2k i + 1, and so k i = 0, 1, and 2 imply linear, cubic, and fifth-degree polynomial functions of θ. Note that this formulation forces an odd value for the highestorder polynomial of θ, which is a necessary (but not sufficient) condition for the polynomial to be a monotonic function of θ.
A key feature of the FMP model is that it reduces to familiar parametric item response models when m is set to be a linear function of θ (i.e., when k i = 0). Specifically, if k i = 0, Equation (1) reduces to the GPCM, and if both k i = 0 and C i = 2 (i.e., scored item responses are dichotomous), then Equation (1) reduces to the two-parameter logistic (2PL) model.
Another key feature of this model not shared by many flexible item response models is that its model parameters are portable [3], meaning that the FMP model can be used to construct item banks, conduct adaptive testing, and score examinees not included in the original sample. To better acquaint the reader with the relationship between the FMP model with k = 0 and other item response models used in popular IRT software packages, several examples of finding FMP parameters from the output of the R packages ltm [14], mirt [15], and TAM [16] are included in Appendix A.
The FMP model requires that m (θ) be a monotonically increasing function of θ. To enforce monotonicity, we may use a transformation of the polynomial coefficient parameters b. In general, consider the polynomial function Equation (3) is a strictly monotonically increasing function of θ if and only if the first derivative of m(θ), ∂m(θ) ∂θ = b 1 + 2b 2 θ + 3b 3 θ 2 + · · · + (2k + 1)θ 2k , is positive at all values of θ. One way to enforce positivity is through the following reparameterization of Equation (4) [3,12,17]: In this parameterization, no boundary constraints are required for ω, α h , or τ h , h = 1, . . . k. Because b 0 c i parameters do not affect the monotonicity of m , no transformation of these parameters is necessary, but we use the symbol ξ ci = b 0 c i to use notation consistently across the different parameterizations. Therefore, b 0 c i = ξ ci for c = 1, . . . , C i − 1, b 1 = exp(ω i ), and the b 2 , . . . , b 2k i +1 parameters are functions of ω i , α i , and τ i calculated using matrix operations described elsewhere [3,11,12,18,19]. Thus, the FMP model is equivalently represented by the polynomial coefficient parameters . . , b (2k i +1)i ) and the Greek-letter parameters (ξ 1i , ξ 2i , . . . , ξ (C i −1)i , ω, α 1i , τ 1i , . . . , α (2k i +1)i , τ (2k i +1)i ) . In both parameterizations, an item with C i response categories and item complexity k i is described by 2k i + C i − 1 item parameters. To better acquaint the reader with the Greek-letter and polynomial coefficient parameterizations, Appendix B includes formulas to calculate the polynomial coefficients from the Greek-letter parameters up to k = 2.

Specifying the FMP Model in flexmet flexmet flexmet
The R package flexmet provides broad functionality for specifying, fitting, and transforming the FMP model, and many of its features (relative to version 1.1) are illustrated in the remainder of this paper. The IRF for the FMP model is specified using the polynomial coefficient parameters b i as described in the previous section. However, the Greek-letter parameterization is also needed when fitting and generating FMP items to ensure mono-tonicity. The flexmet package includes the greek2b and b2greek functions to navigate between the two different parameterizations. To illustrate these functions, consider a six-item test. Three items (items 1, 2, and 3) are defined for binary item responses, and three items are defined for four-category responses (items 4, 5, and 6). Additionally, items 1 and 4 have k i = 0, items 2 and 5 have k i = 1, and items 3 and 6 have k i = 2. The parameters used for this illustration (both the Greek-letter and polynomial coefficient parameterizations) are printed in Table 1. As another example, let us find the b vector for item 6. In this case, the greek2b function requires the user also to specify the alpha and tau arguments, as well as multiple xi parameters corresponding to the different response categories. Note that the xi parameters should be given in order from ξ 1i , . . . , ξ (C i −1)i , and the alpha and tau vectors should also be ordered α 1i , . . . , α k i i and τ 1i , . . . , τ k i i . It is also possible to represent FMP item parameters with different k values and different numbers of items in the same matrix. To do this, the NA symbol may be used to represent higher-order ξ values for items with C i < max(C i ). In addition, specifying α h = 0 and τ h = −∞ will set the corresponding polynomial coefficients (i.e., b 2h,i and b 2h+1,i ) to be equal to 0. For example, to find the matrix of polynomial-coefficient item parameters for the example six-item test, we can bind together calls to greek2b that have xi, alpha, and tau arguments of the same length for each item.  Following from Equation (1), these item parameters can be used to find the probability of responding in each response category as a function of the latent trait θ; that is, the IRF. The irf_fmp function calculates item response probabilities for the FMP model given a scalar or vector of latent trait value(s) theta and a matrix or vector of item parameters bmat in the polynomial coefficient parameterization. It may also be necessary to specify the maxncat function, which gives the maximum number of response categories represented in the matrix (such that the first maxncat-1 columns are interpreted as ξ parameters). The default value of maxncat = 2, so this argument should be specified if bmat includes at least one polytomous item. Calls to irf_fmp will result in a three-dimensional array with θ values in the first dimension, items in the second dimension, and response categories in the third dimension. For example, In the above output (and elsewhere in this tutorial), informative dimension labels have been added to ease readability. This output shows, for example, that the probabilities of responding in categories 0, 1, 2, and 3 to item 6 for a person with θ = −1 equal 0.4118, 0.3800, 0.1849, and 0.0233. Notice that the output also includes some NA values. This is because items 1, 2, and 3 have only 2 response categories, and therefore these subjects can only respond in categories 0 and 1 and categories 2 and 3 are NA.
For dichotomous items, it is common to only find the probability of responding in the higher response category (because the sum of response category probabilities must sum to 1 for each θ, probabilities for category 0 are 1 minus those for category 1). In flexmet, specific category response probabilities can be found by adding the returncat argument. This is illustrated below for item 1: irf_fmp(theta = theta, bmat = bmat[1, ], maxncat = 4, returncat = 1) ## item 1 ## theta = -1 0.3113050 ## theta = 1 0.8549576 Notice that the maxncat argument is set equal to 4 in this example, even though the item is dichotomous. This is because the item parameters are taken from the first row of bmat, and bmat includes columns corresponding to four-category items. In other words, maxncat should be set to one greater than than the number of bmat columns (or one greater than the number of bmat entries, if bmat is a vector) that correspond to ξ parameters, even if no item with maxncat categories is included in the call to bmat. Note that the returncat argument represents the value of the category to output, where categories are labeled starting at 0. Therefore, returning category "1" for dichotomous item 1 returns the probability of a positive item response to item 1. By default, if maxncat = 2, only probabilities for category "1" are returned, and if maxncat > 2, all response category probabilities are returned.
Calls to irf_fmp also can be used to plot IRFs. Figure 1 displays the IRFs for the six example items, and tge code to produce a version of this figure is included in Appendix C. In this figure, response probabilities for category 1 (a positive response) are shown for the 3 dichotomous items, and all response category probabilities are shown for the 3 polytomous items.

Fitting the FMP Model in flexmet flexmet flexmet
The flexmet package includes functionality for specifying, fitting, and manipulating the general FMP model. Notably, flexmet is not the only R package available for fitting the FMP model. For example, the mirt package [15] includes functionality to estimate FMP item parameters using marginal maximum likelihood estimation. However, if any items have k = 0, the mirt package currently (version 1.34 of mirt is current at the time of writing) requires the user to estimate either 2PL or GPCM parameters rather than FMP parameters. Because mirt parameterizes the 2PL and GPCM differently than the Greek-letter FMP parameterization, this may make the mirt package less than ideal for tests with a variety of k i values and for advanced applications of the FMP model (such as scale transformations, as described in a later section). Therefore, the flexmet package includes several ways to estimate FMP model parameters, including built-in methods for fixed-effects and random-effects estimation, as well as a wrapper that uses the mirt package to estimate parameters and return parameter estimates in a standardized format. As illustrated below, flexmet facilitates item parameter estimation for items with any combination of k i values and numbers of response categories, with or without the use of Bayesian priors.
Early applications of the FMP model treated the latent trait as a fixed effect to estimate item parameters [3,11]. In this fixed-effects approach, initial estimates of the θ parameters are treated as known quantities when calculating maximum likelihood estimates of the Greekletter parameters for an individual item. These fixed θ values are called θ surrogates [3,11] and are calculated as the first principal components scores of the full data matrix. In the following code chunk, 1000 true θ values are simulated from a standard normal distribution. Then, the sim_data function in flexmet is used in conjunction with the six-item bmat matrix defined earlier to randomly generate item response data. Finally, θ surrogates are calculated by passing the simulated data to flexmet's get_surrogates function.
set.seed(234) theta <-rnorm(1000) dat <-sim_data(bmat = bmat, theta = theta, maxncat = 4) tsur <-get_surrogates(dat) The tsur object now includes 1000 θ surrogate values calculated from the simulated data. To estimate item parameters for a single item using fixed-effects estimation with θ surrogates, we can use the fmp_1 function in flexmet. As illustrated below, this function requires the user to specify the data vector, the desired k value, and a vector of θ surrogates tsur. Below, this is illustrated for k values of 0, 1, and 2.
i <-2 # choose an item fe0i <-fmp_1(dat[, i], k = 0, tsur = tsur) fe1i <-fmp_1(dat[, i], k = 1, tsur = tsur) fe2i <-fmp_1(dat[, i], k = 2, tsur = tsur) The best choice of k for a given item is typically unknown, and so authors have suggested comparing the item-level Akaike Information Criteria (AIC) value to select the optimal k value [3,12]. We can perform this comparison by extracting the AIC list element from each call to fmp_1. In this example, k = 1 leads to the lowest AIC value. Incidentally, k = 1 is also the data-generating k value for this item (item 2), though this will not always be the case. Note that it is not always desirable to seek the "correct" k i value, but instead it may be preferable to approximate the population curve as closely as possible without overfitting the data. One measure of the similarity of item response functions is the root integrated mean squared error (RIMSE; [6,12]), which is defined here as where P 1 and P 2 represent the two item response functions to compare (not necessarily from the FMP model), and g(θ) indicates a θ distribution to integrate over. Smaller values of RIMSE indicate greater similarity between the two curves. In flexmet, the rimse function can calculate the RIMSE for any combination of b-vectors that represent the same number of response categories (though not illustrated here, rimse can also be used with non-FMP item response functions). In flexmet, g(θ) is standard normal by default but can be modified using the int argument, which expects a matrix with two columns. The first column should include a sequence of quadrature points, and the second column should include the densities of each quadrature point, scaled so that the densities sum to 1. The int_mat function in flexmet facilitates the creation of this matrix. The int_mat function takes a distribution function distr (such as dnorm or dunif), a named list args of the parameters of that distribution, the lower and upper bounds of the quadrature points, lb and ub, and the number of quadrature points, npts. An example of modifying these arguments of the int_mat function is included later in this paper (Section 4.2). The first two arguments to rimse should be two vectors of b-parameters, in either order. In the code below, the true b-parameters for item 2 are listed first. Note that because columns 2 and 3 of bmat represent category intercept parameters for polytomous items (and include NA values for item 2), we omit these from the b-vector when calling rimse. The estimated b-parameters are listed second and are found in the bmat list element of each call to fmp_1. If items are polytomous, the ncat argument should also be specified to indicate the number of response categories. Because the default value of ncat = 2, it is not necessary to include this argument for dichotomous items.
Comparing the RIMSE of the estimated curves versus the data-generating curves for 3 values of k, we see that k = 0 leads to the highest error in estimation, followed by k = 2, and k = 1 most closely traces the population item response function.
It is also possible to estimate fixed-effects item parameters for multiple items in one command using the fmp function with the em = FALSE argument. This method will automatically calculate θ surrogates based on the provided data matrix. In the fmp function, it is possible to specify different k values for different items. Namely, if a scalar is specified for the k argument, the same k value will be used for all items. Otherwise, a vector of k values should be specified, one per item.
In the example below, the fixed-effects FMP model is fit several times. First, all items are fit with k = 0, k = 1, and k = 2. Based on these results, a model with differing k values is specified based on the optimal k value, as indicated by comparing item-level AIC values. The user may notice that the model with k = 2 produces an error that the (item parameter) information matrix cannot be inverted. This is a common problem with high k values. If this error occurs, standard errors are not available for the estimated item parameters; however, the item parameter estimates themselves may be used without concern.
Based on the above results, we see that the lowest AIC value is observed for k = 0, 0, 2, 2, 0, and 1 for the 6 items. We may then choose to fit a new fixed-effects FMP model with these varying k values, as illustrated below. In the item parameter matrix printed above, notice how items with different k values and numbers of response categories are represented. Specifically, NA's are used as placeholders for items 1-3 that include less than the maximum number of response categories. In contrast, if k i is less than the maximum k i value represented in the parameter matrix (as is the case for items 1, 2, 5, and 6), then the higher-order b parameters are set to 0.
In addition to fixed-effects estimation, flexmet also provides functionality for randomeffects estimation using marginal maximum likelihood estimation via the expectationmaximization (EM) algorithm [20]. The flexmet package includes both an inbuilt algorithm for estimating item parameters (using the fmp function with option em = TRUE) and a wrapper to the mirt [15] package (using the fmp function with option em = "mirt"). The mirt algorithm is currently faster and more reliable than the inbuilt algorithm, and so this option is used for illustration. Note that the mirt package must be installed in order to use this option. Using the same pattern of k values found with the fixed-effects model, we find the following results: At a glance, comparing the bmat outputs of fe_mixed and re_mixed indicates that while some parameter estimates are similar across the two estimation methods, others are quite different. Notably, for highly parameterized models such as the FMP model, very different sets of parameters can trace similar curves. For this reason, it is useful to plot curves or to apply overall summary measures such as the RIMSE when comparing estimated curves to the true curve or to each other. These RIMSE values are calculated for fe_mixed and re_mixed after illustrating one final estimation method.
Another useful estimation option available for both the fmp_1 and fmp functions is the use of Bayesian priors. Particularly, for higher values of k, the FMP model may become computationally unstable, which can be somewhat alleviated through the use of priors [12]. Because the Greek-letter parameterization of the FMP is used for parameter estimation, priors are placed on the Greek-letter parameters rather than on the more readily interpretable b parameters. When choosing priors, it may be helpful to note that (ξ 1i , . . . ξ (C i −1)i ) = (b 0 1 i , . . . b 0 C i −1 i ) and that ω = ln b 1 . In addition, higher-order b parameters will equal zero if the corresponding α = 0 and τ = −∞. Prior predictive simulation may help the user to select appropriate priors. For example, the following code produces a prior predictive simulation of 20 item response curves generated from the following distributions: ξ ∼ N(0, 1), ω ∼ N(−0.5, 1), α ∼ N(0, 0.5), and τ ∼ N(−3, 0.5). The plot produced by this code is shown in Figure 2. This choice of priors appears to allow for a variety of shapes and locations of the FMP curves, and so we continue to illustrate Bayesian estimation with these priors.
Priors can be added to models fit using flexmet by specifying the prior argument to the fmp or fmp_1 function. As of version 1.1 of flexmet, only normally distributed priors are available, and the same priors are applied to all model parameters of a given class (i.e., for all items in the data set and for all instances of that parameter type within an item). To specify priors, a list with named elements xi, omega, alpha, and tau should be passed to the prior argument. For example, to specify a standard normal prior for all ξs, we may write prior = list(xi = c("norm", 0, 1)). The code below illustrates Bayesian estimation with the em = "mirt" option.
re_priors <-fmp(dat, k = c(0, 1, 2, 0, 1, 2), em = "mirt", prior = list(xi = c("norm", 0, 1), omega = c("norm", -0.5, 1), alpha = c("norm", 0, 0.5), tau = c("norm", -3, 0.5))) Finally, we may compare the accuracy of each of the three illustrated estimation methods using the rimse function. The code below does this for each of the six simulated items. For each item, random-effects estimation tends to lead to somewhat more accurate curves than fixed-effects estimation, and the use of priors improves parameter estimation accuracy in some, but not all, cases. Finally, flexmet includes the th_est_ml and th_est_eap functions for maximum likelihood (ML) and expected a posteriori (EAP, [21]) person parameter estimation. Both of these functions require a data matrix dat and a matrix of b-parameters bmat. If at least one item is polytomous, the maxncat argument should also be specified. By default, the th_est_eap function uses a standard normal prior with 33 quadrature points, and this may be modified using the int argument in conjunction with int_mat. A call to either trait estimation function will result in a matrix with two columns: one for the θ parameter estimates and one for the standard errors (for ML) or posterior standard deviations (for EAP). The code below illustrates this procedure for the example data and the re_priors estimated item parameters. Aside from the increased flexibility in IRF shapes afforded by the FMP model, another potential use of the FMP model is to transform item response models linearly or nonlinearly [18,19]. Specifically, suppose that two scalings of the latent trait, θ and θ , are related by a monotonic polynomial function of degree 2k θ + 1 such that where k θ is a non-negative integer. Then, an FMP model defined on the scale of θ may be transformed to the scale of θ using matrix operations described elsewhere [18,19]. For a given k θ and an item with a given k i (i.e., item complexity defined on the scale of θ), the item complexity on the scale of θ , k i , equals Therefore, item response functions on the scale of θ will necessarily have larger item complexities than response functions on the scale of θ. Two applications of nonlinearly transforming the FMP model will be illustrated below. In the first application, item parameters are estimated separately for two groups with different latent trait distributions. Because the latent trait is often assumed to follow a standard normal distribution for model identification, the fitted item response models will be on different, nonlinearly related, scales. Linear and nonlinear linking transformations will be estimated and compared. In the second application, an item response model will be transformed such that the latent trait scale is on a more interpretable percentage-correct score metric.

Item Parameter Linking with the FMP Model
If item parameters for the same items are estimated from two different samples, item parameter linking may be used to put the parameters on the same scale. The need for item parameter linking arises from differences in model identification constraints [22]. For the most common parametric item response models, such as the 2PL and GPCM, the chosen functional form of the item response model identifies the scale of the latent trait up to a linear transformation. In addition, models may be identified by assuming that the latent trait is standardized, or that the latent trait follows a standard normal distribution (as is the case for the fixed-effects FMP estimation illustrated above). Therefore, if there are any differences in the location, variance, or shape of the latent trait distribution between groups, linking may be necessary to put the two sets of item parameters on more similar scales. To illustrate how item parameters on different scales may be linked with the FMP model, an illustrative simulation is presented next.
To simulate data, a population set of FMP item parameters was first generated for 20 items each with two response categories and k = 0 using the sim_bmat function in flexmet. With the sim_bmat function, the user can generate items with varying k values and numbers of response categories by specifying vectors for the k and ncat functions. The population item parameters were then used to generate data for two groups of 5000 examinees. The population latent trait values for the first group were first drawn from a standard normal distribution, then transformed by taking −2 plus e to the power of the standard normal scores divided by 2. The population latent trait values for the second group were drawn from a standard normal distribution. From this design, because the second group's θ values were transformed to the scale of the first group's scores (which were identified by a combination of the shape of the fitted model and the assumption of a standard normal distribution), the population relationship between the two scales equals set.seed(123) bmat <-sim_bmat(n_items = 20, k = 0, ncat = 2)$bmat theta1 <-exp(rnorm(5000) / 2) -2 theta2 <-rnorm(5000) After generating population θ values, data were then generated separately for each group. To select the k values that best represent these data, each item for each group was first fit to a series of FMP models using the fmp_1 function based on theta surrogates. Specifically, each item was first fit with k = 0 and k = 1. If the AIC for k = 0 was smaller than for k = 1, then the final model was fit with k = 0. If not, models with sequentially higher k values were fit until the model with the smaller k value had a smaller AIC than the model with the higher k value. The code to implement this procedure is included in Appendix D and resulted in a mixture of k = 0 and k = 1 for both groups. Once the empirically selected k values for each item and group were chosen, item parameters were estimated again with fmp and the em = "mirt" option. For group 1, the argument technical = list(NCYCLES = 1500) was passed to mirt because more than 500 cycles of the EM algorithm (the default in mirt) were required for this model to converge. data1 <-sim_data(bmat = bmat, theta = theta1) data2 <-sim_data(bmat = bmat, theta = theta2) k1 <-c(0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1) k2 <-c (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) d1_2pl <-fmp(data1, k = k1, em = "mirt", technical = list(NCYCLES = 1500)) d2_2pl <-fmp(data2, k = k2, em = "mirt") The flexmet package includes two functions for linking item parameters: hb_link for linking based on the Haebara method (HB) [23] and sl_link for linking based on the Stocking-Lord method (SL) [24], and both functions take the same arguments. The bmat1 and bmat2 arguments give the matrices of item parameters in the polynomial coefficient parameterization. These functions currently assume that both matrices include the same set of common items in the same order. The bmat2 parameters are on the θ metric and are transformed to the θ metric of the bmat1 parameters. The polynomial degree of the estimated transformation is specified by the k_theta argument (monotonicity of the estimated linking transformation is ensured using the same strategy as is used for item parameters).
Comparing these values resulting from the HB or SL method with different k θ can help inform which k θ yields the most appropriate latent trait transformations, where smaller values indicate smaller differences between the estimated item/test response functions on the two scales. Although criteria values from calls to hb_link should not be compared to values from calls to sl_link, comparing values with different k θ suggests that k θ = 1 leads to the closest match between the two sets of item parameters under both the SL and HB methods.
Calls to hb_link and sl_link also output the estimated vector of polynomial coefficients in the tvec list element. As shown below, although these values are similar for the HB and SL methods, there are some differences, particularly for higher-order polynomial coefficients. The tvec output is useful for understanding the estimated relationship between the θ and θ scales. Recall from Equation (7) that θ is a polynomial function of θ , and Equation (9) gives the population relationship between θ and θ . In Figure 3, the SL linking transformations are compared to the true linking transformation. The code to reproduce a version of this figure is included in Appendix E and makes use of flexmet's inv_poly function to calculate the θ value that corresponds to any particular value of θ . Although in non-simulation applications, the true latent trait transformation will not be known, plotting may still be a useful tool to inspect the similarity of different transformations. To illustrate the meaning of the linking transformation, consider a θ value of 0.5. In the code below, this θ value is transformed to a θ value using the inv_poly function, which takes a scalar value (here, each θ) and returns the θ value that combines with the polynomial coefficient vector coefs (see Equation (7)) to produce θ. As shown below for the SL transformation with k θ = 1, the item response probabilities for theta combined with the originally estimated bmat are identical to the transformed thetastar values combined with the transformed item parameters (which are found in the bmat list element of the linking output). This equality is demonstrated for the first three items of the simulated test.

Transforming the FMP Model to a User-Defined Scale
Another potential application of FMP model transformations is to specify an item response model on a latent trait metric other than θ. In the following illustration, flexmet functions are used to transform a set of 3 parameter logistic (3PL) model item parameters from the typical θ metric to a percentage-correct (PC) metric where the lowest score is 0, the highest score is 100, and scores roughly correspond to the percentage of items answered correctly. Although this type of model transformation is illustrated using a PC metric, any monotonic transformation is possible, so long as the user can identify a suitable polynomial approximation to the desired transformation. Transforming the entire item response model to a more interpretable metric allows for trait estimates, item information, standard errors, and other quantities to be calculated directly on the reported score metric.
The item parameters used for the following illustration are taken from the tcals data set found in version 3.16 of the catR package [25] (and also included in Appendix F). Note that these parameters correspond to the following representation of the 3PL model: where a i indicates the item discrimination parameter, b i indicates the item difficulty parameter, and c i indicates the guessing, or lower asymptote, parameter. The star notation is used for these item parameters only to avoid confusion with previously defined notation.
Note that the 3PL presented in Equation (10) is not a special case of the FMP model as presented in Equation (1). However, for dichotomous item responses, an FMP model may be specified that includes upper (d i ) and/or lower (c i ) asymptotes on the item response functions [26] as follows: If not otherwise specified, we may assume that all d i = 1 and all c i = 0. In many flexmet functions, upper and lower asymptote parameters may be specified using the cvec and dvec arguments (note that flexmet currently does not allow these asymptote parameters to be estimated with the fmp or fmp_1 functions; however, most other flexmet functions accommodate asymptote parameters). These asymptote parameters are unaffected by transformations of the latent trait metric. In the code below, the tcals parameters are read in first. Then, an object fmp_pars is defined that transforms the 3PL a i and b i parameters to the polynomial coefficients required by the FMP model. Then, a sequence of θ values is generated between −4 and 4, and item response probabilities are calculated for each item and θ value. The expected percentage-correct (PC) scores are then calculated by summing the item response probabilities for each θ value and multiplying them by 100/85 (where 85 is the number of test items). We then regress the θ values on the PC scores using monotonic polynomial regression [27] as implemented in version 0.3-10 of the Monopoly package [28] for R. Several regressions are fit using k values (which are used as k θ values) from 0 to 5, where, as before, 2k + 1 indicates the maximum polynomial degree. Finally, the average regression residual is printed for each k value. The average regression residuals indicate that higher k values lead to better approximations of the desired curve. However, choosing a higher k value will lead to many higher-order coefficients for the transformed item parameters, so we may choose to use the smallest k value that yields an acceptable level of accuracy. For instance, if we decide that an average regression residual of less than 0.1 is acceptable, then we may choose to move forward with the k = 1 transformation. The estimated regression coefficients then become the polynomial coefficients with which to transform fmp_pars (i.e., the t coefficients for Equation (7) with k θ = 1). We can implement this transformation by applying the transform_b function in flexmet to each row of fmp_pars, specifying the θ transformation coefficients in the tvec argument, as illustrated below. Note again that the c i parameters remain unchanged during FMP parameter transformations, and so the argument cvec = tcals$c should be passed to any flexmet function that makes use of these parameters. For example, we may compare two methods of calculating trait estimates on the transformed θ /PC metric: first, estimating person parameters on the θ metric and then applying the polynomial transformation; and estimating person parameters directly on the θ /PC metric. In this example, we use a standard normal prior to estimate EAPs on the θ metric, and a N(80, 10) prior to estimate EAPs directly on the θ /PC metric. This prior is specified by feeding the output of a call to the int_mat function to the int argument. In the example code shown below, both priors are specified with 89 quadrature points, the prior on θ is bounded between 0 and 4, and the prior on θ is bounded between 0 and 100. Finally, the EAPs that were estimated on the θ metric are transformed to the θ /PC metric using the inv_poly function.
set.seed(987) dat <-sim_data(bmat = fmp_pars, theta = rep(0, 1000), maxncat = 2, cvec = tcals$c) theta_ests <-th_est_eap(dat = dat, bmat = fmp_pars, maxncat = 2, cvec = tcals$c, int = int_mat(dnorm, list(mean = 0, sd = 1), lb = -4, ub = 4, npts = 89)) tstar_ests <-th_est_eap(dat = dat, bmat = transformed_bmat, maxncat = 2, cvec = tcals$c, int = int_mat(dnorm, list(mean = 80, sd = 10), lb = 0, ub = 100, npts = 89)) theta_ests_transformed Comparing the distribution of trait estimates calculated with both approaches, we see that the scores are relatively similar (the small difference in estimated distributions may be due to the use of different prior distributions). However, a major advantage of estimating parameters directly on the θ /PC metric is that the estimated standard errors are on the appropriate metric. That is, nonlinear monotonic transformations of the latent trait metric can lead to unexpected distortions of the information matrices [19,29]. An illustration of the effects of this transformation is illustrated in Figure 4, and the code to reproduce a version of this figure is included in Appendix F. This code makes use of the iif_fmp function in flexmet, which has similar arguments to the irf_fmp function and yields item information for any FMP items and θ values. In the upper panels of Figure 4, three example IRFs (taken from the 85 tcals items) are displayed on the θ metric and after being transformed to the θ /PC metric. Notice that although the shape of each IRF changes, the relative relationships among the three curves do not change. In the bottom panels, the test information function (i.e., the sum of item-level information functions) is illustrated for both the θ and θ metrics. Whereas test information on the θ metric is unimodal and centered just below θ = 0, the θ metric is bimodal with peaks around θ = 35 and θ = 90. This result implies that the items that the trait regions measured most accurately on the θ metric are not necessarily those measured most accurately on the θ metric. This result also highlights the importance of calculating information directly on the metric on which scores are reported.

Summary
This paper provides an introduction to the specification and use of the FMP model through the flexmet package for R. Version 1.1 of flexmet includes functionality to estimate FMP item parameters for both dichotomous and polytomous items, as well as methods to transform FMP item parameters, generate FMP item parameters and data, and several other methods that are particularly useful when working with the FMP model. The provided examples also illustrate some scenarios in which it may be desirable to use the FMP model, including fitting IRFs that can take on more flexible shapes than allowed by traditional models and transforming item response models to user-defined metrics (e.g., a sum score or proportion-correct metric). Moreover, it is hoped that broader use of the flexmet package will lead to further development of the flexmet package, including planned features such as more choices of prior distributions for Bayesian estimation and the estimation of upper and lower asymptote parameters. Although not every feature of the current flexmet package is illustrated in the examples presented above, we hope that this tutorial provides the reader with an accessible introduction to the FMP model and the tools to use the FMP model with the flexmet package.
Funding: This research received no external funding.

Informed Consent Statement: Not applicable.
Data Availability Statement: All information required to fully reproduce the results in this paper is included in text.
Acknowledgments: Many thanks to Xing Chen for his assistance testing code and suggesting improvements and to three anonymous reviewers for their suggestions in improving the readability and accessibility of this tutorial.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Relationship of the FMP Model (k = 0) to Equivalent Models in Other Software Packages
The purpose of this appendix is to illustrate how to transform 2PL and GPCM item parameters from various software packages to the item parameters used by the FMP model with k = 0. These transformations are useful for fully understanding the FMP model and some of its applications (e.g., see Section 4.2). Although these examples are in the context of specific R packages, they are broadly representative of the most common parameterizations of the 2PM and GPCM, and well as variations of these models such as the 3PM (see also Section 4.2) and the partial credit model.
In the following examples, 2PL or GPCM data are generated using the flexmet package (item parameter and data generation are described in Section 3). The data are then fit to the 2PL or GPCM using the ltm version 1.1-1 [14], mirt version 1.34 [15], and TAM version 3.7-16 [16] packages, the estimated item parameters are transformed to FMP parameters. The equivalence of the different parameterizations is then verified by comparing category response probabilities using flexmet's irf_fmp function (introduced in Section 2) and each package's inbuilt function for calculating response probabilities. Throughout, packages are called explicitly using R's :: syntax to avoid potential package conflicts and to clarify which commands belong to which packages. Note also that this appendix uses flexmet functions and conventions that are more fully described within the main tutorial.

Appendix A.1. Two-Parameter Model
In Equation (1), the portion of the numerator within the exp() function when k = 0 and C i = 2 (i.e., the conditions which make the FMP model equivalent to the 2PL) equals The following code demonstrates how to find the FMP b 0i and b 1i parameters from other parameterizations of the 2PL. To begin, we first generate 2PL data.
set.seed(987) population_pars <-flexmet::sim_bmat(n_items = 10, k = 0, ncat = 2)$bmat dat <-flexmet::sim_data(bmat = population_pars, theta = rnorm(1000)) The ltm package with option IRT.param = FALSE and the mirt package both parameterize the 2PL in the same way as the FMP model. This is illustrated first for ltm. library(ltm) # must be loaded in for the ltm() function to work ltm_fit1 <-ltm::ltm(dat~z1, IRT.param = FALSE) ltm_fit1_FMP <-ltm::coef.ltm(ltm_fit1) The first column of ltm_fit1_FMP is labeled (Intercept) and is equivalent to b 0i . The second column of ltm_fit1_FMP is labeled z1 and is equivalent to b 1i . This is verified by comparing predicted response probabilities. When fitting the 2PL in mirt, the same parameterization is used as for the FMP model, but mirt will print the b i1 parameters (labeled a in mirt) before the b 0i parameters (labeled d inmirt), so the columns must be reordered. A common way for the 2PL to be parameterized is in terms of the difficulty parameter di f f i and a discrimination parameter disc i . In this parameterization, the expression inside the exp() (see Equation (A1)) is Therefore, the FMP b 01 = −di f f i disc i and the FMP b 1i = disc i . This difficultydiscrimination parameterization is used by the ltm package with option IRT.param = TRUE, as illustrated next.
# library(ltm) # ltm package must be loaded to make the following run ltm_fit2 <-ltm::ltm(dat~z1, IRT.param = TRUE) head(ltm::coef.ltm(ltm_fit2)) Here, we see that ltm stores the difficulty parameters in the first column and the discrimination parameters in the second column. Next, we transform these parameters to b 0i and b 1i and verify that these parameters yield the same FMP response probabilities as ltm's inbuilt function. Finally, the TAM package uses a slightly different expression for the 2PL than the previously explored packages. Namely, the expression within the numerator exp() (see Equation (A1)) is Therefore, the TAM 2PL model output is related to the FMP model in that b 0i = −ξ i and b 1i = B i . Again, we can verify that this transformation results in the same predicted response probabilities (up to rounding error). Note that for the TAM package, the function that calculates response probabilities is located in the CDM package and appears to only be available for a certain grid of θ values. The 9th, 11th, and 13th of these θ values are used below for illustration. In Equation (1), the portion of the numerator within the exp() function when k = 0 and C i > 2 (i.e., the conditions that make the FMP model equivalent to the GPCM) equals The following code demonstrates how to find the FMP b 0 c i and b 1i from other parameterizations of the GPCM. To begin, we first generate GPCM data with five response categories.
rm(list = ls()) # clear environment set.seed(876) population_pars <-flexmet::sim_bmat(n_items = 10, k = 0, ncat = 5)$bmat dat <-flexmet::sim_data(bmat = population_pars, theta = rnorm(1000), maxncat = 5) The ltm package with the IRT.param = FALSE option produces GPCM item parameter estimates that are already in the same format that the flexmet package uses for the FMP model. library(ltm) # must be loaded in for the gpcm() function to run ltm_fit1 <-ltm::gpcm(data = dat, IRT.param = FALSE) head(ltm::coef.gpcm(ltm_fit1)) Here, columns Catgr.1 through Catgr.4 include parameters b 0 1 i through b 0 4 i , and the Dscrmn column includes the b 1i parameters. We can verify that these are equivalent by comparing the response probabilities calculated from each package. Here (and through the rest of this section), any minor differences in predicted probabilities can be attributed to rounding.  Next, the mirt package with option itemtype = "gpcmIRT" is very similar to the ltm package with option IRT.param = TRUE. The expression within the numerator exp() (see The mirt package also includes the itemtype = "gpcm" option for fitting the GPCM. Here, the expression in the numerator exp() (see Equation (A4)) is ca i θ + d i (note the lack of the summation operator), where c is the response category (e.g., 1, 2, 3, or 4). Because this expression is equivalent to the cumulative sum in the FMP model expression, some of the FMP-equivalent parameters must be found through subtraction. Namely, the FMP b 1i = a i , b 0 1 i = d 1 , and b 0 c i = d c − d c−1 for c > 1. Below, we fit the model and inspect the outputted model parameters.
mirt_fit2 <-mirt::mirt(as.data.frame(dat), model = 1, itemtype = "gpcm") mirt_pars2 <-mirt::coef(mirt_fit2, simplify = TRUE)$items head ( Note in the above that the columns beginning with B.Cat are integer multiples of each other; that is, the values in B.Cat2.Dim1 are double those in B.Cat1.Dim1, those in B.Cat3.Dim1 are triple those in B.Cat1.Dim1, and so forth. Therefore, we only need the B.Cat1.Dim1 column, which is equivalent to the FMP b 1i . The columns beginning with AXsi are equivalent to the negative d i parameters in the previous mirt example (i.e., mirt with option itemtype = "gpcm"). As such, subtraction could be used to find the FMP parameters in the same manner as the previous mirt example. However, it is simpler to use the estimated xsi parameters in TAM, printed below. Notice that for item 1, the xsi for category 1 is the same as AXsi_.Cat1, the xsi for category 2 equals AXsi_.Cat2 -AXsi_.Cat1, and so forth. Therefore, the negative xsi parameters are equal to the FMP b 0 c i parameters. The following code builds a matrix of FMP parameters from the TAM output and verifies that both methods produce the same category response probabilities.