Open Access Article

Peer-Review Record

Extending Applications of Generalizability Theory-Based Bifactor Model Designs

Psych 2023, 5(2), 545-575; https://doi.org/10.3390/psych5020036

by Walter P. Vispoel ¹,*, Hyeryung Lee ¹, Tingting Chen ¹ and Hyeri Hong ²

Reviewer 1: Leah M. Feuerstahler

Reviewer 2: Anonymous

Submission received: 29 April 2023 / Revised: 25 May 2023 / Accepted: 6 June 2023 / Published: 13 June 2023 / Corrected: 28 May 2024

(This article belongs to the Special Issue Feature Papers in Psych)

Round 1

Reviewer 1 Report

Overall, this paper is very well-written and provides a thorough but accessible introduction to GT as well as an extension to a bifactor structure. Although the numerous tables of equations and results could be considered excessive, I believe these can serve as an effective reference for researchers who wish to learn the methods described in this paper by reproducing your results (especially in concert with the supplementary file). My suggestions are minor, but I believe that these would increase the effectiveness of your work.

The discussion could include more about the limitations and assumptions of the GT approach. I believe this is especially important given the statement on p. 7 "GT requires no explicit assumptions about the content of the universe or statistical properties of scores", which could be easily misinterpreted to mean that this method could be broadly applied without regard for test development best practices. As an example of what I mean, some D studies assume that it is feasible and possible to write parallel items to extend the test length. There are certainly other assumptions/considerations that the reader should be reminded of.

Could you also include some discussion of whether items should be treated as continuous and/or categorical, and what considerations arise from using a linear vs. nonlinear model in lavaan?

The supplement would be most effective if readers had access to the data set that you analyze. It's not clear to me whether "bfi2_open_389.csv" is publicly available. If it is not possible to make that file available, I would recommend that the researchers include a fake (i.e., simulated) data file with similar properties as the BFI example, so that readers can exactly reproduce the results presented in the supplement.

Minor check for spelling and correct word usage - please also carefully check the supplement.

Author Response

We thank the reviewer for taking the time to evaluate our manuscript and supplement and have done our best to address or respond to his or her suggestions. Below, we reproduce the reviewer’s comments and our responses to them.

Comments and Suggestions for Authors

We thank the reviewer for these supportive comments.

We appreciate this suggestion and have attended to it by changing the highlighted sentences on page 10 as shown below.

2.3. Confidence Intervals

Applications of GT are based on three primary assumptions: (1) the universe(s) of generalization is/are clearly defined, (2) facet conditions are experimentally independent, and (3) scores are expressed on equal interval metrics [1] (p. 145). Consistent with ANOVA procedures that form the foundation for GT analyses, facet conditions themselves (items, occasions, raters, etc.) are treated as unordered [1]. As previously noted, facet conditions also are considered randomly sampled from or exchangeable with others within the broader universe(s) from which they are drawn. However, because no explicit assumptions are made about the content of the universe or statistical properties of scores, fit indices for the overall GT-SEM are not required. Instead, Monte Carlo confidence intervals can be built around estimates of variance components, G coefficients, D coefficients, and proportions of measurement error to evaluate their trustworthiness.
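
The Monte Carlo interval idea mentioned in the passage above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code (their analyses use R and lavaan): it assumes that estimates are approximately multivariate normal around the point estimates, and the function names, variance component values, and sampling covariance matrix are all hypothetical. Parameter vectors are drawn from that normal distribution, the index of interest is recomputed for each draw, and percentiles of the simulated values form the interval.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(cov)
    factor = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(factor[i][k] * factor[j][k] for k in range(j))
            if i == j:
                factor[i][j] = math.sqrt(cov[i][i] - s)
            else:
                factor[i][j] = (cov[i][j] - s) / factor[j][j]
    return factor


def monte_carlo_ci(estimates, cov, func, draws=20000, level=0.95, seed=1):
    """Percentile interval for func(parameters) from simulated parameter draws."""
    rng = random.Random(seed)
    factor = cholesky(cov)
    n = len(estimates)
    values = []
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated draw: estimates + factor * z
        theta = [estimates[i] + sum(factor[i][k] * z[k] for k in range(i + 1))
                 for i in range(n)]
        values.append(func(theta))
    values.sort()
    alpha = (1.0 - level) / 2.0
    lower = values[int(math.floor(alpha * (draws - 1)))]
    upper = values[int(math.ceil((1.0 - alpha) * (draws - 1)))]
    return lower, upper


def g_coefficient(theta):
    """Ratio of person variance to person-plus-relative-error variance."""
    person_var, rel_error_var = theta
    return person_var / (person_var + rel_error_var)


# Hypothetical point estimates and sampling covariance matrix (not from the article).
estimates = [0.50, 0.10]
sampling_cov = [[0.002, 0.0], [0.0, 0.0001]]
lower, upper = monte_carlo_ci(estimates, sampling_cov, g_coefficient)
print(f"G = {g_coefficient(estimates):.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

In practice, the point estimates and their covariance matrix would come from the fitted SEM rather than being typed in by hand.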

Could you also include some discussion of whether items should be treated as continuous and/or categorical, and what considerations arise from using a linear vs. nonlinear model in lavaan?

Hopefully, the modifications just described will satisfy the reviewer. As noted above, scores in traditional GT analyses are presumed to be on equal interval scales. The ANOVA procedures that form the foundation for GT analyses treat facet conditions as being unordered, thereby addressing the question of linear versus non-linear relations between facet conditions and scores. SEMs used in GT analyses simply serve as computational tools for deriving variance components that are traditionally obtained using ANOVA-based procedures. We also mention at the end of the discussion section that the analytical procedures demonstrated in the article can be extended to control for scale coarseness effects common with binary and ordinal data, and direct readers to articles in which these procedures have been used.

To address the reviewer’s suggestion, we included a dataset with 200 cases randomly selected from the original dataset to allow readers to apply the code to a new dataset that should approximate the original results.

Comments on the Quality of English Language

Minor check for spelling and correct word usage - please also carefully check the supplement.

We did so, as suggested, to address any such issues.

We again thank the reviewer for his or her helpful feedback and hope that all concerns have been adequately addressed.

Respectfully yours,

The authors

Reviewer 2 Report

This is an interesting manuscript on bi-factor model based generalizability theory. The manuscript is well written, clear, and the real data illustration is interesting. However, from the manuscript it is unclear what the novel contribution is of this study. That is, Vispoel and colleagues already introduced the present idea thoroughly in different publications (2022a, 2022b, 2023a, 2023b). To motivate the present study, the present authors only mention that:

“In this article, we extend the work of Vispoel and colleagues into GT-based bifactor models to allow for derivation of consistency and agreement indices reflecting both relative and absolute differences in scores and demonstrate explicitly how changes in measurement procedures might affect bifactor model-based indices of generalizability, dependability, measurement error, scale viability, and subscale added value.”

However, the work by Vispoel is not extended. The formulas presented are just straightforward to obtain from the SEM specification given in earlier publications. Thus, a researcher probably won't need the results from the present study to be able to calculate indices from this paper (it is just straightforward SEM). Therefore, I think the manuscript will have more impact as a tutorial paper. Or, the authors should convincingly argue what the key challenges are that warrant a new study.

Author Response

We thank the reviewer for taking the time to evaluate our manuscript and supplements and have done our best to address or respond to his or her suggestions. Below, we reproduce the reviewer’s comments and our responses to them.

Comments and Suggestions for Authors

Actually, the work is extended to include derivation of indices for absolute error within bifactor models, and previous studies did not include prophecy formulas.

We thank the reviewer for these comments but must respectfully disagree with the conclusion that the present study does not extend the work of Vispoel and colleagues. None of their papers about GT-bifactor models shown below included derivation of variance components for absolute error, global dependability coefficients, or cut-score specific dependability coefficients. Their studies also did not include the comprehensive sets of equations for bifactor model-based indices presented in Tables 1, 2, 4, and 5. In previous treatments of GT (including Brennan, 2001), prophecy formulas are limited to G coefficients, global D coefficients, overall absolute error, and overall relative error. Formulas presented in prior research also have been applied almost exclusively to GT univariate rather than bifactor analyses. In the current study, we extended prophecy formulas to include individual sources of measurement error, omega total coefficients, omega hierarchical coefficients, and value-added indices, some of which appear in the supplemental materials. In fact, the reviewer’s comments that we highlighted above acknowledge these very extensions. In contrast to the other Vispoel et al. studies mentioned, we also included confidence intervals for all key parameters, provided code for constructing them, and presented partitioning of variance in relation to both relative and absolute error. Finally, the equations used here in the tables are more flexible than, and different from, those presented in the previous articles about GT-bifactor analyses. Many of these same points are reiterated in the abstract as well as at the beginning and end of the discussion section as shown below. We also are confused about the basis of the reviewer’s conclusions given that some of the cited studies have not been published. A more careful reading of the current paper and the first two cited below should make their distinctiveness readily apparent.

Abstract: In recent years, researchers have described how to analyze generalizability theory (GT) based univariate, multivariate, and bifactor designs using structural equation models. However, within GT studies of bifactor models, variance components have been limited to those reflecting relative differences in scores for norm-referencing purposes, with only limited guidance provided for estimating key indices when making changes to measurement procedures. In this article, we demonstrate how to derive variance components for multi-facet GT-based bifactor model designs that represent both relative and absolute differences in scores for norm- or criterion-referencing purposes using scores from selected scales within the recently expanded form of the Big Five Inventory (BFI-2). We further develop and apply prophecy formulas for determining how changes in numbers of items, numbers of occasions, and universes of generalization affect a wide variety of key indices instrumental in determining the best ways to change measurement procedures for specific purposes. These indices include coefficients representing score generalizability and dependability; scale viability and added value; and proportions of observed score variance attributable to general factor effects, group factor effects, and individual sources of measurement error. To enable readers to apply these techniques, we provide detailed formulas, code in R, and sample data for conducting all demonstrated analyses within this article and extended online Supplemental Material.

Near beginning of Discussion section

Our purpose in the analyses described in this article was to expand upon the work of Vispoel and colleagues to derive indices of score dependability, in addition to generalizability, and demonstrate techniques for estimating effects of altering measurement procedures on a wide variety of key indices that included G coefficients, D coefficients, omega total and hierarchical coefficients, proportions of measurement error, and indices of scale viability and added value. These techniques were further expanded to produce confidence intervals surrounding estimates of those parameters to gauge their trustworthiness. Collectively, the results underscored the practical value of the demonstrated techniques for evaluating and improving measurement procedures and their potential for becoming standard techniques routinely applied to GT-bifactor designs.

Summary and Future Extensions

Our goals in this article were to demonstrate how GT-bifactor designs can be extended to derive variance components needed to estimate dependability coefficients when using scores for criterion-referenced purposes and to determine how key indices are affected by changes made to measurement procedures. Estimated indices included G coefficients; D coefficients; proportions of observed score variance accounted for by general factor, group factor, and measurement error effects; common to unique explained variance ratios; and subscale value-added ratios. We also built Monte Carlo-based confidence intervals around those estimates to evaluate their trustworthiness.
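
The prophecy logic referred to in this exchange (how indices change when numbers of items or occasions change) can be illustrated with the standard univariate persons × items × occasions random-effects design. The sketch below is not the bifactor formulation from the article's Tables 1, 2, 4, and 5. It uses only the textbook expressions for relative error, absolute error, the G coefficient, and the global D coefficient, and the variance component values and the function name `d_study` are hypothetical.

```python
def d_study(vc, n_items, n_occasions):
    """G and global D coefficients for a persons x items x occasions design.

    vc holds variance components keyed by effect: p, i, o, pi, po, io, pio_e.
    """
    # Relative error: only interactions involving persons affect rank ordering.
    rel_error = (vc["pi"] / n_items
                 + vc["po"] / n_occasions
                 + vc["pio_e"] / (n_items * n_occasions))
    # Absolute error adds the item, occasion, and item-by-occasion effects.
    abs_error = (rel_error
                 + vc["i"] / n_items
                 + vc["o"] / n_occasions
                 + vc["io"] / (n_items * n_occasions))
    return {
        "rel_error": rel_error,
        "abs_error": abs_error,
        "g": vc["p"] / (vc["p"] + rel_error),
        "phi": vc["p"] / (vc["p"] + abs_error),
    }


# Hypothetical variance components (illustrative values only).
vc = {"p": 0.50, "i": 0.05, "o": 0.01, "pi": 0.20,
      "po": 0.05, "io": 0.01, "pio_e": 0.30}

for n_items in (4, 8, 12):
    result = d_study(vc, n_items, n_occasions=2)
    print(f"items={n_items:2d}  G={result['g']:.3f}  D={result['phi']:.3f}")
```

Because absolute error contains every relative error term plus additional non-negative terms, the D coefficient can never exceed the G coefficient for the same design.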

Vispoel, W.P.; Lee, H.; Xu, G.; Hong, H. Integrating bifactor models into a generalizability theory structural equation modeling framework. J. Exp. Educ. 2022a. Advance online publication. https://doi.org/10.1080/00220973.2022.2092833

Vispoel, W.P.; Lee, H.; Xu, G.; Hong, H. Expanding bifactor models of psychological traits to account for multiple sources of measurement error. Psychol. Assessment 2022b, 32(12), 1093-1111. https://doi.org/10.1037/pas0001170

Vispoel, W.P.; Lee, H.; Hong, H.; Chen, T. Analyzing and comparing univariate, multivariate, and bifactor generalizability designs for hierarchically structured personality traits. J. Pers. Assess. 2023. (submitted).

Submission Date: 29 April 2023

Date of this review: 22 May 2023 11:36:04

We again thank the reviewer for his or her helpful feedback and hope that all concerns have been adequately addressed.

Respectfully yours,

The authors

Round 2

Reviewer 2 Report

The authors convincingly pointed out in their revised manuscript how the current study adds to the existing literature. Therefore, my main concerns are taken away, and I think the manuscript is ready for publication. I thank the authors for being responsive to my comments.
