Article
Peer-Review Record

Modelling Norm Scores with the cNORM Package in R

Psych 2021, 3(3), 501-521; https://doi.org/10.3390/psych3030033
by Sebastian Gary 1, Wolfgang Lenhard 1,* and Alexandra Lenhard 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 25 July 2021 / Revised: 25 August 2021 / Accepted: 26 August 2021 / Published: 30 August 2021

Round 1

Reviewer 1 Report

The cNorm Package in R is thoroughly presented for determining norm scores when the latent ability covaries with age, grade level, or other explanatory variables.  The package uses polynomial regression to model relations between raw scores, norm scores, and the explanatory variable.

The cNORM norming approach reduces the problem of test norming to a model selection problem, although other issues, e.g., test administration, were not addressed. The article is clear in noting that to find adequate test norms, it is necessary to determine the coefficients of equation (4) and, therefore, to find a polynomial regression model describing the norming sample as precisely as possible with a minimal number of predictors. This is related to “all possible subset regression” with a model fit criterion (Section 3.2, p. 5). A few other references could be added for all-possible-subset regression; the current reference is from 2002. Otherwise, norming errors are discussed via use of stratified sampling. Overfitting is well discussed.
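For context, the regression model referred to as equation (4) follows, in the continuous norming approach implemented in cNORM, the general form of a Taylor polynomial in the person location and the explanatory variable. The sketch below uses notation of our own choosing (\(\ell\) for the norm score or person location, \(a\) for age, \(c_{st}\) for the coefficients), based on the published descriptions of the method rather than the manuscript's exact symbols:

```latex
% Taylor-polynomial model of continuous norming (sketch):
% the raw score r is approximated as a polynomial of degree k in the
% person location \ell (norm score) and the explanatory variable a (e.g., age).
r \approx f(\ell, a) = \sum_{s=0}^{k} \sum_{t=0}^{k} c_{st}\, \ell^{s} a^{t}
```

Best-subset regression then amounts to selecting which coefficients \(c_{st}\) to retain, trading model fit against the number of predictors.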

The article appears to mention single or multiple explanatory variables, but this could be further explained via an equation example in the text. Explanatory variables can be continuous or categorical, as noted.

Model fit is assessed using adjusted R-squared, Mallows’ Cp, AIC, or BIC (p. 16). The package compares predicted and observed scores via RMSE and cross-validation using a training set, and provides a rule-of-thumb criterion of [.90; 1.10] for flagging poor model fit. This is also well discussed. Future research could be suggested to explore this rule of thumb.

The approach appears to overcome large age intervals, missing scores, sampling error, and sample size issues. Missing scores were indicated as being handled by simple deletion from the raw score data set. Not sure how cNORM handles sample size issues.

It is mentioned that cNORM generates norm tables for the end user; it would be nice to include one in an Appendix.

Is this an accurate statement: “IRT was seen as the more elaborate (but less used) framework for test development” (p. 1)?

Also, regarding the claim that the norming procedure in cNORM is “reducing the error variance usually introduced by the norming procedure” (p. 3): can this be elaborated in the example, with a before-and-after analysis?

An argument for Latent Transition Analysis is given by: “in the case of covariance between latent trait and explanatory variable, the percentile curves must develop continuously and systematically, but not necessarily monotonically across the explanatory variable. For example, the average raw scores of a test scale measuring fluid reasoning increase from childhood to early adulthood but later on, they decrease.” (p. 3). The article also mentions: “Therefore, the norming results should also be validated from a theoretical point of view regarding the measured latent ability.” (p. 16). This argument indicates that problems could occur with different latent transitions or lagged/missing age groups, correct?

Author Response

Reply to Reviewer 1

“The cNorm Package in R is thoroughly presented for determining norm scores when the latent ability covaries with age, grade level, or other explanatory variables. The package uses polynomial regression to model relations between raw scores, norm scores, and the explanatory variable. The cNORM norming approach reduces the problem of test norming to a model selection problem, although other issues, e.g., test administration, were not addressed. The article is clear in noting that to find adequate test norms, it is necessary to determine the coefficients of equation (4) and, therefore, to find a polynomial regression model describing the norming sample as precisely as possible with a minimal number of predictors. This is related to “all possible subset regression” with a model fit criterion (Section 3.2, p. 5). A few other references could be added for all-possible-subset regression; the current reference is from 2002. Otherwise, norming errors are discussed via use of stratified sampling. Overfitting is well discussed.”

Thank you for the supportive feedback and the suggestion for inclusion of additional literature. We have included several additional references on best subset regression.

 

“Appears to mention single or multiple explanatory variables, but could be further explained via equation example in text.   Explanatory variables can be continuous or categorical as noted.”

We would like to refrain from extending the article in this way for two reasons: First, when to use covariates is a complicated question, which can only be answered at the content level of a test and its field of application. Second, the complexity of the text would increase, and we would like to keep it simple. Indeed, cNORM can model further covariates, for example by forcing additional predictors into the regression, and it is capable of weighted ranking as well. But these features have so far not been the main goal of the package development and, to date, we cannot yet fully assess the performance of cNORM under these conditions.

 

“Model fit is assessed using adjusted R-squared, Mallows’ Cp, AIC, or BIC (p. 16). The package compares predicted and observed scores via RMSE and cross-validation using a training set, and provides a rule-of-thumb criterion of [.90; 1.10] for flagging poor model fit. This is also well discussed. Future research could be suggested to explore this rule of thumb.”

Thank you for this point. Indeed, the initial rule-of-thumb parameters are only a starting point. cNORM provides many more methods for deciding on optimal parameter settings, especially cross-validation. This was already described in the article, but we now place a stronger emphasis on this point.

 

“The approach appears to overcome large age intervals, missing scores, sampling error, and sample size issues. Missing scores were indicated as being handled by simple deletion from the raw score data set. Not sure how cNORM handles sample size issues.”

On the technical side, cNORM automatically checks the preconditions and issues warnings if the sample size is low or the explanatory variable explains only a very small share of the variance. Analyses of sample size requirements were already thoroughly presented in source [11]; thus, we only refer to this reference, but have included some additional remarks as well.

 

“It is mentioned that cNORM generates norm tables for the end user; it would be nice to include one in an Appendix.”

We added example norm tables as electronic support material.

 

“Is this an accurate statement: “IRT was seen as the more elaborate (but less used) framework for test development” (p. 1)?”

Indeed, it is a strong statement, but we nonetheless think it is true. The dominance of CTT over many decades is well documented, and at the same time, IRT offers many highly elaborate techniques.

 

“Also, regarding the claim that the norming procedure in cNORM is “reducing the error variance usually introduced by the norming procedure” (p. 3): can this be elaborated in the example, with a before-and-after analysis?”

There is another publication (Lenhard & Lenhard, 2020) on the different sources of norming error, which assesses how norm scores reflect the latent trait of a person. It is based on simulated data, where we can specifically assess how well conventional and continuous norm scores capture the latent ability. Thus, we can estimate the error variance introduced by norming or avoided by continuous norming, respectively. We added references at the corresponding text passage.

 

“An argument for Latent Transition Analysis is given by: “in the case of covariance between latent trait and explanatory variable, the percentile curves must develop continuously and systematically, but not necessarily monotonically across the explanatory variable. For example, the average raw scores of a test scale measuring fluid reasoning increase from childhood to early adulthood but later on, they decrease.” (p. 3). The article also mentions: “Therefore, the norming results should also be validated from a theoretical point of view regarding the measured latent ability.” (p. 16). This argument indicates that problems could occur with different latent transitions or lagged/missing age groups, correct?”

It is a fascinating thought to work in the direction of LTA, which we will take into consideration in the further development of the package. With regard to norming, it is of course always necessary to cover the age range as precisely as necessary. The mentioned problems with regard to missing age groups should occur regardless of the chosen approach; they result from the lack of representativeness of the sample with regard to the age progression (the age progression cannot be modelled accurately when certain age groups are excluded from the sample). Our point is that not only statistical criteria should be used to evaluate the model, but also the theoretical background. For example, the statistical model for a fluid reasoning test could show a high model fit without describing the correct age progression (increase until middle age, then decrease).

 

Author Response File: Author Response.docx

Reviewer 2 Report

Establishing stable norms is a huge endeavor and investment for test publishers but often complicated by the challenges in acquiring sizable and representative samples for the task. This paper discusses the continuous norming approach implemented in the cNORM package, an open-source R package. Continuous norming methods have been used by testing companies for decades; however, relatively few resources/publications have been available to psychometricians as best practices and guidelines. Most test publishers use proprietary software programs for norming and the practice is not openly discussed let alone distributed. To that end, the development of the cNORM package in R is very timely and commendable, and this paper in the current tutorial format seems well-suited for a wider dissemination of the methods and associated tools. I just have a few suggestions to potentially increase the acceptance of the methods/tools by test publishers and practitioners from the field.

Historically, smoothing techniques such as cubic spline and log-linear models have been used to “continuize” discrete and choppy score distributions and/or percentile ranks. There are similarities and differences with the new methods, and the practitioners who are more used to the traditional methods would appreciate how the new method differs from them and why they should consider it. Adding a paragraph early on would be beneficial in my view.

Another potential issue is the use of the term “latent” variable throughout the manuscript. Traditionally, the purpose of smoothing is to reduce sampling errors at the expense of introducing some bias. That smoothing doesn’t make the raw score variable a latent variable. In order for a variable to be defined as latent, there has to be some functional relationship defined with its manifest counterpart. Perhaps it’s the non-parametric (or semi-parametric) nature of the method; however, it seems odd to call a smoothed manifest variable (a combination of raw test scores and an observed explanatory/grouping variable) a latent variable. To be a little more specific, the following sentence in the manuscript seems somewhat strange: “Norming aims at mapping the raw scores of a test to the latent ability.” Perhaps a clarification as to why a smoothed (or regressed) score variable becomes a latent variable would benefit readers who have used the term in the SEM and IRT contexts.

One minor issue is that there are at least two writing styles represented in the manuscript. Without being too explicit, some sections of the manuscript were clearer and freer of unfamiliar jargon and expressions than others. Although multiple authors may have contributed to the manuscript, it seems important that the paper as a whole should read like one coherent piece of work.

p. 5, Line 215: The reference #14 in the text doesn’t seem to correspond to the 14th citation in References.

p. 8, Line 318: Please clarify this statement “Note that in our view, the norm score assigned to a certain raw score equals the estimated latent person parameter \hat{\theta}”

p. 9, Line 406: Explain what model assumptions are being referenced here.

p. 19, Line 701: Explain how the scale reliability is used to establish the confidence intervals.

p. 20, Line 730: “improves the predictive validity of the latent ability”…I am not sure how this can be. Smoothing reduces sampling errors associated with a scale, but predictive validity is established with a separate criterion, typically measured sometime in the future.

 

Author Response

Reply to Reviewer 2

“Establishing stable norms is a huge endeavor and investment for test publishers but often complicated by the challenges in acquiring sizable and representative samples for the task. This paper discusses the continuous norming approach implemented in the cNORM package, an open-source R package. Continuous norming methods have been used by testing companies for decades; however, relatively few resources/publications have been available to psychometricians as best practices and guidelines. Most test publishers use proprietary software programs for norming and the practice is not openly discussed let alone distributed. To that end, the development of the cNORM package in R is very timely and commendable, and this paper in the current tutorial format seems well-suited for a wider dissemination of the methods and associated tools. I just have a few suggestions to potentially increase the acceptance of the methods/tools by test publishers and practitioners from the field.”

We want to thank you for your supporting assessment. Indeed, this scarcity of research and openly available tools were the starting point for developing cNORM.

 

“Historically, smoothing techniques such as cubic spline and log-linear models have been used to “continuize” discrete and choppy score distributions and/or percentile ranks. There are similarities and differences with the new methods, and the practitioners who are more used to the traditional methods would appreciate how the new method differs from them and why they should consider it. Adding a paragraph early on would be beneficial in my view.”

We have extended the introduction accordingly. Without diving too deep into other forms of continuous norming, the reader now gets a broader picture on this field of research.

 

“Another potential issue is the use of the term “latent” variable throughout the manuscript. Traditionally, the purpose of smoothing is to reduce sampling errors at the expense of introducing some bias. That smoothing doesn’t make the raw score variable a latent variable. In order for a variable to be defined as latent, there has to be some functional relationship defined with its manifest counterpart. Perhaps it’s the non-parametric (or semi-parametric) nature of the method; however, it seems odd to call a smoothed manifest variable (a combination of raw test scores and an observed explanatory/grouping variable) a latent variable. To be a little more specific, the following sentence in the manuscript seems somewhat strange: “Norming aims at mapping the raw scores of a test to the latent ability.” Perhaps a clarification as to why a smoothed (or regressed) score variable becomes a latent variable would benefit readers who have used the term in the SEM and IRT contexts.”

“Latent trait” is not used in a statistical sense here (as contrasted with manifest data); it reflects the theoretical assumption on which IRT is built. We added a passage with reference to Lenhard & Lenhard (2020), where we could show that continuously modelled norm scores better reflect the latent trait as compared to INT.

 

“One minor issue is that there are at least two writing styles represented in the manuscript. Without being too explicit, some sections of the manuscript were clearer and freer of unfamiliar jargon and expressions than others. Although multiple authors may have contributed to the manuscript, it seems important that the paper as a whole should read like one coherent piece of work.”

We have tried to make the manuscript more consistent.

 

“p. 5, Line 215: The reference #14 in the text doesn’t seem to correspond to the 14th citation in References.”

Changed to Lumley (2017).

 

“p. 8, Line 318: Please clarify this statement “Note that in our view, the norm score assigned to a certain raw score equals the estimated latent person parameter \hat{\theta}””

This is indeed the central point of section 2 of the paper and we accordingly added a reference.

 

“p. 9, Line 406: Explain what model assumptions are being referenced here.”

The central assumption is that percentile curves must not intersect. We tried to clarify this and as well specified further aspects like model overfit and implausible trajectories.

 

“p. 19, Line 701: Explain how the scale reliability is used to establish the confidence intervals.”

Of course, it would be much better to have ability-dependent intervals, but we added the formula based on the current cNORM version.
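As background to the reply above: confidence intervals based on scale reliability are classically derived from the regressed true-score estimate of classical test theory. The sketch below shows that standard form; whether the formula added to the manuscript takes exactly this shape is an assumption on our part:

```latex
% Classical test theory sketch: r_{tt} is the scale reliability, z the
% observed norm score, \mu and \sigma the mean and SD of the norm scale,
% and z_{1-\alpha/2} the standard normal quantile for the chosen coverage.
\hat{T} = r_{tt}\,(z - \mu) + \mu, \qquad
\mathrm{CI} = \hat{T} \pm z_{1-\alpha/2}\;\sigma \sqrt{r_{tt}\,(1 - r_{tt})}
```

The square-root term is the standard error of estimation, so the interval is centered on the regressed (true-score) estimate rather than on the observed score itself.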

 

“p. 20, Line 730: “improves the predictive validity of the latent ability”…I am not sure how this can be. Smoothing reduces sampling errors associated with a scale, but predictive validity is established with a separate criterion, typically measured sometime in the future.”

Many thanks for this suggestion. Indeed, this was too imprecise and led to misunderstandings. We wanted to express that the norm scores better reflect the latent trait. We reformulated the passage accordingly.

 

Author Response File: Author Response.docx
