Peer-Review Record

Better Rating Scale Scores with Information–Based Psychometrics

Psych 2020, 2(4), 347-369; https://doi.org/10.3390/psych2040026
by James Ramsay, Juan Li and Marie Wiberg
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 14 October 2020 / Revised: 25 November 2020 / Accepted: 26 November 2020 / Published: 15 December 2020
(This article belongs to the Special Issue Learning from Psychometric Data)

Round 1

Reviewer 1 Report

The authors present a model that represents performance as a space with a metric structure by transforming probability into surprisal, or information, and that provides results on an interval [0,100] that is easily (or at least more easily than the IRT convention) interpretable. I find the manuscript to be an important contribution to the field, and the ideas presented by the authors worth discussing. It is well written, clear, very easy to follow, and well structured, without any major mistakes that I could spot.
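
For context, the transformation referred to here converts a response-category probability into surprisal. The base-M convention below is my paraphrase of the framework, not a quotation from the manuscript:

W_m(\theta) = -\log_M P_m(\theta),

where P_m(\theta) is the probability of choosing option m at trait level \theta and M is the number of response options, so that surprisal is measured in M-bits and carries the metric structure that probability itself lacks.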

On the other hand, I feel that some of the authors' promises, e.g. "We propose several modifications of the psychometric testing theory that together demonstrate remarkable improvements in the quality of rating scale scores.", are a little overstated. The presented model is interesting, but 1) it has some important drawbacks compared to existing methodology; 2) it has roots in existing models that were not explicitly presented; and 3) in my opinion, its usefulness was not fully proven.

  • The comparisons with the existing models are not fully fair. The authors claim that "In this paper, we will use some new techniques to further improve what the methodology has to offer, and we will point out along the way some possible limiting factors that may explain why IRT has failed to dislodge the sum score." But those problems are basically a) negative index values (thetas); b) the concept of infinity used in the theta scale; and c) estimation efficiency. The proposed alternative solves some of these problems, but new ones arise, and this, in my opinion, was not stressed enough. Problems with infinity were replaced by boundary conditions. Problems with estimation were replaced by arbitrary decisions on the number of bins, the size of the jittering, the order of the splines, the type of smoothing, stopping rules, etc. Additionally, the fact that parametric IRT models are better suited to problems like CAT, missing-by-design plans, item banking, etc. has been omitted. I understand that the primary goal is to emphasize the merits of a new approach, but I feel that the work would merit a more balanced evaluation.
  • In my understanding, the presented approach is a clever combination of two existing lines of work: a) non-parametric IRT models, and b) the equating literature, especially kernel equating (von Davier, Holland, & Thayer, 2004). I think a better literature review with direct links to those roots should be provided (the kernel continuization I have in mind is written out after this list).
  • I would be fully convinced of the proposed model if the authors could provide simulations showing how their approach copes with data generated from simple IRT models (and possibly from models that violate the IRT assumptions) compared to established methods.
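
For the record, the Gaussian-kernel continuization at the heart of kernel equating, written from memory in roughly the notation of von Davier et al., is

F_{h_X}(x) = \sum_j r_j \, \Phi\!\left(\frac{x - a_X x_j - (1 - a_X)\mu_X}{a_X h_X}\right), \qquad a_X = \sqrt{\frac{\sigma_X^2}{\sigma_X^2 + h_X^2}},

where the r_j are the pre-smoothed score probabilities, h_X is the bandwidth, \mu_X and \sigma_X^2 are the mean and variance of the discrete score distribution, and \Phi is the standard normal CDF. The jitter-and-smooth device in the manuscript plays, in my view, a very similar continuizing role.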

Additionally, as I previously mentioned, the presented model requires relatively many arbitrary decisions: the number of bins, the size of the jittering, the order of the splines, the type of smoothing, etc. A sensitivity analysis should be provided showing how the model copes with different settings and how robust the results are to different specifications; a sketch of what such an analysis might look like follows.
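
To make this request concrete, here is a minimal sketch of such a sensitivity analysis. Everything in it (the data, the jitter/bin/spline pipeline, and the parameter ranges) is a hypothetical stand-in of mine, not the manuscript's actual procedure.

# Minimal sensitivity-analysis sketch. All tuning-parameter names and
# ranges below are hypothetical illustrations, not the settings used in
# the manuscript under review.
import itertools
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)

# Hypothetical data: integer sum scores from a rating scale.
scores = rng.binomial(n=80, p=0.55, size=1000).astype(float)

def smoothed_density(scores, n_bins, jitter_sd, spline_degree, smooth, rng):
    """Jitter integer scores, bin them, and smooth the histogram with a spline."""
    jittered = scores + rng.normal(0.0, jitter_sd, size=scores.size)
    counts, edges = np.histogram(jittered, bins=n_bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    spline = UnivariateSpline(centers, counts, k=spline_degree, s=smooth)
    grid = np.linspace(scores.min(), scores.max(), 200)
    return np.clip(spline(grid), 0.0, None)

# Grid over the arbitrary choices listed above: bins, jitter size,
# spline order, smoothing level.
settings = list(itertools.product([20, 30, 50], [0.1, 0.3, 0.5],
                                  [3, 5], [1e-4, 1e-3]))
curves = np.array([smoothed_density(scores, b, j, k, s, rng)
                   for b, j, k, s in settings])

# Robustness summary: pointwise spread of the estimated curves across settings.
spread = curves.max(axis=0) - curves.min(axis=0)
print(f"max pointwise spread across {len(settings)} settings: {spread.max():.4f}")

Reporting how the recovered curves, or the downstream scores, vary over such a grid would directly address the robustness concern.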

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

This is a potentially interesting manuscript which reports on a non-parametric approach to psychometric scaling. The study seems technically correct; however, the structure and motivation of the study are not very well worked out. That is, the new approach is not explicitly contrasted with existing approaches, which makes it hard to judge the merit of the present paper. In addition, it is not explicated how this study differs from existing studies. Please see my more specific comments below:

From the introduction (section 1), it does not become clear what exact problem the authors try to solve. That is, in the final paragraph of section 1 they state:

“…we will point out along the way some possible limiting factors that may explain why IRT has failed to dislodge the sum score”.

It is not clear what problem with the sum score the authors refer to (the authors discuss some problems with the sum score on page 2, but these are problems inherent to classical test theory, not to IRT). In addition, I did not encounter any important limitations of parametric IRT 'along the way'. A hint is given on page 5, where the authors state that

“Conventional IRT has been constrained to use exceedingly elementary and inflexible models for probability curves for this reason.”

but this argument is more speculation than a real demonstration of a problem with IRT that can be solved by the present approach. Ideally, the paper would be rewritten in such a way that it is clear from the introduction 1) what the problem is with IRT; 2) why existing approaches cannot be used to tackle this problem; and 3) why the present approach is able to address this problem. In addition, in the real-data illustration it could be shown what goes wrong if IRT is used on the data, as compared to the present approach.

Other comments:

- It is unclear how the present study differs from previous work by Ramsay and Wiberg (2017) and Ramsay, Li, and Wiberg (2020), and why this study is warranted.

- I think the title is too general (when I read it, I didn't know what to expect from the paper); maybe a more specific title can be used.

-p1 line 14: "Tens of thousands of self-report rating scales are devised each year…" Is it really that many? What is this number based on?

-p1 line 15: “to provide numerical summaries” (drop “a”)

-p1 line 18: Taking a sum score is common practice in classical test theory, but in modern test theory (IRT) there are better ways to obtain an aggregated score (see the sketch after these comments).

-p1 l25: although certainly interesting, it is not common in a scientific article to cite using a URL

-p2 l33: it is very uncommon to incorporate acknowledgements into the main text

-p2 Figure 1: If I understand correctly, this is a histogram? Please depict it as a histogram or mention this in the figure caption

-p8 line 210: "the number OF bins"
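
As an aside on the line-18 comment above, the following minimal sketch contrasts the two scores. Everything in it (the 2PL model, item parameters, sample sizes) is invented for illustration and is unrelated to the manuscript's data.

# Minimal sketch: sum score vs. a model-based IRT score (EAP under a 2PL).
# All parameters here are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n_items, n_persons = 20, 500
a = rng.uniform(0.5, 2.0, n_items)        # discriminations (hypothetical)
b = rng.normal(0.0, 1.0, n_items)         # difficulties (hypothetical)
theta = rng.normal(0.0, 1.0, n_persons)   # latent traits

# Simulate dichotomous 2PL responses.
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
x = (rng.random((n_persons, n_items)) < p).astype(int)

# Classical sum score: every item weighted equally.
sum_score = x.sum(axis=1)

# EAP score: posterior mean of theta on a quadrature grid with a N(0,1) prior.
grid = np.linspace(-4.0, 4.0, 81)
prior = np.exp(-0.5 * grid**2)
pg = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b)))        # (81, n_items)
loglik = x @ np.log(pg).T + (1 - x) @ np.log(1.0 - pg).T   # (n_persons, 81)
post = np.exp(loglik) * prior
eap = (post * grid).sum(axis=1) / post.sum(axis=1)

# Under the 2PL, items are weighted by their discriminations, so persons
# with the same sum score can receive different EAP scores.
print("corr(sum, EAP):", round(float(np.corrcoef(sum_score, eap)[0, 1]), 3))

The point of the sketch is only that the model-based score weights items by what the model has learned about them, which a plain sum cannot do.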

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

In my last assessment, I concluded that I find the manuscript to be an important contribution to the field and the ideas presented by the authors worth discussing. It is well written, clear, easy to follow, and well structured, without any major mistakes that I could spot. The new version, with clarifications and small corrections, is even more convincing.

I of course recommend the paper for publication, although I'll allow myself to point out a few small things.

I believe that the discussion of the proposed method should show its limitations more explicitly. The treatment of missing data is one of them (also in the context of questionnaires), and CAT (used not only for cognitive testing) is another. I think the weak sides of the model should be stressed more. This will also allow the authors to show clearer directions for further development. Pointing out problems that could potentially be solved could only help this approach.

I still think that direct comparisons with IRT models should be performed. If not in this paper, it should be noted that this is something that needs to be done in the near future to strengthen the proposition.

Author Response

Please see the attachment.

Thank you very much for your help.

Author Response File: Author Response.pdf

Reviewer 2 Report

I thank the authors for their response to my earlier comments. Unfortunately, the authors have not been very responsive to my comments: in their response, they argue that an explicit comparison between their new approach and IRT is not the aim of the paper. I agree that an elaborate simulation study perhaps warrants a study of its own. However, as the new approach is explicitly contrasted with parametric IRT throughout the paper, I think the paper needs at least a run of an IRT model on the current data to show how its results differ from those of the new approach.

There are other concerns I raised that the authors didn't reply to at all. These include the following comments from my previous review:

  • It is not clear what problem with the sum score the authors refer to (the authors discuss some problems with the sum score on page 2, but these are problems inherent to classical test theory, not to IRT).
  • In addition, I did not encounter any important limitations of parametric IRT 'along the way'. A hint is given on page 5, where the authors state that
    “Conventional IRT has been constrained to use exceedingly elementary and inflexible models for probability curves for this reason.”
    but this argument is more speculation than a real demonstration of a problem with IRT that can be solved by the present approach.
  • Ideally, the paper would be rewritten in such a way that it is clear from the introduction 1) what the problem is with IRT; 2) why existing approaches cannot be used to tackle this problem; and 3) why the present approach is able to address this problem.

Author Response

Please see the attachment.

Thank you very much for your help.

Author Response File: Author Response.pdf
