Educator Feedback Skill Assessment: An Educational Survey Design Study

Background: Delivering impactful feedback is a skill that is difficult to measure. To date there is no generalizable assessment instrument that measures the quality of medical education feedback. The purpose of the present study was to create an instrument for measuring educator feedback skills. Methods: Building on pilot work, we refined an assessment instrument and addressed content and construct validity using expert validation (qualitative and quantitative). This was followed by cognitive interviews of faculty from several clinical departments, which were transcribed and analyzed using ATLAS.ti qualitative software. A research team revised and improved the assessment instrument. Results: Expert validation and cognitive interviews resulted in the Educator Feedback Skill Assessment, a 10-item scale with three response options per item. Conclusions: Building on the contemporary medical education literature and empirical pilot work, we created and refined an assessment instrument for measuring educator feedback skills. We also began the validity argument by addressing content validity.


Conceptual Framework
The ultimate goal of assessment practices in health professions education is improved healthcare. High-quality, credible feedback provides a meaningful mechanism through which physicians can be expected to grow [1]. Feedback is fundamental to everything we do: it is an essential part of every framework, every curriculum, and every teaching interaction.
Despite the importance of feedback, residents and faculty have reported that provider feedback skills are not sufficiently developed [2,3]. Similarly, faculty from both university and community-based programs described having minimal training and a limited understanding of best practices for delivering feedback [4], despite the availability of excellent practical guides [5][6][7]. Nor does this appear to be merely a perception issue: a qualitative study of simulated feedback encounters suggested that faculty skills do not match recommended practice in a number of areas [8].
There is growing evidence that teacher-centered models of feedback are not sufficient to improve the quality of feedback [9][10][11][12][13][14]. Characteristics of feedback providers form one of the three clusters seen when viewing feedback through the lens of the sociocultural model [15], and improving feedback provider skills may in fact improve outcomes. Sargeant and colleagues have shown that training coaches to conduct a reflective feedback conversation can improve the acceptance and uptake of feedback [16]. Similarly, in the sports realm, supportive coaching has been associated with both perceived coach competence and satisfaction [17].

Related Research
In order to explore the intended meaning and breadth of the feedback construct, we completed the following steps in a pilot study [18]. We started by conducting a literature review that aligned the feedback construct with prior research and identified existing feedback scales. We then explored how feedback participants conceptualize and describe feedback. We asked feedback recipients (resident physicians) to select, script, and enact six faculty-resident feedback vignettes.
We then conducted seven faculty focus groups that included 23 feedback providers. We asked the faculty, who watched each vignette video as a group, to comment on elements that were successful and on areas for improvement. Synthesizing the literature review and focus group findings ensured that our conceptualization of the feedback construct made theoretical sense to scholars in the field and used language that feedback providers understood. It allowed us to draft a list of 51 items, grouped under 10 proposed dimensions of feedback, and to create an early assessment scale, initially named the Feedback Rating Scale (Appendix A Table A1).
Although several feedback delivery frameworks have been described, they apply only to narrow areas within medical education. Several assessments were developed within specific contexts, including written feedback [19], simulation debriefing [20], direct observation of clinical skills [21], communication skills feedback [22], feedback by residents [23], and feedback assessed by medical students [24,25]; however, these instruments are not generalizable to other types of feedback. The major research gap in this domain is therefore the absence of a reliable measurement instrument that can be applied across the many facets of medical education.
The purpose of the present study was to (a) define dimensions that best represent the construct of feedback in medical education, and to (b) create and refine a generalizable assessment instrument for measuring educator feedback skills.

Research Model
This is an educational survey design study. We adopted Messick's construct validity framework [26]. We selected Messick's framework because, in contrast to earlier validity frameworks that focused on "types" of validity (e.g., content or criterion), it favors a unified framework in which construct validity (the only type) is supported by evidence derived from multiple sources [27]. We envisioned our study findings as one such source, beginning the "validity argument".
For additional guidance in the study design, we selected a systematic and practical approach for creating high-quality survey scales that synthesized multiple survey design techniques into a cohesive mixed-methods process [28]. Building on our pilot work, we addressed the content, construct, and response process aspects of validity.

We recruited two groups of participants:
1. To explore the content aspect of construct validity using expert validation, we recruited an international panel of methodologists, researchers, and subject-matter experts.
2. To conduct cognitive interviews, we recruited experienced feedback providers from four clinical departments (Emergency Medicine, Medicine, Orthopedic Surgery, and Physical Medicine and Rehabilitation) at a single academic health system.

The experts were asked to comment on each item's representativeness, clarity, relevance, and distribution using an anonymous online form: https://docs.google.com/forms/d/e/1FAIpQLSffLngxbC_XTBv31dQDi0ftczjz3wDMGrfz_ZcOmLimcnPXiA/viewform (accessed on 5 December 2022).

Data Collection Process
To assess how clear and relevant the items were with respect to the construct of interest, the international experts were asked to comment on each item's representativeness, clarity, relevance, and distribution using an anonymous online form. We also asked the experts to review the labels used for the response categories (qualitative review: content aspect of construct validity using expert validation). We then asked the same group of experts to review the individual items in the modified assessment instrument, rating each item as essential, useful but not essential, or not necessary using an anonymous online form (quantitative review: content aspect of construct validity using expert validation).
To ensure that respondents interpreted the items as we intended (response process validity), we asked experienced feedback providers to use the assessment instrument, modified in the steps above, to rate videotaped feedback encounters that we had developed as part of the pilot study [18]. We then conducted structured individual cognitive interviews using the technique of concurrent probing [29]. In this technique, the interviewer asks about the respondent's thought process while the respondent completes the questionnaire, which strikes a reasonable balance between limiting the demand on the respondent and minimizing recall bias [28].

Data Analysis
During the qualitative reviews, we used expert responses and comments to revise the assessment instrument. During the quantitative expert reviews, we used both a predetermined content validity ratio cut-point (McKenzie recommends a minimum of 0.62 for statistical significance at p < 0.05 with a 10-member panel) and the experts' narrative comments to make inclusion and exclusion decisions for individual items [30].
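As a minimal illustration, the inclusion rule can be sketched in Python, assuming Lawshe's standard content validity ratio formula, CVR = (n_e − N/2)/(N/2), where n_e is the number of panelists rating an item "essential" and N is the panel size. The vote counts shown are hypothetical, not study data.

```python
# Lawshe's content validity ratio (CVR) for a single item.
# n_essential: panelists rating the item "essential"; n_panelists: panel size.
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    half = n_panelists / 2
    return (n_essential - half) / half

CUT_POINT = 0.62  # McKenzie's recommended minimum for a 10-member panel

# Hypothetical vote tallies from an 8-member panel: with 8 voters,
# CVR(7, 8) = (7 - 4) / 4 = 0.75 clears the cut-point; CVR(6, 8) = 0.50 does not.
for n_essential in range(5, 9):
    cvr = content_validity_ratio(n_essential, 8)
    decision = "included" if cvr > CUT_POINT else "excluded"
    print(f"{n_essential}/8 essential votes -> CVR {cvr:.2f} ({decision})")
```

Note that the 0.62 critical value is tied to panel size; smaller panels require a larger CVR to reach the same significance level.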
Audio files of the recorded cognitive interviews were transcribed, coded, and analyzed qualitatively using ATLAS.ti software (Scientific Software Development GmbH, 2019) in order to improve the overall assessment instrument and the individual survey items. The research team used a consensus method to decide whether to proceed with each revision suggested by interviewees; suggestions that received at least three of the four research team votes were implemented.
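The consensus rule above can be expressed as a simple threshold filter. The sketch below is illustrative only; the suggestion labels and vote counts are hypothetical, not taken from the study data.

```python
# Consensus rule: implement a suggested revision only if it receives
# at least 3 of the 4 research-team votes.
VOTES_REQUIRED = 3  # out of a 4-member research team

def implemented(votes_for: int, votes_required: int = VOTES_REQUIRED) -> bool:
    return votes_for >= votes_required

# Hypothetical interviewee suggestions with team votes in favor.
suggestions = {
    "split double-barreled item": 4,
    "reword response anchor": 3,
    "drop introductory stem": 2,
}
accepted = [name for name, votes in suggestions.items() if implemented(votes)]
print(accepted)  # only the suggestions with at least 3 votes
```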

Results
The majority of interviews were conducted face to face; however, the last two interviews were done virtually due to COVID-19 pandemic-related restrictions. The assessment instrument (final version, Appendix A Table A2) was revised eight times during the research study (Table 1). The instrument name was changed from Feedback Rating Scale to Educator Feedback Skill Assessment (EFSA).
Qualitative review. Twelve experts agreed to participate (see Acknowledgements section). Ten of the twelve submitted narrative comments online. In addition to individual item revisions, the number of items was increased from 31 to 32 (one item was split to avoid "double barreling").
Quantitative review. Eight of the twelve experts submitted "inclusion/exclusion" votes online. Ten of the thirty-two items had content validity ratios above 0.62 and were included in the final version of the assessment instrument.
Cognitive interviews. Twelve cognitive interviews were conducted, ten face to face and two online via Zoom. Participants included four teaching faculty in Emergency Medicine, four in Physical Medicine and Rehabilitation, three in Internal Medicine, and one in Orthopedic Surgery. Qualitative analysis of the interview transcripts yielded twenty-three recommendations. Seven of the suggestions received at least three of the four research team votes and were implemented in the final version of the EFSA. To arrive at the final version of the assessment instrument (Appendix A Table A2), the PI made several additional changes to improve readability, reduce wordiness, and improve item format consistency.

Discussion
We believe a rigorous instrument that builds on existing theory and empirical evidence is necessary to measure the quality of feedback in medical education. Our study takes the first step in creating and validating such an instrument. Our results may also impact assessment in medical education in several ways.
Firstly, our findings may deepen the theoretical understanding of the dimensions of feedback necessary for making it meaningful and impactful, with potential benefit for both medical education researchers and practitioners. Secondly, defining performance expectations for feedback providers in the form of a practical rubric can enhance reliable scoring of feedback performance assessments. Finally, although rubrics may not facilitate valid judgment of feedback assessments per se, they have the potential to promote learning and improve the instruction of feedback providers by making expectations and criteria explicit, thereby facilitating feedback and self-assessment [31].
Our work started from de novo observations of feedback in a pilot project. While our findings were undoubtedly colored by the work of others and by existing frameworks for feedback, we expect to further validate current methods of assessment and to explore and define novel dimensions of delivering feedback. Our work also built on an emerging area of feedback research supported by recent work of others and by our pilot work: specificity. Roze des Ordons and colleagues identified variability in feedback recipients (four 'resident challenges') and suggested adjusting feedback provider approaches accordingly [8]. Our own pilot [18], on the other hand, was based on scenarios that were selected, scripted, and enacted by learners (resident physicians), and the resultant data suggested additional variability in feedback providers. Using more than one perspective in developing the items and dimensions of the assessment instrument may allow us to highlight multiple facets of the feedback construct and understand it more fully.
We think that the collaborative nature of this study is also a strength. Several prominent scholars with unique knowledge of assessment and feedback agreed to participate in expert validation (see Acknowledgements section). Within our own institution, we included faculty from four diverse departments, spanning both "cognitive" and "procedural" specialties, in the cognitive interviewing, which supports the generalizability of the resultant instrument.
We addressed only one (content) of the four aspects (structural, content, generalizability, and substantive) of validity described by Messick [26], and this is undoubtedly the greatest weakness of this work. However, we feel strongly that the sooner our new instrument is available to the medical education research community, the sooner this shortcoming can be addressed, by ourselves and by others. Additionally, early use of the instrument by medical educators in the field is likely to provide feedback that will allow us to further refine and polish the EFSA. Future studies will need to explore multiple facets of the feedback construct while varying the types of feedback providers and feedback recipients. Another area of interest involves the study of different relationship stages, for example, one-time feedback vs. ongoing coaching, or feedback 'on the fly' vs. scheduled at the end of a clinical rotation.
To continue collecting validity evidence, future studies should delve into the psychometric properties of the EFSA, focusing on structural aspects as well as convergent and discriminant validity (external aspect). Future studies should also explore the relationship between the EFSA and additional external measures, such as motivation to use feedback, feedback-seeking frequency, and satisfaction with feedback, using existing survey items [32]. Changes in physicians' behavior and performance, and how these affect patient outcomes, are also areas of future interest. Additional studies across different specialty areas and demographic variables should be conducted to further explore the generalizability aspect of construct validity.

Conclusions
Building on the contemporary medical education literature and empirical pilot work, we created and refined an assessment instrument for measuring educator feedback skills. We also began the validity argument by addressing content validity. Future studies should address the structural, generalizability, and substantive aspects of validity and test the new instrument in a variety of settings and contexts.
Author Contributions: A.M. contributed to study planning, data collection and analysis, manuscript writing. J.S. contributed to data collection, manuscript writing. F.L. contributed to study planning, data analysis, manuscript writing. C.R. contributed to study planning, data analysis, manuscript writing. K.C. contributed to study planning, data analysis, manuscript writing. All authors have read and agreed to the published version of the manuscript.