1. Introduction
Outcome-based, or competency-based, education has been widely adopted in medical and health sciences education. Substantial research has examined student performance on various contextualized, home-grown assessment tools for clinical rotations [1,2]. About two decades ago, the American Board of Internal Medicine (ABIM) developed the mini clinical evaluation exercise (mini-CEX) to assess the clinical skills of internal medicine residents and medical students [3,4]. The reporter-interpreter-manager-educator (RIME) framework, designed by Lou Pangaro and colleagues at the Uniformed Services University of the Health Sciences, has also been studied and adopted in different contexts [5,6]. Central to the quality of assessment is the investigation of validity and reliability evidence [7]. Assessment of what students know and can do is an integral part of the curriculum, used both to make judgments about individuals and to provide feedback that helps students grow. These assessments also play a pivotal role in residency selection. Nevertheless, the evaluation of medical student performance in clinical settings is widely recognized as a highly variable process that frequently lacks solid evidence of reliability and validity.
In 2014, the Association of American Medical Colleges (AAMC) published 13 Core Entrustable Professional Activities (EPAs) for Entering Residency that a resident in the first postgraduate year could be expected to perform with indirect supervision on the first day of residency [7]. The guide not only identifies a list of major, authentic, real-life tasks that clinicians perform regularly, but also describes the “entrustable behaviors” expected as learners progress toward readiness to be entrusted with each task, shifting the focus of assessment to entrustment decisions [8]. Although job analysis and personnel psychology have generally moved from task-based to competency-based frameworks [9], EPA-based assessments have gained considerable interest among clinical educators [10,11].
In Emergency Medicine (EM), there were concerns that clerkship directors used institution-specific tools with limited validity and reliability evidence, prompting calls for a standardized tool. A national consensus conference was therefore held in the Clerkship Directors in Emergency Medicine (CDEM) track of the Council of Emergency Medicine Residency Directors (CORD) Academic Assembly in Nashville, TN, in March 2016. Developed through a Delphi process, the resulting assessment tool, the National Clinical Assessment Tool for Medical Students in Emergency Medicine (NCAT-EM), includes nine major components [12]. Six domains are rated on a four-point scale: Level 1 Pre-Entrustable, Level 2 Mostly Entrustable, Level 3 Fully Entrustable/Milestone 1, and Level 4 Outstanding/Milestone 2; professionalism is assessed in addition. Although the AAMC defines 13 core EPAs, not all of them are assessed in the NCAT-EM, and the EM residency directors concluded that the selected domains/EPAs represent appropriate expectations for what students can achieve by the end of an EM rotation. Since then, a growing number of institutions have used the NCAT-EM for various purposes [13].
At the Central Michigan University College of Medicine (CMED), students rotate through various clerkships and sub-internships at different campuses during the clinical stage [14]. EM is one of the required 4-week clerkships for Year 4 students. The clerkship uses a standardized course syllabus across all four campuses, and the curriculum includes required encounter experiences, lectures, and simulation. Assessment consists of formative daily shift cards completed by attendings and residents (by the end of each day), a formative mid-block assessment conducted by the clerkship director (by the end of the 2nd week), and an end-of-block assessment conducted by the clerkship director (by the end of the 4th week). Clinical faculty and residents are provided with training on the learning objectives and assessment tools. Since 2018, the NCAT-EM has been implemented across all distributed campus sites with some modifications. The norm-referenced global item, initially utilized in 2018, has since been excluded. In addition, two domains, written notes and practice-based learning and improvement, were added to assess the EM course and MD program objectives of the medical school. These two domains are assessed only by the clerkship director, at the middle and end of the clerkship, through oral interviews and note-writing assignments. Using mobile-friendly devices and QR codes, students in EM are typically assessed on each shift by an attending physician, a senior resident physician, or both, and are observed by different assessors on different shifts. After clinicians submit the daily shift cards, the clerkship director compiles and aggregates the Likert-scale responses and narrative comments. The results are used formatively in a one-on-one meeting at the end of Week 2 and summatively to determine whether students receive a pass grade or meet clinical honors criteria. The mid-clerkship formative assessment is based on the first two weeks’ performance, and the summative assessment is based on all four weeks.
This paper examined student performance progression during the four-week rotation, based on NCAT-EM daily shift card responses and formative/summative results. We hypothesized that students would show significant improvement over the four weeks of the EM rotation. The medical education literature calls for more research investigating validity evidence from various sources for assessment tools used in different contexts [15,16,17]. The NCAT-EM is a relatively new assessment tool; although multiple schools use it, few studies have been published so far. Through an in-depth analysis of student performance, this paper contributes to the accumulation of validity evidence and examines the implications of using the NCAT-EM. The paper aims to offer insights into the opportunities presented by an EPA-based rubric, while also addressing issues related to daily shift cards and formative and summative assessments.
2. Methods
This is a retrospective cohort study intentionally focusing on the academic year 2021–2022, following three years’ use of the tool. Descriptive statistics were calculated at three levels: daily shift cards, the mid-clerkship formative assessment, and the final summative assessment. As described above, the daily shift cards include six domains: (1) focused H&P; (2) generating a prioritized differential diagnosis; (3) formulating a plan; (4) observation, monitoring, and follow-up; (5) emergency recognition and management; and (6) communication. The form also includes professionalism and a narrative comment box. The mid-clerkship formative and final summative assessment forms contain two additional domains: note-writing, and practice-based learning and improvement. Descriptive statistics were computed using SPSS 26. Paired t-tests were used to examine differences between daily shift card scores at the beginning and end of the rotation, as well as differences between formative and summative scores. Qualitative documentation of professionalism concerns, which was generally brief, was analyzed using a standard inductive thematic coding process.
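The analyses were run in SPSS 26; as a rough illustration only, the following minimal pandas sketch shows how the domain-level descriptive statistics (as in Table 2) could be reproduced from a long-format export of daily shift card ratings. The file name and column names (student_id, shift_date, domain, score) are hypothetical.

```python
# Minimal sketch of the descriptive step, assuming a long-format export of the
# daily shift card ratings with hypothetical columns: student_id, shift_date,
# domain, and score (1-4). The published analysis was run in SPSS 26; pandas is
# used here only to illustrate an equivalent computation.
import pandas as pd

cards = pd.read_csv("daily_shift_cards.csv", parse_dates=["shift_date"])

descriptives = (
    cards.dropna(subset=["score"])            # missing ratings are excluded
         .groupby("domain")["score"]
         .agg(N="count", Mean="mean", SD="std")
         .round(3)
)
print(descriptives)
```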
3. Results
During the academic year 2021–2022, 98 faculty members and residents assessed 97 students, submitting a total of 6712 grades across 238 submission days. Of the 97 students, 48 were female and 49 were male. Of the 98 assessors, 15 were residents and the rest were clinical faculty. The assessors worked at different hospitals and clinics, including Ascension St Mary’s, CMU Medical Education Partners, Covenant Healthcare, MyMichigan Midland, and Spectrum Health-Lakeland.
3.1. Daily Shift Cards
On average, each student received 10.76 daily shift cards, with considerable variability, ranging from 2 to 17 (see Table 1). Due to technical complexity at one of the campuses, seven students had only one set of formative and one set of summative responses that could be tracked. Among the 6712 grades, 629 (9.4%) were missing; 45 (0.7%) were Level 1 Pre-Entrustable, 953 (14.2%) Level 2 Mostly Entrustable, 3600 (53.6%) Level 3 Fully Entrustable/Milestone 1, and 1485 (22.1%) Level 4 Outstanding/Milestone 2.
The clinical rating scale scores in the six domains were skewed toward the higher end of the scale (see Table 2), similar to findings in other studies [13]. Among the six domains, patient and team-centered communication was assessed most frequently, with 1040 individual scores, while emergency recognition and management was assessed least frequently, with 926 scores. Domain means ranged from 2.94 (ability to formulate a plan) to 3.23 (patient and team-centered communication), with standard deviations between 0.600 and 0.681.
Due to the complexity introduced by different assessors rating different students at different times, we investigated student progression by comparing the results of the first three daily shift cards with the last three. For each of the six domains, each student’s scores on the first three and last three daily shift cards were averaged, and paired-sample t-tests were then carried out. The seven students who received only two sets of responses due to technical limitations at one site were excluded from the analysis. Although scores trended upward in three of the six domains, there were no significant differences between the beginning and the end of the clerkship in any domain (see Table 3). The other three domains (focused H&P, emergency recognition and management, and communication) maintained similar scores from the beginning to the end of the rotation.
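To make the aggregation step concrete, the following sketch (a rough illustration under the same hypothetical data layout as the earlier snippet, with SciPy in place of SPSS) averages each student’s first three and last three scored cards per domain and runs the paired-sample t-test; the eligibility threshold is an assumption intended to mirror the exclusion described above.

```python
# Sketch of the beginning-vs-end comparison, assuming the same hypothetical
# long-format data (student_id, shift_date, domain, score). For each student and
# domain, the earliest three and latest three scored shift cards are averaged and
# compared with a paired-sample t-test (SPSS 26 was used in the original study).
import pandas as pd
from scipy import stats

cards = pd.read_csv("daily_shift_cards.csv", parse_dates=["shift_date"])
cards = cards.dropna(subset=["score"])

def first_last_means(group: pd.DataFrame) -> pd.Series:
    """Mean of the earliest three and latest three scored cards for one student."""
    ordered = group.sort_values("shift_date")["score"]
    return pd.Series({"beginning": ordered.head(3).mean(),
                      "end": ordered.tail(3).mean()})

for domain, domain_cards in cards.groupby("domain"):
    # Exclude students with too few trackable cards; the >2 threshold mirrors the
    # exclusion of the seven students described above and is an assumption here.
    counts = domain_cards.groupby("student_id").size()
    eligible = domain_cards[domain_cards["student_id"].isin(counts[counts > 2].index)]

    paired = eligible.groupby("student_id").apply(first_last_means).dropna()
    t_stat, p_value = stats.ttest_rel(paired["beginning"], paired["end"])
    print(f"{domain}: n = {len(paired)}, t = {t_stat:.3f}, p = {p_value:.3f}")
```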
3.2. Mid-Clerkship Formative and Summative Assessments
CMED requires a one-on-one mid-clerkship meeting, scheduled virtually or in person. Given the formative purpose of this meeting, the clerkship director provides students with results based on the aggregated daily shift card responses collected during the first two weeks. Students receive the formative daily shift-card commentary in full, with preceptor identities removed, and the clerkship director uses this feedback to provide directed suggestions for improvement. No student received a Pre-Entrustable rating on both the formative and summative assessments. The paired-sample t-test showed significant differences between formative and summative scores in two domains, note-writing and practice-based learning, both graded by the clerkship director (Table 4).
3.3. Professionalism
In the daily shift cards, 8 of the 97 students (8%) received a total of 18 concerns in various areas, including initiative, diligence, or work ethic (n = 6); compassion, sensitivity, or respect towards patients (n = 3); dependability, accountability, or responsibility (n = 3); receptivity to constructive feedback (n = 2); respect or collegiality towards team members (n = 1); punctuality, attendance, or preparation for duty (n = 1); and other (n = 2). Preceptors provided elaborative comments and examples in these specific areas.

Three students received three concerns in the summative evaluation, as identified by the clerkship director: punctuality, attendance, or preparation for duty; dependability, accountability, or responsibility; and other. These three students had not received any concerns in the daily shift cards; instead, they displayed unprofessional behaviors in other settings, such as participation in simulation activities and the late submission of assignments and case logs. These issues raised concerns about their commitment to meeting responsibilities beyond direct patient care.
4. Discussion
The study intentionally focused on outcomes from the fourth year of NCAT-EM implementation, a more established phase following the gradual integration and reception of the tool. In this study, each student received an average of 10.76 daily shift cards over the four-week rotation, which spans 20 business days. This number is higher than the average of 8.6 observed in the multi-institutional implementation of the NCAT-EM [17]. Although the data were collected during the COVID-19 pandemic, students generally received ample feedback from faculty and residents. Overall, the substantial volume of feedback responses highlights a primary advantage of the task-specific, EPA-based rubric. The task-based rubric offers a clear and detailed framework for assessing a learner’s proficiency in performing specific clinical activities. This specificity makes it easier for clinical faculty to grasp and apply the EPAs efficiently. In contrast to broader competency-based rubrics, the NCAT-EM enables an in-depth examination of clinical behaviors, helping educators and students concentrate on the exact skills and proficiencies crucial in clinical settings. In addition, the high assessment completion rate is inseparable from the strong leadership of the clerkship director, the integration of the mobile-delivered tool, and the central management system. Mobile-delivered tools make data collection more efficient and accessible, and central management, frequently supported by mobile-delivered monitoring systems, enhances the administration and oversight of the NCAT-EM. Gaining faculty buy-in is the first step toward sustained use of the NCAT-EM.
Similar to the multi-institutional study, Level 3 Fully Entrustable/Milestone 1 and Level 2 Mostly Entrustable were commonly scored in this study. Additionally, it is unsurprising that the domains associated with focused history and physical exams (EPA 1) and patient-centered communication (EPA 6) leaned towards higher levels, aligning with the existing literature. The summary of the 10-school pilot of the core entrustable professional activities found relatively high proportions of students deemed ready for entrustment under indirect supervision in these domains [9]. Emergency recognition and management is an activity unique to EM. Anecdotal observations reported by the faculty indicate a trend toward improvement in this area. We consider this the area with the greatest room for growth for fourth-year medical students, and students are encouraged to target it over the month.
The results showed no significant differences between daily shift card scores at the beginning and end of the clerkship. The paired-sample t-test showed significant differences between formative and summative scores only in the two domains (note-writing and practice-based learning) graded by one clerkship director. While clinicians’ anecdotal observations suggested positive student progression in this context, the study’s findings did not corroborate those observations. One plausible explanation is grading variance arising from the involvement of a substantial number of faculty and resident assessors at different campuses and sites. Using generalizability theory, Zaidi et al. [2] examined cohort data across different clerkships and found minimal reliability in competency assessment scores for half of the clerkships. That study concluded that the variability in reliability estimates across clerkships may be attributable to differences in scoring processes and assessor training.
Given the large number of faculty and resident assessors in this context, variation might arise from differing interpretations of the NCAT-EM descriptors and entrustment rating scales, rater severity and leniency, and rater personality. For example, research has found that residents behaved differently regarding feedback-seeking, feedback-avoidance, and feedback-filtering depending on the attending’s personality and practice style and on the development of their own desired practice style [18]. In addition, grading results may be context-dependent, reflecting site culture, the complexity of daily tasks, and the timing of grading. Site-specific cultures around clinical-performance grading might differ between sites, creating a unique impediment to improvement in this area. It is essential to address these biases to ensure that the assessment data collected through daily shift cards accurately reflect a learner’s progress and allow consistent application of the tool at CMED. The lack of progression necessitates further examination and potentially additional rater training to minimize these biases. Even in the fourth year of implementing the NCAT-EM, these findings underscore the necessity of systematic assessment strategies, including continuous technical support, standardized assessment processes, and ongoing assessor training, to enhance the reliability and consistency of competency assessments. These strategies must account for the diverse features of various contexts, such as differences in EM training programs across countries, virtual rotations introduced during the pandemic, and socioeconomic/resource barriers affecting the accumulation of validity evidence [19,20].
In this study, approximately 8% of students were reported to have exhibited professionalism lapses in the daily shift cards, and 3% in the final summative assessment, notably higher figures than the 1% reported in the multi-institutional NCAT-EM study [21]. The reasons behind this contrast are not yet clear; it is uncertain whether the gap stems from differences in student populations, the influence of COVID-19, or differing expectations regarding professionalism lapses. Among the flagged areas, initiative, diligence, and work ethic were the most frequently cited concerns, aligning with the findings of Emery’s study. The three students who received concerns in the summative assessment had not been flagged in the daily shift cards; additional professionalism lapses, such as attendance at simulation sessions and late submission of assignments, were observed and documented at the summative stage. It is also possible that not all significant professionalism issues were documented in writing, despite the training and instructions provided to both faculty and residents. Overall, it is vital to address unprofessionalism in clinical settings by establishing clear expectations for professional conduct, promoting awareness and understanding, and challenging stereotypes and biases [22,23].
5. Limitations
This study was conducted with one cohort within a single college, based on quantitative data from one assessment tool. Consequently, caution must be exercised when generalizing the results to other medical schools. We did not analyze the narrative comments qualitatively. It is also important to explore the feedback culture and practices in EM, engaging both faculty and students in conversations around feedback delivery and receptivity. Investigating the types of feedback received, whether the feedback includes indications for improvement, and the associated criteria and training guidelines is vital, and monitoring and discussing feedback can significantly enhance the learning experience. Furthermore, the primary emphasis of this study was on simple inferential statistics, leaving room for future research that employs more complex statistical analyses, such as item response theory or generalizability theory (G-theory), for a deeper exploration. To comprehensively accumulate validity evidence for this assessment tool, it is essential to explore its relationships with other variables, such as criterion or predictive validity. Gathering diverse perspectives from stakeholder groups, including health professionals and students, is also crucial for a more holistic accumulation of validity evidence.
6. Conclusions
In the article The Next Era of Assessment: Building a Trustworthy Assessment System [24], the authors highlight the importance of a framework that builds trust at multiple levels in a future assessment system, one that invites and supports professional and human growth. These authors clearly favor qualitative narrative evaluations and reduced dependence on traditional psychometrics as a way to avoid bias. They are also concerned with the assessment of professionalism, with integrating reflection into supervisor behavior and apprentice participation, and with optimizing narrative comments in favor of the human facet of the profession.
Adapting to and embracing new assessment tools, and the conceptual considerations behind them, is a time-intensive process; acquiring buy-in and cultivating cultural shifts within a medical environment demand considerable time. As stated by Caretta-Weyer and her collaborators, the next era of assessment needs frameworks that build trust at multiple levels in a future assessment system [24]. This involves not only adopting new tools but also ensuring that these tools are perceived as credible and valuable by all stakeholders. The current study examined micro-level student performance information. In summary, the NCAT-EM provides value and benefits because of its feasibility and meaningful domains, as evidenced by the outcomes of its fourth year of implementation. The study provides some validity evidence regarding student performance and progression within this context, and the tool holds promise for the continued accumulation of validity evidence in the emergency medicine contexts in which it is used. Successful implementation of the NCAT-EM benefits from strong leadership, technical support, and central management to ensure good completion rates. Daily shift cards are crucial data points in the EPA assessment process, as they provide a detailed and timely record of a learner’s progress. Monitoring and efficient dissemination of feedback enhance the learning experience and support students in reaching their full potential as medical professionals.
The study provides clear directions for future action plans. Rater reliability and scoring biases of various kinds remain persistent challenges in performance-based assessment, and it is essential to address them to ensure that the assessment data accurately reflect a learner’s progress. Variability and a lack of solid reliability and validity evidence can affect the fairness and credibility of assessment outcomes, impacting both students’ educational trajectories and their future opportunities in residency selection [25,26]. By focusing on accumulating validity evidence, the use of the NCAT-EM can better align with the nuanced needs and expectations of medical programs. In doing so, medical schools contribute to the evolution and refinement of competency assessments, ultimately enhancing the quality of medical education and the preparation of future emergency medicine professionals.
Author Contributions
X.S. and D.S. have contributed substantially to the work, including conceptualization, methodology, data collection and analysis, and manuscript preparation. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Ethical review and approval were waived for this study because the project was deemed by the Office of Research Compliance at Central Michigan University not to meet the definition of human subject research and not to contribute to generalizable knowledge.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original contributions presented in the study are included in the article.
Acknowledgments
We greatly thank the CMED community for sharing their perceptions and experiences.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Prediger, S.; Schick, K.; Fincke, F.; Fürstenberg, S.; Oubaid, V.; Kadmon, M.; Berberat, P.O.; Harendza, S. Validation of a competence-based assessment of medical students’ performance in the physician’s role. BMC Med. Educ. 2020, 20, 6.
- Zaidi, N.L.B.; Kreiter, C.D.; Castaneda, P.R.; Schiller, J.H.; Yang, J.; Grum, C.M.; Hammoud, M.M.; Gruppen, L.D.; Santen, S.A. Generalizability of Competency Assessment Scores Across and within Clerkships: How Students, Assessors, and Clerkships Matter. Acad. Med. 2018, 93, 1212–1217.
- Hauer, K.E. Enhancing feedback to students using the mini CEX (Clinical Evaluation Exercise). Acad. Med. 2000, 75, 524.
- Torre, D.M.; Simpson, D.E.; Elnicki, D.M.; Sebastian, J.L.; Holmboe, E.S. Feasibility, reliability and user satisfaction with a PDA-based mini-CEX to evaluate the clinical skills of third-year medical students. Teach. Learn. Med. 2007, 19, 271–277.
- Pangaro, L. A new vocabulary and other innovations for improving descriptive in-training evaluations. Acad. Med. 1999, 7, 1203–1207.
- Griffith, C.H., III; Wilson, J.F. The Association of Student Examination Performance with Faculty and Resident Ratings Using a Modified RIME Process. J. Gen. Intern. Med. 2008, 23, 1020–1023.
- Messick, S. Validity. In Educational Measurement, 3rd ed.; Linn, R.L., Ed.; American Council on Education and Macmillan: New York, NY, USA, 1989; pp. 13–104.
- Association of American Medical Colleges. Core EPAs Guiding Principles. Core Entrustable Professional Activities for Entering Residency Curriculum Developers’ Guide. Available online: https://store.aamc.org/downloadable/ (accessed on 8 June 2022).
- Core Entrustable Professional Activities for Entering Residency: Summary of the 10-School Pilot, 2014–2021; Association of American Medical Colleges: Washington, DC, USA, 2022.
- Soderquist, K.; Papalexandris, A.; Ioannou, G.; Prastacos, G. From task-based to competency-based: A typology and process supporting a critical HRM transition. Pers. Rev. 2010, 39, 325–346.
- Chen, H.C.; van den Broek, W.E.; ten Cate, O. The case for use of entrustable professional activities in undergraduate medical education. Acad. Med. 2015, 90, 431–436.
- Ryan, M.S.; Richards, A.; Perera, R.; Park, Y.S.; Stringer, J.K.; Waterhouse, E.; Dubinsky, B.; Khamishon, R.; Santen, S.A. Generalizability of the Ottawa Surgical Competency Operating Room Evaluation (O-SCORE) Scale to Assess Medical Student Performance on Core EPAs in the Workplace: Findings From One Institution. Acad. Med. 2021, 96, 1197–1204.
- Jung, J.; Franzen, D.; Lawson, L.; Manthey, D.; Tews, M.; Dubosh, N.; Fisher, J.; Haughey, M.; House, J.B.; Trainor, A.; et al. The National Clinical Assessment Tool for Medical Students in the Emergency Department (NCAT-EM). West. J. Emerg. Med. 2018, 19, 66–74.
- Song, X.; Vance, S. Students’ Surgical Experiences in a Distributed Model of Clinical Education: A Mixed-Methods Sequential Case Study. J. Surg. Educ. 2021, 78, 858–865.
- Cheung, W.J.; Wood, T.J.; Gofton, W.; Dewhirst, S.; Dudek, N. The Ottawa Emergency Department Shift Observation Tool (O-EDShOT): A New Tool for Assessing Resident Competence in the Emergency Department. AEM Educ. Train. 2019, 4, 359–368.
- Pugh, D.; Cavalcanti, R.B.; Halman, S.; Ma, I.W.Y.; Mylopoulos, M.; Shanks, D.; Stroud, L. Using the Entrustable Professional Activities Framework in the Assessment of Procedural Skills. J. Grad. Med. Educ. 2017, 9, 209–214.
- Hiller, K.; Jung, J.; Lawson, L.; Riddell, R.; Franzen, D. Multi-institutional Implementation of the National Clinical Assessment Tool in Emergency Medicine: Data From the First Year of Use. AEM Educ. Train. 2020, 5, e10496.
- Fredette, J.; Michalec, B.; Billet, A.; Auerbach, H.; Dixon, J.; Poole, C.; Bounds, R. A qualitative assessment of emergency medicine residents’ receptivity to feedback. AEM Educ. Train. 2021, 5, e10658.
- Rybarczyk, M.M.; Ludmer, N.; Broccoli, M.C.; Kivlehan, S.M.; Niescierenko, M.; Bisanzo, M.; Checkett, K.A.; Rouhani, S.A.; Tenner, A.G.; Geduld, H.; et al. Emergency Medicine Training Programs in Low- and Middle-Income Countries: A Systematic Review. Ann. Glob. Health 2020, 86, 60.
- Villa, S.; Janeway, H.; Preston-Suni, K.; Vuong, A.; Calles, I.; Murphy, J.; James, T.; Jordan, J.; Grock, A.; Wheaton, N. An Emergency Medicine Virtual Clerkship: Made for COVID, Here to Stay. West. J. Emerg. Med. 2021, 23, 33–39.
- Emery, M.; Parsa, M.D.; Watsjold, B.K.; Franzen, D. Assessment of professionalism during the emergency medicine clerkship using the national clinical assessment tool for medical students in emergency medicine. Acad. Emerg. Med. 2020, 5, e10494.
- Academic Medicine. Professionalism in Medicine and Medical Education, Volume II: Foundational Research and Key Writings, 2010–2016. Available online: http://journals.lww.com/academicmedicine/Pages/eBooks.aspx (accessed on 10 December 2023).
- Song, X.; Willy, M.J. Exploring Unprofessional Behaviors and Biased Perceptions in the Clinical Environment: Students’ Perspectives. Med. Sci. Educ. 2024.
- Caretta-Weyer, H.A.; Smirnova, A.; Barone, M.A.; Frank, J.R.; Hernandez-Boussard, T.; Levinson, D.; Lombarts, K.M.J.M.H.; Lomis, K.D.; Martini, A.; Schumacher, D.J.; et al. The Next Era of Assessment: Building a Trustworthy Assessment System. Perspect. Med. Educ. 2024, 13, 12–23.
- Jeyalingam, T.; Brydges, R.; Ginsburg, S.; McCreath, G.A.; Walsh, C.M. How Clinical Supervisors Conceptualize Procedural Entrustment: An Interview-Based Study of Entrustment Decision Making in Endoscopic Training. Acad. Med. 2022, 97, 586–592.
- Hauer, K.E.; Park, Y.S.; Bullock, J.L.; Tekian, A. “My Assessments Are Biased!” Measurement and Sociocultural Approaches to Achieve Fairness in Assessment in Medical Education. Acad. Med. 2023, 98, S16–S27.
Table 1. Daily Shift Cards.
# of Daily Shift Cards Received | Frequency | Percentage |
---|---|---
2 | 7 | 7.2 |
6 | 1 | 1.0 |
8 | 5 | 5.2 |
9 | 8 | 8.2 |
10 | 19 | 19.6 |
11 | 16 | 16.5 |
12 | 13 | 13.4 |
13 | 13 | 13.4 |
14 | 8 | 8.2 |
15 | 4 | 4.1 |
16 | 2 | 2.1 |
17 | 1 | 1.0 |
Total | 97 | 100.0 |
Table 2. Descriptive Statistics for Daily Shift Cards.
EPA | N | Mean | Std. Deviation |
---|---|---|---
Focused History and Physical Exam Skills | 1031 | 3.15 | ±0.600 |
Ability to Generate a Prioritized Differential Diagnosis | 1029 | 2.99 | ±0.661 |
Ability to Formulate a Plan | 1030 | 2.94 | ±0.661 |
Observation, Monitoring, and Follow-up | 1027 | 3.13 | ±0.681 |
Emergency-recognition and Management | 926 | 2.99 | ±0.635 |
Patient and Team-centered Communication | 1040 | 3.23 | ±0.619 |
Table 3. Paired Samples Statistics for Daily Shift Cards (Beginning vs. End).
Pair | Measure | Mean | N | Std. Deviation | Std. Error Mean
---|---|---|---|---|---
Pair 1 | H&P Beginning | 3.12 | 90 | ±0.358 | 0.037
| H&P End | 3.11 | 90 | ±0.417 | 0.044
Pair 2 | Differential Diagnosis Beginning | 2.96 | 90 | ±0.382 | 0.040
| Differential Diagnosis End | 2.99 | 90 | ±0.449 | 0.047
Pair 3 | Treatment Plan Beginning | 2.90 | 90 | ±0.408 | 0.043
| Treatment Plan End | 2.95 | 90 | ±0.461 | 0.048
Pair 4 | Follow-up Beginning | 3.08 | 90 | ±0.442 | 0.046
| Follow-up End | 3.12 | 90 | ±0.411 | 0.043
Pair 5 | Emergency Management Beginning | 2.96 | 89 | ±0.382 | 0.040
| Emergency Management End | 2.96 | 89 | ±0.442 | 0.046
Pair 6 | Communication Beginning | 3.24 | 90 | ±0.346 | 0.036
| Communication End | 3.23 | 90 | ±0.450 | 0.047
Table 4. Paired Samples Statistics: Mid-clerkship Formative and Summative Assessment.
Pair | Measure | Mean | N | Std. Deviation | Std. Error Mean | t (p)
---|---|---|---|---|---|---
Pair 1 | H&P Formative | 2.95 | 96 | ±0.531 | 0.054 | −0.315
| H&P Summative | 2.97 | 96 | ±0.469 | 0.048 |
Pair 2 | Differential Diagnosis Formative | 2.84 | 96 | ±0.568 | 0.058 | 0.315
| Differential Diagnosis Summative | 2.82 | 96 | ±0.503 | 0.051 |
Pair 3 | Treatment Plan Formative | 2.75 | 96 | ±0.562 | 0.057 | 0.148
| Treatment Plan Summative | 2.74 | 96 | ±0.528 | 0.054 |
Pair 4 | Follow-up Formative | 2.92 | 96 | ±0.574 | 0.059 | −0.647
| Follow-up Summative | 2.96 | 96 | ±0.541 | 0.055 |
Pair 5 | Emergency Management Formative | 2.84 | 94 | ±0.493 | 0.051 | −0.962
| Emergency Management Summative | 2.89 | 94 | ±0.401 | 0.041 |
Pair 6 | Communication Formative | 3.16 | 96 | ±0.443 | 0.045 | 0.575
| Communication Summative | 3.13 | 96 | ±0.508 | 0.052 |
Pair 7 | Note-writing Formative | 2.79 | 92 | ±0.672 | 0.070 | −4.746 (<0.001)
| Note-writing Summative | 3.20 | 92 | ±0.497 | 0.052 |
Pair 8 | Practice-based Learning Formative | 2.99 | 92 | ±0.104 | 0.011 | −1.972 (0.026)
| Practice-based Learning Summative | 3.07 | 92 | ±0.357 | 0.037 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).