Modelling and Measuring Professional Vision in Medical Education: A Cognitive Process Framework

Seidel, Tina; Kosel, Christian; Böheim, Ricardo; Gartmeier, Martin; Berberat, Pascal O.

doi:10.3390/ime5020052

Open Access Review

by Tina Seidel ¹, Christian Kosel ¹,*, Ricardo Böheim ¹, Martin Gartmeier ² and Pascal O. Berberat ²

¹ Friedl Schöller Endowed Chair for Educational Psychology, Department of Educational Sciences, School of Social Sciences and Technology, Technical University Munich (TUM), 80333 Munich, Germany

² TUM Medical Education Center, School of Medicine and Health, Technical University Munich (TUM), 80333 Munich, Germany

* Author to whom correspondence should be addressed.

Int. Med. Educ. 2026, 5(2), 52; https://doi.org/10.3390/ime5020052

Submission received: 26 March 2026 / Revised: 13 May 2026 / Accepted: 14 May 2026 / Published: 22 May 2026

Abstract

Physicians routinely operate in environments that require the rapid processing of complex and dynamic visual information to diagnose patient conditions, communicate effectively, and make informed decisions. Despite the central role of visual attention in clinical practice, these processes are rarely conceptualized or systematically measured in medical education research. In other professional domains, such abilities are described as professional vision (PV): the situated capacity to selectively attend to relevant cues and interpret them in light of domain-specific knowledge. Although the term professional vision foregrounds visual attention, we use it here to cover the multimodal clinical perception in which visual cues are typically embedded (predominantly visual, but in many tasks also auditory and verbal), with visual attention as the analytic anchor. This paper introduces a cognitive process model of professional vision for medical education (PV-CP) that specifies the perceptual and cognitive subprocesses underlying how physicians perceive and interpret clinically relevant information. Building on this model, we propose a theory-driven framework for the measurement of professional vision using multimodal indicators. Central to our argument is the assumption that professional vision represents a latent, temporally unfolding construct that cannot be validly captured through single behavioral metrics or outcome measures. Instead, robust measurement requires the coordinated analysis of gaze-based indicators of visual attention and cognitive indicators of reasoning, each reflecting distinct subprocesses of professional vision. By systematically linking families of indicators to specific subprocesses and clarifying their respective inferential strengths and limitations, the PV-CP model advances a process-oriented approach to studying professional vision in medical education. The framework provides a conceptual basis for integrating multimodal data sources and supports more precise interpretations of gaze and reasoning data in expertise research. In doing so, the model contributes to the theoretical refinement of professional vision and offers a structured foundation for future empirical research and the design of learning environments aimed at fostering clinically relevant perceptual–cognitive skills.

Keywords:

professional vision; visual expertise; eye movement tracking; verbal reasoning; medical education

1. Introduction

Many professional tasks in medicine require intensive observational skills of perception and interpretation [1]. Noticing relevant information and interpreting it correctly when examining a patient, performing surgery, or reading X-ray images is key to the professional field [2]. In medicine, these observational skills are often linked to the interpretation of visual information, which frequently stems from continuously advancing medical imaging technology, such as MRI, photon-counting CT, 3D/4D imaging, hybrid imaging systems, and AI-enhanced diagnostic imaging [3,4]. These developments require constant adaptation of the observational skills to be acquired and trained. Observational skills are therefore an important determinant of high-quality diagnostic reasoning in medicine, which makes them an important objective for medical education [5]. In many other professional fields that share these kinds of observational skill requirements [6], these skills are referred to as professional vision skills [7].

The term professional vision was introduced by Charles Goodwin [8], who described professional practices from an ethnographic perspective. Professional practices include commonalities in the way professionals of one field visually code situations, highlight particularly relevant aspects, and link their visual observations to specific terminology in their verbal articulation. These professional practices are established, shared, and taught within a professional community and are, therefore, an important element of professional identity formation. Professional vision is an established characteristic of a professional community, and the community depends on being able to teach these skills to incoming members. It is therefore an important educational objective, and how to teach it well deserves careful study. In the following, based on the professional vision concept, the term professional vision skills is used to refer to the way medical professionals apply observation skills in situations that require intensive processing of complex and dynamic visual information.

The teaching of professional tasks in medicine often focuses on acquiring relevant clinical skills and domain-specific knowledge (e.g., diagnosing specific symptoms and teaching relevant content knowledge in relation to the symptoms) [9]. The specific observational skills required for the diagnostic reasoning process are only rarely taught explicitly [1]. Since these skills are, however, key to performing certain professional tasks correctly and are an important component of professional formation, there is a call to teach professional vision skills explicitly as well. A technology-driven approach to teaching professional vision skills is the use of eye-movement tracking (EMT), which can be combined with think-aloud verbal reasoning in a multimodal design to capture both perceptual and conceptual cognitive processes [10,11]. Based on EMT recordings and further reasoning data, practitioners and students can use this additional visual information to monitor, reflect upon, and improve their visual practices [6]. In other professional fields, professional vision measurement instruments and instructional approaches have been tested successfully and may hold potential for further improvements in medical education.

The aim of this paper is to introduce a more elaborated conception of professional vision than currently exists in medical education. What we develop here is, to our knowledge, the first cognitive process model of professional vision formulated specifically for the medical domain. While the broader concept originates in ethnographic work [8] and has been most extensively elaborated in the teaching context [10,11,12], the cognitive sub-processes posited in those frameworks have not been mapped onto the perceptual and reasoning demands of medical practice. Existing accounts of medical expertise, most prominently theories of clinical and diagnostic reasoning [13,14,15], describe how physicians integrate symptoms with knowledge schemas, but they treat the perceptual front-end of this integration largely as a black box. They specify what is reasoned about, not how clinically relevant information is selected from a complex visual scene in the first place. Conversely, eye-tracking research in radiology and surgery has produced a rich descriptive catalogue of gaze patterns [4,16,17,18] but typically operates without an explicit process model linking gaze indicators to interpretive reasoning. The PV-CP model proposed here addresses this gap: it adopts the noticing–reasoning distinction from professional-vision research on teaching situations as a starting architecture, but re-specifies its sub-processes for medical work and aligns each sub-process with measurable indicators tailored to clinical tasks.

To inform empirical and conceptual research in this area, we introduce a cognitive process model of professional vision, along with a set of empirically tested measurement indicators for key model components and processes. We argue that such a process model can advance the field by providing a thorough understanding of the cognitive and conceptual processes involved in professional vision formation. Training approaches in medical education can therefore be improved by linking training elements specifically to underlying visual learning processes and educational outcomes.

2. Modelling Professional Vision for the Medical Profession

2.1. What Makes Visual Work in Medicine Distinctive

Many tasks in the medical profession require intensive visual processing of complex and dynamic information [5,19]. Such tasks include examining a patient and reasoning from observed cues to underlying symptomatic patterns, interpreting visual images from sources such as X-ray, MRI, or EEG for diagnostic purposes, and visually controlling one’s own action during procedural work. Visual practices in medicine include zooming and highlighting specific regions, rotating images for perspective-taking, and labeling visual information units with professionally shared categorizations [19], which can be described in more general terms as visual coding, highlighting, and verbal articulation [8].

While these visual practices share features with other professional fields in which professional vision has been studied [20,21], four characteristics of medical visual work set it apart and need to be foregrounded before introducing a process model. Each of these characteristics differentially loads the perceptual and cognitive sub-processes specified in Figure 1, and a model that does not address them risks underestimating both the cognitive demands of medical visual work and the requirements for designing instructional support.

Diagnostic uncertainty under incomplete information. Clinical decisions are routinely made when the underlying condition is not yet known and may remain probabilistic even after examination. Visual cues are weighed against differential diagnoses rather than read off a determinate scene, schema activation is iterative and revisable as new information arrives, and schema-non-aligned cues carry informational weight because they may indicate the diagnostically critical departure from an expected pattern [21,22]. This feature places heavy demands on the comparison between encoded information and case- and schema-based expectations (Figure 1, center), and on the controlled triggering of more fine-grained search when alignment fails.

Procedural dynamics and perception-action coupling. Much of the relevant visual information in medicine is itself dynamic—anatomy in motion during ultrasound, the evolving surgical field, changing vital-sign traces, evolving symptoms during a patient interaction—rather than the more static or slowly changing scenes that dominate other professional-vision domains. In procedural tasks such as surgery and ultrasound, perception is additionally coupled to the physician’s own motor action, so noticing and reasoning unfold in a perception–action loop rather than as observation of an external scene. This loads relational and structural processing (organizing) and places stringent demands on the temporal coordination of noticing and reasoning.

Patient-safety consequences and time pressure. Many clinical decisions are made under time pressure with substantial consequences for patient safety, which constrains the duration available for deliberate visual search and elevates the importance of fast, schema-driven information selection [20,21]. The cost of missed cues is asymmetric: a failure to notice a clinically relevant signal can have consequences that no comparable failure carries in other professional-vision domains. This feature differentially loads the noticing sub-processes of information selection and breadth of visual field, and it shifts the balance between exploratory and focused viewing toward the latter.

Multimodal parallel information streams and communicative gaze. Medical tasks routinely involve multimodal information streams processed in parallel. A physician on a ward round simultaneously monitors a patient’s appearance and verbal report, vital-sign monitors, paper or electronic records, and the input of colleagues [23]; a bedside examination integrates visual inspection with auscultation and palpation. The patient–physician interaction adds a relational layer in which gaze itself functions communicatively: visual attention is regulated not only by diagnostic relevance but also by interpersonal and ethical considerations [23].

Taken together, this means that professional vision in medicine, as defined at the outset, is best understood as multimodal clinical perception anchored in visual attention rather than as a purely visual construct—a scope we make explicit in the measurement framework (Section 4.1 and Section 4.2), where gaze-based indicators capture the visual anchor and verbal indicators capture the reasoning that integrates information across modalities.

2.2. A Cognitive Process Model for the Demands of Medical Visual Work

The cognitive architecture that meets these demands rests on two coordinated components, noticing and reasoning [10,11,12]. In medical settings, physicians benefit strongly from skills of selective attention that allow them to focus on relevant information under time pressure and uncertainty—what is referred to as noticing skills [19]. Experienced medical professionals actively attend to relevant information while ignoring irrelevant information [20]. Activated professional schemas based on acquired professional dispositions affect their foveal processing, resulting in fast processing of relevant information. Activated professional schemas also affect parafoveal processing, extending the visual view of experienced medical professionals [16]—a mechanism particularly important when diagnostically relevant cues lie outside the current point of fixation, as is common in radiological images and during ward-round monitoring.

Because clinical decisions are made against differential diagnoses rather than a determinate scene, the encoding of sensory information is accompanied by an ongoing comparison of whether the information is aligned with the active cases and schemas. In case of alignment, fast information processing continues, based on relevant visual cues in relation to the activated cases and schemas [21]. In case of non-alignment, more fine-grained visual search processes take place, and the activated cases and schemas are adapted [10]. These interactions between activated schemas and foveal and parafoveal processing are depicted in Figure 1.

Important underlying mechanisms are described in theories of visual expertise. First, according to the information-reduction hypothesis [20], professionals with high visual expertise focus on task-relevant information and actively ignore task-irrelevant information—a mechanism that is particularly consequential under the safety-critical time pressure of medical work. Second, the holistic model of image perception [16] emphasizes that persons with high visual expertise apply a broad visual view, quickly form a first impression by holistically perceiving the situation, and immediately mark relevant cues for further processing. Overall, these two models emphasize that high visual expertise includes fast information processing of relevant information and a broad visual view [24]. These two sub-processes can be well linked to what is labelled as noticing in professional vision [10,11].

In addition, processed information is interpreted based on existing knowledge schemas (Figure 1). An important component of visual expertise is the expanded working memory, which allows visual experts to actively organize visual information into meaning units (chunks) and to build mental models by integrating professional knowledge [24]. In medical work, these chunks are not exclusively visual: a ward-round chunk may bundle a visual cue (the patient’s pallor), an auditory cue (the patient’s verbal report), and a numerical cue (a monitor value) into a single diagnostically meaningful unit. Observational cues are organized in meaningful chunks and integrated with knowledge schemas to reason accurately and to reach correct diagnostic conclusions [25,26]. These inferential processes from visual cues to underlying professional knowledge are key for clinical and diagnostic reasoning in medicine [13,23]. Experienced medical professionals bundle organized sections of the ongoing visual information stream into meaningful chunks and label chunks with adequate, professionally agreed-upon terminology, allowing them to integrate processed information with professional knowledge schemas. They quickly link visual cues with individual case-based knowledge and may therefore find more varying ways of linking visual chunks with knowledge schemas for mental model building [21]; when required, they are also able to verbalize their reasoning on how they organized and integrated visual information [1,5].

In professional vision research, the two described major components of selective attention and knowledge integration are referred to as the noticing and reasoning components of professional vision [19]. Based on visual expertise research, noticing can be constituted by two sub-processes: fast information selection and broad visual perspective-taking. The two sub-processes of organizing and integrating constitute the reasoning component [10]. These reasoning processes are highly relevant for accurate medical diagnosis and decision making as an important result of successful professional vision processes. Effective and efficient processes of noticing and reasoning are expressed by the term advanced professional vision. The components and processes of the resulting professional vision cognitive processing model are depicted in Figure 1.

The PV-CP model is proposed as a common cognitive architecture across medical visual tasks rather than as a domain-general or fully task-specific framework. The four sub-processes—information selection, breadth of visual field, organizing, and integrating—are posited to be active across radiology, surgery, bedside physical examination, and ward rounds, but the weighting and observable expression of these sub-processes differ by task. In radiology, information selection on static or quasi-static images and breadth of visual field (foveal–parafoveal coordination, holistic first impression) dominate, while organizing operates over spatially distributed image regions [16,18,27]. In surgery and other procedural work, organizing and integrating unfold within a tight perception–action loop, and the relevant breadth of field includes the surgeon’s own instruments and the patient anatomy in motion. In bedside physical examination, multimodal cue integration and schema-driven anomaly detection are central, and the visual scene is co-constructed with the patient’s behavior. In ward rounds, organizing across heterogeneous information sources—patient, monitor, record, colleague—and the communicative regulation of gaze are particularly loaded [23]. The model thus offers a shared vocabulary and a shared sub-process structure for studying professional vision across these contexts, while leaving the relative weight of each sub-process open to task-specific specification.

2.3. Determinants and Outcomes of Professional Vision Skills

Professional vision is highlighted as an important situation-specific skill when modeling competence development in professional fields [28]. Competence development is conceptualized as a continuum that differentiates three major elements: professional dispositions as prerequisites, situation-specific skills, and performance. According to this model, highly relevant situation-specific skills are perception, interpretation, and decision-making. Perception and interpretation in particular can be labelled as the noticing and reasoning components of professional vision, which form the basis for diagnostic reasoning and decision making [12,29]. These are, in turn, important determinants of professional performance [30].

Professional dispositions have been studied intensively in the light of professional competence development [31]. Professional competences, in this sense, include various cognitive aspects such as content knowledge relevant for the profession, analytic reasoning, and critical thinking. In addition, meta-cognitive and motivational aspects serve as important dispositions, including strategies for self-monitoring and regulation, self-efficacy, and goal orientations [32]. These dispositions are highly intertwined, since cognitive, meta-cognitive, and motivational dispositions are acquired hand in hand [28]. Regarding professional vision skills, a substantial body of research has shown that acquired content knowledge as a cognitive disposition affects the way professionals perceive and interpret situations in a professionally informed way [33]. In addition, goals as a motivational disposition shape professional vision skills and influence which aspects are attended to and selected for further visual processing and sense-making [34]. Based on these findings, professional vision skills can serve as a fairly robust indicator of whether professional dispositions have been acquired and to what extent they can be applied in real time to process information in professionally relevant situations. For this reason, professional vision is measured regularly in some higher education programs to formatively assess ongoing professional vision formation over time [35,36].

Students in medical education are, in most countries, a highly selected group, with comparably high cognitive dispositions as well as high motivation and strong self-regulation skills [37,38,39]. They are therefore an interesting target group for studying how professional vision skills are acquired and formed at various points of a study program. This student group might benefit particularly strongly from professional vision training interventions, since they typically start study programs with exceptionally high cognitive and motivational prerequisites [40]. Such prerequisites have been shown to be particularly conducive to high learning gains in professional vision training interventions [41].

Beyond the relationship between dispositions and professional vision skills, the outcomes of advanced professional vision skills have also been studied. This research has shown that advanced professional vision skills are systematically linked to better diagnostic accuracy [41,42], an awareness of challenges and possible threats in particular professional situations [43], a focus on individuals and their learning [44], and better professional performance (e.g., in teaching quality) [33].

3. Measuring Professional Vision Skills in the Medical Field

Professional vision constitutes a latent, process-oriented construct that cannot be directly observed but must be inferred from measurable indicators that reflect underlying perceptual and cognitive processes [25]. In line with the PV-CP model (Figure 1), professional vision unfolds dynamically as professionals selectively attend to visual information, organize this information into meaningful units, and integrate it with domain-specific knowledge to support diagnostic reasoning and decision making.

From a measurement perspective, this implies that professional vision cannot be operationalized through single, isolated indicators. Instead, valid measurement requires a theory-driven mapping between specific professional vision subprocesses and empirically measurable indicators, while explicitly taking into account the inferential limits of each indicator. In the following, we outline a measurement framework that differentiates between measurement modalities and specifies their sensitivity to professional vision subprocesses described in the PV-CP model.

Applying Noticing and Reasoning Measurements Based on the PV-CP Model

Eye tracking provides continuous, high-resolution temporal data on visual sampling behavior and is therefore particularly suited for capturing perceptual and attentional subprocesses of professional vision. In the last decade, noticing-related processes have increasingly been operationalized and investigated using eye-tracking methodologies, as these allow researchers to trace the moment-to-moment allocation of visual attention during professional activity. In medical education research, eye tracking has been used to study visual expertise in radiology, surgery, and clinical observation [4,16,44].

Within the PV-CP model, eye-tracking measures are primarily sensitive to noticing subprocesses, including:

Information selection, reflected in fixation-based measures such as fixation count, fixation duration, or dwell time on diagnostically relevant areas of interest (AOIs).
Relational and structural processing, reflected in transition-based measures and scanpath characteristics that capture how visual elements are sequentially connected over time.
Strategic organization of viewing behavior, reflected in global distributional measures such as entropy or dispersion, which indicate the degree to which visual sampling is focused, exploratory, or systematically organized.

It is important to note that eye movements do not constitute direct measures of attention or cognition. Gaze behavior provides probabilistic evidence for underlying attentional allocation, and even this inference rests on auxiliary assumptions (most centrally that foveal vision is required for detailed information uptake [45]) that hold imperfectly in dynamic clinical scenes where parafoveal and covert attention can be substantial [16,24]. Long fixations may reflect deep processing, encoding difficulty, hesitation, or simple disengagement; high transition frequencies may indicate skilled relational integration or fragmented search. Disambiguating these competing readings requires either converging evidence from multimodal data such as additional verbal data (see Section 4.2) or strong task-design constraints. We therefore treat gaze indicators throughout this paper as process-level proxies whose interpretation is theoretically substantiated by the PV-CP model rather than as direct windows into cognition.

A further scope clarification concerns the level at which the measurement framework is intended to operate. The indicator families described in the following sections are, at the current state of evidence, primarily suited for group-level research—for example, comparing experts and novices, evaluating instructional interventions, or characterising sub-process profiles across clinical tasks. Individual-level use, whether for formative feedback or summative assessment of a single learner’s professional vision, places substantially stronger demands on indicator reliability than has so far been established for most gaze- and reasoning-based metrics. Reported reliabilities for fixation-, transition-, and entropy-based measures vary across studies, AOI schemes, and analytic choices (see notes to Table 1), and generalisability across cases of comparable difficulty has rarely been quantified for medical PV indicators. We therefore treat individual-level use as an aspirational application contingent on generalisability-theory analyses that estimate, for each indicator family, the number of cases and the task structure required for defensible learner-level inferences—an open empirical question we return to in Section 5.3 (Research Question 5). Until such evidence is available, individual-level interpretations should be triangulated across indicator families rather than rest on a single metric, and they should be made with explicit acknowledgement of measurement error.
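To illustrate what such a generalisability-theory analysis would estimate, the following minimal sketch computes a relative G coefficient and the number of cases needed to reach a target value. It assumes the simplest possible design, learners fully crossed with cases, so that the coefficient is the learner variance divided by the learner variance plus the residual variance averaged over cases. The function names, the target of 0.7, and the variance components are hypothetical illustrations, not estimates from any study.

```python
def g_coefficient(var_person, var_residual, n_cases):
    """Relative G coefficient for a learners-by-cases design.

    var_person:   variance attributable to stable learner differences
    var_residual: learner-by-case interaction confounded with error
    n_cases:      number of cases over which the indicator is averaged
    """
    return var_person / (var_person + var_residual / n_cases)


def cases_needed(var_person, var_residual, target=0.7, max_cases=1000):
    """Smallest number of cases whose coefficient reaches the target.

    Returns None if the target is not reached within max_cases.
    """
    for n in range(1, max_cases + 1):
        if g_coefficient(var_person, var_residual, n) >= target:
            return n
    return None


# Hypothetical variance components for one gaze indicator (illustration only).
VAR_LEARNER = 0.5
VAR_RESIDUAL = 2.0

single_case = g_coefficient(VAR_LEARNER, VAR_RESIDUAL, n_cases=1)
required = cases_needed(VAR_LEARNER, VAR_RESIDUAL, target=0.7)
print(f"G with one case: {single_case:.2f}; cases needed for 0.70: {required}")
```

Under these assumed components a single case yields a coefficient of 0.20, which illustrates why learner-level inferences from one recording are hard to defend. Real designs would typically add further facets such as raters or AOI schemes.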

4. A Multimodal Measurement Framework for Professional Vision

4.1. Gaze-Based Indicators of PV Subprocesses

This section outlines gaze-based indicators for the noticing-related subprocesses specified in the PV-CP model and summarizes their operationalization in Table 1.

4.1.1. Fixation-Based Measures: Information Selection and Cue Prioritization

Fixation-based measures constitute the most widely used class of gaze indicators in medical eye-tracking research. Typical metrics include fixation count, mean fixation duration, total dwell time, and proportion of fixations within predefined AOIs. Within the PV-CP framework, fixation-based measures are primarily sensitive to information selection, that is, the extent to which professionals allocate foveal processing resources to diagnostically or instructionally relevant cues. According to the information-reduction hypothesis [20], visual experts selectively attend to task-relevant information while actively ignoring irrelevant input. Empirical evidence from medical domains supports this assumption: experienced radiologists, for example, allocate a larger proportion of fixations to diagnostically relevant regions while spending less time on visually salient but irrelevant areas [16,42]. Further, especially in clinical image interpretation tasks (e.g., X-ray or MRI analysis), fixation density on diagnostically critical regions has been shown to differentiate experts from novices and to predict diagnostic accuracy [18,45]. Similarly, in ward-round situations, fixation-based indicators can capture whether physicians selectively attend to patients, monitors, or documentation at diagnostically relevant moments, thereby providing evidence for professionally guided information selection [1].

However, fixation-based measures alone cannot capture how visual information is related, interpreted, or integrated. Longer fixation durations may indicate deeper processing, uncertainty, or inefficient search, depending on task context and expertise level [46].
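The fixation-based metrics named above can be made concrete with a minimal sketch. It assumes that fixations have already been detected by the eye-tracking software and that the AOI is a static, axis-aligned rectangle; the class names, coordinates, and durations are hypothetical, and total dwell time is simplified to the summed duration of fixations inside the AOI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    x: float            # horizontal gaze position in pixels
    y: float            # vertical gaze position in pixels
    duration_ms: float  # fixation duration in milliseconds


@dataclass(frozen=True)
class RectAOI:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fix):
        """True if the fixation falls inside the rectangle (borders included)."""
        return (self.left <= fix.x <= self.right
                and self.top <= fix.y <= self.bottom)


def fixation_metrics(fixations, aoi):
    """Fixation count, mean duration, dwell time, and proportions for one AOI."""
    inside = [f for f in fixations if aoi.contains(f)]
    total_time = sum(f.duration_ms for f in fixations)
    dwell = sum(f.duration_ms for f in inside)
    return {
        "fixation_count": len(inside),
        "mean_fixation_duration_ms": dwell / len(inside) if inside else 0.0,
        "dwell_time_ms": dwell,
        "fixation_proportion": len(inside) / len(fixations) if fixations else 0.0,
        "dwell_proportion": dwell / total_time if total_time else 0.0,
    }


# Hypothetical fixations on a single image and a hypothetical "lesion" AOI.
fixations = [
    Fixation(100, 100, 200),
    Fixation(120, 110, 300),
    Fixation(400, 400, 500),
    Fixation(130, 90, 100),
]
lesion = RectAOI("lesion", left=50, top=50, right=200, bottom=200)
metrics = fixation_metrics(fixations, lesion)
print(metrics)
```

As noted above, none of these numbers is interpretable on its own: the same dwell proportion can arise from targeted inspection or from inefficient search.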

4.1.2. Transition- and Scanpath-Based Measures: Relational and Structural Processing

Beyond isolated fixations, professional vision is characterized by the ability to relationally connect visual information across space and time. Transition-based and scanpath measures capture sequential gaze movements between AOIs and are, therefore, sensitive to relational processing subprocesses. Metrics in this family include transition frequencies, transition probabilities, scanpath similarity indices, and graph-based representations of gaze sequences [46]. These measures provide insight into how professionals structure visual information and whether they follow diagnostically meaningful viewing patterns.

In medical expertise research, transition-based measures have been shown to reflect knowledge-driven viewing strategies. For example, expert radiologists exhibit more systematic transitions between diagnostically linked regions, whereas novices display more fragmented and stimulus-driven scanpaths [16,41]. In dynamic medical scenarios, such as simulated ward rounds, structured transitions between patient cues (e.g., facial expression, posture) and technical information (e.g., monitors, charts) indicate the coordination of multiple information sources [23,47].

Within the PV-CP framework, such relational gaze patterns provide evidence for early organization processes that precede explicit reasoning. They indicate whether visual cues are processed in isolation or integrated into coherent perceptual structures that support subsequent interpretation.
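Transition frequencies and transition probabilities can be derived from a sequence of AOI labels, one label per fixation. The following sketch is one simple way to do this, under two assumptions that are analytic choices rather than fixed conventions: fixations outside all AOIs (coded as None) are dropped before counting, and consecutive fixations within the same AOI are not counted as transitions. The AOI names and the sequence are hypothetical.

```python
from collections import Counter


def transition_counts(aoi_sequence):
    """Count gaze shifts between different AOIs.

    Fixations outside all AOIs (None) are removed first, so a shift
    that passes through empty space is treated as a direct transition.
    """
    visited = [aoi for aoi in aoi_sequence if aoi is not None]
    counts = Counter()
    for src, dst in zip(visited, visited[1:]):
        if src != dst:  # refixations within one AOI are not transitions
            counts[(src, dst)] += 1
    return counts


def transition_probabilities(counts):
    """Convert counts to probabilities conditional on the source AOI."""
    row_totals = Counter()
    for (src, _), n in counts.items():
        row_totals[src] += n
    return {pair: n / row_totals[pair[0]] for pair, n in counts.items()}


# Hypothetical AOI sequence from a simulated ward round.
sequence = ["patient", "patient", "monitor", "patient",
            "chart", "monitor", None, "patient"]
counts = transition_counts(sequence)
probs = transition_probabilities(counts)
for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

Both choices affect the resulting matrix, which is one reason why transition-based results are difficult to compare across studies unless the coding rules are reported.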

4.1.3. Entropy and Variability Measures: Strategic Organization of Visual Exploration

Entropy- and variability-based measures capture the global organization of gaze behavior and reflect how systematically visual information is sampled across the scene. Common metrics include spatial entropy, transition entropy, gaze dispersion, and recurrence measures [46,48,49]. From a professional vision perspective, these measures are sensitive to strategic organization subprocesses. Lower entropy values may indicate focused, goal-directed viewing, whereas higher entropy values may reflect exploratory or less organized scanning. Research on visual expertise suggests that experts often show adaptive regulation of gaze variability, combining an initially broad perceptual overview with subsequent focused inspection of relevant regions [24,44].

In medical education contexts, entropy-based measures have been used to differentiate novice and expert viewing strategies and to capture changes in gaze organization during learning and training interventions [41]. For example, during the interpretation of complex diagnostic images, experienced physicians tend to display structured reductions in entropy over time, reflecting the progressive narrowing of diagnostic hypotheses [27,50].

Within the PV-CP model, entropy-based measures thus provide information about the regulation of visual exploration over time, that is, about how systematically visual attention is distributed across a scene. Importantly, entropy does not index the correctness or semantic quality of visual processing, but rather the degree of organization and strategic control with which visual exploration is conducted under given task demands.
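To make the entropy indicators more concrete, the sketch below computes two quantities from a sequence of AOI labels using Shannon entropy in bits. It is one possible formulation and is not prescribed by the PV-CP model: the "stationary" entropy is estimated here simply from the proportion of fixations per AOI, and the transition entropy is the conditional entropy of the next AOI given the current one. Function names and example sequences are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def stationary_entropy(aoi_sequence):
    """Entropy of the distribution of fixations over AOIs."""
    counts = Counter(aoi_sequence)
    n = len(aoi_sequence)
    return shannon_entropy(c / n for c in counts.values())


def transition_entropy(aoi_sequence):
    """Conditional entropy of the next AOI given the current AOI."""
    pair_counts = Counter(zip(aoi_sequence, aoi_sequence[1:]))
    source_counts = Counter(src for src, _ in pair_counts.elements())
    total = sum(pair_counts.values())
    entropy = 0.0
    for src, n_src in source_counts.items():
        # Outgoing transition probabilities for this source AOI.
        row = [n / n_src for (s, _), n in pair_counts.items() if s == src]
        # Weight each row by how often the source AOI starts a transition.
        entropy += (n_src / total) * shannon_entropy(row)
    return entropy


focused = ["lesion"] * 8                       # gaze stays on one region
spread = ["a", "b", "c", "d"] * 2              # gaze spread evenly over four AOIs
h_focused = stationary_entropy(focused)        # 0 bits
h_spread = stationary_entropy(spread)          # 2 bits (four equally likely AOIs)
h_transitions = transition_entropy(["a", "b", "a", "c", "a", "b", "a", "c", "a"])
```

Computing these values in successive time windows, rather than once per trial, would be one way to examine the reductions in entropy over time that are described above.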

A further methodological caveat applies across the indicator families described above. Fixation-, transition-, and entropy-based measures all presuppose that the relevant regions of the visual scene have been delineated in advance. In static images such as a single radiograph this is tractable, although still theory-laden, but in dynamic clinical scenarios—ward rounds, surgical procedures, ultrasound examinations, or simulated patient interactions—AOIs must either be tracked frame by frame or defined on moving objects, and small differences in AOI boundaries can produce sizeable differences in derived metrics. Reported reliabilities of AOI-based measures therefore depend not only on the eye-tracker and analysis pipeline but on the explicitness and inter-rater stability of the AOI scheme itself. We recommend that medical eye-tracking studies report AOI definition procedures, inter-coder agreement on AOI boundaries where applicable, and sensitivity analyses examining how key conclusions shift under reasonable variations in AOI definition.
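The recommended sensitivity analysis can be sketched as follows. The example assumes a static stimulus, a rectangular AOI given in pixel coordinates, and fixation positions as (x, y) pairs; it then recomputes one metric while the AOI boundary is enlarged by different margins. The coordinates, margins, and function names are hypothetical and serve only to show the logic.

```python
def in_aoi(point, aoi, margin=0.0):
    """Return True if a fixation falls inside a rectangular AOI enlarged by `margin` pixels."""
    x, y = point
    left, top, right, bottom = aoi
    return (left - margin) <= x <= (right + margin) and (top - margin) <= y <= (bottom + margin)


def proportion_in_aoi(fixations, aoi, margin=0.0):
    """Proportion of fixations that land inside the (possibly enlarged) AOI."""
    if not fixations:
        return 0.0
    hits = sum(in_aoi(p, aoi, margin) for p in fixations)
    return hits / len(fixations)


def aoi_sensitivity(fixations, aoi, margins):
    """Recompute the AOI metric for each candidate margin."""
    return {m: proportion_in_aoi(fixations, aoi, m) for m in margins}


# Illustrative data: one AOI and five fixations, several of them near its border.
aoi = (100, 100, 200, 200)  # left, top, right, bottom
fixations = [(150, 150), (205, 150), (95, 190), (230, 150), (400, 400)]
result = aoi_sensitivity(fixations, aoi, margins=[0, 10, 30])
```

With these illustrative values, the proportion of fixations attributed to the AOI rises from 0.2 to 0.6 to 0.8 as the margin grows from 0 to 10 to 30 pixels. Reporting such a range alongside the headline metric shows readers how far a conclusion depends on where the AOI boundary was drawn.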

Gaze-based indicators (see Table 1) primarily index noticing-related subprocesses by revealing how visual information is sampled and organized over time, but they reveal little about how this information is interpreted. Verbal reasoning indicators compensate for this limitation by providing evidence for reasoning subprocesses, including cue interpretation, hypothesis articulation, and causal integration. Within the PV-CP framework, inferences about professional vision rely on the correspondence between noticing-related gaze indicators and reasoning-related verbal indicators. This correspondence allows researchers to distinguish between superficial noticing and professionally grounded interpretation.

4.2. Verbal Indicators of PV Subprocesses

Whereas the gaze-based indicators outlined above provide access to the perceptual–attentional subprocesses of the PV-CP model, verbal indicators capture interpretative and knowledge-based subprocesses [51,52]. Verbalizations do not reflect perception itself; rather, they indicate the meaning-making processes through which perceived information is interpreted and integrated on the basis of knowledge schemas. Consequently, verbal data are particularly informative for the reasoning component of professional vision within the PV-CP model.

Conceptually, and of particular relevance for PV research in medicine, verbal indicators allow researchers to distinguish between qualitatively different forms of professional vision that may not be separable on the basis of gaze behavior alone. In particular, verbalizations enable differentiation between noticing without understanding (that is, perceptual registration of relevant cues without appropriate interpretation) and noticing accompanied by diagnostically correct reasoning, in which visual information is coherently integrated into knowledge-based professional explanations.

Within the PV-CP framework, verbal indicators are primarily sensitive to subprocesses of knowledge integration, hypothesis generation, and causal reasoning (Figure 1). They provide evidence for how professionals:

justify why specific visual cues are considered relevant,
articulate diagnostic or explanatory hypotheses for predictions and decision making,
and construct causal chains linking observed cues to underlying conditions or decisions.

Importantly, verbal indicators reflect explicit, reportable representations of reasoning. As such, they capture aspects of professional vision that are accessible to conscious reflection and verbal articulation, but they do not provide direct insight into pre-attentive or perceptual selection processes. This asymmetry underscores the need for multimodal measurement: verbal data complement, but do not replace, gaze-based indicators.

4.3. Indicator Families and Analytical Approaches

4.3.1. Cue Justification and Relevance Attribution

Cue justification refers to explicit statements in which professionals explain why a particular visual cue is considered diagnostically or professionally relevant. Typical indicators include references to symptom significance, deviations from expected patterns, or contrasts between normal and abnormal observations. In professional vision research, cue justification is a central indicator of knowledge-based interpretation. It differentiates between superficial cue naming and professionally grounded relevance attribution [40,52]. In medical contexts, cue justification allows researchers to assess whether physicians merely notice a cue (e.g., facial pallor, abnormal imaging region) or correctly relate it to underlying pathophysiological mechanisms [14,53].

4.3.2. Hypothesis Articulation

Hypothesis articulation captures whether and how professionals generate diagnostic or explanatory hypotheses that organize observed information. Verbal indicators include explicit diagnostic labels, conditional statements (e.g., “this could indicate…”), and probabilistic reasoning about alternative explanations. Within the PV-CP model, hypothesis articulation reflects integrative reasoning subprocesses, as hypotheses serve as organizing structures that guide further information selection and interpretation. Empirical research shows that expert performance is characterized not by a greater number of hypotheses, but by earlier generation of diagnostically plausible hypotheses and more efficient pruning of alternatives [14,22].

4.3.3. Causal Explanations and Coherence Building

Causal explanations refer to verbalizations that explicitly link observed cues to underlying mechanisms, temporal developments, or consequences. These explanations may involve physiological, psychological, or situational causal chains. Such indicators are particularly informative for assessing coherence formation, that is, the extent to which visual information is integrated into a structured mental model rather than processed as isolated facts. In medical education, the presence of coherent causal explanations has been shown to distinguish expert from novice reasoning even when surface-level cue identification is comparable [13,53].

4.3.4. Structural and Network-Based Representations

Beyond content-focused indicators, recent approaches conceptualize verbal data as structured representations that can be analyzed using network-analytic or relational methods. Concept maps, causal networks, or argumentation graphs model how cues, hypotheses, and explanations are interconnected within an individual’s reasoning structure. Within the PV-CP framework, such approaches could provide insight into the organization and integration of professional knowledge during interpretation. Network-based indicators capture not only which elements are mentioned, but how they are related, offering a complementary perspective to gaze-based relational measures. Although methodologically demanding, these approaches hold particular promise for studying advanced professional vision in complex medical tasks [15].
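A minimal sketch of such a network representation is given below. It assumes that a think-aloud protocol has already been coded into a set of mentioned elements (cues, hypotheses, decisions) and a set of directed links between them; the coding scheme, the element names, and the two indicators are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_reasoning_graph(elements, links):
    """Build a directed graph: each mentioned element maps to the elements it is linked to."""
    graph = defaultdict(set)
    for element in elements:
        graph[element]  # register the element even if it has no links
    for source, target in links:
        graph[source].add(target)
        graph[target]  # make sure link targets are registered as nodes
    return dict(graph)


def density(graph):
    """Share of all possible directed links that were actually verbalized."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(targets) for targets in graph.values())
    return edges / (n * (n - 1))


def isolated_elements(graph):
    """Elements that were mentioned but never linked to anything else."""
    linked = {src for src, targets in graph.items() if targets}
    linked |= {dst for targets in graph.values() for dst in targets}
    return sorted(set(graph) - linked)


# Illustrative coded protocol: three cues, one hypothesis, one decision.
elements = ["pallor", "tachycardia", "monitor alarm", "anaemia", "order blood count"]
links = [("pallor", "anaemia"), ("tachycardia", "anaemia"), ("anaemia", "order blood count")]
graph = build_reasoning_graph(elements, links)
unlinked = isolated_elements(graph)
```

In this example, the monitor alarm is mentioned but never connected to a hypothesis or decision. An element that is named but left unlinked is the verbal counterpart of a cue that was fixated but not integrated.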

5. Implications for Research in Medical Education

Visual processing constitutes a core component of professional practice in medicine. Physicians are continuously required to perceive, select, and interpret complex information, ranging from patient data to multimodal information streams (e.g., electrocardiogram), during clinical interactions. Because of its importance, visual processing has received considerable attention in research on expertise differences and expertise development. It has, however, not yet been targeted as an explicit educational objective. The present paper addresses this gap by conceptualizing relevant visual skills under the framework of professional vision based on a cognitive process model (PV-CP) tailored to the specific demands of medical education.

5.1. Implications for Research on Visual Processing and Expertise Development

First, the professional vision cognitive processing model (PV-CP) provides a principled framework for advancing research on visual processing and expertise differences in medical education beyond descriptive expert–novice comparisons. By explicitly distinguishing between noticing-related and reasoning-related subprocesses, the model enables researchers to formulate process-specific research questions about how visual expertise is constituted in different medical tasks.

A central implication is that differences between novices and experts can be examined not only in terms of outcomes or aggregate gaze metrics, but in terms of distinct configurations of subprocesses. For example, expertise differences may manifest as more efficient information selection, earlier relational organization of cues, or more coherent integration of visual information with domain knowledge—each of which can be investigated separately and in combination. This allows researchers to identify which subprocesses are task-critical, which develop earlier or later in training, and which are most predictive of diagnostic or communicative success.

Moreover, the PV-CP model supports research on the context sensitivity of visual expertise. Rather than assuming a unitary form of visual expertise across medical domains, the model invites systematic investigation of how different application contexts (such as static image interpretation, dynamic ward-round situations, or patient–physician interactions) activate different subsets and weightings of professional vision subprocesses. This makes it possible to test empirically the domain-general versus domain-specific validity of professional vision processes within medical education.

5.2. Implications for the Design and Evaluation of Learning Environments

Second, the PV-CP model and the associated measurement framework have direct implications for the design of learning environments in medical education. Simulation-based learning environments, virtual patients, and video-based cases increasingly constitute central instructional formats in medical training [54,55]. However, these learning environments often focus primarily on performance outcomes, while leaving the underlying visual and cognitive processes implicit.

By explicitly modeling professional vision subprocesses, instructional designers can target and scaffold specific components of visual expertise, such as information selection, relational processing, or coherence formation. For example, simulations can be designed to systematically manipulate visual complexity, cue salience, or temporal dynamics in order to elicit and train specific noticing strategies [1,56]. Likewise, instructional prompts and feedback can be aligned with reasoning subprocesses, supporting learners in articulating cue justifications, generating hypotheses, and constructing coherent causal explanations.

Two concrete examples illustrate how this can be operationalized. (a) A ward-round simulation using a high-fidelity manikin and a confederate “patient” can systematically vary whether a key clinical cue (say, subtle peripheral cyanosis) is presented alone, alongside a salient but irrelevant cue, or embedded in a noisy multimodal display including monitor alarms and chart entries. Wearable eye tracking captures whether the learner allocates fixations to the diagnostically critical region; a brief structured think-aloud at the end of each scenario captures whether the cue, once fixated, was correctly interpreted. The instructional manipulation is not the case content but the visual–cognitive demand profile, and the dependent measures are PV subprocess indicators rather than diagnostic accuracy alone. (b) An eye-movement modelling example (EMME) in radiology presents novices with a chest CT and overlays the gaze trajectory of an expert who has annotated key cues with brief verbal justifications [5]. After viewing, learners perform a transfer case; instructional feedback is targeted at the subprocess where their performance diverged from the expert, e.g., a delayed first fixation on the relevant region (information selection), an unsystematic scanpath (relational organization), or a fixated-but-unjustified cue (knowledge integration). In both designs, the PV-CP model functions as a diagnostic framework for the learner’s process, allowing instructional support to be targeted rather than generic.

Importantly, the process-oriented measurement approaches outlined in this paper also enable formative evaluation of learning environments [54,57]. Rather than relying solely on outcome measures such as diagnostic accuracy, researchers and educators can assess whether an instructional intervention changes the processes through which learners perceive and interpret medical information. This opens avenues for adaptive and personalized learning designs, in which instructional support is dynamically adjusted based on learners’ professional vision profiles.

5.3. Implications for Professional Competence Development

Third, the present framework has implications for research on professional competence development in medical education. Professional competence development encompasses the gradual integration of cognitive, motivational-affective, and metacognitive dispositions that characterize competent professionals [31]. Within this perspective, professional vision skills represent a situated and observable manifestation of how professional dispositions are enacted in real-time practice. Changes in how medical students and physicians attend to patients, prioritize information, and justify their interpretations reflect not only cognitive growth, but also shifts in professional goals, responsibilities, and normative orientations. Longitudinal research that combines professional vision indicators with measures of motivation, self-concept, and reflective capacity may eventually contribute to an enriched understanding of professional vision in light of professional identity formation.

Building on these developmental considerations, the PV-CP framework opens a tractable empirical agenda for medical education research. We highlight five questions that, in our view, are both feasible with current methodology and consequential for the field. First, do combined gaze and verbal-reasoning profiles, derived from standardized PV tasks, predict performance on Objective Structured Clinical Examinations (OSCEs) or in vivo diagnostic accuracy beyond what is predicted by knowledge tests alone? Demonstrating such incremental validity would establish PV as a distinct competence component rather than a redundant proxy for knowledge. Second, can professional vision be trained longitudinally, and which subprocesses are most malleable? Designs that track noticing and reasoning indicators across the medical curriculum (ideally at matched task points in year 1, year 3, and the final year) would clarify whether PV develops gradually with case exposure, in stepwise fashion at clerkship transitions, or only with deliberate practice. Third, which PV subprocesses most reliably distinguish residents from attending physicians on the same cases? Existing expert–novice contrasts compare populations that differ on many dimensions; resident–attending contrasts on identical material would isolate the subprocesses that change with the final, slowest-acquired layers of expertise. Fourth, do EMT-guided feedback interventions (for example, showing learners their own gaze contrasted with an expert reference) produce transfer to authentic patient-care tasks rather than only to similar laboratory cases? Fifth, how stable are PV indicators across cases of comparable difficulty and across modalities (static images, dynamic scenes, patient encounters)? Generalizability-theory analyses are needed to estimate how many cases are required for a defensible inference at the individual learner level, which is the precondition for any formative or summative use of PV measures.

6. Conclusions

This paper advances research in medical education by fostering a more elaborate conception of professional vision and by specifying underlying perceptual and cognitive subprocesses within the PV-CP model. By integrating insights from expertise research with a process-oriented, multimodal measurement framework, we argue that professional vision constitutes a latent, temporally unfolding construct that cannot be adequately captured through single measurements and performance outcomes alone. Instead, valid inferences require the alignment of gaze-based indicators of noticing with verbal indicators of reasoning. Beyond its conceptual contribution, the PV-CP model provides a coherent foundation for future research on visual expertise, the design and evaluation of simulation-based learning environments, and the study of professional development in medicine. Making professional vision an explicit target of research and instruction thus holds promise for advancing both theory and practice in medical education.

Author Contributions

Conceptualization, T.S. and C.K.; methodology, T.S., C.K. and R.B.; writing—original draft preparation, T.S., C.K.; writing—review and editing, T.S., C.K., R.B., M.G. and P.O.B.; supervision, T.S. and P.O.B. All authors have read and agreed to the published version of the manuscript.

Funding

Research is funded by the German Research Foundation (DFG) in the Collaborative Research Center (CRC) 419 SHARP, Funding Nr. INST 86/2324-1.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Jarodzka, H.; Scheiter, K.; Gerjets, P.; van Gog, T. In the eyes of the beholder: How experts and novices interpret dynamic stimuli. Learn. Instr. 2010, 20, 146–154.
2. Waite, S.; Grigorian, A.; Alexander, R.G.; Macknik, S.L.; Carrasco, M.; Heeger, D.J.; Martinez-Conde, S. Analysis of Perceptual Expertise in Radiology–Current Knowledge and a New Perspective. Front. Hum. Neurosci. 2019, 13, 213.
3. Blake, Y. Better Watch the Invisible: A Systematic Review of Technological Breakthroughts in the 2025 Medical Imaging Landscape; Zenodo (CERN European Organization For Nuclear Research): Geneva, Switzerland, 2025.
4. Brunyé, T.T.; Drew, T.; Weaver, D.L.; Elmore, J.G. A review of eye tracking for understanding and improving diagnostic interpretation. Cogn. Res. Princ. Implic. 2019, 4, 7.
5. Jarodzka, H.; Balslev, T.; Holmqvist, K.; Nyström, M.; Scheiter, K.; Gerjets, P.; Eika, B. Conveying clinical reasoning based on visual observation via eye-movement modelling examples. Instr. Sci. 2012, 40, 813–827.
6. Sablić, M.; Mirosavljević, A.; Škugor, A. Video-Based Learning (VBL)—Past, Present and Future: An Overview of the Research Published from 2008 to 2019. Technol. Knowl. Learn. 2020, 26, 1061–1077.
7. Gaudin, C.; Chaliès, S. Video viewing in teacher education and professional development: A literature review. Educ. Res. Rev. 2015, 16, 41–67.
8. Goodwin, C. Professional Vision. Am. Anthropol. 1994, 96, 606–633.
9. Wilson, I.; Cowin, L.S.; Johnson, M.; Young, H. Professional identity in medical students: Pedagogical challenges to medical education. Teach. Learn. Med. 2013, 25, 369–373.
10. Seidel, T.; Böheim, R.; Kosel, C. Developing a cognitive model of advanced teacher professional vision for understanding processes of noticing and reasoning. Front. Psychol. 2025; in review.
11. Seidel, T.; Kosel, C.; Böheim, R.; Gegenfurtner, A.; Stürmer, K. A cognitive perspective on teachers’ professional vision: How teachers’ professional knowledge shapes a professional vision. In Teacher Professional Vision: Theoretical and Methodological Advances; Gegenfurtner, A., Stahnke, R., Eds.; Routledge: London, UK, 2025; pp. 43–56.
12. Seidel, T.; Stürmer, K. Modeling the structure of professional vision in pre-service teachers. Am. Educ. Res. J. 2014, 51, 739–771.
13. Fischer, F.; Kollar, I.; Ufer, S.; Sodian, B.; Hussmann, H.; Pekrun, R.; Neuhaus, B.; Dorner, B.; Pankofer, S.; Fischer, M.; et al. Scientific Reasoning and Argumentation: Advancing an Interdisciplinary Research Agenda in Education. Frontline Learn. Res. 2014, 2, 28–45.
14. Boshuizen, H.P.; Schmidt, H.G. On the Role of Biomedical Knowledge in Clinical Reasoning by Experts, Intermediates and Novices. Cogn. Sci. 1992, 16, 153–184.
15. Kosel, C.; Bauer, E.; Seidel, T. Where experience makes a difference: Teachers’ judgment accuracy and diagnostic reasoning regarding student learning characteristics. Front. Psychol. 2024, 15, 1278472.
16. Kundel, H.L.; Nodine, C.F.; Conant, E.F.; Weinstein, S.P. Holistic component of image perception in mammogram interpretation: Gaze-tracking study. Radiology 2007, 242, 396–402.
17. Just, M.A.; Carpenter, P.A. A capacity theory of comprehension: Individual differences in working memory. Psychol. Rev. 1992, 99, 122–149.
18. Drew, T.; Vo, M.L.H.; Wolfe, J.M. The invisible gorilla strikes again: Sustained inattentional blindness in expert observers. Psychol. Sci. 2013, 24, 1848–1853.
19. Gegenfurtner, A.; Lehtinen, E.; Helle, L.; Nivala, M.; Svedström, E.; Säljö, R. Learning to see like an expert: On the practices of professional vision and visual expertise. Int. J. Educ. Res. 2019, 98, 280–291.
20. Haider, H.; Frensch, P.A. Information reduction during skill acquisition: The influence of task instruction. J. Exp. Psychol. Appl. 1999, 5, 129–151.
21. Boshuizen, H.P.A.; Gruber, H.; Strasser, J. Knowledge restructuring through case processing: The key to generalise expertise development theory across domains? Educ. Res. Rev. 2020, 29, 100310.
22. Norman, G.R.; Monteiro, S.D.; Sherbino, J.; Ilgen, J.S.; Schmidt, H.G.; Mamede, S. The Causes of Errors in Clinical Reasoning: Cognitive Biases, Knowledge Deficits, and Dual Process Thinking. Acad. Med. 2016, 92, 23–30.
23. Heitzmann, N.; Seidel, T.; Opitz, A.; Hetmanek, A.; Wecker, C.; Fischer, M.; Ufer, S.; Schmidmaier, R.; Neuhaus, B.; Siebeck, M.; et al. Facilitating diagnostic competences in simulations: A conceptual framework and a research agenda for medical and teacher education. Frontline Learn. Res. 2019, 7, 1–24.
24. Gegenfurtner, A.; Gruber, H.; Holzberger, D.; Keskin, Ö.; Lehtinen, E.; Säljö, R.; Seidel, T.; Stürmer, K. Cognitive theory of visual expertise. In Re-Theorizing Learning and Research Methods in Learning Research; EARLI series “New Perspectives on Learning and Instruction”; Damşa, C., Rajala, A., Ritella, G., Brouwer, J., Eds.; Routledge: London, UK, 2022.
25. Brunswik, E. Perception and the Representative Design of Psychological Experiments; University of California Press: Oakland, CA, USA, 2023.
26. Schnitzler, K.; Holzberger, D.; Seidel, T. Connecting Judgment Process and Accuracy of Student Teachers: Differences in Observation and Student Engagement Cues to Assess Student Characteristics. Front. Educ. 2020, 5, 602470.
27. van der Gijp, A.; Ravesloot, C.J.; Jarodzka, H.; van der Schaaf, M.F.; van der Schaaf, I.C.; van Schaik, J.P.; ten Cate, T.J. How visual search relates to diagnostic performance: A narrative systematic review of eye-tracking research in radiology. Adv. Health Sci. Educ. 2017, 22, 765–787.
28. Blömeke, S.; Gustafsson, J.-E.; Shavelson, R.J. Approaches to competence measurement in higher education. Z. Für Psychol. 2015, 233, 1–2.
29. Sherin, M.G.; Jacobs, V.R.; Randolph, P.A. (Eds.) Mathematics Teacher Noticing: Seeing Through Teachers’ Eyes; Routledge: London, UK, 2011.
30. Seidel, T.; Stürmer, K.; Schäfer, S.; Jahn, G. How Preservice teachers perform in teaching events regarding generic teaching and learning components. Z. Fur Entwicklungspsychologie Padagog. Psychol. 2015, 47, 62–74.
31. Weinert, S.; Artelt, C.; Prenzel, M.; Senkbeil, M.; Ehmke, T.; Carstensen, C.H. Development of competencies across the life span. Z. Fur Erzieh. 2011, 14, 67–86.
32. Holzberger, D.; Philipp, A.; Kunter, M. How teachers’ self-efficacy is related to instructional quality: A longitudinal analysis. J. Educ. Psychol. 2013, 105, 774–786.
33. Kersting, N.B.; Givvin, K.B.; Thompson, B.J.; Santagata, R.; Stigler, J.W. Measuring Usable Knowledge: Teachers’ Analyses of Mathematics Classroom Videos Predict Teaching Quality and Student Learning. Am. Educ. Res. J. 2012, 49, 568–589.
34. Daumiller, M.; Böheim, R.; Alijagic, A.; Lewalter, D.; Gegenfurtner, A.; Seidel, T.; Dresel, M. Guiding attention in the classroom: An eye-tracking study on the associations between preservice teachers’ goals and noticing of student interactions. Br. J. Educ. Psychol. 2025, 95, S115–S132.
35. Gold, B.; Pfirrmann, C.; Holodynski, M. Promoting Professional Vision of Classroom Management Through Different Analytic Perspectives in Video-Based Learning Environments. J. Teach. Educ. 2020, 72, 431–447.
36. Santagata, R.; Kersting, N.; Givvin, K.B.; Stigler, J.W. Problem Implementation as a Lever for Change: An Experimental Study of the Effects of a Professional Development Program on Students’ Mathematics Learning. J. Res. Educ. Eff. 2010, 4, 1–24.
37. Stürmer, K.; Seidel, T.; Holzberger, D. Intra-individual differences in developing professional vision—Preservice teachers’ changes in the course of an innovative teacher education program. Instr. Sci. 2016, 44, 293–309.
38. Faihs, V.; Heininger, S.; McLennan, S.; Gartmeier, M.; Berberat, P.O.; Wijnen-Meijer, M. Professional Identity and Motivation for Medical School in First-Year Medical Students: A Cross-sectional Study. Med. Sci. Educ. 2023, 33, 431–441.
39. Kusurkar, R.A.; Ten Cate, T.J.; van Asperen, M.; Croiset, G. Motivation as an independent and a dependent variable in medical education: A review of the literature. Med. Teach. 2011, 33, e242–e262.
40. Lievens, F.; Coetsier, P.; De Fruyt, F.; De Maeseneer, J. Medical students’ personality characteristics and academic performance: A five-factor model perspective. Med. Educ. 2002, 36, 1050–1056.
41. Kosel, C.; Holzberger, D.; Seidel, T. Identifying Expert and Novice Visual Scanpath Patterns and Their Relationship to Assessing Learning-Relevant Student Characteristics. Front. Educ. 2021, 5, 612175.
42. Seidel, T.; Schnitzler, K.; Kosel, C.; Sturmer, K.; Holzberger, D. Student Characteristics in the Eyes of Teachers: Differences Between Novice and Expert Teachers in Judgment Accuracy, Observed Behavioral Cues, and Gaze. Educ. Psychol. Rev. 2021, 33, 69–89.
43. Santagata, R.; Yeh, C. Learning to teach mathematics and to analyze teaching effectiveness: Evidence from a video- and practice-based approach. J. Math. Teach. Educ. 2014, 17, 491–514.
44. Gegenfurtner, A.; Lehtinen, E.; Säljö, R. Expertise differences in the comprehension of visualizations: A meta-analysis of eye-tracking research in professional domains. Educ. Psychol. Rev. 2011, 23, 523–552.
45. Kundel, H.L.; Nodine, C.F.; Krupinski, E.A. Searching for lung nodules: Visual dwell indicates locations of false-positive and false-negative decisions. Investig. Radiol. 1990, 25, 472–478.
46. Holmqvist, K.; Nyström, M.; Andersson, R.; Dewhurst, R.; Jarodzka, H.; Van De Weijer, J. Eye Tracking: A Comprehensive Guide to Methods and Measures; Oxford University Press (OUP): Oxford, UK, 2011.
47. Litchfield, D.; Ball, L.J.; Donovan, T.; Manning, D.J.; Crawford, T. Viewing another person’s eye movements improves identification of pulmonary nodules in chest x-ray inspection. J. Exp. Psychol. Appl. 2010, 16, 251–262.
48. Duchowski, A.T. Eye Tracking Methodology: Theory and Practice; Springer: Berlin/Heidelberg, Germany, 2007.
49. Goldberg, J.H.; Helfman, J.I. Visual scanpath representation. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications; Association for Computing Machinery: New York, NY, USA, 2010; pp. 199–206.
50. Kundel, H.L.; Nodine, C.F. A visual concept shapes image perception. Radiology 1983, 146, 363–368.
51. Chi, M.T.H. Quantifying qualitative analyses of verbal data: A practical guide. J. Learn. Sci. 1997, 6, 271–315.
52. Ericsson, K.A.; Simon, H.A. Protocol Analysis: Verbal Reports as Data; eBooks; The MIT Press: London, UK, 1993.
53. Charlin, B.; Roy, L.; Brailovsky, C.; Goulet, F.; Van der Vleuten, C. The Script Concordance Test: A Tool to Assess the Reflective Clinician. Teach. Learn. Med. 2000, 12, 189–195.
54. Chernikova, O.; Heitzmann, N.; Stadler, M.; Holzberger, D.; Seidel, T.; Fischer, F. Simulation-based learning in higher education: A meta-analysis. Rev. Educ. Res. 2020, 90, 499–541.
55. Cook, D.A.; Hatala, R.; Brydges, R.; Zendejas, B.; Szostek, J.H.; Wang, A.T.; Erwin, P.J.; Hamstra, S.J. Technology-enhanced simulation for health professions education: A systematic review and meta-analysis. JAMA 2011, 306, 978–988.
56. van Gog, T.; Paas, F.; Marcus, N.; Ayres, P.; Sweller, J. The mirror neuron system and observational learning: Implications for the design of educational animations. Educ. Psychol. Rev. 2009, 21, 21–42.
57. Azevedo, R.; Gašević, D. Analyzing multimodal data in learning analytics: Opportunities and challenges. Comput. Hum. Behav. 2019, 92, 3–12.

Figure 1. Professional vision cognitive processing model (PV-CP) [11]. The model represents professional vision as a temporally unfolding process in which incoming visual information from a clinical scene is first selectively encoded through foveal and parafoveal processing (left), shaped by activated professional schemas in extended long-term working memory. Encoded information is then compared against case- and schema-based expectations: schema-aligned cues support fast, fluent processing, whereas schema-non-aligned cues trigger more fine-grained visual search and schema adaptation (centre). Selected cues are organised into meaningful chunks and integrated with domain-specific knowledge to support interpretation, hypothesis generation, and clinical decision making (right). The two upper sub-processes constitute the noticing component of professional vision; the two lower sub-processes constitute the reasoning component.

Table 1. Gaze-based indicator families for the subprocesses of the professional vision cognitive processing (PV-CP) model, with exemplary metrics, typical medical applications, and interpretive scope.

PV Subprocess (PV-CP Model)	Functional Description	Indicator Family	Exemplary Metrics	Examples of Typical Medical Applications	Interpretive Scope and Limitations
Information selection/encoding (a)	Selective encoding of diagnostically or instructionally relevant visual information into working memory	Fixation-based measures ^a	Fixation count; mean fixation duration; dwell time; proportion of fixations on diagnostically relevant AOIs	Radiological image interpretation; monitoring patient cues during ward rounds; inspection of medical devices	Indicates what information is selected, not why; longer fixations may reflect deeper processing or uncertainty
Breadth of visual field (b)	Allocation of visual processing resources across foveal and parafoveal regions	AOI-based distribution measures ^b	Relative dwell time on central vs. peripheral AOIs; fixation dispersion; fixation ratios	Differentiating focal abnormalities from surrounding anatomical context; patient vs. environment monitoring	Sensitive to AOI definition; does not capture semantic interpretation
Schema-aligned vs. schema-non-aligned processing	Differential processing of expected versus unexpected visual information based on activated professional schema	Fixation- and transition-based measures ^c	Dwell time on expected vs. unexpected regions; re-fixations; transitions toward anomalies	Detection of atypical findings in diagnostic images; noticing deviations during clinical routines	Requires theory-driven definition of “expected”; anomaly detection does not imply correct interpretation
Organizing (c)	Structuring visual information into meaningful perceptual chunks	Transition- and scanpath-based measures ^d	Transition frequencies; transition probabilities; scanpath similarity; gaze sequence graphs	Coordinating patient cues with monitor data; linking symptoms across image regions	Reveals structural organization of viewing, not semantic integration
Organizing (c)	Global structuring and regulation of visual exploration over time	Entropy and variability measures ^e	Spatial entropy; transition entropy; gaze dispersion; recurrence quantification	Shifts from broad overview to focused inspection in diagnostic tasks	Entropy reflects organization, not correctness; high or low entropy can both be adaptive

Notes on reliability and psychometric considerations.
^a Fixation-based measures (Row 1): Generally good test–retest reliability for total dwell and fixation count on stable AOIs; both metrics are sensitive to AOI boundary definition and to minimum-fixation thresholds, which should be reported.
^b AOI-based distribution measures (Row 2): Distributional ratios are scale-sensitive: small AOIs in cluttered scenes can yield unstable estimates. Report AOI size, count, and the rationale for central vs. peripheral classification.
^c Schema-based measures (Row 3): Validity hinges on a defensible a priori specification of expected vs. anomalous regions, ideally derived from expert consensus, with inter-rater agreement reported.
^d Transition- and scanpath-based measures (Row 4): Sensitive to AOI granularity; scanpath similarity indices vary substantially by algorithm (e.g., string-edit, ScanMatch, MultiMatch). Use one indicator family consistently and report the algorithm.
^e Entropy and variability measures (Row 5): Entropy estimates require minimum scanpath length and have documented small-sample bias; report fixation counts per trial and consider bias-corrected estimators.
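
To make the indicator families in Table 1 more concrete, the following minimal Python sketch computes three of the exemplary metrics from a single trial: the proportion of dwell time on relevant AOIs (Row 1), a transition entropy (Rows 4 and 5), and a Shannon entropy with an optional Miller-Madow small-sample correction as one possible bias-corrected estimator (note e). The input format (a list of `(aoi_label, duration_ms)` tuples), the AOI labels, and all function names are illustrative assumptions and are not taken from the article. The transition entropy shown here is a simple conditional-entropy formulation weighted by empirical source frequencies; estimators used in specific studies may differ.

```python
import math
from collections import Counter, defaultdict

# Assumed input: one (aoi_label, duration_ms) tuple per fixation, in temporal
# order. This format is hypothetical; real eye-tracker exports will differ.


def dwell_proportion(fixations, relevant_aois):
    """Share of total fixation time spent on AOIs flagged as relevant."""
    total = sum(duration for _, duration in fixations)
    if total == 0:
        return 0.0
    on_relevant = sum(d for aoi, d in fixations if aoi in relevant_aois)
    return on_relevant / total


def shannon_entropy(counts, miller_madow=False):
    """Plug-in Shannon entropy in bits; optionally Miller-Madow corrected."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        return 0.0
    h = sum((c / n) * math.log2(n / c) for c in counts)
    if miller_madow:
        # Miller-Madow adds (k - 1) / (2n) nats; divide by ln 2 for bits.
        h += (len(counts) - 1) / (2 * n * math.log(2))
    return h


def transition_entropy(fixations):
    """Conditional entropy of the next AOI given the current AOI, in bits.

    Consecutive fixations within the same AOI are not counted as transitions.
    Each source AOI's row entropy is weighted by its share of all transitions.
    """
    sequence = [aoi for aoi, _ in fixations]
    rows = defaultdict(Counter)
    for source, target in zip(sequence, sequence[1:]):
        if source != target:
            rows[source][target] += 1
    total = sum(sum(row.values()) for row in rows.values())
    if total == 0:
        return 0.0
    return sum(
        (sum(row.values()) / total) * shannon_entropy(row.values())
        for row in rows.values()
    )


if __name__ == "__main__":
    # Hypothetical chest-image trial with three AOIs.
    trial = [
        ("lung_left", 300), ("lung_right", 200), ("lung_left", 200),
        ("heart", 300), ("lung_left", 400), ("lung_right", 100),
    ]
    aoi_counts = Counter(aoi for aoi, _ in trial)
    print("fixations in trial:", len(trial))
    print("dwell proportion:", dwell_proportion(trial, {"lung_left"}))
    print("transition entropy (bits):", transition_entropy(trial))
    print("AOI entropy, corrected (bits):",
          shannon_entropy(aoi_counts.values(), miller_madow=True))
```

With only a handful of fixations per trial, as in this toy input, the corrected and uncorrected entropy values differ noticeably, which is why the notes recommend reporting fixation counts per trial alongside any entropy estimate.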

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Published by MDPI on behalf of the Academic Society for International Medical Education. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

