Article

From Overtrust to Distrust: A Simulation Study on Driver Trust Calibration in Conditional Automated Driving

1 Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang 37673, Republic of Korea
2 Designovel Co., Ltd., Seoul 06180, Republic of Korea
3 Department of Artificial Intelligence Applications, Kwangwoon University, Seoul 01897, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(21), 11342; https://doi.org/10.3390/app152111342
Submission received: 20 September 2025 / Revised: 14 October 2025 / Accepted: 20 October 2025 / Published: 22 October 2025
(This article belongs to the Special Issue Augmented and Virtual Reality for Smart Applications)

Abstract

Conditional automated driving delegates routine control to automation while keeping drivers responsible for supervision and timely takeovers. In this context, safety and usability hinge on calibrated trust, a state between overtrust and distrust that aligns reliance with actual system capabilities. We investigated how calibrated trust relates to concurrent behavior during conditional automation in a driving-simulator study (n = 26). After a brief familiarization block, drivers completed four takeover request (TOR) exposures while performing a non-driving-related task (NDRT). Trust was assessed with a validated multi-item inventory. NDRT engagement was operationalized as successful Surrogate Reference Task (SuRT) clicks per second, and takeover behavior was indexed by TOR reaction time (TOR-RT) from TOR onset to the first valid control input. The results showed that higher trust was associated with greater NDRT throughput during automated driving, whereas TOR-RT did not change significantly across repeated exposures, consistent with familiarization. In this sample, we did not observe a systematic penalty in TOR-RT associated with higher trust; however, confidence-interval benchmarks indicate that modest delays cannot be ruled out. This suggests that, after brief onboarding, calibrated trust can coexist with timely safety-critical responses within the limits of our design. These findings tentatively support interface and training strategies that promote calibrated trust (e.g., predictable TOR policies, transparent capability boundaries, and short onboarding) to help drivers navigate between overtrust and distrust.

1. Introduction

Autonomous driving technologies are transforming everyday mobility and the in-vehicle experience. As automation becomes commercially available, human drivers no longer need to always devote uninterrupted attention to manual control; under appropriate conditions, they can engage in non-driving-related tasks (NDRTs) such as reading or email while the vehicle handles routine control. At the same time, fully autonomous operation that allows the driver to relinquish responsibility entirely has not yet been realized in practice. The Society of Automotive Engineers (SAE) distinguishes six levels of driving automation from Level 0 to Level 5 [1]. Lower levels assist the human driver, whereas higher levels shift more driving authority to the system. The class of “conditional automated driving” under active commercialization corresponds to Level 3, in which the system primarily controls the vehicle but issues a takeover request (TOR) when operating limits are approached or exceeded. In such moments, the interface should provide clear visual and auditory alerts to ensure the driver perceives the TOR [2], and the driver is expected to respond promptly and resume manual control.
Within this TOR-centric context, maintaining appropriate trust in the automated system is critical. As Parasuraman and Riley [3] noted, overtrust (trust that exceeds the system’s true capabilities) can lead drivers to rely on automation beyond its limits, undermining timely TOR responses and potentially increasing crash risk. Conversely, distrust or reluctance to accept a new automated system can suppress the benefits of conditional automation: drivers may over-monitor the roadway, allocate insufficient attention to the NDRT, and thus fail to realize comfort or productivity gains [4]. In human–automation interaction, aligning human trust with actual system capabilities, termed trust calibration, is therefore a central goal [5,6].
Trust is dynamic: it changes with experience and varies by situation [7,8]. Recent studies have examined how interaction with automated driving systems (ADS) shapes this calibration and reported that initial trust can influence subsequent trajectories of adjustment [9,10,11]. Wu et al. [12] examined TOR timing and warning modality in Level-3 automated driving and found that longer TOR lead times increased trust without inducing confusion, while warning modality had no reliable main effect on trust and tri-modal warnings offered no clear advantage over bi-modal combinations. Swain et al. [13] showed that augmented-reality HMI designs conveying shared intended pathway and object-recognition bounding boxes can reduce stress and increase perceived usefulness and intentions to use during conditional automation monitoring. Together, these findings underscore the roles of interface design and adequate time budgets in supporting calibrated trust during critical transition periods in Level-3 operation. However, evidence remains limited on how calibrated trust emerges through everyday interaction and how it co-varies with concurrent behavior during conditional automation. Additional work that measures trust change from multiple angles is needed.
Trust can be assessed via self-report questionnaires, behavioral measures, and psychophysiological indices [14]. In this study, we focus on behavioral readouts to observe whether trust calibration is reflected in moment-to-moment behavior. Prior research has used gaze behavior [15], NDRT engagement [16,17,18], and takeover responses [10,18,19,20] as indirect indicators of trust. Building on this tradition, we examine (i) NDRT engagement—operationalized with a standard Surrogate Reference Task (SuRT) measure of successful clicks per second—and (ii) takeover reaction time (TOR-RT) as complementary behavioral expressions of trust during conditional automation. Consistent with our registered analysis plan, our primary predictor is trust, measured immediately after a brief familiarization period intended to align mental models with observed system behavior. We analyze associations between trust, NDRT throughput, and TOR-RT across four TOR exposures administered in a fixed order. This work offers two contributions: it re-examines behavioral indicators of trust in conditional automation by jointly analyzing NDRT throughput and TOR-RT alongside self-reported trust, and it tests whether changes in trust are accompanied by corresponding changes in behavior over repeated exposures—moving beyond static correlations to probe whether a calibrated trust state coexists with more effective NDRT performance without incurring systematic costs in takeover readiness.
Accordingly, we ask whether brief familiarization followed by repeated TOR interaction yields a calibrated trust state that is detectable both in self-report and in concurrent behavior during conditional automation. Specifically, we pose three research questions: First, does TOR interaction produce evidence of trust calibration in self-report? Second, is trust associated with greater NDRT throughput during automated driving, operationalized as successful SuRT clicks per second, across repeated exposures? Third, is trust associated with TOR-RT such that higher calibrated trust coexists with timely takeovers rather than systematic delays?

2. Related Work

2.1. Trust in Automation and Trust Calibration

From a human–automation perspective, Lee and See [8] define trust as an attitude that an agent will help achieve one’s goals under uncertainty and vulnerability. Trustworthiness refers to properties of the agent or organization that warrant trust [21]. When human trust exceeds trustworthiness, overtrust arises; when it falls short, distrust results [8]. Calibrated trust is achieved when trust matches trustworthiness, a state critical for appropriate reliance on automation [3,6].
Trust calibration is fundamentally a dynamic process shaped by multiple theoretical mechanisms. At the cognitive level, trust formation relies on mental models of system capabilities and limitations [22]. As Sheridan [23] notes, operators develop internal representations of automation behavior through observation and experience, which then guide expectation formation and reliance decisions. When system behavior aligns with these mental models, trust stabilizes; mismatches trigger recalibration [24]. This cognitive updating process is complemented by affective responses: positive experiences increase trust through emotional conditioning, while negative events such as failures or near-misses produce trust declines mediated by both cognitive reappraisal and affective reactions [25,26].
Multiple theoretical frameworks characterize how trust evolves through interaction. Muir and Moray [27] showed that experience with a system reshapes trust over time. Hoff and Bashir [7] conceptualized trust calibration as the dynamic adjustment of learned trust within their three-layer trust model. Their framework distinguishes between dispositional trust, situational trust, and learned trust as fundamental components of human trust in autonomous systems. Learned trust operates at two levels: initial learned trust, which forms early impressions of the system, and dynamic learned trust, which continuously evolves through actual system interactions. As users engage with autonomous systems, their dynamic learned trust responds to observed system performance and design features, effectively recalibrating their overall trust level. This calibration process begins with baseline trust established through dispositional tendencies and initial system impressions. Subsequently, ongoing interactions with the autonomous system modify this baseline through dynamic learned trust adjustments, resulting in trust calibration that reflects actual system experience.
Parasuraman and Riley [3] further emphasized that trust calibration serves as a regulatory mechanism for automation use, preventing both misuse (overtrust leading to complacency) and disuse (distrust leading to rejection). Their framework identifies trust as a mediating variable between system characteristics and operator reliance, with calibration occurring through iterative cycles of use, performance observation, and belief updating. More recently, de Visser et al. [28] extended these ideas to human–robot teams, proposing that trust calibration in dynamic environments requires continuous adaptation as task demands, system capabilities, and environmental conditions change.
Recent ADS studies report calibration during early exposure: prior information and first-drive experiences can shift trust [9], and drivers predisposed to low trust may exhibit increases, whereas those predisposed to high trust may maintain their level during initial interaction [10,11]. Forster et al. [29] reported that introductory information about automation reliability could proactively shape trust expectations, with the largest calibration effects observed among users starting with lower baseline trust. These findings collectively suggest that trust calibration is most pronounced during initial exposures and varies systematically with users’ dispositional tendencies. Our study extends this line by centering analysis on the trust state as the proximate predictor of real-time attentional allocation and takeover readiness during conditional automation. This approach allows us to characterize how calibrated trust, once achieved, relates to the operational behaviors that determine automation safety and productivity.

2.2. Measuring Trust in Automation

Trust in ADS has been measured via self-report, behavior, and psychophysiology [14]. Self-report is the most common approach: respondents report their beliefs, attitudes, or intentions via questionnaires. Some studies use single-item ratings of trust (e.g., “How much do you trust this system from 0% to 100%?”) [15,30], which are simple but limited in construct coverage. Consequently, many adopt multi-item instruments such as TASS (Trust in Automated Systems Survey) [31], MDMT (Multi-Dimensional Measure of Trust) [32], and TOAST (Trust of Automated Systems Test) [33]. Wojton et al. [33] questioned the validity of TASS, noting that the scale included terms reflecting human-to-human trust and items in the distrust subscale implying that machines might engage in deception or have intentions. They contended that this creates ambiguity regarding whether the scale measures distrust or merely captures anthropomorphic tendencies toward the system. TOAST comprises nine items targeting perceived understanding and performance; Alarcón et al. [34] argue it operationalizes the trustworthiness perspective articulated by Lee and See [8]. In the present study, we employ TOAST as the self-report instrument of trust.
Behavioral measures capture how trust manifests in ongoing interaction. In conditional automation, NDRT engagement often serves as a proxy: drivers who do not trust the system may over-monitor the roadway and allocate less attention to the secondary task [16]. Engagement has been quantified by gaze time on the NDRT display [35] and by SuRT throughput—how much of the surrogate task is completed [16,18]. Prior findings indicate that higher trust is associated with more frequent NDRT monitoring and reduced on-road gaze [18], and that higher levels of system trust lead to increased NDRT engagement [17]; improving situation awareness can likewise raise trust and NDRT performance [16]. Accordingly, we index NDRT engagement with successful SuRT clicks per second and examine whether trust calibration is reflected in corresponding changes in throughput.
TOR reaction time is another widely used behavioral indicator of trust. Körber et al. [18] reported associations between trust and TOR-RT. Helldin et al. [19] found that interfaces exposing system uncertainty produced lower trust and faster TOR-RT relative to controls. Yet the direction and strength of the trust–RT relationship remain debated. Payre et al. [20] proposed that complacency might yield slower TOR-RT among high-trust drivers, but also argued that structured practice could attenuate such effects. Manchon et al. [10] reported no reliable effect of initial trust or driving style on TOR-RT. In light of these mixed results, we analyze trial-wise TOR-RT across four exposures and test whether, after familiarization, calibrated trust coexists with efficient NDRT performance without a systematic penalty in takeover latency.

3. Methods

3.1. Participants

For sample size planning, we conducted an a priori power analysis using G*Power 3.1.9.7. Based on effect sizes reported in prior trust-calibration research in automated driving [9,36], we specified a medium effect size (f = 0.25), α = 0.05, power (1 − β) = 0.80, two groups (Initial-Trust: Low vs. High), and four repeated exposures (TOR1–TOR4). The analysis indicated a required minimum sample size of N = 24 to detect the Group × Exposure interaction, which was our primary hypothesis regarding trust-calibration trajectories. We recruited 26 participants to account for potential dropout or data exclusion due to technical issues. Twenty-six licensed, right-handed drivers (age: M = 23.0, SD = 1.85; 9 female) completed the study. All participants were undergraduate or graduate students with driving experience ranging from 1 to 5 years (M = 2.2, SD = 1.23). Participants reported normal or corrected-to-normal vision and no history of neurological conditions. Participants received KRW 20,000 as compensation for their time; compensation was fixed and not contingent on performance. Written informed consent was provided by all participants. The Kwangwoon University Institutional Review Board (IRB) approved the study protocol (7001546-202300831-HR(SB)-008-01). Personally identifying information was stored separately from study data; analysis files contained only pseudonymized IDs and were kept on an encrypted, access-controlled server restricted to the research team, with retention and sharing aligned to IRB guidance.
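To make this planning step transparent, the sketch below recomputes the power of the Group × Exposure interaction test using the noncentral-F formulation that G*Power applies to within-between repeated-measures designs. The correlation among repeated measures (ρ = 0.5) and the nonsphericity correction (ε = 1.0) are assumed defaults rather than values reported here, so exact numbers may differ slightly from the G*Power output.

```python
# Minimal power-analysis sketch for the Group x Exposure interaction in a
# within-between repeated-measures ANOVA, mirroring G*Power's approach.
# Assumed (not reported) defaults: correlation among repeated measures
# rho = 0.5 and nonsphericity correction eps = 1.0.
from scipy.stats import f as f_dist
from scipy.stats import ncf


def interaction_power(n_total, f_effect=0.25, alpha=0.05,
                      k_groups=2, m_measures=4, rho=0.5, eps=1.0):
    """Power of the within-between interaction test for a total sample n_total."""
    df1 = (k_groups - 1) * (m_measures - 1) * eps
    df2 = (n_total - k_groups) * (m_measures - 1) * eps
    # Noncentrality parameter for the interaction effect
    lam = f_effect ** 2 * n_total * m_measures * eps / (1.0 - rho)
    f_crit = f_dist.ppf(1.0 - alpha, df1, df2)
    return 1.0 - ncf.cdf(f_crit, df1, df2, lam)


if __name__ == "__main__":
    # Scan candidate sample sizes; under the assumptions above, power
    # should cross 0.80 in the vicinity of N = 24.
    for n in range(16, 31, 2):
        print(f"N = {n:2d}  power = {interaction_power(n):.3f}")
```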

3.2. Experimental Design

We used a within-subjects design with four takeover request (TOR) scenarios presented during conditional automated driving. To standardize exposure, scenarios were administered in a fixed order (A→B→C→D): (A) a disabled vehicle ahead, (B) a roadwork zone, (C) a highway interchange exit, and (D) a pedestrian jaywalking event.

3.3. Apparatus and Environment

The study was implemented in Unreal Engine (UE; version 5.1.1) using a fixed-base simulator with lane-keeping and speed control appropriate to an SAE Level 3 context. The simulation was displayed on a 40-inch monitor with a 60 Hz refresh rate. Input was processed once per frame via the Enhanced Input system, yielding an effective sampling rate of 60 Hz. The driving simulator was equipped with a Thrustmaster Sparco R383 steering wheel and T-LCM pedals. Steering wheel angle (degrees), accelerator and brake pedal inputs (percentage), TOR event times, and other event timestamps were logged using UE’s high-resolution timer. A 10.4-inch tablet PC for the NDRT was mounted to the right side of the driver’s seat; to avoid systematic motor disadvantage in reaching movements, only right-handed participants were recruited. The complete driving simulator setup is illustrated in Figure 1.

3.4. Procedure

Each session lasted approximately 50 min. Upon arrival, participants provided written informed consent, completed a brief demographic questionnaire (age, gender, driving experience), and received a standardized briefing on the study flow and the capabilities/limits of conditional automation.
Participants next completed a 5-min practice block in the simulator to familiarize themselves with automated and manual driving as well as the NDRT. The practice included one TOR event so that drivers could experience the alerting sequence and the required takeover maneuver. During automated segments of the practice, participants rehearsed the NDRT. Immediately after practice, they completed the TOAST trust inventory; this post-practice score served as the initial trust.
The experimental drive then proceeded with the four TOR scenarios in the fixed order A→B→C→D. Prior to each TOR, participants were instructed to maximize their NDRT performance during automated control, and at the TOR they were instructed to resume manual control as quickly as possible. Immediately after each TOR, participants reported their current trust, and a short break was provided before the next scenario. After completing all four scenarios, participants took part in a brief post-session interview regarding their experience and perceptions of the system.

3.5. Tasks

During automated driving, participants supervised the vehicle until a TOR occurred, indicated by combined visual and auditory cues (Figure 2). When a TOR was issued, the interface displayed a visual icon derived from Yun et al. [37] and simultaneously played a Korean-language voice prompt indicating that autonomous driving mode was turning off. The NDRT followed the SuRT paradigm, developed using Flutter SDK 3.13.0. Stimulus parameters adhered to standard SuRT characteristics (e.g., target/distractor configuration, circle sizes/contrast, and trial window durations), and participants responded by touching targets on the tablet (Figure 3). The prespecified primary performance metric was successful clicks per second, defined as the number of correctly identified targets per second of effective engagement time during automated driving.

3.6. Dependent Measures

We examined both self-reported trust and concurrent behavioral indices. Trust in automation was measured with the 9-item TOAST inventory [33] on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree). TOAST assesses three complementary facets: reliability (e.g., “The system is reliable”), understandability (e.g., “I understand how the system will assist me”), and dependability (e.g., “The system can be trusted to do its job”). The original validation study reported good internal consistency (α = 0.89; [33]). A Korean version was developed via forward–back translation procedures. Participants completed TOAST after the practice block (initial trust) and after each of the four TOR scenarios. We computed per-exposure means across all 9 items to form an overall trust index. Higher scores indicate greater trust.
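For clarity, the following is a minimal sketch of how the per-exposure trust index could be computed from item-level responses, assuming a long-format table; the column names (id, occasion, item, rating) are illustrative rather than the study’s actual data schema.

```python
# Sketch: per-exposure TOAST trust index as the mean of the nine 7-point
# items for each participant and measurement occasion. Column names are
# illustrative assumptions, not the study's actual schema.
import pandas as pd


def toast_index(ratings: pd.DataFrame) -> pd.DataFrame:
    """Average the nine TOAST items per participant and occasion."""
    return (ratings
            .groupby(["id", "occasion"], as_index=False)["rating"]
            .mean()
            .rename(columns={"rating": "trust"}))
```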
Behavioral outcomes comprised NDRT performance and TOR-RT. NDRT throughput was operationalized as successful SuRT clicks per second, defined as the number of correctly identified targets per second of effective engagement time during automated driving; misses and incorrect taps did not contribute to the numerator, and off-task intervals were excluded from the denominator. TOR-RT was defined as the latency from TOR onset to the first valid control input on any channel (steering, accelerator, or brake). All driving inputs, TOR events, and SuRT interactions were time-stamped (Unreal Engine 5.1.1 for driving data; tablet logs for SuRT) and synchronized to a common timeline via shared markers at block boundaries prior to analysis.
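As an illustration of how these behavioral measures could be derived from the synchronized logs, the sketch below computes TOR-RT and SuRT throughput. The column names and the thresholds defining a “valid” control input are assumptions for the example; the paper does not report the specific input criteria used.

```python
# Sketch of the two behavioral measures, assuming synchronized, time-stamped
# logs. Column names and input thresholds are illustrative assumptions.
import pandas as pd

STEER_THRESH_DEG = 2.0   # assumed threshold for a valid steering input
PEDAL_THRESH_PCT = 10.0  # assumed threshold for a valid pedal input


def tor_reaction_time(inputs: pd.DataFrame, tor_onset: float) -> float:
    """Latency from TOR onset to the first valid input on any channel."""
    after = inputs[inputs["t"] >= tor_onset]
    valid = after[(after["steer_deg"].abs() >= STEER_THRESH_DEG)
                  | (after["accel_pct"] >= PEDAL_THRESH_PCT)
                  | (after["brake_pct"] >= PEDAL_THRESH_PCT)]
    return float(valid["t"].iloc[0] - tor_onset) if not valid.empty else float("nan")


def surt_throughput(taps: pd.DataFrame, engaged_seconds: float) -> float:
    """Successful SuRT clicks per second of effective engagement time."""
    hits = int(taps["correct"].astype(bool).sum())  # misses/incorrect taps excluded
    return hits / engaged_seconds if engaged_seconds > 0 else float("nan")
```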

3.7. Statistical Analysis

Statistical analyses were conducted using Jamovi (version 2.4.8; an open-source, R-based statistical platform). Prior to analysis, assumptions were evaluated using Shapiro–Wilk tests on model residuals for normality and Mauchly’s test for sphericity. Greenhouse–Geisser corrections were applied when sphericity assumptions were violated. Primary models used mixed-design ANOVAs with a within-subjects factor reflecting repeated TOR exposure and a between-subjects factor of Trust Group (Low vs. High), defined via a median split of the post-practice TOAST score (treated as initial trust). Results are reported with F statistics, degrees of freedom, p values, and partial η2. Where relevant, Tukey’s HSD post hoc comparisons followed significant effects. Additionally, we examined individual-level associations between trust and NDRT throughput using Pearson correlations. To assess the practical magnitude of observed differences beyond null hypothesis testing, we calculated 90% confidence intervals for paired mean differences in takeover reaction time. The 90% level was chosen to align with standards in equivalence research [38] while providing interpretable bounds on effect magnitude. We evaluated these intervals against three benchmarks: (1) our observed variability, (2) between-study variability from meta-analytic literature [39], and (3) safety margins derived from time budget requirements in Level-3 autonomous driving.
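Although the analyses were run in Jamovi, an equivalent open-source sketch is shown below for the mixed-design ANOVA on TOR-RT and the 90% confidence interval for the paired TOR4 vs. TOR1 difference. It assumes a long-format table with columns id, group, exposure, and tor_rt; these names and the pingouin-based workflow are illustrative, not the authors’ pipeline.

```python
# Sketch of the primary analyses under an assumed long-format data layout
# (columns: id, group, exposure, tor_rt). pingouin approximates Jamovi's
# mixed-design ANOVA; the Jamovi output remains the authoritative analysis.
import numpy as np
import pandas as pd
import pingouin as pg
from scipy import stats


def mixed_anova_tor_rt(df: pd.DataFrame) -> pd.DataFrame:
    """Mixed-design ANOVA: Exposure (within) x Group (between) on TOR-RT."""
    return pg.mixed_anova(data=df, dv="tor_rt", within="exposure",
                          subject="id", between="group", correction="auto")


def paired_diff_ci90(df: pd.DataFrame, a: str = "TOR4", b: str = "TOR1"):
    """90% confidence interval for the paired mean difference (a minus b)."""
    wide = df.pivot(index="id", columns="exposure", values="tor_rt")
    d = (wide[a] - wide[b]).dropna()
    n = len(d)
    se = d.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.95, n - 1)  # two-sided 90% interval
    return d.mean() - t_crit * se, d.mean() + t_crit * se
```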

4. Results

4.1. Trust

A mixed ANOVA with Exposure (Initial, TOR1–TOR4; within) and Group (Low vs. High initial trust; between, median split) revealed a significant Group × Exposure interaction, F(4, 96) = 11.394, p < 0.001, η2ₚ = 0.322. Given this interaction, we focus on simple effects rather than interpreting the main effect of Exposure. In the Low group, trust increased significantly from Initial to TOR1 (ptukey < 0.001), from TOR1 to TOR2 (ptukey < 0.01), and from TOR2 to TOR3 (ptukey < 0.01), whereas the change from TOR3 to TOR4 was not significant (ptukey = 1.00). In the High group, trust increased from Initial to TOR1 (ptukey < 0.05), whereas changes from TOR1 to TOR2 (ptukey = 0.324), from TOR2 to TOR3 (ptukey = 0.715), and from TOR3 to TOR4 (ptukey = 0.398) were not significant. The net gain through TOR3 was larger in the Low group (1.370) than in the High group (0.692), consistent with stronger calibration among initially low-trust participants. Trajectories are shown in Figure 4.
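A sketch of the simple-effects follow-up is given below. The paper used Tukey-corrected comparisons in Jamovi; here, paired t-tests with Holm correction within each initial-trust group stand in as an approximation, and the column names are illustrative.

```python
# Simple-effects sketch: within each initial-trust group, compare trust
# between consecutive measurement occasions. Holm-corrected paired t-tests
# approximate the Tukey-corrected comparisons reported in the paper.
from itertools import pairwise  # Python 3.10+

import pandas as pd
from scipy import stats
from statsmodels.stats.multitest import multipletests

OCCASIONS = ["Initial", "TOR1", "TOR2", "TOR3", "TOR4"]


def simple_effects(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for grp, sub in df.groupby("group"):
        wide = sub.pivot(index="id", columns="occasion", values="trust")
        labels, pvals = [], []
        for a, b in pairwise(OCCASIONS):
            _, p = stats.ttest_rel(wide[a], wide[b], nan_policy="omit")
            labels.append(f"{a} vs {b}")
            pvals.append(p)
        p_holm = multipletests(pvals, method="holm")[1]
        rows += [{"group": grp, "contrast": lab, "p_holm": ph}
                 for lab, ph in zip(labels, p_holm)]
    return pd.DataFrame(rows)
```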

4.2. NDRT Performance

A mixed-design ANOVA with Exposure (TOR1–TOR4; within) and Group (Low vs. High; between) revealed a significant main effect of Exposure (Greenhouse–Geisser corrected), F(1.913, 45.905) = 5.434, p < 0.01, η2ₚ = 0.185, indicating that NDRT throughput changed across exposures (Figure 5). The Exposure × Group interaction was not significant, F(1.913, 45.905) = 0.001, p = 0.998, and the between-subjects Group effect was not significant, F(1, 24) = 2.531, p = 0.125.
Tukey’s HSD post hoc comparisons (family-wise α = 0.05) showed that throughput was higher at TOR3 than TOR1 (Δ = +0.046 clicks/s, SE = 0.015, t(24) = 2.99, ptukey < 0.05). The TOR4 vs. TOR1 contrast trended in the same direction but did not reach significance (Δ = +0.063, SE = 0.024, t(24) = 2.626, ptukey = 0.066). All other adjacent or non-adjacent contrasts were non-significant after Tukey correction. Taken together, NDRT performance improved from early to mid-exposure (from TOR1 to TOR3) and then plateaued without reliable group differences across the session.

4.3. TOR Reaction Time

A mixed-design ANOVA on TOR-RT with Exposure (TOR1–TOR4; within) and Group (Low vs. High; between) showed no main effect of Exposure, F(3, 72) = 1.219, p = 0.309, and no Group × Exposure interaction, F(3, 72) = 1.316, p = 0.276. The between-subjects Group effect was also not significant, F(1, 24) = 0.110, p = 0.743. Thus, takeover latency remained statistically stable across repeated exposures and did not differ by initial-trust group.

5. Discussion

5.1. Trust Calibration and Trajectories

Across four takeover exposures, drivers’ trust increased rapidly after onboarding and early interactions and then leveled off, with a small, non-significant softening at the final exposure. This trajectory depended on initial disposition: participants who began with lower trust exhibited sustained gains through TOR3, whereas those who began with higher trust increased only from Initial to TOR1 and then stabilized. The significant Group × Exposure interaction is diagnostic of trust calibration: brief, structured experience narrows the initial uncertainty gap and aligns expectations with observed capability. This pattern accords with dynamic accounts of learned trust in human–automation interaction, in which a few predictable encounters are sufficient to reshape mental models. In short, low-trust drivers showed stepwise increases up to TOR3 before plateauing, whereas high-trust drivers displayed a small early increase only (Initial→TOR1) and then remained stable.

5.2. Productivity Gains and Takeover Readiness

As trust calibrated, NDRT throughput (SuRT successful clicks per second) improved from early to mid-exposure and then plateaued, with no significant differences between initial-trust groups. In parallel, TOR-RT showed no statistically significant changes across exposures or groups. These concurrent patterns speak to a fundamental question in conditional automation: whether increasing engagement in secondary tasks, enabled by rising trust, might delay safety-critical takeover responses.
Within the constraints of our design and sample, we did not observe evidence of such a speed–safety trade-off. Drivers became more effective at secondary tasks while maintaining statistically stable takeover latencies. However, interpreting this stability requires careful consideration of effect magnitude. The 90% confidence interval for our primary comparison (TOR4 vs. TOR1: [−0.213 s, 0.466 s]) reveals that while large systematic delays appear unlikely, moderate increases of up to approximately 0.5 s remain plausible given our data.
This upper bound merits contextualization across multiple benchmarks. Relative to meta-analytic standards (Zhang et al. [39]: SD = 1.45 s), 0.466 s represents a small effect (32% of SD), and in operational terms constitutes only 7–12% of typical time budgets (4–7 s; Gold et al. [40]) in Level-3 autonomous driving. However, against our within-study variation (SD = 0.743 s), it reflects moderate uncertainty (63% of SD). These complementary perspectives suggest that the maximum plausible difference may be negligible in scheduled takeovers with generous lead times, yet could become relevant in time-compressed scenarios. This also underscores limitations in definitively characterizing the true effect size given our sample constraints.
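The benchmark ratios quoted above follow directly from the interval’s upper bound; the short check below reproduces the arithmetic using the values reported in the text.

```python
# Worked check of the benchmark ratios for the CI upper bound (TOR4 - TOR1).
upper = 0.466                      # s, upper bound of the 90% CI
meta_sd, within_sd = 1.45, 0.743   # s, meta-analytic SD [39] and this study's SD
budget_lo, budget_hi = 4.0, 7.0    # s, typical Level-3 time budgets [40]

print(f"vs meta-analytic SD : {upper / meta_sd:.0%}")    # ~32% (small effect)
print(f"vs within-study SD  : {upper / within_sd:.0%}")  # ~63% (moderate)
print(f"vs time budgets     : {upper / budget_hi:.0%}-{upper / budget_lo:.0%}")  # ~7-12%
```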
These findings neither demonstrate systematic performance impairment nor establish formal equivalence. Rather, they suggest that well-calibrated trust may support both secondary-task engagement and timely takeover responses within the conditions we tested (scheduled takeovers with moderate time pressure). The observed pattern aligns with Lee and See’s framework of appropriate reliance [8], which posits that calibrated trust enables effective attention allocation between human and automation. Empirical work supports this interpretation: drivers with well-calibrated trust have been shown to engage more in secondary activities while maintaining or improving takeover preparedness [18,19].
However, the confidence interval width reflects inherent uncertainty given our sample size and design constraints. Future research employing larger samples, varied takeover urgencies, and comprehensive quality metrics beyond reaction time (e.g., trajectory stability, post-takeover situation awareness) would help establish more definitive boundaries on acceptable trust-performance trade-offs and narrow the plausible range of effects, particularly for time-critical scenarios where even moderate delays might become consequential.

5.3. Individual-Level Coupling of Trust and NDRT Throughput

Beyond group trajectories, trust and concurrent secondary-task throughput were positively associated across individuals (Pearson’s r = 0.431, p < 0.001, Figure 6). Drivers who reported greater trust tended to achieve higher SuRT performance during automated segments. This is consistent with the view that calibrated trust enables more efficient attentional allocation—investing effort in the non-driving task when the system reliably handles routine control—without degrading readiness for TORs. The result converges with prior reports linking higher system trust to increased NDRT engagement (e.g., Petersen et al. [16]) and is echoed in post-session interviews: several participants volunteered that, as their trust grew, they could “focus more on finding the target circles” (P3: “When trust was higher, I think I concentrated more on finding circles of different sizes”; P16: “As the experiment progressed, I gradually trusted the system more and could focus on finding circles of different sizes”). Correlation cannot determine directionality—greater trust may facilitate engagement, and successful engagement may, in turn, reinforce trust—but the convergence of self-report, behavior, and qualitative testimony strengthens the interpretation that calibrated trust is behaviorally expressed as higher throughput during conditional automation.
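For completeness, the pooled association in Figure 6 can be reproduced with a short sketch; the pooling of participant-by-exposure observations and the column names follow the description above, and the OLS fit corresponds to the plotted line.

```python
# Sketch of the pooled trust-throughput association (Figure 6): Pearson r
# and an OLS line over participant-by-exposure observations. Column names
# are illustrative.
import numpy as np
from scipy import stats


def trust_throughput_association(trust, throughput):
    r, p = stats.pearsonr(trust, throughput)
    slope, intercept = np.polyfit(trust, throughput, deg=1)  # OLS fit line
    return {"r": r, "p": p, "slope": slope, "intercept": intercept}
```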

5.4. Interpreting the Softening at TOR4

Trust showed a small dip at TOR4, which involved a jaywalking pedestrian. Human-involving hazards plausibly carry higher perceived severity and affective weight than purely vehicular conflicts. Under a “risk-as-feelings” lens, such events can transiently make trust judgments more conservative even when objective capability is unchanged. We did not collect scenario-level ratings of perceived severity or affect, so this explanation is post hoc; nonetheless, it offers a coherent account of a mild trust retreat without a corresponding lengthening of TOR-RT. Future work should incorporate scenario-wise measures of perceived severity, affect/arousal, and workload and use counterbalanced orders to disentangle content from sequence effects.

5.5. Design and Training Implications

The concentration of trust gains within the first two exposures suggests that front-loaded micro-onboarding, consisting of brief hands-on demonstrations that make capability boundaries and TOR policies transparent, will be especially valuable before and during initial drives. This aligns with prior research demonstrating that initial system exposure and transparent communication of automation limitations significantly influence trust formation and calibration [9,18], particularly among users with lower baseline trust [29]. We therefore recommend prioritizing early, structured exposure (e.g., warm-up TORs with clear performance feedback) for low-trust users. For high-trust users, our data show that additional exposures yield limited gains; research on automation complacency suggests risk-attuned cues and complacency-mitigating messaging may be more effective for this population [41].
Trust-aware HMIs can exploit behavioral telemetry, such as unusually low NDRT throughput or excessive lane-monitoring micro-maneuvers, to infer under- or overtrust in situ and offer timely status information or gentle attention prompts to keep reliance aligned with capability. This approach builds on established frameworks for adaptive automation that adjust system behavior based on operator state [42] and recent work demonstrating the feasibility of real-time trust inference from behavioral and physiological signals in automated driving [43,44]. For human-involving hazards such as pedestrians or cyclists, modest anticipatory cues (earlier iconography or subtle pre-alerts) may preserve predictability and prevent undue trust dips while safeguarding fast takeovers.
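As a purely illustrative sketch of the kind of telemetry rule suggested here, the example below flags possible under- or overtrust by comparing running NDRT throughput and on-road glance behavior against a driver’s own post-onboarding baseline. The signals, thresholds, and class names are hypothetical; the paper does not specify or evaluate such an algorithm.

```python
# Hypothetical trust-aware HMI rule: infer possible under-/overtrust from
# behavioral telemetry relative to a calibrated baseline. Signal names and
# thresholds are illustrative assumptions, not part of the study.
from dataclasses import dataclass


@dataclass
class TrustTelemetryRule:
    baseline_throughput: float  # clicks/s observed after onboarding
    low_ratio: float = 0.5      # hypothetical cut-off for undertrust
    high_ratio: float = 1.5     # hypothetical cut-off for overtrust

    def assess(self, throughput: float, on_road_glance_ratio: float) -> str:
        if (throughput < self.low_ratio * self.baseline_throughput
                and on_road_glance_ratio > 0.8):
            return "possible undertrust: offer system status information"
        if (throughput > self.high_ratio * self.baseline_throughput
                and on_road_glance_ratio < 0.1):
            return "possible overtrust: issue a gentle attention prompt"
        return "reliance appears aligned with capability"
```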

5.6. Theoretical Contributions

The study moves beyond static attitudes by analyzing self-reported trust alongside concurrent behavior (NDRT throughput and TOR-RT), showing that calibrated trust has observable behavioral signatures. It also clarifies heterogeneity in learning: initial trust not only shifts the level but shapes short-run trajectories, implying that trust calibration unfolds at different rates across users. This heterogeneity is actionable for personalization in conditional automation.

5.7. Limitations and Future Directions

Several methodological constraints warrant consideration when interpreting these findings. The fixed scenario order partially confounds content effects with sequential or fatigue-related influences. For instance, the trust pattern observed during the pedestrian encounter (TOR4) may reflect scenario-specific characteristics, accumulated exposure effects, or both. Counterbalanced designs with randomized event timing and mixed-effects models incorporating scenario-level covariates such as perceived severity and workload would better isolate these influences in future work.
The behavioral assessment focused on TOR-RT as the primary safety indicator. While RT provides insight into takeover timeliness, it does not capture maneuver quality or control stability. More comprehensive evaluations could incorporate additional metrics including braking characteristics, steering smoothness, trajectory deviation, time-to-collision, and lane-keeping performance during the post-takeover period. Integrating these measures with eye-tracking, psychophysiological signals, and scenario-specific ratings would enable more nuanced assessments of the relationships among trust, attentional allocation, and safety outcomes.
The sample comprised primarily young student drivers tested in a fixed-base Level-3 simulator, which limits generalizability. The findings apply most directly to this demographic profile under the specific onboarding procedures and NDRT conditions employed here. Extensions to drivers of varying ages, experience levels, and cultural backgrounds, as well as to different automation capabilities, NDRT types, and real-world driving environments, require empirical validation. Larger and more diverse samples tested across multiple contexts would strengthen external validity.
Finally, while our analysis did not detect systematic increases in TOR-RT across exposures, the confidence intervals reflect residual uncertainty in estimating effect magnitudes. The observed interval bounds suggest that moderate differences cannot be definitively ruled out, particularly in scenarios with compressed time budgets. Future research employing pre-registered analysis plans, larger samples, and expanded outcome measures would help refine understanding of the conditions under which trust–productivity gains can be achieved without compromising takeover performance. Higher-fidelity simulation and on-road validation studies would further clarify the ecological robustness of these patterns.

6. Conclusions

This study characterizes driver trust as a calibrated, experience-dependent state situated between overtrust and distrust. Brief, structured interaction with conditional automation can calibrate driver trust quickly, especially among initially low-trust users. In this calibrated state, drivers gain secondary-task productivity (higher NDRT throughput) without detectable slowing in takeover reaction time, and trust correlates positively with concurrent NDRT throughput—consistent with more effective attentional allocation during routine automation.
Practically, systems should emphasize front-loaded micro-onboarding that makes capability boundaries and TOR policies transparent, and employ trust-aware interfaces that read behavioral signals to keep reliance aligned with actual performance. For human-involving hazards (e.g., pedestrians), modest anticipatory cues that preserve predictability may prevent transient trust dips while safeguarding rapid takeovers.
Overall, the results suggest that calibrated trust may support key goals of conditional automation: enabling drivers to engage productively in secondary tasks while maintaining readiness for takeover transitions. These findings emerged under specific conditions (scheduled takeovers with 4–7 s lead times and moderate urgency), and future research should examine whether similar patterns hold across varied takeover demands, time budgets, and traffic complexities to establish comprehensive design guidelines for trust-aware automation systems.

Author Contributions

Conceptualization, K.P. and H.H.; methodology, H.H.; software, J.K. and H.K.; validation, H.H. and K.P.; formal analysis, H.H.; investigation, H.H. and H.M.; resources, K.P.; data curation, H.H. and H.M.; writing—original draft preparation, H.H.; writing—review and editing, K.P.; visualization, H.H.; supervision, K.P.; project administration, K.P.; funding acquisition, K.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) program (RS-2022-00156215) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation). This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1G1A1012063). This research was supported by the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (No. RS-2021-KI002499, HRD Program for Industrial Innovation). This research was also supported by the Excellent Researcher Support Project of Kwangwoon University in 2023.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of Kwangwoon University (7001546-202300831-HR(SB)-008-01, 31 August 2023).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We are deeply grateful to the participants of this study for their invaluable time, commitment, and willingness to share their experiences. During the preparation of this manuscript, the authors employed ChatGPT-5 to assist with partial translation and refinement of English grammar. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

Author Juhyeon Kim was employed by the company Designovel. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. SAE J3016; Taxonomy and Definitions for Terms Related to Driving Automation Systems for on-Road Motor Vehicles. SAE: Warrendale, PA, USA, 2018.
  2. Hong, S.; Maeng, J.; Kim, H.J.; Yang, J.H. Development of warning methods for planned and unplanned takeover requests in a simulated automated driving vehicle. In Proceedings of the 14th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Seoul, Republic of Korea, 17–20 September 2022; pp. 65–74. [Google Scholar]
  3. Parasuraman, R.; Riley, V. Humans and automation: Use, misuse, disuse, abuse. Hum. Factors 1997, 39, 230–253. [Google Scholar] [CrossRef]
  4. Kun, A.L.; Boll, S.; Schmidt, A. Shifting gears: User interfaces in the age of autonomous driving. IEEE Pervasive Comput. 2016, 15, 32–38. [Google Scholar] [CrossRef]
  5. Lee, J.D.; Moray, N. Trust, self-confidence, and operators’ adaptation to automation. Int. J. Hum.-Comput. Stud. 1994, 40, 153–184. [Google Scholar] [CrossRef]
  6. Muir, B.M. Trust between humans and machines, and the design of decision aids. Int. J. Man-Mach. Stud. 1987, 27, 527–539. [Google Scholar] [CrossRef]
  7. Hoff, K.A.; Bashir, M. Trust in automation: Integrating empirical evidence on factors that influence trust. Hum. Factors 2015, 57, 407–434. [Google Scholar] [CrossRef]
  8. Lee, J.D.; See, K.A. Trust in automation: Designing for appropriate reliance. Hum. Factors 2004, 46, 50–80. [Google Scholar] [CrossRef]
  9. Kraus, J.; Scholz, D.; Stiegemeier, D.; Baumann, M. The more you know: Trust dynamics and calibration in highly automated driving and the effects of take-overs, system malfunction, and system transparency. Hum. Factors 2020, 62, 718–736. [Google Scholar] [CrossRef]
  10. Manchon, J.B.; Bueno, M.; Navarro, J. Calibration of trust in automated driving: A matter of initial level of trust and automated driving style? Hum. Factors 2023, 65, 1613–1629. [Google Scholar] [CrossRef]
  11. Manchon, J.B.; Bueno, M.; Navarro, J. How the initial level of trust in automated driving impacts drivers’ behaviour and early trust construction. Transp. Res. Part F Traffic Psychol. Behav. 2022, 86, 281–295. [Google Scholar] [CrossRef]
  12. Wu, Y.; Yao, X.; Deng, F.; Yuan, X. Effect of Takeover Request Time and Warning Modality on Trust in L3 Automated Driving. Hum. Factors 2025, 67, 427–444. [Google Scholar] [CrossRef]
  13. Swain, R.; Kaye, S.A.; Rakotonirainy, A. Shared intention and shared awareness for conditional automated driving: An online, randomized video experiment. Traffic Inj. Prev. 2025, 26, 398–406. [Google Scholar] [CrossRef]
  14. Kohn, S.C.; De Visser, E.J.; Wiese, E.; Lee, Y.C.; Shaw, T.H. Measurement of trust in automation: A narrative review and reference guide. Front. Psychol. 2021, 12, 604977. [Google Scholar] [CrossRef]
  15. Hergeth, S.; Lorenz, L.; Vilimek, R.; Krems, J.F. Keep your scanners peeled: Gaze behavior as a measure of automation trust during highly automated driving. Hum. Factors 2016, 58, 509–519. [Google Scholar] [CrossRef] [PubMed]
  16. Petersen, L.; Robert, L.; Yang, X.J.; Tilbury, D.M. Situational awareness, drivers trust in automated driving systems and secondary task performance. arXiv 2019, arXiv:1903.05251. [Google Scholar] [CrossRef]
  17. Azevedo-Sa, H.; Zhao, H.; Esterwood, C.; Yang, X.J.; Tilbury, D.M.; Robert, L.P., Jr. How internal and external risks affect the relationships between trust and driver behavior in automated driving systems. Transp. Res. Part C Emerg. Technol. 2021, 123, 102973. [Google Scholar] [CrossRef]
  18. Körber, M.; Baseler, E.; Bengler, K. Introduction matters: Manipulating trust in automation and reliance in automated driving. Appl. Ergon. 2018, 66, 18–31. [Google Scholar] [CrossRef]
  19. Helldin, T.; Falkman, G.; Riveiro, M.; Davidsson, S. Presenting system uncertainty in automotive UIs for supporting trust calibration in autonomous driving. In Proceedings of the 5th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Eindhoven, The Netherlands, 27–30 October 2013; pp. 210–217. [Google Scholar]
  20. Payre, W.; Cestac, J.; Delhomme, P. Fully automated driving: Impact of trust and practice on manual control recovery. Hum. Factors 2016, 58, 229–241. [Google Scholar] [CrossRef]
  21. Toreini, E.; Aitken, M.; Coopamootoo, K.; Elliott, K.; Zelaya, C.G.; Van Moorsel, A. The relationship between trust in AI and trustworthy machine learning technologies. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, 27–30 January 2020; pp. 272–283. [Google Scholar]
  22. Sarter, N.B.; Woods, D.D. How in the world did we ever get into that mode? Mode error and awareness in supervisory control. Hum. Factors 1995, 37, 5–19. [Google Scholar] [CrossRef]
  23. Sheridan, T.B. Humans and Automation: System Design and Research Issues; John Wiley Sons: Hoboken, NJ, USA, 2002. [Google Scholar]
  24. Dzindolet, M.T.; Peterson, S.A.; Pomranky, R.A.; Pierce, L.G.; Beck, H.P. The role of trust in automation reliance. Int. J. Hum.-Comput. Stud. 2003, 58, 697–718. [Google Scholar] [CrossRef]
  25. Hancock, P.A.; Billings, D.R.; Schaefer, K.E.; Chen, J.Y.; De Visser, E.J.; Parasuraman, R. A meta-analysis of factors affecting trust in human-robot interaction. Hum. Factors 2011, 53, 517–527. [Google Scholar] [CrossRef]
  26. Merritt, S.M.; Ilgen, D.R. Not all trust is created equal: Dispositional and history-based trust in human-automation interactions. Hum. Factors 2008, 50, 194–210. [Google Scholar] [CrossRef]
  27. Muir, B.M.; Moray, N. Trust in automation. Part II. Experimental studies of trust and human intervention in a process control simulation. Ergonomics 1996, 39, 429–460. [Google Scholar] [CrossRef] [PubMed]
  28. de Visser, E.J.; Peeters, M.M.; Jung, M.F.; Kohn, S.; Shaw, T.H.; Pak, R.; Neerincx, M.A. Towards a theory of longitudinal trust calibration in human–robot teams. Int. J. Soc. Robot. 2020, 12, 459–478. [Google Scholar] [CrossRef]
  29. Forster, Y.; Kraus, J.; Feinauer, S.; Baumann, M. Calibration of trust expectancies in conditionally automated driving by brand, reliability information and introductory videos: An online study. In Proceedings of the 11th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Utrecht, The Netherlands, 21–25 September 2019; pp. 118–128. [Google Scholar]
  30. Kunze, A.; Summerskill, S.J.; Marshall, R.; Filtness, A.J. Automation transparency: Implications of uncertainty communication for human-automation interaction and interfaces. Ergonomics 2019, 62, 345–360. [Google Scholar] [CrossRef]
  31. Jian, J.Y.; Bisantz, A.M.; Drury, C.G. Foundations for an empirically determined scale of trust in automated systems. Int. J. Cogn. Ergon. 2000, 4, 53–71. [Google Scholar] [CrossRef]
  32. Ullman, D.; Malle, B.F. Measuring gains and losses in human-robot trust: Evidence for differentiable components of trust. In Proceedings of the 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Daegu, Republic of Korea, 11–14 March 2019; pp. 618–619. [Google Scholar]
  33. Wojton, H.M.; Porter, D.; Lane, S.T.; Bieber, C.; Madhavan, P. Initial validation of the trust of automated systems test (TOAST). J. Soc. Psychol. 2020, 160, 735–750. [Google Scholar] [CrossRef]
  34. Alarcon, G.M.; Capiola, A.; Lee, M.A.; Willis, S.; Hamdan, I.A.; Jessup, S.A.; Harris, K.N. Development and validation of the system trustworthiness scale. Hum. Factors 2024, 66, 1893–1913. [Google Scholar] [CrossRef]
  35. Hungund, A.P.; Pradhan, A.K. Impact of non-driving related tasks while operating automated driving systems (ADS): A systematic review. Accid. Anal. Prev. 2023, 188, 107076. [Google Scholar] [CrossRef]
  36. Shahini, F.; Park, J.; Welch, K.; Zahabi, M. Effects of unreliable automation, non-driving related task, and takeover time budget on drivers’ takeover performance and workload. Ergonomics 2023, 66, 182–197. [Google Scholar] [CrossRef]
  37. Yun, H.; Lee, J.W.; Yang, H.D.; Yang, J.H. Experimental Design for Multi-modal Take-over Request for automated driving. In Proceedings of the International Conference on Human-Computer Interaction, Las Vegas, NV, USA, 15–20 July 2018; pp. 418–425. [Google Scholar]
  38. Schuirmann, D.J. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokinet. Biopharm. 1987, 15, 657–680. [Google Scholar] [CrossRef]
  39. Zhang, B.; De Winter, J.; Varotto, S.; Happee, R.; Martens, M. Determinants of take-over time from automated driving: A meta-analysis of 129 studies. Transp. Res. Part F Traffic Psychol. Behav. 2019, 64, 285–307. [Google Scholar] [CrossRef]
  40. Gold, C.; Happee, R.; Bengler, K. Modeling take-over performance in level 3 conditionally automated vehicles. Accid. Anal. Prev. 2018, 116, 3–13. [Google Scholar] [CrossRef]
  41. Parasuraman, R.; Manzey, D.H. Complacency and bias in human use of automation: An attentional integration. Hum. Factors 2010, 52, 381–410. [Google Scholar] [CrossRef]
  42. Parasuraman, R.; Bahri, T.; Deaton, J.E.; Morrison, J.G.; Barnes, M. Theory and Design of Adaptive Automation in Aviation Systems; No. NAWCADWAR-92033-60; Naval Air Warfare Center Aircraft Division: Warminster, PA, USA, 1992. [Google Scholar]
  43. Ayoub, J.; Avetisyan, L.; Makki, M.; Zhou, F. An investigation of drivers’ dynamic situational trust in conditionally automated driving. IEEE Trans. Hum.-Mach. Syst. 2021, 52, 501–511. [Google Scholar] [CrossRef]
  44. Lu, Y.; Sarter, N. Modeling and inferring human trust in automation based on real-time eye tracking data. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Chicago, IL, USA, 5–9 October 2020; pp. 344–348. [Google Scholar]
Figure 1. Driving simulator environment.
Figure 2. Forward driving scene from the driver’s perspective during automated driving, with the visual TOR cue displayed as a red circle when TOR is initiated.
Figure 3. Example of the Surrogate Reference Task (SuRT) interface. The target circle to be identified is in the left center of the display.
Figure 4. Individual trust trajectories (thin lines) and group means (thick lines with error bars) across five measurement occasions in fixed order: Initial (post-practice), TOR1 (disabled vehicle), TOR2 (roadwork), TOR3 (highway exit), TOR4 (pedestrian), split by initial-trust group (median split: Low vs. High). Error bars indicate ±1 SEM.
Figure 5. Individual NDRT performance trajectories (thin lines) and mean performance (thick line with error bars) across four TOR scenarios in fixed order: T1 (vehicle), T2 (roadwork), T3 (exit), T4 (pedestrian). NDRT performance is operationalized as successful SuRT clicks per second. Error bars indicate ±1 SEM.
Figure 6. Relationship between trust and NDRT throughput (r = 0.431, p < 0.001). Each point represents one participant in one exposure. The solid line is the ordinary least squares linear fit to all observations.

Share and Cite

MDPI and ACS Style

Hwang, H.; Kim, J.; Kim, H.; Min, H.; Park, K. From Overtrust to Distrust: A Simulation Study on Driver Trust Calibration in Conditional Automated Driving. Appl. Sci. 2025, 15, 11342. https://doi.org/10.3390/app152111342


