Scenario Parameters for Fatigue Induction in Truck-Driving Simulators: A Systematic Review of Experimental Designs

Fonseca, Tiago; Ferreira, Sara

doi:10.3390/app16063057

Open Access Systematic Review

by Tiago Fonseca and Sara Ferreira *

CITTA, Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(6), 3057; https://doi.org/10.3390/app16063057

Submission received: 10 February 2026 / Revised: 13 March 2026 / Accepted: 20 March 2026 / Published: 22 March 2026

(This article belongs to the Section Transportation and Future Mobility)

Abstract

Driving simulators offer a safe and controlled way to study fatigue in truck drivers, but variation in scenario design and incomplete reporting limit reproducibility and cross-study comparison. This systematic review synthesized scenario parameters used in truck-driving simulators to induce fatigue-related reductions in alertness and identified recurring protocol patterns associated with interpretable fatigue-related change. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines and a prospectively registered protocol (PROSPERO CRD420261302272), systematic searches were conducted in February 2026 in Scopus, Web of Science, IEEE Xplore, PubMed, and ScienceDirect. Peer-reviewed original studies published in English were eligible if they involved truck drivers, used a driving simulator, reported fatigue-relevant scenario parameters, and measured at least one fatigue-related outcome; no restriction was applied to publication year. Twenty-three studies comprising 419 participants met the eligibility criteria and were synthesized narratively. Risk of bias was appraised using an adapted 11-item checklist for driving simulator experiments, developed with the National Heart, Lung, and Blood Institute (NHLBI) quality assessment tools as a reference framework. Across the qualitative evidence base, fatigue-related change was reported more consistently in protocols combining sustained time on task with low-variability driving demands, typically implemented through monotonous road environments and reduced traffic complexity. Effects were more readily interpretable when sessions were scheduled at night or after work shifts and when outcomes were assessed repeatedly during the drive. However, incomplete control or reporting of baseline sleep pressure, stimulant intake, counterbalancing, familiarization, simulator sickness, and outlier handling limited causal interpretation and confidence in cross-study comparison. 
Overall, the evidence supports recurring design patterns rather than a single optimal protocol and highlights the need for standardized scenario descriptions and minimum reporting requirements.

Keywords:

scenario parameter; driving simulator; truck driver; fatigue; road safety

1. Introduction

Road freight transport is essential to modern economies, sustaining supply chains and enabling the continuous delivery of goods. Professional truck drivers are central to this system, routinely operating over long distances under tight schedules and variable traffic and environmental conditions. In practice, this work is commonly associated with extended duty periods, limited opportunities for restorative sleep, suboptimal rest environments, and prolonged social isolation, all of which increase susceptibility to fatigue [1,2].

In the occupational driving context, fatigue should not be understood as a simple subjective feeling of tiredness. Rather, it is a multidimensional psychophysiological state arising from interacting mechanisms that include circadian misalignment, sleep restriction, sustained cognitive and physical workload, low-stimulation environments, and psychosocial stressors [3,4]. As fatigue develops, safety-critical driving functions deteriorate, including reaction time, sustained attention, and hazard anticipation [4,5]. For truck drivers, these decrements may have particularly serious consequences because heavy-vehicle operation requires stable lane keeping, precise speed control, and adequate safety margins over extended time-on-task.

Evaluating fatigue countermeasures and driver monitoring technologies in real traffic is inherently constrained by ethical and safety requirements. Inducing meaningful fatigue may require prolonged driving exposure and/or night-time testing, conditions that can elevate risk for participants and other road users. Driving simulators therefore provide a widely used alternative for fatigue research. Simulator-based experiments enable controlled manipulation of fatigue-relevant conditions, such as driving duration, monotony, time of day, and task demands, while maintaining participant safety. They also facilitate synchronized acquisition of fatigue-related outcomes, including subjective ratings, vehicle-based measures (e.g., lane deviation and steering variability), behavioral markers (e.g., eyelid dynamics), and physiological signals [6].

Despite the extensive use of simulators to study fatigue, fatigue-induction protocols remain difficult to reproduce and compare across studies. Scenario design choices vary substantially, including total driving duration, road type and geometry, monotony level, traffic density, event frequency, lighting and weather settings, and time of day. Reporting is often insufficiently detailed to support replication or secondary analysis. Furthermore, scenario manipulations are frequently combined with additional constraints, such as sleep restriction and post-shift testing, as well as with secondary tasks, which further increase methodological heterogeneity. Collectively, these factors limit cross-study comparability and hinder the identification of recurring scenario patterns associated with interpretable fatigue-related change in truck-driving contexts.

To address this gap, this systematic review synthesizes simulator-based studies involving truck drivers that aim to induce, manipulate, or assess fatigue-related reductions in alertness through scenario design. The overarching objective is to consolidate current evidence on fatigue-induction protocols for truck-driving simulators and to inform the development of more standardized and deployable experimental designs.

This review is guided by the following research questions (RQ):

RQ1: Which scenario parameters have been used in driving simulators to induce fatigue-related reductions in alertness in truck drivers?

RQ2: How have these scenario parameters been associated with measurable fatigue-related outcomes in truck-driving contexts?

RQ3: Which combinations of scenario variables (e.g., driving duration, road type, traffic density, and time of day) recur most consistently in studies reporting interpretable fatigue-related change in truck drivers?

RQ4: What methodological limitations and research gaps exist in simulator-based fatigue-induction studies targeting truck drivers?

The remainder of this paper is structured as follows. Section 2 presents the systematic review methods, including protocol and reporting approach, eligibility criteria, search strategy, screening procedures, and the processes used for data extraction, synthesis, and experiment assessment. Section 3 reports the results, first describing study selection and core study characteristics, then presenting the experiment assessment outcomes, and finally synthesizing the evidence across the review questions addressing scenario parameters, fatigue-related outcomes, and methodological gaps. Section 4 discusses the findings in the context of the wider literature, critically examines strengths and limitations of both the included evidence and the review process, and outlines implications for practice, policy, and future research. Section 5 concludes with the main contributions of the review, while Appendix A provides the supporting tables that enable transparent cross-study comparison.

2. Methods

This systematic review was conducted and reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [7]. The PRISMA framework supports standardized and transparent reporting by clarifying how evidence is identified, selected, and synthesized, thereby improving reproducibility. The completed PRISMA 2020 checklist is provided in the Supplementary Materials (see Document S1). The procedures applied in this review are outlined below.

2.1. Protocol

An a priori protocol was developed before the start of the review to define the objectives, research questions, eligibility criteria, and methodological procedures. The protocol was prospectively registered in the International Prospective Register of Systematic Reviews (PROSPERO) under the reference CRD420261302272 [8], providing a transparent framework to guide the review process, promote consistency, and minimize risk of bias.

The review followed a four-stage process: (i) identification of records through database searches; (ii) title and abstract screening to assess relevance; (iii) full-text assessment to confirm eligibility; and (iv) final inclusion of studies meeting all predefined criteria. Inclusion and exclusion criteria were applied consistently at each stage, and all decisions were documented to ensure transparency and reproducibility.

2.2. Eligibility Criteria

This review focused on truck drivers as the population of interest, including short-haul and long-haul drivers as well as truck-driver trainees. Eligible studies were required to be conducted in a driving simulator, whether static or dynamic, and to have a clear objective to induce, manipulate, or analyze fatigue-related reductions in alertness. To ensure that scenario design could be compared across studies, eligible articles had to report at least one fatigue-relevant scenario parameter, such as driving duration, road type and geometry, traffic density, environmental conditions, time of day, or monotony manipulations. In addition, studies had to report at least one relevant outcome reflecting fatigue-related state change during or after the protocol, including subjective indicators (e.g., Karolinska Sleepiness Scale), behavioral measures (e.g., steering-related metrics), physiological signals (e.g., electroencephalography, electrooculography, heart rate variability), or performance outcomes (e.g., reaction time, lane deviation).

For the purposes of study selection and synthesis, fatigue, sleepiness, drowsiness, and alertness decrement were treated as related operational labels within the broader domain of reduced alertness during driving. This decision was adopted because the simulator literature frequently uses overlapping terminology and partially shared indicators across these constructs. However, original study terminology was preserved when reporting individual findings, particularly where authors made an explicit conceptual distinction. Only experimental or quasi-experimental studies published in English in peer-reviewed venues were included, with no restriction on publication year.

Studies were excluded if they did not involve truck drivers (e.g., passenger-car drivers or bus drivers), if the simulator protocol was not designed to induce, manipulate, or assess fatigue, or if scenario design parameters were insufficiently reported. On-road fatigue studies were excluded when they did not use a driving simulator at any stage to assess fatigue. Studies addressing adjacent topics (e.g., workload, human–machine interaction, advanced driver-assistance systems validation, motion sickness, or user experience) were excluded when fatigue-related outcomes were not reported. Finally, review articles, theoretical papers, editorials, theses, opinion pieces, and other non-peer-reviewed publications, as well as non-English studies, were excluded to maintain focus on peer-reviewed primary evidence.

2.3. Search Strategy

A systematic literature search was conducted in February 2026 across five electronic databases: Scopus, Web of Science, IEEE Xplore, PubMed, and ScienceDirect. These databases were selected to ensure broad coverage of research in transportation safety, human factors, sleep science, and driving simulation.

The search strategy was structured around three core concepts: (i) population (truck drivers), (ii) context (driving simulators), and (iii) phenomenon of interest (fatigue). Keywords were combined using Boolean operators and adapted to the syntax of each database. Population-related terms included truck driver, heavy vehicle driver, commercial driver, freight driver, long-haul driver, and professional driver. Simulator-related terms included driving simulator, simulated driving, and driving simulation. Fatigue-related terms included fatigue, drowsiness, and sleepiness. The final search strings combined these three concept blocks using AND, while synonyms within each block were connected using OR.
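The block structure described above can be sketched in a few lines of Python. This is only an illustration of the OR-within-block, AND-between-blocks logic: the phrase quoting and the absence of database-specific field tags are assumptions, and the function name `build_query` is hypothetical. The exact strings submitted to each database are not reproduced here.

```python
# Concept blocks as listed in the search-strategy description.
CONCEPT_BLOCKS = {
    "population": [
        "truck driver", "heavy vehicle driver", "commercial driver",
        "freight driver", "long-haul driver", "professional driver",
    ],
    "simulator": ["driving simulator", "simulated driving", "driving simulation"],
    "fatigue": ["fatigue", "drowsiness", "sleepiness"],
}


def build_query(blocks):
    """Join synonyms with OR inside each block, then join the blocks with AND."""
    groups = []
    for terms in blocks.values():
        # Quoting each term as a phrase is an assumption, not a reported detail.
        joined = " OR ".join(f'"{term}"' for term in terms)
        groups.append(f"({joined})")
    return " AND ".join(groups)


query = build_query(CONCEPT_BLOCKS)
print(query)
```

In practice, each database would need its own field tags and syntax adjustments layered on top of this generic string.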

No supplementary search techniques (e.g., citation tracking, reference list screening, or snowballing) were applied. This approach was adopted to maintain a standardized and replicable identification process based exclusively on pre-defined database searches.

2.4. Data Collection and Extraction

Data collection and extraction were performed by a single reviewer using a standardized procedure to ensure consistency across studies. After completing the database searches, all records were exported and imported into Rayyan (Rayyan Systems Inc., Cambridge, MA, USA, Version 1.7.2) to support study management and duplicate removal. Title and abstract screening and subsequent full-text assessment were conducted using the predefined eligibility criteria described in Section 2.2, with screening decisions recorded to maintain transparency.

For the included studies, a structured extraction form was developed in Microsoft Excel (Microsoft Corporation, Redmond, WA, USA, Version 16.106) to systematically capture the information required to address the review questions. Extracted items were organized into six domains: (i) basic study information (e.g., title, authors, year, objectives, and main findings); (ii) population and sample characteristics (e.g., truck driver type, sample size, sex distribution, age, driving experience, and participant eligibility criteria); (iii) simulator characteristics (e.g., static/dynamic configuration, truck cabin representation, field of view, and motion platform specifications); (iv) scenario design parameters relevant to fatigue induction (e.g., driving duration, block/break structure, road type and geometry, monotony and traffic density manipulations, events/hazards, environmental conditions, local time, sleep manipulation protocols, and secondary tasks); (v) fatigue measures (subjective, vehicle-based, behavioral, physiological, and performance indicators); and (vi) fatigue-related change and methodological considerations (e.g., evidence of fatigue increase, indicators showing change and their time course, thresholds used to define fatigue, study design, randomization/counterbalancing, control of sleep and stimulants, adaptation periods, dropouts, simulator sickness, and limitations highlighted by the authors).

Data were extracted directly from the full texts. To support consistency and efficiency, ChatGPT (OpenAI, San Francisco, CA, USA, Version GPT-5.2) was used as an auxiliary tool to assist in the preliminary identification of potentially relevant passages and in organizing information into the predefined extraction fields. All extracted content was then manually reviewed and cross-checked against the original articles by the reviewer, and no information was retained unless it could be verified in the source text. Any ambiguous information was resolved by returning to the full article, and items not explicitly reported were recorded as “not reported”. No subset of studies was independently re-screened or re-extracted by a second reviewer.

Because screening, eligibility assessment, and extraction were conducted by a single reviewer, this workflow may have increased the risk of selection or extraction error despite the use of predefined criteria, structured forms, and manual source verification.
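As a rough illustration of how the six extraction domains and the "not reported" convention could be represented, the sketch below uses a Python dataclass. The class name, field names, and example values are hypothetical and do not reproduce the actual spreadsheet layout used in the review.

```python
from dataclasses import dataclass, field

NOT_REPORTED = "not reported"


@dataclass
class ExtractionRecord:
    """One included study, with one dictionary per extraction domain."""
    study_info: dict = field(default_factory=dict)          # (i) basic study information
    population: dict = field(default_factory=dict)          # (ii) sample characteristics
    simulator: dict = field(default_factory=dict)           # (iii) simulator characteristics
    scenario: dict = field(default_factory=dict)            # (iv) scenario design parameters
    fatigue_measures: dict = field(default_factory=dict)    # (v) fatigue measures
    change_and_methods: dict = field(default_factory=dict)  # (vi) change and methods

    def get(self, domain, item):
        """Return the extracted value, or 'not reported' when it is absent."""
        return getattr(self, domain).get(item, NOT_REPORTED)


# Hypothetical example values, for illustration only.
record = ExtractionRecord(scenario={"driving_duration_min": 120})
print(record.get("scenario", "driving_duration_min"))
print(record.get("scenario", "traffic_density"))
```

Returning an explicit "not reported" marker, rather than an empty value, keeps missing items visible in downstream tables instead of silently dropping them.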

2.5. Risk of Bias in Individual Studies

Risk of bias was appraised using an adapted 11-item checklist developed for driving simulator experiments, with the NHLBI (National Heart, Lung, and Blood Institute) quality assessment tools used as a reference framework [9]. This approach was adopted because generic appraisal tools do not fully capture several technical and procedural features that are particularly relevant in simulator-based research, such as scenario order control, road-geometry counterbalancing, familiarization procedures, motion-sickness assessment, and outlier handling.

The adapted checklist was therefore used to evaluate simulator-specific sources of bias that may influence study outcomes, particularly those related to design choices, procedural control, and reporting transparency. It was not intended to judge the scientific relevance, novelty, or overall quality of each study. The checklist comprised 11 items, each rated as yes (Y), no (N), not reported (NR), or not applicable (NA), depending on the study design and the information provided. The assessed items were: (i) suitability of the simulator apparatus for the research intent; (ii) randomization or counterbalancing of scenarios; (iii) randomization or counterbalancing of road geometries; (iv) application of appropriate statistical analysis; (v) representativeness of the sample relative to the target population; (vi) conduct of a practical trial; (vii) specification of a participant familiarization procedure; (viii) assessment of motion sickness; (ix) clear description of the motion-sickness assessment method; (x) assessment of outliers; and (xi) clear description of the outlier assessment method. When an item was not applicable, it was excluded from scoring.

For each study, an overall score was calculated as

\text{Score} = \frac{\sum Y}{11 - \sum \mathrm{NA}}

and reported as a percentage. This percentage should not be interpreted as a global measure of scientific quality or as definitive evidence of low or high validity. Rather, it provides a structured summary of the extent to which each study reported and controlled the simulator-specific methodological features captured by the checklist. Higher scores therefore indicate stronger procedural control and/or clearer reporting within the domains assessed, whereas lower scores indicate greater susceptibility to bias arising from insufficient control or incomplete reporting of those technical features. Further details regarding the development and prior application of this checklist are provided in Bobermin et al. [10], and item-level ratings for all included studies are reported in Table A4.
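The scoring rule can be sketched as follows. The function name and the example ratings are hypothetical and do not correspond to any included study; the only rule taken from the text is that Y ratings are divided by the number of applicable items (11 minus the NA count), so N and NR both count against the score.

```python
from collections import Counter

VALID_RATINGS = {"Y", "N", "NR", "NA"}
N_ITEMS = 11


def checklist_score(ratings):
    """Return sum(Y) / (11 - sum(NA)) as a percentage."""
    if len(ratings) != N_ITEMS:
        raise ValueError("expected one rating per checklist item (11 in total)")
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("ratings must be one of Y, N, NR, NA")
    counts = Counter(ratings)
    applicable = N_ITEMS - counts["NA"]
    if applicable == 0:
        raise ValueError("no applicable items to score")
    return 100.0 * counts["Y"] / applicable


# Hypothetical ratings for items (i) to (xi): 5 Y, 1 N, 2 NR, 3 NA.
example = ["Y", "NA", "NA", "Y", "Y", "Y", "Y", "N", "NA", "NR", "NR"]
print(checklist_score(example))  # 5 / (11 - 3) = 62.5
```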

2.6. Data Synthesis

The extracted data were synthesized narratively and organized according to the four review questions. A mapping framework was used to ensure that each synthesis was based on comparable scenario descriptors and fatigue-related outcomes. Studies were first mapped to the four review questions using the standardized extraction fields. For RQ1, all included studies were eligible because each contributed information on simulator protocol design and scenario parameters. For RQ2 and RQ3, eligibility required reporting of fatigue-related change during the protocol or a clearly defined fatigue-state contrast linked to scenario design. Studies that treated fatigue primarily as a modelling label without reporting within-protocol evolution, or that did not present the relevant results in the available report, were retained for scenario description but were not used to support conclusions about measurable fatigue-related change or recurring parameter combinations associated with interpretable induction. For RQ4, all included studies were eligible because methodological limitations and research gaps were synthesized from study design and reporting fields together with limitations explicitly noted by the authors.
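The mapping rule described above can be expressed as a small function. This is a simplified sketch: the two boolean flags are hypothetical stand-ins for the extraction fields that record whether a study reported within-protocol change or a defined fatigue-state contrast.

```python
def eligible_questions(reports_within_protocol_change, has_fatigue_state_contrast):
    """Return the review questions a study can contribute to."""
    # Every included study informs scenario description (RQ1) and
    # methodological limitations (RQ4).
    questions = ["RQ1", "RQ4"]
    # Outcome-based questions need evidence of fatigue-related change.
    if reports_within_protocol_change or has_fatigue_state_contrast:
        questions += ["RQ2", "RQ3"]
    return sorted(questions)


print(eligible_questions(True, False))
print(eligible_questions(False, False))
```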

No statistical reconstruction or imputation was performed. Data preparation focused on harmonizing terminology and categorical groupings to support cross-study comparison, including exposure structure, traffic configuration, monotony framing, and fatigue-definition thresholds. When key details were missing, they were recorded as not reported, and no values were inferred from figures or derived indirectly from other variables.

Results were synthesized using structured tabulation and narrative integration. Evidence tables were used to consolidate scenario parameters, measurement domains, and methodological features across studies and to support direct comparison of protocol configurations. The narrative synthesis followed the same mapping structure and was organized by research question, emphasizing convergent patterns and explicitly distinguishing scenario description from outcome-based evidence.

Meta-analysis was not performed because the included studies differed substantially in exposure structures, scenario configurations, fatigue definitions, and outcome-measurement schedules, and several studies relied on categorical fatigue labels rather than comparable continuous trajectories. As a result, formal statistical heterogeneity assessment, subgroup analyses, meta-regression, and sensitivity analyses were not conducted.

Reporting bias due to missing results was assessed qualitatively by comparing reported outcomes against the stated design and measurement plan and by restricting interpretive claims about fatigue-related change to studies that provided explicit within-protocol evidence. Certainty in the body of evidence was not formally graded using a dedicated certainty framework. Instead, confidence was interpreted qualitatively based on cross-study consistency, completeness of reporting, and the risk-of-bias patterns identified using the adapted driving simulator checklist described in Section 2.5.

3. Results

This section summarizes the findings from the studies included in the qualitative synthesis. It reports the study selection process (Section 3.1), describes the main characteristics of the included evidence base (Section 3.2), and presents the risk-of-bias appraisal (Section 3.3). The results then address the four review questions by synthesizing the scenario parameters used for fatigue induction, the reported fatigue-related outcomes associated with those parameters, the recurring parameter combinations linked to interpretable fatigue-related change, and the key methodological limitations and research gaps (Section 3.4, Section 3.5, Section 3.6 and Section 3.7).

3.1. Study Selection

Study selection was conducted in accordance with the PRISMA 2020 guidelines and followed four sequential stages: identification, screening, full-text eligibility assessment, and final inclusion. Database searches in PubMed, Scopus, Web of Science, IEEE Xplore, and ScienceDirect yielded 172 records, with no additional records identified through other sources or registers.

During identification, 13 records were excluded based on the predefined exclusion criteria: 11 did not meet the required document type and 2 were not written in English. After removal of 77 duplicates, 82 unique records remained for title and abstract screening.

During screening, the titles and abstracts of these 82 records were assessed against the inclusion criteria. Of these, 25 records were excluded, primarily because they did not involve truck drivers and/or a driving simulator, did not address fatigue as an objective, or did not report sufficient scenario design information. This step resulted in 57 records proceeding to full-text review.

Of the 57 records selected for full-text assessment, 56 full-text articles were successfully retrieved. Despite exhaustive attempts using institutional databases and interlibrary services, 1 record could not be accessed and was therefore excluded. Following full-text evaluation, 33 articles were excluded: 28 because participants were not explicitly identified as truck drivers and the study did not clearly involve professional drivers performing truck-simulator driving tasks, 3 because a driving simulator was not used, and 2 because fatigue-related outcomes were not measured.

Consequently, 23 studies met all inclusion criteria and were included in the qualitative synthesis. The complete selection process is summarized in the PRISMA flow diagram (Figure 1), including the number of records included and excluded at each stage and the main reasons for exclusion.
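The counts reported at each stage can be checked with simple arithmetic. The sketch below uses only the numbers stated in this section; the variable names and the shortened exclusion labels are illustrative.

```python
identified = 172
excluded_at_identification = {"document type": 11, "not in English": 2}
duplicates = 77

# Records entering title and abstract screening.
screened = identified - sum(excluded_at_identification.values()) - duplicates

excluded_at_screening = 25
sought_for_full_text = screened - excluded_at_screening

not_retrieved = 1
assessed_full_text = sought_for_full_text - not_retrieved

full_text_exclusions = {
    "not truck drivers": 28,
    "no driving simulator": 3,
    "no fatigue-related outcome": 2,
}
included = assessed_full_text - sum(full_text_exclusions.values())

print(screened, sought_for_full_text, assessed_full_text, included)
```

Run as written, the stage totals come out as 82, 57, 56, and 23, matching the figures reported in the text.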

3.2. Overview of Included Studies

This section provides an overview of the 23 studies included in the qualitative synthesis, summarizing the main characteristics that contextualize the evidence base used in this review. It first describes when and where the studies were published, highlighting the temporal accumulation of research and the geographical concentration of contributions (Section 3.2.1). It then summarizes the main research objectives reported across the included studies, providing context for how fatigue was addressed within simulator-based protocols (Section 3.2.2). Finally, the section summarizes the key characteristics of the participant samples and the simulation platforms used across the included studies (Section 3.2.3 and Section 3.2.4). Together, these elements establish the methodological and contextual basis for interpreting the scenario parameters and fatigue outcomes reported in the subsequent results sections.

3.2.1. Temporal and Geographical Distribution

The temporal distribution of the included studies indicates a long-standing, yet still limited, research stream spanning four decades (see Figure 2). The earliest eligible publication dates to 1985, with cumulative output remaining at one study through 1995. Evidence then began to expand, reaching five studies by 2000 and six by 2005. Growth continued steadily through the following years, with ten studies published by 2010, twelve by 2015, and seventeen by 2020, culminating in twenty-three studies by 2025. Overall, this pattern reflects gradual evidence accumulation with a stronger contribution after 2000 rather than a rapid expansion of the field. While research on drowsiness detection and driver monitoring technologies has grown substantially in recent years [6,11], simulator-based experimental studies explicitly designed to induce and evaluate fatigue in truck-driving contexts remain comparatively scarce.
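To make the accumulation pattern concrete, the cumulative counts quoted above can be converted into per-interval additions. The sketch uses only the cumulative totals given in the text; the interval labels and variable names are illustrative.

```python
# Cumulative number of included studies by year, as stated in the text.
cumulative = {1995: 1, 2000: 5, 2005: 6, 2010: 10, 2015: 12, 2020: 17, 2025: 23}

years = sorted(cumulative)
added = {
    f"{start + 1}-{end}": cumulative[end] - cumulative[start]
    for start, end in zip(years, years[1:])
}
share_after_2000 = (cumulative[2025] - cumulative[2000]) / cumulative[2025]

for interval, count in added.items():
    print(interval, count)
print(f"{share_after_2000:.0%} of included studies were published after 2000")
```

The interval additions are 4, 1, 4, 2, 5, and 6 studies, and 18 of the 23 studies (about 78%) fall after 2000.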

Regarding geographical distribution, the included studies were conducted across multiple countries, with a clear concentration in Europe and North America (see Figure 3). The United States contributed five studies (21%), representing the largest share of the evidence base. Sweden and Israel each contributed three studies (13%), while Spain, China and the Netherlands each contributed two (9%). The remaining six studies were conducted in Germany, Australia, Canada, Finland, Italy, and Poland (one study each) and are grouped as “Others” (26%) due to their smaller individual proportions. Overall, these patterns indicate that the available simulator evidence is produced by a limited number of national research hubs rather than being broadly distributed, which may constrain the diversity of simulator platforms, regulatory contexts, and trucking operational conditions represented in the literature.

3.2.2. Research Objectives

Across the 23 included studies (see Table A1), the research objectives converge into three main clusters, reflecting how simulator-based protocols have been used to elicit, operationalize, and evaluate fatigue in controlled truck-driving contexts.

First, nine studies focused on fatigue measurement, detection, and modeling, using simulator protocols to generate controlled physiological and behavioral data and to validate inference approaches. Within this cluster, studies developed or tested non-intrusive detection methods based on driver-facing signals and vehicle-control patterns. Several works evaluated vision-based or camera-derived indicators (e.g., eye-closure measures) in truck-simulator settings, targeting drowsiness detection performance and indicator sensitivity (e.g., [12,13,14]). Others emphasized algorithm development using multimodal or wearable physiology, including neural-network and deep-learning approaches to classify arousal- or fatigue-related states under simulator driving (e.g., [15,16]). More recently, studies integrated driver state labels with vehicle-control signatures to model fatigue-linked changes in speed regulation and lateral control (e.g., [17,18]). Although framed around visual attention rather than detector validation, Giorgi et al. [19] aligns with this cluster because it operationalized fatigue onset using an electroencephalogram (EEG)-based index and quantified systematic attention changes as fatigue emerged.

Second, ten studies used simulator-based fatigue-relevant conditions to evaluate systems, countermeasures, or operational exposures, treating fatigue either as an explicit target to be mitigated or as an outcome influenced by design and task characteristics. This cluster includes technology acceptance and human–system interaction objectives, such as evaluating driver acceptance of fatigue-related warning and monitoring functions and examining interaction preferences with in-vehicle voice assistants across fatigue states [20,21]. It also includes automation-focused experiments, where partially and fully automated truck platooning was assessed in relation to workload, trust, and sleepiness, illustrating that certain automation configurations may unintentionally facilitate increased sleepiness [22]. Beyond in-vehicle systems, several studies tested behavioral or scheduling countermeasures and exposure effects, including alertness-maintaining tasks during monotonous driving, nap strategies to improve nighttime alertness, and the influence of nondriving on-duty activities or visual stressors on subsequent alertness and performance (e.g., [23,24,25,26,27,28]). Cardoso et al. [29] further contributed an ergonomic intervention perspective by evaluating seat design during prolonged simulator driving and tracking fatigue-, stress-, and vigilance-related change.

Third, four studies focused primarily on comparative characterization of fatigue development across driver groups and testing conditions, rather than on technology evaluation. These studies contrasted professional and non-professional drivers and compared day versus night driving or acute sleep deprivation conditions, using multi-domain indicators to describe how fatigue manifests across populations and time-of-day contexts (e.g., [30,31,32,33]). Oron-Gilad and Ronen [34] complement this cluster by examining how road characteristics shape fatigue development and fatigue-related driving-performance degradation during prolonged, monotonous truck-simulator driving.

Overall, these clusters indicate that simulator-based fatigue induction in truck-driving research is rarely an end in itself. Instead, fatigue is typically used as a methodological lever to support three complementary aims: validating indicators and modeling approaches; testing systems, interventions, and exposure effects under fatigue-relevant conditions; and characterizing how fatigue develops across populations and experimental contexts. This framing is essential for interpreting the scenario parameters reported in the included studies, since design choices used to elicit fatigue are closely coupled to each experiment’s primary objective and to the outcomes used to define and confirm fatigue-related change.

3.2.3. Participant Sample Characteristics

Across the 23 included studies (see Table A2), participant samples were small to moderate in most cases, with substantial heterogeneity in driver background and incomplete demographic reporting. Sample size ranged from 8 to 60 participants, totaling 419 participants across the review. The largest sample was reported by Drory [30] (60 participants), followed by Afghari et al. [17] (35), Al Haddad et al. [20] and Sandström et al. [28] (34 each), and Yu et al. [18] (30). At the other end of the distribution, eleven studies enrolled 11 participants or fewer ([13,14,16,19,21,24,25,27,31,33,34]). These sample sizes are important when interpreting fatigue-induction effects, given that fatigue expression and performance degradation can vary markedly between drivers.

Sex distribution was reported in 16 of the 23 studies. Within these 16 studies, 11 used male-only samples ([14,21,22,23,25,26,29,30,32,33,34]), while only five studies explicitly included female participants: Afghari et al. [17] and Al Haddad et al. [20] each included six women, Sandström et al. [28] included two, and Macchi et al. [24] and Ranney et al. [27] each included one. Seven studies did not report sex composition ([12,13,15,16,18,19,31]), limiting assessment of sample representativeness and subgroup variability.

Age reporting was also inconsistent. When reported, age was typically presented as mean and standard deviation (SD), with substantial variation across samples. Mean age was reported in 13 of the 23 studies. The youngest cohort was reported by Oron-Gilad and Ronen [34] (mean age 22.0 years), followed by Anund et al. [33] (23.5 years; SD 1.5). Several studies reported mean ages in the early-to-mid forties (e.g., [17,25,32]), while the oldest cohort was Cardoso et al. [29] (mean age 50.4 years; SD 13.4). In the remaining ten studies, age was not reported, which constrains interpretation of fatigue susceptibility and performance differences in relation to age.

Eligibility criteria typically required a valid license and basic suitability for simulator participation, but the definition of the target driver population varied. Professional truck-driver status was explicitly stated in several studies (e.g., [14,17,20,21,33]), while other studies recruited commercially licensed drivers [29] or professional drivers without a clearly specified truck-driver designation in the eligibility description (e.g., [13,15]). Two studies relied on broader populations despite using truck-simulator contexts [12,28]. Experience thresholds were applied in a subset of studies (e.g., minimum professional experience in Al Haddad et al. [20] and Zhou et al. [21]), and several studies implemented common simulator-specific constraints (e.g., vision requirements and exclusion of relevant health conditions) [19,21]. Overall, the included evidence base remains dominated by male samples and frequently small cohorts, with variable population definitions and incomplete demographic reporting; these features should be considered when assessing the transferability of scenario parameters to broader and more diverse truck-driver populations.

3.2.4. Driving Simulator Characteristics

Across the 23 included studies (see Table A3), most experiments were conducted using static simulators, with a smaller subset using dynamic motion-based platforms. Specifically, 15 studies reported static configurations [15,17,18,19,20,21,23,24,26,27,28,29,30,32,34], 6 reported dynamic simulators [12,13,14,22,31,33], and 2 did not report simulator type [16,25]. This distribution indicates that fatigue-induction protocols in truck-driving simulation are most commonly implemented on static systems, likely reflecting broader accessibility and lower operational complexity compared with motion-based platforms.

Reporting of truck-cabin representation varied across studies. A simulated truck cabin was explicitly used in 14 studies [12,13,14,15,17,20,22,24,26,27,28,29,30,31], while 7 reported no truck-cabin simulation [18,19,21,23,32,33,34]; 2 studies did not report this information [16,25]. This heterogeneity is relevant because cab realism can influence immersion, posture and comfort, and perceived workload, which may interact with fatigue development during prolonged or monotonous driving tasks.

Field of view (FOV) was inconsistently reported, but when available it showed substantial variation. Eleven studies reported horizontal FOV values, ranging from 40° in two studies [23,34] to 180° in three studies [12,13,14]. Intermediate configurations included 120° in three studies [22,31,33], 135° in two studies [15,17], and 160° in one study [19]. In the remaining 12 studies, FOV was not reported [16,18,20,21,24,25,26,27,28,29,30,32]. The uneven reporting of FOV constrains comparisons across protocols, particularly because visual immersion and peripheral cues can influence lane keeping, speed control, and visual scanning patterns as fatigue progresses.

A similar pattern was observed for motion capability. All static simulators were reported with 0 degrees of freedom, as expected [15,17,18,19,20,21,23,24,26,27,28,29,30,32,34]. Among dynamic systems, three studies used six degrees of freedom [12,13,14], one used four degrees of freedom [33], and two used three degrees of freedom [22,31]; motion capability was not reported in two studies [16,25]. Motion cues may affect perceived realism, simulator tolerance, and sensory stimulation, and may therefore modulate workload and fatigue trajectories. However, given the limited number of dynamic studies and persistent reporting gaps, this evidence base does not yet support robust inferences on how motion configuration influences fatigue-related outcomes in truck-driving simulators.
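
The simulator-type and FOV tallies above can be cross-checked and combined with a short script. The following is a minimal sketch that transcribes only the citation numbers and FOV values stated in the text; the variable and function names are illustrative and are not part of the review's extraction procedure.

```python
# Citation numbers transcribed from the paragraphs above
# (Table A3 itself is not reproduced here).
STATIC = {15, 17, 18, 19, 20, 21, 23, 24, 26, 27, 28, 29, 30, 32, 34}
DYNAMIC = {12, 13, 14, 22, 31, 33}
TYPE_NOT_REPORTED = {16, 25}

# Horizontal field of view in degrees, keyed by citation number.
FOV_DEG = {23: 40, 34: 40, 22: 120, 31: 120, 33: 120,
           15: 135, 17: 135, 19: 160, 12: 180, 13: 180, 14: 180}


def fov_by_type():
    """Group the reported FOV values by simulator type (sorted ascending)."""
    groups = {"static": STATIC, "dynamic": DYNAMIC,
              "not reported": TYPE_NOT_REPORTED}
    return {label: sorted(FOV_DEG[ref] for ref in refs if ref in FOV_DEG)
            for label, refs in groups.items()}


def reporting_rate(refs):
    """Return (studies with a reported FOV, studies in the group)."""
    return sum(ref in FOV_DEG for ref in refs), len(refs)


if __name__ == "__main__":
    print(fov_by_type())
    print("static:", reporting_rate(STATIC),
          "dynamic:", reporting_rate(DYNAMIC))
```

Read this way, all six dynamic simulators have a reported FOV (120° or 180°), whereas only five of the fifteen static simulators do, so the FOV reporting gap sits mainly in static setups.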

Overall, the included studies reflect substantial variability in simulator configuration and reporting quality. Most protocols were implemented in static simulators, and key simulator features—particularly FOV, cabin representation, and motion parameters—were not consistently documented. These differences and reporting gaps are consequential for interpreting fatigue-induction outcomes and for supporting the reproducibility and transferability of simulator-based scenario parameters across laboratories and simulator systems.

3.3. Risk of Bias Within Included Studies

Risk of bias varied substantially across the 23 included studies (see Table A4), with overall scores ranging from 33% to 75%. Across the evidence base, the strongest items were those related to simulator suitability and the application of statistical analyses, which were generally well addressed. In contrast, the most consistent limitations were linked to experimental control and procedural transparency, particularly randomization and counterbalancing, incomplete reporting of familiarization, and limited documentation of motion sickness assessment and outlier handling.

Simulator apparatus was typically judged suitable for the research intent, and statistical methods were consistently applied. Sample representativeness was also rated positively in most studies, although a small subset relied on samples that were less clearly aligned with the target population definition. The main sources of potential bias therefore arose less from analytical omissions and more from upstream design choices and reporting gaps.

Randomization and counterbalancing were unevenly implemented and frequently underreported. Scenario order control was reported in approximately half of the studies, while many protocols either used fixed sequences or did not provide sufficient detail to confirm whether order effects were mitigated. Geometry randomization or counterbalancing was even less common, and in many cases was not applicable because experiments relied on fixed-route designs or single-condition protocols. Where condition sequences are fixed or insufficiently controlled, observed changes can partly reflect learning, habituation, carryover, or progressive fatigue accumulation beyond the intended experimental manipulation.
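
As an illustration of how scenario order can be controlled, the sketch below generates a Williams-type balanced Latin square, in which each condition appears once in every serial position and, for an even number of conditions, follows every other condition exactly once. This is a generic design technique offered as an example; it is not claimed to be the method used in any included study, and the condition labels are hypothetical.

```python
def williams_orders(conditions):
    """Return one presentation order per row of a Williams design.

    Assumes an even number of conditions; for odd counts the design
    needs a mirrored second square, which this sketch does not build.
    """
    n = len(conditions)
    if n < 2 or n % 2 != 0:
        raise ValueError("this sketch supports an even number of conditions")

    # First row as indices: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1

    # Every further row shifts the first row cyclically by one condition.
    return [[conditions[(idx + shift) % n] for idx in first]
            for shift in range(n)]


if __name__ == "__main__":
    # Hypothetical condition labels; participants are assigned to rows in turn.
    for row in williams_orders(["A", "B", "C", "D"]):
        print(" -> ".join(row))
```

With four conditions, assigning participants to the four rows in rotation balances both serial position and first-order carryover, provided the sample size is a multiple of four.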

Procedural controls related to adaptation and familiarization showed a similar pattern. Although most studies indicated that participants completed some form of pre-exposure or practice drive, explicit methods to assess whether drivers had reached a stable level of familiarization were rarely described. This is consequential because insufficient adaptation can bias early vehicle-control and visual-attention metrics, particularly in protocols aiming to detect fatigue-related decrements in lane keeping, steering stability, or gaze behavior.

A cross-cutting reporting limitation concerned motion sickness and outlier handling. Only a minority of studies reported assessing simulator sickness, and even when mentioned, the assessment method was typically not described. Similarly, outlier identification and handling were not reported. While this does not imply that sickness or outliers were absent, the lack of transparency limits the ability to judge whether participant discomfort, attrition, or extreme observations influenced reported outcomes.

Overall, the evidence base combines consistent use of statistical analyses and generally appropriate simulator apparatus with recurring limitations in experimental control and reporting. Sparse randomization and counterbalancing, limited familiarization assessment, and minimal documentation of motion sickness and outlier procedures reduce confidence that reported effects can be attributed solely to the intended manipulations and constrain cross-study comparability of fatigue-induction protocols.

3.4. Scenario Parameters Used to Induce Fatigue (RQ1)

To answer RQ1, scenario parameters were synthesized across the 23 included studies (see Table A5), focusing on how simulator protocols were designed to elicit fatigue in truck drivers. Across studies, fatigue induction relied primarily on a combination of time on task, low-stimulation driving conditions, and, in a smaller subset, timing relative to circadian vulnerability or prior shift work. The main scenario parameters reported in the evidence base are summarized below.

3.4.1. Driving Duration and Exposure Structure

Driving duration and exposure structure varied markedly across studies, ranging from brief single-trial protocols to multi-hour repeated-run designs. At one end, some experiments relied on short, single-session drives intended to elicit detectable fatigue-related change within a constrained exposure window (e.g., [17]). At the other end, several protocols accumulated time-on-task over multiple hours through repeated runs and scheduled rest periods, reflecting designs aimed at sustained fatigue development and/or longer observation windows for modeling, intervention testing, or exposure characterization (e.g., [24,25,26,27,28,30,32]).

Most studies clustered around approximately one hour of total simulator exposure, implemented either as continuous monotonous driving or as multiple shorter runs interleaved with breaks and questionnaires (e.g., [12,13,14,18,19,20,21,22]). These designs typically balance the need for monotony-driven state change against feasibility constraints, participant tolerance, and the risk of confounding early adaptation effects. A smaller subset used intermediate structures, such as longer continuous runs or repeated segments across visits, which can strengthen within-subject inference by enabling repeated measurements but also introduce carryover and recovery effects that must be managed analytically (e.g., [31,33,34]). One study adopted a hybrid structure combining simulator segments with an extended on-road component, complicating attribution of fatigue-related change to simulator exposure alone [29].

Overall, time-on-task was operationalized through three dominant exposure logics: short single-trial drives, session-length protocols around one hour, and extended multi-hour schedules based on repeated runs and rest periods. This heterogeneity is consequential for interpretation because fatigue induction is sensitive not only to total duration but also to exposure continuity, break timing, and task segmentation. Continuous monotonous driving is more likely to produce progressive vigilance decline through sustained underload, whereas repeated-block designs may intermittently restore alertness, shift the contribution of learning and habituation, and yield different fatigue trajectories even when nominal total driving time is comparable.
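
The distinction between nominal total driving time and exposure continuity can be made concrete by describing a protocol as a sequence of drive and break blocks. The sketch below is illustrative only: the `Block` type, the `summarize` helper, and both example schedules are hypothetical and do not reproduce any included protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    kind: str       # "drive" or "break"
    minutes: float  # block length in minutes


def summarize(schedule):
    """Return total driving time, longest uninterrupted drive, and break count."""
    total = 0.0
    longest = 0.0
    run = 0.0
    breaks = 0
    for block in schedule:
        if block.kind == "drive":
            total += block.minutes
            run += block.minutes
            longest = max(longest, run)
        else:
            breaks += 1
            run = 0.0  # a break interrupts the continuous exposure
    return {"total_drive_min": total,
            "longest_continuous_min": longest,
            "n_breaks": breaks}


# Two hypothetical schedules with the same nominal driving time.
continuous = [Block("drive", 60)]
blocked = [Block("drive", 20), Block("break", 5),
           Block("drive", 20), Block("break", 5),
           Block("drive", 20)]

if __name__ == "__main__":
    print(summarize(continuous))
    print(summarize(blocked))
```

Both schedules total 60 min of driving, but the longest uninterrupted exposure is 60 min in the first and 20 min in the second, which is the kind of difference that total duration alone does not convey.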

3.4.2. Road Type and Road Geometry

Across the included studies, road environments were generally selected to maintain stable longitudinal and lateral control demands, thereby supporting prolonged, low-variability driving exposures. Highway-oriented contexts were common, either as the primary setting or embedded within mixed highway–rural or highway–urban designs, indicating a prevailing preference for monotonous, speed-stable driving that facilitates fatigue accumulation and simplifies interpretation of fatigue-related control changes [12,14,15,17,18,20,22,25,29,34]. Rural-only scenarios were also frequently used, often to combine low traffic complexity with sustained underload while allowing limited geometric variation through straight segments and occasional curves [21,23,26,27,28]. Purely urban contexts were rare, appearing primarily as a controlled, traffic-free monotonous route rather than a complex urban traffic environment [19]. Road type was not reported in four studies, limiting classification of scenario context in a non-trivial portion of the evidence base [16,24,30,31].

When road geometry was described, it typically reflected designs that intentionally constrained variability through long straight segments, repetitive patterns, or large-radius curvature. Several studies used explicitly monotonous alignments (e.g., extended straight rural segments, or predominantly straight segments with limited events), consistent with protocols that aim to drive vigilance decrement via sustained underload rather than discrete hazard response [21,23,26]. A smaller subset incorporated structured curvature to parameterize lateral-demand variability while retaining predictability, such as controlled loop designs with large circular radii or defined winding-versus-straight comparisons intended to test how alignment modulates fatigue development [18,34]. Notably, some protocols embedded distinctive geometric features as sustained stimuli (e.g., a long tunnel) or varied route types across runs, which may introduce additional arousal or workload components beyond monotony exposure alone [21,25].

Overall, the road-type selection and geometry descriptions suggest that fatigue induction in truck-driving simulators is most commonly operationalized through environments engineered for monotony and steady-state control, rather than through complex, high-event driving. However, geometric reporting remains uneven: road geometry was not reported or was insufficiently specified in a substantial subset of studies, and several papers provided road-type labels without alignment detail. These reporting gaps limit direct comparison of alignment repetitiveness, curvature demand, and ecological context as fatigue-induction parameters, and they constrain reproducibility when road design is a central experimental lever.

3.4.3. Monotony, Traffic Density, and Event Structure

Monotony was a pervasive design intent, but it was operationalized and reported with uneven specificity. Just over half of the studies explicitly rated monotony level, most commonly as high, whereas the remainder either implied low-demand driving through road choice (e.g., highway or straight rural segments) or did not report monotony in comparable terms. Where monotony was treated as a primary induction lever, protocols relied on sustained, low-variability driving with minimal stimulus change and limited task switching (e.g., [18,19,21]). Overall, this pattern suggests that monotony was frequently embedded implicitly in scenario selection rather than standardized as a quantified parameter, which limits between-study comparability of underload intensity.

Traffic density reporting was even more limited. Fewer than half of the studies characterized traffic density in any structured way, and when it was reported it was typically qualitative rather than parameterized (e.g., staged light versus heavy traffic, sometimes combined with reduced visibility) [22]. Several protocols deliberately reduced or removed ambient traffic to avoid confounding fatigue-related control changes with interaction demands (e.g., [18,19]), while others included surrounding vehicles or lead-vehicle interactions without providing density metrics that would allow replication or cross-study benchmarking.

Event structure was more commonly described than monotony or traffic density, but it tended to be sparse and tightly controlled. Most studies that reported events used a small set of repeatable interactions—most often lead-vehicle deceleration episodes and isolated conflict triggers—rather than frequent hazards distributed throughout the drive (e.g., [17,21]). A smaller subset implemented more complex event schedules, including forward-collision scenarios with masking and filler events [20], or obstacle-avoidance triggers [18]. Taken together, the evidence indicates that fatigue induction protocols in truck simulators typically prioritize steady-state driving to support fatigue accumulation and signal stability, and use events sparingly, primarily as probes of performance under fatigue rather than as continuous workload drivers. This design choice strengthens internal interpretation by reducing confounding from fluctuating workload, but it also biases the evidence toward fatigue patterns observed in low-event, monotonous driving, limiting inference about fatigue responses under event-dense or highly interactive traffic conditions.

3.4.4. Environmental Conditions, Time-of-Day, and Local Time Scheduling

Time of day and local scheduling were inconsistently reported, but when specified they indicate that some protocols attempted to align simulator exposure with circadian low-alertness windows. Several studies scheduled night driving across late evening to early morning intervals that plausibly encompass the circadian nadir, often paired with darkness or night-time lighting settings [15,30,33]. In addition, some protocols implemented repeated runs across day and night, ensuring that at least part of the exposure occurred during early-morning low-alertness periods, even when circadian timing was not the primary experimental manipulation [28].

A second approach relied on increasing homeostatic sleep pressure rather than targeting clock time directly. For example, Yu et al. [18] scheduled driving after a night shift with a 09:30 a.m. start time, which likely elevates sleepiness due to extended wakefulness even though circadian alertness is typically rising during the morning. In contrast, other studies reported broad daytime testing windows, such as 08:00 a.m.–09:15 p.m., under constant lighting, which limits inference about circadian phase effects because participants may be tested at substantially different biological times [17]. Afternoon testing was occasionally mentioned without precise start times, preventing assessment of alignment with the post-lunch dip [19]. Overall, local start times were only reported in a minority of studies, constraining cross-study comparison of circadian alignment.
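
When local start and end times are reported, alignment with a low-alertness window can be checked mechanically. The sketch below computes how many minutes of a session fall inside a clock-time window, allowing either interval to cross midnight. The 02:00 to 06:00 window is an assumption chosen for illustration, not a value taken from the included studies, and the session end times in the examples are hypothetical apart from the 08:00 a.m. to 09:15 p.m. testing window and the 09:30 a.m. start reported above.

```python
DAY = 24 * 60  # minutes per day


def to_minutes(hhmm):
    """Convert an 'HH:MM' 24-hour clock string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def overlap_minutes(start, end, window_start, window_end):
    """Minutes of the session [start, end) that fall inside the window.

    Sessions and windows are assumed to be shorter than 24 h; an end time
    at or before the start time is read as falling on the next day.
    """
    s, e = to_minutes(start), to_minutes(end)
    w0, w1 = to_minutes(window_start), to_minutes(window_end)
    if e <= s:
        e += DAY
    if w1 <= w0:
        w1 += DAY
    total = 0
    # Compare against the window on the previous, same, and next day.
    for shift in (-DAY, 0, DAY):
        lo = max(s, w0 + shift)
        hi = min(e, w1 + shift)
        total += max(0, hi - lo)
    return total


# Assumed low-alertness window (illustrative, not from the review).
NADIR = ("02:00", "06:00")

if __name__ == "__main__":
    print(overlap_minutes("22:00", "04:00", *NADIR))  # hypothetical night session
    print(overlap_minutes("09:30", "10:30", *NADIR))  # 09:30 start, assumed 1 h drive
    print(overlap_minutes("08:00", "21:15", *NADIR))  # broad daytime testing window
```

Under the assumed window, the hypothetical night session overlaps it for 120 min, while the morning and daytime examples do not overlap it at all, consistent with the point that post-shift morning testing raises sleep pressure through extended wakefulness rather than through circadian timing.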

Environmental control beyond lighting was rare. Only one study explicitly manipulated weather, introducing fog as part of a combined traffic–visibility condition sequence [22]. Taken together, the evidence indicates that circadian alignment is sometimes used as a fatigue-relevant control—primarily through night-time scheduling and low-light settings—but inconsistent reporting of start times, lighting levels, and sleep–wake context limits reproducibility and weakens interpretation of circadian contributions to fatigue trajectories.

3.4.5. Sleep Manipulation and Fatigue Priming Outside the Simulator

Sleep manipulation and fatigue priming outside the simulator were incorporated in a minority of protocols, whereas most studies relied primarily on in-simulator exposure to elicit fatigue-related change. Two studies leveraged naturalistic fatigue by scheduling simulator driving immediately after participants completed a night shift, thereby increasing prior wakefulness without imposing laboratory sleep restriction [14,18]. A second group implemented explicit sleep loss manipulations, typically as partial sleep deprivation on the preceding night, operationalized as restricted sleep opportunities of approximately four to five hours [12,13,27]. More extreme priming was uncommon but present, including total sleep deprivation protocols based on continuous wakefulness [32] and sustained wakefulness extending beyond a full circadian cycle [28]. Two studies combined sleep restriction with nap-related conditions, using scheduled naps as an experimental countermeasure within a restricted-sleep framework [24,27].

Beyond sleep loss, a small subset reported standardization measures intended to reduce between-participant variability in baseline sleepiness, such as requiring a minimum sleep duration across nights prior to testing and restricting alcohol or stimulants [33], or providing sleep-related instructions for night sessions without a clearly parameterized deprivation protocol [15]. Several studies implicitly engaged circadian and workload-related priming through night-session designs or pre-drive work tasks, but without explicit reporting of sleep–wake manipulation parameters, limiting interpretability of how much fatigue was attributable to prior sleep loss versus in-simulator time-on-task [25,30,31].

Overall, the evidence indicates that out-of-simulator priming was used selectively and heterogeneously, with most protocols treating fatigue induction as an emergent consequence of monotonous driving exposure rather than as a combined circadian–homeostatic manipulation. This has implications for comparability, because studies that enter the simulator with elevated sleep pressure may exhibit earlier and steeper fatigue trajectories than those relying solely on in-simulator monotony, even under similar scenario parameters.

3.4.6. Secondary Tasks and Cognitive Demand Additions

Across the 23 included studies, secondary tasks and added cognitive demands were used selectively rather than as a standard component of fatigue induction protocols. Most studies either reported no secondary task beyond the driving task itself or framed the protocol around sustained steady-state driving with minimal concurrent demands, consistent with an underload approach to fatigue development.

When secondary tasks were implemented, they followed three recurrent logics. First, several protocols used continuous or intermittent vigilance probes during driving to quantify attention under fatigue, including a Peripheral Detection Task (PDT) [22], divided-attention button-response tasks linked to visual changes [34], light-cancellation vigilance and communication prompts during the drive [30], and embedded response tasks such as pedestrian horn responses or mirror-target detection [27]. These additions increase measurement sensitivity for vigilance decline, but they also introduce an additional task load that may alter fatigue trajectories by elevating arousal and stabilizing control behavior relative to a purely monotonous drive.

Second, a smaller subset used cognitive activation tasks as countermeasures or as structured demand additions, such as choice reaction time, working memory, and trivia tasks applied during a late drive segment, or music as a sustained stimulation condition [23]. These designs shift the protocol from pure induction toward experimentally probing how added stimulation modifies fatigue progression, and they should be interpreted as mixed induction–intervention paradigms rather than monotony-only exposure.

Third, some studies applied cognitive tasks outside the driving segment as priming or assessment components, including pre/post reaction-time testing [31], stress-induction arithmetic tasks in a separate phase preceding the main fatigue drive [16], or break-period tracking tasks rather than in-drive dual-tasking [26]. These elements can influence baseline state and recovery dynamics, but they do not directly increase in-drive cognitive demand.

Overall, the evidence indicates that most truck-simulator fatigue protocols prioritize low-demand driving to promote monotony-driven fatigue, while secondary tasks are mainly introduced either to sensitively probe vigilance degradation or to test stimulation-based countermeasures. This variability matters because concurrent task load can act in opposite directions—improving detectability of fatigue-related attentional lapses while simultaneously attenuating monotony and delaying fatigue onset—thereby reducing direct comparability of fatigue trajectories across studies that differ in whether, when, and how secondary tasks were applied.

3.4.7. Summary of Scenario Parameters Identified

Across the included simulator protocols, fatigue induction was primarily operationalized through sustained time-on-task under low-stimulation driving conditions. Most studies relied on steady-state driving exposures, implemented either as continuous monotonous segments or as repeated runs separated by breaks. Exposure structure varied substantially, indicating that fatigue-related change depended not only on total duration but also on continuity, break timing, and task segmentation. Continuous monotonous driving was repeatedly positioned as a mechanism to elicit progressive vigilance decline through sustained underload, whereas repeated-block designs introduced recovery opportunities and additional learning or habituation dynamics.

Road environments were generally selected to minimize variability in longitudinal and lateral control demands. Highway contexts were common, rural-only scenarios were also frequent, and urban settings were rare and typically traffic-free. When geometry was described, alignments generally emphasized long straight segments, repetitive patterns, or large-radius curvature, with only a small subset using structured curvature contrasts or distinctive features such as tunnels. However, road type and geometry were not consistently specified, limiting direct comparison of alignment repetitiveness and curvature demand as fatigue-induction parameters.

Monotony was a pervasive design intent but was unevenly reported and rarely standardized as a quantified parameter. Traffic density was inconsistently characterized and, when reported, was usually qualitative; several protocols reduced or removed traffic to avoid confounding interaction demands. Events were typically sparse and tightly controlled, often centered on lead-vehicle interactions, suggesting that events were used mainly as probes of performance under fatigue rather than as sustained workload drivers. This supports internal interpretation but biases inference toward low-event, monotonous driving.

Circadian timing and environmental conditions were occasionally used as fatigue-relevant controls, most often through night-session scheduling and low-light settings, or through post-nightshift testing to increase sleep pressure. Yet start times, lighting parameters, and sleep–wake context were often underreported, constraining interpretation of circadian contributions. Weather manipulation was rare. Sleep manipulation outside the simulator and secondary tasks were also used selectively, with most protocols relying on in-simulator exposure and avoiding added cognitive demands, while a minority implemented sleep restriction, sustained wakefulness, naps, or vigilance probes and stimulation-based additions. Overall, the evidence base reflects a shared reliance on time-on-task and low stimulation, but heterogeneous implementation and incomplete reporting of key parameters limit replication and cross-study comparability.
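
The parameters summarized in this section can be arranged as a structured reporting record, which makes unreported items explicit. The sketch below is one possible layout, assuming hypothetical field names; it is not a validated reporting standard, and the example entry does not describe any included study.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ScenarioReport:
    """Scenario parameters for one protocol; None means 'not reported'."""
    total_drive_min: Optional[float] = None
    exposure_structure: Optional[str] = None   # e.g., "continuous", "repeated runs"
    road_type: Optional[str] = None            # e.g., "highway", "rural"
    road_geometry: Optional[str] = None
    monotony_level: Optional[str] = None
    traffic_density: Optional[str] = None
    event_structure: Optional[str] = None
    start_time_local: Optional[str] = None     # "HH:MM"
    lighting: Optional[str] = None
    weather: Optional[str] = None
    prior_sleep_protocol: Optional[str] = None
    secondary_task: Optional[str] = None

    def missing(self):
        """Names of the fields that were left unreported."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def completeness(self):
        """Share of fields that carry a reported value (0.0 to 1.0)."""
        n_fields = len(fields(self))
        return (n_fields - len(self.missing())) / n_fields


if __name__ == "__main__":
    # Hypothetical protocol description used only to exercise the record.
    example = ScenarioReport(total_drive_min=60,
                             exposure_structure="continuous",
                             road_type="highway")
    print(example.missing())
    print(example.completeness())
```

A record of this kind could be filled in during data extraction so that gaps such as missing start times or traffic density are counted consistently across studies.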

3.5. Scenario Parameters Associated with Measurable Fatigue-Related Outcomes (RQ2)

Across the twenty-three included studies, most protocols reported evidence consistent with an increase in fatigue during the simulator procedure, based on within-session change in subjective, behavioral, physiological, or driving-performance indicators. Evidence of measurable fatigue-related change was reported in twenty studies, whereas three studies did not report time-linked fatigue evolution within the protocol, either because fatigue was treated primarily as an inferred label for modeling or because fatigue-related results were not presented in the available report [16,17,20].

The clearest evidence of interpretable fatigue-related change emerged when sustained low-variability exposure was paired with repeated measurement during the drive. Studies that collected repeated subjective ratings, most commonly the Karolinska Sleepiness Scale (KSS), and combined these with behavioral or physiological monitoring were able to demonstrate progressive state change over the session and, in several cases, differential effects across timing or condition contrasts such as night driving, automation configuration, or visibility and traffic manipulations (e.g., [22,31]). This pattern indicates that fatigue-related change was more readily identified when fatigue was operationalized as a trajectory rather than as a single end-state.

A second consistent pathway involved physiological and attention-related markers that are sensitive to short-term state variation, particularly eye-closure dynamics and related ocular measures, and, in a smaller subset, EEG-based indices and reaction-time measures to defined probes or events (e.g., [14,19,21]). In these studies, measurable fatigue-related change was supported by systematic shifts in these markers between lower- and higher-fatigue periods within the protocol, even when traditional vehicle-control metrics were less informative or were not the primary outcome.

Where fatigue-related change was evaluated through driving behavior and vehicle-control metrics, evidence typically reflected increased variability or altered control patterns under higher-sleepiness segments or fatigue-labelled windows, often anchored by subjective thresholds or concurrent physiological corroboration (e.g., [18,21]). However, some studies prioritized subjective or psychometric fatigue measures over vehicle-control metrics, indicating that drivers may experience and report increasing fatigue even when the specific kinematic indicators recorded in the protocol do not show clear or proportionate deterioration under the same exposure conditions [29]. This dissociation suggests that, under monotonous and low-demand task conditions, drivers may maintain gross control metrics through compensation or low control-demand requirements, while fatigue-related change is expressed more clearly as intermittent attentional instability rather than as monotonic drift in average kinematics.

Finally, studies that incorporated sleep-loss priming or post-shift scheduling more often reported interpretable fatigue-related change within the protocol than studies relying solely on in-simulator exposure, although reporting heterogeneity limits direct comparison of trajectories across designs. Overall, the evidence indicates that scenario parameters were more consistently associated with measurable fatigue-related change when protocols combined sustained low-stimulation exposure with repeated assessment using indicators capable of capturing within-session dynamics.

3.6. Recurring Combinations of Scenario Parameters Associated with Interpretable Fatigue-Related Change (RQ3)

Across the studies that reported measurable fatigue-related change, this pattern was rarely attributable to a single variable. Instead, the protocols most consistently associated with interpretable fatigue-related change combined parameters that jointly increased time-on-task while keeping driving demands stable and stimulation low, often reinforced by elevated baseline sleep pressure through scheduling or sleep manipulation.

The most recurrent combination involved sustained exposure in a low-variability driving context, typically implemented as continuous monotonous driving or repeated blocks with limited task switching. When described, these protocols relied on steady-state environments designed to minimize external variability, such as traffic-free or low-interaction settings and repetitive route structure, which support progressive vigilance decline under underload conditions (e.g., [18,19,21]). In this configuration, events were generally sparse and tightly controlled, functioning primarily as probes of performance under fatigue rather than as sustained workload drivers, thereby reducing confounding from fluctuating task demand (e.g., [17,20,22]).

A second recurring combination strengthened the low-stimulation exposure package by elevating sleep pressure before or during simulator driving. This was achieved through partial sleep-restriction protocols, post-nightshift scheduling, or night-time testing windows that plausibly overlapped circadian vulnerability, frequently paired with low-light or darkness settings [12,13,14,18,27,28,32,33]. Across these designs, fatigue-related change was more readily observable within the protocol because participants entered the exposure with higher baseline sleepiness or accumulated sleep pressure more rapidly under sustained monotony.

A third recurring combination involved stable road demand with reduced active control requirements, most clearly represented by automation or platooning configurations implemented during prolonged highway exposure [22]. In these protocols, reduced manual control demand likely amplified monotony and promoted sleepiness progression during the drive. However, the evidence for this pattern remains limited to a small subset of studies and should therefore be interpreted cautiously.

Taken together, these findings suggest that recurring fatigue-induction logic in truck-driving simulators depends less on any single scenario parameter than on the convergence of prolonged exposure, low environmental variability, and, in some protocols, elevated prior sleep pressure. At the same time, incomplete reporting of critical design features, including baseline sleep–wake history, traffic configuration, event density, and timing controls, limits the extent to which these combinations can be translated into standardized protocol prescriptions.

3.7. Methodological Limitations and Research Gaps in Simulator-Based Fatigue Induction Studies (RQ4)

Across the included studies, the main methodological limitations reflect a recurring tension between experimental feasibility and the controls required to make fatigue induction protocols comparable, reproducible, and transferable to operational trucking. In practice, many protocols report measurable fatigue-related change, but the evidence base is constrained by weak control of baseline state, sequencing confounds, inconsistent tolerance management, and uneven reporting of scenario parameters that are central to replication.

A first limitation concerns inadequate control and reporting of baseline sleep pressure and circadian alignment. Several protocols implicitly relied on circadian vulnerability, post-shift timing, or sleep restriction, yet often without sufficient documentation of sleep history, stimulant intake, or biological timing to distinguish whether observed changes reflect the scenario exposure, prior wakefulness, circadian phase, or their interaction. This is not a minor reporting issue: fatigue is strongly driven by homeostatic and circadian processes, and without explicit baseline-state characterization, comparisons across studies risk conflating differences in scenario design with differences in participants’ starting state. The consequence is reduced interpretability of why fatigue emerged earlier in some protocols than others, even when nominal exposure duration appears similar.

A second limitation is confounding of scenario effects with time-on-task due to fixed sequencing and incomplete counterbalancing. Many studies used progressive condition escalation, fixed segment orders, or repeated runs across conditions. Under these designs, apparent effects of traffic, automation, interaction style, or route features can be inseparable from fatigue accumulation, learning, habituation, or recovery. This is particularly problematic when conclusions are framed causally around specific scenario variables, because order effects can mimic scenario effects in sustained driving tasks where state changes accumulate continuously. In this sense, some protocols function more as demonstrations of fatigue emergence under a particular schedule than as controlled tests of which scenario parameters are responsible.

A third limitation involves adaptation procedures and simulator tolerance. Familiarization drives were commonly included, but methods to confirm that drivers had stabilized their control behavior before baseline recording were often unclear. Simulator sickness was rarely reported with enough transparency to evaluate whether discomfort influenced performance, exposure duration, or participant exclusion. These omissions matter because tolerance effects can bias samples toward participants who are less susceptible to simulator discomfort, and sickness-related discomfort can itself affect attention, steering variability, and self-reported state, thereby contaminating fatigue outcomes. Without systematic reporting and handling, simulator tolerance becomes an unmeasured source of bias.

A fourth limitation is constrained representativeness and incomplete demographic resolution. Samples were frequently small and often male-dominated, with uneven reporting of age and experience. These features limit generalizability, but the more fundamental issue is that fatigue expression and compensation strategies may differ across driver experience, age, and individual susceptibility. When protocols aim to support modelling, threshold definition, or intervention conclusions, limited demographic resolution restricts confidence that observed parameter–outcome relationships will transfer to broader truck-driver populations.

A fifth limitation is heterogeneity in fatigue operationalization and outcome selection, combined with incomplete reporting of scenario descriptors. Some studies tracked fatigue trajectories with repeated assessment, whereas others treated fatigue primarily as a label for modelling or did not present within-session evolution, limiting the ability to judge whether fatigue was induced progressively under the reported exposure. In parallel, core scenario elements—monotony level, traffic density, event frequency, road geometry detail, lighting, and local scheduling—were often described qualitatively or omitted. This prevents replication and complicates synthesis because two protocols labelled as “monotonous highway driving” may differ materially in interaction demand, stimulus rate, and visual environment, which are precisely the factors that modulate vigilance decline.

These limitations map directly onto research gaps. The dominant gap is the absence of harmonized, fatigue-relevant reporting standards that link baseline state, scenario parameterization, and measurement schedules in a way that enables cross-laboratory comparison. A second gap is limited external validation, both in terms of mapping simulator fatigue trajectories to on-road trucking and in establishing that scenario-driven markers and thresholds remain stable across simulator platforms and operational contexts. Finally, while some studies move toward evaluating countermeasures and system interactions under fatigue-relevant exposure, the literature remains fragmented; without stronger standardization and experimental control, it is difficult to isolate which parameter combinations are robust drivers of fatigue versus artefacts of scheduling, sequence structure, or unreported baseline-state variation.

4. Discussion

This section interprets the review findings in relation to the broader evidence on fatigue induction in truck driving simulators (Section 4.1), with emphasis on how scenario design choices shape the likelihood of eliciting measurable fatigue change. It then examines the main strengths and limitations of both the included studies and the review methods (Section 4.2), highlighting factors that constrain comparability and causal attribution. Finally, it outlines practical implications for simulator protocol design and standardization and identifies priorities to guide future research toward more reproducible and externally meaningful fatigue induction paradigms (Section 4.3).

4.1. Key Findings

This review synthesized four decades of simulator-based fatigue-induction studies in truck-driving contexts and identified a small, methodologically heterogeneous evidence base that has accumulated gradually and remains concentrated in a limited set of national research hubs. This long temporal span broadens the historical coverage of the field, but it also means that the included studies were conducted under substantially different simulator technologies, design conventions, and reporting standards. As a result, the literature likely under-represents the diversity of contemporary simulator platforms, regulatory contexts, and operational trucking conditions, while also limiting direct comparability across older and newer protocols.

Across studies, fatigue induction was rarely an end in itself. Instead, protocols used fatigue as a methodological lever to support three recurring aims: validating indicators and modelling approaches, testing systems or countermeasures under fatigue-relevant exposure, and comparing fatigue development across driver groups and scheduling conditions. This dependence of protocol design on study objective explains much of the observed protocol variability and limits direct comparability across studies.

The dominant induction logic relied on sustained time-on-task under low-stimulation, low-variability driving. Most scenarios were engineered for steady-state control demands, commonly using highway or rural contexts with sparse, tightly controlled events and limited interaction demand. Where road geometry was described, it typically favored long straight segments, repetitive structure, or large-radius curvature. However, monotony, traffic density, event frequency, and geometry were often reported qualitatively or omitted, constraining reproducibility and preventing benchmarking of underload intensity across protocols.

Most studies reported measurable fatigue-related change during simulator exposure, but interpretability depended on whether within-session evolution was tracked and on the sensitivity of the selected indicators. Repeated subjective ratings, ocular measures, EEG-based indices, and reaction-time probes more consistently captured progressive fatigue-related change in steady-state driving, whereas gross kinematic deterioration was less consistently observed. This supports the interpretation that, under task underload, fatigue may emerge primarily as attentional instability and state-marker change before producing clear shifts in mean vehicle-control metrics.

Protocols that elevated baseline sleep pressure through night-time scheduling, post-nightshift testing, or explicit sleep-loss manipulation more consistently produced interpretable fatigue-related change within feasible session durations. Yet circadian timing, local start times, lighting conditions, and sleep–wake context were inconsistently reported, limiting inference about circadian contributions and weakening cross-study comparison.

Key methodological limitations constrain the field: incomplete sequencing control and counterbalancing, inconsistent handling and reporting of adaptation and simulator tolerance, small and demographically narrow samples, and heterogeneous operationalization of fatigue outcomes. Overall, the evidence converges on a practical fatigue-induction principle—sustained low-stimulation exposure, often strengthened by elevated sleep pressure—but lacks the experimental control and reporting completeness needed for robust protocol benchmarking, replication, and external validity.

4.2. Strengths and Limitations

A main strength of this review is its parameter-driven synthesis. Rather than treating simulator studies as interchangeable, it analyses fatigue induction through exposure logic and scenario variables, linking protocol design to each study’s objective and reported outcomes. A second strength is the explicit distinction between what protocols likely implemented and what was actually reported, which is particularly important in a literature where missing methodological detail remains a major barrier to replication. The structured risk-of-bias appraisal further strengthens the review by identifying limitations in sequencing control, familiarization, and tolerance reporting as concrete threats to internal validity and cross-study comparability.

At the same time, this review has methodological limitations that should be acknowledged. Title and abstract screening, full-text eligibility assessment, and data extraction were conducted by a single reviewer. Although predefined eligibility criteria, standardized extraction fields, and manual verification against the original articles were used to improve consistency, the absence of duplicate independent screening and extraction may have increased the risk of selection or extraction error. In addition, ChatGPT was used as an auxiliary tool to support the preliminary identification of relevant passages and the organization of information into predefined extraction fields, and all retained information was manually checked against the source texts before inclusion. This procedure improved transparency and efficiency, but it does not eliminate the broader limitation associated with a single-reviewer workflow.

The review’s conclusions are also constrained by the underlying evidence base. Core descriptors such as monotony level, traffic density, event frequency, road geometry, lighting and start time, and sleep–wake context were frequently under-specified, limiting the ability to benchmark underload intensity or derive robust parameter thresholds. Substantial heterogeneity in study objectives, exposure structure, baseline sleep-pressure manipulation, simulator configurations, and fatigue operationalization also precluded quantitative synthesis and weakened causal interpretation of recurring parameter combinations. In several studies, fatigue was treated primarily as a modelling label, or its longitudinal development was not reported, which restricted inference regarding induction dynamics.

An additional limitation is the temporal breadth of the evidence base. By including studies across four decades and applying no publication-year restriction, the review sought to maximize completeness in a small and fragmented field. However, older simulator studies were conducted under markedly different technological, methodological, and reporting conditions. This broader temporal coverage is informative for understanding the historical development of fatigue-induction logic, but it also increases heterogeneity and limits direct comparability with more recent studies. As a result, confidence in the conclusions is stronger at the level of recurring design principles than at the level of specific parameter thresholds or protocol prescriptions.

Generalizability is further limited by variable population definitions and incomplete demographic reporting, together with small and often male-dominated samples. Restricting inclusion to English-language, peer-reviewed sources improved methodological consistency, but it may also have excluded relevant non-journal or non-English simulator protocols, particularly those developed in industry settings or reported in earlier research programs.

Overall, this review provides a structured account of how fatigue induction has been implemented in truck-driving simulators and why direct protocol benchmarking remains difficult. The synthesis supports convergent design principles, particularly sustained time-on-task under low-stimulation conditions and, in some protocols, reinforcement through elevated sleep pressure. However, confidence in these conclusions is bounded by uneven reporting, methodological variability, the temporal heterogeneity of the evidence base, and the limitations of the review workflow itself. Accordingly, the review can more reliably identify recurring exposure logics than specify transferable parameter thresholds or rank protocol elements by effectiveness. These constraints also point to a practical implication: progress in this area is currently limited less by the absence of plausible induction mechanisms than by insufficient standardization and reporting transparency to enable cumulative comparison, replication, and external validation.

4.3. Future Research

The results point toward a clear direction for future truck-simulator fatigue research: moving from loosely specified protocols toward reproducible scenario configurations supported by transparent reporting of baseline state, circadian timing, adaptation, and sequencing. A priority is a minimal reporting standard that documents sleep–wake context and stimulant control, local start time and lighting, exposure structure and break rules, familiarization and tolerance procedures, and the use or justification of randomization and counterbalancing. The same standard should also require the explicit reporting of scenario descriptors that remain repeatedly underreported in the literature, including monotony level, traffic density, event frequency, and road geometry detail, to enable benchmarking across protocols. To support this objective, Table A6 in Appendix A summarizes the minimum scenario and protocol details that future truck-driving simulator studies on fatigue induction should report in order to improve reproducibility, cross-study comparability, and external interpretability.
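As a rough illustration of how the reporting items named above could be checked for completeness during data extraction, the following Python sketch stores them in a simple record and lists the items left unreported. It is a minimal sketch under stated assumptions: the class name, the field names, and the example values are hypothetical, and the record does not reproduce Table A6.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProtocolReport:
    """Hypothetical record of the reporting items named in the text.

    Field names are illustrative and do not reproduce Table A6.
    A value of None means the item was not reported (NR).
    """
    sleep_wake_context: Optional[str] = None
    stimulant_control: Optional[str] = None
    local_start_time: Optional[str] = None
    lighting: Optional[str] = None
    exposure_structure: Optional[str] = None
    break_rules: Optional[str] = None
    familiarization: Optional[str] = None
    tolerance_procedures: Optional[str] = None
    order_control: Optional[str] = None  # randomization/counterbalancing, or its justification
    monotony_level: Optional[str] = None
    traffic_density: Optional[str] = None
    event_frequency: Optional[str] = None
    road_geometry: Optional[str] = None


def missing_items(report: ProtocolReport) -> List[str]:
    """Return the names of items that are absent or blank."""
    missing = []
    for f in fields(report):
        value = getattr(report, f.name)
        if value is None or not value.strip():
            missing.append(f.name)
    return missing


# Made-up example values, not taken from any included study.
example = ProtocolReport(
    local_start_time="22:00",
    lighting="night, low ambient light",
    exposure_structure="single 90 min block",
    road_geometry="straight two-lane highway",
)
print(missing_items(example))
```

A record of this kind only flags whether an item was reported at all; judging whether the reported detail is sufficient for replication would still require manual appraisal.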

Methodologically, future studies should better separate scenario effects from time-on-task and order effects. Many protocols embed fixed sequences or progressive escalation, making traffic, visibility, automation, interaction demand, or route features difficult to distinguish from fatigue accumulation, learning, and recovery. Balanced designs with counterbalanced or randomized condition ordering, or explicit modelling of sequence effects when balancing is infeasible, are needed for more defensible causal attribution. This is particularly important in protocols that combine fatigue-relevant exposure with additional manipulations, because without adequate sequencing control it remains difficult to determine whether observed changes reflect the intended scenario parameter or the broader temporal structure of the experiment.
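One simple way to implement counterbalanced condition ordering is a cyclic Latin square, in which each condition appears once in each serial position. The Python sketch below illustrates the idea under stated assumptions: the function names, condition labels, and participant identifiers are hypothetical and are not drawn from any included study.

```python
from typing import Dict, List, Sequence


def latin_square_orders(conditions: Sequence[str]) -> List[List[str]]:
    """Cyclic Latin square: each condition occupies each position once."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(
    participants: Sequence[str], conditions: Sequence[str]
) -> Dict[str, List[str]]:
    """Cycle participants through the rows of the square."""
    orders = latin_square_orders(conditions)
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}


# Hypothetical condition labels and participant identifiers.
conditions = ["no_traffic", "light_traffic", "platooning"]
participants = [f"P{k:02d}" for k in range(1, 7)]
for pid, order in assign_orders(participants, conditions).items():
    print(pid, order)
```

A cyclic square balances serial position only; it does not balance which condition immediately precedes which, so carryover between adjacent conditions may still need a different design or explicit modelling. Each order is used equally often only when the number of participants is a multiple of the number of conditions.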

Future work should also strengthen population definition and external validity. Small, male-dominated samples and incomplete demographic reporting limit transferability. Studies should standardize eligibility definitions for truck-driving tasks, report age, sex, and experience consistently, and document attrition and tolerance-related exclusions. External validation should be built in through hybrid designs anchored to on-road duty cycles or through direct testing of whether simulator-derived markers and thresholds remain stable in naturalistic trucking contexts. The field would benefit particularly from designs that connect simulator-based fatigue trajectories to operationally meaningful conditions, including shift schedules, overnight driving, and real-world exposure histories.

From an applied standpoint, the review supports a pragmatic implication for validation pipelines: interpretable fatigue progression is most likely when sustained time-on-task is combined with low-variability, low-interaction driving, often reinforced by elevated sleep pressure, and paired with repeated within-session assessment. As the field shifts toward mitigation and interaction concepts, closed-loop interventions should be evaluated under standardized induction conditions using designs that isolate intervention effects from baseline-state variation and sequencing confounds. Progress in this area is therefore likely to depend less on proposing new fatigue-induction mechanisms than on improving the comparability, transparency, and external grounding of the protocols already in use.

5. Conclusions

This systematic review synthesizes how fatigue has been induced in truck-driving simulators and which scenario parameters are most consistently associated with measurable fatigue-related change. The evidence does not support a single, standardized parameter set for fatigue induction. Instead, it points to recurring design principles across protocols: sustained time-on-task under low-stimulation, low-variability driving demands, most often implemented in steady-state highway or rural scenarios with sparse, controlled events. Fatigue-related change is also more readily interpretable when baseline sleep pressure is elevated through night-time scheduling, post-shift timing, or sleep-loss manipulation.

At the same time, translating these common patterns into transferable protocols remains difficult. Key elements are often underreported, including baseline-state controls, local scheduling and lighting, familiarization and tolerance procedures, sequencing, simulator sickness management, and core scenario descriptors such as monotony, traffic density, event frequency, and road geometry. As a result, the literature supports convergent parameter patterns but cannot yet justify reproducible parameter definitions or reliably rank combinations across simulators and study objectives.

Progress therefore depends on minimal reporting requirements and stronger design controls that enable replication, cross-laboratory benchmarking, and external validation, thereby supporting more robust evaluation of fatigue monitoring and mitigation systems in trucking contexts.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app16063057/s1, Document S1: PRISMA 2020 checklist. Reference [7] is cited in the supplementary materials.

Author Contributions

Conceptualization, T.F. and S.F.; methodology, T.F.; validation, S.F.; formal analysis, T.F.; investigation, T.F.; resources, S.F.; data curation, T.F.; writing—original draft preparation, T.F.; writing—review and editing, S.F.; visualization, T.F. and S.F.; supervision, S.F.; project administration, T.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by UID/04427 of the CITTA—Research Center for Territory, Transports and Environment, funded by Fundação para a Ciência e a Tecnologia, I.P./MECI through national funds. This work was also financed by the FEDER funds through COMPETE2030 and with financial support of FCT, I.P., within the framework of the project Tec4Safe—Tailored Technology and Innovation for Driving Safely, with nº 15365, and the operation code at the Funds Platform COMPETE2030-FEDER-00904100 and FCT code 2023.16094.ICDT. In addition, this work was financed by the project “URBAN-NET: Intelligent Networks for Carbon Neutral and Sustainable Cities”, with the reference NORTE2030-FEDER-02697300, co-financed by the European Union, through the NORTE 2030 Regional Program of Portugal, 2030.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to acknowledge the Doctoral Program in Occupational Safety and Health at the University of Porto for providing access to digital library resources, which enabled the retrieval of the studies included in this review. During the preparation of this review, the authors used ChatGPT as an auxiliary tool to support the preliminary identification and organization of information during data extraction and to assist in text drafting. All outputs were subsequently reviewed, edited, and verified by the authors against the source materials. The authors take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ADAS	Advanced driver assistance system
ANN	Artificial neural network
BMI	Body mass index
DOF	Degrees of freedom
EEG	Electroencephalogram
EOG	Electrooculography
FOV	Field of view
HMI	Human–machine interface
KSS	Karolinska Sleepiness Scale
LDW	Lane departure warning
LSTM	Long short-term memory
N	No
NA	Not applicable
NHLBI	National Heart, Lung, and Blood Institute
NR	Not reported
PDT	Peripheral Detection Task
PERCLOS	Percentage of eyelid closure over the pupil over time
PRISMA	Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PROSPERO	International Prospective Register of Systematic Reviews
PVT	Psychomotor vigilance task
RQ	Research question
SD	Standard deviation
UFOV	Useful field of view
Y	Yes

Appendix A

This appendix compiles the tables that support the qualitative synthesis and ensure transparent reporting. Table A1 provides an overview of the included studies, summarizing author and year, institution and country, study objective, and main findings. Table A2 reports participant characteristics, including sample size, sex distribution, age presented as mean and standard deviation, and the inclusion and exclusion criteria used in each study. Table A3 describes the simulator setups, covering simulator type, whether a truck cabin was simulated, horizontal field of view, and degrees of freedom. Table A4 summarizes the experiment assessment results, including item-level judgements on simulator suitability, scenario and geometry counterbalancing, use of statistical analysis, sample representativeness, practical trial execution, familiarization procedures, motion sickness assessment and reporting, and outlier assessment and reporting. Table A5 consolidates protocol characteristics relevant to fatigue induction, including driving block duration, block structure and breaks, road type, and sleep manipulation protocol. Finally, Table A6 provides a minimum reporting framework for future truck-driving simulator studies on fatigue induction, summarizing the scenario and protocol details that should be reported to improve reproducibility, cross-study comparability, and external interpretability.

Table A1. Summary of included studies.

Author, Year	Institution, Country	Objective	Main Findings
Afghari et al. (2022) [17]	Delft University of Technology, Netherlands	To collect real-time heart-rate-based sleepiness indicators in professional truck drivers using a truck simulator and investigate, via an instrumental variable framework, how sleepiness affects headway while accounting for endogeneity and unobserved heterogeneity.	Younger and more experienced drivers, four-lane rural roads and lower weekly distance travelled were associated with higher probability of sleepy episodes; heavy and distribution transport drivers were less likely to be sleepy; instrumented sleepiness, age and exposure had heterogeneous effects on headway, with some drivers reducing and others increasing headway when sleepy, strongly influenced by night-shift work.
Al Haddad et al. (2022) [20]	Technical University of Munich, Germany	To investigate truck drivers’ acceptance of an i-DREAMS warning–monitoring system (speeding and fatigue-related warnings) in a driving simulator and assess which acceptance factors are transferable to other modes (cars, trams).	Perceived usefulness and perceived ease of use strongly increased behavioral intention to use the system in truck drivers; prior positive attitudes toward advanced driver assistance systems (ADAS) also increased perceived usefulness. Sound clarity was perceived as weaker than visual clarity. The technology acceptance model was largely validated for trucks, with similar structure to cars but not trams.
Anund et al. (2018) [33]	Swedish National Road and Transport Research Institute, Sweden	To investigate whether professional drivers are more resistant to sleep deprivation than non-professional drivers by comparing the development of self-reported, physiological and behavioral sleepiness during day and night driving in a simulator.	Professional drivers reported significantly lower self-rated sleepiness than non-professional drivers, but showed longer blink durations, more line crossings, and drove faster, especially at night; both groups exhibited increased sleepiness, degraded performance, and worse psychomotor vigilance task (PVT) results at night and with time on task.
Cardoso et al. (2019) [29]	University of New Brunswick, Canada	To evaluate changes in fatigue, stress and vigilance among commercially licensed truck drivers during a prolonged driving task and to determine whether a new ergonomic seat design reduces physical and cognitive fatigue compared with an industry standard seat.	Professional long-haul truck drivers showed significant increases in muscular fatigue, exhaustion–sleepiness and comfort-seeking after a 90 min highway drive plus simulated driving, especially when using the industry standard seat; boredom–demotivation was also higher with the standard seat, while vigilance remained stable, suggesting good coping and attentional maintenance despite fatigue.
Daza et al. (2011) [13]	University of Alcalá, Spain	To develop and evaluate a non-intrusive driver-drowsiness monitoring approach based on fusion of driver and driving signals using a multilayer perceptron neural network.	Fusion improved detection: individual-indicator detection rate (~70%) increased up to 94% with combined indicators; best artificial neural network (ANN) combination achieved 98.65% detection rate; PERCLOS (percentage of eyelid closure over the pupil over time) alone 97.61%; best driving-only indicator was standard lateral position 84.10%.
Drory (1985) [30]	Ben-Gurion University of the Negev, Israel	To examine the effects of extra task stimulation and extra rest on performance and fatigue of haul truck drivers engaged in a simulated driving task.	Performance and perceived fatigue differed by secondary-task type; adding a voice-communication task improved performance but increased reported fatigue versus basic driving/vigilance; an extra 30 min rest period reduced reported fatigue but did not affect performance.
Dziuda et al. (2021) [14]	Military Institute of Aviation Medicine, Poland	To evaluate a camera-based fatigue detector for truck drivers in a high-fidelity truck simulator using eye closure-associated indicators and relate them to subjective fatigue.	PERCLOS and eye closure duration were significant predictors of subjective fatigue, capturing within-driver changes and between-driver differences respectively, whereas eye closure frequency showed weak association; the prototype camera-based detector could monitor fatigue non-invasively in professional truck drivers.
Eskandarian & Mortazavi (2007) [15]	The George Washington University, USA	To evaluate a neural-network-based drowsiness detection algorithm for commercial vehicle drivers in a truck driving simulator using steering input only, and to examine correlations between steering behavior and drowsiness.	Night-session driving produced markedly higher drowsiness indicators and more crashes; steering degradation before crashes followed a two-phase pattern (erratic large corrections then flattened steering during doze-off). The ANN using 15 s steering-feature vectors achieved 85% correct identification of phase-I drowsy intervals with 11% false alarms, and issued at least one detection for 97% of observed crashes.
García et al. (2010) [12]	University of Alcalá, Spain	To develop and evaluate a non-intrusive vision-based drowsiness detector implemented in a realistic truck-driving simulator, and to compare PERCLOS against expert-based subjective ground truth.	PERCLOS estimation was implemented in real time and compared against expert ground truth; reported mean recall rates across users were 83.5% (awake), 57% (fatigue), 46% (drowsiness); paper also reports a three-level PERCLOS discretization using thresholds 15% and 23% for wake–fatigue–drowsiness classification.
Gillberg et al. (1996) [31]	Karolinska Institute and IPM, Sweden	To compare daytime versus night-time driving performance and sleepiness in professional drivers during a simulated truck-driving task, and to test whether a 30 min nap or 30 min rest pause during night driving affects performance or sleepiness.	Night driving showed small but significant decrements (lower mean speed, higher speed variability, higher lane-position variability, especially in the last 30 min) and clearly higher subjective and EEG/EOG (electrooculography) sleepiness versus day driving; reaction time was not significantly affected by condition; neither a 30 min nap nor a 30 min rest pause improved performance or sleepiness.
Giorgi et al. (2024) [19]	Sapienza University of Rome, Italy	To investigate visual attention changes at fatigue onset in professional van and truck drivers during a 45 min monotonous simulated driving task using an EEG-based mental drowsiness index to define fatigued periods.	Protocol successfully induced early fatigue; as fatigue onset occurred, drivers shifted visual attention away from task-related areas of interest (road, cockpit) toward non-informative external environment; EEG MDrow and subjective KSS/Chalder scores increased across the protocol; no differential fatigue effect between short-range (van) and long-range (truck) drivers.
Hjälmdahl et al. (2017) [22]	Swedish National Road and Transport Research Institute, Sweden	To examine how partially and fully automated truck platooning affect driver workload, trust, acceptance, performance and sleepiness when using a minimal in-vehicle human–machine interface (HMI).	Partial automation produced higher workload than full automation or baseline cruise control; trust and acceptance were highest in baseline and lower for both automation levels; both partial and full automation increased sleepiness compared with baseline, with full automation producing the highest KSS; partial automation increased steering activity and lane-position variability.
Howard et al. (2014) [32]	Institute for Breathing and Sleep, Australia	To compare performance changes during acute sleep deprivation between professional and nonprofessional drivers using simulated driving, psychomotor vigilance, and subjective sleepiness measures.	Across 24 h of continuous wakefulness, both groups showed progressive deterioration in simulated driving (lateral lane position variability and speed variability) and in PVT performance (reaction time and lapses), with performance worsening particularly after 17–24 h awake; crashes increased after 17–23 h awake; no differences in performance change trajectories between professional and nonprofessional drivers.
Macchi et al. (2002) [24]	Institute for Circadian Physiology, USA	To assess the effects of a scheduled 3 h afternoon nap versus no nap on nighttime alertness and psychomotor performance during a simulated night shift in professional long-haul drivers after partial sleep restriction.	The afternoon nap reduced subjective sleepiness and fatigue at night, improved psychomotor performance, and increased physiological arousal during simulated driving up to 7–14 h after the nap; completing the full overnight protocol was difficult for some drivers and unscheduled naps occurred.
O’Neill et al. (1999) [25]	Star Mountain Inc., USA	To assess the effects of nondriving on-duty loading/unloading activities on subsequent truck-driver alertness and safety-relevant driving performance using a truck-driving simulator.	Effects were mixed: after morning loading and a short break, probe performance improved, but this benefit wore off; after afternoon loading, probe responses worsened in the first subsequent driving hour and cognitive errors increased; lane-keeping deteriorated immediately after morning loading (possibly due to upper-body muscle fatigue); overall performance may decrease after 12–14 h of duty.
Oron-Gilad & Ronen (2007) [34]	Ben-Gurion University of the Negev, Israel	To examine how road characteristics influence fatigue development and fatigue-related driving-performance changes during prolonged simulated driving.	Fatigue was induced by the simulator drive itself, with symptoms appearing within 45–75 min depending on measure; performance decrements depended on road type: straight segments showed deterioration in lane maintenance and steering quality, highway showed steering degradation and low vigilance from early in the drive, and winding segments showed increased speed over time; heart rate variability increased over time and subjective fatigue increased, with strategy differences consistent with road tolerance.
Oron-Gilad et al. (2008) [23]	Ben-Gurion University of the Negev, Israel	To evaluate whether three alertness-maintaining tasks can mitigate passive fatigue during monotonous simulated driving in professional truck drivers, compared with driving only and driving while listening to music.	Trivia prevented driving-performance deterioration and increased alertness, whereas working memory interfered with driving (lower speed, higher subjective fatigue) and the choice reaction time task increased subjective sleepiness and reduced arousal; effects were time-limited, and music was more beneficial than expected (no performance deterioration or subjective fatigue).
Ranney, Simmons, & Masalonis (1999) [26]	Liberty Mutual Research Center for Safety and Health, USA	To determine whether intermittent indirect glare exposure over an 8 h simulated drive produces progressive impairment, and to assess time-of-day/time-on-task effects on driving performance, sleepiness, and critical tracking.	No evidence that prolonged intermittent glare caused progressive driving impairment. Clear time-related effects were observed: pedestrian detection response time and variability worsened in block 3 (post-lunch dip) with recovery in block 4; mean speed increased progressively across blocks; subjective sleepiness increased over time; critical tracking performance deteriorated after early blocks with partial recovery.
Ranney, Simmons, Boulos et al. (1999) [27]	Liberty Mutual Research Center for Safety and Health, USA	To examine the effects of a 3 h afternoon nap on simulated nighttime driving performance following a night of partial sleep restriction, during an extended overnight driving session.	Extended overnight driving after partial sleep restriction induced significant impairment (crash frequency increased over runs; mirror-target detection decreased; pedestrian detection responses slowed). The afternoon nap improved overall performance, reducing crash frequency and increasing the proportion of each run completed before the first crash (nap 72% vs. no-nap 51%).
Saeed et al. (2017) [16]	University of Twente, Netherlands	To detect under-, normal- and over-aroused states in professional truck drivers using deep learning on wearable physiological signals collected during simulator driving, based on a combined stress–sleepiness ground-truth scheme.	A 7-layer convolutional neural network trained on raw heart rate, skin conductance and skin temperature achieved weighted F-score 0.82 and Kappa 0.64, outperforming a baseline neural network and denoising autoencoder models; methods to handle class imbalance brought only marginal additional gains and normal versus over-arousal remained difficult to separate.
Sandström et al. (2017) [28]	University of Helsinki, Finland	To derive absolute lane position from steering-wheel signal and develop a non-video lane-departure warning algorithm that predicts lane departures up to 3 s before they occur.	Using steering-wheel data, lane position could be derived with moderate correlation to measured lane position (test set r ≈ 0.43–0.48 depending on simulator). The proposed lane departure warning (LDW) algorithm predicted lane departures up to 3 s ahead with sensitivity 47.1% and specificity 70.8% (best at 1 s time window), exceeding reported estimates for video-based LDW sensitivity.
Yu et al. (2025) [18]	Guangzhou Highway, China	To investigate driving behavior patterns of truck drivers under normal and fatigue states in a simulator and develop an in-transit driving risk assessment model integrating driver state and driving behavior using long short-term memory (LSTM).	LSTM-based in-transit risk model using speed, acceleration, steering and lateral position features plus KSS-based fatigue labels achieved high performance (accuracy 92.9%, sensitivity 94.5%, F-score 93.7%), and fatigue was associated with higher speed, steering wheel angle and lateral deviation than normal driving.
Zhou et al. (2025) [21]	Fuzhou University, China	To explore drivers’ interaction preferences with in-vehicle voice assistants under different fatigue states and experimentally evaluate how different voice interaction styles influence fatigue arousal and driving alertness.	High-intensity, highly interactive voice prompts significantly reduced reaction time, increased EEG-based arousal and lowered PERCLOS under severe fatigue, while low-intensity prompts were least effective; drivers’ preferred dialogue style depended on fatigue level and occupation (long-haul truck, taxi, private car).
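
The three-level PERCLOS discretization listed for García et al. (2010) [12] can be illustrated with a short sketch. Only the two thresholds (15% and 23%) come from the table. The function name, the class labels, and the handling of values that fall exactly on a threshold are assumptions made for illustration and are not taken from the original paper.

```python
def classify_perclos(perclos_pct: float,
                     fatigue_thr: float = 15.0,
                     drowsy_thr: float = 23.0) -> str:
    """Map a PERCLOS value (percent of time with eyes closed) to one of three states.

    Thresholds follow the 15% / 23% cut-offs reported in Table A1.
    Assumption: a value equal to a threshold is assigned to the higher state.
    """
    if not 0.0 <= perclos_pct <= 100.0:
        raise ValueError("PERCLOS must be a percentage between 0 and 100")
    if perclos_pct < fatigue_thr:
        return "awake"
    if perclos_pct < drowsy_thr:
        return "fatigue"
    return "drowsiness"


for value in (10.0, 18.0, 30.0):
    print(value, classify_perclos(value))
```

Keeping the thresholds as parameters makes it easy to test other cut-offs, since the table does not indicate whether these values generalize beyond that study.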

Table A2. Characteristics of participants.

Author, Year	Sample Size	Male	Female	Age (Mean ± SD)	Inclusion Criteria
Afghari et al. (2022) [17]	35	29	6	41.97 ± 9.82	Professional truck drivers working day, night, or mixed shifts; valid truck license; availability for simulator experiment.
Al Haddad et al. (2022) [20]	34	28	6	NR	Active professional truck drivers; at least 6 months of driving experience.
Anund et al. (2018) [33]	11	11	0	23.5 ± 1.5	Professional heavy-vehicle drivers; age 19–25; males; not working only night shifts; preferably self-reported evening types; body mass index < 30; no hearing aid; no sleep disorders; self-reported normal sensitivity to stressful situations; no extremes in terms of self-reported personalities (extrovert or introvert).
Cardoso et al. (2019) [29]	20	20	0	50.4 ± 13.4	Commercially licensed truck drivers with Class 1, 2 or 3 license; males; experienced with standard transmission.
Daza et al. (2011) [13]	10	NR	NR	NR	Professional drivers; driving at least 5000 km a year; no habitual sleep disturbances.
Drory (1985) [30]	60	60	0	39.0 ± NR	Professional haul truck drivers employed by a large mining firm; randomly selected from a population of 300 drivers.
Dziuda et al. (2021) [14]	8	8	0	33.13 ± 4.39	Professional truck drivers; males.
Eskandarian & Mortazavi (2007) [15]	13	NR	NR	41.0 ± NR	Commercially licensed truck drivers.
García et al. (2010) [12]	20	NR	NR	NR	Frequent drivers, driving at least 5000 km a year; no habitual sleep disturbances.
Gillberg et al. (1996) [31]	9	NR	NR	42.0 ± NR	Professional drivers.
Giorgi et al. (2024) [19]	10	NR	NR	NR	Truck drivers with normal or corrected-to-normal vision.
Hjälmdahl et al. (2017) [22]	24	24	0	NR	Truck drivers; males; majority of participants’ usual driving was regional or long-haul.
Howard et al. (2014) [32]	20	20	0	41.9 ± 8.3	Professional drivers; at least 30 h of driving per week and duration of professional driving of at least 12 months. Excluded if contraindications to sleep deprivation; Epworth Sleepiness Scale > 15; high sleep propensity precluding driving; sleep disorder; medical conditions affecting neuropsychological performance.
Macchi et al. (2002) [24]	8	7	1	40.9 ± 5.9	Professional long-haul drivers recruited from the metropolitan Boston area; screened via general health questionnaire, Sleep Disorders Questionnaire, and Morningness–Eveningness Questionnaire.
O’Neill et al. (1999) [25]	10	10	0	43.2 ± NR	Experienced holders of commercial driver licenses with long-haul experience; non-smokers; medically cleared; screened for simulator sickness.
Oron-Gilad & Ronen (2007) [34]	10	10	0	22.0 ± NR	Mandatory-service military truck drivers randomly selected from two military transport centers.
Oron-Gilad et al. (2008) [23]	12	12	0	NR	Professional truck drivers; at least 10 years of truck-driving experience; termination criterion if the driver fell asleep repeatedly at the wheel.
Ranney, Simmons, & Masalonis (1999) [26]	12	12	0	NR	Valid commercial driver’s license; at least 3 years of truck-driving experience; screened for adequate visual acuity (20/40) and not extremely sensitive to glare.
Ranney, Simmons, Boulos et al. (1999) [27]	8	7	1	NR	Professional long-haul drivers recruited via newspaper advertisements; at least 2 years of long-haul experience; completed telephone screening and questionnaires on general health and sleep patterns.
Saeed et al. (2017) [16]	11	NR	NR	NR	Professional truck drivers.
Sandström et al. (2017) [28]	34	32	2	30 ± 12; 34 ± 11	Driver trainees at Työtehoseura; good health, good sleep, no medication affecting sleep/sleepiness; ability to abstain from caffeine for 37 h; BMI (body mass index) 22–30; no simulator sickness; rested prior to experiment with 10 h time-in-bed per night.
Yu et al. (2025) [18]	30	NR	NR	46.0 ± 5.4	Truck drivers with normal or corrected-to-normal vision; at least 8 years of licensing; regular weekly driving.
Zhou et al. (2025) [21]	10	10	0	NR	Long-haul truck drivers; males; age 25–55; normal vision; good physical health; no history of epilepsy, mental illness or severe motion sickness; valid driving license; at least 3 years of driving experience; no serious traffic violations.

Note: SD represents standard deviation; NR represents not reported.
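
As a quick consistency check on Table A2, the sample sizes and sex counts can be tallied programmatically. The sketch below transcribes the Sample Size, Male, and Female columns in row order, with `None` standing in for NR. The variable and function names are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# (sample size, male, female) per study, in Table A2 row order; None stands for "NR".
Row = Tuple[int, Optional[int], Optional[int]]
ROWS: List[Row] = [
    (35, 29, 6), (34, 28, 6), (11, 11, 0), (20, 20, 0), (10, None, None),
    (60, 60, 0), (8, 8, 0), (13, None, None), (20, None, None), (9, None, None),
    (10, None, None), (24, 24, 0), (20, 20, 0), (8, 7, 1), (10, 10, 0),
    (10, 10, 0), (12, 12, 0), (12, 12, 0), (8, 7, 1), (11, None, None),
    (34, 32, 2), (30, None, None), (10, 10, 0),
]


def summarize(rows: List[Row]) -> Dict[str, int]:
    """Tally participants overall and by sex among studies that report sex."""
    with_sex = [r for r in rows if r[1] is not None and r[2] is not None]
    return {
        "studies": len(rows),
        "participants": sum(r[0] for r in rows),
        "studies_reporting_sex": len(with_sex),
        "male": sum(r[1] for r in with_sex),
        "female": sum(r[2] for r in with_sex),
    }


print(summarize(ROWS))
# {'studies': 23, 'participants': 419, 'studies_reporting_sex': 16, 'male': 300, 'female': 16}
```

The total of 419 participants across 23 studies agrees with the figure stated in the abstract. Among the 16 studies that report sex, 16 of 316 participants are female.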

Table A3. Characteristics of driving simulators.

Author, Year	Simulator Type	Truck Cabin Simulated	Simulator FOV (Horizontal)	Simulator DOF
Afghari et al. (2022) [17]	Static	Yes	135°	0
Al Haddad et al. (2022) [20]	Static	Yes	NR	0
Anund et al. (2018) [33]	Dynamic	No	120°	4
Cardoso et al. (2019) [29]	Static	Yes	NR	0
Daza et al. (2011) [13]	Dynamic	Yes	180°	6
Drory (1985) [30]	Static	Yes	NR	0
Dziuda et al. (2021) [14]	Dynamic	Yes	180°	6
Eskandarian & Mortazavi (2007) [15]	Static	Yes	135°	0
García et al. (2010) [12]	Dynamic	Yes	180°	6
Gillberg et al. (1996) [31]	Dynamic	Yes	120°	3
Giorgi et al. (2024) [19]	Static	No	160°	0
Hjälmdahl et al. (2017) [22]	Dynamic	Yes	120°	3
Howard et al. (2014) [32]	Static	No	NR	0
Macchi et al. (2002) [24]	Static	Yes	NR	0
O’Neill et al. (1999) [25]	NR	NR	NR	NR
Oron-Gilad & Ronen (2007) [34]	Static	No	40°	0
Oron-Gilad et al. (2008) [23]	Static	No	40°	0
Ranney, Simmons, & Masalonis (1999) [26]	Static	Yes	NR	0
Ranney, Simmons, Boulos et al. (1999) [27]	Static	Yes	NR	0
Saeed et al. (2017) [16]	NR	NR	NR	NR
Sandström et al. (2017) [28]	Static	Yes	NR	0
Yu et al. (2025) [18]	Static	No	NR	0
Zhou et al. (2025) [21]	Static	No	NR	0

Note: DOF represents degrees of freedom; FOV represents field of view; NR represents not reported.

Table A4. Experiment assessment results.

ID	[17]	[20]	[33]	[29]	[13]	[30]	[14]	[15]	[12]	[31]	[19]	[22]	[32]	[24]	[25]	[34]	[23]	[26]	[27]	[16]	[28]	[18]	[21]
Was driving simulator apparatus suitable for the research intent? (Y/N/NR)	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y
Was a method of randomization or counterbalance of scenarios performed? (Y/N/NR/NA)	Y	NR	Y	NA	NR	Y	Y	N	NR	Y	N	Y	N	NR	Y	Y	Y	Y	NR	NA	Y	NR	N
Was a method of randomization or counterbalance of geometries performed? (Y/N/NR/NA)	Y	NR	NR	NA	NR	NR	NA	NA	NR	Y	N	NA	N	NA	NR	N	NA	NR	NR	NA	NA	NA	NA
Was a statistical method of analysis applied? (Y/N)	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y
Does the sample represent the target population? (Y/N)	Y	Y	Y	Y	Y	Y	Y	Y	N	Y	Y	Y	Y	Y	Y	N	Y	Y	Y	Y	N	Y	Y
Was a practical trial performed? (Y/NR)	Y	Y	Y	NR	NR	Y	Y	Y	Y	NR	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	Y	NR
Was a method to assess driver’s familiarization with the simulator specified? (Y/N/NR/NA)	N	NR	N	NR	NR	Y	N	NR	NR	NR	N	Y	N	Y	NR	Y	N	Y	N	N	N	N	NR
Was motion sickness assessed? (Y/NR)	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	Y	Y	NR	NR	NR	NR	Y	Y	NR
Was there a clear description of the method to assess motion sickness? (Y/N/NR/NA)	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	N	N	NA	NA	NA	NA	N	N	NA
Was the existence of outliers assessed? (Y/ NR)	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR	NR
Was there a clear description of the method to assess outliers? (Y/N/NR/NA)	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
∑Y/(11 − ∑NA), %	67%	44%	56%	43%	33%	67%	63%	50%	33%	56%	44%	75%	44%	63%	60%	60%	63%	60%	44%	57%	50%	56%	38%

Note: Y represents yes; N represents no; NR represents not reported; NA represents not applicable.
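
The summary row of Table A4 divides the number of Y ratings by the number of applicable items (11 minus the number of NA ratings). The sketch below applies that formula to two columns transcribed from the table. The function name and the input validation are illustrative assumptions, not part of the published checklist.

```python
from typing import Sequence

N_ITEMS = 11  # number of checklist items in Table A4
ALLOWED = {"Y", "N", "NR", "NA"}


def checklist_score(ratings: Sequence[str]) -> float:
    """Return 100 * (count of Y) / (11 - count of NA) for one study."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    unknown = set(ratings) - ALLOWED
    if unknown:
        raise ValueError(f"unexpected rating codes: {sorted(unknown)}")
    yes = sum(r == "Y" for r in ratings)
    not_applicable = sum(r == "NA" for r in ratings)
    applicable = N_ITEMS - not_applicable
    if applicable == 0:
        raise ValueError("all items are NA; the score is undefined")
    return 100.0 * yes / applicable


# Columns [17] and [22] of Table A4, read top to bottom.
ratings_17 = ["Y", "Y", "Y", "Y", "Y", "Y", "N", "NR", "NA", "NR", "NA"]
ratings_22 = ["Y", "Y", "NA", "Y", "Y", "Y", "Y", "NR", "NA", "NR", "NA"]

print(round(checklist_score(ratings_17)))  # 67
print(round(checklist_score(ratings_22)))  # 75
```

NR and N both count against the score, whereas NA shrinks the denominator, so a study is not penalized for items that do not apply to its design.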

Table A5. Characteristics of driving-simulator experimental protocols.

Author, Year	Driving Block Duration	Block Structure & Breaks	Road Type	Sleep Manipulation Protocol
Afghari et al. (2022) [17]	15 min	Two practice drives (duration not reported) preceded a single continuous scenario drive of ≈16.5–18 km (≤15 min), with no scheduled breaks and no interventions reported.	Mixed (rural/highway)	Sleepiness was not experimentally induced, although participants were shift-working truck drivers who may have presented underlying sleepiness due to their work schedules.
Al Haddad et al. (2022) [20]	55 min	Two practice drives (5 min each) were followed by three experimental drives (15 min each): baseline (no interventions), speed-limit information and speed warnings, and the same interventions under a high sleepiness condition; questionnaires were completed between drives and a 10 min break occurred after Drive 2.	Mixed (rural/highway)	No explicit sleep restriction, deprivation, or post-shift protocol was reported.
Anund et al. (2018) [33]	30 min	Each participant completed six lab visits (three day, three night); per visit, three 30 min drives were performed with 90 min breaks between drives for PVT test, food and rest; KSS was collected every 5 min and PVT was administered before the first drive and after each drive.	Rural	Sleepiness was manipulated by time-of-day (day vs night) without sleep restriction; participants were instructed to sleep ≥ 7 h for three nights before sessions, avoid alcohol for 72 h, and abstain from smoking/caffeine for 3 h pre-drive.
Cardoso et al. (2019) [29]	110 min	On two separate days, participants completed a 10 min simulator session, followed by 90 min on-road highway driving in an instrumented truck, then a second 10 min simulator session; driving fatigue scale questionnaires and UFOV (useful field of view) tests were administered before and after each simulator session; no explicit rest breaks beyond transitions were reported.	Highway	No explicit sleep restriction, deprivation, or post-shift protocol was reported.
Daza et al. (2011) [13]	60 min	Driving sessions under both sleep conditions spread over a 24 h period; each session 60 min; break schedule not reported.	Mixed (rural/urban)	Regular sleep vs partial sleep deprivation (4 h sleep night before); conditions implemented per driver across 24 h.
Drory (1985) [30]	420 min	21 blocks of 15 min with 6 min rest between blocks; extra-rest group received an additional 30 min rest after the first 3 h then resumed.	NR	No explicit sleep restriction, deprivation, or post-shift protocol was reported.
Dziuda et al. (2021) [14]	50 min	One practice drive (5 min) preceded a single continuous scenario drive of ~45 min, with no scheduled breaks. Fatigue symptom scale questionnaires were administered 40 min before and after the main drive.	Mixed (urban/highway)	Sleepiness was induced via a post-night-shift condition: drivers were tested twice, once rested and once drowsy after working a night shift, with sessions separated by at least several days.
Eskandarian & Mortazavi (2007) [15]	210 min	Two sessions per driver: one morning drive (single 52-mile scenario) and one night drive (same 52-mile scenario repeated between 01:30 and 05:00); break schedule not reported.	Highway	Sleep-related instructions consistent with sleep deprivation for night session (no daytime sleep; limited caffeine); exact deprivation duration not reported.
García et al. (2010) [12]	60 min	Each subject completed 6 × 60 min sessions under each of two sleep conditions over 24 h; break schedule not reported.	Mixed (rural/urban)	Regular sleep schedule (2 nights) vs partial sleep deprivation (4 h sleep night before).
Gillberg et al. (1996) [31]	90 min	Three consecutive 30 min periods (a, b, c); in the NightRest and NightNap conditions, period b was a 30 min rest pause or nap, respectively; otherwise all periods were driving.	NR	No explicit sleep restriction, deprivation, or post-shift protocol was reported.
Giorgi et al. (2024) [19]	60 min	Participants completed a 15 min high-demand circuit task, followed by 45 min monotonous urban driving task; no scheduled rest breaks beyond brief transitions were reported; eyes-open condition data and questionnaires were collected before and after each drive.	Urban	No explicit sleep restriction, deprivation, or post-shift protocol was reported.
Hjälmdahl et al. (2017) [22]	55 min	Participants completed a 10 min practice drive, followed by three 45 min drives (baseline cruise control, partial automation, full automation), each comprising sequential situations of light traffic, heavy traffic, and heavy traffic + fog; break scheduling was not reported; a questionnaire was administered after finishing the training.	Highway	No explicit sleep restriction, deprivation, or post-shift protocol was reported.
Howard et al. (2014) [32]	210 min	Overnight laboratory session with repeated test batteries at 09:00, 12:00, 16:00, 20:00, 00:00, 03:00, 06:00; between tests participants remained awake under staff monitoring and engaged in passive activities; first 6 min of each 30 min drive excluded from analysis.	Highway	Total sleep deprivation: 24 h continuous wakefulness (wake 07:00, lab 08:30, awake until 07:00 next day); participants slept ≥ 7 h prior night.
Macchi et al. (2002) [24]	600 min	Repeated 2 h driving runs across baseline and night sessions; additional testing at 21:30; snack substituted for one post-run test battery session; other break scheduling between runs not fully reported.	NR	Partial sleep restriction in each condition: laboratory sleep 00:00–05:00 prior to testing; nap manipulation: scheduled 3 h afternoon nap opportunity vs no nap.
O’Neill et al. (1999) [25]	840 min	Driving day began at 07:00 with eight 90 min scenarios; brief stretch breaks between scenarios; one 30 min rest break in the morning, one 60 min midday break (after scenario 4), and one 30 min rest break in the evening; on days 2, 3, and 5 of the loading week, scenarios 2 and 6 were replaced by 90 min loading/unloading tasks.	Mixed (highway/urban)	No explicit sleep restriction, deprivation, or post-shift protocol was reported.
Oron-Gilad & Ronen (2007) [34]	92 min	Continuous prolonged drive with periodic verbal ratings every 25 min; the drive ended if the driver fell asleep, had an accident, or was unable to continue.	Mixed (highway/rural)	No explicit sleep restriction, deprivation, or post-shift protocol was reported.
Oron-Gilad et al. (2008) [23]	NR	Five sessions per driver, each lasting approximately 2 h (including questionnaires and setup); within-session: 5 min rest baseline recording before drive; no planned breaks during drive; alertness-maintaining tasks (when applicable) added during segment 1c (third straight segment); music condition applied for the entire drive.	Rural	No explicit sleep restriction, deprivation, or post-shift protocol was reported.
Ranney, Simmons, & Masalonis (1999) [26]	400 min	Eight simulator runs per session, each lasting approximately 50 min; breaks after every two runs (four 2 h blocks).	Rural	No explicit sleep restriction, deprivation, or post-shift protocol was reported.
Ranney, Simmons, Boulos et al. (1999) [27]	480 min	Overnight driving consisted of four 2 h runs separated by 30 min breaks; each 2 h run divided into four 30 min blocks matched for event frequency and workload.	Rural	Partial sleep restriction (5 h sleep period on the previous night) in both replications; nap condition included scheduled 3 h nap (14:00–17:00) in bed in darkness; no-nap condition involved sedentary activities during 14:00–17:00.
Saeed et al. (2017) [16]	130 min	Participants completed a 15 min practice trial, followed by a 25 min stress trial (baseline, moderate, high stress), then a 15 min break (varying between drivers), and a 90 min fatigue trial; KSS prompts were administered every 10 min during the fatigue trial.	NR	No explicit sleep restriction, deprivation, or post-shift protocol was reported.
Sandström et al. (2017) [28]	660 min	12 test sessions over 36 h sustained wakefulness; 55 min drive every 3 h; each session included 10 min PVT, 55 min drive, 10 min PVT; participants had 90 min “own time” between sessions.	Rural	Sustained wakefulness (36 h); no caffeine allowed; prior week sleep schedule controlled via diaries (10 h time-in-bed).
Yu et al. (2025) [18]	60 min	Participants drove continuously for about one hour on the experimental road with no scheduled breaks; interventions not reported.	Highway	A post-night-shift protocol was used to induce fatigue by scheduling simulator drives immediately after drivers’ night shifts, with no additional sleep restriction reported.
Zhou et al. (2025) [21]	50 min	Each session consisted of a fatigue induction phase (~30 min continuous monotonous straight-line driving) followed by an awakening trial (~20 min) under one of three voice-interaction intensities at a given fatigue level; breaks not reported; KSS was assessed after each induction.	Rural	No explicit sleep restriction, deprivation, or post-shift protocol was reported.

Note: NR represents not reported.

Table A6. Minimum scenario and protocol details recommended for reporting in truck-driving simulator studies on fatigue induction.

Reporting Domain	Minimum Information to Report	Rationale for Reporting
Population definition and eligibility	Target driver group (e.g., long-haul, regional, trainee), license status, professional experience criteria, inclusion/exclusion criteria, and any health-, sleep-, or duty-related restrictions	Clear population definition is necessary to support transferability across truck-driver populations and to interpret whether findings apply to professional, mixed, or trainee samples.
Sample description and attrition	Sample size, sex distribution, age, driving experience, recruitment source, withdrawals, exclusions, and tolerance-related attrition	External validity depends on adequate sample characterization, while poorly documented exclusions may distort interpretation of fatigue-related outcomes.
Baseline sleep–wake and duty state	Prior sleep duration, prior wake duration, recent duty schedule, post-shift status, sleep restriction/deprivation protocol if used, and method of verification	Baseline sleep pressure and circadian state are central determinants of fatigue expression and must be documented to support causal interpretation of scenario effects.
Stimulant and substance control	Restrictions or allowances for caffeine, nicotine, alcohol, medication, and other stimulants, including timing relative to testing	Observed state changes may be influenced by stimulant or substance effects unless these factors are explicitly controlled or reported.
Test scheduling and circadian context	Local start time, time-of-day window, day/night scheduling, lighting condition, and any explicit circadian or post-shift manipulation	Fatigue-related change is strongly conditioned by circadian timing and environmental light exposure; inconsistent reporting limits reproducibility and comparison across protocols.
Simulator system and configuration	Static/dynamic platform, cabin representation, field of view, motion capability, display configuration, and other hardware features relevant to immersion and control	Simulator characteristics may influence workload, realism, tolerance, and fatigue trajectories, and should therefore be reported to support inter-laboratory comparability.
Familiarization and stabilization procedures	Presence, duration, and structure of familiarization drives, criteria for adaptation/stabilization, and whether baseline was collected only after familiarization	Early-session outcomes may be biased if behavioral adaptation to the simulator has not been established before baseline measurement.
Exposure structure	Total time on task, duration of each block, number of runs, continuous versus segmented design, break schedule, and any interruptions for questionnaires or other procedures	Fatigue induction depends not only on total duration but also on continuity, segmentation, and interruption structure.
Road context and geometry	Road type (e.g., highway, rural, mixed), lane structure, straight versus curved segments, curvature characteristics, route repetitiveness, and any distinctive roadway features	Road type and geometry are core determinants of control demand and monotony, yet are often reported too broadly to allow direct benchmarking across studies.
Traffic configuration and interaction demand	Traffic density, lead-vehicle behavior, overtaking opportunities, interaction complexity, and whether traffic was fixed, scripted, or adaptive	Traffic complexity directly shapes underload versus active driving demand and is therefore essential for interpreting fatigue induction logic.
Monotony design	Whether monotony was an explicit manipulation or an implicit property of the scenario, and how it was operationalized (e.g., repetitive scenery, low traffic, prolonged straight segments, limited task switching)	Monotony is a central explanatory construct in the fatigue-induction literature and should be reported explicitly rather than inferred indirectly.
Event structure and workload probes	Event type, frequency, timing, distribution across the drive, and whether events functioned as sparse probes or as sustained workload drivers	Event schedules affect workload stability and determine whether performance measures reflect monotony-related fatigue or fluctuating task demand.
Secondary tasks, automation, and interventions	Any in-drive secondary task, vigilance probe, music, automation/platooning condition, feedback system, or countermeasure, including timing and implementation details	Additional task demands or control-support systems may attenuate or amplify fatigue-related change and should be separated clearly from core scenario effects.
Sequencing control and counterbalancing	Condition order, route order, randomization/counterbalancing procedures, and explicit modelling of sequence effects if balancing was not feasible	Apparent scenario effects may otherwise reflect time on task, habituation, learning, or recovery rather than the intended manipulation.
Simulator tolerance and sickness handling	Whether simulator sickness was assessed, instrument used, timing of assessment, mitigation procedures, sickness-related exclusions, and protocol adjustments due to discomfort	Simulator tolerance can affect exposure duration, participant retention, performance, and subjective fatigue reporting.
Outcome strategy and timing	Outcome domains used (subjective, vehicle-based, behavioral, physiological, performance-based), thresholds if applicable, timing and frequency of repeated measurements, and whether fatigue was analyzed as a trajectory or as a categorical state	Repeated within-session assessment is essential for capturing fatigue dynamics and for distinguishing progressive change from isolated endpoint differences.
Data handling and analytical transparency	Outlier definition and handling, missing-data treatment, exclusions from analysis, and analysis decisions affecting fatigue interpretation	Transparent data handling is necessary to judge the robustness of reported fatigue-related effects and to support reproducibility.
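
One way to put the reporting domains of Table A6 into practice is to encode them as a structured record that can be checked for gaps before submission. The sketch below is a minimal illustration under that assumption. The class name, field names, and example values are hypothetical and do not represent a schema proposed by the review or data from any included study.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProtocolReport:
    """One free-text entry per reporting domain in Table A6 (hypothetical field names)."""
    population_eligibility: Optional[str] = None
    sample_attrition: Optional[str] = None
    baseline_sleep_wake: Optional[str] = None
    stimulant_control: Optional[str] = None
    test_scheduling: Optional[str] = None
    simulator_configuration: Optional[str] = None
    familiarization: Optional[str] = None
    exposure_structure: Optional[str] = None
    road_geometry: Optional[str] = None
    traffic_configuration: Optional[str] = None
    monotony_design: Optional[str] = None
    event_structure: Optional[str] = None
    secondary_tasks_interventions: Optional[str] = None
    sequencing_control: Optional[str] = None
    simulator_sickness: Optional[str] = None
    outcome_strategy: Optional[str] = None
    data_handling: Optional[str] = None


def missing_domains(report: ProtocolReport) -> List[str]:
    """List the domains that are still empty, in Table A6 order."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]


# Invented example values, used only to show the check.
draft = ProtocolReport(
    exposure_structure="single continuous 60 min drive, no scheduled breaks",
    road_geometry="two-lane highway, predominantly straight",
)
print(len(missing_domains(draft)))  # 15
print(missing_domains(draft)[:3])
```

Free-text fields keep the sketch close to the table. A stricter variant could replace them with typed sub-records (for example, numeric block durations), at the cost of committing to units and categories the table leaves open.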

References

Shams, Z.; Mehdizadeh, M.; Khani Sanij, H. “I neither sleep well nor drive cautiously”: How does sleep quality relate to crash involvement directly and indirectly? J. Transp. Health 2020, 18, 100907. [Google Scholar] [CrossRef]
Thiffault, P.; Bergeron, J. Fatigue and individual differences in monotonous simulated driving. Pers. Individ. Differ. 2003, 34, 159–176. [Google Scholar] [CrossRef]
Dawson, D.; McCulloch, K. Managing fatigue: It’s about sleep. Sleep Med. Rev. 2005, 9, 365–380. [Google Scholar] [CrossRef] [PubMed]
Williamson, A.; Lombardi, D.A.; Folkard, S.; Stutts, J.; Courtney, T.K.; Connor, J.L. The link between fatigue and safety. Accid. Anal. Prev. 2011, 43, 498–515. [Google Scholar] [CrossRef] [PubMed]
Grandjean, E. Fatigue in industry. Br. J. Ind. Med. 1979, 36, 175–186. [Google Scholar] [CrossRef]
Fonseca, T.; Ferreira, S. Drowsiness Detection in Drivers: A Systematic Review of Deep Learning-Based Models. Appl. Sci. 2025, 15, 9018. [Google Scholar] [CrossRef]
Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
Fonseca, T.; Ferreira, S. Scenario Parameters for Fatigue Induction in Truck-Driving Simulators: A Systematic Review of Experimental Designs. PROSPERO 2025, CRD420261302272. Available online: https://www.crd.york.ac.uk/PROSPERO/view/CRD420261302272 (accessed on 18 February 2026).
National Heart Lung and Blood Institute. Study Quality Assessment Tools. Available online: https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools (accessed on 5 February 2026).
Bobermin, M.P.; Silva, M.M.; Ferreira, S. Driving simulators to evaluate road geometric design effects on driver behaviour: A systematic review. Accid. Anal. Prev. 2021, 150, 105923. [Google Scholar] [CrossRef]
Fonseca, T.; Ferreira, S. Monitoring Technologies for Truck Drivers: A Systematic Review of Safety and Driving Behavior. Appl. Sci. 2025, 15, 6513. [Google Scholar] [CrossRef]
García, I.; Bronte, S.; Bergasa, L.M.; Hernández, N.; Delgado, B.; Sevillano, M. Vision-based drowsiness detector for a realistic driving simulator. In Proceedings of the 13th International IEEE Conference on Intelligent Transportation Systems, Funchal, Portugal, 19–22 September 2010; pp. 887–894. [Google Scholar] [CrossRef]
Daza, I.G.; Hernández, N.; Bergasa, L.M.; Parra, I.; Yebes, J.J.; Gavilán, M.; Sotelo, M.A. Drowsiness monitoring based on driver and driving data fusion. In Proceedings of the 2011 14th International IEEE Conference on Intelligent Transportation Systems, Washington, DC, USA, 5–7 October 2011; pp. 1199–1204. [Google Scholar] [CrossRef]
Dziuda, Ł.; Baran, P.; Zieliński, P.; Murawski, K.; Dziwosz, M.; Krej, M.; Piotrowski, M.; Stablewski, R.; Wojdas, A.; Strus, W.; et al. Evaluation of a fatigue detector using eye closure-associated indicators acquired from truck drivers in a simulator study. Sensors 2021, 21, 6449. [Google Scholar] [CrossRef]
Eskandarian, A.; Mortazavi, A. Evaluation of a smart algorithm for commercial vehicle driver drowsiness detection. In Proceedings of the 2007 IEEE Intelligent Vehicles Symposium, Istanbul, Turkey, 13–15 June 2007; pp. 553–559. [Google Scholar] [CrossRef]
Saeed, A.; Trajanovski, S.; Van Keulen, M.; Van Erp, J. Deep Physiological Arousal Detection in a Driving Simulator Using Wearable Sensors. In Proceedings of the IEEE International Conference on Data Mining Workshops, ICDMW, New Orleans, LA, USA, 18–21 November 2017; pp. 486–493. [Google Scholar] [CrossRef]
Afghari, A.P.; Papadimitriou, E.; Pilkington-Cheney, F.; Filtness, A.; Brijs, T.; Brijs, K.; Cuenen, A.; De Vos, B.; Dirix, H.; Ross, V.; et al. Investigating the effects of sleepiness in truck drivers on their headway: An instrumental variable model with grouped random parameters and heterogeneity in their means. Anal. Methods Accid. Res. 2022, 36, 100241. [Google Scholar] [CrossRef]
Yu, J.; Sun, M.; Liu, R.; Hao, S.; Li, J.; Zhao, N. In-Transit Driving Risk Assessment of Truck Drivers Using a Driving Simulator. Adv. Transdiscipl. Eng. 2025, 72, 330–338. [Google Scholar] [CrossRef]
Giorgi, A.; Borghini, G.; Colaiuda, F.; Menicocci, S.; Ronca, V.; Vozzi, A.; Rossi, D.; Aricò, P.; Capotorto, R.; Sportiello, S.; et al. Driving Fatigue Onset and Visual Attention: An Electroencephalography-Driven Analysis of Ocular Behavior in a Driving Simulation Task. Behav. Sci. 2024, 14, 1090. [Google Scholar] [CrossRef]
Al Haddad, C.; Abouelela, M.; Hancox, G.; Pilkington-Cheney, F.; Brijs, T.; Antoniou, C. A Multi-Modal Warning–Monitoring System Acceptance Study: What Findings Are Transferable? Sustainability 2022, 14, 12017. [Google Scholar] [CrossRef]
Zhou, C.; Wang, L.; Yang, Y. Dialogue at the Edge of Fatigue: Personalized Voice Assistant Strategies in Intelligent Driving Systems. Appl. Sci. 2025, 15, 6792. [Google Scholar] [CrossRef]
Hjälmdahl, M.; Krupenia, S.; Thorslund, B. Driver behaviour and driver experience of partial and fully automated truck platooning—A simulator study. Eur. Transp. Res. Rev. 2017, 9, 8. [Google Scholar] [CrossRef]
Oron-Gilad, T.; Ronen, A.; Shinar, D. Alertness maintaining tasks (AMTs) while driving. Accid. Anal. Prev. 2008, 40, 851–860. [Google Scholar] [CrossRef] [PubMed]
Macchi, M.M.; Boulos, Z.; Ranney, T.; Simmons, L.; Campbell, S.S. Effects of an afternoon nap on nighttime alertness and performance in long-haul drivers. Accid. Anal. Prev. 2002, 34, 825–834. [Google Scholar] [CrossRef] [PubMed]
O’neill, T.R.; Krueger, G.P.; Van Hemel, S.B.; Mcgowan, A.L.; Rogers, W.C.; O’neill, T.R.; Krueger, G.P.; Van Hemel, S.B.; Mcgowan, A.L.; Moun, S. Effects of cargo loading and unloading on truck driver alertness. Transp. Res. Rec. 1999, 1686, 42–48. [Google Scholar] [CrossRef]
Ranney, T.A.; Simmons, L.A.; Masalonis, A.J. Prolonged exposure to glare and driving time: Effects on performance in a driving simulator. Accid. Anal. Prev. 1999, 31, 601–610. [Google Scholar] [CrossRef]
Ranney, T.A.; Simmons, L.A.; Boulos, Z.; Macchi, M.M. Effect of an afternoon nap on nighttime performance in a driving simulator. Transp. Res. Rec. 1999, 1686, 49–56. [Google Scholar] [CrossRef]
Sandström, M.; Lampsijärvi, E.; Holmström, A.; Maconi, G.; Ahmadzai, S.; Meriläinen, A.; Hæggström, E.; Forsman, P. Detecting lane departures from steering wheel signal. Accid. Anal. Prev. 2017, 99, 272–278. [Google Scholar] [CrossRef]
Cardoso, M.; Fulton, F.; Callaghan, J.P.; Johnson, M.; Albert, W.J. A pre/post evaluation of fatigue, stress and vigilance amongst commercially licensed truck drivers performing a prolonged driving task. Int. J. Occup. Saf. Ergon. 2019, 25, 344–354. [Google Scholar] [CrossRef] [PubMed]
Drory, A. Effects of Rest and Secondary Task on Simulated Truck-Driving Task Performance. Hum. Factors 1985, 27, 201–207. [Google Scholar] [CrossRef]
Gillberg, M.; Kecklund, G.; Åkerstedt, T. Sleepiness and performance of professional drivers in a truck simulator—Comparisons between day and night driving. J. Sleep Res. 1996, 5, 12–15. [Google Scholar] [CrossRef]
Howard, M.E.; Jackson, M.L.; Swann, P.; Berlowitz, D.J.; Grunstein, R.R.; Pierce, R.J. Deterioration in Driving Performance During Sleep Deprivation Is Similar in Professional and Nonprofessional Drivers. Traffic Inj. Prev. 2014, 15, 132–137. [Google Scholar] [CrossRef]
Anund, A.; Ahlström, C.; Lic, C.F.; Åkerstedt, T. Are professional drivers less sleepy than non-professional drivers? Scand. J. Work Environ. Health 2018, 44, 88–95. [Google Scholar] [CrossRef]
Oron-Gilad, T.; Ronen, A. Road characteristics and driver fatigue: A simulator study. Traffic Inj. Prev. 2007, 8, 281–289. [Google Scholar] [CrossRef] [PubMed]

Figure 1. The identification, screening, and inclusion process of eligible studies using PRISMA 2020.

Figure 2. Cumulative number of published studies on fatigue induction in truck-driving simulators.

Figure 3. Country-wise distribution of selected studies.
