Emergency Takeover Performance Evaluation of Train Operators in Semi-Automated Urban Rail Transit: An Attention-Enhanced MLP Approach

Ji, Hangrui; Huang, Yuanchun; Wang, Fangsheng; Zhu, Lin; Liu, Zhigang

doi:10.3390/app16041820

by Hangrui Ji ¹, Yuanchun Huang ²,*, Fangsheng Wang ¹, Lin Zhu ² and Zhigang Liu ²

¹ School of Urban Railway Transportation, Shanghai University of Engineering Science, Shanghai 201620, China

² Shanghai Collaborative Innovation Center of Detection and Assessment for Operation Safety of Railway Transit, Shanghai University of Engineering Science, Shanghai 201620, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(4), 1820; https://doi.org/10.3390/app16041820

Submission received: 9 January 2026 / Revised: 6 February 2026 / Accepted: 10 February 2026 / Published: 12 February 2026

(This article belongs to the Section Transportation and Future Mobility)

Abstract

Semi-automated urban rail transit systems still rely on human intervention during safety-critical events, yet emergency takeover performance has received far less attention than in SAE Level-3 road automation. This study focuses on the reaction phase of emergency takeover, defined as the interval from anomaly onset to the train operator’s first control action. We propose a conditional two-stage evaluation framework that jointly assesses event recognition and control execution quality. A simulation-based experiment was conducted to replicate GoA2 operating conditions under controlled emergency scenarios. Three indicators were extracted: (i) event recognition accuracy derived from eye-tracking and retrospective recall, (ii) takeover reaction time, and (iii) initial action accuracy reflecting compliance with operational speed or braking limits. An attention-enhanced multilayer perceptron (MLP) was developed to dynamically weight input features and improve interpretability. The proposed model achieved stable subject-wise performance, with an average accuracy of 0.86 and a macro F1-score of 0.857. These results support the feasibility of interpretable learning-based evaluation for human-in-the-loop safety assessment and provide practical implications for improving operator readiness monitoring and operational safety management in semi-automated metro systems.

Keywords: urban rail transit; takeover evaluation; semi-automated driving; attention-based neural network; human factors

1. Introduction

Urban rail transit systems are increasingly incorporating automation technologies to improve operational efficiency and safety. Nevertheless, in semi-automated driving modes such as GoA2, train operators remain critical to personal safety and system reliability (Table 1) [1].

According to the Emergency Response Plan for Urban Rail Transit Operational Incidents [2], operating units must initiate emergency handling procedures immediately and strictly follow predefined responsibilities and protocols when operational incidents occur. As front-line personnel, train operators typically detect abnormal situations first and initiate emergency responses. Even on fully automated lines, regulations still require a train operator to remain in the cab during passenger service to continuously monitor train status and respond to faults or emergencies [3].

With increasing automation, the operator’s role has shifted from active control to continuous monitoring. Prolonged exposure to monotonous driving conditions may reduce vigilance and situational awareness. Prior research shows that, under semi-automated driving, changes in attention allocation can increase reaction time and reduce detection rates for safety-critical targets [4]. Therefore, systematically evaluating emergency takeover performance in semi-automated urban rail operations is crucial.

However, most takeover studies concentrate on SAE Level-3 automated road vehicles. In contrast, evaluation frameworks for GoA2 semi-automated urban rail operations are still underdeveloped, particularly for the reaction phase spanning from event onset to the operator’s first control action. Moreover, prior studies often examine either event recognition/attention monitoring or control execution separately, without jointly modeling recognition accuracy and takeover action quality in an integrated and interpretable manner.

In this study, we define the emergency takeover reaction phase as the process from the operator’s initial detection of an abnormal event to event identification and initiation of the first response action. We focus on two key aspects of this phase: (1) emergency event recognition accuracy and (2) the efficiency and effectiveness of takeover execution after event recognition. Through simulation experiments and data-driven modeling, this study develops a comprehensive evaluation framework to quantitatively assess emergency takeover performance by integrating both event recognition accuracy and operational response effectiveness.

1.1. Literature Review

Automation in urban rail transit has reshaped train operators’ roles in semi-automated modes, prompting growing interest in emergency takeover behavior from the perspectives of human factors and transportation safety.

(1): Automation Levels and Characteristics of Emergency Takeover in Rail Transit. In GoA2 semi-automated urban rail transit, Automatic Train Operation (ATO) manages routine operations, while train operators remain in the cab to monitor system status and intervene during anomalies. In GoA4, the system handles operations and incident responses through intelligent monitoring and remote intervention based on standardized procedures [5]. Under GoA2, by contrast, takeover depends on operators’ real-time judgment. Unlike the explicit takeover requests of SAE Level-3 road automation, takeover cues in rail operations are typically indirect and are conveyed through dispatch instructions, warning sounds, or system alerts [6]. Operators must therefore maintain situational awareness during supervision and regain manual control rapidly in emergencies, which places high demands on perception and response. Because these cues and the associated operational constraints are more implicit and regulation-driven than in road automation, existing road-based takeover findings do not transfer directly to rail applications.
(2): Human Factors Determinants of Takeover Performance. Takeover requires operators to rapidly reorient attention and rebuild situational awareness of both the train and its surrounding environment [7]. Research shows that takeover performance depends on takeover cues, operator physiological/cognitive state, scenario complexity, and individual differences. In SAE Level-3 automation, takeover requests trigger situational awareness recovery, which strongly affects takeover outcomes [8,9]. Fatigue, attention allocation, non-driving task engagement, distraction, and cognitive workload can delay response and reduce control quality [10,11]. Ma et al. [12] further showed that pre-takeover distraction or high workload significantly increases reaction time and reduces judgment accuracy and control performance. Recent rail studies adopt data-driven approaches to assess fatigue and distraction. Positive–unlabeled learning combined with nearest-neighbor and random forest (RF) methods enables fatigue classification under limited labels [13], while RF-RFE and SHAP improve feature selection and interpretability for distraction detection [14]. Individual differences remain critical: experienced or simulator-trained operators respond more stably under unexpected events, and higher cognitive ability supports faster reconstruction of situational understanding [15]. Nevertheless, many existing studies focus on general state monitoring rather than explicitly quantifying reaction-phase performance in safety-critical rail tasks, and real-world metro emergencies are difficult to observe in a controlled and repeatable manner.
(3): Modeling Takeover Performance: Statistical Models vs. Machine Learning. Takeover performance evaluation methods can be categorized into two groups. The first group comprises interpretable statistical approaches, including ANOVA and structural equation modeling, which quantify how key determinants influence reaction time and operational outcomes [16,17]. These models support transparent interpretation but struggle with nonlinear and high-dimensional interactions. The second category applies machine learning to physiological and behavioral signals, including heart rate, GSR, and eye-tracking trajectories, to predict takeover performance, typically using algorithms such as RFs and neural networks [18]. Recent work demonstrates real-time operator-state detection [19] and trajectory-based deep learning for safety prediction [20], motivating attention-based models that highlight salient features. Evidence suggests that RF performs robustly in small-sample multi-feature settings, such as physiological response discrimination under noise exposure [21]. Deep learning models further improve takeover prediction; for example, LSTM–BiLSTM–Attention architectures capture temporal dependencies in operator and environmental features [22]. Recent reviews also emphasize ongoing challenges in feature selection and generalizability, particularly when transferring simulator-based findings to real-world operations [9]. Despite improved predictive capability, these data-driven models often face challenges in cross-subject generalization, robustness under limited samples, and the lack of standardized evaluation logic that aligns with the operational decision process in rail emergency takeover.
(4): Attention Mechanisms in Takeover Modeling. Attention mechanisms improve interpretability by learning feature importance weights and emphasizing task-relevant components. Wang et al. [23] integrated channel-wise and spatial attention into a convolutional network and improved driving behavior recognition accuracy. Attention mechanisms can be implemented as either hard attention based on predefined rules or soft attention with data-driven adaptive weighting, with the latter showing strong performance in fatigue detection and attention distribution analysis [24]. For sequential behavior modeling, attention is often combined with BiLSTM networks to identify critical temporal segments for intention recognition and action prediction. Girma et al. [25] proposed an attention-based BiLSTM framework that improves early warning capability and interpretability by highlighting key time windows for decision prediction. In takeover modeling, attention mechanisms emphasize response-phase dynamics within seconds after a takeover request, supporting readiness-to-regain-control assessment. However, most existing studies focus on road traffic scenarios, whereas GoA2 urban rail takeover involves distinct operational constraints and decision logic. Moreover, attention mechanisms are often introduced for interpretability in prediction tasks, while fewer studies integrate attention-based attribution into a structured evaluation framework that distinguishes recognition failures from control-execution quality in rail emergency settings.

1.2. Focus of This Study

As discussed above, existing studies have extensively investigated operator takeover behavior in automated driving systems. However, most of these efforts have concentrated on road traffic scenarios [18,26,27,28], particularly under SAE Level 3 conditional automation. In contrast, takeover behavior in urban rail transit under GoA2 semi-automated modes involves greater operational complexity and a stronger reliance on individual train operator responses. Although recent studies [29] have begun to explore train operator attention monitoring in rail transit environments using bio-signals such as EEG and ECG, these efforts primarily assess cognitive states rather than explicitly modeling the decision-making logic underlying emergency takeover behavior. To address this gap, the present study proposes a two-stage MLP–attention framework that jointly captures event recognition accuracy and control execution quality, thereby enabling a more comprehensive assessment of train operator takeover performance. To clarify the contributions of this study, Table 2 summarizes its key elements (research question, driving scenario, emergency type, evaluation indicators, and assessment methods) and enables direct comparison with several closely related studies.

(1): This study investigates emergency takeover behavior in urban rail transit under semi-automated operations, with a particular focus on modeling and evaluating the “reaction phase.” While prior research has primarily focused on takeover behaviors in SAE Level 3 automated road vehicles, this study shifts attention to GoA2 semi-automated metro systems, aiming to identify the cognitive mechanisms and human factors underlying train operator responses in this operational context. Specifically, the emergency takeover reaction phase is defined as the time interval from the onset of an unexpected event to the train operator’s first manual intervention. By integrating behavioral and physiological indicators, this study proposes a comprehensive evaluation approach for assessing takeover performance, thereby addressing a critical gap in takeover research within the urban rail transit domain.
(2): This study advances existing methodologies by enabling the controlled simulation of high-risk scenarios that are difficult to reproduce in actual metro operations. While prior studies have utilized real-world bio-signal data from on-duty metro train operators to monitor attention during safety-critical tasks, such approaches are inherently constrained by the unpredictability and non-repeatability of real-world emergency events. In contrast, this study adopts a simulation-based experimental framework that reproduces hazardous situations under systematically controlled conditions. This experimental setup enables precise manipulation of key variables and facilitates the targeted extraction of behavioral and physiological indicators during the critical reaction phase. Such a design not only enhances the reproducibility and flexibility of takeover research but also provides a safer and more ethically sound environment for investigating extreme conditions that would be impractical or unsafe to examine in live operational settings.
(3): This study proposes an evaluation model that integrates a multi-layer perceptron (MLP) with an attention mechanism to capture high-order nonlinear relationships in takeover behavior while enhancing model interpretability. The attention module dynamically assigns importance weights to input features, enabling the model to focus on the most critical factors underlying train operators’ takeover decisions. Furthermore, class weighting and stratified K-fold cross-validation are incorporated to address data imbalance and to enhance model generalization and robustness. This modeling approach provides a novel and effective pathway for quantitatively assessing takeover performance in the context of urban rail transit.

2. Experimental Design

This study investigates train operators’ emergency takeover behavior in GoA2 semi-automated urban rail transit through a simulator-based driving experiment. The participant sample and professional background are described, together with the simulator-based experimental platform and eye-tracking system used for data acquisition. The experimental procedure is specified in terms of task configuration, emergency event design, and data recording, with particular attention to the perception–decision–action loop involved in emergency takeover. The evaluation indicators are also defined to quantitatively assess reaction-phase performance with respect to recognition accuracy, response timeliness, and operational effectiveness.

2.1. Participants

A total of 50 male participants were recruited (mean age = 26.07 years, SD = 2.10). All participants had professional backgrounds in urban rail transit operations, including active metro train operators and professionally trained simulator operators with substantial experience in simulated train control. Although the participants were relatively young, this demographic reflects a considerable proportion of the front-line operator workforce. As the study focuses on recognition and reaction mechanisms during emergency takeover scenarios, this sample is appropriate for behavioral analysis. Before the experiment, all participants completed a self-reported health questionnaire and provided written informed consent.

2.2. Experimental Apparatus

Figure 1 illustrates the experimental setup and the simulator-based driving environment (Figure 1a,b). A rail transit driving simulation system was employed as the experimental platform. To minimize the potential influence of circadian rhythms, all experiments were conducted within a fixed daily time window, from 14:00 to 16:00. During the simulation, participants operated virtual trains along a designated metro line in Shanghai, comprising both elevated and underground segments, under simulated clear-weather conditions. Participants were instructed to operate the train in Automatic Train Operation (ATO) mode during normal running conditions. Upon arrival at each station, participants were required to confirm the signal status and manually perform door opening and closing operations. Throughout the driving task, participants were required to continuously monitor the train’s operational status. In the event of an unexpected incident, participants were expected to promptly take over manual control of the train. Figure 1c shows the wearable eye-tracking instrument used in this study. A Tobii Glasses wearable eye-tracking device was used to record participants’ gaze distribution, fixation duration, and pupil diameter variations during the simulation, enabling analysis of visual attention allocation and event recognition performance.

2.3. Experimental Procedure

Figure 2 summarizes the overall experimental procedure. In this human-in-the-loop GoA2 setting, train operators acted as on-board supervisors who were responsible for continuous monitoring, abnormal-event recognition, and manual takeover execution when emergencies occurred. Each participant completed a 15 min driving task using a high-fidelity metro driving simulator. During the experiment, a predefined unexpected event was randomly triggered at intervals ranging from 5 to 8 min. Each participant experienced three independent emergency scenarios, resulting in a total of 150 recorded emergency event samples across all participants. The emergency scenarios were selected in accordance with actual metro operational regulations and involved situations requiring train operator-initiated speed reduction or emergency braking. According to official operating guidelines, there are 17 categories of emergency events that mandate speed reduction or emergency braking.

In this study, five representative event types were selected for simulation, including scenarios such as signal failures requiring speed restrictions and track obstructions necessitating emergency stopping. These five event types were selected because they (i) represent common and safety-critical abnormal conditions in metro operations, (ii) directly require standardized and clearly observable control actions (e.g., deceleration or emergency braking), and (iii) can be reproduced in a simulator with controlled triggering conditions to ensure experimental safety and repeatability.

Upon each event trigger, the system automatically recorded the event type and the corresponding trigger timestamp. Simultaneously, the eye-tracking system recorded gaze distribution, fixation duration, and pupil dilation data. After the experiment, a retrospective interview based on video playback was conducted to verify each train operator’s subjective perception of the event and the initial response action taken, thereby enabling assessment of event recognition accuracy and takeover effectiveness. Accordingly, recognition accuracy, takeover reaction time, and initial action accuracy were explicitly grounded in train operators’ perception–decision–action loop.

2.4. Evaluation Indicators

This study conceptualizes the reaction phase as consisting of two sequential yet distinct processes: event recognition and control takeover. Event recognition refers to the process by which the train operator detects and interprets the nature of an emergent event, with successful recognition defined by accurate identification of the event type. Control takeover refers to the process whereby the train operator executes appropriate operational actions following event recognition; in this study, particular emphasis is placed on the initial takeover action (e.g., initiating braking or deceleration).

Three evaluation indicators are designed to assess these two processes, as summarized in Table 3. Together, these indicators characterize reaction-phase performance in terms of accuracy (event recognition accuracy and action correctness), timeliness (recognition time and response time), and operational effectiveness (takeover execution quality).

3. Model Development

To fully account for train operators’ incident recognition ability, this study proposes a conditional two-stage evaluation framework, formulated as a classification–scoring hybrid architecture. As illustrated in Figure 3, the first stage of the framework treats incident recognition accuracy as a prerequisite condition: if event identification accuracy equals 0 (i.e., incorrect), the model directly outputs a takeover performance level of “needs improvement” and bypasses the subsequent scoring stage, as misrecognition alone indicates an unsatisfactory response. If the recognition accuracy equals 1 (i.e., correct), the model proceeds to the second stage of evaluation. In the second stage, a regression sub-model based on a multilayer perceptron (MLP) with four hidden layers is employed to quantitatively assess the train operator’s takeover operation performance. The predicted continuous score is subsequently mapped to a categorical takeover performance level. This conditional two-stage evaluation framework coherently integrates the correctness of incident recognition with the quality of takeover operations. By evaluating control actions only after correct incident identification, the framework ensures logical consistency in the evaluation process and enhances both the accuracy and reliability of takeover performance assessment. Importantly, this design also reflects operational safety logic in metro driving, where incorrect event understanding may lead to unsafe responses regardless of response speed or control intensity.
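The conditional logic described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class `TakeoverSample`, the cut-off values `GOOD_THRESHOLD` and `EXCELLENT_THRESHOLD`, and the function `toy_scorer` are hypothetical, since the text does not state the score thresholds, and the trained MLP regressor is replaced here by a simple stand-in.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical cut-offs on a 0-1 score; the source does not state its thresholds.
GOOD_THRESHOLD = 0.6
EXCELLENT_THRESHOLD = 0.85


@dataclass
class TakeoverSample:
    recognized: bool         # stage-1 indicator: event type identified correctly
    reaction_time_s: float   # takeover reaction time in seconds
    action_accuracy: float   # initial action accuracy, assumed to lie in [0, 1]


def score_to_level(score: float) -> str:
    """Map a continuous stage-2 score to a categorical performance level."""
    if score >= EXCELLENT_THRESHOLD:
        return "Excellent"
    if score >= GOOD_THRESHOLD:
        return "Good"
    return "Needs Improvement"


def evaluate(sample: TakeoverSample,
             scorer: Callable[[float, float], float]) -> str:
    """Conditional two-stage evaluation: gate on recognition, then score."""
    if not sample.recognized:
        # Stage 1 failed: bypass the scoring stage entirely.
        return "Needs Improvement"
    score = scorer(sample.reaction_time_s, sample.action_accuracy)
    return score_to_level(score)


def toy_scorer(reaction_time_s: float, action_accuracy: float) -> float:
    """Stand-in for the trained MLP regressor (illustrative only)."""
    timeliness = max(0.0, 1.0 - reaction_time_s / 10.0)
    return 0.5 * timeliness + 0.5 * action_accuracy
```

Passing the scorer as an argument keeps the gate independent of the second-stage model, so the same gate could wrap any regressor that returns a continuous score.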

Model Architecture

In the second stage of the evaluation framework, a multilayer perceptron (MLP) with four hidden layers is constructed to capture the complex nonlinear relationships between multiple influencing factors and emergency takeover performance indicators. The MLP architecture comprises an input layer, multiple hidden layers, and an output layer, with each neuron fully connected to the neurons in the subsequent layer. Non-linear activation functions are applied to enable the model to learn complex feature representations. For clarity, the mapping function of the MLP is defined in Equation (1).

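h = \sigma(W^{(1)} x + b^{(1)}), \quad \hat{y} = W^{(2)} h + b^{(2)} \qquad (1)

where x is the input feature vector, h is the hidden representation, W^{(k)} and b^{(k)} are the weight matrix and bias vector of layer k, \sigma is a nonlinear activation function, and \hat{y} is the model output.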

For an MLP with L hidden layers, the overall mapping can be expressed as a composition of L such transformations, enabling the network to approximate arbitrary continuous functions and to perform high-dimensional nonlinear mappings from input to output [30].
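As a minimal sketch of this composition, the forward pass can be written in plain Python. The helper names `affine`, `relu`, and `mlp_forward` are illustrative, and ReLU is only an assumed example of the activation \sigma, which the text does not name.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights W, biases b)


def affine(W: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W x + b for one dense layer."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def relu(v: Vector) -> Vector:
    """Assumed example activation; the source does not specify sigma."""
    return [max(0.0, a) for a in v]


def mlp_forward(x: Vector,
                hidden_layers: Sequence[Layer],
                output_layer: Layer) -> Vector:
    """Apply each hidden map h = sigma(W h_prev + b), then a linear output layer."""
    h = x
    for W, b in hidden_layers:
        h = relu(affine(W, h, b))
    W_out, b_out = output_layer
    return affine(W_out, h, b_out)
```

With a single entry in `hidden_layers` this reduces to Equation (1); four entries would correspond to the four hidden layers mentioned above.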

Building upon the MLP framework, this study incorporates an attention mechanism to dynamically weight input features. The attention module is described as a sample-wise feature reweighting mechanism that learns a weight distribution over input dimensions (Equations (2) and (3)), thereby enabling feature-level attribution and enhancing model interpretability. The attention weights are normalized using the Softmax function (Equation (2)), and the weighted feature representation is obtained by a weighted aggregation (Equation (3)).

\alpha_i = \frac{\exp(e_i)}{\sum_{j=1}^{n} \exp(e_j)} \qquad (2)

r = \sum_{i=1}^{n} \alpha_i h_i \qquad (3)
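Equations (2) and (3) can be illustrated with a short Python sketch. It assumes that the raw scores e_i are already available (the text does not say how they are produced) and that each h_i is a vector; the function names `softmax` and `attend` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Equation (2): normalise raw scores e_i into attention weights alpha_i."""
    m = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [v / total for v in exps]


def attend(scores: Sequence[float],
           features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Equation (3): r = sum_i alpha_i * h_i, with each h_i a vector."""
    alphas = softmax(scores)
    dim = len(features[0])
    r = [sum(a * h[k] for a, h in zip(alphas, features)) for k in range(dim)]
    return alphas, r
```

Because the weights are non-negative and sum to one, they can be read directly as the relative emphasis placed on each input feature.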

The resulting vector r is propagated to subsequent network layers to generate final predictions. This mechanism reduces the influence of noisy or redundant signals and supports more stable performance under heterogeneous operator behaviors [31]. In addition, the learned attention weights provide feature-level attribution, enabling identification of the most influential factors underlying takeover performance.

To stabilize training and improve generalization, three strategies were incorporated: (i) dropout to mitigate overfitting by randomly deactivating neurons during training; (ii) class weighting to counter class imbalance and reduce majority-class dominance in the loss function; and (iii) stratified K-fold cross-validation to preserve label proportions across folds and obtain variance-reduced performance estimates. Collectively, these measures enhance model robustness under limited data and imbalanced distributions while ensuring rigorous out-of-sample evaluation.

In addition to the RF baseline described below, this study included XGBoost (version 3.0.2), a representative gradient-boosted decision tree method, as a strong baseline for tabular data. XGBoost was evaluated under the same stratified 5-fold protocol for a fair comparison with MLPA.
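Class weighting and stratified K-fold splitting can be sketched as follows. This is an assumption-laden illustration rather than the study’s procedure: the inverse-frequency formula is one common heuristic (the text does not give its weighting scheme), and the function names `class_weights` and `stratified_folds` are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def stratified_folds(labels: Sequence[Hashable],
                     n_splits: int = 5,
                     seed: int = 0) -> List[List[int]]:
    """Return n_splits lists of validation indices with similar label proportions."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    return folds
```

Each returned fold serves once as the validation set while the remaining indices form the training set. When class counts are not divisible by `n_splits`, fold sizes can differ by a few samples.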

Model choice rationale and comparison with baseline models. To strengthen the evaluation of modeling options, we implemented an RF model as a representative non-linear baseline. RF has been widely adopted in operator-state and takeover-related studies due to its robustness to noisy inputs, strong performance in small-sample settings, and capability of modeling nonlinear interactions. Compared with RF, the proposed MLP–attention model supports end-to-end representation learning and sample-wise feature reweighting, providing a flexible mechanism to capture heterogeneous takeover patterns. We acknowledge that other advanced alternatives are also relevant; however, given the structured feature design and limited sample size in this study, we selected RF as a strong and interpretable baseline for a fair comparison, leaving broader benchmarking with additional advanced architectures for future work.

4. Results and Analysis

This section presents a comprehensive analysis of the performance and interpretability of the proposed attention-integrated MLP model for evaluating train operators’ emergency takeover effectiveness in semi-automated urban rail transit scenarios. The analysis is conducted from multiple perspectives, including classification accuracy, generalization performance across cross-validation folds, attention weight distributions, and error patterns in classification results. Through detailed comparison with a baseline RF model, this section highlights the performance advantages and stability of the proposed method while further exploring the model’s internal logic through analysis of attention allocation and misclassification behaviors. Collectively, these analyses provide robust evidence for the feasibility and effectiveness of the proposed model in supporting safety assessment and decision-making in complex operating environments.

4.1. Model Performance Evaluation and Stability

The model demonstrated strong performance across multiple evaluation metrics, including accuracy and F1-score. Notably, the macro-averaged F1-score—reflecting overall performance across multiple classes—indicated the model’s balanced capability in distinguishing among different categories. Compared with the baseline RF model, the proposed approach yielded a higher macro-averaged F1-score, suggesting improved class-wise balance. From an operational safety perspective, a higher macro-F1 is desirable because it decreases the probability of treating minority but safety-critical samples, including low-quality takeovers, as acceptable performance levels. All evaluation metrics were defined based on the confusion-matrix terms, including true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Accuracy, precision, recall, and F1-score were computed as follows:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)

\text{Precision} = \frac{TP}{TP + FP} \qquad (5)

\text{Recall} = \frac{TP}{TP + FN} \qquad (6)

\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2TP}{2TP + FP + FN} \qquad (7)

For multi-class evaluation, precision, recall, and F1-score were computed for each class in a one-vs-rest manner and then macro-averaged across classes to ensure balanced assessment under class imbalance.
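The one-vs-rest macro-averaging described above can be sketched as follows. The function names `per_class_f1` and `macro_f1` are illustrative, and assigning an F1 of 0 to a class with no true or predicted samples is an assumed convention, not one stated in the text.

```python
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable],
                 y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """One-vs-rest F1 per class, using F1 = 2TP / (2TP + FP + FN) from Equation (7)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0  # assumed convention
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

Because every class contributes equally to the mean, a poorly classified minority class lowers macro-F1 even when overall accuracy remains high.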

The proposed attention-integrated MLP model demonstrated stable performance across repeated independent runs, with a macro-averaged F1-score of approximately 94.1% (SD = 2.7%). Under 5-fold cross-validation, the model achieved an average accuracy of 0.860 and a macro-F1 of 0.857, with the best-performing fold (Fold 1) yielding an F1-score of 0.895. Accuracy exceeded 80% across all folds, indicating consistent classification reliability. Importantly, an F1-score of 0.857 suggests that the model can simultaneously maintain both precision and recall when distinguishing between “Needs Improvement”, “Good”, and “Excellent” takeovers. In practical terms, this implies a lower risk of “unsafe acceptance”, i.e., incorrectly labeling low-quality takeovers as higher performance levels, which is particularly important for safety screening and targeted intervention in semi-automated metro operations.

4.2. Analysis of K-Fold Cross-Validation Results

In the 5-fold cross-validation, this study recorded the accuracy and macro-averaged F1-score of the proposed attention-integrated MLP model (MLPA), the Random Forest (RF) baseline, and a gradient-boosted tree baseline (XGBoost) for each fold. The fold-wise results are summarized in Table 4.

Overall, MLPA achieved stable performance across folds (average accuracy = 0.860, macro-F1 = 0.857), demonstrating consistent generalization under heterogeneous validation splits. RF exhibited larger fold-to-fold variance, with the best performance observed in Fold 5 (accuracy = 0.933, macro-F1 = 0.931), suggesting that its performance is sensitive to the sample composition of the validation set.

By contrast, XGBoost achieved consistently high performance across folds (average accuracy ≈ 0.945, macro-F1 ≈ 0.944). This outcome is expected under the current setting, as the second-stage evaluation relies on a low-dimensional tabular feature space (reaction time and operation accuracy) and the target labels are derived from a rule-based scoring procedure. Under such conditions, gradient-boosted decision trees can efficiently capture threshold-like decision patterns.

Nevertheless, the proposed MLPA framework remains beneficial for process-consistent conditional evaluation and for future scalability to richer multimodal features, where attention-based learning can provide more flexible feature reweighting and attribution.

4.3. Attention Weight Distribution Analysis

Since event recognition accuracy is treated as a prerequisite in the first stage, the attention mechanism in the second stage distributes weights only between two features: reaction time and operation accuracy. The resulting attention weights reflect the model’s relative emphasis on these two input features. Overall, the model tends to assign slightly higher attention weights to operation accuracy than to reaction time. Across multiple independent runs, the model consistently converged to an average attention weight distribution of approximately 60% for operation accuracy and 40% for reaction time. This pattern indicates that, under the assumption of correct event recognition, the model considers the precision of operation execution to exert a slightly greater influence on distinguishing takeover performance levels than reaction speed. Specifically, higher operation accuracy increases the likelihood that takeover performance will be classified at a higher level; nevertheless, timely reaction remains a critical factor for successful takeover and therefore still receives substantial attention weight. The relatively close attention weights assigned to the two features suggest that rapid response continues to play an important role in successful takeover. However, when event recognition is correct, the quality of operational execution may become the primary determinant in differentiating takeover performance levels.

Figure 4 illustrates the evolution of attention weights during the training process for a representative cross-validation fold. As shown in the figure, the model initially assigns different attention weights to reaction time and operation accuracy, which gradually stabilize as training progresses. In this fold, the orange curve representing the attention weight for operation accuracy decreases slightly before converging to a dominant level, whereas the blue curve for reaction time increases modestly and subsequently stabilizes. This dynamic adjustment indicates that the model progressively allocates attention to the features that are most informative for classification. Because operation accuracy contributes more strongly to distinguishing takeover performance levels in this fold, the model ultimately assigns greater attention weight to this feature. Meanwhile, the attention weight assigned to reaction time, although slightly lower, remains substantial, reflecting its continued relevance under this condition. Overall, training processes across different folds exhibit a consistent pattern of dynamic adjustment followed by convergence of attention weights, suggesting that the model adapts its feature emphasis to data characteristics and ultimately settles into a stable distribution. This converged attention distribution not only contributes to improved classification performance but also enhances model interpretability, as inspection of attention weights allows intuitive identification of the features considered most critical in evaluating train operator takeover performance.

4.4. SHAP-Based Feature Attribution Analysis

To strengthen interpretability using a standard XAI approach, SHapley Additive exPlanations (SHAP) was applied to quantify feature contributions to the second-stage MLPA predictions. As shown in Figure 5a, global SHAP importance based on mean absolute Shapley values indicates that both reaction time and operation accuracy contribute to takeover-performance assessment, while reaction time plays a dominant role. Specifically, reaction time shows higher overall contribution (mean(|SHAP|) ≈ 0.256) than operation accuracy (mean(|SHAP|) ≈ 0.082), suggesting that faster responses are more critical for predicting the “Excellent” outcome under correct event recognition.
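For a model with only two inputs, Shapley values can be written out exactly, which may help in reading the mean(|SHAP|) figures. The sketch below is illustrative only: `predict_excellent` is a hypothetical logistic stand-in rather than the trained MLPA, the sample points are invented, and a single baseline point replaces the background dataset that a SHAP explainer would normally average over.

```python
import math

def predict_excellent(reaction_time, operation_accuracy):
    """Hypothetical stand-in for the second-stage model: P('Excellent')."""
    logit = 2.0 - 1.2 * reaction_time + 0.8 * operation_accuracy
    return 1.0 / (1.0 + math.exp(-logit))

def shapley_two_features(model, x, baseline):
    """Exact Shapley values for a two-feature model against one baseline point.

    Each value averages the feature's marginal contribution over the two
    orderings in which features switch from baseline to actual value.
    """
    f_bb = model(baseline[0], baseline[1])
    f_xb = model(x[0], baseline[1])
    f_bx = model(baseline[0], x[1])
    f_xx = model(x[0], x[1])
    phi_0 = 0.5 * ((f_xb - f_bb) + (f_xx - f_bx))
    phi_1 = 0.5 * ((f_bx - f_bb) + (f_xx - f_xb))
    return phi_0, phi_1

def mean_abs_shap(model, samples, baseline):
    """Global importance: mean absolute Shapley value per feature."""
    totals = [0.0, 0.0]
    for x in samples:
        phi = shapley_two_features(model, x, baseline)
        totals[0] += abs(phi[0])
        totals[1] += abs(phi[1])
    return [t / len(samples) for t in totals]

# Hypothetical data: (reaction time in seconds, operation accuracy in [0, 1]).
baseline = (3.5, 0.5)
samples = [(2.0, 1.0), (3.0, 1.0), (4.5, 0.0), (5.0, 1.0), (2.5, 0.0)]

phi = shapley_two_features(predict_excellent, samples[0], baseline)
importance = mean_abs_shap(predict_excellent, samples, baseline)
print("per-sample Shapley values:", phi)
print("mean |SHAP| per feature:", importance)
```

The two values for any sample sum to the difference between the model output at that sample and at the baseline, which is the additivity property that gives SHAP its name.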

The dependence plots in Figure 5b,c further confirm consistent feature effects: shorter reaction time yields positive SHAP values and increases the probability of being classified as “Excellent”, whereas delayed reactions reduce it. Operation accuracy shows a smaller but directionally consistent effect, where higher compliance with speed/braking limits increases the predicted likelihood of excellent takeover performance.

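Overall, the SHAP-based attribution and the attention-weight analysis in Section 4.3 jointly support the relevance of these two reaction-phase indicators. The two methods differ in how they rank them, however: the attention weights place slightly more emphasis on operation accuracy, whereas SHAP highlights reaction timeliness as the key determinant of high-level takeover outcomes.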

4.5. Confusion Matrix and Misclassification Analysis

The confusion matrix provides an intuitive visualization of the classification results for each takeover performance level obtained by the MLPA and RF models. As illustrated in Figure 6, most samples are concentrated along the diagonal, indicating a high proportion of correct predictions. Notably, the “Excellent” category achieved a recall of 100%, meaning that every truly excellent takeover instance was correctly identified. Misclassifications primarily occurred between adjacent categories: a small number of samples labeled as “Good” were misclassified as “Excellent,” and some “Needs Improvement” samples were classified as “Good.” This pattern suggests that the model may slightly overestimate performance near decision boundaries, without confusing poor performance with the highest performance level. Because these errors cluster around the “Good” category, precision and recall for this class are slightly lower than for the others. From a safety-management viewpoint, boundary confusions are less critical than extreme confusions (e.g., “Needs Improvement” classified as “Excellent”), and the absence of such extreme errors supports the suitability of the model for operational screening purposes. In contrast, the RF model classified the “Needs Improvement” category without error and performed slightly better than the MLPA model in identifying “Good” cases. Both models achieved perfect prediction for the “Excellent” category. Overall, both models correctly assessed takeover performance in most cases, with only minor deviations near category boundaries.
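The distinction between adjacent and extreme misclassifications can be checked mechanically from a confusion matrix over the three ordinal levels. The following sketch uses invented labels, not the study's predictions, and treats an error as "extreme" when the true and predicted levels differ by more than one step, which is an assumption about how the term is used here.

```python
LEVELS = ["Needs Improvement", "Good", "Excellent"]  # ordinal classes 0, 1, 2

def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]

def error_severity(matrix):
    """Count errors one level apart (adjacent) and further apart (extreme)."""
    adjacent = extreme = 0
    for t, row in enumerate(matrix):
        for p, count in enumerate(row):
            gap = abs(t - p)
            if gap == 1:
                adjacent += count
            elif gap > 1:
                extreme += count
    return adjacent, extreme

# Hypothetical labels for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

matrix = confusion_matrix(y_true, y_pred)
recall = per_class_recall(matrix)
adjacent, extreme = error_severity(matrix)
for name, value in zip(LEVELS, recall):
    print(f"{name}: recall = {value:.2f}")
print("adjacent errors:", adjacent, "extreme errors:", extreme)
```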

4.6. Statistical Testing and Significance Tests

On the independent, subject-wise test set, paired predictions of the MLPA and baseline models were compared using McNemar’s test, which is specifically designed for paired nominal outcomes and is widely used to test whether two classifiers differ significantly in error rates on the same instances [32]. Given the limited number of discordant pairs, we adopted the exact McNemar test to avoid inflated Type I error. In addition, uncertainty in the differences in macro-F1 and overall accuracy (MLPA−RF) was quantified using a subject-stratified bootstrap with B = 2000 resamples; confidence intervals excluding zero were interpreted as statistically significant. Table 5 summarizes the performance measures and the corresponding inferential tests.
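One plausible reading of the subject-stratified bootstrap is a resampling scheme that draws whole subjects with replacement, so that trials from the same operator stay together. The sketch below follows that reading and returns a percentile interval for the accuracy difference; the record layout, the function names, and the example data are assumptions for illustration, and the authors' exact procedure may differ.

```python
import random

def accuracy(correct_flags):
    """Share of correct predictions in a list of 0/1 flags."""
    return sum(correct_flags) / len(correct_flags)

def subject_bootstrap_ci(records, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for accuracy(A) - accuracy(B), resampling whole subjects.

    records maps a subject id to a list of (a_correct, b_correct) pairs,
    one pair per trial, where each flag is 1 for a correct prediction.
    """
    rng = random.Random(seed)
    subjects = list(records)
    diffs = []
    for _ in range(n_resamples):
        # Draw as many subjects as in the original sample, with replacement.
        drawn = [rng.choice(subjects) for _ in subjects]
        a_flags = [a for s in drawn for a, _ in records[s]]
        b_flags = [b for s in drawn for _, b in records[s]]
        diffs.append(accuracy(a_flags) - accuracy(b_flags))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi

# Hypothetical per-trial correctness for two models (A = MLPA, B = RF).
records = {
    "s1": [(1, 1), (1, 0), (0, 1)],
    "s2": [(1, 1), (1, 1)],
    "s3": [(0, 1), (1, 1), (1, 0)],
    "s4": [(1, 1), (0, 0)],
}
lo, hi = subject_bootstrap_ci(records)
print(f"95% CI for accuracy difference: [{lo:.3f}, {hi:.3f}]")
```

An interval that contains zero would be read, as in the text, as no statistically reliable difference between the two models.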

As summarized in Table 5, on the final test set, the two models exhibited comparable overall performance, with accuracies of 0.913 (MLPA) and 0.920 (RF), and macro-F1 scores of 0.896 and 0.902, respectively. The forest plot in Figure 7 visualizes the differences in accuracy and macro-F1 (MLPA–RF) together with 95% bootstrap confidence intervals; in both cases, the intervals straddle zero, indicating no statistically reliable advantage for either model. Consistently, McNemar’s exact test yielded k = 6 and p = 1.000 (b = 6, c = 7), indicating no significant difference on contested instances; the corresponding odds ratio was OR = b/c = 0.857 (95% CI [0.238, 2.979]). To examine class-level effects, Figure 8 reports per-class ΔF1 (MLPA–RF) with 95% confidence intervals; these intervals largely include zero across classes, suggesting no class-specific advantage. Taken together, these complementary analyses corroborate performance parity between the MLPA and RF models on the held-out, subject-wise test set.
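The McNemar figures can be reproduced from the discordant counts alone. The sketch below computes the exact two-sided p-value for b = 6 and c = 7 and then converts two common binomial intervals for b/(b + c) into odds-ratio intervals. Under these assumptions, the Clopper–Pearson interval comes out close to the [0.238, 2.979] quoted in the text, while the Wilson score interval comes out close to the [0.302, 2.431] listed in Table 5, which may explain why the two reported intervals differ. Which method the authors used in each place is not stated, so this is an inference rather than a confirmed account.

```python
import math

def exact_mcnemar_p(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n, k = b + c, min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

def bisect_root(func, lo=0.0, hi=1.0, iters=60):
    """Root of a function that increases from negative to positive on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson_interval(successes, n, alpha=0.05):
    """Exact binomial interval found by bisection on the tail probabilities."""
    lower = 0.0 if successes == 0 else bisect_root(
        lambda p: (1 - binom_cdf(successes - 1, n, p)) - alpha / 2)
    upper = 1.0 if successes == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(successes, n, p))
    return lower, upper

def to_odds(p):
    return p / (1 - p)

b, c = 6, 7  # discordant pairs reported in the text and in Table 5
p_value = exact_mcnemar_p(b, c)
odds_ratio = b / c
exact_or = tuple(to_odds(p) for p in clopper_pearson_interval(b, b + c))
wilson_or = tuple(to_odds(p) for p in wilson_interval(b, b + c))

print(f"exact McNemar p = {p_value:.4f}, OR = {odds_ratio:.3f}")
print("Clopper-Pearson OR interval:", exact_or)
print("Wilson OR interval:", wilson_or)
```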

In this study, the RF model marginally outperformed the attention-integrated MLP model on the final test set, although the difference was not statistically significant. A plausible explanation lies in the low-dimensional feature space (two input variables) and the modest sample size, which constrain the expressive advantages of deep neural networks while favoring tree-based ensembles that can rapidly fit decision boundaries; prior studies similarly report that RF models often excel under limited-data conditions, whereas neural networks typically require larger datasets to achieve comparable performance [33]. In addition, because the target labels were generated from rule-based criteria, decision trees are able to capture threshold-like decision rules directly, whereas neural networks tend to approximate smoother functional mappings. Under conditions characterized by few features, limited data, and rule-based labels, a small numerical advantage for RF is therefore unsurprising.
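The point about threshold-like rules can be illustrated with a single-split decision stump. In the sketch below, the labelling rule and its 3.0 s cutoff are hypothetical and are not the scoring criteria used in the study; the example only shows that a tree split can recover a rule-based boundary exactly from a handful of samples.

```python
def rule_label(reaction_time, cutoff=3.0):
    """Hypothetical rule-based label: 1 if the reaction is fast enough, else 0."""
    return 1 if reaction_time <= cutoff else 0

def fit_stump(values, labels):
    """Pick the split threshold that maximizes training accuracy.

    Candidates are midpoints between consecutive sorted feature values;
    the stump predicts 1 at or below the threshold and 0 above it.
    """
    ordered = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best_threshold, best_accuracy = None, -1.0
    for threshold in candidates:
        hits = sum((1 if v <= threshold else 0) == y
                   for v, y in zip(values, labels))
        acc = hits / len(values)
        if acc > best_accuracy:
            best_threshold, best_accuracy = threshold, acc
    return best_threshold, best_accuracy

# Hypothetical reaction times in seconds, labelled by the rule above.
reaction_times = [1.2, 1.9, 2.4, 2.8, 3.3, 3.9, 4.6, 5.1]
labels = [rule_label(t) for t in reaction_times]

threshold, train_accuracy = fit_stump(reaction_times, labels)
print(f"learned threshold = {threshold:.2f} s, accuracy = {train_accuracy:.2f}")
```

A smooth model such as an MLP has to approximate the same step with a steep but continuous transition, which typically takes more data to pin down.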

Although the RF model achieved slightly higher predictive performance in this specific task, the attention-integrated MLP model demonstrated strong classification capability in terms of overall performance trends. As feature dimensionality and dataset size increase, the structural advantages of neural networks and the benefits of attention mechanisms are expected to become more pronounced, potentially leading to further improvements in predictive performance for train operator takeover safety assessment.

5. Conclusions

This study proposes a two-stage evaluation framework based on a multilayer perceptron (MLP) integrated with an attention mechanism to assess train operator emergency takeover performance in semi-automated urban rail transit environments. The main findings are summarized as follows.

(1): The proposed framework achieves stable subject-wise performance, with an average accuracy of 0.86 and a macro-F1 of 0.857, demonstrating robust generalization across participants.
(2): Feature attribution analyses based on attention weighting and SHAP consistently indicate that, after correct event recognition, takeover performance is mainly determined by reaction time and operational accuracy, highlighting the key determinants of reaction-phase effectiveness.
(3): Error analysis shows that misclassifications mainly occur between adjacent levels, and no extreme mislabeling is observed, supporting the reliability of the proposed evaluation for safety-related screening.

This study has several limitations. The sample size was relatively modest, which may have limited statistical power and the generalizability of the proposed model. The experimental data were collected in a simulation setting, and the applicability of the findings to real-world metro operations warrants further validation. Although RF and XGBoost were included as baseline models, comparisons with additional advanced methods, such as temporal deep learning architectures and transformer-based models, remain to be explored. Future studies should consider integrating richer physiological and behavioral features and conducting validation on real operational platforms to further enhance the model’s generalizability and practical relevance.

Author Contributions

The authors contributed to the paper as follows: study conception and design: H.J., Y.H.; data collection: H.J., Y.H., F.W., L.Z., Z.L.; analysis and interpretation of results: H.J., Y.H., F.W.; draft manuscript preparation: H.J., Y.H., F.W. All authors have read and agreed to the published version of the manuscript.

Funding

This project was supported by the National Natural Science Foundation of China (Grant No. 52302438), via the “Research on behavior pattern and human error of emergency response of multi-functional personnel for fully automatic metro” project, and support was also received from the Shanghai Key Laboratory of Urban Regeneration and Spatial Optimization Technology.

Institutional Review Board Statement

The need for ethical review and approval was waived for this study in accordance with the national regulation “Ethical Review Measures for Life Science and Medical Research Involving Human Subjects” (National Health Commission of the People’s Republic of China; Ministry of Education; Ministry of Science and Technology; National Administration of Traditional Chinese Medicine, February 2023), as the research involved a non-interventional, simulation-based experiment with adult participants, anonymized data collection, and no sensitive personal information.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are not publicly available due to privacy and ethical restrictions involving human participants. Data are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank the technical staff who provided support for the rail transit driving simulation platform and assisted with the experimental setup and data collection. No AI models or LLMs were used in this paper.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

MLPA	Multilayer Perceptron Attention
RF	Random Forest
SVM	Support Vector Machine
LSTM	Long Short-Term Memory
BLR	Binary Logistic Regression
SEM	Structural Equation Modeling
CNN	Convolutional Neural Network

References

1. GB/T 32590.1—2024; Urban Rail Transit Transportation Management and Command/Control System—Part 1: System Principles and Basic Concepts. State Administration for Market Regulation (SAMR): Beijing, China; Standardization Administration of China (SAC): Beijing, China; China Standards Press: Beijing, China, 2024. Available online: https://openstd.samr.gov.cn/bzgk/gb/index (accessed on 29 January 2026).
2. Ministry of Transport of the People’s Republic of China. Emergency Response Plan for Urban Rail Transit Operation Incidents. Beijing, China, 2015. Available online: https://xxgk.mot.gov.cn/2020/xzgfxwj/202408/t20240807_4147491.html (accessed on 29 January 2026).
3. Ministry of Transport of the People’s Republic of China. Notice on Issuing the Technical Specification for Fully Automated Urban Rail Transit Operation System (Trial) (Jiao Yun Gui [2023] No. 1). Beijing, China, 2023. Available online: https://www.gov.cn/zhengce/zhengceku/202501/content_6996681.htm (accessed on 29 January 2026).
4. Brandenburger, N.; Naumann, A. On Track: A Series of Research about the Effects of Increasing Railway Automation on the Train Driver. IFAC Pap. 2019, 52, 288–293.
5. Ma, N. Emergency Handling of Fully Automated Urban Rail Transit Lines under Unattended Operation Mode. Urban Rail Transit Res. 2021, 24, 82–84.
6. China Illuminating Engineering Society. Survey on Distraction and Takeover Modes in Autonomous Driving—B [Based on Multi-modal TOR Stimuli] [R/OL]. 20 May 2024. Available online: https://www.ifal-forum.com/nd.jsp?id=820 (accessed on 25 June 2025).
7. Ma, S.; Zhang, W.; Shi, J.; Yang, Z. The human factors of the take-over process in conditional automated driving based on cognitive mechanism. Adv. Psychol. Sci. 2020, 28, 150–160.
8. Ito, T.; Takata, A.; Oosawa, K. Time Required for Take-Over From Automated to Manual Driving (No.2016-01-0158); SAE Technical Paper Series; SAE International: Warrendale, PA, USA, 2016.
9. Wu, H.; Zhou, X.; Lyu, N.; Wang, Y.; Xu, L.; Yang, Z. A Review of Methods for Predicting Driver Take-Over Time in Conditionally Automated Driving. Sensors 2025, 25, 6931.
10. Hurwitz, D.S.; Heaslip, K.P.; Schrock, S.D.; Swake, J.; Marnell, P.; Tuss, H.; Fitzsimmons, E. Implications of distracted driving on start-up lost time for dual left-turn lanes. J. Transp. Eng. 2013, 139, 923–930.
11. Zhai, J.; Lu, G.; Chen, F. The effects of extra cognitive workload on drivers’ driving and takeover performance. In Proceedings of the International Conference on Transportation and Development 2020 (ICTD 2020); American Society of Civil Engineers: Reston, VA, USA, 2020.
12. Ma, Y.; Dong, F.; Qin, Q.; Guo, Y. Risk evaluation model of autonomous driving takeover based on driving risk field. J. Harbin Inst. Technol. 2024, 56, 106–112.
13. Jiao, Y.; Tan, Y.; Zhang, X.; Sun, Z.; Fu, L.; Wen, C.; Jiang, C. Label-Less Learning for Urban Railway Transit Driver Fatigue Detection with Heart Rate Variability. Transp. Res. Rec. J. Transp. Res. Board 2023, 2677, 11–23.
14. Liu, H.; Zhou, Y.; Jiang, C. Classifying metro drivers’ cognitive distractions during manual operations using machine learning and random forest-recursive feature elimination. Sci. Rep. 2025, 15, 7564.
15. Wright, T.J.; Samuel, S.; Borowsky, A.; Zilberstein, S.; Fisher, D.L. Experienced Drivers Are Quicker to Achieve Situation Awareness Than Inexperienced Drivers in Situations of Transfer of Control Within a Level 3 Autonomous Environment. In Proceedings of the 60th Annual Meeting of the Human Factors and Ergonomics Society, Washington, DC, USA, 19–23 September 2016; pp. 270–273.
16. Soares, S.; Lobo, A.; Ferreira, S.; Cunha, L.; Couto, A. Takeover performance evaluation using driving simulation: A systematic review and meta-analysis. Eur. Transp. Res. Rev. 2021, 13, 47.
17. Jin, M.; Lu, G.; Chen, F.; Shi, X.; Tan, H.; Zhai, J. Modeling takeover behavior in level 3 automated driving via a structural equation model: Considering the mediating role of trust. Accid. Anal. Prev. 2021, 157, 106156.
18. Du, N.; Zhou, F.; Pulver, E.M.; Tilbury, D.M.; Robert, L.P.; Pradhan, A.K.; Yang, X.J. Predicting driver takeover performance in conditionally automated driving. Accid. Anal. Prev. 2020, 148, 105748.
19. Al-Mahbashi, M.; Li, G.; Peng, Y.; Al-Soswa, M.; Debsi, A. Real-Time Distracted Driving Detection Based on GM-YOLOv8 on Embedded Systems. J. Transp. Eng. Part A Syst. 2025, 151, 04024126.
20. Li, P.; Abdel-Aty, M. Real-Time Crash Likelihood Prediction Using Temporal Attention–Based Deep Learning and Trajectory Fusion. J. Transp. Eng. Part A Syst. 2022, 148, 04022043.
21. Sun, Z.; Liu, H.; Jiao, Y.; Zhang, C.; Xu, F.; Jiang, C.; Yu, X.; Wu, G. Machine learning noise exposure detection of rail transit drivers using heart rate variability. Transp. Saf. Environ. 2024, 6, tdad028.
22. Chen, L.; Li, D.; Wang, T.; Chen, J.; Yuan, Q. Driver Takeover Performance Prediction Based on LSTM–BiLSTM–Attention Model. Systems 2025, 13, 46.
23. Wang, L.; Yao, W.; Chen, C.; Yang, H. Driving Behavior Recognition Algorithm Combining Attention Mechanism and Lightweight Network. Entropy 2022, 24, 984.
24. Wang, C.; Li, Z.; Zhao, X.; Sun, Q.; Fu, R.; Guo, Y.; Yuan, W. Review of Image and Deep Learning Based Algorithms for Driver State Monitoring. China J. Highw. Transp. 2025, 38, 324–347.
25. Girma, A.; Amsalu, S.; Workineh, A.; Khan, M.; Homaifar, A. Deep Learning with Attention Mechanism for Predicting Driver Intention at Intersection. In Proceedings of the IEEE Intelligent Vehicles Symposium, Las Vegas, NV, USA, 19 October–13 November 2020; pp. 1183–1188.
26. Ma, Y.; Lu, J.; Zhu, J.; Han, X. Take-over Performance Prediction Under Different Cognitive Loads of Non-driving Tasks in Highly Automated Driving. Automot. Eng. 2023, 45, 2330–2337.
27. Yao, R.; Xu, W.; Guo, W. Recognition of Driver Takeover Behavior and Intention Based on Factorized Long Short-Term Memory. J. Jilin Univ. Eng. Technol. Ed. 2023, 53, 758–771.
28. Li, Z.; Dong, A.; Zhao, X.; Yang, L.; Duan, X.; Chen, L.; Wang, J.; Zhang, H.; Huang, P.; Lu, M.; et al. Evaluation and Classification of Lane Change Trajectories in L3 Autonomous Driving Under Accident Takeover Scenarios. Sci. Technol. Eng. 2022, 22, 8930–8937.
29. Kim, J.H.; Kim, Y.; Cho, Y.; Kim, T.K.; Jang, T.; Park, C.; Kang, S.K. Biosignal-based attention monitoring for evaluating train driver safety-relevant tasks. Transp. Res. Part F Traffic Psychol. Behav. 2025, 111, 1–13.
30. Wang, Y. Analysis and Prediction of Battery Capacity Degradation Causes Based on User Behavior Feature Labels Driven by Data. Model. Simul. 2024, 13, 6355–6364.
31. Zhang, B.; Jiang, Z.; Li, J. Multi-Stream Behavior Recognition Network Fused with Multi-Modal Features. Comput. Sci. Appl. 2021, 11, 451–460.
32. McNemar, Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 1947, 12, 153–157.
33. Roßbach, P. Neural Networks vs. Random Forests—Does It Always Have to Be Deep Learning? Research Report; Frankfurt School of Finance & Management: Frankfurt, Germany, 2018.

Figure 1. Experimental equipment.

Figure 2. Experimental design flowchart (GoA = Grade of Automation; ATP = Automatic Train Protection).

Figure 3. Conditional two-stage framework (classification–regression hybrid architecture). Abbreviations: MLP = Multilayer Perceptron; ReLU = Rectified Linear Unit; RT = Reaction Time; IA = Initial Action Accuracy.

Figure 4. Evolution of attention weights during model training across different cross-validation folds.

Figure 5. SHAP-based feature attribution for the second-stage MLPA model.

Figure 6. Confusion matrix comparison between the MLP with attention mechanism model and the RF model.

Figure 7. Overall differences in accuracy and macro-F1 (MLPA–RF) with 95% subject-stratified bootstrap CIs. The vertical dashed line denotes no difference.

Figure 8. Per-class ΔF1 (MLPA–RF) with 95% subject-stratified bootstrap CIs (class 0 = Needs Improvement; class 1 = Good; class 2 = Excellent).

Table 1. Grades of automation (GoA) and functional differentiation in urban rail transit under GB/T 32590.1 [1].

Grade of Automation	Type of Train Operation	Setting the Train in Motion	Stopping Train	Door Closure	Operation in the Event of Disruption
GoA1	ATP with driver	Driver	Driver	Driver	Driver
GoA2	ATP and ATO with driver	Automatic	Automatic	Driver	Driver
GoA3	Driverless	Automatic	Automatic	Train attendant	Train attendant
GoA4	UTO (Unattended)	Automatic	Automatic	Automatic	Automatic

Table 2. Comparison between related studies and this study.

Reference	Driving Scenario: Area	Driving Scenario: Automation Level	Emergency Event Types: Taxonomy	Emergency Event Types: Experimental Control	Emergency Event Types: Takeover Level	Evaluation Methods	Interpretability
Ma et al. [26]	Road	L3	−	√	1	RF	M
Yao et al. [27]	Road	L3	Multi	−	1	LSTM-SVM	−
Li et al. [28]	Road	L3	Multi	√	1	SVM	−
Du et al. [18]	Road	L3	−	√	1	RF	M
Jin et al. [17]	Road	L3	Multi	√	1	SEM	H
Wang et al. [23]	Road	L3	Multi	−	1	CNN	M
Girma et al. [25]	Road	L3	−	√	1	LSTM	−
Kim et al. [29]	Rail	GoA2	Multi	−	2	LSTM	−
Jiao et al. [13]	Rail	GoA2	−	√	3	RF	M
Liu et al. [14]	Rail	GoA2	−	−	3	RF-RFE	M
Sun et al. [21]	Rail	GoA2	−	√	2	RF	M
This study	Rail	GoA2	Multi	√	3	MLPA-RF	H

Driving Scenario: (a) Area: road traffic (Road); rail transit (Rail). (b) Automation Level: L3 follows SAE J3016; GoA2/GoA3 follow GB/T 32590.1 [1]. Emergency Event Types: (a) Taxonomy: one scenario (−); multiple scenarios (Multi). (b) Experimental Control: high experimental control (√); low–medium control (−). (c) Takeover Level: direct alerts (1); indirect alerts (2); no explicit alert (3). Evaluation Methods: Multilayer Perceptron Attention (MLPA); Random Forest (RF); Support Vector Machine (SVM); Long Short-Term Memory (LSTM); Binary Logistic Regression (BLR); Structural Equation Modeling (SEM); Convolutional Neural Network (CNN). Interpretability: intrinsically interpretable or with model-internal attribution (H); partly interpretable via post hoc tools (M); mainly black-box requiring local approximations (−).

Table 3. Evaluation indicators for takeover performance.

Indicators	Definition	Remarks
Event Identification Accuracy	Whether the train operator correctly identifies the type of the unexpected event.	Binary outcome (correct/incorrect).
Takeover Reaction Time	The time interval between the onset of the event and the train operator’s initiation of the first takeover action.	Measured in seconds based on the timestamps from the eye tracker.
Initial Action Accuracy	The effectiveness of the train operator’s initial takeover action.	For speed-restriction events, it is additionally verified whether the train speed is reduced to within the prescribed limit.

Table 4. Performance comparison of models under 5-fold cross-validation.

Fold Index	Accuracy (MLPA)	F1-Score (MLPA)	Accuracy (RF)	F1-Score (RF)	Accuracy (XGBoost)	F1-Score (XGBoost)
Fold 1	0.900	0.895	0.733	0.726	0.960	0.959
Fold 2	0.867	0.862	0.833	0.821	0.953	0.952
Fold 3	0.867	0.869	0.800	0.789	0.940	0.938
Fold 4	0.867	0.865	0.767	0.754	0.927	0.924
Fold 5	0.800	0.792	0.933	0.931	0.947	0.945
Average	0.860	0.857	0.813	0.804	0.945	0.944

Table 5. Significance summary.

Section	Measure	MLPA	RF	Difference (MLPA–RF)	Inference
Overall performance	Accuracy	0.913	0.920	—	—
Overall performance	Macro-F1	0.896	0.902	—	—
Paired significance (McNemar)	Discordant pairs	—	—	b = 6, c = 7 (b + c = 13)	Exact McNemar: k = 6, p = 1.0000
Paired significance (McNemar)	Odds ratio (b/c)	—	—	0.857	95% CI [0.302, 2.431]
Effect size and uncertainty (95% CIs)	ΔAccuracy	—	—	−0.006	95% CI [−0.053, 0.040]
Effect size and uncertainty (95% CIs)	ΔMacro-F1	—	—	−0.003	95% CI [−0.055, 0.048]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ji, H.; Huang, Y.; Wang, F.; Zhu, L.; Liu, Z. Emergency Takeover Performance Evaluation of Train Operators in Semi-Automated Urban Rail Transit: An Attention-Enhanced MLP Approach. Appl. Sci. 2026, 16, 1820. https://doi.org/10.3390/app16041820
