Article

Prediction of Mental Fatigue for Control Room Operators: Innovative Data Processing and Multi-Model Evaluation

Department of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310000, China
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(17), 2794; https://doi.org/10.3390/math13172794
Submission received: 18 July 2025 / Revised: 22 August 2025 / Accepted: 29 August 2025 / Published: 30 August 2025

Abstract

When control room operators experience mental fatigue, the accuracy of their work declines. Accurately predicting the mental fatigue of industrial control room operators is therefore of great significance for preventing operational mistakes. In this study, facial data of experimental participants were collected via cameras, and fatigue levels were evaluated using an improved Karolinska Sleepiness Scale (KSS). Subsequently, a dataset of fatigue samples based on facial features was established. A novel early-warning framework was put forward, framing fatigue prediction as a time series prediction task. Two innovative data processing techniques were introduced. Reverse data binning transforms discrete fatigue labels into continuous values through random perturbations within ±0.3, enabling precise temporal modeling. A fatigue-aware data screening method uses the 6 s rule and a sliding window to filter out transient states and preserve key transition patterns. Five prediction models, namely Light Gradient Boosting Machine (LightGBM), Gated Recurrent Unit (GRU), Temporal Convolutional Network (TCN), Transformer, and Attention-based Temporal Convolutional Network (Attention-based TCN), were evaluated using the collected dataset of fatigue samples based on facial features. The results indicated that LightGBM demonstrated outstanding performance, with an accuracy rate reaching 93.33% and a mean absolute error of 0.067, significantly outperforming the deep learning models. Moreover, its computational efficiency further verified its suitability for real-time deployment. This research integrates predictive modeling with industrial safety applications, providing evidence for the feasibility of machine learning in proactive fatigue management.

1. Introduction

Mental fatigue poses a serious threat to the operational safety of industrial control rooms, especially for operators in the chemical industry who are subjected to high-intensity monitoring tasks for long periods of time [1,2,3]. Existing research mainly focuses on real-time fatigue detection, yet the ability to predict the progression of fatigue remains an underexplored area. Unlike static detection, fatigue prediction requires modeling the process of its evolution over time, which is accompanied by subtle physiological changes and behavioral shifts that accumulate over time [4,5]. This gap is particularly evident in control rooms, where operators face unique challenges: repetitive tasks, high cognitive workloads, and prolonged screen exposure, all of which exacerbate the development of fatigue.
Traditional fatigue assessment methods struggle to meet the predictive requirements of industrial settings [6]. Subjective scales lack real-time validity and are susceptible to bias [7]. In contrast, objective biological measurements, such as electroencephalogram (EEG), necessitate invasive sensors, which can interfere with the workflow [8]. Even though computer vision-based detection provides non-invasive monitoring, it frequently fails to account for the dynamic nature of fatigue [9]. Isolated analysis of facial features is unable to capture the continuous patterns of escalating fatigue, thereby restricting its utility in proactive intervention. These limitations underscore the urgent need for a predictive framework that harnesses time-series modeling to forecast fatigue states, enabling timely measures to be implemented before they impact operational safety [10,11].
The progression of fatigue is a complex temporal phenomenon. Subtle indicators, such as the frequency of gradual eye closures or variations in head posture, exhibit non-linear evolution [12]. Conventional models typically treat each observation in isolation, neglecting the long-term dependencies inherent in the data [13,14]. Additionally, the scarcity of longitudinal fatigue datasets and the ambiguity of the transient “intermediate state” between wakefulness and fatigue pose challenges to model training [15,16]. To surmount these obstacles, innovative data-processing techniques are required to transform discrete fatigue labels into continuous temporal targets. Simultaneously, architectures capable of extracting temporal features from noisy and sparse measurements are essential [17,18,19].
This paper proposes a new framework for predicting the mental fatigue of control room operators. This framework utilizes computer vision methods to collect facial data and combines an improved Karolinska Sleepiness Scale (KSS) [20] to assess the degree of fatigue, thereby constructing a fatigue sample dataset based on facial features. As a widely validated and standardized subjective assessment tool for measuring sleepiness and mental fatigue, KSS has the characteristics of a simple scoring range and high sensitivity to subtle changes in alertness, which aligns with the need to capture the dynamic mental fatigue state of control room operators. Moreover, its extensive application in human factors research ensures comparability with existing studies, making it a suitable choice for quantifying the target variable of this research (the degree of mental fatigue). The study defines fatigue prediction as a time series regression task. After innovative data processing, five models—LightGBM, GRU, TCN, Transformer, and Attention-based TCN—were evaluated. The results demonstrated the advantages of LightGBM in terms of high accuracy and low mean absolute error, and variance analysis was conducted for verification.
Mental fatigue progression is inherently individualized, with marked inter-individual differences in response patterns to prolonged cognitive demands—encompassing variations in baseline alertness, task engagement dynamics, and physiological manifestations such as facial movement characteristics. To address this variability, our framework incorporates a personalized approach to time series data processing, tailored to each participant. Specifically, continuous facial feature data, collected at 2 s intervals throughout the 30 min experimental task, is structured into individualized time series for each operator. These time series are segmented using seven-step sliding windows (corresponding to 15 s) to capture unique temporal trajectories of fatigue development, such as individual-specific rates of eye closure frequency or head posture adjustments. Concurrently, discrete fatigue labels derived from the modified KSS are converted into continuous time series targets via individual-specific reverse data binning, with random perturbations within ±0.3 applied independently to each participant’s labels. This ensures that the temporal dynamics of the target variables align with their actual fatigue progression. Furthermore, all predictive models are trained and validated on each individual’s time series data separately, enabling the framework to account for personalized patterns and enhancing the ecological validity of predictions in control room contexts.
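The per-participant windowing described above can be sketched as follows. This is a minimal illustration with synthetic data; the function name, array shapes, and feature values are assumptions for demonstration, not the study's released code. Each seven-step window of 2 s samples is paired with the label at the following step:

```python
import numpy as np

def make_windows(features, labels, window=7):
    """Segment one operator's series (sampled every 2 s) into sliding
    windows: each `window`-step history predicts the next-step label."""
    X, y = [], []
    for t in range(len(features) - window):
        X.append(features[t:t + window])  # past observations
        y.append(labels[t + window])      # next-step fatigue target
    return np.stack(X), np.array(y)

# Toy example: 20 samples of 3 hypothetical facial features for one operator
rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 3))
labs = np.linspace(1.0, 5.0, 20)
X, y = make_windows(feats, labs)
print(X.shape, y.shape)  # (13, 7, 3) (13,)
```

Because the framework trains a separate model per participant, this segmentation would be run independently on each operator's time series.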
The core contributions of this study are as follows: Firstly, two innovative data processing techniques are proposed: reverse data binning, which converts discrete fatigue labels from the improved KSS into continuous values through random perturbations (within ±0.3) to enable precise temporal modeling; and a fatigue-aware data screening method based on the 6 s rule and sliding window, which filters out transient states and preserves key transition patterns to optimize the dataset for time-series prediction. Secondly, by defining mental fatigue prediction as a time-series regression task and integrating the aforementioned data processing techniques, a novel prediction framework is constructed, making it suitable for industrial control room scenarios. Thirdly, five prediction models (LightGBM, GRU, TCN, Transformer, and Attention-based TCN) are evaluated using the constructed facial feature dataset. The results indicate that LightGBM performs excellently, with an accuracy of 93.33% and a mean absolute error of 0.067. Moreover, this study not only compares the performance of multiple models but also validates the significance of differences through statistical tests (e.g., analysis of variance) to ensure the scientific rigor of selecting the optimal model. Finally, this research connects predictive modeling with industrial safety applications, providing empirical evidence for the feasibility of machine learning in proactive fatigue management and offering practical support for preventing operational errors caused by mental fatigue.

2. Literature on the Prediction of Mental Fatigue

Mental fatigue detection technology has significant application value in assessing human mental fatigue states and has been widely deployed in critical fields such as industrial safety [21] and transportation [22]. Mental fatigue refers to the sensations that people may experience during or after prolonged cognitive activities. These sensations are very common in daily life and typically involve feelings of tiredness or even exhaustion, aversion to continuing the current activity, and a decrease in the level of engagement with the task at hand [23,24]. Prediction methods have shown significant practicality in warning humans of fatigue and are widely applied across industries, including to control room operators. Most of the current research on mental fatigue prediction focuses on detection rather than prediction [25]. Although some studies use "prediction" in their titles, their main content is to classify and detect data using methods such as neural networks or machine learning, rather than forecasting the subject's mental fatigue state over a future period.
Most of the predictions of mental fatigue are based on the fusion of physiological data. There have been many studies in this area, including the use of electroencephalogram data [26,27,28], eye movement data [29,30,31], electrocardiogram and heart rate data [32,33], and multimodal data [34,35,36], among others. Zhou et al. [37] conducted pioneering research in the field of driving fatigue prediction. By integrating multiple physiological features and employing a non-linear autoregressive exogenous (NARX) model, they provided a breakthrough solution for fatigue prediction in highly automated driving environments. This study innovatively utilized key physiological indicators such as heart rate variability and respiratory rate, combined with PERCLOS as the fatigue judgment criterion, and successfully constructed a fatigue state transition prediction model.
In the field of mental fatigue prediction, methods such as neural networks and machine learning are widely adopted. The process involves preprocessing and training data to generate predictive outcomes. Among the techniques used are classification models including Random Forest (RF) [38], Decision Tree (DT) [39], and Support Vector Machine (SVM) [40]. For example, researchers like Ma Yongqiang [41] have achieved significant progress in the area of urban driving fatigue monitoring. The model they constructed innovatively incorporates multiple dimensions, such as driving time, driving speed, and driving area. This enables the real-time identification of fatigue driving behaviors within urban environments. Significantly, the research team employed federated learning methods, which effectively protect the privacy of user data. Based on this, a predictive model for fatigue driving areas was established.
There are also time-domain models such as Transformer [42,43], Gated Recurrent Unit (GRU) [44], and Temporal Convolutional Networks (TCN) [45]. For instance, Pan et al. [46] designed a miner fatigue state recognition system grounded in a multimodal feature extraction and fusion framework. Physiological and facial data were collected via temperature sensors and cameras. After preprocessing, physiological features were extracted through time-frequency domain analysis. Facial features were retrieved using ResNeXt-50 and GRU. Subsequently, multiple features were fused using Transformer, ultimately enabling the recognition of fatigue states. The integration of physiological data with machine learning and reinforcement learning for mental fatigue prediction has exhibited remarkable flexibility and adaptability, thereby demonstrating great potential in academic research and practical applications.
Overall, existing research on mental fatigue prediction still faces three key limitations. Firstly, in terms of data acquisition, most studies rely on invasive physiological sensors (such as EEG caps and ECG electrodes) or complex multi-modal equipment, which not only interfere with the natural working state of operators but also have poor adaptability to industrial control room environments with frequent movements and electromagnetic interference. Secondly, in terms of technical approaches, the majority of studies focus on static state detection rather than dynamic trend prediction. Even those involving prediction often treat fatigue labels as discrete categories, failing to capture the continuous and gradual nature of fatigue evolution, which restricts the accuracy of early warning. Thirdly, in terms of model applicability, deep learning models such as Transformer and TCN, which are widely used in time series tasks, often require large-scale labeled data and high computational resources, making it difficult to meet the real-time deployment requirements of industrial scenarios. These limitations highlight the need for a more practical and targeted solution to realize effective mental fatigue prediction for control room operators.

3. Materials and Methods

3.1. Research Preamble

Current research on mental fatigue warning has certain limitations. In terms of data collection, traditional methods rely heavily on body-worn sensors such as electrocardiogram (ECG) and electroencephalogram (EEG) devices to collect physiological signals. These devices pose considerable challenges in practical applications. First, wearing the sensors can itself disturb the subject: attaching electrode patches or wearing an EEG cap causes a sense of restraint or discomfort, which inevitably affects the subject's normal working state and concentration and cannot truly reflect natural fatigue in daily work. Second, the devices themselves are prone to interference; wire movement during activity, poor skin contact, or ambient electromagnetic signals can all introduce noise [47]. This noise not only pollutes the collected physiological data, increasing the difficulty and error of subsequent analysis, but may also annoy or discomfort the subjects, exerting additional negative effects on the experimental results. Traditional methods therefore struggle to meet the demand for continuous, natural monitoring in a realistic, undisturbed working environment. To address these issues, this study adopts a different approach: it forgoes body-contact sensors and instead uses computer vision technology to capture the subjects' facial data through ordinary cameras. The greatest advantage of this method is that subjects need not wear any additional equipment, reducing the possibility of disturbance at work. It also avoids the noise interference caused by physical sensor contact, opening new possibilities for studying mental fatigue in more natural and realistic working scenarios.
In terms of data processing, to enhance the applicability of data for mental fatigue prediction, this study puts forward two innovative data-processing approaches. First, the reverse data binning technique is employed to process discrete integer fatigue labels. By introducing minute random perturbations, this approach transforms the originally discontinuous discrete labels into smooth continuous values. Such continuous-type processing is more consistent with the natural characteristic of the gradual change in fatigue, and it also significantly improves the adaptability of the data for time-series prediction models. Second, a new fatigue data screening method is proposed to optimize the training data used for prediction. This method aims to identify and retain data segments that can best represent the stable and progressive fatigue evolution process. Meanwhile, it filters out “noisy” data points that may be affected by short-term disturbances or are in atypical states, thus enhancing the quality and representativeness of the dataset.
In terms of data usage, this study places particular emphasis on the time-series characteristics of mental fatigue data. Different from fatigue detection [48], which merely determines the current state, the core of the prediction task is to infer future fatigue trends by making use of historical data. Therefore, it is of great significance to fully explore the time-dependent relationships in the data, and specialized time-series prediction models are required.
In terms of prediction analysis, considering the remarkable differences in individuals’ responses to fatigue, we construct (or adjust) a separate prediction model for each participant. The model uses the participant’s own data to predict their future fatigue state. After obtaining the prediction results of all individuals, a comprehensive analysis is carried out to draw conclusions at the group level.
In this study, two innovative data processing approaches are proposed to make the data better suited to the time series requirements of mental fatigue prediction. By treating mental fatigue prediction as a time series prediction task, a prediction method based on machine learning and deep learning model architectures is proposed. Multiple time series prediction models (including an innovative model) are applied and compared to identify those with advantages. The proposed method is highly adaptable and can be applied in a wide range of scenarios.

3.2. Overall Framework

In this study, we propose a mental fatigue prediction model that combines computer vision with machine learning. The overall framework for predicting mental fatigue is shown in Figure 1.
The independent variables in this research include the facial feature data (extracted via computer vision, which serves as the foundational input for modeling) and the innovative data processing techniques (reverse data binning that converts discrete fatigue labels from the improved KSS into continuous values, and fatigue-aware screening based on the 6 s rule and sliding window). These are the core factors manipulated and analyzed to achieve accurate mental fatigue prediction, as reflected in the research framework and data processing procedures described in the manuscript.
The dependent variable is the mental fatigue level, which is quantified using the improved Karolinska Sleepiness Scale (KSS) with discrete values ranging from 1 to 5 (with a step of 1) and serves as the key target of the prediction task, consistent with the definition and application of fatigue labels in the data collection and processing sections.
The first step involves collecting the facial images of subjects by means of computer vision techniques. Subsequently, an algorithm is selected to extract facial feature points. The objective of image acquisition is to transform the original facial data into data that can be utilized. After that, the appropriate KSS (as shown in Table 1) is chosen. To reduce subjective evaluation errors, enhance the reliability of labels, and meet the requirements of model training, the modified KSS (as shown in Table 2) is employed for fatigue labeling. The fatigue score is obtained by calculating the average of the evaluations from experts and operators.
The second step focuses on data processing, which includes data preprocessing and data screening. During preprocessing, outliers and missing values are first addressed, and the data is then standardized. Subsequently, the reverse-based data binning technique is utilized to process the data labels. This step is intended to make the labels more consistent with the continuity requirements of time series prediction and to avoid prediction biases resulting from discretization. Data screening is based on a three-stage state model, with the aim of optimizing the dataset and retaining data that can reflect fatigue characteristics and state transition rules.
The third step entails applying multiple time series prediction models to the processed data for performance comparison. Given the continuous evolution of mental fatigue, this study defines fatigue prediction as a time series regression task to capture the subtle dynamic changes in fatigue states. To make the model output more in line with the safety decision-making requirements of industrial scenarios (such as determining whether to trigger a fatigue warning), the continuous values of regression prediction need to be mapped to discrete levels, and the Accuracy is calculated to evaluate the reliability of the model in classification decisions. The MAE is selected as the core regression metric to measure the deviation between the continuous predicted values and the actual values, reflecting the model’s sensitivity to changes in fatigue levels. Meanwhile, accuracy is introduced as an auxiliary metric to evaluate the matching degree between the discretized predicted results and the actual grades, verifying the practicality of the model in actual safety warnings. This dual-metric strategy not only retains the regression task’s fine modeling ability for the dynamic evolution of fatigue but also meets the decision-making needs based on thresholds in industrial scenarios, balancing methodological rigor and practical application value. Accuracy is defined as the proportion of the number of samples correctly detected by the model to the total number of samples (as shown in Equation (1)). It is one of the crucial metrics for evaluating model performance.
$$\text{Accuracy} = \frac{\text{True Samples}}{\text{Total Samples}} \times 100\%$$
In this equation, True Samples denotes the number of samples whose predicted value matches the true value, and Total Samples denotes the total number of samples.
The mean absolute error (MAE) is one of the most frequently employed metrics for quantifying data precision (as shown in Equation (2)). It reflects the absolute error between the predicted values and the true values.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
In this equation, $n$ denotes the total number of samples in the test set, $y_i$ the true value of sample $i$, and $\hat{y}_i$ its predicted value.
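The dual-metric evaluation can be sketched as follows. The paper does not specify its exact discretization rule for mapping continuous predictions back to KSS levels, so rounding to the nearest integer level (clipped to the 1-5 range) is assumed here for illustration:

```python
import numpy as np

def accuracy_after_discretization(y_true, y_pred, low=1, high=5):
    """Round continuous predictions to the nearest integer fatigue level,
    clip to the KSS range [low, high], and score exact level matches."""
    pred_levels = np.clip(np.rint(y_pred), low, high)
    true_levels = np.clip(np.rint(y_true), low, high)
    return float(np.mean(pred_levels == true_levels) * 100.0)

def mae(y_true, y_pred):
    """Mean absolute error between continuous predictions and targets."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

y_true = np.array([2.1, 3.0, 3.9, 4.2])
y_pred = np.array([2.0, 3.2, 4.1, 3.8])
print(accuracy_after_discretization(y_true, y_pred))  # 100.0
print(round(mae(y_true, y_pred), 3))                  # 0.225
```

The example shows why both metrics are needed: every prediction lands in the correct discrete level (accuracy 100%), yet the MAE still registers the residual continuous deviation that a threshold-based warning would not see.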
Following this dual-metric evaluation framework, five time series prediction models—LightGBM, GRU, TCN, Transformer, and Attention-based TCN—were selected for performance comparison to identify the optimal solution for industrial control room scenarios. These models were chosen based on their distinct strengths in handling temporal data: LightGBM for its efficiency in processing high-dimensional features and balancing accuracy with computational speed; GRU and TCN for their capability to capture long-term dependencies in sequential data; Transformer for its global attention mechanism that highlights critical fatigue transition points; and the innovative Attention-based TCN, which integrates TCN’s convolutional structure with self-attention to enhance feature representation.
To statistically validate the significance of performance differences among these models, one-way analysis of variance (ANOVA) was employed. This method tests whether the observed variations in accuracy and MAE across models are due to inherent model characteristics rather than random errors. Post hoc pairwise comparisons were further conducted to pinpoint specific differences between individual models, ensuring that conclusions about model superiority—particularly regarding LightGBM’s outstanding performance—are supported by robust statistical evidence. This combination of multi-model comparison and rigorous statistical testing strengthens the reliability of the study’s findings, providing a solid basis for selecting the most suitable model for real-time mental fatigue prediction in industrial settings.
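The one-way ANOVA step can be sketched with SciPy. The per-participant MAE values below are invented illustrative numbers, not the study's results:

```python
from scipy.stats import f_oneway

# Hypothetical per-participant MAE scores for three of the compared models
lightgbm_mae = [0.06, 0.07, 0.07, 0.06, 0.08]
gru_mae      = [0.11, 0.12, 0.10, 0.13, 0.11]
tcn_mae      = [0.10, 0.12, 0.11, 0.12, 0.13]

stat, p = f_oneway(lightgbm_mae, gru_mae, tcn_mae)
print(f"F = {stat:.1f}, p = {p:.2e}")
if p < 0.05:
    print("at least one model's mean MAE differs significantly")
```

A significant omnibus F-test only says that some group means differ, which is why the study follows it with post hoc pairwise comparisons to locate the specific differences.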

3.3. Data Collection and Data Processing

3.3.1. Data Collection and Labeling

In terms of data collection, this experiment used a camera mounted at the top of the experimental operation table to capture the participants' facial images. After collection, a feature point detection method (landmark detection) was employed to extract 68 facial feature points distributed across different areas of the face, such as the eyebrows and nose. The facial feature points and the distribution map of their coordinates are shown in Figure 2. Subsequently, the original Karolinska Sleepiness Scale (KSS) was adapted, converting the original ten-level classification into a five-level one to reduce the ambiguity of subjective assessment. Scoring was conducted simultaneously by the experts and the participant, and the average of the three scores was taken as the result; non-integer averages were rounded to the nearest whole number to yield the fatigue label.
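One common quantity derived from the 68-point landmark layout is the eye aspect ratio (EAR), which tracks the gradual eye closures mentioned earlier. The paper does not state which derived features it computes, so this is an illustrative sketch only, using the conventional six-landmark eye definition:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from six eye landmarks (p1..p6 in the standard 68-point eye
    ordering): (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|).
    The value drops toward zero as the eye closes."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

# Toy (x, y) landmarks for an open eye; a nearly closed eye scores far lower
open_eye = np.array([[0, 0], [1, 1], [2, 1], [3, 0], [2, -1], [1, -1]], float)
print(round(eye_aspect_ratio(open_eye), 3))  # 0.667
```

Because the ratio divides vertical by horizontal eye extent, it is largely invariant to the distance between the camera and the operator's face.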

3.3.2. Data Preprocessing

In terms of data preprocessing, the first step is to handle outliers, i.e., to identify and deal with data points that deviate markedly from the majority of the dataset. Common approaches include the Z-Score method [49] and the Interquartile Range (IQR) method [50]. This experiment uses the Z-Score method: data points lying more than three standard deviations from the mean are identified as outliers and removed, leaving blank values, which are then replaced with the average of the two adjacent data points. Subsequently, normalization scales the data to a specific range, eliminating dimensional differences between data features and making the data better suited to machine learning algorithms and statistical analysis; in this experiment it also serves to reduce individual differences among participants. Common normalization methods include the Z-Score method, the Min–Max method [51], and the ratio method. However, the Z-Score method maps data to a standard normal distribution rather than a fixed range, and the ratio method applies only to sequences in which all values are positive, a condition this study's data do not guarantee. This study therefore selects the Min–Max method for data normalization (as shown in Equation (3)); after linear transformation of the original data, the normalized results are mapped to the interval [0, 1].
$$x_{\mathrm{nor}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
In this equation, $x_{\mathrm{nor}}$ denotes the normalized result, $x$ the original data value, and $x_{\min}$ and $x_{\max}$ the minimum and maximum values of the data, respectively.
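The two preprocessing steps can be sketched together. The helper name and the synthetic spike are assumptions for illustration; the neighbour-averaging fill follows the rule stated above:

```python
import numpy as np

def preprocess(series):
    """Z-score outlier removal (|z| > 3 replaced by the mean of valid
    neighbours), then Min-Max scaling to [0, 1] as in Equation (3)."""
    x = np.asarray(series, dtype=float).copy()
    z = (x - x.mean()) / x.std()
    bad = np.abs(z) > 3
    x[bad] = np.nan                     # blank out the outliers first
    for i in np.flatnonzero(bad):
        nbrs = [x[j] for j in (i - 1, i + 1)
                if 0 <= j < len(x) and not np.isnan(x[j])]
        x[i] = np.mean(nbrs) if nbrs else np.nanmean(x)
    return (x - x.min()) / (x.max() - x.min())

# 99 well-behaved points plus one extreme spike at the end
data = np.concatenate([np.random.default_rng(2).normal(0, 1, 99), [50.0]])
clean = preprocess(data)
print(clean.min(), clean.max())  # 0.0 1.0
```

Blanking outliers before filling prevents one outlier from contaminating the replacement value of an adjacent outlier, which matters when fatigue-induced artefacts arrive in bursts.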

3.3.3. Data Processing for Mental Fatigue Prediction

After the data preprocessing, further processing is carried out to better support mental fatigue prediction: data label processing based on a reverse data binning technique. It is inspired by data binning [52], which groups continuous data, with each group represented by a single representative value, thereby transforming continuous data into discrete data. A schematic diagram of data binning is shown in Figure 3: the x-axis represents the original continuous data values, and the y-axis denotes the frequency or count of data points in each bin. The diagram visually demonstrates how continuous data is segmented into discrete bins, each assigned a representative value, laying the foundation for understanding the reverse data binning technique in this study.
During the data collection process, labels for mental fatigue data were obtained using a modified Karolinska Sleepiness Scale (KSS). These labels are discrete data ranging from 1 to 5 with a step size of 1. Due to the characteristics of mental states under fatigue, a specific level of fatigue typically does not occur instantaneously but persists for a certain period. As a result, numerous straight lines appear in the line chart. Given these characteristics of mental fatigue data labels, it is unreasonable to directly perform time series prediction on them. Therefore, a reverse data binning technique was proposed to process the data labels, converting discrete data into continuous data.
The conversion of labels from discrete values to continuous values is achieved by actively introducing random perturbations into the fatigue levels (as shown in Equation (4)). Suppose for a fatigue level x (where x is an integer), a random perturbation within the range of (−σ, σ) is applied, transforming x from a fixed integer value to a random distribution within this range. Considering that the mental fatigue data labels are obtained by averaging the scores given by experts and the participants themselves and then rounding off, the range of σ is restricted to (0, 0.5]. If σ is greater than 0.5, it may lead to an upward or downward shift in the continuous fatigue level, thereby affecting subsequent prediction results.
To accurately determine the optimal value of σ, the study employed the GridSearchCV method [53] combined with a cross-validation strategy for parameter optimization. Specifically, the grid was divided into five candidate values: 0.1, 0.2, 0.3, 0.4, and 0.5. The generalization ability of the model under different σ values was evaluated through cross-validation—that is, the dataset was divided into a training set and a validation set each time, the model was trained using the training set, and its performance was tested on the validation set. This process was repeated multiple times to reduce the impact of randomness in data partitioning on the results. Finally, through grid search and cross-validation, it was found that when σ is 0.3, the cross-validation score is the highest, indicating that the model performs the most stably and excellently on different data subsets under this value. Therefore, σ was set to 0.3, and the operational formula for the continuous transformation of discretized data labels was obtained (as shown in Equation (4)). This parameter selection method combined with cross-validation effectively ensures the reliability of key parameters in the reverse data binning technique, laying a foundation for the accuracy of subsequent time series prediction models.
x i n e w = x i + ,   ~ U n i f o r m 0.3 , 0.3
In this equation, x_i represents the original discretized integer data label sequence, ε is a random variable that follows a uniform distribution over (−0.3, 0.3), and x_i^new represents the data label sequence after continuous transformation. An example of the continuous transformation of discrete labels is shown in Figure 4.
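The reverse data binning step of Equation (4) can be sketched as follows. This is a minimal illustration; the function name and the fixed seed are ours, not the study's:

```python
import numpy as np

def reverse_data_binning(labels, sigma=0.3, seed=None):
    """Convert discrete fatigue labels to continuous values by adding
    uniform noise eps ~ Uniform(-sigma, sigma), as in Equation (4)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=float)
    return labels + rng.uniform(-sigma, sigma, size=labels.shape)

# Example: discrete KSS labels 1..5 become continuous values
discrete = np.array([1, 2, 3, 3, 4, 5])
continuous = reverse_data_binning(discrete, sigma=0.3, seed=0)

# Each value stays within 0.3 of its original label, so rounding recovers it
assert np.all(np.abs(continuous - discrete) < 0.3)
assert np.array_equal(np.rint(continuous), discrete)
```

Because |ε| < 0.3 < 0.5, rounding a perturbed label always recovers the original integer, which is why σ ≤ 0.5 is required in the text.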
Following the data annotation procedure, it is essential to conduct a screening of the data employed for mental fatigue prediction. This research posits that mental fatigue is a complex and progressive psychophysiological phenomenon, which can be categorized into three distinct stages: non-fatigued, intermediate, and fatigued. The intermediate state, serving as a crucial transitional phase, encompasses physiological and psychological traits that mirror the dynamic alterations of fatigue.
The research aim is to establish an effective model for predicting the mental fatigue of operators. Given that incorporating all non-fatigue data during the training process may potentially disrupt model learning and diminish prediction accuracy, a systematic data screening mechanism has been introduced. This mechanism aims to retain the intermediate state data that can prominently reflect the characteristics of fatigue and the rules governing state transitions, thereby laying a solid data foundation for enhancing the prediction accuracy of the model.
Prior to the data screening procedure, two concepts are introduced. First, studies have indicated that the human reaction time in response to emergencies can extend up to 15 s [54]. Building on this finding, this research uses 15 s as a boundary value, and the duration of several operational steps is also determined on the basis of this value.
Secondly, the concept of the 3 s rule [55] is introduced. The 3 s rule stipulates that when predicting mental fatigue, it is possible to detect the state 3 s prior to a given state, meaning that it is feasible to identify whether the fatigue state has changed 3 s in advance. Considering the diverse requirements for the reliability of experimental research, the time need not be rigidly fixed at 3 s; rather, it can range from 2 to 6 s.
Taking into account the specific circumstances of this study, where the state is determined every 2 s, and aiming to balance reliability with the early detection of fatigue transitions, 6 s has been selected as the length of the detection window. This can be referred to as the “6 s principle”.
The data filtering procedure can be divided into three steps. In step one, the experimental data of each participant is transformed: the data is converted into a tensor with a step size of 7 based on the labels, and a sliding window of length 7 is constructed centered on each state. Since the experiment showed that collecting seven states takes approximately 15 s, and in combination with the two concepts introduced earlier, both the step size and the length of the sliding window are set to 7. For example, to analyze state x_4, a sliding window from x_1 to x_7 is constructed centered on that state. Step two classifies the instantaneous states of the experimental participants. The mental states of the subjects are divided into four categories: State 1, State 2, State 3, and State 4; the specific contents corresponding to these four states are provided later. According to the "6 s rule", each state requires 2 s, so the basis for distinguishing these states is the first three states and the last three states within the sliding window. Taking state x_4 as an example (the instantaneous state), its category is determined from the averages of the first three and last three states, that is, Average(x_1 + x_2 + x_3) and Average(x_5 + x_6 + x_7). In this study, fatigue level 3 was chosen as the dividing line for distinguishing states. Based on this definition, the classification and explanation of mental fatigue states are shown in Table 3. The schematic diagram of the data screening logic is shown in Figure 5.
In step three, after the instantaneous states of all subjects have been determined, screening is conducted according to the four mental fatigue prediction data screening rules detailed in Table 4.
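The window-based state classification described in step two can be sketched as follows. The mapping from the two three-state averages to the four states is our assumption for illustration only, since the authoritative definitions are given in Table 3:

```python
import numpy as np

THRESHOLD = 3.0  # fatigue level 3 is the dividing line between states

def classify_state(window):
    """Classify the instantaneous (centre) state of a length-7 sliding window.

    NOTE: the mapping below (which average is above/below the threshold for
    each of States 1-4) is an assumed illustration; see Table 3 for the
    study's actual definitions.
    """
    window = np.asarray(window, dtype=float)
    assert window.shape == (7,), "expected a sliding window of 7 states"
    pre = window[:3].mean()    # Average(x_1 + x_2 + x_3)
    post = window[4:].mean()   # Average(x_5 + x_6 + x_7)
    if pre < THRESHOLD and post < THRESHOLD:
        return 1  # non-fatigued throughout (assumed State 1)
    if pre < THRESHOLD <= post:
        return 2  # entering fatigue (assumed State 2)
    if post < THRESHOLD <= pre:
        return 3  # leaving fatigue (assumed State 3)
    return 4      # fatigued throughout (assumed State 4)

# Sliding-window pass over one participant's label sequence
labels = [2, 2, 2, 3, 4, 4, 5, 5, 4]
states = [classify_state(labels[i - 3:i + 4]) for i in range(3, len(labels) - 3)]
```

Each centre position i is classified from its surrounding seven labels; the example sequence above yields `states == [2, 2, 4]` under the assumed mapping.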
Following data processing and screening, the data will exhibit two significant advantages. Firstly, it optimizes the dataset for mental fatigue prediction, enabling the timely anticipation of imminent mental fatigue. This corresponds precisely to the prediction of fatigue levels 4 and 5 within the fatigue assessment framework of this study. This procedure refines the data, rendering it more conducive to subsequent predictions of fatigue levels 4 and 5. Moreover, experimental measurements demonstrate that the screening criteria do not result in an inadequate amount of data.
Secondly, it rectifies potential problems inherent in the dataset. During the experiment, participants tend to forcibly overcome drowsiness and become more alert as the experiment nears its conclusion, which degrades data quality towards the end of the session. Given that mental fatigue prediction is a time-series task, the trailing part of the data is used for testing. Without screening, this creates a paradox: the better the model fits the overall fatigue trend, the poorer its predictive ability on this distorted tail segment. Data screening removes the data that is detrimental to prediction, thereby resolving this latent issue.

3.3.4. Models for Predicting Mental Fatigue

Prediction of mental fatigue falls into the category of time series prediction tasks. Each algorithm selected in this study is chosen for its unique advantages to address the specific challenges of fatigue dynamic modeling in industrial environments.
In the field of machine learning, the Light Gradient Boosting Machine (LightGBM), based on the gradient boosting framework, optimizes its performance through Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). In this way, it not only maintains high accuracy but also improves training efficiency, making it highly suitable for processing high-dimensional time series data [56,57]. Its significance lies in its ability to balance accuracy and speed, which is crucial for industrial control rooms that require real-time fatigue warnings to prevent operational errors. Unlike computationally intensive models, LightGBM requires fewer resources, making on-site deployment feasible, and its robustness to noise in facial feature data (such as brief occlusions) ensures reliability in dynamic working environments.
Among deep learning models, the Gated Recurrent Unit (GRU), a lightweight alternative to the Long Short-Term Memory (LSTM) network, captures long-term dependencies through update gate and reset gate mechanisms. With fewer parameters and faster training speed, it provides a more efficient solution [58,59,60]. It is selected because of its focus on sequential dependencies, which aligns with the gradually cumulative nature of mental fatigue—a key aspect for modeling the transition from alertness to fatigue over time.
The Temporal Convolutional Network (TCN), integrating causal convolution, dilated convolution, and residual connections, can effectively model long-term dependencies and support parallel computing, thereby enabling efficient processing of time series data [61,62]. This makes it valuable in analyzing multi-scale patterns in fatigue evolution (such as short-term fluctuations in eye movements and long-term changes in posture), providing a complementary perspective to recursive models like GRU.
The Transformer architecture [63,64], leveraging self-attention mechanisms and positional encoding, overcomes the distance constraints in temporal dependencies. This makes it particularly adept at capturing global feature correlations and has demonstrated excellent performance in multi-variable time series prediction. The motivation for introducing it lies in its ability to highlight critical moments in the fatigue timeline (such as sudden changes in blink frequency), which is crucial for identifying early warning signals in noisy operational data.
In this study, mental fatigue prediction is performed using the aforementioned models. By comparing their performance—from the efficiency of LightGBM to the global awareness of Transformer—we aim to identify the optimal approach that balances accuracy, speed, and industrial applicability. Ultimately, a performance comparison is conducted based on actual experimental data to verify which model best meets the needs of fatigue management in control rooms.
To statistically validate the performance differences among the models, a one-way analysis of variance (ANOVA) was employed. This method was chosen because it can test whether there are significant differences in the means of performance metrics (accuracy and mean absolute error) across the five model types (independent variable). Additionally, post hoc pairwise comparisons were conducted to identify specific differences between individual models, ensuring the reliability of the performance conclusions.

4. Pilot Study

4.1. Data Collection of Pilot Study

To validate the effectiveness of the proposed method, this study conducted a pilot experiment involving 10 operators, all of whom were student participants. The participant cohort comprised five female (50%) and five male individuals (50%). The experimental setup simulated the operational environment of control room personnel within industrial settings. The participants were assigned the primary task of monitoring the interface of a Distributed Control System and recording any abnormal data presented.
While real-world operational workflows are inherently complex, the experimental tasks were deliberately designed to be simple and monotonous. This approach was implemented to expedite the onset of mental fatigue and thereby enhance the efficiency of data collection [65]. Furthermore, experiments were conducted in the post-lunch period, as this time frame is associated with increased natural susceptibility to fatigue.
During each session, participants engaged in a 30 min control room operation task. Throughout the experiment, a camera mounted above the control equipment captured facial images at 2 s intervals. Facial feature points were subsequently extracted from these images using the Landmark algorithm. A total of approximately 8000 valid facial images were collected. These images were randomized and independently evaluated by two expert raters using a modified version of the Karolinska Sleepiness Scale (KSS) tailored for this study. In parallel, participants completed self-reported fatigue assessments based on their subjective experiences. The final fatigue state score was calculated as the average of the expert ratings and the self-assessment scores, rounded to the nearest integer when necessary.
The KSS was employed as the primary measure of mental fatigue in this study. While the conventional KSS comprises 10 levels ranging from 1 (extremely alert) to 10 (extremely sleepy), a simplified 5-point scale (ranging from 1 to 5 with an interval of 1) was adopted to reduce assessment complexity and improve the specificity of fatigue classification relevant to the experimental objectives.

4.2. Data Processing of the Experiment

To align the collected data with the time-series prediction nature of mental fatigue early warning, targeted data processing was conducted, with key steps supported by specific experimental findings.
The raw facial feature dataset, consisting of 7940 samples in total, may contain outliers attributable to facial occlusion or motion blur. Employing the Z-score method, values exceeding ±3 standard deviations were identified as outliers, accounting for approximately 3.2% of the total samples (roughly 254 samples). These outliers were then replaced with the mean of adjacent valid data points. For example, if a frame lacked eye feature points, it was corrected using the average of the 2nd and 4th frames within its sequence. This approach ensured the integrity of the temporal sequence.
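The Z-score replacement step can be sketched as below; this is a minimal illustration, and the fallback to a single-sided neighbour at sequence edges is our assumption:

```python
import numpy as np

def replace_outliers_zscore(series, z_thresh=3.0):
    """Flag values beyond +/- z_thresh standard deviations and replace each
    with the mean of its nearest valid neighbours, preserving temporal order."""
    x = np.asarray(series, dtype=float).copy()
    z = (x - x.mean()) / x.std()
    bad = np.abs(z) > z_thresh
    for i in np.where(bad)[0]:
        # nearest valid neighbour on each side (edges fall back to one side)
        left = next((x[j] for j in range(i - 1, -1, -1) if not bad[j]), None)
        right = next((x[j] for j in range(i + 1, len(x)) if not bad[j]), None)
        neighbours = [v for v in (left, right) if v is not None]
        x[i] = float(np.mean(neighbours))
    return x

series = [0.5] * 20
series[7] = 10.0  # e.g. a frame corrupted by facial occlusion
cleaned = replace_outliers_zscore(series)
```

Here the corrupted 8th frame is restored from its valid neighbours, mirroring the text's example of correcting a frame from the average of the 2nd and 4th frames.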
To mitigate the variability in facial feature coordinates among 10 participants (5 male and 5 female), the Min–Max normalization method was utilized to scale all 136-dimensional features to the [0, 1] range. For instance, the x-coordinate of the left eye corner, which originally spanned from 200 to 320 pixels across participants, was scaled to the range of 0.21–0.78. This transformation converted absolute positions into relative relationships, facilitating the extraction of temporal patterns.
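Per-column Min–Max scaling can be sketched as follows (equivalent to fitting scikit-learn's `MinMaxScaler` on each of the 136 feature columns; the constant-column guard is our addition):

```python
import numpy as np

def min_max_normalize(features):
    """Scale each feature column independently to the [0, 1] range."""
    features = np.asarray(features, dtype=float)
    col_min = features.min(axis=0)
    col_rng = features.max(axis=0) - col_min
    col_rng[col_rng == 0] = 1.0  # guard constant columns against divide-by-zero
    return (features - col_min) / col_rng

# e.g. an x-coordinate column spanning 200-320 px maps onto [0, 1]
coords = np.array([[200.0], [260.0], [320.0]])
scaled = min_max_normalize(coords)
```

This converts absolute pixel positions into relative relationships, as described above.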
The discrete KSS labels (ranging from 1 to 5) were converted into continuous values by introducing uniform random perturbations within the interval of (−0.3, 0.3). This interval was determined through grid search. Specifically, when σ = 0.3, the cross-validation scores reached their peak value of 0.87. For example, the label “3” (indicating mild fatigue) could be transformed into values such as 3.12 or 2.89. The processed labels covered a range from 1 ± 0.3 to 5 ± 0.3, enabling the capture of the gradual transitions of fatigue, which is crucial for time-series models. For example, in comparison to the utilization of discrete labels, the average absolute errors of several models have demonstrated a decrease. Specifically, the average absolute error (MAE) of the LightGBM model has been reduced by 19%.
A 15 s sliding window (equivalent to seven time steps with a 2 s interval) was employed to filter the data in accordance with the three-stage fatigue model. Non-fatigued states (State 1), which accounted for 21% of the samples (1667 samples), were excluded because they disrupted the prediction of fatigue trends. Fatigue states (State 4), making up 38% of the samples (3017 samples), were retained as the core training data. In the case of transitional states (State 2/3), non-fatigue segments were retained only if their length was less than or equal to seven steps (equivalent to 15 s). For example, a five-step insertion of State 3 within State 2 was retained, while a nine-step insertion was discarded.
Following the screening procedure, each participant retained approximately 600 valid samples. To align with the time-series prediction nature of mental fatigue warning, these 600 valid samples were divided into three subsets: the test set contains 15 fixed samples, corresponding to the last 30 s of data, which is used to simulate real-time prediction scenarios and accounts for 2.5% of the total. The remaining 585 samples are further divided into a training set (70% of the remaining samples, approximately 410 samples) for model training, and a validation set (30% of the remaining samples, approximately 175 samples) for hyperparameter tuning, such as optimizing the number of LightGBM iterations and the GRU network structure. This partitioning strategy has been adjusted to prioritize the core goal of fatigue warning: the test set must be strictly retained as tail data to reflect the latest fatigue trends, which is crucial for the effectiveness of practical early warning. Since the tail data contains only 15 samples, it is necessary to compress its proportion to 2.5% to avoid excessively reducing the amount of training data. To ensure sufficient samples for hyperparameter tuning, the proportion of the validation set is increased to 30% based on the remaining samples. This not only maintains the total proportion of the training and validation sets at 97.5% to ensure the model’s fitting ability but also adapts to the temporal characteristics of fatigue evolution, making the partitioning more in line with the actual needs of the study. The mean absolute errors of the five models introduced in this study all decreased. Specifically, the mean absolute error of the LightGBM model on the test set decreased from 0.089 before screening to 0.067, while that of the GRU model decreased from 0.233 to 0.2. These results validate the effectiveness of the screening protocol.
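The chronological split described above can be sketched as follows. The sample counts follow the text (15 fixed tail samples, then 70/30 of the remainder); the exact rounding of the validation fraction is ours:

```python
def chronological_split(samples, test_size=15, val_frac=0.30):
    """Split one participant's time-ordered samples: the final `test_size`
    samples (the last 30 s at 2 s intervals) form the test set; the rest
    split chronologically into train (70%) and validation (30%)."""
    test = samples[-test_size:]
    rest = samples[:-test_size]
    n_val = int(round(len(rest) * val_frac))
    train, val = rest[:-n_val], rest[-n_val:]
    return train, val, test

samples = list(range(600))  # ~600 time-ordered samples per participant
train, val, test = chronological_split(samples)
```

Keeping the validation set as the latest pre-test segment preserves temporal order, which matters for a time-series early-warning task.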
These data-processing steps, firmly grounded in experimental data, ensured that the processed data retained continuous temporal characteristics and key transition patterns. This, in turn, laid a solid foundation for the accurate time-series prediction of mental fatigue.

4.3. Experimental Outputs of Mental Fatigue Prediction

4.3.1. Mental Fatigue Prediction Based on LightGBM

LightGBM is a widely used machine learning algorithm. As with most machine learning algorithms, rigorous parameter tuning is required before applying the model, and cross-validation plays a key role in this process, ensuring the reliability of parameter selection and the generalization ability of the model. Five-fold cross-validation was adopted in the parameter tuning process. This choice is justified by two key considerations: first, the training set for each participant contains approximately 410 samples, and a five-fold split results in around 82 samples per fold, an appropriate size that minimizes data waste while ensuring stable performance evaluation. Second, five-fold cross-validation is widely utilized for hyperparameter tuning on small to medium-sized datasets due to its balance between computational efficiency and assessment reliability. For tuning, the initial learning rate is set to 0.1 to accelerate model training. Subsequently, to determine the optimal number of iterations, the study adopts a cross-validation strategy: the dataset is repeatedly divided into mutually independent training and validation sets, the model is trained on the different data subsets, and its performance is evaluated. By comparing the model's validation performance under different numbers of iterations, the optimal number of iterations is determined to be 130.
Next, to improve the model performance, when adjusting the key parameters of the decision tree, the cross-validation mechanism is also introduced, combined with GridSearchCV for parameter optimization: by constructing multiple sets of candidate parameter grids, cross-validation is used to evaluate the performance of each set of parameters on different data partitions, and finally the parameter combination that enables the model to achieve the optimal performance is screened out. In addition, to further enhance the generalization ability of the model, when optimizing the regularization parameters, the cross-validation results are still used as the basis. Through multiple experiments and adjustments, the risk of overfitting is eliminated, and the optimal learning rate for this specific problem is finally determined to be 0.01.
This cross-validation strategy throughout the parameter tuning process, through multiple rounds of data segmentation and performance verification, ensures that the selected parameters not only perform excellently on the training data but also maintain stable performance on unseen data. The detailed optimal parameter configuration is shown in Table 5.

4.3.2. Mental Fatigue Prediction Based on GRU

The Gated Recurrent Unit (GRU) model is utilized for time-series prediction to assess the state of mental fatigue. The model captures temporal patterns by analyzing feature data over a number of past time steps and forecasts future states.
Considering that the length of the dynamic sliding window established in the preliminary stage was 7 (for determining the fatigue state), the time step of the GRU model in this study was also set to 7. Specifically, the model leverages the instantaneous mental-fatigue-state features of seven consecutive past time steps (with each time step encompassing seven features) to predict the actual fatigue state at the first time step in the future, that is, the 8th state. After data pre-processing, the input feature dimension of each sample is (7, 7), which encompasses the necessary historical feature information, and the corresponding label represents the true fatigue state of the 8th state.
The crucial parameters of the GRU network structure, namely the number of layers and the number of neurons in the hidden layer, were determined through experimental comparisons. The findings indicate that when a 3-layer GRU network is adopted, with the number of neurons in the hidden layers being 64, 32, and 8, successively, the model attains the highest accuracy. To address the prevalent issues of “gradient vanishing” and “overfitting” during the training of deep neural networks, in this study, a batch normalization (BN) layer and a dropout layer were added after each GRU layer for optimization purposes.
As depicted in Figure 6, the GRU model structure follows a sequential flow: the (7, 7) input tensor first enters the first GRU layer with 64 neurons, whose output is normalized by a BN layer to stabilize training and then passed through a dropout layer (rate = 0.2) to prevent overfitting by randomly deactivating 20% of neurons. This processed output is fed into the second GRU layer (32 neurons), followed by another BN layer and dropout layer (same rate) to further enhance generalization. The output then proceeds to the third GRU layer (8 neurons), with the same BN and dropout operations applied. Finally, the output of the third GRU layer is connected to a dense layer with 1 neuron, which outputs the predicted fatigue level for the 8th time step.
The settings of the model training parameters are as follows: the number of training epochs is 400, and the batch size is 32.
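Under these settings, the architecture can be sketched in Keras as follows; the optimizer and loss function are our assumptions, as the paper does not state them:

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Input, GRU, BatchNormalization,
                                     Dropout, Dense)

def build_gru_model():
    """3-layer GRU (64/32/8 units), each followed by BN and dropout (0.2),
    mapping a (7 time steps, 7 features) window to one fatigue value."""
    model = Sequential([
        Input(shape=(7, 7)),
        GRU(64, return_sequences=True),
        BatchNormalization(),
        Dropout(0.2),
        GRU(32, return_sequences=True),
        BatchNormalization(),
        Dropout(0.2),
        GRU(8),                 # final GRU layer returns a single vector
        BatchNormalization(),
        Dropout(0.2),
        Dense(1),               # predicted fatigue level at the 8th step
    ])
    model.compile(optimizer="adam", loss="mae")  # assumed settings
    return model

model = build_gru_model()
# Training per the text: model.fit(X_train, y_train, epochs=400, batch_size=32)
pred = model.predict(np.zeros((2, 7, 7)), verbose=0)
```

The model consumes a (7, 7) window and emits one scalar per sample, matching the description in Figure 6.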

4.3.3. Mental Fatigue Prediction Based on TCN

The application procedure of the TCN model in fatigue prediction is similar to that of the GRU model. Initially, the data must be transformed into a tensor with a time step of 7, which is in line with the setting of the GRU model.
Subsequently, it is necessary to determine the network structure and network layer parameters of the TCN algorithm model. After numerous experiments, the TCN model employed in this study comprises four layers. For each layer of the TCN, 64 convolutional kernels are configured, with the kernel size set to 2. Moreover, each TCN layer contains only one convolutional block.
The model design also reflects the three characteristics inherent in the TCN: causal convolution, dilated convolution, and residual connections. An example of the dilated causal convolution layer is shown in Figure 7. Within each layer, the dilation factors of the convolutions are set to [1,2,4,8], causal padding is selected as the padding method, and residual connections are utilized to mitigate the issue of gradient vanishing. To guard against overfitting, a dropout layer with a dropout rate of 0.2 is incorporated into each layer.
Once the model construction is completed, the TCN model requires training. During the training process, the epoch parameter needs to be determined. Through multiple experiments, the value of this parameter is set at 400, indicating that all data will be trained 400 times.
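A minimal Keras sketch of the described TCN follows; the placement of the residual 1×1 projection and the pooling head before the output are our assumptions:

```python
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras import layers

def tcn_block(x, filters=64, kernel_size=2, dilation=1, dropout=0.2):
    """One TCN layer: causal dilated Conv1D + dropout, with a residual
    connection (a 1x1 conv matches channel counts when needed)."""
    y = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation, activation="relu")(x)
    y = layers.Dropout(dropout)(y)
    if x.shape[-1] != filters:
        x = layers.Conv1D(filters, 1, padding="same")(x)
    return layers.Add()([x, y])

def build_tcn(input_shape=(7, 7)):
    """Four stacked TCN layers, 64 kernels of size 2 each,
    with dilation factors [1, 2, 4, 8]."""
    inp = layers.Input(shape=input_shape)
    x = inp
    for d in [1, 2, 4, 8]:
        x = tcn_block(x, dilation=d)
    x = layers.GlobalAveragePooling1D()(x)  # pooling head is an assumption
    out = layers.Dense(1)(x)
    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mae")  # assumed settings
    return model

model = build_tcn()
# Training per the text: model.fit(..., epochs=400)
pred = model.predict(np.zeros((2, 7, 7)), verbose=0)
```

Causal padding keeps the sequence length at 7 while ensuring each step sees only past inputs, and the dilation schedule widens the receptive field layer by layer.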

4.3.4. Mental Fatigue Prediction Based on Transformer

The procedure for applying the Transformer architecture to mental fatigue prediction is similar to that of the other deep-learning-based algorithms. Initially, it is essential to define the network structure. In this research, the number of network layers is set to 12 for both the encoder and the decoder. Regarding the multi-head attention mechanism within the Transformer architecture, this study designates the number of heads (num_head) as 1; in other words, the conventional single-head self-attention mechanism is employed. Additionally, to prevent overfitting, a dropout layer with a dropout rate of 0.1 is incorporated after each layer of the Transformer network.
Once the parameters of the network layer structure have been established, it is necessary to specify the parameters associated with model training. These parameters, namely the epoch and batch size, are the same as those utilized during the application of the GRU model. Through numerous experiments, this research has determined the values of these two parameters as follows: epochs = 400 and batch_size = 32.
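A hedged Keras sketch of a single-head Transformer stack is given below. It is encoder-only, uses assumed internal dimensions, and omits positional encoding for brevity, so it is an approximation of the architecture described above rather than the study's exact implementation:

```python
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras import layers

def encoder_layer(x, num_heads=1, key_dim=16, ff_dim=32, dropout=0.1):
    """One encoder layer: single-head self-attention (num_heads = 1) plus a
    position-wise feed-forward block, each with dropout (0.1), a residual
    connection, and layer normalization."""
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(x, x)
    x = layers.LayerNormalization()(x + layers.Dropout(dropout)(attn))
    ff = layers.Dense(ff_dim, activation="relu")(x)
    ff = layers.Dense(x.shape[-1])(ff)
    return layers.LayerNormalization()(x + layers.Dropout(dropout)(ff))

def build_transformer(input_shape=(7, 7), num_layers=12):
    # NOTE: positional encoding over the 7 time steps is omitted from this
    # sketch; the paper applies positional encoding as described.
    inp = layers.Input(shape=input_shape)
    x = inp
    for _ in range(num_layers):
        x = encoder_layer(x)
    x = layers.GlobalAveragePooling1D()(x)
    out = layers.Dense(1)(x)
    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mae")  # assumed settings
    return model

model = build_transformer()
# Training per the text: model.fit(..., epochs=400, batch_size=32)
pred = model.predict(np.zeros((2, 7, 7)), verbose=0)
```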

4.3.5. Mental Fatigue Prediction Based on Attention-Based TCN

In the process of researching the prediction of mental fatigue, this study undertook an in-depth analysis of the characteristics of algorithms related to time-series prediction. It was discovered that the TCN model incorporates causal convolution and dilated convolution structures. These structures endow the model with the capacity to effectively capture long-range dependencies among different data points within time-series data. Consequently, the TCN model demonstrates remarkable performance when dealing with long-sequence time-series data.
Simultaneously, this study also observed that the Transformer architecture features a multi-head self-attention mechanism that is particularly well-suited for time-series prediction. By virtue of this mechanism, the model can adaptively direct its focus towards the crucial features within the data, thereby enhancing the model’s ability to represent features and ultimately improving the overall performance of the model.
In light of the distinct advantages of the aforementioned two models, this study endeavors to integrate the core characteristics of the TCN and Transformer to devise a novel hybrid model, namely the Attention-based TCN. This model not only inherits the efficiency of the TCN model in capturing long-range dependencies through its causal convolution architecture and dilated convolution mechanism but also integrates the self-attention mechanism’s ability to effectively model global correlations through dynamic weight allocation. The overarching objective is to develop a more robust and versatile time-series prediction tool. This tool is capable of accurately discerning both short-term and long-term dependencies within time-series data and, through the self-attention mechanism, enhancing the recognition and comprehension of significant patterns within the sequence.
The network architecture of this model is roughly the same as that of the TCN model. However, a multi-head self-attention layer is appended after each network layer. Through extensive experiments, the number of heads parameter (num_heads) of the self-attention layer has been determined to be 2. The network structure diagram of the Attention-based TCN is shown in Figure 8. After determining the structure of the model’s network layers, it is necessary to configure the model’s training parameters. Among them, the epoch parameter (epochs) has been set to 400.
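The hybrid structure can be sketched by appending a two-head self-attention layer after each TCN layer; the layer-normalization placement and the pooling head are our assumptions:

```python
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras import layers

def attention_tcn_block(x, filters=64, kernel_size=2, dilation=1,
                        num_heads=2, dropout=0.2):
    """One hybrid layer: causal dilated Conv1D with a residual connection,
    followed by multi-head self-attention (num_heads = 2)."""
    y = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation, activation="relu")(x)
    y = layers.Dropout(dropout)(y)
    if x.shape[-1] != filters:
        x = layers.Conv1D(filters, 1, padding="same")(x)
    y = layers.Add()([x, y])
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=filters)(y, y)
    return layers.LayerNormalization()(y + attn)  # normalization is assumed

def build_attention_tcn(input_shape=(7, 7)):
    inp = layers.Input(shape=input_shape)
    x = inp
    for d in [1, 2, 4, 8]:          # same dilation schedule as the base TCN
        x = attention_tcn_block(x, dilation=d)
    x = layers.GlobalAveragePooling1D()(x)
    out = layers.Dense(1)(x)
    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mae")  # assumed settings
    return model

model = build_attention_tcn()
# Training per the text: model.fit(..., epochs=400)
pred = model.predict(np.zeros((2, 7, 7)), verbose=0)
```

The convolutional path captures local causal structure, while the attention layer reweights the seven time steps globally, matching the motivation given for the hybrid design.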

4.4. Comparison and Analysis of Performance of Mental Fatigue Prediction Models

It should be noted that while all models utilize the same core features—7 key facial features within a 7-time-step sliding window—the input structures are adjusted according to the architectural characteristics of each model:
For the LightGBM (tree-based model), the input is formatted as a 49-dimensional vector. Tree-based models inherently require flattened, non-sequential input; thus, the seven time steps, each containing seven features, are flattened and concatenated into a single vector with 49 dimensions. This structure is consistent with the model’s design for processing static feature combinations.
For deep learning models (GRU, TCN, Transformer, and Attention-based TCN), the input retains a 7 × 7 two-dimensional matrix. These models rely on preserving temporal order to capture the dynamic evolution of fatigue. The matrix structure explicitly maintains the sequence of seven time steps, with each time step associated with seven features, enabling the modeling of time-dependent patterns that are critical for fatigue progression prediction.
The performances of different algorithm models in addressing the issue of mental fatigue prediction are summarized in Table 6. It is important to note that the mental fatigue prediction approach employed in this study was individually applied to the data of each subject. This implies that the performance evaluation of each model was conducted at the individual level, rather than being based on the group average. Consequently, Table 6 meticulously presents the prediction outcomes of all the tested models for each subject, offering detailed data for comprehending the applicability and efficacy of each model in personalized mental fatigue prediction.
Radar charts can vividly exhibit the performance disparities of multiple models across diverse dimensions and among various subjects. This aligns seamlessly with the objectives of identifying the most suitable model tailored to the characteristics of specific individuals or groups and guiding the development and optimization of future personalized mental fatigue prediction systems. Specifically, certain models may demonstrate remarkable performance across the majority of subjects; however, they might yield suboptimal results in specific cases. Conversely, although some models may generally exhibit relatively modest performance, they can generate highly precise predictions under particular circumstances. These findings carry substantial implications for delving into the underlying mechanisms of mental fatigue and formulating targeted intervention strategies.
The performance analysis of mental fatigue models, achieved by integrating the data analysis presented in Table 6 with the visual representation depicted in Figure 9, not only lays a solid foundation for advancements in the realm of mental fatigue prediction but also illuminates a clear path for future research endeavors.
In the model performance evaluation stage, the LightGBM model demonstrated the most outstanding performance. Its average accuracy rate was as high as 93.3%, and the mean absolute error (MAE) was 0.067. It significantly outperformed other comparison models both at the overall sample level and the individual sample level. The innovative model proposed in this study—the Attention-based TCN—showed a significant performance improvement compared to the basic model (accuracy increased by 10%, MAE decreased by 0.1), but still had a certain gap compared to the LightGBM model (accuracy was 10% lower, MAE was 0.08 higher); especially in terms of accuracy and stability, it was difficult to reach the level of the LightGBM model.
During the model performance evaluation phase, the statistical significance of performance differences among various models was further verified through analysis of variance (ANOVA), and the results were highly consistent with the conclusions drawn from intuitive performance comparisons. In terms of overall differences, ANOVA showed that there were extremely significant differences (p < 0.01) in the performance of the five models (LightGBM, GRU, TCN, Transformer, and Attention-based TCN) in both accuracy and MAE. This indicates that the superiority or inferiority of model performance is not caused by random errors but is determined by the inherent structure and characteristics of the models themselves, providing a statistical basis for subsequent model comparisons.
Specifically regarding the advantages of LightGBM, the ANOVA results based on accuracy reinforced the conclusion that it significantly outperforms other models. In terms of accuracy, the improvement of LightGBM, compared to Transformer, TCN, and GRU reached statistically significant levels, and the difference between LightGBM and Attention-based TCN also met the extremely significant standard (p = 0.000725 < 0.01), with the latter lagging by 9.4% in accuracy. This is consistent with intuitive observations—LightGBM’s average accuracy (93.33%) far exceeded that of other models, especially Transformer (70.00%) and TCN (70.67%), confirming its reliability in classification decisions. The ANOVA results for accuracy are shown in Table 7.
For the MAE metric, we introduced an indicator called MAE improvement percentage, which reflects the relative improvement in prediction error of LightGBM compared to other models. The calculation formula is shown in Equation (5).
MAE improvement percentage = (MAE of other model − MAE of LightGBM) / (MAE of other model) × 100%    (5)
In this equation, the numerator (the MAE of the other model minus the MAE of LightGBM) represents the absolute error reduction, and the denominator (the MAE of the other model) represents the original error benchmark. The result is expressed as the percentage of relative error reduction.
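Equation (5) reduces to a one-line helper. The sketch below applies it to the MAE values reported in this section; small discrepancies against the reported percentages reflect the rounding of the MAEs to three or four decimals.

```python
def mae_improvement_percentage(mae_other: float, mae_lightgbm: float) -> float:
    """Relative error reduction of LightGBM versus another model, per Equation (5)."""
    return (mae_other - mae_lightgbm) / mae_other * 100.0

mae_lightgbm = 0.067    # reported LightGBM MAE
mae_tcn = 0.2934        # reported TCN MAE
mae_attn_tcn = 0.1468   # reported Attention-based TCN MAE

print(f"vs. TCN: {mae_improvement_percentage(mae_tcn, mae_lightgbm):.1f}%")
print(f"vs. Attention-based TCN: {mae_improvement_percentage(mae_attn_tcn, mae_lightgbm):.1f}%")
# Agrees with the reported 77.3% and 54.6% up to rounding of the input MAEs.
```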
ANOVA based on MAE also supported the significant advantages of LightGBM: its error reduction relative to every comparative model was statistically significant. The MAE improvement reached 79.1% compared to Transformer and 77.3% compared to TCN, and even compared to the second-best Attention-based TCN, the error was reduced by 54.6% (p = 0.000719 < 0.01). This is consistent with the finding that LightGBM’s average MAE (0.067) was significantly lower than that of other models, indicating that its high precision in capturing subtle changes in fatigue states is not accidental. The ANOVA results for MAE are shown in Table 8.
In addition, ANOVA also revealed the value of the innovative model, the Attention-based TCN: although its overall performance was still inferior to LightGBM, its improvements over the basic TCN, with accuracy increased by 14.7 percentage points (from 70.67% to 85.33%) and MAE decreased by 0.1466 (from 0.2934 to 0.1468), are of practical significance. Moreover, these improvements were verified at the statistical level, indicating that a temporal convolutional network integrated with the self-attention mechanism can indeed enhance the ability to capture dynamic changes in fatigue.
In summary, ANOVA statistically confirmed the significant advantages of LightGBM in accuracy and error control, while verifying the reliability of performance differences among models, providing solid support for the conclusion that LightGBM is more suitable for real-time deployment in industrial scenarios.
The advantages of the LightGBM model are attributed to its unique characteristics. As a lightweight implementation of the Gradient Boosting Machine (GBM), it significantly outperforms deep learning architectures in computational efficiency. This effectively reduces the training and testing time of the model, reserving more ample emergency response time for control personnel. Additionally, the model has relatively low requirements for data and computing resources, and its deployment cost is much lower than that of deep learning models, making it more applicable in practical scenarios. In terms of actual performance, when predicting the last 15 instantaneous fatigue states of each participant in the test set, the model reached an accuracy of 93.33% (MAE = 0.067), correctly identifying 14 out of 15 states, with the prediction error within a negligible range. This indicates that the model can predict the mental fatigue state in the next 30 s with good accuracy, providing solid and reliable technical support for the field of mental fatigue prediction.

5. Discussion and Conclusions

5.1. Contribution of the Work

Firstly, since data collection in this experiment relies solely on a camera (eliminating the need for wearable sensors), the fatigue prediction approach can potentially be extended to a wide range of application domains. Nevertheless, given the unique characteristics of different scenarios and specific cases, these hypotheses necessitate further in-depth investigation and empirical validation to establish their validity and feasibility in both academic research and practical applications.
Secondly, to resolve the conflict between discrete labels and the continuous nature of fatigue, an innovative reverse data binning technique is proposed. Specifically, a uniform random perturbation within the range of (−0.3, 0.3) is introduced to the discrete KSS (Karolinska Sleepiness Scale) labels ranging from 1 to 5, thereby converting them into continuous values. This approach not only retains the original hierarchical information but also mitigates the interference of discrete jumps on time series prediction. Consequently, it provides label data that is more in line with the natural evolution law of fatigue for time series models.
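The reverse data binning step described above can be sketched in a few lines of NumPy; this is a minimal illustration of the idea (discrete labels plus uniform noise), with hypothetical label values and a fixed seed for reproducibility.

```python
import numpy as np

def reverse_bin(labels, half_width=0.3, seed=0):
    """Convert discrete KSS labels (1-5) into continuous values by adding
    uniform random noise in (-half_width, half_width)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=float)
    return labels + rng.uniform(-half_width, half_width, size=labels.shape)

discrete = np.array([1, 2, 2, 3, 4, 5])
continuous = reverse_bin(discrete)
# Each continuous value stays within 0.3 of its source label, so rounding
# recovers the original level while discrete jumps are softened.
print(continuous)
```

Because the perturbation half-width (0.3) is below 0.5, no sample can cross to a neighboring level: the ordinal information is fully preserved.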
Thirdly, in response to the need for optimizing data quality and prediction efficiency, a novel dynamic screening strategy based on a three-stage fatigue model is devised. Using a 15 s sliding window (equivalent to seven time steps) as the unit, the instantaneous state is classified into four categories by comparing the mean values of the previous and subsequent three states (with a threshold of 3). Specifically, only the fatigue state (State 4) and key transitional patterns (data with an interference step length ≤ 7 in State 2 or State 3) are retained. This strategy effectively eliminates redundant information from non-fatigue states and addresses the issue of poor data quality at the end of the dataset. As a result, it significantly enhances the relevance and effectiveness of the data used for prediction.
Fourthly, in view of the limitations of single models in capturing complex temporal patterns, this study puts forward a hybrid architecture that integrates the core strengths of the TCN (Temporal Convolutional Network) and the Transformer. The model preserves the causal convolution and dilated convolution mechanisms of the TCN, which efficiently capture long-range dependencies within time series data. By embedding multi-head self-attention layers after each TCN layer, the model can adaptively focus on crucial features, thereby enhancing the pertinence of feature representation. This integration overcomes the inherent drawbacks of the traditional TCN, such as its insufficient global correlation modeling capability, as well as the low efficiency of the Transformer in processing long sequences. As a result, it enables the precise capture of multi-level temporal features during the dynamic evolution of mental fatigue. To further enhance the transparency and reproducibility of this study, the raw experimental dataset of the ten participants (including original facial feature coordinates, initial discrete KSS scores, and anonymized metadata) has been uploaded to the public GitHub repository https://github.com/nc628/Experimental-dataset-for-mental-fatigue-prediction (accessed on 22 August 2025). The software used for data preprocessing and organizing the Excel dataset includes Python 3.9 (for data cleaning and formatting) and Microsoft Excel 2021 (for final review and storage).
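The causal dilated convolution at the heart of the TCN component can be illustrated in a few lines of NumPy. This is a didactic sketch of the mechanism only (one channel, one filter); it does not reproduce the attention layers or the trained hybrid model.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution: the output at step t depends only on
    x[t], x[t-d], x[t-2d], ..., so no future information leaks backward."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad to keep the output causal
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(8, dtype=float)  # toy input sequence 0..7
y = causal_dilated_conv1d(x, kernel=[1.0, 1.0], dilation=2)
# y[t] = x[t] + x[t-2], with zeros before the start of the sequence.
print(y)
```

Stacking such layers with exponentially growing dilations (1, 2, 4, ...) is what gives the TCN its long receptive field; in the hybrid architecture, a multi-head self-attention layer after each such layer then reweights the time steps globally.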
Finally, a key contribution of this study lies in the rigorous confirmation that LightGBM is the optimal model for predicting mental fatigue, a conclusion validated through both performance metrics and statistical analysis. In the comparative evaluation of five models (LightGBM, GRU, TCN, Transformer, and Attention-based TCN) using the constructed facial feature dataset, LightGBM demonstrated outstanding performance, with an average accuracy of 93.33% and a mean absolute error of 0.067, outperforming all other models, including the innovative Attention-based TCN. To further substantiate these findings, one-way analysis of variance (ANOVA) was employed, and the results indicated that the performance differences among different models were statistically significant (p < 0.01), thus confirming that LightGBM’s advantages in terms of accuracy and mean absolute error reduction were not accidental. This combination of performance comparison and statistical rigor underscores the practical applicability of LightGBM for real-time deployment in industrial control rooms, where precision and reliability are crucial for preventing operational errors caused by fatigue.
This study advances the domain of mental fatigue assessment by addressing critical limitations in existing research, as illustrated through comparative analyses with two seminal works. In contrast to Pan et al.’s fatigue state recognition system for miners [46], which relies on invasive multimodal sensors and achieves 93.15% accuracy via Transformer+ fusion, our research exclusively leverages non-invasive facial feature data acquired through computer vision. This approach eliminates operational interference, a pivotal advantage for dynamic industrial control room environments, while achieving comparable accuracy (93.33%) with a lower mean absolute error (0.067) using the lightweight LightGBM model, thereby enhancing feasibility for real-time deployment. Distinct from Yamada and Kobayashi’s study [31], which focuses on static fatigue detection in non-task video viewing scenarios (attaining 91.0% accuracy through eye-tracking data and SVM classification), our work redefines mental fatigue prediction as a time-series regression task to capture the continuous evolutionary trajectory of fatigue. By introducing reverse data binning (transforming discrete KSS labels into continuous values via random perturbations ≤ 0.3) and a 6 s rule-based screening strategy, we enable proactive forecasting of fatigue states rather than mere static detection. Furthermore, our rigorous statistical validation via one-way ANOVA (p < 0.01) confirms the superiority of the selected model, a methodological rigor not employed in the aforementioned studies.

5.2. Limitation and Future Work Opportunity

In terms of limitations, first, the sample characteristics of existing datasets exhibit certain constraints. The experimental participants are predominantly concentrated in the young age group of 20–25 years. Moreover, there are disparities between the complexity of the simulated work scenarios and that of the real industrial environment. In real-world scenarios, there can be more intricate variables such as concurrent multi-tasks and unexpected disruptions (e.g., equipment alarms, personnel communication). The impacts of these factors on the state of personnel have not been fully incorporated into the model training process. As a result, this may restrict the generalization ability of the model in actual industrial scenarios. Second, the current technical approaches inadequately portray the dynamic interactions among “state–environment–task”. The existing studies mainly center on the facial features of individuals themselves, while overlooking the associations between external factors such as environmental parameters (e.g., light intensity in the control room, noise decibel level) and task attributes (e.g., operation complexity, urgency) and the state of personnel. This makes it arduous to establish a more comprehensive state assessment framework. Third, regarding model testing, although the LightGBM model demonstrates excellent performance in mental fatigue prediction research, deep-learning models (e.g., TCN and Transformer) still hold potential in some aspects. Future research can further explore ways to improve the performance of deep-learning models in mental fatigue prediction tasks through model integration or structural optimization.
Future research can be extended across multiple dimensions. Firstly, investigations into cross-modal data fusion should be promoted: beyond facial features, non-contact physiological signals (e.g., remote heart rate monitoring), environmental sensor data (including temperature, humidity, and noise), and task log information can be incorporated, and the correlational analysis of multi-source data can further enhance the comprehensiveness and robustness of state assessment. Secondly, the exploration of adaptive scene transfer techniques is of great significance: leveraging transfer learning and domain adaptation algorithms, models trained in laboratory settings can be transferred to real-world scenarios across different industries, such as chemical engineering, power generation, and manufacturing, and fine-tuning with a limited amount of on-site data can mitigate the performance degradation caused by scene disparities. Thirdly, due attention should be given to ethical and privacy protection technologies: during data collection and model application, facial feature anonymization algorithms should be developed so that monitoring precision is balanced against the protection of personnel privacy, facilitating the practical implementation of the technology within a compliant framework.

5.3. Conclusions

This research is centered on predicting the mental fatigue of operators in industrial control rooms. By collecting facial data through cameras, a fatigue sample dataset grounded in facial features is established. The fatigue levels are evaluated using an improved Karolinska Sleepiness Scale (KSS). The study redefines the task of fatigue prediction as a time-series regression problem and introduces two innovative data-processing techniques. The first is reverse data binning, which converts discrete KSS labels into continuous values through random perturbations (≤0.3). This conversion facilitates temporal modeling. The second is a fatigue-aware data-screening method. By applying the 6 s rule and a sliding window, this method retains crucial transition patterns.
The performance of five prediction models, namely LightGBM, GRU, TCN, Transformer, and Attention-based TCN, is evaluated. The results indicate that LightGBM demonstrates remarkable performance, achieving an accuracy of 93.33% and a mean absolute error of 0.067. It significantly outperforms the deep-learning models. Moreover, its computational efficiency further validates its suitability for real-time implementation.
This research integrates predictive modeling with industrial safety applications. It provides evidence for the viability of machine learning in proactive fatigue management and offers a practical approach to preventing operational errors caused by mental fatigue in industrial environments.

Author Contributions

Conceptualization, Y.C., J.C., X.X., W.Y. and Z.J.; methodology, Y.C., J.C. and W.Y.; software, J.C.; validation, J.C.; formal analysis, J.C. and Z.J.; data curation, J.C. and X.X.; writing—original draft, Y.C., J.C. and X.X.; writing—review and editing, Y.C., J.C., W.Y. and Z.J.; visualization, J.C. and X.X.; supervision, Z.J.; project administration, J.C. and X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Youth Project of the National Natural Science Foundation of China (project number C23210200010), which provided funding for data collection and equipment.

Data Availability Statement

The raw experimental dataset of the ten participants (including original facial feature coordinates, initial discrete KSS scores, and anonymized metadata) has been uploaded to the public GitHub dataset at https://github.com/nc628/Experimental-dataset-for-mental-fatigue-prediction (accessed on 22 August 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tasdelen, A.; Özpınar, A. The impacts of mental and physical fatigue of employees on the perception level and the risk of accident. Avrupa Bilim Teknol. Derg. 2020, 195–205. [Google Scholar] [CrossRef]
  2. Zakeri, Z.; Omurtag, A.; Breedon, P.; Hilliard, G.; Khalid, A. (Eds.) Studying mental stress factor in occupational safety in the context of the smart factory. In Proceedings of the 31st European Safety and Reliability Conference, ESREL, Angers, France, 19–23 September 2021. [Google Scholar]
  3. Sadeghniiat-Haghighi, K.; Yazdi, Z. Fatigue management in the workplace. Ind. Psychiatry J. 2015, 24, 12–17. [Google Scholar] [CrossRef] [PubMed]
  4. Adão Martins, N.R.; Annaheim, S.; Spengler, C.M.; Rossi, R.M. Fatigue monitoring through wearables: A state-of-the-art review. Front. Physiol. 2021, 12, 790292. [Google Scholar] [CrossRef]
  5. Wilson, M.K.; Strickland, L.; Ballard, T.; Griffin, M.A. The next generation of fatigue prediction models: Evaluating current trends in biomathematical modelling. Theor. Issues Ergon. Sci. 2024, 25, 2521–2543. [Google Scholar] [CrossRef]
  6. Yu, Y.; Li, H.; Yang, X.; Kong, L.; Luo, X.; Wong, A.Y. An automatic and non-invasive physical fatigue assessment method for construction workers. Autom. Constr. 2019, 103, 1–12. [Google Scholar] [CrossRef]
  7. Laverghetta, A., Jr.; Tran, M.; Braynen, A.; Steinle, S.; Moydinboyev, B.; Daas, H.; Licato, J. A survey of fatigue measures and models. J. Def. Model. Simul. 2025, 22, 147–173. [Google Scholar] [CrossRef]
  8. Monteiro, T.G.; Skourup, C.; Zhang, H. Using EEG for mental fatigue assessment: A comprehensive look into the current state of the art. IEEE Trans. Hum.-Mach. Syst. 2019, 49, 599–610. [Google Scholar] [CrossRef]
  9. Qi, P.; Ru, H.; Gao, L.; Zhang, X.; Zhou, T.; Tian, Y.; Thakor, N.; Bezerianos, A.; Li, J.; Sun, Y. Neural mechanisms of mental fatigue revisited: New insights from the brain connectome. Engineering 2019, 5, 276–286. [Google Scholar] [CrossRef]
  10. Zhang, Y.; Hua, C. Driver fatigue recognition based on facial expression analysis using local binary patterns. Optik 2015, 126, 4501–4505. [Google Scholar] [CrossRef]
  11. Lehrer, A.M. A systems-based framework to measure, predict, and manage fatigue. Rev. Hum. Factors Ergon. 2015, 10, 194–252. [Google Scholar] [CrossRef]
  12. Lekkas, D.; Price, G.D.; Jacobson, N.C. Using smartphone app use and lagged-ensemble machine learning for the prediction of work fatigue and boredom. Comput. Hum. Behav. 2022, 127, 107029. [Google Scholar] [CrossRef]
  13. Duffy, J.F.; Zitting, K.-M.; Czeisler, C.A. The case for addressing operator fatigue. Rev. Hum. Factors Ergon. 2015, 10, 29–78. [Google Scholar] [CrossRef]
  14. Esteves, T.S.C. Sleepy Drivers: Eye-State and Drowsiness Monitoring Using Face Video. Master’s Thesis, Universidade NOVA de Lisboa, Lisboa, Portugal, 2023. [Google Scholar]
  15. Fu, B.; Boutros, F.; Lin, C.-T.; Damer, N. A survey on drowsiness detection–modern applications and methods. In IEEE Transactions on Intelligent Vehicles; IEEE: New York, NY, USA, 2024. [Google Scholar]
  16. Assaf, A.H.; Abdessalem, H.B.; Frasson, C. Detection and recuperation of mental fatigue. J. Behav. Brain Sci. 2023, 13, 15–31. [Google Scholar] [CrossRef]
  17. Jiang, Y.; Malliaras, P.; Chen, B.; Kulić, D. Real-time forecasting of exercise-induced fatigue from wearable sensors. Comput. Biol. Med. 2022, 148, 105905. [Google Scholar] [CrossRef]
  18. Chen, Y.; Liu, W.; Zhang, L.; Yan, M.; Zeng, Y. Hybrid facial image feature extraction and recognition for non-invasive chronic fatigue syndrome diagnosis. Comput. Biol. Med. 2015, 64, 30–39. [Google Scholar] [CrossRef] [PubMed]
  19. Wang, H.; Han, M.; Avouka, T.; Chen, R.; Wang, J.; Wei, R. Research on fatigue identification methods based on low-load wearable ECG monitoring devices. Rev. Sci. Instrum. 2023, 94, 045103. [Google Scholar] [CrossRef]
  20. Miley, A.Å.; Kecklund, G.; Åkerstedt, T. Comparing two versions of the Karolinska Sleepiness Scale (KSS). Sleep Biol. Rhythm. 2016, 14, 257–260. [Google Scholar] [CrossRef]
  21. Cheng, B.; Fan, C.; Fu, H.; Huang, J.; Chen, H.; Luo, X. Measuring and computing cognitive statuses of construction workers based on electroencephalogram: A critical review. IEEE Trans. Comput. Soc. Syst. 2022, 9, 1644–1659. [Google Scholar] [CrossRef]
  22. Van Cutsem, J.; Marcora, S.; De Pauw, K.; Bailey, S.; Meeusen, R.; Roelands, B. The Effects of Mental Fatigue on Physical Performance: A Systematic Review. Sports Med. 2017, 47, 1569–1588. [Google Scholar] [CrossRef] [PubMed]
  23. Kunasegaran, K.; Ismail, A.M.H.; Ramasamy, S.; Gnanou, J.V.; Caszo, B.A.; Chen, P.L. Understanding mental fatigue and its detection: A comparative analysis of assessments and tools. PeerJ 2023, 11, e15744. [Google Scholar] [CrossRef] [PubMed]
  24. Tsai, M.-K. Enhancing nuclear power plant safety via on-site mental fatigue management. Nucl. Technol. Radiat. Prot. 2017, 32, 109–114. [Google Scholar] [CrossRef]
  25. Rudin-Brown, C.M.; Rosberg, A. Applying principles of fatigue science to accident investigation: Transportation Safety Board of Canada (TSB) fatigue investigation methodology. Chronobiol. Int. 2021, 38, 296–300. [Google Scholar] [CrossRef] [PubMed]
  26. Chen, K.; Liu, Z.; Liu, Q.; Ai, Q.; Ma, L. EEG-based mental fatigue detection using linear prediction cepstral coefficients and Riemann spatial covariance matrix. J. Neural Eng. 2022, 19, 066021. [Google Scholar] [CrossRef]
  27. Ahn, S.; Nguyen, T.; Jang, H.; Kim, J.G.; Jun, S.C. Exploring neuro-physiological correlates of drivers’ mental fatigue caused by sleep deprivation using simultaneous EEG, ECG, and fNIRS data. Front. Hum. Neurosci. 2016, 10, 219. [Google Scholar] [CrossRef]
  28. Qi, P.; Hu, H.; Zhu, L.; Gao, L.; Yuan, J.; Thakor, N.; Bezerianos, A.; Sun, Y. EEG functional connectivity predicts individual behavioural impairment during mental fatigue. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 2080–2089. [Google Scholar] [CrossRef]
  29. Bafna, T.; Bækgaard, P.; Hansen, J.P. Mental fatigue prediction during eye-typing. PLoS ONE 2021, 16, e0246739. [Google Scholar] [CrossRef]
  30. Kashevnik, A.; Kovalenko, S.; Mamonov, A.; Hamoud, B.; Bulygin, A.; Kuznetsov, V.; Shoshina, I.; Brak, I.; Kiselev, G. Intelligent Human Operator Mental Fatigue Assessment Method Based on Gaze Movement Monitoring. Sensors 2024, 24, 6805. [Google Scholar] [CrossRef]
  31. Yamada, Y.; Kobayashi, M. Detecting mental fatigue from eye-tracking data gathered while watching video: Evaluation in younger and older adults. Artif. Intell. Med. 2018, 91, 39–48. [Google Scholar] [CrossRef]
  32. Butkevičiūtė, E.; Michalkovič, A.; Bikulčienė, L. ECG signal features classification for the mental fatigue recognition. Mathematics 2022, 10, 3395. [Google Scholar] [CrossRef]
  33. Ding, S.; Pan, X.; Han, R.; Zeng, X.; Li, Y.; Zheng, X. (Eds.) A machine learning approach to reduce mental fatigue risk of pilots based on HRV data. In Proceedings of the 12th International Conference on Quality, Reliability, Risk, Maintenance, and Safety Engineering (QR2MSE 2022), Emeishan, China, 27–30 July 2022; IET: Stevenage, UK, 2022. [Google Scholar]
  34. Chen, R.; Wang, R.; Fei, J.; Huang, L.; Wang, J. Quantitative identification of daily mental fatigue levels based on multimodal parameters. Rev. Sci. Instrum. 2023, 94, 095106. [Google Scholar] [CrossRef] [PubMed]
  35. Kodikara, C.; Wijekoon, S.; Meegahapola, L. FatigueSense: Multi-device and multimodal wearable sensing for detecting mental fatigue. ACM Trans. Comput. Healthc. 2025, 6, 1–36. [Google Scholar] [CrossRef]
  36. Song, A.; Niu, C.; Ding, X.; Xu, X.; Song, Z. Mental fatigue prediction model based on multimodal fusion. IEEE Access 2019, 7, 177056–177062. [Google Scholar] [CrossRef]
  37. Zhou, F.; Alsaid, A.; Blommer, M.; Curry, R.; Swaminathan, R.; Kochhar, D.; Talamonti, W.; Tijerina, L.; Lei, B. Driver fatigue transition prediction in highly automated driving using physiological features. Expert. Syst. Appl. 2020, 147, 113204. [Google Scholar] [CrossRef]
  38. Cos, C.-A.; Lambert, A.; Soni, A.; Jeridi, H.; Thieulin, C.; Jaouadi, A. Enhancing Mental Fatigue Detection through Physiological Signals and Machine Learning Using Contextual Insights and Efficient Modelling. J. Sens. Actuator Netw. 2023, 12, 77. [Google Scholar] [CrossRef]
  39. Zeng, Z.; Huang, Z.; Leng, K.; Han, W.; Niu, H.; Yu, Y.; Ling, Q.; Liu, J.; Wu, Z.; Zang, J. Nonintrusive monitoring of mental fatigue status using epidermal electronic systems and machine-learning algorithms. ACS Sens. 2020, 5, 1305–1313. [Google Scholar] [CrossRef]
  40. Liu, Y.; Lan, Z.; Khoo, H.H.G.; Li, K.H.H.; Sourina, O.; Mueller-Wittig, W. (Eds.) EEG-based evaluation of mental fatigue using machine learning algorithms. In Proceedings of the 2018 International Conference on Cyberworlds (CW), Singapore, 3–5 October 2018; IEEE: New York, NY, USA, 2018. [Google Scholar]
  41. Wu, D.; Deng, L.; Lu, Q.; Liu, S. A multidimensional adaptive transformer network for fatigue detection. Cogn. Neurodyn. 2025, 19, 43. [Google Scholar] [CrossRef]
  42. Zhao, P.; Lian, C.; Xu, B.; Zeng, Z. Multiscale global prompt transformer for EEG-based driver fatigue recognition. In IEEE Transactions on Automation Science and Engineering; IEEE: New York, NY, USA, 2024. [Google Scholar]
  43. Zhang, Y.; Chen, Y.; Pan, Z. (Eds.) A deep temporal model for mental fatigue detection. In Proceedings of the 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Miyazaki, Japan, 7–10 October 2018; IEEE: New York, NY, USA, 2018. [Google Scholar]
  44. Liu, Y.; Liu, Y.; Yang, Y. Prediction of contact fatigue performance degradation trends based on multi-domain features and temporal convolutional networks. Entropy 2023, 25, 1316. [Google Scholar] [CrossRef]
  45. Ma, Y.; Shao, Y.; Xue, Z.; Yu, Z. (Eds.) Urban fatigue driving prediction with federated learning. In Proceedings of the 2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS), Xi’an, China, 7–8 November 2021; IEEE: New York, NY, USA, 2021. [Google Scholar]
  46. Pan, H.; Tong, S.; Wei, X.; Teng, B. Fatigue state recognition system for miners based on a multi-modal feature extraction and fusion framework. In IEEE Transactions on Cognitive and Developmental Systems; IEEE: New York, NY, USA, 2024. [Google Scholar]
  47. Dore, H. Methods for Improving Data Acquisition and Signal Processing for Monitoring the ECG; University of Sussex: Brighton, UK, 2023. [Google Scholar]
  48. Sikander, G.; Anwar, S. Driver fatigue detection systems: A review. IEEE Trans. Intell. Transp. Syst. 2018, 20, 2339–2352. [Google Scholar] [CrossRef]
  49. Aggarwal, V.; Gupta, V.; Singh, P.; Sharma, K.; Sharma, N. (Eds.) Detection of spatial outlier by using improved Z-score test. In Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 23–25 April 2019; IEEE: New York, NY, USA, 2019. [Google Scholar]
  50. Vinutha, H.; Poornima, B.; Sagar, B. (Eds.) Detection of outliers using interquartile range technique from intrusion dataset. Information and decision sciences. In Proceedings of the 6th International Conference on Ficta, Bhubaneswar, Odisha, India, 14–16 October 2017; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
  51. Kappal, S. Data normalization using median median absolute deviation MMAD based Z-score for robust predictions vs. min–max normalization. Lond. J. Res. Sci. Nat. Form. 2019, 19, 39–44. [Google Scholar]
  52. Dhawas, P.; Dhore, A.; Bhagat, D.; Pawar, R.D.; Kukade, A.; Kalbande, K. Big data preprocessing, techniques, integration, transformation, normalisation, cleaning, discretization, and binning. In Big Data Analytics Techniques for Market Intelligence; IGI Global Scientific Publishing: Hershey, PA, USA, 2024; pp. 159–182. [Google Scholar]
  53. Priya, S.M. Hyper tuning using gridsearchcv on machine learning models for prognosticating dementia. Preprint 2022. [Google Scholar] [CrossRef]
  54. Eriksson, A.; Stanton, N.A. Takeover time in highly automated vehicles: Noncritical transitions to and from manual control. Hum. Factors 2017, 59, 689–705. [Google Scholar] [CrossRef] [PubMed]
  55. Jeunet, C.; N’Kaoua, B.; Subramanian, S.; Hachet, M.; Lotte, F. Predicting mental imagery-based BCI performance from personality, cognitive profile and neurophysiological patterns. PLoS ONE 2015, 10, e0143962. [Google Scholar] [CrossRef]
  56. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  57. Li, S.; Jin, N.; Dogani, A.; Yang, Y.; Zhang, M.; Gu, X. Enhancing LightGBM for industrial fault warning: An innovative hybrid algorithm. Processes 2024, 12, 221. [Google Scholar] [CrossRef]
  58. Fu, R.; Zhang, Z.; Li, L. (Eds.) Using LSTM and GRU neural network methods for traffic flow prediction. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; IEEE: New York, NY, USA, 2016. [Google Scholar]
  59. Dey, R.; Salem, F.M. (Eds.) Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; IEEE: New York, NY, USA, 2017. [Google Scholar]
  60. Shiri, F.M.; Perumal, T.; Mustapha, N.; Mohamed, R. A comprehensive overview and comparative analysis on deep learning models: CNN, RNN, LSTM, GRU. arXiv 2023, arXiv:2305.17473. [Google Scholar] [CrossRef]
  61. Gopali, S.; Abri, F.; Siami-Namini, S.; Namin, A.S. (Eds.) A comparison of TCN and LSTM models in detecting anomalies in time series data. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; IEEE: New York, NY, USA, 2021. [Google Scholar]
  62. Limouni, T.; Yaagoubi, R.; Bouziane, K.; Guissi, K.; Baali, E.H. Accurate one step and multistep forecasting of very short-term PV power using LSTM-TCN model. Renew. Energy 2023, 205, 1010–1024. [Google Scholar] [CrossRef]
  63. So, D.; Le, Q.; Liang, C. (Eds.) The evolved transformer. In Proceedings of the 36th International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019. [Google Scholar]
  64. Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in transformer. Adv. Neural Inf. Process. Syst. 2021, 34, 15908–15919. [Google Scholar]
  65. Karim, E.; Pavel, H.R.; Nikanfar, S.; Hebri, A.; Roy, A.; Nambiappan, H.R.; Jaiswal, A.; Wylie, G.R.; Makedon, F. Examining the landscape of cognitive fatigue detection: A comprehensive survey. Technologies 2024, 12, 38. [Google Scholar] [CrossRef]
Figure 1. Overall framework for predicting mental fatigue.
Figure 2. Facial feature points and distribution map of facial feature point coordinates. In the figure, the points on both the x-axis and y-axis represent coordinate positions, which are used to mark the coordinates of facial feature points.
Figure 3. Diagram of data binning.
Figure 4. Illustration of discrete label continuous transformation.
Figure 5. Schematic diagram of data filtering logic.
Figure 6. GRU algorithm network structure diagram.
Figure 7. Extended causal convolution layer example.
Figure 8. The network structural diagram of the Attention-based TCN.
Figure 9. Radar chart comparing the performance of the mental fatigue prediction algorithms.
Table 1. The original Karolinska Sleepiness Scale.
| Description | Fatigue Level |
|---|---|
| Extremely alert | 1 |
| Very alert | 2 |
| Alert | 3 |
| Rather alert | 4 |
| Neither alert nor sleepy | 5 |
| Some signs of sleepiness | 6 |
| Sleepy, but no effort to keep awake | 7 |
| Sleepy, but some effort to keep awake | 8 |
| Very sleepy, great effort to keep awake, fighting sleep | 9 |
| Extremely sleepy, cannot keep awake | 10 |
Table 2. The adapted Karolinska Sleepiness Scale.
| Description | Fatigue Level |
|---|---|
| Very clearly alert | 1 |
| Somewhat alert, but less so than level 1 | 2 |
| Some signs of sleepiness | 3 |
| Sleepy | 4 |
| Very sleepy | 5 |
Table 3. The classification and explanation of mental fatigue states.
| Rule | Status Definition | Status Explanation |
|---|---|---|
| Average(x1 + x2 + x3) < 3 and Average(x5 + x6 + x7) < 3 | State 1 | Non-fatigued state |
| Average(x1 + x2 + x3) < 3 and Average(x5 + x6 + x7) ≥ 3 | State 2 | Intermediate state in the transition from a non-fatigued state to a fatigued state |
| Average(x1 + x2 + x3) ≥ 3 and Average(x5 + x6 + x7) < 3 | State 3 | On the verge of fatigue but striving hard to stay awake |
| Average(x1 + x2 + x3) ≥ 3 and Average(x5 + x6 + x7) ≥ 3 | State 4 | State of fatigue |
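The rules of Table 3 translate directly into a branch on the two averaged score groups. The sketch below is a hypothetical helper (the paper defines the rules, not this code); `x` holds the seven sub-scores x1..x7, of which x4 is not used by any rule.

```python
def classify_state(x, threshold=3):
    """Map the seven sub-scores x1..x7 to one of the four fatigue states of
    Table 3, comparing the averages of (x1, x2, x3) and (x5, x6, x7)."""
    first = sum(x[0:3]) / 3    # Average(x1 + x2 + x3)
    second = sum(x[4:7]) / 3   # Average(x5 + x6 + x7); x4 is unused
    if first < threshold and second < threshold:
        return 1  # non-fatigued
    if first < threshold:
        return 2  # transitioning from non-fatigued to fatigued
    if second < threshold:
        return 3  # on the verge of fatigue, fighting to stay awake
    return 4      # fatigued
```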
Table 4. Data screening rules for mental fatigue prediction.
| Rule Number | Rule Content | Reason |
|---|---|---|
| 01 | No instantaneous state belonging to State 1 is retained. | State 1 is a strictly non-fatigued state, and such data would distort the overall prediction of mental fatigue states. |
| 02 | Where State 3 and State 1 samples are interspersed between State 2 samples, they are retained if the total length of the interspersed run does not exceed seven steps; otherwise, they are discarded. | Based on the first introduced concept, the upper limit on the time a person needs to respond after being alerted from a fatigued state is 15 s. Seven or more consecutive occurrences of State 1 or State 3 between State 2 samples therefore indicate that the person was in a relatively alert state and capable of responding promptly during this period. |
| 03 | Find the last sequence number that satisfies Rule 02 (State 2 with a run of State 3 or State 1 shorter than seven steps in the middle); data from that sequence onward are used for training. | |
| 04 | All samples belonging to State 4 are retained. | State 4 is a strictly fatigued state and is fully retained as the data required for this research. |
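A simplified sketch of the screening logic above: State 2 and State 4 samples are always kept, and runs of State 1/State 3 are kept only when bracketed by State 2 and no longer than seven steps. How Rules 01 and 02 interact for bracketed State 1 samples, and the treatment of sequence boundaries, are assumptions of this sketch rather than details given in the paper.

```python
def screen_states(states, max_gap=7):
    """Filter a sequence of instantaneous fatigue states (1-4) following a
    simplified reading of the Table 4 rules."""
    n = len(states)
    keep = [False] * n
    i = 0
    while i < n:
        if states[i] in (2, 4):      # Rules 02 anchor and 04: always keep
            keep[i] = True
            i += 1
            continue
        # measure the run of State 1 / State 3 starting at position i
        j = i
        while j < n and states[j] not in (2, 4):
            j += 1
        bracketed = i > 0 and states[i - 1] == 2 and j < n and states[j] == 2
        if bracketed and j - i <= max_gap:   # Rule 02: short bracketed run
            for k in range(i, j):
                keep[k] = True
        i = j
    return [s for s, kept in zip(states, keep) if kept]
```

For example, a run of eight State 3 samples between two State 2 samples is discarded, while a run of two is retained.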
Table 5. Optimal parameters of LightGBM algorithm.
| Parameter Name | Parameter Value | Parameter Name | Parameter Value |
|---|---|---|---|
| boosting_type | gbdt | subsample | 0.6 |
| metric | mse | n_estimators | 130 |
| num_leaves | 18 | subsample_freq | 0 |
| learning_rate | 0.01 | max_depth | 10 |
| num_iterations | 130 | reg_lambda | 0.001 |
| min_child_samples | 11 | reg_alpha | 0.01 |
| max_bin | 250 | | |
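For reference, the Table 5 settings collected into a single parameter dictionary. Passing it to `lightgbm.LGBMRegressor(**params)` assumes the lightgbm package is installed, so that line is left as a comment; the dictionary itself simply mirrors the table.

```python
# Tuned LightGBM hyper-parameters from Table 5.
params = {
    "boosting_type": "gbdt",
    "metric": "mse",
    "num_leaves": 18,
    "learning_rate": 0.01,
    "num_iterations": 130,
    "min_child_samples": 11,
    "max_bin": 250,
    "subsample": 0.6,
    "n_estimators": 130,
    "subsample_freq": 0,
    "max_depth": 10,
    "reg_lambda": 0.001,
    "reg_alpha": 0.01,
}
# model = lightgbm.LGBMRegressor(**params)  # requires the lightgbm package
```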
Table 6. Comparison of prediction effects of different algorithms.
| Algorithm Model | Indicator | Exp. 01 | Exp. 02 | Exp. 03 | Exp. 04 | Exp. 05 | Exp. 06 | Exp. 07 | Exp. 08 | Exp. 09 | Exp. 10 | Average Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LightGBM | Accuracy | 93.33% | 86.67% | 100% | 93.33% | 93.33% | 86.67% | 100% | 86.67% | 93.30% | 100% | 93.33% |
| LightGBM | MAE | 0.067 | 0.133 | 0 | 0.067 | 0.067 | 0.133 | 0 | 0.133 | 0.067 | 0 | 0.067 |
| GRU | Accuracy | 80.00% | 66.67% | 80% | 93.33% | 60% | 73.33% | 60% | 86.67% | 66.67% | 86.67% | 75.33% |
| GRU | MAE | 0.2 | 0.333 | 0.2 | 0.067 | 0.4 | 0.267 | 0.4 | 0.133 | 0.333 | 0.133 | 0.247 |
| TCN | Accuracy | 53.33% | 73.33% | 100% | 93.33% | 40% | 73.33% | 66.67% | 66.67% | 80% | 60% | 70.67% |
| TCN | MAE | 0.467 | 0.267 | 0 | 0.067 | 0.6 | 0.267 | 0.333 | 0.333 | 0.2 | 0.4 | 0.293 |
| Transformer | Accuracy | 86.67% | 46.67% | 86.67% | 86.67% | 53.33% | 66.67% | 60% | 73.33% | 66.67% | 73.33% | 70% |
| Transformer | MAE | 0.133 | 0.533 | 0.133 | 0.133 | 0.667 | 0.333 | 0.4 | 0.267 | 0.333 | 0.267 | 0.32 |
| Attention-based TCN | Accuracy | 93.33% | 80% | 100% | 93.33% | 66.67% | 73.33% | 80% | 93.33% | 73.33% | 100% | 85.33% |
| Attention-based TCN | MAE | 0.067 | 0.2 | 0 | 0.067 | 0.333 | 0.267 | 0.2 | 0.067 | 0.267 | 0 | 0.147 |
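As a quick consistency check, the "Average Value" column of Table 6 can be reproduced from the ten per-experimenter scores; the function name below is hypothetical.

```python
def table_average(values):
    """Mean of the per-experimenter scores, i.e. the last column of Table 6."""
    return sum(values) / len(values)

lightgbm_acc = [93.33, 86.67, 100, 93.33, 93.33, 86.67, 100, 86.67, 93.30, 100]
lightgbm_mae = [0.067, 0.133, 0, 0.067, 0.067, 0.133, 0, 0.133, 0.067, 0]
avg_acc = table_average(lightgbm_acc)  # ≈ 93.33, matching the reported average
avg_mae = table_average(lightgbm_mae)  # ≈ 0.067
```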
Table 7. Analysis of variance of Accuracy.
| Model | Comparative Model | Improvement of Accuracy | p-Value (Accuracy) | F-Value (Accuracy) |
|---|---|---|---|---|
| LightGBM | Transformer | 33.30% | 1.24 × 10⁻⁴ | 23.70 |
| LightGBM | TCN | 32.10% | 1.20 × 10⁻⁴ | 14.78 |
| LightGBM | GRU | 23.90% | 3.57 × 10⁻⁴ | 19.23 |
| LightGBM | Attention-based TCN | 9.40% | 7.25 × 10⁻⁴ | 3.64 |
Table 8. Analysis of variance of MAE.
| Model | Comparative Model | Improvement of MAE | p-Value (MAE) | F-Value (MAE) |
|---|---|---|---|---|
| LightGBM | Transformer | 79.10% | 4.15 × 10⁻⁴ | 18.64 |
| LightGBM | TCN | 77.30% | 1.20 × 10⁻⁴ | 14.80 |
| LightGBM | GRU | 73.00% | 3.56 × 10⁻⁴ | 19.25 |
| LightGBM | Attention-based TCN | 54.60% | 7.19 × 10⁻⁴ | 3.65 |
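The F-values above follow from a two-group one-way ANOVA over the per-experimenter scores of Table 6. The pure-Python sketch below (function name hypothetical) recovers the reported F ≈ 23.70 for LightGBM versus Transformer accuracy.

```python
def one_way_anova_f(group_a, group_b):
    """F-statistic of a one-way ANOVA with two groups:
    F = (SS_between / df_between) / (SS_within / df_within),
    with df_between = 1 and df_within = n_a + n_b - 2."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    ss_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    ss_within = (sum((x - mean_a) ** 2 for x in group_a)
                 + sum((x - mean_b) ** 2 for x in group_b))
    return (ss_between / 1) / (ss_within / (n_a + n_b - 2))

lightgbm_acc = [93.33, 86.67, 100, 93.33, 93.33, 86.67, 100, 86.67, 93.30, 100]
transformer_acc = [86.67, 46.67, 86.67, 86.67, 53.33, 66.67, 60, 73.33, 66.67, 73.33]
f_value = one_way_anova_f(lightgbm_acc, transformer_acc)  # ≈ 23.70, as in Table 7
```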
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Chen, Y.; Chen, J.; Xie, X.; Yi, W.; Ji, Z. Prediction of Mental Fatigue for Control Room Operators: Innovative Data Processing and Multi-Model Evaluation. Mathematics 2025, 13, 2794. https://doi.org/10.3390/math13172794
