A High-Performance Hybrid Transformer–LSTM–XGBoost Model for sEMG-Based Fatigue Detection in Simulated Roofing Postures

Acharya, Sujan; Kisi, Krishna; Gautam, Sabrin Raj; Mahmud, Tarek; Kayastha, Rujan

doi:10.3390/buildings15173005


by Sujan Acharya ¹, Krishna Kisi ¹,*, Sabrin Raj Gautam ¹, Tarek Mahmud ¹ and Rujan Kayastha ²

¹ Department of Engineering Technology, Texas State University, San Marcos, TX 78666, USA

² Material Science Engineering and Commercialization Program, Texas State University, San Marcos, TX 78666, USA

* Author to whom correspondence should be addressed.

Buildings 2025, 15(17), 3005; https://doi.org/10.3390/buildings15173005

Submission received: 18 July 2025 / Revised: 15 August 2025 / Accepted: 22 August 2025 / Published: 24 August 2025

(This article belongs to the Special Issue Safety Management and Occupational Health in Construction)


Abstract

Within the hazardous construction industry, roofers represent one of the most at-risk workforces, with high fatality and injury rates largely driven by Work-Related Musculoskeletal Disorders (WMSDs). The primary precursor to these disorders is muscle fatigue, yet its objective assessment remains a significant challenge for implementing proactive safety management. To address this gap, this study details the implementation and validation of an AI-driven predictive analytics framework for automated fatigue detection using surface electromyography (sEMG) signals. Data were collected as participants (novice roofers) performed strenuous, simulated roofing tasks involving sustained standing, stooping, and kneeling postures. A key innovation is a data-driven labeling methodology that uses Weak Monotonicity (WM) trend analysis to automate the generation of objective labels. After a feature selection process yielded seven significant features, an evaluation of standard models confirmed that their classification performance was highly posture-dependent, motivating a more robust, hybrid solution. The framework culminates in a high-performance hybrid machine learning model that combines a Transformer–LSTM network for deep feature extraction with an XGBoost classifier. The model outperformed all standalone approaches, achieving over 82% accuracy across all postures with consistently strong fatigue F1-scores (0.77–0.78). The entire framework was validated using a stringent Leave-One-Subject-Out (LOSO) cross-validation protocol to ensure subject-independent generalizability. This research provides a validated component for AI-enhanced safety management systems. Future work should prioritize field validation with professional workers to translate this framework into practical, real-world ergonomic monitoring systems.

Keywords: muscle fatigue; surface electromyography (sEMG); machine learning; construction safety; hybrid model; wearable sensors; ergonomics

1. Introduction

The construction industry, a vital pillar of economic development, consistently ranks among the most hazardous sectors for its workforce in the United States [1]. This sector was responsible for approximately one in every five workplace deaths in private industry in 2023, recording 1075 fatal injuries, the highest number in over a decade [2]. Within this hazardous landscape, roofing stands out as an exceptionally high-risk occupation [3]. Roofing work is characterized by physically strenuous tasks, repetitive motions, and awkward postures. These ergonomic hazards are the primary contributors to Work-Related Musculoskeletal Disorders (WMSDs), which are defined as injuries and disorders affecting the body’s movement system and are caused or exacerbated by physical exertion [4]. The ergonomic burdens of roofing, such as spending up to 75% of working time in non-neutral positions like kneeling or stooping, lead to cumulative overloading and predispose workers to these debilitating conditions [5,6].

These WMSDs, which often arise specifically from overexertion and bodily reactions, represent a significant portion of the economic and human toll, costing U.S. businesses over USD 16 billion annually in direct costs alone [7]. The direct physiological driver of this overexertion is muscle fatigue, defined as a decline in a muscle’s force-generating capacity from prolonged or repetitive activity [8]. As muscles tire, performance degrades, increasing the likelihood of biomechanical errors. Detecting this state is therefore critical to preventing overexertion that leads to injury.

Despite this clear causal link, the assessment of muscle fatigue in the field has traditionally been problematic. A significant body of research has relied on subjective methods in which workers rate their perceived exertion. While useful, these methods capture a perception rather than the objective physiological state of the muscle tissue, and they are prone to individual bias and interpretation. To truly prevent WMSDs, there is a critical need for proactive interventions built on objective, physiological measurement. Wearable sensor technology offers a transformative paradigm, and for localized fatigue, surface electromyography (sEMG) is an ideal tool. sEMG is a non-invasive technique that measures the electrical potential generated by muscle cells, providing a time-varying representation of the underlying muscle activity [9,10].

The raw sEMG signal is a complex time series from which numerous features can be extracted to track fatigue. These include amplitude-domain features like Root Mean Square (RMS) and frequency-domain features such as Median Frequency (MDF), which are considered classic and reliable indicators [9,11]. While these features are established, their relative effectiveness for identifying fatigue during the specific, physically demanding, and varied postures of roofing work is not well understood. This ambiguity necessitates a systematic investigation to determine which sEMG features are most effective at identifying muscle fatigue during roofing tasks.

The true potential of these features is unlocked through sophisticated machine learning (ML) algorithms [12]. Previous studies have explored various ML techniques, employing classical ensemble methods like Extreme Gradient Boosting (XGBoost) and deep learning models such as BiLSTM and CNN-LSTM to analyze sEMG signals [13,14,15]. While this work has demonstrated the potential of ML for fatigue detection, a critical review reveals that its translation into practical, real-world safety systems has been stalled by two fundamental barriers.

The first is the critical barrier of subject-dependency, a multifaceted challenge that has severely limited the translation of research into practice. At a fundamental physiological level, comparing raw sEMG signals between individuals is inherently problematic due to vast differences in muscle size, skin impedance, and electrode placement, factors that create significant inter-individual variability [10]. Without robust signal processing, particularly normalization to a subject’s Maximum Voluntary Contraction (%MVC), any model is built on a foundation of uncalibrated, inconsistent data, rendering inter-subject comparisons unreliable [16,17]. Compounding this physiological challenge is a widespread methodological flaw in model validation. Many published findings rely on protocols that fail to test for inter-subject generalizability, instead opting for less stringent methods that lead to artificially inflated performance metrics. The failure to adopt rigorous validation, particularly Leave-One-Subject-Out (LOSO) cross-validation, means that many reported models, while accurate for the individuals they were trained on, are fundamentally unreliable when applied to new workers. Together, the neglect of proper normalization and rigorous validation has populated the literature with solutions that are ultimately impractical for deployment, creating a false sense of progress and hindering the development of truly universal systems.

The second is the critical barrier of subjective and unscalable data labeling. The development of modern supervised machine learning is contingent on large, accurately labeled datasets, a need that presents a critical bottleneck as generating such data is often expensive and time-consuming [18]. To circumvent this, the field has frequently relied on creating ground-truth labels from subjective, self-reported measures [12]. However, this reliance on subjective perception is not merely a minor inconvenience; it is a fundamental obstacle that introduces ambiguity and noise into the very ground-truth data used for model training. Human judgment of internal states is essentially ordinal and can be matched to interval scales only with difficulty, and single ratings are often tainted with measurement errors [19]. This compromises the reliability of the resulting models and prevents the development of the large-scale, objectively labeled datasets required to train truly robust and generalizable AI systems.

Finally, while hybrid ML models exist, the potential synergy between different modeling paradigms has not been fully exploited. The performance of advanced multi-stage hybrid architectures, for instance, a model that uses a deep learning network like a Transformer–LSTM for automated feature representation before feeding these features into a powerful classical classifier like XGBoost, is an underexplored area. Such an approach could offer the combined benefits of both paradigms, but its efficacy for fatigue detection has not been systematically evaluated. This points to a clear need to investigate whether advanced hybrid models perform better than traditional machine learning and deep learning models.

To dismantle these barriers and build a foundational component for a truly practical AI-enhanced safety management system, this study implements a framework designed to directly confront these challenges. Therefore, our research is guided by the following questions:

(1) Which sEMG features are most effective at identifying muscle fatigue during roofing tasks?
(2) Which machine learning models have the best performance in predicting muscle fatigue using the sEMG data?
(3) Do advanced hybrid models perform better than traditional machine learning and deep learning models?

2. Methodology

The methodology of this research details the implementation of a multi-stage AI-driven predictive analytics framework designed to systematically classify localized muscle fatigue through the analysis of sEMG data. The workflow, depicted in Figure 1, begins with the collection of experimental data and proceeds through signal processing, feature engineering, and the implementation and evaluation of various machine learning models. This section provides a detailed step-by-step description of this process to ensure replicability.

2.1. Experimental Protocol and Data Collection

The foundational stage of this research involved a meticulous data acquisition protocol designed to generate high-quality physiological data from participants performing simulated occupational tasks. All procedures were reviewed and approved by the Institutional Review Board (IRB) of Texas State University under protocol #9649, ensuring adherence to ethical guidelines for human subject research.

The experiment was carried out on a custom-built simulated roofing platform set on a 30-degree slope. Upon arrival at the laboratory, each participant was first informed of the experimental protocol and gave their consent to take part. They were then outfitted with wearable sensors for instrumentation. Twelve wireless sEMG sensors (Cometa sEMG sensors) were used to measure the electrical activity of key muscle groups. For this study, we recruited only novice participants (referred to as novice roofers). This was a deliberate choice to first test our model on a controlled, consistent group before including the wide variations seen in experienced workers. Although the participants were novices, the experimental setup and the postures involved in the roofing task were verified by experts, and the participants replicated the postures as instructed. This study serves as the first phase of a larger research program that will include experienced roofers in the next study.

2.1.1. sEMG Sensor Placement

For sEMG sensor placement, the sensors were placed bilaterally on six muscle groups (shown in Figure 2 and Figure 3) critical for posture and movement during the selected tasks: erector spinae (lower back), rectus femoris (quadriceps), rectus abdominis (abdominals), biceps femoris (hamstrings), tibialis anterior (shin), and gastrocnemius (calf). The placement followed the standardized SENIAM (Surface EMG for Non-Invasive Assessment of Muscles) guidelines to ensure high signal quality, minimize crosstalk, and maintain replicability [20]. Prior to sensor placement, the skin at each location was prepared by wiping with alcohol to reduce skin impedance and improve signal conductivity.

Before the main experimental task, the MVC of each participant was determined for each of the 12 muscles. This involved performing three repetitions of muscle-specific isometric exercises [16,20]. Specific MVC tasks included the following:

Erector Spinae: Lumbar extension.
Rectus Abdominis: Sit-ups.
Rectus Femoris: Isometric knee extension at 90° flexion.
Biceps Femoris: Isometric prone leg curl.
Tibialis Anterior: Foot dorsiflexion against resistance.
Gastrocnemius: Plantar flexion of the foot while standing on one leg.

In addition, a resting period was recorded to establish a baseline sEMG value for each muscle, which is essential for the two-point normalization process used to negate the effects of hyperactive muscles.

2.1.2. Experiment

After a sufficient rest period to prevent premature fatigue, participants proceeded to the main experimental tasks. These tasks were performed on a custom-built sloped platform set at a 30-degree incline with a maximum height of 6 feet, designed to simulate a roofing environment. The core experimental tasks required participants to maintain three quasi-static postures. These specific postures and the experimental setup were selected and validated through expert consultation. Figure 4 illustrates participants performing these tasks. This study defines the specific postures as follows:

Standing: The participant maintains balance by exerting pressure on their legs, positioning one foot slightly in front of the other (Figure 4b).
Kneeling: The participant places both knees on the sloped surface, supported by 2-inch foam padding to minimize discomfort, and exerts pressure on them (Figure 4c).
Stooping: The participant bends their upper body forward while flexing their legs and applying pressure on the lower limbs (Figure 4a).

The experimental protocol varied by posture. For the stooping and kneeling tasks, each conducted on a separate day, the participants performed four repetitions of a cycle consisting of a four-minute static hold followed by a one-minute rest. The standing protocol involved alternating between left-foot-up and right-foot-up stances for a total of eight four-minute trials (four for each stance). Each trial was separated by a one-minute rest, with a single four-minute extended break provided after the first four trials. Throughout the tasks, data were collected continuously using inertial measurement unit sensors to track movement, EMG sensors to measure muscle activity, and an O₂ meter to monitor oxygen levels and pulse rate. Gautam et al. [21] performed a preliminary study to find the most activated muscle groups using a subset of this dataset.

2.2. Data Processing and Analysis

Following the experiment, the collected sEMG data was systematically processed and analyzed. This phase involved exporting and cleaning the data, normalizing the signals, applying signal processing techniques, generating fatigue labels, and calculating a comprehensive set of features.

First, the raw data from the sensors were exported for processing. A two-point MVC-based normalization was then applied. This procedure scaled the filtered sEMG signals to a range of 0 to 1, where the baseline sEMG value of the resting period was set to 0 and the peak amplitude from the MVC test was set to 1. This step is essential to reduce variability caused by differences in electrode placement, muscle size, or skin impedance [16,17].
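The two-point scaling described above can be sketched as follows. This is a minimal illustration rather than the study's script: the function name, the use of scalar baseline and peak amplitudes, and the example values are all assumptions.

```python
import numpy as np


def two_point_normalize(signal, rest_baseline, mvc_peak):
    """Map the resting baseline to 0 and the MVC peak amplitude to 1."""
    span = mvc_peak - rest_baseline
    if span <= 0:
        raise ValueError("MVC peak must exceed the resting baseline")
    return (np.asarray(signal, dtype=float) - rest_baseline) / span


# Illustrative amplitudes in microvolts (made-up values, not study data).
rest_uv, mvc_uv = 5.0, 405.0
normalized = two_point_normalize([5.0, 105.0, 205.0, 405.0], rest_uv, mvc_uv)
print(normalized.tolist())  # [0.0, 0.25, 0.5, 1.0]
```

Values can fall slightly outside the 0 to 1 range if a task sample is lower than the resting baseline or higher than the recorded MVC peak; whether such samples were clamped is not stated in the text.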

The normalized signals then underwent further sEMG signal processing. The signals were first filtered using a bandpass filter with a frequency range of 20–450 Hz, a standard practice in sEMG research to isolate physiological signals while removing low-frequency motion artifacts and high-frequency noise. A 60 Hz notch filter was also applied to specifically eliminate power line interference [20,22]. An outlier removal script was then run on the normalized signals: a threshold of 5 standard deviations (SD) was set, and any samples beyond this threshold were replaced by the 5 SD value. Finally, the cleaned signals were segmented into overlapping windows using a sliding window technique with a window size of 2.5 s and a 90% overlap, preserving temporal information for subsequent modeling.
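The outlier replacement and windowing steps can be sketched as below; the band-pass and notch filters are omitted. The sampling rate of 2000 Hz is an assumption (the text does not state it), as is the reading of the 5 SD rule as clamping to the mean plus or minus 5 SD. The function names are illustrative.

```python
import numpy as np

FS_HZ = 2000      # assumed sampling rate in Hz (not stated in the text)
WINDOW_S = 2.5    # window length in seconds, from the text
OVERLAP = 0.90    # fractional overlap between windows, from the text


def clip_outliers(x, n_sd=5.0):
    """Clamp samples lying beyond mean +/- n_sd * SD to that boundary."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    return np.clip(x, mu - n_sd * sd, mu + n_sd * sd)


def sliding_windows(x, fs=FS_HZ, window_s=WINDOW_S, overlap=OVERLAP):
    """Return overlapping segments as an array of shape (n_windows, window_len)."""
    x = np.asarray(x, dtype=float)
    win = int(round(window_s * fs))
    step = max(1, int(round(win * (1.0 - overlap))))
    n_windows = (len(x) - win) // step + 1
    if n_windows <= 0:
        return np.empty((0, win))
    starts = step * np.arange(n_windows)
    # Broadcasting builds one row of sample indices per window.
    return x[starts[:, None] + np.arange(win)[None, :]]


# Synthetic 10 s signal with one artificial spike (not study data).
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 1.0, size=10 * FS_HZ)
signal[1234] = 100.0
cleaned = clip_outliers(signal)
windows = sliding_windows(cleaned)
print(windows.shape)  # (31, 5000)
```

Under these assumptions each window holds 5000 samples and consecutive windows start 500 samples (0.25 s) apart, so neighbouring windows share most of their content. This is one reason subject-wise validation matters for data segmented this way.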

From the processed sEMG signal, two parallel paths were taken: one for labeling the data and one for feature engineering.

2.2.1. WM Value-Based Labeling

A key innovation of this study was a robust, two-stage process for the generation of high-confidence fatigue labels directly from sEMG data.

Stage 1: WM-based label generation: This process was adapted from the Weak Monotonicity method proposed in [23]. Through an exploratory analysis of several established fatigue indicators, MDF was selected as the most consistent indicator for the labeling pipeline. Crucially, while MDF physiologically decreases with fatigue, this study deliberately tested for an increasing monotonic trend to standardize the procedure. Consequently, the interpretation of the WM score was inverted: a lack of a strong increasing trend (i.e., a flat or decreasing trend) was used as the indicator for the onset of fatigue. Based on this logic, a WM value below 0.52 was labeled ‘fatigue’, and a value above 0.58 was labeled ‘non-fatigue’. The thresholds were selected through our exploratory study using two criteria: minimizing data loss and maintaining the physiological consistency of the labeled groups. The chosen values retained the maximum number of samples in the minority class while keeping the two groups statistically distinct in the pattern of change of physiologically recognized characteristics of fatigue.
Stage 2: Data curation: To ensure the highest possible integrity of the labels, all data segments initially falling within the intermediate range (0.52 to 0.58) were labeled ‘Uncertain’ and were excluded from the final dataset used for model training. This conservative approach was adopted to minimize ambiguity and potential label noise.

This two-stage process represents a step towards automating the generation of objective ground-truth labels, moving beyond subjective reporting.
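The thresholding and curation logic of the two stages can be sketched as follows. The sketch assumes a WM score has already been computed for each segment; the computation of the WM score itself follows [23] and is not reproduced here. Function names and the example scores are illustrative.

```python
FATIGUE_MAX = 0.52       # WM below this value -> 'fatigue'
NON_FATIGUE_MIN = 0.58   # WM above this value -> 'non-fatigue'


def label_from_wm(wm):
    """Stage 1: map a WM score to a preliminary label."""
    if wm < FATIGUE_MAX:
        return "fatigue"
    if wm > NON_FATIGUE_MIN:
        return "non-fatigue"
    return "uncertain"


def curate(wm_scores):
    """Stage 2: drop 'uncertain' segments, returning (index, label) pairs."""
    labeled = ((i, label_from_wm(wm)) for i, wm in enumerate(wm_scores))
    return [(i, lab) for i, lab in labeled if lab != "uncertain"]


# Made-up WM scores for five segments (not study data).
scores = [0.40, 0.55, 0.52, 0.70, 0.58]
print(curate(scores))  # [(0, 'fatigue'), (3, 'non-fatigue')]
```

Both boundary values (0.52 and 0.58) fall in the excluded band, which matches the inclusive intermediate range given in the text.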

2.2.2. Feature Calculation and Significance Testing

In parallel, a Python script was used to extract a comprehensive set of physiologically relevant features from each 2.5 s sEMG window. These included the following:

Time-Domain Features
– Root Mean Square (RMS): Represents the square root of the average of squared sEMG signal amplitudes in a given window. Reflects the signal’s power and is often correlated with muscle contraction force.
– Integrated EMG (IEMG): The sum of absolute values of the sEMG signal over a window. Indicates total muscle activity.
– Mean Absolute Value (MAV): The average of the absolute values of the sEMG signal amplitudes, representing overall signal intensity.
– Willison Amplitude (WAMP): Counts the number of times the absolute difference between consecutive sEMG samples exceeds a predefined threshold. Provides information about the frequency of muscle activation changes.
– Difference Absolute Standard Deviation Value (DASDV): A variation of RMS that measures the standard deviation of the differences between consecutive samples. Reflects signal variability.
– Average Amplitude Change (AAC): The average absolute difference between consecutive signal samples, indicating how rapidly the signal changes.
Frequency-Domain Features
– Median Frequency (MDF): The frequency that divides the power spectrum into two regions with equal power. Commonly used to indicate muscle fatigue.
– Mean Frequency (MNF): The average frequency weighted by the power spectrum. Reflects the frequency content of the signal.
– Variance of Central Frequency (VCF): The variance around the mean frequency, providing insight into how spread out the frequency components are.
– Power Spectrum Ratio (PSR): The ratio of power within a specific frequency band to the total power. Highlights dominant frequency regions.
Nonlinear and Statistical Features
– Sample Entropy (SampEn): A measure of the signal’s complexity and regularity. Higher values indicate greater complexity.
– Skewness (Skew): Describes the asymmetry of the amplitude distribution. Indicates signal bias or uneven activity.
– Kurtosis: Measures the “peakedness” of the amplitude distribution. High values suggest the presence of outliers or burst activity.
– Mean Power Ratio (MPR): Ratio of mean power within a specific frequency band to the total power. Emphasizes energy distribution in particular frequency regions.
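Several of the features listed above can be computed per window as in the sketch below. It covers only a subset (RMS, IEMG, MAV, WAMP, DASDV, AAC, MNF, MDF) and uses common textbook definitions, which may differ in detail from the study's script. The WAMP threshold and the sampling rate are assumed values, and the spectrum is estimated with a plain FFT periodogram.

```python
import numpy as np


def time_features(x, wamp_threshold=0.02):
    """Time-domain features of one window (wamp_threshold is an assumed value)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)  # differences between consecutive samples
    return {
        "RMS": float(np.sqrt(np.mean(x ** 2))),
        "IEMG": float(np.sum(np.abs(x))),
        "MAV": float(np.mean(np.abs(x))),
        "WAMP": int(np.sum(np.abs(dx) > wamp_threshold)),
        "DASDV": float(np.sqrt(np.mean(dx ** 2))),
        "AAC": float(np.mean(np.abs(dx))),
    }


def frequency_features(x, fs):
    """Mean and median frequency from a periodogram of one window."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    total = power.sum()
    mnf = np.sum(freqs * power) / total
    # MDF: first frequency at which cumulative power reaches half the total.
    mdf = freqs[np.searchsorted(np.cumsum(power), total / 2.0)]
    return {"MNF": float(mnf), "MDF": float(mdf)}


# Sanity check on a synthetic 100 Hz sine (assumed fs = 2000 Hz, 2.5 s window).
fs = 2000
t = np.arange(int(2.5 * fs)) / fs
x = np.sin(2 * np.pi * 100 * t)
tf, ff = time_features(x), frequency_features(x, fs)
print(round(tf["RMS"], 3), round(ff["MDF"], 1), round(ff["MNF"], 1))
```

For a pure sinusoid both MNF and MDF sit at the tone frequency, which makes it a convenient check before applying the functions to real windows. A constant window has zero spectral power and would need a guard in practice.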

To answer Research Question 1, Feature Significance Tests were conducted. A Kruskal–Wallis H-test, a non-parametric test robust to the non-Gaussian distributions typical of sEMG features, was applied to rank each engineered feature according to its ability to differentiate between the ‘fatigue’ and ‘non-fatigue’ classes [24]. This filter method provided an exploratory ranking to identify the most significant features for subsequent modeling.
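A feature ranking of this kind can be sketched as below. The H statistic is written out with NumPy so that the example is self-contained; `scipy.stats.kruskal` computes the same tie-corrected statistic. The helper names are illustrative, the closed-form p-value applies only to the two-class case (one degree of freedom), and the data are made up.

```python
import math

import numpy as np


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    data = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n_total = data.size
    order = np.argsort(data, kind="mergesort")
    _, first, counts = np.unique(data[order], return_index=True, return_counts=True)
    # Average 1-based rank within each run of tied values.
    ranks = np.empty(n_total)
    ranks[order] = np.repeat(first + (counts + 1) / 2.0, counts)
    rank_term, start = 0.0, 0
    for g in groups:
        n = len(g)
        rank_term += ranks[start:start + n].sum() ** 2 / n
        start += n
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)
    tie_correction = 1.0 - np.sum(counts ** 3 - counts) / (n_total ** 3 - n_total)
    return float(h / tie_correction)


def p_value_two_groups(h):
    """Chi-square survival function with 1 degree of freedom."""
    return math.erfc(math.sqrt(h / 2.0))


def rank_features(features, labels):
    """Sort feature names by H (descending) between the two classes."""
    labels = np.asarray(labels)
    scores = {
        name: kruskal_wallis_h(np.asarray(v)[labels == 1], np.asarray(v)[labels == 0])
        for name, v in features.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up example: two fully separated groups of five values each.
h = kruskal_wallis_h([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(round(h, 3), round(p_value_two_groups(h), 4))
```

Because the windows overlap heavily, neighbouring samples are not independent, so the resulting p-values are better read as a ranking aid than as strict significance levels, consistent with the exploratory use described above.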

2.3. Machine Learning Implementation and Evaluation

A comprehensive suite of machine learning models was implemented and systematically evaluated to address the research questions. The modeling process involved three distinct approaches, as depicted in the flowchart in Figure 1.

2.3.1. Handling Class Imbalance

A critical aspect of the training process was addressing the class imbalance within the dataset. To mitigate the risk of models developing a bias towards the majority class, a class weighting strategy was uniformly applied across all trained models. For classifiers implemented with scikit-learn (Random Forest) and TensorFlow/Keras (deep learning models), the class_weight = ‘balanced’ parameter was utilized. This automatically adjusts model penalties to give more importance to the minority class. For the XGBoost classifier, the equivalent scale_pos_weight parameter was calculated and set for each training fold. This ensured that the models were trained to value the correct classification of fatigue instances as highly as non-fatigue instances, leading to more robust performance.
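The weights implied by this strategy can be worked out as in the sketch below. It uses the heuristic behind scikit-learn's ‘balanced’ option, n_samples / (n_classes × class count), and the usual negative-to-positive ratio for scale_pos_weight. The function names are illustrative, and the 76/24 split is used only as an example of an imbalanced fold. Note that Keras's `fit` expects a per-class dictionary for `class_weight`, so a mapping like the one returned here is one plausible way the balanced weights were supplied to the deep learning models.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Per-class weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def scale_pos_weight(labels, positive=1):
    """Ratio of negative to positive samples, computed on the training fold."""
    n_pos = sum(1 for y in labels if y == positive)
    if n_pos == 0:
        raise ValueError("no positive samples in this fold")
    return (len(labels) - n_pos) / n_pos


# Illustrative fold: 76 non-fatigue (0) and 24 fatigue (1) segments.
y_train = [0] * 76 + [1] * 24
weights = balanced_class_weights(y_train)
print({k: round(v, 3) for k, v in weights.items()})  # {0: 0.658, 1: 2.083}
print(round(scale_pos_weight(y_train), 3))           # 3.167
```

Computing the ratio inside each training fold, as the text describes, keeps information from the held-out subject out of the weighting.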

2.3.2. Model Architectures and Hyperparameters

All models were developed in Python using the scikit-learn and TensorFlow libraries. Key architectural and training hyperparameters for each model are detailed below. All of the models were trained for 50 epochs with early stopping.

Ensemble Models
– Random Forest (RF): A Random Forest classifier was configured with 300 decision trees. The maximum depth of each tree was limited to 20, and a minimum of 5 samples was required to form a leaf node.
– XGBoost (Standalone): An XGBoost classifier was configured with 500 boosting rounds (n_estimators), a learning rate of 0.001, and a max_depth of 6. Regularization was applied via a subsample ratio of 0.8 and a colsample_bytree of 0.8.
Deep Learning Models
All deep learning models were trained using the Adam optimizer and employed Early Stopping (monitoring validation loss with a patience of 10 epochs) and reduce-learning-rate-on-plateau callbacks.
– 1D-CNN: The architecture consisted of three sequential 1D convolutional blocks (64, 128, and 256 filters), each with a kernel size of 5, followed by Batch Normalization and Max Pooling 1D. A Global Average Pooling 1D layer preceded the final dense classifier head. Key hyperparameters included a learning rate of 0.001, a batch size of 128, and a dropout rate of 0.5.
– Transformer: The model was composed of 4 stacked Transformer encoder blocks. Each block contained a Multi-Head Attention layer with 4 Attention heads and a head size of 256. Hyperparameters included a learning rate of 0.001, a batch size of 128, and dropout rates of 0.4 (Attention) and 0.5 (classifier).
– CNN-LSTM with Attention: This model began with a Conv1D layer (64 filters, kernel size = 5), followed by an LSTM layer (64 units) and a custom Attention mechanism. Hyperparameters included a learning rate of 0.001, a batch size of 128, and a dropout rate of 0.5.
Advanced Hybrid Model (Transformer–LSTM–XGBoost)
This model utilizes a two-stage process.
– Stage 1 (Feature Extractor): A Transformer–LSTM network processes three parallel input streams (raw signal, engineered features, muscle ID) to generate a 64-dimensional “deep features” latent vector. The architecture of the Transformer and LSTM paths mirrors the standalone models described above. The deep learning components were trained with a learning rate of 0.0001 and a batch size of 64.
– Stage 2 (Classifier): The final classification is performed by an XGBoost classifier, which takes the 64-dimensional latent vector as input. This classifier was configured with hyperparameters identical to those of the standalone XGBoost model described previously.
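For reference, the stated ensemble and hybrid settings can be collected into plain dictionaries keyed by the usual scikit-learn and XGBoost argument names. This is a configuration sketch only: the dictionary layout and the `HYBRID` key names are assumptions, and anything not stated in the text (for example, layer sizes inside the hybrid's Transformer and LSTM paths) is deliberately left out.

```python
# Random Forest settings, using RandomForestClassifier argument names.
RF_PARAMS = {
    "n_estimators": 300,      # number of trees
    "max_depth": 20,
    "min_samples_leaf": 5,
}

# Standalone XGBoost settings, using XGBClassifier argument names.
XGB_PARAMS = {
    "n_estimators": 500,      # boosting rounds
    "learning_rate": 0.001,
    "max_depth": 6,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}

# Two-stage hybrid: deep feature extractor followed by XGBoost.
HYBRID = {
    "stage1_feature_extractor": {
        "inputs": ["raw_signal", "engineered_features", "muscle_id"],
        "latent_dim": 64,
        "learning_rate": 1e-4,
        "batch_size": 64,
    },
    # Stage 2 reuses the standalone XGBoost hyperparameters.
    "stage2_classifier": dict(XGB_PARAMS),
}
```

Such dictionaries could be unpacked into the corresponding estimators, for example `RandomForestClassifier(**RF_PARAMS)`, with the fold-specific class weighting added at training time.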

2.3.3. Training and Validation Strategy

For all classification models, a Leave-One-Subject-Out (LOSO) cross-validation strategy was utilized. In LOSO, a model is trained on data from all participants except for one, which is held out as the complete test set. This process is repeated until every participant has served as the test set exactly once. The final performance is reported as the average across all folds. LOSO provides a much more realistic and robust estimate of how a model will generalize to new, unseen individuals.
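The fold construction can be sketched as follows. The generator and the subject identifiers are illustrative; scikit-learn's `LeaveOneGroupOut` provides equivalent splitting.

```python
import numpy as np


def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    subject_ids = np.asarray(subject_ids)
    for held_out in np.unique(subject_ids):
        test_mask = subject_ids == held_out
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Made-up subject labels for six windows from three participants.
subjects = ["s1", "s1", "s2", "s3", "s3", "s3"]
for held_out, train_idx, test_idx in loso_splits(subjects):
    print(held_out, train_idx.tolist(), test_idx.tolist())
# s1 [2, 3, 4, 5] [0, 1]
# s2 [0, 1, 3, 4, 5] [2]
# s3 [0, 1, 2] [3, 4, 5]
```

Splitting by subject rather than by window matters here because the 90% window overlap would otherwise place near-duplicate segments on both sides of a random split.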

2.3.4. Performance Evaluation Metrics

The performance of all machine learning models was assessed using a suite of standard quantitative classification metrics derived from the confusion matrix [25].

Accuracy: (TP + TN)/(Total Samples);
Precision: TP/(TP + FP);
Recall: TP/(TP + FN);
F1-score: 2 ∗ (Precision ∗ Recall)/(Precision + Recall).

where TP = true positive, TN = true negative, FP = false positive and FN = false negative.
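These definitions translate directly into code. In the sketch below the function name and the confusion-matrix counts are illustrative, and returning 0.0 when a denominator is zero is a convention adopted for the example, not something stated in the text.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts with 'fatigue' as the positive class (not study results).
metrics = classification_metrics(tp=18, tn=70, fp=6, fn=6)
print(metrics)
# {'accuracy': 0.88, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

In this made-up case accuracy (0.88) is noticeably higher than the F1-score of the minority class (0.75), which illustrates why both are reported for an imbalanced dataset.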

High accuracy is critical for a reliable classification system, while the F1-score was considered particularly important as it provides the best overall picture of a model’s balanced performance [26].

3. Results

This section presents the findings derived from the multi-stage sEMG data processing, feature extraction, and machine learning model training. The results are organized to sequentially address the research questions, detailing the data labeling process, the performance of various models on raw and engineered data, the identification of key sEMG features for fatigue detection, and finally, the evaluation of the advanced hybrid model architecture. The analysis was conducted on data collected from a cohort of novice participants, whose demographic characteristics are summarized below.

The demographic characteristics of the participant cohort were analyzed per postural condition. The mean age for the standing, stooping, and kneeling postures was 27.88 (SD = 7.30), 26.57 (SD = 7.52), and 27.38 (SD = 7.12) years, respectively. Average height was similarly recorded as 172.88 cm (SD = 8.21), 173.14 cm (SD = 8.97), and 172.13 cm (SD = 8.07), while mean weight was 74.60 kg (SD = 11.83), 77.94 kg (SD = 13.23), and 73.24 kg (SD = 12.63) for the respective postures.

3.1. Data Labeling and Dataset Characteristics

This section details the outcomes of the sEMG data preprocessing and the two-stage fatigue labeling methodology. The objective here is to characterize the final dataset used for model training, ensuring transparency and validation in labels established for muscle fatigue prediction. The entire process was designed to create high-confidence, data-driven ground-truth labels for the subsequent supervised machine learning tasks.

The initial phase of establishing labels involved applying the WM trend analysis to the normalized sEMG data for all three postures. This method, adapted from [23] with thresholds refined for the current datasets, provided a preliminary labeling of fatigue states. The specific thresholds, obtained through our exploratory study, were as follows:

WM < 0.52: Fatigue
0.52 ≤ WM ≤ 0.58: Uncertain
WM > 0.58: Non-Fatigue

The distribution of these initial labels across the 12 monitored muscles for the standing, stooping, and kneeling postures is detailed in Figure 5.

Figure 5 provides a detailed comparative analysis of muscle state distribution, utilizing a grouped stacked horizontal bar chart to facilitate a direct comparison across the three distinct postures. In this visualization, each of the twelve muscles, identified by an abbreviation on the y-axis, is represented by a cluster of three bars, where each bar corresponds to a specific posture (standing, stooping, or kneeling). This allows for an immediate assessment of how a single muscle’s activation state changes with activity.

Figure 5 shows that the right and left upper rectus abdominis (R-URA and L-URA) consistently had the lowest percentages of fatigue among all the muscles for all postures. For example, R-URA registered fatigue in only 6%, 10%, and 12% of segments for standing, stooping, and kneeling, respectively. After these muscles, the least fatigued were the right and left rectus femoris. On the other hand, the muscles with the most instances of ‘fatigue’ were the right lumbar erector spinae and the right and left tibialis anterior.

The results also reveal a critical insight into postural ergonomics: the impact of a given posture is highly muscle-specific. A notable disparity in fatigue was observed in the lower leg musculature. For example, the gastrocnemius (calf muscle) demonstrated substantial variation, with the stooping posture inducing the highest fatigue levels (31–32%), likely due to the increased demand for ankle plantarflexion to counteract the forward shift in the body’s center of mass on the incline. Conversely, kneeling, which provides a larger and more stable base of support, reduced the load on the gastrocnemius, resulting in lower fatigue levels (20–23%).

To maintain the highest integrity and confidence in the ground-truth labels used for subsequent machine learning model development, a conservative approach was adopted for the data curation stage. The segments initially classified by the WM trend analysis as ‘uncertain’ (0.52 ≤ WM ≤ 0.58) represented ambiguous states that could not be definitively categorized. On average, 13.2% of the data was categorized as ‘uncertain’ and was flagged for exclusion. All segments falling into this ‘uncertain’ category were treated as Not a Number (NaN) and systematically excluded from the dataset used for training and evaluating the fatigue prediction models. This decision ensures that the models were trained exclusively on data segments with clear, high-confidence labels of either ‘non-fatigue’ or ‘fatigue,’ thereby minimizing potential noise and ambiguity in the learning process.

The final composition of the dataset used for modeling, after the exclusion of these uncertain segments, is presented in Table 1. As shown in the table, for all three postures, the final dataset available for modeling exhibited a notable class imbalance. Across the postures, approximately 76% of the total segments were categorized as ‘non-fatigue,’ while only 24% were categorized as ‘fatigue’. This characteristic of the final dataset is an important consideration for the subsequent evaluation of model performance, particularly when interpreting metrics sensitive to class distribution like the F1-score.

3.2. Baseline Model Performance on Raw sEMG Data

Before undertaking comprehensive feature engineering, an initial exploratory analysis was conducted to establish a performance baseline for muscle fatigue prediction using only the raw sEMG signals. This crucial step aimed to assess the inherent discriminative information present in the time-series data itself and to provide a benchmark against which the more complex, feature-based models could be compared. By evaluating models on the raw signal, we can quantify the precise value added by the subsequent feature engineering process.

For this baseline evaluation, three different machine learning models were selected: an LSTM network, an RF model, and an XGBoost model. These models were trained and tested directly on the normalized and outlier-removed raw sEMG time-series data. To investigate the potential for a universal model even at this raw data stage, signals from all twelve monitored muscles were pooled for each of the three distinct roofing postures: stooping, standing, and kneeling. The classification performance of these models on the raw data is summarized in Table 2.

As tabulated in Table 2, the results reveal a consistent pattern of model efficacy on the unprocessed time-series data. The LSTM network consistently outperformed the other two models, Random Forest and XGBoost, in all of the postures. This result is expected, as the LSTM architecture is specifically designed to learn temporal relationships, a characteristic that is crucial for extracting meaningful deep features from sequential raw signals.

A closer examination of the performance metrics reveals the following:

For the standing posture, the LSTM achieved an accuracy of 55.34% and an F1-score of 59.01%. In contrast, the Random Forest model performed near the level of random chance with an accuracy of 48.13%, and the XGBoost model achieved 51.78% accuracy.
For the stooping posture, all models showed a slight improvement. The LSTM model reached an accuracy of 58.2% and an F1-score of 62.33%, while the Random Forest model achieved its best performance here with a 63.83% F1-score, despite a lower accuracy (55.23%).
The best baseline performance was observed during the kneeling posture, where the LSTM model achieved an accuracy of 57.88% and a comparatively stronger F1-score of 66.76%. The relatively better performance during this posture might be attributed to the more static nature of the muscle contractions involved, leading to less signal non-stationarity compared to the more dynamic elements present in stooping or the subtle postural adjustments in standing.

While these accuracy and F1-score values demonstrate that the raw sEMG signal contains some discernible patterns related to muscle fatigue, yielding results better than random chance, they also clearly highlight the limitations of relying solely on unprocessed time-series data for reliable fatigue classification. The performance, even for the best-performing LSTM model, does not meet the requirements for a practical and dependable application.

Overall, the baseline performance achieved with raw sEMG data was deemed insufficient for developing a robust fatigue prediction system. This outcome strongly motivated and justified the subsequent, more intensive approach of detailed feature engineering. The following sections will therefore focus on the performance of models trained using a comprehensive set of physiologically relevant sEMG features designed to enhance the discriminability between fatigued and non-fatigued states.

3.3. Significance of sEMG Features in Fatigue Detection

Having established that raw sEMG data alone provides limited predictive power, the subsequent analysis focused on a multi-step statistical process to identify the most important engineered features for fatigue detection. This section details the methodology and findings used to answer Research Question 1, “Which sEMG features are most effective at identifying muscle fatigue during roofing tasks?”, by identifying and ranking the most discriminative features.

The first step was to determine the appropriate statistical test for comparing feature distributions between ‘fatigue’ and ‘non-fatigue’ states. A Shapiro–Wilk test for normality was performed on the distributions of each engineered sEMG feature for each muscle group across all participant data.

As illustrated by the representative Q-Q plots in Figure 6, the data consistently and significantly deviated from a normal distribution. The data points on the plots consistently diverge from the theoretical quantile line, a pattern that is statistically confirmed by the accompanying Shapiro–Wilk test results, which yielded a p-value < 0.001 for all tests. This finding rejected the null hypothesis of normality and confirmed our initial suspicion that the sEMG data would not be normally distributed, particularly as the muscles become increasingly activated during the task. This step was crucial as it formally validated the decision to use a non-parametric statistical test for the subsequent analysis.

Based on the results of the normality testing, a non-parametric approach was required. To quantify the discriminative power of each feature, a one-way Kruskal–Wallis H-test was performed. The test was applied systematically to compare the distribution of each of the 26 engineered features between the ‘fatigue’ and ‘non-fatigue’ states. This process was repeated for each of the 12 monitored muscles within every individual data file (32 total files from eight participants × four repetitions).
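For the two-group case used here ('fatigue' versus 'non-fatigue'), the Kruskal–Wallis H statistic can be sketched in pure Python as follows. This is an illustrative implementation rather than the authors' code; in practice a library routine such as `scipy.stats.kruskal` would normally be used. The function name and the toy samples are assumptions.

```python
import math


def kruskal_wallis_two_groups(a, b):
    """Tie-corrected H statistic and p-value for two samples (1 degree of freedom).

    Assumes the pooled data are not all identical (otherwise the tie
    correction would divide by zero).
    """
    pooled = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sums = [0.0, 0.0]
    for (_, group), r in zip(pooled, ranks):
        rank_sums[group] += r
    h = (12.0 / (n * (n + 1))
         * (rank_sums[0] ** 2 / len(a) + rank_sums[1] ** 2 / len(b))
         - 3 * (n + 1))
    h /= 1.0 - tie_term / (n ** 3 - n)
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2.0))
    return h, p


# Toy feature values for the two states (not data from the study).
h_stat, p_value = kruskal_wallis_two_groups([1, 2, 3], [4, 5, 6])
```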

This granular analysis resulted in a large set of significance tests, assessing the ability of every feature to detect fatigue in every muscle for every experimental trial. To account for the large number of comparisons and control for the false discovery rate, a Benjamini–Hochberg False Discovery Rate (FDR) correction was applied with an alpha of 0.05. A feature was considered to have strong discriminative power for a given muscle in a given trial if the FDR-corrected p-value was less than 0.05.
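The Benjamini–Hochberg step-up adjustment can be sketched as below. This is a generic textbook implementation under the stated alpha of 0.05, not the authors' code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted p-values, significance flags) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    # The text treats a test as significant when the corrected p-value is < alpha.
    significant = [q < alpha for q in adjusted]
    return adjusted, significant


# Illustrative p-values only.
adjusted, significant = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In this toy case only the first test remains significant after correction, even though three of the four raw p-values are below 0.05.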

To synthesize the granular statistical results into a high-level comparison of feature effectiveness, the outcomes were systematically aggregated. The process began by quantifying how frequently each feature detected a significant trend for each muscle within a given posture. For example, analysis of the MDF feature during the sustained kneeling task revealed it was a highly consistent indicator for the biceps femoris (showing a significant trend in 31–32 out of 32 experimental files) but was less consistent for the upper rectus abdominis (15–20 significant files). Summing these instances for just the kneeling posture showed that the MDF feature successfully identified fatigue in 338 of the 384 possible muscle-trial combinations, yielding an overall consistency of 88%.
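The arithmetic behind this figure is straightforward: 12 muscles times 32 files gives 384 muscle-trial combinations. A minimal sketch, with a hypothetical function name:

```python
N_MUSCLES = 12  # monitored muscles
N_FILES = 32    # eight participants x four repetitions


def consistency_percentage(significant_count, n_muscles=N_MUSCLES, n_files=N_FILES):
    """Share of muscle-trial combinations in which a feature was significant."""
    possible = n_muscles * n_files
    return 100.0 * significant_count / possible


mdf_kneeling = consistency_percentage(338)
print(f"{mdf_kneeling:.0f}%")  # prints: 88%
```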

This aggregation method was then applied to all key sEMG features across all three postures (standing, stooping, and kneeling). For each feature and posture, the total count of significant instances was summed and divided by the total number of possible muscle-trial combinations to calculate a final “consistency percentage.” The complete results of this comprehensive analysis are presented in Table 3. As summarized in the table, the analysis reveals a clear hierarchy of feature effectiveness. The frequency-domain features, MDF and MNF, emerge as the most consistent, showing a significant change in an average of 88% of all datasets across all postures. Other metrics such as VCF, Skew, PSR, and SampEn also proved highly reliable, with average significance rates ranging from 76% to 81%. While the amplitude-based features RMS and MAV were also frequently significant, they demonstrated slightly lower consistency (73% and 72%, respectively).

These statistical findings are visually corroborated by main effect plots, which depict the general trends of these features. Figure 7 presents these main effects for the left rectus femoris muscle as a representative example, illustrating the characteristic physiological changes from a non-fatigued to a fatigued state. All features demonstrated a highly significant statistical difference between the two conditions (Kruskal–Wallis p < 0.001). A subset of features showed a clear increase with the onset of fatigue: RMS (a) rose substantially from a mean value of approximately 0.0047 to 0.0069, while Skew (e) increased from approximately 1.8 to 2.5. Conversely, the remaining features exhibited a significant decrease, most notably the frequency-domain features MDF (b) and MNF (c), which showed pronounced drops from nearly 100–102 Hz down to 65–81 Hz.

To complement the statistical significance identified by the Kruskal–Wallis H-test, a Random Forest model was also trained using the full set of extracted features. The purpose of this approach was to evaluate the relative predictive power of each feature in distinguishing between non-fatigued and fatigued states. The model’s built-in feature importance mechanism provides a quantitative ranking based on how much each feature contributes to classification accuracy, offering a practical perspective on feature utility beyond statistical significance. Figure 8 depicts the mean feature importance scores derived from this model across all participants and postures. The ranking provides strong corroboration for the findings from the statistical analysis, with the same group of features emerging as the most valuable. Frequency-domain metrics MNF, MDF, PSR, and VCF are again ranked as the most influential predictors, reinforcing their status as primary indicators of fatigue. Similarly, Skew, SampEn, and RMS also demonstrated substantial importance, confirming their value in a predictive context.

Considering the converging evidence from both the statistical significance tests (Table 3) and the model-based importance ranking (Figure 8), a final set of seven features was chosen to definitively answer Research Question 1. Based on their superior and consistent performance in this dual analysis, MDF, MNF, RMS, VCF, PSR, Skew, and SampEn were selected for further investigation and for building the machine learning models in the subsequent sections.

3.4. Performance of Feature-Extracted sEMG Models

Building upon the feature selection process detailed in the previous section, this section presents a comparative evaluation of machine learning models to answer research question 2: “Which machine learning models have the best performance in predicting muscle fatigue using the sEMG data?”

A diverse set of models was selected to assess performance on the engineered seven-feature set. This selection includes the following:

Traditional Ensemble Models: Random Forest (RF) and XGBoost, which are well-established benchmarks renowned for their robustness on structured, tabular data.
Deep Learning Architectures:
– A one-dimensional Convolutional Neural Network (1D-CNN);
– A hybrid CNN-LSTM with an Attention mechanism;
– A Transformer-based model.

The classification performance metrics for models trained on sEMG features from the standing posture are presented in Table 4. The results for the standing posture indicate that the CNN-LSTM with Attention model yielded the best performance. It achieved the highest test accuracy (0.7830 ± 0.0120) and the best F1-score of 0.7135. The XGBoost model also performed competitively, positioning it as the clear second-best performer.

Conversely, the Transformer architecture underperformed significantly, with the lowest accuracy (0.6717) and F1-score (0.6120).

The superior classification capability of the leading models is further detailed in the confusion matrices presented in Figure 9.

An analysis of the models in Figure 9 reveals their predictive capabilities.

The CNN-LSTM with Attention model (a) correctly identified 33,127 fatigued segments (TP) and 64,354 non-fatigued segments (TN), demonstrating its balanced classification strength.
The XGBoost model (c), while having slightly fewer TPs, notably committed the fewest false negative errors (10,290 instances). This is important as it indicates the lowest rate of missed fatigue cases among all models.
The poor performance of the Transformer (b) is explained by its high number of FPs where it misclassified a high number of non-fatigued segments as fatigued, indicating a model bias that reduces its reliability.

This detailed breakdown confirms that, for the standing posture, the CNN-LSTM with Attention and XGBoost models were the strongest performers.

The stooping posture imposes a significant load on the lumbar and leg muscles, creating a different set of fatigue patterns. The performance of the models for this task is detailed in Table 5. An analysis of model performance for the stooping posture reveals a shift in the top-performing model. The XGBoost model demonstrated superior performance with the highest test accuracy (0.7809 ± 0.0109) and the highest F1-score (0.6943 ± 0.0120). This suggests that the feature patterns indicative of fatigue while stooping are captured most effectively by the XGBoost model. The CNN-LSTM with Attention model followed closely with a test accuracy of 0.7753 and an F1-score of 0.6808. In contrast, RF, while achieving a reasonable accuracy (0.7599), showed a significant weakness in its F1-score (0.6163).

A detailed examination of the classification behavior is provided by the confusion matrices in Figure 10.

The confusion matrices (Figure 10) confirm the strength of the XGBoost model (a), which minimized the critical False Negative errors, indicating high sensitivity to fatigue. In contrast, the Random Forest model (e) showed a significant weakness in fatigue detection, misclassifying a high number of fatigued segments as not fatigued.

This analysis confirms that for the stooping posture, the XGBoost model achieves high overall accuracy.

The kneeling posture introduces unique physiological demands. The performance of the models under this posture is detailed in Table 6.

The CNN-LSTM with Attention model emerged as the most balanced and effective classifier, achieving the highest overall test accuracy (0.7469 ± 0.0325) and the highest F1-score (0.6527 ± 0.0272). The RF model attained the highest non-fatigue F1-score (0.8134), suggesting a specialization in correctly identifying non-fatigued states, but this came at the expense of lower performance on the fatigue class (fatigue F1-score of 0.5904). The Transformer model continued its consistent underperformance.

The confusion matrices for kneeling (Figure 11) confirm the strong, balanced performance of the CNN-LSTM model (b), which correctly identified a high number of fatigued segments while maintaining a low rate of False Negatives. This analysis also highlights the specific trade-offs of other models, such as the high number of False Negatives for Random Forest (d) and the high number of False Positives for XGBoost (c).

This detailed error analysis confirms that the CNN-LSTM with Attention model provides the best performance for the kneeling posture.

In direct response to Research Question 2, the comparative analysis across the three distinct postures confirms that classification performance was posture-dependent, with no single machine learning model demonstrating universal superiority. This result highlights the challenge of using standalone models for diverse ergonomic demands and motivated the development of a more robust, hybrid solution. The ensemble models, particularly XGBoost, showcased high efficacy in specific contexts, proving optimal for the biomechanically demanding stooping posture. However, the CNN-LSTM with Attention model emerged as the most consistently balanced performer overall. It achieved the highest performance in the standing and kneeling postures and remained highly competitive in stooping.

3.5. Performance of the Advanced Hybrid Model

The preceding analyses evaluated two distinct modeling philosophies: deep learning models trained on raw sEMG data and various machine learning models trained on a set of engineered features. The results demonstrated that while raw-signal models can automatically learn complex patterns, feature-based models excel by leveraging known physiological indicators of fatigue. This outcome naturally leads to the central research question for this final stage of analysis: Can an advanced hybrid model, which integrates both raw data and engineered features, outperform the standalone deep learning and traditional machine learning models?

To answer this question, and in direct response to Research Question 3, a hybrid architecture was developed. It was hypothesized to achieve synergistic performance by integrating the complementary strengths of these two data types. The model’s design, illustrated in Figure 12, is a direct result of the preceding insights. It employs a sophisticated, multi-stage architecture to make the final classification.

The architecture detailed in Figure 12 operates in two distinct stages.

Stage 1 (Feature Extraction): This stage processes four parallel input streams to generate a rich, contextualized feature representation.
– Raw sEMG Signal (Input A): A Transformer path autonomously extracts complex, hierarchical patterns directly from the raw sEMG waveform, capitalizing on its ability to discern intricate patterns that may not be captured by pre-defined features.
– Engineered sEMG Features (Input B): An LSTM path processes the time series of the seven selected engineered features (MDF, MNF, RMS, etc.) to explicitly model the signal’s evolution and learn its temporal dynamics.
– Muscle ID (Input C): Separate dense layers learn embedding representations for these categorical variables, allowing the model to account for muscle-specific and subject-specific physiological variations.
The outputs from these paths are concatenated into a unified feature vector, which is passed through a final dense layer to produce a rich, 64-dimensional latent vector of deep features.
Stage 2 (Final Classification): This stage then takes the learned latent feature vector as its sole input. An XGBoost classifier, selected for its robust performance in the preceding feature-based analyses, performs the final binary classification into fatigued (State 0) or non-fatigued (State 1).
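The two-stage data flow can be summarized compactly as follows. The notation is introduced here for illustration only and does not appear in the original: $x_{\text{raw}}$ denotes a raw sEMG segment, $x_{\text{feat}}$ the sequence of seven engineered features, $c$ the categorical identifier input(s), and $[\cdot\,;\cdot]$ concatenation. Layer sizes other than the 64-dimensional latent vector are not specified in the text and are left abstract.

```latex
\begin{aligned}
h_A &= \mathrm{Transformer}(x_{\text{raw}}), \\
h_B &= \mathrm{LSTM}(x_{\text{feat}}), \\
h_C &= \mathrm{Dense}(c), \\
z &= \mathrm{Dense}\big([\,h_A \,;\, h_B \,;\, h_C\,]\big) \in \mathbb{R}^{64}, \\
\hat{y} &= \mathrm{XGBoost}(z) \in \{0, 1\}.
\end{aligned}
```

Under this reading, Stage 1 produces $z$ and Stage 2 maps $z$ to the binary label $\hat{y}$, so the XGBoost classifier never sees the raw signal or the engineered features directly.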

The performance of this hybrid model, evaluated for each posture, is presented in Table 7. The performance of the hybrid model demonstrates a significant advancement over the single-paradigm models. It achieves not only high but also remarkably consistent performance across all three distinct postural conditions. The model’s overall accuracy is clustered in a narrow and stable range, from 82.13% for standing to a peak of 82.66% for stooping. This indicates a strong capability to generalize its predictive power regardless of the specific biomechanical demands, a marked improvement over the posture-specific efficacy observed in the previous models.

A more granular, posture-by-posture analysis reveals the nuances of the model’s superior behavior.

For the Standing Posture: The model achieved an accuracy of 82.13% with a fatigue F1-score of 0.7772. A deeper dive into the constituent metrics for the fatigue class shows a precision of 0.7342 and a recall of 0.8257. This disparity, with recall significantly exceeding precision, indicates that the model is highly sensitive in detecting true instances of fatigue, a crucial attribute for ergonomic applications.
For the Stooping Posture: In this more demanding posture, the model yielded its highest accuracy of 82.66%. The fatigue F1-score was 0.7688, derived from a precision of 0.7154 and the highest recall value observed across all tests at 0.8309. The precision for the non-fatigued class was also exceptionally high at 0.9017, signifying that when the model predicts a non-fatigued state during stooping, it does so with very high confidence.
For the Kneeling Posture: The model maintained its robust performance with an accuracy of 82.38% and a fatigue F1-score of 0.7675. The precision–recall profile for the fatigue class (0.7249 and 0.8154, respectively) mirrored that of the other postures, again prioritizing the correct identification of fatigued states over precision.

To visually deconstruct these numerical results, the corresponding confusion matrices are presented in Figure 13. The confusion matrices offer a granular view of the hybrid model’s classification behavior and visually confirm a critical performance characteristic. Across all three postures, the model demonstrates exceptionally high sensitivity (recall) for the fatigue class. For instance, in the standing posture (a), the model correctly identified 370,287 fatigued segments (TPs) while misclassifying only 78,165 (FNs). This pattern, where TPs for fatigue substantially outnumber FNs, holds for both stooping (b) and kneeling (c).

This performance profile is highly desirable for an ergonomic monitoring system. The high recall signifies a low false negative rate, meaning the model is adept at correctly identifying true instances of fatigue, thus minimizing the risk of overlooking a worker’s fatigued state, the most critical error to avoid in a safety context. The trade-off for this high sensitivity is a slightly lower precision, evidenced by the number of FPs (e.g., 134,066 for standing). This indicates the model may occasionally flag a non-fatigued state as fatigued. However, this is a far more acceptable error from a safety perspective than missing an actual case of fatigue.
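The reported fatigue-class metrics for the standing posture can be reproduced, to within rounding, from the confusion-matrix counts quoted above (370,287 TPs, 78,165 FNs, and 134,066 FPs). The sketch below uses the standard definitions of precision, recall, and F1; the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 for the positive (fatigue) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts reported for the standing posture.
precision, recall, f1 = precision_recall_f1(tp=370_287, fp=134_066, fn=78_165)
print(f"precision={precision:.4f}, recall={recall:.4f}")
# prints: precision=0.7342, recall=0.8257
```

The resulting F1 is approximately 0.777, consistent with the fatigue F1-score reported for the standing posture.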

In conclusion, and in direct answer to Research Question 3, this two-stage hybrid approach, by leveraging deep learning for feature representation and XGBoost for classification, has successfully created a robust model that not only performs with high accuracy but is also optimized for the most crucial requirement of an ergonomic tool: minimizing missed detections of fatigue. It clearly outperforms the standalone models evaluated previously.

4. Discussion

The preceding section detailed the results of a multi-faceted investigation into predicting muscle fatigue among novice roofers using wearable sensor technology and machine learning. This section provides an interpretation of these findings, contextualizing the performance of sEMG-based and hybrid modeling approaches within the existing literature and the specific demands of roofing tasks. The significance of the findings, methodological considerations, limitations of the study, and directions for future research are critically examined.

4.1. Recapitulation of Principal Findings

The comprehensive analysis of sEMG data yielded several key findings pertinent to the detection and assessment of muscle fatigue during simulated roofing tasks.

Firstly, the study established a robust two-stage sEMG labeling methodology, adapting the WM trend analysis and incorporating physiological markers to define ‘fatigue’ and ‘non-fatigue’ states, with ‘uncertain’ segments being conservatively excluded to ensure high-confidence ground-truth labels for model training.

Secondly, an initial baseline assessment revealed that models trained directly on raw sEMG data exhibited modest predictive performance across the standing, stooping, and kneeling postures (e.g., LSTM F1-scores ranging from approximately 59% to 67%). This underscored the necessity for subsequent feature engineering to enhance discriminability.

Thirdly, the analysis of engineered sEMG features identified MDF, MNF, RMS, Skew, VCF, PSR, and Sample Entropy as statistically significant and consistent indicators of fatigue. Frequency-domain features (MDF and MNF) showed the highest consistency (significant in 88% of datasets on average) in differentiating fatigue states across postures, aligning with known physiological responses to fatigue.

Fourthly, the performance of feature-based sEMG models in terms of overall test accuracy varied by posture. For the standing posture, the CNN-LSTM with Attention model achieved the highest accuracy (approximately 78.3%). In the stooping posture, the XGBoost model demonstrated the highest accuracy, reaching approximately 78.1%. These results highlighted that no single sEMG-only model universally achieved the highest accuracy across all specific tasks.

Fifthly, to explore the potential benefits of combining raw signal information with engineered features from sEMG, an advanced hybrid sEMG model was developed. This model utilized a Transformer–LSTM architecture to process both raw sEMG segments and the corresponding engineered sEMG features, with the extracted deep features then classified by an XGBoost classifier. This hybrid sEMG approach demonstrated strong and consistent performance across all three postures, achieving accuracies of approximately 82.1% for standing, 82.7% for stooping, and 82.4% for kneeling. The fatigue F1-scores were also robust, around 0.77–0.78 for all postures.

4.2. Discussion of sEMG-Based Fatigue Detection

This section critically evaluates the findings related to sEMG-based fatigue detection. It begins by interpreting the physiological significance of the observed feature behaviors, grounding them in established myoelectric principles. It then transitions to a comparative analysis of the machine learning model performances.

4.2.1. Significance and Physiological Interpretation of sEMG Feature Behavior

A foundational step in this research was to validate that the engineered sEMG features are physiologically meaningful indicators of muscle fatigue. The results presented in Section 3.3 confirm this. The frequency-domain features, MDF and MNF, proved to be the most consistent indicators of fatigue. The characteristic downward trend of both MDF and MNF with developing fatigue is a classic hallmark of peripheral muscle fatigue and directly corroborates the literature. This spectral compression is primarily attributed to a decrease in muscle fiber conduction velocity (MFCV) as metabolic byproducts accumulate and impair the efficiency of the sodium–potassium (Na+/K+) pump [15,27]. The high consistency of this finding across the varied biomechanical demands of standing, stooping, and kneeling validates the study’s labeling methodology and underscores the fundamental nature of this myoelectric signature of fatigue.

The time-domain feature RMS, which quantifies the amplitude or power of the sEMG signal, was also a powerful differentiator and exhibited a clear, marked increase with the onset of fatigue. This phenomenon reflects an augmented neural drive from the central nervous system. As individual muscle fibers begin to fatigue, the nervous system attempts to compensate by recruiting additional, often larger, motor units and/or by increasing the firing rate of already active units [28]. Finally, the nonlinear feature Sample Entropy (SampEn) also proved to be a strong indicator, showing a distinct decrease as fatigue progressed. A lower SampEn value indicates a loss of signal complexity and an increase in its regularity, which is consistent with literature suggesting that the nervous system may adopt a more synchronized firing pattern across motor units as a mechanism to maintain force output under duress [29].

4.2.2. Methodological Framework for Feature Selection

The answer to Research Question 1 rests on a methodological framework that warrants discussion. The process involved applying the Kruskal–Wallis H-test to features derived from overlapping time-series windows. While the use of overlapping windows is a standard practice in biosignal processing that increases dataset density and enhances feature stability [30], it introduces autocorrelation that violates the independence assumption of the test for formal inference. However, in this study, the test was not used for formal inference but was instead repurposed as a computationally efficient, non-parametric heuristic to score and rank features based on their class separability. This reframing of a statistical test as a feature ranking tool is a well-precedented, cross-disciplinary practice [24]. Therefore, the findings regarding feature importance are based on a sound engineering approach, where the statistical test was used as an exploratory tool and its findings were corroborated by a model-based feature importance analysis.

4.2.3. Performance of sEMG Machine Learning Models

The evaluation of various machine learning models revealed nuanced performance differences that depend heavily on the nature of the task. For posture-specific tasks, which represent a more constrained and homogeneous feature space, different models excelled under different conditions. Notably, for the highly demanding stooping posture, the XGBoost model achieved the highest accuracy (78.1%). XGBoost, a powerful tree-based gradient boosting ensemble, is exceptionally effective at discovering complex, non-linear decision boundaries within well-defined, structured feature sets like those derived from a single, consistent posture [31]. A particularly insightful, albeit counterintuitive, finding was the consistent underperformance of the Transformer model. This result provides a crucial lesson on the practical application of different deep learning architectures. Transformers, while state of the art in fields like natural language processing, are notoriously data-hungry and lack the strong inductive biases for the locality and sequentiality that are inherent to CNNs and RNNs, respectively [32]. The superior performance of the CNN-LSTM model strongly suggests that for this type of physiological data and at this data scale, architectures with built-in spatial and temporal priors are more effective.

Finally, the study explored an advanced hybrid sEMG model that used a Transformer–LSTM network as a sophisticated feature extractor, with an XGBoost classifier making the final prediction. This two-stage pipeline delivered the highest and most consistent accuracies in the posture-specific contexts, achieving over 82% accuracy for standing, stooping, and kneeling. This approach successfully leverages the strengths of both paradigms: deep learning’s ability to learn rich, latent representations from both raw and engineered data, and gradient boosting’s exceptional power in classifying structured data. The success of the high-performance hybrid machine learning model is particularly noteworthy, confirming its potential as the core of a predictive analytics engine.

This architecture’s superiority, as demonstrated in Figure 14, is attributable to this synergistic design, which allows it to consistently outperform standalone ensemble learning and deep learning alternatives across all postures. The ablation study (Section 4.2.4) further confirmed that each of these components is a critical contributor to the model’s high performance.

The figure illustrates the value of methodological progression. For every posture (standing, stooping, and kneeling), there is a distinct, stepwise improvement in accuracy. The baseline models using only raw data show modest performance, with accuracy ranging from 55.3% to 58.2%. The introduction of feature engineering provides a substantial boost, with the best feature-based models achieving accuracies between 74.7% and 78.3%. Finally, the advanced hybrid model, which integrates raw signals and engineered features, demonstrates clear superiority, pushing the accuracy to over 82% for all three postures. This visual evidence provides a compelling answer to the research questions, confirming the necessity of feature engineering and the superior performance of the advanced hybrid architecture.

4.2.4. Ablation Study of the Hybrid Architecture

To validate the design of the hybrid architecture, an ablation study was conducted to quantify the contribution of its individual components. The full Transformer–LSTM–XGBoost model served as the performance baseline. Three ablated versions were then evaluated:

With the Transformer path removed;
With the LSTM path removed;
With the final XGBoost classifier replaced by a standard softmax layer.

The results for the kneeling posture are presented in Table 8 and confirm that each component contributes to the model’s performance. Removing the LSTM path, which processes the engineered features, caused a catastrophic drop in the fatigue F1 score (from 0.7675 to 0.0251). Removing the Transformer path reduced accuracy from 0.8238 to 0.7137, although the fatigue F1 score changed only slightly (0.7594), while replacing XGBoost with a softmax layer degraded both accuracy (0.7030) and the fatigue F1 score (0.6911). The ablation study therefore indicates that the performance of the hybrid model arises from the synergy among all of its components.
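To make the size of each component's contribution explicit, the drops relative to the full model can be computed from Table 8. This is a minimal sketch using only the kneeling-posture values reported there; the names `FULL`, `ABLATIONS`, and `ablation_drops` are hypothetical and chosen for illustration.

```python
# Kneeling-posture results copied from Table 8: (accuracy, fatigue F1).
FULL = (0.8238, 0.7675)
ABLATIONS = {
    "No Transformer": (0.7137, 0.7594),
    "No LSTM": (0.7666, 0.0251),
    "XGBoost replaced by Softmax": (0.7030, 0.6911),
}


def ablation_drops(full, ablations):
    """Return (accuracy drop, fatigue-F1 drop) relative to the full model."""
    full_acc, full_f1 = full
    return {
        name: (round(full_acc - acc, 4), round(full_f1 - f1, 4))
        for name, (acc, f1) in ablations.items()
    }


if __name__ == "__main__":
    for name, (d_acc, d_f1) in ablation_drops(FULL, ABLATIONS).items():
        print(f"{name}: accuracy -{d_acc:.4f}, fatigue F1 -{d_f1:.4f}")
```

Read this way, the table suggests that the components fail differently: removing the LSTM path mainly collapses the fatigue F1 score, whereas removing the Transformer path or the XGBoost classifier mainly costs accuracy.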

4.3. Considerations for Real-Time Implementation and Deployment

The practical implementation of the proposed framework requires consideration of both its inference delay and its deployment hardware. The methodology’s 2.5 s data acquisition window represents the minimum fixed latency. The subsequent computational delay (the end-to-end computational time covering signal processing, calculation of the seven selected features, deep feature extraction using the LSTM and Transformer models, and final classification) was measured on a laptop (AMD Ryzen 9 CPU, NVIDIA RTX 4060 GPU, 16 GB RAM, and an SSD with ∼3500 MB/s read and write speed). The timing for each cross-validation fold was calculated as the average computational time over 100 iterations within that fold.

The results yielded an average end-to-end inference time of 861.6 ms with a standard deviation of 11.8 ms (ranging from 836.78 ms to 876.32 ms) for all computational steps following data acquisition. When this computational delay is added to the 2.5 s acquisition delay, the total latency from data capture to classification is approximately 3.4 s, which is feasible for a near-real-time monitoring system. For deployment, the study used an offline method to train and test the models. The intended future implementation involves wearable sensors streaming data to a local PC for processing.
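The latency budget and the timing procedure can be sketched as follows. This is an illustrative example only: the constants are the values reported in this section, while `total_latency`, `time_pipeline`, and the `pipeline` callable are hypothetical stand-ins for the study's actual processing chain, which is not reproduced here.

```python
import statistics
import time

ACQUISITION_WINDOW_S = 2.5   # fixed data-acquisition window reported in this section
MEAN_COMPUTE_S = 0.8616      # reported mean computational delay (861.6 ms)


def total_latency(window_s, compute_s):
    """Total delay from the start of data capture to the classification output."""
    return window_s + compute_s


def time_pipeline(pipeline, sample, iterations=100):
    """Mean and standard deviation of wall-clock time over repeated runs.

    `pipeline` is any callable standing in for the full chain (signal
    processing, feature calculation, deep feature extraction, classification).
    """
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        pipeline(sample)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


if __name__ == "__main__":
    print(f"Total latency: {total_latency(ACQUISITION_WINDOW_S, MEAN_COMPUTE_S):.2f} s")
    mean_s, std_s = time_pipeline(lambda x: sum(v * v for v in x), list(range(1000)))
    print(f"Dummy pipeline: {mean_s * 1e3:.3f} ms (SD {std_s * 1e3:.3f} ms)")
```

Because the acquisition window dominates the budget, shortening the computational stage alone would reduce the total delay by less than one second under these figures.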

4.4. Methodological Considerations and Limitations

Although this study provides valuable information on the application of wearable technology for fatigue detection, its findings must be interpreted within the context of several methodological limitations. The study was conducted in a controlled laboratory environment, which, while necessary for ensuring data quality and participant safety, does not fully replicate the complexities of a real-world roofing worksite. Environmental stressors such as extreme heat and sun exposure, variable surface conditions, and the psychological stress associated with working at significant heights were not incorporated. Moreover, the scope of the physical tasks was limited to three specific quasi-static postures. Actual roofing involves a much wider range of dynamic movements, including carrying materials and using tools, so the findings from these sustained postures may not be directly generalizable to the more variable physical demands encountered in the field. The participant cohort consisted exclusively of novice individuals. Although this was a deliberate choice to ensure a homogeneous sample and establish a critical physiological baseline for this foundational proof-of-concept study, it is also a limitation. Experienced roofers are likely to exhibit more efficient motor patterns and different fatigue signatures developed through long-term adaptation. Consequently, the models developed here have not yet been validated for a professional workforce, and their real-world generalization remains limited until such validation is performed. This underscores the importance of the next planned phase of this research with professional roofers, as outlined in the future directions. Finally, the sample size, while adequate for this exploratory investigation, is relatively small for the development of highly generalized deep learning models. Architectures like the Transformer and LSTM are known to be data-hungry, and their performance can be constrained by limited datasets.

5. Conclusions and Future Directions

This study presents a data-driven framework for classifying muscle fatigue in simulated roofing tasks using sEMG data and advanced machine learning. A dual-analysis approach identified a core set of effective features spanning spectral, amplitude, and complexity domains, confirming the necessity of feature engineering over raw signal analysis. Model evaluation showed that performance depends on task-specific biomechanical demands, with different algorithms excelling in different postures.

An advanced hybrid model, combining a Transformer–LSTM for deep feature extraction with an XGBoost classifier, consistently outperformed all alternatives. Supported by a stringent cross-validation protocol, the findings establish a subject-independent, high-performance fatigue classification approach. This work sets a new benchmark for ergonomic monitoring systems and offers a clear pathway toward practical, real-world deployment.

Future Directions

Building on this validated framework and acknowledging its limitations, several key avenues emerge for translating this research into a practical, real-world ergonomic monitoring system. The next step is to move from the laboratory to the field and validate the framework with professional roofers. In parallel, and as a direct extension of this study’s findings, research will focus on a comparative analysis of novice and professional workers. This will quantify the effects of long-term adaptation on sEMG fatigue signatures and will be essential for developing models that are accurate for an experienced workforce. Future work will also aim to build a more holistic model of worker fatigue by integrating additional physiological data streams, such as heart rate variability (HRV), to assess autonomic nervous system response.

Author Contributions

Conceptualization, S.A., K.K., S.R.G., and T.M.; methodology, S.A., K.K., S.R.G., T.M., and R.K.; data collection: S.A., S.R.G., and R.K.; software, S.A. and T.M.; validation, S.A., K.K., S.R.G., T.M., and R.K.; writing—original draft preparation, S.A. and K.K.; writing—review and editing, K.K., S.R.G., T.M., and R.K. All authors have read and agreed to the published version of the manuscript.

Funding

This project is not externally funded.

Institutional Review Board Statement

The study protocol was approved by the Texas State University Institutional Review Board (IRB) under protocol number #9649 for the muscle fatigue prediction study.

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

This study did not report any open-source data.

Acknowledgments

We would like to thank the Translational Health Research Center at Texas State University for providing the roof-setup materials and the accessories used to prepare the participants, such as hydrogel, shoes, and PPE. We would also like to express our sincere gratitude to all the anonymous participants who generously contributed their time and effort to this research. Their contribution was essential for this workplace safety research.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. Research methodology flowchart.

Figure 2. Surface electromyography (sEMG) sensor placement on the muscle groups.

Figure 3. Illustration of sEMG sensors worn on the 12 muscle groups. (a) View from front, (b) view from back.

Figure 4. Different simulated postures. (a) Stooping, (b) standing and (c) kneeling.

Figure 5. Distribution of labels across different muscles and postures. Note: Muscle abbreviation legend: L-BF = Left Biceps Femoris; R-BF = Right Biceps Femoris; L-GA = Left Gastrocnemius; R-GA = Right Gastrocnemius; L-LES = Left Lumbar Erector Spinae; R-LES = Right Lumbar Erector Spinae; L-URA = Left Upper Rectus Abdominis; R-URA = Right Upper Rectus Abdominis; L-RF = Left Rectus Femoris; R-RF = Right Rectus Femoris; L-TA = Left Tibialis Anterior; R-TA = Right Tibialis Anterior.

Figure 6. Representative normality Q-Q plots. The blue points represent the sample data quantiles, while the red line represents the theoretical quantiles of a normal distribution. The deviation of the blue points from the red line indicates a non-normal data distribution.

Figure 7. Main effect of fatigue state on sEMG features for the left rectus femoris muscle. The subfigures show the significant change (Kruskal-Wallis, p < 0.001) between non-fatigued and fatigued states for: (a) Root Mean Square (RMS), (b) Median Frequency (MDF), (c) Mean Frequency (MNF), (d) Sample Entropy (SampEn), (e) Skewness (Skew), (f) Variance of Central Frequency (VCF), and (g) Power Spectrum Ratio (PSR).

Figure 8. A bar chart representing the mean importance of features across all participants and postures.

Figure 9. Confusion matrices illustrating the classification performance for the standing posture for five different models: (a) CNN LSTM with Attention, (b) Transformer, (c) XGBoost, (d) 1D-CNN, and (e) Random Forest.

Figure 10. Confusion matrices illustrating the classification performance for the stooping posture. Each matrix displays the absolute number and percentage of test segments classified by: (a) XGBoost, (b) 1D-CNN, (c) CNN LSTM with Attention, (d) Transformer, and (e) Random Forest. The main diagonal represents correct classifications, while off-diagonal cells represent misclassifications.

Figure 11. Confusion matrices illustrating the classification performance for the kneeling posture. Each matrix displays the absolute number and percentage of test segments classified by (a) 1D-CNN, (b) CNN LSTM with Attention, (c) XGBoost, (d) Random Forest, and (e) Transformer.

Figure 12. Hybrid Transformer–LSTM model with XGBoost classifier.

Figure 13. Comparative confusion matrices for the hybrid Transformer–LSTM–XGBoost model across different postures: (a) standing, (b) stooping, and (c) kneeling.

Figure 14. A comparative analysis of model accuracy for fatigue classification, segmented by posture and model complexity.

Table 1. Final distribution of labeled sEMG data segments used for model training, by posture.

Posture	Total Segments	Segments Excluded	Final Segments	Fatigue (%)	Non-Fatigue (%)
Standing	706,884	89,228	617,656	24%	76%
Stooping	352,380	47,695	304,685	23%	77%
Kneeling	348,360	47,427	300,933	23%	77%

Table 2. Baseline model performance on raw sEMG data.

Posture	Model	Accuracy (%)	F1 Score (%)
Standing	LSTM	55.34	59.01
	Random Forest	48.13	58.62
	XGBoost	51.78	59.95
Stooping	LSTM	58.20	62.33
	Random Forest	55.23	63.83
	XGBoost	53.64	59.23
Kneeling	LSTM	57.88	66.76
	Random Forest	48.23	54.38
	XGBoost	53.05	60.16

Table 3. Consistency of feature differentiation between fatigue states.

Posture	MDF	MNF	RMS	MAV	SampEn	PSR	VCF	Skew
Standing	659	678	543	549	570	585	642	619
Percentage	86%	88%	71%	71%	74%	76%	84%	81%
Stooping	344	335	287	283	302	298	298	306
Percentage	89%	87%	75%	73%	78%	78%	78%	80%
Kneeling	338	342	278	272	296	304	289	299
Percentage	89%	89%	75%	71%	77%	79%	75%	78%
Average	88%	88%	73%	72%	76%	78%	79%	79%

Total number of muscle-specific datasets per posture: standing (8 participants × 8 repetitions × 12 muscles = 768); stooping and kneeling (8 participants × 4 repetitions × 12 muscles = 384).

Table 4. Classification performance of sEMG-based models for standing posture.

Model	Test Accuracy	Non-Fatigue F1-Score	Fatigue F1-Score
XGBoost	$0.7694 \pm 0.0094$	$0.8075 \pm 0.0170$	$0.7099 \pm 0.0160$
CNN LSTM Attention	$0.7830 \pm 0.0120$	$0.8238 \pm 0.0186$	$0.7135 \pm 0.0198$
Transformer	$0.6717 \pm 0.0152$	$0.7150 \pm 0.0208$	$0.6120 \pm 0.0153$
1D-CNN	$0.7272 \pm 0.0299$	$0.7570 \pm 0.0618$	$0.6846 \pm 0.0172$
RF	$0.7480 \pm 0.0104$	$0.8010 \pm 0.0136$	$0.6553 \pm 0.0078$

Table 5. Classification performance of sEMG-based models for stooping posture.

Model	Test Accuracy	Non-Fatigue F1-Score	Fatigue F1-Score
XGBoost	$0.7809 \pm 0.0109$	$0.8289 \pm 0.0123$	$0.6943 \pm 0.0120$
RF	$0.7599 \pm 0.0187$	$0.8245 \pm 0.0199$	$0.6163 \pm 0.0150$
CNN LSTM Attention	$0.7753 \pm 0.0152$	$0.8254 \pm 0.0177$	$0.6808 \pm 0.0301$
1D-CNN	$0.7534 \pm 0.0270$	$0.7994 \pm 0.0490$	$0.6709 \pm 0.0330$
Transformer	$0.6942 \pm 0.0182$	$0.7561 \pm 0.0301$	$0.5832 \pm 0.0260$

Table 6. Classification performance of sEMG-based models for kneeling posture.

Model	Test Accuracy	Non-Fatigue F1-Score	Fatigue F1-Score
XGBoost	$0.6989 \pm 0.0189$	$0.7511 \pm 0.0303$	$0.6123 \pm 0.0346$
RF	$0.7450 \pm 0.0222$	$0.8134 \pm 0.0256$	$0.5904 \pm 0.0222$
CNN LSTM Attention	$0.7469 \pm 0.0325$	$0.7971 \pm 0.0409$	$0.6527 \pm 0.0272$
Transformer	$0.6602 \pm 0.0419$	$0.7257 \pm 0.0721$	$0.5128 \pm 0.0969$
1D-CNN	$0.7179 \pm 0.0065$	$0.7799 \pm 0.0116$	$0.6048 \pm 0.0270$

Table 7. Performance of the hybrid Transformer–LSTM–XGBoost model.

Posture	Accuracy	Fatigue F1-Score	Non-Fatigue F1-Score
Standing	0.8213	0.7772	0.8509
Stooping	0.8266	0.7688	0.8613
Kneeling	0.8238	0.7675	0.8582

Table 8. Ablation study results for the hybrid model.

Models	Accuracy	F1 Fatigue
Hybrid Transformer–LSTM–XGBoost Model	0.8238	0.7675
No Transformer	0.7137	0.7594
No LSTM	0.7666	0.0251
XGBoost replaced by Softmax	0.7030	0.6911

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
