Several epidemiological studies have reported a strong correlation between physical work exposures and an increased risk of work-related musculoskeletal disorders (MSDs) [1]. Radwin et al. showed a relationship between MSDs and repeated, prolonged handling of external loads during the workday [5].
Biomechanical exposures during physical work are mainly due to three factors: intensity (load magnitudes and extent of non-neutral postures), repetition (frequency or number of force exertions and motions) and duration (the time of a physical activity) [6].
In addition to the more traditional quantitative or semiquantitative observational methods [7], occupational ergonomics studies in the field can employ instrumental methods that offer greater agility, precision and duration of measurement. Among the direct measurement methods, wearable inertial systems based on inertial measurement units (IMUs) play an important role in biomechanical risk assessment [14], and they look very promising for occupational medicine and ergonomics applications [15]. In the field of risk assessment, wearable inertial technology represents a significant advance over the evaluation tools traditionally used in ergonomics [16], especially regarding the degree of precision and the possibility of automatic measurement detection. IMUs are based on triaxial accelerometers and gyroscopes able to measure 3D acceleration and angular velocity of the sensor with respect to gravity [17]. Often, IMUs also include a triaxial magnetometer that provides information about the orientation of the sensor in three-dimensional space. In the absence of standards on the positioning of sensors on the human body [18], the dorsal part of the back has been recommended for the ergonomic study of trunk position [19], while the waist has been suggested for analysing overall motion, as it is representative of the body's centre of mass [20].
In occupational ergonomics, body-worn inertial sensor technology and motion tracking systems can be combined to noninvasively collect large amounts of body movement data during physical work [15] and explore their association with occupational risk as assessed with standard methods [21]. The portability and wearability of this technology make it an advantageous alternative to camera-based motion tracking systems [22]. Information on worker exposure obtained through wearable sensors could help to pre-evaluate heavy work, match workers’ skills with physical activity requirements, verify the sustainability of work shift combinations and prioritise work modification interventions based on the type and severity of exposure [16]. The success and diffusion of IMU systems are linked to their relatively low cost, the low complexity of the experimental setup and data processing procedures, the limited time constraints and the feasibility of evaluation outside research laboratories [23]. Several IMUs positioned on the body were used in studies [24] whose purpose was to predict, from the same accelerometric data, the geometric (initial and final height, horizontal distance, asymmetry and inclination of the trunk) and temporal (frequency and duration) variables related to lifting, thereby validating the measures that the systems produced.
Among the activities involving biomechanical overloading, material handling and lifting is one of the most studied in the scientific literature, including its association with the development of work-related MSDs [2]. With a view to prevention, NIOSH established a quantitative method for assessing lifting actions based on the intensity, duration and frequency of the task and on other geometrical characteristics of the lift [12]. The method determines the Recommended Weight Limit (RWL) for a lifting task and calculates a risk index, the Lifting Index (LI).
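As an illustration of how the RWL and LI are obtained, the sketch below uses the metric form of the RNLE, RWL = LC × HM × VM × DM × AM × FM × CM with LC = 23 kg, and LI = load / RWL. The function and parameter names are illustrative. The frequency and coupling multipliers are passed in directly because they come from the NIOSH lookup tables, the admissible ranges of the geometric variables are not enforced, the example task values are hypothetical, and the LI > 1 cut-off for the RISK class is an assumption made for this example rather than a value stated here.

```python
LOAD_CONSTANT_KG = 23.0  # metric load constant (LC) of the RNLE


def recommended_weight_limit(h_cm, v_cm, d_cm, asym_deg, fm=1.0, cm=1.0):
    """Return the RWL in kg from the metric RNLE multipliers.

    h_cm: horizontal distance of the hands from the ankles (cm)
    v_cm: vertical height of the hands at the origin of the lift (cm)
    d_cm: vertical travel distance of the load (cm)
    asym_deg: asymmetry angle (degrees)
    fm, cm: frequency and coupling multipliers taken from the NIOSH tables
    """
    hm = 25.0 / max(h_cm, 25.0)          # horizontal multiplier, 1 when H <= 25 cm
    vm = 1.0 - 0.003 * abs(v_cm - 75.0)  # vertical multiplier
    dm = 0.82 + 4.5 / max(d_cm, 25.0)    # distance multiplier, 1 when D <= 25 cm
    am = 1.0 - 0.0032 * asym_deg         # asymmetric multiplier
    return LOAD_CONSTANT_KG * hm * vm * dm * am * fm * cm


def lifting_index(load_kg, rwl_kg):
    """Return LI, the ratio between the lifted load and the RWL."""
    return load_kg / rwl_kg


def risk_class(li, threshold=1.0):
    """Map LI to a binary label; the threshold is an assumption of this sketch."""
    return "RISK" if li > threshold else "NO RISK"


# Hypothetical task: 15 kg load, H = 40 cm, V = 50 cm, D = 60 cm, no asymmetry.
rwl = recommended_weight_limit(h_cm=40, v_cm=50, d_cm=60, asym_deg=0, fm=0.85, cm=1.0)
li = lifting_index(load_kg=15, rwl_kg=rwl)
print(round(rwl, 2), round(li, 2), risk_class(li))
```

Under ideal conditions (H = 25 cm, V = 75 cm, D = 25 cm, no asymmetry, FM = CM = 1) every multiplier equals 1 and the RWL reduces to the 23 kg load constant.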
In the scientific literature, among the many applications of wearable technology to ergonomics, and in particular among those that use the NIOSH methodology [26], the association of features extracted directly from raw signals (acceleration and angular velocity) with NIOSH risk classes related to repeated load lifting activities has not yet been explored.
Moreover, machine learning (ML) algorithms are gaining popularity in the ergonomic field for biomechanical risk assessment using data acquired by wearable inertial systems. Several publications in recent years have documented a variety of strategies [27]. IMU systems that incorporate machine learning into their data analysis pathways have been found effective in automated exercise detection and in classifying movement quality across a range of lower limb exercises, including lifting, although studies in this field have so far involved few samples [30].
The question remains whether it is possible to classify lifting tasks belonging to different risk classes according to the value of LI using a machine learning approach by means of features extracted from raw signals.
The aim of this study is twofold. First, we explored the possibility of using a single IMU placed on the lumbar region to monitor biomechanical risk. Second, we assessed whether the time-domain features extracted from the acceleration and angular velocity signals acquired by the IMU sensor allow risk/no-risk tasks to be classified according to the NIOSH methodology.
To classify lifting tasks belonging to different LI classes according to the Revised NIOSH Lifting Equation (RNLE), we fed several ML algorithms with time-domain features extracted from acceleration and angular velocity signals. The signals relating to the lifting activities were acquired through the wearable Opal System. The methodology was validated through tenfold cross-validation and several evaluation metrics in order to make the result more robust.
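To make the feature set concrete, the following minimal sketch computes the four time-domain features named in this study (RMS, SD, MIN and MAX) on each of the three axes of one signal, which gives the 12 features per signal type. The function name, the input layout (a list of (x, y, z) samples for one lifting task) and the use of the sample standard deviation are assumptions of this sketch. The "a" prefix mirrors the aRMSy naming used for the acceleration features.

```python
import math
import statistics


def time_domain_features(samples_xyz, prefix="a"):
    """Return RMS, SD, MIN and MAX for each axis of a triaxial signal.

    samples_xyz: list of (x, y, z) tuples recorded during one lifting task.
    prefix: label for the signal type, e.g. "a" for acceleration.
    """
    features = {}
    for axis_index, axis_name in enumerate("xyz"):
        axis = [sample[axis_index] for sample in samples_xyz]
        features[f"{prefix}RMS{axis_name}"] = math.sqrt(
            sum(value * value for value in axis) / len(axis)
        )
        # Sample standard deviation (n - 1 denominator); an assumption here.
        features[f"{prefix}SD{axis_name}"] = statistics.stdev(axis)
        features[f"{prefix}MIN{axis_name}"] = min(axis)
        features[f"{prefix}MAX{axis_name}"] = max(axis)
    return features


# Toy signal, for illustration only (not measured data).
toy_signal = [(1.0, 0.0, -1.0), (-1.0, 0.0, 1.0), (1.0, 0.0, -1.0), (-1.0, 0.0, 1.0)]
toy_features = time_domain_features(toy_signal)
print(len(toy_features), toy_features["aRMSy"])
```

Each lifting task then becomes one instance, a vector of 12 values, that can be passed to any classifier and evaluated with tenfold cross-validation.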
First, we performed an ML analysis for each subject to assess the feasibility of the proposed data mining system for assessing the biomechanical risk of a single subject. For each subject, we considered two datasets: the first is composed of 60 instances, 30 for each class (NO RISK, RISK), and 12 features extracted from the acceleration signals; the second is composed of 60 instances, 30 for each class (NO RISK, RISK), and 12 features extracted from the angular velocity signals. For each dataset, we averaged the results of the ML analysis across the seven subjects and also report the standard deviation as a measure of prediction uncertainty. The results for the two datasets are shown in Table 3 and Table 4, respectively, where Sensitivity and Specificity were computed taking the NO RISK class as the reference.
Second, we assessed feature importance by calculating the information gain (IG), considering the entire study sample and the features extracted from both the acceleration and angular velocity signals along the three axes (Figure 5).
Third, we performed an ML analysis considering all seven subjects to assess the feasibility of the proposed data mining system for biomechanical risk assessment in a general study population. In this analysis, we considered a single dataset consisting of 420 (60 × 7) instances, 210 for each class (NO RISK, RISK), and 18 features extracted from both acceleration and angular velocity signals, excluding the six features with IG equal to zero (Figure 5). Our study satisfies the general rule that the ratio n/d between the number n of instances available in the training set and the dimension d of the feature space should be at least 10 [64]. This strengthens the reliability of our results.
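As a quick check of this rule for the pooled dataset, assuming that each round of tenfold cross-validation trains on nine of the ten folds:

```latex
n = 420 \times \frac{9}{10} = 378, \qquad d = 18, \qquad \frac{n}{d} = \frac{378}{18} = 21 \geq 10
```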
The results of the ML analysis on the entire dataset are shown in Table 5, where the NO RISK class was considered as the reference class for Sensitivity and Specificity.
Table 6 shows the Confusion Matrix of the best algorithm (GB) resulting from the analysis on the entire study sample, selected according to the scores of the evaluation metrics: Accuracy, Sensitivity, Specificity and AucRoc. The resulting confusion matrix is almost symmetric, with the following values: TP = 197, FP = 13, FN = 8, TN = 202.
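The evaluation metrics can be recomputed from these four counts. The sketch below uses the standard definitions (Accuracy = (TP + TN) / total, Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP)); the function name is illustrative. With the counts as labelled, Accuracy is 0.95 and Sensitivity and Specificity come out at about 0.96 and 0.94. The reported Sensitivity of 0.94 and Specificity of 0.96 are reproduced if the two off-diagonal counts are read row-wise, that is, 13 NO RISK instances misclassified out of 210 and 8 RISK instances misclassified out of 210. That reading is an assumption about the table layout, not something stated in the text.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts as labelled in the text.
as_labelled = binary_metrics(tp=197, fp=13, fn=8, tn=202)

# Row-wise reading (assumption): 13 misclassified NO RISK, 8 misclassified RISK.
row_wise = binary_metrics(tp=197, fp=8, fn=13, tn=202)

for name, metrics in (("as labelled", as_labelled), ("row-wise", row_wise)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```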
Finally, we performed an ML analysis on the entire dataset using leave-one-subject-out as the validation strategy, namely, using six subjects for the training set and one subject for the test set. Results are shown in Table 7.
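The leave-one-subject-out scheme can be sketched as a generator of train/test index splits, one per subject. The function name and the ordering of instances by subject are assumptions of this sketch; only the sizes (seven subjects, 60 instances each) come from the study.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, subject in enumerate(subject_ids) if subject != held_out]
        test = [i for i, subject in enumerate(subject_ids) if subject == held_out]
        yield held_out, train, test


# Seven subjects with 60 instances each, as in the pooled dataset.
subject_ids = [subject for subject in range(1, 8) for _ in range(60)]
folds = list(leave_one_subject_out(subject_ids))

for held_out, train, test in folds:
    print(held_out, len(train), len(test))
```

Because no instance of the held-out subject appears in the training indices, this split measures how well a classifier transfers to a person it has never seen, which a stratified tenfold split over pooled instances does not guarantee.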
The goal of our research was to explore the feasibility of using several state-of-the-art ML algorithms, fed with specific time-domain features extracted from the acceleration and angular velocity signals during a lifting activity, to classify the lifting risk classes based on the LI values according to the RNLE. The results obtained suggest that ML algorithms operating on the time-domain features (RMS, SD, MIN and MAX) extracted during lifting activities from the acceleration and angular velocity signals along the three dimensions of space can offer valid help to experts in ergonomics for the precise and automatic classification of the biomechanical risk of workers engaged in load-lifting activities.
The ML analysis was aimed at classifying lifting activities based on the presence or absence of risk, as defined by the LI.
First, having carried out an ML analysis for each subject, we obtained the average scores and standard deviations of the evaluation metrics across the seven subjects, considering separately the features extracted from the acceleration signal and from the angular velocity signal. As shown in Table 3 and Table 4, applying state-of-the-art algorithms to the time-domain features extracted from the acceleration signal provides superior performance (evaluation metric scores) compared with that achieved using the angular velocity signal, although the results for the latter are also satisfactory. The proposed combination of algorithms and features extracted from the acceleration signal achieved an accuracy between 0.79 and 0.98, a sensitivity between 0.79 and 0.98, a specificity between 0.79 and 0.99, and an AucRoc between 0.84 and 0.99. The proposed combination of algorithms and features extracted from the angular velocity signal achieved an accuracy between 0.68 and 0.90, a sensitivity between 0.84 and 0.91, a specificity between 0.44 and 0.92, and an AucRoc between 0.82 and 0.94.
Second, in our study, eighteen out of twenty-four features showed a non-zero IG (Figure 5), highlighting their predictive power for this specific classification task. Specifically, the most informative features, according to IG, were those associated with the acceleration along the y axis, i.e., the mediolateral direction (Figure 2). This means that the trajectory of the subject’s centre of gravity along the y axis during the lifting task (Figure 3) tends to carry more information for separating the risk classes, despite the load being moved along the x trajectory. In particular, aRMSy alone shows an information gain equal to 20%, highlighting its high discriminating power between the two risk conditions.
As with the features relating to the acceleration, the most informative angular velocity features according to IG (Figure 5) are, once again, those relating to the y axis. Approximately 60% of the information provided by the features derives from those relating to the angular velocity and acceleration of the y axis, and this result must be taken into account.
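For readers unfamiliar with IG, the sketch below shows the underlying quantity: the entropy of the class labels minus the weighted entropy that remains after splitting on a feature. The single-threshold split, the function names and the toy values are assumptions of this sketch. The discretisation actually used, and the way IG values were normalised to percentages, are not specified here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels, threshold):
    """IG of a binary split of a numeric feature at the given threshold."""
    low = [label for value, label in zip(values, labels) if value <= threshold]
    high = [label for value, label in zip(values, labels) if value > threshold]
    remaining = sum(
        len(part) / len(labels) * entropy(part) for part in (low, high) if part
    )
    return entropy(labels) - remaining


# Toy example, for illustration only (not measured data).
labels = ["NO RISK", "NO RISK", "RISK", "RISK"]
separating_feature = [0.1, 0.2, 0.8, 0.9]
uninformative_feature = [0.1, 0.9, 0.2, 0.8]

ig_separating = information_gain(separating_feature, labels, threshold=0.5)
ig_uninformative = information_gain(uninformative_feature, labels, threshold=0.5)
print(ig_separating, ig_uninformative)
```

A feature whose split separates the two classes completely recovers the full label entropy, while a feature whose split leaves both classes equally mixed has an IG of zero, which is the condition used to discard six of the features in this study.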
Third, as shown in Table 5, we performed an ML analysis on the whole study sample in order to obtain more generalisable results and to evaluate whether the proposed method is applicable not only to a single subject but also to a whole sample. This property could represent a substantial advantage for using the proposed methodology during preventive interventions for the health and safety of workers in the workplace. This further analysis was carried out using the features extracted from both acceleration and angular velocity signals relating to the three axes (considering the entire study sample, and excluding the features with IG equal to zero). All classifiers (with the exception of the NB, SVM and LR algorithms) showed an accuracy between 0.80 and 0.95, a sensitivity between 0.72 and 0.94, a specificity between 0.85 and 0.96, and an AucRoc between 0.90 and 0.99. Conventionally, AucRoc values > 0.70 are considered to represent moderate discrimination, values > 0.80 good discrimination and values > 0.90 excellent discrimination; on this basis, as shown in Table 5, six of the nine ML algorithms discriminated excellently between the two risk classes. The poor performance of the NB algorithm was due to the presence of a statistically significant correlation between features [56] (correlation study not shown). The LR algorithm, like NB, is based on the concept of probability, and this could explain the limited performance of LR. The poor performance of the SVM with linear kernel, instead, could be explained by the fact that the data are not linearly separable. The best algorithm was GB, which reached values of 0.95, 0.94, 0.96 and 0.99 in Accuracy, Sensitivity, Specificity and AucRoc, respectively. As shown in Table 6, the almost symmetric Confusion Matrix of GB presents only 21 wrongly classified instances out of 420 total instances, confirming the potential of this methodology applied to biomechanical evaluation.
Finally, to better assess how well our models generalise, we tested them using leave-one-subject-out, training the classifiers on six subjects and testing them on one subject. Although the metrics were slightly lower, the results shown in Table 7 are comparable with those obtained from the stratified tenfold CV. Once again, the tree-based ML algorithms proved more effective in terms of evaluation metrics for this purpose.
To our knowledge, this is the first study that considers risk discrimination (by ML) according to NIOSH using a single IMU placed on the subject’s pelvis to extract four basic time-domain features. The achieved results, when compared with the recent ones described by other research groups, are in line with or superior to those based on more complex methodologies.
In the study by Varecchia et al. [65], the combination of an artificial neural network fed with time-domain and frequency-domain features extracted from surface electromyography and optoelectronic systems resulted in a classification Accuracy of up to 90% against three NIOSH risk classes (LI = 1, LI = 2, LI = 3). In a subsequent work by the same authors [66], the new feature of Lifting Energy Consumption [67] was used to feed a neural network similar to the previous one, demonstrating an Accuracy of up to 100%. The limitation of this methodology, as pointed out by the authors themselves, is its poor applicability in the workplace, an aspect that is solved by using wearable inertial sensors as in our case.
Snyder et al. [68] proposed a modified Convolutional Neural Network model to distinguish three risk levels (low, medium and high) according to the American Conference of Governmental Industrial Hygienists Threshold Limit Values for lifting. Similar to our work, they used IMU sensors, albeit in larger number, to achieve 90% Accuracy.
With a similar goal to ours, Brandt et al. [43] tried to classify lifting activities into low- and high-risk categories according to the guidelines of the Danish Working Environment Authority, reaching an Accuracy score equal to 65% using a Linear Discriminant Analysis algorithm. In the study by Conforti et al. [27], which aimed to distinguish between correct and incorrect postures, the data extracted from an IMU positioned on the pelvis of the subject, coupled with an IMU placed on the trunk, did not yield scores higher than 75% using a Support Vector Machines algorithm with four different kernels.
Although the presented methodology is powerful, doubts could be raised about the effective capabilities of a single IMU for the validation of such results. While a single IMU on the pelvis is not sufficient to fully predict the parameters associated with lifting (e.g., the weight of the object to be handled, the horizontal distance, etc.), this solution estimates the lumbar load fairly well when the displaced mass is known and is in a consistent position with respect to the body [38]. In addition to significantly increasing convenience in field trials, the use of a single IMU positioned on the back is considered sufficient to provide the data necessary to distinguish lifting classes [68].
Based on these results, the tested approach, which combines time-domain features and machine learning algorithms, proved to be a valid indicator, although a preliminary one because of the low number of samples analysed, of the risk of work-related low-back disorders (WLBDs) in manual lifting (according to the NIOSH index) to which workers are potentially exposed during their working activity.