Data Quality and Reliability Assessment of Wearable EMG and IMU Sensor for Construction Activity Recognition

The workforce shortage is one of the significant problems in the construction industry. To overcome the challenges due to workforce shortage, various researchers have proposed wearable sensor-based systems in the area of construction safety and health. Although sensors provide rich and detailed information, not all sensors can be used for construction applications. This study evaluates the data quality and reliability of forearm electromyography (EMG) and inertial measurement unit (IMU) of armband sensors for construction activity classification. To achieve the proposed objective, the forearm EMG and IMU data collected from eight participants while performing construction activities such as screwing, wrenching, lifting, and carrying on two different days were used to analyze the data quality and reliability for activity recognition through seven different experiments. The results of these experiments show that the armband sensor data quality is comparable to the conventional EMG and IMU sensors with excellent relative and absolute reliability between trials for all the five activities. The activity classification results were highly reliable, with minimal change in classification accuracies for both the days. Moreover, the results conclude that the combined EMG and IMU models classify activities with higher accuracies compared to individual sensor models.


Introduction
The construction industry is one of the leading industries in the world, spending $10 trillion on construction-related goods and services every year [1]. However, the industry is facing a massive shortage of skilled craft workers [2]. More than 8 out of 10 construction firms report having a hard time finding qualified workers. One of the significant causes of the workforce shortage is the premature retirement of skilled craft workers due to safety and health issues. Due to a lack of proper safety training and monitoring systems, the construction workforce is exposed to various fatal and non-fatal injuries such as work-related musculoskeletal disorders (WMSDs). To overcome these challenges, various researchers have proposed wearable sensor-based systems in the area of construction safety and health [3][4][5][6][7][8]. Applications in this area include preventing musculoskeletal disorders, fall prevention, mental and physical workload assessment, and fatigue monitoring [3][4][5][6][7][8]. All these applications can be categorized as classification problems, since they involve identifying different postures and classifying different physical and mental workloads. This study evaluates the data quality and reliability of forearm electromyography (EMG) and inertial measurement unit (IMU) data for construction activity classification by following the guidelines, recommendations, and methods for data quality and reliability assessment proposed by previous studies on sensors [24][25][26][27][28][29].
In order to achieve the proposed objective, the whole study is divided into seven experiments. The first three experiments involve evaluating the data quality, understanding the effect of armband position on data quality, and reliability of forearm EMG and IMU data. Later, four experiments involve building and evaluating activity classification models, assessing the reliability of classification results, understanding the effect of lifting weights on classification results, and evaluating the classification performance of different sensor combinations. The results of these experiments answer various questions such as noise level in armband signal data, drift in the IMU sensor data, quality of EMG and IMU data for at-rest and in-motion activities, the effect of armband position on signal quality, the accuracy of construction activity classification using EMG and IMU, reliability of sensor data and classification results, effect of lifting weights on classification accuracy, and classification performance of different sensor combinations. It was hypothesized that the armband sensor provides reliable EMG and IMU data and activity classification results. The answers to the above questions establish the reliability and applicability of forearm EMG and IMU data for construction activity classification.

Participants
Eight healthy male college students voluntarily participated in all the experiments. The participants' ages ranged from 24 to 28 years (mean ± SD: 26.13 ± 1.55 years), height ranged from 1.65 to 1.83 m (1.74 ± 0.06 m), and weight ranged from 62.60 to 100 kg (81.35 ± 12.44 kg). All the participants were right-handed, healthy, and free of musculoskeletal disorders at the time of the experiments. All the procedures involving human participants were approved by the Louisiana State University Institutional Review Board (IRB #: IRBAM-20-0112). The purpose of the research was explained to all the participants before the start of the experiment, and their signatures were obtained on the informed consent forms. The sample size required to assess the reliability of the sensor using the intraclass correlation coefficient (ICC) was determined using the tables from Bujang and Baharum [30]. An ICC score greater than or equal to 0.75 indicates excellent reliability [31,32]. At least seven participants are required to achieve a minimum ICC score of 0.75 with two assessments per subject at a 0.05 significance level and a power of 0.80 [30].

Measurements and Instrumentation
A forearm-based wearable armband sensor (Myo armband) developed by Thalmic Labs Inc. was used to collect the EMG and IMU data. The Myo armband is a non-intrusive wearable sensor that consists of eight dry surface EMG sensors and a 9-axis IMU sensor (3-axis gyroscope, 3-axis accelerometer, and 3-axis magnetometer). The sensor weighs approximately 93 g [23]. The data from the sensor is transmitted to a computer or cloud storage via a Bluetooth Low Energy (BLE) wireless connection. The raw EMG and IMU data can be accessed through the Myo software development kit (SDK). The Myo SDK was used to acquire real-time forearm EMG and IMU data at frequencies of 200 Hz and 50 Hz, respectively. The device goes into an idle state if there is no activity for more than 30 s. The configuration of the Myo armband electrodes is shown in Figure 1a, where the electrode with the LED light and Myo logo is channel-4, followed by channel-3 in the clockwise direction and channel-5 in the counter-clockwise direction. Figure 1a also shows the x, y, and z directions of the IMU sensor. Unless otherwise stated, the armband was worn on the thickest part of the forearm, as shown in Figure 1b, with channel-4 in line with the index finger and the blue marker toward the lower forearm [33]. After putting on the armband, the participant calibrated it by performing predefined gestures (finger spread, wave-in, wave-out, and relaxed state) while connected to Thalmic Labs' Myo Connect manager [34].

The eight EMG sensors capture the electrical impulses generated by the forearm muscles, which are returned as an 8-bit array; in other words, each EMG sensor outputs an integer value between −128 and 127 representing muscle activation levels.
The armband sensor captures the muscle activity of various forearm muscles such as the brachioradialis, flexor digitorum superficialis, medial epicondyle of humerus, palmaris longus, flexor carpi ulnaris, flexor carpi radialis, and pronator teres [35]. The IMU, in contrast, captures the motion of the forearm by measuring acceleration, angular velocity, and orientation along the x, y, and z axes. The armband was kept synced with the application and calibrated throughout the experiments.
High-precision conventional wearable EMG and IMU sensors, namely FREEEMG (BTS Bioengineering Corp., Quincy, MA, USA) and the YEI 3-Space IMU sensor (Yost Engineering Inc., Portsmouth, OH, USA), respectively, were used as references for comparing the armband sensor data quality. The conventional IMU sensor measures acceleration and angular velocity in units of g and rad/s, respectively, while the conventional EMG sensor measures muscle activity in millivolts (mV). The conventional IMU sensor was calibrated using a gradient descent calibration procedure, and no preprocessing was performed on any of the sensor data before the data quality calculations.
To assess the reliability of the armband sensor data, features such as absolute acceleration, absolute angular velocity, and mean absolute value of EMGsum (sum of EMG values) were calculated from the raw data [36]. These sensor features are widely used in activity/gesture/motion recognition applications [33,36–42]. The accelerations along the x, y, and z axes were used to compute the absolute acceleration, or magnitude of the acceleration vector (Acc), at any given timestamp (t) using Equation (1) [31,43,44]. Similarly, the angular velocities along the three axes provided by the gyroscope sensor were used to calculate the absolute angular velocity, or magnitude of the gyroscope vector (Gyro), at any given timestamp (t) using Equation (2) [31,44–46]. For simplicity, the angular velocity along each axis is represented as Gyro in Equation (2). A new feature, EMGsum, was calculated by summing the eight EMG values at any timestamp (t) [47,48]. Further, the mean absolute value (MAV) of EMGsum was evaluated using Equation (3) and later used for reliability assessment [33,37,38]. For each trial, the average acceleration magnitude, the average gyroscope magnitude, and the MAV of EMGsum were computed to assess the trial-to-trial (intra-day) reliability of the sensor. For the day-to-day (inter-day) reliability test, the mean values of the three trials of each day were used for the ICC analysis.
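The per-trial features described above can be illustrated with a minimal Python sketch. It assumes that Equations (1) and (2) are the usual Euclidean vector magnitudes and that Equation (3) is the mean of the absolute EMGsum values over a trial; the function names and the toy samples are hypothetical and are not taken from the study.

```python
import math

def vector_magnitude(x, y, z):
    """Magnitude of a 3-axis sample (assumed form of Equations (1) and (2))."""
    return math.sqrt(x * x + y * y + z * z)

def mean_magnitude(samples):
    """Average magnitude over one trial; samples is a list of (x, y, z) tuples."""
    return sum(vector_magnitude(*s) for s in samples) / len(samples)

def emg_sum(channels):
    """EMGsum at one timestamp: the sum of the eight raw channel values."""
    return sum(channels)

def mav_emg_sum(emg_samples):
    """Mean absolute value of EMGsum over one trial (assumed form of Equation (3))."""
    return sum(abs(emg_sum(ch)) for ch in emg_samples) / len(emg_samples)

# Hypothetical toy trial, not study data.
acc_trial = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (3.0, 4.0, 0.0)]
emg_trial = [[1, -2, 3, -4, 5, -6, 7, -8], [10, 0, 0, 0, 0, 0, 0, -4]]

acc_feature = mean_magnitude(acc_trial)   # one Acc value per trial
emg_feature = mav_emg_sum(emg_trial)      # one MAV value per trial
```

One value per trial and feature is what the trial-to-trial analysis consumes; averaging the three trial values of a day would give the input for the day-to-day analysis.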

General Procedures of the Study
This study consists of seven experiments: (a) evaluating the forearm EMG and IMU data quality for "at-rest" and "in-motion" activities (Experiment I); (b) investigating the effect of armband sensor position on EMG and IMU data (Experiment II); (c) assessing the reliability of forearm EMG and IMU data obtained while performing construction activities (Experiment III); (d) classification model building, performance evaluation, and classifier comparison (Experiment IV); (e) investigating the reliability of results obtained from classification models using EMG and IMU data while performing construction activities on different days (Experiment V); (f) investigating the effect of lifting weight on forearm EMG and IMU data and activity classification results (Experiment VI); and (g) comparing activity classification performance for different sensor combinations (Experiment VII). The activities performed by the participants were standardized across all the experiments. The "at-rest" activities include the armband lying stationary on the floor or placed on the arm of a person sitting still with the arm resting on a desk. The "in-motion" activities include screwing at elbow height at a frequency of 1 turn/6 s, wrenching while kneeling at a frequency of 1 turn/6 s, lifting a 25 lbs sandbag from elbow to shoulder height at a frequency of 1 lift/6 s, and carrying a 25 lbs sandbag on the shoulder with the dominant hand at the bottom of the sandbag for 30 s. The activities were designed to represent a wide range of construction activities involving the forearm (lifting), wrist (screwing and wrenching), and whole body (carrying). Moreover, these activities represent controlled natural motions such as repeated motion (lifting), impulsive motion (screwing or wrenching), and free motion (carrying). Each trial of an activity lasted 30 s, and each participant performed three trials of each activity on a testing day.
There were two testing periods (i.e., Day-1 and Day-2), and participants performed all five activities (i.e., stationary on the body, screwing, wrenching, lifting, and carrying) on both days. Therefore, each participant performed a total of 15 trials (3 trials × 5 activities) in one day. There was no gap between the testing periods. The order of the activities was randomized for all the participants on both days. Before the start of the experiment, all the participants were given enough time to familiarize themselves with the tools to eliminate the systematic bias that occurs due to learning effects [49]. The participants were asked to warm up before the start of the session, and enough rest was provided between the trials to prevent injuries and fatigue [50]. Once the armband was worn and synced with the computer, a two-minute settling time was allowed before the start of the experiment to prevent rotational drift. In order to test the reliability using the test-retest approach, all the activities were performed in an indoor environment under controlled conditions unless stated otherwise. The EMG, accelerometer, and gyroscope data of the eight participants were recorded and stored for all five activities on both days. The data were processed and analyzed according to the requirements of each experiment. The seven experiments mentioned above are further explained in the following sections and are broadly divided into three categories: data quality assessment, data reliability assessment, and activity classification performance evaluation.

Data Quality Assessment
Experiment I-Evaluating the Forearm EMG and IMU Data Quality for "At-Rest" and "In-Motion" Activities
The wearable sensor data is highly susceptible to various confounding factors that affect the quality of data. In this experiment, the data quality of EMG, acceleration, and gyroscope measurements were assessed by evaluating the signal-to-noise ratio (SNR) and compared to a conventional sensor. Furthermore, the influence of confounding factors (communication devices, another sensor, power tools, and smartwatches) and environments (indoor and outdoor) on the data quality were studied in this experiment. Firstly, the data quality was determined for the armband sensor and compared with the conventional sensors for at-rest and in-motion activities. In order to compare the data quality of the armband sensor, the conventional sensors were placed along with the armband sensor while performing activities, as shown in Figure 2. Each in-motion activity was performed three times by all eight participants. The average SNR value was used for the comparison. The influence of various confounding factors and environmental conditions on the armband sensor data quality was assessed when Myo was lying on the floor by computing SNR values for three trials. Inter-device data quality was assessed using two armbands lying on the floor at the same time to check if the data is consistent across different devices under the same conditions. All the at-rest activities were conducted three times, and the average value was considered to represent the influence of confounding factors, environment, and inter-device variability on the data quality.
Sensors 2020, 20, 5264

Experiment II-Investigating the Effect of Armband Sensor Position on EMG and IMU Data
In order to explore the effect of sensor position on the EMG and IMU data, a lifting activity was performed for three different sensor positions, as shown in Figure 3. The standard position refers to wearing the armband with sensor-4 in the direction of the index finger. The rotated and slid positions refer to rotating the armband in an anticlockwise direction (sensor-5 in the direction of the index finger) and sliding the armband downwards with respect to the standard position, respectively. A qualitative analysis was performed on the root mean square value of EMG and the absolute magnitude of IMU data collected while performing the lifting activity with the three sensor positions.
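The data quality experiments rely on the signal-to-noise ratio, which the paper defines as the ratio of the mean of the measurements to their standard deviation, with the mean of the EMG signal taken as its mean absolute value. A minimal Python sketch of that definition is given here. The function names are hypothetical, and two details are assumptions rather than statements from the paper: the population standard deviation is used, and the EMG noise term is the standard deviation of the raw (signed) samples.

```python
import math
from statistics import mean, pstdev

def snr(values):
    """SNR as mean / standard deviation, for magnitude signals (Acc, Gyro)."""
    sigma = pstdev(values)  # assumption: population SD
    if sigma == 0:
        return math.inf     # constant signal: no measurable noise
    return mean(values) / sigma

def snr_emg(values):
    """SNR for EMG: mean absolute value over the SD of the raw samples."""
    sigma = pstdev(values)  # assumption: SD of the signed samples
    if sigma == 0:
        return math.inf
    return mean(abs(v) for v in values) / sigma

# Hypothetical toy signals, not study data.
acc_snr = snr([2, 4, 4, 4, 5, 5, 7, 9])
emg_snr = snr_emg([-2, 2, -2, 2])
```

Averaging such SNR values over the three trials of an activity would give the per-activity figure that is compared between the armband and the conventional sensors.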



The sensor data quality was assessed by evaluating the noise level in the data using the signal-to-noise ratio (SNR). The SNR value of a signal is the ratio of the power of the signal to the power of the noise [51]. Alternatively, it is defined as the ratio of the mean of the measurements (µ) to the standard deviation of the measurements (σ), as shown in Equation (4), where the mean and the standard deviation (SD) of the measurements represent the power of the signal and the power of the noise, respectively. The signal power of the acceleration and gyroscope measurements was determined as the mean value of the absolute magnitude, whereas the signal power of the EMG measurements was calculated as the mean absolute value (MAV) [25,52].

Data Reliability Assessment
Experiment III-Assessing the Reliability of Forearm EMG and IMU Data Obtained While Performing Construction Activities
The EMG and IMU data collected from eight participants while performing construction activities on two different days were assessed for reliability. In this experiment, the raw EMG and IMU data collected from eight participants were processed to calculate the mean absolute value (MAV) of EMG, absolute acceleration (Acc), and absolute gyroscope (Gyro) for each trial of both the days (Day-1 and Day-2).
The MAV, Acc, and Gyro values of each trial were used to assess trial-to-trial reliability for both the days. Further, the MAV, Acc, and Gyro values of all the three trials were averaged for an activity for a participant for each day to evaluate reliability between days. The relative reliability was assessed using the intraclass correlation coefficient, and absolute reliability was evaluated using standard error of measurement (SEM) and smallest detectable difference (SDD).
All statistical analyses were performed using the IBM SPSS statistical package version 25 (Armonk, NY, USA). The trial-to-trial and day-to-day reliability were assessed on the accelerometer, gyroscope, and electromyography measurements obtained while performing five construction activities in two different testing periods (i.e., Day-1 and Day-2). Together, the trial-to-trial and day-to-day assessments measure intra-device reliability. The reliability was assessed between the trials and between the days using test-retest reliability, which consists of relative and absolute reliability [31,53]. Relative reliability refers to the magnitude of the correlation of repeated measurements and was evaluated using the intraclass correlation coefficient (ICC) [31,32]. It was expressed using ICC form (3, k), which combines a two-way mixed effect model, the mean of k measurements as the measurement type, and absolute agreement as the definition of the relationship [54,55]. Moreover, the ICC form (3, k) considers both systematic and random errors and uses the mean value of the repeated measurements as evaluation scores [31]. Based on the ICC score, the strength of relative reliability can be interpreted as excellent (if the ICC score is higher than 0.75), good (if the ICC score is between 0.59 and 0.75), fair (if the ICC score is between 0.40 and 0.58), and poor (if the ICC score is less than 0.40) [31,32,54–57].
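The study computed these statistics in SPSS. Purely as an illustration, the Python sketch below computes an ICC from a participants × repetitions table using the textbook two-way, absolute-agreement, average-measures formula, (MS_R − MS_E) / (MS_R + (MS_C − MS_E)/n), and then derives the absolute reliability measures named in this section (SEM and SDD). The SEM and SDD expressions are the standard forms, SD·√(1 − ICC) and 1.96·√2·SEM, and are assumed rather than confirmed to match the paper's numbered equations; all function names and numbers are hypothetical.

```python
import math
from statistics import mean

def icc_agreement_avg(scores):
    """Two-way, absolute-agreement, average-measures ICC.

    scores: one row per participant, one column per repeated measurement.
    """
    n = len(scores)        # participants
    k = len(scores[0])     # repeated measurements (trials or days)
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)

def sem(sd, icc):
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)

def sdd(sem_value):
    """Smallest detectable difference at a 95% confidence level."""
    return 1.96 * math.sqrt(2) * sem_value

def as_percent(value, mean_value):
    """Express SEM or SDD as a percentage of the mean measurement."""
    return 100 * value / mean_value

# Hypothetical test-retest table: 3 participants x 2 days.
icc_value = icc_agreement_avg([[1, 2], [2, 3], [3, 4]])
```

In this toy table every participant scores exactly one unit higher on the second day, so the ranking is preserved but the systematic shift lowers the absolute-agreement ICC below 1.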
Absolute reliability, in contrast, refers to the variability in the repeated measurements of an individual [31,32]. It was evaluated by estimating the standard error of measurement (SEM). SEM estimates how the repeated measures of an individual on the same device tend to distribute around the true value [31]. SEM is estimated as defined in Equation (5), where SD is the standard deviation of the test and retest measurements of all participants, and ICC is the average trial-to-trial or day-to-day test-retest relative reliability [31,32,56,58,59]. SEM% was used to compare the absolute test-retest reliabilities of different scenarios and was evaluated using Equation (6), where SEM is expressed as a percentage of the mean of the test and retest measurements. An SEM% value below 10% indicates excellent absolute test-retest reliability. Moreover, the smallest detectable difference (SDD) was calculated from SEM at a 95% confidence interval using Equation (7); it is the smallest change in the measurement that can be considered a real change rather than measurement error [31,32,56]. Similar to SEM%, the SDD score is expressed as a percentage of the mean of the measurements (SDD%), which is computed using Equation (8) [31,32,56]. Before performing the parametric reliability testing, a nonparametric Kolmogorov-Smirnov test was performed to verify the normality of the data.

Activity Classification Performance Evaluation
Experiment IV-Classification Model Building, Performance Evaluation, and Classifier Comparison
The data obtained from the armband sensor worn by the eight participants performing five activities on two different days (Day-1 and Day-2) were used to build machine learning (ML) based classifiers for the respective days. A typical machine learning methodology, which includes data preparation, model building, model training, hyperparameter tuning, and model evaluation, was used to develop the ML classifiers for activity classification. It was implemented using the PyCaret classification module in Google Colab.
Firstly, the dataset was prepared using the raw acceleration (a_x, a_y, a_z), gyroscope (g_x, g_y, g_z), and EMG (8-channel) features for both the days. The 8-channel EMG data were downsampled by converting 8 channels to 32 channels to match the frequency of the accelerometer and gyroscope data. Therefore, the final dataset for each day consists of 38 input features (3 acceleration, 3 gyroscope, and 32 EMG). Further, the data were manually labeled for five different activities (i.e., stationary on the body, screwing, wrenching, lifting, and carrying). Once the datasets were prepared, the labeled data were used to build the ML-based classifier models using the default classifier settings. The hyperparameters of each model were then tuned by optimizing the model accuracy to obtain a finely tuned model. The ten most common ML-based classifier models, namely random forest, J48 decision trees, support vector machine (SVM), naïve Bayes, k-nearest neighbors (KNN), logistic, multi-layer perceptron (MLP), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and gradient boosting (XGBoost), were built using each day's dataset. Additionally, 10-fold cross-validation was performed to evaluate the performance of the classifiers. In the cross-validation technique, the dataset is randomly shuffled and divided into ten groups. Each unique group is considered as a holdout or test dataset, and the remaining nine groups are used for model training.
Once the model has been fitted on the training dataset, the model is evaluated on the test set. The evaluation score is retained, and the model is discarded. This process is repeated for each unique group. The performance of the trained ML classifiers was evaluated using metrics such as accuracy, recall, precision, F1 score, kappa, and the confusion matrix. The performance of the different classifiers was compared to determine the best performing classifier for each dataset.

Experiment V-Investigating the Reliability of Results Obtained from Classification Models Using EMG and IMU Data While Performing Construction Activities on Different Days
The reliability of the results obtained from the classification models using the Day-1 and Day-2 datasets was investigated. The best classifier obtained in Experiment IV was used to run ten iterations on each dataset. The accuracies of the classifier on the Day-1 dataset were compared to the accuracies of the same classifier on the Day-2 dataset using a paired t-test at a 0.05 significance level.
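As an illustration of the paired comparison described above, the following Python sketch computes the paired t statistic for two lists of per-iteration accuracies and compares it with the two-sided critical value for ten pairs (9 degrees of freedom) at the 0.05 level. The accuracy values are hypothetical placeholders, not results from the study, and the function name is an assumption; a statistics package would normally report the exact p-value instead.

```python
import math
from statistics import mean, stdev

# Two-sided critical t value for alpha = 0.05 and 9 degrees of freedom.
T_CRIT_DF9 = 2.262

def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i]."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample SD of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))

# Hypothetical accuracies from ten iterations per day (placeholders only).
day1 = [0.981, 0.979, 0.983, 0.980, 0.982, 0.978, 0.984, 0.981, 0.980, 0.982]
day2 = [0.980, 0.981, 0.982, 0.979, 0.983, 0.979, 0.982, 0.980, 0.981, 0.981]

t_value = paired_t_statistic(day1, day2)
# True would indicate a statistically significant day-to-day change.
significant = abs(t_value) > T_CRIT_DF9
```

A non-significant result in such a test is what would support the claim that classification accuracy is stable between days.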

Experiment VI-Investigating the Effect of Lifting Weight on Forearm EMG and IMU Data and Activity Classification
Detecting different weights is useful for many construction applications. This experiment investigates whether the weight affects forearm EMG and IMU data and activity classification. For this experiment, the activity of lifting three different weights (10 lbs, 25 lbs, and 50 lbs), with three trials from four participants, was considered. The raw data with 38 features (3 acceleration, 3 gyroscope, and 32 EMG) were manually labeled for three activities (Lift10, Lift25, and Lift50). The ML-based classification models, namely random forest, J48 decision trees, support vector machine (SVM), naïve Bayes, k-nearest neighbors (KNN), logistic, multi-layer perceptron (MLP), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and gradient boosting (Xgboost), were built using the raw data and evaluated using the 10-fold cross-validation technique. The results of the best classifier were analyzed for the three classes to check whether the sensor data could classify different weights.
This experiment focuses on comparing the performance of various ML-based classifier models built using different sensor feature combinations: EMG + IMU, IMU alone, and EMG alone. For this analysis, two datasets were considered, namely, the controlled and uncontrolled activity datasets. The controlled activity dataset was prepared by combining the Day-1 and Day-2 data of the five controlled activities, namely stationary on the body, screwing, wrenching, lifting, and carrying. The uncontrolled dataset was prepared by collecting forearm armband data from the participants while they performed nine construction activities at varied intensities and pace, such as walking at random speed (walk), carrying (10 lbs, 25 lbs, and 50 lbs), lifting (10 lbs, 25 lbs, and 50 lbs), and screwing (at elbow height, kneeling, and overhead). Both datasets consist of 38 features (3 acceleration, 3 gyroscope, and 32 EMG). Once the datasets were prepared, the ML-based classifiers were built with the different sensor feature combinations.
As explained in Experiment IV, the ten most used ML-based classifier models were built for the three sensor data combinations for both datasets. The finely tuned ML-based classifiers were evaluated using 10-fold cross-validation, and the accuracy of the classifiers was compared across all the sensor feature combinations for both datasets.
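Building one model per sensor combination amounts to selecting column subsets of the 38-feature rows. The sketch below shows one way to do this; the column order (acceleration first, then gyroscope, then EMG) and all names are assumptions made for illustration, since the dataset layout is not specified here.

```python
# Assumed column layout of one 38-feature row (hypothetical ordering):
# columns 0-2 acceleration, 3-5 gyroscope, 6-37 EMG.
FEATURE_GROUPS = {
    "acc": range(0, 3),
    "gyro": range(3, 6),
    "emg": range(6, 38),
}

# The three sensor feature combinations compared in the experiment.
SENSOR_COMBINATIONS = {
    "EMG + IMU": ("acc", "gyro", "emg"),
    "IMU": ("acc", "gyro"),
    "EMG": ("emg",),
}

def select_features(row, combination):
    """Keep only the columns that belong to the given sensor combination."""
    columns = [i for group in SENSOR_COMBINATIONS[combination]
               for i in FEATURE_GROUPS[group]]
    return [row[i] for i in columns]

# Toy usage: a row whose values equal their own column index.
row = list(range(38))
imu_only = select_features(row, "IMU")
emg_only = select_features(row, "EMG")
```

Each reduced dataset can then be passed to the same training and cross-validation routine, so that only the input features differ between the compared models.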
Classification involves identifying a set of classes using the input features. The performance of a classification algorithm is evaluated using metrics such as accuracy, recall, precision, F1 score, and kappa. In order to define these metrics, one needs to understand the terms true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The classification accuracy is the ratio of correct predictions (TP + TN) to the total number of predictions (TP + TN + FP + FN). Precision measures the proportion of positive predictions that are correct, which is the ratio of true positives (TP) to total positive predictions (TP + FP). In contrast, recall measures the proportion of actual positive instances that are correctly predicted, which is the ratio of true positives (TP) to the sum of true positives and false negatives (TP + FN). The F1 score is the harmonic mean of precision and recall, as shown in Equation (9) [60]. Cohen's kappa value measures the agreement between the predicted and actual labels, corrected for agreement expected by chance. Apart from these metrics, the performance of the classifier on individual classes was assessed by using a confusion matrix.
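As a minimal sketch of how these metrics follow from a confusion matrix, the function below computes overall accuracy, Cohen's kappa, and per-class precision, recall, and F1 using the standard textbook definitions. The function name and the example matrix are hypothetical and do not reproduce any table from the study.

```python
def classification_metrics(confusion):
    """confusion[i][j] = number of samples of actual class i predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[c][c] for c in range(n_classes)) / total

    per_class = []
    for c in range(n_classes):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp  # predicted c, actually other
        fn = sum(confusion[c]) - tp                               # actually c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append({"precision": precision, "recall": recall, "f1": f1})

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the row and column totals.
    expected = sum(
        sum(confusion[c]) * sum(confusion[r][c] for r in range(n_classes))
        for c in range(n_classes)
    ) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return accuracy, kappa, per_class

# Toy two-class example (made-up counts).
accuracy, kappa, per_class = classification_metrics([[8, 2], [1, 9]])
```

For multi-class problems such as the five-activity task, the per-class values are typically averaged to report a single precision, recall, or F1 figure; which averaging the study used is not stated in this section.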

Forearm EMG and IMU Data Quality for "At-Rest" and "In-Motion" Activities
First, the EMG and IMU data quality of the armband sensor was compared with conventional EMG (FREEEMG) and IMU (Yost) sensors using the standard deviation (SD) and signal-to-noise ratio (SNR). Table 1 shows the standard deviation (noise level) and SNR (signal quality) of the accelerometer, gyroscope, and EMG data for both the conventional and armband sensors for at-rest and in-motion activities. The at-rest activities include stationary on the body for the EMG and stationary on the floor for the IMU. For at-rest and in-motion activities, the noise levels in the acceleration and gyroscope data of the armband sensor are comparable to those of the conventional sensor. The SNR values are higher for the armband acceleration data than for the conventional sensor for both at-rest and in-motion activities, whereas the SNR values of the gyroscope and EMG armband data are comparable to those of the conventional sensors (Table 1). However, the signal quality measured as SNR is better in the armband data compared to the conventional sensors for both EMG and IMU (Table 1). Second, the noise level and data quality were compared between the indoor and outdoor environments. The results show that the noise level differs only slightly between the two environments for the gyroscope (SD_Indoor = 0.121 and SD_Outdoor = 0.138) and EMG (SD_Indoor = 3.006 and SD_Outdoor = 2.974) data (Table 2), and the signal quality is comparable for both environments (Table 2). Third, two different armband sensors under the same conditions have similar noise levels and data quality for acceleration (SD_1 = 0.002, SNR_1 = 514.120; SD_2 = 0.002, SNR_2 = 515.192), gyroscope (SD_1 = 0.121, SNR_1 = 1.325; SD_2 = 0.138, SNR_2 = 1.469), and EMG (SD_1 = 3.006, SNR_1 = 0.865; SD_2 = 2.947, SNR_2 = 0.881) data (Table 3). The acceleration, gyroscope, and EMG data of stationary on the body were assessed for potential confounding factors, as shown in Table 4.
The results show that the noise level in the acceleration data is nearly the same for all the factors, although it is slightly affected by the presence of a communication device (Table 4). The noise levels in the gyroscope and EMG data increased slightly in the presence of other sensors and power tools, respectively. However, the data quality of the gyroscope and EMG data is similar in the presence and absence of confounding factors (Table 4). Finally, the rotational drift was determined by observing the evolution of the yaw angle for the data collected with the armband stationary on the body and with the Myo lying on the floor. Figure 4 shows the evolution of the yaw angle for 80 s of a stationary experiment. The results indicated that there was a drift of 0.13 deg/s initially, after which the yaw angle reached a steady orientation when the armband was stationary on the body (Figure 4a). In the case of the armband lying on the floor, the yaw angle drifted at a rate of 0.17 deg/s before it reached a steady orientation, as shown in Figure 4b. It can be observed that the rotational drift was reduced considerably when the armband was worn on the body compared to when it was lying on the floor. Furthermore, the drift was higher in the initial frames and reached a steady orientation in a few seconds. Therefore, a settling time of two minutes was considered to prevent rotational drift. Further, a qualitative comparison was performed by inspecting the in-motion activity data from the armband and conventional sensors. The acceleration and EMG data of the lifting activity from the armband and conventional sensors, worn at the same time, were plotted in Figures 5 and 6, respectively. In Figure 5, the acceleration magnitude was compared for both sensors, and it is evident that the armband acceleration data pattern is similar to that of the conventional IMU sensor. In Figure 6, the root mean square (RMS) of EMG channel-4 was compared with the conventional EMG RMS, which shows that they follow a similar trend.
Moreover, the Myo armband can capture more detailed information compared to a single FREEEMG sensor.
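The noise level and signal quality figures above can be reproduced in spirit with a short sketch. The study's exact SNR definition is not given in this section; the code below assumes SNR is the ratio of the mean to the standard deviation of the signal, which is one common convention and should be treated as an assumption. The function names and the toy at-rest samples are hypothetical.

```python
import math
import statistics

def magnitude(sample):
    """Euclidean magnitude of one multi-axis sample, e.g. (a_x, a_y, a_z)."""
    return math.sqrt(sum(v * v for v in sample))

def noise_and_snr(signal):
    """Return (SD, SNR) of a 1-D signal, with SNR assumed to be |mean| / SD."""
    sd = statistics.stdev(signal)        # sample standard deviation (noise level)
    mean = statistics.fmean(signal)
    snr = abs(mean) / sd if sd > 0 else float("inf")
    return sd, snr

# Toy at-rest accelerometer samples in g (made-up values, not study data).
at_rest = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.002), (0.0, 0.0, 0.998), (0.0, 0.0, 1.0)]
sd, snr = noise_and_snr([magnitude(s) for s in at_rest])
```

Under this assumed definition, a stationary accelerometer reading close to 1 g with a very small SD yields a large SNR, while zero-centred signals such as gyroscope or raw EMG data at rest yield SNR values near 1.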

Effect of Sensor Position on Forearm EMG and IMU Data Quality
The effects of three sensor positions ("rotated," "standard," and "slid down") were compared for the lifting activity. Figure 7a,b shows the acceleration and gyroscope magnitudes for the three positions. The range and median of the magnitude are the same for all three positions, which shows that the IMU data of the forearm are almost the same irrespective of the armband position, whereas the RMS plots of EMG vary for the different sensor positions, as shown in Figure 8.

Reliability of Forearm EMG and IMU Data of Construction Activities
The forearm acceleration, gyroscope, and EMG data from eight participants performing construction activities (screwing, wrenching, lifting, carrying, and at-rest) were assessed for trial-to-trial and day-to-day reliability using the ICC test. Tables 5-7 summarize the test-retest reliability evaluation of the accelerometer, gyroscope, and EMG measurements. For each activity, the mean and standard deviation of the measurements are averaged over three trials (test mean (SD)) for each day. The average ICC value of the three trials at a 95% confidence interval (CI), SEM%, and SDD% for all five activities for both days are shown in Tables 5-7. For the acceleration measurements of both days, the average ICC values range from 0.844 to 0.995 for all five activities (Table 5). Similarly, for gyroscope and EMG, the values range from 0.839 to 0.987 and 0.864 to 0.988, respectively (Tables 6 and 7). The results from Tables 5-7 indicate excellent relative reliability between trials of the acceleration, gyroscope, and EMG measurements for all five activities for both days. Moreover, SEM% for all the activities for the acceleration, gyroscope, and EMG measurements is below 10%, which indicates excellent absolute reliability between trials for both days. The SDD% for all activities for both days ranges from 0.098% to 0.669% for acceleration, 5.953% to 32.225% for gyroscope, and 6.709% to 29.130% for EMG.
The day-to-day reliability assessment for the accelerometer measurements shows that the ICC value is greater than 0.75 and SEM% is below 10% for all the activities, which indicates excellent relative and absolute reliability for all activities (Table 8). For the gyroscope, the ICC values are greater than 0.75 except for the lifting activity (ICC = 0.724), which indicates excellent relative reliability of the gyroscope data except for lifting. For the absolute reliability, the SEM% values are below 10% except for the stationary on the body (SEM% = 11.36%) and screwing (SEM% = 16.322%) activities. For the EMG measurements, the ICC values are greater than 0.75 for all the activities, indicating excellent relative reliability, whereas the SEM% is slightly above 10% except for the lifting activity (SEM% = 7.75%). The SDD% values range from 14.48% to 31.48% and from 24.49% to 39.89% for the gyroscope and EMG measurements, respectively. The higher SDD% values of the gyroscope and EMG data suggest that caution should be taken when using these measurements for activity recognition, because a change in the measurements might be due to error. Therefore, later experiments investigate whether the data quality and reliability of the armband data are sufficient to yield accurate and reliable activity classification results.
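The absolute reliability indices can be sketched as follows. The study's Equations (6) and (8) are not reproduced in this section, so the code assumes the widely used forms SEM = SD × sqrt(1 − ICC) and SDD = 1.96 × sqrt(2) × SEM, each expressed as a percentage of the mean; the function names and the input values in the usage lines are hypothetical.

```python
import math

def sem_percent(mean, sd, icc):
    """Standard error of measurement as a percentage of the mean.

    Assumes SEM = SD * sqrt(1 - ICC), a common formulation.
    """
    sem = sd * math.sqrt(1.0 - icc)
    return 100.0 * sem / mean

def sdd_percent(sem_pct):
    """Smallest detectable difference (%) at the 95% confidence level.

    Assumes SDD = 1.96 * sqrt(2) * SEM, so the same factor applies to SEM%.
    """
    return 1.96 * math.sqrt(2.0) * sem_pct

# Toy usage with made-up measurement statistics.
example_sem = sem_percent(mean=100.0, sd=10.0, icc=0.91)
example_sdd = sdd_percent(example_sem)
```

Under these assumed formulas, the between-trial EMG SEM% range of 2.420% to 10.509% maps to roughly 6.71% to 29.13%, which matches the reported EMG SDD% range of 6.709% to 29.130%.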

Validating the Classifier Performance on Day-1 and Day-2 Dataset
Tables 9 and 10 present the classification performance results of the classifiers built using the Day-1 and Day-2 datasets. The performance of both classifiers was evaluated using overall accuracy, recall, precision, F1 score, and kappa, as shown in Tables 9 and 10. The best classification performance was obtained with random forest for both the Day-1 (accuracy = 96.48%) and Day-2 (accuracy = 96.33%) datasets. Further, the random forest classifier was used to assess performance between the classes using the confusion matrix and class report, as shown in Tables 11 and 12. The recall values above 90% for both classifiers show that a specific activity can be predicted with few false negatives. The F1 score demonstrated high overall performance for stationary on the body, carrying, lifting, and screwing, with the lowest values for wrenching (93.2% and 94.9%) for both classifiers (Tables 11 and 12). Finally, the association between the actual activities and the predicted classes was measured with Cohen's kappa coefficient, and the values indicate strong agreement between predicted and actual labels for both the Day-1 (95.6% ± 0.003) and Day-2 (96.73% ± 0.0019) classifiers (Tables 11 and 12).

Reliability of Classification Results
The classification results obtained using the Day-1 and Day-2 classifiers were further analyzed for reliability using a paired t-test on overall accuracy. The paired t-test (p = 0.63) at a significance level of 0.05 shows that there is no significant difference between the accuracies of the Day-1 and Day-2 classifiers. The difference between the overall accuracy of the Day-1 and Day-2 random forest classifiers is 0.15%.
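A paired t-test on two sets of per-iteration accuracies can be sketched with the standard library alone. The function name and the accuracy values below are hypothetical and are not the study's results. Because the standard library has no t-distribution, the sketch compares the statistic against the two-tailed critical value for 9 degrees of freedom at the 0.05 level (2.262) instead of computing a p-value.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("paired samples must have equal length of at least 2")
    diffs = [x - y for x, y in zip(a, b)]
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    t = statistics.fmean(diffs) / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Made-up accuracies (%) from ten iterations on each day's dataset.
day1 = [96.5, 96.3, 96.6, 96.4, 96.5, 96.2, 96.7, 96.4, 96.5, 96.6]
day2 = [96.4, 96.5, 96.2, 96.6, 96.3, 96.4, 96.5, 96.3, 96.6, 96.2]
t_value, dof = paired_t_statistic(day1, day2)

T_CRITICAL_DF9 = 2.262  # two-tailed, alpha = 0.05, 9 degrees of freedom
significant = abs(t_value) > T_CRITICAL_DF9
```

When a statistics package is available, a dedicated paired t-test routine that also returns the p-value is the more convenient choice.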

Effect of Lifting Weight on Classification Results
The performance of the ten most common classification algorithms was analyzed on the dataset of lifting different weights. Table 13 shows the accuracy, recall, precision, F1 score, and kappa values of all the classifiers. The random forest classifier showed the best performance in classifying the three different weights, with an overall accuracy of 83.89%, a recall value of 84.06%, and a kappa value of 75.82%. Further, the confusion matrix and class report show high overall performance for the Lift10 activity (F1 score = 91%), followed by Lift25 (F1 score = 80%) and Lift50 (F1 score = 77%) (Table 14). The results confirm that the forearm EMG and IMU data can not only classify the lifting activity but can also detect the weight. In addition, the correlation of the raw features shows that the gyroscope and EMG features are highly correlated compared to the accelerometer features (Figure 9). Therefore, it can be concluded that the gyroscope and EMG features provide an opportunity to classify different weights of the lifting activity.

Comparison of Activity Classification Performance for Different Sensor Combinations
The comparison of overall accuracies for different sensor combinations is shown in Table 15. For the controlled dataset, EMG + IMU and IMU alone perform better than EMG alone. The classification accuracy is higher for EMG + IMU in the case of random forest, SVM, naïve Bayes, and MLP, whereas it is higher for IMU alone in the case of KNN, logistic, LDA, QDA, and Xgboost. However, except for KNN and MLP, the accuracy is not significantly different between EMG + IMU and IMU alone. For the uncontrolled activity dataset, the accuracy is significantly higher for EMG + IMU compared to IMU and EMG alone, except in the case of KNN, for which IMU alone has higher accuracy than EMG + IMU. Nevertheless, the highest classification accuracy (98.13%) for nine activities with various intensities was obtained for the EMG + IMU feature combination. The combination of EMG and IMU features yields higher accuracy compared to individual sensor data for complex activities.

Discussion
In this study, the data quality of low-cost forearm-based wearable sensors was explored by comparing the standard deviation and signal-to-noise ratio of the armband sensor and the conventional sensor for at-rest and in-motion activities. The noise level in the armband acceleration data (SD = 0.002) when lying on the floor is comparable to that of the high-precision conventional IMU sensor (SD = 0.003), which is in agreement with the previous study (SD = 0.0019) [25]. Similarly, the noise levels in the acceleration and gyroscope data for in-motion activities are comparable to those of conventional sensors. In addition, the signal quality of the armband sensor data is higher compared to the conventional sensor, which shows that the armband sensor is less sensitive compared to high-precision and high-frequency sensors. Moreover, the data quality test shows that the armband data are not affected much by the confounding factors, environment, and inter-device variability. Drift is one of the most common issues of an IMU when used to estimate position and orientation [26]. The rotational drift of the armband sensor was assessed by observing the evolution of the yaw angle for at-rest activities. The yaw angle drifts at a rate of 0.17 deg/s before it reaches the steady orientation, which is in agreement with a previous study [61]. This experiment shows that the drift was reduced when the Myo was worn on the body compared to lying on the floor. Moreover, the rotational drift was highest in the initial frames and reached a steady state in a few seconds. Similar to other studies [25,62], the in-motion (i.e., lifting) activity data of the armband and the conventional sensor were compared visually, since a quantitative comparison of the two sensors' signal data would not be appropriate. For the comparison of EMG and accelerometer signals, RMS and absolute magnitude plots were considered, as shown in Figures 5 and 6.
The results show that the armband and the conventional sensor both pick up the same peaks and follow a similar trend for the lifting activity. The qualitative assessment of the effect of armband sensor position on EMG and IMU data quality shows that the accelerometer and gyroscope data are almost the same for the three (rotated, standard, and slid down) sensor positions. A previous study [63] reported similar results, where the classification accuracy using accelerometer data at different sensor positions made no significant difference. However, the EMG data for the three armband positions are significantly different, which is consistent with the fact that the IMU sensor captures the motion of the forearm, whereas the EMG signal depends on the muscle contact.
The study assessed the relative and absolute reliability of forearm EMG and IMU data of construction activities. The test-retest evaluation of the accelerometer data indicated excellent trial-to-trial (ICC = 0.844 to 0.995 and SEM% = 0.087% to 0.258%) and day-to-day (ICC = 0.824 to 0.881 and SEM% = 0.245% to 0.526%) relative and absolute reliability for all the activities, as shown in Table 5. For the gyroscope data, excellent relative reliability was observed for trial-to-trial (ICC = 0.824 to 0.987) and day-to-day (ICC = 0.801 to 0.844) measurements, except for lifting, where ICC = 0.724 (Tables 6 and 7). The day-to-day SEM% of the gyroscope data ranged from 5.224% to 16.322%, slightly exceeding 10% for some activities. The EMG data showed excellent relative (ICC = 0.864 to 0.988) and absolute (SEM% = 2.420% to 10.509%) reliability between trials, but the SEM% between the days (7.75% to 16.21%) is slightly greater than 10% (Table 8). Overall, the results show that the armband sensor data (acceleration, gyroscope, and EMG) exhibited excellent relative reliability between trials and days, which indicates a strong correlation of the repeated measurements. Furthermore, the armband sensor data exhibited excellent absolute reliability between the trials and moderate absolute reliability between days, which is indicated by a slight increase in SEM% and SDD%. As shown in Equations (6) and (8), SEM% and SDD% are directly related to the ratio of the SD to the mean of the measurements. The higher SEM% and SDD% between days are due to the larger SD-to-mean ratio. Further investigation was performed to determine whether the armband data obtained at this level of reliability are sufficient to yield accurate and reliable activity classification results.
The ML-based classification results using both days' datasets show that the forearm EMG, acceleration, and gyroscope features are capable of classifying activities involving different body parts, such as the wrist, forearm, and whole body, and various motions, such as repetitive motion, repeated impulsive motion, and free motion, with high accuracy (Day-1 accuracy = 96.48% ± 0.0024 and Day-2 accuracy = 96.33% ± 0.0022). Furthermore, the overall classification accuracy of 98.13% achieved for the uncontrolled dataset of nine activities shows that the model is capable of recognizing activities with different intensities, which is one of the limitations of current construction activity recognition models [10,16,20]. The accuracy of the proposed activity recognition models using EMG and IMU forearm data (Accuracy EMG + IMU = 98.13%) is higher than that of previously published construction activity recognition models for carpentry activities (91%) [14], fall identification (94%) [15], manual material handling activities (90.74%) [11], ironworker activities (94.83%, 92.98%) [9,17], and bricklaying activities (88.1%) [20].
Moreover, the reliability assessment of the classification results using the Day-1 and Day-2 classifiers showed excellent reliability of the classification results obtained using the forearm EMG and IMU features. Later, the forearm EMG and IMU data were used to classify different weights of the lifting activity, which is useful for various construction applications. The results showed that the overall classification accuracy for the three classes (Lift10, Lift25, and Lift50) is 83.89% (0.0051), which is higher than the accuracy obtained by Ho et al. [64] (77.1%) in classifying barbell weights from 20 to 70 lbs using forearm EMG features. Moreover, for the three lifting weights, the gyroscope and EMG features are highly correlated, which contributed to the higher classification accuracy. The comparison of classification performance for different sensor combinations on the controlled (Accuracy EMG + IMU = 96.21%, Accuracy IMU = 94.65%, Accuracy EMG = 44.97%) and uncontrolled (Accuracy EMG + IMU = 98.21%, Accuracy IMU = 84.80%, Accuracy EMG = 47.60%) datasets showed that the highest accuracy is obtained with EMG + IMU, which is in agreement with previous studies on forearm gym activities (Accuracy EMG + IMU = 71.6%, Accuracy IMU = 67.8%, Accuracy EMG = 20.7%) [38], forearm manufacturing activities (Accuracy EMG + IMU = 87.4%, Accuracy IMU = 85.0%, Accuracy EMG = 50.7%) [41], and gym exercises (Accuracy EMG + IMU = 84.2%, Accuracy IMU = 77.7%, Accuracy EMG = 85.2%) [47]. Further, the increase in classification accuracy due to the combined features shows that the gyroscope and EMG features, despite their higher SEM% and SDD%, are suitable for activity classification. The fusion of forearm muscle activity (EMG) and kinematic (IMU) data resulted in the highest classification accuracy for a greater number of complex activities with different intensities.
The advantage of using an armband sensor is that both forearm muscle activity and motion data are obtained from a single device, which avoids the use of multiple sensors that obstruct construction work.
The study has some limitations worth mentioning. The data quality of the sensor data was assessed only on at-rest activities. All the in-motion activities were performed in residential settings by participants with little to no construction experience. All the participants in this study were right-handed and male. In addition to acceleration, gyroscope, and EMG data, the armband sensor provides the orientation quaternion and Euler angles of the forearm; however, the orientation angles were not assessed for reliability in this study. Moreover, future work could include a validity assessment of the forearm EMG and IMU data of the armband sensor.

Conclusions
The current study assessed the data quality and reliability of forearm EMG and IMU data from a low-cost wearable sensor for activity classification. In order to achieve the objective, the study was divided into seven experiments. From the first experiment, it was inferred that the armband sensor data are comparable to conventional EMG and IMU data. Moreover, there was a very minimal effect of environment, confounding factors (communication device, power tools, other sensors, and smartwatches), and inter-device variability. Second, a qualitative comparison was performed to understand the effect of armband position on forearm EMG and IMU data, and it was concluded that the armband position does not affect the IMU data, but the EMG data are affected by the sensor position. Third, the trial-to-trial and day-to-day reliability of the acceleration, gyroscope, and EMG data were assessed for five construction activities. The results show that the forearm IMU and EMG data for all five activities have excellent relative and absolute reliability between the trials and between the days, except that the EMG data between the days have an SEM% slightly higher than 10%. Next, the EMG and IMU data for both days were used to build and evaluate ML-based activity classification models. The most common classification models were compared for their performance on the Day-1 and Day-2 datasets. The random forest classification algorithm showed the best performance on both datasets. The reliability test on the classification results of both classifiers confirmed that the classification results are highly reliable, with minimal change in accuracies between the two days. The effect of lifting weight on classification performance was assessed, which showed that the forearm EMG and IMU data could classify three different weights. Further, it was observed that a stronger correlation exists in the gyroscope and EMG features compared to the accelerometer data for the three classes.
Finally, the comparison of classification performance for different sensor combinations showed that the fusion of forearm muscle activity and motion data yields higher classification accuracy for construction activities with various intensities. The armband data are highly reliable, and the scientific evaluation of the armband sensor builds trust among researchers, policymakers, stakeholders, and customers in using the sensor for various applications. The data quality and reliability assessment of armband sensors shows that the quality of the muscle and motion-sensing data is sufficient for various construction applications related to construction skill training, safety training, and monitoring. Moreover, the classification results of the study show that the forearm-based EMG and IMU data can be used to generate reliable construction activity recognition models.
Author Contributions: S.S.B. designed the study, conducted measurements in healthy participants, analyzed the data, and drafted the manuscript. C.W. and F.A. supervised S.S.B. in conceptualization, experiment design, data analysis, and manuscript writing. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.