A Work-Related Musculoskeletal Disorders (WMSDs) Risk-Assessment System Using a Single-View Pose Estimation Model

Musculoskeletal disorders are an unavoidable occupational health problem; in particular, workers who perform repetitive tasks onsite in the manufacturing industry suffer from musculoskeletal problems. In this paper, we propose a system that evaluates the posture of manufacturing workers using single-view 3D human pose estimation, which can estimate posture in 3D from an RGB camera that easily captures a worker's posture in a complex workplace. The proposed system builds the Duckyang-Auto Worker Health Safety Environment (DyWHSE) dataset, a manufacturing-industry-specific dataset, to estimate the wrist posture evaluated by the Rapid Upper Limb Assessment (RULA). Additionally, we evaluate the quality of the built DyWHSE dataset using the Human3.6M dataset, and the applicability of the proposed system is verified by comparing its results with the assessments of experts. The proposed system provides quantitative guidance for working-posture risk assessment, supporting the continuous posture assessment of workers.


Introduction
Work-related musculoskeletal disorders (WMSDs) are an unavoidable occupational health problem for workers and significantly affect their quality of life. Damage caused by exposure to problematic work environments can negatively affect the employment potential of workers [1], and this is emerging as a significant social problem because it can impose high costs on businesses and society as a whole [2]. The Eurostat Labour Force Survey ad hoc module "Accidents at work and other work-related health problems" reported that 60% of the population had musculoskeletal disorders [3]. The World Health Organization estimated that approximately 1.71 billion people worldwide suffer from musculoskeletal disorders and predicted that the incidence of these disorders will continue to increase [1].
To solve this problem, the industry sector has endeavored to prevent musculoskeletal disorders by developing and adopting various assessment methods aimed at improving work conditions, such as workloads, postures, work time, and task-performing methods, through the analysis of workplace risk factors. Ergonomic assessment tools developed to analyze the risk factors of musculoskeletal disorders include the Ovako Working-posture Analysis System (OWAS) [4], rapid upper-limb assessment (RULA) [5], and rapid entire-body assessment (REBA) [6], which are typically applied in industries where whole-body postural loads are assessed [7]. The OWAS was developed in 1977 to assess improper working postures. Table 1 summarizes the classifiable posture parameters and risk levels of the OWAS, RULA, and REBA, which are typical assessment tools for the risk-factor analysis of workers' musculoskeletal disorders; the number in parentheses indicates the number of postures that can be classified for each body part. We can observe that the OWAS is more specialized for lower-limb analysis than RULA and REBA, whereas RULA and REBA classify the upper and lower arms to assess arm postures and also investigate wrist postures, and thus evaluate the upper limbs in greater detail.

Table 1. Classification criteria for major working-posture-assessment tools. *

Tool | Classifiable postures | Additional factors | Risk levels
OWAS | … (3), and legs (7) | Force/load | 4
RULA | Trunk (6), upper arm (6), lower arm (3), legs (2), neck (6), and wrist (4, + 2) | Force/load, muscle use | 4
REBA | Trunk (5), upper arm (6), lower arm (2), legs (4), neck (3), and wrist (3) | Force/load, coupling, activity | 5

* The number in parentheses refers to the number of postures classified in each assessment tool.

Requirements of Ergonomics Assessment Tools
Although RULA and REBA are specialized for upper-limb analysis and include wrist postures, as illustrated in Figure 1, previous studies [27][28][29] based on RGB images excluded wrist postures despite evaluating workers' postures with RULA. To address workers' postures, including their hand postures, and to examine whether environmental effects occurred in representative images of the targeted assembly-process sites in the manufacturing industry, we used a representative human pose-estimation model. OpenPose can perceive the poses of the face, hands, and other body parts of multiple persons via a bottom-up approach [42][43][44][45][46]. The OpenPose 1.7.0 Whole Body Python API was used for the experiment, with the Body_25 and hand models.
We also used the postures of cockpit module-assembly workers in automobile manufacturing as inputs to OpenPose to estimate their postures. The results indicate that the body pose of the workers can be estimated properly, as shown in Figure 2. However, problems arose when inferring the workers' hand poses. As shown in Figure 3, OpenPose clearly estimates the finger joints from a still image, but the estimation can fail in continuous motion when an assembly tool is held in the hand. We identified that this happens when workers wear gloves, which causes a problem with hand-posture estimation. To verify this, we conducted an experiment in which images of bare hands and of gloved hands were captured, as shown in Figure 4. The same estimation problem occurred: the hand-posture estimation was accurate even when various postures were taken with bare hands, but it failed for fully clenched fists in gloves. In particular, the hand-posture estimation was unstable when the knuckles were bent.
We also examined the images of workers captured at the automobile cockpit module manufacturing site and found that all workers were wearing gloves and, in some cases, aprons or sleeve protectors, depending on the process. In a manufacturing environment where wearing personal protective gear is unavoidable for the safety of workers, it is essential to estimate the hand posture of workers wearing gloves when assessing their poses using RULA or REBA.

Building Extra Dataset
The open datasets used in pose-estimation models are produced by capturing images of professional actors acting in a studio or capturing images of postures taken in daily living. Thus, most key-point datasets do not contain the environmental information of our target manufacturing site and do not specialize in data for workers wearing various types of personal protective gear. Hence, in this study, we constructed a dataset with images of the automobile cockpit module manufacturing site to assess the poses of workers using all the pose assessment items required in the ergonomic musculoskeletal risk factor assessment tools.
The constructed dataset was based on the key-point structure of COCO [30]-a dataset typically used for training two-dimensional (2D) human pose-estimation models. We used the same index structure employed by the COCO dataset and added the required fingertip and tiptoe information. The dataset structure is shown in Figure 5.
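As a sketch of this index layout, the extension can keep COCO's original 17 keypoint indices unchanged and append the new points. The 17 COCO names below are the standard ones; the four appended hand/foot names and their index positions are illustrative assumptions, since the paper defines the exact structure only in Figure 5.

```python
# COCO's standard 17 keypoint names, in their official index order.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Hypothetical extension: indices 0-16 stay COCO-compatible, and the
# fingertip/tiptoe points described in the text are appended after them.
DYWHSE_KEYPOINTS = COCO_KEYPOINTS + [
    "left_hand", "right_hand",   # fingertip / hand points
    "left_foot", "right_foot",   # tiptoe / foot points
]

def keypoint_index(name):
    """Map a joint name to its index in the extended layout."""
    return DYWHSE_KEYPOINTS.index(name)
```

Keeping the COCO indices unchanged means existing COCO-trained tooling and annotations remain usable, with the extra points handled only by code that knows about them.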
Int. J. Environ. Res. Public Health 2022, 19, x FOR PEER REVIEW

First, we captured videos of workers performing unit tasks in the automobile cockpit module production plant. A total of 154 tasks were captured on video over more than nine hours, and all videos were recorded at 30 frames per second. Sixty of those tasks have a resolution of 720 × 480 and are compressed with MPEG-2, while the remaining 94 tasks are 1440 × 1080 and H.264-compressed. Then, to build the dataset from the videos, we used a web browser-based training-data builder developed in-house.
As shown in Figure 6, when the videos captured onsite are uploaded for each unit process, the system samples and saves images at one-second intervals. An extracted image can then be selected to annotate the bounding-box area and taggable joints of workers through the user interface, as illustrated in Figure 7. According to the annotator's subjective judgment, joints that were invisible but could be clearly identified were recorded with the invisible attribute, and their information was added so that they could be used in training. To ensure that the data were not biased by an annotator's personal judgment, an ID was assigned to each annotator, and the DB was designed to allow multiple annotators to tag joint information in the same image. Using this system, a number of annotators created 33,302 cases, and the DyWHSE dataset was finally built after validating the data using the DyWHSE dataset builder, as shown in Figure 8. The DyWHSE dataset builder overlays the joint information recorded by multiple annotators on the image so that the status of the data can be checked visually. As a tool for confirming the joint information created by the annotators according to the inspector's judgment, it provides coordinates highlighted in cyan for each joint as a recommended value based on the recorded joint information. We developed the dataset builder with a simple interface consisting of two buttons, "Drop" and "Confirm", so that the inspector could use this information to make a confirmation decision quickly.
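The one-second sampling step can be sketched as follows; this is a minimal illustration of the sampling rule, not the in-house builder's code.

```python
def sample_frame_indices(total_frames, fps=30, interval_s=1.0):
    """Indices of the frames saved when a video recorded at `fps`
    frames per second is sampled at fixed intervals (one second by
    default, matching the builder described in the text)."""
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))
```

For the 30 fps videos described above, this keeps every 30th frame, so a one-minute clip yields 60 candidate images for annotation.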
If the inspector determines that no modification of the recorded information is necessary, the recommended value is recorded as-is in the dataset with the "Confirm" button; this allows a large amount of data to be obtained in a limited time. If the coordinates of the recommended value need to be modified, this can be done by dragging the joint with the mouse before recording. Furthermore, a specific-annotator selection function is provided so that only the data of highly reliable annotators are used, at the discretion of the inspector.
The recommended value is a system function that refines the joint-location information recorded by multiple annotators and presents it on the image to help the inspector make a judgement. The joint coordinates are selected as the median of the values recorded by the annotators, and the bounding box of the minimum region is obtained from their recorded boxes. A mean value would meaningfully use all the joint information recorded by the annotators; however, as shown in Figure 9, it is affected by any annotator who mis-enters data, so the median value is provided to the inspector.
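A minimal sketch of how such a recommended value could be computed is shown below. The per-axis median matches the behavior described above; the intersection reading of the "bounding box of the minimum region" is our assumption.

```python
from statistics import median

def recommended_joint(points):
    """Recommended coordinate for one joint: the per-axis median of
    the (x, y) coordinates recorded by multiple annotators. Unlike
    the mean, the median is barely moved by one mis-entered point."""
    return (median(p[0] for p in points), median(p[1] for p in points))

def minimum_region_bbox(boxes):
    """One plausible reading of the 'bounding box of the minimum
    region': the tightest region common to all annotators' boxes,
    each given as (x1, y1, x2, y2)."""
    return (max(b[0] for b in boxes), max(b[1] for b in boxes),
            min(b[2] for b in boxes), min(b[3] for b in boxes))
```

For example, if two annotators place a wrist near (10, 10) and a third mis-enters (100, 100), the median recommendation stays near the cluster while a mean would be dragged a third of the way to the outlier.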
Another function of the dataset builder is to link the recommended value with the pose-estimation model, as presented in Figure 10, and provide the inspector with the information obtained from the model. This is a method of generating more data, but in the present study, we built the dataset using only the data recorded and inspected by humans, to increase the data accuracy.


Pose Model and Posture Assessment
Our objective is to develop an ergonomic WMSDs risk-assessment system that can be used continuously in manufacturing sites. To acquire images of assessment target work from the site quickly and easily, we used a single-view 3D human pose-estimation model for pose estimation and evaluated its performance by building an environment in which the DyWHSE dataset and existing datasets could be used together for training, as follows.

Modified Pose Model and Datasets
We used 3D multi-person pose estimation (3DMPPE), a ResNet-152 [47]-based model proposed by Moon et al. [23], as the base model to validate the constructed DyWHSE dataset. This model comprises three modules: DetectNet [23], which estimates the bounding box of a person in an image; RootNet [23], which estimates the root coordinates centered on the camera; and PoseNet [23], which estimates relative poses with respect to the root. 3DMPPE used DetectNet, which was based on Mask R-CNN and required 120 ms to process a single frame on a single Titan X GPU. In the proposed system, YOLOv3 [48] was selected to detect workers as fast as possible; among the various versions of YOLO, YOLOv3-608 was chosen, requiring 51 ms per frame on a single Titan X GPU. Furthermore, because we aim to assess the posture of a single worker, we changed 3DMPPE's multi-person pose estimation to single-person pose estimation (SPPE).
This model was trained by combining Human3.6M [32] and COCO, which are 3D and 2D datasets, respectively, to infer 17 joints in three dimensions. Human3.6M is a dataset built with 3D annotations that is widely used in human pose-estimation research. The actions of 11 professional actors were acquired indoors with a marker-based motion-capture system according to 15 scenarios, resulting in approximately 3.6 million images. We used two protocols of this dataset for the quantitative assessment of the model. In Protocol 1, subjects S1, S5, S6, S7, S8, and S9 were used for model training, and S11 was used for testing. In Protocol 2, S1, S5, S6, S7, and S8 were used for model training, and S9 and S11 were used for testing. In addition, we used the in-house-built 2D dataset DyWHSE as an extra dataset along with the K-pop dance video dataset [34], which is a 3D key-point dataset provided by AIHub [49].

Extra Datasets
The DyWHSE dataset specializes in the environmental factors of the manufacturing process and was built with the goal of improving the hand-posture estimation of workers wearing protective gear such as gloves. However, because it was built on a 2D coordinate system and lacks depth information for the newly added hand joints, the joint-estimation results of 3DMPPE trained with this dataset alone cannot be used to assess the risk factors of musculoskeletal disorders. Therefore, to supplement this, we additionally used the K-pop dance video dataset for training, a 3D dataset constructed by motion-capturing K-pop cover dances of professional dancers in studios. The DyWHSE dataset provides wrist, thumb, and hand keypoints for hand-posture estimation; for instance, the wrist angle can be calculated using a plane consisting of the wrist, thumb, and hand keypoints. However, the K-pop dance video dataset provides keypoints including the wrist, thumb, and ring finger, so a midpoint is generated and used to match the hand keypoint in our experiment. Figure 11 shows the defined and reproduced joints of the K-pop dance video dataset; the 3D coordinate information of each hand and foot is provided as thumb-ring finger and big toe-little toe. To match the DyWHSE and K-pop dance video datasets, we generated the hand and foot joints by calculating the thumb-ring finger midpoint and the big toe-little toe midpoint; finally, we matched the 25 joints used to train the model in the WMSDs risk-assessment system. The 3D datasets used to train DyWHSE-3DSPPE were Human3.6M, MuCo-3DHP [33], and the K-pop dance video dataset, and the 2D datasets were COCO, MPII [31], and DyWHSE.
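The midpoint generation described above reduces to a simple average of two 3D points; the function and parameter names below are illustrative.

```python
def midpoint(a, b):
    """3D midpoint of two keypoints given as (x, y, z) tuples."""
    return tuple((u + v) / 2.0 for u, v in zip(a, b))

def synthesize_hand_foot(thumb, ring_finger, big_toe, little_toe):
    """Hand joint from the thumb and ring-finger points, foot joint
    from the big-toe and little-toe points, so the K-pop dance video
    joints can be matched to DyWHSE's hand/foot keypoints."""
    return {"hand": midpoint(thumb, ring_finger),
            "foot": midpoint(big_toe, little_toe)}
```

Applied per frame, this produces the reproduced joints of Figure 11 and brings both datasets to the common 25-joint structure used for training.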

Posture Assessment
In order to assess the posture of workers easily in various computing environments, we designed the WMSDs risk-assessment system with the structure shown in Figure 12. The user uploads the video of the worker via a web browser to the WMSDs web service, and when the analysis is requested, the system provides the result of the analysis. The following describes the internal operation of the system.
Figure 12. WMSDs risk-assessment system, which supports various computing environments.

The image loader reads the downloaded image, and the image is entered into the pose-estimation model after image preprocessing. The pose-estimation module finds a person in the input image and estimates the joints within that area. From these results, the whole-body measurement module calculates the angles and distances for all the joints required by the assessment module and preprocesses them to minimize redundant calculations in each posture-assessment tool. The joint angle is calculated from three points in the 3D coordinate system by Equations (1)-(4), as shown in Figure 13, and the calculated distance information is used for the joint processing of the lower-extremity state and of occluded images.

v_1 = P_A − P_B (1)
v_2 = P_C − P_B (2)
cos θ = (v_1 · v_2) / (|v_1| |v_2|) (3)
θ = arccos((v_1 · v_2) / (|v_1| |v_2|)) (4)
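The three-point angle computation referenced by Equations (1)-(4) can be sketched as follows: two vectors are formed from the joint point, and the angle is the arccosine of their normalized dot product.

```python
import math

def joint_angle(p_a, p_b, p_c):
    """Angle (in degrees) at joint p_b formed by points p_a and p_c
    in 3D: build two vectors from the joint, then take the arccos of
    their normalized dot product."""
    v1 = [a - b for a, b in zip(p_a, p_b)]
    v2 = [c - b for c, b in zip(p_c, p_b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    return math.degrees(math.acos(dot / (n1 * n2)))
```

For example, with a shoulder directly above an elbow and a wrist directly in front of it, the elbow angle evaluates to 90 degrees.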
Depending on the on-site situation, occlusion can occur in part of the body when capturing the whole body on video is difficult. When posture is analyzed using such an image, incorrect joint coordinates are produced. This case is detected using the length and angle values of the leg joints, and the system then provides a result that analyzes only the upper body, under the assumption that the legs and feet are supported.
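A hypothetical version of this lower-body plausibility check might look as follows; the thresholds are illustrative assumptions, not values taken from the paper.

```python
import math

def legs_reliable(hip, knee, ankle, min_len=0.05, max_ratio=3.0):
    """Hypothetical stand-in for the paper's leg length/angle check:
    if an estimated leg segment is near-degenerate, or the thigh and
    shin lengths are wildly disproportionate, the lower body is
    treated as occluded and only the upper body is assessed."""
    thigh = math.dist(hip, knee)
    shin = math.dist(knee, ankle)
    if thigh < min_len or shin < min_len:
        return False
    return max(thigh, shin) / min(thigh, shin) <= max_ratio
```

When this check fails, the system falls back to an upper-body-only score, as described above.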
The calculated angle and distance information are used as inputs to the WMSDs risk-assessment tools, and only the information of the joints that each assessment tool references is reconstructed and matched according to the tool's rules. Finally, the assessment tool calculates the score for each part of the worker's posture and determines the action level.
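As an example of how a calculated angle maps to a tool score, the base RULA upper-arm ranges can be encoded as follows. This is a sketch using the published RULA ranges, not the system's actual implementation, and it omits the +/-1 posture adjustments.

```python
def rula_upper_arm_score(flexion_deg):
    """Base RULA upper-arm score from the shoulder flexion/extension
    angle in degrees (negative = extension), per the published RULA
    worksheet; adjustments for shoulder raise, abduction, and arm
    support are omitted for brevity."""
    if -20 <= flexion_deg <= 20:
        return 1          # 20 deg extension to 20 deg flexion
    if flexion_deg < -20 or flexion_deg <= 45:
        return 2          # >20 deg extension, or 20-45 deg flexion
    if flexion_deg <= 90:
        return 3          # 45-90 deg flexion
    return 4              # >90 deg flexion
```

Each body part scored this way is then combined via the tool's lookup tables into a grand score and an action level.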
We developed the DyWHSE assessment tool, which uses a web-based graphical user interface (GUI) to display the assessed worker's pose information and the estimated joint information based on the WMSDs risk-assessment system. In this program, the user can modify the items in the table interface to recalculate the score for errors in the pose-assessment model, and the program can output a pose-assessment report. Furthermore, it provides an interface for a quick review of high-risk postures via a sorting function based on the scores of the analyzed images, as shown in Figure 14.


Results
We used images of workers performing tasks in the manufacturing industry, recording the posture of one representative worker per image in the dataset. Through validation, we constructed a 2D DyWHSE dataset of 15,849 images and extended it for the DyWHSE-3DSPPE model to facilitate the inference of wrist poses in the 3DMPPE model by matching the joints in the hand area with the 3D K-pop dance video dataset.
To evaluate the performance of the model quantitatively and verify the quality of the dataset built in-house, we compared the performance of different models using the test protocols of Human3.6M, as shown in Table 2. As metrics for evaluating the similarity of postures, we used the mean per joint position error (MPJPE) [32] and the Procrustes-analysis (PA) [50] MPJPE, which are widely used in 3D pose estimation. Both MPJPE and PA-MPJPE are reported in millimeters.
The MPJPE is defined as

MPJPE = (1/J) Σ_{j=1}^{J} ||Est_j − GT_j||_2, (5)

where j represents the joint index, J represents the number of joints (J = 17), Est_j represents the estimated joint position, and GT_j represents the ground truth. Because the structures of the reference dataset and the DyWHSE dataset were not identical, only the 17 matching joints between the two datasets were compared.
As indicated by Equation (6), PA-MPJPE is calculated against the ground truth after aligning the estimated joints using PA before the MPJPE calculation:

PA-MPJPE = (1/J) Σ_{j=1}^{J} ||alignedEst_j − GT_j||_2, (6)

where alignedEst_j refers to the aligned estimated joint. PA-MPJPE reflects the posture difference more purely than MPJPE because the rigid misalignment is removed.
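The two metrics can be sketched in a few lines of NumPy. The Procrustes alignment below (optimal rotation, uniform scale, and translation via SVD) is the standard formulation used for PA-MPJPE; the function names are ours:

```python
import numpy as np

def mpjpe(est, gt):
    """Mean per joint position error: mean Euclidean distance between
    estimated and ground-truth joints, each of shape (J, 3)."""
    return float(np.linalg.norm(est - gt, axis=1).mean())

def pa_mpjpe(est, gt):
    """MPJPE after rigid Procrustes alignment (rotation, uniform scale,
    and translation) of the estimate onto the ground truth."""
    # Centre both point sets.
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    # Optimal rotation via SVD of the cross-covariance matrix.
    U, S, Vt = np.linalg.svd(e.T @ g)
    R = U @ Vt
    if np.linalg.det(R) < 0:       # guard against reflections
        U[:, -1] *= -1
        S[-1] *= -1
        R = U @ Vt
    # Optimal uniform scale for the aligned estimate.
    scale = S.sum() / (e ** 2).sum()
    aligned = scale * (e @ R) + mu_g
    return mpjpe(aligned, gt)
```

Because `pa_mpjpe` removes any global rotation, scale, and translation, a rigidly transformed but otherwise perfect estimate scores near zero on PA-MPJPE while still incurring a large MPJPE.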
We employed the widely used evaluation protocols of the Human3.6M dataset to evaluate the model trained with DyWHSE. Under Protocol 1, six subjects (S1, S5, S6, S7, S8, and S9) were used for model training, and S11 was used for testing, with PA-MPJPE as the assessment metric. Under Protocol 2, five subjects (S1, S5, S6, S7, and S8) were used for training, and two subjects (S9 and S11) were used for testing, with MPJPE as the assessment metric. In accordance with previous works [23][24][25], every 5th frame and every 64th frame of each video were used for training and testing, respectively. Furthermore, in accordance with [23][24][25], Human3.6M and MPII (a 2D dataset) were mixed 50:50, and the resulting dataset was used for training.
Using the foregoing evaluation method, we evaluated the DyWHSE-3DSPPE model and obtained the results shown in Tables 2 and 3. When the model was trained by adding the DyWHSE dataset to the training dataset of 3DMPPE, the PA-MPJPE was 33.8 mm, and the MPJPE was 53.3 mm. In contrast, when the model was trained with all the datasets (COCO, Human3.6M, MPII, MuCo-3DHP, K-pop dance video, and DyWHSE), the PA-MPJPE was 32.5 mm, and the MPJPE was 46.3 mm. We used the 3DMPPE and DyWHSE-3DSPPE models to obtain the pose estimation of the worker, as shown in Figure 15. The 3DMPPE model provided only the joint information for the back of the hand, whereas the DyWHSE-3DSPPE model provided information for the thumb and the back of the hand to judge wrist twists. The worker images used for the comparison were sampled at five frames per second (FPS) from work-performing images of a tile manufacturing company, which were not used when building the DyWHSE dataset. To evaluate the similarity between the proposed system and the expert assessment, the worker's posture was inferred using DyWHSE-3DSPPE, and the posture score was extracted using the RULA assessment tool.
The images used for the assessment were obtained by recording workers performing various working postures at three manufacturing plants and were extracted at one FPS; sample images are shown in Figure 16.
The images used in the assessment were obtained from three manufacturing plants, and 200 images were provided to three ergonomics experts, who assessed the postures of the workers. We used the common assessment results of 30 images as ground truths (GTs) and compared them with the system's assessment results. To compare the two, we performed a quantitative evaluation using Cohen's kappa coefficient [52] to check whether the system's pose-assessment result matched the GT. This coefficient represents the level of agreement between two evaluators and is defined as follows:

κ = (p_o − p_e) / (1 − p_e),

where p_o represents the degree of agreement between the observers, and p_e represents the probability of the results of the two evaluators matching by coincidence. We obtained results for the upper arm, lower arm, wrist, neck, trunk, and leg, as shown in Table 4. Cohen's kappa coefficient ranges from −1 to 1, where −1 corresponds to a complete discrepancy and 1 corresponds to complete agreement. The degree of agreement based on the kappa coefficient is presented in Table 5.
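The coefficient is straightforward to compute from the label sequences of two raters; a minimal sketch (function name ours) is:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement p_o corrected by the chance
    agreement p_e, kappa = (p_o - p_e) / (1 - p_e)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # p_o: fraction of items where both raters assigned the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: probability of agreeing by chance, from the marginal label counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For example, two raters who assign identical scores yield κ = 1, while raters whose agreement rate equals the chance rate yield κ = 0.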

Discussion
Workers in manufacturing factories often wear gloves. With the OpenPose Whole Body Python API, hand feature points cannot be detected when workers wear gloves, as shown in Figures 3 and 4. This is why the DyWHSE dataset, which includes factory workers wearing gloves, was built in this study.
According to our experimental results, DyWHSE-3DSPPE was verified to be effective for worker pose-estimation. When the DyWHSE dataset, which was built using images of workers producing automobile cockpit modules, was used with Human3.6M and MPII to train an RGB image-based 3D pose-estimation model, the results exhibited differences of 1.4 mm for PA-MPJPE and 1.1 mm for MPJPE, compared with the base model. This indicates that the DyWHSE dataset had equal precision to the datasets used for training. Furthermore, the COCO, MuCo-3DHP, and K-pop dance video datasets were combined and used to train the model, which resulted in a higher inference performance of 2.7 mm in the PA-MPJPE assessment and 8.1 mm in the MPJPE assessment. Although the DyWHSE dataset that we built was a 2D dataset with a size of approximately 15,000, it was used as an additional training dataset in the existing models, and as shown in Figure 15, the pose-estimation results of manufacturing process workers were reliable compared with those of the existing models.
However, for the wrist posture estimated by the DyWHSE-3DSPPE model, a quantitative performance evaluation could not be conducted, owing to the absence of an assessment dataset matching the added joints. The Cohen's kappa coefficients between the pose-assessment results of the experts and RULA did not indicate a high degree of agreement overall, but it is notable that the κ values ranged between 0.636 and 0.704 for the upper arm, lower arm, and trunk. The κ value of 0.56 for the leg indicates that it is difficult to judge the worker's leg-support state from the estimated joint information alone, owing to the floor structure of the site. For the wrist, which was our target in this study, the degree of agreement was moderate. This joint has a higher freedom of movement than other joints, and the shape of the hand is affected by the use of tools.
Because of this problem, the WMSDs' risk-assessment system cannot be fully automated. Nevertheless, it appears adequate as a support tool that can reduce the fatigue experts experience when directly analyzing the motions of numerous workers for WMSDs' risk assessment. To this end, we provide a sorting interface that allows experts to focus on specific sections by setting a threshold on the assessed worker posture based on the action category. Furthermore, an intuitive interface is provided in which the assessment table can be modified directly, so that the worker's posture can be re-assessed according to the judgment of an expert when the pose-assessment model performs inadequately. In our assessment system, parameters such as load information and repeated-pose information, which cannot be obtained from the image, can be entered manually and reflected in the score calculation. We identified the strengths and limitations of the proposed method through a quantitative evaluation, and to overcome the limitations, we will expand and refine the datasets and improve the model in follow-up studies.
It is important to note that our contribution is not to improve the accuracy of posture assessment with a new posture-assessment model or high-resolution sensors. Rather, we aimed to build a feasible system that collects video images of the worker and assesses work-related musculoskeletal disorders using a low-cost, easy-to-install method in the manufacturing environment. This is why the proposed system uses a single-view 3D pose-estimation model that can easily obtain the image, despite its lower accuracy than an IMU or a motion-capture system. We identified that existing solutions do not cover the wrist posture of workers and show high errors for workers with gloves, although most workers wear gloves in the workplace. Unlike existing solutions, we built a training dataset of the wrist postures of gloved workers to enable the accurate pose assessment of workers in an industrial environment, as shown in Figure 15. Therefore, we conclude that the contribution of this paper is to provide a feasible solution that can be applied in the manufacturing environment and to improve the pose assessment by adding the wrist posture of gloved workers.
It is also notable that the task at the workplace is recorded as 2D images, and these 2D images are provided to the expert for the pose assessment, as in the traditional method. The proposed system uses the same images but applies a 3D joint-estimation model to improve both the joint-angle estimation and the posture assessment. We adopted Moon's model [23] for the 3D joint estimation, and all the figures in this paper show the projection of the 3D posture estimation onto the 2D images. The proposed system can also render the 3D posture estimation as a 2D image from the viewpoint of the left or right side.
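As a minimal illustration of this viewpoint conversion, the estimated 3D joints can be projected orthographically onto a side-view image plane. The axis convention below (x right, y down, z depth, as in a typical camera frame) and the function name are our assumptions for the sketch, not the system's actual rendering code:

```python
import numpy as np

def project_side_view(joints_3d, side="left"):
    """Orthographically project 3D joints of shape (J, 3), given in a
    camera frame (x right, y down, z depth), onto a side-view image
    plane: the depth axis becomes the horizontal image axis, while the
    vertical axis is kept."""
    y, z = joints_3d[:, 1], joints_3d[:, 2]
    horiz = z if side == "left" else -z   # mirror for the opposite side
    return np.stack([horiz, y], axis=1)
```

The front view is simply the (x, y) columns of the same joint array, so the three viewpoints can be produced from a single 3D estimate without re-running the model.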

Conclusions
We propose a DyWHSE-3DSPPE-based worker posture-assessment system to support the wrist-posture assessment of the RULA assessment tool (an ergonomic work-pose assessment tool) with a single-view 3D human pose-estimation model. Through OpenPose, we identified the difficulty of estimating wrist poses caused by the gloves worn by workers in the manufacturing industry. To solve this problem, we developed a web browser-based data-annotation system and DyWHSE dataset builder to build a dataset from images of manufacturing sites. Using these tools, we built a DyWHSE dataset with 2D coordinates containing 15,849 images of workers performing tasks in the automobile cockpit-module production process. Then, to compare the quality of this dataset with that of 3DMPPE (the base model), we trained the model using DyWHSE-3DSPPE, which extended the data to include hand joints. We evaluated the performance of the DyWHSE-3DSPPE model with the Human3.6M dataset. The results indicate an improvement of 2.7 mm in the PA-MPJPE compared with 3DMPPE. Furthermore, the MPJPE was 46.3 mm, corresponding to an improvement of 8.1 mm compared with that of 3DMPPE (54.4 mm). These results show that the constructed dataset can improve the posture-estimation performance together with keypoint datasets such as Human3.6M, MuCo-3DHP, MPII, and COCO. Finally, we used Cohen's kappa coefficient to analyze the degree of agreement between our system and the RULA assessments by experts on 30 images of worker postures. Although the performance of the proposed system is inferior to that of the expert assessment, the system can be employed to assist expert analysis through the DyWHSE assessment tool's action-level-based pose-classification interface and intuitive score-modification function.

Informed Consent Statement: Not applicable.
Data Availability Statement: The DyWHSE dataset built during the current study is not publicly available.

Conflicts of Interest: The authors declare no conflict of interest.