Industrial Ergonomics Risk Analysis Based on 3D-Human Pose Estimation

Abstract: Ergonomics is important for smooth and sustainable industrial operation. In the manufacturing industry, poor workstation design causes workers to adopt uncomfortable postures and actions frequently and repeatedly (reaching above their shoulders, bending at awkward angles, bending backwards, flexing their elbows or wrists, etc.). Incorrect working postures often lead to specialized injuries, which reduce productivity and increase development costs. Examining workers' ergonomic postures is therefore the basis for recognizing, correcting, and preventing bad postures in the workplace. This paper proposes a new framework for risk analysis of workers' ergonomic postures through 3D human pose estimation from video or image sequences of their actions. A top-down network estimates the human body joints during bending, and the resulting angles are compared with ground truth body-bending data collected manually by expert observation. We introduce the body angle reliability decision (BARD) method to select the most reliable body-bending angles, so that safe working angles that conform to ergonomic requirements in the manufacturing industry can be ensured for workers. We obtained high accuracy on the ergonomics scores used in this experiment. For good postures with high reliability, accuracy is 94% for OWAS, 93% for REBA, and 93% for RULA. For occluded postures, accuracy is 83% for OWAS, 82% for REBA, and 82% for RULA, compared with the experts' scores for the same occluded postures. For future studies, our research can serve as a reference for ergonomics score analysis with 3D pose estimation of workers' postures.
of the human body and the z coordinate of the camera angle, which are considered in the reliability decision. Reliability is affected by object occlusions and by undetected joints in the angular measurements.
We also observed that the accuracy of the ergonomics scores is high when the angle between the camera and the worker's position is between 45 and 90 degrees. Hence, to estimate workers' poses with high accuracy, the corresponding reliability line can be used to define the angle and its relationship with accuracy. The simulation plot in Figure 10 shows the accuracy gain in our system after the introduction of BARD. As the plot shows, accuracy was low before BARD; after we removed all unreliable 2D key points and added the occlusion calculation for 2D key points, the system produced more accurate and stable results. We set the reliability threshold to 0.5 and temporarily set the constant K, which depends on the camera viewing angle, to its optimum value of 1, so that input data with lower reliability would not be used in the accuracy calculation. We also made sure that heavily occluded key points are recovered. Without occlusion awareness and the reliability check, all detected key points were treated equally, which could reduce accuracy.


Introduction
State-of-the-art machine learning methods have achieved exceptional precision on many computer vision tasks using models learned exclusively from images. Several factors associated with the work environment can affect a worker's mental health, such as a mismatch between the type of job and the person's skills and competencies. These aspects can also influence, for example, the level of organization of the environment and the benefits that a company can offer to get the job done. Musculoskeletal disorders (MSDs) are perhaps the most common medical condition and the primary reason for absence from work. MSDs are caused by musculoskeletal load built up from repeated improper postures, so workers' postures and movements provide key information for determining the likelihood of musculoskeletal injury. A recent statistical study conducted by the Bureau of Labor Statistics (BLS) showed that MSDs account for 31% of all work-related injuries and illnesses [1]. Adopting ergonomically invalid or uncomfortable work postures while performing manual activities can lead to long-term MSDs. To address this issue, ergonomics specialists are employed to analyze workers' postures and the associated hazards.
In summary, our contributions are as follows:
• We propose a framework for automatic pose analysis of industrial workers to prevent long-term MSDs.

• We propose a novel reliability decision approach to make sure the input video sequence is appropriate for industrial workers' pose analysis.

• We present a linear model for the reliability analysis, producing an accuracy estimate for a corresponding worker's pose input.

• We aim for our research to benefit ergonomics-related work by protecting the human workforce in the long term and reducing unnecessary injuries caused by bad working postures.
The rest of this article is organized as follows. Section 2 covers related research. Section 3 describes the proposed approach in detail, including dataset preparation, pose estimation, and the body angle reliability decision. Section 4 presents the experiments and results, including dataset preparation and the ergonomics score analysis. Section 5 discusses the simulation results. Section 6 gives the conclusions and the future work that could follow from our research.

Related Work
This section discusses recent approaches to ergonomics score calculation (OWAS, RULA, and REBA) and to 3D pose estimation. Ergonomic risk was analyzed by manual expert inspection until a few years ago, when machine learning and computer vision revolutionized human action detection and pose estimation. For our research, we selected three methods: OWAS, RULA, and REBA. The OWAS method estimates the static workload of a worker in the workplace by analyzing the worker's postures during operation. It identifies four classes that indicate the degree of static load risk [5]. Rapid Upper Limb Assessment (RULA), by McAtamney [3] in 2005, was used in ergonomic examinations of working environments in which only the upper body parts were included in the manual posture examination. Recent studies on RULA rely on computer vision and machine learning [8]. In particular, the Kinect camera and its software development kit (SDK) have been used to analyze posture and compute the RULA score [9][10][11]. REBA (Rapid Entire Body Assessment), as the name suggests, is an easy-to-use ergonomic assessment tool for evaluating a task or movement for risks of musculoskeletal problems [2]. Similarly, [9,10] show how the RULA score was calculated for different adopted postures. This approach provides the necessary posture information with a convolutional neural network (CNN) and little post-processing.
In recent years, deep learning methods [12][13][14] for 2D human pose estimation have advanced significantly. There are two main approaches. The first is the top-down approach, in which bounding boxes are formed to detect each human first. The second is the bottom-up approach, which locates all human body key points in an input image and then groups them by person with clustering techniques. Top-down methods take advantage of advances in human detection and of the additional information provided by the person bounding box. The top-down paradigm achieves satisfactory performance, but at the additional cost of person box detection. Notable top-down work includes HRNet [15,16], PoseNet [17], RMPE [18], and Mask R-CNN [19]. In addition, key point localization from heat maps [20][21][22], data augmentation [23], multitask learning [24], occlusion handling [25][26][27], and pose estimation [28,29] have been studied within the top-down approach. Deep learning has also recently demonstrated its capabilities in 3D human pose estimation, where recent advances are largely due to the use of various deep neural network models. However, these models rely heavily on well-annotated data for fully supervised training and rarely generalize to scenarios that are missing from the training dataset, such as new camera angles and human poses. Therefore, some recent research explores how to use external information to improve generalizability [30]. 2D human pose estimation has made significant progress, as described in [31], which focuses on human body shape; however, that work says little about occlusion and appearance changes of the human body. The hourglass network for human pose estimation is an improvement over the low-dimensional parametric model of body shape in [31].
Later, [32] showed a significant improvement in learning spatial models with a CNN incorporated into the pose machine framework. With multi-person pose estimation and joint localization, [32][33][34] give significant improvements in human pose estimation. The task nevertheless remains a challenge: because of occlusion and unclear tracking, some methods use camera array systems to track 3D body motion accurately [35,36]. In addition, human structure information was used effectively in [37]; this approach was further improved by hierarchical joint prediction [38], 2D keypoint refinement [39], and a view-invariant constraint [40].

Method
The goal of our framework is to analyze ergonomic risk in workplaces using 3D pose estimation from 2D input images or video. This section provides an overview of our framework for risk analysis and of the ergonomics scoring methods used in this research. We argue that 2D poses alone are not enough for accurate recognition of actions and body bending. To address this, we extract key point features and convert 2D poses to 3D poses with the 3DMPPE PoseNet [7] method in order to automate the manual ergonomic risk analysis.
The proposed architecture is shown in Figure 2. It takes a video sequence as input and produces an action category as output, based on the result of the ergonomic risk analysis. The first step is obtaining an input video from hand-collected data from the manufacturing industry. The second step is locating a worker in the input video sequence; in our implementation, Darknet-53 is used to detect the worker and locate the region of the human body. In the third step, human joint localization is performed by the PoseNet network, which concludes feature extraction and pose estimation. The fourth step is a reliability check with our proposed body angle reliability decision (BARD) network. In this step, the extracted human pose is evaluated to determine whether it is good enough for ergonomic score evaluation. More specifically, BARD produces a reliability score between 0 (minimum reliability) and 1 (maximum reliability). If the reliability score is high enough, the ergonomic score evaluation is performed. In the final step, an action category is decided according to the ergonomic score.
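The five steps can be summarized as a small control-flow sketch. This is only an illustration under stated assumptions: `analyze_frame`, `FrameResult`, and the four callables passed in are hypothetical names standing in for the detector, the pose estimator, BARD, and the ergonomic scoring. The default threshold of 0.5 is the value mentioned in the discussion of BARD.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]        # (P_x, P_y, P_width, P_height)
Pose3D = Sequence[Tuple[float, float, float]]  # one (x, y, z) per joint


@dataclass
class FrameResult:
    reliability: float
    score: Optional[int]  # None when the frame is rejected by the reliability check


def analyze_frame(
    frame,
    detect_worker: Callable[[object], Optional[Box]],    # hypothetical: worker detection
    estimate_pose: Callable[[object, Box], Pose3D],      # hypothetical: 3D pose estimation
    reliability: Callable[[Pose3D], float],              # hypothetical: BARD score in [0, 1]
    ergonomic_score: Callable[[Pose3D], int],            # hypothetical: OWAS/RULA/REBA scoring
    threshold: float = 0.5,
) -> Optional[FrameResult]:
    """Run detection, pose estimation, reliability check, and scoring on one frame."""
    box = detect_worker(frame)
    if box is None:
        return None  # no worker found, nothing to score
    pose = estimate_pose(frame, box)
    r = reliability(pose)
    if r < threshold:
        return FrameResult(reliability=r, score=None)  # pose too unreliable to score
    return FrameResult(reliability=r, score=ergonomic_score(pose))
```

Passing the stages as callables keeps the sketch independent of any particular detector or pose network; frames that fail the reliability check are reported but never scored.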

Feature Extraction and Worker Detection
We use YOLOv3 [27] to locate a human worker. YOLOv3 consists of two parts: bounding box prediction and feature extraction. It predicts an objectness score for each box using logistic regression, together with the box width and height in the input image, based on the created bounding box and the features extracted by Darknet-53, which contains 53 convolutional layers. This feature extraction network is much more powerful than Darknet-19 while remaining more efficient than ResNet-101 or ResNet-152. Darknet-53 also achieved the highest measured floating-point operations per second, which means that its structure makes better use of the GPU and makes evaluation more efficient and faster. This is mainly because ResNets have too many layers and are therefore inefficient. Thus, Darknet-53 performs on par with state-of-the-art classifiers at higher speed and with fewer floating-point operations.
In the proposed architecture, only the information about the detected person, namely the x and y coordinates and the width and height of the bounding box, $P_x$, $P_y$, $P_{width}$, $P_{height}$, is returned from an input image $I$, as described in (1).
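As an illustration of how the four returned values can be used, the sketch below crops the worker region from an image array. It assumes that $(P_x, P_y)$ is the top-left corner of the box in pixels; this convention is an assumption made for the example, since a detector may instead report the box center. The function name `crop_worker` is hypothetical.

```python
import numpy as np


def crop_worker(image: np.ndarray, p_x: float, p_y: float,
                p_width: float, p_height: float) -> np.ndarray:
    """Crop the detected worker from an H x W (x C) image.

    Assumption: (p_x, p_y) is the top-left corner of the box in pixels.
    The box is clamped to the image borders before slicing.
    """
    height, width = image.shape[:2]
    x0 = max(0, int(round(p_x)))
    y0 = max(0, int(round(p_y)))
    x1 = min(width, int(round(p_x + p_width)))
    y1 = min(height, int(round(p_y + p_height)))
    return image[y0:y1, x0:x1]
```

Clamping matters in practice because predicted boxes can extend slightly past the frame; without it, negative indices would silently wrap around in NumPy.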

Our Approach to Pose Estimation
Pose estimation methods generally follow one of two approaches. The most commonly used one is the top-down approach, which deploys a human detector to estimate the bounding boxes of humans; each detected human area is then cropped and fed into the pose estimation network. The second one, the bottom-up approach, first localizes all human body key points in an input image and then groups them by person using clustering techniques. In our approach, we used 3DMPPE [9] for human pose extraction, with the location of the human provided by YOLOv3. The pose estimation part takes the feature map from the backbone part and up-samples it using a batch normalization layer [7] and three successive deconvolutional layers with ReLU activation. A 1 × 1 convolution is applied to the up-sampled feature map to generate a 3D heat map for each joint. The soft-argmax operation is used to extract the 2D image coordinates. As shown in Figure 1, 3DMPPE was used to estimate the root-relative 3D pose from cropped human images. 3DMPPE uses RootNet and PoseNet to generate the 3D human pose from the 2D human pose, as described below. Please refer to [9] for further information.
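The soft-argmax step can be illustrated with a short NumPy sketch. It is a generic version of the operation, not the code used in 3DMPPE: the heat map is turned into a probability distribution with a softmax, and the coordinate is the expected index under that distribution. The function name and the `beta` sharpness parameter are assumptions introduced for this example.

```python
import numpy as np


def soft_argmax_3d(heatmap: np.ndarray, beta: float = 1.0):
    """Differentiable stand-in for argmax on a (D, H, W) heat map of one joint.

    Returns the expected (x, y, z) index under a softmax over all voxels.
    `beta` (hypothetical parameter) sharpens the distribution when larger.
    """
    depth, height, width = heatmap.shape
    logits = np.asarray(heatmap, dtype=float).reshape(-1) * beta
    logits = logits - logits.max()          # subtract the max for numerical stability
    prob = np.exp(logits)
    prob = (prob / prob.sum()).reshape(depth, height, width)
    # Marginalize over the other two axes, then take the expected index per axis.
    z = float((prob.sum(axis=(1, 2)) * np.arange(depth)).sum())
    y = float((prob.sum(axis=(0, 2)) * np.arange(height)).sum())
    x = float((prob.sum(axis=(0, 1)) * np.arange(width)).sum())
    return x, y, z
```

Unlike a hard argmax, the result is a continuous value, so it is not limited to the heat map grid resolution and remains usable inside gradient-based training.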
In RootNet, ResNet-50 is used as the backbone network to extract a feature map. Then, a 1 × 1 convolution is used to produce a correction factor, followed by global average pooling. Lastly, the depth value of each feature point is calculated by multiplying by a value $k$ that is computed using (2):
$$k = \sqrt{\alpha_x \alpha_y \frac{A_{real}}{A_{img}}} \quad (2)$$
where $\alpha_x$ and $\alpha_y$ are the focal lengths divided by the per-pixel distance factors, and $A_{real}$ and $A_{img}$ are the areas of the human in real space and image space, respectively. In PoseNet, the depth of each feature point relative to the root is calculated. For training PoseNet, the L1 distance is used to minimize the distance between the real 3D coordinates and the corresponding estimated coordinates.
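A small numeric sketch may help to read (2). It assumes that (2) has the form $k = \sqrt{\alpha_x \alpha_y A_{real} / A_{img}}$, which follows from the pinhole camera model; the function name and the numbers in the usage note are illustrative assumptions, not values from our experiments.

```python
import math


def rootnet_k(alpha_x: float, alpha_y: float,
              area_real: float, area_img: float) -> float:
    """Distance measure k = sqrt(alpha_x * alpha_y * A_real / A_img).

    alpha_x, alpha_y: focal lengths in pixels.
    area_real: assumed area of the human in real space (e.g. mm^2).
    area_img: area of the human bounding box in the image (pixels^2).
    The result has the length unit of sqrt(area_real), here mm.
    """
    if area_img <= 0:
        raise ValueError("area_img must be positive")
    return math.sqrt(alpha_x * alpha_y * area_real / area_img)
```

For example, with focal lengths of 1500 pixels, an assumed real area of 2000 mm × 2000 mm, and a 300 × 300 pixel box, `rootnet_k(1500, 1500, 2000 * 2000, 300 * 300)` gives a distance of about 10,000 mm. A smaller box in the image yields a larger k, that is, a worker farther from the camera.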

Body Angle Reliability Decision (BARD)
In the previous section, we introduced the two processes of feature extraction and 3D pose estimation for the input data. This section explains how data are selected to ensure that workers' poses are calculated with high accuracy. We propose a body angle reliability decision between the camera and the worker's pose, using three major joints (waist, arm, and leg) to calculate the body-bending angle. Figure 1 shows how the x-axis of the human and the z-axis of the camera align with each other so that maximum reliability can be measured. The main purpose of BARD is to measure workers' poses accurately. Human experts are likely to use images in which workers' poses can be seen clearly; in other words, they skip images in which poses cannot be estimated accurately. Therefore, we introduce a reliability measure to detect poorly captured angles in images. In the proposed approach, we define the reliability R as a linear function (3), where K is a constant. The main goal of our system is to recover the maximum likelihood value of the reliability R. We use K as a trained parameter, ensuring that all high-reliability images are taken into account in the ergonomics calculation and that only highly reliable angles between the camera and the worker's position are used. As shown in Figure 1, we have three axes. The z axis is the optic axis, and the x axis is the line connecting the left and right shoulder points of a worker. For instance, if a worker stands directly in front of the camera, the angle between the z and x axes is 90 degrees. If the worker turns by 90 degrees so that the camera sees the exact side view of the worker, the angle between the z and x axes is 0 degrees. BARD is calculated from the camera's z-axis coordinate and the human x-axis coordinate values. The coordinates output from the 3D heat maps for each joint are used to measure the bending angles between joints.
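The two kinds of angles described above can be computed from 3D joint coordinates as sketched below. The sketch assumes joints are given in camera coordinates with the optic axis along (0, 0, 1); the function names are hypothetical, and all points passed in are assumed to be distinct so that no vector has zero length.

```python
import numpy as np


def angle_deg(u, v) -> float:
    """Angle in degrees between two non-zero 3D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def view_angle_deg(left_shoulder, right_shoulder, optic_axis=(0.0, 0.0, 1.0)) -> float:
    """Angle between the shoulder line (human x axis) and the camera optic axis (z).

    Folded into [0, 90] so the result does not depend on left/right order:
    90 when the worker faces the camera, 0 for an exact side view.
    """
    shoulder_line = (np.asarray(right_shoulder, dtype=float)
                     - np.asarray(left_shoulder, dtype=float))
    a = angle_deg(shoulder_line, optic_axis)
    return min(a, 180.0 - a)


def joint_angle_deg(parent, joint, child) -> float:
    """Bending angle at `joint` between the segments joint->parent and joint->child."""
    joint = np.asarray(joint, dtype=float)
    return angle_deg(np.asarray(parent, dtype=float) - joint,
                     np.asarray(child, dtype=float) - joint)
```

For instance, `joint_angle_deg(shoulder, elbow, wrist)` gives 180 degrees for a fully extended arm and 90 degrees for a right-angle elbow flexion.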
To calculate the BARD value we use (3), and the value is used to decide whether an input image is appropriate for estimating the worker's pose. We take the camera angle and the worker's body pose and model their relationship as a linear function. As in Figure 1, we want to ensure that the 3D output model from PoseNet has high reliability. The BARD model blocks out unnecessary, low-reliability human pose data at minimal cost, which makes it appropriate for our research.
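The reliability decision can be sketched as a simple gate. The exact form of (3) is not reproduced here, so the mapping below is an assumption made only for illustration: reliability grows linearly with the angle between the human x axis and the camera z axis, R = K · θ / 90, clipped to [0, 1]. The defaults K = 1 and a threshold of 0.5 are the values mentioned in our discussion; the function names are hypothetical.

```python
def bard_reliability(view_angle_deg: float, k: float = 1.0) -> float:
    """Hypothetical linear reliability: R = K * angle / 90, clipped to [0, 1].

    ASSUMPTION: this linear form is illustrative only and stands in for (3).
    view_angle_deg is the angle between the shoulder line and the optic axis,
    90 for a frontal view and 0 for an exact side view.
    """
    r = k * view_angle_deg / 90.0
    return max(0.0, min(1.0, r))


def is_reliable(view_angle_deg: float, k: float = 1.0, threshold: float = 0.5) -> bool:
    """Keep a frame for ergonomic scoring only if its reliability reaches the threshold."""
    return bard_reliability(view_angle_deg, k) >= threshold
```

Under this assumed mapping, K = 1 and a threshold of 0.5 keep exactly the frames whose view angle lies between 45 and 90 degrees.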

Experiments and Results
In this experiment, we focus on calculating the ergonomic score of workers' poses using three ergonomic score analysis methods: OWAS, RULA, and REBA. To calculate the ergonomic scores for each method and to analyze the risk of working poses, we used the PoseNet model to extract the body key point features. All of the joints labelled as key points were transformed into 3D models by the Darknet and PoseNet feature extractors, and we applied the reliability check, as shown in Figure 2.
We describe how the poses were prepared and analyzed before calculating the final score for workers' poses. Publicly available datasets were used for training. The Human3.6M dataset [41] is the largest 3D single-person benchmark dataset and consists of 15 activities performed by 11 different subjects, captured from four different viewpoints. In addition, datasets such as COCO [42] and MPII [43,44] were used for training. PyCharm was used for implementation, and training was run on five NVIDIA RTX 2080Ti GPUs. We present simulation figures and tables to explain our experiment in detail. We conducted our experiment using these models and datasets to compare our system output with the expert-generated ergonomic scores. The 2D key point features extracted by the YOLO Darknet model are fed into the PoseNet model for 3D human pose estimation.

Datasets Preparation and Extraction
Electronics 2022, 11, 3403
For the evaluation of workers' poses, we captured videos of industrial workers. Samples of captured images from the videos are shown in Figures 3 and 4. The dataset consists of more than 10,000 video frames. We selected 600 images as benchmarks for ground truth evaluation. Three experts separately evaluated the same dataset, giving three different scores to capture ground truth variability. In our experiment, we compared our system output with the experts' decisions to ensure that our system produced similar results. We used Cohen's kappa κ [45] as the agreement index with the experts' evaluation, where κ is computed from the observed agreement and the agreement expected by chance on the different poses of workers' body angles. This measure is helpful in comparing machine-learning predictions with manually established predictions, and many researchers have used Cohen's kappa in posture reliability studies [46][47][48]. A κ value below 0 indicates no agreement, κ = 0.01 to 0.20 poor agreement, κ = 0.21 to 0.40 fair agreement, κ = 0.41 to 0.60 moderate agreement, κ = 0.61 to 0.80 good agreement, and κ = 0.81 to 1 very good agreement [45,49].
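As a worked illustration of this agreement index, the sketch below computes Cohen's kappa for two raters (for example, the system and one expert) from the observed agreement and the chance agreement, and maps the value to the agreement bands listed above. The function names and the example scores in the usage note are illustrative assumptions, not data from our experiments.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who scored the same M items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of items")
    m = len(rater_a)
    # Observed agreement: fraction of items on which both raters give the same score.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / m
    # Chance agreement: sum over categories of the product of per-rater counts, over M^2.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (m * m)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category for every item
    return (p_o - p_e) / (1.0 - p_e)


def agreement_level(kappa: float) -> str:
    """Map kappa to the agreement bands used in the text (0 is treated as no agreement)."""
    if kappa <= 0.0:
        return "no agreement"
    for upper, label in ((0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "good")):
        if kappa <= upper:
            return label
    return "very good"
```

For example, with system scores `[1, 1, 1, 1, 2, 2, 2, 2]` and expert scores `[1, 1, 1, 2, 2, 2, 2, 1]`, the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5, which falls in the moderate band.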
We use the following equation to compare our system's predictions with the experts' scores of workers' postures:
$$\kappa = \frac{P_0 - P_e}{1 - P_e}$$
where $P_0$ is the relative observed agreement between experts on the ground truth data and $P_e$ is the probability of chance agreement. If the raters are in complete agreement, κ = 1. If there is no agreement other than what would be expected by chance (as given by $P_e$), κ = 0. For $k$ categories, $M$ observations to categorize, and $n_{ki}$ the number of times rater $i$ predicted category $k$, $P_e$ is given by:
$$P_e = \frac{1}{M^2} \sum_{k} n_{k1} n_{k2}$$
Here, we present detailed results comparing the accuracy of the OWAS, RULA, and REBA ergonomics scores with different data and methods. Tables 1-3 show the raw data for an input image and the scoring of the body parts taken for measurement. For OWAS, the waist, arm, and leg are used; for RULA and REBA, upper and lower body parts such as the upper and lower arm, wrist, neck, trunk, leg, and waist are considered. Tables 4-8 show the accuracy of our system on different datasets. The accuracy for good postures, where all body joints are aligned with the x axis of the human pose and the z axis of the camera position, is shown in Table 4. Accuracy for these poses is high compared with the datasets with occlusion. The occluded images have slightly lower accuracy because the angles calculated from their key points are not always accurate. Tables 4-8 also show how the datasets are divided into sections for the reliability calculations. Some of the datasets have high reliability, while others have low reliability in terms of positioning relative to the camera angle.
Some datasets have high reliability but low scores because of faulty detections. Similarly, occlusion, including self-occlusion, is a major factor affecting the reliability and the ergonomics score. Figure 5 shows an example of an occluded image; occlusion is one reason the ergonomics accuracy score is low. Obtaining reliable and accurate 3D joints from a single image is an intractable problem. A few methods based on LSTM [42] and RNN [50] use joint interdependence and temporal convolution to generate a 3D pose from a sequence of 2D key points. However, these are not easy to apply to each frame, as they require estimating all 2D key points in every frame. They also assume that prediction errors are temporally non-continuous and independent, which does not hold in most occlusion cases. Thus, we chose the cylinder man model and applied it to occlusion as a network, as in [51,52], which generated occlusion labels for the 3D data. Results on our own datasets are given in Tables 4 and 5, and results before and after BARD for models trained with publicly available datasets, such as Human3.6M, COCO, and MPII, are given in Tables 6 and 7. They show the ergonomics accuracy we obtained for our dataset. Similarly, we trained on our datasets with Higher HR Net [53], but the results were not satisfactory, as shown in Table 8. The 2D features extracted by Higher HR Net were not effective on occluded data and body key joint detection, which decreases the accuracy of the ergonomics scores. The results show that using Higher HR Net for feature extraction gives lower ergonomics accuracy both before and after the application of BARD. This method achieved relatively lower accuracy than 3DMPPE [10] on all three ergonomic methods used in our system.
We found that too much occlusion in the key points creates unreliable angle measurements, which in turn affects the reliability check of the input data. Angles with low reliability and a small number of detected key points cause a large drop in the accuracy of the system output. To fix this problem, we separated the good posture and occluded posture datasets as inputs to the network. We then evaluated our approach on both datasets. However, our key focus was on matching our final ergonomics scores with those of the experts, which are shown in Tables 9-11, respectively.

Table 4. Evaluation and comparison of accuracy of good posture.

                 OWAS Accuracy   RULA Accuracy   REBA Accuracy
Higher HR Net    73%             75%             72%
Ours             75%             76%             74%

Table 9. OWAS score accuracy compared with expert scores with observed agreement

Table 10. RULA score performance with experts' data.

Table 11. REBA score performance with experts' data.

training. PyCharm was used for implementation. We trained on our datasets with five NVIDIA RTX 2080Ti GPUs. We present simulation figures and tables to explain our experiment in detail. We conducted our experiment using these models and datasets to compare our system output with the expert-generated ergonomic scores. The 2D key point features extracted by the YOLO Darknet model are fed into the Pose Net model for 3D human pose estimation.

OWAS Score Analysis
OWAS was developed to evaluate the exposure of individual workers to ergonomic risk factors associated with both the upper and lower body, such as back, arm, and leg postures [48]. It scores different body positions and combines them into a final score, which determines the category of ergonomics risk level. Figure 6 illustrates the different working postures used for OWAS score analysis. To validate the reliability of our score, we need to measure the agreement between the experts' scores and our system's scores, so we use Cohen's kappa. The nearer the value is to 1, the stronger the agreement of the calculated score. Table 9 shows the observed and probable agreement between OWAS scores computed from our estimated joint angles and scores from expert data. We considered the leg, arm, and waist for OWAS scoring. We had to set the weight to the minimum.
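The agreement check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names `cohens_kappa` and `agreement_label` are hypothetical, the example scores are made up for demonstration, and treating κ = 0 as "no agreement" is an assumption, since the interpretation bands in the text start at 0.01.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    m = len(rater_a)
    # Observed agreement P_0: fraction of items given the same score.
    p_0 = sum(a == b for a, b in zip(rater_a, rater_b)) / m
    # Chance agreement P_e: sum over categories of n_k1 * n_k2, divided by M^2.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (m * m)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category for every item
    return (p_0 - p_e) / (1.0 - p_e)


def agreement_label(kappa):
    """Map kappa to the interpretation bands quoted in the text."""
    if kappa <= 0.0:  # assumption: kappa == 0 is grouped with "no agreement"
        return "no agreement"
    for upper, label in ((0.20, "poor"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "good")):
        if kappa <= upper:
            return label
    return "very good"


# Illustrative (made-up) OWAS risk categories for eight frames.
expert_scores = [1, 2, 2, 3, 1, 4, 2, 1]
system_scores = [1, 2, 2, 3, 2, 4, 2, 1]
kappa = cohens_kappa(expert_scores, system_scores)
print(round(kappa, 3), agreement_label(kappa))  # kappa = 9/11, "very good"
```

In this toy example seven of eight frames match, so the observed agreement is 0.875 and the chance agreement is 20/64, which gives κ = 9/11.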

RULA Score Analysis
RULA was developed to evaluate workers' ergonomic risk factors associated with upper-extremity MSDs. This method also considers the load on the neck and trunk. For RULA, we likewise assume the minimum weight and treat force/load and muscle use as static. As shown in Table 10, we calculated the accuracy and matched it with the experts' scores. RULA is divided into three score tables, as shown in Figure 7. For the Table A score, the upper arm, lower arm, and wrist angles were considered. Similarly, for the Table B score, we considered the neck, trunk, and leg angles. The final score was obtained by combining the Table A and Table B scores in Table C to analyze the risk, as shown in Figure 7. The minimum RULA score is 1 and the maximum is 7, representing the ergonomics risk associated with the job. To validate the calculated scores, we also measured the agreement between the observed and calculated scores; the agreement values are high, as shown in Table 10.
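The body angles that feed the Table A and Table B scores are derived from estimated 3D joints. The text does not give the angle formula, so the following is only a sketch of one common approach: the angle at a joint is taken between the two limb vectors that meet there. The function name `joint_angle_deg` and the example coordinates are hypothetical.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle in degrees at joint b, formed by the 3D points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]  # vector from the joint to point a
    v2 = [c[i] - b[i] for i in range(3)]  # vector from the joint to point c
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("adjacent key points coincide; angle is undefined")
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp against floating-point drift before taking the arccosine.
    cos_t = max(-1.0, min(1.0, cos_t))
    return math.degrees(math.acos(cos_t))


# Made-up coordinates: shoulder, elbow, wrist with the forearm bent forward.
shoulder, elbow, wrist = (0.0, 0.0, 0.0), (0.0, -1.0, 0.0), (1.0, -1.0, 0.0)
print(joint_angle_deg(shoulder, elbow, wrist))  # 90 degrees at the elbow
```

An angle computed this way would then be looked up in the relevant RULA range for that body part; the lookup tables themselves are not reproduced here.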

REBA Score Analysis
REBA is similar to RULA. Tables A and B are switched, with some modifications in how the bending of body angles is considered. REBA additionally includes a leg score, as shown in Figure 8. We followed the same protocol as for the RULA tables. In addition, the coupling effect is set as fitted, good grip, and acceptable [2]. Our REBA score also has high agreement as calculated with Cohen's kappa.

Tables 9-11 show the accuracy of our system compared with the experts' scores. These tools are used in our system to evaluate upper and lower body parts and the MSD risks associated with workers' jobs or tasks.

Discussion and Simulation Results
When occlusion occurs, it has adverse effects on reliability as well as on ergonomics scores. For example, in a self-occlusion case the reliability may be high, but some key points are occluded, which affects the overall reliability score. To decide the best-fit model for our experiment, we modeled the initial relationship between angle and accuracy with linear and exponential functions. Based on the experimental results shown in Figure 9, we found that the best fit for the reliability function was the linear regression model. This model achieves good accuracy, suggesting that the features contain meaningful key points and joint bending angles. Notably, 3D human pose estimation is sensitive to occlusions and joint angles. We conclude that the maximum likelihood value R depends strongly on the x coordinate of the human body and the z coordinate of the camera angle, which are considered in the reliability decision. Reliability is affected by object-oriented occlusions and by undetected joints in angular measurements. We also found that the accuracy of the ergonomics score is high when the angle between the camera placement and the worker's position is between 45 and 90 degrees. Hence, to estimate workers' poses with high accuracy, the corresponding reliability line can be used to define the angle and its relationship with accuracy.
The simulation plot in Figure 10 shows the accuracy improvement in our system after the introduction of BARD. As shown in the plot, accuracy was low before BARD, but after we removed all unreliable 2D key points and added the occlusion calculation for 2D key points, the system produced more accurate and stable results. We set the reliability threshold to 0.5 and, provisionally, the optimum value of the constant K to 1 based on the camera viewing angle, so that input data with lower reliability would not be used in the accuracy calculation. We also made sure that heavily occluded key points are recovered. Without the occlusion awareness and reliability check function, the large number of detected key points were all treated the same, which could hinder accuracy.
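The effect of the reliability threshold can be illustrated with a small sketch. It assumes that each input already carries a reliability score; how BARD computes that score is not reproduced here. The record layout, the function names `filter_reliable` and `accuracy`, the use of `>=` at the threshold, and the sample values are all hypothetical and are not taken from the experiments.

```python
RELIABILITY_THRESHOLD = 0.5  # threshold value reported in the text


def filter_reliable(frames, threshold=RELIABILITY_THRESHOLD):
    """Keep only inputs whose reliability score reaches the threshold."""
    return [f for f in frames if f["reliability"] >= threshold]


def accuracy(frames):
    """Fraction of inputs whose predicted score equals the expert score."""
    if not frames:
        return 0.0
    return sum(f["predicted"] == f["expert"] for f in frames) / len(frames)


# Made-up records: a reliability score plus predicted and expert scores.
frames = [
    {"reliability": 0.9, "predicted": 2, "expert": 2},
    {"reliability": 0.8, "predicted": 3, "expert": 3},
    {"reliability": 0.3, "predicted": 1, "expert": 4},
    {"reliability": 0.6, "predicted": 2, "expert": 1},
    {"reliability": 0.2, "predicted": 4, "expert": 2},
]

print(accuracy(frames))                   # all inputs
print(accuracy(filter_reliable(frames)))  # reliable inputs only
```

In this toy data the two low-reliability records are also the ones with the largest scoring errors, so discarding them raises the measured accuracy, which mirrors the before/after behavior described for Figure 10.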

Conclusions
In this paper, we proposed a novel ergonomics risk analysis framework based on 3D human pose estimation. Our system addresses ergonomic risk with the help of 3D human pose estimation, which automates the ergonomics score analysis. To improve the accuracy of the ergonomics scores, this paper estimated the 3D skeleton joint pose from the 2D joint pose and combined it with the BARD method for checking the reliability of the input datasets. Our research applies 3D single-person pose estimation on a single RGB image to estimate workers' poses and calculate body joint bending angles. In addition, we captured a new dataset, which will be a significant advantage for future ergonomics research that requires large datasets. We also used an occlusion calculation method to estimate workers' poses from input images. To the best of our knowledge, this is the first work to apply 3D human pose estimation in ergonomics to address the industrial work risk problem. This research opens a new direction for automated postural ergonomics calculation in different complex working environments. In future work, we will focus on resolving dense occlusion problems and present a more sophisticated version of the reliability function for workers' pose estimation.
