3.1. System Overview
The 3D human posture analysis system proposed in this study operates on depth maps generated under controlled conditions and consists of a multi-stage pipeline that extracts high-quality 3D meshes and spinal skeletal information from depth images. The depth maps were generated from data in the SizeKorea (Seoul, Republic of Korea) database, a nationally managed anthropometric survey that collects and provides standardized high-resolution 3D human body models [34]. The survey is conducted periodically, every five years, on people aged 20 to 69. This study uses data from the 8th Human Dimension Survey (2020 to 2023).
Figure 1 shows an overview of the proposed method. The system first receives depth images from four directions (front, left, right, and back) and generates a 3D point cloud for each view. In this process, a mask image is applied for background removal, and points are generated in the coordinate system of each view. Since the point clouds generated from multiple views have different local coordinate systems, a registration process is essential to integrate them into a single global coordinate system. This system adopts a two-step registration strategy. The first step is global registration, which combines the RANSAC (random sample consensus) algorithm with the FPFH (fast point feature histograms) feature descriptor. After this global initialization, a quality-driven feedback mechanism quantitatively assesses geometric consistency among the aligned views and automatically re-runs the alignment with adjusted parameters when the quality is insufficient. The second step applies the point-to-plane ICP (iterative closest point) algorithm for fine registration. With the front view as the reference, registration proceeds with the side views (left and right) followed by the back view, so that views with higher overlap are aligned first and cumulative errors are minimized.
Once the registration is complete, the point clouds are merged into a unified point cloud, which is then optimized into a six-level LOD (level of detail) model through a quadric error metrics-based vertex reduction algorithm. Spinal skeletal information is then extracted independently from each LOD model and combined as an ensemble. In this process, AI-based landmark detection is combined with anatomical proportions to estimate 17 key joint points and compute spinal angles. The spinal skeletal information predicted from the six LOD models is integrated using a median-based voting method (Section 3.6), which suppresses noise due to resolution differences by selecting, for each joint point, the predicted coordinate closest to the median across LODs. This multi-resolution ensemble strategy enhances the robustness and accuracy of spinal skeletal predictions compared to a single model.
3.2. Preprocessing: Depth Map Analysis and Point Cloud Generation
This system receives depth images from four directions (front, left, right, and back) to generate 3D point clouds. The depth images are provided in BMP format as grayscale images, where the brightness value of each pixel encodes the distance from the camera.
As can be seen in the no-mask preprocessing example in Figure 2, directly converting the depth image to a point cloud resulted in excessive measurement noise, leading to significant distortion along the edges of the point cloud. To address this issue, this study introduced a mask-based preprocessing step that extracts the foreground regions from the depth image. This removes unnecessary background points and sensor-induced noise before reconstruction, facilitating stable point cloud reconstruction.
The input depth map is represented with integer values ranging from 0 to 255; for numerical stability in subsequent processing, it is first normalized to floating-point values in the range of 0.0 to 1.0. An adaptive mask generation method based on double thresholding is then applied to separate the foreground (the human body) from the background. The lower threshold of 0.2 removes sensor noise and overly close areas, while the upper threshold of 0.95 filters out regions corresponding to the background and depth measurement limits.
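The normalization and double thresholding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `make_foreground_mask` is hypothetical, and the use of strict inequalities at the two thresholds is an assumption, since the text does not state how boundary values are handled.

```python
import numpy as np

def make_foreground_mask(depth_u8, low=0.2, high=0.95):
    """Normalize an 8-bit depth map to [0, 1] and keep pixels between two thresholds.

    low and high follow the values quoted in the text (0.2 and 0.95);
    strict comparison at the boundaries is an assumption.
    """
    depth = depth_u8.astype(np.float32) / 255.0   # 0..255 -> 0.0..1.0
    mask = (depth > low) & (depth < high)          # double thresholding
    return depth, mask

# Tiny synthetic depth map: only the mid-range pixels survive.
depth_u8 = np.array([[0, 30, 128],
                     [200, 250, 255]], dtype=np.uint8)
depth, mask = make_foreground_mask(depth_u8)
```

In this toy input, the values 128 and 200 fall inside the retained band, while 0 and 30 are rejected by the lower threshold and 250 and 255 by the upper one.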
The generated binary mask undergoes morphological opening and closing operations using elliptical structuring elements, which eliminates residual noise and smooths the outline of the human body. The masked depth map is then transformed into a 3D point cloud using an inverse projection (back-projection) algorithm based on the pinhole camera model. A KD-tree-based hybrid neighbor search extracts up to 30 neighboring points within a 5 mm radius around each point, and principal component analysis (PCA) is applied to this local patch to estimate the normal vectors. The eigenvector corresponding to the smallest eigenvalue is chosen as the normal direction, and this normal information enhances the precision of the subsequent point-to-plane ICP registration process.
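As a rough illustration of the two geometric steps in this paragraph, the sketch below back-projects masked depth pixels with a pinhole model and estimates a normal from a local patch by PCA. The intrinsics (`fx`, `fy`, `cx`, `cy`), the `depth_scale` factor, and both function names are assumptions introduced for the example; the paper does not list its camera parameters, and the neighbor search that selects each patch is omitted here.

```python
import numpy as np

def backproject(depth, mask, fx, fy, cx, cy, depth_scale=1.0):
    """Pinhole back-projection of masked pixels (u, v, z) to 3D points (x, y, z)."""
    v, u = np.nonzero(mask)               # pixel rows (v) and columns (u)
    z = depth[v, u] * depth_scale         # metric depth; the scale is an assumption
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def pca_normal(patch):
    """Normal of a local patch: eigenvector of the smallest covariance eigenvalue."""
    centered = patch - patch.mean(axis=0)
    cov = centered.T @ centered / len(patch)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, 0]

# A flat 3x3 depth patch back-projects to a plane z = 2, whose normal is +/- z.
depth = np.full((3, 3), 2.0)
pts = backproject(depth, depth > 0, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
normal = pca_normal(pts)
```

The sign of a PCA normal is ambiguous, so a real pipeline would typically orient normals consistently (for example, toward the camera) before using them in point-to-plane ICP.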
3.3. FPFH Feature Descriptor Extraction
The ICP algorithm updates the nearest correspondences between two point clouds iteratively to optimize the transformation (rotation/translation). Consequently, due to the convergence characteristics of the algorithm, the initial pose is a key factor determining the overall registration quality. ICP is structured such that its objective function is nonlinear and easily converges to local minima. If the initial registration is inaccurate or if the relative distance between the two point clouds is significant, it continuously matches incorrect correspondences, leading to the accumulation of misalignments. Especially in structures with significant curvature changes, such as the human body, an inadequate initial pose can greatly distort the relative positions of arms, legs, and the torso, making it virtually impossible to recover in subsequent optimization iterations.
For this reason, fine registration using ICP must be conducted under conditions where a stable initial pose is ensured. As can be seen in Figure 3, applying ICP without initial registration can trap the optimization in local optima, producing misalignments in which the overall shape is geometrically distorted or the views overlap incorrectly.
To prevent this, the system first stabilizes the initial pose through global registration based on RANSAC (random sample consensus) matching of FPFH (fast point feature histograms) features and then performs fine registration using the point-to-plane ICP algorithm. This combined approach secures global exploration capability during initial registration and achieves high registration accuracy through local optimization in the fine registration phase.
3.3.1. FPFH Feature Descriptor
FPFH (fast point feature histograms) is a structurally robust local feature descriptor that effectively addresses issues such as partial observations, curvature discontinuities, and depth sensor noise that inevitably arise in human body point cloud registration. By accumulating features once around each center point, FPFH avoids the excessive computational load of the higher-dimensional PFH descriptor, which computes the relationships among all pairs of neighboring points, while reliably preserving the core geometric information of the relative normal distribution.
The feature compression structure of FPFH suppresses the noise propagation commonly encountered in human data and offers a relatively uniform representation even in areas where curvature changes and quasi-planar structures coexist, such as at joints and the torso. Additionally, its low computational cost allows for near real-time processing in this pipeline, which requires continuous processing from four viewpoints (front, back, left, right). Consequently, FPFH is selected as the local feature descriptor for this pipeline.
3.3.2. Hierarchical Registration and Supplementary Design Framework
The system combines RANSAC–FPFH-based initial registration (initialization) with a hierarchical adaptive design that reflects the viewpoint-specific characteristics of the data and practical operational constraints. This integration ensures both robustness and efficiency in point cloud registration.
First, to address the varying point density and occlusion patterns depending on the viewpoint, an adaptive voxel downsampling technique was applied. Voxel sizes of 5.0 mm for the left and back views and 3.0 mm for the right view were established. A multi-scale progression approach was introduced to gradually converge from global registration to fine registration. Specifically, scale stages of [25.0, 12.0, 6.0] mm for the left and back views and [10.0, 5.0, 2.5, 1.0] mm for the right view were designed to enable stable convergence from the global contour to local details. This multi-scale structure allows for comprehensive representation of the overall shape in the initial stages while supporting the refinement of local registration in subsequent steps.
Additionally, the rotation degrees of freedom were constrained based on the viewpoint to minimize misregistration that can occur during the registration of symmetrical bodies. A limited small-angle rotation was permitted for the left and back views to ensure registration stability, while a broader rotation range was allowed for the right view, where pose variations during data collection are relatively larger, thereby expanding the convergence domain.
During the RANSAC phase, a fitness threshold was established, and if the inlier ratio did not meet a certain level, the process automatically proceeded to the next scale stage. When sufficient matching quality was achieved, early termination was implemented to reduce unnecessary sampling. This conditional progression was designed to minimize computational load while maintaining registration quality.
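The conditional progression through scale stages can be sketched as a simple loop. This is only an outline under stated assumptions: `register_at_scale` is a hypothetical callback standing in for one RANSAC-FPFH run at a given scale, and the threshold value 0.6 is illustrative, since the paper does not report the fitness threshold it used.

```python
def coarse_to_fine(scales_mm, register_at_scale, fitness_threshold):
    """Try each scale stage in order; stop early once the fitness is sufficient.

    register_at_scale(scale) -> fitness is a hypothetical stand-in for one
    global registration attempt at that scale.
    """
    best_scale, best_fitness = None, -1.0
    for scale in scales_mm:
        fitness = register_at_scale(scale)
        if fitness > best_fitness:
            best_scale, best_fitness = scale, fitness
        if fitness >= fitness_threshold:
            break                      # early termination: quality is sufficient
    return best_scale, best_fitness

# Fake fitness values for the left/back scale stages quoted in the text.
calls = []
fake_fitness = {25.0: 0.3, 12.0: 0.7, 6.0: 0.9}

def fake_register(scale):
    calls.append(scale)
    return fake_fitness[scale]

result = coarse_to_fine([25.0, 12.0, 6.0], fake_register, fitness_threshold=0.6)
```

With these fake values, the 25.0 mm stage fails the threshold, the 12.0 mm stage passes, and the 6.0 mm stage is never run, which is the saving the early termination is meant to provide.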
To improve robustness against occasional failures of global initialization under occlusion, noise, or challenging body poses, we introduce a quality-driven feedback mechanism that evaluates the geometric consistency of the initially aligned point clouds before proceeding to downstream steps. After initial alignment, we compute an alignment quality score based on the mean nearest-neighbor distance between sampled points across each aligned view pair. Specifically, for a pair of aligned point clouds, we measure the mean nearest-neighbor distance and convert it into an overlap quality in the range [0, 1] by normalizing with the expected body size. The final score is obtained by averaging the overlap quality over all view pairs.
If the alignment quality score falls below a threshold (set to 0.4 in our experiments), the pipeline automatically triggers an adaptive refinement loop that re-invokes the alignment module with modified settings (e.g., increased RANSAC iterations or alternative ICP strategies) and re-evaluates the result using the same metric. This loop is fully automated and implemented as modular functions (assess_alignment_quality and apply_adaptive_refinement), ensuring that only geometrically consistent reconstructions are passed to mesh generation and LOD ensemble skeleton analysis.
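A minimal sketch of this feedback loop is given below. The function names `assess_alignment_quality` and `apply_adaptive_refinement` come from the text, but their signatures and bodies here are assumptions. In particular, the mapping `1 - distance / body_size` clipped to [0, 1] is one plausible reading of "normalizing with the expected body size", and the brute-force nearest-neighbor search is used only to keep the example dependency-free.

```python
import numpy as np

def mean_nn_distance(src, dst):
    """Mean distance from each source point to its nearest target point (brute force)."""
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
    return d.min(axis=1).mean()

def assess_alignment_quality(view_pairs, body_size_mm):
    """Average overlap quality in [0, 1] over all aligned view pairs (assumed formula)."""
    scores = []
    for src, dst in view_pairs:
        overlap = 1.0 - mean_nn_distance(src, dst) / body_size_mm
        scores.append(float(np.clip(overlap, 0.0, 1.0)))
    return float(np.mean(scores))

def apply_adaptive_refinement(align, settings_list, body_size_mm, threshold=0.4):
    """Re-run a hypothetical align(settings) until the score reaches the threshold."""
    for settings in settings_list:      # settings_list must be non-empty
        score = assess_alignment_quality(align(settings), body_size_mm)
        if score >= threshold:
            break
    return settings, score

a = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
b = a + np.array([0.0, 5.0, 0.0])
score = assess_alignment_quality([(a, a), (a, b)], body_size_mm=100.0)

def fake_align(settings):
    # Stand-in for the alignment module: a larger residual means a worse alignment.
    return [(a, a + np.array([0.0, settings["residual_mm"], 0.0]))]

chosen, refined_score = apply_adaptive_refinement(
    fake_align, [{"residual_mm": 80.0}, {"residual_mm": 5.0}], body_size_mm=100.0)
```

Here the first setting yields a score of about 0.2, below the 0.4 threshold, so the loop retries with the second setting and accepts it.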
3.3.3. Importance of Registration Order
The registration order significantly influences the overall registration quality, depending on the viewpoint characteristics and shape overlap of each view. In this study, an appropriate registration sequence was established for the multiple input point clouds. As shown in Figure 4, applying the correct order allows the four viewpoints to be stably aligned into a single coherent shape; however, if the order is reversed, the relative positions of the front, side, and back point clouds become distorted, leading to serious mismatches or overlapping errors. This occurs because errors from the initial registration phase propagate to subsequent views, and in human data with many partial observations, the impact of order selection becomes even more pronounced.
To minimize this error propagation, this study adopted an incremental registration order of front, left and right sides, and back. The front view was chosen as the reference for several reasons: it most clearly represents the center axis of the human body and provides stable structural reference points that define bilateral symmetry, such as shoulder width, hip width, and thoracic contours. Additionally, the front view tends to have a superior sensor field of view and observation quality, making it the most suitable reference.
Subsequently, the left and right sides are sequentially aligned to the previously established front-based target to enhance the transverse cross-section structure of the human body, and finally, the back point cloud is aligned to complete the overall shape. This order is designed to perform registration starting from the areas with the highest overlap between views to suppress cumulative errors, ensuring that the lower overlapping back view is minimally affected by initial errors.
3.4. Fine Registration (Point-to-Plane ICP)
The objective of this stage is to minimize the remaining positional and orientational discrepancies after the global initial registration (RANSAC–FPFH) by aligning them to the local surface geometry. Since human data features extensive quasi-planar or low-curvature regions, such as the thoracic spine, scapula, and pelvis, an accurate fine registration algorithm is essential.
Figure 5 compares point-to-plane ICP with point-to-point ICP, with only the cost function replaced while the initial global registration and the correspondence update procedure are kept identical. In the full-body registration scenarios, point-to-plane ICP matched or outperformed point-to-point ICP.
In this stage, RMSE (root mean square error) and fitness score are used to quantitatively evaluate the performance of the ICP algorithm. RMSE is the most widely used metric for measuring alignment accuracy between two point clouds, quantifying the average distance error between the aligned point cloud and the reference data. The fitness value ranges from 0 to 1, with higher values indicating that many correspondences between the two point clouds match. Particularly for partially overlapping point clouds obtained from medical depth maps, fitness directly reflects the ratio of valid correspondences contributing to the alignment process, making it essential for assessing the reliability of the alignment.
Prior studies have comprehensively evaluated alignment results by using both RMSE and fitness together. RMSE measures the precision of the alignment, while fitness measures the reliability of the alignment, allowing for an objective assessment of the overall performance of the alignment algorithm. This is particularly advantageous in medical applications, such as this study based on geometric alignment, as it enables the evaluation of pure geometric performance without the bias of artificial training data, thereby ensuring clinical reliability.
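For concreteness, the two metrics can be computed as in the sketch below. It follows the common convention in which fitness is the fraction of source points that have a target neighbor within a maximum correspondence distance, and RMSE is taken over those inlier correspondences. The `max_corr_dist` parameter and the brute-force neighbor search are assumptions of this example, as the paper does not state its correspondence threshold.

```python
import numpy as np

def evaluate_alignment(source, target, max_corr_dist):
    """Return (fitness, inlier RMSE) for an already-aligned source/target pair."""
    # Distance from every source point to its nearest target point.
    d = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=2).min(axis=1)
    inliers = d <= max_corr_dist
    fitness = float(inliers.mean())                 # ratio of valid correspondences
    if not inliers.any():
        return fitness, float("inf")
    rmse = float(np.sqrt(np.mean(d[inliers] ** 2)))  # precision over inliers only
    return fitness, rmse

target = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [3.0, 0, 0]])
source = np.array([[0.0, 0.1, 0], [1.0, 0.1, 0], [2.0, 0.1, 0], [3.0, 5.0, 0]])
fitness, rmse = evaluate_alignment(source, target, max_corr_dist=0.5)
```

In this toy case three of the four source points lie 0.1 from the target and one lies far away, so the fitness is 0.75 and the inlier RMSE is 0.1. This shows why the two numbers are read together: a low RMSE alone can hide the fact that a quarter of the points found no valid match.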
As shown in Table 1, in the combination of front and left views, both methods exhibited similar performance (RMSE: 4.3246; fitness: 0.6715). In the combination of front and right views, point-to-plane ICP slightly improved the RMSE from 1.6576 to 1.6528. The most notable difference occurred in the combination of front, left/right, and back views, where point-to-plane ICP reduced the RMSE from 5.2462 to 4.5653, a decrease of approximately 13.0%, while simultaneously improving the fitness value from 0.8024 to 0.8416, an increase of about 4.9%. This result illustrates the effectiveness of normal direction constraints in complex scenarios with significant cumulative errors, such as back registration.
These findings align with the theoretical characteristics of the two methods: point-to-point ICP minimizes isotropic distances and is therefore sensitive to slight sliding errors in the surface tangential direction, while point-to-plane ICP uses geometric constraints in the surface normal direction to directly suppress residuals and expand the convergence domain. Additionally, although individual points in the dataset exhibit positional noise due to the characteristics of the depth sensor, normals are estimated through local averaging and are therefore relatively stable. For these reasons, point-to-plane ICP was adopted as the fine registration algorithm. The statistical significance of these differences is further validated using paired statistical tests in Section 4.3.
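To make the normal-direction constraint concrete, the sketch below solves one linearized point-to-plane step for fixed correspondences. It uses the standard small-angle approximation R ≈ I + [ω]×, under which each residual ((p + ω × p + t) − q) · n becomes linear in (ω, t). The function name and the synthetic data are assumptions of this example; a full ICP would re-estimate correspondences and repeat the step until convergence.

```python
import numpy as np

def point_to_plane_step(src, dst, dst_normals):
    """One least-squares step minimizing sum(((R p + t - q) . n)^2), R ~ I + [omega]x.

    Each row encodes omega . (p x n) + t . n = (q - p) . n.
    Returns (omega, t): a small rotation vector and a translation.
    """
    A = np.hstack([np.cross(src, dst_normals), dst_normals])   # shape (N, 6)
    b = np.einsum("ij,ij->i", dst - src, dst_normals)          # shape (N,)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:3], x[3:]

# Synthetic check: the source is the target shifted by a known translation.
rng = np.random.default_rng(0)
dst = rng.normal(size=(50, 3))
normals = rng.normal(size=(50, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
t_true = np.array([0.01, -0.02, 0.03])
src = dst - t_true
omega, t = point_to_plane_step(src, dst, normals)
```

Because only the component of the error along each normal is penalized, points are free to slide tangentially along the surface, which is the behavior the paragraph above contrasts with the isotropic point-to-point cost.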
3.5. Adaptive Vertex Reduction
This stage focuses on adjusting computational costs and resolution according to demands while preserving the shape fidelity of the human mesh after preprocessing and registration. It minimizes shape distortion and volume deviation by combining adaptive decimation reflecting curvature, surface density, and anatomical importance with step-wise normal recalculation and smoothing. The six levels of detail (LOD) generated from the same original source form a subset hierarchy that ensures consistency in coordinate systems, boundaries, and normals across resolutions.
Figure 6 shows an example of the vertex reduction results for each LOD.
Adaptive vertex reduction removes degenerate, duplicate, and abnormal elements, recalculates normals, and secures a normalized input mesh, subsequently reducing the number of vertices step-by-step to meet target retention rates. The quality-prioritized path utilizes precision reduction with boundary preservation and low tolerance to minimize surface distortion and volume deviation. The speed-prioritized path achieves the same goal by dividing it into five short steps to enhance throughput. By opting for step division over a single substantial reduction, cleanup and normal correction are immediately performed at the end of each step to suppress local collapse, silhouette loss, and non-manifold remnants. After reduction, lightweight smoothing and post-processing are applied to alleviate fine jaggedness and noise, ensuring the stability of sensitive contours and joint silhouettes in skeleton estimation and measurement. The pipeline operates consistently in the order of preprocessing, reduction, intermediate cleanup, normal correction, and smoothing.
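The idea of reaching a target retention rate through several short steps rather than one large reduction can be illustrated with a simple schedule. The geometric split used here, where every step removes the same fraction of the remaining vertices, is an assumption of this sketch; the paper states only that the speed-prioritized path uses five steps. The function name and the example numbers are likewise hypothetical.

```python
def reduction_schedule(n_vertices, retention, steps=5):
    """Target vertex counts after each step, so that the last step hits n * retention.

    Assumes a geometric split: each step keeps the fraction retention ** (1 / steps).
    """
    return [int(round(n_vertices * retention ** (k / steps)))
            for k in range(1, steps + 1)]

# Example: reduce a 100,000-vertex mesh to 50% of its vertices in five steps.
targets = reduction_schedule(100_000, retention=0.5, steps=5)
```

After each of these intermediate targets, the cleanup and normal correction described above would be applied before the next step begins.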
Table 2 summarizes the reduction rates of the six LODs produced in this stage.
This stage is introduced to simultaneously satisfy three demands. First, anatomical areas such as the region near the spine, joint contours, and feature edges must be protected throughout the reduction process, so conservative reduction in the quality-prioritized path is combined with step-wise smoothing and normal recalculation to minimize shape distortion. Second, since the size and noise characteristics of input meshes can vary significantly in real-world applications, a complexity-aware policy automatically determines retention rates and post-processing intensity to consistently maintain output quality and processing times amid data diversity. Lastly, different resolution requirements coexist for the same subject, including analysis, real-time inference, streaming, and storage; thus, standardizing the six LODs in Table 2 and maintaining a subset hierarchy obtained by further simplifying higher-resolution results ensures consistency in coordinates, boundaries, and normals during resolution transitions. Consequently, this adaptive reduction achieves accuracy, efficiency, and consistency, enhancing the robustness of succeeding modules and the operational efficiency of the entire pipeline.
3.6. Estimating Skeleton Based on Multiple LOD Ensembles
The spinal skeleton is automatically estimated through a 3D human mesh categorized into six levels of detail (LODs). The proposed skeleton estimation module consists of the following steps:
(1) Initializing 3D joints using AI-based 2D pose landmarks;
(2) Predicting independent skeletons from each LOD mesh;
(3) Determining the final skeleton through median-based ensemble voting.
The initial joint coordinates of the skeleton are derived from 2D pose landmarks extracted from a frontal image. The frontal depth map is fed into the pose estimation model of MediaPipe Pose to detect full body key points, including the shoulders, elbows, wrists, pelvis, knees, and ankles. Among these, key points that directly contribute to spinal alignment and disk risk assessment are selected, including the head, neck, upper spine, middle spine, lower spine, and both shoulder and pelvic joints, defining the basic skeleton structure.
The initialized 3D joint coordinates are refined independently within each of the six LOD meshes. Neighboring vertices around the joint candidates are collected using KD-tree-based nearest neighbor searches, and the curvature and normal distribution of the local patch are analyzed to fine-tune the joint positions to conform with the body contour and anatomical orientation.
The final selected joint points are connected in the order of neck, upper spine, middle spine, and lower spine to form the spinal centerline, while the 3D curvature angles of the cervical, thoracic, and lumbar regions are calculated through the dot product of adjacent segment vectors. The spine is divided into three anatomical regions using anatomical ratio-based landmarks. The vertical distance between the C7 point—corresponding to the base of the cervical spine—and the sacral promontory is defined as the effective spinal length. Based on this length, the top 20% interval is classified as the cervical region, the 20–50% interval as the thoracic region, and the 50–80% interval as the lumbar region. This proportional division method absorbs absolute length variations due to individual height and body shape differences while consistently reflecting the physiological positions of spinal curvature reported in the literature. Each segment angle is also calculated individually across the six LODs, then the value closest to the median is adopted as the final value, minimizing estimation deviations of spinal angles due to differences in resolution and mesh reduction.
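The angle computation and the proportional region split can be sketched as follows. Several details are assumptions made for the example: the vertical axis is taken as a scalar height, the fraction is measured downward from C7 toward the sacral promontory, and the handling of the exact 20%, 50%, and 80% boundaries is not specified in the text. The function names are hypothetical.

```python
import numpy as np

def segment_angle_deg(p_prev, p_mid, p_next):
    """Angle between adjacent segment vectors (p_prev -> p_mid) and (p_mid -> p_next)."""
    a = np.asarray(p_mid, float) - np.asarray(p_prev, float)
    b = np.asarray(p_next, float) - np.asarray(p_mid, float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def spinal_region(height, c7_height, sacral_height):
    """Classify a height by its fraction of the effective spinal length below C7."""
    frac = (c7_height - height) / (c7_height - sacral_height)
    if 0.0 <= frac < 0.2:
        return "cervical"
    if 0.2 <= frac < 0.5:
        return "thoracic"
    if 0.5 <= frac <= 0.8:
        return "lumbar"
    return None   # outside the three intervals defined in the text

# Three joints along a bent centerline: the second segment deviates by 45 degrees.
angle = segment_angle_deg([0, 2, 0], [0, 1, 0], [1, 0, 0])
regions = [spinal_region(h, c7_height=100.0, sacral_height=0.0)
           for h in (90.0, 70.0, 40.0, 10.0)]
```

Because the split is expressed as a fraction of the effective spinal length, the same thresholds apply to subjects of different heights, which is the property the paragraph above relies on.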
The skeleton is estimated by integrating the independently predicted joint candidates from different LODs through a median-based ensemble voting approach (Figure 7). The ultra-low and ultra-high LODs differ in quantization bias, degree of surface smoothing, and noise sensitivity. In the ultra-low LOD, the mesh resolution is significantly reduced, maintaining the overall body contour while losing detailed structure, which decreases the accuracy of joint position estimation. In contrast, the medium LOD strikes a balance between structural stability and detailed shape representation but may show insufficient local representation at certain joints. The high and ultra-high LODs capture fine shapes in great detail, leading to high precision; however, they are overly sensitive to noise and local misalignments, which increases the likelihood of encountering outlier joint angles. These characteristics suggest that simply increasing the LOD resolution does not necessarily lead to improved estimation accuracy. Therefore, this study adopted a median-based ensemble voting strategy to leverage the advantages of independently predicted joint candidates from different LODs while compensating for their disadvantages, ultimately estimating the final skeleton.
The purpose of introducing the multi-LOD ensemble is to mitigate local bias errors and ensure consistency in joint estimation across variations in resolution.
Table 3 quantitatively evaluates how much the proposed median-based voting strategy reduces the inter-LOD deviation compared to average interpolation. In the cervical and lumbar regions, the median-based voting significantly reduced the angular deviation compared to the mean value (0.1131° vs. 0.9897° for cervical and 0.0894° vs. 0.2401° for lumbar), while maintaining a level of consistency similar to the mean-based approach in the thoracic region. Here, the deviation refers to the mean absolute difference (i.e., the mean absolute deviation) between each LOD candidate and the final result, with smaller values indicating higher geometric stability (inter-LOD consistency) of joint estimation across different resolutions.
Average interpolation generates virtual coordinates that were not actually observed at any LOD and reflects the inherent noise and bias present at the LOD level. In contrast, the median-based voting approach selects only candidates closest to the central tendency of actual predictions, thus suppressing the influence of outliers and preserving real anatomical structures that have been observed on the mesh surface multiple times. Consequently, the proposed voting strategy enhances the robustness and reliability of spinal curvature estimation by compensating, integratively, for the structural biases arising from different LODs.
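The difference between the two strategies can be seen in a small sketch. The function `median_vote` is a hypothetical name, and the six candidate coordinates are made-up values chosen so that one LOD produces an outlier; they are not results from the paper.

```python
import numpy as np

def median_vote(candidates):
    """Return the actual candidate closest to the coordinate-wise median, and its index."""
    candidates = np.asarray(candidates, dtype=float)
    median = np.median(candidates, axis=0)
    idx = int(np.argmin(np.linalg.norm(candidates - median, axis=1)))
    return candidates[idx], idx

# One joint predicted on six LODs; the last candidate is an outlier.
candidates = np.array([
    [0.0, 0.00, 0.0],
    [0.1, 0.10, 0.0],
    [0.2, 0.20, 0.0],
    [0.3, 0.25, 0.0],
    [0.4, 0.30, 0.0],
    [5.0, 0.20, 0.0],
])
voted, idx = median_vote(candidates)
mean_point = candidates.mean(axis=0)
```

The voted joint is one of the observed predictions, whereas the mean is pulled toward the outlier and matches none of the six candidates. Note that with an even number of scalar candidates, such as six angle values, the two middle values are equidistant from the median, and `argmin` simply returns the first of them; a tie-breaking rule would need to be chosen explicitly.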
The reasons for adopting a multi-LOD ensemble are threefold. First, local quantization and smoothing biases that arise during mesh reduction can lead to skewed joint positions when using only a single resolution. By independently estimating joints across different LODs and selecting the actual value closest to the central tendency, systematic errors can be reduced through the offsetting of LOD-specific biases. Second, uniformly sampling (50,000 points) and applying standardized normal estimation parameters (radius = 5; max_nn = 30) normalize the input distribution across LODs, ensuring homogeneity in point and normal statistics and compensating for the under-sampling or over-sampling of certain LODs with predictions from others. Third, using frontal image-based landmarks as a global guide helps reduce initial uncertainties in point-based 3D estimation and encourages multiple LOD candidates to converge under a common anatomical reference.
Choosing “the actual prediction closest to the central tendency” instead of direct average coordinates in voting is intended to exclude unrealistic interpolated coordinates, assuming that the internal estimator already satisfies geometric constraints such as relative length and angles between joints, thus maintaining only actual mesh-based solutions. The computational complexity increases linearly with the number of LODs; however, by processing each LOD path in parallel or in batches, the overall computational time can be efficiently managed. The visualization results are generated in a common coordinate system, making it easy to compare overlays between LODs, and when combined with the LOD hierarchy defined by the adaptive vertex reduction in Section 3.5, the module functions as a key component supporting real-time inference, precise analysis, and long-distance visualization.
3.7. Body Shape Analysis According to Spinal Angle
This study aims to quantify cervical and lumbar disk risk using metrics that reflect the body’s posture and balance, computed from the extracted skeleton. Key body points are extracted using MediaPipe Pose, and the angles and ratios calculated from these coordinates are used to assess the risk of neck and back disk injuries. Because this evaluation requires a clear definition of body metrics, this study defines six variables (cervical lordosis angle, thoracic kyphosis angle, lumbar lordosis angle, shoulder levelness, pelvic angle, and spinal alignment) that map directly to the scoring rules of ISO 11226 (static posture of workers) [35] and to RULA/REBA [36] action-level scoring, and performs posture-based disk risk diagnostics based on these quantified results.
The justification for adopting ISO 11226 and RULA/REBA as criteria lies in their complementary roles as an international standard (ISO) and a practical field tool (RULA/REBA). ISO 11226 systematically defines angular postures of the head, neck, and torso, static holding times (holding/recovery), support status, and left–right symmetry (tilt–rotation), allowing for the definition of posture-induced loads acting on the lumbar and cervical disks and deriving risk indicators (i.e., normal, caution, and risk) across angle intervals [37]. RULA/REBA, through a total of four levels of action ratings that integrate posture, load, frequency, and coupling information, provides a tool that scores postures such as “leaning forward,” “tilting to the side,” and “twisting the torso,” presenting posture risk levels while assigning situational weighting above the absolute permissible criteria provided by the ISO [38].
Thus, according to the principles of ISO 11226 and RULA/REBA, if the angle of “leaning forward” for the neck and torso exceeds approximately 20°, or if there is a pronounced “tilt to the side” or “twisting” (asymmetry between shoulders and pelvis) in the frontal and rear views together with an increase in the global alignment offset (SVA), the action level is escalated, and posture ratings are refined [39]. This dual assessment structure allows for the objective evaluation of neck and lumbar disk risks using only 2D key points extracted from frontal, rear, and side images, applying standards-compliant rules and providing grounded indicators for normal/caution/risk across angle intervals.
ISO 11226 describes the posture-induced mechanical loads on the neck and lumbar region based on angles (head/neck, torso), static holding times, support status, and left–right symmetry, while RULA/REBA scores forward flexion, lateral tilt, and torso twisting, as well as load, frequency of repetition, and object handling status (coupling) to indicate the urgency of interventions. Based on these definitions, this study selects six variables necessary for evaluating neck and lumbar disks: cervical lordosis angle, thoracic kyphosis angle, lumbar lordosis angle, shoulder angle, pelvic angle, and sagittal vertical axis. Each variable is computed from the MediaPipe key points.
Figure 8 shows the key points of the body that are used to compute the variables. The definitions and calculation formulas of the variables are as follows.
The cervical lordosis angle is approximated using neck flexion, defined as the difference between the trunk flexion angle (α) and the head flexion angle (β). The values of α and β are given by Equations (1) and (2), respectively, where the vertical reference vector serves as the reference direction, and the cervical lordosis angle is derived from their difference as shown in Equation (3). The larger this angle, the more pronounced the loss of cervical lordosis and the forward head posture, which maps directly to the neck flexion item in ISO 11226 and the neck forward angle score in RULA [40].
The thoracic kyphosis angle is approximated using the upper and lower trunk segment angles. It is calculated as the angle between the vector directed from the midpoint of the shoulders to the ear and the vector directed from the midpoint of the shoulders to the hip joint midpoint, as shown in Equation (4). A larger value is interpreted as a tendency toward hyperkyphosis; as this angle increases, the thoracic kyphosis deepens, bringing the posture closer to a kyphotic posture [41].
The lumbar lordosis angle is approximated using the thoracolumbar–pelvic segment angle. It is calculated as the angle between the vector directed from the hip joint midpoint to the shoulder midpoint and the vector directed from the hip joint midpoint to the knee midpoint, as shown in Equation (5). A smaller value of this angle indicates a tendency toward loss of lordosis or a flat back posture [42].
Shoulder levelness is calculated as the angle between the line connecting the left and right shoulders and the horizontal line, as shown in Equation (6). The indicators for the shoulders and pelvis (Equations (6) and (7)) quantify coronal plane symmetry violations (lateral bending/twisting) and are used as scoring items for lateral bending and twisting in RULA/REBA [43].
Pelvic tilt (obliquity) is calculated as the angle between the line connecting the left and right hip joints and the horizontal line, as shown in Equation (7).
The spinal alignment is measured using a photo-based sagittal vertical axis (SVA). It is calculated as the difference between the horizontal coordinate of the head reference point and the horizontal coordinate of the hip joint midpoint, normalized by the total body length, as shown in Equation (8) [44]. A larger SVA value indicates a greater deviation of the entire body from vertical alignment.
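The verbal definitions of Equations (4) to (8) can be sketched from 2D key points as shown below. This is a hedged illustration rather than the paper's formulas: the function names, the coordinate convention (x horizontal, y up), and the example coordinates are assumptions, the SVA is kept signed because the text speaks only of a "difference", and the cervical angle of Equations (1) to (3) is omitted because the text does not spell out the vectors that define α and β.

```python
import numpy as np

def angle_between_deg(u, v):
    """Unsigned angle between two 2D vectors, in degrees."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def thoracic_angle(ear, shoulder_mid, hip_mid):
    """Eq. (4) as described: shoulder_mid->ear versus shoulder_mid->hip_mid."""
    return angle_between_deg(ear - shoulder_mid, hip_mid - shoulder_mid)

def lumbar_angle(shoulder_mid, hip_mid, knee_mid):
    """Eq. (5) as described: hip_mid->shoulder_mid versus hip_mid->knee_mid."""
    return angle_between_deg(shoulder_mid - hip_mid, knee_mid - hip_mid)

def tilt_from_horizontal_deg(left, right):
    """Eqs. (6) and (7) as described: angle of the left-right line to the horizontal."""
    dx, dy = right - left
    return float(np.degrees(np.arctan2(abs(dy), abs(dx))))

def sva_ratio(head_x, hip_mid_x, body_length):
    """Eq. (8) as described: horizontal head-hip offset normalized by body length."""
    return (head_x - hip_mid_x) / body_length

# Made-up side-view and front-view coordinates.
thoracic = thoracic_angle(np.array([1.0, 1.0]), np.array([0.0, 0.0]), np.array([0.0, -2.0]))
lumbar = lumbar_angle(np.array([0.0, 2.0]), np.array([0.0, 0.0]), np.array([0.0, -2.0]))
shoulder_tilt = tilt_from_horizontal_deg(np.array([0.0, 0.0]), np.array([10.0, 10.0]))
sva = sva_ratio(head_x=5.0, hip_mid_x=0.0, body_length=100.0)
```

MediaPipe reports image coordinates with the y axis pointing downward; since only angles between vectors and absolute tilts are used here, flipping the y axis does not change the results.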
This calculation procedure allows for the consistent mapping of the criteria of “angles, static holding, and symmetry” from ISO 11226 and the principles of “situational weighting (forward bending, lateral tilting, twisting, and load/frequency/coupling)” from RULA/REBA, using only the 2D key points provided by MediaPipe. Each variable can then be combined with the defined angle ranges (normal/caution/risk) to serve as an input indicator for assessing cervical and lumbar disk risk.
Table 4 summarizes the normal/caution/risk ranges for spinal angles as defined in this study. To enable a structured interpretation of the extracted spinal indicators, this study adopts posture angle ranges derived from the principles of ISO 11226 and RULA/REBA. Within the context of depth-based, non-contact posture analysis, these ranges are utilized to describe relative postural tendencies and to facilitate the stratification of posture conditions into normal, caution, and risk categories. Given the inherent measurement variability and quantization uncertainty associated with depth sensors, the use of stratified angle ranges is intended to enhance robustness and interpretability at the posture risk level rather than precise angle estimation.
Accordingly, the criteria summarized in Table 4 provide a consistent reference framework for ergonomic posture screening and longitudinal monitoring, rather than absolute clinical decision boundaries. They represent a redefinition of the risk level ranges specified in ISO 11226 and RULA/REBA within the spinal coordinate system adopted in this experiment. Based on these posture assessment principles, the following table presents screening-level angle ranges adapted for depth-based spinal posture analysis.