In this section, we present the experimental studies conducted using the aforementioned testbed. All participants received the same set of instructions prior to data acquisition and were asked to perform each exercise to their maximum capacity while adhering to standard athletic form. For pull-ups, the standard was defined as starting from a full-hang position with the arms extended and pulling up until the chin cleared the horizontal bar. For push-ups, the standard required maintaining a straight torso while lowering the body until the chest was near the floor, followed by a full extension of the arms. This protocol was intentionally designed to test the robustness of the assessment algorithm against the natural performance variations inherent in a diverse population.
5.1. Experimental Configuration
To debug and optimize the algorithm, we first outline the workflow of the human posture recognition and assessment task. The Human Pose Detection (Pose Landmarker [34]) task is initialized with the create_from_options function, which accepts the configuration option values to be processed. Table 6 presents the parameter configurations along with their descriptions.
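For illustration, the initialization described above can be sketched with the MediaPipe Tasks Python API. The model file path and the confidence values below are placeholder assumptions for this sketch, not the exact configuration listed in Table 6.

```python
from mediapipe.tasks import python as mp_tasks
from mediapipe.tasks.python import vision

# VIDEO running mode: detection blocks until each frame is fully
# processed, matching the fairness rationale discussed in the text.
options = vision.PoseLandmarkerOptions(
    base_options=mp_tasks.BaseOptions(
        model_asset_path="pose_landmarker.task"  # placeholder model path
    ),
    running_mode=vision.RunningMode.VIDEO,
    num_poses=1,                        # one camera per test-taker
    min_pose_detection_confidence=0.5,  # assumed values; see Table 6
    min_tracking_confidence=0.5,
)
landmarker = vision.PoseLandmarker.create_from_options(options)

# In VIDEO mode, each frame is then passed with its timestamp:
# result = landmarker.detect_for_video(mp_image, timestamp_ms)
```

The returned PoseLandmarkerResult carries the keypoint coordinates consumed by the downstream assessment modules.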
The Pose Landmarker task exhibits distinct operational logic across its execution modes. In IMAGE or VIDEO mode, the task blocks the current thread until it has fully processed the input image or frame. Conversely, in LIVE STREAM mode, the task invokes a result callback with the detection outcomes immediately after processing each frame; if the detection function is called while the task is still occupied with a previous frame, the new input frame is simply ignored. In this system, although the live stream from the webcam naturally corresponds to the LIVE STREAM mode, the VIDEO mode is deliberately chosen to ensure detection accuracy and fairness in the assessment calculations. This decision is supported by the multi-process design of the assessment algorithm subsystem: even when the posture processing speed lags behind the data collection speed, the design guarantees that all image frames are processed. In addition, the parallel operation of multiple sub-modules within the assessment algorithm subsystem plays a crucial role in maintaining real-time performance.
Each time the Pose Landmarker task executes a detection, it returns a Pose LandmarkerResult object. This object contains the coordinates of each human body keypoint, providing essential data for subsequent analysis and assessment.
5.2. Pull-Up Case
In Figure 10a, the blue line represents the change in hand height across video frames during pull-ups. The hand height changes only slightly and remains stable during the initial and final stages, because the tester’s hands do not move significantly while standing before starting or after finishing the assessment. During the pull-up motion itself, however, the hand coordinates show slight jitter and variation, caused primarily by minor hand movements and errors in body keypoint recognition; the hand height changes most noticeably at the start of each pull-up. The blue line in Figure 10b illustrates the variance of the hand height values within each sliding window. The variance remains small in frame intervals where the hands do not move significantly. The red line denotes the acceptable variance threshold (30 pixels, an empirical value selected based on the resolution of the subject’s video stream). Within the sliding window intervals that satisfy this variance threshold, we take the average hand height from Figure 10a and select its maximum value (indicated by the red line) as the height of the horizontal bar.
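The sliding-window deduction above can be sketched as follows. The 30-pixel variance threshold follows the text, while the window size and function name are illustrative assumptions.

```python
from statistics import mean, pvariance

def estimate_bar_height(hand_heights, window=15, var_threshold=30.0):
    """Estimate the bar height from a sequence of hand-height samples."""
    candidates = []
    for i in range(len(hand_heights) - window + 1):
        w = hand_heights[i:i + window]
        # Windows with small variance correspond to stable hand positions;
        # windows where the hands are moving are skipped.
        if pvariance(w) <= var_threshold:
            candidates.append(mean(w))
    # The maximum stable-window average is taken as the bar height.
    return max(candidates) if candidates else None
```

In practice the stable windows at the start (standing) and while hanging on the bar produce the candidate averages, and the maximum distinguishes the on-bar position.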
Similarly, the algorithm employs the same deductive logic to determine the eye height threshold, which is used by the state machine judgment function in Algorithm 2. Throughout the entire pull-up process, the elbow joint point exhibits data characteristics similar to the hand height in Figure 10. The algorithm extracts the stable height value after the tester mounts the bar during the assessment; this value serves as the threshold that the eyes should remain below when descending to the low position of the pull-up, and is referred to as the eye height threshold. The algorithm test results are presented in Figure 11, and the deduction process is not reiterated here.
Figure 12 presents the results of applying the frame-sequence-based joint correction and smoothing algorithm. To clearly demonstrate the execution of the algorithm, the data after the first correction step are shown in Figure 12a, while the final results after the subsequent smoothing step are presented in Figure 12b. Figure 12a shows that the errors caused by outliers are effectively suppressed after the correction, and in Figure 12b the smoothing step also eliminates the jitter errors in the coordinates. This demonstrates that the algorithm effectively reduces the errors introduced during recognition and restores the true motion trajectory, providing a solid foundation for the design of the subsequent assessment algorithm.
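A minimal sketch of this two-step procedure is given below: first, “flying point” outliers are detected by comparing each coordinate with its neighbours and replaced by linear interpolation; second, residual jitter is suppressed with a moving-average filter. The jump threshold and window size are assumed values for the sketch, not the exact parameters of our implementation.

```python
def correct_outliers(values, jump_threshold=50.0):
    """Step 1: replace isolated coordinate jumps ("flying points")."""
    corrected = list(values)
    for i in range(1, len(corrected) - 1):
        prev, cur, nxt = corrected[i - 1], corrected[i], corrected[i + 1]
        # A point far from both neighbours is treated as a flying point
        # and replaced by the interpolation of its neighbours.
        if abs(cur - prev) > jump_threshold and abs(cur - nxt) > jump_threshold:
            corrected[i] = (prev + nxt) / 2.0
    return corrected

def smooth(values, window=5):
    """Step 2: moving-average smoothing to remove coordinate jitter."""
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out
```

Applying correct_outliers before smooth matters: averaging an uncorrected flying point would smear its error across neighbouring frames instead of removing it.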
In the preceding sections, the assessment criteria for pull-ups have been analyzed in detail, and a corresponding algorithm has been developed based on these criteria. To verify the actual effectiveness of the designed algorithm, we conduct the following experimental validations.
The data results from the standard pull-up experiment are shown in Figure 13. The figure is divided into four sub-graphs from top to bottom, illustrating the variation of each key data point. The analysis of the experimental data is as follows.
The 1st Sub-graph: The blue line illustrates the height of the human mouth as the sequence of frames progresses, while the red line indicates the height of the horizontal bar as estimated by the horizontal bar height measurement algorithm. It is intuitively apparent that during a standard pull-up exercise, the curve representing the mouth’s height consistently intersects with the straight line denoting the height of the horizontal bar.
The 2nd Sub-graph: The blue line represents the variation of eye height throughout the frame sequence, while the red line indicates the eye height threshold estimated in Figure 11. Similarly, the curve depicting eye height consistently intersects with the straight line representing the eye height threshold.
The 3rd Sub-graph: The blue line illustrates the variation of the human elbow angle as the frame sequence progresses, while the red line represents the elbow angle threshold, which is set at 150 degrees (an empirical value). This setting encourages the test-taker to keep their elbows fully extended when lowering to the low position.
The 4th Sub-graph: The blue line illustrates how the state machine varies as the frame sequence changes, allowing for observation of the overall state of human exercise and the logical counting process. Based on the sequence of the state machine in the 4th sub-graph, this experiment concludes that the test-taker performed 6 standard pull-up movements. Consequently, there were 6 transitions from State4 to State3 to State1 in the sequence.
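The counting logic described above can be sketched as follows: one repetition is scored each time the trace visits State4, then State3, then State1 in order. Modelling the states as plain integers is an assumption of this sketch, not the internal encoding of our implementation.

```python
def count_repetitions(state_trace):
    """Count State4 -> State3 -> State1 transitions in a state trace."""
    # Collapse consecutive duplicates so each state appears once per visit
    # (the state machine typically stays in a state over many frames).
    collapsed = [s for i, s in enumerate(state_trace)
                 if i == 0 or s != state_trace[i - 1]]
    count = 0
    for i in range(len(collapsed) - 2):
        if collapsed[i:i + 3] == [4, 3, 1]:
            count += 1
    return count
```

A non-compliant movement that routes the machine through State5 breaks the 4-3-1 pattern, so such a repetition is not counted.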
Figure 14 shows the results of a pull-up assessment that encountered multiple errors. The data indicate that only some of the movements meet the standard assessment criteria. The main issues identified are the mouth height not exceeding the horizontal bar during the upward pull and the eyes not dropping below the designated threshold height. Consequently, the final assessment score is 4, corresponding to 4 sequences of “State4-State3-State1” in the state machine.
Then, we organized various test-takers to test the pull-up assessment algorithm, and the results are shown in Table 7. To evaluate the recognition accuracy of the pull-up algorithm, a set of experimental trials was conducted involving multiple participants performing pull-ups. During these trials, the actual number of valid pull-up repetitions performed by each participant was recorded as ground truth by experienced human assessors. The accuracy rate shown in the table is calculated as the ratio of repetitions correctly identified by our algorithm to the total actual repetitions established by the human assessors, expressed as a percentage; this quantifies the ability of the system to precisely count valid movements. Overall, the assessment algorithm demonstrates a relatively high accuracy rate among test-takers with high physical fitness levels, while the accuracy decreases slightly among ordinary college students and teachers. The overall accuracy of the algorithm is 96.6%, which indicates that it can effectively assess pull-ups in most cases.
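For clarity, the accuracy figures in Table 7 reduce to the following computation; the function and variable names are illustrative.

```python
def recognition_accuracy(correct_reps, actual_reps):
    """Percentage of ground-truth repetitions the algorithm counted correctly."""
    if actual_reps == 0:
        return None  # no ground-truth repetitions to score against
    return 100.0 * correct_reps / actual_reps
```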
5.3. Push-Up Case
Figure 15 presents the results of a standard push-up experiment. It includes five sub-graphs arranged from top to bottom, illustrating the changes of multiple key data points. The specific analysis is as follows.
The 1st Sub-graph: The blue line illustrates the variation of the elbow angle of the human body as the frame sequence progresses. The red line represents the elbow angle threshold, which is set at 140 degrees; this empirically chosen value aims to prevent excessive bending of the elbow during the upward phase and can be adjusted to suit different conditions. When the blue line is above the red line, the elbow angle meets the criteria for the high support position; otherwise, it does not fulfill the established standard.
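Joint angles such as the elbow angle compared against this threshold can be computed from three 2D keypoints (e.g. shoulder, elbow, wrist). The sketch below uses plain (x, y) tuples as an assumption; the Pose Landmarker actually returns normalized landmark coordinates.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at vertex b, formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point drift in acos.
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_t))
```

A fully extended arm gives an angle near 180 degrees, so the 140-degree threshold is exceeded in the high support position.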
The 2nd Sub-graph: The blue line illustrates the variation of mouth height, indicating the position of the tester’s mouth relative to the ground during push-ups. The mouth height fluctuates throughout the movement. The red line represents the minimum shoulder height throughout the entire push-up process, which can be different for different individuals. A necessary condition for entering the low support state during push-ups is that the mouth height must be lower than the minimum shoulder height.
The 3rd Sub-graph: The blue line illustrates the variation of the angle between torso and ground, while the red line indicates a threshold of 15 degrees.
The 4th Sub-graph: The blue line illustrates the variation of waist angle. The red line indicates the threshold which is set at 140 degrees. The purpose of this threshold is to ensure that the tester maintains a straight body.
The 5th Sub-graph: The blue line illustrates the changes of the state machine sequence. When the blue line of the state machine experiences a transition from State4 to State3 to State1, it signifies that the tester has successfully completed a standard push-up movement. During the movement, any non-compliant action will trigger State5. Finally, it can be observed from the figure that the result of this push-up assessment is 9.
Figure 16 shows the assessment results of a non-standard push-up. In this experiment, the mouth height consistently failed to reach the required threshold for the low-position state, and the torso angle also did not meet the standard. Therefore, only 4 transitions from State4 to State3 to State1 occur in the state machine sequence, and the final score of this tester is 4.
Similarly, we organized various test-takers to test the push-up assessment algorithm, and the results are shown in Table 8.
5.4. Sit-Up Case
Figure 17 shows the different stages of the sit-up exercise.
The 1st Sub-graph: The blue line illustrates the distance between the elbow and the knee. The red line represents the sit-up-threshold, which can be adjusted to suit different conditions. When the blue line is above the red line, it indicates that the elbow and knee are in contact with each other.
The 2nd Sub-graph: The blue line illustrates the angle between the body and the horizontal line. The red line represents the sit-down-threshold. When the blue line is above the red line, it indicates that the human body is in a lying state.
The 3rd Sub-graph: The blue line illustrates how the state machine varies as the frame sequence changes, allowing for observation of the overall state of the exercise and the logical counting process. Based on the state machine sequence in this sub-graph, the experiment concludes that the test-taker performed 7 standard sit-up movements.
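The two sit-up metrics plotted above reduce to simple geometry on the keypoint coordinates: a Euclidean elbow-knee distance and the torso's inclination relative to the horizontal. The (x, y) tuple layout is an assumption of this sketch.

```python
import math

def elbow_knee_distance(elbow, knee):
    """Euclidean distance between the elbow and knee keypoints (1st sub-graph)."""
    return math.hypot(elbow[0] - knee[0], elbow[1] - knee[1])

def torso_horizontal_angle(hip, shoulder):
    """Absolute angle in degrees between the hip-shoulder line and the
    horizontal (2nd sub-graph); near 0 indicates a lying position."""
    dx = shoulder[0] - hip[0]
    dy = shoulder[1] - hip[1]
    return abs(math.degrees(math.atan2(dy, dx)))
```

Each metric is then compared against its threshold (sit-up-threshold and sit-down-threshold) by the state machine.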
In the aforementioned pull-up, push-up, and sit-up experiments, we conducted tests with individual subjects. Our system is designed under the specification that one camera is dedicated to capturing one person, to ensure accurate posture recognition. When multiple people are tested simultaneously, we recommend configuring multiple cameras (one camera per person) to avoid interference between individuals. When occlusions occur or lighting conditions are poor, the posture processing module may detect keypoints inaccurately, leading to the phenomenon of “flying points”. In such cases, the joint correction and smoothing algorithm can, to a certain extent, identify and correct the abnormal keypoint positions caused by occlusion or poor lighting, thereby improving the stability and accuracy of posture recognition.
5.6. Discussion
Our developed system offers a comprehensive approach to automated exercise assessment, distinguishing itself by integrating multiple functionalities into a robust, end-to-end solution suitable for practical deployment, with explicit consideration of fault handling and design trade-offs to address real-world constraints. While much of the existing state-of-the-art research focuses on optimizing individual components, such as advanced pose estimation algorithms or novel deep learning models for specific exercise recognition, our system prioritizes the entire workflow from real-time video capture to score calculation and data management, including targeted handling of the practical faults inherent in visual sensor-based assessment. The primary non-quantifiable faults in this context are “flying points” and coordinate jitter of human keypoints, caused by environmental interference that is difficult to model mathematically, such as uneven lighting and limb occlusion. Instead of relying on theoretical fault quantification, we adopted an engineering validation approach: building an experimental testbed based on the Phytium 2000+ processor, collecting real motion data from 5 test groups, and implementing the frame-sequence-based joint correction and smoothing algorithm. As shown in Figure 12, this algorithm reduces “flying point” errors to ensure the reliability of the subsequent posture assessment, a practical fault mitigation strategy that aligns with the system’s real-world deployment goal, where experimental validation of fault handling is more actionable than theoretical quantification.
This integrated design is crucial for real-world scenarios where not just accuracy but also reliability, interpretability, and ease of deployment are paramount. The choice of a modular architecture and a rule-based FSM for exercise evaluation represents a deliberate design decision: unlike purely data-driven, black-box deep learning approaches, which can achieve high accuracy but often lack transparency, our FSM model provides clear, human-interpretable rules for what constitutes a valid repetition. While the system is designed to be robust, extreme conditions could still pose challenges. Specifically, our calibration-free design, while enhancing deployment flexibility, makes the 2D-based assessment potentially vulnerable to significant changes in viewing perspective or large-scale camera displacements; the lack of a calibration phase to compensate for such geometric variations is a limitation of the current study. Additionally, while our smoothing algorithm mitigates partial occlusions, more severe or prolonged occlusions can still lead to keypoint detection failures, which may cause the FSM to register errors. Finally, as the system relies on fixed visual sensors, it has limited portability for assessments in unconstrained environments compared to wearable devices such as IMUs [6]. Furthermore, our methodological choice to allow subjects to perform exercises “freely” to their maximum capacity, while reflective of real-world scenarios, introduces performance variability that might affect the results; this approach relies heavily on the robustness of the FSM to filter non-standard movements, and its impact relative to a strictly protocol-driven execution should be noted as a study limitation. Rigorous experimental testing validated the system’s efficacy, achieving 96.6% accuracy for pull-ups and 97.4% for push-ups, confirming its alignment with expert judgment. In summary, our system offers a robust, accurate, and transparent solution for automated sports assessment, effectively bridging the gap between advanced computer vision research and practical application. Furthermore, we plan to integrate additional thresholds, such as elbow angle checks, alongside the existing eye and mouth criteria, to further minimize the likelihood of false positives in our assessment logic.