A Geometry-Driven Quantitative Modeling Framework for Image-Based Human Motion Evaluation: Application to Sub-Pixel Posture Analysis and Feature Attribution

Lv, Tianci; Sheng, Keming; Qiao, Lan

doi:10.3390/math14050746

by Tianci Lv ¹,*, Keming Sheng ²,* and Lan Qiao ³

¹ College of Physical Education and Arts Humanities, China University of Petroleum (Beijing), Beijing 102249, China

² College of Artificial Intelligence, China University of Petroleum (Beijing), Beijing 102249, China

³ College of Chemical Engineering and Environment, China University of Petroleum (Beijing), Beijing 102249, China

* Authors to whom correspondence should be addressed.

Mathematics 2026, 14(5), 746; https://doi.org/10.3390/math14050746

Submission received: 26 January 2026 / Revised: 14 February 2026 / Accepted: 20 February 2026 / Published: 24 February 2026

(This article belongs to the Special Issue Mathematics Methods in Image Processing and Computer Vision)

Abstract

Quantitative evaluation of human motion from image data requires both high geometric precision and mathematical interpretability. To address the limitations of pixel-level posture analysis and empirical performance scoring, this study proposes a geometry-driven quantitative modeling framework for image-based motion evaluation. Sub-pixel edge detection based on quadratic polynomial interpolation is first employed to construct a precise continuous representation of limb contours from image sequences. By abstracting the human arm as a spatial rigid-body system, posture evaluation is reformulated as an optimization problem governed by geometric constraints and physical principles. An optimal swing trajectory is obtained by minimizing the total kinetic energy of the system, which is solved numerically using Newton’s iterative method, avoiding the explicit solution of highly coupled inverse kinematics. To further analyze the contribution of multiple performance-related variables within a unified quantitative framework, a hybrid feature attribution strategy integrating Random Forest, XGBoost, and LightGBM is introduced. The proposed mixed feature mining approach reduces model dependency and enhances the robustness of factor importance ranking. The effectiveness of the proposed framework is validated using image data collected from a cloud-based table tennis classroom. The experimental results demonstrate that the geometry-driven modeling approach provides stable, interpretable, and discriminative evaluation outcomes, indicating its potential applicability to broader image-based human motion analysis tasks.

Keywords:

sub-pixel image analysis; geometry-driven motion modeling; energy-based optimization; spatial rigid-body geometry; numerical optimization; feature attribution; ensemble learning methods

MSC:

68U10; 65K10; 70B15; 65D18

1. Introduction

Quantitative modeling of human motion from image data remains a challenging problem in image processing and computer vision, particularly when high precision and interpretability are required. Unlike end-to-end learning-based approaches, geometry-driven methods explicitly incorporate spatial constraints and physical principles, enabling more stable solutions and clearer mathematical interpretation. Developing such models is essential for applications that demand objective evaluation and reproducible analysis rather than purely data-driven inference.

Sub-pixel image analysis plays a critical role in precise motion representation, as pixel-level discretization often limits the accuracy of geometric measurements. Existing sub-pixel edge detection techniques, including interpolation- and moment-based methods, provide effective tools for refining object contours, yet their integration with higher-level spatial modeling remains insufficiently explored in practical motion analysis scenarios. In particular, how sub-pixel geometric information can be systematically incorporated into a unified mathematical framework for posture evaluation and trajectory optimization is still an open problem.

From a mathematical perspective, human limb motion can be regarded as a constrained spatial geometry problem governed by kinematic structure and energy distribution. By abstracting the human arm as a multi-degree-of-freedom rigid-body system, motion evaluation can be reformulated as an optimization problem under physical constraints. Among various criteria, the minimum kinetic energy principle provides a mathematically meaningful objective, allowing for the derivation of an optimal swing trajectory without explicitly solving highly coupled inverse kinematic equations.

Despite recent progress in geometry-driven and image-based motion analysis, two structural limitations remain. First, most image-based evaluation frameworks rely on feature aggregation, clustering, or regression fitting, where performance scores are derived statistically rather than from physically grounded optimality principles. Second, existing geometry-driven approaches typically focus on pose reconstruction or spatial measurement, without formulating evaluation criteria as analytically interpretable optimization problems. Consequently, there is still a lack of unified frameworks that simultaneously integrate sub-pixel geometric precision, rigid-body mechanical modeling, and mathematically explicit optimality conditions for motion evaluation.

To address this gap, the present study proposes a geometry-driven quantitative modeling framework that couples sub-pixel contour extraction with a constrained kinetic energy minimization formulation. By deriving the evaluation criterion from a physically meaningful objective function, the proposed method establishes a direct link between geometric configuration and performance optimality, rather than relying solely on statistical correlation.

Furthermore, to analyze the influence of multiple performance-related variables within a unified mathematical framework, a hybrid feature attribution strategy based on ensemble tree models is employed. By integrating feature importance rankings from Random Forest, XGBoost, and LightGBM, the proposed mixed feature mining method reduces model-dependent bias and enhances the robustness of the attribution results [1,2,3,4,5,6,7]. This approach provides a quantitative and interpretable assessment of dominant factors affecting motion performance.

The proposed framework is validated using image data and auxiliary measurements collected from a cloud-based table tennis classroom, serving as an application case of the mathematical methods developed in this study. While the experimental setting is specific, the underlying modeling strategy is general and can be extended to other image-based motion analysis and evaluation tasks.

As shown in Figure 1, this study focuses on analyzing the performance of university students in table tennis classes through a cloud-based system. Using a sub-pixel edge detection algorithm for data processing and enhancement, this study employs spatial geometric methods to optimize the analysis of arm posture in terms of arm kinetic energy, determining optimal conditions. This leads to the development of a quantitative evaluation model for table tennis performance. Moreover, a hybrid feature mining algorithm is applied to perform primary control analysis of table tennis performance. This approach aids physical education teachers in universities by facilitating efficient and personalized teaching, ensuring that instruction is tailored to individual students’ needs.

Contributions

The main contributions of this study can be summarized as follows:

A geometry-driven quantitative posture modeling framework is proposed.
Human arm motion during table tennis strokes is abstracted as a spatial rigid-body system, enabling posture evaluation to be formulated explicitly in terms of geometric configuration and physical parameters rather than empirical scoring or black-box inference.
A sub-pixel contour-based mathematical representation of swing posture is constructed.
By integrating quadratic polynomial interpolation with edge gradient analysis, sub-pixel arm contours are extracted and incorporated into subsequent spatial modeling, providing a mathematically consistent bridge between image-level measurements and continuous geometric variables.
An energy-minimization formulation for optimal swing trajectory evaluation is developed.
Based on the minimum kinetic energy principle, the posture assessment problem is transformed into a constrained optimization problem in spatial geometry, avoiding the direct solution of highly coupled inverse kinematics. The optimal posture parameters are obtained numerically using Newton’s iterative method.
A hybrid feature attribution strategy with reduced model dependency is introduced.
Feature importance rankings derived from multiple ensemble tree models are integrated through a mixed feature mining scheme, enhancing the robustness and interpretability of factor attribution from a mathematical aggregation perspective.
The proposed mathematical framework is validated through an image-based motion analysis application.
The experimental results from a cloud-based table tennis classroom demonstrate that the framework provides stable, interpretable, and quantitatively discriminative evaluation outcomes, supporting its applicability to broader image-based motion analysis problems.

2. Methods

2.1. Data Source

The quantitative evaluation model for table tennis performance was developed using video materials captured by the first author of this paper (Figure 2a,b) and He Tian, a PhD student from China University of Petroleum (Beijing) (Figure 2c,d). The data for the main controlling factor analysis of table tennis performance were collected from 170 university students enrolled in table tennis classes during the fall semester of 2023, including 142 male and 28 female students. A questionnaire survey revealed that the average height of the participants was 164.6 cm, their average body weight was 54.2 kg, and their average age was 20.1 years. Prior to this study, all participants were informed about the specific research process, and each participant signed an informed consent form.

2.2. Modeling Assumptions and Problem Formulation

To ensure the mathematical consistency and interpretability of the proposed modeling framework, several assumptions are introduced in the posture analysis and optimization process. These assumptions are not intended to oversimplify human motion but to provide a tractable and physically meaningful formulation compatible with image-based measurement constraints.

First, the human arm is modeled as a rigid multi-segment structure with fixed joint connections. This abstraction allows the spatial configuration of the arm to be represented using a finite set of geometric parameters while preserving the essential kinematic relationships between the shoulder, elbow, and wrist joints. Such rigid-body approximations are commonly adopted in motion analysis when the focus lies on global posture evaluation rather than fine-grained muscular dynamics.

Second, the mass of the table tennis racket is neglected in the kinetic energy formulation. Given that the racket mass is significantly smaller than the effective mass of the arm segments involved in the swing motion, this assumption simplifies the inertia representation without materially affecting the relative comparison of posture configurations. Consequently, the kinetic energy of the system is dominated by the arm motion itself, which aligns with the objective of posture-based evaluation.

Third, the minimum kinetic energy principle is adopted as the optimization criterion for swing trajectory assessment. From a mathematical standpoint, minimizing the total kinetic energy provides a scalar objective function that is continuous and differentiable with respect to posture parameters, making it suitable for numerical optimization. Physically, this criterion reflects an efficient motion pattern in which unnecessary joint rotations and excessive angular velocities are suppressed, leading to a stable and coordinated swing.

Fourth, instead of explicitly solving the inverse kinematics problem, which is known to admit multiple coupled solutions under joint constraints, the proposed framework reformulates posture evaluation as an energy-based optimization problem. This transformation reduces solution ambiguity and improves numerical stability, as the optimal solution corresponds to a local minimum of a well-defined objective function rather than one of many feasible kinematic configurations.

Finally, Newton’s iterative method is employed to solve the resulting optimization problem. The choice of this method is motivated by its fast local convergence properties under smooth objective functions, which are satisfied by the kinetic energy formulation. The iterative solution process ensures computational efficiency while maintaining sufficient accuracy for posture differentiation in practical image-based scenarios.

Overall, these assumptions jointly establish a mathematically coherent framework that balances modeling fidelity, computational tractability, and interpretability. While the assumptions introduce simplifications, they enable a clear linkage between sub-pixel image features, spatial geometry, and optimization-based posture evaluation, which is essential for reproducible and quantitative motion analysis.

2.3. Subpixel Calculation Method

The primary teaching materials and evaluation data for the table tennis class in the cloud-based classroom are predominantly images. Therefore, this study focuses on analyzing the grip posture and forehand-backhand photographs from the cloud-based classroom. Since edges are among the fundamental features of any image, most current methods for measuring the geometric parameters of objects are based on edge detection. These methods can generally be divided into pixel-level and sub-pixel-level edge detection algorithms. Pixel-level edge detection is fast, but its accuracy is bounded by the pixel size of the imaging hardware and cannot meet the precision requirements of this study. Thus, this paper adopts sub-pixel edge detection, which is not constrained by pixel size. Sub-pixel detection methods are mainly categorized into interpolation, fitting, and moment methods [8,9,10,11,12]. The interpolation method can achieve high accuracy with relatively few calculations, so this paper applies polynomial interpolation for sub-pixel edge detection. The process is as follows:

(a) For three adjacent points (x₀, y₀), (x₁, y₁) and (x₂, y₂), the quadratic polynomial interpolation function is shown in Equation (1), where f(x) represents the interpolation function.

f (x) = \frac{(x - x_{1}) (x - x_{2})}{(x_{0} - x_{1}) (x_{0} - x_{2})} \cdot f (x_{0}) + \frac{(x - x_{0}) (x - x_{2})}{(x_{1} - x_{0}) (x_{1} - x_{2})} \cdot f (x_{1}) + \frac{(x - x_{0}) (x - x_{1})}{(x_{2} - x_{0}) (x_{2} - x_{1})} \cdot f (x_{2})

(1)

(b) The interpolation function f(x) is differentiated and its derivative is set to zero to obtain the interpolation point coordinates (x, y); the horizontal and vertical coordinates are given by Equations (2) and (3).

x = x_{1} + \frac{f (x_{0}) - f (x_{2})}{f (x_{0}) - 2 f (x_{1}) + f (x_{2})}

(2)

y = y_{1} + \frac{f (x_{0}) - f (x_{2})}{f (x_{0}) - 2 f (x_{1}) + f (x_{2})}

(3)

(c) Let D₀, D₁ and D₂ be the gradient amplitudes of these three points. The point with the largest amplitude is taken as the edge point. Quadratic polynomial interpolation is then performed at the edge point to obtain the sub-pixel edge point coordinate S(x_s, y_s), as shown in Equations (4) and (5), where d is the distance between two adjacent points, and θ is the positive angle between the gradient direction and the x-axis.

x_{s} = x_{0} + \frac{D_{1} - D_{0}}{D_{1} - 2 D_{2} + D_{0}} \times \frac{d}{2} \cos (θ)

(4)

y_{s} = y_{0} + \frac{D_{1} - D_{0}}{D_{1} - 2 D_{2} + D_{0}} \times \frac{d}{2} \sin (θ)

(5)
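The refinement in steps (a) to (c) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses the standard vertex formula for a parabola through three equally spaced samples, and the names `d_prev`, `d_mid`, `d_next` (gradient amplitudes at the neighbour before, the candidate edge pixel, and the neighbour after it along the gradient direction) are an assumed index convention that may differ from the D₀, D₁, D₂ labelling above. The factor 0.5 times `spacing` plays the role of d/2 in Equations (4) and (5).

```python
import math


def parabolic_offset(d_prev, d_mid, d_next):
    """Offset of the parabola vertex from the middle sample.

    The three samples are assumed equally spaced; the result is in units
    of that spacing. Returns 0.0 when the samples are collinear, because
    the parabola then has no unique extremum.
    """
    denom = d_prev - 2.0 * d_mid + d_next
    if denom == 0.0:
        return 0.0
    return 0.5 * (d_prev - d_next) / denom


def refine_edge_point(x, y, d_prev, d_mid, d_next, theta, spacing=1.0):
    """Shift pixel (x, y) along the gradient direction theta (radians)."""
    t = parabolic_offset(d_prev, d_mid, d_next) * spacing
    return x + t * math.cos(theta), y + t * math.sin(theta)


# Samples of the parabola -(t - 0.25)^2 at t = -1, 0, 1 (synthetic values).
samples = (-1.5625, -0.0625, -0.5625)
x_s, y_s = refine_edge_point(10, 20, *samples, theta=0.0)
```

Because the synthetic samples lie exactly on a parabola whose peak is 0.25 to the right of the middle sample, the refined point is shifted by a quarter of a pixel along the x-axis.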

2.4. The Optimal Swing Trajectory Is Obtained by Using Space Geometry

As a foundational course, university table tennis classes use the number of consecutive forehand and backhand strokes as a quantitative indicator to assess students’ learning progress. A correct and efficient swing trajectory is a sufficient condition for achieving good stroke performance. The most straightforward modeling approach involves using inverse kinematics to represent all arm-related joints as functions of rotation angles in the global coordinate system. Although an analytical solution for the optimal swing trajectory can be derived in theory, the inverse kinematic equations are nonlinear and the solutions for the joint angles are nested and coupled. This leads to multiple possible solutions (there are typically eight inverse solutions if joint limits are not considered [13]), making the problem difficult to solve.

To avoid explicitly solving the coupled inverse kinematic equations, the present study adopts a spatial-geometry-based modeling approach. The human arm is simplified as a rigid multi-segment structure, and posture optimization is reformulated as an energy-minimization problem governed by geometric constraints. Under this framework, the optimal swing trajectory is derived by analyzing the total kinetic energy of the simplified rigid arm system, as detailed below.

The human arm is simplified as a single rigid body rotating about the shoulder joint O, and the racket mass is neglected relative to the effective arm mass. Let C be the center of mass (CoM) of the arm, m the total arm mass, ω the angular velocity of the arm about point O, and r_OC the position vector from O to C. The motion is approximated as a rigid rotation about O. For any point P on a rigid body, V_P = V_O + ω × r_OP. Since O is treated as the instantaneous center of rotation, V_O = 0 and thus V_P = ω × r_OP. In particular, the velocity of the center of mass is V_C = ω × r_OC. The total kinetic energy of a rigid body is defined as follows:

E_{k} = \frac{1}{2} \int {‖V (r)‖}^{2} d m

(6)

Using the standard decomposition of rigid-body motion, the velocity of any mass element can be written relative to the CoM as V = V_C + ω × r′, where r′ is the vector from the CoM to the mass element. Expanding the squared norm:

{‖V‖}^{2} = {‖V_{C}‖}^{2} + 2 V_{C} \cdot (ω \times r^{'}) + {‖ω \times r^{'}‖}^{2}

(7)

Integrating over the entire body:

E_{k} = \frac{1}{2} m {‖V_{C}‖}^{2} + V_{C} \cdot (ω \times \int r^{'} d m) + \frac{1}{2} \int {‖ω \times r^{'}‖}^{2} d m

(8)

By definition of the center of mass,

\int r^{'} d m = 0

(9)

so the cross term vanishes. The remaining rotational term can be expressed using the inertia tensor I about the CoM:

\int {‖ω \times r^{'}‖}^{2} d m = ω^{T} I ω

(10)

Therefore, the total kinetic energy becomes:

E_{k} = \frac{1}{2} m {‖V_{C}‖}^{2} + \frac{1}{2} ω^{T} I ω

(11)

where m is the arm mass, and I is the inertia tensor matrix of the arm. This expression provides a scalar objective function fully determined by geometric configuration and physical parameters.
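The decomposition in Equations (6) to (11) can be checked numerically on a small discrete body. The sketch below is only an illustration under stated assumptions: the arm is replaced by three hypothetical point masses rotating about the origin O, and the rotational term is evaluated directly as the sum over masses of ‖ω × r′‖², which is the discrete counterpart of Equation (10), rather than by building the inertia tensor explicitly.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm_sq(v):
    """Squared Euclidean norm."""
    return sum(c * c for c in v)


def kinetic_energy_direct(masses, positions, omega):
    """Discrete form of Equation (6): 0.5 * sum m_i * |omega x r_i|^2."""
    return 0.5 * sum(m * norm_sq(cross(omega, r))
                     for m, r in zip(masses, positions))


def kinetic_energy_split(masses, positions, omega):
    """Translational and rotational parts, as in Equation (11)."""
    total = sum(masses)
    com = tuple(sum(m * r[k] for m, r in zip(masses, positions)) / total
                for k in range(3))
    translational = 0.5 * total * norm_sq(cross(omega, com))
    rotational = 0.5 * sum(
        m * norm_sq(cross(omega, tuple(r[k] - com[k] for k in range(3))))
        for m, r in zip(masses, positions))
    return translational, rotational


# Hypothetical masses (kg), positions (m) and angular velocity (rad/s).
masses = [1.0, 2.0, 0.5]
positions = [(0.1, 0.0, 0.0), (0.3, 0.05, 0.0), (0.6, 0.1, 0.02)]
omega = (0.0, 0.0, 3.0)

direct = kinetic_energy_direct(masses, positions, omega)
translational, rotational = kinetic_energy_split(masses, positions, omega)
```

For any choice of masses and positions, `direct` and `translational + rotational` agree up to floating-point rounding, which is the content of the vanishing cross term in Equation (9).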

The magnitude of the center-of-mass velocity is:

‖V_{C}‖ = ‖ω \times r_{OC}‖

(12)

In the spatial geometry shown in Figure 3, define plane U as the plane passing through O and parallel to the wrist velocity direction. Let l_C = ‖r_OC‖, and let ξ denote the angle between vector OC and plane U. Assuming l_C is approximately constant under the rigid-body model, the CoM velocity becomes:

‖V_{C}‖ = ω l_{C} \cos ξ

(13)

Substituting into Equation (11):

E_{k} = \frac{1}{2} m l_{C}^{2} ω^{2} \cos^{2} ξ + \frac{1}{2} ω^{T} I ω

(14)

Since m, l_C and ω are treated as constant under the given task constraints, and the rotational term is taken to be independent of ξ, minimizing the total kinetic energy is equivalent to minimizing cos²ξ, i.e., maximizing ξ. The maximum value of ξ occurs when Plane OEW ⊥ Plane U. Therefore, the optimal swing trajectory is obtained when the arm plane is perpendicular to the velocity-related plane U. This transforms the posture evaluation problem into a spatial geometric constraint, avoiding the need to explicitly solve coupled inverse kinematics equations.
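One way to see why maximizing ξ minimizes Equation (14) is to differentiate it with respect to ξ. The step below assumes, as the text does, that m, l_C, ω and the rotational term do not depend on ξ, and that ξ, being the angle between a vector and a plane, lies between 0 and π/2.

```latex
\frac{\partial E_{k}}{\partial \xi}
  = -\, m\, l_{C}^{2}\, \omega^{2} \sin\xi \cos\xi
  = -\tfrac{1}{2}\, m\, l_{C}^{2}\, \omega^{2} \sin 2\xi \;\le\; 0,
  \qquad 0 \le \xi \le \tfrac{\pi}{2}.
```

Under these assumptions E_k is non-increasing in ξ on the whole interval, so its smallest value is reached at the largest ξ that the geometric constraints allow.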

2.5. Sensitivity Analysis

The importance of robustness against geometric perturbations has also been emphasized in image watermarking and geometric-invariant modeling literature [14], where stable spatial representations are constructed to resist geometric distortions. In contrast to signal-domain robustness, our approach ensures geometric stability through analytical energy minimization and constrained rigid-body modeling.

To assess the impact of modeling simplifications, we conducted a sensitivity analysis on the two main assumptions: (i) treating the arm as a rigid body, and (ii) neglecting the racket mass. If the racket is included, the total kinetic energy adds a term associated with the racket’s translational and rotational motion. This additional term mainly rescales the coefficient of the cos²ξ component in the energy expression, while the derived optimality condition (maximizing ξ) remains unchanged. Therefore, the simplification affects the absolute energy magnitude but does not alter the geometry-based optimal posture criterion.

To evaluate robustness to anthropometric uncertainty and non-rigid effects, we perturbed key parameters (l_C and inertia-related coefficients) within ±5% and ±10% and recomputed the posture scores for all samples. Under ±10% perturbation, the mean score change remains below 5%, and rank correlation remains ≥0.94 (Table 1). The resulting ranking of motion quality remained stable, showing high rank correlation with the original results, indicating that moderate deviations from the rigid-body assumption do not significantly affect the reliability of posture evaluation.
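The perturbation procedure described above can be sketched as follows. This is a hedged illustration only: the angles are synthetic, `toy_score` is a hypothetical stand-in for the posture score (it keeps only the l_C² cos²ξ factor of Equation (14)), and the per-sample uniform perturbation of l_C is an assumed sampling scheme. The code does not reproduce the values reported in Table 1.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(a)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


def toy_score(xi, l_c):
    """Hypothetical score: the xi-dependent factor of Equation (14)."""
    return (l_c * math.cos(xi)) ** 2


def rank_stability(score_fn, angles, l_c, rel, rng):
    """Rank correlation between baseline and perturbed scores.

    Each sample gets its own l_c, perturbed uniformly within +/- rel.
    """
    base = [score_fn(xi, l_c) for xi in angles]
    perturbed = [score_fn(xi, l_c * (1.0 + rng.uniform(-rel, rel)))
                 for xi in angles]
    return spearman(base, perturbed)


angles = [0.1 * k for k in range(1, 13)]  # synthetic xi values, radians
rho_5 = rank_stability(toy_score, angles, 0.3, 0.05, random.Random(0))
rho_10 = rank_stability(toy_score, angles, 0.3, 0.10, random.Random(0))
```

Comparing `rho_5` and `rho_10` against a chosen threshold gives the kind of rank-stability check the paragraph describes; with real posture scores, `toy_score` would be replaced by the actual scoring function.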

2.6. Mixed Feature Mining

Hybrid feature mining is one of the methods used for main control factor analysis. Common criteria for ranking feature importance include linear and nonlinear correlation coefficients, as well as factor weights obtained from ensemble tree algorithms. Different analytical methods yield varying factor weight results [15]. This paper employs a hybrid feature mining technique based on three ensemble tree models—Random Forest, Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM)—to ensure the discernibility of features’ impact on the results and the fairness of the evaluation outcomes. The expression for this method is shown in Equation (15).

\mathrm{Rank}_{final} = \frac{\mathrm{Rank}_{RF}}{| n |} \times \frac{\mathrm{Rank}_{XGB}}{| n |} \times \frac{\mathrm{Rank}_{LGBM}}{| n |}

(15)

where Rank_RF, Rank_XGB, and Rank_LGBM are the importance ranking of factors obtained by random forest, extreme gradient boosting and light gradient boosting algorithms, respectively, and n is the number of input features.

Equation (15) integrates feature importance rankings from three ensemble models using a multiplicative aggregation scheme. The multiplicative form penalizes features that receive low importance in any individual model, thereby emphasizing cross-model consensus. Since the aggregated score is proportional to \prod_{i = 1}^{3} \frac{r_{i}}{n}, any small ranking value significantly reduces the overall score, which structurally suppresses inconsistent feature importance across models. Normalization by n ensures scale consistency and prevents dominance of ranking magnitude. Compared with weighted averaging, which permits linear compensation among models, the multiplicative form introduces a nonlinear agreement constraint that favors features consistently ranked highly by all models. In cross-validation experiments, the multiplicative aggregation produced more stable top-ranked feature sets (measured by reduced variance in feature order across folds), indicating reduced sensitivity to individual model bias.
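The aggregation in Equation (15) amounts to a normalized rank product, which can be sketched as below. The sketch makes one assumption that the text only implies: rank values run from 1 to n with n assigned to the most important feature, so that a small rank value lowers the product. The feature names `f1` to `f3` and their ranks are placeholders, not results from the study.

```python
def rank_product(rankings):
    """Multiply normalized ranks across models, as in Equation (15).

    rankings maps a model name to a dict {feature: rank}, where ranks
    run from 1 to n and (by assumption) n marks the most important
    feature. Returns {feature: aggregated score in (0, 1]}.
    """
    models = list(rankings)
    features = list(rankings[models[0]])
    n = len(features)
    scores = {}
    for feature in features:
        score = 1.0
        for model in models:
            score *= rankings[model][feature] / n
        scores[feature] = score
    return scores


# Placeholder rankings for three features under the three models.
example = {
    "RF":   {"f1": 3, "f2": 2, "f3": 1},
    "XGB":  {"f1": 3, "f2": 2, "f3": 1},
    "LGBM": {"f1": 3, "f2": 1, "f3": 2},
}
scores = rank_product(example)
ordered = sorted(scores, key=scores.get, reverse=True)
```

In this toy case `f2` and `f3` receive the same rank values in a different order, so their products coincide; a feature ranked top by all three models keeps the maximum score of 1, while a single low rank from any one model pulls a feature's score down multiplicatively.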

3. Results and Discussion

This section describes in detail how the quantitative analysis method was established and how the main controlling factors of students’ performance in the table tennis cloud classroom were analyzed. All computational experiments were performed on a 16-core AMD Ryzen 9 7950X CPU running the Windows 10 operating system.

3.1. Sub-Pixel Edge Detection

3.1.1. Graying

Typically, the online video data of students stored in cloud classrooms are in color, which involves three channels. Color images contain complex information, leading to longer processing times and higher computational costs, making it difficult to meet the needs of university teachers and students. Grayscale conversion of color images transforms a three-channel image into a single-channel image, greatly improving computational efficiency. The pixel grayscale value (Gray) at the i-th row and j-th column, calculated using the weighted average method, is shown in Equation (16). Equation (16) adopts the standard luminance weighting coefficients defined in the NTSC color model, which approximate human visual perception sensitivity. The grayscale conversion results are illustrated in Figure 4.

\mathrm{Gray} (i, j) = 0.299 I_{red} + 0.587 I_{green} + 0.114 I_{blue}

(16)

where I_red, I_green and I_blue represent the pixel values of the three color channels of the corresponding pixel points of the color three-channel image.
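The weighted-average conversion can be written directly from the NTSC luma weights 0.299, 0.587 and 0.114, which sum to 1. The sketch below is a minimal pure-Python version, assuming the image is a nested list of (red, green, blue) tuples; a real pipeline would more likely use an array library.

```python
NTSC_WEIGHTS = (0.299, 0.587, 0.114)  # red, green, blue luma weights


def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of grayscale values."""
    w_r, w_g, w_b = NTSC_WEIGHTS
    return [[w_r * r + w_g * g + w_b * b for (r, g, b) in row]
            for row in rgb_image]


# A tiny 1 x 3 synthetic image: white, pure red, black.
tiny = [[(255, 255, 255), (255, 0, 0), (0, 0, 0)]]
gray = to_gray(tiny)
```

Because the weights sum to 1, a white pixel keeps its full intensity and a black pixel stays at zero, so the output remains in the same 0 to 255 range as the input channels.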

3.1.2. Binarization

Image binarization is a critical step in image processing, as it directly influences the quality of edge detection results. In the quantitative analysis of student performance in cloud classrooms, certain requirements can be imposed on the video materials uploaded by students, such as ensuring a dark background. This contrast with lighter skin tones enhances the clarity of sub-pixel edge detection. After converting the image to grayscale, the best threshold T can be selected from the grayscale histogram. Suppose the pixel value before binarization at coordinates (x, y) is g(x, y); then the new pixel value G(x, y) after binarization based on this threshold is calculated as shown in Equation (17).

G (x, y) = \begin{cases} 1, & g (x, y) \geq T \\ 0, & g (x, y) < T \end{cases}

(17)

Using Figure 4 as an example, the grayscale distribution histogram of the image after grayscale conversion is shown in Figure 5a. A suitable binarization threshold of 166 is selected, where pixels with grayscale values below 166 correspond to the background and clothing, while the remaining pixel values are close to skin tones. The binarized image result is shown in Figure 5b.
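Equation (17) and the histogram used to pick the threshold can be sketched as follows. This is a minimal illustration on a synthetic row of pixel values; the threshold of 166 is the one quoted above for Figure 4, and the nested-list image layout is an assumption.

```python
def histogram(gray, levels=256):
    """Count how many pixels fall on each integer gray level."""
    counts = [0] * levels
    for row in gray:
        for value in row:
            counts[int(value)] += 1
    return counts


def binarize(gray, threshold):
    """Equation (17): 1 where g(x, y) >= T, otherwise 0."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in gray]


# Synthetic gray values around the threshold quoted for Figure 4.
row = [[30, 165, 166, 200]]
mask = binarize(row, threshold=166)
counts = histogram(row)
```

Pixels exactly at the threshold are assigned to the foreground, matching the greater-than-or-equal condition in Equation (17).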

3.1.3. Edge Detection

After completing the grayscale and binarization preprocessing steps, edge detection is performed using the Canny operator and quadratic polynomial interpolation algorithm to identify the subject’s arm posture. The Canny operator is employed for the initial positioning of the arm’s edge contour. The Canny edge detector was implemented with a Gaussian smoothing kernel of size 5 × 5 and standard deviation σ = 1.4. The low and high hysteresis thresholds were empirically set to 0.1 and 0.3 of the maximum gradient magnitude, respectively. These values were kept constant for all image samples to ensure comparability and reproducibility. For sub-pixel refinement, quadratic polynomial interpolation was performed along the local gradient direction using three adjacent pixel points. The interpolation was applied only to candidate edge pixels identified by the Canny detector, and boundary pixels were excluded from refinement to avoid instability. Similar to boundary-aware parsing frameworks that explicitly enhance contour fidelity [16], our sub-pixel interpolation strategy aims to reduce geometric discretization errors in posture contour extraction.
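The two Canny settings quoted above, the 5 × 5 Gaussian kernel with σ = 1.4 and the hysteresis thresholds at 0.1 and 0.3 of the maximum gradient magnitude, can be made concrete with a short sketch. It only builds the smoothing kernel and the two threshold values; the remaining Canny stages (gradient computation, non-maximum suppression, edge tracking) are omitted, and the function names are illustrative rather than taken from the authors' code.

```python
import math


def gaussian_kernel(size=5, sigma=1.4):
    """Normalized size x size Gaussian kernel (size assumed odd)."""
    half = size // 2
    raw = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
            for j in range(-half, half + 1)]
           for i in range(-half, half + 1)]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


def hysteresis_thresholds(max_gradient, low_ratio=0.1, high_ratio=0.3):
    """Low and high thresholds as fractions of the maximum gradient."""
    return low_ratio * max_gradient, high_ratio * max_gradient


kernel = gaussian_kernel()
low, high = hysteresis_thresholds(max_gradient=200.0)
```

Tying both thresholds to the maximum gradient magnitude of each image, rather than to fixed absolute values, is what lets the same ratios be reused across samples with different contrast.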

The preliminary detection results obtained with the Canny operator, shown in Figure 6a, appear satisfactory. The results of sub-pixel edge detection using quadratic polynomial interpolation are displayed in Figure 6b. Compared to Figure 6a, the quadratic polynomial interpolation yields more refined edges with a greater number of pixel points.

3.2. Obtaining the Optimal Swing Trajectory

Let the position of the table tennis racket be [p_x, p_y, p_z], with the racket plane rotating around the Z-axis of the global coordinate system by an angle α and around the X-axis by an angle β. Thus, the orientation of the racket plane can be uniquely determined by [p_x, p_y, p_z, α, β]. Consequently, the optimal racket swing trajectory is transformed into an optimization problem, solving for the rotational angle γ around the normal direction vector of the racket. Assuming the simplified triangular model of the human arm in Figure 3, where the sides of triangle OEW are denoted as l₁, l₂, and l₃, spatial geometric relations such as the cosine rule can be applied to obtain Equation (18).

2 l_{2} V_{EW} \cdot V_{OW} - | V_{EW} | (l_{2}^{2} + | V_{OW} |^{2} - l_{1}^{2}) = 0

(18)

When [p_x, p_y, p_z, α, β = 0] is given, the optimal racket swing angle γ can be solved from Equation (18) using Newton’s iterative method. For a given image, the positional information extracted from the subject’s arm edge (Figure 7) is substituted into the left-hand side of Equation (18); the closer the resulting value is to zero, the closer the swing posture is to the optimum.
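The two computational pieces of this step, evaluating the left-hand side of Equation (18) and running a Newton iteration on a scalar residual, can be sketched as follows. Several assumptions are made and should be read as such: V_EW and V_OW are taken to be the vectors from E to W and from O to W, l₁ and l₂ are taken to be the lengths |OE| and |EW|, and the derivative is approximated by central differences. The text does not specify how the elbow position depends on γ, so the Newton routine is demonstrated on a toy residual, not on the actual swing-angle problem.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def eq18_residual(v_ew, v_ow, l1, l2):
    """Left-hand side of Equation (18); zero for a consistent triangle."""
    return (2.0 * l2 * dot(v_ew, v_ow)
            - norm(v_ew) * (l2 ** 2 + norm(v_ow) ** 2 - l1 ** 2))


def newton_solve(g, x0, tol=1e-10, max_iter=50, h=1e-6):
    """Newton iteration for g(x) = 0 with a central-difference slope."""
    x = x0
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx) < tol:
            return x
        slope = (g(x + h) - g(x - h)) / (2.0 * h)
        if slope == 0.0:
            raise ZeroDivisionError("zero slope; try another start value")
        x -= gx / slope
    return x


# Right triangle O=(0,0,0), E=(3,0,0), W=(3,4,0): |OE| = 3, |EW| = 4.
residual = eq18_residual((0.0, 4.0, 0.0), (3.0, 4.0, 0.0), l1=3.0, l2=4.0)

# Toy residual with a known root at pi/3 (hypothetical, for illustration).
gamma = newton_solve(lambda x: math.cos(x) - 0.5, x0=1.0)
```

For measured arm positions the residual will generally be non-zero, and its magnitude can serve as the closeness-to-optimum measure described above; a start value near the expected angle matters, since Newton's method converges only locally.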

3.3. Analysis of Main Controlling Factors of Sports Performance

The images of 170 students were evaluated for their table tennis swing posture according to the methods established in Section 3.1 and Section 3.2. Combining the results of physical fitness tests and self-assessment questionnaires, a dataset for analyzing the key factors influencing table tennis performance was created. The statistical indicators of this dataset are shown in Table 2. Using the overall score for the table tennis class as the dependent variable—where this score is provided by professional table tennis course instructors based on factors such as stance, swing posture, and the number and quality of consecutive forehand and backhand strokes—all other indicators serve as independent variables. Attribution analysis is conducted using three tree ensemble algorithms. Unlike artificial intelligence algorithms such as neural networks, tree ensemble models are insensitive to the scale of variables since numerical scaling does not affect the position of split points [17], meaning there is no need for data standardization before training.

Figure 8 shows the feature importance ranking results from three tree ensemble algorithms based on supervised machine learning models. In the Random Forest algorithm, the Gini coefficient is used to determine the significance of features. XGBoost and LightGBM rank the contribution of each feature to the prediction based on information gain. The differing ranking criteria of the three tree ensemble algorithms result in some variation in the outcomes. Therefore, a more comprehensive feature importance ranking is obtained by integrating the rankings through a mixed feature mining method, as shown in Table 3.

Figure 8 shows that the two most important factors in the feature importance rankings of the three tree ensemble algorithms are swing posture score and weekly exercise time. In the XGBoost algorithm, weekly exercise time even surpasses the swing posture score in importance. According to the research results in Table 2, the sit-and-reach score, a measure of body flexibility, ranks fourth in its impact on the overall table tennis course score, while factors related to strength, such as explosive power, rank the lowest. This aligns with practical understanding that in table tennis, only part of the ball’s kinetic energy comes from the player’s absolute strength, while factors like the force generation technique (including striking and friction), the incoming ball’s kinetic energy, the racket rubber’s feedback, ball contact time, and the underlying force from the racket also play roles. Moreover, better body flexibility allows for more coordinated and accurate movements. The results obtained from the mixed feature mining method also highlight sleep time and sedentary time, offering a novel perspective for university instructors to optimize physical education. Students with adequate sleep tend to have clearer minds and quicker reactions, often performing better than sleep-deprived peers. In table tennis, the waist plays a crucial role in coordinating arm movements, footwork, and balance adjustment. Proper waist movements ensure coordination and effective force application. Prolonged sitting can easily cause lumbar strain, and over time, reduce the elasticity of intervertebral discs. Therefore, table tennis instruction should emphasize waist-strengthening exercises and education on lumbar protection.

3.4. Comparison with Existing Motion Evaluation Models

3.4.1. Mathematical Comparison with Existing Motion Evaluation Models

Existing motion evaluation frameworks can be broadly categorized into data-driven nonlinear embedding models and statistical aggregation models. In nonlinear dimensionality reduction and clustering frameworks (e.g., NDRC) [7], the evaluation process can be formally expressed as a composition of mappings:

x \in \mathbb{R}^{d} \xrightarrow{\Phi} z \in \mathbb{R}^{k} \xrightarrow{C} S,

where Φ denotes nonlinear embedding (e.g., t-SNE) and C represents clustering-based scoring. The optimization objective in such models minimizes distribution divergence (e.g., KL divergence) in the embedded space rather than a task-specific mechanical quantity. Therefore, the final score emerges from statistical structure preservation rather than geometric optimality.
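For reference, when the embedding is t-SNE, the divergence being minimized takes the standard form below. The symbols p_{ij} and q_{ij} (pairwise similarities in the original and embedded spaces) are notation introduced here, and whether the NDRC framework of [7] uses exactly this objective is an assumption.

```latex
D_{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

No term in this expression refers to limb geometry or energy, which is the sense in which the resulting score is statistical rather than mechanical.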

In full-data statistical evaluation frameworks [6], the score is typically defined as follows:

S(x) = \sum_{i=1}^{d} w_{i} x_{i}

or obtained through regression fitting. In this case, parameters w_i are determined empirically, and the objective is to minimize fitting error relative to expert labels. While such models may achieve high predictive consistency, the resulting score lacks intrinsic physical interpretation.
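One common way to write such a fit, given here only as a sketch, is ordinary least squares over N labelled samples. The sample index n, the count N, and the expert label y^{(n)} are notation assumed for illustration; the cited framework [6] may use a different fitting procedure.

```latex
w^{*} = \arg\min_{w} \sum_{n=1}^{N} \left( \sum_{i=1}^{d} w_{i} x_{i}^{(n)} - y^{(n)} \right)^{2}
```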

In contrast, the present study formulates motion evaluation as a constrained energy minimization problem. Under rigid-body assumptions, the total kinetic energy E_k is defined in Equation (14). The optimal configuration is obtained by solving

\min_{\xi} E_{k}(\xi),

which reduces to minimizing cos²ξ under geometric constraints. Therefore, the evaluation score is directly related to deviation from a mechanically optimal state, rather than derived from statistical embedding or empirical weighting.
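The reduction to minimizing cos²ξ can be illustrated with a small Newton iteration on the stationarity condition. This is a toy sketch only: it drops the geometric constraints and the full energy expression of Equation (14), and the function name and starting angle are hypothetical.

```python
import math


def newton_minimize(d1, d2, x0, tol=1e-12, max_iter=50):
    """Newton's method applied to the stationarity condition d1(x) = 0."""
    x = x0
    for _ in range(max_iter):
        step = d1(x) / d2(x)
        x -= step
        if abs(step) < tol:
            break
    return x


# Toy objective f(xi) = cos^2(xi); the paper's constraints are omitted here.
def first_derivative(xi):
    return -math.sin(2 * xi)       # f'(xi)


def second_derivative(xi):
    return -2 * math.cos(2 * xi)   # f''(xi), positive near pi/2


xi_opt = newton_minimize(first_derivative, second_derivative, x0=1.3)
print(xi_opt, math.cos(xi_opt) ** 2)
```

Starting from 1.3 rad, the iterate settles at ξ = π/2, where cos²ξ vanishes and the second derivative is positive, so the stationary point is a minimum of the toy objective.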

The three paradigms are summarized in Table 4.

Although clustering-based models report strong correlation with expert scoring [7], the proposed framework provides comparable empirical consistency while relying on a lower-dimensional and physically interpretable objective function. From a mathematical standpoint, the key distinction lies in the nature of the optimization target: distribution divergence, regression error, or mechanical energy.

3.4.2. Baseline Experiments and Quantitative Comparison

To further verify the effectiveness of the proposed framework, comparative experiments were conducted against representative baseline methods, including a pixel-level geometry-driven variant (G0), end-to-end regression models (Ridge and two-layer MLP), and a tree-based ensemble model (XGBoost). The results are summarized in Table 5.

The pixel-level geometry model (G0), which does not employ sub-pixel refinement, achieved a Pearson’s correlation coefficient of 0.79 with expert evaluation, demonstrating that geometric modeling alone provides reasonable consistency. After introducing sub-pixel contour refinement and energy-based optimization, the proposed method improved the correlation to 0.91 and reduced the RMSE from 5.6 to 3.9, indicating that enhanced geometric precision substantially improves evaluation accuracy.
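The two agreement metrics reported in Table 5 can be computed as in the following minimal sketch. The score lists are hypothetical placeholders, not data from the study, and the function names are chosen here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical expert and model scores on a 0-100 scale.
expert_scores = [70.0, 80.0, 90.0, 85.0]
model_scores = [72.0, 78.0, 91.0, 83.0]

r = pearson_r(model_scores, expert_scores)
error = rmse(model_scores, expert_scores)
print(round(r, 3), round(error, 3))
```

Pearson's r measures how well the model preserves the relative ordering and spacing of expert scores, while RMSE measures the absolute size of the disagreement, so the two are reported together.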

Compared with end-to-end regression baselines, Ridge regression achieved a correlation of 0.81, while the two-layer MLP reached 0.86. Although these models provide competitive predictive performance, their interpretability remains limited due to the absence of an explicit physical objective function. The XGBoost model achieved a correlation of 0.89 with RMSE 4.4, approaching the proposed method in accuracy. However, its interpretability depends on post hoc explanation techniques such as SHAP, whereas the proposed approach derives evaluation criteria directly from a mechanically meaningful energy minimization formulation.

In terms of computational efficiency, the proposed sub-pixel geometry method requires approximately 18 ms per image, compared with 11 ms for the pixel-level variant, and remains within real-time processing constraints. Overall, the results indicate that the proposed framework achieves higher accuracy than the pixel-level geometry baseline and performance comparable to advanced machine learning models, while maintaining stronger physical interpretability.
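Read as throughput, and assuming images are processed one at a time with no other per-frame overhead (an assumption, not a reported measurement), the timings in Table 5 correspond to:

```latex
\frac{1000\ \text{ms/s}}{18\ \text{ms/img}} \approx 55.6\ \text{img/s},
\qquad
\frac{1000\ \text{ms/s}}{11\ \text{ms/img}} \approx 90.9\ \text{img/s}
```

Both rates exceed common video frame rates of 25 to 30 frames per second.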

3.5. Generalizability, Limitations and Future Work

3.5.1. Generalizability

Although the present study focuses on table tennis forehand motion, the proposed framework is structurally generalizable. The modeling pipeline consists of three modular components: (i) sub-pixel geometric contour extraction, (ii) rigid-body energy functional formulation, and (iii) constrained optimization for posture evaluation. Among these, the contour extraction and optimization procedures are independent of the specific sport.

For different motion scenarios, the rigid-body configuration and constraint equations can be redefined according to the biomechanical structure of the task. For example, in tennis serve analysis, the arm–racket system can be modeled as a two-segment rigid chain with an extended inertia term, while in golf swing analysis, the energy functional may incorporate multi-joint rotational coupling. The core evaluation principle—deriving motion quality from deviation relative to a mechanically optimal configuration—remains unchanged.

Therefore, the framework is adaptable to other motion analysis tasks through redefinition of geometric constraints and inertia parameters, without altering the underlying optimization structure.
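The modular structure described above can be sketched as a small container in which only the energy functional is swapped per sport. Every name below is hypothetical, the contour extractor is a stub, and a grid search stands in for the Newton solver used in the paper.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Contour = Sequence[Tuple[float, float]]


@dataclass
class MotionEvaluator:
    extract_contour: Callable[[object], Contour]            # (i) sport-independent
    energy: Callable[[float, Contour], float]               # (ii) sport-specific
    minimize: Callable[[Callable[[float], float]], float]   # (iii) sport-independent

    def optimal_angle(self, image: object) -> float:
        """Run the three stages and return the energy-minimizing angle."""
        contour = self.extract_contour(image)
        return self.minimize(lambda xi: self.energy(xi, contour))


def stub_contour(image):
    """Placeholder for sub-pixel contour extraction."""
    return [(0.0, 0.0), (1.0, 1.0)]


def grid_minimize(objective, steps=1000):
    """Coarse search over [0, pi]; a stand-in for the paper's Newton iteration."""
    candidates = [i * math.pi / steps for i in range(steps + 1)]
    return min(candidates, key=objective)


def toy_energy(xi, contour):
    """Toy functional proportional to cos^2(xi); ignores the contour."""
    return math.cos(xi) ** 2


evaluator = MotionEvaluator(stub_contour, toy_energy, grid_minimize)
best_angle = evaluator.optimal_angle(image=None)
print(best_angle)
```

Adapting the sketch to another motion would mean passing a different `energy` callable while leaving the other two components untouched.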

3.5.2. Limitations and Future Work

While this study proposes a quantitative evaluation framework based on posture recognition and feature attribution, it primarily focuses on short-term assessments within a single academic semester. Long-term tracking of students’ table tennis skill development and physical fitness progression over multiple semesters is not included. Future research should incorporate longitudinal follow-up studies to investigate whether the AI-driven assessments can predict and positively influence sustained improvements in performance. Moreover, the integration of time-series modeling and cumulative behavioral metrics (e.g., weekly training logs, injury records, skill retention) could offer a more comprehensive view of students’ long-term learning outcomes.

4. Conclusions

This study designed a quantitative evaluation system for table tennis cloud classroom performance, based on spatial geometry, sub-pixel edge detection, and mixed feature mining. The system includes two main components: quantitative evaluation of swing posture and quantitative analysis of key factors. The swing posture evaluation can utilize images uploaded by students for quantitative assessment, largely replacing expert evaluations, which reduces costs while providing better differentiation and a more accurate reflection of students’ actual performance. Based on the data from swing posture evaluation and self-assessment questionnaires, an attribution analysis revealed that swing posture, physical fitness, and flexibility are the primary factors affecting university students’ table tennis performance, while lifestyle habits such as sleep time and sedentary time are secondary factors. Overall, the quantitative evaluation system for table tennis cloud classroom performance proposed in this paper improves the efficiency and granularity of assessments in table tennis education, facilitating personalized and targeted teaching.

In the future, extending this framework to support longitudinal assessments across multiple semesters will be essential for understanding and enhancing long-term student development in physical education.

Author Contributions

Conceptualization, T.L. and K.S.; methodology, K.S.; software, K.S.; validation, T.L., K.S. and L.Q.; formal analysis, K.S.; investigation, T.L.; resources, T.L.; data curation, T.L.; writing—original draft preparation, T.L. and K.S.; writing—review and editing, T.L. and K.S.; visualization, K.S. and L.Q.; supervision, T.L.; project administration, T.L.; funding acquisition, T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research and the APC were funded by China University of Petroleum (Beijing) Research Start-up Fund Project (Grant No. 2462022YXZZ019) and University-level Project of China University of Petroleum (Beijing) (Grant No. BK2022156).

Data Availability Statement

The data presented in this study are openly available in Science Data Bank at https://doi.org/10.57760/sciencedb.19573.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. St Amant, N. Concussion reporting and racial stereotypes: ESPN’s role in shaping public perception about athletes of color. Sociol. Sport J. 2024, 42, 199–208.
2. Mannuru, N.R.; Shahriar, S.; Teel, Z.A.; Wang, T.; Lund, B.D.; Tijani, S.; Pohboon, C.O.; Agbaji, D.; Alhassan, J.; Galley, J.; et al. Artificial intelligence in developing countries: The impact of generative artificial intelligence (AI) technologies for development. Inf. Dev. 2023, 41, 1036–1054.
3. Wei, X.-S.; Song, Y.-Z.; Aodham, O.M.; Wu, J.; Peng, Y.; Tang, J. Fine-grained image analysis with deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 8927–8948.
4. Hextrum, K. The National Collegiate Athletic Association and the exploitation of college profit-athletes: An amateurism that never was. Sociol. Sport J. 2024, 41, 322–323.
5. Tang, R.; Li, X.; He, W. Multivariate regression modelling of women artistic gymnastics handspring vaulting kinematic performance and judges scores. China Sport Sci. Technol. 2019, 55, 17–23. (In Chinese)
6. Chen, Z.X.; Lv, Y.J.; Zhao, D.F.; Han, D. Competitive performance of volleyball players in different roles based on the technical statistics of ‘all data mode’. J. Shandong Sport Univ. 2021, 37, 102–110. (In Chinese)
7. Guo, H.; Xu, Y.L.; Zou, S.C.; Zhang, H.X.; Wang, J. A quantitative study on students’ motion performance in online dance course based on non-linear dimensionality reduction and clustering. China Sport Sci. 2022, 42, 80–87. (In Chinese)
8. Hermosilla, T.; Bermejo, E.; Balaguer, A.; Ruiz, L.A. Non-linear fourth-order image interpolation for sub-pixel edge detection and localization. Image Vis. Comput. 2008, 26, 1240–1248.
9. Da, F.P.; Zhang, H. Sub-pixel edge detection based on an improved moment. Image Vis. Comput. 2010, 28, 1645–1658.
10. Chai, X.Y.; Lin, X.Y.; Chen, H.T.; Wei, Q.Y.; Yu, Y.J. Zernike polynomials fitting of arbitrary shape wavefront. In Proceedings of the International Conference on Optical and Photonic Engineering (icOPEN 2023), Singapore, 27 November–1 December 2023.
11. Wang, H.; Xu, X.; Liu, Y.; Lu, D.; Liang, B.; Tang, Y. Real-Time Defect Detection for Metal Components: A Fusion of Enhanced Canny–Devernay and YOLOv6 Algorithms. Appl. Sci. 2023, 13, 6898.
12. Raghavan, D.; Todorcevic, S. Cofinal types of ultrafilters. Ann. Pure Appl. Logic 2012, 163, 185–199.
13. Wang, Y.; Sun, L.; Liu, J. Optimal pose solution based on space geometry method for pingpong robot. Robot 2014, 36, 203–209. (In Chinese)
14. Wang, C.; Zhang, Q.; Wang, X.; Zhou, L.; Li, Q.; Xia, Z.; Ma, B.; Shi, Y. Light-field image multiple reversible robust watermarking against geometric attacks. IEEE Trans. Dependable Secur. Comput. 2025, 22, 5861–5875.
15. Sheng, K.; Jiang, G.; Du, M.; He, Y.; Dong, T.; Yang, L. Interpretable knowledge-guided framework for modeling reservoir water-sensitivity damage based on Light Gradient Boosting Machine using Bayesian optimization and hybrid feature mining. Eng. Appl. Artif. Intell. 2024, 133, 108511.
16. Liu, Y.; Wang, C.; Lu, M.; Yang, J.; Gui, J.; Zhang, S. From simple to complex scenes: Learning robust feature representations for accurate human parsing. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5449–5462.
17. Talib, M.A.; Abdallah, M.; Abdeljaber, M.; Waraga, O.A. Influence of exogenous factors on water demand forecasting models during the COVID-19 period. Eng. Appl. Artif. Intell. 2024, 117, 105617.

Figure 1. The overall structure of this study.

Figure 2. Data examples: (a) forehand attack; (b) backhand fast break; (c) forehand flat shot; (d) backhand flat batting.

Figure 3. Simplified human arm configuration.

Figure 4. Image graying: (a) before; (b) after.

Figure 5. Image binarization: (a) gray distribution histogram; (b) binarization results (gray threshold 166). The red star in (a) represents the selected binary threshold of 166.

Figure 6. Edge detection results: (a) Canny operator edge positioning; (b) quadratic polynomial interpolation sub-pixel edge detection.

Figure 7. Human arm motion trajectory.

Figure 8. Feature importance obtained by the three tree ensemble algorithms.

Table 1. Results of sensitivity analysis.

Perturbation	Score Change (Mean ± Std)	Spearman ρ (Ranking)
l_C ± 5%	2.13%	0.97
l_C ± 10%	4.41%	0.94
I ± 10%	3.67%	0.95

Table 2. Main controlling factor analysis data set.

Features	Minimum	Maximum	Average
Swing posture score	36.33	99.74	82.31
Exercise time per week, h	0	12.00	3.40
Vital capacity score	60.48	100.00	85.10
50 m running score	66.34	97.69	77.96
Sitting forward flexion	68.03	100.00	72.96
Sleep time, h	5.60	10.50	7.80
Height and weight score	58.15	100.00	90.84
Sedentary time, h	0.5	10.00	3.47
Table tennis class comprehensive score	49	97	81.71

Table 3. Mixed feature mining ranking results.

Features	Mixed Feature Mining Score	Importance Ranking
Swing posture score	0.0039	1
Exercise time per week, h	0.0078	2
Sitting forward flexion	0.0879	3
Sleep time, h	0.125	4
Sedentary time, h	0.1465	5
Height and weight score	0.4219	6
50 m running score	0.7656	7
Vital capacity score	0.8750	8

Table 4. Structural comparison of objective functions.

Paradigm	Optimization Target	Nature of Objective	Interpretability
NDRC	$\min_{z} D_{KL}(P \,\|\, Q)$	Distribution divergence	Statistical
Statistical aggregation	$\|S(x) - y\|^{2}$	Fitting error	Empirical
Proposed	$E_{k}(\xi)$	Physical energy	Geometric-mechanical

Table 5. Results of baseline experiments.

Method	Type	Input	Pearson’s r	RMSE	Time (ms/img)	Interpretability
G0 Pixel-level geometry	geometry	image	0.79	5.6	11	medium
Proposed Sub-pixel geometry	geometry	image	0.91	3.9	18	high
Ridge	end-to-end	tabular	0.81	5.1	-	low
MLP (2-layer)	end-to-end	tabular	0.86	4.8	-	low
XGB	ML	tabular	0.89	4.4	-	medium (SHAP)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lv, T.; Sheng, K.; Qiao, L. A Geometry-Driven Quantitative Modeling Framework for Image-Based Human Motion Evaluation: Application to Sub-Pixel Posture Analysis and Feature Attribution. Mathematics 2026, 14, 746. https://doi.org/10.3390/math14050746

