1. Introduction
Quadruped robots are widely used in applications such as factory inspection, emergency rescue, and warehouse transportation due to their excellent locomotion capabilities [
1]. Most current quadruped robots adopt legged structures. On rough terrains, their superior off-road capability and terrain adaptability significantly enhance mobility [
2]. However, such mechanisms suffer from low energy efficiency and complex gait control when operating on flat surfaces. In contrast, wheeled robots, though lacking flexibility in unstructured environments, offer higher speed and better energy efficiency compared to legged robots [
3]. Wheel-legged robots combine the advantages of both: using wheels for fast locomotion on relatively flat ground, and switching to legged motion to overcome obstacles such as steps and pits. They are typically composed of actuated wheels, extendable legs, sensor systems, and dual-mode control architectures, which enable flexible transitions between wheeled and legged locomotion based on varying environmental conditions [
4]. Such transitions can be carried out autonomously as terrain conditions change [
5,
6]. Wheel-legged robots are equipped with dual independent control systems and combined wheel–leg structures. However, they still face several core technical challenges, particularly in achieving smooth and robust transitions between locomotion modes. One major difficulty lies in the accurate modeling of wheel–leg–terrain interaction, including ground contact forces, dynamic friction estimation, and foot-end impact control. These factors are vital not only for maintaining balance and minimizing slip but also for enabling predictive and adaptive motion planning [
7]. Recent advances have addressed some of these issues. For example, Chen et al. [
8] proposed a sliding mode control framework for foot-end trajectory consensus under variable topological constraints. Wei et al. [
9] developed a Kalman-filter-based adaptive contact force estimator for robotic systems with dynamic model imperfections, with potential extensions to mobile hybrid robots. Liu et al. [
10] introduced a robust model predictive control (RMPC) framework for wheel-legged hybrid vehicles, incorporating frictional uncertainties and contact force disturbances. Liu and Zhang [
7] focused on impact dynamics modeling for hybrid motion planning. Notably, a growing body of literature has modeled foot–ground interaction more accurately using dissipative and nonlinear contact models, originally developed for bipedal systems but increasingly applicable to wheel-legged robots. Corral et al. [
11] studied passive biped walking dynamics by comparing various contact and friction models, identifying the dissipative nonlinear Flores model and Bengisu friction law as best suited for realistic, smooth contact modeling. These insights are valuable for improving foot-end impact control in hybrid robots. Additionally, Moreno [
12] discusses a multibody modeling approach for systems with multiple, simultaneous contacts and impacts, such as those in billiard-ball-like interactions. The authors evaluate smooth contact force models, including those that account for friction and energy dissipation, offering further insights into selecting efficient and physically plausible impact models. This modeling framework can inform wheel-legged robot design, particularly in handling complex terrain interactions with multiple points of contact. Abad et al. [
13] proposed a quasi-static model and trajectory optimization method that allows UGVs to optimize the configuration angles of their wheel-legs under different terrain conditions, thereby achieving a minimum-torque strategy to save energy. They also analyzed the relationship between ground–wheel contact friction and normal forces. These studies emphasize the importance of contact intelligence, compliant control, and integrated sensing. In recent years, integrating multi-sensor perception and control strategies has emerged as a critical approach to improving robotic adaptability and stability on complex terrains. For instance, Tan et al. [
14] proposed a locomotion framework that combines visual and proprioceptive feedback, using a single forward-facing camera to generate a heightmap while regulating gait frequency and speed through proprioceptive cues, enabling stable and natural transitions across uneven terrain. In contrast, Wang et al. [
15] demonstrated that terrain attitude estimation and contact state detection can be effectively achieved using only proprioceptive inputs—such as joint torque, IMU, and kinematics—without relying on any visual sensors, enhancing the robustness of quadruped robots under challenging ground conditions. These findings collectively underscore that accurate foot-end contact modeling and intelligent control are fundamental to improving locomotion stability, energy efficiency, and terrain adaptability in wheel-legged robots.
In previous studies on foot-end perception of quadruped robots, researchers have largely relied on external sensors such as vision cameras, LiDAR, inertial measurement units (IMUs), and joint encoders to enhance environmental adaptability and decision-making capabilities [
16]. However, wheel-legged robots often operate in highly unstructured or visually degraded environments, where these external sensors are prone to occlusion, lighting issues, or signal loss [
17]. In such scenarios, contact measurement provides a robust and direct perception mechanism by capturing the physical interaction between the robot and terrain in real time. Notably, recent studies have highlighted the benefits of contact force sensing in locomotion optimization. For instance, Pepe et al. [
18] proposed a genetic-algorithm-based method that directly optimizes the ground reaction forces of quadruped robots to identify energetically efficient gaits. Their results demonstrate that knowledge of contact forces can not only improve energy efficiency but also facilitate the emergence of biologically plausible gaits such as walking and trotting under varying speed conditions. Integrating such force-based strategies into wheel-legged robots offers promising avenues for enhancing motion adaptability, especially when traditional exteroceptive sensing becomes unreliable. In contrast, early studies tended to estimate the contact state between the robot and the environment indirectly by combining proprioceptive measurements (such as joint angular velocity and center-of-mass velocity) with various filtering algorithms (e.g., Kalman filters) [
19]. However, since proprioceptive sensors cannot directly perceive external environmental information, the accuracy of such methods is limited under extreme conditions. In recent years, contact-based sensors specifically designed for quadruped robots have gradually attracted attention. Owing to the distinctive mechanical design of wheel-legged robots, their foot-end typically comprises a continuously rotating tire rather than a rigid, flat-foot structure as employed in conventional legged robots. This tire-based configuration exhibits substantial deformation and complex ground-contact dynamics under varying load conditions, rendering it unsuitable to be modeled as a simple rigid body in force analysis. Furthermore, the curved geometry and rotational motion of the tire pose considerable challenges for the direct integration of conventional foot-end sensors—such as force, pressure, or tactile sensors—commonly adopted in quadruped robotic systems. Consequently, wheel-legged platforms necessitate the development of specialized sensing methodologies capable of accommodating the unique kinematic and structural characteristics of their foot-end components [
20,
21]. To address this challenge, recent research has focused on leveraging smart tire technologies, which utilize sensors integrated into the tire to measure deformation or strain, thereby enabling more accurate estimation of tire–ground contact forces [
22]. Currently, smart tire systems commonly employ triaxial accelerometers [
23], optical sensors [
24], strain gauges [
25], and polyvinylidene fluoride (PVDF) piezoelectric film sensors [
26]. Among these, PVDF sensors have been widely adopted in tire applications due to their flexibility, low cost, and high sensitivity. Mechanical deformation applied to the PVDF film generates a voltage difference between its surfaces. Yi et al. [
26] proposed a PVDF sensor for measuring stress on the inner liner of a tire, with the stress measurements interpreted using a friction force model. Their results demonstrated the feasibility of PVDF-based tread deformation sensing. Similar PVDF-based tire sensor systems have also been reported in the studies by Armstrong [
27] and Toplar [
28].
In estimating the tire state at the foot-end of wheel-legged robots using smart sensing technologies, a core challenge lies in establishing an effective mapping between sensor signals and tire states. The three-dimensional contact force and contact position are key parameters for contact state estimation [
29]. If such information can be directly measured or indirectly inferred through smart tire technologies, it would significantly reduce reliance on external sensors, simplify the state estimation process, and enhance both accuracy and system robustness [
30]. However, tires are highly complex nonlinear systems characterized by material, contact, and geometric nonlinearities, making them difficult to model accurately using traditional mathematical approaches [
31]. Model-based methods rely on physical models to characterize tire–road interactions. For instance, Hong et al. [
32] proposed a brush-model-based algorithm to estimate the tire–road friction coefficient by relating lateral deflection to lateral force and aligning torque. Although such models offer physical interpretability, they often require simplified formulations and impose high computational demands for real-time applications. Feature-based methods extract key indicators from sensor measurements and use prior knowledge for state identification, making them the most widely used estimation approach []. For example, Niskanen and Tuononen [
33] identified different phases of the tire contact patch using three characteristic peaks in acceleration signals. Morinaga et al. [
34] estimated lateral force based on the ratio of contact lengths. However, these methods are highly sensitive to environmental factors such as temperature, tire pressure, and wear, requiring frequent recalibration of thresholds and filter parameters, which limits their robustness. In contrast, machine learning approaches are particularly well-suited for handling complex nonlinear problems. To overcome the modeling challenges posed by nonlinear tire behavior, neural network models such as multilayer perceptrons (MLPs) or convolutional neural networks (CNNs) can be employed. These models take raw sensor data—such as acceleration and strain signals—as input, and output key contact parameters like three-dimensional contact force or foot-end position, enabling more robust and accurate estimation without relying on manual feature extraction [
35,
36]. Gaussian process regression (GPR), a non-parametric method based on Bayesian theory, is capable of modeling high-dimensional inputs and nonlinear responses while simultaneously providing uncertainty estimates. As a result, GPR shows great promise in tire state estimation and its downstream applications, such as model predictive control (MPC) [
37].
In previous research, Wang [
20] designed a heuristic contact position estimator for wheel-legged robots by analyzing the deformation characteristics of the foot-end under loading through finite element analysis and employing an array of strain sensors to detect both the contact position and the normal force, thereby validating the feasibility of the approach. However, this method did not address the estimation of tangential forces, the adaptability to different tire types, or the systematic influence of sensor placement on estimation performance. In fact, the choice of sensor placement is critical to the smart tire’s ability to capture dynamic signal features. Strategically positioning sensors in areas with higher responsiveness can significantly enhance the accuracy and robustness of foot-end state estimation. Therefore, identifying the optimal mounting positions for PVDF sensors holds substantial research and application value [
38]. Global sensitivity analysis (GSA) is an effective tool for quantifying the influence of input uncertainty on output responses and is widely used in engineering design optimization [
39]. Among various GSA techniques, Sobol’s variance-based sensitivity analysis method [
40] is the most commonly applied. However, its high computational cost has limited its practical application in engineering. To address this, Zhao [
41] proposed a method based on multiplicative dimension reduction and Gaussian quadrature grids, which significantly reduces computational demand. This study adopts Zhao’s approach to perform global sensitivity analysis on the sensor placement for wheel-legged robot foot-ends, aiming to determine the optimal configuration [
41].
Building upon the contributions of previous research in both UGV and wheel-legged robot domains, and recognizing the limitations that remain—particularly in accurate terrain contact modeling and real-time adaptability—we aim to further enhance the adaptability of wheel-legged robots by optimizing the application of PVDF strain sensors for foot-end contact detection. This paper is structured as follows:
Section 2 introduces the finite element simulation modeling process of smart tires and applies global sensitivity analysis to determine optimal sensor placement;
Section 3 presents the experimental platform setup used to verify the reliability of contact position estimation;
Section 4 describes physical testing under varying gait frequencies to assess the robustness of existing models in complex environments; the conclusions are provided in
Section 5.
4. Estimation and Validation of Foot-End Contact Characteristics
4.1. Contact Feature Data Processing
Based on the dataset collected in the previous chapter, preprocessing of the sensor signals is required before feeding them into the model for training. This step ensures that the data quality meets the requirements for model training and helps improve prediction accuracy and convergence speed. Under highly dynamic and unstable data acquisition conditions, calibration is performed by setting the initial reading of each sensor as a reference baseline. Specifically, given an initial reference value $x_0$, each subsequent measurement $x_i$ is calibrated using the following equation:
$$\hat{x}_i = -1 \times (x_i - x_0)$$
Here, $\hat{x}_i$ represents the calibrated output value, and $x_i$ denotes the original measurement. The multiplication by −1 serves as a sign inversion to adjust the sensor’s response direction, ensuring that the measured value accurately reflects the physical quantity and eliminates initial bias. To further smooth the raw data $x_i$, a moving average filter is applied as follows:
$$y_i = \frac{1}{N} \sum_{j=0}^{N-1} x_{i-j}$$
Here, $y_i$ represents the smoothed data, $N$ is the window size, and $x_i$ is the raw data. By replacing the current value with the average of $N$ surrounding points, high-frequency noise can be effectively filtered out, thereby enhancing data smoothness. In addition, the $z$-score method is used to detect outliers. The corresponding formula is given as follows:
$$z = \frac{x - \mu}{\sigma}$$
Here, $x$ denotes a data point, $\mu$ is the mean, and $\sigma$ is the standard deviation. If $|z| > z_{\mathrm{th}}$ (where $z_{\mathrm{th}}$ is a predefined threshold), the corresponding value $x$ is considered an outlier and removed to prevent interference with subsequent model training. The results are shown in Figure 7. As illustrated by the figure and the box plot, the processed dataset is noticeably smoother, and the number of outliers is significantly reduced.
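The three preprocessing steps described above (baseline calibration with sign inversion, moving-average smoothing, and z-score outlier removal) can be illustrated with a minimal Python sketch. The function names, the trailing-window convention, the use of the population standard deviation, and the sample values are assumptions made for illustration and are not taken from the study's implementation.

```python
from statistics import mean, pstdev


def calibrate(readings):
    """Subtract the first reading (baseline) and invert the sign."""
    baseline = readings[0]
    return [-1.0 * (r - baseline) for r in readings]


def moving_average(values, window=3):
    """Trailing moving average (assumed convention).

    The first samples are averaged over the points available so far.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def remove_outliers(values, threshold=3.0):
    """Drop samples whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs((v - mu) / sigma) <= threshold]


# Hypothetical raw sensor readings, processed in the order given in the text.
raw = [2.0, 1.9, 1.7, 1.8, 1.6, 1.5, 1.4, 1.3]
calibrated = calibrate(raw)
smoothed = moving_average(calibrated, window=3)
cleaned = remove_outliers(smoothed, threshold=3.0)
```

In practice the window size and the z-score threshold would be tuned to the sensor's sampling rate and noise level.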
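Subsequently, the dataset $X$ is standardized and normalized to ensure consistency in the scale of all features. The standardized value $X_{\mathrm{std}}$ and the normalized value $X_{\mathrm{norm}}$ are computed as follows:
$$X_{\mathrm{std}} = \frac{X - \mu}{\sigma}, \qquad X_{\mathrm{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
Here, $\mu$ represents the mean, $\sigma$ is the standard deviation, and $X_{\min}$ and $X_{\max}$ denote the minimum and maximum values of the dataset, respectively.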
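As a small illustration of the two scaling operations, the sketch below implements standardization and min-max normalization for a single feature. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator is used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return (x - mean) / std for each sample of one feature."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("constant feature cannot be standardized")
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale one feature linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant feature cannot be normalized")
    return [(v - lo) / (hi - lo) for v in values]


feature = [2.0, 4.0, 6.0]  # hypothetical feature column
standardized = standardize(feature)
normalized = min_max_normalize(feature)
```

The mean, standard deviation, minimum, and maximum should be computed on the training data only and reused for validation and test data, so that no information leaks across the split.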
Due to inherent sampling frequency differences between the flexible sensors and the ground-truth force sensors, there are often inconsistencies in timestamps during actual data acquisition. Therefore, a linear interpolation method is employed to align all sensor data to a unified time axis. The original data $x_k$, measured at times $t_k$, are mapped onto a time grid $t$, and the interpolation formula is given as follows:
$$x(t) = x_k + \frac{x_{k+1} - x_k}{t_{k+1} - t_k}\,(t - t_k), \qquad t_k \le t \le t_{k+1}$$
Here, $t$ denotes the interpolation time point, $t_k$ and $t_{k+1}$ are the adjacent measured time points, and $x_k$ and $x_{k+1}$ are the corresponding sensor data values at those times. This step produces a time-aligned dataset, facilitating subsequent analysis and modeling.
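The time alignment can be sketched as follows: each point of the common time grid is located between two adjacent measured timestamps, and the value is interpolated linearly between them. The function name and the sample data are hypothetical; the sketch assumes strictly increasing timestamps and rejects grid points outside the measured range instead of extrapolating.

```python
from bisect import bisect_right


def interpolate(times, values, grid):
    """Linearly interpolate samples (times, values) onto the points in grid."""
    aligned = []
    for t in grid:
        if not times[0] <= t <= times[-1]:
            raise ValueError("grid point outside the measured time range")
        # Index k of the left neighbour, so that times[k] <= t <= times[k + 1].
        k = min(bisect_right(times, t) - 1, len(times) - 2)
        t0, t1 = times[k], times[k + 1]
        x0, x1 = values[k], values[k + 1]
        aligned.append(x0 + (x1 - x0) * (t - t0) / (t1 - t0))
    return aligned


# Hypothetical example: a sensor sampled at 1 s intervals, resampled every 0.5 s.
sensor_t = [0.0, 1.0, 2.0]
sensor_x = [0.0, 10.0, 30.0]
grid = [0.5, 1.0, 1.5, 2.0]
aligned = interpolate(sensor_t, sensor_x, grid)
```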
4.2. Dynamic Contact Force Estimation
In the time-series prediction task, to enhance the model’s learning capability, this study introduces three significant features based on sliding window and lag analysis: slope features and rolling statistical estimates (including rolling mean and rolling variance). The corresponding formulas are as follows:
$$s_t = x_t - x_{t-1}, \qquad \bar{x}_t = \frac{1}{w} \sum_{j=0}^{w-1} x_{t-j}, \qquad \sigma_t^2 = \frac{1}{w} \sum_{j=0}^{w-1} \left( x_{t-j} - \bar{x}_t \right)^2$$
Here, $x_t$ denotes the input variable. The slope feature $s_t$ is computed by calculating the difference between consecutive values, with a padding value inserted at the first position to maintain consistency in sequence length. Rolling statistical features (mean and variance) are computed using a sliding window (with a default window size of $w = 10$), enabling the capture of local statistical properties through the rolling mean $\bar{x}_t$ and rolling variance $\sigma_t^2$.
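The feature expansion can be sketched in a few lines of Python. The function names are hypothetical. Two details are assumptions, because the text does not specify them: the padding value for the first slope entry is taken as 0, and the rolling variance is normalized by the number of points in the window (population variance), with shorter windows at the start of the sequence.

```python
def slope(values, pad=0.0):
    """First difference, padded at the front to keep the sequence length."""
    return [pad] + [b - a for a, b in zip(values, values[1:])]


def rolling_stats(values, window=10):
    """Trailing rolling mean and variance over at most `window` points."""
    means, variances = [], []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        m = sum(chunk) / len(chunk)
        means.append(m)
        variances.append(sum((v - m) ** 2 for v in chunk) / len(chunk))
    return means, variances


signal = [1.0, 3.0, 6.0, 10.0]  # hypothetical sensor channel
slopes = slope(signal)
roll_mean, roll_var = rolling_stats(signal, window=2)
# One expanded feature row per time step: raw value, slope, rolling mean, rolling variance.
features = list(zip(signal, slopes, roll_mean, roll_var))
```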
As shown in
Figure 8, prior to training the GPR model, the data were formatted by contact angle and filtered to retain only six-dimensional force signals, with non-contact data below a specified threshold removed. Hyperparameters were manually tuned using a validation set, with the final values set to
= 0.1,
, and l = 1.0. The model was trained over 200 iterations on the training set, using one out of every ten data points for validation, and an additional 20% of the data was reserved for testing. To enhance the model’s ability to predict multi-feature time-series data, a systematic approach was adopted to optimize the GPR hyperparameters. The model maximized the log marginal likelihood and employed the L-BFGS-B algorithm to search for the optimal length scale and noise level of the RBF kernel within a predefined range. To avoid convergence to local optima, a random restart strategy was introduced with 50 restarts, improving both robustness and global search capability. All initial values and boundary conditions were selected based on data characteristics, balancing model flexibility and computational efficiency.
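To illustrate how GPR with an RBF kernel produces both a point estimate and an uncertainty estimate, the following is a minimal pure-Python sketch for a one-dimensional input. It is not the implementation used in this study: the hyperparameters are fixed rather than optimized by maximizing the log marginal likelihood with L-BFGS-B, and the function names, default values, and sample data are assumptions chosen for illustration.

```python
import math


def rbf(a, b, length, sigma_f):
    """RBF (squared-exponential) kernel between two scalar inputs."""
    return sigma_f ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_lower_transposed(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(x_train, y_train, x_test, length=1.0, sigma_f=1.0, sigma_n=0.1):
    """Posterior mean and standard deviation of the latent function."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length, sigma_f)
          + (sigma_n ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y_train))
    means, stds = [], []
    for xs in x_test:
        k_star = [rbf(xs, xi, length, sigma_f) for xi in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = solve_lower(L, k_star)
        var = rbf(xs, xs, length, sigma_f) - sum(vi * vi for vi in v)
        stds.append(math.sqrt(max(var, 0.0)))
    return means, stds


# Hypothetical data: predict near the training points and far away from them.
means, stds = gp_predict([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], [1.0, 100.0])
```

Far from the training data the predicted standard deviation returns to the prior value `sigma_f`, which is the behaviour that makes the uncertainty estimate useful for flagging unreliable force predictions.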
To evaluate the impact of feature expansion on model performance, an ablation study was conducted, comparing four types of input configurations: raw sensor data, data without slope features, data without rolling statistical features, and data with all feature combinations. The normalized root mean square error (NRMSE) was used as the evaluation metric. As shown in
Table 4, the results indicate that a richer feature set leads to significantly improved estimation accuracy, with notable reductions in NRMSE across all three force components: $F_x$, $F_y$, and $F_z$.
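For reference, the accuracy metrics used in this section (RMSE, NRMSE, and the coefficient of determination $R^2$) can be computed as in the sketch below. The normalization of NRMSE by the range of the true values is an assumption, since the text does not state which normalization is used; the function names and sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the unit of the target (e.g., newtons)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the true values (assumed normalization)."""
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and estimated normal forces in newtons.
force_true = [0.0, 10.0, 20.0]
force_pred = [1.0, 9.0, 21.0]
error_percent = 100.0 * nrmse(force_true, force_pred)
```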
To improve the performance of sensor signal estimation, a comparative model based on an artificial neural network (ANN) was introduced. A systematic evaluation was conducted across different combinations of activation functions, number of nodes, and network layers. Candidate architectures with over 90% accuracy were first identified using a small-scale dataset (2131 samples), followed by training and validation on the full dataset (6549 samples). As shown in
Figure 9, the model converged rapidly within the first 20 epochs, with training and validation losses decreasing synchronously, indicating good fitting performance. Although slight overfitting was observed (with validation loss around 0.1–0.2), the training process remained stable and effectively captured feature patterns.
Several trends were observed during iterative experiments with different ANN designs. In terms of node and layer configuration, architectures that employed a “first increasing then decreasing” or a “monotonically varying” number of nodes from the input to the output layer generally achieved higher prediction accuracy. Regarding network size and performance, large-scale ANNs—with five or more layers and at least 25 nodes per layer—typically exhibited lower accuracy, longer training times, and a higher tendency toward overfitting, resulting in overall performance inferior to smaller networks. With respect to activation functions, the choice among sigmoid, ReLU, ELU, and softmax had minimal impact on model accuracy for hidden layers. However, the activation function of the output layer needed to be selected according to the range and normalization of the target data. Regarding loss functions, both mean squared error (MSE) and Poisson negative log-likelihood loss (PoissonNLLLoss) were evaluated. The MSE loss function consistently delivered slightly higher prediction accuracy—typically a few percentage points better—compared to the Poisson loss function under the same data and ANN structure.
Similarly, an ablation study was conducted for the ANN-based estimator. As expected, the model achieved the best prediction performance when all features were included, as shown in
Table 5. This further confirms that the added feature information positively contributes to the model’s predictive capability. The visualization results are presented in
Figure 10. Compared with the GPR model, the ANN model demonstrated superior performance on the test set.
4.3. Data Validation
To evaluate the repeatability of the sensing system in detecting contact forces, two sets of experiments were designed based on the aforementioned testing platform. These experiments respectively assess the system’s repeatability under consistent conditions and its robustness during low-frequency random contact scenarios. In the dynamic detection experiments conducted on the test rig, the repeatability of the PVDF sensor was evaluated. By adjusting the height of the lower lifting platform, the contact depth was varied in real time, thereby altering the contact force.
- (a) Repeatability Verification
Under repeated contact conditions at 2 Hz, the detection results from both the GPR and ANN models are shown in
Figure 11.
To evaluate the performance of the two estimation methods at different frequencies, experiments on three-dimensional contact force detection were conducted at gait frequencies of 1 Hz, 2 Hz, and 5 Hz (see Table 6). In addition to NRMSE, RMSE and $R^2$ are introduced to evaluate estimation accuracy. RMSE reflects the average prediction error in newtons, while $R^2$ indicates how well the model captures the variation in true forces. As shown in Table 6, GPR achieves lower RMSE and higher $R^2$ across most conditions, especially at 2 Hz. In contrast, ANN shows larger RMSE and lower $R^2$, particularly at higher frequencies, indicating reduced reliability under fast gait dynamics. The results show that the GPR model performed consistently across all frequencies, with particularly high accuracy in estimating horizontal forces ($F_x$ and $F_y$). The ANN model demonstrated an advantage in predicting $F_z$ at low frequencies, but its performance degraded significantly at higher frequencies.
- (b) Validation under Randomized Gait Frequency Input
Randomized gait frequency input is used to simulate unstructured and highly dynamic contact scenarios that wheel-legged robots may encounter during real-world locomotion. This approach provides a more realistic assessment of the sensing system’s applicability and robustness under complex conditions. The parameter range for randomized step frequency input testing is shown in
Table 7.
In the dynamic contact validation, the measured data were estimated using both GPR and ANN models (
Figure 12). Both models were able to effectively capture the overall trend of contact force variations. However, differences were observed in the performance along different axes: the GPR model exhibited better fitting accuracy in the x- and y-directions, but showed larger errors in the z-direction, particularly with exaggerated fluctuations under low-frequency conditions. In contrast, the ANN model produced smoother outputs, effectively suppressing noise, and generally delivered higher prediction accuracy and better stability across most scenarios.
Under randomized gait frequency input, the uncontrolled nature of contact and high noise levels led to a noticeable decline in model generalization performance. To enhance robustness, temporal cross-validation was introduced. The performance of the optimized GPR and ANN models on the measured data is shown in
Figure 13.
To better reflect real-world applications, three types of data—low-frequency, high-frequency, and gait transition scenarios—were collected for estimation testing. Due to the high dynamic nature and noise interference, the raw data exhibited both abrupt and periodic patterns. To improve model adaptability, temporal cross-validation was introduced, with multiple rolling evaluations performed to ensure output stability across different time windows. In addition to point estimation, the GPR model provides uncertainty intervals, while the ANN model is useful for identifying overfitting tendencies and guiding hyperparameter optimization. As shown in
Table 8 and
Table 9, the two models demonstrated significant performance differences under varying conditions: the GPR model exhibited more stable performance in predicting $F_z$, with NRMSE ranging from 15.47% to 22.37% and a maximum $R^2$ value of 0.7126. The ANN model achieved slightly higher accuracy in low-frequency conditions (with a maximum $R^2$ of 0.7424), but its performance was less stable during gait transitions. Overall, the GPR model demonstrated greater robustness in complex dynamic environments.
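The temporal cross-validation mentioned above can be illustrated with a rolling-origin split, in which every evaluation window lies strictly after the data used for training. The text does not specify the exact scheme, so the expanding training window, the fold sizes, and the function name in this sketch are assumptions.

```python
def rolling_splits(n_samples, n_folds=3):
    """Yield (train_indices, test_indices) pairs in chronological order.

    The training window grows with each fold and always precedes the test window.
    """
    fold = n_samples // (n_folds + 1)
    if fold < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * fold
        # The last fold absorbs any remaining samples.
        test_end = n_samples if k == n_folds else train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


# Hypothetical usage: 10 time-ordered samples evaluated over 3 rolling folds.
splits = list(rolling_splits(10, n_folds=3))
for train_idx, test_idx in splits:
    # A model would be fitted on train_idx and scored on test_idx here.
    assert max(train_idx) < min(test_idx)
```

Unlike a random split, this scheme never lets the model see samples recorded after the ones it is evaluated on, which matters for contact signals with strong temporal correlation.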
In the estimation of shear forces ($F_x$ and $F_y$), both models exhibited relatively large errors. The GPR model achieved NRMSE values ranging from 9.11% to 15.56%, with a maximum $R^2$ of 0.7116 (obtained under low-frequency conditions). In contrast, the ANN model showed a wider NRMSE range (9.93–20.24%) and a lower peak $R^2$ of only 0.5745, with its performance dropping significantly during gait transitions, where $R^2$ decreased to as low as 0.0179. The ANN model also exhibited higher RMSE values than GPR, indicating greater sensitivity to noise. Overall, both models demonstrated reliable performance in predicting normal forces ($F_z$); however, there is still room for improvement in shear force ($F_x$, $F_y$) estimation, which remains limited by sensor sensitivity, noise interference, and feature extraction strategies.
5. Conclusions
This study addresses the challenge of foot–ground contact state perception in wheel-legged robots by proposing a systematic solution that integrates finite element analysis (FEA), global sensitivity analysis (GSA), and machine learning-based estimation. A finite element model of the pneumatic tire was established to reveal the strain distribution characteristics under foot-end loading. Based on the Sobol method for global sensitivity analysis, the optimal placement positions for PVDF piezoelectric film sensors were identified, and the effectiveness of the sensor deployment strategy was validated both theoretically and experimentally. In future work, we plan to introduce a mass–spring–damper (MSD) model to complement the current FEA framework. The MSD model is expected to enhance the accuracy and smoothness of dynamic contact force estimation over time, particularly during locomotion in complex terrains.
In the dynamic testing phase, a customized experimental platform capable of simulating various gait frequencies was developed to collect representative contact datasets. Data preprocessing and feature engineering were applied to improve data quality. Based on these datasets, Gaussian process regression (GPR) and artificial neural network (ANN) models were constructed for predicting three-dimensional contact forces. The results showed that the GPR model offered advantages in uncertainty estimation and demonstrated strong stability and robustness. The ANN model exhibited superior prediction accuracy in low-frequency scenarios, achieving a minimum NRMSE of 8.04% in normal force estimation. Both models showed good generalization performance under temporal cross-validation, meeting the practical requirements for foot-end state estimation in complex terrains.
Overall, this research improves the accuracy and robustness of contact state detection for wheel-legged robots and provides a feasible approach for the design and optimization of smart tire sensing systems. Future work will focus on enhancing shear force prediction accuracy, optimizing multi-sensor fusion strategies, and exploring transfer learning and adaptive mechanisms to improve model generalization for more effective autonomous decision-making and motion control in dynamically complex environments. Additionally, continuous exploration will be carried out to integrate foot-end sensors with multiple other sensors, such as vision cameras, LiDAR, and inertial measurement units (IMUs). By fusing diverse sensing data, we aim to further enhance the adaptability of wheel-legged robots in unstructured terrains, enabling them to navigate rough and unpredictable environments with higher efficiency and stability.