1. Introduction
High-resolution contact localization and three-axis force estimation are indispensable for precise contact measurement and surface interaction [
1,
2,
3]. Ideally, a tactile sensing system should not only measure the three-axis contact force vector but also localize the contact point accurately in space to support tasks such as contact-state estimation, force control, and slip feature analysis [
4,
5,
6,
7]. Conventional high-resolution tactile sensors often rely on dense arrays, which increase channel count, wiring complexity, packaging volume, and cost, and reduce reliability and cumbersome calibration procedures [
8,
9,
10]. For deployable tactile systems, achieving perception capabilities comparable to high-resolution arrays under a “sparse sensing unit” constraint has become an important research direction in tactile sensing [
11,
12,
13].
To alleviate the interconnect and packaging burden of high-density arrays, several alternative routes have been explored. Vision-based tactile sensors [
14,
15] can achieve high spatial resolution by tracking deformations of a compliant medium; however, they require imaging optics and additional packaging volume, which limits form-factor flexibility and robustness in compact deployments [
16]. Moreover, learning a reliable mapping from observed deformations to force/location often demands large datasets or high-fidelity modeling, and performance is often sensitive to the domain gap between simulation and physical hardware [
17]. To address this, Sferrazza et al. [
18] proposed a sim-to-real strategy that generates training data in simulation and transfers it to real high-resolution optical tactile sensing, showing that “simulation-synthesized data and learning-based inference” can substantially reduce the burden of real data collection, while domain discrepancy remains a key factor affecting final accuracy and robustness.
A scalable alternative is computational tactile sensing, where mechanical transmission yields overlapping receptive fields and learning-based models reconstruct higher-resolution contact states from sparse readout. Recent super-resolution frameworks demonstrate that sparse tactile readout can still support dense virtual outputs when sensor layout and decoding models are co-designed [
19]. Complementary theoretical [
20,
21] analyses further clarify the conditions and trade-offs for super-resolution under multidirectional loading, highlighting that stable multi-axis force and location decoding require careful handling of coupling, nonlinearity, and domain shift. Collectively, these studies suggest that sparse sensing units do not necessarily imply low spatial resolution if the structure and mapping model are properly designed; nevertheless, the stable decoding of the multidirectional force and contact location still require coupling, nonlinearity, and sim-to-real domain discrepancy to be addressed.
Among sparse tactile sensing modalities, strain-based sensing offers advantages such as compact structure, easy integration, fast response, and low cost, making it suitable for sparse deployment in confined spaces [
22,
23]. However, few-channel strain signals often exhibit pronounced cross-axis coupling and nonlinear responses, and the mapping between the contact location and three-axis force can drift with variations in material properties, assembly tolerances, bonding processes, and circuit gain. Recent studies on modular tactile units based on three-dimensional micro strain gauges have demonstrated the potential of micro-structured strain devices for multi-axis stimulus measurement and integration, while also highlighting the importance of multi-axis decoupling and consistent manufacturing for system performance [
24]. On the other hand, low-cost tactile fingertips for robot end-effectors achieve contact localization and multi-DoF force/torque sensing via structural compliance and model-based inference, emphasizing deployability, integrability, and calibratability; yet such approaches still need to handle structural errors and calibration complexity [
25,
26]. Although finite element simulation can generate large-scale training data covering diverse conditions at relatively low cost, directly transferring the learned mapping from simulation to hardware is often limited by sim-to-real domain shift [
27]. Conversely, relying entirely on large amounts of real data for end-to-end training or pointwise calibration significantly increases experimental cost, undermining the advantages of sparse and minimalist designs [
28].
While prior sparse/super-resolution tactile studies commonly aim to reconstruct a dense virtual tactile map from sparse measurements (i.e., a super-resolution inversion problem) [
19,
20,
21], our goal is different: we target contact-state inference under sparse strain readout by directly estimating the physically meaningful state. This leads to three key distinctions. (i) Decoding principle: instead of “sparse-to-dense” reconstruction that may amplify coupling and ambiguity under multi-axis loading, we formulate force–location decoupling as a unified regression-based contact-state inference problem, which directly outputs location and three-axis force without requiring an intermediate dense field. (ii) Training strategy: rather than depending on large-scale real calibration or purely simulation-trained models that degrade under domain shift, we adopt a two-stage sim-to-real pipeline that learns a physics-consistent prior from FEM data and then performs prior-constrained few-shot adaptation using a compact real dataset, explicitly mitigating systematic discrepancies. (iii) Hardware layout optimization: instead of using heuristic sensor placement or fixed layouts, we optimize the minimal three-module architecture and placement via PSO to maximize informative response overlap (i.e., identifiability) across the workspace under multi-directional loading, enabling channel-efficient decoding with only nine strain channels.
To address the strong coupling, nonlinearity, and sim-to-real domain shift that hinder high-resolution contact localization and three-axis force estimation under sparse strain readout, we jointly co-design the hardware structure and the decoding algorithm. Firstly, we design a compact sparse strain-node tactile interface device (SSTID) that adopts a three-module layout optimized via particle swarm optimization (PSO) to maximize effective response overlap across the workspace, enabling contact localization and three-axis force estimation using only nine strain channels. Building on this optimized structure, we first propose a strain-node contact-state decoding framework (SCDF) that formulates contact-state inference as a unified regression problem and is implemented with a lightweight multilayer perceptron (MLP). The framework adopts a two-stage sim-to-real strategy. Stage I pretrains the decoder on 14,400 FEM-simulated single-point contacts to learn a physics-consistent prior mapping of diverse contact locations, directions, and magnitudes. Stage II performs prior-constrained, regularized backpropagation using only 648 real calibration samples to compensate for systematic discrepancies caused by material properties, assembly tolerances, and circuit gain variations, thereby reducing the sim-to-real gap. With this pipeline, the proposed framework supports high-resolution contact-state decoding under sparse readout channels while maintaining low calibration cost and real-time deployability. More broadly, it offers a practical and scalable route toward low-cost, high-resolution three-axis tactile perception for contact-rich robotic sensing and control. The main contributions of this work are as follows:
- (1)
We establish a minimum three-module architecture and optimize its placement to maximize effective response overlap across the workspace, resulting in a channel-efficient SSTID with only nine strain channels over a 60 mm diameter workspace at a hardware cost below USD 15.
- (2)
We first propose an SCDF for force–location decoupling that employs a two-stage MLP training strategy. Stage I pretrains the decoder on 14,400 FEM-simulated contacts to learn a physics-consistent prior, and Stage II performs regularized few-shot fine-tuning with 648 real samples to reduce sim-to-real domain shift.
- (3)
We characterize its full-workspace accuracy and resolution, low force stability, repeatability/hysteresis/drift, and dynamic response, and further demonstrate sparse high-resolution three-axis force sensing via force regulation experiments.
2. Materials and Methods
2.1. Structural Optimization Design and Principal Analysis of the SSTID
To obtain the optimal channel design for the SSTID, we performed structural optimization. Under minimalist, low-cost hardware constraints, a key question is the minimum number of sensing modules required and how to place them to achieve stable and reproducible decoding of both the contact location and three-axis force. With fewer than three modules, the planar geometric reference frame becomes degenerate, making contact localization prone to multiple solutions or instability under noise. Therefore, we adopt a three-module configuration as the minimum setup that satisfies geometric non-degeneracy and provides sufficient information.
Under the three-module configuration, the joint identifiability of the contact location and three-axis force depends on two conditions: (i) the three modules must form a non-collinear geometric reference frame in the plane to avoid degeneracy in localization; and (ii) any contact within the workspace should be able to simultaneously excite at least three modules to produce effective responses—otherwise, the estimation becomes ill-conditioned due to insufficient information or noise amplification. Accordingly, as shown in
Figure 1a, we define the “effective response region” of each module in the contact plane as a set. When the contact lies within
, the reaction force magnitude
of module
exceeds a preset threshold and can be used for inference; outside this region, the response decays below the threshold and contributes negligibly to estimating location and load. The region
can be approximated by a circular area with radius
based on a threshold isoline, where
is determined jointly by structural transmission characteristics and the chosen threshold and can be calibrated via experiments or simulations.
Based on the above definition, we assume that the three module centers form a triangle in the plane with pairwise distances
. To ensure that any workspace point can trigger effective responses from at least three modules, we enforce an overlap constraint that limits inter-module distances to be no larger than the effective radius
, and we maximize the union area of the three circular regions as a layout efficiency metric.
As shown in
Figure 1b, we solve this optimization using particle swarm optimization (PSO) to obtain the optimal geometry under a given
. Under the general condition
, common optimizers converge to the same optimum: the three units lie at the vertices of an equilateral triangle with optimal side length
, yielding
, as shown in
Figure 1c. This result directly supports the threefold-symmetric (0°, 120°, and 240°) module layout of the SSTID; while keeping the structure minimal, it maximizes the receptive field union and ensures simultaneous availability of informative signals across the workspace, thereby improving identifiability and reducing layout sensitivity for subsequent static decoupling and learning-based regression. This analysis provides a quantitative and reproducible basis for selecting a low-cost three-module layout.
Based on the layout analysis, the device adopts a threefold-symmetric configuration: the sensing plane consists of an SUS304 stainless steel top plate (diameter: 60 mm), three three-axis force sensing modules distributed circumferentially with 120° symmetry, and a rigid bottom plate made of the same material (
Figure 2a). Both the top and bottom plates are 1 mm thick, and the top plate–sensor modules–bottom plate are bonded with metal adhesive to form an integrated force transmission chain (
Figure 2b). Each sensing module uses a layered structure: a polycarbonate pick layer on top for load bearing (Pick), followed by a flexible printed circuit (FPC), acrylic pressure-sensitive adhesive (ADH), and a polyimide (PI) substrate. Seven metal foil strain gauges are integrated on the underside of the PI film (overall dimensions shown in
Figure 2c). The strain gauges are arranged with functional zoning to enhance the separability of the three force components (
Figure 2d): tangential gauges enhance the
shear response, radial gauges enhance the
shear response, and the central region enhances the
normal response. When an external load is applied to the pick layer, deformation is transmitted through the FPC–ADH–PI stack to the strain gauge array, inducing resistance changes and forming strain observations correlated with three-axis forces, thereby providing a clear physical basis for cooperative decoding of the three-axis force and contact location. The total bill-of-materials cost is below USD 15, as summarized in
Table 1. Importantly, this layout optimization not only maximizes channel efficiency but also improves the conditioning of the subsequent force–location decoupling model, thereby reducing sensitivity to noise and facilitating robust two-stage sim-to-real training.
2.2. Circuit Architecture and Readout Principle of the SSTID
The mechanical readout of the SSTID is based on the Wheatstone bridge principle, which converts strain gauge resistance variations into voltage signals for three-axis force prediction. As shown in
Figure 2d,e, three orthogonally arranged strain gauge groups are integrated on the PI film, and each group adopts a half-bridge configuration to save space, corresponding to the measurement of the three force axes. When an external load is applied to the pick layer, mechanical deformation is transmitted through the FPC–ADH–PI laminate, inducing resistance changes
. The X- and Y-axis gauges each form a half-bridge, and the Z-axis gauges form a half-bridge with the corresponding resistors, as illustrated in
Figure 2e. Here,
share the same initial resistance
, while
and
are
.
denotes the resistance change under deformation, and
is the normalized resistance variation. The X- and Y-axis half-bridge output voltages
and
satisfy:
The Z-axis output voltage
can be expressed as:
The normalized variations
for the X/Y/Z axes are jointly contributed by the averaged orthogonal strain components
(length direction),
(width direction), and
(bisector direction), as shown in
Figure 2d. Thus,
can be written as:
where
and
are the sensor dimensions along the length and width directions, taking values 9.8 and 5.4, respectively. Finally, according to the strain gauge arrangement in
Figure 2e, the three-axis signals corresponding to
,
,
can be expressed as:
Here, is the bridge excitation voltage and is set to 2.8 V, enabling conversion of the applied load on a single sensor module into voltage signals.
2.3. Design of the Simulation Dataset
To enable Stage I pretraining of the force–location decoder and to provide a physics-consistent prior that is transferable to real hardware, we construct a large-scale FEM simulation dataset covering diverse contact locations, directions, and magnitudes. We perform batch FEM simulations by applying concentrated loads with varying directions and magnitudes on the surface of the tactile interaction platform and obtain the corresponding three-axis force responses of the three sensors. Following the procedure in
Figure 3a, we construct a dataset of 14,400 samples: 7200 samples with random three-axis force loading in the range 0–5 N, and another 7200 samples with random normal-force loading in the range 0–5 N, both applied at random locations on the platform surface.
Figure 3b–d show the stress and strain distributions of the top plate and individual modules under representative loading cases, supporting the consistency of the load–transmission–strain response chain in the FEM model. Loads in different directions induce distinctive response patterns across the three modules, providing discriminative information for subsequent regression of both
and
from few-channel signals. We validate fidelity at both the module and system levels.
Figure 4 shows a quantitative FEM experiment comparison of the module-level output voltage differences, where the
-related (normal) channel and the
related (tangential) channels match closely under the same loads. For the assembled three-module SSTID, we further apply 1 N normal loads at 73 spatial points and obtain an average relative error of 11% for
, while the average tangential-force error (
) is 32%, mainly due to assembly tolerances and inter-module coupling not fully captured by FEM.
To ensure physically consistent and transferable simulation data, we used the elastic moduli in
Table 2 for the pick, epoxy, ADH, FPC, PI, and strain gauges, while keeping geometry and boundary conditions consistent with the physical assembly. This configuration was used to generate pretraining data and to provide a physics-consistent prior for the SSTID.
2.4. Real-World Data Collection Protocol
To acquire real data and validate the performance of the SSTID, we designed and fabricated a measurement setup as shown in
Figure 5a. The setup consists of a three-DoF gantry stage and a high-precision base with five degrees of freedom (X, Y, Z, rotation, and tilt), controlled via adjustment knobs. The knob resolution for X/Y/Z translation is 0.02 mm per division, and the tilt knob resolution is 0.1° per division. A force/torque (F/T) sensor (NANO25, ATI Industrial Automation) was used as the reference, with rated loads of
: 100 N and
: 4 N·m. The F/T sensor data, together with all other measurement data, were transmitted to a computer for recording.
For low-cost sim-to-real adaptation, we collected real samples for backpropagation-based adjustment. We selected 73 spatial points on the sensing plane (computed as 13 × 6 − 5 = 73), where the center point is counted only once across the six radial groups. Each point included two loading types: normal loading and inclined loading.
(1) Normal loading (216 samples): At each spatial point, a normal force ∈ {1,3,5} N was applied, resulting in 216 normal force samples in total.
(2) Inclined loading (432 samples): At each spatial point, six inclined force samples were applied, resulting in 432 samples in total. The inclined force vector was parameterized by magnitude–tilt angle–azimuth angle: the magnitude
∈ [0,5] N; the tilt angle
is the angle between the force vector and the normal Z axis; and the azimuth angle
is the in-plane angle of the force vector relative to the X axis. The components are:
Accordingly, the normal-force component is denoted as (and the force vector as ) throughout this work. Given the constraint of only six inclined loadings per point, we adopted a uniform coverage strategy with two tilt angles and three azimuth angles to improve generalization across shear directions and shear-to-normal ratios: ∈ {0°,120°,240°}, and ∈ {30°,60°}. Meanwhile, to avoid spurious correlations caused by fixed binding between the direction and magnitude, the magnitudes {1,3,5} N were cyclically permuted with the azimuth angles across different spatial points (a cyclic right shift of {1,3,5} according to point index), achieving balanced coverage of direction, shear ratio, and magnitude over the full plane.
The real calibration set (648 samples) is designed to be representative rather than exhaustive. This sampling strategy covers combinatorial variations in shear direction, shear-to-normal ratio, and magnitude with minimal trials per point: three azimuth angles cover the principal in-plane shear directions, and two tilt angles cover different shear proportions. The cyclic permutation between magnitudes and azimuth angles further reduces pseudo-correlation due to fixed magnitude–direction pairing, improving model generalization under a limited total sample budget and enhancing the efficiency of real-world adjustment.
2.5. Learning-Based SCDF with Two-Stage Sim-to-Real Adaptation
Based on the simulation and real calibration datasets introduced above, we next present the proposed strain-node contact-state decoding framework (SCDF) for mapping nine-channel strain observations to a continuous contact location and three-axis force. To bridge the sim-to-real gap while keeping calibration cost low, the SCDF is trained using a two-stage strategy: simulation pretraining for a physics-consistent prior followed by regularized few-shot adaptation on real data. We first define the input as a 9-dimensional displacement/strain feature vector
and the output as a 5-dimensional target vector:
where
denotes the contact location and
denotes the three-axis force components. The regression model is
. To mitigate the influence of scale differences across dimensions during optimization, both inputs and outputs are standardized using a Standard Scaler:
the network regresses
in the standardized space, and the outputs are de-standardized during inference to obtain physical quantities.
As illustrated in
Figure 5, the SCDF is implemented as a lightweight multilayer perceptron (MLP) decoder composed of stacked fully connected blocks with nonlinear activations and a final linear output layer. This architecture is sufficient to model the strongly coupled and nonlinear mapping from few-channel strain signals to multi-dimensional contact states while remaining computationally efficient for real-time deployment.
To address the sim-to-real domain shift caused by material properties, assembly tolerances, and circuit gain variations, the SCDF adopts a two-stage sim-to-real adaptation strategy. In Stage I (simulation pretraining), we train the decoder on the FEM-based simulation dataset
(14,400 samples) to obtain
which serves as a physics-consistent prior mapping. In Stage II (few-shot real adaptation), we initialize the model with
and continue training on the real calibration dataset
(N = 648), using prior-constrained and regularized optimization to absorb systematic hardware discrepancies into the model parameters and reduce the sim-to-real gap. The Stage II parameter update can be written as:
thereby absorbing systematic errors caused by material, assembly, and circuit gain discrepancies into the model parameters and reducing the sim-to-real gap. The model output is grouped into location and force terms, and a weighted regression loss is adopted:
where
and
. With standardized outputs, we typically set
. To suppress overfitting during few-shot adjustment and explicitly preserve the simulation prior, we introduce an L2-SP regularization term during fine-tuning:
where
is the regularization weight. We determine
via a log-scale sweep on a held-out validation split of the real calibration set (e.g.,
) and select the value that minimizes validation error; a sensitivity check confirms stable performance within a reasonable range, while too small
overfits few-shot real data and too large
under-adapts to the real domain. In this work, we use
unless otherwise stated.
Both stages employ mini-batch optimization; pretraining uses a larger learning rate for fast convergence, whereas fine-tuning adopts a smaller learning rate and validation-based early stopping to improve generalization. During inference, the SCDF consists only of fully connected operations and element-wise activations, enabling real-time output of
on conventional edge-computing platforms. Detailed training hyperparameters and implementation settings are provided in
Section 2.6.
2.6. Training Procedure
Based on the model definition and two-stage objectives in
Section 2.5, we detail the implementation settings for Stage I simulation pretraining and Stage II few-shot real adaptation, including data splits, optimization settings, and training criteria. For Stage I, the FEM-simulated dataset of 14,400 single-point contacts was randomly split into 80% training, 10% validation, and 10% test. For Stage II, the real calibration dataset of 648 samples was split using the same 90%/10% protocol. Unless otherwise stated, the training pipeline and hyperparameters were identical for both stages. To balance loss contributions across dimensions and improve optimization stability, both inputs and outputs were standardized (z-score normalization). During training, the standardized inputs were forwarded through the MLP, and a sequence of fully connected linear mappings with ReLU activations produced predictions
for
. The mean squared error loss
was then computed against the ground truth labels. Backpropagation was performed from this loss, and PyTorch automatic differentiation computed gradients of the loss with respect to the weights and biases of each layer via the chain rule (with gradient gating through ReLU derivatives). For each mini-batch, gradients were cleared, recomputed, and parameters were updated using the Adam optimizer (learning rate 10
−3), iteratively driving model outputs toward the ground truth. The model was implemented using PyTorch 2.6.0 under Python 3.12 and training converged within 300 epochs. As shown in
Figure 6 and
Figure 7, the model achieved a mean squared error (MSE) = 0.011, mean absolute error (MAE) = 0.092, and
on the test set, indicating good prediction accuracy and generalization. The predicted-versus-ground-truth scatter plots for each component closely align with the diagonal, demonstrating strong overall consistency across multi-dimensional outputs. Meanwhile, the loss curve stabilizes in the later stage of training and the
curve converges close to 1, suggesting no noticeable oscillation or degradation during optimization. These results provide a reliable baseline for subsequent real-world adaptation and metrological experiments.
4. Conclusions
High-resolution contact localization and three-axis force estimation are crucial for human–robot interaction and precision manipulation, yet this capability is fundamentally limited by channel density and wiring cost. With sparse strain readout, low-dimensional measurements make decoupling of the location and three-axis force susceptible to cross-axis coupling and nonlinear responses. We address high-resolution contact localization and three-axis force estimation under sparse strain readout by jointly co-designing the hardware structure and decoding the algorithm. Specifically, we perform a layout optimization using particle swarm optimization (PSO) to determine the minimum three-module configuration and its optimal placement, maximizing response overlap and yielding a channel-efficient SSTID with only nine strain channels. Building on this optimized structure, we propose an SCDF that casts force–location decoupling as a unified regression problem and is realized with a lightweight multilayer perceptron (MLP). The decoder is trained with a two-stage sim-to-real strategy, where Stage I learns a physics-consistent prior from 14,400 FEM-simulated single-point contacts and Stage II performs prior-constrained, regularized adaptation using 648 real calibration samples to compensate for systematic discrepancies from material properties, assembly tolerances, and circuit gain variations.
Systematic metrological characterization and ablation studies validate accuracy and resolution across the workspace and quantify low-force stability, repeatability, hysteresis, and dynamic response. Within a 5 N full-scale range, the force resolution reaches 5 mN, with an average repeatability error of 1.3% FSO and an average hysteresis of 3.1% FSO. With real samples for Stage II, the position MAE decreases to 1.97 mm and the three-axis force MAE decreases to 0.17 N, confirming the effectiveness of combining simulation-based prior learning with few-shot real adaptation. With a hardware cost below USD 15, the SSTID provides a practical reference for cost-sensitive deployment of three-axis force and contact localization in contact-rich robotic sensing and control. Looking forward, we will pursue cross-device generalization and establish an interpretable link between sensor scale and structural parameters and the resulting force–location response characteristics, enabling consistent calibration and rapid transfer across devices with different sizes and manufacturing variations.