Article

Three-Dimensional Shape Estimation of a Soft Finger Considering Contact States

Graduate School of Engineering Science, The University of Osaka, Toyonaka 560-0043, Japan
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(2), 717; https://doi.org/10.3390/app16020717
Submission received: 21 November 2025 / Revised: 4 January 2026 / Accepted: 6 January 2026 / Published: 9 January 2026

Abstract

To achieve precise in-hand manipulation and feedback control using soft robotic fingers, it is essential to accurately measure their deformable structures. In particular, estimating the three-dimensional shape of a soft finger under contact conditions is a critical challenge, as the deformation state directly affects manipulation reliability. However, nonlinear deformations and occlusions arising from interactions with external objects make this estimation difficult. To address these issues, we propose a soft finger structure that integrates small magnets and magnetic sensors inside the body, enabling the acquisition of rich deformation information in both contact and non-contact states. The design provides a 15-dimensional time-series signal composed of motor angles, motor currents, and magnetic sensor outputs as the input for shape estimation. Building on these sensing signals, we propose a mode-selection-based learning approach that outputs multiple candidate shapes and selects the correct one. The proposed network predicts the three-dimensional positions of four external markers attached to the finger, which serve as a proxy representation of the finger’s shape. The network is trained in a supervised manner using ground-truth marker positions measured by a motion capture system. Experimental results under both contact and non-contact conditions demonstrate that the proposed method achieves an average estimation error of approximately 4 mm, outperforming conventional one-shot regression models that output coordinates directly. The integration of magnetic sensing is shown to enable accurate recognition of contact states and to significantly improve the stability of shape estimation.

1. Introduction

Soft robotic fingers can conform to the geometry of grasped objects and enable stable manipulation of a wide variety of items [1]. Several studies have investigated in-hand manipulation using soft fingers, in which the fingers rotate or translate objects within the grasp [2,3]. Such advanced manipulation requires real-time estimation of the soft finger’s 3D shape and fingertip positions, as the deformation state directly affects the success of manipulation. However, most existing systems do not estimate the soft finger shape explicitly. Instead, they rely on open-loop control, in which the inherent compliance of the finger absorbs small errors during motion. While this approach can succeed in simple and predictable environments, it cannot adapt to unexpected contact or environmental changes. Without accurate state estimation, the system cannot detect or recover from failures, limiting the applicability of soft fingers in closed-loop manipulation.
Motivated by these limitations, recent research has increasingly shifted toward developing state estimation techniques for soft fingers. Yet, despite the growing interest and effort, accurate estimation remains highly challenging for several reasons. First, soft fingers exhibit the strong nonlinearity, hysteresis, and time-dependent behavior of soft materials. These difficulties are further exacerbated during manipulation, where contact forces and reactive deformation create highly complex and unpredictable shapes. Second, embedding sensors within the finger is nontrivial; rigid or bulky sensors can compromise the finger’s compliance, leading to a trade-off between sensing capability and softness. External vision-based approaches have been explored as an alternative, but they struggle to handle occlusion and physical contact, especially in cluttered or confined environments. These challenges have hindered the practical realization of soft finger state estimation and thus limited its application in closed-loop robotic manipulation.
In this study, we propose the design of a soft robotic finger embedded with small magnets and magnetic sensors, and develop a learning-based method for estimating its three-dimensional shape using these sensor signals together with motor angle and motor current measurements. The magnetic sensors detect local bending states through variations in magnetic flux density, while being embedded without compromising the flexibility of the soft structure. The proposed shape estimation method employs a two-stage architecture: an MDN (Mixture Density Network) that first generates multiple candidate shapes corresponding to contact and non-contact conditions based on motor angles, followed by a Selector head that determines the correct candidate using encoded time-series information from the magnetic sensors and motor currents via a GRU (Gated Recurrent Unit).
Figure 1 shows an overview of the hardware configuration and the proposed shape estimation system. The magnetic sensors embedded in the soft finger, together with the motor actuation signals, are used as inputs to the neural network, which then outputs the estimated three-dimensional shape of the finger. We conducted a series of experiments to evaluate the performance of the proposed system, and the results show that our method outperforms several baseline approaches. It achieves an average estimation error of approximately 4 mm and can reliably distinguish between contact and non-contact conditions.
The organization of this paper is as follows. Section 2 reviews related work on state estimation of soft fingers. Section 3 describes the design and sensor configuration of the proposed soft finger. Section 4 discusses the shape estimation approach, including data collection and network model selection. Section 5 presents the experimental evaluation and analysis. Finally, Section 6 concludes the paper and discusses future work.

2. Related Work

We review related work on soft finger state estimation from two perspectives: sensor selection/configuration and shape estimation methods.

2.1. Sensor Selection

Sensors used for estimating the shape of soft fingers can be broadly categorized into two approaches: vision-based methods using external cameras, and proprioceptive methods utilizing embedded sensors. In vision-based approaches, the shape of a soft finger is usually measured using monocular or multiple cameras, and its three-dimensional shape is reconstructed from the image data [4,5]. Such approaches have the advantage of not requiring any sensors to be embedded within the finger, thus preserving its structural flexibility. However, vision-based approaches are vulnerable to ambient lighting changes and impose constraints on the placement of grasped objects and surrounding equipment due to occlusion concerns.
To address these limitations, a widely explored alternative is to embed sensors within the finger structure. Common internal sensors include strain sensors, optical sensors (e.g., waveguide-based ones), Inertial Measurement Units (IMUs), and magnetic sensors [6,7]. Strain sensors are compact and can be integrated in large numbers. However, they are typically designed to detect deformation in a single direction and thus struggle to capture torsional or complex 3D deformations [8,9,10]. While recent work has explored soft, distributed strain sensors using materials like carbon nanotubes or liquid metals, their fabrication processes are complex, making reproducibility and stability difficult to achieve in practice [11,12]. Optical sensors offer flexibility by using elastomer or silicone-based substrates that do not hinder the finger’s motion. However, due to their widespread distribution, they have limited spatial resolution for detecting localized deformations [13,14,15]. IMUs can estimate relative pose changes via acceleration and angular velocity measurements, which is useful for shape reconstruction [16,17]. However, they are prone to integration drift and, being rigid components, present challenges for full integration into soft structures. In contrast to the above three sensor types, magnetic sensors are compact and lightweight despite being rigid components, making them suitable for embedding in soft structures in large numbers [18,19,20]. The embedded magnets and magnetic sensors allow the detection of local magnetic field changes while maintaining flexibility, enabling high-resolution estimation of 3D bending.
While existing studies using various types of sensors, including magnetic ones, have achieved promising results, most of them focus on estimating soft finger shapes in free space, with only limited work addressing deformations that occur during contact with external objects or under external forces. In contrast to these approaches, this study employs a magnetic sensing configuration that integrates both magnets and magnetic sensors within the soft finger, aiming to achieve contact-aware 3D shape estimation without relying on any external devices.

2.2. Pose Estimation Methods

Previous pose or finger shape estimation methods for soft fingers can also be broadly classified into two categories: (1) analytical approaches based on mathematical models, and (2) data-driven approaches using machine learning.
Analytical approaches typically rely on mathematical models, such as the Finite Element Method (FEM), to estimate deformations with high accuracy by incorporating material properties and external contacts. For example, Liu et al. [11] and Martin-Barrio et al. [21] demonstrated that FEM-based models can capture complex nonlinear deformations of soft structures. These methods require minimal sensing and perform open-loop forward prediction of deformation states. However, they face significant challenges, such as the need for accurate identification of physical parameters in advance and the high computational cost associated with solving FEM models. These factors make FEM-based approaches difficult to implement in real time. In contrast to analytical methods, the data-driven approaches construct neural networks that learn the mapping between sensor data and soft finger shapes [22,23]. While such methods require a sufficient amount of training data, once trained, they can handle complex shape deformations, including nonlinearities and contact-induced changes. The methods are also efficient for real-time estimation.
Considering these trade-offs, we adopt a data-driven estimation approach to achieve both high-accuracy and real-time shape reconstruction of a soft finger. However, unlike most existing data-driven methods, which primarily address shape estimation in free space and do not explicitly handle deformation changes caused by contact, our method is designed to be contact-aware. Specifically, we propose a neural-network-based architecture that does not output a single coordinate estimate directly; instead, it first generates multiple candidate coordinate predictions and then selects the correct one based on the inferred contact condition. This design enables the model to account for contact-induced deformation and maintain estimation accuracy even during interactions with external objects.

3. Soft Finger with Embedded Magnetic Sensors

3.1. Basic Structure

The soft finger used in this study was designed to enable shape estimation by embedding magnetic sensors and magnets inside its body. An overview of the finger’s appearance is shown in Figure 2. The external shape of the soft finger is rectangular, with a total length of 230.4 mm, a width of 25.5 mm, and a height of 17 mm. The finger bends when motors wind the multiple polyethylene tendons embedded inside its body. Each tendon passes through through-holes in spacer plates fabricated with a 3D printer (PLA). When tension is applied along these paths, the tendons shorten along their respective routes, causing the finger to bend in the corresponding direction. Each spacer plate has a thickness of 3 mm, and the distance between adjacent plates is set to 23.9 mm. The three drive tendons on the ventral side are housed inside silicone tubes with an outer diameter of 2.5 mm and an inner diameter of 2 mm. These tubes pass through the through-holes in the spacer plates and serve to reduce friction between the tendons and the silicone body. On the dorsal side, two tendons are routed, but these are not used for actuation; instead, they are used to fix the soft finger to its base. By independently controlling the three ventral tendons, the soft finger can achieve actuation with three-dimensional degrees of freedom.
The main body of the finger is formed by casting liquid silicone rubber into a mold and curing it into shape. Inside the silicone, spacer plates are embedded at intervals of 23.9 mm to provide mounting positions for the magnetic sensors and to guide the tendon routing. Each spacer plate has a thickness of 3 mm and contains four through-holes for the tendons, as well as three dedicated slots for mounting magnetic sensors. The detailed arrangement of the plates is illustrated in Figure 2c. The structure of a single spacer plate is shown in Figure 2d. A single finger is supported by nine spacer plates. Magnetic sensors are installed on the 2nd, 5th, and 8th plates, which correspond to the finger base, middle, and fingertip positions, respectively, resulting in a total of 3 × 3 = 9 sensor chips. Each magnetic sensor is flanked by two cylindrical permanent magnets on its lateral sides. These magnets have a diameter and height of 2 mm and are positioned 5 mm away from the sensor surface. As the finger bends, the relative distance between the magnets and the sensors changes, leading to variations in the magnetic flux density and consequently in the sensor output. The placement of the sensors and magnets was determined based on the characteristics of the magnetic field distribution and the operational guidelines provided in the sensor datasheet. The enlarged view in Figure 2b illustrates the detailed arrangement of the magnets surrounding the sensors. We used two types of liquid silicone rubber (Mold Star® 30 and Mold Star® 15, Smooth-On, Inc., Macungie, PA, USA) with different elastic properties to make the main finger body. Mold Star 30 has a Shore hardness of 30A and is used as the structural support for the entire finger; the dark blue parts in Figure 2 are made of it. In contrast, Mold Star 15 is softer, with a Shore hardness of 15A, and is used to form the regions embedding the magnets and magnetic sensors.
These regions are highlighted in light green in Figure 2. By selectively combining the two silicone materials with different stiffness levels, the soft finger provides sufficient global rigidity to maintain its overall shape while preserving localized flexibility around the embedded sensing components.

3.2. Fabrication

The soft finger was fabricated using a custom mold printed using a 3D printer (PLA). The fabrication process is illustrated in Figure 3.
In the first step, we fabricated nine spacer plates using a 3D printer (PLA) and installed connector-equipped circuit boards on the 2nd, 5th, and 8th plates, each carrying three magnetic sensor chips. On each plate, the sensor chips were wired in parallel through the connector-mounted boards. Figure 3a shows the front and back views of a spacer plate with the sensor chips mounted. The interior of each plate is partially hollowed out to ensure that the silicone rubber forms a continuous structure throughout the finger after casting. After that, we inserted silicone tubes through the ventral-side holes of all nine spacer plates, and polyethylene tendons (tension cords) through both the tubes and the dorsal-side holes. The assembled structure was then placed into the casting mold. A positioning jig for the magnets and a partition plate were also installed in the mold to separate the two types of silicone materials. Figure 3b shows the mold with the installed spacer plates, positioning jig, and partition plate. Subsequently, we poured the softer silicone, Mold Star® 15, into the sensor sections and filled half of them, as shown in Figure 3c. The silicone was degassed under vacuum to remove air bubbles and then cured. After curing, we removed the positioning jigs and inserted cylindrical magnets into the slots formed by the jigs, as illustrated in Figure 3d. Once the magnets were in place, we additionally poured Mold Star 15 into the sensing sections of the finger to encapsulate the magnets and complete the sensing regions. After these sections were fully degassed and cured, we removed the partition plates and filled the final cavity with Mold Star® 30 to form the structural support of the finger, as shown in Figure 3e.
After all these steps, we obtained a composite structure that integrated two silicone materials with different hardness levels to achieve both localized flexibility around the sensing components and global rigidity for the overall finger.

3.3. Magnetic Sensing System

The soft finger used in this study is equipped with EQ730L linear Hall sensors. Each sensor is a uniaxial Hall-effect device that measures the component of magnetic flux density perpendicular to the sensor surface. The arrow in Figure 2a indicates this perpendicular direction. The sensor has a magnetic sensitivity of 130 mV/mT and outputs a voltage linearly proportional to the magnetic flux density. It operates at a supply voltage of 5 V, with the output voltage varying linearly between 0 and 5 V. Moreover, the sensor features a fast response time of 2 µs, enabling it to track dynamic deformations of the soft finger in real time. In addition to its electronic properties, the sensor is highly compact, with external dimensions of 4 mm (width) × 3 mm (height) × 1.4 mm (thickness), allowing it to be embedded into the spacer plates in a fitted configuration. This compactness also allows the sensor to be integrated into the soft finger without compromising flexibility. Another important feature of the sensor is its relatively stable temperature characteristics, as specified by the manufacturer. Under normal indoor operating conditions, the output variation due to temperature changes is limited, and because the soft finger does not contain active heat-generating components, the influence of temperature-induced drift is expected to be minimal.
In our configuration, as shown in Figure 3b,d, each Hall sensor is sandwiched by two cylindrical permanent magnets with identical magnetic orientation. The south poles of the magnets face the front side of the sensor, while the north poles face the back side. By combining magnetic sensors with permanent magnets in this way, the relative distance between them changes as the soft finger bends, leading to variations in the magnetic flux density along the sensor’s detection axis and consequently in the output voltage.
Mathematically, for a cylindrical magnet, the magnetic flux density $B(X)$ along the central axis can be approximated by the following equation:

$$B(X) = \frac{B_r}{2}\left(\frac{L+X}{\sqrt{R^2+(L+X)^2}} - \frac{X}{\sqrt{R^2+X^2}}\right),$$
where $B(X)$ is the magnetic flux density (mT), $L$ is the axial length of the magnet (mm), $R$ is the radius of the magnet (mm), $X$ is the distance from the magnet surface along the axis (mm), and $B_r$ is the remanent magnetization (mT). In our design, we used N40 neodymium magnets with parameters $R = 1$ mm, $L = 2$ mm, and $B_r = 1250$ mT. Based on these parameters, the magnetic flux density $B(X)$ as a function of distance can be calculated analytically. By multiplying $B(X)$ by the magnetic sensitivity of the EQ730L sensor (130 mV/mT), we obtain the theoretical relationship between the sensor-to-magnet distance and the corresponding output voltage. This output voltage is measured by a microcontroller (Arduino Mega) and discretized into an integer value ranging from 0 to 1023 using a 10-bit Analog-to-Digital Converter (ADC); the 10-bit resolution was sufficient for these signals, as the noise levels outweighed the quantization effects. Based on these relations and parameters, we can derive the theoretical curve relating the sensor-to-magnet distance to the integer values obtained from the Arduino Mega. Figure 4 illustrates this curve. Assuming that bending of the soft finger shortens the distance between the magnetic sensor and the magnet by approximately 2 mm, the curve implies that setting the initial distance at around 5 mm allows for a large output change without reaching the sensor’s upper limit. The sensors are arranged according to this consideration in practice, and the effective output range is approximately 2.5–5.0 V under this configuration.
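As a quick sanity check, the relationship above can be evaluated numerically. The sketch below uses the stated magnet parameters and sensor sensitivity; the zero-field output of 2.5 V (mid-supply) and the clipping behavior are our assumptions for illustration, not values taken from the datasheet.

```python
import math

def flux_density(X, Br=1250.0, L=2.0, R=1.0):
    """Axial flux density B(X) in mT at distance X (mm) from a cylindrical magnet face."""
    return (Br / 2.0) * ((L + X) / math.sqrt(R**2 + (L + X)**2)
                         - X / math.sqrt(R**2 + X**2))

def adc_code(X, sensitivity=0.130, v_zero=2.5, v_ref=5.0, bits=10):
    """Map a sensor-to-magnet distance to the 10-bit ADC integer read by the Arduino Mega.
    v_zero (assumed zero-field output at mid-supply) is our assumption, not a datasheet value."""
    v = v_zero + sensitivity * flux_density(X)   # 130 mV/mT sensitivity
    v = min(max(v, 0.0), v_ref)                  # clip to the supply rails
    return round(v / v_ref * (2**bits - 1))
```

Evaluating this sketch reproduces the qualitative behavior of the theoretical curve: the flux density (and hence the ADC code) rises steeply as the nominal 5 mm gap closes by a couple of millimeters, which matches the placement rationale described above.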
Note that magnetic interference exists between the sensor–magnet pairs located on the same layer. However, eliminating this interaction was not the objective of our design. Instead, our intent is to obtain sufficiently rich spatial information for deformation discrimination. To this end, we deliberately use three sensor–magnet pairs arranged laterally to provide left–center–right magnetic responses. Any coupling between pairs is implicitly handled by a learning model, which maps the combined sensor outputs to the estimated finger deformation.

4. Learning to Estimate Finger Shapes

Based on the soft finger design and sensing system described in the previous section, we constructed a supervised learning model to estimate the finger’s shape from time-series sensor data. The model takes motor and sensor signals as input and outputs the estimated 3D positions of four markers placed along the finger.

4.1. Training Data Collection

Before detailing the model architecture, we first describe how the training data were collected, as the data format directly determines the model’s input representation. We collected three types of data: (1) the rotation angles of the motor driving the tendons, (2) the corresponding motor currents, and (3) the output voltages from the magnetic sensors. The actual data acquisition setup is shown in Figure 5. We used Dynamixel XM430-W350 motors (ROBOTIS, Seoul, Republic of Korea) to drive the tendons and could thus directly acquire their rotation angles and current values. The setup during finger actuation, as well as the directions in which each motor contributes to bending, is shown in Figure 6. The ground-truth shapes used for training were obtained through motion capture markers, as shown in Figure 7. Four reflective markers were attached along the side of the soft finger, and an additional marker was placed at the base. The three-dimensional positions of these markers were captured using a motion capture system consisting of nine cameras arranged around the finger. The tracked marker positions were used as the ground-truth representation of the finger shape. The measurement accuracy of the used motion capture system is approximately ±0.5 mm.
During data collection, motor terminating angles were randomly generated to actuate the finger and produce sensing information. In particular, to ensure that the finger motion covered the full operational range without damaging the soft finger, we randomly selected the motor terminating angles within the limits defined by the following equations:
$$\theta_{\text{left}},\ \theta_{\text{right}},\ \theta_{\text{center}} < 250, \qquad \theta_{\text{left}} + \theta_{\text{right}} + \theta_{\text{center}} > 250, \qquad \theta_{\text{left}} \cdot \theta_{\text{right}} = 0$$
Here, $\theta_{\text{left}}$, $\theta_{\text{right}}$, and $\theta_{\text{center}}$ represent the terminating angles of the motors responsible for bending the finger leftward, rightward, and forward, respectively.
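The three constraints can be realized with a simple rejection-style sampler. The sketch below is a hypothetical reconstruction of the procedure (the paper does not specify the sampling distribution); the third constraint is satisfied by construction, since the left and right tendons are never pulled simultaneously.

```python
import random

def sample_terminating_angles(limit=250.0, rng=random):
    """Sample motor terminating angles satisfying the constraints above:
    each angle < limit, their sum > limit, and left/right never pulled together."""
    while True:
        # theta_left * theta_right = 0: actuate one lateral tendon, zero the other
        lateral = rng.uniform(0.0, limit)
        center = rng.uniform(0.0, limit)
        if rng.random() < 0.5:
            left, right = lateral, 0.0
        else:
            left, right = 0.0, lateral
        if left + right + center > limit:   # reject motions that barely move the finger
            return left, right, center
```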
Once the terminating angles were determined, each motor was controlled to rotate toward its target angle, thereby actuating the soft finger while data were collected. The initial posture of the finger was defined as the critical state between deformation and non-deformation. It was set by starting from a state in which the tendons were fully loosened and then gradually pulling them, stopping at the point where the finger began to move. The motor angles were calibrated to zero at this initial posture for each collection routine. The bending motion toward the terminating angles was executed over a duration of 7 s. No periodic recalibration was applied during execution.
An example of the collected data is shown in Figure 8. Note that the sampling rates were approximately 10 Hz for the motor angles and currents, 10 Hz for the magnetic sensors, and 100 Hz for the motion capture system. Because of these differences in sampling frequency, we attached timestamps to all data during collection and aligned the streams afterward. In particular, we resampled the high-frequency motion capture data by selecting, for each low-frequency timestamp, the sample closest in time.
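The nearest-timestamp alignment described above can be sketched as follows, assuming timestamps are sorted in ascending order; function and variable names are illustrative.

```python
import numpy as np

def resample_to(low_ts, high_ts, high_vals):
    """For each low-rate timestamp, pick the high-rate sample closest in time.
    Mirrors aligning ~100 Hz motion-capture data to the ~10 Hz sensor clock."""
    low_ts = np.asarray(low_ts)
    high_ts = np.asarray(high_ts)
    idx = np.searchsorted(high_ts, low_ts)        # insertion points in the sorted array
    idx = np.clip(idx, 1, len(high_ts) - 1)
    left, right = high_ts[idx - 1], high_ts[idx]  # neighbors bracketing each timestamp
    pick_left = (low_ts - left) < (right - low_ts)
    idx = np.where(pick_left, idx - 1, idx)
    return np.asarray(high_vals)[idx]
```

For example, aligning low-rate timestamps `[0.04, 0.26]` against high-rate samples at `[0.0, 0.1, 0.2, 0.3]` selects the samples at 0.0 and 0.3, the temporally closest ones.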

4.2. Network Architecture

We adopted a two-stage neural network model to estimate the 3D shape of the soft finger. The overall architecture of the model is illustrated in Figure 9.
The first stage is an MDN that takes the motor rotation angles ($\theta_1$ to $\theta_3$) as input. It outputs two Gaussian distributions, each representing the predicted 3D positions of the four motion-capture markers attached to the finger (i.e., a 12-dimensional vector). One distribution is assumed to correspond to the non-contact state, and the other to the contact state. By using the negative log-likelihood as the loss function, the model avoids collapsing onto a compromise between the two distributions and instead learns the two predictions distinctly.
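To illustrate why the negative log-likelihood of a two-component mixture avoids mode collapse, the sketch below evaluates the mixture NLL for isotropic Gaussian components in NumPy. This is a didactic simplification of the MDN loss: the actual network predicts the mixture parameters, whereas here they are passed in directly.

```python
import numpy as np

def mixture_nll(y, pis, mus, sigmas):
    """Negative log-likelihood of a target y (12-dim marker vector) under a
    two-component isotropic Gaussian mixture. Under this loss, components can
    specialize to contact/non-contact shapes instead of averaging them."""
    d = y.shape[-1]
    logps = []
    for pi, mu, sig in zip(pis, mus, sigmas):
        sq = np.sum((y - mu) ** 2) / sig**2
        logps.append(np.log(pi) - 0.5 * (sq + d * np.log(2 * np.pi * sig**2)))
    m = max(logps)                      # log-sum-exp for numerical stability
    return -(m + np.log(sum(np.exp(lp - m) for lp in logps)))
```

A sample lying on one mode scores a much lower NLL than the midpoint between the two modes, so the optimizer is rewarded for keeping the components distinct rather than collapsing them onto a compromise shape.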
The second stage is a selection mechanism that takes as input the two predicted means ($\mu_1$, $\mu_2$) from the MDN, along with a 12-dimensional time-series signal consisting of the magnetic sensor outputs ($m_1$ to $m_9$) and motor currents ($I_1$ to $I_3$). Each time-series input consists of 16 consecutive frames and is fed into a GRU to extract temporal features. (In trial experiments conducted during development, we compared sequence lengths of 8, 16, and 32 frames. The results indicated no significant performance differences, suggesting the GRU model is not highly sensitive to sequence length. The final choice of 16 frames was primarily guided by practical considerations of the current system setup: with a sensing and inference rate of approximately 20 Hz, a 16-frame sequence corresponds to roughly one second of temporal information.) The GRU output is then concatenated with $\mu_1$ and $\mu_2$ and passed to a classifier that determines which prediction is more accurate. Cross-entropy is used as the loss function, and the ground-truth label is automatically assigned based on which of the two predicted means is closer to the true marker positions. To account for data imbalance and ambiguity in contact classification, the loss is weighted using the inverse of the class frequencies. Furthermore, when the difference between $\mu_1$ and $\mu_2$ is small, the loss is down-weighted to emphasize clear distinctions between contact and non-contact states. (At the early stage of motion, even when contact exists, the observed behavior is often very similar to the non-contact case. This leads the model to favor non-contact predictions, which artificially increases the overall accuracy due to class imbalance. To address this, we intentionally down-weight the loss in the region where $\mu_1$ and $\mu_2$ are close, preventing the model from being overly penalized in such ambiguous states and reducing the bias toward non-contact responses.)
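The automatic labeling and loss weighting for the Selector can be summarized in a short sketch. The linear down-weighting ramp and the `ambiguity` threshold are illustrative choices of ours; the paper specifies only that the loss is down-weighted when the candidates are close.

```python
import numpy as np

def selector_target_and_weight(mu1, mu2, y_true, class_freq, ambiguity=5.0):
    """Auto-label for the Selector: the MDN mean closer to the ground truth
    defines the correct class. The loss weight combines inverse class frequency
    with a down-weighting term when the two candidates nearly coincide
    (ambiguous contact onset). `ambiguity` (mm) is an illustrative threshold."""
    d1 = np.linalg.norm(y_true - mu1)
    d2 = np.linalg.norm(y_true - mu2)
    label = 0 if d1 <= d2 else 1                 # which candidate is correct
    w = 1.0 / class_freq[label]                  # inverse-frequency weighting
    gap = np.linalg.norm(mu1 - mu2)
    if gap < ambiguity:                          # candidates nearly identical
        w *= gap / ambiguity                     # down-weight ambiguous frames
    return label, w
```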
The model was implemented in Python using PyTorch, and training was performed with the Adam optimizer. The dataset was randomly split into 80% for training and 20% for testing. Learning rate scheduling was handled by the ReduceLROnPlateau method, with an initial learning rate of 0.001, a reduction factor of 0.5, and a patience of 10 epochs. Early stopping was triggered if the test loss did not improve for 30 consecutive epochs. Input data were normalized separately for motor signals and sensor signals.
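For reference, the scheduling logic described above (initial learning rate 0.001, reduction factor 0.5, patience 10, early stopping after 30 non-improving epochs) can be reproduced in a few lines of plain Python that loosely mirror the semantics of PyTorch's `ReduceLROnPlateau`; this is an illustrative re-implementation, not the training code.

```python
class PlateauTrainerState:
    """Minimal re-implementation of the schedule in the paper: halve the LR when
    the test loss stalls for `lr_patience` epochs, stop after `stop_patience`."""
    def __init__(self, lr=1e-3, factor=0.5, lr_patience=10, stop_patience=30):
        self.lr, self.factor = lr, factor
        self.lr_patience, self.stop_patience = lr_patience, stop_patience
        self.best = float("inf")
        self.bad_lr = self.bad_stop = 0

    def step(self, test_loss):
        """Update after one epoch; returns True while training should continue."""
        if test_loss < self.best:
            self.best = test_loss
            self.bad_lr = self.bad_stop = 0
        else:
            self.bad_lr += 1
            self.bad_stop += 1
            if self.bad_lr > self.lr_patience:
                self.lr *= self.factor       # ReduceLROnPlateau-style decay
                self.bad_lr = 0
        return self.bad_stop <= self.stop_patience
```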
The estimation accuracy of the model is evaluated by computing the Euclidean distance between the predicted and measured positions for each of the four markers and averaging the errors over all test samples:

$$E_j = \frac{1}{N}\sum_{n=1}^{N} \left\lVert \hat{\mathbf{p}}_{n,j} - \mathbf{p}_{n,j} \right\rVert, \qquad j = 1, 2, 3, 4,$$

where $\hat{\mathbf{p}}_{n,j}$ denotes the estimated position of marker $j$ for the $n$-th sample, $\mathbf{p}_{n,j}$ represents the ground-truth position obtained from the motion capture system, and $N$ is the number of test samples.
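This metric is straightforward to compute with NumPy; the sketch below assumes predictions and ground truth are stacked as `(N, 4, 3)` arrays (N samples, four markers, xyz coordinates).

```python
import numpy as np

def marker_errors(pred, gt):
    """Per-marker mean Euclidean error E_j over N test samples.
    pred, gt: arrays of shape (N, 4, 3). Returns a length-4 vector."""
    return np.linalg.norm(pred - gt, axis=2).mean(axis=0)
```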

5. Experiments and Analysis

To evaluate the effectiveness of the proposed method, we constructed datasets under two conditions: with and without contact between the soft finger and an object. The contact condition was reproduced by fixing a 3D-printed rectangular block in front of the soft finger. The block was placed at a position 20 mm forward and 135 mm upward from the fingertip of the soft finger, as shown in Figure 7. In the non-contact condition, experiments were conducted without the block. For each condition, we collected 1500 bending motions as training data and recorded an additional ten trials as test data. All data, including motor angles, motor currents, magnetic sensor outputs, and marker coordinates, were synchronized and acquired using the method described in the previous section.

5.1. Comparison by Data Modality

First, we analyzed the impact of different input data modalities on shape estimation accuracy to understand which sensor information contributes most to the estimation. We used the same MDN model structure across all conditions and compared performance by varying the modalities of the input data fed into the Selector. The model incorporating motor current values and magnetic sensor outputs in addition to motor angles was treated as the baseline (Full Modality). It was compared with two variants, each including only one of the additional modalities: (1) Current Only, which uses only motor current values together with motor angles; and (2) Magnetic Sensor Only, which uses only magnetic sensor outputs together with motor angles. The configurations of these modality types are summarized in Table 1.
The comparison results using these varying configurations are summarized in Table 2 for the mean errors and their standard deviations. We can see from the results that the model that relied solely on magnetic sensor data (Magnetic Sensor Only) slightly outperformed the Full Modality model. This is different from our expectation that the motor current would correlate with wire tension and provide useful information for contact detection. The reason was that even when the motor produced similar pulling forces, the measured current differed significantly between motor winding (pulling) and unwinding (releasing) motions, making it less informative for binary contact detection. In contrast to the motor current, the magnetic sensor directly captures local deformation of the soft finger and is more tightly coupled to physical contact events.
Meanwhile, we can also see that the Full Modality model had slightly better performance than the Magnetic Sensor Only one for the non-contact condition, and exhibited slightly larger errors for the contact condition. The Current Only model produced substantially larger errors in both contact and non-contact cases, indicating that motor current signals alone are insufficient for accurately inferring the deformation state of the soft finger. Overall, these results demonstrate that, within the proposed network structure, magnetic sensor data serves as the primary contributing modality. Even when used in isolation, it enables high-accuracy shape estimation.
The inference time measured over 6690 frames was 4.461 s, corresponding to approximately 0.67 ms per frame. The estimator can therefore meet real-time requirements for online servoing.
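The per-frame latency can be reproduced with a simple wall-clock measurement; the sketch below substitutes a dummy function for the trained network (the `estimate` function is a hypothetical placeholder, not the actual model):

```python
import time

def estimate(frame):
    """Hypothetical stand-in for the trained shape estimator."""
    return [(0.0, 0.0, 0.0)] * 4   # four 3D marker positions

n_frames = 6690                     # same frame count as the experiment
start = time.perf_counter()
for f in range(n_frames):
    estimate(f)
elapsed = time.perf_counter() - start
per_frame = elapsed / n_frames
print(f"{per_frame * 1e3:.4f} ms per frame")
```

At the reported 0.67 ms per frame, the estimator comfortably exceeds typical servo-loop rates of a few hundred hertz.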

5.2. Comparison with Other Network Architectures

In this section, we investigate how differences in model architecture affect the accuracy of shape estimation. For comparison, we evaluated two alternative models: (1) a ResNet + GRU model, a time-series regression architecture commonly used for soft robot shape estimation [24], and (2) a Oneshot model, a simplified version of our proposed method with the mode-selection mechanism replaced by hidden states. (A diffusion model method [25] was also investigated but was not included as a baseline, as its performance degraded on our dataset with frequent contact/non-contact transitions, likely due to strong discontinuities and abrupt temporal distribution shifts.) The detailed structures of the two models are illustrated in Figure 10 for comparison.
Here, the ResNet + GRU model serves as a baseline that directly regresses marker coordinates using a standard CNN + RNN structure. The Oneshot model shares the same layer structure as the proposed one but omits explicit selection, making it an ablation comparison target. A summary of the training settings and model architectures used for comparison is presented in Table 3. Note that all candidates ignored the current data, since the current was found to be less significant in the first experiment.
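The loss functions listed in Table 3 can be written out explicitly. The sketch below gives a generic NumPy formulation of the Gaussian negative log-likelihood, the MDN mixture negative log-likelihood, and the Selector's cross-entropy term; this is a textbook form under an isotropic-Gaussian assumption, not the authors' implementation, and the mixture details are assumptions.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under an isotropic Gaussian N(mu, sigma^2 I)."""
    return 0.5 * np.sum(((y - mu) / sigma) ** 2 + 2 * np.log(sigma) + np.log(2 * np.pi))

def mdn_nll(y, pis, mus, sigmas):
    """MDN loss: -log sum_k pi_k N(y | mu_k, sigma_k), via log-sum-exp for stability."""
    log_comp = [np.log(p) - gaussian_nll(y, m, s) for p, m, s in zip(pis, mus, sigmas)]
    m = max(log_comp)
    return -(m + np.log(sum(np.exp(lc - m) for lc in log_comp)))

def cross_entropy(probs, target):
    """Selector loss: negative log-probability of the correct mode index."""
    return -np.log(probs[target])

# Toy example with two mixture components and a two-way Selector.
y = np.array([1.0, 2.0, 3.0])
pis = [0.5, 0.5]
mus = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
sigmas = [1.0, 1.0]
total = mdn_nll(y, pis, mus, sigmas) + cross_entropy(np.array([0.9, 0.1]), 0)
```

The log-sum-exp trick in `mdn_nll` keeps the mixture likelihood numerically stable when one component dominates, which is exactly the regime near clean contact/non-contact separations.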
The comparison results are shown in Table 4 as the mean and standard deviation of errors. The Oneshot model clearly underperformed, while the MDN + Selector and ResNet + GRU models are difficult to distinguish from the error values alone. Figure 11 further shows the ground-truth and predicted trajectories of the fingertip markers for each model. The light red and light blue lines indicate the ground-truth positions under contact and non-contact conditions, respectively, while the solid red and solid blue lines show the corresponding predictions. Although the ResNet + GRU model achieved error levels comparable to the proposed one, its output trajectories exhibited high-frequency noise. This is likely because ResNet + GRU cannot model the multi-modality of the coordinate distribution, so its predictions tend to converge toward intermediate positions between multiple plausible states. The GRU has advantages in capturing temporal dependencies in the sensor signals, allowing the network to exploit short-term dynamics rather than instantaneous measurements. To assess its practical contribution, we conducted comparative experiments in which the GRU module was removed and only the ResNet was used. The results show little difference in average error; however, in terms of the top 1% error, the model with the GRU achieves an improvement of approximately 5 mm. This indicates that the role of the GRU is not to explicitly select modes through interpretable attention or weights, but to provide temporal smoothing and robustness in challenging or ambiguous situations. The Oneshot model, in contrast, produced smoother outputs, but misclassifications were observed near the start and end of contact events, where it mistakenly identified non-contact states as contact.
This implies that the Oneshot model implicitly attempted to perform mode selection based on contact state, but the lack of an explicit mechanism led to erroneous predictions. Overall, the proposed model maintained both smooth output trajectories and high estimation accuracy, demonstrating its robustness and effectiveness across both contact and non-contact conditions.
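The mean and top 1% error statistics used in this comparison can be computed from per-frame Euclidean errors as follows (a straightforward sketch; array shapes, names, and the synthetic data are illustrative):

```python
import numpy as np

def error_stats(pred, gt, top_frac=0.01):
    """Mean error and mean of the worst top_frac of per-frame errors.

    pred, gt: arrays of shape (n_frames, 3) holding marker positions in mm.
    """
    errors = np.linalg.norm(pred - gt, axis=1)       # per-frame Euclidean error
    k = max(1, int(round(len(errors) * top_frac)))   # size of the worst 1%
    worst = np.sort(errors)[-k:]
    return errors.mean(), worst.mean()

# Synthetic check: predictions perturbed around ground truth.
rng = np.random.default_rng(0)
gt = rng.normal(size=(1000, 3))
pred = gt + rng.normal(scale=0.5, size=(1000, 3))
mean_err, top1_err = error_stats(pred, gt)
```

By construction the top 1% figure is at least the mean, so it isolates the worst-case frames (e.g., around contact transitions) that an average alone would hide.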

6. Conclusions and Future Work

In this study, we proposed a 3D shape estimation method that accounts for contact conditions by using a soft finger with internally integrated magnets and magnetic sensors. We constructed an architecture that estimates marker positions from sensor information by combining an MDN and a GRU, with a Selector head determining the final result. The design was based on our observation that MDNs estimate single-contact-state shapes well on their own, and that contact and non-contact states are easily identified from time-series data. The experimental results demonstrated that the proposed method can estimate the fingertip position with an average error of approximately 4 mm, even when trained and tested on datasets containing both contact and non-contact conditions. In particular, it was confirmed that the presence or absence of magnetic sensors has a significant impact on estimation accuracy, highlighting their effectiveness in detecting contact states. Moreover, in terms of model structure, we found that generating multiple candidate coordinates and selecting the appropriate one yields higher accuracy than directly outputting a single coordinate.
However, the proposed method is currently limited to predicting the presence or absence of contact between the soft finger and the environment; it does not explicitly distinguish between different contact conditions and forces. With the current sensor configuration and network architecture, representing and disentangling such fine-grained contact conditions remains challenging. In future work, we plan to jointly refine the sensor design and the network architecture to capture richer contact representations.

Author Contributions

Conceptualization, W.W.; Methodology, N.M. and W.W.; Software, N.M.; Validation, N.M.; Formal analysis, N.M. and W.W.; Investigation, N.M. and W.W.; Resources, W.W.; Data curation, N.M.; Writing—original draft, N.M. and W.W.; Writing—review & editing, N.M. and W.W.; Visualization, N.M.; Supervision, W.W. and K.H.; Project administration, W.W. and K.H.; Funding acquisition, K.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shintake, J.; Cacucciolo, V.; Floreano, D.; Shea, H. Soft Robotic Grippers. Adv. Mater. 2018, 30, 1707035.
  2. Abondance, S.; Teeple, C.B.; Wood, R.J. A Dexterous Soft Robotic Hand for Delicate In-Hand Manipulation. IEEE Robot. Autom. Lett. 2020, 5, 5502–5509.
  3. Teeple, C.B.; St. Louis, R.C.; Graule, M.A.; Wood, R.J. The Role of Digit Arrangement in Soft Robotic In-Hand Manipulation. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 7201–7208.
  4. Kim, S.H.; Chang, H.S.; Shih, C.H.; Uppalapati, N.K.; Halder, U.; Krishnan, G.; Mehta, P.G.; Gazzola, M. A physics-informed, vision-based method to reconstruct all deformation modes in slender bodies. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 4810–4817.
  5. Wang, H.; Xu, H.; Meng, Y.; Ge, X.; Lin, A.; Gao, X.-Z. Deep learning-based 3D pose reconstruction of an underwater soft robotic hand and its biomimetic evaluation. IEEE Robot. Autom. Lett. 2022, 7, 11070–11077.
  6. Dou, W.; Zhong, G.; Cao, J.; Shi, Z.; Peng, B.; Jiang, L. Soft robotic manipulators: Designs, actuation, stiffness tuning, and sensing. Adv. Mater. Technol. 2021, 6, 2100018.
  7. Hegde, C.; Su, J.; Tan, J.M.R.; He, K.; Chen, X.; Magdassi, S. Sensing in soft robotics. ACS Nano 2023, 17, 15277–15307.
  8. Tian, S.; Cangan, B.G.; Navarro, S.E.; Beger, A.; Duriez, C.; Katzschmann, R.K. Multi-Tap Resistive Sensing and FEM Modeling Enables Shape and Force Estimation in Soft Robots. IEEE Robot. Autom. Lett. 2024, 9, 2830–2837.
  9. Loo, J.Y.; Ding, Z.Y.; Baskaran, V.M.; Nurzaman, S.G.; Tan, C.P. Robust Multimodal Indirect Sensing for Soft Robots via Neural Network-Aided Filter-Based Estimation. Soft Robot. 2022, 9, 591–612.
  10. Xie, Z.; Yuan, F.; Liu, Z.; Sun, Z.; Knubben, E.M.; Wen, L. A proprioceptive soft tentacle gripper based on crosswise stretchable sensors. IEEE/ASME Trans. Mechatron. 2020, 25, 1841–1850.
  11. Liu, L.; Huang, X.; Zhang, X.; Zhang, B.; Xu, H.; Trivedi, V.M.; Liu, K.; Shaikh, Z.; Zhao, H. Model-Based 3D Shape Reconstruction of Soft Robots via Distributed Strain Sensing. Soft Robot. 2025, in press.
  12. Tapia, J.; Knoop, E.; Mutný, M.; Otaduy, M.A.; Bächer, M. MakeSense: Automated sensor design for proprioceptive soft robots. Soft Robot. 2020, 7, 332–345.
  13. Krauss, H.; Takemura, K. Enhanced Model-Free Dynamic State Estimation for a Soft Robot Finger Using an Embedded Optical Waveguide Sensor. IEEE Robot. Autom. Lett. 2024, 9, 6123–6129.
  14. McCandless, M.; Wise, F.J.; Russo, S. A Soft Robot with Three Dimensional Shape Sensing and Contact Recognition Multi-Modal Sensing via Tunable Soft Optical Sensors. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 573–580.
  15. Del Bono, V.; McCandless, M.; Wise, F.J.; Russo, S. A soft miniaturized continuum robot with 3D shape sensing via functionalized soft optical waveguides. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 5309–5316.
  16. Stella, F.; Della Santina, C.; Hughes, J. Soft robot shape estimation with IMUs leveraging PCC kinematics for drift filtering. IEEE Robot. Autom. Lett. 2024, 9, 1945–1952.
  17. Pei, G.; Stella, F.; Meebed, O.; Bing, Z.; Della Santina, C.; Hughes, J. IMU based pose reconstruction and closed-loop control for soft robotic arms. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; pp. 1847–1852.
  18. Adamu, Y.A.; Feliu-Talegon, D.; Mathew, A.T.; Renda, F. 3-Axis angular strain estimation with Hall effect sensors for proprioception of soft robotic manipulators. IEEE Robot. Autom. Lett. 2025, 10, 8666–8673.
  19. Baaij, T.A.; Holkenborg, M.K.; Stölzle, M.; Van Der Tuin, D.; Naaktgeboren, J.; Babuška, R.; Della Santina, C. Learning 3D Shape Proprioception for Continuum Soft Robots with Multiple Magnetic Sensors. Soft Matter 2023, 19, 44–56.
  20. Costa, C.F.R.; Reis, J.C.P. End-point position estimation of a soft continuum manipulator using embedded linear magnetic encoders. Sensors 2023, 23, 1647.
  21. Martin-Barrio, A.; Terrile, S.; Diaz-Carrasco, M.; del Cerro, J.; Barrientos, A. Modelling the Soft Robot Kyma Based on Real-Time Finite Element Method. Comput. Graph. Forum 2020, 39, 289–302.
  22. Zhao, Q.; Lai, J.; Huang, K.; Hu, X.; Chu, H.K. Shape Estimation and Control of a Soft Continuum Robot under External Payloads. IEEE/ASME Trans. Mechatron. 2022, 27, 2511–2522.
  23. Thuruthel, T.G.; Shih, B.; Laschi, C.; Tolley, M.T. Soft robot perception using embedded soft sensors and recurrent neural networks. Sci. Robot. 2019, 4, eaav1488.
  24. Kim, D.; Kim, S.H.; Kim, T.; Kang, B.B.; Lee, M.; Park, W.; Ku, S.; Kim, D.; Kwon, J.; Lee, H.; et al. Review of machine learning methods in soft robotics. PLoS ONE 2021, 16, e0246102.
  25. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851.
Figure 1. Overview of the proposed shape estimation method. Magnetic sensors and motor angle/current data are collected from the soft finger and used as input to a neural network consisting of MDN, GRU, and a Selector head. The network predicts the 3D positions of four markers on the finger body, enabling shape estimation under both contact and non-contact conditions.
Figure 2. Overview and structural design of the soft finger with embedded magnets and magnetic sensors. (a) Perspective view of the overall structure. (b) Enlarged view of the sensor unit. The arrows indicate the direction of magnetic flux density detected by the magnetic sensors. (c) Side and top views showing the overall height and the positioning of sensors and magnets. (d) Cross-sectional view of a spacer plate, illustrating the placement of magnets and magnetic sensors.
Figure 3. Fabrication process of the soft finger. (a) Sensor mounting onto the spacer plate. (b) Assembly of tendons and spacers into the mold. (c) First casting of Mold Star 15. (d) Magnet placement and second casting of Mold Star 15. (e) Casting of Mold Star 30 after removing the partition.
Figure 4. Theoretical relationship between the sensor-to-magnet distance and the sensor output. The vertical axis indicates the digitized output from the micro-controller’s 10-bit ADC (0-1023), where the actual analog sensor output ranges from 0 to 5 V, with 512 corresponding to 2.5 V.
Figure 5. Data acquisition system. The soft finger is the blue object in the middle. It is suspended at the center and is measured by nine motion capture cameras.
Figure 6. Motor configuration. The bottom-left motor contributes to bending the soft finger to the left, the top-left motor to the front, and the bottom-right motor to the right.
Figure 7. Placement of markers on the soft finger and the position of the obstacle. Four markers are attached along the side of the soft finger, and one marker is attached to the base. The marker on the base is used as the origin. The obstacle, made of an aluminum frame, is fixed at a position 45 mm in front and 135 mm above the fingertip.
Figure 8. Example of raw data collected for shape estimation. Top: Rotation angles of four actuators. Middle: Motor current values. Bottom: Output voltages from the nine magnetic sensors.
Figure 9. Overview of the Learning Architecture for Shape Estimation. The motor rotation angles are input to an MDN to generate two sets of candidate marker positions. In parallel, the magnetic sensor outputs and motor currents over 16 consecutive frames are fed into a GRU to extract temporal features. The hidden state generated by the GRU, together with the two MDN-generated position candidates, is then passed to a selector network that identifies the correct motion capture marker coordinates (finger pose). The two components of the network are trained sequentially using the collected contact and non-contact data and then stitched together to perform the final shape estimation.
Figure 10. Two alternative models for comparison. (a) ResNet + GRU model. (b) Oneshot model.
Figure 11. Comparison of the measured and estimated trajectories of the fingertip marker for each model. Top: Proposed method; Middle: Oneshot model; Bottom: ResNet + GRU model. Each plot shows the estimation results along the X, Y, and Z axes. The light red and light blue lines represent the measured trajectories under contact and non-contact conditions, respectively. The solid red and solid blue lines show the corresponding estimated trajectories. Although the plots are cluttered, we kept this form for easy comparison and analysis: when the estimation error becomes large, the estimation does not produce completely unreasonable values, but mainly reflects misclassification of the contact state. To avoid confusion, note that the light curves are mostly overlapped by the solid ones.
Table 1. Specifications of different data modality types used for comparison experiments.

| Modality Type | Input 1 Size | Input 2 Size | Angle | Current | Magnetic |
|---|---|---|---|---|---|
| Full Modality (Baseline) | 3 | 12 | ◯ | ◯ | ◯ |
| Current Only | 3 | 3 | ◯ | ◯ | × |
| Magnetic Sensor Only | 3 | 9 | ◯ | × | ◯ |

Note: ◯ indicates that the data modality is included, whereas × indicates that it is excluded.
Table 2. Mean and standard deviation of estimation errors (mm) for contact and non-contact conditions.

| Modality Type | Contact State | Marker1 (Mean / Std.) | Marker2 (Mean / Std.) | Marker3 (Mean / Std.) | Marker4 (Mean / Std.) |
|---|---|---|---|---|---|
| Full Modality (Baseline) | Non-contact | 0.61 / 0.66 | 1.33 / 2.11 | 2.19 / 3.14 | 3.07 / 4.45 |
| Full Modality (Baseline) | Contact | 1.11 / 1.15 | 2.86 / 3.28 | 4.38 / 4.94 | 5.61 / 5.99 |
| Current Only | Non-contact | 2.17 / 3.26 | 6.09 / 9.86 | 9.78 / 15.54 | 12.68 / 19.34 |
| Current Only | Contact | 1.20 / 1.59 | 3.12 / 4.61 | 4.80 / 7.11 | 6.20 / 8.84 |
| Magnetic Sensor Only | Non-contact | 0.63 / 0.69 | 1.40 / 2.19 | 2.31 / 3.55 | 3.23 / 4.63 |
| Magnetic Sensor Only | Contact | 0.94 / 0.91 | 2.48 / 2.64 | 3.85 / 4.00 | 4.89 / 4.58 |
Table 3. Summary of each model used in the comparison experiments.

| | MDN + Selector | Oneshot | ResNet + GRU |
|---|---|---|---|
| Rotation Input | 1 frame | 1 frame | 16 frames |
| Magnetic Sensor Input | 16 frames | 16 frames | 16 frames |
| Explicit Selection | Yes | No | No |
| Loss Function | MDN negative log-likelihood + cross-entropy | Gaussian negative log-likelihood | MSE |
| Purpose | Proposed | Ablation | Baseline |
Table 4. Comparison of mean and standard deviation of estimation errors (mm) under contact and non-contact conditions.

| Model | Contact State | Marker1 (Mean / Std.) | Marker2 (Mean / Std.) | Marker3 (Mean / Std.) | Marker4 (Mean / Std.) |
|---|---|---|---|---|---|
| MDN + Selector | Non-contact | 0.61 / 0.66 | 1.33 / 2.11 | 2.19 / 3.14 | 3.07 / 4.45 |
| MDN + Selector | Contact | 1.11 / 1.15 | 2.86 / 3.28 | 4.38 / 4.94 | 5.61 / 5.99 |
| Oneshot | Non-contact | 0.48 / 0.45 | 0.93 / 1.31 | 1.50 / 2.11 | 2.25 / 2.79 |
| Oneshot | Contact | 1.61 / 1.34 | 4.24 / 4.19 | 6.45 / 6.63 | 8.21 / 8.50 |
| ResNet + GRU | Non-contact | 0.61 / 0.59 | 1.76 / 1.69 | 2.99 / 2.73 | 4.33 / 3.55 |
| ResNet + GRU | Contact | 0.68 / 0.56 | 1.76 / 1.62 | 2.72 / 2.53 | 4.10 / 3.25 |

Share and Cite

MDPI and ACS Style

Matsuyama, N.; Wan, W.; Harada, K. Three-Dimensional Shape Estimation of a Soft Finger Considering Contact States. Appl. Sci. 2026, 16, 717. https://doi.org/10.3390/app16020717
