Proposal and Implementation of a Procedure for Compliance Recognition of Objects with Smart Tactile Sensors

This paper presents a procedure for classifying objects based on their compliance, with information gathered using tactile sensors. Specifically, smart tactile sensors provide the raw moments of the tactile image when the object is squeezed and desqueezed. A set of simple parameters from the moment-versus-time graphs is proposed as features to build the input vector of a classifier. The extraction of these features was implemented in the field programmable gate array (FPGA) of a system on chip (SoC), while the classifier was implemented in its ARM core. Many different options, of varying complexity, were realized and analyzed in terms of resource usage and classification accuracy. A classification accuracy of over 94% was achieved for a set of 42 different classes. The proposed approach is intended for developing architectures with preprocessing on the embedded FPGA of smart tactile sensors, to obtain high performance in real-time complex robotic systems.


Introduction
Tactile sensors are increasingly being used in a variety of applications. This has led to the development of electronic skin (e-skin) devices, which open up possibilities for healthcare, human-machine interfaces, virtual reality, artificial intelligence, and robotic applications [1]. In this context, object manipulation and interaction with the environment involve the detection of properties such as the friction coefficient, texture, geometry, and stiffness [2,3].
Stiffness can be estimated in a fairly straightforward manner from measurements of force and displacement when the object is squeezed [4]. However, the force-displacement curve of a compliant object can be far from linear, being quite complex instead [5]. In reference [5], functional principal component analysis (FPCA) was used to approximate the force-displacement curve obtained from two pressure sensors with three basis functions. Eight different soft objects were classified, and the best results were obtained with a k-NN classifier. In [6], eight parameters were computed from the readings of two force sensors as a vegetable was squeezed until it collapsed. These parameters were the mean value, variance and standard deviation, maximum force, quality factor, quartiles, and quartile factor. All data recorded from the two sensors were reduced to these eight parameters, and a decision tree algorithm obtained good results in classifying tomatoes into three categories depending on their stiffness. Hardness can also be estimated from an analysis of data such as the reaction force when an object is explored using predefined motions [7,8].
On the other hand, the compliance of an object can be estimated from the output of tactile sensors, taking advantage of advanced artificial hands and grippers equipped with them. The whole tactile image can be the input of complex neural network architectures such as convolutional neural networks (CNN), which estimate the stiffness from the image.

Proposed Features for Classification
As mentioned in the Introduction, the raw moments of the tactile image are computed locally in the robotic finger and the palm [18,19]. In particular, the raw moment of order {p, q} of the tactile image is computed as follows:

M p,q = Σ_x Σ_y x^p y^q I(x, y)    (1)

where I(x, y) is the output of the taxel at row x and column y.
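As a concrete illustration, the moment computation in (1) can be sketched in a few lines (a minimal NumPy model of the sum; on the sensor itself, this accumulation is performed by the embedded FPGA):

```python
import numpy as np

def raw_moment(image, p, q):
    # M_{p,q} = sum over all taxels of x^p * y^q * I(x, y),
    # with x the row index and y the column index of the taxel
    x = np.arange(image.shape[0]).reshape(-1, 1)
    y = np.arange(image.shape[1]).reshape(1, -1)
    return float(np.sum((x ** p) * (y ** q) * image))

# A 2x2 contact patch on a 4x4 taxel array
img = np.zeros((4, 4))
img[1:3, 1:3] = 1.0
m00 = raw_moment(img, 0, 0)  # total activation: 4.0
m10 = raw_moment(img, 1, 0)  # row-weighted sum: 6.0
```

Note that M 0,0 is simply the total activation of the image, so it grows as the contact area spreads while the object is squeezed.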
The raw moments of the tactile images are registered versus time as the objects are compressed and decompressed. Figure 1 shows these curves for an example object (potato) and moment (M 0,0 ). To significantly reduce the consumption of resources and the dimensions of the classifier input feature vector, the following set of features is defined from this graph: the area under the curve (AR), the maximum value (MAX), the magnitude in the ascending part at 2/3 of the time needed to reach the maximum (H1), and the magnitude at 1/3 of the same time after the maximum is reached (H2). H1 and H2 are related to the object hysteresis, which can be significant in objects made of elastomers. This hysteresis causes the graph to not be symmetric with respect to the axis defined by the peak. The three instants related to the features MAX, H1, and H2 are known in advance, since the whole sequence is controlled by the robotic system. Therefore, the corresponding features do not require any computation and can be directly registered; moreover, the area under the curve only requires a cumulative sum. Note that there are six different raw moment curves (M p,q ) for each exploration, which involves both the finger (f) and the palm (p). When we consider the raw moments for both sensors and the four feature values (AR, MAX, H1, H2), there are a total of 48 features for each exploration. To avoid the issue of having redundant information, we included the option of reducing the data dimensions using principal component analysis (PCA) techniques.
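The four features can be read off a sampled moment-vs-time curve as in the sketch below (an illustrative model, not the hardware implementation; the index names t_max, t_h1, and t_h2 are assumptions standing for the instants supplied by the gripper controller):

```python
import numpy as np

def extract_features(curve, t_max, t_h1, t_h2):
    # AR: area under the curve, obtained with a cumulative sum
    ar = float(np.sum(curve))
    # MAX, H1, H2: direct reads at instants known in advance,
    # since the squeeze-desqueeze sequence is controlled by the robot
    return ar, float(curve[t_max]), float(curve[t_h1]), float(curve[t_h2])

# Symmetric triangular curve (no hysteresis): H1 equals H2
curve = np.concatenate([np.arange(10), np.arange(8, -1, -1)]).astype(float)
t_max = 9   # peak instant
t_h1 = 6    # 2/3 of the rise time, ascending side
t_h2 = 12   # 1/3 of the rise time after the peak
ar, mx, h1, h2 = extract_features(curve, t_max, t_h1, t_h2)
# ar = 81.0, mx = 9.0, h1 = h2 = 6.0
```

For an elastomer with hysteresis, the descending side of the curve would be lower, so H2 would fall below H1.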

Materials and Methods
In this section, we start by introducing the artificial finger and palm sensors used to gather the tactile data. Then, we describe the experimental setup and the objects of study. Afterwards, we explain the procedure we followed to collect the moment sequences during the explorations and the training algorithm used to classify the objects.

Sensor Technology
In this study, we acquired tactile data using two smart sensors from the tactile suite of the artificial hand reported by the authors in [19] (see Figure 2). The artificial finger shown in Figure 2a is equipped with tactile sensors made of a laser-isolated piezoresistive layer on an array of electrodes. The outer layer or cover is made of a thermoplastic elastomer (Filaflex ® ), and it has one dome per taxel in the tactile array. This design helps to concentrate the force and reduce the crosstalk between taxels. The sensor is capable of registering changes in the area and size of the tactile image when the objects are pressed and deformed. The spatial resolution of the sensor, that is, the minimum distance between two taxels, is 3.7 mm, and the size of the sensor is 40.7 × 15.0 mm. The electronics of the sensor are based on an FPGA (Spartan-6 ® ). Data are sampled at a frequency of F s = 485 Hz, or a sampling period of T s = 2.06 ms.
The artificial palm in Figure 2b was built in a similar way. However, unlike the finger, the palm sensor does not require laser isolation of the piezoresistive material, because the crosstalk is reduced by the electronics. Therefore, a cover made of a continuous rubber attached to a continuous piezoresistive layer is placed atop the electrodes. The palm sensor electronics are also based on a Spartan-6 ® FPGA, which implements the interface with the raw palm tactile sensor and also communicates with the finger sensor through an SPI serial bus, and with a personal computer through USB.
Figure 2. (a) Artificial finger, made of a semi-rigid printed circuit board with a structured cover that helps to concentrate the force on the taxels (force-sensing units in the tactile array). (b) Artificial palm with continuous cover. Dimensions are given in mm.
Figure 3 shows the experimental setup built to perform the object explorations. The artificial finger with the smart tactile sensor is at the top of Figure 3, while the palm is placed at the bottom. A strain gauge provides a reference value of the force exerted by the palm. The finger is moved along the vertical axis by a motor controlled via an Arduino Mega 2560 ® board, so that a compression-decompression movement between the finger and palm can be carried out. As stated above, the tactile data and locally computed moments are gathered by the finger and palm electronics and sent to the computer via USB.

Data Gathering Procedure
The following procedure was performed to obtain data for training and testing:
• Step 1: An object from the set shown in Figure 4 and Table 1 was manually placed between the artificial finger and the palm;
• Step 2: The palm was moved vertically to grasp the object, until the load cell detected a low-level threshold force of F init = 0.1 N. This position was recorded as the initial point;
• Step 3: The palm was moved further vertically, so that the object was compressed until the palm reached a maximum relative distance from the initial point of approximately 1.2 cm (although the palm-finger gripper had a certain compliance, this limit was enforced to avoid damage to the system when rigid objects were explored);
• Step 4: The palm was moved vertically in the reverse direction, so that the object was decompressed until the initial position defined in Step 2 was reached.
This procedure was repeated 47 times per object at a velocity of v = 10 mm/s. During the squeeze-desqueeze sequences, the finger and palm tactile sensors collected data in real time; the data were sent to a personal computer and saved in text files using a Labview ® application. Figure 5 provides an overview of the methodology for the exploration of object #OBJ-17. Once the experiments had been performed with the 42 objects shown in Figure 4, we obtained the first six raw moment curves (1) of the tactile images of these objects. Figure 6 displays the first six raw moment M p,q curves for the finger and palm, for objects #OBJ-17 and #OBJ-38 in Figure 4 and Table 1, respectively.

Training Algorithm
This study utilized an unsupervised k-means classifier to obtain the results. The k-means classifier was selected due to its simplicity and speed of convergence compared to other machine-learning techniques, such as support vector machines (SVM) or k-nearest neighbors (k-NN), which require a significant amount of data storage for real-time tasks. The k-means++ method [20] was used to initialize the centroids, which represent the different classes, either randomly or in a systematic manner. The training process involved updating the centroids in response to the presented training set. This process was repeated with shuffled data until the centroids did not change or a maximum number of iterations was reached. The final trained classifier was selected based on the best accuracy achieved over one hundred repetitions of the whole procedure, using data from the test set [21].
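The training loop described above can be sketched as follows (a simplified NumPy model with k-means++ seeding on synthetic data; the actual repetition count, shuffling, and cluster-to-class mapping follow [20,21]):

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    # k-means++ seeding: pick each new centroid with probability
    # proportional to its squared distance from the nearest chosen one
    centroids = [X[rng.integers(len(X))]]
    while len(centroids) < k:
        d2 = np.min(((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(-1), axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)

def kmeans_fit(X, k, rng, max_iter=100):
    C = kmeans_pp_init(X, k, rng)
    for _ in range(max_iter):
        # assign each sample to its nearest centroid
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        newC = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                         for j in range(k)])
        if np.allclose(newC, C):  # stop when the centroids no longer move
            break
        C = newC
    return C, labels

# Two well-separated synthetic classes of feature vectors
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
C, labels = kmeans_fit(X, 2, rng)
```

At inference time, classifying a feature vector reduces to finding its nearest stored centroid, which is why the method is light enough for the embedded ARM core.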

Implementation on the Zynq7000 ® SoC
To demonstrate the feasibility of the approach and estimate its performance, a two-step procedure was followed. First, data gathering was carried out using the finger and palm sensors, as described in Section 3.4, and the obtained moment-time graphs were transferred to the external DDR3 memory of an AVNET ® ZedBoard™ development board. This board is based on the Zynq™-7000 System on Chip (SoC) XC7Z020-CLG484-1 device, which has an FPGA and an ARM ® dual-core Cortex™-A9 processor. The feature extraction procedure described in Section 2 was then implemented on the FPGA of this SoC, while the classification algorithm was implemented in the ARM core. The use of this development board added flexibility for assessing the different alternatives before the implementation of a final system, where the features from the finger and palm sensors would be transferred directly to a specific board with an embedded processor implementing the classifier. Figure 7 illustrates this system and the overall datapath and processing logic implemented on the FPGA of the SoC. The FIFO (first-in, first-out) output (32-bit words) was transmitted to an AXISTREAM Serial to Parallel Interface, which utilized the AMBA ® AXI-Streaming protocol for the communication between the FPGA hardware modules in the system. This allowed the input data (raw moments M p,q of the finger and palm) from the FIFO to be distributed to multiple preprocessing modules (VHDL Features Computing Module in Figure 7), while synchronization signals ensured their parallel execution in hardware. This parallelism exploits the capabilities of the FPGA: even in a scenario where both the finger and palm and the maximum number of raw moments and features were utilized, the execution time remained unchanged compared to using a single module. This is particularly useful when the complexity of the system increases.
The Vivado Design Suite™ environment was used to implement the preprocessing modules in Figure 7. This software integrates hardware-description code written in VHDL/Verilog, as well as presynthesized cores from IP libraries or from the High-Level Synthesis (HLS) tool. The VHDL Features Computing Module blocks produce synchronized output data (D p,q ) for the finger and palm, respectively. These data are then transmitted to an AXISTREAM Parallel to Serial Interface through the use of the AMBA ® AXI-Streaming protocol. The feature vectors are serially transferred to a DDR memory for storage via a FIFO buffer and a DMA module. This setup allows the ARM core to access the stored data without having to manage data traffic. The FIFO output module allows frequency decoupling between the processing logic (PL) and processing system (PS) parts. Figure 8 illustrates how the AR, MAX, H1, and H2 features are obtained from the {p, q} order moment of the tactile images (Equation (1)) in the VHDL Features Computing Module. The instants required to read the values of H1, MAX, and H2 are provided in a field of the 32-bit input data (they are determined by the gripper controller in the squeeze-desqueeze sequence), and their reading only requires simple comparisons implemented in LUT logic. On the other hand, the AR feature is computed with an adder that takes the 32-bit input data per clock cycle and adds it to the aggregated summation stored in a register. The values of the features are then concatenated into a single 32-bit output. This module also uses the t_valid and t_ready signals of the AMBA ® AXI-Streaming protocol for input/output synchronization. The clock frequency is f clk = 100 MHz. The feature output is provided only three clock cycles after the last input vector is read. An optional module to perform PCA is included in Figure 7.
The inclusion of this module requires more memory and logic resources (Section 5), but a smaller number of features can be sent to the classifier implemented in the ARM core. The HLS principal component analysis (PCA) computing module in Figure 7 was developed using SystemC in Vivado ® HLS. The corresponding pseudocode is shown in Figure 9. The #pragma HLS DATAFLOW optimization directive enhances the concurrency of the RTL implementation, while a pipelined implementation at the RTL level is achieved through the #pragma HLS PIPELINE and #pragma HLS INLINE directives. For each new D p,q feature vector, the pre-computed µ vector is subtracted, and the result is then multiplied by the PCA matrix coefficients coeff.
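The per-vector operation of this module amounts to a mean subtraction followed by a matrix product, as the sketch below models in NumPy (illustrative only; the pca_train step stands in for the offline computation of the pre-computed µ and coeff):

```python
import numpy as np

def pca_project(D, mu, coeff):
    # As in the HLS module: subtract the pre-computed mean vector,
    # then multiply by the (d x nC) PCA coefficient matrix
    return (D - mu) @ coeff

def pca_train(X, n_components):
    # Offline step: mean vector and top eigenvectors of the covariance matrix
    mu = X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X - mu, rowvar=False))
    order = np.argsort(vals)[::-1][:n_components]
    return mu, vecs[:, order]

# 48-dimensional feature vectors reduced to 3 principal components
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 48))
mu, coeff = pca_train(X, 3)
Y = pca_project(X, mu, coeff)  # shape (100, 3)
```

Since µ and coeff are fixed after training, the online cost per feature vector is just d subtractions and a d × nC multiply-accumulate, which maps naturally onto a pipelined RTL implementation.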
Finally, the k-means classifier (Section 3.5) was implemented in the ARM core using the SDK ® (Software Development Kit) from Xilinx ® . For this work, the classifier was trained offline, and the calculated centroids were stored in the memory of the SoC. High-speed communication with a personal computer was achieved using a real-time operating system (RTOS) and a lightweight TCP/IP stack, both implemented in the ARM core.

Results and Discussion
This section shows the results obtained from the object palpations described in Section 3. The processes of data gathering, feature extraction, and training for classification were explained in Section 3.4, Section 2, and Section 3.5, respectively.

Results Obtained without PCA
In order to determine the best implementation, we considered different combinations of sensors (finger, palm), features (see Table 2), numbers of image moments per sensor, and bits per feature. Matlab ® was used to obtain the results in Table 3, which shows the highest accuracy percentage achieved without PCA for different combinations of M p,q moments, features, and bits per feature (nbits/feature).
Table 2. Labels for the combinations of features: area under the curve (AR), maximum value (MAX), magnitude in the ascending part at 2/3 of the time needed to reach the maximum (H1), and magnitude at 1/3 of the same time after the maximum was reached (H2).
To assess the consumption of resources, we first identified the cases with high accuracy (bolded in Table 3). These selected cases were implemented on the Zynq™-7000, and the consumption of resources was taken from the Utilization Report provided by the Vivado IDE ® . The results are shown in Table 4.
Table 4. Performance data for the implementation of the feature extraction without PCA (VHDL features computing module in Figure 7), as provided by the Vivado IDE ® Utilization Report. The last two columns are the input-output delay of the module (in ns) and its power consumption (in mW).

As shown in Table 4, only a limited number of memory elements (distributed memory (LUTRAM), embedded block RAMs (BRAM), and flip-flop pairs (FF)) were required for these implementations; specifically, only slice registers (flip-flops) were used. It is also worth noting that neither BRAMs nor embedded DSPs were used in the computation of the features, which makes the migration of this design to other FPGA platforms easier.
At this point, to compare different options, we proposed the use of the figure of merit defined in (2).
where α corresponds to the classification accuracy, d is the total number of input features for the classifier, nBits is the number of bits per feature, and r is the total number of hardware resources, as defined in (3). Figure 10 shows the result of this figure of merit for the cases in Table 4. The labels in the figure show the corresponding case and its classification accuracy. The best performance in terms of hardware resources and accuracy was achieved by the case Finger&Palm (nMom1-12bits/feature-d5), because it required fewer resources than the other implementations, which used more moments per sensor (right side of Figure 10). These other implementations achieved a high accuracy but consumed more hardware resources. In contrast, the Finger&Palm (nMom1-12bits/feature-d5) implementation only used the first raw moment for both the finger and palm, the d5 feature (combination of the AR and MAX features in Table 2), and 12 bits per feature, while achieving a classification accuracy close to 94%.
The confusion matrix for the best case, Finger&Palm (nMom1-12bits/feature-d5) in Figure 10, is depicted in Figure 11. It shows that this implementation accurately classified the 42 objects in Figure 4, reaching an accuracy of 94.5% in our experiments. Despite the large size of the test set (673 feature vectors), there were only a few misclassifications.
Figure 11. Confusion matrix of the optimal implementation in Figure 10. This case (second row of Table 4) used the first raw moment M 0,0 for both finger and palm, the d5 feature, and 12 bits per feature.
With respect to the computation time, it was the same for all cases in Table 4 (three clock cycles after the last input vector was read, see Section 4), thanks to the parallel implementation illustrated in Figure 8.

Results Obtained with PCA
When PCA was applied, the accuracy results for the same cases as in Table 3 were as shown in Table 5. Figure 12 shows a comparison in a three-dimensional feature space with three principal components, when the finger, the palm, and the finger-palm combination were used. In all cases, six moments were used, the d5 feature was applied, and there were 8 bits per feature. The training sets for each class in Figure 12 are represented as Gaussian ellipsoid distributions. The centroid of each class, a 3-value array, was determined through the training procedure outlined in Section 3.5. The classes in Figure 12c are more separated from each other than in Figure 12a,b; therefore, the classifier was more accurate when it received information from multiple sources. This can be seen in Table 5 and was also observed when PCA was not applied, as mentioned in the previous section.
As when PCA was not applied, we selected the top-performing cases in terms of accuracy (bolded in Table 5). The consumption of hardware resources in these cases is presented in Table 6. This time, the number of cycles the PCA computation took depended on the number of PCA components (see the last column in Table 6). The consumption of memory resources was similar in all cases, which means that increasing the feature vector dimension did not imply a significant increment in hardware resources. Figure 13 shows a comparison of the cases in Table 6 with the figure of merit defined in (2). The best case was now Finger&Palm, nMom1 (d5), nBits/feature 12, nC3, and its corresponding confusion matrix is displayed in Figure 14.
Figure 12. Comparison in the three-dimensional principal component space for (a) the finger, (b) the palm, and (c) both sensors. In all cases, the number of moments was six, the feature used was d5 from Table 2, and the number of bits per feature was 8. PC1, PC2, and PC3 stand for principal components 1, 2, and 3, respectively.
Table 6. Performance data for the implementation of the feature extraction with PCA (VHDL features computing module plus HLS principal component analysis (PCA) computing module in Figure 7), as provided by the Vivado IDE ® Utilization Report. The columns of the table are the same as in Table 4.
Figure 13. Result of the figure of merit in (2), normalized with respect to the best case in Figure 10.
Figure 14. Confusion matrix of the optimal implementation in Figure 13. This case (second row of Table 6) used the first raw moment M 0,0 for both finger and palm, the d5 feature, 12 bits per feature, and three principal components.
Figure 15 depicts the result of adding the input-output delay of the feature extraction (see Tables 4 and 6) to the delay of the classifier. The latter was measured with a Tektronix ® MDO4104B-6 Mixed Domain 6 GHz Oscilloscope. The dimension of the feature vector (number of features) is the variable on the x-axis of Figure 15.
It can be seen that the larger the number of features, the longer the classifier took to provide an output. Since the number of features was lower in the implementations with the PCA module, these realizations performed better in this respect.
Figure 15. Input-output delay of the feature extraction plus the classifier for the cases in Tables 4 and 6, for the implementations without PCA and with PCA, respectively.

Conclusions
This paper proposes a strategy for recognizing the compliance of objects with tactile sensors. This strategy is intended to be implemented in smart sensors that have embedded electronics. Specifically, previously reported sensors with electronics based on FPGAs were mounted on the gripper. These sensors provide the raw moments of the tactile image, and they are registered when the target object is squeezed and desqueezed. Four parameters from the thus-obtained curve were proposed as input features for the classifier. Principal component analysis was also considered, to reduce the dimensions of the feature vector. Many different options were then analyzed, depending on the number of moments and features, and the size in bits of these features, in terms of the classification accuracy and performance of the realization in the Zynq™-7000 System on Chip. The feature extraction was carried out on the FPGA of the SoC, while the k-means classifier was implemented in its ARM core.
From the analysis of the results, it can be concluded that the best results in terms of classification accuracy were achieved when both the finger and palm sensors were used (above 94% accuracy). In this case, the implementation of the feature extraction on the FPGA that did not apply PCA was much more efficient in terms of resources (see Tables 4 and 6) and power consumption (2 mW versus 32 mW with PCA).
The application of PCA could be advantageous to reduce the input-output delay of the classifier and increase the separation between classes. This work was carried out with a large set of 42 classes, and the application of PCA provided a slight improvement in the input-output delay. Moreover, it is interesting to consider the case of using only the finger sensor, which simplified the system. In this case, if the first four raw moments of the tactile image were registered, the achieved classification accuracy was 91.7% without PCA and 90.9% with PCA and three components. Therefore, the classification accuracies were similar, although the consumption of resources by feature extraction was much higher when PCA was applied, and the power consumption of this part was 34 mW versus 4 mW without PCA. However, the time of the feature extraction plus the input-output delay of the classifier was 65.7 µs with PCA and 82 µs without PCA.
In summary, with the setup and set of classes used in this paper, the procedure that did not apply PCA was better, because the proposed strategy based on four simple features from the raw moments graphs resulted in a very efficient realization. An improvement in the input-output delay was observed with PCA, which was more significant if only the finger sensor was used. The choice between both options also depends on the complexity of other tasks that have to be carried out in real time.
Further works could be carried out considering different aspects. First, the influence of the object size and spatial resolution of the tactile sensors on the classification performance and consumption of resources should be assessed. Second, other learning and classification algorithms could be implemented and evaluated. Third, the capability for recognizing the object compliance has to be integrated with other functions, such as that of texture detection. Finally, the proposed procedure could be implemented in other commercial artificial hands and grippers equipped with tactile sensors.

Data Availability Statement:
The processing codes and data segments can be obtained by contacting the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.