Machine Learning for Touch Localization on an Ultrasonic Lamb Wave Touchscreen

Classification and regression employing a simple Deep Neural Network (DNN) are investigated to perform touch localization on a tactile surface using ultrasonic guided waves. A robotic finger first simulates the touch action and captures the data to train a model. The model is then validated with data from experiments conducted with human fingers. The localization root mean square errors (RMSE) in time and frequency domains are presented. The proposed method provides satisfactory localization results for most human–machine interactions, with a mean error of 0.47 cm and standard deviation of 0.18 cm and a computing time of 0.44 ms. The classification approach is also adapted to identify touches on an access control keypad layout, which leads to an accuracy of 97% with a computing time of 0.28 ms. These results demonstrate that DNN-based methods are a viable alternative to signal processing-based approaches for accurate and robust touch localization using ultrasonic guided waves.


Introduction
The popularity of tactile sensor technologies in daily life (e.g. cell phones, access keypads, smart screens) leads to an increasing demand for low-cost, robust touch interfaces. Amongst the existing technologies [1,2,3], tactile surfaces based on ultrasonic guided waves answer the need for converting non-planar surfaces (including plastic materials and transparent glass) into high-resolution and durable tactile sensing surfaces. Moreover, this technology supports multi-touch detection and can estimate the contact pressure [4,5]. Acoustic-based tactile sensors exploit either Surface Acoustic Waves (SAW) [6,7] or guided Lamb Waves (LW) [8,4,9,10,11,12,13]. SAWs travel over the surface [14], whereas LWs propagate throughout the thickness of the material. SAW tactile surfaces are vulnerable to surface contaminants such as liquids and scratches, and can only offer limited multiple-finger commands. On the other hand, Lamb Wave Touchscreen (LWT) technology supports multi-touch, which makes it more suitable for interfacing with smart devices, with a reported positioning accuracy of 95%. To improve the localization performance, Li [22] proposed an ML-based Lamb wave scatterer localization method, called CNN-GCCA-SVR. They trained a CNN-GCCA (a deep version of generalized canonical correlation analysis) to extract the features, then used a Support Vector Regression (SVR) model for the localization task. This algorithm provides precise predictions using only one actuator and two sensors, with localization errors between 2 mm and 12 mm depending on the sensing configuration.
The development of ML and DL methods in relation with ultrasonic Lamb wave technologies in recent years, and the demonstrated performance of AI in localization as mentioned above, encourage further investigation of ML and DL algorithms to improve touch localization accuracy. In this study, a simple Deep Neural Network (DNN) is proposed to localize a finger in contact with a surface, using classification and regression approaches. The proposed approach also shows that a neural network can be trained for a specific keyboard layout to perform classification with high accuracy. To the best of our knowledge, this is the first time that a simple fully connected neural network is used for robust touch localization on a tactile surface exploiting ultrasonic guided waves. This work is organized as follows. In section 2, the hardware setup is described. The dataset is introduced in section 3. In section 4, the localization methods employing classification and regression are investigated, and the results are discussed in section 4.3. The applications of the methods, including an access control keypad (section 5.1) and the tracking of touches by human fingers (section 5.2), are introduced in section 5, which is followed by a conclusion (section 6).

Hardware Setup
Figure 1 shows the processing pipeline. An array of 5 piezoceramic elements soldered to a flexible Printed Circuit Board (PCB) is bonded to the touch surface with regular epoxy. A custom LabVIEW interface is used to 1) control an NI-9262 module that emits an ultrasound signal through a linear amplifier driving the emitting piezoceramic, 2) control an NI-9223 acquisition module to record the ultrasound signals measured by the four receiving piezoceramics and amplified by a custom preamplifier, and 3) control a collaborative Universal Robot UR5e with six degrees of freedom (DoFs), which touches a glass surface with an artificial silicone finger at a desired position.
During the acquisition process, an emission signal excites the emitting piezoceramic element, which converts it into a mechanical wave that propagates through the host structure. When a finger touches the structure, it modifies the surface mechanical impedance, inducing reflections of the vibration waves. These reflected waves are measured by the receiving piezoceramics, which convert them into electrical signals.
The touch surface is a tempered glass plate (shown in Figure 2) with dimensions of 20 × 20 cm and a thickness of 5 mm. The four receivers and the emitter are installed at the bottom of the glass, and consist of piezoceramic discs with a diameter of 6 mm and a thickness of 2 mm. The Universal Robot UR5e is equipped with a silicone finger with a diameter of 5 mm. Silicone is a material of choice for this application, as its mechanical impedance is close to that of a human finger. The x and y coordinates of the contact, as well as the contact pressure, are controlled accurately by the UR5e and recorded by the computer, which provides a reliable baseline.

Dataset
The robotic arm produces 6404 contacts at random positions and pressures on the glass surface. The robot was set to acquire contacts at random positions over the touch glass, and the experiment ran until a sufficient amount of data was generated. This dataset is split into training (70%), validation (20%) and test (10%) sets. For each touch, a linear chirp excitation signal of 1 ms duration, sweeping from 50 kHz to 100 kHz, is generated with a sampling frequency of 500 kHz. The receiver signals are acquired during 1 ms at a sample rate of 250 kHz. Figure 3 shows how the features are extracted in the time and frequency domains. In the time domain, the 250-sample signals acquired by each of the 4 receivers are simply concatenated, yielding a feature vector of 1000 elements. In the frequency domain, a Hann window is applied and the Fast Fourier Transform (FFT) bins in the frequency range between 50 kHz and 100 kHz are selected, which leads to 4 vectors of 49 complex elements. Their real and imaginary parts are concatenated to create a feature vector of 392 elements.
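The feature extraction described above can be illustrated with a minimal NumPy sketch. The function name and the exact bin selection are assumptions: the band is taken strictly between 50 kHz and 100 kHz, which for a 250-sample record at 250 kHz yields the 49 bins per receiver stated above.

```python
import numpy as np

def extract_features(signals, fs=250e3, domain="frequency"):
    # signals: (4, 250) array, one 1 ms record per receiving piezoceramic
    if domain == "time":
        return signals.reshape(-1)                  # 4 x 250 = 1000 elements
    win = np.hanning(signals.shape[1])              # Hann window
    spec = np.fft.rfft(signals * win, axis=1)
    freqs = np.fft.rfftfreq(signals.shape[1], d=1.0 / fs)
    band = (freqs > 50e3) & (freqs < 100e3)         # 49 bins per receiver
    sel = spec[:, band]
    feats = np.concatenate([sel.real, sel.imag], axis=1)  # (4, 98)
    return feats.reshape(-1)                        # 392 elements
```

With this convention, the time-domain vector has 4 × 250 = 1000 elements and the frequency-domain vector 4 × 49 × 2 = 392, matching the input sizes used in the following sections.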

Localization
Localization consists in predicting the horizontal and vertical coordinates of the finger on the touch surface. The model takes the input signal in the time or frequency domain, and predicts the 2D coordinates either on a discrete grid using a classification approach, or in a continuous space using a regression approach.

Classification approach
A classification approach is first investigated to estimate the contact point location on the touch surface. The 20 × 20 cm² surface is first divided into N × N zones (or classes) of size (20/N) × (20/N) cm², as shown in Fig. 4. When class (i, j) is activated (where i ∈ {1, ..., N} stands for the row index and j ∈ {1, ..., N} for the column index), the estimated position of the touch contact corresponds to the center of mass of the zone, denoted as c_(i,j). Table 1 shows the nine different configurations explored in this work. As the grid resolution increases, the center of mass of each class gets closer to the exact touch position. However, the classification task also becomes more challenging, which can reduce the localization accuracy when the wrong class is selected.
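The mapping between a touch position and its zone, and back to the zone's center of mass, can be sketched as follows. This is a minimal Python sketch with hypothetical function names; zones are indexed row-major from 0 here for simplicity, rather than from 1 as in the text.

```python
def touch_to_class(t, N=10, L=20.0):
    # map a touch position t = (x, y) in [0, L] x [0, L] to a flattened
    # zone index; i is the row and j the column (0-based here)
    cell = L / N
    i = min(int(t[1] // cell), N - 1)
    j = min(int(t[0] // cell), N - 1)
    return i * N + j

def class_to_center(c, N=10, L=20.0):
    # decode a class index back to the zone's center of mass c_(i,j)
    i, j = divmod(c, N)
    cell = L / N
    return ((j + 0.5) * cell, (i + 0.5) * cell)
```

For a 10 × 10 grid, any decoded position is at most half a cell diagonal (about 1.41 cm) from the true touch, which is the geometric floor on the classification error discussed below.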
Figure 5 shows a four-stage fully connected neural network that performs classification based on the input vector in the time (x ∈ R^1000) or frequency (x ∈ R^392) domain. Each stage consists of a linear layer followed by a nonlinear activation. For each touch k in the training dataset made of K elements, a one-hot label y_k ∈ {0, 1}^(N²) is generated and corresponds to the zone that includes the touch position t_k ∈ [0, 20] × [0, 20]. The cross-entropy loss function is computed between the target y_k and the prediction ŷ_k = f(x_k|θ), where f(·|θ) stands for the neural network with parameters θ, and x_k is the measured signal in the time or frequency domain. The optimal parameters θ are obtained during training using the Adam optimizer. At test time, the neural network predicts the vector ŷ_m for each of the M test data points, from which the indices (i, j)* = argmax_(i,j) ŷ_m are obtained. The estimated touch position t̂_m ∈ [0, 20] × [0, 20] then corresponds to the center of mass of this zone, denoted as t̂_m = c_(i,j)*.
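The training objective above can be illustrated with a minimal NumPy sketch of the softmax and cross-entropy computations (the actual model is trained with Adam, as stated; the helper names here are assumptions):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(y_true, y_prob):
    # mean cross-entropy between one-hot targets and predicted probabilities
    return float(-np.sum(y_true * np.log(y_prob + 1e-12), axis=-1).mean())
```

At test time, `np.argmax` applied to the network output gives the flattened zone index (i, j)*, whose center of mass is reported as the estimated position.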

Regression
The regression approach aims to estimate the touch contact position directly. A four-stage fully connected neural network similar to the one introduced for classification is proposed in Figure 6. The architecture is identical, except for the output stage, which contains only a linear layer that outputs a tensor ŷ ∈ R² holding the horizontal and vertical positions of the touch.

Results
The classification and regression approaches are validated with the test dataset. The localization performance is evaluated by comparing the estimated touch positions with the baseline, using the Root Mean Square Error (RMSE):

RMSE = √( (1/M) Σ_{m=1}^{M} ‖t̂_m − t_m‖₂² ),

where ‖·‖₂ stands for the l2 norm, and M corresponds to the number of data points in the test dataset. Note that the RMSE metric penalizes the position error equally in the horizontal and vertical dimensions. Figure 7 shows the RMSE (cm) as a function of the grid resolution N when using classification, and the RMSE when performing regression. The RMSEs are presented in the time (blue) and frequency (red) domains. A 2 × 2 grid provides a mean localization error of 2.18 cm (standard deviation: 0.85 cm) and 2.1 cm (standard deviation: 0.8 cm) with frequency- and time-domain features, respectively. The RMSE reduces considerably when the number of classification touch zones is increased from 4 to 100, as the center of mass of each class converges to the exact touch position when the grid resolution increases. With a 10 × 10 grid, a classification accuracy of 90% is achieved, which leads to a mean localization error of 0.41 cm (standard deviation: 0.25 cm) and 0.4 cm (standard deviation: 0.24 cm) with frequency- and time-domain features, respectively. The regression approach provides a mean localization error of 0.44 cm (standard deviation: 0.2 cm) and 0.45 cm (standard deviation: 0.21 cm) with frequency- and time-domain features, respectively.
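The RMSE metric can be computed directly from the predicted and baseline positions, e.g. as in this minimal NumPy sketch (the function name is an assumption):

```python
import numpy as np

def localization_rmse(t_hat, t_true):
    # t_hat, t_true: (M, 2) arrays of predicted and baseline positions (cm)
    err = np.linalg.norm(t_hat - t_true, axis=1)  # per-touch Euclidean error
    return float(np.sqrt(np.mean(err ** 2)))
```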
The results demonstrate that the RMSEs in the frequency and time domains are in the same range.

Applications
The results so far aimed at measuring and comparing the localization accuracy with respect to the model architecture and grid resolution. However, in a real scenario, the localization accuracy depends on the interface layout, and it is therefore important to measure the performance of the proposed architecture in such a realistic configuration. So far, the data were collected using a robot that mimics a human finger, although the signals slightly differ due to material differences and pressure variations. This section shows how the proposed method performs with a virtual keypad interface, and illustrates the localization accuracy when a human user draws a shape with their finger. The frequency-domain feature extraction is retained, since it reduces the number of input parameters (392) compared to the time-domain approach (1000).

Touch localization on a virtual keypad interface
The classification model previously introduced can be adapted to detect the touch coordinates on a virtual access control keypad, such as the one shown in Figure 8. This approach is appealing as the neural network is optimized to detect the keys for a given layout, which maximizes the localization accuracy. The same model architecture and dataset as formerly used for classification are chosen, except that only the first two fully connected stages are retained. The first stage outputs the tensor h_1 ∈ R^100, while the second stage generates h_2 ∈ R^50. The output stage generates a tensor with 13 dimensions, where the first 12 classes correspond to the keys (1, 2, 3, 4, 5, 6, 7, 8, 9, *, 0 and #), and the last class represents the zone surrounding the keys (L).
The neural network predicts a vector with 13 elements, and the index of the element with the maximum value corresponds to the selected class. The confusion matrix in Figure 9 shows the performance of the classifier. According to the classification report, the overall accuracy is 97% with a computing time of 0.28 ms on a CPU (Intel Core i5). In this type of application, it is critical to avoid misclassifying a pressed key as another key. On the other hand, misclassifying a pressed key as the layout zone is less critical, as the touch is simply ignored by the interface. The matrix diagonal represents the correct predictions, where the true and predicted locations match. The proposed method classifies the pressed keys with an accuracy of 97%, with no misclassification of a pressed key as another key; the neural network only confuses some keys with the layout zone (class L), which can be disregarded by the interface.
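Decoding the 13-element output into a key can be sketched as follows, with the class ordering taken from the layout described above (the helper name is hypothetical):

```python
import numpy as np

# 12 keys plus the surrounding layout zone "L", in the class order given above
KEYPAD_CLASSES = ["1", "2", "3", "4", "5", "6",
                  "7", "8", "9", "*", "0", "#", "L"]

def decode_key(output):
    # the index of the maximum network output selects the class
    return KEYPAD_CLASSES[int(np.argmax(output))]
```

A prediction decoded as "L" can simply be dropped by the interface, which is why key-to-L confusions are tolerable while key-to-key confusions are not.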
To provide a basis for comparison with potentially simpler approaches, a kNN classifier (with k = 1) [33] is also used to detect the touch coordinates for the access control keypad. Figure 10 shows the accuracy of the DNN and kNN approaches using 100%, 80%, 60%, 40% and 20% of the training dataset. For example, D-20 corresponds to 20% of the training dataset. The accuracy drops from 97% (96%) to 89% (87%) when the size of the dataset is decreased by 80% using the DNN (kNN). The accuracy of the DNN method is always greater than that of the kNN, regardless of the dataset size. This shows an improvement in test accuracy as the training set is enlarged.
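For reference, the kNN baseline (k = 1, as used here) reduces to a nearest-neighbour search over the stored training features; a minimal sketch, with a hypothetical function name:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=1):
    # classify x by majority vote among its k nearest training samples
    d = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    idx = np.argsort(d)[:k]                  # indices of the k closest points
    vals, counts = np.unique(y_train[idx], return_counts=True)
    return vals[np.argmax(counts)]
```

Because every prediction scans the whole training set, the kNN inference time grows with the dataset size, which is the behaviour observed in Figure 11.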
Figure 11 compares the computing time of the DNN and kNN approaches. These results indicate that the DNN approach is 11.67 times faster than the kNN approach. In the kNN approach, compressing the training set reduces the time needed to search for the neighboring points (by 2.58 ms), thus speeding up the process. On the other hand, this leads to decreased accuracy, as shown in Figure 10. With the DNN, enlarging the dataset improves the accuracy without increasing the inference time. Thus, the DNN approach is more effective in terms of accuracy and computational time, which motivated its choice over the kNN approach.

Localization of a human finger
The proposed architectures with classification (Figure 5) and regression (Figure 6) are also validated with a real human finger that draws a circle on the touch glass. Figures 12 and 13 show the human finger localization with the classification approach using a 10 × 10 grid (C-10) and with the regression approach, respectively.
The blue dots show the true position of the finger and the red squares indicate the predicted positions. Figure 14 presents the localization RMSE using classification (C-5 and C-10) and regression. The regression approach provides a mean localization error of 0.47 cm with a standard deviation of 0.18 cm and a minimum error of 0.29 cm, which is in the range of the localization errors in Figure 7. However, the mean localization error is 0.69 cm (standard deviation: 0.29 cm) using classification with a 10 × 10 grid (100 classes), and exceeds 1 cm (standard deviation: 0.54 cm) when the number of classes drops to 25. The computing time is 0.44 ms for each test sample. The small number of training samples in each class enlarges the RMSE when using classification. Moreover, the localization error is more pronounced when the grid resolution is reduced, as the center of mass of each class diverges from the exact touch position. This explains the poorer performance of C-5 and C-10 compared to Figure 7. Regression overcomes these issues and offers the best results.

Conclusion
Localization techniques with classification and regression have been investigated on a glass touch surface with ultrasonic Lamb wave technology. Five piezoceramic elements, one emitter and four receivers, are installed at the bottom of the surface. In this study, a simple fully connected neural network is proposed to perform touch localization on the glass plate. The aim is to reduce the computational complexity of the analytical imaging approaches associated with touch localization. A robotic arm with a silicone finger simulates the touch action with random positions and contact pressures to train the deep neural network. Frequency-domain features are selected as they significantly reduce the number of input parameters compared to the time-domain approach. The proposed processing architecture is then validated with a human finger, which leads to a mean localization error of 0.47 cm and a standard deviation of 0.18 cm. The computing time is 0.44 ms for each test sample. The classification approach is applied to touch detection on an access control keypad, which provides an accuracy of 97% with a computing time of 0.28 ms for each unseen example.
During the data acquisition, human fingers can swipe on the screen and the touch pressure can change, which sets a limit on the accuracy of signal measurement when gathering human finger data. The difference between the signals generated by the human finger and the artificial finger should therefore be taken into account. The artificial and human finger localization errors (presented in Figures 7 and 14, respectively) are in the same range, which validates the similarity between the signals created by the simulated touch and those generated by a real touch.
The performance of the classification approach is however limited by the grid resolution. As the grid resolution decreases, the center of mass of each class moves away from the exact touch position, decreasing the localization accuracy. As shown in Figure 7, increasing the grid resolution improves the localization accuracy for the classification approach. Under such circumstances, the regression approach remains a valid alternative to localize the touches.
This analysis demonstrates the viability of regression with a four-stage fully connected neural network for touch localization on an ultrasonic Lamb wave touchscreen. Classification with a two-stage fully connected neural network is preferred for touch zone detection, owing to its ability to provide high-precision touch detection on an access control keypad. The results indicate that the current analytical, computationally intensive touch localization algorithms can be replaced by simple fully connected layers.
In future work, this approach could be extended to multi-touch scenarios. Multi-touch artificial fingers can be designed to acquire multi-touch signals. A double-touch simulator, as shown in [5], could provide more accurate results but, in return, may require more time to collect the data. Moreover, the robustness of the model can be affected by the touch pressure of human fingers. In multi-touch gestures, it is critical that all the fingers are pressed and released together while taking data, to avoid mislabeling the number of touches. The model proposed in the current study is easily scalable to surfaces using ultrasonic Lamb wave technology of different shapes, sizes and materials, including but not limited to plastic and metal.

Figure 1 :
Figure 1: Schematic of the experimental setup.

Figure 2 :
Figure 2: Hardware setup with the touch glass and the robotic arm.

Figure 3 :
Figure 3: Features extraction in the time and frequency domains.

Figure 4 :
Figure 4: Classification touch zones. When zone (i, j) gets selected, the estimated touch position corresponds to the center of mass of this zone, denoted as c_(i,j).

Figure 5 :
Figure 5: Neural network architecture for classification.
For each touch k in the training dataset, the target y_k ∈ R² corresponds to the touch position t_k ∈ [0, 20] × [0, 20]. The mean square error (MSE) loss is computed between the target y_k and the prediction ŷ_k = f(x_k|θ), where f(·|θ) stands for the neural network with parameters θ, and x_k is the measured signal in the time or frequency domain. The estimated touch position t̂_m ∈ [0, 20] × [0, 20] then corresponds directly to the prediction ŷ_m.

Figure 7 :
Figure 7: Comparison of localization RMSE (cm) between classification with grid resolution N (C-N ), and regression (R) in frequency and time domains.

Figure 9 :
Figure 9: Confusion matrix with the access control keypad.

Figure 10 :
Figure 10: Comparison of accuracy between the DNN and kNN approaches for different data sizes, where D-N denotes N% of the training data, for the access control keypad.

Figure 11 :
Figure 11: Comparison of computing time between the DNN and kNN approaches for different data sizes, where D-N denotes N% of the training data, for the access control keypad.