1. Introduction
In recent decades, conventional car keys have increasingly been replaced by remote keys, or key fobs, for keyless ignition systems. A Passive Entry Passive Start (PEPS) module lets the driver unlock the car and start the engine more easily. The driver first pairs the key fob with the car. After that, when the driver approaches the car with the key fob, the door unlocks automatically. The driver gets into the car and presses the start button, and the engine starts, as shown in
Figure 1. This is more convenient than a conventional key. Elderly drivers or people with arthritis may find it difficult to grip a car key and insert it into the keyhole. Other drivers may have their hands full, making it inconvenient to take the car key out of a bag. Keyless ignition systems help considerably in these scenarios. Wireless network technologies, such as NFC (Near-Field Communication), UWB (Ultra-Wide Band), and BLE (Bluetooth Low Energy), have been adopted in the past few years [
1,
2,
3,
4]. They have the advantage of low power consumption. As a result, key fobs using regular batteries can work for several months or longer. These technologies also support secure data transmission and accurate positioning. The car can detect whether a paired key fob is within a certain distance of the car or inside it. However, new attack types are also emerging [
5]. A skilled car thief can capture the unique wireless signal of the key fob over the air and thus make a duplicate key without having the original key to hand. In this sense, forgery is easier than with conventional car keys. In another scenario, since the driver never has to remove a key from a keyhole, they may sometimes leave the key fob in the car. Anyone can then start the engine and drive the car away.
Fast Identity Online (FIDO) [
6,
7] is an emerging open standard for passwordless and multi-factor user authentication. It provides faster, easier, and more secure sign-ins to websites and apps across users’ devices. It adopts public key cryptography so that a user and a service can identify each other, which effectively prevents phishing and man-in-the-middle (MITM) attacks. It also supports the use of biometrics for user verification on the user’s devices. During user registration with the service, the user’s device creates a new public and private key pair that is unique for the user, the device, and the service. The user approves the creation of the key pair via a local authentication method on the device, such as a local PIN (Personal Identification Number), biometrics, or a physical security key. The device keeps the private key and registers the public key with the service. When the user wants to sign into the service, the server sends a challenge to the user’s device. The user unlocks the private key stored on the device via the same local authentication method, so that the device can use the private key to sign the challenge and send the result back to the server as a response. The server then uses the public key to verify the response and identify the user, as shown in
Figure 2. Since the challenge is a nonce (a one-time random number), replay attacks are prevented. Note that in a FIDO environment, the private part of a user’s credential is never sent to or stored on a server.
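The registration and challenge-response flow described above can be traced with a small, deliberately insecure sketch. It is not the FIDO protocol itself: the `Device` and `Server` classes, the unpadded toy RSA signature, and the key size are all assumptions chosen so that the example runs with the Python standard library alone. It only illustrates that the private key stays on the device, that the server stores just the public key, and that a one-time challenge makes a replayed response useless.

```python
import hashlib
import secrets

# Toy RSA parameters built from two known Mersenne primes. This is far too
# small and unpadded for real use; it only illustrates the message flow.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, stays on the device


def digest(message: bytes) -> int:
    """Hash a message to an integer smaller than the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


class Device:
    """Hypothetical authenticator holding the private key."""

    def __init__(self) -> None:
        self._private_exponent = D
        self.public_key = (N, E)

    def sign(self, challenge: bytes, user_verified: bool) -> int:
        # The private key is used only after local verification succeeds
        # (PIN, biometrics, or a security key).
        if not user_verified:
            raise PermissionError("local user verification failed")
        return pow(digest(challenge), self._private_exponent, N)


class Server:
    """Hypothetical service that stores only the public key."""

    def __init__(self) -> None:
        self.public_key = None
        self.pending = set()

    def register(self, public_key) -> None:
        self.public_key = public_key

    def new_challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)  # one-time random number
        self.pending.add(nonce)
        return nonce

    def verify(self, challenge: bytes, signature: int) -> bool:
        if challenge not in self.pending:
            return False  # unknown or already used nonce: replay rejected
        self.pending.discard(challenge)
        n, e = self.public_key
        return pow(signature, e, n) == digest(challenge)


device, server = Device(), Server()
server.register(device.public_key)
challenge = server.new_challenge()
signature = device.sign(challenge, user_verified=True)
first_attempt = server.verify(challenge, signature)
replayed_attempt = server.verify(challenge, signature)
print(first_attempt, replayed_attempt)
```

A real deployment would rely on a vetted signature scheme and an existing FIDO implementation rather than hand-written arithmetic.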
Biometrics [
8], such as fingerprint, voice, face, or iris recognition, are now widely used for user authentication on laptop computers and handheld devices. However, it is not difficult to obtain a person’s voice recordings and face photos. The same goes for fingerprints, which may remain on cups, mobile phones, or car door handles. Some flagship smart phones have a high-resolution infrared camera for iris scanning. The distance between the eyes and the camera must be small, generally between 10 and 20 cm, and wearing glasses or contact lenses may reduce the effectiveness of iris recognition. In addition, recent studies have shown the potential of the electrocardiogram (ECG or EKG) [
9,
10] in personal identification [
11,
12,
13].
In this article, we propose using smart watches capable of sensing an ECG signal as smart car keys. Today, these mobile devices for smart health monitoring are becoming more popular in consumer markets [
14]. When a driver wearing such a smart watch approaches a paired PEPS car, the driver can actively trigger the smart watch to sense the driver’s ECG; alternatively, the smart watch is triggered automatically when it detects the car. After the driver is verified based on the sensed ECG, the smart watch signs into the PEPS module. Once the driver is identified and authorized, the PEPS module can unlock the car automatically and allow the engine to start.
An ECG is a recording of the electrical activity of the heart. A 12-lead ECG, commonly used in hospitals, uses 10 electrodes placed on the patient’s limbs and on the surface of the chest to measure the electrical potential of the heart from twelve different angles. In a normal cardiac cycle (or a heartbeat), there are three main components, which are the P wave, QRS complex, and T wave, as shown in
Figure 3. Features like the median, range, variance, width, and slope of the RR interval, PR interval, ST interval, QT interval, PR segment, and ST segment, and ratios among them, are often used in ECG analysis [
10]. The ECG is useful in many heart disease diagnoses, such as cardiac arrhythmia [
9,
10].
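As a small illustration of the interval features mentioned above, the sketch below derives RR intervals from R-peak positions and summarizes them with the median, range, and variance. The peak indices, the 60 Hz sample rate, and the added mean heart rate field are assumptions for the example, not values taken from any dataset.

```python
import statistics


def rr_intervals(r_peak_indices, sample_rate_hz):
    """Return RR intervals in seconds from R-peak sample indices."""
    return [
        (later - earlier) / sample_rate_hz
        for earlier, later in zip(r_peak_indices, r_peak_indices[1:])
    ]


def rr_features(intervals):
    """Summarize RR intervals with the statistics named in the text."""
    return {
        "median": statistics.median(intervals),
        "range": max(intervals) - min(intervals),
        "variance": statistics.pvariance(intervals),
        # Convenience value added for this sketch.
        "mean_heart_rate_bpm": 60.0 / statistics.mean(intervals),
    }


# Hypothetical R-peak positions (sample indices) in a 60 Hz recording.
peaks = [10, 55, 100, 146, 190]
intervals = rr_intervals(peaks, sample_rate_hz=60)
features = rr_features(intervals)
print(features)
```

The same pattern extends to the other intervals and segments (PR, ST, QT) once the corresponding wave boundaries have been located.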
Biosensors such as ECG and PPG (photoplethysmogram) are used to measure cardiac parameters on consumer electronics [
14]. An ECG sensor measures the electrical potential difference between two electrodes. The electrodes sense the weak electrical signals, and the analog signal is then amplified, filtered, and digitized. However, it is more difficult to measure an ECG signal accurately on a smart watch than with the 12-lead ECG used in hospitals. Smart watches are small. The ECG measured on a smart watch is typically one-lead and less precise in terms of sample rate and resolution. Smart watches are worn on the wrist and thus move in random directions and at random speeds when the arm shakes or moves. Sometimes, they do not fit closely against the skin. The ECG reading may also be affected by physical shock, strong electromagnetic interference, extremely high or low temperatures, excessive humidity, and sweat. The quality, flexibility, and ease of ECG acquisition remain open issues.
PPG technology uses a light source (usually LED) and a light sensor (usually a camera) to measure changes in the blood volume [
14]. It has been shown that PPG signals are strongly correlated with ECG waveforms. PPG devices are generally smaller, less expensive, and easier to integrate into everyday wearable consumer electronics. However, ECG provides more accurate and detailed data about heart activity than PPG.
In this study, we consider commercial smart watches. The availability and affordability of the smart watch in public markets is a key concern. The ECG sample rate of the watch used in the experiment is only 60 Hz. We use two deep learning algorithms to recognize the driver. ECG signals measured from 15 subjects are first preprocessed, segmented into ECG cycles, and recognized by two deep learning models, Long Short-Term Memory (LSTM) [
15] and Auto Encoder [
16], with different training strategies. The experimental results show that the LSTM models achieve the best identity recognition accuracy (91%) when a single ECG cycle is used. However, it takes at least 30 min to train an LSTM model, whereas training a personalized Auto Encoder model takes only 5 min. When 15 consecutive ECG cycles, sensed in less than 20 s, are used, the personalized Auto Encoder model achieves 100% identification accuracy.
The personalized Auto Encoder model can be trained using only the driver’s ECG signal. This greatly simplifies the management of ECG recordings, as well as the integration of the proposed technology into PEPS vehicles.
2. Related Works
For biometric authentication, the ECG has characteristics such as universality, uniqueness, permanence, and collectability. Specifically, universality indicates that every individual has ECG information; uniqueness indicates the distinctiveness of each individual’s ECG information; permanence indicates that some properties of an individual’s ECG information remain unaltered over a long period; and collectability indicates that ECG information can be acquired conveniently. Two additional characteristics are required for using and trusting ECG-based personal identification. One is the speed of acquiring ECG information from an individual and processing the information. The other is the accuracy of the ECG-based personal identification technology [
13].
Similar to the research development in computer-aided heart disease diagnosis based on ECG signal analysis [
10], the recent research trend of ECG-based biometrics has seen a move to the adoption of deep neural networks, such as CNNs (Convolutional Neural Networks), RNNs (Recurrent Neural Networks), and LSTM (Long Short-Term Memory) [
13,
17].
Salloum et al. [
17] in 2017 applied LSTM to recognize ECG-based biometrics on two pre-filtered open datasets: ECG-ID [
18] and MIT-BIH [
19]. The recordings in the ECG-ID dataset were digitized at 500 Hz with 12-bit resolution over a nominal ±10 mV range, while the recordings in MIT-BIH were sampled at 360 Hz with 11-bit resolution. For records in the ECG-ID database, wavelet analysis was applied to correct baseline wander, and an adaptive bandstop filter was used to suppress power-line noise. Then, a lowpass filter was applied, with passband and stopband edge frequencies of 40 and 60 Hz, respectively. The Pan–Tompkins algorithm [
20] was used to detect the QRS complex. Once R peaks were detected, ECG segments (or cycles) were formed by concatenating samples before and after each R peak. The waveform of each ECG segment was fed into an LSTM model. Finally, the softmax function was used in the output layer. When nine consecutive ECG waveforms were processed, the LSTM model achieved nearly 100% accuracy.
Cabra et al. [
21] in 2018 evaluated different machine learning algorithms for ECG-based authentication and gender recognition on the ECG-ID [
18] and CYBHi [
22] datasets. The recordings in the CYBHi dataset were digitized at 1 kHz with a 12-bit resolution. ECG recordings were first bandpass filtered, and segmented by the Pan–Tompkins algorithm [
20]. Then, 10 features were extracted from each ECG segment. The experimental results showed that the accuracy of Support Vector Machine (SVM) in ECG authentication was 99.2%, and the accuracy of k-Nearest Neighbors (KNN) in gender recognition was 95.1%.
Lee et al. [
13] in 2022 adopted ensemble learning for ECG-based authentication on the CU-ECG dataset. The sample rate was 500 kHz. Deep learning models, such as LSTM and CNN [
23,
24,
25], were stacked together into various structures and trained to find the best ensemble for ECG authentication. In the preprocessing, a low-pass filter was used, and then ECG data were segmented with respect to R peaks. The signals were also transformed into 2D images using different time-frequency transforms, such as STFT (Short-time Fourier Transform), Scalogram, FSST (Fourier Synchrosqueezing Transform), and WSST (Wavelet Synchrosqueezed Transform) [
26]. The images were fed into 2D-CNN models, such as VGG-19 [
23], ResNet-101 [
24], and GoogLeNet [
25]. The experimental results showed that the performance of LSTM models and 2D-CNN models was improved by LSTM-2D-CNN ensemble models.
Table 1 shows these works in ECG-based authentication on different datasets.
These approaches adopted supervised learning to identify one from many others. ECG datasets of many people were used to train a model. In this research, a personalized Auto Encoder model is also proposed, which can be trained using only the driver’s ECG signal.
In the literature, Ryan and Howes showed the relations between alcohol consumption, heart rate, and heart rate variability [
27]. Using both ECG and PPG monitoring together, Wang et al. used SVM to predict alcohol consumption and classify the subject into three classes: normal, light drinking, and drinking. Similarly, ECG and PPG data were bandpass filtered and segmented. Four features were extracted from each ECG segment, and another four features were extracted from each PPG segment. In total, eight features were considered by SVM, and the accuracy of drinking classification was 95% [
28].
3. Method
In this study, we adopted deep learning algorithms to recognize the driver using the ECG measured on the smart watch the driver wore. LSTM [
15] and Auto Encoder [
16] were investigated.
To be realistic in the PEPS scenario, we used a commercial smart watch to measure the ECGs of the 15 subjects recruited for our experiment. Note that the availability and affordability of the smart watch in public markets are key concerns. The sample rate of the resulting ECG recordings is significantly lower than that of the datasets used in [
13,
17,
21], and so is the precision. However, speed and accuracy are required for using and trusting ECG-based personal identification [
13].
Similar to the approaches used in [
13,
17,
21,
28], each ECG recording was first preprocessed. It was bandpass filtered and then segmented. ECG segments were divided into training and test sets for machine learning. LSTM and Auto Encoder models were trained using the training set, and test sets were fed into the resultant models for evaluation.
In the PEPS scenario, two model types were considered. The first was one model for all drivers, and the second was one model for one driver. Models of the first type are similar to those proposed in [
13,
17,
21]. Note that differences between the two model types may raise different issues in management and security practices. However, those are out of the scope of this article. In the experiment, different training strategies were tested to examine the speed and accuracy of the models.
3.1. ECG Measurement
In this study, ASUS VivoWatch BP [
29] was used. It has built-in micro-electrical and optical sensors for ECG and PPG, respectively, and can measure a medical-grade 1-lead ECG. The sample rate of the ECG recording is only 60 Hz. Measurements were taken in two scenarios. The first kind of measurement was taken several times at random moments in the subject’s daily life. The second kind was taken twice, at 15 and 30 min after drinking alcohol, as the human body begins to absorb alcohol 10 to 15 min after drinking, and the liver begins to metabolize it 25 to 30 min after drinking [
27,
28].
3.2. Data Preprocessing
There are three common types of noise in an ECG. Muscle artifacts usually arise from noise between the electrodes and the skin or are caused by muscular activities. Baseline wander is primarily caused by respiration, body movements, sweating, poor electrode contact, and skin electrode impedance. Electromagnetic interference (EMI) can be caused by nearby electronic devices, high-voltage power sources, electromagnetic waves, metallic substances within the body, and static interference in dry environments. Muscle artifacts and baseline wander are low-frequency noise, whereas electromagnetic interference is high-frequency noise.
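Because the noise described above falls into a low-frequency part and a high-frequency part, a band-pass filter is a natural remedy. The filter design and cutoff frequencies used in the study are not stated here, so the sketch below swaps in a crude stand-in built from moving averages: subtracting a long moving average removes slow drift, and a short moving average damps fast jitter. The function names and both window lengths are illustrative assumptions.

```python
def moving_average(signal, window):
    """Centered moving average; edge samples are averaged over what exists."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def crude_bandpass(signal, baseline_window=61, smooth_window=3):
    """Remove slow drift, then damp fast jitter.

    Both window lengths are illustrative assumptions, not values from the study.
    """
    baseline = moving_average(signal, baseline_window)      # low-frequency trend
    detrended = [s - b for s, b in zip(signal, baseline)]   # high-pass step
    return moving_average(detrended, smooth_window)         # low-pass step


# Synthetic recording: a constant offset with a single spike on top.
offset_only = [5.0] * 200
with_spike = list(offset_only)
with_spike[100] += 1.0

flat_filtered = crude_bandpass(offset_only)
spike_filtered = crude_bandpass(with_spike)
print(max(abs(v) for v in flat_filtered), spike_filtered[100])
```

A production pipeline would more likely use a properly designed digital filter with cutoffs chosen for the sensor's sample rate.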
Similarly to [
17,
21,
28], each ECG recording was bandpass filtered first. We used the Pan–Tompkins algorithm [
20] to find R peaks in the experiment. In practice, however, since we do not have to consider all heartbeats in the PEPS scenario, R peaks can also be found simply by a threshold, as shown in
Figure 4a. The threshold can be determined in the model training phase and then used in the prediction phase. A normal heart rate for adults is between 60 and 100 beats per minute. Assuming 80 beats per minute on average, there are 45 sample points per beat when the sample rate is 60 Hz. Here, we extracted a fixed-length ECG segment for each R peak. The system looks forward and backward, with 22 sample points from the R peak in both time directions, and makes a fixed-length ECG segment, as shown in
Figure 4b.
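The threshold-based peak search and the fixed-length segmentation described above can be sketched as follows. The arithmetic matches the text (60 Hz × 60 s / 80 beats = 45 samples per beat, 22 samples on each side of the R peak). The function names, the refractory guard, the synthetic recording, and the exact window boundaries, taken here as indices r − 22 to r + 21 so that each segment has 44 points, are assumptions.

```python
def find_r_peaks(signal, threshold, refractory=20):
    """Return indices of local maxima above `threshold`.

    `refractory` (in samples) is a hypothetical guard that skips samples
    right after a detected peak so one beat is not counted twice.
    """
    peaks = []
    i = 1
    while i < len(signal) - 1:
        is_local_max = signal[i] >= signal[i - 1] and signal[i] > signal[i + 1]
        if signal[i] > threshold and is_local_max:
            peaks.append(i)
            i += refractory
        else:
            i += 1
    return peaks


def extract_segments(signal, peaks, half_width=22):
    """Cut a fixed-length window around each R peak.

    Peaks too close to either end of the recording are skipped.
    """
    segments = []
    for r in peaks:
        start, stop = r - half_width, r + half_width
        if start >= 0 and stop <= len(signal):
            segments.append(signal[start:stop])
    return segments


SAMPLE_RATE_HZ = 60
samples_per_beat = SAMPLE_RATE_HZ * 60 // 80  # 45 samples at 80 beats per minute

# Synthetic recording: flat baseline with a unit spike every 45 samples.
recording = [0.0] * 300
for r in range(30, 300, samples_per_beat):
    recording[r] = 1.0

peaks = find_r_peaks(recording, threshold=0.5)
segments = extract_segments(recording, peaks)
print(samples_per_beat, peaks, [len(s) for s in segments])
```

With this slicing convention, the R peak always sits at index 22 of its segment, which keeps the segments aligned for the models that follow.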
3.3. Deep Learning Models
In this study, two deep learning neural networks were investigated for ECG-based personal identification: LSTM [
15] and Auto Encoder [
16]. Two training strategies were tested. The first is to train one model for all users. The second is to train one model for one user.
3.3.1. Long Short-Term Memory (LSTM)
An RNN (Recurrent Neural Network) is designed to handle sequence data. However, when a sequence is long, exploding and vanishing gradient problems may occur while training an RNN. LSTM was proposed to mitigate these problems [
15].
Figure 5 shows the architecture of LSTM, in which the input $X_t$ of an LSTM cell is a vector at time $t$ of a time sequence $X = [X_1, X_2, \ldots, X_n]$, $h_t$ is the output of the LSTM cell, and $C_t$ is the derived information kept in the memory of the LSTM cell. Note that $h_t$ is also referred to as the hidden state of the cell. Three gates are used to deal with the gradient vanishing problem. The forget gate determines whether past information should be erased from the cell; the input gate determines whether new information is kept in the cell; and the output gate determines which information is output. $\sigma(\cdot)$ and $\tanh(\cdot)$ are the sigmoid and hyperbolic tangent functions, respectively. They serve as activation functions in the LSTM cell. Once $X_t$ is processed, the cell processes the next input $X_{t+1}$ accordingly.
Equations (1)–(6) show how the LSTM cell updates. In addition to $X_t$, $h_{t-1}$ and $C_{t-1}$ are also taken into account, which are the hidden state and the past information kept in the memory. $W_C$ and $B_C$ are the weight and the bias for new information calculation. $W_f$ and $B_f$ are the weight and the bias of the forget gate; $W_i$ and $B_i$ are those of the input gate; and $W_o$ and $B_o$ are those of the output gate.
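The equations themselves are not reproduced in this text. Assuming the standard LSTM formulation, which is consistent with the weights and biases defined above, the six updates can be written as follows. Here $[h_{t-1}, X_t]$ denotes concatenation and $\odot$ the element-wise product. The names $f_t$, $i_t$, $o_t$, and $\tilde{C}_t$ for the gate activations and the candidate memory are introduced for this sketch, and the order may differ from the original numbering.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, X_t] + B_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, X_t] + B_i\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, X_t] + B_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, X_t] + B_o\right) \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```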
Figure 6 shows the proposed LSTM model for ECG-based personal identification. The input is the waveform of a single ECG segment, which is a one-dimensional time sequence with 44 data points. The LSTM layer has 64 hidden LSTM cells. A data point is fed into all 64 cells, and a 1 × 64 vector is generated. When all input data points are processed, the final output of the LSTM layer is then fed into three fully connected layers. The activation functions used in these layers are ReLUs (Rectified Linear Units). In the last layer, $p$ is the number of subjects. One additional class refers to no match. Finally, the output $Y$ states whether the input ECG segment is associated with the right subject. In the experiment, unidirectional LSTM and a cross-entropy loss function were adopted.
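To make the data flow through the LSTM layer concrete, the sketch below runs a 44-point scalar sequence through a layer with 64 hidden units and returns the final hidden state, the 1 × 64 vector that would be passed on to the fully connected layers. It assumes the standard LSTM cell update and uses small random weights, so the numbers are meaningless. The function names and the sine-wave stand-in for an ECG segment are hypothetical, and the fully connected layers are omitted because their sizes are not given here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(hidden_size, seed=0):
    """Random weights for a layer with scalar inputs (illustrative values only)."""
    rng = random.Random(seed)

    def matrix():
        # One row per hidden unit; columns cover [h_{t-1}, x_t].
        return [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size + 1)]
                for _ in range(hidden_size)]

    return {gate: {"W": matrix(), "B": [0.0] * hidden_size}
            for gate in ("f", "i", "c", "o")}


def affine(params, concat):
    """Compute W * concat + B for one gate."""
    return [sum(w * v for w, v in zip(row, concat)) + b
            for row, b in zip(params["W"], params["B"])]


def lstm_final_hidden(sequence, params, hidden_size):
    """Run a scalar sequence through the layer and return the last hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        concat = h + [x]
        f = [sigmoid(v) for v in affine(params["f"], concat)]    # forget gate
        i = [sigmoid(v) for v in affine(params["i"], concat)]    # input gate
        g = [math.tanh(v) for v in affine(params["c"], concat)]  # new information
        o = [sigmoid(v) for v in affine(params["o"], concat)]    # output gate
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


HIDDEN = 64
# Stand-in for one ECG segment with 44 data points.
segment = [math.sin(2 * math.pi * k / 44) for k in range(44)]
final_h = lstm_final_hidden(segment, init_lstm(HIDDEN), HIDDEN)
print(len(final_h))
```

In practice, a deep learning framework would provide this layer together with training, and only the layer sizes and loss function would need to be specified.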
3.3.2. Auto Encoder
Auto Encoder [
16] is usually used in unsupervised learning. It can take the form of a multi-layer artificial neural network consisting of an encoder and a decoder. The encoder transforms the input vector into a more efficient representation, simply referred to as a code, while the decoder tries to reconstruct the input vector from the code, as shown in
Figure 7. In other words, an Auto Encoder learns two functions: an encoding function that transforms the input vector $X$ into a code $Z$, and a decoding function that reconstructs $X$ from $Z$. The learning objective is to minimize the difference between the input $X$ and the output $X'$. In this setting, there is no need to annotate output labels for the training and test data.
When $Z$ is shorter than $X$, the encoder performs dimensionality reduction, as in PCA (Principal Component Analysis). As a result, the output of the encoder is sometimes referred to as a feature vector, which can be used by a subsequent classifier.
Figure 8 shows the concept of Auto Encoder-based classification, where the encoder of a well-trained Auto Encoder is adopted as the feature extractor. Then, a classifier can be trained using supervised learning algorithms, such as SVM and neural networks.
Figure 9 shows the proposed Auto Encoder model for ECG-based personal identification. Again, the input is the waveform of a single ECG segment. After two LSTM layers, the encoder transforms the waveform into a feature vector whose length is 16. In the decoder, the RepeatVector layer duplicates the feature vector, and it feeds the two vectors into another two LSTM layers. Finally, the Dense layer constructs an output waveform, which should be very similar to the input. The loss function is the SSE (the sum of squares due to error).
When the training strategy is one model for one user, ECG-based personal identification becomes a one-class classification problem. For Auto Encoder models, we can identify the user based on the difference between the input $X$ and the output $X'$.
Figure 10 shows the distribution of the SSE loss of the personalized Auto Encoder model trained for one of the subjects recruited in the experiment.
In
Figure 10, the SSE loss between the reconstructed waveform and the input ECG segment of the target subject is usually significantly smaller than the loss of other subjects. It is easy to compute a threshold of the SSE to distinguish the target subject from others. In this case, the threshold is 4.
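The one-class decision described above reduces to comparing a reconstruction loss with a threshold. The sketch below shows that comparison, along with one possible way to pick a threshold from two non-overlapping groups of losses. The midpoint rule, the function names, and all numbers except the default threshold of 4 are assumptions made for illustration. The losses listed are made up and are not measurements from the study.

```python
def sse(original, reconstructed):
    """Sum of squared errors between a segment and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed))


def is_target_subject(original, reconstructed, threshold=4.0):
    """Accept the segment when its reconstruction loss is below the threshold."""
    return sse(original, reconstructed) < threshold


def midpoint_threshold(target_losses, other_losses):
    """Hypothetical rule: halfway between the largest target loss and the
    smallest loss of other subjects. Only meaningful if the groups do not overlap."""
    return (max(target_losses) + min(other_losses)) / 2.0


# Made-up losses, for illustration only.
target_losses = [0.8, 1.5, 2.1]
other_losses = [6.2, 9.0, 12.4]
threshold = midpoint_threshold(target_losses, other_losses)

segment = [0.0, 1.0, 0.0, -1.0]
good_reconstruction = [0.1, 0.9, 0.0, -1.1]
poor_reconstruction = [1.5, -1.0, 1.5, 1.0]
accepted = is_target_subject(segment, good_reconstruction, threshold)
rejected = is_target_subject(segment, poor_reconstruction, threshold)
print(threshold, accepted, rejected)
```

When the two loss distributions overlap, the threshold becomes a trade-off between false accepts and false rejects and would have to be tuned on held-out data.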
5. Scenario Discussion
We assume there is a FIDO [
6,
7]-like environment.
When a car owner wants to pair a smart watch with a PEPS car, the owner sits in the car and enables the pairing function by providing credentials that prove ownership of the car. The driver opens a specific app on the watch to start the pairing procedure. When the watch and the car detect each other, the car sends its certificate to the watch. The watch creates a public and private key pair that is unique for the driver, the watch, and the car. At the same time, the driver selects a local authentication method to access the private key on the watch, for example, ECG biometric authentication. Then, the driver is required to make a 5 min ECG recording and upload it to a specific secure cloud computing service. The cloud can build a personalized Auto Encoder model for the driver without any other material. The watch downloads the model, after which both the model and the ECG recording are erased from the cloud completely. The watch keeps the private key and makes a request to register the public key with the car. Finally, the car owner approves the registration.
Figure 13 shows the concept of the pairing procedure. The entire pairing process is expected to be completed in about 15 min for a practiced car seller, including 5 min for ECG recording, 5 min for model training, and time for uploading the ECG recording and downloading the model.
When the driver walks close to the paired PEPS car, the smart watch is triggered by the driver or the car to sense the driver’s ECG for 20 s, which invokes the personalized Auto Encoder model to verify the driver. Once the driver is verified, the watch can access the private key of the driver stored on the watch to communicate with the car. When the car identifies the driver, it unlocks the door and becomes ready to start.
In this scenario, the private key for the driver never leaves the smart watch and is accessible only when the driver is verified on the watch using ECG biometric authentication. Phishing attacks, MITM attacks, and replay attacks can be effectively prevented in this way. ECG recordings are never stored in the watch. They leave the smart watch only for the purpose of building the personalized Auto Encoder model in a specific secure cloud computing service. Once the model is built, the ECG recordings are erased completely and will not be used for any other purposes. The privacy issue is thus addressed.
However, it is challenging to integrate emerging IoT technologies with vehicles, as well as to manage these IoT devices, environments, and services. The above scenario may fail for many reasons, for example, when the battery is dead, networks are down, or the ECG sensor malfunctions. The smart watch should not be the only key to unlock the car door and start the engine. Real scenarios are probably more complex; for example, the car owner may want to edit the list of registered drivers, car ownership may change, and car manufacturers and sellers have to manage the certificates of their cars. These challenges are outside the scope of this article.
6. Conclusions and Future Works
In this study, we have proposed using a smart watch as a novel car key in PEPS vehicles. Smart watches capable of sensing ECG signals for smart health monitoring are becoming more popular in consumer markets [
14]. Based on ECG signals recorded on such smart watches, two deep learning models have been presented to recognize the user. In the PEPS scenario, the car owner first approves the registration of a driver and a watch with a PEPS car. Thereafter, when the driver approaches the car, the watch is triggered to sense the driver’s ECG, and the proposed models are then activated to verify the driver. Once the driver is verified by the watch and then identified in the approved driver list on the car by the PEPS module, the car unlocks its doors and becomes ready to start.
Wireless network technologies, such as Wi-Fi, NFC, UWB, and BLE, support secure data transmission and accurate positioning of the smart watch when it is near or in the car. We assume a FIDO [
6,
7]-like environment, and public key cryptography is applied in communication between the watch and the car. The public and private key pair is unique for the driver, the watch, and the car. The private key is only accessible on the watch via local ECG biometric authentication. Phishing attacks, MITM attacks, and replay attacks can be effectively prevented.
LSTM and Auto Encoder have been investigated for their potential to recognize users based on their ECG. The former adopts supervised learning, and the latter adopts unsupervised learning. Two training strategies are evaluated: one model for all users, and one model for one user. In preprocessing, ECG signals are bandpass filtered first, and they are segmented into fixed-length ECG cycles according to R peaks. R peaks are found by the Pan–Tompkins algorithm [
20] in the training phase and by a threshold in the prediction phase. The experimental results show that for each driver, a personalized Auto Encoder model can be trained quickly. Although the precision of the ECG recording measured on smart watches is typically lower than that of a 12-lead ECG used in hospitals, when 15 continuous ECG segments are taken into consideration, the accuracy of driver recognition reaches 100%.
As a personalized Auto Encoder model is an unsupervised one-class recognizer, it can be built using only the driver’s ECG signal. Once the model is trained, the ECG information can be erased completely. Since one person’s ECG is not used in model training for others, the privacy issue is addressed. The management of ECG recordings is thus greatly simplified in this scenario.
In this study, we considered a commercial smart watch that is currently available in the market, specifically, the ASUS VivoWatch [
29]. The sample rate was 60 Hz only, and each ECG cycle had 44 data points. In the current setting, for pairing between a PEPS car and a driver wearing a smart watch, it took 5 min to measure the driver’s ECG on the smart watch and another 5 min to train a personalized Auto Encoder model. For driver authentication, it took 20 s to measure the driver’s ECG, which had at least 15 ECG cycles to guarantee a reliable accuracy. With the improvement of ECG sensing technologies [
30], which have higher sample rates and better signal processing, we believe that in the future, Auto Encoder-based models can use fewer ECG cycles and achieve a reliable accuracy, and thus the authentication time will be further shortened.
Note that the proposed ECG-based biometric human identification for PEPS vehicles is an add-on feature, and it must not be the only way to unlock the car door and start the engine. This is because scenarios may arise where the smart watch is out of power or cannot detect the driver’s ECG correctly. Since there are many, and will be more, smart watches capable of sensing ECG signals correctly, vehicle manufacturers can select certain ones for their products from a business perspective.
Future Works
It is challenging to integrate IoT technologies with vehicles, as extreme safety, reliability, and security should be taken into consideration, and real scenarios are probably more complex. Currently, new wireless network technologies, such as mmWave, are widely discussed in vehicle-to-infrastructure communication [
31]. They enable gigabit-per-second data rates for massive data. Additionally, AI-empowered edge computing for consumer electronics has emerged [
32]. Currently, we are investigating ISO 26262 [
33,
34,
35], titled “Road vehicles—Functional safety”.
Automotive safety is always an important issue. In that regard, smart watches have safety credentials given they are widely used in healthcare, such as for monitoring the blood pressure, oxygen saturation, heartbeats, pulse rate, sleep habits, and physical activities [
36]. In particular, in the automotive field, since drunk driving is very dangerous, drunk driving detection is important. We are currently conducting experiments on drunk driving detection based on the ECG signal measured on the driver’s smart watch. We will also investigate algorithms with different principles, such as the use of non-linear filters, Poincaré plots [
37], and so on.