Article

Implicit Identity Authentication Method Based on User Posture Perception

by Bo Hu 1, Shigang Tang 2, Fangzheng Huang 3, Guangqiang Yin 1 and Jingye Cai 1,*

1 School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 611730, China
2 Shenzhen Silver Star Intelligent Technology Co., Ltd., Shenzhen 518110, China
3 Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
* Author to whom correspondence should be addressed.
Electronics 2025, 14(5), 835; https://doi.org/10.3390/electronics14050835
Submission received: 7 January 2025 / Revised: 6 February 2025 / Accepted: 19 February 2025 / Published: 20 February 2025
(This article belongs to the Special Issue Future Technologies for Data Management, Processing and Application)

Abstract

Smart terminals use passwords and physiological characteristics such as fingerprints to authenticate users. These traditional methods verify the user only when the phone is unlocked and cannot continuously confirm that the person operating the device is its legitimate owner, so one-time authentication cannot meet security requirements. Implicit authentication based on user behavior characteristics has been proposed to achieve continuous, uninterrupted authentication of smart terminal users. This paper proposes an implicit authentication method that fuses keystroke and motion sensor data. To improve authentication accuracy, a neural network-based feature extraction model that integrates keystroke data and motion sensor data is designed, and a dual-channel fused feature space is constructed. A dataset collected in real scenarios is built, accounting for changes in user activity scenarios and differences in terminal holding postures. Experimental results on the collected data show that the proposed method improves the accuracy of user authentication to a certain extent.

1. Introduction

Identity authentication technology is crucial for the security of user data, preventing unauthorized access by verifying user identity [1,2,3,4,5,6]. Based on whether the authentication process is perceived by the user, identity authentication can be divided into explicit and implicit authentication. Password-based authentication is a type of explicit authentication in which users set personal identification numbers (PINs) or pattern passwords according to their habits. However, numeric and pattern passwords are limited in variety, easily reused, and must be memorized, making them prone to being forgotten. Moreover, passwords are at risk of being leaked, so the security of password-based authentication methods is poor [7,8,9,10,11,12,13,14,15,16].
To address the drawbacks of password reuse, forgetfulness, and leakage, token-based authentication techniques have been proposed. Token authentication uses hardware devices like smart cards that store private keys to authenticate users. However, since these hardware devices need to be carried around, this method faces issues like loss, theft, and forgery. Passwords and tokens used for authentication do not establish a direct one-to-one link with user identity, leading to weak security in these methods.
Biometric authentication techniques use physiological features such as fingerprints [17], facial recognition [18], voiceprints [19], and iris recognition [20] to identify users. Currently, fingerprint and facial recognition have shown good performance and are widely used in smartphones. However, fingerprint recognition is affected by finger moisture and fingerprint wear, and fingerprint data can be easily stolen. Facial recognition is influenced by lighting, angle, facial expressions, and makeup, making it limited in practical use. Additionally, explicit authentication techniques can only authenticate the user once when unlocking the phone and cannot continuously verify the user’s legitimacy.
In contrast, implicit authentication technology identifies unauthorized users through behavioral characteristics such as gait while carrying the phone, touchscreen patterns during tap or swipe operations, and keystroke dynamics [21]. Since the sensors embedded in phones can directly capture behavioral data without requiring users to perform specific actions, implicit authentication technology can achieve transparent, unobtrusive, and continuous identity verification. It is worth noting that behavioral characteristics are formed through a combination of innate neural, muscular, and skeletal systems and acquired habits, making them difficult to record and mimic. Therefore, implicit authentication technology based on user behavior can achieve high security. Modern smart terminals are equipped with sensors like accelerometers, gyroscopes, magnetometers, orientation sensors, and touchscreens [22], facilitating the collection of user behavior data and improving the performance of implicit authentication technology.
However, there is still room for improvement in the performance of existing implicit authentication methods. On the one hand, the limited representation capability of constructed user behavior features leads to poor authentication accuracy. On the other hand, the performance of existing methods varies across different application scenarios, requiring enhancements in robustness and practical usability. Thus, exploring comprehensive user behavior representation and designing a framework for implicit authentication methods is crucial for achieving better accuracy, robustness, and practical usability, providing continuous and reliable security for user-smart terminal interactions.
This paper proposes an implicit authentication method based on user posture awareness to improve authentication accuracy. It constructs a staged cascade authentication framework that first recognizes the user’s posture and then authenticates their identity. First, the user’s posture is recognized from motion sensor data. Then, a two-channel user classification model is designed by fusing keystroke data, enhancing the method’s robustness and improving authentication accuracy. The main contributions of this paper are as follows:
  • An implicit authentication dataset containing two activity scenarios and three holding postures is constructed.
  • An algorithm for recognizing user postures from sensor data is proposed, which mitigates the large feature differences caused by different operating postures.
  • An implicit authentication method fusing keystroke and sensor data is proposed. This method exploits the CNN’s local spatial feature extraction capability and the LSTM’s temporal feature extraction capability to capture the continuity characteristics and distinctive behavioral features of keystroke and sensor data.

2. Related Work

Keystroke characteristics are developed over the long-term use of the same phone, with significant differences in keystroke pressure, frequency, and usage habits of special keys among different users [23,24]. When a user performs keystroke operations, the touchscreen of the phone collects data such as keystroke time, touch point location, and keystroke pressure, reflecting the user’s keystroke rhythm and habits. Additionally, the built-in motion sensors can detect changes in the phone’s movement during keystrokes. Therefore, keystroke characteristics can be represented by both touchscreen data and motion sensor data, offering strong discriminative power for identity authentication.
Keystroke authentication can be categorized into fixed-text-based and free-text-based authentication, depending on whether the input text is known. The former refers to scenarios where the input text is fixed, such as entering a numeric password to unlock a phone. The latter refers to scenarios where the input text is unrestricted, such as chatting on social media. In fixed-text-based keystroke authentication, researchers have constructed datasets by collecting touchscreen data during password entry and extracting keystroke features for identity authentication, including time features [25,26,27,28,29], pressure features [26,29], and location features [29]. Buchoux et al. [25] developed an application to collect touchscreen data from 20 users entering passwords and built a feature set from statistical features of inter-keystroke times, achieving a 2.5% FRR. Zahid et al. [26] further divided inter-keystroke time into times between horizontally adjacent keys, horizontally non-adjacent keys, vertically adjacent keys, and vertically non-adjacent keys, and introduced key hold time and the number of backspace key uses to represent keystroke characteristics, achieving a 0% FRR and 2% FAR on a dataset of 25 users.
In addition to time features, Zhang et al. [28] were the first to introduce pixel distance features and speed features of adjacent keys, achieving a 3.5% FAR. Zhang et al. collected time and pressure data from 10 users entering the same password using a self-developed application, achieving a 10.3% false positive rate and 91.7% true positive rate using a radial basis function neural network. Krishnamoorthy et al. [29] collected touchscreen data from 94 users entering passwords, extracting time, pressure, and location features to construct a 155-dimensional feature vector. Using a random forest classifier, they achieved a 98.44% accuracy rate. Besides touchscreen data, motion sensor data during keystrokes can enrich keystroke features. Zheng et al. [30] extracted features such as the amplitude at the time of pressing, the amplitude at the time of release, and the maximum and average amplitude during keystrokes from accelerometer data. Combining these with keystroke pressure, touch area, and duration features to form a feature vector, they determined whether the current sample belonged to an unauthorized user by calculating the distance to training data, achieving a 3.65% EER on a dataset of 85 users.
Fixed-text-based keystroke authentication has demonstrated good performance but is limited to scenarios where users enter passwords. Free-text-based keystroke authentication is more aligned with practical scenarios, but the lack of restrictions on input content increases the difficulty of keystroke authentication. In 2015, Kang et al. [31] studied the impact of different input text lengths on keystroke authentication performance, confirming that as the input text length increases, the EER significantly decreases in both single-handed and dual-handed typing scenarios. In 2018, Kim et al. [32] proposed an adaptive feature extraction method for keystroke characteristics, achieving an EER of 0.44% on a dataset of 150 users. These studies focused on time features of keystroke data. Buschek et al. [33] suggested that combining spatial features of keystroke data could significantly improve authentication performance. By introducing touch point coordinates and keystroke pressure as spatial features, they conducted experiments in single-handed thumb typing, dual-handed thumb typing, and left-hand holding, right-hand index finger typing scenarios. The results showed a 14.3% reduction in EER compared to time-based features alone and a 26.4% reduction when combining spatial and temporal features, confirming the significant role of spatial features in identity authentication.
Additionally, Giuffrida et al. [34] found that motion sensor data during keystrokes vary significantly across users, so differences in keystroke behavior can be represented by motion sensor data. They collected accelerometer and gyroscope data during the keystroke process from 20 users, achieving an EER of 0.08% on this dataset, significantly better than the 4.97% EER obtained from keystroke time features. Shankar et al. [35] used accelerometer, gyroscope, and magnetometer data to represent keystroke behavior. They fed manually extracted time-domain features into a deep autoencoder for dimensionality reduction, followed by user classification using a softmax layer, achieving 95% and 97% authentication accuracy in walking and sitting scenarios, respectively. Further, Buriro et al. [36] combined motion sensor and keystroke time features, requiring 10 keystrokes to complete one identity authentication, and achieved an authentication accuracy of 85.77% and a FAR of 7.32%. Kim et al. [37] extracted features from accelerometer data, touch point coordinates, and time information to construct a 17-dimensional feature vector. By comparing the distributions of keystroke features, they identified unauthorized users, achieving an optimal performance of 0.45% EER.
Keystroke-based authentication methods can incorporate motion sensor data to enrich the feature representation and can further exploit the spatial attributes of keystroke data to represent user identity; both are crucial to improving authentication performance. Since a user’s keystroke habits may vary with the activity scenario and the holding posture of the mobile phone, authentication methods need to account for these factors. Current research primarily uses binary classification to decide whether an operation comes from the legitimate user but cannot confirm the exact user identity. In addition, most existing methods extract keystroke features manually, and few construct neural networks for feature extraction. Designing a neural network structure for keystroke authentication is therefore the next step in improving these methods.

3. Data Collection and Preprocessing

User interactions with mobile phones primarily occur in two activity scenarios: stationary activity and walking activity. The behavioral characteristics of phone usage change when the activity scenario shifts. Besides the activity scenario, the differences in phone usage behavior also stem from various holding postures. In stationary scenarios, users typically adopt three holding postures: holding with the dominant hand, holding with both hands, and placing the device flat on a table. In walking scenarios, users generally use two holding postures: holding with the dominant hand and holding with both hands. The characteristics of different holding postures are mainly captured through touchscreen location information and the amplitude variation features of motion sensor data. The data collection details of this study are shown in Table 1.
To construct the dataset used in the experiment, this study recruited 60 volunteers, including students and office workers aged 20–35. With each person’s consent, sensor and keystroke data were collected from each volunteer using their own smartphones, which included various brands such as Huawei, Xiaomi, OPPO, and VIVO. The data were sampled at 100 Hz, and each volunteer completed seven rounds of data collection, taking about 10 min per day, yielding 6740 data points. The experimental collection plan is shown in Table 2.

3.1. Data Collection

Keystroke Data Collection. For each keystroke action, the data collection module records {T_press, T_release, x, y}, where T_press is the time the screen is pressed and T_release is the time the screen is released. T_press and T_release can be used to calculate dwell time and inter-key time, which reflect the user’s screen tapping rhythm and habits. x and y represent the horizontal and vertical positions of the touch point on the screen.
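To make these two timing features concrete, the following is a minimal sketch (not the authors’ implementation; the record layout and function names are assumptions) of how dwell time and inter-key time can be computed from the recorded {T_press, T_release, x, y} tuples:

```python
from dataclasses import dataclass

@dataclass
class Keystroke:
    """One keystroke record: press/release timestamps (in ms) and touch coordinates."""
    t_press: float
    t_release: float
    x: float
    y: float

def dwell_time(k: Keystroke) -> float:
    """Dwell time: how long the finger stays on the key (T_release - T_press)."""
    return k.t_release - k.t_press

def inter_key_time(prev: Keystroke, curr: Keystroke) -> float:
    """Inter-key time: gap between releasing the previous key and pressing the current one."""
    return curr.t_press - prev.t_release
```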
Sensor data collection. The accelerometer and gyroscope sensors built into smartphones can measure the micro-movements and rotational rates caused by user interaction with the phone. These sensors are commonly equipped in most smartphones today, and this study utilizes them to model the phone’s motion modes. For each keystroke action, the data collection module needs to record the sampled data sequences of the accelerometer and gyroscope along the X, Y, and Z axes during the duration of the behavior.

3.2. Data Preprocessing

Since the raw data recorded by the data collection module usually contain information not required for the authentication method, such as screen touch data and multi-finger operation data, a series of data cleaning and processing steps is necessary to make the data usable by the feature extraction module. Specifically, the main functions of the data preprocessing module are as follows:
Segmentation of Continuous Keystroke Behaviors: When users input a piece of text, they often pause for thinking or other reasons, causing discontinuities in text input. To facilitate the extraction of keystroke features, the data preprocessing module segments the keystroke data of a text input into several “continuous keystroke behaviors”. If the inter-key time between two keystrokes is less than or equal to a set threshold, the two keystrokes belong to the same continuous keystroke behavior; if the interval time between two keystrokes is greater than the set threshold, the two keystrokes belong to different continuous keystroke behaviors.
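As an illustration of this threshold rule, a small sketch building on the Keystroke record above (the threshold value here is an arbitrary placeholder, not the value used in the paper):

```python
from typing import List

def segment_keystrokes(keystrokes: List[Keystroke],
                       max_gap_ms: float = 1500.0) -> List[List[Keystroke]]:
    """Split a keystroke stream into continuous keystroke behaviors.

    Two consecutive keystrokes belong to the same behavior if the inter-key
    time is less than or equal to max_gap_ms; otherwise a new behavior starts.
    """
    behaviors: List[List[Keystroke]] = []
    current: List[Keystroke] = []
    for k in keystrokes:
        if current and inter_key_time(current[-1], k) > max_gap_ms:
            behaviors.append(current)
            current = []
        current.append(k)
    if current:
        behaviors.append(current)
    return behaviors
```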
Extraction of Motion Sensor Data Sequences: To facilitate the extraction of motion mode features during the duration of continuous keystroke behaviors, for each continuous keystroke behavior, the data preprocessing module extracts the sampled data sequences of the accelerometer and gyroscope on the X, Y, and Z axes for the duration of each continuous keystroke behavior. A total of six sensor data sequences are extracted for each duration.
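A corresponding sketch of this step, continuing the segmentation example above and assuming the sensor samples are stored as a timestamp array and an (N, 6) array of accelerometer and gyroscope readings (these array layouts are assumptions for illustration):

```python
import numpy as np

def behavior_sensor_sequence(behavior: List[Keystroke],
                             timestamps: np.ndarray,
                             samples: np.ndarray) -> np.ndarray:
    """Extract the six-channel sensor sequence covering one continuous keystroke behavior.

    timestamps: shape (N,), sample times in ms.
    samples:    shape (N, 6) -- accelerometer x/y/z followed by gyroscope x/y/z.
    The window spans from the first press to the last release of the behavior.
    """
    t_start = behavior[0].t_press
    t_end = behavior[-1].t_release
    mask = (timestamps >= t_start) & (timestamps <= t_end)
    return samples[mask]
```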
Handling Missing Values: The raw data recorded by the data collection module may contain various missing values, such as the lack of press time or release time data for a keystroke. These missing values need to be deleted in the data preprocessing module.

4. Experimental Analysis

This paper considers the impact of changes in user activity scenarios and differences in holding postures. It constructs a two-stage cascading authentication framework that first identifies user posture and then authenticates user identity, as shown in Figure 1. Initially, user posture is recognized through motion sensor data. Then, a dual-channel user classification model is designed by integrating keystroke data, which enhances the robustness of the method while improving authentication accuracy.

4.1. Posture Perception Model

Implicit identity authentication performance can be affected by changes in user activity scenarios and holding postures, so identifying the current activity scenario and holding posture can help improve authentication performance. Posture perception is formulated as a five-class classification problem that recognizes the following states: a stationary state with dominant-hand holding, a stationary state with two-hand holding, a stationary state with the device lying flat, a walking state with dominant-hand holding, and a walking state with two-hand holding. Since the touchscreen data record information such as the time of user operation, touch point location, and contact area, which cannot reflect the user’s movement state, the posture perception model uses only motion sensor data as input.
In terms of model construction, the designed CNN consists of three convolutional layers, each followed by a batch normalization (BN) layer and a ReLU activation layer. The BN and ReLU layers speed up training and convergence and help prevent vanishing or exploding gradients. Since all channels of the motion sensor signals are time series, the convolutional layers use only one-dimensional convolutions, which preserves the temporal structure for the subsequent Bi-LSTM module. The size and number of convolutional kernels are essential parameters that affect feature extraction performance. As the depth of the network increases, smaller convolutional kernels can extract richer features because the receptive field grows; therefore, the kernel size is reduced while the number of kernels is increased. The bidirectional long short-term memory (Bi-LSTM) module contains 64 hidden units, uses the tanh activation function, and applies dropout to prevent overfitting. The hierarchical structure and parameter settings of the CNN-LSTM model are shown in Table 3.
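A minimal PyTorch sketch of a model with this structure is given below: three one-dimensional convolution blocks with 12, 24, and 36 kernels following Table 3, a 64-unit Bi-LSTM, and a five-way classifier. The exact kernel sizes, dropout rate, and input window length are treated as assumptions rather than the authors’ settings:

```python
import torch
import torch.nn as nn

class PosturePerceptionNet(nn.Module):
    """CNN + Bi-LSTM posture classifier (sketch).

    Input: motion sensor window of shape (batch, 6, seq_len) --
    accelerometer and gyroscope, each with x/y/z channels.
    Output: logits over the five activity/holding-posture classes.
    """
    def __init__(self, n_classes: int = 5):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(6, 12, kernel_size=5, padding=2),
            nn.BatchNorm1d(12), nn.ReLU(),
            nn.Conv1d(12, 24, kernel_size=5, padding=2),
            nn.BatchNorm1d(24), nn.ReLU(),
            nn.Conv1d(24, 36, kernel_size=5, padding=2),
            nn.BatchNorm1d(36), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=36, hidden_size=64,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(2 * 64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.cnn(x)                    # (batch, 36, seq_len)
        feat = feat.permute(0, 2, 1)          # (batch, seq_len, 36) for the LSTM
        out, _ = self.lstm(feat)              # (batch, seq_len, 128)
        last = self.dropout(out[:, -1, :])    # final time step as the window feature
        return self.fc(last)
```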

4.2. Dual-Channel User Authentication Model

When users interact with the smartphone, the touchscreen and motion sensors reflect different aspects of the interaction. The touchscreen records the contact time and position of the user’s fingertip, while motion sensors detect changes in the smartphone’s movement during the operation. To extract as much effective information as possible from the sensor data, this paper designs a dual-channel feature extraction network. Each channel consists of a Convolutional Neural Network (CNN) and a Long Short-Term Memory network (LSTM). The outputs from both channels are then fused and fed into a fully connected layer for the final user identity decision. The designed user classification model is illustrated in Figure 2.
At the front of the dual-channel network, the CNN directly receives pre-processed keystroke signals and motion sensor signals. The CNN consists of two convolutional layers with one-dimensional kernels, which better preserve the inherent temporal structure of the data. The advantage of the convolutional processing lies in its ability to extract long-range temporal features, which facilitates the effectiveness of the LSTM. After LSTM processing, the features from the two channels are fused to construct a unified user classifier. Denoting the features extracted from the touchscreen (keystroke) signal channel as T and those from the motion sensor signal channel as O, the fused feature F serves as the input to the fully connected layer:
F = T ⊕ O
where ⊕ denotes the concatenation of the two channel feature vectors.
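The sketch below illustrates this dual-channel design in PyTorch: each channel applies two one-dimensional convolution blocks and an LSTM, and the last-step features T and O are concatenated into F before the fully connected classifier. Channel widths, hidden sizes, and input dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ChannelEncoder(nn.Module):
    """One channel: two 1-D conv blocks (Conv + BN + ReLU) followed by an LSTM."""
    def __init__(self, in_channels: int, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=5, padding=2),
            nn.BatchNorm1d(16), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.BatchNorm1d(32), nn.ReLU(),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.cnn(x).permute(0, 2, 1)   # (batch, seq_len, 32)
        out, _ = self.lstm(feat)
        return out[:, -1, :]                  # last-step feature vector

class DualChannelAuthNet(nn.Module):
    """Fuses keystroke-channel feature T and sensor-channel feature O as F = concat(T, O)."""
    def __init__(self, n_users: int, hidden: int = 64):
        super().__init__()
        self.keystroke_channel = ChannelEncoder(in_channels=4, hidden=hidden)  # T_press, T_release, x, y
        self.sensor_channel = ChannelEncoder(in_channels=6, hidden=hidden)     # acc + gyro, x/y/z
        self.classifier = nn.Linear(2 * hidden, n_users)

    def forward(self, keystrokes: torch.Tensor, sensors: torch.Tensor) -> torch.Tensor:
        t = self.keystroke_channel(keystrokes)   # T
        o = self.sensor_channel(sensors)         # O
        f = torch.cat([t, o], dim=1)             # F = T concatenated with O
        return self.classifier(f)
```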

4.3. Experimental Environment

The experimental environment configuration for this experiment is shown in Table 4.

4.4. Experimental Results

Performance of the posture perception model. In actual usage scenarios, users may use their phones while they are stationary or walking. When users use their phones while stationary, the sensor data are mainly influenced by human interaction with the device. When users use their phones while walking, the sensor data are primarily influenced by movement. Therefore, data collection during walking activities involves many interferences, which somewhat weakens the ability to represent user behavior characteristics.
In actual usage scenarios, users may use their phones with a dominant-hand grip, a two-hand grip, or with the device lying flat. When users operate their phones with a dominant-hand grip, the contact points with the touchscreen are mainly concentrated on the side of the dominant hand. If users need to tap touchpoints on the opposite side, it can cause the significant rotation of the phone. Given the large screen sizes of most smartphones on the market, this phenomenon is common. Thus, data collection under a dominant-hand grip involves many interferences, which somewhat weakens the ability to represent user behavior characteristics. Under a two-hand grip, the phone’s motion state is more stable, and the behavior data collected are less interfered with, better representing the user’s behavior characteristics. When the device is placed flat, the phone’s dynamic state is most stable, best representing user behavior characteristics.
Table 5 compares the authentication performance of users operating phones in different postures (dominant-hand grip, two-hand grip, or device lying flat) in both stationary and walking activity scenarios. It can be observed that in stationary scenarios, the highest authentication accuracy is achieved when the device is lying flat. In both stationary and walking scenarios, the two-hand grip achieves higher authentication accuracy than the dominant-hand grip, and the same posture in stationary scenarios achieves higher authentication accuracy than in walking scenarios. There is significant variation in the frequency of use of the dominant-hand grip and the two-hand grip among different users. To better reflect actual usage scenarios, the evaluation of the method’s performance should consider not only the high authentication accuracy of the two-hand grip posture but also the operational data under the dominant-hand grip posture.
Comparison of Authentication Performance Across Different Sensors. This paper integrates keystroke data and motion sensor data as input data to represent various operations. Keystroke data provides the time of screen touch, release time, and touch coordinates, reflecting the user’s temporal and spatial characteristics during operation. Motion sensors, on the other hand, capture the changes in the phone’s motion state during user operations. When authenticating user identity, keystroke data and motion sensor data play different roles, as shown in Table 6.
Keystroke data are generated only when the user’s finger touches the screen, while motion sensor data are generated at a fixed sampling frequency, providing a comprehensive representation of the phone’s motion state before, during, and after screen touches. Keystroke data record the touch-point coordinates during typing and are influenced by thinking time, operational habits, and typing proficiency, which leads to longer intervals between data points and limits the representational capacity of touchscreen data for keystroke operations.
As indicated by the performance comparison results in Table 6, motion sensor data alone can achieve good authentication performance, and the integration of keystroke data results in a certain degree of performance improvement. This indicates that in authentication scenarios, motion sensors play a primary role in representing user identity information. Integrating keystroke data can slightly enhance the representation of user information, with keystroke data demonstrating a stronger user representation capability in walking scenarios.
LSTM models have a significant advantage on data with strong temporal correlations. However, the keystroke data samples are relatively short, making it difficult for the LSTM to exploit temporal dependencies, whereas the convolutional operations of a CNN can still extract relevant spatiotemporal features. Table 7 compares the performance of CNN, stacked LSTM, and CNN-LSTM models on the dataset. The CNN consists of three convolutional layers with kernel sizes of 1 × 5, 1 × 5, and 1 × 3. The stacked LSTM contains 64 hidden units, uses the tanh activation function, and applies dropout to prevent overfitting. The CNN in the CNN-LSTM model consists of two convolutional layers with kernel sizes of 1 × 5 and 1 × 5, each followed by a batch normalization layer and a nonlinear activation layer. The batch size is 32, and the initial learning rate is 1 × 10−4. Under these training conditions, the authentication results in Table 7 are obtained: the CNN-LSTM model achieves higher authentication accuracy than the other two models.
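For reference, a hedged training-loop sketch matching the reported settings (batch size 32, initial learning rate 1 × 10−4) and reusing the DualChannelAuthNet sketch above; the optimizer choice, epoch count, and synthetic tensor shapes are assumptions:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Illustrative synthetic data: keystroke windows (4 channels) and sensor windows (6 channels).
keys = torch.randn(256, 4, 32)
sens = torch.randn(256, 6, 128)
labels = torch.randint(0, 60, (256,))        # 60 users in the collected dataset
loader = DataLoader(TensorDataset(keys, sens, labels), batch_size=32, shuffle=True)

model = DualChannelAuthNet(n_users=60)       # from the dual-channel sketch in Section 4.2
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # initial learning rate 1e-4
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):                      # epoch count is an assumption
    for kb, sb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(kb, sb), yb)
        loss.backward()
        optimizer.step()
```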
The stacked LSTM shows significant performance differences between stationary and walking activity scenarios. In stationary scenarios, the variations in motion sensor data are solely due to user interactions, and the spatial and temporal features in the data have a strong discriminative power for user identity. In walking scenarios, the variations in motion sensor data are the combined result of changes in the user’s movement state and user interactions. The changes caused by the movement state are more pronounced, weakening the ability to represent user operational characteristics. Thus, the temporal features extracted by the stacked LSTM have strong user identity representation capabilities in stationary scenarios but are limited in walking scenarios, leading to poor performance stability across different activity scenarios.
In walking scenarios, the deep features extracted from the keystroke data channel have better discriminative power for user identity. For keystroke data with shorter sample lengths and relatively weak temporal correlations, the spatial feature representation capability extracted by the CNN through convolution operations is stronger. Therefore, in walking scenarios, the CNN is chosen as the feature extraction model, while in stationary scenarios, the combination of the CNN and LSTM models achieves better authentication performance.
Comparison with Existing Methods. The authentication performance of implicit authentication methods based on operational features is closely related to sensor data, data preprocessing methods, feature extraction and classifier construction, the consideration of activity scenario changes, and the impact of holding postures. This paper compares recent research outcomes on authentication methods based on operational features from these aspects, as shown in Table 8.
In terms of data sources and feature extraction, some studies use only motion sensor data to reflect the characteristics of user interactions with the phone. The touchscreen, as the sensor directly involved in user interactions, records the time, location, and pressure characteristics of operations. This paper integrates motion sensor and keystroke data and extracts deep features by constructing deep learning networks, enriching the input information used to represent user identity.
Considering usage scenarios, the user’s activity scenario and phone holding posture affect behavioral characteristics during phone operations. Some research works consider different activity scenarios during data collection but do not account for the impact of holding postures, thus not effectively mitigating the interference caused by scenario changes. This paper constructs a two-stage authentication framework comprising activity scenario recognition and user identity classification, considering the effects of dominant-hand holding, two-hand holding, and the device lying flat on authentication performance.

5. Conclusions

This paper proposes an implicit identity authentication method based on user posture perception. The method considers the impact of changes in user activity scenarios and differences in terminal holding postures, and constructs a staged cascaded authentication framework that first senses the user’s posture and then authenticates the user’s identity. First, a user posture perception model was designed using sensor data. Then, a two-channel user classification model was developed by fusing keystroke data, improving authentication accuracy and enhancing the method’s robustness. However, this research also has certain limitations. First, the current data scale is limited, and more samples are needed to improve generalization ability. Second, there is room for improvement in authentication accuracy in walking scenarios. In the future, multimodal data such as pressure sensing and gait information can be combined to optimize model performance. In addition, this method is mainly aimed at smartphones; wearable and IoT applications can be explored in the future to improve environmental adaptability.
From an application perspective, this method has considerable potential in several fields. For example, it can be used in mobile banking applications to achieve contactless authentication and improve transaction security. It can be used for hospital access control in the medical industry to prevent unauthorized personnel from entering sensitive areas. In addition, for Internet of Things (IoT) devices, this method can be combined with wearable devices to achieve smart home authentication and improve device security.

Author Contributions

B.H.: Conceptualization, methodology, software; S.T.: formal analysis, validation; F.H.: investigation, supervision; G.Y.: writing—original draft, writing—review and editing; J.C.: data curation, supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Shigang Tang was employed by the company Shenzhen Silver Star Intelligent Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Khan, S.; Nauman, M.; Othman, A.T.; Musa, S. How Secure is Your Smartphone: An Analysis of Smartphone Security Mechanisms. In Proceedings of the 2012 International Conference on Cyber Security, Cyber Warfare and Digital Forensic (CyberSec), Kuala Lumpur, Malaysia, 26–28 June 2012; pp. 76–81. [Google Scholar]
  2. Naji, Z.; Bouzidi, D. Deep learning approach for a dynamic swipe gestures based continuous authentication. In Proceedings of the 3rd International Conference on Artificial Intelligence and Computer Vision (AICV2023), Marrakesh, Morocco, 5–7 March 2023; Springer: Cham, Switzerland, 2023; pp. 48–57. [Google Scholar]
  3. Stylios, I.; Chatzis, S.; Thanou, O.; Kokolakis, S. Continuous authentication with feature-level fusion of touch gestures and keystroke dynamics to solve security and usability issues. Comput. Secur. 2023, 132, 103363. [Google Scholar] [CrossRef]
  4. Al-Saraireh, J.; AlJa’afreh, M.R. Keystroke and swipe biometrics fusion to enhance smartphones authentication. Comput. Secur. 2023, 125, 103022. [Google Scholar] [CrossRef]
  5. Chao, J.; Hossain, M.S.; Lancor, L. Swipe gestures for user authentication in smartphones. J. Inf. Secur. Appl. 2023, 74, 103450. [Google Scholar] [CrossRef]
  6. Sejjari, A.; Moujahdi, C.; Assad, N.; Abdelfatteh, H. Dynamic authentication on mobile devices: Evaluating continuous identity verification through swiping gestures. SIViP 2024, 18, 9095–9103. [Google Scholar] [CrossRef]
  7. Li, Z.; Han, W.; Xu, W. A Large-Scale Empirical Analysis of Chinese Web Passwords. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security 14), San Diego, CA, USA, 20–22 August 2014; pp. 559–574. [Google Scholar]
  8. Mazurek, M.L.; Komanduri, S.; Vidas, T.; Bauer, L.; Christin, N.; Cranor, L.F.; Kelley, P.G.; Shay, R.; Ur, B. Measuring Password Guessability for an Entire University. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, Berlin, Germany, 4–8 November 2013; pp. 173–186. [Google Scholar]
  9. Yang, W.Q.; Wang, M.; Zou, S.H.; Peng, J.H.; Xu, G.S. An Implicit Identity Authentication Method Based on Deep Connected Attention CNN for Wild Environment. In Proceedings of the 2021 9th International Conference on Communications and Broadband Networking, Shanghai, China, 27 February 2021; pp. 94–100. [Google Scholar]
  10. Kokal, S.; Vanamala, M.; Dave, R. Deep Learning and Machine Learning, Better Together Than Apart: A Review on Biometrics Mobile Authentication. J. Cybersecur. Priv. 2023, 3, 227–258. [Google Scholar] [CrossRef]
  11. Sandhya, M.; Morampudi, M.K.; Pruthweraaj, I.; Garepally, P.S. Multi-instance cancelable iris authentication system using triplet loss for deep learning models. Vis. Comput. 2022, 39, 1571–1581. [Google Scholar] [CrossRef]
  12. Zeroual, A.; Amroune, M.; Derdour, M.; Bentahar, A. Lightweight deep learning model to secure authentication in Mobile Cloud Computing. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 6938–6948. [Google Scholar] [CrossRef]
  13. Cao, Q.; Xu, F.; Li, H. User Authentication by Gait Data from Smartphone Sensors Using Hybrid Deep Learning Network. Mathematics 2022, 10, 2283. [Google Scholar] [CrossRef]
  14. Delgado-Santos, P.; Tolosana, R.; Guest, R.; Vera-Rodriguez, R.; Deravi, F.; Morales, A. GaitPrivacyON: Privacy-preserving mobile gait biometrics using unsupervised learning. Pattern Recognit. Lett. 2022, 161, 30–37. [Google Scholar] [CrossRef]
  15. Stragapede, G.; Delgado-Santos, P.; Tolosana, R.; Vera-Rodriguez, R.; Guest, R.; Morales, A. Mobile Keystroke Biometrics Using Transformers. In Proceedings of the 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), Waikoloa Beach, HI, USA, 5–8 January 2023. [Google Scholar]
  16. Zheng, Y.; Wang, S.; Chen, B. Multikernel correntropy based robust least squares one-class support vector machine. Neurocomputing 2023, 545, 126324. [Google Scholar] [CrossRef]
  17. Peralta, D.; Galar, M.; Triguero, I.; Paternain, D.; García, S.; Barrenechea, E.; Benítez, J.M.; Bustince, H.; Herrera, F. A Survey on Fingerprint Minutiae-based Local Matching for Verification and Identification: Taxonomy and Experimental Evaluation. Inf. Sci. 2015, 315, 67–87. [Google Scholar] [CrossRef]
  18. Huang, P.; Gao, G.; Qian, C.; Yang, G.; Yang, Z. Fuzzy Linear Regression Discriminant Projection for Face Recognition. IEEE Access 2017, 5, 4340–4349. [Google Scholar] [CrossRef]
  19. Dovydaitis, L.; Rasymas, T.; Rudžionis, V. Speaker Authentication System based on Voice Biometrics and Speech Recognition. In Proceedings of the International Conference on Business Information Systems, Leipzig, Germany, 6–8 July 2016; pp. 79–84. [Google Scholar]
  20. Ignat, A.; Luca, M.; Ciobanu, A. Iris Features Using Dual Tree Complex Wavelet Transform in Texture Evaluation for Biometrical Identification. In Proceedings of the 2013 E-Health and Bioengineering Conference (EHB), Iasi, Romania, 21–23 November 2013; pp. 1–4. [Google Scholar]
  21. Alzubaidi, A.; Kalita, J. Authentication of Smartphone Users Using Behavioral Biometrics. IEEE Commun. Surv. Tutor. 2016, 18, 1998–2026. [Google Scholar] [CrossRef]
  22. Liu, J.; Yuan, Q.; Li, Y.; Lv, X. Recognition of Human Motion Based on Built-in Sensors of Android Smartphones. J. Integr. Technol. 2014, 3, 61–67. [Google Scholar]
  23. McLoughlin, I.V. Keypress Biometrics for User Validation in Mobile Consumer Devices. In Proceedings of the 2009 IEEE 13th International Symposium on Consumer Electronics, Kyoto, Japan, 25–28 May 2009; pp. 280–284. [Google Scholar]
  24. Wu, J.S.; Lin, W.C.; Lin, C.T.; Wei, T.E. Smartphone Continuous Authentication based on Keystroke and Gesture Profiling. In Proceedings of the 2015 International Carnahan Conference on Security Technology (ICCST), Taipei, Taiwan, 21–24 September 2015; pp. 191–197. [Google Scholar]
  25. Buchoux, A.; Clarke, N.L. Deployment of Keystroke Analysis on a Smartphone. In Proceedings of the 6th Australian Information Security Management Conference, Perth, Australia, 1 December 2008; pp. 29–39. [Google Scholar]
  26. Zahid, S.; Shahzad, M.; Khayam, S.A.; Farooq, M. Keystroke-based User Identification on Smart Phones. In Proceedings of the International Workshop on Recent Advances in Intrusion Detection, Saint-Malo, France, 23–25 September 2009; pp. 224–243. [Google Scholar]
  27. Kambourakis, G.; Damopoulos, D.; Papamartzivanos, D.; Pavlidakis, E. Introducing Touchstroke: Keystroke-based Authentication System for Smartphones. Secur. Commun. Netw. 2016, 9, 542–554. [Google Scholar] [CrossRef]
  28. Zhang, H.; Yan, C.; Zhao, P.; Wang, M. Model Construction and Authentication Algorithm of Virtual Keystroke Dynamics for Smartphone Users. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016; pp. 171–175. [Google Scholar]
  29. Krishnamoorthy, S.; Rueda, L.; Saad, S.; Elmiligi, H. Identification of User Behavioral Biometrics for Authentication Using Keystroke Dynamics and Machine Learning. In Proceedings of the 2018 2nd International Conference on Biometric Engineering and Applications (ICBEA’18), Amsterdam, The Netherlands, 16–18 May 2018; pp. 50–57. [Google Scholar]
  30. Zheng, N.; Bai, K.; Huang, H.; Wang, H. You Are How You Touch: User Verification on Smartphones via Tapping Behaviors. In Proceedings of the 2014 IEEE 22nd International Conference on Network Protocols, Raleigh, NC, USA, 21–24 October 2014; pp. 221–232. [Google Scholar]
  31. Kang, P.; Cho, S. Keystroke Dynamics-based User Authentication Using Long and Free Text Strings from Various Input Devices. Inf. Sci. 2015, 308, 72–93. [Google Scholar] [CrossRef]
  32. Kim, J.; Kim, H.; Kang, P. Keystroke Dynamics-based User Authentication Using Freely Typed Text based on User-Adaptive Feature Extraction and Novelty Detection. Appl. Soft Comput. 2018, 62, 1077–1087. [Google Scholar] [CrossRef]
  33. Buschek, D.; De Luca, A.; Alt, F. Improving Accuracy, Applicability and Usability of Keystroke Biometrics on Mobile Touchscreen Devices. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, Seoul, Republic of Korea, 18–23 April 2015; pp. 1393–1402. [Google Scholar]
  34. Giuffrida, C.; Majdanik, K.; Conti, M.; Bos, H. I Sensed It Was You: Authenticating Mobile Users with Sensor-Enhanced Keystroke Dynamics. In Proceedings of the 11th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Egham, UK, 10–11 July 2014; pp. 92–111. [Google Scholar]
  35. Shankar, V.; Singh, K. An Intelligent Scheme for Continuous Authentication of Smartphone Using Deep Auto Encoder and Softmax Regression Model Easy for User Brain. IEEE Access 2019, 7, 48645–48654. [Google Scholar] [CrossRef]
  36. Buriro, A.; Crispo, B.; Gupta, S.; Frari, F.D. Dialerauth: A Motion-Assisted Touch-based Smartphone User Authentication Scheme. In Proceedings of the eighth ACM Conference on Data and Application Security and Privacy, Tempe, AZ, USA, 19–21 March 2018; pp. 267–276. [Google Scholar]
  37. Kim, J.; Kang, P. Freely Typed Keystroke Dynamics-Based User Authentication for Mobile Devices Based on Heterogeneous Features. Pattern Recognit. 2020, 108, 107556. [Google Scholar] [CrossRef]
  38. Sitová, Z.; Šeděnka, J.; Yang, Q.; Peng, G.; Zhou, G.; Gasti, P.; Balagani, K.S. HMOG: New Behavioral Biometric Features for Continuous Authentication of Smartphone Users. IEEE Trans. Inf. Forensics Secur. 2015, 11, 877–892. [Google Scholar] [CrossRef]
  39. Lamiche, I.; Bin, G.; Jing, Y.; Yu, Z.; Hadid, A. A Continuous Smartphone Authentication Method based on Gait Patterns and Keystroke Dynamics. J. Ambient. Intell. Humaniz. Comput. 2019, 10, 4417–4430. [Google Scholar] [CrossRef]
  40. Centeno, M.P.; Guan, Y.; Moorsel, A.V. Mobile based Continuous Authentication Using Deep Features. In Proceedings of the 2nd International Workshop on Embedded and Mobile Deep Learning, Munich, Germany, 15 June 2018; pp. 19–24. [Google Scholar]
  41. Volaka, H.C.; Alptekin, G.; Basar, O.E.; Isbilen, M.; Incel, O.D. Towards Continuous Authentication on Mobile Phones Using Deep Learning Models. Procedia Comput. Sci. 2019, 155, 177–184. [Google Scholar] [CrossRef]
Figure 1. Implicit identity authentication framework integrating keystroke data and sensor data.
Figure 2. Dual-channel user authentication model.
Table 1. Data field table.

Field | Type | Description
user ID | string | Identifier assigned to the data collector.
action | int | 1 for stationary, 2 for walking.
pose | int | 1 for dominant-hand grip, 2 for two-hand grip, 3 for device lying flat on a table.
accelerometer | List | Accelerometer data collected during user input; each set of three values represents the x-, y-, and z-axes.
gyroscope | List | Gyroscope data collected during user input; each set of three values represents the x-, y-, and z-axes.
gap | List | Press time, release time, and press position of each keyboard tap: {T_press, T_release, x, y}.
Table 2. Experimental data collection plan.

Experimental Element | Experimental Arrangement
Number of Participants | 60 people.
Experimental Equipment | Personal Android phone.
Activity Scenarios | Stationary, Walking.
Holding Postures | Dominant-hand grip, Two-hand grip, Device lying flat.
Time Span | Each volunteer participates in 7 rounds of data collection, at least one round per day, with each round covering 2 activity scenarios and 3 holding postures.
Table 3. CNN-LSTM model hierarchical structure.

Layer | Convolution Kernel Size | Number of Convolution Kernels
Convolution Layer | 1 × 5 | 12
Batch Normalization Layer | / | /
Activation Layer | / | /
Convolution Layer | 1 × 5 | 24
Batch Normalization Layer | / | /
Activation Layer | / | /
Convolution Layer | 1 × 5 | 36
Batch Normalization Layer | / | /
Activation Layer | / | /
Bi-directional LSTM Module | / | /
Table 4. Experimental environment configuration.

Type | Configuration
Operating System | Ubuntu 18.04 LTS
Python | 3.8
CUDA | 11.2
NVIDIA Driver | 460.8
CPU | 2 × Intel(R) Xeon(R) Silver 4216 CPU @ 2.10 GHz
Memory | 8 × 32 GB
GPU | 8 × NVIDIA Quadro RTX 8000
Table 5. Posture perception model performance results.

Activity Scenario | Holding Posture | Accuracy | FAR | FRR | EER
Stationary | Dominant-hand grip | 81.64% | 0.1782 | 0.1904 | 0.1880
Stationary | Two-hand grip | 86.78% | 0.0910 | 0.0916 | 0.0924
Stationary | Device lying flat | 95.83% | 0.0624 | 0.0623 | 0.0624
Walking | Dominant-hand grip | 76.28% | 0.1927 | 0.2018 | 0.1996
Walking | Two-hand grip | 83.24% | 0.1237 | 0.1224 | 0.1276
Table 6. Authentication performance comparison of different sensors.

User Posture | Data Type | Accuracy
Stationary, Dominant-hand grip | Motion sensor | 78.32%
Stationary, Dominant-hand grip | Motion sensor + Keystroke | 81.64%
Stationary, Two-hand grip | Motion sensor | 83.91%
Stationary, Two-hand grip | Motion sensor + Keystroke | 86.78%
Stationary, Device lying flat | Motion sensor | 93.76%
Stationary, Device lying flat | Motion sensor + Keystroke | 95.83%
Walking, Dominant-hand grip | Motion sensor | 71.53%
Walking, Dominant-hand grip | Motion sensor + Keystroke | 76.28%
Walking, Two-hand grip | Motion sensor | 78.42%
Walking, Two-hand grip | Motion sensor + Keystroke | 83.24%
Table 7. Authentication performance comparison of feature extraction models.

User Posture | Model Type | Accuracy
Stationary, Dominant-hand grip | CNN | 74.84%
Stationary, Dominant-hand grip | Stacked LSTM | 77.38%
Stationary, Dominant-hand grip | CNN-LSTM | 81.64%
Stationary, Two-hand grip | CNN | 81.24%
Stationary, Two-hand grip | Stacked LSTM | 83.75%
Stationary, Two-hand grip | CNN-LSTM | 86.78%
Stationary, Device lying flat | CNN | 91.59%
Stationary, Device lying flat | Stacked LSTM | 93.75%
Stationary, Device lying flat | CNN-LSTM | 95.83%
Walking, Dominant-hand grip | CNN | 68.88%
Walking, Dominant-hand grip | Stacked LSTM | 71.42%
Walking, Dominant-hand grip | CNN-LSTM | 76.28%
Walking, Two-hand grip | CNN | 75.69%
Walking, Two-hand grip | Stacked LSTM | 79.12%
Walking, Two-hand grip | CNN-LSTM | 83.24%
Table 8. Comparison with existing authentication methods (Ac = accelerometer, Gy = gyroscope, Ma = magnetometer).

Method | Data Source | Classifier | Activity Scenario | Holding Posture | Experimental Performance
Reference [38] | Ac, Gy, Ma | Scaled Manhattan | Stationary/Walking | - | EER: Stationary 10.05%, Walking 7.16%
Reference [39] | Ac | MLP | Walking | - | ACC: 99.11%; EER: 1%
Reference [40] | Ac, Gy, Ma | CNN | Stationary/Walking | - | ACC: 97.8%
Reference [41] | Screen, Ac, Gy, Ma | CNN | Stationary/Walking | - | ACC: 88%; EER: 15%
Ours | Screen, Ac, Gy | CNN-LSTM | Stationary/Walking | Dominant-hand grip/Two-hand grip/Device lying flat | ACC: 85.76%
