Augmented PIN Authentication through Behavioral Biometrics

Personal Identification Numbers (PINs) are widely used today for user authentication on mobile devices. However, this authentication method is vulnerable to several attacks, such as phishing, smudge, and side-channel attacks. In this paper, we increase the security of PIN-based authentication by considering behavioral biometrics, specifically the smartphone movements typical of each user. To this end, we propose a method based on anomaly detection that is capable of recognizing whether the PIN is inserted by the smartphone owner or by an attacker. This decision is taken according to the smartphone movements, which are recorded during the PIN insertion through the built-in motion sensors. For each digit in the PIN, an anomaly score is computed using Machine Learning (ML) techniques. Subsequently, these scores are combined to obtain the final decision metric. Numerical results show that our authentication method can achieve an Equal Error Rate (EER) as low as 5% in the case of 4-digit PINs, and 4% in the case of 6-digit PINs. With a reduced training set of only 50 samples, the EER only slightly worsens, reaching 6%. The practicality of our approach is further confirmed by the low processing time required, on the order of fractions of a millisecond.


Introduction
The PIN is a numeric password commonly used to provide user authentication together with novel alternatives such as pattern lock and fingerprints. Typical uses of the PIN include screen unlocking on mobile devices, user authentication on computers, and secure access to specific services, such as banking systems. However, this authentication method can be subject to three types of cyber attacks. Firstly, an attacker could learn the PIN through traditional attacks such as phishing and shoulder surfing [1]. Secondly, smudge attacks are possible when the PIN is inserted on the touchscreen of a mobile device [2]. This kind of attack relies on the smudge left behind by the user's fingers on the touchscreen to infer information on the typed digits. Finally, a variety of side-channel attacks have been investigated for mobile devices [3][4][5][6][7]. Here, the side-channel information is given by the movements made by the user while inserting the PIN. These movements can be acquired by the built-in motion sensors and used to infer the secret digit combination. For example, in [7] it has been shown that a 4-digit PIN can be correctly recovered with an 84% success rate within 20 tries.
Several solutions have been proposed to make a PIN, or an alphanumeric password, more robust against these kinds of attacks. In some works, the PIN is entered using touch gestures in place of keystrokes, making it more secure against shoulder-surfing and side-channel attacks [1,8]. In [3], the effectiveness of side-channel attacks is reduced by adding Gaussian noise to the motion sensor data. However, this data perturbation also affects the accuracy of the sensors and their utility. In [9], the actual PIN digits are interleaved with misleading values, on the basis of the screen brightness. Since the screen brightness value is not visible to screen recording techniques, this authentication method is resilient against side-channel and spyware-based attacks. In [10], the user is requested to draw the digits of the PIN on the touchscreen. In this way, the user's drawing traits are used as a further authentication measure, beyond the secrecy of the PIN. Finally, keystroke dynamics information, describing the person's typing rhythm, can be used to enhance the security of alphanumeric passwords [11,12], or to provide free-text authentication [13].
Other studies proposed authentication methods completely based on behavioral biometrics, which are recurrent patterns typical of each user's behavior [14][15][16]. Most solutions utilize wrist-worn devices, such as smartwatches. In [17][18][19], these devices are used to collect wrist movements while the user performs a specific gesture. Thus, the identity of the person wearing the device is verified from these motion data. In [20,21], wrist movements are analyzed jointly with mouse and keystroke activities. The correlation between these activities is used to provide Continuous Authentication (CA). CA can also be provided on the basis of the behavioral traits contained in the movements recorded by smartphones [22,23]. Here, the user is authenticated while performing daily life activities such as sitting, walking, and running. In [24], CA on mobile devices is achieved by exploiting app-usage information. However, authentication methods solely based on behavioral biometrics might not be accurate enough due to the irregular nature of the movements [25].
In practical authentication, we can identify two main requirements that need to be met. Firstly, a very high level of security is required, i.e., only the smartphone owner should be able to access the service. Because of this requirement, authentication methods purely based on behavioral biometrics are not reliable enough to be suitable in practical applications. Secondly, the smartphone owner should need little effort to access the service. This is a crucial requirement since many authentication systems, such as screen unlocking, are accessed multiple times every day. For this reason, related work aiming at strengthening the PIN security with additional user actions, e.g., gestures or drawings, may fail to satisfy this requirement [9,10]. In this work, we aim at providing a highly secure authentication method that satisfies both requirements. This is realized by including the traditional PIN in the authentication and capturing sensor data while the user is inserting it. Thanks to this methodology, it becomes harder for an attacker to access the system, since the attacker needs both to steal the PIN and to be accepted by the anomaly detector. At the same time, the process does not require additional effort from the user except entering the usual PIN.
In this work, we propose a novel authentication method that uses behavioral biometrics to increase the PIN security. Assuming that a correct PIN is inserted in a smartphone, our method verifies whether the user who typed the PIN was the actual smartphone owner or an attacker. To this end, the smartphone movements are recorded during the PIN insertion through built-in motion sensors, present in every smartphone. Then, an anomaly detection-based system evaluates whether these movements represent an inlier (i.e., the smartphone owner typed the PIN), or an anomaly (i.e., an attacker typed the PIN). We implement and test our authentication method using four common anomaly detection algorithms: Principal Component Analysis (PCA), Kernel Principal Component Analysis (K-PCA), One-Class Support Vector Machine (OC-SVM), and Local Outlier Factor (LOF). Finally, their performances are assessed and compared on the basis of real data. More precisely, we exploit the fact that a user needs to type N times in the case of an N-digit PIN. This offers N samples of the motion sensors that can be combined to increase the performance of the anomaly detectors as N increases. At the same time, this process does not require further actions from the users, differently from related works on behavioral biometrics. Furthermore, since the movements are recorded by the smartphone, we do not assume the presence of a smart bracelet to acquire behavioral biometrics (to promote reproducible research, the simulation code is available at: https://github.com/matteonerini/augmented-pin-authentication, accessed on 27 May 2022).
The rest of the paper is organized as follows. In Section 2, we present our anomaly detection-based authentication method and the sensor features that are involved. In Section 3, we briefly describe the four anomaly detection algorithms considered in this paper. In Section 4, we describe the real-world data acquisition process and assess the obtained accuracy through real experiments. In Section 5, we highlight the reasons behind the obtained results and possible applications of our research. Finally, Section 6 contains the concluding remarks.
Throughout the paper, vectors and matrices are denoted with bold lower and bold upper letters, respectively. Scalars are represented with letters not in bold font. $(\cdot)^T$ stands for transposition and $\|\cdot\|_2$ is the $\ell_2$-norm of a vector.

Methodology
Let us assume that an $N$-digit PIN is used as an authentication method on a smartphone device. This PIN can guarantee authentication to access any service of practical interest, e.g., screen unlocking or mobile banking. The idea behind this authentication method is that only the smartphone owner knows the PIN. Thus, if an attacker could steal such a combination, they would be able to break the authentication. In this paper, we strengthen the security of this authentication method by verifying whether the correct PIN has been inserted by the actual smartphone owner. This is realized assuming that users hold their smartphone and type the PIN in a personal manner. As a result, different users can be distinguished on the basis of the smartphone movements they produce while inserting the PIN.

Feature Selection
To improve the authentication security, we assume that the smartphone movements are captured for each digit composing the PIN, i.e., for each keystroke. Movements are sampled from built-in sensors and represented by a total of $D$ features. Three categories of sensors are supported by Android OS: motion sensors, environmental sensors, and position sensors [26]. For our purposes, we consider only motion sensors, whose values are provided according to a coordinate system defined with respect to the device screen [26]. More precisely, the $\hat{\imath}$ axis is horizontal and points to the right, the $\hat{\jmath}$ axis is vertical and points upward, and the $\hat{k}$ axis is perpendicular to the screen of the device, as shown in Figure 1. We select six relevant sensors to fully characterize the smartphone movements, which are described below:
• The Accelerometer measures the total acceleration, given in m/s$^2$, experienced on the three axes $\hat{\imath}$, $\hat{\jmath}$, $\hat{k}$.
• The Gravity sensor gives the components of the gravitational acceleration $g$ in m/s$^2$ along each direction $\hat{\imath}$, $\hat{\jmath}$, $\hat{k}$.
• The Gyroscope measures the angular speed, in rad/s, around the three device axes $\hat{\imath}$, $\hat{\jmath}$, $\hat{k}$. The rotation is positive in the counter-clockwise direction.
• The Linear Acceleration indicates the acceleration experienced along each device axis $\hat{\imath}$, $\hat{\jmath}$, $\hat{k}$, without the gravity contribution.
• The Rotation Vector is composed of three dimensionless components and represents the rotations of the device $\hat{\imath}$, $\hat{\jmath}$, $\hat{k}$ axes with respect to the east, the geomagnetic north, and the zenith, respectively.
• The Orientation sensor returns an array of three angles: the azimuth, i.e., the angle between the geomagnetic north direction and the device $\hat{\jmath}$ axis; the pitch, i.e., the angle of rotation around the $\hat{\imath}$ axis; and the roll, i.e., the angle of rotation around the $\hat{\jmath}$ axis.
Each of these six sensors returns three values, providing a total of 18 values. Among them, the ones referring to the cardinal directions are discarded because they are not relevant to our scope. In particular, we do not consider the components of the Rotation Vector taken with reference to the north and east directions, and the azimuth value of the Orientation sensor, which leaves 15 sensor values. In addition, we append to the feature set the value of the pressed digit, and a further value $M$, defined as $M = \text{pitch}^2 + \text{roll}^2$. In conclusion, a total of $D = 17$ features have been used to represent the smartphone movement associated with each keystroke.
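The feature selection above can be sketched in a few lines of Python. This is only an illustration: the function name, the argument order, and the ordering of the components inside each sensor tuple are assumptions, and $M$ is computed exactly as the formula appears in the text.

```python
def build_feature_vector(digit, accel, gravity, gyro, lin_accel, rot_vec, orientation):
    """Assemble the D = 17 features of one keystroke (illustrative sketch).

    Every sensor argument is a 3-tuple. The orderings
    rot_vec = (east, north, zenith) and orientation = (azimuth, pitch, roll)
    are assumptions made for this example.
    """
    _, _, rot_zenith = rot_vec        # discard the east and north components
    _, pitch, roll = orientation      # discard the azimuth
    m = pitch ** 2 + roll ** 2        # further value M, as written in the text
    return [*accel, *gravity, *gyro, *lin_accel,
            rot_zenith, pitch, roll, float(digit), m]
```

The resulting list holds 12 values from the first four sensors, one Rotation Vector component, two Orientation angles, the pressed digit, and $M$, for a total of 17 entries.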

Anomaly Detection-Based Authentication
Our ultimate goal is to verify whether a correct N-digit PIN has been inserted by the actual smartphone owner or by an attacker. To this end, we propose an anomaly detection-based method composed of three stages, as represented in Figure 1.
Firstly, in the sampling stage, the $N$-digit PIN is inserted through $N$ keystrokes by the user, who can be the smartphone owner or an attacker. At each keystroke, the considered motion sensors are sampled and the $D$ features are collected. We denote with $\mathbf{z}^{(n)}$ the vector containing the $D$ features corresponding to the $n$-th PIN digit, with $n = 1, \ldots, N$.
After that, in the anomaly detection stage, $N$ anomaly detectors are independently computed. The $n$-th anomaly detector $f_{AD}(\cdot)$ evaluates whether the $n$-th digit has been inserted by the actual smartphone owner on the basis of the feature vector $\mathbf{z}^{(n)}$. The function $f_{AD}(\cdot)$ represents, in general, any anomaly detector. In this study, we consider four different anomaly detection algorithms and we compare their performance: PCA, K-PCA, OC-SVM, and LOF. We denote with the scalar $s^{(n)}$ the output of the $n$-th $f_{AD}(\cdot)$, representing the anomaly score of $\mathbf{z}^{(n)}$. A low $s^{(n)}$ suggests that the $n$-th digit has been inserted by an attacker, i.e., it is an anomaly. On the contrary, a high $s^{(n)}$ suggests that the $n$-th digit has been inserted by the actual smartphone owner, i.e., it is an inlier.
Finally, the $N$ anomaly scores are combined in the combining stage to obtain a more reliable anomaly score $s$. The combining operation consists in averaging the $N$ scores; thus, $s$ is obtained as
$$s = \frac{1}{N} \sum_{n=1}^{N} s^{(n)}.$$
Based on the value of $s$, the final decision $\hat{y}$ is taken, with $\hat{y} \in \{\text{inlier}, \text{anomaly}\}$. If $s$ is below a certain threshold $s_{th}$, the whole PIN is considered inserted by an attacker, i.e., $\hat{y} = \text{anomaly}$. In this case, a practical deployment of the authentication system could ask the user to reinsert the PIN. Otherwise, the smartphone owner's identity is verified, i.e., $\hat{y} = \text{inlier}$, and the user is allowed to access the requested service. Intuitively speaking, in this third stage, we exploit the diversity offered by the digits composing the PIN to enhance the performance of our authentication method. Thus, longer PINs offer greater authentication accuracy.
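As a minimal sketch of the combining stage, the following Python function averages the per-digit scores and compares the result with the threshold. The function name and the returned labels are illustrative choices, not part of the original implementation.

```python
def authenticate(scores, threshold):
    """Average the N per-digit anomaly scores and take the final decision.

    Returns the combined score s and the decision, which is "anomaly"
    when s is below the threshold and "inlier" otherwise.
    """
    if not scores:
        raise ValueError("at least one per-digit score is required")
    s = sum(scores) / len(scores)
    decision = "anomaly" if s < threshold else "inlier"
    return s, decision
```

For example, the four scores `[-1.0, -2.0, -3.0, -2.0]` average to `-2.0`, which is accepted with a threshold of `-2.5`.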

The Adopted Anomaly Detection Algorithms
In this study, we explore the performance of four ML techniques commonly used for unsupervised anomaly detection tasks [27][28][29], implemented in Python with the Scikit-learn library [30]. In this section, we briefly recall the algorithms and the chosen hyperparameters, which have been tuned through 5-fold cross validation [31].
In the following, $\bar{\mathbf{X}} \in \mathbb{R}^{N_x \times D}$ denotes the training set matrix, whose rows are the $N_x$ inlier training points, $\bar{\mathbf{Y}} \in \mathbb{R}^{N_y \times D}$ contains the $N_y$ inlier test points, and $\bar{\mathbf{U}} \in \mathbb{R}^{N_u \times D}$ contains the $N_u$ anomaly test points. Let us define the offset $\boldsymbol{\mu}$ as the row vector containing the column-wise mean of the matrix $\bar{\mathbf{X}}$, and the scaling factor $\boldsymbol{\sigma}$ as the row vector containing the column-wise standard deviation of the matrix $\bar{\mathbf{X}}$. Before proceeding with the anomaly detection, the features in the matrices $\bar{\mathbf{X}}$, $\bar{\mathbf{Y}}$, and $\bar{\mathbf{U}}$ are centered and normalized by subtracting the offset $\boldsymbol{\mu}$ from each row and dividing each row element-wise by the scaling factor $\boldsymbol{\sigma}$. The resulting data matrices are denoted by $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{U}$.
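The normalization step can be illustrated with NumPy as follows. The function names are hypothetical, and the guard against zero standard deviation is an assumption added for robustness; it is not mentioned in the text.

```python
import numpy as np

def fit_scaler(X_raw):
    """Compute the offset mu and scaling factor sigma from the training set only."""
    mu = X_raw.mean(axis=0)
    sigma = X_raw.std(axis=0)
    # Assumption: constant features are left unscaled to avoid division by zero.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return mu, sigma

def apply_scaler(A_raw, mu, sigma):
    """Centre and normalize any data matrix with the training statistics."""
    return (A_raw - mu) / sigma
```

The same `mu` and `sigma` are applied to the training matrix and to both test matrices, so that no test information enters the statistics.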

Principal Component Analysis
PCA is a technique that realizes linear dimensionality reduction by mapping the training data from the $D$-dimensional feature space $\mathbb{R}^D$ into a subspace $\mathbb{R}^P$, where $D = 17$ in our problem, and $P < D$ is the number of principal components selected. The subspace is determined such that it minimizes the error (defined as Euclidean distance) between the data in the feature space and their projection in the selected subspace [32]. In more detail, to find the best subspace to project the training data, we need to evaluate the $D \times D$ sample covariance matrix
$$\boldsymbol{\Sigma}_x = \frac{1}{N_x - 1} \mathbf{X}^T \mathbf{X}.$$
By eigenvalue decomposition, $\boldsymbol{\Sigma}_x$ can be factorized as
$$\boldsymbol{\Sigma}_x = \mathbf{V}_x \boldsymbol{\Lambda}_x \mathbf{V}_x^T,$$
where $\mathbf{V}_x$ is an orthonormal matrix whose columns are the eigenvectors, while $\boldsymbol{\Lambda}_x$ is a diagonal matrix containing the $D$ eigenvalues. The magnitude of each eigenvalue represents the importance of the direction pointed to by the corresponding eigenvector. Let us assume that the eigenvalues in $\boldsymbol{\Lambda}_x$ are sorted in descending order and that the eigenvectors in $\mathbf{V}_x$ are ordered accordingly. To select the first $P$ components of the subspace, the matrix $\mathbf{V}_P \in \mathbb{R}^{D \times P}$ containing the first $P$ columns of $\mathbf{V}_x$ needs to be considered. Hence, the projection into the subspace is obtained by multiplying the data by $\mathbf{V}_P$, i.e., $\mathbf{X}_P = \mathbf{X}\mathbf{V}_P$, $\mathbf{Y}_P = \mathbf{Y}\mathbf{V}_P$, and $\mathbf{U}_P = \mathbf{U}\mathbf{V}_P$. To evaluate the error between the projected points and the original ones, it is necessary to reconstruct the data in the original feature space, i.e., $\hat{\mathbf{X}} = \mathbf{X}_P \mathbf{V}_P^T$, $\hat{\mathbf{Y}} = \mathbf{Y}_P \mathbf{V}_P^T$, and $\hat{\mathbf{U}} = \mathbf{U}_P \mathbf{V}_P^T$. After the reconstruction, it is possible to evaluate the anomaly score as the opposite of the Euclidean distance between the original data and the reconstructed data. Thus, given a generic point $\mathbf{z} \in \mathbb{R}^D$, its anomaly score is given by
$$s = -\|\mathbf{z} - \hat{\mathbf{z}}\|_2, \tag{1}$$
where $\hat{\mathbf{z}} = \mathbf{z}\mathbf{V}_P\mathbf{V}_P^T$ is the reconstructed version of $\mathbf{z}$. Since the PCA has been trained on inlier samples, a $\mathbf{z}$ with a high $s$ (i.e., close to zero) is likely to be an inlier. Conversely, a low $s$ indicates the presence of an anomaly. The value of $P$ is a hyperparameter that needs to be optimized. Too high a $P$ would yield a good reconstruction quality for all the samples (both inliers and anomalies), while too low a $P$ would significantly compress the feature space, lowering the reconstruction quality. In both these extreme cases, the task of differentiating inliers and anomalies would become hard. In our PCA implementation, through 5-fold cross validation, we verified that the subspace dimensionality $P = 10$ ensures the optimal trade-off between reconstruction quality and compression.
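The PCA-based score can be sketched with NumPy as shown below. This is an illustrative implementation under stated assumptions, not the authors' code: the function names are hypothetical, the input is assumed to be already centred and normalized, and the covariance is scaled by $1/(N_x - 1)$ (the scaling does not affect the eigenvectors).

```python
import numpy as np

def fit_pca(X, P):
    """Return V_P, the D x P matrix of the P leading eigenvectors.

    X is assumed to be centred, so X^T X / (N_x - 1) is its sample covariance.
    """
    cov = X.T @ X / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort in descending order
    return eigvecs[:, order[:P]]

def pca_score(Z, V_P):
    """Anomaly score: minus the reconstruction error of each row of Z."""
    Z = np.atleast_2d(Z)
    Z_hat = Z @ V_P @ V_P.T                  # project, then reconstruct
    return -np.linalg.norm(Z - Z_hat, axis=1)
```

Points lying in the subspace spanned by `V_P` obtain a score close to zero, while a point with a component orthogonal to that subspace obtains a score equal to minus the length of that component.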

Kernel Principal Component Analysis
Differently from PCA, K-PCA first maps the data with a non-linear function, named kernel, then applies the standard PCA to perform a linear dimensionality reduction in the new feature space. As a result, such dimensionality reduction becomes non-linear in the original feature space. A crucial point in K-PCA is the selection of the kernel that leads to linearly separable data in the new feature space. In [33], when the data distribution is unknown, the Radial Basis Function (RBF) kernel is proposed as a good candidate to accomplish this task. With this kernel, a generic point $\mathbf{z} \in \mathbb{R}^D$ is mapped to the vector $K(\mathbf{z}) \in \mathbb{R}^{N_x}$. Specifically, given the vector $\mathbf{z}$, we can apply the RBF as
$$K(\mathbf{z})_i = \exp\left(-\gamma \|\mathbf{z} - \mathbf{x}_i\|_2^2\right),$$
with $i = 1, 2, \ldots, N_x$. In this transformation, $\gamma$ is the kernel coefficient controlling the width of the Gaussian function, $\mathbf{x}_i$ is the $i$-th row of $\mathbf{X}$, and $K(\mathbf{z})_i$ is the $i$-th component of the point $\mathbf{z}$ in the kernel space. Remapping all the data in the kernel space, we obtain the following matrices: $\mathbf{K}_x \in \mathbb{R}^{N_x \times N_x}$ for training, $\mathbf{K}_y \in \mathbb{R}^{N_y \times N_x}$ for testing on inliers, and $\mathbf{K}_u \in \mathbb{R}^{N_u \times N_x}$ for testing on anomalies.
Applying now the PCA to the new data sets, it is possible to perform non-linear dimensionality reduction from the original feature space to a subspace $\mathbb{R}^Q$. Finally, the anomaly score can again be calculated as in (1). In our K-PCA implementation, the hyperparameters set through 5-fold cross validation are $\gamma = 1/17$ and $Q = 8$. The kernel coefficient $\gamma$ influences the sensitivity to differences in feature vectors. Too large a $\gamma$ would cause overfitting, meaning that only points extremely close to the training set would be classified as inliers. Conversely, too small a $\gamma$ would make it impossible to distinguish between inliers and anomalies. The meaning of $Q$ is similar to the meaning of the parameter $P$, previously introduced for PCA.
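The K-PCA procedure described above can be sketched by mapping each point to its RBF kernel features and then reusing the PCA reconstruction error. This is a rough NumPy illustration with hypothetical function names; centring the kernel features on the training mean before the PCA step is an assumption, since the text does not state how the kernel matrices are centred.

```python
import numpy as np

def rbf_features(Z, X, gamma):
    """K(z)_i = exp(-gamma * ||z - x_i||^2) for each row z of Z and x_i of X."""
    Z = np.atleast_2d(Z)
    sq_dist = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def fit_kpca(X, gamma, Q):
    """Fit PCA with Q components on the kernel features of the training set."""
    K_x = rbf_features(X, X, gamma)
    mean = K_x.mean(axis=0)                  # assumption: centre on training mean
    Kc = K_x - mean
    cov = Kc.T @ Kc / (Kc.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    V_Q = eigvecs[:, np.argsort(eigvals)[::-1][:Q]]
    return mean, V_Q

def kpca_score(Z, X, gamma, mean, V_Q):
    """Minus the reconstruction error in the kernel space, as in (1)."""
    Kc = rbf_features(Z, X, gamma) - mean
    return -np.linalg.norm(Kc - Kc @ V_Q @ V_Q.T, axis=1)
```

With the hyperparameters reported in the text, one would call `fit_kpca(X, 1 / 17, 8)` on the normalized training matrix.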

One-Class Support Vector Machine
The OC-SVM algorithm has the objective of learning a close frontier delimiting a given training set $\mathbf{X}$, as introduced by Schölkopf et al. in [34]. In this way, a new point is classified as an inlier if lying within this frontier, or as an anomaly otherwise. The main idea is to map the training data into a different feature space with a fixed transformation and to separate them from the origin with a maximum margin. We denote with $\phi(\cdot)$ this fixed feature space transformation. Thus, the goal is to learn the weights $\mathbf{w}$ and the maximum margin $\rho$ by minimizing the objective function as follows
$$\min_{\mathbf{w}, \boldsymbol{\zeta}, \rho} \; \frac{1}{2}\|\mathbf{w}\|_2^2 + \frac{1}{\nu N_x} \sum_{i=1}^{N_x} \zeta_i - \rho \quad \text{subject to} \quad \mathbf{w}^T \phi(\mathbf{x}_i) \geq \rho - \zeta_i, \quad \zeta_i \geq 0, \quad i = 1, \ldots, N_x,$$
where $\zeta_i$ is the margin violation for the training point $\mathbf{x}_i$. In the objective function to be minimized, the hyperparameter $\nu \in (0, 1]$ controls the strength of the regularization term $\frac{1}{2}\|\mathbf{w}\|_2^2$. Furthermore, it can be proven that $\nu$ is an upper bound on the fraction of training points outside the estimated frontier, and a lower bound on the fraction of support vectors [34,35]. Assuming that $\mathbf{w}$ and $\rho$ solve the problem, the anomaly score is defined for a generic point $\mathbf{z}$ as
$$s = \mathbf{w}^T \phi(\mathbf{z}) - \rho, \tag{2}$$
which can be referred to as the signed distance between $\mathbf{z}$ and the separating hyperplane.
Since the variables $\zeta_i$ penalize the objective function, $s$ is likely to be positive for most training samples.
The aforementioned optimization problem can be solved through its dual [34], defined as
$$\min_{\mathbf{a}} \; \frac{1}{2} \sum_{i=1}^{N_x} \sum_{j=1}^{N_x} a_i a_j k(\mathbf{x}_i, \mathbf{x}_j) \quad \text{subject to} \quad 0 \leq a_n \leq \frac{1}{\nu N_x}, \quad \sum_{n=1}^{N_x} a_n = 1,$$
where $a_n$ are defined such that $\mathbf{w} = \sum_{n=1}^{N_x} a_n \phi(\mathbf{x}_n)$, and $k(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j)$ is the kernel [36]. In this way, the anomaly score (2) of a sample $\mathbf{z}$ can be rewritten as
$$s = \sum_{n=1}^{N_x} a_n k(\mathbf{x}_n, \mathbf{z}) - \rho. \tag{3}$$
In our OC-SVM implementation, we use an RBF kernel defined by
$$k(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|_2^2\right),$$
where $\mathbf{x}_i$ and $\mathbf{x}_j$ are two generic points [37]. Furthermore, the hyperparameters $\nu = 0.1$ and $\gamma = 1/17$ have been chosen on the basis of 5-fold cross validation. As anticipated, $\nu$ is the hyperparameter controlling the strength of the regularization term $\frac{1}{2}\|\mathbf{w}\|_2^2$ in the objective function. With a very low $\nu$, the contribution of the regularization term would be negligible, and the weights $\mathbf{w}$ would be learned with no restriction. In this case, overfitting would be likely to occur. Conversely, with too high a $\nu$, it would be harder to learn a meaningful frontier since $\|\mathbf{w}\|_2^2$ would tend to zero.
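Since the paper relies on Scikit-learn, the OC-SVM detector with the reported hyperparameters could be instantiated roughly as follows. The wrapper function names are hypothetical, and the score returned by `decision_function` may differ from (3) by a positive scaling used internally by the library, which does not change the ranking of the samples.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_ocsvm(X, nu=0.1, gamma=1 / 17):
    """Train a one-class SVM with an RBF kernel on the inlier training set."""
    return OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X)

def ocsvm_score(model, Z):
    """Signed distance to the separating hyperplane: positive for inliers."""
    return model.decision_function(np.atleast_2d(Z))
```

A point far away from all training samples has kernel values close to zero, so its score is approximately $-\rho$ and therefore negative.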

Local Outlier Factor
The LOF algorithm identifies anomalies based on the local density of points within the dataset [38]. This unsupervised learning technique receives as input a set of points composed of the training set $\mathbf{X}$, containing examples of inliers, and a new point $\mathbf{z}$, which has to be classified as an inlier or anomaly. The main intuition is that the density of the samples around an anomaly, also called an outlier, should be significantly lower than the density around its neighbors. To formalize this concept, the $k$-distance of $\mathbf{z}$, denoted as $k\text{-dist}(\mathbf{z})$, has been introduced as the Euclidean distance between the point $\mathbf{z}$ and its $k$-th nearest neighbor [38]. The $k$-distance is used to define the reachability distance of $\mathbf{z}$ from $\mathbf{x}_i$ as
$$\text{reach-dist}_k(\mathbf{z}, \mathbf{x}_i) = \max\left\{k\text{-dist}(\mathbf{x}_i), \, d(\mathbf{z}, \mathbf{x}_i)\right\},$$
where $d(\mathbf{z}, \mathbf{x}_i)$ denotes the Euclidean distance between $\mathbf{z}$ and $\mathbf{x}_i$. Thus, the formal notion of density around $\mathbf{z}$ used in the LOF algorithm is given by the local reachability density of $\mathbf{z}$, defined as
$$\text{lrd}_k(\mathbf{z}) = \left( \frac{1}{|N_k(\mathbf{z})|} \sum_{\mathbf{x}_i \in N_k(\mathbf{z})} \text{reach-dist}_k(\mathbf{z}, \mathbf{x}_i) \right)^{-1},$$
where $N_k(\mathbf{z})$ denotes the set of the $k$-nearest neighbors of $\mathbf{z}$. In other words, the local reachability density can be interpreted as the inverse of the average reachability distance of $\mathbf{z}$ from its $k$ neighbors. Finally, the local reachability density of $\mathbf{z}$ is compared with those of the neighbors in the local outlier factor metric, defined as
$$\text{lof}_k(\mathbf{z}) = \frac{1}{|N_k(\mathbf{z})|} \sum_{\mathbf{x}_i \in N_k(\mathbf{z})} \frac{\text{lrd}_k(\mathbf{x}_i)}{\text{lrd}_k(\mathbf{z})}.$$
According to this definition, $\text{lof}_k(\mathbf{z})$ is the average local reachability density of the $k$ neighbors divided by the local reachability density of $\mathbf{z}$.
A lof value close to 1, or less than 1, is an indicator that the observation is an inlier. On the other hand, a lof much greater than 1 indicates the presence of an anomaly, since the density of points around $\mathbf{z}$ is much lower than the densities around its $k$-nearest neighbors. In our LOF implementation, the threshold which discriminates between inliers and anomalies is set to $\text{lof} = 1.5$, as in the original paper [38], and we select $k = 30$ with 5-fold cross validation. Note that the higher the lof threshold, the more samples are accepted as inliers. Moreover, the effect of the neighborhood size $k$ is to determine the amount of local information to capture. Finally, the anomaly score of a sample $\mathbf{z}$ in the output of the LOF anomaly detector is
$$s = 1.5 - \text{lof}_k(\mathbf{z}). \tag{4}$$
In this way, negative scores represent anomalies, while positive scores represent inliers. The four considered anomaly detectors are summarized in Table 1. Although they are characterized by different anomaly scores, as defined in (1), (3) and (4), these scores have all been designed such that high values correspond to inliers, while low values correspond to anomalies.
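With Scikit-learn, a LOF detector matching the description above could be set up roughly as follows. The wrapper names are hypothetical. The sketch relies on two library behaviours: `novelty=True` enables scoring of unseen points, and the default `contamination="auto"` sets the internal offset to -1.5, so that `decision_function` returns 1.5 minus the lof value.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def fit_lof(X, k=30):
    """Fit LOF on the inlier training set for later use on new points."""
    return LocalOutlierFactor(n_neighbors=k, novelty=True).fit(X)

def lof_score(model, Z):
    """Anomaly score 1.5 - lof_k(z): positive for inliers, negative for anomalies."""
    return model.decision_function(np.atleast_2d(Z))
```

A different lof threshold would require changing the offset, which this sketch leaves at the library default.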

Experimental Methods and Results
In this section, we evaluate the performance of our authentication method. Firstly, we describe the data acquisition process that allowed us to build a dataset to train and test the proposed authentication method. Secondly, we measure the authentication accuracy and discuss the obtained numerical results.

Data Collection
To acquire the data, we developed a smartphone application able to sample the motion sensors when a user enters a digit in a numeric keypad. The application, developed with the Android Studio environment, has been designed to collect both the training set, that instructs the anomaly detection algorithms, and the test set, to evaluate their detection performance. The user interface consists of a single screen exhibiting a numeric keypad, as represented in Figure 1. The recorded data is organized by the application into a table stored in a Comma-Separated Values (CSV) file. When a digit in this keypad is pressed, the considered motion sensors are recorded together with the value of the pressed digit, and a row is added to the CSV file. In our study, the sensors are sampled only in correspondence with a keystroke. This is realized in Android Studio by triggering the sampling operation through the android:onClick attribute of the <Button> element in the XML layout of the application. Thanks to this sampling strategy, particular data preprocessing is not necessary. As anticipated in Section 2, each row of the obtained CSV file, representing a sample, contains 17 numerical scalar values: one representing the pressed digit, and 16 sampled by the considered motion sensors. Once the typing session is terminated, the application returns the CSV file labeled with the user identifier of the smartphone owner.
We recruited 12 volunteer students, standard smartphone users, who installed the dedicated application on their smartphones. All the devices were Android, running an updated version of the operating system (7.0 Nougat or later). Each student was asked to type a list of 500 randomly generated digits on the numeric keypad shown on the screen, while naturally holding their smartphone with one hand. The random digit generator ensured that the digits entered by the users were equally distributed (50 samples of each digit per student). At the end of the typing session, not all the students had typed exactly 500 digits, but all of them had typed at least 470 digits. Thus, the dataset was randomly subsampled in order to have exactly 470 samples per student. Finally, the obtained dataset has been preprocessed by normalizing every feature to zero mean and unit variance. This last step has the objective of making each feature equally important, and it is particularly useful when dealing with ML algorithms [31].
The resulting dataset was used to train and test 12 authenticators in an unsupervised manner, one for each student. The 12 training sets were composed of 90% of the data, i.e., 423 samples of the same student for each set. These sets were used to choose the hyperparameters of the ML models via 5-fold cross validation, and for the final trainings. The final trainings were carried out considering four different training set sizes, to investigate their impact on the authentication accuracy: $N_x = 50$, $N_x = 100$, $N_x = 200$, and $N_x = 400$. The 12 inlier test sets included the remaining 10% of the data, i.e., $N_y = 47$ samples of the same student for each set. Finally, the anomaly test set for each student was given by the inlier test sets of all the other students. Thus, the 12 anomaly test sets were composed of $N_u = 47 \times (12 - 1) = 517$ samples.
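The construction of the training, inlier test, and anomaly test sets can be illustrated as follows. This is a sketch with hypothetical names; in particular, splitting by a random permutation with a fixed seed is an assumption, since the text does not specify how the 90%/10% partition was drawn.

```python
import numpy as np

def split_users(data, train_frac=0.9, seed=0):
    """Build per-user training, inlier test and anomaly test sets.

    data maps each user identifier to an array of shape (n_samples, D).
    """
    rng = np.random.default_rng(seed)
    train, test_in = {}, {}
    for user, samples in data.items():
        idx = rng.permutation(len(samples))   # assumption: random split
        n_train = int(round(train_frac * len(samples)))
        train[user] = samples[idx[:n_train]]
        test_in[user] = samples[idx[n_train:]]
    # The anomaly test set of a user stacks the inlier test sets of all others.
    test_anom = {u: np.vstack([test_in[v] for v in data if v != u]) for u in data}
    return train, test_in, test_anom
```

With 12 users and 470 samples each, this yields 423 training samples, 47 inlier test samples, and 517 anomaly test samples per user, matching the sizes reported above.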

Testing Strategy
To evaluate the performance of our authentication method, we consider 12 one-vs-all authentication problems. In each of these problems, $N_x$ training samples inserted by a specific student are used to train an anomaly detector $f_{AD}(\cdot)$ in an unsupervised manner. Then, to test the validity of an $N$-digit PIN, we feed $N$ trained anomaly detectors with $N$ samples $\{\mathbf{z}^{(1)}, \mathbf{z}^{(2)}, \ldots, \mathbf{z}^{(N)}\}$ (see also Figure 1). The $N$ samples are all taken either from the corresponding inlier test set (assuming the PIN has been inserted by the smartphone owner), or the anomaly test set (assuming the PIN has been inserted by an attacker). Exploiting the Monte Carlo method, we randomly generate 500 different combinations of $N$ samples, without replacement, from both the inlier test set and the anomaly test set. The resulting $\hat{y}$ are checked to verify whether the authenticator decisions are correct. Finally, the performances are averaged over the 12 one-vs-all authentication problems.
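The Monte Carlo generation of PIN-level scores can be sketched as below, assuming the per-digit anomaly scores of a test set have already been computed by a trained detector. The function name and the fixed seed are illustrative; sampling without replacement is applied within each combination of $N$ samples.

```python
import numpy as np

def monte_carlo_scores(digit_scores, N, n_trials=500, seed=0):
    """Draw n_trials combinations of N per-digit scores and average each one."""
    rng = np.random.default_rng(seed)
    digit_scores = np.asarray(digit_scores, dtype=float)
    return np.array([
        rng.choice(digit_scores, size=N, replace=False).mean()
        for _ in range(n_trials)
    ])
```

Running this once on the inlier test scores and once on the anomaly test scores gives the two sets of combined scores on which the threshold is then swept.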

Numerical Results
We first evaluate the accuracy of our authentication method in terms of Receiver Operating Characteristic (ROC) and Area Under Curve (AUC). A ROC is a graphical representation of all the possible working points of a binary classifier. This curve is derived by plotting the true positive rate TPR versus the false positive rate FPR obtained by varying the discrimination threshold $s_{th}$. Figure 2 illustrates the ROCs obtained by the four considered anomaly detectors, trained with $N_x = 400$, and for three different PIN lengths: $N = 3$, $N = 4$, and $N = 6$. In addition, for each curve, we report the AUC metric, defined as the area under it. First of all, we notice that longer PINs allow more accurate authentications. Indeed, the diversity order of our authentication method is equal to the PIN length $N$. Second, among the considered anomaly detection algorithms, PCA is the best performing in terms of AUC. It also outperforms the other algorithms at approximately every working point of the ROC of practical interest.
Now, we investigate how reduced training set sizes $N_x$ affect the performance of our authenticator. To this end, we consider two performance metrics. Firstly, the EER is given by the common value of the false positive rate and the false negative rate when they are equal. This is graphically represented for each ROC in Figure 2 by the intersection point between the ROC and the dashed bisector. Secondly, we consider the Maximum Balanced Accuracy (MBA), given by the maximum value assumed by the balanced accuracy. In turn, the balanced accuracy for a specific working point is defined as the average between the true positive rate and the true negative rate, i.e.,
$$\text{BA} = \frac{\text{TPR} + \text{TNR}}{2}.$$
Note that the balanced accuracy corresponds to the accuracy when the number of positives is equal to the number of negatives in the test set. The working point corresponding to the MBA of each ROC is graphically identified with a circle in Figure 2.
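Both metrics can be estimated from the combined scores by sweeping the threshold, as in the following sketch. The function name is hypothetical, inliers are treated as positives, and the EER is approximated at the threshold where the false positive and false negative rates are closest, which is an implementation choice for finite samples.

```python
import numpy as np

def eer_and_mba(inlier_scores, anomaly_scores):
    """Estimate the EER and the MBA by sweeping the threshold s_th.

    A sample is accepted as an inlier when its score is >= s_th.
    """
    inlier_scores = np.asarray(inlier_scores, dtype=float)
    anomaly_scores = np.asarray(anomaly_scores, dtype=float)
    thresholds = np.unique(np.concatenate([inlier_scores, anomaly_scores, [np.inf]]))
    tpr = np.array([(inlier_scores >= t).mean() for t in thresholds])
    fpr = np.array([(anomaly_scores >= t).mean() for t in thresholds])
    fnr = 1.0 - tpr
    i = np.argmin(np.abs(fpr - fnr))          # closest point to FPR = FNR
    eer = (fpr[i] + fnr[i]) / 2
    mba = np.max((tpr + (1.0 - fpr)) / 2)     # balanced accuracy = (TPR + TNR) / 2
    return eer, mba
```

Perfectly separated score distributions give an EER of 0 and an MBA of 1, while identical distributions give 0.5 for both.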
The metrics EER and MBA are reported in Figure 3 for the four training set sizes considered. As expected, both 1−EER and the MBA increase with the PIN length $N$, and PCA is the best algorithm in all cases. In addition, we observe that the performance degrades only slightly when reduced training sets are used. For every PIN length $N$ considered, $N_x = 50$ samples are sufficient to train a well-performing authenticator, especially in the case of PCA. This characteristic further confirms the practical feasibility of our authentication method. It is worth noticing that typical training set dimensions adopted in the related works are greater than a hundred samples [10][11][12][16].
Since the behavior of users is not necessarily repetitive, false negatives might occur in the authentication system. This means that the smartphone owner might experience a denial of access. To analyze the possible occurrence of denial of access to the user, we consider that the authentication system allows the user to reinsert the PIN multiple times in the case of access rejection. In Figure 4, we report the denial of access probabilities as a function of the number of consecutive attempts of PIN insertion. Here, we assume that the attempts are all correct and independent and that the authenticators operate at the EER working point on the ROC curve. In the case of 3-digit PINs, the denial of access probability after four correct attempts is approximately $10^{-5}$ when PCA is considered. If the user inserts the PIN on average 10 times every day, this event will occur on average once every 27 years. Since this interval of time is significantly beyond the expected lifetime of smartphone devices, this analysis confirms the practicality of our authentication method. To avoid denial of access, in practical scenarios a second authentication technique could be activated by the system after multiple rejections, e.g., a pre-established personal secret question could be asked.
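The arithmetic behind the denial of access analysis can be reproduced with a short calculation. The function names are illustrative; the first function follows from the stated independence of the attempts, and the second uses the figures quoted in the text (a probability of about $10^{-5}$ and 10 insertions per day).

```python
def denial_probability(fnr, attempts):
    """Probability that all of the independent correct attempts are rejected."""
    return fnr ** attempts

def years_between_denials(p_denial, insertions_per_day, days_per_year=365.0):
    """Average number of years between two denial of access events."""
    return 1.0 / (p_denial * insertions_per_day * days_per_year)
```

With `p_denial = 1e-5` and 10 insertions per day, the expected interval is 1 / (1e-5 × 10 × 365) ≈ 27.4 years, in line with the value reported above.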
Lastly, we analyze whether the proposed authentication method can work in real time by measuring its processing time. For this purpose, the testing phase of the employed algorithms should require a very low processing time. The experiments were run on an Intel Xeon Central Processing Unit (CPU) with a clock frequency of 2.20 GHz. In Table 2, we report the resulting processing times for the different algorithms, as a function of the PIN length N. We notice that the processing time mostly depends on the algorithm used, with only a slight dependence on the PIN length. The testing time of all four algorithms is below 1 ms, making them suitable for working in real time. Furthermore, PCA is the detection algorithm with both the best accuracy and the lowest processing time, approximately 0.1 ms for all the considered PIN lengths. Note that this processing time is far shorter than the time needed to recognize a physical activity in purely behavioral biometric-based authentication, which is typically a few seconds [9,22].

Discussion
The fundamental assumption behind this paper, and in general behind all the studies on behavioral biometrics, is that each user has a personal way to interact with their devices. In our case, we assume that the users hold their smartphone and type the PIN in a personal manner. To validate this assumption, we carried out extensive experiments with real-world data collected from volunteers. The numerical results corroborate this assumption. To observe the personal traits of the users, we asked our recruited students to naturally hold their smartphones with one hand. This was needed in order to capture significant movement variations with the built-in sensors. For instance, if the smartphone is lying on a table while the PIN is inserted, no behavioral biometric can be observed from the motion sensors and the PIN security cannot be improved. In a practical deployment, this limitation could be overcome by asking the user to always insert the PIN in their natural way, taking care that the smartphone is not in contact with external objects.
To explain the high performance obtained by our authentication method when employing PCA, we inspect the dataset after applying dimensionality reduction through PCA. In Figure 5, we report the entire dataset, composed of 470 samples per student, projected along its first three principal components. Here, the samples corresponding to each user, identified by a unique color, are distributed in well-defined clusters, even when observed with a very limited dimensionality, i.e., P = 3. Because of this property, PCA can distinguish the samples entered by the different students, obtaining a high authentication accuracy. In fact, we calculated that the total percentage of variance explained by the first P = 10 principal components is 97.8%. However, failure cases for the method are given by those points that are far from the center of their cluster. These points are easily misclassified by the method and may represent false positives or false negatives. The underlying cause of these failure cases is that the behavior of a user is not necessarily repetitive. In fact, a user may assume an unusual pose for only a few samples, which are represented by points far from the center of the user cluster.
With the widespread use of smartphone applications, there is an increasing need to protect the user's privacy and security. The main applications that could benefit from our authentication system can be classified into four categories. First, screen unlocking is the most common security method adopted in smartphones. It is realized through the insertion of a PIN on a numeric keypad or a pattern on a grid of points. Thus, our authentication method can greatly contribute to strengthening the security of the PIN when used for screen unlocking purposes. Second, social network applications are a popular target of attackers willing to impersonate the device owner. In mobile devices, this risk is particularly high since users tend to be constantly logged in to the application, to avoid inserting the login credentials at every access. A fast authentication method for these applications could be implemented through a short PIN strengthened by our anomaly detection-based method. Third, multimedia data such as photos and videos can be protected by creating secure folders on modern smartphones. To access such secure folders, a PIN is commonly employed. Given the paramount importance of preserving users' privacy, the authentication method proposed in this paper could increase the security of the PIN used to access these secure folders. Lastly, transaction applications such as credit card and bank applications are ubiquitous nowadays. In recent years, virtual banks and home banking services have increased in popularity, giving the user the possibility to perform transactions by solely using the smartphone. Before each transaction, our method could be employed to provide stronger user authentication, on top of the widely used numeric PINs.
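The explained-variance figure quoted in the PCA discussion above can be computed as in the following minimal sketch. It runs on synthetic data, not on the dataset of this paper: the 470-by-17 matrix generated from three latent factors, the noise level, and the function name are assumptions made only to obtain a runnable example.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of the total variance carried by each principal component,
    sorted from the first component to the last."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Squared singular values of the centered data are proportional
    # to the variances along the principal components.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2 / (X.shape[0] - 1)
    return var / var.sum()


# Synthetic stand-in (hypothetical): 470 samples with 17 features,
# driven by 3 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(470, 3))
mixing = rng.normal(size=(3, 17))
X = latent @ mixing + 0.05 * rng.normal(size=(470, 17))

ratio = explained_variance_ratio(X)
cumulative = np.cumsum(ratio)
print(f"variance explained by the first 3 components: {cumulative[2]:.3f}")
```

Reading the cumulative sum at index P − 1 gives the share of variance retained by the first P components, which is the quantity reported above for P = 10.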

Conclusions
In this study, we strengthen the security of N-digit PIN authentication on mobile devices by using behavioral biometrics. To this end, we propose a novel method to verify whether the correct PIN has been inserted by the actual smartphone owner or by an attacker. This method is based on 17 features extracted from the built-in smartphone motion sensors, and on the assumption that each user enters the PIN in a personal manner, learnable with anomaly detection algorithms. We implement our authentication method with four different anomaly detectors, PCA, K-PCA, OC-SVM, and LOF, and compare their performance.
Numerical results show that with PCA it is possible to achieve an EER as low as 5% in the case of 4-digit PINs, and 4% in the case of 6-digit PINs. Thus, in comparison to solely using the PIN as a security measure, the contribution brought by our authentication method can be summarized as follows. If only the PIN is used as an authentication method, an attacker who has successfully stolen the PIN is rejected by the system in 0% of cases. Conversely, when our method is employed, an attacker who has successfully stolen the PIN is rejected in 96% of cases considering a 6-digit PIN. Furthermore, the performance only slightly decreases when the training set size is reduced from 400 to 50 samples. The practicality of our approach is confirmed by the low processing time required, on the order of fractions of milliseconds for PCA.

Conflicts of Interest:
The authors declare no conflict of interest.